Model Card Tutorial

Despite the growing use of ML systems, documenting them properly has yet to gain traction. This is often attributed to the absence of standardized documentation procedures, which has recently prompted proposals for new documentation frameworks for ML models. One example is Model Cards, proposed by M. Mitchell et al. in "Model Cards for Model Reporting" (2019). This framework aims to make model reporting more transparent by having the model creator document a set of suggested details (e.g., training algorithms and intended users).

To support this framework and contribute to transparent model reporting, we added a Model Card system to Tribuo. To reduce the developer's workload and partially automate the process, the system uses Tribuo's built-in provenance to fill in the details of the model, its training, and its testing. The developer then specifies the model's usage details explicitly, either programmatically or through a CLI.

Setup

We will load the LibSVM anomaly detection and Model Card jars and import a few packages. Note that the Model Card jar is written in Java 17, so this tutorial requires Java 17 or later.

In [1]:
%jars ./tribuo-anomaly-libsvm-4.3.0-jar-with-dependencies.jar
%jars ./tribuo-modelcard-4.3.0-jar-with-dependencies.jar
In [2]:
import com.oracle.labs.mlrg.olcut.provenance.ProvenanceUtil;
import org.tribuo.MutableDataset;
import org.tribuo.interop.modelcard.ModelCard;
import org.tribuo.interop.modelcard.UsageDetailsBuilder;
import org.tribuo.anomaly.evaluation.AnomalyEvaluator;
import org.tribuo.anomaly.example.GaussianAnomalyDataSource;
import org.tribuo.anomaly.libsvm.LibSVMAnomalyTrainer;
import org.tribuo.anomaly.libsvm.SVMAnomalyType;
import org.tribuo.common.libsvm.KernelType;
import org.tribuo.common.libsvm.SVMParameters;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

Creating a Model Card for a Tribuo Model

The Model Card system supports only Tribuo models and is currently incompatible with external models. The information stored in a ModelCard is separated into four components: ModelDetails, TrainingDetails, TestingDetails, and UsageDetails. ModelDetails and TrainingDetails are constructed solely from the built-in provenance of a Model instance. TestingDetails is constructed from the built-in provenance of an Evaluation instance and can be augmented with a Map of testing metrics selected by the user. In contrast, UsageDetails relies entirely on fields set by the user and can be constructed either programmatically using the UsageDetailsBuilder or interactively using the ModelCardCLI.

We will first re-create the anomaly detection model from the Anomaly Detection tutorial and then construct several ModelCard objects from it.

In [3]:
var trainData = new MutableDataset<>(new GaussianAnomalyDataSource(2000, 0.0f, 1L));
var evalData = new MutableDataset<>(new GaussianAnomalyDataSource(2000, 0.2f, 2L));
var params = new SVMParameters<>(new SVMAnomalyType(SVMAnomalyType.SVMMode.ONE_CLASS), KernelType.RBF);
params.setGamma(1.0);
params.setNu(0.1);

var trainer = new LibSVMAnomalyTrainer(params);
var model = trainer.train(trainData);
var evaluator = new AnomalyEvaluator();
var evaluation = evaluator.evaluate(model,evalData);

System.out.println();
System.out.println(ProvenanceUtil.formattedProvenanceString(model.getProvenance()));
System.out.println();
System.out.println(ProvenanceUtil.formattedProvenanceString(evaluation.getProvenance()));
*
optimization finished, #iter = 653
obj = 289.5926348816893, rho = 3.144570476807895
nSV = 296, nBSV = 114

LibSVMModel(
	class-name = org.tribuo.common.libsvm.LibSVMModel
	dataset = MutableDataset(
			class-name = org.tribuo.MutableDataset
			datasource = GaussianAnomalyDataSource(
					class-name = org.tribuo.anomaly.example.GaussianAnomalyDataSource
					expectedMeans = List[
						1.0
						2.0
						1.0
						2.0
						5.0
					]
					anomalousMeans = List[
						-2.0
						2.0
						-2.0
						2.0
						-10.0
					]
					seed = 1
					numSamples = 2000
					fractionAnomalous = 0.0
					anomalousVariances = List[
						1.0
						0.5
						0.25
						1.0
						0.1
					]
					expectedVariances = List[
						1.0
						0.5
						0.25
						1.0
						0.1
					]
					host-short-name = DataSource
				)
			transformations = List[]
			is-sequence = false
			is-dense = true
			num-examples = 2000
			num-features = 5
			num-outputs = 2
			tribuo-version = 4.3.0
		)
	trainer = LibSVMAnomalyTrainer(
			class-name = org.tribuo.anomaly.libsvm.LibSVMAnomalyTrainer
			cost = 1.0
			coef0 = 0.0
			seed = 12345
			cache_size = 500.0
			probability = false
			nu = 0.1
			degree = 3
			eps = 0.001
			kernelType = RBF
			p = 0.1
			shrinking = true
			svmType = SVMAnomalyType(
					class-name = org.tribuo.anomaly.libsvm.SVMAnomalyType
					type = ONE_CLASS
					host-short-name = SVMType
				)
			gamma = 1.0
			tribuo-version = 4.3.0
			train-invocation-count = 0
			is-sequence = false
			host-short-name = Trainer
		)
	trained-at = 2022-10-07T12:03:06.539476091-04:00
	instance-values = Map{}
	tribuo-version = 4.3.0
	java-version = 17.0.4.1
	os-name = Linux
	os-arch = amd64
)

EvaluationProvenance(
	class-name = org.tribuo.provenance.EvaluationProvenance
	model-provenance = LibSVMModel(
			class-name = org.tribuo.common.libsvm.LibSVMModel
			dataset = MutableDataset(
					class-name = org.tribuo.MutableDataset
					datasource = GaussianAnomalyDataSource(
							class-name = org.tribuo.anomaly.example.GaussianAnomalyDataSource
							expectedMeans = List[
								1.0
								2.0
								1.0
								2.0
								5.0
							]
							anomalousMeans = List[
								-2.0
								2.0
								-2.0
								2.0
								-10.0
							]
							seed = 1
							numSamples = 2000
							fractionAnomalous = 0.0
							anomalousVariances = List[
								1.0
								0.5
								0.25
								1.0
								0.1
							]
							expectedVariances = List[
								1.0
								0.5
								0.25
								1.0
								0.1
							]
							host-short-name = DataSource
						)
					transformations = List[]
					is-sequence = false
					is-dense = true
					num-examples = 2000
					num-features = 5
					num-outputs = 2
					tribuo-version = 4.3.0
				)
			trainer = LibSVMAnomalyTrainer(
					class-name = org.tribuo.anomaly.libsvm.LibSVMAnomalyTrainer
					cost = 1.0
					coef0 = 0.0
					seed = 12345
					cache_size = 500.0
					probability = false
					nu = 0.1
					degree = 3
					eps = 0.001
					kernelType = RBF
					p = 0.1
					shrinking = true
					svmType = SVMAnomalyType(
							class-name = org.tribuo.anomaly.libsvm.SVMAnomalyType
							type = ONE_CLASS
							host-short-name = SVMType
						)
					gamma = 1.0
					tribuo-version = 4.3.0
					train-invocation-count = 0
					is-sequence = false
					host-short-name = Trainer
				)
			trained-at = 2022-10-07T12:03:06.539476091-04:00
			instance-values = Map{}
			tribuo-version = 4.3.0
			java-version = 17.0.4.1
			os-name = Linux
			os-arch = amd64
		)
	dataset-provenance = MutableDataset(
			class-name = org.tribuo.MutableDataset
			datasource = GaussianAnomalyDataSource(
					class-name = org.tribuo.anomaly.example.GaussianAnomalyDataSource
					expectedMeans = List[
						1.0
						2.0
						1.0
						2.0
						5.0
					]
					anomalousMeans = List[
						-2.0
						2.0
						-2.0
						2.0
						-10.0
					]
					seed = 2
					numSamples = 2000
					fractionAnomalous = 0.2
					anomalousVariances = List[
						1.0
						0.5
						0.25
						1.0
						0.1
					]
					expectedVariances = List[
						1.0
						0.5
						0.25
						1.0
						0.1
					]
					host-short-name = DataSource
				)
			transformations = List[]
			is-sequence = false
			is-dense = true
			num-examples = 2000
			num-features = 5
			num-outputs = 2
			tribuo-version = 4.3.0
		)
	tribuo-version = 4.3.0
)

At minimum, constructing a ModelCard requires a Model and its Evaluation. Providing only these two arguments yields a ModelCard with no testing metrics and a null UsageDetails.

In [4]:
ModelCard card1 = new ModelCard(model, evaluation);
System.out.println(card1);
{
  "ModelDetails" : {
    "schema-version" : "1.0",
    "model-type" : "LibSVMAnomalyModel",
    "model-package" : "org.tribuo.anomaly.libsvm.LibSVMAnomalyModel",
    "tribuo-version" : "4.3.0",
    "java-version" : "17.0.4.1",
    "configured-parameters" : {
      "cost" : "1.0",
      "coef0" : "0.0",
      "seed" : "12345",
      "cache_size" : "500.0",
      "probability" : "false",
      "nu" : "0.1",
      "train-invocation-count" : "0",
      "is-sequence" : "false",
      "degree" : "3",
      "eps" : "0.001",
      "host-short-name" : "Trainer",
      "class-name" : "org.tribuo.anomaly.libsvm.LibSVMAnomalyTrainer",
      "kernelType" : "RBF",
      "p" : "0.1",
      "shrinking" : "true",
      "svmType" : {
        "type" : "ONE_CLASS",
        "host-short-name" : "SVMType",
        "class-name" : "org.tribuo.anomaly.libsvm.SVMAnomalyType"
      },
      "tribuo-version" : "4.3.0",
      "gamma" : "1.0"
    }
  },
  "TrainingDetails" : {
    "schema-version" : "1.0",
    "training-time" : "2022-10-07T12:03:06.539476091-04:00",
    "training-set-size" : 2000,
    "num-features" : 5,
    "features-list" : [ "A", "B", "C", "D", "E" ],
    "num-outputs" : 2,
    "outputs-distribution" : {
      "ANOMALOUS" : 0,
      "EXPECTED" : 2000
    }
  },
  "TestingDetails" : {
    "schema-version" : "1.0",
    "testing-set-size" : 2000,
    "metrics" : { }
  },
  "UsageDetails" : null
}

To include testing metrics in the TestingDetails section of a ModelCard, pass a map of metrics (keyed by metric description, with the metric value as the value) as an additional constructor argument. Note that the ModelCard constructor copies the entries of this map rather than storing a reference to it, so changes made to the map after the ModelCard is constructed are not reflected in the ModelCard's metrics.

In [5]:
Map<String, Double> testingMetrics = new HashMap<>();
testingMetrics.put("overall-precision", evaluation.getPrecision());
testingMetrics.put("overall-recall", evaluation.getRecall());

ModelCard card2 = new ModelCard(model, evaluation, testingMetrics);
System.out.println(card2);
{
  "ModelDetails" : {
    "schema-version" : "1.0",
    "model-type" : "LibSVMAnomalyModel",
    "model-package" : "org.tribuo.anomaly.libsvm.LibSVMAnomalyModel",
    "tribuo-version" : "4.3.0",
    "java-version" : "17.0.4.1",
    "configured-parameters" : {
      "cost" : "1.0",
      "coef0" : "0.0",
      "seed" : "12345",
      "cache_size" : "500.0",
      "probability" : "false",
      "nu" : "0.1",
      "train-invocation-count" : "0",
      "is-sequence" : "false",
      "degree" : "3",
      "eps" : "0.001",
      "host-short-name" : "Trainer",
      "class-name" : "org.tribuo.anomaly.libsvm.LibSVMAnomalyTrainer",
      "kernelType" : "RBF",
      "p" : "0.1",
      "shrinking" : "true",
      "svmType" : {
        "type" : "ONE_CLASS",
        "host-short-name" : "SVMType",
        "class-name" : "org.tribuo.anomaly.libsvm.SVMAnomalyType"
      },
      "tribuo-version" : "4.3.0",
      "gamma" : "1.0"
    }
  },
  "TrainingDetails" : {
    "schema-version" : "1.0",
    "training-time" : "2022-10-07T12:03:06.539476091-04:00",
    "training-set-size" : 2000,
    "num-features" : 5,
    "features-list" : [ "A", "B", "C", "D", "E" ],
    "num-outputs" : 2,
    "outputs-distribution" : {
      "ANOMALOUS" : 0,
      "EXPECTED" : 2000
    }
  },
  "TestingDetails" : {
    "schema-version" : "1.0",
    "testing-set-size" : 2000,
    "metrics" : {
      "overall-recall" : 1.0,
      "overall-precision" : 0.6357927786499215
    }
  },
  "UsageDetails" : null
}
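
The copy semantics of the metrics map noted earlier can be illustrated without Tribuo. The following is a minimal sketch using only the Java standard library; MetricsSnapshot and MetricsCopyDemo are hypothetical names introduced for illustration and are not part of Tribuo, and the sketch does not claim to mirror how ModelCard is implemented internally.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class MetricsCopyDemo {
    // Builds a snapshot from the caller's map, then mutates the caller's map.
    static MetricsSnapshot snapshotThenMutate(Map<String, Double> source) {
        MetricsSnapshot snapshot = new MetricsSnapshot(source);
        source.put("added-later", 0.5);
        return snapshot;
    }

    public static void main(String[] args) {
        Map<String, Double> testingMetrics = new HashMap<>();
        testingMetrics.put("overall-recall", 1.0);
        MetricsSnapshot snapshot = snapshotThenMutate(testingMetrics);
        // The caller's map has the new entry; the snapshot does not.
        System.out.println(testingMetrics.containsKey("added-later"));
        System.out.println(snapshot.metrics().containsKey("added-later"));
    }
}

// Hypothetical stand-in for an object that stores testing metrics (not Tribuo's ModelCard).
final class MetricsSnapshot {
    private final Map<String, Double> metrics;

    MetricsSnapshot(Map<String, Double> source) {
        // Copy the entries so that later changes to `source` are not visible here.
        this.metrics = Collections.unmodifiableMap(new HashMap<>(source));
    }

    Map<String, Double> metrics() {
        return metrics;
    }
}
```

In practice this means all metrics should be added to the map before the ModelCard is constructed; a metric added afterwards requires constructing a new ModelCard.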

In addition, we can construct a UsageDetails object using the UsageDetailsBuilder to describe the appropriate usage of our trained model. The UsageDetails object can then be passed as an additional argument to the ModelCard constructor. Note that UsageDetailsBuilder defaults every field to an empty string or an empty list, and these defaults carry over to the UsageDetails object after building.

In [6]:
var builder = new UsageDetailsBuilder();
builder = builder.intendedUse("Anomaly detection")
                 .intendedUsers("ML learners")
                 .primaryContact("Alice");

ModelCard card3 = new ModelCard(model, evaluation, testingMetrics, builder.build());
System.out.println(card3);
{
  "ModelDetails" : {
    "schema-version" : "1.0",
    "model-type" : "LibSVMAnomalyModel",
    "model-package" : "org.tribuo.anomaly.libsvm.LibSVMAnomalyModel",
    "tribuo-version" : "4.3.0",
    "java-version" : "17.0.4.1",
    "configured-parameters" : {
      "cost" : "1.0",
      "coef0" : "0.0",
      "seed" : "12345",
      "cache_size" : "500.0",
      "probability" : "false",
      "nu" : "0.1",
      "train-invocation-count" : "0",
      "is-sequence" : "false",
      "degree" : "3",
      "eps" : "0.001",
      "host-short-name" : "Trainer",
      "class-name" : "org.tribuo.anomaly.libsvm.LibSVMAnomalyTrainer",
      "kernelType" : "RBF",
      "p" : "0.1",
      "shrinking" : "true",
      "svmType" : {
        "type" : "ONE_CLASS",
        "host-short-name" : "SVMType",
        "class-name" : "org.tribuo.anomaly.libsvm.SVMAnomalyType"
      },
      "tribuo-version" : "4.3.0",
      "gamma" : "1.0"
    }
  },
  "TrainingDetails" : {
    "schema-version" : "1.0",
    "training-time" : "2022-10-07T12:03:06.539476091-04:00",
    "training-set-size" : 2000,
    "num-features" : 5,
    "features-list" : [ "A", "B", "C", "D", "E" ],
    "num-outputs" : 2,
    "outputs-distribution" : {
      "ANOMALOUS" : 0,
      "EXPECTED" : 2000
    }
  },
  "TestingDetails" : {
    "schema-version" : "1.0",
    "testing-set-size" : 2000,
    "metrics" : {
      "overall-recall" : 1.0,
      "overall-precision" : 0.6357927786499215
    }
  },
  "UsageDetails" : {
    "schema-version" : "1.0",
    "intended-use" : "Anomaly detection",
    "intended-users" : "ML learners",
    "out-of-scope-uses" : [ ],
    "pre-processing-steps" : [ ],
    "considerations-list" : [ ],
    "relevant-factors-list" : [ ],
    "resources-list" : [ ],
    "primary-contact" : "Alice",
    "model-citation" : "",
    "model-license" : ""
  }
}
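
The output above shows unset fields appearing as "" or [ ]. As a rough illustration of that default-value behaviour, here is a small standard-library sketch of a builder whose fields start out empty. UsageDefaultsDemo, Usage, UsageBuilder, and addOutOfScopeUse are hypothetical names covering only a subset of the fields; they are assumptions for this sketch and not Tribuo's UsageDetailsBuilder API.

```java
import java.util.ArrayList;
import java.util.List;

class UsageDefaultsDemo {
    // Hypothetical immutable result type with a subset of the fields shown in the tutorial output.
    record Usage(String intendedUse, String primaryContact, List<String> outOfScopeUses) {}

    // Hypothetical builder (not Tribuo's UsageDetailsBuilder).
    static final class UsageBuilder {
        private String intendedUse = "";      // string fields default to the empty string
        private String primaryContact = "";
        private final List<String> outOfScopeUses = new ArrayList<>(); // list fields default to empty

        UsageBuilder intendedUse(String value) { this.intendedUse = value; return this; }
        UsageBuilder primaryContact(String value) { this.primaryContact = value; return this; }
        UsageBuilder addOutOfScopeUse(String value) { outOfScopeUses.add(value); return this; }

        Usage build() {
            // Copy the list so the built object is independent of the builder.
            return new Usage(intendedUse, primaryContact, List.copyOf(outOfScopeUses));
        }
    }

    public static void main(String[] args) {
        Usage usage = new UsageBuilder().intendedUse("Anomaly detection").build();
        // Fields that were never set keep their empty defaults.
        System.out.println(usage.primaryContact().isEmpty());
        System.out.println(usage.outOfScopeUses().isEmpty());
    }
}
```

One consequence of this design is that a reader of a serialized card cannot distinguish "deliberately left empty" from "never filled in", so an empty field is best read as "not provided".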

Alternatively, a UsageDetails can be appended to a ModelCard after the ModelCard has been constructed. This requires the ModelCardCLI, which launches a shell in which the user sets the fields of UsageDetails interactively. To append the UsageDetails to a ModelCard, the ModelCardCLI needs a file containing a serialized version of that ModelCard. We can easily serialize a ModelCard and save it to a file, and such a file can also be deserialized to instantiate a ModelCard object.

In [7]:
File output = File.createTempFile("output", ".json");
card3.saveToFile(output.toPath());

ModelCard card4 = ModelCard.deserializeFromJson(output.toPath());
System.out.println(card4);

output.delete();
{
  "ModelDetails" : {
    "schema-version" : "1.0",
    "model-type" : "LibSVMAnomalyModel",
    "model-package" : "org.tribuo.anomaly.libsvm.LibSVMAnomalyModel",
    "tribuo-version" : "4.3.0",
    "java-version" : "17.0.4.1",
    "configured-parameters" : {
      "cost" : "1.0",
      "coef0" : "0.0",
      "seed" : "12345",
      "cache_size" : "500.0",
      "probability" : "false",
      "nu" : "0.1",
      "train-invocation-count" : "0",
      "is-sequence" : "false",
      "degree" : "3",
      "eps" : "0.001",
      "host-short-name" : "Trainer",
      "class-name" : "org.tribuo.anomaly.libsvm.LibSVMAnomalyTrainer",
      "kernelType" : "RBF",
      "p" : "0.1",
      "shrinking" : "true",
      "svmType" : {
        "type" : "ONE_CLASS",
        "host-short-name" : "SVMType",
        "class-name" : "org.tribuo.anomaly.libsvm.SVMAnomalyType"
      },
      "tribuo-version" : "4.3.0",
      "gamma" : "1.0"
    }
  },
  "TrainingDetails" : {
    "schema-version" : "1.0",
    "training-time" : "2022-10-07T12:03:06.539476091-04:00",
    "training-set-size" : 2000,
    "num-features" : 5,
    "features-list" : [ "A", "B", "C", "D", "E" ],
    "num-outputs" : 2,
    "outputs-distribution" : {
      "ANOMALOUS" : 0,
      "EXPECTED" : 2000
    }
  },
  "TestingDetails" : {
    "schema-version" : "1.0",
    "testing-set-size" : 2000,
    "metrics" : {
      "overall-precision" : 0.6357927786499215,
      "overall-recall" : 1.0
    }
  },
  "UsageDetails" : {
    "schema-version" : "1.0",
    "intended-use" : "Anomaly detection",
    "intended-users" : "ML learners",
    "out-of-scope-uses" : [ ],
    "pre-processing-steps" : [ ],
    "considerations-list" : [ ],
    "relevant-factors-list" : [ ],
    "resources-list" : [ ],
    "primary-contact" : "Alice",
    "model-citation" : "",
    "model-license" : ""
  }
}
Out[7]:
true
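
The save, load, and delete steps in the cell above follow a common temporary-file pattern. The sketch below shows that pattern with the standard library only, using a plain JSON string in place of a serialized ModelCard; TempFileRoundTrip and roundTrip are hypothetical names for this illustration. It assumes the goal is to guarantee cleanup even if reading or writing fails, which is why the delete sits in a finally block.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TempFileRoundTrip {
    // Writes `json` to a temp file, reads it back, and always removes the file.
    static String roundTrip(String json) {
        try {
            // The suffix includes the dot so the file gets a ".json" extension.
            Path tmp = Files.createTempFile("output", ".json");
            try {
                Files.writeString(tmp, json, StandardCharsets.UTF_8);
                return Files.readString(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String json = "{ \"UsageDetails\" : null }";
        // Prints whether the content survived the write/read cycle unchanged.
        System.out.println(roundTrip(json).equals(json));
    }
}
```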

Note that ModelCardCLI will only append a UsageDetails to a ModelCard that has a null UsageDetails and will throw an error in all other cases. This means that a serialized ModelCard file that had its UsageDetails set programmatically with the UsageDetailsBuilder cannot be provided to the ModelCardCLI.

Conclusion

We showed how to create partially automated documentation for a Tribuo ML model using the Model Card system. The system relies on Tribuo's built-in provenance to fill in the details of the model, its training, and its testing. It then lets the developer specify details of the model's usage either programmatically or interactively with a CLI. Over time, we plan to allow developers to specify additional details about their models, with a special focus on supporting more quantitative and statistical details.