Model Card Tutorial
Even as ML systems see increasing use, proper documentation of these systems has yet to gain traction. This has often been linked to the absence of standardized documentation procedures, which has recently led to the proposal of new documentation frameworks for ML models. One example is the Model Card framework, proposed by M. Mitchell et al. in Model Cards for Model Reporting (2019). This framework aims to make model reporting more transparent by having the model creator document a number of suggested details (e.g., training algorithms and intended users).
To support this framework and contribute to the efforts towards transparent model reporting, we added a Model Card system to Tribuo. To reduce the developer's workload and partially automate the process, the Model Card system uses Tribuo's built-in provenance to fill in the details relating to the model, its training, and its testing, while also allowing the developer to specify the model's usage details explicitly, either programmatically or through a CLI.
Setup
We are going to load the LibSVM anomaly detection and Model Card jars and import a few packages. Note that the Model Card jar is built with Java 17, so this tutorial requires Java 17 or later.
%jars ./tribuo-anomaly-libsvm-4.3.0-jar-with-dependencies.jar
%jars ./tribuo-modelcard-4.3.0-jar-with-dependencies.jar
import com.oracle.labs.mlrg.olcut.provenance.ProvenanceUtil;
import org.tribuo.MutableDataset;
import org.tribuo.interop.modelcard.ModelCard;
import org.tribuo.interop.modelcard.UsageDetailsBuilder;
import org.tribuo.anomaly.evaluation.AnomalyEvaluator;
import org.tribuo.anomaly.example.GaussianAnomalyDataSource;
import org.tribuo.anomaly.libsvm.LibSVMAnomalyTrainer;
import org.tribuo.anomaly.libsvm.SVMAnomalyType;
import org.tribuo.common.libsvm.KernelType;
import org.tribuo.common.libsvm.SVMParameters;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
Creating a Model Card for a Tribuo Model
The Model Card system supports only Tribuo models; it is not currently compatible with external models. The information stored within a ModelCard is separated into four components: ModelDetails, TrainingDetails, TestingDetails, and UsageDetails. ModelDetails and TrainingDetails are constructed solely from the built-in provenance of a Model instance, while TestingDetails is constructed from the built-in provenance of an Evaluation instance and can be further augmented by a Map of testing metrics selected by the user. In contrast, UsageDetails relies entirely on fields set by the user and can be constructed either programmatically using the UsageDetailsBuilder or interactively using the ModelCardCLI.
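To make the split between provenance-derived and user-supplied information concrete, here is a minimal sketch that mirrors the four components as plain Java records. It is hypothetical: the record names echo the components above, but their fields, the Card type, and the example values are illustrative assumptions, not Tribuo's actual classes or API. The usage component is left null to match a card that has no user-supplied details yet.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified mirror of the four ModelCard components.
// None of these types are Tribuo classes; fields and values are illustrative only.
public class ModelCardSketch {

    // In the real system, derived from the Model's provenance.
    record ModelDetails(String modelType) {}

    // In the real system, also derived from the Model's provenance.
    record TrainingDetails(String trainerType, int numTrainingExamples) {}

    // In the real system, derived from the Evaluation's provenance,
    // optionally augmented with user-selected metrics.
    record TestingDetails(int numTestExamples, Map<String, Double> metrics) {}

    // Supplied entirely by the user; null until the user provides it.
    record UsageDetails(String intendedUse, List<String> intendedUsers) {}

    record Card(ModelDetails model, TrainingDetails training,
                TestingDetails testing, UsageDetails usage) {
        boolean hasUsageDetails() {
            return usage != null;
        }
    }

    // Builds a card with no extra metrics and no usage details.
    static Card example() {
        return new Card(
                new ModelDetails("one-class SVM"),
                new TrainingDetails("LibSVM trainer", 2000),
                new TestingDetails(2000, Map.of()),
                null); // no user-supplied usage details yet
    }

    public static void main(String[] args) {
        Card card = example();
        System.out.println(card.hasUsageDetails()); // false
    }
}
```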
We will first re-create the anomaly detection model from the Anomaly Detection tutorial and then construct several ModelCard objects.
// Training data: 2000 samples with no anomalies (fraction 0.0), seed 1.
var trainData = new MutableDataset<>(new GaussianAnomalyDataSource(2000, 0.0f, 1L));
// Evaluation data: 2000 samples, 20% of them anomalous, seed 2.
var evalData = new MutableDataset<>(new GaussianAnomalyDataSource(2000, 0.2f, 2L));
// One-class SVM with an RBF kernel.
var params = new SVMParameters<>(new SVMAnomalyType(SVMAnomalyType.SVMMode.ONE_CLASS), KernelType.RBF);
params.setGamma(1.0);
params.setNu(0.1);
var trainer = new LibSVMAnomalyTrainer(params);
var model = trainer.train(trainData);
// Evaluate the trained model on the held-out data.
var evaluator = new AnomalyEvaluator();
var evaluation = evaluator.evaluate(model, evalData);
System.out.println();
System.out.println(ProvenanceUtil.formattedProvenanceString(model.getProvenance()));
System.out.println();
System.out.println(ProvenanceUtil.formattedProvenanceString(evaluation.getProvenance()));
At the very least, constructing a ModelCard requires a Model and its Evaluation. Providing only these two parameters results in a ModelCard without any testing metrics and with a null UsageDetails.
ModelCard card1 = new ModelCard(model, evaluation);
System.out.println(card1.toString());
To include testing metrics in the TestingDetails section of a ModelCard, provide a map of the metrics (with each key being the metric description and each value being the metric value) as an additional argument to the constructor. Note that the ModelCard constructor copies all entries from the map it is given rather than storing a reference to that map. This means that any changes made to the map after the ModelCard is constructed will not appear in the ModelCard's metrics.
Map<String, Double> testingMetrics = new HashMap<>();
testingMetrics.put("overall-precision", evaluation.getPrecision());
testingMetrics.put("overall-recall", evaluation.getRecall());
ModelCard card2 = new ModelCard(model, evaluation, testingMetrics);
System.out.println(card2);
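The copy-on-construction behaviour described above is the standard defensive-copy idiom. The minimal sketch below illustrates it with a hypothetical MetricsHolder class (not a Tribuo class, and not necessarily how ModelCard implements it internally); the metric values are placeholders, not results from the model above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder illustrating copy-on-construction. Not a Tribuo class.
public class MetricsCopyDemo {

    static final class MetricsHolder {
        private final Map<String, Double> metrics;

        MetricsHolder(Map<String, Double> metrics) {
            // Copy the entries instead of keeping a reference to the caller's map.
            this.metrics = new HashMap<>(metrics);
        }

        Map<String, Double> metrics() {
            // Read-only view, so callers cannot mutate the internal copy either.
            return Collections.unmodifiableMap(metrics);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> testingMetrics = new HashMap<>();
        testingMetrics.put("overall-precision", 0.9); // placeholder value

        MetricsHolder holder = new MetricsHolder(testingMetrics);

        // Changing the original map after construction...
        testingMetrics.put("overall-recall", 0.8); // placeholder value

        // ...does not affect the holder's copy.
        System.out.println(holder.metrics().size());                        // 1
        System.out.println(holder.metrics().containsKey("overall-recall")); // false
    }
}
```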
In addition, we can construct a UsageDetails object using the UsageDetailsBuilder to specify details on the appropriate usage of our trained model. The UsageDetails object can then be passed as an additional argument to the ModelCard constructor. Note that any field not set on the UsageDetailsBuilder defaults to an empty string or an empty list, and these defaults carry over to the UsageDetails object when it is built.
var builder = new UsageDetailsBuilder();
builder = builder.intendedUse("Anomaly detection")
.intendedUsers("ML learners")
.primaryContact("Alice");
ModelCard card3 = new ModelCard(model, evaluation, testingMetrics, builder.build());
System.out.println(card3);
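The defaulting rule for unset builder fields can be illustrated with a small fluent builder. This is a hypothetical sketch, not Tribuo's UsageDetailsBuilder: the method names echo the calls above, but treating intendedUsers as "append one user" and the Usage record itself are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder: unset strings stay "" and unset lists stay empty,
// and the built object inherits those defaults. Not Tribuo's API.
public class UsageBuilderSketch {

    record Usage(String intendedUse, List<String> intendedUsers, String primaryContact) {}

    static final class Builder {
        private String intendedUse = "";
        private final List<String> intendedUsers = new ArrayList<>();
        private String primaryContact = "";

        Builder intendedUse(String use) {
            this.intendedUse = use;
            return this;
        }

        // Assumption: each call appends one intended user.
        Builder intendedUsers(String user) {
            this.intendedUsers.add(user);
            return this;
        }

        Builder primaryContact(String contact) {
            this.primaryContact = contact;
            return this;
        }

        Usage build() {
            // Copy the list so later builder calls cannot change a built object.
            return new Usage(intendedUse, List.copyOf(intendedUsers), primaryContact);
        }
    }

    public static void main(String[] args) {
        Usage usage = new Builder().intendedUse("Anomaly detection").build();
        System.out.println(usage.intendedUse());              // Anomaly detection
        System.out.println(usage.primaryContact().isEmpty()); // true
        System.out.println(usage.intendedUsers().isEmpty());  // true
    }
}
```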
Alternatively, the UsageDetails can be appended to a ModelCard after the ModelCard has been constructed. This requires the ModelCardCLI, which launches a new shell that lets the user set the fields of UsageDetails interactively. To append the UsageDetails to a particular ModelCard, the ModelCardCLI needs a file containing a serialized version of that ModelCard. We can easily serialize a ModelCard and save it to a file, and that file can later be deserialized to instantiate a ModelCard object.
File output = File.createTempFile("output", ".json");
card3.saveToFile(output.toPath());
ModelCard card4 = ModelCard.deserializeFromJson(output.toPath());
System.out.println(card4);
output.delete();
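The save, reload, and delete steps above follow a common temporary-file pattern. Here is a minimal stdlib-only sketch of it using java.nio.file, with a plain string standing in for a serialized ModelCard (the JSON content is a placeholder, not Tribuo's actual format). Note that createTempFile appends the suffix verbatim, which is why the suffix needs its leading dot.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Stdlib-only sketch of save -> reload -> delete; the string stands in for a
// serialized ModelCard and is not Tribuo's real serialization format.
public class TempFileRoundTrip {

    static String roundTrip(String json) throws IOException {
        // The suffix is appended verbatim, so include the dot.
        Path output = Files.createTempFile("output", ".json");
        try {
            Files.writeString(output, json); // stand-in for saveToFile
            return Files.readString(output); // stand-in for deserializeFromJson
        } finally {
            Files.deleteIfExists(output);    // clean up even if reading fails
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"intendedUse\":\"Anomaly detection\"}";
        System.out.println(roundTrip(json).equals(json)); // true
    }
}
```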
Note that the ModelCardCLI will only append a UsageDetails to a ModelCard whose UsageDetails is null, and will throw an error in all other cases. This means that a serialized ModelCard file whose UsageDetails was set programmatically with the UsageDetailsBuilder cannot be provided to the ModelCardCLI.
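The append-only-if-absent rule can be expressed as a simple guard. The sketch below is hypothetical: the Card class and appendUsageDetails method are illustrative names, not ModelCardCLI's API, and using IllegalStateException is an assumption, since the exact error the CLI raises is not specified here.

```java
// Hypothetical guard: usage details may be attached only when none are present.
// Names and the exception type are illustrative, not ModelCardCLI's API.
public class AppendUsageGuard {

    static final class Card {
        private String usageDetails; // null until attached

        Card(String usageDetails) {
            this.usageDetails = usageDetails;
        }

        void appendUsageDetails(String details) {
            if (usageDetails != null) {
                throw new IllegalStateException("Card already has usage details");
            }
            usageDetails = details;
        }

        String usageDetails() {
            return usageDetails;
        }
    }

    public static void main(String[] args) {
        Card fresh = new Card(null);
        fresh.appendUsageDetails("intended use: anomaly detection"); // succeeds

        Card prebuilt = new Card("set via builder");
        try {
            prebuilt.appendUsageDetails("from CLI");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```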
Conclusion
We showed how to create partially automated documentation for a Tribuo ML model using the Model Card system. The system relies on Tribuo's built-in provenance to fill in the details relating to the model, its training, and its testing. It then allows the developer to specify details on the model's usage either programmatically or interactively through a CLI. Over time, we plan to allow developers to specify additional details about their models, with a special focus on supporting more quantitative and statistical details.