Reproducibility Tutorial

Reproducing ML models and evaluations is a frequent problem across ML systems. It usually breaks down into two problems: obtaining a description of the computation that was executed, and having a method of replaying that computation. In Tribuo we built our provenance system to make our models self-describing, by which we mean they capture a complete description of the computation that produced them; this solves the first problem. In v4.2 we added an automated reproducibility system which consumes the provenance data and retrains the model, solving the second. Alongside the reproducibility system we added a mechanism for diffing provenance objects, allowing easy comparison between the reproduced and original models. The diff matters because the models are only guaranteed to be identical if the data is the same, and any differences in the data will show up in the data provenance object.

Setup

Before running this tutorial, please run the irises classification and ONNX export tutorials to build the two models that we're going to reproduce.

We're going to load in the classification, ONNX, JSON, and reproducibility jars. Note the reproducibility jar is written in Java 16, so this tutorial requires Java 16 or later. Then we'll import the necessary classes.

In [1]:
%jars ./tribuo-classification-experiments-4.3.0-jar-with-dependencies.jar
%jars ./tribuo-onnx-4.3.0-jar-with-dependencies.jar
%jars ./tribuo-json-4.3.0-jar-with-dependencies.jar
%jars ./tribuo-reproducibility-4.3.0-jar-with-dependencies.jar
In [2]:
import org.tribuo.*;
import org.tribuo.classification.*;
import org.tribuo.classification.evaluation.*;
import org.tribuo.classification.sgd.fm.*;
import org.tribuo.classification.sgd.linear.*;
import org.tribuo.datasource.*;
import org.tribuo.interop.onnx.*;
import org.tribuo.reproducibility.*;
import com.oracle.labs.mlrg.olcut.provenance.*;
import com.oracle.labs.mlrg.olcut.util.*;
import ai.onnxruntime.*;

import java.io.*;
import java.nio.file.*;
import java.util.*;

Reproducing a Tribuo Model

The reproducibility system works on Tribuo Model or ModelProvenance objects. When given a ModelProvenance, the system loads in the original training data, processes and transforms it according to the recorded columnar processing and transformations, then rebuilds the original trainer including its RNG state, before passing the data into the train method and returning the reproduced model. When given a Model object, it performs the same steps as for a ModelProvenance and then compares the feature and output domains of the original and reproduced models to provide more information about any differences between them. Over time we plan to expand the validation applied to the reproduced model to show if the features have different ranges or histograms.

We're going to load in the Irises logistic regression model trained in the first tutorial.

In [3]:
File irisModelFile = new File("iris-lr-model.ser");
String filterPattern = Files.readAllLines(Paths.get("../docs/jep-290-filter.txt")).get(0);
ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(filterPattern);
LinearSGDModel loadedModel;
try (ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(new FileInputStream(irisModelFile)))) {
    ois.setObjectInputFilter(filter);
    loadedModel = (LinearSGDModel) ois.readObject();
}

System.out.println(loadedModel.toString());
linear-sgd-model - Model(class-name=org.tribuo.classification.sgd.linear.LinearSGDModel,dataset=Dataset(class-name=org.tribuo.MutableDataset,datasource=SplitDataSourceProvenance(className=org.tribuo.evaluation.TrainTestSplitter,innerSourceProvenance=DataSource(class-name=org.tribuo.data.csv.CSVDataSource,headers=[sepalLength, sepalWidth, petalLength, petalWidth, species],rowProcessor=RowProcessor(class-name=org.tribuo.data.columnar.RowProcessor,metadataExtractors=[],fieldProcessorList=[FieldProcessor(class-name=org.tribuo.data.columnar.processors.field.DoubleFieldProcessor,fieldName=petalLength,onlyFieldName=true,throwOnInvalid=true,host-short-name=FieldProcessor), FieldProcessor(class-name=org.tribuo.data.columnar.processors.field.DoubleFieldProcessor,fieldName=petalWidth,onlyFieldName=true,throwOnInvalid=true,host-short-name=FieldProcessor), FieldProcessor(class-name=org.tribuo.data.columnar.processors.field.DoubleFieldProcessor,fieldName=sepalWidth,onlyFieldName=true,throwOnInvalid=true,host-short-name=FieldProcessor), 
FieldProcessor(class-name=org.tribuo.data.columnar.processors.field.DoubleFieldProcessor,fieldName=sepalLength,onlyFieldName=true,throwOnInvalid=true,host-short-name=FieldProcessor)],featureProcessors=[],responseProcessor=ResponseProcessor(class-name=org.tribuo.data.columnar.processors.response.FieldResponseProcessor,uppercase=false,fieldNames=[species],defaultValues=[],displayField=false,outputFactory=OutputFactory(class-name=org.tribuo.classification.LabelFactory),host-short-name=ResponseProcessor),weightExtractor=null,replaceNewlinesWithSpaces=true,regexMappingProcessors={},host-short-name=RowProcessor),quote=",outputRequired=true,outputFactory=OutputFactory(class-name=org.tribuo.classification.LabelFactory),separator=,,dataPath=/local/ExternalRepositories/tribuo/tutorials/bezdekIris.data,resource-hash=SHA-256[0FED2A99DB77EC533A62DC66894D3EC6DF3B58B6A8F3CF4A6B47E4086B7F97DC],file-modified-time=1999-12-14T15:12:39-05:00,datasource-creation-time=2022-10-07T11:20:06.279351-04:00,host-short-name=DataSource),trainProportion=0.7,seed=1,size=150,isTrain=true),transformations=[],is-sequence=false,is-dense=true,num-examples=105,num-features=4,num-outputs=3,tribuo-version=4.3.0),trainer=Trainer(class-name=org.tribuo.classification.sgd.linear.LogisticRegressionTrainer,seed=12345,minibatchSize=1,shuffle=true,epochs=5,optimiser=StochasticGradientOptimiser(class-name=org.tribuo.math.optimisers.AdaGrad,epsilon=0.1,initialLearningRate=1.0,initialValue=0.0,host-short-name=StochasticGradientOptimiser),loggingInterval=1000,objective=LabelObjective(class-name=org.tribuo.classification.sgd.objectives.LogMulticlass,host-short-name=LabelObjective),tribuo-version=4.3.0,train-invocation-count=0,is-sequence=false,host-short-name=Trainer),trained-at=2022-10-07T11:20:06.643297-04:00,instance-values={},tribuo-version=4.3.0,java-version=12,os-name=Linux,os-arch=amd64)
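The loading cell above applies a JEP 290 deserialization filter before reading the model. As a rough illustration of what such a filter does, the sketch below uses only the JDK. The class FilterDemo and the pattern string are hypothetical and chosen for this example; they are not the contents of jep-290-filter.txt. The pattern allows java.util.ArrayList and anything in java.lang, and rejects every other class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical demo class, not part of Tribuo.
class FilterDemo {
    // Illustrative allow-list: ArrayList and java.lang.* pass, everything else is rejected.
    static final String PATTERN = "java.util.ArrayList;java.lang.*;!*";

    /** Serializes the object to bytes, reads it back through the filter, and reports if it was accepted. */
    static boolean isAccepted(Object obj, String pattern) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(obj);
            }
            ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(pattern);
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                ois.setObjectInputFilter(filter);
                ois.readObject();
                return true;
            }
        } catch (InvalidClassException e) {
            // Thrown when the filter rejects a class found in the stream.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> allowed = new ArrayList<>(List.of("iris"));
        System.out.println("ArrayList accepted = " + isAccepted(allowed, PATTERN));
        System.out.println("Date accepted = " + isAccepted(new Date(0L), PATTERN));
    }
}
```

Patterns are checked left to right and the first match wins, so the trailing !* only applies to classes not matched by an earlier entry.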

The reproducibility system lives in the ReproUtil class. This class is constructed with either a Model, or a ModelProvenance together with a Class<T extends Output<T>> for the output class.

In [4]:
var repro = new ReproUtil<>(loadedModel);

Now we can separately rebuild the dataset and the trainer, though note if you mutate the objects returned by these methods then you won't get the exact same model back from the reproduction. We're still working on the API for the reproducibility system and expect to make this API more robust over time.

In [5]:
var dataset = repro.recoverDataset();

System.out.println(ProvenanceUtil.formattedProvenanceString(dataset.getProvenance()));
MutableDataset(
	class-name = org.tribuo.MutableDataset
	datasource = TrainTestSplitter(
			class-name = org.tribuo.evaluation.TrainTestSplitter
			source = CSVDataSource(
					class-name = org.tribuo.data.csv.CSVDataSource
					headers = List[
						sepalLength
						sepalWidth
						petalLength
						petalWidth
						species
					]
					rowProcessor = RowProcessor(
							class-name = org.tribuo.data.columnar.RowProcessor
							metadataExtractors = List[]
							fieldProcessorList = List[
								DoubleFieldProcessor(
											class-name = org.tribuo.data.columnar.processors.field.DoubleFieldProcessor
											fieldName = petalLength
											onlyFieldName = true
											throwOnInvalid = true
											host-short-name = FieldProcessor
										)
								DoubleFieldProcessor(
											class-name = org.tribuo.data.columnar.processors.field.DoubleFieldProcessor
											fieldName = petalWidth
											onlyFieldName = true
											throwOnInvalid = true
											host-short-name = FieldProcessor
										)
								DoubleFieldProcessor(
											class-name = org.tribuo.data.columnar.processors.field.DoubleFieldProcessor
											fieldName = sepalWidth
											onlyFieldName = true
											throwOnInvalid = true
											host-short-name = FieldProcessor
										)
								DoubleFieldProcessor(
											class-name = org.tribuo.data.columnar.processors.field.DoubleFieldProcessor
											fieldName = sepalLength
											onlyFieldName = true
											throwOnInvalid = true
											host-short-name = FieldProcessor
										)
							]
							featureProcessors = List[]
							responseProcessor = FieldResponseProcessor(
									class-name = org.tribuo.data.columnar.processors.response.FieldResponseProcessor
									uppercase = false
									fieldNames = List[
										species
									]
									defaultValues = List[
										
									]
									displayField = false
									outputFactory = LabelFactory(
											class-name = org.tribuo.classification.LabelFactory
										)
									host-short-name = ResponseProcessor
								)
							weightExtractor = FieldExtractor(
									class-name = org.tribuo.data.columnar.FieldExtractor
								)
							replaceNewlinesWithSpaces = true
							regexMappingProcessors = Map{}
							host-short-name = RowProcessor
						)
					quote = "
					outputRequired = true
					outputFactory = LabelFactory(
							class-name = org.tribuo.classification.LabelFactory
						)
					separator = ,
					dataPath = /local/ExternalRepositories/tribuo/tutorials/bezdekIris.data
					resource-hash = 0FED2A99DB77EC533A62DC66894D3EC6DF3B58B6A8F3CF4A6B47E4086B7F97DC
					file-modified-time = 1999-12-14T15:12:39-05:00
					datasource-creation-time = 2022-10-07T12:03:48.921236415-04:00
					host-short-name = DataSource
				)
			train-proportion = 0.7
			seed = 1
			size = 150
			is-train = true
		)
	transformations = List[]
	is-sequence = false
	is-dense = true
	num-examples = 105
	num-features = 4
	num-outputs = 3
	tribuo-version = 4.3.0
)

Our irises dataset was loaded in using the CSVLoader and split with a 70/30 train test split, and we can see that the reproduced training dataset has been split just as we expect.
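The split can be reproduced because the provenance records the seed, the train proportion, and the dataset size. The sketch below illustrates that idea with the JDK only. SeededSplit is a hypothetical helper and is not how Tribuo's TrainTestSplitter is implemented, so the indices it selects will not match Tribuo's; it only shows that a fixed seed yields the same partition on every run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration; not Tribuo's TrainTestSplitter.
class SeededSplit {
    /** Returns the example indices assigned to the training portion. */
    static List<Integer> trainIndices(int size, double trainProportion, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            indices.add(i);
        }
        // The same seed always produces the same permutation.
        Collections.shuffle(indices, new Random(seed));
        int numTrain = (int) Math.round(size * trainProportion);
        return new ArrayList<>(indices.subList(0, numTrain));
    }

    public static void main(String[] args) {
        // Values taken from the provenance above: size = 150, train-proportion = 0.7, seed = 1.
        List<Integer> first = trainIndices(150, 0.7, 1L);
        List<Integer> second = trainIndices(150, 0.7, 1L);
        System.out.println("Training examples = " + first.size());
        System.out.println("Same split = " + first.equals(second));
    }
}
```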

In [6]:
var trainer = repro.recoverTrainer();
System.out.println(ProvenanceUtil.formattedProvenanceString(trainer.getProvenance()));
LogisticRegressionTrainer(
	class-name = org.tribuo.classification.sgd.linear.LogisticRegressionTrainer
	seed = 12345
	minibatchSize = 1
	shuffle = true
	epochs = 5
	optimiser = AdaGrad(
			class-name = org.tribuo.math.optimisers.AdaGrad
			epsilon = 0.1
			initialLearningRate = 1.0
			initialValue = 0.0
			host-short-name = StochasticGradientOptimiser
		)
	loggingInterval = 1000
	objective = LogMulticlass(
			class-name = org.tribuo.classification.sgd.objectives.LogMulticlass
			host-short-name = LabelObjective
		)
	tribuo-version = 4.3.0
	train-invocation-count = 0
	is-sequence = false
	host-short-name = Trainer
)

The irises model is a logistic regression trained with seed 12345, and it was the first model trained by that trainer (as train-invocation-count is zero).
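To see why the invocation count is recorded alongside the seed, consider a trainer whose RNG advances each time train is called. The sketch below is a toy built on java.util.SplittableRandom. ToyTrainer and its methods are hypothetical and only approximate the idea; they are not Tribuo's trainer implementation. Rewinding to the seed and replaying the recorded number of calls recovers the RNG state used for a particular model.

```java
import java.util.SplittableRandom;

// Hypothetical stand-in for a trainer; not a Tribuo class.
class ToyTrainer {
    private final long seed;
    private SplittableRandom rng;
    private int invocationCount = 0;

    ToyTrainer(long seed) {
        this.seed = seed;
        this.rng = new SplittableRandom(seed);
    }

    /** "Trains" by drawing one value from a per-call RNG split off the trainer's RNG. */
    long train() {
        SplittableRandom localRng = rng.split();
        invocationCount++;
        return localRng.nextLong();
    }

    /** Rewinds to the seed, then replays splits until the requested count is reached. */
    void setInvocationCount(int count) {
        rng = new SplittableRandom(seed);
        invocationCount = 0;
        for (int i = 0; i < count; i++) {
            rng.split();
            invocationCount++;
        }
    }

    int getInvocationCount() {
        return invocationCount;
    }

    public static void main(String[] args) {
        ToyTrainer original = new ToyTrainer(12345L);
        original.train();                      // invocation count 0
        long secondModel = original.train();   // invocation count 1

        // Reproduce the second model from (seed, invocation count) alone.
        ToyTrainer rebuilt = new ToyTrainer(12345L);
        rebuilt.setInvocationCount(1);
        System.out.println("Reproduced = " + (rebuilt.train() == secondModel));
    }
}
```

This is also why mutating a recovered trainer, for example by training with it before reproduction, can change the model you get back.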

In [7]:
var reproduction = repro.reproduceFromModel();
var reproducedModel = (LinearSGDModel) reproduction.model();

We can compare the reproduced model's provenance to the one in the original model using our diff tool; however, as Tribuo records construction timestamps, they will not be identical.

In [8]:
System.out.println(ReproUtil.diffProvenance(loadedModel.getProvenance(),reproducedModel.getProvenance()));
{
  "dataset" : {
    "datasource" : {
      "source" : {
        "datasource-creation-time" : {
          "original" : "2022-10-07T11:20:06.279351-04:00",
          "reproduced" : "2022-10-07T12:03:48.921236415-04:00"
        }
      }
    }
  },
  "java-version" : {
    "original" : "12",
    "reproduced" : "17.0.4.1"
  },
  "trained-at" : {
    "original" : "2022-10-07T11:20:06.643297-04:00",
    "reproduced" : "2022-10-07T12:03:49.150931420-04:00"
  }
}

We can see that the timestamps are a little different, though the precise difference will depend on when you ran the irises tutorial. You may also see differences in the JVM or other machine provenance if you ran that tutorial on a different machine. If the irises dataset grows a new feature or additional rows in the same file, then the diff will show that the datasets have different numbers of features or samples, and that the file has a different hash.
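The resource-hash field in the provenance above is a SHA-256 digest of the data file, which is how a change in the file's contents becomes visible. The sketch below computes such a digest with the JDK's MessageDigest. DataHash is a hypothetical helper written for this example, and it hashes in-memory bytes so that it runs without any files; the two sample rows are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Tribuo.
class DataHash {
    /** Returns the SHA-256 digest of the bytes as an upper-case hex string. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : digest.digest(data)) {
                sb.append(String.format("%02X", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "5.1,3.5,1.4,0.2,Iris-setosa\n";
        String extended = original + "4.9,3.0,1.4,0.2,Iris-setosa\n";
        String originalHash = sha256Hex(original.getBytes(StandardCharsets.UTF_8));
        String extendedHash = sha256Hex(extended.getBytes(StandardCharsets.UTF_8));
        System.out.println("Hashes equal after adding a row = " + originalHash.equals(extendedHash));
    }
}
```

To hash a file on disk, the byte array could come from Files.readAllBytes(path) instead.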

For some models we can easily compare the model contents, e.g., for the logistic regression we can directly compare the model weights.

In [9]:
var originalWeights = loadedModel.getWeightsCopy();
var reproducedWeights = reproducedModel.getWeightsCopy();

System.out.println("Weights are equal = " + originalWeights.equals(reproducedWeights));
Weights are equal = true

Reproducing an ONNX exported Tribuo Model

Tribuo models can be exported into the ONNX format. When Tribuo models are exported the model provenance is stored as a metadata field in the ONNX file. This doesn't affect anything which serves the ONNX model, but allows Tribuo to load the provenance back in if the model is loaded in as an ONNXExternalModel which is Tribuo's class for loading in ONNX models.

To load a model in as an ONNXExternalModel we need to define the feature and label mappings which should be written out separately when the ONNX model is exported. We're going to cheat slightly and get them from the MNIST training set itself.

In [10]:
var labelFactory = new LabelFactory();
var mnistTrainSource = new IDXDataSource<>(Paths.get("train-images-idx3-ubyte.gz"),Paths.get("train-labels-idx1-ubyte.gz"),labelFactory);
var mnistTestSource = new IDXDataSource<>(Paths.get("t10k-images-idx3-ubyte.gz"),Paths.get("t10k-labels-idx1-ubyte.gz"),labelFactory);
var mnistTrain = new MutableDataset<>(mnistTrainSource);
var mnistTest = new MutableDataset<>(mnistTestSource);

Map<String, Integer> mnistFeatureMap = new HashMap<>();
for (VariableInfo f : mnistTrain.getFeatureIDMap()){
    VariableIDInfo id = (VariableIDInfo) f;
    mnistFeatureMap.put(id.getName(),id.getID());
}
Map<Label, Integer> mnistOutputMap = new HashMap<>();
for (Pair<Integer,Label> l : mnistTrain.getOutputIDInfo()) {
    mnistOutputMap.put(l.getB(), l.getA());
}

Now let's load in the ONNX file:

In [11]:
var ortEnv = OrtEnvironment.getEnvironment();
var sessionOpts = new OrtSession.SessionOptions();
var denseTransformer = new DenseTransformer();
var labelTransformer = new LabelTransformer();
var mnistModelPath = Paths.get(".","fm-mnist.onnx");
ONNXExternalModel<Label> onnx = ONNXExternalModel.createOnnxModel(labelFactory, mnistFeatureMap, mnistOutputMap,
                    denseTransformer, labelTransformer, sessionOpts, mnistModelPath, "input");

This model has two provenance objects, one from the creation of the ONNXExternalModel, and one from the original training run in Tribuo which is persisted inside the ONNX file.

In [12]:
System.out.println(ProvenanceUtil.formattedProvenanceString(onnx.getProvenance()));
ONNXExternalModel(
	class-name = org.tribuo.interop.onnx.ONNXExternalModel
	dataset = Dataset(
			class-name = org.tribuo.Dataset
			datasource = DataSource(
					description = unknown-external-data
					outputFactory = LabelFactory(
							class-name = org.tribuo.classification.LabelFactory
						)
					datasource-creation-time = 2022-10-07T12:03:57.351723125-04:00
				)
			transformations = List[]
			is-sequence = false
			is-dense = false
			num-examples = -1
			num-features = 717
			num-outputs = 10
			tribuo-version = 4.3.0
		)
	trainer = Trainer(
			class-name = org.tribuo.Trainer
			fileModifiedTime = 2022-10-07T11:46:10.476-04:00
			modelHash = 9DD2FABC436FB75BAD6A3E061BE51022A79F140FC491C6CA8B8033253F43CD5F
			location = file:/local/ExternalRepositories/tribuo/tutorials/./fm-mnist.onnx
		)
	trained-at = 2022-10-07T12:03:57.349886186-04:00
	instance-values = Map{
		model-domain=org.tribuo.tutorials.onnxexport.fm
		model-graphname=FMClassificationModel
		model-description=factorization-machine-model - Model(class-name=org.tribuo.classification.sgd.fm.FMClassificationModel,dataset=Dataset(class-name=org.tribuo.MutableDataset,datasource=DataSource(class-name=org.tribuo.datasource.IDXDataSource,outputPath=/local/ExternalRepositories/tribuo/tutorials/train-labels-idx1-ubyte.gz,outputFactory=OutputFactory(class-name=org.tribuo.classification.LabelFactory),featuresPath=/local/ExternalRepositories/tribuo/tutorials/train-images-idx3-ubyte.gz,features-file-modified-time=2000-07-21T14:20:24-04:00,output-resource-hash=SHA-256[3552534A0A558BBED6AED32B30C495CCA23D567EC52CAC8BE1A0730E8010255C],datasource-creation-time=2022-10-07T11:45:53.253680-04:00,output-file-modified-time=2000-07-21T14:20:27-04:00,idx-feature-type=UBYTE,features-resource-hash=SHA-256[440FCABF73CC546FA21475E81EA370265605F56BE210A4024D2CA8F203523609],host-short-name=DataSource),transformations=[],is-sequence=false,is-dense=false,num-examples=60000,num-features=717,num-outputs=10,tribuo-version=4.3.0),trainer=Trainer(class-name=org.tribuo.classification.sgd.fm.FMClassificationTrainer,seed=12345,variance=0.1,minibatchSize=1,factorizedDimSize=6,shuffle=true,epochs=5,optimiser=StochasticGradientOptimiser(class-name=org.tribuo.math.optimisers.AdaGrad,epsilon=0.1,initialLearningRate=0.1,initialValue=0.0,host-short-name=StochasticGradientOptimiser),loggingInterval=30000,objective=LabelObjective(class-name=org.tribuo.classification.sgd.objectives.LogMulticlass,host-short-name=LabelObjective),tribuo-version=4.3.0,train-invocation-count=0,is-sequence=false,host-short-name=Trainer),trained-at=2022-10-07T11:46:09.759423-04:00,instance-values={},tribuo-version=4.3.0,java-version=12,os-name=Linux,os-arch=amd64)
		model-producer=Tribuo
		model-version=0
		input-name=input
	}
	tribuo-version = 4.3.0
	java-version = 17.0.4.1
	os-name = Linux
	os-arch = amd64
)

The ONNXExternalModel provenance has a lot of placeholders in it, as you might expect given the information is not always present in ONNX files.

We can load the Tribuo model provenance using getTribuoProvenance():

In [13]:
var tribuoProvenance = onnx.getTribuoProvenance().get();
System.out.println(ProvenanceUtil.formattedProvenanceString(tribuoProvenance));
FMClassificationModel(
	class-name = org.tribuo.classification.sgd.fm.FMClassificationModel
	dataset = MutableDataset(
			class-name = org.tribuo.MutableDataset
			datasource = IDXDataSource(
					class-name = org.tribuo.datasource.IDXDataSource
					outputFactory = LabelFactory(
							class-name = org.tribuo.classification.LabelFactory
						)
					outputPath = /local/ExternalRepositories/tribuo/tutorials/train-labels-idx1-ubyte.gz
					featuresPath = /local/ExternalRepositories/tribuo/tutorials/train-images-idx3-ubyte.gz
					features-file-modified-time = 2000-07-21T14:20:24-04:00
					output-resource-hash = 3552534A0A558BBED6AED32B30C495CCA23D567EC52CAC8BE1A0730E8010255C
					datasource-creation-time = 2022-10-07T11:45:53.253680-04:00
					output-file-modified-time = 2000-07-21T14:20:27-04:00
					idx-feature-type = UBYTE
					features-resource-hash = 440FCABF73CC546FA21475E81EA370265605F56BE210A4024D2CA8F203523609
					host-short-name = DataSource
				)
			transformations = List[]
			is-sequence = false
			is-dense = false
			num-examples = 60000
			num-features = 717
			num-outputs = 10
			tribuo-version = 4.3.0
		)
	trainer = FMClassificationTrainer(
			class-name = org.tribuo.classification.sgd.fm.FMClassificationTrainer
			seed = 12345
			variance = 0.1
			minibatchSize = 1
			factorizedDimSize = 6
			shuffle = true
			epochs = 5
			optimiser = AdaGrad(
					class-name = org.tribuo.math.optimisers.AdaGrad
					epsilon = 0.1
					initialLearningRate = 0.1
					initialValue = 0.0
					host-short-name = StochasticGradientOptimiser
				)
			loggingInterval = 30000
			objective = LogMulticlass(
					class-name = org.tribuo.classification.sgd.objectives.LogMulticlass
					host-short-name = LabelObjective
				)
			tribuo-version = 4.3.0
			train-invocation-count = 0
			is-sequence = false
			host-short-name = Trainer
		)
	trained-at = 2022-10-07T11:46:09.759423-04:00
	instance-values = Map{}
	tribuo-version = 4.3.0
	java-version = 12
	os-name = Linux
	os-arch = amd64
)

From this provenance we can see that the model is a factorization machine trained on MNIST (as expected). So now we can build a ReproUtil and rebuild the model.

In [14]:
var mnistRepro = new ReproUtil<>(tribuoProvenance,Label.class);

var reproducedMNISTModel = mnistRepro.reproduceFromProvenance();

We can diff the two provenances:

In [15]:
System.out.println(ReproUtil.diffProvenance(tribuoProvenance, reproducedMNISTModel.getProvenance()));
{
  "dataset" : {
    "datasource" : {
      "datasource-creation-time" : {
        "original" : "2022-10-07T11:45:53.253680-04:00",
        "reproduced" : "2022-10-07T12:04:03.138366189-04:00"
      }
    }
  },
  "java-version" : {
    "original" : "12",
    "reproduced" : "17.0.4.1"
  },
  "trained-at" : {
    "original" : "2022-10-07T11:46:09.759423-04:00",
    "reproduced" : "2022-10-07T12:04:15.478400652-04:00"
  }
}

As before, the diff isn't very interesting: we're using the same files, so only the timestamps and the Java version differ. Checking the model weights is tricky with an ONNX model, so instead we check that the predictions are the same to within a small tolerance (Tribuo computes in doubles and ONNX Runtime uses floats, so the answers are slightly different). We'll borrow the checkPredictions function from the ONNX export tutorial.

In [16]:
public boolean checkPredictions(List<Prediction<Label>> nativePredictions, List<Prediction<Label>> onnxPredictions, double delta) {
    for (int i = 0; i < nativePredictions.size(); i++) {
        Prediction<Label> tribuo = nativePredictions.get(i);
        Prediction<Label> external = onnxPredictions.get(i);
        // Check the predicted label
        if (!tribuo.getOutput().getLabel().equals(external.getOutput().getLabel())) {
            System.out.println("At index " + i + " predictions are not equal - "
                    + tribuo.getOutput().getLabel() + " and "
                    + external.getOutput().getLabel());
            return false;
        }
        // Check the maximum score
        if (Math.abs(tribuo.getOutput().getScore() - external.getOutput().getScore()) > delta) {
            System.out.println("At index " + i + " predictions are not equal - "
                    + tribuo.getOutput() + " and "
                    + external.getOutput());
            return false;
        }
        // Check the score distribution
        for (Map.Entry<String, Label> l : tribuo.getOutputScores().entrySet()) {
            Label other = external.getOutputScores().get(l.getKey());
            if (other == null) {
                System.out.println("At index " + i + " failed to find label " + l.getKey() + " in ORT prediction.");
                return false;
            } else {
                if (Math.abs(l.getValue().getScore() - other.getScore()) > delta) {
                    System.out.println("At index " + i + " predictions are not equal - "
                            + tribuo.getOutputScores() + " and "
                            + external.getOutputScores());
                    return false;
                }
            }
        }
    }
    return true;
}

Now we can make predictions from both models and compare the outputs:

In [17]:
var onnxPredictions = onnx.predict(mnistTest);
var reproducedPredictions = reproducedMNISTModel.predict(mnistTest);

System.out.println("Predictions are equal = " + checkPredictions(reproducedPredictions,onnxPredictions,1e-5));
Predictions are equal = true
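The tolerance of 1e-5 passed to checkPredictions is needed because a value computed in double precision rarely survives a round trip through float unchanged. The minimal sketch below shows this with a single number rather than real model scores. ToleranceCheck is a hypothetical helper, and the value 0.1 is chosen only because it is not exactly representable in either format.

```java
// Hypothetical helper for illustration; not part of Tribuo or ONNX Runtime.
class ToleranceCheck {
    /** Returns true if the two values differ by at most delta. */
    static boolean withinDelta(double expected, double actual, double delta) {
        return Math.abs(expected - actual) <= delta;
    }

    public static void main(String[] args) {
        double doubleScore = 0.1;                // stands in for a score computed in doubles
        float floatScore = (float) doubleScore;  // the same score after rounding to float

        // The float is widened back to double for the comparison, exposing the rounding error.
        System.out.println("Exactly equal = " + (doubleScore == floatScore));
        System.out.println("Within 1e-5 = " + withinDelta(doubleScore, floatScore, 1e-5));
    }
}
```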

Working with provenance diffs

We can use the provenance diff methods to compute diffs for unrelated models too. We're going to train a logistic regression on MNIST and compare the model provenance against the ONNX factorization machine we just used.

In [18]:
var lrTrainer = new LogisticRegressionTrainer();
var lrModel = lrTrainer.train(mnistTrain);

System.out.println(ReproUtil.diffProvenance(tribuoProvenance, lrModel.getProvenance()));
{
  "class-name" : {
    "original" : "org.tribuo.classification.sgd.fm.FMClassificationModel",
    "reproduced" : "org.tribuo.classification.sgd.linear.LinearSGDModel"
  },
  "dataset" : {
    "datasource" : {
      "datasource-creation-time" : {
        "original" : "2022-10-07T11:45:53.253680-04:00",
        "reproduced" : "2022-10-07T12:03:56.006018468-04:00"
      }
    }
  },
  "java-version" : {
    "original" : "12",
    "reproduced" : "17.0.4.1"
  },
  "trained-at" : {
    "original" : "2022-10-07T11:46:09.759423-04:00",
    "reproduced" : "2022-10-07T12:04:24.453627627-04:00"
  },
  "trainer" : {
    "class-name" : {
      "original" : "org.tribuo.classification.sgd.fm.FMClassificationTrainer",
      "reproduced" : "org.tribuo.classification.sgd.linear.LogisticRegressionTrainer"
    },
    "loggingInterval" : {
      "original" : "30000",
      "reproduced" : "1000"
    },
    "optimiser" : {
      "initialLearningRate" : {
        "original" : "0.1",
        "reproduced" : "1.0"
      }
    },
    "factorizedDimSize" : {
      "original" : "6"
    },
    "variance" : {
      "original" : "0.1"
    }
  }
}

This diff is longer than the others we've seen, as expected for two different models with different trainers. The dataset section is mostly empty, as both models were trained on an unmodified MNIST training set. The FMClassificationTrainer and LogisticRegressionTrainer show more differences, but as both are SGD-based there are many common fields, which are absent from the diff because their values match: the loss function (both used LogMulticlass), the gradient optimiser (both used AdaGrad), the number of training epochs, and the minibatch size. They used different learning rates (which appear in the diff under optimiser) and different logging intervals. The factorization machine also has two extra parameters not found in the logistic regression, factorizedDimSize and variance, which are reported with only an original value, meaning they are found in the first provenance but not the second.
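The shape of these diffs can be illustrated with a small recursive comparison over nested maps. The sketch below is an assumption-laden toy, not the implementation behind ReproUtil.diffProvenance: MapDiff is a hypothetical class, and provenance is modelled as plain Map<String, Object> values. It omits matching fields, recurses into nested objects, and emits only "original" or only "reproduced" when a field exists on one side.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy diff over nested maps; not Tribuo's diffProvenance implementation.
class MapDiff {
    @SuppressWarnings("unchecked")
    static Map<String, Object> diff(Map<String, Object> original, Map<String, Object> reproduced) {
        Map<String, Object> out = new LinkedHashMap<>();
        // Visit the union of the keys in sorted order.
        TreeSet<String> keys = new TreeSet<>(original.keySet());
        keys.addAll(reproduced.keySet());
        for (String key : keys) {
            Object a = original.get(key);
            Object b = reproduced.get(key);
            if (a instanceof Map && b instanceof Map) {
                // Recurse into nested objects and keep them only if something differs.
                Map<String, Object> nested = diff((Map<String, Object>) a, (Map<String, Object>) b);
                if (!nested.isEmpty()) {
                    out.put(key, nested);
                }
            } else if (a == null || b == null || !a.equals(b)) {
                // A leaf differs, or is present on one side only.
                Map<String, Object> leaf = new LinkedHashMap<>();
                if (a != null) {
                    leaf.put("original", a);
                }
                if (b != null) {
                    leaf.put("reproduced", b);
                }
                out.put(key, leaf);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> fm = Map.of(
                "seed", "12345",
                "variance", "0.1",
                "optimiser", Map.of("initialLearningRate", "0.1"));
        Map<String, Object> lr = Map.of(
                "seed", "12345",
                "optimiser", Map.of("initialLearningRate", "1.0"));
        System.out.println(diff(fm, lr));
    }
}
```

In this toy, seed is omitted because it matches, optimiser is reported with both values, and variance is reported with only an original value, mirroring the structure of the diff above.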

The current diff format is JSON, designed to be easily human readable. We have left the design of a navigable diff object, one that is easily inspectable from code, to future work once we have a better understanding of how people want to use the generated diffs.

Conclusion

We showed how to load in Tribuo models and reproduce them using our automated reproducibility system. The system executes the same computations as the original training, which in most cases results in an identical model. We have observed some differences between gradient descent based models trained on ARM and x86 architectures, due to underlying differences in the JVM, but otherwise the reproductions are exact. Over time we plan to expand this reproducibility system into a full experimental framework, allowing models to be rebuilt using different datasets, data transformations, or training hyperparameters while holding all other parameters constant.