Class ReproUtil<T extends Output<T>>
- Type Parameters:
T - The output type of the model being reproduced.
Note: this class is designed to be used to reproduce a single object. Repeated calls to reproduceFromModel() or reproduceFromProvenance() may produce different outputs due to internal state changes.
Note: this class's API is experimental and may change in Tribuo minor releases as we work to make it more robust and increase coverage.
At the moment the reproducibility system supports ConfigurableDataSources and a single level of splitting using a TrainTestSplitter. It does not support DatasetView, which forms the basis of the cross-validation support in Tribuo, nor other datasets which take a subset of their input (e.g., MinimumCardinalityDataset); we will add this support in future releases.
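The note about repeated calls can be illustrated without Tribuo. The sketch below rests on an assumption: the documentation only says "internal state changes", and here that state is modelled as a seeded random number generator plus a call counter. StatefulTrainerSketch and its methods are hypothetical names for illustration, not Tribuo classes.

```java
import java.util.Random;

/** Hypothetical stand-in for a stateful trainer; not a Tribuo class. */
final class StatefulTrainerSketch {
    private final Random rng;          // assumed internal state: a seeded RNG
    private int invocationCount = 0;   // assumed internal state: a call counter

    StatefulTrainerSketch(long seed) {
        this.rng = new Random(seed);
    }

    /** Stands in for training: draws one value, advancing the RNG and the counter. */
    long train() {
        invocationCount++;
        return rng.nextLong();
    }

    int getInvocationCount() {
        return invocationCount;
    }

    public static void main(String[] args) {
        StatefulTrainerSketch reused = new StatefulTrainerSketch(42L);
        long first = reused.train();
        long second = reused.train(); // drawn from the advanced RNG state

        // A fresh instance with the same seed starts from the same state.
        StatefulTrainerSketch fresh = new StatefulTrainerSketch(42L);
        long firstAgain = fresh.train();

        System.out.println("fresh instance repeats the first result: " + (first == firstAgain));
        System.out.println("second call repeats the first result: " + (first == second));
        System.out.println("calls made on the reused instance: " + reused.getInvocationCount());
    }
}
```

Under this assumption, using a new ReproUtil for each reproduction plays the role of the fresh instance in the sketch.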
-
Nested Class Summary
Modifier and Type | Class | Description
static final record | | Record for any differences between feature sets.
static final record | ReproUtil.ModelReproduction<T extends Output<T>> | Record for a model reproduction.
static final record | ReproUtil.OutputDiff<T extends Output<T>> | Record for any differences between output domains.
-
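As a rough illustration of what a record of "differences between feature sets" might capture, the sketch below computes the feature names that appear on only one side of a comparison. FeatureSetDiffSketch and onlyIn are hypothetical names, and this is not Tribuo's implementation; the fields of the actual record are not described on this page.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for comparing feature names; not Tribuo's implementation. */
final class FeatureSetDiffSketch {
    /** Returns the names present in {@code left} but absent from {@code right}. */
    static Set<String> onlyIn(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left); // copy, so the inputs are not mutated
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) {
        // Made-up feature names, for illustration only.
        Set<String> original = new TreeSet<>(Arrays.asList("age", "height", "weight"));
        Set<String> reproduced = new TreeSet<>(Arrays.asList("age", "weight", "zip"));

        System.out.println("only in original: " + onlyIn(original, reproduced));
        System.out.println("only in reproduced: " + onlyIn(reproduced, original));
    }
}
```

Two empty results would indicate that both models saw the same feature names.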
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
static String | diffProvenance(ModelProvenance originalProvenance, ModelProvenance newProvenance) | Creates a JSON String diff of two ModelProvenance objects.
com.oracle.labs.mlrg.olcut.config.ConfigurationManager | getConfigurationManager() | Returns the ConfigurationManager the ReproUtil is using to manage the reproduced models.
 | recoverDataset() | Returns a Dataset used when a model was trained.
 | recoverTrainer() | Extracts the trainer from this ReproUtil.
ReproUtil.ModelReproduction<T> | reproduceFromModel() | Using a supplied Model object, recreates an identical model object that the provenance describes.
 | reproduceFromProvenance() | Recreates a model object using the ModelProvenance supplied when the ReproUtil object was created.
-
Constructor Details
-
ReproUtil
Creates a ReproUtil instance.
Throws IllegalArgumentException if the model is an external model trained outside of Tribuo.
The output class is validated when the model is reproduced.
- Parameters:
provenance - The ReproUtil will re-train a model based on the information contained in this ModelProvenance.
outputClass - The output class for the model.
-
ReproUtil
Creates a ReproUtil instance.
Throws IllegalArgumentException if the model is an external model trained outside of Tribuo.
- Parameters:
originalModel - The ReproUtil will re-train a model based on the provenance contained in this Model.
-
-
Method Details
-
recoverTrainer
Extracts the trainer from this ReproUtil.
Note: calling Trainer.train(org.tribuo.Dataset<T>) on the returned trainer object may distort any future reproductions produced by this instance of ReproUtil.
- Returns:
A Trainer found in the configuration manager, used to train the originalModel.
-
recoverDataset
Returns a Dataset used when a model was trained.
Throws IllegalStateException if the dataset could not be recovered or one of the classes could not be instantiated.
Note: transforming or otherwise mutating the returned Dataset object may distort any future reproductions produced by this instance of ReproUtil.
At the moment this function supports ConfigurableDataSources and a single level of splitting using a TrainTestSplitter. It does not support DatasetView, which forms the basis of the cross-validation support in Tribuo, nor other datasets which take a subset of their input (e.g., MinimumCardinalityDataset); we will add this support in future releases.
- Returns:
A new Dataset.
-
getConfigurationManager
public com.oracle.labs.mlrg.olcut.config.ConfigurationManager getConfigurationManager()
Returns the ConfigurationManager the ReproUtil is using to manage the reproduced models.
Note: modifying the returned ConfigurationManager will distort the results of any future reproductions performed by this ReproUtil.
- Returns:
The ConfigurationManager the ReproUtil is using.
-
reproduceFromProvenance
Recreates a model object using the ModelProvenance supplied when the ReproUtil object was created.
Recovers the trainer and dataset information before training an identical model.
Throws IllegalStateException if the source or trainer cannot be loaded or is not configurable.
- Returns:
A reproduced model identical to the one described in the provenance.
- Throws:
ClassNotFoundException - If the dataset or trainer could not be instantiated.
-
reproduceFromModel
public ReproUtil.ModelReproduction<T> reproduceFromModel() throws ClassNotFoundException, com.fasterxml.jackson.core.JsonProcessingException
Using a supplied Model object, recreates an identical model object that the provenance describes.
Recovers the trainer and dataset information before training an identical model.
Throws IllegalStateException if the model provenance is malformed or not defined.
- Returns:
A reproduced model identical to the one described in the provenance.
- Throws:
ClassNotFoundException - If the trainer or datasource class cannot be instantiated.
com.fasterxml.jackson.core.JsonProcessingException - If the JSON diff could not be created.
-
diffProvenance
public static String diffProvenance(ModelProvenance originalProvenance, ModelProvenance newProvenance) throws com.fasterxml.jackson.core.JsonProcessingException
Creates a JSON String diff of two ModelProvenance objects. Only the differences will appear in the resulting diff.
Throws IllegalStateException if the model provenances could not be parsed.
- Parameters:
originalProvenance - The first of the two provenance objects to diff.
newProvenance - The second of the two provenance objects to diff.
- Returns:
A JSON String report displaying the differences between the two provenance objects.
- Throws:
com.fasterxml.jackson.core.JsonProcessingException - If the JSON diff could not be created.
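The "only the differences appear" behaviour described for diffProvenance can be sketched on plain maps. This is a simplified stand-in under stated assumptions: provenance is modelled as a flat Map of strings, the "original -> reproduced" value format is invented for the example, and ProvenanceDiffSketch is a hypothetical name. The real method takes ModelProvenance objects and returns JSON, whose layout is not described on this page.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical flat-map diff; not Tribuo's diffProvenance implementation. */
final class ProvenanceDiffSketch {
    /**
     * Returns only the keys whose values differ, mapped to "original -> reproduced".
     * A key missing on one side is reported with the text "null" on that side.
     */
    static Map<String, String> diff(Map<String, String> original, Map<String, String> reproduced) {
        Set<String> keys = new TreeSet<>(original.keySet());
        keys.addAll(reproduced.keySet());

        Map<String, String> differences = new TreeMap<>();
        for (String key : keys) {
            String before = original.get(key);
            String after = reproduced.get(key);
            if (!Objects.equals(before, after)) {
                differences.put(key, before + " -> " + after);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Made-up provenance fields, for illustration only.
        Map<String, String> original = new TreeMap<>();
        original.put("seed", "1");
        original.put("trainer", "sgd");

        Map<String, String> reproduced = new TreeMap<>();
        reproduced.put("seed", "2");
        reproduced.put("trainer", "sgd");

        // Matching entries such as "trainer" are omitted from the result.
        System.out.println(diff(original, reproduced));
    }
}
```

An empty result from such a diff would mean the two inputs agree on every field.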
-