Class ReproUtil<T extends Output<T>>

java.lang.Object
org.tribuo.reproducibility.ReproUtil<T>
Type Parameters:
T - The output type of the model being reproduced.

public final class ReproUtil<T extends Output<T>> extends Object
Reproducibility utility based on Tribuo's provenance objects.

Note: this class is designed to be used to reproduce a single object. Repeated calls to reproduceFromModel() or reproduceFromProvenance() may produce different outputs due to internal state changes.
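
As a rough illustration of why one ReproUtil instance should be used to reproduce a single object, the sketch below uses a hypothetical ToyTrainer whose only internal state is a java.util.Random. It is an analogy under that assumption, not Tribuo's implementation: ToyTrainer, its seed, and its train() method are invented for this example.

```java
import java.util.Random;

final class StatefulReproSketch {
    /** Hypothetical stand-in for a component that draws from an internal RNG. */
    static final class ToyTrainer {
        private final Random rng;

        ToyTrainer(long seed) {
            this.rng = new Random(seed);
        }

        /** "Trains" by drawing one number; every call advances the RNG state. */
        int train() {
            return rng.nextInt(1000);
        }
    }

    public static void main(String[] args) {
        // Reusing one instance: the second call starts from an advanced state.
        ToyTrainer reused = new ToyTrainer(42L);
        int first = reused.train();
        int second = reused.train();

        // A fresh instance with the same seed starts from the original state.
        ToyTrainer fresh = new ToyTrainer(42L);
        int reproduced = fresh.train();

        System.out.println("first=" + first + ", second=" + second);
        System.out.println("fresh instance matches first run: " + (reproduced == first));
    }
}
```

Reusing the same instance draws from an already advanced random stream, so its second result will generally differ from the first, while a fresh instance built from the same seed repeats the first result exactly.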

Note: this class's API is experimental and may change in Tribuo minor releases as we work to make it more robust and increase coverage.

At the moment the reproducibility system supports ConfigurableDataSources and a single level of splitting using a TrainTestSplitter. It does not support DatasetView, which forms the basis of the cross-validation support in Tribuo, nor other datasets which take a subset of their input (e.g., MinimumCardinalityDataset). We will add this support in future releases.

  • Constructor Details

    • ReproUtil

      public ReproUtil(ModelProvenance provenance, Class<T> outputClass)
      Creates a ReproUtil instance.

      Throws IllegalArgumentException if the model is an external model trained outside of Tribuo.

      The output class is validated when the model is reproduced.

      Parameters:
      provenance - The ReproUtil will re-train a model based on the information contained in this ModelProvenance.
      outputClass - The output class for the model.
    • ReproUtil

      public ReproUtil(Model<T> originalModel)
      Creates a ReproUtil instance.

      Throws IllegalArgumentException if the model is an external model trained outside of Tribuo.

      Parameters:
      originalModel - The ReproUtil will re-train a model based on the provenance contained in this Model.
  • Method Details

    • recoverTrainer

      public Trainer<T> recoverTrainer()
      Extracts the trainer from this ReproUtil.

      Note: calling Trainer.train(org.tribuo.Dataset<T>) on the returned trainer object may distort any future reproductions produced by this instance of ReproUtil.

      Returns:
      A Trainer found in the configuration manager, used to train the originalModel.
    • recoverDataset

      public Dataset<T> recoverDataset()
      Returns the Dataset used when the model was trained.

      Throws IllegalStateException if the dataset could not be recovered or one of the classes could not be instantiated.

      Note: transforming or otherwise mutating the returned Dataset object may distort any future reproductions produced by this instance of ReproUtil.

      At the moment this function supports ConfigurableDataSources and a single level of splitting using a TrainTestSplitter. It does not support DatasetView, which forms the basis of the cross-validation support in Tribuo, nor other datasets which take a subset of their input (e.g., MinimumCardinalityDataset). We will add this support in future releases.

      Returns:
      A new Dataset.
    • getConfigurationManager

      public com.oracle.labs.mlrg.olcut.config.ConfigurationManager getConfigurationManager()
      Returns the ConfigurationManager the ReproUtil is using to manage the reproduced models.

      Note: modifying the returned ConfigurationManager will distort the results of any future reproductions performed by this ReproUtil.

      Returns:
      The ConfigurationManager used by this ReproUtil.
    • reproduceFromProvenance

      public Model<T> reproduceFromProvenance() throws ClassNotFoundException
      Recreates a model object using the ModelProvenance supplied when the ReproUtil object was created.

      Recovers the trainer and dataset information before training an identical model.

      Throws IllegalStateException if the source or trainer cannot be loaded or is not configurable.

      Returns:
      A reproduced model identical to the one described in the provenance.
      Throws:
      ClassNotFoundException - If the dataset or trainer could not be instantiated.
    • reproduceFromModel

      public ReproUtil.ModelReproduction<T> reproduceFromModel() throws ClassNotFoundException, com.fasterxml.jackson.core.JsonProcessingException
      Recreates a model identical to the one described by the provenance of the supplied Model object.

      Recovers the trainer and dataset information before training an identical model.

      Throws IllegalStateException if the model provenance is malformed or not defined.

      Returns:
      A ModelReproduction wrapping a reproduced model identical to the one described in the provenance.
      Throws:
      ClassNotFoundException - If the trainer or datasource class cannot be instantiated.
      com.fasterxml.jackson.core.JsonProcessingException - If the JSON diff could not be created.
    • diffProvenance

      public static String diffProvenance(ModelProvenance originalProvenance, ModelProvenance newProvenance) throws com.fasterxml.jackson.core.JsonProcessingException
      Creates a JSON String diff of two ModelProvenance objects. Only the differences will appear in the resulting diff.

      Throws IllegalStateException if the model provenances could not be parsed.

      Parameters:
      originalProvenance - The first of the two provenance objects to diff.
      newProvenance - The second of the two provenance objects to diff.
      Returns:
      A JSON String report displaying the differences between the two provenances.
      Throws:
      com.fasterxml.jackson.core.JsonProcessingException - If the JSON diff could not be created.
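
      To illustrate what a diff containing "only the differences" looks like, the sketch below compares two flat maps of provenance-like fields and keeps only the keys whose values differ. It is a simplified stand-in using plain java.util collections, not the algorithm or JSON format used by diffProvenance; the class ProvenanceDiffSketch, the field names, and the values are hypothetical.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ProvenanceDiffSketch {
    private static final String ABSENT = "<absent>";

    /**
     * Returns only the keys whose values differ, each mapped to "original -> new".
     * A key that is missing (or null) on one side is reported as "<absent>".
     */
    static Map<String, String> diff(Map<String, String> original, Map<String, String> reproduced) {
        Set<String> keys = new TreeSet<>(original.keySet());
        keys.addAll(reproduced.keySet());

        Map<String, String> differences = new TreeMap<>();
        for (String key : keys) {
            String before = original.get(key);
            String after = reproduced.get(key);
            if (!Objects.equals(before, after)) {
                differences.put(key,
                    (before == null ? ABSENT : before) + " -> " + (after == null ? ABSENT : after));
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Hypothetical fields; real provenance objects are nested and richer.
        Map<String, String> original =
            Map.of("trainer", "ExampleTrainer", "seed", "1", "library-version", "A");
        Map<String, String> reproduced =
            Map.of("trainer", "ExampleTrainer", "seed", "1", "library-version", "B");

        // Matching fields are omitted; only the differing field is reported.
        System.out.println(diff(original, reproduced)); // prints {library-version=A -> B}
    }
}
```

      Two identical inputs produce an empty map in this sketch, which mirrors the idea that fields equal in both provenances do not appear in the diff.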