Class DatasetProvenance

java.lang.Object
org.tribuo.provenance.DatasetProvenance
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance, com.oracle.labs.mlrg.olcut.provenance.Provenance, Serializable, Iterable<com.oracle.labs.mlrg.olcut.util.Pair<String,com.oracle.labs.mlrg.olcut.provenance.Provenance>>, DataProvenance
Direct Known Subclasses:
DatasetView.DatasetViewProvenance, EmptyDatasetProvenance, ExternalDatasetProvenance, MinimumCardinalityDataset.MinimumCardinalityDatasetProvenance, MinimumCardinalitySequenceDataset.MinimumCardinalitySequenceDatasetProvenance

public class DatasetProvenance extends Object implements DataProvenance, com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance
Base class for dataset provenance.

A dataset provenance can form a chain of other DataProvenances, which track operations such as selection and subsampling.
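The chaining described above can be illustrated with a minimal, self-contained sketch. The nested types below (SketchDataProvenance, SketchSourceProvenance, SketchDatasetProvenance) are hypothetical stand-ins written for this example and are not the Tribuo classes; only the accessor name getSourceProvenance() is taken from this page. The dataset class names and the "csv-file" label are likewise illustrative values.

```java
import java.util.ArrayList;
import java.util.List;

final class ProvenanceChainSketch {

    // Hypothetical stand-in for the DataProvenance interface (not the Tribuo type).
    interface SketchDataProvenance {
        String describe();
    }

    // Stand-in for a root data source provenance, the end of the chain.
    static final class SketchSourceProvenance implements SketchDataProvenance {
        private final String name;

        SketchSourceProvenance(String name) {
            this.name = name;
        }

        @Override
        public String describe() {
            return name;
        }
    }

    // Stand-in for DatasetProvenance: it wraps the provenance it was built from.
    static final class SketchDatasetProvenance implements SketchDataProvenance {
        private final SketchDataProvenance source;
        private final String datasetClassName;

        SketchDatasetProvenance(SketchDataProvenance source, String datasetClassName) {
            this.source = source;
            this.datasetClassName = datasetClassName;
        }

        // Same accessor name as the documented getSourceProvenance().
        SketchDataProvenance getSourceProvenance() {
            return source;
        }

        @Override
        public String describe() {
            return datasetClassName;
        }
    }

    // Walks from the outermost dataset provenance down to the root source.
    static List<String> chain(SketchDataProvenance start) {
        List<String> steps = new ArrayList<>();
        SketchDataProvenance current = start;
        while (current instanceof SketchDatasetProvenance) {
            steps.add(current.describe());
            current = ((SketchDatasetProvenance) current).getSourceProvenance();
        }
        steps.add(current.describe()); // the root source
        return steps;
    }

    // Builds a three-link chain: a view over a dataset loaded from a file.
    static List<String> demo() {
        SketchDataProvenance file = new SketchSourceProvenance("csv-file");
        SketchDataProvenance full = new SketchDatasetProvenance(file, "MutableDataset");
        SketchDataProvenance view = new SketchDatasetProvenance(full, "DatasetView");
        return chain(view);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With Tribuo on the classpath, the same loop would presumably test for `instanceof DatasetProvenance` and call `getSourceProvenance()` until it reaches a DataProvenance that is not a dataset provenance.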

  • Field Summary

    Fields inherited from interface com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance

    CLASS_NAME, DEFAULT_HASH_TYPE
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
     
    DatasetProvenance(Map<String,com.oracle.labs.mlrg.olcut.provenance.Provenance> map)
    Deserialization constructor.
    protected
    DatasetProvenance(DataProvenance sourceProvenance, com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformationProvenance, String datasetClassName, boolean isDense, boolean isSequence, int numExamples, int numFeatures, int numOutputs)
    Constructs a dataset provenance using the supplied information.
     
    DatasetProvenance(DataProvenance sourceProvenance, com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformationProvenance, Dataset<T> dataset)
    Creates a dataset provenance from the supplied dataset.
     
    DatasetProvenance(DataProvenance sourceProvenance, com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformationProvenance, SequenceDataset<T> dataset)
    Creates a dataset provenance from the supplied sequence dataset.
  • Method Summary

    Modifier and Type
    Method
    Description
    protected List<com.oracle.labs.mlrg.olcut.util.Pair<String,com.oracle.labs.mlrg.olcut.provenance.Provenance>>
    allProvenances()
    Returns a list of all the provenances.
    boolean
    equals(Object o)
     
    String
    getClassName()
     
    int
    getNumExamples()
    The number of examples.
    int
    getNumFeatures()
    The number of features.
    int
    getNumOutputs()
    The number of output dimensions.
    DataProvenance
    getSourceProvenance()
    The input data provenance.
    com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance>
    getTransformationProvenance()
    The transformation provenances, in application order.
    String
    getTribuoVersion()
    The Tribuo version used to create this dataset.
    int
    hashCode()
     
    boolean
    isDense()
    Is the Dataset dense?
    boolean
    isSequence()
    Is it a sequence dataset?
    Iterator<com.oracle.labs.mlrg.olcut.util.Pair<String,com.oracle.labs.mlrg.olcut.provenance.Provenance>>
    iterator()
     
    String
    toString()
     

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait

    Methods inherited from interface java.lang.Iterable

    forEach, spliterator

    Methods inherited from interface com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance

    generateString
  • Constructor Details

    • DatasetProvenance

      public <T extends Output<T>> DatasetProvenance(DataProvenance sourceProvenance, com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformationProvenance, Dataset<T> dataset)
      Creates a dataset provenance from the supplied dataset.
      Type Parameters:
      T - The output type of the dataset.
      Parameters:
      sourceProvenance - The data source provenance.
      transformationProvenance - The transformations applied to this dataset.
      dataset - The dataset itself.
    • DatasetProvenance

      public <T extends Output<T>> DatasetProvenance(DataProvenance sourceProvenance, com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformationProvenance, SequenceDataset<T> dataset)
      Creates a dataset provenance from the supplied sequence dataset.
      Type Parameters:
      T - The output type of the sequence dataset.
      Parameters:
      sourceProvenance - The data source provenance.
      transformationProvenance - The transformations applied to this sequence dataset.
      dataset - The sequence dataset itself.
    • DatasetProvenance

      protected DatasetProvenance(DataProvenance sourceProvenance, com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformationProvenance, String datasetClassName, boolean isDense, boolean isSequence, int numExamples, int numFeatures, int numOutputs)
      Constructs a dataset provenance using the supplied information.
      Parameters:
      sourceProvenance - The data source provenance.
      transformationProvenance - The transformations applied to the dataset.
      datasetClassName - The dataset class name.
      isDense - Is the dataset dense?
      isSequence - Is it a sequence dataset?
      numExamples - The number of examples in this dataset.
      numFeatures - The number of features in this dataset.
      numOutputs - The output dimensionality.
    • DatasetProvenance

      public DatasetProvenance(Map<String,com.oracle.labs.mlrg.olcut.provenance.Provenance> map)
      Deserialization constructor.
      Parameters:
      map - The provenances.
  • Method Details

    • getClassName

      public String getClassName()
      Specified by:
      getClassName in interface com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance
    • getSourceProvenance

      public DataProvenance getSourceProvenance()
      The input data provenance.
      Returns:
      The data provenance.
    • getTransformationProvenance

      public com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> getTransformationProvenance()
      The transformation provenances, in application order.
      Returns:
      The transformation provenances.
    • isDense

      public boolean isDense()
      Is the Dataset dense?
      Returns:
      True if dense.
    • isSequence

      public boolean isSequence()
      Is it a sequence dataset?
      Returns:
      True if a sequence dataset.
    • getNumExamples

      public int getNumExamples()
      The number of examples.
      Returns:
      The number of examples.
    • getNumFeatures

      public int getNumFeatures()
      The number of features.
      Returns:
      The number of features.
    • getNumOutputs

      public int getNumOutputs()
      The number of output dimensions.
      Returns:
      The number of output dimensions.
    • getTribuoVersion

      public String getTribuoVersion()
      The Tribuo version used to create this dataset.
      Returns:
      The Tribuo version.
    • iterator

      public Iterator<com.oracle.labs.mlrg.olcut.util.Pair<String,com.oracle.labs.mlrg.olcut.provenance.Provenance>> iterator()
      Specified by:
      iterator in interface Iterable<com.oracle.labs.mlrg.olcut.util.Pair<String,com.oracle.labs.mlrg.olcut.provenance.Provenance>>
    • allProvenances

      protected List<com.oracle.labs.mlrg.olcut.util.Pair<String,com.oracle.labs.mlrg.olcut.provenance.Provenance>> allProvenances()
      Returns a list of all the provenances.
      Returns:
      The provenances.
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object
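
The relationship between iterator(), allProvenances(), the Map-based deserialization constructor, and equals()/hashCode() documented above can be sketched as a round trip. This is a self-contained illustration under stated assumptions, not the Tribuo implementation: SketchProvenance is a hypothetical stand-in, values are plain strings rather than Provenance objects, Map.Entry replaces OLCUT's Pair, and the key names ("class-name", "num-examples", "is-dense") are made up for the example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class ProvenanceRoundTripSketch {

    // Hypothetical stand-in for DatasetProvenance with three of its fields.
    static final class SketchProvenance implements Iterable<Map.Entry<String, String>> {
        private final String className;
        private final int numExamples;
        private final boolean isDense;

        SketchProvenance(String className, int numExamples, boolean isDense) {
            this.className = className;
            this.numExamples = numExamples;
            this.isDense = isDense;
        }

        // Same shape as the deserialization constructor: rebuild from named fields.
        // The key names are assumptions made for this sketch.
        SketchProvenance(Map<String, String> map) {
            this(map.get("class-name"),
                 Integer.parseInt(map.get("num-examples")),
                 Boolean.parseBoolean(map.get("is-dense")));
        }

        // Same role as allProvenances(): every field as a (name, value) pair.
        List<Map.Entry<String, String>> allProvenances() {
            List<Map.Entry<String, String>> pairs = new ArrayList<>();
            pairs.add(new AbstractMap.SimpleImmutableEntry<>("class-name", className));
            pairs.add(new AbstractMap.SimpleImmutableEntry<>("num-examples", Integer.toString(numExamples)));
            pairs.add(new AbstractMap.SimpleImmutableEntry<>("is-dense", Boolean.toString(isDense)));
            return pairs;
        }

        @Override
        public Iterator<Map.Entry<String, String>> iterator() {
            return allProvenances().iterator();
        }

        // Value equality over all recorded fields.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof SketchProvenance)) {
                return false;
            }
            SketchProvenance other = (SketchProvenance) o;
            return numExamples == other.numExamples
                    && isDense == other.isDense
                    && Objects.equals(className, other.className);
        }

        @Override
        public int hashCode() {
            return Objects.hash(className, numExamples, isDense);
        }
    }

    // Flattens a provenance to a map via its iterator, then rebuilds it.
    static SketchProvenance roundTrip(SketchProvenance original) {
        Map<String, String> map = new LinkedHashMap<>();
        for (Map.Entry<String, String> pair : original) {
            map.put(pair.getKey(), pair.getValue());
        }
        return new SketchProvenance(map);
    }

    // Returns true when the rebuilt object equals the original.
    static boolean demo() {
        SketchProvenance original = new SketchProvenance("MutableDataset", 150, false);
        SketchProvenance rebuilt = roundTrip(original);
        return rebuilt.equals(original) && rebuilt.hashCode() == original.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The design point is that a provenance which exposes all of its fields as named pairs, and accepts the same named fields in a constructor, can be written out and rebuilt without reflection on private state.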