Class SequenceDataset<T extends Output<T>>

java.lang.Object
org.tribuo.sequence.SequenceDataset<T>
Type Parameters:
T - the type of the outputs in the data set.
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<SequenceExample<T>>
Direct Known Subclasses:
ImmutableSequenceDataset, MutableSequenceDataset

public abstract class SequenceDataset<T extends Output<T>> extends Object implements Iterable<SequenceExample<T>>, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable
A class for sets of sequence data, which are used to train and evaluate sequence models.

Subclass either MutableSequenceDataset or ImmutableSequenceDataset rather than this class.

  • Method Details

    • getSourceDescription

      public String getSourceDescription()
      Returns the description of the source provenance.
      Returns:
      The source provenance in text form.
    • getData

      public List<SequenceExample<T>> getData()
      Returns an unmodifiable view on the data.
      Returns:
      The data.
    • getSourceProvenance

      public DataProvenance getSourceProvenance()
      Returns the source provenance.
      Returns:
      The source provenance.
    • getOutputs

      public abstract Set<T> getOutputs()
      Gets the set of outputs that occur in the examples in this dataset.
      Returns:
      The set of outputs that occur in the examples in this dataset.
    • getExample

      public SequenceExample<T> getExample(int index)
      Gets the example at the specified index, or throws IllegalArgumentException if the index is out of bounds.
      Parameters:
      index - The index.
      Returns:
      The example at that index.
    • getFlatDataset

      public Dataset<T> getFlatDataset()
      Returns a view on this SequenceDataset which aggregates all the examples and ignores the sequence structure.
      Returns:
      A flattened view on this dataset.
    • size

      public int size()
      Gets the size of the data set, i.e. the number of sequence examples it contains.
      Returns:
      The size of the data set.
    • getOutputIDInfo

      public abstract ImmutableOutputInfo<T> getOutputIDInfo()
      An immutable view on the output info in this dataset.
      Returns:
      The output info.
    • getOutputInfo

      public abstract OutputInfo<T> getOutputInfo()
      The output info in this dataset.
      Returns:
      The output info.
    • getFeatureIDMap

      public abstract ImmutableFeatureMap getFeatureIDMap()
      An immutable view on the feature map.
      Returns:
      The feature map.
    • getFeatureMap

      public abstract FeatureMap getFeatureMap()
      The feature map.
      Returns:
      The feature map.
    • getOutputFactory

      public OutputFactory<T> getOutputFactory()
      Gets the output factory.
      Returns:
      The output factory.
    • iterator

      public Iterator<SequenceExample<T>> iterator()
      Specified by:
      iterator in interface Iterable<SequenceExample<T>>
    • toString

      public String toString()
      Overrides:
      toString in class Object
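
The accessor contracts described above (an unmodifiable view from getData, an IllegalArgumentException from getExample on a bad index, a flattened view that ignores sequence structure, and iteration over sequence examples) can be illustrated with a small, self-contained sketch. It does not use Tribuo: ToySequenceDataset and its flatten method are hypothetical stand-ins invented for this example, each sequence example is modelled as a plain List, and the flattening step copies the data where the real getFlatDataset is documented to return a view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for illustration only; this is NOT org.tribuo.sequence.SequenceDataset.
// Each "sequence example" is modelled as a plain List<E> of per-step items.
final class ToySequenceDataset<E> implements Iterable<List<E>> {
    private final List<List<E>> data = new ArrayList<>();

    ToySequenceDataset(List<List<E>> sequences) {
        for (List<E> seq : sequences) {
            // Defensive copy so later changes to the caller's lists do not leak in.
            data.add(Collections.unmodifiableList(new ArrayList<>(seq)));
        }
    }

    // Mirrors getData(): callers can read the backing list but not mutate it.
    List<List<E>> getData() {
        return Collections.unmodifiableList(data);
    }

    // Mirrors getExample(int): out-of-range indices raise IllegalArgumentException.
    List<E> getExample(int index) {
        if (index < 0 || index >= data.size()) {
            throw new IllegalArgumentException("Example index " + index + " is out of bounds.");
        }
        return data.get(index);
    }

    // In this sketch, size() counts sequences, not individual steps.
    int size() {
        return data.size();
    }

    // Mirrors the idea behind getFlatDataset(): every step of every sequence, in order,
    // with sequence boundaries discarded. This sketch copies rather than returning a view.
    List<E> flatten() {
        List<E> flat = new ArrayList<>();
        for (List<E> seq : data) {
            flat.addAll(seq);
        }
        return flat;
    }

    // Iterates over whole sequences, through the unmodifiable view.
    @Override
    public Iterator<List<E>> iterator() {
        return getData().iterator();
    }

    public static void main(String[] args) {
        ToySequenceDataset<String> ds = new ToySequenceDataset<>(Arrays.asList(
                Arrays.asList("the", "cat"),
                Arrays.asList("sat", "down", "quietly")));
        for (List<String> seq : ds) {
            System.out.println(seq.size() + " steps: " + seq);
        }
        System.out.println("sequences = " + ds.size() + ", flattened = " + ds.flatten().size());
    }
}
```

In this sketch, calling getData().add(...) throws UnsupportedOperationException, and getExample(2) on a two-sequence dataset throws IllegalArgumentException, which matches the behaviour the method descriptions state for the real class.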