Class DatasetView<T extends Output<T>>

java.lang.Object
org.tribuo.Dataset<T>
org.tribuo.ImmutableDataset<T>
org.tribuo.dataset.DatasetView<T>
Type Parameters:
T - The output type of this dataset.
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<Example<T>>, ProtoSerializable<org.tribuo.protos.core.DatasetProto>

public final class DatasetView<T extends Output<T>> extends ImmutableDataset<T>
DatasetView provides an immutable view on another Dataset that exposes only the selected examples. The examples are not copied.
  • Field Details

    • CURRENT_VERSION

      public static final int CURRENT_VERSION
      Protobuf serialization version.
  • Constructor Details

    • DatasetView

      public DatasetView(Dataset<T> dataset, int[] exampleIndices, String tag)
      Creates a DatasetView which includes the supplied indices from the dataset.

      It uses the feature and output infos from the wrapped dataset.

      Parameters:
      dataset - The dataset to wrap.
      exampleIndices - The indices to present.
      tag - A tag for the view.
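
      To illustrate what an index-based view that does not copy its examples looks like, here is a minimal sketch using only the JDK. IndexViewSketch is a hypothetical stand-in, not Tribuo's implementation: it wraps a plain List and an int[] of indices, and it ignores feature and output infos, tags and provenance.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an index-based view; not part of Tribuo. */
final class IndexViewSketch<E> extends AbstractList<E> {
    private final List<E> backing; // the wrapped data, shared rather than copied
    private final int[] indices;   // positions in the backing list exposed by the view

    IndexViewSketch(List<E> backing, int[] indices) {
        this.backing = backing;
        // Copy the index array so later changes by the caller do not alter the view.
        this.indices = Arrays.copyOf(indices, indices.length);
    }

    @Override
    public E get(int i) {
        // Translate the view position into a position in the backing list.
        return backing.get(indices[i]);
    }

    @Override
    public int size() {
        return indices.length;
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d");
        // The same index may appear more than once, as in a bootstrap sample.
        List<String> view = new IndexViewSketch<>(data, new int[]{3, 1, 1});
        System.out.println(view); // prints [d, b, b]
    }
}
```

      Because AbstractList leaves add, set and remove unimplemented, the sketch is read-only, and every element it returns is the same object held by the backing list.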
    • DatasetView

      public DatasetView(Dataset<T> dataset, int[] exampleIndices, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, String tag)
      Creates a DatasetView which includes the supplied indices from the dataset.

      This takes the ImmutableFeatureMap and ImmutableOutputInfo as parameters so that they do not have to be regenerated (e.g., in BaggingTrainer).

      Parameters:
      dataset - The dataset to sample from.
      exampleIndices - The indices of this view in the wrapped dataset.
      featureIDs - The featureIDs to use for this dataset.
      labelIDs - The labelIDs to use for this dataset.
      tag - A tag for the view.
  • Method Details

    • deserializeFromProto

      public static DatasetView<?> deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
      Deserialization factory.
      Parameters:
      version - The serialized object version.
      className - The class name.
      message - The serialized data.
      Returns:
      The deserialized object.
      Throws:
      com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
    • createView

      public static <T extends Output<T>> DatasetView<T> createView(Dataset<T> dataset, Predicate<Example<T>> predicate, String tag)
      Creates a view from the supplied dataset, using the specified predicate to test if each example should be in this view.
      Type Parameters:
      T - The type of the Output in the dataset.
      Parameters:
      dataset - The dataset to create a view over.
      predicate - The predicate which determines if an example is in this view.
      tag - A tag denoting what the predicate does.
      Returns:
      A dataset view containing each example where the predicate is true.
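
      The selection step described above can be sketched with the JDK alone: test each element and keep the positions where the predicate holds. PredicateIndicesSketch and matchingIndices are hypothetical names used for illustration and operate on a plain List rather than a Tribuo Dataset.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.IntStream;

/** Hypothetical helper; not part of Tribuo. */
final class PredicateIndicesSketch {
    /** Returns, in ascending order, the positions whose element satisfies the predicate. */
    static <E> int[] matchingIndices(List<E> data, Predicate<E> predicate) {
        return IntStream.range(0, data.size())
                .filter(i -> predicate.test(data.get(i)))
                .toArray();
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(5, 12, 7, 20);
        int[] selected = matchingIndices(data, x -> x > 6);
        System.out.println(Arrays.toString(selected)); // prints [1, 2, 3]
    }
}
```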
    • createBootstrapView

      public static <T extends Output<T>> DatasetView<T> createBootstrapView(Dataset<T> dataset, int size, long seed)
      Generates a DatasetView bootstrapped from the supplied Dataset.
      Type Parameters:
      T - The type of the Output in the dataset.
      Parameters:
      dataset - The dataset to sample from.
      size - The size of the sample.
      seed - A seed for the RNG.
      Returns:
      A dataset view containing a bootstrap sample of the supplied dataset.
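
      A bootstrap sample draws indices uniformly with replacement, so the same example can appear several times and others not at all. The sketch below shows the idea with java.util.Random. It is an assumption for illustration only: the RNG and the exact index sequence Tribuo produces for a given seed may differ, and BootstrapIndicesSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper; not part of Tribuo. */
final class BootstrapIndicesSketch {
    /** Draws sampleSize indices uniformly, with replacement, from [0, datasetSize). */
    static int[] bootstrapIndices(int datasetSize, int sampleSize, long seed) {
        if (datasetSize <= 0 || sampleSize < 0) {
            throw new IllegalArgumentException("datasetSize must be positive and sampleSize non-negative");
        }
        Random rng = new Random(seed); // a fixed seed makes the sample reproducible
        int[] indices = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            indices[i] = rng.nextInt(datasetSize);
        }
        return indices;
    }

    public static void main(String[] args) {
        int[] first = bootstrapIndices(10, 10, 42L);
        int[] second = bootstrapIndices(10, 10, 42L);
        // The same seed yields the same indices.
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```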
    • createBootstrapView

      public static <T extends Output<T>> DatasetView<T> createBootstrapView(Dataset<T> dataset, int size, long seed, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)
      Generates a DatasetView bootstrapped from the supplied Dataset.

      This takes the ImmutableFeatureMap and ImmutableOutputInfo as parameters so that they do not have to be regenerated.

      Type Parameters:
      T - The type of the Output in the dataset.
      Parameters:
      dataset - The dataset to sample from.
      size - The size of the sample.
      seed - A seed for the RNG.
      featureIDs - The featureIDs to use for this dataset.
      outputIDs - The output info to use for this dataset.
      Returns:
      A dataset view containing a bootstrap sample of the supplied dataset.
    • createWeightedBootstrapView

      public static <T extends Output<T>> DatasetView<T> createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights)
      Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
      Type Parameters:
      T - The type of the Output in the dataset.
      Parameters:
      dataset - The dataset to sample from.
      size - The size of the sample.
      seed - A seed for the RNG.
      exampleWeights - The sampling weights for each example; each weight must be in the range [0,1].
      Returns:
      A dataset view containing a weighted bootstrap sample of the supplied dataset.
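
      Weighted bootstrap sampling picks each index with probability proportional to its weight. One common way to do this, shown below as a sketch under stated assumptions, is to build cumulative weights and locate a uniform draw within them. WeightedBootstrapSketch is a hypothetical name, the linear scan is chosen for clarity rather than speed, and nothing here is confirmed to match how Tribuo samples or normalizes weights.

```java
import java.util.Random;

/** Hypothetical helper; not part of Tribuo. */
final class WeightedBootstrapSketch {
    /** Draws sampleSize indices with replacement, with probability proportional to weights. */
    static int[] weightedBootstrapIndices(float[] weights, int sampleSize, long seed) {
        double[] cumulative = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            // Enforce the documented [0,1] range; this check also rejects NaN.
            if (!(weights[i] >= 0.0f && weights[i] <= 1.0f)) {
                throw new IllegalArgumentException("Weight " + i + " is outside [0,1]: " + weights[i]);
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("At least one weight must be positive");
        }
        Random rng = new Random(seed);
        int[] indices = new int[sampleSize];
        for (int s = 0; s < sampleSize; s++) {
            double u = rng.nextDouble() * total; // uniform draw in [0, total)
            int chosen = 0;
            // Advance to the first index whose cumulative weight exceeds u.
            while (chosen < weights.length - 1 && u >= cumulative[chosen]) {
                chosen++;
            }
            indices[s] = chosen;
        }
        return indices;
    }

    public static void main(String[] args) {
        // Only index 1 has non-zero weight, so every draw selects it.
        int[] sample = weightedBootstrapIndices(new float[]{0f, 1f, 0f}, 3, 7L);
        System.out.println(java.util.Arrays.toString(sample)); // prints [1, 1, 1]
    }
}
```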
    • createWeightedBootstrapView

      public static <T extends Output<T>> DatasetView<T> createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)
      Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.

      This takes the ImmutableFeatureMap and ImmutableOutputInfo as parameters so that they do not have to be regenerated.

      Type Parameters:
      T - The type of the Output in the dataset.
      Parameters:
      dataset - The dataset to sample from.
      size - The size of the sample.
      seed - A seed for the RNG.
      exampleWeights - The sampling weights for each example; each weight must be in the range [0,1].
      featureIDs - The featureIDs to use for this dataset.
      outputIDs - The output info to use for this dataset.
      Returns:
      A dataset view containing a weighted bootstrap sample of the supplied dataset.
    • storeIndicesInProvenance

      public boolean storeIndicesInProvenance()
      Returns whether the indices are stored in the provenance system.
      Returns:
      True if the indices will be stored in the provenance of this view.
    • setStoreIndices

      public void setStoreIndices(boolean storeIndices)
      Set to true to store the indices in the provenance system.
      Parameters:
      storeIndices - True if the indices should be stored in the provenance of this view.
    • toString

      public String toString()
      Overrides:
      toString in class ImmutableDataset<T extends Output<T>>
    • getOutputs

      public Set<T> getOutputs()
      Gets the set of outputs that occur in the examples in this dataset.
      Overrides:
      getOutputs in class ImmutableDataset<T extends Output<T>>
      Returns:
      the set of outputs that occur in the examples in this dataset.
    • size

      public int size()
      Gets the size of the data set.
      Overrides:
      size in class Dataset<T extends Output<T>>
      Returns:
      the size of the data set.
    • getFeatureMap

      public ImmutableFeatureMap getFeatureMap()
      Description copied from class: Dataset
      Returns this dataset's FeatureMap.
      Overrides:
      getFeatureMap in class ImmutableDataset<T extends Output<T>>
      Returns:
      The feature map from this dataset.
    • getOutputInfo

      public ImmutableOutputInfo<T> getOutputInfo()
      Description copied from class: Dataset
      Returns this dataset's OutputInfo.
      Overrides:
      getOutputInfo in class ImmutableDataset<T extends Output<T>>
      Returns:
      The output info.
    • iterator

      public Iterator<Example<T>> iterator()
      Specified by:
      iterator in interface Iterable<Example<T>>
      Overrides:
      iterator in class Dataset<T extends Output<T>>
    • getData

      public List<Example<T>> getData()
      Description copied from class: Dataset
      Gets the examples as an unmodifiable list. This list will throw an UnsupportedOperationException if any elements are added to it.

      In other words, using the following to add additional examples to this dataset will throw an exception: dataset.getData().add(example). Instead, use MutableDataset.add(Example).

      Overrides:
      getData in class Dataset<T extends Output<T>>
      Returns:
      The unmodifiable example list.
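
      The behaviour of an unmodifiable list can be seen with the JDK's own Collections.unmodifiableList, used here as a stand-in for the list returned by getData. UnmodifiableListSketch and addIsRejected are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical demonstration class; not part of Tribuo. */
final class UnmodifiableListSketch {
    /** Returns true if the list refuses the add with UnsupportedOperationException. */
    static boolean addIsRejected(List<String> list) {
        try {
            list.add("new element");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(List.of("a", "b"));
        List<String> readOnly = Collections.unmodifiableList(mutable);
        System.out.println(addIsRejected(readOnly)); // prints true
    }
}
```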
    • getExample

      public Example<T> getExample(int index)
      Description copied from class: Dataset
      Gets the example at the supplied index.

      Throws IllegalArgumentException if the index is invalid or outside the bounds.

      Overrides:
      getExample in class Dataset<T extends Output<T>>
      Parameters:
      index - The index of the example.
      Returns:
      The example.
    • getProvenance

      public DatasetView.DatasetViewProvenance getProvenance()
      Specified by:
      getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>
      Overrides:
      getProvenance in class ImmutableDataset<T extends Output<T>>
    • getTag

      public String getTag()
      The tag associated with this dataset, if it exists.
      Returns:
      The dataset tag.
    • getExampleIndices

      public int[] getExampleIndices()
      Returns a copy of the indices used in this view.
      Returns:
      The indices.
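
      Returning a copy means callers can modify the returned array without affecting the view. The pattern is sketched below with a hypothetical IndexHolderSketch class that is not part of Tribuo.

```java
import java.util.Arrays;

/** Hypothetical holder showing the defensive-copy pattern; not part of Tribuo. */
final class IndexHolderSketch {
    private final int[] indices;

    IndexHolderSketch(int[] indices) {
        // Copy on the way in so the caller's array is not shared.
        this.indices = Arrays.copyOf(indices, indices.length);
    }

    /** Returns a fresh copy, so callers cannot change the internal state. */
    int[] getIndices() {
        return Arrays.copyOf(indices, indices.length);
    }

    public static void main(String[] args) {
        IndexHolderSketch holder = new IndexHolderSketch(new int[]{4, 2, 9});
        int[] copy = holder.getIndices();
        copy[0] = -1; // only the copy changes
        System.out.println(holder.getIndices()[0]); // prints 4
    }
}
```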
    • serialize

      public org.tribuo.protos.core.DatasetProto serialize()
      Description copied from interface: ProtoSerializable
      Serializes this object to a protobuf.
      Specified by:
      serialize in interface ProtoSerializable<org.tribuo.protos.core.DatasetProto>
      Overrides:
      serialize in class ImmutableDataset<T extends Output<T>>
      Returns:
      The protobuf.