Class DatasetView<T extends Output<T>>
- Type Parameters:
T - The output type of this dataset.
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<Example<T>>, ProtoSerializable<org.tribuo.protos.core.DatasetProto>
Dataset that only exposes selected examples.
Does not copy the examples.
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type, Field, Description
static final int CURRENT_VERSION - Protobuf serialization version.
Fields inherited from class org.tribuo.ImmutableDataset
dropInvalidExamples, featureIDMap, outputIDInfo
Fields inherited from class org.tribuo.Dataset
data, indices, outputFactory, rng, sourceProvenance, tribuoVersion
Fields inherited from interface org.tribuo.protos.ProtoSerializable
DESERIALIZATION_METHOD_NAME, PROVENANCE_SERIALIZER
-
Constructor Summary
Constructors
Constructor, Description
DatasetView(Dataset<T> dataset, int[] exampleIndices, String tag) - Creates a DatasetView which includes the supplied indices from the dataset.
DatasetView(Dataset<T> dataset, int[] exampleIndices, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, String tag) - Creates a DatasetView which includes the supplied indices from the dataset.
-
Method Summary
Modifier and Type, Method, Description
static <T extends Output<T>> DatasetView<T> createBootstrapView(Dataset<T> dataset, int size, long seed) - Generates a DatasetView bootstrapped from the supplied Dataset.
static <T extends Output<T>> DatasetView<T> createBootstrapView(Dataset<T> dataset, int size, long seed, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs) - Generates a DatasetView bootstrapped from the supplied Dataset.
static <T extends Output<T>> DatasetView<T> createView(Dataset<T> dataset, Predicate<Example<T>> predicate, String tag) - Creates a view from the supplied dataset, using the specified predicate to test if each example should be in this view.
static <T extends Output<T>> DatasetView<T> createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights) - Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
static <T extends Output<T>> DatasetView<T> createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs) - Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
static DatasetView<?> deserializeFromProto(int version, String className, com.google.protobuf.Any message) - Deserialization factory.
getData() - Gets the examples as an unmodifiable list.
getExample(int index) - Gets the example at the supplied index.
int[] getExampleIndices() - Returns a copy of the indices used in this view.
getFeatureMap() - Returns this dataset's FeatureMap.
getOutputInfo() - Returns this dataset's OutputInfo.
getOutputs() - Gets the set of outputs that occur in the examples in this dataset.
getProvenance()
getTag() - The tag associated with this dataset, if it exists.
iterator()
org.tribuo.protos.core.DatasetProto serialize() - Serializes this object to a protobuf.
void setStoreIndices(boolean storeIndices) - Set to true to store the indices in the provenance system.
void shuffle(boolean shuffle) - Shuffles the indices, or stops shuffling them.
int size() - Gets the size of the data set.
boolean storeIndicesInProvenance() - Are the indices stored in the provenance system.
toString()
Methods inherited from class org.tribuo.ImmutableDataset
add, add, copyDataset, copyDataset, copyDataset, getDropInvalidExamples, getFeatureIDMap, getOutputIDInfo, hashFeatureMap
Methods inherited from class org.tribuo.Dataset
castDataset, createDataCarrier, createDataCarrier, createTransformers, createTransformers, deserialize, deserializeExamples, deserializeFromFile, deserializeFromStream, getOutputFactory, getSourceDescription, getSourceProvenance, serializeToFile, serializeToStream, validate
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
CURRENT_VERSION
public static final int CURRENT_VERSION
Protobuf serialization version.
-
Constructor Details
-
DatasetView
public DatasetView(Dataset<T> dataset, int[] exampleIndices, String tag)
Creates a DatasetView which includes the supplied indices from the dataset.
It uses the feature and output infos from the wrapped dataset.
- Parameters:
dataset - The dataset to wrap.
exampleIndices - The indices to present.
tag - A tag for the view.
-
DatasetView
public DatasetView(Dataset<T> dataset, int[] exampleIndices, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, String tag)
Creates a DatasetView which includes the supplied indices from the dataset.
This takes the ImmutableFeatureMap and ImmutableOutputInfo parameters to save them being regenerated (e.g., in BaggingTrainer).
- Parameters:
dataset - The dataset to sample from.
exampleIndices - The indices of this view in the wrapped dataset.
featureIDs - The featureIDs to use for this dataset.
labelIDs - The labelIDs to use for this dataset.
tag - A tag for the view.
-
-
Method Details
-
deserializeFromProto
public static DatasetView<?> deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
Deserialization factory.
- Parameters:
version - The serialized object version.
className - The class name.
message - The serialized data.
- Returns:
The deserialized object.
- Throws:
com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
-
createView
public static <T extends Output<T>> DatasetView<T> createView(Dataset<T> dataset, Predicate<Example<T>> predicate, String tag)
Creates a view from the supplied dataset, using the specified predicate to test if each example should be in this view.
- Type Parameters:
T - The type of the Output in the dataset.
- Parameters:
dataset - The dataset to create a view over.
predicate - The predicate which determines if an example is in this view.
tag - A tag denoting what the predicate does.
- Returns:
A dataset view containing each example where the predicate is true.
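The selection step behind a predicate-based view can be sketched without Tribuo on the classpath. The class below is a hypothetical stand-in, not the library's implementation: PredicateViewSketch and its members are names invented for illustration, and a plain List plays the role of the wrapped Dataset. It records the positions of the matching elements and reads through to the backing list, so nothing is copied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a predicate-selected view; not Tribuo's implementation. */
final class PredicateViewSketch<E> {
    private final List<E> backing; // shared with the caller, never copied
    private final int[] indices;   // positions in 'backing' that the view exposes
    private final String tag;      // free-text description of the selection

    private PredicateViewSketch(List<E> backing, int[] indices, String tag) {
        this.backing = backing;
        this.indices = indices;
        this.tag = tag;
    }

    /** Keeps the index of every element for which the predicate is true. */
    public static <E> PredicateViewSketch<E> createView(List<E> data, Predicate<E> predicate, String tag) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < data.size(); i++) {
            if (predicate.test(data.get(i))) {
                kept.add(i);
            }
        }
        int[] idx = kept.stream().mapToInt(Integer::intValue).toArray();
        return new PredicateViewSketch<>(data, idx, tag);
    }

    public int size() {
        return indices.length;
    }

    /** Reads through to the backing list using the stored position. */
    public E get(int i) {
        return backing.get(indices[i]);
    }

    public String getTag() {
        return tag;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("apple", "kiwi", "banana", "fig");
        PredicateViewSketch<String> view = createView(data, s -> s.length() > 4, "long-names");
        System.out.println(view.size() + " " + view.get(0) + " " + view.get(1));
    }
}
```

With a real dataset the equivalent call would be DatasetView.createView(dataset, predicate, tag), using the signature documented above.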
-
createBootstrapView
public static <T extends Output<T>> DatasetView<T> createBootstrapView(Dataset<T> dataset, int size, long seed)
Generates a DatasetView bootstrapped from the supplied Dataset.
- Type Parameters:
T - The type of the Output in the dataset.
- Parameters:
dataset - The dataset to sample from.
size - The size of the sample.
seed - A seed for the RNG.
- Returns:
A dataset view containing a bootstrap sample of the supplied dataset.
-
createBootstrapView
public static <T extends Output<T>> DatasetView<T> createBootstrapView(Dataset<T> dataset, int size, long seed, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)
Generates a DatasetView bootstrapped from the supplied Dataset.
This takes the ImmutableFeatureMap and ImmutableOutputInfo parameters to save them being regenerated.
- Type Parameters:
T - The type of the Output in the dataset.
- Parameters:
dataset - The dataset to sample from.
size - The size of the sample.
seed - A seed for the RNG.
featureIDs - The featureIDs to use for this dataset.
outputIDs - The output info to use for this dataset.
- Returns:
A dataset view containing a bootstrap sample of the supplied dataset.
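A bootstrap sample is a draw of size indices, uniformly and with replacement, from the wrapped dataset. The following is a minimal sketch of that index generation, assuming java.util.Random as the generator; the class and method names are hypothetical, and the RNG that Tribuo uses internally is not specified on this page.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of bootstrap index generation; not Tribuo's implementation. */
final class BootstrapIndicesSketch {

    /** Draws 'size' indices uniformly, with replacement, from [0, datasetSize). */
    public static int[] sample(int datasetSize, int size, long seed) {
        if (datasetSize <= 0 || size < 0) {
            throw new IllegalArgumentException("datasetSize must be positive and size non-negative");
        }
        Random rng = new Random(seed); // assumption: any seeded generator would do
        int[] indices = new int[size];
        for (int i = 0; i < size; i++) {
            indices[i] = rng.nextInt(datasetSize);
        }
        return indices;
    }

    public static void main(String[] args) {
        int[] first = sample(10, 25, 42L);
        int[] second = sample(10, 25, 42L);
        // The same seed reproduces the same sample.
        System.out.println(Arrays.equals(first, second));
    }
}
```

Because sampling is with replacement, the same index can appear several times and some indices may not appear at all, so size may be smaller or larger than the dataset.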
-
createWeightedBootstrapView
public static <T extends Output<T>> DatasetView<T> createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights)
Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
- Type Parameters:
T - The type of the Output in the dataset.
- Parameters:
dataset - The dataset to sample from.
size - The size of the sample.
seed - A seed for the RNG.
exampleWeights - The sampling weights for each example; each weight must be in the range [0, 1].
- Returns:
A dataset view containing a weighted bootstrap sample of the supplied dataset.
-
createWeightedBootstrapView
public static <T extends Output<T>> DatasetView<T> createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)
Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
This takes the ImmutableFeatureMap and ImmutableOutputInfo parameters to save them being regenerated.
- Type Parameters:
T - The type of the Output in the dataset.
- Parameters:
dataset - The dataset to sample from.
size - The size of the sample.
seed - A seed for the RNG.
exampleWeights - The sampling weights for each example; each weight must be in the range [0, 1].
featureIDs - The featureIDs to use for this dataset.
outputIDs - The output info to use for this dataset.
- Returns:
A dataset view containing a weighted bootstrap sample of the supplied dataset.
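One common way to draw a weighted sample with replacement is inverse-CDF sampling over the cumulative weights. The sketch below illustrates that technique under stated assumptions: the class name, the use of java.util.Random, and the linear scan are all choices made for this example, and this page does not say which method Tribuo uses internally.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of weighted sampling with replacement; not Tribuo's implementation. */
final class WeightedBootstrapSketch {

    /** Draws 'size' indices with probability proportional to the supplied weights. */
    public static int[] sample(float[] weights, int size, long seed) {
        double[] cdf = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            float w = weights[i];
            if (Float.isNaN(w) || w < 0f || w > 1f) {
                throw new IllegalArgumentException("weight " + i + " is outside [0, 1]");
            }
            total += w;
            cdf[i] = total; // running sum of the weights so far
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        Random rng = new Random(seed); // assumption: any seeded generator would do
        int[] indices = new int[size];
        for (int s = 0; s < size; s++) {
            double u = rng.nextDouble() * total; // uniform in [0, total)
            int chosen = cdf.length - 1;
            // The first index whose cumulative weight exceeds u owns the draw,
            // which means zero-weight examples are never selected.
            for (int i = 0; i < cdf.length; i++) {
                if (cdf[i] > u) {
                    chosen = i;
                    break;
                }
            }
            indices[s] = chosen;
        }
        return indices;
    }

    public static void main(String[] args) {
        // Only the middle example has non-zero weight, so every draw selects index 1.
        System.out.println(Arrays.toString(sample(new float[] {0f, 1f, 0f}, 5, 7L)));
    }
}
```

The linear scan keeps the sketch short; for large datasets a binary search over the cumulative weights would reduce each draw from linear to logarithmic time.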
-
storeIndicesInProvenance
public boolean storeIndicesInProvenance()
Are the indices stored in the provenance system.
- Returns:
True if the indices will be stored in the provenance of this view.
-
setStoreIndices
public void setStoreIndices(boolean storeIndices)
Set to true to store the indices in the provenance system.
- Parameters:
storeIndices - True if the indices should be stored in the provenance of this view.
-
toString
-
getOutputs
Gets the set of outputs that occur in the examples in this dataset.
- Overrides:
getOutputs in class ImmutableDataset<T extends Output<T>>
- Returns:
The set of outputs that occur in the examples in this dataset.
-
size
-
getFeatureMap
Description copied from class: Dataset
Returns this dataset's FeatureMap.
- Overrides:
getFeatureMap in class ImmutableDataset<T extends Output<T>>
- Returns:
The feature map from this dataset.
-
getOutputInfo
Description copied from class: Dataset
Returns this dataset's OutputInfo.
- Overrides:
getOutputInfo in class ImmutableDataset<T extends Output<T>>
- Returns:
The output info.
-
shuffle
public void shuffle(boolean shuffle)
Description copied from class: Dataset
Shuffles the indices, or stops shuffling them.
The shuffle only affects the iterator; it does not affect Dataset.getExample(int).
Multiple calls with the argument true will shuffle the dataset multiple times. The RNG is shared across all Dataset instances, so methods which access it are synchronized.
Using this method will prevent the provenance system from tracking the exact state of the dataset, which may be important for trainers which depend on the example order, like those using stochastic gradient descent.
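The split between iteration order and indexed access can be illustrated with a small stand-alone class. This is a hypothetical sketch, not Tribuo's code: ShuffleOrderSketch, its per-instance Random, and the snapshot-based iterator are assumptions made for the example, whereas the description above says the real RNG is shared across Dataset instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: shuffling changes iteration order only; not Tribuo's implementation. */
final class ShuffleOrderSketch<E> implements Iterable<E> {
    private final List<E> data;
    private final Random rng;    // assumption: one generator per instance
    private List<Integer> order; // null means natural order

    public ShuffleOrderSketch(List<E> data, long seed) {
        this.data = data;
        this.rng = new Random(seed);
    }

    /** true reshuffles the iteration order; false restores the natural order. */
    public synchronized void shuffle(boolean shuffle) {
        if (!shuffle) {
            order = null;
            return;
        }
        if (order == null) {
            order = new ArrayList<>();
            for (int i = 0; i < data.size(); i++) {
                order.add(i);
            }
        }
        Collections.shuffle(order, rng);
    }

    /** Indexed access ignores the shuffle. */
    public E getExample(int index) {
        return data.get(index);
    }

    @Override
    public synchronized Iterator<E> iterator() {
        if (order == null) {
            return Collections.unmodifiableList(data).iterator();
        }
        // Iterate over a snapshot of the current shuffled order.
        List<E> snapshot = new ArrayList<>(order.size());
        for (int i : order) {
            snapshot.add(data.get(i));
        }
        return snapshot.iterator();
    }

    public static void main(String[] args) {
        ShuffleOrderSketch<String> ds = new ShuffleOrderSketch<>(Arrays.asList("a", "b", "c", "d"), 1L);
        ds.shuffle(true);
        System.out.println(ds.getExample(0)); // still the first stored element
        for (String s : ds) {
            System.out.print(s + " ");        // same elements, order set by the shuffle
        }
        System.out.println();
    }
}
```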
-
iterator
-
getData
Description copied from class: Dataset
Gets the examples as an unmodifiable list. This list will throw an UnsupportedOperationException if any elements are added to it.
In other words, using the following to add additional examples to this dataset will throw an exception:
dataset.getData().add(example)
Instead, use MutableDataset.add(Example).
-
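The behaviour of an unmodifiable list can be reproduced with the standard library alone. The sketch below uses Collections.unmodifiableList as a stand-in for the list returned by getData(); the class name and the helper method are hypothetical, and whether Tribuo builds its list this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a read-only list view; not Tribuo's implementation. */
final class UnmodifiableDataSketch {

    /** Returns true if adding to the list is rejected with UnsupportedOperationException. */
    public static boolean addIsRejected(List<String> list, String example) {
        try {
            list.add(example);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> storage = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> readOnly = Collections.unmodifiableList(storage);

        System.out.println(addIsRejected(readOnly, "c")); // prints "true"

        // Additions must go through the owner of the storage instead.
        storage.add("c");
        System.out.println(readOnly.size()); // prints "3": the wrapper is a live view
    }
}
```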
getExample
-
getProvenance
- Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<T extends Output<T>>
- Overrides:
getProvenance in class ImmutableDataset<T extends Output<T>>
-
getTag
The tag associated with this dataset, if it exists.
- Returns:
The dataset tag.
-
getExampleIndices
public int[] getExampleIndices()
Returns a copy of the indices used in this view.
- Returns:
The indices.
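Returning a copy means callers can modify the array they receive without changing the view. A minimal sketch of that defensive-copy pattern follows; IndexCopySketch is a hypothetical class, and copying in the constructor as well is an assumption of this example rather than something this page states about DatasetView.

```java
import java.util.Arrays;

/** Hypothetical sketch of defensive copying of an index array; not Tribuo's implementation. */
final class IndexCopySketch {
    private final int[] indices;

    public IndexCopySketch(int[] indices) {
        // Assumption for this sketch: copy on the way in as well as on the way out.
        this.indices = Arrays.copyOf(indices, indices.length);
    }

    /** Returns a fresh copy, so edits by the caller do not reach the internal array. */
    public int[] getExampleIndices() {
        return Arrays.copyOf(indices, indices.length);
    }

    public static void main(String[] args) {
        IndexCopySketch view = new IndexCopySketch(new int[] {3, 1, 2});
        int[] copy = view.getExampleIndices();
        copy[0] = 99; // only the caller's copy changes
        System.out.println(view.getExampleIndices()[0]); // prints "3"
    }
}
```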
-
serialize
public org.tribuo.protos.core.DatasetProto serialize()
Description copied from interface: ProtoSerializable
Serializes this object to a protobuf.
- Specified by:
serialize in interface ProtoSerializable<T extends Output<T>>
- Overrides:
serialize in class ImmutableDataset<T extends Output<T>>
- Returns:
The protobuf.
-