Package org.tribuo.dataset
Class DatasetView<T extends Output<T>>
java.lang.Object
org.tribuo.Dataset<T>
org.tribuo.ImmutableDataset<T>
org.tribuo.dataset.DatasetView<T>
- Type Parameters:
T
- The output type of this dataset.
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<Example<T>>, ProtoSerializable<org.tribuo.protos.core.DatasetProto>
DatasetView provides an immutable view on another Dataset that only exposes selected examples. It does not copy the examples.
-
Nested Class Summary
-
Field Summary
static final int CURRENT_VERSION
- Protobuf serialization version.
Fields inherited from class org.tribuo.ImmutableDataset
dropInvalidExamples, featureIDMap, outputIDInfo
Fields inherited from class org.tribuo.Dataset
data, indices, outputFactory, sourceProvenance, tribuoVersion
Fields inherited from interface org.tribuo.protos.ProtoSerializable
DESERIALIZATION_METHOD_NAME, PROVENANCE_SERIALIZER
-
Constructor Summary
DatasetView(Dataset<T> dataset, int[] exampleIndices, String tag)
- Creates a DatasetView which includes the supplied indices from the dataset.
DatasetView(Dataset<T> dataset, int[] exampleIndices, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, String tag)
- Creates a DatasetView which includes the supplied indices from the dataset.
-
Method Summary
static <T extends Output<T>> DatasetView<T> createBootstrapView(Dataset<T> dataset, int size, long seed)
- Generates a DatasetView bootstrapped from the supplied Dataset.
static <T extends Output<T>> DatasetView<T> createBootstrapView(Dataset<T> dataset, int size, long seed, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)
- Generates a DatasetView bootstrapped from the supplied Dataset.
static <T extends Output<T>> DatasetView<T> createView(Dataset<T> dataset, Predicate<Example<T>> predicate, String tag)
- Creates a view from the supplied dataset, using the specified predicate to test if each example should be in this view.
static <T extends Output<T>> DatasetView<T> createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights)
- Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
static <T extends Output<T>> DatasetView<T> createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)
- Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
static DatasetView<?> deserializeFromProto(int version, String className, com.google.protobuf.Any message)
- Deserialization factory.
getData()
- Gets the examples as an unmodifiable list.
getExample(int index)
- Gets the example at the supplied index.
int[] getExampleIndices()
- Returns a copy of the indices used in this view.
getFeatureMap()
- Returns this dataset's FeatureMap.
getOutputInfo()
- Returns this dataset's OutputInfo.
getOutputs()
- Gets the set of outputs that occur in the examples in this dataset.
getProvenance()
getTag()
- The tag associated with this dataset, if it exists.
iterator()
org.tribuo.protos.core.DatasetProto serialize()
- Serializes this object to a protobuf.
void setStoreIndices(boolean storeIndices)
- Set to true to store the indices in the provenance system.
int size()
- Gets the size of the data set.
boolean storeIndicesInProvenance()
- Whether the indices are stored in the provenance system.
toString()
Methods inherited from class org.tribuo.ImmutableDataset
add, add, copyDataset, copyDataset, copyDataset, getDropInvalidExamples, getFeatureIDMap, getOutputIDInfo, hashFeatureMap
Methods inherited from class org.tribuo.Dataset
castDataset, createDataCarrier, createDataCarrier, createTransformers, createTransformers, deserialize, deserializeExamples, deserializeFromFile, deserializeFromStream, getOutputFactory, getSourceDescription, getSourceProvenance, serializeToFile, serializeToStream, shuffle, validate
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
CURRENT_VERSION
public static final int CURRENT_VERSION
Protobuf serialization version.
-
-
Constructor Details
-
DatasetView
public DatasetView(Dataset<T> dataset, int[] exampleIndices, String tag)
Creates a DatasetView which includes the supplied indices from the dataset.
It uses the feature and output infos from the wrapped dataset.
- Parameters:
dataset
- The dataset to wrap.
exampleIndices
- The indices to present.
tag
- A tag for the view.
-
DatasetView
public DatasetView(Dataset<T> dataset, int[] exampleIndices, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, String tag)
Creates a DatasetView which includes the supplied indices from the dataset.
This takes the ImmutableFeatureMap and ImmutableOutputInfo parameters to avoid regenerating them (e.g., in BaggingTrainer).
- Parameters:
dataset
- The dataset to sample from.
exampleIndices
- The indices of this view in the wrapped dataset.
featureIDs
- The featureIDs to use for this dataset.
labelIDs
- The labelIDs to use for this dataset.
tag
- A tag for the view.
-
-
Method Details
-
deserializeFromProto
public static DatasetView<?> deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
Deserialization factory.
- Parameters:
version
- The serialized object version.
className
- The class name.
message
- The serialized data.
- Returns:
- The deserialized object.
- Throws:
com.google.protobuf.InvalidProtocolBufferException
- If the protobuf could not be parsed from the message.
-
createView
public static <T extends Output<T>> DatasetView<T> createView(Dataset<T> dataset, Predicate<Example<T>> predicate, String tag)
Creates a view from the supplied dataset, using the specified predicate to test if each example should be in this view.
- Type Parameters:
T
- The type of the Output in the dataset.
- Parameters:
dataset
- The dataset to create a view over.
predicate
- The predicate which determines if an example is in this view.
tag
- A tag denoting what the predicate does.
- Returns:
- A dataset view containing each example where the predicate is true.
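The idea of a predicate-selected view that does not copy its elements can be sketched with plain Java collections. This is only an illustration under stated assumptions: PredicateViewSketch, IndexView, and this createView are hypothetical names operating on a java.util.List, not Tribuo classes, and Tribuo's own implementation may differ.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of a predicate-selected view; not Tribuo code. */
public final class PredicateViewSketch {

    /** Read-only list that exposes only the selected positions of a backing list. */
    static final class IndexView<E> extends AbstractList<E> {
        private final List<E> backing; // shared with the caller, never copied
        private final int[] indices;   // positions in the backing list

        IndexView(List<E> backing, int[] indices) {
            this.backing = backing;
            this.indices = indices;
        }

        @Override
        public E get(int i) {
            return backing.get(indices[i]);
        }

        @Override
        public int size() {
            return indices.length;
        }
    }

    /** Records the index of every element that passes the predicate. */
    public static <E> List<E> createView(List<E> data, Predicate<E> predicate) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < data.size(); i++) {
            if (predicate.test(data.get(i))) {
                kept.add(i);
            }
        }
        int[] indices = kept.stream().mapToInt(Integer::intValue).toArray();
        return new IndexView<>(data, indices);
    }

    public static void main(String[] args) {
        List<String> data = List.of("cat", "dog", "cow", "emu");
        List<String> view = createView(data, s -> s.startsWith("c"));
        System.out.println(view); // prints [cat, cow]
    }
}
```

Because IndexView extends AbstractList without overriding add or set, attempts to modify it throw UnsupportedOperationException, which mirrors the read-only behaviour described for this class.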
-
createBootstrapView
public static <T extends Output<T>> DatasetView<T> createBootstrapView(Dataset<T> dataset, int size, long seed)
Generates a DatasetView bootstrapped from the supplied Dataset.
- Type Parameters:
T
- The type of the Output in the dataset.
- Parameters:
dataset
- The dataset to sample from.
size
- The size of the sample.
seed
- A seed for the RNG.
- Returns:
- A dataset view containing a bootstrap sample of the supplied dataset.
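A bootstrap sample draws indices uniformly with replacement, so an index can appear more than once and the sample size need not match the dataset size. The sketch below shows that idea with java.util.Random; BootstrapIndicesSketch and bootstrapIndices are hypothetical names, and the choice of RNG is an assumption, so the indices it produces will not necessarily match those Tribuo generates for the same seed.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of seeded bootstrap index sampling; not Tribuo code. */
public final class BootstrapIndicesSketch {

    /** Draws sampleSize indices in [0, datasetSize) uniformly, with replacement. */
    public static int[] bootstrapIndices(int datasetSize, int sampleSize, long seed) {
        Random rng = new Random(seed); // assumption: Tribuo may use a different RNG
        int[] indices = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            indices[i] = rng.nextInt(datasetSize);
        }
        return indices;
    }

    public static void main(String[] args) {
        int[] first = bootstrapIndices(10, 5, 42L);
        int[] second = bootstrapIndices(10, 5, 42L);
        // The same seed reproduces the same sample.
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```

Fixing the seed makes the sample reproducible, which is why the factory methods take an explicit seed rather than an RNG instance.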
-
createBootstrapView
public static <T extends Output<T>> DatasetView<T> createBootstrapView(Dataset<T> dataset, int size, long seed, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)
Generates a DatasetView bootstrapped from the supplied Dataset.
This takes the ImmutableFeatureMap and ImmutableOutputInfo parameters to avoid regenerating them.
- Type Parameters:
T
- The type of the Output in the dataset.
- Parameters:
dataset
- The dataset to sample from.
size
- The size of the sample.
seed
- A seed for the RNG.
featureIDs
- The featureIDs to use for this dataset.
outputIDs
- The output info to use for this dataset.
- Returns:
- A dataset view containing a bootstrap sample of the supplied dataset.
-
createWeightedBootstrapView
public static <T extends Output<T>> DatasetView<T> createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights)
Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
- Type Parameters:
T
- The type of the Output in the dataset.
- Parameters:
dataset
- The dataset to sample from.
size
- The size of the sample.
seed
- A seed for the RNG.
exampleWeights
- The sampling weights for each example, must be in the range [0,1].
- Returns:
- A dataset view containing a weighted bootstrap sample of the supplied dataset.
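Weighted bootstrapping draws each index with probability proportional to its weight. One common way to do this is to build a running total of the weights and locate each uniform draw within it, as in the sketch below. WeightedBootstrapSketch and weightedBootstrapIndices are hypothetical names, and both the cumulative-sum technique and the use of java.util.Random are assumptions made for illustration rather than a description of Tribuo's internals.

```java
import java.util.Random;

/** Hypothetical illustration of weighted sampling with replacement; not Tribuo code. */
public final class WeightedBootstrapSketch {

    /** Draws sampleSize indices, each chosen with probability proportional to its weight. */
    public static int[] weightedBootstrapIndices(float[] weights, int sampleSize, long seed) {
        double[] cumulative = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            // Reject weights outside [0,1], including NaN.
            if (!(weights[i] >= 0.0f && weights[i] <= 1.0f)) {
                throw new IllegalArgumentException("Weight out of range [0,1] at index " + i);
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("At least one weight must be positive");
        }
        Random rng = new Random(seed); // assumption: Tribuo may use a different RNG
        int[] indices = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            double u = rng.nextDouble() * total; // uniform in [0, total)
            int j = 0;
            // Advance to the first index whose running total exceeds u.
            while (j < cumulative.length - 1 && u >= cumulative[j]) {
                j++;
            }
            indices[i] = j;
        }
        return indices;
    }

    public static void main(String[] args) {
        int[] sample = weightedBootstrapIndices(new float[] {0f, 1f, 0f}, 4, 7L);
        for (int index : sample) {
            System.out.println(index); // every draw is index 1, the only non-zero weight
        }
    }
}
```

The linear scan keeps the sketch short; a binary search over the cumulative array would serve the same purpose for large datasets.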
-
createWeightedBootstrapView
public static <T extends Output<T>> DatasetView<T> createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)
Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
This takes the ImmutableFeatureMap and ImmutableOutputInfo parameters to avoid regenerating them.
- Type Parameters:
T
- The type of the Output in the dataset.
- Parameters:
dataset
- The dataset to sample from.
size
- The size of the sample.
seed
- A seed for the RNG.
exampleWeights
- The sampling weights for each example, must be in the range [0,1].
featureIDs
- The featureIDs to use for this dataset.
outputIDs
- The output info to use for this dataset.
- Returns:
- A dataset view containing a weighted bootstrap sample of the supplied dataset.
-
storeIndicesInProvenance
public boolean storeIndicesInProvenance()
Whether the indices are stored in the provenance system.
- Returns:
- True if the indices will be stored in the provenance of this view.
-
setStoreIndices
public void setStoreIndices(boolean storeIndices)
Set to true to store the indices in the provenance system.
- Parameters:
storeIndices
- True if the indices should be stored in the provenance of this view.
-
toString
- Overrides:
toString in class ImmutableDataset<T extends Output<T>>
-
getOutputs
Gets the set of outputs that occur in the examples in this dataset.
- Overrides:
getOutputs in class ImmutableDataset<T extends Output<T>>
- Returns:
- The set of outputs that occur in the examples in this dataset.
-
size
public int size()
Gets the size of the data set.
-
getFeatureMap
Description copied from class: Dataset
Returns this dataset's FeatureMap.
- Overrides:
getFeatureMap in class ImmutableDataset<T extends Output<T>>
- Returns:
- The feature map from this dataset.
-
getOutputInfo
Description copied from class: Dataset
Returns this dataset's OutputInfo.
- Overrides:
getOutputInfo in class ImmutableDataset<T extends Output<T>>
- Returns:
- The output info.
-
iterator
-
getData
Description copied from class: Dataset
Gets the examples as an unmodifiable list. This list will throw an UnsupportedOperationException if any elements are added to it.
In other words, using the following to add additional examples to this dataset will throw an exception:
dataset.getData().add(example)
Instead, use MutableDataset.add(Example).
-
getExample
Description copied from class: Dataset
Gets the example at the supplied index.
Throws IllegalArgumentException if the index is invalid or outside the bounds.
- Overrides:
getExample in class Dataset<T extends Output<T>>
- Parameters:
index
- The index of the example.
- Returns:
- The example.
-
getProvenance
- Specified by:
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<T extends Output<T>>
- Overrides:
getProvenance in class ImmutableDataset<T extends Output<T>>
-
getTag
The tag associated with this dataset, if it exists.
- Returns:
- The dataset tag.
-
getExampleIndices
public int[] getExampleIndices()
Returns a copy of the indices used in this view.
- Returns:
- The indices.
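Returning a copy means callers cannot change which examples the view exposes by editing the array they receive. A minimal sketch of that defensive-copy pattern follows; DefensiveCopySketch is a hypothetical class that only borrows the accessor name for illustration and is not Tribuo code.

```java
/** Hypothetical illustration of returning a defensive copy of an index array. */
public final class DefensiveCopySketch {
    private final int[] indices;

    public DefensiveCopySketch(int[] indices) {
        // Copy on the way in so later changes by the caller are not observed.
        this.indices = indices.clone();
    }

    /** Returns a fresh copy, so edits to the result do not affect this object. */
    public int[] getExampleIndices() {
        return indices.clone();
    }

    public static void main(String[] args) {
        DefensiveCopySketch view = new DefensiveCopySketch(new int[] {3, 1, 4});
        int[] copy = view.getExampleIndices();
        copy[0] = 99; // modifies only the caller's copy
        System.out.println(view.getExampleIndices()[0]); // prints 3
    }
}
```

The cost is one array allocation per call, so code that reads the indices in a loop may prefer to fetch them once and reuse the result.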
-
serialize
public org.tribuo.protos.core.DatasetProto serialize()
Description copied from interface: ProtoSerializable
Serializes this object to a protobuf.
- Specified by:
serialize in interface ProtoSerializable<T extends Output<T>>
- Overrides:
serialize in class ImmutableDataset<T extends Output<T>>
- Returns:
- The protobuf.
-