Class ImmutableDataset<T extends Output<T>>
java.lang.Object
org.tribuo.Dataset<T>
org.tribuo.ImmutableDataset<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<Example<T>>
- Direct Known Subclasses:
DatasetView, MinimumCardinalityDataset
This is a Dataset which has an ImmutableFeatureMap to store the feature information.
Whenever an example is added to this dataset, any features that do not exist in the FeatureMap are removed from it.
The dataset is immutable after construction (unless the examples themselves are modified).
This class is mostly for performance optimisations inside the framework, and should not generally be used by external code.
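The filtering behaviour described above can be pictured with a small, self-contained sketch. It does not use Tribuo: the class `UnknownFeatureFilter`, its `Feature` record, and the plain `Map<String, Integer>` standing in for an ImmutableFeatureMap are all hypothetical, chosen only to show how features whose names are absent from a fixed map are dropped while known names are tagged with their ids.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Tribuo's implementation): an example's features are
 * checked against a fixed name-to-id map, and any feature whose name is not in
 * the map is silently dropped.
 */
public class UnknownFeatureFilter {

    /** A feature as a (name, id, value) triple, with the id taken from the map. */
    public record Feature(String name, int id, double value) {}

    /**
     * Looks up each feature name in the fixed map. Unknown names are dropped;
     * known names are tagged with their integer id.
     */
    public static List<Feature> filter(Map<String, Double> example, Map<String, Integer> featureIds) {
        List<Feature> kept = new ArrayList<>();
        for (Map.Entry<String, Double> e : example.entrySet()) {
            Integer id = featureIds.get(e.getKey());
            if (id != null) {
                kept.add(new Feature(e.getKey(), id, e.getValue()));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> featureIds = Map.of("age", 0, "height", 1);
        Map<String, Double> example = new LinkedHashMap<>();
        example.put("age", 42.0);
        example.put("colour", 3.0); // not in the map, so it is removed
        example.put("height", 1.8);
        System.out.println(filter(example, featureIds));
    }
}
```

Running `main` keeps only the `age` and `height` features; `colour` is discarded because the fixed map has no id for it.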
- See Also:
-
Field Summary
Fields
- protected final boolean dropInvalidExamples: If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
- protected ImmutableFeatureMap featureIDMap: A map from feature names to IDs for the features found in this dataset.
- protected ImmutableOutputInfo<T> outputIDInfo: Output information, and id numbers for outputs found in this dataset.
Fields inherited from class org.tribuo.Dataset:
data, indices, outputFactory, sourceProvenance
-
Constructor Summary
Constructors
- ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples): Creates a dataset from a data source.
- ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, boolean dropInvalidExamples): Creates a dataset from a data source.
- ImmutableDataset(DataSource<T> dataSource, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples): Creates a dataset from a data source.
- ImmutableDataset(DataSource<T> dataSource, Model<T> model, boolean dropInvalidExamples): Creates a dataset from a data source.
- protected ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory): If you call this, it is your job to set up outputIDInfo and featureIDMap and to fill the dataset with examples.
- protected ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo): This is dangerous, and should not be used unless you've overridden everything in ImmutableDataset.
-
Method Summary
- protected void add: Adds an Example to the dataset, which will remove features with unknown names.
- protected void add: Adds an Example to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids (merging duplicate ids).
- static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset): Creates an immutable deep copy of the supplied dataset.
- static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo): Creates an immutable deep copy of the supplied dataset, using a different feature and output map.
- static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger): Creates an immutable deep copy of the supplied dataset.
- boolean getDropInvalidExamples(): Returns true if this immutable dataset was configured to drop invalid examples (rather than throw an exception) on construction.
- getFeatureIDMap(): Returns or generates an ImmutableFeatureMap.
- getFeatureMap(): Returns this dataset's FeatureMap.
- getOutputIDInfo(): Returns or generates an ImmutableOutputInfo.
- getOutputInfo(): Returns this dataset's OutputInfo.
- getOutputs(): Gets the set of outputs that occur in the examples in this dataset.
- static <T extends Output<T>> ImmutableDataset<T> hashFeatureMap(Dataset<T> dataset, Hasher hasher): Creates an immutable shallow copy of the supplied dataset, using the hasher to generate a HashedFeatureMap which transparently maps from the feature name to the hashed variant.
- toString()
Methods inherited from class org.tribuo.Dataset:
createTransformers, createTransformers, getData, getExample, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, shuffle, size
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable:
forEach, spliterator
-
Field Details
-
outputIDInfo
protected ImmutableOutputInfo<T> outputIDInfo
Output information, and id numbers for outputs found in this dataset.
-
featureIDMap
protected ImmutableFeatureMap featureIDMap
A map from feature names to IDs for the features found in this dataset.
-
dropInvalidExamples
protected final boolean dropInvalidExamples
If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
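As a rough illustration of what this flag controls, the sketch below builds a list either by throwing on an invalid entry or by logging a warning and skipping it. The class `DropInvalidPolicy` is hypothetical, and treating an empty feature list as "invalid" is an assumption made for the sketch; the validity rules ImmutableDataset actually applies may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of a drop-or-throw policy for invalid examples.
 * An "example" is modelled as a list of feature names, and the validity
 * rule (non-empty) is an assumption for illustration only.
 */
public class DropInvalidPolicy {
    private static final Logger LOGGER = Logger.getLogger(DropInvalidPolicy.class.getName());

    public static List<List<String>> build(List<List<String>> source, boolean dropInvalidExamples) {
        List<List<String>> data = new ArrayList<>();
        for (List<String> example : source) {
            boolean valid = !example.isEmpty(); // assumed validity rule for this sketch
            if (valid) {
                data.add(example);
            } else if (dropInvalidExamples) {
                // Policy "drop": log a warning and skip the example.
                LOGGER.warning("Dropping invalid example: " + example);
            } else {
                // Policy "throw": abort construction.
                throw new IllegalArgumentException("Invalid example: " + example);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        List<List<String>> source = List.of(List.of("f1", "f2"), List.<String>of(), List.of("f3"));
        System.out.println(build(source, true).size()); // 2
        try {
            build(source, false);
        } catch (IllegalArgumentException e) {
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```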
-
Constructor Details
-
ImmutableDataset
protected ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory)
If you call this, it is your job to set up outputIDInfo and featureIDMap and to fill the dataset with examples.
Note: Sets dropInvalidExamples to false.
- Parameters:
description - A description of the input data (including preprocessing steps).
outputFactory - The factory for this output type.
-
ImmutableDataset
public ImmutableDataset(DataSource<T> dataSource, Model<T> model, boolean dropInvalidExamples)
Creates a dataset from a data source. It copies the feature and output maps from the supplied model.
- Parameters:
dataSource - The examples.
model - A model to extract feature and output maps from.
dropInvalidExamples - If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
-
ImmutableDataset
public ImmutableDataset(DataSource<T> dataSource, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
Creates a dataset from a data source. Creates immutable feature and output maps from the supplied ones.
- Parameters:
dataSource - The examples.
featureIDMap - The feature map.
outputIDInfo - The output map.
dropInvalidExamples - If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
-
ImmutableDataset
public ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
Creates a dataset from a data source. Creates immutable feature and output maps from the supplied ones.
- Parameters:
dataSource - The examples.
description - A description of the input data (including preprocessing steps).
outputFactory - The output factory.
featureIDMap - The feature id map, used to remove unknown features.
outputIDInfo - The output id map.
dropInvalidExamples - If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
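The step "creates immutable feature and output maps from the supplied ones" can be illustrated, under assumptions, by freezing a collection of observed names into a read-only name-to-id map. The class `FreezeFeatureMap` is hypothetical, and assigning ids in sorted name order is a choice made here for determinism, not a description of how ImmutableFeatureMap numbers its features.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch: freeze observed feature names into an unmodifiable
 * name -> id map. Sorted-order id assignment is an assumption of this sketch.
 */
public class FreezeFeatureMap {
    public static Map<String, Integer> freeze(Iterable<String> observedNames) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String name : observedNames) {
            sorted.add(name); // deduplicates and sorts the names
        }
        Map<String, Integer> ids = new TreeMap<>();
        int next = 0;
        for (String name : sorted) {
            ids.put(name, next++); // contiguous ids starting at 0
        }
        return Collections.unmodifiableMap(ids);
    }

    public static void main(String[] args) {
        Map<String, Integer> ids = freeze(List.of("width", "age", "height", "age"));
        System.out.println(ids); // {age=0, height=1, width=2}
        try {
            ids.put("colour", 3);
        } catch (UnsupportedOperationException e) {
            System.out.println("map is read-only");
        }
    }
}
```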
-
ImmutableDataset
public ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
Creates a dataset from a data source.
- Parameters:
dataSource - The examples.
description - A description of the input data (including preprocessing steps).
outputFactory - The factory for this output type.
featureIDMap - The feature id map, used to remove unknown features.
outputIDInfo - The output id map.
dropInvalidExamples - If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
-
ImmutableDataset
protected ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
This is dangerous, and should not be used unless you've overridden everything in ImmutableDataset.
Note: Sets dropInvalidExamples to false.
- Parameters:
description - A description of the data you're going to add to this dataset.
outputFactory - The factory for this output type.
featureIDMap - The feature id map, used to remove unknown features.
outputIDInfo - The output id map.
-
Method Details
-
add
-
add
-
getOutputs
-
getFeatureIDMap
Description copied from class: Dataset
Returns or generates an ImmutableFeatureMap.
- Specified by:
getFeatureIDMap in class Dataset<T extends Output<T>>
- Returns:
- An immutable feature map with id numbers.
-
getFeatureMap
Description copied from class: Dataset
Returns this dataset's FeatureMap.
- Specified by:
getFeatureMap in class Dataset<T extends Output<T>>
- Returns:
- The feature map from this dataset.
-
getOutputIDInfo
Description copied from class: Dataset
Returns or generates an ImmutableOutputInfo.
- Specified by:
getOutputIDInfo in class Dataset<T extends Output<T>>
- Returns:
- An immutable output info.
-
getOutputInfo
Description copied from class: Dataset
Returns this dataset's OutputInfo.
- Specified by:
getOutputInfo in class Dataset<T extends Output<T>>
- Returns:
- The output info.
-
getDropInvalidExamples
public boolean getDropInvalidExamples()
Returns true if this immutable dataset was configured to drop invalid examples (rather than throw an exception) on construction.
- Returns:
- True if it drops invalid examples.
-
toString
-
getProvenance
-
copyDataset
public static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset)
Creates an immutable deep copy of the supplied dataset.
- Type Parameters:
T - The type of output.
- Parameters:
dataset - The dataset to copy.
- Returns:
- An immutable copy of the dataset.
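The deep copy made here differs from the shallow copy produced by hashFeatureMap in whether the example objects are shared. The toy sketch below (hypothetical `CopySemantics` and `ToyExample` classes, not Tribuo types) shows why that matters, given that the dataset is only immutable as long as its examples are not modified: editing a source example afterwards is visible through a shallow copy but not through a deep one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of deep versus shallow copies of a list of mutable
 * examples. Neither method mirrors Tribuo's actual copying code.
 */
public class CopySemantics {
    /** A toy mutable example holding a single feature value. */
    public static final class ToyExample {
        public double value;

        public ToyExample(double value) {
            this.value = value;
        }

        public ToyExample copy() {
            return new ToyExample(value);
        }
    }

    /** Deep copy: every example is cloned, so the copy owns its objects. */
    public static List<ToyExample> deepCopy(List<ToyExample> source) {
        List<ToyExample> out = new ArrayList<>();
        for (ToyExample e : source) {
            out.add(e.copy());
        }
        return Collections.unmodifiableList(out);
    }

    /** Shallow copy: a new read-only list that shares the example objects. */
    public static List<ToyExample> shallowCopy(List<ToyExample> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<ToyExample> source = new ArrayList<>(List.of(new ToyExample(1.0)));
        List<ToyExample> deep = deepCopy(source);
        List<ToyExample> shallow = shallowCopy(source);
        source.get(0).value = 99.0; // mutate the original example
        System.out.println(deep.get(0).value + " " + shallow.get(0).value); // 1.0 99.0
    }
}
```

Note that both copies are read-only lists, yet the shallow one still changes when a shared example is mutated; list immutability alone does not protect the examples.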
-
copyDataset
public static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
Creates an immutable deep copy of the supplied dataset, using a different feature and output map.
- Type Parameters:
T - The type of output.
- Parameters:
dataset - The dataset to copy.
featureIDMap - The new feature map to use. Removes features which are not found in this map.
outputIDInfo - The new output info to use.
- Returns:
- An immutable copy of the dataset.
-
copyDataset
public static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger)
Creates an immutable deep copy of the supplied dataset.
- Type Parameters:
T - The type of output.
- Parameters:
dataset - The dataset to copy.
featureIDMap - The new feature map to use. Removes features which are not found in this map.
outputIDInfo - The new output info to use.
merger - The merge function to use to reduce features given new ids.
- Returns:
- An immutable copy of the dataset.
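One way to picture the merger's role: after features are given new ids, several old features may share one id, so the entries are sorted by id and the values of duplicates are combined. In this sketch a `java.util.function.DoubleBinaryOperator` stands in for Tribuo's Merger, and `MergeDuplicateIds` and its `IdValue` record are hypothetical names; summing duplicates is just one possible merge function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

/**
 * Hypothetical sketch: sort (id, value) pairs by id and combine the values
 * of pairs sharing an id with a merge function.
 */
public class MergeDuplicateIds {
    public record IdValue(int id, double value) {}

    public static List<IdValue> sortAndMerge(List<IdValue> features, DoubleBinaryOperator merger) {
        List<IdValue> sorted = new ArrayList<>(features);
        sorted.sort(Comparator.comparingInt(IdValue::id));
        List<IdValue> merged = new ArrayList<>();
        for (IdValue f : sorted) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).id() == f.id()) {
                // Same id as the previous entry: combine the two values.
                IdValue prev = merged.get(last);
                merged.set(last, new IdValue(f.id(), merger.applyAsDouble(prev.value(), f.value())));
            } else {
                merged.add(f);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two old features were remapped onto new id 3.
        List<IdValue> remapped = Arrays.asList(
                new IdValue(3, 1.0), new IdValue(1, 2.0), new IdValue(3, 4.0));
        System.out.println(sortAndMerge(remapped, Double::sum));
    }
}
```

Passing `Math::max` instead of `Double::sum` would keep the larger value for each duplicated id; the choice of merge function is what the merger parameter exposes.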
-
hashFeatureMap
public static <T extends Output<T>> ImmutableDataset<T> hashFeatureMap(Dataset<T> dataset, Hasher hasher)
Creates an immutable shallow copy of the supplied dataset, using the hasher to generate a HashedFeatureMap which transparently maps from the feature name to the hashed variant.
- Type Parameters:
T - The type of output.
- Parameters:
dataset - The dataset to copy.
hasher - The hashing function to use.
- Returns:
- An immutable copy of the dataset.
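A hashed feature map can be sketched as a map whose keys are hashed names, with every lookup hashing the caller's original name first so the hashed variant never has to be handled directly. `HashedNameMap` is hypothetical, and the salted SHA-256 plus Base64 scheme is an assumption for illustration rather than the behaviour of any particular Tribuo Hasher.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a hashed feature map: ids are stored under hashed
 * names, and lookups by original name are hashed first.
 */
public class HashedNameMap {
    private final String salt;
    private final Map<String, Integer> idsByHashedName = new HashMap<>();

    public HashedNameMap(String salt, Iterable<String> names) {
        this.salt = salt;
        int next = 0;
        for (String name : names) {
            String hashed = hash(name);
            if (!idsByHashedName.containsKey(hashed)) {
                idsByHashedName.put(hashed, next++);
            }
        }
    }

    /** Salted SHA-256 of the name, Base64-encoded (assumed scheme). */
    public String hash(String name) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((salt + name).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Transparent lookup by original name; returns null for unknown names. */
    public Integer getId(String originalName) {
        return idsByHashedName.get(hash(originalName));
    }

    public static void main(String[] args) {
        HashedNameMap map = new HashedNameMap("example-salt", List.of("age", "height"));
        System.out.println(map.getId("age") + " " + map.getId("height") + " " + map.getId("colour"));
    }
}
```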
-