Package org.tribuo
Class ImmutableDataset<T extends Output<T>>
java.lang.Object
org.tribuo.Dataset<T>
org.tribuo.ImmutableDataset<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<Example<T>>
- Direct Known Subclasses:
DatasetView, MinimumCardinalityDataset
This is a Dataset which has an ImmutableFeatureMap to store the feature information.
Whenever an example is added to this dataset, it removes features that do not exist in the FeatureMap.
The dataset is immutable after construction (unless the examples are modified).
This class is mostly for performance optimisations inside the framework, and should not generally be used by external code.
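The filtering described above can be pictured with a minimal sketch in plain Java. It is not Tribuo's implementation: the class name FeatureFilterSketch, the method removeUnknown and the use of a Map to stand in for an example's features are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the filtering step; this is not Tribuo's code.
final class FeatureFilterSketch {

    // Returns a copy of the features, keeping only names present in knownNames.
    static Map<String, Double> removeUnknown(Map<String, Double> features, Set<String> knownNames) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : features.entrySet()) {
            if (knownNames.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The fixed set of known feature names plays the role of the feature map.
        Set<String> knownNames = new HashSet<>(Arrays.asList("height", "width"));
        Map<String, Double> example = new LinkedHashMap<>();
        example.put("height", 1.5);
        example.put("colour", 3.0); // unknown name, removed by the filter
        example.put("width", 0.5);
        System.out.println(removeUnknown(example, knownNames));
    }
}
```

The input map is left untouched, so the caller can still see which features were discarded by comparing the two maps.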
-
Field Summary
protected final boolean dropInvalidExamples
- If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
protected ImmutableFeatureMap featureIDMap
- A map from feature names to IDs for the features found in this dataset.
protected ImmutableOutputInfo<T> outputIDInfo
- Output information, and id numbers for outputs found in this dataset.
Fields inherited from class org.tribuo.Dataset
data, indices, outputFactory, sourceProvenance
-
Constructor Summary
ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
- Creates a dataset from a data source.
ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
- Creates a dataset from a data source.
ImmutableDataset(DataSource<T> dataSource, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
- Creates a dataset from a data source.
ImmutableDataset(DataSource<T> dataSource, Model<T> model, boolean dropInvalidExamples)
- Creates a dataset from a data source.
protected ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory)
- If you call this it's your job to set up outputMap, featureIDMap and fill it with examples.
protected ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
- This is dangerous, and should not be used unless you've overridden everything in ImmutableDataset.
-
Method Summary
protected void add(Example<T> ex)
- Adds an Example to the dataset, which will remove features with unknown names.
protected void add(Example<T> ex, Merger merger)
- Adds an Example to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids (merging duplicate ids).
static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset)
- Creates an immutable deep copy of the supplied dataset.
static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
- Creates an immutable deep copy of the supplied dataset, using a different feature and output map.
static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger)
- Creates an immutable deep copy of the supplied dataset.
boolean getDropInvalidExamples()
- Returns true if this immutable dataset is set to drop invalid examples instead of throwing an exception.
ImmutableFeatureMap getFeatureIDMap()
- Returns or generates an ImmutableFeatureMap.
FeatureMap getFeatureMap()
- Returns this dataset's FeatureMap.
ImmutableOutputInfo<T> getOutputIDInfo()
- Returns or generates an ImmutableOutputInfo.
OutputInfo<T> getOutputInfo()
- Returns this dataset's OutputInfo.
Set<T> getOutputs()
- Gets the set of outputs that occur in the examples in this dataset.
static <T extends Output<T>> ImmutableDataset<T> hashFeatureMap(Dataset<T> dataset, Hasher hasher)
- Creates an immutable shallow copy of the supplied dataset, using the hasher to generate a HashedFeatureMap which transparently maps from the feature name to the hashed variant.
String toString()
Methods inherited from class org.tribuo.Dataset
castDataset, createTransformers, createTransformers, getData, getExample, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, shuffle, size, validate
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
outputIDInfo
protected ImmutableOutputInfo<T> outputIDInfo
Output information, and id numbers for outputs found in this dataset.
-
featureIDMap
protected ImmutableFeatureMap featureIDMap
A map from feature names to IDs for the features found in this dataset.
-
dropInvalidExamples
protected final boolean dropInvalidExamples
If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
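A minimal sketch of the two behaviours this flag selects between, written in plain Java rather than against Tribuo. The class DropInvalidSketch, the method collect, and the rule that an example with no features counts as invalid are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in; this is not Tribuo's code.
final class DropInvalidSketch {
    private static final Logger logger = Logger.getLogger(DropInvalidSketch.class.getName());

    // Each example is modelled as a list of feature names.
    // Assumption for this sketch: an example with no features is invalid.
    static List<List<String>> collect(List<List<String>> source, boolean dropInvalidExamples) {
        List<List<String>> data = new ArrayList<>();
        for (List<String> example : source) {
            if (!example.isEmpty()) {
                data.add(example);
            } else if (dropInvalidExamples) {
                // Lenient mode: warn and skip the example.
                logger.warning("Dropping invalid example");
            } else {
                // Strict mode: fail on the first invalid example.
                throw new IllegalArgumentException("Invalid example found");
            }
        }
        return data;
    }

    public static void main(String[] args) {
        List<List<String>> source = Arrays.asList(
            Arrays.asList("height", "width"),
            Collections.<String>emptyList(),
            Arrays.asList("height"));
        System.out.println(collect(source, true).size());
    }
}
```

Because the flag is final, the choice between strict and lenient handling is fixed for the lifetime of the dataset.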
-
-
Constructor Details
-
ImmutableDataset
protected ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory)
If you call this it's your job to set up outputMap, featureIDMap and fill it with examples.
Note: Sets dropInvalidExamples to false.
- Parameters:
description
- A description of the input data (including preprocessing steps).
outputFactory
- The factory for this output type.
-
ImmutableDataset
public ImmutableDataset(DataSource<T> dataSource, Model<T> model, boolean dropInvalidExamples)
Creates a dataset from a data source. It copies the feature and output maps from the supplied model.
- Parameters:
dataSource
- The examples.
model
- A model to extract feature and output maps from.
dropInvalidExamples
- If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
-
ImmutableDataset
public ImmutableDataset(DataSource<T> dataSource, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
Creates a dataset from a data source. Creates immutable feature and output maps from the supplied ones.
- Parameters:
dataSource
- The examples.
featureIDMap
- The feature map.
outputIDInfo
- The output map.
dropInvalidExamples
- If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
-
ImmutableDataset
public ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
Creates a dataset from a data source. Creates immutable feature and output maps from the supplied ones.
- Parameters:
dataSource
- The examples.
description
- A description of the input data (including preprocessing steps).
outputFactory
- The output factory.
featureIDMap
- The feature id map, used to remove unknown features.
outputIDInfo
- The output id map.
dropInvalidExamples
- If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
-
ImmutableDataset
public ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
Creates a dataset from a data source.
- Parameters:
dataSource
- The examples.
description
- A description of the input data (including preprocessing steps).
outputFactory
- The factory for this output type.
featureIDMap
- The feature id map, used to remove unknown features.
outputIDInfo
- The output id map.
dropInvalidExamples
- If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.
-
ImmutableDataset
protected ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
This is dangerous, and should not be used unless you've overridden everything in ImmutableDataset.
Note: Sets dropInvalidExamples to false.
- Parameters:
description
- A description of the data you're going to add to this dataset.
outputFactory
- The factory for this output type.
featureIDMap
- The feature id map, used to remove unknown features.
outputIDInfo
- The output id map.
-
-
Method Details
-
add
protected void add(Example<T> ex)
Adds an Example to the dataset, which will remove features with unknown names.
- Parameters:
ex
- An Example to add to the dataset.
-
add
protected void add(Example<T> ex, Merger merger)
Adds an Example to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids (merging duplicate ids).
- Parameters:
ex
- The example to add.
merger
- The Merger to use.
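The steps named in this description (insert ids, remove unknown features, sort by id, merge duplicates) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Tribuo's code: IdMergeSketch, toSortedIds and feature are hypothetical names, and a DoubleBinaryOperator stands in for the Merger.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleBinaryOperator;

// Hypothetical stand-in; this is not Tribuo's code.
final class IdMergeSketch {

    // Convenience factory for a (name, value) pair.
    static Map.Entry<String, Double> feature(String name, double value) {
        return new AbstractMap.SimpleEntry<String, Double>(name, value);
    }

    // Looks up each feature's id, skips names with no id, combines values that
    // land on the same id using the merger, and returns the result ordered by id.
    static TreeMap<Integer, Double> toSortedIds(List<Map.Entry<String, Double>> features,
                                                Map<String, Integer> nameToId,
                                                DoubleBinaryOperator merger) {
        TreeMap<Integer, Double> byId = new TreeMap<>();
        for (Map.Entry<String, Double> f : features) {
            Integer id = nameToId.get(f.getKey());
            if (id == null) {
                continue; // unknown feature name, removed
            }
            byId.merge(id, f.getValue(), (a, b) -> merger.applyAsDouble(a, b));
        }
        return byId;
    }

    public static void main(String[] args) {
        Map<String, Integer> nameToId = new HashMap<>();
        nameToId.put("a", 0);
        nameToId.put("b", 1);
        nameToId.put("b2", 1); // two names sharing one id, e.g. after hashing

        List<Map.Entry<String, Double>> features = new ArrayList<>();
        features.add(feature("b", 2.0));
        features.add(feature("unseen", 9.0));
        features.add(feature("a", 1.0));
        features.add(feature("b2", 3.0));

        // Summing is one possible merge rule; others (max, first seen) fit the same slot.
        System.out.println(toSortedIds(features, nameToId, Double::sum));
    }
}
```

Duplicate ids only arise when several names map to the same id, which is why the merge rule is passed in rather than fixed.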
-
getOutputs
Description copied from class: Dataset
Gets the set of outputs that occur in the examples in this dataset.
- Specified by:
getOutputs in class Dataset<T extends Output<T>>
- Returns:
- the set of outputs that occur in the examples in this dataset.
-
getFeatureIDMap
Description copied from class: Dataset
Returns or generates an ImmutableFeatureMap.
- Specified by:
getFeatureIDMap in class Dataset<T extends Output<T>>
- Returns:
- An immutable feature map with id numbers.
-
getFeatureMap
Description copied from class: Dataset
Returns this dataset's FeatureMap.
- Specified by:
getFeatureMap in class Dataset<T extends Output<T>>
- Returns:
- The feature map from this dataset.
-
getOutputIDInfo
Description copied from class: Dataset
Returns or generates an ImmutableOutputInfo.
- Specified by:
getOutputIDInfo in class Dataset<T extends Output<T>>
- Returns:
- An immutable output info.
-
getOutputInfo
Description copied from class: Dataset
Returns this dataset's OutputInfo.
- Specified by:
getOutputInfo in class Dataset<T extends Output<T>>
- Returns:
- The output info.
-
getDropInvalidExamples
public boolean getDropInvalidExamples()
Returns true if this immutable dataset is set to drop invalid examples instead of throwing an exception.
- Returns:
- True if it drops invalid examples.
-
toString
-
getProvenance
-
copyDataset
public static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset)
Creates an immutable deep copy of the supplied dataset.
- Type Parameters:
T
- The type of output.
- Parameters:
dataset
- The dataset to copy.
- Returns:
- An immutable copy of the dataset.
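The difference between a deep copy (as made here) and a shallow copy (as made by hashFeatureMap) matters because the examples themselves can still be modified. A minimal sketch in plain Java, where each example is modelled as a double[] and the names CopyDepthSketch, deepCopy and shallowCopy are hypothetical stand-ins rather than Tribuo code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in; this is not Tribuo's code.
final class CopyDepthSketch {

    // Deep copy: every example is cloned, so later changes to the
    // originals are not visible through the copy.
    static List<double[]> deepCopy(List<double[]> examples) {
        List<double[]> copy = new ArrayList<>(examples.size());
        for (double[] example : examples) {
            copy.add(example.clone());
        }
        return Collections.unmodifiableList(copy);
    }

    // Shallow copy: the list is new but the example objects are shared.
    static List<double[]> shallowCopy(List<double[]> examples) {
        return Collections.unmodifiableList(new ArrayList<>(examples));
    }

    public static void main(String[] args) {
        List<double[]> original = new ArrayList<>();
        original.add(new double[] {1.0, 2.0});

        List<double[]> deep = deepCopy(original);
        List<double[]> shallow = shallowCopy(original);

        original.get(0)[0] = 99.0; // mutate an example after copying

        System.out.println("deep sees:    " + deep.get(0)[0]);
        System.out.println("shallow sees: " + shallow.get(0)[0]);
    }
}
```

Both returned lists reject structural changes such as add or remove, yet only the deep copy is isolated from edits to the original examples.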
-
copyDataset
public static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
Creates an immutable deep copy of the supplied dataset, using a different feature and output map.
- Type Parameters:
T
- The type of output.
- Parameters:
dataset
- The dataset to copy.
featureIDMap
- The new feature map to use. Removes features which are not found in this map.
outputIDInfo
- The new output info to use.
- Returns:
- An immutable copy of the dataset.
-
copyDataset
public static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger)
Creates an immutable deep copy of the supplied dataset.
- Type Parameters:
T
- The type of output.
- Parameters:
dataset
- The dataset to copy.
featureIDMap
- The new feature map to use. Removes features which are not found in this map.
outputIDInfo
- The new output info to use.
merger
- The merge function to use to reduce features given new ids.
- Returns:
- An immutable copy of the dataset.
-
hashFeatureMap
public static <T extends Output<T>> ImmutableDataset<T> hashFeatureMap(Dataset<T> dataset, Hasher hasher)
Creates an immutable shallow copy of the supplied dataset, using the hasher to generate a HashedFeatureMap which transparently maps from the feature name to the hashed variant.
- Type Parameters:
T
- The type of output.
- Parameters:
dataset
- The dataset to copy.
hasher
- The hashing function to use.
- Returns:
- An immutable copy of the dataset.
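One way to picture a lookup that "transparently maps from the feature name to the hashed variant" is the plain Java sketch below. It is not Tribuo's HashedFeatureMap: the class HashedLookupSketch, its getID and size methods, the use of a Function<String, String> in place of a Hasher, and the id assignment order are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical stand-in; this is not Tribuo's code.
final class HashedLookupSketch {
    private final Function<String, String> hasher;
    private final Map<String, Integer> hashedNameToId = new HashMap<>();

    // Stores only the hashed names; ids follow the sorted order of the hashed
    // names (an assumption made so that this sketch is deterministic).
    HashedLookupSketch(Collection<String> names, Function<String, String> hasher) {
        this.hasher = hasher;
        TreeSet<String> hashed = new TreeSet<>();
        for (String name : names) {
            hashed.add(hasher.apply(name));
        }
        int id = 0;
        for (String h : hashed) {
            hashedNameToId.put(h, id++);
        }
    }

    // Callers pass the original name; hashing happens inside the lookup.
    // Returns -1 when the hashed name is not present.
    int getID(String name) {
        Integer id = hashedNameToId.get(hasher.apply(name));
        return id == null ? -1 : id;
    }

    int size() {
        return hashedNameToId.size();
    }

    public static void main(String[] args) {
        // Toy hasher built on String.hashCode(); "Aa" and "BB" collide under it.
        Function<String, String> toyHasher = n -> Integer.toHexString(n.hashCode());
        HashedLookupSketch map = new HashedLookupSketch(Arrays.asList("Aa", "BB", "c"), toyHasher);
        System.out.println(map.getID("Aa") == map.getID("BB"));
        System.out.println(map.size());
    }
}
```

Names that collide under the hasher end up sharing an id, which is the situation where a merge rule for duplicate ids becomes relevant.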
-