Package org.tribuo.sequence
Class ImmutableSequenceDataset<T extends Output<T>>
java.lang.Object
org.tribuo.sequence.SequenceDataset<T>
org.tribuo.sequence.ImmutableSequenceDataset<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<SequenceExample<T>>
- Direct Known Subclasses:
MinimumCardinalitySequenceDataset
public class ImmutableSequenceDataset<T extends Output<T>>
extends SequenceDataset<T>
implements Serializable
This is a SequenceDataset which has an ImmutableFeatureMap to store the feature information.
Whenever an example is added to this dataset it removes features that do not exist in the FeatureMap.
The dataset is immutable after construction (unless the examples are modified).
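The filtering described above can be illustrated with a minimal, library-free sketch. It is not Tribuo's implementation: `FeatureFilterSketch`, `filterAndIndex`, and the use of plain maps for the feature domain and the example are all assumptions made for illustration. It only shows the idea of looking each feature name up in a fixed domain, dropping names that are absent, and keeping the survivors ordered by feature id.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the behaviour described above; no Tribuo classes are used.
final class FeatureFilterSketch {

    // Looks up each feature name in the fixed domain (name -> id).
    // Unknown names are dropped; known ones are stored under their id.
    // A TreeMap keeps the result ordered by feature id.
    static SortedMap<Integer, Double> filterAndIndex(Map<String, Integer> domain,
                                                     Map<String, Double> example) {
        SortedMap<Integer, Double> kept = new TreeMap<>();
        for (Map.Entry<String, Double> feature : example.entrySet()) {
            Integer id = domain.get(feature.getKey());
            if (id != null) {
                kept.put(id, feature.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> domain = Map.of("bias", 0, "word=cat", 1, "word=dog", 2);
        // "word=emu" is not in the domain, so it is removed.
        Map<String, Double> example = Map.of("word=dog", 1.0, "word=emu", 1.0, "bias", 1.0);
        System.out.println(filterAndIndex(domain, example)); // prints {0=1.0, 2=1.0}
    }
}
```

Because the domain is never updated while examples are added, every stored example is guaranteed to use only ids that the domain already knows.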
-
Field Summary
protected ImmutableFeatureMap featureIDMap
- A map from feature names to IDs for the features found in this dataset.
protected ImmutableOutputInfo<T> outputIDInfo
- A map from labels to IDs for the labels found in this dataset.
Fields inherited from class org.tribuo.sequence.SequenceDataset
data, outputFactory, sourceProvenance
-
Constructor Summary
ImmutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, OutputFactory<T> outputFactory)
- Creates a dataset from a data source.
ImmutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, OutputFactory<T> outputFactory)
- Creates a dataset from a data source.
protected ImmutableSequenceDataset(DataProvenance sourceProvenance, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
- This is dangerous, and should not be used unless you've overridden everything in ImmutableSequenceDataset.
protected ImmutableSequenceDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
- If you call this it's your job to set up outputIDInfo and featureIDMap.
ImmutableSequenceDataset(SequenceDataSource<T> dataSource, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo)
- Creates a dataset from a data source, using the specified output and feature domains.
ImmutableSequenceDataset(SequenceDataSource<T> dataSource, SequenceModel<T> model)
- Creates a dataset from a data source, taking the output and feature domains from the supplied model.
-
Method Summary
protected void add(SequenceExample<T> ex)
- Adds a SequenceExample to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids.
protected void add(SequenceExample<T> ex, Merger merger)
- Adds a SequenceExample to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids.
static <T extends Output<T>> ImmutableSequenceDataset<T> copyDataset(SequenceDataset<T> dataset)
- Creates an immutable deep copy of the supplied dataset.
static <T extends Output<T>> ImmutableSequenceDataset<T> copyDataset(SequenceDataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
- Creates an immutable deep copy of the supplied dataset, using a different feature and output map.
static <T extends Output<T>> ImmutableSequenceDataset<T> copyDataset(SequenceDataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger)
- Creates an immutable deep copy of the supplied dataset.
getFeatureIDMap()
- An immutable view on the feature map.
getFeatureMap()
- The feature map.
getOutputIDInfo()
- An immutable view on the output info in this dataset.
getOutputInfo()
- The output info in this dataset.
getOutputs()
- Gets the set of labels that occur in the examples in this dataset.
getProvenance()
toString()
Methods inherited from class org.tribuo.sequence.SequenceDataset
getData, getExample, getFlatDataset, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, size
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
outputIDInfo
A map from labels to IDs for the labels found in this dataset.
-
featureIDMap
A map from feature names to IDs for the features found in this dataset.
-
-
Constructor Details
-
ImmutableSequenceDataset
If you call this it's your job to set up outputIDInfo and featureIDMap.
- Parameters:
sourceProvenance - A description of the dataset including preprocessing steps.
outputFactory - The output factory.
-
ImmutableSequenceDataset
Creates a dataset from a data source, taking the output and feature domains from the supplied model.
- Parameters:
dataSource - The input data.
model - The model to use for the feature and output domains.
-
ImmutableSequenceDataset
public ImmutableSequenceDataset(SequenceDataSource<T> dataSource, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo)
Creates a dataset from a data source, using the specified output and feature domains.
- Parameters:
dataSource - The input data.
featureIDMap - The feature domain.
outputIDInfo - The output domain.
-
ImmutableSequenceDataset
public ImmutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, OutputFactory<T> outputFactory)
Creates a dataset from a data source. This method will create the output and feature ID maps that are needed for training and evaluating classifiers.
- Parameters:
dataSource - The input data.
sourceProvenance - A description of the data.
featureIDMap - The feature map, used to remove unknown features.
outputIDInfo - The output map.
outputFactory - The output factory.
-
ImmutableSequenceDataset
public ImmutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, OutputFactory<T> outputFactory)
Creates a dataset from a data source.
- Parameters:
dataSource - The input data.
sourceProvenance - A description of the data.
featureIDMap - The feature id map, used to remove unknown features.
outputIDInfo - The output id map.
outputFactory - The output factory.
-
ImmutableSequenceDataset
protected ImmutableSequenceDataset(DataProvenance sourceProvenance, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
This is dangerous, and should not be used unless you've overridden everything in ImmutableSequenceDataset.
- Parameters:
sourceProvenance - A description of the data, including all preprocessing.
featureIDMap - The feature id map, used to remove unknown features.
outputIDInfo - The output id map.
-
-
Method Details
-
add
Adds a SequenceExample to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids.
- Parameters:
ex - The example to add.
-
add
Adds a SequenceExample to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids.
- Parameters:
ex - The example to add.
merger - The merger to use to remove duplicate features.
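As a rough illustration of what removing duplicate features with a merger can look like, the sketch below collapses repeated feature ids into one value per id. It does not use Tribuo's Merger type: `DuplicateMergeSketch`, `mergeDuplicates`, and the choice of `java.util.function.DoubleBinaryOperator` as the merge function are assumptions made for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch of merging duplicate features; not Tribuo's implementation.
final class DuplicateMergeSketch {

    // Collapses repeated feature ids into a single value per id.
    // The merge function decides how two values for the same id are combined.
    static TreeMap<Integer, Double> mergeDuplicates(List<Map.Entry<Integer, Double>> features,
                                                    DoubleBinaryOperator merger) {
        TreeMap<Integer, Double> merged = new TreeMap<>();
        for (Map.Entry<Integer, Double> feature : features) {
            merged.merge(feature.getKey(), feature.getValue(),
                    (a, b) -> merger.applyAsDouble(a, b));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Feature id 3 appears twice.
        List<Map.Entry<Integer, Double>> features =
                List.of(Map.entry(3, 1.0), Map.entry(1, 0.5), Map.entry(3, 2.0));
        System.out.println(mergeDuplicates(features, Double::sum)); // prints {1=0.5, 3=3.0}
        System.out.println(mergeDuplicates(features, Math::max));   // prints {1=0.5, 3=2.0}
    }
}
```

Summing suits count-like features, while taking the maximum suits indicator features where a repeated occurrence should not increase the value.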
-
getOutputs
Description copied from class: SequenceDataset
Gets the set of labels that occur in the examples in this dataset.
- Specified by:
getOutputs in class SequenceDataset<T extends Output<T>>
- Returns:
the set of labels that occur in the examples in this dataset.
-
getFeatureIDMap
Description copied from class: SequenceDataset
An immutable view on the feature map.
- Specified by:
getFeatureIDMap in class SequenceDataset<T extends Output<T>>
- Returns:
The feature map.
-
getFeatureMap
Description copied from class: SequenceDataset
The feature map.
- Specified by:
getFeatureMap in class SequenceDataset<T extends Output<T>>
- Returns:
The feature map.
-
getOutputIDInfo
Description copied from class: SequenceDataset
An immutable view on the output info in this dataset.
- Specified by:
getOutputIDInfo in class SequenceDataset<T extends Output<T>>
- Returns:
The output info.
-
getOutputInfo
Description copied from class: SequenceDataset
The output info in this dataset.
- Specified by:
getOutputInfo in class SequenceDataset<T extends Output<T>>
- Returns:
The output info.
-
toString
- Overrides:
toString in class SequenceDataset<T extends Output<T>>
-
getProvenance
-
copyDataset
public static <T extends Output<T>> ImmutableSequenceDataset<T> copyDataset(SequenceDataset<T> dataset)
Creates an immutable deep copy of the supplied dataset.
- Type Parameters:
T - The type of output.
- Parameters:
dataset - The dataset to copy.
- Returns:
An immutable copy of the dataset.
-
copyDataset
public static <T extends Output<T>> ImmutableSequenceDataset<T> copyDataset(SequenceDataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
Creates an immutable deep copy of the supplied dataset, using a different feature and output map.
- Type Parameters:
T - The type of output.
- Parameters:
dataset - The dataset to copy.
featureIDMap - The new feature map to use. Removes features which are not found in this map.
outputIDInfo - The new output info to use.
- Returns:
An immutable copy of the dataset.
-
copyDataset
public static <T extends Output<T>> ImmutableSequenceDataset<T> copyDataset(SequenceDataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger)
Creates an immutable deep copy of the supplied dataset.
- Type Parameters:
T - The type of output.
- Parameters:
dataset - The dataset to copy.
featureIDMap - The new feature map to use. Removes features which are not found in this map.
outputIDInfo - The new output info to use.
merger - The merge function to use to reduce features given new ids.
- Returns:
An immutable copy of the dataset.
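The combination of a deep copy with a new feature domain can be sketched without the library as follows. This is an illustration under stated assumptions, not the copyDataset implementation: `DeepCopySketch`, `copyWithDomain`, and the representation of examples as name-to-value maps are hypothetical. The point is that each example is copied into a fresh container, restricted to the new domain, and exposed read-only, so later changes to the source do not reach the copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of an immutable deep copy restricted to a new feature domain.
final class DeepCopySketch {

    // Copies every example into a fresh map, keeps only the features named in
    // newDomain, and wraps the result so that it cannot be modified.
    static List<Map<String, Double>> copyWithDomain(List<Map<String, Double>> examples,
                                                    Set<String> newDomain) {
        List<Map<String, Double>> copy = new ArrayList<>();
        for (Map<String, Double> example : examples) {
            Map<String, Double> fresh = new TreeMap<>(example); // independent of the source map
            fresh.keySet().retainAll(newDomain);                // drop features outside the domain
            copy.add(Collections.unmodifiableMap(fresh));
        }
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) {
        Map<String, Double> example = new HashMap<>(Map.of("a", 1.0, "b", 2.0));
        List<Map<String, Double>> original = List.of(example);
        List<Map<String, Double>> copy = copyWithDomain(original, Set.of("a"));
        example.put("a", 99.0);    // mutate the source after copying
        System.out.println(copy);  // prints [{a=1.0}]
    }
}
```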
-