Class MutableSequenceDataset<T extends Output<T>>
java.lang.Object
org.tribuo.sequence.SequenceDataset<T>
org.tribuo.sequence.MutableSequenceDataset<T>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable, Iterable<SequenceExample<T>>
A MutableSequenceDataset is a SequenceDataset with a MutableFeatureMap which grows over time, as each SequenceExample is added to the dataset.
- See Also:
-
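The growth behaviour described above can be illustrated with a small standard-library sketch. It is not Tribuo's implementation: `GrowingFeatureMapSketch` and its methods are hypothetical names, a sequence is modelled as a list of examples, and each example is reduced to a list of feature names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a dataset whose feature map grows as sequences are added. */
final class GrowingFeatureMapSketch {
    // Feature name -> number of times the feature has been observed so far.
    private final Map<String, Integer> featureCounts = new LinkedHashMap<>();
    private int numSequences = 0;

    /** Adds one sequence; each inner list holds the feature names of one example. */
    void add(List<List<String>> sequence) {
        for (List<String> example : sequence) {
            for (String name : example) {
                // New names enlarge the map, known names only bump their count.
                featureCounts.merge(name, 1, Integer::sum);
            }
        }
        numSequences++;
    }

    int numFeatures() { return featureCounts.size(); }

    int count(String name) { return featureCounts.getOrDefault(name, 0); }

    int size() { return numSequences; }
}
```

Adding a sequence that contains only known feature names leaves `numFeatures()` unchanged, while a sequence with unseen names enlarges the map.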
Field Summary
Fields
Modifier and Type | Field | Description
protected boolean | dense |
protected final MutableFeatureMap | featureMap | A map from feature names to IDs for the features found in this dataset.
protected final MutableOutputInfo<T> | outputInfo | A map from labels to IDs for the labels found in this dataset.
Fields inherited from class org.tribuo.sequence.SequenceDataset
data, outputFactory, sourceProvenance
-
Constructor Summary
Constructors
Constructor | Description
MutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, OutputFactory<T> outputFactory) | Creates a dataset from a data source.
MutableSequenceDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory) | Creates an empty sequence dataset.
MutableSequenceDataset(ImmutableSequenceDataset<T> dataset) |
MutableSequenceDataset(SequenceDataSource<T> dataSource) |
-
Method Summary
Modifier and Type | Method | Description
void | add(SequenceExample<T> ex) | Adds a SequenceExample to this dataset.
void | addAll(Collection<SequenceExample<T>> collection) | Adds all the SequenceExamples in the supplied collection to this dataset.
void | clear() | Clears all the examples out of this dataset, and flushes the FeatureMap, OutputInfo, and transform provenances.
void | densify() | Iterates through the examples, converting implicit zeros into explicit zeros.
ImmutableFeatureMap | getFeatureIDMap() | An immutable view on the feature map.
FeatureMap | getFeatureMap() | The feature map.
ImmutableOutputInfo<T> | getOutputIDInfo() | An immutable view on the output info in this dataset.
OutputInfo<T> | getOutputInfo() | The output info in this dataset.
Set<T> | getOutputs() | Gets the set of labels that occur in the examples in this dataset.
DatasetProvenance | getProvenance() |
boolean | isDense() | Is the dataset dense (i.e., do all features in the domain have a value in each example).
String | toString() |
Methods inherited from class org.tribuo.sequence.SequenceDataset
getData, getExample, getFlatDataset, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, size
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
outputInfo
protected final MutableOutputInfo<T> outputInfo
A map from labels to IDs for the labels found in this dataset.
-
featureMap
protected final MutableFeatureMap featureMap
A map from feature names to IDs for the features found in this dataset.
-
dense
protected boolean dense
-
-
Constructor Details
-
MutableSequenceDataset
public MutableSequenceDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
Creates an empty sequence dataset.
- Parameters:
sourceProvenance - A description of the input data, including preprocessing steps.
outputFactory - The output factory.
-
MutableSequenceDataset
public MutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
Creates a dataset from a data source. This constructor creates the output and feature ID maps that are needed for training and evaluating classifiers.
- Parameters:
dataSource - The input data.
sourceProvenance - A description of the data, including preprocessing steps.
outputFactory - The output factory.
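As a rough illustration of what building an ID map from a data source involves, the sketch below collects the distinct names from an iterable and numbers them. It uses only the standard library, and `IdMapSketch` and `assignIds` are hypothetical names. Numbering the names in lexicographic order is an assumption of this sketch, not a statement about how this class assigns IDs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper that builds a name -> id map from a source of names. */
final class IdMapSketch {
    /**
     * Assigns ids 0..n-1 to the distinct names found in the source.
     * Assumption of this sketch: ids follow the lexicographic order of the names.
     */
    static Map<String, Integer> assignIds(Iterable<? extends Iterable<String>> namesPerItem) {
        // First pass: collect the distinct names in sorted order.
        TreeSet<String> distinct = new TreeSet<>();
        for (Iterable<String> names : namesPerItem) {
            for (String name : names) {
                distinct.add(name);
            }
        }
        // Second pass: number them consecutively.
        Map<String, Integer> ids = new TreeMap<>();
        int next = 0;
        for (String name : distinct) {
            ids.put(name, next++);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Feature names per example; the same helper could number labels.
        System.out.println(assignIds(List.of(List.of("b", "a"), List.of("c", "a"))));
    }
}
```

The same helper can be applied once to feature names and once to labels, which mirrors the two maps the constructor description mentions.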
-
MutableSequenceDataset
-
MutableSequenceDataset
-
-
Method Details
-
clear
public void clear()
Clears all the examples out of this dataset, and flushes the FeatureMap, OutputInfo, and transform provenances.
-
add
public void add(SequenceExample<T> ex)
Adds a SequenceExample to this dataset.
It also canonicalises the reference to each feature's name (i.e., replacing the reference to a feature's name with the canonical one stored in this Dataset's VariableInfo). This greatly reduces the memory footprint.
- Parameters:
ex - The example to add.
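The canonicalisation step can be pictured as interning: equal feature names arriving as separate String objects are replaced by a single shared instance, so duplicates can be garbage collected. The following is a minimal standard-library sketch of that idea, not the library's code; `FeatureNameCanonicaliser` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interner that keeps one canonical String instance per feature name. */
final class FeatureNameCanonicaliser {
    // Maps each distinct name to the first instance of it that was seen.
    private final Map<String, String> canonical = new HashMap<>();

    /** Returns the stored instance equal to name, storing name itself on first sight. */
    String canonicalise(String name) {
        String existing = canonical.putIfAbsent(name, name);
        return existing == null ? name : existing;
    }

    int distinctNames() { return canonical.size(); }
}
```

After canonicalisation, two examples that both carry a feature named `word=the` refer to the same object, so the dataset holds one copy of the name instead of one per occurrence.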
-
addAll
public void addAll(Collection<SequenceExample<T>> collection)
Adds all the SequenceExamples in the supplied collection to this dataset.
- Parameters:
collection - The collection of SequenceExamples.
-
getOutputs
Description copied from class: SequenceDataset
Gets the set of labels that occur in the examples in this dataset.
- Specified by:
getOutputs in class SequenceDataset<T extends Output<T>>
- Returns:
The set of labels that occur in the examples in this dataset.
-
getFeatureIDMap
Description copied from class: SequenceDataset
An immutable view on the feature map.
- Specified by:
getFeatureIDMap in class SequenceDataset<T extends Output<T>>
- Returns:
The feature map.
-
getFeatureMap
Description copied from class: SequenceDataset
The feature map.
- Specified by:
getFeatureMap in class SequenceDataset<T extends Output<T>>
- Returns:
The feature map.
-
getOutputIDInfo
Description copied from class: SequenceDataset
An immutable view on the output info in this dataset.
- Specified by:
getOutputIDInfo in class SequenceDataset<T extends Output<T>>
- Returns:
The output info.
-
getOutputInfo
Description copied from class: SequenceDataset
The output info in this dataset.
- Specified by:
getOutputInfo in class SequenceDataset<T extends Output<T>>
- Returns:
The output info.
-
isDense
public boolean isDense()
Is the dataset dense (i.e., do all features in the domain have a value in each example).
- Returns:
True if the dataset is dense.
-
densify
public void densify()
Iterates through the examples, converting implicit zeros into explicit zeros.
-
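To make the notions of dense and implicit zero concrete, the sketch below models each example as a sparse map from feature name to value. It is a simplified stand-in that uses only the standard library and flattens sequences into a plain list of examples; `DensifySketch` and its methods are hypothetical names, not part of the documented API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of dense checks over sparse examples (feature name -> value). */
final class DensifySketch {
    /** The feature domain: every feature name present in at least one example. */
    static Set<String> domain(List<Map<String, Double>> examples) {
        Set<String> names = new TreeSet<>();
        for (Map<String, Double> example : examples) {
            names.addAll(example.keySet());
        }
        return names;
    }

    /** True if every example stores a value for every feature in the domain. */
    static boolean isDense(List<Map<String, Double>> examples) {
        Set<String> names = domain(examples);
        for (Map<String, Double> example : examples) {
            if (!example.keySet().containsAll(names)) {
                return false;
            }
        }
        return true;
    }

    /** Turns each missing feature (an implicit zero) into a stored 0.0 (an explicit zero). */
    static void densify(List<Map<String, Double>> examples) {
        Set<String> names = domain(examples);
        for (Map<String, Double> example : examples) {
            for (String name : names) {
                example.putIfAbsent(name, 0.0);
            }
        }
    }
}
```

In this model, `densify` mutates the example maps in place, so they need to be mutable maps such as `TreeMap` or `HashMap`.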
toString
-
getProvenance
-