public class ImmutableSequenceDataset<T extends Output<T>> extends SequenceDataset<T> implements Serializable
A SequenceDataset which has an ImmutableFeatureMap to store the feature information.
Whenever an example is added to this dataset it removes features that do not exist in the FeatureMap.
The dataset is immutable after construction (unless the examples are modified).

Modifier and Type | Field and Description |
---|---|
protected ImmutableFeatureMap | featureIDMap - A map from feature names to IDs for the features found in this dataset. |
protected ImmutableOutputInfo<T> | outputIDInfo - A map from labels to IDs for the labels found in this dataset. |

Fields inherited from class SequenceDataset: data, outputFactory, sourceProvenance
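
The caveat in the class description, "immutable after construction (unless the examples are modified)", reads like the usual shallow-immutability pattern in Java: the collection is fixed, but the objects inside it can still change. The sketch below illustrates that pattern only. `FrozenDataset` and `MutableExample` are hypothetical stand-ins and are not Tribuo classes, and whether Tribuo implements the guarantee this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ShallowImmutabilitySketch {

    /** Hypothetical example type whose state can change after creation. */
    static final class MutableExample {
        double weight;

        MutableExample(double weight) {
            this.weight = weight;
        }
    }

    /** Hypothetical dataset whose membership is fixed at construction. */
    static final class FrozenDataset {
        private final List<MutableExample> data;

        FrozenDataset(Iterable<MutableExample> source) {
            List<MutableExample> copy = new ArrayList<>();
            for (MutableExample e : source) {
                copy.add(e);
            }
            // The list can no longer grow or shrink through this view.
            this.data = Collections.unmodifiableList(copy);
        }

        List<MutableExample> getData() {
            return data;
        }
    }

    /** Returns true if adding through the exposed view is rejected. */
    static boolean rejectsAdd(FrozenDataset ds) {
        try {
            ds.getData().add(new MutableExample(0.0));
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<MutableExample> source = new ArrayList<>();
        source.add(new MutableExample(1.0));
        FrozenDataset ds = new FrozenDataset(source);

        System.out.println(rejectsAdd(ds));              // prints "true"
        ds.getData().get(0).weight = 5.0;                // the element itself is still mutable
        System.out.println(ds.getData().get(0).weight);  // prints "5.0"
    }
}
```

The practical reading is that code holding a reference to an example can still alter it, so the immutability guarantee covers the dataset's membership and its feature and output maps rather than the examples themselves.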
Modifier | Constructor and Description |
---|---|
protected | ImmutableSequenceDataset(DataProvenance sourceProvenance, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo) - This is dangerous, and should not be used unless you've overridden everything in ImmutableSequenceDataset. |
protected | ImmutableSequenceDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory) - If you call this it's your job to set up outputIDInfo and featureIDMap. |
public | ImmutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, OutputFactory<T> outputFactory) - Creates a dataset from a data source. |
public | ImmutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, OutputFactory<T> outputFactory) - Creates a dataset from a data source. |
public | ImmutableSequenceDataset(SequenceDataSource<T> dataSource, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo) |
public | ImmutableSequenceDataset(SequenceDataSource<T> dataSource, SequenceModel<T> model) |

Modifier and Type | Method and Description |
---|---|
protected void | add(SequenceExample<T> ex) - Adds a SequenceExample to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids. |
protected void | add(SequenceExample<T> ex, Merger merger) - Adds a SequenceExample to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids. |
static <T extends Output<T>> ImmutableSequenceDataset<T> | copyDataset(SequenceDataset<T> dataset) - Creates an immutable deep copy of the supplied dataset. |
static <T extends Output<T>> ImmutableSequenceDataset<T> | copyDataset(SequenceDataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo) - Creates an immutable deep copy of the supplied dataset, using a different feature and output map. |
static <T extends Output<T>> ImmutableSequenceDataset<T> | copyDataset(SequenceDataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger) - Creates an immutable deep copy of the supplied dataset. |
ImmutableFeatureMap | getFeatureIDMap() - An immutable view on the feature map. |
ImmutableFeatureMap | getFeatureMap() - The feature map. |
ImmutableOutputInfo<T> | getOutputIDInfo() - An immutable view on the output info in this dataset. |
ImmutableOutputInfo<T> | getOutputInfo() - The output info in this dataset. |
Set<T> | getOutputs() - Gets the set of labels that occur in the examples in this dataset. |
DatasetProvenance | getProvenance() |
String | toString() |

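Methods inherited from class SequenceDataset: getData, getExample, getFlatDataset, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, size
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator
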
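
The `add` methods are documented as inserting feature ids, removing unknown features and sorting by feature id. The following is a minimal, self-contained sketch of those three steps, not Tribuo's implementation. The `Feature` class and `prepare` method are hypothetical, a plain `Map<String, Integer>` stands in for the ImmutableFeatureMap, and "sort by the feature ids" is read here as ordering the features within one example, which is an interpretation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AddSketch {

    /** Hypothetical feature: a name, a value, and an id assigned on lookup. */
    static final class Feature {
        final String name;
        final double value;
        int id = -1; // -1 means no id has been assigned yet

        Feature(String name, double value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return id + ":" + name + "=" + value;
        }
    }

    /**
     * Looks up each feature's id, drops features missing from the map,
     * and returns the survivors ordered by id.
     */
    static List<Feature> prepare(List<Feature> example, Map<String, Integer> featureIds) {
        List<Feature> kept = new ArrayList<>();
        for (Feature f : example) {
            Integer id = featureIds.get(f.name);
            if (id == null) {
                continue; // unknown feature: removed
            }
            Feature copy = new Feature(f.name, f.value);
            copy.id = id; // insert the feature id
            kept.add(copy);
        }
        kept.sort((a, b) -> Integer.compare(a.id, b.id)); // sort by feature id
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> featureIds = new HashMap<>();
        featureIds.put("bias", 0);
        featureIds.put("cat", 1);
        featureIds.put("dog", 2);

        List<Feature> example = Arrays.asList(
                new Feature("dog", 1.0),
                new Feature("emu", 1.0),   // not in the map
                new Feature("bias", 1.0));

        System.out.println(prepare(example, featureIds)); // prints "[0:bias=1.0, 2:dog=1.0]"
    }
}
```
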
protected ImmutableOutputInfo<T extends Output<T>> outputIDInfo
protected ImmutableFeatureMap featureIDMap
protected ImmutableSequenceDataset(DataProvenance sourceProvenance, OutputFactory<T> outputFactory)
If you call this it's your job to set up outputIDInfo and featureIDMap.
sourceProvenance - A description of the dataset including preprocessing steps.
outputFactory - The output factory.

public ImmutableSequenceDataset(SequenceDataSource<T> dataSource, SequenceModel<T> model)

public ImmutableSequenceDataset(SequenceDataSource<T> dataSource, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo)

public ImmutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, OutputFactory<T> outputFactory)
Creates a dataset from a data source.
dataSource - The input data.
sourceProvenance - A description of the data.
featureIDMap - The feature map, used to remove unknown features.
outputIDInfo - The output map.
outputFactory - The output factory.

public ImmutableSequenceDataset(Iterable<SequenceExample<T>> dataSource, DataProvenance sourceProvenance, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, OutputFactory<T> outputFactory)
Creates a dataset from a data source.
dataSource - The input data.
sourceProvenance - A description of the data.
featureIDMap - The feature id map, used to remove unknown features.
outputIDInfo - The output id map.
outputFactory - The output factory.

protected ImmutableSequenceDataset(DataProvenance sourceProvenance, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
This is dangerous, and should not be used unless you've overridden everything in ImmutableSequenceDataset.
sourceProvenance - A description of the data, including all preprocessing.
featureIDMap - The feature id map, used to remove unknown features.
outputIDInfo - The output id map.

protected void add(SequenceExample<T> ex)
Adds a SequenceExample to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids.
ex - The example to add.

protected void add(SequenceExample<T> ex, Merger merger)
Adds a SequenceExample to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids.
ex - The example to add.
merger - The merger to use to remove duplicate features.

public Set<T> getOutputs()
Gets the set of labels that occur in the examples in this dataset.
getOutputs in class SequenceDataset<T extends Output<T>>

public ImmutableFeatureMap getFeatureIDMap()
An immutable view on the feature map.
getFeatureIDMap in class SequenceDataset<T extends Output<T>>

public ImmutableFeatureMap getFeatureMap()
The feature map.
getFeatureMap in class SequenceDataset<T extends Output<T>>

public ImmutableOutputInfo<T> getOutputIDInfo()
An immutable view on the output info in this dataset.
getOutputIDInfo in class SequenceDataset<T extends Output<T>>

public ImmutableOutputInfo<T> getOutputInfo()
The output info in this dataset.
getOutputInfo in class SequenceDataset<T extends Output<T>>

public DatasetProvenance getProvenance()
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>
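
Both `add(SequenceExample<T>, Merger)` and the `copyDataset` overload that accepts a Merger describe the merger as the function used to remove or reduce duplicate features. One plausible reading is that when two features in an example resolve to the same id, their values are combined into one. The sketch below shows that reading under stated assumptions: `mergeById` is hypothetical, `DoubleBinaryOperator` stands in for Tribuo's Merger, and a plain `Map<String, Integer>` stands in for the feature map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleBinaryOperator;

final class MergerSketch {

    /**
     * Maps feature names to ids, drops names missing from the map, and
     * combines values that land on the same id using the supplied merger.
     * The result is keyed and ordered by feature id.
     */
    static TreeMap<Integer, Double> mergeById(String[] names,
                                              double[] values,
                                              Map<String, Integer> featureIds,
                                              DoubleBinaryOperator merger) {
        TreeMap<Integer, Double> merged = new TreeMap<>();
        for (int i = 0; i < names.length; i++) {
            Integer id = featureIds.get(names[i]);
            if (id == null) {
                continue; // unknown feature: dropped
            }
            // First occurrence is stored as-is; later ones go through the merger.
            merged.merge(id, values[i], (a, b) -> merger.applyAsDouble(a, b));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> featureIds = new HashMap<>();
        featureIds.put("cat", 0);
        featureIds.put("the", 1);

        String[] names = {"the", "cat", "the", "emu"};
        double[] values = {1.0, 1.0, 2.0, 5.0};

        // Keep the largest value among duplicates.
        System.out.println(mergeById(names, values, featureIds, Math::max));    // prints "{0=1.0, 1=2.0}"
        // Sum the values of duplicates.
        System.out.println(mergeById(names, values, featureIds, Double::sum));  // prints "{0=1.0, 1=3.0}"
    }
}
```

Which combination rule is appropriate depends on what the feature values mean: summing suits counts, while taking the maximum suits presence indicators.
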

public static <T extends Output<T>> ImmutableSequenceDataset<T> copyDataset(SequenceDataset<T> dataset)
Creates an immutable deep copy of the supplied dataset.
T - The type of output.
dataset - The dataset to copy.

public static <T extends Output<T>> ImmutableSequenceDataset<T> copyDataset(SequenceDataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
Creates an immutable deep copy of the supplied dataset, using a different feature and output map.
T - The type of output.
dataset - The dataset to copy.
featureIDMap - The new feature map to use. Removes features which are not found in this map.
outputIDInfo - The new output info to use.

public static <T extends Output<T>> ImmutableSequenceDataset<T> copyDataset(SequenceDataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger)
Creates an immutable deep copy of the supplied dataset.
T - The type of output.
dataset - The dataset to copy.
featureIDMap - The new feature map to use. Removes features which are not found in this map.
outputIDInfo - The new output info to use.
merger - The merge function to use to reduce features given new ids.

Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.