public class ImmutableDataset<T extends Output<T>> extends Dataset<T> implements Serializable
A Dataset which has an ImmutableFeatureMap to store the feature information. Whenever an example is added to this dataset, it removes features that do not exist in the FeatureMap.
The dataset is immutable after construction (unless the examples are modified).
This class is mostly for performance optimisations inside the framework, and should not generally be used by external code.
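The feature-removal behaviour described above can be illustrated with a small stand-alone sketch. This is not Tribuo code: the class `FeatureFilterSketch`, the method `removeUnknown`, and the use of plain `java.util` maps in place of `Example` and `ImmutableFeatureMap` are assumptions made purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: plain maps stand in for Example and ImmutableFeatureMap.
final class FeatureFilterSketch {

    /**
     * Returns a copy of the features that keeps only the names present
     * as keys in the fixed name-to-id map; the input is not modified.
     */
    static Map<String, Double> removeUnknown(Map<String, Double> features,
                                             Map<String, Integer> nameToId) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> f : features.entrySet()) {
            if (nameToId.containsKey(f.getKey())) {
                kept.put(f.getKey(), f.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The fixed feature map knows two names.
        Map<String, Integer> nameToId = Map.of("height", 0, "weight", 1);

        Map<String, Double> example = new LinkedHashMap<>();
        example.put("height", 1.8);
        example.put("shoe-size", 44.0); // unknown name, will be removed
        example.put("weight", 75.0);

        System.out.println(removeUnknown(example, nameToId)); // prints {height=1.8, weight=75.0}
    }
}
```

Because the name-to-id map never grows, a feature name that was not seen when the map was built can never enter the dataset.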
Modifier and Type | Field and Description |
---|---|
protected boolean | dropInvalidExamples: If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it. |
protected ImmutableFeatureMap | featureIDMap: A map from feature names to IDs for the features found in this dataset. |
protected ImmutableOutputInfo<T> | outputIDInfo: Output information, and id numbers for outputs found in this dataset. |

Fields inherited from class Dataset: data, indices, outputFactory, sourceProvenance
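The two policies selected by the dropInvalidExamples flag (log a warning and drop, or throw an exception) can be sketched as follows. This is a stand-alone illustration rather than the class's implementation: `DropInvalidSketch`, `collect`, and in particular the validity rule in `isValid` (an example must have at least one feature) are hypothetical choices for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Toy model only: each example is represented as a list of feature names.
final class DropInvalidSketch {
    private static final Logger logger = Logger.getLogger(DropInvalidSketch.class.getName());

    /** Hypothetical validity rule for this sketch: at least one feature. */
    static boolean isValid(List<String> featureNames) {
        return !featureNames.isEmpty();
    }

    /**
     * Collects the valid examples. An invalid example is either dropped
     * with a logged warning or rejected with an exception, depending on the flag.
     */
    static List<List<String>> collect(List<List<String>> examples, boolean dropInvalidExamples) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> ex : examples) {
            if (isValid(ex)) {
                kept.add(ex);
            } else if (dropInvalidExamples) {
                logger.warning("Dropping invalid example: " + ex);
            } else {
                throw new IllegalArgumentException("Invalid example: " + ex);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> input = List.of(List.of("height"), List.<String>of());
        // With the flag set, the empty example is dropped and one example remains.
        System.out.println(collect(input, true).size());
    }
}
```

Dropping trades strictness for robustness: a run over noisy data completes, but the resulting dataset can be smaller than its source.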
Modifier | Constructor and Description |
---|---|
protected | ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory): If you call this it's your job to set up outputIDInfo and featureIDMap, and to fill the dataset with examples. |
protected | ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo): This is dangerous, and should not be used unless you've overridden everything in ImmutableDataset. |
public | ImmutableDataset(DataSource<T> dataSource, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples): Creates a dataset from a data source. |
public | ImmutableDataset(DataSource<T> dataSource, Model<T> model, boolean dropInvalidExamples): Creates a dataset from a data source. |
public | ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples): Creates a dataset from a data source. |
public | ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, boolean dropInvalidExamples): Creates a dataset from a data source. |
Modifier and Type | Method and Description |
---|---|
protected void | add(Example<T> ex): Adds an Example to the dataset, which will remove features with unknown names. |
protected void | add(Example<T> ex, Merger merger): Adds an Example to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids (merging duplicate ids). |
static <T extends Output<T>> ImmutableDataset<T> | copyDataset(Dataset<T> dataset): Creates an immutable deep copy of the supplied dataset. |
static <T extends Output<T>> ImmutableDataset<T> | copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo): Creates an immutable deep copy of the supplied dataset, using a different feature and output map. |
static <T extends Output<T>> ImmutableDataset<T> | copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger): Creates an immutable deep copy of the supplied dataset. |
boolean | getDropInvalidExamples(): Returns true if this immutable dataset dropped any invalid examples on construction. |
ImmutableFeatureMap | getFeatureIDMap(): Returns or generates an ImmutableFeatureMap. |
ImmutableFeatureMap | getFeatureMap(): Returns this dataset's FeatureMap. |
ImmutableOutputInfo<T> | getOutputIDInfo(): Returns or generates an ImmutableOutputInfo. |
ImmutableOutputInfo<T> | getOutputInfo(): Returns this dataset's OutputInfo. |
Set<T> | getOutputs(): Gets the set of outputs that occur in the examples in this dataset. |
DatasetProvenance | getProvenance() |
static <T extends Output<T>> ImmutableDataset<T> | hashFeatureMap(Dataset<T> dataset, Hasher hasher): Creates an immutable shallow copy of the supplied dataset, using the hasher to generate a HashedFeatureMap which transparently maps from the feature name to the hashed variant. |
String | toString() |

Methods inherited from class Dataset: createTransformers, getData, getExample, getOutputFactory, getSourceDescription, getSourceProvenance, iterator, shuffle, size
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator
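The steps listed for add(Example<T> ex, Merger merger) in the method table (insert feature ids, remove unknown features, sort by id, merge duplicate ids) can be sketched with standard collections. This is a simplified stand-alone model, not the class's implementation: `MergeByIdSketch` and `toSortedIds` are hypothetical names, and a `DoubleBinaryOperator` stands in for the Merger.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.DoubleBinaryOperator;

// Toy model only: (name, value) pairs stand in for the features of one example.
final class MergeByIdSketch {

    /**
     * Looks up each feature name's id, skips names with no id, and returns the
     * values keyed and sorted by id. Values that share an id are combined with
     * the supplied merge function.
     */
    static SortedMap<Integer, Double> toSortedIds(List<Map.Entry<String, Double>> features,
                                                  Map<String, Integer> nameToId,
                                                  DoubleBinaryOperator merger) {
        SortedMap<Integer, Double> byId = new TreeMap<>();
        for (Map.Entry<String, Double> f : features) {
            Integer id = nameToId.get(f.getKey());
            if (id == null) {
                continue; // unknown feature name: removed
            }
            byId.merge(id, f.getValue(), (a, b) -> merger.applyAsDouble(a, b));
        }
        return byId;
    }

    public static void main(String[] args) {
        Map<String, Integer> nameToId = Map.of("a", 0, "b", 1);
        List<Map.Entry<String, Double>> features = List.of(
                Map.entry("b", 2.0),
                Map.entry("zzz", 9.0), // unknown, dropped
                Map.entry("a", 1.0),
                Map.entry("b", 3.0));  // duplicate of "b", merged by summing
        System.out.println(toSortedIds(features, nameToId, Double::sum)); // prints {0=1.0, 1=5.0}
    }
}
```

Summing is only one possible merge function; taking the maximum or keeping the first value would fit the same shape.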
protected ImmutableOutputInfo<T extends Output<T>> outputIDInfo
Output information, and id numbers for outputs found in this dataset.

protected ImmutableFeatureMap featureIDMap
A map from feature names to IDs for the features found in this dataset.

protected final boolean dropInvalidExamples
If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.

protected ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory)
If you call this it's your job to set up outputIDInfo and featureIDMap, and to fill the dataset with examples.
Note: Sets dropInvalidExamples to false.
description - A description of the input data (including preprocessing steps).
outputFactory - The factory for this output type.

public ImmutableDataset(DataSource<T> dataSource, Model<T> model, boolean dropInvalidExamples)
Creates a dataset from a data source.
dataSource - The examples.
model - A model to extract feature and output maps from.
dropInvalidExamples - If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.

public ImmutableDataset(DataSource<T> dataSource, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
Creates a dataset from a data source.
dataSource - The examples.
featureIDMap - The feature map.
outputIDInfo - The output map.
dropInvalidExamples - If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.

public ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, FeatureMap featureIDMap, OutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
Creates a dataset from a data source.
dataSource - The examples.
description - A description of the input data (including preprocessing steps).
outputFactory - The output factory.
featureIDMap - The feature id map, used to remove unknown features.
outputIDInfo - The output id map.
dropInvalidExamples - If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.

public ImmutableDataset(Iterable<Example<T>> dataSource, DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, boolean dropInvalidExamples)
Creates a dataset from a data source.
dataSource - The examples.
description - A description of the input data (including preprocessing steps).
outputFactory - The factory for this output type.
featureIDMap - The feature id map, used to remove unknown features.
outputIDInfo - The output id map.
dropInvalidExamples - If true, instead of throwing an exception when an invalid Example is encountered, this Dataset will log a warning and drop it.

protected ImmutableDataset(DataProvenance description, OutputFactory<T> outputFactory, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
This is dangerous, and should not be used unless you've overridden everything in ImmutableDataset.
Note: Sets dropInvalidExamples to false.
description - A description of the data you're going to add to this dataset.
outputFactory - The factory for this output type.
featureIDMap - The feature id map, used to remove unknown features.
outputIDInfo - The output id map.

protected void add(Example<T> ex)
Adds an Example to the dataset, which will remove features with unknown names.
ex - An Example to add to the dataset.

protected void add(Example<T> ex, Merger merger)
Adds an Example to the dataset, which will insert feature ids, remove unknown features and sort the examples by the feature ids (merging duplicate ids).
ex - The example to add.
merger - The Merger to use.

public Set<T> getOutputs()
Description copied from class: Dataset
Gets the set of outputs that occur in the examples in this dataset.
getOutputs in class Dataset<T extends Output<T>>

public ImmutableFeatureMap getFeatureIDMap()
Description copied from class: Dataset
Returns or generates an ImmutableFeatureMap.
getFeatureIDMap in class Dataset<T extends Output<T>>

public ImmutableFeatureMap getFeatureMap()
Description copied from class: Dataset
Returns this dataset's FeatureMap.
getFeatureMap in class Dataset<T extends Output<T>>

public ImmutableOutputInfo<T> getOutputIDInfo()
Description copied from class: Dataset
Returns or generates an ImmutableOutputInfo.
getOutputIDInfo in class Dataset<T extends Output<T>>

public ImmutableOutputInfo<T> getOutputInfo()
Description copied from class: Dataset
Returns this dataset's OutputInfo.
getOutputInfo in class Dataset<T extends Output<T>>

public boolean getDropInvalidExamples()
Returns true if this immutable dataset dropped any invalid examples on construction.

public DatasetProvenance getProvenance()
getProvenance in interface com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>

public static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset)
Creates an immutable deep copy of the supplied dataset.
T - The type of output.
dataset - The dataset to copy.

public static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
Creates an immutable deep copy of the supplied dataset, using a different feature and output map.
T - The type of output.
dataset - The dataset to copy.
featureIDMap - The new feature map to use. Removes features which are not found in this map.
outputIDInfo - The new output info to use.

public static <T extends Output<T>> ImmutableDataset<T> copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger)
Creates an immutable deep copy of the supplied dataset.
T - The type of output.
dataset - The dataset to copy.
featureIDMap - The new feature map to use. Removes features which are not found in this map.
outputIDInfo - The new output info to use.
merger - The merge function to use to reduce features given new ids.

public static <T extends Output<T>> ImmutableDataset<T> hashFeatureMap(Dataset<T> dataset, Hasher hasher)
Creates an immutable shallow copy of the supplied dataset, using the hasher to generate a HashedFeatureMap which transparently maps from the feature name to the hashed variant.
T - The type of output.
dataset - The dataset to copy.
hasher - The hashing function to use.

Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.