T - the type of the outputs in the data set.
public abstract class Dataset<T extends Output<T>> extends Object implements Iterable<Example<T>>, com.oracle.labs.mlrg.olcut.provenance.Provenancable<DatasetProvenance>, Serializable
Subclass MutableDataset rather than this class.
Modifier and Type | Field and Description |
---|---|
protected List<Example<T>> | data - The data in this data set. |
protected int[] | indices - The indices of the shuffled order. |
protected OutputFactory<T> | outputFactory - A factory for making OutputInfo and Output of the appropriate type. |
protected DataProvenance | sourceProvenance - The provenance of the data source, extracted on construction. |
Modifier | Constructor and Description |
---|---|
protected | Dataset(DataProvenance provenance, OutputFactory<T> outputFactory) - Creates a dataset. |
protected | Dataset(DataSource<T> dataSource) - Creates a dataset. |
Modifier and Type | Method and Description |
---|---|
TransformerMap | createTransformers(TransformationMap transformations) - Takes a TransformationMap and converts it into a TransformerMap by observing all the values in this dataset. |
List<Example<T>> | getData() - Gets the examples as an unmodifiable list. |
Example<T> | getExample(int index) - Gets the example at the supplied index. |
abstract ImmutableFeatureMap | getFeatureIDMap() - Returns or generates an ImmutableFeatureMap. |
abstract FeatureMap | getFeatureMap() - Returns this dataset's FeatureMap. |
OutputFactory<T> | getOutputFactory() - Gets the output factory this dataset contains. |
abstract ImmutableOutputInfo<T> | getOutputIDInfo() - Returns or generates an ImmutableOutputInfo. |
abstract OutputInfo<T> | getOutputInfo() - Returns this dataset's OutputInfo. |
abstract Set<T> | getOutputs() - Gets the set of outputs that occur in the examples in this dataset. |
String | getSourceDescription() - A String description of this dataset. |
DataProvenance | getSourceProvenance() - The provenance of the data this Dataset contains. |
Iterator<Example<T>> | iterator() |
void | shuffle(boolean shuffle) - Shuffles the indices, or stops shuffling them. |
int | size() - Gets the size of the data set. |
String | toString() |
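Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator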
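protected final DataProvenance sourceProvenance
The provenance of the data source, extracted on construction.
protected final OutputFactory<T extends Output<T>> outputFactory
A factory for making OutputInfo and Output of the appropriate type.
protected int[] indices
The indices of the shuffled order.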
protected Dataset(DataProvenance provenance, OutputFactory<T> outputFactory)
Creates a dataset.
provenance - A description of the data, including preprocessing steps.
outputFactory - The output factory.
protected Dataset(DataSource<T> dataSource)
Creates a dataset.
dataSource - the DataSource to use.
public String getSourceDescription()
A String description of this dataset.
public DataProvenance getSourceProvenance()
The provenance of the data this Dataset contains.
public List<Example<T>> getData()
Gets the examples as an unmodifiable list.
In other words, using the following to add examples to this dataset will throw an exception:
dataset.getData().add(example)
Instead, use MutableDataset.add(Example).
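The read-only behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not Tribuo code: `ReadOnlyViewSketch` and its `String` examples are hypothetical stand-ins, and it assumes the list returned by getData() behaves like `Collections.unmodifiableList`, which rejects writes with `UnsupportedOperationException`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a dataset; not a Tribuo class.
class ReadOnlyViewSketch {
    private final List<String> data = new ArrayList<>();

    // Plays the role of MutableDataset.add(Example): the supported way to grow the data.
    void add(String example) {
        data.add(example);
    }

    // Plays the role of getData(): callers receive a read-only view of the backing list.
    // Assumption: the real method behaves like Collections.unmodifiableList.
    List<String> getData() {
        return Collections.unmodifiableList(data);
    }

    // Returns true if adding through the view is rejected with an exception.
    static boolean viewRejectsAdd(ReadOnlyViewSketch dataset) {
        try {
            dataset.getData().add("rejected");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

In this sketch the view stays in step with the backing list, so examples added through `add` are visible on the next call to `getData()`, while writes through the view itself always fail.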
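public OutputFactory<T> getOutputFactory()
Gets the output factory this dataset contains.
public abstract Set<T> getOutputs()
Gets the set of outputs that occur in the examples in this dataset.
public Example<T> getExample(int index)
Gets the example at the supplied index.
Throws IllegalArgumentException if the index is invalid or outside the bounds.
index - The index of the example.
public int size()
Gets the size of the data set.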
public void shuffle(boolean shuffle)
Shuffles the indices, or stops shuffling them.
The shuffle only affects the iterator; it does not affect getExample(int).
Multiple calls with the argument true will shuffle the dataset multiple times. The RNG is shared across all Dataset instances, so methods which access it are synchronized.
Using this method will prevent the provenance system from tracking the exact state of the dataset, which may be important for trainers which depend on the example order, such as those using stochastic gradient descent.
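The split between iteration order and index access can be sketched as follows. This is a stdlib-only illustration under stated assumptions, not Tribuo's implementation: `ShuffleSketch` is a hypothetical class, the Fisher-Yates permutation and the seed `42L` are choices made for the sketch, and examples are plain strings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

// Hypothetical stand-in illustrating index-based shuffling; not a Tribuo class.
class ShuffleSketch implements Iterable<String> {
    // One RNG shared by every instance (the seed is an assumption of this sketch).
    private static final Random RNG = new Random(42L);

    private final List<String> data;
    private int[] indices; // null means "iterate in storage order"

    ShuffleSketch(List<String> data) {
        this.data = new ArrayList<>(data);
    }

    // Index access always reads the backing list, so shuffling never changes it.
    String getExample(int index) {
        if (index < 0 || index >= data.size()) {
            throw new IllegalArgumentException("Example index " + index + " is out of bounds.");
        }
        return data.get(index);
    }

    // true: draw a fresh permutation; false: return to storage order.
    void shuffle(boolean shuffle) {
        if (!shuffle) {
            indices = null;
            return;
        }
        int[] permutation = new int[data.size()];
        for (int i = 0; i < permutation.length; i++) {
            permutation[i] = i;
        }
        synchronized (RNG) { // guard the shared RNG
            for (int i = permutation.length - 1; i > 0; i--) { // Fisher-Yates shuffle
                int j = RNG.nextInt(i + 1);
                int tmp = permutation[i];
                permutation[i] = permutation[j];
                permutation[j] = tmp;
            }
        }
        indices = permutation;
    }

    @Override
    public Iterator<String> iterator() {
        final int[] order = indices; // snapshot the order for this iterator
        return new Iterator<String>() {
            private int position = 0;

            @Override
            public boolean hasNext() {
                return position < data.size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int i = (order == null) ? position : order[position];
                position++;
                return data.get(i);
            }
        };
    }
}
```

Keeping the permutation in a separate index array means the stored examples are never moved, which is why index lookups stay stable while each new shuffle changes only what the iterator yields.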
shuffle - If true shuffle the data.
public abstract ImmutableOutputInfo<T> getOutputIDInfo()
Returns or generates an ImmutableOutputInfo.
public abstract OutputInfo<T> getOutputInfo()
Returns this dataset's OutputInfo.
public abstract ImmutableFeatureMap getFeatureIDMap()
Returns or generates an ImmutableFeatureMap.
public abstract FeatureMap getFeatureMap()
Returns this dataset's FeatureMap.
public TransformerMap createTransformers(TransformationMap transformations)
Takes a TransformationMap and converts it into a TransformerMap by observing all the values in this dataset.
Does not mutate the dataset. If you wish to apply the TransformerMap, use MutableDataset.transform(org.tribuo.transform.TransformerMap) or TransformerMap.transformDataset(org.tribuo.Dataset<T>).
Currently TransformationMaps and TransformerMaps only operate on feature values which are present; sparse values (the implicit zeros) are ignored and not transformed. If the zeros should be transformed, call MutableDataset.densify() on the datasets.
Throws IllegalArgumentException if the TransformationMap object has regexes which apply to multiple features.
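The effect of fitting only on present values can be sketched with a small min-max scaler over sparse examples. Everything here is an assumption made for illustration: `SparseScalingSketch` is a hypothetical class, examples are modelled as `Map<String, Double>` where a missing key is an implicit zero, and min-max scaling stands in for whatever transformations are actually configured. It does not use or reproduce Tribuo's classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the fit-then-apply pattern; not Tribuo's implementation.
class SparseScalingSketch {
    // Fitted {min, max} per feature name, playing the role of a TransformerMap.
    final Map<String, double[]> ranges = new HashMap<>();

    // "Fit": observe only the values that are present in each sparse example.
    static SparseScalingSketch fit(List<Map<String, Double>> examples) {
        SparseScalingSketch fitted = new SparseScalingSketch();
        for (Map<String, Double> example : examples) {
            for (Map.Entry<String, Double> entry : example.entrySet()) {
                double value = entry.getValue();
                double[] range = fitted.ranges.get(entry.getKey());
                if (range == null) {
                    fitted.ranges.put(entry.getKey(), new double[] {value, value});
                } else {
                    range[0] = Math.min(range[0], value);
                    range[1] = Math.max(range[1], value);
                }
            }
        }
        return fitted;
    }

    // "Apply": returns rescaled copies; the input examples are left untouched.
    List<Map<String, Double>> apply(List<Map<String, Double>> examples) {
        List<Map<String, Double>> out = new ArrayList<>();
        for (Map<String, Double> example : examples) {
            Map<String, Double> scaled = new HashMap<>();
            for (Map.Entry<String, Double> entry : example.entrySet()) {
                double[] range = ranges.get(entry.getKey());
                if (range == null) {
                    // Feature never seen during fitting: pass it through unchanged.
                    scaled.put(entry.getKey(), entry.getValue());
                    continue;
                }
                double span = range[1] - range[0];
                scaled.put(entry.getKey(), span == 0.0 ? 0.0 : (entry.getValue() - range[0]) / span);
            }
            out.add(scaled);
        }
        return out;
    }
}
```

In this sketch a feature that is absent from an example contributes nothing to the fitted range and stays absent after `apply`. Writing the zeros out explicitly before fitting, which is the role the text assigns to densifying, would pull the fitted minimum down to zero.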
transformations - The transformations to fit.
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.