Uses of Class org.tribuo.Dataset (Tribuo 4.0.2 API)

Provides classes for loading in data from disk, processing it into examples, and splitting datasets for things like cross-validation and train-test splits.

org.tribuo.data.csv

Provides classes which can load columnar data (using a RowProcessor) from a CSV (or other character delimited format) file.

org.tribuo.dataset

Provides utility datasets which subsample or otherwise transform the wrapped dataset.

org.tribuo.datasource

Simple data sources for ingesting or aggregating data.

org.tribuo.ensemble

Provides an interface for model prediction combinations, two base classes for ensemble models, a base class for ensemble excuses, and a Bagging implementation.

org.tribuo.evaluation

Evaluation base classes, along with code for train/test splits and cross validation.

org.tribuo.evaluation.metrics

This package contains the infrastructure classes for building evaluation metrics.

org.tribuo.hash

Provides the base interface and implementations of the model hashing system, which obscures the feature names stored in a model.

org.tribuo.interop.tensorflow

Provides an interface to TensorFlow, allowing the training of non-sequential models using any supported Tribuo output type.

org.tribuo.math.la

Provides a linear algebra system used for numerical operations in Tribuo.

org.tribuo.multilabel.baseline

Provides an implementation of independent multi-label classification that wraps a Label Trainer and uses it to make independent predictions of each label.

org.tribuo.multilabel.example

Provides a multi-label data generator for testing implementations.

org.tribuo.provenance

Provides Tribuo specific infrastructure for the Provenance system which tracks models and datasets.

org.tribuo.regression.baseline

Provides simple baseline regression predictors.

org.tribuo.regression.example

Provides some example regression data generators for testing implementations.

org.tribuo.regression.impl

Provides skeletal implementations of Regressor Trainer that can wrap a single dimension trainer/model and produce one prediction per dimension independently.

org.tribuo.regression.liblinear

Provides an interface to liblinear for regression problems.

org.tribuo.regression.libsvm

Provides an interface to LibSVM for regression problems.

org.tribuo.regression.rtree

Provides an implementation of decision trees for regression problems.

org.tribuo.regression.rtree.impl

Provides internal implementation classes for the regression trees.

org.tribuo.regression.sgd.linear

Provides an implementation of linear regression using Stochastic Gradient Descent.

org.tribuo.regression.slm

Provides implementations of sparse linear regression using various forms of regularisation penalty.

org.tribuo.regression.xgboost

Provides an interface to XGBoost for regression problems.

org.tribuo.sequence

Provides core classes for working with sequences of Examples.

org.tribuo.transform

Provides infrastructure for applying transformations to a Dataset.

Uses of Dataset in org.tribuo

Subclasses of Dataset in org.tribuo

Modifier and Type

Class

Description

class

ImmutableDataset<T extends Output<T>>

This is a Dataset which has an ImmutableFeatureMap to store the feature information.

class

MutableDataset<T extends Output<T>>

A MutableDataset is a Dataset with a MutableFeatureMap which grows over time.

Methods in org.tribuo with parameters of type Dataset

Modifier and Type

Method

Description

static <T extends Output<T>> ImmutableDataset<T>

ImmutableDataset.copyDataset(Dataset<T> dataset)

Creates an immutable deep copy of the supplied dataset.

static <T extends Output<T>> ImmutableDataset<T>

ImmutableDataset.copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)

Creates an immutable deep copy of the supplied dataset, using a different feature and output map.

static <T extends Output<T>> ImmutableDataset<T>

ImmutableDataset.copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger)

Creates an immutable deep copy of the supplied dataset.

static <T extends Output<T>> MutableDataset<T>

MutableDataset.createDeepCopy(Dataset<T> other)

Creates a deep copy of the supplied Dataset which is mutable.

static <T extends Output<T>> ImmutableDataset<T>

ImmutableDataset.hashFeatureMap(Dataset<T> dataset, Hasher hasher)

Creates an immutable shallow copy of the supplied dataset, using the hasher to generate a HashedFeatureMap which transparently maps from the feature name to the hashed variant.

U

IncrementalTrainer.incrementalTrain(Dataset<T> newData, U model)

Incrementally trains the supplied model with the new data.

List<Prediction<T>>

Model.predict(Dataset<T> examples)

Uses the model to predict the outputs for multiple examples contained in a data set.

default SparseModel<T>

SparseTrainer.train(Dataset<T> examples)

Trains a sparse predictive model using the examples in the given data set.

SparseModel<T>

SparseTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)

Trains a sparse predictive model using the examples in the given data set.

default Model<T>

Trainer.train(Dataset<T> examples)

Trains a predictive model using the examples in the given data set.

Model<T>

Trainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)

Trains a predictive model using the examples in the given data set.
Uses of Dataset in org.tribuo.anomaly.example

Methods in org.tribuo.anomaly.example that return types with arguments of type Dataset

Modifier and Type

Method

Description

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>, Dataset<Event>>

AnomalyDataGenerator.denseTrainTest()

Makes a simple dataset for training and testing.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>, Dataset<Event>>

AnomalyDataGenerator.denseTrainTest(double negate)

Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>, Dataset<Event>>

AnomalyDataGenerator.gaussianAnomaly()

Generates two datasets: one without anomalies, drawn from a single Gaussian, and a second drawn from a mixture of two Gaussians, with points from the second Gaussian tagged as anomalous.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>, Dataset<Event>>

AnomalyDataGenerator.gaussianAnomaly(long size, double fractionAnomalous)

Generates two datasets: one without anomalies, drawn from a single Gaussian, and a second drawn from a mixture of two Gaussians, with points from the second Gaussian tagged as anomalous.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>, Dataset<Event>>

AnomalyDataGenerator.sparseTrainTest()

Makes a simple dataset for training and testing.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Event>, Dataset<Event>>

AnomalyDataGenerator.sparseTrainTest(double negate)

Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.

Uses of Dataset in org.tribuo.anomaly.libsvm

Methods in org.tribuo.anomaly.libsvm with parameters of type Dataset

Modifier and Type

Method

Description

protected com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]>

LibSVMAnomalyTrainer.extractData(Dataset<Event> data, ImmutableOutputInfo<Event> outputInfo, ImmutableFeatureMap featureMap)

LibSVMModel<Event>

LibSVMAnomalyTrainer.train(Dataset<Event> dataset, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
Uses of Dataset in org.tribuo.classification.baseline

Methods in org.tribuo.classification.baseline with parameters of type Dataset

Modifier and Type

Method

Description

Model<Label>

DummyClassifierTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
Uses of Dataset in org.tribuo.classification.dtree

Methods in org.tribuo.classification.dtree with parameters of type Dataset

Modifier and Type

Method

Description

protected AbstractTrainingNode<Label>

CARTClassificationTrainer.mkTrainingNode(Dataset<Label> examples)
Uses of Dataset in org.tribuo.classification.dtree.impl

Constructors in org.tribuo.classification.dtree.impl with parameters of type Dataset

Modifier

Constructor

Description

ClassifierTrainingNode(LabelImpurity impurity, Dataset<Label> examples)

Constructor which creates the inverted file.
Uses of Dataset in org.tribuo.classification.ensemble

Methods in org.tribuo.classification.ensemble with parameters of type Dataset

Modifier and Type

Method

Description

Model<Label>

AdaBoostTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)

If the trainer implements WeightedExamples then do boosting by weighting, otherwise do boosting by sampling.
Uses of Dataset in org.tribuo.classification.example

Methods in org.tribuo.classification.example that return types with arguments of type Dataset

Modifier and Type

Method

Description

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>, Dataset<Label>>

LabelledDataGenerator.binarySparseTrainTest()

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>, Dataset<Label>>

LabelledDataGenerator.binarySparseTrainTest(double negate)

Generates a pair of datasets with sparse features and unknown features in the test data.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>, Dataset<Label>>

LabelledDataGenerator.denseTrainTest()

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>, Dataset<Label>>

LabelledDataGenerator.denseTrainTest(double negate)

Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}, and there are 4 classes, {Foo,Bar,Baz,Quux}.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>, Dataset<Label>>

LabelledDataGenerator.sparseTrainTest()

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>, Dataset<Label>>

LabelledDataGenerator.sparseTrainTest(double negate)

Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.

Uses of Dataset in org.tribuo.classification.experiments

Methods in org.tribuo.classification.experiments that return types with arguments of type Dataset

Modifier and Type

Method

Description

static com.oracle.labs.mlrg.olcut.util.Pair<Model<Label>, Dataset<Label>>

Test.load(Test.ConfigurableTestOptions o)
Uses of Dataset in org.tribuo.classification.liblinear

Methods in org.tribuo.classification.liblinear with parameters of type Dataset

Modifier and Type

Method

Description

protected com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]>

LibLinearClassificationTrainer.extractData(Dataset<Label> data, ImmutableOutputInfo<Label> outputInfo, ImmutableFeatureMap featureMap)
Uses of Dataset in org.tribuo.classification.libsvm

Methods in org.tribuo.classification.libsvm with parameters of type Dataset

Modifier and Type

Method

Description

protected com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]>

LibSVMClassificationTrainer.extractData(Dataset<Label> data, ImmutableOutputInfo<Label> outputInfo, ImmutableFeatureMap featureMap)
Uses of Dataset in org.tribuo.classification.mnb

Methods in org.tribuo.classification.mnb with parameters of type Dataset

Modifier and Type

Method

Description

Model<Label>

MultinomialNaiveBayesTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.classification.sgd.kernel

Methods in org.tribuo.classification.sgd.kernel with parameters of type Dataset

Modifier and Type

Method

Description

KernelSVMModel

KernelSVMTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.classification.sgd.linear

Methods in org.tribuo.classification.sgd.linear with parameters of type Dataset

Modifier and Type

Method

Description

Model<Label>

LinearSGDTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.classification.xgboost

Methods in org.tribuo.classification.xgboost with parameters of type Dataset

Modifier and Type

Method

Description

XGBoostModel<Label>

XGBoostClassificationTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.clustering.example

Methods in org.tribuo.clustering.example that return Dataset

Modifier and Type

Method

Description

static Dataset<ClusterID>

ClusteringDataGenerator.gaussianClusters(long size, long seed)

Generates a dataset drawn from a mixture of five 2D Gaussians.
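As a rough illustration of what sampling from a mixture of five 2D Gaussians involves, the standalone Java sketch below first picks one of five components and then draws a point around that component's mean. It does not call Tribuo. The component means, the shared standard deviation of 1.0, the uniform mixing weights and the `Sample` holder class are hypothetical choices, not the values ClusteringDataGenerator uses, and the sketch takes an `int` size rather than the `long` in the real signature.

```java
import java.util.Random;

/** Hypothetical sketch of sampling from a mixture of five 2D Gaussians. */
final class GaussianMixtureSketch {
    // Hypothetical component means, one row per cluster (not Tribuo's values).
    static final double[][] MEANS = {
        {0.0, 0.0}, {5.0, 5.0}, {-5.0, 5.0}, {5.0, -5.0}, {-5.0, -5.0}
    };
    // Hypothetical standard deviation shared by every component.
    static final double STD_DEV = 1.0;

    static final class Sample {
        final double x;
        final double y;
        final int cluster; // index of the component the point was drawn from

        Sample(double x, double y, int cluster) {
            this.x = x;
            this.y = y;
            this.cluster = cluster;
        }
    }

    /** Draws size points; the same seed always reproduces the same points. */
    static Sample[] sample(int size, long seed) {
        Random rng = new Random(seed);
        Sample[] out = new Sample[size];
        for (int i = 0; i < size; i++) {
            // Uniform mixing weights (an assumption): choose a component, then sample around its mean.
            int c = rng.nextInt(MEANS.length);
            double x = MEANS[c][0] + STD_DEV * rng.nextGaussian();
            double y = MEANS[c][1] + STD_DEV * rng.nextGaussian();
            out[i] = new Sample(x, y, c);
        }
        return out;
    }
}
```

Keeping the component index alongside each point mirrors how a generated clustering dataset pairs each example with a ground-truth ClusterID.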

Methods in org.tribuo.clustering.example that return types with arguments of type Dataset

Modifier and Type

Method

Description

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>>

ClusteringDataGenerator.denseTrainTest()

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>>

ClusteringDataGenerator.denseTrainTest(double negate)

Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>>

ClusteringDataGenerator.sparseTrainTest()

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<ClusterID>, Dataset<ClusterID>>

ClusteringDataGenerator.sparseTrainTest(double negate)

Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.

Uses of Dataset in org.tribuo.clustering.kmeans

Methods in org.tribuo.clustering.kmeans with parameters of type Dataset

Modifier and Type

Method

Description

protected static DenseVector[]

KMeansTrainer.initialiseCentroids(int centroids, Dataset<ClusterID> examples, ImmutableFeatureMap featureMap, SplittableRandom rng)

Initialisation method called at the start of each train call.

KMeansModel

KMeansTrainer.train(Dataset<ClusterID> dataset)

KMeansModel

KMeansTrainer.train(Dataset<ClusterID> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.common.liblinear

Methods in org.tribuo.common.liblinear with parameters of type Dataset

Modifier and Type

Method

Description

protected abstract com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]>

LibLinearTrainer.extractData(Dataset<T> data, ImmutableOutputInfo<T> outputInfo, ImmutableFeatureMap featureMap)

Extracts the features and Outputs in LibLinear's format.

LibLinearModel<T>

LibLinearTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.common.libsvm

Methods in org.tribuo.common.libsvm with parameters of type Dataset

Modifier and Type

Method

Description

protected abstract com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]>

LibSVMTrainer.extractData(Dataset<T> data, ImmutableOutputInfo<T> outputInfo, ImmutableFeatureMap featureMap)

Extracts the features and Outputs in LibSVM's format.

LibSVMModel<T>

LibSVMTrainer.train(Dataset<T> examples)

LibSVMModel<T>

LibSVMTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.common.nearest

Methods in org.tribuo.common.nearest with parameters of type Dataset

Modifier and Type

Method

Description

Model<T>

KNNTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.common.tree

Methods in org.tribuo.common.tree with parameters of type Dataset

Modifier and Type

Method

Description

protected abstract AbstractTrainingNode<T>

AbstractCARTTrainer.mkTrainingNode(Dataset<T> examples)

TreeModel<T>

AbstractCARTTrainer.train(Dataset<T> examples)

TreeModel<T>

AbstractCARTTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.common.xgboost

Methods in org.tribuo.common.xgboost with parameters of type Dataset

Modifier and Type

Method

Description

protected static <T extends Output<T>> XGBoostTrainer.DMatrixTuple<T>

XGBoostTrainer.convertDataset(Dataset<T> examples)

protected static <T extends Output<T>> XGBoostTrainer.DMatrixTuple<T>

XGBoostTrainer.convertDataset(Dataset<T> examples, Function<T,Float> responseExtractor)

List<Prediction<T>>

XGBoostModel.predict(Dataset<T> examples)

Uses the model to predict the labels for multiple examples contained in a data set.
Uses of Dataset in org.tribuo.data

Methods in org.tribuo.data that return types with arguments of type Dataset

Modifier and Type

Method

Description

<T extends Output<T>> com.oracle.labs.mlrg.olcut.util.Pair<Dataset<T>, Dataset<T>>

DataOptions.load(OutputFactory<T> outputFactory)

Uses of Dataset in org.tribuo.data.csv

Methods in org.tribuo.data.csv with parameters of type Dataset

Modifier and Type

Method

Description

<T extends Output<T>> void

CSVSaver.save(Path csvPath, Dataset<T> dataset, String responseName)

Saves the dataset to the specified path.

<T extends Output<T>> void

CSVSaver.save(Path csvPath, Dataset<T> dataset, Set<String> responseNames)

Saves the dataset to the specified path.
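To picture what saving a sparse dataset as CSV involves, the hypothetical sketch below renders examples as dense rows under a header made of the response column plus every feature name. It is not CSVSaver's implementation: putting the response column first, sorting feature names, writing absent features as 0 and skipping quoting or escaping (so names must not contain commas) are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch of writing sparse examples as dense CSV text. */
final class CsvSketch {
    static String toCsv(List<Map<String, Double>> features, List<String> responses, String responseName) {
        // Collect every feature name seen in any example, in sorted order.
        TreeSet<String> names = new TreeSet<>();
        for (Map<String, Double> row : features) {
            names.addAll(row.keySet());
        }
        // Header row: response column first (an assumption), then the feature columns.
        StringBuilder sb = new StringBuilder(responseName);
        for (String name : names) {
            sb.append(',').append(name);
        }
        sb.append('\n');
        // One row per example; features absent from the example are written as 0.0.
        for (int i = 0; i < features.size(); i++) {
            sb.append(responses.get(i));
            for (String name : names) {
                sb.append(',').append(features.get(i).getOrDefault(name, 0.0));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```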
Uses of Dataset in org.tribuo.dataset

Subclasses of Dataset in org.tribuo.dataset

Modifier and Type

Class

Description

final class

DatasetView<T extends Output<T>>

DatasetView provides an immutable view on another Dataset that only exposes selected examples.

class

MinimumCardinalityDataset<T extends Output<T>>

This class creates a pruned dataset in which low-frequency features, those occurring fewer times than the provided minimum cardinality, have been removed.
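The pruning idea can be sketched in plain Java, assuming each example is a map from feature name to value and that a feature's cardinality is the number of examples it appears in. This is a hypothetical illustration, not MinimumCardinalityDataset's implementation; in particular, the sketch keeps examples that end up with no features, which the real class may handle differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of minimum-cardinality feature pruning. */
final class MinCardinalitySketch {
    /** Returns copies of the examples with every feature seen in fewer than minCardinality examples removed. */
    static List<Map<String, Double>> prune(List<Map<String, Double>> examples, int minCardinality) {
        // First pass: count how many examples each feature name occurs in.
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, Double> example : examples) {
            for (String name : example.keySet()) {
                counts.merge(name, 1, Integer::sum);
            }
        }
        // Second pass: rebuild each example keeping only sufficiently frequent features.
        List<Map<String, Double>> pruned = new ArrayList<>();
        for (Map<String, Double> example : examples) {
            Map<String, Double> kept = new LinkedHashMap<>();
            for (Map.Entry<String, Double> e : example.entrySet()) {
                if (counts.get(e.getKey()) >= minCardinality) {
                    kept.put(e.getKey(), e.getValue());
                }
            }
            pruned.add(kept);
        }
        return pruned;
    }
}
```

Counting in a separate first pass is what makes the rule global: whether a feature survives depends on the whole dataset, not on the example currently being rebuilt.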

Methods in org.tribuo.dataset with parameters of type Dataset

Modifier and Type

Method

Description

static <T extends Output<T>> DatasetView<T>

DatasetView.createBootstrapView(Dataset<T> dataset, int size, long seed)

Generates a DatasetView bootstrapped from the supplied Dataset.

static <T extends Output<T>> DatasetView<T>

DatasetView.createBootstrapView(Dataset<T> dataset, int size, long seed, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)

Generates a DatasetView bootstrapped from the supplied Dataset.

static <T extends Output<T>> DatasetView<T>

DatasetView.createView(Dataset<T> dataset, Predicate<Example<T>> predicate, String tag)

Creates a view from the supplied dataset, using the specified predicate to test if each example should be in this view.

static <T extends Output<T>> DatasetView<T>

DatasetView.createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights)

Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.

static <T extends Output<T>> DatasetView<T>

DatasetView.createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)

Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
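Bootstrapping here means sampling example indices with replacement. The following is a minimal sketch of how the index arrays behind such views could be drawn, assuming a `java.util.SplittableRandom` seeded with the supplied seed. The class and method names are hypothetical, and Tribuo's own sampling order is not guaranteed to match. The weighted variant picks index j with probability proportional to its weight by searching a cumulative distribution.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch of uniform and weighted bootstrap index sampling. */
final class BootstrapSketch {
    /** Draws size indices uniformly at random, with replacement, from [0, n). */
    static int[] bootstrapIndices(int n, int size, long seed) {
        SplittableRandom rng = new SplittableRandom(seed);
        int[] indices = new int[size];
        for (int i = 0; i < size; i++) {
            indices[i] = rng.nextInt(n);
        }
        return indices;
    }

    /** Draws size indices with replacement; index j is chosen with probability weights[j] / sum(weights). */
    static int[] weightedBootstrapIndices(float[] weights, int size, long seed) {
        // Build the cumulative distribution over example indices (total must be positive).
        double[] cdf = new double[weights.length];
        double total = 0.0;
        for (int j = 0; j < weights.length; j++) {
            total += weights[j];
            cdf[j] = total;
        }
        SplittableRandom rng = new SplittableRandom(seed);
        int[] indices = new int[size];
        for (int i = 0; i < size; i++) {
            double u = rng.nextDouble() * total;
            // Binary search for the first index whose cumulative weight exceeds u.
            int lo = 0;
            int hi = cdf.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cdf[mid] > u) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            indices[i] = lo;
        }
        return indices;
    }
}
```

Because the view only records indices, the same example can appear several times in a bootstrap sample without being copied.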

Constructors in org.tribuo.dataset with parameters of type Dataset

Modifier

Constructor

Description

DatasetView(Dataset<T> dataset, int[] exampleIndices, String tag)

Creates a DatasetView which includes the supplied indices from the dataset.

DatasetView(Dataset<T> dataset, int[] exampleIndices, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, String tag)

Creates a DatasetView which includes the supplied indices from the dataset.

MinimumCardinalityDataset(Dataset<T> dataset, int minCardinality)
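A view stores indices into the wrapped dataset rather than copying its examples. The sketch below shows that idea over a plain `java.util.List`, selecting indices with a predicate in the spirit of createView and reading through an index array like the DatasetView constructors. `ListView` is a hypothetical class, and it omits the feature and output maps that DatasetView also carries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of an index-based, read-only view over a list. */
final class ListView<E> {
    private final List<E> backing;
    private final int[] indices;

    ListView(List<E> backing, int[] indices) {
        this.backing = backing;
        this.indices = indices.clone(); // defensive copy so callers cannot change the view
    }

    /** Keeps the indices of the elements that satisfy the predicate. */
    static <E> ListView<E> createView(List<E> backing, Predicate<E> predicate) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < backing.size(); i++) {
            if (predicate.test(backing.get(i))) {
                kept.add(i);
            }
        }
        int[] indices = kept.stream().mapToInt(Integer::intValue).toArray();
        return new ListView<>(backing, indices);
    }

    int size() {
        return indices.length;
    }

    /** Returns the i-th selected element, read through to the backing list. */
    E get(int i) {
        return backing.get(indices[i]);
    }
}
```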
Uses of Dataset in org.tribuo.datasource

Methods in org.tribuo.datasource with parameters of type Dataset

Modifier and Type

Method

Description

static <T extends Output<T>> void

LibSVMDataSource.writeLibSVMFormat(Dataset<T> dataset, PrintStream out, boolean zeroIndexed, Function<T,Number> transformationFunc)

Writes out a dataset in LibSVM format.
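In LibSVM format each example is one text line: the output value, then space-separated `index:value` pairs in ascending index order. A minimal sketch of formatting one line, assuming the output has already been converted to a number (the role transformationFunc plays) and that feature ids start at 0; when zeroIndexed is false the ids are shifted to the 1-based indices LibSVM tools conventionally expect. `LibSVMLineSketch` is hypothetical and not part of Tribuo.

```java
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical sketch of formatting one example as a LibSVM line. */
final class LibSVMLineSketch {
    /** Formats "<label> <index>:<value> ..." with indices in ascending order. */
    static String formatLine(Number label, SortedMap<Integer, Double> features, boolean zeroIndexed) {
        StringBuilder sb = new StringBuilder();
        sb.append(label);
        // Feature ids are assumed 0-based; shift by one when writing 1-based indices.
        int offset = zeroIndexed ? 0 : 1;
        // A SortedMap iterates in ascending key order, as the format requires.
        for (Map.Entry<Integer, Double> e : features.entrySet()) {
            sb.append(' ').append(e.getKey() + offset).append(':').append(e.getValue());
        }
        return sb.toString();
    }
}
```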
Uses of Dataset in org.tribuo.ensemble

Methods in org.tribuo.ensemble with parameters of type Dataset

Modifier and Type

Method

Description

Model<T>

BaggingTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)

protected Model<T>

BaggingTrainer.trainSingleModel(Dataset<T> examples, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, SplittableRandom localRNG, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.evaluation

Methods in org.tribuo.evaluation with parameters of type Dataset

Modifier and Type

Method

Description

static <T extends Output<T>, C extends MetricContext<T>> com.oracle.labs.mlrg.olcut.util.Pair<Integer,Double>

EvaluationAggregator.argmax(EvaluationMetric<T,C> metric, List<? extends Model<T>> models, Dataset<T> dataset)

Calculates the argmax of a metric across the supplied models (i.e., the index of the model which performed the best).

final E

AbstractEvaluator.evaluate(Model<T> model, Dataset<T> dataset)

Produces an evaluation for the supplied model and dataset, by calling Model.predict(org.tribuo.Example<T>) to create the predictions, then aggregating the appropriate statistics.

E

Evaluator.evaluate(Model<T> model, Dataset<T> dataset)

Evaluates the dataset using the supplied model, returning an immutable Evaluation of the appropriate type.

Iterator<KFoldSplitter.TrainTestFold<T>>

KFoldSplitter.split(Dataset<T> dataset, boolean shuffle)

Splits a dataset into k consecutive folds; for each fold, the remaining k-1 folds form the training set.

static <T extends Output<T>, C extends MetricContext<T>> DescriptiveStats

EvaluationAggregator.summarize(List<? extends EvaluationMetric<T,C>> metrics, Model<T> model, Dataset<T> dataset)

Summarize model performance on dataset across several metrics.

static <T extends Output<T>, R extends Evaluation<T>> Map<MetricID<T>, DescriptiveStats>

EvaluationAggregator.summarize(Evaluator<T,R> evaluator, List<? extends Model<T>> models, Dataset<T> dataset)

Summarize performance using the supplied evaluator across several models on one dataset.

static <T extends Output<T>, C extends MetricContext<T>> DescriptiveStats

EvaluationAggregator.summarize(EvaluationMetric<T,C> metric, List<? extends Model<T>> models, Dataset<T> dataset)

Summarize performance with respect to the supplied metric across several models on a single dataset.
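Once each model has been scored on the dataset, the aggregation these methods describe reduces to simple arithmetic over one metric value per model. A standalone sketch, assuming higher metric values are better and using the sample standard deviation (n - 1 denominator); `MetricSummarySketch` is hypothetical and is neither EvaluationAggregator nor DescriptiveStats.

```java
/** Hypothetical sketch of argmax and summary statistics over per-model metric values. */
final class MetricSummarySketch {
    /** Index of the largest score (assumes higher is better); ties go to the earliest index. */
    static int argmax(double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("no scores supplied");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation with an n - 1 denominator; 0 for fewer than two values. */
    static double stdDev(double[] xs) {
        if (xs.length < 2) {
            return 0.0;
        }
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return Math.sqrt(sumSq / (xs.length - 1));
    }
}
```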

Method parameters in org.tribuo.evaluation with type arguments of type Dataset

Modifier and Type

Method

Description

static <T extends Output<T>, C extends MetricContext<T>> com.oracle.labs.mlrg.olcut.util.Pair<Integer,Double>

EvaluationAggregator.argmax(EvaluationMetric<T,C> metric, Model<T> model, List<? extends Dataset<T>> datasets)

Calculates the argmax of a metric across the supplied datasets.

static <T extends Output<T>, R extends Evaluation<T>> Map<MetricID<T>, DescriptiveStats>

EvaluationAggregator.summarize(Evaluator<T,R> evaluator, Model<T> model, List<? extends Dataset<T>> datasets)

Summarize performance according to evaluator for a single model across several datasets.

static <T extends Output<T>, C extends MetricContext<T>> DescriptiveStats

EvaluationAggregator.summarize(EvaluationMetric<T,C> metric, Model<T> model, List<? extends Dataset<T>> datasets)

Summarize a model's performance with respect to the supplied metric across several datasets.

Constructors in org.tribuo.evaluation with parameters of type Dataset

Modifier

Constructor

Description

CrossValidation(Trainer<T> trainer, Dataset<T> data, Evaluator<T,E> evaluator, int k)

Builds a k-fold cross-validation loop.

CrossValidation(Trainer<T> trainer, Dataset<T> data, Evaluator<T,E> evaluator, int k, long seed)

Builds a k-fold cross-validation loop.
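A k-fold split like the one KFoldSplitter and CrossValidation describe can be sketched with index arrays: optionally shuffle the indices with a seeded generator, cut them into k consecutive folds, and train on everything outside the current fold. This is a hypothetical standalone sketch; how Tribuo distributes leftover examples when n is not divisible by k is an assumption here (the first n % k folds get one extra example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;

/** Hypothetical sketch of k-fold splitting over example indices. */
final class KFoldSketch {
    /** Returns the k test folds as index arrays, built from consecutive runs of (optionally shuffled) indices. */
    static List<int[]> folds(int n, int k, boolean shuffle, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        if (shuffle) {
            // Fisher-Yates shuffle driven by a seeded generator for reproducibility.
            SplittableRandom rng = new SplittableRandom(seed);
            for (int i = n - 1; i > 0; i--) {
                int j = rng.nextInt(i + 1);
                int tmp = order[i];
                order[i] = order[j];
                order[j] = tmp;
            }
        }
        List<int[]> folds = new ArrayList<>();
        int start = 0;
        for (int f = 0; f < k; f++) {
            // The first n % k folds get one extra element, so fold sizes differ by at most one.
            int size = n / k + (f < n % k ? 1 : 0);
            int[] fold = new int[size];
            System.arraycopy(order, start, fold, 0, size);
            folds.add(fold);
            start += size;
        }
        return folds;
    }

    /** The training set for a fold is every index not in that fold. */
    static int[] trainIndices(int n, int[] testFold) {
        boolean[] inTest = new boolean[n];
        for (int i : testFold) {
            inTest[i] = true;
        }
        int[] train = new int[n - testFold.length];
        int pos = 0;
        for (int i = 0; i < n; i++) {
            if (!inTest[i]) {
                train[pos++] = i;
            }
        }
        return train;
    }
}
```

A cross-validation loop would then, for each fold, train on `trainIndices(n, fold)` and evaluate on `fold`, collecting one evaluation per fold.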
Uses of Dataset in org.tribuo.evaluation.metrics

Methods in org.tribuo.evaluation.metrics with parameters of type Dataset

Modifier and Type

Method

Description

default C

EvaluationMetric.createContext(Model<T> model, Dataset<T> dataset)

Creates the metric context used to compute this metric's value, generating Predictions for each Example in the supplied dataset.
Uses of Dataset in org.tribuo.hash

Methods in org.tribuo.hash with parameters of type Dataset

Modifier and Type

Method

Description

Model<T>

HashingTrainer.train(Dataset<T> dataset, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)

This clones the Dataset, hashes each of the examples and rewrites their feature ids before passing it to the inner trainer.
Uses of Dataset in org.tribuo.interop.tensorflow

Methods in org.tribuo.interop.tensorflow that return types with arguments of type Dataset

Modifier and Type

Method

Description

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Label>, Dataset<Label>>

TrainTest.load(Path trainingPath, Path testingPath, OutputFactory<Label> outputFactory)

Methods in org.tribuo.interop.tensorflow with parameters of type Dataset

Modifier and Type

Method

Description

Model<T>

TensorflowCheckpointTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)

Model<T>

TensorflowTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.math.la

Methods in org.tribuo.math.la with parameters of type Dataset

Modifier and Type

Method

Description

static <T extends Output<T>> SparseVector[]

SparseVector.transpose(Dataset<T> dataset)

Converts a dataset of row-major examples into an array of column-major sparse vectors.

static <T extends Output<T>> SparseVector[]

SparseVector.transpose(Dataset<T> dataset, ImmutableFeatureMap fMap)

Converts a dataset of row-major examples into an array of column-major sparse vectors.
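Transposing swaps the indexing: instead of one sparse vector per example (feature index to value), it produces one sparse vector per feature (example index to value), which is convenient when an algorithm iterates over features rather than examples. A minimal sketch using sorted maps in place of SparseVector, assuming feature indices already lie in [0, numFeatures); `TransposeSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a row-major to column-major sparse transpose. */
final class TransposeSketch {
    /**
     * rows.get(i) maps feature index to value for example i (row-major).
     * Returns one sorted map per feature, mapping example index to value (column-major).
     */
    static List<TreeMap<Integer, Double>> transpose(List<Map<Integer, Double>> rows, int numFeatures) {
        List<TreeMap<Integer, Double>> columns = new ArrayList<>(numFeatures);
        for (int f = 0; f < numFeatures; f++) {
            columns.add(new TreeMap<>());
        }
        // Each non-zero (example i, feature f) entry moves to column f at row i.
        for (int i = 0; i < rows.size(); i++) {
            for (Map.Entry<Integer, Double> e : rows.get(i).entrySet()) {
                columns.get(e.getKey()).put(i, e.getValue());
            }
        }
        return columns;
    }
}
```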
Uses of Dataset in org.tribuo.multilabel.baseline

Methods in org.tribuo.multilabel.baseline with parameters of type Dataset

Modifier and Type

Method

Description

Model<MultiLabel>

IndependentMultiLabelTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.multilabel.example

Methods in org.tribuo.multilabel.example that return Dataset

Modifier and Type

Method

Description

static Dataset<MultiLabel>

MultiLabelDataGenerator.generateTestData()

static Dataset<MultiLabel>

MultiLabelDataGenerator.generateTrainData()

Methods in org.tribuo.multilabel.example that return types with arguments of type Dataset

Modifier and Type

Method

Description

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<MultiLabel>, Dataset<MultiLabel>>

MultiLabelDataGenerator.generateDataset()

Uses of Dataset in org.tribuo.provenance

Constructors in org.tribuo.provenance with parameters of type Dataset

Modifier

Constructor

Description

<T extends Output<T>>

DatasetProvenance(DataProvenance sourceProvenance, com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformationProvenance, Dataset<T> dataset)
Uses of Dataset in org.tribuo.regression.baseline

Methods in org.tribuo.regression.baseline with parameters of type Dataset

Modifier and Type

Method

Description

DummyRegressionModel

DummyRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
Uses of Dataset in org.tribuo.regression.example

Methods in org.tribuo.regression.example that return Dataset

Modifier and Type

Method

Description

static Dataset<Regressor>

GaussianDataSource.generateDataset(int numSamples, float slope, float intercept, float variance, float xMin, float xMax, long seed)

Generates a single-dimensional output drawn from N(slope*x + intercept, variance).
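The distribution N(slope*x + intercept, variance) can be sampled by adding scaled standard normal noise to the line. Because `Random.nextGaussian()` has unit variance, the noise is multiplied by sqrt(variance), not by the variance itself. The sketch below assumes x is drawn uniformly from [xMin, xMax), which the description does not state, and it does not reproduce GaussianDataSource's exact random stream; `LinearGaussianSketch` is hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch of sampling y ~ N(slope*x + intercept, variance). */
final class LinearGaussianSketch {
    /** Returns {xs, ys} with x ~ Uniform[xMin, xMax) (an assumption) and y ~ N(slope*x + intercept, variance). */
    static double[][] generate(int numSamples, double slope, double intercept, double variance,
                               double xMin, double xMax, long seed) {
        Random rng = new Random(seed);
        // nextGaussian() has unit variance, so scale by the standard deviation.
        double stdDev = Math.sqrt(variance);
        double[] xs = new double[numSamples];
        double[] ys = new double[numSamples];
        for (int i = 0; i < numSamples; i++) {
            xs[i] = xMin + (xMax - xMin) * rng.nextDouble();
            ys[i] = slope * xs[i] + intercept + stdDev * rng.nextGaussian();
        }
        return new double[][] {xs, ys};
    }
}
```

Setting variance to 0 collapses every sample onto the line y = slope*x + intercept, which is a quick way to check the mean term.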

Methods in org.tribuo.regression.example that return types with arguments of type Dataset

Modifier and Type

Method

Description

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Regressor>, Dataset<Regressor>>

RegressionDataGenerator.denseTrainTest()

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Regressor>, Dataset<Regressor>>

RegressionDataGenerator.denseTrainTest(double negate)

Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Regressor>, Dataset<Regressor>>

RegressionDataGenerator.multiDimDenseTrainTest()

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Regressor>, Dataset<Regressor>>

RegressionDataGenerator.multiDimDenseTrainTest(double negate)

Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Regressor>, Dataset<Regressor>>

RegressionDataGenerator.multiDimSparseTrainTest()

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Regressor>, Dataset<Regressor>>

RegressionDataGenerator.multiDimSparseTrainTest(double negate)

Generates a pair of datasets where the features are sparse and unknown features appear in the test data.

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Regressor>, Dataset<Regressor>>

RegressionDataGenerator.sparseTrainTest()

static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<Regressor>, Dataset<Regressor>>

RegressionDataGenerator.sparseTrainTest(double negate)

Generates a pair of datasets where the features are sparse and unknown features appear in the test data.
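The sparse generators above (sparseTrainTest and multiDimSparseTrainTest) hinge on what an "unknown" feature is: a feature name that occurs in the test data but never in the training data, so a model fitted on the training split has nothing learned for it. The standalone sketch below computes those names. Representing each example as a map from feature name to value is an assumption (it is not Tribuo's Example type), and UnknownFeatureSketch is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch, not Tribuo code: finds test features never seen during training. */
final class UnknownFeatureSketch {
    private UnknownFeatureSketch() {}

    /** Returns, sorted by name, every feature present in some test example but in no training example. */
    static Set<String> unknownFeatures(List<Map<String, Double>> train, List<Map<String, Double>> test) {
        Set<String> known = new TreeSet<>();
        for (Map<String, Double> example : train) {
            known.addAll(example.keySet());
        }
        Set<String> unknown = new TreeSet<>();
        for (Map<String, Double> example : test) {
            for (String name : example.keySet()) {
                if (!known.contains(name)) {
                    unknown.add(name);
                }
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Sparse examples: each one stores only the features it actually has.
        List<Map<String, Double>> train = List.of(Map.of("A", 1.0, "C", 0.5), Map.of("B", 2.0));
        List<Map<String, Double>> test = List.of(Map.of("A", 3.0, "E", 1.0), Map.of("F", 4.0, "B", 1.0));
        System.out.println(unknownFeatures(train, test)); // [E, F]
    }
}
```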
Uses of Dataset in org.tribuo.regression.impl

Methods in org.tribuo.regression.impl with parameters of type Dataset

Modifier and Type

Method

Description

SkeletalIndependentRegressionSparseModel

SkeletalIndependentRegressionSparseTrainer.train(Dataset<Regressor> examples)

SkeletalIndependentRegressionSparseModel

SkeletalIndependentRegressionSparseTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)

SkeletalIndependentRegressionModel

SkeletalIndependentRegressionTrainer.train(Dataset<Regressor> examples)

SkeletalIndependentRegressionModel

SkeletalIndependentRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
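The skeletal trainers in this package wrap a single-dimension trainer and produce one prediction per output dimension independently. As a rough standalone illustration of that pattern (not Tribuo's code), the sketch below fits a separate ordinary least-squares line for each output dimension over a single input feature. Using simple linear regression as the per-dimension learner, and the class name IndependentRegressionSketch, are assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch, not Tribuo code: one independent least-squares line per output dimension. */
final class IndependentRegressionSketch {
    private IndependentRegressionSketch() {}

    /** Fits y[i][d] ~ w_d * x[i] + b_d separately for every output dimension d; returns {w_d, b_d} rows. */
    static double[][] fit(double[] x, double[][] y) {
        int n = x.length;
        if (n < 2 || y.length != n) {
            throw new IllegalArgumentException("need at least two rows and one target row per input");
        }
        double meanX = Arrays.stream(x).average().getAsDouble();
        double sxx = 0.0;
        for (double xi : x) {
            sxx += (xi - meanX) * (xi - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x must not be constant");
        }
        int dims = y[0].length;
        double[][] params = new double[dims][2];
        for (int d = 0; d < dims; d++) {
            // Each dimension is fitted from its own column of y only; other dimensions are never read.
            double meanY = 0.0;
            for (int i = 0; i < n; i++) {
                meanY += y[i][d];
            }
            meanY /= n;
            double sxy = 0.0;
            for (int i = 0; i < n; i++) {
                sxy += (x[i] - meanX) * (y[i][d] - meanY);
            }
            double w = sxy / sxx;
            params[d][0] = w;
            params[d][1] = meanY - w * meanX;
        }
        return params;
    }

    /** Produces one prediction per output dimension from the independently fitted lines. */
    static double[] predict(double[][] params, double x) {
        double[] out = new double[params.length];
        for (int d = 0; d < params.length; d++) {
            out[d] = params[d][0] * x + params[d][1];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {0.0, 1.0, 2.0, 3.0};
        // Two output dimensions: y0 = 2x + 1 and y1 = -x + 4.
        double[][] y = {{1.0, 4.0}, {3.0, 3.0}, {5.0, 2.0}, {7.0, 1.0}};
        System.out.println(Arrays.toString(predict(fit(x, y), 10.0))); // [21.0, -6.0]
    }
}
```

Because dimension d never reads the targets of any other dimension, this structure cannot exploit correlations between outputs; in exchange, any single-output learner can be dropped in per dimension.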
Uses of Dataset in org.tribuo.regression.liblinear

Methods in org.tribuo.regression.liblinear with parameters of type Dataset

Modifier and Type

Method

Description

protected com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]>

LibLinearRegressionTrainer.extractData(Dataset<Regressor> data, ImmutableOutputInfo<Regressor> outputInfo, ImmutableFeatureMap featureMap)
Uses of Dataset in org.tribuo.regression.libsvm

Methods in org.tribuo.regression.libsvm with parameters of type Dataset

Modifier and Type

Method

Description

protected com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]>

LibSVMRegressionTrainer.extractData(Dataset<Regressor> data, ImmutableOutputInfo<Regressor> outputInfo, ImmutableFeatureMap featureMap)
Uses of Dataset in org.tribuo.regression.rtree

Methods in org.tribuo.regression.rtree with parameters of type Dataset

Modifier and Type

Method

Description

protected AbstractTrainingNode<Regressor>

CARTJointRegressionTrainer.mkTrainingNode(Dataset<Regressor> examples)

protected AbstractTrainingNode<Regressor>

CARTRegressionTrainer.mkTrainingNode(Dataset<Regressor> examples)

TreeModel<Regressor>

CARTRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.regression.rtree.impl

Methods in org.tribuo.regression.rtree.impl with parameters of type Dataset

Modifier and Type

Method

Description

static RegressorTrainingNode.InvertedData

RegressorTrainingNode.invertData(Dataset<Regressor> examples)

Inverts a training dataset from row-major to column-major order.
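Inverting from row-major to column-major order regroups the data so that each feature's values across all examples sit together, which suits a tree learner that scans one feature at a time when looking for splits. A minimal standalone sketch of that idea follows. It assumes sparse rows stored as feature-id to value maps and returns per-feature parallel lists of example indices and values; this layout is an illustration, not Tribuo's InvertedData, and InvertSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not Tribuo code: row-major sparse rows to column-major feature columns. */
final class InvertSketch {
    private InvertSketch() {}

    /** Column-major view of one feature: parallel lists of example indices and that feature's values. */
    static final class Column {
        final List<Integer> exampleIndices = new ArrayList<>();
        final List<Double> values = new ArrayList<>();
    }

    /**
     * Converts one feature-id to value map per example into one Column per feature id.
     * Examples lacking a feature do not appear in that feature's column, and because
     * rows are visited in order, each column's example indices are ascending.
     */
    static Map<Integer, Column> invert(List<Map<Integer, Double>> rows) {
        Map<Integer, Column> columns = new TreeMap<>();
        for (int i = 0; i < rows.size(); i++) {
            for (Map.Entry<Integer, Double> feature : rows.get(i).entrySet()) {
                Column column = columns.computeIfAbsent(feature.getKey(), k -> new Column());
                column.exampleIndices.add(i);
                column.values.add(feature.getValue());
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Map<Integer, Double>> rows = List.of(
                Map.of(0, 1.0, 2, 5.0),  // example 0
                Map.of(1, 3.0),          // example 1
                Map.of(0, 2.0, 1, 4.0)); // example 2
        invert(rows).forEach((featureId, column) ->
                System.out.println("feature " + featureId + ": examples " + column.exampleIndices
                        + " values " + column.values));
    }
}
```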

Constructors in org.tribuo.regression.rtree.impl with parameters of type Dataset

Modifier

Constructor

Description

JointRegressorTrainingNode(RegressorImpurity impurity, Dataset<Regressor> examples, boolean normalize)

Constructor which creates the inverted file.
Uses of Dataset in org.tribuo.regression.sgd.linear

Methods in org.tribuo.regression.sgd.linear with parameters of type Dataset

Modifier and Type

Method

Description

LinearSGDModel

LinearSGDTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.regression.slm

Methods in org.tribuo.regression.slm with parameters of type Dataset

Modifier and Type

Method

Description

SparseModel<Regressor>

ElasticNetCDTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)

SparseLinearModel

SLMTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)

Trains a sparse linear model.
Uses of Dataset in org.tribuo.regression.xgboost

Methods in org.tribuo.regression.xgboost with parameters of type Dataset

Modifier and Type

Method

Description

XGBoostModel<Regressor>

XGBoostRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Uses of Dataset in org.tribuo.sequence

Methods in org.tribuo.sequence that return Dataset

Modifier and Type

Method

Description

Dataset<T>

SequenceDataset.getFlatDataset()

Returns a view on this SequenceDataset which aggregates all the examples and ignores the sequence structure.
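A flat view of this kind concatenates the examples of every sequence while forgetting where one sequence ends and the next begins. The idea can be sketched without Tribuo as a read-only java.util.AbstractList over a list of sequences. FlatView is hypothetical and only mirrors the described behaviour; it is not SequenceDataset's implementation.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical read-only view that concatenates sequences, ignoring the sequence boundaries. */
final class FlatView<T> extends AbstractList<T> {
    private final List<? extends List<? extends T>> sequences;

    FlatView(List<? extends List<? extends T>> sequences) {
        this.sequences = sequences;
    }

    @Override
    public T get(int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        int remaining = index;
        // Walk the sequences until the one containing the requested position is found.
        for (List<? extends T> sequence : sequences) {
            if (remaining < sequence.size()) {
                return sequence.get(remaining);
            }
            remaining -= sequence.size();
        }
        throw new IndexOutOfBoundsException("Index: " + index);
    }

    @Override
    public int size() {
        int total = 0;
        for (List<? extends T> sequence : sequences) {
            total += sequence.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = new ArrayList<>();
        sentences.add(List.of("The", "cat"));
        sentences.add(List.of("sat"));
        List<String> flat = new FlatView<>(sentences);
        System.out.println(flat); // [The, cat, sat]
        sentences.add(List.of("down")); // a view, not a copy: later additions show up
        System.out.println(flat.size()); // 4
    }
}
```

Being a view avoids copying every example, at the cost of a linear scan over the sequences for each get; a real implementation might cache cumulative offsets instead.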
Uses of Dataset in org.tribuo.transform

Methods in org.tribuo.transform with parameters of type Dataset

Modifier and Type

Method

Description

List<Prediction<T>>

TransformedModel.predict(Dataset<T> examples)

TransformedModel<T>

TransformTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)

<T extends Output<T>> MutableDataset<T>

TransformerMap.transformDataset(Dataset<T> dataset)

Copies the supplied dataset and applies the transformers to each example in it.

<T extends Output<T>> MutableDataset<T>

TransformerMap.transformDataset(Dataset<T> dataset, boolean densify)

Copies the supplied dataset and applies the transformers to each example in it.
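Both transformDataset overloads copy the supplied dataset and apply the transformers to each example, so the input is left unchanged. The standalone sketch below shows that copy-then-transform pattern under stated assumptions: examples are feature-name to value maps, each transformer is a DoubleUnaryOperator keyed by feature name, and densify is read as "add every transformed feature that an example lacks as an explicit 0.0 before transforming". None of these is Tribuo's representation, the densify reading is an assumption, and TransformSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of copy-then-transform; not TransformerMap's implementation. */
final class TransformSketch {
    private TransformSketch() {}

    static List<Map<String, Double>> transformDataset(List<Map<String, Double>> dataset,
                                                      Map<String, DoubleUnaryOperator> transformers,
                                                      boolean densify) {
        List<Map<String, Double>> output = new ArrayList<>(dataset.size());
        for (Map<String, Double> example : dataset) {
            // Copy first so the caller's examples are never modified.
            Map<String, Double> copy = new LinkedHashMap<>(example);
            if (densify) {
                // Assumed meaning of densify: materialise absent features as explicit zeros.
                for (String name : transformers.keySet()) {
                    copy.putIfAbsent(name, 0.0);
                }
            }
            // Apply the matching transformer to each feature; features without one are kept as-is.
            copy.replaceAll((name, value) -> {
                DoubleUnaryOperator t = transformers.get(name);
                return t == null ? value : t.applyAsDouble(value);
            });
            output.add(copy);
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, DoubleUnaryOperator> transformers = Map.of("A", v -> v * 2.0, "B", v -> v + 1.0);
        List<Map<String, Double>> data = List.of(Map.of("A", 3.0), Map.of("A", 1.0, "B", 1.0));
        System.out.println(transformDataset(data, transformers, false));
        System.out.println(transformDataset(data, transformers, true));
    }
}
```

Densifying matters for transformers whose output at zero is not zero (here "B" maps 0.0 to 1.0): without it, an example that simply lacks "B" keeps no "B" value at all.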
