Uses of Class
org.tribuo.Dataset
Packages that use Dataset
Package
Description
Provides the core interfaces and classes for using Tribuo.
Provides anomaly data generators used for demos and testing implementations.
Provides an interface to LibLinear-java for anomaly detection problems.
Provides an interface to LibSVM for anomaly detection problems.
Provides simple baseline multiclass classifiers.
Provides implementations of decision trees for classification problems.
Provides internal implementation classes for classification decision trees.
Provides majority vote ensemble combiners for classification
along with an implementation of multiclass Adaboost.
Provides a multiclass data generator used for testing implementations, along with several synthetic data generators
for 2d binary classification problems to be used in demos or tutorials.
Provides a set of main methods for interacting with classification tasks.
Provides an interface to LibLinear-java for classification problems.
Provides an interface to LibSVM for classification problems.
Provides an implementation of multinomial naive bayes (i.e., naive bayes for non-negative count data).
Provides a SGD implementation of a Kernel SVM using the Pegasos algorithm.
Provides an interface to XGBoost for classification problems.
Provides clustering data generators used for demos and testing implementations.
Provides an implementation of HDBSCAN*.
Provides a multithreaded implementation of K-Means, with a
configurable distance function.
Provides base classes for using liblinear from Tribuo.
The base interface to LibSVM.
Provides a K-Nearest Neighbours implementation which works across all Tribuo Output types.
Provides the base classes for models trained with stochastic gradient descent.
Provides common functionality for building decision trees, irrespective of the predicted Output.
Provides abstract classes for interfacing with XGBoost abstracting away all the Output dependent parts.
Provides classes for loading in data from disk, processing it into examples, and splitting datasets for things like cross-validation and train-test splits.
Provides classes which can load columnar data (using a RowProcessor) from a CSV (or other character delimited format) file.
Provides utility datasets which subsample or otherwise transform the wrapped dataset.
Simple data sources for ingesting or aggregating data.
Provides an interface for model prediction combinations,
two base classes for ensemble models, a base class for
ensemble excuses, and a Bagging implementation.
Evaluation base classes, along with code for train/test splits and cross validation.
This package contains the infrastructure classes for building evaluation metrics.
Provides the base interface and implementations of the Model hashing which obscures the feature names stored in a model.
Provides an interface to TensorFlow, allowing the training of non-sequential models using any supported Tribuo output type.
Provides a linear algebra system used for numerical operations in Tribuo.
Provides implementations of binary relevance based multi-label classification
algorithms.
Provides a multi-label ensemble combiner that performs a (possibly
weighted) majority vote among each label independently, along with an
implementation of classifier chain ensembles.
Provides a multi-label data generator for testing implementations and a
configurable data source suitable for demos and tests.
Provides Tribuo specific infrastructure for the Provenance system which tracks models and datasets.
Provides simple baseline regression predictors.
Provides some example regression data generators for testing implementations.
Provides an interface to liblinear for regression problems.
Provides an interface to LibSVM for regression problems.
Provides an implementation of decision trees for regression problems.
Provides internal implementation classes for the regression trees.
Provides implementations of sparse linear regression using various forms of regularisation penalty.
Provides an interface to XGBoost for regression problems.
Reproducibility utility based on Tribuo's provenance objects.
Provides core classes for working with sequences of Examples.
Provides infrastructure for applying transformations to a Dataset.
Uses of Dataset in org.tribuo
Subclasses of Dataset in org.tribuo
Modifier and Type | Class | Description
class | ImmutableDataset<T extends Output<T>> | This is a Dataset which has an ImmutableFeatureMap to store the feature information.
class | MutableDataset<T extends Output<T>> | A MutableDataset is a Dataset with a MutableFeatureMap which grows over time.
Methods in org.tribuo that return Dataset
Modifier and Type | Method | Description
 | Dataset.castDataset(Dataset<?> inputDataset, Class<T> outputType) | Casts the dataset to the specified output type, assuming it is valid.
Methods in org.tribuo with parameters of type Dataset
Modifier and Type | Method | Description
 | Dataset.castDataset(Dataset<?> inputDataset, Class<T> outputType) | Casts the dataset to the specified output type, assuming it is valid.
static <T extends Output<T>> ImmutableDataset<T> | ImmutableDataset.copyDataset(Dataset<T> dataset) | Creates an immutable deep copy of the supplied dataset.
static <T extends Output<T>> ImmutableDataset<T> | ImmutableDataset.copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo) | Creates an immutable deep copy of the supplied dataset, using a different feature and output map.
static <T extends Output<T>> ImmutableDataset<T> | ImmutableDataset.copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger) | Creates an immutable deep copy of the supplied dataset.
static <T extends Output<T>> MutableDataset<T> | MutableDataset.createDeepCopy(Dataset<T> other) | Creates a deep copy of the supplied Dataset which is mutable.
static <T extends Output<T>> ImmutableDataset<T> | ImmutableDataset.hashFeatureMap(Dataset<T> dataset, Hasher hasher) | Creates an immutable shallow copy of the supplied dataset, using the hasher to generate a HashedFeatureMap which transparently maps from the feature name to the hashed variant.
 | IncrementalTrainer.incrementalTrain(Dataset<T> newData, U model) | Incrementally trains the supplied model with the new data.
List<Prediction<T>> |  | Uses the model to predict the outputs for multiple examples contained in a data set.
default SparseModel<T> |  | Trains a sparse predictive model using the examples in the given data set.
 | SparseTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) | Trains a sparse predictive model using the examples in the given data set.
default SparseModel<T> | SparseTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) | Trains a predictive model using the examples in the given data set.
 |  | Trains a predictive model using the examples in the given data set.
 | Trainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) | Trains a predictive model using the examples in the given data set.
 | Trainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) | Trains a predictive model using the examples in the given data set.
Uses of Dataset in org.tribuo.anomaly.example
Methods in org.tribuo.anomaly.example that return types with arguments of type Dataset
Modifier and Type | Method | Description
 | AnomalyDataGenerator.denseTrainTest() | Makes a simple dataset for training and testing.
 | AnomalyDataGenerator.denseTrainTest(double negate) | Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
 | AnomalyDataGenerator.gaussianAnomaly() | Generates two datasets, one without anomalies drawn from a single gaussian and the second drawn from a mixture of two gaussians, with the second tagged anomalous.
 | AnomalyDataGenerator.gaussianAnomaly(long size, double fractionAnomalous) | Generates two datasets, one without anomalies drawn from a single gaussian and the second drawn from a mixture of two gaussians, with the second tagged anomalous.
 | AnomalyDataGenerator.sparseTrainTest() | Makes a simple dataset for training and testing.
 | AnomalyDataGenerator.sparseTrainTest(double negate) | Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
Uses of Dataset in org.tribuo.anomaly.liblinear
Methods in org.tribuo.anomaly.liblinear with parameters of type Dataset
Modifier and Type | Method | Description
protected com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]> | LibLinearAnomalyTrainer.extractData(Dataset<Event> data, ImmutableOutputInfo<Event> outputInfo, ImmutableFeatureMap featureMap) |
Uses of Dataset in org.tribuo.anomaly.libsvm
Methods in org.tribuo.anomaly.libsvm with parameters of type Dataset
Modifier and Type | Method | Description
protected com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]> | LibSVMAnomalyTrainer.extractData(Dataset<Event> data, ImmutableOutputInfo<Event> outputInfo, ImmutableFeatureMap featureMap) |
 | LibSVMAnomalyTrainer.train(Dataset<Event> dataset, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance) |
Uses of Dataset in org.tribuo.classification.baseline
Methods in org.tribuo.classification.baseline with parameters of type Dataset
Modifier and Type | Method | Description
 | DummyClassifierTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance) |
 | DummyClassifierTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.classification.dtree
Methods in org.tribuo.classification.dtree with parameters of type Dataset
Modifier and Type | Method | Description
protected AbstractTrainingNode<Label> | CARTClassificationTrainer.mkTrainingNode(Dataset<Label> examples, AbstractTrainingNode.LeafDeterminer leafDeterminer) |
Uses of Dataset in org.tribuo.classification.dtree.impl
Constructors in org.tribuo.classification.dtree.impl with parameters of type Dataset
Modifier | Constructor | Description
 | ClassifierTrainingNode(LabelImpurity impurity, Dataset<Label> examples, AbstractTrainingNode.LeafDeterminer leafDeterminer) | Constructor which creates the inverted file.
Uses of Dataset in org.tribuo.classification.ensemble
Methods in org.tribuo.classification.ensemble with parameters of type Dataset
Modifier and Type | Method | Description
 | AdaBoostTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) | If the trainer implements WeightedExamples then do boosting by weighting, otherwise do boosting by sampling.
 | AdaBoostTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.classification.example
Methods in org.tribuo.classification.example that return types with arguments of type Dataset
Modifier and Type | Method | Description
 | LabelledDataGenerator.binarySparseTrainTest() | Generates a pair of datasets with sparse features and unknown features in the test data.
 | LabelledDataGenerator.binarySparseTrainTest(double negate) | Generates a pair of datasets with sparse features and unknown features in the test data.
 | LabelledDataGenerator.denseTrainTest() | Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}, and there are 4 classes, {Foo,Bar,Baz,Quux}.
 | LabelledDataGenerator.denseTrainTest(double negate) | Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}, and there are 4 classes, {Foo,Bar,Baz,Quux}.
 | LabelledDataGenerator.sparseTrainTest() | Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
 | LabelledDataGenerator.sparseTrainTest(double negate) | Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
Uses of Dataset in org.tribuo.classification.experiments
Methods in org.tribuo.classification.experiments that return types with arguments of type Dataset
Uses of Dataset in org.tribuo.classification.liblinear
Methods in org.tribuo.classification.liblinear with parameters of type Dataset
Modifier and Type | Method | Description
protected com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]> | LibLinearClassificationTrainer.extractData(Dataset<Label> data, ImmutableOutputInfo<Label> outputInfo, ImmutableFeatureMap featureMap) |
Uses of Dataset in org.tribuo.classification.libsvm
Methods in org.tribuo.classification.libsvm with parameters of type Dataset
Modifier and Type | Method | Description
protected com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]> | LibSVMClassificationTrainer.extractData(Dataset<Label> data, ImmutableOutputInfo<Label> outputInfo, ImmutableFeatureMap featureMap) |
Uses of Dataset in org.tribuo.classification.mnb
Methods in org.tribuo.classification.mnb with parameters of type Dataset
Modifier and Type | Method | Description
 | MultinomialNaiveBayesTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | MultinomialNaiveBayesTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.classification.sgd.kernel
Methods in org.tribuo.classification.sgd.kernel with parameters of type Dataset
Modifier and Type | Method | Description
 | KernelSVMTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | KernelSVMTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.classification.xgboost
Methods in org.tribuo.classification.xgboost with parameters of type Dataset
Modifier and Type | Method | Description
 | XGBoostClassificationTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | XGBoostClassificationTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.clustering.example
Methods in org.tribuo.clustering.example that return Dataset
Modifier and Type | Method | Description
 | ClusteringDataGenerator.gaussianClusters(long size, long seed) | Generates a dataset drawn from a mixture of 5 2d gaussians.
Methods in org.tribuo.clustering.example that return types with arguments of type Dataset
Modifier and Type | Method | Description
 | ClusteringDataGenerator.denseTrainTest() | Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
 | ClusteringDataGenerator.denseTrainTest(double negate) | Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
 | ClusteringDataGenerator.sparseTrainTest() | Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
 | ClusteringDataGenerator.sparseTrainTest(double negate) | Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
Uses of Dataset in org.tribuo.clustering.hdbscan
Methods in org.tribuo.clustering.hdbscan with parameters of type Dataset
Uses of Dataset in org.tribuo.clustering.kmeans
Methods in org.tribuo.clustering.kmeans with parameters of type Dataset
Modifier and Type | Method | Description
 | KMeansTrainer.train(Dataset<ClusterID> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | KMeansTrainer.train(Dataset<ClusterID> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.common.liblinear
Methods in org.tribuo.common.liblinear with parameters of type Dataset
Modifier and Type | Method | Description
protected abstract com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]> | LibLinearTrainer.extractData(Dataset<T> data, ImmutableOutputInfo<T> outputInfo, ImmutableFeatureMap featureMap) | Extracts the features and Outputs in LibLinear's format.
 | LibLinearTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | LibLinearTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.common.libsvm
Methods in org.tribuo.common.libsvm with parameters of type Dataset
Modifier and Type | Method | Description
protected abstract com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]> | LibSVMTrainer.extractData(Dataset<T> data, ImmutableOutputInfo<T> outputInfo, ImmutableFeatureMap featureMap) | Extracts the features and Outputs in LibSVM's format.
 | LibSVMTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | LibSVMTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.common.nearest
Methods in org.tribuo.common.nearest with parameters of type Dataset
Uses of Dataset in org.tribuo.common.sgd
Methods in org.tribuo.common.sgd with parameters of type Dataset
Modifier and Type | Method | Description
 | AbstractSGDTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | AbstractSGDTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.common.tree
Methods in org.tribuo.common.tree with parameters of type Dataset
Modifier and Type | Method | Description
protected abstract AbstractTrainingNode<T> | AbstractCARTTrainer.mkTrainingNode(Dataset<T> examples, AbstractTrainingNode.LeafDeterminer leafDeterminer) | Makes the initial training node.
 | AbstractCARTTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | AbstractCARTTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.common.xgboost
Methods in org.tribuo.common.xgboost with parameters of type Dataset
Modifier and Type | Method | Description
protected static <T extends Output<T>> XGBoostTrainer.DMatrixTuple<T> | XGBoostTrainer.convertDataset(Dataset<T> examples) |
protected static <T extends Output<T>> XGBoostTrainer.DMatrixTuple<T> | XGBoostTrainer.convertDataset(Dataset<T> examples, Function<T, Float> responseExtractor) |
List<Prediction<T>> |  | Uses the model to predict the labels for multiple examples contained in a data set.
Uses of Dataset in org.tribuo.data
Methods in org.tribuo.data that return types with arguments of type Dataset
Modifier and Type | Method | Description
 | DataOptions.load(OutputFactory<T> outputFactory) | Loads the training and testing data from DataOptions.trainingPath and DataOptions.testingPath according to the other parameters specified in this class.
Uses of Dataset in org.tribuo.data.csv
Methods in org.tribuo.data.csv with parameters of type Dataset
Uses of Dataset in org.tribuo.dataset
Subclasses of Dataset in org.tribuo.dataset
Modifier and Type | Class | Description
final class | DatasetView<T extends Output<T>> | DatasetView provides an immutable view on another Dataset that only exposes selected examples.
class | MinimumCardinalityDataset<T extends Output<T>> | This class creates a pruned dataset in which low frequency features that occur less than the provided minimum cardinality have been removed.
Methods in org.tribuo.dataset with parameters of type Dataset
Modifier and Type | Method | Description
static <T extends Output<T>> DatasetView<T> | DatasetView.createBootstrapView(Dataset<T> dataset, int size, long seed) | Generates a DatasetView bootstrapped from the supplied Dataset.
static <T extends Output<T>> DatasetView<T> | DatasetView.createBootstrapView(Dataset<T> dataset, int size, long seed, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs) | Generates a DatasetView bootstrapped from the supplied Dataset.
static <T extends Output<T>> DatasetView<T> | DatasetView.createView(Dataset<T> dataset, Predicate<Example<T>> predicate, String tag) | Creates a view from the supplied dataset, using the specified predicate to test if each example should be in this view.
static <T extends Output<T>> DatasetView<T> | DatasetView.createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights) | Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
static <T extends Output<T>> DatasetView<T> | DatasetView.createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs) | Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
Constructors in org.tribuo.dataset with parameters of type Dataset
Modifier | Constructor | Description
 | DatasetView(Dataset<T> dataset, int[] exampleIndices, String tag) | Creates a DatasetView which includes the supplied indices from the dataset.
 | DatasetView(Dataset<T> dataset, int[] exampleIndices, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, String tag) | Creates a DatasetView which includes the supplied indices from the dataset.
 | MinimumCardinalityDataset(Dataset<T> dataset, int minCardinality) |
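The bootstrap views listed for this package select examples by index, with a size and a seed. As a rough illustration of that idea only, the sketch below draws indices uniformly with replacement using a seeded generator. The class BootstrapSketch and its method are hypothetical names, and the use of java.util.Random is an assumption; this is not Tribuo's implementation and the indices it produces need not match those chosen by DatasetView.createBootstrapView.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper (not part of Tribuo): index-based bootstrap sampling. */
final class BootstrapSketch {

    /**
     * Draws {@code size} indices uniformly, with replacement, from [0, datasetSize).
     * The same seed always yields the same indices.
     */
    static int[] bootstrapIndices(int datasetSize, int size, long seed) {
        if (datasetSize <= 0 || size < 0) {
            throw new IllegalArgumentException("datasetSize must be positive and size non-negative");
        }
        Random rng = new Random(seed); // assumption: any seeded PRNG would do here
        int[] indices = new int[size];
        for (int i = 0; i < size; i++) {
            indices[i] = rng.nextInt(datasetSize);
        }
        return indices;
    }

    public static void main(String[] args) {
        // Indices like these could back a view that exposes only the sampled examples.
        System.out.println(Arrays.toString(bootstrapIndices(10, 5, 42L)));
    }
}
```

Because sampling is with replacement, an index may appear more than once and some examples may not appear at all.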
Uses of Dataset in org.tribuo.datasource
Methods in org.tribuo.datasource with parameters of type Dataset
Modifier and Type | Method | Description
static <T extends Output<T>> void | LibSVMDataSource.writeLibSVMFormat(Dataset<T> dataset, PrintStream out, boolean zeroIndexed, Function<T, Number> transformationFunc) | Writes out a dataset in LibSVM format.
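For readers unfamiliar with the LibSVM text format, each line holds a label followed by space-separated index:value pairs in ascending index order. The sketch below formats one such line from a map of zero-based feature ids. LibSVMLineSketch and formatLine are hypothetical names, and treating zeroIndexed as "do not shift the ids by one" is an assumption about what that flag means; the actual output of LibSVMDataSource.writeLibSVMFormat may differ in detail.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Tribuo): formats one example as a LibSVM line. */
final class LibSVMLineSketch {

    /**
     * @param label       the numeric label written at the start of the line
     * @param features    zero-based feature id to value
     * @param zeroIndexed if false, ids are shifted by one so the first feature is 1
     */
    static String formatLine(Number label, Map<Integer, Double> features, boolean zeroIndexed) {
        StringBuilder sb = new StringBuilder();
        sb.append(label);
        int offset = zeroIndexed ? 0 : 1;
        // A TreeMap copy guarantees ascending feature indices.
        for (Map.Entry<Integer, Double> e : new TreeMap<>(features).entrySet()) {
            sb.append(' ').append(e.getKey() + offset).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatLine(1, Map.of(0, 0.5, 3, 2.0), false)); // prints "1 1:0.5 4:2.0"
    }
}
```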
Uses of Dataset in org.tribuo.ensemble
Methods in org.tribuo.ensemble with parameters of type Dataset
Modifier and Type | Method | Description
 | BaggingTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | BaggingTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
 | BaggingTrainer.trainSingleModel(Dataset<T> examples, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, int randInt, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) | Trains a single model.
Uses of Dataset in org.tribuo.evaluation
Methods in org.tribuo.evaluation with parameters of type Dataset
Modifier and Type | Method | Description
static <T extends Output<T>, C extends MetricContext<T>> com.oracle.labs.mlrg.olcut.util.Pair<Integer,Double> | EvaluationAggregator.argmax(EvaluationMetric<T, C> metric, List<? extends Model<T>> models, Dataset<T> dataset) | Calculates the argmax of a metric across the supplied models (i.e., the index of the model which performed the best).
final E |  | Produces an evaluation for the supplied model and dataset, by calling Model.predict(org.tribuo.Example<T>) to create the predictions, then aggregating the appropriate statistics.
 |  | Evaluates the dataset using the supplied model, returning an immutable Evaluation of the appropriate type.
 |  | Splits a dataset into k consecutive folds; for each fold, the remaining k-1 folds form the training set.
static <T extends Output<T>, C extends MetricContext<T>> DescriptiveStats | EvaluationAggregator.summarize(List<? extends EvaluationMetric<T, C>> metrics, Model<T> model, Dataset<T> dataset) | Summarize model performance on dataset across several metrics.
static <T extends Output<T>, R extends Evaluation<T>> Map<MetricID<T>,DescriptiveStats> | EvaluationAggregator.summarize(Evaluator<T, R> evaluator, List<? extends Model<T>> models, Dataset<T> dataset) | Summarize performance using the supplied evaluator across several models on one dataset.
static <T extends Output<T>, C extends MetricContext<T>> DescriptiveStats | EvaluationAggregator.summarize(EvaluationMetric<T, C> metric, List<? extends Model<T>> models, Dataset<T> dataset) | Summarize performance w.r.t.
Method parameters in org.tribuo.evaluation with type arguments of type Dataset
Modifier and Type | Method | Description
static <T extends Output<T>, C extends MetricContext<T>> com.oracle.labs.mlrg.olcut.util.Pair<Integer,Double> | EvaluationAggregator.argmax(EvaluationMetric<T, C> metric, Model<T> model, List<? extends Dataset<T>> datasets) | Calculates the argmax of a metric across the supplied datasets.
static <T extends Output<T>, R extends Evaluation<T>> Map<MetricID<T>,DescriptiveStats> | EvaluationAggregator.summarize(Evaluator<T, R> evaluator, Model<T> model, List<? extends Dataset<T>> datasets) | Summarize performance according to evaluator for a single model across several datasets.
static <T extends Output<T>, C extends MetricContext<T>> DescriptiveStats | EvaluationAggregator.summarize(EvaluationMetric<T, C> metric, Model<T> model, List<? extends Dataset<T>> datasets) | Summarize a model's performance w.r.t.
Constructors in org.tribuo.evaluation with parameters of type Dataset
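The k-fold split described for this package (k consecutive folds, with the remaining k-1 folds forming the training set for each) can be sketched over plain example indices. The code below is a minimal illustration under stated assumptions: KFoldSketch is a hypothetical name, no shuffling is performed, and any remainder is spread over the first folds. Tribuo's own splitter may shuffle or size the folds differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Tribuo): consecutive k-fold index splitting. */
final class KFoldSketch {

    /**
     * Returns k pairs of index arrays; element [0] of each pair holds the training
     * indices and element [1] holds the test indices for that fold.
     */
    static List<int[][]> split(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<int[][]> folds = new ArrayList<>();
        int base = n / k;
        int remainder = n % k; // assumption: the first `remainder` folds get one extra example
        int start = 0;
        for (int f = 0; f < k; f++) {
            int foldSize = base + (f < remainder ? 1 : 0);
            int end = start + foldSize;
            int[] test = new int[foldSize];
            int[] train = new int[n - foldSize];
            int t = 0;
            for (int i = 0; i < n; i++) {
                if (i >= start && i < end) {
                    test[i - start] = i;   // inside the current fold
                } else {
                    train[t++] = i;        // every other fold contributes to training
                }
            }
            folds.add(new int[][] {train, test});
            start = end;
        }
        return folds;
    }
}
```

With n = 5 and k = 2 this yields test folds {0, 1, 2} and {3, 4}, each paired with the complementary indices as its training set.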
Uses of Dataset in org.tribuo.evaluation.metrics
Methods in org.tribuo.evaluation.metrics with parameters of type Dataset
Modifier and Type | Method | Description
default C | EvaluationMetric.createContext(Model<T> model, Dataset<T> dataset) | Creates the metric context used to compute this metric's value, generating Predictions for each Example in the supplied dataset.
Uses of Dataset in org.tribuo.hash
Methods in org.tribuo.hash with parameters of type Dataset
Modifier and Type | Method | Description
 | HashingTrainer.train(Dataset<T> dataset, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance) | This clones the Dataset, hashes each of the examples and rewrites their feature ids before passing it to the inner trainer.
 | HashingTrainer.train(Dataset<T> dataset, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance, int invocationCount) |
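To illustrate the general idea of hashing feature names so that a trained model no longer exposes them, the sketch below replaces each name with a salted SHA-256 hex digest. FeatureHashSketch, the salt parameter, the choice of SHA-256, and summing values when two names collide are all assumptions made for the example; they are not taken from HashingTrainer or any Tribuo Hasher.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Tribuo): obscures feature names by hashing them. */
final class FeatureHashSketch {

    /** Returns the lowercase hex SHA-256 digest of salt + name. */
    static String hashName(String name, String salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((salt + name).getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Rewrites every feature name to its hash; colliding names have their values summed. */
    static Map<String, Double> hashFeatures(Map<String, Double> features, String salt) {
        Map<String, Double> hashed = new TreeMap<>();
        for (Map.Entry<String, Double> e : features.entrySet()) {
            hashed.merge(hashName(e.getKey(), salt), e.getValue(), Double::sum);
        }
        return hashed;
    }
}
```

A downstream trainer would then see only the hashed names, so the original feature vocabulary cannot be read back out of the stored model without the salt and a dictionary of candidate names.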
Uses of Dataset in org.tribuo.interop.tensorflow
Methods in org.tribuo.interop.tensorflow with parameters of type Dataset
Uses of Dataset in org.tribuo.math.la
Methods in org.tribuo.math.la with parameters of type Dataset
Modifier and Type | Method | Description
static <T extends Output<T>> SparseVector[] |  | Converts a dataset of row-major examples into an array of column-major sparse vectors.
static <T extends Output<T>> SparseVector[] | SparseVector.transpose(Dataset<T> dataset, ImmutableFeatureMap fMap) | Converts a dataset of row-major examples into an array of column-major sparse vectors.
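Converting row-major examples into column-major sparse vectors means that, instead of one sparse vector per example indexed by feature, there is one sparse vector per feature indexed by example. The sketch below shows that rearrangement with plain maps standing in for sparse vectors. TransposeSketch is a hypothetical name and the map-based representation is an assumption for illustration; it does not reflect how SparseVector stores its data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Tribuo): row-major to column-major sparse transpose. */
final class TransposeSketch {

    /**
     * @param rows        one map per example, feature index to value (absent means zero)
     * @param numFeatures total number of feature indices
     * @return one map per feature, example index to value
     */
    static List<Map<Integer, Double>> transpose(List<Map<Integer, Double>> rows, int numFeatures) {
        List<Map<Integer, Double>> columns = new ArrayList<>();
        for (int c = 0; c < numFeatures; c++) {
            columns.add(new TreeMap<>());
        }
        for (int r = 0; r < rows.size(); r++) {
            for (Map.Entry<Integer, Double> e : rows.get(r).entrySet()) {
                // The value at (row r, feature c) moves to (column c, position r).
                columns.get(e.getKey()).put(r, e.getValue());
            }
        }
        return columns;
    }
}
```

A column-major layout like this is convenient when an algorithm scans one feature across all examples, for example when evaluating candidate splits in a decision tree.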
Uses of Dataset in org.tribuo.multilabel.baseline
Methods in org.tribuo.multilabel.baseline with parameters of type Dataset
Modifier and Type | Method | Description
 | ClassifierChainTrainer.train(Dataset<MultiLabel> examples) |
 | ClassifierChainTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | ClassifierChainTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
 | IndependentMultiLabelTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | IndependentMultiLabelTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.multilabel.ensemble
Methods in org.tribuo.multilabel.ensemble with parameters of type Dataset
Modifier and Type | Method | Description
 | CCEnsembleTrainer.train(Dataset<MultiLabel> examples) |
 | CCEnsembleTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) |
 | CCEnsembleTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount) |
Uses of Dataset in org.tribuo.multilabel.example
Methods in org.tribuo.multilabel.example that return Dataset
Modifier and Type | Method | Description
static Dataset<MultiLabel> | MultiLabelDataGenerator.generateTestData() | Simple test data for checking multi-label trainers.
static Dataset<MultiLabel> | MultiLabelDataGenerator.generateTrainData() | Simple training data for checking multi-label trainers.
Methods in org.tribuo.multilabel.example that return types with arguments of type Dataset
Modifier and Type | Method | Description
static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<MultiLabel>, Dataset<MultiLabel>> | MultiLabelDataGenerator.generateDataset() | Generate training and testing datasets.
Uses of Dataset in org.tribuo.provenance
Constructors in org.tribuo.provenance with parameters of type Dataset
Modifier | Constructor | Description
 | DatasetProvenance(DataProvenance sourceProvenance, com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformationProvenance, Dataset<T> dataset) | Creates a dataset provenance from the supplied dataset.
Uses of Dataset in org.tribuo.regression.baseline
Methods in org.tribuo.regression.baseline with parameters of type Dataset
Modifier and Type
Method
Description
DummyRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
DummyRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance, int invocationCount)
Uses of Dataset in org.tribuo.regression.example
Methods in org.tribuo.regression.example that return types with arguments of type Dataset
Modifier and Type
Method
Description
RegressionDataGenerator.denseTrainTest()
Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}.
RegressionDataGenerator.denseTrainTest(double negate)
Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}.
RegressionDataGenerator.multiDimDenseTrainTest()
Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}.
RegressionDataGenerator.multiDimDenseTrainTest(double negate)
Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}.
RegressionDataGenerator.multiDimSparseTrainTest()
Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
RegressionDataGenerator.multiDimSparseTrainTest(double negate)
Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
RegressionDataGenerator.sparseTrainTest()
Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
RegressionDataGenerator.sparseTrainTest(double negate)
Generates a pair of datasets, where the features are sparse, and unknown features appear in the test data.
RegressionDataGenerator.threeDimDenseTrainTest(double negate, boolean remapIndices)
Generates a train/test dataset pair which is dense in the features, each example has 4 features, {A,B,C,D}.
Uses of Dataset in org.tribuo.regression.impl
Methods in org.tribuo.regression.impl with parameters of type Dataset
Modifier and Type
Method
Description
SkeletalIndependentRegressionSparseTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
SkeletalIndependentRegressionSparseTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
SkeletalIndependentRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
SkeletalIndependentRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.regression.liblinear
Methods in org.tribuo.regression.liblinear with parameters of type Dataset
Modifier and Type
Method
Description
protected com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]>
LibLinearRegressionTrainer.extractData(Dataset<Regressor> data, ImmutableOutputInfo<Regressor> outputInfo, ImmutableFeatureMap featureMap)
Uses of Dataset in org.tribuo.regression.libsvm
Methods in org.tribuo.regression.libsvm with parameters of type Dataset
Modifier and Type
Method
Description
protected com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]>
LibSVMRegressionTrainer.extractData(Dataset<Regressor> data, ImmutableOutputInfo<Regressor> outputInfo, ImmutableFeatureMap featureMap)
Uses of Dataset in org.tribuo.regression.rtree
Methods in org.tribuo.regression.rtree with parameters of type Dataset
Modifier and Type
Method
Description
protected AbstractTrainingNode<Regressor>
CARTJointRegressionTrainer.mkTrainingNode(Dataset<Regressor> examples, AbstractTrainingNode.LeafDeterminer leafDeterminer)
protected AbstractTrainingNode<Regressor>
CARTRegressionTrainer.mkTrainingNode(Dataset<Regressor> examples, AbstractTrainingNode.LeafDeterminer leafDeterminer)
CARTRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
CARTRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.regression.rtree.impl
Methods in org.tribuo.regression.rtree.impl with parameters of type Dataset
Modifier and Type
Method
Description
RegressorTrainingNode.invertData(Dataset<Regressor> examples)
Inverts a training dataset from row major to column major.
Constructors in org.tribuo.regression.rtree.impl with parameters of type Dataset
Modifier
Constructor
Description
JointRegressorTrainingNode(RegressorImpurity impurity, Dataset<Regressor> examples, boolean normalize, AbstractTrainingNode.LeafDeterminer leafDeterminer)
Constructor which creates the inverted file.
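The row-major to column-major inversion mentioned for RegressorTrainingNode.invertData can be illustrated with a minimal sketch. It does not use Tribuo's classes: ColumnMajorSketch and its invert method are hypothetical names, and a plain dense double[][] stands in for the dataset, so this shows only the general idea rather than the library's implementation.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Tribuo: transposes a dense row-major matrix. */
final class ColumnMajorSketch {
    private ColumnMajorSketch() {}

    /**
     * Converts rows[example][feature] into columns[feature][example].
     * Assumes the input is rectangular, i.e. every example has the same number of features.
     */
    static double[][] invert(double[][] rows) {
        if (rows.length == 0) {
            return new double[0][0];
        }
        int numFeatures = rows[0].length;
        double[][] columns = new double[numFeatures][rows.length];
        for (int i = 0; i < rows.length; i++) {
            if (rows[i].length != numFeatures) {
                throw new IllegalArgumentException(
                    "Row " + i + " has " + rows[i].length + " features, expected " + numFeatures);
            }
            for (int j = 0; j < numFeatures; j++) {
                // Value of feature j in example i moves to column j, position i.
                columns[j][i] = rows[i][j];
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        double[][] rows = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
        // prints [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
        System.out.println(Arrays.deepToString(invert(rows)));
    }
}
```

A column-major layout of this kind keeps all values of one feature contiguous, which is convenient when a tree learner scans a single feature for candidate splits.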
Uses of Dataset in org.tribuo.regression.slm
Methods in org.tribuo.regression.slm with parameters of type Dataset
Modifier and Type
Method
Description
ElasticNetCDTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
ElasticNetCDTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
SLMTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Trains a sparse linear model.
SLMTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Trains a sparse linear model.
Uses of Dataset in org.tribuo.regression.xgboost
Methods in org.tribuo.regression.xgboost with parameters of type Dataset
Modifier and Type
Method
Description
XGBoostRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
XGBoostRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.reproducibility
Methods in org.tribuo.reproducibility that return Dataset
Modifier and Type
Method
Description
ReproUtil.recoverDataset()
Return a Dataset used when a model was trained.
Uses of Dataset in org.tribuo.sequence
Methods in org.tribuo.sequence that return Dataset
Modifier and Type
Method
Description
SequenceDataset.getFlatDataset()
Returns a view on this SequenceDataset which aggregates all the examples and ignores the sequence structure.
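The idea of a flat view over sequence-structured data can be sketched in plain Java. The FlatView class below is hypothetical and unrelated to Tribuo's own types: it wraps a list of sequences and exposes their elements as a single read-only list without copying, so later changes to the underlying sequences remain visible. How SequenceDataset.getFlatDataset() is implemented internally is not stated here, so this is an illustration of the concept only.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical read-only view that concatenates several sequences, ignoring their boundaries. */
final class FlatView<E> extends AbstractList<E> {
    private final List<? extends List<? extends E>> sequences;

    FlatView(List<? extends List<? extends E>> sequences) {
        this.sequences = sequences;
    }

    @Override
    public E get(int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        int remaining = index;
        // Walk the sequences until the index falls inside one of them.
        for (List<? extends E> sequence : sequences) {
            if (remaining < sequence.size()) {
                return sequence.get(remaining);
            }
            remaining -= sequence.size();
        }
        throw new IndexOutOfBoundsException("Index: " + index);
    }

    @Override
    public int size() {
        int total = 0;
        for (List<? extends E> sequence : sequences) {
            total += sequence.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> sequences = new ArrayList<>();
        sequences.add(new ArrayList<>(Arrays.asList("a", "b")));
        sequences.add(new ArrayList<>(Arrays.asList("c")));
        FlatView<String> view = new FlatView<>(sequences);
        System.out.println(view); // prints [a, b, c]
    }
}
```

Because the view recomputes positions on each call, it trades lookup speed for avoiding a copy; a production implementation might cache sequence offsets instead.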
Uses of Dataset in org.tribuo.transform
Methods in org.tribuo.transform with parameters of type Dataset
Modifier and Type
Method
Description
List<Prediction<T>>
TransformTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
TransformTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance, int invocationCount)
<T extends Output<T>> MutableDataset<T>
TransformerMap.transformDataset(Dataset<T> dataset)
Copies the supplied dataset and applies the transformers to each example in it.
<T extends Output<T>> MutableDataset<T>
TransformerMap.transformDataset(Dataset<T> dataset, boolean densify)
Copies the supplied dataset and applies the transformers to each example in it.
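The copy-then-transform behaviour described for TransformerMap.transformDataset can be illustrated with a small stand-alone sketch. Everything in it is an assumption made for illustration: CopyAndTransformSketch is a hypothetical class, each example is modelled as a Map from feature name to value, and each transformer as a DoubleUnaryOperator keyed by feature name. It does not model the densify flag and does not use Tribuo's Transformer or Example types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical illustration of "copy the dataset, then transform each example in the copy". */
final class CopyAndTransformSketch {
    private CopyAndTransformSketch() {}

    static List<Map<String, Double>> transform(List<Map<String, Double>> dataset,
                                               Map<String, DoubleUnaryOperator> transformers) {
        List<Map<String, Double>> copy = new ArrayList<>(dataset.size());
        for (Map<String, Double> example : dataset) {
            // Copy first so the supplied dataset is left untouched.
            Map<String, Double> newExample = new HashMap<>(example);
            for (Map.Entry<String, Double> feature : newExample.entrySet()) {
                DoubleUnaryOperator transformer = transformers.get(feature.getKey());
                if (transformer != null) {
                    feature.setValue(transformer.applyAsDouble(feature.getValue()));
                }
            }
            copy.add(newExample);
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Double> example = new HashMap<>();
        example.put("A", 2.0);
        example.put("B", 3.0);
        List<Map<String, Double>> dataset = new ArrayList<>();
        dataset.add(example);

        Map<String, DoubleUnaryOperator> transformers = new HashMap<>();
        transformers.put("A", v -> v * 10.0); // scale feature A only

        List<Map<String, Double>> transformed = transform(dataset, transformers);
        System.out.println("original A = " + dataset.get(0).get("A"));
        System.out.println("transformed A = " + transformed.get(0).get("A"));
    }
}
```

Features without a registered transformer pass through unchanged, and the input list is never mutated, which mirrors the "copies the supplied dataset" wording above.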