Uses of Class org.tribuo.Dataset
Packages that use Dataset:
Provides the core interfaces and classes for using Tribuo.
Provides anomaly data generators used for demos and testing implementations.
Provides an interface to LibLinear-java for anomaly detection problems.
Provides an interface to LibSVM for anomaly detection problems.
Provides simple baseline multiclass classifiers.
Provides implementations of decision trees for classification problems.
Provides internal implementation classes for classification decision trees.
Provides majority vote ensemble combiners for classification
along with an implementation of multiclass Adaboost.
Provides a multiclass data generator used for testing implementations, along with several synthetic data generators
for 2d binary classification problems to be used in demos or tutorials.
Provides a set of main methods for interacting with classification tasks.
Information theoretic feature selection algorithms.
Provides an interface to LibLinear-java for classification problems.
Provides an interface to LibSVM for classification problems.
Provides an implementation of multinomial naive Bayes (i.e., naive Bayes for non-negative count data).
Provides an SGD implementation of a Kernel SVM using the Pegasos algorithm.
Provides an interface to XGBoost for classification problems.
Provides clustering data generators used for demos and testing implementations.
Provides an implementation of HDBSCAN*.
Provides a multithreaded implementation of K-Means, with a
configurable distance function.
Provides base classes for using liblinear from Tribuo.
The base interface to LibSVM.
Provides a K-Nearest Neighbours implementation which works across all Tribuo Output types.
Provides the base classes for models trained with stochastic gradient descent.
Provides common functionality for building decision trees, irrespective of the predicted Output.
Provides abstract classes for interfacing with XGBoost, abstracting away all the Output-dependent parts.
Provides classes for loading in data from disk, processing it into examples, and splitting datasets for
things like cross-validation and train-test splits.
Provides classes which can load columnar data (using a RowProcessor) from a CSV (or other character-delimited format) file.
Provides utility datasets which subsample or otherwise transform the wrapped dataset.
Simple data sources for ingesting or aggregating data.
Provides an interface for model prediction combinations,
two base classes for ensemble models, a base class for
ensemble excuses, and a Bagging implementation.
Evaluation base classes, along with code for train/test splits and cross validation.
This package contains the infrastructure classes for building evaluation metrics.
Provides the base interface and implementations of Model hashing, which obscures the feature names stored in a model.
Provides an interface to TensorFlow, allowing the training of non-sequential models using any supported
Tribuo output type.
Provides a linear algebra system used for numerical operations in Tribuo.
Provides implementations of binary relevance based multi-label classification
algorithms.
Provides a multi-label ensemble combiner that performs a (possibly
weighted) majority vote among each label independently, along with an
implementation of classifier chain ensembles.
Provides a multi-label data generator for testing implementations and a
configurable data source suitable for demos and tests.
Provides Tribuo-specific infrastructure for the Provenance system, which tracks models and datasets.
Provides simple baseline regression predictors.
Provides some example regression data generators for testing implementations.
Provides an interface to liblinear for regression problems.
Provides an interface to LibSVM for regression problems.
Provides an implementation of decision trees for regression problems.
Provides internal implementation classes for the regression trees.
Provides implementations of sparse linear regression using various forms of regularisation penalty.
Provides an interface to XGBoost for regression problems.
Reproducibility utility based on Tribuo's provenance objects.
Provides core classes for working with sequences of Examples.
Provides infrastructure for applying transformations to a Dataset.
Uses of Dataset in org.tribuo
Subclasses of Dataset:
class ImmutableDataset<T extends Output<T>>
  This is a Dataset which has an ImmutableFeatureMap to store the feature information.
class MutableDataset<T extends Output<T>>
  A MutableDataset is a Dataset with a MutableFeatureMap which grows over time.
Methods that return Dataset:
Dataset.castDataset(Dataset<?> inputDataset, Class<T> outputType)
  Casts the dataset to the specified output type, assuming it is valid.
static Dataset<?> Dataset.deserialize(org.tribuo.protos.core.DatasetProto datasetProto)
  Deserializes a dataset proto into a dataset.
static Dataset<?> Dataset.deserializeFromFile(Path path)
  Reads an instance of DatasetProto from the supplied path and deserializes it.
static Dataset<?> Dataset.deserializeFromStream(InputStream is)
  Reads an instance of DatasetProto from the supplied input stream and deserializes it.
Methods with parameters of type Dataset:
Dataset.castDataset(Dataset<?> inputDataset, Class<T> outputType)
  Casts the dataset to the specified output type, assuming it is valid.
static <T extends Output<T>> ImmutableDataset<T> ImmutableDataset.copyDataset(Dataset<T> dataset)
  Creates an immutable deep copy of the supplied dataset.
static <T extends Output<T>> ImmutableDataset<T> ImmutableDataset.copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo)
  Creates an immutable deep copy of the supplied dataset, using a different feature and output map.
static <T extends Output<T>> ImmutableDataset<T> ImmutableDataset.copyDataset(Dataset<T> dataset, ImmutableFeatureMap featureIDMap, ImmutableOutputInfo<T> outputIDInfo, Merger merger)
  Creates an immutable deep copy of the supplied dataset.
static <T extends Output<T>> MutableDataset<T> MutableDataset.createDeepCopy(Dataset<T> other)
  Creates a deep copy of the supplied Dataset which is mutable.
static <T extends Output<T>> ImmutableDataset<T> ImmutableDataset.hashFeatureMap(Dataset<T> dataset, Hasher hasher)
  Creates an immutable shallow copy of the supplied dataset, using the hasher to generate a HashedFeatureMap which transparently maps from the feature name to the hashed variant.
IncrementalTrainer.incrementalTrain(Dataset<T> newData, U model)
  Incrementally trains the supplied model with the new data.
List<Prediction<T>>
  Uses the model to predict the outputs for multiple examples contained in a data set.
Selects features according to this selection algorithm from the specified dataset.
default SparseModel<T>
  Trains a sparse predictive model using the examples in the given data set.
SparseTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
  Trains a sparse predictive model using the examples in the given data set.
default SparseModel<T> SparseTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
  Trains a predictive model using the examples in the given data set.
Trains a predictive model using the examples in the given data set.
Trainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
  Trains a predictive model using the examples in the given data set.
Trainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
  Trains a predictive model using the examples in the given data set.
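ImmutableDataset and MutableDataset differ in their feature maps: the mutable one grows as new feature names are observed, while the immutable one fixes the feature information. The following is a minimal JDK-only sketch of that idea, assuming a map that counts observations and is later frozen into name-to-id assignments. FeatureMapSketch is hypothetical (it is not Tribuo's MutableFeatureMap or ImmutableFeatureMap), and assigning ids in lexicographic order of name is an assumption made for determinism.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a growing feature map and its frozen,
// id-assigning counterpart. Not Tribuo's MutableFeatureMap/ImmutableFeatureMap.
final class FeatureMapSketch {
    // Mutable side: feature name -> number of times it has been observed.
    private final Map<String, Integer> counts = new HashMap<>();

    // Records one observation; a previously unseen name grows the map.
    void observe(String featureName) {
        counts.merge(featureName, 1, Integer::sum);
    }

    int size() {
        return counts.size();
    }

    // Freezes the current contents into an unmodifiable name -> id map.
    // Assumption: ids follow lexicographic order of the feature names.
    Map<String, Integer> freeze() {
        List<String> names = new ArrayList<>(counts.keySet());
        Collections.sort(names);
        Map<String, Integer> ids = new HashMap<>();
        for (int i = 0; i < names.size(); i++) {
            ids.put(names.get(i), i);
        }
        return Collections.unmodifiableMap(ids);
    }
}
```

The frozen map is a separate copy, so later calls to observe on the mutable side do not change ids that have already been handed out.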
Uses of Dataset in org.tribuo.anomaly.example
Methods that return types with arguments of type Dataset:
AnomalyDataGenerator.denseTrainTest()
  Makes a simple dataset for training and testing.
AnomalyDataGenerator.denseTrainTest(double negate)
  Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
AnomalyDataGenerator.gaussianAnomaly()
  Generates two datasets, one without anomalies drawn from a single Gaussian and the second drawn from a mixture of two Gaussians, with the second tagged anomalous.
AnomalyDataGenerator.gaussianAnomaly(long size, double fractionAnomalous)
  Generates two datasets, one without anomalies drawn from a single Gaussian and the second drawn from a mixture of two Gaussians, with the second tagged anomalous.
AnomalyDataGenerator.sparseTrainTest()
  Makes a simple dataset for training and testing.
AnomalyDataGenerator.sparseTrainTest(double negate)
  Generates a pair of datasets where the features are sparse and unknown features appear in the test data.
Uses of Dataset in org.tribuo.anomaly.liblinear
Methods with parameters of type Dataset:
protected com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]> LibLinearAnomalyTrainer.extractData(Dataset<Event> data, ImmutableOutputInfo<Event> outputInfo, ImmutableFeatureMap featureMap)
Uses of Dataset in org.tribuo.anomaly.libsvm
Methods with parameters of type Dataset:
protected com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]> LibSVMAnomalyTrainer.extractData(Dataset<Event> data, ImmutableOutputInfo<Event> outputInfo, ImmutableFeatureMap featureMap)
LibSVMAnomalyTrainer.train(Dataset<Event> dataset, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
Uses of Dataset in org.tribuo.classification.baseline
Methods with parameters of type Dataset:
DummyClassifierTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
DummyClassifierTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance, int invocationCount)
Uses of Dataset in org.tribuo.classification.dtree
Methods with parameters of type Dataset:
protected AbstractTrainingNode<Label> CARTClassificationTrainer.mkTrainingNode(Dataset<Label> examples, AbstractTrainingNode.LeafDeterminer leafDeterminer)
Uses of Dataset in org.tribuo.classification.dtree.impl
Constructors with parameters of type Dataset:
ClassifierTrainingNode(LabelImpurity impurity, Dataset<Label> examples, AbstractTrainingNode.LeafDeterminer leafDeterminer)
  Constructor which creates the inverted file.
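ClassifierTrainingNode takes a LabelImpurity, which scores how mixed the labels at a node are when choosing splits. The sketch below computes two common impurity measures, Gini and entropy, from per-class counts. It is a standalone illustration of the formulas (ImpuritySketch is hypothetical, not Tribuo's LabelImpurity implementations), and reporting entropy in bits is an assumption.

```java
// Hypothetical sketch of standard impurity measures over the class counts at a tree node.
final class ImpuritySketch {
    // Gini impurity: 1 - sum_k p_k^2, where p_k is the fraction of examples in class k.
    static double gini(int[] counts) {
        double total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double sumSq = 0;
        for (int c : counts) {
            double p = c / total;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    // Entropy in bits: -sum_k p_k log2 p_k, skipping empty classes (0 log 0 = 0).
    static double entropy(int[] counts) {
        double total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }
}
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, so a split is preferred when it lowers the size-weighted impurity of the children.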
Uses of Dataset in org.tribuo.classification.ensemble
Methods with parameters of type Dataset:
AdaBoostTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
  If the trainer implements WeightedExamples, then do boosting by weighting; otherwise do boosting by sampling.
AdaBoostTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
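AdaBoostTrainer boosts either by reweighting examples (when the inner trainer implements WeightedExamples) or by resampling them. For the reweighting route, one round of the SAMME multiclass AdaBoost update is sketched below. Naming SAMME here is an assumption about the multiclass variant, and SammeSketch is a standalone illustration rather than Tribuo's code.

```java
// Hypothetical sketch of one SAMME (multiclass AdaBoost) reweighting round.
final class SammeSketch {
    // Computes the learner weight alpha and updates `weights` in place,
    // renormalising them to sum to 1.
    static double boostRound(double[] weights, boolean[] misclassified, int numClasses) {
        double total = 0, err = 0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            if (misclassified[i]) err += weights[i];
        }
        err /= total; // weighted error rate of this round's learner
        // Clamp to avoid log(0) or division by zero for perfect or useless learners.
        // (A real booster would usually stop once err >= 1 - 1/numClasses.)
        double eps = 1e-10;
        err = Math.min(Math.max(err, eps), 1 - eps);
        double alpha = Math.log((1 - err) / err) + Math.log(numClasses - 1);
        // Upweight the misclassified examples, then renormalise.
        double newTotal = 0;
        for (int i = 0; i < weights.length; i++) {
            if (misclassified[i]) weights[i] *= Math.exp(alpha);
            newTotal += weights[i];
        }
        for (int i = 0; i < weights.length; i++) weights[i] /= newTotal;
        return alpha;
    }
}
```

Boosting by sampling would instead draw a weighted bootstrap sample of the examples using these same weights and train the next learner on that sample.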
Uses of Dataset in org.tribuo.classification.example
Methods that return types with arguments of type Dataset:
LabelledDataGenerator.binarySparseTrainTest()
  Generates a pair of datasets with sparse features and unknown features in the test data.
LabelledDataGenerator.binarySparseTrainTest(double negate)
  Generates a pair of datasets with sparse features and unknown features in the test data.
LabelledDataGenerator.denseTrainTest()
  Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}, and there are 4 classes, {Foo,Bar,Baz,Quux}.
LabelledDataGenerator.denseTrainTest(double negate)
  Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}, and there are 4 classes, {Foo,Bar,Baz,Quux}.
LabelledDataGenerator.sparseTrainTest()
  Generates a pair of datasets where the features are sparse and unknown features appear in the test data.
LabelledDataGenerator.sparseTrainTest(double negate)
  Generates a pair of datasets where the features are sparse and unknown features appear in the test data.
Uses of Dataset in org.tribuo.classification.experiments
Uses of Dataset in org.tribuo.classification.fs
Uses of Dataset in org.tribuo.classification.liblinear
Methods with parameters of type Dataset:
protected com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]> LibLinearClassificationTrainer.extractData(Dataset<Label> data, ImmutableOutputInfo<Label> outputInfo, ImmutableFeatureMap featureMap)
Uses of Dataset in org.tribuo.classification.libsvm
Methods with parameters of type Dataset:
protected com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]> LibSVMClassificationTrainer.extractData(Dataset<Label> data, ImmutableOutputInfo<Label> outputInfo, ImmutableFeatureMap featureMap)
Uses of Dataset in org.tribuo.classification.mnb
Methods with parameters of type Dataset:
MultinomialNaiveBayesTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
MultinomialNaiveBayesTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.classification.sgd.kernel
Methods with parameters of type Dataset:
KernelSVMTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
KernelSVMTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.classification.xgboost
Methods with parameters of type Dataset:
XGBoostClassificationTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
XGBoostClassificationTrainer.train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.clustering.example
Methods that return Dataset:
ClusteringDataGenerator.gaussianClusters(long size, long seed)
  Generates a dataset drawn from a mixture of five 2D Gaussians.
Methods that return types with arguments of type Dataset:
ClusteringDataGenerator.denseTrainTest()
  Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
ClusteringDataGenerator.denseTrainTest(double negate)
  Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}, and there are 4 clusters, {0,1,2,3}.
ClusteringDataGenerator.sparseTrainTest()
  Generates a pair of datasets where the features are sparse and unknown features appear in the test data.
ClusteringDataGenerator.sparseTrainTest(double negate)
  Generates a pair of datasets where the features are sparse and unknown features appear in the test data.
Uses of Dataset in org.tribuo.clustering.hdbscan
Uses of Dataset in org.tribuo.clustering.kmeans
Methods with parameters of type Dataset:
KMeansTrainer.train(Dataset<ClusterID> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
KMeansTrainer.train(Dataset<ClusterID> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.common.liblinear
Methods with parameters of type Dataset:
protected abstract com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]> LibLinearTrainer.extractData(Dataset<T> data, ImmutableOutputInfo<T> outputInfo, ImmutableFeatureMap featureMap)
  Extracts the features and Outputs in LibLinear's format.
LibLinearTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
LibLinearTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.common.libsvm
Methods with parameters of type Dataset:
protected abstract com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]> LibSVMTrainer.extractData(Dataset<T> data, ImmutableOutputInfo<T> outputInfo, ImmutableFeatureMap featureMap)
  Extracts the features and Outputs in LibSVM's format.
LibSVMTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
LibSVMTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.common.nearest
Uses of Dataset in org.tribuo.common.sgd
Methods with parameters of type Dataset:
AbstractSGDTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
AbstractSGDTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.common.tree
Methods with parameters of type Dataset:
protected abstract AbstractTrainingNode<T> AbstractCARTTrainer.mkTrainingNode(Dataset<T> examples, AbstractTrainingNode.LeafDeterminer leafDeterminer)
  Makes the initial training node.
AbstractCARTTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
AbstractCARTTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.common.xgboost
Methods with parameters of type Dataset:
protected static <T extends Output<T>> XGBoostTrainer.DMatrixTuple<T> XGBoostTrainer.convertDataset(Dataset<T> examples)
  Converts a dataset into a DMatrix.
protected static <T extends Output<T>> XGBoostTrainer.DMatrixTuple<T> XGBoostTrainer.convertDataset(Dataset<T> examples, Function<T, Float> responseExtractor)
  Converts a dataset into a DMatrix.
List<Prediction<T>>
  Uses the model to predict the labels for multiple examples contained in a data set.
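XGBoostTrainer.convertDataset turns a dataset into a DMatrix. XGBoost can build a DMatrix from compressed sparse row (CSR) arrays; assuming that is the route taken, the packing step looks roughly like the JDK-only sketch below. CsrSketch is hypothetical and does not call XGBoost; it only shows how sparse rows become the three CSR arrays (row offsets, column indices, values).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: packs sparse rows (featureId -> value) into CSR arrays.
final class CsrSketch {
    final long[] rowHeaders; // row r spans [rowHeaders[r], rowHeaders[r + 1]) in indices/values
    final int[] indices;     // column (feature) index of each stored value
    final float[] values;    // the stored nonzero values

    CsrSketch(List<Map<Integer, Float>> rows) {
        rowHeaders = new long[rows.size() + 1]; // rowHeaders[0] is 0
        List<Integer> idx = new ArrayList<>();
        List<Float> vals = new ArrayList<>();
        for (int r = 0; r < rows.size(); r++) {
            // TreeMap iterates keys in ascending order, so columns are sorted within each row.
            for (Map.Entry<Integer, Float> e : new TreeMap<>(rows.get(r)).entrySet()) {
                idx.add(e.getKey());
                vals.add(e.getValue());
            }
            rowHeaders[r + 1] = idx.size();
        }
        indices = new int[idx.size()];
        values = new float[vals.size()];
        for (int i = 0; i < idx.size(); i++) {
            indices[i] = idx.get(i);
            values[i] = vals.get(i);
        }
    }
}
```

An empty row simply repeats the previous offset, so examples with no features still keep their position in the matrix.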
Uses of Dataset in org.tribuo.data
Methods that return types with arguments of type Dataset:
DataOptions.load(OutputFactory<T> outputFactory)
  Loads the training and testing data from DataOptions.trainingPath and DataOptions.testingPath according to the other parameters specified in this class.
Uses of Dataset in org.tribuo.data.csv
Uses of Dataset in org.tribuo.dataset
Subclasses of Dataset:
final class DatasetView<T extends Output<T>>
  DatasetView provides an immutable view on another Dataset that only exposes selected examples.
class MinimumCardinalityDataset<T extends Output<T>>
  This class creates a pruned dataset in which low-frequency features that occur fewer times than the provided minimum cardinality have been removed.
final class SelectedFeatureDataset<T extends Output<T>>
  This class creates a pruned dataset which only contains the selected features.
Methods with parameters of type Dataset:
static <T extends Output<T>> DatasetView<T> DatasetView.createBootstrapView(Dataset<T> dataset, int size, long seed)
  Generates a DatasetView bootstrapped from the supplied Dataset.
static <T extends Output<T>> DatasetView<T> DatasetView.createBootstrapView(Dataset<T> dataset, int size, long seed, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)
  Generates a DatasetView bootstrapped from the supplied Dataset.
static <T extends Output<T>> DatasetView<T> DatasetView.createView(Dataset<T> dataset, Predicate<Example<T>> predicate, String tag)
  Creates a view from the supplied dataset, using the specified predicate to test if each example should be in this view.
static <T extends Output<T>> DatasetView<T> DatasetView.createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights)
  Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
static <T extends Output<T>> DatasetView<T> DatasetView.createWeightedBootstrapView(Dataset<T> dataset, int size, long seed, float[] exampleWeights, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> outputIDs)
  Generates a DatasetView bootstrapped from the supplied Dataset using the supplied example weights.
Constructors with parameters of type Dataset:
DatasetView(Dataset<T> dataset, int[] exampleIndices, String tag)
  Creates a DatasetView which includes the supplied indices from the dataset.
DatasetView(Dataset<T> dataset, int[] exampleIndices, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, String tag)
  Creates a DatasetView which includes the supplied indices from the dataset.
MinimumCardinalityDataset(Dataset<T> dataset, int minCardinality)
SelectedFeatureDataset(Dataset<T> dataset, SelectedFeatureSet featureSet)
  Constructs a selected feature dataset using all the features in the supplied feature set.
SelectedFeatureDataset(Dataset<T> dataset, SelectedFeatureSet featureSet, int k)
  Constructs a selected feature dataset.
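A bootstrapped DatasetView is a view over example indices drawn with replacement, either uniformly or in proportion to per-example weights. The JDK-only sketch below shows one way to draw such indices from a seed. BootstrapSketch is hypothetical, and its use of java.util.SplittableRandom is an assumption, so the same seed will not reproduce the views Tribuo itself generates.

```java
import java.util.SplittableRandom;

// Hypothetical sketch of the index sampling behind a bootstrap view.
final class BootstrapSketch {
    // Uniform bootstrap: `size` indices drawn with replacement from [0, n).
    static int[] bootstrapIndices(int n, int size, long seed) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        SplittableRandom rng = new SplittableRandom(seed);
        int[] out = new int[size];
        for (int i = 0; i < size; i++) {
            out[i] = rng.nextInt(n);
        }
        return out;
    }

    // Weighted bootstrap: index i is drawn with probability weights[i] / sum(weights).
    static int[] weightedBootstrapIndices(float[] weights, int size, long seed) {
        // Cumulative sums: cdf[i] = weights[0] + ... + weights[i].
        double[] cdf = new double[weights.length];
        double running = 0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            cdf[i] = running;
        }
        if (running <= 0) throw new IllegalArgumentException("weights must sum to a positive value");
        SplittableRandom rng = new SplittableRandom(seed);
        int[] out = new int[size];
        for (int s = 0; s < size; s++) {
            double u = rng.nextDouble() * running; // uniform in [0, running)
            // Binary search for the smallest i with cdf[i] > u; zero-weight
            // entries never satisfy this first, so they are never drawn.
            int lo = 0, hi = cdf.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cdf[mid] > u) hi = mid; else lo = mid + 1;
            }
            out[s] = lo;
        }
        return out;
    }
}
```

Because sampling is with replacement, some examples appear several times in a view and others not at all; a fixed seed makes the selection repeatable.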
Uses of Dataset in org.tribuo.datasource
Methods with parameters of type Dataset:
static <T extends Output<T>> void LibSVMDataSource.writeLibSVMFormat(Dataset<T> dataset, PrintStream out, boolean zeroIndexed, Function<T, Number> transformationFunc)
  Writes out a dataset in LibSVM format.
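LibSVM's sparse text format puts one example per line: the numeric target first, then space-separated index:value pairs in ascending index order. The sketch below formats a single line under those conventions. LibSVMLineSketch is hypothetical, and the assumption here is that zeroIndexed controls whether feature ids are written as-is or shifted to start at 1; the Number label stands in for what transformationFunc would produce.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper: formats one example as a LibSVM-style text line.
final class LibSVMLineSketch {
    static String format(Number label, Map<Integer, Double> features, boolean zeroIndexed) {
        StringJoiner line = new StringJoiner(" ");
        line.add(label.toString());
        // LibSVM expects feature indices in ascending order, so sort by id first.
        for (Map.Entry<Integer, Double> e : new TreeMap<>(features).entrySet()) {
            int index = zeroIndexed ? e.getKey() : e.getKey() + 1;
            line.add(index + ":" + e.getValue());
        }
        return line.toString();
    }
}
```

Only nonzero features need to be written, which keeps files small for sparse data.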
Uses of Dataset in org.tribuo.ensemble
Methods with parameters of type Dataset:
BaggingTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
BaggingTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
BaggingTrainer.trainSingleModel(Dataset<T> examples, ImmutableFeatureMap featureIDs, ImmutableOutputInfo<T> labelIDs, int randInt, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
  Trains a single model.
Uses of Dataset in org.tribuo.evaluation
Methods with parameters of type Dataset:
static <T extends Output<T>, C extends MetricContext<T>> com.oracle.labs.mlrg.olcut.util.Pair<Integer,Double> EvaluationAggregator.argmax(EvaluationMetric<T, C> metric, List<? extends Model<T>> models, Dataset<T> dataset)
  Calculates the argmax of a metric across the supplied models (i.e., the index of the model which performed the best).
final E
  Produces an evaluation for the supplied model and dataset, by calling Model.predict(org.tribuo.Example<T>) to create the predictions, then aggregating the appropriate statistics.
Evaluates the dataset using the supplied model, returning an immutable Evaluation of the appropriate type.
Splits a dataset into k consecutive folds; for each fold, the remaining k-1 folds form the training set.
static <T extends Output<T>, C extends MetricContext<T>> DescriptiveStats EvaluationAggregator.summarize(List<? extends EvaluationMetric<T, C>> metrics, Model<T> model, Dataset<T> dataset)
  Summarize model performance on dataset across several metrics.
static <T extends Output<T>, R extends Evaluation<T>> Map<MetricID<T>,DescriptiveStats> EvaluationAggregator.summarize(Evaluator<T, R> evaluator, List<? extends Model<T>> models, Dataset<T> dataset)
  Summarize performance using the supplied evaluator across several models on one dataset.
static <T extends Output<T>, C extends MetricContext<T>> DescriptiveStats EvaluationAggregator.summarize(EvaluationMetric<T, C> metric, List<? extends Model<T>> models, Dataset<T> dataset)
  Summarize performance w.r.t.
Method parameters with type arguments of type Dataset:
static <T extends Output<T>, C extends MetricContext<T>> com.oracle.labs.mlrg.olcut.util.Pair<Integer,Double> EvaluationAggregator.argmax(EvaluationMetric<T, C> metric, Model<T> model, List<? extends Dataset<T>> datasets)
  Calculates the argmax of a metric across the supplied datasets.
static <T extends Output<T>, R extends Evaluation<T>> Map<MetricID<T>,DescriptiveStats> EvaluationAggregator.summarize(Evaluator<T, R> evaluator, Model<T> model, List<? extends Dataset<T>> datasets)
  Summarize performance according to evaluator for a single model across several datasets.
static <T extends Output<T>, C extends MetricContext<T>> DescriptiveStats EvaluationAggregator.summarize(EvaluationMetric<T, C> metric, Model<T> model, List<? extends Dataset<T>> datasets)
  Summarize a model's performance w.r.t.
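The k-fold description above (k consecutive folds, each tested against a model trained on the other k-1) can be sketched over example indices with the JDK alone. KFoldSketch is hypothetical; in particular, giving the first n % k folds one extra example when n is not divisible by k is an assumption, and a real splitter may also shuffle the indices before slicing them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of consecutive k-fold splitting over example indices 0..n-1.
final class KFoldSketch {
    static final class Fold {
        final int[] train;
        final int[] test;
        Fold(int[] train, int[] test) { this.train = train; this.test = test; }
    }

    // Assumption: the first n % k folds receive one extra example.
    static List<Fold> split(int n, int k) {
        if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
        List<Fold> folds = new ArrayList<>();
        int start = 0;
        for (int f = 0; f < k; f++) {
            int size = n / k + (f < n % k ? 1 : 0);
            int end = start + size;
            int[] test = new int[size];
            int[] train = new int[n - size];
            int t = 0;
            for (int i = 0; i < n; i++) {
                if (i >= start && i < end) test[i - start] = i; // this fold's slice
                else train[t++] = i;                            // the other k-1 folds
            }
            folds.add(new Fold(train, test));
            start = end;
        }
        return folds;
    }
}
```

Every index lands in exactly one test fold, so averaging a metric over the k folds uses each example for evaluation once.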
Uses of Dataset in org.tribuo.evaluation.metrics
Methods with parameters of type Dataset:
default C EvaluationMetric.createContext(Model<T> model, Dataset<T> dataset)
  Creates the metric context used to compute this metric's value, generating Predictions for each Example in the supplied dataset.
Uses of Dataset in org.tribuo.hash
Methods with parameters of type Dataset:
HashingTrainer.train(Dataset<T> dataset, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
  This clones the Dataset, hashes each of the examples and rewrites their feature ids before passing it to the inner trainer.
HashingTrainer.train(Dataset<T> dataset, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance, int invocationCount)
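HashingTrainer hashes each example's feature names before handing the data to the inner trainer, so the trained model does not store the original names. A salted cryptographic hash is one way to do this; the sketch below uses SHA-256 from java.security with a hypothetical salt and Base64 encoding. FeatureHashSketch is an illustration only and does not describe Tribuo's Hasher implementations.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical sketch: obscures a feature name by hashing it together with a salt.
final class FeatureHashSketch {
    private final byte[] salt;

    FeatureHashSketch(String salt) {
        this.salt = salt.getBytes(StandardCharsets.UTF_8);
    }

    String hash(String featureName) {
        try {
            // A fresh MessageDigest per call keeps the method thread-safe.
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(salt);
            byte[] digest = md.digest(featureName.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

The same name and salt always produce the same hash, so training data and prediction-time inputs map to the same hashed feature as long as both use the same salt.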
Uses of Dataset in org.tribuo.interop.tensorflow
Uses of Dataset in org.tribuo.math.la
Methods with parameters of type Dataset:
static <T extends Output<T>> SparseVector[]
  Converts a dataset of row-major examples into an array of column-major sparse vectors.
static <T extends Output<T>> SparseVector[] SparseVector.transpose(Dataset<T> dataset, ImmutableFeatureMap fMap)
  Converts a dataset of row-major examples into an array of column-major sparse vectors.
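Transposing row-major examples (one sparse vector per example, indexed by feature id) yields column-major vectors (one sparse vector per feature, indexed by example id). The JDK-only sketch below shows that conversion on a hypothetical Sparse type made of parallel index and value arrays; it is not Tribuo's SparseVector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: converts row-major sparse examples into column-major form.
final class TransposeSketch {
    // A sparse vector stored as parallel arrays of indices and values.
    static final class Sparse {
        final int[] indices;
        final double[] values;
        Sparse(int[] indices, double[] values) { this.indices = indices; this.values = values; }
    }

    // rows[r] holds the nonzero entries of example r (indices are feature ids < numFeatures).
    // The result has one vector per feature, whose indices are example ids.
    static Sparse[] transpose(Sparse[] rows, int numFeatures) {
        List<List<Integer>> colIdx = new ArrayList<>();
        List<List<Double>> colVal = new ArrayList<>();
        for (int f = 0; f < numFeatures; f++) {
            colIdx.add(new ArrayList<>());
            colVal.add(new ArrayList<>());
        }
        // Visiting rows in order keeps each column's example ids sorted.
        for (int r = 0; r < rows.length; r++) {
            for (int j = 0; j < rows[r].indices.length; j++) {
                int f = rows[r].indices[j];
                colIdx.get(f).add(r);
                colVal.get(f).add(rows[r].values[j]);
            }
        }
        Sparse[] cols = new Sparse[numFeatures];
        for (int f = 0; f < numFeatures; f++) {
            int[] idx = colIdx.get(f).stream().mapToInt(Integer::intValue).toArray();
            double[] val = colVal.get(f).stream().mapToDouble(Double::doubleValue).toArray();
            cols[f] = new Sparse(idx, val);
        }
        return cols;
    }
}
```

Column-major vectors make per-feature scans cheap, which suits algorithms that iterate over one feature at a time across all examples.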
Uses of Dataset in org.tribuo.multilabel.baseline
Methods with parameters of type Dataset:
ClassifierChainTrainer.train(Dataset<MultiLabel> examples)
ClassifierChainTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
ClassifierChainTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
IndependentMultiLabelTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
IndependentMultiLabelTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
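IndependentMultiLabelTrainer belongs to the binary relevance package. The usual binary relevance reduction turns a multi-label problem into one binary problem per label, each trained independently. Assuming that decomposition, the target-building step can be sketched as below; BinaryRelevanceSketch and its string label sets are hypothetical stand-ins for Tribuo's MultiLabel outputs.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of binary relevance: one binary target vector per label.
final class BinaryRelevanceSketch {
    // targets[l][i] is true when example i carries label labelDomain.get(l).
    static boolean[][] decompose(List<Set<String>> exampleLabels, List<String> labelDomain) {
        boolean[][] targets = new boolean[labelDomain.size()][exampleLabels.size()];
        for (int l = 0; l < labelDomain.size(); l++) {
            String label = labelDomain.get(l);
            for (int i = 0; i < exampleLabels.size(); i++) {
                targets[l][i] = exampleLabels.get(i).contains(label);
            }
        }
        return targets;
    }
}
```

Each row of targets would then train its own binary classifier. A classifier chain differs in that each classifier in the chain also receives the earlier labels in the chain as extra inputs, which lets it model dependencies between labels.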
Uses of Dataset in org.tribuo.multilabel.ensemble
Methods with parameters of type Dataset:
CCEnsembleTrainer.train(Dataset<MultiLabel> examples)
CCEnsembleTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
CCEnsembleTrainer.train(Dataset<MultiLabel> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.multilabel.example
Methods that return Dataset:
static Dataset<MultiLabel> MultiLabelDataGenerator.generateTestData()
  Simple test data for checking multi-label trainers.
static Dataset<MultiLabel> MultiLabelDataGenerator.generateTrainData()
  Simple training data for checking multi-label trainers.
Methods that return types with arguments of type Dataset:
static com.oracle.labs.mlrg.olcut.util.Pair<Dataset<MultiLabel>, Dataset<MultiLabel>> MultiLabelDataGenerator.generateDataset()
  Generate training and testing datasets.
Uses of Dataset in org.tribuo.provenance
Constructors with parameters of type Dataset:
DatasetProvenance(DataProvenance sourceProvenance, com.oracle.labs.mlrg.olcut.provenance.ListProvenance<com.oracle.labs.mlrg.olcut.provenance.ObjectProvenance> transformationProvenance, Dataset<T> dataset)
  Creates a dataset provenance from the supplied dataset.
Uses of Dataset in org.tribuo.regression.baseline
Methods with parameters of type Dataset:
DummyRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
DummyRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance, int invocationCount)
Uses of Dataset in org.tribuo.regression.example
RegressionDataGenerator.denseTrainTest()
Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}.
RegressionDataGenerator.denseTrainTest(double negate)
Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}.
RegressionDataGenerator.multiDimDenseTrainTest()
Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}.
RegressionDataGenerator.multiDimDenseTrainTest(double negate)
Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}.
RegressionDataGenerator.multiDimSparseTrainTest()
Generates a pair of datasets where the features are sparse and unknown features appear in the test data.
RegressionDataGenerator.multiDimSparseTrainTest(double negate)
Generates a pair of datasets where the features are sparse and unknown features appear in the test data.
RegressionDataGenerator.sparseTrainTest()
Generates a pair of datasets where the features are sparse and unknown features appear in the test data.
RegressionDataGenerator.sparseTrainTest(double negate)
Generates a pair of datasets where the features are sparse and unknown features appear in the test data.
RegressionDataGenerator.threeDimDenseTrainTest(double negate, boolean remapIndices)
Generates a train/test dataset pair which is dense in the features; each example has 4 features, {A,B,C,D}.
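The sparse generators produce test data containing features that never occur in the training data. The sketch below shows, in plain Java, what that situation looks like, assuming each example is a map from feature name to value and that a trained model simply ignores features outside the domain it saw during training. UnknownFeatureSketch, featureDomain and dropUnknown are hypothetical names, not part of RegressionDataGenerator.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: sparse examples as name-to-value maps, with unknown test features. */
final class UnknownFeatureSketch {

    /** Collects every feature name observed in the training examples. */
    static Set<String> featureDomain(List<Map<String, Double>> train) {
        Set<String> domain = new TreeSet<>();
        for (Map<String, Double> example : train) {
            domain.addAll(example.keySet());
        }
        return domain;
    }

    /** Keeps only the features seen during training (assumed: unknown features are ignored). */
    static Map<String, Double> dropUnknown(Map<String, Double> example, Set<String> domain) {
        Map<String, Double> known = new HashMap<>();
        for (Map.Entry<String, Double> e : example.entrySet()) {
            if (domain.contains(e.getKey())) {
                known.put(e.getKey(), e.getValue());
            }
        }
        return known;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> train = List.of(Map.of("A", 1.0, "C", 2.0), Map.of("B", 0.5));
        Map<String, Double> test = Map.of("A", 3.0, "E", 7.0); // "E" never appears in training
        Set<String> domain = featureDomain(train);
        System.out.println(domain);                    // prints [A, B, C]
        System.out.println(dropUnknown(test, domain)); // prints {A=3.0}
    }
}
```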
Uses of Dataset in org.tribuo.regression.impl
SkeletalIndependentRegressionSparseTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
SkeletalIndependentRegressionSparseTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
SkeletalIndependentRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
SkeletalIndependentRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.regression.liblinear
protected com.oracle.labs.mlrg.olcut.util.Pair<de.bwaldvogel.liblinear.FeatureNode[][], double[][]> LibLinearRegressionTrainer.extractData(Dataset<Regressor> data, ImmutableOutputInfo<Regressor> outputInfo, ImmutableFeatureMap featureMap)
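LibLinearRegressionTrainer.extractData converts a Dataset into the pair of arrays liblinear consumes: a FeatureNode[][] of sparse inputs and a double[][] of targets. A minimal plain-Java sketch of that conversion follows, using a hypothetical Node record in place of de.bwaldvogel.liblinear.FeatureNode. Two assumptions are labelled in the code: feature ids are 0-based on the Tribuo side and are shifted to liblinear's 1-based, ascending indices, and the double[][] holds one target array per output dimension.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of turning sparse examples into liblinear-style (index, value) arrays. */
final class LibLinearExtractSketch {

    /** Stand-in for liblinear's FeatureNode: a 1-based feature index and its value. */
    record Node(int index, double value) {}

    /**
     * Converts one example (assumed 0-based feature id -> value) into nodes sorted by index,
     * adding 1 to each id because liblinear feature indices start at 1.
     */
    static Node[] toNodes(Map<Integer, Double> example) {
        return new TreeMap<>(example).entrySet().stream()
                .map(e -> new Node(e.getKey() + 1, e.getValue()))
                .toArray(Node[]::new);
    }

    /** Assumed target layout: result[dimension][example], one array per output dimension. */
    static double[][] targetsByDimension(double[][] targets) {
        int dims = targets[0].length;
        double[][] out = new double[dims][targets.length];
        for (int i = 0; i < targets.length; i++) {
            for (int d = 0; d < dims; d++) {
                out[d][i] = targets[i][d];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node[] nodes = toNodes(Map.of(2, 0.5, 0, 1.5));
        System.out.println(Arrays.toString(nodes));
        // prints [Node[index=1, value=1.5], Node[index=3, value=0.5]]
        double[][] y = targetsByDimension(new double[][] {{1, 10}, {2, 20}, {3, 30}});
        System.out.println(Arrays.toString(y[1])); // prints [10.0, 20.0, 30.0]
    }
}
```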
Uses of Dataset in org.tribuo.regression.libsvm
protected com.oracle.labs.mlrg.olcut.util.Pair<libsvm.svm_node[][], double[][]> LibSVMRegressionTrainer.extractData(Dataset<Regressor> data, ImmutableOutputInfo<Regressor> outputInfo, ImmutableFeatureMap featureMap)
Uses of Dataset in org.tribuo.regression.rtree
protected AbstractTrainingNode<Regressor> CARTJointRegressionTrainer.mkTrainingNode(Dataset<Regressor> examples, AbstractTrainingNode.LeafDeterminer leafDeterminer)
protected AbstractTrainingNode<Regressor> CARTRegressionTrainer.mkTrainingNode(Dataset<Regressor> examples, AbstractTrainingNode.LeafDeterminer leafDeterminer)
CARTRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
CARTRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.regression.rtree.impl
RegressorTrainingNode.invertData(Dataset<Regressor> examples)
Inverts a training dataset from row-major to column-major order.
JointRegressorTrainingNode(RegressorImpurity impurity, Dataset<Regressor> examples, boolean normalize, AbstractTrainingNode.LeafDeterminer leafDeterminer)
Constructor which creates the inverted file.
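RegressorTrainingNode.invertData turns row-major training data (one record per example) into a column-major layout (one record per feature), which lets a tree learner scan every value of a single feature when searching for a split. A minimal plain-Java sketch of that inversion for sparse data follows, assuming rows are maps from integer feature id to value. InvertDataSketch and Cell are hypothetical names, and this is not Tribuo's internal representation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: invert sparse row-major examples into per-feature columns. */
final class InvertDataSketch {

    /** One non-zero entry of a feature column: the example it came from and its value. */
    record Cell(int example, double value) {}

    /**
     * Row-major input: rows.get(i) maps feature id -> value for example i.
     * Column-major output: feature id -> cells for every example containing that feature,
     * in example order.
     */
    static Map<Integer, List<Cell>> invert(List<Map<Integer, Double>> rows) {
        Map<Integer, List<Cell>> columns = new TreeMap<>();
        for (int i = 0; i < rows.size(); i++) {
            for (Map.Entry<Integer, Double> f : rows.get(i).entrySet()) {
                columns.computeIfAbsent(f.getKey(), k -> new ArrayList<>())
                       .add(new Cell(i, f.getValue()));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Map<Integer, Double>> rows = List.of(
                Map.of(0, 1.0, 2, 3.0),
                Map.of(2, 4.0));
        System.out.println(invert(rows));
        // prints {0=[Cell[example=0, value=1.0]], 2=[Cell[example=0, value=3.0], Cell[example=1, value=4.0]]}
    }
}
```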
Uses of Dataset in org.tribuo.regression.slm
ElasticNetCDTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
ElasticNetCDTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
SLMTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Trains a sparse linear model.
SLMTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Trains a sparse linear model.
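SLMTrainer and ElasticNetCDTrainer train sparse linear models. The sketch below illustrates one way such sparsity arises: elastic-net regression fitted by cyclic coordinate descent, where the soft-thresholding step sets weak coefficients exactly to zero. The glmnet-style objective in the comments, the centred data without an intercept, and the class ElasticNetCDSketch are assumptions for illustration; Tribuo's trainer may scale or normalise the problem differently.

```java
/** Hypothetical sketch of elastic-net linear regression fitted by cyclic coordinate descent. */
final class ElasticNetCDSketch {

    /** Soft-thresholding operator: sign(z) * max(|z| - gamma, 0). */
    static double softThreshold(double z, double gamma) {
        return Math.signum(z) * Math.max(Math.abs(z) - gamma, 0.0);
    }

    /**
     * Minimises (1/2n) * ||y - Xw||^2 + lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
     * No intercept is fitted, so the columns of x and the targets y are assumed to be centred.
     */
    static double[] fit(double[][] x, double[] y, double lambda, double alpha, int epochs) {
        int n = x.length;
        int p = x[0].length;
        double[] w = new double[p];
        double[] residual = y.clone(); // y - Xw, which equals y while w is all zeros
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int j = 0; j < p; j++) {
                double rho = 0.0; // (1/n) * sum_i x_ij * (residual with feature j's contribution added back)
                double sq = 0.0;  // (1/n) * sum_i x_ij^2
                for (int i = 0; i < n; i++) {
                    rho += x[i][j] * (residual[i] + x[i][j] * w[j]);
                    sq += x[i][j] * x[i][j];
                }
                rho /= n;
                sq /= n;
                double denominator = sq + lambda * (1.0 - alpha);
                double updated = denominator == 0.0 ? 0.0 : softThreshold(rho, lambda * alpha) / denominator;
                // Keep the residual consistent with the new coefficient.
                for (int i = 0; i < n; i++) {
                    residual[i] -= x[i][j] * (updated - w[j]);
                }
                w[j] = updated;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Centred data where y = 2 * x0 and the second feature is irrelevant.
        double[][] x = {{1, 1}, {-1, 1}, {2, -1}, {-2, -1}};
        double[] y = {2, -2, 4, -4};
        double[] w = fit(x, y, 0.1, 1.0, 100);
        System.out.printf("w0 = %.4f, w1 = %.4f%n", w[0], w[1]);
    }
}
```

With alpha = 1.0 this reduces to the lasso: on the example data the irrelevant second coefficient stays at zero, while the first is shrunk from its unpenalised value of 2 to 1.96.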
Uses of Dataset in org.tribuo.regression.xgboost
XGBoostRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
XGBoostRegressionTrainer.train(Dataset<Regressor> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
Uses of Dataset in org.tribuo.reproducibility
ReproUtil.recoverDataset()
Returns the Dataset used when the model was trained.
Uses of Dataset in org.tribuo.sequence
SequenceDataset.getFlatDataset()
Returns a view on this SequenceDataset which aggregates all the examples and ignores the sequence structure.
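getFlatDataset discards the sequence boundaries and exposes every example of every sequence as one flat collection. A minimal sketch of that flattening on plain lists follows, assuming sequences are lists of examples. Unlike the real method, which the description calls a view, this sketch copies the examples into a new list; FlattenSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: flatten a list of sequences into one list of examples. */
final class FlattenSketch {

    /** Concatenates all sequences in order, discarding the sequence boundaries. */
    static <E> List<E> flatten(List<List<E>> sequences) {
        List<E> flat = new ArrayList<>();
        for (List<E> sequence : sequences) {
            flat.addAll(sequence);
        }
        return flat;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = List.of(List.of("The", "cat"), List.of("sat"));
        System.out.println(flatten(sentences)); // prints [The, cat, sat]
    }
}
```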
Uses of Dataset in org.tribuo.transform
List<Prediction<T>>
TransformTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
TransformTrainer.train(Dataset<T> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance, int invocationCount)
<T extends Output<T>> MutableDataset<T> TransformerMap.transformDataset(Dataset<T> dataset)
Copies the supplied dataset and applies the transformers to each example in it.
<T extends Output<T>> MutableDataset<T> TransformerMap.transformDataset(Dataset<T> dataset, boolean densify)
Copies the supplied dataset and applies the transformers to each example in it.
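Both transformDataset overloads copy the input and apply the transformers to each example, leaving the original dataset untouched. The sketch below mimics that on plain maps, assuming each transformer is a per-feature function and that densify means absent features are filled in as explicit zeros before transforming. TransformSketch and its parameters are hypothetical, not the TransformerMap API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: copy a dataset and apply a per-feature transformation to each example. */
final class TransformSketch {

    /**
     * Returns transformed copies of the examples; the input list and maps are not modified.
     * If densify is true, features missing from an example are first filled in as 0.0
     * (an assumption), so the transformation also sees the implicit zeros of sparse examples.
     */
    static List<Map<String, Double>> transformDataset(List<Map<String, Double>> examples,
                                                      Map<String, DoubleUnaryOperator> transformers,
                                                      Set<String> allFeatures,
                                                      boolean densify) {
        List<Map<String, Double>> output = new ArrayList<>();
        for (Map<String, Double> example : examples) {
            Map<String, Double> copy = new LinkedHashMap<>(example);
            if (densify) {
                for (String feature : allFeatures) {
                    copy.putIfAbsent(feature, 0.0);
                }
            }
            // Features without a transformer are passed through unchanged.
            copy.replaceAll((name, value) ->
                    transformers.getOrDefault(name, DoubleUnaryOperator.identity()).applyAsDouble(value));
            output.add(copy);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> data = List.of(Map.of("A", 1.0), Map.of("B", 2.0));
        Map<String, DoubleUnaryOperator> shiftA = Map.of("A", v -> v + 10.0);
        Set<String> features = Set.of("A", "B");
        System.out.println(transformDataset(data, shiftA, features, false)); // prints [{A=11.0}, {B=2.0}]
        System.out.println(transformDataset(data, shiftA, features, true));  // prints [{A=11.0, B=0.0}, {B=2.0, A=10.0}]
    }
}
```

The densify flag matters whenever a transformer does not map zero to zero: feature A is absent from the second example, so without densification its implicit zero is never shifted, while with it A becomes 10.0.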