All Classes and Interfaces

Class
Description
Absolute loss (i.e., l1).
Base class for Trainers that use an approximation of the CART algorithm to build a decision tree.
Deprecated.
AbstractEvaluator<T extends Output<T>,C extends MetricContext<T>,E extends Evaluation<T>,M extends EvaluationMetric<T,C>>
Base class for evaluators.
A quadratic factorization machine model trained using SGD.
A trainer for a quadratic factorization machine model which uses SGD.
A linear model trained using SGD.
A trainer for a linear model which uses SGD.
Base class for sequence evaluators.
A model trained using SGD.
A nominal tuple used to capture the prediction and the number of active features used by the model.
A trainer for a model which uses SGD.
Base class for decision tree nodes used at training time.
Contains parameters needed to determine whether a node is a leaf.
Implements AdaBoost.SAMME, one of the more popular algorithms for multiclass boosting.
An implementation of the AdaDelta gradient optimiser.
An implementation of the AdaGrad gradient optimiser.
An implementation of the AdaGrad gradient optimiser with regularized dual averaging.
An implementation of the Adam gradient optimiser.
Aggregates multiple ConfigurableDataSources, uses AggregateDataSource.IterationOrder to control the iteration order.
Aggregates multiple DataSources, uses AggregateDataSource.IterationOrder to control the iteration order.
Provenance for the AggregateDataSource.
Specifies the iteration order of the inner sources.
Aggregates all the classification algorithms.
Types of algorithms supported.
Generates three example train and test datasets, used for unit testing.
An Evaluation for anomaly detection Events.
An Evaluator for anomaly detection Events.
A factory for generating events.
Provenance for AnomalyFactory.
The base class for tracking anomalous events.
A metric for evaluating anomaly detection problems.
Default metrics for evaluating anomaly detection.
An Example backed by two arrays, one of String and one of double.
A feature aggregator that averages feature values across a feature list.
A combiner which performs a weighted or unweighted average of the predicted regressors independently across the output dimensions.
A Trainer that wraps another trainer and produces a bagged ensemble.
An example implementation of TextPipeline.
Builds examples and sequence examples using features from BERT.
CLI options for running BERT.
The type of output pooling to perform.
A multilabel version of binary cross entropy loss which expects logits.
An Example backed by a single array of feature names.
A ResponseProcessor that takes a single value of the field as the positive class and all other values as the negative class.
A Transformation which bins values.
Provenance for BinningTransformation.
The implementation of a Transformer which splits the input into n bins.
The allowed binning types.
A tokenizer wrapping a BreakIterator instance.
CLI options for a BreakIteratorTokenizer.
A pair of things with a cached hashcode.
A triple of things.
Options for building a classification tree trainer.
The impurity algorithm.
Type of decision tree algorithm.
A Trainer that uses an approximation of the CART algorithm to build a decision tree.
A Trainer that uses an approximation of the CART algorithm to build a decision tree.
A Trainer that uses an approximation of the CART algorithm to build a decision tree.
A document preprocessor which uppercases or lowercases the input.
The possible casing operations.
Same as a CategoricalInfo, but with an additional int id field.
Stores information about Categorical features.
A trainer for an ensemble of randomly ordered Classifier Chains.
A collection of helper methods for performing training and inference in a CRF.
Belief Propagation results.
Clique scores within a chain.
Viterbi output from a linear chain.
Creates a data source using a 2d checkerboard of alternating classes.
Chunk class used for chunk level confidence prediction in the CRFModel.
A tag interface for multi-class and multi-label classification tasks.
Options for building a classification ensemble.
The type of ensemble.
An Options that can produce a classification Trainer based on the provided arguments.
A Classifier Chain Model.
A trainer for a Classifier Chain.
Defines methods that calculate classification performance, used for both multi-class and multi-label classification.
A decision tree node used at training time.
A clustering id.
Generates three example train and test datasets, used for unit testing.
An Evaluation for clustering tasks.
An Evaluator for clustering using ClusterIDs.
A factory for making ClusterID related classes.
Provenance for ClusteringFactory.
The base class for a ClusterID OutputInfo.
A metric for evaluating clustering problems.
Default metrics for evaluating clusterings.
Selects features according to the Conditional Mutual Information Maximisation algorithm.
Static factory methods which produce Convolutional Neural Network architectures.
A ConfigurableDataSource base class which takes columnar data (e.g., CSV or DB table rows) and generates Examples.
An explainer for data using Tribuo's columnar data package.
A Feature with extra bookkeeping for use inside the columnar package.
An abstract class for iterators that read data in to a columnar format, usually from a file of some kind.
A representation of a row of untyped data from a columnar data source.
Build and run a predictor for a standard dataset.
Command line options.
A data source for two concentric circles, one per class.
A Sequence model which can provide confidence predictions for subsequence predictions.
A range class used to define a subsequence of a SequenceExample.
A DataSource that is also Configurable.
Build and run a classifier for a standard dataset.
Build and run a predictor for a standard dataset.
Command line options.
Command line options.
A tag interface for configurable data source provenance.
A confusion matrix for Classifiables.
Static functions for computing classification metrics based on a ConfusionMatrix.
CLI Options for all the tokenizers in the core package.
Tokenizer type.
Cosine similarity used as a distance measure.
An inference time model for a linear chain CRF trained using SGD.
The type of subsequence level confidence to predict.
CLI options for training a linear chain CRF model.
A Parameters for training a CRF using SGD.
A trainer for CRFs using SGD.
CrossValidation<T extends Output<T>,E extends Evaluation<T>>
A class that does k-fold cross-validation.
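As a usage aside: a minimal sketch of CrossValidation, assuming Tribuo 4.x and that a Label trainer and dataset already exist (the variable names are hypothetical).

    import org.tribuo.classification.evaluation.LabelEvaluation;
    import org.tribuo.classification.evaluation.LabelEvaluator;
    import org.tribuo.evaluation.CrossValidation;

    // 5-fold CV; evaluate() returns one (evaluation, model) pair per fold.
    var cv = new CrossValidation<>(trainer, dataset, new LabelEvaluator(), 5);
    for (var pair : cv.evaluate()) {
        LabelEvaluation eval = pair.getA();
        System.out.printf("fold accuracy = %.3f%n", eval.accuracy());
    }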
A DataSource for loading separable data from a text file (e.g., CSV, TSV) and applying FieldProcessors to it.
Provenance for CSVDataSource.
An iterator over a CSV file.
CSVLoader<T extends Output<T>>
Load a DataSource/Dataset from a CSV file.
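A loading sketch for CSVLoader, assuming Tribuo 4.x APIs; the file name data.csv and the label column "response" are hypothetical.

    import java.nio.file.Paths;
    import org.tribuo.MutableDataset;
    import org.tribuo.classification.LabelFactory;
    import org.tribuo.data.csv.CSVLoader;

    // Parse the CSV, treating the "response" column as the classification label.
    var loader = new CSVLoader<>(new LabelFactory());
    var source = loader.loadDataSource(Paths.get("data.csv"), "response");
    var dataset = new MutableDataset<>(source);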
Deprecated.
Deprecated in 4.2 as CSVLoader now returns a CSVDataSource.
Saves a Dataset in CSV format suitable for loading by CSVLoader.
Options for working with training and test data in a CLI.
The delimiters supported by CSV files in this options object.
The input formats supported by this options object.
Tag interface for data sources provenances.
Dataset<T extends Output<T>>
A class for sets of data, which are used to train and evaluate classifiers.
Serialization carrier for common fields in Dataset.
A CLI for exploring a serialised Dataset.
Command line options.
Base class for dataset provenance.
DatasetView<T extends Output<T>>
DatasetView provides an immutable view on another Dataset that only exposes selected examples.
Provenance for the DatasetView.
DataSource<T extends Output<T>>
An interface for things that can be given to a Dataset's constructor.
Data source provenance.
Extracts the field value and translates it to a LocalDate based on the specified DateTimeFormatter.
Processes a column that contains a date value.
The types of date features which can be extracted.
A tag interface for a Trainer so the random forests trainer can check if it's actually a tree.
A label feature extractor that produces several kinds of label-based features.
The base class for the 2d binary classification data sources in org.tribuo.classification.example.
Provenance for DemoLabelDataSource.
Converts a sparse example into a dense float vector, then wraps it in a TFloat32.
A dense matrix, backed by a primitive array.
The output of a successful Cholesky factorization.
The output of a successful eigen decomposition.
The output of a successful LU factorization.
A matrix which is dense in the first dimension and sparse in the second.
Converts a sparse Tribuo example into a dense float vector, then wraps it in an OnnxTensor.
A dense vector, backed by a double array.
Descriptive statistics calculated across a list of doubles.
A data source for a somewhat common format for text classification datasets: a top-level directory that contains a number of subdirectories.
Provenance for DirectoryFileSource.
Interface for distance functions.
The built-in distance functions.
An interface for things that can pre-process documents before they are broken into features.
Extracts the field value and converts it to a double.
Processes a column that contains a real value.
A model which performs dummy classifications (e.g., constant output, uniform sampled labels, stratified sampled labels).
A trainer for simple baseline classifiers.
Types of dummy classifier.
A model which performs dummy regressions (e.g., constant output, gaussian sampled output, mean value, median, quartile).
A trainer for simple baseline regressors.
Deprecated.
Types of dummy regression model.
An ElasticNet trainer that uses co-ordinate descent.
An empty DatasetProvenance; it should not be used except by the provenance removal system.
An empty DataSourceProvenance; it should not be used except by the provenance removal system.
A ResponseProcessor that always emits an empty optional.
An empty TrainerProvenance; it should not be used except by the provenance removal system.
An interface for combining predictions.
An Excuse which has a List of excuses for each of the ensemble members.
A model which contains a list of other Models.
Model provenance for ensemble models.
A log_e entropy impurity measure.
Evaluation<T extends Output<T>>
An immutable evaluation of a specific model and dataset.
Aggregates metrics from a list of evaluations, or a list of models and datasets.
A metric that can be calculated for the specified output type.
Specifies what form of average to use for an EvaluationMetric.
Provenance for evaluations.
Renders an Evaluation into a String.
Evaluator<T extends Output<T>,E extends Evaluation<T>>
An evaluation factory which produces immutable Evaluations of a given Dataset using the given Model.
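For concreteness, a sketch of the evaluator contract using the classification implementation; model and testSet are assumed to already exist.

    import org.tribuo.classification.evaluation.LabelEvaluation;
    import org.tribuo.classification.evaluation.LabelEvaluator;

    // Evaluators are stateless: evaluate() pairs a model with a dataset
    // and returns an immutable Evaluation.
    LabelEvaluation evaluation = new LabelEvaluator().evaluate(model, testSet);
    System.out.println(evaluation);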
An Output representing either an Event.EventType.ANOMALOUS or an Event.EventType.EXPECTED event.
The type of event.
Example<T extends Output<T>>
An example used for training and evaluation.
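A construction sketch using the ArrayExample implementation; the feature names and values are illustrative.

    import org.tribuo.classification.Label;
    import org.tribuo.impl.ArrayExample;

    // An Example pairs an output with named, double-valued features.
    var example = new ArrayExample<>(new Label("spam"),
            new String[]{"word_count", "has_link"},
            new double[]{42.0, 1.0});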
Transforms a SparseVector, extracting the features from it as a OnnxTensor.
Excuse<T extends Output<T>>
Holds an Example, a Prediction and a Map from String to List of Pairs that contains the per output explanation.
Explanation<T extends Output<T>>
An explanation knows what features are used, what the explaining Model is and what the original Model's prediction is.
Normalizes the exponential values of the input array.
A dummy provenance used to describe the dataset of external models.
This is the base class for third party models which are trained externally and loaded into Tribuo for prediction.
A dummy provenance for a model trained outside Tribuo.
A trainer which produces an Extremely Randomized Tree Ensemble.
A class for features.
An interface for aggregating feature values into other values.
Transforms an Example or SGDVector, extracting the features from it as a TensorMap.
Hashes the feature names to reduce the dimensionality.
A map from Strings to VariableInfo objects storing information about a feature.
Takes a list of columnar features and adds new features or removes existing features.
An interface for feature selection algorithms.
A tag interface for feature selection algorithms.
An implementation of FeatureSelectorProvenance which delegates everything to SkeletalConfiguredObjectProvenance.
Contains provenance information for an instance of a SelectedFeatureSet.
A feature transformer maps a list of features to a new list of features.
A Parameters for models which make a single prediction like logistic regressions and neural networks.
Extracts a value from a field to be placed in an Example's metadata field.
An interface for things that process the columns in a data set.
The types of generated features.
A response processor that returns the value(s) in a given (set of) fields.
Extracts the field value and converts it to a float.
The inference time version of a factorization machine trained using SGD.
CLI options for training a factorization machine classifier.
Available loss types.
A trainer for a classification factorization machine using SGD.
The inference time version of a multi-label factorization machine trained using SGD.
CLI options for training a linear classifier.
Available loss types.
A trainer for a multi-label classification factorization machine using SGD.
A Parameters for factorization machines.
The inference time model of a regression factorization machine trained using SGD.
A trainer for a regression factorization machine using SGD.
A combiner which performs a weighted or unweighted vote across the predicted labels.
Static functions for computing the Gamma and log Gamma functions on real valued inputs.
Generates an anomaly detection dataset sampling each feature uniformly from a univariate Gaussian.
Generates a clustering dataset drawn from a mixture of 5 Gaussians.
Generates a single dimensional output drawn from N(slope*x + intercept,variance).
Provenance for GaussianDataSource.
A data source for two classes generated from separate Gaussians.
The Gini index impurity measure.
An enum for the gradient optimisers exposed by TensorFlow-Java.
CLI options for configuring a gradient optimiser.
Type of the gradient optimisers available in CLIs.
A tuple containing a graph def protobuf along with the relevant operation names.
Hashes names using String.hashCode().
Provenance for the HashCodeHasher.
A FeatureMap used by the HashingTrainer to provide feature name hashing and guarantee that the Model does not contain feature name information, but still works with unhashed feature names.
An abstract base class for hash functions used to hash the names of features.
An Options implementation which provides CLI arguments for the model hashing functionality.
Supported types of hashes in CLI programs.
A SequenceTrainer that hashes all the feature names on the way in.
Provenance for HashingSequenceTrainer.
A Trainer which hashes the Dataset before the Model is produced.
A trained HDBSCAN* model which provides the cluster assignment labels and outlier scores for every data point.
OLCUT Options for the HDBSCAN* implementation.
An HDBSCAN* trainer which generates a hierarchical, density-based clustering representation of the supplied data.
A cluster exemplar, with attributes for the point's label, outlier score and its features.
Deprecated.
This enum is deprecated in version 4.3, replaced by DistanceType.
Merges each SparseVector separately using a PriorityQueue as a heap.
Hinge loss, scores the correct value margin and any incorrect predictions -margin.
Hinge loss, scores the correct value margin and any incorrect predictions -margin.
Utilities for nice HTML output that can be put in wikis and such.
Huber loss, i.e., a mixture of l2 and l1 losses.
Extracts the field value and emits it as a String.
A FieldProcessor which converts the field name and value into a feature with a value of IdentityProcessor.FEATURE_VALUE.
A feature transformation that computes the IDF for features and then transforms them with a TF-IDF weighting.
Provenance for IDFTransformation.
A DataSource which can read IDX formatted data (i.e., MNIST).
Java side representation for an IDX file.
Provenance class for IDXDataSource.
The possible IDX input formats.
Image converter.
Image transformer.
An ImmutableOutputInfo object for Events.
An ImmutableOutputInfo object for ClusterIDs.
This is a Dataset which has an ImmutableFeatureMap to store the feature information.
ImmutableFeatureMap is used when unknown features should not be added to the FeatureMap.
An ImmutableOutputInfo object for Labels.
An ImmutableOutputInfo for working with MultiLabel tasks.
An OutputInfo that is fixed, and contains an id number for each valid output.
This is a SequenceDataset which has an ImmutableFeatureMap to store the feature information.
IncrementalTrainer<T extends Output<T>,U extends Model<T>>
An interface for incremental training of Models.
A Model which wraps n binary models, where n is the size of the MultiLabel domain.
Trains n independent binary Models, each of which predicts a single Label.
A Model wrapped around a list of decision tree root Nodes used to generate independent predictions for each dimension in a regression.
A SequenceModel which independently predicts each element of the sequence.
Trains a sequence model by training a regular model to independently predict every example in each sequence.
A version of ArrayExample which also has the id numbers.
A tuple of the feature name, id and value.
An Extractor with special casing for loading the index from a Row.
A class of (discrete) information theoretic functions.
An immutable named tuple containing the statistics from a G test.
Demo showing how to calculate various mutual informations and entropies.
Command line options.
Type of data distribution.
An array container which maintains the array and the size.
A Pair of a primitive int and a primitive double.
A data source of two interleaved half circles.
Extracts the field value and converts it to an int.
Internal datastructure for implementing a decision tree.
Selects features according to the Joint Mutual Information algorithm.
A decision tree node used at training time.
A DataSource for loading data from a JSON text file and applying FieldProcessors to it.
Provenance for JsonDataSource.
An iterator for JSON format files converting them into a format suitable for RowProcessor.
Utilities for interacting with JSON objects or text representations.
A k-d tree nearest neighbour query implementation.
A factory which creates k-d tree nearest neighbour query objects.
An interface for a Mercer kernel function.
The inference time version of a kernel model trained using Pegasos.
Options for using the KernelSVMTrainer.
The kernel types.
A trainer for a kernelised model using the Pegasos optimiser.
Kernel types from libsvm.
A k-fold splitter to be used in cross-validation.
Stores a train/test split for a dataset.
A K-Means model with a selectable distance function.
OLCUT Options for the K-Means implementation.
A K-Means trainer, which generates a K-means clustering of the supplied data.
Deprecated.
This enum is deprecated in version 4.3, replaced by DistanceType.
Possible initialization functions.
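A K-Means sketch using the pre-4.3 constructor (4.3 replaces the Distance enum with DistanceType); clusterData is an assumed Dataset<ClusterID>.

    import org.tribuo.clustering.kmeans.KMeansModel;
    import org.tribuo.clustering.kmeans.KMeansTrainer;

    // 5 centroids, 10 iterations, Euclidean distance, 1 thread, seed 1.
    var trainer = new KMeansTrainer(5, 10, KMeansTrainer.Distance.EUCLIDEAN, 1, 1);
    KMeansModel model = trainer.train(clusterData);
    var centroids = model.getCentroids(); // centroids as feature lists (assumes 4.1+)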
CLI Options for training a k-nearest neighbour predictor.
The type of combination function.
KNNModel<T extends Output<T>>
A k-nearest neighbours model.
The parallel backend for batch predictions.
KNNTrainer<T extends Output<T>>
A Trainer for k-nearest neighbour models.
Deprecated.
This enum is deprecated in version 4.3, replaced by DistanceType.
L1 (or Manhattan) distance.
L2 (or Euclidean) distance.
An immutable multi-class classification label.
A confusion matrix for Labels.
Can convert a Label into a Tensor containing one hot encoding of the label and can convert a TFloat16 or TFloat32 into a Prediction or a Label.
Adds multi-class classification specific metrics to ClassifierEvaluation.
Static utility functions for calculating performance metrics on Labels.
Stores the Precision-Recall curve as three arrays: the precisions, the recalls, and the thresholds associated with those values.
Stores the ROC curve as three arrays: the false positive rate, the true positive rate, and the thresholds associated with those rates.
An Evaluator for Labels.
A factory for making Label related classes.
Provenance for LabelFactory.
A class for featurising labels from previous steps in Viterbi.
Calculates a tree impurity score based on label counts, weighted label counts or a probability distribution.
The base class for information about multi-class classification Labels.
Generates three example train and test datasets, used for unit testing.
An EvaluationMetric for Labels which calculates the value based on a ConfusionMatrix.
The context for a LabelMetric is a ConfusionMatrix.
An enum of the default LabelMetrics supported by the multi-class classification evaluation package.
An interface for single label prediction objectives.
Can convert an OnnxValue into a Prediction or a Label.
A class that can be used to evaluate a sequence label classification model element wise on a given set of data.
A sequence evaluator for labels.
Can convert an OnnxValue into a Prediction or a Label.
A trainer for a lasso linear regression model which uses LARS to construct the model.
A trainer for a linear regression model which uses least angle regression.
LeafNode<T extends Output<T>>
An immutable leaf Node that can create a prediction.
A Model which wraps a LibLinear-java anomaly detection model.
A Trainer which wraps a liblinear-java anomaly detection trainer using a one-class SVM.
A Model which wraps a LibLinear-java classification model.
A Trainer which wraps a liblinear-java classifier trainer.
A Model which wraps a LibLinear-java model.
Command line options for working with a classification liblinear model.
A Model which wraps a LibLinear-java model.
A Trainer which wraps a liblinear-java regression trainer.
A Trainer which wraps a liblinear-java trainer.
A carrier type for the liblinear algorithm type.
An anomaly detection model that uses an underlying libSVM model to make the predictions.
A trainer for anomaly models that uses LibSVM.
A classification model that uses an underlying LibSVM model to make the predictions.
A trainer for classification models that uses LibSVM.
A DataSource which can read LibSVM formatted data.
The provenance for a LibSVMDataSource.
LibSVMModel<T extends Output<T>>
A model that uses an underlying libSVM model to make the predictions.
CLI options for training a LibSVM classification model.
A regression model that uses an underlying libSVM model to make the predictions.
A trainer for regression models that uses LibSVM.
A trainer that will train using libsvm's Java implementation.
LIMEBase merges the lime_base.py and lime_tabular.py implementations, and deals with simple matrices of numerical or categorical data.
Uses the columnar data processing infrastructure to mix text and tabular data.
An Explanation using LIME.
Uses a Tribuo TextFeatureExtractor to explain the prediction for a given piece of text.
A CLI for interacting with LIMEText.
Command line options.
A linear kernel, u.dot(v).
The carrier type for liblinear anomaly detection modes.
The different model types available for classification.
The carrier type for liblinear classification modes.
The different model types available for classification.
A Parameters for producing linear models.
The carrier type for liblinear linear regression modes.
The type of linear regression algorithm.
A Transformation which takes an observed distribution and rescales it so all values are between the desired min and max.
The inference time version of a linear model trained using SGD.
The inference time version of a multi-label linear model trained using SGD.
The inference time version of a linear model trained using SGD.
CLI options for training a linear classifier.
CLI options for training a linear classifier.
Available loss types.
Available loss types.
A trainer for a linear classifier using SGD.
A trainer for a multi-label linear model which uses SGD.
A trainer for a linear regression model which uses SGD.
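A logistic-regression-style sketch of the classification LinearSGDTrainer, assuming the (objective, optimiser, epochs, seed) constructor; trainingData is an assumed Dataset<Label>.

    import org.tribuo.classification.sgd.linear.LinearSGDTrainer;
    import org.tribuo.classification.sgd.objectives.LogMulticlass;
    import org.tribuo.math.optimisers.AdaGrad;

    // Log loss + AdaGrad(initialLearningRate=0.1, epsilon=0.1), 5 epochs.
    var trainer = new LinearSGDTrainer(new LogMulticlass(), new AdaGrad(0.1, 0.1), 5, 1L);
    var model = trainer.train(trainingData);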
A data source which wraps up a list of Examples along with their DataSourceProvenance and an OutputFactory.
ListExample<T extends Output<T>>
This class will not be performant until value types are available in Java.
A logistic regression trainer that uses a reasonable objective, optimiser, number of epochs and minibatch size.
A multiclass version of the log loss.
Interface for 2 dimensional Tensors.
Interface for matrix factorizations.
Merges each DenseSparseMatrix using a PriorityQueue as a heap on the MatrixIterator.
A mutable tuple used to avoid allocation when iterating a matrix.
Measures the mean absolute error over a set of inputs.
Measures the mean squared error over a set of inputs.
A Transformation which takes an observed distribution and rescales it so it has the desired mean and standard deviation.
Provenance for MeanStdDevTransformation.
An accumulator for online calculation of the mean and variance of a stream of doubles.
An interface for merging an array of DenseSparseMatrix into a single DenseSparseMatrix.
An interface which can merge double values.
Hashes Strings using the supplied MessageDigest type.
Provenance for MessageDigestHasher.
The context for a metric or set of metrics.
MetricID<T extends Output<T>>
Just an easier-to-read alias for Pair<MetricTarget<T>, String>.
Used by a given EvaluationMetric to determine whether it should compute its value for a specific Output value or whether it should average them.
Selects features according to their mutual information with the class label (aka Mutual Information Maximisation).
This class creates a pruned dataset in which low-frequency features (those occurring fewer times than the provided minimum cardinality) have been removed.
This class creates a pruned dataset in which low-frequency features (those occurring fewer times than the provided minimum cardinality) have been removed.
Static factory methods which produce Multi-Layer Perceptron architectures.
Model<T extends Output<T>>
A prediction model, which is used to predict outputs for unseen instances.
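A prediction sketch against the Model interface; model and example (with Label outputs) are assumed to exist.

    import org.tribuo.Prediction;
    import org.tribuo.classification.Label;

    // predict() scores a single unseen example; the Label carries a confidence score.
    Prediction<Label> prediction = model.predict(example);
    Label predicted = prediction.getOutput();
    System.out.println(predicted.getLabel() + " @ " + predicted.getScore());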
ModelCard feature to allow more transparent model reporting.
A command line interface for creating and appending UsageDetails to the serialized version of an existing ModelCard.
CLI options for ModelCardCLI.
Serialization carrier for common fields in Model and SequenceModel.
ModelDetails section of a ModelCard.
A command line interface for loading in models and inspecting their feature and output spaces.
CLI options for ModelExplorer.
Contains provenance information for an instance of a Model.
Hashes names using String.hashCode(), then reduces the dimension.
Provenance for the ModHashCodeHasher.
Selects features according to the Minimum Redundancy Maximum Relevance algorithm.
A class for multi-label classification.
A ConfusionMatrix which accepts MultiLabels.
Can convert a MultiLabel into a Tensor containing a binary encoding of the label vector and can convert a TFloat16 or TFloat32 into a Prediction or a MultiLabel.
Generates three example train and test datasets, used for unit testing.
The implementation of a MultiLabelEvaluation using the default metrics.
An Evaluator for MultiLabel problems.
A factory for generating MultiLabel objects and their associated OutputInfo and Evaluator objects.
Provenance for MultiLabelFactory.
Generates a multi label output drawn from a series of functions.
The base class for information about MultiLabel outputs.
An EvaluationMetric for evaluating MultiLabel problems.
An enum of the default MultiLabelMetrics supported by the multi-label classification evaluation package.
An interface for multi-label prediction objectives.
Can convert an OnnxValue into a Prediction or a MultiLabel.
A combiner which performs a weighted or unweighted vote independently across the predicted labels in each multi-label.
A Model for multinomial Naive Bayes with Laplace smoothing.
CLI options for a multinomial naive bayes model.
A Trainer which trains a multinomial Naive Bayes model with Laplace smoothing.
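A training sketch, assuming the single-argument smoothing constructor; trainingData is hypothetical.

    import org.tribuo.classification.mnb.MultinomialNaiveBayesTrainer;

    // alpha is the Laplace smoothing parameter.
    var trainer = new MultinomialNaiveBayesTrainer(1.0);
    var model = trainer.train(trainingData);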
A class for sampling from multivariate normal distributions.
The MurmurHash3 algorithm was created by Austin Appleby and placed in the public domain.
128 bits of state.
An MutableOutputInfo object for Events.
A mutable ClusteringInfo.
A MutableDataset is a Dataset with a MutableFeatureMap which grows over time.
A feature map that can record new feature value observations.
A mutable LabelInfo.
A MutableOutputInfo for working with multi-label tasks.
A mutable OutputInfo that can record observed output values.
A MutableSequenceDataset is a SequenceDataset with a MutableFeatureMap which grows over time.
A brute-force nearest neighbour query implementation.
A factory which creates brute-force nearest neighbour query objects.
An interface for nearest neighbour query objects.
An interface for factories which create nearest neighbour query objects.
These are the supported neighbour query implementations.
A document pre-processor for 20 newsgroup data.
A text processor that will generate token ngrams of a particular size.
Node<T extends Output<T>>
A node in a decision tree.
A data source of two interleaved half circles with some zero mean Gaussian noise applied to each point.
Generates a single dimensional output drawn from N(w_0*x_0 + w_1*x_1 + w_2*x_1*x_0 + w_3*x_1*x_1*x_1 + intercept,variance).
A convenience class for when you are required to provide a tokenizer but you don't actually want to split up the text into tokens.
A label feature extractor that doesn't produce any label based features.
NoopNormalizer returns a copy in NoopNormalizer.normalize(double[]) and is a no-op in place.
Normalizes, but first subtracts the minimum value (to ensure positivity).
A converter for DenseMatrix and DenseVector into Label Predictions.
OCIModel<T extends Output<T>>
A wrapper class around an OCI Data Science Model Deployment endpoint which sends off inputs for scoring and converts the output into a Tribuo prediction.
Carrier type for easy deserialization from JSON.
This class provides a CLI for deploying and scoring a Tribuo Classification model.
Options for the OCIModelCLI.
Mode for the CLI.
A converter for DenseMatrix and DenseVector into MultiLabel Predictions.
Converter for a DenseMatrix received from OCI Data Science Model Deployment.
A converter for DenseMatrix and DenseVector into Regressor Predictions.
Utils for uploading and deploying models to OCI Data Science.
Configuration for OCI DS.
Configuration for an OCI DS Model artifact.
Configuration for an OCI DS Model Deployment.
Enum for OCI model types.
Extracts the field value and translates it to an OffsetDateTime based on the specified DateTimeFormatter.
OnlineEvaluator<T extends Output<T>,E extends Evaluation<T>>
An evaluator which aggregates predictions and produces Evaluations covering all the Predictions it has seen or created.
The spec for an attribute, used to produce the attribute proto at construction time.
Context object used to scope and manage the creation of ONNX OnnxMl.GraphProto and OnnxMl.ModelProto instances.
An interface which denotes this Model can be exported as an ONNX model.
A Tribuo wrapper around a ONNX model.
A subclass of ONNXRef specialized for OnnxMl.TensorProto.
Tribuo Math specific helper functions for building ONNX protos.
A subclass of ONNXRef specialized for OnnxMl.NodeProto.
An interface for ONNX operators.
ONNX Opset 13, and ONNX-ML version 1.
A subclass of ONNXRef specialized for OnnxMl.ValueInfoProto.
ONNXRef<T extends com.google.protobuf.GeneratedMessageV3>
An abstract reference that represents both a node in an ONNX computation graph and a container for a specific ONNX proto object that denotes that node.
Helper functions for building ONNX protos.
Output<T extends Output<T>>
Output is the root interface for the supported prediction types.
Converts the Output into a Tensor and vice versa.
An interface associated with a specific Output, which can generate the appropriate Output subclass, and OutputInfo subclass.
A tag provenance for an output factory.
OutputInfo<T extends Output<T>>
Tracks relevant properties of the appropriate Output subclass.
Converts an OnnxValue into an Output or a Prediction.
A count distribution over CachedPair objects.
Averages the parameters across a gradient run.
An interface to a Tensor[] array which accepts updates to the parameters.
An implementation of the Pegasos gradient optimiser used primarily for solving the SVM problem.
A polynomial kernel, (gamma*u.dot(v) + intercept)^degree.
Prediction<T extends Output<T>>
A prediction made by a Model.
Reads in a DataSource, processes all the data, and writes it out as a serialized dataset.
Command line options.
ProtoSerializable<T extends com.google.protobuf.Message>
Interface for serializing an implementing object to the specified protobuf.
Mark a class as being ProtoSerializable and specify the class type used to serialize the "serialized_data".
Annotation which denotes that a field should be part of the protobuf serialized representation.
Annotation which denotes that the map field this is applied to is serialized as two repeated fields, one for keys and one for values.
Annotation which denotes that a map field should be part of the protobuf serialized representation.
Annotation which denotes that the map field this is applied to is serialized as a list of values.
Utilities for working with Tribuo protobufs.
A quartile to split data into 4 chunks.
Processes the response into quartiles and emits them as classification outputs.
A trainer which produces a random forest.
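A random forest sketch, assuming CART's (maxDepth, fractionFeaturesInSplit, seed) constructor; subsampling features per split is what distinguishes a random forest from plain bagging.

    import org.tribuo.classification.dtree.CARTClassificationTrainer;
    import org.tribuo.classification.ensemble.VotingCombiner;
    import org.tribuo.common.tree.RandomForestTrainer;

    // 10 CART trees of depth <= 8, each considering 70% of features per split,
    // combined by majority vote.
    var cart = new CARTClassificationTrainer(8, 0.7f, 1L);
    var forest = new RandomForestTrainer<>(cart, new VotingCombiner(), 10);
    var model = forest.train(trainingData);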
A range currently being segmented.
A Radial Basis Function (RBF) kernel, exp(-gamma*|u-v|^2).
Same as a RealInfo, but with an additional int id field.
Stores information about real valued features.
A FieldProcessor which applies a regex to a field and generates ColumnarFeatures based on the matches.
Matching mode.
A simple document preprocessor which applies regular expressions to the input.
Generates two example train and test datasets, used for unit testing.
Defines methods that calculate regression performance.
An Evaluator for multi-dimensional regression using Regressors.
A factory for creating Regressors and RegressionInfos.
Provenance for RegressionFactory.
The base class for regression information using Regressors.
An EvaluationMetric for Regressors which calculates the metric based on the true values and the predicted values.
An enum of the default RegressionMetrics supported by the multi-dimensional regression evaluation package.
An interface for regression objectives.
The sufficient statistics for regression metrics (i.e., each prediction and each true value).
An Output for n-dimensional real valued regression.
A Regressor which contains a single dimension, used internally when the model implementation doesn't natively support multi-dimensional regression outputs.
Can convert a Regressor to a TFloat32 vector and a TFloat32 into a Prediction or Regressor.
Calculates a tree impurity score based on the regression targets.
Tuple class for the impurity and summed weight.
A decision tree node used at training time.
Tuple containing an inverted dataset (i.e., feature-wise not example-wise).
Can convert an OnnxValue into a Prediction or Regressor.
ReproUtil<T extends Output<T>>
Reproducibility utility based on Tribuo's provenance objects.
Record for any differences between feature sets.
Record for a model reproduction.
Record for any differences between output domains.
Utils for working with classpath resources at test time.
An interface that will take the response field and produce an Output.
An iterator over a ResultSet returned from JDBC.
An implementation of the RMSProp gradient optimiser.
A row of values from a RowList.
An implementation of a List which wraps a set of lists.
A processor which takes a Map of String to String and returns an Example.
Builder for RowProcessor.
Trains and tests a model using the supplied data, for each trainer inside a configuration file.
Command line options.
This class creates a pruned dataset which only contains the selected features.
Provenance for SelectedFeatureDataset.
A record-like class for a selected feature set.
Build and run a sequence classifier on a generated dataset.
Command line options.
Build and run a sequence classifier on a generated or serialized dataset using the trainer specified in the configuration file.
Command line options.
A data generator for smoke testing sequence label models.
A class for sets of data, which are used to train and evaluate classifiers.
An interface for things that can be given to a SequenceDataset's constructor.
An immutable evaluation of a specific sequence model and dataset.
An evaluation factory which produces immutable SequenceEvaluations of a given SequenceDataset using the given SequenceModel.
A sequence of examples, used for sequence classification.
Converts a sequence example into a feed dict suitable for TensorFlow.
A prediction model, which is used to predict outputs for unseen instances.
A CLI for interacting with a SequenceModel.
Command line options.
Converts a TensorFlow output tensor into a list of predictions, and a Tribuo sequence example into a Tensorflow tensor suitable for training.
An interface for things that can train sequence prediction models.
An implementation of SGD with a single learning rate and optional momentum.
Momentum types.
An interface for a loss function that can produce the loss and gradient incurred by a single prediction.
Interface for 1 dimensional Tensors.
This tokenizer is loosely based on the notion of word shape, which is a common feature used in NLP.
A subclass of DenseMatrix which shrinks the value every time a new value is added.
An interface which tags a Tensor with a convertToDense method.
A subclass of DenseVector which shrinks the value every time a new value is added.
A sigmoid kernel, tanh(gamma*u.dot(v) + intercept).
Normalizes the input by applying a logistic sigmoid to each element.
This class stores a String describing the data source, along with a timestamp.
Extracts a value from a single field to be placed in an Example's metadata field.
A version of SimpleTextDataSource that accepts a List of Strings.
Provenance for SimpleStringDataSource.
A dataset for a simple data format for text classification experiments.
Provenance for SimpleTextDataSource.
This is used for stateless functions such as exp, log, addition or multiplication by a constant.
Operations understood by this Transformation.
Provenance for SimpleTransform.
A Model which wraps n independent regression models, where n is the size of the Regressor domain.
A SparseModel which wraps n independent regression models, where n is the size of the Regressor domain.
Base class for training n independent sparse models, one per dimension.
Trains n independent binary Models, each of which predicts a single Regressor.
The skeleton of a TrainerProvenance that extracts the configured parameters.
Contains information about a feature and can be stored in the feature map in a Dataset.
A trainer for a sparse linear regression model.
The inference time version of a sparse linear regression model.
SparseModel<T extends Output<T>>
A model which uses a subset of the features it knows about to make predictions.
Denotes this trainer emits a SparseModel.
A sparse vector.
This implementation of Tokenizer is instantiated with an array of characters that are considered split characters.
Splits tokens at the supplied characters.
CLI options for a SplitCharactersTokenizer.
This class supports character-by-character (that is, codepoint-by-codepoint) iteration over input text to create tokens.
An interface for checking if the text should be split at the supplied codepoint.
Defines different ways that a tokenizer can split the input text at a given character.
SplitNode<T extends Output<T>>
An immutable Node with a split and two child nodes.
This implementation of Tokenizer is instantiated with a regular expression pattern which determines how to split a string into tokens.
CLI options for a SplitPatternTokenizer.
Splits data in our standard text format into training and testing portions.
Command line options.
A DataSource for loading columnar data from a database and applying FieldProcessors to it.
Provenance for SQLDataSource.
Configuration for connecting to an SQL database via JDBC.
Read an SQL query in on the standard input, write a CSV file containing the results to the standard output.
Command line options.
Squared loss, i.e., l2.
Interface for gradient based optimisation methods.
A main class for stripping out and storing provenance from a model.
Types of provenance that can be removed.
Command line options.
A feature aggregator that aggregates occurrence counts across a number of feature lists.
The carrier type for LibSVM anomaly detection modes.
Valid SVM modes for anomaly detection.
The carrier type for LibSVM classification modes.
The classification model types.
A container for SVM parameters and the kernel.
The carrier type for LibSVM regression modes.
Type of regression SVM.
SVMType<T extends Output<T>>
A carrier type for the SVM type.
An explainer for tabular data.
An interface for Tensors, currently Vectors and Matrices.
This model encapsulates a simple model with an input feed dict, and produces a single output tensor.
A Tribuo wrapper around a TensorFlow frozen model.
Base class for a TensorFlow model that operates on Examples.
This model encapsulates a TensorFlow model running in graph mode with a single tensor output.
A Tribuo wrapper around a TensorFlow saved model bundle.
A TensorFlow model which implements SequenceModel, suitable for use in sequential prediction tasks.
A trainer for SequenceModels which use an underlying TensorFlow graph.
Trainer for TensorFlow.
Provenance for TensorFlowTrainer.
The model format to emit.
Helper functions for working with TensorFlow.
A serializable tuple containing the tensor class name, the shape and the data.
A map of names and tensors to feed into a session.
Test a classifier for a standard dataset.
Command line options.
TestingDetails section of a ModelCard.
A base class for textual data sets.
An explainer for text data.
An interface for things that take text and turn them into examples that we can use to train or evaluate a classifier.
An implementation of TextFeatureExtractor that takes a TextPipeline and generates ArrayExample.
A FieldProcessor which takes a text field and runs a TextPipeline on it to generate features.
A pipeline that takes a String and returns a List of Features.
An exception thrown by the text processing system.
A TextProcessor takes some text and optionally a feature tag and generates a list of Features from that text.
A TrainerProvenance with a timestamp, used when there was no trainer involved in model construction (e.g., creating an EnsembleModel from existing models).
A single token extracted from a String.
Tokenizers may produce multiple kinds of tokens, depending on the application to which they're being put.
Wraps exceptions thrown by tokenizers.
An interface for things that tokenize text: breaking it into words according to some set of rules.
CLI Options for creating a tokenizer.
A pipeline for generating ngram features.
Trainer<T extends Output<T>>
An interface for things that can train predictive models.
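The contract is a single essential method; a sketch with an assumed Label trainer and trainingData.

    // Training also records provenance (trainer configuration, dataset lineage)
    // inside the returned Model.
    Model<Label> model = trainer.train(trainingData);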
A tag interface for trainer provenances.
An implementation of TrainerProvenance that delegates everything to SkeletalTrainerProvenance.
TrainingDetails section of a ModelCard.
Build and run a decision tree classifier for a standard dataset.
Build and run a classifier for a standard dataset.
Build and run a liblinear-java classifier for a standard dataset.
Build and run a LibSVM classifier for a standard dataset.
Build and run a multinomial naive bayes classifier for a standard dataset.
Build and run a classifier for a standard dataset using FMClassificationTrainer.
Build and run a kernel SVM classifier for a standard dataset.
Build and run a classifier for a standard dataset using LinearSGDTrainer.
Build and run an XGBoost classifier for a standard dataset.
Build and run a HDBSCAN* clustering model for a standard dataset.
Build and run a k-means clustering model for a standard dataset.
Build and run a TensorFlow multi-class classifier for a standard dataset.
Build and run a LibLinear regressor for a standard dataset.
Build and run a LibSVM regressor for a standard dataset.
Build and run a regression tree for a standard dataset.
Build and run a regression factorization machine for a standard dataset.
Build and run a linear regression for a standard dataset.
Build and run a sparse linear regression model for a standard dataset.
Build and run an XGBoost regressor for a standard dataset.
Command line options.
Command line options.
Options for the HDBSCAN* CLI.
Impurity function.
Type of feature extractor.
Options for the K-Means CLI.
Command line options.
Command line options.
Loss function.
Loss function.
Command line options.
Command line options.
Command line options.
Type of sparse linear model.
Options for training a model in TensorFlow.
Command line options.
Command line options.
Command line options.
Command line options.
Command line options.
Command line options.
Command line options.
Command line options.
Type of tree trainer.
Command line options.
This class provides static methods used by the demo classes in each classification backend.
Splits data into training and testing sets.
Provenance for a split data source.
An interface representing a class of transformations which can be applied to a feature.
A carrier type for a set of transformations to be applied to a Dataset.
A carrier type as OLCUT does not support nested generics.
A tag interface for provenances in the transformation system.
Wraps a Model with its TransformerMap so all Examples are transformed appropriately before the model makes predictions.
A fitted Transformation which can apply a transform to the input value.
A collection of Transformers which can be applied to a Dataset or Example.
Provenance for TransformerMap.
An interface for the statistics that need to be collected for a specific Transformation on a single feature.
A Trainer which encapsulates another trainer plus a TransformationMap object to apply to each Dataset before training each Model.
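A feature-scaling sketch tying Transformation, TransformationMap and TransformerMap together; trainingData and testData are assumed MutableDatasets.

    import java.util.List;
    import org.tribuo.transform.TransformationMap;
    import org.tribuo.transform.TransformerMap;
    import org.tribuo.transform.transformations.LinearScalingTransformation;

    // Fit a [0,1] rescaling on the training set, then apply it to both splits.
    var transformations = new TransformationMap(List.of(new LinearScalingTransformation(0.0, 1.0)));
    TransformerMap transformers = trainingData.createTransformers(transformations);
    var scaledTrain = transformers.transformDataset(trainingData);
    var scaledTest = transformers.transformDataset(testData);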
An inverted feature, which stores a reference to all the values of this feature.
TreeModel<T extends Output<T>>
A Model wrapped around a decision tree root Node.
This class stores the current Tribuo version, along with other compile time information.
Generates the counts for a triplet of vectors.
Aggregates feature tokens, generating unique features.
Processes a feature list, aggregating all the feature values with the same name.
The type of reduction operation to perform.
This class was originally written for the purpose of document indexing in an information retrieval context (principally used in Sun Labs' Minion search engine).
UsageDetails section of a ModelCard.
A builder class for creating an instance of UsageDetails.
SGD utilities.
Utilities.
Ye olde util class.
A nominal tuple.
A nominal tuple.
Adds an id number to a VariableInfo.
A VariableInfo subclass contains information about a feature and its observed values.
A functional interface that generates a normalized version of a double array.
A mutable tuple used to avoid allocation when iterating a vector.
An implementation of a viterbi model.
Types of label score aggregation.
Builds a Viterbi model using the supplied Trainer.
Options for building a viterbi trainer.
Type of label features to include.
A combiner which performs a weighted or unweighted vote across the predicted labels.
A mutable tuple of a double and a long.
An ensemble model that uses weights to combine the ensemble member predictions.
Tag interface denoting that a Trainer can use example weights.
A class of (discrete) weighted information theoretic functions.
Chooses which variable is the one with associated weights.
Tag interface denoting the Trainer can use label weights.
Generates the counts for a pair of vectors.
Generates the counts for a triplet of vectors.
A simple tokenizer that splits on whitespace.
This is a vanilla implementation of the Wordpiece algorithm as found here: https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/tokenization_bert.py
This is a tokenizer that is used "upstream" of WordpieceTokenizer and implements much of the functionality of the 'BasicTokenizer' implementation in huggingface.
This Tokenizer is meant to be a reasonable approximation of the BertTokenizer from the huggingface transformers library.
Converts XGBoost outputs into Label Predictions.
A Trainer which wraps the XGBoost training procedure.
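A boosting sketch, assuming the numTrees-only constructor of the classification XGBoost trainer; trainingData is hypothetical.

    import org.tribuo.classification.xgboost.XGBoostClassificationTrainer;

    // 50 boosting rounds; other hyperparameters use XGBoost's defaults.
    var trainer = new XGBoostClassificationTrainer(50);
    var model = trainer.train(trainingData);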
A Model which wraps around an XGBoost.Booster which was trained by a system other than Tribuo.
Generate and collate feature importance information from the XGBoost model.
An instance of feature importance values for a single feature.
A Model which wraps around an XGBoost.Booster.
CLI options for training an XGBoost classifier.
CLI options for configuring an XGBoost regression trainer.
Converts the output of XGBoost into the appropriate prediction type.
Converts XGBoost outputs into Regressor Predictions.
A Trainer which wraps the XGBoost training procedure.
Types of regression loss.
A Trainer which wraps the XGBoost training procedure.
The type of XGBoost model.
Tuple of a DMatrix, the number of valid features in each example, and the examples themselves.
The logging verbosity of the native library.
The tree building algorithm.
Deprecated.
Unused.