All Classes and Interfaces
Class
Description
Absolute loss (i.e., l1).
Base class for Trainers that use an approximation of the CART algorithm to build a decision tree.
Deprecated.
AbstractEvaluator<T extends Output<T>,C extends MetricContext<T>,E extends Evaluation<T>,M extends EvaluationMetric<T,C>>
Base class for evaluators.
A quadratic factorization machine model trained using SGD.
A trainer for a quadratic factorization machine model which uses SGD.
A linear model trained using SGD.
A trainer for a linear model which uses SGD.
AbstractSequenceEvaluator<T extends Output<T>,C extends MetricContext<T>,E extends SequenceEvaluation<T>,M extends EvaluationMetric<T,C>>
Base class for sequence evaluators.
A model trained using SGD.
A nominal tuple used to capture the prediction and the number of active features used by the model.
A trainer for a model which uses SGD.
Base class for decision tree nodes used at training time.
Contains parameters needed to determine whether a node is a leaf.
Implements AdaBoost.SAMME, one of the more popular algorithms for multiclass boosting.
An implementation of the AdaDelta gradient optimiser.
An implementation of the AdaGrad gradient optimiser.
An implementation of the AdaGrad gradient optimiser with regularized dual averaging.
An implementation of the Adam gradient optimiser.
Aggregates multiple ConfigurableDataSources, using AggregateDataSource.IterationOrder to control the iteration order.
Provenance for the AggregateConfigurableDataSource.
Aggregates multiple DataSources, using AggregateDataSource.IterationOrder to control the iteration order.
Provenance for the AggregateDataSource.
Specifies the iteration order of the inner sources.
Aggregates all the classification algorithms.
Types of algorithms supported.
Generates three example train and test datasets, used for unit testing.
An Evaluation for anomaly detection Events.
A factory for generating events.
Provenance for AnomalyFactory.
The base class for tracking anomalous events.
A metric for evaluating anomaly detection problems.
Default metrics for evaluating anomaly detection.
An Example backed by two arrays, one of String and one of double.
A feature aggregator that averages feature values across a feature list.
A combiner which performs a weighted or unweighted average of the predicted regressors independently across the output dimensions.
A Trainer that wraps another trainer and produces a bagged ensemble.
An example implementation of TextPipeline.
Builds examples and sequence examples using features from BERT.
CLI options for running BERT.
The type of output pooling to perform.
A multilabel version of binary cross entropy loss which expects logits.
An Example backed by a single array of feature names.
A ResponseProcessor that takes a single value of the field as the positive class and all other values as the negative class.
A Transformation which bins values.
Provenance for BinningTransformation.
The implementation of a Transformer which splits the input into n bins.
The allowed binning types.
A tokenizer wrapping a BreakIterator instance.
CLI options for a BreakIteratorTokenizer.
A pair of things with a cached hashcode.
A triple of things.
Options for building a classification tree trainer.
The impurity algorithm.
Type of decision tree algorithm.
A Trainer that uses an approximation of the CART algorithm to build a decision tree.
A Trainer that uses an approximation of the CART algorithm to build a decision tree.
A Trainer that uses an approximation of the CART algorithm to build a decision tree.
A document preprocessor which uppercases or lowercases the input.
The possible casing operations.
Same as a CategoricalInfo, but with an additional int id field.
Stores information about Categorical features.
A trainer for an ensemble of randomly ordered Classifier Chains.
A collection of helper methods for performing training and inference in a CRF.
Belief Propagation results.
Clique scores within a chain.
Viterbi output from a linear chain.
Creates a data source using a 2d checkerboard of alternating classes.
Chunk class used for chunk level confidence prediction in the CRFModel.
A tag interface for multi-class and multi-label classification tasks.
Options for building a classification ensemble.
The type of ensemble.
An Options that can produce a classification Trainer based on the provided arguments.
A Classifier Chain Model.
A trainer for a Classifier Chain.
Defines methods that calculate classification performance, used for both multi-class and multi-label classification.
A decision tree node used at training time.
A clustering id.
Generates three example train and test datasets, used for unit testing.
An Evaluation for clustering tasks.
A factory for making ClusterID related classes.
Provenance for ClusteringFactory.
The base class for a ClusterID OutputInfo.
A metric for evaluating clustering problems.
Default metrics for evaluating clusterings.
Selects features according to the Conditional Mutual Information Maximisation algorithm.
Static factory methods which produce Convolutional Neural Network architectures.
A ConfigurableDataSource base class which takes columnar data (e.g., csv or DB table rows) and generates Examples.
An explainer for data using Tribuo's columnar data package.
A Feature with extra bookkeeping for use inside the columnar package.
An abstract class for iterators that read data into a columnar format, usually from a file of some kind.
A representation of a row of untyped data from a columnar data source.
Build and run a predictor for a standard dataset.
Command line options.
A data source for two concentric circles, one per class.
A Sequence model which can provide confidence predictions for subsequence predictions.
A range class used to define a subsequence of a SequenceExample.
It's a DataSource that's also Configurable.
Build and run a classifier for a standard dataset.
Build and run a predictor for a standard dataset.
Command line options.
Command line options.
A tag interface for configurable data source provenance.
A confusion matrix for Classifiables.
Static functions for computing classification metrics based on a ConfusionMatrix.
CLI Options for all the tokenizers in the core package.
Tokenizer type.
Cosine similarity used as a distance measure.
An inference time model for a linear chain CRF trained using SGD.
The type of subsequence level confidence to predict.
CLI options for training a linear chain CRF model.
A Parameters for training a CRF using SGD.
A trainer for CRFs using SGD.
A class that does k-fold cross-validation.
A DataSource for loading separable data from a text file (e.g., CSV, TSV) and applying FieldProcessors to it.
Provenance for CSVDataSource.
An iterator over a CSV file.
Load a DataSource/Dataset from a CSV file.
Deprecated.
Saves a Dataset in CSV format suitable for loading by CSVLoader.
Options for working with training and test data in a CLI.
The delimiters supported by CSV files in this options object.
The input formats supported by this options object.
Tag interface for data sources provenances.
A class for sets of data, which are used to train and evaluate classifiers.
Serialization carrier for common fields in Dataset.
A CLI for exploring a serialised Dataset.
Command line options.
Base class for dataset provenance.
DatasetView provides an immutable view on another Dataset that only exposes selected examples.
Provenance for the DatasetView.
An interface for things that can be given to a Dataset's constructor.
Data source provenance.
Extracts the field value and translates it to a LocalDate based on the specified DateTimeFormatter.
Processes a column that contains a date value.
The types of date features which can be extracted.
A tag interface for a Trainer so the random forests trainer can check if it's actually a tree.
A label feature extractor that produces several kinds of label-based features.
The base class for the 2d binary classification data sources in org.tribuo.classification.example.
Provenance for DemoLabelDataSource.
Converts a sparse example into a dense float vector, then wraps it in a TFloat32.
A dense matrix, backed by a primitive array.
The output of a successful Cholesky factorization.
The output of a successful eigen decomposition.
The output of a successful LU factorization.
A matrix which is dense in the first dimension and sparse in the second.
Converts a sparse Tribuo example into a dense float vector, then wraps it in an OnnxTensor.
A dense vector, backed by a double array.
Descriptive statistics calculated across a list of doubles.
A data source for a somewhat-common format for text classification datasets: a top level directory that contains a number of subdirectories.
Provenance for DirectoryFileSource.
Interface for distance functions.
The built-in distance functions.
An interface for things that can pre-process documents before they are broken into features.
Extracts the field value and converts it to a double.
Processes a column that contains a real value.
A model which performs dummy classifications (e.g., constant output, uniform sampled labels, stratified sampled labels).
A trainer for simple baseline classifiers.
Types of dummy classifier.
A model which performs dummy regressions (e.g., constant output, gaussian sampled output, mean value, median, quartile).
A trainer for simple baseline regressors.
Deprecated.
Types of dummy regression model.
An ElasticNet trainer that uses co-ordinate descent.
An empty DatasetProvenance, should not be used except by the provenance removal system.
An empty DataSourceProvenance, should not be used except by the provenance removal system.
A ResponseProcessor that always emits an empty optional.
An empty TrainerProvenance, should not be used except by the provenance removal system.
An interface for combining predictions.
An Excuse which has a List of excuses for each of the ensemble members.
A model which contains a list of other Models.
Model provenance for ensemble models.
A log_e entropy impurity measure.
An immutable evaluation of a specific model and dataset.
Aggregates metrics from a list of evaluations, or a list of models and datasets.
A metric that can be calculated for the specified output type.
Specifies what form of average to use for an EvaluationMetric.
Provenance for evaluations.
Renders an Evaluation into a String.
An evaluation factory which produces immutable Evaluations of a given Dataset using the given Model.
The type of event.
An example used for training and evaluation.
Transforms a SparseVector, extracting the features from it as an OnnxTensor.
Holds an Example, a Prediction and a Map from String to List of Pairs that contains the per output explanation.
An explanation knows what features are used, what the explaining Model is and what the original Model's prediction is.
Normalizes the exponential values of the input array.
A dummy provenance used to describe the dataset of external models.
This is the base class for third party models which are trained externally and loaded into Tribuo for prediction.
A dummy provenance for a model trained outside Tribuo.
A trainer which produces an Extremely Randomized Tree Ensemble.
A class for features.
An interface for aggregating feature values into other values.
Hashes the feature names to reduce the dimensionality.
A map from Strings to VariableInfo objects storing information about a feature.
Takes a list of columnar features and adds new features or removes existing features.
An interface for feature selection algorithms.
A tag interface for feature selection algorithms.
An implementation of FeatureSelectorProvenance which delegates everything to SkeletalConfiguredObjectProvenance.
Contains provenance information for an instance of a SelectedFeatureSet.
A feature transformer maps a list of features to a new list of features.
A Parameters for models which make a single prediction like logistic regressions and neural networks.
Extracts a value from a field to be placed in an Example's metadata field.
An interface for things that process the columns in a data set.
The types of generated features.
A response processor that returns the value(s) in a given (set of) fields.
Extracts the field value and converts it to a float.
The inference time version of a factorization machine trained using SGD.
CLI options for training a factorization machine classifier.
Available loss types.
A trainer for a classification factorization machine using SGD.
The inference time version of a multi-label factorization machine trained using SGD.
CLI options for training a linear classifier.
Available loss types.
A trainer for a multi-label classification factorization machine using SGD.
A Parameters for factorization machines.
The inference time model of a regression factorization machine trained using SGD.
A trainer for a regression factorization machine using SGD.
A combiner which performs a weighted or unweighted vote across the predicted labels.
Static functions for computing the Gamma and log Gamma functions on real valued inputs.
Generates an anomaly detection dataset sampling each feature uniformly from a univariate Gaussian.
Provenance for GaussianAnomalyDataSource.
Generates a clustering dataset drawn from a mixture of 5 Gaussians.
Provenance for GaussianClusterDataSource.
Generates a single dimensional output drawn from N(slope*x + intercept,variance).
Provenance for GaussianDataSource.
A data source for two classes generated from separate Gaussians.
The Gini index impurity measure.
An enum for the gradient optimisers exposed by TensorFlow-Java.
CLI options for configuring a gradient optimiser.
Type of the gradient optimisers available in CLIs.
A tuple containing a graph def protobuf along with the relevant operation names.
Hashes names using String.hashCode().
Provenance for the HashCodeHasher.
A FeatureMap used by the HashingTrainer to provide feature name hashing and guarantee that the Model does not contain feature name information, but still works with unhashed feature names.
An abstract base class for hash functions used to hash the names of features.
An Options implementation which provides CLI arguments for the model hashing functionality.
Supported types of hashes in CLI programs.
A SequenceTrainer that hashes all the feature names on the way in.
Provenance for HashingSequenceTrainer.
A trained HDBSCAN* model which provides the cluster assignment labels and outlier scores for every data point.
OLCUT Options for the HDBSCAN* implementation.
An HDBSCAN* trainer which generates a hierarchical, density-based clustering representation of the supplied data.
A cluster exemplar, with attributes for the point's label, outlier score and its features.
Deprecated.
This Enum is deprecated in version 4.3, replaced by DistanceType.
Merges each SparseVector separately using a PriorityQueue as a heap.
Hinge loss, scores the correct value margin and any incorrect predictions -margin.
Hinge loss, scores the correct value margin and any incorrect predictions -margin.
Utilities for nice HTML output that can be put in wikis and such.
Huber loss, i.e., a mixture of l2 and l1 losses.
Extracts the field value and emits it as a String.
A FieldProcessor which converts the field name and value into a feature with a value of IdentityProcessor.FEATURE_VALUE.
A feature transformation that computes the IDF for features and then transforms them with a TF-IDF weighting.
Provenance for IDFTransformation.
A DataSource which can read IDX formatted data (e.g., MNIST).
Java side representation for an IDX file.
Provenance class for IDXDataSource.
The possible IDX input formats.
Image converter.
Image transformer.
An ImmutableOutputInfo object for Events.
An ImmutableOutputInfo object for ClusterIDs.
This is a Dataset which has an ImmutableFeatureMap to store the feature information.
ImmutableFeatureMap is used when unknown features should not be added to the FeatureMap.
An ImmutableOutputInfo object for Labels.
An ImmutableOutputInfo for working with MultiLabel tasks.
An OutputInfo that is fixed, and contains an id number for each valid output.
An ImmutableOutputInfo for Regressors.
This is a SequenceDataset which has an ImmutableFeatureMap to store the feature information.
An interface for incremental training of Models.
A Model which wraps n binary models, where n is the size of the MultiLabel domain.
A SequenceModel which independently predicts each element of the sequence.
Trains a sequence model by training a regular model to independently predict every example in each sequence.
A version of ArrayExample which also has the id numbers.
A tuple of the feature name, id and value.
An Extractor with special casing for loading the index from a Row.
A class of (discrete) information theoretic functions.
An immutable named tuple containing the statistics from a G test.
Demo showing how to calculate various mutual informations and entropies.
Command line options.
Type of data distribution.
An array container which maintains the array and the size.
A Pair of a primitive int and a primitive double.
A data source of two interleaved half circles.
Extracts the field value and converts it to an int.
Internal datastructure for implementing a decision tree.
Selects features according to the Joint Mutual Information algorithm.
A decision tree node used at training time.
A DataSource for loading data from a JSON text file and applying FieldProcessors to it.
Provenance for JsonDataSource.
An iterator for JSON format files converting them into a format suitable for RowProcessor.
Utilities for interacting with JSON objects or text representations.
A k-d tree nearest neighbour query implementation.
A factory which creates k-d tree nearest neighbour query objects.
An interface for a Mercer kernel function.
The inference time version of a kernel model trained using Pegasos.
Options for using the KernelSVMTrainer.
The kernel types.
A trainer for a kernelised model using the Pegasos optimiser.
Kernel types from libsvm.
A k-fold splitter to be used in cross-validation.
Stores a train/test split for a dataset.
A K-Means model with a selectable distance function.
OLCUT Options for the K-Means implementation.
A K-Means trainer, which generates a K-means clustering of the supplied data.
Deprecated.
This Enum is deprecated in version 4.3, replaced by DistanceType.
Possible initialization functions.
CLI Options for training a k-nearest neighbour predictor.
The type of combination function.
A k-nearest neighbours model.
The parallel backend for batch predictions.
A Trainer for k-nearest neighbour models.
Deprecated.
This Enum is deprecated in version 4.3, replaced by DistanceType.
L1 (or Manhattan) distance.
L2 (or Euclidean) distance.
An immutable multi-class classification label.
A confusion matrix for Labels.
Can convert a Label into a Tensor containing one hot encoding of the label and can convert a TFloat16 or TFloat32 into a Prediction or a Label.
Adds multi-class classification specific metrics to ClassifierEvaluation.
Static utility functions for calculating performance metrics on Labels.
Stores the Precision-Recall curve as three arrays: the precisions, the recalls, and the thresholds associated with those values.
Stores the ROC curve as three arrays: the false positive rate, the true positive rate, and the thresholds associated with those rates.
A factory for making Label related classes.
Provenance for LabelFactory.
A class for featurising labels from previous steps in Viterbi.
Calculates a tree impurity score based on label counts, weighted label counts or a probability distribution.
The base class for information about multi-class classification Labels.
Generates three example train and test datasets, used for unit testing.
The context for a LabelMetric is a ConfusionMatrix.
An enum of the default LabelMetrics supported by the multi-class classification evaluation package.
An interface for single label prediction objectives.
A class that can be used to evaluate a sequence label classification model element wise on a given set of data.
A sequence evaluator for labels.
A trainer for a lasso linear regression model which uses LARS to construct the model.
A trainer for a linear regression model which uses least angle regression.
An immutable leaf Node that can create a prediction.
A Model which wraps a LibLinear-java anomaly detection model.
A Trainer which wraps a liblinear-java anomaly detection trainer using a one-class SVM.
A Model which wraps a LibLinear-java classification model.
A Trainer which wraps a liblinear-java classifier trainer.
A Model which wraps a LibLinear-java model.
Command line options for working with a classification liblinear model.
A Model which wraps a LibLinear-java model.
A Trainer which wraps a liblinear-java regression trainer.
A Trainer which wraps a liblinear-java trainer.
A carrier type for the liblinear algorithm type.
An anomaly detection model that uses an underlying libSVM model to make the predictions.
A trainer for anomaly models that uses LibSVM.
A classification model that uses an underlying LibSVM model to make the predictions.
A trainer for classification models that uses LibSVM.
A DataSource which can read LibSVM formatted data.
The provenance for a LibSVMDataSource.
A model that uses an underlying libSVM model to make the predictions.
CLI options for training a LibSVM classification model.
A regression model that uses an underlying libSVM model to make the predictions.
A trainer for regression models that uses LibSVM.
A trainer that will train using libsvm's Java implementation.
LIMEBase merges the lime_base.py and lime_tabular.py implementations, and deals with simple matrices of numerical or categorical data.
Uses the columnar data processing infrastructure to mix text and tabular data.
An Explanation using LIME.
Uses a Tribuo TextFeatureExtractor to explain the prediction for a given piece of text.
A CLI for interacting with LIMEText.
Command line options.
A linear kernel, u.dot(v).
The carrier type for liblinear anomaly detection modes.
The different model types available for classification.
The carrier type for liblinear classification modes.
The different model types available for classification.
A Parameters for producing linear models.
The carrier type for liblinear linear regression modes.
The type of linear regression algorithm.
A Transformation which takes an observed distribution and rescales it so all values are between the desired min and max.
Provenance for LinearScalingTransformation.
The inference time version of a linear model trained using SGD.
The inference time version of a multi-label linear model trained using SGD.
The inference time version of a linear model trained using SGD.
CLI options for training a linear classifier.
CLI options for training a linear classifier.
Available loss types.
Available loss types.
A trainer for a linear classifier using SGD.
A trainer for a multi-label linear model which uses SGD.
A trainer for a linear regression model which uses SGD.
A data source which wraps up a list of Examples along with their DataSourceProvenance and an OutputFactory.
This class will not be performant until value types are available in Java.
A logistic regression trainer that uses a reasonable objective, optimiser, number of epochs and minibatch size.
A multiclass version of the log loss.
Interface for 2 dimensional Tensors.
Interface for matrix factorizations.
A mutable tuple used to avoid allocation when iterating a matrix.
Measures the mean absolute error over a set of inputs.
Measures the mean squared error over a set of inputs.
A Transformation which takes an observed distribution and rescales it so it has the desired mean and standard deviation.
Provenance for MeanStdDevTransformation.
An accumulator for online calculation of the mean and variance of a stream of doubles.
An interface for merging an array of DenseSparseMatrix into a single DenseSparseMatrix.
An interface which can merge double values.
Hashes Strings using the supplied MessageDigest type.
Provenance for MessageDigestHasher.
The context for a metric or set of metrics.
Just an easier-to-read alias for Pair<MetricTarget<T>, String>.
Used by a given EvaluationMetric to determine whether it should compute its value for a specific Output value or whether it should average them.
Selects features according to their mutual information with the class label (aka Mutual Information Maximisation).
This class creates a pruned dataset in which low frequency features that occur less than the provided minimum cardinality have been removed.
Provenance for MinimumCardinalityDataset.
This class creates a pruned dataset in which low frequency features that occur less than the provided minimum cardinality have been removed.
Provenance for MinimumCardinalitySequenceDataset.
Static factory methods which produce Multi-Layer Perceptron architectures.
A prediction model, which is used to predict outputs for unseen instances.
ModelCard feature to allow more transparent model reporting.
A command line interface for creating and appending UsageDetails to the serialized version of an existing ModelCard.
CLI options for ModelCardCLI.
Serialization carrier for common fields in Model and SequenceModel.
ModelDetails section of a ModelCard.
A command line interface for loading in models and inspecting their feature and output spaces.
CLI options for ModelExplorer.
Contains provenance information for an instance of a Model.
Hashes names using String.hashCode(), then reduces the dimension.
Provenance for the ModHashCodeHasher.
Selects features according to the Minimum Redundancy Maximum Relevance algorithm.
A class for multi-label classification.
A ConfusionMatrix which accepts MultiLabels.
Can convert a MultiLabel into a Tensor containing a binary encoding of the label vector and can convert a TFloat16 or TFloat32 into a Prediction or a MultiLabel.
Generates three example train and test datasets, used for unit testing.
A MultiLabel specific ClassifierEvaluation.
The implementation of a MultiLabelEvaluation using the default metrics.
An Evaluator for MultiLabel problems.
A factory for generating MultiLabel objects and their associated OutputInfo and Evaluator objects.
Provenance for MultiLabelFactory.
Generates a multi label output drawn from a series of functions.
Provenance for MultiLabelGaussianDataSource.
The base class for information about MultiLabel outputs.
An EvaluationMetric for evaluating MultiLabel problems.
An enum of the default MultiLabelMetrics supported by the multi-label classification evaluation package.
An interface for multi-label prediction objectives.
A combiner which performs a weighted or unweighted vote independently across the predicted labels in each multi-label.
A Model for multinomial Naive Bayes with Laplace smoothing.
CLI options for a multinomial naive bayes model.
A Trainer which trains a multinomial Naive Bayes model with Laplace smoothing.
A class for sampling from multivariate normal distributions.
The MurmurHash3 algorithm was created by Austin Appleby and placed in the public domain.
128 bits of state
A MutableOutputInfo object for Events.
A mutable ClusteringInfo.
A MutableDataset is a Dataset with a MutableFeatureMap which grows over time.
A feature map that can record new feature value observations.
A mutable LabelInfo.
A MutableOutputInfo for working with multi-label tasks.
A mutable OutputInfo that can record observed output values.
A MutableOutputInfo for Regressors.
A MutableSequenceDataset is a SequenceDataset with a MutableFeatureMap which grows over time.
A brute-force nearest neighbour query implementation.
A factory which creates brute-force nearest neighbour query objects.
An interface for nearest neighbour query objects.
An interface for factories which create nearest neighbour query objects.
These are the supported neighbour query implementations.
A document pre-processor for 20 newsgroup data.
A text processor that will generate token ngrams of a particular size.
A node in a decision tree.
A data source of two interleaved half circles with some zero mean Gaussian noise applied to each point.
Generates a single dimensional output drawn from N(w_0*x_0 + w_1*x_1 + w_2*x_1*x_0 + w_3*x_1*x_1*x_1 + intercept,variance).
Provenance for NonlinearGaussianDataSource.
A convenience class for when you are required to provide a tokenizer but you don't actually want to split up the text into tokens.
A label feature extractor that doesn't produce any label based features.
NoopNormalizer returns a copy in NoopNormalizer.normalize(double[]) and is a no-op in place.
Normalizes, but first subtracts the minimum value (to ensure positivity).
A wrapper class around an OCI Data Science Model Deployment endpoint which sends off inputs for scoring and converts the output into a Tribuo prediction.
Carrier type for easy deserialization from JSON.
This class provides a CLI for deploying and scoring a Tribuo Classification model.
Options for the OCIModelCLI.
Mode for the CLI.
Converter for a DenseMatrix received from OCI Data Science Model Deployment.
Utils for uploading and deploying models to OCI Data Science.
Configuration for OCI DS.
Configuration for an OCI DS Model artifact.
Configuration for an OCI DS Model Deployment.
Enum for OCI model types.
Extracts the field value and translates it to an OffsetDateTime based on the specified DateTimeFormatter.
An evaluator which aggregates predictions and produces Evaluations covering all the Predictions it has seen or created.
The spec for an attribute, used to produce the attribute proto at construction time.
Context object used to scope and manage the creation of ONNX OnnxMl.GraphProto and OnnxMl.ModelProto instances.
An interface which denotes this Model can be exported as an ONNX model.
A Tribuo wrapper around an ONNX model.
A subclass of ONNXRef specialized for OnnxMl.TensorProto.
Tribuo Math specific helper functions for building ONNX protos.
A subclass of ONNXRef specialized for OnnxMl.NodeProto.
An interface for ONNX operators.
ONNX Opset 13, and ONNX-ML version 1.
A subclass of ONNXRef specialized for OnnxMl.ValueInfoProto.
An abstract reference that represents both a node in an ONNX computation graph and a container for a specific ONNX proto object that denotes that node.
Helper functions for building ONNX protos.
Output is the root interface for the supported prediction types.
Converts the Output into a Tensor and vice versa.
An interface associated with a specific Output, which can generate the appropriate Output subclass, and OutputInfo subclass.
A tag provenance for an output factory.
Tracks relevant properties of the appropriate Output subclass.
A count distribution over CachedPair objects.
Averages the parameters across a gradient run.
An interface to a Tensor[] array which accepts updates to the parameters.
An implementation of the Pegasos gradient optimiser used primarily for solving the SVM problem.
A polynomial kernel, (gamma*u.dot(v) + intercept)^degree.
A prediction made by a Model.
Reads in a Datasource, processes all the data, and writes it out as a serialized dataset.
Command line options.
ProtoSerializable<T extends com.google.protobuf.Message>
Interface for serializing an implementing object to the specified protobuf.
Mark a class as being ProtoSerializable and specify the class type used to serialize the "serialized_data".
Annotation which denotes that a field should be part of the protobuf serialized representation.
Annotation which denotes that the map field this is applied to is serialized as two repeated fields, one for keys and one for values.
Annotation which denotes that a map field should be part of the protobuf serialized representation.
Annotation which denotes that the map field this is applied to is serialized as a list of values.
Utilities for working with Tribuo protobufs.
A quartile to split data into 4 chunks.
Processes the response into quartiles and emits them as classification outputs.
A trainer which produces a random forest.
A range currently being segmented.
A Radial Basis Function (RBF) kernel, exp(-gamma*|u-v|^2).
Same as a RealInfo, but with an additional int id field.
Stores information about real valued features.
A FieldProcessor which applies a regex to a field and generates ColumnarFeatures based on the matches.
Matching mode.
A simple document preprocessor which applies regular expressions to the input.
Generates two example train and test datasets, used for unit testing.
Defines methods that calculate regression performance.
A factory for creating Regressors and RegressionInfos.
Provenance for RegressionFactory.
The base class for regression information using Regressors.
An EvaluationMetric for Regressors which calculates the metric based on the true values and the predicted values.
An enum of the default RegressionMetrics supported by the multi-dimensional regression evaluation package.
An interface for regression objectives.
The sufficient statistics for regression metrics (i.e., each prediction and each true value).
An Output for n-dimensional real valued regression.
A Regressor which contains a single dimension, used internally when the model implementation doesn't natively support multi-dimensional regression outputs.
Calculates a tree impurity score based on the regression targets.
Tuple class for the impurity and summed weight.
A decision tree node used at training time.
Tuple containing an inverted dataset (i.e., feature-wise not example-wise).
Reproducibility utility based on Tribuo's provenance objects.
Record for any differences between feature sets.
Record for a model reproduction.
Record for any differences between output domains.
Utils for working with classpath resources at test time.
An interface that will take the response field and produce an Output.
An iterator over a ResultSet returned from JDBC.
An implementation of the RMSProp gradient optimiser.
A row of values from a RowList.
An implementation of a List which wraps a set of lists.
A processor which takes a Map of String to String and returns an Example.
Builder for RowProcessor.
Trains and tests a model using the supplied data, for each trainer inside a configuration file.
Command line options.
This class creates a pruned dataset which only contains the selected features.
Provenance for SelectedFeatureDataset.
A record-like class for a selected feature set.
Build and run a sequence classifier on a generated dataset.
Command line options.
Build and run a sequence classifier on a generated or serialized dataset using the trainer specified in the configuration file.
Command line options.
A data generator for smoke testing sequence label models.
A class for sets of data, which are used to train and evaluate classifiers.
An interface for things that can be given to a SequenceDataset's constructor.
An immutable evaluation of a specific sequence model and dataset.
An evaluation factory which produces immutable SequenceEvaluations of a given SequenceDataset using the given SequenceModel.
A sequence of examples, used for sequence classification.
Converts a sequence example into a feed dict suitable for TensorFlow.
A prediction model, which is used to predict outputs for unseen instances.
A CLI for interacting with a SequenceModel.
Command line options.
Converts a TensorFlow output tensor into a list of predictions, and a Tribuo sequence example into
a TensorFlow tensor suitable for training.
An interface for things that can train sequence prediction models.
An implementation of single learning rate SGD and optionally momentum.
Momentum types.
An interface for a loss function that can produce the loss and gradient incurred by
a single prediction.
Interface for 1 dimensional Tensors.
This tokenizer is loosely based on the notion of word shape which is a common feature used in NLP.
A subclass of DenseMatrix which shrinks the value every time a new value is added.
An interface which tags a Tensor with a convertToDense method.
A subclass of DenseVector which shrinks the value every time a new value is added.
A sigmoid kernel, tanh(gamma*u.dot(v) + intercept).
Normalizes the input by applying a logistic sigmoid to each element.
This class stores a String describing the data source, along with a
timestamp.
Extracts a value from a single field to be placed in an Example's metadata field.
A version of SimpleTextDataSource that accepts a List of Strings.
Provenance for SimpleStringDataSource.
A dataset for a simple data format for text classification experiments.
Provenance for SimpleTextDataSource.
This is used for stateless functions such as exp, log, addition or multiplication by a constant.
Operations understood by this Transformation.
Provenance for SimpleTransform.
A Model which wraps n independent regression models, where n is the size of the MultipleRegressor domain.
A SparseModel which wraps n independent regression models, where n is the size of the MultipleRegressor domain.
Base class for training n independent sparse models, one per dimension.
The skeleton of a TrainerProvenance that extracts the configured parameters.
Contains information about a feature and can be stored in the feature map in a Dataset.
A trainer for a sparse linear regression model.
The inference time version of a sparse linear regression model.
A model which uses a subset of the features it knows about to make predictions.
Denotes this trainer emits a SparseModel.
A sparse vector.
This implementation of Tokenizer is instantiated with an array of characters that are considered split characters.
Splits tokens at the supplied characters.
CLI options for a SplitCharactersTokenizer.
This class supports character-by-character (that is, codepoint-by-codepoint) iteration over input text to create tokens.
An interface for checking if the text should be split at the supplied codepoint.
A combination of a SplitFunctionTokenizer.SplitType and a Token.TokenType.
Defines different ways that a tokenizer can split the input text at a given character.
An immutable Node with a split and two child nodes.
This implementation of Tokenizer is instantiated with a regular expression pattern which determines how to split a string into tokens.
CLI options for a SplitPatternTokenizer.
Splits data in our standard text format into training and testing portions.
Command line options.
A DataSource for loading columnar data from a database and applying FieldProcessors to it.
Provenance for SQLDataSource.
N.B.
Read an SQL query in on the standard input, write a CSV file containing the
results to the standard output.
Command line options.
Squared loss, i.e., l2.
Interface for gradient based optimisation methods.
A main class for stripping out and storing provenance from a model.
Types of provenance that can be removed.
Command line options.
A feature aggregator that aggregates occurrence counts across a number of
feature lists.
The carrier type for LibSVM anomaly detection modes.
Valid SVM modes for anomaly detection.
The carrier type for LibSVM classification modes.
The classification model types.
A container for SVM parameters and the kernel.
The carrier type for LibSVM regression modes.
Type of regression SVM.
A carrier type for the SVM type.
An explainer for tabular data.
An interface for Tensors, currently Vectors and Matrices.
This model encapsulates a simple model with an input feed dict,
and produces a single output tensor.
A Tribuo wrapper around a TensorFlow frozen model.
Base class for a TensorFlow model that operates on Examples.
This model encapsulates a TensorFlow model running in graph mode with a single tensor output.
A Tribuo wrapper around a TensorFlow saved model bundle.
A TensorFlow model which implements SequenceModel, suitable for use in sequential prediction tasks.
A trainer for SequenceModels which use an underlying TensorFlow graph.
Provenance for TensorFlowSequenceTrainer.
Trainer for TensorFlow.
Provenance for TensorFlowTrainer.
The model format to emit.
Helper functions for working with TensorFlow.
A serializable tuple containing the tensor class name, the shape and the data.
A map of names and tensors to feed into a session.
Test a classifier for a standard dataset.
Command line options.
TestingDetails section of a ModelCard.
A base class for textual data sets.
An explainer for text data.
An interface for things that take text and turn them into examples that we
can use to train or evaluate a classifier.
A FieldProcessor which takes a text field and runs a TextPipeline on it to generate features.
A pipeline that takes a String and returns a List of Features.
An exception thrown by the text processing system.
A TextProcessor takes some text and optionally a feature tag and generates a list of Features from that text.
A TrainerProvenance with a timestamp, used when there was no trainer involved in model construction (e.g., creating an EnsembleModel from existing models).
A single token extracted from a String.
Tokenizers may produce multiple kinds of tokens, depending on the application
to which they're being put.
Wraps exceptions thrown by tokenizers.
An interface for things that tokenize text: breaking it into words according
to some set of rules.
CLI Options for creating a tokenizer.
A pipeline for generating ngram features.
An interface for things that can train predictive models.
A tag interface for trainer provenances.
An implementation of TrainerProvenance that delegates everything to SkeletalTrainerProvenance.
TrainingDetails section of a ModelCard.
Build and run a decision tree classifier for a standard dataset.
Build and run a classifier for a standard dataset.
Build and run a liblinear-java classifier for a standard dataset.
Build and run a LibSVM classifier for a standard dataset.
Build and run a multinomial naive bayes classifier for a standard dataset.
Build and run a classifier for a standard dataset using FMClassificationTrainer.
Build and run a kernel SVM classifier for a standard dataset.
Build and run a classifier for a standard dataset using LinearSGDTrainer.
Build and run an XGBoost classifier for a standard dataset.
Build and run an HDBSCAN* clustering model for a standard dataset.
Build and run a k-means clustering model for a standard dataset.
Build and run a TensorFlow multi-class classifier for a standard dataset.
Build and run a LibLinear regressor for a standard dataset.
Build and run a LibSVM regressor for a standard dataset.
Build and run a regression tree for a standard dataset.
Build and run a regression factorization machine for a standard dataset.
Build and run a linear regression for a standard dataset.
Build and run a sparse linear regression model for a standard dataset.
Build and run an XGBoost regressor for a standard dataset.
Command line options.
Command line options.
Options for the HDBSCAN* CLI.
Impurity function.
Type of feature extractor.
Options for the K-Means CLI.
Command line options.
Command line options.
Loss function.
Loss function.
Command line options.
Command line options.
Command line options.
Type of sparse linear model.
Options for training a model in TensorFlow.
Command line options.
Command line options.
Command line options.
Command line options.
Command line options.
Command line options.
Command line options.
Command line options.
Type of tree trainer.
Command line options.
This class provides static methods used by the demo classes in each classification backend.
Splits data into training and testing sets.
Provenance for a split data source.
An interface representing a class of transformations
which can be applied to a feature.
A carrier type for a set of transformations to be applied to a Dataset.
A carrier type as OLCUT does not support nested generics.
A tag interface for provenances in the transformation system.
Wraps a Model with its TransformerMap so all Examples are transformed appropriately before the model makes predictions.
A fitted Transformation which can apply a transform to the input value.
Provenance for TransformerMap.
An interface for the statistics that need to be collected for a specific Transformation on a single feature.
A Trainer which encapsulates another trainer plus a TransformationMap object to apply to each Dataset before training each Model.
An inverted feature, which stores a reference to all the values of this feature.
This class stores the current Tribuo version, along with other compile time information.
Generates the counts for a triplet of vectors.
Aggregates feature tokens, generating unique features.
Processes a feature list, aggregating all the feature values with the same name.
The type of reduction operation to perform.
This class was originally written for the purpose of document indexing in an
information retrieval context (principally used in Sun Labs' Minion search
engine).
UsageDetails section of a ModelCard.
A builder class for creating an instance of UsageDetails.
SGD utilities.
Utilities.
Ye olde util class.
A nominal tuple.
A nominal tuple.
Adds an id number to a VariableInfo.
A VariableInfo subclass contains information about a feature and its observed values.
A functional interface that generates a normalized version of a double array.
A mutable tuple used to avoid allocation when iterating a vector.
An implementation of a Viterbi model.
Types of label score aggregation.
Builds a Viterbi model using the supplied Trainer.
Options for building a Viterbi trainer.
Type of label features to include.
A combiner which performs a weighted or unweighted vote across the predicted labels.
A mutable tuple of a double and a long.
An ensemble model that uses weights to combine the ensemble member predictions.
Tag interface denoting that a Trainer can use example weights.
A class of (discrete) weighted information theoretic functions.
Chooses which variable is the one with associated weights.
Tag interface denoting the Trainer can use label weights.
Generates the counts for a pair of vectors.
Generates the counts for a triplet of vectors.
A simple tokenizer that splits on whitespace.
This is a vanilla implementation of the Wordpiece algorithm as found here:
https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/tokenization_bert.py
This is a tokenizer that is used "upstream" of WordpieceTokenizer and implements much of the functionality of the 'BasicTokenizer' implementation in huggingface.
This Tokenizer is meant to be a reasonable approximation of the BertTokenizer defined here.
Converts XGBoost outputs into Label Predictions.
A Trainer which wraps the XGBoost training procedure.
A Model which wraps around an XGBoost.Booster which was trained by a system other than Tribuo.
Generate and collate feature importance information from the XGBoost model.
An instance of feature importance values for a single feature.
A Model which wraps around an XGBoost.Booster.
CLI options for training an XGBoost classifier.
CLI options for configuring an XGBoost regression trainer.
Converts the output of XGBoost into the appropriate prediction type.
Converts XGBoost outputs into Regressor Predictions.
A Trainer which wraps the XGBoost training procedure.
Types of regression loss.
A Trainer which wraps the XGBoost training procedure.
The type of XGBoost model.
Tuple of a DMatrix, the number of valid features in each example, and the examples themselves.
The logging verbosity of the native library.
The tree building algorithm.
Deprecated.
Unused.
CSVDataSource.
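Several entries in this index describe kernels only by formula: the polynomial kernel (gamma*u.dot(v) + intercept)^degree, the RBF kernel exp(-gamma*|u-v|^2), and the sigmoid kernel tanh(gamma*u.dot(v) + intercept). The following is a minimal sketch of those three formulas, assuming dense double[] inputs of equal length. KernelSketch and its method names are hypothetical and for illustration only; they are not Tribuo's kernel classes or signatures.

```java
// Hypothetical standalone sketch of the kernel formulas listed in this index.
// It does NOT use or mirror Tribuo's kernel API; inputs are assumed to be dense double[] vectors.
public final class KernelSketch {

    private KernelSketch() {}

    // Dot product u.v; throws if the vectors have different lengths.
    public static double dot(double[] u, double[] v) {
        if (u.length != v.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            sum += u[i] * v[i];
        }
        return sum;
    }

    // Polynomial kernel: (gamma * u.v + intercept)^degree
    public static double polynomial(double[] u, double[] v, double gamma, double intercept, double degree) {
        return Math.pow(gamma * dot(u, v) + intercept, degree);
    }

    // RBF kernel: exp(-gamma * |u - v|^2), using the squared Euclidean distance.
    public static double rbf(double[] u, double[] v, double gamma) {
        if (u.length != v.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double squaredDistance = 0.0;
        for (int i = 0; i < u.length; i++) {
            double diff = u[i] - v[i];
            squaredDistance += diff * diff;
        }
        return Math.exp(-gamma * squaredDistance);
    }

    // Sigmoid kernel: tanh(gamma * u.v + intercept)
    public static double sigmoid(double[] u, double[] v, double gamma, double intercept) {
        return Math.tanh(gamma * dot(u, v) + intercept);
    }

    public static void main(String[] args) {
        double[] u = {1.0, 2.0};
        double[] v = {3.0, 4.0};
        // u.v = 11 and |u - v|^2 = 8 for these vectors.
        System.out.println(polynomial(u, v, 1.0, 1.0, 2.0)); // (11 + 1)^2 = 144.0
        System.out.println(rbf(u, v, 0.5));                  // exp(-4)
        System.out.println(sigmoid(u, v, 0.1, 0.0));         // tanh(1.1)
    }
}
```

The sketch recomputes the dot product on every call for clarity; a real implementation would typically operate on the library's own vector types and avoid such repeated work.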