Features
These are just a few of the reasons you’ll want to try out Tribuo.
Provenance
Tribuo’s Models, Datasets and Evaluations have provenance: they know exactly what parameters, transformations and files were used to create them. This means each model can be rebuilt from scratch, and experiments are easy to reproduce because each evaluation tracks the models and datasets it used.
Type-safety
Tribuo is strongly typed (like Java). Each model knows what kind of output it produces, what inputs it expects and the names of everything involved. There is no more confusion when loading something off disk: Tribuo knows what kind of model it is and what labels it can predict.
Interoperability
Tribuo provides interfaces to popular ML libraries like XGBoost and TensorFlow, along with support for the ONNX model exchange format. Our ONNX support (via ONNX Runtime) allows you to deploy models built in other packages and other languages (such as Python’s scikit-learn) alongside models trained with Tribuo. Many Tribuo models can be exported in ONNX format for deployment in other systems, or in cloud services like OCI Data Science.
Algorithms
Tribuo offers support for many popular machine learning algorithms. These algorithms are grouped by type, and Tribuo’s abstract interface makes switching between implementations simple.
General predictors
Tribuo has several implementations which can be used for a variety of prediction tasks:
Algorithm | Implementation | Notes |
---|---|---|
Bagging | Tribuo | Can use any Tribuo trainer as the base learner |
Random Forest | Tribuo | Can use any Tribuo tree trainer as the base learner |
Extra Trees | Tribuo | For both classification and regression |
K-NN | Tribuo | Includes options for several parallel backends, as well as a single threaded backend |
Neural Networks | TensorFlow | Train a neural network in TensorFlow via the Tribuo wrapper. Models can be deployed using the ONNX interface or the TF interface |
The ensembles and K-NN use a combination function to produce the output. The combiners are specific to the prediction task, but the ensemble and K-NN implementations are task agnostic. We provide voting and averaging combiners for multi-class classification, multi-label classification and regression tasks.
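To make the idea of a combination function concrete, here is a minimal standalone sketch of a majority-vote combiner for class labels and an averaging combiner for regression outputs. It uses only the Java standard library; the class and method names (`CombinerSketch`, `vote`, `average`) are hypothetical and are not Tribuo's API, and the first-seen tie-breaking rule is an assumption made for determinism.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative combiners only; hypothetical names, not Tribuo's API. */
final class CombinerSketch {

    /** Majority vote over predicted labels; ties go to the label seen first. */
    static String vote(List<String> predictions) {
        if (predictions.isEmpty()) {
            throw new IllegalArgumentException("No predictions to combine");
        }
        // LinkedHashMap keeps first-seen order, so tie-breaking is deterministic.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : predictions) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Element-wise mean of the ensemble members' regression outputs. */
    static double[] average(List<double[]> predictions) {
        if (predictions.isEmpty()) {
            throw new IllegalArgumentException("No predictions to combine");
        }
        int dims = predictions.get(0).length;
        double[] mean = new double[dims];
        for (double[] p : predictions) {
            if (p.length != dims) {
                throw new IllegalArgumentException("Mismatched output dimensions");
            }
            for (int i = 0; i < dims; i++) {
                mean[i] += p[i];
            }
        }
        for (int i = 0; i < dims; i++) {
            mean[i] /= predictions.size();
        }
        return mean;
    }

    public static void main(String[] args) {
        System.out.println(vote(List.of("cat", "dog", "cat"))); // prints "cat"
    }
}
```

Because the combiner only sees the members' outputs, the same ensemble code can be reused across tasks by swapping the combiner, which is the separation the paragraph above describes.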
Classification
Tribuo has implementations or interfaces for:
Algorithm | Implementation | Notes |
---|---|---|
Linear models | Tribuo | Uses SGD and allows any gradient optimizer |
Factorization Machines | Tribuo | Uses SGD and allows any gradient optimizer |
CART | Tribuo | |
SVM-SGD | Tribuo | An implementation of the Pegasos algorithm |
Adaboost.SAMME | Tribuo | Can use any Tribuo classification trainer as the base learner |
Multinomial Naive Bayes | Tribuo | |
Regularised Linear Models | LibLinear | |
SVM | LibSVM or LibLinear | LibLinear only supports linear SVMs |
Gradient Boosted Decision Trees | XGBoost | |
Tribuo also has a linear chain CRF for sequence classification tasks. This is also trained via SGD using any of Tribuo’s gradient optimizers.
Tribuo has a set of information theoretic feature selection algorithms which can be applied to classification tasks. Feature inputs are automatically discretised into equal width bins. At the moment this includes implementations of Mutual Information Maximisation (MIM), Conditional Mutual Information Maximisation (CMIM), minimum Redundancy Maximum Relevancy (mRMR) and Joint Mutual Information (JMI).
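As a rough illustration of the two ingredients named above, the following standalone sketch discretises a feature into equal width bins and then computes the plug-in estimate of the mutual information between the binned feature and the class label, which is the score MIM ranks features by. The class and method names are hypothetical and this is not Tribuo's implementation; in particular, the handling of constant features and of the maximum value is an assumption.

```java
/** Illustrative only; hypothetical names, not Tribuo's feature selection API. */
final class MutualInformationSketch {

    /** Maps each value to one of numBins equal width bins spanning [min, max]. */
    static int[] equalWidthBins(double[] values, int numBins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / numBins;
        int[] bins = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption: a constant feature (width == 0) falls entirely into bin 0.
            int bin = width == 0.0 ? 0 : (int) ((values[i] - min) / width);
            // The maximum value would index one past the end, so clamp it.
            bins[i] = Math.min(bin, numBins - 1);
        }
        return bins;
    }

    /** Plug-in estimate of I(X;Y) in bits from paired discrete samples. */
    static double mutualInformation(int[] x, int numX, int[] y, int numY) {
        double[][] joint = new double[numX][numY];
        double[] countX = new double[numX];
        double[] countY = new double[numY];
        for (int i = 0; i < x.length; i++) {
            joint[x[i]][y[i]]++;
            countX[x[i]]++;
            countY[y[i]]++;
        }
        double n = x.length;
        double mi = 0.0;
        for (int a = 0; a < numX; a++) {
            for (int b = 0; b < numY; b++) {
                if (joint[a][b] > 0.0) {
                    double pxy = joint[a][b] / n;
                    double px = countX[a] / n;
                    double py = countY[b] / n;
                    mi += pxy * Math.log(pxy / (px * py)) / Math.log(2.0);
                }
            }
        }
        return mi;
    }
}
```

A feature whose bins perfectly separate two equally likely classes scores 1 bit, while a feature independent of the label scores 0. CMIM, mRMR and JMI build on the same quantity but also account for the features already selected.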
To explain classifier predictions there is an implementation of the LIME algorithm. Tribuo’s implementation allows the mixing of text and tabular data, along with the use of any sparse model as an explainer (e.g., regression trees or lasso). However, it does not support images.
Regression
Tribuo’s regression algorithms are multidimensional by default. Any single dimensional implementations are wrapped so they can produce a multidimensional output.
Algorithm | Implementation | Notes |
---|---|---|
Linear models | Tribuo | Uses SGD and allows any gradient optimizer |
Factorization Machines | Tribuo | Uses SGD and allows any gradient optimizer |
CART | Tribuo | |
Lasso | Tribuo | Using the LARS algorithm |
Elastic Net | Tribuo | Using the co-ordinate descent algorithm |
Regularised Linear Models | LibLinear | |
SVM | LibSVM or LibLinear | LibLinear only supports linear SVMs |
Gradient Boosted Decision Trees | XGBoost | |
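One simple way to picture the wrapping of single dimensional regressors mentioned at the start of this section is to hold one independent single-output model per output dimension and concatenate their predictions. The sketch below assumes exactly that; the source does not describe the wrapper's internals, and `MultiOutputSketch` and `predict` are hypothetical names rather than Tribuo classes.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Illustrative only; hypothetical names, not Tribuo's regression wrapper. */
final class MultiOutputSketch {

    /**
     * Evaluates one single-output regressor per dimension and returns
     * the predictions as a single multidimensional output.
     */
    static double[] predict(List<ToDoubleFunction<double[]>> perDimension, double[] features) {
        double[] output = new double[perDimension.size()];
        for (int d = 0; d < output.length; d++) {
            // Each dimension is predicted independently of the others.
            output[d] = perDimension.get(d).applyAsDouble(features);
        }
        return output;
    }
}
```

For example, passing two functions that compute `f[0] + f[1]` and `2 * f[0]` with the features `{1.0, 2.0}` yields a two dimensional prediction of 3.0 and 2.0.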
Clustering
Tribuo has infrastructure for clustering and two algorithms. We expect to add new implementations over time.
Algorithm | Implementation | Notes |
---|---|---|
HDBSCAN* | Tribuo | |
K-Means | Tribuo | Includes both sequential and parallel backends, and the K-Means++ initialisation algorithm |
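For readers unfamiliar with the K-Means++ initialisation listed in the table, here is a minimal standalone sketch of the standard procedure: the first centroid is drawn uniformly from the data, and each subsequent centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The names `KMeansPlusPlusSketch` and `initialise` are hypothetical, and this is not Tribuo's implementation.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative only; hypothetical names, not Tribuo's K-Means trainer. */
final class KMeansPlusPlusSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Picks k initial centroids from the data using K-Means++ sampling. */
    static double[][] initialise(double[][] data, int k, Random rng) {
        if (k < 1 || k > data.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        double[][] centroids = new double[k][];
        centroids[0] = data[rng.nextInt(data.length)].clone();
        // minDist[i] is the squared distance from point i to its nearest centroid so far.
        double[] minDist = new double[data.length];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);
        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (int i = 0; i < data.length; i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(data[i], centroids[c - 1]));
                total += minDist[i];
            }
            // Sample index i with probability minDist[i] / total.
            double threshold = rng.nextDouble() * total;
            int chosen = data.length - 1; // fallback guards against rounding error
            double cumulative = 0.0;
            for (int i = 0; i < data.length; i++) {
                cumulative += minDist[i];
                if (cumulative > threshold) {
                    chosen = i;
                    break;
                }
            }
            centroids[c] = data[chosen].clone();
        }
        return centroids;
    }
}
```

Points that coincide with an existing centroid have zero weight, so well-separated clusters tend to receive one initial centroid each, which usually gives the subsequent K-Means iterations a better starting point than uniform sampling.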
Anomaly Detection
Tribuo offers infrastructure for anomaly detection tasks. We expect to add new implementations over time.
Algorithm | Implementation | Notes |
---|---|---|
One-class SVM | LibSVM | |
One-class linear SVM | LibLinear | |
Multi-label classification
Tribuo offers infrastructure for multi-label classification, along with a wrapper which converts any of Tribuo’s multi-class classification algorithms into a multi-label classification algorithm. We expect to add more multi-label specific implementations over time.
Algorithm | Implementation | Notes |
---|---|---|
Independent wrapper | Tribuo | Converts a multi-class classification algorithm into a multi-label one by producing a separate classifier for each label |
Classifier Chains | Tribuo | Provides classifier chains and randomized classifier chain ensembles using any of Tribuo’s multi-class classification algorithms |
Linear models | Tribuo | Uses SGD and allows any gradient optimizer |
Factorization Machines | Tribuo | Uses SGD and allows any gradient optimizer |
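The independent wrapper in the table above produces a separate classifier for each label. A minimal standalone sketch of that idea follows: each label gets its own binary decision function, and the multi-label prediction is the set of labels whose function fires. `IndependentLabelsSketch`, `predict` and the use of `Predicate` as a stand-in for a trained classifier are all assumptions for illustration, not Tribuo's API.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative only; hypothetical names, not Tribuo's multi-label wrapper. */
final class IndependentLabelsSketch {

    /** Returns every label whose dedicated binary classifier accepts the features. */
    static Set<String> predict(Map<String, Predicate<double[]>> perLabel, double[] features) {
        Set<String> labels = new LinkedHashSet<>();
        for (Map.Entry<String, Predicate<double[]>> e : perLabel.entrySet()) {
            // Each label is decided on its own, ignoring the other labels.
            if (e.getValue().test(features)) {
                labels.add(e.getKey());
            }
        }
        return labels;
    }
}
```

Classifier chains differ from this sketch in that each classifier in the chain also receives the predictions of the classifiers before it, which lets the model capture dependencies between labels.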
Interfaces
In addition to our own implementations of machine learning algorithms, Tribuo also provides a common interface to popular ML tools on the JVM. If you’re interested in contributing a new interface, open a GitHub Issue, and we can discuss how it would fit into Tribuo.
Currently we have interfaces to:
- LibLinear - via the LibLinear-java port of the original LibLinear (v2.44).
- LibSVM - using the pure Java transformed version of the C++ implementation (v3.25).
- ONNX Runtime - via the Java API contributed by our group (v1.12.1).
- TensorFlow - using TensorFlow Java v0.4.2 (based on TensorFlow v2.7.4). This allows the training and deployment of TensorFlow models entirely in Java.
- XGBoost - via the built-in XGBoost4J API (v1.6.2).