org.tribuo.common.xgboost.XGBoostTrainer<Label>

org.tribuo.classification.xgboost.XGBoostClassificationTrainer

All Implemented Interfaces: com.oracle.labs.mlrg.olcut.config.Configurable, com.oracle.labs.mlrg.olcut.provenance.Provenancable<TrainerProvenance>, Trainer<Label>, WeightedExamples

public final class XGBoostClassificationTrainer extends XGBoostTrainer<Label>

A Trainer which wraps the XGBoost training procedure.

This only exposes a few of XGBoost's training parameters.

It uses pthreads outside of the JVM to parallelise the computation.

See:

 Chen T, Guestrin C.
 "XGBoost: A Scalable Tree Boosting System"
 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

and for the original algorithm:

 Friedman JH.
 "Greedy Function Approximation: a Gradient Boosting Machine"
 Annals of statistics, 2001.

N.B.: XGBoost4J wraps the native C implementation of XGBoost, which links to various C libraries, including libgomp and glibc (on Linux). If you're running on Alpine, which does not natively use glibc, you'll need to install glibc into the container. The macOS binary on Maven Central is compiled without OpenMP support, meaning that XGBoost is single threaded on macOS. If necessary, you can recompile the macOS binary with OpenMP support after installing libomp from Homebrew.
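
Since the stock macOS binary is single threaded, asking for more threads there gains nothing. The following is a minimal sketch of choosing a value for the numThreads constructor argument. ThreadHint and suggestedThreads are hypothetical names that are not part of Tribuo or XGBoost4J, and the sketch assumes the unmodified Maven Central binary is in use.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Tribuo or XGBoost4J.
public final class ThreadHint {
    private ThreadHint() {}

    /**
     * Suggests a thread count to pass to the trainer.
     * Assumes the stock Maven Central binary, which the note above
     * describes as single threaded on macOS.
     */
    public static int suggestedThreads(String osName, int availableProcessors) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            return 1;
        }
        // Never return less than one thread.
        return Math.max(1, availableProcessors);
    }

    public static void main(String[] args) {
        int n = suggestedThreads(System.getProperty("os.name"),
                                 Runtime.getRuntime().availableProcessors());
        System.out.println("numThreads = " + n);
    }
}
```

If the macOS binary has been rebuilt with OpenMP support, the macOS branch no longer applies and can be dropped.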

Nested Class Summary

Nested classes/interfaces inherited from class org.tribuo.common.xgboost.XGBoostTrainer
XGBoostTrainer.BoosterType, XGBoostTrainer.DMatrixTuple<T extends Output<T>>, XGBoostTrainer.LoggingVerbosity, XGBoostTrainer.TreeMethod, XGBoostTrainer.XGBoostTrainerProvenance
Field Summary

Fields inherited from class org.tribuo.common.xgboost.XGBoostTrainer
numTrees, overrideParameters, parameters, trainInvocationCounter

Fields inherited from interface org.tribuo.Trainer
DEFAULT_SEED, INCREMENT_INVOCATION_COUNT
Constructor Summary

Constructors

Modifier

Constructor

Description

protected

XGBoostClassificationTrainer()

For OLCUT.

XGBoostClassificationTrainer(int numTrees)

Create an XGBoost trainer.

XGBoostClassificationTrainer(int numTrees, double eta, double gamma, int maxDepth, double minChildWeight, double subsample, double featureSubsample, double lambda, double alpha, int nThread, boolean silent, long seed)

Create an XGBoost trainer.

XGBoostClassificationTrainer(int numTrees, int numThreads, boolean silent)

Create an XGBoost trainer.

XGBoostClassificationTrainer(int numTrees, Map<String,Object> parameters)

This gives direct access to the XGBoost parameter map.

XGBoostClassificationTrainer(XGBoostTrainer.BoosterType boosterType, XGBoostTrainer.TreeMethod treeMethod, int numTrees, double eta, double gamma, int maxDepth, double minChildWeight, double subsample, double featureSubsample, double lambda, double alpha, int nThread, XGBoostTrainer.LoggingVerbosity verbosity, long seed)

Create an XGBoost trainer.
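
The hyperparameter constructors take values whose ranges are listed under Constructor Details. The sketch below checks those documented ranges before a trainer is built. HyperparameterCheck is a hypothetical helper written for this example; it is not Tribuo's own validation, which may differ in what it accepts or how it reports errors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Tribuo's validation code.
public final class HyperparameterCheck {
    private HyperparameterCheck() {}

    /** Returns one message per value outside its documented range. */
    public static List<String> check(double eta, double gamma, int maxDepth,
                                     double minChildWeight, double subsample,
                                     double featureSubsample) {
        List<String> problems = new ArrayList<>();
        // The negated form also rejects NaN.
        if (!(eta >= 0.0 && eta <= 1.0)) {
            problems.add("eta must be in [0,1]");
        }
        if (!(gamma >= 0.0)) {
            problems.add("gamma must be in [0,inf]");
        }
        if (maxDepth < 1) {
            problems.add("maxDepth must be in [1,inf]");
        }
        if (!(minChildWeight >= 0.0)) {
            problems.add("minChildWeight must be in [0,inf]");
        }
        if (!(subsample > 0.0 && subsample <= 1.0)) {
            problems.add("subsample must be in (0,1]");
        }
        if (!(featureSubsample > 0.0 && featureSubsample <= 1.0)) {
            problems.add("featureSubsample must be in (0,1]");
        }
        return problems;
    }

    public static void main(String[] args) {
        // The documented defaults pass every check.
        System.out.println(check(0.3, 0.0, 6, 1.0, 1.0, 1.0));
    }
}
```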
Method Summary

Modifier and Type

Method

Description

TrainerProvenance

getProvenance()

void

postConfig()

Used by the OLCUT configuration system, and should not be called by external code.

XGBoostModel<Label>

train(Dataset<Label> examples)

Trains a predictive model using the examples in the given data set.

XGBoostModel<Label>

train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)

Trains a predictive model using the examples in the given data set.

XGBoostModel<Label>

train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)

Trains a predictive model using the examples in the given data set.

Methods inherited from class org.tribuo.common.xgboost.XGBoostTrainer
convertDataset, convertDataset, convertExample, convertExample, convertExamples, convertExamples, convertSingleExample, convertSparseVector, convertSparseVectors, copyParams, createModel, getInvocationCount, setInvocationCount, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Details
- XGBoostClassificationTrainer
  
  public XGBoostClassificationTrainer(int numTrees)
  
  Create an XGBoost trainer.
  
  Parameters:
  
  numTrees - Number of trees to boost.
- XGBoostClassificationTrainer
  
  public XGBoostClassificationTrainer(int numTrees, int numThreads, boolean silent)
  
  Create an XGBoost trainer.
  
  Parameters:
  
  numTrees - Number of trees to boost.
  
  numThreads - Number of threads to use.
  
  silent - Silence the training output text.
- XGBoostClassificationTrainer
  
  public XGBoostClassificationTrainer(int numTrees, double eta, double gamma, int maxDepth, double minChildWeight, double subsample, double featureSubsample, double lambda, double alpha, int nThread, boolean silent, long seed)
  
  Create an XGBoost trainer.
  
  Parameters:
  
  numTrees - Number of trees to boost.
  
  eta - Step size shrinkage parameter (default 0.3, range [0,1]).
  
  gamma - Minimum loss reduction to make a split (default 0, range [0,inf]).
  
  maxDepth - Maximum tree depth (default 6, range [1,inf]).
  
  minChildWeight - Minimum sum of instance weights needed in a leaf (default 1, range [0, inf]).
  
  subsample - Subsample size for each tree (default 1, range (0,1]).
  
  featureSubsample - Subsample features for each tree (default 1, range (0,1]).
  
  lambda - L2 regularization term on weights (default 1).
  
  alpha - L1 regularization term on weights (default 0).
  
  nThread - Number of threads to use (default 4).
  
  silent - Silence the training output text.
  
  seed - RNG seed.
- XGBoostClassificationTrainer
  
  public XGBoostClassificationTrainer(XGBoostTrainer.BoosterType boosterType, XGBoostTrainer.TreeMethod treeMethod, int numTrees, double eta, double gamma, int maxDepth, double minChildWeight, double subsample, double featureSubsample, double lambda, double alpha, int nThread, XGBoostTrainer.LoggingVerbosity verbosity, long seed)
  
  Create an XGBoost trainer.
  
  Parameters:
  
  boosterType - The base learning algorithm.
  
  treeMethod - The tree building algorithm if using a tree booster.
  
  numTrees - Number of trees to boost.
  
  eta - Step size shrinkage parameter (default 0.3, range [0,1]).
  
  gamma - Minimum loss reduction to make a split (default 0, range [0,inf]).
  
  maxDepth - Maximum tree depth (default 6, range [1,inf]).
  
  minChildWeight - Minimum sum of instance weights needed in a leaf (default 1, range [0, inf]).
  
  subsample - Subsample size for each tree (default 1, range (0,1]).
  
  featureSubsample - Subsample features for each tree (default 1, range (0,1]).
  
  lambda - L2 regularization term on weights (default 1).
  
  alpha - L1 regularization term on weights (default 0).
  
  nThread - Number of threads to use (default 4).
  
  verbosity - Set the logging verbosity of the native library.
  
  seed - RNG seed.
- XGBoostClassificationTrainer
  
  public XGBoostClassificationTrainer(int numTrees, Map<String,Object> parameters)
  
  This gives direct access to the XGBoost parameter map.
  It lets you set options that are not otherwise exposed, such as dropout trees or binary classification.
  This sidesteps the validation that Tribuo provides for the hyperparameters, and so can produce unexpected results.
  
  Parameters:
  
  numTrees - Number of trees to boost.
  
  parameters - A map from string to object, where object can be Number or String.
- XGBoostClassificationTrainer
  
  protected XGBoostClassificationTrainer()
  
  For OLCUT.
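
As a rough illustration of the parameter map constructor, the sketch below builds a map that selects dropout trees. The keys "booster", "rate_drop", "eta" and "max_depth" are XGBoost's native parameter names as commonly documented; they do not appear on this page, so treat them and the values as assumptions to check against the XGBoost version in use. OverrideParams is a hypothetical class name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Tribuo.
public final class OverrideParams {
    private OverrideParams() {}

    /** Builds a parameter map that selects the DART (dropout trees) booster. */
    public static Map<String, Object> dartParameters() {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("booster", "dart");  // assumed XGBoost name for dropout trees
        params.put("rate_drop", 0.1);   // illustrative dropout rate
        params.put("eta", 0.3);
        params.put("max_depth", 6);
        return params;
    }

    /** Checks the documented constraint that each value is a Number or a String. */
    public static boolean valuesSupported(Map<String, Object> params) {
        for (Object value : params.values()) {
            if (!(value instanceof Number) && !(value instanceof String)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> params = dartParameters();
        System.out.println(valuesSupported(params) + " " + params);
        // With Tribuo on the classpath, the map would be passed as:
        // new XGBoostClassificationTrainer(50, params);
    }
}
```

Because this constructor bypasses Tribuo's hyperparameter validation, a check like valuesSupported only guards the value types, not whether XGBoost accepts the keys or values.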
Method Details
- postConfig
  
  public void postConfig()
  
  Used by the OLCUT configuration system, and should not be called by external code.
  
  Specified by:
  
  postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
  
  Overrides:
  
  postConfig in class XGBoostTrainer<Label>
- train
  
  public XGBoostModel<Label> train(Dataset<Label> examples)
  
  Description copied from interface: Trainer
  
  Trains a predictive model using the examples in the given data set.
  
  Parameters:
  
  examples - the data set containing the examples.
  
  Returns:
  
  a predictive model that can be used to generate predictions for new examples.
- train
  
  public XGBoostModel<Label> train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
  
  Description copied from interface: Trainer
  
  Trains a predictive model using the examples in the given data set.
  
  Parameters:
  
  examples - the data set containing the examples.
  
  runProvenance - Training run specific provenance (e.g., fold number).
  
  Returns:
  
  a predictive model that can be used to generate predictions for new examples.
- train
  
  public XGBoostModel<Label> train(Dataset<Label> examples, Map<String, com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance, int invocationCount)
  
  Description copied from interface: Trainer
  
  Trains a predictive model using the examples in the given data set.
  
  Parameters:
  
  examples - the data set containing the examples.
  
  runProvenance - Training run specific provenance (e.g., fold number).
  
  invocationCount - The invocation counter that the trainer should be set to before training, which in most cases alters the state of the RNG inside this trainer. If the value is set to Trainer.INCREMENT_INVOCATION_COUNT then the invocation count is not changed.
  
  Returns:
  
  a predictive model that can be used to generate predictions for new examples.
- getProvenance
  
  public TrainerProvenance getProvenance()
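
The invocationCount argument of the three-argument train method can be pictured with a toy model. The sketch below is not Tribuo code: InvocationCounterSketch is a hypothetical stand-in, the sentinel value -1 is an assumption (this page does not state the value of Trainer.INCREMENT_INVOCATION_COUNT), and it also assumes the counter advances by one after each training run.

```java
// Toy model for illustration; not part of Tribuo.
public final class InvocationCounterSketch {
    // Assumption: stands in for Trainer.INCREMENT_INVOCATION_COUNT.
    public static final int INCREMENT_INVOCATION_COUNT = -1;

    private int trainInvocationCounter = 0;

    /** Simulates one training run and returns the invocation count it used. */
    public int train(int invocationCount) {
        // The sentinel leaves the counter alone; any other value resets it first.
        if (invocationCount != INCREMENT_INVOCATION_COUNT) {
            trainInvocationCounter = invocationCount;
        }
        int used = trainInvocationCounter;
        // Assumption: the counter advances after every run.
        trainInvocationCounter++;
        return used;
    }

    public int getInvocationCount() {
        return trainInvocationCounter;
    }

    public static void main(String[] args) {
        InvocationCounterSketch sketch = new InvocationCounterSketch();
        System.out.println(sketch.train(INCREMENT_INVOCATION_COUNT));
        System.out.println(sketch.train(INCREMENT_INVOCATION_COUNT));
        System.out.println(sketch.train(0));
    }
}
```

Under these assumptions, passing an explicit count replays an earlier run's state, while the sentinel continues from wherever the trainer currently is.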
