public final class XGBoostRegressionTrainer extends XGBoostTrainer<Regressor>
A Trainer which wraps the XGBoost training procedure.
This only exposes a few of XGBoost's training parameters.
It uses pthreads outside of the JVM to parallelise the computation.
Each output dimension is trained independently (and so contains a separate XGBoost ensemble).
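As a rough illustration of what independent training per output dimension means, the sketch below fits one model per target column. It is not Tribuo's implementation: `PerDimensionSketch`, `MeanModel` and `trainPerDimension` are hypothetical names, and a trivial mean predictor stands in for each XGBoost ensemble.

```java
import java.util.Arrays;

public class PerDimensionSketch {

    /** Hypothetical stand-in for one single-output ensemble: predicts the training mean. */
    public static final class MeanModel {
        private final double mean;

        public MeanModel(double[] targets) {
            this.mean = Arrays.stream(targets).average().orElse(0.0);
        }

        public double predict() {
            return mean;
        }
    }

    /** Fits one independent model per output dimension (one per column of targets). */
    public static MeanModel[] trainPerDimension(double[][] targets) {
        int numDims = targets[0].length;
        MeanModel[] models = new MeanModel[numDims];
        for (int d = 0; d < numDims; d++) {
            // Extract the d-th target column; no information is shared between dimensions.
            double[] column = new double[targets.length];
            for (int i = 0; i < targets.length; i++) {
                column[i] = targets[i][d];
            }
            models[d] = new MeanModel(column);
        }
        return models;
    }

    public static void main(String[] args) {
        // Two examples, each with two regression targets.
        double[][] targets = { {1.0, 10.0}, {3.0, 30.0} };
        MeanModel[] models = trainPerDimension(targets);
        System.out.println(models.length + " models: " + models[0].predict() + ", " + models[1].predict());
    }
}
```

One consequence of this design is that the stored model grows with the number of output dimensions, since each dimension carries its own ensemble.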
See:
Chen T, Guestrin C. "XGBoost: A Scalable Tree Boosting System". Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
and for the original algorithm:
Friedman JH. "Greedy Function Approximation: a Gradient Boosting Machine". Annals of Statistics, 2001.
N.B.: XGBoost4J wraps the native C implementation of xgboost that links to various C libraries, including libgomp and glibc (on Linux). If you're running on Alpine, which does not natively use glibc, you'll need to install glibc into the container. The macOS binary on Maven Central is compiled without OpenMP support, meaning that XGBoost is single threaded on macOS. You can recompile the macOS binary with OpenMP support after installing libomp from Homebrew if necessary.
Modifier and Type | Class and Description |
---|---|
static class | XGBoostRegressionTrainer.RegressionType - Types of regression loss. |
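Nested classes/interfaces inherited from class XGBoostTrainer:
XGBoostTrainer.BoosterType, XGBoostTrainer.DMatrixTuple<T extends Output<T>>, XGBoostTrainer.LoggingVerbosity, XGBoostTrainer.TreeMethod, XGBoostTrainer.XGBoostTrainerProvenance
Fields inherited from class XGBoostTrainer:
numTrees, parameters, trainInvocationCounter
Fields inherited from interface Trainer:
DEFAULT_SEED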
Constructor and Description |
---|
XGBoostRegressionTrainer(int numTrees) - Creates an XGBoostRegressionTrainer using the default parameters, the squared error loss and the supplied number of trees. |
XGBoostRegressionTrainer(XGBoostRegressionTrainer.RegressionType rType, int numTrees) - Creates an XGBoostRegressionTrainer using the default parameters, the supplied loss and the supplied number of trees. |
XGBoostRegressionTrainer(XGBoostRegressionTrainer.RegressionType rType, int numTrees, double eta, double gamma, int maxDepth, double minChildWeight, double subsample, double featureSubsample, double lambda, double alpha, int nThread, boolean silent, long seed) - Create an XGBoost trainer. |
XGBoostRegressionTrainer(XGBoostRegressionTrainer.RegressionType rType, int numTrees, int numThreads, boolean silent) - Creates an XGBoostRegressionTrainer using the default parameters with the supplied loss, number of trees, number of threads, and logging level. |
XGBoostRegressionTrainer(XGBoostRegressionTrainer.RegressionType rType, int numTrees, Map<String,Object> parameters) - This gives direct access to the XGBoost parameter map. |
XGBoostRegressionTrainer(XGBoostTrainer.BoosterType boosterType, XGBoostTrainer.TreeMethod treeMethod, XGBoostRegressionTrainer.RegressionType rType, int numTrees, double eta, double gamma, int maxDepth, double minChildWeight, double subsample, double featureSubsample, double lambda, double alpha, int nThread, XGBoostTrainer.LoggingVerbosity verbosity, long seed) - Create an XGBoost trainer. |
Modifier and Type | Method and Description |
---|---|
TrainerProvenance | getProvenance() |
void | postConfig() - Used by the OLCUT configuration system, and should not be called by external code. |
XGBoostModel<Regressor> | train(Dataset<Regressor> examples, Map<String,com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance) - Trains a predictive model using the examples in the given data set. |
Methods inherited from class XGBoostTrainer:
convertDataset, convertDataset, convertExample, convertExample, convertExamples, convertExamples, convertSingleExample, convertSparseVector, convertSparseVectors, createModel, getInvocationCount, toString
public XGBoostRegressionTrainer(int numTrees)
numTrees - The number of trees.

public XGBoostRegressionTrainer(XGBoostRegressionTrainer.RegressionType rType, int numTrees)
rType - The regression loss function.
numTrees - The number of trees.

public XGBoostRegressionTrainer(XGBoostRegressionTrainer.RegressionType rType, int numTrees, int numThreads, boolean silent)
rType - The regression loss function.
numTrees - The number of trees.
numThreads - The number of threads.
silent - Silence the XGBoost logger.

public XGBoostRegressionTrainer(XGBoostRegressionTrainer.RegressionType rType, int numTrees, double eta, double gamma, int maxDepth, double minChildWeight, double subsample, double featureSubsample, double lambda, double alpha, int nThread, boolean silent, long seed)
rType - The type of regression to build.
numTrees - Number of trees to boost.
eta - Step size shrinkage parameter (default 0.3, range [0,1]).
gamma - Minimum loss reduction to make a split (default 0, range [0,inf]).
maxDepth - Maximum tree depth (default 6, range [1,inf]).
minChildWeight - Minimum sum of instance weights needed in a leaf (default 1, range [0,inf]).
subsample - Subsample size for each tree (default 1, range (0,1]).
featureSubsample - Subsample features for each tree (default 1, range (0,1]).
lambda - L2 regularization term on weights (default 1).
alpha - L1 regularization term on weights (default 0).
nThread - Number of threads to use (default 4).
silent - Silence the training output text.
seed - RNG seed.

public XGBoostRegressionTrainer(XGBoostTrainer.BoosterType boosterType, XGBoostTrainer.TreeMethod treeMethod, XGBoostRegressionTrainer.RegressionType rType, int numTrees, double eta, double gamma, int maxDepth, double minChildWeight, double subsample, double featureSubsample, double lambda, double alpha, int nThread, XGBoostTrainer.LoggingVerbosity verbosity, long seed)
boosterType - The base learning algorithm.
treeMethod - The tree building algorithm if using a tree booster.
rType - The type of regression to build.
numTrees - Number of trees to boost.
eta - Step size shrinkage parameter (default 0.3, range [0,1]).
gamma - Minimum loss reduction to make a split (default 0, range [0,inf]).
maxDepth - Maximum tree depth (default 6, range [1,inf]).
minChildWeight - Minimum sum of instance weights needed in a leaf (default 1, range [0,inf]).
subsample - Subsample size for each tree (default 1, range (0,1]).
featureSubsample - Subsample features for each tree (default 1, range (0,1]).
lambda - L2 regularization term on weights (default 1).
alpha - L1 regularization term on weights (default 0).
nThread - Number of threads to use (default 4).
verbosity - Set the logging verbosity of the native library.
seed - RNG seed.

public XGBoostRegressionTrainer(XGBoostRegressionTrainer.RegressionType rType, int numTrees, Map<String,Object> parameters)
This gives direct access to the XGBoost parameter map. It lets you pick things that we haven't exposed, such as dropout trees or binary classification.
This sidesteps the validation that Tribuo provides for the hyperparameters, and so can produce unexpected results.
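Because this constructor bypasses validation, a caller may want to check values before building the map. The sketch below is one way to do that and is not Tribuo's own validation code: `XGBoostParamMapSketch`, `checkClosedRange` and `buildParameters` are hypothetical names, and the ranges are the ones documented for the other constructors. The map keys (`eta`, `max_depth`, `subsample`, `booster`, `rate_drop`) are XGBoost's native parameter names, with `booster` set to `dart` to select dropout trees.

```java
import java.util.HashMap;
import java.util.Map;

public class XGBoostParamMapSketch {

    /** Hypothetical helper: rejects values outside the closed interval [lo, hi]. */
    public static void checkClosedRange(String name, double value, double lo, double hi) {
        if (Double.isNaN(value) || value < lo || value > hi) {
            throw new IllegalArgumentException(
                name + " must be in [" + lo + "," + hi + "], found " + value);
        }
    }

    /** Builds a parameter map keyed by XGBoost's native parameter names. */
    public static Map<String, Object> buildParameters(double eta, int maxDepth, double subsample) {
        // eta: range [0,1]
        checkClosedRange("eta", eta, 0.0, 1.0);
        // max_depth: at least 1
        if (maxDepth < 1) {
            throw new IllegalArgumentException("max_depth must be at least 1, found " + maxDepth);
        }
        // subsample: range (0,1], so zero is excluded
        if (Double.isNaN(subsample) || subsample <= 0.0 || subsample > 1.0) {
            throw new IllegalArgumentException("subsample must be in (0,1], found " + subsample);
        }

        Map<String, Object> params = new HashMap<>();
        params.put("eta", eta);
        params.put("max_depth", maxDepth);
        params.put("subsample", subsample);
        // Options not exposed by the other constructors: the DART (dropout trees) booster.
        params.put("booster", "dart");
        params.put("rate_drop", 0.1);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> params = buildParameters(0.3, 6, 1.0);
        System.out.println(params.get("booster") + " with max_depth " + params.get("max_depth"));
        // With Tribuo on the classpath, the map would then be passed to this constructor:
        // new XGBoostRegressionTrainer(rType, numTrees, params);
    }
}
```

Values are stored as Number or String, matching the object types the parameter map accepts.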
rType - The type of the regression.
numTrees - Number of trees to boost.
parameters - A map from string to object, where object can be Number or String.

public void postConfig()
Specified by: postConfig in interface com.oracle.labs.mlrg.olcut.config.Configurable
Overrides: postConfig in class XGBoostTrainer<Regressor>

public XGBoostModel<Regressor> train(Dataset<Regressor> examples, Map<String,com.oracle.labs.mlrg.olcut.provenance.Provenance> runProvenance)
Trains a predictive model using the examples in the given data set (description copied from Trainer).
examples - the data set containing the examples.
runProvenance - Training run specific provenance (e.g., fold number).

public TrainerProvenance getProvenance()
Copyright © 2015–2021 Oracle and/or its affiliates. All rights reserved.