Class XGBoostModel<T extends Output<T>>
- All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<ModelProvenance>, Serializable
A Model which wraps around an XGBoost.Booster.
XGBoost is a fast implementation of gradient boosted decision trees.
Throws IllegalStateException if the XGBoost C++ library fails to load or throws an exception.
See:
Chen T, Guestrin C. "XGBoost: A Scalable Tree Boosting System" Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
and for the original algorithm:
Friedman JH. "Greedy Function Approximation: a Gradient Boosting Machine" Annals of statistics, 2001.
N.B.: XGBoost4J wraps the native C implementation of xgboost, which links to various C libraries, including libgomp and glibc (on Linux). If you're running on Alpine, which does not natively use glibc, you'll need to install glibc into the container. The macOS binary on Maven Central is compiled without OpenMP support, meaning that XGBoost is single-threaded on macOS. You can recompile the macOS binary with OpenMP support after installing libomp from Homebrew if necessary.
-
Field Summary
protected List<ml.dmlc.xgboost4j.java.Booster> models: The XGBoost4J Boosters.
Fields inherited from class org.tribuo.Model
ALL_OUTPUTS, BIAS_FEATURE, featureIDMap, generatesProbabilities, name, outputIDInfo, provenance, provenanceOutput
-
Method Summary
copy(String newName, ModelProvenance newProvenance): Copies a model, replacing its provenance and name with the supplied values.
getExcuse: Generates an excuse for an example.
getFeatureImportance: Creates objects to report feature importance metrics for XGBoost.
List<ml.dmlc.xgboost4j.java.Booster> getInnerModels: Returns an unmodifiable list containing a copy of each model.
getModelDump: Returns the string model dumps from each Booster.
getTopFeatures(int n): Gets the top n features associated with this model.
List<Prediction<T>> predict: Uses the model to predict the label for multiple examples.
List<Prediction<T>> predict: Uses the model to predict the labels for multiple examples contained in a data set.
predict: Uses the model to predict the output for a single example.
void setNumThreads(int threads): Sets the number of threads to use at prediction time.
Methods inherited from class org.tribuo.Model
castModel, copy, generatesProbabilities, getExcuses, getFeatureIDMap, getName, getOutputIDInfo, getProvenance, innerPredict, setName, toString, validate
-
Field Details
-
models
The XGBoost4J Boosters.
-
-
Method Details
-
getInnerModels
Returns an unmodifiable list containing a copy of each model.
As XGBoost4J models don't expose a copy constructor, this requires serializing each model to a byte array and rebuilding it, and is thus quite expensive.
- Returns:
- A copy of all of the models.
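The copy-by-serialization approach described above can be sketched with standard Java serialization. This is only an illustration of the technique, not Tribuo's implementation: `FakeBooster` is a hypothetical stand-in for a model type that has no copy constructor, and `copyViaBytes` and `copyAll` are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CopyBySerializationSketch {

    /** Hypothetical stand-in for a model type without a copy constructor. */
    static final class FakeBooster implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;

        FakeBooster(double[] weights) {
            this.weights = weights;
        }
    }

    /** Deep-copies one object by writing it to a byte array and reading it back. */
    @SuppressWarnings("unchecked")
    static <S extends Serializable> S copyViaBytes(S original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (S) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Failed to copy model", e);
        }
    }

    /** Copies every model and wraps the result so callers cannot modify the list. */
    static <S extends Serializable> List<S> copyAll(List<S> models) {
        List<S> copies = new ArrayList<>();
        for (S model : models) {
            copies.add(copyViaBytes(model));
        }
        return Collections.unmodifiableList(copies);
    }

    public static void main(String[] args) {
        FakeBooster original = new FakeBooster(new double[] {1.0, 2.0});
        List<FakeBooster> copies = copyAll(Arrays.asList(original));
        // The copy is a distinct object, so mutating it leaves the original alone.
        copies.get(0).weights[0] = 9.0;
        System.out.println(original.weights[0]);
    }
}
```

Every call pays for a full serialize and deserialize round trip per model, which is why callers that need the inner models repeatedly may prefer to call the method once and reuse the result.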
-
setNumThreads
public void setNumThreads(int threads)
Sets the number of threads to use at prediction time.
If set to 0, nthreads is set to the number of hardware threads.
- Parameters:
threads
- The new number of threads.
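The "0 means all hardware threads" convention can be sketched in plain Java as follows. This is an assumption-laden illustration rather than the library's code: `resolveThreads` is a hypothetical helper, `Runtime.availableProcessors()` is used here as an approximation of the hardware thread count, and rejecting negative values is a choice made for this sketch, since the text above does not say how they are treated.

```java
final class ThreadCountSketch {

    /**
     * Hypothetical helper: maps a requested thread count to an effective one.
     * A request of 0 is resolved to the number of processors the JVM reports.
     */
    static int resolveThreads(int requested) {
        if (requested < 0) {
            // Assumption of this sketch only; the documented behaviour for
            // negative values is not stated.
            throw new IllegalArgumentException("threads must be >= 0, got " + requested);
        }
        return requested == 0 ? Runtime.getRuntime().availableProcessors() : requested;
    }

    public static void main(String[] args) {
        System.out.println("Requested 0, using " + resolveThreads(0) + " threads");
        System.out.println("Requested 4, using " + resolveThreads(4) + " threads");
    }
}
```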
-
predict
Uses the model to predict the labels for multiple examples contained in a data set.
-
predict
Uses the model to predict the label for multiple examples.
-
predict
Description copied from class:Model
Uses the model to predict the output for a single example.
predict does not mutate the example.
Throws IllegalArgumentException if the example has no features or no feature overlap with the model.
-
getFeatureImportance
Creates objects to report feature importance metrics for XGBoost. See the documentation of XGBoostFeatureImportance for more information on what those metrics mean. Typically this list will contain a single instance for the entire model. For multidimensional regression the list will have one entry per dimension, in dimension order.
- Returns:
- The feature importance object(s).
-
getTopFeatures
Description copied from class:Model
Gets the top n features associated with this model.
If the model does not produce per-output feature lists, it returns a map with a single element with key Model.ALL_OUTPUTS.
If the model cannot describe its top features then it returns Collections.emptyMap().
- Specified by:
getTopFeatures in class Model<T extends Output<T>>
- Parameters:
n
- the number of features to return. If this value is less than 0, all features should be returned for each class, unless the model cannot score its features.
- Returns:
- a map from string outputs to an ordered list of pairs of feature names and weights associated with that feature in the model
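The shape of the returned map can be illustrated with a small standalone sketch. Everything here is hypothetical: `topFeatures` and the local `ALL_OUTPUTS` constant are stand-ins for the real method and for Model.ALL_OUTPUTS, `Map.Entry` replaces the library's pair type, and ranking by absolute weight is an assumption of the sketch rather than a statement about how XGBoostModel scores features.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopFeaturesSketch {

    /** Hypothetical stand-in for the Model.ALL_OUTPUTS key. */
    static final String ALL_OUTPUTS = "ALL_OUTPUTS";

    /**
     * Returns a single-entry map from ALL_OUTPUTS to the n highest-ranked
     * features. A negative n returns every feature. An empty score map models
     * a model that cannot describe its features and yields an empty map.
     */
    static Map<String, List<Map.Entry<String, Double>>> topFeatures(
            Map<String, Double> scores, int n) {
        if (scores.isEmpty()) {
            return Collections.emptyMap();
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            ranked.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        // Assumption: rank by absolute weight, largest first.
        ranked.sort((a, b) -> Double.compare(Math.abs(b.getValue()), Math.abs(a.getValue())));
        int limit = (n < 0) ? ranked.size() : Math.min(n, ranked.size());
        List<Map.Entry<String, Double>> top = new ArrayList<>(ranked.subList(0, limit));
        return Collections.singletonMap(ALL_OUTPUTS, top);
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<>();
        scores.put("a", 0.1);
        scores.put("b", -0.9);
        scores.put("c", 0.5);
        System.out.println(topFeatures(scores, 2));
    }
}
```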
-
getModelDump
Returns the string model dumps from each Booster.
- Returns:
- The model dumps.
-
getExcuse
Description copied from class:Model
Generates an excuse for an example.
This attempts to explain a classification result. Generating an excuse may be quite an expensive operation.
This excuse either contains per class information or an entry with key Model.ALL_OUTPUTS.
The optional is empty if the model does not provide excuses.
-
copy
Description copied from class:Model
Copies a model, replacing its provenance and name with the supplied values.
Used to provide the provenance removal functionality.
-