Class XGBoostExternalModel<T extends Output<T>>

java.lang.Object
org.tribuo.Model<T>
org.tribuo.interop.ExternalModel<T,ml.dmlc.xgboost4j.java.DMatrix,float[][]>
org.tribuo.common.xgboost.XGBoostExternalModel<T>
All Implemented Interfaces:
com.oracle.labs.mlrg.olcut.provenance.Provenancable<ModelProvenance>, Serializable, ProtoSerializable<org.tribuo.protos.core.ModelProto>

public final class XGBoostExternalModel<T extends Output<T>> extends ExternalModel<T,ml.dmlc.xgboost4j.java.DMatrix,float[][]>
A Model which wraps an XGBoost Booster that was trained by a system other than Tribuo.

XGBoost is a fast implementation of gradient boosted decision trees.

Throws IllegalStateException if the XGBoost C++ library fails to load or throws an exception.

See:

 Chen T, Guestrin C.
 "XGBoost: A Scalable Tree Boosting System"
 Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
 

and for the original algorithm:

 Friedman JH.
 "Greedy Function Approximation: a Gradient Boosting Machine"
 Annals of statistics, 2001.
 

N.B.: XGBoost4J wraps the native C implementation of xgboost that links to various C libraries, including libgomp and glibc (on Linux). If you're running on Alpine, which does not natively use glibc, you'll need to install glibc into the container. The macOS binary on Maven Central is compiled without OpenMP support, meaning that XGBoost is single threaded on macOS. You can recompile the macOS binary with OpenMP support after installing libomp from Homebrew if necessary.

  • Field Details

    • CURRENT_VERSION

      public static final int CURRENT_VERSION
      Protobuf serialization version.
    • model

      protected transient ml.dmlc.xgboost4j.java.Booster model
      Transient as we rely upon the native serialisation mechanism to bytes rather than Java serializing the Booster.
  • Method Details

    • deserializeFromProto

      public static XGBoostExternalModel<?> deserializeFromProto(int version, String className, com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException, ml.dmlc.xgboost4j.java.XGBoostError, IOException
      Deserialization factory.
      Parameters:
      version - The serialized object version.
      className - The class name.
      message - The serialized data.
      Returns:
      The deserialized object.
      Throws:
      com.google.protobuf.InvalidProtocolBufferException - If the protobuf could not be parsed from the message.
      ml.dmlc.xgboost4j.java.XGBoostError - If the XGBoost byte array failed to parse.
      IOException - If the XGBoost byte array failed to parse.
    • convertFeatures

      protected ml.dmlc.xgboost4j.java.DMatrix convertFeatures(SparseVector input)
      Description copied from class: ExternalModel
      Converts from a SparseVector using the external model's indices into the ingestion format for the external model.
      Specified by:
      convertFeatures in class ExternalModel<T extends Output<T>,ml.dmlc.xgboost4j.java.DMatrix,float[][]>
      Parameters:
      input - The features using external indices.
      Returns:
      The ingestion format for the external model.
    • convertFeaturesList

      protected ml.dmlc.xgboost4j.java.DMatrix convertFeaturesList(List<SparseVector> input)
      Description copied from class: ExternalModel
      Converts from a list of SparseVector using the external model's indices into the ingestion format for the external model.
      Specified by:
      convertFeaturesList in class ExternalModel<T extends Output<T>,ml.dmlc.xgboost4j.java.DMatrix,float[][]>
      Parameters:
      input - The features using external indices.
      Returns:
      The ingestion format for the external model.
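
As an illustration of the kind of ingestion format involved, the sketch below packs a batch of sparse rows into compressed sparse row (CSR) arrays, a layout that XGBoost4J's DMatrix can be built from. It is not Tribuo's implementation: the class name CsrSketch, its Csr holder, and the use of a Map<Integer,Float> as a stand-in for SparseVector are all assumptions made so the example runs with the JDK alone.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tribuo or XGBoost4J.
final class CsrSketch {
    /** Holder for the three CSR arrays. */
    static final class Csr {
        final long[] rowHeaders;  // offset of each row's first entry, plus a final total
        final int[] colIndices;   // external (XGBoost) feature index of each entry
        final float[] values;     // feature value of each entry

        Csr(long[] rowHeaders, int[] colIndices, float[] values) {
            this.rowHeaders = rowHeaders;
            this.colIndices = colIndices;
            this.values = values;
        }
    }

    /** Packs rows of (feature index -> value) pairs into CSR arrays. */
    static Csr toCsr(List<Map<Integer, Float>> rows) {
        int nonZeros = 0;
        for (Map<Integer, Float> row : rows) {
            nonZeros += row.size();
        }
        long[] headers = new long[rows.size() + 1];
        int[] indices = new int[nonZeros];
        float[] values = new float[nonZeros];
        int pos = 0;
        int r = 0;
        for (Map<Integer, Float> row : rows) {
            headers[r++] = pos;
            // Sort by feature index so entries within a row are in ascending order.
            for (Map.Entry<Integer, Float> e : new TreeMap<>(row).entrySet()) {
                indices[pos] = e.getKey();
                values[pos] = e.getValue();
                pos++;
            }
        }
        headers[r] = pos; // closing offset equals the number of stored entries
        return new Csr(headers, indices, values);
    }
}
```

A single example is simply the one-row case of the same layout, which is why the single-vector and list conversions produce the same ingestion type.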
    • externalPrediction

      protected float[][] externalPrediction(ml.dmlc.xgboost4j.java.DMatrix input)
      Description copied from class: ExternalModel
      Runs the external model's prediction function.
      Specified by:
      externalPrediction in class ExternalModel<T extends Output<T>,ml.dmlc.xgboost4j.java.DMatrix,float[][]>
      Parameters:
      input - The input in the external model's format.
      Returns:
      The output in the external model's format.
    • convertOutput

      protected Prediction<T> convertOutput(float[][] output, int numValidFeatures, Example<T> example)
      Description copied from class: ExternalModel
      Converts the output of the external model into a Prediction.
      Specified by:
      convertOutput in class ExternalModel<T extends Output<T>,ml.dmlc.xgboost4j.java.DMatrix,float[][]>
      Parameters:
      output - The output of the external model.
      numValidFeatures - The number of valid features in the input.
      example - The input example, used to construct the Prediction.
      Returns:
      A Tribuo Prediction.
    • convertOutput

      protected List<Prediction<T>> convertOutput(float[][] output, int[] numValidFeatures, List<Example<T>> examples)
      Description copied from class: ExternalModel
      Converts the output of the external model into a list of Predictions.
      Specified by:
      convertOutput in class ExternalModel<T extends Output<T>,ml.dmlc.xgboost4j.java.DMatrix,float[][]>
      Parameters:
      output - The output of the external model.
      numValidFeatures - An array with the number of valid features in each example.
      examples - The input examples, used to construct the Predictions.
      Returns:
      A list of Tribuo Predictions.
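
To make the output conversion step concrete, here is a minimal sketch of one way a float[][] of per-class scores could be decoded into label names. It assumes a multiclass model that emits one score per class in each row, and it uses plain strings in place of Tribuo outputs; the class OutputDecodeSketch and its methods are hypothetical, and the real conversion is delegated to the XGBoostOutputConverter supplied when the model is created.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Tribuo.
final class OutputDecodeSketch {
    /** Inverts a label -> XGBoost id mapping so ids can be looked up. */
    static Map<Integer, String> invert(Map<String, Integer> outputMapping) {
        Map<Integer, String> idToLabel = new HashMap<>();
        for (Map.Entry<String, Integer> e : outputMapping.entrySet()) {
            idToLabel.put(e.getValue(), e.getKey());
        }
        return idToLabel;
    }

    /** Returns the highest scoring label for each row of the raw output. */
    static String[] decode(float[][] output, Map<Integer, String> idToLabel) {
        String[] labels = new String[output.length];
        for (int i = 0; i < output.length; i++) {
            int best = 0;
            for (int j = 1; j < output[i].length; j++) {
                if (output[i][j] > output[i][best]) {
                    best = j;
                }
            }
            labels[i] = idToLabel.get(best);
        }
        return labels;
    }
}
```

Other output types need different handling (for example, a regression output has no argmax step), which is why the converter is a parameter rather than fixed behaviour.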
    • getFeatureImportance

      public List<XGBoostFeatureImportance> getFeatureImportance()
      Creates objects to report feature importance metrics for XGBoost. See the documentation of XGBoostFeatureImportance for more information on what those metrics mean. Typically this list will contain a single instance for the entire model. For multidimensional regression the list will have one entry per dimension, in dimension order.
      Returns:
      The feature importance object(s).
    • getTopFeatures

      public Map<String,List<com.oracle.labs.mlrg.olcut.util.Pair<String,Double>>> getTopFeatures(int n)
      Description copied from class: Model
      Gets the top n features associated with this model.

      If the model does not produce per output feature lists, it returns a map with a single element with key Model.ALL_OUTPUTS.

      If the model cannot describe its top features then it returns Collections.emptyMap().

      Specified by:
      getTopFeatures in class Model<T extends Output<T>>
      Parameters:
      n - the number of features to return. If this value is less than 0, all features should be returned for each class, unless the model cannot score its features.
      Returns:
      a map from string outputs to an ordered list of pairs of feature names and weights associated with that feature in the model
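
The contract described above (a single key when there are no per-output lists, all features when n is negative, an empty map when nothing can be reported) can be sketched with JDK types alone. This is an illustration of the contract, not Tribuo's code: TopFeaturesSketch is a hypothetical class, Map.Entry stands in for OLCUT's Pair, the local ALL_OUTPUTS constant stands in for Model.ALL_OUTPUTS, and ordering by descending score is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tribuo.
final class TopFeaturesSketch {
    // Stand-in for Model.ALL_OUTPUTS; the value here is illustrative.
    static final String ALL_OUTPUTS = "ALL_OUTPUTS";

    /** Returns the n highest scoring features under a single key, or all of them if n < 0. */
    static Map<String, List<Map.Entry<String, Double>>> topFeatures(Map<String, Double> scores, int n) {
        if (scores.isEmpty()) {
            // Nothing to describe, so mirror the empty-map case of the contract.
            return Collections.emptyMap();
        }
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(scores.entrySet());
        // Assumption: larger scores rank first.
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        int limit = (n < 0) ? sorted.size() : Math.min(n, sorted.size());
        List<Map.Entry<String, Double>> top = new ArrayList<>(sorted.subList(0, limit));
        return Collections.singletonMap(ALL_OUTPUTS, top);
    }
}
```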
    • serialize

      public org.tribuo.protos.core.ModelProto serialize()
      Description copied from interface: ProtoSerializable
      Serializes this object to a protobuf.
      Specified by:
      serialize in interface ProtoSerializable<T extends Output<T>>
      Overrides:
      serialize in class Model<T extends Output<T>>
      Returns:
      The protobuf.
    • copy

      protected XGBoostExternalModel<T> copy(String newName, ModelProvenance newProvenance)
      Description copied from class: Model
      Copies a model, replacing its provenance and name with the supplied values.

      Used to provide the provenance removal functionality.

      Specified by:
      copy in class Model<T extends Output<T>>
      Parameters:
      newName - The new name.
      newProvenance - The new provenance.
      Returns:
      A copy of the model.
    • createXGBoostModel

      public static <T extends Output<T>> XGBoostExternalModel<T> createXGBoostModel(OutputFactory<T> factory, Map<String,Integer> featureMapping, Map<T,Integer> outputMapping, XGBoostOutputConverter<T> outputFunc, String path)
      Creates an XGBoostExternalModel from the supplied model on disk.
      Type Parameters:
      T - The type of the output.
      Parameters:
      factory - The output factory to use.
      featureMapping - The feature mapping between Tribuo names and XGBoost integer ids.
      outputMapping - The output mapping between Tribuo outputs and XGBoost integer ids.
      outputFunc - The XGBoostOutputConverter function for the output type.
      path - The path to the model on disk.
      Returns:
      An XGBoostExternalModel ready to score new inputs.
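
The featureMapping argument ties each Tribuo feature name to the integer id XGBoost uses for that feature. A minimal sketch of building such a map follows, assuming the external model was trained on a matrix whose column order is known, so that the column position is the XGBoost id. FeatureMappingSketch and the example feature names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tribuo.
final class FeatureMappingSketch {
    /** Maps each feature name to its column position in the training matrix. */
    static Map<String, Integer> fromColumnOrder(List<String> columnNames) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (int i = 0; i < columnNames.size(); i++) {
            // put returns the previous id, which is non-null only for a repeated name.
            if (mapping.put(columnNames.get(i), i) != null) {
                throw new IllegalArgumentException("Duplicate feature name: " + columnNames.get(i));
            }
        }
        return mapping;
    }
}
```

For instance, fromColumnOrder(List.of("age", "height", "weight")) yields a map that could be passed as featureMapping. The outputMapping argument has the same role for outputs, but its keys are Tribuo output objects rather than strings.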
    • createXGBoostModel

      public static <T extends Output<T>> XGBoostExternalModel<T> createXGBoostModel(OutputFactory<T> factory, Map<String,Integer> featureMapping, Map<T,Integer> outputMapping, XGBoostOutputConverter<T> outputFunc, Path path)
      Creates an XGBoostExternalModel from the supplied model on disk.
      Type Parameters:
      T - The type of the output.
      Parameters:
      factory - The output factory to use.
      featureMapping - The feature mapping between Tribuo names and XGBoost integer ids.
      outputMapping - The output mapping between Tribuo outputs and XGBoost integer ids.
      outputFunc - The XGBoostOutputConverter function for the output type.
      path - The path to the model on disk.
      Returns:
      An XGBoostExternalModel ready to score new inputs.
    • createXGBoostModel

      @Deprecated public static <T extends Output<T>> XGBoostExternalModel<T> createXGBoostModel(OutputFactory<T> factory, Map<String,Integer> featureMapping, Map<T,Integer> outputMapping, XGBoostOutputConverter<T> outputFunc, ml.dmlc.xgboost4j.java.Booster model, URL provenanceLocation)
      Deprecated.
      Deprecated because the URL argument must always be valid. To wrap an in-memory booster, use createXGBoostModel(OutputFactory, Map, Map, XGBoostOutputConverter, Booster, Map).
      Creates an XGBoostExternalModel from the supplied model.

      Note: the provenance system requires that the URL point to a valid local file and will throw an exception if it does not. However, it does not check that the file is the one the Booster was created from.

      Type Parameters:
      T - The type of the output.
      Parameters:
      factory - The output factory to use.
      featureMapping - The feature mapping between Tribuo names and XGBoost integer ids.
      outputMapping - The output mapping between Tribuo outputs and XGBoost integer ids.
      outputFunc - The XGBoostOutputConverter function for the output type.
      model - The XGBoost model to wrap.
      provenanceLocation - The location where the model was loaded from.
      Returns:
      An XGBoostExternalModel ready to score new inputs.
    • createXGBoostModel

      public static <T extends Output<T>> XGBoostExternalModel<T> createXGBoostModel(OutputFactory<T> factory, Map<String,Integer> featureMapping, Map<T,Integer> outputMapping, XGBoostOutputConverter<T> outputFunc, ml.dmlc.xgboost4j.java.Booster model, Map<String,com.oracle.labs.mlrg.olcut.provenance.Provenance> instanceProvenance)
      Creates an XGBoostExternalModel from the supplied in-memory XGBoost Booster.
      Type Parameters:
      T - The type of the output.
      Parameters:
      factory - The output factory to use.
      featureMapping - The feature mapping between Tribuo names and XGBoost integer ids.
      outputMapping - The output mapping between Tribuo outputs and XGBoost integer ids.
      outputFunc - The XGBoostOutputConverter function for the output type.
      model - The XGBoost model to wrap.
      instanceProvenance - Provenance for this model.
      Returns:
      An XGBoostExternalModel ready to score new inputs.