Class XGBoostFeatureImportance

java.lang.Object
org.tribuo.common.xgboost.XGBoostFeatureImportance

public class XGBoostFeatureImportance extends Object
Generate and collate feature importance information from the XGBoost model. This wraps the underlying functionality of the XGBoost model, and should provide feature importance metrics compatible with those provided by XGBoost's R and Python APIs. For a more detailed treatment of what the different importance metrics mean and how to interpret them, see here. In brief:
  • Gain measures the improvement in accuracy that a feature brings to the branches on which it appears. This represents the sum of situated marginal contributions that a given feature makes to each branching chain in which it appears.
  • Cover measures the number of examples a given feature discriminates across, relative to the total number of examples all features discriminate across.
  • Weight measures the number of times a feature occurs in the model. Due to the way the model builds trees, this value is skewed in favor of continuous features.
  • Total Gain is similar to gain, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
  • Total Cover is similar to cover, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
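The five metrics can be made concrete with a minimal sketch that follows the definitions XGBoost's Python API documents for Booster.get_score: weight counts the splits that use a feature, total gain and total cover sum the gain and cover of those splits, and gain and cover are those sums divided by the weight. The names Split, FeatureStats and SplitImportanceSketch are hypothetical and are not part of Tribuo, and this class may additionally rescale or normalize the values it returns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of one split node in a tree: the feature it tests,
// the loss reduction (gain) the split achieved, and the node's cover
// (the sum of hessians of the training rows reaching it, which is the
// row count under squared-error loss).
record Split(String feature, double gain, double cover) {}

// Hypothetical per-feature accumulator over every split in every tree.
final class FeatureStats {
    int weight;          // number of splits that use the feature
    double totalGain;    // sum of the gains of those splits
    double totalCover;   // sum of the covers of those splits

    double gain()  { return totalGain / weight; }   // average gain per split
    double cover() { return totalCover / weight; }  // average cover per split
}

final class SplitImportanceSketch {
    // Groups splits by feature and accumulates weight, total gain and total cover.
    // Features that are never split on do not appear, so weight is always >= 1.
    static Map<String, FeatureStats> aggregate(List<Split> splits) {
        Map<String, FeatureStats> stats = new HashMap<>();
        for (Split s : splits) {
            FeatureStats fs = stats.computeIfAbsent(s.feature(), k -> new FeatureStats());
            fs.weight++;
            fs.totalGain += s.gain();
            fs.totalCover += s.cover();
        }
        return stats;
    }
}
```

Feeding aggregate the split records of every tree yields one FeatureStats per feature, from which any of the five metrics can be read off and ranked.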
Method Details

    • getGain

      public LinkedHashMap<String,Double> getGain()
      Gain measures the improvement in accuracy that a feature brings to the branches on which it appears. This represents the sum of situated marginal contributions that a given feature makes to each branching chain in which it appears.
      Returns:
      Ordered map where the keys are feature names and the value is the gain, sorted descending
    • getGain

      public LinkedHashMap<String,Double> getGain(int numFeatures)
      Gain measures the improvement in accuracy that a feature brings to the branches on which it appears. This represents the sum of situated marginal contributions that a given feature makes to each branching chain in which it appears. Returns only the top numFeatures features.
      Parameters:
      numFeatures - number of features to return
      Returns:
      Ordered map where the keys are feature names and the value is the gain, sorted descending
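Both getGain overloads return a LinkedHashMap whose iteration order carries the ranking. The sketch below shows that ordering and the top-numFeatures cut, assuming the raw per-feature scores are already available in an ordinary Map; RankSketch and topDescending are hypothetical names, not Tribuo API.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

final class RankSketch {
    // Copies the scores into a LinkedHashMap ordered from highest to lowest
    // value, keeping at most numFeatures entries (numFeatures must be >= 0,
    // otherwise Stream.limit throws IllegalArgumentException).
    static LinkedHashMap<String, Double> topDescending(Map<String, Double> scores, int numFeatures) {
        LinkedHashMap<String, Double> ranked = new LinkedHashMap<>();
        scores.entrySet().stream()
              .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
              .limit(numFeatures)
              .forEach(e -> ranked.put(e.getKey(), e.getValue()));
        return ranked;
    }
}
```

Inserting into a LinkedHashMap is what preserves the sorted order; a plain HashMap would discard it. Ties between equal scores come out in no particular order in this sketch.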
    • getCover

      public LinkedHashMap<String,Double> getCover()
      Cover measures the number of examples a given feature discriminates across, relative to the total number of examples all features discriminate across.
      Returns:
      Ordered map where the keys are feature names and the value is the cover, sorted descending
    • getCover

      public LinkedHashMap<String,Double> getCover(int numFeatures)
      Cover measures the number of examples a given feature discriminates across, relative to the total number of examples all features discriminate across. Returns only the top numFeatures features.
      Parameters:
      numFeatures - number of features to return
      Returns:
      Ordered map where the keys are feature names and the value is the cover, sorted descending
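The description of cover as relative to the total suggests a share of the whole. As an illustration only, the sketch below converts raw per-feature cover values into fractions that sum to one; whether getCover applies exactly this normalization is an assumption, and ShareSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ShareSketch {
    // Divides each feature's raw cover by the sum over all features, so the
    // returned values sum to 1 (up to rounding) unless every input is zero.
    // Entries follow the input map's iteration order.
    static LinkedHashMap<String, Double> toShares(Map<String, Double> rawCover) {
        double total = rawCover.values().stream().mapToDouble(Double::doubleValue).sum();
        LinkedHashMap<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : rawCover.entrySet()) {
            // Guard against an empty or all-zero input to avoid dividing by zero.
            shares.put(e.getKey(), total == 0.0 ? 0.0 : e.getValue() / total);
        }
        return shares;
    }
}
```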
    • getWeight

      public LinkedHashMap<String,Double> getWeight()
      Weight measures the number of times a feature occurs in the model. Due to the way the model builds trees, this value is skewed in favor of continuous features.
      Returns:
      Ordered map where the keys are feature names and the value is the weight, sorted descending
    • getWeight

      public LinkedHashMap<String,Double> getWeight(int numFeatures)
      Weight measures the number of times a feature occurs in the model. Due to the way the model builds trees, this value is skewed in favor of continuous features. Returns only the top numFeatures features.
      Parameters:
      numFeatures - number of features to return
      Returns:
      Ordered map where the keys are feature names and the value is the weight, sorted descending
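The skew toward continuous features follows from how many distinct thresholds a tree can split on. A binary feature offers a single threshold, while a continuous feature offers one between every pair of adjacent distinct values, so the learner has many more opportunities to reuse it. The hypothetical helper below counts those candidates for one column, assuming an exact (non-histogram) learner; with XGBoost's histogram-based tree methods the count is further capped by the number of bins (the max_bin parameter).

```java
import java.util.Arrays;

final class CandidateSplitSketch {
    // Counts the thresholds an exact tree learner could choose for one feature:
    // one between each pair of adjacent distinct values in the column.
    static int candidateThresholds(double[] column) {
        long distinct = Arrays.stream(column).distinct().count();
        return (int) Math.max(0L, distinct - 1);
    }
}
```

Once a node splits on a binary feature, each child holds a single value of it, so along one root-to-leaf path it can be used at most once, whereas a continuous feature can be split again at a finer threshold further down.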
    • getTotalGain

      public LinkedHashMap<String,Double> getTotalGain()
      Total Gain is similar to gain, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
      Returns:
      Ordered map where the keys are feature names and the value is the total gain, sorted descending
    • getTotalGain

      public LinkedHashMap<String,Double> getTotalGain(int numFeatures)
      Total Gain is similar to gain, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed. Returns only the top numFeatures features.
      Parameters:
      numFeatures - number of features to return
      Returns:
      Ordered map where the keys are feature names and the value is the total gain, sorted descending
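A small worked example shows why the gain and total gain orderings can disagree. It assumes the XGBoost definition in which gain is total gain divided by weight, and uses illustrative numbers only: feature A is used in one split with gain 10, and feature B in five splits with gain 3 each.

```latex
\begin{aligned}
\text{gain}_f &= \frac{\text{totalGain}_f}{\text{weight}_f} \\
\text{A:}\quad \text{weight}_A = 1,\ \text{totalGain}_A = 10 &\;\Rightarrow\; \text{gain}_A = \tfrac{10}{1} = 10 \\
\text{B:}\quad \text{weight}_B = 5,\ \text{totalGain}_B = 5 \times 3 = 15 &\;\Rightarrow\; \text{gain}_B = \tfrac{15}{5} = 3
\end{aligned}
```

Ranked by getGain, A comes first; ranked by getTotalGain, B does.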
    • getTotalCover

      public LinkedHashMap<String,Double> getTotalCover()
      Total Cover is similar to cover, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
      Returns:
      Ordered map where the keys are feature names and the value is the total cover, sorted descending
    • getTotalCover

      public LinkedHashMap<String,Double> getTotalCover(int numFeatures)
      Total Cover is similar to cover, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed. Returns only the top numFeatures features.
      Parameters:
      numFeatures - number of features to return
      Returns:
      Ordered map where the keys are feature names and the value is the total cover, sorted descending
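In XGBoost, a node's cover is the sum of the second-order gradients (hessians) of the training rows that reach it. Under squared-error loss every hessian is 1, so cover is simply the row count, which matches the description of cover as a number of examples; total cover for a feature is the sum of the covers of the nodes that split on it. The sketch below, under the hypothetical name NodeCoverSketch, shows the squared-error case and the logistic-loss case, where each row contributes p(1 - p) for its current predicted probability p.

```java
final class NodeCoverSketch {
    // Squared-error loss: every row's hessian is 1, so cover is the row count.
    static double squaredErrorCover(int rowsReachingNode) {
        return rowsReachingNode;
    }

    // Logistic loss: each row's hessian is p * (1 - p), where p is the
    // model's current predicted probability for that row.
    static double logisticCover(double[] predictedProbs) {
        double sum = 0.0;
        for (double p : predictedProbs) {
            sum += p * (1.0 - p);
        }
        return sum;
    }

    // Total cover of a feature: the sum of the covers of every node that splits on it.
    static double totalCover(double... nodeCovers) {
        double sum = 0.0;
        for (double c : nodeCovers) {
            sum += c;
        }
        return sum;
    }
}
```

Because p(1 - p) peaks at p = 0.5 and shrinks toward 0 as predictions grow confident, cover under logistic loss counts confidently classified rows for less than uncertain ones.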
    • getImportances

      public List<XGBoostFeatureImportance.XGBoostFeatureImportanceInstance> getImportances()
      Gets the feature importances for all the features.
      Returns:
      records of all importance metrics for each feature, sorted by gain.
    • getImportances

      public List<XGBoostFeatureImportance.XGBoostFeatureImportanceInstance> getImportances(int numFeatures)
      Gets the feature importances for the top numFeatures features, sorted by gain.
      Parameters:
      numFeatures - number of features to return
      Returns:
      records of all importance metrics for each of the top numFeatures features, sorted by gain.
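Each returned record bundles all five metrics for one feature. The sketch below models such a record with a hypothetical ImportanceRow type and shows the sort-by-gain and top-numFeatures cut; the real XGBoostFeatureImportanceInstance may name and expose its values differently.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one per-feature record holding every metric.
record ImportanceRow(String featureName, double gain, double cover,
                     double weight, double totalGain, double totalCover) {}

final class ImportanceTableSketch {
    // Sorts rows by gain, highest first, and keeps at most numFeatures of them.
    // Returns an unmodifiable list (Stream.toList, Java 16+).
    static List<ImportanceRow> topByGain(List<ImportanceRow> rows, int numFeatures) {
        return rows.stream()
                   .sorted(Comparator.comparingDouble(ImportanceRow::gain).reversed())
                   .limit(numFeatures)
                   .toList();
    }
}
```

Keeping all metrics in one record lets a caller sort by gain but still inspect weight or cover for the same feature without a second lookup.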
    • toString

      public String toString()
      Overrides:
      toString in class Object