Package org.tribuo.common.xgboost
Class XGBoostFeatureImportance
java.lang.Object
org.tribuo.common.xgboost.XGBoostFeatureImportance
Generate and collate feature importance information from the XGBoost model. This wraps the underlying functionality
of the XGBoost model, and should provide feature importance metrics compatible with those provided by XGBoost's R
and Python APIs. For a more detailed treatment of what the different importance metrics mean and how to interpret them, see
here. In brief:
- Gain measures the improvement in accuracy that a feature brings to the branches on which it appears. This represents the sum of situated marginal contributions that a given feature makes to each branching chain in which it appears.
- Cover measures the number of examples a given feature discriminates across, relative to the total number of examples all features discriminate across.
- Weight measures the number of times a feature occurs in the model. Due to the way the model builds trees, this value is skewed in favor of continuous features.
- Total Gain is similar to gain, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
- Total Cover is similar to cover, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
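The relationship between the five metrics can be sketched in plain Java. This is a minimal illustration, not Tribuo's or XGBoost's implementation: the `ImportanceSketch` class, the `Split` type and the `summarize` method are hypothetical names, and the sketch assumes that each split node carries a gain and a cover value and that "averaged by weight" means dividing the per-feature total by the number of splits that use the feature. It does not apply any normalisation across features.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not part of Tribuo or XGBoost. */
final class ImportanceSketch {

    /** Hypothetical record of one split node: the feature used, its gain and its cover. */
    static final class Split {
        final String feature;
        final double gain;
        final double cover;

        Split(String feature, double gain, double cover) {
            this.feature = feature;
            this.gain = gain;
            this.cover = cover;
        }
    }

    /**
     * Returns, per feature, the array {weight, totalGain, totalCover, gain, cover},
     * where gain and cover are the totals divided by weight (an assumption of this sketch).
     */
    static Map<String, double[]> summarize(List<Split> splits) {
        Map<String, double[]> out = new LinkedHashMap<>();
        for (Split s : splits) {
            double[] m = out.computeIfAbsent(s.feature, k -> new double[5]);
            m[0] += 1.0;     // weight: number of splits that use this feature
            m[1] += s.gain;  // total gain: sum of gains over those splits
            m[2] += s.cover; // total cover: sum of covers over those splits
        }
        for (double[] m : out.values()) {
            m[3] = m[1] / m[0]; // gain: total gain averaged by weight
            m[4] = m[2] / m[0]; // cover: total cover averaged by weight
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up split data, purely for demonstration.
        List<Split> splits = Arrays.asList(
            new Split("age", 4.0, 100.0),
            new Split("age", 2.0, 40.0),
            new Split("income", 5.0, 60.0));
        for (Map.Entry<String, double[]> e : summarize(splits).entrySet()) {
            System.out.println(e.getKey() + " " + Arrays.toString(e.getValue()));
        }
    }
}
```

In this made-up data, "age" is used twice and "income" once, so "age" has the larger weight and total gain while "income" has the larger averaged gain. That is the kind of difference the averaged and total metrics are meant to expose.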
-
Nested Class Summary
- static class XGBoostFeatureImportance.XGBoostFeatureImportanceInstance: An instance of feature importance values for a single feature.
Method Summary
- getCover(): Cover measures the number of examples a given feature discriminates across, relative to the total number of examples all features discriminate across.
- getCover(int numFeatures): Cover measures the number of examples a given feature discriminates across, relative to the total number of examples all features discriminate across.
- getGain(): Gain measures the improvement in accuracy that a feature brings to the branches on which it appears.
- getGain(int numFeatures): Gain measures the improvement in accuracy that a feature brings to the branches on which it appears.
- getImportances(): Gets all the feature importances for all the features.
- getImportances(int numFeatures): Gets the feature importances for the top n features sorted by gain.
- getTotalCover(): Total Cover is similar to cover, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
- getTotalCover(int numFeatures): Total Cover is similar to cover, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
- getTotalGain(): Total Gain is similar to gain, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
- getTotalGain(int numFeatures): Total Gain is similar to gain, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
- getWeight(): Weight measures the number of times a feature occurs in the model.
- getWeight(int numFeatures): Weight measures the number of times a feature occurs in the model.
- toString()
-
Method Details
-
getGain
Gain measures the improvement in accuracy that a feature brings to the branches on which it appears. This represents the sum of situated marginal contributions that a given feature makes to each branching chain in which it appears.
Returns:
- Ordered map where the keys are feature names and the value is the gain, sorted descending
-
getGain
Gain measures the improvement in accuracy that a feature brings to the branches on which it appears. This represents the sum of situated marginal contributions that a given feature makes to each branching chain in which it appears. Returns only the top numFeatures features.
Parameters:
- numFeatures: number of features to return
Returns:
- Ordered map where the keys are feature names and the value is the gain, sorted descending
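The return shape described here, an ordered map sorted descending and cut off at numFeatures entries, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the class's actual implementation: `TopFeaturesSketch` and `topN` are hypothetical names, the input scores are made up, and a `LinkedHashMap` is assumed as the ordered map because it preserves insertion order.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: not part of Tribuo. */
final class TopFeaturesSketch {

    /**
     * Returns the numFeatures highest-scoring entries, in descending order of score.
     * numFeatures must be non-negative; ties are returned in no particular order.
     */
    static LinkedHashMap<String, Double> topN(Map<String, Double> scores, int numFeatures) {
        LinkedHashMap<String, Double> out = new LinkedHashMap<>();
        scores.entrySet().stream()
            // Sort by value, largest first.
            .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
            // Keep at most numFeatures entries.
            .limit(numFeatures)
            // Insert in sorted order so the LinkedHashMap iterates descending.
            .forEachOrdered(e -> out.put(e.getKey(), e.getValue()));
        return out;
    }

    public static void main(String[] args) {
        // Made-up scores, purely for demonstration.
        Map<String, Double> scores = new HashMap<>();
        scores.put("age", 0.1);
        scores.put("income", 0.7);
        scores.put("height", 0.3);
        System.out.println(topN(scores, 2));
    }
}
```

Asking for more features than the map contains simply returns every entry, still in descending order.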
-
getCover
Cover measures the number of examples a given feature discriminates across, relative to the total number of examples all features discriminate across.
Returns:
- Ordered map where the keys are feature names and the value is the cover, sorted descending
-
getCover
Cover measures the number of examples a given feature discriminates across, relative to the total number of examples all features discriminate across. Returns only the top numFeatures features.
Parameters:
- numFeatures: number of features to return
Returns:
- Ordered map where the keys are feature names and the value is the cover, sorted descending
-
getWeight
Weight measures the number of times a feature occurs in the model. Due to the way the model builds trees, this value is skewed in favor of continuous features.
Returns:
- Ordered map where the keys are feature names and the value is the weight, sorted descending
-
getWeight
Weight measures the number of times a feature occurs in the model. Due to the way the model builds trees, this value is skewed in favor of continuous features. Returns only the top numFeatures features.
Parameters:
- numFeatures: number of features to return
Returns:
- Ordered map where the keys are feature names and the value is the weight, sorted descending
-
getTotalGain
Total Gain is similar to gain, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
Returns:
- Ordered map where the keys are feature names and the value is the total gain, sorted descending
-
getTotalGain
Total Gain is similar to gain, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed. Returns only the top numFeatures features.
Parameters:
- numFeatures: number of features to return
Returns:
- Ordered map where the keys are feature names and the value is the total gain, sorted descending
-
getTotalCover
Total Cover is similar to cover, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed.
Returns:
- Ordered map where the keys are feature names and the value is the total cover, sorted descending
-
getTotalCover
Total Cover is similar to cover, but not locally averaged by weight, and thus not skewed in the way that weight can be skewed. Returns only the top numFeatures features.
Parameters:
- numFeatures: number of features to return
Returns:
- Ordered map where the keys are feature names and the value is the total cover, sorted descending
-
getImportances
Gets all the feature importances for all the features.
Returns:
- records of all importance metrics for each feature, sorted by gain.
-
getImportances
public List<XGBoostFeatureImportance.XGBoostFeatureImportanceInstance> getImportances(int numFeatures)
Gets the feature importances for the top n features sorted by gain.
Parameters:
- numFeatures: number of features to return
Returns:
- records of all importance metrics for each feature, sorted by gain.
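As a rough picture of what "records of all importance metrics for each feature, sorted by gain" could look like, the sketch below bundles the five metrics into one object per feature and returns the top entries by gain. It is an assumption-laden illustration: `FeatureRecord` is a hypothetical stand-in and not Tribuo's XGBoostFeatureImportanceInstance, `topByGain` is a hypothetical helper, and a descending sort is assumed to match the other methods on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: not part of Tribuo. */
final class ImportanceRecordsSketch {

    /** Hypothetical per-feature record holding all five metrics. */
    static final class FeatureRecord {
        final String name;
        final double gain;
        final double cover;
        final double weight;
        final double totalGain;
        final double totalCover;

        FeatureRecord(String name, double gain, double cover, double weight,
                      double totalGain, double totalCover) {
            this.name = name;
            this.gain = gain;
            this.cover = cover;
            this.weight = weight;
            this.totalGain = totalGain;
            this.totalCover = totalCover;
        }
    }

    /** Returns a new list with at most numFeatures records, highest gain first. */
    static List<FeatureRecord> topByGain(List<FeatureRecord> records, int numFeatures) {
        List<FeatureRecord> sorted = new ArrayList<>(records); // leave the input untouched
        sorted.sort(Comparator.comparingDouble((FeatureRecord r) -> r.gain).reversed());
        int n = Math.max(0, Math.min(numFeatures, sorted.size())); // clamp to a valid range
        return new ArrayList<>(sorted.subList(0, n));
    }

    public static void main(String[] args) {
        // Made-up metric values, purely for demonstration.
        List<FeatureRecord> records = Arrays.asList(
            new FeatureRecord("age", 3.0, 70.0, 2.0, 6.0, 140.0),
            new FeatureRecord("income", 5.0, 60.0, 1.0, 5.0, 60.0));
        for (FeatureRecord r : topByGain(records, 1)) {
            System.out.println(r.name + " gain=" + r.gain);
        }
    }
}
```

Keeping every metric on the same record means a caller can sort by gain yet still inspect weight or cover for the same feature without a second lookup.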
-
toString
-