How to display XGBoost feature importance with labels
A way to display feature importance without using the plot_importance function, for cases such as showing only specific variables or only the top N.
# plot_feature_importance_with_label.py
import operator

from sklearn.datasets import load_iris
import pandas as pd
import matplotlib.pyplot as plt
import xgboost as xgb


def create_feature_map(features):
    # Write one "<index>\t<name>\t<type>" line per feature ('q' = quantitative).
    with open('xgb.fmap', 'w') as fp:
        for i, feat in enumerate(features):
            fp.write('%d\t%s\tq\n' % (i, feat))


def run():
    iris = load_iris()
    dtrain = xgb.DMatrix(data=iris.data, label=iris.target)
    # num_class is the number of target classes (3 for iris), not the number of features.
    params = {'objective': 'multi:softprob', 'num_class': len(iris.target_names)}
    model = xgb.train(params, dtrain)

    # Feature names in the map file must not contain spaces.
    create_feature_map([_.replace(" ", "") for _ in iris.feature_names])

    # Look up the labels from xgb.fmap and sort by score in ascending order.
    importance = model.get_fscore(fmap='xgb.fmap')
    importance = sorted(importance.items(), key=operator.itemgetter(1))

    df = pd.DataFrame(importance, columns=['feature', 'fscore'])
    # Convert raw scores to relative importance (sums to 1).
    df['fscore'] = df['fscore'] / df['fscore'].sum()

    plt.rcParams['font.size'] = 9
    df.plot(kind='barh', x='feature', y='fscore', legend=False)
    plt.title('XGBoost Feature Importance')
    plt.xlabel('relative importance')
    plt.show()


if __name__ == "__main__":
    run()
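The script plots every feature, while the stated goal also covers showing only specific variables or the top N. A minimal sketch of that filtering step follows. It assumes the scores arrive as a plain dict of the kind `get_fscore` returns; the helper name `select_importance` and the example scores are hypothetical and only for illustration, not real output.

```python
import operator


def select_importance(fscore, top_n=None, features=None):
    """Return (feature, relative importance) pairs sorted in ascending order.

    fscore   : dict mapping feature name to score (as get_fscore returns)
    top_n    : keep only the N most important features (None keeps all)
    features : keep only these feature names (None keeps all)
    """
    total = sum(fscore.values())
    # Normalize over all features before filtering, so the values stay
    # comparable to the full plot.
    items = [(name, score / total) for name, score in fscore.items()]
    if features is not None:
        wanted = set(features)
        items = [item for item in items if item[0] in wanted]
    items.sort(key=operator.itemgetter(1))
    if top_n is not None:
        # The list is ascending, so the top N are the last N entries.
        items = items[max(len(items) - top_n, 0):]
    return items


# Hypothetical scores, for illustration only.
example_fscore = {
    'sepallength(cm)': 10,
    'sepalwidth(cm)': 5,
    'petallength(cm)': 25,
    'petalwidth(cm)': 10,
}
top2 = select_importance(example_fscore, top_n=2)
only_petal = select_importance(
    example_fscore, features=['petallength(cm)', 'petalwidth(cm)'])
```

The returned list can be passed to `pd.DataFrame(items, columns=['feature', 'fscore'])` and plotted with `df.plot(kind='barh', ...)` in the same way as the script. Normalizing before filtering is a design choice: it keeps each bar at its share of the total rather than rescaling the selected subset to sum to 1.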
Result
$ python plot_feature_importance_with_label.py
$ cat xgb.fmap
0	sepallength(cm)	q
1	sepalwidth(cm)	q
2	petallength(cm)	q
3	petalwidth(cm)	q
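Each line of the feature map holds a feature index, a label, and a type, separated by tabs; `q` marks a quantitative feature. To check the file's contents programmatically, it can be read back as in the following sketch. The helper name `parse_feature_map` is hypothetical, and the text is inlined here instead of being read from `xgb.fmap` so the example stands alone.

```python
def parse_feature_map(text):
    """Parse feature map text into (index, name, type) tuples."""
    features = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, name, ftype = line.split('\t')
        features.append((int(index), name, ftype))
    return features


# Same content as the xgb.fmap file shown above.
fmap_text = (
    "0\tsepallength(cm)\tq\n"
    "1\tsepalwidth(cm)\tq\n"
    "2\tpetallength(cm)\tq\n"
    "3\tpetalwidth(cm)\tq\n"
)
parsed = parse_feature_map(fmap_text)
index_to_name = {index: name for index, name, _ in parsed}
```

With a real file, the same function can be applied to `open('xgb.fmap').read()`.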
References
- Kaggle - XGB Feature Importance (Python), https://www.kaggle.com/mmueller/liberty-mutual-group-property-inspection-prediction/xgb-feature-importance-python/code
- GitHub - xgboost, https://github.com/dmlc/xgboost/blob/6750c8b74316cc41a74b9845951b4edc3f0f1b2d/python-package/xgboost/core.py
- Implementing variable importance plots / visualization for Python XGBoost - StatsFragments
- Improvements to Python XGBoost + pandas integration - StatsFragments