Probability Calibration
What is Probability Calibration?
A technique for bringing the predicted probabilities produced by a model closer to the true probabilities.
For example, suppose we train on data in which each sample is positive with a probability of 20%.
Ideally the predicted probability would be 0.2, but in practice it can drift to, say, 0.3.
Calibration fits additional models on top of the classifier (in scikit-learn, via cross-validation) and uses them to pull the predictions back toward the true probabilities.
As a result, it lowers probability-based loss values such as the Brier score.
It also has an effect similar to bagging, which tends to improve performance across the board.
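The mismatch described above can be checked directly with scikit-learn's `calibration_curve`, which bins the predictions and compares each bin's mean predicted probability against the observed fraction of positives. A minimal sketch on synthetic data (the dataset and parameters here are illustrative, not from the example below):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.calibration import calibration_curve

# Synthetic binary classification data, half held out for evaluation.
X, y = make_classification(n_samples=5000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]

# Bin the predicted probabilities and compare each bin's mean prediction
# with the observed positive rate; for a well-calibrated model the two
# columns should be close to each other.
frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)
for p, f in zip(mean_pred, frac_pos):
    print(f"predicted {p:.2f} -> observed {f:.2f}")
```

A perfectly calibrated model would print matching pairs; a raw Random Forest typically shows some deviation, which is what `CalibratedClassifierCV` corrects.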
An example of applying calibration to a Random Forest
```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import RandomForestClassifier
# sklearn.cross_validation was removed; use sklearn.model_selection instead.
from sklearn.model_selection import train_test_split
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import (brier_score_loss, precision_score,
                             recall_score, f1_score)


def run():
    X, y = make_hastie_10_2()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=42)
    # Compare a plain Random Forest against one wrapped in
    # CalibratedClassifierCV (5-fold CV, isotonic regression).
    for description, clf in [
            ("Random Forest Without Calibration", RandomForestClassifier()),
            ("Random Forest With Calibration",
             CalibratedClassifierCV(RandomForestClassifier(),
                                    cv=5, method='isotonic'))]:
        clf.fit(X_train, y_train)
        y_preda = clf.predict_proba(X_test)[:, 1]
        y_pred = clf.predict(X_test)
        print(description)
        print("\tBrier:", brier_score_loss(y_test, y_preda))
        print("\tPrecision:", precision_score(y_test, y_pred))
        print("\tRecall:", recall_score(y_test, y_pred))
        print("\tF1:", f1_score(y_test, y_pred))


if __name__ == "__main__":
    run()
```
```
Random Forest Without Calibration
	Brier: 0.113356060606
	Precision: 0.835116731518
	Recall: 0.859359359359
	F1: 0.847064627528
Random Forest With Calibration
	Brier: 0.089109155874
	Precision: 0.843904633378
	Recall: 0.938938938939
	F1: 0.888888888889
```
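The Brier score reported above is simply the mean squared difference between the predicted probabilities and the 0/1 outcomes, so lower is better. A minimal sketch with toy values (not taken from the run above):

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.9, 0.8, 0.3])

# mean((p - y)^2) = (0.01 + 0.01 + 0.04 + 0.09) / 4 = 0.0375
print(brier_score_loss(y_true, y_prob))
```

Because the score penalizes squared error of the probabilities themselves, better-calibrated predictions lower it even when the hard predictions (and hence precision/recall) barely change.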