Scorecard Model Development: Quantitative Feature Selection

2018-02-27 11:10:53 | Source: https://www.jianshu.com/p/6bba76e750f4 | Author: 鸣人吃土豆

```python
import os

import numpy as np
import pandas as pd

os.chdir("C://Users//Administrator//OneDrive//步履不停//评分卡制作//数据")
df = pd.read_csv(".//GermanCredit.csv", index_col=0)
df.head()

# Encode defaulted samples as 1 and good samples as 0
df['credit_risk'] = np.where(df['credit_risk'] == 'good', 0, 1).astype('int64')

# Collect the quantitative variables
df.info()
continuous_vars = []
category_vars = []
for i in df.columns:
    if df[i].dtype == 'int64':  # adjust this condition to match the df.info() output
        continuous_vars.append(i)
    else:
        category_vars.append(i)

# This relies on credit_risk being the last int64 column in df
X = df.loc[:, continuous_vars[:-1]]
X.head()
y = df.loc[:, continuous_vars[-1]]
y.head()
```

```python
from sklearn.ensemble import RandomForestClassifier

# Tree-based models do not need standardized or normalized inputs
forest = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=-1)
forest.fit(X, y)
importances = forest.feature_importances_
importances
```

```
array([ 0.18996948,  0.34514053,  0.06920705,  0.07587584,  0.2470823 ,
        0.04564897,  0.02707582])
```

```python
indices = np.argsort(importances)[::-1]
feat_labels = X.columns
for f in range(X.shape[1]):
    # Index the labels with indices[f] too, so each name stays paired with its own importance
    print("%2d) %-*s %f " % (f + 1, 30, feat_labels[indices[f]], importances[indices[f]]))
```

```
 1) amount                         0.345141
 2) age                            0.247082
 3) duration                       0.189969
 4) present_residence              0.075876
 5) installment_rate               0.069207
 6) number_credits                 0.045649
 7) people_liable                  0.027076
```
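The ranking can be turned into an actual feature selection by keeping only the variables whose importance reaches a cutoff. The following is a minimal sketch, not the author's method: it reuses the importance values from the array printed above, assumes the labels are the columns of `X` in their original order, and uses a hypothetical cutoff of 0.10 chosen purely for illustration. It only needs the standard library, so it runs without the dataset.

```python
# Importances copied from the array printed above. The labels are assumed to be
# the columns of X in their original order.
feat_labels = ["duration", "amount", "installment_rate", "present_residence",
               "age", "number_credits", "people_liable"]
importances = [0.18996948, 0.34514053, 0.06920705, 0.07587584,
               0.2470823, 0.04564897, 0.02707582]

THRESHOLD = 0.10  # hypothetical cutoff, for illustration only

# Rank (label, importance) pairs from most to least important
ranked = sorted(zip(feat_labels, importances), key=lambda pair: pair[1], reverse=True)

# Keep the features whose importance reaches the cutoff
selected = [name for name, importance in ranked if importance >= THRESHOLD]
print(selected)  # ['amount', 'age', 'duration']
```

With the real data, `X[selected]` would then give the reduced feature matrix. The cutoff itself is a modelling decision and should be tuned rather than taken from this sketch.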

```python
import matplotlib.pyplot as plt
%matplotlib inline

plt.title('Feature Importances')
plt.bar(range(X.shape[1]), importances[indices], color='lightblue', align='center')
# Order the tick labels with the same indices as the bars
plt.xticks(range(X.shape[1]), feat_labels[indices], rotation=90)
plt.xlim([-1, X.shape[1]])
plt.tight_layout()
```

[Figure 1.png: bar chart of the feature importances]