# An Introduction to Redis-ML (Part 5)

2018-01-30 10:41:59 | Source: https://cloud.tencent.com/developer/article/1029707 | Author: 极客头条


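On April 15, 1912, the Titanic sank in the North Atlantic after colliding with an iceberg. The collision killed more than 1,500 people, making it one of the deadliest commercial maritime disasters in modern history. Although surviving the disaster involved some element of luck, the data shows that certain characteristics made some groups of passengers more likely to survive than others.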

    pip install pandas

(Translator's note: these 14 counts correspond one-to-one to the 14 fields of the dataset.)

    pclass       1309
    survived     1309
    name         1309
    sex          1309
    age          1046
    sibsp        1309
    parch        1309
    ticket       1309
    fare         1308
    cabin         295
    embarked     1307
    boat          486
    body          121
    home.dest     745

    import pandas as pd

    # load data from Excel
    orig_df = pd.read_excel('titanic3.xls', sheet_name='titanic3', index_col=None)
    # drop the columns we do not intend to use, then drop rows with missing data
    df = orig_df.drop(["name", "ticket", "body", "cabin", "boat", "home.dest"], axis=1)
    df = df.dropna()

    from sklearn import preprocessing

    # convert enumerated columns (sex, embarked) to integer codes
    encoder = preprocessing.LabelEncoder()
    df.sex = encoder.fit_transform(df.sex)
    df.embarked = encoder.fit_transform(df.embarked)
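
To make the encoding step concrete, the sketch below mimics in plain Python what a label encoder does: it sorts the distinct values of a column and replaces each value with its index in that sorted list. The helper `label_encode` and the sample values are hypothetical and only illustrate the idea; the article itself relies on scikit-learn's `LabelEncoder`.

```python
def label_encode(values):
    """Map each distinct value to its index in the sorted list of distinct values."""
    classes = sorted(set(values))
    index = {value: code for code, value in enumerate(classes)}
    return [index[v] for v in values], classes


# Made-up sample values in the style of the Titanic columns.
sex_codes, sex_classes = label_encode(["male", "female", "female", "male"])
embarked_codes, embarked_classes = label_encode(["S", "C", "Q", "S"])

print(sex_classes, sex_codes)            # ['female', 'male'] [1, 0, 0, 1]
print(embarked_classes, embarked_codes)  # ['C', 'Q', 'S'] [2, 0, 1, 2]
```

Because the article refits one encoder object for both columns, only the mapping of the last fitted column (embarked) remains available on that object afterwards. Decoding both columns later would need one encoder per column.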

                   survived        age     sibsp     parch        fare
    pclass sex
    1      female  0.961832  36.839695  0.564885  0.511450  112.485402
           male    0.350993  41.029250  0.403974  0.331126   74.818213
    2      female  0.893204  27.499191  0.514563  0.669903   23.267395
           male    0.145570  30.815401  0.354430  0.208861   20.934335
    3      female  0.473684  22.185307  0.736842  0.796053   14.655758
           male    0.169540  25.863027  0.488506  0.287356   12.103374
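
Each value in the survived column of this table is the mean of a 0/1 flag, that is, the survival rate of the group. A minimal plain-Python sketch of that aggregation follows; it uses a handful of made-up records rather than the real dataset, and it is not necessarily how the author produced the table.

```python
from collections import defaultdict

# Made-up (pclass, sex, survived) records, for illustration only.
records = [
    (1, "female", 1), (1, "female", 1), (1, "male", 0), (1, "male", 1),
    (3, "female", 1), (3, "female", 0), (3, "male", 0), (3, "male", 0),
]

totals = defaultdict(lambda: [0, 0])  # group -> [survivors, passengers]
for pclass, sex, survived in records:
    totals[(pclass, sex)][0] += survived
    totals[(pclass, sex)][1] += 1

# The mean of a 0/1 flag is the share of passengers in the group who survived.
survival_rate = {group: s / n for group, (s, n) in totals.items()}
for group in sorted(survival_rate):
    print(group, survival_rate[group])
```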

    # separate the features (X) from the label (Y)
    X = df.drop(['survived'], axis=1).values
    Y = df['survived'].values
    # hold out the last 20 records as the test set
    X_train = X[:-20]
    X_test = X[-20:]
    Y_train = Y[:-20]
    Y_test = Y[-20:]
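
The split above is purely positional: negative slicing keeps the rows in file order and reserves the final 20 for testing, with no shuffling. A small sketch with a stand-in list (the 100 rows are an assumption for illustration) shows which rows land where:

```python
rows = list(range(100))  # stand-in for 100 dataset rows, in file order

train = rows[:-20]  # everything except the last 20 rows
test = rows[-20:]   # the last 20 rows

print(len(train), len(test))  # 80 20
print(test[0], test[-1])      # 80 99
```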

    from sklearn import tree

    # create the real classifier
    depth = 10
    cl_tree = tree.DecisionTreeClassifier(max_depth=depth, random_state=0)
    cl_tree.fit(X_train, Y_train)

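The Redis-ML module provides two commands for working with random forests: ML.FOREST.ADD, which creates a decision tree within the context of a forest, and ML.FOREST.RUN, which evaluates a data point using a random forest. The ML.FOREST commands have the following syntax: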

    ML.FOREST.ADD key tree path ((NUMERIC|CATEGORIC) attr val | LEAF val [STATS]) [...]
    ML.FOREST.RUN key sample (CLASSIFICATION|REGRESSION)
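
As an illustration of the ML.FOREST.ADD syntax, the sketch below assembles a command for a hand-written tree with one split and two leaves. The key `titanic:tree` matches the one used later in this article; the tree id `0`, the split on `sex` at `0.5`, and the leaf values are assumptions chosen for the example. The command is only printed, not sent to a server.

```python
# Each node is (path, node specification). "." is the root,
# ".l" its left child and ".r" its right child.
nodes = [
    (".", "NUMERIC sex 0.5"),  # splitter node: attribute and threshold
    (".l", "LEAF 1"),          # leaf reached through the left branch
    (".r", "LEAF 0"),          # leaf reached through the right branch
]

key, tree_id = "titanic:tree", 0  # tree id 0 is an assumed value
add_cmd = "ML.FOREST.ADD {} {} ".format(key, tree_id)
add_cmd += " ".join("{} {}".format(path, spec) for path, spec in nodes)

print(add_cmd)
# ML.FOREST.ADD titanic:tree 0 . NUMERIC sex 0.5 .l LEAF 1 .r LEAF 0
```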

    from io import StringIO

    import numpy as np
    import redis

    # scikit represents decision trees using a set of arrays,
    # create references to make the arrays easy to access
    the_tree = cl_tree
    t_nodes = the_tree.tree_.node_count
    t_left = the_tree.tree_.children_left
    t_right = the_tree.tree_.children_right
    t_feature = the_tree.tree_.feature
    t_threshold = the_tree.tree_.threshold
    t_value = the_tree.tree_.value
    feature_names = df.drop(['survived'], axis=1).columns.values

    # create a buffer to build up our command; it starts with the command
    # name, the key and the tree id (0 is assumed here, for a single tree)
    forrest_cmd = StringIO()
    forrest_cmd.write("ML.FOREST.ADD titanic:tree 0 ")

    # Traverse the tree starting with the root and a path of "."
    stack = [(0, ".")]
    while len(stack) > 0:
        node_id, path = stack.pop()
        # splitter node -- must have 2 children (pre-order traversal)
        if t_left[node_id] != t_right[node_id]:
            stack.append((t_right[node_id], path + "r"))
            stack.append((t_left[node_id], path + "l"))
            cmd = "{} NUMERIC {} {} ".format(path, feature_names[t_feature[node_id]], t_threshold[node_id])
            forrest_cmd.write(cmd)
        else:
            # leaf node: store the index of the majority class
            cmd = "{} LEAF {} ".format(path, np.argmax(t_value[node_id]))
            forrest_cmd.write(cmd)

    # execute command in Redis
    r = redis.StrictRedis('localhost', 6379)
    r.execute_command(forrest_cmd.getvalue())
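
Before sending such a command to Redis, it can help to check the generated node list locally. The sketch below parses the node part of a command (everything after the key and tree id) and walks it for one sample. It assumes scikit-learn's convention that a sample goes left when its value is less than or equal to the threshold; whether Redis-ML applies exactly the same comparison is an assumption here. `parse_nodes` and `classify` are hypothetical helpers, not part of any library.

```python
def parse_nodes(node_text):
    """Parse '<path> NUMERIC <attr> <threshold>' and '<path> LEAF <value>' items."""
    tokens = node_text.split()
    nodes, i = {}, 0
    while i < len(tokens):
        path, kind = tokens[i], tokens[i + 1]
        if kind == "NUMERIC":
            nodes[path] = ("NUMERIC", tokens[i + 2], float(tokens[i + 3]))
            i += 4
        elif kind == "LEAF":
            nodes[path] = ("LEAF", int(tokens[i + 2]))
            i += 3
        else:
            raise ValueError("unexpected token: " + kind)
    return nodes


def classify(nodes, sample):
    """Follow 'l'/'r' branches from the root '.' until a leaf is reached."""
    path = "."
    while nodes[path][0] != "LEAF":
        _, attr, threshold = nodes[path]
        # Assumed convention: left if value <= threshold, otherwise right.
        path += "l" if sample[attr] <= threshold else "r"
    return nodes[path][1]


# Hand-written two-level tree in the same format the traversal emits.
node_text = ". NUMERIC sex 0.5 .l LEAF 1 .r NUMERIC age 10.0 .rl LEAF 1 .rr LEAF 0 "
nodes = parse_nodes(node_text)

print(classify(nodes, {"sex": 0, "age": 30.0}))  # 1
print(classify(nodes, {"sex": 1, "age": 8.0}))   # 1
print(classify(nodes, {"sex": 1, "age": 40.0}))  # 0
```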

    # generate a vector of scikit-learn predictions
    s_pred = cl_tree.predict(X_test)

    # generate a vector of Redis predictions
    r_pred = np.full(len(X_test), -1, dtype=int)
    for i, x in enumerate(X_test):
        cmd = "ML.FOREST.RUN titanic:tree "
        # iterate over each feature in the test record to build up the
        # feature:value pairs
        for j, x_val in enumerate(x):
            cmd += "{}:{},".format(feature_names[j], x_val)
        # strip the trailing comma
        cmd = cmd[:-1]
        r_pred[i] = int(r.execute_command(cmd))
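
The sample argument is a comma-separated list of feature:value pairs. As a small variation on the loop above, the same string can be built with `str.join`, which avoids trimming a trailing comma. The passenger record below is made up; the feature order is assumed to follow the columns left after the earlier drop steps. Nothing is sent to Redis here.

```python
feature_names = ["pclass", "sex", "age", "sibsp", "parch", "fare", "embarked"]
record = [3, 1, 22.0, 1, 0, 7.25, 2]  # made-up passenger record

# Pair each feature name with its value and join the pairs with commas.
sample = ",".join("{}:{}".format(n, v) for n, v in zip(feature_names, record))
cmd = "ML.FOREST.RUN titanic:tree " + sample

print(cmd)
# ML.FOREST.RUN titanic:tree pclass:3,sex:1,age:22.0,sibsp:1,parch:0,fare:7.25,embarked:2
```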

    Y_test: [0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0]
    r_pred: [1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
    s_pred: [1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]

Redis's predictions are identical to those of the scikit-learn package, including the misclassification of test items 0 and 14.
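
This comparison can be checked mechanically. The sketch below copies the three vectors printed above into plain lists, confirms that the two prediction vectors match, and lists the test items whose prediction differs from the true label.

```python
Y_test = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
r_pred = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
s_pred = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Do Redis and scikit-learn agree on every test item?
same_predictions = r_pred == s_pred

# Indices where the prediction differs from the true label.
misclassified = [i for i, (y, p) in enumerate(zip(Y_test, r_pred)) if y != p]
accuracy = (len(Y_test) - len(misclassified)) / len(Y_test)

print(same_predictions)  # True
print(misclassified)     # [0, 14]
print(accuracy)          # 0.9
```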