The k-means Algorithm (Python)

2017-09-13 20:35:56 | Source: CSDN | Author: guang_mang

The k-means algorithm

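k-means is a clustering algorithm. It belongs to unsupervised learning, meaning it learns from data that carry no labels (classes). The goal of clustering is to find a latent class y for each sample and to put samples of the same class together, so that samples inside a cluster are close to one another while different clusters are far apart. The smaller the distances among samples within each cluster and the larger the distances between clusters, the better the clustering.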
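To make "samples within a cluster are close to one another" concrete, here is a minimal sketch that scores a labeling by its within-cluster sum of squared distances; lower means tighter clusters. The function name `within_cluster_sse` and the four sample points are made up for illustration and do not come from the article's code.

```python
import numpy as np

def within_cluster_sse(points, labels):
    """Sum of squared distances from each point to the mean of its own cluster."""
    total = 0.0
    for c in np.unique(labels):
        members = points[labels == c]
        total += np.sum((members - members.mean(axis=0)) ** 2)
    return float(total)

# Four made-up points that form two obvious groups.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = np.array([0, 0, 1, 1])  # groups neighbouring points together
bad = np.array([0, 1, 0, 1])   # mixes the two groups

print(within_cluster_sse(points, good))  # 1.0
print(within_cluster_sse(points, bad))   # 100.0
```

The labeling that keeps neighbours together scores far lower, which is the quantity k-means tries to reduce.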

Procedure

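(1) Randomly choose k cluster centroids.
(2) Compute the distance from every sample to every centroid, and assign each sample to its nearest centroid.
(3) Update each centroid to the mean of all samples assigned to it.
(4) Repeat steps (2) and (3) until the centroids no longer move.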
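The four steps can be sketched compactly with NumPy broadcasting. This is an illustrative sketch, not the article's implementation: the name `kmeans_sketch`, the `seed` and `max_iter` parameters, and the choice to initialise from k randomly picked samples are all assumptions made here.

```python
import numpy as np

def kmeans_sketch(points, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct samples as initial centroids (an assumption of this sketch).
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Step 2: distances from every sample to every centroid, then the nearest one.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of the samples assigned to it.
        new_centroids = centroids.copy()
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[c] = members.mean(axis=0)
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Usage on two well-separated synthetic blobs (made-up data).
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])
centroids, labels = kmeans_sketch(points, k=2)
print(centroids)
```

With blobs this far apart the two centroids should settle near (0, 0) and (10, 10); on harder data the result depends on the random initialisation.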

Code
# coding: utf-8
import numpy as np
import matplotlib.pyplot as plt


def loadDataset(filename):
    # Read a tab-separated text file into a list of float rows.
    Data = []
    with open(filename, 'r') as data:
        for line in data:
            Line = line.strip().split('\t')
            Data.append(list(map(float, Line)))
    return Data


def distEclud(veca, vecb):
    # Euclidean distance between two vectors.
    return np.sqrt(np.sum(np.power(veca - vecb, 2)))


def randCent(dataset, k):
    # Draw k random centroids inside the bounding box of the data.
    n = dataset.shape[1]  # number of columns (features)
    centroids = np.zeros((k, n))
    for j in range(n):
        # Range of column j: maximum minus minimum.
        minj = dataset[:, j].min()
        rangej = float(dataset[:, j].max() - minj)
        centroids[:, j] = minj + rangej * np.random.rand(k)
    return centroids


def kmeans(dataset, k, distmeans=distEclud, creatCent=randCent):
    m = dataset.shape[0]
    # Column 0: index of the assigned cluster; column 1: squared distance to its centroid.
    clusterAssment = np.zeros((m, 2))
    centroids = creatCent(dataset, k)  # k initial centroids, one per row
    minprice = np.inf
    while True:
        # Assign every sample to its nearest centroid.
        for i in range(m):
            minDist = np.inf
            minIndex = -1
            for j in range(k):
                distJI = distmeans(centroids[j, :], dataset[i, :])
                if distJI < minDist:
                    minDist = distJI
                    minIndex = j
            clusterAssment[i, :] = minIndex, minDist ** 2
        # Move each centroid to the mean of the samples assigned to it.
        for cent in range(k):
            ptInclust = dataset[clusterAssment[:, 0] == cent]
            if len(ptInclust) > 0:  # an empty cluster keeps its old centroid
                centroids[cent, :] = ptInclust.mean(axis=0)
        # Loss: total squared distance of the samples to their assigned centroids.
        loss_price = clusterAssment[:, 1].sum()
        print(loss_price)
        if loss_price < minprice:
            minprice = loss_price
        else:
            # The loss stopped decreasing, so leave the loop.
            break
    return centroids, clusterAssment


datMat = np.array(loadDataset(r'D:/PythonDDD/shuju files/4k2_far1.txt'))
myCentroids, clusterAssing = kmeans(datMat, 2)
plt.scatter(datMat[:, 0], datMat[:, 1])          # samples
plt.scatter(myCentroids[:, 0], myCentroids[:, 1])  # final centroids
plt.show()
# print(myCentroids)
# print(clusterAssing)
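The data file loaded above lives at a path on the author's machine and is not included with the article. Assuming it holds one sample per line with tab-separated numeric values (which is what the loader appears to expect), the following sketch writes a stand-in file with two synthetic blobs and reads it back. The helper names `write_two_blobs` and `load_tab_file`, the file name `two_blobs_demo.txt`, and the blob parameters are all hypothetical.

```python
import os
import tempfile
import numpy as np

def write_two_blobs(path, n_per_blob=50, seed=0):
    """Write two 2-D Gaussian blobs to `path`, one tab-separated sample per line."""
    rng = np.random.default_rng(seed)
    blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(n_per_blob, 2))
    blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(n_per_blob, 2))
    np.savetxt(path, np.vstack([blob_a, blob_b]), delimiter='\t')

def load_tab_file(path):
    """Parse every non-empty line into a row of floats."""
    with open(path, 'r') as f:
        rows = [[float(v) for v in line.strip().split('\t')]
                for line in f if line.strip()]
    return np.array(rows)

path = os.path.join(tempfile.gettempdir(), 'two_blobs_demo.txt')
write_two_blobs(path)
data = load_tab_file(path)
print(data.shape)  # (100, 2)
```

Pointing the loader at a file like this, with k set to 2, gives a small case where the expected centroids (near the two blob centres) are known in advance.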