# Andrew Ng Machine Learning Notes - Week 1 - Linear Regression with One Variable

2018-02-27 11:48:23 · Source: http://blog.csdn.net/zxm1306192988/article/details/77116942 · Author: zxm1306192988

## 一、Introduction

### 1.1 Welcome

What is Machine Learning

Grew out of work in AI (machine learning grew out of the field of artificial intelligence)
New capability for computers (ML has developed into a new capability for computers)

Examples (applications of machine learning):

- Database mining: large datasets from the growth of automation/the web, e.g., web click data, medical records, biology, engineering.
- Applications that can't be programmed by hand, e.g., autonomous helicopters, handwriting recognition, most of natural language processing (NLP), computer vision.
- Self-customizing programs, e.g., Amazon, Netflix, iTunes Genius product recommendations.
- Understanding human learning (used to understand human learning and the brain).

### 1.2 What is ML?

Arthur Samuel (1959) defined ML as: the field of study that gives computers the ability to learn without being explicitly programmed.

Tom Mitchell (1998) gave a well-posed definition of a learning problem: a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Example: playing checkers.

- E = the experience of playing many games of checkers
- T = the task of playing checkers
- P = the probability that the program will win the next game

ML algorithms (categories of machine learning):

- Supervised learning (we teach the computer how to learn)
- Unsupervised learning (we let the computer learn by itself)

### 1.3 Supervised learning

In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output. In short: supervised learning starts from a data set that already contains the correct answers, and the algorithm learns to work out correct answers for new inputs.

In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. If the value to predict is continuous, such as a house price, the problem is a regression problem.

In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories. If the value to predict is discrete, i.e., one of a set of labels, the problem is a classification problem.

### 1.4 Unsupervised learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables. In unsupervised learning, the data set carries no labels at all.

We can derive this structure by clustering the data based on relationships among the variables in the data. An unsupervised learning algorithm might, for example, split the data into two distinct clusters; such an algorithm is called a clustering algorithm.

With unsupervised learning there is no feedback based on the prediction results.

## 2. Model and Cost Function

### 2.1 Model Representation

Notation:

- $m$: the number of training examples in the training set
- $x$: the features / input variable
- $y$: the target / output variable
- $(x, y)$: one training example
- $(x^{(i)}, y^{(i)})$: the $i$-th training example
- $h$: the hypothesis, i.e., the solution function produced by the learning algorithm
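With this notation, the hypothesis for univariate linear regression takes the form $h_\theta(x) = \theta_0 + \theta_1 x$. The course itself uses Octave; a minimal Python sketch (the parameter and input values below are made up for illustration):

```python
# Hypothesis for univariate linear regression: h_theta(x) = theta0 + theta1 * x
def hypothesis(theta0, theta1, x):
    """Predicted value y-hat for input feature x."""
    return theta0 + theta1 * x

# Illustrative parameters and input (made-up numbers)
print(hypothesis(1.0, 2.0, 3.0))  # 1.0 + 2.0 * 3.0 = 7.0
```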

### 2.2 Cost Function

We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) of all the results of the hypothesis with inputs from x’s and the actual output y’s.

$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}_i-y_i\right)^2=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x_i)-y_i\right)^2$$

To break it apart, it is $\frac{1}{2}\bar{x}$ where $\bar{x}$ is the mean of the squares of $h_\theta(x_i)-y_i$, or of the difference between the predicted value and the actual value.

This function is otherwise called the "squared error function", or "mean squared error" (MSE). The mean is halved ($\frac{1}{2}$) as a convenience for the computation of gradient descent, since the derivative of the squared term cancels out the $\frac{1}{2}$.
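The cost function above can be sketched in a few lines of Python (the course itself uses Octave; the toy data here is made up for illustration):

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# With h(x) = x fitting y = x exactly, every residual is 0, so the cost is 0.
print(compute_cost(0.0, 1.0, [1, 2, 3], [1, 2, 3]))  # 0.0
```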

## 3. Parameter Learning

### 3.1 Gradient Descent

Have some function: $J(\theta_0,\theta_1)$

Want: $\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)$

Keep changing $\theta_0,\theta_1$ to reduce $J(\theta_0,\theta_1)$ until we hopefully end up at a minimum.
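Concretely, gradient descent repeats the simultaneous update $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0,\theta_1)$, where $\alpha$ is the learning rate. A minimal Python sketch for the univariate case (the learning rate, iteration count, and toy data are assumptions for illustration):

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Batch gradient descent for univariate linear regression."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        # Partial derivatives of J with respect to theta0 and theta1
        d0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m
        d1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
        # Simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1
    return theta0, theta1

# Toy data on the line y = 2x + 1; the fit should approach theta0 = 1, theta1 = 2.
t0, t1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(round(t0, 3), round(t1, 3))
```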

## 4. Linear Algebra Review

### 4.1 Matrices and Vectors

Matrices are 2-dimensional arrays:

$$\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \\ j & k & l \end{bmatrix}$$

The above matrix has four rows and three columns, so it is a 4 x 3 matrix. (The dimension of a matrix is rows × columns.)

A vector is a matrix with one column and many rows:

$$\begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix}$$

Notation and terms:

- $A_{ij}$ refers to the element in the $i$th row and $j$th column of matrix $A$.
- A vector with $n$ rows is referred to as an $n$-dimensional vector.
- $v_i$ refers to the element in the $i$th row of the vector.
- In general, all our vectors and matrices will be 1-indexed. Note that for some programming languages, arrays are 0-indexed (i.e., whether the first element is indexed from 0 or from 1).
- Matrices are usually denoted by uppercase names while vectors are lowercase.
- "Scalar" means that an object is a single value, not a vector or matrix.
- $\mathbb{R}$ refers to the set of scalar real numbers.
- $\mathbb{R}^n$ refers to the set of $n$-dimensional vectors of real numbers.
```matlab
% The ; denotes we are going back to a new row.
A = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12]

% Initialize a vector
v = [1; 2; 3]

% Get the dimension of the matrix A where m = rows and n = columns
[m,n] = size(A)

% You could also store it this way
dim_A = size(A)

% Get the dimension of the vector v
dim_v = size(v)

% Now let's index into the 2nd row 3rd column of matrix A
A_23 = A(2,3)
```

```matlab
A =
    1    2    3
    4    5    6
    7    8    9
   10   11   12

v =
   1
   2
   3

m = 4
n = 3

dim_A =
   4   3

dim_v =
   3   1

A_23 = 6
```

### 4.2 Addition and Scalar Multiplication

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} w & x \\ y & z \end{bmatrix} = \begin{bmatrix} a+w & b+x \\ c+y & d+z \end{bmatrix}$$

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} - \begin{bmatrix} w & x \\ y & z \end{bmatrix} = \begin{bmatrix} a-w & b-x \\ c-y & d-z \end{bmatrix}$$

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot x = \begin{bmatrix} ax & bx \\ cx & dx \end{bmatrix}$$

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} / x = \begin{bmatrix} a/x & b/x \\ c/x & d/x \end{bmatrix}$$

```matlab
% Initialize matrix A and B
A = [1, 2, 4; 5, 3, 2]
B = [1, 3, 4; 1, 1, 1]

% Initialize constant s
s = 2

% See how element-wise addition works
add_AB = A + B

% See how element-wise subtraction works
sub_AB = A - B

% See how scalar multiplication works
mult_As = A * s

% Divide A by s
div_As = A / s

% What happens if we have a Matrix + scalar?
add_As = A + s
```

```matlab
A =
   1   2   4
   5   3   2

B =
   1   3   4
   1   1   1

s = 2

add_AB =
   2   5   8
   6   4   3

sub_AB =
   0  -1   0
   4   2   1

mult_As =
    2    4    8
   10    6    4

div_As =
   0.50000   1.00000   2.00000
   2.50000   1.50000   1.00000

add_As =
   3   4   6
   7   5   4
```

### 4.3 Matrix-Vector Multiplication

$$\begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax+by \\ cx+dy \\ ex+fy \end{bmatrix}$$

```matlab
% Initialize matrix A
A = [1, 2, 3; 4, 5, 6; 7, 8, 9]

% Initialize vector v
v = [1; 1; 1]

% Multiply A * v
Av = A * v
```

```matlab
A =
   1   2   3
   4   5   6
   7   8   9

v =
   1
   1
   1

Av =
    6
   15
   24
```

### 4.4 Matrix-Matrix Multiplication

$$\begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix} \begin{bmatrix} w & x \\ y & z \end{bmatrix} = \begin{bmatrix} aw+by & ax+bz \\ cw+dy & cx+dz \\ ew+fy & ex+fz \end{bmatrix}$$

```matlab
% Initialize a 3 by 2 matrix
A = [1, 2; 3, 4; 5, 6]

% Initialize a 2 by 1 matrix
B = [1; 2]

% We expect a resulting matrix of (3 by 2)*(2 by 1) = (3 by 1)
mult_AB = A*B

% Make sure you understand why we got that result
```

```matlab
A =
   1   2
   3   4
   5   6

B =
   1
   2

mult_AB =
    5
   11
   17
```

### 4.5 Matrix Multiplication Properties

- Matrix multiplication is not commutative: $A*B \neq B*A$.
- Matrix multiplication is associative: $(A*B)*C = A*(B*C)$.

The identity matrix, when multiplied by any matrix of the same dimensions, results in the original matrix. It's just like multiplying numbers by 1. The identity matrix simply has 1's on the diagonal (upper left to lower right) and 0's elsewhere:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

The identity matrix is usually denoted $I$ (or $E$), and it commutes with any matrix: $AI = IA = A$.

```matlab
% Initialize random matrices A and B
A = [1, 2; 4, 5]
B = [1, 1; 0, 2]

% Initialize a 2 by 2 identity matrix
I = eye(2)
% The above notation is the same as I = [1, 0; 0, 1]

% What happens when we multiply I*A ?
IA = I*A

% How about A*I ?
AI = A*I

% Compute A*B
AB = A*B

% Is it equal to B*A?
BA = B*A

% Note that IA = AI but AB != BA
```

```matlab
A =
   1   2
   4   5

B =
   1   1
   0   2

I =
Diagonal Matrix
   1   0
   0   1

IA =
   1   2
   4   5

AI =
   1   2
   4   5

AB =
    1    5
    4   14

BA =
    5    7
    8   10
```

### 4.6 Inverse and Transpose

$$A = \begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix} \qquad A^T = \begin{bmatrix} a & c & e \\ b & d & f \end{bmatrix}$$

$$A_{ij} = A^T_{ji}$$

```matlab
% Initialize matrix A
A = [1, 2, 0; 0, 5, 6; 7, 0, 9]

% Transpose A
A_trans = A'

% Take the inverse of A
A_inv = inv(A)

% What is A^(-1)*A ?
A_invA = inv(A)*A
```

```matlab
A =
   1   2   0
   0   5   6
   7   0   9

A_trans =
   1   0   7
   2   5   0
   0   6   9

A_inv =
   0.348837  -0.139535   0.093023
   0.325581   0.069767  -0.046512
  -0.271318   0.108527   0.038760

A_invA =
   1.00000  -0.00000   0.00000
   0.00000   1.00000  -0.00000
  -0.00000   0.00000   1.00000
```