Coursera Machine Learning: Logistic Regression - 0x02

This article was originally published on my personal CSDN blog:
http://blog.csdn.net/kalenzh/article/details/43817321

Cost Function

Training set:
$\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$

$m$ training examples

$$x = \begin{bmatrix} x_0\\ x_1\\ \vdots\\ x_n\\ \end{bmatrix},\quad x_0 = 1,\quad y \in \{0,1\}$$

$h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}$
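As a quick illustration, here is a minimal Octave sketch of the sigmoid hypothesis (the function name sigmoid and the vectorized form are my own choices, not from the course code):

% Sigmoid function g(z) = 1 / (1 + e^(-z)), vectorized over z.
function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));
end

% h = sigmoid(X * theta);   % m x 1 vector of predictions in (0, 1),
%                           % where X is the m x (n+1) design matrix.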

How should we choose the parameters $\theta$?

Cost function

Linear regression:
$J(\theta) = \frac{1}{m}\sum\limits_{i = 1}^{m}\frac{1}{2}(h_\theta(x^{(i)}) - y^{(i)})^2$

$Cost(h_\theta(x^{(i)}) , y^{(i)}) = \frac{1}{2}(h_\theta(x^{(i)}) - y^{(i)})^2$

Logistic regression:
$$Cost(h_\theta(x^{(i)}) , y^{(i)}) = \begin{cases} -log(h_\theta(x)) & \text{if}\space y = 1\\ -log(1 - h_\theta(x)) & \text{if}\space y = 0 \end{cases}$$
Note: $y = 0 \ \text{or}\ 1\ \text{always}$

This is easier to understand from the plots of the two cost curves: when $y = 1$, the cost is 0 if $h_\theta(x) = 1$ and grows without bound as $h_\theta(x) \to 0$; when $y = 0$, the behavior is mirrored.
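A tiny Octave check (my own illustration, not part of the lecture) makes the shape of the two branches concrete:

h = [0.01 0.5 0.99];      % some hypothesis outputs
cost_if_y1 = -log(h)      % large when h -> 0, near 0 when h -> 1
cost_if_y0 = -log(1 - h)  % near 0 when h -> 0, large when h -> 1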

Simplified cost function and gradient descent

$Cost(h_\theta(x) , y) = -y\, log(h_\theta(x)) - (1 - y)\, log(1 - h_\theta(x))$

$$J(\theta) = \frac{1}{m}\sum\limits_{i = 1}^{m}Cost(h_\theta(x^{(i)}), y^{(i)}) = -\frac{1}{m}\left[\sum\limits_{i = 1}^{m} y^{(i)} log\, h_\theta(x^{(i)}) + (1 - y^{(i)}) log(1 - h_\theta(x^{(i)}))\right]$$
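In vectorized Octave form this cost can be written as below (a sketch under my own naming: X is the $m \times (n+1)$ design matrix, y the label vector, and sigmoid the function sketched earlier):

% Vectorized logistic regression cost J(theta).
function J = logisticCost(theta, X, y)
  m = length(y);
  h = sigmoid(X * theta);                              % predictions h_theta(x^(i))
  J = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));  % average cross-entropy cost
end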

To fit the parameters $\theta$:

$\min\limits_{\theta}J(\theta)$

To make a prediction on a new input $x$:

Output $h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$

To get $\min\limits_{\theta}J(\theta)$, use gradient descent:

Repeat {
$\theta_j := \theta_j -\alpha\frac{\partial}{\partial\theta_j}J(\theta)$
}

$\frac{\partial}{\partial\theta_j}J(\theta) = \frac{1}{m}\sum\limits_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$
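A minimal Octave sketch of this update rule (the function name gradientDescent and the parameters alpha and num_iters are my own; it assumes the sigmoid function above):

% Batch gradient descent for logistic regression.
% X: m x (n+1) design matrix, y: m x 1 labels, alpha: learning rate.
function theta = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  for iter = 1:num_iters
    h = sigmoid(X * theta);
    grad = (1/m) * X' * (h - y);     % partial derivatives for all theta_j
    theta = theta - alpha * grad;    % simultaneous update of all theta_j
  end
end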

Advanced Optimization

Optimization algorithm

Gradient descent
Conjugate gradient
BFGS
L-BFGS (limited-memory BFGS)

Advantages of the latter three algorithms:
No need to manually pick the learning rate $\alpha$
Usually converge faster than gradient descent
Disadvantage: more complex

Example:
$$\theta = \begin{bmatrix} \theta_1\\ \theta_2\\ \end{bmatrix}$$

$J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$

$\frac{\partial}{\partial\theta_1}J(\theta) = 2(\theta_1 - 5)$

$\frac{\partial}{\partial\theta_2}J(\theta) = 2(\theta_2 - 5)$
% Returns the cost and its gradient for fminunc.
function [jVal, gradient] = costFunction(theta)
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;   % cost J(theta)
  gradient = zeros(2, 1);
  gradient(1) = 2*(theta(1) - 5);               % dJ/dtheta_1
  gradient(2) = 2*(theta(2) - 5);               % dJ/dtheta_2
end

% Tell fminunc that a gradient is provided and cap iterations at 100.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
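The same pattern applies to logistic regression: wrap $J(\theta)$ and its gradient in one function and hand it to fminunc (a sketch with my own names lrCostFunction, X, y; assumes the sigmoid function above):

% Cost and gradient for logistic regression, in the form fminunc expects.
function [jVal, grad] = lrCostFunction(theta, X, y)
  m = length(y);
  h = sigmoid(X * theta);
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));
  grad = (1/m) * X' * (h - y);
end

% options = optimset('GradObj', 'on', 'MaxIter', 100);
% [theta, cost] = fminunc(@(t) lrCostFunction(t, X, y), zeros(n+1, 1), options);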

Multiclass Classification: One-vs-all

One-vs-all (one-vs-rest)

$h_\theta^{(i)}(x) = P(y = i \mid x;\theta)\quad(i = 1,2,3)$

Given a new input $x$, pick the class $i$ that maximizes:

$\max\limits_{i} h_\theta^{(i)}(x)$
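A minimal one-vs-all prediction sketch in Octave (my own illustration; all_theta holds one row of parameters per class, each trained separately with any of the optimizers above):

% Predict the class for each example in X using K one-vs-all classifiers.
% all_theta: K x (n+1), row i is theta for the "class i vs rest" classifier.
function p = predictOneVsAll(all_theta, X)
  probs = sigmoid(X * all_theta');   % m x K matrix of h_theta^(i)(x)
  [~, p] = max(probs, [], 2);        % pick the class with the highest probability
end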