Originally published on my personal CSDN blog:
http://blog.csdn.net/kalenzh/article/details/43817321
Cost Function
Training set:
$\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), ... , (x^{(m)}, y^{(m)})\}$
$m$ training examples
$$x = \begin{bmatrix} x_0\\ x_1\\ ...\\ x_n\\ \end{bmatrix} \space x_0 = 1, \space y \in \{0,1\}$$
$$h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}$$
How do we choose the fitting parameters $\theta$?
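A minimal Octave sketch of this hypothesis (the function name is mine, not from the post; $\theta$ and $x$ are column vectors with $x_0 = 1$):

```matlab
% Sigmoid hypothesis h_theta(x) = 1 / (1 + e^(-theta' * x)).
% theta: (n+1) x 1 parameter vector, x: (n+1) x 1 feature vector with x(1) = 1.
function h = hypothesis(theta, x)
  h = 1 / (1 + exp(-theta' * x));
end
```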
Cost function
Linear regression:
$J(\theta) = \frac{1}{m}\sum\limits_{i = 1}^{m}\frac{1}{2}(h_\theta(x^{(i)}) - y^{(i)})^2$
Logistic regression:
$$Cost(h_\theta(x) , y) =
\begin{cases}
-log(h_\theta(x)) & \text{if}\space y = 1\\
-log(1 - h_\theta(x)) & \text{if}\space y = 0
\end{cases}$$
Note: $y = 0 \ \text{or}\ 1\ \text{always}$
This is easier to understand alongside the function graphs: when $y = 1$, the cost is 0 if $h_\theta(x) = 1$ and grows to infinity as $h_\theta(x) \to 0$, and symmetrically for $y = 0$.
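To reproduce those graphs, a quick Octave sketch (my own code, not from the post):

```matlab
% Plot the two cost curves: -log(h) for y = 1 and -log(1 - h) for y = 0.
h = linspace(0.001, 0.999, 1000);   % possible hypothesis outputs in (0, 1)
plot(h, -log(h), h, -log(1 - h));
legend('y = 1', 'y = 0');
xlabel('h_\theta(x)'); ylabel('Cost');
```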
Simplified cost function and gradient descent
$Cost(h_\theta(x) , y) = -y\space log(h_\theta(x)) - (1 - y)log(1 - h_\theta(x))$

$J(\theta) = \frac{1}{m}\sum\limits_{i = 1}^{m}Cost(h_\theta(x^{(i)}), y^{(i)}) = -\frac{1}{m}\left[\sum\limits_{i = 1}^{m} y^{(i)}log\space h_\theta(x^{(i)}) + (1 - y^{(i)})log(1 - h_\theta(x^{(i)}))\right]$

To fit the parameters $\theta$:
$\min\limits_{\theta}J(\theta)$

To make a prediction for a new input $x$:
Output $h_\theta(x) = \frac{1}{1+e^{-\theta^T x}}$, interpreted as the estimated probability that $y = 1$ given $x$.
We want $\min\limits_{\theta}J(\theta)$. Gradient descent:
Repeat {
$\theta_j := \theta_j -\alpha\frac{\partial}{\partial\theta_j}J(\theta)$
}
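Working out the partial derivative (the same algebra as for linear regression, just with the sigmoid hypothesis), the update rule becomes:

$$\theta_j := \theta_j - \alpha\frac{1}{m}\sum\limits_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$$

Syntactically this is identical to the linear-regression update; only the definition of $h_\theta(x)$ has changed. A vectorized Octave sketch of the cost and its gradient (my own names; $X$ is the $m \times (n+1)$ design matrix with a leading column of ones, $y$ the $m \times 1$ label vector):

```matlab
% Vectorized logistic-regression cost J(theta) and gradient.
% X: m x (n+1) design matrix (first column all ones), y: m x 1 labels in {0,1}.
function [jVal, gradient] = logisticCost(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                          % h_theta(x^(i)) for all i
  jVal = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h)); % cost J(theta)
  gradient = (1 / m) * X' * (h - y);                       % (n+1) x 1 vector of partials
end
```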
Advanced Optimization
Optimization algorithm
Gradient descent
Conjugate gradient
BFGS (variable metric method)
L-BFGS (limited-memory variable metric method)
Advantages of the last three algorithms:
No need to manually pick the learning rate $\alpha$
Usually converge faster than gradient descent
Disadvantage: more complex
Example:
$$\theta =
\begin{bmatrix}
\theta_0\\
\theta_1\\
\end{bmatrix}$$
```matlab
function [jVal, gradient] = costFunction(theta)
```
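Filling this in the way the lecture does (I'm assuming the toy quadratic objective here, since the post only shows the signature) and handing it to Octave's `fminunc` gives a complete, runnable sketch:

```matlab
function [jVal, gradient] = costFunction(theta)
  % Toy objective: J(theta) = (theta_1 - 5)^2 + (theta_2 - 5)^2
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
  gradient = zeros(2, 1);              % column vector of partial derivatives
  gradient(1) = 2 * (theta(1) - 5);    % dJ/d(theta_1)
  gradient(2) = 2 * (theta(2) - 5);    % dJ/d(theta_2)
end
```

Then call the optimizer:

```matlab
options = optimset('GradObj', 'on', 'MaxIter', 100);   % we supply the gradient ourselves
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```

The same pattern works for the logistic-regression cost above, e.g. `fminunc(@(t) logisticCost(t, X, y), initialTheta, options)`.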
Multiclass Classification: One-vs-all
One-vs-all (one-vs-rest)
Train a classifier $h_\theta^{(i)}(x) = P(y = i\ |\ x;\theta)$ for each class $i\space(i = 1,2,3)$. Given a new input $x$, pick the class $i$ that maximizes:
$\max\limits_{i} h_\theta^{(i)}(x)$
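As an illustration of that prediction step (my own sketch, not from the post), assume the fitted parameter vectors are stacked as rows of a matrix `all_theta`, one row per class:

```matlab
% Pick the class with the largest hypothesis value for every example in X.
% all_theta: K x (n+1), one fitted theta per class as a row.
% X: m x n feature matrix (without the intercept column).
function p = predictOneVsAll(all_theta, X)
  m = size(X, 1);
  X = [ones(m, 1) X];                    % prepend the intercept term x_0 = 1
  h = 1 ./ (1 + exp(-X * all_theta'));   % m x K matrix: h_theta^(i)(x) per class
  [~, p] = max(h, [], 2);                % index of the max probability = predicted class
end
```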