
Machine Learning Notes on Linear Classification: Gaussian Discriminant Analysis (II), Solving for the Optimal Parameters

Introduction

The previous section introduced the modeling strategy of Gaussian discriminant analysis (Gaussian Discriminant Analysis, GDA). In this section we use that strategy to solve for the optimal parameters of the probability distributions.

Review: The Modeling Strategy of Gaussian Discriminant Analysis

Gaussian discriminant analysis is a typical probabilistic generative model. Its core move is to use Bayes' theorem to turn the search for the maximum-a-posteriori label into a maximization of the product of the prior and the class-conditional likelihood:
$$
\begin{aligned}
\hat{\mathcal Y} & = \mathop{\arg\max}\limits_{\mathcal Y \in \{0,1\}} P(\mathcal Y \mid \mathcal X) \\
& \propto \mathop{\arg\max}\limits_{\mathcal Y \in \{0,1\}} P(\mathcal X \mid \mathcal Y)\,P(\mathcal Y)
\end{aligned}
$$
Under the binary-classification assumption, let $\mathcal Y$ follow a Bernoulli distribution, so the prior $P(\mathcal Y)$ has probability mass function:
$$
P(\mathcal Y) = \phi^{\mathcal Y}(1 - \phi)^{1 - \mathcal Y}
$$
where $\phi$ is the probability that $\mathcal Y$ takes label $1$. Given the prior $P(\mathcal Y)$, the class-conditional likelihoods $P(\mathcal X \mid \mathcal Y = 1)$ and $P(\mathcal X \mid \mathcal Y = 0)$ are both assumed Gaussian:
$$
\begin{cases}
\mathcal X \mid \mathcal Y = 1 \sim \mathcal N(\mu_1, \Sigma) \\
\mathcal X \mid \mathcal Y = 0 \sim \mathcal N(\mu_2, \Sigma)
\end{cases}
$$
The two cases can be merged into a single expression:
$$
P(\mathcal X \mid \mathcal Y) = \mathcal N(\mu_1,\Sigma)^{\mathcal Y}\, \mathcal N(\mu_2,\Sigma)^{1 - \mathcal Y}
$$

At this point both the prior $P(\mathcal Y)$ and the likelihood $P(\mathcal X \mid \mathcal Y)$ are fully specified, and together they involve four distribution parameters:
$$
\theta = \{\mu_1, \mu_2, \Sigma, \phi\}
$$
Construct the log-likelihood function $\mathcal L(\theta)$ as follows. Note that it is built from the joint distribution $P(x^{(i)}, y^{(i)})$, not from the conditional likelihood alone:
$$
\begin{aligned}
\mathcal L(\theta) & = \log \prod_{i=1}^N P(x^{(i)}, y^{(i)}) \\
& = \log \prod_{i=1}^N P(x^{(i)} \mid y^{(i)})\,P(y^{(i)}) \\
& = \sum_{i=1}^N \left[\log P(x^{(i)} \mid y^{(i)}) + \log P(y^{(i)})\right]
\end{aligned}
$$
Substituting the distributions above into $\mathcal L(\theta)$:
$$
\begin{aligned}
\mathcal L(\theta) & = \sum_{i=1}^N \left\{\log\left[\mathcal N(\mu_1,\Sigma)^{y^{(i)}}\mathcal N(\mu_2,\Sigma)^{1- y^{(i)}}\right] + \log \left[\phi^{y^{(i)}}(1- \phi)^{1 - y^{(i)}}\right]\right\} \\
& = \sum_{i=1}^N \left\{\log \left[\mathcal N(\mu_1,\Sigma)^{y^{(i)}}\right] + \log \left[\mathcal N(\mu_2,\Sigma)^{1 - y^{(i)}}\right] + \log \left[\phi^{y^{(i)}}(1- \phi)^{1 - y^{(i)}}\right]\right\}
\end{aligned}
$$
Finally, the model parameters $\theta$ are obtained by maximum likelihood estimation:
$$
\hat{\theta} = \mathop{\arg\max}\limits_{\theta} \mathcal L(\theta)
$$
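As a concrete companion to the derivation, the short sketch below (not part of the original notes) builds a toy two-class data set and evaluates the joint log-likelihood $\mathcal L(\theta)$ with NumPy/SciPy. All names, shapes, and parameter values here are illustrative assumptions, and the later snippets in this post continue from these variables.

```python
# Minimal sketch (assumed toy example) of the GDA joint log-likelihood L(theta).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Toy 2-D, two-class data: labels y in {0, 1}, features X of shape (N, p).
N, p = 200, 2
y = rng.integers(0, 2, size=N)
X = np.where(y[:, None] == 1,
             rng.normal(loc=2.0, scale=1.0, size=(N, p)),
             rng.normal(loc=-2.0, scale=1.0, size=(N, p)))

def log_likelihood(X, y, mu1, mu2, Sigma, phi):
    """L(theta) = sum_i [ log P(x_i | y_i) + log P(y_i) ] for the GDA model above."""
    log_prior = y * np.log(phi) + (1 - y) * np.log(1 - phi)    # Bernoulli prior term
    log_n1 = multivariate_normal(mu1, Sigma).logpdf(X)         # log N(mu1, Sigma), used when y = 1
    log_n2 = multivariate_normal(mu2, Sigma).logpdf(X)         # log N(mu2, Sigma), used when y = 0
    return np.sum(y * log_n1 + (1 - y) * log_n2 + log_prior)

# Example evaluation at an arbitrary (non-optimal) parameter guess.
print(log_likelihood(X, y, np.zeros(p), np.zeros(p), np.eye(p), 0.5))
```

The closed-form estimates derived below should maximize this function for the given data.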

Solution Procedure

L ( θ ) \mathcal L(\theta) L(θ)完全展开,表示如下:
L ( θ ) = ∑ i = 1 N log ⁡ [ N ( μ 1 , Σ ) y ( i ) ] + ∑ i = 1 N log ⁡ [ N ( μ 2 , Σ ) 1 − y ( i ) ] + ∑ i = 1 N log ⁡ [ ϕ y ( i ) ( 1 − ϕ ) 1 − y ( i ) ] \mathcal L(\theta) = \sum_{i=1}^N \log \left[\mathcal N(\mu_1,\Sigma)^{y^{(i)}}\right] + \sum_{i=1}^N \log \left[\mathcal N(\mu_2,\Sigma)^{1 - y^{(i)}}\right] + \sum_{i=1}^N \log \left[\phi^{y^{(i)}}(1- \phi)^{1 - y^{(i)}}\right] L(θ)=i=1Nlog[N(μ1,Σ)y(i)]+i=1Nlog[N(μ2,Σ)1y(i)]+i=1Nlog[ϕy(i)(1ϕ)1y(i)]

Solving for the Optimal Prior Parameter $\phi$

The expansion of $\mathcal L(\theta)$ contains three terms, and only the last one involves $\phi$, so:
$$
\begin{aligned}
\hat{\phi} & = \mathop{\arg\max}\limits_{\phi} \mathcal L(\theta) \\
& = \mathop{\arg\max}\limits_{\phi} \sum_{i=1}^N \log \left[\phi^{y^{(i)}}(1- \phi)^{1 - y^{(i)}}\right]
\end{aligned}
$$
Expanding the logarithm:
$$
\begin{aligned}
\hat \phi & = \mathop{\arg\max}\limits_{\phi} \sum_{i=1}^N\left[\log \phi^{y^{(i)}} + \log (1 - \phi)^{1 - y^{(i)}}\right] \\
& = \mathop{\arg\max}\limits_{\phi} \sum_{i=1}^N\left[y^{(i)} \log \phi + (1 - y^{(i)})\log(1 - \phi)\right]
\end{aligned}
$$
Since $\phi$ is the only parameter here, let $\mathcal L(\phi) = \sum_{i=1}^N\left[y^{(i)} \log \phi + (1 - y^{(i)})\log(1 - \phi)\right]$ and differentiate with respect to $\phi$. Because the denominator does not depend on $i$, the summation can be restricted to the numerator:
$$
\begin{aligned}
\frac{\partial \mathcal L(\phi)}{\partial \phi} & = \sum_{i=1}^N \frac{y^{(i)}(1 - \phi) - \phi(1 - y^{(i)})}{\phi(1 - \phi)} \\
& = \frac{\sum_{i=1}^N\left[y^{(i)}(1 - \phi) - \phi(1 - y^{(i)})\right]}{\phi(1 - \phi)}
\end{aligned}
$$
∂ L ( ϕ ) ∂ ϕ ≜ 0 \frac{\partial \mathcal L(\phi)}{\partial \phi} \triangleq 0 ϕL(ϕ)0,则有分子为0
∑ i = 1 N [ y ( i ) ( 1 − ϕ ) − ϕ ( 1 − y ( i ) ) ] = 0 ϕ ^ = 1 N ∑ i = 1 N y ( i ) \sum_{i=1}^N \left[y^{(i)}(1 - \phi) - \phi(1 - y^{(i)})\right] = 0 \\ \hat \phi = \frac{1}{N} \sum_{i=1}^N y^{(i)} i=1N[y(i)(1ϕ)ϕ(1y(i))]=0ϕ^=N1i=1Ny(i)
Since $y^{(i)} \in \{0,1\}$, $\hat \phi$ is simply the fraction of samples with label $1$. Writing $N_1 = \sum_{i=1}^N y^{(i)}$:
$$
\hat \phi = \frac{N_1}{N}
$$
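As a quick, purely illustrative sanity check (continuing the toy `X`, `y` from the sketch above), the closed-form $\hat\phi = N_1/N$ should agree with a brute-force search over the Bernoulli part of the log-likelihood:

```python
# phi_hat = N1 / N: the fraction of label-1 samples.
phi_hat = y.mean()

# Brute-force check: the Bernoulli log-likelihood is maximized near phi_hat.
grid = np.linspace(0.01, 0.99, 99)
bern_ll = [np.sum(y * np.log(g) + (1 - y) * np.log(1 - g)) for g in grid]
print(phi_hat, grid[np.argmax(bern_ll)])   # the two values should be close
```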

Solving for the Optimal Mean Parameters $\mu$ of the Likelihoods

Solving for $\hat{\mu}_1$

The two class-conditional likelihoods have different mean parameters, so we take $\mu_1$ as the example and solve for the optimal $\hat{\mu}_1$. Among the three terms in the expansion of $\mathcal L(\theta)$, only the first involves $\mu_1$, so:
$$
\begin{aligned}
\hat{\mu}_1 & = \mathop{\arg\max}\limits_{\mu_1} \mathcal L(\theta) \\
& = \mathop{\arg\max}\limits_{\mu_1} \sum_{i=1}^N \log \left[\mathcal N(\mu_1,\Sigma)^{y^{(i)}}\right] \\
& = \mathop{\arg\max}\limits_{\mu_1} \sum_{i=1}^N y^{(i)} \log \mathcal N(\mu_1,\Sigma)
\end{aligned}
$$
Since $\mathcal N(\mu_1,\Sigma)$ is a $p$-dimensional Gaussian, its density evaluated at $x^{(i)}$ is:
$$
\mathcal N(\mu_1,\Sigma) = \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left[-\frac{1}{2}(x^{(i)} - \mu_1)^{T}\Sigma^{-1}(x^{(i)} - \mu_1)\right]
$$
where $|\Sigma|$ denotes the determinant of the covariance matrix $\Sigma$. Substituting the density into the expression above gives:
$$
\begin{aligned}
\hat{\mu}_1 & = \mathop{\arg\max}\limits_{\mu_1} \sum_{i=1}^N y^{(i)} \log \left[\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x^{(i)} - \mu_1)^{T}\Sigma^{-1}(x^{(i)} - \mu_1)}\right] \\
& = \mathop{\arg\max}\limits_{\mu_1} \sum_{i=1}^N \left\{ y^{(i)} \log \left[\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\right] + y^{(i)} \left[-\frac{1}{2}(x^{(i)} - \mu_1)^{T}\Sigma^{-1}(x^{(i)} - \mu_1)\right]\right\}
\end{aligned}
$$
Since we are solving for $\hat{\mu}_1$ only, the term $y^{(i)} \log \left[\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\right]$ can be treated as a constant. Let $\mathcal L(\mu_1) = \sum_{i=1}^N y^{(i)} \left[-\frac{1}{2}(x^{(i)} - \mu_1)^{T}\Sigma^{-1}(x^{(i)} - \mu_1)\right]$ and expand:
$$
\begin{aligned}
\mathcal L(\mu_1) & = -\frac{1}{2} \sum_{i=1}^N y^{(i)}\left({x^{(i)}}^{T}\Sigma^{-1} - \mu_1^{T}\Sigma^{-1}\right)\left(x^{(i)} - \mu_1\right) \\
& = -\frac{1}{2} \sum_{i=1}^N y^{(i)}\left({x^{(i)}}^{T}\Sigma^{-1}x^{(i)} - \mu_1^{T}\Sigma^{-1}x^{(i)} - {x^{(i)}}^{T}\Sigma^{-1}\mu_1 + \mu_1^{T} \Sigma^{-1}\mu_1\right)
\end{aligned}
$$
Consider the two cross terms $\mu_1^{T}\Sigma^{-1}x^{(i)}$ and ${x^{(i)}}^{T}\Sigma^{-1}\mu_1$. Since $x^{(i)}$ and $\mu_1$ are $p$-dimensional column vectors and $\Sigma^{-1}$ is a $p \times p$ matrix, both terms are scalars; a scalar equals its own transpose, and $\Sigma^{-1}$ is symmetric, so:
$$
\mu_1^{T}\Sigma^{-1}x^{(i)} = {x^{(i)}}^{T}\Sigma^{-1}\mu_1 \in \mathbb R
$$
Merging the two cross terms:
$$
\mathcal L(\mu_1) = -\frac{1}{2} \sum_{i=1}^N y^{(i)}\left({x^{(i)}}^{T}\Sigma^{-1}x^{(i)} - 2\mu_1^{T}\Sigma^{-1}x^{(i)} + \mu_1^{T} \Sigma^{-1}\mu_1\right)
$$
μ 1 \mu_1 μ1求导:
需要学习‘矩阵论’的矩阵求导~
∂ ( μ 1 T Σ − 1 μ 1 ) ∂ μ 1 = 2 Σ − 1 μ 1 ∂ L ( μ 1 ) ∂ μ 1 = 1 2 ∑ i = 1 N y ( i ) ( − 2 Σ − 1 x ( i ) + 2 Σ − 1 μ 1 ) = ∑ i = 1 N y ( i ) ( − Σ − 1 x ( i ) + Σ − 1 μ 1 ) = ∑ i = 1 N y ( i ) Σ − 1 ( − x ( i ) + μ 1 ) \frac{\partial(\mu_1^{T} \Sigma^{-1}\mu_1)}{\partial \mu_1} = 2\Sigma^{-1}\mu_1 \\ \begin{aligned}\frac{\partial \mathcal L(\mu_1)}{\partial \mu_1} & = \frac{1}{2} \sum_{i=1}^N y^{(i)}(-2 \Sigma^{-1}x^{(i)} + 2\Sigma^{-1}\mu_1) \\ & = \sum_{i=1}^N y^{(i)}(-\Sigma^{-1}x^{(i)} + \Sigma^{-1}\mu_1) \\ & = \sum_{i=1}^N y^{(i)}\Sigma^{-1}(-x^{(i)} + \mu_1)\end{aligned} μ1(μ1TΣ1μ1)=2Σ1μ1μ1L(μ1)=21i=1Ny(i)(2Σ1x(i)+2Σ1μ1)=i=1Ny(i)(Σ1x(i)+Σ1μ1)=i=1Ny(i)Σ1(x(i)+μ1)
∂ L ( μ 1 ) ∂ μ 1 ≜ 0 \frac{\partial \mathcal L(\mu_1)}{\partial \mu_1} \triangleq 0 μ1L(μ1)0,则有:
Σ − 1 [ ∑ i = 1 N y ( i ) ( − x ( i ) + μ 1 ) ] = 0 ∑ i = 1 N y ( i ) μ 1 = ∑ i = 1 N y ( i ) x ( i ) μ 1 ^ = ∑ i = 1 N y ( i ) x ( i ) ∑ i = 1 N y ( i ) \begin{aligned} \Sigma^{-1}\left[\sum_{i=1}^N y^{(i)}(-x^{(i)} + \mu_1)\right] = 0 \\ \sum_{i=1}^N y^{(i)}\mu_1 = \sum_{i=1}^N y^{(i)}x^{(i)} \\ \hat {\mu_1} = \frac{\sum_{i=1}^N y^{(i)}x^{(i)}}{\sum_{i=1}^N y^{(i)}} \quad \quad \\ \end{aligned} Σ1[i=1Ny(i)(x(i)+μ1)]=0i=1Ny(i)μ1=i=1Ny(i)x(i)μ1^=i=1Ny(i)i=1Ny(i)x(i)
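Continuing the toy example (a hedged sketch, not the author's code): $\hat{\mu}_1$ is exactly the indicator-weighted average above, and the gradient $\sum_i y^{(i)}\Sigma^{-1}(x^{(i)} - \mu_1)$ indeed vanishes at it for any positive-definite $\Sigma$.

```python
# mu1_hat: weighted average of the samples with label 1.
mu1_hat = (y[:, None] * X).sum(axis=0) / y.sum()
assert np.allclose(mu1_hat, X[y == 1].mean(axis=0))       # same as the class-1 sample mean

# Gradient check: sum_i y_i * Sigma^{-1} (x_i - mu1_hat) should be ~0 for any PD Sigma.
Sigma_any = np.cov(X.T)                                    # an arbitrary positive-definite p x p matrix
grad = np.linalg.inv(Sigma_any) @ (y[:, None] * (X - mu1_hat)).sum(axis=0)
assert np.allclose(grad, 0.0)
```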

Solving for $\hat{\mu}_2$

By the same argument, the only difference for $\mu_2$ is that the exponent becomes $1 - y^{(i)}$:
$$
\hat{\mu}_2 = \mathop{\arg\max}\limits_{\mu_2} \sum_{i=1}^N \left\{(1 - y^{(i)}) \log \left[\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\right] + (1 - y^{(i)})\left[-\frac{1}{2}(x^{(i)} - \mu_2)^{T}\Sigma^{-1}(x^{(i)} - \mu_2)\right] \right\}
$$
The intermediate steps mirror those for $\mu_1$ and are omitted; compared with the $\mu_1$ derivation, $y^{(i)}$ is simply replaced by $1 - y^{(i)}$. The optimal $\hat{\mu}_2$ is therefore:
$$
\hat{\mu}_2 = \frac{\sum_{i=1}^N(1 - y^{(i)})x^{(i)}}{\sum_{i=1}^N(1 - y^{(i)})}
$$
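Symmetrically, on the same toy data:

```python
# mu2_hat: weighted average of the samples with label 0 (weights 1 - y_i).
mu2_hat = ((1 - y)[:, None] * X).sum(axis=0) / (1 - y).sum()
assert np.allclose(mu2_hat, X[y == 0].mean(axis=0))   # same as the class-0 sample mean
```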

Solving for the Optimal Covariance Parameter $\Sigma$ of the Likelihoods

Setup

To solve for $\hat \Sigma$, first split the sample set by label:
$$
\mathcal X_1 = \{x^{(i)} \mid y^{(i)} = 1\}, \qquad
\mathcal X_2 = \{x^{(i)} \mid y^{(i)} = 0\}, \qquad i = 1,2,\cdots,N
$$
Denote the sizes of $\mathcal X_1$ and $\mathcal X_2$ by $N_1$ and $N_2$ respectively. The split satisfies:
$$
N_1 + N_2 = N, \qquad \mathcal X_1 \cup \mathcal X_2 = \mathcal X
$$
The overall sample mean $\mu_{\mathcal X}$, the per-class means $\mu_{\mathcal X_i}$, and the per-class covariance matrices $\mathcal S_i$ are defined as:
$$
\begin{aligned}
\mu_{\mathcal X} & = \frac{1}{N} \sum_{i=1}^N x^{(i)} \\
\mu_{\mathcal X_i} & = \frac{1}{N_i} \sum_{x^{(j)} \in \mathcal X_i} x^{(j)} \quad (i=1,2) \\
\mathcal S_i & = \frac{1}{N_i} \sum_{x^{(j)} \in \mathcal X_i}\left(x^{(j)} - \mu_{\mathcal X_i}\right)\left(x^{(j)} - \mu_{\mathcal X_i}\right)^{T} \quad (i=1,2)
\end{aligned}
$$
With this notation, the optimal means $\hat{\mu}_1, \hat{\mu}_2$ can be simplified further:
$$
\begin{aligned}
\hat{\mu}_1 & = \frac{\sum_{i=1}^N y^{(i)}x^{(i)}}{\sum_{i=1}^N y^{(i)}} = \frac{\sum_{x^{(j)} \in \mathcal X_1} x^{(j)}}{N_1} = \mu_{\mathcal X_1}\\
\hat{\mu}_2 & = \frac{\sum_{i=1}^N(1 - y^{(i)})x^{(i)}}{\sum_{i=1}^N(1 - y^{(i)})} = \frac{N \cdot \mu_{\mathcal X} - N_1 \cdot \mu_{\mathcal X_1}}{N - N_1} = \frac{N \cdot \mu_{\mathcal X} - N_1 \cdot \mu_{\mathcal X_1}}{N_2} = \mu_{\mathcal X_2}
\end{aligned}
$$
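The simplification $\hat{\mu}_2 = (N\mu_{\mathcal X} - N_1\mu_{\mathcal X_1})/N_2$ is easy to confirm numerically on the running toy example (illustrative check only):

```python
N1, N2 = int(y.sum()), int((1 - y).sum())                  # class sizes, N1 + N2 = N
mu_X = X.mean(axis=0)                                      # overall sample mean
mu_X1 = X[y == 1].mean(axis=0)                             # class-1 sample mean
assert np.allclose(mu2_hat, (N * mu_X - N1 * mu_X1) / N2)  # matches the simplified form
```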

Derivation

Looking again at the expansion of $\mathcal L(\theta)$, only the first and second terms involve $\Sigma$. Define $\mathcal L(\Sigma)$ as:
$$
\mathcal L(\Sigma) = \sum_{i=1}^N \log \left[\mathcal N(\mu_1,\Sigma)^{y^{(i)}}\right] + \sum_{i=1}^N \log \left[\mathcal N(\mu_2,\Sigma)^{1 - y^{(i)}}\right]
$$
Take the first term $\sum_{i=1}^N \log \left[\mathcal N(\mu_1,\Sigma)^{y^{(i)}}\right]$ as an example: whenever $y^{(i)} = 0$ the summand $\log \left[\mathcal N(\mu_1,\Sigma)^{y^{(i)}}\right]$ equals $0$, so both sums contain many zero terms. Dropping all of these zero terms and using the split from the setup above, the expression simplifies to:
$$
\mathcal L(\Sigma) = \sum_{x^{(j)} \in \mathcal X_1} \log \mathcal N(\mu_1,\Sigma) + \sum_{x^{(j)} \in \mathcal X_2} \log \mathcal N(\mu_2, \Sigma)
$$
Take one of the two sums, say $\sum_{x^{(j)} \in \mathcal X_1} \log \mathcal N(\mu_1,\Sigma)$, substitute the density, and expand:
$$
\begin{aligned}
\sum_{x^{(j)} \in \mathcal X_1} \log \mathcal N(\mu_1,\Sigma) & = \sum_{x^{(j)} \in \mathcal X_1} \log \left\{\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(x^{(j)}-\mu_1)^{T}\Sigma^{-1}(x^{(j)} - \mu_1)}\right\} \\
& = \sum_{x^{(j)} \in \mathcal X_1} \log\left[\frac{1}{(2\pi)^{\frac{p}{2}}}\right] + \sum_{x^{(j)} \in \mathcal X_1} \log \left[|\Sigma|^{-\frac{1}{2}}\right] + \sum_{x^{(j)} \in \mathcal X_1} \left[-\frac{1}{2} (x^{(j)} - \mu_1)^{T} \Sigma^{-1}(x^{(j)} - \mu_1)\right]
\end{aligned}
$$
Of the three resulting sums, the first does not involve $\Sigma$ and can be treated as a constant, while the second equals $-\frac{1}{2}N_1 \log|\Sigma|$. Now look closely at the third:
$$
-\frac{1}{2} \sum_{x^{(j)} \in\mathcal X_1}(x^{(j)} - \mu_1)^{T} \Sigma^{-1}(x^{(j)} - \mu_1)
$$
Since $x^{(j)}$ and $\mu_1$ are both $p$-dimensional vectors, $(x^{(j)} - \mu_1)^{T}$ has dimension $1 \times p$, $\Sigma^{-1}$ (the inverse of the covariance matrix) is a $p \times p$ matrix, and $(x^{(j)} - \mu_1)$ has dimension $p \times 1$. Therefore $(x^{(j)} - \mu_1)^{T} \Sigma^{-1}(x^{(j)} - \mu_1)$ is a real number, which can also be viewed as a $1 \times 1$ matrix. This lets us introduce the trace from linear algebra, written $\mathrm{tr}$: the trace of a $1 \times 1$ matrix is just the scalar itself. The third term can thus be written as:
$$
-\frac{1}{2}\sum_{x^{(j)} \in \mathcal X_1} \mathrm{tr}\left[(x^{(j)} - \mu_1)^{T}\Sigma^{-1}(x^{(j)} - \mu_1)\right]
$$
Using the cyclic property of the trace (for matrices $A, B, C$ whose products are defined and square, $\mathrm{tr}(ABC) = \mathrm{tr}(CAB) = \mathrm{tr}(BCA)$), and noting that each summand is a real number so the sum can be moved inside or outside the trace freely, the expression becomes:
$$
\begin{aligned}
-\frac{1}{2} \sum_{x^{(j)} \in \mathcal X_1} \mathrm{tr}\left[( x^{(j)} - \mu_1)^{T}\Sigma^{-1}(x^{(j)} - \mu_1) \right] & = -\frac{1}{2} \sum_{x^{(j)} \in \mathcal X_1} \mathrm{tr}\left[(x^{(j)} - \mu_1)( x^{(j)} - \mu_1)^{T}\Sigma^{-1}\right] \\
& = -\frac{1}{2}\, \mathrm{tr}\left[\sum_{x^{(j)} \in \mathcal X_1} (x^{(j)} - \mu_1)( x^{(j)} - \mu_1)^{T}\Sigma^{-1}\right]
\end{aligned}
$$
Since $\Sigma^{-1}$ does not depend on $j$, it can be factored out of the sum:
$$
-\frac{1}{2}\, \mathrm{tr}\left[\left(\sum_{x^{(j)} \in \mathcal X_1} (x^{(j)} - \mu_1)( x^{(j)} - \mu_1)^{T}\right)\Sigma^{-1}\right]
$$
Note that $\sum_{x^{(j)} \in \mathcal X_1} (x^{(j)} - \mu_1)( x^{(j)} - \mu_1)^{T}$ differs from the covariance matrix of the label-$1$ samples only by a factor of $N_1$ (recall $\hat{\mu}_1 = \mu_{\mathcal X_1}$). Writing $\mathcal S_1$ for the covariance matrix of the label-$1$ samples and $\mathcal S_2$ for that of the label-$0$ samples, as defined in the setup, the third term becomes:
$$
-\frac{1}{2} N_1 \cdot \mathrm{tr}\left(\mathcal S_1 \Sigma^{-1}\right)
$$
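The whole chain (quadratic forms rewritten as traces, cyclic permutation, and the factor $N_1 \mathcal S_1$) can be checked numerically; the sketch below continues the toy example and reuses `Sigma_any` from the $\hat{\mu}_1$ snippet as an arbitrary positive-definite stand-in for $\Sigma$:

```python
# Numerical check of the trace rewriting, reusing X, y, mu1_hat and Sigma_any
# from the earlier snippets (illustrative only).
Sigma_inv = np.linalg.inv(Sigma_any)            # stand-in for Sigma^{-1}
D = X[y == 1] - mu1_hat                         # rows are (x_j - mu_1) for x_j in X_1
S1 = (D.T @ D) / len(D)                         # class-1 covariance: N_1 * S1 = sum_j (x_j - mu_1)(x_j - mu_1)^T

lhs = np.einsum('ij,jk,ik->', D, Sigma_inv, D)  # sum_j (x_j - mu_1)^T Sigma^{-1} (x_j - mu_1)
rhs = np.trace(len(D) * S1 @ Sigma_inv)         # N_1 * tr(S1 Sigma^{-1})
assert np.isclose(lhs, rhs)
```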
Therefore $\sum_{x^{(j)} \in \mathcal X_1} \log \mathcal N(\mu_1,\Sigma)$ can be written as:
$$
-\frac{1}{2} N_1 \log |\Sigma| - \frac{1}{2} N_1 \cdot \mathrm{tr}\left(\mathcal S_1 \Sigma^{-1}\right) + \mathcal C_1
\qquad \left(\mathcal C_1 = \sum_{x^{(j)} \in \mathcal X_1} \log \left[\frac{1}{(2\pi)^{\frac{p}{2}}}\right]\right)
$$
Similarly, $\sum_{x^{(j)} \in \mathcal X_2} \log \mathcal N(\mu_2,\Sigma)$ can be written as:
$$
-\frac{1}{2} N_2 \log |\Sigma| - \frac{1}{2} N_2 \cdot \mathrm{tr}\left(\mathcal S_2 \Sigma^{-1}\right) + \mathcal C_2
\qquad \left(\mathcal C_2 = \sum_{x^{(j)} \in \mathcal X_2} \log \left[\frac{1}{(2\pi)^{\frac{p}{2}}}\right]\right)
$$
At this point $\mathcal L(\Sigma)$ can be expressed as:
$$
\begin{aligned}
\mathcal L(\Sigma) & = \sum_{x^{(j)} \in \mathcal X_1} \log \mathcal N(\mu_1,\Sigma) + \sum_{x^{(j)} \in \mathcal X_2} \log \mathcal N(\mu_2,\Sigma) \\
& = -\frac{1}{2}(N_1 + N_2) \log |\Sigma| - \frac{1}{2}N_1 \cdot \mathrm{tr}(\mathcal S_1 \Sigma^{-1}) - \frac{1}{2}N_2 \cdot \mathrm{tr}(\mathcal S_2 \Sigma^{-1}) + (\mathcal C_1 + \mathcal C_2) \\
& = -\frac{1}{2} \left[N \log |\Sigma| + N_1 \cdot \mathrm{tr}(\mathcal S_1 \Sigma^{-1}) + N_2 \cdot \mathrm{tr}(\mathcal S_2 \Sigma^{-1})\right] + \mathcal C \qquad (\mathcal C = \mathcal C_1 + \mathcal C_2)
\end{aligned}
$$

Now differentiate with respect to $\Sigma$. Two matrix-derivative identities are needed; both hold for symmetric $\Sigma$ and symmetric $\mathcal S$, which is the case here since covariance matrices are real symmetric matrices:
$$
\frac{\partial \log |\Sigma|}{\partial \Sigma} = \Sigma^{-1}, \qquad
\frac{\partial\, \mathrm{tr}\left(\mathcal S\, \Sigma^{-1}\right)}{\partial \Sigma} = -\,\Sigma^{-1} \mathcal S\, \Sigma^{-1}
$$
Applying them term by term gives:
$$
\frac{\partial \mathcal L(\Sigma)}{\partial \Sigma} = -\frac{1}{2}\left[N \Sigma^{-1} - N_1\, \Sigma^{-1} \mathcal S_1 \Sigma^{-1} - N_2\, \Sigma^{-1} \mathcal S_2 \Sigma^{-1}\right]
$$
∂ L ( Σ ) ∂ Σ ≜ 0 \frac{\partial \mathcal L(\Sigma)}{\partial \Sigma} \triangleq 0 ΣL(Σ)0,则有:
N ⋅ Σ − 1 − N 1 ⋅ S 1 ⋅ Σ − 2 − N 2 ⋅ S 2 ⋅ Σ − 2 = 0 N\cdot \Sigma^{-1} - N_1 \cdot S_1 \cdot \Sigma^{-2} - N_2 \cdot S_2 \cdot \Sigma^{-2} = 0 NΣ1N1S1Σ2N2S2Σ2=0
等式两边同乘 Σ 2 \Sigma^2 Σ2,可得:
N Σ − N 1 S 1 − N 2 S 2 = 0 Σ ^ = N 1 S 1 + N 2 S 2 N N \Sigma - N_1 \mathcal S_1 - N_2 \mathcal S_2 = 0 \\ \hat \Sigma = \frac{N_1\mathcal S_1 + N_2 \mathcal S_2}{N} NΣN1S1N2S2=0Σ^=NN1S1+N2S2
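In code, the pooled estimate is just the size-weighted average of the two per-class biased covariance matrices (normalized by $N_1$ and $N_2$); a sketch continuing the toy example:

```python
# Per-class biased covariance matrices S1, S2 (normalized by N1 and N2, respectively).
S1 = np.cov(X[y == 1].T, bias=True)
S2 = np.cov(X[y == 0].T, bias=True)

# Pooled covariance estimate: Sigma_hat = (N1 * S1 + N2 * S2) / N.
Sigma_hat = (N1 * S1 + N2 * S2) / N
print(Sigma_hat)
```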

Reflection

The shared covariance $\Sigma$ was built into the definition of the two class-conditional likelihoods from the start. Looking at the form of $\hat \Sigma$, this means that in theory $\mathcal S_1$ and $\mathcal S_2$ should coincide: substituting $\mathcal S_1 = \mathcal S_2$ into the formula turns it into an identity ($\hat \Sigma = \mathcal S_1 = \mathcal S_2$). The difference observed in practice simply reflects the randomness of the samples drawn from the two Gaussians.
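Putting the four estimates together, a new point can be classified with the $\arg\max$ rule from the top of the post, $\hat{\mathcal Y} = \arg\max_{\mathcal Y} P(\mathcal X \mid \mathcal Y)P(\mathcal Y)$; the snippet below is again an illustrative sketch on the toy example, not the author's implementation:

```python
def gda_predict(x, mu1, mu2, Sigma, phi):
    """Return 1 if P(x | y=1) P(y=1) > P(x | y=0) P(y=0), compared in log space; else 0."""
    score1 = multivariate_normal(mu1, Sigma).logpdf(x) + np.log(phi)
    score0 = multivariate_normal(mu2, Sigma).logpdf(x) + np.log(1 - phi)
    return int(score1 > score0)

print(gda_predict(np.array([1.5, 1.5]), mu1_hat, mu2_hat, Sigma_hat, phi_hat))   # expected: 1
```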

The next section will introduce another probabilistic generative model: naive Bayes.
