当前位置: 首页 > news >正文

机器学习笔记之高斯混合模型(四)EM算法求解高斯混合模型(M步操作)

机器学习笔记之高斯混合模型——EM算法求解高斯混合模型【M步操作】

  • 引言
    • 回顾:EM算法求解高斯混合模型的E步操作
    • EM算法M步操作
      • 求解过程

引言

上一节介绍了使用EM算法求解高斯混合模型参数的E步操作,本节将继续介绍后续的M步操作

回顾:EM算法求解高斯混合模型的E步操作

  • 高斯混合模型 P ( X ) P(\mathcal X) P(X)引入隐变量 Z \mathcal Z Z后的表示结果如下:
    P ( X ) = ∑ i = 1 K p k ⋅ N ( X ∣ μ k , Σ k ) P(\mathcal X) = \sum_{i=1}^{\mathcal K} p_k \cdot \mathcal N(\mathcal X \mid \mu_k,\Sigma_k) P(X)=i=1KpkN(Xμk,Σk)
    p k p_k pk表示隐变量 Z \mathcal Z Z选择具体某项离散参数 z k z_k zk的概率分布:
    p k = P ( Z = z k ) p_k = P(\mathcal Z = z_k) pk=P(Z=zk)
    N ( X ∣ μ k , Σ k ) \mathcal N(\mathcal X \mid \mu_k,\Sigma_k) N(Xμk,Σk)表示隐变量 Z = z k \mathcal Z = z_k Z=zk条件下,样本 X \mathcal X X服从均值为 μ k \mu_k μk,协方差为 Σ k \Sigma_k Σk 的高斯分布:
    X ∣ Z = z k ∼ N ( X ∣ μ k , Σ k ) \mathcal X \mid \mathcal Z = z_k \sim \mathcal N(\mathcal X \mid \mu_k,\Sigma_k) XZ=zkN(Xμk,Σk)
  • 对应的联合概率分布 P ( X , Z ) P(\mathcal X,\mathcal Z) P(X,Z)表示如下:
    P ( X , Z ) = P ( Z ) ⋅ P ( X ∣ Z ) = p Z ⋅ N ( X ∣ μ Z , Σ Z ) = ∏ i = 1 N p z ( i ) ⋅ N ( X ∣ μ z ( i ) , Σ z ( i ) ) \begin{aligned} P(\mathcal X,\mathcal Z) & = P(\mathcal Z)\cdot P(\mathcal X \mid \mathcal Z) \\ & = p_{\mathcal Z} \cdot \mathcal N(\mathcal X \mid \mu_{\mathcal Z},\Sigma_{\mathcal Z}) \\ & = \prod_{i=1}^N p_{z^{(i)}} \cdot \mathcal N(\mathcal X \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}}) \end{aligned} P(X,Z)=P(Z)P(XZ)=pZN(XμZ,ΣZ)=i=1Npz(i)N(Xμz(i),Σz(i))
    其中 p z ( i ) p_{z^{(i)}} pz(i)表示 x ( i ) x^{(i)} x(i)属于各高斯分布的概率组成的向量。数学符号表示如下:
    p j ( i ) p_j^{(i)} pj(i)表示样本 x ( i ) x^{(i)} x(i)属于离散常量 z j z_j zj对应概率分布 N ( μ j , Σ j ) \mathcal N(\mu_j,\Sigma_j) N(μj,Σj)的概率结果。
    p z ( i ) = ( p 1 ( i ) , p 2 ( i ) , ⋯   , p K ( i ) ) T p j ( i ) = P ( x ( i ) → z j ) = P ( x ( i ) ∈ N ( μ j , Σ j ) ) ( j = 1 , 2 , ⋯   , K ) p_{z^{(i)}} = \left(p_1^{(i)},p_2^{(i)},\cdots,p_{\mathcal K}^{(i)}\right)^{T} \\ p_{j}^{(i)} = P(x^{(i)} \to z_j) = P(x^{(i)} \in \mathcal N(\mu_j,\Sigma_j)) \quad (j =1,2,\cdots,\mathcal K) pz(i)=(p1(i),p2(i),,pK(i))Tpj(i)=P(x(i)zj)=P(x(i)N(μj,Σj))(j=1,2,,K)
    同理, μ z ( i ) \mu_{z^{(i)}} μz(i)表示 x ( i ) x^{(i)} x(i)属于各高斯分布的期望组成的向量。数学符号表示如下:
    μ j ( i ) \mu_j^{(i)} μj(i)表示 x ( i ) x^{(i)} x(i)属于离散常量 z j z_j zj对应概率分布 N ( μ j , Σ j ) \mathcal N(\mu_j,\Sigma_j) N(μj,Σj)的期望信息。
    μ z ( i ) = ( μ 1 ( i ) , μ 2 ( i ) , ⋯   , μ K ( i ) ) T μ j ( i ) = μ j ∈ x ( i ) ∼ N ( μ j , Σ j ) ( j = 1 , 2 , ⋯   , K ) \mu_{z^{(i)}} = \left(\mu_1^{(i)},\mu_2^{(i)},\cdots,\mu_{\mathcal K}^{(i)}\right)^{T} \\ \mu_j^{(i)} = \mu_j \in x^{(i)} \sim \mathcal N(\mu_j,\Sigma_j) \quad (j=1,2,\cdots,\mathcal K) μz(i)=(μ1(i),μ2(i),,μK(i))Tμj(i)=μjx(i)N(μj,Σj)(j=1,2,,K)
    Σ z ( i ) \Sigma_{z^{(i)}} Σz(i)表示 x ( i ) x^{(i)} x(i)属于各高斯分布的协方差组成的向量。数学符号表示如下:
    Σ j ( i ) \Sigma_j^{(i)} Σj(i)表示 x ( i ) x^{(i)} x(i)属于离散常量 z j z_j zj对应概率分布 N ( μ j , Σ j ) \mathcal N(\mu_j,\Sigma_j) N(μj,Σj)的协方差信息。
    Σ z ( i ) = ( Σ 1 ( i ) , Σ 2 ( i ) , ⋯   , Σ K ( i ) ) T Σ j ( i ) = Σ j ∈ x ( i ) ∼ N ( μ j , Σ j ) ( j = 1 , 2 , ⋯   , K ) \Sigma_{z^{(i)}} = \left(\Sigma_1^{(i)},\Sigma_2^{(i)},\cdots,\Sigma_{\mathcal K}^{(i)}\right)^{T} \\ \Sigma_j^{(i)} = \Sigma_j \in x^{(i)} \sim \mathcal N(\mu_j,\Sigma_j) \quad (j=1,2,\cdots,\mathcal K) Σz(i)=(Σ1(i),Σ2(i),,ΣK(i))TΣj(i)=Σjx(i)N(μj,Σj)(j=1,2,,K)
  • 关于隐变量的后验概率 P ( Z ∣ X ) P(\mathcal Z \mid \mathcal X) P(ZX)表示如下:
    P ( Z ∣ X ) = P ( X , Z ) P ( X ) = ∏ i = 1 N p z ( i ) ⋅ N ( X ∣ μ z ( i ) , Σ z ( i ) ) ∑ i = 1 K p k ⋅ N ( X ∣ μ k , Σ k ) \begin{aligned} P(\mathcal Z \mid \mathcal X) & = \frac{P(\mathcal X,\mathcal Z)}{P(\mathcal X)} \\ & = \frac{\prod_{i=1}^Np_{z^{(i)}} \cdot \mathcal N(\mathcal X \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})}{\sum_{i=1}^{\mathcal K} p_k \cdot \mathcal N(\mathcal X \mid \mu_k,\Sigma_k)} \end{aligned} P(ZX)=P(X)P(X,Z)=i=1KpkN(Xμk,Σk)i=1Npz(i)N(Xμz(i),Σz(i))
  • 至此,E步操作表示如下:
    E Z ∣ X , θ ( t ) [ log ⁡ P ( X , Z ∣ θ ) ] \mathbb E_{\mathcal Z \mid \mathcal X,\theta^{(t)}} \left[\log P(\mathcal X,\mathcal Z \mid \theta)\right] EZX,θ(t)[logP(X,Zθ)]是关于 θ , θ ( t ) \theta,\theta^{(t)} θ,θ(t)的函数。即:
    L ( θ , θ ( t ) ) = E Z ∣ X , θ ( t ) [ log ⁡ P ( X , Z ∣ θ ) ] = ∫ Z P ( Z ∣ X , θ ( t ) ) log ⁡ P ( X , Z ∣ θ ) d Z \begin{aligned} \mathcal L(\theta,\theta^{(t)}) & = \mathbb E_{\mathcal Z \mid \mathcal X,\theta^{(t)}} \left[\log P(\mathcal X,\mathcal Z \mid \theta)\right] \\ & = \int_{\mathcal Z} P(\mathcal Z \mid \mathcal X,\theta^{(t)})\log P(\mathcal X,\mathcal Z \mid \theta) d\mathcal Z \end{aligned} L(θ,θ(t))=EZX,θ(t)[logP(X,Zθ)]=ZP(ZX,θ(t))logP(X,Zθ)dZ
    经过E步的求解过程,求得最终表示结果如下:
    L ( θ , θ ( t ) ) = ∑ i = 1 N ∑ z ( i ) [ log ⁡ p z ( i ) ⋅ N ( x ( i ) ∣ μ z ( i ) , Σ z ( i ) ) ] ⋅ p z ( i ) ⋅ N ( x ( i ) ∣ μ z ( i ) , Σ z ( i ) ) ∑ k = 1 K p k ⋅ N ( x ( i ) ∣ μ k , Σ k ) \mathcal L(\theta,\theta^{(t)}) = \sum_{i=1}^N \sum_{z^{(i)}} \left[\log p_{z^{(i)}} \cdot\mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})\right] \cdot \frac{p_{z^{(i)}} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})}{\sum_{k=1}^{\mathcal K} p_k \cdot \mathcal N(x^{(i)} \mid \mu_k,\Sigma_k)} L(θ,θ(t))=i=1Nz(i)[logpz(i)N(x(i)μz(i),Σz(i))]k=1KpkN(x(i)μk,Σk)pz(i)N(x(i)μz(i),Σz(i))

EM算法M步操作

重新观察E步结果:
L ( θ , θ ( t ) ) = ∑ i = 1 N ∑ z ( i ) [ log ⁡ p z ( i ) ⋅ N ( x ( i ) ∣ μ z ( i ) , Σ z ( i ) ) ] ⋅ p z ( i ) ⋅ N ( x ( i ) ∣ μ z ( i ) , Σ z ( i ) ) ∑ k = 1 K p k ⋅ N ( x ( i ) ∣ μ k , Σ k ) \mathcal L(\theta,\theta^{(t)}) = \sum_{i=1}^N \sum_{z^{(i)}} \left[\log p_{z^{(i)}} \cdot\mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})\right] \cdot \frac{p_{z^{(i)}} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})}{\sum_{k=1}^{\mathcal K} p_k \cdot \mathcal N(x^{(i)} \mid \mu_k,\Sigma_k)} L(θ,θ(t))=i=1Nz(i)[logpz(i)N(x(i)μz(i),Σz(i))]k=1KpkN(x(i)μk,Σk)pz(i)N(x(i)μz(i),Σz(i))

  • 其中 P ( X , Z ∣ θ ) P(\mathcal X,\mathcal Z \mid \theta) P(X,Zθ)部分表示如下:
    log ⁡ P ( x ( i ) , z ( i ) ∣ θ ) = log ⁡ [ p z ( i ) ⋅ N ( x ( i ) ∣ μ z ( i ) , Σ z ( i ) ) ] \log P(x^{(i)},z^{(i)} \mid \theta) = \log \left[p_{z^{(i)}} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})\right] logP(x(i),z(i)θ)=log[pz(i)N(x(i)μz(i),Σz(i))]
    θ \theta θ具体指(共三项):
    p z ( i ) , μ z ( i ) , Σ z ( i ) ( i = 1 , 2 , ⋯   , N ) p_{z^{(i)}},\mu_{z^{(i)}},\Sigma_{z^{(i)}} \quad (i=1,2,\cdots,N) pz(i),μz(i),Σz(i)(i=1,2,,N)

  • P ( Z ∣ X , θ ( t ) ) P(\mathcal Z \mid \mathcal X,\theta^{(t)}) P(ZX,θ(t))部分表示如下:
    P ( z ( i ) ∣ z ( i ) , θ ( t ) ) = p z ( i ) ⋅ N ( x ( i ) ∣ μ z ( i ) , Σ z ( i ) ) ∑ k = 1 K p k ⋅ N ( x ( i ) ∣ μ k , Σ k ) P(\mathcal z^{(i)} \mid z^{(i)},\theta^{(t)}) = \frac{p_{z^{(i)}} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})}{\sum_{k=1}^{\mathcal K} p_k \cdot \mathcal N(x^{(i)} \mid \mu_k,\Sigma_k)} P(z(i)z(i),θ(t))=k=1KpkN(x(i)μk,Σk)pz(i)N(x(i)μz(i),Σz(i))
    θ ( t ) \theta^{(t)} θ(t)具体指(共6项):
    p z ( i ) , μ z ( i ) , Σ z ( i ) ( i = 1 , 2 , ⋯   , N ) p k , μ k , Σ k ( k = 1 , 2 , ⋯   , K ) p_{z^{(i)}},\mu_{z^{(i)}},\Sigma_{z^{(i)}} \quad (i=1,2,\cdots,N) \\ p_k,\mu_k,\Sigma_k \quad (k=1,2,\cdots,\mathcal K) pz(i),μz(i),Σz(i)(i=1,2,,N)pk,μk,Σk(k=1,2,,K)
    由于 θ ( t ) \theta^{(t)} θ(t)上一次迭代得到的参数结果,是已知量;因此将 L ( θ , θ ( t ) ) \mathcal L(\theta,\theta^{(t)}) L(θ,θ(t))中的 θ ( t ) \theta^{(t)} θ(t)项修正过来:
    例如 p z ( i ) ( t ) p_{z^{(i)}}^{(t)} pz(i)(t) θ ( t ) \theta^{(t)} θ(t)的一个解,区别于对应 θ \theta θ的解 p z ( i ) p_{z^{(i)}} pz(i)
    L ( θ , θ ( t ) ) = ∑ z ( i ) ∑ i = 1 N [ log ⁡ p z ( i ) ⋅ N ( x ( i ) ∣ μ z ( i ) , Σ z ( i ) ) ] ⋅ p z ( i ) ( t ) ⋅ N ( x ( i ) ∣ μ z ( i ) ( t ) , Σ z ( i ) ( t ) ) ∑ k = 1 K p k ( t ) ⋅ N ( x ( i ) ∣ μ k ( t ) , Σ k ( t ) ) \mathcal L(\theta,\theta^{(t)}) = \sum_{z^{(i)}} \sum_{i=1}^N\left[\log p_{z^{(i)}} \cdot\mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})\right] \cdot \frac{p_{z^{(i)}}^{(t)} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}}^{(t)},\Sigma_{z^{(i)}}^{(t)})}{\sum_{k=1}^{\mathcal K} p_k^{(t)} \cdot \mathcal N(x^{(i)} \mid \mu_k^{(t)},\Sigma_k^{(t)})} L(θ,θ(t))=z(i)i=1N[logpz(i)N(x(i)μz(i),Σz(i))]k=1Kpk(t)N(x(i)μk(t),Σk(t))pz(i)(t)N(x(i)μz(i)(t),Σz(i)(t))
    实际上 p z ( i ) ( t ) ⋅ N ( x ( i ) ∣ μ z ( i ) ( t ) , Σ z ( i ) ( t ) ) ∑ k = 1 K p k ( t ) ⋅ N ( x ( i ) ∣ μ k ( t ) , Σ k ( t ) ) \frac{p_{z^{(i)}}^{(t)} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}}^{(t)},\Sigma_{z^{(i)}}^{(t)})}{\sum_{k=1}^{\mathcal K} p_k^{(t)} \cdot \mathcal N(x^{(i)} \mid \mu_k^{(t)},\Sigma_k^{(t)})} k=1Kpk(t)N(x(i)μk(t),Σk(t))pz(i)(t)N(x(i)μz(i)(t),Σz(i)(t))是由 θ ( t ) \theta^{(t)} θ(t)构成的量,它的结果不会对当前迭代步骤 θ \theta θ的最优值产生影响。因此,在这里将其缩写成 P ( z ( i ) ∣ x ( i ) , θ ( t ) ) P(z^{(i)} \mid x^{(i)},\theta^{(t)}) P(z(i)x(i),θ(t))

  • 由于 z i z_i zi本质上是样本 x ( i ) x^{(i)} x(i)所有可能属于的高斯分布组成的向量,即:
    z ( i ) = ( z 1 ( i ) , z 2 ( i ) , ⋯   , z K ( i ) ) T z^{(i)} = (z_1^{(i)},z_2^{(i)},\cdots,z_{\mathcal K}^{(i)})^{T} z(i)=(z1(i),z2(i),,zK(i))T
    并且对应的 p z ( i ) , μ z ( i ) , Σ z ( i ) p_{z^{(i)}},\mu_{z^{(i)}},\Sigma_{z^{(i)}} pz(i),μz(i),Σz(i)分别表示如下:
    查看详情移步至传送门
    p z ( i ) = ( p 1 ( i ) , p 2 ( i ) , ⋯   , p K ( i ) ) T μ z ( i ) = ( μ 1 ( i ) , μ 2 ( i ) , ⋯   , μ K ( i ) ) T Σ z ( i ) = ( Σ 1 ( i ) , Σ 2 ( i ) , ⋯   , Σ K ( i ) ) T p_{z^{(i)}} = (p_1^{(i)},p_2^{(i)},\cdots,p_{\mathcal K}^{(i)})^{T} \\ \mu_{z^{(i)}} = (\mu_1^{(i)},\mu_2^{(i)},\cdots,\mu_{\mathcal K}^{(i)})^{T} \\ \Sigma_{z^{(i)}} = (\Sigma_1^{(i)},\Sigma_2^{(i)},\cdots,\Sigma_{\mathcal K}^{(i)})^{T} pz(i)=(p1(i),p2(i),,pK(i))Tμz(i)=(μ1(i),μ2(i),,μK(i))TΣz(i)=(Σ1(i),Σ2(i),,ΣK(i))T
    因此, ∑ z ( i ) p z ( i ) \sum_{z^{(i)}} p_{z^{(i)}} z(i)pz(i)分别表示如下:
    ∑ z ( i ) p z ( i ) = ∑ k = 1 K p k ( i ) ∑ z ( i ) μ z ( i ) = ∑ k = 1 K μ k ( i ) ∑ z ( i ) Σ z ( i ) = ∑ k = 1 K Σ k ( i ) \sum_{z^{(i)}} p_{z^{(i)}} = \sum_{k=1}^{\mathcal K} p_k^{(i)} \quad \sum_{z^{(i)}} \mu_{z^{(i)}} = \sum_{k=1}^{\mathcal K} \mu_{k}^{(i)} \quad \sum_{z^{(i)}} \Sigma_{z^{(i)}} = \sum_{k=1}^{\mathcal K} \Sigma_{k}^{(i)} z(i)pz(i)=k=1Kpk(i)z(i)μz(i)=k=1Kμk(i)z(i)Σz(i)=k=1KΣk(i)
    基于上述公式,对 L ( θ , θ ( t ) ) \mathcal L(\theta,\theta^{(t)}) L(θ,θ(t))进行变换:
    L ( θ , θ ( t ) ) = ∑ k = 1 K ∑ i = 1 N log ⁡ [ p k ( i ) ⋅ N ( x ( i ) ∣ μ k ( i ) , Σ k ( i ) ) ] P ( z ( i ) = z k ∣ x ( i ) , θ ( t ) ) = ∑ k = 1 K ∑ i = 1 N [ log ⁡ p k ( i ) + log ⁡ N ( x ( i ) ∣ μ k ( i ) , Σ k ( i ) ) ] P ( z ( i ) = z k ∣ x ( i ) , θ ( t ) ) \begin{aligned} \mathcal L(\theta,\theta^{(t)}) & = \sum_{k=1}^{\mathcal K} \sum_{i=1}^{N} \log \left[p_k^{(i)} \cdot \mathcal N(x^{(i)} \mid \mu_k^{(i)},\Sigma_{k}^{(i)})\right] P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)}) \\ & = \sum_{k=1}^{\mathcal K} \sum_{i=1}^{N} \left[\log p_k^{(i)} + \log \mathcal N(x^{(i)} \mid \mu_{k}^{(i)},\Sigma_k^{(i)})\right]P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)}) \end{aligned} L(θ,θ(t))=k=1Ki=1Nlog[pk(i)N(x(i)μk(i),Σk(i))]P(z(i)=zkx(i),θ(t))=k=1Ki=1N[logpk(i)+logN(x(i)μk(i),Σk(i))]P(z(i)=zkx(i),θ(t))
    我们以求解 p k ( i ) p_k^{(i)} pk(i)在当前时刻的最优解 [ p k ( i ) ] ( t + 1 ) [p_k^{(i)}]^{(t+1)} [pk(i)](t+1)为例。上述式子中只有方括号内第一项包含 p k ( i ) p_k^{(i)} pk(i),因此则有:
    [ p k ( i ) ] ( t + 1 ) = arg ⁡ max ⁡ p k ( i ) ∑ k = 1 K ∑ i = 1 N log ⁡ p k ( i ) P ( z ( i ) = z k ∣ x ( i ) , θ ( t ) ) [p_k^{(i)}]^{(t+1)} = \mathop{\arg\max}\limits_{p_k^{(i)}}\sum_{k=1}^{\mathcal K} \sum_{i=1}^{N} \log p_k^{(i)} P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)}) [pk(i)](t+1)=pk(i)argmaxk=1Ki=1Nlogpk(i)P(z(i)=zkx(i),θ(t))
    并且 p k ( i ) p_k^{(i)} pk(i)是概率结果,因此 p k ( i ) p_k^{(i)} pk(i)需要满足约束条件:
    ∑ k = 1 K p k ( i ) = 1 \sum_{k=1}^{\mathcal K} p_k^{(i)} = 1 k=1Kpk(i)=1
    至此,求解 p z ( i ) p_{z^{(i)}} pz(i)的最优解 p z ( i ) ( t + 1 ) p_{z^{(i)}}^{(t+1)} pz(i)(t+1)即:
    p z ( i ) ( t + 1 ) = ( [ p 1 ( i ) ] ( t + 1 ) , [ p 2 ( i ) ] ( t + 1 ) , ⋯   , [ p K ( i ) ] ( t + 1 ) ) T p_{z^{(i)}}^{(t+1)} = ([p_1^{(i)}]^{(t+1)},[p_2^{(i)}]^{(t+1)},\cdots,[p_{\mathcal K}^{(i)}]^{(t+1)})^{T} pz(i)(t+1)=([p1(i)](t+1),[p2(i)](t+1),,[pK(i)](t+1))T
    其中任意一项 [ p k ( i ) ] ( t + 1 ) ( k = 1 , 2 , ⋯   , K ) [p_k^{(i)}]^{(t+1)}(k=1,2,\cdots,\mathcal K) [pk(i)](t+1)(k=1,2,,K)使用如下优化函数进行表示:
    { arg ⁡ max ⁡ p k ( i ) ∑ k = 1 K ∑ i = 1 N log ⁡ p k ( i ) ⋅ P ( z ( i ) = z k ∣ x ( i ) , θ ( t ) ) s . t . ∑ k = 1 K p k ( i ) = 1 \begin{cases} \mathop{\arg\max}\limits_{p_k^{(i)}}\sum_{k=1}^{\mathcal K} \sum_{i=1}^{N} \log p_k^{(i)} \cdot P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)}) \\ s.t. \quad \sum_{k=1}^{\mathcal K} p_k^{(i)} = 1 \end{cases} pk(i)argmaxk=1Ki=1Nlogpk(i)P(z(i)=zkx(i),θ(t))s.t.k=1Kpk(i)=1

求解过程

使用拉格朗日乘数法进行求解:

  • 构建拉格朗日函数 S ( p k ( i ) , λ ) \mathcal S(p_k^{(i)},\lambda) S(pk(i),λ)
    S ( p k ( i ) , λ ) = ∑ k = 1 K ∑ i = 1 N log ⁡ p k ( i ) ⋅ P ( z ( i ) = z k ∣ x ( i ) , θ ( t ) ) + λ ( ∑ k = 1 K p k ( i ) − 1 ) \mathcal S(p_k^{(i)},\lambda) = \sum_{k=1}^{\mathcal K}\sum_{i=1}^{N} \log p_k^{(i)} \cdot P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)}) + \lambda (\sum_{k=1}^{\mathcal K} p_k^{(i)} - 1) S(pk(i),λ)=k=1Ki=1Nlogpk(i)P(z(i)=zkx(i),θ(t))+λ(k=1Kpk(i)1)
  • 拉格朗日函数 S ( p k ( i ) , λ ) \mathcal S(p_k^{(i)},\lambda) S(pk(i),λ) p k ( i ) p_k^{(i)} pk(i)求偏导:
    观察第一个连加号 ∑ k = 1 K \sum_{k=1}^{\mathcal K} k=1K,只有唯一一个 k k k p k ( i ) p_k^{(i)} pk(i)相关,其余均为常数;
    ∂ S ( p k ( i ) , λ ) ∂ p k ( i ) = ∑ i = 1 N 1 p k ( i ) ⋅ P ( z ( i ) = z k ∣ x ( i ) , θ ( t ) ) + λ \frac{\partial \mathcal S(p_k^{(i)},\lambda)}{\partial p_k^{(i)}} = \sum_{i=1}^N \frac{1}{p_k^{(i)}}\cdot P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)}) +\lambda pk(i)S(pk(i),λ)=i=1Npk(i)1P(z(i)=zkx(i),θ(t))+λ
  • ∂ S ( p k ( i ) , λ ) ∂ p k ( i ) ≜ 0 \frac{\partial \mathcal S(p_k^{(i)},\lambda)}{\partial p_k^{(i)}} \triangleq 0 pk(i)S(pk(i),λ)0,等式两端同乘 p k ( i ) p_k^{(i)} pk(i)
    ∑ i = 1 N P ( z ( i ) = z k ∣ x ( i ) , θ ( t ) ) + p k ( i ) λ = 0 \sum_{i=1}^N P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)}) +p_k^{(i)} \lambda = 0 i=1NP(z(i)=zkx(i),θ(t))+pk(i)λ=0
  • ∀ k ∈ { 1 , 2 , ⋯   , K } \forall k \in \{1,2,\cdots,\mathcal K\} k{1,2,,K},均进行求导并等于0,则有:
    ∑ i = 1 N ∑ k = 1 K P ( z ( i ) = z k ∣ x ( i ) , θ ( t ) ) + ∑ k = 1 K p k ( i ) λ = 0 + ⋯ + 0 = 0 \sum_{i=1}^N\sum_{k=1}^{\mathcal K} P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)}) +\sum_{k=1}^{\mathcal K} p_k^{(i)} \lambda = 0 + \cdots + 0 = 0 i=1Nk=1KP(z(i)=zkx(i),θ(t))+k=1Kpk(i)λ=0++0=0
    其中:
    条件概率密度积分~约束条件~
    ∑ k = 1 K P ( z ( i ) = z k ∣ x ( i ) , θ ( t ) ) = 1 ∑ k = 1 K p k ( i ) = 1 \sum_{k=1}^{\mathcal K} P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)}) = 1 \\ \sum_{k=1}^{\mathcal K} p_k^{(i)} = 1 k=1KP(z(i)=zkx(i),θ(t))=1k=1Kpk(i)=1
    整理得:
    λ = − N \lambda = -N λ=N
    因此,将 λ = − N \lambda = -N λ=N带回原式,则有:
    [ p k ( i ) ] ( t + 1 ) = 1 N ∑ i = 1 N P ( z ( i ) = z k ∣ x ( i ) , θ ( t ) ) [p_k^{(i)}]^{(t+1)} = \frac{1}{N} \sum_{i=1}^N P(z^{(i)} = z_k \mid x^{(i)},\theta^{(t)}) [pk(i)](t+1)=N1i=1NP(z(i)=zkx(i),θ(t))
    基于上式,可以求出 [ p 1 ( i ) ] ( t + 1 ) , [ p 2 ( i ) ] ( t + 1 ) , ⋯   , [ p K ( i ) ] ( t + 1 ) [p_1^{(i)}]^{(t+1)},[p_2^{(i)}]^{(t+1)},\cdots,[p_{\mathcal K}^{(i)}]^{(t+1)} [p1(i)](t+1),[p2(i)](t+1),,[pK(i)](t+1)
    因而最终求解隐变量 z ( i ) z^{(i)} z(i)的后验概率分布结果 p z ( i ) ( t + 1 ) p_{z^{(i)}}^{(t+1)} pz(i)(t+1)
    p z ( i ) ( t + 1 ) = ( [ p 1 ( i ) ] ( t + 1 ) , [ p 2 ( i ) ] ( t + 1 ) , ⋯   , [ p K ( i ) ] ( t + 1 ) ) T p_{z^{(i)}}^{(t+1)} = \left([p_1^{(i)}]^{(t+1)},[p_2^{(i)}]^{(t+1)},\cdots,[p_{\mathcal K}^{(i)}]^{(t+1)}\right)^{T} pz(i)(t+1)=([p1(i)](t+1),[p2(i)](t+1),,[pK(i)](t+1))T

同理,可求出其他隐变量 z ( 1 ) , z ( 2 ) , ⋯   , z ( N ) z^{(1)},z^{(2)},\cdots,z^{(N)} z(1),z(2),,z(N)的离散参数的迭代概率结果。

p z ( 1 ) ( t + 1 ) , p z ( 2 ) ( t + 1 ) , ⋯   , p z ( N ) ( t + 1 ) p_{z^{(1)}}^{(t+1)},p_{z^{(2)}}^{(t+1)},\cdots,p_{z^{(N)}}^{(t+1)} pz(1)(t+1),pz(2)(t+1),,pz(N)(t+1)

至此,高斯混合模型部分介绍结束。下一节将介绍隐马尔可夫模型
推导过程本身不是重点,只需要知道‘高斯混合模型’可以使用‘狭义EM’直接求解即可。高斯混合模型作为最经典的‘概率生成模型’,它的‘生成模型思想’需要留意。

相关参考:
机器学习-高斯混合模型(4)-EM求解-M-Step

相关文章:

  • Jupyter 介绍
  • Code For Better 谷歌开发者之声——Google Play
  • 【好书推荐】程序是怎样跑起来的
  • 关于技术分享及内卷
  • 源码解析Java数组如何选择排序的算法
  • java计算机毕业设计基于安卓Android微信小程序的共享单车租赁系统uniApp
  • TCP 的自然律
  • Cobalt Strike 注入msf会话
  • C语言字符串函数简单介绍
  • MySQL高级篇——日志
  • TiDB在线修改集群配置
  • 搭建TiDB双集群主从复制
  • nodejs+vue+elementui寻医问药网站
  • java计算机毕业设计基于安卓Android微信小程序的应急求救信息发布系统小程序uniAPP
  • 混沌映射与动态学习的自适应樽海鞘群算法-附代码
  • Brief introduction of how to 'Call, Apply and Bind'
  • go语言学习初探(一)
  • Python 反序列化安全问题(二)
  • React-redux的原理以及使用
  • Synchronized 关键字使用、底层原理、JDK1.6 之后的底层优化以及 和ReenTrantLock 的对比...
  • Transformer-XL: Unleashing the Potential of Attention Models
  • Vue--数据传输
  • 跨域
  • 如何使用 OAuth 2.0 将 LinkedIn 集成入 iOS 应用
  • 三栏布局总结
  • 手机端车牌号码键盘的vue组件
  • 推荐一款sublime text 3 支持JSX和es201x 代码格式化的插件
  • 一、python与pycharm的安装
  • 智能合约Solidity教程-事件和日志(一)
  • ​​​​​​​​​​​​​​汽车网络信息安全分析方法论
  • ​业务双活的数据切换思路设计(下)
  • # 学号 2017-2018-20172309 《程序设计与数据结构》实验三报告
  • #传输# #传输数据判断#
  • ()、[]、{}、(())、[[]]等各种括号的使用
  • (1)(1.19) TeraRanger One/EVO测距仪
  • (11)MATLAB PCA+SVM 人脸识别
  • (pt可视化)利用torch的make_grid进行张量可视化
  • (四)Tiki-taka算法(TTA)求解无人机三维路径规划研究(MATLAB)
  • (一)pytest自动化测试框架之生成测试报告(mac系统)
  • (译) 函数式 JS #1:简介
  • (原创)Stanford Machine Learning (by Andrew NG) --- (week 9) Anomaly DetectionRecommender Systems...
  • ..thread“main“ com.fasterxml.jackson.databind.JsonMappingException: Jackson version is too old 2.3.1
  • .net core IResultFilter 的 OnResultExecuted和OnResultExecuting的区别
  • .NET Core日志内容详解,详解不同日志级别的区别和有关日志记录的实用工具和第三方库详解与示例
  • .Net 代码性能 - (1)
  • [Android Pro] listView和GridView的item设置的高度和宽度不起作用
  • [BZOJ] 3262: 陌上花开
  • [BZOJ1010] [HNOI2008] 玩具装箱toy (斜率优化)
  • [bzoj1901]: Zju2112 Dynamic Rankings
  • [bzoj2957]楼房重建
  • [HEOI2013]ALO
  • [IE9] IE9 Beta崩溃问题解决方案
  • [Linux] LVS+Keepalived高可用集群部署
  • [Linux]进程创建➕进程终止
  • [macOS] Mojave10.14 夜神安卓模拟器启动问题