
[Probability and Mathematical Statistics (Graduate Course)] Key Points Summary 9: Regression Analysis


Simple linear regression model

$$
\begin{aligned}
&y=\beta_0+\beta_1x+\epsilon,\quad \epsilon \sim N(0, \sigma^2) \\
&E(\epsilon)=0,\ D(\epsilon)=\sigma^2>0 \Longrightarrow E(y)=\beta_0+\beta_1x
\end{aligned}
$$

Regression equation: $\hat{y}=\hat{\beta_0}+\hat{\beta_1}x$

Derivation:

$$
\begin{aligned}
y_i-E(y_i)&=y_i-(\beta_0+\beta_1x_i) \\
Q(\beta_0, \beta_1)&=\sum\limits_{i=1}^{n}(y_i-E(y_i))^2 \\
&=\sum\limits_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2 \\
\text{set }\quad\frac{\partial{Q(\beta_0,\beta_1)}}{\partial{\beta_0}}&=-2\sum\limits_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)=0 \\
\text{set }\quad\frac{\partial{Q(\beta_0,\beta_1)}}{\partial{\beta_1}}&=-2\sum\limits_{i=1}^{n}x_i(y_i-\beta_0-\beta_1x_i)=0
\end{aligned}
$$

Rearranging gives the normal equations:

$$
\begin{aligned}
&n\hat{\beta_0}+n\bar{x}\hat{\beta_1}=n\bar{y}\quad (1)\\
&n\bar{x}\hat{\beta_0}+\Big(\sum\limits^{n}_{i=1}{x_i^2}\Big) \hat{\beta_1} =\sum\limits_{i=1}^{n}x_iy_i \quad (2)
\end{aligned}
$$

Solving this system gives:

$$
\begin{aligned}
&\hat{\beta_1}=\frac{L_{xy}}{L_{xx}} \\
&\hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x} \\
&L_{xx}=\sum\limits_{i=1}^{n}(x_i-\bar{x})^2=\sum\limits_{i=1}^{n}x_i^2-n\bar{x}^2=\sum\limits_{i=1}^{n}x_i^2-\frac{1}{n}\Big(\sum\limits_{i=1}^{n}x_i\Big)^2 \\
&L_{yy}=\sum\limits_{i=1}^{n}(y_i-\bar{y})^2=\sum\limits_{i=1}^{n}y_i^2-n\bar{y}^2=\sum\limits_{i=1}^{n}y_i^2-\frac{1}{n}\Big(\sum\limits_{i=1}^{n}y_i\Big)^2 \\
&L_{xy}=\sum\limits_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})=\sum\limits_{i=1}^{n}x_iy_i-n\bar{x}\bar{y}=\sum\limits_{i=1}^{n}x_iy_i-\frac{1}{n}\sum\limits_{i=1}^{n}x_i \sum\limits_{i=1}^{n}y_i
\end{aligned}
$$
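The elimination step is not written out in the notes. One way to fill it in, using only the two normal equations (1) and (2): solve (1) for the intercept, substitute into (2), and collect the terms in the slope.

$$
\begin{aligned}
(1)\ &\Longrightarrow\ \hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x} \\
(2)\ &\Longrightarrow\ n\bar{x}(\bar{y}-\hat{\beta_1}\bar{x})+\Big(\sum\limits_{i=1}^{n}x_i^2\Big)\hat{\beta_1}=\sum\limits_{i=1}^{n}x_iy_i \\
&\Longrightarrow\ \hat{\beta_1}\Big(\sum\limits_{i=1}^{n}x_i^2-n\bar{x}^2\Big)=\sum\limits_{i=1}^{n}x_iy_i-n\bar{x}\bar{y} \\
&\Longrightarrow\ \hat{\beta_1}=\frac{\sum_{i=1}^{n}x_iy_i-n\bar{x}\bar{y}}{\sum_{i=1}^{n}x_i^2-n\bar{x}^2}
\end{aligned}
$$

The numerator and denominator of the last line are the middle forms of the sums of squares, which is where the ratio of cross-product to squared deviations comes from.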

When a problem gives the data as sums ($\sum$ form), $L_{xx},L_{yy},L_{xy}$ are usually computed with the rightmost expressions in the formulas above.
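As a rough illustration of that shortcut, the sketch below computes the fit purely from summary sums. The function name `fit_from_sums` and the five data points are made up for the example and do not come from the notes.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Least-squares fit from summary sums, using the rightmost forms of L_xx, L_yy, L_xy."""
    l_xx = sum_xx - sum_x ** 2 / n
    l_yy = sum_yy - sum_y ** 2 / n
    l_xy = sum_xy - sum_x * sum_y / n
    beta1 = l_xy / l_xx                      # slope estimate
    beta0 = sum_y / n - beta1 * sum_x / n    # intercept estimate
    return {"Lxx": l_xx, "Lyy": l_yy, "Lxy": l_xy, "beta0": beta0, "beta1": beta1}


# Hypothetical data, used only to produce the sums a problem would normally hand you.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

fit = fit_from_sums(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_xx=sum(x * x for x in xs),
    sum_yy=sum(y * y for y in ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
)
print(fit)
```

For this toy data the sums are 15, 20, 55, 86 and 66, which give $L_{xx}=10$, $L_{yy}=6$, $L_{xy}=6$, so the slope is 0.6 and the intercept is 2.2 up to floating-point rounding.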

Residual sum of squares

$$
Q_e=\sum\limits_{i=1}^{n}e_i^2=\sum\limits_{i=1}^{n}(y_i-\hat{y_i})^2=\sum\limits_{i=1}^{n}(y_i-\hat{\beta_0}-\hat{\beta_1}x_i)^2=L_{yy}-\hat{\beta_1}L_{xy}=L_{yy}-\frac{L_{xy}^2}{L_{xx}}
$$

Theorem: $\frac{Q_e}{\sigma^2}\sim\chi^2(n-2)$

$$
\begin{aligned}
&E\Big(\frac{Q_e}{\sigma^2}\Big)=n-2 \\
\Longrightarrow \quad &E\Big(\frac{Q_e}{n-2}\Big)=\sigma^2 \\
\Longrightarrow \quad &\hat{\sigma^2}=\frac{Q_e}{n-2}
\end{aligned}
$$

That is, $\frac{Q_e}{n-2}$ is an unbiased estimator of $\sigma^2$.
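A small numerical check may help here. The sketch below computes the residual sum of squares both directly from the residuals and from the sums-of-squares shortcut, then divides by $n-2$. The helper name `residual_ss` and the data are assumptions made for illustration.

```python
def residual_ss(xs, ys):
    """Return (Q_e from the residuals, Q_e from L_yy - L_xy^2 / L_xx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = l_xy / l_xx
    beta0 = y_bar - beta1 * x_bar
    q_direct = sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))
    q_short = l_yy - l_xy ** 2 / l_xx
    return q_direct, q_short


# Hypothetical data, not taken from the notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

q_direct, q_short = residual_ss(xs, ys)
sigma2_hat = q_short / (len(xs) - 2)  # divide by n - 2, not n
print(q_direct, q_short, sigma2_hat)
```

Both routes agree (2.4 for this data, up to rounding), and the variance estimate is 2.4 / 3 = 0.8.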

Properties of the least-squares estimators

The least-squares estimators of $\beta_0,\beta_1$ are both unbiased: $E(\hat{\beta_0})=\beta_0,\quad E(\hat{\beta_1})=\beta_1$

$$
\hat{\beta_0}\sim N\Big(\beta_0, \Big(\frac{1}{n}+\frac{\bar{x}^2}{L_{xx}}\Big)\sigma^2\Big)
$$

$$
\hat{\beta_1}\sim N\Big(\beta_1,\frac{\sigma^2}{L_{xx}}\Big)
$$

$$
Cov(\hat{\beta_0},\hat{\beta_1})=-\frac{\bar{x}}{L_{xx}}\sigma^2
$$

$$
\hat{y_0}\sim N\Big(\beta_0+\beta_1x_0, \Big(\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\Big)\sigma^2\Big)
$$
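These distributional claims can be sanity-checked by simulation. The sketch below is only an illustration under assumed values (true coefficients, noise level, design points, seed and replication count are all made up): it repeatedly draws data from the model and compares the empirical mean and variance of the slope estimate with $\beta_1$ and $\sigma^2/L_{xx}$.

```python
import random


def slope_estimate(xs, ys):
    """Least-squares slope L_xy / L_xx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return l_xy / l_xx


# Every number below is an assumption chosen for the simulation.
rng = random.Random(0)
beta0, beta1, sigma = 1.0, 2.0, 1.0
xs = [1, 2, 3, 4, 5]  # fixed design, so L_xx = 10
reps = 20000

slopes = []
for _ in range(reps):
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    slopes.append(slope_estimate(xs, ys))

mean_slope = sum(slopes) / reps
var_slope = sum((b - mean_slope) ** 2 for b in slopes) / (reps - 1)

x_bar = sum(xs) / len(xs)
l_xx = sum((x - x_bar) ** 2 for x in xs)
theoretical_var = sigma ** 2 / l_xx

print(mean_slope, var_slope, theoretical_var)
```

With these settings the empirical mean should land close to 2.0 and the empirical variance close to the theoretical 0.1. A simulation is not a proof, but a large mismatch would suggest a formula was copied wrongly.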

Significance test of the regression equation (t, F, r)

  1. State the null and alternative hypotheses (whether the regression is significant comes down to whether the slope is 0):

$$
H_0: \beta_1=0; \quad H_1:\beta_1\neq0
$$

  2. Choose the test statistic:

    $$
    \begin{aligned}
    &\hat{\beta_1}\sim N\Big(\beta_1,\frac{\sigma^2}{L_{xx}}\Big) \\
    \Longrightarrow \quad &\frac{\hat{\beta_1}-\beta_1}{\sqrt{\frac{\sigma^2}{L_{xx}}}}\sim N(0,1) \\
    \xrightarrow{H_0} \quad &\frac{\hat{\beta_1}\sqrt{L_{xx}}}{\sigma}\sim N(0,1)
    \end{aligned}
    $$

    To build a $t$ test we also need a $\chi^2$ variable that is independent of $\hat{\beta_1}$; $\frac{Q_e}{\sigma^2}\sim\chi^2(n-2)$ serves this purpose, so:

    $$
    T=\frac{\frac{\hat{\beta_1}\sqrt{L_{xx}}}{\sigma}}{\sqrt{\frac{Q_e}{\sigma^2}/(n-2)}}\xrightarrow{\hat{\sigma^2}=\frac{Q_e}{n-2}}\frac{\hat{\beta_1}\sqrt{L_{xx}}}{\hat\sigma} \sim t(n-2)
    $$

    For the $F$ test, compute the regression sum of squares and the residual sum of squares (the $\chi^2(1)$ distribution of $S_R^2/\sigma^2$ holds under $H_0$):

    $$
    \begin{aligned}
    &S_T^2=\sum\limits_{i=1}^{n}(y_i-\bar{y})^2=L_{yy} \\
    &S_R^2=\sum\limits_{i=1}^{n}(\hat{y_i}-\bar{y})^2=\hat{\beta_1}L_{xy} \\
    &S_e^2=\sum\limits_{i=1}^{n}(y_i-\hat{y_i})^2=S_T^2-S_R^2=L_{yy}-\hat{\beta_1}L_{xy} \\
    &\frac{S_R^2}{\sigma^2}\sim \chi^2(1), \quad \frac{S_e^2}{\sigma^2}\sim \chi^2(n-2) \\
    &F=\frac{\frac{S_R^2}{\sigma^2}/1}{\frac{S_e^2}{\sigma^2}/(n-2)}=\frac{(n-2)S_R^2}{S_e^2}\sim F(1,n-2)
    \end{aligned}
    $$

  3. Rejection region:

    Rejection region of the $t$ test: $|T|=\Big|\frac{\hat{\beta_1}\sqrt{L_{xx}}}{\hat{\sigma}}\Big|\ge t_{\frac{\alpha}{2}}(n-2)$

    Rejection region of the $F$ test: $F\ge F_\alpha(1,n-2)$

  4. Look up the critical value $t_{\frac{\alpha}{2}}(n-2)$ or $F_{\alpha}(1,n-2)$.

  5. Compute $|T|$ or $F$.

  6. Draw the conclusion.
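The steps above can be sketched in code as follows. This is a minimal illustration, not a full testing routine: the data are made up, and the critical values are typed in from standard tables for $\alpha=0.05$ and $n-2=3$ rather than computed. The notes do not work out the $r$ variant named in the heading; the sketch assumes it refers to the sample correlation coefficient $r=L_{xy}/\sqrt{L_{xx}L_{yy}}$.

```python
import math


def significance_stats(xs, ys):
    """Return (T, F, r) for testing H0: beta1 = 0 in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = l_xy / l_xx
    s_r = beta1 * l_xy                    # regression sum of squares
    s_e = l_yy - s_r                      # residual sum of squares
    sigma_hat = math.sqrt(s_e / (n - 2))
    t_stat = beta1 * math.sqrt(l_xx) / sigma_hat
    f_stat = (n - 2) * s_r / s_e
    r = l_xy / math.sqrt(l_xx * l_yy)     # assumed meaning of "r" in the heading
    return t_stat, f_stat, r


# Hypothetical data, not taken from the notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t_stat, f_stat, r = significance_stats(xs, ys)

# Table values for alpha = 0.05 with n - 2 = 3 degrees of freedom.
T_CRIT = 3.182   # t_{0.025}(3)
F_CRIT = 10.13   # F_{0.05}(1, 3)

reject_by_t = abs(t_stat) >= T_CRIT
reject_by_f = f_stat >= F_CRIT
print(t_stat, f_stat, r, reject_by_t, reject_by_f)
```

For this data $T\approx 2.12$ and $F=4.5$, so neither test rejects $H_0$ at the 5% level. The two tests always agree in simple linear regression because $T^2=F$, and $r^2=S_R^2/L_{yy}$ ties the correlation coefficient to the same decomposition.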

Interval estimation of the regression coefficient

$$
\begin{aligned}
&\hat{\beta_1}\sim N\Big(\beta_1,\frac{\sigma^2}{L_{xx}}\Big) \\
\Longrightarrow \quad &\frac{\hat{\beta_1}-\beta_1}{\sqrt{\frac{\sigma^2}{L_{xx}}}}\sim N(0,1) \\
\Longrightarrow \quad &\frac{(\hat{\beta_1}-\beta_1)\sqrt{L_{xx}}}{\sigma}\sim N(0,1) \\
\Longrightarrow \quad &T=\frac{\frac{(\hat{\beta_1}-\beta_1)\sqrt{L_{xx}}}{\sigma}}{\sqrt{\frac{Q_e}{\sigma^2}/(n-2)}}\xrightarrow{\hat{\sigma^2}=\frac{Q_e}{n-2}}\frac{(\hat{\beta_1}-\beta_1)\sqrt{L_{xx}}}{\hat\sigma} \sim t(n-2)
\end{aligned}
$$

The confidence interval for $\beta_1$ at confidence level $1-\alpha$ is: $\Big(\hat{\beta_1}\pm \frac{\hat{\sigma}}{\sqrt{L_{xx}}}t_{\frac{\alpha}{2}}(n-2)\Big)$
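A short sketch of this interval, under the assumption that the critical value is read from a $t$ table and passed in. The function name `slope_confidence_interval` and the data are hypothetical.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """Return (lower, upper) for beta1, given t_crit = t_{alpha/2}(n - 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = l_xy / l_xx
    q_e = l_yy - l_xy ** 2 / l_xx
    sigma_hat = math.sqrt(q_e / (n - 2))
    half_width = t_crit * sigma_hat / math.sqrt(l_xx)
    return beta1 - half_width, beta1 + half_width


# Hypothetical data; 3.182 is the table value t_{0.025}(3) for a 95% interval.
lower, upper = slope_confidence_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
print(lower, upper)
```

Here the interval is roughly (-0.30, 1.50). It contains 0, which matches the outcome of a two-sided $t$ test of $H_0:\beta_1=0$ at the same level on the same data.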

Point prediction

Let the regression equation be $\hat{y}=\hat{\beta_0}+\hat{\beta_1}x$. For any given $x=x_0$, the predicted value of $y_0$ is $\hat{y_0}=\hat{\beta_0}+\hat{\beta_1}x_0$.

$$
\hat{\beta_0}\sim N\Big(\beta_0, \Big(\frac{1}{n}+\frac{\bar{x}^2}{L_{xx}}\Big)\sigma^2\Big)
$$

$$
\hat{\beta_1}\sim N\Big(\beta_1,\frac{\sigma^2}{L_{xx}}\Big)
$$

$$
Cov(\hat{\beta_0},\hat{\beta_1})=-\frac{\bar{x}}{L_{xx}}\sigma^2
$$

$$
D(\hat{y_0})=D(\hat{\beta_0})+D(\hat{\beta_1}x_0)+2Cov(\hat{\beta_0},\hat{\beta_1}x_0)=\Big(\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\Big)\sigma^2
$$
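The last equality skips the algebra. Filling it in with the variances and covariance of the two estimators, and pulling the constant $x_0$ out of the variance and covariance terms:

$$
\begin{aligned}
D(\hat{y_0})&=D(\hat{\beta_0})+x_0^2D(\hat{\beta_1})+2x_0Cov(\hat{\beta_0},\hat{\beta_1}) \\
&=\Big(\frac{1}{n}+\frac{\bar{x}^2}{L_{xx}}\Big)\sigma^2+\frac{x_0^2}{L_{xx}}\sigma^2-\frac{2x_0\bar{x}}{L_{xx}}\sigma^2 \\
&=\Big(\frac{1}{n}+\frac{\bar{x}^2-2x_0\bar{x}+x_0^2}{L_{xx}}\Big)\sigma^2 \\
&=\Big(\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\Big)\sigma^2
\end{aligned}
$$

The variance is smallest at $x_0=\bar{x}$ and grows quadratically as $x_0$ moves away from the centre of the observed data.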

$$
\hat{y_0}\sim N\Big(\beta_0+\beta_1x_0, \Big(\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\Big)\sigma^2\Big)
$$

Interval prediction

The new observation $y_0=\beta_0+\beta_1x_0+\epsilon_0$ is independent of $\hat{y_0}$, so $D(y_0-\hat{y_0})=\sigma^2+D(\hat{y_0})$, which accounts for the extra 1 inside the brackets:

$$
\begin{aligned}
&y_0-\hat{y_0}\sim N\Big(0,\Big[1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}\Big]\sigma^2\Big) \\
&U=\frac{y_0-\hat{y_0}}{\sigma\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}}\sim N(0,1) \\
&T=\frac{y_0-\hat{y_0}}{\hat\sigma\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}}\sim t(n-2)
\end{aligned}
$$

Therefore, the prediction interval for $y_0$ at confidence level $1-\alpha$ is $(\hat{y_0}-\delta,\hat{y_0}+\delta)$, where $\delta=t_{\frac{\alpha}{2}}(n-2)\hat{\sigma}\sqrt{1+\frac{1}{n}+\frac{(x_0-\bar{x})^2}{L_{xx}}}$.
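The interval can be computed as in the following sketch. The function name `prediction_interval`, the data, and the choice of $x_0$ are assumptions for illustration, and the critical value is again taken from a $t$ table rather than computed.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Return (y0_hat, delta) so that the interval is (y0_hat - delta, y0_hat + delta)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_yy = sum((y - y_bar) ** 2 for y in ys)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = l_xy / l_xx
    beta0 = y_bar - beta1 * x_bar
    sigma_hat = math.sqrt((l_yy - l_xy ** 2 / l_xx) / (n - 2))
    y0_hat = beta0 + beta1 * x0
    delta = t_crit * sigma_hat * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / l_xx)
    return y0_hat, delta


# Hypothetical data; 3.182 is the table value t_{0.025}(3).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

y0_hat, delta = prediction_interval(xs, ys, x0=6, t_crit=3.182)
_, delta_center = prediction_interval(xs, ys, x0=3, t_crit=3.182)
print(y0_hat - delta, y0_hat + delta, delta_center)
```

The half-width at $x_0=6$ is larger than at $x_0=\bar{x}=3$, which reflects the $(x_0-\bar{x})^2$ term: predictions far from the centre of the data are less precise.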

Linearizable nonlinear regression in one variable

[Three figures not recovered: image20221022153739659.png, image20221022153756446.png, image20221022153810181.png]
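The content of this section survives only as image references, so the specific curve families the notes cover are unknown. As one common illustration of the general idea (an assumption, not the notes' own example), an exponential curve $y=ae^{bx}$ becomes linear after taking logarithms, $\ln y=\ln a+bx$, so the simple linear regression formulas apply to the pairs $(x,\ln y)$.

```python
import math


def fit_line(us, vs):
    """Least-squares fit of v = intercept + slope * u; returns (intercept, slope)."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    l_uu = sum((u - u_bar) ** 2 for u in us)
    l_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    slope = l_uv / l_uu
    return v_bar - slope * u_bar, slope


# Synthetic, noise-free data from y = 2 * exp(0.5 * x); a = 2 and b = 0.5 are made up.
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Regress ln(y) on x, then map the intercept back through exp to recover a.
intercept, slope = fit_line(xs, [math.log(y) for y in ys])
a_hat = math.exp(intercept)
b_hat = slope
print(a_hat, b_hat)
```

Because the data here contain no noise, the fit recovers the assumed parameters up to rounding. With noisy data, note that least squares on $\ln y$ minimizes errors on the log scale, which is not the same criterion as least squares on $y$ itself.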
