
Machine Learning in Practice: House Price Prediction (MSE, Regression, Cross-Validation)

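Exercise 1: Machine learning frameworks

Which machine learning frameworks are available? Build a machine learning pipeline with one of them.

Frameworks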

TensorFlow
PyTorch
PaddlePaddle
CNTK
MXNet
MindSpore
OneFlow
MegEngine
Jittor
scikit-learn (sklearn)

Building a machine learning pipeline

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

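# Load the data
# Use the iris dataset bundled with scikit-learn
dataset = load_iris()
# print(dataset)
X = dataset.data
y = dataset.target
# Build the pipeline: scale each feature to [0, 1], then classify with k-nearest neighbours
scaling_pipeline = Pipeline([('scale', MinMaxScaler()),
                             ('predict', KNeighborsClassifier())])
# Cross-validated accuracy of the whole pipeline
scores = cross_val_score(scaling_pipeline, X, y, scoring='accuracy')
print("Prediction accuracy: {0:.1f}%".format(np.mean(scores) * 100))


Prediction accuracy: 96.0%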
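
To see what the scaling step contributes, the same cross-validation can be run with and without it. The sketch below is an illustration only and not part of the original exercise: the explicit `StratifiedKFold` splitter, `random_state=0`, and the names `plain_knn` and `scaled_knn` are assumptions introduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)

# Shuffled, stratified folds so that every fold contains all three classes.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

plain_knn = KNeighborsClassifier()
scaled_knn = Pipeline([
    ("scale", MinMaxScaler()),            # refitted on each training split only
    ("predict", KNeighborsClassifier()),
])

plain_scores = cross_val_score(plain_knn, X, y, cv=cv, scoring="accuracy")
scaled_scores = cross_val_score(scaled_knn, X, y, cv=cv, scoring="accuracy")

print("without scaling: {:.1f}%".format(np.mean(plain_scores) * 100))
print("with scaling:    {:.1f}%".format(np.mean(scaled_scores) * 100))
```

Because the scaler sits inside the pipeline, `cross_val_score` refits it on every training split, so the held-out fold never influences the scaling range.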

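Exercise 2: Loading data

In class we loaded the iris dataset bundled with scikit-learn. Load the Boston house price dataset the same way and examine its format and structure. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below needs an older release.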

from sklearn.datasets import load_boston
# Load the Boston dataset
boston = load_boston()
boston
{'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
         4.9800e+00],
        [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02,
         9.1400e+00],
        [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02,
         4.0300e+00],
        ...,
        [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
         5.6400e+00],
        [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02,
         6.4800e+00],
        [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
         7.8800e+00]]),
 'target': array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
        18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
        15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
        13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
        21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
        35.4, 24.7, 31.6, 23.3, 19.6, 18.7, 16. , 22.2, 25. , 33. , 23.5,
        19.4, 22. , 17.4, 20.9, 24.2, 21.7, 22.8, 23.4, 24.1, 21.4, 20. ,
        20.8, 21.2, 20.3, 28. , 23.9, 24.8, 22.9, 23.9, 26.6, 22.5, 22.2,
        23.6, 28.7, 22.6, 22. , 22.9, 25. , 20.6, 28.4, 21.4, 38.7, 43.8,
        33.2, 27.5, 26.5, 18.6, 19.3, 20.1, 19.5, 19.5, 20.4, 19.8, 19.4,
        21.7, 22.8, 18.8, 18.7, 18.5, 18.3, 21.2, 19.2, 20.4, 19.3, 22. ,
        20.3, 20.5, 17.3, 18.8, 21.4, 15.7, 16.2, 18. , 14.3, 19.2, 19.6,
        23. , 18.4, 15.6, 18.1, 17.4, 17.1, 13.3, 17.8, 14. , 14.4, 13.4,
        15.6, 11.8, 13.8, 15.6, 14.6, 17.8, 15.4, 21.5, 19.6, 15.3, 19.4,
        17. , 15.6, 13.1, 41.3, 24.3, 23.3, 27. , 50. , 50. , 50. , 22.7,
        25. , 50. , 23.8, 23.8, 22.3, 17.4, 19.1, 23.1, 23.6, 22.6, 29.4,
        23.2, 24.6, 29.9, 37.2, 39.8, 36.2, 37.9, 32.5, 26.4, 29.6, 50. ,
        32. , 29.8, 34.9, 37. , 30.5, 36.4, 31.1, 29.1, 50. , 33.3, 30.3,
        34.6, 34.9, 32.9, 24.1, 42.3, 48.5, 50. , 22.6, 24.4, 22.5, 24.4,
        20. , 21.7, 19.3, 22.4, 28.1, 23.7, 25. , 23.3, 28.7, 21.5, 23. ,
        26.7, 21.7, 27.5, 30.1, 44.8, 50. , 37.6, 31.6, 46.7, 31.5, 24.3,
        31.7, 41.7, 48.3, 29. , 24. , 25.1, 31.5, 23.7, 23.3, 22. , 20.1,
        22.2, 23.7, 17.6, 18.5, 24.3, 20.5, 24.5, 26.2, 24.4, 24.8, 29.6,
        42.8, 21.9, 20.9, 44. , 50. , 36. , 30.1, 33.8, 43.1, 48.8, 31. ,
        36.5, 22.8, 30.7, 50. , 43.5, 20.7, 21.1, 25.2, 24.4, 35.2, 32.4,
        32. , 33.2, 33.1, 29.1, 35.1, 45.4, 35.4, 46. , 50. , 32.2, 22. ,
        20.1, 23.2, 22.3, 24.8, 28.5, 37.3, 27.9, 23.9, 21.7, 28.6, 27.1,
        20.3, 22.5, 29. , 24.8, 22. , 26.4, 33.1, 36.1, 28.4, 33.4, 28.2,
        22.8, 20.3, 16.1, 22.1, 19.4, 21.6, 23.8, 16.2, 17.8, 19.8, 23.1,
        21. , 23.8, 23.1, 20.4, 18.5, 25. , 24.6, 23. , 22.2, 19.3, 22.6,
        19.8, 17.1, 19.4, 22.2, 20.7, 21.1, 19.5, 18.5, 20.6, 19. , 18.7,
        32.7, 16.5, 23.9, 31.2, 17.5, 17.2, 23.1, 24.5, 26.6, 22.9, 24.1,
        18.6, 30.1, 18.2, 20.6, 17.8, 21.7, 22.7, 22.6, 25. , 19.9, 20.8,
        16.8, 21.9, 27.5, 21.9, 23.1, 50. , 50. , 50. , 50. , 50. , 13.8,
        13.8, 15. , 13.9, 13.3, 13.1, 10.2, 10.4, 10.9, 11.3, 12.3,  8.8,
         7.2, 10.5,  7.4, 10.2, 11.5, 15.1, 23.2,  9.7, 13.8, 12.7, 13.1,
        12.5,  8.5,  5. ,  6.3,  5.6,  7.2, 12.1,  8.3,  8.5,  5. , 11.9,
        27.9, 17.2, 27.5, 15. , 17.2, 17.9, 16.3,  7. ,  7.2,  7.5, 10.4,
         8.8,  8.4, 16.7, 14.2, 20.8, 13.4, 11.7,  8.3, 10.2, 10.9, 11. ,
         9.5, 14.5, 14.1, 16.1, 14.3, 11.7, 13.4,  9.6,  8.7,  8.4, 12.8,
        10.5, 17.1, 18.4, 15.4, 10.8, 11.8, 14.9, 12.6, 14.1, 13. , 13.4,
        15.2, 16.1, 17.8, 14.9, 14.1, 12.7, 13.5, 14.9, 20. , 16.4, 17.7,
        19.5, 20.2, 21.4, 19.9, 19. , 19.1, 19.1, 20.1, 19.9, 19.6, 23.2,
        29.8, 13.8, 13.3, 16.7, 12. , 14.6, 21.4, 23. , 23.7, 25. , 21.8,
        20.6, 21.2, 19.1, 20.6, 15.2,  7. ,  8.1, 13.6, 20.1, 21.8, 24.5,
        23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9]),
 'feature_names': array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
        'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7'),
 'DESCR': ".. _boston_dataset:\n\nBoston house prices dataset\n---------------------------\n\n**Data Set Characteristics:**  \n\n    :Number of Instances: 506 \n\n    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.\n\n    :Attribute Information (in order):\n        - CRIM     per capita crime rate by town\n        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.\n        - INDUS    proportion of non-retail business acres per town\n        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)\n        - NOX      nitric oxides concentration (parts per 10 million)\n        - RM       average number of rooms per dwelling\n        - AGE      proportion of owner-occupied units built prior to 1940\n        - DIS      weighted distances to five Boston employment centres\n        - RAD      index of accessibility to radial highways\n        - TAX      full-value property-tax rate per $10,000\n        - PTRATIO  pupil-teacher ratio by town\n        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town\n        - LSTAT    % lower status of the population\n        - MEDV     Median value of owner-occupied homes in $1000's\n\n    :Missing Attribute Values: None\n\n    :Creator: Harrison, D. and Rubinfeld, D.L.\n\nThis is a copy of UCI ML housing dataset.\nhttps://archive.ics.uci.edu/ml/machine-learning-databases/housing/\n\n\nThis dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.\n\nThe Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic\nprices and the demand for clean air', J. Environ. Economics & Management,\nvol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics\n...', Wiley, 1980.   N.B. 
Various transformations are used in the table on\npages 244-261 of the latter.\n\nThe Boston house-price data has been used in many machine learning papers that address regression\nproblems.   \n     \n.. topic:: References\n\n   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.\n   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.\n",
 'filename': 'D:\\dell\\lib\\site-packages\\sklearn\\datasets\\data\\boston_house_prices.csv'}
boston.feature_names
array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')

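There are 13 feature columns. The data can also be opened directly as a CSV file; its location is the 'filename' entry at the end of the output above.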
Variables in order:
CRIM per capita crime rate by town
ZN proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX nitric oxides concentration (parts per 10 million)
RM average number of rooms per dwelling
AGE proportion of owner-occupied units built prior to 1940
DIS weighted distances to five Boston employment centres
RAD index of accessibility to radial highways
TAX full-value property-tax rate per $10,000
PTRATIO pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT % lower status of the population
MEDV Median value of owner-occupied homes in $1000’s

!pip install pandas
import pandas as pd
df=pd.DataFrame(boston.data,columns=boston.feature_names)
df
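# df.info() # check the column types and completeness of the data
# df.describe() # summary statistics (mean, standard deviation, etc.)
# df.dropna(inplace=True) # drop samples with missing values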
        CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT
0    0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98
1    0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14
2    0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03
3    0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94
4    0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33
..       ...   ...    ...   ...    ...    ...   ...     ...  ...    ...      ...     ...    ...
501  0.06263   0.0  11.93   0.0  0.573  6.593  69.1  2.4786  1.0  273.0     21.0  391.99   9.67
502  0.04527   0.0  11.93   0.0  0.573  6.120  76.7  2.2875  1.0  273.0     21.0  396.90   9.08
503  0.06076   0.0  11.93   0.0  0.573  6.976  91.0  2.1675  1.0  273.0     21.0  396.90   5.64
504  0.10959   0.0  11.93   0.0  0.573  6.794  89.3  2.3889  1.0  273.0     21.0  393.45   6.48
505  0.04741   0.0  11.93   0.0  0.573  6.030  80.8  2.5050  1.0  273.0     21.0  396.90   7.88

506 rows × 13 columns
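
The commented-out calls in the cell above (`info`, `describe`, `dropna`) are the usual first checks on a freshly loaded frame. A minimal sketch of what they do, using a small hypothetical frame named `toy` (an assumption for illustration, with one deliberately missing value) rather than the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Boston frame: three columns, one missing value.
toy = pd.DataFrame({
    "LSTAT": [4.98, 9.14, 4.03, 2.94],
    "RM": [6.575, 6.421, np.nan, 6.998],
    "PTRATIO": [15.3, 17.8, 17.8, 18.7],
})

toy.info()                   # dtypes and non-null counts per column
print(toy.describe())        # count, mean, std, min, quartiles, max

missing = toy.isna().sum()   # number of missing values per column
print(missing)

cleaned = toy.dropna()       # drop every row that has a missing value
print(cleaned.shape)         # → (3, 3)
```

The Boston data itself reports no missing attribute values, so `dropna` would leave it unchanged.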

Exercise 3: Splitting the dataset

Split the loaded dataset into a training set and a test set using train_test_split().

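from sklearn.model_selection import train_test_split
# Add the target (MEDV, the median house value) as a column
df['target']=boston.target
# Use three features only: LSTAT, RM and PTRATIO
features = pd.DataFrame(np.c_[df['LSTAT'],df['RM'],df['PTRATIO']],columns=['LSTAT','RM','PTRATIO'])
target=df['target']
# Hold out 17% of the samples as the test set; random_state makes the split reproducible
x_train,x_test,y_train,y_test = train_test_split(features,target,random_state=5,test_size=0.17)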
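
To make the effect of `test_size=0.17` and `random_state=5` concrete, the sketch below applies the same call to placeholder arrays with 506 rows. The arrays `X_demo` and `y_demo` are assumptions standing in for the real features and target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as the Boston data (506).
X_demo = np.arange(506 * 3).reshape(506, 3)
y_demo = np.arange(506)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.17, random_state=5
)
print(X_tr.shape, X_te.shape)             # → (419, 3) (87, 3)

# Same seed, same split: rerunning selects identical test rows.
_, _, _, y_te_again = train_test_split(
    X_demo, y_demo, test_size=0.17, random_state=5
)
print(np.array_equal(y_te, y_te_again))   # → True
```

The test set size is rounded up (0.17 × 506 = 86.02, so 87 rows), and the remaining 419 rows form the training set.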

Exercise 4: Training a model

Create a regression model of your choice and train it on the Boston house price data.

from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
lr = LinearRegression() # create a linear regression estimator
lr.fit(x_train, y_train) # fit the coefficients and the intercept
y_pred = lr.predict(x_test) # predict on the test set
print(r2_score(y_test, y_pred)) # evaluate with the coefficient of determination (R^2)
0.7017302408287501
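
The coefficient of determination printed above is defined as R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. A minimal sketch that checks this against `r2_score`, assuming synthetic data (`X_demo`, `y_demo` are made up here, not the housing data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic linear data with noise (an assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Fit on the first 150 rows, evaluate on the last 50.
model = LinearRegression().fit(X_demo[:150], y_demo[:150])
y_hat = model.predict(X_demo[150:])
y_true = y_demo[150:]

ss_res = np.sum((y_true - y_hat) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_hat))        # the two values agree
print(model.score(X_demo[150:], y_true))         # LinearRegression.score is R^2 too
```

An R^2 of 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values mean worse than that.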

Exercise 5: Evaluating the model

Use the mean squared error (MSE) to judge how well the model was trained.

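from sklearn.metrics import mean_squared_error
print("mse=",mean_squared_error(y_test, y_pred)) # mean squared error
mse= 23.515599635089057
# Alternative: elastic net regression
# from sklearn.linear_model import ElasticNet
# EN=ElasticNet(alpha=0.02)  # create an elastic net estimator
# EN.fit(x_train,y_train) # train
# y_pred=EN.predict(x_test) # predict
# # evaluate
# print(r2_score(y_test,y_pred))
# print("mse=",mean_squared_error(y_test, y_pred)) # mean squared error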
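
The MSE is the average of the squared differences between true and predicted values. A small worked sketch with made-up numbers (the arrays `y_true` and `y_hat` are assumptions for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])

# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4.
mse_manual = np.mean((y_true - y_hat) ** 2)       # (1 + 0 + 4) / 3
mse_sklearn = mean_squared_error(y_true, y_hat)   # same quantity from scikit-learn
rmse = np.sqrt(mse_manual)                        # back in the units of the target

print(mse_manual, mse_sklearn, rmse)
```

Taking the square root makes the number easier to read: for the MSE of about 23.52 reported above, the root mean squared error is roughly 4.85, and since MEDV is measured in $1000s that corresponds to a typical error of around $4,850.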

Exercise 6: Cross-validation

Use cross-validation to check how well the model performs.

from sklearn.model_selection import cross_val_score
# 5-fold cross-validation on all 13 features; for a regressor the default score is R^2
scores = cross_val_score(lr, boston.data, boston.target, cv=5)
scores
array([ 0.63919994,  0.71386698,  0.58702344,  0.07923081, -0.25294154])
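
The scores vary widely between folds, and the last one is negative. With `cv=5` the folds for a regressor are consecutive blocks of rows without shuffling, so one plausible cause is that the rows are not in random order. A hedged sketch of two common variations, shuffled folds and MSE as the score, on synthetic data (`make_regression` output stands in for the housing data; the splitter and its `random_state` are assumptions):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in with the same shape as the housing data (506 x 13).
X_demo, y_demo = make_regression(
    n_samples=506, n_features=13, noise=10.0, random_state=0
)

model = LinearRegression()
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # shuffle rows before splitting

r2_scores = cross_val_score(model, X_demo, y_demo, cv=cv)   # default score: R^2
neg_mse = cross_val_score(
    model, X_demo, y_demo, cv=cv, scoring="neg_mean_squared_error"
)
mse_scores = -neg_mse   # scikit-learn negates errors so that higher is always better

print("R^2 per fold:", np.round(r2_scores, 3))
print("MSE per fold:", np.round(mse_scores, 1))
print("mean MSE:", mse_scores.mean())
```

Passing the same `cv` object to the real data would show whether shuffling evens out the fold scores.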

Exercise 7: Saving the model

Save the model we have just trained.

!pip install joblib
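import joblib # joblib module
# Save the model (note: if the path points into a folder such as save/, that folder must already exist, otherwise dump raises an error)
joblib.dump(lr, 'lr.pkl')
['lr.pkl']
# Load the model back from the same file
clf3 = joblib.load('lr.pkl')
# Check the loaded model
# print(clf3.predict(x_test))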
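
A quick way to confirm that a saved model survives the round trip is to compare predictions before and after loading. The sketch below is self-contained and uses assumptions throughout: synthetic data, a temporary directory, and a hypothetical `save` subfolder that is created before dumping.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.0, 2.0, 3.0])

model = LinearRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    save_dir = os.path.join(tmp_dir, "save")   # hypothetical subfolder
    os.makedirs(save_dir, exist_ok=True)       # create it before dumping
    model_path = os.path.join(save_dir, "lr.pkl")

    joblib.dump(model, model_path)             # write the fitted estimator
    restored = joblib.load(model_path)         # read it back

same = np.allclose(model.predict(X_demo), restored.predict(X_demo))
print(same)   # → True
```

Pickled models should only be loaded from trusted sources and, ideally, with the same scikit-learn version that produced them.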
