Inserting about 10,000 rows with pd.DataFrame.to_sql takes 21 seconds

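to_sql is the pandas method for writing the contents of a DataFrame to a database. It generates the required SQL statements (table creation and inserts) for you, so the data can be stored in the database for later processing.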

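to_sql takes several parameters. The most commonly used are name (the table name), con (the database connection object), if_exists (what to do when the table already exists), and index (whether to write the DataFrame's index as a column).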
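As a minimal sketch of these four parameters, the snippet below writes a small DataFrame to an in-memory SQLite database and reads it back. The table name demo_accounts and the sample columns are made up for illustration, and SQLite only stands in for a real database server here.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; the column names are invented for this sketch.
df = pd.DataFrame({"account": ["a", "b", "c"], "balance": [10.0, 20.5, 30.25]})

# An in-memory SQLite database stands in for a real server.
con = sqlite3.connect(":memory:")

# name: target table, con: connection object,
# if_exists: what to do if the table already exists,
# index=False: do not write the DataFrame index as a column.
df.to_sql(name="demo_accounts", con=con, if_exists="replace", index=False)

# Read the table back to confirm what was stored.
stored = pd.read_sql("SELECT * FROM demo_accounts", con)
print(stored)
```

With index=True (the default) the stored table would contain one extra column holding the DataFrame index.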

pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None, dtype_backend=_NoDefault.no_default, dtype=None)
Read SQL query or database table into a DataFrame.

pandas.read_sql — pandas 2.1.2 documentation

def to_sql(
    frame,
    name: str,
    con,
    schema: str | None = None,
    if_exists: Literal["fail", "replace", "append"] = "fail",
    index: bool = True,
    index_label: IndexLabel | None = None,
    chunksize: int | None = None,
    dtype: DtypeArg | None = None,
    method: Literal["multi"] | Callable | None = None,
    engine: str = "auto",
    **engine_kwargs,
) -> int | None:
    """
    Write records stored in a DataFrame to a SQL database.

    Parameters
    ----------
    frame : DataFrame, Series
    name : str
        Name of SQL table.
    con : SQLAlchemy connectable(engine/connection) or database string URI
        or sqlite3 DBAPI2 connection
        Using SQLAlchemy makes it possible to use any DB supported by that
        library.
        If a DBAPI2 object, only sqlite3 is supported.
    schema : str, optional
        Name of SQL schema in database to write to (if database flavor
        supports this). If None, use default schema (default).
    if_exists : {'fail', 'replace', 'append'}, default 'fail'
        - fail: If table exists, do nothing.
        - replace: If table exists, drop it, recreate it, and insert data.
        - append: If table exists, insert data. Create if does not exist.
    index : bool, default True
        Write DataFrame index as a column.
    index_label : str or sequence, optional
        Column label for index column(s). If None is given (default) and
        `index` is True, then the index names are used.
        A sequence should be given if the DataFrame uses MultiIndex.
    chunksize : int, optional
        Specify the number of rows in each batch to be written at a time.
        By default, all rows will be written at once.
    dtype : dict or scalar, optional
        Specifying the datatype for columns. If a dictionary is used, the
        keys should be the column names and the values should be the
        SQLAlchemy types or strings for the sqlite3 fallback mode. If a
        scalar is provided, it will be applied to all columns.
    method : {None, 'multi', callable}, optional
        Controls the SQL insertion clause used:

        - None : Uses standard SQL ``INSERT`` clause (one per row).
        - ``'multi'``: Pass multiple values in a single ``INSERT`` clause.
        - callable with signature ``(pd_table, conn, keys, data_iter) -> int | None``.

        Details and a sample callable implementation can be found in the
        section :ref:`insert method <io.sql.method>`.
    engine : {'auto', 'sqlalchemy'}, default 'auto'
        SQL engine library to use. If 'auto', then the option
        ``io.sql.engine`` is used. The default ``io.sql.engine``
        behavior is 'sqlalchemy'

        .. versionadded:: 1.3.0

    **engine_kwargs
        Any additional kwargs are passed to the engine.

    Returns
    -------
    None or int
        Number of rows affected by to_sql. None is returned if the callable
        passed into ``method`` does not return an integer number of rows.

        .. versionadded:: 1.4.0

    Notes
    -----
    The returned rows affected is the sum of the ``rowcount`` attribute of ``sqlite3.Cursor``
    or SQLAlchemy connectable. The returned value may not reflect the exact number of written
    rows as stipulated in the
    `sqlite3 <https://docs.python.org/3/library/sqlite3.html#sqlite3.Cursor.rowcount>`__ or
    `SQLAlchemy <https://docs.sqlalchemy.org/en/14/core/connections.html#sqlalchemy.engine.BaseCursorResult.rowcount>`__
    """  # noqa: E501
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"'{if_exists}' is not valid for if_exists")

    if isinstance(frame, Series):
        frame = frame.to_frame()
    elif not isinstance(frame, DataFrame):
        raise NotImplementedError(
            "'frame' argument should be either a Series or a DataFrame"
        )
  • Read the spreadsheet and test the insert:

import time

import pandas as pd
import pyodbc  # ODBC driver used by the mssql+pyodbc dialect
import openpyxl  # engine used by pd.read_excel for .xlsx files
from sqlalchemy import create_engine

# Connection parameters
server = 'localhost'
database = 'tsl'
username = 'sa'
password = 'lqxxx'

# Create a SQLAlchemy engine
engine = create_engine(f"mssql+pyodbc://{username}:{password}@{server}/{database}?driver=ODBC Driver 17 for SQL Server")

# Path to the Excel file (raw string, so single backslashes are enough)
filePath = r"C:\Users\Administrator\Documents\traindata20221231.xlsx"

# Read worksheet "Sheet0" from the Excel file
table = pd.read_excel(filePath, sheet_name="Sheet0")
print(table.info())

# Connection test: check that the database is reachable
try:
    pd.read_sql('Employees', con=engine)
    print("connect successfully!")
except Exception as error:
    print("connect fail! because of :", error)

T1 = time.time()
# Insert the data with to_sql(). if_exists="replace" drops and recreates the
# table if it already exists; "append" adds the rows to the existing table.
try:
    table.to_sql("trading", con=engine, index=False, if_exists="replace")
    print("insert successfully!")
except Exception as error:
    print("insert fail! because of:", error)
print("data write complete!")
T2 = time.time()
print('Elapsed time: %s ms' % ((T2 - T1) * 1000))

# Output:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 10233 entries, 0 to 10232
# Data columns (total 11 columns):
#  #   Column          Non-Null Count  Dtype  
# ---  ------          --------------  -----  
#  0   tradingHours    10233 non-null  object 
#  1   tradingChannel  10233 non-null  object 
#  2   currencyType    10233 non-null  object 
#  3   changeInto      10233 non-null  float64
#  4   changeOut       10233 non-null  float64
#  5   balance         10233 non-null  float64
#  6   tradingName     10141 non-null  object 
#  7   tradingAccount  10153 non-null  object 
#  8   paymentMethod   10233 non-null  object 
#  9   postscript      8099 non-null   object 
#  10  summary         916 non-null    object 
# dtypes: float64(3), object(8)
# memory usage: 879.5+ KB
# None
# connect successfully!
# insert successfully!
# data write complete!
# Elapsed time: 20926.252126693726 ms
# [Finished in 39.9s]

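If the table already exists and if_exists is not specified, to_sql uses the default if_exists="fail" and raises a ValueError instead of writing anything. To add rows to an existing table, pass if_exists="append": the new rows are appended without overwriting the existing ones, so watch out for duplicates when the same data is written more than once.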
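The three if_exists modes can be checked against an in-memory SQLite database. This is only a sketch: the table name demo_trading, the sample columns, and the row_count helper are hypothetical and not part of the original test.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "amount": [1.5, 2.5]})
con = sqlite3.connect(":memory:")


def row_count(connection) -> int:
    """Return the number of rows currently in the demo table."""
    return connection.execute("SELECT COUNT(*) FROM demo_trading").fetchone()[0]


# First write: the table does not exist yet, so it is created.
df.to_sql("demo_trading", con, index=False)
count_after_create = row_count(con)

# Second write with the default if_exists="fail": pandas raises ValueError.
try:
    df.to_sql("demo_trading", con, index=False)
    default_raised = False
except ValueError:
    default_raised = True

# "append" adds the same rows again, so the table now holds duplicates.
df.to_sql("demo_trading", con, index=False, if_exists="append")
count_after_append = row_count(con)

# "replace" drops the table, recreates it, and inserts the rows once.
df.to_sql("demo_trading", con, index=False, if_exists="replace")
count_after_replace = row_count(con)

print(default_raised, count_after_create, count_after_append, count_after_replace)
```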
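When to_sql writes a large amount of data, all rows are written in a single batch by default, which may exhaust memory. Use the chunksize parameter to write the data in batches.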
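By default (method=None), to_sql uses a standard INSERT clause, one per row, which can be slow for large tables. The method parameter ('multi' or a custom callable) changes how the rows are inserted.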
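The sketch below times the default insert, a chunked insert, and method="multi" on roughly the same number of rows as the test above. It runs against an in-memory SQLite database, so the timings say nothing about SQL Server; the table names, the sample columns, and the timed_insert helper are hypothetical. The small chunksize for method="multi" is chosen because SQLite limits the number of bound parameters per statement.

```python
import sqlite3
import time

import pandas as pd

# Synthetic data, similar in size to the spreadsheet used in the test.
n_rows = 10_000
df = pd.DataFrame(
    {
        "change_into": [float(i) for i in range(n_rows)],
        "change_out": [float(i) * 0.5 for i in range(n_rows)],
        "balance": [float(i) * 2.0 for i in range(n_rows)],
    }
)

con = sqlite3.connect(":memory:")


def timed_insert(table_name: str, **to_sql_kwargs) -> float:
    """Write df to table_name and return the elapsed time in seconds."""
    start = time.perf_counter()
    df.to_sql(table_name, con, index=False, if_exists="replace", **to_sql_kwargs)
    return time.perf_counter() - start


# Default: all rows in one batch, one INSERT per row.
t_default = timed_insert("demo_default")

# chunksize: rows are written 1,000 at a time, which bounds the batch size.
t_chunked = timed_insert("demo_chunked", chunksize=1_000)

# method="multi": each INSERT carries many rows; 100 rows x 3 columns
# keeps the statement under SQLite's bound-parameter limit.
t_multi = timed_insert("demo_multi", chunksize=100, method="multi")

print(f"default: {t_default:.3f}s  chunked: {t_chunked:.3f}s  multi: {t_multi:.3f}s")
```

Which option is fastest depends on the database and driver, so it is worth measuring against the actual target server before settling on one.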
