当前位置: 首页 > news >正文

datax把text转mysql_使用DataX实现mysql数据迁移

DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、DRDS 等各种异构数据源之间高效的数据同步功能

在github上找到 Quick Start 部分

https://github.com/alibaba/DataX/blob/master/userGuid.md

1.安装

JDK 1.8.0_151

Python 2.7.18

下载后解压

2.启动

进入bin目录

python datax.py {YOUR_JOB.json}

3.简单输出

(1)创建作业的配置文件(json格式)

stream.json

{"job": {"content": [

{"reader": {"name": "streamreader","parameter": {"sliceRecordCount": 2,"column": [

{"type": "long","value": "2"},

{"type": "string","value": "hello,world-DataX"}

]

}

},"writer": {"name": "streamwriter","parameter": {"encoding": "UTF-8","print": true}

}

}

],"setting": {"speed": {"channel": 3}

}

}

}

(2)启动DataX

python datax.py ./stream.json

输出

……2021-02-04 16:58:42.891 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] isstarted2 hello,world-DataX2 hello,world-DataX2021-02-04 16:58:42.893 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[2] attemptCount[1] isstarted2 hello,world-DataX2 hello,world-DataX2 hello,world-DataX2 hello,world-DataX2021-02-04 16:58:42.993 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[102]ms2021-02-04 16:58:42.993 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[1] is successed, used[105]ms2021-02-04 16:58:42.993 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[2] is successed, used[101]ms2021-02-04 16:58:42.994 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.

2021-02-04 16:58:52.887 [job-0] INFO StandAloneJobContainerCommunicator - Total 6 records, 108 bytes | Speed 10B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%

2021-02-04 16:58:52.888 [job-0] INFO AbstractScheduler -Scheduler accomplished all tasks.2021-02-04 16:58:52.889 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] dopost work.2021-02-04 16:58:52.889 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] dopost work.2021-02-04 16:58:52.889 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.2021-02-04 16:58:52.891 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or isa file: D:\data\datax\hook2021-02-04 16:58:52.893 [job-0] INFO JobContainer -[total cpu info]=>averageCpu| maxDeltaCpu |minDeltaCpu-1.00% | -1.00% | -1.00%[total gc info]=>NAME| totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime |minDeltaGCTime

PS MarkSweep| 0 | 0 | 0 | 0.000s | 0.000s | 0.000s

PS Scavenge| 0 | 0 | 0 | 0.000s | 0.000s | 0.000s2021-02-04 16:58:52.894 [job-0] INFO JobContainer - PerfTrace not enable!

2021-02-04 16:58:52.894 [job-0] INFO StandAloneJobContainerCommunicator - Total 6 records, 108 bytes | Speed 10B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%

2021-02-04 16:58:52.896 [job-0] INFO JobContainer -任务启动时刻 :2021-02-04 16:58:42任务结束时刻 :2021-02-04 16:58:52任务总计耗时 : 10s

任务平均流量 : 10B/s

记录写入速度 : 0rec/s

读出记录总数 :6读写失败总数 :0

4.获取mysql数据本地打印

mysqltest.json

{"job": {"setting": {"speed": {"channel":1}

},"content": [

{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "root","connection": [

{"querySql": ["select id,name from sys_user limit 3;"],"jdbcUrl": ["jdbc:mysql://localhost:3306/dream"]

}

]

}

},"writer": {"name": "streamwriter","parameter": {"print": true,"encoding": "UTF-8"}

}

}

]

}

}

输出

……2021-02-04 17:00:24.237 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select id,name from sys_user limit 3;

] jdbcUrl:[jdbc:mysql://localhost:3306/dream?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].

33abc1admin25caocao2021-02-04 17:00:24.291 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[102]ms2021-02-04 17:00:24.291 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.

2021-02-04 17:00:34.187 [job-0] INFO StandAloneJobContainerCommunicator - Total 3 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%

2021-02-04 17:00:34.188 [job-0] INFO AbstractScheduler -Scheduler accomplished all tasks.2021-02-04 17:00:34.189 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] dopost work.2021-02-04 17:00:34.189 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] dopost work.2021-02-04 17:00:34.189 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.2021-02-04 17:00:34.191 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or isa file: D:\data\datax\hook2021-02-04 17:00:34.194 [job-0] INFO JobContainer -[total cpu info]=>averageCpu| maxDeltaCpu |minDeltaCpu-1.00% | -1.00% | -1.00%[total gc info]=>NAME| totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime |minDeltaGCTime

PS MarkSweep| 0 | 0 | 0 | 0.000s | 0.000s | 0.000s

PS Scavenge| 0 | 0 | 0 | 0.000s | 0.000s | 0.000s2021-02-04 17:00:34.194 [job-0] INFO JobContainer - PerfTrace not enable!

2021-02-04 17:00:34.195 [job-0] INFO StandAloneJobContainerCommunicator - Total 3 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%

2021-02-04 17:00:34.196 [job-0] INFO JobContainer -任务启动时刻 :2021-02-04 17:00:23任务结束时刻 :2021-02-04 17:00:34任务总计耗时 : 10s

任务平均流量 : 1B/s

记录写入速度 : 0rec/s

读出记录总数 :3读写失败总数 :0

5.mysql 不同库表数据迁移

migration.json

{"job": {"content": [

{"reader": {"name": "mysqlreader","parameter": {"column": ["id","name"],"connection": [

{"jdbcUrl": [ "jdbc:mysql://127.0.0.1:3306/dream"],"table": ["sys_user"]

}

],"password": "root","username": "root","where": "id>29"}

},"writer": {"name": "mysqlwriter","parameter": {"column": ["id","name"],"connection": [

{"jdbcUrl": "jdbc:mysql://127.0.0.1:3306/dream1","table": ["user"]

}

],"password": "root","username": "root","preSql": [],"session": [],"writeMode": "insert"}

}

}

],"setting": {"speed": {"channel": "1"}

}

}

}

输出

……2021-02-04 17:10:17.061 [job-0] INFO StandAloneJobContainerCommunicator - Total 4 records, 30 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%

2021-02-04 17:10:17.061 [job-0] INFO JobContainer -任务启动时刻 :2021-02-04 17:10:06任务结束时刻 :2021-02-04 17:10:17任务总计耗时 : 10s

任务平均流量 : 3B/s

记录写入速度 : 0rec/s

读出记录总数 :4读写失败总数 :0

查看 dream1的 user数据已经插入

其他参数可以查看github上mysqlreader和mysqlwriter下的doc

ab795faa5921f6d834779be97e797721.png

相关文章:

  • haccp体系的参照标准是_食品安全标准与ISO、GMP、HACCP等体系之间的相互关系
  • mac mysql 多实例_Mac上mysql多端口实例配置并设置开机自动启动
  • mysql 主要有哪些函数_mysql中有哪些函数
  • mysql索引失效解决6_简单介绍MySQL索引失效的几种情况
  • java 获取方法的参数_Java 8中如何获取参数名称的方法示例
  • java gui程序运行_13.JAVA之GUI编程将程序打包jar
  • java输入输出学生成绩_java从键盘输入学生成绩,找出最高分,并输出学生成绩等级。...
  • java $ 解析api_Java版抖音解析接口
  • java 10的 6次方_【Java】 剑指offer(15) 数值的整数次方
  • java 方法 返回值 int_关于java字节流类的read()方法返回值为int的思考
  • java 迭代器复制_java设计模式之迭代器模式
  • java取文件类型_java如何通过文件描述符获取文件类型
  • java pdb_当12C PDB遇上JDBC
  • java复选框添加选中事件_Java的JTable中添加JCheckBox,点击事件处理问题,求高手指点迷津...
  • java thread类是抽象类_抽象类详解
  • (ckeditor+ckfinder用法)Jquery,js获取ckeditor值
  • 「前端」从UglifyJSPlugin强制开启css压缩探究webpack插件运行机制
  • 【跃迁之路】【585天】程序员高效学习方法论探索系列(实验阶段342-2018.09.13)...
  • 2017 年终总结 —— 在路上
  • 2018天猫双11|这就是阿里云!不止有新技术,更有温暖的社会力量
  • Angular4 模板式表单用法以及验证
  • JavaScript 无符号位移运算符 三个大于号 的使用方法
  • JavaScript的使用你知道几种?(上)
  • java小心机(3)| 浅析finalize()
  • JS字符串转数字方法总结
  • PAT A1092
  • Protobuf3语言指南
  • 从伪并行的 Python 多线程说起
  • 服务器之间,相同帐号,实现免密钥登录
  • 关键词挖掘技术哪家强(一)基于node.js技术开发一个关键字查询工具
  • 那些被忽略的 JavaScript 数组方法细节
  • 区块链将重新定义世界
  • 如何利用MongoDB打造TOP榜小程序
  • 微信小程序开发问题汇总
  • 专访Pony.ai 楼天城:自动驾驶已经走过了“从0到1”,“规模”是行业的分水岭| 自动驾驶这十年 ...
  • ​iOS安全加固方法及实现
  • #Java第九次作业--输入输出流和文件操作
  • #多叉树深度遍历_结合深度学习的视频编码方法--帧内预测
  • (12)目标检测_SSD基于pytorch搭建代码
  • (Python) SOAP Web Service (HTTP POST)
  • (Redis使用系列) Springboot 使用Redis+Session实现Session共享 ,简单的单点登录 五
  • (六)软件测试分工
  • (论文阅读22/100)Learning a Deep Compact Image Representation for Visual Tracking
  • (三)模仿学习-Action数据的模仿
  • (三分钟)速览传统边缘检测算子
  • (转)Google的Objective-C编码规范
  • (转)Spring4.2.5+Hibernate4.3.11+Struts1.3.8集成方案一
  • .NET 5.0正式发布,有什么功能特性(翻译)
  • .NET 事件模型教程(二)
  • .NET6 命令行启动及发布单个Exe文件
  • .stream().map与.stream().flatMap的使用
  • //解决validator验证插件多个name相同只验证第一的问题
  • @Autowired注解的实现原理
  • @LoadBalanced 和 @RefreshScope 同时使用,负载均衡失效分析
  • [ASP]青辰网络考试管理系统NES X3.5