
Learning Big Data DAY58: Incremental Extraction of Data Tables

Assignment

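1 What are the common SQL optimization techniques? (a frequent interview question)
Use indexes: creating and using indexes sensibly is the key to query performance. Indexes speed up lookups, but they take extra storage and add overhead to every INSERT, DELETE, and UPDATE.
Avoid full table scans: be careful with != (not equal), functions applied to indexed columns, and IS NULL in the WHERE clause, because these can lead the optimizer to give up on the index and scan the whole table.
Optimize JOINs: make sure the join columns are indexed and keep the number of joins as small as possible. When unmatched rows are not needed, use INNER JOIN instead of LEFT JOIN so that no extra NULL-padded rows are returned.
Limit the result set: use a WHERE clause to restrict the rows wherever possible. If only part of the result is needed, cap the number of returned rows with LIMIT (MySQL) or TOP (SQL Server).
Process large result sets in batches: if a query returns a very large result set, fetch it step by step, for example with a cursor or with paginated queries.
Analyze queries with EXPLAIN: before running a query, use EXPLAIN (MySQL) or an equivalent tool to inspect the execution plan; this helps pinpoint the problem and guides the rewrite.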
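-- Oracle: write the plan into PLAN_TABLE, then display it
explain plan for <sql statement>;
select * from table(dbms_xplan.display);
-- MySQL
explain <sql statement>;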
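To see the "functions on indexed columns defeat the index" point in an actual plan, here is a minimal sketch that uses SQLite from Python's standard library as a stand-in for MySQL. The table `orders` and the index `idx_orders_billcode` are made up for illustration. SQLite's `EXPLAIN QUERY PLAN` reports `SEARCH ... USING INDEX` when it can seek through an index and `SCAN` when it reads the whole table; MySQL's `EXPLAIN` expresses the same distinction differently (for example `type=ALL` for a full scan), and the exact wording varies between versions.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, billcode TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_orders_billcode ON orders (billcode)")
conn.executemany(
    "INSERT INTO orders (billcode, amount) VALUES (?, ?)",
    [(f"B{i:05d}", i * 1.5) for i in range(1000)],
)

# Equality on the bare indexed column: the planner can seek through the index
indexed = query_plan(conn, "SELECT * FROM orders WHERE billcode = ?", ("B00042",))
# Same filter, but the column is wrapped in a function: the index cannot be used
wrapped = query_plan(conn, "SELECT * FROM orders WHERE lower(billcode) = ?", ("b00042",))

print("indexed:", indexed)
print("wrapped:", wrapped)
```

The first plan should mention `USING INDEX idx_orders_billcode`, the second a plain `SCAN` of `orders`. When the function cannot be avoided, storing the normalized value in its own indexed column is one common way out.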
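Avoid SELECT *: instead of fetching every column with SELECT *, select only the columns you need; this reduces the amount of data transferred and speeds up the query.
Keep data types consistent: when comparing values in a WHERE clause, make sure both sides have the same data type to avoid unnecessary implicit conversions, which can also stop an index from being used.
Optimize subqueries: where possible, rewrite subqueries as JOINs, since some subqueries execute inefficiently.
Use temporary tables and table variables: for complex queries, temporary tables or table variables can hold intermediate results and avoid redundant computation.

2 Writing Python helper scripts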
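You can use CDH02 as the development machine, so there is no need to start your own virtual machine.
Add the following to your SSH config file (note: it must go at the very top):
After logging in, create your own files there:
How to connect to Hive on CDH
Copy the driver 共享\项目课工具\hive-jdbc-uber-2.6.5.0-292.jar (in the shared "project course tools" folder) to your own computer; any location will do.
Open the CDH6 web UI address
Things to watch out for in collaborative development:
● Do not casually delete other people's databases or tables
● Do not casually delete other people's data paths on HDFS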
Quickly generate a config template:
python /opt/datax/bin/datax.py -r mysqlreader -w hdfswriter
Then fill in the properties:
u_accept_m_inc.json:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://zhiyun.pub:23306/erp"],
                "querySql": ["select * from u_accept_m"]
              }
            ],
            "password": "zhiyun",
            "username": "zhiyun"
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "column": [
              {"name":"id","type":"int"},
              {"name":"billcode","type":"string"},
              {"name":"acceptno","type":"string"},
              {"name":"receiptno","type":"string"},
              {"name":"orderno","type":"string"},
              {"name":"pactno","type":"string"},
              {"name":"busno","type":"string"},
              {"name":"vendorno","type":"string"},
              {"name":"paytype","type":"string"},
              {"name":"paydate","type":"string"},
              {"name":"buyer","type":"string"},
              {"name":"checker1","type":"string"},
              {"name":"checkbit1","type":"string"},
              {"name":"checker2","type":"string"},
              {"name":"checkbit2","type":"string"},
              {"name":"checker3","type":"string"},
              {"name":"checkbit3","type":"string"},
              {"name":"checker4","type":"string"},
              {"name":"checkbit4","type":"string"},
              {"name":"status","type":"string"},
              {"name":"accepttype","type":"string"},
              {"name":"createuser","type":"string"},
              {"name":"createtime","type":"string"},
              {"name":"notes","type":"string"},
              {"name":"stamp","type":"string"},
              {"name":"execdate","type":"string"},
              {"name":"sendno","type":"string"},
              {"name":"distno","type":"string"},
              {"name":"whlno","type":"string"},
              {"name":"dept","type":"string"},
              {"name":"bak1","type":"string"},
              {"name":"bak2","type":"string"},
              {"name":"bak3","type":"string"},
              {"name":"bak4","type":"string"},
              {"name":"bak5","type":"string"},
              {"name":"bak6","type":"string"},
              {"name":"bak7","type":"string"},
              {"name":"store_status","type":"string"},
              {"name":"bak88","type":"string"},
              {"name":"consingerid","type":"string"},
              {"name":"flag_status","type":"string"},
              {"name":"flag_date","type":"string"},
              {"name":"ownerid","type":"string"},
              {"name":"userdeptno","type":"string"},
              {"name":"bill_source","type":"string"},
              {"name":"puramt","type":"string"},
              {"name":"vendorsaler","type":"string"},
              {"name":"yycbillno","type":"string"},
              {"name":"yycstatus","type":"string"},
              {"name":"yycvendornotes","type":"string"},
              {"name":"yycexecdate","type":"string"},
              {"name":"wms_flag","type":"string"},
              {"name":"init_createtime","type":"string"},
              {"name":"init_createuser","type":"string"},
              {"name":"msfx_upflag","type":"string"},
              {"name":"vendor_address","type":"string"}
            ],
            "defaultFS": "hdfs://cdh02:8020",
            "fieldDelimiter": "\t",
            "fileName": "u_accept_m_inc.data",
            "fileType": "orc",
            "path": "/zhiyun/shihaihong/tmp/u_accept_m_inc",
            "writeMode": "truncate"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 2
      }
    }
  }
}
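Maintaining dozens of column entries by hand is error-prone: a string broken across lines or a stray trailing comma makes the whole file invalid JSON. One alternative, sketched below under the assumption that you are happy to generate the job file from Python, is to build the same reader/writer structure as a dict and serialize it with `json.dumps`, which always emits valid JSON. `build_datax_job` is a hypothetical helper, not part of DataX; its keys simply mirror the config above.

```python
import json

def build_datax_job(file_stem, columns, jdbc_url, username, password,
                    query_sql, hdfs_path, default_fs="hdfs://cdh02:8020"):
    """Hypothetical helper: a mysqlreader -> hdfswriter job with the same keys as the config above."""
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "connection": [{"jdbcUrl": [jdbc_url], "querySql": [query_sql]}],
                        "password": password,
                        "username": username,
                    },
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        # One {"name", "type"} entry per column, in the same order as the SELECT
                        "column": [{"name": name, "type": htype} for name, htype in columns],
                        "defaultFS": default_fs,
                        "fieldDelimiter": "\t",
                        "fileName": f"{file_stem}.data",
                        "fileType": "orc",
                        "path": hdfs_path,
                        "writeMode": "truncate",
                    },
                },
            }],
            "setting": {"speed": {"channel": 2}},
        }
    }

# Shortened column list for illustration; the real job lists every column of u_accept_m
columns = [("id", "int"), ("billcode", "string"), ("acceptno", "string")]
job = build_datax_job(
    "u_accept_m_inc", columns,
    jdbc_url="jdbc:mysql://zhiyun.pub:23306/erp",
    username="zhiyun", password="zhiyun",
    query_sql="select id, billcode, acceptno from u_accept_m",
    hdfs_path="/zhiyun/shihaihong/tmp/u_accept_m_inc",
)
text = json.dumps(job, indent=2, ensure_ascii=False)
print(text)
```

To produce the file that `datax.py` reads, write `text` to a path such as `/zhiyun/shihaihong/jobs/u_accept_m_inc.json` instead of printing it.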
u_accept_c_inc.json:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://zhiyun.pub:23306/erp"],
                "querySql": ["select * from u_accept_c"]
              }
            ],
            "password": "zhiyun",
            "username": "zhiyun"
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "column": [
              {"name":"id","type":"int"},
              {"name":"acceptno","type":"string"},
              {"name":"idno","type":"string"},
              {"name":"direct","type":"string"},
              {"name":"wareid","type":"string"},
              {"name":"stallno","type":"string"},
              {"name":"wareqty","type":"string"},
              {"name":"purprice","type":"string"},
              {"name":"purtax","type":"string"},
              {"name":"makeno","type":"string"},
              {"name":"makedate","type":"string"},
              {"name":"invalidate","type":"string"},
              {"name":"acb_batchno","type":"string"},
              {"name":"acb_no","type":"string"},
              {"name":"acb_idno","type":"string"},
              {"name":"acb_qty","type":"string"},
              {"name":"cker1","type":"string"},
              {"name":"cker2","type":"string"},
              {"name":"cker3","type":"string"},
              {"name":"notes","type":"string"},
              {"name":"leastpriceo","type":"string"},
              {"name":"leastpricen","type":"string"},
              {"name":"saleprice","type":"string"},
              {"name":"whlpriceo","type":"string"},
              {"name":"whlpricen","type":"string"},
              {"name":"purpriceo","type":"string"},
              {"name":"rowid","type":"string"},
              {"name":"indentqty","type":"string"},
              {"name":"invalidate_char","type":"string"},
              {"name":"bak1","type":"string"},
              {"name":"bak2","type":"string"},
              {"name":"bak3","type":"string"},
              {"name":"bak4","type":"string"},
              {"name":"bak5","type":"string"},
              {"name":"bak6","type":"string"},
              {"name":"bak7","type":"string"},
              {"name":"bak8","type":"string"},
              {"name":"maxqty","type":"string"},
              {"name":"midqty","type":"string"},
              {"name":"batchno_act","type":"string"},
              {"name":"idno_act","type":"string"},
              {"name":"indentprice","type":"string"},
              {"name":"prodid","type":"string"},
              {"name":"seal_stall","type":"string"},
              {"name":"flag2","type":"string"},
              {"name":"indentno","type":"string"},
              {"name":"checkstallno","type":"string"},
              {"name":"chkcont","type":"string"},
              {"name":"chkresult","type":"string"},
              {"name":"backprice","type":"string"},
              {"name":"bioavailability","type":"string"},
              {"name":"sterilemakeno","type":"string"},
              {"name":"chk","type":"string"},
              {"name":"groupid","type":"string"},
              {"name":"distprice","type":"string"},
              {"name":"bak9","type":"string"},
              {"name":"bak10","type":"string"},
              {"name":"purprice_no","type":"string"},
              {"name":"sterileinvalidate","type":"string"},
              {"name":"storeqty","type":"string"},
              {"name":"barcode","type":"string"},
              {"name":"check_qty","type":"string"},
              {"name":"check_nook_qty","type":"string"},
              {"name":"reason","type":"string"},
              {"name":"prod_addid","type":"string"},
              {"name":"douchecker1","type":"string"},
              {"name":"douchecker2","type":"string"},
              {"name":"tallyqty","type":"string"},
              {"name":"tally_checknoqty","type":"string"},
              {"name":"checkno_notes","type":"string"},
              {"name":"unqualified","type":"string"},
              {"name":"eq_no","type":"string"},
              {"name":"sterilemakedate","type":"string"},
              {"name":"tally_idno","type":"string"},
              {"name":"tally_makeno","type":"string"},
              {"name":"tally_stallno","type":"string"},
              {"name":"trayno","type":"string"},
              {"name":"bp_id","type":"string"},
              {"name":"packqty","type":"string"},
              {"name":"distcount","type":"string"},
              {"name":"payeetype","type":"string"},
              {"name":"wareqty_bak","type":"string"},
              {"name":"scan_flag","type":"string"},
              {"name":"quality_standard","type":"string"}
            ],
            "defaultFS": "hdfs://cdh02:8020",
            "fieldDelimiter": "\t",
            "fileName": "u_accept_c_inc.data",
            "fileType": "orc",
            "path": "/zhiyun/shihaihong/tmp/u_accept_c_inc",
            "writeMode": "truncate"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 2
      }
    }
  }
}
Handling the data columns
Write two Python helper scripts that automatically print the column definitions for DataX and Hive.
u_accept_m_inc.py:
#!/usr/bin/env python3
# Automatically print the column definitions for DataX and Hive
ddl = '''
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`billcode` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`acceptno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`receiptno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`orderno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`pactno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`busno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`vendorno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`paytype` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`paydate` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`buyer` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`checker1` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`checkbit1` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`checker2` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`checkbit2` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`checker3` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`checkbit3` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`checker4` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`checkbit4` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`status` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`accepttype` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`createuser` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`createtime` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`notes` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`stamp` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`execdate` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`sendno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`distno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`whlno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`dept` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak1` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak2` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak3` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak4` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak5` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak6` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak7` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`store_status` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak88` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`consingerid` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`flag_status` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`flag_date` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`ownerid` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`userdeptno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bill_source` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`puramt` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`vendorsaler` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`yycbillno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`yycstatus` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`yycvendornotes` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`yycexecdate` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`wms_flag` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`init_createtime` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`init_createuser` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`msfx_upflag` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`vendor_address` varchar(255) COLLATE utf8_bin DEFAULT NULL,
'''

# Split the DDL into one column definition per line
lines = ddl.strip().split("\n")
fields = []
for line in lines:
    data = line.strip().split(" ")
    # Column name without the MySQL backticks
    field_name = data[0].replace("`", "")
    # MySQL base type, e.g. "varchar(255)" -> "varchar"
    field_type = data[1].split("(")[0]
    # Map to a Hive type; anything not listed falls back to string
    field_hive_type = "string"
    if field_type in ("int", "tinyint"):
        field_hive_type = "int"
    elif field_type == "bigint":
        field_hive_type = "bigint"  # int would overflow for large values
    elif field_type in ("float", "double"):
        field_hive_type = field_type  # keep double precision instead of narrowing to float
    fields.append([field_name, field_hive_type])

print("=============== DataX column list ===============")
# Joined with ",\n" so the last entry has no trailing comma and can be pasted as-is
print(",\n".join('{"name":"' + name + '","type":"' + htype + '"}' for name, htype in fields))
print("=============== Hive column list ===============")
print(",\n".join(f"{name} {htype}" for name, htype in fields))
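The helper script splits each line on spaces and expects exactly one column per line, so if two column definitions end up on the same line (as happens easily when copying DDL around), the second one is silently lost. Below is a sketch of a more tolerant parser, assuming every column definition starts with a backticked name followed by its base type: it pulls every `` `name` type`` pair out with a regular expression and joins the output without a trailing comma. The type mapping (including `bigint` → `bigint` and `double` → `double`) is an assumption; adjust it to your warehouse conventions. The `amount` column in the sample is made up.

```python
import json
import re

# A column definition starts with a backticked name followed by its base type,
# e.g. "`id` int(10) unsigned ..." or "`billcode` varchar(255) ..."
COLUMN_RE = re.compile(r"`(\w+)`\s+(\w+)")

# MySQL base type -> Hive type (assumed mapping); anything else becomes string
TYPE_MAP = {
    "tinyint": "int", "smallint": "int", "int": "int",
    "bigint": "bigint", "float": "float", "double": "double",
}

def parse_mysql_columns(ddl):
    """Return [(name, hive_type), ...] for every backticked column in the DDL text."""
    return [(name, TYPE_MAP.get(mysql_type.lower(), "string"))
            for name, mysql_type in COLUMN_RE.findall(ddl)]

def datax_columns(fields):
    """Compact DataX column objects, one per line, without a trailing comma."""
    return ",\n".join(json.dumps({"name": n, "type": t}, separators=(",", ":")) for n, t in fields)

def hive_columns(fields):
    """Hive column definitions, one per line, without a trailing comma."""
    return ",\n".join(f"{n} {t}" for n, t in fields)

# Two columns fused onto one line are still both picked up
sample = """
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,`billcode` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`amount` double DEFAULT NULL
"""
fields = parse_mysql_columns(sample)
print(datax_columns(fields))
print(hive_columns(fields))
```

Lines such as `PRIMARY KEY (`id`)` do not match, because the backticked name there is followed by `)` rather than whitespace and a type.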
Create the Hive tables:
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- incremental table
create external table if not exists ods_shihaihong.u_accept_m_inc(
id int,
billcode string,
acceptno string,
receiptno string,
orderno string,
pactno string,
busno string,
vendorno string,
paytype string,
paydate string,
buyer string,
checker1 string,
checkbit1 string,
checker2 string,
checkbit2 string,
checker3 string,
checkbit3 string,
checker4 string,
checkbit4 string,
status string,
accepttype string,
createuser string,
notes string,
stamp string,
execdate string,
sendno string,
distno string,
whlno string,
dept string,
bak1 string,
bak2 string,
bak3 string,
bak4 string,
bak5 string,
bak6 string,
bak7 string,
store_status string,
bak88 string,
consingerid string,
flag_status string,
flag_date string,
ownerid string,
userdeptno string,
bill_source string,
puramt string,
vendorsaler string,
yycbillno string,
yycstatus string,
yycvendornotes string,
yycexecdate string,
wms_flag string,
init_createtime string,
init_createuser string,
msfx_upflag string,
vendor_address string
) partitioned by (createtime string)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/u_accept_m_inc";
Extract the data:
hadoop fs -mkdir -p /zhiyun/shihaihong/tmp/u_accept_m_inc
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/u_accept_m_inc.json
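Load the data:
In the Hive shell (the table is partitioned, so the target partition must be specified):
load data inpath "/zhiyun/shihaihong/tmp/u_accept_m_inc/*"
overwrite into table ods_shihaihong.u_accept_m_inc partition(createtime='2016-11-10');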
Verify:
show partitions ods_shihaihong.u_accept_m_inc;
select count(1) from ods_shihaihong.u_accept_m_inc where createtime = "2016-11-10";
select * from ods_shihaihong.u_accept_m_inc where createtime = "2016-11-10" limit 5;
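Write the scheduling script:
u_accept_m_inc.sh:
#!/bin/bash
# Extraction day: the first argument if given, otherwise yesterday
day=$(date -d "yesterday" +%Y-%m-%d)
if [ -n "$1" ]; then
  day=$1
fi
echo "Extraction date: $day"
echo "Generating the incremental config file"
mkdir -p /zhiyun/shihaihong/jobs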
echo '{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://zhiyun.pub:23306/erp"],
                "querySql": ["select * from u_accept_m where createtime between '"'$day 00:00:00'"' and '"'$day 23:59:59'"'"]
              }
            ],
            "password": "zhiyun",
            "username": "zhiyun"
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "column": [
              {"name":"id","type":"int"},
              {"name":"billcode","type":"string"},
              {"name":"acceptno","type":"string"},
              {"name":"receiptno","type":"string"},
              {"name":"orderno","type":"string"},
              {"name":"pactno","type":"string"},
              {"name":"busno","type":"string"},
              {"name":"vendorno","type":"string"},
              {"name":"paytype","type":"string"},
              {"name":"paydate","type":"string"},
              {"name":"buyer","type":"string"},
              {"name":"checker1","type":"string"},
              {"name":"checkbit1","type":"string"},
              {"name":"checker2","type":"string"},
              {"name":"checkbit2","type":"string"},
              {"name":"checker3","type":"string"},
              {"name":"checkbit3","type":"string"},
              {"name":"checker4","type":"string"},
              {"name":"checkbit4","type":"string"},
              {"name":"status","type":"string"},
              {"name":"accepttype","type":"string"},
              {"name":"createuser","type":"string"},
              {"name":"createtime","type":"string"},
              {"name":"notes","type":"string"},
              {"name":"stamp","type":"string"},
              {"name":"execdate","type":"string"},
              {"name":"sendno","type":"string"},
              {"name":"distno","type":"string"},
              {"name":"whlno","type":"string"},
              {"name":"dept","type":"string"},
              {"name":"bak1","type":"string"},
              {"name":"bak2","type":"string"},
              {"name":"bak3","type":"string"},
              {"name":"bak4","type":"string"},
              {"name":"bak5","type":"string"},
              {"name":"bak6","type":"string"},
              {"name":"bak7","type":"string"},
              {"name":"store_status","type":"string"},
              {"name":"bak88","type":"string"},
              {"name":"consingerid","type":"string"},
              {"name":"flag_status","type":"string"},
              {"name":"flag_date","type":"string"},
              {"name":"ownerid","type":"string"},
              {"name":"userdeptno","type":"string"},
              {"name":"bill_source","type":"string"},
              {"name":"puramt","type":"string"},
              {"name":"vendorsaler","type":"string"},
              {"name":"yycbillno","type":"string"},
              {"name":"yycstatus","type":"string"},
              {"name":"yycvendornotes","type":"string"},
              {"name":"yycexecdate","type":"string"},
              {"name":"wms_flag","type":"string"},
              {"name":"init_createtime","type":"string"},
              {"name":"init_createuser","type":"string"},
              {"name":"msfx_upflag","type":"string"},
              {"name":"vendor_address","type":"string"}
            ],
            "defaultFS": "hdfs://cdh02:8020",
            "fieldDelimiter": "\t",
            "fileName": "u_accept_m_inc.data",
            "fileType": "orc",
            "path": "/zhiyun/shihaihong/tmp/u_accept_m_inc",
            "writeMode": "truncate"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 2
      }
    }
  }
}' > /zhiyun/shihaihong/jobs/u_accept_m_inc.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/tmp/u_accept_m_inc
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/u_accept_m_inc.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- incremental table (Hive has no unsigned types, so id is a plain int)
create external table if not exists ods_shihaihong.u_accept_m_inc(
id int,
billcode string,
acceptno string,
receiptno string,
orderno string,
pactno string,
busno string,
vendorno string,
paytype string,
paydate string,
buyer string,
checker1 string,
checkbit1 string,
checker2 string,
checkbit2 string,
checker3 string,
checkbit3 string,
checker4 string,
checkbit4 string,
status string,
accepttype string,
createuser string,
notes string,
stamp string,
execdate string,
sendno string,
distno string,
whlno string,
dept string,
bak1 string,
bak2 string,
bak3 string,
bak4 string,
bak5 string,
bak6 string,
bak7 string,
store_status string,
bak88 string,
consingerid string,
flag_status string,
flag_date string,
ownerid string,
userdeptno string,
bill_source string,
puramt string,
vendorsaler string,
yycbillno string,
yycstatus string,
yycvendornotes string,
yycexecdate string,
wms_flag string,
init_createtime string,
init_createuser string,
msfx_upflag string,
vendor_address string
) partitioned by (createtime string)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/u_accept_m_inc";
'
echo "Loading data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
load data inpath \"/zhiyun/shihaihong/tmp/u_accept_m_inc/*\"
overwrite into table ods_shihaihong.u_accept_m_inc
partition(createtime='$day');
"
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
-- check the partitions
show partitions ods_shihaihong.u_accept_m_inc;
-- check the row count of the partition
select count(1) from ods_shihaihong.u_accept_m_inc where createtime = \"$day\";
-- check sample rows
select * from ods_shihaihong.u_accept_m_inc where createtime = \"$day\" limit 5;
"
echo "Extraction finished"
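The scheduling script chooses its extraction day as "the first argument if given, otherwise yesterday" and turns it into a one-day window for `querySql`. If you prefer to drive the jobs from Python, a minimal sketch of the same logic follows; it additionally validates the argument, so an impossible date such as `2018-13-01` fails immediately instead of producing an empty extraction. `resolve_day` and `day_window` are hypothetical names.

```python
from datetime import date, datetime, timedelta

def resolve_day(argv, today=None):
    """Mirror of the shell logic: use argv[1] if present, otherwise yesterday."""
    today = today or date.today()
    if len(argv) > 1 and argv[1]:
        # strptime raises ValueError for impossible dates such as 2018-13-01
        return datetime.strptime(argv[1], "%Y-%m-%d").date()
    return today - timedelta(days=1)

def day_window(day):
    """Closed interval covering one calendar day, matching the querySql filter."""
    return f"{day} 00:00:00", f"{day} 23:59:59"

# The three dates used for u_accept_m_inc in the assignment
for arg in ["2018-03-12", "2020-01-01", "2015-07-02"]:
    start, end = day_window(resolve_day(["u_accept_m_inc.sh", arg]))
    print(f"createtime between '{start}' and '{end}'")
```

Because `createtime` is stored as a string, the comparison is lexicographic. If values could ever carry fractional seconds, a half-open range (`createtime >= 'D' and createtime < 'D+1'`) would be the safer filter; the closed form is kept here only to match the script.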
Import the other table the same way:
u_accept_c_inc.py:
#!/usr/bin/env python3
# Automatically print the column definitions for DataX and Hive
ddl = '''
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`acceptno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`idno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`direct` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`wareid` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`stallno` varchar(255) COLLATE utf8_bin NOT NULL,
`wareqty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`purprice` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`purtax` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`makeno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`makedate` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`invalidate` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`acb_batchno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`acb_no` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`acb_idno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`acb_qty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`cker1` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`cker2` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`cker3` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`notes` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`leastpriceo` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`leastpricen` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`saleprice` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`whlpriceo` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`whlpricen` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`purpriceo` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`rowid` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`indentqty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`invalidate_char` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak1` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak2` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak3` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak4` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak5` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak6` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak7` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak8` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`maxqty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`midqty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`batchno_act` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`idno_act` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`indentprice` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`prodid` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`seal_stall` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`flag2` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`indentno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`checkstallno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`chkcont` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`chkresult` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`backprice` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bioavailability` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`sterilemakeno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`chk` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`groupid` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`distprice` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak9` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bak10` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`purprice_no` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`sterileinvalidate` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`storeqty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`barcode` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`check_qty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`check_nook_qty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`reason` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`prod_addid` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`douchecker1` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`douchecker2` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`tallyqty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`tally_checknoqty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`checkno_notes` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`unqualified` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`eq_no` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`sterilemakedate` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`tally_idno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`tally_makeno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`tally_stallno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`trayno` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`bp_id` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`packqty` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`distcount` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`payeetype` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`wareqty_bak` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`scan_flag` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`quality_standard` varchar(255) COLLATE utf8_bin DEFAULT NULL
'''

# Split the DDL into one column definition per line
lines = ddl.strip().split("\n")
fields = []
for line in lines:
    data = line.strip().split(" ")
    # Column name without the MySQL backticks
    field_name = data[0].replace("`", "")
    # MySQL base type, e.g. "varchar(255)" -> "varchar"
    field_type = data[1].split("(")[0]
    # Map to a Hive type; anything not listed falls back to string
    field_hive_type = "string"
    if field_type in ("int", "tinyint"):
        field_hive_type = "int"
    elif field_type == "bigint":
        field_hive_type = "bigint"  # int would overflow for large values
    elif field_type in ("float", "double"):
        field_hive_type = field_type  # keep double precision instead of narrowing to float
    fields.append([field_name, field_hive_type])

print("=============== DataX column list ===============")
# Joined with ",\n" so the last entry has no trailing comma and can be pasted as-is
print(",\n".join('{"name":"' + name + '","type":"' + htype + '"}' for name, htype in fields))
print("=============== Hive column list ===============")
print(",\n".join(f"{name} {htype}" for name, htype in fields))
u_accept_c_inc.json:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://zhiyun.pub:23306/erp"],
                "querySql": ["select u_accept_c.* from u_accept_c left join u_accept_m on u_accept_c.acceptno=u_accept_m.acceptno"]
              }
            ],
            "password": "zhiyun",
            "username": "zhiyun"
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "column": [
              {"name":"id","type":"int"},
              {"name":"acceptno","type":"string"},
              {"name":"idno","type":"string"},
              {"name":"direct","type":"string"},
              {"name":"wareid","type":"string"},
              {"name":"stallno","type":"string"},
              {"name":"wareqty","type":"string"},
              {"name":"purprice","type":"string"},
              {"name":"purtax","type":"string"},
              {"name":"makeno","type":"string"},
              {"name":"makedate","type":"string"},
              {"name":"invalidate","type":"string"},
              {"name":"acb_batchno","type":"string"},
              {"name":"acb_no","type":"string"},
              {"name":"acb_idno","type":"string"},
              {"name":"acb_qty","type":"string"},
              {"name":"cker1","type":"string"},
              {"name":"cker2","type":"string"},
              {"name":"cker3","type":"string"},
              {"name":"notes","type":"string"},
              {"name":"leastpriceo","type":"string"},
              {"name":"leastpricen","type":"string"},
              {"name":"saleprice","type":"string"},
              {"name":"whlpriceo","type":"string"},
              {"name":"whlpricen","type":"string"},
              {"name":"purpriceo","type":"string"},
              {"name":"rowid","type":"string"},
              {"name":"indentqty","type":"string"},
              {"name":"invalidate_char","type":"string"},
              {"name":"bak1","type":"string"},
              {"name":"bak2","type":"string"},
              {"name":"bak3","type":"string"},
              {"name":"bak4","type":"string"},
              {"name":"bak5","type":"string"},
              {"name":"bak6","type":"string"},
              {"name":"bak7","type":"string"},
              {"name":"bak8","type":"string"},
              {"name":"maxqty","type":"string"},
              {"name":"midqty","type":"string"},
              {"name":"batchno_act","type":"string"},
              {"name":"idno_act","type":"string"},
              {"name":"indentprice","type":"string"},
              {"name":"prodid","type":"string"},
              {"name":"seal_stall","type":"string"},
              {"name":"flag2","type":"string"},
              {"name":"indentno","type":"string"},
              {"name":"checkstallno","type":"string"},
              {"name":"chkcont","type":"string"},
              {"name":"chkresult","type":"string"},
              {"name":"backprice","type":"string"},
              {"name":"bioavailability","type":"string"},
              {"name":"sterilemakeno","type":"string"},
              {"name":"chk","type":"string"},
              {"name":"groupid","type":"string"},
              {"name":"distprice","type":"string"},
              {"name":"bak9","type":"string"},
              {"name":"bak10","type":"string"},
              {"name":"purprice_no","type":"string"},
              {"name":"sterileinvalidate","type":"string"},
              {"name":"storeqty","type":"string"},
              {"name":"barcode","type":"string"},
              {"name":"check_qty","type":"string"},
              {"name":"check_nook_qty","type":"string"},
              {"name":"reason","type":"string"},
              {"name":"prod_addid","type":"string"},
              {"name":"douchecker1","type":"string"},
              {"name":"douchecker2","type":"string"},
              {"name":"tallyqty","type":"string"},
              {"name":"tally_checknoqty","type":"string"},
              {"name":"checkno_notes","type":"string"},
              {"name":"unqualified","type":"string"},
              {"name":"eq_no","type":"string"},
              {"name":"sterilemakedate","type":"string"},
              {"name":"tally_idno","type":"string"},
              {"name":"tally_makeno","type":"string"},
              {"name":"tally_stallno","type":"string"},
              {"name":"trayno","type":"string"},
              {"name":"bp_id","type":"string"},
              {"name":"packqty","type":"string"},
              {"name":"distcount","type":"string"},
              {"name":"payeetype","type":"string"},
              {"name":"wareqty_bak","type":"string"},
              {"name":"scan_flag","type":"string"},
              {"name":"quality_standard","type":"string"}
            ],
            "defaultFS": "hdfs://cdh02:8020",
            "fieldDelimiter": "\t",
            "fileName": "u_accept_c_inc.data",
            "fileType": "orc",
            "path": "/zhiyun/shihaihong/tmp/u_accept_c_inc",
            "writeMode": "truncate"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 2
      }
    }
  }
}
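u_accept_c_inc.sh:
#!/bin/bash
# Extraction day: the first argument if given, otherwise yesterday
day=$(date -d "yesterday" +%Y-%m-%d)
if [ -n "$1" ]; then
  day=$1
fi
echo "Extraction date: $day"
echo "Generating the incremental config file"
mkdir -p /zhiyun/shihaihong/jobs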
echo '
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"],
                "querySql": ["select a.* from u_accept_c a left join u_accept_m b on a.acceptno = b.acceptno where b.createtime between '"'$day 00:00:00'"' and '"'$day 23:59:59'"'"]
              }
            ],
            "password": "zhiyun",
            "username": "zhiyun"
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "column": [
              {"name":"id","type":"int"},
              {"name":"acceptno","type":"string"},
              {"name":"idno","type":"string"},
              {"name":"direct","type":"string"},
              {"name":"wareid","type":"string"},
              {"name":"stallno","type":"string"},
              {"name":"wareqty","type":"string"},
              {"name":"purprice","type":"string"},
              {"name":"purtax","type":"string"},
              {"name":"makeno","type":"string"},
              {"name":"makedate","type":"string"},
              {"name":"invalidate","type":"string"},
              {"name":"acb_batchno","type":"string"},
              {"name":"acb_no","type":"string"},
              {"name":"acb_idno","type":"string"},
              {"name":"acb_qty","type":"string"},
              {"name":"cker1","type":"string"},
              {"name":"cker2","type":"string"},
              {"name":"cker3","type":"string"},
              {"name":"notes","type":"string"},
              {"name":"leastpriceo","type":"string"},
              {"name":"leastpricen","type":"string"},
              {"name":"saleprice","type":"string"},
              {"name":"whlpriceo","type":"string"},
              {"name":"whlpricen","type":"string"},
              {"name":"purpriceo","type":"string"},
              {"name":"rowid","type":"string"},
              {"name":"indentqty","type":"string"},
              {"name":"invalidate_char","type":"string"},
              {"name":"bak1","type":"string"},
              {"name":"bak2","type":"string"},
              {"name":"bak3","type":"string"},
              {"name":"bak4","type":"string"},
              {"name":"bak5","type":"string"},
              {"name":"bak6","type":"string"},
              {"name":"bak7","type":"string"},
              {"name":"bak8","type":"string"},
              {"name":"maxqty","type":"string"},
              {"name":"midqty","type":"string"},
              {"name":"batchno_act","type":"string"},
              {"name":"idno_act","type":"string"},
              {"name":"indentprice","type":"string"},
              {"name":"prodid","type":"string"},
              {"name":"seal_stall","type":"string"},
              {"name":"flag2","type":"string"},
              {"name":"indentno","type":"string"},
              {"name":"checkstallno","type":"string"},
              {"name":"chkcont","type":"string"},
              {"name":"chkresult","type":"string"},
              {"name":"backprice","type":"string"},
              {"name":"bioavailability","type":"string"},
              {"name":"sterilemakeno","type":"string"},
              {"name":"chk","type":"string"},
              {"name":"groupid","type":"string"},
              {"name":"distprice","type":"string"},
              {"name":"bak9","type":"string"},
              {"name":"bak10","type":"string"},
              {"name":"purprice_no","type":"string"},
              {"name":"sterileinvalidate","type":"string"},
              {"name":"storeqty","type":"string"},
              {"name":"barcode","type":"string"},
              {"name":"check_qty","type":"string"},
              {"name":"check_nook_qty","type":"string"},
              {"name":"reason","type":"string"},
              {"name":"prod_addid","type":"string"},
              {"name":"douchecker1","type":"string"},
              {"name":"douchecker2","type":"string"},
              {"name":"tallyqty","type":"string"},
              {"name":"tally_checknoqty","type":"string"},
              {"name":"checkno_notes","type":"string"},
              {"name":"unqualified","type":"string"},
              {"name":"eq_no","type":"string"},
              {"name":"sterilemakedate","type":"string"},
              {"name":"tally_idno","type":"string"},
              {"name":"tally_makeno","type":"string"},
              {"name":"tally_stallno","type":"string"},
              {"name":"trayno","type":"string"},
              {"name":"bp_id","type":"string"},
              {"name":"packqty","type":"string"},
              {"name":"distcount","type":"string"},
              {"name":"payeetype","type":"string"},
              {"name":"wareqty_bak","type":"string"},
              {"name":"scan_flag","type":"string"},
              {"name":"quality_standard","type":"string"}
            ],
            "defaultFS": "hdfs://cdh02:8020",
            "fieldDelimiter": "\t",
            "fileName": "u_accept_c_inc.data",
            "fileType": "orc",
            "path": "/zhiyun/shihaihong/tmp/u_accept_c_inc",
            "writeMode": "truncate"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 2
      }
    }
  }
}
' > /zhiyun/shihaihong/jobs/u_accept_c_inc.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/tmp/u_accept_c_inc
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/u_accept_c_inc.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location
"/zhiyun/shihaihong/ods";
-- incremental table
create external table if not exists ods_shihaihong.u_accept_c_inc(
id int,
acceptno string,
idno string,
direct string,
wareid string,
stallno string,
wareqty string,
purprice string,
purtax string,
makeno string,
makedate string,
invalidate string,
acb_batchno string,
acb_no string,
acb_idno string,
acb_qty string,
cker1 string,
cker2 string,
cker3 string,
notes string,
leastpriceo string,
leastpricen string,
saleprice string,
whlpriceo string,
whlpricen string,
purpriceo string,
rowid string,
indentqty string,
invalidate_char string,
bak1 string,
bak2 string,
bak3 string,
bak4 string,
bak5 string,
bak6 string,
bak7 string,
bak8 string,
maxqty string,
midqty string,
batchno_act string,
idno_act string,
indentprice string,
prodid string,
seal_stall string,
flag2 string,
indentno string,
checkstallno string,
chkcont string,
chkresult string,
backprice string,
bioavailability string,
sterilemakeno string,
chk string,
groupid string,
distprice string,
bak9 string,
bak10 string,
purprice_no string,
sterileinvalidate string,
storeqty string,
barcode string,
check_qty string,
check_nook_qty string,
reason string,
prod_addid string,
douchecker1 string,
douchecker2 string,
tallyqty string,
tally_checknoqty string,
checkno_notes string,
unqualified string,
eq_no string,
sterilemakedate string,
tally_idno string,
tally_makeno string,
tally_stallno string,
trayno string,
bp_id string,
packqty string,
distcount string,
payeetype string,
wareqty_bak string,
scan_flag string,
quality_standard string
) partitioned by (dt string)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/u_accept_c_inc";
'
echo "Loading data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
load data inpath '/zhiyun/shihaihong/tmp/u_accept_c_inc/*'
overwrite into table ods_shihaihong.u_accept_c_inc
partition(dt='$day');
"
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
show partitions ods_shihaihong.u_accept_c_inc;
select count(1) from ods_shihaihong.u_accept_c_inc where dt =
'$day';
select * from ods_shihaihong.u_accept_c_inc where dt = '$day' limit
5;
"
echo "Extraction finished"
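`u_accept_c` has no timestamp of its own, so the script finds a day's detail rows through their header rows in `u_accept_m`. Because the WHERE clause filters on the header's `createtime`, detail rows without a matching header (which a LEFT JOIN would pad with NULLs) are discarded anyway, so the query behaves like an inner join. The sketch below reproduces the pattern in SQLite with a handful of made-up rows and tables trimmed to the columns involved, so you can check which rows each day returns and that the LEFT JOIN and inner JOIN forms agree. Whether MySQL avoids a full scan still depends on indexes existing on `u_accept_m.createtime` and `u_accept_c.acceptno`, which is an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE u_accept_m (id INTEGER PRIMARY KEY, acceptno TEXT, createtime TEXT);
CREATE TABLE u_accept_c (id INTEGER PRIMARY KEY, acceptno TEXT, wareid TEXT);
CREATE INDEX idx_m_createtime ON u_accept_m (createtime);
CREATE INDEX idx_c_acceptno ON u_accept_c (acceptno);
""")
# Made-up header rows (one per acceptance note) and detail rows
conn.executemany(
    "INSERT INTO u_accept_m (acceptno, createtime) VALUES (?, ?)",
    [("A1", "2016-10-10 08:00:00"), ("A2", "2016-10-10 17:30:00"), ("A3", "2016-11-11 09:15:00")],
)
conn.executemany(
    "INSERT INTO u_accept_c (acceptno, wareid) VALUES (?, ?)",
    [("A1", "W01"), ("A1", "W02"), ("A2", "W03"), ("A3", "W04"), ("A9", "W99")],  # A9 has no header
)

# Detail rows whose header was created on the given day
INCREMENTAL_SQL = """
SELECT c.*
FROM u_accept_c AS c
JOIN u_accept_m AS m ON c.acceptno = m.acceptno
WHERE m.createtime BETWEEN ? AND ?
ORDER BY c.id
"""
# Same query written as in the script; the WHERE clause drops NULL-padded rows
LEFT_JOIN_SQL = INCREMENTAL_SQL.replace("JOIN u_accept_m", "LEFT JOIN u_accept_m")

def extract_day(day, sql=INCREMENTAL_SQL):
    return conn.execute(sql, (f"{day} 00:00:00", f"{day} 23:59:59")).fetchall()

for day in ("2016-10-10", "2016-11-11", "2017-10-10"):
    print(day, [row[2] for row in extract_day(day)])
```

The orphan detail row `W99` never appears, whichever join form is used.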
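3 Complete the incremental processing for both tables and deploy two scheduled jobs
3.1 Run each job with 3 different dates passed as the argument; no run may fail
3.2 The extraction SQL must not do a full table scan
u_accept_m_inc:
2018-03-12:
2020-01-01:
2015-07-02:
u_accept_c_inc:
2016-10-10:
2016-11-11:
2017-10-10: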
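To exercise requirement 3.1 without typing each command by hand, a small driver can loop over the dates and collect failures. This is a sketch assuming both scripts sit in the current directory and are run with `bash`; `run_backfill` is a hypothetical helper. Note that the scripts as written end with an `echo`, so they exit with status 0 even if DataX or beeline failed; adding `set -e` near the top of each script (and assuming those tools report failures through their exit status) is what makes the return code meaningful.

```python
import subprocess

# Dates from the assignment, per scheduling script
JOBS = {
    "u_accept_m_inc.sh": ["2018-03-12", "2020-01-01", "2015-07-02"],
    "u_accept_c_inc.sh": ["2016-10-10", "2016-11-11", "2017-10-10"],
}

def run_backfill(jobs, runner=subprocess.run):
    """Run every script once per date; return (script, day, returncode) for each failed run."""
    failures = []
    for script, days in jobs.items():
        for day in days:
            result = runner(["bash", script, day])
            if result.returncode != 0:
                failures.append((script, day, result.returncode))
    return failures
```

Calling `run_backfill(JOBS)` on the cluster node runs all six extractions in order; an empty list means every run exited with status 0. The `runner` parameter exists only so the loop can be tried without a cluster, by passing a stand-in that returns `subprocess.CompletedProcess` objects.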
