
Hive Optimization (2): Optimizing MR Jobs Whose Reduce Count the System Estimates as 1

Terminology:

云霄飞车 ("Roller Coaster"): Hive's own estimate of an MR Job's reduce count can be unreasonably low, leaving the job with too few reducers and making it run very slowly; the 云霄飞车 project improves Hive's reduce-count estimation.

map_input_bytes: size of the map input, in bytes

map_output_bytes: size of the map output, in bytes

Optimization Background

Phase 1 of 云霄飞车 had a limitation: it could only optimize MR Jobs with a reduce count greater than 1, because it could not tell whether a reduce count of 1 was fixed at compile time or estimated from the map input. A compile-time count of 1 must not be changed, or the query result would be wrong; an estimated count of 1 should be re-estimated, especially when map_output_bytes is far larger than map_input_bytes, where a single reducer makes the reduce phase extremely slow.

Solution:

Determine whether a reduce count of 1 was fixed at compile time or estimated from map_input_bytes. Implementation: after compilation, collect the set of Jobs whose reduce count of 1 was determined at compile time. When 云霄飞车 later optimizes a Job whose reduce count is 1, it checks that set: if the Job is not in it, the count was estimated rather than compile-time fixed, so the Job is optimized; otherwise it is left alone. A minimal sketch of this check follows.
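A minimal sketch of this bookkeeping, assuming hypothetical names (CompileTimeReduceRegistry, shouldOptimize); these are illustrative, not actual Hive APIs:

import java.util.HashSet;
import java.util.Set;

public class CompileTimeReduceRegistry {
    // Job IDs whose reduce count of 1 was fixed at compile time,
    // collected immediately after query compilation.
    private final Set<String> compileTimeOne = new HashSet<String>();

    public void recordCompileTimeOne(String jobId) {
        compileTimeOne.add(jobId);
    }

    // Re-estimation is always safe when reduce > 1; when reduce == 1 it is
    // safe only if the value was estimated, i.e. the Job is not in the set.
    public boolean shouldOptimize(String jobId, int reducers) {
        return reducers > 1 || !compileTimeOne.contains(jobId);
    }
}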

 

Optimization Algorithm:

Hive's logic for estimating the reduce count is as follows (a sketch of the flow appears after the list):

  1. Check whether the Job needs a reduce phase at all; if not, set the reduce count to 0 and stop; otherwise go to step (2).
  2. Check whether the Job's reduce count was fixed to 1 at compile time; if so, set it to 1 and stop; otherwise go to step (3).
  3. Check whether the reduce count was set manually; if so, use that value and stop; otherwise go to step (4).
  4. Estimate the reduce count from the map input size (map_input_bytes), by default about one reducer per 1 GB of input, and set the Job's reduce count to that estimate.
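A minimal sketch of this four-step flow, assuming a hypothetical JobDesc accessor interface; only the ~1 GB-per-reducer default in step (4) comes from the text above, everything else is illustrative:

interface JobDesc {
    boolean needsReduce();            // step 1
    boolean compileTimeOneReducer();  // step 2
    int manualReducers();             // step 3; 0 means "not set"
}

public class ReducerResolver {
    public static int resolveReducers(JobDesc job, long mapInputBytes) {
        if (!job.needsReduce()) {
            return 0;                                    // (1) no reduce phase at all
        }
        if (job.compileTimeOneReducer()) {
            return 1;                                    // (2) fixed at compile time
        }
        if (job.manualReducers() > 0) {
            return job.manualReducers();                 // (3) set by hand
        }
        long bytesPerReducer = 1_000_000_000L;           // (4) ~1 GB per reducer by default
        long est = (mapInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, est);                   // ceil(input / load), at least 1
    }
}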

The 云霄飞车 project optimizes step (4) above, i.e. the logic that estimates the reduce count from the input size. For MR Jobs where Hive estimates more than one reducer, the count is simply re-estimated with the algorithm below. For MR Jobs where Hive estimates exactly one reducer, the project first checks whether that value was fixed at compile time: if it was, no optimization is applied; otherwise the same re-estimation algorithm is used.

 

The re-estimation algorithm, keyed on map_output_bytes ("datasize" below), is:

map_output_bytes     formula                            reduce range
0 - 30 GB            datasize / 128M                    1 - 240
30 GB - 100 GB       240 + (datasize - 30 GB) / 512M    240 - 380
100 GB - 500 GB      380 + (datasize - 100 GB) / 1G     380 - 780
over 500 GB          780 + (datasize - 500 GB) / 2G     780 -
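A direct transcription of the table into code (a sketch; the function name is illustrative, and M/G are taken as binary units, which is consistent with the table's boundary values, e.g. 30G / 128M = 240):

public class TieredReducerEstimator {
    // Re-estimate the reduce count from map_output_bytes ("datasize" in the table).
    public static int reestimateReducers(long mapOutputBytes) {
        final long M = 1024L * 1024L;
        final long G = 1024L * M;
        long d = mapOutputBytes;
        long r;
        if (d <= 30 * G) {
            r = d / (128 * M);                     // 0 - 30 GB    -> 1 - 240
        } else if (d <= 100 * G) {
            r = 240 + (d - 30 * G) / (512 * M);    // 30 - 100 GB  -> 240 - 380
        } else if (d <= 500 * G) {
            r = 380 + (d - 100 * G) / G;           // 100 - 500 GB -> 380 - 780
        } else {
            r = 780 + (d - 500 * G) / (2 * G);     // over 500 GB  -> 780 -
        }
        return (int) Math.max(1, r);
    }
}

The query and execution log below illustrate the problem this algorithm addresses: with Hive's default estimates, several reduce phases are under-provisioned and the query takes 2911 seconds.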


INSERT OVERWRITE TABLE tdl_en_dm_account_kw_effect_smt0_tmp5
SELECT a.keyword
       ,indexation(coalesce(b.search_pv_index, cast(0 as bigint)), '3.14,1.8','0,50,100,1000,5000','1,10,30,100,1000') as search_pv_index
       ,coalesce(c.gs_tp_member_set_cnt, cast(0 as bigint)) as gs_tp_member_set_cnt
       ,(case when d.keyword is not null then '1' else '0' end) as is_ban_kw
       ,dummy_string(200) as dummy_string
  FROM (SELECT keyword
          FROM tdl_en_dm_account_kw_effect_smt0_tmp0
         GROUP BY keyword
       ) a   
  LEFT OUTER JOIN
       (SELECT trim(upper(keyword)) as keyword
               ,sum(coalesce(spv, cast (0 as bigint))) as search_pv_index
          FROM adl_en_kw_effect_se_norm_fdt0
         WHERE hp_stat_date <= '2012-07-31'
           AND hp_stat_date >= '2012-07-01'
         GROUP BY trim(upper(keyword))
       ) b   
    ON (a.keyword = b.keyword)
  LEFT OUTER JOIN
       (SELECT trim(upper(keyword)) as keyword
           ,count(distinct admin_member_seq) as gs_tp_member_set_cnt
      FROM idl_en_kw_cpt_mem_set_fdt0
     WHERE hp_stat_date = '2012-07-31'
       AND service_type_id in ('cgs','hkgs','twgs','tp')
       AND keyword is not null
     GROUP BY trim(upper(keyword))
       ) c  
    ON (a.keyword = c.keyword)
  LEFT OUTER JOIN
       (SELECT trim(upper(keyword)) as keyword
          FROM bdl_en07_ipr_keyword_dw_c
         WHERE type1 = 'ban'
         GROUP BY trim(upper(keyword))
       ) d  
    ON (a.keyword = d.keyword);
    
Total MapReduce jobs = 5
Launching Job 1 out of 5
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 698539343) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2406874, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2406874
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2406874
Hadoop job information for Stage-1: number of mappers: 31; number of reducers: 1
2012-09-10 16:26:40,644 Stage-1 map = 0%,  reduce = 0%
2012-09-10 16:26:51,523 Stage-1 map = 11%,  reduce = 0%
2012-09-10 16:27:02,736 Stage-1 map = 57%,  reduce = 0%
2012-09-10 16:27:17,953 Stage-1 map = 99%,  reduce = 0%
2012-09-10 16:27:41,117 Stage-1 map = 100%,  reduce = 17%
2012-09-10 16:28:09,655 Stage-1 map = 100%,  reduce = 45%
2012-09-10 16:28:41,003 Stage-1 map = 100%,  reduce = 74%
2012-09-10 16:29:01,683 Stage-1 map = 100%,  reduce = 79%
2012-09-10 16:29:04,744 Stage-1 map = 100%,  reduce = 82%
2012-09-10 16:29:10,280 Stage-1 map = 100%,  reduce = 85%
2012-09-10 16:29:23,987 Stage-1 map = 100%,  reduce = 87%
2012-09-10 16:29:33,265 Stage-1 map = 100%,  reduce = 90%
2012-09-10 16:29:42,898 Stage-1 map = 100%,  reduce = 93%
2012-09-10 16:29:58,016 Stage-1 map = 100%,  reduce = 99%
Ended Job = job_201208241319_2406874
Launching Job 2 out of 5
Number of reduce tasks not specified. Estimated from input data size: 65
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 64928671778) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2407439, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2407439
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2407439
Hadoop job information for Stage-3: number of mappers: 333; number of reducers: 65
2012-09-10 16:31:48,096 Stage-3 map = 0%,  reduce = 0%
2012-09-10 16:31:58,278 Stage-3 map = 1%,  reduce = 0%
2012-09-10 16:32:00,878 Stage-3 map = 4%,  reduce = 0%
2012-09-10 16:32:03,450 Stage-3 map = 8%,  reduce = 0%
2012-09-10 16:32:05,322 Stage-3 map = 14%,  reduce = 0%
2012-09-10 16:32:07,365 Stage-3 map = 22%,  reduce = 0%
2012-09-10 16:32:08,801 Stage-3 map = 29%,  reduce = 0%
2012-09-10 16:32:10,335 Stage-3 map = 35%,  reduce = 0%
2012-09-10 16:32:13,453 Stage-3 map = 43%,  reduce = 0%
2012-09-10 16:32:16,894 Stage-3 map = 63%,  reduce = 0%
2012-09-10 16:32:20,426 Stage-3 map = 77%,  reduce = 0%
2012-09-10 16:32:27,855 Stage-3 map = 90%,  reduce = 0%
2012-09-10 16:32:36,965 Stage-3 map = 99%,  reduce = 0%
2012-09-10 16:32:43,084 Stage-3 map = 100%,  reduce = 0%
2012-09-10 16:32:47,360 Stage-3 map = 100%,  reduce = 18%
2012-09-10 16:32:51,149 Stage-3 map = 100%,  reduce = 31%
2012-09-10 16:32:53,988 Stage-3 map = 100%,  reduce = 38%
2012-09-10 16:32:56,459 Stage-3 map = 100%,  reduce = 42%
2012-09-10 16:32:59,834 Stage-3 map = 100%,  reduce = 54%
2012-09-10 16:33:03,535 Stage-3 map = 100%,  reduce = 63%
2012-09-10 16:33:08,789 Stage-3 map = 100%,  reduce = 73%
2012-09-10 16:33:14,299 Stage-3 map = 100%,  reduce = 92%
2012-09-10 16:33:18,423 Stage-3 map = 100%,  reduce = 99%
2012-09-10 16:33:22,124 Stage-3 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2407439
Launching Job 3 out of 5
Number of reduce tasks not specified. Estimated from input data size: 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2711959479) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2407819, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2407819
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2407819
Hadoop job information for Stage-4: number of mappers: 10; number of reducers: 3
2012-09-10 16:33:44,219 Stage-4 map = 0%,  reduce = 0%
2012-09-10 16:34:01,388 Stage-4 map = 1%,  reduce = 0%
2012-09-10 16:34:11,607 Stage-4 map = 6%,  reduce = 0%
2012-09-10 16:34:17,661 Stage-4 map = 11%,  reduce = 0%
2012-09-10 16:34:23,270 Stage-4 map = 14%,  reduce = 0%
2012-09-10 16:34:32,606 Stage-4 map = 17%,  reduce = 0%
2012-09-10 16:34:44,748 Stage-4 map = 22%,  reduce = 0%
2012-09-10 16:35:01,395 Stage-4 map = 32%,  reduce = 0%
2012-09-10 16:35:18,943 Stage-4 map = 43%,  reduce = 0%
2012-09-10 16:35:38,716 Stage-4 map = 54%,  reduce = 0%
2012-09-10 16:36:01,974 Stage-4 map = 73%,  reduce = 0%
2012-09-10 16:36:21,750 Stage-4 map = 97%,  reduce = 0%
2012-09-10 16:36:40,284 Stage-4 map = 100%,  reduce = 4%
2012-09-10 16:36:58,595 Stage-4 map = 100%,  reduce = 21%
2012-09-10 16:37:17,022 Stage-4 map = 100%,  reduce = 52%
2012-09-10 16:37:29,315 Stage-4 map = 100%,  reduce = 69%
2012-09-10 16:37:39,690 Stage-4 map = 100%,  reduce = 72%
2012-09-10 16:37:50,249 Stage-4 map = 100%,  reduce = 75%
2012-09-10 16:38:05,929 Stage-4 map = 100%,  reduce = 81%
2012-09-10 16:38:17,927 Stage-4 map = 100%,  reduce = 84%
2012-09-10 16:38:27,357 Stage-4 map = 100%,  reduce = 87%
2012-09-10 16:38:36,761 Stage-4 map = 100%,  reduce = 88%
2012-09-10 16:38:46,276 Stage-4 map = 100%,  reduce = 92%
2012-09-10 16:38:53,322 Stage-4 map = 100%,  reduce = 95%
2012-09-10 16:39:00,616 Stage-4 map = 100%,  reduce = 96%
2012-09-10 16:39:12,326 Stage-4 map = 100%,  reduce = 99%
2012-09-10 16:39:21,258 Stage-4 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2407819
Launching Job 4 out of 5
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2497170) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2408468, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2408468
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2408468
Hadoop job information for Stage-5: number of mappers: 2; number of reducers: 1
2012-09-10 16:40:04,701 Stage-5 map = 0%,  reduce = 0%
2012-09-10 16:40:26,284 Stage-5 map = 100%,  reduce = 0%
2012-09-10 16:40:48,103 Stage-5 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2408468
Launching Job 5 out of 5
Number of reduce tasks not specified. Estimated from input data size: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 1067723025) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2408626, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2408626
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2408626
Hadoop job information for Stage-2: number of mappers: 70; number of reducers: 2
2012-09-10 16:42:40,831 Stage-2 map = 0%,  reduce = 0%
2012-09-10 16:43:02,831 Stage-2 map = 94%,  reduce = 0%
2012-09-10 16:43:25,577 Stage-2 map = 96%,  reduce = 9%
2012-09-10 16:43:38,820 Stage-2 map = 96%,  reduce = 17%
2012-09-10 16:43:46,859 Stage-2 map = 97%,  reduce = 28%
2012-09-10 16:43:50,491 Stage-2 map = 97%,  reduce = 31%
2012-09-10 16:43:57,931 Stage-2 map = 98%,  reduce = 31%
2012-09-10 16:44:07,289 Stage-2 map = 99%,  reduce = 31%
2012-09-10 16:44:14,606 Stage-2 map = 99%,  reduce = 32%
2012-09-10 16:44:26,118 Stage-2 map = 99%,  reduce = 33%
2012-09-10 16:44:29,891 Stage-2 map = 100%,  reduce = 33%
2012-09-10 16:45:04,755 Stage-2 map = 100%,  reduce = 52%
2012-09-10 16:45:14,944 Stage-2 map = 100%,  reduce = 67%
2012-09-10 16:45:57,172 Stage-2 map = 100%,  reduce = 68%
2012-09-10 16:46:55,271 Stage-2 map = 100%,  reduce = 69%
2012-09-10 16:47:34,879 Stage-2 map = 100%,  reduce = 70%
2012-09-10 16:48:51,459 Stage-2 map = 100%,  reduce = 71%
2012-09-10 16:49:40,682 Stage-2 map = 100%,  reduce = 72%
2012-09-10 16:50:31,918 Stage-2 map = 100%,  reduce = 73%
2012-09-10 16:51:17,001 Stage-2 map = 100%,  reduce = 74%
2012-09-10 16:52:16,802 Stage-2 map = 100%,  reduce = 75%
2012-09-10 16:53:26,683 Stage-2 map = 100%,  reduce = 76%
2012-09-10 16:54:28,473 Stage-2 map = 100%,  reduce = 77%
2012-09-10 16:54:40,219 Stage-2 map = 100%,  reduce = 78%
2012-09-10 16:55:15,820 Stage-2 map = 100%,  reduce = 79%
2012-09-10 16:56:15,632 Stage-2 map = 100%,  reduce = 80%
2012-09-10 16:56:58,645 Stage-2 map = 100%,  reduce = 81%
2012-09-10 16:57:34,794 Stage-2 map = 100%,  reduce = 82%
2012-09-10 16:58:12,770 Stage-2 map = 100%,  reduce = 83%
2012-09-10 16:59:09,950 Stage-2 map = 100%,  reduce = 84%
2012-09-10 16:59:56,071 Stage-2 map = 100%,  reduce = 85%
2012-09-10 17:00:51,556 Stage-2 map = 100%,  reduce = 86%
2012-09-10 17:01:52,019 Stage-2 map = 100%,  reduce = 87%
2012-09-10 17:02:33,026 Stage-2 map = 100%,  reduce = 88%
2012-09-10 17:03:42,677 Stage-2 map = 100%,  reduce = 89%
2012-09-10 17:04:33,151 Stage-2 map = 100%,  reduce = 90%
2012-09-10 17:05:21,476 Stage-2 map = 100%,  reduce = 91%
2012-09-10 17:05:57,097 Stage-2 map = 100%,  reduce = 92%
2012-09-10 17:06:39,520 Stage-2 map = 100%,  reduce = 93%
2012-09-10 17:07:28,118 Stage-2 map = 100%,  reduce = 94%
2012-09-10 17:08:10,033 Stage-2 map = 100%,  reduce = 95%
2012-09-10 17:09:03,468 Stage-2 map = 100%,  reduce = 96%
2012-09-10 17:09:42,495 Stage-2 map = 100%,  reduce = 97%
2012-09-10 17:10:36,427 Stage-2 map = 100%,  reduce = 98%
2012-09-10 17:11:27,875 Stage-2 map = 100%,  reduce = 99%
2012-09-10 17:12:33,050 Stage-2 map = 100%,  reduce = 99%
2012-09-10 17:12:54,651 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2408626
Loading data to table tdl_en_dm_account_kw_effect_smt0_tmp5
27964711 Rows loaded to tdl_en_dm_account_kw_effect_smt0_tmp5
OK
Time taken: 2911.069 seconds    

Analyzing the execution shows that most of the time is spent in the reduce phase. Hive computes the reduce count from the job's input data size, roughly ceil(input_bytes / bytes_per_reducer); the per-job estimates printed in the log match a default load of 1,000,000,000 bytes per reducer (e.g. Job 2: ceil(64928671778 / 10^9) = 65). This estimate is not always reasonable and leaves the job running slowly; when cluster resources are plentiful, raising the reduce count improves efficiency. The sketch below reproduces the log's estimates.
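A quick sanity check of the formula against the log (an assumption verified against the numbers above: the constant 1,000,000,000 is taken to be this Hive version's default hive.exec.reducers.bytes.per.reducer):

public class ReducerEstimateCheck {
    public static void main(String[] args) {
        long bytesPerReducer = 1_000_000_000L;  // assumed default reducer load, in bytes
        // Input sizes of Jobs 1-5, taken from the "Input Size" lines in the log above.
        long[] inputs = {698539343L, 64928671778L, 2711959479L, 2497170L, 1067723025L};
        for (long in : inputs) {
            long est = Math.max(1L, (in + bytesPerReducer - 1) / bytesPerReducer);
            System.out.println(in + " bytes -> " + est + " reducer(s)");
        }
        // Prints 1, 65, 3, 1, 2 -- matching the five "Estimated from input
        // data size" lines in the log.
    }
}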

set mapred.reduce.tasks=200;

Re-running the same SQL with this setting, the result is as follows:
Total MapReduce jobs = 5
Launching Job 1 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 200
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 698539343) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2418716, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2418716
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2418716
Hadoop job information for Stage-1: number of mappers: 31; number of reducers: 200
2012-09-10 18:27:38,519 Stage-1 map = 4%,  reduce = 0%
2012-09-10 18:27:54,686 Stage-1 map = 86%,  reduce = 0%
2012-09-10 18:28:12,664 Stage-1 map = 100%,  reduce = 1%
2012-09-10 18:28:32,951 Stage-1 map = 100%,  reduce = 97%
Ended Job = job_201208241319_2418716
Launching Job 2 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 200
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 64928671778) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2418954, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2418954
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2418954
Hadoop job information for Stage-3: number of mappers: 333; number of reducers: 200
2012-09-10 18:29:58,542 Stage-3 map = 41%,  reduce = 0%
2012-09-10 18:30:18,341 Stage-3 map = 97%,  reduce = 0%
2012-09-10 18:30:39,798 Stage-3 map = 100%,  reduce = 30%
2012-09-10 18:30:57,445 Stage-3 map = 100%,  reduce = 33%
2012-09-10 18:31:30,148 Stage-3 map = 100%,  reduce = 48%
2012-09-10 18:31:36,229 Stage-3 map = 100%,  reduce = 82%
2012-09-10 18:31:40,261 Stage-3 map = 100%,  reduce = 95%
2012-09-10 18:31:43,385 Stage-3 map = 100%,  reduce = 98%
2012-09-10 18:31:46,417 Stage-3 map = 100%,  reduce = 99%
2012-09-10 18:31:49,988 Stage-3 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2418954
Launching Job 3 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 200
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2711959479) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2419277, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2419277
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2419277
Hadoop job information for Stage-4: number of mappers: 10; number of reducers: 200
2012-09-10 18:32:39,666 Stage-4 map = 0%,  reduce = 0%
2012-09-10 18:32:51,789 Stage-4 map = 2%,  reduce = 0%
2012-09-10 18:33:11,546 Stage-4 map = 13%,  reduce = 0%
2012-09-10 18:33:32,475 Stage-4 map = 23%,  reduce = 0%
2012-09-10 18:33:49,567 Stage-4 map = 33%,  reduce = 0%
2012-09-10 18:34:05,118 Stage-4 map = 36%,  reduce = 0%
2012-09-10 18:34:25,977 Stage-4 map = 48%,  reduce = 0%
2012-09-10 18:34:38,126 Stage-4 map = 55%,  reduce = 0%
2012-09-10 18:34:46,751 Stage-4 map = 63%,  reduce = 0%
2012-09-10 18:34:52,980 Stage-4 map = 67%,  reduce = 0%
2012-09-10 18:34:55,887 Stage-4 map = 73%,  reduce = 0%
2012-09-10 18:35:03,626 Stage-4 map = 78%,  reduce = 0%
2012-09-10 18:35:09,209 Stage-4 map = 82%,  reduce = 0%
2012-09-10 18:35:13,249 Stage-4 map = 84%,  reduce = 0%
2012-09-10 18:35:17,927 Stage-4 map = 85%,  reduce = 0%
2012-09-10 18:35:24,694 Stage-4 map = 89%,  reduce = 0%
2012-09-10 18:35:32,634 Stage-4 map = 90%,  reduce = 0%
2012-09-10 18:35:34,874 Stage-4 map = 91%,  reduce = 0%
2012-09-10 18:35:37,460 Stage-4 map = 93%,  reduce = 0%
2012-09-10 18:35:39,766 Stage-4 map = 95%,  reduce = 0%
2012-09-10 18:35:42,091 Stage-4 map = 97%,  reduce = 0%
2012-09-10 18:35:51,546 Stage-4 map = 100%,  reduce = 0%
2012-09-10 18:35:57,990 Stage-4 map = 100%,  reduce = 11%
2012-09-10 18:36:11,144 Stage-4 map = 100%,  reduce = 90%
2012-09-10 18:36:24,157 Stage-4 map = 100%,  reduce = 99%
2012-09-10 18:36:45,706 Stage-4 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2419277
Launching Job 4 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 200
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2497056) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2419707, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2419707
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2419707
Hadoop job information for Stage-5: number of mappers: 2; number of reducers: 200
2012-09-10 18:37:20,531 Stage-5 map = 0%,  reduce = 0%
2012-09-10 18:37:30,908 Stage-5 map = 100%,  reduce = 0%
2012-09-10 18:37:45,810 Stage-5 map = 100%,  reduce = 86%
2012-09-10 18:37:54,667 Stage-5 map = 100%,  reduce = 99%
Ended Job = job_201208241319_2419707
Launching Job 5 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 200
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 1327722733) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2419881, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2419881
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2419881
Hadoop job information for Stage-2: number of mappers: 800; number of reducers: 200
2012-09-10 18:40:52,642 Stage-2 map = 79%,  reduce = 0%
2012-09-10 18:41:09,558 Stage-2 map = 100%,  reduce = 19%
2012-09-10 18:41:14,070 Stage-2 map = 100%,  reduce = 34%
2012-09-10 18:41:16,301 Stage-2 map = 100%,  reduce = 48%
2012-09-10 18:41:18,580 Stage-2 map = 100%,  reduce = 60%
2012-09-10 18:41:20,193 Stage-2 map = 100%,  reduce = 68%
2012-09-10 18:41:21,253 Stage-2 map = 100%,  reduce = 73%
2012-09-10 18:41:23,210 Stage-2 map = 100%,  reduce = 77%
2012-09-10 18:41:25,600 Stage-2 map = 100%,  reduce = 83%
2012-09-10 18:41:28,022 Stage-2 map = 100%,  reduce = 89%
2012-09-10 18:41:31,500 Stage-2 map = 100%,  reduce = 93%
2012-09-10 18:41:36,121 Stage-2 map = 100%,  reduce = 98%
2012-09-10 18:41:40,743 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2419881
Loading data to table tdl_en_dm_account_kw_effect_smt0_tmp5
53095125 Rows loaded to tdl_en_dm_account_kw_effect_smt0_tmp5
OK
Time taken: 1020.148 seconds

Wall-clock time drops from 2911 seconds to 1020 seconds, a performance improvement of roughly 3x.

