Hadoop: Testing Your First MapReduce Program

Note: this walkthrough runs the wordcount example that ships with Hadoop. The program counts how many times each word occurs in the input files.
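
To make the counting concrete before involving the cluster, here is a minimal local sketch of the same idea using standard Unix tools. The helper name count_words is hypothetical and not part of Hadoop, and the pipeline only splits on single spaces, so it is an illustration of the map, shuffle, and reduce stages rather than a replacement for the real job.

```shell
# count_words is a hypothetical helper, not part of Hadoop.
# It reads text on stdin and prints "word<TAB>count", sorted by word.
count_words() {
    tr -s ' ' '\n' |    # "map": emit one word per line
        sort |          # "shuffle": bring identical words together
        uniq -c |       # "reduce": count each group of identical words
        awk '{ print $2 "\t" $1 }'
}

# The two lines below are the same text written to the input files later on.
printf 'Hello World\nHello Hadoop\n' | count_words
```

For this input the pipeline prints Hadoop 1, Hello 2, and World 1, one tab-separated pair per line, which is the same shape the MapReduce job writes to its output file.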

In version 2.6.0, the examples jar is located at:

/usr/local/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar

1. Create a directory and files locally

Create the directory:

mkdir /home/hadoop/input

cd /home/hadoop/input

Create the files:

touch wordcount1.txt

touch wordcount2.txt

2. Add content

echo "Hello World" > wordcount1.txt

echo "Hello Hadoop" > wordcount2.txt

3. Create the input directory on HDFS

hadoop fs -mkdir /input

4. Copy the files to the /input directory

hadoop fs -put /home/hadoop/input/* /input

5. Run the program

hadoop jar /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output

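Note: wordcount is the name of the example program to run (the examples jar maps this name to its WordCount class), /input is the input directory, and /output is the output directory. The output directory must not already exist, otherwise the job fails at submission.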
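
Because a leftover /output directory makes a rerun fail, it can help to clear it first. The following is a small sketch under stated assumptions: prepare_output_dir and the HADOOP_BIN variable are hypothetical names introduced here, and the function relies on the standard `hadoop fs -test -d` and `hadoop fs -rm -r` shell commands. Deleting the directory discards any earlier results, so only use it when those are no longer needed.

```shell
# HADOOP_BIN is an assumption: set it if the hadoop launcher is not on PATH.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"

# prepare_output_dir is a hypothetical helper.
# It removes the given HDFS directory if it already exists.
prepare_output_dir() {
    # "fs -test -d" exits with status 0 when the path exists and is a directory
    if "$HADOOP_BIN" fs -test -d "$1"; then
        echo "removing existing $1"
        "$HADOOP_BIN" fs -rm -r "$1"
    else
        echo "$1 does not exist, nothing to remove"
    fi
}
```

Calling `prepare_output_dir /output` right before the `hadoop jar ...` command keeps repeated test runs from stopping on the existing directory.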

6. Job execution log

15/04/14 15:55:03 INFO client.RMProxy: Connecting to ResourceManager at hdnn140/192.168.152.140:8032
15/04/14 15:55:04 INFO input.FileInputFormat: Total input paths to process : 2
15/04/14 15:55:04 INFO mapreduce.JobSubmitter: number of splits:2
15/04/14 15:55:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428996061278_0002
15/04/14 15:55:05 INFO impl.YarnClientImpl: Submitted application application_1428996061278_0002
15/04/14 15:55:05 INFO mapreduce.Job: The url to track the job: http://hdnn140:8088/proxy/application_1428996061278_0002/
15/04/14 15:55:05 INFO mapreduce.Job: Running job: job_1428996061278_0002
15/04/14 15:55:17 INFO mapreduce.Job: Job job_1428996061278_0002 running in uber mode : false
15/04/14 15:55:17 INFO mapreduce.Job: map 0% reduce 0%
15/04/14 15:56:00 INFO mapreduce.Job: map 100% reduce 0%
15/04/14 15:56:10 INFO mapreduce.Job: map 100% reduce 100%
15/04/14 15:56:11 INFO mapreduce.Job: Job job_1428996061278_0002 completed successfully
15/04/14 15:56:11 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=55
        FILE: Number of bytes written=316738
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=235
        HDFS: Number of bytes written=25
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=83088
        Total time spent by all reduces in occupied slots (ms)=7098
        Total time spent by all map tasks (ms)=83088
        Total time spent by all reduce tasks (ms)=7098
        Total vcore-seconds taken by all map tasks=83088
        Total vcore-seconds taken by all reduce tasks=7098
        Total megabyte-seconds taken by all map tasks=85082112
        Total megabyte-seconds taken by all reduce tasks=7268352
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=41
        Map output materialized bytes=61
        Input split bytes=210
        Combine input records=4
        Combine output records=4
        Reduce input groups=3
        Reduce shuffle bytes=61
        Reduce input records=4
        Reduce output records=3
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=1649
        CPU time spent (ms)=4260
        Physical memory (bytes) snapshot=280866816
        Virtual memory (bytes) snapshot=2578739200
        Total committed heap usage (bytes)=244625408
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=25
    File Output Format Counters
        Bytes Written=25
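
Several of these counters can be cross-checked against the two input files by hand. The sketch below recreates the files from step two in a temporary directory (the scratch location is an assumption, chosen so nothing in /home/hadoop/input is touched) and recomputes four values with standard tools: lines correspond to Map input records, words to Map output records, distinct words to Reduce input groups, and total bytes to Bytes Read.

```shell
# Recreate the two input files from step two in a scratch directory
workdir=$(mktemp -d)
echo "Hello World" > "$workdir/wordcount1.txt"
echo "Hello Hadoop" > "$workdir/wordcount2.txt"

# Lines across both files -> "Map input records"
input_records=$(cat "$workdir"/*.txt | wc -l | tr -d ' ')
# Words across both files -> "Map output records"
output_records=$(cat "$workdir"/*.txt | wc -w | tr -d ' ')
# Distinct words -> "Reduce input groups" and "Reduce output records"
distinct_words=$(cat "$workdir"/*.txt | tr -s ' ' '\n' | sort -u | wc -l | tr -d ' ')
# Total bytes, newlines included -> "File Input Format Counters: Bytes Read"
bytes_read=$(cat "$workdir"/*.txt | wc -c | tr -d ' ')

echo "input_records=$input_records output_records=$output_records distinct_words=$distinct_words bytes_read=$bytes_read"
rm -r "$workdir"
```

The script reports 2, 4, 3, and 25, matching the log above. Combine output records stays at 4 rather than 3 because the combiner runs separately on each map task's output, and each of the two files contains two distinct words.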

7. After the job finishes, list the output directory

hadoop fs -ls /output

8. View the result

hadoop fs -cat /output/part-r-00000
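
Each line of the result file holds a word and its count separated by a tab. As a quick sanity check, the counts can be summed and compared with the Map output records counter from the job log. This is a sketch only: total_words is a hypothetical helper, and the sample lines are what the job should produce for the two files from step two, not output copied from the original run.

```shell
# total_words is a hypothetical helper: it sums the count column of
# tab-separated "word<TAB>count" lines read from stdin.
total_words() {
    awk -F '\t' '{ sum += $2 } END { print sum + 0 }'
}

# Sample lines in the expected output format for the two input files
printf 'Hadoop\t1\nHello\t2\nWorld\t1\n' | total_words
```

Against the cluster, the same helper can be fed directly with `hadoop fs -cat /output/part-r-00000 | total_words`. The total should be 4, the number of words the mappers emitted.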

9. Done

This article is reposted from yntmdr's 51CTO blog. Original link: http://blog.51cto.com/yntmdr/1632323. To republish it, please contact the original author.
