
Linux Hands-On: Running Hadoop's Built-in wordcount Word-Count Program


0. Preface

The previous post, "Hadoop First Experience: Quickly Setting Up a Hadoop Pseudo-Distributed Environment", built a working Hadoop environment. This post uses the wordcount program that ships with Hadoop to run a word-count example on it.

1. Word Count with the Bundled Example Program

(1) The wordcount program

The wordcount program is located under Hadoop's share directory:

[root@linuxidc mapreduce]# pwd
/usr/local/hadoop/share/hadoop/mapreduce
[root@linuxidc mapreduce]# ls
hadoop-mapreduce-client-app-2.6.5.jar hadoop-mapreduce-client-jobclient-2.6.5-tests.jar
hadoop-mapreduce-client-common-2.6.5.jar hadoop-mapreduce-client-shuffle-2.6.5.jar
hadoop-mapreduce-client-core-2.6.5.jar hadoop-mapreduce-examples-2.6.5.jar
hadoop-mapreduce-client-hs-2.6.5.jar lib
hadoop-mapreduce-client-hs-plugins-2.6.5.jar lib-examples
hadoop-mapreduce-client-jobclient-2.6.5.jar sources

The wordcount example lives inside hadoop-mapreduce-examples-2.6.5.jar.
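
Under the hood, wordcount is the classic MapReduce example: the mapper tokenizes each input line and emits a (word, 1) pair for every token, and the reducer sums the counts per word; the same reducer is also wired in as a combiner, so counts are pre-aggregated on the map side. Running the examples jar with no arguments should print the list of bundled programs, wordcount among them. The sketch below follows the WordCount code from the official Hadoop MapReduce tutorial; the class bundled in the jar (org.apache.hadoop.examples.WordCount) is essentially this, give or take small details:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: split each input line into tokens and emit (word, 1) per token.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sum the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // map-side pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}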

(2) Create the HDFS data directories
Create a directory to hold the input files for the MapReduce job (-p creates any missing parent directories, just like the Linux mkdir -p):

[root@linuxidc ~]# hadoop fs -mkdir -p /data/wordcount

Create a directory to hold the output files of the MapReduce job:

[root@linuxidc ~]# hadoop fs -mkdir /output

List the two directories just created:

[root@linuxidc ~]# hadoop fs -ls /
drwxr-xr-x - root supergroup 0 2017-09-01 20:34 /data
drwxr-xr-x - root supergroup 0 2017-09-01 20:35 /output
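
Note that these are HDFS paths, not local ones: hadoop fs -ls / lists the root of the HDFS namespace, which is entirely separate from the / you would see with a plain ls on the Linux filesystem.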

(3) Create a word file and upload it to HDFS
Create the word file with the following contents:

[root@linuxidc ~]# cat myword.txt
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy

Upload the file to HDFS:

[root@linuxidc ~]# hadoop fs -put myword.txt /data/wordcount
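
The -put command copies a file from the local Linux filesystem into HDFS; since /data/wordcount already exists as a directory, the file is placed inside it under its original name.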

View the just-uploaded file and its contents in HDFS:

[root@linuxidc ~]# hadoop fs -ls /data/wordcount
-rw-r--r-- 1 root supergroup 57 2017-09-01 20:40 /data/wordcount/myword.txt
[root@linuxidc ~]# hadoop fs -cat /data/wordcount/myword.txt
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy

(4) Run the wordcount program
Execute the following command:

[root@linuxidc ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wordcount /output/wordcount
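
The general form is hadoop jar <examples-jar> wordcount <input-path> <output-path>. One caveat: the job creates the output directory /output/wordcount itself and refuses to run if that directory already exists, so to re-run the job, clear it first with hadoop fs -rm -r /output/wordcount. The tail of the job log shows the run completing successfully: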

17/09/01 20:48:14 INFO mapreduce.Job: Job job_local1719603087_0001 completed successfully
17/09/01 20:48:14 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=585940
FILE: Number of bytes written=1099502
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=114
HDFS: Number of bytes written=48
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=5
Map output records=10
Map output bytes=97
Map output materialized bytes=78
Input split bytes=112
Combine input records=10
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=78
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=92
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=241049600
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=57
File Output Format Counters
Bytes Written=48
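
The counters line up with the input: Map input records=5 matches the five lines of myword.txt, and the mappers emitted 10 (word, 1) pairs, one per word. Because the reducer is also used as a combiner, those 10 records were pre-aggregated to 6 before the shuffle (Combine input records=10, Combine output records=6), so the reducer received exactly one record per distinct word.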

(5) View the results
The word counts are written to the output directory:

[root@linuxidc ~]# hadoop fs -cat /output/wordcount/part-r-00000
katy 2
leaf 2
ling 1
xpleaf 2
yeyonghao 1
yyh 2
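
Each reduce task writes one file named part-r-<task id>; with the default single reducer, all the counts land in part-r-00000, sorted by key. The job also drops an empty _SUCCESS marker file into /output/wordcount on normal completion, which you can see with hadoop fs -ls /output/wordcount.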
