This article explains how to understand the MapReduce wordcount example: what the map side does with the input, what reaches the reduce side, and how the final counts are produced.
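wordcount counts how many times each word occurs. I could always follow the code when reading it, but the actual logic stayed unclear to me for a long time: how is the data handled on the map side, and how is it handled on the reduce side? Now it makes sense.
The principle is this: the map side reads the input one line at a time and emits each word in the line with a count of 1, as follows:
map input {key,value}: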
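{0,hello word by word}
{19,hello hadoop by hadoop}
These are the key and value passed into map. With TextInputFormat the key is the byte offset at which the line starts (19 for the second line here, assuming single-byte characters and a one-byte newline) and the value is the text of the line. After the map side has processed them, it produces the following data: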
{hello,1} {word,1} {by,1} {word,1}
{hello,1} {hadoop,1} {by,1} {hadoop,1}
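The map step described above can be imitated in plain Java without any Hadoop classes. This is only an illustrative sketch: the class name MapPhaseSketch and the method mapLine are made up for this example, and a Java list of entries stands in for the pairs that a real mapper would pass to context.write.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for the map side; no Hadoop classes are involved.
class MapPhaseSketch {

    // Emits one (word, 1) pair for every whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints [hello=1, word=1, by=1, word=1]
        System.out.println(mapLine("hello word by word"));
        // prints [hello=1, hadoop=1, by=1, hadoop=1]
        System.out.println(mapLine("hello hadoop by hadoop"));
    }
}
```

Note that a repeated word is emitted once per occurrence; nothing is added up at this stage.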
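Up to this point everything is easy to follow. The reduce side is where I got stuck: I could not see how the occurrences of the same word were brought together. After studying how Hadoop works, the answer is that before the data reaches reduce, the framework shuffles and sorts the map output, collecting all values that share a key into one group. The regrouped data has the following structure, with keys in sorted order:
[{by},{1,1}] [{hadoop},{1,1}] [{hello},{1,1}] [{word},{1,1}]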
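Each group is then passed to the reduce side. reduce is called once per key, loops over that key's values, adds them up, and writes the total to the job's output directory (the path given as the second argument). The complete Hadoop job is listed below without much further commentary.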
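Before the full job, here is a minimal plain-Java sketch of the grouping and summing steps just described. It makes several assumptions: the names ShuffleReduceSketch, shuffle and reduce are invented for this example, a TreeMap stands in for Hadoop's sort-and-group behaviour, and everything runs in a single process rather than across a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shuffle and reduce steps; no Hadoop classes are involved.
class ShuffleReduceSketch {

    // Collects the 1 emitted for each word under that word; TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<String> emittedWords) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : emittedWords) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Sums the values of each group, mirroring the loop inside a reduce method.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            totals.put(group.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // The words emitted by the map side for the two sample lines.
        List<String> emitted = Arrays.asList(
                "hello", "word", "by", "word", "hello", "hadoop", "by", "hadoop");
        Map<String, List<Integer>> grouped = shuffle(emitted);
        // prints {by=[1, 1], hadoop=[1, 1], hello=[1, 1], word=[1, 1]}
        System.out.println(grouped);
        // prints {by=2, hadoop=2, hello=2, word=2}
        System.out.println(reduce(grouped));
    }
}
```

The point of the sketch is that reduce never searches for matching words itself; by the time it runs, each key already arrives with all of its values attached.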
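import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {
    // Map side: input is (byte offset, line); output is (word, 1) for every token.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            // Split the line on whitespace.
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
    // Reduce side: input is (word, [1, 1, ...]); output is (word, total).
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
    @Override
    public int run(String[] args) throws Exception {
        // Job.getInstance replaces the deprecated new Job(conf) constructor (Hadoop 2 and later).
        Job job = Job.getInstance(getConf());
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // args[0] is the input path; args[1] is the output directory, which must not exist yet.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
        int ret = ToolRunner.run(new WordCount(), args);
        System.exit(ret);
    }
}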