This is the article I am following: https://blog.csdn.net/wangato.
The program I wrote by following it always reports the error in the first line, as shown in the picture. I searched Baidu for the cause but have not been able to solve it.
1. Run directly with Python, the scripts report no errors and count the words as the article describes, but running them through Hadoop reports:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
I am a novice and really can't find the cause. These are my Python programs:
1.mapper.py
#!/usr/bin/env python3
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        print("%s\t%s" % (word, 1))
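The message "subprocess failed with code 1" means that the script Hadoop Streaming launched exited with status 1, which is also the status Python returns after an uncaught exception. The following is only a rough sketch for reproducing that check outside Hadoop: it feeds text to a script on stdin and returns the exit code together with stdout and stderr. The helper name `run_as_streaming_task` is hypothetical, and the sketch assumes the script is started with the same interpreter that runs the helper, which may differ from what the cluster nodes use.

```python
import subprocess
import sys


def run_as_streaming_task(script_args, input_text):
    """Run a Python script with input_text on stdin (hypothetical helper).

    script_args is the argument list passed to the interpreter,
    for example ["mapper.py"]. Returns (exit_code, stdout, stderr).
    """
    result = subprocess.run(
        [sys.executable, *script_args],  # assumption: same interpreter as this helper
        input=input_text,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # exchange str rather than bytes
    )
    return result.returncode, result.stdout, result.stderr
```

Calling `run_as_streaming_task(["mapper.py"], "foo foo bar\n")` from the directory that holds the script returns the exit code first; if it is not 0, the third element (stderr) should contain the Python traceback or interpreter error.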
2.reduce.py
#!/usr/bin/env python3
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split("\t", 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue
    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write result to STDOUT
            print("%s\t%s" % (current_word, current_count))
        current_count = count
        current_word = word

# do not forget to output the last word if needed!
if current_word == word:
    print("%s\t%s" % (current_word, current_count))
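The two scripts depend on the map, sort, reduce order: the reducer only adds up counts for adjacent lines with the same word, so it needs sorted input. A minimal in-process sketch of that flow follows. The function names are hypothetical, `sorted()` stands in for the sort Hadoop performs between the two steps, and `itertools.groupby` is used in place of the manual `current_word` bookkeeping above, so this is an illustration of the idea rather than the scripts themselves.

```python
from itertools import groupby


def map_words(lines):
    """Yield (word, 1) pairs, mirroring what mapper.py prints."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_counts(pairs):
    """Sum the counts of adjacent pairs that share a word.

    Like reduce.py, this only gives one total per word when the
    input is already sorted by word.
    """
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Map, sort, then reduce, all in one process."""
    # sorted() plays the role of Hadoop's sort between map and reduce
    return list(reduce_counts(sorted(map_words(lines))))
```

Passing unsorted pairs straight to `reduce_counts` yields the same word more than once, which is the situation the comment in reduce.py about Hadoop sorting the map output refers to.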