Notes on Hadoop

Streaming

Output

The directory passed as "-output ~/output_path" must *not* already exist; if it does, Hadoop refuses to start the job. Delete it before each run.
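A minimal Python sketch of a driver that clears the output directory and then launches the job. The jar path in `STREAMING_JAR` and the helper names are assumptions for illustration; where the streaming jar lives varies between Hadoop versions and distributions.

```python
import subprocess

# Assumed location of the streaming jar; adjust it for your cluster.
STREAMING_JAR = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"


def clear_output_cmd(output_path):
    # -r deletes the directory recursively; -f suppresses the error
    # when the path does not exist yet (e.g. on the very first run).
    return ["hadoop", "fs", "-rm", "-r", "-f", output_path]


def streaming_cmd(input_path, output_path, mapper, reducer):
    # mapper/reducer are assumed to be commands already available on
    # every node (or shipped separately, e.g. with the -files option).
    return [
        "hadoop", "jar", STREAMING_JAR,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


def run_job(input_path, output_path, mapper, reducer):
    # Hadoop refuses to start if the -output directory exists, so remove
    # it first. check=True raises CalledProcessError if a command fails.
    subprocess.run(clear_output_cmd(output_path), check=True)
    subprocess.run(
        streaming_cmd(input_path, output_path, mapper, reducer), check=True
    )
```

Deleting the output automatically is convenient but destructive, so only point a driver like this at a scratch directory.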

Too many non-zero exit codes will cause Hadoop to kill the whole job

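If your program exits with a non-zero status when it hits an error, Hadoop marks that task attempt as failed and retries it; once a task has failed too many times, Hadoop kills the whole job. It is safer to always exit with 0, record failures in the output, and then use the output to determine which inputs succeeded and which did not.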
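A minimal sketch of a Python streaming mapper that follows this approach, assuming tab-separated output and input lines without embedded tabs. `process_record` is a hypothetical stand-in for the real work, and `failed_inputs` is a hypothetical helper for scanning the job output afterwards (for example the lines printed by `hadoop fs -cat` on the output's part files). Each input line produces one record tagged `OK` or `FAILED`, and the script ends with exit status 0.

```python
import sys


def process_record(line):
    # Hypothetical per-record work: parse an integer and square it.
    # Malformed input raises ValueError, standing in for a real failure.
    value = int(line)
    return str(value * value)


def process_lines(lines):
    """Yield one tab-separated output record per input line.

    Errors become FAILED records instead of a non-zero exit code,
    so a bad record never causes a failed task attempt.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        try:
            result = process_record(line)
        except Exception as exc:
            # Flatten the message so it stays a single tab-separated record.
            msg = repr(exc).replace("\t", " ").replace("\n", " ")
            yield f"{line}\tFAILED\t{msg}\n"
        else:
            yield f"{line}\tOK\t{result}\n"


def failed_inputs(output_lines):
    """Return the input keys whose records were marked FAILED."""
    failed = []
    for record in output_lines:
        fields = record.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] == "FAILED":
            failed.append(fields[0])
    return failed


if __name__ == "__main__":
    for out in process_lines(sys.stdin):
        sys.stdout.write(out)
    # Reaching the end of the script exits with status 0,
    # even if some records were marked FAILED.
```

The trade-off is that catching every exception also hides real bugs from Hadoop's retry logic, so it is worth checking how many records came back `FAILED` after each run.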


Performance