Here's a sample job script I got running to test out some Hadoop MapReduce streaming jobs on our new cluster. Put it in the same directory as your mapper and reducer files; the -file parameter packages those files up and ships them out to the task nodes in the cluster, so you don't have to install them on every node yourself. (A rough sketch of what mapper.py and reducer.py might look like follows the script.)


#!/bin/sh

# remove previous local output so the copyToLocal at the end doesn't fail
rm -rf /data/out/output-traffic

# remove dfs output data
/data/hadoop/bin/hadoop dfs -rmr 'output-traffic*'

# start the hadoop streaming job (mapred.reduce.tasks sets the reducer count)
/data/hadoop/bin/hadoop jar /data/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar \
-jobconf mapred.reduce.tasks=9 \
-mapper mapper.py \
-reducer reducer.py \
-file mapper.py \
-file reducer.py \
-input 'daytest/smallday/*' \
-output output-traffic


# copy output from HDFS down to a local dir
/data/hadoop/bin/hadoop dfs -copyToLocal output-traffic /data/out
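
The script assumes mapper.py and reducer.py already exist, so here's a rough, hypothetical sketch of what streaming-compatible versions could look like (a classic word-count pair; your actual job logic will differ). The contract with Hadoop streaming is simple: read lines from stdin, write tab-separated key/value pairs to stdout. Both files need a shebang line and execute permission (chmod +x mapper.py reducer.py) since the task nodes run them directly.

#!/usr/bin/env python
# mapper.py - emit one "word<TAB>1" pair for every word on stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t1" % word)

#!/usr/bin/env python
# reducer.py - streaming sorts mapper output by key before it arrives,
# so we can just total up counts until the key changes
import sys

current_key = None
count = 0

for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key == current_key:
        count += int(value)
    else:
        if current_key is not None:
            print("%s\t%d" % (current_key, count))
        current_key = key
        count = int(value)

# flush the final key
if current_key is not None:
    print("%s\t%d" % (current_key, count))

A nice side effect of the streaming model is that you can smoke-test the whole pipeline locally before touching the cluster (sample-input.txt here is just whatever test file you have lying around):

cat sample-input.txt | ./mapper.py | sort | ./reducer.py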
