Passing Parameters and Arguments to Mapper and Reducer in Hadoop 2

In Hadoop, it is sometimes difficult to pass arguments to mappers and reducers. If the arguments are large (e.g., big arrays), DistributedCache might be a better choice. Here, however, we are discussing small arguments, usually a handful of configuration parameters.

In fact, the way to configure these parameters is simple. When you initialize the “JobConf” object to launch a MapReduce job, you can set a parameter by using the “set” method, like this:

JobConf job = (JobConf)getConf();
job.set("NumberOfDocuments", args[0]);

Here, “NumberOfDocuments” is the name of the parameter, and its value is read from “args[0]”, a command-line argument. Once you set this parameter, you can retrieve its value in the mapper or reducer as follows:

private static Long N;
public void configure(JobConf job) {
     N = Long.parseLong(job.get("NumberOfDocuments"));
}
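
To make the context clearer, here is a minimal sketch of where that “configure” override lives in the old “org.apache.hadoop.mapred” API. The class name, field name, and key/value types are assumptions for illustration, not part of the original snippet:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ExampleMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    private long numberOfDocuments;

    @Override
    public void configure(JobConf job) {
        // Read the parameter that the driver set with job.set("NumberOfDocuments", ...).
        numberOfDocuments = Long.parseLong(job.get("NumberOfDocuments"));
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output,
                    Reporter reporter) throws IOException {
        // Use the parameter however the job needs it; here we simply emit it once per record.
        output.collect(new Text("N"), new LongWritable(numberOfDocuments));
    }
}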

Note that the tricky part is that you cannot set parameters like this:

Configuration con = new Configuration();
con.set("NumberOfDocuments", args[0]);

and hope that all mappers and reducers can retrieve this parameter. This will fail at runtime: a standalone “Configuration” object created like this is never shipped to the tasks. Only the configuration attached to the submitted job is serialized and distributed to the mappers and reducers, so the parameter must be set on the job’s own configuration before submission.
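
For completeness, here is a minimal driver sketch of the working pattern in the same old API, using Tool/ToolRunner (equivalent in spirit to the “(JobConf) getConf()” snippet above). The class names, job name, and argument order are assumptions for illustration; the key point is that “set” is called on the “JobConf” that is actually submitted:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ExampleDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        JobConf job = new JobConf(getConf(), ExampleDriver.class);
        job.setJobName("parameter-passing-example");

        // Set the parameter on the job configuration *before* submission;
        // this is the object that is serialized and sent to every task.
        job.set("NumberOfDocuments", args[0]);

        job.setMapperClass(ExampleMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new ExampleDriver(), args));
    }
}

You would then launch it with something like “hadoop jar myjob.jar ExampleDriver <numberOfDocuments> <input> <output>”.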