Passing Parameters and Arguments to Mapper and Reducer in Hadoop 2

In Hadoop, it is sometimes difficult to pass arguments to mappers and reducers. If the arguments are large (e.g., big arrays), DistributedCache might be a better choice. Here, however, we are discussing small arguments, usually a handful of configuration parameters.

In fact, the way to configure these parameters is simple. When you initialize the “JobConf” object to launch a MapReduce job, you can set a parameter by using the “set” method, like this:

JobConf job = (JobConf)getConf();
job.set("NumberOfDocuments", args[0]);

Here, “NumberOfDocuments” is the name of the parameter, and its value is read from “args[0]”, a command-line argument. Once you set this parameter, you can retrieve its value in the mapper or reducer as follows:

private static Long N;
public void configure(JobConf job) {
     N = Long.parseLong(job.get("NumberOfDocuments"));
}
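
To make the context clearer, here is a minimal sketch of where that “configure” override lives in the old “org.apache.hadoop.mapred” API. The class name, field name, and key/value types are assumptions for illustration, not part of the original snippet:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ExampleMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    private long numberOfDocuments;

    @Override
    public void configure(JobConf job) {
        // Read the parameter that the driver set with job.set("NumberOfDocuments", ...).
        numberOfDocuments = Long.parseLong(job.get("NumberOfDocuments"));
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output,
                    Reporter reporter) throws IOException {
        // Use the parameter however the job needs it; here we simply emit it once per record.
        output.collect(new Text("N"), new LongWritable(numberOfDocuments));
    }
}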

Note that the tricky part is that you cannot set parameters like this:

Configuration con = new Configuration();
con.set("NumberOfDocuments", args[0]);

and hope that all mappers and reducers can retrieve this parameter. This will fail at runtime: a standalone “Configuration” object created like this is never shipped to the tasks. Only the configuration attached to the submitted job is serialized and distributed to the mappers and reducers, so the parameter must be set on the job’s own configuration before submission.
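
For completeness, here is a minimal driver sketch of the working pattern in the same old API, using Tool/ToolRunner (equivalent in spirit to the “(JobConf) getConf()” snippet above). The class names, job name, and argument order are assumptions for illustration; the key point is that “set” is called on the “JobConf” that is actually submitted:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ExampleDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        JobConf job = new JobConf(getConf(), ExampleDriver.class);
        job.setJobName("parameter-passing-example");

        // Set the parameter on the job configuration *before* submission;
        // this is the object that is serialized and sent to every task.
        job.set("NumberOfDocuments", args[0]);

        job.setMapperClass(ExampleMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new ExampleDriver(), args));
    }
}

You would then launch it with something like “hadoop jar myjob.jar ExampleDriver <numberOfDocuments> <input> <output>”.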