An approach to controlling logging on Amazon Web Services’ (AWS) Elastic MapReduce (EMR)


I have been using Elastic MapReduce (EMR), one of many products and services available from Amazon Web Services (AWS). EMR is AWS’s product for dynamically provisioning a Hadoop cluster. Hadoop uses Apache Commons Logging, and both Hadoop and AWS seem to encourage using Log4j as the actual logging implementation. This blog assumes you are familiar with all these components.

One problem I ran into was how to control logging using Apache Commons Logging and Log4j. All I wanted was to be able to specify my own logging properties for Log4j. I asked for help on the AWS EMR discussion forum, but several days later no one has responded (where did all the AWS/EMR evangelists go?). Digging around the AWS site and other discussion threads gave me some scattered “hints” about how to go about controlling logging. Still, when I searched the wider internet for a more detailed explanation that put all the parts together, I did not find anything helpful or instructive. With some perseverance, I did pull through and found a way, reported in this blog, to control logging on an AWS EMR Hadoop cluster.

As you may already know, there is a conf directory inside the root of the Hadoop installation directory (i.e. /home/hadoop/conf). Inside this directory is the Log4j properties file. This file is precisely where you need to make modifications to control logging.

Bootstrap Action

When you provision a Hadoop cluster with EMR, you can specify bootstrap actions (up to 16 per job flow). If you want to change the Log4j properties file, you need to specify a bootstrap action that overwrites the current file with your own. Below is an example of how to create a Hadoop cluster with EMR (that is, a job flow) with a bootstrap action that overwrites the default file with your own. You will need Ruby and Amazon’s EMR Ruby Client.

ruby elastic-mapreduce --create --name j-small --alive --enable-debugging --log-uri s3n://mrhadoop-uri/log/ --bootstrap-action "s3://mrhadoop-uri/"

What you really need to pay attention to is the command-line option: --bootstrap-action "s3://mrhadoop-uri/". The --bootstrap-action part is a flag (an option), and "s3://mrhadoop-uri/" is the script you want to run. In this case, I want to run the script that is located in my S3 bucket, mrhadoop-uri. Below is an example of what that script might look like for you.

hadoop dfs -copyToLocal s3n://mrhadoop-uri/ /home/hadoop/
cp -f /home/hadoop/ /home/hadoop/conf/
mkdir /home/hadoop/logs

As you can see in this script, the first thing I do is copy my properties file from my S3 location to the local file system using the “hadoop dfs -copyToLocal” command. The next thing I do is force-copy the new file over the current one in /home/hadoop/conf/. The last thing I do is create the /home/hadoop/logs directory, since my properties file specifies that directory as the location to which my log files will be written.
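
A slightly more defensive variant of these three steps might look like the sketch below. It is only an illustration: the function name install_log4j_conf, the FETCH_CMD variable, and the argument names are assumptions of mine, and you have to pass in your own file locations. It checks that the download actually produced a file, keeps a backup of the file it overwrites, and uses mkdir -p so that an existing log directory is not an error.

```shell
#!/bin/bash
# Sketch of a more defensive bootstrap script.
# install_log4j_conf and FETCH_CMD are illustrative names, not part of EMR.
#
# install_log4j_conf SRC_URI STAGED_FILE CONF_FILE LOG_DIR
#   SRC_URI     - S3 location of your replacement properties file
#   STAGED_FILE - local path to download it to
#   CONF_FILE   - the file under /home/hadoop/conf/ to overwrite
#   LOG_DIR     - directory your appenders write to (e.g. /home/hadoop/logs)
install_log4j_conf() {
  local src_uri="$1"
  local staged="$2"
  local conf_file="$3"
  local log_dir="$4"

  # Step 1: download. Defaults to the same command used in the original script.
  ${FETCH_CMD:-hadoop dfs -copyToLocal} "$src_uri" "$staged" || return 1

  # Refuse to install a missing or empty download.
  [ -s "$staged" ] || { echo "download failed: $src_uri" >&2; return 1; }

  # Step 2: keep a backup of the current file (if any), then overwrite it.
  [ -f "$conf_file" ] && cp -f "$conf_file" "$conf_file.orig"
  cp -f "$staged" "$conf_file" || return 1

  # Step 3: -p succeeds even if the directory already exists.
  mkdir -p "$log_dir"
}
```

In a bootstrap script you would call the function once, passing the S3 URI of your properties file, a local staging path, the path of the file under /home/hadoop/conf/, and /home/hadoop/logs.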

Below is a snippet of my Log4j properties file. As you can see, I have three rolling file appenders (R2, R3, and R5), one each for my mappers, reducers, and jobs.

log4j.appender.R2.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

log4j.appender.R3.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

log4j.appender.R5.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
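
The snippet above shows only the pattern lines. For readers who have not written a Log4j 1.x rolling file appender before, the sketch below prints what a complete definition of one such appender typically looks like. The helper name print_rolling_appender, the logger name com.example.MyMapper, the DEBUG level, and the size and backup limits are all assumptions chosen for illustration; they are not taken from my actual file.

```shell
#!/bin/bash
# print_rolling_appender NAME LOG_FILE LOGGER
# Prints a Log4j 1.x RollingFileAppender stanza to stdout.
# The 10MB / 5-backup limits and the DEBUG level are arbitrary example values.
print_rolling_appender() {
  local name="$1"
  local log_file="$2"
  local logger="$3"
  cat <<EOF
log4j.appender.${name}=org.apache.log4j.RollingFileAppender
log4j.appender.${name}.File=${log_file}
log4j.appender.${name}.MaxFileSize=10MB
log4j.appender.${name}.MaxBackupIndex=5
log4j.appender.${name}.layout=org.apache.log4j.PatternLayout
log4j.appender.${name}.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
log4j.logger.${logger}=DEBUG,${name}
EOF
}

# Example (com.example.MyMapper is a hypothetical class name):
#   print_rolling_appender R2 /home/hadoop/logs/map.log com.example.MyMapper
```

The last generated line is what routes a logger to the appender; without a line like it (or the appender being listed on the root logger), the appender is defined but never used.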

At this point, you can go ahead and submit your Map/Reduce (MR) Job. An example is shown below.

ruby elastic-mapreduce --jobflow <job-id> --json <json-step-file>

Getting the log files

At this point, your EMR Hadoop cluster should be set up to log per your specification. The natural question is, how do I get these log files back? You may SSH into your EMR Hadoop cluster and use Hadoop’s copy utility command to get your log files. To SSH into your EMR Hadoop instance, do the following.

ruby elastic-mapreduce --ssh --jobflow <job-id>

You are now in a command-line shell on your EMR Hadoop instance. From there, transfer the log files to your S3 bucket.

hadoop dfs -copyFromLocal /home/hadoop/logs/map.log s3n://mrhadoop-uri/map.log
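
With three appenders (and rolled-over backups), copying files one at a time gets tedious. The following is a minimal sketch of a loop that uploads every regular file in the log directory with the same “hadoop dfs -copyFromLocal” command. The function name upload_logs and the COPY_CMD variable are my own illustrative names; COPY_CMD exists only so the loop can be tried out with a harmless command such as echo before running it for real.

```shell
#!/bin/bash
# upload_logs LOG_DIR DEST_PREFIX
# Copies every regular file in LOG_DIR to DEST_PREFIX/<file name>.
# upload_logs and COPY_CMD are illustrative names, not part of Hadoop or EMR.
upload_logs() {
  local log_dir="$1"
  local dest="${2%/}"   # drop one trailing slash so we do not emit "//"
  local f
  for f in "$log_dir"/*; do
    # Skip sub-directories, and the literal pattern if the directory is empty.
    [ -f "$f" ] || continue
    ${COPY_CMD:-hadoop dfs -copyFromLocal} "$f" "$dest/$(basename "$f")" || return 1
  done
}

# Dry run: print what would be copied instead of copying it.
#   COPY_CMD=echo
#   upload_logs /home/hadoop/logs s3n://mrhadoop-uri/
```

The loop stops at the first failed copy, so a non-zero exit status tells you that not everything was uploaded.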

Future work

Future work (and this work should be contributed by Amazon itself, IMO) should be to make available a Log4j appender that logs to S3 directly. That way, all you would really need to do is specify the bootstrap action that copies the properties file over. In fact, there is a project that does something like what I am talking about. If you click on “download”, it’s a dead link. If you go to the location of the source code and browse around the SVN repository, there is no code to be found. So, until someone creates an appender that is able to write to S3, what I presented here is a way to control logging and acquire the logging output files. It’s multi-step, and because of that, error-prone, but it should suffice until something better comes along.


You probably do NOT want to completely overwrite the original file. What you may actually want to do is modify it (e.g. add additional appenders or specify log levels for different classes/packages). What I actually did was:

  1. create the AWS EMR Hadoop instance,
  2. SSH into the Hadoop instance,
  3. use Hadoop’s copy utility to copy the original /home/hadoop/conf/ file to my S3 bucket, and
  4. download it from my S3 bucket to my computer, modify it, and upload it back to S3.
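
An alternative to the download, edit, and upload round trip is to leave the original file in place and have the bootstrap action append your additions to it. The sketch below is one way that could be done, under the assumption that your additions live in a small separate fragment file; the function name append_log4j_overrides and the marker comment are hypothetical. As far as I know, when the same key appears twice in a Java properties file the later value wins, so appended entries should take precedence over the defaults.

```shell
#!/bin/bash
# append_log4j_overrides CONF_FILE FRAGMENT_FILE
# Appends FRAGMENT_FILE to CONF_FILE once, keeping a backup in CONF_FILE.orig.
# The function name and the marker line are illustrative assumptions.
append_log4j_overrides() {
  local conf_file="$1"
  local fragment="$2"
  local marker="# --- custom logging overrides ---"

  # Both files must exist before anything is changed.
  { [ -f "$conf_file" ] && [ -f "$fragment" ]; } || return 1

  # Idempotent: if the marker is already there, the fragment was appended before.
  if grep -qxF -- "$marker" "$conf_file"; then
    return 0
  fi

  # Back up the original, then append a blank line, the marker, and the fragment.
  cp -f "$conf_file" "$conf_file.orig" || return 1
  { echo; echo "$marker"; cat "$fragment"; } >> "$conf_file"
}
```

Because the function checks for its marker first, running the bootstrap action twice does not duplicate the appended section.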

To save you the time and trouble, here is the default file from /home/hadoop/conf/.

# Define some default values that can be overridden by system properties

# Job Summary Appender 
# Use following logger to send summary to separate file defined by 
# hadoop.mapreduce.jobsummary.log.file rolled daily:
# hadoop.mapreduce.jobsummary.logger=INFO,JSA
#AWS SDK Logging

# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hadoop.root.logger}, EventCounter

# Logging Threshold

# Daily Rolling File Appender


# Rollver at every hour

# 30-day backup

# Pattern format: Date LogLevel LoggerName LogMessage
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c (%t): %m%n
# Debugging Pattern format
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

# console
# Add "console" to rootlogger above if you want to use this 

log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

# TaskLog Appender

#Default values


log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c (%t): %m%n

#Security audit appender

log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
#new logger

# Rolling File Appender


# Logfile size and and 30-day backups

#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

# FSNamesystem Audit logging
# All audit events are logged at INFO level

# Custom Logging levels


# Jets3t library

# Null Appender
# Trap security logger on the hadoop client side

# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.

# Job Summary Appender
log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

Summary and conclusion

This blog was about controlling the logging output of Log4j on an AWS EMR Hadoop cluster using a bootstrap action. Future work should be to write a Log4j appender that logs directly to S3. IMO, this work should be done by Amazon: it would help all their AWS EMR users, and since they charge for data transfer in and out of S3, writing such an S3 appender is also in the interest of their profits. In fact, an attempt of some sort has been made, but there is no code to show for the effort. As always, happy coding, and I hope you find this blog helpful.

Until we meet again!

