Implementing RawComparator will speed up your Hadoop Map/Reduce (MR) Jobs

Introduction

Implementing the org.apache.hadoop.io.RawComparator interface can speed up your Map/Reduce (MR) Jobs, because keys are compared in their serialized byte form instead of being deserialized first. As you may recall, an MR Job consumes and emits key-value pairs. The process looks like the following.

(K1,V1) -> Map -> (K2,V2)
(K2,List[V2]) -> Reduce -> (K3,V3)

The key-value pairs (K2,V2) are called the intermediate key-value pairs. […]
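To make the idea concrete, here is a minimal stand-alone sketch, not the post's own code. It does not depend on the Hadoop jars. It only mirrors the shape of RawComparator's compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) method, and assumes the key is a single int serialized as 4 big-endian bytes (the layout DataOutput.writeInt produces). The class name RawIntKeySketch and its helper methods are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-alone sketch: imitates the signature of
// RawComparator.compare(...) without importing any Hadoop classes.
class RawIntKeySketch {

    // Serialize an int key as 4 big-endian bytes (ByteBuffer defaults to big-endian).
    static byte[] serialize(int key) {
        return ByteBuffer.allocate(4).putInt(key).array();
    }

    // Rebuild the int from 4 bytes starting at offset s, with no object allocation.
    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xff) << 24)
             | ((b[s + 1] & 0xff) << 16)
             | ((b[s + 2] & 0xff) << 8)
             |  (b[s + 3] & 0xff);
    }

    // Compare two serialized keys in place: b = buffer, s = start offset, l = length.
    // The lengths are unused here because the assumed key is always 4 bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] a = serialize(7);
        byte[] b = serialize(42);
        // 7 sorts before 42, so the comparison result is negative.
        System.out.println(compare(a, 0, a.length, b, 0, b.length) < 0); // prints "true"
    }
}
```

In a real job the same byte-level logic would live in a class implementing org.apache.hadoop.io.RawComparator, and the saving comes from skipping key deserialization on every comparison during the sort.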

Secondary sorting aka sorting values in Hadoop’s Map/Reduce programming paradigm

Introduction

Sometimes we would like to sort the values coming into the Reducer of a Hadoop Map/Reduce (MR) Job. You can sort the values indirectly by combining the following implementations.

Use a composite key.
Extend org.apache.hadoop.mapreduce.Partitioner.
Extend org.apache.hadoop.io.WritableComparator.

Other tutorials that explain this approach to sorting values going into a […]
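The three pieces listed above can be sketched without Hadoop as a small simulation. This is only an illustration under stated assumptions, not the post's implementation. It assumes the composite key pairs a natural key with the value to sort by, that the partitioner looks only at the natural key, and that the comparators sort by the full composite key while grouping by the natural key alone. All names here (SecondarySortSketch, CompositeKey, partition, sortAndGroup) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone simulation of secondary sorting; no Hadoop classes used.
class SecondarySortSketch {

    // Composite key: the natural key plus the value we want sorted.
    static final class CompositeKey {
        final String natural;
        final int secondary;

        CompositeKey(String natural, int secondary) {
            this.natural = natural;
            this.secondary = secondary;
        }
    }

    // Sort-comparator role: order by natural key, then by the secondary value.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.natural)
                      .thenComparingInt(k -> k.secondary);

    // Partitioner role: use only the natural key, so every record for one
    // natural key lands in the same partition (reducer).
    static int partition(CompositeKey key, int numPartitions) {
        return (key.natural.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Grouping-comparator role: after sorting, keys sharing a natural key
    // form one group, and their values arrive already sorted.
    static Map<String, List<Integer>> sortAndGroup(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT_ORDER);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.natural, n -> new ArrayList<>()).add(k.secondary);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = Arrays.asList(
                new CompositeKey("b", 3), new CompositeKey("a", 2),
                new CompositeKey("a", 1), new CompositeKey("b", 1));
        System.out.println(sortAndGroup(keys)); // prints "{a=[1, 2], b=[1, 3]}"
    }
}
```

In an actual job, these roles would be filled by a Partitioner subclass and WritableComparator subclasses registered on the Job, with the framework doing the sorting and grouping during the shuffle.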