In this post, I will show how to install a single-node Hadoop (v2.3.0) instance with YARN using Vagrant. You might think this is a crazy idea, given that Hortonworks and Cloudera offer free sandboxes with Hadoop. However, it's not so crazy if you want to learn how to actually do it yourself (DIY). There's a lot to learn from a DIY approach, such as the dependencies and minimal requirements. Also, I find these sandboxes quite confusing (where is Hadoop actually installed? The files seem to be all over the place), and they resemble bloatware (Spark, Hue, Impala, etc.). Furthermore, I found the Hadoop installation documentation unclear, so I had to figure out for myself what's involved. To follow along, you will need to download the following software.
- VirtualBox v4.3.6
- Vagrant v1.4.3
First, install VirtualBox, and then install Vagrant. Next, on the command line, add the required Vagrant box.
vagrant box add centos65 https://github.com/2creatives/vagrant-centos/releases/download/v6.5.1/centos65-x86_64-20131205.box
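If the box command fails, it is worth confirming that both tools are actually on your PATH. Here is a minimal sketch of such a check; the helper names `have_cmd` and `check_prereqs` are my own for illustration and are not part of Vagrant or VirtualBox.

```shell
# Report whether a command is available on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Check the two tools this walkthrough depends on.
# Prints one line per tool and returns non-zero if any is missing.
check_prereqs() {
  missing=0
  for tool in VBoxManage vagrant; do
    if have_cmd "$tool"; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running `check_prereqs && vagrant box list` should show the centos65 box once both tools are installed and the box has been added.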
Then, using your favorite Git client, check out the Vagrant project from GitHub at https://github.com/vangj/vagrant-hadoop-2.3.0.git. After you check out the project, go into its directory and type the following.
vagrant up
Depending on your connection, it will take a while for the virtual machine (VM) to be created. The primary reason for the installation time is that after the VM is created, OpenJDK and Hadoop still have to be downloaded and installed. OpenJDK is downloaded with yum, while Hadoop is downloaded with curl. The secondary reason is that I couldn't store the Hadoop archive on GitHub (GitHub does not allow files larger than 50 MB), so the workaround is to have Vagrant execute a script that downloads Hadoop.
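To give a feel for what such a provisioning step involves, here is a rough sketch. It is not the actual script from the project: the OpenJDK package name, the Apache archive URL, and the /opt install location are all my assumptions, and the function names are hypothetical.

```shell
# Hypothetical provisioning sketch; the project's real script may differ.
HADOOP_VERSION="2.3.0"

# Build the download URL on the Apache archive (assumed download location).
hadoop_url() {
  echo "https://archive.apache.org/dist/hadoop/common/hadoop-$1/hadoop-$1.tar.gz"
}

# Install OpenJDK with yum, then fetch and unpack Hadoop with curl.
# The package name and the /opt target directory are assumptions.
provision() {
  yum install -y java-1.7.0-openjdk-devel &&
  curl -fSL -o /tmp/hadoop.tar.gz "$(hadoop_url "$HADOOP_VERSION")" &&
  tar -xzf /tmp/hadoop.tar.gz -C /opt
}
```

Calling `provision` as root inside the VM would run the three steps in order and stop at the first failure.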
After the VM has been created, you can SSH into it by typing the following.
vagrant ssh
When you are done with the VM, you can destroy it with the following command.
vagrant destroy
But before you destroy the VM, you can verify that Hadoop was successfully installed by pointing your browser to the following URLs.
Note that the URLs point to localhost and NOT the VM. This is possible because Vagrant can set up port forwarding from your desktop to the VM. This feature is another reason why Vagrant is an awesome product.
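If you prefer the command line to a browser, the same check can be sketched with curl. This assumes the Hadoop 2.x default web UI ports (50070 for the NameNode, 8088 for the ResourceManager) are forwarded to the same port numbers on localhost; check the Vagrantfile for the actual mapping. The function names are hypothetical.

```shell
# Assumed ports: Hadoop 2.x defaults, forwarded unchanged to localhost.
NAMENODE_PORT=50070
RESOURCEMANAGER_PORT=8088

# Print the localhost URL for a forwarded port.
ui_url() {
  echo "http://localhost:$1/"
}

# Succeed if the web UI on the given port answers without an HTTP error
# within five seconds.
check_ui() {
  curl -fsS -o /dev/null --max-time 5 "$(ui_url "$1")"
}
```

For example, `check_ui "$NAMENODE_PORT" && echo "NameNode UI is up"` run on your desktop (not inside the VM) exercises the port forwarding as well as Hadoop itself.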
You should also try the hdfs shell command.
hdfs dfs -ls /
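Listing the root directory only proves that the NameNode answers. A slightly fuller smoke test is to write a file, read it back, and clean up. Here is a minimal sketch to run inside the VM; the scratch path /tmp/hdfs-smoke and the function name are hypothetical choices for this example.

```shell
# Write a small file into HDFS, read it back, list it, then clean up.
# /tmp/hdfs-smoke is a hypothetical scratch directory in HDFS.
hdfs_smoke_test() {
  dir=/tmp/hdfs-smoke
  echo "hello hadoop" > /tmp/hello.txt &&
  hdfs dfs -mkdir -p "$dir" &&
  hdfs dfs -put /tmp/hello.txt "$dir/" &&
  hdfs dfs -cat "$dir/hello.txt" &&
  hdfs dfs -ls "$dir" &&
  hdfs dfs -rm -r "$dir"
}
```

If every step succeeds, the function returns zero; any failing step stops the chain, so the exit status tells you whether HDFS reads and writes are working.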
Well, that is it for this post. I hope we can all now easily set up our own Hadoop sandboxes with YARN, at will, using Vagrant and VirtualBox. Now we can move on to the really fun things, like building applications to run on YARN.
As always, cheers!