My wife takes all my money, so if this helped you in any way and you have some spare bitcoins, you may donate them to me -
16tb2Rgn4uDptrEuR94BkhQAZNgfoMj3ug
Keep this cluster layout in mind:
Host 1 - SuperNinja1 - 172.28.200.161 SuperMicro (the master, running the NameNode; it also has HBase and Hive installed)
Host 2 - SuperNinja2 - 172.28.200.163 SuperMicro (DataNode and NodeManager)
Host 3 - SuperNinja3 - 172.28.200.165 SuperMicro (DataNode and NodeManager; this node runs the Postgres instance for Hive)
Host 4 - SuperNinja4 - 172.28.200.150 HP Desktop (DataNode and NodeManager)
Host 5 - SuperNinja5 - 172.28.200.153 HP Desktop (DataNode and NodeManager)
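For the nodes to reach each other by name, every host needs these name-to-IP mappings, either via DNS or /etc/hosts. A minimal sketch, written to a temp file here so it is safe to run anywhere; on a real node you would append the same lines to /etc/hosts as root:

```shell
# Name-to-IP mappings for the cluster above; simulated in a temp file so the
# snippet is harmless to run. On a real node, append these lines to /etc/hosts.
HOSTS_FILE="$(mktemp)"
cat > "$HOSTS_FILE" <<'EOF'
172.28.200.161 SuperNinja1
172.28.200.163 SuperNinja2
172.28.200.165 SuperNinja3
172.28.200.150 SuperNinja4
172.28.200.153 SuperNinja5
EOF
grep -c SuperNinja "$HOSTS_FILE"
```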
For Hadoop and all the other stuff to work, you need Java. Since I'm building on SLES 11 SP3, I downloaded the latest Java RPM and installed it with rpm -ivh:
SuperNinja5:/opt/temp # rpm -ivh jdk-2000\:1.7.0-fcs.x86_64.rpm
Preparing... ########################################### [100%]
1:jdk ########################################### [100%]
Unpacking JAR files...
rt.jar...
jsse.jar...
charsets.jar...
tools.jar...
localedata.jar...
SuperNinja1:/opt/temp #
SuperNinja2:/opt/temp # which java
/usr/bin/java
SuperNinja2:/opt/temp # ls -ltr /usr/bin/java
lrwxrwxrwx 1 root root 26 May 28 10:07 /usr/bin/java -> /usr/java/default/bin/java
SuperNinja5:/opt/temp # ls -ltr /usr/java/latest
lrwxrwxrwx 1 root root 18 May 28 10:07 /usr/java/latest -> /usr/java/jdk1.7.0
SuperNinja2:/opt/temp #
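The /usr/java/default and /usr/java/latest links shown above are created by the JDK RPM; if they are ever missing, the chain can be recreated by hand. A sketch under a throwaway root so it runs without touching the real system (on a real host, drop the "$ROOT" prefix and run as root):

```shell
# Recreate the java symlink chain under a temp root:
# /usr/bin/java -> /usr/java/default -> /usr/java/latest -> /usr/java/jdk1.7.0
ROOT="$(mktemp -d)"
mkdir -p "$ROOT/usr/java/jdk1.7.0/bin" "$ROOT/usr/bin"
touch "$ROOT/usr/java/jdk1.7.0/bin/java"
ln -s jdk1.7.0 "$ROOT/usr/java/latest"
ln -s latest "$ROOT/usr/java/default"
ln -s "$ROOT/usr/java/default/bin/java" "$ROOT/usr/bin/java"
# Following the chain should land on the actual JDK binary:
readlink -f "$ROOT/usr/bin/java"
```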
I created 2 groups and 2 users, one for Hadoop and one for HBase. The Hadoop user is called hduser and belongs to the group hadoop; the other is hbuser, the HBase user, which belongs to the hbase group.
Let's start with Hadoop
Download the latest Hadoop from Apache's website
http://hadoop.apache.org/#Download+Hadoop
Place the downloaded file in /opt/temp
Create the directory /opt/app; this is where we will place the Hadoop binaries. Then gunzip and untar the file. Note the -C /opt/app, which tells tar to place the extracted contents in /opt/app:
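Before extracting, it's worth verifying the download against the checksum Apache publishes alongside it. A sketch demonstrated on a stand-in file so it runs anywhere; for the real archive, compare against the checksum from the mirror's download page (the exact checksum file name varies by mirror):

```shell
# Verify an archive against a known checksum, demonstrated on a stand-in file.
# For the real hadoop-2.4.0.tar.gz, use the checksum published on the mirror.
F="$(mktemp)"
echo "stand-in for hadoop-2.4.0.tar.gz" > "$F"
SUM="$(sha256sum "$F" | awk '{print $1}')"
# Write a checksum file and verify the archive against it:
echo "$SUM  $F" > "$F.sha256"
sha256sum -c "$F.sha256"
```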
SuperNinja1:/opt/temp # mkdir -p /opt/app
SuperNinja1:/opt/temp # ls -ltr
total 135832
-rw-r--r-- 1 root root 194 May 14 15:29 ETH_MAC_ADDRESSES
-rw-r--r-- 1 root root 138943699 May 15 11:25 hadoop-2.4.0.tar.gz
SuperNinja1:/opt/temp # gunzip hadoop-2.4.0.tar.gz
SuperNinja1:/opt/temp # tar -xvf hadoop-2.4.0.tar -C /opt/app
hadoop-2.4.0/
hadoop-2.4.0/bin/
hadoop-2.4.0/bin/mapred
hadoop-2.4.0/bin/hadoop
hadoop-2.4.0/bin/mapred.cmd
hadoop-2.4.0/bin/rcc
hadoop-2.4.0/bin/container-executor
hadoop-2.4.0/bin/hdfs
hadoop-2.4.0/bin/test-container-executor
Snip....Snip
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_error_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/banner.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/bg.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_info_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/expanded.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/newwindow.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/maven-logo-2.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/h3.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/breadcrumbs.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/h5.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/external.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/build-by-maven-white.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/build-by-maven-black.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/maven-feather.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_warning_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/collapsed.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logo_maven.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logo_apache.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_success_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/apache-maven-project-2.png
SuperNinja1:/opt/temp #
Let's see what happened; change directory to /opt/app
SuperNinja1:/opt/temp # cd /opt/app
SuperNinja1:/opt/app # ls
hadoop-2.4.0
SuperNinja1:/opt/app #
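As a side note, tar's -z flag does the gunzip and the extraction in one step, so the separate gunzip above is optional. A self-contained demonstration with a throwaway archive:

```shell
# Build a tiny stand-in archive, then extract it in one step with tar -xzf -C,
# mirroring what gunzip + tar -xvf ... -C /opt/app did above.
WORK="$(mktemp -d)"
mkdir -p "$WORK/src/hadoop-2.4.0/bin"
echo '#!/bin/sh' > "$WORK/src/hadoop-2.4.0/bin/hadoop"
tar -czf "$WORK/hadoop-2.4.0.tar.gz" -C "$WORK/src" hadoop-2.4.0
mkdir -p "$WORK/opt/app"
tar -xzf "$WORK/hadoop-2.4.0.tar.gz" -C "$WORK/opt/app"
ls "$WORK/opt/app/hadoop-2.4.0/bin"
```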
To make it friendlier, I renamed hadoop-2.4.0 to hadoop
SuperNinja5:/opt/app # mv hadoop-2.4.0 hadoop
SuperNinja1:/opt/app # ls -ltr
total 8
drwxr-xr-x 9 67974 users 4096 Mar 31 11:15 hadoop
SuperNinja1:/opt/app #
Next we need a user for Hadoop. I created a user called hduser in the group hadoop, then created the user's home directory and set its ownership:
SuperNinja1:/opt/app # groupadd hadoop
SuperNinja1:/opt/app # useradd -g hadoop hduser
SuperNinja1:/opt/app # mkdir -p /home/hduser
SuperNinja1:/opt/app # chown -R hduser:hadoop /home/hduser
We then log in as the newly created user and generate its SSH keys. With this user you must be able to log into ALL the servers without any password:
SuperNinja1:/opt/app # su - hduser
hduser@SuperNinja1:~> ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
7e:ce:17:29:64:21:54:73:2c:fd:5c:64:96:b1:91:fc [MD5] hduser@SuperNinja1
The key's randomart image is:
+--[ RSA 2048]----+
| ...oo. .+B|
| . ooo *+|
| . o o o.|
| o o E|
| So . |
| . . o |
| . .. . |
| + . |
| o. |
+--[MD5]----------+
hduser@SuperNinja1:~> ls -la .ssh
total 16
drwx------ 2 hduser hadoop 4096 May 15 11:29 .
drwxr-xr-x 3 hduser hadoop 4096 May 15 11:29 ..
-rw------- 1 hduser hadoop 1679 May 15 11:29 id_rsa
-rw-r--r-- 1 hduser hadoop 400 May 15 11:29 id_rsa.pub
hduser@SuperNinja1:~> echo $HOME
/home/hduser
hduser@SuperNinja1:~> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hduser@SuperNinja1:~> ls -la .ssh
total 20
drwx------ 2 hduser hadoop 4096 May 15 11:30 .
drwxr-xr-x 3 hduser hadoop 4096 May 15 11:29 ..
-rw-r--r-- 1 hduser hadoop 400 May 15 11:30 authorized_keys
-rw------- 1 hduser hadoop 1679 May 15 11:29 id_rsa
-rw-r--r-- 1 hduser hadoop 400 May 15 11:29 id_rsa.pub
hduser@SuperNinja1:~> ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is 06:a7:bc:61:a0:de:14:04:23:d9:2a:84:75:37:23:f4 [MD5].
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
hduser@SuperNinja1:~> exit
logout
Connection to localhost closed.
hduser@SuperNinja1:~> exit
logout
Create the hduser on all the servers using the procedure above. Then make a text file containing all the servers' public keys and place it on every server as /home/hduser/.ssh/authorized_keys. This ensures that hduser can log into ALL servers without a password. Below is an example of what it looks like (yes, I did change my keys for this printout, so don't even try it)...
SuperNinja1:~ # cd /home/hduser/.ssh/
SuperNinja1:/home/hduser/.ssh # cat authorized_keys
ssh-rsa jCfon0dWBqIffU9G3q+HVzYRs6FDNrov hduser@SuperNinja1
ssh-rsa n0fwO3pBo8bQc2bA9lvKEIHbTwmUWDcu hduser@SuperNinja2
ssh-rsa dwS0ltr6/H1VPaU1X/OS3/Jq83yxjAYT hduser@SuperNinja3
ssh-rsa u1HzxsOH8Leu07JQA3piUaB56B7eJNFz hduser@SuperNinja4
ssh-rsa pnbYOuKz093zZzSMt80AmijczuPctnaf hduser@SuperNinja5
SuperNinja1:/home/hduser/.ssh #
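The merge step itself is just concatenation of each host's id_rsa.pub. A sketch simulated with temp files and fake key material (on the real cluster the inputs are each host's /home/hduser/.ssh/id_rsa.pub and the merged file is copied to every host):

```shell
# Simulate collecting each node's public key and merging them into a single
# authorized_keys file, as described above. The key data here is fake.
KEYDIR="$(mktemp -d)"
for host in SuperNinja1 SuperNinja2 SuperNinja3 SuperNinja4 SuperNinja5; do
  echo "ssh-rsa FAKEKEYDATA hduser@$host" > "$KEYDIR/id_rsa.pub.$host"
done
cat "$KEYDIR"/id_rsa.pub.* > "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
wc -l < "$KEYDIR/authorized_keys"
```

Afterwards, a quick loop such as `for h in SuperNinja1 SuperNinja2 SuperNinja3 SuperNinja4 SuperNinja5; do ssh $h true; done` run as hduser should complete without a single password prompt.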
The next step is to log in as hduser and set some variables in the .bashrc file on all the servers. Set the following in the .bashrc file in hduser's home directory - see below
SuperNinja1:/home/hduser/.ssh # cd /
SuperNinja1:/ # su - hduser
hduser@SuperNinja1:~> pwd
/home/hduser
hduser@SuperNinja1:~> cat .bashrc
#Set Hadoop-related environment variables
export HADOOP_HOME=/opt/app/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_HOME=/opt/app/hive
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/java/latest
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
# For jps
export PATH=$PATH:$JAVA_HOME/bin
hduser@SuperNinja1:~>
Log out and log back in as hduser to check that the .bashrc file is loaded
hduser@SuperNinja1:~> exit
logout
SuperNinja1:/ # su - hduser
hduser@SuperNinja1:~> echo $HADOOP_HOME
/opt/app/hadoop
hduser@SuperNinja1:~> echo $HIVE_HOME
/opt/app/hive
hduser@SuperNinja1:~>
Yea! We can start configuring Hadoop; all changes must be made as the hduser.
All the files needed for Hadoop are in /opt/app/hadoop/etc/hadoop
SuperNinja1:/ # su - hduser
hduser@SuperNinja1:~> cd /opt/app/hadoop/etc/hadoop/
hduser@SuperNinja1:/opt/app/hadoop/etc/hadoop> ls -ltr
total 132
-rw-r--r-- 1 hduser hadoop 2268 Mar 31 10:49 ssl-server.xml.example
-rw-r--r-- 1 hduser hadoop 2316 Mar 31 10:49 ssl-client.xml.example
-rw-r--r-- 1 hduser hadoop 11169 Mar 31 10:49 log4j.properties
-rw-r--r-- 1 hduser hadoop 9257 Mar 31 10:49 hadoop-policy.xml
-rw-r--r-- 1 hduser hadoop 2490 Mar 31 10:49 hadoop-metrics.properties
-rw-r--r-- 1 hduser hadoop 3589 Mar 31 10:49 hadoop-env.cmd
-rw-r--r-- 1 hduser hadoop 2178 Mar 31 10:49 yarn-env.cmd
-rw-r--r-- 1 hduser hadoop 4113 Mar 31 10:49 mapred-queues.xml.template
-rw-r--r-- 1 hduser hadoop 1383 Mar 31 10:49 mapred-env.sh
-rw-r--r-- 1 hduser hadoop 918 Mar 31 10:49 mapred-env.cmd
-rw-r--r-- 1 hduser hadoop 620 Mar 31 10:49 httpfs-site.xml
-rw-r--r-- 1 hduser hadoop 21 Mar 31 10:49 httpfs-signature.secret
-rw-r--r-- 1 hduser hadoop 1657 Mar 31 10:49 httpfs-log4j.properties
-rw-r--r-- 1 hduser hadoop 1449 Mar 31 10:49 httpfs-env.sh
-rw-r--r-- 1 hduser hadoop 1774 Mar 31 10:49 hadoop-metrics2.properties
-rw-r--r-- 1 hduser hadoop 318 Mar 31 10:49 container-executor.cfg
-rw-r--r-- 1 hduser hadoop 1335 Mar 31 10:49 configuration.xsl
-rw-r--r-- 1 hduser hadoop 3589 Mar 31 10:49 capacity-scheduler.xml
-rw-r--r-- 1 hduser hadoop 206 May 15 12:28 mapred-site.xml
-rw-r--r-- 1 hduser hadoop 3512 May 15 12:54 hadoop-env.sh
-rw-r--r-- 1 hduser hadoop 4878 May 16 11:06 yarn-env.sh
-rw-r--r-- 1 hduser hadoop 679 May 16 11:27 yarn-site.xml
-rw-r--r-- 1 hduser hadoop 655 May 22 14:40 derby.log
drwxr-xr-x 5 hduser hadoop 4096 May 22 14:40 metastore_db
-rw-r--r-- 1 hduser hadoop 334 May 26 07:42 core-site.xml
-rw-r--r-- 1 hduser hadoop 60 May 28 11:58 slaves
-rw-r--r-- 1 hduser hadoop 510 May 29 11:14 hdfs-site.xml
hduser@SuperNinja1:/opt/app/hadoop/etc/hadoop>
