Big Data - Setup Hadoop, HDFS, HBase, Hive - Installing Java and Hadoop - Part 2

My wife takes all my money, so if this helped you in any way and you have some spare BitCoins, you may donate them to me - 16tb2Rgn4uDptrEuR94BkhQAZNgfoMj3ug

Keep this in mind:

Host 1 - SuperNinja1 - 172.28.200.161 SuperMicro (the master, running the NameNode; it also has HBase and Hive installed)
Host 2 - SuperNinja2 - 172.28.200.163 SuperMicro (DataNode and NodeManager)
Host 3 - SuperNinja3 - 172.28.200.165 SuperMicro (DataNode and NodeManager; this node runs the Postgres instance for Hive)
Host 4 - SuperNinja4 - 172.28.200.150 HP Desktop (DataNode and NodeManager)
Host 5 - SuperNinja5 - 172.28.200.153 HP Desktop (DataNode and NodeManager)
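
If you don't have DNS entries for these hosts, every node needs to resolve the others by name. A minimal sketch of the /etc/hosts entries (I'm assuming DNS isn't handling this already; same five lines on every server):
172.28.200.161   SuperNinja1
172.28.200.163   SuperNinja2
172.28.200.165   SuperNinja3
172.28.200.150   SuperNinja4
172.28.200.153   SuperNinja5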

For Hadoop and all the other stuff to work, you need Java. Since I'm building on SLES 11 SP3, I downloaded the latest Java RPM and installed it with rpm -ivh:
SuperNinja5:/opt/temp # rpm -ivh jdk-2000\:1.7.0-fcs.x86_64.rpm
Preparing...                ########################################### [100%]
   1:jdk                    ########################################### [100%]
Unpacking JAR files...
    rt.jar...
    jsse.jar...
    charsets.jar...
    tools.jar...
    localedata.jar...
SuperNinja1:/opt/temp #
SuperNinja2:/opt/temp # which java
/usr/bin/java
SuperNinja2:/opt/temp # ls -ltr /usr/bin/java
lrwxrwxrwx 1 root root 26 May 28 10:07 /usr/bin/java -> /usr/java/default/bin/java
SuperNinja5:/opt/temp # ls -ltr /usr/java/latest
lrwxrwxrwx 1 root root 18 May 28 10:07 /usr/java/latest -> /usr/java/jdk1.7.0
SuperNinja2:/opt/temp #
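
Afterwards it's worth a quick sanity check on every server that the new JDK is the one on the PATH (the exact version string you get depends on the build you installed):
java -version    # should report a 1.7.0 build, resolved via /usr/java/default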

I created two groups and two users, one for Hadoop and one for HBase. The Hadoop user is called hduser and belongs to the hadoop group; the HBase user is called hbuser and belongs to the hbase group.
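
The hduser creation is shown step by step further down; here is a minimal sketch of the matching HBase account (same pattern, different names; I'm assuming the hbase group doesn't exist yet):
groupadd hbase
useradd -g hbase hbuser
mkdir -p /home/hbuser
chown -R hbuser:hbase /home/hbuser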

Let's start with Hadoop

Download the latest Hadoop from Apache's website
http://hadoop.apache.org/#Download+Hadoop
Place the downloaded file in /opt/temp
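
If you'd rather fetch it straight from the command line, something like this works (the archive URL below is my assumption for the 2.4.0 release; check the download page for a current mirror):
cd /opt/temp
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz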

Create the directory /opt/app; this is where we will place the Hadoop binaries. Then gunzip and untar the file. Note the -C /opt/app flag, which tells tar to extract the archive contents into /opt/app:
SuperNinja1:/opt/temp # mkdir -p /opt/app
SuperNinja1:/opt/temp # ls -ltr
total 135832
-rw-r--r-- 1 root root       194 May 14 15:29 ETH_MAC_ADDRESSES
-rw-r--r-- 1 root root 138943699 May 15 11:25 hadoop-2.4.0.tar.gz
SuperNinja1:/opt/temp # gunzip hadoop-2.4.0.tar.gz 
SuperNinja1:/opt/temp # tar -xvf hadoop-2.4.0.tar -C /opt/app
hadoop-2.4.0/
hadoop-2.4.0/bin/
hadoop-2.4.0/bin/mapred
hadoop-2.4.0/bin/hadoop
hadoop-2.4.0/bin/mapred.cmd
hadoop-2.4.0/bin/rcc
hadoop-2.4.0/bin/container-executor
hadoop-2.4.0/bin/hdfs
hadoop-2.4.0/bin/test-container-executor
Snip....Snip
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_error_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/banner.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/bg.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_info_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/expanded.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/newwindow.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/maven-logo-2.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/h3.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/breadcrumbs.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/h5.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/external.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/build-by-maven-white.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/build-by-maven-black.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/maven-feather.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_warning_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/collapsed.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logo_maven.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logo_apache.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_success_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/apache-maven-project-2.png
SuperNinja1:/opt/temp #
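
As a side note, the separate gunzip step isn't strictly needed; tar can decompress on the fly, so the two commands above collapse into one:
tar -xzf hadoop-2.4.0.tar.gz -C /opt/app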

Let's see what happened; change directory to /opt/app:
SuperNinja1:/opt/temp # cd /opt/app
SuperNinja1:/opt/app # ls
hadoop-2.4.0
SuperNinja1:/opt/app #

To make life a bit friendlier, I renamed hadoop-2.4.0 to hadoop:
SuperNinja5:/opt/app # mv hadoop-2.4.0 hadoop
SuperNinja1:/opt/app # ls -ltr
total 8
drwxr-xr-x 9 67974 users 4096 Mar 31 11:15 hadoop
SuperNinja1:/opt/app #
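
An alternative to renaming is a symlink, which keeps the version visible and makes a future upgrade a one-line switch (just a design note, not what I did here):
ln -s /opt/app/hadoop-2.4.0 /opt/app/hadoop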

Next we need a user for Hadoop. I created a user called hduser in the group hadoop, then created the user's home directory and set its ownership:
SuperNinja1:/opt/app # groupadd hadoop
SuperNinja1:/opt/app # useradd -g hadoop hduser
SuperNinja1:/opt/app # mkdir -p /home/hduser
SuperNinja1:/opt/app # chown -R hduser:hadoop /home/hduser
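
One step the capture doesn't show: the directory listing above still has the Hadoop tree owned by UID 67974 straight out of the tarball, while the listings later in this post show it owned by hduser:hadoop, so hand it over now:
chown -R hduser:hadoop /opt/app/hadoop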

We then log in as the newly created user and generate its SSH keys. With this user you must be able to log into ALL the servers without a password:
SuperNinja1:/opt/app # su - hduser
hduser@SuperNinja1:~> ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa): 
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
7e:ce:17:29:64:21:54:73:2c:fd:5c:64:96:b1:91:fc [MD5] hduser@SuperNinja1
The key's randomart image is:
+--[ RSA 2048]----+
|       ...oo. .+B|
|        . ooo  *+|
|         . o o o.|
|          o   o E|
|        So   .   |
|       .  . o    |
|        . .. .   |
|         +  .    |
|          o.     |
+--[MD5]----------+
hduser@SuperNinja1:~> ls -la .ssh
total 16
drwx------ 2 hduser hadoop 4096 May 15 11:29 .
drwxr-xr-x 3 hduser hadoop 4096 May 15 11:29 ..
-rw------- 1 hduser hadoop 1679 May 15 11:29 id_rsa
-rw-r--r-- 1 hduser hadoop  400 May 15 11:29 id_rsa.pub
hduser@SuperNinja1:~> echo $HOME
/home/hduser
hduser@SuperNinja1:~> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hduser@SuperNinja1:~> ls -la .ssh
total 20
drwx------ 2 hduser hadoop 4096 May 15 11:30 .
drwxr-xr-x 3 hduser hadoop 4096 May 15 11:29 ..
-rw-r--r-- 1 hduser hadoop  400 May 15 11:30 authorized_keys
-rw------- 1 hduser hadoop 1679 May 15 11:29 id_rsa
-rw-r--r-- 1 hduser hadoop  400 May 15 11:29 id_rsa.pub
hduser@SuperNinja1:~> ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is 06:a7:bc:61:a0:de:14:04:23:d9:2a:84:75:37:23:f4 [MD5].
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.

hduser@SuperNinja1:~> exit
logout
Connection to localhost closed.

hduser@SuperNinja1:~> exit
logout

Create the hduser on all the servers using the procedure above. Then make a text file containing every server's public key and place it on all the servers as /home/hduser/.ssh/authorized_keys. This ensures that hduser can log into ALL servers without a password. Below is an example of what it looks like; yes, I did change my keys for this printout, so don't even try it...
SuperNinja1:~ # cd /home/hduser/.ssh/
SuperNinja1:/home/hduser/.ssh # cat authorized_keys
ssh-rsa jCfon0dWBqIffU9G3q+HVzYRs6FDNrov hduser@SuperNinja1
ssh-rsa n0fwO3pBo8bQc2bA9lvKEIHbTwmUWDcu hduser@SuperNinja2
ssh-rsa dwS0ltr6/H1VPaU1X/OS3/Jq83yxjAYT hduser@SuperNinja3
ssh-rsa u1HzxsOH8Leu07JQA3piUaB56B7eJNFz hduser@SuperNinja4
ssh-rsa pnbYOuKz093zZzSMt80AmijczuPctnaf hduser@SuperNinja5
SuperNinja1:/home/hduser/.ssh # 
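
If typing keys into a file by hand feels error-prone, a small loop can do the gathering and distribution for you. This is a sketch, assuming password logins are still enabled at this point (hostnames as per the table at the top):
# Gather every node's public key into one file (run as any user that can ssh around)
for h in SuperNinja1 SuperNinja2 SuperNinja3 SuperNinja4 SuperNinja5; do
    ssh hduser@$h cat /home/hduser/.ssh/id_rsa.pub
done > /tmp/all_keys

# Push the merged file out to every node
for h in SuperNinja1 SuperNinja2 SuperNinja3 SuperNinja4 SuperNinja5; do
    scp /tmp/all_keys hduser@$h:/home/hduser/.ssh/authorized_keys
done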

The next step is to log in as hduser and set some variables in the .bashrc file on all the servers. Set the following in the .bashrc file in hduser's home directory - see below:
SuperNinja1:/home/hduser/.ssh # cd /
SuperNinja1:/ # su - hduser
hduser@SuperNinja1:~> pwd
/home/hduser
hduser@SuperNinja1:~> cat .bashrc
#Set Hadoop-related environment variables
export HADOOP_HOME=/opt/app/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_HOME=/opt/app/hive
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/java/latest

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
# For jps
export PATH=$PATH:$JAVA_HOME/bin
hduser@SuperNinja1:~>

Log out and log in again as hduser to check that the .bashrc file is loaded:

hduser@SuperNinja1:~> exit
logout
SuperNinja1:/ # su - hduser
hduser@SuperNinja1:~> echo $HADOOP_HOME
/opt/app/hadoop
hduser@SuperNinja1:~> echo $HIVE_HOME
/opt/app/hive
hduser@SuperNinja1:~>

Yea! We can now start configuring Hadoop. All changes must be made as hduser.
All the files needed for Hadoop are in /opt/app/hadoop/etc/hadoop:

SuperNinja1:/ # su - hduser
hduser@SuperNinja1:~> cd /opt/app/hadoop/etc/hadoop/
hduser@SuperNinja1:/opt/app/hadoop/etc/hadoop> ls -ltr
total 132
-rw-r--r-- 1 hduser hadoop  2268 Mar 31 10:49 ssl-server.xml.example
-rw-r--r-- 1 hduser hadoop  2316 Mar 31 10:49 ssl-client.xml.example
-rw-r--r-- 1 hduser hadoop 11169 Mar 31 10:49 log4j.properties
-rw-r--r-- 1 hduser hadoop  9257 Mar 31 10:49 hadoop-policy.xml
-rw-r--r-- 1 hduser hadoop  2490 Mar 31 10:49 hadoop-metrics.properties
-rw-r--r-- 1 hduser hadoop  3589 Mar 31 10:49 hadoop-env.cmd
-rw-r--r-- 1 hduser hadoop  2178 Mar 31 10:49 yarn-env.cmd
-rw-r--r-- 1 hduser hadoop  4113 Mar 31 10:49 mapred-queues.xml.template
-rw-r--r-- 1 hduser hadoop  1383 Mar 31 10:49 mapred-env.sh
-rw-r--r-- 1 hduser hadoop   918 Mar 31 10:49 mapred-env.cmd
-rw-r--r-- 1 hduser hadoop   620 Mar 31 10:49 httpfs-site.xml
-rw-r--r-- 1 hduser hadoop    21 Mar 31 10:49 httpfs-signature.secret
-rw-r--r-- 1 hduser hadoop  1657 Mar 31 10:49 httpfs-log4j.properties
-rw-r--r-- 1 hduser hadoop  1449 Mar 31 10:49 httpfs-env.sh
-rw-r--r-- 1 hduser hadoop  1774 Mar 31 10:49 hadoop-metrics2.properties
-rw-r--r-- 1 hduser hadoop   318 Mar 31 10:49 container-executor.cfg
-rw-r--r-- 1 hduser hadoop  1335 Mar 31 10:49 configuration.xsl
-rw-r--r-- 1 hduser hadoop  3589 Mar 31 10:49 capacity-scheduler.xml
-rw-r--r-- 1 hduser hadoop   206 May 15 12:28 mapred-site.xml
-rw-r--r-- 1 hduser hadoop  3512 May 15 12:54 hadoop-env.sh
-rw-r--r-- 1 hduser hadoop  4878 May 16 11:06 yarn-env.sh
-rw-r--r-- 1 hduser hadoop   679 May 16 11:27 yarn-site.xml
-rw-r--r-- 1 hduser hadoop   655 May 22 14:40 derby.log
drwxr-xr-x 5 hduser hadoop  4096 May 22 14:40 metastore_db
-rw-r--r-- 1 hduser hadoop   334 May 26 07:42 core-site.xml
-rw-r--r-- 1 hduser hadoop    60 May 28 11:58 slaves
-rw-r--r-- 1 hduser hadoop   510 May 29 11:14 hdfs-site.xml
hduser@SuperNinja1:/opt/app/hadoop/etc/hadoop>
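
As a taste of what comes next: the core-site.xml in that listing is where the cluster learns who the NameNode is. A minimal sketch for this cluster might look like the snippet below; note the port 9000 is a common convention, not something confirmed in this part, and the full property set comes later:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://SuperNinja1:9000</value>
  </property>
</configuration>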
