Keep this in mind
Host 1 - SuperNinja1 - 172.28.200.161 SuperMicro (The Master, running the NameNode. It also has HBase and Hive installed)
Host 2 - SuperNinja2 - 172.28.200.163 SuperMicro (DataNode and a NodeManager)
Host 3 - SuperNinja3 - 172.28.200.165 SuperMicro (DataNode and a NodeManager. This node also runs the Postgres instance for Hive)
Host 4 - SuperNinja4 - 172.28.200.150 HP Desktop (DataNode and a NodeManager)
Host 5 - SuperNinja5 - 172.28.200.153 HP Desktop (DataNode and a NodeManager)
Hadoop and everything else in this stack need Java. Since I'm building on SLES11 SP3, I downloaded the latest Java RPM and installed it with rpm -ivh:
SuperNinja5:/opt/temp # rpm -ivh jdk-2000\:1.7.0-fcs.x86_64.rpm
Preparing... ########################################### [100%]
1:jdk ########################################### [100%]
Unpacking JAR files...
rt.jar...
jsse.jar...
charsets.jar...
tools.jar...
localedata.jar...
SuperNinja1:/opt/temp #
SuperNinja2:/opt/temp # which java
/usr/bin/java
SuperNinja2:/opt/temp # ls -ltr /usr/bin/java
lrwxrwxrwx 1 root root 26 May 28 10:07 /usr/bin/java -> /usr/java/default/bin/java
SuperNinja5:/opt/temp # ls -ltr /usr/java/latest
lrwxrwxrwx 1 root root 18 May 28 10:07 /usr/java/latest -> /usr/java/jdk1.7.0
SuperNinja2:/opt/temp #
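The two ls calls above follow the symlink chain by hand. As a rough sketch, the same check can be done in one step with readlink -f, which resolves every link in the chain. The function name resolve_java is made up for this example, and it assumes a readlink that supports -f (GNU coreutils does).

```shell
# Hypothetical helper: print the real file behind a java symlink chain.
# Assumes readlink supports -f (true for GNU coreutils).
resolve_java() {
    # Use the first argument if given, otherwise whatever "java" is on PATH
    java_path=${1:-$(command -v java)}
    [ -n "$java_path" ] || return 1
    # -f follows every symlink and prints the canonical path
    readlink -f "$java_path"
}

# Example:
#   resolve_java /usr/bin/java
```

Run on every node, this makes it easy to spot a host where java points at a different JDK than the others.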
I created 2 groups and 2 users, one pair for Hadoop and one for HBase. The Hadoop user is called hduser and belongs to the group hadoop. The HBase user is called hbuser and belongs to the group hbase.
Let's start with Hadoop
Download the latest Hadoop from Apache's website: http://hadoop.apache.org/#Download+Hadoop
Place the downloaded file in /opt/temp
Create the directory /opt/app; this is where we will place the Hadoop binaries. Then gunzip and untar the file. Note the -C /opt/app option: it tells tar to extract the archive contents into /opt/app.
SuperNinja1:/opt/temp # mkdir -p /opt/app
SuperNinja1:/opt/temp # ls -ltr
total 135832
-rw-r--r-- 1 root root 194 May 14 15:29 ETH_MAC_ADDRESSES
-rw-r--r-- 1 root root 138943699 May 15 11:25 hadoop-2.4.0.tar.gz
SuperNinja1:/opt/temp # gunzip hadoop-2.4.0.tar.gz
SuperNinja1:/opt/temp # tar -xvf hadoop-2.4.0.tar -C /opt/app
hadoop-2.4.0/
hadoop-2.4.0/bin/
hadoop-2.4.0/bin/mapred
hadoop-2.4.0/bin/hadoop
hadoop-2.4.0/bin/mapred.cmd
hadoop-2.4.0/bin/rcc
hadoop-2.4.0/bin/container-executor
hadoop-2.4.0/bin/hdfs
hadoop-2.4.0/bin/test-container-executor
Snip....Snip
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_error_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/banner.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/bg.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_info_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/expanded.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/newwindow.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/maven-logo-2.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/h3.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/breadcrumbs.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/h5.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/external.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/build-by-maven-white.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/build-by-maven-black.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logos/maven-feather.png
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_warning_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/collapsed.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logo_maven.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/logo_apache.jpg
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/icon_success_sml.gif
hadoop-2.4.0/share/doc/hadoop/hadoop-streaming/images/apache-maven-project-2.png
SuperNinja1:/opt/temp #
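The separate gunzip and tar steps above can also be combined, because tar's -z flag decompresses on the fly. Here is a minimal sketch of that variant. The function name extract_to is invented for this example, and the paths in the usage comment are simply the ones used in this walkthrough.

```shell
# Hypothetical helper: unpack a .tar.gz archive into a target directory
# in a single step, creating the directory first if needed.
extract_to() {
    archive=$1
    target=$2
    mkdir -p "$target" || return 1
    # -x extract, -z gunzip on the fly, -f archive file, -C target directory
    tar -xzf "$archive" -C "$target"
}

# Example:
#   extract_to /opt/temp/hadoop-2.4.0.tar.gz /opt/app
```

Unlike gunzip, this leaves the original .tar.gz in place, which is handy when the same archive still has to be copied to the other nodes.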
Let's see what happened. Change directory to /opt/app:
SuperNinja1:/opt/temp # cd /opt/app
SuperNinja1:/opt/app # ls
hadoop-2.4.0
SuperNinja1:/opt/app #
To make the path friendlier, I renamed the hadoop-2.4.0 directory to hadoop:
SuperNinja5:/opt/app # mv hadoop-2.4.0 hadoop
SuperNinja1:/opt/app # ls -ltr
total 8
drwxr-xr-x 9 67974 users 4096 Mar 31 11:15 hadoop
SuperNinja1:/opt/app #
Next we need a user for Hadoop. I created a user called hduser in the group hadoop, then created the user's home directory and set its ownership:
SuperNinja1:/opt/app # groupadd hadoop
SuperNinja1:/opt/app # useradd -g hadoop hduser
SuperNinja1:/opt/app # mkdir -p /home/hduser
SuperNinja1:/opt/app # chown -R hduser:hadoop /home/hduser
We then log in as the newly created user and generate the user's ssh keys. With this user you must be able to log in to ALL the servers without a password.
SuperNinja1:/opt/app # su - hduser
hduser@SuperNinja1:~> ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
7e:ce:17:29:64:21:54:73:2c:fd:5c:64:96:b1:91:fc [MD5] hduser@SuperNinja1
The key's randomart image is:
+--[ RSA 2048]----+
| ...oo. .+B|
| . ooo *+|
| . o o o.|
| o o E|
| So . |
| . . o |
| . .. . |
| + . |
| o. |
+--[MD5]----------+
hduser@SuperNinja1:~> ls -la .ssh
total 16
drwx------ 2 hduser hadoop 4096 May 15 11:29 .
drwxr-xr-x 3 hduser hadoop 4096 May 15 11:29 ..
-rw------- 1 hduser hadoop 1679 May 15 11:29 id_rsa
-rw-r--r-- 1 hduser hadoop 400 May 15 11:29 id_rsa.pub
hduser@SuperNinja1:~> echo $HOME
/home/hduser
hduser@SuperNinja1:~> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
hduser@SuperNinja1:~> ls -la .ssh
total 20
drwx------ 2 hduser hadoop 4096 May 15 11:30 .
drwxr-xr-x 3 hduser hadoop 4096 May 15 11:29 ..
-rw-r--r-- 1 hduser hadoop 400 May 15 11:30 authorized_keys
-rw------- 1 hduser hadoop 1679 May 15 11:29 id_rsa
-rw-r--r-- 1 hduser hadoop 400 May 15 11:29 id_rsa.pub
hduser@SuperNinja1:~> ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is 06:a7:bc:61:a0:de:14:04:23:d9:2a:84:75:37:23:f4 [MD5].
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
hduser@SuperNinja1:~> exit
logout
Connection to localhost closed.
hduser@SuperNinja1:~> exit
logout
Create hduser on all the servers, using the procedure above. Then combine the authorized_keys entries from all the servers into one text file and place that file on every server as /home/hduser/.ssh/authorized_keys. This ensures that hduser can log in to ALL servers without a password. Below is an example of what it looks like. Yes, I changed my keys for this printout, so don't even try it...
SuperNinja1:~ # cd /home/hduser/.ssh/
SuperNinja1:/home/hduser/.ssh # cat authorized_keys
ssh-rsa jCfon0dWBqIffU9G3q+HVzYRs6FDNrov hduser@SuperNinja1
ssh-rsa n0fwO3pBo8bQc2bA9lvKEIHbTwmUWDcu hduser@SuperNinja2
ssh-rsa dwS0ltr6/H1VPaU1X/OS3/Jq83yxjAYT hduser@SuperNinja3
ssh-rsa u1HzxsOH8Leu07JQA3piUaB56B7eJNFz hduser@SuperNinja4
ssh-rsa pnbYOuKz093zZzSMt80AmijczuPctnaf hduser@SuperNinja5
SuperNinja1:/home/hduser/.ssh #
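Building the combined file by hand is easy to get wrong with five hosts, so here is a rough sketch of how the merge could be scripted. It assumes you have already copied each server's id_rsa.pub to one machine. The function name merge_pubkeys and the file names in the usage comment are made up for illustration.

```shell
# Hypothetical helper: merge several public-key files into one
# authorized_keys-style file and drop exact duplicate lines.
merge_pubkeys() {
    out=$1
    shift
    # sort -u removes duplicates; the order of keys in the file
    # does not affect which keys are accepted
    cat "$@" | sort -u > "$out" || return 1
    # keep the merged file private to its owner
    chmod 600 "$out"
}

# Example (file names are placeholders):
#   merge_pubkeys authorized_keys SuperNinja1.pub SuperNinja2.pub SuperNinja3.pub
```

The merged file can then be copied to /home/hduser/.ssh/authorized_keys on each server, for example with scp.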
The next step is to log in as hduser and set some variables in the .bashrc file in hduser's home directory. Do this on all the servers. See below:
SuperNinja1:/home/hduser/.ssh # cd /
SuperNinja1:/ # su - hduser
hduser@SuperNinja1:~> pwd
/home/hduser
hduser@SuperNinja1:~> cat .bashrc
#Set Hadoop-related environment variables
export HADOOP_HOME=/opt/app/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_HOME=/opt/app/hive
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/java/latest
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
# For jps
export PATH=$PATH:$JAVA_HOME/bin
hduser@SuperNinja1:~>
Log out, log in again as hduser, and check that the .bashrc file was loaded:
hduser@SuperNinja1:~> exit
logout
SuperNinja1:/ # su - hduser
hduser@SuperNinja1:~> echo $HADOOP_HOME
/opt/app/hadoop
hduser@SuperNinja1:~> echo $HIVE_HOME
/opt/app/hive
hduser@SuperNinja1:~>
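Echoing each variable by hand works, but with this many exports it is easy to miss one. As a small sketch, the check can be wrapped in a function that reports every variable that is unset or empty. The name check_env is invented here, and the variable names are passed in as arguments.

```shell
# Hypothetical helper: print each named variable that is unset or empty
# and return non-zero if at least one is missing.
check_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup: read the value of the variable called $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_env HADOOP_HOME HADOOP_CONF_DIR YARN_CONF_DIR HIVE_HOME JAVA_HOME
```

No output and a zero exit status mean every listed variable has a value in the current shell.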
Yay! Now we can start configuring Hadoop. All changes must be made as hduser.
All the configuration files Hadoop needs are in /opt/app/hadoop/etc/hadoop:
SuperNinja1:/ # su - hduser
hduser@SuperNinja1:~> cd /opt/app/hadoop/etc/hadoop/
hduser@SuperNinja1:/opt/app/hadoop/etc/hadoop> ls -ltr
total 132
-rw-r--r-- 1 hduser hadoop 2268 Mar 31 10:49 ssl-server.xml.example
-rw-r--r-- 1 hduser hadoop 2316 Mar 31 10:49 ssl-client.xml.example
-rw-r--r-- 1 hduser hadoop 11169 Mar 31 10:49 log4j.properties
-rw-r--r-- 1 hduser hadoop 9257 Mar 31 10:49 hadoop-policy.xml
-rw-r--r-- 1 hduser hadoop 2490 Mar 31 10:49 hadoop-metrics.properties
-rw-r--r-- 1 hduser hadoop 3589 Mar 31 10:49 hadoop-env.cmd
-rw-r--r-- 1 hduser hadoop 2178 Mar 31 10:49 yarn-env.cmd
-rw-r--r-- 1 hduser hadoop 4113 Mar 31 10:49 mapred-queues.xml.template
-rw-r--r-- 1 hduser hadoop 1383 Mar 31 10:49 mapred-env.sh
-rw-r--r-- 1 hduser hadoop 918 Mar 31 10:49 mapred-env.cmd
-rw-r--r-- 1 hduser hadoop 620 Mar 31 10:49 httpfs-site.xml
-rw-r--r-- 1 hduser hadoop 21 Mar 31 10:49 httpfs-signature.secret
-rw-r--r-- 1 hduser hadoop 1657 Mar 31 10:49 httpfs-log4j.properties
-rw-r--r-- 1 hduser hadoop 1449 Mar 31 10:49 httpfs-env.sh
-rw-r--r-- 1 hduser hadoop 1774 Mar 31 10:49 hadoop-metrics2.properties
-rw-r--r-- 1 hduser hadoop 318 Mar 31 10:49 container-executor.cfg
-rw-r--r-- 1 hduser hadoop 1335 Mar 31 10:49 configuration.xsl
-rw-r--r-- 1 hduser hadoop 3589 Mar 31 10:49 capacity-scheduler.xml
-rw-r--r-- 1 hduser hadoop 206 May 15 12:28 mapred-site.xml
-rw-r--r-- 1 hduser hadoop 3512 May 15 12:54 hadoop-env.sh
-rw-r--r-- 1 hduser hadoop 4878 May 16 11:06 yarn-env.sh
-rw-r--r-- 1 hduser hadoop 679 May 16 11:27 yarn-site.xml
-rw-r--r-- 1 hduser hadoop 655 May 22 14:40 derby.log
drwxr-xr-x 5 hduser hadoop 4096 May 22 14:40 metastore_db
-rw-r--r-- 1 hduser hadoop 334 May 26 07:42 core-site.xml
-rw-r--r-- 1 hduser hadoop 60 May 28 11:58 slaves
-rw-r--r-- 1 hduser hadoop 510 May 29 11:14 hdfs-site.xml
hduser@SuperNinja1:/opt/app/hadoop/etc/hadoop>
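In the listing above, the files with May timestamps are the ones that were edited for this cluster. As a quick sanity check before going further, a sketch like the following reports which of those files are missing from a configuration directory. The function name check_conf_files is hypothetical, and the file list is taken from the listing rather than from any official requirement.

```shell
# Hypothetical helper: report which of the edited Hadoop config files
# (per the directory listing) are missing from a configuration directory.
check_conf_files() {
    conf_dir=$1
    status=0
    for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml \
             hadoop-env.sh yarn-env.sh slaves; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_conf_files /opt/app/hadoop/etc/hadoop
```

Running the same check on every node helps catch a host where a config file was never copied over.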