Big Data - Set up Hadoop, HDFS, HBase, Hive - Machine setup - Part 1

My, what an experience. If you have never done this before, be afraid, be very afraid. Just joking: once you have the basics down, it is fairly simple.

I have to support my lavish lifestyle, so if this helped you in any way and you have some spare BitCoins, you may donate them to me - 16tb2Rgn4uDptrEuR94BkhQAZNgfoMj3ug

 

My setup

3 x Supermicro Servers 813MTQ-600CB
  • Intel Xeon E5-2609v2 CPUs
  • 64GB Memory
  • 4 x 1TB 7200rpm SATA drives
  • 4 Port SATA raid controller
2 x HP desktop machines (simple machines)
  • 16GB RAM
  • 2 x 1TB drives, nothing fancy
All the machines were connected via eth0 to a small switch. For heartbeat signals I used eth1, connected to a second small switch. I wanted to test load balancing for Apache using Pacemaker and Corosync, and that is the reason for the heartbeat NICs.

Keep this in mind

Host 1 - SuperNinja1 - 172.28.200.161 SuperMicro (the master, running the NameNode; it also has HBase and Hive installed)
Host 2 - SuperNinja2 - 172.28.200.163 SuperMicro (DataNode and a NodeManager)
Host 3 - SuperNinja3 - 172.28.200.165 SuperMicro (DataNode and a NodeManager; this node also runs the Postgres instance for Hive)
Host 4 - SuperNinja4 - 172.28.200.150 HP Desktop (DataNode and a NodeManager)
Host 5 - SuperNinja5 - 172.28.200.153 HP Desktop (DataNode and a NodeManager)

(Screenshot: the setup as seen from my Zabbix server)


So let's get cracking - Set up the machines

I use SLES 11 SP3. Once the OS is installed, change the network interface setup as follows. I assigned each eth0 as the normal TCP/IP interface and each eth1 as the heartbeat interface with an internal IP. You can skip the heartbeat interfaces, as they were only for my Pacemaker and Corosync testing. Check the blog for a setup guide on Pacemaker and Corosync.
The two files in question are ifcfg-eth0 and ifcfg-eth1:
SuperNinja1:~ # cat /etc/sysconfig/network/ifcfg-eth0
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='172.28.200.161'
NETMASK='255.255.255.0'
GATEWAY='172.28.200.1'
NM_CONTROLLED='no'
SuperNinja1:~ # cat /etc/sysconfig/network/ifcfg-eth1
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='172.16.0.5/24'
NM_CONTROLLED='no'
SuperNinja1:~ #

Set the two IP addresses in the two files. On this host the normal TCP/IP address is 172.28.200.161 and the heartbeat IP is 172.16.0.5. Set up all your hosts the same way (with different IP addresses, of course).
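If you have several hosts to configure, typing the same file over and over gets old. Here is a minimal sketch that prints an ifcfg file body in the same layout as the two files above, so you can review it before copying it into /etc/sysconfig/network/. The function name make_ifcfg is made up for this example, and the heartbeat address 172.16.0.6 for SuperNinja2 is an assumed value, not taken from my setup.

```shell
#!/bin/bash
# make_ifcfg is a hypothetical helper, not part of SLES.
# It prints an ifcfg file body for a static interface to stdout.
#   $1 = IP address (plain, or with a /prefix such as 172.16.0.6/24)
#   $2 = netmask (optional; leave empty when $1 carries a /prefix)
#   $3 = gateway (optional; mirrors the GATEWAY line in ifcfg-eth0 above)
make_ifcfg() {
    local ipaddr="$1" netmask="$2" gateway="$3"
    printf "STARTMODE='auto'\n"
    printf "BOOTPROTO='static'\n"
    printf "IPADDR='%s'\n" "$ipaddr"
    if [ -n "$netmask" ]; then
        printf "NETMASK='%s'\n" "$netmask"
    fi
    if [ -n "$gateway" ]; then
        printf "GATEWAY='%s'\n" "$gateway"
    fi
    printf "NM_CONTROLLED='no'\n"
}

# Preview the two files for SuperNinja2.
# 172.16.0.6 is an assumed heartbeat address, pick your own.
echo "--- ifcfg-eth0 ---"
make_ifcfg 172.28.200.163 255.255.255.0 172.28.200.1
echo "--- ifcfg-eth1 ---"
make_ifcfg 172.16.0.6/24
```

Nothing is written to disk here; redirect the output to a file yourself once it looks right.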

Restart networking for the changes to take effect
SuperNinja1:~ # service network restart
Shutting down network interfaces:
    eth0      device: Intel Corporation I350 Gigabit Network Connec                                                                                                        done
    eth1      device: Intel Corporation I350 Gigabit Network Connec                                                                                                        done
Shutting down service network  .  .  .  .  .  .  .  .  .                                                                                                                   done
Hint: you may set mandatory devices in /etc/sysconfig/network/config
Setting up network interfaces:
    eth0      device: Intel Corporation I350 Gigabit Network Connec
    eth0      IP address: 172.28.200.161/24                                                                                                                                done
    eth1      device: Intel Corporation I350 Gigabit Network Connec
    eth1      IP address: 172.16.0.5/24                                                                                                                                    done
Setting up service network  .  .  .  .  .  .  .  .  .  .                                                                                                                   done
SuperNinja1:~ # ifconfig -a
eth0      Link encap:Ethernet  HWaddr 0C:C4:7A:03:70:18  
          inet addr:172.28.200.161  Bcast:172.28.200.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:65826308 errors:0 dropped:18078140 overruns:0 frame:0
          TX packets:258625520 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:13830916872 (13190.1 Mb)  TX bytes:373734605729 (356421.0 Mb)
          Memory:fb920000-fb940000 

eth1      Link encap:Ethernet  HWaddr 0C:C4:7A:03:70:19  
          inet addr:172.16.0.5  Bcast:172.16.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:fb900000-fb920000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2515035 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2515035 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:621406221 (592.6 Mb)  TX bytes:621406221 (592.6 Mb)

SuperNinja1:~ #

If all is set up correctly, you should be able to ping all hosts.
SuperNinja1:~ # ping -c 4 SuperNinja2
PING SuperNinja2.xxxx.com (172.28.200.163) 56(84) bytes of data.
64 bytes from SuperNinja2.xxxx.com (172.28.200.163): icmp_seq=1 ttl=64 time=0.141 ms
64 bytes from SuperNinja2.xxxx.com (172.28.200.163): icmp_seq=2 ttl=64 time=0.146 ms
64 bytes from SuperNinja2.xxxx.com (172.28.200.163): icmp_seq=3 ttl=64 time=0.161 ms
64 bytes from SuperNinja2.xxxx.com (172.28.200.163): icmp_seq=4 ttl=64 time=0.177 ms

--- SuperNinja2.xxxx.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.141/0.156/0.177/0.016 ms
SuperNinja1:~ # ping -c 4 SuperNinja3
PING SuperNinja3.xxxx.com (172.28.200.165) 56(84) bytes of data.
64 bytes from SuperNinja3.xxxx.com (172.28.200.165): icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from SuperNinja3.xxxx.com (172.28.200.165): icmp_seq=2 ttl=64 time=0.214 ms
64 bytes from SuperNinja3.xxxx.com (172.28.200.165): icmp_seq=3 ttl=64 time=0.181 ms
64 bytes from SuperNinja3.xxxx.com (172.28.200.165): icmp_seq=4 ttl=64 time=0.173 ms

--- SuperNinja3.xxxx.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2998ms
rtt min/avg/max/mdev = 0.113/0.170/0.214/0.037 ms
SuperNinja1:~ # ping -c 4 SuperNinja4
PING SuperNinja4.xxxx.com (172.28.200.150) 56(84) bytes of data.
64 bytes from SuperNinja4.xxxx.com (172.28.200.150): icmp_seq=1 ttl=64 time=3.18 ms
64 bytes from SuperNinja4.xxxx.com (172.28.200.150): icmp_seq=2 ttl=64 time=0.169 ms
64 bytes from SuperNinja4.xxxx.com (172.28.200.150): icmp_seq=3 ttl=64 time=0.202 ms
64 bytes from SuperNinja4.xxxx.com (172.28.200.150): icmp_seq=4 ttl=64 time=0.147 ms

--- SuperNinja4.xxxx.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.147/0.925/3.185/1.305 ms
SuperNinja1:~ # ping -c 4 SuperNinja5
PING SuperNinja5.xxxx.com (172.28.200.153) 56(84) bytes of data.
64 bytes from SuperNinja5.xxxx.com (172.28.200.153): icmp_seq=1 ttl=64 time=5.96 ms
64 bytes from SuperNinja5.xxxx.com (172.28.200.153): icmp_seq=2 ttl=64 time=0.224 ms
64 bytes from SuperNinja5.xxxx.com (172.28.200.153): icmp_seq=3 ttl=64 time=0.152 ms
64 bytes from SuperNinja5.xxxx.com (172.28.200.153): icmp_seq=4 ttl=64 time=0.150 ms

--- SuperNinja5.xxxx.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.150/1.623/5.966/2.507 ms
SuperNinja1:~ #
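The same check can be wrapped in a small loop instead of pinging each host by hand. This is only a sketch: check_hosts is a made-up function name, and it assumes the iputils ping that ships with Linux (the -c and -W options).

```shell
#!/bin/bash
# check_hosts is a hypothetical helper: ping every host given as an
# argument and print OK or FAIL per host.
# Returns non-zero if at least one host did not answer.
check_hosts() {
    local failed=0 host
    for host in "$@"; do
        # -c 2: send two probes; -W 2: wait at most two seconds per reply
        if ping -c 2 -W 2 "$host" >/dev/null 2>&1; then
            echo "OK    $host"
        else
            echo "FAIL  $host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (run on any node):
#   check_hosts SuperNinja1 SuperNinja2 SuperNinja3 SuperNinja4 SuperNinja5
```

Run it from every node, not just the master, since each node has to be able to resolve and reach all the others.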


Add all your hosts to the /etc/hosts file on every node
SuperNinja1:~ # cat /etc/hosts
#
# hosts         This file describes a number of hostname-to-address
#               mappings for the TCP/IP subsystem.  It is mostly
#               used at boot time, when no name servers are running.
#               On small systems, this file can be used instead of a
#               "named" name server.
# Syntax:
#    
# IP-Address  Full-Qualified-Hostname  Short-Hostname
#


# special IPv6 addresses
::1             localhost ipv6-localhost ipv6-loopback

fe00::0         ipv6-localnet

ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
172.28.200.161 SuperNinja1.xxxx.com SuperNinja1
127.0.0.1 localhost.localdomain localhost
172.28.200.163 SuperNinja2.xxxx.com SuperNinja2
172.28.200.165 SuperNinja3.xxxx.com SuperNinja3
172.28.200.150 SuperNinja4.xxxx.com SuperNinja4
172.28.200.153 SuperNinja5.xxxx.com SuperNinja5
SuperNinja1:~ #
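Editing five hosts files by hand is an easy place to make a typo. Below is a minimal sketch that appends an entry only when that IP address is not in the file yet, so it is safe to run more than once. add_host_entry is a made-up name, and xxxx.com is the same placeholder domain used in the listing above.

```shell
#!/bin/bash
# add_host_entry is a hypothetical helper: append "IP FQDN SHORTNAME" to a
# hosts file unless a line whose first field is that IP already exists.
#   $1 = hosts file, $2 = IP, $3 = fully qualified name, $4 = short name
add_host_entry() {
    local file="$1" ip="$2" fqdn="$3" short="$4"
    # Compare the first field exactly, so 72.28.200.163 does not
    # count as a match for 172.28.200.163.
    if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
        echo "skip  $ip (already in $file)"
    else
        echo "$ip $fqdn $short" >> "$file"
        echo "add   $ip $short"
    fi
}

# Usage (as root, on every node):
#   add_host_entry /etc/hosts 172.28.200.163 SuperNinja2.xxxx.com SuperNinja2
#   add_host_entry /etc/hosts 172.28.200.165 SuperNinja3.xxxx.com SuperNinja3
```

Try it against a copy of /etc/hosts first if you want to see what it does before touching the real file.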

HDFS works better if there is no LV setup, so what I did was use fdisk to create one partition on each data disk and mount it as /hdfsfilesystemX (X being 1, 2, 3). In the case of SuperNinja1, I also created an LV mounted at /data; this is where the raw files to be processed are loaded. See my blog post on how to create PVs, VGs and LVs: http://kingratlinux.blogspot.com/2014/06/create-physical-volumes-pv-volume.html

First, see how many disks are in the machine
SuperNinja1:~ # hwinfo --disk --short
disk:                                                           
  /dev/sdd             SMC2108
  /dev/sda             SMC2108
  /dev/sdc             SMC2108
  /dev/sdb             SMC2108
SuperNinja1:~ #

Use fdisk to create the partitions. On the first disk, /dev/sda1 and /dev/sda2 are used for the OS; I then created /dev/sda3 for the /data slice
SuperNinja:~ #fdisk -l /dev/sda

Disk /dev/sda: 999.0 GB, 998999326720 bytes
255 heads, 63 sectors/track, 121454 cylinders, total 1951170560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00079fbc

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1060863      529408   83  Linux
/dev/sda2         1060864   147861503    73400320   8e  Linux LVM
/dev/sda3       147861504  1951170559   901654528   8e  Linux LVM
SuperNinja:~ #

For /dev/sdb, /dev/sdc and /dev/sdd, no LVs are created, just a raw partition that is mounted. Note that the type is set to Linux, not Linux LVM as on /dev/sda3
SuperNinja1:~ # fdisk -l /dev/sdb

Disk /dev/sdb: 999.0 GB, 998999326720 bytes
192 heads, 17 sectors/track, 597785 cylinders, total 1951170560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x25303156

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb3            2048  1951170559   975584256   83  Linux
SuperNinja1:~ # fdisk -l /dev/sdc

Disk /dev/sdc: 999.0 GB, 998999326720 bytes
192 heads, 17 sectors/track, 597785 cylinders, total 1951170560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x43492116

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc3            2048  1951170559   975584256   83  Linux
SuperNinja1:~ # fdisk -l /dev/sdd

Disk /dev/sdd: 999.0 GB, 998999326720 bytes
192 heads, 17 sectors/track, 597785 cylinders, total 1951170560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x94721d15

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd3            2048  1951170559   975584256   83  Linux
SuperNinja1:~ # df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/system-root     2.0G  368M  1.6G  20% /
udev                         32G  200K   32G   1% /dev
tmpfs                        32G     0   32G   0% /dev/shm
/dev/sda1                   509M  132M  352M  28% /boot
/dev/mapper/system-home    1008M   40M  918M   5% /home
/dev/mapper/system-opt      9.9G  1.5G  8.0G  16% /opt
/dev/mapper/system-srv      6.9G  4.7G  1.9G  72% /srv
/dev/mapper/system-tmp      5.0G  144M  4.6G   3% /tmp
/dev/mapper/system-usr      6.0G  4.0G  1.7G  71% /usr
/dev/mapper/system-var      4.0G  377M  3.4G  10% /var
/dev/sdb3                   916G  152G  718G  18% /hdfsfilesystem1
/dev/sdc3                   916G  150G  720G  18% /hdfsfilesystem2
/dev/sdd3                   916G  152G  718G  18% /hdfsfilesystem3
/dev/mapper/datadisk-part0  493G  227G  241G  49% /data
SuperNinja1:~ #
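The listing above shows the partitions already formatted and mounted. For completeness, here is a sketch of the steps in between. It only prints the commands, so nothing gets formatted by accident. plan_hdfs_mounts is a made-up name, and ext3 is an assumption; use whatever filesystem you actually want on the data disks.

```shell
#!/bin/bash
# plan_hdfs_mounts is a hypothetical helper. It only PRINTS the commands,
# so you can review them before running anything destructive such as mkfs.
#   $1 = filesystem type (ext3 here is an assumption)
#   remaining arguments = partitions, mounted as /hdfsfilesystem1, 2, 3, ...
plan_hdfs_mounts() {
    local fstype="$1" n=1 dev
    shift
    for dev in "$@"; do
        echo "mkfs -t $fstype $dev"
        echo "mkdir -p /hdfsfilesystem$n"
        echo "mount -t $fstype $dev /hdfsfilesystem$n"
        # fstab line so the mount comes back after a reboot
        echo "echo '$dev /hdfsfilesystem$n $fstype defaults 0 2' >> /etc/fstab"
        n=$((n + 1))
    done
}

plan_hdfs_mounts ext3 /dev/sdb3 /dev/sdc3 /dev/sdd3
```

Read the printed commands carefully, double check the device names against the fdisk output, and only then run them as root.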

Disable IPv6 if you are not using it
Add the following lines to /etc/sysctl.conf
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
The file should look like the one below. Reboot the server after the changes.

SuperNinja1:/home # cat /etc/sysctl.conf
# Disable response to broadcasts.
# You don't want yourself becoming a Smurf amplifier.
net.ipv4.icmp_echo_ignore_broadcasts = 1
# enable route verification on all interfaces
net.ipv4.conf.all.rp_filter = 1
# enable ipV6 forwarding
#net.ipv6.conf.all.forwarding = 1
# increase the number of possible inotify(7) watches
fs.inotify.max_user_watches = 65536
# avoid deleting secondary IPs on deleting the primary IP
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
SuperNinja1:/home # cat /proc/sys/net/ipv6/conf/all/disable_ipv6
0
SuperNinja1:/home # reboot
Broadcast message from root (pts/0) (Wed May 28 10:27:23 2014):
The system is going down for reboot NOW!
SuperNinja1:/home #

Once the machine is up, check whether IPv6 has been disabled; the value should be 1

Xshell:\> ssh root@172.28.200.161
Connecting to 172.28.200.161:22...
Connection established.
Escape character is '^@]'.
WARNING! The remote SSH server rejected X11 forwarding request.
Last login: Wed May 28 09:01:25 2014 from kingrat
SuperNinja1:~ # cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1
SuperNinja1:~ #
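If you want to run this check on every node without eyeballing a 0 or a 1, a small wrapper helps. This is a sketch only: ipv6_status is a made-up name, and the optional argument is there just so the function can be pointed at a different file. As an aside, sysctl -p reloads /etc/sysctl.conf, so it may save you the reboot.

```shell
#!/bin/bash
# ipv6_status is a hypothetical helper: report whether the kernel flag
# checked above is set to 1.
#   $1 = flag file (optional, defaults to the /proc path used above)
# Returns 0 when IPv6 is disabled, 1 when it is not, 2 when unreadable.
ipv6_status() {
    local flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ ! -r "$flag_file" ]; then
        echo "unknown (cannot read $flag_file)"
        return 2
    fi
    if [ "$(cat "$flag_file")" = "1" ]; then
        echo "IPv6 disabled"
    else
        echo "IPv6 still enabled"
        return 1
    fi
}

# Usage (as root), applying /etc/sysctl.conf first and then checking:
#   sysctl -p && ipv6_status
```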


