Set up Hadoop, HDFS, HBase, Hive
My, what an experience! If you have never done this before, be afraid, very afraid. Just joking: once you have the basics, it is fairly simple. I have to support my lavish lifestyle, so if this helped you in any way and you have some spare BitCoins, you may donate them to me - 16tb2Rgn4uDptrEuR94BkhQAZNgfoMj3ug
My setup
3 x Supermicro Servers 813MTQ-600CB
- Intel Xeon E5-2609v2 CPUs
- 64GB Memory
- 4 x 1TB 7200rpm SATA drives
- 4 Port SATA raid controller
2 x HP Desktops
- 16GB RAM
- 2 x 1TB drives, nothing fancy
Keep this in mind
Host 1 - SuperNinja1 - 172.28.200.161 SuperMicro (The Master and running the NameNode. It also has HBase and Hive installed)
Host 2 - SuperNinja2 - 172.28.200.163 SuperMicro (DataNode and a NodeManager)
Host 3 - SuperNinja3 - 172.28.200.165 SuperMicro (DataNode and a NodeManager, This node runs the Postgres instance for Hive)
Host 4 - SuperNinja4 - 172.28.200.150 HP Desktop (DataNode and a NodeManager)
Host 5 - SuperNinja5 - 172.28.200.153 HP Desktop (DataNode and a NodeManager)
Setup from my Zabbix server
So let's get cracking - Set up the machines
I use SLES 11 SP3. Once the OS is installed, change the ETH setups as follows: I assigned the ETH0s as the normal TCP/IP interfaces and the ETH1s as the heartbeats on internal IPs. You can skip the heartbeats, as these were for my PaceMaker and CoroSync testing. Check the blog for a setup guide on PaceMaker and CoroSync.
The two files in question are ifcfg-eth0 and ifcfg-eth1
SuperNinja1:~ # cat /etc/sysconfig/network/ifcfg-eth0
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='172.28.200.161'
NETMASK='255.255.255.0'
GATEWAY='172.28.200.1'
NM_CONTROLLED='no'
SuperNinja1:~ # cat /etc/sysconfig/network/ifcfg-eth1
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='172.16.0.5/24'
NM_CONTROLLED='no'
SuperNinja1:~ #
Set the two IP addresses in the two files. For this host, the normal TCP/IP address is 172.28.200.161 and the heartbeat IP is 172.16.0.5. Set up all your hosts the same way (with different IP addresses, of course).
Restart networking for the changes to take effect
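With five hosts to configure, it can help to generate the two files instead of typing them on every machine. The following is only a minimal sketch: it writes the same keys shown in the listings above into a scratch directory, so nothing under /etc is touched. The function name write_ifcfg and the scratch directory are made up for illustration; check the output and copy it into /etc/sysconfig/network yourself.

```shell
#!/bin/sh
# Sketch: render ifcfg files like the ones shown above into a scratch directory.
# write_ifcfg and the scratch directory are hypothetical, not part of SLES.

write_ifcfg() {
    # $1 = output file, $2 = IP address (optionally with /prefix),
    # $3 = netmask (may be empty), $4 = gateway (may be empty)
    out=$1; ip=$2; netmask=$3; gateway=$4
    {
        echo "STARTMODE='auto'"
        echo "BOOTPROTO='static'"
        echo "IPADDR='$ip'"
        # NETMASK and GATEWAY are only written when given, as in the eth1 file above.
        if [ -n "$netmask" ]; then echo "NETMASK='$netmask'"; fi
        if [ -n "$gateway" ]; then echo "GATEWAY='$gateway'"; fi
        echo "NM_CONTROLLED='no'"
    } > "$out"
}

outdir=$(mktemp -d)
write_ifcfg "$outdir/ifcfg-eth0" 172.28.200.161 255.255.255.0 172.28.200.1
write_ifcfg "$outdir/ifcfg-eth1" 172.16.0.5/24 "" ""
echo "Files written to $outdir"
```

Change the addresses per host; the values used here are the ones for SuperNinja1.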
SuperNinja1:~ # service network restart
Shutting down network interfaces:
    eth0      device: Intel Corporation I350 Gigabit Network Connec   done
    eth1      device: Intel Corporation I350 Gigabit Network Connec   done
Shutting down service network . . . . . . . . .   done
Hint: you may set mandatory devices in /etc/sysconfig/network/config
Setting up network interfaces:
    eth0      device: Intel Corporation I350 Gigabit Network Connec
    eth0      IP address: 172.28.200.161/24   done
    eth1      device: Intel Corporation I350 Gigabit Network Connec
    eth1      IP address: 172.16.0.5/24   done
Setting up service network . . . . . . . . . .   done
SuperNinja1:~ # ifconfig -a
eth0      Link encap:Ethernet  HWaddr 0C:C4:7A:03:70:18
          inet addr:172.28.200.161  Bcast:172.28.200.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:65826308 errors:0 dropped:18078140 overruns:0 frame:0
          TX packets:258625520 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13830916872 (13190.1 Mb)  TX bytes:373734605729 (356421.0 Mb)
          Memory:fb920000-fb940000

eth1      Link encap:Ethernet  HWaddr 0C:C4:7A:03:70:19
          inet addr:172.16.0.5  Bcast:172.16.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Memory:fb900000-fb920000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2515035 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2515035 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:621406221 (592.6 Mb)  TX bytes:621406221 (592.6 Mb)

SuperNinja1:~ #
If all is set up correctly, you should be able to ping all hosts.
SuperNinja1:~ # ping -c 4 SuperNinja2
PING SuperNinja2.xxxx.com (172.28.200.163) 56(84) bytes of data.
64 bytes from SuperNinja2.xxxx.com (172.28.200.163): icmp_seq=1 ttl=64 time=0.141 ms
64 bytes from SuperNinja2.xxxx.com (172.28.200.163): icmp_seq=2 ttl=64 time=0.146 ms
64 bytes from SuperNinja2.xxxx.com (172.28.200.163): icmp_seq=3 ttl=64 time=0.161 ms
64 bytes from SuperNinja2.xxxx.com (172.28.200.163): icmp_seq=4 ttl=64 time=0.177 ms

--- SuperNinja2.xxxx.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.141/0.156/0.177/0.016 ms
SuperNinja1:~ # ping -c 4 SuperNinja3
PING SuperNinja3.xxxx.com (172.28.200.165) 56(84) bytes of data.
64 bytes from SuperNinja3.xxxx.com (172.28.200.165): icmp_seq=1 ttl=64 time=0.113 ms
64 bytes from SuperNinja3.xxxx.com (172.28.200.165): icmp_seq=2 ttl=64 time=0.214 ms
64 bytes from SuperNinja3.xxxx.com (172.28.200.165): icmp_seq=3 ttl=64 time=0.181 ms
64 bytes from SuperNinja3.xxxx.com (172.28.200.165): icmp_seq=4 ttl=64 time=0.173 ms

--- SuperNinja3.xxxx.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2998ms
rtt min/avg/max/mdev = 0.113/0.170/0.214/0.037 ms
SuperNinja1:~ # ping -c 4 SuperNinja4
PING SuperNinja4.xxxx.com (172.28.200.150) 56(84) bytes of data.
64 bytes from SuperNinja4.xxxx.com (172.28.200.150): icmp_seq=1 ttl=64 time=3.18 ms
64 bytes from SuperNinja4.xxxx.com (172.28.200.150): icmp_seq=2 ttl=64 time=0.169 ms
64 bytes from SuperNinja4.xxxx.com (172.28.200.150): icmp_seq=3 ttl=64 time=0.202 ms
64 bytes from SuperNinja4.xxxx.com (172.28.200.150): icmp_seq=4 ttl=64 time=0.147 ms

--- SuperNinja4.xxxx.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.147/0.925/3.185/1.305 ms
SuperNinja1:~ # ping -c 4 SuperNinja5
PING SuperNinja5.xxxx.com (172.28.200.153) 56(84) bytes of data.
64 bytes from SuperNinja5.xxxx.com (172.28.200.153): icmp_seq=1 ttl=64 time=5.96 ms
64 bytes from SuperNinja5.xxxx.com (172.28.200.153): icmp_seq=2 ttl=64 time=0.224 ms
64 bytes from SuperNinja5.xxxx.com (172.28.200.153): icmp_seq=3 ttl=64 time=0.152 ms
64 bytes from SuperNinja5.xxxx.com (172.28.200.153): icmp_seq=4 ttl=64 time=0.150 ms

--- SuperNinja5.xxxx.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.150/1.623/5.966/2.507 ms
SuperNinja1:~ #
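Pinging every node by hand gets tedious, so here is a rough sketch of a loop that does the same check in one go. check_hosts and the PING_CMD variable are names invented for this example; the default command sends a single packet with a 2 second timeout using the standard Linux ping options -c and -W.

```shell
#!/bin/sh
# Sketch: ping every node once and report which ones do not answer.
# check_hosts and PING_CMD are hypothetical names used for illustration.

check_hosts() {
    failed=0
    for host in "$@"; do
        # PING_CMD can be overridden; the default sends one packet and
        # waits at most 2 seconds for the reply.
        if ${PING_CMD:-ping -c 1 -W 2} "$host" > /dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    # Non-zero exit status if at least one host did not answer.
    return $failed
}

# Usage on the cluster described above:
# check_hosts SuperNinja1 SuperNinja2 SuperNinja3 SuperNinja4 SuperNinja5
```

The function returns a non-zero status when any host fails, so it can be used as a gate in a larger setup script.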
Add all your hosts to the /etc/hosts file on every node
SuperNinja1:~ # cat /etc/hosts
#
# hosts         This file describes a number of hostname-to-address
#               mappings for the TCP/IP subsystem. It is mostly
#               used at boot time, when no name servers are running.
#               On small systems, this file can be used instead of a
#               "named" name server.
# Syntax:
#
# IP-Address  Full-Qualified-Hostname  Short-Hostname
#
# special IPv6 addresses
::1             localhost ipv6-localhost ipv6-loopback
fe00::0         ipv6-localnet
ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
172.28.200.161  SuperNinja1.xxxx.com SuperNinja1
127.0.0.1       localhost.localdomain localhost
172.28.200.163  SuperNinja2.xxxx.com SuperNinja2
172.28.200.165  SuperNinja3.xxxx.com SuperNinja3
172.28.200.150  SuperNinja4.xxxx.com SuperNinja4
172.28.200.153  SuperNinja5.xxxx.com SuperNinja5
SuperNinja1:~ #
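Since the same entries have to exist on every node, a quick check for forgotten names can save some head scratching later. This is just a sketch; missing_hosts is a helper name made up for this example. It skips comment lines and looks for each name in the hostname columns of the file.

```shell
#!/bin/sh
# Sketch: list the node names that have no entry in a hosts file.
# missing_hosts is a hypothetical helper name.

missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Ignore comment lines; compare the name with every field after the IP address.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file"; then
            echo "$name"
        fi
    done
}

# Usage: prints nothing when every node is present.
# missing_hosts /etc/hosts SuperNinja1 SuperNinja2 SuperNinja3 SuperNinja4 SuperNinja5
```

Run it on each node; any name it prints still needs a line in that node's /etc/hosts.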
HDFS works better if there is no LV setup, so I used fdisk to create a plain partition on each data disk and mounted it as /hdfsfilesystemx. In the case of SuperNinja1, I also created an LV mounted at /data; this is where the RAW files to be processed will be loaded. See my blog post on how to create PVs, VGs and LVs: http://kingratlinux.blogspot.com/2014/06/create-physical-volumes-pv-volume.html
First, see how many disks are in the machine
SuperNinja1:~ # hwinfo --disk --short
disk:
  /dev/sdd             SMC2108
  /dev/sda             SMC2108
  /dev/sdc             SMC2108
  /dev/sdb             SMC2108
SuperNinja1:~ #
Use fdisk to create the partitions. On the 1st disk, /dev/sda1 and /dev/sda2 are used for the OS; I then created /dev/sda3 for the /data slice
SuperNinja:~ # fdisk -l /dev/sda

Disk /dev/sda: 999.0 GB, 998999326720 bytes
255 heads, 63 sectors/track, 121454 cylinders, total 1951170560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00079fbc

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     1060863      529408   83  Linux
/dev/sda2         1060864   147861503    73400320   8e  Linux LVM
/dev/sda3       147861504  1951170559   901654528   8e  Linux LVM
SuperNinja:~ #
For /dev/sdb, /dev/sdc and /dev/sdd, no LVs are created, just a raw slice that is mounted. Note the type is set to Linux, not Linux LVM as on /dev/sda3
SuperNinja1:~ # fdisk -l /dev/sdb

Disk /dev/sdb: 999.0 GB, 998999326720 bytes
192 heads, 17 sectors/track, 597785 cylinders, total 1951170560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x25303156

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb3            2048  1951170559   975584256   83  Linux
SuperNinja1:~ # fdisk -l /dev/sdc

Disk /dev/sdc: 999.0 GB, 998999326720 bytes
192 heads, 17 sectors/track, 597785 cylinders, total 1951170560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x43492116

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc3            2048  1951170559   975584256   83  Linux
SuperNinja1:~ # fdisk -l /dev/sdd

Disk /dev/sdd: 999.0 GB, 998999326720 bytes
192 heads, 17 sectors/track, 597785 cylinders, total 1951170560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x94721d15

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd3            2048  1951170559   975584256   83  Linux
SuperNinja1:~ # df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/system-root     2.0G  368M  1.6G  20% /
udev                         32G  200K   32G   1% /dev
tmpfs                        32G     0   32G   0% /dev/shm
/dev/sda1                   509M  132M  352M  28% /boot
/dev/mapper/system-home    1008M   40M  918M   5% /home
/dev/mapper/system-opt      9.9G  1.5G  8.0G  16% /opt
/dev/mapper/system-srv      6.9G  4.7G  1.9G  72% /srv
/dev/mapper/system-tmp      5.0G  144M  4.6G   3% /tmp
/dev/mapper/system-usr      6.0G  4.0G  1.7G  71% /usr
/dev/mapper/system-var      4.0G  377M  3.4G  10% /var
/dev/sdb3                   916G  152G  718G  18% /hdfsfilesystem1
/dev/sdc3                   916G  150G  720G  18% /hdfsfilesystem2
/dev/sdd3                   916G  152G  718G  18% /hdfsfilesystem3
/dev/mapper/datadisk-part0  493G  227G  241G  49% /data
SuperNinja1:~ #
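To have the raw slices mounted again after a reboot, they need entries in /etc/fstab. The sketch below only prints candidate lines that match the device and mount point pairs in the df output above; it does not format or mount anything. Note the assumptions: the ext3 filesystem type and the plain "defaults" options are guesses, not something taken from this setup, and fstab_lines is a made-up helper name. Adjust both to whatever you actually formatted the partitions with.

```shell
#!/bin/sh
# Sketch: print /etc/fstab lines for the raw HDFS partitions shown in the df output above.
# ASSUMPTIONS: the ext3 filesystem type and the "defaults" mount options are not
# from the original setup; fstab_lines is a hypothetical helper name.

fstab_lines() {
    # $1 = filesystem type, remaining arguments = partitions in mount order
    fstype=$1
    shift
    n=1
    for dev in "$@"; do
        # Partition number n is mounted on /hdfsfilesystem<n>.
        echo "$dev /hdfsfilesystem$n $fstype defaults 0 2"
        n=$((n + 1))
    done
}

# Only prints the lines; review them before adding anything to /etc/fstab.
fstab_lines ext3 /dev/sdb3 /dev/sdc3 /dev/sdd3
```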
Disable IPv6 if you are not using it
Add the following lines to /etc/sysctl.conf
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
The file should look like the one below. Reboot the server after the changes.
SuperNinja1:/home # cat /etc/sysctl.conf
# Disable response to broadcasts.
# You don't want yourself becoming a Smurf amplifier.
net.ipv4.icmp_echo_ignore_broadcasts = 1
# enable route verification on all interfaces
net.ipv4.conf.all.rp_filter = 1
# enable ipV6 forwarding
#net.ipv6.conf.all.forwarding = 1
# increase the number of possible inotify(7) watches
fs.inotify.max_user_watches = 65536
# avoid deleting secondary IPs on deleting the primary IP
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
SuperNinja1:/home # cat /proc/sys/net/ipv6/conf/all/disable_ipv6
0
SuperNinja1:/home # reboot

Broadcast message from root (pts/0) (Wed May 28 10:27:23 2014):

The system is going down for reboot NOW!
SuperNinja1:/home #
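If you repeat this on all five nodes, it is easy to end up with the same lines appended twice. As a rough sketch, the helper below adds a "key = value" line only when the key is not already set in the file. ensure_sysctl_line is a name invented for this example, and the commented usage shows how it could be applied to the three IPv6 keys from above.

```shell
#!/bin/sh
# Sketch: add a "key = value" line to a sysctl-style file only if the key is not set yet.
# ensure_sysctl_line is a hypothetical helper name.

ensure_sysctl_line() {
    conf=$1; key=$2; value=$3
    # Split each line on "=", strip whitespace from the left-hand side and compare
    # it with the key. Commented-out lines start with "#" and therefore never match.
    if awk -F= -v k="$key" '
        { gsub(/[[:space:]]/, "", $1); if ($1 == k) found = 1 }
        END { exit !found }
    ' "$conf" 2> /dev/null; then
        return 0
    fi
    echo "$key = $value" >> "$conf"
}

# Usage (as root), for the three keys listed above:
# for key in net.ipv6.conf.all.disable_ipv6 \
#            net.ipv6.conf.default.disable_ipv6 \
#            net.ipv6.conf.lo.disable_ipv6; do
#     ensure_sysctl_line /etc/sysctl.conf "$key" 1
# done
```

The helper leaves an existing setting alone even if its value differs, so check the file by hand if a key was already present.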
Once the machine is back up, check that IPv6 has been disabled; the value should be 1
Xshell:\> ssh root@172.28.200.161

Connecting to 172.28.200.161:22...
Connection established.
Escape character is '^@]'.

WARNING! The remote SSH server rejected X11 forwarding request.
Last login: Wed May 28 09:01:25 2014 from kingrat
SuperNinja1:~ # cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1
SuperNinja1:~ #
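The check above only looks at the "all" switch. A small sketch that reads all three switches set in /etc/sysctl.conf could look like this; ipv6_disabled is a hypothetical helper name, and the optional argument (an alternative root instead of /proc) exists only so the function can be tried against a test directory.

```shell
#!/bin/sh
# Sketch: confirm that the three disable_ipv6 switches set earlier all read 1.
# ipv6_disabled is a hypothetical helper; the optional argument lets you point it
# at a different root than /proc.

ipv6_disabled() {
    root=${1:-/proc}
    for scope in all default lo; do
        f="$root/sys/net/ipv6/conf/$scope/disable_ipv6"
        # A missing file or any value other than 1 counts as "not confirmed".
        if [ ! -r "$f" ] || [ "$(cat "$f")" != "1" ]; then
            echo "IPv6 not confirmed disabled for: $scope"
            return 1
        fi
    done
    echo "IPv6 disabled"
}

# Usage on a node:
# ipv6_disabled
```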