A strange thing about Zabbix is that it has no built-in support for hardware errors. One could use IPMI, but what a schlep to set up. A good way to monitor disks in an HP Linux machine is to use a utility called
hpacucli
The one I use is hpacucli-9.0-24.0.noarch.rpm. Download the RPM and save it to /etc/zabbix/scripts.
Install the RPM
svr1:/etc/zabbix/scripts # ls -ltr *.rpm
-rw-r--r-- 1 root root 6504897 Mar 25 11:27 hpacucli-9.0-24.0.noarch.rpm
svr1:/etc/zabbix/scripts # rpm -ivh hpacucli-9.0-24.0.noarch.rpm
Preparing...                ########################################### [100%]
   1:hpacucli               ########################################### [100%]
svr1:/etc/zabbix/scripts #
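Before scripting anything, it is worth a quick sanity check that the utility can actually see the controller(s). A minimal check, assuming the binary landed in /usr/sbin as it does with this RPM:

/usr/sbin/hpacucli ctrl all show status

You should get one status line per Smart Array controller; if the command hangs or prints nothing, fix that before carrying on.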
vi a script called zx_raid_status.stage1.sh in /etc/zabbix/scripts. The script is below; just copy, paste, and save.
#!/bin/bash
# Script (run by root) to get raid status
# Changelog
# 0.3 HP GEN 8 2 x controllers - King Rat 20130405
# 0.2 Provide absolute path to hpacucli binary, and make logging clearer
# 0.1 Base version - 20120423

# Params
# Version
VER="0.3"

# Start a fresh log file, readable by the zabbix user
if [ -f /etc/zabbix/scripts/diskstatus.log ]; then
    rm /etc/zabbix/scripts/diskstatus.log
fi
touch /etc/zabbix/scripts/diskstatus.log
chown zabbix:zabbix /etc/zabbix/scripts/diskstatus.log

# The log file
LOGFILE="/etc/zabbix/scripts/diskstatus.log"
echo "Version "$VER > $LOGFILE
echo "Disk(s) last checked at "`date` >> $LOGFILE
echo `hostname -a` >> $LOGFILE

# The logical disk(s)
LDSTAT="/tmp/zx_ldstatus"
> ${LDSTAT}

# The physical disks
PDSTAT="/tmp/zx_pdstatus"
> ${PDSTAT}

# Our logger tag
TAG="zx_raidstatus"

# The app location
APP="/usr/sbin/hpacucli"

# Functions
nocont() {
    # How many controllers - write one slot number per line
    ${APP} ctrl all show config | grep -i "slot" | awk '{print $6}' > /etc/zabbix/scripts/cont.txt
    sort /etc/zabbix/scripts/cont.txt > /etc/zabbix/scripts/sort.log
}

out() {
    # Write stdin to syslog (and stderr) under our tag
    logger -s -t ${TAG}
}

runroot() {
    # This has to be run as root
    if [ `whoami` != 'root' ]
    then
        echo "This has to be run by root" | out
        exit
    fi
}

pdstatus() {
    while read line; do
        # This checks the status of all physical disks on each controller
        ${APP} ctrl slot=$line pd all show status | out
        ${APP} ctrl slot=$line pd all show status >> $LOGFILE
        ECNT=`${APP} ctrl slot=$line pd all show status | egrep -i "(fail|error|offline|rebuild|ignoring|degraded|skipping|nok)" | wc -l`
        if [ ${ECNT} -gt 0 ]
        then
            echo "${ECNT} non-OK statuses being reported (physical disk)" | out
            echo "${ECNT} non-OK statuses being reported (physical disk)" >> $LOGFILE
            echo ${ECNT} > ${PDSTAT}
        else
            echo 0 > ${PDSTAT}
            echo "Physical drives - all ok" >> $LOGFILE
        fi
    done < /etc/zabbix/scripts/sort.log
}

ldstatus() {
    while read line; do
        # This checks the status of all logical drives on each controller
        ${APP} ctrl slot=$line logicaldrive all show status | out
        ${APP} ctrl slot=$line logicaldrive all show status >> $LOGFILE
        ECNT=`${APP} ctrl slot=$line logicaldrive all show status | egrep -i "(fail|error|offline|rebuild|ignoring|degraded|skipping|nok)" | wc -l`
        if [ ${ECNT} -gt 0 ]
        then
            echo "${ECNT} non-OK statuses being reported (logical disk)" | out
            echo "${ECNT} non-OK statuses being reported (logical disk)" >> $LOGFILE
            echo ${ECNT} > ${LDSTAT}
        else
            echo 0 > ${LDSTAT}
            echo "Logical drives - all ok" >> $LOGFILE
        fi
    done < /etc/zabbix/scripts/sort.log
}

# Execute
echo "${VER} started"
runroot
nocont
ldstatus
pdstatus
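A note on how nocont() finds the controllers: hpacucli ctrl all show config prints a header line per controller along the lines of "Smart Array P410i in Slot 0 (Embedded)", so the slot number is the sixth whitespace-separated field, which is what the awk pulls out. That header wording is only an example and can vary by controller model and firmware, so check yours. A quick way to see the parsing in action:

echo "Smart Array P410i in Slot 0 (Embedded)" | awk '{print $6}'
# prints: 0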
vi a script called zx_raid_status.pdstat.sh in /etc/zabbix/scripts. The script is below; just copy, paste, and save.
#!/bin/sh
# This is the second stage run by zabbix to get the last physical disk error count
# Changelog
# 0.1 Base version

# Params
# Our version
VER="0.1"

# Our files to read
PDSTAT="/tmp/zx_pdstatus"

cat ${PDSTAT}
vi a script called zx_raid_status.ldstat.sh in /etc/zabbix/scripts. The script is below; just copy, paste, and save.
#!/bin/sh
# This is the second stage run by zabbix to get the last logical disk error count
# Changelog
# 0.1 Base version

# Params
# Our version
VER="0.1"

# Our files to read
LDSTAT="/tmp/zx_ldstatus"

cat ${LDSTAT}
You should have the following when done
svr1:/opt/temp # cd /etc/zabbix/scripts/
svr1:/etc/zabbix/scripts # ls -ltr
total 6492
-rw-r--r-- 1 root root    1503 Mar 25 11:25 zx_raid_status.stage1.sh
-rw-r--r-- 1 root root     242 Mar 25 11:25 zx_raid_status.pdstat.sh
-rw-r--r-- 1 root root     241 Mar 25 11:25 zx_raid_status.ldstat.sh
-rw-r--r-- 1 root root 6504897 Mar 25 11:27 hpacucli-9.0-24.0.noarch.rpm
svr1:/etc/zabbix/scripts #
Make the scripts executable with chmod +x and change the owner to the zabbix user.
svr1:/etc/zabbix/scripts # chmod +x zx*.sh
svr1:/etc/zabbix/scripts # chown zabbix:zabbix zx*.sh
svr1:/etc/zabbix/scripts # ls -ltr zx*.sh
-rwxr-xr-x 1 zabbix zabbix 1503 Mar 25 11:25 zx_raid_status.stage1.sh
-rwxr-xr-x 1 zabbix zabbix  242 Mar 25 11:25 zx_raid_status.pdstat.sh
-rwxr-xr-x 1 zabbix zabbix  241 Mar 25 11:25 zx_raid_status.ldstat.sh
svr1:/etc/zabbix/scripts #
Run the script manually to make sure that it works: zx_raid_status.stage1.sh
svr1:/etc/zabbix/scripts # /etc/zabbix/scripts/zx_raid_status.stage1.sh
0.2 started
zx_raidstatus:
zx_raidstatus: logicaldrive 1 (279.4 GB, RAID 1): OK
zx_raidstatus: logicaldrive 2 (1.1 TB, RAID 0): OK
zx_raidstatus: logicaldrive 3 (1.4 TB, RAID 1+0): Failed
zx_raidstatus:
zx_raidstatus: 4 non-OK statuses being reported (logical disk)
zx_raidstatus:
zx_raidstatus: physicaldrive 2C:1:1 (port 2C:box 1:bay 1, 300 GB): OK
zx_raidstatus: physicaldrive 2C:1:2 (port 2C:box 1:bay 2, 300 GB): OK
zx_raidstatus: physicaldrive 2C:1:3 (port 2C:box 1:bay 3, 300 GB): OK
zx_raidstatus: physicaldrive 2C:1:4 (port 2C:box 1:bay 4, 300 GB): OK
zx_raidstatus: physicaldrive 3C:1:5 (port 3C:box 1:bay 5, 300 GB): OK
zx_raidstatus: physicaldrive 3C:1:6 (port 3C:box 1:bay 6, 300 GB): OK
zx_raidstatus: physicaldrive 3C:1:7 (port 3C:box 1:bay 7, 300 GB): Failed
zx_raidstatus: physicaldrive 3C:1:8 (port 3C:box 1:bay 8, 300 GB): Failed
zx_raidstatus: physicaldrive 4C:2:1 (port 4C:box 2:bay 1, 300 GB): OK
zx_raidstatus: physicaldrive 4C:2:2 (port 4C:box 2:bay 2, 300 GB): OK
zx_raidstatus: physicaldrive 4C:2:3 (port 4C:box 2:bay 3, 300 GB): OK
zx_raidstatus: physicaldrive 4C:2:4 (port 4C:box 2:bay 4, 300 GB): Failed
zx_raidstatus: physicaldrive 5C:2:5 (port 5C:box 2:bay 5, 300 GB): OK
zx_raidstatus: physicaldrive 5C:2:6 (port 5C:box 2:bay 6, 300 GB): OK
zx_raidstatus: physicaldrive 5C:2:7 (port 5C:box 2:bay 7, 300 GB): OK
zx_raidstatus: physicaldrive 5C:2:8 (port 5C:box 2:bay 8, 300 GB): Failed
zx_raidstatus:
zx_raidstatus: 4 non-OK statuses being reported (physical disk)
svr1:/etc/zabbix/scripts #
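The stage-1 script leaves the error counters behind in /tmp; those two files are what Zabbix will actually read. A quick way to eyeball them (each file holds a single integer, 0 when everything is healthy):

cat /tmp/zx_ldstatus /tmp/zx_pdstatus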
Add the following line to the root crontab. It runs the script every 5 minutes; the script writes the counter files to /tmp, and those files contain the number of errors on the disks.
*/5 * * * * /etc/zabbix/scripts/zx_raid_status.stage1.sh > /dev/null 2>&1
svr1:/etc/zabbix/scripts # crontab -l
# DO NOT EDIT THIS FILE - edit the master and reinstall.
# (/tmp/crontab.XXXXIM2c9I installed on Mon Mar 25 11:37:16 2013)
# (Cron version V5.0 -- $Id: crontab.c,v 1.12 2004/01/23 18:56:42 vixie Exp $)
*/5 * * * * /etc/zabbix/scripts/zx_raid_status.stage1.sh > /dev/null 2>&1
svr1:/etc/zabbix/scripts #
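If you would rather not open an editor, one way to append the entry non-interactively (run as root; it simply tacks the line onto whatever crontab already exists) is:

(crontab -l 2>/dev/null; echo '*/5 * * * * /etc/zabbix/scripts/zx_raid_status.stage1.sh > /dev/null 2>&1') | crontab -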
Change the Zabbix config file
svr1:/etc/zabbix/scripts # vi /etc/zabbix/zabbix_agentd.conf
and add this to the bottom of the file
UserParameter=raid.lderror,/etc/zabbix/scripts/zx_raid_status.ldstat.sh
UserParameter=raid.pderror,/etc/zabbix/scripts/zx_raid_status.pdstat.sh
svr1:/etc/zabbix/scripts # tail /etc/zabbix/zabbix_agentd.conf
#UserParameter=mysql.qps,mysqladmin -uroot status|cut -f9 -d":"
#UserParameter=mysql.version,mysql -V
UserParameter=raid.lderror,/etc/zabbix/scripts/zx_raid_status.ldstat.sh
UserParameter=raid.pderror,/etc/zabbix/scripts/zx_raid_status.pdstat.sh
svr1:/etc/zabbix/scripts #
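You can dry-run the new keys before restarting: the agent's -t option evaluates a single item against the config file and exits (point -c at your config if the binary does not pick it up by default).

zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf -t raid.pderror
zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf -t raid.lderror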
Stop and start the Zabbix agent
svr1:/etc/zabbix/scripts # /etc/init.d/zabbix-agent stop
Shutdown may take a while....
Shutting down zabbix_agent: done
svr1:/etc/zabbix/scripts # /etc/init.d/zabbix-agent start
Starting zabbix_agent: done
svr1:/etc/zabbix/scripts # /etc/init.d/zabbix-agent status
Zabbix agent running(PID): 16290 16291 16292 16293 16294
svr1:/etc/zabbix/scripts #
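From the Zabbix server (or any host listed in the agent's Server= line with zabbix_get installed) you can confirm the agent now serves the keys; replace svr1 with your host's address:

zabbix_get -s svr1 -k raid.pderror
zabbix_get -s svr1 -k raid.lderror

Each call should return the current error count, 0 on a healthy box.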
The ITEMS and TRIGGERS are set up on the Zabbix server as follows.
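The original screenshots are not reproduced here, so treat the following as a sketch of the sort of definitions meant: two Zabbix agent items polling the keys above, and a trigger on each that fires when the count is non-zero. Names and intervals are illustrative, and the trigger syntax is the old {host:key.function(param)} form in use at the time:

Item:    RAID physical disk errors - Type: Zabbix agent, Key: raid.pderror, Type of information: Numeric, Update interval: 300
Item:    RAID logical disk errors  - Type: Zabbix agent, Key: raid.lderror, Type of information: Numeric, Update interval: 300
Trigger: {svr1:raid.pderror.last(0)}>0  (Physical disk problem on {HOST.NAME})
Trigger: {svr1:raid.lderror.last(0)}>0  (Logical disk problem on {HOST.NAME})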