Based on netseek's PDF; environment: CentOS 6, 64-bit
Nagios installation steps

1 Before installing, make sure you have root privileges on the machine. Confirm that the following packages are installed on your Linux system before continuing: Apache, the GCC compiler, and the GD library with its development files.

    yum -y install httpd gcc glibc glibc-common gd gd-devel

2 Create the nagios account

    /usr/sbin/useradd nagios && passwd nagios

Create a group named nagcmd, used for running external commands from the web interface, and add both the nagios user and the apache user to it:

    /usr/sbin/groupadd nagcmd
    /usr/sbin/usermod -a -G nagcmd nagios
    /usr/sbin/usermod -a -G nagcmd apache

3 Download Nagios and the plugin package

Download the Nagios and Nagios plugins packages (visit http://www.nagios.org/download/ for the latest versions):

    cd /usr/local/src
    wget http://nchc.dl.sourceforge.net/sourceforge/nagios/nagios-3.0.6.tar.gz
    wget http://nchc.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.13.tar.gz

4 Compile and install Nagios

    cd /usr/local/src
    tar zxvf nagios-3.0.6.tar.gz
    cd nagios-3.0.6
    ./configure --with-command-group=nagcmd --prefix=/usr/local/nagios
    make all
    make install
    make install-init
    make install-config
    make install-commandmode

Verify that the program was installed correctly. Change to the installation path (/usr/local/nagios here) and check that the five directories etc, bin, sbin, share, and var exist. If they do, the program has been installed correctly. Briefly, bin holds the Nagios executables, etc the configuration files, sbin the CGI programs, share the web interface files, and var the log and runtime files.

5 Compile and install the Nagios plugins (nagios-plugins)

    cd /usr/local/src
    tar zxvf nagios-plugins-1.4.13.tar.gz
    cd nagios-plugins-1.4.13
    ./configure --with-nagios-user=nagios --with-nagios-group=nagios --prefix=/usr/local/nagios
    make && make install

Verify:

    ls /usr/local/nagios/libexec

This lists the installed plugin files; all plugins are installed in the libexec directory.

6 Configure the web interface

Method one: run the following directly when installing Nagios:

    make install-webconf

Create a user named nagiosadmin for logging in to the Nagios web interface. Note the password you set; you will need it shortly.

    htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Restart Apache so the settings take effect:

    service httpd restart

Method two: append the following to the end of httpd.conf:

    #for nagios
    ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
    <Directory "/usr/local/nagios/sbin">
        Options ExecCGI
        AllowOverride None
        Order allow,deny
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd
        Require valid-user
    </Directory>
    Alias /nagios /usr/local/nagios/share
    <Directory "/usr/local/nagios/share">
        Options None
        AllowOverride None
        Order allow,deny
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd
        Require valid-user
    </Directory>

Create the password file and the test user:

    htpasswd -c /usr/local/nagios/etc/htpasswd test
    New password: (enter 123456)
    Re-type new password: (enter the password again)
    Adding password for user test

Check the contents of the authentication file:

    less /usr/local/nagios/etc/htpasswd
    test:OmWGEsBnoGpIc

The first part is the user name test; what follows is the encrypted password.

This example adds the user name test, so the cgi.cfg configuration file must be changed to authorize the test user:

    vi /usr/local/nagios/etc/cgi.cfg

    authorized_for_system_information=test
    authorized_for_configuration_information=test
    authorized_for_system_commands=test
    authorized_for_all_services=test
    authorized_for_all_hosts=nagiosadmin,test
    authorized_for_all_service_commands=test
    authorized_for_all_host_commands=test

7 Start Nagios

Add Nagios to the service list so that it starts automatically at boot:

    chkconfig --add nagios
    chkconfig nagios on

Verify the Nagios sample configuration files:

    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

You may see the following:

    Nagios 3.0.6
    Copyright (c) 1999-2008 Ethan Galstad (http://www.nagios.org)
    Last Modified: 12-01-2008
    License: GPL
    Error: Cannot open main configuration file '/usr/local/‐' for reading!

This error most likely comes from a non-ASCII hyphen copied from the PDF in front of the v: Nagios then takes ‐ as the configuration file name, so retype the option as a plain -v. In my case granting permissions did not help either; simply restarting the nagios service got it started.

    Nagios 3.0.6 starting... (PID=2821)
    Local time is Thu Feb 16 14:24:25 CST 2012
    Bailing out due to one or more errors encountered in the configuration files. Run Nagios from the command line with the -v option to verify your config before restarting.
    (PID=2821)

If there are no errors, start the Nagios service:

    service nagios start
    service httpd start

8 setenforce 0 (running this command is enough)

Put SELinux into permissive mode:

    setenforce 0

To make the change permanent, edit the setting in /etc/selinux/config and reboot. To avoid disabling SELinux or changing it permanently, you can instead label the CGI files so they run under SELinux's enforcing/targeted mode:

    chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
    chcon -R -t httpd_sys_content_t /usr/local/nagios/share/

9 Test

Open http://localhost/nagios/ and enter the user name test and the password 123456; you should be able to log in normally.

10 How to configure monitoring of a remote host

1 On the monitored host

Add the user:

    useradd nagios

Set its password:

    passwd nagios

Install the Nagios plugins:

    wget http://nchc.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.13.tar.gz
    tar zxvf nagios-plugins-1.4.13.tar.gz
    cd nagios-plugins-1.4.13
    ./configure
    make
    make install
    chown nagios:nagios /usr/local/nagios/
    chown -R nagios:nagios /usr/local/nagios/libexec/

2 Steps for installing NRPE (install it on both the monitoring host and the monitored host)

    tar -zxvf nrpe-2.8.1.tar.gz
    cd nrpe-2.8.1
    ./configure
    make all
    make install-plugin
    make install-daemon
    make install-daemon-config

3 Edit the NRPE configuration

    vim /usr/local/nagios/etc/nrpe.cfg

    #allowed_hosts=127.0.0.1
    allowed_hosts=127.0.0.1,192.168.1.130

Here 192.168.1.130 is the address of the monitoring host. Also add the monitoring host's IP to /etc/hosts.allow:

    echo 'nrpe:192.168.1.130' >> /etc/hosts.allow

4 Start the service

    /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Test whether the NRPE service works (test with 127.0.0.1, not with localhost):

    /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
    NRPE v2.8.1

5 Test from the monitoring host (192.168.1.130). The result below means success.

    /etc/init.d/iptables stop    (or add a rule that allows collecting information from the monitored host)
    /usr/local/nagios/libexec/check_nrpe -H 192.168.1.129
    NRPE v2.8.1

Then, on the monitoring host:

1 Create the host file with the following content:

    vim /usr/local/nagios/etc/objects/129.cfg

    define host{
        use         linux-server
        host_name   129
        alias       129
        address     192.168.1.129
    }
    define service{
        use                   generic-service
        host_name             129
        service_description   load
        check_command         check_nrpe!check_load
        # with custom arguments:
        #check_command        check_nrpe!check_load!6.0,5.0,4.0!15.0,8.0,6.0
    }

Add the following to nagios.cfg:

    vim /usr/local/nagios/etc/nagios.cfg

    # Definitions for monitoring 192.168.1.129
    cfg_file=/usr/local/nagios/etc/objects/129.cfg

Add the check_nrpe command to commands.cfg:

    vim /usr/local/nagios/etc/objects/commands.cfg

    # 'check_nrpe' command definition
    define command{
        command_name   check_nrpe
        command_line   $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
    }

Reload Nagios on the monitoring host:

    service nagios reload

Open http://192.168.1.130/nagios and you will see that host 129 has been added.

Monitoring swap with Nagios

On the monitored host, add the following to /usr/local/nagios/etc/nrpe.cfg:

    vim /usr/local/nagios/etc/nrpe.cfg

    command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

Restart the NRPE service:

    [root@localhost libexec]# ps -ef | grep nrpe
    nagios    2332     1  0 14:24 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    root      2373 28887  0 14:25 pts/0    00:00:00 grep nrpe
    kill -9 2332
    /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

On the monitoring host, add the following to /usr/local/nagios/etc/objects/commands.cfg:

    # check_swap command definition
    define command{
        command_name   check_swap
        command_line   $USER1$/check_swap -w $ARG1$ -c $ARG2$
    }

Then add the following to the host file:

    vim /usr/local/nagios/etc/objects/129.cfg

    define service{
        use                   generic-service
        host_name             129
        service_description   swap
        check_command         check_nrpe!check_swap
    }

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

Monitoring disks with Nagios

On the monitored host, add the following to /usr/local/nagios/etc/nrpe.cfg:

    vim /usr/local/nagios/etc/nrpe.cfg

    command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /

Restart the NRPE service:

    [root@localhost libexec]# ps -ef | grep nrpe
    nagios    2332     1  0 14:24 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    root      2373 28887  0 14:25 pts/0    00:00:00 grep nrpe
    kill -9 2332
    /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

On the monitoring host, add the following to /usr/local/nagios/etc/objects/commands.cfg:

    define command{
        command_name   check_disk
        command_line   $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
    }

Then add the following to the host file:

    vim /usr/local/nagios/etc/objects/129.cfg

    define service{
        use                   generic-service
        host_name             129
        service_description   disk
        check_command         check_nrpe!check_disk
    }

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

Monitoring memory with Nagios

The memory monitoring script is as follows:

    #!/bin/bash
    # check memory script
    TOTAL=`free -m | head -2 | tail -1 | gawk '{print $2}'`
    USED=`free -m | head -2 | tail -1 | gawk '{print $3}'`
    FREE=`free -m | head -2 | tail -1 | gawk '{print $4}'`
    # to calculate free percent
    # use the expression free * 100 / total
    FREETMP=`expr $FREE \* 100`
    PERCENT=`expr $FREETMP / $TOTAL`
    echo "$TOTAL MB Total Memory"
    echo "$USED MB Used Memory"
    echo "$FREE MB ($PERCENT%) Free Memory"
    exit 0

On the monitored host, add the following to /usr/local/nagios/etc/nrpe.cfg:

    vim /usr/local/nagios/etc/nrpe.cfg

    command[check_mem]=/usr/local/nagios/libexec/check_mem -w 150 -c 200

Put the monitoring script check_mem into /usr/local/nagios/libexec/ and make it executable:

    chmod +x /usr/local/nagios/libexec/check_mem
    chown nagios:nagios /usr/local/nagios/libexec/check_mem

Restart the NRPE service:

    [root@localhost libexec]# ps -ef | grep nrpe
    nagios    2332     1  0 14:24 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    root      2373 28887  0 14:25 pts/0    00:00:00 grep nrpe
    kill -9 2332
    /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

On the monitoring host, add the following to /usr/local/nagios/etc/objects/commands.cfg:

    define command{
        command_name   check_mem
        command_line   $USER1$/check_mem -w $ARG1$ -c $ARG2$
    }

Then add the following to the host file:

    vim /usr/local/nagios/etc/objects/129.cfg

    define service{
        use                   generic-service
        host_name             129
        service_description   memory
        check_command         check_nrpe!check_mem
    }

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

Monitoring HTTP availability with Nagios

Nothing needs to be done on the monitored host, because check_http does not go through NRPE. On the monitoring host, /usr/local/nagios/etc/objects/commands.cfg already contains the check_http command, so nothing needs to be added there either.

Add the following to the host file:

    vim /usr/local/nagios/etc/objects/129.cfg

    define service{
        use                   generic-service
        host_name             129
        service_description   http
        check_command         check_http    ; note: not the check_nrpe!check_http form
    }

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

Fixing an error

Because httpd was installed with yum, the default document root is /var/www/html. Running the following check:

    /usr/local/nagios/libexec/check_http -I 192.168.1.129

reports this error:

    HTTP WARNING: HTTP/1.1 403 Forbidden

The cause is that there is no file under /var/www/html:

    cd /var/www/html
    echo 123 > index.html

After a short while the Nagios check turns OK.

Monitoring MySQL availability with Nagios

On the monitored host, log in to the database and grant access:

    mysql> grant all privileges on *.* to xxxxx@192.168.1.130 identified by '123456';
    Query OK, 0 rows affected (0.09 sec)
    mysql> flush privileges;
    Query OK, 0 rows affected (0.08 sec)

On the monitoring host, add the following to /usr/local/nagios/etc/objects/commands.cfg (liuyu's PDF has an error on the command_line):

    # check_mysql command definition
    define command{
        command_name   check_mysql
        command_line   $USER1$/check_mysql -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$
    }

Then add the following to the host file. The four values after check_mysql map to $ARG1$ through $ARG4$ (host, port, user, password). This line is correct in liuyu's document; note that it is not the check_nrpe!check_http form.

    vim /usr/local/nagios/etc/objects/129.cfg

    define service{
        use                     generic-service
        host_name               129
        service_description     mysql
        check_command           check_mysql!192.168.1.129!3306!xxxx!123456
        notifications_enabled   0
    }

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

Monitoring Tomcat availability with Nagios
Nothing needs to be done on the monitored host, because check_tcp!8080 does not go through NRPE. On the monitoring host, /usr/local/nagios/etc/objects/commands.cfg already contains the check_tcp command, so nothing needs to be added there either.

Add the following to the host file:

    vim /usr/local/nagios/etc/objects/hong221.cfg

    define service{
        use                   generic-service
        host_name             hong221
        service_description   tomcat
        check_command         check_tcp!8080!xxxxx
    }

To check manually, run the following command:

    [root@nagios objects]# /usr/local/nagios/libexec/check_tcp -H xxxxx -p 8080
    TCP OK - 0.141 second response time on port 8080|time=0.141140s;;;0.000000;10.000000

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

The monitoring page then shows the service on the monitoring host.

Configuring Nagios alerts to a 139 mailbox

How to fix mail sent with the mail command not arriving in a 139 mailbox

    tail -f /var/log/maillog

The log shows the following errors:

    Feb 21 17:20:49 localhost postfix/qmgr[2072]: A296612227F: from=<root@localhost.localdomain>, size=700, nrcpt=1 (queue active)
    Feb 21 17:20:49 localhost sendmail[2275]: q1L9KmDa002275: to=xxxxx@139.com, ctladdr=root (0/0), delay=00:00:01, xdelay=00:00:00, mailer=relay, pri=30221, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (Ok: queued as A296612227F)
    Feb 21 17:20:49 localhost postfix/smtpd[2276]: disconnect from localhost.localdomain[127.0.0.1]
    Feb 21 17:20:50 localhost postfix/smtp[2280]: A296612227F: to=<xxxxx@139.com>, relay=mx1.mail.139.com[221.176.9.178]:25, delay=0.53, delays=0.05/0.01/0.24/0.23, dsn=5.0.0, status=bounced (host mx1.mail.139.com[221.176.9.178] said: 550 985a4f43618db72-3c5de Mail rejected (in reply to end of DATA command))
    Feb 21 17:20:50 localhost postfix/cleanup[2279]: 43FB812227E: message-id=<20120221092050.43FB812227E@localhost.localdomain>
    Feb 21 17:20:50 localhost postfix/qmgr[2072]: 43FB812227E: from=<>, size=2697, nrcpt=1 (queue active)
    Feb 21 17:20:50 localhost postfix/bounce[2281]: A296612227F: sender non-delivery notification: 43FB812227E
    Feb 21 17:20:50 localhost postfix/qmgr[2072]: A296612227F: removed

As was pointed out to me, the cause is the hostname (localhost.localdomain): mail sent from it may be treated as spam by 139 Mail.

    [root@nagios objects]# cat /etc/sysconfig/network
    NETWORKING=yes
    #HOSTNAME=localhost.localdomain
    HOSTNAME=nagios.localdomain
    [root@nagios objects]# cat /etc/hosts
    192.168.1.130 nagios.localdomain nagios # Added by NetworkManager
    127.0.0.1 localhost.localdomain localhost
    ::1 nagios.localdomain nagios localhost6.localdomain6 localhost6

So I simply picked another hostname and rebooted the server; after that, mail worked and the 139 mailbox received the messages.

Nagios configuration for service alerts

On the monitoring host:

    vim /usr/local/nagios/etc/objects/contacts.cfg

    define contact{
        contact_name                    nagiosadmin             ; Short name of user
        use                             generic-contact         ; Inherit default values from generic-contact template (defined above)
        alias                           Nagios Admin            ; Full name of user
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        email                           xxxxx@139.com           ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
    }
    define contactgroup{
        contactgroup_name   admins
        alias               Nagios Administrators
        members             nagiosadmin
    }

Put the address that should receive the alerts in the email field (a 139 mailbox is a must-have for operations work). Then restart the nagios service:

    service nagios restart

Note that only services whose host configuration file contains the following statement raise alerts when something goes wrong (1 enables alerts, 0 disables them):

    notifications_enabled 1

Also note that when registering the 139 mailbox, you should choose the long SMS format and set the mail arrival notification to 24 hours.

    vim templates.cfg

    define service{
        name                            generic-service   ; The 'name' of this service template
        active_checks_enabled           1                 ; Active service checks are enabled
        passive_checks_enabled          1                 ; Passive service checks are enabled/accepted
        parallelize_check               1                 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1                 ; We should obsess over this service (if necessary)
        check_freshness                 0                 ; Default is to NOT check service 'freshness'
        notifications_enabled           1                 ; Service notifications are enabled
        event_handler_enabled           1                 ; Service event handler is enabled
        flap_detection_enabled          1                 ; Flap detection is enabled
        failure_prediction_enabled      1                 ; Failure prediction is enabled
        process_perf_data               1                 ; Process performance data
        retain_status_information       1                 ; Retain status information across program restarts
        retain_nonstatus_information    1                 ; Retain non-status information across program restarts
        is_volatile                     0                 ; The service is not volatile
        check_period                    24x7              ; The service can be checked at any time of the day
        max_check_attempts              3                 ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10                ; Check the service every 10 minutes under normal conditions
        retry_check_interval            2                 ; Re-check the service every two minutes until a hard state can be determined
        contact_groups                  admins            ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r           ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           10                ; Re-notify about service problems every 10 minutes (the interval between repeated alerts)
        notification_period             24x7              ; Notifications can be sent out at any time
        register                        0                 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    }

Nagios troubleshooting

Problem one

After adding a check for a new monitored host (the load check on 129 in this example), clicking Scheduling Queue, then 129 load, shows the following under Status Information:

    CHECK_NRPE: Socket timeout after 10 seconds

What to check:

1 First, on the monitoring host, run

    /usr/local/nagios/libexec/check_nrpe -H 192.168.1.129

and see whether it returns the NRPE version number. Then check whether iptables has any rules that block the connection.

2 Check the file permissions

    cd /usr/local/nagios/etc/objects
    [root@localhost objects]# ll
    total 52
    -rw-r--r-- 1 root   root     314 Feb 16 15:58 129.cfg
    -rwxrwxrwx 1 nagios nagios  7856 Feb 16 16:06 commands.cfg
    -rwxrwxrwx 1 nagios nagios  2166 Feb 16 13:58 contacts.cfg
    -rwxrwxrwx 1 nagios nagios  5403 Feb 16 13:58 localhost.cfg
    -rwxrwxrwx 1 nagios nagios  3124 Feb 16 13:58 printer.cfg
    -rwxrwxrwx 1 nagios nagios  3293 Feb 16 13:58 switch.cfg
    -rwxrwxrwx 1 nagios nagios 10812 Feb 16 13:58 templates.cfg
    -rwxrwxrwx 1 nagios nagios  3209 Feb 16 13:58 timeperiods.cfg
    -rwxrwxrwx 1 nagios nagios  4007 Feb 16 13:58 windows.cfg

Check whether the file for the newly added host is readable and writable by the nagios user. If it is not, change its owner and permissions to match the other files:

    [root@localhost objects]# ll
    total 52
    -rwxrwxrwx 1 nagios nagios   314 Feb 16 15:58 129.cfg
    -rwxrwxrwx 1 nagios nagios  7856 Feb 16 16:06 commands.cfg
    -rwxrwxrwx 1 nagios nagios  2166 Feb 16 13:58 contacts.cfg
    -rwxrwxrwx 1 nagios nagios  5403 Feb 16 13:58 localhost.cfg
    -rwxrwxrwx 1 nagios nagios  3124 Feb 16 13:58 printer.cfg
    -rwxrwxrwx 1 nagios nagios  3293 Feb 16 13:58 switch.cfg
    -rwxrwxrwx 1 nagios nagios 10812 Feb 16 13:58 templates.cfg
    -rwxrwxrwx 1 nagios nagios  3209 Feb 16 13:58 timeperiods.cfg
    -rwxrwxrwx 1 nagios nagios  4007 Feb 16 13:58 windows.cfg
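A note on the check_mem script from the memory monitoring section: as written it always exits 0 and never reads the -w and -c options passed from nrpe.cfg, so Nagios would show the service as OK whatever the memory usage. The sketch below is one possible threshold-aware variant, not the original author's script. It assumes that -w and -c are free-memory percentages (the source does not say what its values 150 and 200 mean), and the function names classify_mem and main are made up for illustration. It relies on the standard Nagios plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.

```shell
#!/bin/bash
# Sketch of a threshold-aware check_mem plugin (hypothetical variant).
# Assumption: -w and -c are FREE-memory percentages; when free memory is at
# or below them the check turns WARNING or CRITICAL.

# classify_mem FREE_PERCENT WARN CRIT
# Prints the state name and returns the matching Nagios exit code.
classify_mem() {
    local pct=$1 warn=$2 crit=$3
    if [ "$pct" -le "$crit" ]; then
        echo "CRITICAL"; return 2
    elif [ "$pct" -le "$warn" ]; then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}

# main -w WARN -c CRIT
# Reads total and free memory from `free -m` and reports the state.
main() {
    local warn=20 crit=10 opt OPTIND=1
    while getopts "w:c:" opt; do
        case "$opt" in
            w) warn=$OPTARG ;;
            c) crit=$OPTARG ;;
            *) echo "UNKNOWN - usage: check_mem -w <percent> -c <percent>"; return 3 ;;
        esac
    done

    local total free_mb pct state rc
    # In the "Mem:" row of free -m, column 2 is total and column 4 is free.
    total=$(free -m 2>/dev/null | awk '/^Mem:/ {print $2}')
    free_mb=$(free -m 2>/dev/null | awk '/^Mem:/ {print $4}')
    if [ -z "$total" ] || [ "$total" -eq 0 ]; then
        echo "UNKNOWN - could not read memory figures"; return 3
    fi

    pct=$(( free_mb * 100 / total ))
    state=$(classify_mem "$pct" "$warn" "$crit"); rc=$?
    echo "MEMORY $state - $free_mb MB ($pct%) free of $total MB"
    return $rc
}

# Act as a plugin only when options are given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    main "$@"
    exit $?
fi
```

With this variant the nrpe.cfg entry would pass percentages instead, for example command[check_mem]=/usr/local/nagios/libexec/check_mem -w 20 -c 10. The thresholds are not validated, so non-numeric values will make the comparison fail.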