Thursday, August 8, 2013

A Shell Script for Automatic VPS Monitoring


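This script monitors the load on a VPS, along with the memory and CPU usage of its web services. When the system load or a service's memory usage exceeds a preset threshold, the script restarts that service. When an individual php-cgi process (php-fpm or httpd in the current script) uses too much CPU, the script kills that process directly. The goal is to reduce unexpected downtime caused by the server running out of resources.
Yes, that's right: this is an updated version of the earlier v1. Since more updates are likely, I have moved it to a GitHub gist for simple version control.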

I. Usage:

git clone git://gist.github.com/1216837.git gist-1216837
vim gist-1216837/sys-mon.sh   # adjust the memory, CPU and other preset thresholds
mkdir /var/script
mv gist-1216837/sys-mon.sh /var/script
Set it to run once a minute:
crontab -e
* * * * * /bin/bash /var/script/sys-mon.sh
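
One thing to keep in mind with a one-minute schedule: the script sleeps several times while restarting services, so a run may still be active when cron starts the next one. The sketch below is one possible guard, not part of the original gist. The lock directory path and the function names are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical lock location; mkdir is atomic, so only one caller can create it.
LOCK_DIR="${LOCK_DIR:-/tmp/sys-mon.lock}"

# Succeeds only if the lock directory did not exist yet.
acquire_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

# Removes the lock directory so the next run can proceed.
release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

# Run the given command only if no other run holds the lock.
run_exclusively() {
    if ! acquire_lock; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    "$@"
    local status=$?
    release_lock
    return $status
}
```

A small wrapper script could call `run_exclusively /bin/bash /var/script/sys-mon.sh` and be scheduled from cron in place of the script itself. If a run is killed before it releases the lock, the directory has to be removed by hand.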

II. The shell script

I recommend opening the URL below to see the latest version.
https://gist.github.com/1216837
#! /bin/bash
#====================================================================
# sys-mon.sh
#
# Copyright (c) 2011, WangYan <webmaster@wangyan.org>
# All rights reserved.
# Distributed under the GNU General Public License, version 3.0.
#
# Monitor system mem and load, if too high, restart some service.
#
# See: http://wangyan.org/blog/sys-mon-shell-script.html
#
# V 0.5, Date: 2011-12-08
#====================================================================
 
# Names of the services to monitor
# Each one must exist as an init script in /etc/init.d
NAME_LIST="httpd nginx mysql"
 
# Maximum CPU usage allowed for a single process (%)
PID_CPU_MAX="25"
 
# Maximum combined memory usage allowed for all processes of one service (%)
PID_MEM_SUM_MAX="95"
 
# The maximum allowed system load
SYS_LOAD_MAX="6"
 
# Log path settings
LOG_PATH="/var/log/sys-mon.log"
 
# Date time format setting
DATA_TIME=$(date +"%y-%m-%d %H:%M:%S")
 
# Your email address
EMAIL="webmaster@example.com"
 
# Your website url
MY_URL="http://106.187.38.210/p.php"
 
#====================================================================
 
for NAME in $NAME_LIST
do
    PID_CPU_SUM="0";PID_MEM_SUM="0"
    PID_LIST=`ps aux | grep $NAME | grep -v root`
 
    IFS_TMP="$IFS";IFS=$'\n'
    for PID in $PID_LIST
    do
        PID_NUM=`echo $PID | awk '{print $2}'`
        PID_CPU=`echo $PID | awk '{print $3}'`
        PID_MEM=`echo $PID | awk '{print $4}'`
#       echo "$NAME: PID_NUM($PID_NUM) PID_CPU($PID_CPU) PID_MEM($PID_MEM)"
 
        PID_CPU_SUM=`echo "$PID_CPU_SUM + $PID_CPU" | bc`
        PID_MEM_SUM=`echo "$PID_MEM_SUM + $PID_MEM" | bc`
 
        if [ `echo "$PID_CPU >= $PID_CPU_MAX" | bc` -eq 1 ];then
            if [[ "$NAME" = "php-fpm" || "$NAME" = "httpd" ]];then
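                sleep 5
                # Re-sample the CPU usage so that a short spike does not get the process killed
                PID_CPU=`ps -p $PID_NUM -o pcpu= | tr -d ' '`
                if [ -n "$PID_CPU" ] && [ `echo "$PID_CPU >= $PID_CPU_MAX" | bc` -eq 1 ];then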
                    echo "${DATA_TIME}: kill ${NAME}($PID_NUM) successful (CPU:$PID_CPU)" | tee -a $LOG_PATH
                    kill $PID_NUM
                fi
            else
                echo "${DATA_TIME}: [WARNING!] ${NAME}($PID_NUM) cpu usage is too high! (CPU:$PID_CPU)" | tee -a $LOG_PATH
            fi
        fi
    done
    IFS="$IFS_TMP"
 
    SYS_LOAD=`uptime | awk '{print $(NF-2)}' | sed 's/,//'`
    SYS_MON="CPU:$PID_CPU_SUM MEM:$PID_MEM_SUM LOAD:$SYS_LOAD"
#   echo -e "$NAME: $SYS_MON\n"
 
    SYS_LOAD_TOO_HIGH=`awk 'BEGIN{print('$SYS_LOAD'>'$SYS_LOAD_MAX')}'`
    PID_MEM_SUM_TOO_HIGH=`awk 'BEGIN{print('$PID_MEM_SUM'>'$PID_MEM_SUM_MAX')}'`
 
    if [[ "$SYS_LOAD_TOO_HIGH" = "1" || "$PID_MEM_SUM_TOO_HIGH" = "1" ]];then
        /etc/init.d/$NAME stop
        sleep 5
        for ((i=1;i<4;i++))
        do
            if [ `pgrep $NAME | wc -l` = "0" ];then
                echo "$DATA_TIME: Stop $NAME successful! ($SYS_MON)" | tee -a $LOG_PATH
                break
            else
                echo "${DATA_TIME}: [WARNING!] Stop $NAME failed[$i]! ($SYS_MON)" | tee -a $LOG_PATH
                pkill $NAME && killall $NAME
            fi
        done
        /etc/init.d/$NAME start
        sleep 5
        for ((ii=1;ii<4;ii++))
        do
            if [ `pgrep $NAME | wc -l` != "0" ];then
                echo "$DATA_TIME: Start $NAME successful!" | tee -a $LOG_PATH
                break
            else
                echo "${DATA_TIME}: [WARNING!] Start $NAME failed[$ii]! ($SYS_MON)" | tee -a $LOG_PATH
                /etc/init.d/$NAME start
                sleep 5
            fi
        done
        # Still no process after all retries: the start really failed, so send the alert
        if [ `pgrep $NAME | wc -l` = "0" ];then
            echo "${DATA_TIME}: [ERROR!] Start $NAME failed! ($SYS_MON)" | mail -s "Start $NAME failed" $EMAIL
        fi
    fi
done
 
STATUS_CODE=`curl -o /dev/null -s -w %{http_code} $MY_URL`
#echo -e "STATUS CODE: $STATUS_CODE\n"
 
if [ "$STATUS_CODE" != "200" ];then
    sleep 3
    STATUS_CODE=`curl -o /dev/null -s -w %{http_code} $MY_URL`
    if [ "$STATUS_CODE" != "200" ];then
        echo "${DATA_TIME}: [WARNING!] Website Downtime! ($SYS_MON)" | tee -a $LOG_PATH
        echo "${DATA_TIME}: [WARNING!] Website Downtime! ($SYS_MON)" | mail -s "Website Downtime" $EMAIL
    fi
fi
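The script is not hard to follow. For an explanation of how it works, see the article "Linux 进程自动监控shell脚本" ("A shell script for automatic Linux process monitoring").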
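
Two small pieces of the script are worth looking at in isolation: pulling the 1-minute load average out of `uptime`, and comparing decimal values against a threshold. The following is a minimal sketch of both. The function names `load_from_uptime` and `exceeds` are my own and do not appear in the gist.

```shell
#!/bin/bash
# Extract the 1-minute load average from a line of `uptime` output.
# The three averages are the last three fields, so the 1-minute value
# is field NF-2, followed by a comma that has to be stripped.
load_from_uptime() {
    awk '{print $(NF-2)}' | sed 's/,$//'
}

# Return success (exit status 0) when $1 is strictly greater than $2.
# awk does the comparison because bash's [ ... -gt ... ] only handles integers.
exceeds() {
    awk -v value="$1" -v max="$2" 'BEGIN { exit !(value + 0 > max + 0) }'
}
```

With these, the load check could be written as `exceeds "$(uptime | load_from_uptime)" "$SYS_LOAD_MAX" && echo "load too high"`. The comparison is strict, like the load and memory checks in the script. The per-process CPU check there uses `>=` instead.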

III. Notes

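1. Every service listed in NAME_LIST must have an init script in /etc/init.d that supports the stop and start actions.
2. PID_CPU_MAX is the CPU usage limit for a single process. Only php-fpm and httpd processes are killed when they exceed it; for other services the script just logs a warning.
3. PID_MEM_SUM_MAX applies to the combined memory usage (%MEM) of all processes belonging to one service, not to the memory usage of the system as a whole.
4. EMAIL: you only receive a mail when a service fails to start again after a restart, or when the website check fails.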
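
To make notes 2 and 3 concrete, here is a rough sketch of how the per-service totals come about: the %CPU and %MEM columns of every matching `ps aux` line are added up. The function name `sum_usage` is an assumption for illustration. The script itself does the same sum with `bc` inside a loop.

```shell
#!/bin/bash
# Read `ps aux`-style lines on stdin and print "<cpu_sum> <mem_sum>".
# In `ps aux` output, column 3 is %CPU and column 4 is %MEM.
sum_usage() {
    awk '{ cpu += $3; mem += $4 } END { printf "%.1f %.1f\n", cpu, mem }'
}
```

For example, `ps aux | grep '[h]ttpd' | grep -v root | sum_usage` would print the totals for the non-root httpd processes (the `[h]` keeps grep from matching itself). Because %MEM is counted per process, memory shared between workers is counted more than once, so the sum can overstate what the service really uses.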

IV. Changelog:

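2011.11.28: Removed the nginx 502 status check, improved per-process CPU monitoring, and fixed inaccurate readings and other issues.
2011.12.07: Further fixes to incorrect CPU monitoring; added email notification after downtime.
Original article: http://wangyan.org/blog/sys-mon-shell-script.html
License: Creative Commons Attribution-NonCommercial-ShareAlike 3.0
Copyright notice: when reposting this original article, please credit the original source with a hyperlink.
Author: WangYan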
