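Configuring AlertManager DingTalk Alerts for Prometheus

AlertManager receives the alerts sent by Prometheus and forwards them to notification channels such as email, WeChat, DingTalk, and Slack. DingTalk is reached through a webhook plugin, which this article sets up.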

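Configuring the DingTalk alert interface

  1. Create a DingTalk group that will receive the alerts
  2. Add a robot through the group's "Smart Group Assistant" (智能群助手)
  3. Create a custom service of the "Access via Webhook" type
  4. In the security settings, enter the IP address of the server that will send the alerts
  5. Save the Webhook URL; the access_token parameter in it is the token used below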
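
As a small illustration of step 5, the sketch below pulls the token out of a saved Webhook URL. It assumes the URL has the form used later in this article (an access_token query parameter); the function name extract_access_token is hypothetical, and only the Python standard library is used.

```python
from urllib.parse import parse_qs, urlparse


def extract_access_token(webhook_url: str) -> str:
    """Return the access_token query parameter of a robot Webhook URL."""
    query = parse_qs(urlparse(webhook_url).query)
    tokens = query.get("access_token", [])
    # parse_qs drops blank values, so an empty token ends up here as well.
    if len(tokens) != 1:
        raise ValueError("expected exactly one non-empty access_token parameter")
    return tokens[0]


if __name__ == "__main__":
    # "xxxxxx" is the placeholder token used throughout this article.
    print(extract_access_token("https://oapi.dingtalk.com/robot/send?access_token=xxxxxx"))
```

Keeping the token separate from the URL makes it easier to store it as a secret instead of pasting the full URL into scripts.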

Configuring the DingTalk plugin

Check the plugin's release page for a newer version, then download and start the plugin:

cd /opt/prometheus
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
tar zxvf prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
cd prometheus-webhook-dingtalk-1.4.0.linux-amd64
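# Start the plugin once you have the Webhook token.
# The alertmanager.yml below also posts to a dingding_high profile, which needs its own --ding.profile entry.
# If your release rejects --ding.profile, check the project README for the configuration format of that version.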
./prometheus-webhook-dingtalk --ding.profile="dingding=https://oapi.dingtalk.com/robot/send?access_token=xxxxxx"
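
The profile name given to the plugin reappears in the URL that AlertManager posts to. The sketch below shows that mapping under the assumptions visible in this article: a profile spec of the form name=url, and a plugin endpoint of the form http://host:8060/dingtalk/<profile>/send. The function names parse_profile and alertmanager_url are hypothetical.

```python
from typing import Tuple


def parse_profile(spec: str) -> Tuple[str, str]:
    """Split a 'name=url' spec on the first '=' only.

    The DingTalk URL itself contains '=', so a plain split('=') would break it.
    """
    name, sep, url = spec.partition("=")
    if not sep or not name or not url:
        raise ValueError(f"invalid profile spec: {spec!r}")
    return name, url


def alertmanager_url(host: str, profile: str, port: int = 8060) -> str:
    """Build the plugin endpoint that AlertManager's webhook_configs points at."""
    return f"http://{host}:{port}/dingtalk/{profile}/send"


if __name__ == "__main__":
    name, _ = parse_profile(
        "dingding=https://oapi.dingtalk.com/robot/send?access_token=xxxxxx"
    )
    # 10.10.200.201 is the plugin host used in this article's alertmanager.yml.
    print(alertmanager_url("10.10.200.201", name))
```

The printed URL is the one to put under webhook_configs for the receiver that should use this profile.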

Configuring AlertManager

Edit the AlertManager configuration file alertmanager.yml and change or add the following:

...
route:
  group_by: [alertname]
  receiver: dingding   # default receiver
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 1h
  routes:

  - receiver: 'dingding'  # receiver for resource usage alerts
    group_wait: 30s
    match_re:
      alertname: 内存使用率|CPU使用率|磁盘使用率   # memory, CPU, or disk usage

  - receiver: 'dingding_high'  # receiver for host-down alerts
    group_wait: 30s
    match_re:
      alertname: 主机失去联系   # host unreachable
...
receivers:
  - name: 'dingding'
    webhook_configs:
    - url: 'http://10.10.200.201:8060/dingtalk/dingding/send'
      send_resolved: true

  - name: 'dingding_high'
    webhook_configs:
    - url: 'http://10.10.200.201:8060/dingtalk/dingding_high/send'
      send_resolved: true

# Inhibition: while a critical alert is firing, warning alerts that carry the same
# alertname, dev, and instance label values are muted.
inhibit_rules: 
  - source_match: 
      severity: 'critical' 
    target_match: 
      severity: 'warning' 
    equal: ['alertname', 'dev', 'instance']
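
To make the routing and inhibition semantics of this configuration concrete, here is a simplified Python model of it. It is a sketch, not AlertManager's implementation: it only covers the two first-level routes and the single inhibit rule above, it uses re.fullmatch because AlertManager anchors regex matchers, and it treats a missing label like an empty one. The names pick_receiver and is_inhibited are hypothetical.

```python
import re
from typing import Dict, List

# (receiver, alertname pattern) pairs copied from the routes above, in order.
ROUTES = [
    ("dingding", "内存使用率|CPU使用率|磁盘使用率"),
    ("dingding_high", "主机失去联系"),
]
DEFAULT_RECEIVER = "dingding"
EQUAL = ("alertname", "dev", "instance")  # labels listed under `equal`


def pick_receiver(labels: Dict[str, str]) -> str:
    """Return the receiver of the first child route whose regex matches."""
    alertname = labels.get("alertname", "")
    for receiver, pattern in ROUTES:
        # The whole label value must match, not just a substring.
        if re.fullmatch(pattern, alertname):
            return receiver
    return DEFAULT_RECEIVER


def is_inhibited(target: Dict[str, str], firing: List[Dict[str, str]]) -> bool:
    """Return True if a firing critical alert mutes this warning alert."""
    if target.get("severity") != "warning":
        return False
    for source in firing:
        if source.get("severity") != "critical":
            continue
        if all(source.get(k, "") == target.get(k, "") for k in EQUAL):
            return True
    return False


if __name__ == "__main__":
    print(pick_receiver({"alertname": "主机失去联系"}))
```

Because alertname is part of the equal list, a critical alert only mutes a warning alert with the same name on the same instance. The rules later in this article all use severity: warning, so the inhibit rule takes effect only once a critical variant of a rule is added.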

Configuring alert rules

Edit rules/memory_usage.yml

groups:
- name: 内存告警规则   # memory alert rules
  rules:
  - alert: 内存使用率   # memory usage; this name is matched by the routes in alertmanager.yml
    expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 75
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Memory usage is above 75% (current: {{ $value }}%)"
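
As a worked example of the memory expression, the sketch below applies the same arithmetic to made-up sample values. The byte counts are illustrative only and the function name memory_usage_percent is hypothetical; in Prometheus the inputs come from the node_memory_* metrics named in the rule.

```python
GIB = 1024 ** 3
THRESHOLD = 75  # same threshold as the rule above


def memory_usage_percent(total: int, free: int, buffers: int, cached: int) -> float:
    """Mirror the rule: (total - (free + buffers + cached)) / total * 100."""
    used = total - (free + buffers + cached)
    return used / total * 100


if __name__ == "__main__":
    # Illustrative host: 8 GiB total, 1 GiB free, 0.5 GiB buffers, 1.5 GiB cached.
    usage = memory_usage_percent(8 * GIB, 1 * GIB, GIB // 2, 3 * GIB // 2)
    print(usage, usage > THRESHOLD)
```

With these sample numbers 5 of 8 GiB count as used, so the expression stays below the threshold and the rule does not become active.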

Edit rules/cpu_usage.yml

groups:
- name: CPU告警规则   # CPU alert rules
  rules:
  - alert: CPU使用率   # CPU usage; this name is matched by the routes in alertmanager.yml
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 80
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Server CPU usage is above 80% (current: {{ $value }}%)"
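
The CPU expression is less obvious than the memory one, so here is a rough Python model of it. irate takes the per-second increase between the last two samples of each idle counter, avg by (instance) averages that over the cores, and the result is subtracted from 100. The sample values and the 15 second scrape spacing are assumptions for illustration, counter resets are ignored, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def irate(samples: List[Sample]) -> float:
    """Per-second rate computed from the last two samples only."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(idle_by_cpu: Dict[str, List[Sample]]) -> float:
    """Model of: 100 - avg(irate(idle counter)) * 100 for one instance."""
    rates = [irate(samples) for samples in idle_by_cpu.values()]
    return 100 - (sum(rates) / len(rates)) * 100


if __name__ == "__main__":
    # Two cores of one instance, scraped 15 seconds apart (illustrative values).
    idle = {
        "cpu0": [(0.0, 1000.0), (15.0, 1003.75)],   # idle 25% of the interval
        "cpu1": [(0.0, 2000.0), (15.0, 2001.875)],  # idle 12.5% of the interval
    }
    print(cpu_usage_percent(idle))  # about 81.25, which is above the 80 threshold
```

Since irate only looks at the last two samples, the [1m] window must contain at least two scrapes for the expression to return a value.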

Edit rules/disk_usage.yml

groups:
- name: 磁盘告警规则   # disk alert rules
  rules:
  - alert: 磁盘使用率   # disk usage; this name is matched by the routes in alertmanager.yml
    expr: (node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100 > 90
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Usage of disk partition {{ $labels.mountpoint }} is above 90% (current: {{ $value }}%)"

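Edit rules/node_disconnected.yml

groups:
- name: 主机告警规则   # host alert rules
  rules:
  - alert: 主机失去联系   # host unreachable; routed to dingding_high in alertmanager.yml
    expr: up == 0
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      description: "Host {{ $labels.instance }} has been unreachable for more than 1 minute"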
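
All four rules use for: 1m, which delays firing until the condition has held for a full minute. The sketch below models that for the up == 0 rule: the alert is pending from the first failed evaluation and fires once the condition has been true for at least 60 seconds without interruption. The 15 second evaluation spacing in the example is an assumption, and alert_states is a hypothetical name.

```python
from typing import List, Tuple


def alert_states(evaluations: List[Tuple[int, int]], hold_seconds: int = 60) -> List[str]:
    """Map (timestamp, up value) evaluations to inactive / pending / firing."""
    states = []
    active_since = None  # time of the first evaluation in the current failure run
    for timestamp, up in evaluations:
        if up != 0:
            # Condition no longer holds: the alert resets completely.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= hold_seconds else "pending")
    return states


if __name__ == "__main__":
    # Host goes down at t=15 and comes back at t=90 (illustrative timeline).
    timeline = [(0, 1), (15, 0), (30, 0), (45, 0), (60, 0), (75, 0), (90, 1)]
    for (timestamp, _), state in zip(timeline, alert_states(timeline)):
        print(timestamp, state)
```

A short blip that recovers within the minute never reaches the firing state, which is the point of the for clause: it trades a little delay for fewer false alarms.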

Testing alerts

# Run the following command to send a test message through the robot
curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' 'https://oapi.dingtalk.com/robot/send?access_token=xxxxxx'

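Because of DingTalk's security restrictions, every message sent to the robot must contain the keyword prometheus, as the test message above does; otherwise prometheus-webhook-dingtalk reports a 422 error.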
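
The same test can be scripted. The sketch below builds the JSON body of the curl test and refuses to build a message that lacks the required keyword, so the mistake is caught before anything is sent. The function names are hypothetical, the keyword is taken from this article, and send_text_message is an untested convenience wrapper around the standard library that needs a real token to do anything useful.

```python
import json
import urllib.request

REQUIRED_KEYWORD = "prometheus"  # keyword named in this article


def build_text_message(content: str, keyword: str = REQUIRED_KEYWORD) -> bytes:
    """Return the JSON body of a text message, enforcing the keyword rule."""
    if keyword not in content:
        raise ValueError(f"message must contain the keyword {keyword!r}")
    payload = {"msgtype": "text", "text": {"content": content}}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def send_text_message(webhook_url: str, content: str) -> str:
    """POST the message to the robot Webhook and return the raw response body."""
    request = urllib.request.Request(
        webhook_url,
        data=build_text_message(content),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds the body; call send_text_message with a real Webhook URL to send it.
    print(build_text_message("prometheus alert test").decode("utf-8"))
```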

Original article, reproduction prohibited: 技术学堂 » Configuring AlertManager DingTalk Alerts for Prometheus
