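[TOC]

## Monitoring the exporter

1. Raise an alert if an exporter has been unreachable for more than five minutes.

## Configuring prometheus.yaml

```
rule_files:
  - rules/*.yml
```

The glob must match the rule file's name; a pattern of `rules/*.rules` would not pick up `rules/rules.yml`.

## Configuring rules/rules.yml

```
groups:
  - name: monitor_exporter
    rules:
      # Alert on any instance that has been unreachable for more than 5 minutes
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
```

- `InstanceDown` fires when an instance has been down (`up == 0`) for 5 minutes.
- `APIHighRequestLatency` fires when the median API request latency exceeds 1s, that is, when half of all API requests take longer than 1s (`api_http_request_latencies_second{quantile="0.5"} > 1`). This rule is not part of the `rules/rules.yml` shown above.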
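The `APIHighRequestLatency` alert is described but not defined in the rule file. A minimal sketch of how it might look as a second entry under the same `rules:` list follows. The expression comes from the description; the `for: 10m` duration, the `severity` label, and the annotation wording are assumptions chosen for illustration, and the metric `api_http_request_latencies_second` is assumed to be a summary exposing a `quantile="0.5"` series.

```yaml
groups:
  - name: monitor_exporter
    rules:
      # Alert when the median (0.5 quantile) API request latency stays above 1s.
      # The 10m duration is an assumed value; tune it to your environment.
      - alert: APIHighRequestLatency
        expr: api_http_request_latencies_second{quantile="0.5"} > 1
        for: 10m
        labels:
          severity: page  # assumed; mirrors the InstanceDown rule
        annotations:
          summary: "High request latency on {{ $labels.instance }}"
          description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)."
```

Before reloading Prometheus, the rule file can be validated with `promtool check rules rules/rules.yml`.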