## configuration 配置
---
[Alertmanager](https://github.com/prometheus/alertmanager)通过命令行和一个配置文件配置。命令行配置不可变的系统参数,而配置文件定义inhibiton规则,通知路由和通知接收者。
[可视化编辑器](https://prometheus.io/webtools/alerting/routing-tree-editor)可以帮助构建路由树。
如果想要查看所有命令,请使用命令`alertmanager -h`。
`Alertmanager`能够在运行时动态加载配置文件。如果新的配置有错误,则配置中的变化不会生效,同时错误日志被输出到终端。通过发送`SIGHUP`信号量给这个进程,或者通过HTTP POST请求`/-/reload`,Alertmanager配置动态加载到内存。
### 配置文件
使用`-config.file`指定要加载的配置文件
> ./alertmanager -config.file=simple.yml
这个配置文件使用`yaml`格式编写的,括号表示参数是可选的,对于非列表参数,该值将设置为指定的默认值。
- `<duration>`: 与正则表达式匹配的持续时间`[0-9]+(ms|[smhdwy])`
- `<labeltime>`: 与正则表达式匹配的字符串`[a-zA-Z_][a-zA-Z0-9_]*`
- `<filepath>`: 当前工作目录下的有效路径
- `<boolean>`: 布尔值: `false` 或者 `true`。
- `<string>`: 常规字符串
- `<tmpl_string>`: 一个在使用前被模板扩展的字符串
其他占位符被分开指定, 一个有效的示例,点击[这里](https://github.com/prometheus/alertmanager/blob/master/doc/examples/simple.yml)
全局配置指定的参数在所有其他上下文配置中是有效的。它们也作为其他区域的默认值。
```
global:
# ResolveTimeout is the time after which an alert is declared resolved
# if it has not been updated.
[ resolve_timeout: <duration> | default = 5m ]
# The default SMTP From header field.
[ smtp_from: <tmpl_string> ]
# The default SMTP smarthost used for sending emails.
[ smtp_smarthost: <string> ]
# SMTP authentication information.
[ smtp_auth_username: <string> ]
[ smtp_auth_password: <string> ]
[ smtp_auth_secret: <string> ]
# The default SMTP TLS requirement.
[ smtp_require_tls: <bool> | default = true ]
# The API URL to use for Slack notifications.
[ slack_api_url: <string> ]
[ pagerduty_url: <string> | default = "https://events.pagerduty.com/generic/2010-04-15/create_event.json" ]
[ opsgenie_api_host: <string> | default = "https://api.opsgenie.com/" ]
[ hipchat_url: <string> | default = "https://api.hipchat.com/" ]
[ hipchat_auth_token: <string> ]
# Files from which custom notification template definitions are read.
# The last component may use a wildcard matcher, e.g. 'templates/*.tmpl'.
templates:
[ - <filepath> ... ]
# The root node of the routing tree.
route: <route>
# A list of notification receivers.
receivers:
- <receiver> ...
# A list of inhibition rules.
inhibit_rules:
[ - <inhibit_rule> ... ]
```
### <route>
一个路由块在路由树和它的孩子中定义了一个节点。如果不设置,它的可选配置参数从父节点中继承其值。
每个警报在已配置路由树的顶部节点,这个节点必须匹配所有警报。然后遍历所有的子节点。如果`continue`设置成`false`, 当匹配到第一个孩子时,它会停止下来;如果`continue`设置成`true`, 则警报将继续匹配后续的兄弟姐妹节点。如果一个警报不匹配一个节点的任何孩子,这个警报将会基于当前节点的配置参数来处理警报。
```
[ receiver: <string> ]
[ group_by: '[' <labelname>, ... ']' ]
# Whether an alert should continue matching subsequent sibling nodes.
[ continue: <boolean> | default = false ]
# A set of equality matchers an alert has to fulfill to match the node.
match:
[ <labelname>: <labelvalue>, ... ]
# A set of regex-matchers an alert has to fulfill to match the node.
match_re:
[ <labelname>: <regex>, ... ]
# How long to initially wait to send a notification for a group
# of alerts. Allows to wait for an inhibiting alert to arrive or collect
# more initial alerts for the same group. (Usually ~0s to few minutes.)
[ group_wait: <duration> ]
# How long to wait before sending notification about new alerts that are
# in are added to a group of alerts for which an initial notification
# has already been sent. (Usually ~5min or more.)
[ group_interval: <duration> ]
# How long to wait before sending a notification again if it has already
# been sent successfully for an alert. (Usually ~3h or more).
[ repeat_interval: <duration> ]
# Zero or more child routes.
routes:
[ - <route> ... ]
```
#### 示例
```
# The root route with all parameters, which are inherited by the child
# routes if they are not overwritten.
route:
receiver: 'default-receiver'
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
group_by: [cluster, alertname]
# All alerts that do not match the following child routes
# will remain at the root node and be dispatched to 'default-receiver'.
routes:
# All alerts with service=mysql or service=cassandra
# are dispatched to the database pager.
- receiver: 'database-pager'
group_wait: 10s
match_re:
service: mysql|cassandra
# All alerts with the team=frontend label match this sub-route.
# They are grouped by product and environment rather than cluster
# and alertname.
- receiver: 'frontend-pager'
group_by: [product, environment]
match:
team: frontend
```
### <inhibit_rule>
一个inhibition规则是在与另一组匹配器匹配的警报存在的条件下,使匹配一组匹配器的警报失效的规则。两个警报必须具有一组相同的标签。
```
# Matchers that have to be fulfilled in the alerts to be muted.
target_match:
[ <labelname>: <labelvalue>, ... ]
target_match_re:
[ <labelname>: <regex>, ... ]
# Matchers for which one or more alerts have to exist for the
# inhibition to take effect.
source_match:
[ <labelname>: <labelvalue>, ... ]
source_match_re:
[ <labelname>: <regex>, ... ]
# Labels that must have an equal value in the source and target
# alert for the inhibition to take effect.
[ equal: '[' <labelname>, ... ']' ]
```
### <receiver>
接收者是一个或者多个通知集成的命名配置
**Alertmanager在v0.0.4中可用的其他接收器尚未实现。我们乐意接受任何贡献,并将其添加到新的实现中**
```
# The unique name of the receiver.
name: <string>
# Configurations for several notification integrations.
email_configs:
[ - <email_config>, ... ]
hipchat_configs:
[ - <hipchat_config>, ... ]
pagerduty_configs:
[ - <pagerduty_config>, ... ]
pushover_configs:
[ - <pushover_config>, ... ]
slack_configs:
[ - <slack_config>, ... ]
opsgenie_configs:
[ - <opsgenie_config>, ... ]
webhook_configs:
[ - <webhook_config>, ... ]
```
### <email_config>
```
# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = false ]
# The email address to send notifications to.
to: <tmpl_string>
# The sender address.
[ from: <tmpl_string> | default = global.smtp_from ]
# The SMTP host through which emails are sent.
[ smarthost: <string> | default = global.smtp_smarthost ]
# SMTP authentication information.
[ auth_username: <string> ]
[ auth_password: <string> ]
[ auth_secret: <string> ]
[ auth_identity: <string> ]
[ require_tls: <bool> | default = global.smtp_require_tls ]
# The HTML body of the email notification.
[ html: <tmpl_string> | default = '{{ template "email.default.html" . }}' ]
# Further headers email header key/value pairs. Overrides any headers
# previously set by the notification implementation.
[ headers: { <string>: <tmpl_string>, ... } ]
```
### <hipchat_config>
```
# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = false ]
# The HipChat Room ID.
room_id: <tmpl_string>
# The auth token.
[ auth_token: <string> | default = global.hipchat_auth_token ]
# The URL to send API requests to.
[ url: <string> | default = global.hipchat_url ]
# See https://www.hipchat.com/docs/apiv2/method/send_room_notification
# A label to be shown in addition to the sender's name.
[ from: <tmpl_string> | default = '{{ template "hipchat.default.from" . }}' ]
# The message body.
[ message: <tmpl_string> | default = '{{ template "hipchat.default.message" . }}' ]
# Whether this message should trigger a user notification.
[ notify: <boolean> | default = false ]
# Determines how the message is treated by the alertmanager and rendered inside HipChat. Valid values are 'text' and 'html'.
[ message_format: <string> | default = 'text' ]
# Background color for message.
[ color: <tmpl_string> | default = '{{ if eq .Status "firing" }}red{{ else }}green{{ end }}' ]
```
### <pagerduty_config>
通过PagerDuty ApI发送通知:
```
# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = true ]
# The PagerDuty service key.
service_key: <tmpl_string>
# The URL to send API requests to
[ url: <string> | default = global.pagerduty_url ]
# The client identification of the Alertmanager.
[ client: <tmpl_string> | default = '{{ template "pagerduty.default.client" . }}' ]
# A backlink to the sender of the notification.
[ client_url: <tmpl_string> | default = '{{ template "pagerduty.default.clientURL" . }}' ]
# A description of the incident.
[ description: <tmpl_string> | default = '{{ template "pagerduty.default.description" .}}' ]
# A set of arbitrary key/value pairs that provide further detail
# about the incident.
[ details: { <string>: <tmpl_string>, ... } | default = {
firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
num_firing: '{{ .Alerts.Firing | len }}'
num_resolved: '{{ .Alerts.Resolved | len }}'
} ]
```
### <pushover_config>
通过PUSHover API发送通知:
```
# The recipient user’s user key.
user_key: <string>
# Your registered application’s API token, see https://pushover.net/apps
token: <string>
# Notification title.
[ title: <tmpl_string> | default = '{{ template "pushover.default.title" . }}' ]
# Notification message.
[ message: <tmpl_string> | default = '{{ template "pushover.default.message" . }}' ]
# A supplementary URL shown alongside the message.
[ url: <tmpl_string> | default = '{{ template "pushover.default.url" . }}' ]
# Priority, see https://pushover.net/api#priority
[ priority: <tmpl_string> | default = '{{ if eq .Status "firing" }}2{{ else }}0{{ end }}' ]
# How often the Pushover servers will send the same notification to the user.
# Must be at least 30 seconds.
[ retry: <duration> | default = 1m ]
# How long your notification will continue to be retried for, unless the user
# acknowledges the notification.
[ expire: <duration> | default = 1h ]
```
### <slack_config>
通过Slack webhooks发送通知:
```
# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = false ]
# The Slack webhook URL.
[ api_url: <string> | default = global.slack_api_url ]
# The channel or user to send notifications to.
channel: <tmpl_string>
# API request data as defined by the Slack webhook API.
[ color: <tmpl_string> | default = '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}' ]
[ username: <tmpl_string> | default = '{{ template "slack.default.username" . }}'
[ title: <tmpl_string> | default = '{{ template "slack.default.title" . }}' ]
[ title_link: <tmpl_string> | default = '{{ template "slack.default.titlelink" . }}' ]
[ icon_emoji: <tmpl_string> ]
[ icon_url: <tmpl_string> ]
[ pretext: <tmpl_string> | default = '{{ template "slack.default.pretext" . }}' ]
[ text: <tmpl_string> | default = '{{ template "slack.default.text" . }}' ]
[ fallback: <tmpl_string> | default = '{{ template "slack.default.fallback" . }}' ]
```
### <opsgenie_config>
通过OpsGenie API发送通知:
```
# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = true ]
# The API key to use when talking to the OpsGenie API.
api_key: <string>
# The host to send OpsGenie API requests to.
[ api_host: <string> | default = global.opsgenie_api_host ]
# A description of the incident.
[ description: <tmpl_string> | default = '{{ template "opsgenie.default.description" . }}' ]
# A backlink to the sender of the notification.
[ source: <tmpl_string> | default = '{{ template "opsgenie.default.source" . }}' ]
# A set of arbitrary key/value pairs that provide further detail
# about the incident.
[ details: { <string>: <tmpl_string>, ... } ]
# Comma separated list of team responsible for notifications.
[ teams: <tmpl_string> ]
# Comma separated list of tags attached to the notifications.
[ tags: <tmpl_string> ]
```
### <webhook_config>
webhook接收者允许配置一个通用的接收者
```
# Whether or not to notify about resolved alerts.
[ send_resolved: <boolean> | default = true ]
# The endpoint to send HTTP POST requests to.
url: <string>
```
Alertmanager通过HTTP POST请求发送json格式的数据到配置端点:
```
{
"version": "3",
"groupKey": <number> // key identifying the group of alerts (e.g. to deduplicate)
"status": "<resolved|firing>",
"receiver": <string>,
"groupLabels": <object>,
"commonLabels": <object>,
"commonAnnotations": <object>,
"externalURL": <string>, // backling to the Alertmanager.
"alerts": [
{
"labels": <object>,
"annotations": <object>,
"startsAt": "<rfc3339>",
"endsAt": "<rfc3339>"
},
...
]
}
```