Prometheus Configuration File · Kubernetes

[TOC]

>[info] Note: this chapter covers only the commonly used parameters.

# global: global configuration

Commonly used parameters:

| Parameter | Description |
| :-: | :-: |
| **scrape_interval** | How often targets are scraped by default; defaults to 1m |
| **scrape_timeout** | How long before a scrape request times out; defaults to 10s |
| **evaluation_interval** | How often rules are evaluated; defaults to 1m |

# rule_files: rule files

This section takes no parameters, only a list of files to load, as in the example below. Glob patterns are supported for matching files.

```yaml
rule_files:
- /etc/prometheus/*.rules
```

# scrape_config_files: scrape configuration files

This section takes no parameters, only a list of files to load, as in the example below. Glob patterns are supported for matching files.

```yaml
scrape_config_files:
- /etc/prometheus/*.target
```

>[info] Each file has the same content as the scrape_configs section, so the scrape_configs block can simply be copied into it. See the scrape_configs section for configuration details.

# scrape_configs: scrape rules

This is the core of monitoring: Prometheus scrapes targets according to the configuration below. Only the commonly used options are listed here; for the full list, see the [official Prometheus documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).

| Parameter | Description |
| :-: | :- |
| **job_name** | Name of the job |
| **scrape_interval** | How often targets are scraped; falls back to the global setting if unset |
| **scrape_timeout** | How long before a scrape request times out; falls back to the global setting if unset |
| **metrics_path** | HTTP resource path from which metrics are fetched; defaults to `/metrics` |
| **scheme** | Protocol scheme used for requests; defaults to http |
| **params** | Optional HTTP URL parameters |
| **relabel_configs** | Dynamically rewrites a target's labels before it is scraped [important] |
| **basic_auth** | Sets the `Authorization` header on every scrape request with the configured username and password; password and password_file are mutually exclusive |
| **authorization** | Sets the `Authorization` header on every scrape request with the configured credentials |
| **tls_config** | TLS settings for scrape requests |
| **static_configs** | List of labeled, statically configured targets |
| **file_sd_configs** | List of file service discovery configurations |
| **consul_sd_configs** | List of Consul service discovery configurations |
| **docker_sd_configs** | List of Docker service discovery configurations |
| **kubernetes_sd_configs** | Kubernetes SD configurations retrieve scrape targets from the Kubernetes REST API and always stay synchronized with the cluster state |

There are two kinds of scrape_configs rules:

1. Static configuration (static_configs, the fifth entry from the end of the table above). Every change requires reloading or restarting the Prometheus service.
2. Service discovery (the last four entries in the table above; see the official documentation for the other mechanisms). The Prometheus server discovers targets automatically, with no restart required [recommended].

How does Prometheus determine the scrape address, path, and HTTP scheme?

1. Scrape address: there are two cases
   - Static configuration and file-based service discovery take the IP address and port from `targets`.
   - Other service discovery mechanisms set the `__address__` label, which is used as the address for scraping metrics. If the `instance` label is not set after relabeling, it defaults to the value of `__address__`.
2. Scrape scheme: set by the scheme parameter, which defaults to http. It can also be set per target through the `__scheme__` label.
3. Scrape path: set by the metrics_path parameter, which defaults to /metrics. It can also be set per target through the `__metrics_path__` label.

## relabel_configs configuration

Relabeling is a powerful tool for dynamically rewriting the label set of a target before it is scraped. Each scrape configuration can define multiple relabeling steps. They are applied to each target's label set in the order in which they appear in the configuration file. For the full parameter list, [see the official documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config).

| Parameter | Description |
| :-: | :- |
| source_labels | Source labels whose values are selected from the existing labels |
| separator | Separator placed between the concatenated source label values; defaults to `;` |
| target_label | Label to which the resulting value is written in a replace action |
| regex | Regular expression matched against the value extracted from the source labels; defaults to `(.*)` |
| replacement | Replacement value used for the regex replace if the regular expression matches; defaults to `$1` |
| action | Action to perform based on the regex match; defaults to `replace` |

> Common action values
> - replace: matches the regular expression against the concatenated source_labels, then sets target_label to replacement, with match group references (${1}, ${2}, ...) in replacement substituted by their values. If the regular expression does not match, no replacement takes place.
> - keep: drops targets for which the regular expression does not match the concatenated source_labels.
> - drop: drops targets for which the regular expression matches the concatenated source_labels.

## Workflow for configuring a scrape rule

1. Use curl from a shell to confirm that the metrics can be fetched
   a. Determine the parameters needed to fetch the metrics
   b. Determine whether the address is static or dynamically discovered
2. Add the Prometheus configuration that scrapes the metrics

## Static configuration

**Fetch the Prometheus metrics from a shell**

```shell
$ curl -s localhost:9090/metrics | head
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.7762e-05
go_gc_duration_seconds{quantile="0.25"} 0.000101175
go_gc_duration_seconds{quantile="0.5"} 0.00016822
go_gc_duration_seconds{quantile="0.75"} 0.000428428
go_gc_duration_seconds{quantile="1"} 0.00079745
go_gc_duration_seconds_sum 0.002778413
go_gc_duration_seconds_count 11
# HELP go_goroutines Number of goroutines that currently exist.
```

**Add a configuration file to the prometheus-target ConfigMap**

```yaml
prometheus.targets: |
  scrape_configs:
  - job_name: "prometheus"
    static_configs:
    - targets:
      - "localhost:9090"
```

**Verify on the Prometheus targets page**

![targets01](https://img.kancloud.cn/c2/80/c280c59735f85956d9b68c44b57752d3_1689x209.png)

## File-based service discovery

File-based service discovery provides a more generic way to configure static targets and serves as an interface for plugging in custom service discovery mechanisms.

**Fetch the node-exporter metrics from a shell**

```shell
$ curl -s 192.168.31.103:9100/metrics | head
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.6659e-05
go_gc_duration_seconds{quantile="0.25"} 8.684e-05
go_gc_duration_seconds{quantile="0.5"} 0.00018778
go_gc_duration_seconds{quantile="0.75"} 0.000327928
go_gc_duration_seconds{quantile="1"} 0.092123081
go_gc_duration_seconds_sum 0.200803256
go_gc_duration_seconds_count 50
# HELP go_goroutines Number of goroutines that currently exist.
```

**Add two configuration files to the prometheus-target ConfigMap**

```yaml
# File-based discovery
# Any later file-based discovery job is added as another job_name in this file
file_discovery.targets: |
  scrape_configs:
  - job_name: "node-exporter"
    file_sd_configs:
    - files:
      - /etc/prometheus/target/node-exporter.yml
      # Refresh interval at which the files are re-read
      refresh_interval: 1m

# Adding or removing node-exporter nodes only requires editing this file
# Changes take effect without reloading or restarting the Prometheus service
node-exporter.yml: |
  - targets:
    - "192.168.31.103:9100"
    - "192.168.31.79:9100"
    - "192.168.31.95:9100"
    - "192.168.31.78:9100"
    - "192.168.31.253:9100"
```

**Verify on the Prometheus targets page**

![target02](https://img.kancloud.cn/4d/62/4d62635bcc01b4ac9024d4abfb6d9954_1727x343.png)

## Kubernetes service discovery

### kubernetes node

The node role discovers one target per cluster node, with the address defaulting to the HTTP port of the **`Kubelet`**. The target address defaults to the first existing address of the Kubernetes node object, checked in the address type order NodeInternalIP, NodeExternalIP, NodeLegacyHostIP, NodeHostName. The first address type that matches becomes the value of `__address__`. For details, [see the official documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#node).

![](https://img.kancloud.cn/bf/7e/bf7e11f5e20615753250846a042e8577_1155x166.png)

**Fetch the kubelet metrics from a shell**

```shell
# Get the ca.crt certificate of the ServiceAccount
kubectl -n kube-system get secret `kubectl -n kube-system get sa prometheus -ojsonpath="{.secrets[0].name}"` -ojsonpath='{.data.ca\.crt}' | base64 -d > /tmp/ca.crt

# Get the token
TOKEN=$(kubectl -n kube-system get secret `kubectl -n kube-system get sa prometheus -ojsonpath="{.secrets[0].name}"` -ojsonpath='{.data.token}' | base64 -d)

# Access the kubelet metrics
curl -k --cacert /tmp/ca.crt -H "Authorization: Bearer ${TOKEN}" https://192.168.32.127:10250/metrics | head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
```

> curl options used above
> 1. `-k`: disables curl's certificate verification
> 2. `--cacert`: supplies the CA certificate for accessing the kubelet
> 3. `-H "Authorization: Bearer ${TOKEN}"`: supplies the token for accessing the kubelet

**Add a configuration file to the prometheus-target ConfigMap**

```yaml
kubernetes.targets: |
  scrape_configs:
  - job_name: "kubelet"
    scheme: https
    tls_config:
      # Corresponds to the curl --cacert option
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      # Corresponds to the curl -k option
      insecure_skip_verify: true
    # Corresponds to the curl -H "Authorization: Bearer ${TOKEN}" option
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
    - role: node
```

### kubernetes endpoints

The endpoints role discovers targets from the listed endpoints of a service. For each endpoint address, one target is discovered per port. If the endpoint is backed by a pod, all other container ports of that pod that are not bound to an endpoint port are discovered as targets as well. For the full parameter list, [see the official documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpoints).

**Fetch the kube-apiserver metrics from a shell**

```shell
# Get the ca.crt certificate of the ServiceAccount
kubectl -n kube-system get secret `kubectl -n kube-system get sa prometheus -ojsonpath="{.secrets[0].name}"` -ojsonpath='{.data.ca\.crt}' | base64 -d > /tmp/ca.crt

# Get the token
TOKEN=$(kubectl -n kube-system get secret `kubectl -n kube-system get sa prometheus -ojsonpath="{.secrets[0].name}"` -ojsonpath='{.data.token}' | base64 -d)

curl -sk --cacert /tmp/ca.crt -H "Authorization: Bearer ${TOKEN}" https://192.168.32.127:6443/metrics | head
# HELP aggregator_openapi_v2_regeneration_count [ALPHA] Counter of OpenAPI v2 spec regeneration count broken down by causing APIService name and reason.
# TYPE aggregator_openapi_v2_regeneration_count counter
aggregator_openapi_v2_regeneration_count{apiservice="*",reason="startup"} 0
aggregator_openapi_v2_regeneration_count{apiservice="k8s_internal_local_delegation_chain_0000000002",reason="update"} 0
# HELP aggregator_openapi_v2_regeneration_duration [ALPHA] Gauge of OpenAPI v2 spec regeneration duration in seconds.
# TYPE aggregator_openapi_v2_regeneration_duration gauge
aggregator_openapi_v2_regeneration_duration{reason="startup"} 0.016283936
aggregator_openapi_v2_regeneration_duration{reason="update"} 0.021537866
# HELP aggregator_unavailable_apiservice [ALPHA] Gauge of APIServices which are marked as unavailable broken down by APIService name.
# TYPE aggregator_unavailable_apiservice gauge
```

**Add to the scrape_configs section of the Prometheus configuration file, or to a scrape_config_files file**

```yaml
- job_name: "kube-apiserver"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
  # __meta_kubernetes_namespace=default
  # __meta_kubernetes_endpoints_name=kubernetes
  # __meta_kubernetes_endpoint_port_name=https
  # Only endpoints that satisfy all of the conditions above are kept
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
```

**Verify on the Prometheus targets page**

![](https://img.kancloud.cn/76/e1/76e1910ae914928dd35da110e2e075eb_1653x225.png)

### kubernetes pod

The pod role discovers all pods and exposes their containers as targets. One target is generated for each declared port of a container. If a container declares no ports, a single port-free target is created per container so that a port can be added manually through relabeling. For the full parameter list, [see the official documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#pod).

**Fetch the cilium metrics from a shell**

```shell
$ kubectl -n kube-system get pod -owide -l app.kubernetes.io/name=cilium-operator
NAME                              READY   STATUS    RESTARTS      AGE   IP               NODE             NOMINATED NODE   READINESS GATES
cilium-operator-584b8c6b7-znlwd   1/1     Running   3 (29h ago)   11d   192.168.32.128   192.168.32.128   <none>           <none>

$ curl -s 192.168.32.128:9963/metrics | head
# HELP cilium_operator_ces_queueing_delay_seconds CiliumEndpointSlice queueing delay in seconds
# TYPE cilium_operator_ces_queueing_delay_seconds histogram
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.005"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.01"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.025"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.05"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.1"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.25"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.5"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="1"} 0
```

**Add to the scrape_configs section of the Prometheus configuration file, or to a scrape_config_files file**

```yaml
- job_name: "Service/cilium"
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep pods whose annotation has prometheus_io_scrape=true, whose container name starts with cilium-, and whose container port is named prometheus
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape, __meta_kubernetes_pod_container_name, __meta_kubernetes_pod_container_port_name]
    action: keep
    regex: true;cilium-.+;prometheus
  # Assemble the address used to scrape metrics
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+):(\d+);(\d+)
    replacement: ${1}:${3}
    target_label: __address__
  # Keep the following labels. The first rule is written out in full; the rest use the abbreviated form
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    regex: (.+)
    replacement: $1
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_container_name]
    target_label: pod_container_name
  - source_labels: [__meta_kubernetes_pod_controller_kind]
    target_label: pod_controller_kind
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod_name
  - source_labels: [__meta_kubernetes_pod_node_name]
    target_label: pod_node_name
```
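To make the address-rewriting rule in the cilium job above more concrete, here is a minimal sketch that mimics it with `grep` and `sed`. It is only an approximation: Prometheus evaluates relabel rules internally with fully anchored RE2 expressions, whereas this sketch uses POSIX extended regular expressions (hence `[0-9]` in place of `\d`). The function name `rewrite_address` and the sample port 8080 are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Approximates the relabel rule
#   regex: (.+):(\d+);(\d+)    replacement: ${1}:${3}    target_label: __address__
# $1 = current __address__ value
# $2 = value of __meta_kubernetes_pod_annotation_prometheus_io_port
rewrite_address() {
  address=$1
  port=$2
  # source_labels are joined with the default separator ";"
  joined="${address};${port}"
  if printf '%s\n' "$joined" | grep -Eq '^(.+):([0-9]+);([0-9]+)$'; then
    # Match: keep the host (group 1) and swap in the annotated port (group 3)
    printf '%s\n' "$joined" | sed -E 's/^(.+):([0-9]+);([0-9]+)$/\1:\3/'
  else
    # No match: the replace action leaves the target label untouched
    printf '%s\n' "$address"
  fi
}

rewrite_address "192.168.32.128:8080" "9963"   # prints 192.168.32.128:9963
rewrite_address "192.168.32.128:8080" ""       # prints 192.168.32.128:8080
```

The second call shows why the rule is safe for pods without a port annotation: the joined value ends with `;`, the expression does not match, and the discovered address is kept as it is.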
**Verify on the Prometheus targets page**

![](https://img.kancloud.cn/6f/39/6f390608fc0174cd073427c80f580e04_1899x845.png)
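If a pod is missing from the targets page, it can help to check by hand whether it would pass the `keep` rule of the cilium job. The following is a rough sketch under the assumption that POSIX extended regular expressions behave like Prometheus's anchored RE2 for this simple pattern; the helper name `would_keep` and the sample label values are hypothetical.

```shell
#!/bin/sh
# Approximates: action: keep, regex: true;cilium-.+;prometheus
# $1 = value of the prometheus_io_scrape annotation label
# $2 = container name
# $3 = container port name
would_keep() {
  # Join the three source label values with ";" and require a whole-line match (-x),
  # which mirrors the full anchoring Prometheus applies to relabel regexes
  printf '%s;%s;%s\n' "$1" "$2" "$3" | grep -Eqx 'true;cilium-.+;prometheus'
}

if would_keep true cilium-operator prometheus; then echo "kept"; else echo "dropped"; fi
if would_keep true coredns metrics; then echo "kept"; else echo "dropped"; fi
```

The first call prints `kept` and the second prints `dropped`, because a target survives only when all three joined values match the expression.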