[TOC]
> Note:
> - Prometheus runs as a non-root user, so pay attention to which user creates its files
> - Keep the target configuration identical across the Prometheus config files on all nodes
## Static configuration
```yaml
- job_name: "Prometheus"
static_configs:
- "localhost:9090"
```
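These job snippets all live under the top-level `scrape_configs` key of `prometheus.yml`. For context, a minimal sketch of a complete main config wrapping the static job above; the `global` values are assumptions, and the `replica` external label is the one rewritten per node by the sync step further down:
```yaml
global:
  scrape_interval: 15s      # assumed value; tune for your environment
  external_labels:
    replica: A              # per-node label, rewritten on the other nodes below
scrape_configs:
  - job_name: "Prometheus"
    static_configs:
      - targets:
          - "localhost:9090"
```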
## File-based service discovery
1. Create the target job
```yaml
- job_name: "node-exporter"
file_sd_configs:
- files:
- "targets/node-exporter.yml"
# 刷新间隔以重新读取文件
refresh_interval: 1m
```
2. Create the target file
```shell
mkdir -p /data/prometheus/targets
cat <<-EOF | sudo tee /data/prometheus/targets/node-exporter.yml > /dev/null
- targets:
  - 192.168.31.103:9100
  - 192.168.31.79:9100
  - 192.168.31.95:9100
  - 192.168.31.78:9100
  - 192.168.31.253:9100
EOF
sudo chown -R ops. /data/prometheus
```
3. Hot-reload the main configuration (the target files themselves are re-read automatically, so the reload is only needed because prometheus.yml changed; a quick verification follows below)
```shell
sudo systemctl reload prometheus
```
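To confirm that the targets were discovered, query the Prometheus HTTP API; a quick check, assuming Prometheus listens on localhost:9090 and `jq` is available:
```shell
# Print each discovered node-exporter target and its health
curl -s http://localhost:9090/api/v1/targets | \
  jq -r '.data.activeTargets[] | select(.labels.job=="node-exporter") | "\(.labels.instance) \(.health)"'
```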
4. Sync the files to the other nodes
```shell
# Main config file and the file-discovery directory
cd /data/prometheus && scp -r prometheus.yml targets ops@k8s-master02:/data/prometheus
# Adjust the label that is unique to each node
ssh ops@k8s-master02 "sed -ri 's@(replica): .*@\1: B@g' /data/prometheus/prometheus.yml"
# Validate the config file
ssh ops@k8s-master02 "promtool check config /data/prometheus/prometheus.yml"
# Hot-reload the configuration
ssh ops@k8s-master02 "sudo systemctl reload prometheus"
```
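With more than two nodes the same steps can be scripted; a minimal sketch, assuming hypothetical hostnames and per-node `replica` values (adjust to your inventory):
```shell
# Hypothetical node -> replica mapping; adjust to your environment
declare -A replicas=([k8s-master02]=B [k8s-master03]=C)
cd /data/prometheus
for node in "${!replicas[@]}"; do
  scp -r prometheus.yml targets "ops@${node}:/data/prometheus"
  ssh "ops@${node}" "sed -ri 's@(replica): .*@\1: ${replicas[$node]}@g' /data/prometheus/prometheus.yml \
    && promtool check config /data/prometheus/prometheus.yml \
    && sudo systemctl reload prometheus"
done
```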
## Kubernetes-based service discovery
> Because thanos is deployed from binaries, the ServiceAccount and its monitoring permissions must be created on the kubernetes cluster first
1. Create the permissions Prometheus needs to monitor the kubernetes cluster (run on a k8s master node)
```shell
cat <<-EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: kube-system
EOF
```
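The binding can be verified from the master node with `kubectl auth can-i`, impersonating the new ServiceAccount; both commands should print `yes`:
```shell
kubectl auth can-i list nodes --as=system:serviceaccount:kube-system:prometheus
kubectl auth can-i get nodes/metrics --as=system:serviceaccount:kube-system:prometheus
```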
2. Retrieve the token used to monitor kubernetes (run on a k8s master node)
```shell
kubectl -n kube-system get secret $(kubectl -n kube-system get sa prometheus -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 --decode > /data/prometheus/token
```
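Note that on Kubernetes 1.24 and later, a ServiceAccount no longer gets a long-lived token Secret automatically, so `.secrets[0].name` comes back empty. On such clusters, request a token explicitly (it is time-limited, and the API server may cap the requested duration):
```shell
# k8s >= 1.24: issue a bound token for the prometheus ServiceAccount
kubectl -n kube-system create token prometheus --duration=8760h > /data/prometheus/token
```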
3. Example (thanos node)
```yaml
- job_name: "Service/kube-apiserver"
scheme: https
tls_config:
insecure_skip_verify: true
# 上面获取的token
bearer_token_file: /data/prometheus/token
kubernetes_sd_configs:
- role: endpoints
# 访问集群的入口
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
```
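The keep rule retains only the endpoints object named `kubernetes` in the `default` namespace on its `https` port, which is exactly the apiserver. Before reloading, the token and RBAC can be sanity-checked directly:
```shell
# Should return apiserver metrics if the token and RBAC are correct
curl -sk -H "Authorization: Bearer $(cat /data/prometheus/token)" \
  https://192.168.31.100:6443/metrics | head
```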
4. Hot-reload the configuration
```shell
sudo systemctl reload prometheus
```
5. Sync the files to the other nodes
```shell
# Main config file and the file-discovery directory
cd /data/prometheus && scp -r prometheus.yml targets ops@k8s-master02:/data/prometheus
# Adjust the label that is unique to each node
ssh ops@k8s-master02 "sed -ri 's@(replica): .*@\1: B@g' /data/prometheus/prometheus.yml"
# Validate the config file
ssh ops@k8s-master02 "promtool check config /data/prometheus/prometheus.yml"
# Hot-reload the configuration
ssh ops@k8s-master02 "sudo systemctl reload prometheus"
ssh ops@k8s-master02 "sudo systemctl reload prometheus"
```
## Monitoring kubernetes (full version)
> The certificates, token, file-discovery targets, and so on referenced below must be created or copied by hand; this is only an example of the main configuration file
```yaml
scrape_configs:
  # File-based service discovery
  - job_name: "node-exporter"
    file_sd_configs:
      - files:
          - "targets/node-exporter.yml"
        # Interval at which the target files are re-read
        refresh_interval: 1m
    metric_relabel_configs:
      - source_labels: [__address__]
        action: replace
        regex: (.*):10250
        target_label: instance
        replacement: $1
  # Kubernetes-based service discovery
  - job_name: "Service/kube-apiserver"
    scheme: https
    tls_config:
      insecure_skip_verify: true
    # Create the token as shown above
    bearer_token_file: /data/prometheus/token
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
  - job_name: "Service/kube-controller-manager"
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /data/prometheus/token
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
        action: keep
        regex: true
      - source_labels: [__address__]
        action: replace
        regex: (.*):10250
        target_label: __address__
        replacement: $1:10257
  - job_name: "Service/kube-scheduler"
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /data/prometheus/token
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
        action: keep
        regex: true
      - source_labels: [__address__]
        action: replace
        regex: (.*):10250
        target_label: __address__
        replacement: $1:10259
  - job_name: "Service/kubelet"
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /data/prometheus/token
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
  - job_name: "Service/kube-proxy"
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__address__]
        action: replace
        regex: (.*):10250
        target_label: __address__
        replacement: $1:10249
  - job_name: "Service/etcd"
    scheme: https
    tls_config:
      ca_file: targets/certs/ca.pem
      cert_file: targets/certs/etcd.pem
      key_file: targets/certs/etcd-key.pem
      insecure_skip_verify: true
    file_sd_configs:
      - files:
          - targets/etcd.yml
  - job_name: "Service/calico"
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__address__]
        action: replace
        regex: (.*):10250
        target_label: __address__
        replacement: $1:9091
  - job_name: "Service/coredns"
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: kube-system;kube-dns;metrics
  - job_name: "Service/ingress-nginx"
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: ingress-nginx;ingress-nginx-metrics;metrics
  - job_name: "kube-state-metrics"
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: kube-system;kube-state-metrics;http-metrics
  - job_name: "service-http-probe"
    scrape_interval: 1m
    metrics_path: /probe
    # Use the http_2xx module from the blackbox exporter config
    params:
      module: [ http_2xx ]
    kubernetes_sd_configs:
      - role: service
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      # Keep only Services annotated with prometheus.io/scrape: "true" and prometheus.io/http-probe: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_service_annotation_prometheus_io_http_probe]
        action: keep
        regex: true;true
      # Rename __meta_kubernetes_service_name to service_name
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        regex: (.*)
        target_label: service_name
      # Rename __meta_kubernetes_namespace to namespace
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        regex: (.*)
        target_label: namespace
      # Set the probe target (and instance) to the `clusterIP:port` address
      - source_labels: [__meta_kubernetes_service_cluster_ip, __meta_kubernetes_service_annotation_prometheus_io_http_probe_port, __meta_kubernetes_service_annotation_prometheus_io_http_probe_path]
        action: replace
        regex: (.*);(.*);(.*)
        target_label: __param_target
        replacement: $1:$2$3
      - source_labels: [__param_target]
        target_label: instance
      # Point __address__ at `blackbox-exporter:9115`
      - target_label: __address__
        replacement: blackbox-exporter:9115
  - job_name: "service-tcp-probe"
    scrape_interval: 1m
    metrics_path: /probe
    # Use the tcp_connect module from the blackbox exporter config
    params:
      module: [ tcp_connect ]
    kubernetes_sd_configs:
      - role: service
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      # Keep only Services annotated with prometheus.io/scrape: "true" and prometheus.io/tcp-probe: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe]
        action: keep
        regex: true;true
      # Rename __meta_kubernetes_service_name to service_name
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        regex: (.*)
        target_label: service_name
      # Rename __meta_kubernetes_namespace to namespace
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        regex: (.*)
        target_label: namespace
      # Set the probe target (and instance) to the `clusterIP:port` address
      - source_labels: [__meta_kubernetes_service_cluster_ip, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe_port]
        action: replace
        regex: (.*);(.*)
        target_label: __param_target
        replacement: $1:$2
      - source_labels: [__param_target]
        target_label: instance
      # Point __address__ at `blackbox-exporter:9115`
      - target_label: __address__
        replacement: blackbox-exporter:9115
```
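The two probe jobs only scrape Services that opt in through annotations; the annotation names follow from the relabel rules above (`prometheus.io/scrape`, `prometheus.io/http-probe`, `prometheus.io/http-probe-port`, `prometheus.io/http-probe-path`, and the tcp equivalents). A hypothetical Service opting into the HTTP probe might look like this:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-web          # hypothetical Service, for illustration only
  namespace: default
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/http-probe: "true"
    prometheus.io/http-probe-port: "80"
    prometheus.io/http-probe-path: "/healthz"
spec:
  selector:
    app: demo-web
  ports:
    - name: http
      port: 80
```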