[TOC]

# Examples of monitoring Kubernetes core components

## Master nodes

The metrics endpoints of kube-controller-manager, kube-scheduler, and etcd all listen on 127.0.0.1. For convenience, this example runs haproxy on every master node to proxy those endpoints so that Prometheus can reach them over the network.

**Exposing the metrics endpoints with haproxy**

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-proxy
  namespace: kube-system
data:
  # haproxy configuration file
  haproxy.cfg: |
    global
      log stdout local2 info

    defaults
      mode tcp
      log global
      option tcplog
      maxconn 100
      timeout connect 5s
      timeout client 30s
      timeout server 30s

    # Expose haproxy's own metrics
    frontend metrics
      bind *:8405
      mode http
      http-request use-service prometheus-exporter if { path /metrics }
      no log

    listen etcd
      bind *:12381
      tcp-request connection reject if !{ src -f /usr/local/etc/haproxy/whitelist.lst }
      server server1 127.0.0.1:2381 check

    listen kube-controller-manager
      bind *:20257
      tcp-request connection reject if !{ src -f /usr/local/etc/haproxy/whitelist.lst }
      server server1 127.0.0.1:10257 check

    listen kube-scheduler
      bind *:20259
      tcp-request connection reject if !{ src -f /usr/local/etc/haproxy/whitelist.lst }
      server server1 127.0.0.1:10259 check

  # Allowed source addresses: the nodes Prometheus runs on and the pod IPs
  # (for convenience, all cluster nodes and the whole pod CIDR range are allowed)
  whitelist.lst: |
    192.168.32.127
    192.168.32.128
    192.168.32.129
    # pod CIDR range
    10.0.0.0/8
EOF

cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metrics-proxy
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: metrics-proxy
  template:
    metadata:
      labels:
        app: metrics-proxy
    spec:
      containers:
      - name: metrics-proxy
        image: haproxy:2.8-alpine
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: conf
          mountPath: /usr/local/etc/haproxy
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/master: ""
      volumes:
      - name: conf
        configMap:
          name: metrics-proxy
EOF
```

**kube-apiserver**

```yaml
- job_name: "k8s/kube-apiserver"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
```

**kube-controller-manager**

```yaml
- job_name: "k8s/kube-controller-manager"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Keep only master nodes. The regex must be explicit: the default (.*)
  # also matches an empty value and would keep every node.
  - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
    action: keep
    regex: "true"
  # Swap the kubelet port for the haproxy listener
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:20257
```

> Note: master nodes must carry the node-role.kubernetes.io/master label.

**kube-scheduler**

```yaml
- job_name: "k8s/kube-scheduler"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Keep only master nodes (explicit regex, see kube-controller-manager)
  - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
    action: keep
    regex: "true"
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:20259
```

> Note: master nodes must carry the node-role.kubernetes.io/master label.

**etcd**

```yaml
- job_name: "k8s/etcd"
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Keep only master nodes (explicit regex, see kube-controller-manager)
  - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
    action: keep
    regex: "true"
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:12381
```

> Note: master nodes must carry the node-role.kubernetes.io/master label.

**metrics-proxy**

```yaml
- job_name: "k8s/metrics-proxy"
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Keep only master nodes (explicit regex, see kube-controller-manager)
  - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
    action: keep
    regex: "true"
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:8405
```

## Worker nodes

**kubelet**

>[info] There are two ways to collect the kubelet metrics. Pick either one.

```yaml
# Option 1: scrape the kubelet address directly
- job_name: "k8s/kubelet"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node

# Option 2: scrape the kubelet metrics through the apiserver proxy
- job_name: "k8s/kubelet"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    action: replace
    regex: (.+)
    replacement: /api/v1/nodes/$1/proxy/metrics
    target_label: __metrics_path__
```

**containers**

```yaml
- job_name: "k8s/cadvisor"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  # Monitor the containers on every node of the cluster
  kubernetes_sd_configs:
  - role: node
  metrics_path: /metrics/cadvisor
  relabel_configs:
  - regex: __meta_kubernetes_node_label_(.+)
    action: labelmap
```

**kube-proxy**

>[info] By default the kube-proxy metrics endpoint also listens on 127.0.0.1. Here it is changed to listen on 0.0.0.0:10249.

```shell
# Edit the kube-proxy configuration (the ConfigMap is named kube-proxy on kubeadm clusters)
$ kubectl -n kube-system edit cm kube-proxy
# Change the metricsBindAddress parameter to the following
metricsBindAddress: "0.0.0.0:10249"

# Restart kube-proxy
$ kubectl -n kube-system rollout restart ds/kube-proxy
daemonset.apps/kube-proxy restarted
```

```yaml
# The kube-proxy metrics endpoint uses the http scheme
- job_name: "k8s/kube-proxy"
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:10249
```

## Monitoring Kubernetes ecosystem add-ons

**calico**

calico does not expose a metrics port by default, so the metrics endpoint has to be enabled explicitly.

```shell
$ kubectl -n kube-system edit ds calico-node
# 1. Enable the metrics endpoint: add the following under
#    spec.template.spec.containers.env of calico-node
        - name: FELIX_PROMETHEUSMETRICSENABLED
          value: "True"
        - name: FELIX_PROMETHEUSMETRICSPORT
          value: "9091"

# 2. Add the following under spec.template.spec.containers of calico-node
        ports:
        - containerPort: 9091
          name: http-metrics
          protocol: TCP
```

```yaml
- job_name: "k8s/calico"
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:9091
```

**cilium**

If cilium was installed with helm, monitoring can be enabled by adding the following values.

```shell
# Back up the cilium install values (the first line of the output is a header, so drop it)
$ helm -n kube-system get values cilium > /tmp/cilium-values.yml
$ sed -ri '1d' /tmp/cilium-values.yml

# Add the following values to enable monitoring
$ vim /tmp/cilium-values.yml
operator:
  prometheus:
    enabled: true
prometheus:
  enabled: true

# Apply the values. helm upgrade also needs the chart reference, which the
# original command omits: replace <CHART> with the chart used at install time.
$ helm -n kube-system upgrade cilium <CHART> -f /tmp/cilium-values.yml
Release "cilium" has been upgraded. Happy Helming!
```

```yaml
- job_name: "k8s/cilium"
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape, __meta_kubernetes_pod_container_name, __meta_kubernetes_pod_container_port_name]
    action: keep
    regex: true;cilium-.+;prometheus
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+):(\d+);(\d+)
    replacement: ${1}:${3}
    target_label: __address__
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    regex: (.+)
    replacement: $1
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_container_name]
    target_label: pod_container_name
  - source_labels: [__meta_kubernetes_pod_controller_kind]
    target_label: pod_controller_kind
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod_name
  - source_labels: [__meta_kubernetes_pod_node_name]
    target_label: pod_node_name
```

**coredns**

```yaml
- job_name: "k8s/coredns"
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: kube-system;kube-dns;metrics
```

**ingress-nginx**

If ingress-nginx was installed with helm, monitoring can be enabled by adding the following values.

```shell
# Back up the ingress-nginx install values (the first line of the output is a header, so drop it)
$ helm -n kube-system get values ingress-nginx > /tmp/ingress-nginx-values.yml
$ sed -ri '1d' /tmp/ingress-nginx-values.yml

# Add the following values to enable monitoring
$ vim /tmp/ingress-nginx-values.yml
controller:
  metrics:
    enabled: true
    port: 10254

# Apply the values. helm upgrade also needs the chart reference, which the
# original command omits: replace <CHART> with the chart used at install time.
$ helm -n kube-system upgrade ingress-nginx <CHART> -f /tmp/ingress-nginx-values.yml
Release "ingress-nginx" has been upgraded. Happy Helming!

# Check the endpoints
$ kubectl -n kube-system get endpoints ingress-nginx-controller-metrics
NAME                               ENDPOINTS                                                        AGE
ingress-nginx-controller-metrics   192.168.32.127:10254,192.168.32.128:10254,192.168.32.129:10254   24d
```

```yaml
- job_name: "k8s/ingress-nginx"
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: kube-system;ingress-nginx-controller-metrics;metrics
```
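
Most of the `role: node` jobs in this document use the same relabeling pattern: node discovery yields the kubelet address (`<ip>:10250`), and a `replace` rule with `regex: (.*):10250` swaps in the port that should be scraped instead. As a rough sanity check before reloading Prometheus, that rewrite can be emulated locally with sed. This is only a sketch: `rewrite_address` is a hypothetical helper name, the sample addresses are taken from the whitelist shown earlier, and sed merely approximates Prometheus's fully anchored RE2 matching.

```shell
#!/bin/sh
# Emulates the relabel rule used by the node-based jobs:
#   regex: (.*):10250    replacement: $1:<port>
# Prometheus anchors the regex at both ends, hence the explicit ^ and $.
# rewrite_address is a hypothetical helper, not part of Prometheus.
rewrite_address() {
  # $1: discovered __address__   $2: port to scrape instead of 10250
  printf '%s\n' "$1" | sed -E 's/^(.*):10250$/\1:'"$2"'/'
}

rewrite_address 192.168.32.127:10250 20257   # kube-controller-manager via haproxy
rewrite_address 192.168.32.127:10250 12381   # etcd via haproxy
rewrite_address 192.168.32.127:9100 20257    # no match: the address stays unchanged
```

The last call mirrors how a `replace` rule behaves when the regex does not match: the target label is left as it was, so a node whose discovered address does not end in `:10250` would still be scraped on its original port.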