**Components of Prometheus:**
The Prometheus ecosystem consists of several components, most of which are optional:
* The **Prometheus server**, which scrapes and stores time series data;
* **Client libraries** for instrumenting application code;
* A **Push Gateway** for supporting short-lived jobs;
* Special-purpose **exporters**, which expose metrics of a monitored component over HTTP; ready-made exporters exist for services such as HAProxy, StatsD, MySQL, Nginx and Graphite;
* An **Alertmanager** to handle alerts;
* Various support tools.
**Overall architecture of Prometheus**
*(Figure: Prometheus architecture diagram)*
The overall Prometheus workflow:
1) The Prometheus server periodically scrapes metrics from the configured jobs or exporters, or receives metrics pushed in through the Push Gateway.
2) The Prometheus server stores the collected metrics locally and aggregates them.
3) It evaluates the configured alert.rules, records new time series, and pushes alerts to the Alertmanager.
4) The Alertmanager processes incoming alerts according to its configuration and sends notifications via email and other channels.
5) Graphing tools such as Grafana query the collected data and display it graphically.
**Data model**
Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labelled dimensions. Besides the stored time series, Prometheus may generate temporary derived time series as the result of queries.
**Metric names and labels**: every time series is uniquely identified by its metric name and a set of key-value pairs, also known as labels. The **metric name** specifies the general feature of the system being measured (for example, http_requests_total, the total number of HTTP requests received). It may contain ASCII letters and digits, as well as underscores and colons, and must match the regex [a-zA-Z_:][a-zA-Z0-9_:]*. **Labels** enable Prometheus's dimensional data model: for a given metric name, any combination of labels identifies a particular dimensional instantiation of that metric. The query language allows filtering and aggregation based on these dimensions. Changing any label value, including adding or removing a label, creates a new time series. Label names may contain ASCII letters, digits and underscores, and must match the regex [a-zA-Z_][a-zA-Z0-9_]*. Label names beginning with __ are reserved for internal use.
**Samples**: the actual time series data; each sample consists of a float64 value and a millisecond-precision timestamp.
**Notation:** given a metric name and a set of labels, a time series is frequently identified using this notation:
<metric name>{<label name>=<label value>, ...}
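For example, a counter of POST requests received by a /messages handler would be written like this (the metric and label names here are the standard illustrative example from the Prometheus documentation, not something specific to this setup):
api_http_requests_total{method="POST", handler="/messages"}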
**Metric types**
The Prometheus client libraries provide four main metric types: Counter, Gauge, **Histogram** and **Summary**:
**Counter**: a cumulative metric whose value can only increase, or be reset to zero on restart.
**Gauge**: a single numerical value that can arbitrarily go up and down.
**Histogram**: samples observations (for example request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
**Summary**: similar to a Histogram, a Summary samples observations (typically request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window (a sample of what a Histogram scrape looks like follows below).
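To make the Histogram description concrete, this is roughly what a scrape returns for a hypothetical histogram named http_request_duration_seconds (the metric name and the numbers are made up for illustration): a cumulative counter per bucket boundary, plus the sum and count of all observations.
http_request_duration_seconds_bucket{le="0.1"} 3
http_request_duration_seconds_bucket{le="0.5"} 7
http_request_duration_seconds_bucket{le="+Inf"} 9
http_request_duration_seconds_sum 4.2
http_request_duration_seconds_count 9
A Summary exposes _sum and _count in the same way, but publishes precomputed quantiles instead of buckets.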
There are two ways to collect data: pull and push.
pull: an exporter is installed on the monitored host; the exporter collects the data, Prometheus fetches it from the exporter with an HTTP GET, and the exporter returns the data.
push: a pushgateway is installed; your own scripts organise the data as key/value pairs and send it to the pushgateway, and Prometheus then scrapes the pushgateway (a minimal example follows below).
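As a minimal sketch of the push path, a script can push a single metric to a Pushgateway over its HTTP API with curl. The host name pushgateway.monitoring.svc:9091 and the metric/job names here are assumptions for illustration; substitute your own:
echo "some_batch_job_duration_seconds 42" | curl --data-binary @- http://pushgateway.monitoring.svc:9091/metrics/job/some_batch_job
Prometheus then picks the metric up the next time it scrapes the pushgateway target.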
**PromQL examples**
1) (sum(increase(node_cpu{mode="idle"}[1m])) by (instance) / sum(increase(node_cpu[1m])) by (instance)) * 100
The metric being queried is node_cpu.
increase() returns the increase over a time range; [1m] means the increase over the last minute.
{mode="idle"} restricts the query to the one-minute increase of idle CPU time.
sum() adds the values up.
by (instance) splits the summed value back out by the given label; instance identifies the machine.
/ is division; PromQL supports the arithmetic operators + - * / % ^.
The expression above yields the percentage of CPU time that was idle over the last minute; subtracting it from 100 gives the CPU usage.
rate() returns the per-second average rate of increase over a time range; it is designed to be used with Counters.
{exported_instance=~"XXX"} filters with a regular expression (fuzzy match).
topk() returns the top k series by value; generally used for ad-hoc queries in the console.
count() counts how many series match a condition, for example to get the total number of pods rather than listing them.
predict_linear() extrapolates the rate of change of a series to predict its future value.
Only the most commonly used functions are listed above; the official documentation describes many more. A couple of combined examples follow.
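As a rough illustration of how these functions combine (the metric names container_cpu_usage_seconds_total and node_filesystem_free_bytes and the pod_name label are assumptions; the exact names depend on your exporter and cAdvisor versions):
Top 3 pods by CPU usage over the last 5 minutes:
topk(3, sum(rate(container_cpu_usage_seconds_total[5m])) by (pod_name))
Filesystems predicted to run out of space within 4 hours, based on the last hour's trend:
predict_linear(node_filesystem_free_bytes[1h], 4 * 3600) < 0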
**Installing Prometheus**
Prometheus can be created as a container inside Kubernetes, or installed outside the cluster from a binary package. Here it is deployed in Kubernetes:
# vim prometheus.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - image: prom/prometheus:v2.0.0
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      serviceAccountName: prometheus
      imagePullSecrets:
      - name: regsecret
      volumes:
      - name: data
        emptyDir: {}
      - name: config-volume
        configMap:
          name: prometheus-config
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: monitoring
spec:
  type: ClusterIP
  ports:
  - port: 9090
    targetPort: 9090
  selector:
    app: prometheus
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus
  namespace: monitoring
spec:
  rules:
  - host: prometheus.pkbeta.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus
          servicePort: 9090
The YAML file above is the whole Prometheus deployment: it creates a Namespace, the RBAC objects, a Deployment, a Service and an Ingress.
**!!! The most important piece, the ConfigMap (that is, the Prometheus configuration file), is listed separately below.**
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s       # scrape targets every 15s
      evaluation_interval: 15s   # evaluate rules every 15s
    scrape_configs:
    - job_name: 'kubernetes-apiservers'   # name of this scrape job
      kubernetes_sd_configs:              # discover targets through Kubernetes service discovery roles
      - role: endpoints                   # the apiserver is scraped via its endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:                    # rewrite targets and their labels before scraping
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]   # which discovered labels to match on
        action: keep                      # keep only endpoints whose source labels match the regex
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module: [http_2xx]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-ingresses'
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
Each job above defines what data to scrape:
the kubernetes-apiservers job collects performance metrics about the apiserver,
the cadvisor job collects performance metrics about containers,
and so on (the annotation convention that the endpoints and pods jobs rely on is sketched below).
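The kubernetes-service-endpoints and kubernetes-pods jobs only keep targets that opt in through annotations, matching the __meta_kubernetes_*_annotation_prometheus_io_* relabel rules above. A Service (or pod template) you want scraped therefore needs annotations along these lines; the port and path shown are placeholders, so adjust them to wherever your application actually exposes metrics:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"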
So far only the Prometheus server has been installed; something still has to collect data on the monitored side:
cAdvisor collects container performance metrics and is already built into the kubelet;
prometheus-node-exporter collects host-level performance metrics; it has to run on every node, so it **must** be deployed as a DaemonSet;
kube-state-metrics collects metrics about the state of Kubernetes objects and the health of cluster components; it does not need to run on every node and is deployed as a regular Deployment (see the manifest below).
# vim node-exporter.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
spec:
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
      - image: prom/node-exporter:v0.16.0
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: http
        volumeMounts:
        - name: time
          mountPath: /etc/localtime
          readOnly: true
      volumes:
      - name: time
        hostPath:
          path: /etc/localtime
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/app-metrics: 'true'
    prometheus.io/app-metrics-path: '/metrics'
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 9100
    targetPort: 9100
    protocol: TCP
  selector:
    k8s-app: node-exporter
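Because the DaemonSet maps hostPort 9100, a quick sanity check is to curl any node directly and confirm metrics come back (replace <node-ip> with one of your node addresses):
curl -s http://<node-ip>:9100/metrics | head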
# vim kube-state-metrics.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: gcr.io/google_containers/kube-state-metrics:v0.5.0
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: monitoring
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-directory-size-metrics
  namespace: monitoring
  annotations:
    description: |
      This `DaemonSet` provides metrics in Prometheus format about disk usage on the nodes.
      The container `read-du` reads in sizes of all directories below /mnt and writes that to `/tmp/metrics`. It only reports directories larger than `100M` for now.
      The other container `caddy` just hands out the contents of that file on request via `http` on `/metrics` at port `9102` which are the defaults for Prometheus.
      These are scheduled on every node in the Kubernetes cluster.
      To choose directories from the node to check, just mount them on the `read-du` container below `/mnt`.
spec:
  template:
    metadata:
      labels:
        app: node-directory-size-metrics
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9102'
        description: |
          This `Pod` provides metrics in Prometheus format about disk usage on the node.
          The container `read-du` reads in sizes of all directories below /mnt and writes that to `/tmp/metrics`. It only reports directories larger than `100M` for now.
          The other container `caddy` just hands out the contents of that file on request on `/metrics` at port `9102` which are the defaults for Prometheus.
          This `Pod` is scheduled on every node in the Kubernetes cluster.
          To choose directories from the node to check just mount them on `read-du` below `/mnt`.
    spec:
      containers:
      - name: read-du
        image: giantswarm/tiny-tools
        imagePullPolicy: Always
        command:
        - fish
        - --command
        - |
          touch /tmp/metrics-temp
          while true
            for directory in (du --bytes --separate-dirs --threshold=100M /mnt)
              echo $directory | read size path
              echo "node_directory_size_bytes{path=\"$path\"} $size" \
                >> /tmp/metrics-temp
            end
            mv /tmp/metrics-temp /tmp/metrics
            sleep 300
          end
        volumeMounts:
        - name: host-fs-var
          mountPath: /mnt/var
          readOnly: true
        - name: metrics
          mountPath: /tmp
      - name: caddy
        image: dockermuenster/caddy:0.9.3
        command:
        - "caddy"
        - "-port=9102"
        - "-root=/var/www"
        ports:
        - containerPort: 9102
        volumeMounts:
        - name: metrics
          mountPath: /var/www
      volumes:
      - name: host-fs-var
        hostPath:
          path: /var
      - name: metrics
        emptyDir:
          medium: Memory
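Once kube-state-metrics is being scraped, object-state metrics become queryable in PromQL. For instance, the following counts pods per namespace that are not currently in the Running phase (the metric name kube_pod_status_phase may differ slightly between kube-state-metrics versions, so treat this as a sketch):
sum(kube_pod_status_phase{phase!="Running"}) by (namespace)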
Now install all of the above:
# kubectl create -f .
The Ingress created earlier exposes Prometheus at prometheus.pkbeta.com,
so you can now open that domain in a browser.
*(Screenshot: the Prometheus web UI, with the numbered areas described below)*
1. Enter the PromQL query for the data you need (a simple first query is suggested below)
2. The Execute button
3. A drop-down listing the metrics available to query
4. The query result shown as a table
5. The query result shown as a graph
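A simple first query to try in this screen is the built-in up metric, which is 1 for every target Prometheus can scrape and 0 for targets that are down; filtering it by one of the jobs defined above quickly shows broken targets:
up
up{job="kubernetes-nodes"} == 0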
**Installing Grafana**
**Basic concepts**
1. Data source (Grafana is only a presentation tool for time series; the data it displays is provided by a data source)
2. Organisation (Grafana supports multiple organisations, so a single instance can serve several organisations that do not trust each other)
3. User (a user can belong to one or more organisations, and the same user can have different permission levels in different organisations)
4. Row (a row is a divider within a dashboard, used to group panels)
5. Panel (the panel is the basic display unit, and every panel provides a query editor)
6. Query editor (the query editor exposes the capabilities of the data source; different data sources have different query editors)
7. Dashboard (the dashboard is where all these components are combined and finally displayed)
# vim grafana.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: grafana-core
  namespace: monitoring
  labels:
    app: grafana
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - image: grafana/grafana:4.2.0
        name: grafana
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GF_INSTALL_PLUGINS
          value: "alexanderzobnin-zabbix-app"
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "false"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
        volumeMounts:
        - name: grafana-persistent-storage
          mountPath: /var
      volumes:
      - name: grafana-persistent-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  type: ClusterIP
  ports:
  - port: 3000
  selector:
    app: grafana
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  rules:
  - host: grafana.pkbeta.com
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
# kubectl create -f grafana.yaml
The Ingress above gives Grafana the host name grafana.pkbeta.com; open it in a browser to reach the Grafana login page. The default username and password are admin/admin.
*(Screenshots: Grafana login page and home screen)*
There is no data source or dashboard yet, so first add a data source.
*(Screenshots: adding a data source in Grafana)*
Name: a name for the data source; choose whatever you like.
Type: the type of data source; choose prometheus.
Url: preferably the in-cluster Prometheus service address and port (see the example below).
Then save to add the data source.
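With the Service and Namespace created earlier (service prometheus in namespace monitoring, port 9090), the in-cluster URL would look like this; the cluster.local suffix assumes the default cluster DNS domain:
http://prometheus.monitoring.svc.cluster.local:9090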
*(Screenshot: the data source has been added)*
Now there is a data source, but still no dashboard.
You can import a dashboard someone else has published, or build your own.
*(Screenshots: creating a new dashboard and adding a graph panel)*
Click the panel title to bring up its options.
*(Screenshot: panel options menu)*
Change the data source from default to prometheus.
*(Screenshot: selecting the data source)*
Edit the Query field to the PromQL expression you want to chart.
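For example, a panel charting per-node memory usage in percent could use an expression like the following (the node_memory_* metric names depend on your node-exporter version, so treat this as a sketch):
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100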
*(Screenshot: the panel query editor)*
Alternatively, you can import a dashboard that someone else has already built.
*(Screenshots: the dashboard import dialog)*
The first option lets you upload a dashboard JSON file;
alternatively, enter the ID of a published dashboard template,
or paste the dashboard JSON directly.
*(Screenshot: import options)*
Name: the name of the imported dashboard.
Prometheus: choose the data source.
*(Screenshot: the imported dashboard displaying data)*