Monitoring & Alerting · sg-exam

[TOC]

# 1. How It Works

The overall flow is as follows:

![](https://img.kancloud.cn/ef/fb/effb0be55c27832756378cb13d115c06_1168x854.png)

Main dependencies:

* `implementation 'org.springframework.boot:spring-boot-starter-actuator'`
* `implementation("io.micrometer:micrometer-registry-prometheus")`

Each service exposes its metrics at `/actuator/prometheus`. For the specific configuration, see `kubernetes/services/base/deployments`.

![](https://img.kancloud.cn/55/1f/551f5ee1946237612e1b13c49af449c5_2456x1008.png)

`application.yml` configuration:

![](https://img.kancloud.cn/33/33/3333b329bc2c76e763e112975389e4b8_1354x300.png)

To confirm that the Prometheus metrics are being exposed correctly, run curl inside the container:

```
$ curl 'http://localhost:4004/actuator/prometheus'
```

![](https://img.kancloud.cn/2e/d7/2ed797f7b1c4dbf8b6865c066d316671_2056x1122.png)

# 2. Configure Grafana Dashboards

Open the Grafana dashboards:

```
$ istioctl dashboard grafana
```

Visit [http://localhost:3000/dashboard/new?layout=list&search=open&orgId=1](http://localhost:3000/dashboard/new?layout=list&search=open&orgId=1) to see some of the service mesh metrics.

![](https://img.kancloud.cn/3a/38/3a3893dad2ee35e5037b1121b06feae8_2876x1482.png)

# 3. Import the JVM Dashboard

This dashboard is mainly for JVM monitoring.

1. Click the "+" icon and select Import.

   ![](https://img.kancloud.cn/98/e0/98e0bd6b20734282db0b57a84c605769_658x634.png)

2. Enter the dashboard ID 4701 and click Load.

   ![](https://img.kancloud.cn/c9/e0/c9e0d9cb619d706890b1d015ce81f733_1676x596.png)

3. The dashboard now shows metrics such as network I/O, JVM memory, and threads.

   ![](https://img.kancloud.cn/73/ce/73ce8adc8346cf02575f923d15695baa_2840x1478.png)

# 4. Send Alert Emails

Run the script to deploy the mail server:

```
$ ./kubernetes/scripts/deploy-mail-server.bash
```

![](https://img.kancloud.cn/b7/29/b729cf0951f2708df29f6538899e71ef_2024x324.png)

Visit `http://localhost:8080/#/`

![](https://img.kancloud.cn/e7/6e/e76ec7c5a968107980ea2a75b467c307_2878x998.png)

Configure Grafana to send email through the mail server by running the following commands:

```
$ kubectl -n istio-system set env deployment/grafana \
    GF_SMTP_ENABLED=true \
    GF_SMTP_SKIP_VERIFY=true \
    GF_SMTP_HOST=mail-server.sg-exam.svc.cluster.local:25 \
    GF_SMTP_FROM_ADDRESS=grafana@minikube.me
$ kubectl -n istio-system wait --timeout=60s --for=condition=ready pod -l app=grafana
```

![](https://img.kancloud.cn/75/76/757695c0e5695dd7791ab038a4f2ab37_2612x418.png)

Then configure Grafana to send alert emails:

![](https://img.kancloud.cn/d5/7a/d57a9f52054493a3afd52df931b9b020_2374x1026.png)

# 5. Elasticsearch Monitoring

This section covers monitoring the ES cluster used for log collection.

1. Download and run the latest version of cerebro: [https://github.com/lmenezes/cerebro/releases](https://github.com/lmenezes/cerebro/releases), then visit [http://localhost:9000/](http://localhost:9000/).

2. Enter the address of the ES cluster and click connect.

   ![](https://img.kancloud.cn/92/11/9211992becf439c1379a5c1d7c8903af_2380x1264.png)

3. Cerebro now shows metrics for the ES cluster, its indices, and more. Cerebro is quite capable; for its other features, refer to the official wiki.

   ![](https://img.kancloud.cn/16/57/16576e7f165a3088a6c1c0897beac4da_2846x998.png)

The cluster status is yellow because the deployed ES cluster has only a single node, so replica shards cannot be allocated. This does not affect usage, and it can be resolved by changing the replica count: click the index -> index settings and set `index.number_of_replicas` to 0.

![](https://img.kancloud.cn/4e/98/4e98971eb9b32555713abf722d1bc4a5_1212x474.png)
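
The replica change described above can also be made from the command line through the Elasticsearch index settings API, as an alternative to the cerebro UI. The following is only a minimal sketch: the address `http://localhost:9200` and the helper names `replica_settings_body` and `set_replicas` are assumptions for illustration and are not part of this project, so adjust them to match how your ES cluster is actually exposed.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch is reachable at this address (for example via a
# port-forward). Override ES_URL to match your environment.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: build the JSON settings body for a given replica count.
replica_settings_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Hypothetical helper: apply the replica count to an index name or pattern
# using the index settings endpoint (PUT /<index>/_settings).
set_replicas() {
  local index="$1" replicas="$2"
  curl -sS -X PUT "${ES_URL}/${index}/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(replica_settings_body "${replicas}")"
}
```

After sourcing the script, a call such as `set_replicas 'my-log-index' 0` (the index name here is a placeholder) sends the update, and `curl -sS "${ES_URL}/_cluster/health?pretty"` can be used to check whether the cluster status has turned green.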