[TOC]
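The kubelet checks disk usage through two filesystem parameters, `imagefs` and `nodefs`:
- imagefs: monitors the partition that holds the directory set by the Docker daemon option `data-root` (or the older `graph`). Default: `/var/lib/docker`
- nodefs: monitors the partition that holds the directory set by the kubelet flag `--root-dir`. Default: `/var/lib/kubelet`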
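
Whether the two signals refer to the same partition depends on where those directories are mounted. The following is a minimal sketch, not part of the original setup, that compares the device IDs of two paths. The helper name `same_filesystem` is hypothetical, and the check is only a rough one: it says nothing about how the kubelet itself discovers filesystems.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True when both paths report the same device ID."""
    # st_dev identifies the device (filesystem) that contains the path.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Default locations named above; replace with your own data-root / --root-dir.
    docker_dir, kubelet_dir = "/var/lib/docker", "/var/lib/kubelet"
    if os.path.exists(docker_dir) and os.path.exists(kubelet_dir):
        print(same_filesystem(docker_dir, kubelet_dir))
```

If this prints `True`, imagefs and nodefs most likely share one partition on that host.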
## Environment
Kubernetes version
```shell
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready master 85d v1.18.18
k8s-master02 Ready master 85d v1.18.18
k8s-node01 Ready <none> 85d v1.18.18
k8s-node02 Ready <none> 85d v1.18.18
k8s-node03 Ready <none> 85d v1.18.18
```
Node status
```shell
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 01 Dec 2021 11:39:29 +0800 Wed, 01 Dec 2021 11:39:29 +0800 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 01 Dec 2021 13:59:51 +0800 Wed, 01 Dec 2021 11:39:25 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 01 Dec 2021 13:59:51 +0800 Wed, 01 Dec 2021 11:39:25 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 01 Dec 2021 13:59:51 +0800 Wed, 01 Dec 2021 11:39:25 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 01 Dec 2021 13:59:51 +0800 Wed, 01 Dec 2021 11:39:25 +0800 KubeletReady kubelet is posting ready status
```
Docker data directory
```shell
$ docker info | grep "Docker Root Dir"
Docker Root Dir: /data/docker/data
```
kubelet data directory
```shell
$ ps -ef | grep kubelet
/data/k8s/bin/kubelet --alsologtostderr=true --logtostderr=false --v=4 --log-dir=/data/k8s/logs/kubelet --hostname-override=k8s-master01 --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --kubeconfig=/data/k8s/certs/kubelet.kubeconfig --bootstrap-kubeconfig=/data/k8s/certs/bootstrap.kubeconfig --config=/data/k8s/conf/kubelet-config.yaml --cert-dir=/data/k8s/certs/ --root-dir=/data/k8s/data/kubelet/ --pod-infra-container-image=ecloudedu/pause-amd64:3.0
```
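
Picking `--root-dir` out of a long command line by eye is error-prone. As an illustrative sketch, the hypothetical helper `flag_value` below extracts a flag from a command-line string. The sample string is a shortened copy of the command line shown above, and the fallback to `/var/lib/kubelet` follows the default stated at the top of this page.

```python
import shlex
from typing import Optional


def flag_value(cmdline: str, flag: str) -> Optional[str]:
    """Return the value of `flag`, accepting `--flag=value` or `--flag value`."""
    args = shlex.split(cmdline)
    for i, arg in enumerate(args):
        if arg.startswith(flag + "="):
            return arg[len(flag) + 1:]
        if arg == flag and i + 1 < len(args):
            return args[i + 1]
    return None


# Shortened copy of the kubelet command line from the ps output.
cmdline = (
    "/data/k8s/bin/kubelet --config=/data/k8s/conf/kubelet-config.yaml "
    "--root-dir=/data/k8s/data/kubelet/"
)
# Fall back to the default directory when the flag is not set.
print(flag_value(cmdline, "--root-dir") or "/var/lib/kubelet")
```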
Partition usage
```shell
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 40G 8.8G 32G 23% /
/dev/sdb 40G 1.9G 39G 10% /data/docker/data
...
```
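## Test plan
1. Verify nodefs exceeding its threshold
2. Verify imagefs exceeding its threshold
3. Verify imagefs and nodefs both exceeding their thresholds
### Verify nodefs exceeding its threshold
The partition that holds the kubelet `--root-dir` (`/`) is 23% used, which leaves roughly 77% available. Raising the `nodefs.available` threshold to 78% should therefore put the node over the nodefs threshold.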
```yaml
evictionHard:
memory.available: 10%
nodefs.available: 78%
nodefs.inodesFree: 10%
imagefs.available: 10%
imagefs.inodesFree: 10%
```
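
The choice of 78% comes down to simple arithmetic: an `available` threshold is met when the available share drops below the configured value. The sketch below assumes the `Use%` column from `df` is a good enough approximation of what the kubelet measures. The kubelet works from its own statistics, so real figures can differ slightly. `threshold_met` is a hypothetical name used only for illustration.

```python
def threshold_met(used_percent: float, min_available_percent: float) -> bool:
    """True when available space (100 - used) falls below the threshold."""
    available_percent = 100.0 - used_percent
    return available_percent < min_available_percent


# nodefs: 23% used with the threshold raised to 78% available (77 < 78).
print(threshold_met(23, 78))
# The same partition checked against a 10% threshold (77 is not below 10).
print(threshold_met(23, 10))
```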
Then check the node status. The event `Attempting to reclaim ephemeral-storage` means the kubelet is trying to reclaim disk space.
```shell
$ kubectl describe node k8s-master01
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 01 Dec 2021 14:18:56 +0800 Wed, 01 Dec 2021 14:18:56 +0800 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 01 Dec 2021 15:03:52 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Wed, 01 Dec 2021 15:03:52 +0800 Wed, 01 Dec 2021 14:56:13 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Wed, 01 Dec 2021 15:03:52 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 01 Dec 2021 15:03:52 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletReady kubelet is posting ready status
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 6m45s kubelet Starting kubelet.
Normal NodeAllocatableEnforced 6m45s kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 6m45s kubelet Node k8s-master01 status is now: NodeHasSufficientMemory
Normal NodeHasDiskPressure 6m45s kubelet Node k8s-master01 status is now: NodeHasDiskPressure
Normal NodeHasSufficientPID 6m45s kubelet Node k8s-master01 status is now: NodeHasSufficientPID
Warning EvictionThresholdMet 105s (x31 over 6m45s) kubelet Attempting to reclaim ephemeral-storage
```
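
The same condition can be read programmatically instead of scanning the `describe` output. This is a minimal sketch that assumes the node object was fetched as JSON, for example with `kubectl get node k8s-master01 -o json`. The `sample` document is a hand-written fragment with only the fields used here, and `condition_status` is a hypothetical helper.

```python
import json


def condition_status(node: dict, cond_type: str):
    """Return the status string of a node condition, or None if it is missing."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


# Hand-written fragment shaped like a Node object; not real cluster output.
sample = json.loads("""
{"status": {"conditions": [
  {"type": "DiskPressure", "status": "True", "reason": "KubeletHasDiskPressure"},
  {"type": "Ready", "status": "True", "reason": "KubeletReady"}
]}}
""")
print(condition_status(sample, "DiskPressure"))
```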
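### Verify imagefs exceeding its threshold
The partition that holds the Docker data directory (`/data/docker/data`) is 10% used, which leaves roughly 90% available. Raising the `imagefs.available` threshold to 91% should therefore put the node over the imagefs threshold.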
```yaml
evictionHard:
memory.available: 10%
nodefs.available: 10%
nodefs.inodesFree: 10%
imagefs.available: 91%
imagefs.inodesFree: 10%
```
Then check the node status. The event `Attempting to reclaim ephemeral-storage` means the kubelet is trying to reclaim disk space.
```shell
$ kubectl describe node k8s-master01
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 01 Dec 2021 14:18:56 +0800 Wed, 01 Dec 2021 14:18:56 +0800 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 01 Dec 2021 15:17:31 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Wed, 01 Dec 2021 15:17:31 +0800 Wed, 01 Dec 2021 14:56:13 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Wed, 01 Dec 2021 15:17:31 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 01 Dec 2021 15:17:31 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletReady kubelet is posting ready status
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeHasSufficientPID 18s kubelet Node k8s-master01 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 18s kubelet Updated Node Allocatable limit across pods
Warning EvictionThresholdMet 18s kubelet Attempting to reclaim ephemeral-storage
Normal NodeHasSufficientMemory 18s kubelet Node k8s-master01 status is now: NodeHasSufficientMemory
Normal NodeHasDiskPressure 18s kubelet Node k8s-master01 status is now: NodeHasDiskPressure
Normal Starting 18s kubelet Starting kubelet.
```
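### Verify imagefs and nodefs both exceeding their thresholds
Now set the `imagefs.available` threshold to 91% and the `nodefs.available` threshold to 78%. The node should exceed both the imagefs and the nodefs thresholds.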
```yaml
evictionHard:
memory.available: 10%
nodefs.available: 78%
nodefs.inodesFree: 10%
imagefs.available: 91%
imagefs.inodesFree: 10%
```
Then check the node status. The event `Attempting to reclaim ephemeral-storage` means the kubelet is trying to reclaim disk space.
```shell
$ kubectl describe node k8s-master01
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 01 Dec 2021 14:18:56 +0800 Wed, 01 Dec 2021 14:18:56 +0800 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 01 Dec 2021 15:23:03 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Wed, 01 Dec 2021 15:23:03 +0800 Wed, 01 Dec 2021 15:23:03 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Wed, 01 Dec 2021 15:23:03 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 01 Dec 2021 15:23:03 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletReady kubelet is posting ready status
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 2m9s kubelet Starting kubelet.
Normal NodeHasSufficientPID 2m9s kubelet Node k8s-master01 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 2m9s kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 2m9s kubelet Node k8s-master01 status is now: NodeHasSufficientMemory
Normal NodeHasDiskPressure 2m7s (x2 over 2m9s) kubelet Node k8s-master01 status is now: NodeHasDiskPressure
Warning EvictionThresholdMet 8s (x13 over 2m9s) kubelet Attempting to reclaim ephemeral-storage
```
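## Summary
1. nodefs is the partition that holds the `--root-dir` directory; imagefs is the partition that holds the Docker data directory.
2. It is recommended that nodefs and imagefs share one partition, but that partition should be sized generously.
3. When nodefs and imagefs share one partition, the other kubelet parameters `--root-dir`, `--cert-dir`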