[TOC]
<br >
*****
# **Installing a Single-Node Kafka 2.3.1 Environment on CentOS 7**
<br >
## **Prerequisites**
Assume the following are already in place; if not, see *Appendix A*:
* A virtual machine with 2 cores and 4 GB of RAM, IP address 192.168.80.81
* CentOS 7.7 64-bit installed
* root privileges available
* JDK 1.8.0\_221 installed
<br >
## **Installation**
1. Download Kafka
Download the recommended binary package [kafka\_2.12-2.3.1.tgz](https://www.apache.org/dyn/closer.cgi?path=/kafka/2.3.1/kafka_2.12-2.3.1.tgz) from the [Apache Kafka](https://kafka.apache.org/) website, upload it to the /opt directory, and extract it:
~~~
# cd /opt
# tar -xzf kafka_2.12-2.3.1.tgz
# chown -R lemon:oper /opt/kafka_2.12-2.3.1
# cd /opt/kafka_2.12-2.3.1
~~~
<br >
1. Start the servers
Kafka uses [ZooKeeper](https://zookeeper.apache.org/), so if you do not already have a ZooKeeper server, you need to start one first. You can use the convenience script packaged with Kafka to get a quick single-node ZooKeeper instance.
~~~
$ nohup bin/zookeeper-server-start.sh config/zookeeper.properties > zookeeper.log &
...
[2020-01-11 21:35:36,457] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
~~~
Wait a moment for ZooKeeper to finish starting, then start the Kafka server:
~~~
$ export JMX_PORT=9988
$ nohup bin/kafka-server-start.sh config/server.properties > kafka-server.log &
...
[2020-01-11 21:37:22,229] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
~~~
<br >
## **Testing**
1. Create a topic
Let's create a topic named "test" with a single partition and a single replica:
~~~
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Created topic test.
~~~
Now we can run the list command to see this topic:
~~~
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
test
~~~
<br >
1. Send some messages
Kafka ships with a command-line client that takes input from a file or from standard input and sends it to the Kafka cluster as messages. By default, each line is sent as a separate message.
Run the producer in another terminal, then type a few messages into the console to send them to the server.
~~~
$ cd /opt/kafka_2.12-2.3.1/
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
> Hello World!
> Hello China!
^C
~~~
<br >
1. Start a consumer
Kafka also has a command-line consumer that dumps messages to standard output.
~~~
$ cd /opt/kafka_2.12-2.3.1/
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Hello World!
Hello China!
^C
~~~
If you run the commands above in separate terminals, you can now type messages into the producer terminal and watch them appear in the consumer terminal.
All of the command-line tools have additional options; running a command without arguments prints detailed usage information.
<br >
1. Producer throughput test
The kafka-producer-perf-test script ships with Kafka for measuring producer performance; it conveniently reports the producer's throughput and average latency over a period of time.
~~~
$ bin/kafka-producer-perf-test.sh --topic test --num-records 500000 --record-size 200 --throughput -1 --producer-props bootstrap.servers=localhost:9092 acks=-1
221506 records sent, 44063.3 records/sec (8.40 MB/sec), 2382.8 ms avg latency, 3356.0 ms max latency.
500000 records sent, 64102.564103 records/sec (12.23 MB/sec), 2078.40 ms avg latency, 3356.00 ms max latency, 1841 ms 50th, 3250 ms 95th, 3343 ms 99th, 3354 ms 99.9th.
~~~
The output shows that a Kafka producer on this test machine sends *64102* records per second on average, for an average throughput of *12.23 MB/s* (roughly 97.84 Mb/s of bandwidth), with an average latency of *2078* ms and a maximum latency of *3356* ms; 50% of messages were sent within *1841* ms, 95% within *3250* ms, 99% within *3343* ms, and 99.9% within *3354* ms.
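As a quick sanity check, the MB/s figure converts to the quoted line rate by multiplying by 8 bits per byte (a rough conversion that ignores protocol overhead):

```shell
# 12.23 MB/s * 8 bits/byte ≈ 97.84 Mb/s, matching the bandwidth figure above.
awk 'BEGIN { printf "%.2f Mb/s\n", 12.23 * 8 }'
```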
<br >
1. Consumer throughput test
Similar to kafka-producer-perf-test, Kafka also ships a convenient performance-testing script for consumers: kafka-consumer-perf-test. Let's use it to measure consumer throughput in the Kafka environment we just set up.
~~~
$ bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --messages 500000 --topic test
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2020-01-13 12:52:48:112, 2020-01-13 12:52:49:295, 95.3675, 80.6149, 500002, 422655.9594, 25, 1158, 82.3553, 431780.6563
~~~
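The MB.sec and nMsg.sec columns are simply the consumed totals divided by the elapsed time (end.time minus start.time, 1183 ms in the row above). A quick recomputation:

```shell
# Recompute the throughput columns from the raw consumer-perf row:
# elapsed = 12:52:49:295 - 12:52:48:112 = 1.183 s
awk 'BEGIN {
  elapsed = 1.183                          # seconds
  printf "MB.sec   = %.2f\n", 95.3675 / elapsed
  printf "nMsg.sec = %.0f\n", 500002 / elapsed
}'
```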
<br >
## **Shutdown**
1. First, stop the Kafka broker with the kafka-server-stop script:
~~~
$ bin/kafka-server-stop.sh
~~~
<br >
1. Then, after a moment, stop ZooKeeper with the zookeeper-server-stop script:
~~~
$ bin/zookeeper-server-stop.sh
~~~
<br >
## **Tuning**
### **Operating System Tuning**
1. Raise the maximum number of open file descriptors:
~~~
# ulimit -n 100000
~~~
> Kafka users sometimes run into "Too many open files" errors, which calls for raising the file-descriptor limit on the broker machine. A rough sizing formula: maximum number of partitions on the broker \* (average data size per partition / average log segment size + 3). In practice this value is usually set very high, e.g. 1000000.
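Plugging hypothetical numbers into that formula (the values below are illustrative, not from this setup): a broker hosting 2000 partitions, each holding about 10 GB of data in 1 GB log segments, would need roughly:

```shell
# descriptors ≈ max_partitions * (avg_data_per_partition / avg_segment_size + 3)
# Illustrative inputs: 2000 partitions, 10 GB per partition, 1 GB segments.
awk 'BEGIN { printf "%d\n", 2000 * (10 / 1 + 3) }'
```

about 26000 descriptors, which is why a generous blanket value such as 1000000 is the usual choice.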
<br >
1. Minimize swapping:
~~~
# sysctl vm.swappiness=1
# vim /etc/sysctl.conf
vm.swappiness=1
~~~
> Minimizing swap usage is a standard tuning step for disk-heavy applications. Setting vm.swappiness to a small value such as 1 sharply reduces the use of swap space, which would otherwise drag performance down badly. You can verify the effect with `free -m`.
<br >
1. Optimize the /data partition:
Adjust the mount options for the /data partition: edit */etc/fstab* (`vim /etc/fstab`) and append `,noatime,largeio` after defaults:
~~~
/dev/mapper/centos-data /data xfs defaults,noatime,largeio 0 0
~~~
> Disable atime updates: because Kafka persists messages heavily to physical disk, the file system and its configuration are an important tuning step. On any Linux file system, Kafka recommends mounting with the noatime option, which disables updates to the file's atime (last access time). Skipping the atime write on every inode access greatly reduces the number of file-system write operations and thus improves cluster performance. Kafka does not use atime, so disabling it is safe. You can verify with `ls -l --time=atime`.
> The largeio option affects the I/O size reported by stat calls; for high-volume disk writes it can provide a modest performance boost.
> Remount the /data partition:
~~~
# mount -o remount /data
~~~
<br >
1. Producer throughput test
Re-run the producer throughput test after the OS tuning:
~~~
$ cd /opt/kafka_2.12-2.3.1
$ bin/kafka-producer-perf-test.sh --topic test --num-records 500000 --record-size 200 --throughput -1 --producer-props bootstrap.servers=localhost:9092 acks=-1
442836 records sent, 88496.4 records/sec (16.88 MB/sec), 1311.3 ms avg latency, 1937.0 ms max latency.
500000 records sent, 91642.228739 records/sec (17.48 MB/sec), 1328.20 ms avg latency, 1937.00 ms max latency, 1368 ms 50th, 1886 ms 95th, 1930 ms 99th, 1935 ms 99.9th.
~~~
<br >
1. Consumer throughput test
Likewise, re-run the consumer throughput test and compare with the earlier baseline:
~~~
$ bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --messages 500000 --topic test
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2020-01-13 13:17:34:483, 2020-01-13 13:17:35:580, 95.4016, 86.9659, 500181, 455953.5096, 20, 1077, 88.5809, 464420.6128
~~~
<br >
### **JVM Tuning**
1. Adjust the JVM startup parameters:
Edit the user profile with `vim ~/.bash_profile` and add the following line:
~~~
export KAFKA_HEAP_OPTS="-Xmx1g -Xms1g -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=85"
~~~
Reload .bash\_profile:
~~~
source ~/.bash_profile
~~~
<br >
1. Producer throughput test
Re-run the producer throughput test after the JVM tuning:
~~~
$ cd /opt/kafka_2.12-2.3.1
$ bin/kafka-producer-perf-test.sh --topic test --num-records 500000 --record-size 200 --throughput -1 --print-metrics --producer-props bootstrap.servers=localhost:9092 acks=-1
377763 records sent, 75552.6 records/sec (14.41 MB/sec), 1507.6 ms avg latency, 1979.0 ms max latency.
500000 records sent, 84160.915671 records/sec (16.05 MB/sec), 1466.62 ms avg latency, 1979.00 ms max latency, 1507 ms 50th, 1952 ms 95th, 1965 ms 99th, 1978 ms 99.9th.
Metric Name Value
app-info:commit-id:{client-id=producer-1} : 18a913733fb71c01
app-info:start-time-ms:{client-id=producer-1} : 1579140833567
app-info:version:{client-id=producer-1} : 2.3.1
kafka-metrics-count:count:{client-id=producer-1} : 102.000
producer-metrics:batch-size-avg:{client-id=producer-1} : 16337.068
producer-metrics:batch-size-max:{client-id=producer-1} : 16377.000
producer-metrics:batch-split-rate:{client-id=producer-1} : 0.000
producer-metrics:batch-split-total:{client-id=producer-1} : 0.000
producer-metrics:buffer-available-bytes:{client-id=producer-1} : 33554432.000
producer-metrics:buffer-exhausted-rate:{client-id=producer-1} : 0.000
producer-metrics:buffer-exhausted-total:{client-id=producer-1} : 0.000
producer-metrics:buffer-total-bytes:{client-id=producer-1} : 33554432.000
producer-metrics:bufferpool-wait-ratio:{client-id=producer-1} : 0.081
producer-metrics:bufferpool-wait-time-total:{client-id=producer-1} : 2834023273.000
producer-metrics:compression-rate-avg:{client-id=producer-1} : 1.000
producer-metrics:connection-close-rate:{client-id=producer-1} : 0.000
producer-metrics:connection-close-total:{client-id=producer-1} : 0.000
producer-metrics:connection-count:{client-id=producer-1} : 2.000
producer-metrics:connection-creation-rate:{client-id=producer-1} : 0.056
producer-metrics:connection-creation-total:{client-id=producer-1} : 2.000
producer-metrics:failed-authentication-rate:{client-id=producer-1} : 0.000
producer-metrics:failed-authentication-total:{client-id=producer-1} : 0.000
producer-metrics:failed-reauthentication-rate:{client-id=producer-1} : 0.000
producer-metrics:failed-reauthentication-total:{client-id=producer-1} : 0.000
producer-metrics:incoming-byte-rate:{client-id=producer-1} : 10062.678
producer-metrics:incoming-byte-total:{client-id=producer-1} : 360586.000
producer-metrics:io-ratio:{client-id=producer-1} : 0.012
producer-metrics:io-time-ns-avg:{client-id=producer-1} : 23895.015
producer-metrics:io-wait-ratio:{client-id=producer-1} : 0.098
producer-metrics:io-wait-time-ns-avg:{client-id=producer-1} : 204193.407
producer-metrics:io-waittime-total:{client-id=producer-1} : 3537650773.000
producer-metrics:iotime-total:{client-id=producer-1} : 413981132.000
producer-metrics:metadata-age:{client-id=producer-1} : 5.829
producer-metrics:network-io-rate:{client-id=producer-1} : 358.811
producer-metrics:network-io-total:{client-id=producer-1} : 12858.000
producer-metrics:outgoing-byte-rate:{client-id=producer-1} : 2939361.696
producer-metrics:outgoing-byte-total:{client-id=producer-1} : 105329087.000
producer-metrics:produce-throttle-time-avg:{client-id=producer-1} : 0.000
producer-metrics:produce-throttle-time-max:{client-id=producer-1} : 0.000
producer-metrics:reauthentication-latency-avg:{client-id=producer-1} : NaN
producer-metrics:reauthentication-latency-max:{client-id=producer-1} : NaN
producer-metrics:record-error-rate:{client-id=producer-1} : 0.000
producer-metrics:record-error-total:{client-id=producer-1} : 0.000
producer-metrics:record-queue-time-avg:{client-id=producer-1} : 1458.931
producer-metrics:record-queue-time-max:{client-id=producer-1} : 1977.000
producer-metrics:record-retry-rate:{client-id=producer-1} : 0.000
producer-metrics:record-retry-total:{client-id=producer-1} : 0.000
producer-metrics:record-send-rate:{client-id=producer-1} : 13975.068
producer-metrics:record-send-total:{client-id=producer-1} : 500000.000
producer-metrics:record-size-avg:{client-id=producer-1} : 286.000
producer-metrics:record-size-max:{client-id=producer-1} : 286.000
producer-metrics:records-per-request-avg:{client-id=producer-1} : 77.809
producer-metrics:request-latency-avg:{client-id=producer-1} : 4.382
producer-metrics:request-latency-max:{client-id=producer-1} : 136.000
producer-metrics:request-rate:{client-id=producer-1} : 179.406
producer-metrics:request-size-avg:{client-id=producer-1} : 16383.432
producer-metrics:request-size-max:{client-id=producer-1} : 16431.000
producer-metrics:request-total:{client-id=producer-1} : 6429.000
producer-metrics:requests-in-flight:{client-id=producer-1} : 0.000
producer-metrics:response-rate:{client-id=producer-1} : 179.411
producer-metrics:response-total:{client-id=producer-1} : 6429.000
producer-metrics:select-rate:{client-id=producer-1} : 481.330
producer-metrics:select-total:{client-id=producer-1} : 17325.000
producer-metrics:successful-authentication-no-reauth-total:{client-id=producer-1} : 0.000
producer-metrics:successful-authentication-rate:{client-id=producer-1} : 0.000
producer-metrics:successful-authentication-total:{client-id=producer-1} : 0.000
producer-metrics:successful-reauthentication-rate:{client-id=producer-1} : 0.000
producer-metrics:successful-reauthentication-total:{client-id=producer-1} : 0.000
producer-metrics:waiting-threads:{client-id=producer-1} : 0.000
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node--1} : 12.335
producer-node-metrics:incoming-byte-rate:{client-id=producer-1, node-id=node-0} : 10064.949
producer-node-metrics:incoming-byte-total:{client-id=producer-1, node-id=node--1} : 442.000
producer-node-metrics:incoming-byte-total:{client-id=producer-1, node-id=node-0} : 360144.000
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node--1} : 1.702
producer-node-metrics:outgoing-byte-rate:{client-id=producer-1, node-id=node-0} : 2943220.331
producer-node-metrics:outgoing-byte-total:{client-id=producer-1, node-id=node--1} : 61.000
producer-node-metrics:outgoing-byte-total:{client-id=producer-1, node-id=node-0} : 105329026.000
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node--1} : NaN
producer-node-metrics:request-latency-avg:{client-id=producer-1, node-id=node-0} : 4.382
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node--1} : NaN
producer-node-metrics:request-latency-max:{client-id=producer-1, node-id=node-0} : 136.000
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node--1} : 0.056
producer-node-metrics:request-rate:{client-id=producer-1, node-id=node-0} : 179.590
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node--1} : 30.500
producer-node-metrics:request-size-avg:{client-id=producer-1, node-id=node-0} : 16388.521
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node--1} : 37.000
producer-node-metrics:request-size-max:{client-id=producer-1, node-id=node-0} : 16431.000
producer-node-metrics:request-total:{client-id=producer-1, node-id=node--1} : 2.000
producer-node-metrics:request-total:{client-id=producer-1, node-id=node-0} : 6427.000
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node--1} : 0.056
producer-node-metrics:response-rate:{client-id=producer-1, node-id=node-0} : 179.615
producer-node-metrics:response-total:{client-id=producer-1, node-id=node--1} : 2.000
producer-node-metrics:response-total:{client-id=producer-1, node-id=node-0} : 6427.000
producer-topic-metrics:byte-rate:{client-id=producer-1, topic=test} : 2934343.237
producer-topic-metrics:byte-total:{client-id=producer-1, topic=test} : 104981998.000
producer-topic-metrics:compression-rate:{client-id=producer-1, topic=test} : 1.000
producer-topic-metrics:record-error-rate:{client-id=producer-1, topic=test} : 0.000
producer-topic-metrics:record-error-total:{client-id=producer-1, topic=test} : 0.000
producer-topic-metrics:record-retry-rate:{client-id=producer-1, topic=test} : 0.000
producer-topic-metrics:record-retry-total:{client-id=producer-1, topic=test} : 0.000
producer-topic-metrics:record-send-rate:{client-id=producer-1, topic=test} : 13975.459
producer-topic-metrics:record-send-total:{client-id=producer-1, topic=test} : 500000.000
~~~
> throughput: rate limiting. A value below 0 disables the limit; with a value above 0, the producer is blocked for a while whenever its send throughput exceeds that value.
> print-metrics: when specified, prints a large set of metrics after the test completes, which can be a useful reference for many test runs.
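The MB/sec figure in the summary line follows directly from the record rate and the 200-byte record size used in this test:

```shell
# 84160.92 records/s * 200 bytes/record, expressed in MiB/s:
awk 'BEGIN { printf "%.2f MB/sec\n", 84160.915671 * 200 / (1024 * 1024) }'
```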
<br >
1. Consumer throughput test
Finally, re-run the consumer throughput test:
~~~
$ bin/kafka-consumer-perf-test.sh --broker-list localhost:9092 --messages 500000 --topic test
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2020-01-13 13:24:56:267, 2020-01-13 13:24:57:259, 95.4016, 96.1710, 500181, 504214.7177, 26, 966, 98.7594, 517785.7143
~~~
<br >
# **References**
* [Apache Kafka QuickStart](http://kafka.apache.org/quickstart)
<br >