加载样本数据 · Kibana 5.2 中文文档

# 加载样本数据原文链接 : [https://www.elastic.co/guide/en/kibana/5.2/tutorial-load-dataset.html](https://www.elastic.co/guide/en/kibana/5.2/tutorial-load-dataset.html) 译文链接 : [http://www.apache.wiki/pages/viewpage.action?pageId=8159567](http://www.apache.wiki/pages/viewpage.action?pageId=8159567) 贡献者 : [片刻](/display/~jiangzhonglian)， [那伊抹微笑](/display/~wangyangting)，[ApacheCN](/display/~apachecn)，[Apache中文网](/display/~apachechina) 本节中的教程依赖以下数据集 : * 莎士比亚全集，适当解析成多个字段。下载此数据，点击这里 : [shakespeare.json](https://www.elastic.co/guide/en/kibana/3.0/snippets/shakespeare.json)。 * 一组虚构的账户，随机生成的数据。下载此数据，点击这里 : [accounts.zip](https://github.com/bly2k/files/blob/master/accounts.zip?raw=true) * 一组随机生成的日志文件。下载此数据，点击这里 : [logs.jsonl.gz](https://download.elastic.co/demos/kibana/gettingstarted/logs.jsonl.gz) 数据集的两个被压缩。使用下面的命令来解压缩文件 : ``` unzip accounts.zip gunzip logs.jsonl.gz ``` **Shakespeare**（莎士比亚）数据集以下列 **schema**（模式）进行组织 : ``` { "line_id": INT, "play_name": "String", "speech_number": INT, "line_number": "String", "speaker": "String", "text_entry": "String", } ``` **accounts**（帐户）数据集以下列 **schema**（模式）进行组织 : ``` { "account_number": INT, "balance": INT, "firstname": "String", "lastname": "String", "age": INT, "gender": "M or F", "address": "String", "employer": "String", "email": "String", "city": "String", "state": "String" } ``` 对于日志数据集的方案有几十个不同的 **fields**，但在本教程中使用更多的是 : ``` { "memory": INT, "geo.coordinates": "geo_point" "@timestamp": "date" } ``` 在此之前，我们加载莎士比亚数据集，我们需要建立一个映射的字段。映射划分的文件索引成逻辑组，并指定一个 **field **的特征，如该领域的搜索性或是否它的标记化，或分解成单独的词。使用以下命令来为莎士比亚数据集设置 **mapping**（映射）: ``` curl -XPUT http://localhost:9200/shakespeare -d ' { "mappings" : { "_default_" : { "properties" : { "speaker" : {"type": "string", "index" : "not_analyzed" }, "play_name" : {"type": "string", "index" : "not_analyzed" }, "line_id" : { "type" : "integer" }, "speech_number" : { "type" : "integer" } } } } } '; ``` 该 **mapping**（映射）指定了数据集下列特质 : 1. **speaker** 字段是不分析的字符串。在这个 **filed**（字段）中的字符串被视为一个单独的单元，即使在这个 **fileld**（字段）中有多个单词。 2. 这同样适用于 **play_name **字段。 3. **line_id** 和 **speech_number** 字段是整数。日志数据集需要映射，通过将 **`geo_point `**类型应用于这些字段，将日志中的 **latitude**（纬度）/**longitude**（纬度）对标记为地理位置。使用以下命令建立日志 **geo_point mappig**（映射）: ``` curl -XPUT http://localhost:9200/logstash-2015.05.18 -d ' { "mappings": { "log": { "properties": { "geo": { "properties": { "coordinates": { "type": "geo_point" } } } } } } } '; curl -XPUT http://localhost:9200/logstash-2015.05.19 -d ' { "mappings": { "log": { "properties": { "geo": { "properties": { "coordinates": { "type": "geo_point" } } } } } } } '; curl -XPUT http://localhost:9200/logstash-2015.05.20 -d ' { "mappings": { "log": { "properties": { "geo": { "properties": { "coordinates": { "type": "geo_point" } } } } } } } '; ``` **accounts**（账目）数据集不需要任何 **mapping**（映射），所以在这一点上，我们已经准备好使用 **Elasticsearch **的 [bulk](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/docs-bulk.html) **API** 加载数据集，使用以下命令 : ``` curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json curl -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl ``` 这些命令可能需要一些时间来执行，这取决于可用的计算资源。验证成功加载使用下面的命令 : ``` curl 'localhost:9200/_cat/indices?v' ``` 您应该看到类似以下的输出 : ``` index pri rep docs.count docs.deleted store.size pri.store.size yellow open bank 5 1 1000 0 418.2kb 418.2kb yellow open shakespeare 5 1 111396 0 17.6mb 17.6mb yellow open logstash-2015.05.18 5 1 4631 0 15.6mb 15.6mb yellow open logstash-2015.05.19 5 1 4624 0 15.7mb 15.7mb yellow open logstash-2015.05.20 5 1 4750 0 16.4mb 16.4mb ```