Accessing Event Data and Fields in the Configuration · Logstash Documentation (personal translation)

[TOC]

The Logstash agent is a processing pipeline with three stages: inputs → filters → outputs. Inputs generate events, filters modify them, and outputs ship them elsewhere.

All events have properties. For example, an Apache access log has properties such as the status code (200, 404), the request path ("/", "index.html"), the HTTP verb (GET, POST), the client IP address, and so on. Logstash calls these properties "fields".

Some configuration options in Logstash require fields to exist in order to function. Because inputs generate events, there are no fields to evaluate within the input block: they do not exist yet.

Because they depend on events and fields, the following configuration options work only within filter and output blocks.

> IMPORTANT: Field references, sprintf format, and conditionals, described below, do not work in an input block.

### Field references

It is often useful to be able to refer to a field by name. To do this, you can use the Logstash field reference syntax.

The syntax to access a field is `[fieldname]`. If you are referring to a **top-level field**, you can omit the `[]` and simply use `fieldname`. To refer to a **nested field**, you specify the full path to that field: `[top-level field][nested field]`.

For example, the following event has five top-level fields (agent, ip, request, response, ua) and three nested fields (status, bytes, os).

```json
{
  "agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
  "ip": "192.168.24.44",
  "request": "/index.html",
  "response": {
    "status": 200,
    "bytes": 52353
  },
  "ua": {
    "os": "Windows 7"
  }
}
```

To reference the `os` field, you specify `[ua][os]`. To reference a top-level field such as `request`, you can simply specify the field name.

### sprintf format

The field reference format is also used in what Logstash calls *sprintf format*. This format enables you to refer to field values from within other strings. For example, the statsd output has an *increment* setting that enables you to keep a count of apache logs by status code:

```json
output {
  statsd {
    increment => "apache.%{[response][status]}"
  }
}
```

> Note: The example uses the `statsd` output plugin. I am not yet familiar with what this plugin is for, so the description of the example was difficult to translate.

Similarly, you can convert the timestamp in the `@timestamp` field into a string.
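Instead of specifying a field name inside the curly braces, use the `+FORMAT` syntax where `FORMAT` is a [time format](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html).

For example, if you want to use the file output to write to logs based on the event's date and hour and the `type` field:

```json
output {
  file {
    path => "/var/log/%{type}.%{+yyyy.MM.dd.HH}"
  }
}
```

### Conditionals

Sometimes you only want to filter or output an event under certain conditions. For that, you can use a conditional.

Conditionals in Logstash look and act the same way they do in programming languages. They support `if`, `else if`, and `else` statements and can be nested.

The conditional syntax is:

```json
if EXPRESSION {
  ...
} else if EXPRESSION {
  ...
} else {
  ...
}
```

What is an expression? Comparison tests, boolean logic, and so on.

You can use the following comparison operators:

+ equality: `==`, `!=`, `<`, `>`, `<=`, `>=`
+ regexp: `=~`, `!~` (checks a pattern on the right against a string value on the left)
+ inclusion: `in`, `not in`

The supported boolean operators are:

+ `and`, `or`, `nand` (NOT AND: false only when both operands are true), `xor`

Expressions can be long and complex. Expressions can contain other expressions, you can negate expressions with `!`, and you can group them with parentheses `(...)`.

For example, the following conditional uses the mutate filter to remove the field `secret` if the field `action` has a value of `login`:

```json
filter {
  if [action] == "login" {
    mutate { remove_field => "secret" }
  }
}
```

You can specify multiple expressions in a single condition:

```json
output {
  # Send production errors to pagerduty
  if [loglevel] == "ERROR" and [deployment] == "production" {
    pagerduty {
    ...
    }
  }
}
```

You can use the `in` operator to test whether a field contains a specific string, key, or (for lists) element:

```json
filter {
  if [foo] in [foobar] {
    mutate { add_tag => "field in field" }
  }
  if [foo] in "foo" {
    mutate { add_tag => "field in string" }
  }
  if "hello" in [greeting] {
    mutate { add_tag => "string in field" }
  }
  if [foo] in ["hello", "world", "foo"] {
    mutate { add_tag => "field in list" }
  }
  if [missing] in [alsomissing] {
    mutate { add_tag => "shouldnotexist" }
  }
  if !("foo" in ["hello", "world"]) {
    mutate { add_tag => "shouldexist" }
  }
}
```

You use the `not in` conditional the same way. For example, you could use `not in` to route events to Elasticsearch only when `grok` is successful:

```json
output {
  if "_grokparsefailure" not in [tags] {
    elasticsearch { ... }
  }
}
```

You can check for the existence of a specific field, but there is currently no way to differentiate between a field that does not exist and a field whose value is simply false. The expression `if [foo]` returns `false` when:

+ `[foo]` does not exist in the event,
+ `[foo]` exists in the event, but is false, or
+ `[foo]` exists in the event, but is null

For more examples, see the [Using Conditionals](https://www.elastic.co/guide/en/logstash/current/config-examples.html#using-conditionals) section.

### The @metadata field

In Logstash 1.5 and later, there is a special field called `@metadata`. The contents of `@metadata` are not part of any of your events at output time, which makes it well suited for conditionals, or for extending and building event fields with field references and sprintf format.

The following configuration file yields events from STDIN. Whatever you type becomes the `message` field in the event. The `mutate` filters in the filter block add a few fields, some of them nested in the `@metadata` field.

```json
input { stdin { } }

filter {
  mutate { add_field => { "show" => "This data will be in the output" } }
  mutate { add_field => { "[@metadata][test]" => "Hello" } }
  mutate { add_field => { "[@metadata][no_show]" => "This data will not be in the output" } }
}

output {
  if [@metadata][test] == "Hello" {
    stdout { codec => rubydebug }
  }
}
```

Let's see what comes out:

```text
$ bin/logstash -f ../test.conf
Pipeline main started
asdf
{
    "@timestamp" => 2016-06-30T02:42:51.496Z,
      "@version" => "1",
          "host" => "example.com",
          "show" => "This data will be in the output",
       "message" => "asdf"
}
```

The `asdf` typed in became the contents of the `message` field, and the conditional successfully evaluated the contents of the `test` field nested within `@metadata`. But the output did not show a field called `@metadata` or its contents.

The `rubydebug` codec allows you to reveal the contents of the `@metadata` field if you add `metadata => true` to the configuration:

```ruby
stdout { codec => rubydebug { metadata => true } }
```

Let's see what the output looks like with this change:

```text
$ bin/logstash -f ../test.conf
Pipeline main started
asdf
{
    "@timestamp" => 2016-06-30T02:46:48.565Z,
     "@metadata" => {
           "test" => "Hello",
        "no_show" => "This data will not be in the output"
    },
      "@version" => "1",
          "host" => "example.com",
          "show" => "This data will be in the output",
       "message" => "asdf"
}
```

Now you can see the `@metadata` field and its sub-fields.

> IMPORTANT: Only the `rubydebug` codec allows you to show the contents of the `@metadata` field.

Use the `@metadata` field any time you need a temporary field but do not want it to be in the final output.

Perhaps one of the most common use cases for this new field is the `date` filter with a temporary timestamp.

This configuration file has been simplified, but it uses the timestamp format common to Apache and Nginx. In the past, you had to delete the timestamp field yourself after using it to overwrite the `@timestamp` field. With the `@metadata` field, this is no longer necessary: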
```ruby
input { stdin { } }

filter {
  grok { match => [ "message", "%{HTTPDATE:[@metadata][timestamp]}" ] }
  date { match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}

output {
  stdout { codec => rubydebug }
}
```

Notice that this configuration puts the date extracted in the `grok` filter into the `[@metadata][timestamp]` field. Let's feed this configuration a sample date string and see what comes out:

```text
$ bin/logstash -f ../test.conf
Pipeline main started
02/Mar/2014:15:36:43 +0100
{
    "@timestamp" => 2014-03-02T14:36:43.000Z,
      "@version" => "1",
          "host" => "example.com",
       "message" => "02/Mar/2014:15:36:43 +0100"
}
```

That's it! No extra fields in the output, and a cleaner config file because you do not have to delete a "timestamp" field after conversion in the `date` filter.

Another use case is the CouchDB Changes input plugin (see <https://github.com/logstash-plugins/logstash-input-couchdb_changes>). This plugin automatically captures CouchDB document field metadata into the `@metadata` field within the input plugin itself. When the events pass through to be indexed by Elasticsearch, the Elasticsearch output plugin allows you to specify the `action` (delete, update, insert, etc.) and the `document_id`, like this:

```ruby
output {
  elasticsearch {
    action => "%{[@metadata][action]}"
    document_id => "%{[@metadata][_id]}"
    hosts => ["example.com"]
    index => "index_name"
    protocol => "http"
  }
}
```
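
To tie the pieces of this page together, here is a minimal sketch that uses a conditional, a temporary `@metadata` field, and sprintf format in a single pipeline. The field name `[@metadata][index_prefix]`, the `type` value `apache`, and the index naming scheme are assumptions made up for illustration; they do not come from the original documentation.

```ruby
filter {
  # Conditional: choose an index prefix from the event type (hypothetical values).
  if [type] == "apache" {
    mutate { add_field => { "[@metadata][index_prefix]" => "apache" } }
  } else {
    mutate { add_field => { "[@metadata][index_prefix]" => "other" } }
  }
}

output {
  elasticsearch {
    hosts => ["example.com"]
    # sprintf format: a nested field reference followed by a +FORMAT date.
    index => "%{[@metadata][index_prefix]}-%{+yyyy.MM.dd}"
  }
}
```

Because the prefix is stored under `@metadata`, it can steer the output without becoming part of the indexed event.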