# How filters work in Apache 2.0
### Warning
This is a cut 'n paste job from an email (<022501c1c529$f63a9550$7f00000a@KOJ>) and only reformatted for better readability. It's not up to date but may be a good start for further research.
## Filter Types
There are three basic filter types (each of these is actually broken down into two categories, but that comes later).
`CONNECTION`
Filters of this type are valid for the lifetime of this connection. (`AP_FTYPE_CONNECTION`, `AP_FTYPE_NETWORK`)
`PROTOCOL`
Filters of this type are valid for the lifetime of this request from the point of view of the client, this means that the request is valid from the time that the request is sent until the time that the response is received. (`AP_FTYPE_PROTOCOL`, `AP_FTYPE_TRANSCODE`)
`RESOURCE`
Filters of this type are valid for the time that this content is used to satisfy a request. For simple requests, this is identical to `PROTOCOL`, but internal redirects and sub-requests can change the content without ending the request. (`AP_FTYPE_RESOURCE`, `AP_FTYPE_CONTENT_SET`)
It is important to make the distinction between a protocol and a resource filter. A resource filter is tied to a specific resource, it may also be tied to header information, but the main binding is to a resource. If you are writing a filter and you want to know if it is resource or protocol, the correct question to ask is: "Can this filter be removed if the request is redirected to a different resource?" If the answer is yes, then it is a resource filter. If it is no, then it is most likely a protocol or connection filter. I won't go into connection filters, because they seem to be well understood. With this definition, a few examples might help:
Byterange
We have coded it to be inserted for all requests, and it is removed if not used. Because this filter is active at the beginning of all requests, it can not be removed if it is redirected, so this is a protocol filter.
http_header
This filter actually writes the headers to the network. This is obviously a required filter (except in the asis case which is special and will be dealt with below) and so it is a protocol filter.
Deflate
The administrator configures this filter based on which file has been requested. If we do an internal redirect from an autoindex page to an index.html page, the deflate filter may be added or removed based on config, so this is a resource filter.
The further breakdown of each category into two more filter types is strictly for ordering. We could remove it, and only allow for one filter type, but the order would tend to be wrong, and we would need to hack things to make it work. Currently, the `RESOURCE` filters only have one filter type, but that should change.
## How are filters inserted?
This is actually rather simple in theory, but the code is complex. First of all, it is important that everybody realize that there are three filter lists for each request, but they are all concatenated together. So, the first list is `r->output_filters`, then `r->proto_output_filters`, and finally `r->connection->output_filters`. These correspond to the `RESOURCE`, `PROTOCOL`, and `CONNECTION` filters respectively. The problem previously, was that we used a singly linked list to create the filter stack, and we started from the "correct" location. This means that if I had a `RESOURCE` filter on the stack, and I added a `CONNECTION` filter, the `CONNECTION` filter would be ignored. This should make sense, because we would insert the connection filter at the top of the `c->output_filters` list, but the end of `r->output_filters` pointed to the filter that used to be at the front of `c->output_filters`. This is obviously wrong. The new insertion code uses a doubly linked list. This has the advantage that we never lose a filter that has been inserted. Unfortunately, it comes with a separate set of headaches.
The problem is that we have two different cases were we use subrequests. The first is to insert more data into a response. The second is to replace the existing response with an internal redirect. These are two different cases and need to be treated as such.
In the first case, we are creating the subrequest from within a handler or filter. This means that the next filter should be passed to `make_sub_request` function, and the last resource filter in the sub-request will point to the next filter in the main request. This makes sense, because the sub-request's data needs to flow through the same set of filters as the main request. A graphical representation might help:
```
Default_handler --> includes_filter --> byterange --> ...
```
If the includes filter creates a sub request, then we don't want the data from that sub-request to go through the includes filter, because it might not be SSI data. So, the subrequest adds the following:
```
Default_handler --> includes_filter -/-> byterange --> ...
/
Default_handler --> sub_request_core
```
What happens if the subrequest is SSI data? Well, that's easy, the `includes_filter` is a resource filter, so it will be added to the sub request in between the `Default_handler` and the `sub_request_core` filter.
The second case for sub-requests is when one sub-request is going to become the real request. This happens whenever a sub-request is created outside of a handler or filter, and NULL is passed as the next filter to the `make_sub_request` function.
In this case, the resource filters no longer make sense for the new request, because the resource has changed. So, instead of starting from scratch, we simply point the front of the resource filters for the sub-request to the front of the protocol filters for the old request. This means that we won't lose any of the protocol filters, neither will we try to send this data through a filter that shouldn't see it.
The problem is that we are using a doubly-linked list for our filter stacks now. But, you should notice that it is possible for two lists to intersect in this model. So, you do you handle the previous pointer? This is a very difficult question to answer, because there is no "right" answer, either method is equally valid. I looked at why we use the previous pointer. The only reason for it is to allow for easier addition of new servers. With that being said, the solution I chose was to make the previous pointer always stay on the original request.
This causes some more complex logic, but it works for all cases. My concern in having it move to the sub-request, is that for the more common case (where a sub-request is used to add data to a response), the main filter chain would be wrong. That didn't seem like a good idea to me.
## Asis
The final topic. :-) Mod_Asis is a bit of a hack, but the handler needs to remove all filters except for connection filters, and send the data. If you are using `mod_asis`, all other bets are off.
## Explanations
The absolutely last point is that the reason this code was so hard to get right, was because we had hacked so much to force it to work. I wrote most of the hacks originally, so I am very much to blame. However, now that the code is right, I have started to remove some hacks. Most people should have seen that the `reset_filters`和`add_required_filters` functions are gone. Those inserted protocol level filters for error conditions, in fact, both functions did the same thing, one after the other, it was really strange. Because we don't lose protocol filters for error cases any more, those hacks went away. The `HTTP_HEADER`, `Content-length`, and `Byterange` filters are all added in the `insert_filters` phase, because if they were added earlier, we had some interesting interactions. Now, those could all be moved to be inserted with the `HTTP_IN`, `CORE`, and `CORE_IN` filters. That would make the code easier to follow.
- Apache HTTP Server Version 2.2 文档 [最后更新:2006年3月21日]
- 版本说明
- 从1.3升级到2.0
- 从2.0升级到2.2
- Apache 2.2 新特性概述
- Apache 2.0 新特性概述
- The Apache License, Version 2.0
- 参考手册
- 编译与安装
- 启动Apache
- 停止和重启
- 配置文件
- 配置段(容器)
- 缓冲指南
- 服务器全局配置
- 日志文件
- 从URL到文件系统的映射
- 安全方面的提示
- 动态共享对象(DSO)支持
- 内容协商
- 自定义错误响应
- 地址和端口的绑定(Binding)
- 多路处理模块
- Apache的环境变量
- Apache处理器的使用
- 过滤器(Filter)
- suEXEC支持
- 性能方面的提示
- URL重写指南
- Apache虚拟主机文档
- 基于主机名的虚拟主机
- 基于IP地址的虚拟主机
- 大批量虚拟主机的动态配置
- 虚拟主机示例
- 深入研究虚拟主机的匹配
- 文件描述符限制
- 关于DNS和Apache
- 常见问题
- 经常问到的问题
- Apache的SSL/TLS加密
- SSL/TLS高强度加密:绪论
- SSL/TLS高强度加密:兼容性
- SSL/TLS高强度加密:如何...?
- SSL/TLS Strong Encryption: FAQ
- 如何.../指南
- 认证、授权、访问控制
- CGI动态页面
- 服务器端包含入门
- .htaccess文件
- 用户网站目录
- 针对特定平台的说明
- 在Microsoft Windows中使用Apache
- 在Microsoft Windows上编译Apache
- Using Apache With Novell NetWare
- Running a High-Performance Web Server on HPUX
- The Apache EBCDIC Port
- 服务器和支持程序
- httpd - Apache超文本传输协议服务器
- ab - Apache HTTP服务器性能测试工具
- apachectl - Apache HTTP服务器控制接口
- apxs - Apache 扩展工具
- configure - 配置源代码树
- dbmmanage - 管理DBM格式的用户认证文件
- htcacheclean - 清理磁盘缓冲区
- htdbm - 操作DBM密码数据库
- htdigest - 管理用于摘要认证的用户文件
- httxt2dbm - 生成RewriteMap指令使用的dbm文件
- htpasswd - 管理用于基本认证的用户文件
- logresolve - 解析Apache日志中的IP地址为主机名
- rotatelogs - 滚动Apache日志的管道日志程序
- suexec - 在执行外部程序之前切换用户
- 其他程序
- 杂项文档
- 与Apache相关的标准
- Apache模块
- 描述模块的术语
- 描述指令的术语
- Apache核心(Core)特性
- Apache MPM 公共指令
- Apache MPM beos
- Apache MPM event
- Apache MPM netware
- Apache MPM os2
- Apache MPM prefork
- Apache MPM winnt
- Apache MPM worker
- Apache模块 mod_actions
- Apache模块 mod_alias
- Apache模块 mod_asis
- Apache模块 mod_auth_basic
- Apache模块 mod_auth_digest
- Apache模块 mod_authn_alias
- Apache模块 mod_authn_anon
- Apache模块 mod_authn_dbd
- Apache模块 mod_authn_dbm
- Apache模块 mod_authn_default
- Apache模块 mod_authn_file
- Apache模块 mod_authnz_ldap
- Apache模块 mod_authz_dbm
- Apache模块 mod_authz_default
- Apache模块 mod_authz_groupfile
- Apache模块 mod_authz_host
- Apache模块 mod_authz_owner
- Apache模块 mod_authz_user
- Apache模块 mod_autoindex
- Apache模块 mod_cache
- Apache模块 mod_cern_meta
- Apache模块 mod_cgi
- Apache模块 mod_cgid
- Apache模块 mod_charset_lite
- Apache模块 mod_dav
- Apache模块 mod_dav_fs
- Apache模块 mod_dav_lock
- Apache模块 mod_dbd
- Apache模块 mod_deflate
- Apache模块 mod_dir
- Apache模块 mod_disk_cache
- Apache模块 mod_dumpio
- Apache模块 mod_echo
- Apache模块 mod_env
- Apache模块 mod_example
- Apache模块 mod_expires
- Apache模块 mod_ext_filter
- Apache模块 mod_file_cache
- Apache模块 mod_filter
- Apache模块 mod_headers
- Apache模块 mod_ident
- Apache模块 mod_imagemap
- Apache模块 mod_include
- Apache模块 mod_info
- Apache模块 mod_isapi
- Apache模块 mod_ldap
- Apache模块 mod_log_config
- Apache模块 mod_log_forensic
- Apache模块 mod_logio
- Apache模块 mod_mem_cache
- Apache模块 mod_mime
- Apache模块 mod_mime_magic
- Apache模块 mod_negotiation
- Apache模块 mod_nw_ssl
- Apache模块 mod_proxy
- Apache模块 mod_proxy_ajp
- Apache模块 mod_proxy_balancer
- Apache模块 mod_proxy_connect
- Apache模块 mod_proxy_ftp
- Apache模块 mod_proxy_http
- Apache模块 mod_rewrite
- Apache模块 mod_setenvif
- Apache模块 mod_so
- Apache模块 mod_speling
- Apache模块 mod_ssl
- Apache模块 mod_status
- Apache模块 mod_suexec
- Apache模块 mod_unique_id
- Apache模块 mod_userdir
- Apache模块 mod_usertrack
- Apache模块 mod_version
- Apache模块 mod_vhost_alias
- Developer Documentation for Apache 2.0
- Apache 1.3 API notes
- Debugging Memory Allocation in APR
- Documenting Apache 2.0
- Apache 2.0 Hook Functions
- Converting Modules from Apache 1.3 to Apache 2.0
- Request Processing in Apache 2.0
- How filters work in Apache 2.0
- Apache 2.0 Thread Safety Issues
- 词汇和索引
- 词汇表
- 指令索引
- 指令速查
- 模块索引
- 站点导航