Understanding EXPLAIN plans · Gitlab 中文文档

# Understanding EXPLAIN plans > 原文：[https://docs.gitlab.com/ee/development/understanding_explain_plans.html](https://docs.gitlab.com/ee/development/understanding_explain_plans.html) * [Nodes](#nodes) * [Node statistics](#node-statistics) * [Node types](#node-types) * [Seq Scan](#seq-scan) * [Index Only Scan](#index-only-scan) * [Index Scan](#index-scan) * [Bitmap Index Scan and Bitmap Heap scan](#bitmap-index-scan-and-bitmap-heap-scan) * [Limit](#limit) * [Sort](#sort) * [Nested Loop](#nested-loop) * [Optimising queries](#optimising-queries) * [Queries that can’t be optimised](#queries-that-cant-be-optimised) * [Cardinality and selectivity](#cardinality-and-selectivity) * [Rewriting queries](#rewriting-queries) * [What makes a bad plan](#what-makes-a-bad-plan) * [Producing query plans](#producing-query-plans) * [Rails console](#rails-console) * [ChatOps](#chatops) * [`#database-lab`](#database-lab) * [Tips & Tricks](#tips--tricks) * [Further reading](#further-reading) # Understanding EXPLAIN plans[](#understanding-explain-plans "Permalink") PostgreSQL 允许您使用`EXPLAIN`命令获得查询计划. 尝试确定查询的执行方式时，此命令非常有用. 您可以在 SQL 查询中直接使用此命令，只要查询以它开头即可： ``` EXPLAIN SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20); ``` 在 GitLab.com 上运行时，将显示以下输出： ``` Aggregate (cost=922411.76..922411.77 rows=1 width=8) -> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0) Filter: (visibility_level = ANY ('{0,20}'::integer[])) ``` 当*仅*使用`EXPLAIN` ，PostgreSQL 实际上不会执行我们的查询，而是会根据可用的统计信息生成一个*估计的*执行计划. 这意味着实际计划可能相差很大. 幸运的是，PostgreSQL 还为我们提供了执行查询的选项. 为此，我们需要使用`EXPLAIN ANALYZE`而不是`EXPLAIN` ： ``` EXPLAIN ANALYZE SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20); ``` 这将产生： ``` Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1) -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1) Filter: (visibility_level = ANY ('{0,20}'::integer[])) Rows Removed by Filter: 65677 Planning time: 2.861 ms Execution time: 3428.596 ms ``` 如我们所见，该计划是完全不同的，并且包含许多数据. 让我们逐步讨论一下. 由于`EXPLAIN ANALYZE`执行查询，因此在使用会写入数据或可能会超时的查询时应`EXPLAIN ANALYZE`小心. 如果查询修改了数据，请考虑将其包装在自动回滚的事务中，如下所示： ``` BEGIN; EXPLAIN ANALYZE DELETE FROM users WHERE id = 1; ROLLBACK; ``` `EXPLAIN`命令还包含其他选项，例如`BUFFERS` ： ``` EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20); ``` 然后将产生： ``` Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1) Buffers: shared hit=208846 -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1) Filter: (visibility_level = ANY ('{0,20}'::integer[])) Rows Removed by Filter: 65677 Buffers: shared hit=208846 Planning time: 2.861 ms Execution time: 3428.596 ms ``` 有关更多信息，请参阅官方的[`EXPLAIN`文档](https://s0www0postgresql0org.icopy.site/docs/current/sql-explain.html)并[使用`EXPLAIN`指南](https://s0www0postgresql0org.icopy.site/docs/current/using-explain.html) . ## Nodes[](#nodes "Permalink") 每个查询计划都由节点组成. 节点可以嵌套，并且可以由内而外执行. 这意味着最内部的节点在外部节点之前执行. 最好将其视为嵌套函数调用，并在展开时返回其结果. 例如，一个计划以`Aggregate`开头，然后是`Nested Loop` ，然后是`Index Only scan`可以认为是以下 Ruby 代码： ``` aggregate( nested_loop( index_only_scan() index_only_scan() ) ) ``` 节点使用`->`表示，后跟所采用的节点类型. 例如： ``` Aggregate (cost=922411.76..922411.77 rows=1 width=8) -> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0) Filter: (visibility_level = ANY ('{0,20}'::integer[])) ``` 这里执行的第一个节点是`Seq scan on projects` . `Filter:`是应用于节点结果的附加过滤器. 过滤器与 Ruby 的`Array#select`非常相似：它接受输入行，应用过滤器，并生成新的行列表. 节点完成后，我们将在其上方执行`Aggregate` . 嵌套节点将如下所示： ``` Aggregate (cost=176.97..176.98 rows=1 width=8) (actual time=0.252..0.252 rows=1 loops=1) Buffers: shared hit=155 -> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1) Buffers: shared hit=155 -> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1) Index Cond: (id < 100) Heap Fetches: 0 -> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36) Index Cond: (id = users_1.id) Heap Fetches: 0 Planning time: 2.585 ms Execution time: 0.310 ms ``` 在这里，我们首先执行两次单独的"仅索引"扫描，然后对这两次扫描的结果执行"嵌套循环". ## Node statistics[](#node-statistics "Permalink") 计划中的每个节点都有一组关联的统计信息，例如成本，产生的行数，执行的循环数等等. 例如： ``` Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0) ``` 在这里，我们可以看到我们的成本范围为`0.00..908044.47` （稍后我们将对此进行介绍），并且我们估计（因为我们使用的是`EXPLAIN`而不是`EXPLAIN ANALYZE` ），因此此节点将产生总计 5,746,914 行. `width`统计信息描述了每行的估计宽度，以字节为单位. `costs`字段指定节点的价格. 成本以查询计划者的成本参数确定的任意单位衡量. 影响成本的因素取决于各种设置，例如`seq_page_cost` ， `cpu_tuple_cost`和其他各种设置. 费用字段的格式如下： ``` STARTUP COST..TOTAL COST ``` The startup cost states how expensive it was to start the node, with the total cost describing how expensive the entire node was. In general: the greater the values, the more expensive the node. 使用`EXPLAIN ANALYZE` ，这些统计信息还将包括实际花费的时间（以毫秒为单位）以及其他运行时统计信息（例如，产生的行的实际数量）： ``` Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1) ``` 在这里我们可以看到估计返回了 5,746,969 行，但实际上，我们返回了 5,746,940 行. 我们还可以看到， *仅*此顺序扫描就花费了 2.98 秒. 使用`EXPLAIN (ANALYZE, BUFFERS)`还将为我们提供有关由过滤器删除的行数，使用的缓冲区数等信息. 例如： ``` Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1) Filter: (visibility_level = ANY ('{0,20}'::integer[])) Rows Removed by Filter: 65677 Buffers: shared hit=208846 ``` 在这里，我们可以看到我们的过滤器必须删除 65,677 行，并使用 208,846 个缓冲区. PostgreSQL 中的每个缓冲区都是 8 KB（8192 字节），这意味着我们上面的节点使用*1.6 GB 的缓冲区* . 好多啊！ ## Node types[](#node-types "Permalink") 有很多不同类型的节点，因此我们在这里仅介绍一些较常见的节点. 可以在[PostgreSQL 源文件`plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h)找到所有可用节点及其描述的完整列表. ### Seq Scan[](#seq-scan "Permalink") 对数据库表（的一部分）进行顺序扫描. 这类似于使用`Array#each` ，但是在数据库表上. 检索大量行时，顺序扫描可能会非常慢，因此，对于大型表，最好避免使用这些扫描. ### Index Only Scan[](#index-only-scan "Permalink") 对不需要从表中获取任何内容的索引进行的扫描. 在某些情况下，仅索引扫描仍可能从表中获取数据，在这种情况下，节点将包含" `Heap Fetches:`统计信息. ### Index Scan[](#index-scan "Permalink") 对索引的扫描，该索引需要从表中检索一些数据. ### Bitmap Index Scan and Bitmap Heap scan[](#bitmap-index-scan-and-bitmap-heap-scan "Permalink") 位图扫描介于顺序扫描和索引扫描之间. 当我们从索引扫描中读取太多数据但执行顺序扫描时读取的数据太少时，通常会使用它们. 位图扫描使用所谓的[位图索引](https://en.wikipedia.org/wiki/Bitmap_index)来执行其工作. [PostgreSQL](https://gitlab.com/postgres/postgres/blob/REL_11_STABLE/src/include/nodes/plannodes.h#L441)的[源代码在位](https://gitlab.com/postgres/postgres/blob/REL_11_STABLE/src/include/nodes/plannodes.h#L441)图扫描中指出以下内容： > 位图索引扫描提供潜在元组位置的位图；它不访问堆本身. 该位图可能由祖先位图堆扫描节点使用，可能是在经过中间位图和和/或位图或节点之后将其与其他位图索引扫描的结果组合在一起的. ### Limit[](#limit "Permalink") 在输入行上应用`LIMIT` . ### Sort[](#sort "Permalink") 使用`ORDER BY`语句对输入行进行排序. ### Nested Loop[](#nested-loop "Permalink") 嵌套循环将针对其前面的节点产生的每一行执行其子节点. 例如： ``` -> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1) Buffers: shared hit=155 -> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1) Index Cond: (id < 100) Heap Fetches: 0 -> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36) Index Cond: (id = users_1.id) Heap Fetches: 0 ``` 在这里，第一个子节点（ `Index Only Scan using users_pkey on users users_1` ）产生 36 行，并执行一次（ `rows=36 loops=1` ）. 下一个节点产生 1 行（ `rows=1` ），但重复 36 次（ `loops=36` ）. 这是因为前一个节点产生了 36 行. 这意味着，如果各个子节点继续产生许多行，则嵌套循环会迅速降低查询速度. ## Optimising queries[](#optimising-queries "Permalink") 顺便说一句，让我们看看如何优化查询. 让我们以以下查询为例： ``` SELECT COUNT(*) FROM users WHERE twitter != ''; ``` 该查询仅计算设置了 Twitter 个人资料的用户数. 让我们使用`EXPLAIN (ANALYZE, BUFFERS)`运行它： ``` EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM users WHERE twitter != ''; ``` 这将产生以下计划： ``` Aggregate (cost=845110.21..845110.22 rows=1 width=8) (actual time=1271.157..1271.158 rows=1 loops=1) Buffers: shared hit=202662 -> Seq Scan on users (cost=0.00..844969.99 rows=56087 width=0) (actual time=0.019..1265.883 rows=51833 loops=1) Filter: ((twitter)::text <> ''::text) Rows Removed by Filter: 2487813 Buffers: shared hit=202662 Planning time: 0.390 ms Execution time: 1271.180 ms ``` 从该查询计划中，我们可以看到以下内容： 1. 我们需要对`users`表执行顺序扫描. 2. 此顺序扫描使用`Filter`过滤掉 2,487,813 行. 3. 我们使用 202,622 个缓冲区，相当于 1.58 GB 的内存. 4. 完成所有这些操作需要 1.2 秒. 考虑到我们只是在计算用户，这是相当昂贵的！在开始进行任何更改之前，让我们看一下`users`表上是否有可以使用的现有索引. 我们可以通过在`psql`控制台中运行`\d users` ，然后向下滚动至`Indexes:`部分来获取此信息： ``` Indexes: "users_pkey" PRIMARY KEY, btree (id) "index_users_on_confirmation_token" UNIQUE, btree (confirmation_token) "index_users_on_email" UNIQUE, btree (email) "index_users_on_reset_password_token" UNIQUE, btree (reset_password_token) "index_users_on_static_object_token" UNIQUE, btree (static_object_token) "index_users_on_unlock_token" UNIQUE, btree (unlock_token) "index_on_users_name_lower" btree (lower(name::text)) "index_users_on_accepted_term_id" btree (accepted_term_id) "index_users_on_admin" btree (admin) "index_users_on_created_at" btree (created_at) "index_users_on_email_trigram" gin (email gin_trgm_ops) "index_users_on_feed_token" btree (feed_token) "index_users_on_group_view" btree (group_view) "index_users_on_incoming_email_token" btree (incoming_email_token) "index_users_on_managing_group_id" btree (managing_group_id) "index_users_on_name" btree (name) "index_users_on_name_trigram" gin (name gin_trgm_ops) "index_users_on_public_email" btree (public_email) WHERE public_email::text <> ''::text "index_users_on_state" btree (state) "index_users_on_state_and_user_type" btree (state, user_type) "index_users_on_unconfirmed_email" btree (unconfirmed_email) WHERE unconfirmed_email IS NOT NULL "index_users_on_user_type" btree (user_type) "index_users_on_username" btree (username) "index_users_on_username_trigram" gin (username gin_trgm_ops) "tmp_idx_on_user_id_where_bio_is_filled" btree (id) WHERE COALESCE(bio, ''::character varying)::text IS DISTINCT FROM ''::text ``` 在这里，我们可以看到`twitter`列上没有索引，这意味着 PostgreSQL 在这种情况下必须执行顺序扫描. 让我们尝试通过添加以下索引来解决此问题： ``` CREATE INDEX CONCURRENTLY twitter_test ON users (twitter); ``` 如果现在使用`EXPLAIN (ANALYZE, BUFFERS)`重新运行查询`EXPLAIN (ANALYZE, BUFFERS)`得到以下计划： ``` Aggregate (cost=61002.82..61002.83 rows=1 width=8) (actual time=297.311..297.312 rows=1 loops=1) Buffers: shared hit=51854 dirtied=19 -> Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1) Filter: ((twitter)::text <> ''::text) Rows Removed by Filter: 2487830 Heap Fetches: 26037 Buffers: shared hit=51854 dirtied=19 Planning time: 0.191 ms Execution time: 297.334 ms ``` 现在获取数据只需不到 300 毫秒，而不是 1.2 秒. 但是，我们仍然使用 51,854 个缓冲区，这大约是 400 MB 的内存. 对于这种简单的查询，300 毫秒的速度也很慢. 要了解为什么此查询仍然昂贵，让我们看一下以下内容： ``` Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1) Filter: ((twitter)::text <> ''::text) Rows Removed by Filter: 2487830 ``` 我们从仅对索引进行索引扫描开始，但是我们仍然以某种方式应用了`Filter` ，该过滤`Filter`可过滤掉 2,487,830 行. 这是为什么？好，让我们看一下如何创建索引： ``` CREATE INDEX CONCURRENTLY twitter_test ON users (twitter); ``` 我们只是简单地告诉 PostgreSQL 索引`twitter`列的所有可能值，甚至是空字符串. 我们的查询反过来使用`WHERE twitter != ''` . 这意味着索引确实可以改善事情，因为我们不需要进行顺序扫描，但是我们仍然可能会遇到空字符串. 这意味着 PostgreSQL *必须*对索引结果应用过滤器以摆脱这些值. 幸运的是，我们可以使用"部分索引"来进一步改善它. 部分索引是在索引数据时具有`WHERE`条件的索引. 例如： ``` CREATE INDEX CONCURRENTLY some_index ON users (email) WHERE id < 100 ``` 该索引仅索引与`WHERE id < 100`相匹配的行的`email`值. 我们可以使用部分索引将我们的 Twitter 索引更改为以下内容： ``` CREATE INDEX CONCURRENTLY twitter_test ON users (twitter) WHERE twitter != ''; ``` 创建后，如果再次运行查询，将得到以下计划： ``` Aggregate (cost=1608.26..1608.27 rows=1 width=8) (actual time=19.821..19.821 rows=1 loops=1) Buffers: shared hit=44036 -> Index Only Scan using twitter_test on users (cost=0.41..1479.71 rows=51420 width=0) (actual time=0.023..15.514 rows=51833 loops=1) Heap Fetches: 1208 Buffers: shared hit=44036 Planning time: 0.123 ms Execution time: 19.848 ms ``` *好多了* ！现在仅需要 20 毫秒即可获取数据，并且我们仅使用约 344 MB 的缓冲区（而不是原始的 1.58 GB）. 之所以可行，是因为现在 PostgreSQL 不再需要应用`Filter` ，因为索引仅包含不为空的`twitter`值. 请记住，每次您要优化查询时，不应该只添加部分索引. 每个索引都必须为每次写入更新，并且它们可能需要相当多的空间，具体取决于索引数据的数量. 结果，首先检查是否存在可以重用的现有索引. 如果没有，请检查是否可以略微更改现有查询以适合现有查询和新查询. 仅当现有索引无法以任何方式使用时，才添加新索引. ## Queries that can’t be optimised[](#queries-that-cant-be-optimised "Permalink") 既然我们已经了解了如何优化查询，让我们看一下可能无法优化的另一个查询： ``` EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20); ``` `EXPLAIN (ANALYZE, BUFFERS)`的输出如下： ``` Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1) Buffers: shared hit=208846 -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1) Filter: (visibility_level = ANY ('{0,20}'::integer[])) Rows Removed by Filter: 65677 Buffers: shared hit=208846 Planning time: 2.861 ms Execution time: 3428.596 ms ``` 查看输出，我们看到以下过滤器： ``` Filter: (visibility_level = ANY ('{0,20}'::integer[])) Rows Removed by Filter: 65677 ``` 查看过滤器删除的行数，我们可能会想在`projects.visibility_level`上添加索引，以某种方式将此顺序扫描+过滤器转换为仅索引扫描. 不幸的是，这样做不可能改善任何事情. 与某些人的看法相反，存在索引*并不能保证* PostgreSQL 会实际使用它. 例如，在执行`SELECT * FROM projects`时，仅扫描整个表而不是使用索引然后从表中获取数据要便宜得多. 在这种情况下，PostgreSQL 可能会决定不使用索引. 其次，让我们考虑一下查询的作用：它使所有项目的可见性级别为 0 或 20.在上面的计划中，我们可以看到生成了很多行（5,745,940），但相对于总行数是多少？通过运行以下查询来找出答案： ``` SELECT visibility_level, count(*) AS amount FROM projects GROUP BY visibility_level ORDER BY visibility_level ASC; ``` 对于 GitLab.com，这将产生： ``` visibility_level | amount ------------------+--------- 0 | 5071325 10 | 65678 20 | 674801 ``` 这里的项目总数为 5,811,804，其中 5,746,126 为 0 或 20 级.这是整个表的 98％！因此，无论我们做什么，此查询都将检索整个表的 98％. 由于大部分时间都花在做到这一点上，因此除了*完全不*运行查询之外，我们几乎没有其他方法可以改进此查询. 这里重要的是，尽管有些人可能建议您在看到顺序扫描时立即直接添加索引，但*更重要的*是首先了解查询的功能，检索的数据量等等. 毕竟，您无法优化您不了解的内容. ### Cardinality and selectivity[](#cardinality-and-selectivity "Permalink") 早先我们看到查询必须检索表中 98％的行. 数据库通常使用两个术语：基数和选择性. 基数是指表中特定列中唯一值的数量. 选择性是相对于总行数，操作（例如索引扫描或过滤器）产生的唯一值的数量. 选择性越高，PostgreSQL 使用索引的可能性就越大. In the above example, there are only 3 unique values: 0, 10, and 20\. This means the cardinality is 3\. The selectivity in turn is also very low: 0.0000003% (2 / 5,811,804), because our `Filter` only filters using two values (`0` and `20`). With such a low selectivity value it’s not surprising that PostgreSQL decides using an index is not worth it, because it would produce almost no unique rows. ## Rewriting queries[](#rewriting-queries "Permalink") 因此，上述查询无法真正按原样进行优化，或者至少没有太多优化. 但是，如果我们稍微改变它的目的怎么办？如果不是检索`visibility_level` 0 或 20 的所有项目，而是检索用户与之交互的那些项目，该怎么办？幸运的是，GitLab 对此有一个答案，它是一个名为`user_interacted_projects`的表. 该表具有以下架构： ``` Table "public.user_interacted_projects" Column | Type | Modifiers ------------+---------+----------- user_id | integer | not null project_id | integer | not null Indexes: "index_user_interacted_projects_on_project_id_and_user_id" UNIQUE, btree (project_id, user_id) "index_user_interacted_projects_on_user_id" btree (user_id) Foreign-key constraints: "fk_rails_0894651f08" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE "fk_rails_722ceba4f7" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE ``` 让我们重写查询以将该表加入到我们的项目中，并获取特定用户的项目： ``` EXPLAIN ANALYZE SELECT COUNT(*) FROM projects INNER JOIN user_interacted_projects ON user_interacted_projects.project_id = projects.id WHERE projects.visibility_level IN (0, 20) AND user_interacted_projects.user_id = 1; ``` 我们在这里做的是以下几点： 1. 获得我们的项目. 2. INNER JOIN `user_interacted_projects` ，这意味着我们只剩下在`user_interacted_projects`中有对应行的`projects`中的行. 3. 将其限制为`visibility_level`为 0 或 20 的项目以及 ID 为 1 的用户与之交互的项目. 如果运行此查询，则会得到以下计划： ``` Aggregate (cost=871.03..871.04 rows=1 width=8) (actual time=9.763..9.763 rows=1 loops=1) -> Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1) -> Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1) Index Cond: (user_id = 1) -> Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145) Index Cond: (id = user_interacted_projects.project_id) Filter: (visibility_level = ANY ('{0,20}'::integer[])) Rows Removed by Filter: 0 Planning time: 2.614 ms Execution time: 9.809 ms ``` 在这里，仅花了不到 10 毫秒的时间即可获取数据. 我们还可以看到我们正在检索的项目要少得多： ``` Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145) Index Cond: (id = user_interacted_projects.project_id) Filter: (visibility_level = ANY ('{0,20}'::integer[])) Rows Removed by Filter: 0 ``` 在这里，我们看到执行了 145 个循环（ `loops=145` ），每个循环产生 1 行（ `rows=1` ）. 这比以前少了很多，我们的查询执行得更好！如果我们看一下计划，我们还会发现我们的成本非常低： ``` Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145) ``` 在这里，我们的成本仅为 3.45，而仅需 0.050 毫秒即可完成. 下一次索引扫描会贵一些： ``` Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1) ``` 此处的成本为 160.71（ `cost=0.43..160.71` ），大约需要 2.5 毫秒（基于`actual time=....`的输出`actual time=....` ）. 这里最昂贵的部分是对这两个索引扫描的结果起作用的"嵌套循环"： ``` Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1) ``` 在这里，我们必须对 203 行（9.748 毫秒）执行 870.52 磁盘页读取，在单个循环中产生 143 行. 这里的主要要点是，有时您必须重写（部分）查询以使其更好. 有时，这意味着必须稍微更改功能以适应更好的性能. ## What makes a bad plan[](#what-makes-a-bad-plan "Permalink") 这是一个很难回答的问题，因为"坏"的定义与您要解决的问题有关. 但是，在大多数情况下最好避免某些模式，例如： * 大表上的顺序扫描 * 删除大量行的过滤器 * 执行某个步骤（例如索引扫描）需要*很多*缓冲区（例如，GitLab.com 大于 512 MB）. 作为一般准则，请针对以下查询： 1. 不超过 10 毫秒. 我们每个请求在 SQL 中花费的目标时间约为 100 毫秒，因此每个查询应尽可能快. 2. 相对于工作负载，不使用过多的缓冲区. 例如，检索十行应该不需要 1 GB 的缓冲区. 3. 不花费大量时间执行磁盘 IO 操作. 必须启用设置`track_io_timing` ，此数据才能包含在`EXPLAIN ANALYZE`的输出中. 4. 在检索行而不对其进行汇总时应用`LIMIT` ，例如`SELECT * FROM users` . 5. 不使用`Filter`过滤掉过多的行，尤其是在查询不使用`LIMIT`限制返回的行数的情况下. 通常可以通过添加（部分）索引来删除过滤器. 这些是*准则* ，不是硬性要求，因为不同的需求可能需要不同的查询. 唯一的*规则*是，您*必须始终*使用`EXPLAIN (ANALYZE, BUFFERS)`和相关工具*来衡量*您的查询（最好使用类似生产的数据库）： * [`explain.depesz.com`](https://explain.depesz.com/). * [`explain.dalibo.com/`](https://explain.dalibo.com/). ## Producing query plans[](#producing-query-plans "Permalink") 有几种获取查询计划输出的方法. 当然，您可以直接在`psql`控制台中运行`EXPLAIN`查询，或者可以遵循以下其他选项之一. ### Rails console[](#rails-console "Permalink") Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze) you can directly generate the query plan from the Rails console: ``` pry(main)> require 'activerecord-explain-analyze' => true pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true) Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600) ↳ (pry):12 => EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600) Seq Scan on public.projects (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1) Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ... Filter: (projects.build_timeout > 3600) Rows Removed by Filter: 14 Buffers: shared hit=2 Planning time: 0.411 ms Execution time: 0.113 ms ``` ### ChatOps[](#chatops "Permalink") [GitLab 员工还可以使用`/chatops` slash 命令在 Slack 中使用我们的 ChatOps 解决方案](chatops_on_gitlabcom.html) . 您可以使用 ChatOps 通过运行以下命令来获取查询计划： ``` /chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20) ``` 还支持使用[https://explain.depesz.com/](https://explain.depesz.com/)可视化计划： ``` /chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20) ``` 不需要查询. 有关可用选项的更多信息，请运行： ``` /chatops run explain --help ``` ### `#database-lab`[](#database-lab "Permalink") GitLab 员工可以使用的另一种工具是由[Joe](https://gitlab.com/postgres-ai/joe)支持的聊天机器人，该机器人使用[Database Lab](https://gitlab.com/postgres-ai/database-lab)立即为开发人员提供他们自己的生产数据库克隆. Joe 在 Slack 的[`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C)频道中可用. 与 ChatOps 不同，它为您提供了一种执行 DDL 语句（如创建索引和表）并获取查询计划的方法，该查询计划不仅适用于`SELECT`而且适用于`UPDATE`和`DELETE` . 例如，为了测试新索引，您可以执行以下操作：创建索引： ``` exec CREATE INDEX index_projects_last_activity ON projects (last_activity_at) WHERE last_activity_at IS NOT NULL ``` 分析表以更新其统计信息： ``` exec ANALYZE projects ``` 获取查询计划： ``` explain SELECT * FROM projects WHERE last_activity_at < CURRENT_DATE ``` 完成后，您可以回滚您的更改： ``` reset ``` 有关可用选项的更多信息，请运行： ``` help ``` #### Tips & Tricks[](#tips--tricks "Permalink") 现在，在整个会话期间都将维护数据库连接，因此您可以对任何会话变量（例如`enable_seqscan`或`work_mem` ）使用`exec set ...` 这些设置将应用于所有后续命令，直到您重置它们. 也可以使用事务. 当您处理修改数据的语句（例如 INSERT，UPDATE 和 DELETE）时，这可能很有用. `explain`命令将执行`EXPLAIN ANALYZE` ，该语句执行该语句. 为了从干净状态开始运行每个`explain` ，您可以将其包装在事务中，例如： ``` exec BEGIN explain UPDATE some_table SET some_column = TRUE exec ROLLBACK ``` ## Further reading[](#further-reading "Permalink") 在[Dalibo.org](https://www.dalibo.com/en/)的[演示中](https://public.dalibo.com/exports/conferences/_archives/_2012/201211_explain/understanding_explain.pdf)可以找到关于理解查询计划的更广泛的指南.