[TOC]

> [Reference](https://zhuanlan.zhihu.com/p/355785817)

## Overview

Citus is a PostgreSQL extension that acts as distributed middleware for PostgreSQL. It addresses PostgreSQL's horizontal scaling problem, so that a cluster can hold more data and sustain higher write and query throughput.

## Use cases

### Multi-tenant scenario

* Tenant-scoped queries are fast and scale well under high concurrency.
* The database handles data sharding, transparently to the application.
* The cluster can scale out horizontally to hold more data.
* Scaling out does not sacrifice SQL support.
* Tenant-scoped data analysis is fast.
* More tenants can be supported simply by scaling out.
* Resource isolation is supported at different tenant granularities.

<details>
<summary>SQL</summary>

```
CREATE TABLE companies (
    id bigint NOT NULL,
    name text NOT NULL,
    image_url text,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);

CREATE TABLE campaigns (
    id bigint NOT NULL,
    company_id bigint NOT NULL,
    name text NOT NULL,
    cost_model text NOT NULL,
    state text NOT NULL,
    monthly_budget bigint,
    blacklisted_site_urls text[],
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);

CREATE TABLE ads (
    id bigint NOT NULL,
    company_id bigint NOT NULL,
    campaign_id bigint NOT NULL,
    name text NOT NULL,
    image_url text,
    target_url text,
    impressions_count bigint DEFAULT 0,
    clicks_count bigint DEFAULT 0,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);

-- Every primary key includes the tenant (company) ID.
ALTER TABLE companies ADD PRIMARY KEY (id);
ALTER TABLE campaigns ADD PRIMARY KEY (id, company_id);
ALTER TABLE ads ADD PRIMARY KEY (id, company_id);
```

</details>

Use `create_distributed_table` to turn each table into a distributed table, with the company ID as the sharding key (`id` for `companies`, `company_id` for `campaigns` and `ads`):

```
SELECT create_distributed_table('companies', 'id');
SELECT create_distributed_table('campaigns', 'company_id');
SELECT create_distributed_table('ads', 'company_id');
```

The distributed tables support insert, delete, update, and select just like ordinary tables:

```
INSERT INTO companies
VALUES (5000, 'New Company', 'https://randomurl/image.png', now(), now());

DELETE FROM campaigns WHERE id = 46 AND company_id = 5;

UPDATE campaigns
SET monthly_budget = monthly_budget*2
WHERE company_id = 5;

SELECT name, cost_model, state, monthly_budget
FROM campaigns
WHERE company_id = 5
ORDER BY monthly_budget DESC
LIMIT 10;
```

### Real-time analytics scenario

* Query response time stays sub-second as the data volume grows.
* Live data can be analyzed in real time.
* Queries run in parallel across multiple nodes.
* Scaling out does not sacrifice SQL support.
* Performance scales well under high concurrency.
* PostgreSQL's rich set of data types and extensions is supported.

<details>
<summary>SQL</summary>

```
CREATE TABLE github_events (
    event_id bigint,
    event_type text,
    event_public boolean,
    repo_id bigint,
    payload jsonb,
    repo jsonb,
    user_id bigint,
    org jsonb,
    created_at timestamp
);

CREATE TABLE github_users (
    user_id bigint,
    url text,
    login text,
    avatar_url text,
    gravatar_id text,
    display_login text
);

CREATE INDEX event_type_index ON github_events (event_type);
CREATE INDEX payload_index ON github_events USING GIN (payload jsonb_path_ops);
```

</details>

Use `create_distributed_table` to turn both tables into distributed tables, with `user_id` as the sharding key:

```
SELECT create_distributed_table('github_users', 'user_id');
SELECT create_distributed_table('github_events', 'user_id');
```

The distributed tables support all kinds of analytical queries, just like ordinary tables:

```
SELECT count(*) FROM github_users;

-- Commits per minute, taken from push events
SELECT date_trunc('minute', created_at) AS minute,
       sum((payload->>'distinct_size')::int) AS num_commits
FROM github_events
WHERE event_type = 'PushEvent'
GROUP BY minute
ORDER BY minute;

-- Top 10 users by number of repositories created
SELECT login, count(*)
FROM github_events ge
JOIN github_users gu ON ge.user_id = gu.user_id
WHERE event_type = 'CreateEvent'
  AND payload @> '{"ref_type": "repository"}'
GROUP BY login
ORDER BY count(*) DESC
LIMIT 10;
```

### Scenarios where Citus is not a good fit

* Workloads that a single-node PostgreSQL can already handle; there is no need to distribute them.
* Offline analytics, where real-time query results are not a strong requirement.
* Analytics that do not need to serve a large number of concurrent users.
* ETL queries that need to return large volumes of data.
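
Returning to the multi-tenant schema, a join scoped to a single tenant is a natural query to try once the tables are distributed. The following is only a sketch. It assumes `campaigns` and `ads` were both distributed on `company_id` as shown earlier, so that one tenant's rows are expected to sit on the same node. The tenant ID `5` and the output column names `ad_count` and `total_clicks` are illustrative choices, not part of the original example.

```sql
-- Assumption: campaigns and ads are both distributed on company_id,
-- so the rows of one tenant should be co-located.
SELECT c.name              AS campaign,
       count(a.id)         AS ad_count,      -- illustrative column name
       sum(a.clicks_count) AS total_clicks   -- illustrative column name
FROM campaigns c
JOIN ads a
  ON a.campaign_id = c.id
 AND a.company_id  = c.company_id   -- join on the sharding key as well
WHERE c.company_id = 5              -- illustrative tenant ID
GROUP BY c.name
ORDER BY total_clicks DESC;
```

Including `company_id` in both the join condition and the `WHERE` clause keeps the query tied to a single tenant, which is the access pattern the multi-tenant scenario is built around.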