PostgreSQL 高级特性与性能优化实战

PostgreSQL 高级特性与性能优化实战

简介

PostgreSQL 被誉为”开发者最喜爱的数据库”,其强大之处不仅在于基本的 CRUD 操作,更在于丰富的高级特性:部分索引、表达式索引、CTE 递归查询、窗口函数、分区表、并行查询、物化视图、JSON/JSONB 全文搜索等。本文从实战角度出发,深入讲解这些特性的使用场景与性能优化技巧。

前置要求

  • 已安装 PostgreSQL 15+(建议 16 或 17)
  • 基本的 SQL 知识(SELECT、INSERT、UPDATE、CREATE TABLE)
  • 了解数据库索引的基本概念
  • 建议准备一个测试数据库进行实验

目录

  1. 高级索引技巧
  2. CTE 与递归查询
  3. 窗口函数实战
  4. 表分区
  5. JSONB 与全文搜索
  6. 物化视图
  7. 查询性能调优
  8. 常见问题

1. 高级索引技巧

1.1 部分索引(Partial Index)

当查询只关注表中一小部分数据时,部分索引可以大幅减小索引体积。

1
2
3
4
5
6
7
8
9
-- 场景:订单表,99% 的订单是"已完成"状态,只查询"待处理"订单
CREATE INDEX idx_pending_orders ON orders (created_at)
WHERE status = 'pending';

-- 查询自动使用部分索引
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'pending' AND created_at > '2026-01-01';
-- 索引扫描,而非全表扫描

性能对比:

索引类型 大小 查询耗时(1000万行)
普通索引 214 MB 12 ms
部分索引 2.1 MB 0.8 ms

1.2 表达式索引(Expression Index)

对函数或表达式结果建立索引,避免每次查询都进行全表计算。

1
2
3
4
5
6
7
8
9
10
11
12
-- 场景:用户表,经常按邮箱域名查询
CREATE INDEX idx_email_domain ON users (split_part(email, '@', 2));

-- 查询
SELECT * FROM users
WHERE split_part(email, '@', 2) = 'company.com';

-- 场景:按日期范围查询时间戳字段
CREATE INDEX idx_order_date ON orders ((created_at::date));

SELECT * FROM orders
WHERE created_at::date = '2026-06-11';

1.3 覆盖索引(Covering Index / INCLUDE)

将不需要排序但经常 SELECT 的列包含在索引中,实现仅索引扫描(Index-Only Scan)。

1
2
3
4
5
6
-- 场景:经常查询用户的 email 和 status,但只按 id 过滤
CREATE INDEX idx_users_cover ON users (id) INCLUDE (email, status);

-- 以下查询无需回表
SELECT email, status FROM users WHERE id = 42;
-- 输出:Index Only Scan

1.4 复合索引列顺序

1
2
3
4
5
6
7
8
9
10
11
12
-- 规则:等值条件列在前,范围条件列在后
-- ✅ 好:先 status(等值),后 created_at(范围排序)
CREATE INDEX idx_orders_status_created ON orders (status, created_at);

-- ❌ 差:先 created_at,后 status
CREATE INDEX idx_orders_created_status ON orders (created_at, status);

-- 查询
SELECT * FROM orders
WHERE status = 'shipped'
AND created_at BETWEEN '2026-01-01' AND '2026-06-01'
ORDER BY created_at;

1.5 索引维护

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- 查看索引大小
SELECT
indexrelid::regclass AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

-- 重建索引(减少膨胀)
REINDEX INDEX idx_orders_status_created;

-- 并发重建(不阻塞写入)
REINDEX INDEX CONCURRENTLY idx_orders_status_created;

-- 检查索引使用率
SELECT
schemaname,
tablename,
indexname,
idx_scan AS index_scans,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0 -- 从未使用的索引
ORDER BY tablename;

2. CTE 与递归查询

2.1 基础 CTE(WITH 子句)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- 场景:统计每个分类的销售额占比
WITH category_sales AS (
SELECT
c.name,
SUM(oi.quantity * oi.unit_price) AS total_sales
FROM categories c
JOIN products p ON p.category_id = c.id
JOIN order_items oi ON oi.product_id = p.id
JOIN orders o ON o.id = oi.order_id
WHERE o.status = 'completed'
GROUP BY c.name
),
total AS (
SELECT SUM(total_sales) AS grand_total FROM category_sales
)
SELECT
name,
total_sales,
ROUND(total_sales * 100.0 / (SELECT grand_total FROM total), 2) AS percentage
FROM category_sales
ORDER BY total_sales DESC;

2.2 递归 CTE(WITH RECURSIVE)

递归 CTE 是处理树形/层级数据的利器。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
-- 场景:组织架构树(员工-上级关系)
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
manager_id INTEGER REFERENCES employees(id),
department VARCHAR(50)
);

INSERT INTO employees VALUES
(1, 'CEO', NULL, 'Executive'),
(2, 'CTO', 1, 'Engineering'),
(3, 'CFO', 1, 'Finance'),
(4, 'Tech Lead', 2, 'Engineering'),
(5, 'Senior Dev', 4, 'Engineering'),
(6, 'Junior Dev', 5, 'Engineering');

-- 递归查询:查找 CEO 下的所有下属(任意层级)
WITH RECURSIVE org_tree AS (
-- 基础情况:CEO
SELECT id, name, manager_id, 0 AS level, ARRAY[id] AS path
FROM employees
WHERE manager_id IS NULL

UNION ALL

-- 递归步骤:下属
SELECT
e.id,
e.name,
e.manager_id,
ot.level + 1,
ot.path || e.id
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT
level,
repeat(' ', level) || name AS org_chart,
path
FROM org_tree
ORDER BY path;

输出:

1
2
3
4
5
6
7
8
 level |     org_chart      |   path
-------+--------------------+----------
0 | CEO | {1}
1 | CTO | {1,2}
2 | Tech Lead | {1,2,4}
3 | Senior Dev | {1,2,4,5}
4 | Junior Dev | {1,2,4,5,6}
1 | CFO | {1,3}

2.3 递归 CTE 实战:BOM(物料清单)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- 产品结构表
CREATE TABLE bom (
id SERIAL PRIMARY KEY,
parent_id INTEGER REFERENCES bom(id),
part_name VARCHAR(100),
quantity INTEGER
);

-- 查询某个产品的所有子部件及总用量
WITH RECURSIVE bom_explosion AS (
SELECT id, parent_id, part_name, quantity, 1 AS level
FROM bom
WHERE id = 1 -- 根产品

UNION ALL

SELECT
b.id,
b.parent_id,
b.part_name,
b.quantity * be.quantity AS total_quantity,
be.level + 1
FROM bom b
JOIN bom_explosion be ON b.parent_id = be.id
)
SELECT * FROM bom_explosion ORDER BY level, id;

3. 窗口函数实战

窗口函数在不折叠行的情况下进行聚合计算,是数据分析的核心工具。

3.1 排名函数

1
2
3
4
5
6
7
8
9
-- 场景:按部门工资排名
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;
name department salary row_num rank dense_rank
Alice Engineering 150000 1 1 1
Bob Engineering 150000 2 1 1
Carol Engineering 120000 3 3 2

3.2 聚合窗口函数

1
2
3
4
5
6
7
8
9
10
-- 场景:计算销售额的累计和与移动平均
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS cumulative_total,
SUM(amount) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS weekly_moving_sum,
AVG(amount) OVER (ORDER BY order_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS monthly_moving_avg,
amount - AVG(amount) OVER (ORDER BY order_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS deviation_from_avg
FROM daily_sales
ORDER BY order_date;

3.3 窗口帧(Frame)详解

1
2
3
4
5
6
7
8
9
10
11
-- ROWS vs RANGE vs GROUPS
SELECT
order_date,
amount,
-- ROWS:物理行数
AVG(amount) OVER (ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rows_avg,
-- RANGE:逻辑范围(相同 order_date 视为一组)
AVG(amount) OVER (ORDER BY order_date RANGE BETWEEN '2 days' PRECEDING AND CURRENT ROW) AS range_avg,
-- GROUPS:按值分组
AVG(amount) OVER (ORDER BY order_date GROUPS BETWEEN 2 PRECEDING AND CURRENT ROW) AS groups_avg
FROM daily_sales;

3.4 滞后与超前分析

1
2
3
4
5
6
7
8
9
10
11
12
13
-- 场景:计算每日销售额环比
SELECT
order_date,
amount,
LAG(amount, 1) OVER (ORDER BY order_date) AS prev_day_amount,
ROUND(
(amount - LAG(amount, 1) OVER (ORDER BY order_date))
/ NULLIF(LAG(amount, 1) OVER (ORDER BY order_date), 0) * 100,
2
) AS day_over_day_change_pct,
LEAD(amount, 1) OVER (ORDER BY order_date) AS next_day_amount
FROM daily_sales
ORDER BY order_date;

3.5 分组内百分比

1
2
3
4
5
6
7
8
9
10
11
-- 场景:每个员工在部门内的薪资占比
SELECT
name,
department,
salary,
ROUND(
salary * 100.0 / SUM(salary) OVER (PARTITION BY department),
2
) AS pct_of_dept
FROM employees
ORDER BY department, pct_of_dept DESC;

4. 表分区

4.1 范围分区(Range Partitioning)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
-- 场景:日志表按月分区
CREATE TABLE logs (
id BIGSERIAL,
level VARCHAR(10),
message TEXT,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);

-- 创建分区
CREATE TABLE logs_2026_01 PARTITION OF logs
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE logs_2026_02 PARTITION OF logs
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

CREATE TABLE logs_2026_03 PARTITION OF logs
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

-- 创建默认分区(捕获超出范围的数据)
CREATE TABLE logs_default PARTITION OF logs DEFAULT;

-- 插入数据自动路由到对应分区
INSERT INTO logs (level, message, created_at)
VALUES ('ERROR', 'Disk space low', '2026-02-15 10:30:00');
-- 自动写入 logs_2026_02 分区

4.2 列表分区(List Partitioning)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
CREATE TABLE orders (
id BIGSERIAL,
region VARCHAR(20),
amount DECIMAL(10,2),
order_date DATE
) PARTITION BY LIST (region);

CREATE TABLE orders_na PARTITION OF orders
FOR VALUES IN ('US', 'CA', 'MX');

CREATE TABLE orders_eu PARTITION OF orders
FOR VALUES IN ('UK', 'DE', 'FR', 'IT');

CREATE TABLE orders_asia PARTITION OF orders
FOR VALUES IN ('CN', 'JP', 'KR', 'SG');

4.3 哈希分区(Hash Partitioning)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
CREATE TABLE user_sessions (
user_id BIGINT,
session_data JSONB,
login_at TIMESTAMP
) PARTITION BY HASH (user_id);

CREATE TABLE user_sessions_0 PARTITION OF user_sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE user_sessions_1 PARTITION OF user_sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE user_sessions_2 PARTITION OF user_sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE user_sessions_3 PARTITION OF user_sessions
FOR VALUES WITH (MODULUS 4, REMAINDER 3);

4.4 分区维护

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- 查看分区结构
SELECT
parent.relname AS parent_table,
child.relname AS partition_name,
pg_get_expr(child.relpartbound, child.oid) AS partition_boundary
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
WHERE parent.relname = 'logs';

-- 分离旧分区(比 DELETE 快得多)
ALTER TABLE logs DETACH PARTITION logs_2026_01;

-- 将分离的分区转为独立表
ALTER TABLE logs_2026_01 SET SCHEMA archive;

-- 添加新分区
CREATE TABLE logs_2026_04 PARTITION OF logs
FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');

4.5 分区剪枝验证

1
2
3
4
5
-- 验证查询是否只扫描了相关分区
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM logs
WHERE created_at >= '2026-02-01' AND created_at < '2026-03-01';
-- 输出应显示只扫描了 logs_2026_02 分区

5. JSONB 与全文搜索

5.1 JSONB 索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- 创建测试表
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200),
attributes JSONB
);

INSERT INTO products (name, attributes) VALUES
('Laptop', '{"brand": "Apple", "specs": {"cpu": "M4", "ram": "32GB", "storage": "1TB"}, "colors": ["Silver", "Space Gray"], "price": 1999}'),
('Phone', '{"brand": "Samsung", "specs": {"cpu": "Exynos", "ram": "12GB", "storage": "256GB"}, "colors": ["Black", "White"], "price": 999}');

-- GIN 索引(JSONB 的通用索引)
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

-- 常用查询模式
-- 1. 存在性检查
SELECT * FROM products WHERE attributes ? 'warranty';

-- 2. 键值匹配
SELECT * FROM products WHERE attributes @> '{"brand": "Apple"}';

-- 3. 路径查询
SELECT * FROM products
WHERE attributes -> 'specs' ->> 'ram' = '32GB';

-- 4. 嵌套路径匹配
SELECT * FROM products
WHERE attributes @> '{"specs": {"cpu": "M4"}}';

-- 5. 数组包含
SELECT * FROM products
WHERE attributes -> 'colors' ? 'Silver';

5.2 JSONB 路径查询(PgSQL 12+)

1
2
3
4
5
6
7
8
9
10
11
-- 使用 SQL/JSON 路径表达式
SELECT * FROM products
WHERE attributes @? '$.specs.cpu';

SELECT * FROM products
WHERE attributes @@ '$.specs.ram == "32GB"';

SELECT
name,
jsonb_path_query(attributes, '$.colors[*]') AS color
FROM products;

5.3 全文搜索(Full-Text Search)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- 创建全文搜索向量列
ALTER TABLE products ADD COLUMN search_vector TSVECTOR;

-- 生成搜索向量(合并多个字段)
UPDATE products SET
search_vector = to_tsvector('english',
coalesce(name, '') || ' ' ||
coalesce(attributes->>'brand', '')
);

-- 创建 GIN 索引
CREATE INDEX idx_products_search ON products USING GIN (search_vector);

-- 自动维护(使用触发器)
CREATE FUNCTION products_search_update() RETURNS TRIGGER AS $$
BEGIN
NEW.search_vector := to_tsvector('english',
coalesce(NEW.name, '') || ' ' ||
coalesce(NEW.attributes->>'brand', '')
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_products_search
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION products_search_update();

-- 搜索查询
SELECT name, attributes
FROM products
WHERE search_vector @@ to_tsquery('english', 'apple & laptop');

-- 带排名的搜索
SELECT
name,
ts_rank(search_vector, to_tsquery('english', 'apple & laptop')) AS rank
FROM products
WHERE search_vector @@ to_tsquery('english', 'apple & laptop')
ORDER BY rank DESC;

5.4 混合搜索(JSONB + 全文搜索)

1
2
3
4
5
6
-- 场景:电商产品搜索,同时过滤属性和搜索文本
SELECT *
FROM products
WHERE search_vector @@ to_tsquery('english', 'laptop')
AND attributes @> '{"specs": {"ram": "32GB"}}'
ORDER BY ts_rank(search_vector, to_tsquery('english', 'laptop')) DESC;

6. 物化视图

物化视图将查询结果物理存储,适合计算密集但变更不频繁的场景。

6.1 创建与刷新

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- 创建物化视图:每日销售报表
CREATE MATERIALIZED VIEW daily_sales_report AS
SELECT
DATE(o.created_at) AS order_date,
COUNT(DISTINCT o.id) AS order_count,
COUNT(DISTINCT o.user_id) AS unique_customers,
SUM(oi.quantity * oi.unit_price) AS total_revenue,
SUM(oi.quantity * oi.unit_price - oi.quantity * p.cost_price) AS total_profit,
AVG(oi.quantity * oi.unit_price) AS avg_order_value
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.status = 'completed'
GROUP BY DATE(o.created_at)
ORDER BY order_date DESC;

-- 创建索引(物化视图支持索引)
CREATE UNIQUE INDEX idx_msr_date ON daily_sales_report (order_date);
CREATE INDEX idx_msr_revenue ON daily_sales_report (total_revenue DESC);

-- 刷新物化视图
REFRESH MATERIALIZED VIEW daily_sales_report;

-- 并发刷新(不阻塞读取)
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_report;
-- 注意:CONCURRENTLY 要求物化视图必须有唯一索引

6.2 物化视图 vs 普通视图

特性 普通视图 物化视图
数据存储 不存储,每次查询执行 物理存储
查询速度 慢(每次重新计算) 极快(直接读取)
数据实时性 实时 取决于刷新频率
索引支持
占用空间 0 取决于数据量
适用场景 简单封装 复杂聚合报表

6.3 自动刷新策略

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- 方案1:使用 pg_cron 扩展定时刷新
-- 需要先安装 pg_cron 扩展
SELECT cron.schedule(
'refresh-daily-sales',
'0 3 * * *', -- 每天凌晨 3 点
$$REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_report$$
);

-- 方案2:使用触发器 + 计数器
CREATE TABLE refresh_tracker (
view_name TEXT PRIMARY KEY,
last_refreshed TIMESTAMP DEFAULT NOW(),
refresh_interval INTERVAL DEFAULT '1 hour'
);

-- 查询时检查是否需要刷新
SELECT * FROM daily_sales_report
WHERE EXISTS (
SELECT 1 FROM refresh_tracker
WHERE view_name = 'daily_sales_report'
AND last_refreshed + refresh_interval < NOW()
);

7. 查询性能调优

7.1 使用 EXPLAIN ANALYZE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- 基础用法
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 100;

-- 输出解读关键指标
/*
Seq Scan on orders (cost=0.00..43210.00 rows=5000 width=120)
(actual time=0.12..123.45 rows=4567 loops=1)
Filter: (status = 'pending')
Rows Removed by Filter: 995433
Planning Time: 0.45 ms
Execution Time: 124.67 ms
*/
-- 看到 Seq Scan + Rows Removed 说明需要索引

7.2 常用性能诊断查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- 慢查询(运行时间最长的前 10 个)
SELECT
queryid,
ROUND(total_exec_time::numeric, 2) AS total_ms,
calls,
ROUND(mean_exec_time::numeric, 2) AS avg_ms,
ROUND(shared_blks_hit * 100.0 / NULLIF(shared_blks_hit + shared_blks_read, 0), 2) AS cache_hit_ratio,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- 表大小与膨胀率
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
pg_size_pretty(pg_table_size(schemaname||'.'||tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname||'.'||tablename)) AS index_size,
ROUND(100 * pg_stat_get_live_tuples(tablename::regclass) /
NULLIF(pg_stat_get_live_tuples(tablename::regclass) + pg_stat_get_dead_tuples(tablename::regclass), 0), 2) AS live_tuple_pct
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- 索引使用率
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC
LIMIT 20;

7.3 配置优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- 查看当前配置
SHOW shared_buffers;
SHOW work_mem;
SHOW effective_cache_size;
SHOW maintenance_work_mem;
SHOW random_page_cost;

-- 推荐配置(16GB 内存服务器)
-- postgresql.conf
shared_buffers = '4GB' -- 总内存的 25%
effective_cache_size = '12GB' -- 总内存的 75%
work_mem = '64MB' -- 每个排序操作的内存
maintenance_work_mem = '1GB' -- VACUUM/INDEX 操作
random_page_cost = 1.1 -- SSD 设置为 1.1,HDD 默认 4.0
effective_io_concurrency = 200 -- SSD 设置 200
wal_buffers = '64MB'
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

7.4 并行查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
-- 允许并行查询
SET max_parallel_workers_per_gather = 4;

-- 并行顺序扫描
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*), status FROM large_orders
GROUP BY status;
-- 输出应显示 "Workers Planned: 4" 和 "Workers Launched: 4"

-- 并行聚合
EXPLAIN (ANALYZE)
SELECT department, AVG(salary), COUNT(*)
FROM employees
GROUP BY department;

8. 常见问题

Q1:查询使用了索引但仍然很慢?

1
2
3
4
5
6
7
8
9
10
-- 检查是否进行了不必要的回表
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';
-- 如果显示 Bitmap Heap Scan + Recheck Cond,说明选择性不够高

-- 解决方案
-- 1. 创建覆盖索引
CREATE INDEX idx_orders_covering ON orders (status) INCLUDE (id, created_at, amount);
-- 2. 创建部分索引(如果只查特定状态)
CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending';

Q2:VACUUM 相关

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- 查看是否需要 VACUUM
SELECT
relname,
n_dead_tup,
n_live_tup,
ROUND(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

-- 手动 VACUUM
VACUUM ANALYZE orders;

-- 查看 autovacuum 配置
SHOW autovacuum_vacuum_threshold;
SHOW autovacuum_vacuum_scale_factor;

Q3:死锁如何排查?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- 查看当前锁
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_locks.pid = blocked_activity.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_locks.pid = blocking_activity.pid
WHERE NOT blocked_locks.granted;

-- 终止阻塞进程(谨慎使用)
SELECT pg_terminate_backend(blocking_pid);

Q4:如何快速复制大表?

1
2
3
4
5
6
7
8
9
10
11
12
-- 使用 CREATE TABLE AS(比 INSERT INTO ... SELECT 更快)
CREATE TABLE orders_backup AS SELECT * FROM orders;

-- 只复制结构
CREATE TABLE orders_empty (LIKE orders INCLUDING ALL);

-- 并行复制(利用多个 worker)
INSERT INTO orders_copy
SELECT * FROM orders
WHERE ctid IN (
SELECT ctid FROM orders ORDER BY ctid
);

Q5:连接池配置建议?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- 查看当前连接数
SELECT count(*) FROM pg_stat_activity;

-- 推荐使用 PgBouncer 连接池
-- pgbouncer.ini 配置示例
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 200
default_pool_size = 25

Q6:如何备份与恢复?

1
2
3
4
5
6
7
8
9
10
11
# 逻辑备份
pg_dump -h localhost -U postgres -d mydb -F c -f mydb.dump

# 压缩备份
pg_dump -h localhost -U postgres -d mydb -F c -Z 9 -f mydb.dump

# 并行备份(多表并行)
pg_dump -h localhost -U postgres -d mydb -j 4 -F d -f /backup/mydb/

# 恢复
pg_restore -h localhost -U postgres -d mydb -j 4 mydb.dump

总结

本文覆盖了 PostgreSQL 高级特性中日常开发最常用的核心能力:

特性 核心命令 典型场景
部分索引 CREATE INDEX ... WHERE 过滤低频状态数据
表达式索引 CREATE INDEX ... ON func(col) 函数/类型转换查询
递归 CTE WITH RECURSIVE 树形结构、BOM、组织架构
窗口函数 ROW_NUMBER() OVER (PARTITION BY) 排名、移动平均、环比
表分区 PARTITION BY RANGE/LIST/HASH 日志、时序数据
JSONB @>, ?, jsonb_path_query 灵活 Schema、配置存储
全文搜索 to_tsvector, to_tsquery 产品搜索、内容检索
物化视图 CREATE MATERIALIZED VIEW 报表、聚合查询
并行查询 max_parallel_workers_per_gather 大表聚合、分析查询

建议在日常开发中从以下顺序逐步引入高级特性:

  1. 先加索引:部分索引 + 覆盖索引,性价比最高
  2. 再优化查询:窗口函数替代子查询,CTE 替代复杂 JOIN
  3. 大表做分区:超过 1000 万行考虑分区
  4. 缓存结果:物化视图加速报表
  5. 调优配置:根据服务器硬件调整 PG 参数