跳到主要内容

CTE 公共表表达式

问题

什么是 CTE(WITH 子句)?递归 CTE 怎么用?CTE 和子查询有什么区别?

答案

CTE(Common Table Expression,公共表表达式) 是 SQL 中用 WITH 关键字定义的临时命名结果集。你可以把它理解为"给子查询起个名字,让 SQL 更容易读懂"。

CTE 不会创建临时表,也不会持久化数据——它只在当前查询中有效。


基本语法

WITH cte_name AS (
-- 查询语句
SELECT ...
)
SELECT * FROM cte_name;

对比:子查询 vs CTE

-- ❌ 子查询嵌套,阅读困难
SELECT * FROM (
SELECT department, AVG(salary) AS avg_sal FROM (
SELECT * FROM employees WHERE salary > 10000
) t1
GROUP BY department
) t2
WHERE avg_sal > 15000;

-- ✅ CTE 分步骤,清晰易读
WITH high_salary AS (
SELECT * FROM employees WHERE salary > 10000
),
dept_avg AS (
SELECT department, AVG(salary) AS avg_sal
FROM high_salary
GROUP BY department
)
SELECT * FROM dept_avg WHERE avg_sal > 15000;

CTE 的优势一目了然:

  1. 可读性:复杂查询分解为多个有意义的步骤
  2. 可复用:同一个 CTE 可以在后续查询中多次引用
  3. 可维护:修改逻辑时只需修改对应的 CTE

多个 CTE

可以定义多个 CTE,用逗号分隔。后面的 CTE 可以引用前面的 CTE。

WITH 
-- 步骤1:计算每个用户的订单汇总
user_orders AS (
SELECT
user_id,
COUNT(*) AS order_count,
SUM(amount) AS total_amount
FROM orders
GROUP BY user_id
),
-- 步骤2:对用户分层
user_segments AS (
SELECT *,
CASE
WHEN total_amount >= 10000 THEN '高价值'
WHEN total_amount >= 5000 THEN '中等价值'
ELSE '低价值'
END AS segment
FROM user_orders
)
-- 步骤3:统计每个层级的用户数
SELECT segment, COUNT(*) AS user_count, AVG(total_amount) AS avg_amount
FROM user_segments
GROUP BY segment;

递归 CTE

递归 CTE 是 CTE 最强大的特性——它可以引用自身,实现递归查询。最常见的场景:

  • 查询组织架构的上下级关系
  • 遍历树形结构(分类、评论回复、文件目录)
  • 生成连续的日期/数字序列

语法结构

WITH RECURSIVE cte_name AS (
-- 基础部分(Base Case):递归的起点
SELECT ...
UNION ALL
-- 递归部分:引用 CTE 自身
SELECT ... FROM cte_name WHERE 终止条件
)
SELECT * FROM cte_name;
MySQL 注意

MySQL 8.0+ 支持递归 CTE,但关键字是 WITH RECURSIVE。PostgreSQL 同样使用 WITH RECURSIVE

示例 1:组织架构查询

-- 员工表包含上级关系
-- | id | name | manager_id |
-- |----|--------|------------|
-- | 1 | CEO | NULL |
-- | 2 | VP技术 | 1 |
-- | 3 | VP市场 | 1 |
-- | 4 | 张三 | 2 |
-- | 5 | 李四 | 2 |
-- | 6 | 王五 | 3 |

-- 查找 CEO(id=1)的所有下属(包含间接下属)
WITH RECURSIVE subordinates AS (
-- 起点:CEO 本人
SELECT id, name, manager_id, 0 AS level
FROM employees WHERE id = 1

UNION ALL

-- 递归:找到当前层级的所有下属
SELECT e.id, e.name, e.manager_id, s.level + 1
FROM employees e
JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;

结果:

idnamemanager_idlevel
1CEONULL0
2VP技术11
3VP市场11
4张三22
5李四22
6王五32

示例 2:生成日期序列

数据分析中经常需要一个连续的日期列表,用来和实际数据 LEFT JOIN,确保没有数据的日期也能显示为 0。

-- 生成 2024-01-01 到 2024-01-31 的日期序列
WITH RECURSIVE date_series AS (
SELECT DATE('2024-01-01') AS dt
UNION ALL
SELECT DATE_ADD(dt, INTERVAL 1 DAY) FROM date_series WHERE dt < '2024-01-31'
)
SELECT dt FROM date_series;

-- 实际应用:每日注册人数(包含 0 注册的日期)
WITH RECURSIVE date_series AS (
SELECT DATE('2024-01-01') AS dt
UNION ALL
SELECT DATE_ADD(dt, INTERVAL 1 DAY) FROM date_series WHERE dt < '2024-01-31'
)
SELECT d.dt, COALESCE(COUNT(u.id), 0) AS registrations
FROM date_series d
LEFT JOIN users u ON DATE(u.created_at) = d.dt
GROUP BY d.dt
ORDER BY d.dt;

示例 3:查找分类的所有子分类

-- 分类表(自引用)
WITH RECURSIVE category_tree AS (
-- 从顶级分类开始
SELECT id, name, parent_id, name AS path
FROM categories WHERE parent_id IS NULL

UNION ALL

-- 递归查找子分类
SELECT c.id, c.name, c.parent_id,
CONCAT(ct.path, ' > ', c.name) AS path
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree ORDER BY path;

结果示例:

idnamepath
1电子产品电子产品
2手机电子产品 > 手机
5智能手机电子产品 > 手机 > 智能手机
3电脑电子产品 > 电脑

CTE 的性能考量

CTE 被内联还是物化?

-- 当 CTE 只被引用一次时,大多数数据库会将其"内联"到主查询
WITH filtered AS (
SELECT * FROM orders WHERE status = 'completed'
)
SELECT * FROM filtered WHERE amount > 100;
-- 等价于:SELECT * FROM orders WHERE status = 'completed' AND amount > 100;

-- 当 CTE 被引用多次时,行为因数据库而异
WITH top_users AS (
SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id ORDER BY total DESC LIMIT 100
)
SELECT a.user_id, b.user_id
FROM top_users a
CROSS JOIN top_users b -- 引用了两次
WHERE a.total > b.total;
数据库单次引用多次引用
MySQL 8.0+内联优化每次引用独立执行
PostgreSQL 12+内联优化(默认)默认内联,可用 MATERIALIZED 强制物化
SQL Server内联内联(每次独立执行)

何时 CTE 比子查询慢?

CTE 在某些情况下可能阻碍优化器的优化:

-- ❌ CTE 可能阻止索引下推
WITH all_orders AS (
SELECT * FROM orders -- 全表扫描
)
SELECT * FROM all_orders WHERE user_id = 123;
-- 在某些数据库中,优化器不会把 WHERE 条件推入 CTE 内部

-- ✅ 直接查询,优化器可以直接使用 user_id 上的索引
SELECT * FROM orders WHERE user_id = 123;
性能建议
  • CTE 主要用于可读性,不适合作为性能优化手段
  • 如果 CTE 被多次引用且计算量大,在 PostgreSQL 中考虑使用 MATERIALIZED
  • 复杂查询先写 CTE 版本确保正确,然后用 EXPLAIN 分析是否需要改写

数据分析实战场景

场景一:漏斗分析(用 CTE 分步骤计算)

-- 电商购物漏斗:浏览 → 加购 → 下单 → 支付
WITH funnel AS (
-- 步骤 1:浏览
SELECT 'view' AS step, 1 AS step_order, COUNT(DISTINCT user_id) AS users
FROM events WHERE event_type = 'view' AND event_date = '2024-01-15'
UNION ALL
-- 步骤 2:加购
SELECT 'add_cart', 2, COUNT(DISTINCT user_id)
FROM events WHERE event_type = 'add_cart' AND event_date = '2024-01-15'
UNION ALL
-- 步骤 3:下单
SELECT 'order', 3, COUNT(DISTINCT user_id)
FROM events WHERE event_type = 'order' AND event_date = '2024-01-15'
UNION ALL
-- 步骤 4:支付
SELECT 'pay', 4, COUNT(DISTINCT user_id)
FROM events WHERE event_type = 'pay' AND event_date = '2024-01-15'
)
SELECT
step,
users,
LAG(users) OVER (ORDER BY step_order) AS prev_step_users,
ROUND(users * 100.0 / FIRST_VALUE(users) OVER (ORDER BY step_order), 2) AS overall_rate,
ROUND(users * 100.0 / LAG(users) OVER (ORDER BY step_order), 2) AS step_rate
FROM funnel;

场景二:月度同环比报表

WITH monthly AS (
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(amount) AS revenue,
COUNT(DISTINCT user_id) AS buyers
FROM orders
WHERE order_date >= '2023-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
),
with_comparison AS (
SELECT
month,
revenue,
buyers,
-- 环比
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month)) * 100.0 /
NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0), 2) AS mom_pct,
-- 同比
LAG(revenue, 12) OVER (ORDER BY month) AS yoy_revenue,
ROUND((revenue - LAG(revenue, 12) OVER (ORDER BY month)) * 100.0 /
NULLIF(LAG(revenue, 12) OVER (ORDER BY month), 0), 2) AS yoy_pct
FROM monthly
)
SELECT * FROM with_comparison ORDER BY month DESC;

场景三:递归 CTE 生成日历表

-- 生成一个完整的日历,标注节假日和工作日
WITH RECURSIVE calendar AS (
SELECT DATE('2024-01-01') AS dt
UNION ALL
SELECT DATE_ADD(dt, INTERVAL 1 DAY) FROM calendar WHERE dt < '2024-12-31'
)
SELECT
dt,
DAYNAME(dt) AS day_name,
WEEKDAY(dt) AS weekday, -- 0=周一, 6=周日
CASE
WHEN WEEKDAY(dt) IN (5, 6) THEN '周末'
ELSE '工作日'
END AS day_type,
-- 按月汇总
DATE_FORMAT(dt, '%Y-%m') AS month
FROM calendar;

场景四:用 CTE 实现分层计算

-- 需求:新用户首单分析
-- 步骤 1 → 2 → 3 层层递进
WITH
-- 第一层:找到每个用户的首单日期
first_orders AS (
SELECT user_id, MIN(order_date) AS first_order_date
FROM orders
GROUP BY user_id
),
-- 第二层:拿到首单的详细信息
first_order_details AS (
SELECT o.*
FROM orders o
JOIN first_orders fo ON o.user_id = fo.user_id AND o.order_date = fo.first_order_date
),
-- 第三层:按渠道统计首单指标
channel_stats AS (
SELECT
channel,
COUNT(*) AS new_buyers,
ROUND(AVG(amount), 2) AS avg_first_order,
ROUND(SUM(amount), 2) AS total_first_orders
FROM first_order_details
GROUP BY channel
)
SELECT *,
ROUND(new_buyers * 100.0 / SUM(new_buyers) OVER (), 2) AS pct
FROM channel_stats
ORDER BY new_buyers DESC;

CTE vs 子查询 vs 临时表

对比项CTE子查询临时表
可读性✅ 很好❌ 嵌套深时难读✅ 好
复用性✅ 同一查询内可多次引用❌ 需要重复写✅ 整个会话可用
递归✅ 支持❌ 不支持❌ 需要循环
持久化❌ 只在当前语句有效❌ 只在当前语句有效✅ 会话内持久
性能取决于优化器取决于优化器已物化为真实表

常见面试问题

Q1: CTE 有什么优势?什么时候用 CTE?

答案

CTE 的主要优势:

  1. 可读性:将复杂查询分解为清晰的步骤
  2. 可复用:同一个 CTE 在查询中可多次引用
  3. 递归能力:处理层级关系和序列生成

使用场景:

  • 复杂查询需要分步骤处理
  • 同一个子查询在多处被引用
  • 需要递归查询(树形结构、连续日期等)

Q2: 递归 CTE 会不会死循环?怎么防止?

答案

会!如果数据有循环引用(如 A→B→C→A),递归 CTE 会无限循环。

防止方法:

-- 方法 1:MySQL 有默认递归深度限制(默认1000)
SET cte_max_recursion_depth = 100;

-- 方法 2:在递归条件中加上深度限制
WITH RECURSIVE cte AS (
SELECT id, name, 0 AS depth FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, cte.depth + 1
FROM categories c
JOIN cte ON c.parent_id = cte.id
WHERE cte.depth < 10 -- 最多递归 10 层
)
SELECT * FROM cte;

Q3: CTE 是否会被物化(Materialized)?

答案

取决于数据库:

  • MySQL:CTE 不会被物化,优化器会将其视为内联子查询
  • PostgreSQL 12+:默认不物化,但可以用 MATERIALIZED 关键字强制物化
-- PostgreSQL:强制物化
WITH cte AS MATERIALIZED (
SELECT * FROM large_table WHERE ...
)
SELECT * FROM cte a JOIN cte b ON ...; -- cte 只计算一次

当 CTE 被多次引用时,物化可以避免重复计算。

Q4: 面试题——用递归 CTE 实现斐波那契数列

答案

WITH RECURSIVE fib AS (
SELECT 1 AS n, 1 AS val, 0 AS prev_val
UNION ALL
SELECT n + 1, val + prev_val, val
FROM fib
WHERE n < 10
)
SELECT n, val AS fibonacci FROM fib;
nfibonacci
11
21
32
43
55
68
713
821
934
1055

Q5: CTE 和视图(View)有什么区别?

答案

对比CTEView
生命周期只在当前 SQL 语句中有效持久存储在数据库中
可复用范围当前查询所有查询都可以引用
递归✅ 支持 RECURSIVE❌ 不支持
创建方式WITH ... AS (...)CREATE VIEW ... AS ...
权限不需要额外权限需要 CREATE VIEW 权限
适用场景临时性、一次性的复杂查询经常被复用的查询逻辑

Q6: 面试题——用递归 CTE 查找某节点的所有上级

答案

-- 从叶子节点向上查找所有祖先
WITH RECURSIVE ancestors AS (
-- 起点:从指定员工开始
SELECT id, name, manager_id, 0 AS level
FROM employees WHERE id = 5 -- 李四

UNION ALL

-- 递归:向上查找上级
SELECT e.id, e.name, e.manager_id, a.level + 1
FROM employees e
JOIN ancestors a ON e.id = a.manager_id
)
SELECT * FROM ancestors;
-- 结果:李四 → VP技术 → CEO

Q7: UNION ALL vs UNION 在 CTE 中的选择?

答案

递归 CTE 中必须用 UNION ALL。非递归 CTE 中两者都可以。

-- ✅ 递归 CTE 用 UNION ALL
WITH RECURSIVE cte AS (
SELECT 1 AS n
UNION ALL -- 必须是 UNION ALL
SELECT n + 1 FROM cte WHERE n < 10
)
SELECT * FROM cte;

-- 非递归 CTE 中可以用 UNION 去重
WITH combined AS (
SELECT user_id FROM table_a
UNION -- 自动去重
SELECT user_id FROM table_b
)
SELECT * FROM combined;

Q8: 面试题——用 CTE 实现层级缩进展示

答案

-- 展示组织架构的层级缩进效果
WITH RECURSIVE org_tree AS (
SELECT id, name, manager_id, 0 AS level, CAST(name AS CHAR(200)) AS display
FROM employees WHERE manager_id IS NULL

UNION ALL

SELECT e.id, e.name, e.manager_id, o.level + 1,
CAST(CONCAT(REPEAT(' ', o.level + 1), '├── ', e.name) AS CHAR(200))
FROM employees e
JOIN org_tree o ON e.manager_id = o.id
)
SELECT display FROM org_tree ORDER BY level, name;

输出:

CEO
├── VP市场
├── 王五
├── VP技术
├── 张三
├── 李四

相关链接