CTE 公共表表达式
问题
什么是 CTE(WITH 子句)?递归 CTE 怎么用?CTE 和子查询有什么区别?
答案
CTE(Common Table Expression,公共表表达式) 是 SQL 中用 WITH 关键字定义的临时命名结果集。你可以把它理解为"给子查询起个名字,让 SQL 更容易读懂"。
CTE 不会创建临时表,也不会持久化数据——它只在当前查询中有效。
基本语法
WITH cte_name AS (
-- 查询语句
SELECT ...
)
SELECT * FROM cte_name;
对比:子查询 vs CTE
-- ❌ 子查询嵌套,阅读困难
SELECT * FROM (
SELECT department, AVG(salary) AS avg_sal FROM (
SELECT * FROM employees WHERE salary > 10000
) t1
GROUP BY department
) t2
WHERE avg_sal > 15000;
-- ✅ CTE 分步骤,清晰易读
WITH high_salary AS (
SELECT * FROM employees WHERE salary > 10000
),
dept_avg AS (
SELECT department, AVG(salary) AS avg_sal
FROM high_salary
GROUP BY department
)
SELECT * FROM dept_avg WHERE avg_sal > 15000;
CTE 的优势一目了然:
- 可读性:复杂查询分解为多个有意义的步骤
- 可复用:同一个 CTE 可以在后续查询中多次引用
- 可维护:修改逻辑时只需修改对应的 CTE
多个 CTE
可以定义多个 CTE,用逗号分隔。后面的 CTE 可以引用前面的 CTE。
WITH
-- 步骤1:计算每个用户的订单汇总
user_orders AS (
SELECT
user_id,
COUNT(*) AS order_count,
SUM(amount) AS total_amount
FROM orders
GROUP BY user_id
),
-- 步骤2:对用户分层
user_segments AS (
SELECT *,
CASE
WHEN total_amount >= 10000 THEN '高价值'
WHEN total_amount >= 5000 THEN '中等价值'
ELSE '低价值'
END AS segment
FROM user_orders
)
-- 步骤3:统计每个层级的用户数
SELECT segment, COUNT(*) AS user_count, AVG(total_amount) AS avg_amount
FROM user_segments
GROUP BY segment;
递归 CTE
递归 CTE 是 CTE 最强大的特性——它可以引用自身,实现递归查询。最常见的场景:
- 查询组织架构的上下级关系
- 遍历树形结构(分类、评论回复、文件目录)
- 生成连续的日期/数字序列
语法结构
WITH RECURSIVE cte_name AS (
-- 基础部分(Base Case):递归的起点
SELECT ...
UNION ALL
-- 递归部分:引用 CTE 自身
SELECT ... FROM cte_name WHERE 终止条件
)
SELECT * FROM cte_name;
MySQL 注意
MySQL 8.0+ 支持递归 CTE,但关键字是 WITH RECURSIVE。PostgreSQL 同样使用 WITH RECURSIVE。
示例 1:组织架构查询
-- 员工表包含上级关系
-- | id | name | manager_id |
-- |----|--------|------------|
-- | 1 | CEO | NULL |
-- | 2 | VP技术 | 1 |
-- | 3 | VP市场 | 1 |
-- | 4 | 张三 | 2 |
-- | 5 | 李四 | 2 |
-- | 6 | 王五 | 3 |
-- 查找 CEO(id=1)的所有下属(包含间接下属)
WITH RECURSIVE subordinates AS (
-- 起点:CEO 本人
SELECT id, name, manager_id, 0 AS level
FROM employees WHERE id = 1
UNION ALL
-- 递归:找到当前层级的所有下属
SELECT e.id, e.name, e.manager_id, s.level + 1
FROM employees e
JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;
结果:
| id | name | manager_id | level |
|---|---|---|---|
| 1 | CEO | NULL | 0 |
| 2 | VP技术 | 1 | 1 |
| 3 | VP市场 | 1 | 1 |
| 4 | 张三 | 2 | 2 |
| 5 | 李四 | 2 | 2 |
| 6 | 王五 | 3 | 2 |
示例 2:生成日期序列
数据分析中经常需要一个连续的日期列表,用来和实际数据 LEFT JOIN,确保没有数据的日期也能显示为 0。
-- 生成 2024-01-01 到 2024-01-31 的日期序列
WITH RECURSIVE date_series AS (
SELECT DATE('2024-01-01') AS dt
UNION ALL
SELECT DATE_ADD(dt, INTERVAL 1 DAY) FROM date_series WHERE dt < '2024-01-31'
)
SELECT dt FROM date_series;
-- 实际应用:每日注册人数(包含 0 注册的日期)
WITH RECURSIVE date_series AS (
SELECT DATE('2024-01-01') AS dt
UNION ALL
SELECT DATE_ADD(dt, INTERVAL 1 DAY) FROM date_series WHERE dt < '2024-01-31'
)
SELECT d.dt, COALESCE(COUNT(u.id), 0) AS registrations
FROM date_series d
LEFT JOIN users u ON DATE(u.created_at) = d.dt
GROUP BY d.dt
ORDER BY d.dt;
示例 3:查找分类的所有子分类
-- 分类表(自引用)
WITH RECURSIVE category_tree AS (
-- 从顶级分类开始
SELECT id, name, parent_id, name AS path
FROM categories WHERE parent_id IS NULL
UNION ALL
-- 递归查找子分类
SELECT c.id, c.name, c.parent_id,
CONCAT(ct.path, ' > ', c.name) AS path
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree ORDER BY path;
结果示例:
| id | name | path |
|---|---|---|
| 1 | 电子产品 | 电子产品 |
| 2 | 手机 | 电子产品 > 手机 |
| 5 | 智能手机 | 电子产品 > 手机 > 智能手机 |
| 3 | 电脑 | 电子产品 > 电脑 |
CTE 的性能考量
CTE 被内联还是物化?
-- 当 CTE 只被引用一次时,大多数数据库会将其"内联"到主查询
WITH filtered AS (
SELECT * FROM orders WHERE status = 'completed'
)
SELECT * FROM filtered WHERE amount > 100;
-- 等价于:SELECT * FROM orders WHERE status = 'completed' AND amount > 100;
-- 当 CTE 被引用多次时,行为因数据库而异
WITH top_users AS (
SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id ORDER BY total DESC LIMIT 100
)
SELECT a.user_id, b.user_id
FROM top_users a
CROSS JOIN top_users b -- 引用了两次
WHERE a.total > b.total;
| 数据库 | 单次引用 | 多次引用 |
|---|---|---|
| MySQL 8.0+ | 内联优化 | 每次引用独立执行 |
| PostgreSQL 12+ | 内联优化(默认) | 默认内联,可用 MATERIALIZED 强制物化 |
| SQL Server | 内联 | 内联(每次独立执行) |
何时 CTE 比子查询慢?
CTE 在某些情况下可能阻碍优化器的优化:
-- ❌ CTE 可能阻止索引下推
WITH all_orders AS (
SELECT * FROM orders -- 全表扫描
)
SELECT * FROM all_orders WHERE user_id = 123;
-- 在某些数据库中,优化器不会把 WHERE 条件推入 CTE 内部
-- ✅ 直接查询,优化器可以直接使用 user_id 上的索引
SELECT * FROM orders WHERE user_id = 123;
性能建议
- CTE 主要用于可读性,不适合作为性能优化手段
- 如果 CTE 被多次引用且计算量大,在 PostgreSQL 中考虑使用
MATERIALIZED - 复杂查询先写 CTE 版本确保正确,然后用
EXPLAIN分析是否需要改写
数据分析实战场景
场景一:漏斗分析(用 CTE 分步骤计算)
-- 电商购物漏斗:浏览 → 加购 → 下单 → 支付
WITH funnel AS (
-- 步骤 1:浏览
SELECT 'view' AS step, 1 AS step_order, COUNT(DISTINCT user_id) AS users
FROM events WHERE event_type = 'view' AND event_date = '2024-01-15'
UNION ALL
-- 步骤 2:加购
SELECT 'add_cart', 2, COUNT(DISTINCT user_id)
FROM events WHERE event_type = 'add_cart' AND event_date = '2024-01-15'
UNION ALL
-- 步骤 3:下单
SELECT 'order', 3, COUNT(DISTINCT user_id)
FROM events WHERE event_type = 'order' AND event_date = '2024-01-15'
UNION ALL
-- 步骤 4:支付
SELECT 'pay', 4, COUNT(DISTINCT user_id)
FROM events WHERE event_type = 'pay' AND event_date = '2024-01-15'
)
SELECT
step,
users,
LAG(users) OVER (ORDER BY step_order) AS prev_step_users,
ROUND(users * 100.0 / FIRST_VALUE(users) OVER (ORDER BY step_order), 2) AS overall_rate,
ROUND(users * 100.0 / LAG(users) OVER (ORDER BY step_order), 2) AS step_rate
FROM funnel;
场景二:月度同环比报表
WITH monthly AS (
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(amount) AS revenue,
COUNT(DISTINCT user_id) AS buyers
FROM orders
WHERE order_date >= '2023-01-01'
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
),
with_comparison AS (
SELECT
month,
revenue,
buyers,
-- 环比
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY month)) * 100.0 /
NULLIF(LAG(revenue, 1) OVER (ORDER BY month), 0), 2) AS mom_pct,
-- 同比
LAG(revenue, 12) OVER (ORDER BY month) AS yoy_revenue,
ROUND((revenue - LAG(revenue, 12) OVER (ORDER BY month)) * 100.0 /
NULLIF(LAG(revenue, 12) OVER (ORDER BY month), 0), 2) AS yoy_pct
FROM monthly
)
SELECT * FROM with_comparison ORDER BY month DESC;
场景三:递归 CTE 生成日历表
-- 生成一个完整的日历,标注节假日和工作日
WITH RECURSIVE calendar AS (
SELECT DATE('2024-01-01') AS dt
UNION ALL
SELECT DATE_ADD(dt, INTERVAL 1 DAY) FROM calendar WHERE dt < '2024-12-31'
)
SELECT
dt,
DAYNAME(dt) AS day_name,
WEEKDAY(dt) AS weekday, -- 0=周一, 6=周日
CASE
WHEN WEEKDAY(dt) IN (5, 6) THEN '周末'
ELSE '工作日'
END AS day_type,
-- 按月汇总
DATE_FORMAT(dt, '%Y-%m') AS month
FROM calendar;
场景四:用 CTE 实现分层计算
-- 需求:新用户首单分析
-- 步骤 1 → 2 → 3 层层递进
WITH
-- 第一层:找到每个用户的首单日期
first_orders AS (
SELECT user_id, MIN(order_date) AS first_order_date
FROM orders
GROUP BY user_id
),
-- 第二层:拿到首单的详细信息
first_order_details AS (
SELECT o.*
FROM orders o
JOIN first_orders fo ON o.user_id = fo.user_id AND o.order_date = fo.first_order_date
),
-- 第三层:按渠道统计首单指标
channel_stats AS (
SELECT
channel,
COUNT(*) AS new_buyers,
ROUND(AVG(amount), 2) AS avg_first_order,
ROUND(SUM(amount), 2) AS total_first_orders
FROM first_order_details
GROUP BY channel
)
SELECT *,
ROUND(new_buyers * 100.0 / SUM(new_buyers) OVER (), 2) AS pct
FROM channel_stats
ORDER BY new_buyers DESC;
CTE vs 子查询 vs 临时表
| 对比项 | CTE | 子查询 | 临时表 |
|---|---|---|---|
| 可读性 | ✅ 很好 | ❌ 嵌套深时难读 | ✅ 好 |
| 复用性 | ✅ 同一查询内可多次引用 | ❌ 需要重复写 | ✅ 整个会话可用 |
| 递归 | ✅ 支持 | ❌ 不支持 | ❌ 需要循环 |
| 持久化 | ❌ 只在当前语句有效 | ❌ 只在当前语句有效 | ✅ 会话内持久 |
| 性能 | 取决于优化器 | 取决于优化器 | 已物化为真实表 |
常见面试问题
Q1: CTE 有什么优势?什么时候用 CTE?
答案:
CTE 的主要优势:
- 可读性:将复杂查询分解为清晰的步骤
- 可复用:同一个 CTE 在查询中可多次引用
- 递归能力:处理层级关系和序列生成
使用场景:
- 复杂查询需要分步骤处理
- 同一个子查询在多处被引用
- 需要递归查询(树形结构、连续日期等)
Q2: 递归 CTE 会不会死循环?怎么防止?
答案:
会!如果数据有循环引用(如 A→B→C→A),递归 CTE 会无限循环。
防止方法:
-- 方法 1:MySQL 有默认递归深度限制(默认1000)
SET cte_max_recursion_depth = 100;
-- 方法 2:在递归条件中加上深度限制
WITH RECURSIVE cte AS (
SELECT id, name, 0 AS depth FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, cte.depth + 1
FROM categories c
JOIN cte ON c.parent_id = cte.id
WHERE cte.depth < 10 -- 最多递归 10 层
)
SELECT * FROM cte;
Q3: CTE 是否会被物化(Materialized)?
答案:
取决于数据库:
- MySQL:CTE 不会被物化,优化器会将其视为内联子查询
- PostgreSQL 12+:默认不物化,但可以用
MATERIALIZED关键字强制物化
-- PostgreSQL:强制物化
WITH cte AS MATERIALIZED (
SELECT * FROM large_table WHERE ...
)
SELECT * FROM cte a JOIN cte b ON ...; -- cte 只计算一次
当 CTE 被多次引用时,物化可以避免重复计算。
Q4: 面试题——用递归 CTE 实现斐波那契数列
答案:
WITH RECURSIVE fib AS (
SELECT 1 AS n, 1 AS val, 0 AS prev_val
UNION ALL
SELECT n + 1, val + prev_val, val
FROM fib
WHERE n < 10
)
SELECT n, val AS fibonacci FROM fib;
| n | fibonacci |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 5 |
| 6 | 8 |
| 7 | 13 |
| 8 | 21 |
| 9 | 34 |
| 10 | 55 |
Q5: CTE 和视图(View)有什么区别?
答案:
| 对比 | CTE | View |
|---|---|---|
| 生命周期 | 只在当前 SQL 语句中有效 | 持久存储在数据库中 |
| 可复用范围 | 当前查询 | 所有查询都可以引用 |
| 递归 | ✅ 支持 RECURSIVE | ❌ 不支持 |
| 创建方式 | WITH ... AS (...) | CREATE VIEW ... AS ... |
| 权限 | 不需要额外权限 | 需要 CREATE VIEW 权限 |
| 适用场景 | 临时性、一次性的复杂查询 | 经常被复用的查询逻辑 |
Q6: 面试题——用递归 CTE 查找某节点的所有上级
答案:
-- 从叶子节点向上查找所有祖先
WITH RECURSIVE ancestors AS (
-- 起点:从指定员工开始
SELECT id, name, manager_id, 0 AS level
FROM employees WHERE id = 5 -- 李四
UNION ALL
-- 递归:向上查找上级
SELECT e.id, e.name, e.manager_id, a.level + 1
FROM employees e
JOIN ancestors a ON e.id = a.manager_id
)
SELECT * FROM ancestors;
-- 结果:李四 → VP技术 → CEO
Q7: UNION ALL vs UNION 在 CTE 中的选择?
答案:
递归 CTE 中必须用 UNION ALL。非递归 CTE 中两者都可以。
-- ✅ 递归 CTE 用 UNION ALL
WITH RECURSIVE cte AS (
SELECT 1 AS n
UNION ALL -- 必须是 UNION ALL
SELECT n + 1 FROM cte WHERE n < 10
)
SELECT * FROM cte;
-- 非递归 CTE 中可以用 UNION 去重
WITH combined AS (
SELECT user_id FROM table_a
UNION -- 自动去重
SELECT user_id FROM table_b
)
SELECT * FROM combined;
Q8: 面试题——用 CTE 实现层级缩进展示
答案:
-- 展示组织架构的层级缩进效果
WITH RECURSIVE org_tree AS (
SELECT id, name, manager_id, 0 AS level, CAST(name AS CHAR(200)) AS display
FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, o.level + 1,
CAST(CONCAT(REPEAT(' ', o.level + 1), '├── ', e.name) AS CHAR(200))
FROM employees e
JOIN org_tree o ON e.manager_id = o.id
)
SELECT display FROM org_tree ORDER BY level, name;
输出:
CEO
├── VP市场
├── 王五
├── VP技术
├── 张三
├── 李四
相关链接
- MySQL WITH(CTE)
- PostgreSQL WITH Queries
- 递归 CTE 详解
- 窗口函数 - CTE 常与窗口函数搭配使用
- 子查询 - CTE 和子查询的对比选择
- 复杂聚合 - CTE 中的聚合操作