子查询
问题
什么是子查询?标量子查询、关联子查询、EXISTS 和 IN 有什么区别?
答案
子查询(Subquery) 是嵌套在另一个 SQL 语句内部的查询。你可以把它理解为"先查一步,把结果给外层用"。子查询让你能在一条 SQL 语句中完成更复杂的逻辑。
-- 外层查询使用子查询的结果
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees); -- 子查询
子查询的分类
按返回结果分
| 类型 | 返回结果 | 示例 |
|---|---|---|
| 标量子查询 | 一个值 | (SELECT AVG(salary) FROM employees) |
| 行子查询 | 一行多列 | (SELECT dept_id, MAX(salary) FROM ...) |
| 列子查询 | 一列多行 | (SELECT dept_id FROM departments WHERE ...) |
| 表子查询 | 多行多列 | (SELECT * FROM employees WHERE ...) |
按依赖关系分
| 类型 | 说明 | 执行方式 |
|---|---|---|
| 非关联子查询 | 子查询独立执行,不依赖外层 | 子查询执行 1 次 |
| 关联子查询 | 子查询引用了外层查询的列 | 外层每处理 1 行,子查询执行 1 次 |
标量子查询
返回一个单一值,可以用在 SELECT、WHERE、HAVING 中。
-- WHERE 中使用:查找薪资高于平均值的员工
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- SELECT 中使用:每个员工的薪资与平均值的差距
SELECT name, salary,
salary - (SELECT AVG(salary) FROM employees) AS diff
FROM employees;
列子查询 + IN
子查询返回一列数据,配合 IN / NOT IN 使用。
-- 查出有员工的部门信息
SELECT * FROM departments
WHERE dept_id IN (SELECT DISTINCT dept_id FROM employees WHERE dept_id IS NOT NULL);
-- 查出没有下过单的用户
SELECT * FROM users
WHERE id NOT IN (SELECT DISTINCT user_id FROM orders WHERE user_id IS NOT NULL);
如果子查询结果中包含 NULL,NOT IN 会返回空结果:
-- 如果 orders.user_id 有 NULL 值
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM orders);
-- 结果可能是空的!
-- 安全做法:排除 NULL
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL);
EXISTS 与 NOT EXISTS
EXISTS 检查子查询是否返回了至少一行。比 IN 在某些场景下性能更好。
-- 查出有订单的用户(等价于 IN 写法)
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
-- 查出没有订单的用户(等价于 NOT IN 写法)
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
- 子查询结果集小 → 用
IN(数据库可以先执行子查询获取列表) - 子查询结果集大,外层结果集小 → 用
EXISTS(对外层每行检查是否存在匹配) - 子查询可能包含 NULL → 用
EXISTS(避免 NOT IN 的 NULL 陷阱) - 不确定 → 用
EXISTS,通常是更安全的选择
关联子查询
关联子查询引用了外层查询的列,外层每处理一行,子查询就执行一次。
-- 查出每个部门薪资最高的员工
SELECT * FROM employees e1
WHERE salary = (
SELECT MAX(salary)
FROM employees e2
WHERE e2.department = e1.department -- 引用了外层的 e1
);
-- 查出薪资高于本部门平均值的员工
SELECT * FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department
);
关联子查询对外层的每一行都要执行一次子查询。假设外层有 10000 行,子查询就要执行 10000 次。如果子查询本身比较复杂或数据量大,性能会很差。
优化方式:尝试改写为 JOIN 或窗口函数。
关联子查询改写为 JOIN
-- 原始:查出薪资高于本部门平均值的员工(关联子查询)
SELECT * FROM employees e1
WHERE salary > (
SELECT AVG(salary) FROM employees e2 WHERE e2.department = e1.department
);
-- 改写:先计算每个部门的平均薪资,再 JOIN(通常更快)
SELECT e.*
FROM employees e
JOIN (
SELECT department, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
) dept_avg ON e.department = dept_avg.department
WHERE e.salary > dept_avg.avg_sal;
ALL / ANY / SOME 操作符
这三个操作符配合子查询使用,对子查询返回的结果集进行比较判断。
ALL — 与所有值比较
-- 查出薪资高于"所有"市场部员工的员工
SELECT * FROM employees
WHERE salary > ALL (
SELECT salary FROM employees WHERE department = '市场部'
);
-- 等价于:salary > (SELECT MAX(salary) FROM employees WHERE department = '市场部')
> ALL (子查询)=> 子查询的 MAX< ALL (子查询)=< 子查询的 MIN= ALL (子查询)= 等于子查询的所有值(通常只有所有值相同时才为 true)<> ALL (子查询)= 等价于NOT IN
ANY / SOME — 与任意一个值比较
ANY 和 SOME 是同义词,满足子查询中任意一行即可。
-- 查出薪资高于"任意一个"市场部员工的员工
SELECT * FROM employees
WHERE salary > ANY (
SELECT salary FROM employees WHERE department = '市场部'
);
-- 等价于:salary > (SELECT MIN(salary) FROM employees WHERE department = '市场部')
-- = ANY 等价于 IN
SELECT * FROM employees
WHERE department = ANY (
SELECT department FROM employees WHERE salary > 20000
);
-- 等价于:department IN (SELECT department FROM employees WHERE salary > 20000)
ALL / ANY 对比
| 操作 | ALL | ANY / SOME |
|---|---|---|
> ALL / > ANY | 大于最大值 | 大于最小值 |
< ALL / < ANY | 小于最小值 | 小于最大值 |
= ALL / = ANY | 等于所有(少见) | 等于任意一个 = IN |
<> ALL / <> ANY | 不等于任何 = NOT IN | 不等于某一个(几乎总是 true) |
如果子查询结果包含 NULL:
> ALL (1, 2, NULL)→ 结果为 UNKNOWN(不确定是否大于 NULL)> ANY (1, 2, NULL)→ 只要大于 1 或 2 就为 TRUE- 解决方案:在子查询中加
WHERE col IS NOT NULL过滤掉 NULL
子查询在不同位置的用法
FROM 子句中(派生表)
子查询放在 FROM 中,会生成一个临时的虚拟表(又叫派生表或内联视图)。
-- 先计算每个部门的统计,再筛选
SELECT * FROM (
SELECT department, COUNT(*) AS cnt, AVG(salary) AS avg_sal
FROM employees
GROUP BY department
) dept_stats -- 派生表必须有别名
WHERE cnt > 2 AND avg_sal > 15000;
-- 数据分析场景:计算每个用户的首单信息
SELECT u.name, first_order.order_date, first_order.amount
FROM users u
JOIN (
SELECT user_id, MIN(order_date) AS order_date,
-- 注意:这里的 amount 不一定是首单金额,需要用子查询或窗口函数
(SELECT amount FROM orders o2
WHERE o2.user_id = o1.user_id
ORDER BY order_date LIMIT 1) AS amount
FROM orders o1
GROUP BY user_id
) first_order ON u.id = first_order.user_id;
- MySQL 中派生表必须有别名,否则报错
- 派生表在某些数据库中不能使用索引,可能会被物化为临时表
- 如果派生表结果集很大,考虑用 CTE(
WITH子句)替代,可读性更好
SELECT 子句中(标量子查询)
-- 每个员工的薪资排名
SELECT name, salary,
(SELECT COUNT(DISTINCT salary) FROM employees e2 WHERE e2.salary > e1.salary) + 1 AS rank
FROM employees e1
ORDER BY salary DESC;
-- 每个部门的人数和整体占比
SELECT department,
COUNT(*) AS dept_count,
(SELECT COUNT(*) FROM employees) AS total_count,
ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM employees), 2) AS percentage
FROM employees
GROUP BY department;
SELECT 中的子查询必须返回单行单列(标量值)。如果返回多行,数据库会报错。
HAVING 子句中
-- 人数超过公司平均部门人数的部门
SELECT department, COUNT(*) AS cnt
FROM employees
GROUP BY department
HAVING cnt > (SELECT COUNT(*) / COUNT(DISTINCT department) FROM employees);
-- 订单金额合计超过全公司平均值的客户
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > (SELECT AVG(total) FROM (
SELECT SUM(amount) AS total FROM orders GROUP BY customer_id
) t);
WHERE 子句中(最常见)
-- 行子查询:查找与张三同部门同职级的员工
SELECT * FROM employees
WHERE (department, job_level) = (
SELECT department, job_level FROM employees WHERE name = '张三'
);
-- 表子查询:查找薪资和部门与经理匹配的员工
SELECT * FROM employees
WHERE (salary, department) IN (
SELECT salary, department FROM employees WHERE role = 'manager'
);
子查询 vs JOIN vs CTE
| 对比维度 | 子查询 | JOIN | CTE(WITH) |
|---|---|---|---|
| 可读性 | 嵌套深了难读 | 关系清晰 | 最好,分步骤 |
| 复用性 | 不能复用 | - | 同一语句内可多次引用 |
| 性能 | 优化器通常会改写 | 通常最优 | 多数等价子查询 |
| 适用场景 | 简单过滤、EXISTS | 多表关联 | 复杂逻辑、递归 |
-- 同一个需求的三种写法:查出有订单的用户
-- 1. 子查询
SELECT * FROM users WHERE id IN (SELECT DISTINCT user_id FROM orders);
-- 2. JOIN
SELECT DISTINCT u.* FROM users u JOIN orders o ON u.id = o.user_id;
-- 3. CTE
WITH order_users AS (
SELECT DISTINCT user_id FROM orders
)
SELECT u.* FROM users u JOIN order_users ou ON u.id = ou.user_id;
- 简单的存在性检查:
WHERE EXISTS (...)或WHERE col IN (...) - 不需要返回另一张表的列:只做过滤,不需要 JOIN 拿数据
- 标量值计算:在 SELECT 中嵌入一个计算值
- 一次性使用:结果只用一次,不需要 CTE 的复用能力
子查询优化技巧
1. 避免 SELECT * 在子查询中
-- ❌ 子查询中 SELECT * 浪费资源
SELECT * FROM users WHERE id IN (SELECT * FROM vip_users);
-- ✅ 只查需要的列
SELECT * FROM users WHERE id IN (SELECT user_id FROM vip_users);
2. NOT IN 替换为 NOT EXISTS
-- ❌ NOT IN 有 NULL 陷阱
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM blacklist);
-- 如果 blacklist.user_id 有 NULL,上面查询会返回空结果!
-- ✅ NOT EXISTS 安全
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM blacklist b WHERE b.user_id = u.id);
3. 关联子查询改写为 JOIN
-- ❌ 关联子查询(每行执行一次子查询)
SELECT e.name, e.salary,
(SELECT d.name FROM departments d WHERE d.id = e.department_id) AS dept_name
FROM employees e;
-- ✅ 改写为 JOIN(一次扫描)
SELECT e.name, e.salary, d.name AS dept_name
FROM employees e
LEFT JOIN departments d ON d.id = e.department_id;
4. 用 EXPLAIN 分析子查询执行计划
-- 查看数据库是否将子查询优化为半连接(semi-join)
EXPLAIN SELECT * FROM users WHERE id IN (SELECT user_id FROM orders);
-- 关注以下字段:
-- type: 是否用到索引(ref/range)而不是全表扫描(ALL)
-- Extra: 是否出现 "Using subquery" 或被优化为 "MATERIALIZED"
数据分析实战场景
场景一:用户分层分析(RFM 模型简化版)
-- 根据最近一次消费时间和消费总额对用户分层
SELECT
user_id,
last_order_date,
total_amount,
CASE
WHEN last_order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY) AND total_amount >= 1000 THEN '高价值活跃用户'
WHEN last_order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY) AND total_amount < 1000 THEN '低价值活跃用户'
WHEN last_order_date < DATE_SUB(CURDATE(), INTERVAL 90 DAY) AND total_amount >= 1000 THEN '高价值流失用户'
ELSE '低价值沉默用户'
END AS user_segment
FROM (
-- 派生表:计算每个用户的关键指标
SELECT
user_id,
MAX(order_date) AS last_order_date,
SUM(amount) AS total_amount
FROM orders
GROUP BY user_id
) user_stats;
场景二:查找从未完成某行为的用户
-- 查出注册超过 7 天但从未下单的用户
SELECT u.id, u.name, u.register_date
FROM users u
WHERE u.register_date < DATE_SUB(CURDATE(), INTERVAL 7 DAY)
AND NOT EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);
场景三:TopN 分组排名
-- 查出每个品类下销量前 3 的商品(MySQL 8.0+)
SELECT * FROM (
SELECT
category,
product_name,
sales,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rn
FROM products
) t
WHERE rn <= 3;
-- MySQL 5.7 没有窗口函数,只能用关联子查询
SELECT p1.category, p1.product_name, p1.sales
FROM products p1
WHERE (
SELECT COUNT(*) FROM products p2
WHERE p2.category = p1.category AND p2.sales > p1.sales
) < 3
ORDER BY p1.category, p1.sales DESC;
场景四:同比环比计算
-- 计算每月营收的环比增长率
SELECT
curr.month,
curr.revenue,
prev.revenue AS prev_revenue,
ROUND((curr.revenue - prev.revenue) / prev.revenue * 100, 2) AS mom_growth_pct
FROM (
SELECT DATE_FORMAT(order_date, '%Y-%m') AS month, SUM(amount) AS revenue
FROM orders GROUP BY DATE_FORMAT(order_date, '%Y-%m')
) curr
LEFT JOIN (
SELECT DATE_FORMAT(order_date, '%Y-%m') AS month, SUM(amount) AS revenue
FROM orders GROUP BY DATE_FORMAT(order_date, '%Y-%m')
) prev ON prev.month = DATE_FORMAT(
STR_TO_DATE(CONCAT(curr.month, '-01'), '%Y-%m-%d') - INTERVAL 1 MONTH, '%Y-%m'
)
ORDER BY curr.month;
常见面试问题
Q1: 子查询和 JOIN 哪个性能好?
答案:
不能一概而论,取决于具体场景和数据库优化器:
| 场景 | 推荐 |
|---|---|
| 需要查询另一张表是否存在匹配 | EXISTS 或 JOIN 都可以 |
| 需要返回另一张表的多列数据 | JOIN 更好 |
| 简单的值列表过滤 | IN 子查询简洁 |
现代数据库优化器通常会将 IN 子查询自动转换为 JOIN(半连接优化),性能差异不大。建议优先选择语义更清晰的写法。
但以下情况子查询可能更慢:
- 关联子查询:对外层每行都要执行一次子查询
- NOT IN 碰到 NULL:可能导致无法使用索引
- 嵌套层数太深:优化器可能放弃优化
Q2: 写出查找每个部门第二高薪资的 SQL
答案:
-- 方法 1:关联子查询
SELECT department, MAX(salary) AS second_highest
FROM employees e1
WHERE salary < (SELECT MAX(salary) FROM employees e2 WHERE e2.department = e1.department)
GROUP BY department;
-- 方法 2:窗口函数(更推荐)
SELECT department, salary FROM (
SELECT department, salary,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rk
FROM employees
) t WHERE rk = 2;
-- 方法 3:LIMIT + OFFSET(单个部门)
SELECT salary FROM employees
WHERE department = '技术部'
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
Q3: EXISTS 和 IN 什么时候用哪个?
答案:
对于 EXISTS vs IN:
-- IN:适合子查询结果集较小的情况
SELECT * FROM users WHERE id IN (SELECT user_id FROM vip_users);
-- EXISTS:适合外层结果集较小,子查询结果集较大的情况
SELECT * FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
关键区别:
IN先执行子查询,把结果缓存,再逐一比较外层EXISTS对外层每行,执行子查询看是否有结果NOT IN有 NULL 陷阱,NOT EXISTS更安全- 大多数情况下推荐
NOT EXISTS
Q4: 面试题——查出连续 3 天都有登录记录的用户
答案:
-- 思路:自连接,找同一用户连续 3 天的记录
SELECT DISTINCT a.user_id
FROM login_log a
JOIN login_log b ON a.user_id = b.user_id AND b.login_date = DATE_ADD(a.login_date, INTERVAL 1 DAY)
JOIN login_log c ON a.user_id = c.user_id AND c.login_date = DATE_ADD(a.login_date, INTERVAL 2 DAY);
-- 更优雅的窗口函数解法(见 SQL 分析进阶)
SELECT DISTINCT user_id FROM (
SELECT user_id, login_date,
login_date - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) * INTERVAL '1 day' AS grp
FROM (SELECT DISTINCT user_id, login_date FROM login_log) t
) t2
GROUP BY user_id, grp
HAVING COUNT(*) >= 3;
Q5: NOT IN 的 NULL 陷阱是什么?
答案:
-- 假设 blacklist 表中有数据:(1), (2), (NULL)
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM blacklist);
-- 返回空结果!一行都查不到
原因:NOT IN (1, 2, NULL) 展开为 id <> 1 AND id <> 2 AND id <> NULL。任何值与 NULL 比较都返回 UNKNOWN,整个条件变为 UNKNOWN → 不满足 WHERE 条件。
解决方案:
-- 方案 1:过滤 NULL
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM blacklist WHERE user_id IS NOT NULL);
-- 方案 2:改用 NOT EXISTS(推荐)
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM blacklist b WHERE b.user_id = u.id);
-- 方案 3:LEFT JOIN + IS NULL
SELECT u.* FROM users u
LEFT JOIN blacklist b ON u.id = b.user_id
WHERE b.user_id IS NULL;
Q6: 什么是半连接(Semi-Join)?
答案:
半连接是数据库优化器内部使用的概念,用来高效执行 EXISTS 和 IN 子查询。
- 普通 JOIN:左表的一行和右表匹配到多行时,结果会有多行(笛卡尔积)
- 半连接:左表的一行只要在右表中找到第一个匹配就停止,不会产生重复行
-- 这是逻辑上的半连接,优化器可能内部实现为 Semi-Join
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders);
-- 用 EXPLAIN 看 Extra 列是否出现 "LooseScan"、"FirstMatch" 等关键字
-- 这些都是半连接的具体算法
Q7: 子查询中能用 ORDER BY 和 LIMIT 吗?
答案:
可以,但有限制:
-- ✅ 在 FROM 子句(派生表)中可以
SELECT * FROM (
SELECT * FROM orders ORDER BY amount DESC LIMIT 10
) top_orders;
-- ✅ 在 IN 子查询中可以(但不同数据库行为不同)
SELECT * FROM users WHERE id IN (
SELECT user_id FROM orders ORDER BY amount DESC LIMIT 100
);
-- ⚠️ MySQL 5.7 不允许在 IN 子查询中使用 LIMIT,8.0+ 可以
-- ❌ 在 EXISTS 子查询中没有意义
SELECT * FROM users u WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id ORDER BY amount -- ORDER BY 被忽略
);
Q8: 如何用子查询实现行转列?
答案:
-- 将每个学生的各科成绩从多行变成一行
SELECT
student_name,
(SELECT score FROM scores s WHERE s.student_id = st.id AND s.subject = '数学') AS math_score,
(SELECT score FROM scores s WHERE s.student_id = st.id AND s.subject = '语文') AS chinese_score,
(SELECT score FROM scores s WHERE s.student_id = st.id AND s.subject = '英语') AS english_score
FROM students st;
-- 更好的方式:用 CASE WHEN + 聚合函数
SELECT
student_name,
MAX(CASE WHEN subject = '数学' THEN score END) AS math_score,
MAX(CASE WHEN subject = '语文' THEN score END) AS chinese_score,
MAX(CASE WHEN subject = '英语' THEN score END) AS english_score
FROM scores s JOIN students st ON s.student_id = st.id
GROUP BY student_name;
更多行列转换技巧请参考 行列转换。
Q9: 面试题——查出只买过一次商品的客户
答案:
-- 方法 1:HAVING + 子查询
SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders GROUP BY customer_id HAVING COUNT(*) = 1
);
-- 方法 2:关联子查询
SELECT * FROM customers c
WHERE (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) = 1;
-- 方法 3:JOIN + GROUP BY
SELECT c.customer_id, c.customer_name
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(*) = 1;
Q10: 面试题——查出下单金额超过自己平均订单金额的订单
答案:
-- 方法 1:关联子查询
SELECT * FROM orders o1
WHERE amount > (
SELECT AVG(amount) FROM orders o2 WHERE o2.customer_id = o1.customer_id
);
-- 方法 2:JOIN 派生表(推荐)
SELECT o.*
FROM orders o
JOIN (
SELECT customer_id, AVG(amount) AS avg_amount
FROM orders
GROUP BY customer_id
) ca ON o.customer_id = ca.customer_id
WHERE o.amount > ca.avg_amount;
-- 方法 3:窗口函数
SELECT * FROM (
SELECT *, AVG(amount) OVER (PARTITION BY customer_id) AS avg_amount
FROM orders
) t
WHERE amount > avg_amount;
相关链接
- MySQL 子查询
- MySQL 子查询优化
- PostgreSQL 子查询
- PostgreSQL 子查询性能
- 窗口函数 - 很多子查询场景可以用窗口函数替代
- CTE 公共表表达式 - 复杂子查询的可读性替代方案
- JOIN 连接查询 - 子查询与 JOIN 的对比选择
- 聚合函数与 GROUP BY - 子查询中常用的聚合操作