跳到主要内容

窗口函数

问题

什么是窗口函数?有哪些常用窗口函数?窗口函数和 GROUP BY 有什么区别?

答案

一、窗口函数的概念

窗口函数对一组(窗口)行执行计算,但 不折叠行(不同于 GROUP BY 会合并行)。

-- GROUP BY:每组只返回一行
SELECT department, AVG(salary) FROM employees GROUP BY department;

-- 窗口函数:每行都保留,同时附上聚合结果
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
核心区别

GROUP BY 合并行,窗口函数 保留行并附加计算列

二、窗口函数语法

函数名(参数) OVER (
[PARTITION BY 分组列] -- 分窗口(类似 GROUP BY,但不折叠)
[ORDER BY 排序列 [ASC|DESC]] -- 窗口内排序
[ROWS/RANGE frame_spec] -- 窗口帧范围
)

三、常用窗口函数

1. 排名函数

SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rnk
FROM employees;
函数并列处理示例(分数:100, 100, 90)
ROW_NUMBER()不跳号,强制唯一1, 2, 3
RANK()并列同号,跳号1, 1, 3
DENSE_RANK()并列同号,不跳号1, 1, 2

典型应用:取每组 Top N

-- 每个部门薪资最高的 3 个人
SELECT * FROM (
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
) t WHERE rn <= 3;

2. 偏移函数

SELECT date, revenue,
LAG(revenue, 1) OVER (ORDER BY date) AS prev_day, -- 前一行
LEAD(revenue, 1) OVER (ORDER BY date) AS next_day, -- 后一行
FIRST_VALUE(revenue) OVER (ORDER BY date) AS first_day, -- 第一行
LAST_VALUE(revenue) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_day -- 最后一行
FROM daily_revenue;
函数说明
LAG(col, n, default)当前行的前 n 行值
LEAD(col, n, default)当前行的后 n 行值
FIRST_VALUE(col)窗口第一行值
LAST_VALUE(col)窗口最后一行值
NTH_VALUE(col, n)窗口第 n 行值

典型应用:计算环比

SELECT date, revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY date) AS day_diff,
ROUND(
(revenue - LAG(revenue, 1) OVER (ORDER BY date))::numeric
/ LAG(revenue, 1) OVER (ORDER BY date) * 100, 2
) AS growth_pct
FROM daily_revenue;

3. 聚合窗口函数

普通聚合函数(SUM, AVG, COUNT, MAX, MIN)都可以作为窗口函数使用:

SELECT date, revenue,
SUM(revenue) OVER (ORDER BY date) AS running_total, -- 累计求和
AVG(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d, -- 7 天移动平均
COUNT(*) OVER (PARTITION BY department) AS dept_count -- 部门人数
FROM daily_revenue;

4. 分布函数

SELECT name, salary,
NTILE(4) OVER (ORDER BY salary) AS quartile, -- 分为 4 组
PERCENT_RANK() OVER (ORDER BY salary) AS pct_rank, -- 百分比排名
CUME_DIST() OVER (ORDER BY salary) AS cum_dist -- 累积分布
FROM employees;

四、窗口帧(Frame)

窗口帧定义了计算范围:

-- ROWS:按物理行偏移
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW -- 前 2 行到当前行

-- RANGE:按值范围
RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW -- 前 7 天到当前

-- 常用帧
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- 从头到当前(累计)
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- 全部行
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -- 前后各 1 行
LAST_VALUE 的陷阱

默认窗口帧是 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,所以 LAST_VALUE 默认返回的是当前行(不是真正的最后一行)。

需要显式指定 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

五、WINDOW 子句(命名窗口)

多个函数共用同一个窗口定义时,使用命名窗口避免重复:

SELECT name, department, salary,
ROW_NUMBER() OVER w AS rn,
RANK() OVER w AS rnk,
AVG(salary) OVER w AS dept_avg
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);

常见面试问题

Q1: ROW_NUMBER、RANK、DENSE_RANK 的区别?

答案
以成绩 [100, 100, 90, 80] 为例:

ROW_NUMBERRANKDENSE_RANK
100111
100211
90332
80443
  • ROW_NUMBER:唯一序号,并列随机分先后
  • RANK:并列同号,后面跳号
  • DENSE_RANK:并列同号,后面不跳号

Q2: 窗口函数的执行顺序?

答案

FROM → WHERE → GROUP BY → HAVING → SELECT(窗口函数在这里) → ORDER BY → LIMIT

窗口函数在 WHERE 和 GROUP BY 之后执行,所以不能在 WHERE 中使用窗口函数的结果。需要用子查询或 CTE 包一层。

Q3: 如何用窗口函数实现连续登录天数?

答案

WITH login_groups AS (
SELECT user_id, login_date,
login_date - ROW_NUMBER() OVER (
PARTITION BY user_id ORDER BY login_date
)::int AS grp
FROM user_logins
)
SELECT user_id, MIN(login_date) AS start_date,
MAX(login_date) AS end_date,
COUNT(*) AS consecutive_days
FROM login_groups
GROUP BY user_id, grp
HAVING COUNT(*) >= 3; -- 连续登录 3 天以上

原理:连续日期减去行号后,差值相同的属于同一连续段。


相关链接