306 lines
7.0 KiB
Markdown
306 lines
7.0 KiB
Markdown
# 窗口函数速查
|
||
|
||
## 基本语法
|
||
|
||
```sql
|
||
函数名() OVER (
|
||
PARTITION BY 分组字段 -- 可选:分组
|
||
ORDER BY 排序字段 -- 可选:排序
|
||
ROWS/RANGE 窗口范围 -- 可选:窗口范围
|
||
)
|
||
```
|
||
|
||
---
|
||
|
||
## 排序函数
|
||
|
||
| 函数 | 说明 | 特点 | 适用场景 |
|
||
|------|------|------|----------|
|
||
| ROW_NUMBER() | 连续排名 | 不跳号,相同值不同排名 | 每组取前N条、去重 |
|
||
| RANK() | 排名 | 跳号,相同值相同排名 | 成绩排名 |
|
||
| DENSE_RANK() | 紧密排名 | 不跳号,相同值相同排名 | 连续名次 |
|
||
| NTILE(n) | 分桶 | 分成n组 | 数据分片、抽样 |
|
||
|
||
### ROW_NUMBER 示例
|
||
|
||
```sql
|
||
-- 每个部门薪资最高的员工(取第一名)
|
||
SELECT *
|
||
FROM (
|
||
SELECT
|
||
name,
|
||
department,
|
||
salary,
|
||
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
|
||
FROM employees
|
||
) t
|
||
WHERE rn = 1
|
||
|
||
-- 去重:每个用户取最新订单
|
||
SELECT *
|
||
FROM (
|
||
SELECT
|
||
*,
|
||
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
|
||
FROM orders
|
||
) t
|
||
WHERE rn = 1
|
||
```
|
||
|
||
### RANK 与 DENSE_RANK 示例
|
||
|
||
```sql
|
||
-- 成绩排名(跳号)
|
||
SELECT
|
||
name,
|
||
score,
|
||
RANK() OVER (ORDER BY score DESC) AS rank, -- 1,2,2,4,5...
|
||
DENSE_RANK() OVER (ORDER BY score DESC) AS dense -- 1,2,2,3,4...
|
||
FROM students
|
||
```
|
||
|
||
### NTILE 示例
|
||
|
||
```sql
|
||
-- 将用户分成4组(业绩分位)
|
||
SELECT
|
||
name,
|
||
sales,
|
||
NTILE(4) OVER (ORDER BY sales DESC) AS quartile -- 1(最高)到4(最低)
|
||
FROM sales_data
|
||
```
|
||
|
||
---
|
||
|
||
## 聚合函数(窗口内)
|
||
|
||
| 函数 | 说明 |
|
||
|------|------|
|
||
| SUM(col) OVER | 累计求和 |
|
||
| AVG(col) OVER | 累计平均 |
|
||
| COUNT(col) OVER | 窗口内计数 |
|
||
| MAX(col) OVER | 窗口内最大值 |
|
||
| MIN(col) OVER | 窗口内最小值 |
|
||
|
||
### 累计求和示例
|
||
|
||
```sql
|
||
-- 累计销售额(从开始到当前)
|
||
SELECT
|
||
date,
|
||
amount,
|
||
SUM(amount) OVER (ORDER BY date) AS cumulative_amount
|
||
FROM daily_sales
|
||
|
||
-- 分组累计(每个部门累计)
|
||
SELECT
|
||
date,
|
||
department,
|
||
amount,
|
||
SUM(amount) OVER (PARTITION BY department ORDER BY date) AS dept_cumulative
|
||
FROM sales_data
|
||
```
|
||
|
||
### 移动平均示例
|
||
|
||
```sql
|
||
-- 7天移动平均
|
||
SELECT
|
||
date,
|
||
amount,
|
||
AVG(amount) OVER (
|
||
ORDER BY date
|
||
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
|
||
) AS moving_avg_7d
|
||
FROM daily_sales
|
||
```
|
||
|
||
---
|
||
|
||
## 偏移函数
|
||
|
||
| 函数 | 说明 | 适用场景 |
|
||
|------|------|----------|
|
||
| LAG(col, n) | 取前n行的值 | 环比、查看历史值 |
|
||
| LAG(col, n, default) | 取前n行,无值返回default | 防止NULL |
|
||
| LEAD(col, n) | 取后n行的值 | 查看未来值 |
|
||
| LEAD(col, n, default) | 取后n行,无值返回default | 防止NULL |
|
||
| FIRST_VALUE(col) | 窗口第一个值 | 组内首个值 |
|
||
| LAST_VALUE(col) | 窗口最后一个值 | 组内末尾值 |
|
||
|
||
### LAG 示例(环比计算)
|
||
|
||
```sql
|
||
-- 日环比增长
|
||
SELECT
|
||
date,
|
||
amount,
|
||
LAG(amount, 1) OVER (ORDER BY date) AS prev_day,
|
||
amount - LAG(amount, 1) OVER (ORDER BY date) AS daily_growth,
|
||
ROUND((amount - LAG(amount, 1) OVER (ORDER BY date))
|
||
/ LAG(amount, 1) OVER (ORDER BY date) * 100, 2) AS growth_rate_pct
|
||
FROM daily_sales
|
||
|
||
-- 月度同比(取去年同期)
|
||
SELECT
|
||
month,
|
||
revenue,
|
||
LAG(revenue, 12) OVER (ORDER BY month) AS prev_year_revenue,
|
||
ROUND((revenue - LAG(revenue, 12) OVER (ORDER BY month))
|
||
/ LAG(revenue, 12) OVER (ORDER BY month) * 100, 2) AS yoy_growth_pct
|
||
FROM monthly_revenue
|
||
```
|
||
|
||
### LEAD 示例
|
||
|
||
```sql
|
||
-- 查看下一行数据
|
||
SELECT
|
||
date,
|
||
amount,
|
||
LEAD(amount, 1) OVER (ORDER BY date) AS next_day_amount
|
||
FROM daily_sales
|
||
```
|
||
|
||
### FIRST_VALUE / LAST_VALUE 示例
|
||
|
||
```sql
|
||
-- 每个部门薪资最高和最低的人
|
||
SELECT
|
||
name,
|
||
department,
|
||
salary,
|
||
FIRST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_paid,
|
||
LAST_VALUE(name) OVER (
|
||
PARTITION BY department
|
||
ORDER BY salary DESC
|
||
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
|
||
) AS lowest_paid
|
||
FROM employees
|
||
```
|
||
|
||
**注意**:LAST_VALUE 默认窗口是 `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`,需要显式指定全窗口才能取到最后值。
|
||
|
||
---
|
||
|
||
## 窗口范围定义
|
||
|
||
### ROWS(基于行数)
|
||
|
||
```sql
|
||
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW -- 前3行到当前行
|
||
ROWS BETWEEN 6 PRECEDING AND 1 FOLLOWING -- 前6行到后1行
|
||
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- 从开始到当前
|
||
ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING -- 当前行到后3行
|
||
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- 全窗口
|
||
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -- 当前行及前后各1行(共3行)
|
||
```
|
||
|
||
### RANGE(基于值范围)
|
||
|
||
```sql
|
||
RANGE BETWEEN 100 PRECEDING AND CURRENT ROW -- 值差100以内
|
||
RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW -- 7天内
|
||
```
|
||
|
||
**ROWS vs RANGE**:
|
||
- ROWS:固定行数,不受值影响
|
||
- RANGE:根据排序字段的值计算范围
|
||
|
||
---
|
||
|
||
## 实战案例
|
||
|
||
### 案例1:分组取Top N
|
||
|
||
```sql
|
||
-- 每个部门薪资前3名
|
||
SELECT *
|
||
FROM (
|
||
SELECT
|
||
name,
|
||
department,
|
||
salary,
|
||
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
|
||
FROM employees
|
||
) t
|
||
WHERE rn <= 3
|
||
```
|
||
|
||
### 案例2:连续问题判断
|
||
|
||
```sql
|
||
-- 判断是否连续增长(连续3天增长)
|
||
SELECT
|
||
date,
|
||
amount,
|
||
CASE
|
||
WHEN amount > LAG(amount, 1) OVER (ORDER BY date)
|
||
AND LAG(amount, 1) OVER (ORDER BY date) > LAG(amount, 2) OVER (ORDER BY date)
|
||
THEN '连续增长'
|
||
ELSE '非连续增长'
|
||
END AS trend
|
||
FROM daily_sales
|
||
```
|
||
|
||
### 案例3:组内占比
|
||
|
||
```sql
|
||
-- 每个部门各员工薪资占比
|
||
SELECT
|
||
name,
|
||
department,
|
||
salary,
|
||
ROUND(salary / SUM(salary) OVER (PARTITION BY department) * 100, 2) AS salary_pct
|
||
FROM employees
|
||
```
|
||
|
||
### 案例4:累计百分比(帕累托分析)
|
||
|
||
```sql
|
||
-- 80/20分析:哪些客户贡献了80%销售额
|
||
SELECT
|
||
customer_id,
|
||
sales_amount,
|
||
ROUND(SUM(sales_amount) OVER (ORDER BY sales_amount DESC)
|
||
/ SUM(sales_amount) OVER () * 100, 2) AS cumulative_pct
|
||
FROM customer_sales
|
||
ORDER BY sales_amount DESC
|
||
```
|
||
|
||
### 案例5:缺失值填充
|
||
|
||
```sql
|
||
-- 用前一个有效值填充NULL
|
||
SELECT
|
||
date,
|
||
amount,
|
||
LAST_VALUE(amount IGNORE NULLS) OVER (ORDER BY date) AS filled_amount
|
||
FROM sales_data
|
||
```
|
||
|
||
---
|
||
|
||
## 性能优化建议
|
||
|
||
1. **减少 PARTITION BY 分组数量**:分组越多,计算越慢
|
||
2. **合理使用窗口范围**:避免全窗口扫描
|
||
3. **先过滤再窗口**:WHERE 条件前置,减少数据量
|
||
4. **避免嵌套窗口函数**:多次调用会重复计算
|
||
|
||
```sql
|
||
-- 推荐:先过滤
|
||
SELECT
|
||
*,
|
||
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
|
||
FROM employees
|
||
WHERE hire_date >= '2024-01-01' -- 先过滤,减少数据量
|
||
|
||
-- 不推荐:先窗口再过滤(窗口函数在全部数据上执行)
|
||
SELECT *
|
||
FROM (
|
||
SELECT *, ROW_NUMBER() OVER (...) AS rn
|
||
FROM employees
|
||
) t
|
||
WHERE hire_date >= '2024-01-01'
|
||
``` |