1.基本概念
学习目标
开窗函数分为两个部分分别是
-了解分析函数作用和类型
1.聚合,排名,偏移,分布函数 。
-使用分析函数产生报告
2.开窗分区,排序,框架。
下面举个例子
分析函数
SELECT empid, ordermonth, val,
SUM(val) OVER(PARTITION BY empid
ORDER BY ordermonth
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS runval
FROM Sales.EmpOrders;
分析函数用于计算一些基于组的聚合值,它与聚合函数的区别在于,分析函数每组返回多行,聚合函数每组返回一行。
sum(val) 就是集合函数
over() 就是开窗 PARTITION BY empid 就是开窗分区(分组) ORDER BY
ordermonth 开窗排序
一般分析函数
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 开窗架构
ROW_NUMBER() OVER(PARTITION BY … ORDER BY …)
按分区或返回记录生成唯一编号
2.排名开窗函数
RANK() OVER(PARTITION BY … ORDER BY …)
按分区或返回记录排序,会跳号
SQL SERVER 支持4个排名函数 ROW_NUMBER,RANK,DENSE_RANK ,NTLE
来看看它们分别的作用
DENSE_RANK() OVER(PARTITION BY … ORDER BY
…) 按分区或返回记录排序,不跳号
SELECT orderid, custid, val,
ROW_NUMBER() OVER(ORDER BY val) AS rownum,
RANK() OVER(ORDER BY val) AS rank,
DENSE_RANK() OVER(ORDER BY val) AS dense_rank,
NTILE(10) OVER(ORDER BY val) AS ntile
FROM Sales.OrderValues
ORDER BY val;
COUNT() OVER(PARTITION BY … ORDER BY
…) 按分区或返回记录进行计数
MAX() OVER(PARTITION BY … ORDER BY
…) 按分区或返回记录计算最大值
可以看到 它们不同排序规则
MIN() OVER(PARTITION BY … ORDER BY
…) 按分区或返回记录计算最小值
ROW_NUMBER() 对排序字段行号进行排序
SUM() OVER(PARTITION BY … ORDER BY
…) 按分区或返回记录进行求和
RANK() 对数值进行排序 对相同数值有行号占用
AVG() OVER(PARTITION BY … ORDER BY
…) 按分区或返回记录求出平均值
DENSE_RANK() 也是对数值排序 如果有相同数值 依旧会按照原先行号加
FIRST_VALUE() OVER(PARTITION BY … ORDER BY
…) 按分区或返回记录的第一个值
NTILE 分区排序 为每一行分配一个区号 如果分10区 会对所有数据进行分区
总数据/分区数 就是每多少数据为一区
LAST_VALUE() OVER(PARTITION BY … ORDER BY
…) 按分区或返回记录的最后一个值
ROW_NUMBER() 默认在 DISTINCT 筛选重复项之前计算
LAG() OVER(PARTITION BY … ORDER BY
…) 按分区或返回记录上偏移值
2.偏移开窗函数
LEAD() OVER(PARTITION BY … ORDER BY
…) 按分区或返回记录下偏移值
sql server 中总共有4个偏移函数 LAG 和 LEAD, FIRST_VALUE 和 LAST_VALUE
LAG函数在当前行之前查找
函数语法说明
LEAD函数在当前行之后查找
PARITION BY
LAST_VALUE 返回查找结果的最后一行
Function_name(…) Over(partition by col_name)
FIRST_VALUE 返回查找结果的第一行
用于分区,按列进行分区
3.聚合开窗函数
ORDER BY (默认窗口)
SUM() OVER()
Function_name(…) Over(Order by col_name [rows |
range between n|unbounded preceding and n| unbounded following])
AVG() OVER()
-rows:【行】前n行、后n行或当前行,用于计算
COUNT() OVER()
-range:【范围】大于或小于当前值的n行,或使用前n行来计算
MAX() OVER() 等等
-unbounded:【无界限】所有行都使用计算
ROW_NUMBER() OVER(partition by … order by …)
功能与rownum伪列类似,order
by子句中指定有序列,从1开始为分区中的每一行或查询返回的每一行分配一个唯一的编号。
注意ROW_NUMBER(这里不能限定列名)
查询按部门代码求出薪水排名:
13:49:52 SQL> set pagesize 500
13:50:09 SQL> col last_name format a20
13:50:09 SQL> select last_name,department_id,salary,row_number() over(partition by department_id order by salary ) row_num
13:50:09 2 from employees;
LAST_NAME DEPARTMENT_ID SALARY ROW_NUM
-------------------- ------------- ---------- ----------
Whalen 10 4390 1
Fay 20 5990 1
Hartstein 20 12990 2
Colmenares 30 2490 1
Himuro 30 2590 2
Tobias 30 2790 3
Baida 30 2890 4
Khoo 30 3090 5
Raphaely 30 10990 6
Mavris 40 6490 1
RANK() OVER(PARTITION BY … ORDER BY …)
为查询返回的每一行并列排序,相同排名后的排名会出现跳号
查询部门代码50,工资在3000~6000之间的排名情况
13:56:59 SQL> set pagesize 500
13:57:30 SQL> col last_name format a20
13:57:30 SQL> select last_name,department_id,salary,rank() over(partition by department_id order by salary ) row_num
13:57:30 2 from employees
13:57:30 3 where department_id =50
13:57:30 4 and salary between 3000 and 6000;
LAST_NAME DEPARTMENT_ID SALARY ROW_NUM
-------------------- ------------- ---------- ----------
Fleaur 50 3090 1
Walsh 50 3090 1
Davies 50 3090 1
Nayer 50 3190 4--出现跳号
McCain 50 3190 4
Taylor 50 3190 4
Stiles 50 3190 4
Bissot 50 3290 8--出现跳号
Mallin 50 3290 8
Dellinger 50 3390 10--出现跳号
Rajs 50 3490 11
Dilly 50 3590 12
Ladwig 50 3590 12
Chung 50 3790 14--出现跳号
Everett 50 3890 15
Bell 50 3990 16
Bull 50 4090 17
Sarchand 50 4190 18
Mourgos 50 5790 19
已選取 19 個資料列.
DENSE_RANK() OVER(PARTITION BY … ORDER BY …)
为查询返回的每一行并列排序,相同排名后的排名不会跳号
查询部门代码50,工资在3000~6000之间的排名情况
14:01:48 SQL> set pagesize 500
14:02:06 SQL> col last_name format a20
14:02:06 SQL> select last_name,department_id,salary,dense_rank() over(partition by department_id order by salary ) row_num
14:02:06 2 from employees
14:02:06 3 where department_id =50
14:02:06 4 and salary between 3000 and 6000;
LAST_NAME DEPARTMENT_ID SALARY ROW_NUM
-------------------- ------------- ---------- ----------
Fleaur 50 3090 1
Walsh 50 3090 1
Davies 50 3090 1
Nayer 50 3190 2
McCain 50 3190 2
Taylor 50 3190 2
Stiles 50 3190 2
Bissot 50 3290 3
Mallin 50 3290 3
Dellinger 50 3390 4
Rajs 50 3490 5
Dilly 50 3590 6
Ladwig 50 3590 6
Chung 50 3790 7
Everett 50 3890 8
Bell 50 3990 9
Bull 50 4090 10
Sarchand 50 4190 11
Mourgos 50 5790 12
已選取 19 個資料列.
COUNT() OVER(PARTITION BY … ORDER BY …)
返回查询记录或分区的计数值(次数)
14:11:48 SQL> set pagesize 500
14:12:54 SQL> col last_name format a20
14:12:54 SQL> select last_name,department_id,salary,count(salary) over( partition by department_id ) count_num
14:12:54 2 from employees;
LAST_NAME DEPARTMENT_ID SALARY COUNT_NUM
-------------------- ------------- ---------- ----------
Whalen 10 4390 1--部门10中出现了1次
Hartstein 20 12990 2--部门20中出现了2次
Fay 20 5990 2--部门20中出现了2次
Raphaely 30 10990 6--部门30中出现了6次
Khoo 30 3090 6
Baida 30 2890 6
Tobias 30 2790 6
Himuro 30 2590 6
Colmenares 30 2490 6
Mavris 40 6490 1
Weiss 50 7990 45
Fripp 50 8190 45
Kaufling 50 7890 45
Vollman 50 6490 45
MAX() OVER(PARTITION BY … ORDER BY …)
按分区返回最大的值
14:12:55 SQL> set pagesize 500
14:15:59 SQL> col last_name format a20
14:15:59 SQL> select last_name,department_id,salary,max(salary) over( partition by department_id ) max_sal
14:15:59 2 from employees
14:15:59 3 ;
LAST_NAME DEPARTMENT_ID SALARY MAX_SAL
-------------------- ------------- ---------- ----------
Whalen 10 4390 4390--部门10薪水最高的
Hartstein 20 12990 12990--部门20薪水最高的
Fay 20 5990 12990--部门20薪水最高的
Raphaely 30 10990 10990--部门30薪水最高的
Khoo 30 3090 10990
Baida 30 2890 10990
Tobias 30 2790 10990
Himuro 30 2590 10990
Colmenares 30 2490 10990
Mavris 40 6490 6490--部门40薪水最高的
Weiss 50 7990 8190--部门50薪水最高的
Fripp 50 8190 8190
Kaufling 50 7890 8190
Vollman 50 6490 8190
Mourgos 50 5790 8190
MIN() OVER(PARTITION BY … ORDER BY …)
按分区返回最小的值
14:16:00 SQL> set pagesize 500
14:18:10 SQL> col last_name format a20
14:18:10 SQL> select last_name,department_id,salary,min(salary) over( partition by department_id ) min_sal
14:18:10 2 from employees;
LAST_NAME DEPARTMENT_ID SALARY MIN_SAL
-------------------- ------------- ---------- ----------
Whalen 10 4390 4390
Hartstein 20 12990 5990
Fay 20 5990 5990
Raphaely 30 10990 2490
Khoo 30 3090 2490
Baida 30 2890 2490
Tobias 30 2790 2490
Himuro 30 2590 2490
Colmenares 30 2490 2490
Mavris 40 6490 6490
Weiss 50 7990 2090
Fripp 50 8190 2090
Kaufling 50 7890 2090
Vollman 50 6490 2090
Mourgos 50 5790 2090
SUM() OVER(PARTITION BY … ORDER BY …)
按分区汇总求和
查询所有薪水累积相加
14:32:10 SQL> set pagesize 500
14:32:40 SQL> col last_name format a20
14:32:40 SQL> select last_name,department_id,salary,sum(salary) over( order by salary ) sum_sal
14:32:40 2 from employees
14:32:40 3 where department_id=30;
LAST_NAME DEPARTMENT_ID SALARY SUM_SAL
-------------------- ------------- ---------- ----------
Colmenares 30 2490 2490
Himuro 30 2590 5080
Tobias 30 2790 7870
Baida 30 2890 10760
Khoo 30 3090 13850
Raphaely 30 10990 24840
已選取 6 個資料列.
查询按部门分区进行求和
14:19:28 SQL> set pagesize 500
14:20:06 SQL> col last_name format a20
14:20:06 SQL> select last_name,department_id,salary,sum(salary) over( partition by department_id ) sum_sal
14:20:06 2 from employees;
LAST_NAME DEPARTMENT_ID SALARY SUM_SAL
-------------------- ------------- ---------- ----------
Whalen 10 4390 4390
Hartstein 20 12990 18980
Fay 20 5990 18980
Raphaely 30 10990 24840
Khoo 30 3090 24840
Baida 30 2890 24840
Tobias 30 2790 24840
Himuro 30 2590 24840
Colmenares 30 2490 24840
Mavris 40 6490 6490
Weiss 50 7990 155950
Fripp 50 8190 155950
Kaufling 50 7890 155950
查询按部门分区前后2笔进行求和 【窗口期】
14:24:33 SQL> set pagesize 500
14:24:59 SQL> col last_name format a20
14:24:59 SQL> select last_name,department_id,salary,sum(salary) over( partition by department_id order by salary rows between 2 preceding and 2 following ) sum_sal
14:24:59 2 from employees;
LAST_NAME DEPARTMENT_ID SALARY SUM_SAL
-------------------- ------------- ---------- ----------
Whalen 10 4390 4390 --部门只有1行
Fay 20 5990 18980
Hartstein 20 12990 18980
Colmenares 30 2490 7870--部门30,前后2行求和,7870=2490+2590+2790
Himuro 30 2590 10760--部门30,前后2行求和,10760 =2490+2590+2790+2890
Tobias 30 2790 13850--部门30,前后2行求和,13850 =2490+2590+2790+2890+3090
Baida 30 2890 22350--部门30,前后2行求和,22350 =2590+2790+2890+3090+10990
Khoo 30 3090 19760
Raphaely 30 10990 16970
Mavris 40 6490 6490
Olson 50 2090 6470
Philtanker 50 2190 8860
Markle 50 2190 11250
Gee 50 2390 11650
Landry 50 2390 11950
Patel 50 2490 12250
Vargas 50 2490 12350
Marlow 50 2490 12450
Perkins 50 2490 12550
Sullivan 50 2490 12650
OConnell 50 2590 12750
范围在200以内的【窗口期】
set pagesize 500
col last_name format a20
select last_name,department_id,salary,sum(salary) over( partition by department_id order by salary range between 200 preceding and 200 following ) sum_sal
from employees;
范围无限制【窗口期】
set pagesize 500
col last_name format a20
select last_name,department_id,salary,sum(salary) over( partition by department_id order by salary range between unbounded preceding and unbounded following ) sum_sal
from employees;
set pagesize 500
col last_name format a20
select last_name,department_id,salary,sum(salary) over( partition by department_id order by salary rows between unbounded preceding and unbounded following ) sum_sal
from employees;
AVG() OVER(PARTITION BY … ORDER BY …)
按分区求出平均值
14:39:56 SQL> set pagesize 500
14:39:56 SQL> col last_name format a20
14:39:56 SQL> select last_name,department_id,salary,avg(salary) over( partition by department_id ) avg_sal
14:39:56 2 from employees;
LAST_NAME DEPARTMENT_ID SALARY AVG_SAL
-------------------- ------------- ---------- ----------
Whalen 10 4390 4390
Hartstein 20 12990 9490
Fay 20 5990 9490
Raphaely 30 10990 4140
Khoo 30 3090 4140
Baida 30 2890 4140
Tobias 30 2790 4140
Himuro 30 2590 4140
Colmenares 30 2490 4140
Mavris 40 6490 6490
Weiss 50 7990 3465.55556
Fripp 50 8190 3465.55556
Kaufling 50 7890 3465.55556
Vollman 50 6490 3465.55556
Mourgos 50 5790 3465.55556
FIRST_VALUE() OVER(PARTITION BY … ORDER BY
….) 返回一组有序的值中的第一个值。
如果集合中的第一个值为空,则函数将返回null,除非指定IGNORE NULLS
LAST_VALUE() OVER(PARTITION BY … ORDER BY
….) 返回一组有序的值中的最后一个值。
如果集合中的最后一个值为空,则函数将返回null,除非指定IGNORE NULLS。
14:49:16 SQL> set pagesize 500
14:50:48 SQL> col last_name format a20
14:50:48 SQL> select last_name,department_id,salary,first_value(salary) over( partition by department_id) first_sal,
14:50:48 2 last_value(salary) over( partition by department_id ) last_sal
14:50:48 3 from employees;
LAST_NAME DEPARTMENT_ID SALARY FIRST_SAL LAST_SAL
-------------------- ------------- ---------- ---------- ----------
Whalen 10 4390 4390 4390
Hartstein 20 12990 12990 5990
Fay 20 5990 12990 5990
Raphaely 30 10990 10990 2490
Khoo 30 3090 10990 2490
Baida 30 2890 10990 2490
Tobias 30 2790 10990 2490
Himuro 30 2590 10990 2490
Colmenares 30 2490 10990 2490
Mavris 40 6490 6490 6490
Weiss 50 7990 7990 2590
Fripp 50 8190 7990 2590
Kaufling 50 7890 7990 2590
LAG(column,n,default) OVER(PARTITION BY … ORDER BY …)
上偏移位置的返回值,n表示偏移量(正整数),default指定默认值
LEAD(column,n,default) OVER(PARTITION BY … ORDER BY …)
下偏移位置的返回值,n表示偏移量(正整数) ,default指定默认值
14:58:21 SQL> set pagesize 500
14:58:33 SQL> col last_name format a20
14:58:33 SQL> select last_name,department_id,lag(salary,1,0) over( partition by department_id order by salary) lag_sal,
14:58:33 2 salary,
14:58:33 3 lead(salary,1,NULL) over( partition by department_id order by salary) lead_sal
14:58:33 4 from employees;
LAST_NAME DEPARTMENT_ID LAG_SAL SALARY LEAD_SAL
-------------------- ------------- ---------- ---------- ----------
Whalen 10 0 4390
Fay 20 0 5990 12990
Hartstein 20 5990 12990
Colmenares 30 0 2490 2590
Himuro 30 2490 2590 2790
Tobias 30 2590 2790 2890
Baida 30 2790 2890 3090
Khoo 30 2890 3090 10990
Raphaely 30 3090 10990
Mavris 40 0 6490
Olson 50 0 2090 2190
Philtanker 50 2090 2190 2190
Markle 50 2190 2190 2390
Gee 50 2190 2390 2390
学习总结:
1.了解分析函数和聚合(组)函数之间的差异
2.一般分析函数大致分为4类
伪列类:row_number() over()、rank() over()和dense_rank()
over()
计算类:sum() over()、count() over()、max() over()、min()
over()和avg() over()
返回值类:first_value() over()和last_value() over()
偏移类:lag() over()和lead() over