0%

mysql8 with使用与场景示例:用户邀请记录查询

mysql8引入with语法,可将一个临时表作为本次查询生命周期内的公共表达式使用(Common Table Expressions,CTE),来简化复杂的嵌套查询,并可使用with recursive语法进行递归查询。本文对基本的使用方法进行示例,并给出了一个实际的应用场景:用户邀请记录查询

with CTE示例

with基本语法为:

1
2
3
4
WITH cte_name AS (
SELECT ...
)
SELECT * FROM cte_name;

如下,使用with定义了t1这一表达式,可以多次使用:

1
2
3
4
5
WITH t1 AS ( SELECT 'matian' AS `name` )
SELECT
tb1.`name` as n1, tb2.`name` as n2
FROM
t1 as tb1, t1 as tb2

输出为:

n1 n2
matian matian

也可使用with定义多个表达式,或者在嵌套子查询中继续使用with定义表达式,如下:

1
2
3
4
5
6
7
WITH t1 AS ( SELECT 'matian' AS NAME ),
t2 AS ( SELECT 1 AS id )
SELECT
*
FROM
t1,
t2

with CTE与视图view的区别

  • with仅在当次查询中有效,执行结束即释放;视图是持久存在的,是会“物化”的,会将结果集缓存活存储
  • with仅是构建一个表达式,视图部分情况下可以进行更新操作

因此,with只是在单个查询中用于简化语句、临时替代重复子查询,视图适合高频使用的结果集的持久化

with recursive递归查询

使用with recursive语法,相当于定义一个递归函数:

1
2
3
4
5
6
7
8
WITH RECURSIVE cte_name (column_list) AS (
SELECT initial_query_result
UNION [ALL]
SELECT recursive_query
FROM cte_name
WHERE condition
)
SELECT * FROM cte_name;

initial_query_result定义初始值,recursive_query定义递归规则,condition定义递归条件,如:

1
2
3
4
5
6
WITH RECURSIVE cte ( n ) AS 
( SELECT 1 UNION ALL SELECT n + 1 FROM cte WHERE n < 5 )
SELECT
*
FROM
cte;

定义了cet(n),并指定n的初始值为1,递归规则为n = n+1,递归条件为 n < 5,输出为:

1
2
3
4
5
6
7
n
-----
1
2
3
4
5

我们可以继续一个更有用的示例:实现一个字符串rightPad函数:rightpad ( str, append, n ),str为初始字符串,append为添加的字符串,n为append的重复次数。如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
WITH recursive rightpad ( str, append, n ) AS (
SELECT
cast(
'matian' AS CHAR ( 128 )) AS str,
'8' AS append,
1 AS n UNION ALL
SELECT
concat( str, append ) AS str,
append AS append,
n + 1 AS n
FROM
rightpad
WHERE
n < 3
)
SELECT * FROM rightpad;

输出为:

str append n
matian 8 1
matian8 8 2
matian88 8 3

需要留意的是,需要使用cast函数进行字符串类型的转换,否则将这一字符长度将受限而无法得到期望的结果

说到递归,必然想起斐波那契数列,可以使用with recursive实现如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
WITH recursive fbnq ( n, n1, n2 ) AS (
SELECT
1,
0,
1 UNION ALL
SELECT
n1 AS b,
n2 AS n1,(
n1 + n2
) AS n2
FROM
fbnq
WHERE
n < 5
)
SELECT
n2
FROM
fbnq

这一实现里我们经过精心的构造,用三个列分别作为n、n+1、n+2,如:

1
2
3
4
5
1    0    1
0 1 1
1 1 2
1 2 3
2 3 5

规则为:

1
2
n       n1           n2=n+n1
n=n1 n1=n2 n2=n+n1

得到的结果为:

1
2
3
4
5
6
7
8
9
n2
---
1
1
2
3
5
8
13

应用示例–邀请记录树形数据查询

在用户邀请的场景(chuan销?)中,用户邀请新用户,新用户邀请新新用户…一传十十传百,如何查询邀请记录呢?这里有这样几个基本的需求:

  • 查询某个用户的1层下线、2层下线…
  • 查询某个用户的上线
  • 查询某个用户所有下线用户的总数

跳出来看,邀请记录其实是一个树形结构,对于树形结构的遍历,就不得不提递归

首先模拟邀请记录表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
-- ----------------------------
-- Table structure for user
-- ----------------------------
DROP TABLE IF EXISTS `user`;
CREATE TABLE `user` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
`nickname` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '昵称',
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 110 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci COMMENT = '用户表' ROW_FORMAT = DYNAMIC;

-- ----------------------------
-- Records of user
-- ----------------------------
INSERT INTO `user` VALUES (99, 'S');
INSERT INTO `user` VALUES (100, 'A');
INSERT INTO `user` VALUES (201, 'B');
INSERT INTO `user` VALUES (202, 'C');
INSERT INTO `user` VALUES (303, 'D');
INSERT INTO `user` VALUES (304, 'E');
INSERT INTO `user` VALUES (305, 'F');
INSERT INTO `user` VALUES (306, 'G');
INSERT INTO `user` VALUES (407, 'H');
INSERT INTO `user` VALUES (408, 'I');
INSERT INTO `user` VALUES (409, 'J');

-- ----------------------------
-- Table structure for user_invite_tree
-- ----------------------------
DROP TABLE IF EXISTS `user_invite_tree`;
CREATE TABLE `user_invite_tree` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
`user_id` bigint(20) NULL DEFAULT NULL COMMENT '用户id',
`invite_user_id` bigint(20) NULL DEFAULT NULL COMMENT '推荐者id',
PRIMARY KEY (`id`) USING BTREE,
INDEX `user_id`(`user_id` ASC) USING BTREE,
INDEX `invite_user_id`(`invite_user_id` ASC) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 11 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci COMMENT = '推广记录表' ROW_FORMAT = DYNAMIC;

-- ----------------------------
-- Records of user_invite_tree
-- ----------------------------
INSERT INTO `user_invite_tree` VALUES (1, 100, 99);
INSERT INTO `user_invite_tree` VALUES (2, 201, 100);
INSERT INTO `user_invite_tree` VALUES (3, 202, 100);
INSERT INTO `user_invite_tree` VALUES (4, 303, 201);
INSERT INTO `user_invite_tree` VALUES (5, 304, 201);
INSERT INTO `user_invite_tree` VALUES (6, 305, 202);
INSERT INTO `user_invite_tree` VALUES (7, 306, 202);
INSERT INTO `user_invite_tree` VALUES (8, 407, 303);
INSERT INTO `user_invite_tree` VALUES (9, 408, 303);
INSERT INTO `user_invite_tree` VALUES (10, 409, 304);

邀请关系为:

1
2
3
4
5
6
7
8
9
10
11
12
13
S(99)-- A(100)---B(201)------D(303)----H(407)
| | |
| | |
| | I(408)
| |
| E(304)----J(409)
|
|
C(202)------F(305)
|
|
G(306)

查询下线

可以使用with递归查询S(99)的所有下线,我们很慢可以写出这样的sql:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
WITH recursive inivite_relation ( iid, uid ) AS (
SELECT
0 AS iid,
99 AS uid UNION ALL
SELECT
t.invite_user_id AS iid,
t.user_id AS uid
FROM
user_invite_tree t,
inivite_relation r
WHERE
t.invite_user_id = r.uid
)
SELECT
*
FROM
inivite_relation;

输出为:

iid uid
0 99
99 100
100 201
100 202
201 303
201 304
202 305
202 306
303 407
303 408
304 409

但是这一sql并不美:为了查询99的所有下线,我们在递归的初始条件中虚拟了一个0作为99的邀请人id,那么如果查询其他的,比如202的下线,难道要先查出他的邀请人手动填入作为初始条件?
其实邀请人表中就包含了邀请人id和用户id,所以只需要指定用户id,初始的邀请人id就可以查询出来:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
WITH recursive invite_relation ( iid, uid ) AS (
SELECT
invite_user_id AS iid,
user_id AS uid
FROM
user_invite_tree
WHERE
invite_user_id = 99 UNION ALL
SELECT
t.invite_user_id AS iid,
t.user_id AS uid
FROM
user_invite_tree t,
invite_relation r
WHERE
t.invite_user_id = r.uid
)
SELECT
*
FROM
invite_relation;

输出结果相同

但可能有过多的下级,我们只希望查询99的向下两级下线,那么需要增加一个层数变量限制递归层级:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
WITH recursive invite_relation ( iid, uid, layer ) AS (
SELECT
invite_user_id AS iid,
user_id AS uid,
1 AS layer
FROM
user_invite_tree
WHERE
invite_user_id = 99 UNION ALL
SELECT
t.invite_user_id AS iid,
t.user_id AS uid,
r.layer + 1 AS layer
FROM
user_invite_tree t,
invite_relation r
WHERE
t.invite_user_id = r.uid
AND layer < 2
)
SELECT
*
FROM
invite_relation;

输出为:

iid uid layer
99 100 1
100 201 2
100 202 2

或者单独查询某一层,直接在最终的select中按layer筛选即可:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
WITH recursive invite_relation ( iid, uid, layer ) AS (
SELECT
invite_user_id AS iid,
user_id AS uid,
1 AS layer
FROM
user_invite_tree
WHERE
invite_user_id = 99 UNION ALL
SELECT
t.invite_user_id AS iid,
t.user_id AS uid,
r.layer + 1 AS layer
FROM
user_invite_tree t,
invite_relation r
WHERE
t.invite_user_id = r.uid
AND layer < 2
)
SELECT
*
FROM
invite_relation
WHERE
layer = 2

输出为:

iid uid layer
100 201 2
100 202 2

同样的,对最终的select增加一个group和count聚合,也能轻松查询出每层的用户数量

查询上线

可以向下递归,自然可以向上递归,如以下查询用户303的上线:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
WITH recursive invite_relation_rollup ( uid, iid ) AS (
SELECT
user_id AS uid,
invite_user_id AS iid
FROM
user_invite_tree
WHERE
user_id = 303 UNION ALL
SELECT
user_id AS uid,
invite_user_id AS iid
FROM
user_invite_tree t,
invite_relation_rollup r
WHERE
t.user_id = r.iid
)
SELECT
*
FROM
invite_relation_rollup

输出为:

iid uid
303 201
201 100
100 99

同样的,我们可以增加一个layer变量来控制向上查询层级:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
WITH recursive invite_relation_rollup ( uid, iid, layer ) AS (
SELECT
user_id AS uid,
invite_user_id AS iid,
1 AS layer
FROM
user_invite_tree
WHERE
user_id = 303 UNION ALL
SELECT
user_id AS uid,
invite_user_id AS iid,
r.layer + 1 AS layer
FROM
user_invite_tree t,
invite_relation_rollup r
WHERE
t.user_id = r.iid
AND r.layer < 2
)
SELECT
*
FROM
invite_relation_rollup

输出为:

iid uid layer
303 201 1
201 100 2

同样的,对最终的select增加一个group和count聚合,也能轻松查询出每层的用户数量

查询下线人数

如何查询某个用户的所有下线的数量?

我们可以使用上文的递归查询出用户的下线,然后进行统计,但是如此使用递归存在递归层级和数据量的限制,效率堪忧,按道理会是指数级别的性能下降

然后我们就想出了第二个方案:增加一个列,在记录新用户信息时,记录每个用户的邀请链路,如:

user_id invite_user_id invite_chain
99 null null
100 99 .99.
201 100 .99.100.
202 100 .99.100.
303 201 .99.100.
304 201 .99.100.
407 303 .99.100.303.
408 303 .99.100.303.
409 304 .99.100.304.

如上,第五行user_id = 304的用户,invite_chain则表示他的上线依次是:99 -> 邀请100 -> 邀请201 -> 邀请304

那么查询用户99的下级,使用 invite_chain like ‘%.99.%’ 筛选后count即可;同样,查询100的下级那么使用 invite_chain like ‘%.100.%’ 筛选后count即可

但是这样,在上文的示例中我们使用的id是小的数字来做Demo,而实际的user_id使用雪花ID等的情况下,长度大得多,那么invite_chain字段会很大

我们继续沿着这种思路考虑:每层的人数不会太多,毕竟微信好友数上限也才几百,pdd砍一刀死命拉也就能叫到十来人。那么我们对invite_chain字段的组成做一下修改:约定无邀请人的初始用户的invite_id = 0;对invite_chain不再使用user_id进行拼接,为了减小用户标识的长度,我们对某个用户的直接下线的用户按次序分配一个递增的编号,使用这一层内编号来代替user_id(同时,这一方式下可以记录下每个用户的邀请次序);在记录新用户信息时,统计邀请人的直接下线数量即可确定当前用户的次序编号,确定invite_chain

比如:

user_id invite_user_id invite_chain
99 null .0.
99 null .1.
100 99 .0.1.
101 98 .1.1.
201 100 .0.1.1.
202 100 .0.1.2.
203 100 .0.1.3.
204 101 .1.1.1.
303 201 .0.1.1.1.
304 201 .0.1.1.2.
407 303 .0.1.1.1.1.
408 303 .0.1.1.1.2.
409 304 .0.1.1.2.1.

如上例,我们要查询用户100的下线数量,那么使用 invite_chain like ‘.0.1.%’ 进行筛选后做一次count可以完成统计

这样的实现与使用用户id进行拼接并无本质区别,主要是为了减小invite_chain的字段长度