mysql炸裂函数实例

首先需要了解一个函数:

substring_index(“待截取字符串”,“截取规则”,截取前n个字符),如果第三个参数为负数,则倒过来取前n个

举例:

1
2
3
4
5
6
7
8
9
10
11
# 取第一个:==>得到结果为: 15
SELECT SUBSTRING_INDEX('15,151,152,16',',',1);

# N可以为负数,表示倒数前N个字符串。(有负号的时候,可以将整个字符倒过来看)==>得到结果为: 16
SELECT SUBSTRING_INDEX('15,151,152,16',',',-1);

# 取前2个字符串,然后倒过来取第一个 ==>得到结果为: 151
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('15,151,152,16',',',2),',',-1);

# 倒过来取前两个字符串,再正数第一个:==> 得到结果为:152
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('15,151,152,16',',',-2),',',1);

如果有一个字符数组,我们想分别取第一个,第二个,第三个字符:

user1,user2,user3

1
2
3
substring_index(substring_index(processed_data,',',1),',',-1),
substring_index(substring_index(processed_data,',',2),',',-1),
substring_index(substring_index(processed_data,',',3),',',-1)

以此类推第n个字符:

1
substring_index(substring_index(processed_data,',',n),',',-1)

如果我们需要把前n个字符都取出来,那就要借助一个足够长从0单调递增的数列,把<=n的所有字符都提取出来

创建一个只有一列的单调递增表:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
drop procedure test_200w;

DELIMITER $$
CREATE PROCEDURE test_200w(IN in_count INT) # 创建存储过程
BEGIN
DECLARE COUNT INT DEFAULT 1;
# DECLARE SUM INT DEFAULT 0;
WHILE COUNT <= in_count
DO
# SET SUM = SUM + COUNT;
insert into test_200w values (COUNT);
SET COUNT = COUNT + 1;
END WHILE;
END $$
DELIMITER ;

drop table test_200w;

create table test_200w
(
id bigint primary key default 0
);

call test_200w(200000);

select *
from test_200w;

下面就来利用这个表进行炸裂:

需求1:炸裂email

示例数据中的email字段示例:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
[
"last@roechling.com",
"rep@roechling.com.sg",
"info@roechling.com",
"bewerbung@roechling.com",
"brown@roechling.com",
"bkoennecker@roechling.com",
"info-xan@roechling.com",
"ikhammarit@roechling.com",
"dwalulik@roechling.com",
"reg_thermo@roechling.com.sg",
"ina.breitsprecher@roechling.com",
"bewerbung.debre@roechling.com",
"robaproducts@roechling.com",
"reckerstorfer@roechling.com",
"aluger@roechling.com",
"datenschutz@roechling.com",
"info.deneu@roechling.com",
"rep@roechling.com",
"zentrale.deneu@roechling.com",
"hr-oep@roechling.com",
"apueschel@roechling.com",
"seminar-management@roechling.com",
"admin@roechling.com.sg",
"ibaumgaertel@roechling.com",
"personal.waldachtal@roechling.com",
"kbiedebach@roechling.com",
"mfritzmann@roechling.com",
"sales.deneu@roechling.com",
"nikolaus_niepon@roechling.com",
"lsieber@roechling.com",
"d-jane@roechling.com",
"rep@roechling.com.cn",
"pstraub@roechling.com",
"jane_d@roechling.com",
"info@roechling.com.cn",
"peter.walsh@roechling.com.email",
"breitsprecher@roechling.com",
"susanne.salomon@roechling.com",
"johannes.salmuth@roechling.com",
"angela.lin@roechling.com.sg",
"gabriele.stegher@roechling.com",
"jermel.jones@roechling.com",
"rega@roechling.com",
"lewis.carter@roechling.com",
"kgina.breitsprecher@roechling.com",
"jane.doe@roechling.com",
"martina.anschuetz@roechling.com",
"jtoepfer-linss@roechling.com",
"peter.walsh@roechling.com",
"witherspoon@roechling.com",
"cachedkeegan@roechling.com",
"chris@roechling.com",
"chris_krebs@roechling.com",
"heidi@roechling.com.email",
"abernathy@roechling.com",
"ludger@roechling.com",
"liangtong@roechling.com.sg",
"jtoepfer-linss@roechling.com.mit",
"rep@roechling.com.company",
"plastics.rep@roechling.com.sg",
"bewerbung.deneu@roechling.com",
"jtoepfer-linss@roechling.com.sv",
"ludger.bartels@roechling.com",
"admin@roechling.com",
"james_andolina@roechling.com",
"darren@roechling.com",
"linlin.ng@roechling.com.sg",
"cachedrep@roechling.com.sg",
"georg.duffner@roechling.com",
"km.pang@roechling.com.sg",
"j-d@roechling.com",
"djane@roechling.com",
"odell@roechling.com",
"info.debre@roechling.com"
]

这里首先要对数据进行清洗:清洗掉’[‘,’]’,’”‘,然后在join单调表,取id<=字符条数的所有字符进行炸裂

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
SELECT 公司名称,
地址,
网址,
电话,
主要管理人,
雇员人数,
成立时间,
substring_index(substring_index(processed_data, ',', b.id), ',', - 1) 邮箱
FROM (
SELECT 公司名称,
地址,
网址,
电话,
主要管理人,
雇员人数,
成立时间,
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(邮箱, '[', ''), ']', ''), '"', ''), ' ', ''),
CHAR(10), ''), CHAR(13), '') processed_data
FROM company_all
) temp
JOIN test_200w b
ON b.id <= (length(temp.processed_data) - length(REPLACE(temp.processed_data, ',', '')) + 1)
union all select * from company_all where 邮箱 is null;

这里有一点需要特别注意:如果有的列email为null,那么这些列的length也是null,长度是无法作差的,所以会丢失数据,这里需要加上。结果如下:

需求2:炸裂email并取前5

方案一:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
     SELECT 公司名称,
地址,
网址,
电话,
主要管理人,
雇员人数,
成立时间,
substring_index(substring_index(processed_data, ',', b.id), ',', -1) 邮箱
FROM (
SELECT 公司名称,
地址,
网址,
电话,
主要管理人,
雇员人数,
成立时间,
substring_index(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(邮箱, '[', ''), ']', ''), '"', ''),' ', ''), CHAR(10), ''), CHAR(13), ''), ',',5) processed_data
FROM company_all
) temp
JOIN test_200w b
ON b.id <= (length(temp.processed_data) - length(REPLACE(temp.processed_data, ',', ''))+1)
union all select * from company_all where 邮箱 is null;

方案二:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
SELECT 公司名称,
地址,
网址,
电话,
主要管理人,
雇员人数,
成立时间,
substring_index(substring_index(processed_data, ',', b.id), ',', - 1) 邮箱
FROM (
SELECT 公司名称,
地址,
网址,
电话,
主要管理人,
雇员人数,
成立时间,
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(邮箱, '[', ''), ']', ''), '"', ''), ' ', ''),
CHAR(10), ''), CHAR(13), '') processed_data
FROM company_all
) temp
JOIN test_200w b ON b.id <= (5)
group by 公司名称, 地址, 网址, 电话, 主要管理人, 雇员人数, 成立时间,
substring_index(substring_index(processed_data, ',', b.id), ',', - 1);

效果如下:

Donate
  • Copyright: Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.

扫一扫,分享到微信

微信分享二维码
  • Copyrights © 2020-2021 ycfn97
  • Visitors: | Views:

请我喝杯咖啡吧~

支付宝
微信