shenfen, database fuzzy query algorithms

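This article explains common methods for optimizing fuzzy query performance on GaussDB(DWS). Creating suitable indexes can speed up fuzzy query statements in a variety of scenarios.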

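This article is shared from the Huawei Cloud Community article "GaussDB(DWS) Fuzzy Query Performance Optimization" (GaussDB(DWS) 模糊查询性能优化). Author: 黎明的风.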

When using GaussDB(DWS), fuzzy queries written with LIKE can sometimes run slowly.

(1) LIKE fuzzy queries

A typical query looks like this:

select * from t1 where c1 like 'A123%';
When table t1 holds a large amount of data, this LIKE fuzzy query runs very slowly.

Use EXPLAIN to view the query plan generated for this statement:

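The pattern 'A123%' in the query above has its wildcard at the end, so we call it a trailing-wildcard (prefix) match. In this scenario, query performance can be improved by creating a B-tree index.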

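When creating the index, specify the operator class that matches the column's data type: text_pattern_ops for text, varchar_pattern_ops for varchar, and bpchar_pattern_ops for char. These operator classes compare values strictly character by character instead of following the locale-specific collation rules. This is what allows a B-tree index to serve LIKE prefix matches when the database does not use the "C" locale.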

For example, column c1 in the example above is of type text, so text_pattern_ops is added when the index is created:

CREATE INDEX ON t1 (c1 text_pattern_ops);
After adding the index, display the query plan again:

test=# explain select * from t1 where c1 like 'A123%';
                                     QUERY PLAN
--------------------------------------------------------------------------------------
 id |                operation                | E-rows | E-memory | E-width | E-costs
----+-----------------------------------------+--------+----------+---------+---------
  1 | ->  Streaming (type: GATHER)            |      1 |          |       8 | 14.27
  2 |    ->  Index Scan using t1_c1_idx on t1 |      1 | 1MB      |       8 | 8.27

 Predicate Information (identified by plan id)
-----------------------------------------------
   2 --Index Scan using t1_c1_idx on t1
         Index Cond: ((c1 ~>=~ 'A123'::text) AND (c1 ~<~ 'A124'::text))
         Filter: (c1 ~~ 'A123%'::text)

After the index is created, the plan shows that the statement uses it, so the query runs faster. The problem above involved a trailing-wildcard pattern. For a leading-wildcard (suffix) pattern, let us check whether the query plan still uses the index:

test=# explain select * from t1 where c1 like '%A123';
                                QUERY PLAN
-----------------------------------------------------------------------------
 id |          operation           | E-rows | E-memory | E-width | E-costs
----+------------------------------+--------+----------+---------+---------
  1 | ->  Streaming (type: GATHER) |      1 |          |       8 | 16.25
  2 |    ->  Seq Scan on t1        |      1 | 1MB      |       8 | 10.25

 Predicate Information (identified by plan id)
-----------------------------------------------
   2 --Seq Scan on t1
         Filter: (c1 ~~ '%A123'::text)
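As shown above, the index created earlier cannot be used once the condition becomes a leading-wildcard match, and the query falls back to a full table scan (Seq Scan).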
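The Index Cond in the first plan rewrites LIKE 'A123%' as the half-open range c1 ~>=~ 'A123' AND c1 ~<~ 'A124'. A B-tree answers that range by walking one contiguous run of sorted keys. The Python sketch below illustrates that idea and shows why '%A123' gets no such range. It is a simplified model under stated assumptions, not GaussDB's planner code:

* Python's code point string comparison stands in for the character-by-character ordering of text_pattern_ops.
* A sorted list stands in for the index.
* The helper names like_prefix_range and range_scan are hypothetical.

```python
from bisect import bisect_left


def like_prefix_range(pattern: str) -> tuple[str, str]:
    """Turn a trailing-wildcard pattern such as 'A123%' into a half-open
    range [low, high) under code point order.

    Simplified: assumes exactly one '%' at the end, no '_' and no escapes.
    """
    if not pattern.endswith("%") or "%" in pattern[:-1] or "_" in pattern:
        raise ValueError("only simple trailing-wildcard patterns are handled")
    low = pattern[:-1]
    if not low:
        raise ValueError("an empty prefix matches every row")
    # Bump the last character to get the exclusive upper bound: 'A123' -> 'A124'.
    # (Assumes that character is not the maximum code point.)
    high = low[:-1] + chr(ord(low[-1]) + 1)
    return low, high


def range_scan(sorted_keys: list[str], low: str, high: str) -> list[str]:
    """Return keys with low <= key < high, the way a B-tree index scan
    walks one contiguous run of sorted leaf entries."""
    start = bisect_left(sorted_keys, low)
    end = bisect_left(sorted_keys, high)
    return sorted_keys[start:end]


keys = sorted(["A1229", "A123", "A1230", "A123X", "A124", "B123", "XA123"])
low, high = like_prefix_range("A123%")
print(low, high)                    # A123 A124
print(range_scan(keys, low, high))  # ['A123', 'A1230', 'A123X']

# '%A123' has no fixed prefix: its matches are scattered through the sorted
# keys (positions 1 and 6 here), so every key has to be checked.
print([k for k in keys if k.endswith("A123")])  # ['A123', 'XA123']
```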

In this case, we can use the reverse function to build an index that supports leading-wildcard queries. The statement to create the index is:

CREATE INDEX ON t1 (reverse(c1) text_pattern_ops);
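The reverse trick rests on one equivalence: a string ends with a suffix exactly when its reversal starts with the reversed suffix. This turns a leading-wildcard search into a prefix range scan on reverse(c1). The Python sketch below models an index on reverse(c1) as a sorted list of (reversed value, original value) pairs. It is an illustration only:

* build_reverse_index and suffix_search are hypothetical names.
* The upper bound uses the same last-character increment that appears in the plans.

```python
from bisect import bisect_left


def build_reverse_index(values: list[str]) -> list[tuple[str, str]]:
    """Toy stand-in for an index on reverse(c1): (reversed, original) pairs
    kept in code point order."""
    return sorted((v[::-1], v) for v in values)


def suffix_search(index: list[tuple[str, str]], suffix: str) -> list[str]:
    """Answer LIKE '%<suffix>' with a prefix range scan on the reversed keys,
    using v.endswith(s) <=> v[::-1].startswith(s[::-1])."""
    if not suffix:
        raise ValueError("an empty suffix matches every row")
    prefix = suffix[::-1]                          # 'A123' -> '321A'
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # '321A' -> '321B'
    # A 1-tuple sorts before every pair that starts with the same string,
    # so bisect_left lands on the first entry whose reversed key >= bound.
    start = bisect_left(index, (prefix,))
    end = bisect_left(index, (high,))
    return [original for _, original in index[start:end]]


idx = build_reverse_index(["A123", "XA123", "A1234", "B123", "ZZA123"])
print(suffix_search(idx, "A123"))  # ['A123', 'XA123', 'ZZA123']
```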
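Rewrite the query condition using the reverse function, then output the query plan. Note that the pattern must be reversed as well: c1 like '%A123' is equivalent to reverse(c1) like '321A%'. The statement below keeps the literal 'A123%' from the original example, so it actually matches values ending in '321A'. It still illustrates how the rewritten condition uses the index: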

test=# explain select * from t1 where reverse(c1) like 'A123%';
                                QUERY PLAN
------------------------------------------------------------------------------
 id |           operation           | E-rows | E-memory | E-width | E-costs
----+-------------------------------+--------+----------+---------+---------
  1 | ->  Streaming (type: GATHER)  |      5 |          |       8 | 14.06
  2 |    ->  Bitmap Heap Scan on t1 |      5 | 1MB      |       8 | 8.06
  3 |       ->  Bitmap Index Scan   |      5 | 1MB      |       0 | 4.28

 Predicate Information (identified by plan id)
-----------------------------------------------
   2 --Bitmap Heap Scan on t1
         Filter: (reverse(c1) ~~ 'A123%'::text)
   3 --Bitmap Index Scan
         Index Cond: ((reverse(c1) ~>=~ 'A123'::text) AND (reverse(c1) ~<~ 'A124'::text))

After the rewrite, the query can use the index, and query performance improves.

(2) Creating an index with a specified collation

When the default index operator class is used, a B-tree index supports fuzzy queries only if collate "C" is specified both when the index is created and in the query.

Note: the index can be used only when the collation of the index matches the collation of the query condition.
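Why the collation matters: a B-tree range scan only works if every value sharing a prefix sits in one unbroken run of the index order. The Python sketch below compares two orderings:

* Code point order, which is what collate "C" provides.
* A made-up, case-insensitive ordering that stands in for a linguistic locale.

locale_like_key is a deliberate, hypothetical simplification, not the rules of any real locale.

```python
def locale_like_key(s: str) -> tuple[str, str]:
    """Hypothetical stand-in for a linguistic collation: compare
    case-insensitively first, then break ties on the raw string.
    Real locale rules are far more involved."""
    return (s.casefold(), s)


def prefix_is_contiguous(ordered: list[str], prefix: str) -> bool:
    """True if every value starting with `prefix` sits in one unbroken run,
    which is what a B-tree range scan relies on."""
    hits = [i for i, v in enumerate(ordered) if v.startswith(prefix)]
    return not hits or hits[-1] - hits[0] + 1 == len(hits)


values = ["A124", "a1235", "A123"]
c_order = sorted(values)                            # code point order, as with collate "C"
locale_order = sorted(values, key=locale_like_key)  # simplified "linguistic" order

print(c_order, prefix_is_contiguous(c_order, "A12"))
# ['A123', 'A124', 'a1235'] True
print(locale_order, prefix_is_contiguous(locale_order, "A12"))
# ['A123', 'a1235', 'A124'] False
```

Under the made-up ordering, the non-matching 'a1235' lands between two values that match the case-sensitive prefix 'A12'. A single range scan would therefore either miss rows or read rows it must discard.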

The statement to create the index is:

CREATE INDEX ON t1 (c1 collate "C");
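The WHERE condition of the query must also specify the collation, for example:

select * from t1 where c1 collate "C" like 'A123%';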

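(3) GIN inverted indexes

GIN (Generalized Inverted Index) is a general-purpose inverted index. It is designed for cases where the indexed items are composite values and a query must find the items that contain specific element values. For example, a document consists of many words, and a query needs to find the documents that contain a particular word.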
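As a rough illustration of the inverted-index idea, the Python sketch below maps each word to the set of document ids that contain it. It then answers a multi-word query by intersecting those sets. It is an in-memory toy with hypothetical helper names. A real GIN index keeps a B-tree over the element keys, with a posting list or posting tree per key, on disk.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each word (one element of a composite value) to the ids of the
    documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def search_all(index: dict[str, set[int]], words: list[str]) -> set[int]:
    """Ids of documents containing every query word: intersect the postings."""
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


docs = {1: "fuzzy query on plates", 2: "gin index for fuzzy search", 3: "btree index"}
index = build_inverted_index(docs)
print(sorted(index["index"]))                 # [2, 3]
print(search_all(index, ["fuzzy", "index"]))  # {2}
```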

The following example shows how to use a GIN index:

create table gin_test_data(id int, chepai varchar(10), shenfenzheng varchar(20), duanxin text) distribute by hash (id);
create index chepai_idx on gin_test_data using gin(to_tsvector('ngram', chepai)) with (fastupdate=on);
The statements above create a GIN inverted index on the license plate column (chepai).

To run a fuzzy query on license plates, you can use the following statement:

select count(*) from gin_test_data where to_tsvector('ngram', chepai) @@ to_tsquery('ngram', '湘F');
The query plan for this statement is as follows:

test=# explain select count(*) from gin_test_data where to_tsvector('ngram', chepai) @@ to_tsquery('ngram', '湘F');
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 id |                   operation                    | E-rows | E-memory | E-width | E-costs
----+------------------------------------------------+--------+----------+---------+---------
  1 | ->  Aggregate                                  |      1 |          |       8 | 18.03
  2 |    ->  Streaming (type: GATHER)                |      1 |          |       8 | 18.03
  3 |       ->  Aggregate                            |      1 | 1MB      |       8 | 12.03
  4 |          ->  Bitmap Heap Scan on gin_test_data |      1 | 1MB      |       0 | 12.02
  5 |             ->  Bitmap Index Scan              |      1 | 1MB      |       0 | 8.00

 Predicate Information (identified by plan id)
-----------------------------------------------
   4 --Bitmap Heap Scan on gin_test_data
         Recheck Cond: (to_tsvector('ngram'::regconfig, (chepai)::text) @@ '''湘f'''::tsquery)
   5 --Bitmap Index Scan
         Index Cond: (to_tsvector('ngram'::regconfig, (chepai)::text) @@ '''湘f'''::tsquery)
The query uses the inverted index (Bitmap Index Scan), so it performs well.
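The plan first finds candidate rows through the index (Bitmap Index Scan) and then checks them again (Recheck Cond). The Python sketch below mimics that two-step shape for plate search with a 2-gram inverted index. Its assumptions, stated loudly:

* The 'ngram' parser is taken to split text into overlapping, lower-cased 2-character grams. This is consistent with the '湘f' lexeme in the plan but may not match every parser setting.
* The recheck is a plain substring test chosen for this sketch, not the exact semantics of tsquery matching.
* ngrams, build_gram_index and fuzzy_plate_search are hypothetical names.

```python
from collections import defaultdict


def ngrams(text: str, n: int = 2) -> set[str]:
    """Overlapping, lower-cased n-character grams (assumed to resemble the
    'ngram' parser with a gram size of 2)."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}


def build_gram_index(rows: dict[int, str], n: int = 2) -> dict[str, set[int]]:
    """Inverted index from each gram to the ids of rows whose plate contains it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, plate in rows.items():
        for gram in ngrams(plate, n):
            index[gram].add(row_id)
    return index


def fuzzy_plate_search(rows: dict[int, str], index: dict[str, set[int]],
                       term: str, n: int = 2) -> list[int]:
    term = term.lower()
    if len(term) < n:
        # Too short to produce a gram: the index cannot help, check every row.
        candidates = set(rows)
    else:
        # Index step (cf. Bitmap Index Scan): rows that contain every gram of the term.
        candidates = set.intersection(*(index.get(g, set()) for g in ngrams(term, n)))
    # Recheck step (cf. Recheck Cond): sharing all grams does not guarantee the
    # grams are adjacent, so confirm the substring against the row itself.
    return sorted(r for r in candidates if term in rows[r].lower())


rows = {1: "湘F12345", 2: "湘A88888", 3: "粤F12345", 4: "鲁F12F23"}
index = build_gram_index(rows)
print(fuzzy_plate_search(rows, index, "湘F"))   # [1]
print(fuzzy_plate_search(rows, index, "F123"))  # [1, 3]; row 4 has f1, 12 and 23 but not "f123"
```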
