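Elasticsearch: fuzzy string search from any position (returning only documents that contain the contiguous query string)
Standing on the shoulders of giants
ElasticSearch一看就懂之分词器edge_ngram和ngram的区别 ("ElasticSearch at a glance: the difference between the edge_ngram and ngram tokenizers")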
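Requirements
The name field (type text, English content only) of an ES index must support the following fuzzy searches, matching from any position in the string:
1. Single-letter fuzzy query
Existing data: "zhang san", "li si", "wang wu"
Example: input n
Returns: "zhang san", "wang wu"
2. String fuzzy query: when the query is a contiguous string, return only documents that contain that exact contiguous string
Existing data: "zhang san", "li si", "wang wu"
Example: input hang
Returns: "zhang san" (and not "wang wu")
3. A query containing a space is treated as two separate query terms, each matched fuzzily
Existing data: "zhang san", "li si", "wang wu"
Example: input "hang si"
Returns: "zhang san", "li si"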
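The three requirements can be restated as a small reference function. This is only a sketch of the expected behaviour in plain Python, not the Elasticsearch implementation; the name fuzzy_match is hypothetical and chosen for illustration.

```python
def fuzzy_match(names, query):
    """Return the names that contain at least one whitespace-separated
    query term as a contiguous substring (terms are OR-ed together)."""
    terms = query.split()
    return [name for name in names if any(term in name for term in terms)]


names = ["zhang san", "li si", "wang wu"]
print(fuzzy_match(names, "n"))        # requirement 1
print(fuzzy_match(names, "hang"))     # requirement 2
print(fuzzy_match(names, "hang si"))  # requirement 3
```

Because a query term never contains whitespace, a substring hit on the whole name is the same as a hit inside one of its words, which is what the index definition below relies on.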
Implementation
ES index definition [core]
PUT user-info
{
  "mappings": {
    "properties": {
      "id": {"type": "keyword"},
      "name": {
        "type": "text",
        "analyzer": "name_analyzer",
        // Split the query on whitespace only, which implements requirement 3
        "search_analyzer": "whitespace"
      }
    }
  },
  "settings": {
    // Must be >= max_gram minus min_gram of the ngram_filter below
    "max_ngram_diff": 50,
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          // Emit grams starting at every position, with lengths from 1 to 50.
          // Choose max_gram to suit your needs; longer grams increase ES storage cost.
          // For details see the reference article [ElasticSearch一看就懂之分词器edge_ngram和ngram的区别]
          "min_gram": "1",
          "max_gram": "50"
        }
      },
      "analyzer": {
        "name_analyzer": {
          "filter": [
            "asciifolding",
            "lowercase",
            // Emit grams from every position; this is the key to the fuzzy search
            "ngram_filter"
          ],
          // Split the original data on whitespace
          "tokenizer": "whitespace"
        }
      }
    }
  }
}
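To make the effect of name_analyzer concrete, the following is a minimal Python approximation of what it writes to the index: whitespace tokenization, ASCII folding, lowercasing, then every gram of length min_gram to max_gram. The function names are hypothetical, fold_ascii is only a rough stand-in for the asciifolding filter (it strips combining accents and nothing more), and the token order is illustrative rather than guaranteed to match Elasticsearch.

```python
import unicodedata


def ngrams(token, min_gram=1, max_gram=50):
    """All contiguous substrings of token whose length is in [min_gram, max_gram]."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, min(max_gram, len(token) - start) + 1)
    ]


def fold_ascii(text):
    """Rough approximation of asciifolding: drop combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_analyzer(text):
    """Approximate the index-time analysis chain defined above."""
    tokens = []
    for word in text.split():                       # whitespace tokenizer
        tokens.extend(ngrams(fold_ascii(word).lower()))
    return tokens


print(name_analyzer("li si"))
print(len(name_analyzer("zhang san")))
```

A word of n characters (with n no larger than max_gram) produces n(n+1)/2 grams, so "zhang san" already yields 15 + 6 = 21 index tokens. That growth is the storage cost the max_gram comment refers to.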
Seed data
POST user-info/_doc/1
{
  "id": 1,
  "name": "zhang san"
}
POST user-info/_doc/2
{
  "id": 2,
  "name": "li si"
}
POST user-info/_doc/3
{
  "id": 3,
  "name": "wang wu"
}
Confirm the seed data:
GET user-info/_search
...
"hits" : [
  {
    "_index" : "user-info",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "id" : 1,
      "name" : "zhang san"
    }
  },
  {
    "_index" : "user-info",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.0,
    "_source" : {
      "id" : 2,
      "name" : "li si"
    }
  },
  {
    "_index" : "user-info",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 1.0,
    "_source" : {
      "id" : 3,
      "name" : "wang wu"
    }
  }
]
...
Verification
Requirement 1
Single-letter fuzzy query
GET user-info/_search
{
  "query": {
    "match": {
      "name": "n"
    }
  }
}
Query result:
...
"hits" : [
  {
    "_index" : "user-info",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.84928787,
    "_source" : {
      "id" : 1,
      "name" : "zhang san"
    }
  },
  {
    "_index" : "user-info",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 0.72056305,
    "_source" : {
      "id" : 3,
      "name" : "wang wu"
    }
  }
]
...
Requirement 2
String fuzzy query: when the query is a contiguous string, return only documents that contain that exact contiguous string
GET user-info/_search
{
  "query": {
    "match": {
      "name": "hang"
    }
  }
}
Query result:
...
"hits" : [
  {
    "_index" : "user-info",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.5037103,
    "_source" : {
      "id" : 1,
      "name" : "zhang san"
    }
  }
]
...
Requirement 3
A query containing a space is treated as two separate query terms, each matched fuzzily
GET user-info/_search
{
  "query": {
    "match": {
      "name": "hang si"
    }
  }
}
Query result:
...
"hits" : [
  {
    "_index" : "user-info",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.5037103,
    "_source" : {
      "id" : 1,
      "name" : "zhang san"
    }
  },
  {
    "_index" : "user-info",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 1.5037103,
    "_source" : {
      "id" : 2,
      "name" : "li si"
    }
  }
]
...
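Why the three queries return exactly these documents can be traced with a toy inverted index. The sketch below is a simplified model, not how Elasticsearch is implemented: it ignores scoring and ASCII folding, assumes the match query's default behaviour of OR-ing its terms, and all function names are hypothetical. Index-time text is expanded into n-grams, while query text is only split on whitespace, so each query term must equal one stored gram exactly.

```python
from collections import defaultdict


def index_tokens(name, min_gram=1, max_gram=50):
    """Index-time analysis: whitespace split, lowercase, then all n-grams."""
    grams = set()
    for word in name.lower().split():
        for start in range(len(word)):
            last_end = min(start + max_gram, len(word))
            for end in range(start + min_gram, last_end + 1):
                grams.add(word[start:end])
    return grams


def build_index(docs):
    """Map every gram to the set of document ids that contain it."""
    inverted = defaultdict(set)
    for doc_id, name in docs.items():
        for gram in index_tokens(name):
            inverted[gram].add(doc_id)
    return inverted


def match(inverted, query):
    """Search-time analysis: whitespace split only; terms are OR-ed."""
    hits = set()
    for term in query.split():
        hits |= inverted.get(term, set())
    return sorted(hits)


docs = {"1": "zhang san", "2": "li si", "3": "wang wu"}
inverted = build_index(docs)
print(match(inverted, "n"))        # ['1', '3']
print(match(inverted, "hang"))     # ['1']
print(match(inverted, "hang si"))  # ['1', '2']
```

"wang wu" is excluded from the second query because "hang" is not one of the grams of "wang", even though the two words share the letters a, n and g. Note also that the whitespace search analyzer applies no lowercase filter, so in this model an uppercase query such as "Hang" finds nothing.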