How to Run Match Queries in Elasticsearch

Published: 2021-11-16 16:55:00 | Source: 亿速云 | Author: 柒染 | Category: Big Data

This article walks through how to run match queries in Elasticsearch, with analysis and worked examples, for readers looking for a simpler way to approach the problem.

If we index pairs of words instead of individual words, we retain much more of the context in which each word was used. This is what shingles are for.

Example sentence: Sue ate the alligator
unigrams: ["sue", "ate", "the", "alligator"]
bigrams: ["sue ate", "ate the", "the alligator"]
trigrams: ["sue ate the", "ate the alligator"]
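The lists above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `shingles` is hypothetical, and lowercasing plus whitespace splitting is a rough stand-in for the standard tokenizer and lowercase filter, not Elasticsearch's actual analysis chain.

```python
def shingles(text, size):
    """Return the word n-grams ("shingles") of the given size found in text."""
    # Lowercase and split on whitespace; an approximation of the
    # standard tokenizer followed by the lowercase filter.
    tokens = text.lower().split()
    # Slide a window of `size` tokens across the token list.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


sentence = "Sue ate the alligator"
for size, name in [(1, "unigrams"), (2, "bigrams"), (3, "trigrams")]:
    print(name, shingles(sentence, size))
```

A text with fewer tokens than the requested size simply yields an empty list.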

Note:
Trigrams give higher precision, but they also greatly increase the number of unique terms in the index. In most cases bigrams are enough.

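Fortunately, users tend to express their search intent with constructions similar to those found in the data they are searching.
One point is important, though: indexing bigrams alone is not enough. We still need unigrams, and we can treat bigram matches as a signal that boosts the relevance score.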

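Shingles have to be created at index time, as part of the analysis process.
We could index unigrams and bigrams into a single field, but it is cleaner to keep them in separate fields that can be queried independently.
The unigrams field forms the basis of the search, while the bigrams field is used to boost relevance.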

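Note:
Term matching
Shingles are useful only when the words in the user's query appear in the same order as in the original document.
Summary:
For phrase-style queries, Elasticsearch's default standard analyzer (which splits text into fine-grained tokens) works best, because it maximizes the chance that the terms produced at query time match the terms produced at index time.
Shingles are particularly well suited to cases where adjacent words belong together, such as personal names and place names.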

Data preparation

Create the index with its settings:
PUT /my_index
{
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "filter": {
                "my_shingle_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": false
                }
            },
            "analyzer": {
                "my_shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "my_shingle_filter"
                    ]
                }
            }
        }
    }
}

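The queries in the testing section rely on a `title.shingles` sub-field and on three sample documents that the listing above does not show. The sketch below reconstructs them as plain Python dictionaries and prints request bodies that could be pasted into a console. Everything here is an assumption inferred from the later queries and results: the mapping layout, the `text` field type, and the endpoint paths (which presume a cluster that still uses the `_doc` type) are not taken from the original article.

```python
import json

# Assumed mapping: "title" keeps the default analysis (unigrams), while the
# "shingles" sub-field uses the custom analyzer defined in the index settings.
mapping = {
    "properties": {
        "title": {
            "type": "text",
            "fields": {
                "shingles": {
                    "type": "text",
                    "analyzer": "my_shingle_analyzer",
                }
            },
        }
    }
}

# The three titles are copied from the search results shown later on.
documents = {
    "1": {"title": "Sue ate the alligator"},
    "2": {"title": "The alligator ate Sue"},
    "3": {"title": "Sue never goes anywhere without her alligator skin purse"},
}

# Print the request bodies; the paths are assumptions, adjust them to your version.
print("PUT /my_index/_mapping/_doc")
print(json.dumps(mapping, indent=2))
for doc_id, body in documents.items():
    print(f"PUT /my_index/_doc/{doc_id}")
    print(json.dumps(body))
```

Keeping the bigrams in a sub-field of `title` means each document is sent once and analyzed twice, which matches the separate-fields approach described earlier.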

Testing

1. match query

GET /my_index/_doc/_search
{
   "query": {
        "match": {
           "title": "the hungry alligator ate sue"
        }
   }
}

Query result:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.3721708,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.3721708, # Both documents contain the, alligator, and ate, so they get the same score.
        "_source" : {
          "title" : "Sue ate the alligator"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.3721708, # Both documents contain the, alligator, and ate, so they get the same score.
        "_source" : {
          "title" : "The alligator ate Sue"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.21526179, # Document 3 can be excluded by setting the minimum_should_match parameter; see "Controlling Precision".
        "_source" : {
          "title" : "Sue never goes anywhere without her alligator skin purse"
        }
      }
    ]
  }
}

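Analysis:
Documents 1 and 2 have the same relevance score because they contain the same query words; word order plays no part in a plain match query.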
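The result annotation mentions `minimum_should_match` as a way to exclude document 3. The sketch below shows what such a request body might look like and simulates its effect locally. The `75%` threshold is an illustrative assumption, and the term counting uses simple whitespace splitting rather than real Elasticsearch analysis.

```python
query_text = "the hungry alligator ate sue"

# Request body for the search; "75%" is an illustrative threshold.
search_body = {
    "query": {
        "match": {
            "title": {
                "query": query_text,
                "minimum_should_match": "75%",
            }
        }
    }
}

titles = {
    "1": "Sue ate the alligator",
    "2": "The alligator ate Sue",
    "3": "Sue never goes anywhere without her alligator skin purse",
}


def matching_terms(query, title):
    """Return the query terms that also occur in the title (case-insensitive)."""
    return set(query.lower().split()) & set(title.lower().split())


# Percentages are rounded down: 75% of 5 query terms means 3 must match.
required = len(query_text.split()) * 75 // 100

kept = [doc_id for doc_id, title in titles.items()
        if len(matching_terms(query_text, title)) >= required]
print(required, kept)
```

Documents 1 and 2 each contain four of the five query terms, while document 3 contains only sue and alligator, so it falls below the threshold in this simulation.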

2. match query with the title.shingles field

GET /my_index/_doc/_search
{
   "query": {
      "bool": {
         "must": {
            "match": {
               "title": "the hungry alligator ate sue"
            }
         },
         "should": {
            "match": {
               "title.shingles": "the hungry alligator ate sue"
            }
         }
      }
   }
}

Query result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 3.6694741,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 3.6694741,
        "_source" : {
          "title" : "The alligator ate Sue"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.3721708,
        "_source" : {
          "title" : "Sue ate the alligator"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.21526179,
        "_source" : {
          "title" : "Sue never goes anywhere without her alligator skin purse"
        }
      }
    ]
  }
}

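Analysis:
All three documents still match, but document 2 now ranks first because it also matches shingled terms from the query, such as ate sue. The scores of documents 1 and 3 are unchanged, since neither shares a bigram with the query.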
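To see why only document 2 gains score, it helps to compare the bigrams of the query with the bigrams of each title. The following is a rough local approximation, assuming whitespace tokenization and lowercasing in place of `my_shingle_analyzer`; the helper name `bigrams` is hypothetical.

```python
def bigrams(text):
    """Return the set of adjacent word pairs in text, lowercased."""
    tokens = text.lower().split()
    return {" ".join(pair) for pair in zip(tokens, tokens[1:])}


query = "the hungry alligator ate sue"
titles = {
    "1": "Sue ate the alligator",
    "2": "The alligator ate Sue",
    "3": "Sue never goes anywhere without her alligator skin purse",
}

query_bigrams = bigrams(query)
# For each document, list the bigrams it has in common with the query.
shared = {doc_id: sorted(query_bigrams & bigrams(title))
          for doc_id, title in titles.items()}

for doc_id, common in shared.items():
    print(doc_id, common)
```

Under these assumptions document 2 shares both alligator ate and ate sue with the query, while documents 1 and 3 share no bigrams at all, which is consistent with their scores staying the same in the second result.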

