How to Use the match_phrase Query in Elasticsearch

Published: 2021-11-17 13:41:25 | Views: 346 | Author: iii | Category: Big Data

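This article explains how the match_phrase query in Elasticsearch works. It builds a small test index, looks at how the text is tokenized, and then runs a few phrase queries to show when a document matches and when it does not.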

Data preparation

Create the index:
PUT test_phrase

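Set the index mapping (the typed _mapping/_doc path and the responses shown here are Elasticsearch 6.x style; 7.x and later drop the type name from the path):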
PUT /test_phrase/_mapping/_doc
{
    "properties": {
        "name": {
            "type":"text"
        }
    }
}
Result:
{
  "mapping": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text"
        }
      }
    }
  }
}

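Insert data (the search results below also contain a document with _id 1, whose insert request is not shown here):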
PUT test_phrase/_doc/2
{
  "name":"我爱北京天安门"
}

Query the data:
POST test_phrase/_search
{
  "query": {"match_all": {}}
}
Result:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_phrase",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "我爱北京天安门"
        }
      },
      {
        "_index" : "test_phrase",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "王乃康"
        }
      }
    ]
  }
}


Inspect the analyzed tokens:
POST test_phrase/_analyze
{
  "field": "name",   
  "text": "我爱北京天安门"
}
Result:
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "爱",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "北",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "京",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "天",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "安",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "门",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    }
  ]
}
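
The token list above shows one token per ideograph, with positions counting up from 0. As a rough illustration, that behaviour can be mimicked in a few lines of Python. This is only a sketch of what the output above shows for this text: the function name analyze is made up for the example, and the real analyzer does far more (for instance, it handles Latin words as whole tokens).

```python
def analyze(text):
    """Model the per-ideograph tokenization seen in the _analyze output.

    Simplified assumption: every character becomes its own token and the
    position equals the character index. This is not the real analyzer.
    """
    tokens = []
    for position, char in enumerate(text):
        tokens.append({
            "token": char,
            "start_offset": position,
            "end_offset": position + 1,
            "type": "<IDEOGRAPHIC>",
            "position": position,
        })
    return tokens


tokens = analyze("我爱北京天安门")
# Print each token with its position, e.g. ('北', 2)
print([(t["token"], t["position"]) for t in tokens])
```

The positions are what match_phrase relies on in the tests that follow.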

Testing

1. Keyword "我"

POST test_phrase/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "我"
      }
    }
  }
}

Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "test_phrase",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "我爱北京天安门"
        }
      }
    ]
  }
}

Analysis:
POST test_phrase/_analyze
{
  "field": "name",   
  "text": "我"
}

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    }
  ]
}
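The query analyzes to a single token, "我", at position 0. The indexed tokens of the document "我爱北京天安门" include "我", and a one-term phrase has no ordering or adjacency constraint to violate, so the phrase requirement is met and the document is returned.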

2. Keyword "我爱"

POST test_phrase/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "我爱"
      }
    }
  }
}

Result:
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "test_phrase",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.5753642,
        "_source" : {
          "name" : "我爱北京天安门"
        }
      }
    ]
  }
}

Analysis:
POST test_phrase/_analyze
{
  "field": "name",   
  "text": "我爱"
}

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "爱",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    }
  ]
}
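The query "我爱" analyzes to "我" at position 0 and "爱" at position 1. Both terms exist among the indexed tokens, and in the document they also sit at adjacent positions in the same order ("我" at 0, "爱" at 1). That satisfies the phrase requirement, so the document is returned.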
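
To make the position check concrete, here is a minimal Python sketch of exact phrase matching over token positions. It is a simplified model, not Elasticsearch's implementation: the helper names build_positions and phrase_matches are invented for the example, and tokens are assumed to be single characters as in the _analyze output. Note that only the relative positions matter, so a phrase does not have to start at position 0 of the document.

```python
def build_positions(tokens):
    """Map each term to the list of positions where it occurs."""
    index = {}
    for position, term in enumerate(tokens):
        index.setdefault(term, []).append(position)
    return index


def phrase_matches(query_tokens, doc_tokens):
    """Return True if the query terms occur consecutively and in order."""
    if not query_tokens:
        return False
    index = build_positions(doc_tokens)
    # Try every occurrence of the first query term as the phrase start.
    for start in index.get(query_tokens[0], []):
        if all(start + offset in index.get(term, [])
               for offset, term in enumerate(query_tokens)):
            return True
    return False


doc = list("我爱北京天安门")  # one token per character (assumption)
print(phrase_matches(list("我"), doc))    # True
print(phrase_matches(list("我爱"), doc))  # True
print(phrase_matches(list("北京"), doc))  # True: relative positions line up
```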

3. Keyword "我北"

POST test_phrase/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "我北"
      }
    }
  }
}

Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Analysis:
POST test_phrase/_analyze
{
  "field": "name",   
  "text": "我北"
}

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "北",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    }
  ]
}

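The query puts "我" at position 0 and "北" at position 1, so the two terms are expected to be adjacent. In the index, "我" is at position 0 but "北" is at position 2. Both query terms exist among the indexed tokens, but their relative positions do not satisfy the phrase requirement, so the search returns no hits.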

Fix: set "slop": 1, which lets the terms sit one position further apart than they do in the query:
POST test_phrase/_search
{
  "query": {
    "match_phrase": {
      "name": {
        "query": "我北",
        "slop": 1
      }
    }
  }
}
Result:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.37229446,
    "hits" : [
      {
        "_index" : "test_phrase",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.37229446,
        "_source" : {
          "name" : "我爱北京天安门"
        }
      }
    ]
  }
}
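
The effect of slop can be sketched in Python as well. The model below is an assumption-laden simplification: for each query term it subtracts the term's query position from each of its document positions, then looks for one occurrence per term such that the spread of those shifted values is at most slop. The function names are hypothetical, the search is brute force, and queries that repeat the same term are not handled properly.

```python
from itertools import product


def min_phrase_width(query_tokens, doc_tokens):
    """Smallest spread of (doc position - query position) over all choices.

    Returns None if the query is empty or a query term is missing
    from the document.
    """
    if not query_tokens:
        return None
    adjusted = []
    for offset, term in enumerate(query_tokens):
        shifted = [pos - offset
                   for pos, tok in enumerate(doc_tokens) if tok == term]
        if not shifted:
            return None
        adjusted.append(shifted)
    # Brute force: pick one occurrence per term; fine for short examples.
    return min(max(combo) - min(combo) for combo in product(*adjusted))


def sloppy_phrase_matches(query_tokens, doc_tokens, slop=0):
    """Return True if the terms can be aligned within the allowed slop."""
    width = min_phrase_width(query_tokens, doc_tokens)
    return width is not None and width <= slop


doc = list("我爱北京天安门")  # one token per character (assumption)
print(sloppy_phrase_matches(list("我北"), doc, slop=0))  # False
print(sloppy_phrase_matches(list("我北"), doc, slop=1))  # True
print(min_phrase_width(list("我京"), doc))               # 2
```

In this model an exact phrase has width 0, "我北" has width 1, and two swapped neighbours such as "爱我" have width 2, so they would need a slop of 2.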

Supplement

1. Using proximity to improve relevance

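We can use a simple match query as the must clause. This query decides which documents are included in the result set, and the minimum_should_match parameter can be used to trim the long tail. We can then add more specific queries as should clauses. Each one that matches increases the relevance score of the matching document.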

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {   # the must clause includes or excludes documents from the result set
          "title": {
            "query":                "quick brown fox",
            "minimum_should_match": "30%"
          }
        }
      },
      "should": {
        "match_phrase": {   # the should clause raises the relevance score of documents that match it
          "title": {
            "query": "quick brown fox",
            "slop":  50
          }
        }
      }
    }
  }
}
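
When the same pattern is needed for several fields or search strings, the request body can be assembled programmatically. The following is a small sketch in Python that only builds the JSON body shown above. The function name proximity_boosted_query and its parameter names are assumptions for this example, and sending the body to a cluster is left out.

```python
import json


def proximity_boosted_query(field, text, minimum_should_match="30%", slop=50):
    """Build a bool query: match decides inclusion, match_phrase boosts."""
    return {
        "query": {
            "bool": {
                # must: controls which documents enter the result set
                "must": {
                    "match": {
                        field: {
                            "query": text,
                            "minimum_should_match": minimum_should_match,
                        }
                    }
                },
                # should: documents whose terms are close together score higher
                "should": {
                    "match_phrase": {
                        field: {"query": text, "slop": slop}
                    }
                },
            }
        }
    }


body = proximity_boosted_query("title", "quick brown fox")
print(json.dumps(body, indent=2))
```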
