这篇文章主要介绍“Solr4.7的synonyms怎么配置”,在日常操作中,相信很多人在Solr4.7的synonyms怎么配置问题上存在疑惑,小编查阅了各式资料,整理出简单好用的操作方法,希望对大家解答”Solr4.7的synonyms怎么配置”的疑惑有所帮助!接下来,请跟着小编一起来学习吧!
在搜索中,往往需要用到关联词(近义词),比如,搜索 “联想” 品牌那么我们同时搜索 “lenovo”等,solr为我们提供了近义词过滤器solr.SynonymFilterFactory。
配置搜索近义词很简单,只要在schema字段定义过滤器
在schema.xml的<types>标签中添加<fieldType>,如下:
<!-- IK中文分词器,停用词,同义词配置 --> <fieldType name="text_ik" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" /> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" /> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" /> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> </fieldType>
solr.SynonymFilterFactory配置中,synonyms是近义词配置文件
ignoreCase:为true,表示转化为小写匹配,及忽略大小写。
expand:涉及到synonyms.txt的配置
synonyms.txt配置一行为单位,建立关键词联系
# The ASF licenses this file to You under the Apache License, Version 2.0 # (the "License"); you may not use this file except in compliance with # the License. You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. #----------------------------------------------------------------------- #some test synonym mappings unlikely to appear in real input text aaafoo => aaabar bbbfoo => bbbfoo bbbbar cccfoo => cccbar cccbaz fooaaa,baraaa,bazaaa # Some synonym groups specific to this example GB,gib,gigabyte,gigabytes MB,mib,megabyte,megabytes Television, Televisions, TV, TVs #notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming #after us won't split it into two words. 中国,英国,日本 # Synonym mappings can be used for spelling correction too pixima => pixma
就是说=>指一对一,以逗号分隔的是组群,也就是多对多。
当然这个还得定义相关字段为这个类型,如下。
<field name="msg_title" type="text_ik" indexed="true" stored="true"/>
到此,关于“Solr4.7的synonyms怎么配置”的学习就结束了,希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习,快去试试吧!若想继续学习更多相关知识,请继续关注亿速云网站,小编会继续努力为大家带来更多实用的文章!
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。