通过使用同义词,您可以将已经上传的同义词文件作用于阿里云Elasticsearch的同义词库,并使用更新后的词库搜索。阿里云Elasticsearch支持两种方式使用同义词:上传同义词文件、直接引用同义词。本文分别介绍两种方式的使用示例。
背景信息
本文中的命令,均可在Kibana控制台中执行。登录Kibana控制台的方法,请参见登录Kibana控制台。方式一:上传同义词文件
前提条件:已上传同义词文件。具体操作,请参见上传同义词文件进行上传。
以下示例使用filter过滤器配置同义词,使用aliyun_synonyms.txt作为测试文件,内容为begin, start
。
- 创建索引。
PUT /aliyun-index-test { "settings": { "index":{ "analysis": { "analyzer": { "by_smart": { "type": "custom", "tokenizer": "ik_smart", "filter": ["by_tfr","by_sfr"], "char_filter": ["by_cfr"] }, "by_max_word": { "type": "custom", "tokenizer": "ik_max_word", "filter": ["by_tfr","by_sfr"], "char_filter": ["by_cfr"] } }, "filter": { "by_tfr": { "type": "stop", "stopwords": [" "] }, "by_sfr": { "type": "synonym", "synonyms_path": "analysis/aliyun_synonyms.txt" } }, "char_filter": { "by_cfr": { "type": "mapping", "mappings": ["| => |"] } } } } } }
- 配置同义词字段title。
- Elasticsearch 7.0以下版本示例
PUT /aliyun-index-test/_mapping/doc { "properties": { "title": { "type": "text", "analyzer": "by_max_word", "search_analyzer": "by_smart" } } }
- Elasticsearch 7.0及以上版本示例
PUT /aliyun-index-test/_mapping/ { "properties": { "title": { "type": "text", "analyzer": "by_max_word", "search_analyzer": "by_smart" } } }
重要 官方Elasticsearch从7.0版本开始,移除了类型(type)的概念,默认使用_doc
代替。因此在设置索引mapping时无需指定type,否则会报错。
- Elasticsearch 7.0以下版本示例
- 校验同义词。
GET /aliyun-index-test/_analyze { "analyzer": "by_smart", "text":"begin" }
执行成功后,返回如下结果。{ "tokens": [ { "token": "begin", "start_offset": 0, "end_offset": 5, "type": "ENGLISH", "position": 0 }, { "token": "start", "start_offset": 0, "end_offset": 5, "type": "SYNONYM", "position": 0 } ] }
- 添加数据,进行下一步测试。
- Elasticsearch 7.0以下版本示例
PUT /aliyun-index-test/doc/1 { "title": "Shall I begin?" }
PUT /aliyun-index-test/doc/2 { "title": "I start work at nine." }
- Elasticsearch 7.0及以上版本示例
PUT /aliyun-index-test/_doc/1 { "title": "Shall I begin?" }
PUT /aliyun-index-test/_doc/2 { "title": "I start work at nine." }
- Elasticsearch 7.0以下版本示例
- 通过搜索测试,校验同义词。
GET /aliyun-index-test/_search { "query" : { "match" : { "title" : "begin" }}, "highlight" : { "pre_tags" : ["<red>", "<bule>"], "post_tags" : ["</red>", "</bule>"], "fields" : { "title" : {} } } }
执行成功后,返回如下结果。{ "took": 11, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 2, "max_score": 0.41048482, "hits": [ { "_index": "aliyun-index-test", "_type": "doc", "_id": "2", "_score": 0.41048482, "_source": { "title": "I start work at nine." }, "highlight": { "title": [ "I <red>start</red> work at nine." ] } }, { "_index": "aliyun-index-test", "_type": "doc", "_id": "1", "_score": 0.39556286, "_source": { "title": "Shall I begin?" }, "highlight": { "title": [ "Shall I <red>begin</red>?" ] } } ] } }
方式二:直接引用同义词
以下示例直接引用同义词,并使用IK词典进行分词。
- 创建索引。
PUT /my_index { "settings": { "analysis": { "analyzer": { "my_synonyms": { "filter": [ "lowercase", "my_synonym_filter" ], "tokenizer": "ik_smart" } }, "filter": { "my_synonym_filter": { "synonyms": [ "begin,start" ], "type": "synonym" } } } } }
以上命令的原理为:- 设置一个同义词过滤器my_synonym_filter,并配置同义词词库。
- 设置一个my_synonyms解释器,使用ik_smart分词。
- 经过ik_smart分词,把所有字母小写,并作为同义词处理。
- 配置同义词字段title。
- Elasticsearch 7.0以下版本示例
PUT /my_index/_mapping/doc { "properties": { "title": { "type": "text", "analyzer": "my_synonyms" } } }
- Elasticsearch 7.0及以上版本示例
PUT /my_index/_mapping/ { "properties": { "title": { "type": "text", "analyzer": "my_synonyms" } } }
重要 官方Elasticsearch从7.0版本开始,移除了类型(type)的概念,默认使用_doc
代替,所以在设置索引mapping时无需指定type,否则会报错。
- Elasticsearch 7.0以下版本示例
- 校验同义词。
GET /my_index/_analyze { "analyzer":"my_synonyms", "text":"Shall I begin?" }
执行成功后,返回如下结果。{ "tokens": [ { "token": "shall", "start_offset": 0, "end_offset": 5, "type": "ENGLISH", "position": 0 }, { "token": "i", "start_offset": 6, "end_offset": 7, "type": "ENGLISH", "position": 1 }, { "token": "begin", "start_offset": 8, "end_offset": 13, "type": "ENGLISH", "position": 2 }, { "token": "start", "start_offset": 8, "end_offset": 13, "type": "SYNONYM", "position": 2 } ] }
- 添加数据,进行下一步测试。
- Elasticsearch 7.0以下版本示例
PUT /my_index/doc/1 { "title": "Shall I begin?" }
PUT /my_index/doc/2 { "title": "I start work at nine." }
- Elasticsearch 7.0及以上版本示例
PUT /my_index/_doc/1 { "title": "Shall I begin?" }
PUT /my_index/_doc/2 { "title": "I start work at nine." }
- Elasticsearch 7.0以下版本示例
- 通过搜索测试,校验同义词。
GET /my_index/_search { "query" : { "match" : { "title" : "begin" }}, "highlight" : { "pre_tags" : ["<red>", "<bule>"], "post_tags" : ["</red>", "</bule>"], "fields" : { "title" : {} } } }
执行成功后,返回如下结果。{ "took": 11, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 2, "max_score": 0.41913947, "hits": [ { "_index": "my_index", "_type": "doc", "_id": "2", "_score": 0.41913947, "_source": { "title": "I start work at nine." }, "highlight": { "title": [ "I <red>start</red> work at nine." ] } }, { "_index": "my_index", "_type": "doc", "_id": "1", "_score": 0.39556286, "_source": { "title": "Shall I begin?" }, "highlight": { "title": [ "Shall I <red>begin</red>?" ] } } ] } }