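By configuring synonyms, you can apply an uploaded synonym file to Elasticsearch and search with the updated dictionary. Elasticsearch supports two ways to use synonyms: uploading a synonym file and referencing synonyms directly. This topic provides an example of each method.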
Background information
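All commands in this topic can be run in the Kibana console. For information about how to log on to the Kibana console, see Log on to the Kibana console.
Method 1: Upload a synonym file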
Prerequisite: A synonym file is uploaded. For more information, see Upload a synonym file.
The following example configures synonyms by using a filter. It uses aliyun_synonyms.txt as the test file, which contains the single line begin, start.
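The test file uses the comma-separated form of the Solr synonym format, in which all terms on a line are treated as equivalent. As a rough illustration of how such a line can be read, the following Python sketch splits one line into its terms. The function name `parse_synonym_line` is hypothetical, and this is not the parser that Elasticsearch itself uses. The sketch also accepts the `=>` form of the same format, which this topic does not use.

```python
def parse_synonym_line(line):
    """Parse one line of a Solr-style synonym file.

    Returns (inputs, outputs), or None for blank lines and comments.
    For the comma-separated form, inputs and outputs are the same terms.
    """
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    if "=>" in text:
        # Explicit mapping: terms on the left are replaced by terms on the right.
        left, right = text.split("=>", 1)
        inputs = [t.strip() for t in left.split(",") if t.strip()]
        outputs = [t.strip() for t in right.split(",") if t.strip()]
        return inputs, outputs
    # Equivalent synonyms: every term maps to every term in the group.
    terms = [t.strip() for t in text.split(",") if t.strip()]
    return terms, terms


print(parse_synonym_line("begin, start"))
# (['begin', 'start'], ['begin', 'start'])
```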
- Create an index.
  PUT /aliyun-index-test
  {
    "settings": {
      "index": {
        "analysis": {
          "analyzer": {
            "by_smart": {
              "type": "custom",
              "tokenizer": "ik_smart",
              "filter": ["by_tfr", "by_sfr"],
              "char_filter": ["by_cfr"]
            },
            "by_max_word": {
              "type": "custom",
              "tokenizer": "ik_max_word",
              "filter": ["by_tfr", "by_sfr"],
              "char_filter": ["by_cfr"]
            }
          },
          "filter": {
            "by_tfr": { "type": "stop", "stopwords": [" "] },
            "by_sfr": { "type": "synonym", "synonyms_path": "analysis/aliyun_synonyms.txt" }
          },
          "char_filter": {
            "by_cfr": { "type": "mapping", "mappings": ["| => |"] }
          }
        }
      }
    }
  }
- Configure the title field to use synonyms.
  - Example for versions earlier than Elasticsearch 7.0
    PUT /aliyun-index-test/_mapping/doc
    {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "by_max_word",
          "search_analyzer": "by_smart"
        }
      }
    }
  - Example for Elasticsearch 7.0 and later
    PUT /aliyun-index-test/_mapping/
    {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "by_max_word",
          "search_analyzer": "by_smart"
        }
      }
    }
    Important: Starting from version 7.0, open source Elasticsearch removes the concept of types and uses _doc by default instead. Therefore, do not specify a type when you configure the index mapping. Otherwise, an error is reported.
- Verify the synonyms.
  GET /aliyun-index-test/_analyze
  {
    "analyzer": "by_smart",
    "text": "begin"
  }
  If the command succeeds, the following result is returned:
  {
    "tokens": [
      { "token": "begin", "start_offset": 0, "end_offset": 5, "type": "ENGLISH", "position": 0 },
      { "token": "start", "start_offset": 0, "end_offset": 5, "type": "SYNONYM", "position": 0 }
    ]
  }
- Add data for further testing.
  - Example for versions earlier than Elasticsearch 7.0
    PUT /aliyun-index-test/doc/1
    { "title": "Shall I begin?" }

    PUT /aliyun-index-test/doc/2
    { "title": "I start work at nine." }
  - Example for Elasticsearch 7.0 and later
    PUT /aliyun-index-test/_doc/1
    { "title": "Shall I begin?" }

    PUT /aliyun-index-test/_doc/2
    { "title": "I start work at nine." }
- Run a search to verify the synonyms.
  GET /aliyun-index-test/_search
  {
    "query": { "match": { "title": "begin" } },
    "highlight": {
      "pre_tags": ["<red>", "<blue>"],
      "post_tags": ["</red>", "</blue>"],
      "fields": { "title": {} }
    }
  }
  If the command succeeds, the following result is returned. The sample output comes from a version earlier than 7.0. In 7.0 and later, _type is _doc and hits.total is returned as an object.
  {
    "took": 11,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "failed": 0 },
    "hits": {
      "total": 2,
      "max_score": 0.41048482,
      "hits": [
        {
          "_index": "aliyun-index-test",
          "_type": "doc",
          "_id": "2",
          "_score": 0.41048482,
          "_source": { "title": "I start work at nine." },
          "highlight": { "title": ["I <red>start</red> work at nine."] }
        },
        {
          "_index": "aliyun-index-test",
          "_type": "doc",
          "_id": "1",
          "_score": 0.39556286,
          "_source": { "title": "Shall I begin?" },
          "highlight": { "title": ["Shall I <red>begin</red>?"] }
        }
      ]
    }
  }
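The version-specific steps above differ only in the request path: versions earlier than 7.0 include a type name such as doc, whereas 7.0 and later omit it or use _doc. A minimal Python sketch of that rule follows. The helper names `mapping_path` and `document_path` are hypothetical and exist only for illustration; they build path strings and do not send any request.

```python
def mapping_path(index, major_version, type_name="doc"):
    """Return the mapping endpoint for the given Elasticsearch major version."""
    if major_version >= 7:
        # 7.0 and later: no type in the mapping path.
        return f"/{index}/_mapping"
    return f"/{index}/_mapping/{type_name}"


def document_path(index, doc_id, major_version, type_name="doc"):
    """Return the document endpoint for the given Elasticsearch major version."""
    if major_version >= 7:
        # 7.0 and later: _doc takes the place of the type name.
        return f"/{index}/_doc/{doc_id}"
    return f"/{index}/{type_name}/{doc_id}"


for version in (6, 7):
    print(version,
          mapping_path("aliyun-index-test", version),
          document_path("aliyun-index-test", 1, version))
# 6 /aliyun-index-test/_mapping/doc /aliyun-index-test/doc/1
# 7 /aliyun-index-test/_mapping /aliyun-index-test/_doc/1
```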
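Method 2: Reference synonyms directly
The following example defines the synonyms directly in the index settings and uses the IK dictionary for tokenization.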
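- Create an index.
  PUT /my_index
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "my_synonyms": {
            "filter": ["lowercase", "my_synonym_filter"],
            "tokenizer": "ik_smart"
          }
        },
        "filter": {
          "my_synonym_filter": {
            "synonyms": ["begin,start"],
            "type": "synonym"
          }
        }
      }
    }
  }
  The preceding command works as follows:
  - It defines a synonym filter named my_synonym_filter and configures its synonym list.
  - It defines an analyzer named my_synonyms that uses the ik_smart tokenizer.
  - After ik_smart tokenizes the text, the analyzer converts all letters to lowercase and then applies the synonyms.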
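- Configure the title field to use synonyms.
  - Example for versions earlier than Elasticsearch 7.0
    PUT /my_index/_mapping/doc
    {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_synonyms"
        }
      }
    }
  - Example for Elasticsearch 7.0 and later
    PUT /my_index/_mapping/
    {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_synonyms"
        }
      }
    }
    Important: Starting from version 7.0, open source Elasticsearch removes the concept of types and uses _doc by default instead. Therefore, do not specify a type when you configure the index mapping. Otherwise, an error is reported.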
- Verify the synonyms.
  GET /my_index/_analyze
  {
    "analyzer": "my_synonyms",
    "text": "Shall I begin?"
  }
  If the command succeeds, the following result is returned:
  {
    "tokens": [
      { "token": "shall", "start_offset": 0, "end_offset": 5, "type": "ENGLISH", "position": 0 },
      { "token": "i", "start_offset": 6, "end_offset": 7, "type": "ENGLISH", "position": 1 },
      { "token": "begin", "start_offset": 8, "end_offset": 13, "type": "ENGLISH", "position": 2 },
      { "token": "start", "start_offset": 8, "end_offset": 13, "type": "SYNONYM", "position": 2 }
    ]
  }
- Add data for further testing.
  - Example for versions earlier than Elasticsearch 7.0
    PUT /my_index/doc/1
    { "title": "Shall I begin?" }

    PUT /my_index/doc/2
    { "title": "I start work at nine." }
  - Example for Elasticsearch 7.0 and later
    PUT /my_index/_doc/1
    { "title": "Shall I begin?" }

    PUT /my_index/_doc/2
    { "title": "I start work at nine." }
- Run a search to verify the synonyms.
  GET /my_index/_search
  {
    "query": { "match": { "title": "begin" } },
    "highlight": {
      "pre_tags": ["<red>", "<blue>"],
      "post_tags": ["</red>", "</blue>"],
      "fields": { "title": {} }
    }
  }
  If the command succeeds, the following result is returned. The sample output comes from a version earlier than 7.0, as the _type value doc indicates.
  {
    "took": 11,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "failed": 0 },
    "hits": {
      "total": 2,
      "max_score": 0.41913947,
      "hits": [
        {
          "_index": "my_index",
          "_type": "doc",
          "_id": "2",
          "_score": 0.41913947,
          "_source": { "title": "I start work at nine." },
          "highlight": { "title": ["I <red>start</red> work at nine."] }
        },
        {
          "_index": "my_index",
          "_type": "doc",
          "_id": "1",
          "_score": 0.39556286,
          "_source": { "title": "Shall I begin?" },
          "highlight": { "title": ["Shall I <red>begin</red>?"] }
        }
      ]
    }
  }
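To see why a search for begin also returns the document that contains start, the analysis chain of Method 2 can be approximated in plain Python. This is a rough sketch only, not how Elasticsearch is implemented: a regular expression stands in for the ik_smart tokenizer (an assumption that is adequate for simple English text but not for Chinese), and the names `SYNONYMS`, `analyze`, and `matches` are hypothetical.

```python
import re

# Equivalent synonyms from the example setting "begin,start".
SYNONYMS = {"begin": ["start"], "start": ["begin"]}


def analyze(text):
    """Approximate the my_synonyms analyzer for simple English text."""
    tokens = []
    # Assumption: a regex word tokenizer stands in for ik_smart.
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        term = match.group().lower()  # the "lowercase" filter
        span = {
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        }
        tokens.append({"token": term, **span})
        # The synonym filter emits each synonym at the same position and offsets.
        for synonym in SYNONYMS.get(term, []):
            tokens.append({"token": synonym, **span})
    return tokens


def matches(query, document):
    """Return True if any analyzed query term occurs in the analyzed document."""
    query_terms = {t["token"] for t in analyze(query)}
    document_terms = {t["token"] for t in analyze(document)}
    return bool(query_terms & document_terms)


print([(t["token"], t["position"]) for t in analyze("Shall I begin?")])
# [('shall', 0), ('i', 1), ('begin', 2), ('start', 2)]
print(matches("begin", "I start work at nine."))
# True
```

The token list has the same tokens, offsets, and positions as the _analyze output shown earlier: start shares position 2 with begin, which is why both documents are returned and highlighted.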