Configure synonyms - Elasticsearch

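With synonyms, you can apply an uploaded synonym file to Elasticsearch and search with the updated dictionary. Elasticsearch supports two ways to use synonyms: uploading a synonym file, and referencing synonyms directly in the index settings. This topic gives an example of each.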

Background information

All commands in this topic can be run in the Kibana console. For how to log on to the Kibana console, see Log on to the Kibana console.

Method 1: Upload a synonym file

Prerequisite: a synonym file has been uploaded. For details, see Upload a synonym file.

The following example configures synonyms with a token filter and uses aliyun_synonyms.txt as the test file. The file contains a single line: begin, start.
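
To make the file format concrete, here is a minimal local sketch. It assumes the file uses the comma-separated (Solr-style) format, where each line lists equivalent terms and lines starting with # are comments. The function name parse_synonym_lines is hypothetical, and the code only inspects text locally; it does not call Elasticsearch.

```python
def parse_synonym_lines(text):
    """Parse comma-separated synonym lines into lists of equivalent terms.

    Assumption: each non-empty, non-comment line lists equivalent terms
    separated by commas, as in "begin, start". The "a => b" mapping form
    is not handled by this sketch.
    """
    groups = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = [t.strip() for t in line.split(",") if t.strip()]
        if len(terms) < 2:
            raise ValueError(f"line needs at least two terms: {raw!r}")
        groups.append(terms)
    return groups


sample = "begin, start\n"
print(parse_synonym_lines(sample))  # [['begin', 'start']]
```

Running a check like this before uploading can catch lines that contain only one term.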

Create an index.

PUT /aliyun-index-test
{
  "settings": {
    "index":{
      "analysis": {
          "analyzer": {
            "by_smart": {
              "type": "custom",
              "tokenizer": "ik_smart",
              "filter": ["by_tfr","by_sfr"],
              "char_filter": ["by_cfr"]
            },
            "by_max_word": {
              "type": "custom",
              "tokenizer": "ik_max_word",
              "filter": ["by_tfr","by_sfr"],
              "char_filter": ["by_cfr"]
            }
         },
         "filter": {
            "by_tfr": {
              "type": "stop",
              "stopwords": [" "]
              },
           "by_sfr": {
              "type": "synonym",
              "synonyms_path": "analysis/aliyun_synonyms.txt"
              }
          },
          "char_filter": {
            "by_cfr": {
              "type": "mapping",
              "mappings": ["| => |"]
            }
          }
      }
    }
  }
}
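
In the settings above, each analyzer refers to filters and character filters by name, and those names must be defined in the same analysis block. The following is a minimal sketch of a local consistency check, not part of any Elasticsearch API. The function name missing_definitions and the builtins parameter are hypothetical.

```python
def missing_definitions(analysis, builtins=frozenset()):
    """Return names that analyzers reference but that are defined nowhere.

    `analysis` is the dict found under settings.index.analysis.
    `builtins` lists names the caller assumes Elasticsearch provides
    itself (for example "lowercase"); it is empty by default.
    """
    missing = set()
    for analyzer in analysis.get("analyzer", {}).values():
        for section in ("filter", "char_filter"):
            defined = analysis.get(section, {})
            for name in analyzer.get(section, []):
                if name not in defined and name not in builtins:
                    missing.add(f"{section}:{name}")
    return sorted(missing)


# The analysis block from the request above, written as a Python dict.
analysis = {
    "analyzer": {
        "by_smart": {"type": "custom", "tokenizer": "ik_smart",
                     "filter": ["by_tfr", "by_sfr"],
                     "char_filter": ["by_cfr"]},
        "by_max_word": {"type": "custom", "tokenizer": "ik_max_word",
                        "filter": ["by_tfr", "by_sfr"],
                        "char_filter": ["by_cfr"]},
    },
    "filter": {
        "by_tfr": {"type": "stop", "stopwords": [" "]},
        "by_sfr": {"type": "synonym",
                   "synonyms_path": "analysis/aliyun_synonyms.txt"},
    },
    "char_filter": {
        "by_cfr": {"type": "mapping", "mappings": ["| => |"]},
    },
}
print(missing_definitions(analysis))  # an empty list means every name resolves
```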

Configure the analyzers for the title field so that it uses synonyms.

Example for Elasticsearch versions earlier than 7.0

PUT /aliyun-index-test/_mapping/doc
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "by_max_word",
   "search_analyzer": "by_smart"
 }
}
}

Example for Elasticsearch 7.0 and later
```
PUT /aliyun-index-test/_mapping/
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "by_max_word",
   "search_analyzer": "by_smart"
 }
}
}
```
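Important: Starting with version 7.0, official Elasticsearch removed the concept of types and uses _doc by default. Do not specify a type when you configure the index mapping; otherwise, an error is returned.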

Verify the synonyms.

GET /aliyun-index-test/_analyze
{
"analyzer": "by_smart",
"text":"begin"
}

If the command succeeds, the following result is returned.

{
"tokens": [
 {
   "token": "begin",
   "start_offset": 0,
   "end_offset": 5,
   "type": "ENGLISH",
   "position": 0
 },
 {
   "token": "start",
   "start_offset": 0,
   "end_offset": 5,
   "type": "SYNONYM",
   "position": 0
 }
]
}
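
The result shows that the synonym works because start is emitted at the same position as begin. Here is a minimal sketch of how that check could be automated once the response is available as a Python dict. The helper name tokens_by_position is hypothetical, and the response is pasted from the result above rather than fetched from a cluster.

```python
def tokens_by_position(analyze_response):
    """Group token texts by their position in an _analyze response."""
    positions = {}
    for tok in analyze_response["tokens"]:
        positions.setdefault(tok["position"], []).append(tok["token"])
    return positions


# Response copied from the _analyze result above.
response = {
    "tokens": [
        {"token": "begin", "start_offset": 0, "end_offset": 5,
         "type": "ENGLISH", "position": 0},
        {"token": "start", "start_offset": 0, "end_offset": 5,
         "type": "SYNONYM", "position": 0},
    ]
}
print(tokens_by_position(response))  # {0: ['begin', 'start']}
```

Two tokens that share a position are treated as alternatives for the same word, which is why a query for either term can match.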

Add data for the next test.

Example for Elasticsearch versions earlier than 7.0

PUT /aliyun-index-test/doc/1
{
"title": "Shall I begin?"
}

PUT /aliyun-index-test/doc/2
{
"title": "I start work at nine."
}

Example for Elasticsearch 7.0 and later

PUT /aliyun-index-test/_doc/1
{
"title": "Shall I begin?"
}

PUT /aliyun-index-test/_doc/2
{
"title": "I start work at nine."
}
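
The mapping step and the indexing step both differ only in the request path, depending on the major version. The following sketch captures that rule in two small helpers. The function names are hypothetical, and the default type name doc is taken from the examples in this topic.

```python
def mapping_path(index, major_version, type_name="doc"):
    """Return the mapping endpoint path for the given major version."""
    if major_version >= 7:
        return f"/{index}/_mapping"           # no type from 7.0 onward
    return f"/{index}/_mapping/{type_name}"   # type required before 7.0


def document_path(index, doc_id, major_version, type_name="doc"):
    """Return the path used to index a single document by ID."""
    if major_version >= 7:
        return f"/{index}/_doc/{doc_id}"
    return f"/{index}/{type_name}/{doc_id}"


print(mapping_path("aliyun-index-test", 6))      # /aliyun-index-test/_mapping/doc
print(document_path("aliyun-index-test", 1, 7))  # /aliyun-index-test/_doc/1
```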

Run a search to verify the synonyms.

GET /aliyun-index-test/_search
{
 "query" : { "match" : { "title" : "begin" }},
 "highlight" : {
     "pre_tags" : ["<red>", "<blue>"],
     "post_tags" : ["</red>", "</blue>"],
     "fields" : {
         "title" : {}
     }
 }
}

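If the command succeeds, a result similar to the following is returned. This sample comes from a version earlier than 7.0, which is why each hit contains "_type": "doc".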

{
"took": 11,
"timed_out": false,
"_shards": {
 "total": 5,
 "successful": 5,
 "failed": 0
},
"hits": {
 "total": 2,
 "max_score": 0.41048482,
 "hits": [
   {
     "_index": "aliyun-index-test",
     "_type": "doc",
     "_id": "2",
     "_score": 0.41048482,
     "_source": {
       "title": "I start work at nine."
     },
     "highlight": {
       "title": [
         "I <red>start</red> work at nine."
       ]
     }
   },
   {
     "_index": "aliyun-index-test",
     "_type": "doc",
     "_id": "1",
     "_score": 0.39556286,
     "_source": {
       "title": "Shall I begin?"
     },
     "highlight": {
       "title": [
         "Shall I <red>begin</red>?"
       ]
     }
   }
 ]
}
}
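
Document 2 matches even though its title contains only start, because the query term begin is expanded by the synonym filter at search time. A minimal sketch for pulling the highlighted fragments out of such a response follows. The helper name highlighted_titles is hypothetical, and the response is trimmed from the result above to the fields the helper reads.

```python
def highlighted_titles(search_response):
    """Map each hit's _id to its first highlighted title fragment."""
    return {
        hit["_id"]: hit["highlight"]["title"][0]
        for hit in search_response["hits"]["hits"]
    }


# Trimmed copy of the search result above.
response = {
    "hits": {
        "hits": [
            {"_id": "2",
             "highlight": {"title": ["I <red>start</red> work at nine."]}},
            {"_id": "1",
             "highlight": {"title": ["Shall I <red>begin</red>?"]}},
        ]
    }
}
for doc_id, fragment in highlighted_titles(response).items():
    print(doc_id, fragment)
```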

Method 2: Reference synonyms directly

The following example defines the synonyms directly in the index settings and uses the IK dictionary for tokenization.

Create an index.

PUT /my_index
{
 "settings": {
     "analysis": {
         "analyzer": {
             "my_synonyms": {
                 "filter": [
                     "lowercase",
                     "my_synonym_filter"
                 ],
                 "tokenizer": "ik_smart"
             }
         },
         "filter": {
             "my_synonym_filter": {
                 "synonyms": [
                     "begin,start"
                 ],
                 "type": "synonym"
             }
         }
     }
 }
}

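The command above works as follows:

Define a synonym filter named my_synonym_filter and configure its synonym list.
Define an analyzer named my_synonyms that uses the ik_smart tokenizer.
After ik_smart tokenization, all letters are converted to lowercase, and the tokens are then passed through the synonym filter.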
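
The three steps above can be sketched locally as a toy pipeline. This is an illustration only, under stated assumptions: a regular expression that splits on non-alphanumeric characters stands in for the ik_smart tokenizer, only single-word synonyms are handled, and the function name analyze is hypothetical. The type labels ENGLISH and SYNONYM simply mimic the labels shown in the _analyze results in this topic.

```python
import re


def analyze(text, synonym_groups):
    """Tokenize, lowercase, then add synonyms at the same position."""
    # Map each term to the other members of its synonym group.
    expansions = {}
    for group in synonym_groups:
        for term in group:
            others = [t for t in group if t != term]
            expansions.setdefault(term, []).extend(others)

    tokens = []
    # Stand-in tokenizer: runs of ASCII letters and digits.
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        word = match.group().lower()  # the "lowercase" filter
        base = {"start_offset": match.start(),
                "end_offset": match.end(),
                "position": position}
        tokens.append({"token": word, "type": "ENGLISH", **base})
        for synonym in expansions.get(word, []):  # the synonym filter
            tokens.append({"token": synonym, "type": "SYNONYM", **base})
    return tokens


for tok in analyze("Shall I begin?", [["begin", "start"]]):
    print(tok["token"], tok["type"], tok["position"])
```

The synonym token inherits the offsets and position of the word it expands, so it is stacked on the original token instead of being appended to the end of the stream.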

Configure the analyzer for the title field so that it uses synonyms.
- Example for Elasticsearch versions earlier than 7.0
```
PUT /my_index/_mapping/doc
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "my_synonyms"
 }
}
}
```
- Example for Elasticsearch 7.0 and later
```
PUT /my_index/_mapping/
{
"properties": {
 "title": {
   "type": "text",
   "analyzer": "my_synonyms"
 }
}
}
```
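  Important: Starting with version 7.0, official Elasticsearch removed the concept of types and uses _doc by default. Do not specify a type when you configure the index mapping; otherwise, an error is returned.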

Verify the synonyms.

GET /my_index/_analyze
{
 "analyzer":"my_synonyms",
 "text":"Shall I begin?"
}

If the command succeeds, the following result is returned.

{
"tokens": [
 {
   "token": "shall",
   "start_offset": 0,
   "end_offset": 5,
   "type": "ENGLISH",
   "position": 0
 },
 {
   "token": "i",
   "start_offset": 6,
   "end_offset": 7,
   "type": "ENGLISH",
   "position": 1
 },
 {
   "token": "begin",
   "start_offset": 8,
   "end_offset": 13,
   "type": "ENGLISH",
   "position": 2
 },
 {
   "token": "start",
   "start_offset": 8,
   "end_offset": 13,
   "type": "SYNONYM",
   "position": 2
 }
]
}

Add data for the next test.

Example for Elasticsearch versions earlier than 7.0

PUT /my_index/doc/1
{
"title": "Shall I begin?"
}

PUT /my_index/doc/2
{
"title": "I start work at nine."
}

Example for Elasticsearch 7.0 and later

PUT /my_index/_doc/1
{
"title": "Shall I begin?"
}

PUT /my_index/_doc/2
{
"title": "I start work at nine."
}

Run a search to verify the synonyms.

GET /my_index/_search
{
"query" : { "match" : { "title" : "begin" }},
"highlight" : {
  "pre_tags" : ["<red>", "<blue>"],
  "post_tags" : ["</red>", "</blue>"],
  "fields" : {
      "title" : {}
  }
}
}

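If the command succeeds, a result similar to the following is returned. This sample comes from a version earlier than 7.0, which is why each hit contains "_type": "doc".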

{
"took": 11,
"timed_out": false,
"_shards": {
 "total": 5,
 "successful": 5,
 "failed": 0
},
"hits": {
 "total": 2,
 "max_score": 0.41913947,
 "hits": [
   {
     "_index": "my_index",
     "_type": "doc",
     "_id": "2",
     "_score": 0.41913947,
     "_source": {
       "title": "I start work at nine."
     },
     "highlight": {
       "title": [
         "I <red>start</red> work at nine."
       ]
     }
   },
   {
     "_index": "my_index",
     "_type": "doc",
     "_id": "1",
     "_score": 0.39556286,
     "_source": {
       "title": "Shall I begin?"
     },
     "highlight": {
       "title": [
         "Shall I <red>begin</red>?"
       ]
     }
   }
 ]
}
}