Elasticsearch: Use synonyms

Last updated: Jun 30, 2024

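With the synonym feature, you can apply an uploaded synonym file to the Elasticsearch synonym dictionary and search with the updated dictionary. Elasticsearch supports two ways to use synonyms: uploading a synonym file, and referencing synonyms inline in the index settings. This topic provides an example of each method.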

Background information

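All commands in this topic can be run in the Kibana console. For information about how to log on to the Kibana console, see Log on to the Kibana console.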

Method 1: Upload a synonym file

Prerequisite: A synonym file has been uploaded. For details, see Upload a synonym file.

The following example configures synonyms with a token filter and uses aliyun_synonyms.txt as the test file. The file contains a single line: begin, start
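
A line such as `begin, start` follows the Solr synonym format, in which comma-separated terms are treated as equivalent and `=>` declares an explicit mapping. The Python sketch below only illustrates how such a line can be interpreted. `parse_synonym_line` is a hypothetical helper, not part of Elasticsearch, and it covers only these two rule forms plus `#` comment lines.

```python
def parse_synonym_line(line):
    """Parse one Solr-style synonym rule into a {term: [expanded terms]} dict.

    Hypothetical helper for illustration; Elasticsearch parses the file itself.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return {}  # blank lines and comment lines carry no rule
    if "=>" in line:
        # Explicit mapping: each term on the left is replaced by the right side.
        left, right = line.split("=>", 1)
        sources = [t.strip() for t in left.split(",") if t.strip()]
        targets = [t.strip() for t in right.split(",") if t.strip()]
        return {src: list(targets) for src in sources}
    # Equivalent synonyms: every term expands to the whole group.
    group = [t.strip() for t in line.split(",") if t.strip()]
    return {term: list(group) for term in group}


rules = parse_synonym_line("begin, start")
print(rules)
```

Under these assumptions, the test file makes `begin` and `start` interchangeable, which is what the verification steps in this method check.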

  1. Create an index.
    PUT /aliyun-index-test
    {
      "settings": {
        "index":{
          "analysis": {
              "analyzer": {
                "by_smart": {
                  "type": "custom",
                  "tokenizer": "ik_smart",
                  "filter": ["by_tfr","by_sfr"],
                  "char_filter": ["by_cfr"]
                },
                "by_max_word": {
                  "type": "custom",
                  "tokenizer": "ik_max_word",
                  "filter": ["by_tfr","by_sfr"],
                  "char_filter": ["by_cfr"]
                }
             },
             "filter": {
                "by_tfr": {
                  "type": "stop",
                  "stopwords": [" "]
                  },
               "by_sfr": {
                  "type": "synonym",
                  "synonyms_path": "analysis/aliyun_synonyms.txt"
                  }
              },
              "char_filter": {
                "by_cfr": {
                  "type": "mapping",
                  "mappings": ["| => |"]
                }
              }
          }
        }
      }
    }
  2. Configure the synonym field title.
    • Example for Elasticsearch versions earlier than 7.0
      PUT /aliyun-index-test/_mapping/doc
      {
      "properties": {
       "title": {
         "type": "text",
         "analyzer": "by_max_word",
         "search_analyzer": "by_smart"
       }
      }
      }
    • Example for Elasticsearch 7.0 and later
      PUT /aliyun-index-test/_mapping/
      {
      "properties": {
       "title": {
         "type": "text",
         "analyzer": "by_max_word",
         "search_analyzer": "by_smart"
       }
      }
      }
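      Important: Starting with version 7.0, open source Elasticsearch no longer uses mapping types in its APIs and uses _doc by default. Do not specify a type when you set the index mapping. Otherwise, an error is reported.
  3. Verify the synonyms.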
    GET /aliyun-index-test/_analyze
    {
    "analyzer": "by_smart",
    "text":"begin"
    }
    After the command succeeds, the following result is returned.
    {
    "tokens": [
     {
       "token": "begin",
       "start_offset": 0,
       "end_offset": 5,
       "type": "ENGLISH",
       "position": 0
     },
     {
       "token": "start",
       "start_offset": 0,
       "end_offset": 5,
       "type": "SYNONYM",
       "position": 0
     }
    ]
    }
  4. Add data for the next test.
    • Example for Elasticsearch versions earlier than 7.0
      PUT /aliyun-index-test/doc/1
      {
      "title": "Shall I begin?"
      }
      PUT /aliyun-index-test/doc/2
      {
      "title": "I start work at nine."
      }
    • Example for Elasticsearch 7.0 and later
      PUT /aliyun-index-test/_doc/1
      {
      "title": "Shall I begin?"
      }
      PUT /aliyun-index-test/_doc/2
      {
      "title": "I start work at nine."
      }
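  5. Run a search to verify the synonyms.
    GET /aliyun-index-test/_search
    {
     "query" : { "match" : { "title" : "begin" }},
     "highlight" : {
         "pre_tags" : ["<red>", "<blue>"],
         "post_tags" : ["</red>", "</blue>"],
         "fields" : {
             "title" : {}
         }
     }
    }
    After the command succeeds, a result similar to the following is returned. The sample output is from a version earlier than 7.0. In 7.0 and later, fields such as _type and hits.total are returned differently.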
    {
    "took": 11,
    "timed_out": false,
    "_shards": {
     "total": 5,
     "successful": 5,
     "failed": 0
    },
    "hits": {
     "total": 2,
     "max_score": 0.41048482,
     "hits": [
       {
         "_index": "aliyun-index-test",
         "_type": "doc",
         "_id": "2",
         "_score": 0.41048482,
         "_source": {
           "title": "I start work at nine."
         },
         "highlight": {
           "title": [
             "I <red>start</red> work at nine."
           ]
         }
       },
       {
         "_index": "aliyun-index-test",
         "_type": "doc",
         "_id": "1",
         "_score": 0.39556286,
         "_source": {
           "title": "Shall I begin?"
         },
         "highlight": {
           "title": [
             "Shall I <red>begin</red>?"
           ]
         }
       }
     ]
    }
    }
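
The verification in step 3 relies on the fact that a synonym is emitted at the same position as the original token. As a minimal sketch of that check, the Python snippet below groups the tokens of an `_analyze` response by position. `tokens_by_position` is a hypothetical helper name, and `sample` is the response from step 3 copied in as a Python dict. No request is sent to a cluster.

```python
from collections import defaultdict


def tokens_by_position(analyze_response):
    """Group token strings from an _analyze response by their position."""
    grouped = defaultdict(list)
    for tok in analyze_response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(grouped)


# The _analyze result from step 3, written as a Python dict.
sample = {
    "tokens": [
        {"token": "begin", "start_offset": 0, "end_offset": 5,
         "type": "ENGLISH", "position": 0},
        {"token": "start", "start_offset": 0, "end_offset": 5,
         "type": "SYNONYM", "position": 0},
    ]
}

groups = tokens_by_position(sample)
print(groups)  # {0: ['begin', 'start']}
```

If a position holds both the original term and its synonym, the synonym filter took effect for that term.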

Method 2: Reference synonyms inline

The following example defines the synonyms directly in the index settings and uses the IK dictionary for tokenization.

  1. Create an index.
    PUT /my_index
    {
     "settings": {
         "analysis": {
             "analyzer": {
                 "my_synonyms": {
                     "filter": [
                         "lowercase",
                         "my_synonym_filter"
                     ],
                     "tokenizer": "ik_smart"
                 }
             },
             "filter": {
                 "my_synonym_filter": {
                     "synonyms": [
                         "begin,start"
                     ],
                     "type": "synonym"
                 }
             }
         }
     }
    }
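    The preceding command works as follows:
    1. It defines a synonym filter named my_synonym_filter and configures its synonym list.
    2. It defines an analyzer named my_synonyms that tokenizes text with ik_smart.
    3. After ik_smart tokenizes the text, all letters are converted to lowercase and the synonym filter is applied.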
  2. Configure the synonym field title.
    • Example for Elasticsearch versions earlier than 7.0
      PUT /my_index/_mapping/doc
      {
      "properties": {
       "title": {
         "type": "text",
         "analyzer": "my_synonyms"
       }
      }
      }
    • Example for Elasticsearch 7.0 and later
      PUT /my_index/_mapping/
      {
      "properties": {
       "title": {
         "type": "text",
         "analyzer": "my_synonyms"
       }
      }
      }
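      Important: Starting with version 7.0, open source Elasticsearch no longer uses mapping types in its APIs and uses _doc by default. Do not specify a type when you set the index mapping. Otherwise, an error is reported.
  3. Verify the synonyms.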
    GET /my_index/_analyze
    {
     "analyzer":"my_synonyms",
     "text":"Shall I begin?"
    }
    After the command succeeds, the following result is returned.
    {
    "tokens": [
     {
       "token": "shall",
       "start_offset": 0,
       "end_offset": 5,
       "type": "ENGLISH",
       "position": 0
     },
     {
       "token": "i",
       "start_offset": 6,
       "end_offset": 7,
       "type": "ENGLISH",
       "position": 1
     },
     {
       "token": "begin",
       "start_offset": 8,
       "end_offset": 13,
       "type": "ENGLISH",
       "position": 2
     },
     {
       "token": "start",
       "start_offset": 8,
       "end_offset": 13,
       "type": "SYNONYM",
       "position": 2
     }
    ]
    }
  4. Add data for the next test.
    • Example for Elasticsearch versions earlier than 7.0
      PUT /my_index/doc/1
      {
      "title": "Shall I begin?"
      }
      PUT /my_index/doc/2
      {
      "title": "I start work at nine."
      }
    • Example for Elasticsearch 7.0 and later
      PUT /my_index/_doc/1
      {
      "title": "Shall I begin?"
      }
      PUT /my_index/_doc/2
      {
      "title": "I start work at nine."
      }
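  5. Run a search to verify the synonyms.
    GET /my_index/_search
    {
    "query" : { "match" : { "title" : "begin" }},
    "highlight" : {
      "pre_tags" : ["<red>", "<blue>"],
      "post_tags" : ["</red>", "</blue>"],
      "fields" : {
          "title" : {}
      }
    }
    }
    After the command succeeds, a result similar to the following is returned. The sample output is from a version earlier than 7.0. In 7.0 and later, fields such as _type and hits.total are returned differently.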
    {
    "took": 11,
    "timed_out": false,
    "_shards": {
     "total": 5,
     "successful": 5,
     "failed": 0
    },
    "hits": {
     "total": 2,
     "max_score": 0.41913947,
     "hits": [
       {
         "_index": "my_index",
         "_type": "doc",
         "_id": "2",
         "_score": 0.41913947,
         "_source": {
           "title": "I start work at nine."
         },
         "highlight": {
           "title": [
             "I <red>start</red> work at nine."
           ]
         }
       },
       {
         "_index": "my_index",
         "_type": "doc",
         "_id": "1",
         "_score": 0.39556286,
         "_source": {
           "title": "Shall I begin?"
         },
         "highlight": {
           "title": [
             "Shall I <red>begin</red>?"
           ]
         }
       }
     ]
    }
    }
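
The analysis chain described in step 1 of this method (tokenize, lowercase, then expand synonyms) can be imitated offline with a short Python sketch. This is only an approximation under stated assumptions: a simple regular expression stands in for the ik_smart tokenizer, whose real segmentation logic is not reproduced here, and `SYNONYMS` and `analyze` are hypothetical names that mirror the `begin,start` rule.

```python
import re

# Mirrors the inline rule "begin,start": each term expands to the other.
SYNONYMS = {"begin": ["start"], "start": ["begin"]}


def analyze(text):
    """Imitate the my_synonyms analyzer: tokenize, lowercase, add synonyms."""
    tokens = []
    # Assumption: a regex word splitter stands in for the ik_smart tokenizer.
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        term = match.group().lower()  # lowercase filter
        base = {
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        }
        tokens.append({"token": term, "type": "ENGLISH", **base})
        # Synonym filter: emit each synonym at the same position and offsets.
        for synonym in SYNONYMS.get(term, []):
            tokens.append({"token": synonym, "type": "SYNONYM", **base})
    return tokens


for tok in analyze("Shall I begin?"):
    print(tok)
```

For the input `Shall I begin?`, the sketch yields the tokens shall, i, begin, and start, with start sharing position 2 with begin as in the `_analyze` result of step 3. Because the same expansion applies to `I start work at nine.`, a match query for begin can find both documents.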