通过使用同义词,您可以将已经上传的同义词文件作用于阿里云Elasticsearch的同义词库,并使用更新后的词库搜索。阿里云Elasticsearch支持两种方式使用同义词:上传同义词文件、直接引用同义词。本文分别介绍两种方式的使用示例。

背景信息

本文中的命令,均可在Kibana控制台中执行。登录Kibana控制台的方法,请参见登录Kibana控制台

方式一:上传同义词文件

前提条件:已上传同义词文件。具体操作,请参见上传同义词文件进行上传。

以下示例使用filter过滤器配置同义词,使用aliyun_synonyms.txt作为测试文件,内容为begin, start

  1. 创建索引。
    PUT /aliyun-index-test
    {
      "settings": {
        "index":{
          "analysis": {
              "analyzer": {
                "by_smart": {
                  "type": "custom",
                  "tokenizer": "ik_smart",
                  "filter": ["by_tfr","by_sfr"],
                  "char_filter": ["by_cfr"]
                },
                "by_max_word": {
                  "type": "custom",
                  "tokenizer": "ik_max_word",
                  "filter": ["by_tfr","by_sfr"],
                  "char_filter": ["by_cfr"]
                }
             },
             "filter": {
                "by_tfr": {
                  "type": "stop",
                  "stopwords": [" "]
                  },
               "by_sfr": {
                  "type": "synonym",
                  "synonyms_path": "analysis/aliyun_synonyms.txt"
                  }
              },
              "char_filter": {
                "by_cfr": {
                  "type": "mapping",
                  "mappings": ["| => |"]
                }
              }
          }
        }
      }
    }
  2. 配置同义词字段title
    • Elasticsearch 7.0以下版本示例
      PUT /aliyun-index-test/_mapping/doc
      {
      "properties": {
       "title": {
         "type": "text",
         "analyzer": "by_max_word",
         "search_analyzer": "by_smart"
       }
      }
      }
    • Elasticsearch 7.0及以上版本示例
      PUT /aliyun-index-test/_mapping/
      {
      "properties": {
       "title": {
         "type": "text",
         "analyzer": "by_max_word",
         "search_analyzer": "by_smart"
       }
      }
      }
      重要 官方Elasticsearch从7.0版本开始,移除了类型(type)的概念,默认使用_doc代替。因此在设置索引mapping时无需指定type,否则会报错。
  3. 校验同义词。
    GET /aliyun-index-test/_analyze
    {
    "analyzer": "by_smart",
    "text":"begin"
    }
    执行成功后,返回如下结果。
    {
    "tokens": [
     {
       "token": "begin",
       "start_offset": 0,
       "end_offset": 5,
       "type": "ENGLISH",
       "position": 0
     },
     {
       "token": "start",
       "start_offset": 0,
       "end_offset": 5,
       "type": "SYNONYM",
       "position": 0
     }
    ]
    }
  4. 添加数据,进行下一步测试。
    • Elasticsearch 7.0以下版本示例
      PUT /aliyun-index-test/doc/1
      {
      "title": "Shall I begin?"
      }
      PUT /aliyun-index-test/doc/2
      {
      "title": "I start work at nine."
      }
    • Elasticsearch 7.0及以上版本示例
      PUT /aliyun-index-test/_doc/1
      {
      "title": "Shall I begin?"
      }
      PUT /aliyun-index-test/_doc/2
      {
      "title": "I start work at nine."
      }
  5. 通过搜索测试,校验同义词。
    GET /aliyun-index-test/_search
    {
     "query" : { "match" : { "title" : "begin" }},
     "highlight" : {
         "pre_tags" : ["<red>", "<bule>"],
         "post_tags" : ["</red>", "</bule>"],
         "fields" : {
             "title" : {}
         }
     }
    }
    执行成功后,返回如下结果。
    {
    "took": 11,
    "timed_out": false,
    "_shards": {
     "total": 5,
     "successful": 5,
     "failed": 0
    },
    "hits": {
     "total": 2,
     "max_score": 0.41048482,
     "hits": [
       {
         "_index": "aliyun-index-test",
         "_type": "doc",
         "_id": "2",
         "_score": 0.41048482,
         "_source": {
           "title": "I start work at nine."
         },
         "highlight": {
           "title": [
             "I <red>start</red> work at nine."
           ]
         }
       },
       {
         "_index": "aliyun-index-test",
         "_type": "doc",
         "_id": "1",
         "_score": 0.39556286,
         "_source": {
           "title": "Shall I begin?"
         },
         "highlight": {
           "title": [
             "Shall I <red>begin</red>?"
           ]
         }
       }
     ]
    }
    }

方式二:直接引用同义词

以下示例直接引用同义词,并使用IK词典进行分词。

  1. 创建索引。
    PUT /my_index
    {
     "settings": {
         "analysis": {
             "analyzer": {
                 "my_synonyms": {
                     "filter": [
                         "lowercase",
                         "my_synonym_filter"
                     ],
                     "tokenizer": "ik_smart"
                 }
             },
             "filter": {
                 "my_synonym_filter": {
                     "synonyms": [
                         "begin,start"
                     ],
                     "type": "synonym"
                 }
             }
         }
     }
    }
    以上命令的原理为:
    1. 设置一个同义词过滤器my_synonym_filter,并配置同义词词库。
    2. 设置一个my_synonyms解释器,使用ik_smart分词。
    3. 经过ik_smart分词,把所有字母小写,并作为同义词处理。
  2. 配置同义词字段title
    • Elasticsearch 7.0以下版本示例
      PUT /my_index/_mapping/doc
      {
      "properties": {
       "title": {
         "type": "text",
         "analyzer": "my_synonyms"
       }
      }
      }
    • Elasticsearch 7.0及以上版本示例
      PUT /my_index/_mapping/
      {
      "properties": {
       "title": {
         "type": "text",
         "analyzer": "my_synonyms"
       }
      }
      }
      重要 官方Elasticsearch从7.0版本开始,移除了类型(type)的概念,默认使用_doc代替,所以在设置索引mapping时无需指定type,否则会报错。
  3. 校验同义词。
    GET /my_index/_analyze
    {
     "analyzer":"my_synonyms",
     "text":"Shall I begin?"
    }
    执行成功后,返回如下结果。
    {
    "tokens": [
     {
       "token": "shall",
       "start_offset": 0,
       "end_offset": 5,
       "type": "ENGLISH",
       "position": 0
     },
     {
       "token": "i",
       "start_offset": 6,
       "end_offset": 7,
       "type": "ENGLISH",
       "position": 1
     },
     {
       "token": "begin",
       "start_offset": 8,
       "end_offset": 13,
       "type": "ENGLISH",
       "position": 2
     },
     {
       "token": "start",
       "start_offset": 8,
       "end_offset": 13,
       "type": "SYNONYM",
       "position": 2
     }
    ]
    }
  4. 添加数据,进行下一步测试。
    • Elasticsearch 7.0以下版本示例
      PUT /my_index/doc/1
      {
      "title": "Shall I begin?"
      }
      PUT /my_index/doc/2
      {
      "title": "I start work at nine."
      }
    • Elasticsearch 7.0及以上版本示例
      PUT /my_index/_doc/1
      {
      "title": "Shall I begin?"
      }
      PUT /my_index/_doc/2
      {
      "title": "I start work at nine."
      }
  5. 通过搜索测试,校验同义词。
    GET /my_index/_search
    {
    "query" : { "match" : { "title" : "begin" }},
    "highlight" : {
      "pre_tags" : ["<red>", "<bule>"],
      "post_tags" : ["</red>", "</bule>"],
      "fields" : {
          "title" : {}
      }
    }
    }
    执行成功后,返回如下结果。
    {
    "took": 11,
    "timed_out": false,
    "_shards": {
     "total": 5,
     "successful": 5,
     "failed": 0
    },
    "hits": {
     "total": 2,
     "max_score": 0.41913947,
     "hits": [
       {
         "_index": "my_index",
         "_type": "doc",
         "_id": "2",
         "_score": 0.41913947,
         "_source": {
           "title": "I start work at nine."
         },
         "highlight": {
           "title": [
             "I <red>start</red> work at nine."
           ]
         }
       },
       {
         "_index": "my_index",
         "_type": "doc",
         "_id": "1",
         "_score": 0.39556286,
         "_source": {
           "title": "Shall I begin?"
         },
         "highlight": {
           "title": [
             "Shall I <red>begin</red>?"
           ]
         }
       }
     ]
    }
    }