全部產品
Search
文件中心

Elasticsearch:同義字配置規則

更新時間:Jun 30, 2024

Elasticsearch(簡稱ES)支援使用filter過濾器來配置同義字。filter過濾器支援Solr和WordNet兩種同義字格式。

配置樣本

PUT /test_index
{    
    "settings": {        
        "index" : {            
            "analysis" : {                
                "analyzer" : {                    
                    "synonym" : {                        
                        "tokenizer" : "whitespace",                       
                        "filter" : ["synonym"]                    
                        }               
                   },                
                   "filter" : {                    
                        "synonym" : {                       
                             "type" : "synonym",                        
                              "synonyms_path" : "analysis/synonym.txt",                                          
                              "tokenizer" : "whitespace"                    
                          }               
                       }            
                    }        
                  }    
          }
}

filter中配置了一個synonym(同義字)過濾器,其中包含了同義字詞典檔案的路徑analysis/synonym.txt(路徑是相對於config的位置)。更多參數說明請參見官方ES的Synonym Token Filter文檔。

Solr同義字

配置規則樣本如下。
# Blank lines and lines starting with pound are comments.
# Explicit mappings match any token sequence on the LHS of "=>"
# and replace with all alternatives on the RHS.  These types of mappings
# ignore the expand parameter in the schema.
# Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit
# Equivalent synonyms may be separated with commas and give
# no explicit mapping.  In this case the mapping behavior will
# be taken from the expand parameter in the schema.  This allows
# the same synonym file to be used in different synonym handling strategies.
# Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos
lol, laughing out loud
# If expand==true, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent
# to the explicit mapping:
ipod, i-pod, i pod => ipod
# Multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
# is equivalent to
foo => foo bar, baz
您也可以在filter過濾器中直接定義同義字(請注意使用synonyms而不是synonyms_path),樣本如下。
PUT /test_index
{
    "settings": {
        "index" : {
            "analysis" : {
                "filter" : {
                    "synonym" : {
                        "type" : "synonym",
                        "synonyms" : [
                            "i-pod, i pod => ipod",
                            "begin, start"
                        ]
                    }
                }
            }
        }
    }
}
說明 建議您使用synonyms_path在檔案中定義大型同義字集。

WordNet同義字

配置規則樣本如下。
PUT /test_index
{
    "settings": {
        "index" : {
            "analysis" : {
                "filter" : {
                    "synonym" : {
                        "type" : "synonym",
                        "format" : "wordnet",
                        "synonyms" : [
                            "s(100000001,1,'abstain',v,1,0).",
                            "s(100000001,2,'refrain',v,1,0).",
                            "s(100000001,3,'desist',v,1,0)."
                        ]
                    }
                }
            }
        }
    }
}

以上樣本使用synonyms定義WordNet同義字,您也可以使用synonyms_path在文本中定義WordNet同義字。