Synonym token filter(同义词)
Configuration
{
"analysis": {
"filter": {
"synonym": {
"type": "synonym",
"lenient": true,
"synonyms": [
"波行, 宁波银行",
"招行 => 招商银行"
]
}
},
"analyzer": {
"text_char": {
"tokenizer": "char_tokenizer",
filter: [
"lowercase",
"whitespace_remove",
"synonym"
]
}
},
"tokenizer": {
"char_tokenizer": {
"type": "simple_pattern",
"pattern": "."
}
}
}
}
"招行 => 招商银行"
:single direction"波行,宁波银行"
:double direction
Configuration with txt file:
"filter": {
"synonym": {
"type": "synonym",
"synonyms_path": "analysis/synonym.txt"
}
}
Analyzer workflow
Character filters => Tokenizer => Token filters
Token filters are not allowed to change the position or character offsets of each token.
_analyze {
"anayzer": "text_char",
"text": ["波行"]
}
Result:
{
"tokens": [
{
"token": ”波“,
”start_offset“: 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": ”宁“,
”start_offset“: 0,
"end_offset": 1,
"type": "SYNONYM",
"position": 0
},
{
"token": ”行“,
”start_offset“: 1,
"end_offset": 2,
"type": "word",
"position": 1
},
{
"token": ”波“,
”start_offset“: 1,
"end_offset": 2,
"type": "SYNONYM",
"position": 1
},
{
"token": ”银“,
”start_offset“: 1,
"end_offset": 2,
"type": "SYNONYM",
"position": 2
},
{
"token": ”行“,
”start_offset“: 1,
"end_offset": 2,
"type": "SYNONYM",
"position": 3
},
]
}
Will get in query profile:
"description": "CUST_NM.char: \"(波 宁) (行 波) 银 行\""
Reference
Synonym token filter 借助同义词让 Elasticsearch 更加强大 | Elastic Blog