Local English version: ../en/query-dsl-multi-match-query.html
The multi_match query builds on the match query to allow multi-field queries:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "this is a test",
      "fields": [ "subject", "message" ]
    }
  }
}
fields and per-field boosting

Field names can be specified with wildcards, eg:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "Will Smith",
      "fields": [ "title", "*_name" ]
    }
  }
}

Individual fields can be boosted with the caret (^) notation:

GET /_search
{
  "query": {
    "multi_match" : {
      "query" :  "this is a test",
      "fields" : [ "subject^3", "message" ]
    }
  }
}
If no fields are provided, the multi_match query defaults to the index.query.default_field index setting, which in turn defaults to *. * extracts all fields in the mapping that are eligible for term queries and filters out the metadata fields. All extracted fields are then combined to build the query.

There is a limit on the number of fields that can be queried at once. It is defined by the indices.query.bool.max_clause_count search setting, which defaults to 1024.
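As a minimal sketch (the index name my-index is hypothetical), the default field list can be narrowed with the dynamic index.query.default_field setting, so that field-less multi_match queries only target the fields you intend:

PUT /my-index/_settings
{
  "index": {
    "query": {
      "default_field": [ "subject", "message" ]
    }
  }
}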
Types of multi_match query:

The way the multi_match query is executed internally depends on the type parameter, which can be set to:
best_fields: (default) Finds documents which match any field, but uses the _score from the best field. See best_fields.

most_fields: Finds documents which match any field and combines the _score from each field. See most_fields.

cross_fields: Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. See cross_fields.

phrase: Runs a match_phrase query on each field and uses the _score from the best field. See phrase and phrase_prefix.

phrase_prefix: Runs a match_phrase_prefix query on each field and uses the _score from the best field. See phrase and phrase_prefix.

bool_prefix: Creates a match_bool_prefix query on each field and combines the _score from each field. See bool_prefix.
best_fields

The best_fields type is most useful when you are searching for multiple words that are best found in the same field. For instance, "brown fox" in a single field is more meaningful than "brown" in one field and "fox" in another.

The best_fields type generates a match query for each field and wraps them in a dis_max query, to find the single best matching field. For instance, this query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":       "brown fox",
      "type":        "best_fields",
      "fields":      [ "subject", "message" ],
      "tie_breaker": 0.3
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "subject": "brown fox" }},
        { "match": { "message": "brown fox" }}
      ],
      "tie_breaker": 0.3
    }
  }
}
Normally the best_fields type uses the score of the single best matching field, but if tie_breaker is specified, then it calculates the score as follows:

- the score from the best matching field
- plus tie_breaker * _score for each of the other matching fields
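For illustration (the scores are made up): with tie_breaker: 0.3, if subject matches with a score of 2.0 and message matches with a score of 1.0, the document scores 2.0 + 0.3 * 1.0 = 2.3 instead of just 2.0.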
Also accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, rewrite, zero_terms_query, cutoff_frequency, auto_generate_synonyms_phrase_query and fuzzy_transpositions, as explained in match query.
operator and minimum_should_match

The best_fields and most_fields types are field-centric: they generate a match query per field. This means that the operator and minimum_should_match parameters are applied to each field individually, which is probably not what you want.

Take this query for example:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "Will Smith",
      "type":     "best_fields",
      "fields":   [ "first_name", "last_name" ],
      "operator": "and"
    }
  }
}

This query is executed as:

(+first_name:will +first_name:smith) | (+last_name:will +last_name:smith)

In other words, all terms must be present in a single field for a document to match.

See cross_fields for a better solution.
most_fields

The most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. For instance, the main field may contain synonyms, stemming and terms without diacritics. A second field may contain the original terms, and a third field might contain shingles. By combining scores from all three fields we can match as many documents as possible with the main field, but use the second and third fields to push the most similar results to the top of the list.
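One possible way to produce such fields is with multi-fields in the mapping. The following is only a sketch, with a hypothetical index name and analyzer choices (the built-in english analyzer for stemming on the main field, the standard analyzer for the original terms, and a custom shingle analyzer):

PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_shingle_analyzer": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    [ "lowercase", "shingle" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type":     "text",
        "analyzer": "english",
        "fields": {
          "original": { "type": "text", "analyzer": "standard" },
          "shingles": { "type": "text", "analyzer": "my_shingle_analyzer" }
        }
      }
    }
  }
}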
This query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "quick brown fox",
      "type":   "most_fields",
      "fields": [ "title", "title.original", "title.shingles" ]
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":          "quick brown fox" }},
        { "match": { "title.original": "quick brown fox" }},
        { "match": { "title.shingles": "quick brown fox" }}
      ]
    }
  }
}

The score from each match clause is added together, then divided by the number of match clauses.
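For illustration (made-up scores): if the three clauses score 1.2, 0.6 and 0.3, the combined score is (1.2 + 0.6 + 0.3) / 3 = 0.7.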
Also accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, rewrite, zero_terms_query and cutoff_frequency, as explained in match query, but see operator and minimum_should_match.
phrase and phrase_prefix

The phrase and phrase_prefix types behave just like best_fields, but they use a match_phrase or match_phrase_prefix query instead of a match query.

This query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "quick brown f",
      "type":   "phrase_prefix",
      "fields": [ "subject", "message" ]
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match_phrase_prefix": { "subject": "quick brown f" }},
        { "match_phrase_prefix": { "message": "quick brown f" }}
      ]
    }
  }
}
Also accepts the analyzer, boost, lenient and zero_terms_query parameters, as explained in match, as well as slop, as explained in match phrase. The phrase_prefix type additionally accepts max_expansions.
cross_fields

The cross_fields type is particularly useful with structured documents where multiple fields should match. For instance, when querying the first_name and last_name fields for "Will Smith", the best match is likely to have "Will" in one field and "Smith" in the other.

One way of dealing with these types of queries is simply to index the first_name and last_name fields into a single full_name field. Of course, this can only be done at index time.
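As a minimal sketch of that index-time approach (the index name is hypothetical), the copy_to mapping parameter can populate such a combined field:

PUT /my-index
{
  "mappings": {
    "properties": {
      "first_name": { "type": "text", "copy_to": "full_name" },
      "last_name":  { "type": "text", "copy_to": "full_name" },
      "full_name":  { "type": "text" }
    }
  }
}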
The cross_fields type tries to solve these problems at query time by taking a term-centric approach. It first analyzes the query string into individual terms, then looks for each term in any of the fields, as though they were one big field.
A query like:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "Will Smith",
      "type":     "cross_fields",
      "fields":   [ "first_name", "last_name" ],
      "operator": "and"
    }
  }
}

is executed as:

+(first_name:will last_name:will) +(first_name:smith last_name:smith)

In other words, all terms must be present in at least one field for a document to match. (Compare this to the logic used for best_fields and most_fields.)
That solves one of the two problems. The problem of differing term frequencies is solved by blending the term frequencies for all fields, in order to even out the differences. In practice, first_name:smith will be treated as though it has the same frequency as last_name:smith, plus one. This will make matches on first_name and last_name have comparable scores, with a tiny advantage for last_name, since it is the field most likely to contain smith.

Note that cross_fields is usually only useful on short string fields that all have a boost of 1. Otherwise boosts, term frequencies and length normalization contribute to the score in such a way that the blending of term statistics no longer makes sense.

If you run the above query through the Validate API, it returns this explanation:

+blended("will", fields: [first_name, last_name]) +blended("smith", fields: [first_name, last_name])
Also accepts analyzer, boost, operator, minimum_should_match, lenient, zero_terms_query and cutoff_frequency, as explained in match query.
cross_fields and analysis

The cross_fields type can only work in term-centric mode on fields that have the same analyzer. Fields with the same analyzer are grouped together, as in the example above. If there are multiple groups, they are combined with a bool query.

For instance, if the fields first and last have the same analyzer, and first.edge and last.edge both use an edge_ngram analyzer, this query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "Jon",
      "type":   "cross_fields",
      "fields": [ "first", "first.edge", "last", "last.edge" ]
    }
  }
}

would be executed as:

blended("jon", fields: [first, last])
| (
    blended("j",   fields: [first.edge, last.edge])
    blended("jo",  fields: [first.edge, last.edge])
    blended("jon", fields: [first.edge, last.edge])
)

In other words, first and last would be grouped together and treated as a single field, and first.edge and last.edge would be grouped together and treated as a single field.
Having multiple groups is fine, but when combined with operator or minimum_should_match, it can suffer from the same problem as most_fields or best_fields.

You can easily rewrite this query yourself as two separate cross_fields queries combined with a bool query, and apply the minimum_should_match parameter to just one of them:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match" : {
            "query":                "Will Smith",
            "type":                 "cross_fields",
            "fields":               [ "first", "last" ],
            "minimum_should_match": "50%"
          }
        },
        {
          "multi_match" : {
            "query":  "Will Smith",
            "type":   "cross_fields",
            "fields": [ "*.edge" ]
          }
        }
      ]
    }
  }
}

You can force all fields into the same group by specifying the analyzer parameter in the query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "Jon",
      "type":     "cross_fields",
      "analyzer": "standard",
      "fields":   [ "first", "last", "*.edge" ]
    }
  }
}
which will be executed as:

blended("jon", fields: [first, first.edge, last.edge, last])
tie_breaker

By default, each per-term blended query will use the best score returned by any field in a group; these per-term scores are then added together to give the final score. The tie_breaker parameter can change the default behaviour of the per-term blended queries. It accepts:

0.0: Take the single best score out of (eg) first_name:will and last_name:will. (default)

1.0: Add together the scores for (eg) first_name:will and last_name:will.

0.0 < n < 1.0: Take the single best score plus tie_breaker multiplied by each of the scores from other matching fields.
bool_prefix

The bool_prefix type scores documents similarly to most_fields, but uses a match_bool_prefix query instead of a match query.

GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "quick brown f",
      "type":   "bool_prefix",
      "fields": [ "subject", "message" ]
    }
  }
}
The analyzer, boost, operator, minimum_should_match, lenient, zero_terms_query and auto_generate_synonyms_phrase_query parameters, as explained in match query, are supported. The fuzziness, prefix_length, max_expansions, rewrite and fuzzy_transpositions parameters are supported for the terms that are used to construct term queries, but they have no effect on the prefix query constructed from the final term. The slop and cutoff_frequency parameters are not supported by this query type.