All Products
Search
Document Center

AnalyticDB for MySQL:Full-text search

Last Updated:Aug 22, 2024

This topic describes how to use the full-text index functions MATCH() AGAINST(), MATCH() FUZZY(), and MATCH() PHRASE() to perform full-text search and highlight full-text search keywords.

Prerequisites

Full-text indexes are created. For more information, see Create a full-text index.

Sample data

In this example, a table named tbl_fulltext_demo is created to provide sample data. Each full-text index uses a different analyzer. The following SQL statements are used to create the tbl_fulltext_demo table and insert data into the table:

CREATE TABLE `tbl_fulltext_demo` (
  `id` int,
  `content` varchar,
  `content_alinlp` varchar,
  `content_ik` varchar,
  `content_standard` varchar,
  `content_ngram` varchar,
  FULLTEXT INDEX fidx_c(`content`),  // Use the default analyzer.
  FULLTEXT INDEX fidx_alinlp(`content_alinlp`) WITH ANALYZER alinlp,
  FULLTEXT INDEX fidx_ik(`content_ik`) WITH ANALYZER ik,
  FULLTEXT INDEX fidx_standard(`content_standard`) WITH ANALYZER standard,
  FULLTEXT INDEX fidx_ngram(`content_ngram`) WITH ANALYZER ngram,
  PRIMARY KEY (`id`)
) DISTRIBUTE BY HASH(id);



INSERT INTO tbl_fulltext_demo(id, content, content_alinlp, content_ik, content_standard, content_ngram) 
VALUES(1, 'Customers Need Better Products and Services', 'Customers Need Better Products and Services', 'Customers Need Better Products and Services', 'Customers Need Better Products and Services', 'Customers Need Better Products and Services')
, (2, 'Wuhan Changjiang Bridge', 'Wuhan Changjiang Bridge', 'Wuhan Changjiang Bridge', 'Wuhan Changjiang Bridge', 'Wuhan Changjiang Bridge')
,(3, 'Hangzhou, Zhejiang Province','Hangzhou, Zhejiang Province', 'Hangzhou, Zhejiang Province', 'Hangzhou, Zhejiang Province', 'Hangzhou, Zhejiang Province')
, (4, 'ZheA666666, 18888888888', 'ZheA666666, 18888888888', 'ZheA666666, 18888888888', 'ZheA666666, 18888888888', 'ZheA666666, 18888888888')
, (5, 'User Values and Commercial Values of Products', 'User Values and Commercial Values of Products', 'User Values and Commercial Values of Products', 'User Values and Commercial Values of Products', 'User Values and Commercial Values of Products');
                

Usage notes

  • Full-text index functions support specific special characters but the escape character \\ must be used before the special characters. The following special characters are supported: + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /.

    For example, you want to search for content that contains Spring/Scenery. Incorrect syntax: MATCH(content) AGAINST('Spring / Scenery' ). Correct syntax: MATCH(content) AGAINST('Spring \\/ Scenery').

  • Full-text index functions do not support operators such as =, !=, BETWEEN, IS NULL, IS NOT NULL, and LIKE.

MATCH() AGAINST()

The MATCH() AGAINST() function supports word match and exact match. You can use this function to search for content that matches a keyword from one or more columns.

Syntax

SELECT * FROM `table_name` WHERE match(column_name[ , ... ]) against('term')

Parameters

  • table_name: the name of the table that you want to search.

  • column_name: the name of the column that you want to search. Separate multiple columns with commas (,).

  • term: the search keyword. Keyword search supports the following logical operators:

    • AND: searches for content that matches all keywords.

    • OR: searches for content that matches one of the keywords.

    • NOT: searches for content that matches only the keyword to the left of the logical operator.

    Note

    Logical operators are case-insensitive.

Example 1: Perform a single-column query

SELECT id, content
FROM `tbl_fulltext_demo`
WHERE MATCH(`content`) AGAINST('Products Services');

Sample result:

+------+--------------------------------------+
| id   | content                              |
+------+--------------------------------------+
| 5    |User Values and Commercial Values of Products              |
+------+--------------------------------------+
| 1    | Customers Need Better Products and Services             |
+------+--------------------------------------+

Example 2: Perform a multi-column query

If you want to search for content from multiple columns, you do not need to create a multi-column index. If each of the columns has a full-text index, you can perform full-text search on the columns.

SELECT id, content, content_alinlp
FROM `tbl_fulltext_demo`
WHERE MATCH(content, content_alinlp) AGAINST('Services');

or

SELECT id, content, content_alinlp
FROM `tbl_fulltext_demo`
WHERE MATCH(content) AGAINST('Services')
OR MATCH(content_alinlp) AGAINST('Services');
Note

The preceding SQL statements have the same effect.

Sample result:

+------+-----------------------------+------------------------------+
| id   | content                     | content_alinlp               |
+------+-----------------------------+------------------------------+
| 1    | Customers Need Better Products and Services    | Customers Need Better Products and Services     |
+------+-----------------------------+------------------------------+

Example 3: Perform a Boolean query

  • Use the AND logical operator to search for content that matches all keywords.

    SELECT * FROM `tbl_fulltext_demo` WHERE MATCH(content) AGAINST('Products AND Services');

    Sample result:

    +------+--------------------------------+--------------------------------+-----------------------------+--------------------------------------+--------------------------------------+
    | id   | content                        | content_alinlp                 | content_ik                  | content_standard                     | content_ngram                        |
    +------+--------------------------------+--------------------------------+-----------------------------+--------------------------------------+--------------------------------------+
    |    1 | Customers Need Better Products and Services         | Customers Need Better Products and Services         | Customers Need Better Products and Services       | Customers Need Better Products and Services               | Customers Need Better Products and Services               |
    +------+--------------------------------+--------------------------------+-----------------------------+--------------------------------------+--------------------------------------+
  • Use the OR logical operator to search for content that matches any of the keywords.

    SELECT * FROM `tbl_fulltext_demo` WHERE MATCH(content) AGAINST('Products OR Services');

    Sample result:

    +------+--------------------------------+--------------------------------+-----------------------------+--------------------------------------+--------------------------------------+
    | id   | content                        | content_alinlp                 | content_ik                  | content_standard                     | content_ngram                        |
    +------+--------------------------------+--------------------------------+-----------------------------+--------------------------------------+--------------------------------------+
    |    5 | User Values and Commercial Values of Products         | User Values and Commercial Values of Products         | User Values and Commercial Values of Products       | User Values and Commercial Values of Products                | User Values and Commercial Values of Products               |
    +------+--------------------------------+--------------------------------+-----------------------------+--------------------------------------+--------------------------------------+
    |    1 | Customers Need Better Products and Services         | Customers Need Better Products and Services         | Customers Need Better Products and Services       | Customers Need Better Products and Services               |  Customers Need Better Products and Services              |
    +------+--------------------------------+--------------------------------+-----------------------------+--------------------------------------+--------------------------------------+
  • Use the NOT logical operator to search for content that matches only the keyword to the left of the logical operator.

    SELECT * FROM `tbl_fulltext_demo` WHERE MATCH(content) AGAINST('Products NOT Services');

    Sample result:

    +------+--------------------------------+--------------------------------+-----------------------------+--------------------------------------+--------------------------------------+
    | id   | content                        | content_alinlp                 | content_ik                  | content_standard                     | content_ngram                        |
    +------+--------------------------------+--------------------------------+-----------------------------+--------------------------------------+--------------------------------------+
    |    5 | User Values and Commercial Values of Products         | User Values and Commercial Values of Products          | User Values and Commercial Values of Products      | User Values and Commercial Values of Products               | User Values and Commercial Values of Products               |
    +------+--------------------------------+--------------------------------+-----------------------------+--------------------------------------+--------------------------------------+
  • Use parentheses () to construct a complex Boolean query.

    SELECT id, content_alinlp
    FROM `tbl_fulltext_demo`
    WHERE MATCH(content_alinlp) AGAINST('(Wuhan OR Hangzhou) AND (Bridge OR Xihu)');

    Sample result:

    +------+-----------------------+
    | id   | content_alinlp        |
    +------+-----------------------+
    |    2 | Wuhan Changjiang Bridge         |
    +------+-----------------------+

Example 4: Perform a query and filter the result set

Full-text search returns all results that approximate a keyword. In scenarios that involve large amounts of data, the size of the result set obtained based on a keyword is large. If you want only the highly approximate results, you can use the result set filtering feature provided by AnalyticDB for MySQL.

The following sample statement filters out 90% of the results that have low approximation and returns only the remaining 10% of results:

SELECT id, content
FROM `tbl_fulltext_demo`
WHERE MATCH(content) AGAINST('Products Services') > 0.9;

Sample result:

+------+--------------------------------------+
| id   | content                              |
+------+--------------------------------------+
|    5 | User Values and Commercial Values of Products               |
+------+--------------------------------------+
|    1 | Customers Need Better Products and Services               |
+------+--------------------------------------+

Example 5: Perform a query and sort the results by approximation score

AnalyticDB for MySQL allows you to obtain the approximation scores of results and sort the results by approximation score.

  • Query the approximation scores.

    SELECT id, content, MATCH(content) AGAINST('Products Services') AS score
    FROM `tbl_fulltext_demo`
    WHERE MATCH(content) AGAINST('Products Services') > 0.9;
    Note

    WHERE MATCH(content) AGAINST('Products Services') > 0.9 specifies that 90% of the results that have low approximation are filtered out, and only the remaining 10% of results are returned.

    By default, the returned results are not sorted by approximation score.

    +------+--------------------------------------+----------------------+
    | id   | content                              | score                |
    +------+--------------------------------------+----------------------+
    |    5 | User Values and Commercial Values of Products               | 0.13076457381248474  |
    +------+--------------------------------------+----------------------+
    |    1 | Customers Need Better Products and Services               | 1.1090354919433594   |
    +------+--------------------------------------+----------------------+

    The MATCH(content) AGAINST('Wuhan') function in the SELECT projection is not necessarily the same as the MATCH(content) AGAINST('Products Services') function in the WHERE clause. The following sample statement queries the approximation score of the MATCH(content) AGAINST('Wuhan') function.

    SELECT *, MATCH(content) AGAINST('Wuhan') AS score
    FROM `tbl_fulltext_demo`
    WHERE MATCH(content) AGAINST('Products Services') > 0.9
    ORDER BY score DESC;

    Sample result:

    +------+--------------------------------+--------------------------------+-----------------------------+-------------------------------------+--------------------------------------+-------------+
    | id   | content                        | content_alinlp                 | content_ik                  | content_standard                    | content_ngram                        |score        |
    +------+--------------------------------+--------------------------------+-----------------------------+-------------------------------------+--------------------------------------+-------------+
    |    5 | User Values and Commercial Values of Products         | User Values and Commercial Values of Products         | User Values and Commercial Values of Products       | User Values and Commercial Values of Products              | User Values and Commercial Values of Products               |0.0          |
    +------+--------------------------------+--------------------------------+-----------------------------+-------------------------------------+--------------------------------------+-------------+
    |    1 | Customers Need Better Products and Services         | Customers Need Better Products and Services         | Customers Need Better Products and Services       |Customers Need Better Products and Services               |  Customers Need Better Products and Services              |0.0          |
    +------+--------------------------------+--------------------------------+-----------------------------+-------------------------------------+--------------------------------------+-------------+
  • Sort the results by approximation score.

    Use the ORDER BY clause to sort the results by approximation score in descending order.

    SELECT id, content, MATCH(content) AGAINST('Products Services') AS score
    FROM `tbl_fulltext_demo`
    WHERE MATCH(content) AGAINST('Products Services') > 0.9
    ORDER BY score DESC;

    Sample result:

    +------+--------------------------------------+---------------------+
    | id   | content                              | score               |
    +------+--------------------------------------+---------------------+
    |    1 | Customers Need Better Products and Services               | 1.1090354919433594  |
    +------+--------------------------------------+---------------------+
    |    5 | User Values and Commercial Values of Products               | 0.13076457381248474 |
    +------+--------------------------------------+---------------------+

Example 6: Perform word match

By default, full-text search segments a keyword into words before searching. If a keyword is not enclosed in double quotation marks (""), the query results that match any of the words contained in the keyword are returned.

The AliNLP analyzer segments the keyword "Products and Services" into the words "Products", "and", and "Services", and the keyword "Products Services" into the words "Products" and "Services".

SELECT id, content_alinlp, MATCH(content_alinlp) AGAINST('Products Services') AS score
FROM `tbl_fulltext_demo`
WHERE MATCH(content_alinlp) AGAINST('Products Services') > 0.9
ORDER BY score DESC;

The query results that match any of the words contained in the keyword "Products Services" are returned.

+------+--------------------------------------+---------------------+
| id   | content_alinlp                       | score               |
+------+--------------------------------------+---------------------+
|    1 | Customers Need Better Products and Services               | 0.5953410863876343  |
+------+--------------------------------------+---------------------+
|    5 | User Values and Commercial Values of Products               | 0.13076457381248474 |
+------+--------------------------------------+---------------------+

Example 7: Perform exact match

If a keyword is enclosed in double quotation marks (""), the keyword is not segmented by analyzers. The query results that exactly match the keyword are returned.

  • Search for content that exactly matches the keyword Products Services.

    SELECT id, content_alinlp, MATCH(content_alinlp) AGAINST('"Products Services"') AS score
    FROM `tbl_fulltext_demo`
    WHERE MATCH(content_alinlp) AGAINST('"Products Services"') > 0.9
    ORDER BY score DESC;

    An empty result set is returned.

    Empty set
  • Search for content that exactly matches the keyword Products and Services.

    SELECT id, content_alinlp, MATCH(content_alinlp) AGAINST('"Products and Services"') AS score
    FROM `tbl_fulltext_demo`
    WHERE MATCH(content_alinlp) AGAINST('"Products and Services"') > 0.9
    ORDER BY score DESC; 

    Sample result:

    +------+--------------------------------------+--------------------+
    | id   | content_alinlp                       | score              |
    +------+--------------------------------------+--------------------+
    |    1 | Customers Need Better Products and Services               | 0.8930116891860962 |
    +------+--------------------------------------+--------------------+

MATCH() FUZZY()

The MATCH() FUZZY() function supports fuzzy match. You can use this function to search for text based on the Levenshtein edit distance. In scenarios that involve incorrect keyword spelling, fuzzy match can help query content that is approximate to a keyword.

Syntax

SELECT * FROM `table_name` WHERE match(`column_name`) fuzzy('term') [max_edits(n)];

Parameters

  • table_name: the name of the table that you want to search.

  • column_name: the name of the column that you want to search.

  • term: the search keyword.

  • max_edits(n): the maximum edit distance, which is an optional parameter. This parameter specifies the minimum number of changes, such as insertion, deletion, and replacement, that are required to change String A to String B. Default value: 2. Valid values: integers from 0 to 2. For example, the windos string is changed to the windows string by performing a single operation of inserting the w character. In this case, the maximum edit distance between the windos and windows strings is 1.

Example 1: Perform approximate search

SELECT id, content
FROM tbl_fulltext_demo
WHERE MATCH(content_standard) FUZZY('hangzou');

Sample result:

+------+-----------------------------+
| id   | content                     |
+------+-----------------------------+
|    3 | Hangzhou, Zhejiang Province |
+------+-----------------------------+

Example 2: Perform approximate search with an edit distance of 1

SELECT id, content
FROM tbl_fulltext_demo
WHERE MATCH(content_standard) FUZZY('hangzou') max_edits(1);

Sample result:

+------+-----------------------------+
| id   | content                     |
+------+-----------------------------+
|    3 | Hangzhou, Zhejiang Province |
+------+-----------------------------+

Example 3: Perform approximate search with an edit distance of 2

SELECT id, content
FROM tbl_fulltext_demo
WHERE MATCH(content_standard) FUZZY('Wuhan Chang') max_edits(2);

Sample result:

+------+-----------------------+
| id   | content               |
+------+-----------------------+
|    2 | Wuhan Changjiang Bridge         |
+------+-----------------------+

MATCH() PHRASE()

The MATCH() PHRASE() function supports phrase search. You can use this function to search for content that matches multiple keywords from a specific column. If multiple keywords are matched, you can use the slop parameter to further match the sequence of keywords.

Syntax

SELECT * FROM `table_name` WHERE match(`column_name`) phrase('term1 term2') [slop(n)]

Parameters

  • table_name: the name of the table that you want to search.

  • column_name: the name of the column that you want to search.

  • term1 term2: the list of search keywords. Separate multiple keywords with spaces. The sequence of the keywords affects the match result.

  • slop(n): the maximum movement step, which is an optional parameter. After you use an analyzer to segment a text into a list of words, the positions of the words are marked from 0 in ascending order. You can move the words to match the list of keywords specified by the PHRASE parameter. Default value: 0. Valid values: integers from 0 to 6.

Example: Search for phrases

By default, AnalyticDB for MySQL provides the standard analyzer to segment the text "Hangzhou, Zhejiang Province" into the words "Hangzhou", "Zhejiang", and "Province".

  • Search for content that matches the list of keywords zhejiang hangzhou.

    SELECT id, content
    FROM tbl_fulltext_demo
    WHERE MATCH(content_standard) PHRASE('zhejiang hangzhou');

    The default value of the slop parameter is 0. No content matches the list of keywords zhejiang hangzhou. An empty result set is returned.

    Empty set
  • Search for content that matches the list of keywords hangzhou Province after moving one step.

    SELECT id, content
    FROM tbl_fulltext_demo
    WHERE MATCH(content_standard) PHRASE('hangzhou Province') slop(1);

    Sample result:

    +------+-----------------------------+
    | id   | content                     |
    +------+-----------------------------+
    |    3 | Hangzhou, Zhejiang Province |
    +------+-----------------------------+
  • Search for content that matches the list of keywords zhejiang hangzhou after moving two steps.

    SELECT id, content
    FROM tbl_fulltext_demo
    WHERE MATCH(content_standard) PHRASE('zhejiang hangzhou') slop(2);

    Sample result:

    +------+-----------------------------+
    | id   | content                     |
    +------+-----------------------------+
    |    3 | Hangzhou, Zhejiang Province |
    +------+-----------------------------+

Highlight support

Use a function to highlight keywords

AnalyticDB for MySQL allows you to use the fulltext_highlight(`column_name`) function to highlight keywords in full-text index columns.

Example 1: Use the MATCH() AGAINST() function to specify the keywords that you want to highlight

  • Highlight keywords when you perform a single-column query.

    SELECT MATCH(content_alinlp) AGAINST('Wuhan Changjiang') AS score, fulltext_highlight(content_alinlp)
    FROM tbl_fulltext_demo
    WHERE MATCH(content_alinlp) AGAINST('Wuhan Changjiang') > 0.9
    ORDER BY score DESC LIMIT 3;

    Sample result:

    +--------------------+-----------------------------------------+
    | score              | fulltext_highlight(content_alinlp)      |
    +--------------------+-----------------------------------------+
    | 0.2615291476249695 | <em>Wuhan</em> <em>Changjiang</em> Bridge         |
    +--------------------+-----------------------------------------+
  • Highlight keywords when you perform a multi-column query.

    SELECT MATCH(content_alinlp) AGAINST('Wuhan Changjiang') AS score, fulltext_highlight(content_alinlp)
    FROM tbl_fulltext_demo
    WHERE MATCH(content_alinlp) AGAINST('Wuhan Changjiang') > 0.9
    AND MATCH(content_alinlp) AGAINST('Bridge') > 0.9
    ORDER BY score DESC LIMIT 3;

    Sample result:

    +--------------------+--------------------------------------------------+
    | score              | fulltext_highlight(content_alinlp)               |
    +--------------------+--------------------------------------------------+
    | 0.2615291476249695 | <em>Wuhan</em> <em>Changjiang</em> <em>Bridge</em>         |
    +--------------------+--------------------------------------------------+

Example 2: Use the MATCH() FUZZY() function to specify the keywords that you want to highlight

SELECT id, MATCH(content_standard) FUZZY('Wuhan Chang') as score, fulltext_highlight(content_standard)
FROM tbl_fulltext_demo
WHERE MATCH(content_standard) FUZZY('Wuhan Chang');

Sample result:

+------+-------+--------------------------------------------------+
| id   | score | fulltext_highlight(content_standard)             |
+------+-------+--------------------------------------------------+
|    2 |   0.0 | <em>Wuhan</em> <em>Chang</em>jiang Bridge         |
+------+-------+--------------------------------------------------+

Example 3: Use the MATCH() PHRASE() function to specify the keywords that you want to highlight

SELECT id, MATCH(content_ik) PHRASE('Products Services') slop(1) as score, fulltext_highlight(content_ik)
FROM tbl_fulltext_demo
WHERE MATCH(content_ik) PHRASE('Products Services') slop(1);

Sample result:

+------+--------------------+--------------------------------------------------------+
| id   | score              | fulltext_highlight(content_ik)                         |
+------+--------------------+--------------------------------------------------------+
|    1 | 0.6931471824645996 | Customers Need Better <em>Products</em> and <em>Services</em>               |
+------+--------------------+--------------------------------------------------------+

Use a hint to add custom highlight tags

By default, AnalyticDB for MySQL uses the <em> and </em> tags to highlight keywords in full-text search results. You can also use a hint to specify the fulltext_highlight_pre_tag and fulltext_highlight_post_tag parameters to add custom highlight tags.

Example 1: Use a hint to add custom highlight tags based on the MATCH() AGAINST() function

/*+ fulltext_highlight_pre_tag=<span>,fulltext_highlight_post_tag=</span>*/
SELECT MATCH(content_alinlp) AGAINST('Wuhan Changjiang') AS score, fulltext_highlight(content_alinlp)
FROM tbl_fulltext_demo
WHERE MATCH(content_alinlp) AGAINST('Wuhan Changjiang') > 0.9
ORDER BY score DESC LIMIT 3;

Sample result:

+--------------------+-------------------------------------------------+
| score              | fulltext_highlight(content_alinlp)              |
+--------------------+-------------------------------------------------+
| 0.2615291476249695 | <span>Wuhan</span> <span>Changjiang</span> Bridge         |
+--------------------+-------------------------------------------------+

Example 2: Use a hint to add custom highlight tags based on the MATCH() FUZZY() function

/*+ fulltext_highlight_pre_tag=<span>,fulltext_highlight_post_tag=</span>*/
SELECT MATCH(content_alinlp) FUZZY('Wuhan Changjiang') AS score, fulltext_highlight(content_alinlp)
FROM tbl_fulltext_demo
WHERE MATCH(content_alinlp) FUZZY('Wuhan Changjiang') > 0.9
ORDER BY score DESC LIMIT 3;

Sample result:

+--------------------+-------------------------------------------------+
| score              | fulltext_highlight(content_alinlp)              |
+--------------------+-------------------------------------------------+
| 0.0                | <span>Wuhan</span> <span>Changjiang</span> Bridge         |
+--------------------+-------------------------------------------------+

Example 3: Use a hint to add custom highlight tags based on the MATCH() PHRASE() function

/*+ fulltext_highlight_pre_tag=<span>,fulltext_highlight_post_tag=</span>*/
SELECT MATCH(content_alinlp) PHRASE('Products and Services') AS score, fulltext_highlight(content_alinlp)
FROM tbl_fulltext_demo
WHERE MATCH(content_alinlp) PHRASE('Products and Services') > 0.9
ORDER BY score DESC LIMIT 3;

Sample result:

+--------------------+---------------------------------------------------------------+
| score              | fulltext_highlight(content_alinlp)                            |
+--------------------+---------------------------------------------------------------+
| 0.8930116891860962 | Customers Need Better <span>Products</span> <span>and</span> <span>Services</span> |
+--------------------+---------------------------------------------------------------+