This topic provides answers to some frequently asked questions about writes and queries in AnalyticDB for MySQL.
If no edition is specified in a question, the question applies only to AnalyticDB for MySQL Data Warehouse Edition clusters.
FAQ overview
How do I choose between XIHE MPP and XIHE BSP to run a job in a Data Lakehouse Edition cluster?
How do I check the status of a XIHE BSP job in a Data Lakehouse Edition cluster?
How do I isolate resources to reduce mutual impact among SQL jobs?
What do I do if a large number of conditions are specified for the IN operator?
What do I do if the "Query exceeded maximum time limit of 1800000.00ms" error is returned?
What do I do if an error occurs on the built-in AES_ENCRYPT() function?
Can I use .csv.gz files from OSS as external table data sources in AnalyticDB for MySQL?
Is the INSERT ON DUPLICATE KEY UPDATE statement supported in AnalyticDB for MySQL?
Can I use JOIN in UPDATE statements in AnalyticDB for MySQL?
What are the conditions for an AnalyticDB for MySQL cluster to load a built-in dataset?
How do I check whether a built-in dataset is loaded to an AnalyticDB for MySQL cluster?
How do I use a standard database account to use the built-in dataset feature?
How do I test an AnalyticDB for MySQL cluster after I load a built-in dataset?
Can I use JDBC to query the data of Hudi tables in a Data Lakehouse Edition cluster?
Yes. After you create a Hudi table in a Data Lakehouse Edition cluster, you can use JDBC to query the data of the table.
Can I use a Data Lakehouse Edition cluster to read the data of Hudi tables that are stored in OSS?
Yes, you can use external tables to read the data of Hudi tables that are stored in OSS. For more information, see Use external tables to import data to Data Lakehouse Edition.
Can a Data Lakehouse Edition cluster automatically switch between XIHE MPP and XIHE BSP jobs?
No. When you submit a job, you must specify an interactive resource group or a job resource group, which determines whether the job runs as a XIHE MPP or XIHE BSP job.
How do I choose between XIHE MPP and XIHE BSP to run a job in a Data Lakehouse Edition cluster?
By default, XIHE BSP asynchronously submits jobs. The only difference between synchronous submission and asynchronous submission is whether the client needs to wait for the query execution to complete.
The following limits are imposed on asynchronous submissions:
A result set can contain up to 10,000 rows of data.
The system can retain up to 1,000 result sets, including the download URL of CSV files, for up to 30 days.
We recommend that you asynchronously submit BSP jobs for queries that consume large amounts of computing resources and require a long period of time to complete but return small result sets, such as INSERT INTO SELECT, INSERT OVERWRITE SELECT, and CREATE TABLE AS SELECT statements.
How do I check the status of a XIHE BSP job in a Data Lakehouse Edition cluster?
If you submit a XIHE BSP job in a Data Lakehouse Edition cluster by using the job editor, you can choose Job Development > SQL Development and click the Execution Records tab to view the status of the job.
If you submit a XIHE BSP job without using the job editor, you can execute the following statement to query the status of the job from the information_schema tables:
SELECT status FROM information_schema.kepler_meta_elastic_job_list WHERE process_id='<job_id>';
You can execute the following statement to query status statistics about XIHE BSP jobs:
SELECT status,count(*) FROM information_schema.kepler_meta_elastic_job_list GROUP BY status;
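The status query above can be wrapped in a small client-side polling loop. The following Python sketch is illustrative only: the terminal status names (FINISHED, FAILED, CANCELLED) and the run_query callable are assumptions, not part of the AnalyticDB for MySQL API.

```python
import time

# Status query from this FAQ; process_id identifies the XIHE BSP job.
STATUS_SQL = (
    "SELECT status FROM information_schema.kepler_meta_elastic_job_list "
    "WHERE process_id='{job_id}'"
)

def wait_for_bsp_job(job_id, run_query, interval=5, timeout=3600):
    """Poll the job status until it reaches a terminal state.

    run_query is any callable that executes a SQL string against the
    cluster and returns the status value. The terminal status names
    below are assumptions used for illustration.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = run_query(STATUS_SQL.format(job_id=job_id))
        if status in ("FINISHED", "FAILED", "CANCELLED"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

In practice, run_query would execute the statement over a MySQL-compatible connection and return the first column of the first row.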
How do I isolate resources to reduce mutual impact among SQL jobs?
AnalyticDB for MySQL Data Warehouse Edition clusters in elastic mode for Cluster Edition and Data Lakehouse Edition clusters provide the resource group feature. For more information, see Resource group overview (Data Warehouse Edition) and Resource group overview (Data Lakehouse Edition). You can create different types of resource groups and submit SQL jobs to the corresponding resource groups to isolate resources.
What do I do if a large number of conditions are specified for the IN operator?
AnalyticDB for MySQL imposes a limit on the number of conditions that can be specified for the IN operator. The default limit is 2,000. You can modify the limit based on your business requirements.
To ensure query performance, we recommend that you do not specify more than 5,000 conditions for the IN operator.
For example, execute the following statement to set the maximum number of conditions for the IN operator to 3,000:
SET ADB_CONFIG max_in_items_count=3000;
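As an alternative to raising the limit, you can split a long IN list into several smaller queries on the client side. The following Python sketch is a hypothetical helper, not an AnalyticDB for MySQL feature; it renders values as string literals for illustration only, and real code should use parameter binding.

```python
def chunked_in_queries(base_sql, column, values, max_items=2000):
    """Split a long IN list into multiple queries, each with at most
    max_items values (2,000 is the default max_in_items_count).

    base_sql is a template such as "SELECT * FROM t WHERE {cond}".
    """
    queries = []
    for i in range(0, len(values), max_items):
        chunk = values[i:i + max_items]
        cond = "{} IN ({})".format(
            column, ", ".join("'{}'".format(v) for v in chunk)
        )
        queries.append(base_sql.format(cond=cond))
    return queries
```

The client then executes the returned queries separately and merges the result sets.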
What do I do if the "Query exceeded maximum time limit of 1800000.00ms" error is returned?
AnalyticDB for MySQL allows you to configure timeout periods for SQL queries. If the execution of a query does not end after 1,800,000 milliseconds, the preceding error is returned. You can execute the following statements to configure a timeout period for a single query or all queries in a cluster:
For a single query:
/*+ QUERY_TIMEOUT=xxx */SELECT count(*) FROM t;
For all queries in a cluster:
SET ADB_CONFIG QUERY_TIMEOUT=xxx;
For more information, see Config and hint configuration parameters.
What do I do if the "multi-statement be found" error is returned when I use the multi-statement feature to consecutively execute multiple SQL statements?
Only AnalyticDB for MySQL clusters of V3.1.9.3 or later support the multi-statement feature. Check whether the minor version of the cluster meets this requirement. If the minor version is earlier than 3.1.9.3, contact technical support to update the version. If the minor version meets the requirement but the error persists, the multi-statement feature may be disabled on the client.
When you use a MySQL JDBC client to connect to an AnalyticDB for MySQL cluster, you must execute the SET ADB_CONFIG ALLOW_MULTI_QUERIES=true; statement to enable the multi-statement feature and set the allowMultiQueries property to true on the client.
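For the MySQL JDBC client (Connector/J), the allowMultiQueries property is typically set in the connection URL. A minimal sketch, with a hypothetical endpoint and database name:

```
jdbc:mysql://am-xxxx.ads.aliyuncs.com:3306/your_db?allowMultiQueries=true
```

Replace the endpoint and database name with the values of your own cluster.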
What do I do if query results contain truncated time?
Check whether the time that is contained in query results is displayed as expected on the MySQL client. If the time is displayed as expected, check whether the query results are processed by other clients.
What do I do if an error occurs on the built-in AES_ENCRYPT() function?
An error may occur when you execute the following statement:
SELECT CONVERT(AES_DECRYPT(AES_ENCRYPT('ABC123','key_string'),'key_string'),char(10));
In this case, you must cast the x parameter of the AES_ENCRYPT(varbinary x, varchar y) function to the VARBINARY type. Sample statement:
SELECT CONVERT(AES_DECRYPT(AES_ENCRYPT(CAST('ABC123' AS VARBINARY), 'key_string'), 'key_string'),char(10));
Why do query results suddenly change?
If no data is updated, a sudden change in query results may occur due to the following reasons:
No ORDER BY clause is used together with the LIMIT clause. AnalyticDB for MySQL uses a distributed architecture. If a query uses multiple threads, the query ends when the number of returned rows reaches the value that is specified by the LIMIT clause. Therefore, if no ORDER BY clause is used together with the LIMIT clause, query results are returned in a random order.
In a SELECT statement that uses a GROUP BY clause and aggregate functions, if the GROUP BY clause and aggregate functions do not contain a specific field, a random value is returned for the field.
If this issue persists, contact technical support.
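The first cause above can be sketched in Python. This toy simulation is not AnalyticDB for MySQL code; it only illustrates why a LIMIT without an ORDER BY clause can return different rows across runs in a distributed system.

```python
import random

def distributed_limit(shards, n, order_key=None):
    """Simulate a distributed LIMIT: shards return rows in an
    arbitrary interleaving, and the query stops after n rows.

    With order_key set (the ORDER BY case), the merged rows are
    sorted first, so the same n rows come back on every run.
    """
    rows = [row for shard in shards for row in shard]
    if order_key is None:
        random.shuffle(rows)       # arbitrary arrival order
    else:
        rows.sort(key=order_key)   # ORDER BY makes the result stable
    return rows[:n]
```

Adding an ORDER BY clause on a unique key is the standard way to make a LIMIT query deterministic.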
Why does a query that involves sorting by using the ORDER BY clause on a single table consume an extended period of time?
The data is stored in a scattered manner and is not sorted at the storage layer. As a result, a significant amount of unnecessary data is read during the query, which increases the query duration.
We recommend that you create a clustered index on the field that is used in the ORDER BY clause. After you create a clustered index, data is initially sorted at the storage layer. When you perform a query that uses the ORDER BY clause, only a small amount of data needs to be read. This improves query performance. For information about how to create a clustered index, see the "Create a clustered index" section of the ALTER TABLE topic.
You can create only one clustered index for a table. Before you create a clustered index on the field that is used in the ORDER BY clause for a table that already has a clustered index on a different field, you must first delete the existing clustered index.
After you create a clustered index for a large table, the amount of time required to perform BUILD jobs increases, which affects the CPU utilization of storage nodes.
Why does the number of scanned rows for all tables in an execution plan differ from the total number of rows that are scanned by the query?
In most cases, this issue is caused by replicated tables. AnalyticDB for MySQL stores replicated tables on each shard and repeatedly counts the number of rows when the replicated tables are scanned.
Why is data duplicated when the INSERT OVERWRITE statement is used on an AnalyticDB for MySQL table that does not have a primary key?
Automatic deduplication is not supported for AnalyticDB for MySQL tables that do not have primary keys.
Why does the "Column 'XXX' not in GROUP BY clause" error occur when the SELECT * FROM TABLE GROUP BY KEY statement is executed?
When you use a GROUP BY clause, the query cannot return all fields of the table. In the SELECT list, you must specify the grouped columns or wrap other columns in aggregate functions instead of using an asterisk (*). Sample statement:
SELECT nation.name FROM nation GROUP BY nation.nationkey
Is a limit imposed on the number of values that are specified for the IN operator when query results are returned in the JSON format?
For AnalyticDB for MySQL clusters of V3.1.4 or earlier, the number of values that are specified for the IN operator cannot exceed 16. For AnalyticDB for MySQL clusters later than V3.1.4, no limit is imposed on the number of values that are specified for the IN operator. For information about how to query the minor version of a cluster, see the "How do I query the minor version of an AnalyticDB for MySQL cluster?" section of the Purchase, configuration change, and upgrade topic.
Can I use .csv.gz files from OSS as external table data sources in AnalyticDB for MySQL?
Yes, you can use .csv.gz files as external table data sources in AnalyticDB for MySQL. You must set the compress_type parameter to gzip for external tables. For information about the syntax of OSS external tables, see the "Non-partitioned OSS external tables" section of the Use external tables to import data to Data Warehouse Edition topic.
Is the INSERT ON DUPLICATE KEY UPDATE statement supported in AnalyticDB for MySQL?
Yes, the INSERT ON DUPLICATE KEY UPDATE statement is supported in AnalyticDB for MySQL. However, you can assign only values, not expressions, to columns in the ON DUPLICATE KEY UPDATE clause.
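As a sketch of this restriction (the table and column names below are hypothetical): assigning a literal value in the ON DUPLICATE KEY UPDATE clause is supported, whereas assigning an expression that references a column is not.

```
-- Supported: assign a literal value.
INSERT INTO t (id, cnt) VALUES (1, 10)
ON DUPLICATE KEY UPDATE cnt = 10;

-- Not supported in AnalyticDB for MySQL: assign an expression.
INSERT INTO t (id, cnt) VALUES (1, 10)
ON DUPLICATE KEY UPDATE cnt = cnt + 1;
```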
Can I use JOIN in UPDATE statements in AnalyticDB for MySQL?
You can use JOIN in UPDATE statements only for AnalyticDB for MySQL clusters of V3.1.6.4 or later. For more information, see UPDATE.
Can I set SQL variables in AnalyticDB for MySQL?
No, you cannot set SQL variables in AnalyticDB for MySQL.
Can I execute the INSERT ON DUPLICATE KEY UPDATE statement to batch insert data by using the Logstash plug-in?
Yes. When you execute the INSERT ON DUPLICATE KEY UPDATE statement to batch insert data, you must add the ON DUPLICATE KEY UPDATE clause after the last VALUES() clause.
For example, execute the following statement to insert three rows of data into the student_course table:
INSERT INTO student_course(`id`, `user_id`, `nc_id`, `nc_user_id`, `nc_commodity_id`, `course_no`, `course_name`, `business_id`)
VALUES(277943, 11056941, '1001EE1000000043G2T5', '1001EE1000000043G2TO', '1001A5100000003YABO2', 'kckm303', 'Industrial Accounting Practice V9.0--77', 'kuaiji'),
(277944, 11056943, '1001EE1000000043G2T5', '1001EE1000000043G2TO', '1001A5100000003YABO2', 'kckm303', 'Industrial Accounting Practice V9.0--88', 'kuaiji'),
(277945, 11056944, '1001EE1000000043G2T5', '1001EE1000000043G2TO', '1001A5100000003YABO2', 'kckm303', 'Industrial Accounting Practice V9.0--99', 'kuaiji')
ON DUPLICATE KEY UPDATE
course_name = 'Industrial Accounting Practice V9.0--77',
business_id = 'kuaiji';
What are the conditions for an AnalyticDB for MySQL cluster to load a built-in dataset?
To load a built-in dataset, an AnalyticDB for MySQL cluster must have at least 24 AnalyticDB compute units (ACUs) of reserved storage resources and at least 16 ACUs of reserved computing resources in the user_default resource group.
How do I check whether a built-in dataset is loaded to an AnalyticDB for MySQL cluster?
In the left-side navigation pane of an AnalyticDB for MySQL cluster, choose Job Development > SQL Development. On the SQL Development page, you can view the loading progress. If the Load Built-in Dataset button is dimmed and the Databases and Tables tab displays the ADB_SampleData_TPCH database and the tables of the database, the built-in dataset is loaded.
What do I do if the system returns an error message that indicates loading failure or prolonged loading?
Execute the DROP TABLE table_name; statement to delete all tables from the ADB_SampleData_TPCH database. Then, execute the DROP DATABASE ADB_SampleData_TPCH; statement to delete the database. After the database is deleted, reload the built-in dataset.
How do I use a standard database account to use the built-in dataset feature?
The built-in dataset feature complies with the permission management rules of AnalyticDB for MySQL. Even if your cluster is loaded with a built-in dataset, standard database accounts do not have permissions on the ADB_SampleData_TPCH database. You must use a privileged account to grant the permissions to standard accounts by using the following statement:
GRANT select ON ADB_SampleData_TPCH.* TO <user_name>;
How do I test an AnalyticDB for MySQL cluster after I load a built-in dataset?
AnalyticDB for MySQL provides corresponding query scripts that you can use to test a cluster. After a built-in dataset is loaded, you can choose Job Development > SQL Development, click the Scripts tab, and then execute SQL statements. For more information about the SQL statements, see TPC-H test queries.
To ensure the integrity of the built-in dataset, we recommend that you execute only SELECT statements on the ADB_SampleData_TPCH database. If you execute other DML or DDL statements, data in the built-in dataset may be changed, and the dataset may fail to be loaded. In this case, you must delete the ADB_SampleData_TPCH database and reload the dataset.