ApsaraDB for ClickHouse:Release details of ApsaraDB for ClickHouse Enterprise Edition V24.2.2.16399

Last Updated: Aug 30, 2024

This topic describes the features released in ApsaraDB for ClickHouse Enterprise Edition V24.2.2.16399.

New features

  • The system.dns_cache table is supported for debugging Domain Name System (DNS) issues.
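
    For example, you can inspect the cached entries as follows:

    SELECT * FROM system.dns_cache;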

  • The mergeTreeIndex table function is supported.

    • This function represents the content of indexes and marker files of MergeTree tables. It can be used for introspection.

    • Syntax:

      Note

      In the following syntax, database and table specify an existing table that uses the MergeTree engine.

      mergeTreeIndex(database, table, [with_marks = true])
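
      Example (an existing MergeTree table named test in the current database is assumed):

      SELECT * FROM mergeTreeIndex(currentDatabase(), test, with_marks = true);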
  • The generate_series table function is supported. This function can be used to generate a table with an arithmetic progression of natural numbers.
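
    Example:

    SELECT * FROM generate_series(1, 10, 2); -- generates 1, 3, 5, 7, 9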

  • The following four properties are added to StorageMemory (memory engine). For a usage sketch, see the statement after the following list.

    Note

    These properties bound the amount of data that a Memory table keeps. When an upper bound is exceeded, the oldest inserted blocks are removed.

    • min_bytes_to_keep

    • max_bytes_to_keep

    • min_rows_to_keep

    • max_rows_to_keep
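
    The following statement is a minimal sketch of how these settings can be applied (a hypothetical table named mem_events is assumed):

    CREATE TABLE mem_events (id UInt64, msg String)
    ENGINE = Memory
    SETTINGS min_bytes_to_keep = 4096, max_bytes_to_keep = 16384;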

  • The single-argument version of the merge table function is supported. The syntax is merge([<db_name>, ] <tables_regexp>).
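
    Example (queries all tables in the current database whose names match the regular expression):

    SELECT * FROM merge('^events_');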

  • The groupArrayIntersect aggregate function is supported.
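
    Example:

    SELECT groupArrayIntersect(arr) AS result
    FROM (SELECT [1, 2, 3] AS arr UNION ALL SELECT [2, 3, 4]); -- result contains 2 and 3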

  • The toMillisecond function is supported.

    Note

    This function returns the millisecond component (0 to 999) of a value of the DateTime or DateTime64 type.
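
    Example:

    SELECT toMillisecond(toDateTime64('2024-03-10 12:00:00.456', 3)); -- returns 456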

  • You can use an SQL statement to remove ZooKeeper nodes related to an empty partition.

    Syntax:

    ALTER TABLE <table_name> FORGET PARTITION <partition>;

    The following table describes the parameter in the preceding statement.

    Parameter      Description               Example
    table_name     The name of the table.    test
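
    Example (a table named test with an empty partition '20240101' is assumed):

    ALTER TABLE test FORGET PARTITION '20240101';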

  • The DEFINER=<userName> syntax is provided for you to specify the view definer in a view or materialized view. The view definer has permissions to perform query and insert operations on the view without explicit authorization on the underlying table.
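
    Example (a hypothetical user user1 and an underlying table t are assumed):

    CREATE VIEW v DEFINER = user1 SQL SECURITY DEFINER AS SELECT * FROM t;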

  • The topK and topKWeighted aggregate functions support a mode that returns the count of values and its error.

  • In the file, s3, hdfs, url, and azureBlobStorage engines, unknown file formats are automatically detected during schema inference.

  • The default compression algorithm changes from LZ4HC(3) to LZ4HC(2). This improves query efficiency at the cost of a slightly lower compression ratio.

  • File name extensions are not case-sensitive. For example, Tsv, TSV, and tsv indicate the same file type.

  • The ATTACH PARTITION ALL statement is supported.
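
    For example, the following statement attaches all detached partitions of a table (a hypothetical table named test is assumed):

    ALTER TABLE test ATTACH PARTITION ALL;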

Management improvement

  • The ApsaraDB for ClickHouse server and ApsaraDB for ClickHouse Keeper dynamically adjust the memory soft limit based on the compute capacity units (CCUs). This removes the limitation of the small maximum memory supported by Keeper and prevents out-of-memory (OOM) issues to some extent.

  • The uncompressed cache is disabled.

  • The setting logic of the mark cache size is optimized to reduce the memory usage in some cases for elastic scaling.

  • The size of FileSystemCache can be dynamically adjusted based on the CCUs. In previous versions, the size was fixed to 100 GB.

  • More aggressive serverless policies are supported. This feature is in invitational preview.

  • By default, text_log is enabled to help you troubleshoot issues.

  • The system responsiveness to changes in memory usage is improved.

  • Packed files are sorted to improve the cache hit rate of downloaded content, which reduces network traffic to Object Storage Service (OSS) and the queries-per-second (QPS) amplification factor.

Improvement

  • Parallel and distributed processing are supported for the S3Queue table engine.

    Note
    • You can set s3queue_total_shards_num to implement distributed processing. Default value: 1.

    • Parallel processing within a shard is supported in the ordered processing mode when you set s3queue_processing_threads_num.

    • The s3queue_processing_threads_num (number of processing threads per shard) and s3queue_total_shards_num settings change the metadata storage method in ordered mode. The number of max_processed_file nodes equals the value of s3queue_processing_threads_num × s3queue_total_shards_num. Therefore, these settings must be the same across all shards and cannot be changed after one shard is created.

  • When MODIFY COLUMN queries are run for materialized views, the system checks that every column of the internal table exists.

  • The system.keywords table is supported. This table contains all keywords from the parsers to help you better perform fuzzy testing and syntax highlighting.

  • Parameterized views with analyzers are supported. The existing logic of parameterized views is refactored, and the creation of parameterized views is no longer analyzed.

  • Ordinary databases cannot be created. Existing Ordinary databases can still be used.

  • When a table is deleted, all zero-copy locks related to the table and the directory that contains these locks are also deleted.

  • The short-circuit feature is added to the dictGetOrDefault function.

  • Enumerations are allowed to be declared in the structure of an external table.

  • When you perform the ALTER COLUMN MATERIALIZE operation on a column with the DEFAULT or MATERIALIZED expression, the system writes the correct values.

    Note
    • For existing parts with default values, the default values are written. For existing parts with non-default values, the non-default values are written.

    • In earlier versions, default values are written for all existing parts.

  • The backoff logic, such as exponential backoff, is enabled, which reduces CPU utilization, memory usage, and the size of log files.

  • Lightweight deleted rows are considered during data merging.

    Note

    Lightweight deleted rows refer to data that is not actually removed from storage but instead marked as deleted. For more information about lightweight deleted rows, see The Lightweight DELETE Statement.

  • The Date32 type is supported in the T64 codec.

  • HTTP and HTTPS connections are reused in all scenarios, even when a 3xx or 4xx status code is returned.

  • You can add comments for all columns of system tables.

  • Virtual columns can be used in the PREWHERE clause.

    Note

    This optimization is worthwhile for non-constant virtual columns such as _part_offset.

  • Keeper improvement: You can set latest_logs_cache_size_threshold and commit_logs_cache_size_threshold to cache a certain number of logs in memory.

  • OSS generates keys to determine the capability of object removal, instead of using constant keys.

  • By default, floating-point numbers in exponential notation are not inferred.

  • Parentheses () can be used to enclose the ALTER operations.

    Note
    • Earlier versions do not support the new syntax. Therefore, using the new syntax may cause issues if both later and earlier ApsaraDB for ClickHouse versions are used in a single cluster.

    • By default, parentheses () are contained in formatted queries because the formatted ALTER operations are stored in some places, such as mutations, as metadata.

    • The new syntax supports some queries where ALTER operations end in a list. For example, ALTER TABLE x MODIFY TTL date GROUP BY a, b, DROP COLUMN c cannot be parsed properly by using the old syntax. ALTER TABLE x (MODIFY TTL date GROUP BY a, b), (DROP COLUMN c) can be executed as expected by using the new syntax.

  • The upgrade of the Intel Query Processing Library (QPL) improves the DEFLATE_QPL codec and fixes an error in the polling timeout mechanism that could cause concurrent processing issues.

  • The positional pread feature is supported in libhdfs3.

    Note

    If you want to call positional read in libhdfs3, use the hdfsPread function in hdfs.h. Example:

    tSize hdfsPread(hdfsFS fs, hdfsFile file, void * buffer, tSize length, tOffset position);
  • Stack overflow in parsers is checked even if you accidentally set max_parser_depth to a very high value.

    Note
    • For earlier versions, the server may crash if you set max_parser_depth to a very high value.

    • The default value of max_parser_depth is 1000. For more information, see the max_parser_depth section of the Core Settings topic.

  • The behavior of named collections created by using XML files and SQL statements in Kafka storage is unified.

    Note
    • Kafka storage has two types of parameters: storage parameters and librdkafka parameters.

    • Before optimization, the storage parameters are used in both cases. However, only named collections created by using XML files are loaded.

    • This optimization unifies the access method for named collections in Kafka storage and enables the use of new named collections created by using XML files without requiring a server restart.

  • If a universally unique identifier (UUID) is explicitly specified in the CREATE TABLE statement, the UUID can be used in replica_path.

    Example:

    CREATE TABLE x UUID 'aaaaaaaa-1111-2222-3333-aaaaaaaaaaaa' (key Int) ENGINE = ReplicatedMergeTree('/tables/{database}/{uuid}', 'r1') ORDER BY tuple();
  • The metadata_version column of ReplicatedMergeTree tables is added to the system.tables system table.

  • Keeper improvement: A retry mechanism is added for failed disk-related operations.

  • If StorageBuffer has multiple shards (the value of num_layers is greater than 1), background flush will happen simultaneously for all shards in multiple threads.

  • The short-circuit feature is added to the ULIDStringToDateTime function.

  • The performance of ApsaraDB for ClickHouse is optimized when no transactions are active. In this case, ApsaraDB for ClickHouse no longer reports the INVALID_TRANSACTION exception and does not throw any exceptions, which is similar to MySQL.

  • The none_only_active mode is supported for distributed_ddl_output_mode.

  • If you use the MySQL port to connect to ApsaraDB for ClickHouse, the value of prefer_column_name_to_alias is set to 1 by default. mysql_map_string_to_text_in_show_columns and mysql_map_fixed_string_to_text_in_show_columns are also enabled by default. This increases compatibility with Business Intelligence (BI) tools.

  • The substring function has a new alias byteSlice.
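
    Example:

    SELECT byteSlice('hello', 2, 3); -- equivalent to substring('hello', 2, 3), returns 'ell'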

  • SHOW INDEX | INDEXES | INDICES | KEYS no longer sorts results by the primary key columns.

  • Keeper improvement: The startup process is aborted if invalid snapshots are detected during service startup. This can prevent data loss.

  • String types and enumerations can be used in the same context, such as arrays, UNION queries, and conditional expressions.

  • A flag is added for sort-merge join (SMJ) to treat null values as the maximum or minimum value. This makes ApsaraDB for ClickHouse compatible with other SQL systems, such as Apache Spark.

  • parallel_replicas_allow_in_with_subquery = 1 is supported, which allows IN subqueries to work with parallel replicas.

  • DNSResolver shuffles the set of resolved IP addresses.

  • By default, processor profiling is enabled to track the time spent on sorting and aggregation operations as well as the input and output bytes.

  • The toUInt128OrZero function is supported. The compatibility aliases FROM_UNIXTIME and DATE_FORMAT are not case-sensitive.

  • Access checks are improved to allow you to revoke permissions that are not granted to a user.

    Example:

    GRANT SELECT ON *.* TO user1;
    REVOKE SELECT ON system.* FROM user1;
  • The break statement is removed to ensure that the first filtered column has the minimum length.

  • The compatibility between the has() function and the Nullable columns is fixed.

  • The VMMaxMapCount and VMNumMaps asynchronous metrics are added for virtual memory mappings.

  • The temporary_files_codec setting is used in all cases when you create temporary data, such as external memory sorting and external memory GROUP BY. Previously, the setting was only used in the partial_merge JOIN algorithm.

  • The sharded mode of the Amazon Simple Storage Service (Amazon S3) queues is not allowed because this configuration will be rewritten.

  • Some duplicate entries in blob_storage_log are deleted.

  • The current_user function is supported and used as a compatibility alias for MySQL.

Fixed issues

  • The following issue is fixed: Queries are unexpectedly terminated due to memory statistics errors.

  • The following issue is fixed: An exception is reported when you execute the first DDL statement after your cluster is restarted.

  • The error that occurs when the intDiv function processes parameters of the decimal type is fixed.

  • The Kusto Query Language (KQL) issues found by the WINGFUZZ platform are fixed.

  • The Read beyond last offset error of AsynchronousBoundedReadBuffer is fixed.

  • The following issue is fixed: Acknowledgment (ACK) and Negative Acknowledgment (NACK) messages are not returned when the communication between RabbitMQ and ApsaraDB for ClickHouse is abnormal. Currently, a NACK message is returned when an exception occurs during the read and write phases.

  • The following issue is fixed: The analysis result fails to be returned when QueryAnalyzer analyzes a query that contains a GROUP BY clause with a constant of the LowCardinality type.

  • The scale conversion for DateTime64 values is fixed.

  • The escaping of single quotation marks (') in INSERT INTO SQLite statements is fixed. Single quotation marks are now escaped by doubling them ('') instead of using backslashes (\).

  • The following issue is fixed: ApsaraDB for ClickHouse cannot identify column aliases and reports errors.

  • The finished_mutations_to_keep=0 setting of MergeTree is fixed. The value 0 indicates that all content is kept.

  • The exception that may occur when you delete an S3Queue table is fixed.

  • The following issue is fixed: PartsSplitter produces invalid read ranges if the primary key value for the end range of one part is equal to the primary key value for the start range of another part.

  • max_query_size from context is used in DDLLogEntry to replace the hardcoded 4096.

  • The issue of inconsistent query formats is fixed.

  • The issue of inconsistent formats of the EXPLAIN statement in subqueries is fixed.

  • The following issue is fixed: The cosineDistance function crashes when the function handles nullable data types.

  • You can convert string representations of boolean values to actual boolean values.

    Note

    You can execute statements similar to SELECT true = 'true'. Previously, exceptions would occur if you executed such statements.

  • The system.s3queue_log system table is fixed.

    Note

    The unpopulated column table_uuid in the system.s3queue_log system table is fixed. The database and table columns are added. The table_uuid column is renamed to uuid.

  • The following issue is fixed: The error Bad cast from type DB::ColumnVector<double> to DB::ColumnNullable may be reported when you use the arrayReduce function.

  • The following issue is fixed: Preliminary filters such as primary key checks and partition pruning do not take effect when multiple INSERT operations are performed.

  • The sensitive information about S3Queue tables is hidden.

  • The change that replaced ORDER BY ALL with ORDER BY * is reverted.

  • The issues related to endpoints and prefixes in Azure Blob Storage are fixed.

  • The HTTP exception codes are fixed.

  • The bug in LRUResourceCache (Hive cache) is fixed.

  • The bug of S3Queue and the flaky test test_storage_s3_queue/test.py::test_shards_distributed are fixed.

  • The following issue is fixed: Uninitialized values are used in the IPv6 hash function to generate invalid results.

  • Re-analysis is forcefully executed if parallel replicas change.

  • The usage of plain metadata types when new disk configuration options are introduced is fixed.

  • max_parallel_replicas cannot be set to 0 because this configuration does not make sense.

  • Efforts are taken to try to fix the logic error Cannot capture column because it has incompatible type in mapContainsKeyLike.

  • The issue of OptimizeDateOrDateTimeConverterWithPreimageVisitor with empty parameters is fixed.

  • Efforts are taken to try to avoid the calculation of scalar subqueries for CREATE TABLE.

  • The keys in s3Cluster are correctly checked.

  • The deadlock that occurs during parallel parsing when a large number of rows are skipped due to errors is fixed.

  • The issue with KQL compound operators, such as mv-expand, that occurs when you set max_query_size is fixed.

  • Keeper fix: Timeouts are added for waiting for commit logs.

  • The number of rows read from system.numbers is reduced.

  • Number tips are not output for date types.

  • The following issue is fixed: When you use non-deterministic functions in filter conditions to read data from a MergeTree table, the result set is incorrect.

  • The logic error caused by an incorrect value type in the compatibility settings is fixed.

  • The following issue is fixed: The status of aggregate functions is inconsistent in hybrid x86-64 and ARM Kubernetes clusters.

  • The issue of Pipelined Relational Query Language (PRQL) is fixed to implement a more robust panic handler.

  • The following issue is fixed: The program crash occurs when the parameters of the intDiv function are of the decimal and date/time types.

  • The following issue is fixed: Exceptions occur when common table expressions (CTEs) are used in the ALTER TABLE ... MODIFY QUERY statement.

  • system.parts in non-atomic or simple database engines, such as the in-memory engine, is fixed.

  • The error Invalid storage definition in metadata file of parameterized views is fixed.

  • The buffer overflow in CompressionCodecMultiple is fixed.

  • The meaningless content of SQL or JSON data is deleted.

  • The invalid cleanup checks for the quantileGK aggregate function are deleted.

  • The following issue is fixed: Duplicate data is unexpectedly deleted when the insert_deduplication_token custom parameter is used in queries.

  • The following issue is fixed: Custom metadata headers that are not supported are added when you call multipart upload operations.

  • The toStartOfInterval function is fixed.

  • The crash in the arrayEnumerateRanked function is fixed.

  • The crash that occurs when the input() function is used in the INSERT SELECT JOIN statement is fixed.

  • The crash that occurs when different values of allow_experimental_analyzer exist in subqueries is fixed.

  • Recursion is removed when data of Amazon S3 buckets is read.

  • The potential stuck issue in case of error in HashedDictionaryParallelLoader is fixed.

  • The asynchronous restore mechanism for replicated databases is fixed.

  • The deadlock that occurs when data is asynchronously inserted into the Log table through the local protocol is fixed.

  • The delayed execution of default parameters in dictGetOrDefault for RangeHashedDictionary is fixed.

  • Multiple errors in groupArraySorted are fixed.

  • The reconfigurations of standalone binary files for Keeper are fixed.

  • The usage of session_token in the S3 table engine is fixed.

  • The following issue is fixed: The uniqExact aggregate function may produce incorrect results.

  • The error of the database display is fixed.

  • The logic error in RabbitMQ storage where materialized columns are used is fixed.

  • CREATE OR REPLACE DICTIONARY is fixed.

  • The ATTACH queries with the external ON CLUSTER option are fixed.

  • The issue of directed acyclic graph (DAG) splitting is fixed.

  • The following issue is fixed: Failed RESTORE commands cannot be stopped as expected.

  • You can disable async_insert_use_adaptive_busy_timeout with compatibility settings as expected.

  • Queuing in restoration pools is allowed.

  • The following issue is fixed: An error is reported when data is read from system.parts by using a UUID.

  • The crash in window views is fixed.

  • The issue that occurs when the repeat function is used with non-native integers is fixed.

  • The -s parameter of the client is fixed.

  • The program crash caused by the usage of the arrayPartialReverseSort function is fixed.

  • The search performed within a string from a constant position is fixed.

  • The following issue is fixed: An error caused by addDays occurs when DateTime64 is used.

  • The following issue is fixed: Duplicate data exists in system.part_log when data is asynchronously inserted.

  • The non-ready state of system.parts is fixed.

  • The issue of potential junk data and data loss in SharedMergeTree is fixed.

  • Distributed cache metrics can be retrieved from all available regions.

  • The crash caused by the usage of both DISTINCT and window functions is fixed.

  • DistrCache improvement: More events related to configuration files are introduced and more detailed and clear error messages are returned.

  • The authorization issue of the default database for a cluster is fixed.

  • The following issue is fixed: MemoryTrackerSwitcher fails to correctly track certain memory usage.

  • The cross-subquery issue in OrderByLimitByDuplicateEliminationVisitor is fixed.

  • The backward incompatibility issue in TTL execution is fixed.

  • Connections and connection pools in distributed caches are improved.

  • The following issue is fixed: If TTL is set for the Replicated database engine, a crash may occur.

  • Duplicate naming in the new input generated as the result of ActionsDAG::split is avoided.

  • Analyzer: The issue of RewriteAggregateFunctionWithIfPass is fixed.

  • The CREATE TABLE AS queries with default expressions are fixed.

  • Analyzer: The size validation issue of the query tree is fixed.

  • The alias issue related to GLOBAL IN is fixed.

  • Random settings are not used for 02228_merge_tree_insert_memory_usage.

  • Analyzer: The issue of AggregateFunctionsArithmeticOperationsPass is fixed.

  • The alias issue related to INTERPOLATE in remote queries configured with analyzers is fixed.

  • The special macros {uuid} and {database} are allowed in the ZooKeeper path of a replicated database.

  • Exception messages are updated.

  • The weak pointer of ContextAccess is captured to ensure security.

  • The error that occurs during the short-circuit of hash dictionary operations is fixed.

  • The crashes of UniqInjectiveFunctionsEliminationPass and uniqCombined are fixed.

  • The initialization order for ServerUUID and ZooKeeper is fixed.

  • Continuous integration (CI): The AArch64 build is added to the backport workflow.

  • The issue of getFileLastModified in packed part storage is fixed.

  • The issue of restoring from backup for definers is fixed.

  • The test verifying that TotalQpsLimitExceeded is a retryable S3 error is manually backported.

  • The following issue is fixed: Insertion is interrupted due to the OSS QPS limit.

  • The MySQL dictionary source is fixed.

  • Efforts are taken to try to fix the segmentation fault (SIGSEGV) in MergeTreeReadPoolBase::createTask.

  • jwcrypto is added to the integration test runner.

  • MergeTreePrefetchedReadPool becomes safer.

  • The following rare error is fixed: After an ALTER operation, SELECT queries may fail and the error Unexpected return type from materialize. Expected type_XXX. Got type_YYY. is reported.

  • The 02362_part_log_merge_algorithm flaky test is fixed.

  • The following issue is fixed: Data is lost when data is inserted into SharedMergeTree.

  • The setting of not removing server constants from the GROUP BY key for secondary queries is removed.

  • The capture of nested lambda is fixed.

  • Statistics are not collected if the userspace page cache is not used.

  • The analyzer reads only the necessary columns.

  • The following issue is fixed: extra_credentials is missed after StorageS3 restarts.

  • S3 reads are properly canceled when parallel reads are used.

  • test_odbc_interaction for ARM64 on Linux is fixed.

  • S3Queue: Failed files in tracked_file_ttl_sec and tracked_files_limit are accounted.

  • Checksums for components are checked only before removal.

  • The issue of invalid columns in some merge joins is fixed.

  • The issue of scalars in the CREATE AS SELECT statement is fixed.

  • test_short_strings_aggregation for ARM is fixed.

  • test_disk_types for AArch64 is fixed.

  • test_catboost_evaluate for AArch64 is fixed.

  • The redundant DISTINCT is removed from the window functions.

  • The issue of flatten_nested in replicated databases is fixed.

  • test_non_default_compression/test.py::test_preconfigured_deflateqpl_codec is disabled for ARM.

  • 02124_insert_deduplication_token_multiple_blocks is fixed.

  • The following issue is fixed: After a range is deleted, intersected data parts exist after the service restarts.

  • The following issue is fixed: Incorrect result is generated when a new analyzer and parallel replicas are used to read data from a materialized view.

  • When a DDL task fails for more than the number of consecutive times specified by the max_retries_before_automatic_recovery parameter, the system automatically marks the database replica as lost and starts the recovery program. In addition, the following issue is fixed: If an error occurs during the execution of the DDL task, the DDL task may be skipped.

  • The analyzer issue that occurs when the IN function is used within materialized views with nested subqueries of arbitrary depth is fixed.

  • The issues of the find_super_nodes and find_big_family commands for keeper-client are fixed.

  • The execution name of lambda is updated.

  • The following issue is fixed: New versions are automatically created in the release branch.

  • The untrusted binary input data is deserialized in a safer manner.

  • The following issue is fixed: During backup and restoration, components still have the projection after the projection is removed from the table metadata.

  • Array(Nothing) can be converted to Map(Nothing, Nothing).

  • The issue of COLUMNS for the analyzer is fixed.

  • Locks are acquired before current_parts.size() and future_parts.size() are accessed.

  • text_log of Keeper is ignored.

  • The logic error that occurs when CREATE TABLE AS MaterializedView is used is fixed.

  • The following issue is fixed: For tables that do not use adaptive granularity, incorrect results are generated when queries with FINAL are performed.

  • The logic error that occurs when quorum insert transactions are revoked is fixed.

  • The SQL security access checks in case of using analyzers are fixed.

  • Query cache: The identical queries against different databases are considered different queries.

  • The error in select_sequential_consistency is fixed.

  • Analyzer fix: Only interpolation expressions can be used for DAG.

  • InterpreterCreateQuery.cpp is updated.

  • The logic error that occurs when the prewhere clause is used for the Buffer table is fixed.

  • CI: Keys are added to the reusable stage workflow yml.

  • The following issue is fixed: After a commit fails due to concurrent updates, the file system (FS) metadata cache becomes invalid.

  • The SIGSEGV triggered by the CPU or real profiler is fixed.

  • Correct fallback is implemented during the copy backup.

  • enable_vertical_final is disabled.

  • The AddressSanitizer (ASan) image is optimized.

  • qps_limit_exceeded is tested.

  • The following issue is fixed: The program aborts because ~WriteBufferFromFileDescriptor does not capture exceptions in StatusFile.

  • The issue of early constant folding for isNull or isNotNull and analyzers is fixed.

  • The error in ApsaraDB for ClickHouse that causes a digest mismatch during session closing is fixed.

  • The certificate chain is reloaded during certificate reload.

  • Empty parts in SharedMergeTree are cleared after the specified TTL expires.

  • The issues about SQL security that occur when ALTER statements are used for replication are fixed.

  • An explicit shutdown is added to fix the crash that occurs when AccessControl is destroyed.

Performance improvement

  • The min, max, any, and anyLast aggregators applied to GROUP BY keys in the SELECT clause are eliminated.

  • The performance of the serialized aggregation method in case of multiple nullable columns is improved.

  • The performance of all joins is improved through the output delay of join building.

  • The ArgMin, ArgMax, any, anyLast, and anyHeavy aggregate functions, as well as ORDER BY (using u8, u16, u32, u64, i8, i16, i32, or i64) LIMIT 1 are improved. u8, u16, u32, and u64 are unsigned integers. i8, i16, i32, and i64 are signed integers.

  • The performance of conditional SUM and AVG operations for BIGINT and BIG DECIMAL types is optimized by reducing branch mispredictions.

  • The performance of SELECT queries with active mutations is improved.

  • The details of column filtering are optimized to prevent columns whose underlying data type is not Number from being filtered by using result_size_hint = -1. In some cases, the peak memory usage can be reduced to 44% of the original.

  • Less memory is used by primary keys.

  • The memory usage of primary keys and other operations is improved.

  • The primary keys of tables will be loaded lazily upon first access.

    Note

    The lazy loading of primary keys is implemented by using the primary_key_lazy_load setting of MergeTree. By default, the setting is enabled. The following section describes the advantages and disadvantages of the setting:

    • Advantages:

      • The primary keys of unused tables are not loaded.

      • If memory is insufficient, an exception will be thrown upon first use rather than at server startup.

    • Disadvantages:

      • The lazy loading of primary keys occurs on the first query, rather than before connections are accepted.

      • Errors may occur in case of a sudden influx of requests.
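
    A minimal sketch of disabling lazy loading for a single table (a hypothetical table named t is assumed):

    CREATE TABLE t (id UInt64) ENGINE = MergeTree ORDER BY id
    SETTINGS primary_key_lazy_load = 0;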

  • The vectorized function dotProduct facilitates vector search.

  • If the primary key of a table contains mostly useless columns, do not keep these columns in memory.

    Note
    • This can be implemented by setting primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns.

    • Default value: 0.9. This indicates that for a composite primary key, if a column changes its value at least 90% of the time, the subsequent columns will not be loaded.

  • If the multiIf function is expected to return a numeric result, the system uses columnar execution to optimize the function call. This indicates that the system will attempt to concurrently process data in an efficient manner by column when functions like multiIf are executed and a large amount of data exists.

  • The bitwise AND (&) operator replaces the logical AND (&&) operator. This modifies the implementation of the filter combination process and allows the filter combination to be automatically vectorized by the compiler. This optimization is an attempt to improve the execution performance of database queries by using vectorized operations. This way, the features of processors can be used to improve performance.

  • The performance of mutex in ThreadFuzzer is optimized to achieve a 100% improvement.

  • The performance issue that occurs because connections in distributed queries cannot be executed concurrently is fixed.

  • The performance loss that occurs when insertManyFrom repeatedly calls insertFrom is fixed.

  • The dotProduct function is optimized to omit unnecessary and expensive memory copies.

  • Lock contention is reduced by using file system caching operations.

  • ColumnString::replicate is optimized and memcpySmallAllowReadWriteOverflow15Impl is prevented from being replaced by built-in memcpy. This optimization has increased the speed of ColumnString::replicate by 2.46 times on x86-64 architectures.

  • The printing speed for 256-bit integers has increased by 30 times.

  • The parser logic is optimized to fix the following issue: When queries with syntax errors contain COLUMNS matchers with regular expressions, multiple compilations are triggered, which affects parsing speed.