Hologres provides the dynamic table feature, which serves as a declarative data processing architecture. You can use the dynamic table feature to implement automatic processing and storage of data aggregation results of one or more base tables. The dynamic table feature provides different built-in data refresh policies. You can configure data refresh policies based on your business requirements to implement automatic data forwarding from base tables to dynamic tables. This meets the requirements of centralized business development, automatic data forwarding, and timeliness in business processing.
Background information
In real-time data warehousing scenarios, complex processing of business data, such as multi-table join queries and large-table aggregation queries, is often involved. The following information describes the requirements on business timeliness in different scenarios:
Real-time scenarios, such as risk control and recommendation: Result output within seconds or milliseconds is required.
Near real-time scenarios, such as real-time reports and business intelligence-based data viewing: A latency of minutes is allowed.
Offline scenarios, such as periodic reports and historical data query: The query frequency is low. A latency of hours is allowed.
Join queries may be required. In this case, data standard consistency must be strictly ensured.
To meet the requirements of low business resource costs, improved development efficiency, and high business timeliness, the industry has conducted R&D and evolution in architectures, such as the early Lambda architecture and integrated streaming and batch processing architecture. Although these efforts have resolved specific business and development issues, the following issues persist:
Architecture: Different products in the current market are combined to meet the timeliness requirements in different business scenarios. You cannot use a single product to meet all business requirements.
Data extract, transform, and load (ETL) operations: No clear methodology is available for data processing from the data detail layer to the application layer in data warehousing. As a result, automatic data forwarding at low costs cannot be achieved, resource costs are high, and the development efficiency is low.
To address the preceding issues, Hologres provides the dynamic table feature. The dynamic table feature supports full and incremental data processing modes. This helps implement automatic data forwarding and layering in a more efficient and cost-effective manner. You can use the dynamic table feature together with the other features provided by Hologres to build a unified storage layer, a unified computing layer, and a unified data service layer to meet the requirements on the development efficiency and timeliness.
Benefits
Simplified data warehouse architecture
The dynamic table feature supports multiple data refresh modes to achieve different levels of latencies. This meets the query requirements for timeliness-sensitive business. Hologres supports unified real-time and offline data storage, which can be used with the dynamic table feature to meet the query requirements in different application scenarios, such as online analytical processing (OLAP), online services, and AI and big models. This effectively simplifies the data warehouse architecture and reduces development and O&M costs.
Automatic data layering
Data can be automatically refreshed by using the dynamic table feature. This helps implement automatic data forwarding between data layers in the following order and improve user experience in data layering: operational data store (ODS) > data warehouse detail (DWD) > data warehouse service (DWS) > application data service (ADS).
Improved ETL efficiency
Full and incremental data refresh modes are supported to meet different timeliness requirements. Only incremental data is processed in incremental data refresh mode. This reduces the amount of data that is computed in the ETL process and significantly improves data processing efficiency.
Simplified development and maintenance
Refresh tasks as well as the hierarchies and dependencies between data are automatically managed. This simplifies development and O&M processes and improves development efficiency.
Terms
base table
The table that serves as the data source of a dynamic table. A base table can be a single internal or foreign table, or an association of multiple tables. The supported base table types vary based on the data refresh mode. For more information, see Application scope and limits of dynamic tables.
query
The query specified when you create a dynamic table. A query is equivalent to an ETL process used to process data in base tables. The supported query types vary based on the data refresh mode. For more information, see Application scope and limits of dynamic tables.
refresh
The refresh operation that is performed to update data changes in base tables to a dynamic table. A refresh task is automatically run in the background based on the specified start time of a refresh operation and the refresh interval. For more information about how to monitor and maintain a refresh task, see Maintain refresh tasks for dynamic tables.
How it works
A refresh task is used to write data in base tables to a dynamic table based on the data processing process defined in the query in the dynamic table. This section describes partial technical principles of the dynamic table feature in the aspects of data refresh mode, computing resource, data storage, and table index.
Refresh modes
The dynamic table feature supports full and incremental data refresh modes. The technical principles of the dynamic table feature vary based on the data refresh mode.
Full data refresh
In this mode, full data is processed each time the refresh operation is performed, and data aggregation results of base tables are materialized and written to a dynamic table. The technical principles in this mode are similar to those of the INSERT OVERWRITE statement.
Incremental data refresh
In this mode, only incremental data in base tables is read each time the refresh operation is performed, and final results are calculated based on the intermediate aggregation status and incremental data and written to a dynamic table. Compared with the full data refresh mode, the incremental data refresh mode is used to process a smaller amount of data each time. This is more efficient and helps improve the timeliness of refresh tasks and reduce the consumption of computing resources.
How it works
After you create a dynamic table in incremental data refresh mode, the system creates a column-oriented state table at the underlying layer to store the intermediate aggregation status of the query. The Hologres engine optimizes the intermediate aggregation status in terms of coding and storage to accelerate reading and updating of the intermediate aggregation status. The incremental data is aggregated in the memory in micro-batches and merged with the data in the state table. Then, the latest aggregation results are efficiently written to the dynamic table in bulkload mode. The method of processing incremental data in micro-batches reduces the amount of data to be processed in a single refresh operation and significantly improves the timeliness in computing.
Precautions
Specific limits are imposed on supported base tables in incremental data refresh mode. For more information, see Application scope and limits of dynamic tables.
In incremental data refresh mode, you must enable binary logging for base tables to detect data changes in the base tables. For more information about how to enable binary logging, see Subscribe to Hologres binary logs.
The built-in state table for incremental data refresh occupies a certain amount of storage. The system periodically cleans data based on the configured time to live (TTL). You can use functions to view the storage size of the state table. For more information, see the Manage state tables section of the "Query the structure and lineage of a dynamic table" topic.
Computing resources
The computing resources used to run a refresh task can be the resources of the current instance or serverless computing resources.
Resources of the current instance: If you use the resources of the current instance to run a refresh task, the refresh task shares resources with other tasks in the instance. In this case, tasks may compete for resources during peak hours.
Serverless computing resources: By default, serverless computing resources are used to run refresh tasks. If a query is complex and a large amount of data needs to be processed, you can use serverless computing resources to effectively enhance the stability of refresh tasks and prevent competition for resources with other tasks in the current instance. In addition, you can modify the computing resources for a single refresh task to use serverless computing resources in a more efficient manner.
Data storage
Dynamic tables are the same as standard tables in terms of data storage. By default, the hot storage mode is used. To reduce storage costs, you can set the storage mode of data that is less frequently queried to cold storage.
Table index
In data query scenarios, you can directly query dynamic tables, which is equivalent to directly querying aggregation results. This can significantly improve the query performance. Similar to a standard table, you can configure a table index, such as row or column store, a distribution key, or a clustering key, for a dynamic table. In most cases, the Hologres engine can derive a required index based on the query of a dynamic table. If further tuning is required, you can reconfigure an index for the dynamic table to significantly improve the query performance.
Comparison with a materialized view
Comparison between a dynamic table and a Hologres real-time materialized view
Hologres V1.3 provides the feature of managing materialized views by using SQL statements. However, the data processing capabilities provided by the feature are relatively weak. The following table describes the differences between a Hologres dynamic table and a Hologres real-time materialized view.
Feature | Hologres dynamic table | Hologres real-time materialized view |
Base table type |
| Single internal table |
Operations that can be performed on a base table |
| Data write in appending mode |
Refresh principle | Asynchronous refresh (full or incremental data refresh) | Synchronous refresh |
Refresh timeliness |
| Real time |
Query type |
Note The supported query types vary based on the data refresh mode. For more information, see Application scope and limits of dynamic tables. | Limited operators, such as aggregation functions and RB functions |
Query mode | Directly querying the dynamic table |
|
Comparison between a dynamic table and an asynchronous materialized view
In the current market, asynchronous materialized views of OLAP products and dynamic tables of Snowflake provide similar features as dynamic tables in Hologres. The following table describes the differences.
Feature | Hologres dynamic table | Asynchronous materialized view of an OLAP product | Snowflake dynamic table |
Base table type |
Note The supported query types vary based on the data refresh mode. For more information, see Application scope and limits of dynamic tables. |
|
|
Refresh mode |
|
|
|
Refresh timeliness |
| Hour-level |
|
Query type |
Note The supported query types vary based on the data refresh mode. For more information, see Application scope and limits of dynamic tables. |
|
|
Query mode | Directly querying the dynamic table |
| Directly querying the dynamic table |
Observation or O&M |
| Various metrics | Visualized UI |
Scenarios
The dynamic table feature can be used to implement automatic data processing and storage. The dynamic table feature helps accelerate data queries and improve the business timeliness. We recommend that you use the dynamic table feature in lakehouse acceleration and data layering scenarios.
Lakehouse acceleration
If you use the dynamic table feature, data in base tables can be from Hologres tables, data warehouses such as MaxCompute, or data lakes such as Object Storage Service (OSS) and Paimon. You can perform full or incremental refresh on data in base tables to meet timeliness-specific data query and exploration requirements. Recommended scenarios:
Periodic report query
In periodic observation-related scenarios, such as periodic report queries, you can use the full or incremental data refresh mode to periodically refresh the aggregation and analysis results of data in lakehouses to a dynamic table if the data volume is small or the query is not complex. The application side directly queries the dynamic table to obtain analysis results. This accelerates report queries.
Real-time dashboard or report
Scenarios such as real-time dashboards and real-time reports require higher data timeliness. In such scenarios, we recommend that you use the incremental data refresh mode to refresh the aggregation and analysis results of data in a Paimon table or real-time data to a dynamic table to accelerate real-time data processing. The application side directly queries the dynamic table to obtain data analysis results for near real-time analysis.
Data layering
If a base table contains a large amount of data, and complex ETL processing is required to meet the timeliness requirements in business, a common approach is data layering. In data warehousing scenarios, multiple schemes such as materialized views and periodic scheduling are provided in the industry for data layering. These schemes can be used to resolve specific issues. However, issues such as low data timeliness and inconvenient development persist. The dynamic table feature of Hologres provides automatic data processing capabilities. You can use the dynamic table feature for data layering.
Recommended method for data layering:
Use the dynamic table feature in Hologres to build data layers DWD, DWS, and ADS.
Use the incremental data refresh mode for data synchronization between data layers. This ensures a smaller amount of data to be processed at each layer, reduces unnecessary, repeated computing, and accelerates data synchronization. You can also use the Serverless Computing feature to run a refresh task based on business requirements to further improve the timeliness and stability of the refresh task.
Use the full data refresh mode if you want to refresh data at each layer to ensure data standard consistency at each layer. You can also use the Serverless Computing feature to run a refresh task based on business requirements to further improve the timeliness and stability of the refresh task.
Build each data layer in Hologres. Data layers are clear and queries can be performed at each layer based on business requirements. This ensures data visibility and reusability.
You can use the dynamic table feature in Hologres to complete data processing and application. This significantly improves the data warehouse development and O&M efficiency.