Optimize dataset performance - - Alibaba Cloud Documentation Center

Background information

The dataset cache feature can accelerate report access and reduce database queries. After you enable the cache feature for a dataset and a user queries a report that is generated based on the dataset, the queried data is cached. When other users attempt to query reports within the specified cache duration, Quick BI first requests data from the cache. If the cache does not contain the requested data, Quick BI requests the data from the database and then caches the data.

Specify a cache duration for a dataset

When you enable the cache feature for a dataset, you must specify a cache duration. The following table describes the duration values supported by each Quick BI edition.


Edition		Cache duration
Quick BI Pro	Personal workspace	Valid values: 5 Minutes, 30 Minutes, 1 Hour, 2 Hours, 4 Hours, and 12 Hours.
Quick BI Pro	Group workspace	Valid values: 5 Minutes, 30 Minutes, 1 Hour, 2 Hours, 4 Hours, and 12 Hours.
Quick BI Enterprise Standard	Personal workspace	Valid values: 5 Minutes, 30 Minutes, 1 Hour, 2 Hours, 4 Hours, and 12 Hours.
Quick BI Enterprise Standard	Group workspace	Valid values: 5 Minutes, 30 Minutes, 1 Hour, 2 Hours, 4 Hours, 12 Hours, and 24 Hours.

After the cache duration elapses, the cached data becomes invalid. New cached data can be generated when a user queries data on the report page. When the same SQL statement is executed again within the cache duration, the system retrieves data from the cache.
The system caches data based on datasets. When the cached data for a dataset is cleared, the cached data for all charts that are generated based on the dataset is cleared. Therefore, after the specified cache duration elapses, all chart caches for this dataset are cleared.

Enable the precompute feature for a dataset

The precompute feature is suitable for datasets that produce data once a day.

You can enable the precompute feature for datasets that are created based on data sources from MaxCompute, ApsaraDB RDS for MySQL, and ApsaraDB RDS for PostgreSQL.

After the precompute feature is enabled for a dataset, the system processes data in the dataset. When users query reports that are created based on the dataset, the system can return results in a short period of time. The precompute feature provides the following benefits:

Easy to use. You need only to select Turn on precalculation in the Speed up the cache dialog box to enable this feature. The system automatically builds a model based on the dataset usage in the last seven days and then precomputes data and aggregates result data. You can use this feature only for aggregate queries.
High-performance and cost-effective. The precompute feature precomputes data and aggregates result data to reduce the number of data dimensions that need to be queried. This can significantly reduce query time and improve analysis efficiency for large-scale datasets.

Configure cache acceleration

You can enable the cache feature and configure a cache duration, enable the acceleration engine feature, or enable the precompute feature to accelerate report queries.

Before you enable the acceleration engine feature for a dataset, make sure that an acceleration engine is configured. Only administrators can configure an acceleration engine. For more information, see Configure an acceleration engine.

In the top navigation bar, click Workspace. On the page that appears, click Datasets in the left-side navigation pane.
On the Datasets page, find the dataset that you want to manage, click the icon in the Actions column, and then click Speed up the cache.
You can also right-click the dataset name and then click Speed up the cache.
In the Speed up the cache dialog box, select and configure one of the following parameters:
- Enable caching: Select this option to enable the cache feature. After you enable the cache feature for the current dataset and a user queries a report that is generated base on the dataset, the queried data is cached. When other users attempt to query reports within the specified cache duration, Quick BI first requests data from the cache. If the cache does not contain the requested data, Quick BI requests the data from the database and then caches the data.
- Turn on acceleration: Select this option to enable the Data Lake Analytics (DLA) acceleration engine. If a dataset has a large amount of data, you can enable this feature. After this feature is enabled, the system automatically uses Alibaba Cloud DLA to accelerate chart queries. This feature allows Quick BI to respond to queries on hundreds of millions of data records in seconds. This way, the report query time is reduced.
- Turn on precalculation: Select this option to enable the precompute feature. After the precompute feature is enabled, the system processes data in the dataset. When users query reports that are created based on the dataset, the system can return results in a short period of time.
  Note
  
  You can enable the precompute feature for datasets that are created based on data sources from MaxCompute, ApsaraDB RDS for MySQL, and ApsaraDB RDS for PostgreSQL.
  
  The precompute feature is suitable for datasets that produce data once a day.
- Turn on dimensional acceleration: Select this option to enable the dimensional acceleration feature. If the dataset has a large amount of data, the query speed may be slow when you use the query widget to query charts. As a solution, Quick BI provides this feature to accelerate queries on dimensions in a dataset. When you enable this feature, you must specify the tables and the fields from which the dimension values come. After the dimensional acceleration feature is enabled and configured, the system reads data from the specified tables based on the conditions that users configure in the query widget. This way, the system does not obtain data by automatically parsing the source data.
Click OK.

Optimize the performance of data queries on large-scale databases

For databases that have slow queries, such as MaxCompute databases and Hive-based databases, Quick BI provides a built-in query engine to optimize query performance.

Enable the full table scan feature

If you want to query data from a dataset that is created based on MaxCompute partitioned tables, you must enable the full table scan feature. Otherwise, requested data cannot be obtained.

In the top navigation bar, click Workspace. On the page that appears, click Datasets in the left-side navigation pane.
On the Datasets page, find the dataset that you want to manage, click the icon in the Actions column, and then click Edit Properties.
You can also right-click the dataset name and then click Edit Properties.
In the Edit Properties panel, select Scan Full Table.
Click Save.
Optional:If the scan efficiency is slow after you enable the full table scan feature, you can use field-based filtering or SQL statement-based filtering to improve the scan efficiency.
- Field-based filtering
  1. On the Datasets page, find the dataset that you want to manage, and click the icon in the Actions column.
  2. On the dataset editing page, click Set Filter in the top navigation bar.
  3. In the Set Filter Fields dialog box, specify the fields that you want to use as filters.
- SQL statement-based filtering
  Add a filter to the WHERE clause.

Change the data source for a dataset

You can change the data source of a dataset to improve the performance of the dataset.

Notice Save the dataset configuration before you change the data source. Otherwise, unsaved configuration changes are not retained after the data source is changed. Different data sources support different syntaxes. After the data source is changed, check whether the calculated field configuration is valid.

In the top navigation bar, click Workspace. On the page that appears, click Datasets in the left-side navigation pane.
On the Datasets page, click the name of the dataset that you want to manage.
You can also click the icon in the Actions column.
In the left-side navigation pane of the dataset editing page, click the drop-down icon in the data source field and select a new data source.
Note The drop-down list displays all data sources in this workspace.
- The left-side column displays the data source name.
- The right-side column displays the data source type.
In the Switch data source message, read the message and click OK.

:Optimize dataset performance

Prerequisites

Background information

Specify a cache duration for a dataset

Enable the precompute feature for a dataset

Configure cache acceleration

Optimize the performance of data queries on large-scale databases

Enable the full table scan feature

Change the data source for a dataset