This topic describes the data overview that Data Lake Formation (DLF) provides for data tables.
Description
Data Lake Formation (DLF) provides detailed metrics for each data table based on the statistics of metadata and stored data. The metrics include the storage size of the table, total number of files in the table, last DDL update time, last data update time, number of visits to the table, number of partitions, number of partitioned files, and last partition update time. If you have enabled the data lake management feature, you can also query the statistics on storage class distribution and file size distribution.
Scenarios
As your business grows, if you want to quickly analyze the size of data stored in each table for more specific storage optimization, you can use the Size metric.
As the data volume increases, if you want to locate and cold archive the tables that are seldom accessed to save storage costs, you can use the File Visits metric.
If you want to quickly locate the tables that have not been updated for a long time and then clean or optimize the tables, you can use the Data Last Updated At metric.
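The three scenarios above can be sketched as simple filters over metric values. This is a minimal illustration, assuming the metrics have been copied by hand from the console into plain Python records. The `TableProfile` class, its field names, the thresholds, and the sample values are all hypothetical and are not part of any DLF API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableProfile:
    """Hypothetical record holding metrics copied from the console."""
    name: str
    size_bytes: int                 # Size
    file_visits_30d: int            # File Visits within Last 30 Days
    data_last_updated_at: datetime  # Data Last Updated At


def largest_tables(profiles, top_n=3):
    """Return the top_n tables by Size, largest first."""
    return sorted(profiles, key=lambda p: p.size_bytes, reverse=True)[:top_n]


def cold_tables(profiles, max_visits=0):
    """Return tables with at most max_visits file visits in the last 30 days."""
    return [p for p in profiles if p.file_visits_30d <= max_visits]


def stale_tables(profiles, now, max_age_days=180):
    """Return tables whose data was last updated more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    return [p for p in profiles if p.data_last_updated_at < cutoff]


# Made-up sample values, for illustration only.
now = datetime(2024, 6, 1)
profiles = [
    TableProfile("orders", 500 * 1024**3, 1200, datetime(2024, 5, 31)),
    TableProfile("clicks_2019", 900 * 1024**3, 0, datetime(2020, 1, 15)),
    TableProfile("dim_region", 2 * 1024**2, 35, datetime(2023, 3, 1)),
]

print([p.name for p in largest_tables(profiles, top_n=2)])
print([p.name for p in cold_tables(profiles)])
print([p.name for p in stale_tables(profiles, now)])
```

The thresholds (zero visits in 30 days, 180 days without a data update) are arbitrary placeholders. Choose values that match your own archiving and cleanup policies.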
Metrics
Metric | Description | Source |
Size | The size of all data stored in a table. | If the location is hosted in DLF, the value of this metric is derived from the statistics of Object Storage Service (OSS) data. Otherwise, the value of this metric is derived from the statistics of E-MapReduce (EMR) engines. |
Total Number of Files | The total number of files stored in a table. | If the location is hosted in DLF, the value of this metric is derived from the statistics of OSS data. Otherwise, the value of this metric is derived from the statistics of EMR engines. |
Number of Partitions | The total number of partitions in a table. | The value of this metric is derived from metadata. |
DDL Last Updated At | The time when the table was last updated by using a data definition language (DDL) statement. | If the location is hosted in DLF, the value of this metric is derived from the statistics of OSS. |
Data Last Updated At | The time when the data was last updated. | If the location is hosted in DLF, the value of this metric is derived from the statistics of OSS. |
File Visits within Last Day | The number of visits to a table in the last day. | The data is derived from compute engines. Currently, only EMR engines provide this data. Applicable conditions: |
File Visits within Last Seven Days | The number of visits to a table in the last seven days. | Same as File Visits within Last Day. |
File Visits within Last 30 Days | The number of visits to a table in the last 30 days. | Same as File Visits within Last Day. |
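The Source column above follows a small set of rules that can be written as a lookup. This is a rough sketch for reading the table, not a DLF API: the `metric_source` function and its return strings are hypothetical. Where the table does not state a source (the two "Last Updated At" metrics when the location is not hosted in DLF), the function returns `None` rather than guessing.

```python
OSS = "OSS statistics"
EMR = "EMR engine statistics"
METADATA = "metadata"


def metric_source(metric, location_hosted_in_dlf):
    """Return where a metric value comes from, per the table above.

    Returns None when the table does not state a source for that case.
    """
    if metric in ("Size", "Total Number of Files"):
        # Hosted locations use OSS statistics; others fall back to EMR engines.
        return OSS if location_hosted_in_dlf else EMR
    if metric == "Number of Partitions":
        return METADATA
    if metric in ("DDL Last Updated At", "Data Last Updated At"):
        # The table only documents the hosted case for these metrics.
        return OSS if location_hosted_in_dlf else None
    if metric.startswith("File Visits"):
        # Currently derived from EMR engines only.
        return EMR
    raise ValueError(f"unknown metric: {metric}")


print(metric_source("Size", location_hosted_in_dlf=True))
print(metric_source("Size", location_hosted_in_dlf=False))
print(metric_source("Data Last Updated At", location_hosted_in_dlf=False))
```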
Procedure
Log on to the DLF console.
In the left-side navigation pane, choose
Click the Table tab, configure the Catalog List and Database Name parameters, and then enter the name of the table that you want to search for.
Click the table name to go to the table details page.
Click the Data Profile tab.