You can use HoloWeb to import public datasets and query the imported data in a visualized manner. This topic describes how to create a public dataset import task in HoloWeb and view the task status.
Background information
In HoloWeb, you can import the tpch_10g, tpch_100g, and github_event public datasets with a few clicks. Each dataset may occupy 10 GB to 100 GB of storage space.
The tpch_10g and tpch_100g public datasets are sample datasets for retail scenarios. The tpch_10g dataset contains 10 GB of data, and the tpch_100g dataset contains 100 GB of data. For more information, see Test plan.
The github_event public dataset is available on GitHub. For more information, see Introduction to business and data.
Prerequisites
The version of your Hologres instance is V1.3.13 or later. You can check the instance version as shown in the example after this list.
A Hologres instance is connected to HoloWeb. For more information, see Log on to an instance.
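If you are not sure which version your instance runs, you can check it in the SQL editor. The following statement is a minimal sketch that assumes your instance exposes the hg_version() function; if the returned version is earlier than V1.3.13, upgrade the instance before you import a public dataset.
-- Check the Hologres instance version. Assumes the hg_version() function is available.
SELECT hg_version();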
Usage notes
The public dataset importing feature is supported by Hologres instances that are deployed in the following regions: China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), and China (Zhangjiakou).
The account that you use to import public datasets must have permissions to perform operations such as creating schemas, creating tables, and writing data. For more information, see Overview.
It may take 3 to 20 minutes to import a public dataset into a Hologres instance, depending on the instance specifications. We recommend that you plan your computing resources in advance to prevent negative impacts on your online business.
In a public dataset import task, two schemas and several external tables and internal tables are automatically created. Make sure that no existing schemas or tables in your Hologres instance have the same names as the automatically created ones. This prevents accidental data deletion. You can check for name conflicts as shown in the example below.
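Before you start an import task, you can check whether schemas with potentially conflicting names already exist. The following statement is a sketch that assumes the automatically created schemas follow the hologres_dataset_* and hologres_foreign_dataset_* naming pattern shown in the tpch_100g example later in this topic.
-- List existing schemas whose names may conflict with the schemas created by the import task.
-- The naming pattern is an assumption based on the tpch_100g example in this topic.
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name LIKE 'hologres_dataset_%'
   OR schema_name LIKE 'hologres_foreign_dataset_%';
If the query returns rows, rename or drop the conflicting schemas, or import the dataset into a different database.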
Create a public dataset import task
Go to the HoloWeb console. For more information, see Connect to HoloWeb.
In the HoloWeb console, click Data Solution in the top navigation bar.
On the Data Solution page, click One-Click Import of public datasets in the left-side navigation pane.
On the One-Click import of public datasets page, click Create a public dataset import task.
On the New Public Dataset Import page, configure the Instance Name, Database, and Public Data Set Name parameters, and click Submit.
View the information about a public dataset import task
On the One-Click import of public datasets page, configure the Instance Name and Database parameters and click Query.
You can view the information displayed in the task list and perform operations on a task.
Displayed information: No., Instance, Database, Public Data Set Name, Status, Progress, Created At, and Ended At. The Progress is displayed in the following format: Number of completed SQL statements/Total number of SQL statements.
Supported operations: Details, Stop, Rerun, Delete, and Execution History.
A public dataset import task is complete if Status is Successful. You can then use the data for analytics.
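For example, after the tpch_100g dataset is imported, you can run a simple query against one of the imported tables. The following statement is a sketch that assumes the standard TPC-H orders table exists in the hologres_dataset_tpch_100g schema; adjust the schema and table names to match the dataset that you imported.
-- Count the rows of the orders table in the imported tpch_100g dataset.
-- The table name is an assumption based on the standard TPC-H schema.
SELECT count(*) FROM hologres_dataset_tpch_100g.orders;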
Drop a public dataset
You can execute the following statement to drop the schema in which the public dataset that you want to drop resides and all dependencies. In this example, the tpch_100g
dataset is dropped. Exercise caution when you perform this operation.
DROP SCHEMA hologres_dataset_tpch_100g, hologres_foreign_dataset_tpch_100g CASCADE;
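Before you drop a dataset, you may want to review the schemas and tables that will be removed. The following statement is a sketch that lists the tables in the two tpch_100g schemas from the preceding example; it uses the information_schema views that Hologres provides through its PostgreSQL compatibility.
-- Review the tables that will be dropped together with the schemas.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema IN ('hologres_dataset_tpch_100g', 'hologres_foreign_dataset_tpch_100g');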