Data Lake Analytics (DLA) allows you to configure a data source, a destination data warehouse, and an Object Storage Service (OSS) location. It then provides an automatic and seamless method to synchronize full data from the data source to OSS at a specified time. The data source can be an ApsaraDB RDS instance or a self-managed database hosted on an ECS instance. In addition, a schema identical to that of the data source is created in OSS and DLA. You can use this schema to analyze the data in OSS. This process does not affect the workloads that run on the data source.
Prerequisites
Before you begin, make sure that the following operations are performed:
- RDS
- Preparations are complete for ApsaraDB RDS. For more information, see General workflow to use RDS for MySQL, General workflow to use RDS SQL Server, and Create an ApsaraDB RDS for PostgreSQL instance.
- OSS
- OSS is activated. For more information, see Activate OSS.
- A bucket is created. For more information, see Create buckets.
- A folder is created. For more information, see Create directories.
Note: Creating a folder to store ApsaraDB RDS data is optional. Decide whether to create one based on your business requirements.
- DLA
- DLA is activated. For more information, see Activate Data Lake Analytics.
- The password of the Alibaba Cloud account that is used to log on to the database in DLA is initialized. For more information, see Manage DLA accounts.
Procedure
- Create a data warehouse with one click.
- After you create a data warehouse, you can manually trigger data synchronization at any time based on your business requirements. During data synchronization, a table schema identical to that of the data source (an ApsaraDB RDS instance or a self-managed database hosted on an ECS instance) is created in OSS and DLA.
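After a synchronization completes, you may want to confirm that the schema created in DLA mirrors the one in the data source. The following minimal sketch shows one way to compare the two, assuming you have already collected the table and column names from both sides into plain dictionaries. The function name `diff_schemas` and the data shape are hypothetical and are not part of any DLA API. Only column names are compared, because data types may be represented differently in DLA than in the source database.

```python
from typing import Dict, List

# Hypothetical shape: table name -> ordered list of column names.
Schema = Dict[str, List[str]]


def diff_schemas(source: Schema, target: Schema) -> Dict[str, List[str]]:
    """Report source tables that are missing from, or differ in, the target."""
    # Tables present in the data source but absent from the target schema.
    missing = sorted(t for t in source if t not in target)
    # Tables present on both sides whose column lists are not identical.
    mismatched = sorted(
        t for t in source if t in target and source[t] != target[t]
    )
    return {"missing_tables": missing, "mismatched_tables": mismatched}


if __name__ == "__main__":
    # Illustrative data only; replace with listings from your own systems.
    rds_schema = {"orders": ["id", "amount"], "users": ["id", "name"]}
    dla_schema = {"orders": ["id", "amount"]}
    print(diff_schemas(rds_schema, dla_schema))
```

If both lists in the result are empty, every source table has a counterpart in the target with the same column names in the same order.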