MaxCompute provides a data lakehouse solution that enables you to build a data management platform that combines data lakes and data warehouses. This solution integrates the flexibility and broad ecosystem compatibility of data lakes with the enterprise-class deployment capabilities of data warehouses. This topic describes how to build a data lakehouse solution by using MaxCompute together with a heterogeneous data platform. The data lakehouse solution is in public preview.
Build a data lakehouse solution
You can build a data lakehouse solution by combining MaxCompute with a data lake. MaxCompute serves as the data warehouse in the solution. You can use one of the following methods. A sample query against the resulting external project is shown after the list.
Build a data lakehouse by using MaxCompute, DLF, and OSS: If you use this method, all schemas of the data lake are stored in Data Lake Formation (DLF). MaxCompute can use the metadata management capabilities of DLF to efficiently process semi-structured data in OSS, including data in the Delta Lake, Apache Hudi, Avro, CSV, JSON, Parquet, and ORC formats.
Build a data lakehouse by using MaxCompute and Hadoop: You can use a Hadoop cluster that is deployed in a data center, on virtual machines (VMs) in the cloud, or in Alibaba Cloud E-MapReduce (EMR). If MaxCompute is connected to the virtual private cloud (VPC) in which the Hadoop cluster is deployed, MaxCompute can directly access Hive metastores and map metadata to external projects of MaxCompute.
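With either method, the tables of the data lake are exposed to MaxCompute through an external project and can be queried with MaxCompute SQL from your internal project. The following sketch uses the PyODPS SDK to illustrate such a query; it is a minimal example, not part of the official setup steps. The project names, table name, endpoint, and credentials are placeholders, and it assumes that an external project named ext_lake_project has already been created and mapped to a DLF database or a Hive metastore.

```python
# Minimal sketch, assuming an external project "ext_lake_project" (placeholder)
# that already maps a DLF database or Hive metastore containing a table
# "sales_raw" (placeholder). Credentials and endpoint are placeholders as well.
from odps import ODPS

o = ODPS(
    access_id="<AccessKey ID>",
    secret_access_key="<AccessKey Secret>",
    project="my_project",  # the internal MaxCompute project that you work in
    endpoint="https://service.cn-hangzhou.maxcompute.aliyun.com/api",
)

# Tables mapped into an external project are referenced as
# <external_project_name>.<table_name> in MaxCompute SQL.
sql = "SELECT * FROM ext_lake_project.sales_raw LIMIT 10;"

with o.execute_sql(sql).open_reader() as reader:
    for record in reader:
        print(record)
```

The same query can also be run in the MaxCompute console or on the MaxCompute client (odpscmd); only the external-project-qualified table name is specific to the data lakehouse setup.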
Limits
The data lakehouse solution is supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, and Germany (Frankfurt).
MaxCompute must be deployed in the same region as DLF and OSS.
References
After an external project is created, the tables in the external project are owned by the account that was used to create the project. For more information about how to grant other users permissions to perform operations on tables in an external project, see Grant other users the permissions on an external project. An illustrative sketch of such a grant appears after these references.
When you build a data lakehouse solution, you can use SQL statements to manage external projects. For more information, see Use SQL statements to manage an external project.
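To make the two references above more concrete, the following sketch shows how such statements might be submitted programmatically with PyODPS. It is illustrative only: the GRANT statement uses MaxCompute's standard table-level grant syntax, and whether it must be issued in the internal project or in the external project's context, as well as the exact privilege list, should be taken from the topics linked above. All account, project, and table names are placeholders.

```python
# Hedged sketch, not the authoritative procedure from the linked topics.
# All names below (project, table, account) are placeholders.
from odps import ODPS

o = ODPS(
    access_id="<AccessKey ID>",
    secret_access_key="<AccessKey Secret>",
    project="my_project",  # placeholder; the required project context may differ
    endpoint="https://service.cn-hangzhou.maxcompute.aliyun.com/api",
)

# run_security_query submits MaxCompute authorization commands such as ADD USER
# and GRANT; regular queries are submitted with execute_sql instead.
o.run_security_query("ADD USER ALIYUN$other_user@example.com;")
o.run_security_query(
    "GRANT Describe, Select ON TABLE sales_raw TO USER ALIYUN$other_user@example.com;"
)
```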