MaxCompute: Data lake analytics

Last Updated: Jan 15, 2026

Tutorials

Data transformation and multi-scenario orchestration on a data lake using MaxCompute

Use MaxLake to ingest data into a data lake and warehouse and enable analytics across multiple scenarios. This tutorial uses Internet of Vehicles (IoV) data to show how to analyze mileage and speed from vehicle GPS information. It also explains how to orchestrate multiple engines to support real-time query reports, cross-team collaboration, masked (desensitized) data sharing, and AI training, so that a single copy of data delivers value in multiple scenarios.
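
As a simplified illustration of the mileage and speed analysis, the following MaxCompute SQL sketch derives per-vehicle mileage and average speed from consecutive GPS points by using the haversine formula. The gps_points table and its columns are hypothetical placeholders, not names from the tutorial.

-- Hypothetical source table:
--   gps_points(vehicle_id STRING, ts DATETIME, lon DOUBLE, lat DOUBLE)
WITH ordered AS (
  SELECT
    vehicle_id, ts, lon, lat,
    LAG(lon) OVER (PARTITION BY vehicle_id ORDER BY ts) AS prev_lon,
    LAG(lat) OVER (PARTITION BY vehicle_id ORDER BY ts) AS prev_lat,
    LAG(ts)  OVER (PARTITION BY vehicle_id ORDER BY ts) AS prev_ts
  FROM gps_points
),
legs AS (
  SELECT
    vehicle_id,
    -- Haversine distance in km between consecutive points (Earth radius 6371 km).
    2 * 6371 * ASIN(SQRT(
      POW(SIN(RADIANS(lat - prev_lat) / 2), 2)
      + COS(RADIANS(prev_lat)) * COS(RADIANS(lat))
        * POW(SIN(RADIANS(lon - prev_lon) / 2), 2)
    )) AS leg_km,
    DATEDIFF(ts, prev_ts, 'ss') AS leg_seconds
  FROM ordered
  WHERE prev_ts IS NOT NULL
)
SELECT
  vehicle_id,
  SUM(leg_km)                               AS total_mileage_km,
  -- Average speed: total distance divided by total elapsed hours.
  SUM(leg_km) / (SUM(leg_seconds) / 3600.0) AS avg_speed_kmh
FROM legs
GROUP BY vehicle_id;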

Read CSV data from a data lake using DLF 1.0 and OSS

Configure Data Lake Formation (DLF) to extract metadata from Object Storage Service (OSS). Then, use a MaxCompute external schema to run federated queries on the data lake. This solution simplifies data analysis and processing while ensuring data reliability and security.
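
Once DLF has extracted the metadata and an external schema has been created in MaxCompute (the exact DDL is covered in the tutorial), the CSV data can be queried in place with a three-part name. A minimal sketch; my_project, csv_schema, and orders are hypothetical placeholders:

-- Federated query over CSV files in OSS, using DLF metadata through an external schema.
SELECT category, COUNT(*) AS order_cnt
FROM my_project.csv_schema.orders
GROUP BY category;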

Read Paimon data from a data lake using DLF 1.0 and OSS

Use Flink to create a Paimon DLF catalog. Read MySQL Change Data Capture (CDC) data and write it to OSS. Then, synchronize the metadata to DLF. Finally, use a MaxCompute external schema to run federated queries on the data lake.
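
The ingestion side of this pipeline can be sketched in Flink SQL as follows. All connection parameters, catalog options, and names are illustrative placeholders, and the 'dlf' metastore option assumes the Paimon DLF catalog setup described in the tutorial:

-- Paimon catalog: data files go to OSS, metadata is synchronized to DLF.
CREATE CATALOG paimon_dlf WITH (
  'type' = 'paimon',
  'metastore' = 'dlf',
  'warehouse' = 'oss://my-bucket/paimon-warehouse'  -- hypothetical path
);
CREATE DATABASE IF NOT EXISTS paimon_dlf.demo;

-- MySQL CDC source table; host and credentials are placeholders.
CREATE TEMPORARY TABLE mysql_orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'rm-example.mysql.rds.aliyuncs.com',
  'port' = '3306',
  'username' = 'flink_user',
  'password' = '********',
  'database-name' = 'demo',
  'table-name' = 'orders'
);

-- Continuously replicate the change stream into a Paimon table.
CREATE TABLE IF NOT EXISTS paimon_dlf.demo.orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
);
INSERT INTO paimon_dlf.demo.orders SELECT * FROM mysql_orders;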

Read Parquet data from a data lake using a schemaless query

This tutorial uses an E-MapReduce (EMR) Serverless Spark cluster as an example. It shows how to use a schemaless query in MaxCompute to read Parquet files generated by Spark SQL. After the computation is complete, you can use the UNLOAD command to write the results back to OSS.
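
The write-back step relies on the MaxCompute UNLOAD statement. A minimal sketch; the inner SELECT stands in for the schemaless read described in the tutorial, and the table name, endpoint, bucket, and directory are hypothetical placeholders:

-- Export query results back to OSS as Parquet files.
UNLOAD FROM (
  SELECT user_id, SUM(amount) AS total_amount
  FROM   spark_results   -- placeholder for the data read via the schemaless query
  GROUP  BY user_id
)
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/my-bucket/result-dir/'
STORED AS PARQUET;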

Read Hadoop Hive data using HMS and HDFS

This tutorial uses Hive on E-MapReduce as an example. It shows how to create an external schema in MaxCompute and query Hive table data in Hadoop.
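
After the external schema that maps to the Hive Metastore (HMS) database is created (the DDL is covered in the tutorial), Hive tables on HDFS can be queried directly from MaxCompute. A minimal sketch; my_project, hive_schema, and user_events are hypothetical placeholders:

-- Federated query over a Hive table in Hadoop, without moving the data.
SELECT dt, COUNT(*) AS row_cnt
FROM my_project.hive_schema.user_events
GROUP BY dt;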

Create metadata mapping and data synchronization for Hologres

This tutorial shows how to use MaxCompute to create a metadata mapping for Hologres and synchronize data between the two services.

Read and write Paimon data on a data lake using an external project and a FileSystem Catalog

Use Flink to create a Paimon catalog and generate data. Then, use MaxCompute to create an external project based on the FileSystem Catalog to directly read the Paimon table data.
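
The Flink side of this tutorial reduces to creating a FileSystem-based Paimon catalog on OSS and generating a table in it. A minimal sketch; the warehouse path and the table definition are hypothetical placeholders:

-- Paimon FileSystem catalog: metadata and data files both live under the OSS warehouse path.
CREATE CATALOG paimon_fs WITH (
  'type' = 'paimon',
  'metastore' = 'filesystem',
  'warehouse' = 'oss://my-bucket/paimon-warehouse'  -- hypothetical path
);
USE CATALOG paimon_fs;
CREATE DATABASE IF NOT EXISTS demo;

-- Generate sample data for MaxCompute to read through the external project.
CREATE TABLE IF NOT EXISTS demo.vehicle_stats (
  vehicle_id STRING,
  mileage_km DOUBLE,
  PRIMARY KEY (vehicle_id) NOT ENFORCED
);
INSERT INTO demo.vehicle_stats VALUES ('v001', 120.5), ('v002', 88.0);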

(Invitational preview) Use an external project to read and write Paimon data on a data lake using DLF

Use Flink to create a Paimon DLF catalog. Read MySQL CDC business data and write it to DLF. Then, use a MaxCompute external project to run federated queries and analysis on the data lake and write the results back to DLF. This topic uses the new version of DLF, which is different from DLF 1.0 used in the preceding tutorials.
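
With the external project in place, both the read and the write-back are plain MaxCompute SQL. A minimal sketch; ext_paimon_project, paimon_db, and the table and column names are hypothetical placeholders, and the external project DDL itself is covered in the tutorial:

-- Federated query over Paimon data registered in the new DLF.
SELECT ds, COUNT(*) AS order_cnt, SUM(amount) AS total_amount
FROM ext_paimon_project.paimon_db.orders
GROUP BY ds;

-- Write aggregated results back to a Paimon table managed by DLF.
INSERT INTO ext_paimon_project.paimon_db.order_daily_stats
SELECT ds, COUNT(*) AS order_cnt, SUM(amount) AS total_amount
FROM ext_paimon_project.paimon_db.orders
GROUP BY ds;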