MaxFrame is a distributed computing framework developed by Alibaba Cloud that is compatible with Pandas interfaces. MaxFrame provides Python programming interfaces and automatically performs distributed computing. You can use the massive computing resources and data of MaxCompute to process large amounts of data, explore and analyze data visually, and perform scientific computing as well as machine learning (ML) and AI development.
Background information
In the current data-driven era, efficient big data processing and AI have become indispensable to enterprises and research institutions. Python provides strong support for data science through third-party ecosystems such as NumPy, Pandas, and scikit-learn. However, these ecosystems are limited to standalone (single-machine) or single-core computing, which makes it difficult to meet the requirements of distributed big data processing.
To meet the growing demand for efficient big data processing and AI development in Python, MaxCompute provides MaxFrame, a distributed computing framework with Python programming interfaces. MaxFrame lets you directly use the large-scale computing resources of MaxCompute for distributed processing. You can also combine MaxFrame with MaxCompute Notebook and features such as image management to build a Python development ecosystem on MaxCompute.
Introduction to MaxFrame
MaxFrame is a distributed computing framework that provides Python programming interfaces and directly uses the computing resources and data interfaces of MaxCompute, so Python developers can process large amounts of data and develop AI models more efficiently and conveniently. MaxFrame is fully compatible with Pandas interfaces and automatically distributes processing. You can use the massive computing resources and data of MaxCompute to perform data processing, visual data exploration, scientific computing, and ML- or AI-based development in a familiar and efficient way. The following figure shows the architecture.
Benefits
More familiar development habits
MaxFrame provides Python programming interfaces and is fully compatible with Pandas operators. The operators are submitted to MaxCompute for automatic distributed execution, so execution is not limited by the resources of your on-premises machine.
More efficient data processing capabilities
MaxFrame performs distributed data computing directly in a MaxCompute cluster, so you do not need to pull data to your on-premises machine when you run MaxFrame. Avoiding local data transfers improves job execution efficiency.
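Why keeping computation next to the data helps can be illustrated with the generic partial-aggregation pattern that distributed engines commonly use for a Pandas-style `groupby(...).sum()`. The following is a minimal, stdlib-only sketch under stated assumptions: it is not MaxFrame's implementation, and the partition lists and the helpers `partial_sum`, `merge_partials`, and `distributed_groupby_sum` are hypothetical stand-ins for data stored in the cluster and for work done by cluster workers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape: (group key, numeric value).
Row = Tuple[str, float]


def partial_sum(partition: Iterable[Row]) -> Dict[str, float]:
    """Map step: aggregate one partition where it is stored.

    Returns one small dict per partition instead of shipping every row.
    """
    acc: Dict[str, float] = defaultdict(float)
    for key, value in partition:
        acc[key] += value
    return dict(acc)


def merge_partials(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Reduce step: combine the per-partition results into final sums."""
    total: Dict[str, float] = defaultdict(float)
    for part in partials:
        for key, value in part.items():
            total[key] += value
    return dict(total)


def distributed_groupby_sum(partitions: List[List[Row]]) -> Dict[str, float]:
    # In a real cluster each partial_sum call would run on a separate worker;
    # here the calls run sequentially to keep the sketch self-contained.
    return merge_partials(partial_sum(p) for p in partitions)


if __name__ == "__main__":
    partitions = [
        [("east", 10.0), ("west", 5.0)],
        [("east", 2.5), ("north", 1.0)],
        [("west", 4.0)],
    ]
    print(distributed_groupby_sum(partitions))
```

Only the small per-partition dictionaries reach the merge step, which is the reason moving computation to where the data lives reduces transfer. In pandas terms, the result corresponds to `df.groupby("region")["amount"].sum()` on the concatenated rows, assuming hypothetical column names `region` and `amount`.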
More convenient development experience
MaxFrame is integrated with MaxCompute Notebook and DataWorks, which provide an out-of-the-box interactive development environment and offline scheduling capabilities. In your code, you can directly reference MaxCompute built-in images, such as Pandas, NumPy, and XGBoost images, as well as custom images. MaxFrame supports Python 3.7 and Python 3.11, so you do not need to deal with complex environment preparation or compatibility issues.
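Because the supported Python versions are listed above, one hedged way to avoid surprises when preparing a local environment is to check the interpreter version first. The helper `is_supported_python` and the constant `SUPPORTED_PYTHON` below are hypothetical and not part of MaxFrame; the version set simply mirrors this document and should be confirmed against the current MaxFrame release notes.

```python
import sys
from typing import Optional, Sequence

# Assumption: mirrors the versions stated in this document (3.7 and 3.11).
SUPPORTED_PYTHON = {(3, 7), (3, 11)}


def is_supported_python(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the major.minor version is in SUPPORTED_PYTHON.

    Defaults to the running interpreter when no version is passed.
    """
    if version is None:
        version = sys.version_info
    return (version[0], version[1]) in SUPPORTED_PYTHON


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    if is_supported_python():
        print(f"Python {current} matches a version listed for MaxFrame.")
    else:
        print(f"Python {current} is not 3.7 or 3.11; consider a matching environment.")
```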
Scenarios
MaxCompute MaxFrame is suitable for the following scenarios:
Python ecosystem development: MaxFrame is well suited to developers who need an out-of-the-box Python environment for rapid data processing, data science, and interactive data exploration.
Large-scale data analysis and processing: If the amount of data to be processed is large and the processing logic is complex, MaxFrame allows you to directly use the massive data and computing resources of MaxCompute to perform distributed processing. This significantly improves the development efficiency for data analysis, processing, and mining.
Data and AI development: If your distributed data development and model development workflow depends on third-party or custom images, MaxFrame supports the full workflow, from data processing to AI model training and deployment.
Supported tools
MaxFrame can be used in on-premises environments and DataWorks. For more information, see Preparations.
Technical support
If you have questions about MaxFrame, scan the following QR code with DingTalk to join the dedicated DingTalk group for MaxFrame technical support.