MaxCompute: Usage notes

Last Updated: Oct 31, 2023

This topic provides reading recommendations based on your role.

MaxCompute beginners

If you are a beginner in MaxCompute, we recommend that you first familiarize yourself with the modules described in the following table.

| Module | Description |
| --- | --- |
| Product Introduction | Provides an overview of MaxCompute and describes the features, scenarios, limits, and basic concepts of MaxCompute. This module helps you obtain a general knowledge of MaxCompute. |
| Preparations and Getting Started | Describes how to create an account, prepare an environment, create a table, import data, run SQL jobs, and export returned data. |
| Common SQL statements | Describes the commonly used commands in MaxCompute. This module helps you familiarize yourself with operations on MaxCompute. |
| Tools | Describes the common tools in MaxCompute, such as the MaxCompute client and MaxCompute Studio. Before you analyze data, you must familiarize yourself with the tools. |
| Endpoints | Describes the network connection modes supported in different regions and the endpoints that correspond to each region. This module also describes the issues that may occur when MaxCompute is connected to other Alibaba Cloud services, such as Elastic Compute Service (ECS), Tablestore, and Object Storage Service (OSS). These issues include network connectivity issues and issues related to data download charges. |

Data analysts

If you are a data analyst, we recommend that you familiarize yourself with the SQL topics. MaxCompute SQL lets you query and analyze large volumes of data stored in MaxCompute. The following table describes the features that MaxCompute SQL provides.

| Feature | Description |
| --- | --- |
| DDL statements | Allows you to manage tables, partitions, columns, lifecycles, and views. |
| DML statements | Allows you to insert data into or update data in tables or partitions. |
| DQL statements | Allows you to perform various query operations, such as SELECT and subqueries. |
| SQL enhancement operations | Allows you to use commands to perform SQL enhancement operations, such as importing data into MaxCompute tables, exporting data from them, and cloning table data. |
| Built-in functions | Allows you to process data by using MaxCompute built-in functions, such as mathematical functions, window functions, date functions, aggregate functions, and string functions. |
| UDF | Allows you to create user-defined functions (UDFs) to meet your computing requirements. |
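
To make the DDL, DML, and DQL categories more concrete, the following minimal Python sketch groups a few MaxCompute-style statements by their leading keyword. The table name `sale_detail`, its columns, and the helper functions `classify` and `group_by_category` are hypothetical and exist only for illustration. The statements are written in the style of MaxCompute SQL but are not executed here, so verify the exact syntax against the SQL topics before you use them.

```python
# Illustrative only: group MaxCompute-style SQL statements by category.
# The table and column names (sale_detail, shop_name, ...) are hypothetical.

STATEMENTS = [
    # DDL: create a partitioned table with a 30-day lifecycle.
    "CREATE TABLE IF NOT EXISTS sale_detail (shop_name STRING, total_price DOUBLE) "
    "PARTITIONED BY (sale_date STRING) LIFECYCLE 30;",
    # DDL: add a partition.
    "ALTER TABLE sale_detail ADD IF NOT EXISTS PARTITION (sale_date='2023-10-31');",
    # DML: insert a row into that partition.
    "INSERT INTO TABLE sale_detail PARTITION (sale_date='2023-10-31') "
    "VALUES ('shop_a', 100.5);",
    # DQL: aggregate the inserted data.
    "SELECT shop_name, SUM(total_price) AS revenue FROM sale_detail "
    "WHERE sale_date = '2023-10-31' GROUP BY shop_name;",
]

# Simplified mapping from the leading keyword to a statement category.
CATEGORY_BY_KEYWORD = {
    "CREATE": "DDL", "ALTER": "DDL", "DROP": "DDL",
    "INSERT": "DML", "UPDATE": "DML", "DELETE": "DML",
    "SELECT": "DQL",
}


def classify(statement):
    """Return 'DDL', 'DML', 'DQL', or 'other' based on the first keyword."""
    parts = statement.split()
    keyword = parts[0].upper() if parts else ""
    return CATEGORY_BY_KEYWORD.get(keyword, "other")


def group_by_category(statements):
    """Group statements into a dict keyed by category."""
    groups = {}
    for statement in statements:
        groups.setdefault(classify(statement), []).append(statement)
    return groups


if __name__ == "__main__":
    for category, items in group_by_category(STATEMENTS).items():
        print(category, len(items))
```

The keyword mapping is deliberately simplified. Real statements can begin with other keywords, and this sketch reports those as `other`.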

Users with development experience

If you have development experience, understand the distributed architecture, and want to obtain data analytics capabilities that SQL cannot deliver, we recommend that you familiarize yourself with advanced functional modules of MaxCompute.

| Module | Description |
| --- | --- |
| MapReduce | MaxCompute provides a MapReduce programming model with a Java API. You can use this API to write MapReduce programs that process data in MaxCompute. |
| Graph | Graph is a processing framework for iterative graph computing. A graph consists of vertices and edges, both of which contain values. MaxCompute Graph iteratively edits and evolves graphs to obtain analysis results. |
| Tunnel | MaxCompute Tunnel enables you to upload large amounts of data to MaxCompute or download large amounts of data from MaxCompute in batches. |
| SDK for Java | MaxCompute provides an SDK for Java for developers. |
| SDK for Python | MaxCompute provides an SDK for Python for developers. |
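
If the MapReduce programming model is new to you, the following plain Python sketch shows its three phases (map, shuffle, and reduce) on a word count. It does not use the MaxCompute MapReduce API, which is provided in Java. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the whole job runs in a single local process purely to illustrate the model.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: emit a (word, 1) pair for each word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values that share a key into one result."""
    return key, sum(values)


def word_count(records):
    """Run the map, shuffle, and reduce phases in sequence."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["MaxCompute stores data", "MaxCompute processes data"]))
```

In a distributed framework, the mappers and reducers run in parallel on different workers and the framework performs the shuffle. The local version above only mirrors the data flow.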

Project owners or administrators

If you are a project owner or administrator, we recommend that you familiarize yourself with the modules described in the following table. A project owner can create and use projects, and a project administrator can manage projects, security operations, and costs.

Module

Feature

Description

Project management

Prepare for project creation

A project is a basic organizational unit of MaxCompute. Similar to a database or schema in a traditional database system, a project is used to isolate users and control access requests. A user can have permissions on multiple projects. After a user is granted the related permissions, the user can access objects, such as tables, resources, functions, and instances, across projects. MaxCompute is used to manage various objects in projects. You must make the following preparations before you create a project:

  • Prepare your budget for resources

    You are charged for storage resources, computing resources, and resources for Internet-based data downloads.

    • Storage resources: You are charged for these resources based on the pay-as-you-go billing method and tiered unit prices. You can estimate their costs based on the volume of data stored. Data stored in MaxCompute changes all the time. As a result, the costs also change.

    • Computing resources: You are charged for these resources based on the pay-as-you-go or subscription billing method. It is difficult to estimate the amount of computing resources that you need at the beginning of your project. We recommend that you start with the pay-as-you-go billing method and then decide whether to switch to the subscription billing method based on your actual usage.

    • Resources for Internet-based data downloads: You are charged for these resources based on the pay-as-you-go billing method.

    For more information, see Storage pricing (pay-as-you-go), Computing pricing, and Download pricing (pay-as-you-go).

  • Create an account and activate the service

    Before you create a MaxCompute project, you must create an Alibaba Cloud account and activate MaxCompute. Bills are issued to the Alibaba Cloud account. After the account is created, you must choose the pay-as-you-go or subscription billing method based on your budget for the resources you require.

Create a project

For more information, see Create a MaxCompute project.

Manage project members

Members are managed based on member responsibilities and security requirements. If you use MaxCompute in the DataWorks console, you must understand the permission relationships between MaxCompute and DataWorks.

Manage RAM users

You can manage MaxCompute projects by using your Alibaba Cloud account or the credentials of a RAM user. You can add RAM users of your Alibaba Cloud account to a MaxCompute project. For more information about RAM users, see Prepare a RAM user.

If you manage MaxCompute projects and DataWorks workspaces in the DataWorks console, you can add only RAM users of your Alibaba Cloud account as members. Therefore, you must use your Alibaba Cloud account to create RAM users and manage these RAM users in the Resource Access Management (RAM) console.

Note
  • We recommend that you do not allow multiple project members to share one RAM user.

  • When a project member is transferred to a new position or resigns, you must delete the RAM user of the project member at the earliest opportunity. If a RAM user is added as a project member in the DataWorks console, delete the project member in the DataWorks console and then delete the RAM user in the RAM console.

Manage scheduling resources

You are required to manage the scheduling resources of DataWorks. These resources are used to execute or distribute the tasks that are delivered by the scheduling system. Scheduling resources of DataWorks are categorized into the following types:

  • Default scheduling resources. Default scheduling resources are the resources in the public resource pool of DataWorks. If the parallelism of DataWorks nodes is high and the scheduling resources are insufficient, the nodes wait for resources. After resources are allocated to the nodes, the nodes run the delivered tasks.

  • Custom scheduling resources. You can configure your ECS instance as a scheduling server to distribute tasks. You can use your Alibaba Cloud account to create custom scheduling resources. Scheduling resources include physical machines or ECS instances that are used to run tasks, such as data synchronization tasks.

Configure projects

Only the owner of a project has the permissions to configure the project. For example, the project owner can specify whether to enable full table scan and whether to enable the MaxCompute V2.0 data type edition. For more information, see Project operations.
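
As a rough illustration of the two settings mentioned above, the following Python sketch renders `setproject` commands that a project owner could run on the MaxCompute client. The property names `odps.sql.allow.fullscan` and `odps.sql.type.system.odps2` are assumptions based on commonly documented project properties, and the helper `render_setproject` is hypothetical. Confirm the property names and allowed values in Project operations before you use them.

```python
# Assumed property names; confirm them in the "Project operations" topic.
PROJECT_SETTINGS = {
    "odps.sql.allow.fullscan": False,     # whether full table scan is enabled
    "odps.sql.type.system.odps2": True,   # whether the V2.0 data type edition is enabled
}


def render_setproject(settings):
    """Render one 'setproject key=value;' command per property."""
    commands = []
    for key, value in settings.items():
        if isinstance(value, bool):
            # Render booleans in lowercase, as the commands are plain text.
            value = "true" if value else "false"
        commands.append(f"setproject {key}={value};")
    return commands


if __name__ == "__main__":
    for command in render_setproject(PROJECT_SETTINGS):
        print(command)
```

The sketch only builds the command text. It does not connect to MaxCompute or change any project.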

Cost management

None

Budgets for resources help you estimate costs before you use the resources. It is difficult to estimate the precise costs due to the different billing methods of MaxCompute. You must manage costs during the entire business development process.

  • For more information about pricing, see Overview.

  • You can switch between the pay-as-you-go and subscription billing methods. For more information, see Switch billing methods.
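
Because storage is billed at tiered unit prices and the stored volume changes over time, a small calculation can help you form a first estimate. The following Python sketch assumes progressive tiers, in which each portion of the stored volume is billed at the price of the tier that it falls into. The tier boundaries and prices in `HYPOTHETICAL_TIERS` are placeholders and not actual MaxCompute prices, and the function names are hypothetical. See the pricing topics for the real tiers and the exact billing rules.

```python
# Hypothetical tiers: (upper bound of the tier in GB, price per GB per day).
# These numbers are placeholders, NOT actual MaxCompute prices.
HYPOTHETICAL_TIERS = [
    (1024, 0.004),
    (10240, 0.003),
    (float("inf"), 0.002),
]


def tiered_daily_cost(stored_gb, tiers=HYPOTHETICAL_TIERS):
    """Return the cost of one day of storage under progressive tiers."""
    if stored_gb < 0:
        raise ValueError("stored_gb must not be negative")
    cost = 0.0
    lower = 0.0
    for upper, unit_price in tiers:
        if stored_gb <= lower:
            break
        # Bill only the portion of the volume that falls inside this tier.
        billable = min(stored_gb, upper) - lower
        cost += billable * unit_price
        lower = upper
    return cost


def estimate_period_cost(daily_volumes_gb, tiers=HYPOTHETICAL_TIERS):
    """Sum the daily costs for a sequence of daily stored volumes."""
    return sum(tiered_daily_cost(volume, tiers) for volume in daily_volumes_gb)


if __name__ == "__main__":
    # Example: the stored volume grows over three days.
    print(estimate_period_cost([500, 1500, 3000]))
```

Passing one volume per day reflects the point that costs change as the stored data changes. Replace the placeholder tiers with the published prices for your region to obtain a meaningful figure.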