Python on MaxCompute (PyODPS) is the MaxCompute SDK for Python. It lets you interact with MaxCompute and process data from Python code: you can develop MaxCompute jobs, analyze data, and manage MaxCompute resources. This topic describes how to use PyODPS.
Introduction to PyODPS
PyODPS provides a DataFrame API for data manipulation and methods for managing MaxCompute objects. It is compatible with Python 2 (version 2.6 or later) and Python 3.
You can find more information about PyODPS from the following resources:
Learn about PyODPS: PyODPS Documentation and PyODPS Community Resources.
Install PyODPS: PyODPS Installation Guide.
Develop with PyODPS: PyODPS Development Guide.
Initialization
Before you use PyODPS, you must initialize a MaxCompute client object with your Alibaba Cloud account credentials. Run the following code:
import os
from odps import ODPS
# Set the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET to the AccessKey ID and AccessKey secret of your Alibaba Cloud account.
o = ODPS(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='your-default-project',
    endpoint='your-end-point',
)
Parameters:
ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET: Your AccessKey ID and AccessKey secret. The associated RAM user or role must have the permissions required to manage objects within the MaxCompute project. You can create and view your credentials on the AccessKey page.
your-default-project: The name of your MaxCompute project. Find this name in the MaxCompute console under Workspace > Projects.
your-end-point: The endpoint of the region where your project is located.
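The parameters above can be checked before the client is created, so that a missing value fails with a clear message instead of an authentication error later. The following is a minimal sketch, not part of PyODPS: the helper names `load_settings` and `make_client` and the environment variables `MAXCOMPUTE_PROJECT` and `MAXCOMPUTE_ENDPOINT` are hypothetical and chosen only for illustration. Only the `ODPS(...)` call mirrors the initialization code in this topic.

```python
import os

# The two AccessKey variable names come from this topic. The project and
# endpoint variable names are hypothetical and used only in this sketch.
REQUIRED_VARS = (
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
    "MAXCOMPUTE_PROJECT",
    "MAXCOMPUTE_ENDPOINT",
)


def load_settings(env=None):
    """Return the four connection settings, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def make_client(settings):
    """Build the MaxCompute client object from validated settings."""
    # Imported here so that load_settings() can run without PyODPS installed.
    from odps import ODPS

    return ODPS(
        settings["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        settings["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
        project=settings["MAXCOMPUTE_PROJECT"],
        endpoint=settings["MAXCOMPUTE_ENDPOINT"],
    )
```

With all four variables set, `o = make_client(load_settings())` yields the same kind of client object as the initialization code above.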
Method descriptions
The following table describes common PyODPS methods for MaxCompute operations.
Item | Method | Description |
Projects | get_project(project_name) | Retrieves a MaxCompute project. |
Projects | exist_project(project_name) | Checks whether a MaxCompute project exists. |
Tables | list_tables() | Lists all tables in a MaxCompute project. |
Tables | exist_table(table_name) | Checks whether a table exists. |
Tables | get_table(table_name, project=project_name) | Retrieves a specified table. You can obtain a table from another MaxCompute project. |
Tables | create_table() | Creates a table. |
Tables | read_table() | Reads data from a table. |
Tables | write_table() | Writes data to a table. |
Tables | delete_table() | Deletes an existing table. |
Table partitions | exist_partition() | Checks whether a partition exists. |
Table partitions | get_partition() | Obtains information about a partition. |
Table partitions | create_partition() | Creates a partition. |
Table partitions | delete_partition() | Deletes an existing partition. |
SQL | execute_sql()/run_sql() | Executes SQL statements. |
SQL | open_reader() | Reads execution results of SQL statements. |
Instances | list_instances() | Lists all instances in a MaxCompute project. |
Instances | exist_instance() | Checks whether an instance exists. |
Instances | get_instance() | Obtains information about an instance. |
Instances | stop_instance() | Terminates an instance. |
Resources | create_resource() | Creates a resource. |
Resources | open_resource() | Opens a resource. |
Resources | get_resource() | Obtains information about a resource. |
Resources | list_resources() | Lists all existing resources. |
Resources | exist_resource() | Checks whether a resource exists. |
Resources | delete_resource() | Deletes an existing resource. |
Functions | create_function() | Creates a function. |
Functions | delete_function() | Deletes an existing function. |
Tunnel uploads and downloads | create_upload_session() | Creates a session that is used to upload data. |
Tunnel uploads and downloads | create_download_session() | Creates a session that is used to download data. |
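As an illustration of the SQL methods in the table above, the following sketch runs a query and reads its result. It assumes that `o` is an initialized client object and that `execute_sql()` waits for the statement to finish while `run_sql()` returns immediately, so the caller waits with `wait_for_success()`. The function names `count_rows` and `count_rows_async` are hypothetical, and the table name is concatenated into the statement only to keep the sketch short, so pass trusted names only.

```python
def _first_value(instance):
    """Return the first column of the first result record, or None."""
    # open_reader() exposes the result set of a finished SQL instance.
    with instance.open_reader() as reader:
        for record in reader:
            return record[0]
    return None


def count_rows(o, table_name):
    """Count rows with execute_sql(), which blocks until the job is done."""
    instance = o.execute_sql("SELECT COUNT(*) FROM " + table_name)
    return _first_value(instance)


def count_rows_async(o, table_name):
    """Count rows with run_sql(), then wait explicitly for completion."""
    instance = o.run_sql("SELECT COUNT(*) FROM " + table_name)
    # Other work could happen here before waiting for the job.
    instance.wait_for_success()
    return _first_value(instance)
```

The asynchronous form is useful when several statements should be submitted before any of their results are needed.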
The create_table(), read_table(), write_table(), and delete_table() methods require parameters. For more information, see Examples of using the SDK for Python: tables.
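To show how these parameters fit together, the following sketch creates a small table, writes two rows, reads them back, and deletes the table. It assumes that `o` is an initialized client object. The function name `round_trip`, the table name `pyodps_demo_tmp`, and the column layout are hypothetical, and the parameter forms used here (a schema string for `create_table()`, a list of rows for `write_table()`, `if_exists` for `delete_table()`) should be confirmed against the referenced topic for your PyODPS version.

```python
def round_trip(o, table_name="pyodps_demo_tmp"):
    """Create a scratch table, write two rows, read them back, drop it."""
    # Refuse to touch a table that already exists, because it is deleted below.
    if o.exist_table(table_name):
        raise ValueError("Table already exists: " + table_name)

    # The schema is passed as a "column type, column type" string.
    o.create_table(table_name, "id bigint, name string")
    try:
        # Each inner list is one row, in column order.
        o.write_table(table_name, [[1, "alice"], [2, "bob"]])
        # read_table() yields records; columns are read here by position.
        rows = [(record[0], record[1]) for record in o.read_table(table_name)]
    finally:
        # Clean up even if the write or the read fails.
        o.delete_table(table_name, if_exists=True)
    return rows
```

Calling `round_trip(o)` against a test project is a quick way to confirm that the account has permission to create, write, read, and delete tables.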