MaxFrame-specific APIs - MaxCompute - Alibaba Cloud Documentation Center

This topic describes several types of APIs that are specific to MaxFrame, including session, input/output, execute, and fetch APIs. These APIs are used to process data in MaxFrame tasks.

Session-related API

new_session

API name: new_session. For more information about the source code, see new_session.

new_session(
  session_id: str = None,
  default: bool = True,
  new: bool = True,
  odps_entry: Optional[ODPS] = None
)

Description: starts a MaxFrame task session.

Input parameters

Parameter	Data type	Required	Description
session_id	String	No	The session identifier. This parameter is used to specify a unique identifier for a new session. If this parameter is not specified, MaxFrame automatically generates a default identifier.
default	Boolean	No	Specifies whether to use the created session as the default session. Default value: True.
new	Boolean	No	Specifies whether to create a session. Default value: True. If this parameter is set to False, an existing session is reused based on session_id.
odps_entry	ODPS	Yes	The MaxCompute entry object. For more information, see Create a MaxCompute entry point.

Return value
The session object.

Sample code

import os

from maxframe import new_session
from odps import ODPS

# Use the MaxFrame account to initialize MaxCompute.
o = ODPS(
    # Set the environment variable ALIBABA_CLOUD_ACCESS_KEY_ID to the AccessKey ID of your Alibaba Cloud account. 
    # Set the environment variable ALIBABA_CLOUD_ACCESS_KEY_SECRET to the AccessKey secret of your Alibaba Cloud account. 
    # We recommend that you do not directly use the actual AccessKey ID and AccessKey secret. 
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='your-default-project',
    endpoint='your-end-point',
)

# Initialize the MaxFrame session.
session = new_session(odps_entry=o)
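
The sample above reads the AccessKey pair from environment variables. The following is a minimal sketch of how you might check those variables before a session is created, so that a missing credential is reported early. The helper names `load_odps_settings` and `create_session` are hypothetical and are not part of MaxFrame. The project and endpoint values are placeholders that you supply.

```python
import os


def load_odps_settings(project, endpoint):
    """Collect the ODPS constructor arguments and fail early if a credential is missing."""
    names = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "access_id": os.environ[names[0]],
        "secret_access_key": os.environ[names[1]],
        "project": project,
        "endpoint": endpoint,
    }


def create_session(settings):
    """Create a MaxFrame session from the settings (hypothetical helper)."""
    # Imported lazily so that the credential check works without these packages.
    from maxframe import new_session
    from odps import ODPS

    entry = ODPS(
        settings["access_id"],
        settings["secret_access_key"],
        project=settings["project"],
        endpoint=settings["endpoint"],
    )
    return new_session(odps_entry=entry)
```

Calling `create_session(load_odps_settings('your-default-project', 'your-end-point'))` would then mirror the sample above.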

Input/Output-related APIs

read_odps_table

API name: read_odps_table. For more information about the source code, see read_odps_table.

read_odps_table(
  table_name: Union[str, Table],
  partitions: Union[None, str, List[str]] = None,
  columns: Optional[List[str]] = None,
  index_col: Union[None, str, List[str]] = None,
  odps_entry: ODPS = None,
  string_as_binary: bool = None,
  append_partitions: bool = False
)

Description: reads data from a MaxCompute table and builds a DataFrame object. You can use one or more columns as the index. If you do not specify an index, a RangeIndex is generated.

Input parameters

Parameter	Data type	Required	Description
table_name	String/Table	Yes	The name of the MaxCompute table or table object from which you want to read data.
partitions	String/List	No	The table partition or partition list from which you want to read data. The value of this parameter is in the `<partition_name>=<partition_value>` format. If this parameter is not specified, data of all partitions in the table is read.
columns	List	No	The names of columns from which you want to read data. The value of this parameter is in the `<column1>, <column2>, ...` format. If this parameter is not specified, data of all columns in the table except the partition key columns is read.
index_col	String/List	No	The names of columns that are used as indexes.
odps_entry	ODPS	No	The MaxCompute entry object. For more information, see Create a MaxCompute entry point.
string_as_binary	Boolean	No	Specifies whether to read string data in the binary form.
append_partitions	Boolean	No	Specifies whether to read data from partition key columns. Default value: False. If this parameter is set to True, data is read from all columns, including partition key columns, when the `columns` parameter is not specified.

Return value
The DataFrame object.

Sample code

import maxframe.dataframe as md

df = md.read_odps_table('BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users', index_col='user_id', columns=['age', 'sex'])
print(df.execute().fetch())

# Return value
user_id   age  sex
1         24   M
2         53   F
3         23   M
4         24   M
5         33   F
...       ...  ...
939       26   F
940       32   M
941       20   M
942       48   F
943       22   M
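
The `partitions` parameter accepts a single string or a list of strings in the `<partition_name>=<partition_value>` format. The following sketch shows one way to build such a list and pass it together with `append_partitions=True`. The helper names are hypothetical, and the partition column name and values in the usage note are illustrative assumptions rather than values from a real table.

```python
def partition_specs(column, values):
    """Build one `<partition_name>=<partition_value>` string per value."""
    return [f"{column}={value}" for value in values]


def read_partitions(table_name, column, values):
    """Read the selected partitions and keep the partition key column in the result."""
    # Imported lazily; requires maxframe and an initialized default session.
    import maxframe.dataframe as md

    return md.read_odps_table(
        table_name,
        partitions=partition_specs(column, values),
        append_partitions=True,
    )
```

For example, `partition_specs('ds', ['20240101', '20240102'])` produces the list that `read_partitions` passes to `read_odps_table`.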

read_odps_query

API name: read_odps_query. For more information about the source code, see read_odps_query.

read_odps_query(
  query: str,
  odps_entry: ODPS = None,
  index_col: Union[None, str, List[str]] = None,
  string_as_binary: bool = None
)

Description: runs a MaxCompute SQL query and builds a DataFrame object from its result. You can use one or more columns as the index. If you do not specify an index, a RangeIndex is generated.

Input parameters

Parameter	Data type	Required	Description
query	String	Yes	The MaxCompute SQL statement whose result you want to read.
odps_entry	ODPS	No	The MaxCompute entry object. For more information, see Create a MaxCompute entry point.
index_col	String/List	No	The names of columns that are used as indexes.
string_as_binary	Boolean	No	Specifies whether to read string data in the binary form.

Return value
The DataFrame object.

Sample code

import maxframe.dataframe as md

df = md.read_odps_query('select user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`')
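
A column passed as `index_col` has to be part of the query result. The following sketch builds the SELECT statement from a column list and checks this before calling `read_odps_query`. The helper names `build_select` and `read_query_indexed` are hypothetical, and the check is an illustrative precaution, not documented MaxFrame behavior.

```python
def build_select(table_name, columns, index_col=None):
    """Build a SELECT statement and check that the index column is selected."""
    if not columns:
        raise ValueError("columns must not be empty")
    if index_col is not None and index_col not in columns:
        raise ValueError(f"index_col {index_col!r} is not in the selected columns")
    # The table name is wrapped in backticks, as in the sample above.
    return f"SELECT {', '.join(columns)} FROM `{table_name}`"


def read_query_indexed(table_name, columns, index_col):
    """Run the generated query and use index_col as the DataFrame index."""
    # Imported lazily; requires maxframe and an initialized default session.
    import maxframe.dataframe as md

    query = build_select(table_name, columns, index_col)
    return md.read_odps_query(query, index_col=index_col)
```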

to_odps_table

API name: to_odps_table. For more information about the source code, see to_odps_table.

to_odps_table(
  table: Union[Table, str],
  partition: Optional[str] = None,
  partition_col: Union[None, str, List[str]] = None,
  overwrite: bool = False,
  unknown_as_string: Optional[bool] = None,
  index: bool = True,
  index_label: Union[None, str, List[str]] = None,
  lifecycle: Optional[int] = None
)

Description: writes a DataFrame object to a MaxCompute table. If the table does not exist in MaxCompute, MaxFrame automatically creates the table.

Input parameters

Parameter	Data type	Required	Description
table	String/Table	Yes	The name of the table or table object to which you want to write DataFrame data.
partition	String	No	The partition to which you want to write data. Example: `pt1=xxx, pt2=yyy`.
partition_col	String/List	No	The names of columns that are used as partition key columns in DataFrame.
overwrite	Boolean	No	Specifies whether to overwrite data if the table or partition already exists. Default value: False.
unknown_as_string	Boolean	No	Specifies whether to process data of an unrecognized type as the STRING data type. Default value: False. If this parameter is set to True, columns of the object type in the DataFrame are written as the STRING data type. If this parameter is set to False, such columns may cause an error.
index	Boolean	No	Specifies whether to store indexes. Default value: True.
index_label	String/List	No	The names of the index columns in the output table. If this parameter is not specified, a one-level index is named index, and each level of a multi-level index is named level_x, where x is the level of the index.
lifecycle	int	No	The lifecycle of the output table, in days. The value must be a positive integer. If the table already exists, this value overwrites the existing lifecycle setting of the table.

Return value
The DataFrame object.

Sample code

import maxframe.dataframe as md

df = md.read_odps_query('select user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`', index_col='user_id')
output_df = df.to_odps_table('output_table', lifecycle=7)
# Call execute to start the write task.
output_df.execute()
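
The default naming of index columns described for `index_label` can be sketched as a small pure-Python function. This is an illustration of the rule as stated in the parameter table, not MaxFrame's implementation. The function name is hypothetical, and numbering the levels from 0 is an assumption based on the pandas convention.

```python
def default_index_labels(n_levels, index_label=None):
    """Return the column names under which index levels would be stored."""
    if n_levels < 1:
        raise ValueError("n_levels must be at least 1")
    if index_label is not None:
        # A single string names a one-level index; a list names each level.
        labels = [index_label] if isinstance(index_label, str) else list(index_label)
        if len(labels) != n_levels:
            raise ValueError("index_label must provide one name per index level")
        return labels
    if n_levels == 1:
        return ["index"]
    # Assumption: levels are numbered from 0, as in pandas.
    return [f"level_{i}" for i in range(n_levels)]
```

To avoid the default names, pass `index_label` explicitly, for example `df.to_odps_table('output_table', index_label='user_id')`.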

Execute

execute

API name: execute. For more information about the source code, see execute.
```
execute(
  session: SessionType = None
)
```
Description: calls the execute method to start a data processing task.

Input parameters

Parameter	Data type	Required	Description
session	Session	No	The session that is used to run a data processing task. For more information about how to create a session, see new_session. If this parameter is not specified, the global session initialized by using new_session is used.

Return value
N/A.

Sample code

import maxframe.dataframe as md

df = md.read_odps_query('select user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`', index_col='user_id')
df.execute()
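
When you work with more than one session, you can pass the session explicitly instead of relying on the global one. The following sketch runs several objects under the same session. The helper name `execute_all` is hypothetical. It only relies on the `session` parameter of `execute` described above.

```python
def execute_all(frames, session=None):
    """Call execute on each object; session=None falls back to the global session."""
    frames = list(frames)
    for frame in frames:
        frame.execute(session=session)
    # Return the number of objects that were executed.
    return len(frames)
```

For example, `execute_all([df1, df2], session=my_session)` starts both tasks in `my_session`, where `df1`, `df2`, and `my_session` are objects that you created earlier.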

Fetch

fetch

API name: fetch. For more information about the source code, see fetch.
```
fetch(
  session: SessionType = None
)
```
Description: fetches the result data to the local environment.

Input parameters

Parameter	Data type	Required	Description
session	Session	No	The session that is used to obtain the result data. For more information about how to create a session, see new_session. If this parameter is not specified, the global session initialized by using new_session is used.

Return value
A pandas DataFrame or Series.

Sample code

import maxframe.dataframe as md

df = md.read_odps_query('select user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`', index_col='user_id')
res = df.execute().fetch()
print(res)

# Obtain the returned result.      
user_id   age  sex
1         24   M
2         53   F
3         23   M
4         24   M
5         33   F
...      ...  ..
939       26   F
940       32   M
941       20   M
942       48   F
943       22   M

[943 rows x 2 columns]
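
The `execute().fetch()` chain from the sample can be wrapped in a small helper so that both calls use the same session. This is a minimal sketch. The helper name `materialize` is hypothetical, and the chained call follows the sample code above.

```python
def materialize(df, session=None):
    """Run the task and fetch its result, using one session for both steps."""
    # session=None makes both calls fall back to the global session.
    return df.execute(session=session).fetch(session=session)
```

Because the fetched result is loaded into local memory, consider fetching only results that are small enough for the local environment.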