This topic describes several types of APIs that are specific to MaxFrame, including session, input/output, execute, and fetch APIs. These APIs are used to process data in MaxFrame tasks.
Session-related API
new_session
API name: new_session. For more information about the source code, see new_session.
new_session( session_id: str = None, default: bool = True, new: bool = True, odps_entry: Optional[ODPS] = None )
Description: starts a MaxFrame task session.
Input parameters
| Parameter | Data type | Required | Description |
| --- | --- | --- | --- |
| session_id | String | No | The session identifier. This parameter is used to specify a unique identifier for a new session. If this parameter is not specified, MaxFrame automatically generates a default identifier. |
| default | Boolean | No | Specifies whether to use the created session as the default session. Default value: True. |
| new | Boolean | No | Specifies whether to create a session. Default value: True. If this parameter is set to False, an existing session is reused based on session_id. |
| odps_entry | ODPS | Yes | The MaxCompute entry object. For more information, see Create a MaxCompute entry point. |

Return value
The session object.
Sample code
    import os

    from maxframe import new_session
    from odps import ODPS

    # Use the MaxFrame account to initialize MaxCompute.
    o = ODPS(
        # Set the environment variable ALIBABA_CLOUD_ACCESS_KEY_ID to the AccessKey ID of your Alibaba Cloud account.
        # Set the environment variable ALIBABA_CLOUD_ACCESS_KEY_SECRET to the AccessKey secret of your Alibaba Cloud account.
        # We recommend that you do not hard-code the actual AccessKey ID and AccessKey secret.
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
        project='your-default-project',
        endpoint='your-end-point',
    )

    # Initialize the MaxFrame session.
    session = new_session(odps_entry=o)
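The sample above reads the AccessKey pair from environment variables, and `os.getenv` silently returns `None` when a variable is unset. The following is a minimal sketch of checking both variables before the ODPS object is built, so that a missing variable fails early with a clear message. `load_access_key` is a hypothetical helper written for illustration; it is not part of MaxFrame or PyODPS.

```python
import os
from typing import Mapping, Tuple


# Hypothetical helper (not part of MaxFrame or PyODPS): reads the AccessKey
# pair from the environment variables named in the sample above.
def load_access_key(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    names = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    # Treat unset and empty variables the same way.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]
```

The returned pair can be passed as the first two positional arguments of `ODPS(...)`, in the same positions as the `os.getenv` calls in the sample.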
Input/Output-related APIs
read_odps_table
API name: read_odps_table. For more information about the source code, see read_odps_table.
read_odps_table( table_name: Union[str, Table], partitions: Union[None, str, List[str]] = None, columns: Optional[List[str]] = None, index_col: Union[None, str, List[str]] = None, odps_entry: ODPS = None, string_as_binary: bool = None, append_partitions: bool = False )
Description: reads data from a MaxCompute table and builds a DataFrame object. You can specify specific columns as indexes. If you do not specify indexes, a RangeIndex is generated.
Input parameters
| Parameter | Data type | Required | Description |
| --- | --- | --- | --- |
| table_name | String/Table | Yes | The name of the MaxCompute table or table object from which you want to read data. |
| partitions | String/List | No | The table partition or partition list from which you want to read data. The value of this parameter is in the `<partition_name>=<partition_value>` format. If this parameter is not specified, data of all partitions in the table is read. |
| columns | List | No | The names of columns from which you want to read data. The value of this parameter is in the `<column1>, <column2>, ...` format. If this parameter is not specified, data of all columns in the table except the partition key columns is read. |
| index_col | String/List | No | The names of columns that are used as indexes. |
| odps_entry | ODPS | No | The MaxCompute entry object. For more information, see Create a MaxCompute entry point. |
| string_as_binary | Boolean | No | Specifies whether to read string data in the binary form. |
| append_partitions | Boolean | No | Specifies whether to read data from partition key columns. Default value: False. If this parameter is set to True, data is read from all columns, including partition key columns, when the columns parameter is not specified. |

Return value
The DataFrame object.
Sample code
    import maxframe.dataframe as md

    df = md.read_odps_table(
        'BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users',
        index_col='user_id',
        columns=['age', 'sex'],
    )
    print(df.execute().fetch())

    # Return value
    # user_id  age  sex
    # 1         24    M
    # 2         53    F
    # 3         23    M
    # 4         24    M
    # 5         33    F
    # ...      ...  ...
    # 939       26    F
    # 940       32    M
    # 941       20    M
    # 942       48    F
    # 943       22    M
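The partitions parameter expects values in the `<partition_name>=<partition_value>` format. The sketch below builds such strings from key/value pairs. `partition_spec` is a hypothetical helper, not a MaxFrame function, and joining multiple partition levels with commas is an assumption based on the `pt1=xxx, pt2=yyy` example given for the partition parameter of to_odps_table. The names `pt1` and `pt2` are placeholders.

```python
from typing import Mapping


# Hypothetical helper (not part of MaxFrame): formats partition key/value
# pairs as "<partition_name>=<partition_value>", joined by commas.
def partition_spec(parts: Mapping[str, str]) -> str:
    if not parts:
        raise ValueError("at least one partition key is required")
    return ",".join(f"{name}={value}" for name, value in parts.items())


# Build a list of partition strings, one per partition to read.
specs = [
    partition_spec({"pt1": "xxx", "pt2": "yyy"}),
    partition_spec({"pt1": "zzz"}),
]
print(specs)  # ['pt1=xxx,pt2=yyy', 'pt1=zzz']
```

A single string or a list such as `specs` matches the String/List type that the partitions parameter accepts.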
read_odps_query
API name: read_odps_query. For more information about the source code, see read_odps_query.
read_odps_query( query: str, odps_entry: ODPS = None, index_col: Union[None, str, List[str]] = None, string_as_binary: bool = None )
Description: reads data from a MaxCompute SQL query and creates a DataFrame object. You can specify specific columns as indexes. If you do not specify indexes, a RangeIndex is generated.
Input parameters
| Parameter | Data type | Required | Description |
| --- | --- | --- | --- |
| query | String | Yes | The MaxCompute SQL statement whose result you want to read. |
| odps_entry | ODPS | No | The MaxCompute entry object. For more information, see Create a MaxCompute entry point. |
| index_col | String/List | No | The names of columns that are used as indexes. |
| string_as_binary | Boolean | No | Specifies whether to read string data in the binary form. |

Return value
The DataFrame object.
Sample code
    import maxframe.dataframe as md

    df = md.read_odps_query('SELECT user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`')
to_odps_table
API name: to_odps_table. For more information about the source code, see to_odps_table.
to_odps_table( table: Union[Table, str], partition: Optional[str] = None, partition_col: Union[None, str, List[str]] = None, overwrite: bool = False, unknown_as_string: Optional[bool] = None, index: bool = True, index_label: Union[None, str, List[str]] = None, lifecycle: Optional[int] = None )
Description: writes a DataFrame object to a MaxCompute table. If the table does not exist in MaxCompute, MaxFrame automatically creates the table.
Input parameters
| Parameter | Data type | Required | Description |
| --- | --- | --- | --- |
| table | String/Table | Yes | The name of the table or table object to which you want to write DataFrame data. |
| partition | String | No | The partition to which you want to write data. Example: `pt1=xxx, pt2=yyy`. |
| partition_col | String/List | No | The names of the DataFrame columns that are used as partition key columns. |
| overwrite | Boolean | No | Specifies whether to overwrite data if the table or partition already exists. Default value: False. |
| unknown_as_string | Boolean | No | Specifies whether to process data of an unrecognized type as the STRING data type. Default value: False. If this parameter is set to True, data of the object type in the DataFrame is processed as the STRING data type. Otherwise, an error may occur. |
| index | Boolean | No | Specifies whether to store indexes. Default value: True. |
| index_label | String/List | No | The names of the columns in which the index is stored. If this parameter is not specified, a single-level index is stored in a column named index, and each level of a multi-level index is stored in a column named level_x, where x is the level of the index. |
| lifecycle | Integer | No | The lifecycle of the output table, in days. The value must be a positive integer. If the table already exists, this value overwrites the existing lifecycle of the table. |

Return value
The DataFrame object.
Sample code
    import maxframe.dataframe as md

    df = md.read_odps_query(
        'SELECT user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`',
        index_col='user_id',
    )
    # Write the DataFrame to output_table and set the table lifecycle to 7.
    output_df = df.to_odps_table('output_table', lifecycle=7)
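The default naming rule for index columns described for index_label can be sketched in plain Python. `resolve_index_labels` is a hypothetical function that only illustrates the rule; it is not MaxFrame code. Numbering the levels from 0 (level_0, level_1, ...) is an assumption borrowed from the pandas `reset_index` convention, because the description above says only that x is the level of the index.

```python
from typing import List, Sequence, Union


# Hypothetical illustration of the naming rule (not MaxFrame code).
def resolve_index_labels(
    nlevels: int,
    index_label: Union[None, str, Sequence[str]] = None,
) -> List[str]:
    if nlevels < 1:
        raise ValueError("an index has at least one level")
    if index_label is None:
        # Default names: "index" for one level, "level_x" for several.
        # Assumption: x starts at 0, as in pandas reset_index.
        if nlevels == 1:
            return ["index"]
        return [f"level_{i}" for i in range(nlevels)]
    # Explicit labels: accept a single string or a sequence of strings.
    labels = [index_label] if isinstance(index_label, str) else list(index_label)
    if len(labels) != nlevels:
        raise ValueError(f"expected {nlevels} labels, got {len(labels)}")
    return labels


print(resolve_index_labels(1))             # ['index']
print(resolve_index_labels(2))             # ['level_0', 'level_1']
print(resolve_index_labels(1, "user_id"))  # ['user_id']
```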
Execute
execute
API name: execute. For more information about the source code, see execute.
execute( session: SessionType = None )
Description: starts the data processing task defined by a MaxFrame object, such as a DataFrame.
Input parameters
| Parameter | Data type | Required | Description |
| --- | --- | --- | --- |
| session | Session | No | The session that is used to run a data processing task. For more information about how to create a session, see new_session. If this parameter is not specified, the global session initialized by using new_session is used. |

Return value
N/A.
Sample code
    import maxframe.dataframe as md

    df = md.read_odps_query(
        'SELECT user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`',
        index_col='user_id',
    )
    df.execute()
Fetch
fetch
API name: fetch. For more information about the source code, see fetch.
fetch( session: SessionType = None )
Description: returns the result data of an executed task to the local environment.
Input parameters
| Parameter | Data type | Required | Description |
| --- | --- | --- | --- |
| session | Session | No | The session that is used to obtain the result data. For more information about how to create a session, see new_session. If this parameter is not specified, the global session initialized by using new_session is used. |

Return value
A pandas DataFrame or Series.
Sample code
    import maxframe.dataframe as md

    df = md.read_odps_query(
        'SELECT user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`',
        index_col='user_id',
    )
    res = df.execute().fetch()
    print(res)

    # Obtain the returned result.
    # user_id  age  sex
    # 1         24    M
    # 2         53    F
    # 3         23    M
    # 4         24    M
    # 5         33    F
    # ...      ...  ...
    # 939       26    F
    # 940       32    M
    # 941       20    M
    # 942       48    F
    # 943       22    M
    #
    # [943 rows x 2 columns]
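The samples chain `execute()` and `fetch()`. The toy model below sketches why that order matters, under the assumption that the work is deferred until execute is called and that fetch only hands back an already computed result. `LazyValue` is a hypothetical stand-in written for illustration; it is not a MaxFrame class, and raising an error when fetch is called first is a choice made for this sketch, not documented MaxFrame behavior.

```python
from typing import Callable, List, Optional


class LazyValue:
    """Toy stand-in for a deferred computation; not a MaxFrame class."""

    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute
        self._result: Optional[List[int]] = None
        self.executed = False

    def execute(self) -> "LazyValue":
        # Run the deferred computation once and keep the result.
        if not self.executed:
            self._result = self._compute()
            self.executed = True
        # Returning self is what makes execute().fetch() chaining possible.
        return self

    def fetch(self) -> List[int]:
        # Hand back a copy of the stored result; nothing is computed here.
        if not self.executed:
            raise RuntimeError("call execute() before fetch()")
        return list(self._result)


ages = LazyValue(lambda: [24, 53, 23])
print(ages.execute().fetch())  # [24, 53, 23]
```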