MaxCompute: MaxFrame-specific APIs

Last Updated: Jul 22, 2024

This topic describes several types of APIs that are specific to MaxFrame, including session, input/output, execute, and fetch APIs. These APIs are used to process data in MaxFrame tasks.

Session-related API

new_session

  • API name: new_session. For more information about the source code, see new_session.

    new_session(
      session_id: str = None,
      default: bool = True,
      new: bool = True,
      odps_entry: Optional[ODPS] = None
    )
  • Description: starts a MaxFrame task session.

  • Input parameters

    | Parameter | Data type | Required | Description |
    | --- | --- | --- | --- |
    | session_id | String | No | The session identifier. This parameter specifies a unique identifier for a new session. If this parameter is not specified, MaxFrame automatically generates a default identifier. |
    | default | Boolean | No | Specifies whether to use the created session as the default session. Default value: True. |
    | new | Boolean | No | Specifies whether to create a session. Default value: True. If this parameter is set to False, an existing session is reused based on session_id. |
    | odps_entry | ODPS | Yes | The MaxCompute entry object. For more information, see Create a MaxCompute entry point. |

  • Return value

    The session object.

  • Sample code

    import os

    from maxframe import new_session
    from odps import ODPS
    
    # Use the MaxFrame account to initialize MaxCompute.
    o = ODPS(
        # Set the environment variable ALIBABA_CLOUD_ACCESS_KEY_ID to the AccessKey ID of your Alibaba Cloud account. 
        # Set the environment variable ALIBABA_CLOUD_ACCESS_KEY_SECRET to the AccessKey secret of your Alibaba Cloud account. 
        # We recommend that you do not directly use the actual AccessKey ID and AccessKey secret. 
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
        os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
        project='your-default-project',
        endpoint='your-end-point',
    )
    
    # Initialize the MaxFrame session.
    session = new_session(odps_entry=o)
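
The sample above reads the AccessKey pair with os.getenv, which silently returns None when a variable is unset. The following is a minimal sketch of a stricter loader that fails early with a clear message. The helper names require_env and load_access_key are hypothetical and are not part of MaxFrame or PyODPS; only the two environment variable names come from the sample above.

```python
import os
from typing import Tuple


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def load_access_key() -> Tuple[str, str]:
    """Read the AccessKey ID and secret from the variables used in the sample above.

    Hypothetical helper for illustration; not a MaxFrame or PyODPS API.
    """
    return (
        require_env("ALIBABA_CLOUD_ACCESS_KEY_ID"),
        require_env("ALIBABA_CLOUD_ACCESS_KEY_SECRET"),
    )
```

The returned pair can then be passed as the first two positional arguments of ODPS(...), in the same way as the os.getenv calls in the sample above.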

Input/Output-related APIs

read_odps_table

  • API name: read_odps_table. For more information about the source code, see read_odps_table.

    read_odps_table(
      table_name: Union[str, Table],
      partitions: Union[None, str, List[str]] = None,
      columns: Optional[List[str]] = None,
      index_col: Union[None, str, List[str]] = None,
      odps_entry: ODPS = None,
      string_as_binary: bool = None,
      append_partitions: bool = False
    )
  • Description: reads data from a MaxCompute table and builds a DataFrame object. You can specify one or more columns as the index. If you do not specify an index, a RangeIndex is generated.

  • Input parameters

    | Parameter | Data type | Required | Description |
    | --- | --- | --- | --- |
    | table_name | String/Table | Yes | The name of the MaxCompute table or table object from which you want to read data. |
    | partitions | String/List | No | The table partition or partition list from which you want to read data. The value of this parameter is in the <partition_name>=<partition_value> format. If this parameter is not specified, data of all partitions in the table is read. |
    | columns | List | No | The names of columns from which you want to read data. The value of this parameter is in the <column1>, <column2>, ... format. If this parameter is not specified, data of all columns in the table except the partition key columns is read. |
    | index_col | String/List | No | The names of columns that are used as indexes. |
    | odps_entry | ODPS | No | The MaxCompute entry object. For more information, see Create a MaxCompute entry point. |
    | string_as_binary | Boolean | No | Specifies whether to read string data in the binary form. |
    | append_partitions | Boolean | No | Specifies whether to read data from partition key columns. Default value: False. If this parameter is set to True, data is read from all columns, including partition key columns, when the columns parameter is not specified. |

  • Return value

    The DataFrame object.

  • Sample code

    import maxframe.dataframe as md
    
    df = md.read_odps_table('BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users', index_col='user_id', columns=['age', 'sex'])
    print(df.execute().fetch())
    
    # Return value		
    user_id	age   sex
    1	24	M
    2	53	F
    3	23	M
    4	24	M
    5	33	F
    ...	...	...
    939	26	F
    940	32	M
    941	20	M
    942	48	F
    943	22	M
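
The partitions parameter expects strings in the <partition_name>=<partition_value> format. The following is a minimal sketch of building such strings from key/value pairs. The helper format_partition_spec is hypothetical and not part of MaxFrame, and joining multi-level partitions with commas is an assumption based on the pt1=xxx, pt2=yyy example given for to_odps_table on this page.

```python
from typing import Mapping


def format_partition_spec(parts: Mapping[str, object]) -> str:
    """Join partition key/value pairs into a 'name=value,name=value' string.

    Hypothetical helper for illustration; not a MaxFrame API.
    """
    if not parts:
        raise ValueError("at least one partition key is required")
    return ",".join(f"{name}={value}" for name, value in parts.items())


# Build a list that could be passed as partitions=... to read_odps_table.
specs = [
    format_partition_spec({"pt": "20240101"}),
    format_partition_spec({"pt1": "xxx", "pt2": "yyy"}),
]
print(specs)  # ['pt=20240101', 'pt1=xxx,pt2=yyy']
```

Keeping the pairs in a mapping until the last moment avoids hand-written specification strings with missing or misplaced separators.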

read_odps_query

  • API name: read_odps_query. For more information about the source code, see read_odps_query.

    read_odps_query(
      query: str,
      odps_entry: ODPS = None,
      index_col: Union[None, str, List[str]] = None,
      string_as_binary: bool = None
    )
  • Description: reads the result of a MaxCompute SQL query and creates a DataFrame object. You can specify one or more columns as the index. If you do not specify an index, a RangeIndex is generated.

  • Input parameters

    | Parameter | Data type | Required | Description |
    | --- | --- | --- | --- |
    | query | String | Yes | The MaxCompute SQL query whose result you want to read. |
    | odps_entry | ODPS | No | The MaxCompute entry object. For more information, see Create a MaxCompute entry point. |
    | index_col | String/List | No | The names of columns that are used as indexes. |
    | string_as_binary | Boolean | No | Specifies whether to read string data in the binary form. |

  • Return value

    The DataFrame object.

  • Sample code

    import maxframe.dataframe as md
    
    df = md.read_odps_query('select user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`')

to_odps_table

  • API name: to_odps_table. For more information about the source code, see to_odps_table.

    to_odps_table(
      table: Union[Table, str],
      partition: Optional[str] = None,
      partition_col: Union[None, str, List[str]] = None,
      overwrite: bool = False,
      unknown_as_string: Optional[bool] = None,
      index: bool = True,
      index_label: Union[None, str, List[str]] = None,
      lifecycle: Optional[int] = None
    )
  • Description: writes a DataFrame object to a MaxCompute table. If the table does not exist in MaxCompute, MaxFrame automatically creates the table.

  • Input parameters

    | Parameter | Data type | Required | Description |
    | --- | --- | --- | --- |
    | table | String/Table | Yes | The name of the table or table object to which you want to write DataFrame data. |
    | partition | String | No | The partition to which you want to write data. Example: pt1=xxx, pt2=yyy. |
    | partition_col | String/List | No | The names of the DataFrame columns that are used as partition key columns. |
    | overwrite | Boolean | No | Specifies whether to overwrite data if the table or partition already exists. Default value: False. |
    | unknown_as_string | Boolean | No | Specifies whether to process data of an unrecognized type as the STRING data type. Default value: False. If this parameter is set to True, data of the object type in the DataFrame is processed as the STRING data type, which may cause errors. |
    | index | Boolean | No | Specifies whether to store indexes. Default value: True. |
    | index_label | String/List | No | The names of the index columns in the output table. If this parameter is not specified, a single-level index is named index, and each level of a multi-level index is named level_x, where x is the level of the index. |
    | lifecycle | int | No | The lifecycle of the output table. The value must be a positive integer. If the table already exists, this value overwrites the existing lifecycle setting. |

  • Return value

    The DataFrame object.

  • Sample code

    import maxframe.dataframe as md
    
    df = md.read_odps_query('select user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`', index_col='user_id')
    output_df = df.to_odps_table('output_table', lifecycle=7)
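
The default naming rule described for index_label can be sketched as a small pure-Python function. This is an illustration of the rule as documented above, not MaxFrame's implementation: the function name resolve_index_labels is hypothetical, and numbering the levels from 0 (as pandas does) is an assumption, because the text above does not state where x starts.

```python
from typing import List, Sequence, Union


def resolve_index_labels(
    n_levels: int,
    index_label: Union[None, str, Sequence[str]] = None,
) -> List[str]:
    """Return the column names that would hold the index levels.

    Hypothetical helper for illustration; not a MaxFrame API.
    """
    if n_levels < 1:
        raise ValueError("an index has at least one level")
    if index_label is not None:
        # Explicit labels win; a single string names a one-level index.
        labels = [index_label] if isinstance(index_label, str) else list(index_label)
        if len(labels) != n_levels:
            raise ValueError("index_label must provide one name per index level")
        return labels
    if n_levels == 1:
        return ["index"]
    # Assumption: levels are numbered from 0, following the pandas convention.
    return [f"level_{i}" for i in range(n_levels)]


print(resolve_index_labels(1))             # ['index']
print(resolve_index_labels(2))             # ['level_0', 'level_1']
print(resolve_index_labels(1, "user_id"))  # ['user_id']
```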

Execute

execute

  • API name: execute. For more information about the source code, see execute.

    execute(
      session: SessionType = None
    )
  • Description: starts a data processing task.

  • Input parameters

    | Parameter | Data type | Required | Description |
    | --- | --- | --- | --- |
    | session | Session | No | The session that is used to run a data processing task. For more information about how to create a session, see new_session. If this parameter is not specified, the global session initialized by using new_session is used. |

  • Return value

    N/A.

  • Sample code

    import maxframe.dataframe as md
    
    df = md.read_odps_query('select user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`', index_col='user_id')
    df.execute()

Fetch

fetch

  • API name: fetch. For more information about the source code, see fetch.

    fetch(
      session: SessionType = None
    )
  • Description: fetches the result data to the local environment.

  • Input parameters

    | Parameter | Data type | Required | Description |
    | --- | --- | --- | --- |
    | session | Session | No | The session that is used to obtain the result data. For more information about how to create a session, see new_session. If this parameter is not specified, the global session initialized by using new_session is used. |

  • Return value

    A pandas DataFrame or Series.

  • Sample code

    import maxframe.dataframe as md
    
    df = md.read_odps_query('select user_id, age, sex FROM `BIGDATA_PUBLIC_DATASET.data_science.maxframe_ml_100k_users`', index_col='user_id')
    res = df.execute().fetch()
    print(res)
    
    # Obtain the returned result.      
    user_id   age  sex
    1         24   M
    2         53   F
    3         23   M
    4         24   M
    5         33   F
    ...      ...  ..
    939       26   F
    940       32   M
    941       20   M
    942       48   F
    943       22   M
    
    [943 rows x 2 columns]
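
The execute-then-fetch pattern in the samples, together with the fallback to the global session, can be pictured with a toy model in plain Python. This is a sketch for intuition only and does not reflect how MaxFrame is implemented: ToySession, toy_new_session, and ToyLazyFrame are hypothetical names, and having execute return the object itself is an assumption made so that the chained call mirrors the samples above.

```python
from typing import Callable, Dict, List, Optional


class ToySession:
    """Toy stand-in for a session; it stores the results of executed tasks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.results: Dict[int, List[int]] = {}


_default_session: Optional[ToySession] = None


def toy_new_session(name: str, default: bool = True) -> ToySession:
    """Create a toy session and optionally register it as the global default."""
    global _default_session
    session = ToySession(name)
    if default:
        _default_session = session
    return session


def _resolve(session: Optional[ToySession]) -> ToySession:
    """Use the given session, or fall back to the global default."""
    chosen = session if session is not None else _default_session
    if chosen is None:
        raise RuntimeError("no session available; create one first")
    return chosen


class ToyLazyFrame:
    """Holds a deferred computation that runs only when execute() is called."""

    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute

    def execute(self, session: Optional[ToySession] = None) -> "ToyLazyFrame":
        # Run the deferred work and keep the result inside the session.
        _resolve(session).results[id(self)] = self._compute()
        return self

    def fetch(self, session: Optional[ToySession] = None) -> List[int]:
        # Bring a previously computed result back to the caller.
        results = _resolve(session).results
        if id(self) not in results:
            raise RuntimeError("call execute() before fetch()")
        return results[id(self)]


toy_new_session("demo")                 # becomes the global default session
frame = ToyLazyFrame(lambda: [24, 53, 23])
print(frame.execute().fetch())          # [24, 53, 23]
```

In this toy, constructing the frame does no work; the computation runs once in execute, and fetch only retrieves the stored result from the same session.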