MTable Assembler

You can use MTable Assembler to group the rows of a table by the specified columns and aggregate the selected columns of each group into an MTable.

Limits

The supported compute engines are MaxCompute, Flink, and Deep Learning Containers (DLC).

Configure the component in the Platform for AI (PAI) console

Input ports
Input port (left-to-right)	Data type	Recommended upstream component	Required
data	None	Read Table, Feature engineering, Data preprocessing	Yes

Component parameters

Tab	Parameter	Description
Field Setting	selectedCols	The names of the columns to aggregate into the MTable.
Field Setting	groupCols	Optional. The names of the columns to group by. You can select one or more columns. This parameter is empty by default, which means that all data is aggregated without grouping.
Parameter Setting	outputCol	The name of the output column.
Execution Tuning	Number of Workers	The number of workers. Set this parameter together with the Memory per worker, unit MB parameter. The value must be a positive integer in the range [1, 9999].
Execution Tuning	Memory per worker, unit MB	The memory size of each worker, in MB. Valid values: 1024 to 65536 (64 × 1024).
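
To make the roles of selectedCols, groupCols, and outputCol concrete, the following plain-Python sketch mimics the grouping behavior described above. It is only an illustration and not the component's implementation: the function name assemble_mtable is hypothetical, rows are modeled as dictionaries, and the nested table is modeled as a list of dictionaries rather than a real MTable.

```python
def assemble_mtable(rows, selected_cols, group_cols=(), output_col="m_table_col"):
    """Group rows (a list of dicts) and nest the selected columns of each group.

    Hypothetical helper for illustration only; it is not part of PAI or Alink.
    """
    groups = {}
    for row in rows:
        # With no group columns, every row gets the same key: the empty tuple.
        key = tuple(row[col] for col in group_cols)
        nested_row = {col: row[col] for col in selected_cols}
        groups.setdefault(key, []).append(nested_row)

    result = []
    for key, nested_table in groups.items():
        out = dict(zip(group_cols, key))   # keep the grouping columns
        out[output_col] = nested_table     # nested table stored in the output column
        result.append(out)
    return result


rows = [
    {"id": "a1", "f0": "11L", "f1": 2.2},
    {"id": "a1", "f0": "12L", "f1": 2.0},
    {"id": "a2", "f0": "11L", "f1": 2.0},
]

# groupCols = ["id"]: one output row per distinct id.
grouped = assemble_mtable(rows, ["f0", "f1"], ["id"])

# groupCols left empty: all rows are aggregated into a single output row.
everything = assemble_mtable(rows, ["f0", "f1"])
```

In this sketch, grouped holds one entry for a1 and one for a2, while everything holds a single entry whose nested table contains all three input rows.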

Configure the component by using code

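You can copy the following code to the code editor of the PyAlink Script component so that the PyAlink Script component functions like the MTable Assembler component. The example groups the rows by id, aggregates the f0 and f1 columns of each group into an MTable column, and then flattens the MTable column back into rows.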

import pandas as pd
from pyalink.alink import *

# Sample data: an ID column followed by two feature columns.
df_data = pd.DataFrame([
    ["a1", "11L", 2.2],
    ["a1", "12L", 2.0],
    ["a2", "11L", 2.0],
    ["a2", "12L", 2.0],
    ["a3", "12L", 2.0],
    ["a3", "13L", 2.0],
    ["a4", "13L", 2.0],
    ["a4", "14L", 2.0],
    ["a5", "14L", 2.0],
    ["a5", "15L", 2.0],
    ["a6", "15L", 2.0],
    ["a6", "16L", 2.0]
])

# Convert the DataFrame into a batch source with an explicit schema.
source = BatchOperator.fromDataframe(df_data, schemaStr='id string, f0 string, f1 double')

# Group by id and aggregate f0 and f1 of each group into one MTable column.
assemble = GroupByBatchOp()\
    .setGroupByPredicate("id")\
    .setSelectClause("id, mtable_agg(f0, f1) as m_table_col")

# Expand the MTable column back into rows.
# f1 is declared as double so that it matches the input schema.
flatten = FlattenMTableBatchOp()\
    .setReservedCols(["id"])\
    .setSelectedCol("m_table_col")\
    .setSchemaStr('f0 string, f1 double')

assemble.linkFrom(source).link(flatten).print()
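
The schema passed to the flatten step decides the type of each restored column. The plain-Python sketch below illustrates that idea under stated assumptions: flatten_mtable is a hypothetical helper, the nested table is modeled as a list of dictionaries, and Python's float and int stand in for the double and int column types. How Alink itself converts values when the declared type differs from the stored type is not covered here.

```python
def flatten_mtable(rows, selected_col, reserved_cols, casts):
    """Expand a nested-table column into one output row per nested row.

    Hypothetical helper for illustration only; `casts` maps each nested
    column name to a Python type that plays the role of the schema string.
    """
    flat = []
    for row in rows:
        for nested_row in row[selected_col]:
            out = {col: row[col] for col in reserved_cols}  # carry over reserved columns
            for name, cast in casts.items():
                out[name] = cast(nested_row[name])          # apply the declared type
            flat.append(out)
    return flat


assembled = [
    {"id": "a1", "m_table_col": [{"f0": "11L", "f1": 2.2}, {"f0": "12L", "f1": 2.0}]},
    {"id": "a2", "m_table_col": [{"f0": "11L", "f1": 2.0}]},
]

# f1 declared as a floating-point column: the value 2.2 is preserved.
as_double = flatten_mtable(assembled, "m_table_col", ["id"], {"f0": str, "f1": float})

# f1 declared as an integer column: in Python, int(2.2) drops the fraction.
as_int = flatten_mtable(assembled, "m_table_col", ["id"], {"f0": str, "f1": int})
```

Comparing as_double with as_int shows why the declared type of f1 should match the type used when the table was assembled.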
