
Platform for AI: MTable Expander

Last Updated: Nov 25, 2024

You can use MTable Expander to expand an MTable into a regular table to facilitate data processing and display.
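For example, a row whose MTable cell contains two inner rows is expanded into two ordinary rows. The following illustration uses the values of the code example later in this topic; the rendering of the MTable cell is only schematic:

    Before expansion                                    After expansion
    id    m_table_col                                   id    f0     f1
    a1    MTable{f0: ["11L", "12L"], f1: [2.2, 2.0]}    a1    11L    2.2
                                                        a1    12L    2.0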

Supported computing resources

  • MaxCompute

  • Apache Flink

  • Deep Learning Containers (DLC)

Configure the component in the Machine Learning Platform for AI (PAI) console

  • Input ports

    Input port (left-to-right)    Data type    Recommended upstream component    Required
    data                          -            None                              Yes

  • Component parameters

    The parameters are grouped by category.

    Field Setting

      • selectedCol: The name of the MTable column to be expanded. The column is of the STRING type in the MTable format.

      • reservedCols: The columns to be reserved by the algorithm.

    Parameters Setting

      • Schema: The names and types of the expanded columns, in the format colname coltype[, colname2 coltype2[, ...]], for example, f0 string, f1 bigint, f2 double.

      • handleInvalidMethod: The method used to handle invalid values. Valid values: error (default) and skip. A sketch that maps these parameters to code follows this list.

    Execution Tuning

      • Choose Running Mode:

          • MaxCompute or Flink: Use MaxCompute or Flink computing resources. For more information about how to configure the number of workers and their memory, see Appendix: How to estimate resource usage.

          • DLC: Use DLC computing resources. Configure the specifications based on the prompts.
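The preceding parameters correspond to setters on the FlattenMTableBatchOp operator that is used in the code example in the next section. The following sketch shows the mapping. setSelectedCol, setReservedCols, and setSchemaStr appear in that example; setHandleInvalidMethod is an assumed setter name for the handleInvalidMethod parameter and may differ in your PyAlink version.

from pyalink.alink import *

# Sketch: console parameters mapped to FlattenMTableBatchOp setters.
flatten = (
    FlattenMTableBatchOp()
    .setSelectedCol("m_table_col")       # selectedCol: the MTable column to expand
    .setReservedCols(["id"])             # reservedCols: columns copied to every expanded row
    .setSchemaStr("f0 string, f1 int")   # Schema: names and types of the expanded columns
    .setHandleInvalidMethod("skip")      # handleInvalidMethod (assumed setter): "error" (default) or "skip"
)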

Configure the component by coding

You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the MTable Expander component.

import numpy as np
import pandas as pd
from pyalink.alink import *

# Sample data: each id has two (f0, f1) records.
df_data = pd.DataFrame([
      ["a1", "11L", 2.2],
      ["a1", "12L", 2.0],
      ["a2", "11L", 2.0],
      ["a2", "12L", 2.0],
      ["a3", "12L", 2.0],
      ["a3", "13L", 2.0],
      ["a4", "13L", 2.0],
      ["a4", "14L", 2.0],
      ["a5", "14L", 2.0],
      ["a5", "15L", 2.0],
      ["a6", "15L", 2.0],
      ["a6", "16L", 2.0]
])

# Create a batch source from the DataFrame and declare the column types.
source = BatchOperator.fromDataframe(df_data, schemaStr='id string, f0 string, f1 double')

# Group the rows by id and aggregate (f0, f1) into an MTable column.
group_by = GroupByBatchOp()\
    .setGroupByPredicate("id")\
    .setSelectClause("id, mtable_agg(f0, f1) as m_table_col")

# Expand the MTable column back into individual rows, keeping the id column.
flatten = FlattenMTableBatchOp()\
    .setReservedCols(["id"])\
    .setSelectedCol("m_table_col")\
    .setSchemaStr('f0 string, f1 int')

group_by.linkFrom(source).link(flatten).print()
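In this example, GroupByBatchOp aggregates the 12 input rows into one row per id, each holding an MTable of the grouped (f0, f1) values, and FlattenMTableBatchOp then expands the m_table_col column back into individual rows. The printed result therefore again contains the columns id, f0, and f1.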