Merge data - Platform For AI - Alibaba Cloud Documentation Center

The JOIN algorithm is typically used during data preprocessing to consolidate related information from different data sources into a single table by matching records on one or more fields. It works like the JOIN statement in SQL and is intended to keep the merged data complete and consistent, which provides a reliable foundation for subsequent training and analysis.
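To make the idea of matching records on a field concrete, the following is a minimal sketch in plain Python. It is not the component's implementation; the tables, column names, and the `inner_join` helper are hypothetical and exist only for illustration.

```python
# Two small "tables" represented as lists of dicts (hypothetical sample data).
users = [
    {"user_id": 1, "name": "Ann"},
    {"user_id": 2, "name": "Bob"},
    {"user_id": 3, "name": "Cy"},
]
orders = [
    {"user_id": 1, "amount": 30},
    {"user_id": 1, "amount": 12},
    {"user_id": 3, "amount": 7},
]


def inner_join(left, right, key):
    """Return one merged row for every left/right pair with equal `key` values."""
    # Index the right table by key so each left row is matched by lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    merged = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged.append({**lrow, **rrow})
    return merged


result = inner_join(users, orders, "user_id")
```

In this sketch, the user with no orders does not appear in `result`, and the user with two orders appears twice, which mirrors how an inner join behaves in SQL.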

Configure the component

You can configure the JOIN component on the pipeline page of Machine Learning Designer. The following table describes the parameters.

Parameter	Description
Join Type	The join type. Valid values: Left Join, Inner Join, Right Join, and Full Join.
MapJoin Optimization	Specifies whether to load the smaller table into memory to speed up the JOIN operation. Valid values: Not Optimized: neither table is loaded into memory. Optimize Left Table: the left table is the smaller table and is loaded into memory. Optimize Right Table: the right table is the smaller table and is loaded into memory.
Join Condition	The join conditions, each written as an equality between a column of the left table and a column of the right table. You can manually add or remove join conditions.
Select Output Columns from the Left Table	The columns of the left table to include in the output table.
Select Output Columns from the Right Table	The columns of the right table to include in the output table.
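The way these parameters interact can be sketched in plain Python. This is an illustrative model under stated assumptions, not the component's implementation or API: the function `join_tables` and its argument names are hypothetical, join conditions are modeled as pairs of equal columns, and a missing side is filled with `None` in the way SQL would use NULL. Keeping the right table in an in-memory index is only loosely analogous to the Optimize Right Table setting.

```python
def join_tables(left, right, conditions, how="inner", left_cols=(), right_cols=()):
    """Join two lists of dicts (hypothetical helper).

    conditions: list of (left_column, right_column) equality pairs.
    how: "left", "inner", "right", or "full".
    left_cols / right_cols: columns to keep from each side.
    """
    if how not in ("left", "inner", "right", "full"):
        raise ValueError(f"unsupported join type: {how}")

    def project(lrow, rrow):
        # A missing side contributes None for each of its selected columns.
        out = {c: (lrow[c] if lrow is not None else None) for c in left_cols}
        out.update({c: (rrow[c] if rrow is not None else None) for c in right_cols})
        return out

    # Hold the right table in an in-memory index keyed by its join columns.
    index = {}
    for pos, rrow in enumerate(right):
        key = tuple(rrow[rc] for _, rc in conditions)
        index.setdefault(key, []).append(pos)

    matched_right = set()
    result = []
    for lrow in left:
        key = tuple(lrow[lc] for lc, _ in conditions)
        positions = index.get(key, [])
        for pos in positions:
            matched_right.add(pos)
            result.append(project(lrow, right[pos]))
        # Left and full joins keep left rows that found no partner.
        if not positions and how in ("left", "full"):
            result.append(project(lrow, None))

    # Right and full joins keep right rows that found no partner.
    if how in ("right", "full"):
        for pos, rrow in enumerate(right):
            if pos not in matched_right:
                result.append(project(None, rrow))
    return result


# Hypothetical sample tables.
cities = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}]
scores = [{"cid": 2, "score": 0.9}, {"cid": 3, "score": 0.4}]

full_rows = join_tables(cities, scores, [("id", "cid")], how="full",
                        left_cols=["id", "city"], right_cols=["score"])
inner_rows = join_tables(cities, scores, [("id", "cid")], how="inner",
                         left_cols=["id", "city"], right_cols=["score"])
```

With this sample data, the inner join keeps only the row whose `id` equals a `cid`, while the full join also keeps the unmatched row from each side with `None` in the columns of the missing side.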