This topic introduces the basic terms of MapReduce.
Map/Reduce
When a map or reduce task runs, the framework calls the setup() method first, then the map() or reduce() method, and finally the cleanup() method. The setup() method is called before the map() or reduce() method. Each worker calls it only once.
The cleanup() method is called after the map() or reduce() method. Each worker calls it only once.
For usage examples, see Example programs.
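The call order described above can be sketched in plain Java. This is only an illustration: `LifecycleSketch`, `RecordingMapper`, and `runWorker` are hypothetical names, and the class does not use the MaxCompute SDK interfaces. It simulates a single worker and records the order in which the methods are invoked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LifecycleSketch {
    /** Hypothetical stand-in for a mapper; not the MaxCompute SDK class. */
    static class RecordingMapper {
        final List<String> calls = new ArrayList<>();

        void setup() { calls.add("setup"); }                     // once, before any record
        void map(String record) { calls.add("map:" + record); }  // once per input record
        void cleanup() { calls.add("cleanup"); }                 // once, after the last record
    }

    /** Simulates one worker: setup() once, map() per record, cleanup() once. */
    static List<String> runWorker(List<String> records) {
        RecordingMapper mapper = new RecordingMapper();
        mapper.setup();
        for (String record : records) {
            mapper.map(record);
        }
        mapper.cleanup();
        return mapper.calls;
    }

    public static void main(String[] args) {
        // Prints [setup, map:a, map:bb, map:ccc, cleanup]
        System.out.println(runWorker(Arrays.asList("a", "bb", "ccc")));
    }
}
```

A reduce task would follow the same pattern, with reduce() in place of map().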
Sort
Some columns in the key records generated by a mapper can be used as sort columns. Sort columns do not support a custom comparator. You can select a subset of the sort columns as group columns. Group columns do not support a custom group comparator. Sort columns are used to sort your data, while group columns are used for secondary sorting.
For usage examples, see Secondary sorting source code.
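The relationship between sort columns and group columns can be illustrated with a small simulation. This is a sketch under stated assumptions, not MaxCompute code: the key columns `user` and `ts` and the names `SecondarySortSketch`, `Key`, and `sortAndGroup` are hypothetical. Records are sorted on both columns using their natural ordering, then grouped on `user` only, so each group sees its `ts` values already in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    /** A key record with two hypothetical columns: user and ts. */
    static final class Key {
        final String user;
        final int ts;
        Key(String user, int ts) { this.user = user; this.ts = ts; }
    }

    /** Sorts on (user, ts), then groups on user alone. */
    static Map<String, List<Integer>> sortAndGroup(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        // Sort columns: user, then ts, both in natural order (no custom comparator).
        sorted.sort(Comparator.comparing((Key k) -> k.user).thenComparingInt(k -> k.ts));
        // Group column: user. Records sharing a user land in the same group,
        // and arrive in ts order because ts is a sort column.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Key k : sorted) {
            groups.computeIfAbsent(k.user, u -> new ArrayList<>()).add(k.ts);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
            new Key("bob", 3), new Key("alice", 2), new Key("bob", 1), new Key("alice", 1));
        // Prints {alice=[1, 2], bob=[1, 3]}
        System.out.println(sortAndGroup(keys));
    }
}
```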
Partition
A partitioner allocates the data generated by a mapper to different reducers based on the partitioning logic.
MaxCompute supports partition columns and custom partitioners. If both are specified, partition columns take precedence over custom partitioners.
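As a rough illustration of how partitioning routes mapper output to reducers, the sketch below hashes the value of a partition column and takes it modulo the number of reducers. Hash-based routing is an assumption made for this example; the source does not state which partitioning logic MaxCompute applies. `PartitionSketch`, `partitionFor`, and `route` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionSketch {
    /** Picks a reducer index in [0, numReducers) from the partition column value. */
    static int partitionFor(String partitionValue, int numReducers) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(partitionValue.hashCode(), numReducers);
    }

    /** Routes {key, value} records to reducers; column 0 is the partition column. */
    static List<List<String>> route(List<String[]> records, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String[] r : records) {
            buckets.get(partitionFor(r[0], numReducers)).add(r[0] + "=" + r[1]);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
            new String[] {"a", "1"}, new String[] {"b", "2"}, new String[] {"a", "3"});
        // Prints [[], [a=1, a=3], [b=2], []]
        System.out.println(route(records, 4));
    }
}
```

Records that share a partition column value always reach the same reducer, which is the property the rest of the job relies on.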
Combiner
The combiner function combines adjacent records at the shuffle stage. You can determine whether to use the combiner function based on your business logic.
The combiner function is an optimization of the MapReduce computing framework. The combiner logic is the same as the reducer logic. After a mapper generates data, the framework combines the data with the same key at the map stage.
For usage examples, see Example programs.
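The effect of a combiner can be sketched with a word-count style aggregation. This is a minimal simulation, not MaxCompute code: `CombinerSketch`, `sumByKey`, and `kv` are hypothetical names. The same summing logic is applied twice, first to the output of one mapper (the combiner step) and then to the records that reach the reducer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CombinerSketch {
    /** Sums values per key; usable as both the combiner and the reducer logic. */
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            sums.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return sums;
    }

    /** Convenience helper that builds a key-value record. */
    static Map.Entry<String, Integer> kv(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        // One mapper emits four records.
        List<Map.Entry<String, Integer>> mapperOutput =
            Arrays.asList(kv("a", 1), kv("b", 1), kv("a", 1), kv("a", 1));

        // Combiner step: four records shrink to two before the shuffle.
        Map<String, Integer> combined = sumByKey(mapperOutput);
        System.out.println(combined); // {a=3, b=1}

        // Reducer step: the same logic merges records from several mappers.
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>(combined.entrySet());
        shuffled.add(kv("a", 2)); // pre-combined record from another mapper
        System.out.println(sumByKey(shuffled)); // {a=5, b=1}
    }
}
```

Summation works here because it is associative and commutative, so combining early does not change the final result. Whether a combiner is appropriate for other logic depends on the business logic of the job.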