Time Window MR

Updated at: 2023-07-14 08:15

The multi-date loop execution feature of the MaxCompute MR (MapReduce) component lets you run day-level MR tasks in parallel over a specified period of time. For example, in a customized recommendation algorithm scenario, you can run EasyRecFGMapper tasks for each of the past 30 days in parallel.

Limits

  • This feature applies only to day-level data backfill loops.

  • Disable multi-date loop execution before you use Periodic Scheduling to schedule your pipeline. This prevents extra data backfill tasks from running in the production environment and prevents unnecessary data from being generated.

  • The Maximum number of concurrent parameter on the Parameters Setting tab takes effect only on the node for which it is configured. If you run data backfill on multiple nodes, make sure that the combined concurrency of all nodes does not exceed the limit that the resources of the current project support.

Configure the component in Machine Learning Designer

Machine Learning Designer allows you to configure the component in the Machine Learning Platform for AI (PAI) console. The following table describes the parameters.

Tab

Parameter

Description

Parameters Setting

Business base date

You can set this parameter in one of the following ways:

Whether to open multi-date loop execution

Multi-date loop execution is enabled by default. If multi-date loop execution is disabled, this component functions the same as the MR component.

Execution time window

The value is a comma-separated list of day offsets relative to the Business base date. Each item is either an integer or a time range in interval notation, where a parenthesis excludes the endpoint and a square bracket includes it.

The system adds each offset to the Business base date to calculate the data dates and starts one subtask for each date. Up to 100 subtasks can be executed.

For example, if you set the Business base date to 20230210 and the Execution time window to (-4,-2],0, the offsets are -3, -2, and 0, so subtasks are run for the data dates 20230207, 20230208, and 20230210.
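The following Python sketch shows one way to expand an Execution time window into data dates. It makes three assumptions. Parentheses exclude an endpoint and square brackets include it, which is standard interval notation and consistent with the preceding example. Offsets are counted in days. The Business base date uses yyyyMMdd. The names parse_window and window_dates are hypothetical, and this is not PAI's own implementation.

```python
import re
from datetime import date, datetime, timedelta

MAX_SUBTASKS = 100  # documented upper limit on the number of subtasks

# One item: an interval such as (-4,-2] or [-3,0), or a single integer offset,
# followed by a separating comma or the end of the string.
# Assumption: "(" / ")" exclude the endpoint, "[" / "]" include it.
_ITEM = re.compile(
    r"\s*(?:([(\[])\s*(-?\d+)\s*,\s*(-?\d+)\s*([)\]])|(-?\d+))\s*(?:,|$)"
)


def parse_window(expr: str) -> list[int]:
    """Return the sorted, de-duplicated day offsets described by expr."""
    offsets = set()
    pos = 0
    while pos < len(expr):
        m = _ITEM.match(expr, pos)
        if not m:
            raise ValueError(f"cannot parse window near: {expr[pos:]!r}")
        if m.group(5) is not None:          # single integer offset
            offsets.add(int(m.group(5)))
        else:                               # interval of offsets
            lo, hi = int(m.group(2)), int(m.group(3))
            if m.group(1) == "(":
                lo += 1                     # open lower bound
            if m.group(4) == ")":
                hi -= 1                     # open upper bound
            offsets.update(range(lo, hi + 1))
        pos = m.end()
    if len(offsets) > MAX_SUBTASKS:
        raise ValueError(f"{len(offsets)} subtasks exceed the limit of {MAX_SUBTASKS}")
    return sorted(offsets)


def window_dates(base: str, expr: str) -> list[date]:
    """Add each offset to a yyyyMMdd business base date."""
    base_day = datetime.strptime(base, "%Y%m%d").date()
    return [base_day + timedelta(days=k) for k in parse_window(expr)]
```

For the example above, parse_window("(-4,-2],0") returns [-3, -2, 0]. window_dates("20230210", "(-4,-2],0") maps those offsets to 2023-02-07, 2023-02-08, and 2023-02-10.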

Maximum number of concurrent

The maximum number of subtasks that can run at the same time. Default value: 3. To avoid resource contention, we recommend that you do not set a large value.
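The following minimal Python sketch shows how a cap like Maximum number of concurrent limits how many subtasks run at once. It assumes that each subtask is a blocking call. run_window and PeakTracker are hypothetical names. The thread pool stands in for PAI's own scheduler, which this page does not describe.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_window(dates, run_subtask, max_concurrency=3):
    """Call run_subtask(d) for every date, with at most max_concurrency in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Executor.map returns results in input order, whatever the completion order.
        return list(pool.map(run_subtask, dates))


class PeakTracker:
    """Hypothetical stand-in subtask that records how many calls overlap."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, day):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.05)  # pretend to run one day's MR job
        with self._lock:
            self.active -= 1
        return f"done {day}"
```

With max_concurrency=3, five dates are processed with at most three calls overlapping. The results still come back in the order of the input dates.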

Date format

The value is used to generate the ${pai.system.cycledate} system variable. Valid values:

  • yyyyMMdd (default)

  • yyyy-MM-dd

  • yyyy/MM/dd

Example: If you set the Business base date to 20230210 and the Date format to yyyy-MM-dd, the ${pai.system.cycledate} variable in the MR task input parameters is replaced with 2023-02-10.
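The following Python sketch maps the three documented Date format values to equivalent strftime patterns and renders a business date. It assumes that the Business base date is supplied as yyyyMMdd. The cycledate function and the mapping table are illustrative and are not part of PAI.

```python
from datetime import datetime

# Documented Date format values mapped to equivalent strftime patterns.
_FORMATS = {
    "yyyyMMdd": "%Y%m%d",
    "yyyy-MM-dd": "%Y-%m-%d",
    "yyyy/MM/dd": "%Y/%m/%d",
}


def cycledate(base: str, date_format: str = "yyyyMMdd") -> str:
    """Render a yyyyMMdd business date in one of the supported Date formats."""
    if date_format not in _FORMATS:
        raise ValueError(f"unsupported Date format: {date_format!r}")
    day = datetime.strptime(base, "%Y%m%d")
    return day.strftime(_FORMATS[date_format])
```

cycledate("20230210", "yyyy-MM-dd") reproduces the value 2023-02-10 from the example above.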

Resource OSS Path

The OSS directory where the resource files are stored.

List of resource files

Separate multiple resource files with commas (,).

OSS path of classpath

The OSS path of the JAR file.

Main class

The main class of the MR job.

MR task input parameters

If multi-date loop execution is enabled, replace the date in the input parameters with the ${pai.system.cycledate} system variable.

With the settings in the preceding example, three subtasks are started in parallel. All other features are the same as those of the MR component.
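The following minimal Python sketch shows the per-subtask substitution. Each subtask gets its own copy of the input parameters, with ${pai.system.cycledate} replaced by that subtask's data date. The template ds=${pai.system.cycledate} and the expand_params function are hypothetical. The actual input-parameter syntax depends on your MR program.

```python
from datetime import date, timedelta

PLACEHOLDER = "${pai.system.cycledate}"


def expand_params(template: str, base: date, offsets, fmt: str = "%Y%m%d") -> list[str]:
    """Return one input-parameter string per subtask, with the variable replaced."""
    return [
        template.replace(PLACEHOLDER, (base + timedelta(days=k)).strftime(fmt))
        for k in offsets
    ]
```

With the base date 2023-02-10 and the offsets -3, -2, and 0, this produces three parameter strings, one for each of the three parallel subtasks: ds=20230207, ds=20230208, and ds=20230210.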

Example

For an example, see 2_rec_sln_demo_dssm_recall_vector_recall_sample_fg_encoded_v1 in Vector Recall.
