E-MapReduce: Get started with the development of JAR batch jobs

Last Updated: Oct 28, 2024

You can package your business logic into a JAR file and upload the file to develop a Spark JAR job. This topic walks through an example of developing and deploying a JAR batch job.

Prerequisites

  • A workspace is created. For more information, see Manage workspaces.

  • A business application is developed and a JAR package is created for the application.

Procedure

Step 1: Develop a JAR package

E-MapReduce (EMR) Serverless Spark does not provide an integrated environment for developing JAR packages. You must therefore write the code of your Spark application and package it into a JAR file in an on-premises or standalone development environment. To help you get started quickly, this topic provides a test JAR package that you can download and use in the subsequent steps.

Click spark-examples_2.12-3.3.1.jar to download the test JAR package.

Note

The JAR package is a simple example provided by Spark. It estimates the value of pi (π).
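The SparkPi example in the test package estimates π with a Monte Carlo method and spreads the sampling across Spark tasks. The following plain-Java sketch shows the same idea on a single thread, without any Spark dependency. It is not the code from the JAR package; the class name PiEstimate, the seed, and the sample count are assumptions chosen for illustration.

```java
import java.util.SplittableRandom;

/** Plain-Java Monte Carlo estimate of pi; illustrative only, no Spark involved. */
final class PiEstimate {

    /**
     * Draws points uniformly from the square [-1, 1] x [-1, 1] and returns
     * 4 * (fraction of points that fall inside the unit circle).
     */
    static double estimatePi(long samples, long seed) {
        if (samples <= 0) {
            throw new IllegalArgumentException("samples must be positive");
        }
        SplittableRandom random = new SplittableRandom(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = random.nextDouble() * 2 - 1;
            double y = random.nextDouble() * 2 - 1;
            if (x * x + y * y <= 1) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        // Seed and sample count are arbitrary illustrative values.
        System.out.println("Pi is roughly " + estimatePi(1_000_000L, 42L));
    }
}
```

A point drawn uniformly from the square lands inside the unit circle with probability π/4, so four times the observed fraction approximates π. The error shrinks roughly in proportion to 1/√n as the number of samples n grows.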

Step 2: Upload the JAR package

  1. Go to the Files page.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. On the Spark page, find the desired workspace and click the name of the workspace.

    4. In the left-side navigation pane of the EMR Serverless Spark page, click Files.

  2. On the Files page, click Upload File.

  3. In the Upload File dialog box, click the area within the dotted-line rectangle to select a JAR package, or drag a JAR package to the area.

    In this example, the spark-examples_2.12-3.3.1.jar package is uploaded.
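Before you upload your own JAR package, it can help to confirm locally that the package contains the class you plan to enter as the main class (for the test package, org.apache.spark.examples.SparkPi). The following is a minimal sketch that uses the standard java.util.jar API; the class name JarCheck and the default file path are assumptions for illustration, and the check is not part of the EMR console workflow.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

/** Checks whether a JAR file contains a given compiled class. */
final class JarCheck {

    /** Maps a fully qualified class name to its entry path inside a JAR. */
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the JAR at jarPath has an entry for the given class. */
    static boolean containsClass(Path jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return jar.getJarEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults assume the test package sits in the current directory.
        Path jar = Paths.get(args.length > 0 ? args[0] : "spark-examples_2.12-3.3.1.jar");
        String mainClass = args.length > 1 ? args[1] : "org.apache.spark.examples.SparkPi";
        boolean found = containsClass(jar, mainClass);
        System.out.println(mainClass + (found ? " found in " : " NOT found in ") + jar);
    }
}
```

The check only confirms that the class file is present. It does not verify that the class defines a main method or that its dependencies are available at run time.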

Step 3: Develop and run a job

  1. In the left-side navigation pane of the EMR Serverless Spark page, click Data Development.

  2. On the Development tab, click Create.

  3. In the Create dialog box, configure the Name parameter, choose Batch Job > JAR from the Type drop-down list, and then click OK.

  4. In the upper-right corner of the configuration tab of the job, select a queue for the job.

    For information about how to add a queue, see Manage resource queues.

  5. Configure the parameters that are described in the following table and click Run. You do not need to configure other parameters.

    | Parameter | Description |
    | --- | --- |
    | Main JAR Resource | Select the JAR package that you uploaded in the previous step. In this example, the spark-examples_2.12-3.3.1.jar package is selected. |
    | Main Class | Enter the main class that is specified when a Spark job is submitted. In this example, enter org.apache.spark.examples.SparkPi. |

  6. After you run the job, click Details in the Actions column of the job on the Execution Records tab.

  7. On the Development Job Runs tab of the Job History page, view related logs.

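When the SparkPi example finishes, it prints a line that begins with "Pi is roughly" followed by the estimate. If you save the log text locally, a small helper like the following could extract that value. This is a sketch under stated assumptions: the class name PiLogScan is hypothetical, the sample log line is illustrative and not copied from a real run, and where the line appears in the EMR Serverless Spark logs (typically the driver's standard output) is an assumption.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts the SparkPi estimate from log text, if present. */
final class PiLogScan {

    // Matches the result line that the SparkPi example prints.
    private static final Pattern PI_LINE =
            Pattern.compile("Pi is roughly (\\d+(?:\\.\\d+)?)");

    /** Returns the first estimate found in the log text, or empty if none. */
    static OptionalDouble findPi(String logText) {
        Matcher matcher = PI_LINE.matcher(logText);
        return matcher.find()
                ? OptionalDouble.of(Double.parseDouble(matcher.group(1)))
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // Illustrative log line, not taken from an actual job run.
        String sampleLog = "Pi is roughly 3.1415";
        OptionalDouble pi = findPi(sampleLog);
        System.out.println(pi.isPresent()
                ? "Estimate: " + pi.getAsDouble()
                : "No estimate found");
    }
}
```

Because the estimate is random, the digits differ between runs; a value close to 3.14 indicates that the job produced its expected result.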

Step 4: Publish the job

Important

A published job can be run on a workflow node.

  1. Confirm that the job runs as expected. Then, click Publish in the upper-right corner of the configuration tab of the job.

  2. In the Publish dialog box, configure the Remarks parameter and click OK.

(Optional) Step 5: View job information on the Spark UI

After the job runs as expected, you can view the running details of the job on the Spark UI.

  1. In the left-side navigation pane of the EMR Serverless Spark page, click Job History.

  2. On the Job History page, click the Development Job Runs tab.

  3. On the Development Job Runs tab, find the desired job and click Details in the Actions column.

  4. On the Overview tab, click Spark UI in the Spark UI field.

  5. On the Spark Jobs page, view the running details of the job.

References

After a job is published, you can schedule the job in workflows. For more information, see Manage workflows. For information about a complete job development and orchestration process, see Get started with the development of Spark SQL jobs.