E-MapReduce: Get started with the development of Spark Submit jobs

Last Updated: Oct 28, 2024

E-MapReduce (EMR) Serverless Spark is compatible with spark-submit command-line parameters, which simplifies the process of running jobs. This topic provides an example of how to develop and run a Spark Submit job.
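
For reference, the following command is a conventional spark-submit invocation of the same example on a self-managed cluster. The Script parameter that you configure in Step 3 of this topic accepts the same flags, but without the spark-submit command itself and without cluster-specific settings such as --master, which does not appear in the sample script in that step. The JAR file path is a placeholder.

    # A conventional spark-submit invocation on a self-managed cluster,
    # shown for comparison. /path/to/ is a placeholder.
    spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.executor.memory=2g \
      /path/to/spark-examples_2.12-3.3.1.jar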

Prerequisites

  • A workspace is created. For more information, see Manage workspaces.

  • A business application is developed and packaged as a JAR file.

Procedure

Step 1: Develop a JAR package

This topic provides a test JAR package to help you quickly get started with Spark Submit jobs. You can download the test package for use in subsequent steps.

Click spark-examples_2.12-3.3.1.jar to download the test JAR package.

Note

The JAR package is a simple example provided by Spark. It is used to calculate the value of pi (π).
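
If a local Spark installation is available, you can optionally verify the test JAR package before you continue. The following command is a minimal sketch: local[2] runs Spark with two worker threads, and the trailing 100 is the optional number of partitions that SparkPi uses for the estimate. Both values are illustrative.

    # Optional local check of the test JAR package, assuming a local
    # Spark installation. local[2] uses two worker threads; 100 is the
    # optional number of partitions that SparkPi uses.
    spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master local[2] \
      spark-examples_2.12-3.3.1.jar \
      100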

Step 2: Upload the JAR package to OSS

In this example, the spark-examples_2.12-3.3.1.jar package is uploaded to an Object Storage Service (OSS) bucket by using the OSS console. For more information, see Simple upload.
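
If you prefer the command line to the console, the following sketch uploads the package by using the ossutil tool instead. This assumes that ossutil is installed and configured with your credentials; <YourBucket> is a placeholder for your bucket name.

    # Command-line alternative to the console upload. Assumes ossutil is
    # installed and configured; <YourBucket> is a placeholder.
    ossutil cp spark-examples_2.12-3.3.1.jar oss://<YourBucket>/spark-examples_2.12-3.3.1.jar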

Step 3: Develop and run a job

  1. In the left-side navigation pane of the EMR Serverless Spark page, click Data Development.

  2. On the Development tab, click Create.

  3. In the Create dialog box, configure the Name parameter, choose Batch Job > Spark Submit from the Type drop-down list, and then click OK.

  4. In the upper-right corner of the configuration tab of the job, select a queue for the job.

    For information about how to add a queue, see Manage resource queues.

  5. Configure the parameter that is described in the following table and click Run. You do not need to configure other parameters.

    Parameter    Description
    Script       The spark-submit script.

    The following sample code provides an example of the spark-submit script. An expanded variant with additional options appears after this procedure.

    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.memory=2g \
    oss://<YourBucket>/spark-examples_2.12-3.3.1.jar
  6. After you run the job, find the job on the Execution Records tab and click Details in the Actions column.

  7. On the Job History page, click the Log Exploration tab to view related logs.

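The Script parameter also accepts other spark-submit options and application arguments. The following variant of the sample script adds an executor cores setting and passes the optional partition-count argument of SparkPi. Both values are illustrative, not recommendations.

    # Expanded variant of the sample script. spark.executor.cores and the
    # trailing 100 (SparkPi's optional partition count) are illustrative.
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.memory=2g \
    --conf spark.executor.cores=2 \
    oss://<YourBucket>/spark-examples_2.12-3.3.1.jar \
    100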

Step 4: Publish the job

Important

A published job can be run on a workflow node.

  1. Confirm that the job runs as expected. Then, click Publish in the upper-right corner of the configuration tab of the job.

  2. In the Publish dialog box, configure the Remarks parameter and click OK.

(Optional) Step 5: View job information on the Spark UI

After the job runs as expected, you can view the running details of the job on the Spark UI.

  1. In the left-side navigation pane of the EMR Serverless Spark page, click Job History.

  2. On the Job History page, click the Development Job Runs tab.

  3. On the Development Job Runs tab, find the desired job and click Spark UI in the Actions column.

    On the Spark Jobs page, view the running details of the job.

References

After a job is published, you can schedule the job in workflows. For more information, see Manage workflows. For information about a complete job development and orchestration process, see Get started with the development of Spark SQL jobs.