
AnalyticDB for MySQL: Overview

Last Updated: Sep 27, 2024

AnalyticDB for MySQL provides the same development method for Spark batch applications and streaming applications. This topic describes how to develop Spark applications.

Development tools

You can use one of the following tools to develop Spark batch applications and streaming applications:

Sample code

The following sample code provides an example of how to develop a Spark application based on data that is stored in Object Storage Service (OSS). The code includes common parameters, such as name and conf, and parameters that are specific to Java, Scala, and Python applications. The parameters are written in the JSON format.

{
  "args": ["args0", "args1"],
  "name": "spark-oss-test",
  "file": "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar",
  "className": "com.aliyun.spark.oss.SparkReadOss",
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2,
    "spark.adb.connectors": "oss"
  }
}

Common parameters

name
  Required: No
  Example: "name": "spark-oss-test"
  Description: The name of the Spark application.

file
  Required: Yes for Python, Java, and Scala applications
  Example: "file": "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar"
  Description: The absolute path of the main file of the Spark application. The main file can be a JAR package that contains the entry point or an executable file that serves as the entry point for the Python program.
  Important:
    • You must store the main files of Spark applications in OSS.
    • The OSS bucket must reside in the same region as the AnalyticDB for MySQL cluster.

files
  Required: No
  Example: "files": ["oss://<testBucketName>/path/to/files_name1","oss://<testBucketName>/path/to/files_name2"]
  Description: The files that are required for the Spark application. These files are downloaded to the working directories of the driver and executor processes. You can configure aliases for the files. Example: oss://<testBucketName>/test/test1.txt#test1. In this example, test1 is used as the file alias, and you can specify ./test1 or ./test1.txt to access the file. Separate multiple files with commas (,). For an illustration of file and archive aliases, see the sketch after this table.
  Note:
    • If you specify the log4j.properties file for this parameter, the Spark application uses it as the log configuration file.
    • You must store all files that are required for Spark applications in OSS.

archives
  Required: No
  Example: "archives": ["oss://<testBucketName>/path/to/archives","oss://<testBucketName>/path/to/archives"]
  Description: The compressed packages that are required for the Spark application. The packages must be in the TAR.GZ format and are decompressed to the working directory of the Spark process. You can configure aliases for the files that are contained in a package. Example: oss://<testBucketName>/test/test1.tar.gz#test1. In this example, test1 is used as the alias. If test2.txt is a file contained in the test1.tar.gz package, you can access it by specifying ./test1/test2.txt or ./test1.tar.gz/test2.txt. Separate multiple packages with commas (,).
  Note: You must store all compressed packages that are required for Spark applications in OSS. If a package fails to be decompressed, the job fails.

conf
  Required: Yes
  Example: "conf": {"spark.driver.resourceSpec": "medium","spark.executor.resourceSpec": "medium","spark.executor.instances": 2,"spark.adb.connectors": "oss"}
  Description: The configuration parameters that are required for the Spark application, which are similar to those of Apache Spark. The parameters must be in the key: value format. Separate multiple parameters with commas (,). For information about the configuration parameters that are different from those of Apache Spark or that are specific to AnalyticDB for MySQL, see Spark application configuration parameters.
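
To illustrate how the files and archives parameters work together with file aliases, the following is a minimal sketch of a submission JSON. It reuses the main file and entry class from the sample code above; the aliased objects test1.txt and test1.tar.gz under oss://<testBucketName>/test/ are assumed placeholders for your own objects.

{
  "name": "spark-oss-test",
  "file": "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar",
  "className": "com.aliyun.spark.oss.SparkReadOss",
  "files": ["oss://<testBucketName>/test/test1.txt#test1"],
  "archives": ["oss://<testBucketName>/test/test1.tar.gz#test1"],
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2,
    "spark.adb.connectors": "oss"
  }
}

With this configuration, the application code can read the downloaded file as ./test1 and a file such as test2.txt inside the decompressed package as ./test1/test2.txt, as described for the files and archives parameters above.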

Java application parameters

args
  Required: No
  Example: "args": ["args0", "args1"]
  Description: The parameters that are required for the JAR package. Separate multiple parameters with commas (,).

className
  Required: Yes
  Example: "className": "com.aliyun.spark.oss.SparkReadOss"
  Description: The entry class of the Java application.

jars
  Required: No
  Example: "jars": ["oss://<testBucketName>/path/to/jar","oss://<testBucketName>/path/to/jar"]
  Description: The absolute paths of the JAR packages that are required for the Spark application. Separate multiple paths with commas (,). When a Spark application runs, the JAR packages are added to the classpaths of the driver and executor Java virtual machines (JVMs). For a complete submission sketch, see the example after this table.
  Important:
    • You must store all JAR packages that are required for Spark applications in OSS.
    • The OSS bucket must reside in the same region as the AnalyticDB for MySQL cluster.
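
The following sketch shows how the Java application parameters can be combined with the common parameters in a single submission JSON. The dependency path oss://<testBucketName>/jars/test/extra-dependency.jar is a hypothetical placeholder for a JAR package that your application needs.

{
  "args": ["args0", "args1"],
  "name": "spark-oss-test",
  "file": "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar",
  "className": "com.aliyun.spark.oss.SparkReadOss",
  "jars": ["oss://<testBucketName>/jars/test/extra-dependency.jar"],
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2,
    "spark.adb.connectors": "oss"
  }
}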

Scala application parameters

className
  Required: Yes
  Example: "className": "com.aliyun.spark.oss.SparkReadOss"
  Description: The entry class of the Scala application.

jars
  Required: No
  Example: "jars": ["oss://<testBucketName>/path/to/jar","oss://<testBucketName>/path/to/jar"]
  Description: The absolute paths of the JAR packages that are required for the Spark application. Separate multiple paths with commas (,). When a Spark application runs, the JAR packages are added to the classpaths of the driver and executor Java virtual machines (JVMs).
  Important:
    • You must store all JAR packages that are required for Spark applications in OSS.
    • The OSS bucket must reside in the same region as the AnalyticDB for MySQL cluster.

Python application parameters

pyFiles
  Required: Yes
  Example: "pyFiles": ["oss://<testBucketName>/path/to/pyfiles","oss://<testBucketName>/path/to/pyfiles"]
  Description: The Python files that are required for the PySpark application. The files must be in the ZIP, PY, or EGG format. If multiple Python files are required, we recommend that you use files in the ZIP or EGG format. You can reference the Python files as modules in Python code. Separate multiple files with commas (,). For a complete submission sketch, see the example after this table.
  Note: You must store all Python files that are required for Spark applications in OSS.
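
The following sketch shows a possible submission JSON for a PySpark application. The main file main.py and the dependency package deps.zip are hypothetical placeholders; the file parameter points to the entry-point Python file, and modules packaged in deps.zip can be imported in main.py.

{
  "name": "spark-oss-test",
  "file": "oss://<testBucketName>/python/test/main.py",
  "pyFiles": ["oss://<testBucketName>/python/test/deps.zip"],
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2,
    "spark.adb.connectors": "oss"
  }
}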