AnalyticDB for MySQL provides the same development method for Spark batch applications and streaming applications. This topic describes how to develop Spark applications.
Development tools
You can use one of the following tools to develop Spark batch applications and streaming applications:
Sample code
The following sample code provides an example of how to develop a Spark application based on data that is stored in Object Storage Service (OSS). The code includes common parameters, such as name and conf, and parameters that are specific to Java, Scala, and Python applications. The parameters are written in JSON format.
{
  "args": ["args0", "args1"],
  "name": "spark-oss-test",
  "file": "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar",
  "className": "com.aliyun.spark.oss.SparkReadOss",
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2,
    "spark.adb.connectors": "oss"
  }
}
Common parameters
Parameter | Required | Description |
name | No | The name of the Spark application. |
file | Yes for Python, Java, and Scala applications | The absolute path of the main file of the Spark application. The main file can be a JAR package that contains the entry point or an executable file that serves as the entry point of the Python program. Important: You must store the main files of Spark applications in OSS. The OSS bucket must reside in the same region as the AnalyticDB for MySQL cluster. |
files | No | The files that are required for the Spark application. These files are downloaded to the working directories of the driver and executor processes. You can configure aliases for the files. Separate multiple files with commas (,). |
archives | No | The compressed packages that are required for the Spark application. The packages must be in the TAR.GZ format and are decompressed to the working directory of the Spark process. You can configure aliases for the files that are contained in the packages. Separate multiple packages with commas (,). Note: You must store all compressed packages that are required for Spark applications in OSS. If a package fails to be decompressed, the job fails. |
conf | Yes | The configuration parameters that are required for the Spark application, which are similar to those of Apache Spark. The parameters must be in the key: value format. |
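For reference, the following submission sketch shows how the files and archives parameters can be combined with the common parameters described above. This is a minimal example based on the sample code in this topic; the OSS paths for the lookup file and the TAR.GZ package are hypothetical placeholders, not required values.

{
  "name": "spark-oss-test",
  "file": "oss://<testBucketName>/jars/test/spark-examples-0.0.1-SNAPSHOT.jar",
  "className": "com.aliyun.spark.oss.SparkReadOss",
  "files": "oss://<testBucketName>/data/lookup.csv",
  "archives": "oss://<testBucketName>/archives/deps.tar.gz",
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2,
    "spark.adb.connectors": "oss"
  }
}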
Java application parameters
Parameter | Required | Description |
args | No | The arguments that are required by the JAR package. Separate multiple arguments with commas (,). |
className | Yes | The entry class of the Java application. |
jars | No | The absolute paths of the JAR packages that are required for the Spark application. Separate multiple paths with commas (,). When the Spark application runs, the JAR packages are added to the classpaths of the driver and executor Java virtual machines (JVMs). |
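As a sketch, a Java application submission that combines args, className, and jars might look as follows. The entry class com.example.MainClass and the dependency JAR paths are hypothetical placeholders.

{
  "args": ["args0", "args1"],
  "name": "spark-java-test",
  "file": "oss://<testBucketName>/jars/main-app.jar",
  "className": "com.example.MainClass",
  "jars": "oss://<testBucketName>/jars/dependency-1.jar,oss://<testBucketName>/jars/dependency-2.jar",
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2
  }
}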
Scala application parameters
Parameter | Required | Description |
className | Yes | The entry class of the Scala application. |
jars | No | The absolute paths of the JAR packages that are required for the Spark application. Separate multiple paths with commas (,). When the Spark application runs, the JAR packages are added to the classpaths of the driver and executor Java virtual machines (JVMs). |
Python application parameters
Parameter | Required | Description |
pyFiles | Yes | The Python files that are required for the PySpark application. The files must be in the ZIP, PY, or EGG format. If multiple Python files are required, we recommend that you use files in the ZIP or EGG format. You can reference the Python files as modules in Python code. Separate multiple packages with commas (,). Note: You must store all Python files that are required for Spark applications in OSS. |
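As a sketch, a PySpark submission that references a dependency package through pyFiles might look as follows. The objects main.py and deps.zip are hypothetical placeholders; modules packaged in deps.zip can then be imported from the code in main.py.

{
  "name": "spark-python-test",
  "file": "oss://<testBucketName>/python/main.py",
  "pyFiles": "oss://<testBucketName>/python/deps.zip",
  "conf": {
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": 2
  }
}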