This topic describes how to configure parameters for the Spark jobs of Lindorm Distributed Processing System (LDPS).
Configure parameters for Spark jobs
LDPS allows you to configure common parameters for Spark jobs, including parameters related to resources, execution, and monitoring.
Restricted parameters
The spark.master and spark.submit.deployMode parameters are system parameters and cannot be customized.
| Parameter | Description |
| --- | --- |
| spark.master | The endpoint of the cluster management system. |
| spark.submit.deployMode | The mode in which the Spark driver is deployed. |
Resource parameters
LDPS provides services based on elastic resource pools. By default, the amount of resources that you can configure is not limited, and the resources are billed on a pay-as-you-go basis. For more information about how to modify the resource limit, see Modify the configurations of LDPS.
You can configure resource parameters for each job that you submit to LDPS, such as a JDBC, JAR, or Python job. Resource parameters include specification parameters and capacity parameters.
Specification parameters
Basic specification parameters
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.driver.memory | The size of the heap memory of the driver. Unit: MB. | 8192M |
| spark.driver.memoryOverhead | The size of the off-heap memory of the driver. Unit: MB. | 8192M |
| spark.kubernetes.driver.disk.size | The size of the local disk of the driver. Unit: GB. | 50 |
| spark.executor.cores | The number of CPU cores of a single executor. | 4 |
| spark.executor.memory | The size of the heap memory of a single executor. Unit: MB. | 8192M |
| spark.executor.memoryOverhead | The size of the off-heap memory of a single executor. Unit: MB. | 8192M |
| spark.kubernetes.executor.disk.size | The size of the local disk of a single executor. Unit: GB. | 50 |
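The following sketch shows how these specification parameters can be written as plain configuration entries, for example in a beeline.conf file. The values are simply the defaults from the table above and are illustrative; size them to your workload.

```
# Illustrative sizing only; adjust the values to your workload.
spark.driver.memory=8192M
spark.driver.memoryOverhead=8192M
spark.kubernetes.driver.disk.size=50
spark.executor.cores=4
spark.executor.memory=8192M
spark.executor.memoryOverhead=8192M
spark.kubernetes.executor.disk.size=50
```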
Advanced specification parameters
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.{driver/executor}.resourceTag | The resource specification set. | None |
| spark.kubernetes.{driver/executor}.ecsModelPreference | The models of the compute nodes. This parameter can be configured together with the spark.{driver/executor}.resourceTag parameter. LDPS applies for the models in the order in which they are specified. If all specified models are out of stock, LDPS randomly applies for an available model based on the specified resource specification. | None |
| spark.kubernetes.{driver/executor}.annotation.k8s.aliyun.com/eci-use-specs | The specification and model of the GPU. For more information, see GPU-accelerated instance specifications. | ecs.gn7i-c8g1.2xlarge |
| spark.{driver/executor}.resource.gpu.vendor | The manufacturer of the GPU. Note: The value of this parameter must correspond to the specified GPU specification. | nvidia.com |
| spark.{driver/executor}.resource.gpu.amount | The number of GPUs. Note: Set this parameter to 1. | 1 |
| spark.{driver/executor}.resource.gpu.discoveryScript | The path of the script file that is used to discover and associate GPU resources when the Spark driver or executor starts. Note: Set this parameter to /opt/spark/examples/src/main/scripts/getGpusResources.sh. | /opt/spark/examples/src/main/scripts/getGpusResources.sh |
| spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs | The specification of executor instances. Expand the disks of executors to ensure sufficient capacity. | None |
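As a sketch, the following entries combine the GPU-related parameters above for the executor side, expanding the {driver/executor} placeholder to executor. The values are the defaults listed in the table; confirm the GPU specifications available in your region before use.

```
# Sketch of a GPU executor configuration based on the defaults in the table above.
spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs=ecs.gn7i-c8g1.2xlarge
spark.executor.resource.gpu.vendor=nvidia.com
spark.executor.resource.gpu.amount=1
spark.executor.resource.gpu.discoveryScript=/opt/spark/examples/src/main/scripts/getGpusResources.sh
```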
Capacity parameters
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.executor.instances | The number of executors that are requested for the job. | 2 |
| spark.dynamicAllocation.enabled | Specifies whether to enable dynamic resource allocation. Valid values: true and false. After dynamic resource allocation is enabled, LDPS applies for and releases executors based on the real-time workload of the job. | true |
| spark.dynamicAllocation.minExecutors | The minimum number of executors when dynamic resource allocation is enabled. | 0 |
| spark.dynamicAllocation.maxExecutors | The maximum number of executors when dynamic resource allocation is enabled. Note: The maximum number of executors is the same as the number of concurrent tasks. | Infinity |
| spark.dynamicAllocation.executorIdleTimeout | The maximum idle period for executors when dynamic resource allocation is enabled. If an executor is idle for longer than the specified value, the executor is released. Unit: seconds. | 600s |
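For example, the following entries enable dynamic resource allocation and cap the number of executors. The value of spark.dynamicAllocation.maxExecutors here (20) is an illustrative assumption, not a recommended value.

```
# Example dynamic allocation settings; the executor counts are illustrative.
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.maxExecutors=20
spark.dynamicAllocation.executorIdleTimeout=600s
```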
Execution parameters
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.speculation | Specifies whether to enable speculative execution. Valid values: true and false. If the execution of a task takes an unusually long time, the driver resubmits the task to avoid long tails. Note: Long tails indicate that the execution periods of some tasks are significantly longer than those of other tasks. | true |
| spark.task.maxFailures | The maximum number of failures allowed for a task. If the number of times a task fails exceeds this value, the job to which the task belongs fails. | 4 |
| spark.dfsLog.executor.enabled | Specifies whether to store the logs of executors to LindormDFS. Valid values: true and false. If jobs in LDPS are large in scale, you can set this parameter to false to prevent excessive LindormDFS load caused by log streams. | true |
| spark.jars | The path of the JAR package that is required when you submit a task. The value of this parameter can be an OSS path or an HDFS path. If you set this parameter to an OSS path, you must also configure the following parameters. Important: If you use JDBC to connect to LDPS, this parameter can be set only to an HDFS path. | None |
| spark.hadoop.fs.oss.endpoint | The endpoint of OSS. For more information about how to obtain the endpoint, see Regions and OSS endpoints in the public cloud. | None |
| spark.hadoop.fs.oss.accessKeyId | The AccessKey ID of your Alibaba Cloud account or a RAM user within your Alibaba Cloud account. For more information about how to obtain the AccessKey ID and AccessKey secret, see Obtain an AccessKey pair. | None |
| spark.hadoop.fs.oss.accessKeySecret | The AccessKey secret of your Alibaba Cloud account or a RAM user within your Alibaba Cloud account. For more information about how to obtain the AccessKey ID and AccessKey secret, see Obtain an AccessKey pair. | None |
| spark.hadoop.fs.oss.impl | The class that is used to access OSS. | None |
| spark.default.parallelism | The default concurrency of non-SQL tasks, including the data source concurrency and shuffle concurrency. | None |
| spark.sql.shuffle.partitions | The default shuffle concurrency of SQL tasks. | 200 |
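The following sketch shows how spark.jars and the OSS-related parameters might be configured together when the JAR package is stored in OSS. The bucket path, endpoint, AccessKey placeholders, and the implementation class are assumptions for illustration; replace them with the values from your environment and confirm the class required by LDPS.

```
# Sketch only: the OSS path, endpoint, credentials, and implementation class below are placeholders.
spark.jars=oss://your-bucket/path/to/your-job.jar
spark.hadoop.fs.oss.endpoint=oss-cn-hangzhou-internal.aliyuncs.com
spark.hadoop.fs.oss.accessKeyId=<yourAccessKeyId>
spark.hadoop.fs.oss.accessKeySecret=<yourAccessKeySecret>
# Commonly the Hadoop OSS connector class; confirm the exact class that LDPS requires.
spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
```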
Monitoring parameters
LDPS allows you to use custom parameters to monitor the status of jobs. You can configure these parameters to record the status of the drivers and executors in job logs.
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.monitor.cmd | The command group for job monitoring. Separate multiple commands with semicolons (;). The commands specified by this parameter are executed in sequence at regular intervals, and their execution results are recorded in job logs. Important: If you use Beeline or JDBC to submit a job, this parameter cannot be configured. | None |
| spark.monitor.interval | The interval at which the commands in the command group specified by spark.monitor.cmd are executed. Unit: seconds. | 60 |
| spark.monitor.timeout | The timeout period for monitoring commands. Unit: seconds. If the execution time of a command in the group specified by spark.monitor.cmd exceeds this value, the command is skipped and the subsequent commands are executed. This prevents a single command from blocking the recording of monitoring information in logs. | 2 |
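As an illustration, the following entries run two hypothetical shell commands every 60 seconds and skip any command that runs longer than 2 seconds. The commands are examples, not values from the Lindorm documentation, and this configuration cannot be used for jobs submitted through Beeline or JDBC.

```
# Illustrative monitoring commands; replace them with the commands you want to record in job logs.
spark.monitor.cmd=df -h;free -m
spark.monitor.interval=60
spark.monitor.timeout=2
```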
Parameters related to open source Spark
For more information about parameters related to open source Spark, see Spark Configuration.
Configuration method
When you submit jobs to LDPS, you can configure customized resource parameters. The configuration method varies with the method that you use to submit jobs.
Beeline
You can configure parameters by modifying the conf/beeline.conf configuration file in the Spark release package that contains the Beeline command-line tool. For more information, see Quick start.
The following example shows the content of the configuration file:
```
# Endpoint of Lindorm Compute Engine, e.g. jdbc:hive2://123.456.XX.XX:10009/;?token=bb8a15-jaksdj-sdfjsd-ak****
endpoint=jdbc:hive2://ld-bp13ez23egd123****-proxy-ldps-pub.lindorm.aliyuncs.com:10009/;?token=jfjwi2453-fe39-cmkfe-afc9-01eek2j5****
# Username for connection, by default root.
user=root
# Password for connection, by default root.
password=root
# Whether to share Spark resource between different sessions, by default true.
shareResource=false
# Normal Spark configurations
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=3
```
JDBC
You can configure parameters by using the JDBC connection string. For more information about the JDBC URL, see Use JDBC in application development.
For example, you can use a JDBC connection string to set the default number of Spark shuffle partitions to 2 and the memory space used by the executor to 8 GB.
```
jdbc:hive2://${host}:${port}/;?token=${token};spark.executor.memory=8g;spark.sql.shuffle.partitions=2
```
JAR
You can configure parameters for a Java job based on a job content template when you submit the Java job in the Lindorm console. For more information, see Manage jobs in the Lindorm console.
When you use DMS to submit a Java job, you can configure customized parameters for the job in the Job configuration section of the job node page. For more information, see Use DMS to manage jobs.
Python
You can configure parameters for a Python job based on a job content template when you submit the Python job in the Lindorm console. For more information, see Manage jobs in the Lindorm console.
When you use DMS to submit a Python job, you can configure customized parameters for the job in the Job configuration section of the job node page. For more information, see Use DMS to manage jobs.