This topic describes how to configure parameters for the Spark jobs of Lindorm Distributed Processing System (LDPS).
Configure parameters for Spark jobs
LDPS allows you to configure common parameters for Spark jobs, including parameters related to resources, execution, and monitoring.
Restricted parameters
The spark.master and spark.submit.deployMode parameters are system parameters and cannot be customized.
| Parameter | Description |
| --- | --- |
| spark.master | The endpoint of the cluster management system. |
| spark.submit.deployMode | The mode in which the Spark driver is deployed. |
Resource parameters
LDPS provides services based on elastic resource pools. By default, the amount of resources that you can configure is not limited, and the resources are billed on a pay-as-you-go basis. For more information about how to modify the resource limit, see Modify the configurations of LDPS.
You can configure resource parameters for each job that you submit to LDPS, such as a JDBC, JAR, or Python job. Resource parameters include specification parameters and capacity parameters.
Specification parameters
Basic specification parameters
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.driver.memory | The size of the heap memory of the driver. Unit: MB. | 8192M |
| spark.driver.memoryOverhead | The size of the off-heap memory of the driver. Unit: MB. | 8192M |
| spark.kubernetes.driver.disk.size | The size of the local disk of the driver. Unit: GB. | 50 |
| spark.executor.cores | The number of CPU cores of a single executor. | 4 |
| spark.executor.memory | The size of the heap memory of a single executor. Unit: MB. | 8192M |
| spark.executor.memoryOverhead | The size of the off-heap memory of a single executor. Unit: MB. | 8192M |
| spark.kubernetes.executor.disk.size | The size of the local disk of a single executor. Unit: GB. | 50 |
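The following sketch shows how these specification parameters can be written as plain configuration entries, for example in a beeline.conf file. The values are simply the defaults from the table above and are illustrative; size them to your workload.

```
# Illustrative sizing only; adjust the values to your workload.
spark.driver.memory=8192M
spark.driver.memoryOverhead=8192M
spark.kubernetes.driver.disk.size=50
spark.executor.cores=4
spark.executor.memory=8192M
spark.executor.memoryOverhead=8192M
spark.kubernetes.executor.disk.size=50
```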
Advanced specification parameters
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.{driver/executor}.resourceTag | The resource specification set. | None |
| spark.kubernetes.{driver/executor}.ecsModelPreference | The models of the compute nodes. This parameter can be configured together with the spark.{driver/executor}.resourceTag parameter. LDPS applies for the models in the order in which they are specified. If all specified models are out of stock, LDPS randomly applies for an available model based on the specified resource specification. | None |
| spark.kubernetes.{driver/executor}.annotation.k8s.aliyun.com/eci-use-specs | The specification and model of the GPU. For more information, see GPU-accelerated instance specifications. | ecs.gn7i-c8g1.2xlarge |
| spark.{driver/executor}.resource.gpu.vendor | The manufacturer of the GPU. Note: The value of this parameter must correspond to the specified GPU specification. | nvidia.com |
| spark.{driver/executor}.resource.gpu.amount | The number of GPUs. Note: Set this parameter to 1. | 1 |
| spark.{driver/executor}.resource.gpu.discoveryScript | The path of the script file that is used to discover and associate GPU resources when the Spark driver or executor starts. Note: Set this parameter to /opt/spark/examples/src/main/scripts/getGpusResources.sh. | /opt/spark/examples/src/main/scripts/getGpusResources.sh |
| spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs | The specification of executor instances. Expand the disks of executors to ensure sufficient capacity. | None |
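As a sketch, the following entries combine the GPU-related parameters above for the executor side, expanding the {driver/executor} placeholder to executor. The values are the defaults listed in the table; confirm the GPU specifications available in your region before use.

```
# Sketch of a GPU executor configuration based on the defaults in the table above.
spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs=ecs.gn7i-c8g1.2xlarge
spark.executor.resource.gpu.vendor=nvidia.com
spark.executor.resource.gpu.amount=1
spark.executor.resource.gpu.discoveryScript=/opt/spark/examples/src/main/scripts/getGpusResources.sh
```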
Capacity parameters
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.executor.instances | The number of executors that are requested for the job. | 2 |
| spark.dynamicAllocation.enabled | Specifies whether to enable dynamic resource allocation. Valid values: true and false. After dynamic resource allocation is enabled, LDPS applies for and releases executors based on the real-time workload of the job. | true |
| spark.dynamicAllocation.minExecutors | The minimum number of executors when dynamic resource allocation is enabled. | 0 |
| spark.dynamicAllocation.maxExecutors | The maximum number of executors when dynamic resource allocation is enabled. Note: The maximum number of executors is the same as the number of concurrent tasks. | Infinity |
| spark.dynamicAllocation.executorIdleTimeout | The maximum idle period for executors when dynamic resource allocation is enabled. If an executor is idle for longer than the specified value, the executor is released. Unit: seconds. | 600s |
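For example, the following entries enable dynamic resource allocation and cap the number of executors. The value of spark.dynamicAllocation.maxExecutors here (20) is an illustrative assumption, not a recommended value.

```
# Example dynamic allocation settings; the executor counts are illustrative.
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.maxExecutors=20
spark.dynamicAllocation.executorIdleTimeout=600s
```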
Execution parameters
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.speculation | Specifies whether to enable speculative execution. Valid values: true and false. If the execution of a task takes an unusually long time, the driver resubmits the task to avoid long tails. Note: Long tails indicate that the execution periods of some tasks are significantly longer than those of other tasks. | true |
| spark.task.maxFailures | The maximum number of failures allowed for a task. If the number of times a task fails exceeds this value, the job to which the task belongs fails. | 4 |
| spark.dfsLog.executor.enabled | Specifies whether to store the logs of executors to LindormDFS. Valid values: true and false. If jobs in LDPS are large in scale, you can set this parameter to false to prevent excessive LindormDFS load caused by log streams. | true |
| spark.jars | The path of the JAR package that is required when you submit a task. The value of this parameter can be an OSS path or an HDFS path. If you set this parameter to an OSS path, you must also configure the following parameters. Important: If you use JDBC to connect to LDPS, this parameter can be set only to an HDFS path. | None |
| spark.hadoop.fs.oss.endpoint | The endpoint of OSS. For more information about how to obtain the endpoint, see Regions and OSS endpoints in the public cloud. | None |
| spark.hadoop.fs.oss.accessKeyId | The AccessKey ID of your Alibaba Cloud account or a RAM user within your Alibaba Cloud account. For more information about how to obtain the AccessKey ID and AccessKey secret, see Obtain an AccessKey pair. | None |
| spark.hadoop.fs.oss.accessKeySecret | The AccessKey secret of your Alibaba Cloud account or a RAM user within your Alibaba Cloud account. For more information about how to obtain the AccessKey ID and AccessKey secret, see Obtain an AccessKey pair. | None |
| spark.hadoop.fs.oss.impl | The class that is used to access OSS. | None |
| spark.default.parallelism | The default concurrency of non-SQL tasks, including the data source concurrency and shuffle concurrency. | None |
| spark.sql.shuffle.partitions | The default shuffle concurrency of SQL tasks. | 200 |
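The following sketch shows how spark.jars and the OSS-related parameters might be configured together when the JAR package is stored in OSS. The bucket path, endpoint, AccessKey placeholders, and the implementation class are assumptions for illustration; replace them with the values from your environment and confirm the class required by LDPS.

```
# Sketch only: the OSS path, endpoint, credentials, and implementation class below are placeholders.
spark.jars=oss://your-bucket/path/to/your-job.jar
spark.hadoop.fs.oss.endpoint=oss-cn-hangzhou-internal.aliyuncs.com
spark.hadoop.fs.oss.accessKeyId=<yourAccessKeyId>
spark.hadoop.fs.oss.accessKeySecret=<yourAccessKeySecret>
# Commonly the Hadoop OSS connector class; confirm the exact class that LDPS requires.
spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
```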
Monitoring parameters
LDPS allows you to use custom parameters to monitor the status of jobs. You can configure these parameters to record the status of the drivers and executors in job logs.
| Parameter | Description | Default value |
| --- | --- | --- |
| spark.monitor.cmd | The command group for job monitoring. Separate multiple commands with semicolons (;). The commands specified by this parameter are executed in sequence at regular intervals, and their execution results are recorded in job logs. Important: If you use Beeline or JDBC to submit a job, this parameter cannot be configured. | None |
| spark.monitor.interval | The interval at which the commands in the command group specified by spark.monitor.cmd are executed. Unit: seconds. | 60 |
| spark.monitor.timeout | The timeout period for monitoring commands. Unit: seconds. If the execution time of a command in the group specified by spark.monitor.cmd exceeds this value, the command is skipped and the subsequent commands are executed. This prevents a single command from blocking the recording of monitoring information in logs. | 2 |
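As an illustration, the following entries run two hypothetical shell commands every 60 seconds and skip any command that runs longer than 2 seconds. The commands are examples, not values from the Lindorm documentation, and this configuration cannot be used for jobs submitted through Beeline or JDBC.

```
# Illustrative monitoring commands; replace them with the commands you want to record in job logs.
spark.monitor.cmd=df -h;free -m
spark.monitor.interval=60
spark.monitor.timeout=2
```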
Parameters related to open source Spark
For more information about parameters related to open source Spark, see Spark Configuration.
Configuration method
When you submit jobs to LDPS, you can configure customized resource parameters. The configuration method varies with the method that you use to submit jobs.
Beeline
You can configure parameters by modifying the conf/beeline.conf configuration file in the Spark release package that contains the Beeline command-line tool. For more information, see Quick start.
The following example shows the content of the configuration file:
```
# Endpoint of Lindorm Compute Engine, e.g. jdbc:hive2://123.456.XX.XX:10009/;?token=bb8a15-jaksdj-sdfjsd-ak****
endpoint=jdbc:hive2://ld-bp13ez23egd123****-proxy-ldps-pub.lindorm.aliyuncs.com:10009/;?token=jfjwi2453-fe39-cmkfe-afc9-01eek2j5****
# Username for connection, by default root.
user=root
# Password for connection, by default root.
password=root
# Whether to share Spark resource between different sessions, by default true.
shareResource=false
# Normal Spark configurations
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=3
```
JDBC
You can configure parameters by using the JDBC connection string. For more information about the JDBC URL, see Use JDBC in application development.
For example, you can use a JDBC connection string to set the default number of Spark shuffle partitions to 2 and the memory space used by the executor to 8 GB.
```
jdbc:hive2://${host}:${port}/;?token=${token};spark.executor.memory=8g;spark.sql.shuffle.partitions=2
```
JAR
You can configure parameters for a Java job based on a job content template when you submit the Java job in the Lindorm console. For more information, see Manage jobs in the Lindorm console.
When you use DMS to submit a Java job, you can configure customized parameters for the job in the Job configuration section of the job node page. For more information, see Use DMS to manage jobs.
Python
You can configure parameters for a Python job based on a job content template when you submit the Python job in the Lindorm console. For more information, see Manage jobs in the Lindorm console.
When you use DMS to submit a Python job, you can configure customized parameters for the job in the Job configuration section of the job node page. For more information, see Use DMS to manage jobs.