
Lindorm: Configure parameters for jobs

Last Updated: Sep 13, 2024

This topic describes how to configure parameters for the Spark jobs of Lindorm Distributed Processing System (LDPS).

Configure parameters for Spark jobs

LDPS allows you to configure common parameters for Spark jobs, including parameters related to resources, execution, and monitoring.

Restricted parameters

The spark.master and spark.submit.deployMode parameters are system parameters and cannot be customized.

Parameter

Description

spark.master

The endpoint of the cluster management system.

spark.submit.deployMode

The mode in which the Spark driver is deployed.

Resource parameters

LDPS provides services based on elastic resource pools. By default, the maximum amount of resources that you can configure is not limited. These resources are billed on a pay-as-you-go basis. For more information about how to modify the maximum amount of resources that you can configure, see Modify the configurations of LDPS.

You can configure resource parameters for each job that you submit to LDPS, such as a JDBC, JAR, or Python job. Resource parameters include specification parameters and capacity parameters.

Specification parameters

Basic specification parameters

Parameter

Description

Default value

spark.driver.memory

The size of heap memory of the driver. Unit: MB.

8192M

spark.driver.memoryOverhead

The size of off-heap memory of the driver. Unit: MB.

8192M

spark.kubernetes.driver.disk.size

The size of the local disk of the driver. Unit: GB.

50

spark.executor.cores

The number of CPU cores of a single executor node.

4

spark.executor.memory

The size of heap memory of a single executor. Unit: MB.

8192M

spark.executor.memoryOverhead

The size of off-heap memory of a single executor. Unit: MB.

8192M

spark.kubernetes.executor.disk.size

The size of the local disk of a single executor. Unit: GB.

50
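
For example, to run a job with larger drivers and executors than the defaults, you can set the basic specification parameters together. The following snippet is only a sketch; the values are illustrative and should be adjusted to your workload:

spark.driver.memory=16384m
spark.driver.memoryOverhead=16384m
spark.kubernetes.driver.disk.size=100
spark.executor.cores=8
spark.executor.memory=16384m
spark.executor.memoryOverhead=16384m
spark.kubernetes.executor.disk.size=100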

Advanced specification parameters

Parameter

Description

Default value

spark.{driver/executor}.resourceTag

The resource specification set. Valid values: xlarge, 2xlarge, 4xlarge, 8xlarge, and 16xlarge. Resources defined in the specified specification set are automatically configured for the job.

  • xlarge:

    spark.{driver/executor}.cores=4

    spark.{driver/executor}.memory=8192m

    spark.{driver/executor}.memoryOverhead=8192m

    spark.kubernetes.{driver/executor}.disk.size=50

  • 2xlarge:

    spark.{driver/executor}.cores=8

    spark.{driver/executor}.memory=16384m

    spark.{driver/executor}.memoryOverhead=16384m

    spark.kubernetes.{driver/executor}.disk.size=100

  • 4xlarge:

    spark.{driver/executor}.cores=16

    spark.{driver/executor}.memory=32768m

    spark.{driver/executor}.memoryOverhead=32768m

    spark.kubernetes.{driver/executor}.disk.size=200

  • 8xlarge:

    spark.{driver/executor}.cores=32

    spark.{driver/executor}.memory=65536m

    spark.{driver/executor}.memoryOverhead=65536m

    spark.kubernetes.{driver/executor}.disk.size=400

  • 16xlarge:

    spark.{driver/executor}.cores=64

    spark.{driver/executor}.memory=131072m

    spark.{driver/executor}.memoryOverhead=131072m

    spark.kubernetes.{driver/executor}.disk.size=400

None

spark.kubernetes.{driver/executor}.ecsModelPreference

The ECS model of the compute node. This parameter can be configured together with the spark.{driver/executor}.resourceTag parameter to specify preferred models in order. You can specify up to four models. Example: spark.kubernetes.driver.ecsModelPreference=hfg6,g6.

LDPS applies for the models in the order in which they are specified. If all specified models are out of stock, LDPS randomly applies for an available model that matches the specified resource specification.

None

spark.kubernetes.{driver/executor}.annotation.k8s.aliyun.com/eci-use-specs

The specification and model of the GPU. For more information, see GPU-accelerated instance specifications.

ecs.gn7i-c8g1.2xlarge

spark.{driver/executor}.resource.gpu.vendor

The manufacturer of the GPU.

Note

The value of this parameter must correspond to the specified GPU specification.

nvidia.com

spark.{driver/executor}.resource.gpu.amount

The number of GPUs.

Note

Set this parameter to 1.

1

spark.{driver/executor}.resource.gpu.discoveryScript

The path in which the script file is located.

Note

The script file specified by this parameter is used to discover and associate GPU resources when the Spark driver or executor starts. Set this parameter to /opt/spark/examples/src/main/scripts/getGpusResources.sh.

/opt/spark/examples/src/main/scripts/getGpusResources.sh

spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs

The instance specification of executors. Use this parameter to expand the local disks of executors to ensure sufficient capacity.

The following specifications are supported:

  • ecs.d1ne.2xlarge: 8 cores and 32 GB of memory.

  • ecs.d1ne.4xlarge: 16 cores and 64 GB of memory.

  • ecs.d1ne.6xlarge: 24 cores and 96 GB of memory.

  • ecs.d1ne.8xlarge: 32 cores and 128 GB of memory.

  • ecs.d1ne.14xlarge: 56 cores and 224 GB of memory.

    Note
    • Select the specification of executor instances based on your requirements.

    • You must configure the following two parameters together with this parameter (see the example after this table):

      • spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.mount.path=/var

      • spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.options.medium=LocalRaid0

    • In some cases, executors with the specified specification may be unavailable. If an error occurs during configuration, contact the technical support of Lindorm (DingTalk ID: s0s3eg3).

None
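
For example, to expand the local disks of executors by using an instance family with local disks, you can combine the eci-use-specs annotation with the two required volume parameters. The following snippet is only a sketch; the instance specification is an example and must be available in your region:

spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-use-specs=ecs.d1ne.2xlarge
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.mount.path=/var
spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.options.medium=LocalRaid0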

Capacity parameters

Parameter

Description

Default value

spark.executor.instances

The number of executors requested for the job.

2

spark.dynamicAllocation.enabled

Specifies whether to enable dynamic resource allocation. Valid values:

  • true: Enable dynamic resource allocation.

  • false: Disable dynamic resource allocation.

After dynamic resource allocation is enabled, LDPS applies for and releases executors based on the real-time workload of the job.

true

spark.dynamicAllocation.minExecutors

The minimum number of executors when dynamic resource allocation is enabled.

0

spark.dynamicAllocation.maxExecutors

The maximum number of executors when dynamic resource allocation is enabled.

Note

The maximum number of executors is the same as the specified number of concurrent tasks.

Infinity

spark.dynamicAllocation.executorIdleTimeout

The maximum idle period for executors when dynamic resource allocation is enabled. If an executor is idle for a time period longer than the specified value, the executor is released. Unit: seconds.

600s
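
For example, to allow a job to scale between 1 and 20 executors and release executors that remain idle for 5 minutes, you can combine the capacity parameters as follows. This is only a sketch; the limits are illustrative:

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=20
spark.dynamicAllocation.executorIdleTimeout=300s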

Execution parameters

Parameter

Description

Default value

spark.speculation

Specifies whether to enable speculative execution. Valid values:

  • true: Enable speculative execution.

  • false: Disable speculative execution.

If a task takes an unusually long time to execute, the driver resubmits the task to avoid long tails.

Note

Long tails indicate that the execution periods of some tasks are significantly longer than those of other tasks.

true

spark.task.maxFailures

The maximum number of failures allowed for a task. If the number of times a task fails exceeds this value, the job to which the task belongs fails.

4

spark.dfsLog.executor.enabled

Specifies whether to store the logs of executors to LindormDFS. Valid values:

  • true: Store the logs of executors to LindormDFS.

  • false: Do not store the logs of executors to LindormDFS.

If your LDPS jobs are large in scale, you can set this parameter to false to prevent excessive load on LindormDFS caused by log streams.

true

spark.jars

The path of the JAR package that is required when you submit a task. The value of this parameter can be a path in OSS or HDFS.

If you set this parameter to an OSS path, you must also configure the following parameters (see the example after this table):

  • spark.hadoop.fs.oss.endpoint

  • spark.hadoop.fs.oss.accessKeyId

  • spark.hadoop.fs.oss.accessKeySecret

  • spark.hadoop.fs.oss.impl

Important

If you use JDBC to connect to LDPS, this parameter can be set only to an HDFS path.

None

spark.hadoop.fs.oss.endpoint

The endpoint of OSS. For more information about how to obtain the endpoint, see Regions and OSS endpoints in the public cloud.

None

spark.hadoop.fs.oss.accessKeyId

The AccessKey ID of your Alibaba Cloud account or a RAM user of your Alibaba Cloud account.

For more information about how to obtain the AccessKey ID and AccessKey secret, see Obtain an AccessKey pair.

None

spark.hadoop.fs.oss.accessKeySecret

The AccessKey secret of your Alibaba Cloud account or a RAM user of your Alibaba Cloud account.

For more information about how to obtain the AccessKey ID and AccessKey secret, see Obtain an AccessKey pair.

None

spark.hadoop.fs.oss.impl

The class that is used to access OSS.

Set this parameter to org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.

None

spark.default.parallelism

The default concurrency of non-SQL tasks, including the data source concurrency and shuffle concurrency.

None

spark.sql.shuffle.partitions

The default shuffle concurrency of SQL tasks.

200
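
For example, to submit a job whose JAR package is stored in OSS, you can configure spark.jars together with the required OSS parameters, as described above. The following snippet is only a sketch; the bucket name, path, endpoint, and AccessKey pair are placeholders that you must replace with your own values:

spark.jars=oss://examplebucket/path/to/job.jar
spark.hadoop.fs.oss.endpoint=oss-cn-hangzhou-internal.aliyuncs.com
spark.hadoop.fs.oss.accessKeyId=yourAccessKeyId
spark.hadoop.fs.oss.accessKeySecret=yourAccessKeySecret
spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem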

Monitoring parameters

LDPS allows you to use custom parameters to monitor the status of the instance. You can configure these parameters to record the status of drivers and executors in job logs.

Parameter

Description

Default value

spark.monitor.cmd

The command group for job monitoring. Separate multiple commands with semicolons (;). The commands specified by this parameter are executed in sequence at regular intervals. The execution results of the commands are recorded in job logs.

Sample monitoring commands:

  • Monitor the system status: top -b -n 1, vmstat.

  • Monitor the memory status: free -m.

  • Monitor the I/O status: iostat -d -x -c -k.

  • Monitor the disk status: df -h.

  • Monitor the network status: sar -n DEV 1 1, netstat.

Sample statements:

  • Configure a single monitoring command:

    "spark.monitor.cmd": "top -b -n 1"
  • Configure multiple monitoring commands:

"spark.monitor.cmd": "top -b -n 1; vmstat; free -m; iostat -d -x -c -k; df -h; sar -n DEV 1 1; netstat"
Important

If you use Beeline or JDBC to submit a job, this parameter cannot be configured.

None

spark.monitor.interval

The interval at which the commands in the group are executed. Unit: seconds.

The commands specified by the spark.monitor.cmd parameter are executed at the interval specified by this parameter.

60

spark.monitor.timeout

The timeout period for the monitoring commands. Unit: seconds.

If the execution time of a monitoring command in the command group specified by the spark.monitor.cmd parameter exceeds the specified timeout period, the command is skipped and the subsequent commands are executed. This way, monitoring information can be recorded in logs without being blocked.

2
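
For example, to record memory and disk status every 30 seconds and skip any monitoring command that runs for more than 5 seconds, you can combine the three monitoring parameters. This is only a sketch; the commands and values are illustrative:

"spark.monitor.cmd": "free -m; df -h"
"spark.monitor.interval": "30"
"spark.monitor.timeout": "5"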

Parameters related to open source Spark

For more information about parameters related to open source Spark, see Spark Configuration.

Configuration method

When you submit jobs to LDPS, you can configure customized resource parameters. The configuration method varies with the method that you use to submit jobs.

Beeline

You can configure parameters by modifying the conf/beeline.conf configuration file in the Spark package that contains the Beeline command-line tool. For more information, see Quick start.

The following example shows the content of the configuration file:

# Endpoint of Lindorm Compute Engine, e.g. jdbc:hive2://123.456.XX.XX:10009/;?token=bb8a15-jaksdj-sdfjsd-ak****
endpoint=jdbc:hive2://ld-bp13ez23egd123****-proxy-ldps-pub.lindorm.aliyuncs.com:10009/;?token=jfjwi2453-fe39-cmkfe-afc9-01eek2j5****

# Username for connection, by default root.
user=root

# Password for connection, by default root.
password=root

# Whether to share Spark resource between different sessions, by default true.
shareResource=false

# Normal Spark configurations
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=3

JDBC

You can configure parameters by using the JDBC connection string. For more information about the JDBC URL, see Use JDBC in application development.

For example, you can use a JDBC connection string to set the default number of Spark shuffle partitions to 2 and the memory space used by the executor to 8 GB.

jdbc:hive2://${host}:${port}/;?token=${token};spark.executor.memory=8g;spark.sql.shuffle.partitions=2

JAR

  • You can configure parameters for a Java job based on a job content template when you submit the Java job in the Lindorm console. For more information, see Manage jobs in the Lindorm console.

  • When you use DMS to submit a Java job, you can configure customized parameters for the job in the Job configuration section of the job node page. For more information, see Use DMS to manage jobs.

Python

  • You can configure parameters for a Python job based on a job content template when you submit the Python job in the Lindorm console. For more information, see Manage jobs in the Lindorm console.

  • When you use DMS to submit a Python job, you can configure customized parameters for the job in the Job configuration section of the job node page. For more information, see Use DMS to manage jobs.