AnalyticDB: Common Spark errors

Last Updated: Aug 30, 2024

This topic describes the error messages, causes, and solutions for common errors that may occur in AnalyticDB for MySQL Spark jobs.

Overview

The following list groups errors by problem description. Each entry shows the error code, followed by the error message.

A Spark job fails to access a Java Database Connectivity (JDBC) data source.

  • JDBC_SOURCE_TABLE_NAME_DUPLICATE: Both '$jdbcTableName' and '$jdbcQueryString' can not be specified at the same time
  • JDBC_NO_SUITABLE_DRIVER: SQLException .* No suitable driver found for
  • JDBC_COMMUNICATION_FAILURE: CommunicationsException .* Communications link failure
  • JDBC_SSL_ERROR: SSL peer shut down incorrectly
  • JDBC_COLUMN_TYPE_PARSER_ERROR: Can't get JDBC type for <DataType>

A Spark job fails due to a surge in data traffic.

  • EXECUTOR_CONTAINER_OOM: Exit Code: 137
  • EXECUTOR_DISK_FULL: No space left on device

A Spark job immediately fails after the job starts.

  • ENI_NOT_VALID: The VPC of the specified ENI and security group are not in the same VPC
  • DRIVER_FILE_NOTFOUND: java.lang.ClassNotFoundException

A Spark job unexpectedly fails.

  • BROADCAST_TOO_LARGE: Cannot broadcast the table that is larger than
  • BROADCAST_MEM_NOT_ENOUGH: Not enough memory to build and broadcast the table to all
  • ADB_DOMAIN_NOT_RESOLVED: unkown host .* ts.adb.com

A Spark job for which an elastic network interface (ENI) is configured fails.

  • SG_MANADED_BY_CLOUD: The security group has been managed by another cloud product.
  • VSWITCH_IP_NOT_ENOUGH: does not have enough IP addresses

A Spark job remains in the Submitting state and cannot enter the Running state.

  • EXCEEDED_QUOTA: exceeded quota

Hudi data fails to be read and written by using Spark SQL.

  • HUDI_PARTITION_NOT_EXISTS: Error fetching partition paths with prefix

A Spark job fails to access an Object Storage Service (OSS) data source.

  • DRIVER_OSS_ACCESS_DENIED: The bucket you access does not belong to you

A Spark job fails to access an Elasticsearch data source.

  • ES_DATANODE_NOT_FOUND: EsHadoopNoNodesLeftException: Connection error .* all nodes failed

A Spark job fails to access metadata.

  • USER_HAVE_NONE_PRIVILEGE: MetaException .* User have none of the privileges

Error causes and solutions

Note

You can find an application on the Applications tab and click Log in the Actions column to view the log information about the Spark job. For more information, see Spark editor.

Both '$jdbcTableName' and '$jdbcQueryString' can not be specified at the same time

Error log: Spark driver log.

Cause: When you use Spark to access a JDBC data source, a table name is specified in both the url and dbtable parameters of the OPTIONS clause.

Solution: You can specify a table name only once. Delete the table name from the value of the url parameter.
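For example, a JDBC source table can be declared with the table name only in the dbtable option. The endpoint, database, table, and credentials below are placeholders:

```sql
-- Hypothetical example: keep the table name out of the url value
-- and specify it exactly once, in the dbtable option.
CREATE TABLE spark_jdbc_tbl USING jdbc OPTIONS (
  url 'jdbc:mysql://rm-example.mysql.rds.aliyuncs.com:3306/testdb',  -- no table name here
  dbtable 'orders',                                                  -- table name specified once
  user 'db_user',
  password 'db_password'
);
```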

SQLException .* No suitable driver found for

Error log: Spark driver log.

Cause: When you use Spark to access a JDBC data source, no suitable JDBC driver exists.

Solution: Check the JAR package required for the Spark job to ensure that the JAR package contains a suitable JDBC driver. If you want to access multiple JDBC data sources at the same time, the JAR package must contain the JDBC drivers for all data sources. For example, if you want to access the Hive and ApsaraDB RDS for MySQL data sources at the same time, the JAR package must contain the JDBC drivers for the Hive and ApsaraDB RDS for MySQL data sources.
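As an illustration, the driver JARs for all data sources can be attached to the job at submission time. The following fragment assumes the typical shape of an AnalyticDB for MySQL Spark job JSON; the OSS paths and driver versions are placeholders:

```json
{
  "file": "oss://example-bucket/spark-jobs/my-job.jar",
  "jars": [
    "oss://example-bucket/jars/mysql-connector-java-8.0.33.jar",
    "oss://example-bucket/jars/hive-jdbc-3.1.2-standalone.jar"
  ]
}
```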

CommunicationsException .* Communications link failure

Error log: Spark driver log.

Cause: The ENI that is configured for a Spark job cannot access a specific data source. In most cases, this error occurs in scenarios that involve ApsaraDB RDS for MySQL or Hadoop Distributed File System (HDFS) data sources.

Solution:

  • Check whether the CIDR block of the vSwitch that is configured for the Spark job is included in the whitelist of the data source. If not, reconfigure the whitelist of the data source.

    For example, if the CIDR block of the specified vSwitch is not included in the whitelist of the ApsaraDB RDS for MySQL data source that you want to access, you must add the CIDR block to the whitelist.

  • Check whether the security group that is configured for the Spark job allows access to the port of the data source. For more information, see Add a security group rule.

    For example, if you want to access an ApsaraDB RDS for MySQL data source, you must add security group rules to allow inbound and outbound access to port 3306.

  • Check whether the ENI that is configured for the Spark job resides in the same virtual private cloud (VPC) as the data source.

SSL peer shut down incorrectly

Error log: Spark driver log.

Cause: When you use Spark to access a JDBC data source, no correct SSL certificate information is configured.

Solution: Configure correct SSL certificate information by referring to the "Access ApsaraDB RDS for MySQL over an SSL connection" section of the Access ApsaraDB RDS for MySQL topic.
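For instance, with the MySQL Connector/J driver the SSL settings are passed as JDBC URL properties. This is a sketch, not the authoritative configuration from the linked topic; the endpoint, keystore path, and passwords are placeholders:

```sql
-- Hypothetical example: trustCertificateKeyStoreUrl points to a keystore
-- that contains the CA certificate chain of the data source.
CREATE TABLE rds_ssl_tbl USING jdbc OPTIONS (
  url 'jdbc:mysql://rm-example.mysql.rds.aliyuncs.com:3306/testdb?useSSL=true&trustCertificateKeyStoreUrl=file:///tmp/ca-chain.jks&trustCertificateKeyStorePassword=keystore_password',
  dbtable 'orders',
  user 'db_user',
  password 'db_password'
);
```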

Can't get JDBC type for <DataType>

Error log: Spark driver log.

Cause: When you use Spark to access a JDBC data source, data types are incorrectly mapped. For example, when you access an ApsaraDB RDS for MySQL data source, the SHORT INT type that allows null in the data source is mapped to the INT type in an AnalyticDB for MySQL table.

Solution: Change the data types in the AnalyticDB for MySQL table to ensure that the data type that allows null in the JDBC data source can be correctly mapped to a data type in the AnalyticDB for MySQL table. For example, if the SHORT INT type is used in the ApsaraDB RDS for MySQL data source, you can map the data type to the BOOLEAN type in the AnalyticDB for MySQL table.

Exit Code: 137

Error log: Spark driver log.

Cause: The memory usage of a Spark executor container exceeds its limit. In addition to the memory for the Java virtual machine (JVM), a Spark executor container needs memory for shuffle and cached data (off-heap memory) and for Python user-defined functions (UDFs). If the total memory used by the container exceeds its limit, the executor process is forcibly terminated (exit code 137 corresponds to a SIGKILL, typically sent by the out-of-memory killer). In most cases, this error occurs in data mining scenarios or Python-based Spark jobs.

Solution: Increase the value of the spark.executor.memoryOverhead parameter, which specifies the amount of memory that a Spark executor container can use in addition to the memory of the executor JVM process. Unit: MB. Default value: 30% of the total memory of the executor container. For example, with Medium executor specifications (2 cores and 8 GB of memory), the default overhead is 2.4 GB. Sample configuration:

spark.executor.memoryOverhead: 4000MB
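In an AnalyticDB for MySQL Spark job, parameters such as this one are typically set in the conf section of the job submission JSON. The file path and resource spec below are placeholders:

```json
{
  "file": "oss://example-bucket/spark-jobs/my-job.py",
  "conf": {
    "spark.executor.resourceSpec": "medium",
    "spark.executor.memoryOverhead": "4000MB"
  }
}
```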

No space left on device

Error log: Spark executor log.

Cause: The disk storage is insufficient.

Solution: Use the spark.adb.executorDiskSize parameter to change the size of additional disk storage that is mounted on a Spark executor. For more information, see Spark application configuration parameters.
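A minimal sketch of the corresponding job configuration follows. The value format and allowed range are assumptions; verify them against the Spark application configuration parameters topic:

```json
{
  "conf": {
    "spark.adb.executorDiskSize": "100Gi"
  }
}
```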

The VPC of the specified ENI and security group are not in the same VPC

Error log: Spark driver log.

Cause: The vSwitch and security group that are configured for a Spark job do not reside in the same VPC.

Solution: Check the configurations of the Spark job. Reconfigure a vSwitch and a security group for the Spark job.

java.lang.ClassNotFoundException

Error log: Spark driver log.

Cause: When you submit a Spark job, a required class is missing in the uploaded JAR package. In most cases, this error occurs in scenarios that involve third-party JAR packages.

Solution: Check whether the third-party JAR package contains all required content. If not, repackage the JAR files to ensure that the required class is included.
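Because a JAR package is a ZIP archive, you can verify that a required class is present before you submit the job. The following is a minimal sketch; the JAR path and class name in the usage comment are placeholders:

```python
import zipfile

def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the fully qualified class has a .class entry in the JAR.

    JAR packages are ZIP archives, so zipfile can read them directly.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Example usage (path and class name are placeholders):
# jar_contains_class("my-job.jar", "com.example.MyMainClass")
```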

Cannot broadcast the table that is larger than

Error log: Spark driver log.

Cause: The broadcast fails because the size of broadcast tables exceeds the limit. For information about broadcast tables, see Class Broadcast<T>.

Solution:

  • The maximum memory that can be used for broadcast tables in Spark jobs is 8 GB. When you submit a Spark job, you can use the spark.sql.autoBroadcastJoinThreshold parameter to configure the memory for broadcast tables. Unit: MB.

  • Spark uses a sampling method to estimate table sizes. If data is unevenly distributed across tables, an estimation error occurs. You can set the spark.sql.autoBroadcastJoinThreshold parameter to -1 to disable the broadcast feature and ensure that your business runs as expected. Unit: MB.
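For example, broadcast joins can be disabled entirely in the job configuration. This is a sketch of the conf section only; the rest of the submission JSON is omitted:

```json
{
  "conf": {
    "spark.sql.autoBroadcastJoinThreshold": "-1"
  }
}
```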

Not enough memory to build and broadcast the table to all

Error log: Spark driver log.

Cause: The maximum memory of Spark drivers is insufficient to send broadcast tables. For information about broadcast tables, see Class Broadcast<T>.

Solution: Decrease the value of the spark.sql.autoBroadcastJoinThreshold parameter. We recommend that you set the parameter to a value that is less than or equal to 400. Unit: MB.

unkown host .* ts.adb.com

Error log: Spark driver log.

Cause: An internal domain name fails to be resolved due to network jitter or Domain Name System (DNS) connection failures.

Solution: If this error frequently occurs, set the spark.adb.eni.adbHostAlias.enabled parameter to true.
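A minimal sketch of the corresponding job configuration:

```json
{
  "conf": {
    "spark.adb.eni.adbHostAlias.enabled": "true"
  }
}
```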

The security group has been managed by another cloud product.

Error log: response log of the GetSparkAppLog operation.

Cause: The security group that is configured for a Spark job is managed by another cloud service and cannot be used for AnalyticDB for MySQL Spark.

Solution: Check the security group configuration of the Spark job. Reconfigure a security group for the Spark job.

does not have enough IP addresses

Error log: response log of the GetSparkAppLog operation.

Cause: The vSwitch that is configured for a Spark job does not have idle IP addresses to assign.

Solution: Check the vSwitch configuration of the Spark job. Reconfigure a vSwitch for the Spark job to ensure that the vSwitch has sufficient idle IP addresses.

exceeded quota

Error log: response log of the GetSparkAppLog operation.

Cause: The amount of resources required to run a Spark job exceeds the amount of remaining available resources of the job resource group.

Solution: Change the maximum amount of reserved computing resources for the job resource group or submit the Spark job after other jobs are complete.

Error fetching partition paths with prefix

Error log: Spark driver log.

Cause: When you use Spark to access Hudi data, the specified Hudi table partition does not exist.

Solution: Check whether the specified Hudi table partition exists.

The bucket you access does not belong to you

Error log: Spark driver log.

Cause: The Resource Access Management (RAM) role that is specified by the spark.adb.roleArn parameter does not have permissions to access OSS.

Solution: Grant the required permissions to the RAM role. For more information, see the "Perform authorization within an Alibaba Cloud account" section of the Perform authorization topic.
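As an illustration, a RAM policy that grants read access to a single OSS bucket might look like the following. The bucket name is a placeholder, and the exact action list should follow the linked authorization topic:

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["oss:GetObject", "oss:ListObjects"],
      "Resource": [
        "acs:oss:*:*:example-bucket",
        "acs:oss:*:*:example-bucket/*"
      ]
    }
  ]
}
```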

EsHadoopNoNodesLeftException: Connection error .* all nodes failed

Error log: Spark driver log.

Cause: Alibaba Cloud Elasticsearch does not allow Spark to access data by using a direct connection to DataNodes. As a result, data access based on the open source community configuration fails.

Solution: Access Alibaba Cloud Elasticsearch data by referring to Access Alibaba Cloud Elasticsearch.

MetaException .* User have none of the privileges

Error log: Spark driver log.

Cause: The Alibaba Cloud account or the RAM user that you use to run a Spark job does not have permissions to access metadata.

Solution: Associate the database account that you use to run the Spark job with a RAM user and grant the required database and table permissions to the RAM user. For more information, see Associate or disassociate a database account with or from a RAM user and Create a database account.