This topic describes how to use the serverless Spark engine of Data Lake Analytics (DLA) to access Hadoop clusters on which Kerberos authentication is not enabled.
Prerequisites
- DLA is activated and a Spark virtual cluster (VC) is created in the DLA console. For more information about how to activate DLA, see Activate Data Lake Analytics.
- Object Storage Service (OSS) is activated. For more information, see Sign up for OSS.
- The vSwitch ID and security group ID that are required for creating a Spark compute node are obtained. You can use the IDs of an existing vSwitch and security group, or create a vSwitch and a security group and use their IDs. The vSwitch and security group must meet the following conditions:
- The vSwitch must be in the same virtual private cloud (VPC) as the Hadoop cluster.
- The security group that you select must be in the same VPC as the Hadoop cluster. To find such a security group, log on to the Elastic Compute Service (ECS) console, choose Network & Security > Security Groups in the left-side navigation pane, and enter the VPC ID in the search box on the Security Groups page to filter the security groups that belong to the VPC. Then, select the ID of one of these security groups.
- If the Hadoop cluster is configured with a whitelist for access control, you must add the CIDR block of the vSwitch to the whitelist.
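After you obtain the vSwitch ID and security group ID, you pass them to the Spark job so that the serverless Spark engine can attach an elastic network interface in your VPC and reach the Hadoop cluster. The following is a minimal sketch of a DLA Spark job configuration; the `spark.dla.eni.*` parameter names reflect DLA's serverless Spark network settings, and the bucket path, class name, and resource IDs are placeholders that you must replace with your own values:

```json
{
  "name": "hadoop-access-test",
  "file": "oss://your-bucket/your-spark-job.jar",
  "className": "com.example.YourSparkJob",
  "conf": {
    "spark.dla.eni.enable": "true",
    "spark.dla.eni.vswitch.id": "vsw-xxxxxxxxxxxxxxxx",
    "spark.dla.eni.security.group.id": "sg-xxxxxxxxxxxxxxxx",
    "spark.driver.resourceSpec": "medium",
    "spark.executor.resourceSpec": "medium",
    "spark.executor.instances": "1"
  }
}
```

Verify the exact parameter names and values against the configuration reference in the DLA console before you submit the job.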
Notice If you want to access a Hadoop cluster in X-Pack Spark of ApsaraDB for HBase, join the DingTalk group (ID: dgw-jk1ia6xzp) to activate Hadoop Distributed File System (HDFS) first. HDFS is disabled by default because the Hadoop cluster in X-Pack Spark may become unstable or even be attacked after HDFS is enabled.