How to deploy Apache Spark on an instance - Resource Orchestration Service

This topic describes how to deploy Apache Spark on an instance by creating a stack in the Resource Orchestration Service (ROS) console.

Background information

Apache Spark is a general-purpose computing engine designed for large-scale data processing. Apache Spark is written in Scala and uses Resilient Distributed Datasets (RDDs) for in-memory computing. It supports interactive queries and is well suited to iterative workloads, because intermediate data can stay in memory between iterations.

You can use the Installs Spark on an ECS instance (existing VPC) sample template to create an Elastic Compute Service (ECS) instance based on existing resources and associate an elastic IP address (EIP) with the instance. The existing resources include a virtual private cloud (VPC), vSwitch, and security group. The following software versions are used in the sample template:

JDK 1.8.0: the Java Development Kit (JDK)
Hadoop 2.7.7: the framework for distributed storage and processing
Scala 2.12.1: the programming language
Apache Spark 2.1.0: the computing engine

After a stack is created by using the sample template, you can obtain the value of SparkWebSiteURL and log on to the Apache Spark management console. To access the URL specified by SparkWebSiteURL over the Internet, you must add an inbound rule to the security group that allows traffic on ports 8088 and 8080. For more information, see Add a security group rule.
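To confirm that the inbound rule for ports 8088 and 8080 took effect, you can test the ports from your own machine. The following is a minimal sketch that uses only the Python standard library. The function names are illustrative, and the host you pass in is an assumption: use the public address of your own instance.

```python
import socket

# Ports named in this topic. The host is supplied by the caller.
SPARK_PORTS = (8088, 8080)


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_spark_ports(host, ports=SPARK_PORTS, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, check_spark_ports("<public IP address of your instance>") returns a dictionary keyed by port. A False value only means that the connection attempt failed. The cause can be a missing security group rule, but also a service that is not yet running.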

Step 1: Create a stack

Log on to the ROS console.
In the left-side navigation pane, choose Templates > Public Templates.
Search for the Installs Spark on an ECS instance (existing VPC) sample template.
Click Create Stack.

In the Configure Parameters step, configure the Stack Name parameter and the following parameters.

Parameter	Description	Example
Existing VPC ID	The ID of the VPC. For more information about how to create and query a VPC, see Create and manage a VPC.	vpc-bp1m6fww66xbntjyc****
VSwitch Zone ID	The zone ID of the vSwitch that resides in the VPC.	Hangzhou Zone K
VSwitch ID	The ID of the vSwitch that resides in the VPC. For more information about how to create and query a vSwitch, see Create and manage a vSwitch.	vsw-bp183p93qs667muql****
Business Security Group ID	The ID of the ECS security group. For more information about how to query the ID of a security group, see Search for security groups.	sg-bp15ed6xe1yxeycg7o****
Instance Type	The instance type of the ECS instance. Select a valid instance type. For more information, see Overview of instance families.	ecs.c5.large
Image ID	The image ID of the ECS instance. By default, a CentOS 7 image is used. For more information, see Overview.	centos_7
Instance Password	The password of the ECS instance.	Test_12****
Public IP Bandwidth	The bandwidth of the public IP address. Valid values: 1 to 100. Unit: Mbit/s.	5
Disk Type	The disk category. Valid values: cloud_efficiency: ultra disk; cloud_ssd: standard SSD; cloud_essd: Enterprise SSD (ESSD); cloud: basic disk; ephemeral_ssd: local SSD. For more information, see Disks.	cloud_efficiency
System Disk Space	The system disk size of the ECS instance. Valid values: 40 to 500. Unit: GB.	40
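The value ranges in the preceding table can be checked before you submit the form. The following sketch validates three of the parameters against the documented ranges. The dictionary keys (PublicIpBandwidth, DiskType, SystemDiskSpace) are hypothetical names chosen for this example and are not necessarily the parameter keys that the template uses.

```python
# Allowed disk categories, copied from the parameter table.
DISK_CATEGORIES = {
    "cloud_efficiency",
    "cloud_ssd",
    "cloud_essd",
    "cloud",
    "ephemeral_ssd",
}


def validate_parameters(params):
    """Return a list of problems found in a dict of stack parameters.

    The keys used here are hypothetical names for this sketch.
    An empty list means that all three checks passed.
    """
    problems = []

    bandwidth = params.get("PublicIpBandwidth")
    if not isinstance(bandwidth, int) or not 1 <= bandwidth <= 100:
        problems.append("PublicIpBandwidth must be an integer from 1 to 100 (Mbit/s)")

    if params.get("DiskType") not in DISK_CATEGORIES:
        problems.append("DiskType must be one of: " + ", ".join(sorted(DISK_CATEGORIES)))

    disk_size = params.get("SystemDiskSpace")
    if not isinstance(disk_size, int) or not 40 <= disk_size <= 500:
        problems.append("SystemDiskSpace must be an integer from 40 to 500 (GB)")

    return problems
```

With the example values from the table (bandwidth 5, disk type cloud_efficiency, disk size 40), the function returns an empty list.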

Click Next: Check and Confirm. Then, click Create.
On the Stack Information tab, view the stack status. Wait until the stack is created. Then, click the Outputs tab to obtain the value of SparkWebSiteURL.
Access the URL specified by SparkWebSiteURL and log on to the Apache Spark management console.
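If you retrieve the stack outputs programmatically instead of from the Outputs tab, the lookup can be sketched as follows. The sketch assumes that the outputs arrive as a list of dictionaries with OutputKey and OutputValue fields. That shape, and the sample address in the usage note, are assumptions for illustration and may not match what your tooling returns.

```python
from urllib.parse import urlparse


def find_output(outputs, key):
    """Return the value of the first output whose OutputKey equals key, or None."""
    for item in outputs:
        if item.get("OutputKey") == key:
            return item.get("OutputValue")
    return None


def host_and_port(url):
    """Split a URL into (hostname, port). The port is None if the URL omits it."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.port
```

For example, given the made-up output list [{"OutputKey": "SparkWebSiteURL", "OutputValue": "http://203.0.113.10:8080"}], find_output(outputs, "SparkWebSiteURL") returns the URL string, and host_and_port splits it into the host and the port to open in the security group.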

Step 2: View resources

In the left-side navigation pane, choose Deployment > Stacks.
On the Stacks page, click the ID of the desired stack.

Click the Resources tab to view information about the resource in the stack.

The following table describes the resource in this example.

Resource	Quantity	Resource description	Specification description
ALIYUN::ECS::Instance	1	Creates an ECS instance on which Apache Spark is deployed.	An ECS instance that has the following specifications is created: instance type: ecs.c5.large; disk category: ultra disk; system disk size: 40 GB; public IP address: allocated.

Note

For pricing details, go to the console of the relevant service or see the pricing documentation of each resource.