This topic describes how to deploy Apache Spark on a single Elastic Compute Service (ECS) instance by creating a stack in the Resource Orchestration Service (ROS) console.
Background information
Apache Spark is a general-purpose computing engine designed for large-scale data processing. Spark is written in Scala and uses resilient distributed datasets (RDDs) for in-memory computing. It supports interactive queries and is well suited to iterative algorithms, because intermediate data can be kept in memory between iterations.
The sample template named "Installs Spark on an ECS instance (existing VPC)" in ROS helps you create an ECS instance based on existing resources, such as a virtual private cloud (VPC), vSwitch, and security group, and associate an elastic IP address (EIP) with the instance. The following software versions are used in the sample template:
- JDK 1.8.0: the Java Development Kit (JDK).
- Hadoop 2.7.7: the framework for distributed systems.
- Scala 2.12.1: the programming language.
- Apache Spark 2.1.0: the computing engine.
After a stack is created by using the sample template, you can obtain the URL of the Spark web interface and use the URL to log on to the Spark management console. To access the Spark web interface over the Internet, add inbound rules to the security group that allow traffic on ports 8088 and 8080. For more information, see Add a security group rule.
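After you add the security group rules, you may want to confirm that both ports are reachable from your own machine. The following is a minimal sketch that uses only the Python standard library. The helper names `build_web_urls` and `port_open` are hypothetical and are not part of ROS or Spark, and the mapping of ports to services in the comments is an assumption based on common defaults, not something the sample template confirms.

```python
import socket

# Ports named in the text above. Which service listens on which port is an
# assumption based on common defaults: 8080 is typically the Spark master
# web UI and 8088 is typically the Hadoop YARN ResourceManager web UI.
WEB_PORTS = (8080, 8088)


def build_web_urls(host, ports=WEB_PORTS):
    """Return a {port: url} mapping for each web port on the given host."""
    return {port: f"http://{host}:{port}" for port in ports}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

To use the sketch, pass the EIP of your instance to `build_web_urls` and call `port_open` for each port. A result of `False` usually means that either the inbound rule is missing or the service is not listening on that port. The check only tests TCP connectivity and does not verify that the page itself loads.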