When you create an Elastic High Performance Computing (E-HPC) cluster, you must configure its hardware, software, and basic settings. This topic describes how to create a cluster by using the wizard in the E-HPC console.
Prerequisites
A service-linked role for E-HPC is created. The first time you log on to the E-HPC console, you are prompted to create a service-linked role for E-HPC.
A virtual private cloud (VPC) and a vSwitch are created. For more information, see Create and manage a VPC and Create and manage a vSwitch.
File Storage NAS (NAS) is activated. A NAS file system and a mount target are created. For more information, see Create a file system and Manage mount targets.
Background information
A cluster provides computing resources and storage resources. You can submit jobs, debug jobs, store results, and view results in the cluster. Before you create and use an E-HPC cluster, take note of the following information:
You can create up to three clusters in a region. To create more clusters, submit a ticket.
You are charged E-HPC service fees and other resource fees when you create a cluster. For more information, see Billable items.
We recommend that you manage cluster nodes in the E-HPC console rather than the Elastic Compute Service (ECS) console.
Step 1: Configure hardware settings
When you create a cluster, you must configure its hardware settings, including the region, deployment mode, number of nodes, network type, and storage. These settings determine the performance of the cluster.
You can configure the hardware settings based on your business requirements.
Log on to the E-HPC console.
In the left part of the top navigation bar, select a region.
In the left-side navigation pane, click Cluster.
On the Cluster page, click Create Cluster.
In the Hardware Configurations step, configure the hardware settings. The following table describes the parameters that you can configure.
Parameter
Description
Availability Zone
The zone to which the cluster belongs.
Note: To ensure efficient communication between E-HPC nodes, make sure that all nodes reside in the same region and zone. For more information, see Regions and zones.
Pricing Model
The billing method of nodes in the cluster. The billing method does not apply to elastic IP addresses and NAS file systems.
Subscription: You can purchase or renew a node by week, month, or year.
Pay-As-You-Go: You are charged for nodes on an hourly basis.
Preemptible Instance: Only compute nodes support preemptible instances. Management nodes and logon nodes support only the pay-as-you-go billing method.
For more information, see ECS billing method overview.
Deploy Mode
The deployment mode of the cluster. Valid values:
Standard: The logon node, management nodes, and compute nodes are deployed separately.
Tiny: The logon node and management nodes are deployed on the same instance. Compute nodes are deployed separately.
Important: If you want to use the Open Grid Scheduler (SGE), you must deploy the cluster in Tiny mode.
Node type and quantity
Specify the instance type and the number of nodes based on the deployment mode.
Specify instance types based on your business requirements. If you want to use the cluster to perform molecular dynamics computing, you can select the GPU type to accelerate analysis. For more information, see Specifications and Best practices for instance type selection.
Note: To create a cluster that is equipped with YiTian processors, select an instance type that is equipped with YiTian processors. For example, you can select ecs.g8m.large. The g8m instance family is in invitational preview. You can go to the g8m Instance Free Trial Application Form page to apply for a free trial.
We recommend that you specify the instance specifications of management nodes based on the number of compute nodes.
If the number of compute nodes in the cluster is less than or equal to 100, we recommend that you select 16 or more vCPUs and 64 GiB or more of memory.
If the number of compute nodes in the cluster is more than 100 and less than or equal to 500, we recommend that you select 32 or more vCPUs and 128 GiB or more of memory.
If the number of compute nodes in the cluster is more than 500, we recommend that you select 64 or more vCPUs and 256 GiB or more of memory.
A logon node is configured as the development environment. A logon node provides the required resources and a testing environment to cluster users for software development and debugging. We recommend that you configure a logon node by using a CPU-to-memory ratio that is higher than or equal to the CPU-to-memory ratio of compute nodes.
System Disk
The cloud disk type and capacity of the system disks of all nodes. Valid values for the capacity: 40 to 2000. Unit: GB.
Note: To configure a system disk with a capacity of more than 500 GB, submit a ticket.
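The sizing guidance for management nodes and the system disk limits described above can be written down as a small pre-flight check. The following Python sketch is illustrative only. The function names `recommended_management_spec` and `check_system_disk` are hypothetical and are not part of any E-HPC API, and the sketch assumes that the middle tier covers clusters with 101 to 500 compute nodes.

```python
from typing import Tuple


def recommended_management_spec(compute_nodes: int) -> Tuple[int, int]:
    """Return the recommended minimum (vCPUs, memory in GiB) for management nodes."""
    if compute_nodes < 0:
        raise ValueError("compute_nodes must be non-negative")
    if compute_nodes <= 100:
        return 16, 64
    if compute_nodes <= 500:
        return 32, 128
    return 64, 256


def check_system_disk(size_gb: int) -> str:
    """Classify a system disk capacity against the limits stated in this topic."""
    if not 40 <= size_gb <= 2000:
        return "invalid"          # outside the 40 to 2000 GB range
    if size_gb > 500:
        return "ticket required"  # more than 500 GB requires a ticket
    return "ok"
```

For example, `recommended_management_spec(300)` yields the 32 vCPU and 128 GiB tier, and `check_system_disk(800)` reports that a ticket is required.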
Expand the Advanced Configurations section. In the Advanced Configurations section, configure the network and storage settings.
Parameter
Description
Authorized Instance Configurations
Enabled
Bind a RAM role to a node. This way, you can access Alibaba Cloud services on the node.
Important: By default, the feature is disabled. To enable the feature, submit a ticket.
After the ticket is approved, perform the following operations based on your user type:
Alibaba Cloud account: Click Switch to RAM for authorization to authorize the current user to use the default RAM role.
RAM user: Log on to the RAM console by using an Alibaba Cloud account and select one of the following methods to grant permissions to the RAM user.
Add the following custom policy and attach the policy to the RAM user. For more information, see Create custom policies and Grant permissions to a RAM user.
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ram:PassRole",
        "ram:ListRoles"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ecs:AttachInstanceRamRole",
      "Resource": "*"
    }
  ]
}
Grant the AliyunRAMFullAccess permission to the RAM user.
The AliyunRAMFullAccess permission is used to manage RAM users and permissions. This permission grants more privileges than the custom policy. For more information, see Create a RAM user and authorize the RAM user to access Log Service.
Role Name
The RAM role that you want to bind to the node. We recommend that you select the default role AliyunECSInstanceForEHPCRole.
Node Type
The type of the node to which you want to bind the RAM role. Valid values:
Scheduling Node
Domain Account Node
Logon Node
Compute Node
Note: If you select Compute Node, compute nodes that are added during scale-out activities are automatically bound to the specified RAM role.
Resource Group
Resource Group
The resource group to which the cluster nodes belong. You can use the resource group to manage multiple cluster nodes that belong to your account in a centralized manner.
Networking
EIP
An elastic IP address (EIP) is a public IP address that you can separately purchase and own. If you want to access the cluster from a static IP address, you can purchase and bind an EIP to the logon node of the cluster.
Use: An EIP is automatically created and bound to the logon node. You can access the cluster over the Internet.
Do Not Use: You can access the cluster only over a VPC.
Note: You are charged for using EIP resources. For more information, see Billing overview.
VPC and vSwitch
The VPC in which the cluster resides. Different VPCs are logically isolated from each other. You can create and manage E-HPC clusters in a VPC.
By default, the first VPC and vSwitch in the VPC and vSwitch drop-down lists are selected. Make sure that the number of available IP addresses is greater than the number of cluster nodes.
You can click Create VPC and Create vSwitch (for subnet) to create a VPC and a vSwitch. For more information, see Create and manage a VPC and Create and manage a vSwitch.
Create Security Group
You can configure security group rules to manage the inbound and outbound traffic of nodes in the security group.
If you turn on the switch, you must enter a new security group name in the Security Group Name field.
If you turn off the switch, you must select an existing security group from the Select Security Group drop-down list.
Storage
Configure by Directory
If you turn off Configure by Directory, only one file system is configured for the cluster.
If you turn on Configure by Directory, file systems are mounted to the directories of all nodes. This improves the shared storage capacity of the cluster.
Type
The type of the file system. Valid values:
General-purpose NAS
Extreme NAS
File System ID and Mount Point
By default, the first file system and mount point in the File System ID and Mount Point drop-down lists are selected. Make sure that the file system has sufficient mount points.
You can click Create a file system and Create mount point to create a file system and a mount point.
Mount Configurations
If you mount a General-purpose NAS file system, you can select a mount protocol. Valid values: Mount over NFSv3 and Mount over NFSv4.
Remote Directory
The remote directory to which the file system is mounted.
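The VPC and vSwitch setting above requires that the number of available IP addresses is greater than the number of cluster nodes. The following Python sketch shows one way to estimate this from a vSwitch CIDR block before you create the cluster. It is a rough check under stated assumptions: the function name and the `addresses_in_use` parameter are hypothetical, and the number of reserved addresses per vSwitch is an assumption that you should verify for your environment.

```python
import ipaddress

# Assumption: four addresses in each vSwitch CIDR block are reserved by the
# platform and cannot be assigned to nodes. Adjust if your environment differs.
RESERVED_ADDRESSES = 4


def has_enough_addresses(vswitch_cidr: str, addresses_in_use: int, cluster_nodes: int) -> bool:
    """Return True if the vSwitch has more available addresses than cluster nodes."""
    network = ipaddress.ip_network(vswitch_cidr, strict=True)
    available = network.num_addresses - RESERVED_ADDRESSES - addresses_in_use
    # The requirement is "greater than", so an exact match is not sufficient.
    return available > cluster_nodes
```

For example, under these assumptions a `/24` vSwitch with no addresses in use leaves 252 available addresses, which is enough for a cluster of up to 251 nodes.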
Step 2: Configure software settings
Software settings include the image and scheduler that are installed on the nodes and the domain account service that manages the cluster and cluster users.
After you configure the hardware settings, click Next.
In the Software Configurations step, configure the software settings. The following table describes the parameters that you can configure.
Parameter
Description
Image Type and Image
Select an image type based on your business requirements. Valid values:
Public Image
Custom Image
Shared Images
Alibaba Cloud Marketplace Image
Community Image
If you set Image Type to Custom Image, take note of the following limits:
E-HPC supports CentOS images and custom images that are created based on Alibaba Cloud images. When you import an image, make sure that Check After Import is selected. Otherwise, the image cannot be identified in the E-HPC console.
You cannot use an existing image that was generated for another cluster. Otherwise, compute nodes may not run as expected after the current cluster is created.
You cannot modify the yum repository configurations of the operating system in a custom image. Otherwise, the cluster cannot be created or scaled out.
The mount directory of the custom image cannot be the /home directory or the /opt directory.
After you select an image type, you can select the image that you want to use. Different images apply to different operating systems. The system deploys cluster nodes based on the image that you select.
Important: The system automatically displays available images based on the region that you select, the available image resources, and the images that are supported by the node instance type.
Scheduler
Schedulers help you manage jobs, and are deployed on E-HPC clusters.
E-HPC supports multiple schedulers. However, different schedulers apply to different image types. The E-HPC console displays the schedulers that are supported by the specified image type.
Domain Service
The domain account service that is used to manage the cluster and cluster users. Valid values: nis and ldap.
VNC
If you turn on VNC, the system automatically enables the Virtual Network Computing (VNC) service. You can access the E-HPC console on another computer by using VNC.
Configure the queue and post-installation script settings.
Parameter
Description
Queue Config
Create New Queue
E-HPC allows you to categorize compute nodes that run different jobs or perform different tasks by adding the nodes to different queues. Jobs are run in a sequence that is determined by the specified queues and scheduler.
Default Queue: The compute nodes of the cluster are automatically added to the default queue of the specified scheduler. For example, the default queue of PBS is workq, and the default queue of Slurm is comp.
New Queue: You must enter a queue name in the Queue Name field. The queue is automatically created, and the specified compute nodes are added to the queue.
Post-Install Script
Script URL
The URL that is used to download the script after the cluster is created.
Note: You can download the script over HTTP or HTTPS. We recommend that you save the script in a public Object Storage Service (OSS) bucket.
Arguments
The runtime parameters of the script. For more information, see Configure an installation script.
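The queue and post-installation script settings above can be checked before you submit the wizard. The following Python sketch is a minimal illustration, not an E-HPC API. The function names are hypothetical, the default queue names are the two examples given in this topic, and the URL check only confirms that the script is addressed over HTTP or HTTPS.

```python
from typing import Optional
from urllib.parse import urlparse

# Default queue names taken from the examples in this topic.
# Other schedulers are not listed here and raise an error.
DEFAULT_QUEUES = {"pbs": "workq", "slurm": "comp"}


def resolve_queue(scheduler: str, new_queue_name: Optional[str] = None) -> str:
    """Return the new queue name if one is given, otherwise the scheduler default."""
    if new_queue_name:
        return new_queue_name
    key = scheduler.lower()
    if key not in DEFAULT_QUEUES:
        raise ValueError(f"no default queue known for scheduler {scheduler!r}")
    return DEFAULT_QUEUES[key]


def is_valid_script_url(url: str) -> bool:
    """Accept only HTTP or HTTPS URLs that include a host name."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

For example, `resolve_queue("slurm")` falls back to the default queue, and `resolve_queue("slurm", "gpu")` uses the hypothetical new queue name `gpu`.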
Step 3: Configure basic settings
After you configure the software settings, click Next.
In the Basic Configurations step, configure the basic settings. The following table describes the parameters that you can configure.
Parameter
Description
Cluster Name
The name of the cluster. The cluster name is displayed on the Cluster page.
Logon Password and Repeat Password
The password of the cluster. The password is required when you use SSH to remotely access the logon node of the cluster. The username is root.
In the Configuration List section, check the parameters that you configured. Read and select Alibaba Cloud International Website Product Terms of Service, and then click OK.
Check the results
After you create the cluster, you can check the status of the cluster on the Cluster page. If the cluster and all cluster nodes are in the Running state, the cluster is created.
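The success condition above can be expressed as a simple predicate. The following Python sketch assumes that you already have the state of the cluster and of each node as strings, for example copied from the Cluster page. The function name is hypothetical, and any state name other than Running in the usage note is a placeholder rather than a documented value.

```python
from typing import Iterable


def cluster_is_created(cluster_state: str, node_states: Iterable[str]) -> bool:
    """Return True only if the cluster and every node report the Running state."""
    return cluster_state == "Running" and all(state == "Running" for state in node_states)
```

For example, `cluster_is_created("Running", ["Running", "Running"])` is true, whereas a single node in any other state makes the check fail.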