Most big data instance families offer a CPU-to-memory ratio of 1:4. They are suitable for big data computing and storage scenarios in which services such as Hadoop MapReduce, Hadoop Distributed File System (HDFS), Hive, and HBase are used, and for search and log data processing scenarios in which solutions such as Elasticsearch and Kafka are used.
Background information
Before you read further in this topic, you must be familiar with the following information:
Classification and naming of instance types. Familiarize yourself with the instance family categories, naming conventions of instance types, and differences between instance families. For more information, see Classification and naming of instance types.
Instance type metrics. For information about the metrics of instance types, see Instance type metrics. You can also call the DescribeInstanceTypeFamilies and DescribeInstanceTypes operations to query the instance families and the details of all instance types provided by ECS.
Instructions for selecting instance types based on your business scenarios. For more information, see Instance type selection.
After you determine an instance type for your use case, you may need to learn about the following information:
Regions in which the instance type is available for purchase. Instance types that are available for purchase vary based on the region. You can go to the Instance Types Available for Each Region page to view the instance types available for purchase in each region. Alternatively, you can call the DescribeRegions and DescribeZones operations to query the available regions and the zones in a specific region.
Estimated instance costs. You can calculate the price of instances that use different billing methods in the Price Calculator. You can also call the DescribePrice operation to query information about the most recent prices of ECS resources.
Instructions for purchasing an instance. You can go to the ECS instance buy page to place a purchase order for instances.
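The query operations named in the lists above can also be issued from a terminal. The following is a minimal sketch, assuming that the Alibaba Cloud CLI (aliyun) is installed and configured. The DRY_RUN switch, the function names, and the cn-hangzhou region ID are illustrative choices and are not taken from this topic. Verify the parameter names against the ECS API reference before you rely on them.

```shell
#!/bin/sh
# Sketch only: prints the CLI calls by default and executes them when DRY_RUN=0.
# DRY_RUN, run_or_print, list_family_types, and list_zones are hypothetical names.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command (default), 0 = execute it

run_or_print() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Query the instance types of one instance family, for example ecs.d3s.
list_family_types() {
    run_or_print aliyun ecs DescribeInstanceTypes --InstanceTypeFamily "$1"
}

# Query the zones of one region (the region ID is an example value).
list_zones() {
    run_or_print aliyun ecs DescribeZones --RegionId "$1"
}

list_family_types ecs.d3s
list_zones cn-hangzhou
```

With the default DRY_RUN=1, the script only prints the two commands, so you can review them before you run them against your account.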
You may also want to review the following information:
Retired instance families. If you cannot find an instance type in this topic, the instance type may be in a retired instance family. For information about retired instance families, see Retired instance families.
Supported instance type changes. Before you change the instance type of an instance, check whether the instance type can be changed and identify compatible instance types. For more information, see Instance types and families that support instance type changes.
Recommended instance families | Not recommended (If these instance families are sold out, you can use the recommended ones.) |
Overview
The durability of data stored on a local disk is determined by the reliability of the associated physical machine. The risk of a single point of failure exists. Data stored on local disks may be lost when a hardware failure occurs on their associated physical machine. We recommend that you store only temporary data on local disks. For more information, see Local disks.
Big data instance families are designed to provide cloud computing and big data storage to support the needs of big data-oriented enterprises. These instance families are suitable for scenarios that require offline computing and big data storage, such as Hadoop distributed computing, extensive log processing, and large-scale data warehousing. Big data instance families are ideal for businesses that use distributed networks and have high requirements for storage, capacity, and internal bandwidth.
These instance families are suitable for customers in industries such as Internet and finance that need to compute, store, and analyze big data. Big data instance families use local storage to ensure large amounts of storage space and high storage performance.
Big data instances have the following benefits:
Enterprise-level computing power ensures efficient and stable data processing.
Network performance is enhanced with higher maximum internal bandwidth per instance and higher maximum packet forwarding rates to satisfy data transfer demands such as shuffling in Hadoop MapReduce at peak times.
When you use big data instances, take note of the following items:
Instances equipped with local SSDs do not support instance configuration changes.
Local disks are bound to specific instance types. The number and capacity of local disks attached to an instance vary based on the instance type. You cannot separately purchase local disks, or detach local disks from instances and then attach the disks to other instances.
You cannot create snapshots for local disks. If you want to create an image from the system disk and data disks of an instance equipped with local SSDs, we recommend that you create an image by combining the snapshots of both the system disk and data disks. In this case, the data disks must be cloud disks.
You cannot create images that consist of system disk snapshots and data disk snapshots based on instances equipped with local SSDs.
You can attach a standard SSD to an instance equipped with local SSDs and extend the capacity of the standard SSD.
Operations on an instance that is equipped with local SSDs may affect the data stored on the local SSDs. For more information, see the Impacts of instance operations on data stored on local disks section of the "Local disks" topic.
Best practices for mounting a file system to a big data instance
The first time you mount a file system such as ext4, the inode table must be initialized. By default, the lazyinit feature is enabled in Linux kernel v2.6.37 and later, which defers inode table initialization until the file system is mounted. Local disks consume a large amount of throughput while they are being initialized, such as 600 MB/s for 30 local disks, and this may affect service stability. In Linux kernel v4.x, the number of objects that can be lazily initialized concurrently is increased to resolve this issue. For more information, see index: kernel/git/stable/linux.git. We recommend that you use the following best practices to initialize the inode table at your earliest opportunity:
Obtain a list of all local serial advanced technology attachment (SATA) HDDs.
Run the following command to initialize each local disk separately.
In this example, an ext4 file system is created on a local disk whose device name is /dev/vdb.
mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdb &
After all local disks are initialized, run the iostat -x 5 command and wait until the I/O activity of every local disk is displayed as 0.
Run the mount command for all local disks in a batch.
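The steps above can be sketched as a small script. This is an illustration under stated assumptions, not a definitive procedure: the DISKS device list, the /mnt/diskN mount points, the DRY_RUN switch, and the function names are hypothetical. Only the mkfs.ext4 options come from this topic. By default the script prints the commands and does not format anything.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (default) commands are printed, not executed.
# DISKS and the /mnt/diskN mount points are hypothetical example values.

DRY_RUN="${DRY_RUN:-1}"
DISKS="${DISKS:-/dev/vdb /dev/vdc}"

format_disk() {
    # Disable lazy initialization so that the inode table and journal are
    # fully written at format time instead of after the first mount.
    if [ "$DRY_RUN" = "1" ]; then
        echo "mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 $1"
    else
        mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 "$1"
    fi
}

mount_disk() {
    # $1: device name, $2: index used to derive the example mount point
    if [ "$DRY_RUN" = "1" ]; then
        echo "mount $1 /mnt/disk$2"
    else
        mkdir -p "/mnt/disk$2" && mount "$1" "/mnt/disk$2"
    fi
}

# Step 1: format every disk as a separate background job.
for dev in $DISKS; do
    format_disk "$dev" &
done
wait   # returns after every background job has exited

# Step 2: before mounting, confirm with `iostat -x 5` that disk I/O is idle.

# Step 3: mount all disks in one pass.
i=1
for dev in $DISKS; do
    mount_disk "$dev" "$i"
    i=$((i + 1))
done
```

Set DRY_RUN=0 and DISKS to the actual local disk device names only after you confirm that the listed devices hold no data that you need, because formatting erases them.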
d3s, storage-intensive big data instance family
Features:
This instance family is equipped with 12-TB, large-capacity, high-throughput local SATA HDDs and can provide a maximum network bandwidth of 64 Gbit/s between instances.
Supported scenarios:
Big data computing and storage business scenarios in which services such as Hadoop MapReduce, HDFS, Hive, and HBase are used
Machine learning scenarios such as Spark in-memory computing and MLlib
Search and log data processing scenarios in which solutions such as Elasticsearch and Kafka are used
This instance family supports online replacement and hot swapping of damaged disks to prevent instance shutdown.
If a local disk fails, you receive a system event. You can handle the system event by initiating the process of repairing the damaged disk. For more information, see O&M scenarios and system events for instances equipped with local disks.
Important: After you initiate the process of repairing the damaged disk, data stored on the damaged disk cannot be restored.
Compute:
Uses 2.7 GHz Intel® Xeon® Scalable (Ice Lake) processors that deliver an all-core turbo frequency of 3.5 GHz to provide consistent computing performance.
Storage:
Is an instance family in which all instances are I/O optimized.
Supports only enhanced SSDs (ESSDs) and ESSD AutoPL disks.
Network:
Supports IPv4 and IPv6. For information about IPv6 communication, see IPv6 communication.
Provides high network performance based on large computing capacity.
d3s instance types
Instance type | vCPUs | Memory size (GiB) | Local storage (GB) | Network baseline/burst bandwidth (Gbit/s) | Packet forwarding rate (pps) | Disk baseline/burst bandwidth (Gbit/s) |
ecs.d3s.2xlarge | 8 | 32 | 4 * 11,918 | 10/burstable up to 15 | 2,000,000 | 3/burstable up to 5 |
ecs.d3s.4xlarge | 16 | 64 | 8 * 11,918 | 25/none | 3,000,000 | 5/none |
ecs.d3s.8xlarge | 32 | 128 | 16 * 11,918 | 40/none | 6,000,000 | 8/none |
ecs.d3s.12xlarge | 48 | 192 | 24 * 11,918 | 60/none | 9,000,000 | 12/none |
ecs.d3s.16xlarge | 64 | 256 | 32 * 11,918 | 80/none | 12,000,000 | 16/none |
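The "Local storage (GB)" column in the preceding table is written as disk count multiplied by per-disk capacity. The following sketch multiplies the values out and, as a rough planning aid, divides the result by a replication factor. The factor of 3 is the HDFS default (dfs.replication) and is an assumption about your cluster, not a property of the instance family. The function names are illustrative, and file system overhead and reserved space are ignored.

```shell
#!/bin/sh
# Sketch only: capacity arithmetic for the d3s table (disk counts and the
# 11,918 GB per-disk size are taken from that table).

# raw_capacity_gb DISKS SIZE_GB -> total raw local storage in GB
raw_capacity_gb() {
    echo $(( $1 * $2 ))
}

# share_after_replication_gb DISKS SIZE_GB REPLICAS -> raw capacity divided by
# the replication factor (integer division, rounds down). REPLICAS is an
# assumption about the cluster configuration.
share_after_replication_gb() {
    echo $(( $1 * $2 / $3 ))
}

while read -r name disks; do
    raw=$(raw_capacity_gb "$disks" 11918)
    share=$(share_after_replication_gb "$disks" 11918 3)
    echo "$name raw=${raw} GB, share at 3 replicas=${share} GB"
done <<EOF
ecs.d3s.2xlarge 4
ecs.d3s.4xlarge 8
ecs.d3s.8xlarge 16
ecs.d3s.12xlarge 24
ecs.d3s.16xlarge 32
EOF
```

For example, ecs.d3s.2xlarge has 4 * 11,918 = 47,672 GB of raw local storage.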
d3c, compute-intensive big data instance family
Features:
This instance family is equipped with high-capacity and high-throughput local disks and can provide a maximum bandwidth of 40 Gbit/s between instances.
Supported scenarios:
Big data computing and storage business scenarios in which services such as Hadoop MapReduce, HDFS, Hive, and HBase are used
Scenarios in which EMR JindoFS and Object Storage Service (OSS) are used in combination to separately store hot and cold data and decouple storage from computing
Machine learning scenarios such as Spark in-memory computing and MLlib
Search and log data processing scenarios in which solutions such as Elasticsearch and Kafka are used
This instance family supports online replacement and hot swapping of damaged disks to prevent instance shutdown.
If a local disk fails, you receive a system event. You can handle the system event by initiating the process of repairing the damaged disk. For more information, see O&M scenarios and system events for instances equipped with local disks.
Important: After you initiate the process of repairing the damaged disk, data stored on the damaged disk cannot be restored.
Compute:
Uses third-generation 2.9 GHz Intel® Xeon® Scalable (Ice Lake) processors that deliver an all-core turbo frequency of 3.5 GHz to provide consistent computing performance.
Storage:
Is an instance family in which all instances are I/O optimized.
Supports only ESSDs and ESSD AutoPL disks.
Network:
Supports IPv4 and IPv6. For information about IPv6 communication, see IPv6 communication.
Provides high network performance based on large computing capacity.
d3c instance types
Instance type | vCPUs | Memory size (GiB) | Local storage (GB) | Network baseline/burst bandwidth (Gbit/s) | Packet forwarding rate (pps) | Disk baseline/burst IOPS | Disk baseline/burst bandwidth (Gbit/s) |
ecs.d3c.3xlarge | 14 | 56.0 | 1 * 13,743 | 8/burstable up to 10 | 1,600,000 | 40,000/none | 3/none |
ecs.d3c.7xlarge | 28 | 112.0 | 2 * 13,743 | 16/burstable up to 25 | 2,500,000 | 50,000/none | 4/none |
ecs.d3c.14xlarge | 56 | 224.0 | 4 * 13,743 | 40/none | 5,000,000 | 100,000/none | 8/none |
This instance family supports only Linux images. When you create an instance of this instance family, select a Linux image.
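The bandwidth columns in the preceding table are expressed in Gbit/s, whereas throughput elsewhere in this topic is quoted in MB/s. The following sketch converts between the two, assuming decimal units (1 Gbit = 1,000 Mbit) and 8 bits per byte. The function name is illustrative, and only whole-number Gbit/s values are handled.

```shell
#!/bin/sh
# Sketch only: convert a whole-number Gbit/s value to MB/s (decimal units).

gbit_to_mbyte() {
    # Gbit/s * 1000 = Mbit/s; Mbit/s / 8 = MB/s
    echo $(( $1 * 1000 / 8 ))
}

# Network baseline bandwidth values of the d3c instance types.
for gbit in 8 16 40; do
    echo "${gbit} Gbit/s = $(gbit_to_mbyte "$gbit") MB/s"
done
```

Under these assumptions, the 40 Gbit/s network baseline of ecs.d3c.14xlarge corresponds to 5,000 MB/s, and its 8 Gbit/s disk baseline bandwidth corresponds to 1,000 MB/s.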
d2c, compute-intensive big data instance family
Features:
This instance family is equipped with high-capacity and high-throughput local SATA HDDs and can provide a maximum bandwidth of 35 Gbit/s between instances.
Supported scenarios:
Big data computing and storage business scenarios in which services such as Hadoop MapReduce, HDFS, Hive, and HBase are used
Scenarios in which EMR JindoFS and OSS are used in combination to separately store hot and cold data and decouple storage from computing
Machine learning scenarios such as Spark in-memory computing and MLlib
Search and log data processing scenarios in which solutions such as Elasticsearch and Kafka are used
This instance family supports online replacement and hot swapping of damaged disks to prevent instance shutdown.
If a local disk fails, you receive a system event. You can handle the system event by initiating the process of repairing the damaged disk. For more information, see O&M scenarios and system events for instances equipped with local disks.
Important: After you initiate the process of repairing the damaged disk, data stored on the damaged disk cannot be restored.
Compute:
Uses 2.5 GHz Intel® Xeon® Platinum 8269CY (Cascade Lake) processors.
Storage:
Is an instance family in which all instances are I/O optimized.
Supports enhanced SSDs (ESSDs), ESSD AutoPL disks, standard SSDs, and ultra disks.
Network:
Supports IPv4 and IPv6. For information about IPv6 communication, see IPv6 communication.
Provides high network performance based on large computing capacity.
d2c instance types
Instance type | vCPUs | Memory size (GiB) | Local storage (GB) | Network baseline bandwidth (Gbit/s) | Packet forwarding rate (pps) |
ecs.d2c.6xlarge | 24 | 88.0 | 3 * 3,972 | 12.0 | 1,600,000 |
ecs.d2c.12xlarge | 48 | 176.0 | 6 * 3,972 | 20.0 | 2,000,000 |
ecs.d2c.24xlarge | 96 | 352.0 | 12 * 3,972 | 35.0 | 4,500,000 |
d2s, storage-intensive big data instance family
Features:
This instance family is equipped with high-capacity and high-throughput local SATA HDDs and can provide a maximum bandwidth of 35 Gbit/s between instances.
Supported scenarios:
Big data computing and storage business scenarios in which services such as Hadoop MapReduce, HDFS, Hive, and HBase are used
Machine learning scenarios such as Spark in-memory computing and MLlib
Search and log data processing scenarios in which solutions such as Elasticsearch and Kafka are used
This instance family supports online replacement and hot swapping of damaged disks to prevent instance shutdown.
If a local disk fails, you receive a system event. You can handle the system event by initiating the process of repairing the damaged disk. For more information, see O&M scenarios and system events for instances equipped with local disks.
Important: After you initiate the process of repairing the damaged disk, data stored on the damaged disk cannot be restored.
Compute:
Uses 2.5 GHz Intel® Xeon® Platinum 8163 (Skylake) processors.
Storage:
Is an instance family in which all instances are I/O optimized.
Supports ESSDs, ESSD AutoPL disks, standard SSDs, and ultra disks.
Network:
Supports IPv4 and IPv6. For information about IPv6 communication, see IPv6 communication.
Provides high network performance based on large computing capacity.
d2s instance types
Instance type | vCPUs | Memory size (GiB) | Local storage (GB) | Network baseline bandwidth (Gbit/s) | Packet forwarding rate (pps) |
ecs.d2s.5xlarge | 20 | 88.0 | 8 * 7,838 | 12.0 | 1,600,000 |
ecs.d2s.10xlarge | 40 | 176.0 | 15 * 7,838 | 20.0 | 2,000,000 |
ecs.d2s.20xlarge | 80 | 352.0 | 30 * 7,838 | 35.0 | 4,500,000 |
d1ne, network-enhanced big data instance family
Features:
This instance family is equipped with high-capacity and high-throughput local SATA HDDs and can provide a maximum bandwidth of 35 Gbit/s between instances.
Supported scenarios:
Scenarios in which services such as Hadoop MapReduce, HDFS, Hive, and HBase are used
Machine learning scenarios such as Spark in-memory computing and MLlib
Search and log data processing scenarios in which solutions such as Elasticsearch are used
Compute:
Offers a CPU-to-memory ratio of 1:4, which is designed for big data scenarios.
Uses 2.5 GHz Intel® Xeon® E5-2682 v4 (Broadwell) processors.
Storage:
Is an instance family in which all instances are I/O optimized.
Supports only standard SSDs and ultra disks.
Network:
Supports IPv4 and IPv6. For information about IPv6 communication, see IPv6 communication.
Provides high network performance based on large computing capacity.
d1ne instance types
Instance type | vCPUs | Memory size (GiB) | Local storage (GB) | Network baseline bandwidth (Gbit/s) | Packet forwarding rate (pps) |
ecs.d1ne.2xlarge | 8 | 32.0 | 4 * 5,905 | 6.0 | 1,000,000 |
ecs.d1ne.4xlarge | 16 | 64.0 | 8 * 5,905 | 12.0 | 1,600,000 |
ecs.d1ne.6xlarge | 24 | 96.0 | 12 * 5,905 | 16.0 | 2,000,000 |
ecs.d1ne-c8d3.8xlarge | 32 | 128.0 | 12 * 5,905 | 20.0 | 2,000,000 |
ecs.d1ne.8xlarge | 32 | 128.0 | 16 * 5,905 | 20.0 | 2,500,000 |
ecs.d1ne-c14d3.14xlarge | 56 | 160.0 | 12 * 5,905 | 35.0 | 4,500,000 |
ecs.d1ne.14xlarge | 56 | 224.0 | 28 * 5,905 | 35.0 | 4,500,000 |
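The 1:4 CPU-to-memory ratio stated for this instance family can be checked against the preceding table. The following sketch flags any row whose memory size is not four times its vCPU count. The function name is illustrative, and the memory sizes are entered as whole numbers.

```shell
#!/bin/sh
# Sketch only: check the vCPU-to-memory ratio of the d1ne instance types.

# is_one_to_four VCPUS MEM_GIB -> exit status 0 if MEM_GIB equals 4 * VCPUS
is_one_to_four() {
    [ $(( $1 * 4 )) -eq "$2" ]
}

while read -r name vcpus mem; do
    if is_one_to_four "$vcpus" "$mem"; then
        echo "$name: 1:4"
    else
        echo "$name: not 1:4 ($vcpus vCPUs, $mem GiB)"
    fi
done <<EOF
ecs.d1ne.2xlarge 8 32
ecs.d1ne.4xlarge 16 64
ecs.d1ne.6xlarge 24 96
ecs.d1ne-c8d3.8xlarge 32 128
ecs.d1ne.8xlarge 32 128
ecs.d1ne-c14d3.14xlarge 56 160
ecs.d1ne.14xlarge 56 224
EOF
```

With the table values, every instance type matches the 1:4 ratio except ecs.d1ne-c14d3.14xlarge, which pairs 56 vCPUs with 160 GiB of memory.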
d1, big data instance family
Features:
This instance family is equipped with high-capacity and high-throughput local SATA HDDs and can provide a maximum bandwidth of 17 Gbit/s between instances.
Supported scenarios:
Scenarios in which services such as Hadoop MapReduce, HDFS, Hive, and HBase are used
Machine learning scenarios such as Spark in-memory computing and MLlib
Scenarios in which customers in industries such as Internet and finance need to compute, store, and analyze big data
Search and log data processing scenarios in which solutions such as Elasticsearch are used
Compute:
Offers a CPU-to-memory ratio of 1:4, which is designed for big data scenarios.
Uses 2.5 GHz Intel® Xeon® E5-2682 v4 (Broadwell) processors.
Storage:
Is an instance family in which all instances are I/O optimized.
Supports standard SSDs and ultra disks.
Network:
Supports IPv4.
Provides high network performance based on large computing capacity.
d1 instance types
Instance type | vCPUs | Memory size (GiB) | Local storage (GB) | Network baseline bandwidth (Gbit/s) | Packet forwarding rate (pps) |
ecs.d1.2xlarge | 8 | 32.0 | 4 * 5,905 | 3.0 | 300,000 |
ecs.d1.3xlarge | 12 | 48.0 | 6 * 5,905 | 4.0 | 400,000 |
ecs.d1.4xlarge | 16 | 64.0 | 8 * 5,905 | 6.0 | 600,000 |
ecs.d1.6xlarge | 24 | 96.0 | 12 * 5,905 | 8.0 | 800,000 |
ecs.d1-c8d3.8xlarge | 32 | 128.0 | 12 * 5,905 | 10.0 | 1,000,000 |
ecs.d1.8xlarge | 32 | 128.0 | 16 * 5,905 | 10.0 | 1,000,000 |
ecs.d1-c14d3.14xlarge | 56 | 160.0 | 12 * 5,905 | 17.0 | 1,800,000 |
ecs.d1.14xlarge | 56 | 224.0 | 28 * 5,905 | 17.0 | 1,800,000 |