Alibaba Cloud disks use triplicate storage technology, built on a distributed file system, to provide stable, efficient, and reliable access to data in ECS instances, with a data reliability of 99.9999999%.
Overview
When you perform read and write operations on disks, the operations are translated into corresponding operations on files stored in the Alibaba Cloud data storage system. Alibaba Cloud uses a flat design in which a linear address space is divided into chunks. Each chunk is replicated into three copies that are stored on different data nodes of the storage cluster to ensure data reliability.
All user-level operations on disk data (including adding, modifying, and deleting data) are synchronized across the three chunk copies at the underlying layer. This mechanism ensures the reliability and consistency of your data.
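To illustrate how a linear address space can be divided into chunks, the following minimal sketch maps a byte range to the chunks it touches. The chunk size and the function name `chunks_for_range` are assumptions made for illustration; the actual chunk size used by the storage system is not stated here.

```python
from __future__ import annotations

# Hypothetical chunk size; the real value is not given in this document.
CHUNK_SIZE = 64 * 1024 * 1024


def chunks_for_range(offset: int, length: int,
                     chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int, int]]:
    """Split a byte range of the linear address space into per-chunk pieces.

    Returns a list of (chunk_index, offset_within_chunk, piece_length) tuples.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // chunk_size          # which chunk this byte falls in
        within = offset % chunk_size          # position inside that chunk
        piece = min(chunk_size - within, end - offset)
        pieces.append((index, within, piece))
        offset += piece
    return pieces
```

A request that crosses a chunk boundary is split into one piece per chunk, and each piece would then be applied to all three copies of its chunk.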
How triplicate storage works
Triplicate storage involves three key components: the master, the chunk servers, and the client. Chunk servers are the data nodes where chunk copies are stored. Each write operation is executed by the client in the following manner:
- The client receives your write request and determines which chunk corresponds to the write operation.
- The client queries the master to find the chunk servers where the three copies of the chunk are stored.
- The client sends write requests to the chunk servers returned from the master.
- If the write operation succeeds on all three chunk copies, the client returns success. Otherwise, the client returns failure.
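The write steps above can be sketched as a small in-memory model. The class names `ChunkServer`, `Master`, and `Client`, their methods, and the fixed chunk size are all hypothetical and chosen only for illustration; the sketch also assumes that a write fits within a single chunk. It is not the actual implementation.

```python
from dataclasses import dataclass, field

REPLICAS = 3  # each chunk has three copies


@dataclass
class ChunkServer:
    """Hypothetical data node that stores chunk copies in memory."""
    name: str
    healthy: bool = True
    chunks: dict = field(default_factory=dict)

    def write(self, chunk_id: int, data: bytes) -> bool:
        if not self.healthy:
            return False
        self.chunks[chunk_id] = data
        return True


class Master:
    """Hypothetical master that knows where the copies of each chunk live."""

    def __init__(self, locations: dict):
        self._locations = locations  # chunk_id -> list of ChunkServer

    def locate(self, chunk_id: int) -> list:
        return self._locations[chunk_id]


class Client:
    def __init__(self, master: Master, chunk_size: int):
        self.master = master
        self.chunk_size = chunk_size

    def write(self, offset: int, data: bytes) -> bool:
        chunk_id = offset // self.chunk_size       # determine the chunk
        servers = self.master.locate(chunk_id)     # query the master
        results = [s.write(chunk_id, data) for s in servers]  # write to each copy
        # Success only if all three copies were written.
        return len(results) == REPLICAS and all(results)
```

In this sketch a failed write can leave some copies updated and others not. How the real system handles such partial writes is not described here, so the sketch does not attempt it.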
The master ensures that the copies of each chunk are distributed to different chunk servers across different racks. This prevents data unavailability caused by the failure of a single chunk server or rack. The master's distribution strategy takes many factors of the storage system into account, such as chunk server disk usage, chunk server distribution across racks, power distribution conditions, and node workloads.
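As a rough illustration of rack-aware placement, the sketch below picks three chunk servers on three different racks and prefers servers with lower disk usage. The `ServerInfo` type and `place_replicas` function are hypothetical, and the sketch considers only two of the factors listed above (rack and disk usage); the master's actual strategy is not specified in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerInfo:
    """Hypothetical view of a chunk server as seen by the master."""
    name: str
    rack: str
    disk_usage: float  # fraction of disk in use, 0.0 to 1.0


def place_replicas(servers, replicas=3):
    """Choose `replicas` servers, each on a different rack.

    Greedy strategy: visit servers from least to most disk usage and
    take the first server seen on each rack.
    """
    chosen = []
    used_racks = set()
    for server in sorted(servers, key=lambda s: (s.disk_usage, s.name)):
        if server.rack in used_racks:
            continue  # never place two copies on the same rack
        chosen.append(server)
        used_racks.add(server.rack)
        if len(chosen) == replicas:
            return chosen
    raise RuntimeError("not enough distinct racks for the requested replicas")
```

Because no two copies share a rack, losing any single chunk server or any single rack still leaves at least two copies reachable.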
Data protection
When a data node is damaged or disk faults occur on a data node, the number of valid copies of some chunks within the cluster may fall below three. In such cases, the master initiates synchronization tasks that replicate data between chunk servers until each chunk again has three valid copies in the cluster.
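The recovery behavior can be sketched as a planning step that finds under-replicated chunks and schedules copy tasks. The function `plan_repairs` and its data layout are assumptions made for illustration. For brevity the sketch ignores rack placement and skips chunks that have no valid copy left, neither of which reflects a statement about the real system.

```python
def plan_repairs(chunk_locations, healthy_servers, replicas=3):
    """Plan copy tasks that bring each chunk back to `replicas` valid copies.

    chunk_locations: dict mapping chunk_id -> set of server names holding a copy
    healthy_servers: set of server names that are currently healthy
    Returns a list of (chunk_id, source_server, target_server) tuples.
    """
    tasks = []
    for chunk_id, holders in sorted(chunk_locations.items()):
        valid = sorted(holders & healthy_servers)   # copies that are still usable
        missing = replicas - len(valid)
        if missing <= 0 or not valid:
            continue  # fully replicated, or nothing left to copy from
        # Healthy servers that do not yet hold this chunk.
        candidates = sorted(healthy_servers - holders)
        for target in candidates[:missing]:
            tasks.append((chunk_id, valid[0], target))
    return tasks
```

For example, if server `a` fails while holding copies of two chunks, the plan contains one copy task per affected chunk, each sourced from a surviving copy.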