Alibaba Cloud Genomics Service (AGS) is an ultra-fast, cost-effective, and high-precision genome sequencing and secondary data analysis service that is developed by Alibaba Cloud. AGS provides containerized workflows for genomics computing based on Container Service for Kubernetes (ACK). AGS provides an out-of-the-box acceleration API. You can call this API without the need to create a cluster. This topic describes AGS and its benefits.
Overview
AGS is commonly used for genome sequencing and secondary data analysis. It requires only 15 minutes to complete a high-precision 30x whole genome sequencing (WGS) process, which includes gene comparison, sequencing, deduplication, and mutation detection. Compared with the traditional solution, AGS offers a 120 times speed improvement and is 2 to 4 times faster than the globally used FPGA/GPU solution.
AGS provides strong support for genetic disease detection and cancer screening by analyzing and identifying the mutation mechanism of personal genome sequences. In the future, it will play an important role in clinical medicine and genetic diagnosis. The whole human genome consists of about three billion base pairs. One 30x WGS sample is about 100 GB in size. AGS offers great benefits in terms of computation speed, precision, cost, usability, and compatibility with upstream sequencers. It is also suitable in scenarios such as SNP/INDEL and copy number variation (CNV) detection in DNAs, and DNA and RNA virus detection.
Benefits
Fast and high precision. In actual tests, the solution completes the sequencing and secondary analysis of eight 30x WGS samples within 15 minutes. The solution can be used to assemble, sequence, deduplicate, and detect mutation on 720 billion base pairs within 15 minutes. The solution achieves a 120 times speed improvement without compromising the precision. If you compare the NA12878 sample and gold-standard Variant Call Format (VCF) files, the SNP precision reaches 99.80% and the precision of secondary analysis is higher than or equivalent to the output of BWA-0.7.17/GATK 4.1.3.
Dataset: 30x NA12878
SNP
RECALL
PRECISION
F1
GATK 4.1
99.86%
99.79%
99.82%
AGS
99.86%
99.80%
99.83%
INDEL
RECALL
PRECISION
F1
GATK 4.1
99.28%
99.70%
99.49%
AGS
99.27%
99.68%
99.47%
Cost-effectiveness. ACK and AGS provide PaaS-based acceleration capabilities and assist BGI Group to perform secondary data analysis on large-scale FASTQ data from genome sequencers based on a hybrid cloud architecture. This solution reduces the costs of secondary data analysis and shortens the delivery cycle by 95%.
Wide application scope. AGS provides the following features to enable a wide application scope:
AGS meets flexible requirements and supports custom analysis processes for different platforms and data types without compromising the throughput. AGS provides major sequencing service providers and research institutions with solutions for more simplified and efficient storage, automated analysis, data transmission, project collaboration, and bioinformatics tool development.
AGS supports Kubernetes-native workflows and allows you to execute DAG-structured workflows in Kubernetes clusters. AGS is also applicable in scenarios such as genomics computing and data processing.
Ease of use. AGS supports elastic scheduling in large-scale computing scenarios based on the auto scaling feature. In the practice of this solution, you do not need to plan computing resources, processing logic, and data caching. You need to only upload data in FASTQ files to Object Storage Service (OSS) and authorize AGS to access the OSS buckets. The data analysis process is efficiently complete and the result is uploaded to the specified storage.
References
In addition, AGS also fixes the issues in workflow management, massive data storage, migration and transmission, and security compliance requirements. For more information, see the following topics: