DLA Ganos is a spatio-temporal engine built on cloud-native Data Lake Analytics (DLA). It stores and computes large amounts of spatio-temporal data and runs on the serverless architecture and Spark computing engine of DLA.
Ganos can access a variety of Alibaba Cloud storage systems, including PolarDB, ApsaraDB for HBase Enhanced Edition (Lindorm), and Object Storage Service (OSS). It uses uniform spatio-temporal data models and APIs to manage and compute heterogeneous data from multiple data sources, and it supports complex operations such as association analysis across data sources that store heterogeneous data.
Because DLA is serverless, you can use Ganos on demand and are charged only for the data queries that you perform. Resources scale elastically, upgrades are performed without affecting services, and operational costs are reduced. For more information about DLA, see What is DLA?
Scenarios
In typical scenarios, DLA Ganos performs extract, transform, and load (ETL) operations to transfer and collaboratively analyze spatio-temporal data that is stored in different databases or file systems. For example, DLA Ganos loads GeoTIFF files from OSS to generate a Resilient Distributed Dataset (RDD) model, and then writes the data to a storage system such as ApsaraDB for HBase Enhanced Edition (Lindorm). DLA Ganos can also load spatio-temporal data from multiple data sources at the same time to cleanse and convert the data. After the data is analyzed and computed by using machine learning algorithms or tools, DLA Ganos writes the analysis results to a specified data source. A professional spatio-temporal data publishing system such as GeoServer can then publish the results as a standard Open Geospatial Consortium (OGC) service, which you can use to query and view the data.
Benefits
- Cost-effectiveness
Ganos is developed based on the serverless architecture of DLA. When you use DLA Ganos, you do not need to deploy and manage infrastructure or separately maintain Spark virtual clusters (VCs). You only need to apply for VCs and use them as required, and you are charged only for the resources that you use. DLA Ganos is available immediately after it is activated and is upgraded without affecting services. It also supports elastic scaling to ensure service quality.
- Easy to use
DLA Ganos provides a series of APIs based on Spark SQL. These APIs include a large number of built-in user-defined functions (UDFs) for analyzing spatio-temporal data. You can therefore process large amounts of spatio-temporal data by using SQL statements, in much the same way as you process data stored in relational databases.
- Unified modeling
DLA Ganos provides a unified spatio-temporal data model based on Spark RDD, which facilitates the modeling of various types of spatio-temporal data. DLA Ganos handles complex operations such as data loading and model conversion, so you only need to focus on the business logic.
- Multiple data sources
DLA Ganos can access multiple data sources and analyze heterogeneous data across them. It allows you to analyze data stored in Alibaba Cloud storage systems, including OSS, PolarDB, and ApsaraDB for HBase Enhanced Edition (Lindorm). DLA Ganos also allows you to perform association analysis between data in DLA and data from other data sources.
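To make the "Easy to use" and "Multiple data sources" benefits more concrete, the following sketch shows the general pattern of calling a spatial UDF from SQL to associate two datasets. It is not the Ganos API: it uses Python's built-in sqlite3 module as a stand-in for Spark SQL, and the function st_within_bbox, the tables gps_points and districts, and all sample values are hypothetical names chosen for illustration. Refer to the DLA Ganos function reference for the actual built-in UDFs.

```python
import sqlite3


def st_within_bbox(x, y, min_x, min_y, max_x, max_y):
    """Hypothetical spatial UDF: 1 if point (x, y) lies inside the
    axis-aligned bounding box (borders included), otherwise 0."""
    return int(min_x <= x <= max_x and min_y <= y <= max_y)


def build_connection():
    """Create an in-memory database holding two illustrative datasets
    that stand in for data loaded from two different sources."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so that SQL statements can call it.
    conn.create_function("st_within_bbox", 6, st_within_bbox)

    conn.execute(
        "CREATE TABLE gps_points (vehicle TEXT, ts TEXT, x REAL, y REAL)"
    )
    conn.execute(
        "CREATE TABLE districts "
        "(name TEXT, min_x REAL, min_y REAL, max_x REAL, max_y REAL)"
    )
    conn.executemany(
        "INSERT INTO gps_points VALUES (?, ?, ?, ?)",
        [
            ("car-1", "2021-01-01T08:00:00", 1.0, 1.0),
            ("car-1", "2021-01-01T08:05:00", 6.0, 2.0),
            ("car-2", "2021-01-01T08:00:00", 7.5, 3.0),
            ("car-2", "2021-01-01T08:05:00", 20.0, 20.0),  # outside all districts
        ],
    )
    conn.executemany(
        "INSERT INTO districts VALUES (?, ?, ?, ?, ?)",
        [
            ("west", 0.0, 0.0, 5.0, 5.0),
            ("east", 5.0, 0.0, 10.0, 5.0),
        ],
    )
    return conn


def points_per_district(conn):
    """Associate the two datasets with a spatial predicate in the JOIN
    condition and count the points that fall inside each district."""
    rows = conn.execute(
        """
        SELECT d.name, COUNT(*) AS n
        FROM gps_points AS p
        JOIN districts AS d
          ON st_within_bbox(p.x, p.y, d.min_x, d.min_y, d.max_x, d.max_y)
        GROUP BY d.name
        ORDER BY d.name
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(points_per_district(build_connection()))  # {'east': 2, 'west': 1}
```

The spatial logic lives entirely in the UDF, so the analysis itself is an ordinary SQL join and aggregation. In this sketch both datasets sit in one in-memory database; in DLA Ganos, the tables would instead map to data held in different storage systems.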
Activate DLA Ganos
- Create a VC in the DLA console. For more information, see Create a virtual cluster.
- On the Virtual Cluster management page, find the VC that you created and click Details in the Actions column.
- In the Cluster Attributes section of the cluster details page, select spark_dla_ganos from the Version drop-down list to activate DLA Ganos.