By Dong Guoping, Senior Technical Expert of Alibaba Cloud Intelligence
Big data platforms on public clouds differ in how they design and implement multi-tenancy. This article introduces the issues and challenges to consider when implementing multi-tenancy on a public cloud big data platform, focusing on how MaxCompute implements multi-tenancy for computing and storage. You will learn about the key technical points of multi-tenant solutions for big data cloud platforms and the related product features of MaxCompute.
Multi-tenancy can be understood in different ways. Here is a simple classification:
The first type gives each tenant its own database instance that supports basic role-based access control. Traditional databases on the cloud usually work this way. From the cloud platform's perspective, multiple tenants are supported, but each tenant purchases an independent instance and defines roles within it. Data in different instances is kept separate.
The second type is multi-tenancy in the control plane. For example, metadata and permission control are shared across tenants, but computing resources remain relatively independent. In big data scenarios, computing resources are usually managed separately because the computation is complex.
The third type is multi-tenancy in the broadest sense: everything is shared. The control plane, computing, and storage all serve multiple tenants from shared resources. This is also called strong multi-tenancy.
From the user's perspective, a higher degree of multi-tenancy brings better scalability and makes it easier to scale resources up and down, but it also makes the cloud platform more complex, and a more complex system tends to be less stable. Because jobs from different users run side by side, the security requirements are also higher, especially on the public cloud.
This article focuses on the computing and storage side of multi-tenancy. The control plane, permission management (RBAC-based or permission-table-based), and row-level and column-level permissions are also part of a multi-tenant big data platform, but they are not the focus here. For computing and storage, implementations fall into different combinations.
A typical form is single-tenancy computing plus open storage, such as AWS EMR and Databricks.
This is the architecture diagram of Databricks. The control plane is multi-tenant, while each user's computing resources are single-tenant, and data lives in open storage such as S3. The control plane runs under the Databricks account, but the computing resources belong to the user's VPC. Because computing resources are single-tenant, complex UDFs can be supported without worrying about cross-tenant security. And because the storage is open, computing can easily be moved to other clouds, which supports multi-cloud deployments. The challenges are as follows. Resources are allocated at tenant granularity, so they must be purchased in advance, and elastic scaling depends on the elasticity of the underlying cloud platform. Reading from and writing to multi-tenant cloud storage is less efficient: computing and storage are physically far apart and traffic may have to pass through a gateway, which creates a bandwidth bottleneck, so data prefetching and caching are required. In addition, intermediate data generated during computation cannot rely entirely on cloud storage because of its performance, so other options such as memory or local storage must be considered.
BigQuery and MaxCompute take a similar approach: multi-tenant computing plus internal storage.
Both computing and storage resources are multi-tenant, and computing and storage can sit in the same data center, physically close to each other. The advantage is extreme elasticity: users can run large-scale jobs without holding physical resources and are charged based on the resources their jobs actually use. With internal storage, there can be high bandwidth between computing and storage, and the platform can make full use of the underlying storage features for optimization. The challenge is UDF support. UDFs are a common feature in big data scenarios that let users implement complex computation with custom functions, and the platform must prevent a malicious user's code from threatening the security of the platform or other tenants. BigQuery and MaxCompute handle this differently. BigQuery is relatively restrained: it provides JavaScript UDFs with restricted capabilities. MaxCompute uses secure containers to support full UDF capabilities, as described later. However, running secure containers on a cloud platform runs into the limits of nested virtualization, so resource forms such as bare metal or physical machines are needed.
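The prefetching and caching mentioned above can be illustrated with a small read-through block cache. This is a minimal sketch, not how Databricks or any particular engine implements caching: `fetch_block` is a hypothetical function that reads one fixed-size block from remote object storage, recently used blocks are kept in an LRU cache, and each miss also fetches the following block on the assumption that scans are sequential.

```python
from collections import OrderedDict
from typing import Callable, Tuple

BlockKey = Tuple[str, int]


class PrefetchingBlockCache:
    """LRU cache of remote storage blocks with simple sequential read-ahead."""

    def __init__(self, fetch_block: Callable[[str, int], bytes],
                 capacity: int = 64, read_ahead: int = 1) -> None:
        self._fetch = fetch_block      # hypothetical remote read: (path, block index) -> bytes
        self._capacity = capacity      # maximum number of cached blocks
        self._read_ahead = read_ahead  # blocks to prefetch after a miss
        self._blocks: "OrderedDict[BlockKey, bytes]" = OrderedDict()
        self.remote_reads = 0          # counts round trips to remote storage

    def _load(self, key: BlockKey) -> bytes:
        if key in self._blocks:
            self._blocks.move_to_end(key)  # mark as most recently used
            return self._blocks[key]
        data = self._fetch(*key)
        self.remote_reads += 1
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data

    def read(self, path: str, index: int) -> bytes:
        key = (path, index)
        hit = key in self._blocks
        data = self._load(key)
        if not hit:
            # A miss suggests a new sequential scan, so fetch the next blocks now.
            # (Synchronous for simplicity; a real cache would prefetch asynchronously.)
            for ahead in range(1, self._read_ahead + 1):
                self._load((path, index + ahead))
        return data


# Usage with an in-memory stand-in for remote storage.
store = {("t/part-0", i): bytes([i]) * 4 for i in range(8)}
cache = PrefetchingBlockCache(lambda p, i: store[(p, i)], capacity=4)
cache.read("t/part-0", 0)  # miss: fetches blocks 0 and 1
cache.read("t/part-0", 1)  # hit: served from the prefetched block
print(cache.remote_reads)  # 2
```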
The advantage of multi-tenancy is that it works out of the box, with no need to create a separate resource pool, and it can scale out in seconds. If a single-tenant resource pool is built on ECS instances, it may take minutes to go from purchasing resources to preparing the software environment, while on a multi-tenant platform the same change may only require modifying a configuration parameter. In terms of billing, multi-tenancy can charge for actual usage, while a single-tenant resource pool is charged by its specifications whether or not the resources are used. The cloud platform can scale a pool dynamically according to resource usage, but the granularity at which resources are sold is still fundamentally different. In terms of cost, a multi-tenant resource pool achieves higher resource utilization by shaving peaks and filling valleys across different tenants' jobs, and the cloud platform can pass these savings on to users.
This approach brings its own technical challenges. First, if cloud storage is used, the problems of remote reads and writes and of storing intermediate files must be solved; internal storage can be customized and optimized, but its openness becomes a problem. At the resource scheduling level, the platform must schedule jobs of different tenants and types reasonably while supporting ultra-large numbers of computing nodes. At runtime, UDFs and third-party engines must be isolated so that no tenant can access another tenant's data without authorization and malicious code from one tenant cannot affect the security of the platform or other tenants. Finally, user-specific network requirements must be met by opening connectivity at the tenant level rather than at the cluster level.
The chart shows the difference between single-tenancy and multi-tenancy. A single-tenant resource pool isolates tenants from each other at the IaaS layer, while in multi-tenancy the big data platform itself must address internal security risks. Among these challenges, the resource scheduling layer is mainly concerned with performance and scalability at large scale, while security determines whether the solution is feasible at all: if multi-tenant security cannot be guaranteed, the service is unacceptable as a cloud offering.
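The billing and utilization argument can be made concrete with a little arithmetic. The sketch below compares the same hypothetical workload billed per compute unit (CU) actually consumed and billed as a dedicated pool sized for its peak hour. The function names, usage profile, and price are illustrative assumptions, not MaxCompute pricing, and the same unit price is used for both models to isolate the effect of peak sizing.

```python
from typing import Sequence


def pay_as_you_go_cost(hourly_cu: Sequence[float], price_per_cu_hour: float) -> float:
    """Multi-tenant billing: pay only for the compute units (CUs) consumed each hour."""
    return sum(hourly_cu) * price_per_cu_hour


def provisioned_pool_cost(hourly_cu: Sequence[float], price_per_cu_hour: float) -> float:
    """Single-tenant pool sized for the peak hour: every hour is billed at peak size."""
    return max(hourly_cu) * len(hourly_cu) * price_per_cu_hour


def pool_utilization(hourly_cu: Sequence[float]) -> float:
    """Fraction of the peak-sized pool that the workload actually uses."""
    return sum(hourly_cu) / (max(hourly_cu) * len(hourly_cu))


# Hypothetical daily batch workload: 4 busy hours at 100 CUs, 20 quiet hours at 10 CUs.
usage = [100.0] * 4 + [10.0] * 20
print(pay_as_you_go_cost(usage, 2.0))     # 1200.0
print(provisioned_pool_cost(usage, 2.0))  # 4800.0
print(pool_utilization(usage))            # 0.25
```

In this example the dedicated pool is idle 75% of the time. In a shared pool, that idle capacity can run other tenants' jobs, which is the peak-shaving and valley-filling effect described above.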
Alibaba Cloud MaxCompute is an enterprise-level cloud data warehouse for big data analysis, offered as a fully managed serverless service. We support SQL as well as Java and Python UDFs. We support model training and other operations on MaxCompute data using the algorithm components of the Machine Learning Platform for AI (PAI), and we also support open-source Spark jobs. All of these run on unified computing and storage resources.
For storage, we use Pangu, the self-developed storage engine of Alibaba Cloud Apsara, and implement a capability-based permission model. Because the storage is not exposed to the outside world, the permission model can be kept simple. And because it is internal storage, access can be distributed to avoid performance bottlenecks caused by centralized nodes. Internal storage also lets us localize and manage temporary data better while jobs are running.
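In a capability-based model, whoever holds an unforgeable token may perform exactly the operations it names, so each storage node can check a request on its own instead of consulting a central permission service, which fits the distributed access described above. The sketch below illustrates the idea with HMAC-SHA256-signed tokens and a secret shared by the issuer and the storage nodes; the token format and function names are hypothetical and do not describe Pangu's actual protocol.

```python
import hashlib
import hmac
import json
from typing import List


def _sign(secret: bytes, body: str) -> str:
    return hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_capability(secret: bytes, path: str, ops: List[str], expires_at: int) -> str:
    """Issuer side: grant `ops` on `path` until `expires_at` (Unix seconds)."""
    body = json.dumps({"path": path, "ops": sorted(ops), "exp": expires_at}, sort_keys=True)
    return body + "." + _sign(secret, body)


def check_capability(secret: bytes, token: str, path: str, op: str, now: float) -> bool:
    """Storage-node side: verify the token locally, without asking a central service.

    `now` is passed in for testability; in practice it would come from time.time().
    """
    body, _, sig = token.rpartition(".")  # the hex signature never contains "."
    if not hmac.compare_digest(sig.encode(), _sign(secret, body).encode()):
        return False  # forged or tampered token
    claims = json.loads(body)
    return claims["path"] == path and op in claims["ops"] and now < claims["exp"]


# Usage: a job is granted read access to one file for 60 seconds.
key = b"shared-secret-between-issuer-and-storage-nodes"
token = issue_capability(key, "/proj/t1/part-0", ["read"], expires_at=1_000_060)
print(check_capability(key, token, "/proj/t1/part-0", "read", now=1_000_000))   # True
print(check_capability(key, token, "/proj/t1/part-0", "write", now=1_000_000))  # False
```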
A multi-tenant resource pool depends on a good resource scheduling engine. We have built an efficient and scalable resource scheduling system that scales horizontally at both the scheduling and resource management levels to support large numbers of computing nodes. It ensures that different types of jobs from different tenants are scheduled fairly on the platform, and we have optimized the failover process. Two resource forms are offered: subscription and pay-as-you-go. Subscription users have guaranteed resources, while pay-as-you-go jobs are scheduled according to their resource requirements and submission order.
On each host, we enforce job-level resource control through the cgroup mechanism so that an anomaly in one job does not affect others. Jobs can be started either as processes or as containers, and CPU and GPU resources can be managed together.
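The policy described above (guaranteed resources for subscription users, submission-order scheduling for pay-as-you-go users) can be sketched as a single admission round. This is a simplified illustration with hypothetical names, not MaxCompute's scheduler: it ignores preemption, multiple resource dimensions, and fairness among pay-as-you-go tenants. Note that it lends a subscription tenant's unused quota to pay-as-you-go jobs, so honoring the guarantee later would require preempting them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    tenant: str
    cpus: int
    submit_time: float
    subscription: bool  # True: subscription (reserved quota); False: pay-as-you-go


def schedule_round(requests: List[Request], capacity: int, quotas: Dict[str, int]) -> List[Request]:
    """Admit requests for one scheduling round on a shared pool of `capacity` CPUs."""
    ordered = sorted(requests, key=lambda r: r.submit_time)
    admitted: List[Request] = []
    used: Dict[str, int] = {}
    free = capacity
    # Pass 1: subscription requests, each bounded by its tenant's reserved quota.
    for r in ordered:
        within_quota = used.get(r.tenant, 0) + r.cpus <= quotas.get(r.tenant, 0)
        if r.subscription and r.cpus <= free and within_quota:
            used[r.tenant] = used.get(r.tenant, 0) + r.cpus
            free -= r.cpus
            admitted.append(r)
    # Pass 2: pay-as-you-go requests share the remainder in submission order;
    # a request that does not fit is skipped so smaller later ones can still run.
    for r in ordered:
        if not r.subscription and r.cpus <= free:
            free -= r.cpus
            admitted.append(r)
    return admitted


reqs = [
    Request("a", 40, 2.0, True),   # subscription tenant with a 50-CPU quota
    Request("b", 30, 1.0, False),
    Request("c", 50, 3.0, False),  # does not fit in the remaining 30 CPUs
    Request("b", 20, 4.0, False),
]
print([(r.tenant, r.cpus) for r in schedule_round(reqs, capacity=100, quotas={"a": 50})])
# [('a', 40), ('b', 30), ('b', 20)]
```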
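On Linux with cgroup v2, job-level limits are set by writing to interface files in the job's cgroup directory: `cpu.max` takes a quota and a period in microseconds, and `memory.max` takes a byte limit. The sketch below writes only those two files into an existing directory. The helper name and directory path are hypothetical, creating the cgroup and moving the job's processes into it (through `cgroup.procs`) requires root and is omitted, and GPU control is out of scope; it is not MaxCompute's actual agent.

```python
import os


def apply_job_limits(cgroup_dir: str, cpus: float, memory_bytes: int,
                     period_us: int = 100_000) -> None:
    """Write cgroup v2 CPU and memory limits into an existing cgroup directory for one job."""
    if cpus <= 0 or memory_bytes <= 0:
        raise ValueError("limits must be positive")
    # 2.5 CPUs -> 250000 us of CPU time allowed per 100000 us period.
    quota_us = int(cpus * period_us)
    with open(os.path.join(cgroup_dir, "cpu.max"), "w") as f:
        f.write(f"{quota_us} {period_us}\n")
    # Hard memory cap: beyond it the kernel reclaims, then OOM-kills inside this cgroup.
    with open(os.path.join(cgroup_dir, "memory.max"), "w") as f:
        f.write(f"{memory_bytes}\n")


# Example call (hypothetical path, needs root on a cgroup v2 host):
# apply_job_limits("/sys/fs/cgroup/jobs/job-42", cpus=2.5, memory_bytes=4 * 1024**3)
```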
For flexibility and extensibility, MaxCompute supports UDFs in SQL so that users can extend computing behavior, and it also introduces third-party engines such as Spark. To the platform this is untrusted code, which may cause unexpected system damage or be used by a malicious user to mount an attack. We use lightweight secure containers (virtualized containers) to provide process-level isolation; that is, untrusted code runs inside a secure container.
Big data computing tasks are short-lived and run on large clusters, so MaxCompute places high demands on the stability and performance of secure containers, and we have made specific optimizations. On the security side, we have trimmed the VM kernel to remove unnecessary kernel features, reducing the attack surface, and added necessary protection mechanisms. External network access is prohibited by default. Users of an offline data computing platform are usually not sensitive to latency, but we keep working to optimize the end-to-end process, so we have done a lot of work on the startup speed of secure containers. Virtualization also consumes extra resources, so the resource footprint of each VM must be reduced to increase per-machine computing density and run more tasks on each machine. Finally, an efficient data channel is needed between the inside and outside of the secure container to read and write computing data.
Once tasks are isolated in secure containers, nodes of engines such as Spark still need to communicate with each other, for example for task distribution and status monitoring between the Spark driver and workers. For security reasons, this communication cannot go over the host network, so we built a VXLAN-based virtual network for secure containers. All nodes of the same task run in the same virtual network, communicate through private IP addresses, and cannot access the host network. For user-specific external network needs (such as accessing an interface on the public network or other data services inside a VPC), we also provide task-level connectivity: the user declares the network targets to be accessed when the job starts, and after the necessary permission checks, the network is connected for that job only.
Because tasks start, stop, and scale frequently, building virtual networks and communicating over them also comes under heavy pressure. A VPC on the cloud is usually built on VXLAN technology, but VPCs are relatively static: a user typically has only one VPC, and buying hosts simply adds nodes to it, which is an infrequent operation. We, by contrast, need to create a VPC-like network for each task and bring up hundreds of nodes in it within a short time, which is a real performance challenge.
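The cost of per-VM overhead on computing density can be shown with simple arithmetic: each secure container adds a fixed memory cost for its guest kernel and runtime, which reduces how many tasks fit on one host. The host size, task size, and overhead figures below are hypothetical, chosen only to show the effect; CPU overhead could be accounted for the same way.

```python
def tasks_per_host(host_mem_gib: float, task_mem_gib: float, vm_overhead_gib: float) -> int:
    """How many sandboxed tasks fit on a host when each secure container adds a fixed overhead."""
    return int(host_mem_gib // (task_mem_gib + vm_overhead_gib))


# Hypothetical host with 384 GiB available for tasks, each task needing 4 GiB.
print(tasks_per_host(384, 4, 0.0))    # 96: no sandbox overhead
print(tasks_per_host(384, 4, 0.5))    # 85: 512 MiB of overhead per VM
print(tasks_per_host(384, 4, 0.125))  # 93: overhead trimmed to 128 MiB
```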
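The job-level connectivity described above can be sketched as a check performed at job submission: the job declares the endpoints it needs, each one is checked against the targets the tenant has been approved for, and only approved endpoints become allow rules for that job's virtual network, with everything else denied by default. The data structures, the approval table, and the function name are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Endpoint:
    cidr: str  # e.g. "10.1.0.0/16" inside the user's VPC
    port: int


def job_egress_rules(tenant: str, declared: List[Endpoint],
                     approved: Dict[str, List[Endpoint]]) -> Set[Tuple[str, int]]:
    """Turn a job's declared targets into allow rules, rejecting anything not approved for the tenant."""
    rules: Set[Tuple[str, int]] = set()
    for ep in declared:
        net = ipaddress.ip_network(ep.cidr)
        permitted = any(
            ep.port == a.port
            and net.version == ipaddress.ip_network(a.cidr).version  # subnet_of needs same IP version
            and net.subnet_of(ipaddress.ip_network(a.cidr))
            for a in approved.get(tenant, [])
        )
        if not permitted:
            raise PermissionError(f"tenant {tenant} is not approved for {ep.cidr}:{ep.port}")
        rules.add((str(net), ep.port))
    return rules  # all other traffic from the job's virtual network stays denied


approved = {"tenant-1": [Endpoint("10.1.0.0/16", 3306)]}
print(job_egress_rules("tenant-1", [Endpoint("10.1.2.0/24", 3306)], approved))
# {('10.1.2.0/24', 3306)}
```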
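To see where the per-task work lies, here is a minimal in-memory allocator that gives each task its own VXLAN Network Identifier (VNI, a 24-bit field in RFC 7348) and a private address for each node. It assumes a single allocator process and a hypothetical per-task subnet; a production control plane would also persist this state and program the VXLAN tunnel endpoints on every host involved, and that fan-out to many hosts is the kind of work that becomes expensive when tasks are created and torn down this often.

```python
import ipaddress
import itertools
from dataclasses import dataclass
from typing import Dict, List

VNI_MAX = (1 << 24) - 1  # the VXLAN Network Identifier is a 24-bit field


@dataclass
class TaskNetwork:
    vni: int
    node_ips: List[str]


class TaskNetworkAllocator:
    """Gives each task its own VNI and private node IPs; VNIs are recycled when tasks end."""

    def __init__(self, subnet: str = "192.168.0.0/16") -> None:
        self._subnet = ipaddress.ip_network(subnet)
        self._next_vni = 1
        self._free_vnis: List[int] = []
        self._by_task: Dict[str, int] = {}

    def create(self, task_id: str, num_nodes: int) -> TaskNetwork:
        if task_id in self._by_task:
            raise ValueError(f"task {task_id} already has a network")
        # Address ranges may repeat across tasks: the VNI keeps their traffic separate.
        ips = [str(ip) for ip in itertools.islice(self._subnet.hosts(), num_nodes)]
        if len(ips) < num_nodes:
            raise ValueError("subnet too small for this many nodes")
        if self._free_vnis:
            vni = self._free_vnis.pop()
        elif self._next_vni <= VNI_MAX:
            vni, self._next_vni = self._next_vni, self._next_vni + 1
        else:
            raise RuntimeError("VNI space exhausted")
        self._by_task[task_id] = vni
        return TaskNetwork(vni, ips)

    def release(self, task_id: str) -> None:
        self._free_vnis.append(self._by_task.pop(task_id))


alloc = TaskNetworkAllocator()
net = alloc.create("spark-job-1", num_nodes=3)
print(net.vni, net.node_ips)  # 1 ['192.168.0.1', '192.168.0.2', '192.168.0.3']
alloc.release("spark-job-1")
```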
Together, these technologies give us strong multi-tenancy in a single resource pool and make more business models possible. Based on secure containers and virtual network isolation, we provide powerful UDFs on a multi-tenant cluster. Compared with UDFs on other platforms, ours have fewer restrictions: they can use local I/O and network functions and can access data inside user VPCs. For example, in a lakehouse scenario, a networklink can be created to open network access to the user's VPC; after the networklink is associated with an external data source when it is created, the external data can be accessed through SQL in MaxCompute. These capabilities are already available on the MaxCompute platform. Task-level isolation also lets us run mixed computing workloads in a single cluster: in addition to SQL and UDFs, we support the PAI machine learning platform and the open-source Spark engine.
The design of multi-tenancy depends on the business scenario, product form, and infrastructure. Returning to the original intent: why implement strong multi-tenancy on unified computing and storage resources? MaxCompute was incubated internally, and today more than 99% of offline data within Alibaba Group runs on the MaxCompute platform. In terms of business model, we want the UDF ecosystem to be compatible with Hive and to support the open-source ecosystem. The group's data security requirements led us to implement multi-tenant security early on. For public cloud services, we want to give customers advantages in resource granularity, elasticity, and cost, which is why we ultimately committed to the strong multi-tenant form.
As mentioned earlier, for internal storage, we plan to make the storage layer more open to computing scenarios as it evolves. In a multi-tenant environment, a major customer's sudden, unpredictable consumption of large amounts of resources is unfriendly to the platform and may cause other users' jobs to queue. Offering a single-tenant computing form to such customers is therefore also an option. Open storage combined with single-tenant computing will also support future multi-cloud deployments, giving users more choices and letting them combine options to meet their specific needs.