As the ideas and technologies of cloud-native continue to mature, more industries are adopting them, and this shift affects technology practitioners in nearly every position. For IT executives, frontline developers, and O&M personnel alike, the entire technology stack, from business logic to technology selection, has changed dramatically. Everyone therefore needs a clear understanding of what cloud-native adoption means for their own role.
Many enterprises place comprehensive and strict requirements on the capabilities of technical CXOs (CTO, CIO, CISO, and CDO, collectively referred to as CXOs in this article) and other technology executives. These leaders are expected to consider every aspect of technology management and to treat sustaining the company's business as their core responsibility. They must therefore have a broad technical vision, sound technical judgment, and even high-level architecture design skills, along with good product awareness, to cope with changing internal and external environments.
CXOs and IT/R&D executives must recognize that cloud-native is the inevitable direction of cloud computing and has reshaped the basic technology platform for enterprise digital transformation. Cloud-native architecture is the foundational architecture for building modern enterprise applications, which makes it critical for Internet applications, enterprise transaction systems, big data applications, and AI workloads.
CXOs and IT executives should also note that most cloud-native technologies and standards come from projects hosted by mainstream open-source foundations, and together they form an open technology system. This is closely related to technology management topics such as open source and localization. The cloud-native services launched by major cloud vendors are compatible with these technologies and standards, so open-source cloud-native technologies and products satisfy enterprise customers' requirement of avoiding vendor lock-in: when changing the Cloud Service Provider (CSP) or Independent Software Vendor (ISV), enterprises do not need to worry about migration failures or prohibitive migration costs. Localization is increasingly becoming a hard requirement for both governments and enterprises. Enterprises therefore need to select cloud-native products that meet localization standards, including whether the products are independently developed and controllable, whether source code is contributed back (usually reflected in areas such as O&M, APIs, and component extensions), and whether local server hardware is supported. Organizations such as CAICT and CESI also provide evaluations that help enterprises choose commercial products meeting localization standards.
Within the enterprise, CXOs and IT executives must use cloud-native technologies to drive technology upgrades and realize technical and business value in a way that fits the enterprise's actual situation.
First, Alibaba Cloud Native Architecting (ACNA) should be used at the strategic and organizational levels to evaluate and formulate the enterprise's cloud-native strategy and implementation path, and that strategy should become part of the overall corporate strategy to accelerate digital transformation. Like an enterprise middle-platform strategy, a cloud-native strategy is not only a comprehensive technical upgrade but also an upgrade of the IT organizational structure and culture, and more enterprises are coming to realize this. Take Alibaba as an example: Alibaba started developing cloud-native technologies and products as early as ten years ago, and the establishment of the Alibaba Cloud Native Technical Committee, announced at the 2020 Apsara Conference, reflects the Group's determination to pursue comprehensive cloud-native adoption across Alibaba and Ant Group.
Second, cloud-native technology restructures the way enterprise applications are developed in an all-around way, and CXOs and IT executives need to think this through. For example, they can use containers, microservices, Serverless, and Service Mesh to rewrite applications; use DevOps to reshape the enterprise's R&D and O&M processes; use GitOps, IaC, and declarative architecture to redefine pipelines and O&M methods; use observability and Service-Level Agreement (SLA) management to upgrade the existing monitoring system; and use an identity-centered cloud-native security system to protect the enterprise.
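To make the declarative side of this concrete, the following is a minimal sketch (not an Alibaba-specific implementation) using the official Kubernetes Python client: the desired state of a containerized service is described as data and handed to the platform to reconcile, rather than being produced by imperative scripts. The service name, image, and replica count are illustrative assumptions.

```python
# Minimal sketch: declare the desired state of a containerized service
# and hand it to Kubernetes, which reconciles reality toward it.
# Assumes a reachable cluster and the `kubernetes` Python client installed.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside a cluster

desired = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="order-service"),  # hypothetical service
    spec=client.V1DeploymentSpec(
        replicas=3,  # desired state, not an imperative scaling command
        selector=client.V1LabelSelector(match_labels={"app": "order-service"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "order-service"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(name="app",
                                   image="registry.example.com/order-service:1.0.0"),
            ]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=desired)
```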
All technology upgrades are ultimately meant to bring practical value to the enterprise, so CXOs and IT executives should keep that value firmly in view when driving cloud-native upgrades.
For the technical backbone of an enterprise, such as architects, consultants, and system planners, cloud-native technology and architecture have a profound impact on architecture evolution and risk control, technology selection, modern application construction, the reshaping of IT service processes, the application of new tools, and security planning.
1. Architecture Evolution and Risk Control
The essence of evolving toward cloud-native architecture is changing the infrastructure on which software runs, namely moving it onto the cloud platform, which breaks the stable state of the upper-layer software architecture and requires a new one to be established. Architects, consultants, and system planners therefore need to carefully evaluate the organization's capabilities, the skill levels of development and O&M personnel, the development cycle, cost budgets, legacy system integration, and business demands. They also need to apply ACNA for risk control to ensure the smooth implementation and operation of the cloud-native architecture in the enterprise.
2. Technology Selection
Technology selection involves two aspects: choosing which cloud-native technology and architecture domains to adopt, and choosing among multiple similar technologies or products within the same domain. For the former, enterprises should refer to the cloud-native architecture maturity model and gradually identify the domains that match their demands and capabilities over successive architecture iterations; for example, some enterprises choose containers + microservices + Internet middleware to build their middle platforms. For the latter, enterprises should choose open-source products and services with commercial support (or at least with successful commercial implementation cases in the same field), such as the microservice and container services provided by cloud platforms.
3. Modern Application Construction
Building modern applications on cloud-native technology gives them dynamic scaling and high resilience, allowing the enterprise to stay agile and respond to a changing market. Core architects apply the iterative process of cloud-native technology and architecture to the next generation of core software, rewriting and restructuring it so that new applications take on the characteristics of modern applications. Because cloud-native brings a complete upgrade of the application architecture, enterprises should rewrite systems rather than refactor them, which minimizes the repayment of historical technical debt, reduces the system's legacy burden, and accelerates the modernization of new applications.
4. Reshaping the IT Service Process
After an enterprise adopts cloud-native technologies, the entire IT service process must also be upgraded accordingly, including the management of events, problems, changes, releases, and configurations. These processes are already well defined; what cloud-native brings is new tools, methods, and standards that make the whole process more automated and simpler to handle. For example, observability tools significantly reduce the monitoring burden in event management, and Kubernetes-based cloud event management can cover everything from virtual hosts, containers, PaaS services, and integrated services up to the application layer, including centralized collection, storage, analysis, alerting, correlation analysis, and visualization. This improves the efficiency of the service desk and of subsequent event handling.
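As a simple illustration, the sketch below uses the Kubernetes Python client to collect cluster events centrally and forward warnings to an alerting pipeline; the filtering and forwarding logic is illustrative, and a reachable cluster is assumed.

```python
# Minimal sketch: centrally collect Kubernetes events (the raw material of
# cloud event management) and forward warnings to an alerting pipeline.
# Assumes a reachable cluster and the `kubernetes` Python client installed.
from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

w = watch.Watch()
for item in w.stream(v1.list_event_for_all_namespaces, timeout_seconds=300):
    event = item["object"]
    record = {
        "namespace": event.metadata.namespace,
        "kind": event.involved_object.kind,
        "name": event.involved_object.name,
        "type": event.type,          # "Normal" or "Warning"
        "reason": event.reason,
        "message": event.message,
    }
    if record["type"] == "Warning":
        print("ALERT:", record)      # in practice: push to storage/alerting, not stdout
```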
5. Application of New Tools
The cloud-native technology system includes many new tools that can significantly improve the efficiency of cloud delivery, cloud integration, and cloud O&M. Without them, enterprises may face problems such as insufficient automation, fragmented IT information, and high O&M risks. Architects, consultants, and system planners therefore need to select or develop suitable tools for scenarios such as continuous integration/continuous delivery (CI/CD), microservice implementation, activation and integration of PaaS/SaaS services on the cloud, Configuration Management Database (CMDB) integration, enterprise monitoring integration, and account/permission/authentication integration, all with the aim of raising the enterprise's level of O&M automation and reducing O&M risk.
6. Security Planning
In the context of digital transformation, the value of digital assets keeps being unlocked, but the risks are growing as well. DevSecOps, the zero-trust model, and the many cloud security services advocated by cloud-native enable fine-grained upgrades to security policies, such as permission control, service-level dynamic isolation, and request-level access control, securing the end-to-end process from code development to application O&M. This requires the enterprise to upgrade its security planning so that application security is planned together with the cloud infrastructure.
Cloud-native technologies and architectures also have a major impact on technical developers (including design, development, and testing engineers) in the following six aspects:
1. Technology Stack
Developers across the entire technology stack benefit from adopting cloud-native technology. The development environment gradually shifts from local IDEs to cloud IDEs, with cloud services pre-integrated into the IDE (for example, using Cloud Toolkit to deploy applications directly from the IDE), which improves the efficiency of writing and debugging code. The Backend-for-Frontend layer adopts a Serverless architecture and a large number of PaaS cloud services, which simplifies the technology stack and frees developers from backend O&M. Backend developers need to pay attention to frequently used technologies such as containers, microservices, Serverless, Service Mesh, and PaaS cloud services.
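As an illustration of how thin the backend of a Serverless BFF can be, the sketch below assumes a Function Compute-style `handler(event, context)` entry point and aggregates two hypothetical backend services for a single frontend view.

```python
# Minimal sketch of a Backend-for-Frontend function on a Serverless platform.
# Assumes a Function Compute-style entry point `handler(event, context)`;
# the two backend endpoints it aggregates are hypothetical.
import json
import urllib.request

ORDER_API = "https://orders.internal.example.com"   # hypothetical microservice
USER_API = "https://users.internal.example.com"     # hypothetical microservice

def _get_json(url):
    with urllib.request.urlopen(url, timeout=3) as resp:
        return json.loads(resp.read())

def handler(event, context):
    """Aggregate data from two backend services for one frontend view."""
    req = json.loads(event)
    user_id = req["userId"]
    return json.dumps({
        "user": _get_json(f"{USER_API}/users/{user_id}"),
        "recentOrders": _get_json(f"{ORDER_API}/orders?userId={user_id}&limit=5"),
    })
```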
2. Distributed Design Mode
The cloud-native technology system incorporates a large number of proven distributed design patterns and packages them into open-source products and cloud services, significantly reducing the workload of architects and developers. For example, microservices and Service Mesh support many patterns, such as gray release, circuit breaking, bulkhead isolation, throttling, degradation, observability, and service gateways. Other patterns, such as Event-Driven Architecture (EDA), read/write splitting, Serverless, Command Query Responsibility Segregation (CQRS), and Basically Available, Soft State, Eventually Consistent (BASE), still need to be introduced at the application architecture layer and cannot be made transparent to applications.
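Patterns such as circuit breaking are normally provided by the mesh or the microservice framework, but the framework-free sketch below shows the underlying idea; the thresholds and timings are illustrative.

```python
# Minimal circuit-breaker sketch: after too many consecutive failures the call
# is short-circuited for a cooling-off period instead of hitting the backend.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None        # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None    # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()
            raise
        self.failures = 0
        return result
```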
3. Business Development
The more cloud-native technologies and cloud services are adopted, the less effort developers spend on non-functional features, leaving more time and energy for the functional design of the business itself. For applications built on Service Mesh and Serverless, developers do not need to worry about server O&M, constantly upgrading dependent software, or handling the complexity of gray hot upgrades and automatic rollback, which also reduces the workload of online traffic stress testing, integration testing, and smoke testing.
4. Test Method
Designing test cases purely from predictions of what might go wrong is inefficient. A better approach is to use proactive fault injection and chaos engineering to simulate the faults that can occur in the real world, and to record and replay online traffic to quickly build test cases and improve the effectiveness of regression testing. More importantly, these methods can be applied to the production system rather than being rehearsed only in a test environment. Internet companies such as Netflix, Amazon, and Alibaba already use them to reduce the risk of failures in large-scale distributed environments.
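As a small illustration of proactive fault injection, the sketch below wraps a dependency call and randomly injects latency or errors at a configurable rate; real chaos experiments are run through dedicated platforms with explicit blast-radius controls.

```python
# Minimal fault-injection sketch: randomly add latency or raise an error
# around a dependency call to observe how the caller copes.
import functools
import random
import time

def inject_faults(error_rate=0.05, max_extra_latency=2.0):
    """Decorator that simulates real-world faults at a configurable rate."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < error_rate:
                raise ConnectionError("injected fault: simulated dependency outage")
            time.sleep(random.uniform(0, max_extra_latency))  # injected latency
            return func(*args, **kwargs)
        return wrapper
    return decorator

@inject_faults(error_rate=0.1, max_extra_latency=0.5)
def query_inventory(sku):          # hypothetical downstream call
    return {"sku": sku, "stock": 42}
```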
5. Software Development and O&M
For enterprises that have already moved from the traditional waterfall model to agile development, DevOps and DevSecOps bring even more visible changes to the R&D process. They require enterprises to guarantee secure continuous releases and to redefine and standardize the R&D process and the tools that R&D personnel work with. Enterprises also need to merge development and O&M roles and establish positions dedicated to improving the stability, efficiency, and quality of projects. In effect, the organization, processes, and culture of R&D and O&M are redefined.
6. Learning Scenario
The cloud platform is the infrastructure of the digital society and an important part of new infrastructure, and many of the most advanced IT technologies and concepts appear on the cloud first. The open-source projects behind these new technologies, along with related conferences, meetups, discussion forums, and technology blogs, are excellent places for technicians to learn and improve their skills. In addition, cloud computing-related technical media frequently cover new cloud-native technologies and solutions through online documentation, live broadcasts, videos, technical articles, and blogs, which developers can use to broaden their horizons and improve their technical capabilities.
As the people who guarantee that software runs successfully, O&M personnel, including Site Reliability Engineers (SREs), are also deeply affected by cloud-native technologies and architectures, especially in the technology stack, O&M tools, monitoring and error handling, SLA management, and AIOps, as described below.
1. Technology Stack
O&M engineers may need to change their technology stacks because O&M software itself is now built on cloud-native technology stacks. They also actively use cloud-native technologies and tools to build new tasks and processes, including integration, monitoring, automation, self-recovery, performance management, high-availability management, security management, SLA management, IT asset management, event management, configuration management, change management, release management, and patch management. A typical example is using a Kubernetes Operator to automate resource creation, delivery, and instance migration.
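At the heart of such an Operator is a reconcile loop that repeatedly compares desired state with observed state and acts on the difference. The sketch below shows that pattern in plain Python, independent of any real operator framework; the resource shape and scaling action are illustrative.

```python
# Conceptual sketch of the reconcile loop at the heart of a Kubernetes Operator:
# observe actual state, compare it with desired state, and converge the difference.
import time

def get_desired_replicas(resource):        # would read the custom resource spec
    return resource["spec"]["replicas"]

def get_actual_replicas(resource):         # would query the cluster
    return resource["status"].get("replicas", 0)

def scale_to(resource, replicas):          # would create/delete workload instances
    print(f"scaling {resource['metadata']['name']} to {replicas} replicas")
    resource["status"]["replicas"] = replicas

def reconcile(resource):
    desired, actual = get_desired_replicas(resource), get_actual_replicas(resource)
    if desired != actual:
        scale_to(resource, desired)

# Illustrative in-memory resource; a real Operator would watch the API server.
db_cluster = {"metadata": {"name": "demo-db"}, "spec": {"replicas": 3}, "status": {}}
while True:
    reconcile(db_cluster)
    time.sleep(10)                         # periodic resync
```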
2. O&M Tools
Cloud-native architecture achieves a high degree of O&M automation through IaC and declarative O&M, handling deployment, upgrade, rollback, configuration changes, scaling, and other operations automatically, even in complex distributed systems with hundreds or thousands of servers. GitOps, a core way of implementing IaC, keeps a description of the system's target state and runs through the entire change process; it aligns with the DevOps principle of transparency and carries the advantages of declarative O&M.
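A minimal sketch of the GitOps idea follows: the declarative desired state lives in a Git repository, and an agent periodically pulls it and applies whatever changed. The repository path and interval are illustrative, and real deployments typically rely on dedicated tools such as Argo CD or Flux rather than a hand-rolled loop.

```python
# Minimal GitOps sketch: Git holds the declarative desired state; an agent
# periodically syncs the repository and applies it to the cluster.
# Assumes `git` and `kubectl` are installed; the repo path is illustrative.
import subprocess
import time

REPO_DIR = "/srv/gitops/platform-config"     # hypothetical local clone
MANIFEST_DIR = f"{REPO_DIR}/manifests"

def sync_once():
    before = subprocess.run(["git", "-C", REPO_DIR, "rev-parse", "HEAD"],
                            capture_output=True, text=True, check=True).stdout.strip()
    subprocess.run(["git", "-C", REPO_DIR, "pull", "--ff-only"], check=True)
    after = subprocess.run(["git", "-C", REPO_DIR, "rev-parse", "HEAD"],
                           capture_output=True, text=True, check=True).stdout.strip()
    if before != after:
        # Declarative apply: kubectl converges the cluster toward the manifests.
        subprocess.run(["kubectl", "apply", "-f", MANIFEST_DIR, "--recursive"], check=True)

while True:
    sync_once()
    time.sleep(60)
```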
3. Monitoring and Error Handling
Daily error handling covers receiving user feedback, spotting abnormal system metrics, and using a variety of O&M means to identify, analyze, and resolve problems and outages. Observability emphasizes that for a single service execution, logs, metrics, and tracing information can be obtained from multiple distributed services, containers, virtual hosts, networks, and BaaS services, which improves monitoring capability and error-handling efficiency. With cloud-native technology, O&M personnel no longer have to collect and correlate information from many distributed nodes by hand; tools such as Prometheus and Grafana perform correlation analysis, alerting, and visualization of multi-dimensional information.
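For instance, application metrics can be exposed for Prometheus to scrape and for Grafana to visualize; the minimal sketch below uses the Prometheus Python client, and the metric names are illustrative.

```python
# Minimal sketch: expose request metrics for Prometheus to scrape;
# Grafana can then chart them and drive alerts. Metric names are illustrative.
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests", ["status"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

@LATENCY.time()
def handle_request():
    time.sleep(random.uniform(0.01, 0.2))            # simulated work
    status = "500" if random.random() < 0.02 else "200"
    REQUESTS.labels(status=status).inc()

if __name__ == "__main__":
    start_http_server(8000)                          # metrics served at :8000/metrics
    while True:
        handle_request()
```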
4. SLA Management
Once metric information is available, SLA management can be applied to business services and PaaS components using the dependencies derived from call relationships, and then extended to global services and IT assets. Without infrastructure and capabilities such as Service Mesh and observability, a traditional monitoring system can only try to extract metrics from logs of varying formats; if the software does not print a metric, the monitoring system cannot obtain it. Likewise, without full-chain dependency information, SLA management cannot correlate upstream and downstream, so the system cannot immediately tell whether a service or component meets its Service Level Objective (SLO). The cloud-native system solves these problems and helps O&M personnel raise the system's SLA management level.
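As a simplified illustration of SLO tracking, the sketch below computes availability over a window from request counts and reports how much of the error budget remains; the 99.9% target and the request counts are purely illustrative, and in practice these numbers would come from the observability system.

```python
# Minimal sketch of SLO tracking: compute availability from request counts
# over a window and report the remaining error budget. Numbers are illustrative.
def slo_report(total_requests, failed_requests, slo=0.999):
    availability = 1 - failed_requests / total_requests
    allowed_failures = (1 - slo) * total_requests      # total error budget
    budget_remaining = 1 - failed_requests / allowed_failures
    return {
        "availability": round(availability, 5),
        "slo_met": availability >= slo,
        "error_budget_remaining": round(budget_remaining, 3),
    }

print(slo_report(total_requests=1_200_000, failed_requests=800))
# {'availability': 0.99933, 'slo_met': True, 'error_budget_remaining': 0.333}
```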
5. AIOps
AIOps uses machine learning and AI to analyze and prevent faults proactively and to speed up troubleshooting during O&M. Once observability is in place across a large number of business services and technical components, the system generates large volumes of log, metric, and tracing data. By analyzing this data with real-time machine learning and AI, AIOps can assist with operations such as anomaly detection before and after changes, correlation analysis of multiple events and elimination of false-positive alerts, root cause analysis, automatic removal of abnormal nodes, and emergency recovery.
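As a toy illustration of the statistical building blocks behind AIOps, the sketch below flags metric points that deviate strongly from a rolling baseline; production AIOps systems use far richer models, features, and feedback loops.

```python
# Toy anomaly-detection sketch for a single metric stream: flag points that
# deviate more than 3 standard deviations from a rolling baseline.
from statistics import mean, stdev

def detect_anomalies(series, window=20, threshold=3.0):
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            anomalies.append((i, series[i]))
    return anomalies

# Illustrative latency samples (ms) with one obvious spike injected.
latency = [12, 11, 13, 12, 14, 12, 13, 11, 12, 13] * 3 + [95] + [12, 13, 11]
print(detect_anomalies(latency))   # expected to flag the 95 ms spike
```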
As key roles in the software delivery process, software delivery engineers and system integration engineers will also see their ways of working change as cloud-native software is adopted.
1. Standardized Delivery
One of the biggest difficulties in delivery is that different customers have different IaaS environments, including different server or virtual host technologies, network environments, storage products, operating systems, and basic software libraries. Different IaaS environments lead to different versions of the delivered software and to changes at different delivery stages, which increases the complexity of delivery management. Containers and immutable infrastructure can mask these IaaS differences: when the container's operating environment changes, a new image is built for the new configuration version instead of modifying or updating the environment in place (in-place modification can lose version configurations or make it difficult to manage configurations of different versions). This standardizes the delivery process and isolates upper-layer application configuration from frequent changes at the IaaS layer, improving software delivery efficiency.
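As a small illustration of the immutable-infrastructure idea, the sketch below builds a new, uniquely tagged container image for every configuration change instead of modifying a running environment in place. The registry, image name, and config file are hypothetical, and the `docker` CLI plus a local Dockerfile are assumed.

```python
# Minimal sketch of immutable delivery: every configuration change yields a new,
# uniquely tagged image instead of an in-place modification of a running host.
# Assumes the `docker` CLI and a Dockerfile in the working directory;
# the image name and registry are hypothetical.
import hashlib
import pathlib
import subprocess

IMAGE = "registry.example.com/billing-service"

def build_versioned_image(config_file="app-config.yaml"):
    # The tag encodes the configuration content, so every change is traceable
    # and older versions remain available for rollback.
    digest = hashlib.sha256(pathlib.Path(config_file).read_bytes()).hexdigest()[:12]
    tag = f"{IMAGE}:cfg-{digest}"
    subprocess.run(["docker", "build", "-t", tag, "."], check=True)
    subprocess.run(["docker", "push", tag], check=True)
    return tag

print(build_versioned_image())
```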
2. Automated Delivery
Another difficulty in software integration and delivery is the need to provide configuration, installation, or deployment manuals so that the relevant personnel can handle the differences between a standard deployment and deployments in specific environments; the installation script can only play an auxiliary role because it does not capture the knowledge in the manual. The cloud-native Open Application Model (OAM) uses YAML files to describe, from the application's perspective and at the metadata level, the software's operating environment, composition, and O&M characteristics, as well as the final state of the deployment and the configuration changes it can adapt to, and scripts can read and act on these YAML files. At the same time, the deployment of the same software in typical scenarios can be standardized, open-sourced, and shared (for example, the deployment process of Redis on Alibaba Cloud ECS instances). This automates the delivery of common software and allows delivery experience in typical environments to be shared, raising the overall level of delivery.
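To make this concrete, the sketch below reads a simplified, hypothetical application description (inspired by, but not identical to, the actual OAM schema) and derives deployment steps from it; a real toolchain would translate such a description into platform resources. PyYAML is assumed to be installed.

```python
# Sketch: read a simplified, hypothetical application description (inspired by,
# but not identical to, the OAM schema) and derive deployment steps from it.
import yaml

APP_DESCRIPTION = """
name: order-service            # hypothetical application
components:
  - name: api
    image: registry.example.com/order-api:1.4.2
    replicas: 3
    traits:
      autoscaling: {min: 3, max: 10, cpuTarget: 70}
  - name: cache
    image: redis:6.2
    replicas: 1
"""

def plan_deployment(description):
    app = yaml.safe_load(description)
    for comp in app["components"]:
        print(f"deploy {comp['name']}: image={comp['image']}, replicas={comp['replicas']}")
        for trait, spec in comp.get("traits", {}).items():
            print(f"  apply trait {trait}: {spec}")

plan_deployment(APP_DESCRIPTION)
```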
3. Cloud Delivery and Cloud Integration
Cloud computing provides both a new place for software to run and a new form of delivery, and it also serves as a Proof of Concept (POC) environment for software delivery. The integration of software with the cloud has become a new integration model and given rise to new Cloud System Integration (CSI) practices: the system is first integrated with software deployed on the public cloud, and the public cloud environment is then replicated to the private cloud using cloud-native delivery tools. This reduces integration and delivery costs while simplifying integration complexity.
4. Continuous Delivery
Continuous software delivery is an essential part of the DevOps process. With small, frequent deliveries, DevOps makes software delivery more automated and version-oriented, and upgrades and rollbacks can be repeated automatically. Continuous delivery ensures that the software always has a latest, usable version: whenever the code or configuration changes, a new version is produced and its availability is verified, which improves software delivery efficiency.
5. Common Toolchain and Knowledge System
The cloud-native technology system is open source and comes with widely used open-source components and an open body of knowledge. With these, software integration engineers and software delivery engineers can quickly learn the latest cloud-native technologies, assemble the most suitable cloud-native toolchain, and verify it in their own environments. Enterprises can also acquire the basic technical knowledge of the products they use through public internet channels, which reduces training costs in the software delivery process to some extent.
Database Administrators (DBAs) play an important role in both traditional commercial database and open-source database systems; they are key to ensuring the stability of the entire software system. The development of cloud-native technologies and products has had a profound impact on DBAs, and their way of working is undergoing a dramatic transformation. Their focus is gradually shifting from building the underlying systems to designing business system architectures, from basic stability to business structure optimization, and from operating database software to making good use of cloud-native products. Meanwhile, enterprise requirements regarding O&M objects, O&M platforms, and technical capabilities have also changed significantly.
1. O&M Object
Database as a Service (DaaS) was once only a dream, but the continuous evolution of cloud-native architecture has made it a reality. Cloud databases provide out-of-the-box PaaS services and a wide range of cloud-native database products, built on pooled compute and storage resources, which allows DBAs to operate database services rather than hosts, networks, and individual databases. DBAs no longer need to worry about delivering resources from Internet Data Centers (IDCs) down to the host level; the cloud platform takes care of these basic services and, by leveraging supply-chain scale and virtualization technology, provides high-quality services at a much lower cost than self-built IDCs. In the cloud computing era, DBAs can rely on the platform's IaaS capabilities to offload the O&M of basic resources, pay more attention to how database services support the business, and treat database services as their O&M objects.
2. O&M Platform
In the era of commercial databases, the basic job of a DBA was to make good use of a single database product and build an O&M platform around it to provide data security, high service availability, backup and recovery, performance monitoring, problem diagnosis, and other basic functions. Even in the open-source database era, the DBAs of most companies have built these capabilities themselves or customized them on top of open-source O&M components, which consumes significant human and material resources and makes sustained O&M capability hard to maintain; if core O&M personnel leave, the enterprise may struggle to keep the platform running. The database PaaS platform, built on cloud-native architecture, provides rich O&M capabilities out of the box, so DBAs no longer need to build O&M platforms from scratch. They can shift from operating basic components to operating database services, develop customized business-support capabilities on top of the various OpenAPIs provided by the cloud platform, and focus on delivering stable database services to the business through the O&M platform. As the basic capabilities of the cloud platform keep improving, new technologies delivered through the OpenAPI system continuously strengthen the service-oriented O&M platform. The platform advantages of cloud-native architecture can therefore be fully realized only when the goals of O&M platform construction are transformed accordingly.
3. Technical Capabilities
The rich cloud services of the cloud-native era bring technological and architectural advantages and free traditional DBAs from many fundamental concerns. Enterprises now need architects who can design business data architectures on top of cloud services rather than purely operational DBAs, so DBAs need to adapt as soon as possible. In a cloud-native architecture, many problems that once consumed much of a DBA's energy become easy to solve. A typical example is data security, which has always been a DBA's top priority, absorbing effort on disk disaster tolerance, data center disaster tolerance, data backup, and other protection work; the multi-AZ and distributed storage architectures of the cloud-native era have natural advantages here. Another example is capacity planning, which is always difficult to get right: when the business pattern changes, such as during big promotions, capacity easily falls short. Cloud-native systems use resource pooling and the elasticity of the storage/compute separation architecture to cut scaling time dramatically, from days to seconds, and shared storage allows read nodes to be scaled out in seconds to expand the system's read capacity. We believe that, with breakthroughs in CPU pooling, memory pooling, and multi-point write technologies, database capacity elasticity will become even more powerful in the near future.
In addition, SQL optimization has always been an important part of a DBA's daily work, and guiding business developers to write SQL suited to the characteristics of the database takes up much of their time. In the cloud-native era, automatic optimization systems based on machine learning and expert experience give databases self-detection, self-repair, self-optimization, self-O&M, and self-security capabilities, helping DBAs simplify database management and eliminate service failures caused by manual operations, thereby keeping database services stable, secure, and efficient.
In the cloud-native era, cloud services have freed DBAs from much of this work, and DBAs in turn are expected to grow into database architects as soon as possible so they can be more deeply involved in the architecture design of business systems and help developers make full use of the features of cloud databases.
Cloud-native technologies affect the daily work of technical personnel in terms of business processes, technology selection, and technology stacks, and their impact extends well beyond these aspects. In an environment where cloud-native has become an inevitable trend, technical practitioners must learn and adopt cloud-native concepts and technologies so that cloud-native products can be used to realize the value of cloud computing and support the development of the business.