ChaosBlade-Box is Alibaba Cloud's open-source, cloud-native Chaos Engineering console for multiple clusters, languages, and environments. Its main features include a unified user interface for chaos experiments, deployment of Chaos Engineering tools (such as ChaosBlade and LitmusChaos), experiment scenario management, and multi-dimensional experiments.
ChaosBlade is Alibaba Cloud's easy-to-use, efficient, open-source Chaos Engineering experiment tool, which conforms to the chaos experiment model. It supports multiple platforms and languages and offers more than 200 drill scenarios with over 3,000 parameters. The scenarios span every layer of the stack: host system -> container -> Kubernetes cluster -> common components (Elasticsearch, Redis, and MySQL) -> application (Java, Golang, C++, and Node.js).
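As a rough illustration of how one drill scenario maps to a command, the sketch below models an experiment as a target, an action, and a set of flags, then renders it in the `blade create <target> <action>` form used by the ChaosBlade CLI. The `Experiment` class and its field names are hypothetical helpers for this example, not part of ChaosBlade, and the flags available depend on the installed version (check `blade create -h`).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Experiment:
    """Hypothetical helper: one drill scenario expressed as target + action + flags."""
    target: str                                  # what to disturb, e.g. "cpu"
    action: str                                  # how to disturb it, e.g. "fullload"
    flags: Dict[str, object] = field(default_factory=dict)

    def to_argv(self) -> List[str]:
        """Render the experiment as an argument list for the blade CLI."""
        argv = ["blade", "create", self.target, self.action]
        for name, value in self.flags.items():
            argv += [f"--{name}", str(value)]
        return argv


# A host-level CPU drill that is expected to stop on its own after 60 seconds.
cpu_drill = Experiment("cpu", "fullload", {"timeout": 60})
print(" ".join(cpu_drill.to_argv()))
```

The argument list could be passed to `subprocess.run` on a machine where `blade` is installed; it is only printed here so the example has no side effects.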
Chaos Engineering is a discipline that conducts experiments on distributed systems. By actively injecting faults, it helps detect weak points in a system in advance, drives architectural improvements, and ultimately achieves business resilience.
Since 2021, major enterprises have paid attention to and invested in the research and development of Chaos Engineering. ChaosBlade (an open-source Chaos Engineering tool from Alibaba) has officially become a CNCF Sandbox project. A new version of the ChaosBlade-Box was released to help users of open-source projects implement Chaos Engineering better. The following sections describe the features of the new version.
ChaosBlade-Box aims to build a unified Chaos Engineering operation platform. Since its release, it has received extensive attention from the open-source community. There are also the following problems.
In view of the problems above, the ChaosBlade-Box and Agent were revised significantly to integrate the Community Edition with the kernel of the Enterprise Edition, unify user operation habits, upgrade the system architecture of the Community Edition, and enhance its component features.
The new ChaosBlade-Box console is a multi-cluster, multi-environment, and multi-language cloud-native Chaos Engineering platform. It is available in Chinese and English and supports global namespaces, so the same user can set up different global namespaces as needed, such as a test space, a sandbox space, and an online space. It offers automated tool deployment to simplify tool installation and improve execution efficiency. ChaosBlade-Box supports probe installation and drills in different environments, such as hosts and Kubernetes. In the Kubernetes environment, it supports drills on nodes, pods, and containers. Data about the pods in a cluster is collected automatically and managed in a unified manner in application management. This simplifies selecting drill targets: users do not need to log on to the cluster to look up the names of the pods or containers of the applications to be drilled. It also supports one-click migration to the Enterprise Edition and synchronizes drill data from the Community Edition to the Enterprise Edition as needed.
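To make the idea of unified application management more concrete, here is a minimal sketch of how pod data reported from a cluster could be grouped by application so that drill targets can be picked from a list. The record fields (`k8s_namespace`, `app`, `pod`) and the function name are assumptions made for illustration; the article does not describe the actual data format used by the console.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_pods_by_app(pod_records: List[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Group reported pods under (Kubernetes namespace, application name)."""
    apps = defaultdict(list)
    for record in pod_records:
        key = (record["k8s_namespace"], record["app"])
        apps[key].append(record["pod"])
    # Sort pod names so the listing is stable regardless of report order.
    return {key: sorted(pods) for key, pods in apps.items()}


# Hypothetical records, as a probe might report them.
reported = [
    {"k8s_namespace": "shop", "app": "cart", "pod": "cart-5c7d-b2"},
    {"k8s_namespace": "shop", "app": "cart", "pod": "cart-5c7d-a1"},
    {"k8s_namespace": "shop", "app": "order", "pod": "order-9f41-x9"},
]

for (namespace, app), pods in group_pods_by_app(reported).items():
    print(f"{namespace}/{app}: {', '.join(pods)}")
```

With a view like this, selecting "all pods of the cart application" does not require querying the cluster by hand.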
The following is the whole process of a drill on the new ChaosBlade-Box platform. Sequential execution and stage execution are supported. In sequential execution, multiple drill scenarios take effect one after another; in stage execution, multiple drill scenarios take effect at the same time. Several safety policies ensure that injected faults can be recovered, such as manual stop and automatic stop. Automatic stop is configured by setting the timeout parameter when the drill is configured. This way, even if the platform and the probe (agent) are disconnected and a manual stop cannot be performed, the fault is recovered automatically once the timeout period is reached.
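The two execution modes and the timeout-based automatic stop can be sketched as follows. This is a simplified model under stated assumptions, not the platform's implementation: the `Scenario` class, the `run` function, and the mode names are hypothetical, "at the same time" is approximated with a thread pool, and time is passed in explicitly so the recovery logic can be checked without waiting.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Scenario:
    name: str
    timeout: float                       # seconds until the automatic stop
    injected_at: Optional[float] = None  # set when the fault is injected
    stopped_at: Optional[float] = None   # set only by a manual stop

    def is_active(self, now: float) -> bool:
        """Active until a manual stop or the timeout, whichever comes first."""
        if self.injected_at is None or now < self.injected_at:
            return False
        if self.stopped_at is not None and now >= self.stopped_at:
            return False
        return now < self.injected_at + self.timeout


def run(scenarios: List[Scenario], mode: str,
        inject: Callable[[Scenario], None]) -> None:
    """Inject scenarios one after another, or all together."""
    if mode == "sequential":
        for scenario in scenarios:
            inject(scenario)
    elif mode == "stage":
        with ThreadPoolExecutor(max_workers=max(1, len(scenarios))) as pool:
            list(pool.map(inject, scenarios))  # list() waits for all injections
    else:
        raise ValueError(f"unknown mode: {mode}")


log: List[str] = []


def fake_inject(scenario: Scenario) -> None:
    """Stand-in for real fault injection: records the call and the start time."""
    scenario.injected_at = 0.0
    log.append(scenario.name)


drills = [Scenario("cpu-load", timeout=60), Scenario("network-delay", timeout=30)]
run(drills, "sequential", fake_inject)
```

Because `is_active` depends only on the injection time and the timeout, the fault ends at the deadline even when no manual stop ever arrives, which is the property the timeout parameter is meant to provide.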
The following figure illustrates the system architecture of the new ChaosBlade-Box and the features of its components:
The console of the new Community Edition has the following features:
1) Internationalization
It supports switching between Chinese and English.
2) Namespace Switch
It supports global space switching, allowing the same user to set different global namespaces according to their needs, such as test space, sandbox space, and online space.
3) Smoother Drill Orchestration
Drill orchestration is consistent with the drill process orchestration of the Enterprise Edition. It also supports running multiple fault drills in parallel or in series.
4) Improved Application Management
It provides more comprehensive application management for applications deployed on hosts and in Kubernetes, including application overview, machine list, drill records, and application configuration.
5) Seamless Migration
Its operation interface is consistent with that of the Enterprise Edition, and it provides a one-click migration feature. The migration automatically replaces the probe with the public cloud one, registers the drill machines with the Enterprise Edition, and synchronizes the drill data to the Enterprise Edition. As a result, users can switch to the Enterprise Edition easily.
6) Safety
It offers multiple fault recovery strategies to ensure that the issued drills can be recovered.
7) Multi-Environment Deployment
It supports deployment methods in different environments, including hosts, Docker, and Kubernetes.
8) Hierarchical Drill Scenarios
Drill scenarios are organized into categories. When you create a drill, the categories shown are updated in real time according to the selected drill target.
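The hierarchical display of scenarios can be pictured as a lookup keyed by drill target, as in the sketch below. The catalog contents, category names, and the `scenarios_for` function are invented for illustration and do not reflect the actual scenario list shipped with ChaosBlade-Box.

```python
from typing import Dict, List

# Hypothetical catalog: drill target -> category -> scenario names.
CATALOG: Dict[str, Dict[str, List[str]]] = {
    "host": {
        "CPU": ["cpu full load"],
        "Network": ["network delay", "network loss"],
    },
    "pod": {
        "Network": ["pod network delay"],
        "Lifecycle": ["pod delete"],
    },
}


def scenarios_for(target: str) -> Dict[str, List[str]]:
    """Return only the categories that apply to the selected drill target."""
    return CATALOG.get(target, {})


# Selecting "pod" as the drill target narrows the list to pod scenarios.
for category, names in scenarios_for("pod").items():
    print(f"{category}: {', '.join(names)}")
```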
The new probe (agent) has more features:
1) Support for Drill Channels for Different Environments
It serves as the channel for delivering drill commands in different environments. This removes a step required by the old version, in which the kubeconfig of the cluster had to be specified to perform a drill in the Kubernetes environment.
2) A More Complete API
It unifies the agent's external API to facilitate extension and integration.
3) Automatic Data Collection and Reporting
In the Kubernetes environment, a new server is added that reports Kubernetes-related data to the console. This allows users to select drill targets in the Kubernetes environment directly from the console.
4) Automatic Uninstallation of the Probe
An automatic probe uninstallation interface is added so that probe installation and uninstallation can be controlled directly from the console.
5) Keep Alive
A probe script is added to keep the probe process alive.
6) Multi-Environment Deployment
It supports deployment in different environments, including hosts, Docker, and Kubernetes.
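The keep-alive feature listed above can be sketched as a small watchdog loop. The article does not show the actual probe script, so this is only an assumed shape: `keep_alive`, `is_running`, and `restart` are hypothetical names, and the liveness check and restart action are passed in as callables so the logic can be exercised without a real process.

```python
import time
from typing import Callable


def keep_alive(is_running: Callable[[], bool], restart: Callable[[], None],
               checks: int, interval: float = 0.0) -> int:
    """Check liveness `checks` times and restart whenever the process is down.

    Returns the number of restarts that were triggered.
    """
    restarts = 0
    for _ in range(checks):
        if not is_running():
            restart()
            restarts += 1
        time.sleep(interval)
    return restarts


# Simulated probe process that starts out dead and stays up once restarted.
state = {"alive": False}


def fake_restart() -> None:
    state["alive"] = True


restart_count = keep_alive(lambda: state["alive"], fake_restart, checks=3)
print(f"restarts triggered: {restart_count}")
```

In a real deployment, `is_running` would inspect the probe process and `restart` would launch it again, with a non-zero `interval` between checks.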
Chaos Engineering is an approach to ensuring high system availability. ChaosBlade has been widely used as China's first open-source Chaos Engineering tool. Alibaba open-sourced ChaosBlade-Box (the Chaos Engineering console) in 2021 to help the community implement Chaos Engineering, manage different open-source fault injection tools, and build a unified Chaos Engineering operation platform. The new version brings many improvements to both the user interface and the underlying features, making Chaos Engineering easier to adopt and practice.
[1] chaosblade-box: https://github.com/chaosblade-io/chaosblade-box
[2] chaosblade-box-agent: https://github.com/chaosblade-io/chaosblade-box-agent
[3] chaosblade: https://github.com/chaosblade-io/chaosblade