By Zunfei Liu
As a configuration center, Nacos often stores sensitive information. Misuse of Nacos can lead to security risks. The two most common problems are:
Check if you have these two types of problems. If you do, solve them immediately. After resolving these basic issues, let's explore the advanced security features.
With the rise of cloud computing, the Internet of Things (IoT), big data, and AI, the modern system network deployment architecture has become more complex. The traditional network boundary is becoming increasingly blurred, which has a significant impact on traditional network security architecture. Security risks have a greater impact on enterprise development. In this context, the concept of zero-trust security has emerged and is guiding modern network security architecture. This article will introduce how to ensure Nacos data security based on the concept of zero-trust security.
The content shared today will be divided into three parts. The first part will briefly introduce Nacos and its common usage scenarios. The second part will analyze possible security risks based on Nacos runtime principles. The lack of necessary security protection for Nacos is also a cause of Nacos data leakage. The last part will share best practices for Nacos under zero-trust security and how to make Nacos run more securely in production environments.
What is Nacos? Its name is an acronym derived from Dynamic Naming and Configuration Service, and its core functions include dynamic service discovery management, dynamic configuration management, and dynamic DNS service. Dynamic service management can be combined with upper-layer microservice RPC frameworks, such as Spring Cloud and Dubbo, so that traffic is automatically adjusted or redirected when a node goes online or offline. Dynamic configuration management makes it possible to change the runtime behavior of business applications without restarting business nodes. Nacos can also be combined with CoreDNS to export services registered on Nacos as DNS domain names, which implements a dynamic DNS service.
The figure above lists some common usage scenarios of Nacos, covering fields like microservice, high availability, frontend ecology, and database. From a product functionality perspective, service discovery and registration will focus more on scenarios involving the mapping of service names or domain names to IP addresses. On the other hand, configuration management will cover a wider range of scenarios, including critical ones such as traffic scheduling, routing rules, emergency plans, key business gates, and database data source configurations. The sensitivity of data is crucial, as any miswriting or leakage of password data can lead to significant data security risks.
The above diagram illustrates the Nacos runtime. On the left side are business applications and nacos-client, which are typically deployed in the same process. On the right side is the Nacos server side, comprising Nacos-Server and centralized persistent storage. From a data flow perspective, the business application calls the nacos-client's configuration publishing interface, which transmits data to the Nacos Server over the network. The Nacos Server then persistently stores the configuration content and loads it into the local disk cache and memory cache of the server. When the client queries the configuration, the configuration content is pulled from the server to the client. For high availability and disaster recovery, the client caches the configuration content on its local disk. From a data perspective, the configuration content is stored in the local disk cache of the business application, the local disk cache of the server, and the persistent database, and it passes through each intermediate network device during transmission.
Now, let's assume that we ignore the secure intranet and that all these components are exposed to the Internet. Based on this assumption, several issues may arise.
Since Nacos configuration data is cached on the client, if a business application server is compromised, the configuration content in the local cache will be leaked.
Data may be captured through traffic sniffing as it passes through intermediate network devices. If data is transmitted in plaintext, it will be leaked.
If the machine hosting the Nacos server is compromised, the content cached in the local disk will be obtained in plaintext.
Configuration leakage can also occur due to a data breach of the centralized database for configuration storage.
If the Nacos server does not have policy-based access control enabled, or if only a broad ACL rule is set for the Nacos server node, anyone who knows the server node's IP address can pull configuration content through the Nacos API. These scenarios are common when users deploy Nacos. For example, in commercial MSE Nacos, many users have enabled Internet access for their instances and set broad ACL rules without enabling authentication, which creates high-risk scenarios.
The issues introduced above can be summarized under three aspects: storage security, transmission security, and access control.
We have made several assumptions before, such as ignoring the problem of secure intranet isolation, and assuming that every environment may be breached. In fact, this is a very important principle in the concept of zero-trust security, "never trust, always verify". In the next section, we will introduce the zero-trust security practice of Nacos and how to ensure Nacos data security through zero-trust security.
Let's first briefly understand the relevant content of zero-trust security.
As the name implies, no unauthenticated access entity should be trusted by default. Zero trust is a modern network security access concept, and its core idea is "never trust, always verify."
The emergence of zero trust is driven by a series of realistic factors under the current IT network architecture.
Several basic features of Zero Trust include:
Next, I will introduce the zero-trust security practices of Nacos from three aspects: Nacos transmission security, storage security, and access control.
The first aspect is Nacos transmission security using TLS. We know that TLS solves three problems in the data transmission process. First, it ensures data confidentiality by encrypting data, making it impossible for a third party to obtain plaintext data. Second, it guarantees integrity by preventing data tampering by a third party. Third, it solves the identity authentication problem between communicating parties, preventing man-in-the-middle attacks.
The TLS handshake process can be broken down into several main stages, which are partially simplified here. The first stage involves confirming the TLS protocol version, encryption algorithm, compression method, and other basic information between the client and server. The second stage exchanges and verifies the certificates of both parties, ensuring identity authentication. The third stage negotiates the symmetric key used in the subsequent actual message transmission process through asymmetric encryption. In the fourth stage, both parties send an encrypted switching message, and communication begins based on the symmetric encryption key. The fifth stage involves actual application message transmission, using symmetric encryption to transmit data and performing MAC verification to ensure integrity.
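To make the first handshake stage more concrete, the following minimal sketch lists the protocol versions and cipher suites that the local JVM could offer during negotiation. It uses only the standard `javax.net.ssl` API and is independent of Nacos; the class name `TlsNegotiationInputs` is a hypothetical name chosen for illustration, and the output depends on the JDK in use.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shows what this JVM can bring to the first handshake stage.
final class TlsNegotiationInputs {

    // Protocol versions supported by the default SSLContext of this JVM.
    static List<String> supportedProtocols() throws NoSuchAlgorithmException {
        SSLParameters params = SSLContext.getDefault().getSupportedSSLParameters();
        return Arrays.asList(params.getProtocols());
    }

    // Cipher suites supported by the default SSLContext of this JVM.
    static List<String> supportedCipherSuites() throws NoSuchAlgorithmException {
        SSLParameters params = SSLContext.getDefault().getSupportedSSLParameters();
        return Arrays.asList(params.getCipherSuites());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Protocols: " + supportedProtocols());
        System.out.println("Cipher suite count: " + supportedCipherSuites().size());
    }
}
```

The client and server each hold such a list, and the handshake settles on a version and suite that both sides support.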
Nacos uses gRPC as the underlying communication protocol in version 2.x. gRPC uses Netty as the network communication framework. The implementation of Nacos's TLS is also based on gRPC/Netty. The figure describes the basic components of the client and the server. The part in light blue shows how users can control whether to enable TLS through parameters, including property files, JVM parameters, and environment variables. The part in dark blue is the components provided by the Nacos layer to accept parameters and convert the parameters to the underlying gRPC and Netty components to implement TLS functions. In addition, the server supports the dynamic rotation function of the server certificate, and users can customize the SPI extension to implement the processing logic when the server certificate file changes.
Here, we will explain how to activate TLS encryption for transmission in Nacos. The entire process involves three steps:
● Prepare a Certificate
You can purchase a commercial certificate to obtain the relevant files. For development and testing, you can use keytool or openssl to generate a self-signed SSL certificate. In this step, you need the following information:
● Start the Nacos Server
nacos.remote.server.rpc.tls.enable=true: Enable TLS. If you set this parameter to true, TLS is enabled on the server.
nacos.remote.server.rpc.tls.certChainFile={certFilePath}: Specify the certificate file path.
nacos.remote.server.rpc.tls.certPrivateKey={keyPath}: Specify the certificate private key file.
nacos.remote.server.rpc.tls.mutualAuth=true/false: Determine whether to enable mutual authentication. Default value: false. If you set this parameter to true, the identity of the client is verified, and the client must also configure a certificate and private key file.
nacos.remote.server.rpc.tls.trustCollectionChainPath={trustFilePath}: Specify the trusted client CA certificate, which is used to verify the validity of the client certificate when mutual authentication is enabled.
nacos.remote.server.rpc.tls.compatibility=true/false: Determine whether to support non-encrypted clients. Default value: true.
nacos.remote.server.rpc.tls.sslContextRefresher={spiName}: Specify the SPI name of the certificate rotation sensor.
● Start the Nacos Client
nacos.remote.client.rpc.tls.enable=true: Enable TLS on the client. If you set this parameter to true but the server does not support TLS or has not enabled it, the connection fails.
nacos.remote.client.rpc.tls.trustAll=true/false: Determine whether to trust all servers that support TLS. If you set this parameter to true, CA verification is not performed on the server certificate.
nacos.remote.client.rpc.tls.trustCollectionChainPath={trustFilePath}: If trustAll is false, you need to set the trusted server CA certificate file.
nacos.remote.client.rpc.tls.mutualAuth=true/false: Determine whether to enable mutual authentication. If you set this parameter to true, you also need to configure the certificate file and private key of the client.
nacos.remote.client.rpc.tls.certChainFile={certFilePath}: Specify the path of the client certificate file.
nacos.remote.client.rpc.tls.certPrivateKey={keyPath}: Specify the private key file of the client certificate.
Configuration storage security addresses the risk that, when any medium that may contain configuration content is breached, the configuration can be read in plaintext. To prevent this, the configuration content must be stored in encrypted form. However, encrypted storage requires a third-party encryption system for assistance, which increases complexity. Generally, we recommend that you encrypt only sensitive configurations. Next, we will introduce how to store sensitive configurations by using the configuration encryption and decryption plugins.
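Before moving on to encrypted storage, note that the client-side TLS parameters depend on each other: trustAll=false needs a trusted CA file, and mutualAuth=true needs a client certificate and private key. The following is a minimal pre-flight sketch that checks these dependencies on a `java.util.Properties` object. The class `ClientTlsPreflight` and its `check` method are hypothetical helpers written for this article, not part of nacos-client; only the property names come from the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of nacos-client: checks that the client TLS
// properties described above are consistent with each other.
final class ClientTlsPreflight {
    static final String PREFIX = "nacos.remote.client.rpc.tls.";

    // Returns human-readable findings; an empty list means no inconsistency was found.
    static List<String> check(Properties p) {
        List<String> findings = new ArrayList<>();
        if (!flag(p, "enable")) {
            return findings; // TLS is off on the client, so there is nothing to check
        }
        if (flag(p, "trustAll")) {
            findings.add("trustAll=true: the server certificate is not verified against a CA");
        } else if (missing(p, "trustCollectionChainPath")) {
            findings.add("trustAll=false requires trustCollectionChainPath");
        }
        if (flag(p, "mutualAuth")) {
            if (missing(p, "certChainFile")) {
                findings.add("mutualAuth=true requires certChainFile");
            }
            if (missing(p, "certPrivateKey")) {
                findings.add("mutualAuth=true requires certPrivateKey");
            }
        }
        return findings;
    }

    private static boolean flag(Properties p, String name) {
        return Boolean.parseBoolean(p.getProperty(PREFIX + name, "false"));
    }

    private static boolean missing(Properties p, String name) {
        String value = p.getProperty(PREFIX + name);
        return value == null || value.trim().isEmpty();
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(PREFIX + "enable", "true");
        p.setProperty(PREFIX + "mutualAuth", "true");
        check(p).forEach(System.out::println);
    }
}
```

The same check could be run against `System.getProperties()` when the parameters are passed as JVM parameters.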
The above figure shows the publishing process for an encrypted configuration. The plaintext content is generated in the business application and sent to the server through nacos-client. In the client, the configuration passes through the IConfigFilter filter, where the encryption and decryption plugin converts the plaintext into ciphertext and a dataKey. A common approach is to generate a random key locally and use it to encrypt the plaintext content into ciphertext. The random key itself is then encrypted by the encryption and decryption plugin, and the ciphertext configuration content and the encrypted dataKey are sent to the server. The server persistently stores the ciphertext and dataKey, and notifies all nodes of the cluster to load the ciphertext and dataKey from the database into the local disk cache and memory cache of the server. In this case, even if the server or the persistent database is breached, the attacker obtains only the ciphertext and the encrypted dataKey and cannot recover the plaintext from them, which reduces the security risk caused by data leakage.
The following is the process of querying the encrypted configuration. The client queries the configuration from the server, and the server returns the ciphertext and dataKey to the client. In the IConfigFilter filter, the encryption and decryption plugin uses the dataKey to decrypt the ciphertext into plaintext, and the client then returns the plaintext in memory to the service listener. The local cache of the client stores the ciphertext and dataKey instead of the plaintext, which ensures that the configuration storage in the entire process is secure.
Introducing a third-party encryption and decryption plugin into the process increases data security but also adds complexity. We recommend that you encrypt only sensitive information, such as usernames and passwords, database configurations, AK/SK, and tokens. At the same time, because the plugin performs the conversion between plaintext and ciphertext, the data transmitted during interaction with the plugin must also be secured, and if the plugin is stateful, its storage must be secured as well. In commercial MSE Nacos, we use KMS as the implementation of the third-party encryption and decryption plugin to ensure overall data security.
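The publish and query flows described above follow the general envelope-encryption pattern. The sketch below illustrates that pattern with the JDK's `javax.crypto` API only. It is not the Nacos plugin implementation: the class `EnvelopeEncryptionSketch`, its methods, and the in-memory master key (which stands in for a third-party key service) are all assumptions made for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Illustrative only: a local master key stands in for the third-party key service.
final class EnvelopeEncryptionSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_LEN = 12;    // bytes, usual nonce size for GCM
    private static final int TAG_BITS = 128; // authentication tag length

    // What the server would persist and cache: both fields are ciphertext.
    static final class Envelope {
        final byte[] encryptedDataKey;
        final byte[] cipherText;
        Envelope(byte[] encryptedDataKey, byte[] cipherText) {
            this.encryptedDataKey = encryptedDataKey;
            this.cipherText = cipherText;
        }
    }

    static SecretKey newAesKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // AES-GCM encrypt; the random IV is prepended to the output.
    static byte[] seal(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = cipher.doFinal(plain);
        byte[] out = Arrays.copyOf(iv, iv.length + body.length);
        System.arraycopy(body, 0, out, iv.length, body.length);
        return out;
    }

    // AES-GCM decrypt; expects the IV at the start of the input.
    static byte[] open(SecretKey key, byte[] sealed) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_LEN));
        return cipher.doFinal(sealed, IV_LEN, sealed.length - IV_LEN);
    }

    // Publish side: random data key encrypts the content, master key encrypts the data key.
    static Envelope publish(SecretKey masterKey, String plaintext) throws GeneralSecurityException {
        SecretKey dataKey = newAesKey();
        byte[] cipherText = seal(dataKey, plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] encryptedDataKey = seal(masterKey, dataKey.getEncoded());
        return new Envelope(encryptedDataKey, cipherText);
    }

    // Query side: recover the data key first, then the content.
    static String query(SecretKey masterKey, Envelope envelope) throws GeneralSecurityException {
        SecretKey dataKey = new SecretKeySpec(open(masterKey, envelope.encryptedDataKey), "AES");
        return new String(open(dataKey, envelope.cipherText), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        SecretKey masterKey = newAesKey();
        Envelope stored = publish(masterKey, "db.password=example");
        System.out.println(query(masterKey, stored));
    }
}
```

In this sketch, whoever holds only the `Envelope` (for example, a copy of the database or a disk cache) cannot decrypt it without the master key, which mirrors why the stored ciphertext and dataKey alone do not expose the plaintext.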
This part describes how to use the Nacos authentication plugin to implement the access control of Nacos.
The authentication plugin abstracts the basic model of the Nacos access control as the client and the server. They implement the overall access control function based on the agreed rules.
Client
Server
Identity information IdentityContext can be username and password, AK/SK, or AK/TOKEN rotated automatically by STS/RAM ROLE. If you want to customize your identity information, you only need to implement SPI as needed, or you can implement the logic of dynamic identity rotation in the custom implementation. Regardless of the identity information methods, the client and server must maintain consistent rules, including the logic for identity information extraction and signature verification.
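As an illustration of what "consistent rules" between client and server can look like, here is a minimal sketch of a generic HMAC-based request signature. It is an assumed scheme for demonstration, not the logic of any specific Nacos authentication plugin: the class `SignedRequestSketch`, the signed string format (resource, "+", timestamp), and the clock-skew check are all hypothetical.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;

// Hypothetical scheme: client and server share a secret key (SK) and agree
// on how the signed string is built from the resource and a timestamp.
final class SignedRequestSketch {

    // Client side: sign "resource+timestamp" with the secret key.
    static String sign(String secretKey, String resource, long timestampMillis)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        byte[] raw = mac.doFinal((resource + "+" + timestampMillis).getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(raw);
    }

    // Server side: reject stale requests, then recompute and compare the signature.
    static boolean verify(String secretKey, String resource, long timestampMillis,
                          String presentedSignature, long nowMillis, long maxSkewMillis)
            throws GeneralSecurityException {
        if (Math.abs(nowMillis - timestampMillis) > maxSkewMillis) {
            return false;
        }
        byte[] expected = sign(secretKey, resource, timestampMillis).getBytes(StandardCharsets.UTF_8);
        byte[] presented = presentedSignature.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, presented); // constant-time comparison
    }

    public static void main(String[] args) throws Exception {
        long now = System.currentTimeMillis();
        String signature = sign("demo-sk", "public+DEFAULT_GROUP", now);
        System.out.println(verify("demo-sk", "public+DEFAULT_GROUP", now, signature, now, 60_000L));
    }
}
```

If either side changes how the signed string is assembled, verification fails for every request, which is why extraction and verification logic must be kept in step on both ends.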
In addition to identity verification and signature verification, the server also needs to verify permissions of identities and access resources. The RBAC model is commonly used for user permission management. The full name of RBAC is role-based access control, which is currently used by the default authentication plugin of Nacos.
The RBAC model consists of three parts: permission, role, and user.
As mentioned above, the default authentication plugin of Nacos is also built based on the RBAC model and can complete basic permission control. To enable the default authentication function, you need to create three tables, users, roles, and permissions, to store the corresponding data.
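The relationship between users, roles, and permissions can be sketched in a few lines. The following in-memory model is an illustration of the RBAC idea only, not the schema or code of the default Nacos plugin: the class `RbacSketch`, the `Action` enum, and the choice of a namespace as the protected resource are assumptions.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative in-memory RBAC model: users get roles, roles get permissions.
final class RbacSketch {
    enum Action { READ, WRITE }

    // user -> roles
    private final Map<String, Set<String>> rolesByUser = new HashMap<>();
    // role -> (namespace -> allowed actions)
    private final Map<String, Map<String, Set<Action>>> grantsByRole = new HashMap<>();

    void bindRole(String user, String role) {
        rolesByUser.computeIfAbsent(user, k -> new HashSet<>()).add(role);
    }

    void grant(String role, String namespace, Action action) {
        grantsByRole.computeIfAbsent(role, k -> new HashMap<>())
                .computeIfAbsent(namespace, k -> EnumSet.noneOf(Action.class))
                .add(action);
    }

    // A user is allowed if any of the user's roles carries the permission.
    boolean isAllowed(String user, String namespace, Action action) {
        for (String role : rolesByUser.getOrDefault(user, Collections.emptySet())) {
            Set<Action> actions = grantsByRole
                    .getOrDefault(role, Collections.emptyMap())
                    .getOrDefault(namespace, Collections.emptySet());
            if (actions.contains(action)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RbacSketch rbac = new RbacSketch();
        rbac.bindRole("alice", "order-reader");
        rbac.grant("order-reader", "prod", Action.READ);
        System.out.println(rbac.isAllowed("alice", "prod", Action.READ));
        System.out.println(rbac.isAllowed("alice", "prod", Action.WRITE));
    }
}
```

Because permissions attach to roles rather than to users, revoking access for a group of users means changing one role's grants instead of editing each user.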
Three steps are required to enable Nacos access control:
1) Enable authentication on the server
2) Create users/roles/permissions
3) After users, roles, and permissions are created and authentication is enabled on the server, the console login verifies the username and password. When calling the interface through nacos-client, you also need to pass in the username and password: when building ConfigService and NamingService, set them in the properties argument. Otherwise, the interface returns the no-permission error code 403.
Note: The default open-source authentication plugin supports permission control at the granularity of the namespace. Commercial MSE Nacos implements access control based on Alibaba Cloud Resource Access Management (RAM) and supports fine-grained access control down to the serviceName level for services and the dataId level for configurations.
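A minimal sketch of the client side of this step is shown below. It only builds the `java.util.Properties` object so that it runs without any dependency; the class `AuthenticatedClientProperties` and the environment variable names are hypothetical. The wiring described in the comment assumes nacos-client is on the classpath.

```java
import java.util.Properties;

// Hypothetical helper: assembles the properties a Nacos client would need
// when authentication is enabled on the server.
final class AuthenticatedClientProperties {

    static Properties build(String serverAddr, String username, String password) {
        Properties properties = new Properties();
        properties.setProperty("serverAddr", serverAddr);
        properties.setProperty("username", username);
        properties.setProperty("password", password);
        return properties;
    }

    public static void main(String[] args) {
        // Assumed environment variable names; credentials should not be hard-coded.
        String user = System.getenv().getOrDefault("NACOS_USERNAME", "demo-user");
        String pass = System.getenv().getOrDefault("NACOS_PASSWORD", "demo-password");
        Properties properties = build("127.0.0.1:8848", user, pass);

        // With nacos-client on the classpath, this object would be passed to
        // NacosFactory.createConfigService(properties) or
        // NacosFactory.createNamingService(properties).
        System.out.println("serverAddr=" + properties.getProperty("serverAddr")
                + ", username=" + properties.getProperty("username"));
    }
}
```

Reading the credentials from the environment keeps them out of source control, in line with the earlier advice to treat usernames and passwords as sensitive configuration.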
In the past, we prioritized user experience when designing Nacos, often overlooking security concerns. However, over the past year, Nacos has made significant strides in security. We've updated versions to address security risks, removed default security parameters, and raised user awareness about security when using Nacos.
In addition to implementing business functions, security is a critical issue that every architect must focus on today. We've described how to implement transmission security, storage security, and access control to build a secure zero-trust architecture when using open-source Nacos. We recommend enabling access control by default in production environments, disabling anonymous access, implementing fine-grained permission control, and using TLS to ensure data transmission security. We can then use encryption and decryption plugins to encrypt highly sensitive information. In the future, we'll continue to improve security protection policies on open-source Nacos, introduce security-related settings in Quick Start, and add more tips in the open-source console to enhance user data security awareness.
Nacos is an open-source product. In Nacos 2.0, we reworked several capabilities as plugins. Users can customize these plugins to raise the security level of a self-built Nacos deployment, but this requires additional customization and development work.
Here, we recommend trying commercial MSE Nacos to obtain one-stop security protection capabilities. In commercial MSE Nacos, we provide enhanced security protection capabilities, including unified management and distribution of TLS certificates and automatic rotation, support for encryption and decryption solutions with KMS3.0, fine-grained authentication rules combined with RAM, and advanced security protection capabilities such as daily security inspections and risk management.