Zero-Trust Security Practice of Nacos

This article introduces how to ensure Nacos data security based on the concept of zero-trust security, including the common security risks involved.

By Zunfei Liu

As a configuration center, Nacos often stores sensitive information. Misuse of Nacos can lead to security risks. The two most common problems are:

  1. Can Nacos be exposed to the Internet? No. Nacos is a registry and configuration center, an internal system that should not be exposed to the Internet.
  2. Is it okay to enable Internet access without authentication? No, because it leaves the system vulnerable to unauthorized access and cyber attacks.

Check whether your deployment has either of these problems. If it does, fix it immediately. With these basics resolved, let's explore the more advanced security features.

Practical Background of Security Risk

With the rise of cloud computing, the Internet of Things (IoT), big data, and AI, the modern system network deployment architecture has become more complex. The traditional network boundary is becoming increasingly blurred, which has a significant impact on traditional network security architecture. Security risks have a greater impact on enterprise development. In this context, the concept of zero-trust security has emerged and is guiding modern network security architecture. This article will introduce how to ensure Nacos data security based on the concept of zero-trust security.

This article is divided into three parts. The first part briefly introduces Nacos and its common usage scenarios. The second part analyzes possible security risks based on how Nacos works at runtime, since a lack of necessary security protection is one cause of Nacos data leakage. The last part shares best practices for Nacos under zero-trust security and explains how to run Nacos more securely in production environments.

Development and Scenarios of Nacos

_1

What is Nacos? Its name is an acronym of Dynamic Naming and Configuration Service, and its core functions include dynamic service discovery management, dynamic configuration management, and dynamic DNS service. Dynamic service management can be combined with upper-layer microservice RPC frameworks, such as Spring Cloud and Dubbo, so that traffic is automatically adjusted or redirected when a node goes online or offline. Dynamic configuration management lets you change the runtime behavior of business applications without restarting business nodes. Nacos can also be combined with CoreDNS to export services registered on Nacos as DNS domain names, which implements dynamic DNS services.

_2

The figure above lists some common usage scenarios of Nacos, covering fields such as microservices, high availability, the frontend ecosystem, and databases. From a product functionality perspective, service discovery and registration focus on scenarios that map service names or domain names to IP addresses. Configuration management covers a wider range of scenarios, including critical ones such as traffic scheduling, routing rules, emergency plans, key business gates, and database data source configurations. Much of this data is sensitive: any miswriting or leakage, for example of password data, can lead to significant data security risks.

Security Risks Faced by Nacos

_3

The above diagram illustrates the Nacos runtime. On the left side are business applications and nacos-client, which are typically deployed in the same process. On the right side is the Nacos server, comprising Nacos-Server and centralized persistent storage. From a data flow perspective, the business application calls the nacos-client's configuration publishing interface, which transmits data to the Nacos Server through the network. The Nacos Server then persistently stores the configuration content and loads it into the local disk cache and memory cache of the server. When the client queries the configuration, the configuration content is pulled from the server to the client. For high availability and disaster recovery, the client also caches the configuration content on its local disk. As a result, the configuration content is stored in the local disk cache of the business application, the local disk cache of the server, and the persistent database, and it passes through each intermediate network device during transmission.

Now, let's assume that we ignore the secure intranet and that all these components are exposed to the Internet. Based on this assumption, several issues may arise.

  • Business Application Machines are Compromised

Since Nacos configuration data is cached on the client, if a business application server is compromised, the configuration content in the local cache will be leaked.

  • Data is Intercepted by Intermediate Network Devices During Transmission

Traffic may be captured as it passes through intermediate network devices. If data is transmitted in plaintext, it will be leaked.

  • Nacos Server is Compromised

If the machine hosting the Nacos server is compromised, the content cached on its local disk can be read in plaintext.

  • Persistence Layer Database is Breached

Configuration leakage can also occur due to a data breach of the centralized database for configuration storage.

  • Nacos Server is Publicly Accessible

If the Nacos server does not have access control enabled, or a broad ACL rule is set for the Nacos server node, anyone who knows the server node's IP address can pull configuration content through the Nacos API. These situations are common when users deploy Nacos. For example, in commercial MSE Nacos, many users have enabled Internet access for their instances and set broad ACL rules without enabling authentication, which creates high-risk scenarios.

_4

The issues introduced above can be summarized in three aspects: storage security, transmission security, and access control.

Earlier, we made several assumptions, such as ignoring secure intranet isolation and assuming that every environment may be breached. In fact, this reflects a very important principle of zero-trust security: "never trust, always verify". In the next section, we will introduce the zero-trust security practice of Nacos and how to ensure Nacos data security through zero-trust security.

Zero-trust Security Practice of Nacos

Let's first briefly review the basics of zero-trust security.

Definition of Zero Trust

As the name implies, no unauthenticated access entity should be trusted by default. Zero trust is a modern network security access concept, and its core idea is "never trust, always verify."

Problems that Zero Trust Addresses

The emergence of zero trust is driven by a series of realistic factors under the current IT network architecture.

  • Blurred network boundaries: Modern system network architectures are more intricate and sophisticated. The rise of cloud computing and IoT has broken the traditional secure intranet boundaries.
  • Security breaches are unavoidable: Cyber attacks and data leakage incidents are increasing, and traditional security models cannot address them.
  • Risk of data leakage: Data leakage causes direct economic loss to the enterprise and damages its public reputation.
  • Compliance requirements: Industry standards and regulations require enterprises to verify network security more strictly and in real time.

Basic Functions of Zero Trust

Several basic features of Zero Trust include:

  • Identity management: provides full lifecycle management of identities.
  • Identity authentication: authenticates the identities of all access sources.
  • Access control: ensures authorization of resources accessed by network entities.
  • Transmission security: ensures that data transmitted in the network cannot be stolen or tampered with.
  • Behavior monitoring: continuously monitors behaviors and responds to exceptions in real time.

Basic Principle of Zero Trust

  • Never trust, always verify
  • Access control, and least privilege access
  • Micro-segmentation to prevent lateral movement and reduce the attack surface

Transmission Security of Nacos

Next, I will introduce the zero-trust security practices of Nacos from three aspects: Nacos transmission security, storage security, and access control.

_5

The first aspect is Nacos transmission security using TLS. We know that TLS solves three problems in the data transmission process. First, it ensures data confidentiality by encrypting data, making it impossible for a third party to obtain plaintext data. Second, it guarantees integrity by preventing data tampering by a third party. Third, it solves the identity authentication problem between communicating parties, preventing man-in-the-middle attacks.

The TLS handshake process can be broken down into several main stages, which are partially simplified here. The first stage involves confirming the TLS protocol version, encryption algorithm, compression method, and other basic information between the client and server. The second stage exchanges and verifies the certificates of both parties, ensuring identity authentication. The third stage negotiates the symmetric key used in the subsequent actual message transmission process through asymmetric encryption. In the fourth stage, both parties send an encrypted switching message, and communication begins based on the symmetric encryption key. The fifth stage involves actual application message transmission, using symmetric encryption to transmit data and performing MAC verification to ensure integrity.

_6

Nacos uses gRPC as the underlying communication protocol in version 2.x. gRPC uses Netty as the network communication framework. The implementation of Nacos's TLS is also based on gRPC/Netty. The figure describes the basic components of the client and the server. The part in light blue shows how users can control whether to enable TLS through parameters, including property files, JVM parameters, and environment variables. The part in dark blue is the components provided by the Nacos layer to accept parameters and convert the parameters to the underlying gRPC and Netty components to implement TLS functions. In addition, the server supports the dynamic rotation function of the server certificate, and users can customize the SPI extension to implement the processing logic when the server certificate file changes.

_7

Here, we will explain how to activate TLS encryption for transmission in Nacos. The entire process involves three steps:

Prepare a Certificate

You can purchase a commercial certificate to obtain the relevant files. For development and testing, you can generate a self-signed SSL certificate with keytool or openssl. In this step, you need the following information:

  • CA certificate file: used to verify the validity of the peer certificate. The peer certificate must be issued by this CA, which prevents man-in-the-middle attacks that impersonate the peer.
  • Certificate file and certificate private key file: used to enable TLS on the server.
  • Private key file password: for security reasons, you usually need to set a password for the private key file.

Start the Nacos Server

  • nacos.remote.server.rpc.tls.enable=true: Enable the TLS. If you set this parameter to true, TLS is enabled on the server.
  • nacos.remote.server.rpc.tls.certChainFile={certFilePath}: Specify the certificate file path.
  • nacos.remote.server.rpc.tls.certPrivateKey={keyPath}: Specify the certificate private key file.
  • *nacos.remote.server.rpc.tls.mutualAuth=true/false: Determine whether to enable mutual authentication. Default value: false. If you set this parameter to true, the identity of the client needs to be verified. The client also needs to configure the certificate and private key file simultaneously.
  • *nacos.remote.server.rpc.tls.trustCollectionChainPath={trustFilePath}: A trusted client CA certificate is used to verify the validity of the client certificate when mutual authentication is enabled.
  • *nacos.remote.server.rpc.tls.compatibility=true/false: Determine whether to support non-encrypted clients. Default value: true.
  • *nacos.remote.server.rpc.tls.sslContextRefresher={spiName}: Specify the SPI name of the certificate rotation sensor.
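The server parameters above depend on each other: mutual authentication needs a trusted CA file, and the compatibility switch decides whether plaintext clients are still accepted. The following is a minimal sketch of a pre-flight check that encodes these dependencies. The class `ServerTlsConfigCheck` and its methods are hypothetical and are not part of Nacos, the file paths are placeholders, and treating a missing `enable` key as "disabled" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for the server TLS settings listed above.
// It is not part of Nacos; it only encodes the dependencies between parameters.
class ServerTlsConfigCheck {
    static final String PREFIX = "nacos.remote.server.rpc.tls.";

    // Returns human-readable findings; an empty list means nothing to report.
    static List<String> validate(Properties props) {
        List<String> findings = new ArrayList<>();
        // Assumption: TLS is treated as disabled when the key is absent.
        if (!flag(props, "enable", false)) {
            findings.add("TLS is disabled: traffic is sent in plaintext");
            return findings;
        }
        requireSet(props, "certChainFile", findings);
        requireSet(props, "certPrivateKey", findings);
        // Mutual authentication needs a CA to verify client certificates (default: false).
        if (flag(props, "mutualAuth", false)) {
            requireSet(props, "trustCollectionChainPath", findings);
        }
        // compatibility defaults to true, which still admits non-encrypted clients.
        if (flag(props, "compatibility", true)) {
            findings.add("compatibility=true: non-encrypted clients are still accepted");
        }
        return findings;
    }

    private static boolean flag(Properties props, String name, boolean defaultValue) {
        String value = props.getProperty(PREFIX + name);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    private static void requireSet(Properties props, String name, List<String> findings) {
        String value = props.getProperty(PREFIX + name);
        if (value == null || value.trim().isEmpty()) {
            findings.add(PREFIX + name + " must be set");
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PREFIX + "enable", "true");
        props.setProperty(PREFIX + "certChainFile", "/path/to/server-cert.pem");  // placeholder path
        props.setProperty(PREFIX + "certPrivateKey", "/path/to/server-key.pem");  // placeholder path
        props.setProperty(PREFIX + "mutualAuth", "true");
        validate(props).forEach(System.out::println);
    }
}
```

In this example run, the check reports the missing trustCollectionChainPath and the default compatibility mode. A check like this could run before the server starts, so that a half-configured TLS setup is caught early.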

Start the Nacos Client

  • nacos.remote.client.rpc.tls.enable=true: Enable TLS on the client. If you set this parameter to true but the server does not support TLS or has not enabled TLS, the connection fails.
  • nacos.remote.client.rpc.tls.trustAll=true/false: Determine whether to trust all servers that support TLS. If you set this parameter to true, the CA verification is not performed on the server certificate.
  • *nacos.remote.client.rpc.tls.trustCollectionChainPath={trustFilePath}: If trustAll is false, you need to set the trusted server CA certificate file.
  • *nacos.remote.client.rpc.tls.mutualAuth=true/false: Determine whether to enable mutual authentication. If you set this parameter to true, you need to configure the certificate file and private key of the client simultaneously.
  • *nacos.remote.client.rpc.tls.certChainFile={certFilePath}: Specify the path of the client certificate file.
  • *nacos.remote.client.rpc.tls.certPrivateKey={keyPath}: Specify the private key file of the client certificate.

Storage Security of Nacos

Configuration storage security addresses the following problem: when any medium that holds configuration content is breached, the configuration can be read in plaintext. To prevent this, we need to encrypt the stored configuration content. However, encrypted storage relies on a third-party encryption system, which increases complexity. We therefore generally recommend encrypting only sensitive configurations. Next, we will introduce how to store sensitive configurations by using the configuration encryption and decryption plugin.

_8

The figure above shows the publishing process for an encrypted configuration. The plaintext content is generated in the business application and published through the nacos-client. In the client, the configuration passes through the IConfigFilter filter, where the encryption and decryption plugin converts the plaintext into ciphertext and a dataKey. A common approach is to generate a random key locally and use it to encrypt the plaintext content into ciphertext. The random key itself is then encrypted by the encryption and decryption plugin, and the ciphertext and the encrypted dataKey are sent to the server. The server persistently stores the ciphertext and dataKey, and notifies all nodes of the cluster to load them from the database into the local disk cache and memory cache of the server. In this case, even if the server or the persistent database is breached, the plaintext cannot be recovered from the ciphertext and the encrypted dataKey alone, which reduces the security risk caused by data leakage.

_9

The figure above shows the process of querying the encrypted configuration. The client queries the configuration from the server, and the server returns the ciphertext and dataKey to the client. In the IConfigFilter filter, the client uses the encryption and decryption plugin to decrypt the ciphertext and dataKey into plaintext, and then returns the plaintext in memory to the service listener. The local cache of the client stores the ciphertext and dataKey instead of the plaintext, which ensures that the configuration is stored securely throughout the entire process.
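The publish and query flows described above follow the envelope encryption pattern. The following is a minimal sketch of that pattern using only the JDK's javax.crypto API. It is not the Nacos plugin implementation: the class `EnvelopeEncryptionSketch` is hypothetical, AES-GCM is an assumed cipher choice, and a local master key stands in for the third-party encryption system (for example, a KMS) that would protect the dataKey in practice.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of envelope encryption; not the Nacos plugin code.
class EnvelopeEncryptionSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_LEN = 12;    // bytes, usual size for GCM
    private static final int TAG_BITS = 128;

    // The two values the server would store: ciphertext and encrypted dataKey.
    static final class Envelope {
        final byte[] ciphertext;
        final byte[] encryptedDataKey;
        Envelope(byte[] ciphertext, byte[] encryptedDataKey) {
            this.ciphertext = ciphertext;
            this.encryptedDataKey = encryptedDataKey;
        }
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES-GCM encryption; the random IV is prepended to the output.
    private static byte[] seal(SecretKey key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plain);
            byte[] out = new byte[IV_LEN + body.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(body, 0, out, IV_LEN, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses seal(); fails if the key is wrong or the data was tampered with.
    private static byte[] open(SecretKey key, byte[] sealed) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_LEN));
            return cipher.doFinal(sealed, IV_LEN, sealed.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Publish side: random dataKey encrypts the content, master key encrypts the dataKey.
    static Envelope encrypt(SecretKey masterKey, String plaintext) {
        SecretKey dataKey = newAesKey();
        byte[] ciphertext = seal(dataKey, plaintext.getBytes(StandardCharsets.UTF_8));
        return new Envelope(ciphertext, seal(masterKey, dataKey.getEncoded()));
    }

    // Query side: recover the dataKey first, then the content.
    static String decrypt(SecretKey masterKey, Envelope envelope) {
        SecretKey dataKey = new SecretKeySpec(open(masterKey, envelope.encryptedDataKey), "AES");
        return new String(open(dataKey, envelope.ciphertext), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SecretKey masterKey = newAesKey(); // stand-in for a key held by the encryption system
        Envelope envelope = encrypt(masterKey, "db.password=example");
        System.out.println(decrypt(masterKey, envelope));
    }
}
```

The point of the design is that the server and the client disk cache only ever hold the `Envelope` contents. Without access to the master key, neither value can be turned back into plaintext.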

Introducing a third-party encryption and decryption plugin improves data security but also adds complexity. We recommend that you encrypt only sensitive information, such as usernames and passwords, database configurations, AK/SK, and tokens. In addition, since the encryption and decryption plugin performs the conversion between plaintext and ciphertext, you must also secure the data transmission during the interaction with the plugin and, if the plugin is stateful, its storage. In commercial MSE Nacos, we use KMS as the implementation of the third-party encryption and decryption plugin to ensure overall data security.

Access Control of Nacos

This part describes how to use the Nacos authentication plugin to implement the access control of Nacos.

_10

The authentication plugin abstracts the basic model of the Nacos access control as the client and the server. They implement the overall access control function based on the agreed rules.

  • Client

    • Extract and identify the client identity information (IdentityContext)
    • Use the identity information to sign the accessed resource
    • Attach the identity information and signature to the request message
  • Server

    • Extract the identity information sent by the client (identityNames)
    • Verify that the identity is legitimate and that the signature is valid (validateIdentity)
    • Verify that the identity is permitted to access the resource (validateAuthority)
The identity information (IdentityContext) can be a username and password, AK/SK, or AK/TOKEN rotated automatically by STS/RAM ROLE. If you want to customize your identity information, you only need to implement the SPI, and the custom implementation can also include dynamic identity rotation logic. Regardless of the type of identity information, the client and server must follow consistent rules, including the logic for identity information extraction and signature verification.
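The client and server steps above can be sketched with an AK/SK identity and an HMAC signature. This is an illustration of the general pattern, not the Nacos plugin SPI: the class `SignedRequestSketch`, the header names, and the choice of HMAC-SHA256 are all assumptions. A production scheme would also sign a timestamp to limit replay, which is omitted here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of the sign-and-verify pattern; not the Nacos auth plugin SPI.
class SignedRequestSketch {
    private static final String ALGORITHM = "HmacSHA256";

    // Signs the accessed resource with the secret key.
    static String sign(String secretKey, String resource) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), ALGORITHM));
            byte[] digest = mac.doFinal(resource.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Client side: attach identity (access key) and signature to the request.
    static Map<String, String> clientHeaders(String accessKey, String secretKey, String resource) {
        Map<String, String> headers = new HashMap<>();
        headers.put("accessKey", accessKey);   // hypothetical header name
        headers.put("signature", sign(secretKey, resource)); // hypothetical header name
        return headers;
    }

    // Server side: look up the secret for the claimed identity and recompute the signature.
    static boolean validateIdentity(Map<String, String> headers, String resource,
                                    Map<String, String> secretsByAccessKey) {
        String secret = secretsByAccessKey.get(headers.get("accessKey"));
        String signature = headers.get("signature");
        if (secret == null || signature == null) {
            return false; // unknown identity or unsigned request
        }
        byte[] expected = sign(secret, resource).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison avoids leaking how many bytes matched.
        return MessageDigest.isEqual(expected, signature.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Map<String, String> secrets = new HashMap<>();
        secrets.put("demo-ak", "demo-sk"); // demo values only
        String resource = "namespace-a/app.properties";
        Map<String, String> headers = clientHeaders("demo-ak", "demo-sk", resource);
        System.out.println(validateIdentity(headers, resource, secrets));
    }
}
```

Because the signature covers the resource, a signature captured for one resource does not validate for another. Both sides must use the same signing rule for verification to succeed.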

In addition to identity verification and signature verification, the server also needs to verify permissions of identities and access resources. The RBAC model is commonly used for user permission management. The full name of RBAC is role-based access control, which is currently used by the default authentication plugin of Nacos.

_11

The RBAC model consists of three parts: permission, role, and user.

  • Permission: A permission defines the access rules for a specified resource, including the resource definition, the operations (read/write), and the effect (allow/deny). For example, a permission may allow reading configurations in namespace A.
  • Role: A virtual identity that holds a set of permissions. For example, you can define an administrator role for namespace A and grant it permission to read and write configurations in namespace A.
  • User: An entity that can actually access the system and be used as an identity. You can assign multiple roles to a user, who then indirectly holds multiple permissions. Users are divided into administrators and regular users. Administrators have the highest permissions and are generally required to authenticate with MFA. Regular users are created by administrators and can be assigned multiple roles.
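The relationship between the three parts can be sketched as two lookups: user to roles, and role to permissions. The class `RbacSketch` below is a hypothetical in-memory illustration, not the data model of the Nacos default plugin. It models only "allow" rules and denies everything else by default, in line with least privilege.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory RBAC model; not the Nacos default plugin's implementation.
class RbacSketch {
    private final Map<String, Set<String>> rolesByUser = new HashMap<>();
    private final Map<String, Set<String>> permissionsByRole = new HashMap<>();

    // Permission: allow `action` ("read" or "write") on `resource` for a role.
    void grant(String role, String resource, String action) {
        permissionsByRole.computeIfAbsent(role, r -> new HashSet<>()).add(key(resource, action));
    }

    // A user can hold several roles and so, indirectly, several permissions.
    void bindRole(String user, String role) {
        rolesByUser.computeIfAbsent(user, u -> new HashSet<>()).add(role);
    }

    // Deny by default: access is allowed only if some role of the user grants it.
    boolean isAllowed(String user, String resource, String action) {
        String wanted = key(resource, action);
        for (String role : rolesByUser.getOrDefault(user, Collections.emptySet())) {
            if (permissionsByRole.getOrDefault(role, Collections.emptySet()).contains(wanted)) {
                return true;
            }
        }
        return false;
    }

    private static String key(String resource, String action) {
        return resource + "#" + action;
    }

    public static void main(String[] args) {
        RbacSketch rbac = new RbacSketch();
        rbac.grant("namespace-a-reader", "namespace-a", "read");
        rbac.bindRole("alice", "namespace-a-reader");
        System.out.println(rbac.isAllowed("alice", "namespace-a", "read"));
        System.out.println(rbac.isAllowed("alice", "namespace-a", "write"));
    }
}
```

Here the user alice can read namespace-a but not write to it, because the only role bound to her grants read access alone.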

As mentioned above, the default authentication plugin of Nacos is also built based on the RBAC model and can complete basic permission control. To enable the default authentication function, you need to create three tables, users, roles, and permissions, to store the corresponding data.

_12

Three steps are required to enable Nacos access control:

1) Enable authentication on the server

  • Enable authentication: nacos.core.auth.enabled=true.
  • Set the authentication plugin: nacos.core.auth.system.type=nacos, which specifies the name of the built-in Nacos authentication plugin.

2) Create users/roles/permissions

  • Change the default password of the administrator: After server authentication is enabled, you must enter a username and password to log on to the Nacos console. Replace the default password after you log on to the Nacos console for the first time. After you log on, the permission control menu is on the left of the console.
  • Create a regular user and set a username and password.
  • Create a role and bind it to the user.
  • Create permissions that bind resource access rules to roles.

3) Provide the username and password on the client

After the users, roles, and permissions are created and authentication is enabled on the server, logging on to the console requires a username and password. Calls made through nacos-client must also carry a username and password: when building ConfigService and NamingService, pass the username and password in the properties. Otherwise, the call fails with the 403 (no permission) error code.
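The client side of this step can be sketched as follows. The sketch only builds the `Properties` object with the JDK, so it runs without the nacos-client dependency. The helper class `NacosClientCredentials`, the server address, the user name, and the `NACOS_PASSWORD` environment variable are hypothetical, and the property keys `serverAddr`, `username`, and `password` are the names the open-source client is understood to use.

```java
import java.util.Properties;

// Hypothetical helper that prepares the properties passed to the Nacos client.
class NacosClientCredentials {

    static Properties build(String serverAddr, String username, String password) {
        Properties properties = new Properties();
        properties.setProperty("serverAddr", serverAddr);
        properties.setProperty("username", username);
        properties.setProperty("password", password);
        return properties;
    }

    public static void main(String[] args) {
        // Read the secret from the environment rather than hard-coding it.
        String password = System.getenv("NACOS_PASSWORD"); // hypothetical variable name
        if (password == null) {
            System.err.println("NACOS_PASSWORD is not set");
            return;
        }
        Properties properties = build("127.0.0.1:8848", "app-user", password);
        // With nacos-client on the classpath, the same object would be passed on, e.g.:
        //   ConfigService configService = NacosFactory.createConfigService(properties);
        //   NamingService namingService = NacosFactory.createNamingService(properties);
        System.out.println("Configured keys: " + properties.stringPropertyNames());
    }
}
```

Using a dedicated regular user per application, rather than the administrator account, keeps each client within the permissions bound to its roles.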

Note: The default open-source authentication plugin supports permission control at namespace granularity. Commercial MSE Nacos implements access control based on Alibaba Cloud Resource Access Management (RAM) and supports fine-grained access control at the service (serviceName) and configuration (dataId) level.

Summary

In the past, we prioritized user experience when designing Nacos, often overlooking security concerns. However, over the past year, Nacos has made significant strides in security. We've updated versions to address security risks, removed default security parameters, and raised user awareness about security when using Nacos.

In addition to implementing business functions, security is a critical issue that every architect must focus on today. We've described how to implement transmission security, storage security, and access control to build a secure zero-trust architecture when using open-source Nacos. We recommend enabling access control by default in production environments, disabling anonymous access, implementing fine-grained permission control, and using TLS to ensure data transmission security. We can then use encryption and decryption plugins to encrypt highly sensitive information. In the future, we'll continue to improve security protection policies on open-source Nacos, introduce security-related settings in Quick Start, and add more tips in the open-source console to enhance user data security awareness.

Nacos is an open-source product. In Nacos 2.0, we've modified multiple plugins. Users can customize plugins to enhance the security level of self-built Nacos, but this requires additional customization and development work.

Here, we recommend trying commercial MSE Nacos to obtain one-stop security protection capabilities. In commercial MSE Nacos, we provide enhanced security protection capabilities, including unified management and distribution of TLS certificates and automatic rotation, support for encryption and decryption solutions with KMS3.0, fine-grained authentication rules combined with RAM, and advanced security protection capabilities such as daily security inspections and risk management.
