To ensure node security for an ACK managed cluster, you can manually limit the permissions of the worker Resource Access Management (RAM) role of the cluster based on the least privilege principle.
Prerequisites
An ACK managed cluster that runs Kubernetes 1.18 or later is created. ACK managed clusters are classified into ACK Pro clusters and ACK Basic clusters. For more information, see Create an ACK managed cluster and Update an ACK cluster.
If you want to limit the permissions of the worker RAM role of an ACK dedicated cluster, you must first migrate from the ACK dedicated cluster to an ACK Pro cluster. For more information, see Hot migration from ACK dedicated clusters to ACK Pro clusters.
Default roles are assigned to ACK to grant the permissions required by ACK managed clusters. For more information, see Assign default roles to ACK.
Step 1: Confirm whether permission limits are required
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the cluster that you want to manage. On the cluster details page, click the Basic Information tab. Click the hyperlink on the right side of Worker RAM Role to go to the RAM console.
On the Permissions tab of the Role Details page, check whether policies are displayed.
If no policy is displayed, you do not need to limit the permissions of the worker RAM role.
If a policy is displayed, such as k8sWorkerRolePolicy-db8ad5c7***, you may need to limit the permissions of the worker RAM role. In this case, we recommend that you limit the permissions of the worker RAM role based on your requirements and the least privilege principle.
Step 2: Update system components
Update the key system components of the ACK managed cluster to the minimum required version or the latest version. For more information, see Manage system components.
Do not update multiple components at the same time. Update them one at a time, and make sure that each update succeeds before you start the next one.
Before you update a component, we recommend that you read and understand the remarks of the component in the following table.
Components can be installed from the Add-ons page in the ACK console or by using node pools. The following table describes how to update components installed by using the preceding methods.
Components installed from the Add-ons page
Go to the Add-ons page and update the installed components based on the descriptions in the following table. If a component is already of the minimum required version or the latest version, redeploy the component by running the corresponding command in the following table or by clicking Redeploy in the ACK console.
Component | Minimum required version | Redeploy command | Remarks |
metrics-server | v0.3.9.4-ff225cd-aliyun | | None |
alicloud-monitor-controller | v1.5.5 | | None |
logtail-ds | v1.0.29.1-0550501-aliyun | | |
terway | v1.0.10.333-gfd2b7b8-aliyun | | |
terway-eni | v1.0.10.333-gfd2b7b8-aliyun | | |
terway-eniip | v1.0.10.333-gfd2b7b8-aliyun | | |
terway-controlplane | v1.2.1 | | None |
flexvolume | v1.14.8.109-649dc5a-aliyun | | |
csi-plugin | v1.18.8.45-1c5d2cd1-aliyun | | None |
csi-provisioner | v1.18.8.45-1c5d2cd1-aliyun | | None |
storage-operator | v1.18.8.55-e398ce5-aliyun | | None |
alicloud-disk-controller | v1.14.8.51-842f0a81-aliyun | | None |
ack-node-problem-detector | 1.2.16 | | None |
aliyun-acr-credential-helper | v23.02.06.2-74e2172-aliyun | | Before you start the update, you must grant permissions. |
ack-cost-exporter | 1.0.10 | | Before you start the update, you must grant permissions. |
mse-ingress-controller | 1.1.5 | | Before you start the update, you must grant permissions. |
arms-prometheus | 1.1.11 | | None |
ack-onepilot | 3.0.11 | | Before you start the update, you must grant permissions. |
cluster-autoscaler installed by using node pools
Component | Minimum required version | Redeploy command | Remarks |
cluster-autoscaler | v1.3.1-bcf13de9-aliyun | | You can use the following methods to view the version of cluster-autoscaler. For more information about how to update cluster-autoscaler, see [Component updates] Update cluster-autoscaler. |
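The version check against the tables above can be sketched in a few lines. This is a minimal sketch, not an official tool: it assumes that the dotted numeric prefix of a version string (for example, 1.0.10.333 in v1.0.10.333-gfd2b7b8-aliyun) reflects release order and that the suffix after the first hyphen can be ignored. The function names parse_version and meets_minimum are hypothetical, and the MINIMUM_VERSIONS dictionary copies only a few rows from the tables.

```python
import re
from typing import Tuple

# A few minimum versions copied from the tables above (not the full list).
MINIMUM_VERSIONS = {
    "metrics-server": "v0.3.9.4-ff225cd-aliyun",
    "terway-eniip": "v1.0.10.333-gfd2b7b8-aliyun",
    "ack-node-problem-detector": "1.2.16",
    "cluster-autoscaler": "v1.3.1-bcf13de9-aliyun",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Return the leading dotted numeric part of a version string as integers."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric prefixes only (assumption: build suffixes carry no ordering)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad the shorter tuple with zeros so that 1.2 and 1.2.0 compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, meets_minimum("1.2.9", MINIMUM_VERSIONS["ack-node-problem-detector"]) returns False because 9 is lower than 16 in the third position, so that component would need an update.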
Check the configurations of Terway
If terway, terway-eni, or terway-eniip is installed in your cluster, you need to manually check the Terway configuration, which is the eni_conf entry of the eni-config ConfigMap in the kube-system namespace.
Run the following command to view and modify the eni-config ConfigMap:
kubectl edit cm eni-config -n kube-system
If the eni_conf configuration already includes the following setting, no additional action is required. If the setting is not included, add it as a new row below the min_pool_size parameter:
"credential_path": "/var/addon/token-config",
Run the corresponding command in the preceding table to redeploy Terway.
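The check described above can also be expressed as a short script. This is a minimal sketch under stated assumptions: it treats the eni_conf content as a JSON document that has already been read from the ConfigMap, and the function name ensure_credential_path is hypothetical. It does not call kubectl or modify the cluster. An existing credential_path with a different value is reported as an error so that it can be reviewed manually.

```python
import json
from typing import Tuple

CREDENTIAL_PATH = "/var/addon/token-config"


def ensure_credential_path(eni_conf_text: str) -> Tuple[str, bool]:
    """Return (configuration text, changed flag) for an eni_conf JSON document."""
    conf = json.loads(eni_conf_text)
    if "credential_path" in conf:
        if conf["credential_path"] != CREDENTIAL_PATH:
            # Not covered by the procedure above: leave it for manual review.
            raise ValueError("credential_path is set to an unexpected value")
        return eni_conf_text, False  # setting already present, nothing to do

    updated = {}
    inserted = False
    for key, value in conf.items():
        updated[key] = value
        if key == "min_pool_size":
            # Place the new row directly below min_pool_size.
            updated["credential_path"] = CREDENTIAL_PATH
            inserted = True
    if not inserted:
        # Fallback when min_pool_size is absent: append at the end.
        updated["credential_path"] = CREDENTIAL_PATH
    return json.dumps(updated, indent=2), True
```

The returned text would still have to be written back to the ConfigMap, for example by pasting it in the editor opened by the kubectl edit command, and Terway would still have to be redeployed afterwards.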
Step 3: Use ActionTrail to collect cluster logs
Use ActionTrail to collect API audit logs to analyze the API operations performed in the cluster. This way, you can identify the applications that rely on the RAM policy attached to the worker RAM role of the cluster. For more information about the Alibaba Cloud services that work with ActionTrail, see Services that work with ActionTrail.
We recommend that you collect audit logs over a period of at least one week.
Go to the ActionTrail console and create a single-account trail in the region where the cluster resides. When you create the single-account trail, select Delivery to Simple Log Service. For more information, see Create a single-account trail.
Step 4: Perform a functional test on the cluster
After the preceding steps are completed, perform a functional test on the cluster to check whether the cluster works as expected.
Test item | Description | Reference |
Computing | Whether the cluster can scale nodes as expected. | |
Network | Whether the cluster can assign IP addresses to pods as expected. | |
Storage | Whether the cluster can deploy workloads that use external storage as expected if external storage is enabled. | |
Monitoring | Whether the cluster can generate alerts as expected. | |
Scalability | Whether the cluster can automatically scale nodes as expected if auto scaling is enabled. | |
Security | Whether the cluster can use the password-free image pulling feature as expected if the feature is enabled. | Use the aliyun-acr-credential-helper component to pull images without a password |
After the functional test is completed, verify the logic of the business deployed in your cluster to ensure that the business runs as expected.
Step 5: Analyze the logs collected by ActionTrail
Log on to the Simple Log Service console.
In the Projects section, click the project that you want to manage.
On the details page of the project, go to the Logstores tab and click the Logstore that you want to manage. The name of the Logstore that stores the logs collected by ActionTrail in Step 3 is in the actiontrail_<trail name> format.
Use the following query statement to retrieve the API operations that the worker RAM role of the cluster performs by using STS tokens. Replace <worker_role_name> with the name of the worker RAM role of the cluster.
* and event.userIdentity.userName: <worker_role_name> | select "event.serviceName", "event.eventName", count(*) as total GROUP BY "event.eventName", "event.serviceName"
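If the audit logs are exported for offline analysis, the same grouping can be reproduced locally. The following is a minimal sketch, assuming each exported record is a dictionary with a nested event object that exposes the fields named in the query (userIdentity.userName, serviceName, and eventName). The function name count_operations and the sample role name are hypothetical, and the exact string comparison is a simplification of the search condition in the query.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_operations(events: Iterable[Dict], worker_role_name: str) -> Counter:
    """Count events per (serviceName, eventName) for one worker RAM role."""
    counts: Counter = Counter()
    for record in events:
        event = record.get("event", {})
        user_name = event.get("userIdentity", {}).get("userName")
        if user_name != worker_role_name:
            continue  # skip calls made by other identities
        key: Tuple[str, str] = (event.get("serviceName"), event.get("eventName"))
        counts[key] += 1
    return counts


# Hypothetical sample records that only illustrate the expected shape.
sample_events = [
    {"event": {"userIdentity": {"userName": "KubernetesWorkerRole-example"},
               "serviceName": "Ecs", "eventName": "DescribeInstances"}},
    {"event": {"userIdentity": {"userName": "KubernetesWorkerRole-example"},
               "serviceName": "Ecs", "eventName": "DescribeInstances"}},
    {"event": {"userIdentity": {"userName": "another-role"},
               "serviceName": "Slb", "eventName": "DescribeLoadBalancers"}},
]
totals = count_operations(sample_events, "KubernetesWorkerRole-example")
```

The resulting counter holds one entry per service and API operation, which corresponds to the total column returned by the query.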
Step 6: Limit the permissions of the worker RAM role
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the cluster that you want to manage. On the cluster details page, click the Basic Information tab. Click the hyperlink on the right side of Worker RAM Role to go to the RAM console.
On the Permissions tab of the Role Details page, click the RAM policy that you want to manage. On the Policy Content tab, click Modify Policy Document.
Important: Before you modify the policy, make a copy of the original policy content in case you need to roll back the policy.
Delete permissions from the policy based on your business requirements and the analysis result generated in Step 5. For example, you can delete the API operations that are not included in the analysis result from the Action section of the policy content. If you confirm that none of the API operations in the policy content are required, you can detach the RAM policy from the worker RAM role.
Redeploy the system components. For more information, see the redeploy commands in Step 2.
Repeat Step 4, Step 5, and Step 6 until the worker RAM role provides only the minimum permissions required by the components and applications in your cluster.
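The pruning in Step 6 can be prototyped offline before you edit the policy in the console. The following is a minimal sketch, not an official procedure. It assumes that the policy document uses the usual Statement, Effect, and Action elements, that an observed operation maps to an action string of the form service:operation, and that action names can be compared without regard to case. These mappings are assumptions that you should verify against your own analysis result. The function name prune_policy is hypothetical.

```python
import copy
from fnmatch import fnmatchcase
from typing import Dict, Iterable


def prune_policy(policy: Dict, used_actions: Iterable[str]) -> Dict:
    """Return a copy of the policy that keeps only Allow actions seen in use."""
    used = {action.lower() for action in used_actions}
    pruned = copy.deepcopy(policy)  # never modify the original (rollback copy)
    kept_statements = []
    for statement in pruned.get("Statement", []):
        # Leave Deny statements and statements without Action untouched:
        # removing them could widen permissions instead of narrowing them.
        if statement.get("Effect") != "Allow" or "Action" not in statement:
            kept_statements.append(statement)
            continue
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        # Keep an entry (including wildcard entries such as "slb:Describe*")
        # only if at least one observed operation matches it.
        kept = [pattern for pattern in actions
                if any(fnmatchcase(action, pattern.lower()) for action in used)]
        if kept:
            statement["Action"] = kept
            kept_statements.append(statement)
    pruned["Statement"] = kept_statements
    return pruned
```

If the returned Statement list is empty, no action in the policy was observed in use, which corresponds to the case in which the policy can be detached from the worker RAM role. Wildcard entries are kept as-is when they match, so you may still want to narrow them manually.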
References
For more information about the authorization system of ACK, see Best practices of authorization.