Container Service for Kubernetes (ACK) allows you to collect and analyze the logs of Kubernetes components, including API server audit logs, Ingress logs, control plane component logs, and key Kubernetes events. This helps you locate the cause when the log data reveals a security issue or a cluster issue.
Use cluster auditing
The audit log of the API server of a Kubernetes cluster helps administrators track operations performed by different users. Auditing plays an important role in cluster security and cluster O&M. For more information about how to collect and analyze audit logs by using Simple Log Service, set custom alert rules, and disable cluster auditing, see Work with cluster auditing.
For descriptions of the audit log fields, see audit-k8s-io-v1-Event.
ACK provides the following audit policy:
apiVersion: audit.k8s.io/v1beta1 # This is required.
kind: Policy
# Do not generate audit events for requests in the RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # Ignore the following requests because they have been manually identified as high-volume and low-risk.
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: "" # core
        resources: ["endpoints", "services"]
  - level: None
    users: ["system:unsecured"]
    namespaces: ["kube-system"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["configmaps"]
  - level: None
    users: ["kubelet"] # legacy kubelet identity
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    users:
      - system:kube-controller-manager
      - system:kube-scheduler
      - system:serviceaccount:kube-system:endpoint-controller
    verbs: ["get", "update"]
    namespaces: ["kube-system"]
    resources:
      - group: "" # core
        resources: ["endpoints"]
  - level: None
    users: ["system:apiserver"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["namespaces"]
  # Do not audit requests that are sent to the following read-only URLs.
  - level: None
    nonResourceURLs:
      - /healthz*
      - /version
      - /swagger*
  # Do not audit requests for event resources.
  - level: None
    resources:
      - group: "" # core
        resources: ["events"]
  # Secrets, ConfigMaps, and token reviews can contain sensitive and binary data.
  # Therefore, audit only the metadata of these resources.
  - level: Metadata
    resources:
      - group: "" # core
        resources: ["secrets", "configmaps"]
      - group: authentication.k8s.io
        resources: ["tokenreviews"]
  - level: Request
    verbs: ["get", "list", "watch"]
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # The default audit level for known API requests and responses.
  - level: RequestResponse
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
      - group: "autoscaling.alibabacloud.com"
  # The default audit level for other requests.
  - level: Metadata
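Kubernetes evaluates audit policy rules in order, and the first rule that matches a request sets its audit level. This is why the Metadata rule for Secrets and ConfigMaps must come before the broader Request and RequestResponse rules. The following Python sketch illustrates that first-match behavior on a shortened copy of the policy. It is a simplified model, not the API server's implementation: the request dictionary shape and the function names are assumptions made for illustration, only resource requests are modeled, and wildcard matching is left out.

```python
# Simplified model of first-match audit rule evaluation (illustrative only).

def rule_matches(rule, request):
    """Return True if every selector present in the rule matches the request."""
    # This sketch models resource requests only, so URL-only rules never match.
    if "nonResourceURLs" in rule and "resources" not in rule:
        return False
    if "users" in rule and request.get("user") not in rule["users"]:
        return False
    if "userGroups" in rule and not set(rule["userGroups"]) & set(request.get("groups", [])):
        return False
    if "verbs" in rule and request.get("verb") not in rule["verbs"]:
        return False
    if "namespaces" in rule and request.get("namespace") not in rule["namespaces"]:
        return False
    if "resources" in rule:
        # An entry without a "resources" list matches every resource in its group.
        return any(
            spec.get("group", "") == request.get("group", "")
            and (not spec.get("resources") or request.get("resource") in spec["resources"])
            for spec in rule["resources"]
        )
    return True  # a rule without selectors is a catch-all


def audit_level(rules, request):
    """Return the level of the first matching rule."""
    for rule in rules:
        if rule_matches(rule, request):
            return rule["level"]
    return "None"  # assumption: nothing is logged when no rule matches


# A shortened copy of the policy above, transcribed as Python dictionaries.
RULES = [
    {"level": "None", "users": ["system:kube-proxy"], "verbs": ["watch"],
     "resources": [{"group": "", "resources": ["endpoints", "services"]}]},
    {"level": "Metadata",
     "resources": [{"group": "", "resources": ["secrets", "configmaps"]},
                   {"group": "authentication.k8s.io", "resources": ["tokenreviews"]}]},
    {"level": "Request", "verbs": ["get", "list", "watch"],
     "resources": [{"group": ""}, {"group": "apps"}]},
    {"level": "RequestResponse", "resources": [{"group": ""}, {"group": "apps"}]},
    {"level": "Metadata"},
]

if __name__ == "__main__":
    # "alice" is a hypothetical user name.
    request = {"user": "alice", "verb": "get", "group": "", "resource": "secrets"}
    print(audit_level(RULES, request))
```

In this model, a get request for a Secret stops at the Metadata rule, so its request and response bodies are never written to the audit log even though the core group also appears in the later rules.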
Enable internal activity auditing for exec containers
Attackers typically run the exec command to enter a container and then launch lateral attacks in the Kubernetes cluster. After an attacker enters a container, the default API server audit logs do not record the commands that the attacker runs inside it. Container internal activity auditing fills this gap: it gives O&M engineers audit logs of the commands run after the attacker enters the container. This helps locate the cause of security events and limit business loss.
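For contrast, the API server audit log does record the exec request itself, as an event whose objectRef points to the pods resource with the exec subresource. The following Python sketch shows how such events could be picked out of a list of parsed audit events. The function names and the sample events are hypothetical, and the sketch says nothing about what happens inside the container, which is the gap that container internal activity auditing covers.

```python
# Sketch: find exec requests in already-parsed API server audit events.

def is_exec_request(event):
    """Return True if the audit event targets the pods/exec subresource."""
    ref = event.get("objectRef") or {}
    return ref.get("resource") == "pods" and ref.get("subresource") == "exec"


def exec_sessions(events):
    """Return (username, namespace, pod) tuples for every exec audit event."""
    sessions = []
    for event in events:
        if is_exec_request(event):
            ref = event["objectRef"]
            user = (event.get("user") or {}).get("username")
            sessions.append((user, ref.get("namespace"), ref.get("name")))
    return sessions


# Hypothetical sample events, trimmed to the fields that the sketch reads.
SAMPLE_EVENTS = [
    {"verb": "create", "user": {"username": "alice"},
     "objectRef": {"resource": "pods", "subresource": "exec",
                   "namespace": "default", "name": "web-0"}},
    {"verb": "get", "user": {"username": "bob"},
     "objectRef": {"resource": "pods", "namespace": "default", "name": "web-0"}},
]

if __name__ == "__main__":
    for user, namespace, pod in exec_sessions(SAMPLE_EVENTS):
        print(f"{user} ran exec on pod {namespace}/{pod}")
```

A single request can produce one audit event per stage, so the same exec session may appear more than once in real log data.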
Use the audit log metadata
The Kubernetes audit log contains two annotations: authorization.k8s.io/decision and authorization.k8s.io/reason. The authorization.k8s.io/decision annotation indicates whether a request is authorized, and the authorization.k8s.io/reason annotation gives the reason for that decision. Together, the annotations explain why a specific API operation was allowed.
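As a minimal sketch of how these annotations can be read, the following Python function extracts both values from one audit event in JSON form. The function name and the sample event are assumptions made for illustration, and the reason text is only an example of the kind of message an authorizer may produce.

```python
import json


def authorization_summary(event_json):
    """Return the (decision, reason) annotations of one JSON audit event."""
    event = json.loads(event_json)
    annotations = event.get("annotations") or {}
    return (
        annotations.get("authorization.k8s.io/decision"),
        annotations.get("authorization.k8s.io/reason"),
    )


# Hypothetical audit event, trimmed to the fields that the sketch reads.
SAMPLE_EVENT = json.dumps({
    "kind": "Event",
    "verb": "list",
    "user": {"username": "alice"},
    "annotations": {
        "authorization.k8s.io/decision": "allow",
        "authorization.k8s.io/reason": "RBAC: allowed by a RoleBinding (illustrative text)",
    },
})

if __name__ == "__main__":
    decision, reason = authorization_summary(SAMPLE_EVENT)
    print(decision, "-", reason)
```

Events without these annotations yield None for both values, so callers can tell missing metadata apart from an explicit decision.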
Use node-problem-detector with the Kubernetes event center of Simple Log Service to identify abnormal cluster events
node-problem-detector is a tool maintained by ACK to diagnose Kubernetes nodes. It detects node exceptions, generates node events, and works with kube-eventer to generate alerts upon these events and enable closed-loop management of alerts. node-problem-detector generates node events when the following exceptions are detected: Docker engine hangs, Linux kernel hangs, outbound traffic exceptions, and file descriptor exceptions.
In addition to the node issues and exceptions detected by node-problem-detector, a Kubernetes cluster also generates events when the status of the cluster changes. For example, a cluster generates events when a pod is evicted or when an image fails to be pulled. The Kubernetes event center of Simple Log Service collects all events generated in Kubernetes clusters and provides the following capabilities: storage, query, analytics, visualization, and alerting. The Kubernetes event center helps O&M engineers identify abnormal events and issues that may affect cluster stability, such as regular users running the exec command to log on to specific containers. For more information, see Event monitoring.
Enable the Ingress dashboard
Ingress controllers of ACK allow you to stream all HTTP request log data to standard output. ACK is also integrated with Simple Log Service, so you can create dashboards to monitor and analyze the log data. The Ingress dashboard displays the following information about the status of Ingresses in a cluster: the number of page views (PVs), the number of unique visitors (UVs), inbound and outbound traffic, the average latency, and top URLs. This helps you gain insights into service traffic and detect malicious traffic and DDoS attacks early. For more information, see Ingress Dashboard.
Enable logging for CoreDNS
CoreDNS is deployed in ACK clusters and serves as the DNS server. You can check the CoreDNS log to locate the causes of slow DNS resolution. You can also view the analytical report of the CoreDNS log in Simple Log Service dashboards to identify DNS queries for high-risk domain names. For more information, see Collect and analyze CoreDNS logs.