If your application is written in Java and the heap of the Java virtual machine (JVM) is small, the application may encounter out of memory (OOM) errors. You can configure the JVM to write a heap dump when an OOM error occurs and mount a Container Network File System (CNFS) volume to the directory that stores the heap dumps. This way, the heap dumps are automatically persisted in the CNFS volume and remain available after the pod restarts. This topic describes how to use CNFS to automatically collect the heap dumps of a JVM.
Prerequisites
A Kubernetes cluster is created. The Container Storage Interface (CSI) plug-in is used as the volume plug-in. For more information, see Create an ACK managed cluster.
An instance of Container Registry Enterprise Edition is created. For more information, see Create an instance of Container Registry Enterprise Edition.
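CNFS is used to manage File Storage NAS (NAS) file systems, and a persistent volume claim (PVC) that mounts a CNFS volume is created. In this topic, the PVC is named cnfs-nas-pvc. For more information, see Use CNFS to manage NAS file systems (recommended).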
Background information
CNFS allows you to abstract NAS file systems as custom Kubernetes objects by using the CustomResourceDefinition (CRD) resource. You can use the custom objects to create, delete, describe, mount, monitor, and expand NAS file systems. For more information, see CNFS overview.
Container Registry is a secure platform for managing and distributing cloud-native artifacts, such as container images and Helm charts, that comply with Open Container Initiative (OCI) standards. For more information, see What is Container Registry?.
Considerations
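Set the JVM option -Xmx to a value smaller than the memory limit of your application pod, and leave headroom for non-heap memory. Otherwise, the container can exceed its memory limit and be terminated (OOMKilled) before the JVM throws an OutOfMemoryError. In that case, the JVM does not write a heap dump.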
To collect the heap dumps of a JVM, we recommend that you mount a dedicated CNFS volume so that your application data and the heap dumps are stored in separate volumes. Heap dump (.hprof) files can be large, and keeping them in a separate volume prevents them from consuming the storage that your application needs.
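The -Xmx consideration can be checked before you deploy. The following Java class is a minimal sketch and is not part of the sample image: the class name HeapHeadroom and the 256Mi pod limit are assumptions for illustration (the sample Deployment in this topic does not set a memory limit). It compares an -Xmx value with a Kubernetes memory limit that uses a binary suffix (Ki, Mi, or Gi). The JVM also uses non-heap memory, such as metaspace and thread stacks, so passing this check is a minimum requirement rather than a guarantee of enough headroom.

```java
import java.util.Locale;

/**
 * Hypothetical pre-deployment check: the JVM maximum heap (-Xmx) must be
 * smaller than the pod memory limit, otherwise the container can be
 * OOMKilled before the JVM throws OutOfMemoryError and writes a heap dump.
 */
public final class HeapHeadroom {

    private HeapHeadroom() {}

    /** Parses a JVM size such as "80m", "512k", or "1g" into bytes. */
    public static long parseJvmSize(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        char unit = v.charAt(v.length() - 1);
        long multiplier = 1L;
        switch (unit) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            default: return Long.parseLong(v); // no suffix: the value is in bytes
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }

    /** Parses a Kubernetes memory quantity with a binary suffix (Ki, Mi, Gi) into bytes. */
    public static long parseK8sQuantity(String value) {
        String v = value.trim();
        if (v.endsWith("Ki")) return Long.parseLong(v.substring(0, v.length() - 2)) << 10;
        if (v.endsWith("Mi")) return Long.parseLong(v.substring(0, v.length() - 2)) << 20;
        if (v.endsWith("Gi")) return Long.parseLong(v.substring(0, v.length() - 2)) << 30;
        // Plain bytes only; decimal suffixes such as M or G are not handled in this sketch.
        return Long.parseLong(v);
    }

    /** Returns true if -Xmx is strictly smaller than the pod memory limit. */
    public static boolean xmxBelowLimit(String xmx, String podLimit) {
        return parseJvmSize(xmx) < parseK8sQuantity(podLimit);
    }

    public static void main(String[] args) {
        // -Xmx80m comes from the sample Deployment; 256Mi is an assumed pod memory limit.
        System.out.println("-Xmx80m below 256Mi: " + xmxBelowLimit("80m", "256Mi"));
        // Heap ceiling of the JVM that runs this check, as reported by the runtime.
        System.out.println("max heap of this JVM: " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```

For the values in this topic, 80m is 83,886,080 bytes and 256Mi is 268,435,456 bytes, so the check passes.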
Procedure
You can use the registry.cn-hangzhou.aliyuncs.com/acs1/java-oom-test:v1.0 image to deploy a sample Java program that triggers OOM errors in the JVM.
For more information about how to build an image, see Use a Container Registry Enterprise Edition instance to build an image.
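The source code of Mycode in the sample image is not shown in this topic. The following class is a hypothetical stand-in that fails in the same way, which may be useful if you build your own test image: it keeps references to 1 MiB byte arrays so that the garbage collector cannot reclaim them, until the 80 MB heap is exhausted. The class structure and the hold method are assumptions, not the contents of the sample image.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the Mycode program in the sample image.
 * It retains every allocated array, so the heap can only grow.
 */
public class Mycode {

    private static final int CHUNK_BYTES = 1 << 20; // 1 MiB per allocation

    /** Allocates up to maxChunks arrays of chunkBytes each and keeps all of them reachable. */
    public static List<byte[]> hold(int maxChunks, int chunkBytes) {
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < maxChunks; i++) {
            retained.add(new byte[chunkBytes]); // never released
        }
        return retained;
    }

    public static void main(String[] args) {
        // With -Xmx80m, this loop throws java.lang.OutOfMemoryError before 80 chunks are held.
        // -XX:+HeapDumpOnOutOfMemoryError then writes java_pid<pid>.hprof to -XX:HeapDumpPath.
        List<byte[]> retained = hold(Integer.MAX_VALUE, CHUNK_BYTES);
        System.out.println("allocated " + retained.size() + " MiB"); // not reached with a bounded heap
    }
}
```

Because the uncaught OutOfMemoryError ends the main thread, the JVM exits and Kubernetes restarts the container.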
Use the following template to create a Deployment named java-application.
The template starts the Mycode program with a fixed heap size of 80 MB (-Xms80m and -Xmx80m). When the program exhausts the heap, the JVM throws a java.lang.OutOfMemoryError. Because -XX:+HeapDumpOnOutOfMemoryError is set, the JVM then writes a heap dump to the /mnt/oom/logs directory that is specified by -XX:HeapDumpPath.
cat << 'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-application
spec:
  selector:
    matchLabels:
      app: java-application
  template:
    metadata:
      labels:
        app: java-application
    spec:
      containers:
      - name: java-application
        image: registry.cn-hangzhou.aliyuncs.com/acs1/java-oom-test:v1.0   # The image of the sample Java application.
        imagePullPolicy: Always
        env:   # Expose the pod name and namespace to the container through the Downward API.
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        args:
        - java                              # Run the Java command.
        - -Xms80m                           # The initial heap size.
        - -Xmx80m                           # The maximum heap size.
        - -XX:HeapDumpPath=/mnt/oom/logs    # The directory to which heap dumps are written when OOM errors occur.
        - -XX:+HeapDumpOnOutOfMemoryError   # Generate a heap dump when an OOM error occurs.
        - Mycode                            # Run the Mycode program.
        volumeMounts:
        - name: java-oom-pv
          mountPath: "/mnt/oom/logs"        # Mount the CNFS volume to the /mnt/oom/logs directory.
          subPathExpr: $(POD_NAMESPACE).$(POD_NAME)   # Store the heap dumps of each pod in a subdirectory named <namespace>.<pod name>.
      volumes:
      - name: java-oom-pv
        persistentVolumeClaim:
          claimName: cnfs-nas-pvc           # The PVC that is used to mount the CNFS volume.
EOF

The EOF delimiter is quoted so that the shell does not treat $(POD_NAMESPACE) and $(POD_NAME) as command substitutions. Kubernetes expands these variables when it mounts the volume.
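The following sketch shows where a heap dump from this template ends up. It is hypothetical helper code, not part of the sample image. subPath mirrors the subPathExpr setting, defaultDumpFileName follows the HotSpot convention of naming the file java_pid<pid>.hprof when -XX:HeapDumpPath points to a directory, and vmOption reads the effective flag values of the running JVM through the com.sun.management.HotSpotDiagnosticMXBean interface, which is available on HotSpot-based JDKs. Because java is typically the first process in the container, its PID is 1, which matches the java_pid1.hprof file name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that mirrors where the sample Deployment writes its heap dump. */
public final class HeapDumpLocation {

    private HeapDumpLocation() {}

    /** Mirrors subPathExpr: $(POD_NAMESPACE).$(POD_NAME) from the Deployment template. */
    public static String subPath(String namespace, String podName) {
        return namespace + "." + podName;
    }

    /** Default file name that HotSpot uses when -XX:HeapDumpPath names a directory. */
    public static String defaultDumpFileName(long pid) {
        return "java_pid" + pid + ".hprof";
    }

    /** Location of the dump relative to the root of the NAS file system behind cnfs-nas-pvc. */
    public static Path pathOnVolume(String namespace, String podName, long pid) {
        return Paths.get(subPath(namespace, podName), defaultDumpFileName(pid));
    }

    /** Reads a HotSpot option of the running JVM, such as HeapDumpPath. */
    public static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // In the pod, POD_NAMESPACE and POD_NAME are injected through the Downward API.
        String ns = System.getenv().getOrDefault("POD_NAMESPACE", "default");
        String pod = System.getenv().getOrDefault("POD_NAME", "java-application-76d8cd95b7-prrl2");
        long pid = ProcessHandle.current().pid();

        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath (in container) = " + vmOption("HeapDumpPath"));
        System.out.println("Dump on NAS volume = " + pathOnVolume(ns, pod, pid));
    }
}
```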
Go to the Event Center page of the Container Service for Kubernetes (ACK) console. If a Back-off restarting warning event appears for java-application, the container has crashed and is being restarted, which in this example indicates that an OOM error occurred in the java-application application.
Log on to the ACK console.
In the left-side navigation pane of the ACK console, click Clusters.
On the Clusters page, find the cluster that you want to manage and click the name of the cluster or click Details in the Actions column. The details page of the cluster appears.
In the left-side navigation pane of the cluster details page, open the Event Center page.
To view, upload, and download files in the NAS file system from a web page, you can deploy a File Browser application and mount the NAS file system to its /rootdir directory. Then, run the kubectl port-forward command to map the container port of File Browser to your on-premises machine. This way, you can use your browser to access the files in the NAS file system.

Use the following template to create the ConfigMap that configures File Browser and the File Browser Deployment. The ConfigMap sets File Browser to listen on port 80.
cat << EOF | kubectl apply -f -
apiVersion: v1
data:
  .filebrowser.json: |
    {
      "port": 80
    }
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/instance: filebrowser
    app.kubernetes.io/name: filebrowser
  name: filebrowser
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: filebrowser
    app.kubernetes.io/name: filebrowser
  name: filebrowser
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: filebrowser
      app.kubernetes.io/name: filebrowser
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: filebrowser
        app.kubernetes.io/name: filebrowser
    spec:
      containers:
      - image: docker.io/filebrowser/filebrowser:v2.18.0
        imagePullPolicy: IfNotPresent
        name: filebrowser
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        resources: {}
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /.filebrowser.json
          name: config
          subPath: .filebrowser.json
        - mountPath: /db
          name: rootdir
        - mountPath: /rootdir
          name: rootdir
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: filebrowser
        name: config
      - name: rootdir
        persistentVolumeClaim:
          claimName: cnfs-nas-pvc
EOF
Expected output:
configmap/filebrowser created
deployment.apps/filebrowser created
Map port 80 of File Browser to your on-premises machine.
kubectl port-forward deployment/filebrowser 8080:80
Expected output:
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
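Before you open the browser, you can optionally confirm that the port forwarding works. The following Java class is a minimal sketch that uses the JDK's java.net.http.HttpClient (Java 11 or later). The class name FileBrowserProbe is hypothetical, and local port 8080 matches the kubectl port-forward command. It returns the HTTP status code that File Browser answers with, or -1 if nothing listens on the local port.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical check that the kubectl port-forward to File Browser is reachable. */
public final class FileBrowserProbe {

    private FileBrowserProbe() {}

    /** Local URL that the port-forward exposes. */
    public static URI uri(int localPort) {
        return URI.create("http://127.0.0.1:" + localPort + "/");
    }

    /** Returns the HTTP status code, or -1 if nothing answers on the local port. */
    public static int probe(int localPort) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri(localPort))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException e) {
            return -1; // connection refused or timed out: the port-forward is not running
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return -1;
        }
    }

    public static void main(String[] args) {
        // 8080 is the local port in: kubectl port-forward deployment/filebrowser 8080:80
        int status = probe(8080);
        System.out.println(status == -1
                ? "port-forward not reachable"
                : "File Browser answered HTTP " + status);
    }
}
```

Run it in a second terminal while kubectl port-forward is still running, because the forwarding stops when that command exits.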
Open your browser, enter 127.0.0.1:8080 in the address bar, and then press Enter. The File Browser logon page appears. Enter the default username (admin) and password (admin). Then, click Login.
The cnfs-nas-pvc PVC is mounted to the /rootdir directory of File Browser. Double-click rootdir to open the NAS file system.
Result
On the File Browser page, find the directory that is created for the java-application pod, for example, default.java-application-76d8cd95b7-prrl2. The directory is named based on the subPathExpr: $(POD_NAMESPACE).$(POD_NAME) configuration, so the pod name suffix in your cluster is different.
Navigate to this directory and find the heap dump file java_pid1.hprof. To locate the code that triggers the OOM error, download java_pid1.hprof to your on-premises machine and open it in Eclipse Memory Analyzer Tool (MAT). MAT lets you inspect the objects that occupy the heap and the thread stacks that are recorded in the dump.
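After you download java_pid1.hprof, you can check that the file is a binary HPROF heap dump before you open it in MAT. A binary HPROF file starts with a NUL-terminated format string such as JAVA PROFILE 1.0.2. The following class is a minimal sketch based on that header. The class name HprofHeaderCheck is hypothetical, and the check inspects only the first bytes, so it does not detect a dump that was truncated later in the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-check before opening a downloaded dump in Eclipse MAT. */
public final class HprofHeaderCheck {

    private static final String MAGIC_PREFIX = "JAVA PROFILE 1.0.";

    private HprofHeaderCheck() {}

    /** Returns true if the bytes start with a NUL-terminated HPROF format string. */
    public static boolean isHprof(byte[] head) {
        int nul = -1;
        for (int i = 0; i < head.length; i++) {
            if (head[i] == 0) {
                nul = i;
                break;
            }
        }
        if (nul < 0) {
            return false; // no terminator in the bytes read: not an HPROF header
        }
        String format = new String(head, 0, nul, StandardCharsets.US_ASCII);
        return format.startsWith(MAGIC_PREFIX);
    }

    /** Reads the first 32 bytes of a file and checks the HPROF header. */
    public static boolean isHprof(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return isHprof(in.readNBytes(32));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dump = Paths.get(args.length > 0 ? args[0] : "java_pid1.hprof");
        System.out.println(dump + (isHprof(dump)
                ? " looks like an HPROF heap dump"
                : " is not an HPROF file"));
    }
}
```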