Continuous Profiler Agent is a Java agent developed by the Alibaba Cloud JVM team to collect performance data. The agent has been tested in large-scale production environments and provides high performance and stability. You can use Logtail to collect the performance data that Continuous Profiler Agent reports from Java programs to the Full-stack Observability application for visualized monitoring and analysis.
Prerequisites
A Full-stack Observability instance is created. For more information, see Create an instance.
Limits
Only Linux Logtail V1.7 or later is supported.
The following Linux distributions are supported: CentOS, Red Hat, Alibaba Cloud Linux, Ubuntu, and Debian. The kernel version must be 2.6.32-431.23.3.el6.x86_64 or later. Both GNU C Library (glibc) and musl libc are supported.
The following table describes the supported JDK versions.
Engine type | CPU | Memory |
AUTO engine | OpenJDK 8u272 and later, JDK 11, and JDK 17 are supported. OracleJDK 11 and OracleJDK 17 are supported. OracleJDK 8 is not supported. | OpenJDK 8u352 and later, OpenJDK 11.0.17 and later, and OpenJDK 17.0.5 and later are supported. OracleJDK 11.0.21 and later, and OracleJDK 17.0.9 and later are supported. OracleJDK 8 is not supported. |
async_profiler engine | OpenJDK 8, OpenJDK 11, OpenJDK 17, OracleJDK 8, OracleJDK 11, and OracleJDK 17 are supported. | OpenJDK 8, OpenJDK 11, OpenJDK 17, OracleJDK 8, OracleJDK 11, and OracleJDK 17 are supported. |
Resource consumption description
In most scenarios, the performance overhead for Java programs is less than 5%.
Step 1: Create a Logtail configuration
Log on to the Simple Log Service console.
In the Log Application section, click the Intelligent O&M tab. Then, click Full-stack Observability.
On the Simple Log Service Full-stack Observability page, click the instance that you want to manage.
In the left-side navigation pane, click Performance Monitoring.
If this is the first time you use Performance Monitoring in the instance, click Enable.
In the left-side navigation tree, click Data Import. On the Data Access Configurations page, find Common Push Import in the Performance Monitoring section.
The first time you create a Logtail configuration for this type of performance data, turn on the switch to go to the configuration page. If you have already created a Logtail configuration, click the icon to go to the configuration page.
Create a machine group.
If a machine group is available, click Use Existing Machine Groups.
If no machine groups are available, perform the following steps:
Check your server type.
If you use an Elastic Compute Service (ECS) instance that belongs to the same Alibaba Cloud account as Simple Log Service, click the ECS Instances tab, select Manually Select Instances and your ECS instance, and then click Create.
For more information, see Install Logtail on ECS instances.
If your server is an ECS instance that belongs to another Alibaba Cloud account, a server provided by a third-party cloud service provider, or a server deployed in a self-managed data center, you must manually install Linux Logtail V1.7 or later on the server. For more information, see Install Logtail on a Linux server.
ImportantAfter you manually install Logtail, you must configure a user identifier for the server. For more information, see Configure a user identifier.
If you use a Kubernetes cluster, install Logtail components by following the instructions in Collect monitoring data about Kubernetes resources.
After Logtail is installed, click Complete Installation.
In the Create Machine Group step, configure the Name parameter and click Next.
Simple Log Service allows you to create IP address-based machine groups and custom identifier-based machine groups. For more information, see Create an IP address-based machine group and Create a custom identifier-based machine group.
ImportantIf you install Logtail in a Kubernetes cluster, a machine group named in the {instanceId}-{clusterId}-k8s-cluster format is automatically generated. You can skip this step.
In the Machine Group Settings step, move your server from the Source Server Groups section to the Applied Server Groups section and click Next.
ImportantIf you enable a machine group immediately after you create the machine group, the heartbeat status of the machine group may be FAIL. This issue occurs because the machine group is not connected to Simple Log Service. To resolve this issue, you can click Automatic Retry. If the issue persists, see What do I do if a Logtail machine group has no heartbeats?
In the Specify Data Source step, configure the parameters and click Complete. The following table describes the parameters.
Parameter
Description
Config Name
The name of the Logtail configuration. You can enter a custom name.
Cluster
The name of the cluster. You can enter a custom name.
After you configure this parameter, Simple Log Service adds a cluster=<Cluster name> tag to the performance data that is collected by using the Logtail configuration.
Important: Make sure that the cluster name is unique. Otherwise, data conflicts may occur.
Address
The address for data collection. The default value is http://:4040, where 4040 is the default port of Pyroscope. If you retain the default value, the HTTP server uses the local address.
If you use an ECS instance, specify the value in the following format: IP address of the ECS instance:4040.
If you use a server that resides in a Kubernetes cluster, set the value to logtail-kubernetes-metrics.sls-monitoring:4040.
If you use a server that is from a third-party cloud service provider or a data center, specify the value in the following format: IP address of the server:4040.
Endpoint
The default endpoint of Pyroscope. Default value: /ingest.
Read Timeout Period
The timeout period for data read operations. Default value: 10. Unit: seconds.
Maximum Body Size
The maximum size of data that can be collected.
After you configure the settings, Simple Log Service automatically creates assets such as Metricstores. For more information, see Assets.
Step 2: Download a Java agent
Regions in China
wget https://logtail-release-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/jvm/continuous-profile-collector-agent-1.9.0.jar
Regions outside China
wget https://logtail-release-ap-southeast-1.oss-ap-southeast-1.aliyuncs.com/jvm/continuous-profile-collector-agent-1.9.0.jar
Step 3: Configure a Java program to push performance data
Configure the Java program by using JVM parameters
java \
-Dprofiling.app.name=your_service_name \
-Dprofiling.agent.upload.server="http://{host}:{port}" \
-Dprofiling.cpu.engine={engine} \
-javaagent:{path for javaagent} \
-jar demo.jar
Parameter | Description |
profiling.app.name | The name of the service. |
profiling.agent.upload.server | The address for data upload. |
profiling.cpu.engine | The engine used for CPU hotspot monitoring. Default value: off. Valid values: auto, async_profiler, jfr, and off. The value off disables CPU hotspot monitoring; any other value enables it. We recommend that you set the value to auto. |
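As an illustration, the template above can be filled in with concrete values. The service name, the agent path, and the upload address in the following sketch are assumptions, not required values; the snippet assembles the command into a variable and prints it so that you can inspect it before running it.

```shell
# Sketch of a concrete launch command. The service name "order-service",
# the agent path, and the upload address are illustrative assumptions.
AGENT_JAR=/opt/agent/continuous-profile-collector-agent-1.9.0.jar
LAUNCH_CMD="java \
 -Dprofiling.app.name=order-service \
 -Dprofiling.agent.upload.server=http://localhost:4040 \
 -Dprofiling.cpu.engine=auto \
 -javaagent:${AGENT_JAR} \
 -jar demo.jar"
# Print the assembled command for inspection before running it.
echo "$LAUNCH_CMD"
```

Running the printed command starts the Java program with CPU hotspot monitoring enabled through the AUTO engine.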
Configure the Java program by using environment variables
export PROFILING_APP_NAME="your_service_name"
export PROFILING_AGENT_UPLOAD_SERVER="http://{host}:{port}"
export PROFILING_CPU_ENGINE="{engine}"
export PROFILING_ALLOC_ENGINE="{engine}"
Parameter | Description |
PROFILING_APP_NAME | The name of the service. |
PROFILING_AGENT_UPLOAD_SERVER | The address for data upload. |
PROFILING_CPU_ENGINE | The engine used for CPU hotspot monitoring. Default value: off. Valid values: auto, async_profiler, jfr, and off. The value off disables CPU hotspot monitoring; any other value enables it. We recommend that you set the value to auto. |
PROFILING_ALLOC_ENGINE | The engine used for Alloc hotspot monitoring, which monitors memory allocation hotspots. Default value: off. Valid values: auto, async_profiler, jfr, and off. We recommend that you set the value to auto. |
Note the following rules when you specify the upload address:
Do not start the address with http. The system automatically adds the http prefix to the address.
Do not end the address with a forward slash (/). The system automatically appends a forward slash (/) to the address.
The suffix of the uploaded file depends on the compression mode:
none: The file is not compressed and is suffixed with .jfr.
gzip: The file is compressed and is suffixed with .jfr.gzip.
The following examples show how to specify the wall clock thread filter:
Empty: ""
Single thread: 123
Multiple threads: 122,123
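The two address rules above can be applied mechanically before you export the variable. The following is a minimal sketch for a POSIX shell; the function name normalize_addr is illustrative and is not part of the agent.

```shell
# Illustrative helper: strip a leading "http://" and a trailing "/" from an
# address so that it follows the rules above. Not part of the agent itself.
normalize_addr() {
  addr="$1"
  addr="${addr#http://}"  # the system adds the http prefix by itself
  addr="${addr%/}"        # the system appends the trailing slash by itself
  printf '%s\n' "$addr"
}

normalize_addr "http://10.0.0.1:4040/"   # prints 10.0.0.1:4040
```

An address that already follows the rules, such as 10.0.0.1:4040, passes through unchanged.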
Remarks
JVM parameter | Environment variable | Description |
profiling.app.name | PROFILING_APP_NAME | The name of the application. |
profiling.agent.upload.server | PROFILING_AGENT_UPLOAD_SERVER | The address of the server to which the Java Flight Recorder (JFR) file is uploaded. Default value: http://localhost:4040. |
profiling.agent.timeout | PROFILING_AGENT_TIMEOUT | The timeout period for uploading the JFR file. Default value: 10. Unit: seconds. |
profiling.agent.ingest.max.tries | PROFILING_AGENT_INGEST_MAX_TRIES | The maximum number of retries that are allowed for uploading the JFR file. Default value: 2. |
profiling.app.http.headers | PROFILING_APP_HTTP_HEADERS | The HTTP header that is used when you upload the JFR file. This parameter is empty by default. Example: SESSION_ID=1111;XXX=YYY. |
profiling.app.labels | PROFILING_APP_LABELS | The tag that is added to the JFR file when you upload the JFR file. This parameter is empty by default. Example: |
profiling.agent.log.level | PROFILING_AGENT_LOG_LEVEL | The log level. Default value: info. Valid values: info, debug, and error. |
profiling.agent.log.file | PROFILING_AGENT_LOG_FILE | The path to the log file. Example: /path/to/profiling.log. By default, logs are written to Java stdout and stderr. |
profiling.period | PROFILING_PERIOD | The interval at which performance data is uploaded. Default value: 1. Unit: minutes. |
profiling.delay | PROFILING_DELAY | The delay before performance monitoring starts. Unit: seconds. Default value: 0, which indicates that performance monitoring starts immediately after the performance monitoring engine is enabled. If you set the value to N, performance monitoring starts N seconds after the engine is enabled. |
profiling.start.at.zero.second | PROFILING_START_AT_ZERO_SECOND | Specifies whether to start performance monitoring at the 0th second of every minute. If you want to start performance monitoring at the 0th second of every minute, set the value to true. For example, if the value is set to true and the current time is 30 seconds of the current minute, the system automatically waits for 30 seconds before it starts performance monitoring. Default value: false. |
profiling.compression.mode | PROFILING_COMPRESSION_MODE | The compression mode. Default value: none. Valid values: gzip and none. |
profiling.trigger.mode | PROFILING_TRIGGER_MODE | The trigger mode. You can trigger periodic or one-time performance monitoring. Default value: periodic. Valid values: periodic and api. We recommend that you set the value to periodic in agent mode. |
profiling.output.format | PROFILING_OUTPUT_FORMAT | The format of the file. Default value: jfr. Valid values: jfr and collapsed. |
profiling.cpu.engine | PROFILING_CPU_ENGINE | The engine used for CPU hotspot monitoring. Default value: off. Valid values: auto, async_profiler, jfr, and off. The value off specifies that CPU hotspot monitoring is disabled. Other values specify that CPU hotspot monitoring is enabled. We recommend that you set the value to auto. |
profiling.cpu.interval | PROFILING_CPU_INTERVAL | The interval at which CPU hotspot monitoring is performed. A small value increases the overhead. Default value: 10. Unit: milliseconds. |
profiling.wallclock.engine | PROFILING_WALLCLOCK_ENGINE | The engine used for the monitoring of wall clock hotspots. Default value: off. Valid values: auto, async_profiler, and off. The value off specifies that the monitoring of wall clock hotspots is disabled. Other values specify that the monitoring of wall clock hotspots is enabled. We recommend that you set the value to off. |
profiling.wallclock.interval | PROFILING_WALLCLOCK_INTERVAL | The interval at which the monitoring of wall clock hotspots is performed. A small value increases the overhead. Default value: 20. Unit: milliseconds. |
profiling.wallclock.thread.filter | PROFILING_WALLCLOCK_THREAD_FILTER | The thread filter used for the monitoring of wall clock hotspots. Default value: 0, which indicates that no thread filter is applied. Example values: an empty string (""), a single thread ID (123), multiple thread IDs (122,123), or a thread ID range (122 to 134). |
profiling.wallclock.threads.per.tick | PROFILING_WALLCLOCK_THREADS_PER_TICK | The maximum number of threads used to monitor wall clock hotspots. Default value: 8. |
profiling.alloc.engine | PROFILING_ALLOC_ENGINE | The engine used for Alloc hotspot monitoring. Default value: off. Valid values: auto, async_profiler, jfr, and off. Alloc hotspot monitoring refers to the monitoring of memory request hotspots. The value off specifies that Alloc hotspot monitoring is disabled. Other values specify that Alloc hotspot monitoring is enabled. We recommend that you set the value to auto. |
profiling.alloc.interval | PROFILING_ALLOC_INTERVAL | The interval at which Alloc hotspot monitoring is performed. A small value increases the overhead. Default value: 256. Unit: KB. |
profiling.jfr.max.size | PROFILING_JFR_MAX_SIZE | The upper limit of the size for the JFR file. If the size reaches the upper limit, the data in the file is automatically discarded. Default value: 64m. Example values: 256k and 10m. |
profiling.jfr.max.age | PROFILING_JFR_MAX_AGE | The upper limit of the age for the JFR file. If the age reaches the upper limit, the data in the file is automatically discarded. Default value: 10m. Example values: 1m, 1h, and 1d. |
profiling.jfr.max.stack.depth | PROFILING_JFR_MAX_STACK_DEPTH | The maximum stack depth that is allowed during JFR sampling. Default value: 64. |
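Each JVM parameter in the table maps to the environment variable in the same row, so a fuller configuration can be expressed as a block of exports. The following sketch uses illustrative values only; the service name, paths, and intervals are assumptions, not recommendations for your workload.

```shell
# Illustrative configuration via environment variables. All values below are
# examples drawn from the table above, not recommendations.
export PROFILING_APP_NAME="order-service"                    # service name shown in the console
export PROFILING_AGENT_UPLOAD_SERVER="http://localhost:4040" # default upload address
export PROFILING_CPU_ENGINE="auto"        # enable CPU hotspot monitoring
export PROFILING_CPU_INTERVAL="10"        # CPU sampling interval in milliseconds
export PROFILING_ALLOC_ENGINE="auto"      # enable Alloc hotspot monitoring
export PROFILING_COMPRESSION_MODE="gzip"  # uploaded files are suffixed with .jfr.gzip
export PROFILING_JFR_MAX_SIZE="64m"       # discard JFR data beyond this size
export PROFILING_PERIOD="1"               # upload interval in minutes
# Then start the application with the agent, for example:
# java -javaagent:/path/to/continuous-profile-collector-agent-1.9.0.jar -jar demo.jar
```

Environment variables are convenient when you cannot edit the launch command directly, for example in a container entrypoint or a systemd unit.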
What to do next
After you collect the performance data from Java programs to Full-stack Observability, you can use the performance monitoring feature to troubleshoot performance issues. For more information, see Data query and Data comparison.