Scenario | Problem description | Possible cause | Solution |
Wait for scheduling resources | Problem description 1: The logs of a synchronization task show that the task is waiting for gateway resources. Problem description 2: The General tab of the directed acyclic graph (DAG) of an instance generated for an auto triggered task shows that the instance waits for resources for a long period of time.
| A resource group for scheduling issues the synchronization task to a compute engine instance, where the task runs. If the number of synchronization tasks that run in parallel on the resource group for scheduling reaches the upper limit, the current task must wait until running tasks finish and release their resources. | On the Intelligent Diagnosis page, view the tasks that are using the resources of the resource group while the current task is waiting for resources. Note: If you use the shared resource group for scheduling, we recommend that you migrate the task to an exclusive resource group for scheduling. |
Wait for resources in a resource group for Data Integration | The logs of a synchronization task show that the task is in the WAIT state. | The resource group for Data Integration that is used to run the current synchronization task does not have enough remaining resources to run the task. For example, a resource group for Data Integration supports a maximum of eight parallel threads, and three synchronization tasks, each configured with three parallel threads, are assigned to the resource group. If two of the tasks run in parallel, they occupy six threads and only two threads remain. The third task requires three threads, so it must wait for resources, and its logs show that the task is in the WAIT state. | Other synchronization tasks in the resource group for Data Integration are using a large number of resources. Use one of the following methods to resolve the issue:
Check whether the tasks that are using the resources in the resource group for Data Integration are blocked or have slowed down significantly. If the tasks have issues, resolve the issues or suspend the tasks. If the tasks are not blocked, wait until they finish running and release their resources, and then run the current task.
Find the tasks that are using the resources in the resource group for Data Integration and ask the owners of the tasks to reduce the number of parallel threads for the tasks.
Reduce the number of parallel threads that you specified for the current batch synchronization task. Then, commit and deploy the task again.
Scale out the resource group for Data Integration. For more information, see Scale out or in a resource group.
Note: On the Intelligent Diagnosis page, you can view the resource usage and the tasks that are using the resources of the resource group while the current task is waiting for resources. The maximum number of parallel threads supported by a resource group for Data Integration varies based on the specifications of the resource group. For more information, see Overview of an exclusive resource group for Data Integration.
|
Slow data synchronization speed | The logs of a synchronization task show that the task is in the RUN state but the data synchronization speed is 0. If the task remains in this state for an extended period of time, we recommend that you click the link to the right of Detail log url to view the execution details of the task. A large value of the WaitReaderTime parameter in the task logs indicates that the task spends a long time waiting for the source to return data. | An inappropriate shard key is specified. The SQL statements that are generated based on the shard key to read data from the source database take an extended period of time to execute.
The SQL statements that are used to read data from the source database take an extended period of time to execute. For some Readers, you must configure parameters such as where and querySql, and these parameters can slow down the execution of the SQL statements. For example, a full table scan slows down data synchronization if the columns used in the WHERE clause are not indexed.
Database loads are high when the synchronization task is run.
Network issues exist, such as limited bandwidth (throughput) or low network speed. Note: If you synchronize data over the Internet, the data synchronization speed cannot be ensured. | |
The logs of a synchronization task show that the task is in the RUN state but the data synchronization speed is 0. If the task remains in this state for an extended period of time, we recommend that you click the link to the right of Detail log url to view the execution details of the task. A large value of the WaitWriterTime parameter in the task logs indicates that the task spends a long time writing data to the destination. | The SQL statements that you configure to execute before and after data is written to the destination database take an extended period of time to execute. For some Writers, you must configure parameters such as preSql and postSql, and these parameters can slow down the execution of the SQL statements.
Database loads are high when the synchronization task is run.
Network issues exist, such as limited bandwidth (throughput) or low network speed. Note: If you synchronize data over the Internet, the data synchronization speed cannot be ensured. |
The logs of a synchronization task show that the task is in the RUN state but the data synchronization speed is slow. | An inappropriate shard key is specified for a synchronization task that synchronizes data from a relational database. In this case, the parallelism settings do not take effect, and the task uses a single thread to synchronize data.
The maximum number of parallel threads is set to a small value.
A large amount of dirty data is generated during data synchronization, which reduces the synchronization speed.
Database performance issues exist. Note: A database that has better performance can support more parallel threads. In this case, you can specify a larger parallelism value for a synchronization task.
Network issues exist, such as limited bandwidth (throughput) or low network speed. Note: If you synchronize data over the Internet, the data synchronization speed cannot be ensured. | Specify an appropriate shard key. For more information about how to specify a shard key for a synchronization task, see View run logs generated for a batch synchronization task.
Specify the maximum number of parallel threads for each synchronization task that is run on a resource group for Data Integration based on the total number of parallel threads that the resource group supports. You can increase the number of parallel threads for the current synchronization task based on your business requirements. You can configure the parallelism for a synchronization task on the codeless UI or in the code editor.
Process dirty data. For more information about the definition of dirty data, see Terms.
If synchronization tasks are run in distributed execution mode, make sure that the number of parallel threads for the tasks divided by the number of ECS instances in the resource group for Data Integration is less than or equal to the maximum number of parallel threads that a single ECS instance in the resource group supports.
If you want to synchronize data across clouds or regions, we recommend that you synchronize data over an internal network after network connections are established. For information about network connectivity solutions, see Establish a network connection between a resource group and a data source.
Check database loads.
|
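
The thread arithmetic in the table above can be checked with a few lines of code. The following is a minimal sketch, not part of any DataWorks API: the function names (remaining_threads, must_wait, fits_distributed) are hypothetical, and the sample values 24, 3, and 2 in the distributed-mode check are made up for illustration. It reproduces the eight-thread example from the WAIT scenario and the per-instance check described for distributed execution mode.

```python
from typing import Sequence


def remaining_threads(max_threads: int, running: Sequence[int]) -> int:
    """Return the threads left in a resource group after the running
    tasks have taken the threads they are configured with."""
    return max_threads - sum(running)


def must_wait(max_threads: int, running: Sequence[int], requested: int) -> bool:
    """Return True if a task that needs `requested` threads cannot start
    because fewer threads than that remain in the resource group."""
    return requested > remaining_threads(max_threads, running)


def fits_distributed(total_threads: int, instance_count: int,
                     max_per_instance: int) -> bool:
    """Check the distributed-mode rule: total threads divided by the
    number of instances must not exceed the per-instance maximum."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    return total_threads / instance_count <= max_per_instance


if __name__ == "__main__":
    # Example from the table: 8 threads in total, two running tasks
    # with 3 threads each, and a third task that also needs 3 threads.
    print(remaining_threads(8, [3, 3]))   # 2
    print(must_wait(8, [3, 3], 3))        # True

    # Hypothetical distributed-mode values: 24 threads over 3 or 2
    # instances, each instance supporting at most 8 threads.
    print(fits_distributed(24, 3, 8))     # True
    print(fits_distributed(24, 2, 8))     # False
```

In the WAIT example, reducing the third task to two parallel threads would make must_wait return False, which corresponds to the solution of reducing the parallel threads for the current task.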