
PostgreSQL Best Practices: Overcoming Ride-Hailing Conflicts Arising in a Trip Dispatch System

In this article, the author discusses optimizing the PostgreSQL database to overcome ride-hailing conflicts in crowded areas during peak hours.

By digoal

Background

During peak hours, many people in a certain area may book a ride at the same time. For example, a lot of people may book a ride outside an office building when work shifts end or when a concert ends at a stadium.

In these scenarios, a request is initiated from the same location, the nearest car is found, and then the car is locked.

These scenarios can lead to lock conflicts in the database. A previous article describes how advisory locks can significantly improve database throughput. However, there is still a hidden drawback: if everyone requests the nearest car from the same location, redundant scanning and filtering occur. (After A locks the nearest car, B must skip it and lock the second-nearest car, C must skip the first two and lock the third, and so forth.) The higher the concurrency, the more cars each request has to skip. Resolving this drawback improves performance greatly.

How to resolve this drawback?

Optimization method: When several people book a ride at the same time and place, and the same nearest-car rule is applied, the same car is selected repeatedly. Advisory locks, as used in this article, prevent row lock conflicts, but scanning is still performed from near to far, so under concurrent requests each session scans multiple rows before it finds an unlocked one. A better optimization method is therefore required.

One method is similar to "group commit" in a database: multiple cars are selected at once and allocated by the system to the people who booked a ride from the same place. Another method can be implemented entirely inside the database: a discrete (random) factor is introduced, so the car selected each time is not necessarily the nearest one.

Cars within a specified distance can be selected randomly, but the selection must happen inside the index, and ideally only one candidate row is touched during both the index scan and the heap scan. This is similar to randomly scanning indexed points.

Starting from the Big Bang


According to the big bang theory, the beginning of the universe was a point. After the big bang, the point gradually expanded and formed the present universe.

The ride-hailing situation for DiDi is similar. During peak hours, many people from an office building concentrate at one point, and available DiDi cars are on the periphery or nearby. As mentioned earlier, if the distance to a car is calculated from this single point for all these people, the nearest car is the same for everyone. Therefore, conflicts occur.


Optimization Idea

You can scatter concentrated points in an office building to prevent conflicts when the system searches for the nearest cars. This reduces the filtering effort, improves the search performance, and increases the throughput.

Example

This article provides only one example because the methods for spatial data processing are similar (specify a discrete range for X and Y axes to scatter the data).

Assume that we have 10 million data records where the ID is the primary key. The requirement is to find the ID closest to an input value and lock it. If that ID is already locked, lock the next-closest ID.

For example, if the input value is 50, lock 50 first. If 50 has been locked by another session, lock 49 or 51, and so forth.
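
To make this ordering concrete: once the test table and the btree_gist index from the steps below are in place, the distance ordering alone (without any locking) can be inspected directly. The first row returned is the candidate to lock, and the following rows are the fallbacks.

postgres=# select id from a order by id <-> 50 limit 3;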

1) Create a test table.

postgres=# create table a(id int primary key, info text, crt_time timestamp);    
CREATE TABLE    

2) Write 10 million test data records into the table.

postgres=# insert into a select generate_series(1,10000000), 'test', now();

3) Create a GiST index that supports distance search operators.

postgres=# create extension btree_gist;    
CREATE EXTENSION    
postgres=# create index idx_a_1 on a using gist(id);    
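
Before benchmarking, you may want to confirm that the planner can use this GiST index for KNN ordering on id (plain gist(id) on an integer column would not work without btree_gist). A quick check, output omitted here, is:

postgres=# explain select * from a order by id <-> 5000000 limit 1;

The plan should show an index scan on idx_a_1 with "Order By: (a.id <-> 5000000)", just like the execution plans shown later in this article.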

Create a test script to simulate the ride-hailing scenario during peak hours: Enter the same point to lock the closest point. If the closest point has been locked, skip it and lock the point next to it. In this example, we use pg_try_advisory_xact_lock() to skip locked rows.

vi test.sql    
    
select * from a where pg_try_advisory_xact_lock(id) order by id <-> 5000000 for update limit 1;    
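
In a real dispatch flow, this query would typically run inside a short transaction that also records the assignment, since pg_try_advisory_xact_lock() only holds the lock until the transaction ends. The following is only a sketch of such a flow, assuming the assignment is written into the existing info column and that :car_id is a hypothetical variable holding the id returned by the query:

begin;
select id from a where pg_try_advisory_xact_lock(id) order by id <-> 5000000 for update limit 1;
-- the application reads the returned id into :car_id (hypothetical), then marks the car
update a set info = 'assigned' where id = :car_id;
commit;  -- the advisory xact lock is released automatically at commit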

Start the stress test. The throughput reaches about 49,000 TPS.

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120    
    
progress: 3.0 s, 45775.8 tps, lat 1.224 ms stddev 0.828    
progress: 4.0 s, 45571.5 tps, lat 1.229 ms stddev 0.826    
progress: 5.0 s, 49345.6 tps, lat 1.135 ms stddev 0.747    
progress: 6.0 s, 48948.0 tps, lat 1.144 ms stddev 0.856    
progress: 7.0 s, 49578.2 tps, lat 1.129 ms stddev 0.758    

Analyzing the Optimizable Aspects: Filtering

As previously mentioned, there is one problem when concurrent requests are initiated from the same point to search for and lock the nearest car. After A locks the first nearby car, B must skip the first car and lock the second nearest car, C must skip the first two and lock the third nearest car, and so forth. More cars are skipped if more parallel requests are received.

The following SQL statements can help understand this problem.

1) Session A

postgres=# begin;    
BEGIN    
    
    
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a where pg_try_advisory_xact_lock(id) order by id <-> (5000000-500000+100) for update limit 1;    
                                                                QUERY PLAN                                                                    
------------------------------------------------------------------------------------------------------------------------------------------    
 Limit  (cost=0.42..0.54 rows=1 width=27) (actual time=0.096..0.097 rows=1 loops=1)    
   Output: id, info, crt_time, ((id <-> 4500100)), ctid    
   Buffers: shared hit=5    
   ->  LockRows  (cost=0.42..397168.88 rows=3333333 width=27) (actual time=0.095..0.095 rows=1 loops=1)    
         Output: id, info, crt_time, ((id <-> 4500100)), ctid    
         Buffers: shared hit=5    
         ->  Index Scan using idx_a_1 on public.a  (cost=0.42..363835.55 rows=3333333 width=27) (actual time=0.092..0.092 rows=1 loops=1)    
               Output: id, info, crt_time, (id <-> 4500100), ctid    
               Order By: (a.id <-> 4500100)    
               Filter: pg_try_advisory_xact_lock((a.id)::bigint)    
               Buffers: shared hit=4    
 Planning time: 0.111 ms    
 Execution time: 0.135 ms    
(13 rows)    

2) Session B: Before Session A releases its lock, a request is initiated from the same point to lock the nearest point. Note the "Rows Removed by Filter: 1" line in the plan.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from a where pg_try_advisory_xact_lock(id) order by id <-> (5000000-500000+100) for update limit 1;    
                                                                QUERY PLAN                                                                    
------------------------------------------------------------------------------------------------------------------------------------------    
 Limit  (cost=0.42..0.54 rows=1 width=27) (actual time=0.128..0.128 rows=1 loops=1)    
   Output: id, info, crt_time, ((id <-> 4500100)), ctid    
   Buffers: shared hit=5    
   ->  LockRows  (cost=0.42..397168.88 rows=3333333 width=27) (actual time=0.127..0.127 rows=1 loops=1)    
         Output: id, info, crt_time, ((id <-> 4500100)), ctid    
         Buffers: shared hit=5    
         ->  Index Scan using idx_a_1 on public.a  (cost=0.42..363835.55 rows=3333333 width=27) (actual time=0.114..0.114 rows=1 loops=1)    
               Output: id, info, crt_time, (id <-> 4500100), ctid    
               Order By: (a.id <-> 4500100)    
               Filter: pg_try_advisory_xact_lock((a.id)::bigint)    
               Rows Removed by Filter: 1    
               Buffers: shared hit=4    
 Planning time: 0.112 ms    
 Execution time: 0.168 ms    
(14 rows)    

You can see in the preceding snippet that filtering is performed. Car #4500100 should be returned based on the distance, but it has already been locked. Therefore, the system skips it and looks for the next lockable point. Here, one row is removed by the filter.

The higher the number of parallel requests, the more locked rows each query must skip and remove by the filter, which hurts performance.
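
If you want to observe this while Session A and Session B keep their transactions open, a third session can list the advisory locks currently held. This is only a quick inspection query; the exact encoding of 64-bit advisory lock keys into the classid/objid columns is an implementation detail:

postgres=# select locktype, classid, objid, pid, granted from pg_locks where locktype = 'advisory';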

Optimization Method

Scatter the concentrated points over an area with a specified radius, for example, a 1 km radius. That is, the distance to the nearest car is not calculated from your actual location; instead, it is calculated from a random point within about one kilometer of your current location.

We have set a distance range in this example. Specifically, concentrated points (customers) are distributed to scattered points within 5,000 m of the concentrated points (5,000 ID units in this one-dimensional example). The nearest point (car) to each scattered point (customer) is then searched for and locked.

vi test.sql    
    
\set seed random(1,5000)    
select * from a where pg_try_advisory_xact_lock(id) order by id <-> (5000000+2500-:seed) for update limit 1;  

Stress testing result:

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120    
    
progress: 5.0 s, 150380.9 tps, lat 0.372 ms stddev 0.165    
progress: 6.0 s, 151711.9 tps, lat 0.369 ms stddev 0.168    
progress: 7.0 s, 152098.8 tps, lat 0.368 ms stddev 0.154    
progress: 8.0 s, 152003.3 tps, lat 0.368 ms stddev 0.156    
progress: 9.0 s, 152421.4 tps, lat 0.367 ms stddev 0.154    
progress: 10.0 s, 153108.7 tps, lat 0.366 ms stddev 0.148    
progress: 11.0 s, 151427.8 tps, lat 0.370 ms stddev 0.156    
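
As noted earlier, the same scatter idea applies to real two-dimensional spatial data. The following is only a sketch, not part of the benchmark above. It assumes a hypothetical cars(id, pos) table whose pos column is a PostGIS geometry point in a planar, meter-based coordinate system with a GiST index on pos (SRID handling is omitted for brevity; in practice the query point must use the same SRID as pos). The rider's concentrated point is jittered by up to 5,000 m on each axis before the KNN search:

\set dx random(-5000,5000)
\set dy random(-5000,5000)
-- (500000, 6000000) stands for the rider's concentrated point; dx/dy scatter it
select id from cars where pg_try_advisory_xact_lock(id) order by pos <-> st_translate(st_makepoint(500000, 6000000), :dx, :dy) for update limit 1;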

Summary

We can randomly scatter concentrated points (customers) over a plane of a specified range and then lock the nearby points (cars). In this test, the throughput more than triples, from 49,000 TPS to 151,000 TPS.

When you find something that can be optimized, go ahead and optimize it to improve performance. This idea applies to any application scenario that involves hotspot elimination.
