AMAP is a leading travel solution provider in China, and navigation is its core user scenario. Route planning, a prerequisite for navigation, produces a personalized travel plan based on the start point, end point, and route policy settings.
Start-point road tracking is a necessary step in route planning, and its accuracy is crucial to route-planning quality and user experience. This article describes how AMAP improved the accuracy of start-point road tracking, focusing on our exploration and practice of introducing machine learning algorithms.
Put simply, start-point road tracking uses the location information of a user who initiates a route planning request to connect the user's start point to the actual road on which the user is located.
The AMAP app provides the following three methods for selecting a start point during route planning:
1. Manual Selection: The user manually marks their location on the map.
2. Point of Interest (POI) Selection: The user selects a POI as the start point. In a geographic information system, a POI is a geographic location such as a store, residential area, or bus stop.
3. Automatic Location: The app automatically locates the user through GPS, base stations, or Wi-Fi.
Manual selection and POI selection produce more accurate location information than automatic location, which considerably improves the accuracy of start-point road tracking in those modes.
In automatic location mode, coordinates tend to drift because of the limited accuracy of GPS, base-station, and network positioning. The location reported by a positioning device may be several meters, dozens of meters, or even hundreds of meters away from the road on which the user is actually located. The core problem is therefore to identify the user's location accurately, down to the specific road, from limited information.
Before machine learning was introduced, candidate roads in start-point road tracking were ranked with manual rules. The core idea is to rank candidate roads by a weighted score computed mainly from distance, combined with angle, speed, and other features. The weights and thresholds in these rules were set by hand based on accumulated practical experience.
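To make the rule-based approach concrete, here is a minimal sketch of a weighted-score ranker. The feature names, the weights, and the use of speed as a gate for the heading term are hypothetical choices for illustration; they are not AMAP's actual rules or tuned values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    road_id: str
    distance_m: float      # distance from the located point to the road
    angle_diff_deg: float  # difference between user heading and road bearing
    speed_mps: float       # user speed at request time


# Hypothetical hand-tuned parameters (the real values came from experience).
W_DISTANCE = 1.0
W_ANGLE = 0.5
MIN_SPEED_FOR_HEADING_MPS = 2.0  # heading is only trusted above this speed


def rule_score(c: Candidate) -> float:
    """Lower is better: distance dominates; angle is added when heading is reliable."""
    score = W_DISTANCE * c.distance_m
    if c.speed_mps > MIN_SPEED_FOR_HEADING_MPS:
        score += W_ANGLE * c.angle_diff_deg
    return score


def rank_by_rules(candidates: List[Candidate]) -> List[Candidate]:
    """Sort candidate roads by ascending weighted score."""
    return sorted(candidates, key=rule_score)
```

Every threshold and weight in such a scheme must be retuned by hand whenever new scenarios appear, which is the limitation the following paragraphs describe.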
As AMAP's business, route planning requests, and scenarios keep growing, manual rules are becoming harder to apply. Key challenges include:
With big data and AI, using data to replace manual operations with automated processes that improve productivity has become an inevitable trend.
To improve on start-point road tracking based on manual rules, we introduced machine learning to learn the relationship between features and road tracking results automatically. AMAP has a unique advantage in acquiring training data for machine learning models: a large amount of both route planning data and real-life movement data. Because machine learning models are more expressive, they can learn complex relationships among features and therefore improve road-tracking accuracy.
This section describes how we built a machine learning model for start-point road tracking and how machine learning is applied to this practical problem.
Before applying a machine learning model, we must first abstract the problem mathematically.
The preceding figure shows a schematic of start-point road tracking: a user initiates a route planning request at point A, and the roads surrounding point A form a set B. The road on which the user is actually located is a unique element C of set B. Start-point road tracking is therefore the process of selecting, from the roads surrounding point A, the road on which the user is most likely located.
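One way to write this selection formally is shown below. The road network $\mathcal{R}$, the distance $d(A, b)$, the search radius $r$, the feature map $\mathbf{x}(A, b)$, and the learned scoring function $s_\theta$ are notation introduced here for illustration rather than taken from the original design.

```latex
B = \{\, b \in \mathcal{R} : d(A, b) \le r \,\}, \qquad
\hat{C} = \arg\max_{b \in B} \; s_\theta\big(\mathbf{x}(A, b)\big)
```

Training then fits $\theta$ so that the predicted road $\hat{C}$ agrees with the true road $C$ on as many requests as possible.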
This process resembles search and sorting, so we model it with these two techniques. It includes the following steps:
In short, start-point road tracking is defined as a supervised search-and-sort process. With the modeling target defined, we turn to data acquisition and feature engineering.
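The search-and-sort structure can be sketched as follows: a search step that collects every road within a radius of point A into the candidate set B, and a sort step that orders B by a pluggable score. This is a minimal sketch assuming coordinates are already projected to a local planar system in meters and roads are plain polylines; the function names are hypothetical, and a production system would use a spatial index instead of a linear scan.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # planar (x, y) in meters, assumed pre-projected
Road = List[Point]           # polyline with at least two points


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_road_distance(p: Point, road: Road) -> float:
    """Distance from p to the nearest segment of a polyline."""
    return min(point_segment_distance(p, road[i], road[i + 1])
               for i in range(len(road) - 1))


def search_candidates(a: Point, roads: Dict[str, Road],
                      radius_m: float) -> Dict[str, float]:
    """Search step: candidate set B = roads within radius_m of point A."""
    candidates = {}
    for road_id, road in roads.items():
        d = point_road_distance(a, road)
        if d <= radius_m:
            candidates[road_id] = d
    return candidates


def sort_candidates(candidates: Dict[str, float],
                    score: Callable[[str, float], float]) -> List[str]:
    """Sort step: rank candidate road ids by score, highest first."""
    return sorted(candidates, key=lambda rid: score(rid, candidates[rid]),
                  reverse=True)
```

For example, with `score=lambda rid, d: -d` the sort step reduces to nearest-road matching; a learned model simply replaces that score function.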
As a common saying in the industry goes, data and features determine the upper limit of machine learning, and models and algorithms only approach that limit. Data and features are therefore critical to the final outcome of a project.
To train a machine learning model for start-point road tracking, we need to acquire the following two types of data from raw data:
Truth-value data identifies the road on which a user who sends a route planning request is actually located. In start-point road tracking, the first issue machine learning must address is acquiring truth-value data. When a user initiates a route planning request at point A, the user's actual location cannot be determined because of the limited accuracy of positioning.
However, if real-life movement data is available for the user in the area around point A, we can match that movement data against the road network to generate a motion track, from which we can determine the road on which point A lies. By mining AMAP navigation request data, we combine users' real-life movement with their route planning information to build a dataset with a one-to-one mapping between requests and truth values.
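A minimal sketch of the labeling idea, assuming the motion track has already been map-matched into `(timestamp, road_id)` pairs. The rule used here (take the road of the first matched point within a hypothetical 60-second window after the request, and discard the request if that road is not a candidate) is an illustrative assumption, not AMAP's published procedure.

```python
from typing import Collection, List, Optional, Tuple

MatchedPoint = Tuple[float, str]  # (timestamp in seconds, road_id) after map matching


def truth_road(request_ts: float, candidate_ids: Collection[str],
               track: List[MatchedPoint],
               max_delay_s: float = 60.0) -> Optional[str]:
    """Return the truth road for one request, or None if it cannot be labeled."""
    for ts, road_id in sorted(track):
        if ts < request_ts:
            continue  # movement before the request is ignored
        if ts - request_ts > max_delay_s:
            return None  # first movement is too late to trust
        # Keep the label only if the matched road is in candidate set B.
        return road_id if road_id in candidate_ids else None
    return None


def label_candidates(candidate_ids: List[str], truth: str) -> List[int]:
    """One-hot labels over the candidate list: 1 for the truth road, else 0."""
    return [1 if rid == truth else 0 for rid in candidate_ids]
```

Requests that return `None` are simply left out, which keeps the request-to-truth mapping one-to-one.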
In the start-point road tracking model, we extract three categories of features to build sample sets: anchor point-related features, road features, and combined features derived from both categories.
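The three categories can be illustrated with a small feature builder. The specific fields (reported accuracy, speed, heading, road class, bearing, one-way flag) and the combined features (distance, heading-to-road angle, distance relative to reported accuracy) are hypothetical examples chosen to match the distance, angle, and speed signals mentioned earlier; the actual feature list is not given here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Anchor:
    accuracy_m: float   # reported location accuracy radius
    speed_mps: float
    heading_deg: float


@dataclass
class RoadInfo:
    road_class: int     # hypothetical integer encoding of road level
    bearing_deg: float  # direction of travel along the road
    one_way: bool


def heading_diff(h: float, b: float) -> float:
    """Smallest angle between two directions, in [0, 180] degrees."""
    d = abs(h - b) % 360.0
    return min(d, 360.0 - d)


def build_features(anchor: Anchor, road: RoadInfo,
                   distance_m: float) -> Dict[str, float]:
    angle = heading_diff(anchor.heading_deg, road.bearing_deg)
    if not road.one_way:
        # On a two-way road, travelling against the bearing is also aligned.
        angle = min(angle, 180.0 - angle)
    return {
        # Anchor point-related features
        "accuracy_m": anchor.accuracy_m,
        "speed_mps": anchor.speed_mps,
        # Road features
        "road_class": float(road.road_class),
        "one_way": float(road.one_way),
        # Combined features
        "distance_m": distance_m,
        "angle_diff_deg": angle,
        "distance_over_accuracy": distance_m / max(anchor.accuracy_m, 1.0),
    }
```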
Feature processing is the core of feature engineering, and preprocessing differs from project to project: it must be tailored to the actual scenario and relies on domain expertise. In start-point road tracking, we apply a series of data cleansing operations to anchor point-related features, including sample deduplication, outlier handling, error value correction, and mapping.
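The four cleansing operations can be sketched on simple dictionary records. The record fields, the speed cap, the heading normalization, and the location-source codes are all hypothetical; the sketch only shows one plausible form of each operation.

```python
from typing import Dict, List

# Hypothetical categorical mapping and outlier cap.
SOURCE_CODES = {"gps": 0, "wifi": 1, "cell": 2}
MAX_SPEED_MPS = 70.0


def clean_anchor_records(records: List[Dict]) -> List[Dict]:
    seen = set()
    cleaned = []
    for record in records:
        # Deduplication: keep the first record for each request id.
        if record["request_id"] in seen:
            continue
        seen.add(record["request_id"])
        r = dict(record)  # do not mutate the caller's data
        # Outlier handling: clip implausible speeds into [0, MAX_SPEED_MPS].
        r["speed"] = min(max(r["speed"], 0.0), MAX_SPEED_MPS)
        # Error value correction: wrap headings into [0, 360).
        r["heading"] = r["heading"] % 360.0
        # Mapping: categorical location source to an integer code (-1 = unknown).
        r["source"] = SOURCE_CODES.get(r["source"], -1)
        cleaned.append(r)
    return cleaned
```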
Because start-point road tracking is defined as a search-and-sort process, we can use any of the machine learning ranking approaches: point-wise, pair-wise, or list-wise. Based on the characteristics of the start-point road tracking features, we chose the list-wise approach, whose learning-to-rank framework has the following characteristics:
We use normalized discounted cumulative gain (NDCG) as the model evaluation metric. NDCG compares the model's ranking with the ideal ordering, giving more weight to positions near the top of the list, and is a commonly used metric for evaluating ranking results.
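The list-wise training and NDCG evaluation can be sketched together. The original text does not name a specific list-wise algorithm, so this example swaps in ListNet's top-one cross-entropy loss with a linear scorer trained by plain gradient descent; the learning rate, epoch count, and toy features are assumptions. Each request is one list: its candidate feature vectors plus one-hot labels marking the truth road.

```python
import math
from typing import List, Sequence, Tuple

RequestList = Tuple[List[List[float]], List[int]]  # (candidate features, labels)


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def listnet_loss_and_grad(w, feats, labels):
    """ListNet top-one cross-entropy for one request and its gradient w.r.t. w."""
    p_model = softmax([score(w, x) for x in feats])
    total = sum(labels)
    p_true = [l / total for l in labels]  # one-hot truth as target distribution
    loss = -sum(pt * math.log(pm) for pt, pm in zip(p_true, p_model) if pt > 0)
    grad = [0.0] * len(w)
    for x, pm, pt in zip(feats, p_model, p_true):
        for i, xi in enumerate(x):
            grad[i] += (pm - pt) * xi  # d loss / d s_j = p_model_j - p_true_j
    return loss, grad


def train(lists: List[RequestList], dim: int, lr: float = 0.1,
          epochs: int = 200) -> List[float]:
    w = [0.0] * dim
    for _ in range(epochs):
        for feats, labels in lists:
            _, g = listnet_loss_and_grad(w, feats, labels)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def dcg(rels: Sequence[int]) -> float:
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))


def ndcg_at_k(ranked_rels: Sequence[int], k: int) -> float:
    """NDCG@k for relevance labels listed in the model's ranked order."""
    ideal = dcg(sorted(ranked_rels, reverse=True)[:k])
    return dcg(ranked_rels[:k]) / ideal if ideal > 0 else 0.0


def ranked_labels(w, feats, labels) -> List[int]:
    order = sorted(range(len(feats)), key=lambda j: score(w, feats[j]), reverse=True)
    return [labels[j] for j in order]
```

With a single truth road per request, NDCG reduces to 1 / log2(rank + 1) of the truth road, so it rewards putting the correct road first while still giving partial credit for placing it second or third.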
We extract requests from a certain time period and obtain truth values and feature data using the method described in step 2. We then build labeled sample sets, split them into a training set and a test set, train a model, and check whether the results meet our expectations.
To evaluate model performance, we run road tracking on the requests in the test set using both manual rules and the machine learning model. We then compare each method's results with the truth values and calculate accuracy.
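Because the list-wise model learns from whole candidate lists, the split should keep all candidates of one request on the same side. A minimal sketch, assuming samples are `(request_id, features, label)` tuples; the 20% test ratio and fixed seed are hypothetical, and splitting by time (training on earlier requests, testing on later ones) would be an equally reasonable choice.

```python
import random
from typing import Any, List, Tuple

Sample = Tuple[str, Any, Any]  # (request_id, features, label)


def split_by_request(samples: List[Sample], test_ratio: float = 0.2,
                     seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """Split samples so that no request appears in both train and test sets."""
    request_ids = sorted({s[0] for s in samples})
    random.Random(seed).shuffle(request_ids)  # reproducible shuffle
    n_test = int(len(request_ids) * test_ratio)
    test_ids = set(request_ids[:n_test])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test
```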
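The comparison can be sketched as a small evaluation helper. Predictions and truth values are assumed to be dictionaries from request id to road id (a hypothetical format). Besides overall top-1 accuracy for each method, it reports how often the two methods disagree and how accurate each one is on the disagreeing requests, which is where the two methods actually differ.

```python
from typing import Dict, List

Predictions = Dict[str, str]  # request_id -> road_id


def _accuracy(preds: Predictions, truths: Predictions, ids: List[str]) -> float:
    """Fraction of ids whose predicted road equals the truth road."""
    if not ids:
        return 0.0
    return sum(1 for r in ids if preds[r] == truths[r]) / len(ids)


def compare_methods(rule_preds: Predictions, model_preds: Predictions,
                    truths: Predictions) -> Dict[str, float]:
    ids = [r for r in truths if r in rule_preds and r in model_preds]
    differing = [r for r in ids if rule_preds[r] != model_preds[r]]
    return {
        "rule_accuracy": _accuracy(rule_preds, truths, ids),
        "model_accuracy": _accuracy(model_preds, truths, ids),
        "disagreement_rate": len(differing) / len(ids) if ids else 0.0,
        "rule_accuracy_on_diff": _accuracy(rule_preds, truths, differing),
        "model_accuracy_on_diff": _accuracy(model_preds, truths, differing),
    }
```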
Comparing the results, we find that road tracking via manual rules and via the machine learning model differ on about 10% of requests. On these differing requests, model-based tracking is 40% more accurate than road tracking based on manual rules, a significant improvement.
This article presented one scenario in which AMAP applies big data and machine learning: start-point road tracking. The successful launch of the project shows that machine learning can play an important role in improving accuracy and optimizing processes.
Going forward, we plan to keep refining the existing model, look for new gains, and further improve machine learning for road tracking from both the data and the model perspectives.
More Posts by amap_tech