In today's digitally connected world, maps are an essential part of everyday life, and data is indispensable for map services. Dynamic map services provide users direct and explicit access to smart features supported by a large amount of data.
In the early stages of map services technology, data was collected by specialized means, such as vehicles, bicycles, airplanes, and satellite images. Over the last two years, map data has increasingly been collected through crowdsourcing via intelligent hardware, and the collected data is updated at unprecedented speed and with unparalleled accuracy. Rapid changes on the ground are driving users to increasingly depend on map services apps. With the growing demand for map services, map companies that focus on user experience treat fast and accurate data updates as a key objective. The first step toward effective data updates is traffic sign detection.
This article describes the application of machine learning in map data generation for AMAP. The technical solutions and designs outlined below have been verified in practice and have delivered good results. They also provide a basic technical foundation for the rapid update of AMAP data.
Traffic sign detection is the process of automatically detecting traffic signs, such as speed limit signs, U-turn prohibition signs, crosswalk signs, and electronic eyes, in images of street scenes. The detection results are delivered to a production process that generates map data for map services users.
The key challenges of traffic sign detection include the wide variety of traffic sign forms and the influence of the natural environment on image capture. In addition, traffic sign detection places strict requirements on algorithm performance in order to achieve fast data updates and high data accuracy. The specific challenges are as follows:
Traffic signs vary hugely in the following aspects:
Figure 1: Common Traffic Signs (Signage)
Traffic signs may be obstructed by vehicles or trees, or worn out under natural conditions. In addition, image collection may be affected by the weather or the season, resulting in blurred images and color distortion.
Figure 2: Traffic Signs Captured Under Natural Conditions
Algorithm accuracy is severely impaired when objects that resemble traffic signs, such as business placards and public welfare billboards, are misidentified as traffic signs.
Figure 3: Examples of signs that resemble traffic signs and generate noise during traffic sign detection
The following are the requirements for algorithm performance to achieve fast data updates and high data accuracy:
In academia, deep learning models for target detection are typically trained end to end to achieve globally optimal detection results. The end-to-end mode is easy to use: it simply requires annotating samples of hundreds of objects and feeding them into a deep learning framework for iterative training to obtain the final model.
End-to-end methods are divided into two-stage methods (such as Faster R-CNN[1]) and one-stage methods (such as YOLO[2] and SSD[3]). It is critical to note the following in actual application:
Considering the development of common target detection technologies and the traffic sign detection requirement posed by AMAP, we select Faster R-CNN as the basic detection framework for its better detection results (especially for small targets) and independent region proposal network (RPN), which can meet extensibility requirements. In terms of speed, we also implemented targeted optimization and adjustment.
Figure 4: Target detection and fine-grained classification for traffic sign detection
For actual application, we divide the detection framework into the following two phases:
Target detection is the process of detecting all traffic signs in captured images through Faster R-CNN and classifying them in a coarse-grained manner, with a high recall rate and fast execution speed.
In practice, the following policies are adopted to improve algorithm capability:
Figure 5: Schematic Drawing of a Multi-RPN Design
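The emphasis on recall in the target detection phase can be illustrated with a small sketch. It assumes the detector emits a confidence score per candidate and that candidates below a threshold are discarded; the scores, labels, and thresholds below are invented for illustration and are not AMAP data. "Precision" here is the share of kept candidates that are real signs, which is one reading of what this article calls accuracy.

```python
from typing import List, Tuple

# Hypothetical detector output: (confidence score, is it really a traffic sign?)
scored_candidates: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.85, False), (0.70, True), (0.55, True),
    (0.45, False), (0.40, True), (0.35, False), (0.20, False),
]


def precision_recall(scored: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (precision, recall) when keeping candidates with score >= threshold."""
    kept = [is_sign for score, is_sign in scored if score >= threshold]
    true_positives = sum(kept)
    total_signs = sum(is_sign for _, is_sign in scored)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_signs if total_signs else 0.0
    return precision, recall


strict = precision_recall(scored_candidates, 0.8)   # few candidates kept
lenient = precision_recall(scored_candidates, 0.3)  # most candidates kept
print(f"threshold 0.8: precision={strict[0]:.2f}, recall={strict[1]:.2f}")
print(f"threshold 0.3: precision={lenient[0]:.2f}, recall={lenient[1]:.2f}")
```

With these made-up numbers, lowering the threshold raises recall from 0.40 to 1.00 while precision drops slightly, from about 0.67 to about 0.63. Admitting extra false candidates at this stage is tolerable only because a later stage can filter them out.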
Fine-grained classification is the process of classifying the candidate frames acquired during the target detection phase in a fine-grained manner and filtering out noise to ensure a high recall rate and accuracy. In actual implementation, the following policies are also used to improve the results:
Figure 6: Modular Schematic Drawing of Fine-grained Classification
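The hand-off between the two phases might be sketched as follows. This is a minimal illustration under stated assumptions, not AMAP's implementation: the `Candidate` type, the `make_lookup_classifier` stand-in (which replaces a trained model with a canned lookup table), the label names, and the `min_score` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Candidate:
    crop_id: str        # identifier of the cropped candidate frame
    coarse_label: str   # coarse class assigned in the target detection phase
    score: float        # detector confidence in [0, 1]


# A fine-grained classifier returns a fine label, or None if the crop is noise.
FineClassifier = Callable[[Candidate], Optional[str]]


def make_lookup_classifier(table: Dict[str, str]) -> FineClassifier:
    """Stand-in for a trained model: looks up a canned answer by crop id."""
    def classify(cand: Candidate) -> Optional[str]:
        return table.get(cand.crop_id)
    return classify


def classify_candidates(
    candidates: List[Candidate],
    classifiers: Dict[str, FineClassifier],
    min_score: float = 0.3,
) -> List[Tuple[str, str]]:
    """Route each candidate to the classifier for its coarse class and drop noise."""
    results = []
    for cand in candidates:
        if cand.score < min_score:
            continue  # too weak even for the recall-oriented first phase
        classifier = classifiers.get(cand.coarse_label)
        if classifier is None:
            continue  # no fine-grained model for this coarse class
        fine_label = classifier(cand)
        if fine_label is not None:  # None means the crop was judged to be noise
            results.append((cand.crop_id, fine_label))
    return results


candidates = [
    Candidate("a", "speed_limit", 0.92),
    Candidate("b", "speed_limit", 0.41),  # e.g. a shop placard resembling a sign
    Candidate("c", "prohibition", 0.35),
    Candidate("d", "prohibition", 0.10),
]
classifiers = {
    "speed_limit": make_lookup_classifier({"a": "speed_limit_60"}),
    "prohibition": make_lookup_classifier({"c": "no_u_turn"}),
}
results = classify_candidates(candidates, classifiers)
print(results)
```

Keeping one classifier per coarse class means each model only has to separate a small set of similar-looking signs from noise, at the cost of holding several models in memory at once.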
Fine-grained classification uses multiple models, which increases GPU memory (video RAM) usage on the server and places additional demands on computing resources. To address these concerns, we optimized the deep learning framework by dynamically allocating and sharing temporary buffers among models and by removing the backpropagation functions from the framework, which inference does not need. These measures reduce GPU memory usage by more than 50%.
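The effect of sharing temporary buffers can be approximated with simple arithmetic. The sketch below assumes the models run one at a time, so only one temporary workspace is live at any moment, and it covers buffer sharing only, not the removal of backpropagation. The model names and memory sizes are invented for illustration and are not AMAP figures.

```python
from typing import List, Tuple

# Hypothetical models: (name, weights in MB, temporary workspace in MB)
models: List[Tuple[str, int, int]] = [
    ("speed_limit", 120, 380),
    ("prohibition", 100, 420),
    ("warning", 90, 350),
    ("camera", 110, 400),
]


def memory_separate(model_list: List[Tuple[str, int, int]]) -> int:
    """Each model keeps its own weights and its own temporary workspace."""
    return sum(weights + workspace for _, weights, workspace in model_list)


def memory_shared(model_list: List[Tuple[str, int, int]]) -> int:
    """Weights stay resident; one workspace sized for the largest model is reused."""
    total_weights = sum(weights for _, weights, _ in model_list)
    largest_workspace = max(workspace for _, _, workspace in model_list)
    return total_weights + largest_workspace


separate_mb = memory_separate(models)
shared_mb = memory_shared(models)
saving = 1 - shared_mb / separate_mb
print(f"separate: {separate_mb} MB, shared: {shared_mb} MB, saving: {saving:.0%}")
```

With these made-up sizes, the footprint falls from 1970 MB to 840 MB, a saving of about 57%. The shared workspace costs the maximum rather than the sum of the per-model workspaces, so the benefit grows with the number of models.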
The solution described in the preceding sections has been officially launched. Its accuracy and recall rate meet production requirements, and the average daily image throughput exceeds 10 million. Figure 7 shows some results of the solution (different boxes indicate different detection results).
Figure 7: Results of Traffic Sign Detection
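To put the daily throughput quoted above in perspective, it can be converted into a per-second rate. The calculation assumes the load is spread evenly over 24 hours, which the article does not state; real traffic is likely burstier and processed in parallel.

```python
SECONDS_PER_DAY = 24 * 60 * 60

images_per_day = 10_000_000  # lower bound quoted in the article
images_per_second = images_per_day / SECONDS_PER_DAY

# Time budget per image if a single worker handled images strictly one by one
# (an assumption made only to give a sense of scale).
ms_per_image = 1000 / images_per_second

print(f"{images_per_second:.1f} images/s")
print(f"{ms_per_image:.2f} ms per image for a single sequential worker")
```

This works out to roughly 116 images per second on average, or 8.64 ms per image for one sequential worker, which is why execution speed matters alongside recall and accuracy.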
Applying traffic sign detection to AMAP effectively improves data production efficiency and helps achieve the goal of updating map data at close to T+0 (zero time difference).
Currently, we also use machine learning technology to automate data production and further narrow the gap between the real world and map data, so as to "connect the real world and make travel better."