AI Sports: Best Practices of Alibaba Sports in End Intelligence

By Qisheng

1. Background

Over the past year, Alibaba's Sports Technology Team has been exploring end intelligence. This work led to the launch of the AI Sports project, which has been put into practice in sports and health scenarios. The project aims to digitalize sports and marks Alibaba Sports' first step in end intelligence. Its many new and fun features have been well received by our customers, and we believe it will get more people interested in sports.

In 2020, the COVID-19 pandemic significantly restricted outdoor sports, and indoor sports surged as a result. Alibaba's Sports Technology Team developed AI Sports to give customers fun and convenient ways to work out at home. All you need is a mobile phone and 3-4 square meters of space. Open the Ledongli app, fix your phone in place at a suitable angle, and follow the voice prompts to adjust your distance from the phone until your whole body is inside the recognition frame. Then, you can enjoy a fun sports experience.

2. End Intelligence Practices

After a year of exploration and improvement, Alibaba progressed from verifying demos to building an AI sports platform that recognizes a large number of movements and supports migration. Alibaba Sports has established a systematic AI sports system for clients. The system uses Alibaba's deep inference engine to detect body poses and movements and to analyze motion trajectories, body poses, and other data. Based on the analysis results, it gives users real-time feedback so they can adjust their movements. By combining capability modules, the system can recognize more than ten kinds of movements and supports dozens of sports, making online sports easier and more interesting for users.

3. Technical Support

The AI sports system is based on the MNN inference engine to infer and detect body poses. This procedure is explained below:

The system detects the body contour in real-time in images and videos and locates 14 human key points, such as the head, shoulders, and feet.
The system analyzes body poses, motion trail, and movement angle based on these key points.
The system identifies users' movements with a matching technique, then times and counts them. Meanwhile, it analyzes the movements and gives users real-time feedback so they can adjust.
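
The angle analysis in the second step can be illustrated with a small sketch. It assumes each key point is an (x, y) pair in image coordinates. The function name `joint_angle` and the sample coordinates are illustrative and are not taken from the AI Sports code base.

```python
import math

def joint_angle(a, b, c):
    """Return the angle at point b, in degrees, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("key points coincide, so the angle is undefined")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))

# Hypothetical pixel coordinates for hip, knee, and ankle of a straight leg.
hip, knee, ankle = (100, 100), (100, 200), (100, 300)
knee_angle = joint_angle(hip, knee, ankle)  # straight leg: 180 degrees
```

An angle such as the knee angle can then serve as one of the signals for matching a movement, for example to tell a standing pose from a squat.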

Traditionally, users get real-time feedback and help from people on site (coaches, supervisors, or friends) during exercise. With AI Sports, users interact only with their mobile phones to adjust their movements. The interaction and recognition capabilities of the system are affected by a series of factors, such as inference model capabilities, sports scene complexity, and the movement matching and recognition algorithm. New challenges arise while exploring and improving AI Sports. For example, the user must be positioned correctly relative to the phone, the system may miss some key points or identify the wrong ones, images of the user's movements may be distorted, and a shaking phone or a noisy environment must also be handled. We have chosen a few representative problems to explain below:

The foundation of AI Sports is the accurate detection of movements and the design of core algorithms that improve movement matching.
While maintaining recognition accuracy, the AI Sports Team takes measures to reduce resource consumption on mobile terminals, such as lowering power consumption and heat, to improve the user experience.
We adopt a more flexible approach to reduce the time spent on mobile testing, improve development and testing efficiency, and help the team deliver reliably.

3.1 Improve Recognition Accuracy

The most noticeable aspect of the user experience in the AI sports system is the accuracy of movement counting. If movement recognition and counting are inaccurate, users lose enthusiasm for the app and become unwilling to engage in online sports. So, accuracy must be our priority.

The basic principle of intelligent movement counting is to divide a complete movement into several small steps, identify each step independently, and verify the validity of the entire movement after all steps have been traversed. If the movement is valid, the counter is increased by 1. If not, the procedure starts over. In short, intelligent movement recognition and counting is a state machine. A movement is divided into N states, {s(0), s(1), s(2), ..., s(N-1)}, and the states are detected in a fixed order. If all states are detected, the user has completed the movement, and the counter is increased by 1. If a state is not detected, the corresponding feedback is triggered and the state machine is reset to begin a new loop. Each state has its own triggering conditions. The movement matching result is obtained by continuously matching the coordinates and states of the human key points against these conditions in real time.
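
The counting loop described above can be sketched as a small state machine. This is a minimal illustration, not the team's implementation: the class name `RepCounter`, the squat thresholds, and the use of a frame-count timeout (`max_wait`) to decide that a state "was not detected" are all assumptions.

```python
class RepCounter:
    """Counts a movement once its states are matched in order."""

    def __init__(self, conditions, max_wait=30):
        self.conditions = list(conditions)  # one trigger predicate per state
        self.max_wait = max_wait            # assumed timeout, in frames
        self.index = 0                      # state currently being looked for
        self.waited = 0
        self.count = 0

    def update(self, pose):
        """Feed one frame; return a feedback message on reset, else None."""
        if self.conditions[self.index](pose):
            self.index += 1
            self.waited = 0
            if self.index == len(self.conditions):
                self.count += 1   # every state matched: one full movement
                self.index = 0
            return None
        if self.index > 0:
            self.waited += 1
            if self.waited > self.max_wait:
                missed = self.index
                self.index = 0
                self.waited = 0
                return f"state s({missed}) not detected, restarting"
        return None

# Hypothetical squat: stand, squat down, stand again. The pose is a knee angle.
squat_states = [
    lambda angle: angle > 160,
    lambda angle: angle < 100,
    lambda angle: angle > 160,
]

counter = RepCounter(squat_states)
for knee_angle in [170, 120, 90, 130, 170, 170, 95, 170]:
    counter.update(knee_angle)
# counter.count is now 2
```

In a real system the pose passed to `update` would carry all key points, and each predicate would encode the triggering conditions of its state.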

It is easy to see that the movement recognition accuracy is closely related to the movement matching algorithm; the better the matching effect of the algorithm, the higher the recognition accuracy. Factors, such as human key points, state machines, and matching, should be considered first to improve accuracy. The corresponding solutions are below:

Improve the stability of human key points to ensure the accuracy of state matching results.
Select stable, recognizable, and representative poses as the states of a movement.
Ensure the frame rate is high enough to capture every state of a movement.

The following content illustrates these methods:

The accuracy of human key point recognition is closely related to movement matching. As shown in the following picture, the key point on the left arm of the test subject is incorrectly recognized. The system will get the wrong result if it matches the movement directly against this frame. In this situation, the movement matching algorithm can use the user's historical movement information to correct the match.
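
One simple way to use historical information is to reject a key point that has a low confidence score or that jumps implausibly far between consecutive frames, and to fall back to its previous position. The sketch below assumes normalized coordinates, a per-point confidence score, and thresholds chosen only for illustration. The source does not describe the actual correction method, so this shows the idea rather than the team's algorithm.

```python
import math

def stabilize_key_points(previous, current, max_jump=0.2, min_score=0.3):
    """Replace outlier key points in `current` with their values from `previous`.

    Both arguments map a key point name to (x, y, score), with x and y
    normalized to [0, 1]. The thresholds are illustrative assumptions.
    """
    result = {}
    for name, (x, y, score) in current.items():
        old = previous.get(name)
        if old is not None:
            jumped = math.hypot(x - old[0], y - old[1]) > max_jump
            if score < min_score or jumped:
                result[name] = old  # keep the last trusted position
                continue
        result[name] = (x, y, score)
    return result

previous = {"left_wrist": (0.30, 0.50, 0.9), "head": (0.50, 0.10, 0.9)}
current = {"left_wrist": (0.90, 0.10, 0.8), "head": (0.51, 0.11, 0.9)}
stable = stabilize_key_points(previous, current)
# The left wrist jumped across the frame, so its previous position is kept.
```

A production version would also need to accept a new position after several consistent frames, otherwise a genuinely fast movement would be rejected indefinitely.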

In another situation, the user has completed all the steps of a certain movement, such as the jumping jacks in the following figure. Because the sampling frame rate is low, not all poses in the process are captured and detected. Certain states fail to match, resulting in a match error. The low frame rate issue can be addressed by improving both the model and the input source. The model can be simplified to reduce inference time without affecting movement recognition accuracy. Input sources with different resolutions can be used on different terminal devices to reduce the time spent processing raw data.
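
The relationship between frame time and state coverage can be made concrete. If the shortest state of a movement lasts d seconds, sampling at an interval no longer than d guarantees that at least one frame falls inside it. The sketch below uses that simplified criterion to pick an input resolution per device. The function name, the candidate resolutions, and the `measure_ms` callback are hypothetical.

```python
def pick_resolution(measure_ms, candidates, shortest_state_s):
    """Pick the largest resolution whose per-frame time still samples every state.

    measure_ms(resolution) returns the measured processing time of one frame
    in milliseconds on the current device (an assumed callback).
    """
    budget_ms = shortest_state_s * 1000.0
    by_area = sorted(candidates, key=lambda r: r[0] * r[1], reverse=True)
    for resolution in by_area:
        if measure_ms(resolution) <= budget_ms:
            return resolution
    # Nothing fits the budget: fall back to the cheapest input.
    return by_area[-1]

# Stand-in for a real on-device measurement: cost grows with pixel count.
def fake_measure_ms(resolution):
    return resolution[0] * resolution[1] / 2000.0

candidates = [(640, 480), (320, 240), (192, 192)]
chosen = pick_resolution(fake_measure_ms, candidates, shortest_state_s=0.1)
# With this stand-in cost model, (320, 240) is the largest input under 100 ms.
```

In practice the budget would be tighter than this bound, since a state usually needs to be seen in more than one frame to be matched reliably.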

3.2 Reduce Performance Overhead

Due to hardware restrictions, computing power and storage on mobile platforms are limited. In addition, deep learning inference involves a large number of operations and consumes a lot of resources. When inference runs directly on the device alongside other services on the phone, such as the camera, video recording, and animation effects, CPU and memory overhead increases significantly. The phone gets hot and consumes more power. When launching intelligent sports on a mobile platform, pay special attention to performance overhead, since it is crucial to the user experience.

We can reduce overall performance overhead by reducing single frame consumption. Single frame processing can be divided into three phases: pre-inference, inference, and post-inference.

These three phases play different roles. In the pre-inference phase, the data stream obtained from the camera is converted to formats required for inference, such as YUV and RGBA. The inference phase mainly completes the calculation and output of human key point coordinates. The inference engine executes a series of algorithms on input frame data and outputs inference results. For example, for human pose detection, the inference engine converts RGBA data from an input image into human key point coordinates. The post-inference phase is mainly for display through rendering and business-related operations such as UI display and animation display.

Each of the three phases can be optimized separately. In the inference phase, optimization is achieved by using the Alibaba MNN deep inference engine. In the pre-inference phase, the data stream from the camera can be converted directly to the required format. For example, if the inference phase takes RGBA raw data, the data stream can be converted to RGBA in a single step. In the post-inference phase, a rendering solution should be selected based on the platform to reduce consumption. On iOS, Metal can be used to improve rendering efficiency.
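
Before optimizing, it helps to know how the single-frame cost is split across the three phases. The following sketch is a minimal per-phase timer, written in Python for illustration only; the class name `FrameProfiler` and the placeholder work inside each phase are assumptions, and a real app would wrap its camera conversion, MNN inference call, and rendering code instead.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class FrameProfiler:
    """Accumulates wall-clock time per phase and reports per-frame averages."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.frames = 0

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def end_frame(self):
        self.frames += 1

    def average_ms(self):
        if self.frames == 0:
            return {}
        return {name: total * 1000.0 / self.frames
                for name, total in self.totals.items()}

profiler = FrameProfiler()
for _ in range(3):
    with profiler.phase("pre_inference"):
        pass                # placeholder: convert the camera frame
    with profiler.phase("inference"):
        sum(range(1000))    # placeholder: run the pose model
    with profiler.phase("post_inference"):
        pass                # placeholder: render and update the UI
    profiler.end_frame()

report = profiler.average_ms()
```

The phase with the largest average is the first candidate for optimization.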

3.3 Improve Testing Efficiency

AI Sports is a bold attempt by Alibaba's Sports Technology Team at sports digitalization. During development, we dedicated considerable staff, equipment, and time to improving the application's functions, performance, and user experience, especially in the testing phase. In addition, the results of AI movement recognition tests are affected significantly by environmental factors, such as lighting, background, distance, and the size of the person in the camera image. This poses a challenge for the test method.

Let's take the traditional test plan as an example. Generally, the test is conducted with real people performing the movements on site in real time. Testers need to manually record the results for subsequent analysis, as shown in the following picture:

AI Sports runs on mobile phones of different brands and models, with different system versions and performance parameters, and its users may be in very different environments. The traditional test method cannot cope with these differences, which places a heavy burden on testers. Moreover, the consistency and accuracy of the tests cannot be guaranteed. The detailed causes are below:

High Labor Cost: A test requires the cooperation of multiple persons, which consumes much energy and time.
Single Test Environment: The traditional method cannot cope with complex and diverse online environments.
Difficult to Quantify Test Results: It is impossible to quantify and assess the model accuracy, algorithm efficiency, movement matching accuracy, improvement of accuracy, performance consumption, etc.
Difficult to Locate the Problem: Post-event analysis and troubleshooting cannot reproduce or locate the problems that customers report online.

To overcome these difficulties, the Alibaba Sports Technology Team has developed an automatic testing tool for AI Sports. With it, online problems can be located quickly, and the accuracy of the model and algorithms can be evaluated quantitatively.

The automatic testing tool can parse video sets in batches to simulate real scenarios, obtain human key point data, test business results, and generate test reports automatically. The following figure shows the specific technical solution:
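
The batch flow described above can be sketched as a small regression harness. In this illustration, each recorded video is assumed to have already been reduced to a list of per-frame knee angles with a labeled expected count. The function names, the sample data, and the report layout are hypothetical, and the real tool's video parsing and key point extraction are not shown.

```python
def count_squats(knee_angles):
    """Toy counter: one repetition per deep bend followed by standing up."""
    count, is_down = 0, False
    for angle in knee_angles:
        if angle < 100:
            is_down = True
        elif angle > 160 and is_down:
            count += 1
            is_down = False
    return count

def run_suite(cases, count_reps):
    """Run every recorded case through the counter and summarize the results."""
    rows = []
    for case in cases:
        actual = count_reps(case["frames"])
        rows.append({
            "name": case["name"],
            "expected": case["expected"],
            "actual": actual,
            "passed": actual == case["expected"],
        })
    passed = sum(1 for row in rows if row["passed"])
    pass_rate = passed / len(rows) if rows else 0.0
    return {"total": len(rows), "passed": passed,
            "pass_rate": pass_rate, "rows": rows}

cases = [
    {"name": "two_squats", "frames": [170, 90, 170, 95, 170], "expected": 2},
    {"name": "shallow_bend", "frames": [170, 120, 170], "expected": 0},
    # Low frame rate: the bottom of the squat was never sampled.
    {"name": "low_fps", "frames": [170, 170, 170], "expected": 1},
]

report = run_suite(cases, count_squats)
failed = [row["name"] for row in report["rows"] if not row["passed"]]
```

Because the same recorded inputs are replayed on every run, a failing case such as `low_fps` can be reproduced exactly, which is what makes locating a problem practical.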

The labor cost is reduced significantly, and the testing efficiency is improved after using the new testing tools. The specific test results are below:

Note: The effect of the testing tool is related to the number of test samples; the richer the samples, the better the test accuracy.

4. Business Achievements

AI Sports supports dozens of sports movements and provides a wide range of AI training courses. The modular combination of sports capabilities will support more new movements in the future.

Since the birth of AI Sports, Ledongli has successively launched upper limb exercises (such as straight arm jumping jacks and push-ups), lower limb exercises (such as hip bridge and squat), and full-body exercises (such as jump rope and jumping jacks). Users can participate in AI Sports with friends anytime and anywhere, which attracts more users and gets them interested in sports. In addition, the AI training courses now include sessions in which celebrities accompany users in training every day. Celebrities can motivate users to enjoy and fall in love with sports and to develop a habit of exercising regularly. The Alibaba Sports Technology Team will continue to create more sports scenarios according to users' needs, diversify product functions, and build a unique and innovative product with the distinctive features of Alibaba Sports end intelligence.

Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
