By Baiyu
Promotional campaigns are among the most challenging scenarios for O&M engineers to support, and every promotional season brings sleepless nights. The challenge comes from large volumes of content updates, surges of consumers, and heavy data reads and writes. Although various technical solutions and tools exist to keep promotions running smoothly, users may still complain that pictures fail to load, pages open slowly, or orders cannot be paid. Poor user experience and website performance lead to a low user conversion rate and slow business growth, and O&M engineers are blamed for all of it.
In terms of user experience and website performance, we conducted interviews with many O&M engineers and independent webmasters and found that everyone's views focused on the following aspects:
1. Performance and Experience Problems Caused by the Gap between Product and User Experience
As the Internet dividend decreases, competition in product functions and user experience design grows more intense. Flash sales, promotional activities, and UGC content make product logic increasingly complicated for users to understand. Even when guides and explanatory documents are provided, users still find it overwhelming. Meanwhile, large amounts of rich media, third-party components, and customer advertisements have been added to products to enrich their functional modules. Too many poorly justified third-party integrations increase the system load and reduce product performance. The price is paid in website performance and user experience.
2. Performance and Experience Problems Caused by the Intricate Network Environment
China has a large number of first-level and second-level operators, which makes the national network environment substantially more complex. Slow upgrades to operator infrastructure and sudden human-caused incidents lead to frequent IDC failures. Enterprises can only appease their customers and wait for these failures to be repaired, and the time needed for troubleshooting and repair varies with many factors. At the same time, wide geographical coverage, scattered users, and varied access methods make the access network complex, so enterprises cannot estimate the user environment effectively. Network environment problems are hard to solve even with widely distributed data centers and multi-line BGP access. This makes the network environment harder to optimize and the experience of real users harder to predict.
3. Performance and Experience Problems Brought by Differences in Distinct PC-End Environments
China has the largest number of Internet users worldwide, but there is a huge difference in hardware configuration at the user end. Some people may use an i9-11900K and RTX3080 Ti to watch 4K HD live video on Bilibili, while others use a Pentium 4 with integrated graphics cards from 2000 to browse text-based news on websites. As a result, different groups with different browser versions, rendering mechanisms, and local host performance have a varied user experience in terms of Internet speed and resource consumption. O&M personnel must get to know the experience of users and evaluate their differences to improve this situation and offer users the best experience.
4. System Availability Issue Caused by Seeking Iteration Speed
Due to fierce competition in the Internet industry, product architecture and stability are often neglected in the rush to meet feature release windows and make fine-tuning changes. Problems such as system overload, system crashes, and response timeouts have many causes, including unreasonable architecture and architecture that cannot keep up with business development.
1. Because the business iterates very quickly, intrusive monitoring cannot be implemented in a short time. At the same time, failures in the business system need to be detected quickly.
2. Development resources are insufficient or uncoordinated, and infrastructure-related monitoring cannot directly reflect business problems. In addition, the cost of implementing application monitoring is too high.
3. The availability of the third-party API cannot be guaranteed, and the failure cannot be responded to and handled in time.
These may look like isolated problems, but once the business reaches a certain scale, their combination affects user experience.
5. Passive Posture in Handling Customer Complaints Caused by a Lack of Monitoring Methods from the User's Perspective
Product functions undergo various tests before they are launched, and the O&M Team continues to pay attention to user experience. However, it is usually not until a customer complains that the O&M Team realizes there is a problem with the system, which leaves the team reacting passively. Reproducing the exception and locating the problem may take a full day, which seriously affects the Net Promoter Score (NPS). Common monitoring methods also mostly reflect the O&M Team's own perspective and cannot reflect the problems users actually encounter.
In the face of many influencing factors, how can we test our website from the perspective of real users, quantify the user experience, and locate the bottleneck of website performance? Here, we take the marketing activities in the e-commerce industry as an example. In the face of increasingly fierce competition, promotional activities, such as the Double 11 and 618 Shopping Festivals, have become important annual marketing activities. However, the short-term influx of a large number of users can cause problems that affect the user experience, such as delayed website loading or service stalling.
Specific issues include:
Performance data for competitors' activities cannot be obtained, so it is impossible to know how competitors' marketing is changing.
In the past, the problems above were difficult to solve because:
The O&M Team cannot run the relevant tests proactively. As a result, problems are only found when real users hit them and can only be resolved passively. Reproducing the problem and locating the fault may tie up the entire O&M Team, resulting in an extremely long repair time.
Therefore, the O&M Team needs a product or solution that can solve the problems above. Cloud Automated Testing (CAT), a non-intrusive cloud-native monitoring product for business, is well suited to this. CAT can simulate real user behavior and continuously monitor the availability and performance of a website and its network, services, APIs, and ports around the clock through Alibaba Cloud’s global service network. CAT can locate problems at the page element level, network request level, and network link level. Its rich set of monitoring items and analysis models helps enterprises discover and locate performance bottlenecks and experience blind spots in a timely manner. It reduces operational risks and improves service experience and efficiency.
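To make the idea of non-intrusive, outside-in monitoring concrete, the following is a minimal sketch of a single availability probe in Python. It is not CAT's implementation or API: the names `ProbeResult`, `http_status`, and `probe` are hypothetical, and the rule that any 2xx or 3xx status counts as "available" is an assumption made for illustration. The probe only needs a URL, which is what makes this style of monitoring independent of the monitored system's code.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    url: str
    available: bool
    status: Optional[int]
    elapsed_ms: float
    error: Optional[str] = None


def http_status(url: str, timeout: float) -> int:
    """Fetch the URL once and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # read the body so the timing covers the full download
        return resp.status


def probe(url: str, timeout: float = 5.0,
          fetch: Callable[[str, float], int] = http_status) -> ProbeResult:
    """Run one probe from the outside; `fetch` can be swapped for testing."""
    start = time.perf_counter()
    try:
        status, error = fetch(url, timeout), None
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses; keep the code.
        status, error = exc.code, str(exc)
    except Exception as exc:
        # DNS failure, timeout, connection reset, and so on.
        status, error = None, str(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Assumption: 2xx and 3xx responses count as available.
    available = status is not None and 200 <= status < 400
    return ProbeResult(url, available, status, elapsed_ms, error)
```

Calling `probe("https://example.com/")` on a schedule from several locations would yield a stream of samples with availability and response time, which is the raw material for the analysis and alerting described in this article.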
More than 200,000 LM (last-mile) nodes, more than 500 IDC terminal monitoring nodes, more than 400 operators, and hundreds of thousands of registered members worldwide ensure that the monitoring scale keeps up with growing business scale.
With zero-intrusion monitoring, you only need to enter the URL and complete some simple configuration. A complete data analysis report of website performance can be obtained in a few minutes. Resource packages, pay-as-you-go billing, and other purchase options meet different O&M testing requirements.
Monitoring runs at minute-level intervals. CAT supports more than 20 associated parameters in seven categories, various mainstream protocols, and 24/7 real-time fine-grained fault monitoring, alerting, and performance analysis for sites and service ports. From the client perspective, CAT analyzes the details of a single sample across multiple dimensions, such as region and operator. With rich indicators and charts, CAT visually locates the problem, determines the affected range, and finds root causes. This shortens analysis time, improves O&M efficiency significantly, and makes fine-grained monitoring achievable.
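As an illustration of analyzing probe samples by dimension, here is a small Python sketch that groups samples by region or operator and reports availability and average load time per group. The field names (`region`, `operator`, `ok`, `load_ms`), the `summarize` function, and the sample values are all made up for this example and do not come from CAT.

```python
from collections import defaultdict
from statistics import mean

# Made-up samples for illustration only; not real measurements.
SAMPLES = [
    {"region": "Guangdong", "operator": "Operator A", "ok": True, "load_ms": 1200.0},
    {"region": "Guangdong", "operator": "Operator B", "ok": True, "load_ms": 1800.0},
    {"region": "Sichuan", "operator": "Operator A", "ok": False, "load_ms": None},
    {"region": "Sichuan", "operator": "Operator A", "ok": True, "load_ms": 4000.0},
]


def summarize(samples, dimension):
    """Group samples by one dimension and compute per-group indicators."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[dimension]].append(sample)

    report = {}
    for name, rows in groups.items():
        ok_rows = [r for r in rows if r["ok"]]
        report[name] = {
            "samples": len(rows),
            "availability": len(ok_rows) / len(rows),
            # Average load time over successful samples only.
            "avg_load_ms": mean(r["load_ms"] for r in ok_rows) if ok_rows else None,
        }
    return report


if __name__ == "__main__":
    for dim in ("region", "operator"):
        print(dim, summarize(SAMPLES, dim))
```

Slicing the same samples by different dimensions helps narrow the affected range: if only one region or one operator shows low availability, the fault is more likely in that access path than in the origin site.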
Real-time alerts are available for first-page loading time, overall performance, and availability. The rich alert policy settings are deeply integrated with the Alibaba Cloud alert center to shorten MTTR. CAT can find errors at the page element level and locate problems within a single network request, which makes locating problems more efficient.
Let’s take the marketing promotion of an e-commerce enterprise as an example. The website has more than one million monthly active users, mainly distributed in third-, fourth-, and fifth-tier cities in China. The annual cost of website operation and maintenance exceeds two million yuan. However, because commodity information is updated frequently during the promotional period, the enterprise often receives complaints from users. This leads to a low user conversion rate, and the O&M Team takes the blame.
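One common way to express an alert policy on first-page loading time and availability is to fire only after several consecutive bad samples, so that a single slow probe does not page anyone. The sketch below shows that idea in Python. The function name `first_alert_index`, the 3000 ms threshold, and the three-sample window are assumptions for illustration, not CAT's alert settings.

```python
from typing import Optional, Sequence


def first_alert_index(load_times_ms: Sequence[Optional[float]],
                      threshold_ms: float,
                      consecutive: int = 3) -> Optional[int]:
    """Return the index of the sample at which the alert fires, or None.

    A sample is "bad" if the probe failed (None) or the first-page
    loading time exceeded the threshold. The alert fires once
    `consecutive` bad samples occur in a row.
    """
    streak = 0
    for index, value in enumerate(load_times_ms):
        is_bad = value is None or value > threshold_ms
        streak = streak + 1 if is_bad else 0
        if streak >= consecutive:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical per-minute samples; None marks an unavailable probe.
    samples = [1200.0, 3500.0, 3600.0, None, 900.0]
    print(first_alert_index(samples, threshold_ms=3000.0))
```

A shorter window detects failures sooner but produces more false alarms; a longer one is quieter but adds to detection time, which counts against MTTR.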
We solved this problem through CAT and optimized website performance to support promotional activities.
Before an enterprise's marketing activities or new systems go online, CAT helps the enterprise select operator monitoring points in different cities, set up browsing and network tasks to obtain real-time access experience data from real users, and locate the page elements where problems occur, so that technical teams can fix problems in a timely manner. CAT also simulates high-concurrency access at peak traffic levels. Changes in the main performance indicators can be observed as the peak pressure is increased, which exposes performance bottlenecks.
With first-page monitoring and real-time monitoring, problems can be verified and faults reproduced immediately to evaluate and optimize website performance. With transaction flow analysis, we can understand users’ real experience, optimize the browsing path, find the conversion bottleneck, and improve the conversion rate.
Because monitoring is zero-intrusion, enterprises can collect and analyze performance data for the marketing activities of competing products in the industry and understand how those products' marketing is changing. They can then make targeted IT investment, optimization, and iteration to make up for marketing weaknesses and maintain their leading positions.
With the measures above, website performance can be improved significantly, and the quantitative indicators related to user experience improve by more than 30%. This drives business growth. In addition to the scenarios above, CAT can be used in many other scenarios, such as network interfaces, service availability monitoring, CDN service monitoring and selection, DNS resolution status, and hijacking analysis.
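The idea of raising the pressure step by step and watching how indicators change can be sketched with Python's standard thread pool. This is a generic illustration, not how CAT generates load: `timed_call`, `run_step`, and `ramp` are hypothetical names, and `request` stands for any callable that performs one request against a test environment you are authorized to load.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(request):
    """Run one request; return (succeeded, elapsed milliseconds)."""
    start = time.perf_counter()
    try:
        request()
        ok = True
    except Exception:
        ok = False
    return ok, (time.perf_counter() - start) * 1000.0


def run_step(request, concurrency, total_requests):
    """Issue total_requests calls with at most `concurrency` in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(request),
                                range(total_requests)))
    latencies = sorted(ms for _, ms in results)
    errors = sum(1 for ok, _ in results if not ok)
    # Simple nearest-rank style 95th percentile over the sorted latencies.
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return {
        "concurrency": concurrency,
        "error_rate": errors / total_requests,
        "p95_ms": p95,
    }


def ramp(request, levels, requests_per_level=50):
    """Increase concurrency level by level and collect indicators."""
    return [run_step(request, level, requests_per_level) for level in levels]


if __name__ == "__main__":
    # Stand-in request that just sleeps; replace with a real call.
    for row in ramp(lambda: time.sleep(0.01), levels=[1, 5, 10]):
        print(row)
```

The level at which the error rate or the 95th-percentile latency starts to climb sharply is a candidate bottleneck worth investigating before the promotion goes live.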
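Finding the conversion bottleneck in a transaction flow comes down to comparing how many users reach each consecutive step. The Python sketch below computes step-to-step conversion rates and picks the weakest transition. The step names and user counts are invented for illustration, and `funnel` and `bottleneck` are hypothetical helper names.

```python
def funnel(steps):
    """steps: ordered list of (step_name, users_reaching_step).

    Returns a list of (transition_label, conversion_rate) pairs, one
    per pair of consecutive steps.
    """
    rates = []
    for (prev_name, prev_users), (name, users) in zip(steps, steps[1:]):
        rate = users / prev_users if prev_users else 0.0
        rates.append((f"{prev_name} -> {name}", rate))
    return rates


def bottleneck(steps):
    """Return the transition with the lowest conversion rate."""
    return min(funnel(steps), key=lambda item: item[1])


if __name__ == "__main__":
    # Invented numbers for a hypothetical purchase flow.
    flow = [("home", 1000), ("product", 600), ("cart", 150), ("pay", 120)]
    print(funnel(flow))
    print(bottleneck(flow))
```

In this invented flow the drop from the product page to the cart stands out, so that step's page performance and interaction design would be the first place to look.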
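For DNS resolution status and hijacking analysis, a basic check is to resolve a hostname from the probe's location and compare the answers with the addresses the site owner expects. The following Python sketch shows that comparison. It is a simplification: `resolve` and `check_dns` are hypothetical names, the expected-address list is an assumption the operator must supply, and sites behind a CDN legitimately return different addresses per region, so an unexpected answer is a signal to investigate rather than proof of hijacking.

```python
import socket


def resolve(host):
    """Resolve a hostname with the system resolver; return a set of IPs."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def check_dns(host, expected_ips, resolver=resolve):
    """Compare resolved addresses with the expected set for this host."""
    try:
        answers = set(resolver(host))
    except socket.gaierror as exc:
        return {"host": host, "status": "resolution_failed",
                "unexpected": [], "error": str(exc)}
    unexpected = sorted(answers - set(expected_ips))
    return {
        "host": host,
        # Any address outside the expected set is flagged for review.
        "status": "suspicious" if unexpected else "ok",
        "unexpected": unexpected,
        "error": None,
    }
```

Running the same check from probes in different regions and on different operators shows whether a suspicious answer is local to one network or widespread.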