Serverless Exploration of TapTap Algorithm Platform
Serverless has saved TapTap a great deal of O&M and development manpower in building applications. Without any investment in dedicated infrastructure staff, it brought our originally rudimentary infrastructure and resource management up to a relatively cutting-edge industry standard. The most telling figure: a team in the single digits provides full AI and big data support for all TapTap-related businesses.
Founded in 2003, Xinxin is a global game developer and publisher with rich experience in R&D, distribution and agency operation. By the middle of 2022, Xinxin had operated 38 free and paid games with 50 million active users worldwide, mainly in Greater China, Southeast Asia, North America and South America. In 2016, Xinxin launched TapTap, a mobile game community and app store. Players can download free games or buy paid games through official channels, and can also talk with other players in the community. As of June 2022, TapTap had more than 50 million active users worldwide.
Business background
TapTap differs from the traditional app store revenue-sharing model. It has always insisted on taking no channel revenue share, which means TapTap's commercialization is currently driven mainly by advertising. TapTap ads are native ads: they closely match the surrounding non-commercial content, which gives users a better experience. Ads are interspersed among recommended content such as the game recommendations on the home page, the content recommendations on the discovery page, the hint words shown in the search box on the search guide page, the search suggestions that appear as a query is typed, and the search results landing page.
TapTap's Serverless practice is also based on the actual needs of these business scenarios. Examples include the automatic update and deployment of the deep learning models that search, advertising and recommendation currently rely on, the model experiment recording platform that team members depend on, and NLP analysis and processing of new content on the site.
In the early days, most of TapTap's backend services were deployed on ECS and managed and released through Rundeck, which was not ideal in terms of efficiency or manageability. I summarized the requirements for the infrastructure upgrade in four points:
1. Greatly improve development and O&M efficiency.
2. Meet business needs at a low labor cost.
3. Provide reliable services with good performance.
4. Support Go well, because TapTap projects are currently written mainly in Go.
Scheme comparison
We considered two mainstream architectures. One is the full solution of virtual machines plus self-built K8s. The other is the Serverless architecture, which uses Serverless App Engine (SAE) and Function Compute (FC).
After comparison, we chose the latter. On the one hand, Serverless removes the machine purchasing process, so there is no need to buy ECS instances in advance. It also comes with optional default environments, so unless there are special requirements, the complexity of environment setup is largely eliminated. On the other hand, Serverless integrates many basic components, to the point where a service can go online with almost no O&M work.
In terms of subsequent maintenance, Serverless products bill at a finer granularity than ECS. Billing can be per minute or even per second, and you pay only when real business traffic uses resources. Compared with the K8s+ECS model, this saves a lot of labor cost in both early development and later O&M.
Here is how we understand the two Serverless products, based on TapTap's hands-on experience.
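The billing difference can be made concrete with a little arithmetic. The sketch below compares an always-on machine with per-second, pay-per-use billing. All prices and workload figures in the usage note are hypothetical placeholders chosen for illustration, not Alibaba Cloud prices or TapTap data.

```python
def reserved_cost(hours: float, price_per_hour: float) -> float:
    """Cost of an always-on machine, billed for every hour it exists."""
    return hours * price_per_hour


def pay_per_use_cost(invocations: int, seconds_each: float,
                     price_per_second: float) -> float:
    """Cost when only the seconds actually spent running are billed."""
    return invocations * seconds_each * price_per_second


def break_even_invocations(hours: float, price_per_hour: float,
                           seconds_each: float,
                           price_per_second: float) -> float:
    """Number of invocations at which both billing models cost the same."""
    return reserved_cost(hours, price_per_hour) / (seconds_each * price_per_second)
```

With hypothetical inputs of 720 hours at 0.5 per hour versus 100,000 invocations of 2 seconds each at 0.0001 per second, the always-on machine costs 360 while pay-per-use costs about 20. The break-even point in that made-up scenario is about 1.8 million invocations per month, which is why low-traffic internal services benefit most.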
Function Compute (FC) decouples scheduling and triggering logic from the business logic itself. Developers and algorithm engineers can configure the triggering and scheduling of a whole piece of business logic in the Function Compute console, with no extra development, and can focus on designing the business logic itself. This also means that Function Compute is better suited to event-driven scenarios: resources are requested to run the business logic only when an event actually occurs.
In our view, Serverless App Engine (SAE) resembles an enhanced K8s with more features and a full set of microservice capabilities. It greatly reduces maintenance costs and works out of the box, which makes it better suited to microservice transformation. By directly migrating old services from ECS, you get a complete containerized O&M solution without assigning O&M personnel.
Together, the two products cover most of TapTap's business scenarios and make it possible to run all application services on Serverless (All on Serverless).
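The decoupling described above can be sketched as a trigger-agnostic business function with thin adapters per trigger type. This is a minimal illustration only: the function names and the payload fields ("text", "payload") are assumptions made for the example, not TapTap's code or a documented event schema.

```python
import json


def count_words(text: str) -> dict:
    """Business logic: knows nothing about how it was triggered."""
    return {"words": len(text.split())}


def on_http(body: bytes) -> dict:
    """Adapter for an HTTP-style trigger; the 'text' field is hypothetical."""
    payload = json.loads(body)
    return count_words(payload["text"])


def on_timer(event: bytes) -> dict:
    """Adapter for a scheduled trigger; the 'payload' field is assumed."""
    payload = json.loads(event)
    return count_words(payload.get("payload", ""))
```

Because only the adapters know about event shapes, switching a job from a schedule to an HTTP call means changing trigger configuration and a few lines of glue, while count_words stays untouched.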
Business Practice
Function Compute (FC)
1) Fully automatic model deployment/hourly update service triggered by OSS.
TapTap has an OSS-triggered service that automatically deploys and updates models. After training a model, whether in TensorFlow, PyTorch or another machine learning framework, algorithm engineers only need to export it to the designated OSS bucket. The upload triggers the model update and deployment service, which completes the rollout. In this way, algorithm engineers can deploy, update and scale models flexibly without relying on other engineering manpower.
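An OSS-triggered function of this kind might look like the sketch below. It assumes the event arrives as JSON bytes with an "events" list whose entries carry the bucket name and object key under "oss", and that the handler has the (event, context) form. The "models/" prefix and the bucket and key names are hypothetical, and the actual deployment step is left as a placeholder because the article does not describe it.

```python
import json
from typing import List, Tuple

MODEL_PREFIX = "models/"  # hypothetical prefix reserved for exported models


def models_to_deploy(event: bytes) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for newly created model objects."""
    payload = json.loads(event)
    targets = []
    for record in payload.get("events", []):
        # Ignore deletions and other non-creation events.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["oss"]["bucket"]["name"]
        key = record["oss"]["object"]["key"]
        if key.startswith(MODEL_PREFIX):
            targets.append((bucket, key))
    return targets


def handler(event, context):
    """Entry point; the real deployment call is intentionally omitted."""
    for bucket, key in models_to_deploy(event):
        print(f"would deploy oss://{bucket}/{key}")
    return "ok"
```

Keeping the event parsing in models_to_deploy makes it testable locally with a hand-written event, without touching OSS or the serving platform.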
2) Model experiment management platform triggered by HTTP (web service)
Algorithm engineers submit model training tasks through an internal experiment management and parameter-tuning platform built on HTTP triggers. The platform automatically records each task's training parameters, log addresses and log instances, so that every experiment can be traced and managed. It is a web service with a front end, but it is internal and does not need high QPS or performance, so we run it on Function Compute, which gives it a considerable advantage in management cost. Function Compute also recently introduced a free quota, so the service is basically free to run.
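The recording step can be sketched as a small request handler that validates a submission and stores it. The field names, the in-memory dictionary used as a store, and the response shape are all assumptions made for the example; the article does not describe the platform's real schema or storage.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict

REQUIRED = ("experiment_id", "owner", "params", "log_url")  # hypothetical schema


@dataclass
class ExperimentRecord:
    experiment_id: str
    owner: str
    params: dict
    log_url: str


def record_experiment(body: bytes, store: Dict[str, ExperimentRecord]) -> dict:
    """Validate a JSON submission and keep it so the run can be traced later."""
    data = json.loads(body)
    missing = [name for name in REQUIRED if name not in data]
    if missing:
        return {"status": 400, "error": "missing fields: " + ", ".join(missing)}
    record = ExperimentRecord(**{name: data[name] for name in REQUIRED})
    store[record.experiment_id] = record
    return {"status": 200, "record": asdict(record)}
```

In a real deployment the dictionary would be replaced by a database, since function instances do not keep state between invocations.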
3) NLP processing/parsing service for new content, triggered by Kafka
When a user publishes a new post on TapTap, we push it through Kafka to the NLP analysis service for processing and parsing, and store the result for later search. The service is invoked once per published post, so costs can be controlled precisely.
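A Kafka-triggered function for this flow could be shaped like the sketch below. The event layout (a JSON list of messages, each with a JSON string in "value") and the post fields are hypothetical, and extract_keywords is a trivial stand-in for the real NLP analysis, which the article does not detail.

```python
import json
import re
from typing import List


def extract_keywords(text: str) -> List[str]:
    """Stand-in for NLP: unique lowercase tokens longer than three characters."""
    seen: List[str] = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if len(token) > 3 and token not in seen:
            seen.append(token)
    return seen


def handle_batch(event: bytes) -> List[dict]:
    """Parse each message in the batch and return one result per post."""
    results = []
    for message in json.loads(event):
        post = json.loads(message["value"])
        results.append({
            "post_id": post["post_id"],
            "keywords": extract_keywords(post["content"]),
        })
    return results
```

Each returned entry is what would be saved for later search; because the function runs only when messages arrive, idle periods incur no compute cost.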
4) Weekly/daily regular statistics of resource consumption
A scheduled job runs every week or day to collect MaxCompute and EAS resource consumption statistics. It automatically pulls the unstructured consumption bill from the Alibaba Cloud backend, aggregates it by team member, task and model, and pushes the results to the team. This raises everyone's cost awareness and helps each business line manage costs better.
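The aggregation step can be sketched as follows, assuming the bill has already been parsed into rows. The row fields ("owner", "task", "model", "cost") and the report format are hypothetical; pulling the bill and pushing the message are outside the scope of the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (owner, task, model)


def aggregate_costs(rows: Iterable[dict]) -> Dict[Key, float]:
    """Sum cost per (owner, task, model) across all bill rows."""
    totals: Dict[Key, float] = defaultdict(float)
    for row in rows:
        totals[(row["owner"], row["task"], row["model"])] += row["cost"]
    return dict(totals)


def report_for(owner: str, totals: Dict[Key, float]) -> str:
    """Build the text that would be pushed to one team member."""
    lines = [
        f"{task}/{model}: {cost:.2f}"
        for (row_owner, task, model), cost in sorted(totals.items())
        if row_owner == owner
    ]
    return "\n".join(lines)
```

Grouping by all three dimensions at once keeps the raw totals reusable: the same dictionary can be rolled up per person, per task or per model for different audiences.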
Serverless App Engine (SAE)
For our SAE rollout, we chose the team's internal prediction service. This service integrates the model inference, feature development and sample retrieval capabilities required by search, recommendation and advertising. It is a middle-platform microservice that gives every business line access, at very low cost, to the most mature online prediction service in the team. For example, it estimates the click-through rate of suggested search terms on the search page and the click-through rate of games in the international version.
With SAE, TapTap's services quickly gained Serverless capabilities. Because SAE hides much of the resource management, environment management and basic O&M component management, TapTap can quickly launch an independent set of prediction services for new scenarios and new businesses, both in China and abroad.
TapTap also integrates SAE's alerting platform, event center and log service. Through DingTalk alerts, TapTap can track the status of online services in real time, for example whether an OOM or restart has occurred, or whether error logs are appearing.
In addition, the service is connected to the Dubbo Go framework, which directly gives it microservice capabilities such as service registration and discovery, direct IP connection, and graceful startup and shutdown. Compared with the previous ECS-based approach, this scheme has clear advantages in O&M management, development and release, and subsequent cost control. It covers essentially the whole process from development and release through subsequent O&M, which greatly reduces development costs within the team.
Business Value
Simple O&M: developers can easily complete the whole process of application development, deployment and management themselves, so they can focus on the business, and the investment and cost of O&M drop sharply.
Zero-downtime releases and minute-level launches: SAE supports canary (grayscale) releases and rolling releases, and provides a relatively complete Open API that can be integrated with Git for rapid deployment. This gives TapTap services minute-level release capability, which is particularly attractive for new businesses.
Second-level elastic scaling: SAE supports scaling policies based on metrics of different dimensions, such as CPU, memory, QPS and RT, as well as scheduled scaling, which helps improve resource utilization. When the business scale is large, configuring finer-grained scaling policies can significantly reduce machine costs.
Multi-language microservice capability: SAE provides PHP, Python, Go and other runtimes, and implements language-independent service registration and discovery based on K8s Service, which makes low-cost microservices possible for Go.
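To illustrate the metric-based elastic scaling mentioned above, the sketch below computes a replica count with the proportional rule used by the Kubernetes Horizontal Pod Autoscaler, clamped to configured bounds. The article does not state how SAE calculates replicas internally, so this is an assumption used only to show how a target metric drives capacity; the function and parameter names are hypothetical.

```python
import math


def desired_replicas(current: int, observed: float, target: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Scale replicas in proportion to observed/target, within bounds."""
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * observed / target)
    return max(min_replicas, min(max_replicas, raw))
```

For example, with 4 replicas at 90% average CPU and a 60% target, the rule asks for 6 replicas. At 15% CPU it would ask for 1, which the lower bound raises to the configured minimum; this is where the cost saving at low traffic comes from.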