How Yuque Improved Its Performance by Going Serverless

In this blog, we'll discuss how Yuque managed to improve its performance and efficiency by migrating several key processes to Alibaba Cloud's Function Compute serverless platform.

Introduction

Yuque is a professional cloud-based knowledge base for team document collaboration. It has become a standard tool for Alibaba employees to write documents and accumulate knowledge, and it has been available to external users since 2018.

Customers' Pain Points

Yuque is a complex web application and a typical data-intensive application that relies heavily on cloud services such as cloud databases. Yuque's server side is built on a Node.js technology stack. Node.js is commonly associated with three traits: single-threaded, non-blocking, and asynchronous programming. Thanks to these traits, Node.js is well suited to building scalable, I/O-intensive network applications such as web services. In CPU-intensive scenarios, however, the same traits become obstacles: once a method that blocks the process is executed, the entire process is blocked.
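
The blocking behavior described above can be reproduced with a small sketch. The helper names `blockFor` and `measureTimerDelay` are hypothetical and the busy loop merely stands in for real CPU-heavy work; the point is that a timer scheduled for 0 ms cannot fire until the synchronous work has finished.

```javascript
// Busy-wait for roughly `ms` milliseconds, standing in for CPU-heavy work
// such as parsing a pathological document.
function blockFor(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // spin: nothing else in this process can run meanwhile
  }
  return Date.now() - start;
}

// Measure how late a 0 ms timer fires when synchronous work occupies the thread.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    blockFor(blockMs); // the timer callback has to wait for this to return
  });
}

measureTimerDelay(50).then((delay) => {
  console.log(`0 ms timer fired after about ${delay} ms`);
});
```

In a web service, every pending request would be delayed in the same way as the timer in this sketch.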

For applications like Yuque, Node.js implements the entire server logic. It is difficult to completely avoid operations that consume a lot of CPU resources or even send a process into an endless loop. Take Markdown conversion as an example. Since we cannot anticipate every user's input, the conversion code may be driven into inefficient or endless loops. When Node.js was created, there was no good solution to these problems. Even languages with a thread-based concurrency model, such as Java, struggle in such scenarios. After all, CPU is a very important resource for web applications. However, as infrastructure has improved, Function Compute now appears to offer a solution to the biggest weakness of Node.js.

Solution

Busi, Technical Director of Yuque, said: "With Function Compute, we can run all CPU-intensive and unstable operations in Function Compute. By doing so, our primary service can return to the I/O-intensive application model, and we can once again enjoy the R&D efficiency that Node.js brings!"

"Take an actual scenario we encountered in Yuque. Users upload documents in HTML or Markdown format, and we need to convert them into Yuque's own document format. In most cases, parsing a user's document is very fast, but unexpected input can still trigger parser bugs and cause endless loops. We did not even dare to upgrade the Markdown parsing library and related plug-ins for fear of introducing more problems. With Function Compute, however, we can move the CPU-consuming conversion logic to Function Compute, so the stability of Yuque's primary service is no longer affected."
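
The idea in the quote, keeping a possibly non-terminating conversion away from the primary process, can be sketched locally with a child process and a hard timeout. This is a stand-in technique, not Function Compute itself and not Yuque's implementation: `runIsolated` and `converterSource` are hypothetical names, and the "converter" only upper-cases its input.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical stand-in for a converter: reads text from stdin and
// writes a result to stdout.
const converterSource = `
  const text = require("node:fs").readFileSync(0, "utf8");
  process.stdout.write(text.trim().toUpperCase());
`;

// Run a snippet of JavaScript in a separate Node.js process and give up
// after `timeoutMs` milliseconds.
function runIsolated(source, input, timeoutMs) {
  const result = spawnSync(process.execPath, ["-e", source], {
    input,
    encoding: "utf8",
    timeout: timeoutMs, // the child is killed when this elapses
  });
  // A timeout or crash is confined to the child process.
  if (result.error || result.signal || result.status !== 0) {
    return { ok: false, output: "" };
  }
  return { ok: true, output: result.stdout };
}

console.log(runIsolated(converterSource, "# hello\n", 5000));
console.log(runIsolated("while (true) {}", "", 300)); // endless loop gets cut off
```

`spawnSync` keeps the sketch short but blocks the caller while it waits, so a real service would use an asynchronous call instead, for example by invoking a remote function.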

Besides taking over CPU-intensive operations from web systems, what else can Function Compute do?

Yuque supports rendering diagrams and formulas from text, including PlantUML, mathematical formulas, and Mermaid. It also allows you to export documents as PDF files and images. These scenarios share two characteristics:

  1. They depend on complex application software, such as Puppeteer and Graphviz.
  2. They may need to execute user input.

It seems simple to support this kind of scenario by spawning a child process (for example, with Node.js's child_process.exec), but problems arise when you try to turn it into a stable external service. These complex applications may not be designed for long-running operation, so they may develop memory and stability problems over time. The CPU also comes under heavy pressure when these applications are called with high concurrency. In addition, because user-supplied code has to be run in some scenarios, attackers can execute attack code on the servers through malicious input, which is highly risky.

Before Function Compute was introduced, Yuque dedicated a separate task cluster to these features. The cluster ran the third-party services and accepted requests from the primary service, so that the primary service's stability was not affected. However, solving the aforementioned problems this way came at a considerable cost:

  1. A large task cluster had to be maintained, even though many of its resources sat idle most of the time.
  2. Third-party applications had to be restarted regularly to avoid memory leaks caused by long-running operation. Even so, some special requests could still destabilize the third-party software.
  3. User input had to be inspected and filtered to prevent attacks. But it is difficult to block all attack code, so the security risk remained high.

Yuque packages each third-party service into its own function, splitting the features of the task cluster into a series of Function Compute functions. With Function Compute, all of the problems above can be solved:

  1. Function Compute charges users for the actual CPU time spent running code. There is no need to maintain a long-running task cluster.
  2. Although the Function Compute runtime may keep some function instances resident as an optimization, users basically do not need to worry about problems caused by long-running operation. Each call is independent and is not affected by other calls.
  3. User-supplied code runs in a sandboxed container. Even if user input is not filtered at all, attackers cannot obtain sensitive information or reach the internal network to run malicious code, which makes Yuque safer.
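
As a rough illustration of packaging one such feature as a function, the sketch below assumes the `(event, context, callback)` handler convention of Function Compute's Node.js runtime, with the event delivered as a Buffer; check the runtime documentation before relying on it. The `renderers` table and its placeholder output are hypothetical and only mark where a tool such as Graphviz or Puppeteer would be invoked.

```javascript
// Hypothetical renderers: real ones would call out to third-party tools.
const renderers = new Map([
  ["mermaid", (source) => `<svg data-kind="mermaid" data-length="${source.length}"></svg>`],
  ["plantuml", (source) => `<svg data-kind="plantuml" data-length="${source.length}"></svg>`],
]);

// Entry point: parse the request, pick a renderer, return the result.
function handler(event, context, callback) {
  let request;
  try {
    request = JSON.parse(event.toString());
  } catch (err) {
    callback(new Error("event must be valid JSON"));
    return;
  }
  const render = renderers.get(request.type);
  if (!render) {
    callback(new Error(`unsupported diagram type: ${request.type}`));
    return;
  }
  // Each invocation is independent: no state survives in this sketch.
  callback(null, render(String(request.source ?? "")));
}

exports.handler = handler;
```

Because every call starts from the request alone, a crash or leak in one rendering job does not carry over to the next.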

In addition to the preceding features, Yuque recently replaced Alibaba Cloud's ApsaraVideo VOD with Object Storage Service (OSS) plus Function Compute to transcode video and audio.

Browsers support only a limited number of audio and video formats, so many of the videos that users upload need to be transcoded before they can be played on Yuque. FFmpeg is generally used for audio and video transcoding, which is also a typical CPU-intensive scenario.

Building your own video transcoding cluster wastes a lot of resources. Alibaba Cloud's ApsaraVideo VOD, on the other hand, is relatively expensive and offers limited control. Function Compute integrates FFmpeg directly to provide audio and video processing capabilities. It is also integrated into the application center and works with Log Service (SLS) to improve monitoring and data analysis.

By improving the compression ratio and skipping unnecessary transcoding, Yuque reduced the cost to one fifth of the original after migrating audio and video processing from ApsaraVideo VOD to Function Compute.
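
The two optimizations mentioned above, skipping unnecessary transcoding and compressing harder, might look roughly like the sketch below. The function names, the CRF value of 28, and the assumption that H.264 video with AAC audio in an MP4 container is already browser-playable are illustrative choices, not Yuque's actual settings. The FFmpeg options used (`-i`, `-c:v`, `-crf`, `-c:a`, `-movflags`) are standard ones.

```javascript
// Assumption: H.264 + AAC in MP4 plays in mainstream browsers as-is,
// so such uploads can skip transcoding entirely.
function needsTranscode(media) {
  const alreadyPlayable =
    media.container === "mp4" &&
    media.videoCodec === "h264" &&
    media.audioCodec === "aac";
  return !alreadyPlayable;
}

// Build an FFmpeg argument list. For libx264, a higher CRF means
// stronger compression at lower quality.
function buildFfmpegArgs(inputPath, outputPath, crf = 28) {
  return [
    "-i", inputPath,
    "-c:v", "libx264",
    "-crf", String(crf),
    "-c:a", "aac",
    "-movflags", "+faststart", // move metadata up front for web playback
    outputPath,
  ];
}

const upload = { container: "avi", videoCodec: "mpeg4", audioCodec: "mp3" };
if (needsTranscode(upload)) {
  console.log(["ffmpeg", ...buildFfmpegArgs("in.avi", "out.mp4")].join(" "));
}
```

The argument list could then be passed to FFmpeg from inside a function, with the input and output files stored in OSS.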

Practices

Busi said that, in practice, Yuque did not migrate its web services to Function Compute in the SFF style (the current Function Compute architecture is actually not well suited to the SFF mode). However, Function Compute plays an important role in Yuque's overall architecture for stability, security, and cost control. In summary, Function Compute is very suitable for the following scenarios:

  1. Offloading CPU-intensive operations that are not time-sensitive, to relieve CPU pressure on the primary service.
  2. Running user-submitted code in a sandbox.
  3. Running unstable third-party application services.
  4. Running services that require strong dynamic scaling capabilities.
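
The four criteria above can be read as a simple placement rule. The sketch below is only one way to encode them; `chooseRuntime` and the task property names are hypothetical.

```javascript
// Decide where a task should run, based on the four scenarios listed above.
function chooseRuntime(task) {
  const reasons = [];
  if (task.cpuIntensive && !task.latencySensitive) {
    reasons.push("CPU-intensive and not time-sensitive");
  }
  if (task.runsUserCode) {
    reasons.push("runs user-submitted code");
  }
  if (task.unstableThirdParty) {
    reasons.push("depends on unstable third-party software");
  }
  if (task.burstyLoad) {
    reasons.push("needs strong dynamic scaling");
  }
  return {
    target: reasons.length > 0 ? "function-compute" : "primary-service",
    reasons,
  };
}

console.log(chooseRuntime({ cpuIntensive: true, latencySensitive: false }));
console.log(chooseRuntime({ cpuIntensive: true, latencySensitive: true }));
```

Note that a CPU-intensive but time-sensitive task stays on the primary service in this sketch, matching the qualifier in the first scenario.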

After Function Compute was introduced, Yuque's architecture keeps a monolithic application at its core. Some independent functional modules are split out into microservices or serverless functions according to their usage scenarios and capability requirements. Application architecture is closely tied to the team's makeup and the shape of the business. As cloud services and infrastructure improve, it becomes easier to choose a more appropriate architecture.

With serverless, tasks that carry security risks or consume a large amount of CPU resources can be migrated to Function Compute. Because these tasks run in a sandbox, we do not need to worry about security risks caused by malicious code. At the same time, moving these CPU-intensive tasks out of the primary service prevents them from blocking it under concurrent load. In addition, pay-as-you-go billing can greatly reduce costs, because you do not have to deploy a resident service for low-frequency features. Therefore, such services should be migrated to serverless as much as possible.
