
Themepica Technology Enhances Data Processing Efficiency with Serverless

This article introduces how Beijing Themepica Technology Co., Ltd. leverages advanced cloud-native technologies to address challenges related to complex data processing procedures.

By Yue Yang, Chen Dequan and Liu Jingna

Founded in June 2023, Beijing Themepica Technology Co., Ltd. positions itself as a thematic investing pioneer in the era of intelligent investment. As the asset management industry shifts its focus from institutions to users, Themepica Technology builds a thematic investing engine to power the integration of inclusive finance and investment, with themes at its core and natural language interaction as its entry point, creating a new bridge between investors and asset management institutions.


Themepica Technology processes about 10,000 pieces of financial information each day. By exploring emerging trends and identifying trend inflection points, Themepica Technology has formed a thematic investing system including 10+ super themes, 40+ investment themes, and 200+ sub-themes. Themepica Technology now serves 10 industry benchmark customers, providing diverse services such as data APIs and weekly and monthly reports. So far, it has issued about 500 reports and released nearly 1,000 analytical articles on its official accounts. Moving forward, Themepica Technology aims to become a personalized thematic investing agent through real-time user intention mining and thematic calculations.

Platform Characteristics and Challenges

Themepica Technology offers typical information service products. The platform aggregates financial industry information from various channels, stores it locally, and then processes it following relevant procedures in an investment analysis framework to create financial data products and provide external services. The platform's business features and demand for system resources are characterized as follows:

1. Large data volumes and diverse storage requirements

(a) The platform's core data is primarily unstructured. The data processed at each stage consists of source, intermediate, and result data, measured at the TB level. Though handling such a data volume is a breeze for file or object storage, it is still challenging for analytical or index storage.

(b) Unstructured data storage requires a variety of APIs at different processing stages, including APIs for accessing files, objects, online analytical processing (OLAP) databases, and caching and indexing systems.

(c) The timeliness requirements for processing financial information impose a high demand on the query performance of analytical storage systems.

2. Complex and variable data processing

(a) The data processing procedures are reflective of the investment analysis strategies used in the system and are central to the entire platform. The processing logic employed at the key nodes in these procedures cannot be implemented using the standardized features provided by the platform. Users must commit their Java or Python code to the platform so that the procedures can flexibly call the code as needed.

(b) For realizing the required business logic, frequent data flow and interaction requirements arise between processing nodes in a data processing procedure, between nodes and data storage interfaces, and even between different procedures.

(c) Investment strategies should be dynamically adjusted in response to market changes and customer needs. Accordingly, data processing procedures and even the core processing logic need to be adjusted in line with the business strategies.

(d) The complexity of data processing logic dictates that, after business is launched, it is often necessary to track and analyze the processing of specific data in the production environment, and the capability to conveniently view runtime details is required.
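To illustrate item 2(a) above, a user-supplied processing node can be packaged as a function that the workflow calls as needed. The sketch below assumes the Function Compute Python convention of a `handler(event, context)` entry point receiving the event payload as bytes; the theme keyword lists and field names are purely illustrative, not Themepica's actual logic:

```python
import json

# Hypothetical user-supplied processing node: tag a news item with the
# sub-themes whose keywords appear in its text. Assumes the Function
# Compute Python event-handler signature (event arrives as bytes).
def handler(event, context):
    payload = json.loads(event)
    # Illustrative keyword lists; a real system would load these from storage.
    themes = {"new-energy": ["battery", "solar"], "ai": ["model", "chip"]}
    text = payload.get("text", "").lower()
    matched = [name for name, kws in themes.items()
               if any(kw in text for kw in kws)]
    return json.dumps({"id": payload.get("id"), "themes": matched})
```

Because the node is just a function, the same code can be invoked from any workflow step or tested locally without the platform.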

3. Obvious peaks and valleys in the platform's demand for resources

(a) The platform has fixed peaks each day: when information feeds come in, when the information is being processed, and when business personnel run concurrent queries. Traffic also peaks at the beginning of each week and month.

(b) Traffic peaks require processing capacity to be scaled out severalfold, and different types of peaks consume different system resources. Therefore, scale-out solutions for the different scenarios must be planned in advance.

4. Reliability and timeliness requirements

(a) With information feeds being generated and fed into the platform nonstop, the platform must process incoming information within minutes and put the processed data into the data pool used to provide external services. Therefore, the platform must be able to process information in a stable and continuous manner and automatically scale out during peaks to prevent data backlogs. If an omission or error occurs in the process, the platform should be able to retry automatically.

(b) The systems responsible for providing external services, as access endpoints for end users, must meet the service continuity requirements.
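The automatic retry described in 4(a) is typically delegated to the workflow engine, but the behavior it relies on can be sketched as a generic retry-with-exponential-backoff helper. The names and defaults here are illustrative, not the platform's actual implementation:

```python
import time

# Minimal retry-with-exponential-backoff sketch: re-run a processing step
# on failure, doubling the wait between attempts, and surface the error
# only after the final attempt.
def process_with_retry(process, item, max_attempts=3, base_delay=1.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return process(item)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In a managed workflow, this policy would be declared on the step rather than hand-coded, so transient failures never leave a data backlog.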

To implement the aforementioned platform features, Themepica Technology has the following requirements for IT infrastructure including IaaS and PaaS:

1. Diversified storage and smooth mutual access between systems

(a) Multiple types of storage should be available, and seamless mutual access between different storage systems should be supported.

(b) Everyday data use, management, and forwarding should be configurable through a GUI.

2. Simple and flexible data processing procedures

(a) Provide a unified process management portal and support GUI-based process design.

(b) Support implementation of complex business logic using common programming languages and seamless embedding of code into processes.

(c) Support complex and interactive control between nodes in a procedure, between procedures and data storage interfaces, and between procedures.

(d) Allow tracking of runtime processes to analyze processing procedures, and support convenient tracking and analysis of specific data or processes.

3. Automatic system scaling

(a) The capacity of data processing systems should be automatically increased and decreased during traffic peaks and valleys, and the scaling logic should be scriptable based on dependencies between systems.

(b) Other business systems should be automatically scaled in and out during traffic valleys and peaks.

4. Overall improvement in R&D quality and efficiency

(a) Reduce the direct costs of IT resources and management while ensuring system reliability.

(b) Improve the overall efficiency of the continuous integration and continuous delivery (CI/CD) process.

CloudFlow and Function Compute Boost Complex Data Processing Efficiency

Themepica Technology is a data technology company born on the wave of cloud-native technologies. Since its establishment, the company has decided to leverage cloud-native technologies to improve the overall quality and cost-efficiency of IT R&D.

The challenges faced by Themepica Technology in its quest for higher quality and efficiency mainly revolve around data processing procedures. Therefore, in addition to using regular CI/CD efficiency tools such as Apsara DevOps and containerized deployment, Themepica Technology chose two additional products, CloudFlow and Function Compute, after team evaluation. The goal is to use CloudFlow to manage complex data processes and to use Function Compute to handle the complex business logic at certain nodes in a CloudFlow workflow. The combination of the two products also readily meets the need for auto scaling.
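As a rough sketch of how the two products fit together, a CloudFlow flow definition (written in its YAML-based flow definition language) can chain Function Compute tasks and attach a retry policy to each step. Everything below, including the service and function names, region, account ID, and the exact retry fields, is an illustrative assumption and should be checked against the current CloudFlow documentation:

```yaml
version: v1
type: flow
steps:
  - type: task
    name: enrichNewsItem
    # Illustrative Function Compute resource; real paths depend on the account.
    resourceArn: acs:fc:cn-hangzhou:123456789:services/news/functions/enrich
    retry:
      - errors: [FC.ResourceThrottled, FC.InternalServerError]
        intervalSeconds: 3
        maxAttempts: 3
        multiplier: 2
  - type: task
    name: writeToDataPool
    resourceArn: acs:fc:cn-hangzhou:123456789:services/news/functions/persist
```

Keeping orchestration and retries in the flow definition is what lets each node remain a small stateless function.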

[Figure: Data flowchart]

Practice has proven that using CloudFlow to develop common workflows nearly halves the workload compared to using a mainstream Java application framework. In addition, the elimination of the release step improves the efficiency of online debugging. After a period of adaptation, the efficiency of web console-based tracking and debugging is also greatly improved.

After six months of experience with CloudFlow and Function Compute, Themepica Technology has developed nearly 20 workflows, which have called dozens of functions and run hundreds of thousands of times. Even with only one engineer responsible for workflows, Themepica Technology still manages to release a new workflow about every two weeks. Except in rare cases where online tracking and debugging are required, engineers typically do not need to monitor the running status of released workflows, thus achieving "set-it-and-forget-it" deployment.

Outlook

As a data-centric startup in the era of large models, Themepica Technology will explore deeper into the potential of combining data platforms with large models and, based on the innovative infrastructure of Alibaba Cloud, provide data products with more powerful capabilities and faster iterations to end customers.

Alibaba Cloud Serverless