How to approach CTR estimation
Background
Alibaba's targeted advertising team first observed that users have diverse interests and that only some of those interests affect a user's behavior toward a specific product, so the team proposed the DIN[1] network. To extract an abstract representation of user interest and capture how that interest evolves, we then proposed the DIEN[2] model. Both DIN and DIEN are constrained by the performance pressure of the online system, so the user behavior sequence they model is only 50 long. Longer and richer behavior data is more informative, but it also puts more pressure on the online system. We therefore propose a solution based on algorithm and system co-design.
On the algorithm side, we borrowed the idea of Memory Network and proposed a new generation of CTR prediction model, MIMN (Multi-channel user Interest Memory Network). MIMN reads the user's ultra-long behavior sequence, extracts and summarizes the user's diverse interests, and stores them in the memory network. On the system side, we designed an independent UIC module that is responsible for user interest calculation and decouples it from the ad request. The UIC+MIMN framework has achieved remarkable results in Alibaba's targeted display advertising scenarios.
Mining users' potential interests from historical behavior data has become an important part of CTR prediction modeling, and new interest modeling algorithms keep emerging. They have been validated in offline experiments, but most face huge challenges in real industrial deployment. Most industrial services respond in real time. The click-through rate estimation service, for example, handles highly concurrent requests and must respond within a very short time. When faced with long user behavior sequences, the online inference modules of existing interest modeling algorithms all come under latency and storage pressure.
The user sequence length that mainstream industrial CTR estimation techniques can model is generally within 100. In the Alibaba e-commerce scenario, however, user behavior is very rich: the average length of a user's behavior sequence over just 60 days exceeds 1,000. We measured the average length of user behavior over different numbers of days and tested the offline performance of the CTR prediction model with behavior sequences of different lengths, as shown in Figure 1. With a simple DNN model, using behavior sequences of length 1,000 instead of length 100 yields an AUC improvement of 0.6%. An offline improvement of 0.6% is highly significant for the online business.
In order to solve the above challenges, this paper designs a new modeling scheme from the perspective of algorithm and system co-design.
Algorithm side: We borrowed the idea of Memory Network and proposed a new generation of CTR prediction model, MIMN (Multi-channel user Interest Memory Network). MIMN reads the user's ultra-long behavior sequence, extracts and summarizes the user's diverse interests, and stores them in the memory network. In addition, MIMN adds memory utilization regularization, which effectively improves the utilization of memory storage, and an interest induction structure, which captures the evolution of a user's different interest tracks.
System side: We designed an independent UIC module to be responsible for user interest calculation. The update calculation of the UIC module is triggered by the user's real-time new behavior, and is performed asynchronously with the calculation of the estimated click rate of the advertisement.
The MIMN+UIC co-design proposed in this paper removes the bottleneck that behavior sequence length places on user interest modeling. In Alibaba's display advertising scenario, we model ultra-long user behavior sequences of length 1,000 and above. This technology has been deployed in the production system and has brought a significant online improvement.
System side - real-time CTR estimation system
Figure 2 (A) shows the current real-time prediction (RTP) system framework of Alibaba's display advertising business. Under strict time constraints, the CTR estimation system must return the estimated click probability for every ad in the candidate set.
In industrial e-commerce recommendation, user behavior accounts for a large share of storage; in our system, 90% of storage is user behavior. Introducing longer behavior sequences consumes even more. To keep latency low and throughput high, user behavior data is kept in a distributed storage system (such as Alibaba's TAIR), but such a system is expensive and cannot absorb massive amounts of data. Continuing with the sequence modeling approach therefore becomes harder as sequences grow. When we used the DIEN model structure on user behavior sequences of length 1,000, latency reached 200 ms at 500 QPS, which is unacceptable for Alibaba's display advertising business. Directly applying existing systems and models to long user behavior sequences is therefore not feasible.
To address these challenges, on the system side we design an independent UIC module that handles user interest calculation. Figure 2 (B) shows the RTP system redesigned around the UIC module. The difference between Figure 2 (A) and Figure 2 (B) lies in the user interest calculation. In Figure 2 (B), the UIC service maintains the latest representation of each user's interest state. That state is updated when the user produces new behavior, not when a click-through rate estimation request arrives, so interest inference can be completed before the real-time click-through rate calculation. As a result, the UIC module adds no latency to the CTR estimation and scoring service.
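The decoupling of interest calculation from the ad request can be sketched roughly as follows. This is a minimal illustration, not the production design: the dictionary standing in for TAIR, the function names, the vector size, and the moving-average update (which stands in for the real interest model) are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical stand-in for a key-value store such as TAIR: user_id -> interest vector.
interest_store: Dict[str, List[float]] = {}

DIM = 4      # illustrative interest-vector size (assumption)
DECAY = 0.9  # illustrative decay for the toy update rule (assumption)


def on_user_behavior(user_id: str, behavior_embedding: List[float]) -> None:
    """UIC path: triggered by a new behavior event, independent of ad requests."""
    state = interest_store.get(user_id, [0.0] * DIM)
    # Toy incremental update (exponential moving average), standing in for the model.
    interest_store[user_id] = [
        DECAY * s + (1.0 - DECAY) * b for s, b in zip(state, behavior_embedding)
    ]


def on_ad_request(user_id: str, ad_embedding: List[float]) -> float:
    """Scoring path: only reads the precomputed state; no sequence model runs here."""
    state = interest_store.get(user_id, [0.0] * DIM)
    # Toy score: dot product between the stored interest and the ad embedding.
    return sum(s * a for s, a in zip(state, ad_embedding))
```

The point of the split is that the cost of `on_user_behavior` grows with the rate of behavior events, while `on_ad_request` stays a single lookup plus scoring regardless of how long the user's history is.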
Algorithm side - Multi-channel user Interest Memory Network
Modeling long sequence data is a well-known algorithmic challenge. Because simple recurrent networks (RNN, GRU, LSTM) struggle to model long sequences, attention structures were introduced to strengthen the representation of long sequence data. In actual serving, however, an RNN+Attention structure needs to store all of the historical behavior data, which puts great storage pressure on the online system.
We borrowed the memory modeling idea of the Neural Turing Machine (NTM) and proposed the MIMN model. Because of how the memory is structured, MIMN can be updated incrementally; it is implemented in the UIC module and is friendly to online real-time services. Although UIC stores abstract vectors in place of raw user behaviors, the memory size is still limited by storage pressure. We therefore designed memory utilization regularization, which effectively improves the utilization of memory storage, and introduced an interest induction structure, which captures the evolution of a user's different interest tracks.
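To make the incremental property concrete, here is a simplified sketch of NTM-style memory addressing, reading, and writing. It follows the standard NTM erase-and-add write rule, but omits the learned controller and key strength; the function names and the way keys, erase vectors, and add vectors are supplied are assumptions for illustration, not MIMN's exact implementation.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def address(memory: List[Vector], key: Vector) -> Vector:
    """Content-based addressing: softmax over key/slot cosine similarities."""
    return softmax([cosine(key, slot) for slot in memory])


def read(memory: List[Vector], weights: Vector) -> Vector:
    """Read vector: weighted sum of the memory slots."""
    dim = len(memory[0])
    return [sum(w * slot[d] for w, slot in zip(weights, memory)) for d in range(dim)]


def write(memory: List[Vector], weights: Vector, erase: Vector, add: Vector) -> List[Vector]:
    """NTM-style write: M(i) <- M(i) * (1 - w(i) * erase) + w(i) * add."""
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(slot, erase, add)]
        for w, slot in zip(weights, memory)
    ]
```

Each new behavior touches only the fixed-size memory, so the stored state does not grow with the length of the behavior sequence.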
To improve memory utilization, we propose memory utilization regularization, which constrains the variance of write weights across memory slots.
Let g_t denote the sum of the write weights from time 1 to time t, where each term is the re-balanced write weight at time c.
M is the number of memory slots, and the regularization loss reduces the variance of utilization across the different memory slots.
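The idea of penalizing uneven slot usage can be illustrated with a small sketch. It accumulates the write weights per slot over time and penalizes their squared deviation from the mean. The squared-deviation form, the `lam` coefficient, and the function names are assumptions chosen for the example; the original formula images are not available here.

```python
from typing import List


def accumulate_write_weights(write_weights: List[List[float]]) -> List[float]:
    """Sum per-step write weights over time, giving one total per memory slot."""
    num_slots = len(write_weights[0])
    totals = [0.0] * num_slots
    for step in write_weights:
        for i, w in enumerate(step):
            totals[i] += w
    return totals


def utilization_penalty(write_weights: List[List[float]], lam: float = 1.0) -> float:
    """Variance-style penalty: large when a few slots receive most of the writes."""
    totals = accumulate_write_weights(write_weights)
    mean = sum(totals) / len(totals)
    return lam * sum((t - mean) ** 2 for t in totals)
```

With two slots, writing evenly at every step gives a penalty of zero, while always writing to the same slot gives a positive penalty, which is the behavior the regularization is meant to discourage.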
Memory in NTM is usually used to store raw information, so higher-order information is not captured. To capture user interest better, MIMN adds a Memory Induction Unit (MIU) that models the evolution of user interest. In MIU, each memory slot is treated as one user interest track. At time t, MIU selects K interest tracks to evolve, and each selected track is updated with a GRU.
The GRU for track i takes as input M_t(i), the original interest memory record of that slot, and e_t, the behavior embedding vector. Unlike attention structures, which need the candidate advertisement to extract the relevant user interests, MIMN does not depend on the advertisement; it uses additional memory to store and mine interests. Because users have diverse interests and the memory-based framework updates them incrementally, the length of the behavior sequence that can be modeled is no longer limited.
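A rough sketch of one MIU step is shown below. It is a toy under stated assumptions: the GRU cell has no bias terms and random weights, the K tracks are chosen by the largest read weights, one cell is shared by all tracks, and the GRU input is the concatenation of the slot's memory record and the behavior embedding. None of these details are confirmed by the text above.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: List[Vector], v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class GRUCell:
    """Minimal GRU cell without biases (illustrative only)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        cols = input_dim + hidden_dim

        def mat() -> List[Vector]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_dim)]

        self.w_z, self.w_r, self.w_h = mat(), mat(), mat()

    def step(self, x: Vector, h: Vector) -> Vector:
        xh = x + h
        z = [sigmoid(v) for v in matvec(self.w_z, xh)]  # update gate
        r = [sigmoid(v) for v in matvec(self.w_r, xh)]  # reset gate
        cand = [math.tanh(v) for v in matvec(self.w_h, x + [ri * hi for ri, hi in zip(r, h)])]
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def miu_step(states: List[Vector], memory: List[Vector], read_weights: Vector,
             behavior: Vector, cell: GRUCell, k: int) -> List[Vector]:
    """Evolve only the k tracks with the largest read weights; leave the rest as-is."""
    top = sorted(range(len(read_weights)), key=lambda i: read_weights[i], reverse=True)[:k]
    new_states = [list(s) for s in states]
    for i in top:
        new_states[i] = cell.step(memory[i] + behavior, states[i])
    return new_states
```

Because only K tracks are updated per behavior, the per-event cost stays bounded even when the memory has many slots.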
The online implementation of MIMN is shown in Figure 3. The calculation of MIU and NTM is implemented in the UIC server. When new user behavior arrives, UIC will incrementally calculate user interest and update it to TAIR. After the ad request arrives, the user's interest expression will be taken directly from TAIR for CTR estimation.
Experiments
We conducted detailed experiments on the public Amazon (Books) and Taobao datasets, as well as on a production dataset from Alimama's targeted display advertising, and verified the effectiveness of the UIC&MIMN co-design. The data scale used is as follows:
Public dataset experiment:
We run experiments on the Books category of the Amazon dataset. The training set and test set are split randomly by user, so users in the test set do not appear in the training set. Each user's reviews are sorted by time, and the first T-1 reviews are used to predict whether the T-th review will happen. The Taobao dataset is processed the same way, using the first T-1 clicks to predict the user's T-th click. The experimental results are as follows:
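The preprocessing described above might look like the following sketch. The function names, the `test_ratio` value, and the `(timestamp, item_id)` event format are assumptions for illustration; how negative examples are constructed is not covered by the text and is left out.

```python
import random
from typing import Iterable, List, Tuple


def split_users(user_ids: Iterable[str], test_ratio: float = 0.2,
                seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split users so that no test user appears in the training set."""
    ids = sorted(set(user_ids))
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_ratio)
    return ids[n_test:], ids[:n_test]


def build_sample(events: List[Tuple[int, str]]) -> Tuple[List[str], str]:
    """Sort one user's (timestamp, item_id) events by time and
    return (first T-1 items as history, T-th item as prediction target)."""
    ordered = [item for _, item in sorted(events)]
    return ordered[:-1], ordered[-1]
```

For example, `build_sample([(3, "z"), (1, "x"), (2, "y")])` yields the history `["x", "y"]` and the target `"z"`.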
For all models, we use the Adam optimization method, with an initial learning rate of 0.001. The embedding dim is set to 16.
Production dataset experiments:
For the production dataset, we use 49 days of ad impression and click samples as the training set and the following day as the test set. All models except MIMN use the user's previous 14 days of historical behavior as the sequence modeling input. MIMN uses the user's behavior data from the previous 60 days, truncated to a length of 1,000.
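The windowing and truncation step could be sketched as below. The function name and the `(timestamp_seconds, item_id)` event format are hypothetical, and keeping the most recent behaviors when truncating is an assumption; the text does not say which end of the sequence is dropped.

```python
from typing import List, Tuple

SECONDS_PER_DAY = 86400


def recent_behaviors(events: List[Tuple[int, str]], now: int,
                     window_days: int = 60, max_len: int = 1000) -> List[str]:
    """Keep behaviors from the last `window_days`, in time order,
    truncated to the `max_len` most recent ones (assumed truncation rule)."""
    cutoff = now - window_days * SECONDS_PER_DAY
    in_window = sorted(e for e in events if cutoff <= e[0] <= now)
    return [item for _, item in in_window[-max_len:]]
```

Calling it with `window_days=14` would give the shorter input described for the other models, although their exact length limit is not stated here.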
On the production dataset we compare only with DIEN, the best model currently in production; MIMN improves on it by 1% offline.
Production:
We deployed the MIMN&UIC architecture in Alibaba's display advertising business. Compared with DIEN, the best model in our production environment, online CTR increased by 7.5% and RPM increased by 6%. Thanks to the UIC architecture, the complex MIMN structure can be served online, with better online throughput and latency than DIEN. Figure 4:
We also ran into a number of difficulties while launching the model, and summarize the lessons below.
UIC server and RTP server model parameter synchronization problem
The MIMN algorithm has two parts: user interest extraction and CTR estimation. Online, these run in two separate services, UIC and RTP, which raises the problem of keeping model parameters synchronized. The experiments in Table 3 show that a one-day gap between the UIC and RTP server model parameters has no effect on the offline performance of the model. Our system uses incremental training and updates model parameters every hour, which greatly reduces the risk of parameter inconsistency.
Big promotion data impact
Big promotions are common in e-commerce, the best known being Double Eleven. During such events, the data distribution and user behavior differ greatly from normal. We compared the effect of including promotion-period user behavior when characterizing user interest. As Table 3 shows, the offline effect drops by 0.2%.
Initialization strategy
Although UIC computes user interest incrementally, accumulating long-term user behavior this way takes a long time. We therefore set up an initialization mechanism: the interest representation that the model learns from a user's 120 days of behavior is exported and loaded into TAIR as the initial state.
Rollback mechanism
To guard against online failures, such as polluted behavior data or degraded online performance, we set up a checkpoint mechanism that exports each user's interest state to offline storage at midnight every day. After a failure, the most recent offline snapshot is loaded.
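A minimal sketch of such a snapshot-and-restore mechanism follows. The `InterestStore` class, its in-memory dictionaries (standing in for the online store and offline storage), and the ISO date strings used as snapshot keys are all assumptions for illustration.

```python
import copy
from typing import Dict, List


class InterestStore:
    """Toy interest store with daily snapshots (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self.live: Dict[str, List[float]] = {}                   # online state
        self.snapshots: Dict[str, Dict[str, List[float]]] = {}   # day -> saved copy

    def snapshot(self, day: str) -> None:
        """Export the current state, e.g. at midnight; `day` is an ISO date string."""
        self.snapshots[day] = copy.deepcopy(self.live)

    def rollback(self) -> str:
        """Restore the most recent snapshot and return its day.
        ISO date strings sort chronologically, so max() picks the latest."""
        latest = max(self.snapshots)
        self.live = copy.deepcopy(self.snapshots[latest])
        return latest
```

Deep copies keep later online updates from leaking into a saved snapshot, so a rollback returns exactly the state that was exported.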