What are the progress and trends of natural language processing technology?
2019 review: five major technological advances and four major applications and products
Looking back, many landmark events took place in both the application and the research of natural language processing technology. We review them along two dimensions: "technical progress" and "applications and products".
In 2019, technological progress was mainly reflected in pre-trained language models, cross-lingual NLP/unsupervised machine translation, knowledge graph development + dialogue technology integration, intelligent human-computer interaction, and platform vendors integrating their AI product lines.
1 Pre-trained language model
Since Google proposed the pre-trained language model BERT at the end of 2018 and achieved better results on multiple NLP tasks, academia and industry have regarded the research and application of pre-trained language models as a major breakthrough in NLP. Solutions have gradually evolved from designing a complex model for each task to the pre-training + fine-tuning paradigm, allowing many NLP applications to enjoy the dividends of models pre-trained on large corpora. By adding a simple task layer and fine-tuning on a small amount of in-domain data, one can obtain a good domain NLP model.
This opened a new chapter in natural language processing.
In 2019, research institutions and companies innovated further on the basis of BERT and proposed their own pre-trained models, such as RoBERTa released by Facebook, XLNet released by CMU, ELECTRA released by Stanford, and the ERNIE model from Baidu. Alibaba's StructBERT, Huawei's NEZHA, and Harbin Institute of Technology together with iFLYTEK have also proposed their own models, constantly refreshing the best results on NLP tasks.
To sum up, this new work mainly falls into two areas: training task design and training algorithm design.
Training task design
Model semantics at a finer granularity, including introducing finer-grained modeling objects and describing semantic associations more precisely.
One example is "whole word masking" or "knowledge masking": in the MLM pre-training task, the whole word is masked instead of a single token, which increases the difficulty of the task so that BERT can learn more semantic information. This has been applied in the Chinese BERT model published jointly by Harbin Institute of Technology and iFLYTEK and in the NEZHA model. Another example is the introduction of more types of inter-sentence relationships, so that semantic relevance can be described more accurately and semantic matching ability improves; this is reflected in the BERT models of the Alibaba and Ant teams.
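The whole-word masking idea can be sketched as follows. This is a minimal illustration, assuming WordPiece-style tokens in which continuation pieces carry a `##` prefix; for Chinese, word boundaries would instead come from a word segmenter. The function name and parameters are hypothetical and not taken from any of the models named above, and the usual 80/10/10 replacement rule of BERT pre-training is omitted for brevity.

```python
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Mask whole words: a word is a token plus its following '##' pieces."""
    rng = random.Random(seed)
    # Group token indices into words using the '##' continuation prefix.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    masked = list(tokens)
    for word in words:
        # One random draw per word, so all of its pieces share the same fate.
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked, words

tokens = ["the", "phil", "##har", "##monic", "played"]
masked, words = whole_word_mask(tokens, mask_prob=0.5)
print(masked)
```

Because the random draw is made once per word rather than once per token, the pieces "phil", "##har", and "##monic" are always masked or kept together, which is what makes the prediction task harder than token-level masking.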
Modeling with new machine learning methods
XLNet, jointly released by CMU and Google, combines the autoencoding and autoregressive schemes; the ELECTRA model proposed by Stanford University introduces an adversarial-style mechanism for better MLM learning. The SpanBERT model jointly released by the University of Washington and Facebook introduces a span prediction task. These schemes apply more learning methods to model the connections between words, thus improving model performance.
Training Algorithm Design
To make models easier to use, reduce the number of parameters or the complexity of the model. One example is ALBERT released by Google, which factorizes the vocabulary embedding matrix and shares parameters across layers.
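The effect of factorizing the embedding matrix can be checked with simple arithmetic. The sketch below assumes a vocabulary of size V, hidden size H, and a smaller embedding size E: a direct V x H embedding is replaced by a V x E lookup followed by an E x H projection. The sizes used are illustrative assumptions, not figures from the source.

```python
def embedding_params(vocab_size, hidden_size, embed_size=None):
    """Count embedding parameters, with or without factorization."""
    if embed_size is None:
        # Direct embedding: one hidden-size vector per vocabulary entry.
        return vocab_size * hidden_size
    # Factorized: small lookup table plus a projection up to the hidden size.
    return vocab_size * embed_size + embed_size * hidden_size

direct = embedding_params(30000, 768)
factorized = embedding_params(30000, 768, embed_size=128)
print(direct, factorized)  # 23040000 3938304
```

With these assumed sizes the embedding parameters shrink by roughly a factor of six; the saving grows as the hidden size grows relative to the embedding size.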
Optimizations to improve training speed
These include mixed-precision training, which uses FP16 to represent weights, activations, and gradients, and the LAMB optimizer, which adapts the learning rate for each parameter so that training can use a large batch size. These methods greatly increase training speed.
Alibaba's StructBERT model improves language representation by introducing more structured information at the model and task level. It has taken and held the leading position on the GLUE benchmark many times. Through distillation and CPU acceleration, response time (RT) has improved by 10x; the fine-tuned model has brought significant improvements to multiple business scenarios and has been launched on the AliNLP platform.
The pre-trained language model is pre-trained on large-scale unlabeled text, and the resulting word and sentence representations are transferred to a wide range of downstream tasks, including text matching, text classification, text extraction, reading comprehension, and machine question answering. For example, Alibaba's language model won first place in the MS MARCO question answering evaluation and the TREC Deep Learning evaluation.
Downstream tasks can quickly obtain a good solution with few resources, which greatly improves the ability to deploy NLP algorithms in real applications.
2 Cross-lingual NLP/Unsupervised Machine Translation
As an extension of the pre-trained language model, Facebook researchers proposed cross-lingual language model pre-training ("Cross-lingual Language Model Pretraining"). Using unsupervised training on monolingual data alone, or supervised training on parallel corpora, the model effectively learns cross-lingual text representations and significantly improves on the previous best results in tasks such as multilingual classification and unsupervised machine translation.
Following Google's BERT sweeping mainstream NLP tasks in 2018, Facebook released a new cross-lingual pre-trained language model, XLM, in 2019. It shares representations of different languages in a unified embedding space and brings significant quality improvements. In large-scale multilingual neural machine translation, Google, Alibaba, and others have explored training a single model on parallel corpora of dozens or even hundreds of languages at the same time, instead of building a separate model for each language direction. This shares semantic mappings across languages, which not only reduces the number of models but also generally improves translation quality for low-resource languages.
Over the past year, research on multilingual NLP has mainly focused on machine translation (especially unsupervised machine translation), cross-lingual word vectors, multilingual NER, dependency parsing, word alignment, and multilingual dictionary generation.
Since learning/mapping cross-lingual word vectors is a key step, current unsupervised/cross-lingual NLP methods work best between similar languages (such as English/French or English/Spanish); between different language families (such as English/Vietnamese) there is still a lot of room for improvement.
3 Knowledge graph development + dialogue technology integration
With the accumulation of data and rising application requirements for data quality and structure, the knowledge graph has become a hot technology in recent years.
The development of knowledge graph technology in 2019 included the construction and integration of domain knowledge graphs (finance, enterprise, etc.), the construction of standard graph platform capabilities (schema definition + construction + invocation), and the construction of graph application algorithms (graph models based on graph data + rule-based reasoning, etc.). Based on the constructed graph data and capabilities, knowledge graphs have begun to be applied in more business scenarios (content understanding and mining for search and recommendation, financial risk control and decision-making, dialogue understanding and content generation, etc.).
In the direction of combining knowledge graphs with dialogue: in recent years dialogue technology has formed a certain technical framework and business coverage in Q&A and task-based dialogue, and it now needs to address domain scenarios that demand deeper knowledge understanding and more professional answers (financial assistants, etc.).
By drawing on the completeness of domain knowledge and the structural quality of knowledge graphs, dialogue technology can make up for the shortcomings of corpus annotation (intent understanding) and expert configuration (dialogue flow + response generation) in such scenarios, and further improve dialogue coverage and response quality. Dialogue integrated with knowledge graphs will see more real-world scenarios and coverage in 2020.
4 Intelligent human-computer interaction
Natural language understanding and in-depth question-and-answer matching technologies continue to develop in academia and industry, and have been applied on a large scale in global businesses and scenarios. Based on pre-trained language models, performance has been further improved.
Machine reading comprehension has become a low-cost general-purpose technology. Application middle-platform capabilities have been built around scenarios such as encyclopedias, policies and regulations, product detail pages, and manuals, greatly improving integration efficiency. Multi-modal VQA question answering over images and text was first incubated in industry, and answering questions by understanding the long images on product detail pages has become a new competitive advantage.
Dialogue capabilities have developed further, but end-to-end data-driven dialogue state tracking and dialogue policies can still only be explored within a limited range. Task-oriented bots built on dialogue platforms have become the mainstream solution in industrial scenarios.
Multilingual technology enables rapid expansion to new languages. A multilingual language model built on cross-lingual pre-training surpasses Google on the distant language pairs English -> Chinese and English -> Thai, and the time to expand to a new language has been shortened from 2 months last year to 2 weeks.
Dialogue generation technology has begun to make breakthroughs. The introduction of structured knowledge has improved the controllability of generation, and generating selling points has increased the conversion rate of shopping guidance.
5 Platform manufacturers integrate AI product lines
With the development of AI technology, the needs of AI applications, and the maturity of AI frameworks (TensorFlow, PyTorch, etc.), AI capabilities have gradually been standardized into a series of AI platform products for enterprises and developers, providing AI application support with a lower threshold and higher efficiency.
On conversational platforms, Google released the Assistant dialogue assistant in 2016. In recent years it has released Google Home (now integrated into the Nest smart home brand) and the Duplex voice phone assistant, and acquired the API.AI dialogue development platform. This year Google has largely integrated these dialogue product lines, covering both the platform and the devices for dialogue and forming a complete dialogue product line.
On AI platforms, Amazon released the SageMaker machine learning platform in 2017. This year it further integrated the AI development process around SageMaker, connecting downstream technical frameworks with upstream AI applications and integrating its AI product line. Like Alibaba's machine learning platform PAI, it is positioned as a one-stop machine learning platform for enterprises and developers.
In 2019, applications and products are mainly reflected in the continuous development of machine translation, dialogue systems, multi-round dialogue intelligent services, and intelligent voice applications.
6 Machine translation
The product development of machine translation has continued the previous trend, expanding more language directions in general fields (news) and specific fields (e-commerce, medical care, etc.), supporting richer business scenarios, and continuing to bring commercial value. Alibaba has carried out fruitful explorations in the direction of translation intervention and intelligent generalization, better integrating business knowledge into the neural network translation framework, and greatly improving the translation accuracy of key information in vertical scenarios.
The translation of high-value and highly sensitive content is still inseparable from human effort. Therefore, introducing intelligent algorithms into computer-aided translation (CAT) to achieve human-machine collaborative translation, and new production models such as machine translation post-editing (MTPE), are receiving more and more attention. Alibaba and Tencent have begun to launch products for automatic post-editing (APE) and interactive machine translation (IMT), and these have been deployed in real business.
In addition to text translation, more multi-modal translation scenarios have emerged, such as speech translation for simultaneous interpretation at conferences, bilingual subtitles, and experiments with translation hardware (the speeches of Mr. Ma and Xiaoyaozi at Ali's 20th anniversary annual meeting were also displayed in real time as bilingual subtitles).
Image translation combining OCR, machine translation, and image compositing has been applied in Alipay, WeChat, and the Sogou translator. With the rise of live streaming by sellers, there will be more and more scenarios and demand for live video translation. However, given the complex domains, specialized terminology, fast speech, and sometimes noisy backgrounds of live broadcasts, live translation remains a huge challenge for speech recognition and machine translation.
7 Dialogue System
The language coverage of the dialogue system has been further extended. Based on multilingual transfer capability, the dialogue system has been rapidly expanded to French, Arabic, and Taiwanese. It currently supports 11 languages as well as mixed-language understanding of Malay-English and Thai-English, and has greatly improved the resolution rate for Lazada and AE.
The dialogue system supports larger merchants and enterprises, serving more than 50 customers across the group's economy. Dian has expanded the knowledge positioning capabilities of general packages, industry packages, and store packages, and has cumulatively served millions of active merchants and tens of millions of dialogue rounds. Dingding serves 400,000 daily active enterprises through its enterprise intelligent assistant.
The interaction forms of the dialogue system have been further enriched. In live broadcasts, it has moved from passively answering product-related questions to actively holding open dialogue with users, bringing cdau over one million.
VQA and other multi-modal understanding capabilities have been deployed for stores and across the economy, improving the user interaction experience and greatly reducing merchants' configuration costs.
As a typical case, Hotline's voice interaction capability was nominated as one of the top ten breakthrough technologies of 2019 by MIT Technology Review.
8 Multi-round dialogue intelligent service
Multi-round interaction plays an important role in resolving ambiguous user questions and improving user experience in intelligent service scenarios (customer service robots). Ambiguous questions are incomplete problem descriptions, such as "how to activate", which does not specify which business is meant. This type of question accounts for 30% of all questions received by customer service robots.
The Ant intelligent service team designed a tag-based multi-round interaction scheme. First, tags are mined offline and reviewed. The tags include business tags (Huabei, reserve funds...) and appeal tags (how to open, how to repay...). The system then clarifies the user's question by presenting the user with a list of tags.
Existing clarification methods mainly recommend complete clarifying questions directly, but what makes a good clarifying question is still not well defined. The Ant team designed a reinforcement learning solution that recommends a list of tags to clarify the question. The tag recommendation as a whole is a sequential decision-making process: after the user clicks a tag, the clicked tag together with the original user question is used as the clarified question.
The optimization goal is to maximize the coverage of potential clarified questions by the whole tag list while keeping an effective partition of the set of potential clarified questions across the different tags. Accordingly, the reward in the reinforcement learning process is designed around information gain.
After multi-round interaction based on the reinforcement learning method went online, 33% of the ambiguous questions in the Ant customer service robot scenario were resolved, and the rate of transfer to human agents across the robot's overall scenarios dropped by 1.2%.
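One way to picture an information-gain reward is as the drop in uncertainty about which clarified question the user means once a tag list is shown. The sketch below is only an illustration under simplifying assumptions: the source does not give the actual reward formula, each candidate question is assumed to belong to exactly one tag, and all names (`information_gain`, `tag_of_question`, the sample questions) are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(question_probs, tag_of_question, tag_list):
    """Entropy over candidate questions before vs. after the user picks a tag.

    Questions whose tag is not shown fall into a single 'other' bucket,
    so an uncovered question contributes no reduction in uncertainty.
    """
    buckets = {}
    for question, p in question_probs.items():
        tag = tag_of_question[question]
        key = tag if tag in tag_list else None
        buckets.setdefault(key, []).append(p)
    before = entropy(question_probs.values())
    after = 0.0
    for probs in buckets.values():
        mass = sum(probs)
        if mass > 0:
            # Expected remaining entropy, weighted by the bucket's probability.
            after += mass * entropy([p / mass for p in probs])
    return before - after

probs = {"open Huabei": 0.25, "repay Huabei": 0.25,
         "open reserve fund": 0.25, "repay reserve fund": 0.25}
tags = {"open Huabei": "Huabei", "repay Huabei": "Huabei",
        "open reserve fund": "reserve fund", "repay reserve fund": "reserve fund"}
print(information_gain(probs, tags, ["Huabei", "reserve fund"]))  # 1.0
```

In this toy setup, showing both business tags removes one of the two bits of uncertainty, while an empty tag list yields zero gain, which matches the intuition that a good list both covers the likely questions and splits them evenly.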
9 Man-machine dialogue builds a new interaction portal
The scene-driven personalized multi-round dialogue technology boosts the expansion of man-machine dialogue scenarios, while the contextual semantic understanding technology integrating speech and semantics continuously improves the completion rate of multi-round dialogue.
In the past year, Tmall Genie has expanded its human-computer dialogue capabilities to complex interaction scenarios such as the Erha phone assistant, voice shopping, and newcomer guidance, and even set a record of 1 million voice shopping orders during Double Eleven.
Tmall Genie launched the anti-harassment phone assistant "Erha" on March 15 (315) last year, opening a new human-computer dialogue scenario: conducting the conversation on the user's behalf. The "Erha" scenario is open multi-round dialogue in a vertical domain; the purpose is to identify the caller's intent through dialogue and obtain the necessary information in place of the user. In "Erha", we proposed a machine reading comprehension technique based on multi-round dialogue context to understand the intent and key information of the call. Based on this understanding of the call content, we built a Transformer-based dialogue policy model to select strategies and generate responses. For the "Erha" scenario, we propose using the Turing test pass rate to measure dialogue quality: if the caller does not realize during the entire conversation that they are talking to a machine, "Erha" is considered to have passed the Turing test. "Erha" currently has a Turing test pass rate of 87%, which effectively helps users deal with unfamiliar calls and saves them time.
Completing complex tasks through man-machine dialogue, such as ordering coffee or shopping, often requires multiple dialogue turns between the machine and the user, and the responses need to be fluent. For example, in the voice shopping scenario, Tmall Genie acts as a cross-industry intelligent shopping guide that absorbs the sales experience of shopping guides in various industries. When users shop by voice, it aims at the final transaction and, like a salesperson in a shopping mall, actively guides the purchase through multiple rounds of dialogue, digging into users' shopping needs and making accurate recommendations based on user profiles. For different users, Tmall Genie can also adopt the dialogue style that suits them best, achieving personalized multi-round dialogue.
Completing a multi-round dialogue depends on completing a series of single-round interactions. If the overall task completion rate were simply the product of the single-round completion rates, the completion rate of multi-round dialogue would be hard to improve. The key to breaking this simple product relationship is to make full use of contextual information when understanding each round of dialogue.
On Tmall Genie, we explored context-aware speech and semantic understanding. First, in speech decoding, we build the entities mentioned earlier in the multi-round dialogue into a memory and use an attention mechanism to let the decoder network perceive this dialogue context, which significantly improves speech recognition accuracy in multi-round dialogue scenarios. Then, in semantic understanding, we created an end-to-end context inheritance model with cross-round attention to recover the dialogue scene more efficiently. As a result, the error rate of online multi-round dialogues has been reduced by 58.5%, effectively supporting the expansion of complex multi-round dialogue scenarios.
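The product relationship is easy to see with a small calculation. The sketch below assumes each round succeeds independently of the others, which is exactly the simplification the text wants to break; the 90% per-round rate and the five rounds are illustrative numbers, not figures from the source.

```python
import math

def multi_round_completion(single_round_rates):
    """Overall completion rate if rounds succeed independently (simple product)."""
    return math.prod(single_round_rates)

# Illustrative only: five rounds, each completed 90% of the time.
overall = multi_round_completion([0.9] * 5)
print(f"{overall:.3f}")  # 0.590
```

Even with a strong 90% single-round rate, a five-round task completes only about 59% of the time under this assumption, which is why using context to make later rounds easier matters so much.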
10 Intelligent voice applications continue to develop
Smart speakers: in recent years nearly all the big players at home and abroad have entered the market one after another (Amazon Alexa, Google Home/Nest, Tmall Genie), and the competitive landscape took shape in 2019. Shipments are still increasing, but at a slower rate.
Smart speakers still focus on software services such as music playback, but further application innovation still relies on the further popularization of smart home and IoT devices.
Smart voice phone: Google I/O 2018 showcased a demo of the Duplex voice phone assistant. In 2019, smart voice phones began to be applied more widely in real business fields, including telemarketing, finance, and government affairs, to improve service coverage and reduce labor costs.
In 2019, Ant's smart voice phone was also deployed in more financial scenarios such as security (security verification), finance (insurance follow-up calls, micro loan collection), and payment (customer activation).
Intelligent voice applications, whose user scenarios rely heavily on dialogue and voice interaction, promote the development of NLP and speech technology. With the development of technology and products and growing user acceptance, their scale and fields of application will expand further in 2020.
2020 Trends: NLP Further Promotes the Evolution of Artificial Intelligence from Perceptual Intelligence to Cognitive Intelligence
At the start of a new decade, there will be breakthrough changes in intelligent human-computer interaction, multi-modal fusion, NLP solutions built around domain needs, and knowledge graphs combined with deployment scenarios.
1 Intelligent human-computer interaction
Language models will play a more important role in intelligent human-computer interaction and take richer forms. Multilingual language models mixing 100 languages and multi-modal language models fusing image-text and speech-text will emerge, bringing comprehensive improvements in few-shot scenarios across different languages, modalities, and domains.
Multilingual interaction rises from the understanding of different languages to the understanding of different cultures. Through cross-cultural understanding technology, we can go deep into the local culture to achieve authentic dialogue interaction.
The interactive mode centered on online text will be fully transformed into a multi-modal human-computer interaction combining video, image, voice, and text.
Data-driven dialogue state tracking and dialogue strategies will gradually replace rule-based strategies, further evolving the multi-round dialogue technology and bringing a more natural dialogue experience.
Knowledge graphs will be widely integrated into the deep learning models used for question answering and dialogue. By incorporating prior knowledge and reasoning ability, the models will become more transparent, bringing better controllability and interpretability.
Better cold-start capability for dialogue systems with few samples will significantly reduce application construction costs. Dialogue systems will expand from serving large customers to more inclusive, broad support for small and micro businesses across industries, and will go further overseas, so that more users from different countries, languages, and cultures enter the era of intelligent services.
2 Multimodal Fusion
As 5G and edge computing gradually mature and spread, they will bring comprehensive integration of video, image, text, voice, and other modalities. Systems will be able to jointly understand the images, voice, and text that users send over multiple rounds of dialogue, and reply in multi-modal form.
Dialogue system products will fully realize multi-modal interaction. Live broadcast and IoT large-screen interaction will fully apply video + image + text multi-modal technology to bring a rich interactive experience, and smooth full-duplex voice dialogue robots will be widely used, achieving human-like interaction capabilities such as thinking while listening, predicting while listening, and actively interjecting.
In voice interaction scenarios, acoustic and text signals will be used to identify emotional changes in the user's communication, and in IoT interaction scenarios cameras and microphones will enable lifelike, human-like behavior.
3 Construction of NLP solutions combined with domain requirements
In the past, NLP algorithms mostly output general-purpose models in the form of platforms/APIs. Correspondingly, general-purpose NLP algorithm platforms have been established on various clouds (Amazon Comprehend, Microsoft Azure Text Analytics, Google Cloud Natural Language, Ali NLP, Baidu NLP, etc.).
However, in business scenarios, each domain has its own specific requirements and generates its own data. Combining the general model with scenario data for domain-adaptive training yields a domain-customized model that better meets business needs.
4 Combining Knowledge Graphs with Landing Scenarios
Facing the new decade, the two core technologies of NLP and knowledge graphs will be used to build industry knowledge graphs. Through knowledge graphs, machines can mine hidden relationships, gain insight into relationships and logic that cannot be discovered by the "naked eye", and apply them to final business decisions, enabling deployment in deeper business scenarios. In terms of development direction, this can be divided into the following aspects:
Optimize knowledge extraction: combine existing knowledge with NLP capabilities to further improve the understanding of unstructured data, applying pre-trained language models, information extraction, entity linking, and related technologies to extract and convert unstructured and semi-structured data into knowledge in knowledge graph form, and link it with the structured knowledge already in the graph.
Accumulate industry knowledge: implementing industry knowledge graph solutions in practice involves many challenges. Building an industry knowledge graph requires data accumulation and data understanding grounded in business scenarios, and building and accumulating industry knowledge graphs will be a core competitive advantage in the era of cognitive intelligence. In industry data construction, the requirements on knowledge accuracy are very high, and entities usually need to be more numerous and industry-relevant. Fusing multi-source heterogeneous data requires abstract modeling of various types of data based on a dynamically changing "concept-entity-attribute-relationship" data model.
Intelligent and trustworthy knowledge reasoning: reason from known knowledge, use an understanding of industry events to drive the propagation of reasoning, and combine industry rule logic with deep models, so as to bring a more intelligent and personalized experience to business reasoning and decision support.
The above is our review of NLP technology over the past year and our thoughts on this year's trends. As one team's view, it inevitably contains omissions and overgeneralizations. We offer it as a starting point, hoping to draw out more thinking and corrections from others. Bill Gates once said, "Language understanding is the jewel in the crown of artificial intelligence". Reaching such a height requires breakthroughs in both technology and applications. At the start of the next decade, we look forward to making NLP technology develop more rapidly, enriching application scenarios, and promoting the development of cognitive intelligence.
7 Dialogue System
The language coverage of the dialogue system has been further improved. Based on the multilingual migration ability, the dialogue system of French, Arabic, and Taiwanese has been rapidly expanded. Currently, it supports 11 languages and the mixed language understanding of Malay-English and Thai-English. Lazada and AE have greatly improved the resolution rate.
The dialogue system supports larger merchants and enterprises, and supports more than 50+ group economy customers.Has expanded the knowledge positioning capabilities of general packages, industry packages, and store packages, and has accumulatively carried millions of active merchants. Tens of millions of dialogue rounds. Dingding carries 40W daily active enterprises based on the enterprise intelligent assistant.
The interactive form of the dialogue system has been further enriched. The live broadcast of has realized the transformation from passive answering of product-related questions to active open dialogue with users, bringing cdau over one million.
VQA and other multi-modal understanding capabilities have landed in the store and the economy, which improves the user interaction experience and greatly reduces the configuration cost of the merchant.
As a typical case, Hotline's voice interaction capability was nominated as one of the top ten breakthrough technologies of the 2019 MIT Technology Reviewer.
8 rounds of dialogue intelligent service
Multi-round interaction plays an important role in solving user ambiguity problems and improving user experience in intelligent service scenarios (customer service robots). Fuzzy questions refer to incomplete user problem descriptions, such as "how to activate", which does not specify which business it is. This type of question accounts for 30% of the total number of questions asked by customer service robots.
The ant intelligence service team has designed a tag-based multi-round interaction scheme. First, the tags are mined offline and reviewed. The tags include business tags (huabei, reserve funds...) and appeal tags (how to open, how to repay...) , to clarify the user's question by asking the user back for a list of tags.
Existing problem clarification methods mainly directly recommend a complete solution to clarify problems, but the definition of what is a good clarification problem is still unclear. The ant team designed a solution based on reinforcement learning to recommend a list of tags to clarify problems. The entire tag recommendation is a The process of sequential decision-making, after the user clicks on the label, we will use the clicked label together with the original user question as the clarified question.
The goal of the entire optimization is that the goal is to maximize the coverage of the entire label list for potential clarification questions while maintaining an efficient division of the set of potential clarification questions by different labels. Therefore, in the reinforcement learning process, the reward based on information gain is designed accordingly (Reward).
After the reinforcement-learning-based multi-round interaction went online, 33% of the ambiguous questions in the Ant customer service robot scenario were resolved, and the rate of transfer from the robot to human agents across its comprehensive scenarios dropped by 1.2%.
9 Man-machine dialogue builds a new interaction portal
The scene-driven personalized multi-round dialogue technology boosts the expansion of man-machine dialogue scenarios, while the contextual semantic understanding technology integrating speech and semantics continuously improves the completion rate of multi-round dialogue.
In the past year, Tmall Genie has expanded its human-computer dialogue capabilities to the Erha phone assistant, voice shopping, newcomer guidance and other complex interaction scenarios, and even set a record of 1 million voice shopping orders during the Double Eleven period.
Tmall Genie launched the anti-harassment phone assistant "Erha" on March 15 ("315") last year, opening up a new human-computer dialogue scenario: completing a conversation on the user's behalf. The "Erha" scenario is an open multi-round dialogue in a vertical field, whose purpose is to identify the caller's intention and obtain the necessary information in place of the user. In "Erha", we proposed a machine reading comprehension technique based on multi-round dialogue context to understand the intention and key information of the call; building on that understanding, we built a Transformer-based dialogue strategy model to select strategies and generate dialogue. For this scenario, we propose using the Turing test pass rate to measure dialogue quality: if the caller does not realize at any point in the conversation that a machine is speaking, "Erha" is considered to have passed the Turing test. "Erha" currently has a Turing test pass rate of 87%, which effectively helps users handle unfamiliar calls and saves them time.
Completing complex tasks through man-machine dialogue, such as ordering coffee or shopping, often requires multiple dialogue interactions between the machine and the user, and the responses must be fluent. For example, in the voice shopping scenario, Tmall Genie acts as a cross-industry intelligent shopping guide that has absorbed the sales experience of shopping guides in various industries. When users shop by voice, it aims at the final transaction and, like a salesperson in a shopping mall, proactively guides the purchase through multiple rounds of dialogue, digging into users' shopping needs and making accurate recommendations based on user profiles. For each user, Tmall Genie can adopt the dialogue style that suits them best, achieving personalized multi-round dialogue.
Completing a multi-round dialogue depends on completing a series of single-round interactions. If the overall task completion rate were simply the product of the single-round completion rates, the completion rate of multi-round dialogue would be difficult to improve. The key to breaking this simple product relationship is to make full use of contextual information when understanding each round of dialogue.
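A small worked example shows why the simple product relationship is so punishing. The sketch below assumes each round succeeds independently and that the task fails as soon as any round fails; the 90% per-round figure is made up for illustration and is not a number from the source.

```python
import math

def task_completion_rate(per_round_rates):
    """Overall completion rate when every round must succeed independently."""
    return math.prod(per_round_rates)

# Hypothetical dialogue: five rounds, each completed 90% of the time.
five_rounds = task_completion_rate([0.9] * 5)
print(f"5 rounds at 90% each: {five_rounds:.2f}")
```

Even with a strong single-round rate, the overall rate falls to roughly 0.59 after five rounds, which is why carrying context across rounds matters more than tuning each round in isolation.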
On Tmall Genie, we explored context-aware speech and semantic understanding. First, in the speech decoding process, we build the entity information mentioned earlier in the multi-round dialogue into a memory and use an attention mechanism to let the decoder network perceive this dialogue scene information, which significantly improves speech recognition accuracy in multi-round dialogue scenarios. Then, in the semantic understanding stage, we created an end-to-end context inheritance model with cross-round attention to recover the dialogue scene more efficiently. As a result, the error rate of online multi-round dialogues has been reduced by 58.5%, effectively supporting the expansion of complex multi-round dialogue scenarios.
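The idea of letting a decoder attend over a memory of previously mentioned entities can be sketched as follows. This is not the Tmall Genie model, whose architecture the source does not describe; it swaps in plain scaled dot-product attention as a common choice, and the function name and toy vectors are hypothetical.

```python
import math

def attend_to_memory(query, keys, values):
    """Scaled dot-product attention of one decoder state over an entity memory.

    query:  decoder state vector (list of floats)
    keys:   one key vector per remembered entity
    values: one value vector per remembered entity
    Returns the context vector and the attention weights.
    """
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return context, weights

# Toy memory with two entities from earlier rounds; the query resembles the first.
context, weights = attend_to_memory(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
)
print(weights, context)
```

The context vector is a weighted mix of the remembered entities, so an entity that matches the current decoder state contributes more to what the decoder sees next.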
10 Intelligent voice applications continue to develop
Smart speakers: in recent years, major players at home and abroad have entered the market one after another (Amazon Alexa, Google Home/Nest, Tmall Genie), and by 2019 a competitive landscape had taken shape. Shipment volume is still increasing, but at a slower rate.
Smart speakers still focus on software services such as music playback; further application innovation depends on the wider adoption of smart home and IoT devices.
Smart voice phone: Google I/O 2018 showcased a demo of the Duplex voice phone assistant. In 2019, smart voice phones began to be applied more widely in real business fields, including telemarketing, finance, and government affairs, in order to improve user service coverage and reduce labor costs.
In 2019, Ant's smart voice phone was also deployed in more financial scenarios such as security (security verification), finance (insurance return visits, micro loan collection), and payment (customer activation).
Intelligent voice applications, whose user scenarios rely heavily on dialogue and voice interaction, promote the development of NLP and voice technology. As technology and products mature and user acceptance improves, their scale and fields of application will expand further in 2020.
2020 Trends: NLP Further Promotes the Evolution of Artificial Intelligence from Perceptual Intelligence to Cognitive Intelligence
At the start of the new decade, breakthrough changes are expected in intelligent human-computer interaction, multi-modal fusion, the construction of NLP solutions combined with domain requirements, and knowledge graphs combined with landing scenarios.
1 Intelligent human-computer interaction
Language models will play a more important role in intelligent human-computer interaction and take richer forms: multilingual language models that mix 100 languages, and multimodal language models that fuse image-text and speech-text, will emerge, bringing comprehensive capability improvements in small-sample scenarios across different languages, modalities, and fields.
Multilingual interaction rises from the understanding of different languages to the understanding of different cultures. Through cross-cultural understanding technology, we can go deep into the local culture to achieve authentic dialogue interaction.
The interactive mode centered on online text will be fully transformed into a multi-modal human-computer interaction combining video, image, voice, and text.
Data-driven dialogue state tracking and dialogue strategies will gradually replace rule-based strategies, further evolving the multi-round dialogue technology and bringing a more natural dialogue experience.
Knowledge graphs will be widely integrated into the deep learning models used for question answering and dialogue. By incorporating prior knowledge and reasoning ability, the models will become more of a white box, bringing better controllability and interpretability.
Improved cold-start capability of dialogue systems in small-sample settings will significantly reduce application construction costs. Dialogue systems will expand from serving a large number of customers to more inclusive and extensive support for small and medium-sized businesses in various industries, and will go further overseas, so that more users from different countries, languages and cultures enter the era of intelligent services.
2 Multimodal Fusion
With the gradual maturity and popularization of 5G and edge computing, video, image, text, voice and other modalities will be comprehensively integrated. Multimodal understanding will make it possible to integrate and understand the pictures, voice and text that users send across multiple rounds of dialogue, and to reply in multi-modal form.
Dialogue system products will fully realize multi-modal interaction capabilities. Live broadcast and IoT large-screen interaction will fully apply video + image + text multi-modal technology to bring a rich interactive experience, and smooth full-duplex voice dialogue robots will be widely used, achieving human-like interaction capabilities such as thinking while listening, guessing while listening, and actively cutting in.
In voice interaction scenarios, the acoustic signal and the text signal together will be used to identify emotional changes in the user's communication, and lifelike interaction based on the camera and microphone will be realized in IoT interaction scenarios.
3 Construction of NLP solutions combined with domain requirements
In the past, NLP algorithms mostly output general-purpose models in the form of platforms/APIs. Correspondingly, general-purpose NLP algorithm platforms have been established on various clouds (Amazon Comprehend, Microsoft Azure Text Analytics, Google Cloud Natural Language, Ali NLP, Baidu NLP, etc.).
However, in business settings, each scenario domain has its own specific requirements and generates its own scenario data. Combining a general model with scenario data for domain-adaptive training yields a domain-customized model that better meets business needs.
4 Combining Knowledge Graphs with Landing Scenarios
Facing the new decade, the two core technologies of NLP and knowledge graphs will be used to build industry knowledge graphs. Through knowledge graphs, machines can mine hidden relationships, gain insight into relationships and logic that cannot be discovered by the "naked eye", and apply them to final business decisions, enabling deployment in deeper business scenarios. In terms of development direction, this can be divided into the following aspects:
Optimizing knowledge extraction: combine existing knowledge with NLP capabilities to further improve the understanding of unstructured data, applying pre-trained language models, information extraction, entity linking and related technologies to extract and convert unstructured and semi-structured data into knowledge in knowledge-graph form, and link it with the structured knowledge already in the graph.
Accumulating industry knowledge: deploying industry knowledge graph solutions in practice poses many challenges. Building an industry knowledge graph requires data accumulation and data understanding grounded in business scenarios, and building and accumulating industry knowledge graphs will be a core competitive advantage in the era of cognitive intelligence. In industry data construction, the accuracy requirements for knowledge are very high, and entities usually need to be more numerous and to carry industry significance. To fuse multi-source heterogeneous data, the various types of data must be abstractly modeled on a dynamically changing "concept-entity-attribute-relationship" data model.
Intelligent and credible knowledge reasoning: reason over known knowledge, use an understanding of industry events to drive the propagation of inferences, and combine industry rule logic with deep models for reasoning, bringing a more intelligent and personalized experience to business reasoning and decision support.
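The "concept-entity-attribute-relationship" data model mentioned above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a description of any production graph platform: the class names, the merge-on-insert behavior for multi-source data, and the sample company and product are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    concept: str  # the concept (type) this entity instantiates
    attributes: dict = field(default_factory=dict)

@dataclass(frozen=True)
class Relation:
    head: str       # name of the source entity
    predicate: str  # relationship label
    tail: str       # name of the target entity

class KnowledgeGraph:
    def __init__(self):
        self.entities = {}      # entity name -> Entity
        self.relations = set()  # set of Relation triples

    def add_entity(self, name, concept, **attributes):
        """Insert an entity, or merge new attributes into an existing one.

        Merging by name is a simplification of multi-source data fusion.
        """
        entity = self.entities.setdefault(name, Entity(name, concept))
        entity.attributes.update(attributes)
        return entity

    def add_relation(self, head, predicate, tail):
        self.relations.add(Relation(head, predicate, tail))

    def neighbors(self, name):
        """Return (predicate, tail) pairs for relations leaving an entity."""
        return sorted((r.predicate, r.tail) for r in self.relations if r.head == name)

# Two hypothetical sources describe the same company with different attributes.
graph = KnowledgeGraph()
graph.add_entity("Acme Bank", "Company", industry="finance")
graph.add_entity("Acme Bank", "Company", founded=1990)
graph.add_entity("Acme Loan", "Product")
graph.add_relation("Acme Bank", "offers", "Acme Loan")
print(graph.entities["Acme Bank"].attributes, graph.neighbors("Acme Bank"))
```

Keeping attributes in a plain dictionary lets the schema change without code changes, which is one simple way to accommodate a data model that evolves over time.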
The above is our review of the development of NLP technology over the past year and our thoughts on this year's trends. As one team's view, it inevitably contains omissions and overgeneralizations; we offer it as a modest starting point in the hope of prompting more thinking and corrections from our colleagues. Bill Gates once said, "Language understanding is the jewel in the crown of artificial intelligence". Reaching such a height requires breakthroughs in both technology and applications. Looking ahead to the start of the next decade, we will work to make NLP technology develop more rapidly, enrich its application scenarios, and promote the development of cognitive intelligence.
Looking back on the past, many meaningful landmark events have taken place in the field of application and research of natural language processing technology. We will review from two dimensions of "technical progress", "application and product".
In 2019, technological progress is mainly reflected in pre-trained language models, cross-language NLP/unsupervised machine translation, knowledge map development + dialogue technology integration, intelligent human-computer interaction, and platform manufacturers integrating AI product lines.
1 Pre-trained language model
As Google proposed the pre-training language model BERT at the end of 2018, which achieved better results in multiple NLP tasks, the research and application of pre-training language models was regarded as a major breakthrough in the field of NLP by academia and industry. The solution has gradually evolved from the previous complex model design for each task to the pre-training + fine-tuning paradigm, allowing many NLP applications to enjoy the dividends brought by the large corpus pre-training model. On the basis of adding a simple task layer, combined with a small amount of corpus in your own scene, you can get a good domain NLP model.
So far opened a new chapter in natural language processing.
In 2019, various research institutions and companies have further innovated on the basis of BERT, and have proposed their own pre-training models, such as: RoBERTa released by Facebook, XLNet released by CMU, ELECTRA released by Stanford, and ERNIE model from Baidu. Ali's structBERT model, NEZHA, of Technology and HKUST Xunfei have also proposed their own models, constantly refreshing the best results of NLP tasks.
To sum up, this new work mainly comes from two aspects of training task design and training algorithm.
training task design
Carry out finer semantic granularity modeling, including the introduction of finer-grained modeling objects and finer description of semantic associations.
For example, "whole word Mask" or "Knowledge Masking", the technology masks the whole word instead of a single Token in the MLM pre-training task, thereby increasing the difficulty of the task so that BERT can learn more semantic information, Chinese published jointly by of Technology and HKUST Xunfei The BERT model and the NEZHA model have been applied; another example is the introduction of more types of inter-sentence relationships, so that the semantic relevance can be described more accurately, and then the ability of semantic matching can be improved, which is reflected in the BERT model of Ali and the ant team .
Modeling with new machine learning methods
The XLNet jointly released by CMU and Google uses two schemes, Autoencoder and Auto-regressive; the ELECTRA model proposed by Stanford University introduces a confrontation mechanism for better MLM learning. The SpanBERT model jointly released by the University of Washington and Facebook also introduces the Span prediction task. These schemes apply more learning methods to model the connection between words, thus improving the model performance.
Training Algorithm Design
For the ease of use of the model, reduce the model parameters, or reduce the complexity of the model, including the ALBERT released by Google, which uses the decomposition of the vocabulary embedding matrix and the sharing of the middle layer.
Optimizations to improve training speed
Including mixed-precision training, using FP16 to represent weights, activation functions, and gradients; the LAMB optimizer adjusts the learning rate for each parameter in an adaptive manner, and the model training can use a large Batch Size; these methods are extremely Greatly increased training speed.
Ali's structBERT model improves language representation capabilities by introducing more model and task structured information. On the Gluebench mark, it has been ranked and maintained the leading position many times. Through distillation and CPU acceleration, RT has been improved by 10x, and the finetuned model has brought significant improvements to multiple business scenarios, and the AliNLP platform has been launched.
The pre-trained language model is pre-trained on large-scale unsupervised texts, and the obtained word and sentence representations are transferred to a wide range of downstream tasks, including text matching, text classification, text extraction, reading comprehension, machine question answering and other different scenarios . For example, the Ali language model has won the first place in the MS MARCO question and answer evaluation and TREC Deep Learning evaluation.
Downstream tasks can quickly obtain a good solution with low resources, which greatly improves the application landing ability of NLP algorithms.
2 Cross-lingual NLP/Unsupervised Machine Translation
As an extension of the pre-trained language model, Facebook researchers proposed a cross-language language model pre-training "Cross-lingual Language Model Pretraining", using only unsupervised training of monolingual data and supervised training using parallel corpora , the model effectively learns cross-language text representations, and has significantly improved compared to the previous best results in tasks such as multilingual classification and unsupervised machine learning.
Following Google's pre-trained language model BERT sweeping mainstream NLP tasks in 2018, Facebook released a new cross-language pre-trained language model XLM in 2019, which enables representation sharing of different languages in a unified embedding space and brings Significant quality improvement. In exploring the direction of large-scale, multilingual neural machine translation, Google, Alibaba, etc. have carried out effective exploration, by training a model on the parallel corpus of dozens or even hundreds of languages at the same time, instead of building a model for each language direction separately. Model, realize the sharing of semantic mapping relationship, not only compress the number of models, but also generally improve the translation effect of small languages.
In the past year, the research results of multilingual NLP technology have mainly focused on machine translation (especially unsupervised machine translation), cross-language word vectors, multilingual NER, dependency syntax analysis, word alignment and multilingual dictionary generation.
Since the learning/mapping of cross-language word vectors is a key step, the current unsupervised/cross-language NLP tasks work best between similar languages (such as English/French, English/Spanish, etc.), and in different The effect between language families (such as English/Vietnamese) still has a lot of room for improvement.
3 Knowledge map development + dialogue technology integration
With the accumulation of data volume and the improvement of application requirements for data quality and structure, knowledge graph has become a hot technology in recent years and has begun to attract attention.
The development of knowledge graph technology in 2019, including the construction and integration of domain knowledge graphs (financial, enterprise, etc.), the construction of graph platform standard capabilities (schema definition + construction + call), graph application algorithm construction (based on graph data Graph model + rule reasoning, etc.); and based on the constructed graph data and capabilities, it has begun to be applied in more business scenarios (search and recommendation content understanding and mining, financial risk control and decision-making, dialogue understanding and content generation, etc.).
In the technical direction of the combination of knowledge graph and dialogue, dialogue technology has formed a certain technical framework and business coverage in Q&A and task-based dialogue in recent years, and it is beginning to need to solve some domain scenarios that require higher knowledge understanding + answer professionalism ( Financial Assistant, etc.).
Dialogue technology combines the completeness of domain knowledge + structural quality advantages of knowledge graphs to cover, which can solve the shortcomings of corpus annotation (intention understanding) and expert configuration (dialogue process + response generation) in corresponding scenarios, and further improve dialogue coverage and response quality. Integrating the direction of knowledge graph dialogue, there will be more real-life scenarios and coverage in 2020.
4 Intelligent human-computer interaction
Natural language understanding and in-depth question-and-answer matching technologies continue to develop in academia and industry, and have been applied on a large scale in global businesses and scenarios. Based on pre-trained language models, performance has been further improved.
Machine reading comprehension has become a low-cost general-purpose technology, and application middle-end capabilities have been built around scenarios such as encyclopedias, policies and regulations, product detail pages, and manuals, and access efficiency has been greatly improved. The multi-modal VQA question answering technology combined with pictures and texts is the first to be incubated in the industry, and it has become a new competitiveness to understand the long pictures on the product detail page for question and answer.
Dialog technical capabilities have been further developed, but end-to-end data-driven dialogue state tracking and dialogue strategies can only be explored within a limited range. Task-based robots built on dialogue platforms in industrial scenarios have become mainstream implementation solutions .
Multilingual technology realizes rapid expansion of new languages, builds a multilingual language model based on Cross-Lingual, surpasses Google in long-distance language pairs English -> Chinese, English -> Thai long-distance language pairs, expands a new language from last year 2 months shortened to 2 weeks.
Dialogue generation technology has begun to make breakthroughs. The introduction of structured knowledge has improved the controllability of generation, and the generation of selling points has brought about an increase in the conversion rate of shopping guides.
5 Platform manufacturers integrate AI product lines
With the development of AI technology and the needs of AI applications, and the maturity of AI technology frameworks (Tensorflow, PyTorch, etc.), AI technology capabilities have gradually been standardized into a series of AI platform products, oriented to enterprises and developers, providing lower threshold and higher Efficient AI application support.
Conversational platforms, Google has released the Assistant dialogue assistant since 2016. In recent years, it has released Google Home (now integrated into the Nest smart home brand), Duplex voice phone, and acquired the API.AI dialogue development platform; this year Google has basically Integrating these dialogue product lines basically lays out the existing platform + terminal of dialogue, and forms a whole dialogue product line.
In terms of AI platforms, Amazon has released the SageMaker machine learning platform product since 2017. This year, it further integrated the AI development process based on SageMaker, while opening up the downstream technical framework and upstream AI applications, and integrating the AI product line. Similar to Ali's machine learning platform PAI, it is positioned as a one-stop machine learning platform for enterprises and developers.
In 2019, applications and products are mainly reflected in the continuous development of machine translation, dialogue systems, multi-round dialogue intelligent services, and intelligent voice applications.
6 machine translation
The product development of machine translation has continued the previous trend, expanding more language directions in general fields (news) and specific fields (e-commerce, medical care, etc.), supporting richer business scenarios, and continuing to bring commercial value. Alibaba has carried out fruitful explorations in the direction of translation intervention and intelligent generalization, better integrating business knowledge into the neural network translation framework, and greatly improving the translation accuracy of key information in vertical scenarios.
The translation of high-value and highly sensitive content is still inseparable from manual labor. Therefore, the introduction of intelligent algorithms in computer-aided translation (CAT) to realize human-machine collaborative translation, and new production models such as machine-translated post-editing (MTPE) are also receiving more and more attention. Much attention. Alibaba and Tencent have begun to launch products in automatic post-editing (APE) and interactive translation (IMT), and they have landed in actual business.
In addition to text translation, more multi-modal translation application scenarios have emerged, such as speech translation in simultaneous interpretation of conferences, bilingual subtitles, and attempts on translation machine hardware (the speeches of Mr. Ma and Xiaoyaozi at the 20th anniversary of Ali’s annual meeting were also delivered in real time. displayed in the form of bilingual subtitles).
Image translation combined with OCR, machine translation and picture-combining technology has been applied in Alipay, WeChat, and Sogou translators. With the rise of live streaming by sellers, there will be more and more scenarios and demands for live video translation. However, limited by the complex fields, professional terminology, fast speech rate and sometimes noisy background environment in the live broadcast scene, live translation is also a huge challenge for speech recognition and machine translation.
7 Dialogue System
The language coverage of the dialogue system has been further improved. Based on the multilingual migration ability, the dialogue system of French, Arabic, and Taiwanese has been rapidly expanded. Currently, it supports 11 languages and the mixed language understanding of Malay-English and Thai-English. Lazada and AE have greatly improved the resolution rate.
The dialogue system supports larger merchants and enterprises, and supports more than 50+ group economy customers. Dian has expanded the knowledge positioning capabilities of general packages, industry packages, and store packages, and has accumulatively carried millions of active merchants. Tens of millions of dialogue rounds. Dingding carries 40W daily active enterprises based on the enterprise intelligent assistant.
The interactive form of the dialogue system has been further enriched. The live broadcast of has realized the transformation from passive answering of product-related questions to active open dialogue with users, bringing cdau over one million.
VQA and other multi-modal understanding capabilities have landed in the store and the economy, which improves the user interaction experience and greatly reduces the configuration cost of the merchant.
As a typical case, Hotline's voice interaction capability was nominated as one of the top ten breakthrough technologies of the 2019 MIT Technology Reviewer.
8 rounds of dialogue intelligent service
Multi-round interaction plays an important role in solving user ambiguity problems and improving user experience in intelligent service scenarios (customer service robots). Fuzzy questions refer to incomplete user problem descriptions, such as "how to activate", which does not specify which business it is. This type of question accounts for 30% of the total number of questions asked by customer service robots.
The ant intelligence service team has designed a tag-based multi-round interaction scheme. First, the tags are mined offline and reviewed. The tags include business tags (huabei, reserve funds...) and appeal tags (how to open, how to repay...) , to clarify the user's question by asking the user back for a list of tags.
Existing problem clarification methods mainly directly recommend a complete solution to clarify problems, but the definition of what is a good clarification problem is still unclear. The ant team designed a solution based on reinforcement learning to recommend a list of tags to clarify problems. The entire tag recommendation is a The process of sequential decision-making, after the user clicks on the label, we will use the clicked label together with the original user question as the clarified question.
The goal of the whole optimization is that the goal is to maximize the coverage of the entire label list for potential clarification questions while maintaining an effective division of the set of potential clarification questions by different labels. Therefore, during the reinforcement learning process, the reward based on information gain is designed accordingly (Reward).
After multiple rounds of interaction based on the reinforcement learning method went online, 33% of the ambiguity problems in the Ant customer service robot scene were solved, and the conversion rate of the robot comprehensive scene to manual labor dropped by 1.2%.
9 Man-machine dialogue builds a new interaction portal
The scene-driven personalized multi-round dialogue technology boosts the expansion of man-machine dialogue scenarios, while the contextual semantic understanding technology integrating speech and semantics continuously improves the completion rate of multi-round dialogue.
In the past year, Tmall Genie has expanded its human-computer dialogue capabilities to Erha phone assistants, voice shopping, newcomer guidance and other complex interaction scenarios, and even created a record of 1 million voice shopping orders during the Double Eleven period .
Tmall Genie launched the anti-harassment phone assistant "Erha" on 315 last year, opening a new human-computer dialogue interaction scenario: completing the dialogue as a substitute for the user. The dialogue scenario of "Erha" is an open multi-round dialogue in the vertical field. The purpose is to identify the intention of the call through the dialogue and obtain the necessary information instead of the user. In "Erha", we proposed a machine reading comprehension technology based on multiple rounds of dialogue context to understand the intention and key information of the call; based on the understanding of the content of the call, we built a dialogue strategy model based on Transformer to select strategies and Generate dialogue. Aiming at the conversation scenario of "Erha", we propose to use the passing rate of the Turing test to measure the quality of the dialogue, that is, when the caller does not realize that the machine is talking to him during the entire conversation, it can be considered that "Erha" has passed Turing test. "Erha" currently has a Turing test pass rate of 87%, which effectively helps users deal with unfamiliar calls and saves users time.
Completing complex tasks through man-machine dialogue, such as ordering coffee, shopping, etc., often requires multiple dialogue interactions between the machine and the user. The answers are fluent. For example, in the voice shopping scene, Tmall Genie has the ability to be a cross-industry intelligent shopping guide, absorbing the sales experience of shopping guides in various industries, and when users conduct voice shopping, they aim at the final transaction conversion and take the initiative like a salesperson in a shopping mall. Carry out shopping guidance in the form of multiple rounds of dialogue, dig deep into the shopping needs of users and make accurate recommendations based on user portraits. And for different users, Tmall Genie can adopt the most suitable dialogue method for TA, so as to achieve personalized multiple rounds of dialogue.
The completion of multiple rounds of dialogue is based on the completion of a series of single-round interactions. If the completion rate of the overall task is the product of a simple single-round completion rate, the completion rate of multiple rounds of dialogue will be difficult. promote. The key to breaking the simple product relationship is to make full use of contextual information when understanding each round of dialogue.
On Tmall Genie, we explored context-aware speech and semantic understanding. First, during speech decoding, we store the entity information mentioned earlier in the multi-round dialogue in a memory and use an attention mechanism to let the decoder network perceive this dialogue-scene information, which significantly improves speech recognition accuracy in multi-round dialogue scenarios. Then, at the semantic understanding stage, we created an end-to-end context inheritance model with cross-round attention to recover the dialogue scene more efficiently. As a result, the error rate of online multi-round dialogues has been reduced by 58.5%, which effectively supports the expansion of complex multi-round dialogue scenarios.
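The article does not describe the model in detail. As a rough illustration of the general idea of a decoder attending over an entity memory, here is a minimal sketch of generic scaled dot-product attention. The function name, the toy two-dimensional vectors, and the entity names are all hypothetical and are not the Tmall Genie implementation.

```python
import math
from typing import List, Sequence, Tuple


def attend_to_entities(
    query: Sequence[float],
    memory: Sequence[Tuple[str, Sequence[float]]],
) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one decoder state over entity memory.

    Returns (weights, context): one weight per memory slot (summing to 1)
    and the weighted average of the entity vectors.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, vec)) / math.sqrt(dim)
        for _, vec in memory
    ]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * vec[i] for w, (_, vec) in zip(weights, memory))
        for i in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy vectors standing in for learned entity embeddings (hypothetical).
    memory = [("latte", [1.0, 0.0]), ("umbrella", [0.0, 1.0])]
    weights, context = attend_to_entities([2.0, 0.0], memory)
    for (name, _), w in zip(memory, weights):
        print(name, round(w, 3))
```

In this toy setup, the entity whose vector aligns with the decoder state receives the larger weight, so an entity mentioned in an earlier round can bias how the current utterance is decoded.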
10 Intelligent voice applications continue to develop
Smart speakers: in recent years, nearly all the major players at home and abroad have entered the market one after another (Amazon Alexa, Google Home/Nest, Tmall Genie), and 2019 was a year of open competition. Shipments are still growing, but at a slower rate.
Smart speakers still focus on software services such as music playback; further application innovation depends on the wider adoption of smart home and IoT devices.
Smart voice calls: at Google I/O 2018, Google showcased a demo of its Duplex voice phone assistant. In 2019, smart voice calls began to be applied more widely in real business fields, including telemarketing, finance, and government affairs, in order to improve user service coverage and reduce labor costs.
In 2019, Ant's smart voice calls were also deployed in more financial scenarios, such as security (security verification), finance (insurance return visits, micro-loan collection), and payment (customer activation).
Intelligent voice applications, whose user scenarios rely heavily on dialogue and voice interaction, drive the development of NLP and speech technology. As technology and products mature and user acceptance improves, their scale and range of application will expand further in 2020.
2020 Trends: NLP Further Promotes the Evolution of Artificial Intelligence from Perceptual Intelligence to Cognitive Intelligence
At the start of the new decade, we expect breakthrough changes in intelligent human-computer interaction, multimodal fusion, NLP solutions built around domain needs, and knowledge graphs combined with deployment scenarios.
1 Intelligent human-computer interaction
Language models will play a more important role in intelligent human-computer interaction and take richer forms: multilingual language models covering 100 languages, and multimodal language models that fuse image-text and speech-text, will emerge, bringing comprehensive capability improvements in few-shot scenarios across different languages, modalities, and domains.
Multilingual interaction will rise from understanding different languages to understanding different cultures. Cross-cultural understanding technology will make it possible to engage deeply with local culture and achieve authentic dialogue interaction.
The interaction mode centered on online text will be fully transformed into multimodal human-computer interaction that combines video, image, voice, and text.
Data-driven dialogue state tracking and dialogue strategies will gradually replace rule-based strategies, further evolving multi-round dialogue technology and bringing a more natural dialogue experience.
Knowledge graphs will be widely integrated into deep learning models for question answering and dialogue. By incorporating prior knowledge and reasoning ability, the models will become more of a white box, with better controllability and interpretability.
Improved cold-start capability with few samples will significantly reduce the cost of building applications. Dialogue systems will expand from serving large customers to broader, more inclusive support for small and medium-sized businesses and small merchants across industries, and will go further overseas, so that more users from different countries, languages, and cultures enter the era of intelligent services.
2 Multimodal Fusion
With the gradual maturity and popularization of 5G and edge computing, video, image, text, voice, and other modalities will be comprehensively integrated. Multimodal understanding will make it possible to integrate and understand the pictures, voice, and text that users send over multiple rounds of dialogue, and to reply in multimodal form.
Dialogue system products will fully realize multimodal interaction capabilities. Live broadcast and IoT large-screen interaction will fully apply video + image + text multimodal technology to bring a rich interactive experience, and smooth full-duplex voice dialogue robots will be widely used, achieving human-like interaction capabilities such as thinking while listening, guessing while listening, and proactively taking the floor.
In voice interaction scenarios, acoustic and text signals together will be used to identify emotional changes in the user's communication, and in IoT interaction scenarios the camera and microphone will enable lifelike, human-like interaction.
3 Construction of NLP solutions combined with domain requirements
In the past, NLP algorithms were mostly delivered as general-purpose models through platforms and APIs. Accordingly, general-purpose NLP algorithm platforms have been established on the various clouds (Amazon Comprehend, Microsoft Azure Text Analytics, Google Cloud Natural Language, Ali NLP, Baidu NLP, etc.).
In business settings, however, each scenario domain has its own specific requirements and generates its own scenario data. Combining the general model with scenario data for domain-adaptive training yields a domain-customized model that better meets business needs.
4 Combining Knowledge Graphs with Landing Scenarios
Facing the new decade, the two core technologies of NLP and knowledge graphs will be used to build industry knowledge graphs. Through knowledge graphs, machines can mine hidden relationships, gain insight into relationships and logic that cannot be discovered by the "naked eye", and apply them to final business decisions, enabling deeper business scenarios to be implemented. In terms of development direction, this can be divided into the following aspects:
Optimizing knowledge extraction: combine existing knowledge with NLP capabilities to further improve the understanding of unstructured data. Pre-trained language models, information extraction, entity linking, and related technologies are applied to extract and convert unstructured and semi-structured data into knowledge in knowledge graph form, and to link it with the structured knowledge already in the graph.
Accumulating industry knowledge: implementing an industry knowledge graph solution in practice involves many challenges. Building an industry knowledge graph requires data accumulation and data understanding grounded in business scenarios, and building and accumulating industry knowledge graphs will be a core competitive advantage in the era of cognitive intelligence. In constructing industry data, the accuracy requirements for knowledge are very high, and more entities are usually needed, each carrying industry significance. To fuse multi-source heterogeneous data, the various types of data must be modeled abstractly on a dynamically changing "concept-entity-attribute-relationship" data model.
Intelligent and credible knowledge reasoning: reason over previously known knowledge, use an understanding of industry events to drive the propagation of knowledge reasoning, and combine industry rule logic with deep models for reasoning, bringing a more intelligent and personalized experience to business reasoning and decision support.
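As a rough illustration of the "concept-entity-attribute-relationship" data model mentioned above, the sketch below stores entities with a concept and attributes, and relationships as (head, relation, tail) triples. All class names, fields, and sample data are hypothetical; the article does not specify a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    name: str
    concept: str  # the concept (type) this entity instantiates
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    entities: Dict[str, Entity] = field(default_factory=dict)
    # Relationships stored as (head, relation, tail) triples.
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def add_relation(self, head: str, relation: str, tail: str) -> None:
        # Only link entities that are already known to the graph.
        if head not in self.entities or tail not in self.entities:
            raise KeyError("both entities must exist before linking")
        self.relations.append((head, relation, tail))

    def neighbors(self, name: str) -> List[Tuple[str, str]]:
        """Return (relation, tail) pairs for every triple starting at name."""
        return [(r, t) for h, r, t in self.relations if h == name]


if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add_entity(Entity("Bank A", "Institution", {"region": "East"}))
    kg.add_entity(Entity("Loan X", "Product", {"term": "12 months"}))
    kg.add_relation("Bank A", "offers", "Loan X")
    print(kg.neighbors("Bank A"))
```

Keeping concepts and attributes as plain fields makes it easy to add new concept types as the schema changes. A production graph would more likely sit on a dedicated graph store.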
The above is our review of the development of NLP technology over the past year and our thoughts on this year's trends. As one team's view, it inevitably contains omissions and overgeneralizations; we offer it as a starting point and hope it draws out more thoughts and corrections from our peers. Bill Gates once said, "Language understanding is the jewel in the crown of artificial intelligence". Reaching such a height requires breakthroughs in both technology and applications. At the start of the next decade, we look forward to making NLP technology develop more rapidly, enriching its application scenarios, and promoting the development of cognitive intelligence.
7 Dialogue System
The language coverage of the dialogue system has improved further. Building on multilingual transfer capability, dialogue systems for French, Arabic, and Taiwanese were rolled out rapidly. The system currently supports 11 languages as well as mixed-language understanding of Malay-English and Thai-English, and the resolution rate on Lazada and AE has improved greatly.
The dialogue system supports larger merchants and enterprises, including 50+ group economy customers. It has expanded the knowledge positioning capabilities of general packages, industry packages, and store packages, and has cumulatively served millions of active merchants and tens of millions of dialogue rounds. Dingding serves 400,000 daily active enterprises through its enterprise intelligent assistant.
The interactive forms of the dialogue system have been further enriched. In live broadcast scenarios, it has moved from passively answering product-related questions to actively holding open dialogue with users, bringing cdau to over one million.
VQA and other multimodal understanding capabilities have been deployed in stores and the economy, improving the user interaction experience and greatly reducing merchants' configuration costs.
As a typical case, the hotline voice interaction capability was nominated as one of the ten breakthrough technologies of 2019 by MIT Technology Review.
8 Multi-round dialogue in intelligent service
Multi-round interaction plays an important role in resolving ambiguous user questions and improving user experience in intelligent service scenarios (customer service robots). An ambiguous question is one whose description is incomplete, such as "how to activate", which does not specify the business it refers to. Such questions account for 30% of all questions put to customer service robots.
The Ant intelligent service team designed a tag-based multi-round interaction scheme. Tags are first mined offline and reviewed; they include business tags (Huabei, reserve funds...) and appeal tags (how to open, how to repay...). The system then clarifies the user's question by asking the user to choose from a list of tags.
Existing clarification methods mainly recommend complete candidate questions directly, but what makes a good clarifying question is still not well defined. The Ant team designed a reinforcement learning solution that recommends a list of tags for clarification. Tag recommendation as a whole is a sequential decision-making process: after the user clicks a tag, the clicked tag together with the original user question is used as the clarified question.
The optimization goal is to maximize the tag list's coverage of the potential clarified questions while ensuring that different tags divide the set of potential clarified questions efficiently. Accordingly, the reward in the reinforcement learning process is designed around information gain.
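The article does not give the exact reward formula. As one plausible reading of an information-gain reward, the sketch below scores a tag by how much it reduces the entropy over candidate clarified questions once we learn whether the user's question falls under that tag. The function names, the uniform prior, and the sample questions are assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, Set


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(prior: Dict[str, float], covered: Set[str]) -> float:
    """Expected entropy reduction from learning whether the user's
    intended question is among those covered by a tag."""
    inside = [p for q, p in prior.items() if q in covered]
    outside = [p for q, p in prior.items() if q not in covered]
    p_in, p_out = sum(inside), sum(outside)
    h_after = 0.0
    if p_in > 0:
        h_after += p_in * entropy(p / p_in for p in inside)
    if p_out > 0:
        h_after += p_out * entropy(p / p_out for p in outside)
    return entropy(prior.values()) - h_after


if __name__ == "__main__":
    # Hypothetical candidates for the ambiguous query "how to activate".
    prior = {
        "activate Huabei": 0.25,
        "repay Huabei": 0.25,
        "activate reserve funds": 0.25,
        "repay reserve funds": 0.25,
    }
    huabei_tag = {"activate Huabei", "repay Huabei"}
    print(information_gain(prior, huabei_tag))
    print(information_gain(prior, set(prior)))
```

With four equally likely candidates, a tag that covers exactly two of them yields 1 bit, while a tag that covers all four yields 0 bits. This matches the stated goal of preferring tags that split the candidate set efficiently.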
After the reinforcement-learning-based multi-round interaction went online, 33% of the ambiguous questions in the Ant customer service robot scenario were resolved, and the rate of transfer to human agents across the robot's comprehensive scenarios dropped by 1.2%.
9 Man-machine dialogue builds a new interaction portal
Scenario-driven, personalized multi-round dialogue technology is expanding the range of man-machine dialogue scenarios, while contextual understanding technology that integrates speech and semantics keeps improving the completion rate of multi-round dialogue.
In the past year, Tmall Genie has extended its human-computer dialogue capabilities to complex interaction scenarios such as the Erha phone assistant, voice shopping, and newcomer guidance, and set a record of 1 million voice shopping orders during the Double Eleven period.