By Zhenzi
Who should read this article? If you are interested in upgrading your programming through machine learning, you will not want to miss this article. What will I learn from this article? You can break out of the old ways of thinking and understand the value of machine learning.
After many surveys, conversations, and much reflection, I found that the main reason is that most frontend developers have not changed their thinking and still approach intelligent capabilities from a traditional perspective. Today, I want to describe how we can conceive of frontend intelligence in a new way.
The key question is: Will you or a machine learning model do that? If you will do something, you must consider the issue clearly. If the model will perform the task, you must define the issue clearly. What is the difference between considering a problem clearly and defining a problem clearly? This confuses many people. To put it simply, defining a problem clearly is to specify the problem that you want to solve and the field where the problem occurs. Considering a problem clearly is to specify the problem you want to solve, the field where the problem occurs, and how to solve the problem. In this case, you may have two questions: can the problem be solved, and, if so, how can I solve the problem?
This question is not a scientific one since the specific problem and the extent to which it can be solved are not known. This leads to many derivative questions. First, let's explore them from the macro perspective. My understanding of intelligence comes from my analysis and consideration of pain points in the frontend and programming fields. When I arrived in Hangzhou in September 2018, DaVinci, the predecessor of imgcook.com, had been in development for two years. To complement DaVinci's ability to generate code from designs, we established data standards and a unified delivery system at the underlying layer and set up business platforms, such as Ark and Mingyunshi, at the upper layer. However, DaVinci could not solve all problems. It could only solve simple problems by strictly following standard design documents.
Furthermore, DaVinci identified design documents using OpenCV technology, which needed a large number of thresholds and constraints to achieve better results.
No matter how we adjusted the thresholds in OpenCV, we could not get rid of bad cases. As shown in the preceding figure, even with constant effort, we could not manage the infinite combinations and unlimited possibilities of design documents and the diverse personal habits of designers.
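To make the threshold problem concrete, here is a minimal sketch of a rule-based classifier. It is not DaVinci's actual code and does not call OpenCV. The `Region` shape, the two features, and both threshold values are assumptions chosen only to show how a fixed rule produces a bad case.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A rectangular region found in a design image (hypothetical shape)."""
    width: int
    height: int
    edge_density: float  # fraction of pixels lying on edges, 0.0 to 1.0


# Hand-picked thresholds. Both values are assumptions for illustration.
MAX_TEXT_HEIGHT = 40
MIN_TEXT_EDGE_DENSITY = 0.30


def classify_by_rules(region: Region) -> str:
    """Classify a region as 'text' or 'image' using fixed thresholds."""
    if (region.height <= MAX_TEXT_HEIGHT
            and region.edge_density >= MIN_TEXT_EDGE_DENSITY):
        return "text"
    return "image"


body_text = Region(width=300, height=24, edge_density=0.45)
photo = Region(width=300, height=200, edge_density=0.12)
headline = Region(width=600, height=72, edge_density=0.40)  # large display type

print(classify_by_rules(body_text))  # text
print(classify_by_rules(photo))      # image
print(classify_by_rules(headline))   # image (a bad case: it is really text)
```

Raising `MAX_TEXT_HEIGHT` to rescue the headline would start misclassifying small images as text, which is the kind of trade-off that never ends when rules are tuned by hand.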
What can we do? We can use machine vision models and algorithms. I have studied many such models and algorithms.
These investigations gradually inspired me. After constant exploration, I finally achieved good results with the Mask R-CNN and YOLOv3 models.
Machine learning and deep neural networks can be combined to accurately identify images, text, and controllers.
In short, intelligent approaches enhanced with machine learning can solve problems that traditional programming cannot solve with thresholds. Next, I will explain how to use intelligent approaches to solve such problems.
As the saying goes, "Attitude is everything." Now, I will discuss how my team and I achieved the correct attitudes. First question: Is machine learning a difficult tool? You should be able to guess the answer: No! Today, it is easy to use many mature frameworks and models in machine learning. You can click the following URLs to find such tools.
Install the command-line tool for managing Pipcook projects:
$ npm install -g @pipcook/pipcook-cli
Initialize a project:
$ mkdir pipcook-example && cd pipcook-example
$ pipcook init
Playground:
If you are wondering what you can do in Pipcook and where you can check your training logs and models, you could start from the Pipboard:
$ pipcook board
You will see a web page open in your browser. There is an MNIST showcase on the home page, and you can experiment with it there. If you want to train a model to recognize MNIST handwritten digits by yourself, you could try the examples below:
You can check this page for a complete list. It's quick and easy to run these examples. For example, to do an MNIST image classification, run the following command to start the pipeline:
$ node examples/pipeline/pipeline-mnist-image-classification.js
Uploader: That's all there is to say about frontend intelligence.
Going in chronological order, first, let's review the traditional research and development process and analyze why problems that DaVinci cannot solve with OpenCV can be solved by machine learning and intelligence.
Returning to the key question: What is the difference between defining a problem clearly and considering a problem clearly? When OpenCV is used, defining a problem means specifying how to extract images, text, and controllers using the OpenCV algorithm. First, we must understand what an image is, what text is, and what a controller is.
However, when we apply intelligent methods, we only need to tell a model: this is an image, this is text, this is a controller, and this is label data; just as I told my eldest son, "it is wrong to throw blocks at your little brother." The next time he made sure it was a rubber ball he threw at his brother. (#-_-). I admit that I failed to explain it clearly. I should have told my eldest son that it was wrong to throw anything at another person. The same is true when using OpenCV to extract images, text, and controllers. It is difficult to use these elements in design documents based on standard rules. The DaVinci team considered this problem for more than two years.
How can we solve this problem using machine learning and intelligence? It is very simple. We just need to provide both correct and incorrect answers as samples so that a model can find a solution using a brute force algorithm. This is an example of machine learning. Therefore, the most essential difference is that we do not need to consider how to solve a problem, but we need to select the right model for a given field. For more information, you can see the documentation of related frameworks and models. Then, we can use the correct answers to train the model.
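As a minimal sketch of "provide answers, not rules", the following trains a nearest-centroid classifier from labeled samples. This is a stand-in I chose for brevity, not the model used in imgcook, and the two features (region height and edge density scaled to 0-100) and all sample values are hypothetical.

```python
from collections import defaultdict
from math import dist

# Labeled samples: (features, correct answer).
# Features are hypothetical: (height in pixels, edge density scaled to 0-100).
samples = [
    ((24, 45), "text"),
    ((18, 50), "text"),
    ((72, 40), "text"),
    ((200, 12), "image"),
    ((160, 8), "image"),
    ((240, 15), "image"),
]


def train(labeled):
    """Compute one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in labeled:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(samples)
print(predict(model, (60, 42)))   # text
print(predict(model, (180, 10)))  # image
```

Nothing in `train` says what text or an image is. The only knowledge about the field lives in the labeled samples, so improving the result means improving the samples rather than rewriting the logic.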
How can we find correct answers? We developed a SimpleCook platform to collect, organize, and label datasets. To collect data, we have to find raw data. To identify controllers in imgcook projects, we use frontend technologies to generate the controllers on the SimpleCook platform. Puppeteer is the key technology used for this purpose. A headless browser is used to render pages, and then image areas corresponding to imgcook, text, and controllers are labeled based on programming rules or manual labeling. Then, we generate a data set of correct answers.
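The reason generated pages are convenient is that the code that produces a page already knows what each element is and where it sits. The sketch below illustrates only that idea. It does not use Puppeteer or a headless browser, and the component catalog, the vertical layout, and the annotation format are all assumptions rather than SimpleCook's real output.

```python
import json
import random

# Hypothetical component catalog: type -> (width, height) in pixels.
COMPONENTS = {
    "button": (120, 40),
    "input": (240, 36),
    "image": (320, 180),
    "text": (320, 24),
}


def generate_sample(rng, count=4, gap=16):
    """Stack randomly chosen components vertically and record their boxes.

    Because the page is generated by code, the label and bounding box of
    every element are known without manual annotation.
    """
    annotations = []
    y = 0
    for _ in range(count):
        label = rng.choice(sorted(COMPONENTS))
        width, height = COMPONENTS[label]
        # bbox is [x, y, width, height]
        annotations.append({"label": label, "bbox": [0, y, width, height]})
        y += height + gap
    return {"page_height": y - gap, "annotations": annotations}


rng = random.Random(0)  # fixed seed so the data set is reproducible
dataset = [generate_sample(rng) for _ in range(3)]
print(json.dumps(dataset[0], indent=2))
```

In a real pipeline the boxes would come from the rendered page rather than from a toy layout, but the labels would still be produced by rules instead of by hand.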
How do we train a model? Do you remember the Pipcook frontend machine learning framework mentioned earlier? Only one command is needed:
You can check here for a complete list. It's quick and easy to run these examples. For example, to do an MNIST image classification, run the following command to start the pipeline:
$ node examples/pipeline/pipeline-mnist-image-classification.js
You only need to replace the sample data in the Pipcook tutorial with your data set. The whole process is simple. To sum up, the key to an intelligent approach is to organize correct answers, select a model, and train the model. Then, you're done!
The process of solving a problem through intelligent thinking is simple since there is always an answer. This is similar to difficult choices in our own lives. Everyone has his or her answer. Even if we tell a model a "so-called" correct answer, the trained model may provide an unexpected answer to an unknown question, but the model will always provide an answer.
Based on our experience, problems that have been solved in the industry can also be solved in our field. For example, an image classification model can accurately identify a cat or dog as long as a cat or dog exists in the images. This has been verified in rigorous experiments by other people. If I label images of cats and dogs that I took and feed them into a model, the trained model can be used to identify images of cats and dogs taken by other people. This is another part of what we mean by certainty.
Finally, let's look at an example from real life. When you join a new team, you cannot remember everyone's faces and names at first. It takes time to get to know everyone's name and connect it with the correct face. The same is true for a model. The model cannot remember the faces of different people at the very beginning. Then, we label pictures of different people's faces taken at different angles and in different lighting conditions. After the model is trained, it can recall everyone's name. This is certainty.
It is easy to solve a problem with intelligent thinking. After we provide a model with correct answers, the model trains its parameters and weights based on this sample data to obtain an algorithm that defines the solution. This is robust. Let's go back to the problem that DaVinci encountered when it used OpenCV: incorrect thresholds. It is difficult to summarize and extract the characteristics of images, text, and controllers. However, if a model is provided with enough correct answers as sample data, it can extract robust answers.
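The simplest way I know to show parameters being obtained from sample data is to let the data pick the threshold. The sketch below fits a one-feature decision stump. The feature (region height) and the sample values are hypothetical, and a real model such as a neural network learns far more parameters than this single number.

```python
def learn_threshold(samples):
    """Pick the height threshold that misclassifies the fewest samples.

    samples: list of (height, label) pairs, where label is "text" or "image".
    Rule being fitted: height <= threshold means "text".
    """
    heights = sorted({height for height, _ in samples})
    # Candidate thresholds sit halfway between neighbouring observed heights.
    candidates = [(a + b) / 2 for a, b in zip(heights, heights[1:])]

    def errors(threshold):
        return sum(
            (height <= threshold) != (label == "text")
            for height, label in samples
        )

    return min(candidates, key=errors)


samples = [
    (18, "text"), (24, "text"), (72, "text"), (96, "text"),
    (160, "image"), (200, "image"), (240, "image"),
]
threshold = learn_threshold(samples)
print(threshold)  # 128.0
```

When a new bad case appears, it is added to `samples` and the threshold is fitted again. Nobody has to guess a new constant.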
If you are interested, you can search for genetic algorithms and ant colony algorithms on Google. You will find that any algorithm with high robustness will perform better than we expect. It is difficult to summarize and extract the models and ideas behind these algorithms. However, we can still write genetic algorithms and ant colony algorithms to train these robust models and algorithms in simulated or real environments. This programming approach is itself robust. In the past, we always started writing code only when we had a clear idea. Today, if we use intelligent thinking, we can start writing code even when we do not have a clear idea. This approach to software development is also robust.
Today, we are faced with the issue of the high availability of servers. Around ten years ago, I spoke with my colleagues about "self-healing" systems. Few companies have approached this goal, but it is possible through intelligent thinking. Now, I want to share my thoughts about system self-healing, starting with the evolution of intelligent thinking.
The evolution of intelligent thinking can be demonstrated in the way we solve problems. In the past, we determined a solution before development and implemented the solution during development. This was called hardcoding. When we encountered design problems that required flexibility, we had to write new code and constantly install patches to address new problems.
When we solve a problem with intelligent thinking, we do not provide a solution. A model automatically extracts a solution from the correct answers. When we encounter new conditions, we can feed these conditions to the model as new answers. Then, the model can automatically evolve to solve similar problems by itself. To achieve this, we need to form a closed-loop evolution process that includes evaluating answers, generating positive and negative samples based on new answers, and constructing a path for online training.
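The closed loop described above can be sketched in a few lines. This is a toy under stated assumptions: the model is a single height threshold, `confirmed` stands in for whatever evaluation signal a real system has (a reviewer or a downstream check), and only wrong predictions are turned into new samples.

```python
def train(samples):
    """Fit a height threshold midway between the two classes' mean heights."""
    text = [height for height, label in samples if label == "text"]
    image = [height for height, label in samples if label == "image"]
    return (sum(text) / len(text) + sum(image) / len(image)) / 2


def predict(threshold, height):
    """Heights at or below the threshold are treated as text."""
    return "text" if height <= threshold else "image"


def feedback_loop(samples, incoming):
    """Evaluate each prediction and retrain on the cases the model got wrong.

    incoming: (height, confirmed) pairs, where `confirmed` is the label
    supplied by the evaluation step (a hypothetical signal).
    """
    samples = list(samples)
    threshold = train(samples)
    for height, confirmed in incoming:
        if predict(threshold, height) != confirmed:
            samples.append((height, confirmed))  # wrong answer becomes a sample
            threshold = train(samples)           # retrain on the larger set
    return threshold, samples


initial = [(20, "text"), (30, "text"), (200, "image"), (220, "image")]
incoming = [(28, "text"), (130, "text"), (140, "text"), (190, "image")]

threshold, samples = feedback_loop(initial, incoming)
print(threshold)     # 145.0
print(len(samples))  # 6
```

The threshold starts at 117.5 and ends at 145.0 without anyone editing a rule. The two misjudged headlines were fed back as answers, and the model moved to cover them.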
Therefore, solving problems with intelligent thinking is valuable due to its certainty, robustness, and evolution. Is there any programming language, technology, framework, scaffolding, or tool that can give us this capability?
No matter how much I say, it is useless if you do not take it in. What do you have to do to be able to truly listen and comprehend? I think you must break out of the old ways of thinking. First of all, I admit that I am proud and self-opinionated as a programmer. Since I wrote my first function in BASIC on DOS 6.22 in 1990 and printed rhombuses, circles, and squares composed of asterisks on the screen, I have been sure that I can program the world.
Later that year, my father bought a 486 DX-100 computer. Since then, I started on the road to program the whole world. From then on, I made every decision in my life on my own. I went to a technical secondary school, dropped out after only one semester, worked as a waiter in a convention hall, opened a store to sell videotapes and CDs, closed the store to take the adult college entrance exam, upgraded from a junior college to a university with a major in law, and worked at ERP Software as a salesman after graduation. Along the way, I did not listen to any opinions. After all, I am the one who can program the whole world.
The image above shows my relationship with programming.
Although I am very opinionated, I can still keep an open mind. This is an important reason why I can learn anything quickly, from teaching myself programming to learning about electronic circuit design. I learned embedded development and industrial design on my own when I ran a company.
Today, with the arrival of machine learning, new intelligent thinking is needed in the frontend field to analyze, define, and solve problems. I can reinvent myself any time and consider all new things and technologies in an unusual way. This allows me to find new opportunities.
The word "impossible" is not in my vocabulary. I can learn anything if I train myself. How can I train myself? First of all, I must learn how to control myself. I know that I can solve more complex and valuable problems using models. Once I fully solve a problem by evolving a model, I can advance towards my next goal. It is really difficult to restrain my desire to control and express. I have trained myself for almost ten years, but I am still at the very beginning.
It is natural for programmers like me to solve a problem by writing code. It is not intuitive for us to find a model and train the model to assist in solving a problem. What can I do? I know that life is found in motion. However, when I got home, I was still used to lying down and watching videos on my tablet.
What can I do? I made three rules for myself:
Now that I have rambled so much, you probably want to respond. I invite you to join the Frontend D2C Intelligence Team in the Taobao Technology Department. We are responsible for intelligent development in the Alibaba Frontend Technology Committee. No matter what problems you encounter in developing habits of intelligent thinking, we have solutions.