By Tianke from F(x) Team
This article will use Pipcook 1.0 to quickly train a form recognition model and use this model to improve the efficiency of form development.
You may have encountered this pain point when restoring web pages on the frontend: the designer lays out a form in the design draft, you find a similar form in Ant Design or Fusion, and then you copy and modify its code. This is inefficient and tedious.
Can it be faster? Can form code be generated from screenshots? The answer is yes.
You can train an object detection model whose input is a screenshot and whose output is the types and coordinates of all form items in it. Therefore, you can take a screenshot of the form in the design draft, run it through the model to get all the form items, and combine them with the labels produced by text recognition to generate the form code. For example, I have implemented the function of generating form code from screenshots.
In this figure, the red boxes are the form items detected by the object detection model, and the green boxes are the texts recognized by the text recognition API. After some calculation to match the two, the form protocol or code can be generated.
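To make that calculation concrete, here is a minimal sketch of one way to do the matching, assuming boxes are [xmin, ymin, xmax, ymax] arrays; every name in it (matchLabels, formItems, texts, type, content) is hypothetical. It takes a form item's label to be the nearest recognized text on the same row to its left.

// Minimal sketch: attach each recognized text (green box) to the nearest
// detected form item (red box) on roughly the same row.
// Boxes are [xmin, ymin, xmax, ymax]; all names here are hypothetical.
function matchLabels(formItems, texts) {
  return formItems.map((item) => {
    const itemCenterY = (item.box[1] + item.box[3]) / 2;
    let label = null;
    let bestGap = Infinity;
    for (const text of texts) {
      const textCenterY = (text.box[1] + text.box[3]) / 2;
      const sameRow = Math.abs(textCenterY - itemCenterY) < 20; // row tolerance in px
      const gap = item.box[0] - text.box[2]; // horizontal gap: label sits to the left
      if (sameRow && gap >= 0 && gap < bestGap) {
        bestGap = gap;
        label = text.content;
      }
    }
    return { type: item.type, box: item.box, label };
  });
}

Each resulting { type, label } pair then maps directly onto a form-item node in the form protocol.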
Text recognition is universal, so I will not introduce it here. But how is form item detection implemented? The following parts describe each step in detail: preparing the samples, installing Pipcook, configuring and running the training pipeline, and using the trained model to predict.
Here, the form recognition samples are standard object detection samples in Pascal VOC format. For the labeling method, please see the previous section. A dataset of form recognition samples is provided for convenience:
http://ai-sample.oss-cn-hangzhou.aliyuncs.com/pipcook/datasets/mid/mid_base.zip
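For reference, Pascal VOC pairs each image with an XML annotation file that lists the labeled boxes. A minimal sketch of one annotation is shown below; the file name and the class name "input" are placeholders, not necessarily the dataset's actual label names.

<annotation>
  <filename>form_0001.jpg</filename>
  <size>
    <width>1570</width>
    <height>522</height>
    <depth>3</depth>
  </size>
  <object>
    <name>input</name>
    <bndbox>
      <xmin>83</xmin>
      <ymin>31</ymin>
      <xmax>146</xmax>
      <ymax>71</ymax>
    </bndbox>
  </object>
</annotation>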
Next, I will show you how to use Pipcook to generate a large number of samples from the sample pages and train an object detection model.
Pipcook is a machine learning application framework developed by the D2C Team of Tao Technology for frontend developers. We hope Pipcook will become a platform for frontend engineers to learn and practice machine learning and promote frontend intelligence. Pipcook is an open-source framework. You are welcome to build it together with us.
Make sure that your Node.js version is 12 or later. Then:
# Install pipcook-cli globally; cnpm accelerates npm installs in China
npm i @pipcook/pipcook-cli cnpm -g --registry=https://registry.npm.taobao.org
Next, initialize Pipcook and start the daemon:
pipcook init --tuna -c cnpm
pipcook daemon start
Form recognition is an object detection task, so create a pipeline configuration file in .json format, for example form.json. Don't worry: you do not need to modify most parameters in this configuration file.
{
  "plugins": {
    "dataCollect": {
      "package": "@pipcook/plugins-object-detection-pascalvoc-data-collect",
      "params": {
        "url": "http://ai-sample.oss-cn-hangzhou.aliyuncs.com/pipcook/datasets/mid/mid_base.zip"
      }
    },
    "dataAccess": {
      "package": "@pipcook/plugins-coco-data-access"
    },
    "modelDefine": {
      "package": "@pipcook/plugins-detectron-fasterrcnn-model-define"
    },
    "modelTrain": {
      "package": "@pipcook/plugins-detectron-model-train",
      "params": {
        "steps": 20000
      }
    },
    "modelEvaluate": {
      "package": "@pipcook/plugins-detectron-model-evaluate"
    }
  }
}
You only need to set one parameter in dataCollect.params: url, which is the address of your samples. You can also run this configuration file as-is to train a form detection model on the provided dataset.
The object detection model requires a large amount of computation, so you may need a GPU machine. Otherwise, the training could take several weeks.
pipcook run form.json --tuna
The training time may be a bit long, so go to lunch or write some business code.
After the training is completed, the model is generated in the output directory under the current directory. This directory is a brand-new npm package. First, install its dependencies:
cd output
# BOA_TUNA=1 uses the TUNA mirror for acceleration in China
BOA_TUNA=1 npm install
After the installation, go back to the root directory, download a test image, and name it test.jpg.
cd ..
curl https://img.alicdn.com/tfs/TB1bWO6b7Y2gK0jSZFgXXc5OFXa-1570-522.jpg --output test.jpg
Finally, we can begin to predict:
const predict = require('./output');

(async () => {
  const v1 = await predict('./test.jpg');
  console.log(v1);
  // {
  //   boxes: [
  //     [83, 31, 146, 71],     // xmin, ymin, xmax, ymax
  //     [210, 48, 256, 78],
  //     [403, 30, 653, 72],
  //     [717, 41, 966, 83]
  //   ],
  //   classes: [
  //     0, 1, 2, 2             // class index of each box
  //   ],
  //   scores: [
  //     0.95, 0.93, 0.96, 0.99 // confidence of each detection
  //   ]
  // }
})();
Note: The result consists of three parts: boxes, the coordinates of each detected form item as [xmin, ymin, xmax, ymax]; classes, the class index of each box; and scores, the confidence of each detection.
The boxes, classes, and scores can then be visualized on the original image, as in the sketch below.
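For reference, here is a minimal visualization sketch. It assumes the third-party canvas npm package (npm i canvas), which is not part of the generated output package, and writes the annotated image to visualized.png:

const fs = require('fs');
const { createCanvas, loadImage } = require('canvas'); // third-party package, assumed installed
const predict = require('./output');

(async () => {
  const { boxes, classes, scores } = await predict('./test.jpg');
  const img = await loadImage('./test.jpg');
  const canvas = createCanvas(img.width, img.height);
  const ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0);
  ctx.strokeStyle = 'red';
  ctx.fillStyle = 'red';
  ctx.font = '16px sans-serif';
  boxes.forEach(([xmin, ymin, xmax, ymax], i) => {
    ctx.strokeRect(xmin, ymin, xmax - xmin, ymax - ymin);               // detected box
    ctx.fillText(`${classes[i]}: ${scores[i].toFixed(2)}`, xmin, ymin - 4); // class index and score
  });
  fs.writeFileSync('./visualized.png', canvas.toBuffer('image/png'));
})();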