iTAG of Machine Learning Platform for AI (PAI) provides labeling templates for Named Entity Recognition (NER), text classification, and relationship analysis for named entities. When you create a text labeling job, you can select a labeling template based on your business scenario. This topic describes the scenarios of text labeling templates and the data structures of input and output data for these templates.

Background information

iTAG provides text labeling templates that support the following features:

NER

In NER labeling, you drag to select named entities in the text and then assign labels to them.

  • Scenarios

    This labeling template applies to scenarios such as keyword recognition for commodity names and news content.

  • Data structures
    • Input data
      Each row in the .manifest file of input data contains an object. Each row must contain the source field.
      {"data":{"source":"Alibaba Group acquired Vendio and Auctiva, two e-commerce platforms that serve American small enterprises. In the same month, Alibaba Group launched mobile apps for Taobao."}}
      ...
    • Output data
      Each row in the .manifest file of output data contains an object and the labeling results for the object. The following code provides an example of the JSON string in each row:
      {
          "data": {
              "source": "Alibaba Group acquired Vendio and Auctiva, two e-commerce platforms that serve American small enterprises. In the same month, Alibaba Group launched mobile apps for Taobao."
          }, 
          "label-1430082002522152960": {
              "results": [
                  {
                      "objects": [
                          {
                              "result": {
                                  "Text content": [
                                      "Label 1"
                                  ]
                              }, 
                              "color": null, 
                              "id": null, 
                              "text": "Optical character recognition (OCR) result 1", 
                              "start": 49, 
                              "end": 51
                          }, 
                          {
                              "result": {
                                  "Text content": [
                                      "Label 2", 
                                      "Label 3"
                                  ]
                              }, 
                              "color": null, 
                              "id": null, 
                              "text": "OCR result 2", 
                              "start": 34, 
                              "end": 40
                          }
                      ], 
                      "empty": false
                  }
              ]
          }
      }
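
The following Python snippet is a minimal sketch of how the NER data structures above could be produced and consumed. The helper names `build_input_line` and `extract_entities` are hypothetical and are not part of iTAG. The sketch assumes that the labeling results are stored under a key that starts with `label-`, as in the example, and the sample values are copied from that example.

```python
import json


def build_input_line(text):
    """Serialize one text sample as a single row of an input .manifest file."""
    # Each row is one JSON object that carries the text in data.source.
    return json.dumps({"data": {"source": text}}, ensure_ascii=False)


def extract_entities(manifest_line):
    """Return (text, start, end, labels) tuples from one NER output row."""
    row = json.loads(manifest_line)
    entities = []
    for key, value in row.items():
        # Assumption: result keys start with "label-" followed by a job-specific ID.
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            if not isinstance(result, dict):
                continue
            for obj in result.get("objects", []):
                # "result" maps a question title to the list of selected labels.
                labels = [
                    label
                    for selected in obj["result"].values()
                    for label in selected
                ]
                entities.append((obj["text"], obj["start"], obj["end"], labels))
    return entities


# One output row, reduced to the second labeled object of the example above.
sample_output = json.dumps({
    "data": {"source": "Alibaba Group acquired Vendio and Auctiva."},
    "label-1430082002522152960": {
        "results": [{
            "objects": [{
                "result": {"Text content": ["Label 2", "Label 3"]},
                "color": None,
                "id": None,
                "text": "OCR result 2",
                "start": 34,
                "end": 40,
            }],
            "empty": False,
        }]
    },
})

for entity in extract_entities(sample_output):
    print(entity)
```

To process a whole file, call `extract_entities` on each non-empty row of the output .manifest file.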

Text classification

Text classification is used to select, from a predefined set of labels, one or more labels that match the input text and to add them to the text. This template supports single-label and multi-label text classification.

  • Scenarios

    This labeling template applies to scenarios such as news recommendation, knowledge management, and spam filtering.

  • Data structures
    • Input data
      Each row in the .manifest file of input data contains an object. Each row must contain the source field.
      {"data":{"source":"Alibaba Group changed the name of its platform that serves Chinese businesses to 1688. In the same month, Alibaba Group launched a group buying website called Juhuasuan."}}
      ...
    • Output data
      Each row in the .manifest file of output data contains an object and the labeling results for the object. The following code provides an example of the JSON string in each row:
      { 
          "data": {
              "source": "Alibaba Group changed the name of its platform that serves Chinese businesses to 1688. In the same month, Alibaba Group launched a group buying website called Juhuasuan."
          }, 
          "label-1432989439570944000": {
              "results": [
                  {
                      "questionId": "2", 
                      "data": [
                          "Label 2", 
                          "Label 1"
                      ], 
                      "markTitle": "Multiple-choice", 
                      "type": "survey/multivalue"
                  }
              ]
          }
      }
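
As a rough illustration of how the text classification output above could be read, the following Python sketch collects the labels of each row and counts them across rows. The function names are hypothetical. The sketch assumes that the result key starts with `label-` and that a single-label job might store `data` as a plain string instead of a list; neither point is confirmed by this topic.

```python
import json
from collections import Counter


def extract_labels(manifest_line):
    """Return the labels assigned to the text in one output row."""
    row = json.loads(manifest_line)
    labels = []
    for key, value in row.items():
        # Assumption: result keys start with "label-" followed by a job-specific ID.
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            data = result.get("data", [])
            # Assumption: a single-label job may store one string instead of a list.
            if isinstance(data, str):
                data = [data]
            labels.extend(data)
    return labels


def count_labels(manifest_lines):
    """Count how often each label occurs across the given rows."""
    counts = Counter()
    for line in manifest_lines:
        if line.strip():  # skip blank rows
            counts.update(extract_labels(line))
    return counts


# One output row that mirrors the example above.
sample_output = json.dumps({
    "data": {"source": "Alibaba Group launched a group buying website called Juhuasuan."},
    "label-1432989439570944000": {
        "results": [{
            "questionId": "2",
            "data": ["Label 2", "Label 1"],
            "markTitle": "Multiple-choice",
            "type": "survey/multivalue",
        }]
    },
})

print(extract_labels(sample_output))
print(count_labels([sample_output, sample_output]))
```

The per-label counts give a quick view of the class balance before the labeled data is used for training.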

Relationship analysis for named entities

Relationship analysis for named entities is used to label existing relationships among named entities. This template applies to scenarios in which triples and knowledge graphs are used to structure information.

  • Scenarios

    This labeling template applies to scenarios such as knowledge graph construction.

  • Data structures
    • Input data
      Each row in the .manifest file of input data contains an object. Each row must contain the source field.
      {"data":{"source":"Alibaba Group changed the name of its platform that serves Chinese businesses to 1688. In the same month, Alibaba Group launched a group buying website called Juhuasuan."}}
      ...
    • Output data
      Each row in the .manifest file of output data contains an object and the labeling results for the object. The following code provides an example of the JSON string in each row:
      {
          "data": {
              "source": "Alibaba Group changed the name of its platform that serves Chinese businesses to 1688. In the same month, Alibaba Group launched a group buying website called Juhuasuan."
          }, 
          "label-1435488346167255040": {
              "results": [
                  {
                      "objects": [
                          {
                              "result": {
                                  "Multiple-choice": [
                                      "Label 3"
                                  ]
                              }, 
                              "color": null, 
                              "id": null, 
                              "text": "Group buying website", 
                              "start": 32, 
                              "end": 35
                          }, 
                          {
                              "result": {
                                  "Multiple-choice": [
                                      "Label 2"
                                  ]
                              }, 
                              "color": null, 
                              "id": null, 
                              "text": "1688", 
                              "start": 18, 
                              "end": 21
                          }, 
                          {
                              "result": {
                                  "Multiple-choice": [
                                      "Label 1"
                                  ]
                              }, 
                              "color": null, 
                              "id": null, 
                              "text": "Businesses", 
                              "start": 9, 
                              "end": 12
                          }
                      ], 
                      "empty": false
                  }, 
                  [
                      {
                          "result": {
                              "Single-choice": "Label 4"
                          }, 
                          "from": {
                              "x": -225, 
                              "y": -126, 
                              "start": 9, 
                              "end": 12, 
                              "text": "Businesses"
                          }, 
                          "to": {
                              "x": -233, 
                              "y": 75, 
                              "start": 18, 
                              "end": 21, 
                              "text": "1688"
                          }
                      }, 
                      {
                          "result": {
                              "Single-choice": "Label 6"
                          }, 
                          "from": {
                              "x": -225, 
                              "y": -126, 
                              "start": 9, 
                              "end": 12, 
                              "text": "Businesses"
                          }, 
                          "to": {
                              "x": 24, 
                              "y": -93, 
                              "start": 32, 
                              "end": 35, 
                              "text": "Group buying website"
                          }
                      }, 
                      {
                          "result": {
                              "Single-choice": "Label 4"
                          }, 
                          "from": {
                              "x": -233, 
                              "y": 75, 
                              "start": 18, 
                              "end": 21, 
                              "text": "1688"
                          }, 
                          "to": {
                              "x": 24, 
                              "y": -93, 
                              "start": 32, 
                              "end": 35, 
                              "text": "Group buying website"
                          }
                      }
                  ]
              ]
          }
      }
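
In the output above, the `results` array holds the labeled entities in an object and the relationships in a nested list. The following Python sketch shows one way to turn those relationships into (head, relation, tail) triples for a knowledge graph. The function name `extract_triples` is hypothetical, and the sketch assumes that every relationship row follows the layout shown in the example, with the result key starting with `label-`.

```python
import json


def extract_triples(manifest_line):
    """Return (from_text, relation_label, to_text) triples from one output row."""
    row = json.loads(manifest_line)
    triples = []
    for key, value in row.items():
        # Assumption: result keys start with "label-" followed by a job-specific ID.
        if not key.startswith("label-"):
            continue
        for item in value.get("results", []):
            # Entities are stored in a dict; relationships are stored in a list.
            if not isinstance(item, list):
                continue
            for relation in item:
                # "result" maps a question title to the selected relation label.
                for label in relation["result"].values():
                    triples.append(
                        (relation["from"]["text"], label, relation["to"]["text"])
                    )
    return triples


# One output row, reduced to the first relationship of the example above.
sample_output = json.dumps({
    "data": {"source": "Alibaba Group changed the name of its platform to 1688."},
    "label-1435488346167255040": {
        "results": [
            {"objects": [], "empty": False},
            [{
                "result": {"Single-choice": "Label 4"},
                "from": {"x": -225, "y": -126, "start": 9, "end": 12, "text": "Businesses"},
                "to": {"x": -233, "y": 75, "start": 18, "end": 21, "text": "1688"},
            }],
        ]
    },
})

print(extract_triples(sample_output))
```

The `x` and `y` values are ignored here because only the entity text and the relation label are needed to build a triple.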