iTAG of Machine Learning Platform for AI (PAI) provides labeling templates for Named Entity Recognition (NER), text classification, and relationship analysis for named entities. When you create a text labeling job, you can select a labeling template based on your business scenario. This topic describes the scenarios of text labeling templates and the data structures of input and output data for these templates.
Background information
NER
In NER labeling, you drag a selection box over each named entity in the text and assign one or more labels to the entity.
- Scenarios
This labeling template applies to scenarios such as keyword recognition for commodity names and news content.
- Data structures
- Input data
Each row in the .manifest file of input data contains one object to be labeled. Each row must contain the source field, which is nested in the data object.
{"data":{"source":"Alibaba Group acquired Vendio and Auctiva, two e-commerce platforms that serve American small enterprises. In the same month, Alibaba Group launched mobile apps for Taobao."}} ...
- Output data
Each row in the .manifest file of output data contains an object and the labeling results for the object. The following code provides an example of the JSON string in each row:
{ "data": { "source": "Alibaba Group acquired Vendio and Auctiva, two e-commerce platforms that serve American small enterprises. In the same month, Alibaba Group launched mobile apps for Taobao." }, "label-1430082002522152960": { "results": [ { "objects": [ { "result": { "Text content": [ "Label 1" ] }, "color": null, "id": null, "text": "Optical character recognition (OCR) result 1", "start": 49, "end": 51 }, { "result": { "Text content": [ "Label 2", "Label 3" ] }, "color": null, "id": null, "text": "OCR result 2", "start": 34, "end": 40 } ], "empty": false } ] } }
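The output row above can be read programmatically. The following Python sketch is not part of iTAG. It assumes that labeling results are stored under a top-level key that starts with label-, as in the sample, and that each entry in objects carries the text, start, end, and result fields. The function name iter_ner_spans and the shortened sample row are hypothetical.

```python
import json


def iter_ner_spans(manifest_line):
    """Yield (text, start, end, labels) for each labeled span in one output row."""
    row = json.loads(manifest_line)
    for key, value in row.items():
        # Assumption: labeling results live under keys prefixed with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            for obj in result.get("objects", []):
                labels = []
                # "result" maps a question title to one label or a list of labels.
                for chosen in obj["result"].values():
                    if isinstance(chosen, str):
                        labels.append(chosen)
                    else:
                        labels.extend(chosen)
                yield obj["text"], obj["start"], obj["end"], labels


# Hypothetical row with the same shape as the sample output row.
ner_row = json.dumps({
    "data": {"source": "Alibaba Group launched mobile apps for Taobao."},
    "label-1430082002522152960": {
        "results": [{
            "objects": [
                {"result": {"Text content": ["Label 1"]}, "color": None, "id": None,
                 "text": "Optical character recognition (OCR) result 1",
                 "start": 49, "end": 51},
                {"result": {"Text content": ["Label 2", "Label 3"]}, "color": None,
                 "id": None, "text": "OCR result 2", "start": 34, "end": 40},
            ],
            "empty": False,
        }]
    },
})

ner_spans = list(iter_ner_spans(ner_row))
for span in ner_spans:
    print(span)
```

To process a whole .manifest file, call the function once per non-empty line of the file.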
Text classification
Text classification selects, from a predefined set of labels, one or more labels that match the input text and adds them to the text. This template supports single-label and multi-label text classification.
- Scenarios
This labeling template applies to scenarios such as news recommendation, knowledge management, and junk content filtering.
- Data structures
- Input data
Each row in the .manifest file of input data contains one object to be labeled. Each row must contain the source field, which is nested in the data object.
{"data":{"source":"Alibaba Group changed the name of its platform that serves Chinese businesses to 1688. In the same month, Alibaba Group launched a group buying website called Juhuasuan."}} ...
- Output data
Each row in the .manifest file of output data contains an object and the labeling results for the object. The following code provides an example of the JSON string in each row:
{ "data": { "source": "Alibaba Group changed the name of its platform that serves Chinese businesses to 1688. In the same month, Alibaba Group launched a group buying website called Juhuasuan." }, "label-1432989439570944000": { "results": [ { "questionId": "2", "data": [ "Label 2", "Label 1" ], "markTitle": "Multiple-choice", "type": "survey/multivalue" } ] } }
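As a rough illustration of how a row like the one above might be consumed, the following Python sketch collects the chosen labels for one object. It assumes that results are stored under a key that starts with label- and that each result lists its labels in the data field, as in the sample. The handling of a bare string in data is a guess for the single-label case, which the sample does not show. The function name classification_labels and the shortened sample row are hypothetical.

```python
import json


def classification_labels(manifest_line):
    """Return (source_text, labels) for one text classification output row."""
    row = json.loads(manifest_line)
    labels = []
    for key, value in row.items():
        # Assumption: labeling results live under keys prefixed with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            chosen = result.get("data", [])
            # Assumption: a single-label answer may be a bare string.
            if isinstance(chosen, str):
                chosen = [chosen]
            labels.extend(chosen)
    return row["data"]["source"], labels


# Hypothetical row with the same shape as the sample output row.
cls_row = json.dumps({
    "data": {"source": "Alibaba Group launched a group buying website called Juhuasuan."},
    "label-1432989439570944000": {
        "results": [{
            "questionId": "2",
            "data": ["Label 2", "Label 1"],
            "markTitle": "Multiple-choice",
            "type": "survey/multivalue",
        }]
    },
})

cls_source, cls_labels = classification_labels(cls_row)
print(cls_source, cls_labels)
```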
Relationship analysis for named entities
Relationship analysis for named entities is used to label existing relationships among named entities. This template applies to scenarios in which triples and knowledge graphs are used to structure information.
- Scenarios
This labeling template applies to scenarios such as knowledge graph construction.
- Data structures
- Input data
Each row in the .manifest file of input data contains one object to be labeled. Each row must contain the source field, which is nested in the data object.
{"data":{"source":"Alibaba Group changed the name of its platform that serves Chinese businesses to 1688. In the same month, Alibaba Group launched a group buying website called Juhuasuan."}} ...
- Output data
Each row in the .manifest file of output data contains an object and the labeling results for the object. The following code provides an example of the JSON string in each row:
{ "data": { "source": "Alibaba Group changed the name of its platform that serves Chinese businesses to 1688. In the same month, Alibaba Group launched a group buying website called Juhuasuan." }, "label-1435488346167255040": { "results": [ { "objects": [ { "result": { "Multiple-choice": [ "Label 3" ] }, "color": null, "id": null, "text": "Group buying website", "start": 32, "end": 35 }, { "result": { "Multiple-choice": [ "Label 2" ] }, "color": null, "id": null, "text": "1688", "start": 18, "end": 21 }, { "result": { "Multiple-choice": [ "Label 1" ] }, "color": null, "id": null, "text": "Businesses", "start": 9, "end": 12 } ], "empty": false }, [ { "result": { "Single-choice": "Label 4" }, "from": { "x": -225, "y": -126, "start": 9, "end": 12, "text": "Businesses" }, "to": { "x": -233, "y": 75, "start": 18, "end": 21, "text": "1688" } }, { "result": { "Single-choice": "Label 6" }, "from": { "x": -225, "y": -126, "start": 9, "end": 12, "text": "Businesses" }, "to": { "x": 24, "y": -93, "start": 32, "end": 35, "text": "Group buying website" } }, { "result": { "Single-choice": "Label 4" }, "from": { "x": -233, "y": 75, "start": 18, "end": 21, "text": "1688" }, "to": { "x": 24, "y": -93, "start": 32, "end": 35, "text": "Group buying website" } } ] ] } }
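In the sample row above, the results array holds two kinds of items: an object whose objects field lists the labeled entities, and a nested array whose entries describe relationships through the from, to, and result fields. The following Python sketch turns those relationship entries into (head, relation, tail) triples. It assumes that this layout holds for every row, which the sample alone does not guarantee. The function name extract_triples and the shortened sample row are hypothetical.

```python
import json


def extract_triples(manifest_line):
    """Return (head_text, relation_label, tail_text) triples from one output row."""
    row = json.loads(manifest_line)
    triples = []
    for key, value in row.items():
        # Assumption: labeling results live under keys prefixed with "label-".
        if not key.startswith("label-"):
            continue
        for item in value.get("results", []):
            # Entities come as a dict with "objects"; relationships come as a list.
            if not isinstance(item, list):
                continue
            for relation in item:
                for label in relation["result"].values():
                    triples.append(
                        (relation["from"]["text"], label, relation["to"]["text"])
                    )
    return triples


# Hypothetical row with the same shape as the sample output row.
rel_row = json.dumps({
    "data": {"source": "Alibaba Group launched a group buying website called Juhuasuan."},
    "label-1435488346167255040": {
        "results": [
            {"objects": [
                {"result": {"Multiple-choice": ["Label 2"]}, "color": None, "id": None,
                 "text": "1688", "start": 18, "end": 21},
                {"result": {"Multiple-choice": ["Label 1"]}, "color": None, "id": None,
                 "text": "Businesses", "start": 9, "end": 12},
            ], "empty": False},
            [
                {"result": {"Single-choice": "Label 4"},
                 "from": {"x": -225, "y": -126, "start": 9, "end": 12, "text": "Businesses"},
                 "to": {"x": -233, "y": 75, "start": 18, "end": 21, "text": "1688"}},
                {"result": {"Single-choice": "Label 6"},
                 "from": {"x": -225, "y": -126, "start": 9, "end": 12, "text": "Businesses"},
                 "to": {"x": 24, "y": -93, "start": 32, "end": 35,
                        "text": "Group buying website"}},
            ],
        ]
    },
})

rel_triples = extract_triples(rel_row)
print(rel_triples)
```

Triples in this form can be loaded into a graph store or deduplicated with a set before further processing.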