Alibaba Cloud Model Studio: Import data

Last Updated: Nov 27, 2024

This topic describes how to import unstructured data and structured data to Alibaba Cloud Model Studio.

Procedure

Before you import data into Model Studio, determine whether the data is unstructured or structured:

  • Unstructured data: documents in pdf, docx, doc, txt, md, pptx, ppt, png, jpg, jpeg, bmp, or gif format.

  • Structured data: documents in xlsx or xls format.

You can import data by using the console or the API. However, the API supports only unstructured data.
For more information about the API, see AddFile.
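The format rules above can be expressed as a small lookup. This is a minimal Python sketch, assuming the decision depends only on the file extension; `classify_document` and `available_import_methods` are hypothetical helper names, not part of any Model Studio SDK.

```python
from pathlib import Path

# Extensions listed in this topic, compared case-insensitively.
UNSTRUCTURED_EXTENSIONS = {
    "pdf", "docx", "doc", "txt", "md", "pptx", "ppt",
    "png", "jpg", "jpeg", "bmp", "gif",
}
STRUCTURED_EXTENSIONS = {"xlsx", "xls"}


def classify_document(filename: str) -> str:
    """Return "unstructured", "structured", or "unsupported" based on the file extension."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in UNSTRUCTURED_EXTENSIONS:
        return "unstructured"
    if ext in STRUCTURED_EXTENSIONS:
        return "structured"
    return "unsupported"


def available_import_methods(filename: str) -> list[str]:
    """List the import methods this topic describes for the given file."""
    kind = classify_document(filename)
    if kind == "unstructured":
        return ["console", "api"]  # the AddFile API accepts only unstructured data
    if kind == "structured":
        return ["console"]
    return []
```

For example, `classify_document("report.PDF")` returns `"unstructured"`, and `available_import_methods("sales.xlsx")` returns `["console"]` because the API does not accept structured data.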

Unstructured data

  1. Go to the Data Management page of the console and select the Unstructured Data tab.

  2. In the Category Management section on the left, select the desired category for data import.

    Select Default Category, or click the icon to create a new category. There is no limit on the number of categories.
    You can upload up to 10,000 documents to each workspace.

  3. Click Import Data to go to the Import Data page.

  4. Select Upload Local File or OSS as the Import Method.

    Model Studio does not support OSS buckets of the Archive, Cold Archive, or Deep Cold Archive storage class. Private buckets and buckets with content encryption are supported.
    Before you import data, add the bailian-datahub-access tag to the bucket. For more information, see Import data from OSS.

  5. Configure Document Recognition. Use the default value Intelligent Document Parsing.

    The parser can detect and extract text from images in the document to create text summaries. These summaries, along with the rest of the document content, are segmented and converted into vectors for knowledge base retrieval.

  6. (Optional) Configure tags for the document.

    When you call an application through the API, you can specify tags in the tags request parameter to filter for relevant documents, which improves retrieval efficiency.

  7. Click Confirm to start parsing and importing the documents. This may take some time.

    Document parsing converts the uploaded documents into a format that Model Studio can process. During peak periods, parsing may take longer.

  8. Once parsing and importing are complete, click Details on the right side of the corresponding document to review the imported content.
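Before an OSS import, you can check the restrictions described in steps 2 and 4 locally. The following is a minimal Python sketch under stated assumptions: the storage class strings follow the OSS `StorageClass` values (`Archive`, `ColdArchive`, `DeepColdArchive`), the bucket tags are already available as a dictionary, and only the presence of the `bailian-datahub-access` tag key is checked because this topic does not specify a tag value. The helper names are hypothetical.

```python
MAX_DOCUMENTS_PER_WORKSPACE = 10_000
# Unsupported storage classes from this topic, spelled as OSS StorageClass values (assumption).
UNSUPPORTED_STORAGE_CLASSES = {"Archive", "ColdArchive", "DeepColdArchive"}
REQUIRED_BUCKET_TAG = "bailian-datahub-access"


def check_oss_bucket(storage_class: str, bucket_tags: dict[str, str]) -> list[str]:
    """Return the problems that would block an OSS import; an empty list means none found."""
    problems = []
    if storage_class in UNSUPPORTED_STORAGE_CLASSES:
        problems.append(f"storage class {storage_class} is not supported")
    # Only the tag key is checked: this topic does not state a required tag value.
    if REQUIRED_BUCKET_TAG not in bucket_tags:
        problems.append(f"bucket is missing the {REQUIRED_BUCKET_TAG} tag")
    return problems


def remaining_document_quota(current_count: int) -> int:
    """How many more documents the workspace can accept under the 10,000-document limit."""
    return max(0, MAX_DOCUMENTS_PER_WORKSPACE - current_count)
```

`check_oss_bucket("ColdArchive", {})` reports two problems, while a Standard bucket that carries the tag reports none. Private and content-encrypted buckets are supported, so the sketch does not inspect ACLs or encryption settings.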

Structured data

  1. Go to the Data Management page of the console and select the Structured Data tab.

  2. Create a new data table or select an existing one.

    You can create up to 1,000 data tables in each workspace. Each table can contain up to 10,000 rows, including the table header. An import that exceeds this limit fails, so you may need to split the data in advance.

    Create a new data table

    Click the icon to create a data table.

    1. Enter a name for the data table.

    2. Configure the table structure by using Upload Excel File or Custom Header.

      • Upload Excel File: Model Studio automatically detects the table header in the uploaded Excel file, creates the data table structure accordingly, and imports the remaining content as data records.

      • Custom Header: Specify a Column Name and Type for each column. Description is optional.

        Note
        • Once the data table is created, you cannot modify the Column Name, Description, or Type.

        • Make sure the table schema matches the schema of the data to be imported. For example, if the data to be imported has two columns, the schema here must also have two fields with matching column names. To adjust the fields, click New Columns, or click Delete in the Actions column.

        • When you set Type to link, make sure each link points to a valid, publicly accessible image file. Otherwise, the knowledge base cannot recognize the image.

          Example link format: https://example.com/downloads/pic.jpg

          When you create a knowledge base, Model Studio uses link-type fields to generate an image index: it accesses each image, extracts its features, and stores them as vectors after image embedding. These vectors are used for similarity comparison during knowledge base retrieval.

    3. Upload your document.

      1. Select and upload an Excel document in xlsx or xls format.

        The table header in the document must match the header of the current data table. Otherwise, the import fails.

      2. Click Preview to review the imported data.

    4. Click Confirm. The new data table appears in the Table Management pane on the left.
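The schema rules in the notes above (matching column names, and valid image URLs for link columns) can be pre-checked before you upload a file. This minimal Python sketch assumes that column order matters and that a link is acceptable if it is an http or https URL ending in one of the image extensions listed for unstructured data; `Column`, `header_problems`, and `looks_like_image_link` are hypothetical names. The link check inspects only the shape of the URL and cannot confirm that the image is publicly accessible.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Image extensions taken from the formats listed for unstructured data (assumption).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".gif")


@dataclass(frozen=True)
class Column:
    name: str                          # Column Name (required)
    type: str                          # Type (required), for example "link"
    description: Optional[str] = None  # Description (optional)


def header_problems(schema: list[Column], header: list[str]) -> list[str]:
    """Compare a file's header row with the table schema, assuming order matters."""
    expected = [col.name for col in schema]
    if len(header) != len(expected):
        return [f"expected {len(expected)} columns, got {len(header)}"]
    return [
        f"column {i + 1}: expected {want!r}, got {got!r}"
        for i, (want, got) in enumerate(zip(expected, header))
        if want != got
    ]


def looks_like_image_link(value: str) -> bool:
    """Shape check only: an http(s) URL whose path ends in an image extension.

    Public accessibility can only be confirmed with a real request.
    """
    parsed = urlparse(value)
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.netloc)
        and parsed.path.lower().endswith(IMAGE_EXTENSIONS)
    )
```

For a schema with the columns `product` and `photo`, a header of `["product", "image"]` yields one problem for column 2, and `looks_like_image_link("https://example.com/downloads/pic.jpg")` returns `True`.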

    Select an existing data table

    Select an existing table from the Table Management pane on the left and click Import Data.

    1. Select Upload and Overwrite or Incremental Upload as the Import Type.

      Click Download Template to download a blank document that contains only the table header. You can then add your data to the template and upload it.

    2. Select and upload an Excel document in xlsx or xls format.

      The table header in the document must match the header of the current data table. Otherwise, the import fails.

    3. Click Preview to review the imported data.
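Because a table holds at most 10,000 rows including the header, large datasets have to be split before import, and an incremental upload presumably counts against the same limit. The following minimal Python sketch models both points on plain lists of rows. It assumes that Upload and Overwrite replaces all existing data rows and that Incremental Upload appends rows without deduplication; the function names and the `"overwrite"`/`"incremental"` labels are hypothetical.

```python
MAX_ROWS_INCLUDING_HEADER = 10_000
MAX_DATA_ROWS = MAX_ROWS_INCLUDING_HEADER - 1  # one row is taken by the header

Row = list[str]


def split_for_import(header: Row, rows: list[Row],
                     max_rows: int = MAX_ROWS_INCLUDING_HEADER) -> list[list[Row]]:
    """Split data rows into chunks that each repeat the header and fit within max_rows."""
    if max_rows < 2:
        raise ValueError("max_rows must leave room for the header and one data row")
    per_chunk = max_rows - 1
    return [[header] + rows[start:start + per_chunk]
            for start in range(0, len(rows), per_chunk)]


def apply_import(existing: list[Row], incoming: list[Row], import_type: str) -> list[Row]:
    """Return the table's data rows (header excluded) after an import.

    "overwrite": incoming rows replace the existing rows (assumed semantics).
    "incremental": incoming rows are appended, no deduplication (assumed semantics).
    """
    if import_type == "overwrite":
        result = list(incoming)
    elif import_type == "incremental":
        result = existing + incoming
    else:
        raise ValueError(f"unknown import type: {import_type!r}")
    if len(result) > MAX_DATA_ROWS:
        raise ValueError(f"{len(result)} data rows exceed the limit of {MAX_DATA_ROWS}")
    return result
```

With 25,000 data rows, `split_for_import` produces three chunks of at most 10,000 rows each, header included; each chunk could then be imported into its own data table.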

What to do next

After importing data to Model Studio, you can use the data to build a knowledge index. For more information, see Knowledge index.
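Both parsed document content and link-type image columns are converted into vectors that are compared by similarity during knowledge base retrieval. This topic does not say which similarity measure Model Studio uses; the minimal Python sketch below assumes cosine similarity purely for illustration, and `cosine_similarity` and `top_k` are hypothetical helpers, not Model Studio APIs.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(index, key=lambda doc_id: cosine_similarity(query, index[doc_id]),
                    reverse=True)
    return ranked[:k]
```

Against the stored vectors `{"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}`, the query `[1.0, 0.1]` ranks `a` first and `c` second.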