Data Upload in DataWorks allows you to upload local files, Data Analysis spreadsheets, Object Storage Service (OSS) files, and HTTP files to engines such as MaxCompute, EMR Hive, Hologres, and StarRocks. This topic describes how to use the Data Upload feature.
Precautions
If your operations involve cross-border data transfers (such as transferring data out of the Chinese mainland), read the Cross-border compliance statement in this topic beforehand. Failure to comply may result in upload failures and legal liability.
Ensure that table headers are in English before you upload data. Chinese headers may cause parsing failures and upload errors.
Limitations
Resource group limits: The Data Upload feature requires both a resource group for scheduling and a resource group for Data Integration.
Only serverless resource groups (recommended), exclusive resource groups for scheduling, and exclusive resource groups for Data Integration are supported. You must configure the scheduling resource group and the Data Integration resource group for the corresponding engine in System Management.
The selected resource group must be bound to the DataWorks workspace where the destination table resides. Ensure that the data source is connected to the resource group.
Note: To configure engine resource groups in Data Analysis, see System administration.
For information about how to configure network connectivity between the data source and the resource group, see Network connectivity solutions.
For information about how to configure the workspace to which the exclusive resource group is bound, see Use an exclusive resource group for scheduling and Use exclusive resource groups for Data Integration.
Table limits:
You can only upload data to tables that you own (where you are the table owner).
You can upload data only to internal tables, or to tables in the default catalog (for StarRocks).
Billing
Data upload incurs the following fees:
Data transmission fees.
Computing and storage fees (if a new table is created).
These fees are charged by the engine. For details, see the billing documentation of the corresponding engine: MaxCompute Billing, Hologres Billing, E-MapReduce Billing, and EMR Serverless StarRocks Billing.
Go to the Data Upload page
Go to the Upload and Download page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Analysis > Upload and Download. On the page that appears, click Go to Data Upload and Download.
Click the icon in the left-side navigation pane to go to the Upload Data page. Click Upload Data and follow the on-screen instructions to upload the data.
Select data to upload
You can upload local files, spreadsheets, Object Storage Service (OSS) files, and HTTP files. Select a data source based on your business requirements.
When uploading files, specify whether to ignore dirty data.
Yes: The platform automatically ignores dirty data and continues the upload.
No: The platform does not ignore dirty data. The upload process terminates if dirty data is encountered.
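The two options behave like a row filter applied during parsing. A minimal sketch in Python, assuming dirty data means a row whose values fail type conversion or whose column count does not match the schema (the platform's exact definition may differ):

```python
def load_rows(rows, schema, ignore_dirty):
    """Apply a destination schema to raw rows.

    rows         -- list of string tuples parsed from the file
    schema       -- list of converter callables, one per column
    ignore_dirty -- True: skip bad rows; False: abort on the first bad row
    """
    clean = []
    for row in rows:
        try:
            if len(row) != len(schema):
                raise ValueError("column count mismatch")
            clean.append(tuple(conv(v) for conv, v in zip(schema, row)))
        except ValueError:
            if not ignore_dirty:
                raise  # the upload process terminates on dirty data
            # dirty row is silently skipped and the upload continues
    return clean

rows = [("1", "a"), ("x", "b"), ("3", "c")]
schema = [int, str]
print(load_rows(rows, schema, ignore_dirty=True))  # row ('x', 'b') is skipped
```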
Local file
Select this method to upload local files.
Data Source: Select Local File.
Specify Data to Be Uploaded: Drag the local file to the Select File area.
Note: Supported formats: CSV, XLS, XLSX, and JSON. The maximum size of a CSV file is 5 GB. The maximum size of other files is 100 MB.
By default, only the first sheet of a file is uploaded. To upload multiple sheets, save each sheet as a separate file.
SQL files are not supported.
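The format and size limits above can be checked locally before an upload attempt. A sketch of such a pre-check; the helper is hypothetical, and the limits are taken from this topic (decimal units are used here for simplicity):

```python
import os

ALLOWED = {".csv", ".xls", ".xlsx", ".json"}
LIMITS = {".csv": 5 * 10**9}     # 5 GB for CSV files
DEFAULT_LIMIT = 100 * 10**6      # 100 MB for other formats

def check_upload(path, size=None):
    """Return (ok, reason) for a candidate local file."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".sql":
        return False, "SQL files are not supported"
    if ext not in ALLOWED:
        return False, "unsupported format: " + ext
    if size is None:
        size = os.path.getsize(path)
    if size > LIMITS.get(ext, DEFAULT_LIMIT):
        return False, "file exceeds the size limit"
    return True, "ok"

print(check_upload("sales.csv", size=4 * 10**9))  # (True, 'ok')
print(check_upload("dump.sql", size=10))
```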
Spreadsheet
Select this method if the data to be uploaded is a DataWorks Data Analysis Spreadsheet.
Data Source: Select Workbook.
Specify Data to Be Uploaded:
Select the spreadsheet to upload from the drop-down list next to Select File.
If the spreadsheet does not exist, click Create. Alternatively, go to the Data Analysis module to Create a spreadsheet and Import Data.
OSS
Select this method if the data to be uploaded is stored in OSS.
Prerequisites:
You have created an OSS bucket and stored the data in it.
Grant the account that performs the upload access to the destination bucket before uploading. For details, see Overview of permissions and access control.
Procedure:
Data Source: Select Object Storage OSS.
Specify Data to Be Uploaded:
In the Select Bucket drop-down list, select the bucket where the data is stored.
Note: Data can be uploaded only from buckets that are located in the same region as the current DataWorks workspace.
In the Select File area, select the file to upload.
Note: Supported formats: CSV, XLS, XLSX, and JSON.
HTTP file
Select this method if the data to be uploaded is an HTTP file.
Data Source: Select HTTP File.
Specify Data to Be Uploaded:
| Parameter | Description |
| --- | --- |
| File URL | Enter the URL of the file. Note: HTTP and HTTPS file addresses are supported. |
| File Type | The system automatically identifies the file type. Supported formats: CSV, XLS, and XLSX. The maximum size of a CSV file is 5 GB. The maximum size of other files is 50 MB. |
| Request Method | Supported methods: GET, POST, and PUT. The GET method is recommended. Select the method that your file server supports. |
| Advanced Parameters | You can configure Request Header and Request Body information in Advanced Parameters based on your business requirements. |
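Conceptually, the HTTP file source issues a request like the one sketched below. This snippet only builds the request with Python's `urllib` without sending it; the URL and header values are placeholders, not values from this topic:

```python
import urllib.request

# Hypothetical file URL and request header; replace with your own values.
req = urllib.request.Request(
    "https://example.com/data/sales.csv",
    headers={"Accept": "text/csv"},  # corresponds to Request Header
    method="GET",                    # GET is the recommended method
)

print(req.get_method())              # GET
print(req.get_header("Accept"))      # text/csv
# To actually download the file: urllib.request.urlopen(req).read()
```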
Set the destination table
In the Configure Destination Table area, select the Target Engine for data upload and configure the related parameters.
Ensure that you select the correct data source environment (PROD or DEV). Selecting the wrong environment uploads data to the wrong destination.
MaxCompute
To upload data to an internal table in MaxCompute, configure the parameters described in the following table.
| Parameter | Description |
| --- | --- |
| MaxCompute Project Name | Select a MaxCompute data source in the current region. If the required data source is not found, associate a MaxCompute compute resource in the current workspace to generate a data source with the same name. |
| Destination Table | Select Existing Table or Create Table. |
| Select Destination Table | The table that stores the uploaded data. Keyword search is supported. Note: You can upload data only to tables that you own (where you are the table owner). For details, see Limitations. |
| Upload Mode | Select how the data is written to the destination table. |
| Table Name | Enter a name for the new table. Note: DataWorks uses the MaxCompute account configured in the compute resource to create the table. |
| Table Type | Select Non-partitioned Table or Partitioned Table as needed. If you select Partitioned Table, specify the partition fields and their values. |
| Lifecycle | Specify the lifecycle of the table. The table is recycled after the specified period. For more information, see Lifecycle and Lifecycle management operations. |
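When Create Table is selected, the table type, partition fields, and lifecycle map onto standard MaxCompute DDL. A sketch that assembles such a statement; the helper, table name, and columns are illustrative, not the console's actual implementation:

```python
def build_ddl(table, columns, partition=None, lifecycle=None):
    """Assemble a MaxCompute CREATE TABLE statement.

    columns/partition -- lists of (name, type) pairs
    lifecycle         -- days before the table is recycled
    """
    cols = ", ".join(name + " " + typ for name, typ in columns)
    ddl = "CREATE TABLE IF NOT EXISTS " + table + " (" + cols + ")"
    if partition:  # Partitioned Table: add the partition fields
        parts = ", ".join(name + " " + typ for name, typ in partition)
        ddl += " PARTITIONED BY (" + parts + ")"
    if lifecycle:  # Lifecycle: recycle the table after this many days
        ddl += " LIFECYCLE " + str(lifecycle)
    return ddl + ";"

print(build_ddl("sales", [("id", "BIGINT"), ("amount", "DOUBLE")],
                partition=[("ds", "STRING")], lifecycle=30))
```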
EMR Hive
To upload data to an internal table in EMR Hive, configure the parameters described in the following table.
| Parameter | Description |
| --- | --- |
| Data Source | Select the EMR Hive data source (Alibaba Cloud instance mode) that is bound to the workspace in the current region. |
| Destination Table | Data can be uploaded only to an Existing Table. |
| Select Destination Table | The table that stores the uploaded data. Keyword search is supported. |
| Upload Mode | Select how the uploaded data is added to the destination table. |
Hologres
To upload data to an internal table in Hologres, configure the parameters described in the following table.
| Parameter | Description |
| --- | --- |
| Data Source | Select the Hologres data source bound to the workspace in the current region. If the required data source is not found, associate a Hologres compute resource in the current workspace to generate a data source with the same name. |
| Destination Table | Data can be uploaded only to an Existing Table. |
| Select Destination Table | The table that stores the uploaded data. Keyword search is supported. |
| Upload Mode | Select how the uploaded data is added to the destination table. |
| Primary Key Conflict Strategy | If the uploaded data causes a primary key conflict in the destination table, select a strategy to handle the conflict. |
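The primary-key conflict strategies behave like an upsert policy. A sketch that models the destination table as a dict keyed by primary key; the two strategies shown ("ignore" and "update") are assumed for illustration, and the actual strategy names in the console may differ:

```python
def upload(table, rows, key, strategy):
    """Write rows into table (a dict keyed by primary key).

    strategy -- "ignore": keep the existing row on a conflict
                "update": replace the existing row on a conflict
    """
    for row in rows:
        pk = row[key]
        if pk in table and strategy == "ignore":
            continue      # conflicting row is discarded
        table[pk] = row   # insert a new row or overwrite the old one
    return table

dest = {1: {"id": 1, "name": "old"}}
new_rows = [{"id": 1, "name": "new"}, {"id": 2, "name": "b"}]
print(upload(dict(dest), new_rows, "id", "ignore"))
print(upload(dict(dest), new_rows, "id", "update"))
```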
StarRocks
To upload data to a table in the default catalog of StarRocks, configure the parameters described in the following table.
| Parameter | Description |
| --- | --- |
| Data Source | Select the StarRocks data source bound to the workspace in the current region. |
| Destination Table | Data can be uploaded only to an Existing Table. |
| Select Destination Table | The table that stores the uploaded data. Keyword search is supported. |
| Upload Mode | Select how the uploaded data is added to the destination table. |
| Advanced Parameters | You can configure Stream Load request parameters. |
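StarRocks ingests data through its Stream Load HTTP interface, and the Advanced Parameters correspond to Stream Load request headers. A sketch that builds (but does not send) such a request; the frontend endpoint, database, table, and header values are placeholders, so check the StarRocks Stream Load documentation for the parameters your version supports:

```python
import urllib.request

# Hypothetical FE endpoint, database, and table.
url = "http://fe-host:8030/api/demo_db/demo_table/_stream_load"
headers = {
    "format": "CSV",            # input format of the request body
    "column_separator": ",",    # field delimiter in the CSV payload
    "max_filter_ratio": "0.1",  # tolerated ratio of filtered (dirty) rows
}
req = urllib.request.Request(url, data=b"1,a\n2,b\n",
                             headers=headers, method="PUT")

print(req.get_method())  # PUT
# urllib capitalizes the first letter of stored header names:
print(req.get_header("Column_separator"))
```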
Preview file data to upload
After setting the destination table, you can adjust the file encoding and data mapping based on the data preview.
The preview displays only the first 20 records.
File Encoding Format: If characters appear garbled, change the encoding format. Supported formats: UTF-8, GB18030, Big5, UTF-16LE, and UTF-16BE.
Preview data and set destination table fields:
Upload data to an existing table: Map file columns to destination table fields. Field mapping must be completed before uploading. Mapping methods include Mapping by Column Name and Mapping by Order. You can also customize the field names in the destination table after mapping.
Note: Unmapped source data is grayed out and will not be uploaded.
Duplicate mapping relationships are not allowed.
Field names and field types cannot be empty.
Upload data to a new table: You can use Intelligent Field Generation to automatically fill in field information or manually modify the information.
Note: Field names and field types cannot be empty.
EMR Hive, Hologres, and StarRocks engines do not support creating new tables during data upload.
Ignore First Row: Specify whether to upload the first row of the file data (usually column names) to the destination table.
Selected: Select this option if the first row contains column headers. The first row will be excluded from the upload.
Cleared: Clear this option if the first row contains actual data. The first row will be included in the upload.
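The Ignore First Row switch and the two mapping methods can be sketched together: the first row supplies column names, and fields are matched either by name or by position. The behavior below is assumed for illustration and simplifies what the console does:

```python
def map_columns(file_rows, dest_fields, by_name=True, ignore_first_row=True):
    """Map file columns to destination table fields.

    by_name          -- Mapping by Column Name (else Mapping by Order)
    ignore_first_row -- treat the first row as headers, not data
    """
    if ignore_first_row:
        header, data = list(file_rows[0]), file_rows[1:]
    else:
        header, data = None, file_rows
    if by_name and header is not None:
        # Mapping by Column Name: match file headers to destination fields
        idx = [header.index(f) for f in dest_fields]
    else:
        # Mapping by Order: i-th file column -> i-th destination field
        idx = list(range(len(dest_fields)))
    return [{f: row[i] for f, i in zip(dest_fields, idx)} for row in data]

rows = [("name", "id"), ("a", "1"), ("b", "2")]
print(map_columns(rows, ["id", "name"]))
# [{'id': '1', 'name': 'a'}, {'id': '2', 'name': 'b'}]
```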
Upload data
After previewing the data, click Upload Data in the lower-left corner to upload the data.
Subsequent operations
After the upload is complete, click the icon in the left-side navigation pane to return to the Data Upload page. Locate the upload task and perform the following operations:
Continue upload: Click Continue Upload in the Actions column to upload data again.
Data query: Click Query Data in the Actions column to query and analyze the data.
View upload data details: Click the Table Name to go to Data Map and view details about the destination table. For details, see Metadata retrieval.
Cross-border compliance statement
If your operations involve cross-border data transfers (for example, transferring data from the Chinese mainland to an overseas region, or between different countries/regions), read the relevant compliance statement beforehand. Failure to comply may result in upload failures and legal liability.
Cross-border data operations involve transferring your cloud business data to the region you selected or the product deployment region. Ensure that your operations comply with the following requirements:
You have the authority to process the relevant cloud business data.
You have adopted sufficient data security protection technologies and strategies.
The data transmission complies with relevant laws and regulations. For example, the transferred data does not contain any content restricted or prohibited from transmission or disclosure by applicable laws.
Alibaba Cloud reminds you to consult with professional legal or compliance advisors if your data upload involves cross-border transmission. Ensure that the cross-border data transmission complies with the requirements of applicable laws, regulations, and regulatory policies (such as obtaining valid authorization from personal information subjects, signing and filing relevant contract clauses, and completing security assessments).
If you fail to comply with this compliance statement when performing cross-border data operations, you shall bear the corresponding legal consequences. You shall also be liable for any losses suffered by Alibaba Cloud and its affiliates.
References
DataStudio also supports uploading local CSV files or text files to MaxCompute tables. For details, see Import data to a MaxCompute table.
For more information about operations on MaxCompute tables, see Create and use MaxCompute tables.
For more information about operations on Hologres tables, see Create a Hologres table.
For more information about operations on EMR tables, see Create an EMR table.
FAQ
Resource group configuration issues.
Error message: A resource group is required for the current file source or target engine. Contact the workspace administrator to configure the resource group.
Solution: Configure the resource group used by the engine via Data Analysis. For details, see System administration.
Resource group binding issues.
Error message: The resource group configured for global data upload is not bound to the workspace containing the destination table. Contact the workspace administrator to bind the resource group.
Solution: Bind the resource group configured in System Management to the workspace.