This topic describes how to create an E-MapReduce (EMR) table.
Background information
After you add an EMR cluster to DataWorks as a data source, the Data Map service of DataWorks creates a crawler to collect the metadata of the cluster. If no database is available after you add the EMR data source, go to the Data Map page and run the crawler to collect the metadata of the cluster. For more information, see Collect metadata from an EMR data source.
Procedure
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, choose in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
On the DataStudio page, move the pointer over the icon and choose .
Alternatively, find the workflow in which you want to create the EMR table, right-click EMR, and then select Create Table.
In the Create Table dialog box, configure the parameters.
Click Create. The configuration tab of the table appears.
In the Basic attributes section, configure the parameters. The following table describes the parameters.
Level 1 theme: The name of the level-1 folder in which the table resides.
Note: The level-1 and level-2 folders show the table locations in DataWorks to help you easily manage tables.
Level 2 theme: The name of the level-2 folder in which the table resides.
Create a theme: Click Create a theme to go to the Folder Management tab. On the Folder Management tab, you can create level-1 and level-2 folders.
Refresh: After you create a folder, click Refresh.
Description: The description of the table.
In the Physical model design section, configure the parameters. The following table describes the parameters.
Layer and Physical classification: Select a level and a business category from the drop-down lists based on your business requirements. To create levels and business categories, click Create Level to go to the Level Management tab. Only the workspace administrator can perform this operation. After you create levels and business categories, click Refresh.
Partition type: Valid values are Partition table and Non-partitioned table.
Table type: Valid values are Internal tables and External tables.
Select the storage format: Select a storage format for the files in the table based on your business requirements.
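If you are more familiar with DDL, the following Python sketch shows how the Physical model design settings might map to the clauses of a Hive-style CREATE TABLE statement, together with the fields and partitions that you configure in the Table structure design section. This is an illustration only. It assumes that the EMR table is a Hive metastore table, and the build_create_table function, the PhysicalModel class, the location field, and the set of storage formats are hypothetical. This topic does not state which statement DataWorks actually issues to the compute engine.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: formats accepted by Hive's STORED AS clause.
# The DataWorks drop-down list may offer a different set.
HIVE_STORAGE_FORMATS = {"TEXTFILE", "SEQUENCEFILE", "RCFILE", "ORC", "PARQUET", "AVRO"}


@dataclass
class PhysicalModel:
    partitioned: bool               # Partition type: Partition table (True) or Non-partitioned table (False)
    external: bool                  # Table type: External tables (True) or Internal tables (False)
    storage_format: str             # Select the storage format
    location: Optional[str] = None  # Hypothetical: data path for an external table


def build_create_table(name: str,
                       columns: List[Tuple[str, str]],
                       partitions: List[Tuple[str, str]],
                       model: PhysicalModel) -> str:
    """Render a hypothetical Hive-style CREATE TABLE statement."""
    fmt = model.storage_format.upper()
    if fmt not in HIVE_STORAGE_FORMATS:
        raise ValueError(f"unsupported storage format: {model.storage_format}")
    # A partitioned table needs partition columns; a non-partitioned table must have none.
    if model.partitioned != bool(partitions):
        raise ValueError("partition columns must be set if and only if the table is partitioned")
    kind = "EXTERNAL TABLE" if model.external else "TABLE"
    cols = ",\n  ".join(f"{col} {typ}" for col, typ in columns)
    ddl = f"CREATE {kind} {name} (\n  {cols}\n)"
    if partitions:
        # In Hive, partition columns appear only in PARTITIONED BY, not in the column list.
        parts = ", ".join(f"{col} {typ}" for col, typ in partitions)
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += f"\nSTORED AS {fmt}"
    if model.external and model.location:
        ddl += f"\nLOCATION '{model.location}'"
    return ddl + ";"


if __name__ == "__main__":
    print(build_create_table(
        "ods_orders",  # hypothetical table name
        [("order_id", "BIGINT"), ("amount", "DECIMAL(10,2)")],
        [("dt", "STRING")],
        PhysicalModel(partitioned=True, external=False, storage_format="orc"),
    ))
```

In this example, the call returns a statement that contains PARTITIONED BY (dt STRING) and STORED AS ORC. The sketch keeps partition columns separate from regular columns because Hive declares partition columns only in the PARTITIONED BY clause.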
In the Table structure design section, configure the parameters. The following table describes the parameters.
Add fields: To add a field, click Add fields, configure the field information, and then click Save.
Move up and Move down: Click these buttons to change the order of fields in the table. To change the field order of an existing table, you must delete the table and create another table that has the same name. You cannot change the field order of an existing table in the production environment.
Field name: The name of the field. The name can contain letters, digits, and underscores (_).
Field type: The data type of the field. EMR supports the following data types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL, VARCHAR, CHAR, STRING, BINARY, DATETIME, DATE, TIMESTAMP, BOOLEAN, ARRAY, MAP, and STRUCT.
Length/Settings: The length limit of the field. You must configure this parameter if the data type of the field requires a length limit.
Description: The description of the field.
Primary key: Specifies whether the field serves as the primary key. The primary key is a business concept that ensures the uniqueness of records for your business. DataWorks does not impose any restrictions on primary keys.
Edit: Click this button to edit a field, and then click Save.
Delete: Click this button to delete a field.
Note: To delete a field from an existing table and then commit the table, you must delete the table and create another table that has the same name. You cannot perform this operation in the production environment.
Add: If you set the Partition type parameter to Partition table in the Physical model design section, you must configure a partition for the table. Click this button to add a partition to the current table. To add a partition to an existing table and then commit the table, you must delete the table and create another table that has the same name. You cannot add a partition to an existing table in the production environment.
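The following Python sketch checks a single field definition against the rules described for the Field name, Field type, and Length/Settings parameters: the name may contain only letters, digits, and underscores, the type must be one of the listed data types, and a length must be supplied when the type requires one. The validate_field function is hypothetical. This topic does not list which types require a length, so the sketch assumes, based on Hive conventions, that VARCHAR and CHAR do.

```python
import re
from typing import List, Optional

# Data types listed for the Field type parameter in this topic.
EMR_FIELD_TYPES = {
    "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL",
    "VARCHAR", "CHAR", "STRING", "BINARY", "DATETIME", "DATE", "TIMESTAMP",
    "BOOLEAN", "ARRAY", "MAP", "STRUCT",
}
# Assumption based on Hive conventions: these types require a length.
TYPES_REQUIRING_LENGTH = {"VARCHAR", "CHAR"}
# Field names can contain letters, digits, and underscores (_).
FIELD_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def validate_field(name: str, field_type: str, length: Optional[str] = None) -> List[str]:
    """Return the problems found in a field definition; an empty list means it passes."""
    problems = []
    if not FIELD_NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid field name {name!r}: use letters, digits, and underscores only")
    upper_type = field_type.upper()
    if upper_type not in EMR_FIELD_TYPES:
        problems.append(f"unsupported field type {field_type!r}")
    elif upper_type in TYPES_REQUIRING_LENGTH and not length:
        problems.append(f"{upper_type} requires a Length/Settings value")
    return problems


if __name__ == "__main__":
    print(validate_field("order_id", "BIGINT"))     # []
    print(validate_field("order-id", "BIGINT"))     # one problem: invalid field name
    print(validate_field("city", "VARCHAR"))        # ['VARCHAR requires a Length/Settings value']
    print(validate_field("city", "VARCHAR", "64"))  # []
```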
Click the icon in the top toolbar to commit the EMR table to the production environment.
If you use a workspace in standard mode, commit the table to the development environment first and then to the production environment.
Note: You must select a resource group for scheduling when you commit the table. If you use a serverless resource group to commit the table, DataWorks issues a table creation task to the compute engine and displays the run logs, which you can use to troubleshoot any error that occurs during the commit. If no serverless resource group is available, purchase and configure one. For more information, see Create and use a serverless resource group.
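Several parameters in the Table structure design section state that when you reorder fields, delete a field, or add a partition in an existing table, you must delete the table and create another table that has the same name instead of committing the change directly. The following Python sketch encodes those three rules so that you can check a planned change before you commit it. The requires_recreate function is hypothetical. It also treats appending a new field as a change that does not require re-creation, which is an assumption: this topic does not say whether that is the case.

```python
from typing import List


def requires_recreate(old_fields: List[str], new_fields: List[str],
                      old_partitions: List[str], new_partitions: List[str]) -> bool:
    """Return True if the planned change must be applied by deleting the table
    and creating another table that has the same name."""
    # Rule 1: a field was deleted from the existing table.
    deleted = [f for f in old_fields if f not in new_fields]
    # Rule 2: the order of fields that exist in both versions changed.
    kept_old_order = [f for f in old_fields if f in new_fields]
    kept_new_order = [f for f in new_fields if f in old_fields]
    reordered = kept_old_order != kept_new_order
    # Rule 3: a partition was added to the existing table.
    added_partitions = [p for p in new_partitions if p not in old_partitions]
    # Assumption (not stated in this topic): appending a new field is allowed.
    return bool(deleted) or reordered or bool(added_partitions)


if __name__ == "__main__":
    print(requires_recreate(["id", "name"], ["id", "name", "city"], ["dt"], ["dt"]))  # False
    print(requires_recreate(["id", "name"], ["name", "id"], [], []))                  # True
    print(requires_recreate(["id", "name"], ["id"], [], []))                          # True
    print(requires_recreate(["id"], ["id"], ["dt"], ["dt", "hh"]))                     # True
```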