All Products
Search
Document Center

MaxCompute:Create tables

Last Updated:Oct 28, 2024

Table is the unit for storing data in MaxCompute. You must process data in MaxCompute based on tables. This topic describes how to create tables in MaxCompute.

Prerequisites

Make sure that the following requirements are met:

Usage notes

The data types of the columns in the table that you want to create must be consistent with the data types of the columns in the data files. If they are inconsistent, data cannot be imported to the table.

Step 1: Design tables

Determine the table type, table structure, and data type of each column based on the content of the data that you want to import and the expected result data.

The sample files used in this topic contain the following data of mortgagors:

-- Data that is used to create a non-partitioned table. 
44,blue-collar,married,basic.4y,unknown,yes,no,cellular,aug,thu,210,1,999,0,nonexistent,1.4,93.444,-36.1,4.963,5228.1,0
53,technician,married,unknown,no,no,no,cellular,nov,fri,138,1,999,0,nonexistent,-0.1,93.2,-42,4.021,5195.8,0
28,management,single,university.degree,no,yes,no,cellular,jun,thu,339,3,6,2,success,-1.7,94.055,-39.8,0.729,4991.6,1
-- Data that is used to create a partitioned table. 
53,technician,married,unknown,no,no,cellular,nov,fri,138,1,999,0,nonexistent,-0.1,93.2,-42,4.021,5195.8,0
28,management,single,university.degree,yes,no,cellular,jun,thu,339,3,6,2,success,-1.7,94.055,-39.8,0.729,4991.6,1
39,services,married,high.school,no,no,cellular,apr,fri,185,2,999,0,nonexistent,-1.8,93.075,-47.1,1.405,5099.1,0

You can count the number of columns, identify the data type of each column, and name each column based on the meaning of each column of data. The following table describes the information that you can obtain from the sample files.

Item

Description

Number of columns

Count the number of columns in each sample file and determine the number of columns in the table that you want to create:

  • The sample file that is used to create a non-partitioned table contains 21 columns. You can create a non-partitioned table that contains 21 columns based on the sample file.

    Note

    If the number of columns in the table exceeds the number of columns in the sample file, the additional columns in the table are filled with NULL. If the number of columns in the table is less than the number of columns in the sample file, the excessive data in the sample file is discarded.

  • Each of the sample files that are used to create a partitioned table contains 20 columns. You can create a partitioned table that contains 21 columns with one column as the partition key column based on the sample files. You must define partitions when you create the table.

Fields of the table

Identify the fields in each sample file and determine the fields in the table that you want to create:

  • Fields of the non-partitioned table:

    • age: age

    • job: job type

    • marital: marital status

    • education: educational level

    • credit: have a credit card

    • housing: mortgagors

    • loan: have loans

    • contact: contact information

    • month: month

    • day_of_week: day of the week

    • duration: duration

    • campaign: number of contacts for the current campaign

    • pdays: time elapsed after the last contact

    • previous: number of previous contacts

    • poutcome: results of previous marketing activities

    • emp_var_rate: employment variation rate

    • cons_price_idx: consumer price index

    • cons_conf_idx: consumer confidence index

    • euribor 3m: 3-month Euro Interbank Offered Rate (Euribor)

    • nr_employed: number of employees

    • fixed_deposit: have a time deposit

  • Fields of the partitioned table:

    • age: age

    • job: job type

    • marital: marital status

    • education: educational level

    • housing: mortgagors

    • loan: have loans

    • contact: contact information

    • month: month

    • day_of_week: day of the week

    • duration: duration

    • campaign: number of contacts for the current campaign

    • pdays: time elapsed after the last contact

    • previous: number of previous contacts

    • poutcome: results of previous marketing activities

    • emp_var_rate: employment variation rate

    • cons_price_idx: consumer price index

    • cons_conf_idx: consumer confidence index

    • euribor 3m: 3-month Euribor

    • nr_employed: number of employees

    • fixed_deposit: have a time deposit

    • credit: have a credit card (used as the partition key column)

Data types

Identify the data type of each column in the sample files:

  • age: BIGINT

  • job: STRING

  • marital: STRING

  • education: STRING

  • credit: STRING

  • housing: STRING

  • loan: STRING

  • contact: STRING

  • month: STRING

  • day_of_week: STRING

  • duration: STRING

  • campaign: BIGINT

  • pdays: DOUBLE

  • previous: DOUBLE

  • poutcome: STRING

  • emp_var_rate: DOUBLE

  • cons_price_idx: DOUBLE

  • cons_conf_idx: DOUBLE

  • euribor3m: DOUBLE

  • nr_employed: DOUBLE

  • fixed_deposit: BIGINT

Step 2: Create tables

Create a non-partitioned table named bank_data and a partitioned table named bank_data_pt based on the sample files to store business data. Create non-partitioned tables named result_table1 and result_table2 to store result data. For more information about operations on tables and partitions, see Table operations and Partition and column operations. Perform the following operations to create the tables:

  1. Optional:On the MaxCompute client, run the following command to switch to the MaxCompute project that you want to use.

    If the MaxCompute project is opened when you start the MaxCompute client, ignore this step.

    use doc_test_dev;

    If the command is successfully run, the following information is returned:

    doc_test_dev.default>
  2. On the MaxCompute client, create the bank_data, bank_data_pt, result_table1, and result_table2 tables.

    For more information about the CREATE TABLE syntax, see Create a table.

    • Create the non-partitioned table bank_data. Sample statement:

      create table if not exists bank_data
      (
       age             BIGINT comment 'age',
       job             STRING comment 'job type',
       marital         STRING comment 'marital status',
       education       STRING comment 'education level',
       credit          STRING comment 'have a credit card',
       housing         STRING comment 'mortgagors',
       loan            STRING comment 'have loans',
       contact         STRING comment 'contact information',
       month           STRING comment 'month',
       day_of_week     STRING comment 'day of the week',
       duration        STRING comment 'duration',
       campaign        BIGINT comment 'number of contacts for the current campaign',
       pdays           DOUBLE comment 'time elapsed after the last contact',
       previous        DOUBLE comment 'number of previous contacts',
       poutcome        STRING comment 'results of previous marketing activities',
       emp_var_rate    DOUBLE comment 'employment variance rate',
       cons_price_idx  DOUBLE comment 'consumer price index',
       cons_conf_idx   DOUBLE comment 'consumer confidence index',
       euribor3m       DOUBLE comment '3-month Euribor',
       nr_employed     DOUBLE comment 'number of employees',
       fixed_deposit   BIGINT comment 'have a time deposit'
      );
    • Create the partitioned table bank_data_pt and add partitions to the table. Sample statement:

      create table if not exists bank_data_pt
      (
       age             BIGINT comment 'age',
       job             STRING comment 'job type',
       marital         STRING comment 'marital status',
       education       STRING comment 'education level',
       housing         STRING comment 'mortgagors',
       loan            STRING comment 'have loans',
       contact         STRING comment 'contact information',
       month           STRING comment 'month',
       day_of_week     STRING comment 'day of the week',
       duration        STRING comment 'duration',
       campaign        BIGINT comment 'number of contacts for the current campaign',
       pdays           DOUBLE comment 'time elapsed after the last contact',
       previous        DOUBLE comment 'number of previous contacts',
       poutcome        STRING comment 'results of previous marketing activities',
       emp_var_rate    DOUBLE comment 'employment variance rate',
       cons_price_idx  DOUBLE comment 'consumer price index',
       cons_conf_idx   DOUBLE comment 'consumer confidence index',
       euribor3m       DOUBLE comment '3-month Euribor',
       nr_employed     DOUBLE comment 'number of employees',
       fixed_deposit   BIGINT comment 'have a time deposit'
      )partitioned by (credit STRING comment 'have a credit card');
      
      alter table bank_data_pt add if not exists partition (credit='yes') partition (credit='no') partition (credit='unknown');
    • Create the non-partitioned table result_table1. Sample statement:

      create table if not exists result_table1
      (
       education   STRING comment 'education level',
       num         BIGINT comment 'number of persons'
      );
    • Create the non-partitioned table result_table2. Sample statement:

      create table if not exists result_table2
      (
       education   STRING comment 'education level',
       num         BIGINT comment 'number of persons',
       credit      STRING comment 'have a credit card'
      );

Step 3: Check the table creation results

Perform the following operations to check whether the tables are created and whether the table schemas are correct:

  1. On the MaxCompute client, run the following command to check whether the tables that you created are in the MaxCompute project as expected:

    show tables;

    The names of the created tables are returned:

    ALIYUN$****:bank_data
    ALIYUN$****:bank_data_pt
    ALIYUN$****:result_table1
    ALIYUN$****:result_table2
  2. On the MaxCompute client, execute the following statements to check whether the table schemas are correct:

    -- View the schema of bank_data. 
    desc bank_data;
    -- View the schema of bank_data_pt. 
    desc bank_data_pt;
    -- View the partitions of bank_data_pt. 
    show partitions bank_data_pt;
    -- View the schema of result_table1. 
    desc result_table1;
    -- View the schema of result_table2. 
    desc result_table2;

    Returned result of the desc bank_data_pt; statement:

    desc bank_data_pt;
    -- The following result is returned: 
    +------------------------------------------------------------------------------------+
    | Owner: ALIYUN$****          | Project: doc_test_dev                                |
    | TableComment:                                                                      |
    +------------------------------------------------------------------------------------+
    | CreateTime:               2021-05-17 14:02:21                                      |
    | LastDDLTime:              2021-05-17 14:02:21                                      |
    | LastModifiedTime:         2021-05-17 14:02:21                                      |
    +------------------------------------------------------------------------------------+
    | InternalTable: YES      | Size: 0                                                  |
    +------------------------------------------------------------------------------------+
    | Native Columns:                                                                    |
    +------------------------------------------------------------------------------------+
    | Field           | Type       | Label | Comment                                     |
    +------------------------------------------------------------------------------------+
    | age             | bigint     |       | age                                        |
    | job             | string     |       | job type                                    |
    | marital         | string     |       | marital status                                        |
    | education       | string     |       | educational level                                    |
    | housing         | string     |       | mortgagors                                  |
    | loan            | string     |       | have loans                                  |
    | contact         | string     |       | contact information                                    |
    | month           | string     |       | month                                        |
    | day_of_week     | string     |       | day of the week                                      |
    | duration        | string     |       | duration                                    |
    | campaign        | bigint     |       | number of contacts for the current campaign                          |
    | pdays           | double     |       | time elapsed after the last contact                      |
    | previous        | double     |       | number of previous contacts                        |
    | poutcome        | string     |       | results of previous marketing activities                          |
    | emp_var_rate    | double     |       | employment variation rate                                |
    | cons_price_idx  | double     |       | consumer price index                              |
    | cons_conf_idx   | double     |       | consumer confidence index                              |
    | euribor3m       | double     |       | 3-month Euribor                                |
    | nr_employed     | double     |       | number of employees                                    |
    | fixed_deposit   | bigint     |       | have a time deposit                              |
    +------------------------------------------------------------------------------------+
    | Partition Columns:                                                                 |
    +------------------------------------------------------------------------------------+
    | credit          | string     | have a credit card                                        |
    +------------------------------------------------------------------------------------+
    show partitions bank_data_pt;
    -- The following result is returned: 
    credit=no
    credit=unknown
    credit=yes

What to do next

After you create the tables and confirm that the table schemas are correct, you can import data in the sample files to the tables. For more information about how to import data, see Import data to tables.