This topic uses customer.tbl as an example to describe how to convert text files to Parquet files.

  1. Create an OSS schema.
    CREATE SCHEMA dla_oss_db with DBPROPERTIES(
      catalog='oss',
      location='oss://dlaossfile1/TPC-H/'
      )
  2. Create a table named customer_txt in DLA and set LOCATION to the path of customer.tbl in OSS.
    CREATE EXTERNAL TABLE customer_txt (
         c_custkey int,
         c_name string,
         c_address string,
         c_nationkey int,
         c_phone string,
         c_acctbal double,
         c_mktsegment string,
         c_comment string
     )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
          STORED AS TEXTFILE LOCATION 'oss://dlaossfile1/TPC-H/customer/customer.tbl'
  3. Create the target table customer_parquet in DLA and set LOCATION to the required path in OSS.
    Note: LOCATION must be an existing directory in OSS and must end with a forward slash (/).
    CREATE EXTERNAL TABLE customer_parquet (
         c_custkey int,
         c_name string,
         c_address string,
         c_nationkey int,
         c_phone string,
         c_acctbal double,
         c_mktsegment string,
         c_comment string
     )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
          STORED AS PARQUET LOCATION 'oss://dlaossfile1/TPC-H/customer_parquet/'

    STORED AS PARQUET specifies that the table data is stored in Parquet format.

  4. Run an INSERT...SELECT statement to copy the data from the customer_txt table into the customer_parquet table.
    INSERT INTO customer_parquet SELECT * FROM customer_txt;
  5. View the data in the customer_parquet table. After the INSERT...SELECT statement is executed, you can also view the Parquet file created in OSS.
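
One way to carry out step 5 is to query both tables and compare the results. The following is a minimal sketch, assuming the tables were created exactly as in the steps above and that the INSERT...SELECT statement has finished. The queries are illustrative additions rather than part of the original procedure.

```sql
-- Preview a few rows of the converted table.
SELECT * FROM customer_parquet LIMIT 10;

-- Count the rows in the source text table.
SELECT COUNT(*) FROM customer_txt;

-- Count the rows in the target Parquet table.
SELECT COUNT(*) FROM customer_parquet;
```

If every row was converted, the two counts should be equal.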