You can migrate data from an Amazon Redshift cluster to an AnalyticDB for PostgreSQL instance by exporting the data to Amazon S3, synchronizing the data to Alibaba Cloud Object Storage Service (OSS) by using Data Online Migration or the ossimport tool, and then importing the data from OSS into the instance.
Prerequisites
An Amazon Redshift cluster is created.
The Amazon S3 service that is used to export Amazon Redshift data is activated.
Alibaba Cloud Object Storage Service (OSS) is activated. For more information, see What is OSS?
An OSS bucket is created. For more information, see Create a bucket.
Note: To facilitate subsequent data import, we recommend that you create the OSS bucket in the same region as the AnalyticDB for PostgreSQL instance.
An AnalyticDB for PostgreSQL instance is created. For information about how to select instance specifications, see the "Specification selection" section of this topic.
Specification selection
This section describes how to select specifications for an AnalyticDB for PostgreSQL instance based on the specifications of the source Amazon Redshift cluster.
An Amazon Redshift cluster consists of a leader node and multiple compute nodes.
Leader node: corresponds to the coordinator node of an AnalyticDB for PostgreSQL instance. The leader node communicates with clients, parses queries, develops query execution plans, and performs database operations.
Compute node: corresponds to a group of compute nodes of an AnalyticDB for PostgreSQL instance. Each compute node in an Amazon Redshift cluster is partitioned into node slices. Each node slice corresponds to one compute node of an AnalyticDB for PostgreSQL instance and handles data storage and query computing.
When you create an AnalyticDB for PostgreSQL instance, you can select the node specifications of the instance based on specifications of each node slice in the source Amazon Redshift cluster.
Example
An Amazon Redshift cluster consists of four compute nodes. Each compute node consists of two node slices. The specifications of each node slice are 2 cores, 16 GB memory, and 1 TB storage.
When you create an AnalyticDB for PostgreSQL instance, you can set the number of compute nodes to 8 (4 compute nodes × 2 node slices per node), the node specifications to 2 cores and 16 GB memory, and the single-node storage capacity to 1,000 GB.
For information about how to create an AnalyticDB for PostgreSQL instance, see Create an instance.
We recommend that you set the storage type to Enhanced SSD (ESSD). ESSDs provide better I/O performance than ultra disks.
Procedure
Step 1: Export data from an Amazon Redshift cluster to an Amazon S3 bucket
You can execute the UNLOAD statement to export data. For more information, see UNLOAD.
Syntax:
UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
authorization
[ option [ ... ] ]
where option is
{ [ FORMAT [ AS ] ] CSV | PARQUET
| PARTITION BY ( column_name [, ... ] ) [ INCLUDE ]
| MANIFEST [ VERBOSE ]
| HEADER
| DELIMITER [ AS ] 'delimiter-char'
| FIXEDWIDTH [ AS ] 'fixedwidth-spec'
| ENCRYPTED [ AUTO ]
| BZIP2
| GZIP
| ZSTD
| ADDQUOTES
| NULL [ AS ] 'null-string'
| ESCAPE
| ALLOWOVERWRITE
| CLEANPATH
| PARALLEL [ { ON | TRUE } | { OFF | FALSE } ]
| MAXFILESIZE [AS] max-size [ MB | GB ]
| REGION [AS] 'Amazon-region' }
We recommend that you use the FORMAT AS PARQUET or CSV parameter to specify the format of the exported data.
To allow more files to be generated and improve data export efficiency, we recommend that you specify the PARALLEL ON parameter to export data in parallel.
To control the size of the generated files, we recommend that you specify the MAXFILESIZE parameter. We also recommend that the number of generated files be an integer multiple of the number of compute nodes of the AnalyticDB for PostgreSQL instance. This improves efficiency when the data is subsequently imported in parallel from OSS foreign tables into the AnalyticDB for PostgreSQL instance.
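For reference, the following UNLOAD statement is a minimal sketch of such an export. The table schema1.table1, the S3 bucket examplebucket, the object prefix, the IAM role ARN, and the MAXFILESIZE value are placeholders that you must replace with your own values.
UNLOAD ('SELECT * FROM schema1.table1')
TO 's3://examplebucket/unload/table1_'
IAM_ROLE 'arn:aws:iam::<account-id>:role/<unload-role>'
FORMAT AS CSV
PARALLEL ON
MAXFILESIZE 256 MB
ALLOWOVERWRITE;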
Step 2: Synchronize data from Amazon S3 to OSS
You can use Alibaba Cloud Data Online Migration or the ossimport tool to synchronize data from Amazon S3 to OSS.
Data Online Migration
Log on to the Data Transport console.
Create the source and destination data addresses for data synchronization.
In the left-side navigation pane, go to the data address management page. Then, click Create Data Address to create a source data address.
In the Create Data Address panel, configure the parameters that are described in the following table and click OK.
Parameter
Required
Description
Data Type
Yes
Select AWS S3.
Data Name
Yes
Enter a name. The name must be 3 to 63 characters in length and cannot contain special characters, except for hyphens (-) and underscores (_).
Endpoint
Yes
Enter the region where the source data address is located. For more information, see Regions and endpoints.
Bucket
Yes
The name of the AWS Simple Storage Service (AWS S3) bucket. The bucket name cannot contain invalid characters, such as spaces, line breaks, and tab characters.
Prefix
Yes
Migrate All Data: All data in the bucket is migrated. You do not need to specify a prefix.
Migrate Partial Data: Only the files in the directory that is specified by the prefix are migrated. The prefix must end with a forward slash (/). Example: data/to/oss/.
Access Key Id and Secret Access Key
Yes
The AccessKey pair that is used to migrate data. Delete the AccessKey pair after the migration job is complete.
Click Create Data Address again to create a destination data address.
In the Create Data Address panel, configure the parameters that are described in the following table and click OK.
Parameter
Required
Description
Data Type
Yes
Select OSS.
Data Name
Yes
Enter a name. The name must be 3 to 63 characters in length and cannot contain special characters, except for hyphens (-) and underscores (_).
Data Region
Yes
Select the region where the destination data address is located.
Enable Transfer Acceleration
No
If you want Data Online Migration to use the transfer acceleration feature provided by Object Storage Service (OSS), you must enable the feature for the bucket. The transfer acceleration feature takes effect within 30 minutes after it is enabled. After you enable the feature, wait for 30 minutes before you create a migration job.
Important: If you use the transfer acceleration feature to access your OSS buckets, you are charged additional fees for transfer acceleration. For more information, see Transfer acceleration.
OSS Endpoint
Yes
Select an endpoint based on the region where your data is stored. For more information, see Regions and endpoints.
http://oss-cn-endpoint.aliyuncs.com: uses an HTTP-based endpoint to access OSS over the Internet.
https://oss-cn-endpoint.aliyuncs.com: uses an HTTPS-based endpoint to access OSS over the Internet.
Note: If you want to migrate data from third-party cloud services to OSS, you must access OSS from a public endpoint over the Internet.
Access Key Id and Access Key Secret
Yes
The AccessKey pair that is used to migrate data. For more information, see Prepare for a migration job.
OSS Bucket
Yes
Select the bucket that stores the migrated data.
OSS Prefix
No
An OSS prefix cannot start with a forward slash (/) but must end with a forward slash (/). Example: data/to/oss/. If you want to migrate data to the root directory of the specified bucket, do not specify this parameter.
Important: If the name of a source file starts with a forward slash (/), you must specify an OSS prefix when you configure the destination data address. Otherwise, the migration job fails. For example, if the name of a file to be migrated is /test/test.png and you specify the OSS prefix oss/, the file name changes from /test/test.png to oss//test/test.png after the migration job is complete.
Create an online migration job.
In the left-side navigation pane, go to the migration job management page. Then, click Create Job.
In the Create Job panel, read the terms of the migration service and select I agree to the above terms and activate the Data Transport service. Then, click Next.
In the Fee Reminder message, click Yes, Go Ahead.
In the Create Job panel, configure the parameters that are described in the following table and click Next.
Parameter
Required
Description
Job Name
Yes
The name of the migration job. The name must be 3 to 63 characters in length and can contain only lowercase letters, digits, and hyphens (-). It cannot start or end with a hyphen (-).
Source Data Address
Yes
The source data address that you created.
Destination Data Address
Yes
The destination data address that you created.
Specified Directory
No
The directories in which the files and subdirectories are migrated or not migrated. Valid values:
Do not filter: migrates all data from the source data address.
Exclude: does not migrate the files or subdirectories in the specified directories.
For example, if you want to migrate all directories in the root_dir/ directory except root_dir/bad_sub1/ and root_dir/bad_sub2/, select this method and specify bad_sub1/ and bad_sub2/.
Contain: migrates only the files and subdirectories in the specified directories.
For example, if you want to migrate only root_dir/good_sub1/ and root_dir/good_sub2/ in the root_dir/ directory, select this method and specify good_sub1/ and good_sub2/.
Note: A directory name cannot start with a forward slash (/) or a backslash (\), and cannot contain double forward slashes (//), double periods (..), or double quotation marks ("). The character string that consists of all the specified directory names cannot exceed 10 KB in length.
A directory name must end with a forward slash (/). Example: docs/.
You can specify up to 20 directories that you want to include or exclude.
Migration Type
Yes
The data migration mode. Valid values:
Full: The system migrates all data from the source data address to the destination data address. You must configure the Start Time Point of File parameter. The system migrates all data whose last modification time is later than the specified time. After all data is migrated, the migration job ends.
If you change the data at the source data address after the migration job is complete, you can submit another job to migrate all data. In this case, the system migrates only the changed data.
Incremental: The system runs a migration job based on the values that you specified for the Migration Interval and Migration Times parameters.
You must configure the Start Time Point of File parameter. During the first migration, the system migrates all data whose last modification time is later than the specified time. After the first migration is complete, incremental migration is performed based on the value that you specified for the Migration Interval parameter. For each incremental migration, the system migrates only the data that is created or modified after the previous migration starts and before the current migration starts.
If you set the Migration Times parameter to N, full migration is performed once and then incremental migration is performed (N - 1) times.
For example, the current time is 08:00, March 10, 2019 and you set the Migration Interval parameter to 1, the Migration Times parameter to 5, and the Start Time Point of File parameter to 08:00, March 5, 2019. When the first migration starts, the system migrates all files whose last modification time is between 08:00, March 5, 2019 and 08:00, March 10, 2019. In this example, the first migration requires one hour to complete and the second migration starts at 10:00, March 10, 2019. The duration from 08:00 to 10:00 consists of one hour for the first migration process and one hour for the migration interval. During the second migration, the system migrates the files whose last modification time is between 08:00, March 10, 2019 and 10:00, March 10, 2019. The migration job consists of one full migration and four incremental migrations.
Important: Before full or incremental migration starts, the system compares the files at the source data address with the files at the destination data address. If a file at the destination data address has the same name as a file at the source data address, the file at the destination data address is overwritten when one of the following conditions is met:
The Content-Type values of the files at the source data address and the destination data address are different.
The last modification time of the file at the source data address is later than that of the file at the destination data address.
The size of the file at the source data address is different from that of the file at the destination data address.
Start Time Point of File
Yes
The time to filter data to be migrated. The system migrates the data that is created or modified after the specified time. Valid values:
All: migrates all files.
Assign: migrates only the files that are created or modified after the specified time.
For example, if you set the time to 08:00:00, November 1, 2018, only the files that are created or modified after 08:00:00, November 1, 2018 are migrated.
Migration Interval
This parameter is required if you set the Migration Type parameter to Incremental.
The default value is 1, and the maximum value is 24. Unit: hours.
Migration Times
This parameter is required if you set the Migration Type parameter to Incremental.
The default value is 1, and the maximum value is 30.
File Overwrite Method
Yes
The method that is used to process a file at the source data address that has the same name as a file at the destination data address. Valid values:
LastModified: If a file at the source data address has the same name as a file at the destination data address, the system compares the last modification time that is indicated by the LastModified property of the files.
If the last modification time of the file at the source data address is later than that of the file at the destination data address, the file at the source data address is migrated and the file at the destination data address is overwritten.
If the last modification time of the file at the source data address is earlier than that of the file at the destination data address, the file at the source data address is not migrated and the file at the destination data address is retained.
If the two files have the same last modification time, the system checks the values of the Size and Content-Type properties of the files.
If the two files have the same Size and Content-Type values, the file at the source data address is not migrated. If the two files have different values of at least one property, the file at the source data address is migrated and the file at the destination data address is overwritten.
Condition: If a file at the source data address has the same name as a file at the destination data address, the system compares the values of the LastModified, Size, and Content-Type properties of the two files.
If all property values of the two files are the same, the file at the source data address is not migrated.
If the two files have different values of at least one property, the file at the source data address is migrated and the file at the destination data address is overwritten.
All: If a file at the source data address has the same name as a file at the destination data address, the system performs no comparison and directly overwrites the file at the destination address with the file at the source data address.
No: If a file at the source data address has the same name as a file at the destination data address, the system performs no comparison and directly skips the file at the source data address during the migration.
In the Performance step, go to the Data Prediction section and configure the Data Size and File Count parameters.
Note: To ensure that the migration job is successful, we recommend that you estimate the amount of data that you want to migrate as accurately as possible. For more information, see Prepare for a migration job.
(Optional) In the Performance step, go to the Flow Control section, specify the Time Range and Max Flow(MB/s) parameters, and then click Add.
Note: To ensure business continuity, we recommend that you configure the Time Range and Max Flow(MB/s) parameters based on traffic fluctuations.
Click Create and wait until the migration job is complete.
ossimport
Download and install ossimport in standalone mode. For more information, see Overview.
The ossimport tool in standalone mode has the following file structure:
ossimport
├── bin
│   └── ossimport2.jar    # The JAR package that contains the Master, Worker, TaskTracker, and Console modules.
├── conf
│   ├── local_job.cfg     # The job configuration file.
│   └── sys.properties    # The configuration file that contains system parameters.
├── console.bat           # The Windows command-line utility that is used to run tasks in a distributed manner.
├── console.sh            # The Linux command-line utility that is used to run tasks in a distributed manner.
├── import.bat            # The script that automatically imports files based on the conf/local_job.cfg configuration file in Windows. The configuration file contains parameters that specify data migration operations, such as start, migration, verification, and retry.
├── import.sh             # The script that automatically imports files based on the conf/local_job.cfg configuration file in Linux. The configuration file contains parameters that specify data migration operations, such as start, migration, verification, and retry.
├── logs                  # The directory that contains logs.
└── README.md             # The file that describes ossimport. We recommend that you read the file before you use ossimport.
Configure ossimport in standalone mode.
You need to modify only the following parameters in the conf/local_job.cfg configuration file:
srcType=s3
srcAccessKey=<Your AWS access key>
srcSecretKey=<Your AWS secret key>
srcDomain=<The domain name that corresponds to the region in which the Amazon S3 bucket resides>
srcBucket=<The name of the Amazon S3 bucket>
destAccessKey=<Your Alibaba Cloud AccessKey ID>
destSecretKey=<Your Alibaba Cloud AccessKey secret>
destDomain=<The endpoint that corresponds to the region in which the OSS bucket resides>
destBucket=<The name of the OSS bucket>
destPrefix=
isSkipExistFile=true
For more information, see Overview.
Run ossimport to synchronize data to OSS. For information about how to use ossimport in standalone mode, see Standalone deployment.
Step 3: Create a destination table
Create a destination table in the AnalyticDB for PostgreSQL instance to load data from Amazon Redshift. The destination table must have the same schema as the source table. For more information, see CREATE TABLE.
Convert the DDL statements of database objects such as schemas, tables, functions, and views from the Amazon Redshift syntax to the AnalyticDB for PostgreSQL syntax. For more information, see the "Convert DDL statements" section of this topic.
Step 4: Import data from OSS to the AnalyticDB for PostgreSQL instance
You can use the COPY statement or an OSS foreign table to import data to AnalyticDB for PostgreSQL.
For information about how to use the COPY statement to import OSS data, see Use the COPY or UNLOAD statement to import or export data between OSS foreign tables and AnalyticDB for PostgreSQL tables.
For information about how to use an OSS foreign table to import OSS data, see the "Import OSS data to an AnalyticDB for PostgreSQL table" section of the Use OSS foreign tables for data lake analysis topic.
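For illustration, the following statements provide a minimal sketch of the foreign table approach. They assume that an OSS foreign table named oss_external_table.table1_ext has already been created over the exported files by following the linked topic, and that the destination table schema1.table1 exists.
-- Assumption: oss_external_table.table1_ext is an OSS foreign table that maps the exported files in the OSS bucket.
-- Import the OSS data into the local table.
INSERT INTO schema1.table1
SELECT * FROM oss_external_table.table1_ext;

-- Collect statistics on the destination table after the import.
ANALYZE schema1.table1;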
Convert DDL statements
The DDL statements of Amazon Redshift are slightly different from the DDL statements of AnalyticDB for PostgreSQL and must be converted before you migrate data.
CREATE SCHEMA
Execute the CREATE SCHEMA statement of AnalyticDB for PostgreSQL to create a schema. Example:
CREATE SCHEMA schema1 AUTHORIZATION xxxpoc;
GRANT ALL ON SCHEMA schema1 TO xxxpoc;
GRANT ALL ON SCHEMA schema1 TO public;
COMMENT ON SCHEMA model IS 'for xxx migration poc test';
CREATE SCHEMA oss_external_table AUTHORIZATION xxxpoc;
CREATE FUNCTION
AnalyticDB for PostgreSQL does not support specific SQL functions of Amazon Redshift. You must modify or rewrite the SQL functions. Examples:
Replace the CONVERT_TIMEZONE(a, b, c) function with timezone(b, timezone(a, c)).
Replace the GETDATE() function with current_timestamp(0)::timestamp.
A combined example of these two replacements is provided at the end of this list.
Replace or optimize user-defined functions (UDFs). The following sample code provides an example of a UDF of Amazon Redshift:
CREATE OR REPLACE FUNCTION public.f_jdate(dt timestamp without time zone)
RETURNS character varying AS '
    from datetime import timedelta, datetime
    if dt.hour < 4:
        d = timedelta(days=-1)
        dt = dt + d
    return str(dt.date())'
LANGUAGE plpythonu IMMUTABLE;
COMMIT;
Replace the preceding function with the following function:
to_char(a - interval '4 hour', 'yyyy-mm-dd')
Replace other Amazon Redshift functions.
For information about standard PostgreSQL functions, see Functions and Operators. Modify or rewrite the other SQL functions of Amazon Redshift that are not supported by AnalyticDB for PostgreSQL.
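The following query pair is a minimal example of the two built-in function replacements described above. The table t, its created_at column, and the UTC and Asia/Shanghai time zones are hypothetical placeholders.
-- Amazon Redshift
SELECT CONVERT_TIMEZONE('UTC', 'Asia/Shanghai', t.created_at), GETDATE()
FROM t;

-- Equivalent query in AnalyticDB for PostgreSQL
SELECT timezone('Asia/Shanghai', timezone('UTC', t.created_at)), current_timestamp(0)::timestamp
FROM t;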
CREATE TABLE
Modify the name of a table.
The name of an Amazon Redshift table can be up to 127 characters in length, whereas the name of an AnalyticDB for PostgreSQL table can be up to 63 characters in length. If the names of database objects such as tables, functions, and views are more than 63 characters in length, you must remove the excess characters.
Modify the compression algorithm.
Replace ENCODE XXX in a CREATE TABLE statement of Amazon Redshift with the following clause:
WITH (COMPRESSTYPE={ZLIB|ZSTD|RLE_TYPE|NONE})
For information about the compression algorithms supported by AnalyticDB for PostgreSQL, see the "Data compression" section of the Define storage models for tables topic.
Modify distribution keys.
Amazon Redshift supports three distribution keys. For more information, see Distribution styles. You must modify distribution keys based on the following rules:
EVEN distribution: Use DISTRIBUTED RANDOMLY.
KEY distribution: Use DISTRIBUTED BY (column, [ ... ]).
ALL distribution: Use DISTRIBUTED REPLICATED.
Modify sort keys.
Replace the [ COMPOUND | INTERLEAVED ] SORTKEY (column_name [, ...]) sort key clause of Amazon Redshift with the following clause:
ORDER BY (column, [ ... ])
Examples:
Example 1:
The following sample code provides an example of a CREATE TABLE statement of Amazon Redshift:
CREATE TABLE schema1.table1 (
    filed1 VARCHAR(100) ENCODE lzo,
    filed2 INTEGER DISTKEY,
    filed3 INTEGER,
    filed4 BIGINT ENCODE lzo,
    filed5 INTEGER
)
INTERLEAVED SORTKEY (
    filed1,
    filed2
);
Convert the preceding statement into the following CREATE TABLE statement of AnalyticDB for PostgreSQL:
CREATE TABLE schema1.table1 (
    filed1 VARCHAR(100),
    filed2 INTEGER,
    filed3 INTEGER,
    filed4 BIGINT,
    filed5 INTEGER
)
WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zstd)
DISTRIBUTED BY (filed2)
ORDER BY (filed1, filed2);

-- Sort the table.
SORT schema1.table1;
MULTISORT schema1.table1;
Example 2:
The following sample code provides an example of a CREATE TABLE statement of Amazon Redshift, which includes the ENCODE and SORTKEY options:
CREATE TABLE schema2.table2 (
    filed1 VARCHAR(50) ENCODE lzo,
    filed2 VARCHAR(50) ENCODE lzo,
    filed3 VARCHAR(20) ENCODE lzo
)
DISTSTYLE EVEN
INTERLEAVED SORTKEY (
    filed1
);
Convert the preceding statement into the following CREATE TABLE statement of AnalyticDB for PostgreSQL:
CREATE TABLE schema2.table2 (
    filed1 VARCHAR(50),
    filed2 VARCHAR(50),
    filed3 VARCHAR(20)
)
WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zstd)
DISTRIBUTED RANDOMLY
ORDER BY (filed1);

-- Sort the table.
SORT schema2.table2;
MULTISORT schema2.table2;
Note: For more information about sort keys, see Use sort keys and rough set indexes to accelerate queries in column-oriented tables.
CREATE VIEW
Convert a CREATE VIEW statement of Amazon Redshift into an SQL statement that complies with the AnalyticDB for PostgreSQL syntax. The conversion rules are similar to the conversion rules of CREATE TABLE statements.
AnalyticDB for PostgreSQL does not support the WITH NO SCHEMA BINDING clause. Remove the clause from the statement, as shown in the following example.
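The following statements are a minimal illustration of the conversion. The view name schema1.v_orders and the referenced columns are hypothetical.
-- Amazon Redshift
CREATE VIEW schema1.v_orders AS
SELECT filed1, filed2
FROM schema1.table1
WITH NO SCHEMA BINDING;

-- AnalyticDB for PostgreSQL: remove the WITH NO SCHEMA BINDING clause
CREATE VIEW schema1.v_orders AS
SELECT filed1, filed2
FROM schema1.table1;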