When you create schemas by using the wizard, you can customize some settings in the advanced options, such as filtering by fields or tables and controlling the number of connections used for table synchronization.
Filter by field
Setting method: sensitive-columns=<table_name>.<column_name>...<table_name>.<column_name>
You can specify multiple fields separated by commas (,).
For example, sensitive-columns=tbl01.col1,tbl01.col2,tbl02.col3 indicates that col1 and col2 in tbl01 and col3 in tbl02 are sensitive fields. col1, col2, and col3 are not synchronized to Object Storage Service (OSS) during schema creation.
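The format of the sensitive-columns value can be illustrated with a short Python sketch that groups the entries by table. The function name parse_sensitive_columns is hypothetical and is not part of DLA. The sketch only shows how the comma-separated <table_name>.<column_name> entries can be read.

```python
from collections import defaultdict


def parse_sensitive_columns(value: str) -> dict[str, list[str]]:
    """Group comma-separated <table_name>.<column_name> entries by table.

    Hypothetical helper for illustration only; not a DLA API.
    """
    columns = defaultdict(list)
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        table, _, column = entry.partition(".")
        if not table or not column:
            raise ValueError(f"expected <table_name>.<column_name>, got {entry!r}")
        columns[table].append(column)
    return dict(columns)


if __name__ == "__main__":
    print(parse_sensitive_columns("tbl01.col1,tbl01.col2,tbl02.col3"))
```

For the sample value above, the result maps tbl01 to col1 and col2, and tbl02 to col3.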
Synchronize only some tables
Setting method: include-tables=<table_name>, where table_name is a plain table name or a table name that contains the wildcard %.
For example, include-tables=tbl01,view_% indicates that only the tbl01 table and all tables prefixed with view_ are synchronized.
Filter by table
Setting method: exclude-tables=<table_name>, where table_name is a plain table name or a table name that contains the wildcard %.
For example, exclude-tables=tbl01,view_% indicates that the tbl01 table and all tables prefixed with view_ are not synchronized.
Note:
We recommend that you configure either include-tables or exclude-tables, not both. When both include-tables and exclude-tables are configured, exclude-tables takes precedence over include-tables.
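The interaction between include-tables and exclude-tables can be sketched in Python as follows. This is an illustration under stated assumptions, not DLA's implementation: it assumes that % matches any sequence of characters (as in SQL LIKE) and that matching is case-sensitive. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a table pattern into a regex; % is assumed to match any characters."""
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(parts))


def matches_any(table: str, patterns: list[str]) -> bool:
    """Return True if the whole table name matches at least one pattern."""
    return any(wildcard_to_regex(p).fullmatch(table) for p in patterns)


def is_synchronized(table: str, include=None, exclude=None) -> bool:
    """Decide whether a table is synchronized.

    exclude-tables is checked first because it takes precedence
    over include-tables when both are configured.
    """
    if exclude and matches_any(table, exclude):
        return False
    if include:
        return matches_any(table, include)
    return True  # no include list: every table that is not excluded is synchronized


if __name__ == "__main__":
    for name in ["tbl01", "tbl02", "view_sales"]:
        print(name, is_synchronized(name, include=["tbl01", "view_%"]))
```

With include-tables=tbl01,view_%, the sketch selects tbl01 and view_sales but not tbl02. If the same pattern appears in both settings, the table is not synchronized.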
Specify the number of connections used for single table synchronization
When Data Lake Analytics (DLA) synchronizes data, 20 connections are used by default. If the ApsaraDB for RDS (RDS) table has a numeric auto-increment primary key and contains a large amount of data, you can change the number of connections used for data synchronization.
Setting method: connections-per-job=<number of connections>.
For example, connections-per-job=100.
Set the total number of connections
You can set the total number of connections used for data synchronization in DLA, to prevent synchronization tasks from using all connections and affecting other tasks.
Setting method: total-allowed-connections=<number of connections>, used together with connections-per-job=<number of connections>.
For example, the following settings indicate that each table synchronization job uses 100 connections and that all jobs together use at most 1,000 connections. In this case, DLA can synchronize 10 tables at a time.
connections-per-job=100
total-allowed-connections=1000
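The arithmetic behind this example can be sketched in Python. The function name max_concurrent_tables is hypothetical, and rounding down when the total is not an exact multiple of the per-job value is an assumption, not documented DLA behavior.

```python
def max_concurrent_tables(connections_per_job: int, total_allowed_connections: int) -> int:
    """Estimate how many tables can be synchronized at the same time.

    Assumption: each table job holds connections_per_job connections,
    and any remainder that cannot fill a whole job is left unused.
    """
    if connections_per_job <= 0 or total_allowed_connections <= 0:
        raise ValueError("both settings must be positive integers")
    return total_allowed_connections // connections_per_job


if __name__ == "__main__":
    # connections-per-job=100, total-allowed-connections=1000
    print(max_concurrent_tables(100, 1000))
```

For the settings above, 1,000 divided by 100 gives 10 tables at a time, which matches the example.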