DataWorks provides TiDB Reader for you to read data from TiDB data sources in offline mode. This topic describes the capabilities of synchronizing data from TiDB data sources.
Supported TiDB versions
Batch data read: TiDB 7.x and TiDB 8.x databases are supported.
Batch data write: No TiDB database versions are supported.
Real-time data read and write: No TiDB database versions are supported.
TiDB is highly compatible with the MySQL protocol and the features and syntax commonly used by MySQL 5.7 and MySQL 8.0. Data synchronization from a TiDB data source in offline mode is implemented based on the MySQL protocol. For information about compatibility comparison between TiDB and MySQL, see MySQL Compatibility.
Supported data types
For information about all data types in each TiDB version, see Data Types. The following table provides the support status of main data types in TiDB.
Data type | TiDB Reader for batch data read |
TINYINT | Supported |
SMALLINT | Supported |
MEDIUMINT | Supported |
INTEGER | Supported |
BIGINT | Supported |
FLOAT | Supported |
DOUBLE | Supported |
DECIMAL | Supported |
CHAR | Supported |
VARCHAR | Supported |
JSON | Supported |
TEXT | Supported |
TINYTEXT | Supported |
MEDIUMTEXT | Supported |
LONGTEXT | Supported |
VARBINARY | Supported |
BINARY | Supported |
BLOB | Supported |
TINYBLOB | Supported |
MEDIUMBLOB | Supported |
LONGBLOB | Supported |
ENUM | Supported |
SET | Supported |
BOOLEAN | Supported |
BIT | Supported |
DATE | Supported |
DATETIME | Supported |
TIMESTAMP | Supported |
TIME | Supported |
YEAR | Supported |
Prepare a TiDB environment before data synchronization
Before you use DataWorks to synchronize data from a TiDB data source, you must prepare a TiDB environment. This ensures that a data synchronization task can be configured and can synchronize data from the TiDB data source as expected. The following information describes how to prepare a TiDB environment for data synchronization from a TiDB data source.
Preparation 1: Check the version of your TiDB database
Data Integration requires that the version of the TiDB database be of 7.x
or 8.x
. You must check whether the version of your TiDB database from which you want to synchronize data meets the requirement. You can execute the following statement to check the version of your TiDB database:
SELECT TIDB_VERSION()\G
*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v8.1.1
Edition: Community
Git Commit Hash: 821e491a20fbab36604b36b647b5bae26a2c1418
Git Branch: HEAD
UTC Build Time: 2024-08-27 19:16:25
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.00 sec)
Preparation 2: Prepare an account that has the required permissions
We recommend that you plan and create an account for DataWorks to access your TiDB database. To prepare such an account, perform the following steps:
Optional. Create an account. For more information, see TiDB User Account Management.
Grant the required permissions to the account.
Batch data read: The account must have the
SELECT
permission.You can execute the following statement to grant permissions to the account. Alternatively, you can grant the
SUPER
permission to the account. ReplaceAccount for data synchronization
in the statement with the created account.-- CREATE USER 'Account for data synchronization'@'%' IDENTIFIED BY 'Password'; //Create an account that can be used for data synchronization and specify a password. This way, you can use the account and password to access the database from any host. % indicates a host. GRANT SELECT ON *.* TO 'Account for data synchronization'@'%'; // Grant the SELECT permission to the account.
*.*
indicates that the account is granted the preceding permissions on all tables in all databases. You can also grant the preceding permissions on specific tables in a database to the account. For example, to grant the account the preceding permissions on the user table in the test database, execute the following statement:GRANT SELECT ON test.user TO 'Account for data synchronization'@'%';
.NoteThe
REPLICATION SLAVE
permission is a global permission. You cannot grant this permission on specific tables in a database to the account.
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Appendix: Code and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Code for TiDB Reader
The following code provides an example on how to configure a synchronization task to read data from a TiDB data source.
The comments in the JSON example are used to only indicate the definitions of some major parameters. Remove the comments when you configure the parameters.
{
"type": "job",
"version": "2.0",
"steps":
[
{
"stepType": "tidb",
"parameter":
{
"column":
[
"id",
"name"
],
"where": "",
"splitPk": "id",
"connection":
[
{
"selectedDatabase": "test_database",
"datasource": "test_datasource",
"table":
[
"test_table"
]
}
]
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "odps",
"parameter":
{
},
"name": "Writer",
"category": "writer"
}
],
"setting":
{
"errorLimit":
{
"record": "0"
},
"speed":
{
"throttle": false,
"concurrent": 3
}
},
"order":
{
"hops":
[
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Parameters in code for TiDB Reader
Parameter | Description | Required | Default value |
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
table | The name of the table from which you want to read data. You can make advanced configurations for the table parameter to specify a table range. Examples:
| Yes | No default value |
column | The names of the columns from which you want to read data. Specify the names in a JSON array. The default value is
| Yes | No default value |
splitPk | The field that is used for data sharding when TiDB Reader reads data. If you configure this parameter, data sharding is performed based on the value of this parameter, and parallel threads can be used to read data. This improves data synchronization efficiency.
| No | No default value |
where | The WHERE clause. For example, you can set this parameter to
| No | No default value |