You can use the console command line interface to access DataHub projects and run commands. This topic describes how to install, configure, and start the console client, and how to use its commands.
Prerequisites
Before you use the console command line interface, make sure that the following requirement is met:
The device where you want to install the console client has Java 8 or a later version installed.
Install and configure the console client
Download the datahub_console.tar.gz package and decompress it.
After you decompress the package, you will find the bin, conf, and lib folders.
Go to the conf folder and enter the AccessKey and endpoint information in the datahub.properties file. The file content is as follows:
datahub.accessid=
datahub.accesskey=
datahub.endpoint=
The parameters are described as follows:
Parameter | Required | Description | Example
--- | --- | --- | ---
datahub.accessid | Yes | The AccessKey ID of your Alibaba Cloud account or Resource Access Management (RAM) user. | N/A
datahub.accesskey | Yes | The AccessKey secret that corresponds to the AccessKey ID. | N/A
datahub.endpoint | Yes | The endpoint of the DataHub service. You must configure the endpoint based on the region and network connectivity type that you selected when you created the DataHub project. For more information about the endpoints for different regions and network types, see DataHub Domain Names. | https://dh-cn-hangzhou.aliyuncs.com
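For reference, a completed datahub.properties file might look like the following. The AccessKey values are placeholders that you must replace with your own credentials; the endpoint shown is the sample China (Hangzhou) endpoint from the table.

```
datahub.accessid=<yourAccessKeyId>
datahub.accesskey=<yourAccessKeySecret>
datahub.endpoint=https://dh-cn-hangzhou.aliyuncs.com
```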
Run the console client
You can start the console client in one of the following ways:
Method 1: In the bin folder of the console client installation path, double-click the datahubcmd.bat file (for Windows) to start the console client. The following information is returned, which indicates that the client has started.
Method 2: In the command line window of your operating system, go to the bin folder of the console client installation path. Run the datahubcmd command (for Windows) or the sh datahubcmd.sh command (for Linux or macOS) to start the client. The following figure shows that the client is connected to DataHub.
Get command help
You can obtain help for console client commands in one of the following ways:
Method 1: View command help information in the console client
View help for all commands.
help
Specify a keyword to view help information for related commands.
For example, to retrieve a list of topics:
DataHub=>help lt
NAME
lt - List topic
SYNOPSYS
lt [-p] string
OPTIONS
-p string
projectName
[Mandatory]
Method 2: In the command line window of your operating system, switch to the bin folder of the console client installation path. Run the following command to view help information for all commands:
...\bin>datahubcmd help
Usage guide
Project operations
Create a project
-p: The project name.
-c: The project description.
cp -p test_project -c test_comment
Delete a project
-p: The project name.
dp -p test_project
Note: Before you delete a project, you must delete all resources in the project, including topics, subscriptions, and sync tasks. Otherwise, the deletion fails.
Get a list of projects
lp
Topic operations
Create a topic
-p: The project name.
-t: The topic name.
-m: The topic category. BLOB indicates a BLOB topic, and TUPLE indicates a TUPLE topic.
-f: The field format for a TUPLE topic is [(fieldName,fieldType,isNull)]. Separate multiple fields with commas (,).
-s: The number of shards.
-l: The time to live of data in days. Valid values: 1 to 7.
-c: The topic description.
ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
Delete a topic
-p: The project name.
-t: The topic name.
dt -p test_project -t test_topic
Get topic information
-p: The project name.
-t: The topic name.
gt -p test_project -t test_topic
Export a topic schema to a JSON file
-f: The path where the file is saved.
-p: The project name.
-t: The topic name.
gts -f filepath -p test_project -t test_topic
Get a list of topics
-p: The project name.
lt -p test_project
Create a topic from a JSON file
-s: The number of shards.
-l: The time to live of data in days. Valid values: 1 to 7.
-f: The file path.
-p: The project name.
-t: The topic name.
-c: The topic description.
rtt -s 3 -l 3 -c test_comment -f filepath -p test_project -t test_topic
Modify the lifecycle of a topic
-p: The project name.
-t: The topic name.
-l: The new time to live of data in days.
-c: The topic description.
utl -p test_project -t test_topic -l 3 -c test_comment
Connector operations
Create an ODPS connector
-p: The project name.
-t: The topic name.
-m: The synchronization type. The following synchronization types are supported for ODPS: SYSTEM_TIME, USER_DEFINE, EVENT_TIME, and META_TIME.
-e: The ODPS endpoint. You must enter the classic network endpoint.
-op: The ODPS project name.
-ot: The ODPS table name.
-oa: The AccessKey ID used to access ODPS.
-ok: The AccessKey secret used to access ODPS.
-tr: The partition interval in minutes. The default value in the console tool is 60.
-tf: The partition format. `ds` indicates partitioning by day, `ds hh` indicates partitioning by hour, and `ds hh mm` indicates partitioning by minute.
-c: The fields to sync. For example, (field1,field2).
coc -p test_project -t test_topic -m SYSTEM_TIME -e odpsEndpoint -op odpsProject -ot odpsTable -oa odpsAccessId -ok odpsAccessKey -tr 60 -c (field1,field2) -tf ds hh mm
Add a field for ODPS synchronization
-p: The project name.
-t: The topic name.
-c: The connector ID. You can view the ID on the Data Synchronization tab.
-f: The name of the new field.
acf -p test_project -t test_topic -c connectorId -f fieldName
Create a connector to sync data to MySQL or RDS
-p: The project name.
-t: The topic name.
-h: The host. You must enter the classic network address.
-po: The port.
-ty: The synchronization type. Two types are available:
SINK_MYSQL: Creates a connector to sync data to MySQL.
SINK_ADS: Creates a connector to sync data to ADS.
-d: The database name.
-ta: The table name.
-u: The username.
-pa: The password.
-ht: The insert mode. Two modes are available:
IGNORE
OVERWRITE
-n: The fields to sync. For example, (field1,field2).
cdc -p test_project -t test_topic -h host -po 3306 -ty mysql -d mysql_database -ta mysql_table -u username -pa password -ht IGNORE -n (field1,field2)
Create a DataHub connector
-p: The project name.
-t: The topic name.
-sp: The sink project where data is imported.
-st: The sink topic where data is imported.
-m: The authentication type.
AK: Authentication using an AccessKey. You must specify the AccessKey ID and AccessKey secret.
STS: Authentication using STS.
cdhc -p test_project -t test_topic -sp sinkProject -st sinkTopic -m AK -i accessId -k accessKey
Create an FC connector
-p: The project name.
-t: The topic name.
-e: The FC endpoint. You must enter the classic network endpoint.
-s: The FC service name.
-f: The name of the FC function.
-au: The authentication method.
AK: Authentication using an AccessKey. You must specify the AccessKey ID and AccessKey secret.
STS: Authentication using STS.
-n: The fields to sync. For example, (field1,field2).
cfc -p test_project -t test_topic -e endpoint -s service -f function -au AK -i accessId -k accessKey -n (field1,field2)
Create a Hologres connector
-p: The project name.
-t: The topic name.
-e: The endpoint.
-cl: The fields to sync to Hologres.
-au: The authentication method. Currently, only AccessKey authentication is supported for syncing data to Hologres.
-m: The parsing type. Two parsing types are available:
Delimiter: You must specify the lineDelimiter, parseData, and columnDelimiter properties.
InformaticaJson: You must specify the parseData property.
chc -p test_project -t test_topic -e endpoint -cl (field1,field2) -au AK -hp holoProject -ht holoTopic -i accessId -k accessKey -m Delimiter -l 1 -b false -n (field1,field2)
Create an OTS connector
-p: The project name.
-t: The topic name.
-it: The name of the OTS instance.
-m: The authentication type. STS is used by default.
AK: Authentication using an AccessKey. You must specify the AccessKey ID and AccessKey secret.
STS: Authentication using STS.
-t: The OTS table name.
-wm: The write mode. Two write modes are supported:
PUT
UPDATE
-c: The fields to sync. For example, (field1,field2).
cotsc -p test_project -t test_topic -i accessId -k accessKey -it instanceId -m AK -t table -wm PUT -c (field1,field2)
Create an OSS connector
-p: The project name.
-t: The topic name.
-b: The OSS bucket name.
-e: The OSS endpoint name.
-pr: The directory prefix for syncing data to OSS.
-tf: The synchronization time format. For example, %Y%m%d%H%M indicates partitioning by minute.
-tr: The partition interval.
-c: The fields to sync.
csc -p test_project -t test_topic -b bucket -e endpoint -pr ossPrefix -tf ossTimeFormat -tr timeRange -c (f1,f2)
Delete a connector
-p: The project name.
-t: The topic name.
-c: The connector ID. You can view the ID on the Data Synchronization tab.
dc -p test_project -t test_topic -c connectorId
Get connector details
-p: The project name.
-t: The topic name.
-c: The connector ID. You can view the ID on the Data Synchronization tab.
gc -p test_project -t test_topic -c connectorId
Get a list of connectors in a topic
-p: The project name.
-t: The topic name.
lc -p test_project -t test_topic
Restart a connector
-p: The project name.
-t: The topic name.
-c: The connector ID. You can view the ID on the Data Synchronization tab.
rc -p test_project -t test_topic -c connectorId
Update the AccessKey of a connector
-p: The project name.
-t: The topic name.
-ty: The synchronization type. For example: SINK_ODPS.
uca -p test_project -t test_topic -ty SINK_ODPS -a accessId -k accessKey
Shard operations
Merge shards
-p: The project name.
-t: The topic name.
-s: The ID of the shard to merge.
-a: The ID of the adjacent shard to merge with.
ms -p test_project -t test_topic -s shardId -a adjacentShardId
Split a shard
-p: The project name.
-t: The topic name.
-s: The ID of the shard to split.
ss -p test_project -t test_topic -s shardId
Get all shards in a topic
-p: The project name.
-t: The topic name.
ls -p test_project -t test_topic
Get the synchronization status of a shard
-p: The project name.
-t: The topic name.
-s: The shard ID.
-c: The connector ID. You can view the ID on the Data Synchronization tab.
gcs -p test_project -t test_topic -s shardId -c connectorId
Get the consumer offset of each shard for a subscription
-p: The project name.
-t: The topic name.
-s: The subscription ID.
-i: The shard ID.
gso -p test_project -t test_topic -s subid -i shardId
Subscription operations
Create a subscription
-p: The project name.
-t: The topic name.
-c: The subscription description.
css -p test_project -t test_topic -c comment
Delete a subscription
-p: The project name.
-t: The topic name.
-s: The subscription ID.
dsc -p test_project -t test_topic -s subId
Query a list of subscriptions
-p: The project name.
-t: The topic name.
lss -p test_project -t test_topic
Upload and download data
Upload data
-f: The file path. Note: For a path in Windows, you must add escape characters. For example, D:\\test\\test.txt.
-p: The project name.
-t: The topic name.
-m: The text separator. Commas (,) and spaces are supported.
-n: The batch size for each upload. The default value is 1000.
uf -f filepath -p test_project -t test_topic -m "," -n 1000
Example: Upload a CSV file
This example shows how to use the console command line interface to upload a CSV file to DataHub. The format of the CSV file is as follows:
0,qe614c760fuk8judu01tn5x055rpt1,true,100.1,14321111111
1,znv1py74o8ynn87k66o32ao4x875wi,true,100.1,14321111111
2,7nm0mtpgo1q0ubuljjjx9b000ybltl,true,100.1,14321111111
3,10t0n6pvonnan16279w848ukko5f6l,true,100.1,14321111111
4,0ub584kw88s6dczd0mta7itmta10jo,true,100.1,14321111111
5,1ltfpf0jt7fhvf0oy4lo8m3z62c940,true,100.1,14321111111
6,zpqsfxqy9379lmcehd7q8kftntrozb,true,100.1,14321111111
7,ce1ga9aln346xcj761c3iytshyzuxg,true,100.1,14321111111
8,k5j2id9a0ko90cykl40s6ojq6gruyi,true,100.1,14321111111
9,ns2zcx9bdip5y0aqd1tdicf7bkdmsm,true,100.1,14321111111
10,54rs9cm1xau2fk66pzyz62tf9tsse4,true,100.1,14321111111
In the CSV file, each line is a record, and fields are separated by commas (,). The file is saved to the local path /temp/test.csv. The DataHub topic has the following format:
Field name | Field type
--- | ---
id | BIGINT
name | STRING
gender | BOOLEAN
salary | DOUBLE
my_time | TIMESTAMP
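If you want to reproduce this example, the following sketch generates a CSV file in the same format. The helper make_sample_csv and its randomly generated name values are illustrative and not part of the console client; adjust the output path to match the one you pass to the uf command.

```python
import csv
import os
import random
import string
import tempfile

def make_sample_csv(path, rows=11):
    # Columns match the sample topic schema:
    # id (BIGINT), name (STRING), gender (BOOLEAN), salary (DOUBLE), my_time (TIMESTAMP)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(rows):
            # 30-character random lowercase/digit string, like the sample name values
            name = "".join(random.choices(string.ascii_lowercase + string.digits, k=30))
            writer.writerow([i, name, "true", 100.1, 14321111111])

# Write to a temporary location; use /temp/test.csv (as in the example) when uploading.
sample_path = os.path.join(tempfile.gettempdir(), "test.csv")
make_sample_csv(sample_path)
```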
The command for the console tool is:
uf -f /temp/test.csv -p test_project -t test_topic -m "," -n 1000
Download data
-f: The download file path. Note: For a path in Windows, you must add escape characters. For example, D:\\test\\test.txt.
-p: The project name.
-t: The topic name.
-s: The shard ID.
-d: The subscription ID.
-ti: The point in time after which you want to read data. The format is yyyy-mm-dd hh:mm:ss.
-l: The number of records to read each time.
-g: Specifies whether to read data continuously.
0: Reads only once. After the current number of records is read, consumption stops.
1: Reads continuously.
down -p test_project -t test_topic -s shardId -d subId -f filePath -ti "1970-01-01 00:00:00" -l 100 -g 0
FAQ
The script fails to start: If you run the script in a Windows environment, check whether the script path contains parentheses.