Console command-line tool
Preparations
Download the datahub_console.tar.gz package and decompress it. Enter your AccessKey pair and the endpoint in the datahub.properties file in the conf directory. After the configuration is complete, run the script in the bin directory.
Note:
Java Development Kit (JDK) 1.8 or later is required.
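A minimal sketch of what the datahub.properties file might contain is shown below. The exact property names and the endpoint value are assumptions for illustration only; keep the key names used in the template that ships in the conf directory and replace the values with your own.
datahub.endpoint=https://dh-cn-hangzhou.aliyuncs.com
datahub.accessId=<yourAccessKeyId>
datahub.accessKey=<yourAccessKeySecret>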
Instructions
Basic operations
View all executable commands.
Append a command name to help to view the parameters required by that command. For example, run the help lt command to view the parameters required by the lt command.
help
Clear the screen.
clear
Exit the program.
exit
View the details of a command error. After a command error is reported, you can run the stacktrace command to view the error details.
stacktrace
Run a multi-command script.
script
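For example, you could save several of the commands described in this topic to a plain-text file and run them in one pass. The file name and the way the path is passed to the script command are assumptions for illustration; run help script to confirm the exact syntax.
Contents of /temp/init_datahub.txt:
cp -p test_project -c test_comment
ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
lt -p test_project
Run the file:
script /temp/init_datahub.txt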
Manage projects
Create a project.
cp -p test_project -c test_comment
Delete a project.
dp -p test_project
Query projects.
lp
Manage topics
Create a topic.
-m: the data type of the topic. A value of BLOB indicates that a topic of the BLOB type is to be created. A value of TUPLE indicates that a topic of the TUPLE type is to be created.
The fields for a topic of the TUPLE type are in the format of [(fieldName,fieldType,isNull)]. Separate the fields with commas (,).
ct -p test_project -t test_topic -m TUPLE -f [(name,string,true)] -s 3 -l 3 -c test_comment
Delete a topic.
dt test_project test_topic
Query the information about a topic.
gt -p test_project -t test_topic
Export the schema of a topic as a JSON file.
gts -f filepath -p test_project -t test_topic
Query topics.
lt -p test_project
Import a JSON file to create a topic.
rtt -s 3 -l 3 -c test_comment -f filepath -p test_project -t test_topic
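The JSON file imported by rtt (and exported by gts) describes the topic schema. The layout below is an assumed sketch for a TUPLE topic with a single field; export an existing topic with gts to see the authoritative format.
{"fields":[{"name":"name","type":"STRING","notnull":false}]}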
Manage DataConnectors
Create a DataConnector to synchronize data to MaxCompute.
-m: the partition mode. Valid values: SYSTEM_TIME, USER_DEFINE, EVENT_TIME, and META_TIME.
-tr: the partition interval. Default value: 60. Unit: minutes.
-tf: the partition format. A value of ds indicates that data is partitioned by day. A value of ds hh indicates that data is partitioned by hour. A value of ds hh mm indicates that data is partitioned by minute.
coc -p test_project -t test_topic -m SYSTEM_TIME -e odpsEndpoint -op odpsProject -ot odpsTable -oa odpsAccessId -ok odpsAccessKey -tr 60 -c (field1,field2) -tf ds hh mm
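As an illustration of the partition format: with -m SYSTEM_TIME and -tf ds hh, a record written at 12:30 on January 1, 2024 would be routed to the MaxCompute partition for that day and hour, for example ds=20240101, hh=12. The exact partition value format here is an assumption; verify it against the partitions that the DataConnector actually creates.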
Create a field for a DataConnector.
acf -p test_project -t test_topic -c connectorId -f fieldName
Create a DataConnector to synchronize data to ApsaraDB RDS, ApsaraDB RDS for MySQL, or AnalyticDB for MySQL.
-ty: the type of the DataConnector. Valid values:
mysql: creates a DataConnector to synchronize data to ApsaraDB RDS for MySQL.
ads: creates a DataConnector to synchronize data to AnalyticDB for MySQL.
-ht: the write mode. Valid values:
IGNORE
OVERWRITE
-n: the fields to be synchronized. Example: (field1,field2).
cdc -p test_project -t test_topic -h host -po 3306 -ty mysql -d mysql_database -ta mysql_table -u username -pa password -ht IGNORE -n (field1,field2)
Create a DataConnector to synchronize data between DataHub topics.
-m: the authentication method. Valid values:
AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.
STS: uses a Security Token Service (STS) token for authentication.
cdhc -p test_project -t test_topic -sp sourceProject -st sourceTopic -m AK -i accessId -k accessKey
Create a DataConnector to synchronize data to Function Compute.
-au: the authentication method. Valid values:
AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.
STS: uses an STS token for authentication.
-n: the fields to be synchronized. Example: (field1,field2).
cfc -p test_project -t test_topic -e endpoint -s service -f function -au AK -i accessId -k accessKey -n (field1,field2)
Create a DataConnector to synchronize data to Hologres.
-au: the authentication method. You can use only the AccessKey pair for authentication.
-m: the parsing type. If you set this option to Delimiter, you must specify the lineDelimiter, parseData, and columnDelimiter fields. If you set this option to InformaticaJson, you must specify the parseData field.
Delimiter
InformaticaJson
chc -p test_project -t test_topic -e endpoint -cl (field1,field2) -au AK -hp holoProject -ht holoTopic -i accessId -k accessKey -m Delimiter -l 1 -b false -n (field1,field2)
Create a DataConnector to synchronize data to Tablestore.
-m: the authentication method. Default value: STS. Valid values:
AK: uses the AccessKey pair for authentication. You must enter the AccessKey ID and the AccessKey secret.
STS: uses an STS token for authentication.
-wm: the write mode. Valid values:
PUT
UPDATE
-c: the fields to be synchronized. Example: (field1,field2).
cotsc -p test_project -t test_topic -i accessId -k accessKey -it instanceId -m AK -t table -wm PUT -c (field1,field2)
Create a DataConnector to synchronize data to Object Storage Service (OSS).
csc -p test_project -t test_topic -b bucket -e endpoint -pr ossPrefix -tf ossTimeFormat -tr timeRange -c (f1,f2)
Delete a DataConnector.
You can specify multiple DataConnector IDs. Separate the IDs with spaces.
dc -p test_project -t test_topic -c connectorId
Query the details of a DataConnector.
gc -p test_project -t test_topic -c connectorId
Query the DataConnectors in a topic.
lc -p test_project -t test_topic
Restart a DataConnector.
rc -p test_project -t test_topic -c connectorId
Update the AccessKey pair for a DataConnector.
uca -p test_project -t test_topic -c connectorId -ty connectorType -a accessId -k accessKey
Manage shards
Merge shards.
ms -p test_project -t test_topic -s shardId -a adjacentShardId
Split a shard.
ss -p test_project -t test_topic -s shardId
Query all shards in a topic.
ls -p test_project -t topicName
Query the synchronization status of a shard.
gcs -p test_project -t test_topic -s shardId -c connectorId
Query the consumer offset of each shard.
gso -p test_project -t test_topic -s subid -i shardId
Manage subscriptions
Create a subscription.
css -p test_project -t test_topic -c comment
Delete a subscription.
dsc -p test_project -t test_topic -s subId
Query subscriptions.
lss -p test_project -t test_topic
Upload and download data
Upload data.
-f: the path of the file to be uploaded. Add escape characters for a path in Windows, for example, D:\\test\\test.txt.
-m: the text delimiter. Commas (,) and spaces can be used as delimiters.
-n: the size of the data to be uploaded each time. Default value: 1000.
uf -f filepath -p test_project -t test_topic -m "," -n 1000
Example: Upload a CSV file
The following example shows how to use the console command-line tool to upload a CSV file to DataHub. The following shows the format of the CSV file.
0,qe614c760fuk8judu01tn5x055rpt1,true,100.1,14321111111
1,znv1py74o8ynn87k66o32ao4x875wi,true,100.1,14321111111
2,7nm0mtpgo1q0ubuljjjx9b000ybltl,true,100.1,14321111111
3,10t0n6pvonnan16279w848ukko5f6l,true,100.1,14321111111
4,0ub584kw88s6dczd0mta7itmta10jo,true,100.1,14321111111
5,1ltfpf0jt7fhvf0oy4lo8m3z62c940,true,100.1,14321111111
6,zpqsfxqy9379lmcehd7q8kftntrozb,true,100.1,14321111111
7,ce1ga9aln346xcj761c3iytshyzuxg,true,100.1,14321111111
8,k5j2id9a0ko90cykl40s6ojq6gruyi,true,100.1,14321111111
9,ns2zcx9bdip5y0aqd1tdicf7bkdmsm,true,100.1,14321111111
10,54rs9cm1xau2fk66pzyz62tf9tsse4,true,100.1,14321111111
Each line is a record to be written to DataHub. Fields are separated by commas (,). Save the CSV file as /temp/test.csv on the on-premises computer. The following table describes the schema of the DataHub topic to which the CSV file is written.
| Field name | Data type |
|---|---|
| id | BIGINT |
| name | STRING |
| gender | BOOLEAN |
| salary | DOUBLE |
| my_time | TIMESTAMP |
Run the following command by using the console command-line tool:
uf -f /temp/test.csv -p test_project -t test_topic -m "," -n 1000
Download data.
-f: the storage path of the file to be downloaded. Add escape characters for a path in Windows, for example, D:\\test\\test.txt.
-ti: the offset from which you want to read data, in the format of yyyy-mm-dd hh:mm:ss.
-l: the size of the data that is to be read each time.
-g: specifies whether to read data all the time.
0: reads data only once. No more consumption occurs after the specified size of data is read.
1: reads data all the time.
down -p test_project -t test_topic -s shardId -d subId -f filePath -ti "1970-01-01 00:00:00" -l 100 -g 0
FAQ
Failed to start the script: If the script fails to start in Windows, check whether the path of the script contains parentheses ().