DataWorks provides StarRocks Reader and StarRocks Writer for you to read data from and write data to StarRocks data sources. This topic describes the capabilities of synchronizing data from or to StarRocks data sources.
Supported versions
E-MapReduce (EMR) Serverless StarRocks 2.5 and 3.1.
StarRocks 2.1 in an EMR on ECS cluster.
StarRocks Community Edition. For more information, visit the StarRocks official website.
Note
StarRocks Community Edition is highly open and evolves quickly across releases. If you encounter a compatibility issue when you use a StarRocks data source, submit a ticket.
Data type mappings
Most StarRocks data types, including numeric, string, and date data types, are supported.
Preparations before data synchronization
To ensure network connectivity for the resource group that you want to use, you must add the IP address or CIDR block of the resource group to the internal IP address whitelist of the desired EMR Serverless StarRocks instance in advance. In addition, you must allow the CIDR block to access ports 9030 (FE query port), 8030 (FE HTTP port), and 8040 (BE HTTP port).
For more information about how to obtain the IP address or CIDR block of a resource group in DataWorks, see Configure an IP address whitelist.
The following figure shows the entry points for accessing IP address whitelists of an EMR Serverless StarRocks instance.

Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. When you add a data source, you can view the infotips of parameters in the DataWorks console to understand their meanings.
The following content describes the configuration of the Java Database Connectivity (JDBC) URL when you add a StarRocks data source:
If you add an EMR Serverless StarRocks instance as a data source, the JDBC URL is specified in the following format:
jdbc:mysql://<URL of the FE node>:<Query port of the FE node>/<Database name>.
FE node information: You can obtain the address and ports of the FE node on the details tab of the desired instance.

Database information: After you use EMR StarRocks Manager to connect to the instance, you can view the database information on the SQL editor or metadata management page.

Note
If you need to create a database, you can directly run SQL statements in the SQL editor.
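Putting the pieces above together, the JDBC URL is assembled from the FE node address, the query port, and the database name. The following sketch shows the assembly; the host and database values are placeholders, so substitute the values shown on your instance's details tab (9030 is the default FE query port):

```python
def build_jdbc_url(fe_host: str, query_port: int, database: str) -> str:
    """Assemble the JDBC URL for a StarRocks FE node.

    The host and database below are hypothetical examples; use the
    values from your EMR Serverless StarRocks instance's details tab.
    """
    return f"jdbc:mysql://{fe_host}:{query_port}/{database}"

url = build_jdbc_url("fe-node.example.internal", 9030, "didb1")
print(url)  # jdbc:mysql://fe-node.example.internal:9030/didb1
```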
Develop a data synchronization task
For information about the entry point for configuring a synchronization task and the configuration procedure, see the following configuration guides.
Configure a batch synchronization task to synchronize data of a single table
Appendix: Code and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Code for StarRocks Reader
{
    "stepType": "starrocks",
    "parameter": {
        "selectedDatabase": "didb1",
        "datasource": "starrocks_datasource",
        "column": [
            "id",
            "name"
        ],
        "where": "id>100",
        "table": "table1",
        "splitPk": "id"
    },
    "name": "Reader",
    "category": "reader"
}
Parameters in code for StarRocks Reader
| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the StarRocks data source. | Yes | No default value |
| selectedDatabase | The name of the StarRocks database. | No | The name of the database that is configured in the StarRocks data source |
| column | The names of the columns from which you want to read data. | Yes | No default value |
| where | The WHERE clause. For example, you can set this parameter to gmt_create>$bizdate to read the data that is generated on the current day. You can use the WHERE clause to read incremental data. If the where parameter is not provided or is left empty, StarRocks Reader reads all data. | No | No default value |
| table | The name of the table from which you want to read data. | Yes | No default value |
| splitPk | The field that is used for data sharding when StarRocks Reader reads data. If you specify this parameter, the source data is sharded based on the value of this parameter, and parallel threads are used to read data, which improves synchronization efficiency. We recommend that you set splitPk to the name of the primary key column of the table, because data can be distributed evenly across shards based on the primary key instead of being concentrated in a few shards. | No | No default value |
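To illustrate how splitPk-based sharding enables parallel reads, the following conceptual sketch (not DataWorks' internal implementation) splits a numeric primary-key range into one WHERE clause per shard, which parallel threads could then read independently:

```python
def split_pk_ranges(min_pk: int, max_pk: int, num_shards: int) -> list[str]:
    """Split the inclusive range [min_pk, max_pk] into up to num_shards
    WHERE clauses, one per parallel reader thread.

    Conceptual sketch only: the real reader determines the min/max values
    by querying the table and may handle boundaries differently.
    """
    span = max_pk - min_pk + 1
    step = (span + num_shards - 1) // num_shards  # ceiling division
    clauses = []
    lo = min_pk
    while lo <= max_pk:
        hi = min(lo + step - 1, max_pk)
        clauses.append(f"id >= {lo} AND id <= {hi}")
        lo = hi + 1
    return clauses

for clause in split_pk_ranges(1, 100, 4):
    print(clause)
# id >= 1 AND id <= 25
# id >= 26 AND id <= 50
# id >= 51 AND id <= 75
# id >= 76 AND id <= 100
```

A primary key tends to be spread uniformly over its range, which is why the evenly sized ranges above translate into evenly sized shards.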
Code for StarRocks Writer
{
    "stepType": "starrocks",
    "parameter": {
        "selectedDatabase": "didb1",
        "loadProps": {
            "row_delimiter": "\\x02",
            "column_separator": "\\x01"
        },
        "datasource": "starrocks_public",
        "column": [
            "id",
            "name"
        ],
        "loadUrl": [
            "1.1.1.1:8030"
        ],
        "table": "table1",
        "preSql": [
            "truncate table table1"
        ],
        "postSql": [
        ],
        "maxBatchRows": 500000,
        "maxBatchSize": 5242880
    },
    "name": "Writer",
    "category": "writer"
}
Parameters in code for StarRocks Writer
| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the StarRocks data source. | Yes | No default value |
| selectedDatabase | The name of the StarRocks database. | No | The name of the database that is configured in the StarRocks data source |
| loadProps | The request parameters for the StarRocks Stream Load import method. If you want to import data as CSV files by using Stream Load, you can configure request parameters. If you have no special requirements, set this parameter to {}. Supported request parameters: column_separator specifies the column delimiter of a CSV file (default: \t); row_delimiter specifies the row delimiter of a CSV file (default: \n). If the data that you want to write to StarRocks contains \t or \n, you must use other characters as delimiters. Example: { "column_separator": "\\x01", "row_delimiter": "\\x02"} | Yes | No default value |
| column | The names of the columns to which you want to write data. | Yes | No default value |
| loadUrl | The URL of a StarRocks frontend node, which consists of the IP address of the frontend node and the HTTP port number. The default HTTP port number is 8030. If you specify URLs for multiple frontend nodes, separate them with commas (,). | Yes | No default value |
| table | The name of the table to which you want to write data. | Yes | No default value |
| preSql | The SQL statement that you want to execute before the synchronization task is run. For example, you can set this parameter to the TRUNCATE TABLE tablename statement to delete outdated data. | No | No default value |
| postSql | The SQL statement that you want to execute after the synchronization task is run. | No | No default value |
| maxBatchRows | The maximum number of rows that can be written to StarRocks at a time. | No | 500000 |
| maxBatchSize | The maximum number of bytes that can be written to StarRocks at a time. | No | 5242880 |
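To see why non-default delimiters matter, the following sketch (a hypothetical helper, not part of DataWorks) serializes rows into a CSV-style payload the way a Stream Load request body would be built, using the \x01 and \x02 delimiters from the loadProps example above. Note that the JSON config writes them in escaped form ("\\x01"), whereas in a payload they are the actual control characters:

```python
def serialize_rows(rows, column_separator="\x01", row_delimiter="\x02"):
    """Join each row's fields with the column separator and join rows
    with the row delimiter, as in a CSV payload for Stream Load.

    Using \x01/\x02 keeps fields intact even when they contain the
    default delimiters \t and \n.
    """
    return row_delimiter.join(
        column_separator.join(str(field) for field in row) for row in rows
    )

# Fields containing \t and \n would corrupt a default-delimited payload,
# but survive here because the delimiters differ:
payload = serialize_rows([(1, "line1\nline2"), (2, "tab\there")])
assert len(payload.split("\x02")) == 2  # still exactly two rows
```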