BigQuery data source

0.0.201

DataWorks provides BigQuery Reader for you to read data from BigQuery data sources. You can configure a synchronization task for a BigQuery data source by using the codeless user interface (UI) or code editor. This topic describes the capabilities of synchronizing data from BigQuery data sources.

Supported BigQuery versions and supported regions

BigQuery uses the SDK google-cloud-bigquery 2.29.0. For more information about the SDK capabilities supported by BigQuery, see the BigQuery official documentation.
BigQuery data sources are supported in the following regions:
China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), UK (London), US (Silicon Valley), and US (Virginia).

Data type mappings

For more information about BigQuery data types, see the BigQuery official documentation. The following table provides the support status of main data types in BigQuery.

BigQuery data type	Java data type

BigQuery data type	Java data type
BOOL	Bool
INT64	Long
FLOAT64	BigDecimal
NUMERIC	BigDecimal
BIGNUMERIC	BigDecimal
STRING	String
BYTES	Bytes
STRUCT	String
ARRAY	String
TIMESTAMP	Date
DATE	Date
TIME	Date
DATETIME	Date
GEOGRAPHY	String
JSON	String
INTERVAL	String

Establish a network connection between a BigQuery data source and an exclusive resource group for Data Integration

Before you use a BigQuery data source for data synchronization in Data Integration, you must establish a network connection between an exclusive resource group for Data Integration and the data source to enable the resource group to access the data source over an internal network address. For more information, see Establish a network connection between a resource group and a data source.

Develop a data synchronization task

For information about the entry point for and the procedure of configuring a data synchronization task, see the following subsections. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.

Add a data source

Before you configure a data synchronization task to synchronize data from or to a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.

When you add a BigQuery data source, you must configure the following items:

BigQuery Project ID: the name of a BigQuery project of Google Cloud.
BigQuery authorization and authentication: You must upload the authentication files of Google Cloud.

Configure a batch synchronization task to synchronize data of a single table

For more information about the configuration procedure, see Configure a batch synchronization task by using the codeless UI and Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader in the code editor.

Code for BigQuery Reader

{
  "stepType": "bigquery"
  "parameter":
  {
    "datasource":"bq_test1",
    "table": "partition_1107",
    "where": "xxx=3",
    "dataSet": "database_0724",
    "partition": [
      "_PARTITIONTIME='2023-11-07'"
     ],
    "column":
    [
      "id",
      "table_id",
      "table_no",
      "table_name",
      "table_status"
    ]
  },
  "name": "Reader",
  "category": "reader"
}

Parameters in code for BigQuery Reader

Parameter	Description	Required	Default value

Parameter	Description	Required	Default value
datasource	The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor.	Yes	No default value
dataset	The BigQuery dataset.	Yes	No default value
table	The name of the table from which you want to read data.	Yes	No default value
column	The names of the fields from which you want to read data. Separate the names with commas (,), such as "column": ["id", "name", "age"].	Yes	No default value
where	The WHERE clause used to filter data. BigQuery Reader concatenates the values of the column, table, and where parameters to form an SQL statement and executes the SQL statement to read data from the source table. For example, when you perform a test, you can set the where parameter to `LIMIT 10`. To read the data that is generated on the current day, you can set the where parameter to `gmt_create>$bizdate`. You can use the WHERE clause to read incremental data. If the where parameter is not provided or is left empty, BigQuery Reader does not filter data based on a condition.	No	No default value
partition	The name of the partition from which you want to read data. You can specify multiple partition names at a time.	No	No default value
splitPk	If the partition parameter is configured, the splitPk parameter does not take effect. If you configure the splitPk parameter, data is sharded based on the field that you specify in the splitPk parameter. Data Integration then runs parallel threads to read data. This way, data can be synchronized more efficiently.	No	No default value

Feedback

Previous: Azure Blob Storage data sourceNext: ClickHouse data source

On this page （1, T）

Supported BigQuery versions and supported regions

Data type mappings

Establish a network connection between a BigQuery data source and an exclusive resource group for Data Integration

Develop a data synchronization task

Add a data source

Configure a batch synchronization task to synchronize data of a single table

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

Code for BigQuery Reader

Parameters in code for BigQuery Reader

Chat now with Alibaba Cloud Customer Service to assist you in finding the right products and services to meet your needs.

Supported BigQuery versions and supported regions

Data type mappings

Establish a network connection between a BigQuery data source and an exclusive resource group for Data Integration

Develop a data synchronization task

Add a data source

Configure a batch synchronization task to synchronize data of a single table

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

Code for BigQuery Reader

Parameters in code for BigQuery Reader

Sales Support

Technical Support

Connect & Report Abuse

About Alibaba Cloud

Our Global Network

Quick Start

Global Offices

Olympic Games Paris 2024 New

Stade Roland Garros – Glitz from the Past New

Place de la Concorde – “Breaking” the Barriers New

Vaires-sur-Marne Nautical Stadium – Sports with Sustainability New

International Broadcast Center – Images, Sounds, and Data that Captivate Billions New

Customer Success Stories New

Trust Center

Security & Compliance Center

Cloud Compliance Resources

Security Compliance FAQs

Product & Feature Update New

Cloud Forward

Press Room

Alibaba Cloud e-Magazine New

Alibaba Cloud in Analyst Research

Notice

Go Global Service New

Go Global Alliance with Alibaba Cloud

China Gateway Hot

Information Compliance

China Gateway - MLPS 2.0 Compliance New

China Gateway - Networking

China Gateway - Global Application Acceleration New

China Gateway - Security

China Gateway - Data Security New

ICP Support Hot

China Gateway - Omnichannel Data Mid-End New

China Gateway - Organizational Data Mid-End New

China Gateway - Business Mid-End New

China Gateway - AI Service for Conversational Chatbots New

China Gateway - Online Education

China Gateway - Domain Registration

Work at Alibaba Cloud

Experienced Professionals

Students and Graduates

Free Trial

Pricing

Promo Center

Price Reduction

Pay Less and Deploy More

FinOps

Elastic Compute Service (ECS)

Simple Application Server (SAS)

Elastic GPU Service

Elastic Desktop Service (EDS)

Object Storage Service (OSS)

Cloud Enterprise Network (CEN)

Web Application Firewall (WAF)

Domain Names

Container Compute Service (ACS)

Secure Access Service Edge (SASE)

Intelligent Media Services(IMS)

Edge Security Acceleration (ESA)(Original DCDN)

Intelligent Media Management

DingTalk Enterprise

YiDA

Alibaba Cloud Model Studio

Apsara Prime - For Easy Cloud Product Selection

Alibaba Cloud ECS - Cater All Your Cloud Hosting Needs

1TB CDN—Get Free 1 TB Outbound Traffic Plan Now

Security—Under Attack? Get Free Security Support

Short Message Service - Free Testing is Available

Elastic Compute Service (ECS) Hot

CloudBox

Compute Nest

Dedicated Host Hot

ECS Bare Metal Instance

Elastic Desktop Service (EDS) Featured

Cloud Phone Beta

Elastic GPU Service Featured

Simple Application Server (SAS) Hot

Auto Scaling

Batch Compute

Elastic High Performance Computing (E-HPC)

Super Computing Cluster (SCC)

Function Compute (FC)