BigQuery as a source connection

BigQuery is a fully managed, serverless data warehouse provided by Google Cloud Platform that enables fast SQL queries using the processing power of Google's infrastructure.

Using BigQuery as a source in Data Integration, you can leverage the power of BigQuery's data storage and querying capabilities while integrating that data into your data workflows managed by Data Integration.

BigQuery compatibility versions

| River modes | Versions supported | Support for managing schema drift |
| --- | --- | --- |
| Multi-Tables (Standard Extraction) | All versions | Yes |
| Custom Query | All versions | Yes |

Prerequisites

To establish a connection between Google BigQuery and Data Integration, ensure you have a BigQuery account with the necessary permissions.

Account and permissions requirements

There are two options for connection configuration:

  • Default service account and bucket provided by Data Integration
  • Custom service account and bucket

| Connection configuration option | Role assignment | Permission grant for the service account |
| --- | --- | --- |
| Default service account and bucket | You need a BigQuery account that can grant the BigQuery Job User and BigQuery Data Viewer roles, which are required for data retrieval and processing. | N/A |
| Custom service account and bucket | You need a BigQuery account that can assign the BigQuery Job User and BigQuery Data Viewer roles for data handling. Additionally, assign the Storage Admin role on the bucket designated as the Custom File Zone to enable access and to add the service account's buckets to the connection list. | Grant the storage.buckets.list permission at the project level by attaching it to a role that contains this permission and assigning that role to the service account. |

Default service account and bucket configuration

Procedure

  1. Enable necessary APIs:

    • Navigate to the Google Cloud console and access the APIs & Services section.
    • Click Library.
    • Find Google Cloud Storage JSON API and enable it.
    • Find the BigQuery API and enable it.
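
If you prefer to script this step, the same two APIs can be enabled through the Service Usage API. Below is a minimal sketch using the google-api-python-client library; it assumes Application Default Credentials are configured, and my-project-id is a placeholder for your project ID:

```python
from googleapiclient import discovery

# Uses Application Default Credentials
# (e.g., set up via `gcloud auth application-default login`).
serviceusage = discovery.build("serviceusage", "v1")

project = "my-project-id"  # placeholder: your Google Cloud project ID
for api in ("bigquery.googleapis.com", "storage-api.googleapis.com"):
    # services.enable is the API call behind the console's Enable button.
    serviceusage.services().enable(
        name=f"projects/{project}/services/{api}"
    ).execute()
```
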
  2. Grant permissions:

    Data Integration automatically generates a dedicated service account and a Google Cloud Storage (GCS) folder. This service account has access only to that folder.

    • To grant permissions, sign in to the Google Cloud Platform console and ensure you are in the desired project.
    • Go to IAM & Admin and click IAM.
    • Click on +GRANT ACCESS under the VIEW BY PRINCIPALS section.
    • Add the Data Integration service account under the New Principals section.
    • Assign the BigQuery Data Viewer and BigQuery Job User roles and save the settings.
  3. Configure BigQuery source connection in Data Integration:

  • Log in to your Data Integration account.
  • Create a new connection for your BigQuery instance:
    • Go to Connections.
    • Select New Connection.
    • Choose Google BigQuery (Source) from the source list.
    • Enter your Google Platform Service Account credentials:
      • Connection Name
      • Project ID (found in the Google Cloud Platform Home section)
      • Region: Select your dataset region. Data Integration exports data to a default bucket in the same region.
  • Test the connection and save the settings.
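
Optionally, before testing the connection, you can confirm that both roles were granted at the project level. A sketch using the google-cloud-resource-manager library; my-project-id and SERVICE_ACCOUNT_EMAIL are placeholders for your project ID and the service account shown by Data Integration:

```python
from google.cloud import resourcemanager_v3

project_id = "my-project-id"  # placeholder: your Google Cloud project ID
member = "serviceAccount:SERVICE_ACCOUNT_EMAIL"  # account shown by Data Integration

client = resourcemanager_v3.ProjectsClient()
policy = client.get_iam_policy(request={"resource": f"projects/{project_id}"})

# The two roles assigned in the step above.
wanted = {"roles/bigquery.dataViewer", "roles/bigquery.jobUser"}
granted = {b.role for b in policy.bindings if member in b.members}
print("Missing roles:", wanted - granted or "none")
```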

Custom service account and bucket configuration

note

Data Integration transfers your data through a Google Cloud Storage bucket. Before you begin, create a user in the Google Cloud Platform console with permissions to access the relevant storage bucket and BigQuery project.

Procedure

  1. Enable necessary APIs:

    Follow the steps in Default service account and bucket configuration to enable the Google Cloud Storage JSON API and the BigQuery API.

  2. Create a Google Cloud service account:

    note

    The custom service account can be an existing service account or a newly created one. To create a new service account in the Google Cloud Platform, your user must have the Service Account Admin role.

    • Sign in to the Google Cloud Platform console.
    • Go to IAM & Admin > Service accounts and click +CREATE SERVICE ACCOUNT.
    • In the Service Account Creation window:
      • Specify the Service Account name (For example, Data_Integration User).
      • Click CREATE AND CONTINUE.
      • Grant the service account project access by selecting the BigQuery Data Viewer and BigQuery Job User roles, then click CONTINUE.
      • Optionally, grant users access to this service account. Click DONE to complete the process.
      • The new service account then appears in the Google Cloud console.
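
As an alternative to the console flow above, the service account can be created through the IAM API. A sketch using google-api-python-client; the account ID and display name are examples, and the two BigQuery roles are still granted afterwards as described:

```python
from googleapiclient import discovery

project = "my-project-id"  # placeholder: your Google Cloud project ID
iam = discovery.build("iam", "v1")

# Equivalent of +CREATE SERVICE ACCOUNT in the console.
account = iam.projects().serviceAccounts().create(
    name=f"projects/{project}",
    body={
        "accountId": "data-integration-user",  # becomes the email prefix
        "serviceAccount": {"displayName": "Data_Integration User"},
    },
).execute()
print("Created:", account["email"])
```
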
  3. Provide access for the service account: Grant the storage.buckets.list permission at the project level by associating it with a Role that includes this specific permission and assigning it to the service account.

    note

    This permission is essential for retrieving Service account buckets and adding them to the connection list.
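
A scripted version of this step creates a role carrying only storage.buckets.list and binds it at the project level. A sketch using google-api-python-client; the role ID bucketLister is a hypothetical name, and SERVICE_ACCOUNT_EMAIL stands in for the account created in the previous step:

```python
from googleapiclient import discovery

project = "my-project-id"  # placeholder: your Google Cloud project ID
member = "serviceAccount:SERVICE_ACCOUNT_EMAIL"  # from the previous step

# 1. Create a custom role containing only storage.buckets.list.
iam = discovery.build("iam", "v1")
role = iam.projects().roles().create(
    parent=f"projects/{project}",
    body={
        "roleId": "bucketLister",  # hypothetical role ID
        "role": {
            "title": "Bucket Lister",
            "includedPermissions": ["storage.buckets.list"],
        },
    },
).execute()

# 2. Bind the new role to the service account at the project level.
crm = discovery.build("cloudresourcemanager", "v1")
policy = crm.projects().getIamPolicy(resource=project, body={}).execute()
policy["bindings"].append({"role": role["name"], "members": [member]})
crm.projects().setIamPolicy(resource=project, body={"policy": policy}).execute()
```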

  4. Create a Google Cloud storage bucket:

  • Sign in to the Google Cloud Platform console.
  • Go to Cloud Storage > Buckets and click +CREATE.
  • In the Bucket Creation window:
    • Set Bucket Name (For example, project_name_data_integration_file_zone).
    • Choose a Region for the bucket.
    • Select a storage class.
    • Configure the object access for your Bucket as Uniform, and make sure to select the option Enforce public access prevention on this bucket.
    • Click CREATE.
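
The same bucket can be created with the google-cloud-storage client library. A sketch; the project ID, bucket name, storage class, and region are examples and should match your own choices:

```python
from google.cloud import storage

client = storage.Client(project="my-project-id")  # placeholder project ID

bucket = storage.Bucket(client, name="project_name_data_integration_file_zone")
bucket.storage_class = "STANDARD"
# Uniform object access with enforced public access prevention,
# mirroring the console settings above.
bucket.iam_configuration.uniform_bucket_level_access_enabled = True
bucket.iam_configuration.public_access_prevention = "enforced"

client.create_bucket(bucket, location="us-central1")  # match your dataset region
```
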
  5. Provide access to the dedicated bucket for the service account:
  • Navigate to Cloud Storage and click Buckets.
  • Select the intended bucket (designated for Custom File Zone).
  • Within the Permissions section, click on the +GRANT ACCESS option.
  • In the Add Principals section, include your service account.
  • For role assignment, designate the Storage Admin role for the specified Custom File Zone.
  • Click Save to complete the process.
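
The same grant, sketched with the google-cloud-storage library; the bucket name and service account email are placeholders:

```python
from google.cloud import storage

client = storage.Client(project="my-project-id")  # placeholder project ID
bucket = client.bucket("project_name_data_integration_file_zone")

# Append a Storage Admin binding for the service account, scoped to this bucket.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.admin",
    "members": {"serviceAccount:SERVICE_ACCOUNT_EMAIL"},  # placeholder email
})
bucket.set_iam_policy(policy)
```
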
  6. Configure the BigQuery source connection in Data Integration:

  • Log in to your Data Integration account.

  • Create a new connection for your BigQuery instance:

    • Go to Connections.
    • Select New Connection.
    • Choose Google BigQuery (Source) from the source list.
    • Enter your Google Platform Service Account credentials.
  • Enable the Custom File Zone toggle. To learn more, refer to Custom Service Account and Bucket Configuration.

  • Provide the Service Account email.

  • Submit your Service Account Private Key (JSON) file. To create the JSON key file for the service account:

    • Log in to the Google Cloud Platform console.
    • Navigate to IAM & Admin and click Service accounts.
    • Select the relevant service account and open the Manage keys option.
    • Open the ADD KEY drop-down and choose Create new key.
    • Select the JSON key type and click CREATE.
    • This generates a JSON file that is downloaded to your local device.
  • Set the Service Account Private Key (JSON); this automatically populates the Service Account Email and Project ID fields.

  • Set the Default Bucket to the one created earlier, ensuring Region consistency.

  • Test the connection and save the settings.
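
Before testing in Data Integration, you can optionally sanity-check the downloaded key locally: a trivial query exercises the BigQuery Job User role, and listing buckets exercises the storage.buckets.list permission. A sketch assuming the key was saved as key.json:

```python
from google.cloud import bigquery, storage

KEY_PATH = "key.json"  # the downloaded Service Account Private Key (JSON)

# Job User: creating a query job requires bigquery.jobs.create.
bq = bigquery.Client.from_service_account_json(KEY_PATH)
print(list(bq.query("SELECT 1 AS ok").result()))

# storage.buckets.list: these are the Custom File Zone candidates.
gcs = storage.Client.from_service_account_json(KEY_PATH)
print([b.name for b in gcs.list_buckets()])
```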

Project ID and custom file zone association

The project ID specified in the attached Service Account Private Key (JSON) file determines the Custom File Zone: the Project ID is extracted from the key file, and only buckets that belong to that project are available for selection as the Custom File Zone.
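
The key file is plain JSON, so you can inspect which project a given key ties the Custom File Zone to. For example:

```python
import json

with open("key.json") as f:  # the Service Account Private Key (JSON)
    key = json.load(f)

# The same two fields Data Integration auto-populates in the connection form.
print("Project ID:", key["project_id"])
print("Service Account Email:", key["client_email"])
```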

Supported regions

Users can access BigQuery data across various geographic locations, which offers increased flexibility and accessibility. Multiple tables are accessible to users regardless of their location. However, some tables might not be included in this accessibility framework.

This discrepancy arises from BigQuery's requirement that users specify the location of each table explicitly; specific tables may not be readily available because of it. Users should therefore be aware of this limitation and take the necessary steps to address any missing data.

info

In Data Integration, there is a strict region definition; the region in the connection must match the region of the datasets.
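
To check the match ahead of time, you can list each dataset's location with the BigQuery client library. A sketch reusing the key.json file from earlier:

```python
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("key.json")

# The Region set on the connection must match dataset.location
# for every dataset you plan to extract.
for item in client.list_datasets():
    dataset = client.get_dataset(item.reference)
    print(dataset.dataset_id, "->", dataset.location)
```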
