Creating a connection to Google BigQuery as a target
This guide provides step-by-step instructions for setting up a connection between Data Integration and Google BigQuery as a target: configuring your Google Cloud Platform (GCP) account, setting up the necessary resources, and creating the connection in Data Integration. It also covers managing service accounts and Google Cloud Storage buckets, whether you use Data Integration’s managed resources or configure your own via a Custom File Zone.
For additional details, refer to the schema drift and known issues sections below. Once configured, you can transfer data from Data Integration to Google BigQuery.
Prerequisites
Ensure you have signed up for Google Cloud Platform (GCP) and have a console Admin user to manage the necessary resources. Then confirm the following:
- Enable BigQuery API
- Enable Google Cloud Storage JSON API
- BigQuery Dataset: Ensure you have an available dataset or create a new one if needed.
- Google Cloud Storage Bucket: Required only if you opt for the Custom File Zone setup.
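If you prefer to verify these prerequisites programmatically, here is a minimal sketch using the official Google Cloud client libraries. All project, dataset, and bucket names below are placeholders; substitute your own.

```python
# A minimal prerequisite check, assuming the google-cloud-bigquery and
# google-cloud-storage libraries are installed and application-default
# credentials are configured. All names below are placeholders.
from google.cloud import bigquery, storage
from google.cloud.exceptions import NotFound

PROJECT_ID = "my-gcp-project"        # placeholder
DATASET_ID = "my_dataset"            # placeholder
BUCKET_NAME = "my-file-zone-bucket"  # only needed for a Custom File Zone

bq = bigquery.Client(project=PROJECT_ID)
try:
    dataset = bq.get_dataset(DATASET_ID)
    print(f"Dataset {DATASET_ID} found in location {dataset.location}")
except NotFound:
    # Create the dataset if it does not exist yet.
    bq.create_dataset(DATASET_ID)
    print(f"Created dataset {DATASET_ID}")

gcs = storage.Client(project=PROJECT_ID)
print("Bucket exists:", gcs.bucket(BUCKET_NAME).exists())
```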
Step 1: creating a Google BigQuery connection in Data Integration
- In the Data Integration console, navigate to Connections > New Connection and select Google BigQuery (Target).
- Retrieve your Google Cloud Platform Project ID and Project Number from the GCP console home page.
- Enter these values in the Data Integration connection screen.
- Select the region where your BigQuery dataset is located.
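As an alternative to reading the Project ID and Project Number off the console home page, you can fetch them via the Resource Manager API. This is a hedged sketch, assuming application-default credentials are configured on your machine.

```python
# Look up the Project ID and Project Number programmatically.
# Assumes google-auth and google-api-python-client are installed.
import google.auth
from googleapiclient import discovery

# project_id may be None if no default project is configured;
# in that case, set it explicitly.
credentials, project_id = google.auth.default()

crm = discovery.build("cloudresourcemanager", "v1", credentials=credentials)
project = crm.projects().get(projectId=project_id).execute()
print("Project ID:    ", project["projectId"])
print("Project Number:", project["projectNumber"])
```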
Step 2: choosing service account management (File Zone Setup)
You can either use Data Integration’s default service account and storage bucket, or opt to manage your own via the Custom File Zone toggle:
- Default (Custom File Zone OFF): Data Integration automatically provisions a service account and Google Cloud Storage bucket.
- Custom file zone (Custom File Zone ON): If you enable this, Data Integration uses a Google Cloud Storage bucket you manage. This option stages data in your own Google Cloud bucket before inserting it into BigQuery.
2.1. working with Data Integration’s default file zone
Data Integration automatically creates and manages a dedicated service account and Google Cloud Storage bucket. However, you must grant access to Data Integration’s service account.
Procedure to grant permission:
- Sign in to Google Cloud Platform console.
- Navigate to IAM & Admin > IAM.
- Click +GRANT ACCESS.
- Add the Data Integration service account as the principal.
- Assign the BigQuery Admin role and click Save.
- Go to APIs & Services > Library.
- Look for the BigQuery API and the Google Cloud Storage JSON API, and enable each by clicking Enable.
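The console steps above can also be scripted. The sketch below grants the BigQuery Admin role through the Resource Manager API; the project ID and service account email are placeholders for the values shown in the Data Integration connection screen.

```python
# Grant roles/bigquery.admin to Data Integration's service account.
# Assumes google-api-python-client is installed and the caller has
# permission to set IAM policy on the project. Names are placeholders.
from googleapiclient import discovery

PROJECT_ID = "my-gcp-project"  # placeholder
SA_EMAIL = "integration@vendor-project.iam.gserviceaccount.com"  # placeholder

crm = discovery.build("cloudresourcemanager", "v1")

# Read the current policy, append a binding, and write it back.
policy = crm.projects().getIamPolicy(resource=PROJECT_ID, body={}).execute()
policy["bindings"].append(
    {"role": "roles/bigquery.admin", "members": [f"serviceAccount:{SA_EMAIL}"]}
)
crm.projects().setIamPolicy(
    resource=PROJECT_ID, body={"policy": policy}
).execute()
```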
2.2. working with a custom file zone
When you use the Custom File Zone option, Data Integration stages data in a Google Cloud Storage bucket that you manage within your GCP project.
Procedure
- Ensure your GCP user has the Service Account Admin role.
- Create a new service account in the GCP Console:
- Go to IAM & Admin > Service Accounts.
- Click CREATE SERVICE ACCOUNT.
- Set the service account name (for example, Data_Integration User).
- Assign BigQuery Admin and Storage Admin roles.
- Go to the new service account’s details page and copy the Service Account ID / Email.
- Manage keys for the service account by selecting Manage Keys > ADD KEY > Create new key (JSON). This downloads the key file locally (a scripted alternative is sketched after this procedure).
- Upload the JSON key to the Data Integration connection.
- Enter your Google Cloud Storage bucket name in the Default Bucket field. If Data Integration cannot detect the bucket list, manually enter the bucket name (refer to the known issues section if permissions are incorrect).
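If you would rather script the service-account setup than click through the console, here is a minimal sketch using the IAM API. The account ID and output file name are placeholders; role assignment still happens through the project IAM policy, as in the earlier snippet.

```python
# Create a service account and download a JSON key for it.
# Assumes google-api-python-client is installed and the caller holds the
# Service Account Admin role. All names are placeholders.
import base64

from googleapiclient import discovery

PROJECT_ID = "my-gcp-project"         # placeholder
ACCOUNT_ID = "data-integration-user"  # placeholder

iam = discovery.build("iam", "v1")

# Create the service account.
sa = iam.projects().serviceAccounts().create(
    name=f"projects/{PROJECT_ID}",
    body={
        "accountId": ACCOUNT_ID,
        "serviceAccount": {"displayName": "Data_Integration User"},
    },
).execute()
print("Service Account ID / Email:", sa["email"])

# Create a JSON key; the response carries the key material base64-encoded.
key = iam.projects().serviceAccounts().keys().create(
    name=f"projects/-/serviceAccounts/{sa['email']}", body={}
).execute()
with open("service-account-key.json", "wb") as f:
    f.write(base64.b64decode(key["privateKeyData"]))
```

The resulting service-account-key.json is the file you upload in the Data Integration connection screen.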
Additional configuration options
Data type widening (Schema Drift)
When the data type of a column changes in the source table, Data Integration automatically adjusts the target table to accommodate a "wider" data type. For example:
- If a column changes from an Integer to a String, the target data type becomes a String so that it can still hold the integer values.
- If a column changes from a String to a Date, it remains a String in the target, since a String can represent any date value.
You can turn this behavior on or off using the toggle in the Additional Options section (both behaviors are illustrated in the sketch below):
- If ON, the River adjusts data types automatically.
- If OFF, the River fails when encountering data type mismatches.
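The following is an illustrative sketch of the widening rule, not Data Integration’s actual implementation: a mismatch resolves to a String when the toggle is ON, and raises an error when it is OFF.

```python
# Illustrative only: a simplified model of data type widening.
def widen(target_type: str, source_type: str, widening_enabled: bool) -> str:
    """Return the data type the target column should take."""
    if target_type == source_type:
        return target_type
    if not widening_enabled:
        # Toggle OFF: the River fails on a type mismatch.
        raise TypeError(f"Type mismatch: {target_type} vs {source_type}")
    # Toggle ON: a String can encode any value, so widen to String.
    return "String"

print(widen("Integer", "String", widening_enabled=True))  # -> String
print(widen("String", "Date", widening_enabled=True))     # -> String
```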
Notifications
To receive warnings about data type changes, set the On Warning toggle to True in the settings.
Known issues
- **Storage admin permissions**: The default "Storage Admin" role may lack the storage.buckets.get permission. If this occurs (a scripted version of the workaround follows this list):
  - Duplicate the "Storage Admin" role and add the missing permission.
  - Assign the custom role to your service account.
- **Location consistency**: When using BigQuery as a target, ensure that the connection and the BigQuery Dataset ID are in the same location (region). Mismatched locations result in errors such as:
  **Error**: Cannot read and write in different locations.
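Here is a hedged sketch of the custom-role workaround via the IAM API; the role ID and the permission list are placeholders (include the permissions your pipelines actually need, plus storage.buckets.get).

```python
# Create a custom role that includes storage.buckets.get.
# Assumes google-api-python-client is installed. Names are placeholders,
# and the permission list below is illustrative, not exhaustive.
from googleapiclient import discovery

PROJECT_ID = "my-gcp-project"  # placeholder

iam = discovery.build("iam", "v1")
role = iam.projects().roles().create(
    parent=f"projects/{PROJECT_ID}",
    body={
        "roleId": "customStorageAdmin",  # placeholder
        "role": {
            "title": "Custom Storage Admin",
            "includedPermissions": [
                "storage.buckets.get",  # the permission that may be missing
                "storage.buckets.list",
                "storage.objects.create",
                "storage.objects.delete",
                "storage.objects.get",
                "storage.objects.list",
            ],
        },
    },
).execute()
print("Created role:", role["name"])
```

To confirm location consistency before running a River, you can print the dataset’s location with the BigQuery client library (names are placeholders) and compare it to the region selected in the connection screen.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder
dataset = client.get_dataset("my_dataset")          # placeholder
print(dataset.location)  # must match the region chosen in the connection
```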