Creating a connection to Google BigQuery as a target
This guide provides step-by-step instructions for setting up a connection between Data Integration and Google BigQuery as a target: configuring your Google Cloud Platform (GCP) account, setting up the necessary resources, and creating the connection in Data Integration. It also covers managing service accounts and Google Cloud Storage buckets, whether you use Data Integration’s managed resources or configure your own via a Custom File Zone.
For additional details, refer to the schema drift and known issues sections below. Once configured, you can transfer data from Data Integration to Google BigQuery.
Prerequisites
Ensure you have signed up for Google Cloud Platform (GCP) and have a console Admin user to manage the necessary resources. Then confirm the following:
- Enable BigQuery API
- Enable Google Cloud Storage JSON API
- BigQuery Dataset: Ensure you have an available dataset or create a new one if needed.
- Google Cloud Storage Bucket: Required only if you opt for the Custom File Zone setup.
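If you prefer to verify these prerequisites programmatically, here is a minimal sketch using the official Google Cloud client libraries. All project, dataset, and bucket names below are placeholders; substitute your own.

```python
# A minimal prerequisite check, assuming the google-cloud-bigquery and
# google-cloud-storage libraries are installed and application-default
# credentials are configured. All names below are placeholders.
from google.cloud import bigquery, storage
from google.cloud.exceptions import NotFound

PROJECT_ID = "my-gcp-project"        # placeholder
DATASET_ID = "my_dataset"            # placeholder
BUCKET_NAME = "my-file-zone-bucket"  # only needed for a Custom File Zone

bq = bigquery.Client(project=PROJECT_ID)
try:
    dataset = bq.get_dataset(DATASET_ID)
    print(f"Dataset {DATASET_ID} found in location {dataset.location}")
except NotFound:
    # Create the dataset if it does not exist yet.
    bq.create_dataset(DATASET_ID)
    print(f"Created dataset {DATASET_ID}")

gcs = storage.Client(project=PROJECT_ID)
print("Bucket exists:", gcs.bucket(BUCKET_NAME).exists())
```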
Step 1: creating a Google BigQuery connection in Data Integration
- In the Data Integration console, navigate to Connections > New Connection and select Google BigQuery (Target).
- Retrieve your Google Cloud Platform Project ID and Project Number from the GCP console home page.
- Enter these values in the Data Integration connection screen.
- Select the region where your BigQuery dataset is located.
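As an alternative to reading the Project ID and Project Number off the console home page, you can fetch them via the Resource Manager API. This is a hedged sketch, assuming application-default credentials are configured on your machine.

```python
# Look up the Project ID and Project Number programmatically.
# Assumes google-auth and google-api-python-client are installed.
import google.auth
from googleapiclient import discovery

# project_id may be None if no default project is configured;
# in that case, set it explicitly.
credentials, project_id = google.auth.default()

crm = discovery.build("cloudresourcemanager", "v1", credentials=credentials)
project = crm.projects().get(projectId=project_id).execute()
print("Project ID:    ", project["projectId"])
print("Project Number:", project["projectNumber"])
```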
Step 2: choosing service account management (File Zone Setup)
You can either use Data Integration’s default service account and storage bucket, or opt to manage your own via the Custom File Zone toggle:
- Default (Custom File Zone OFF): Data Integration automatically provisions a service account and Google Cloud Storage bucket.
- Custom file zone (Custom File Zone ON): If you enable this, Data Integration uses a Google Cloud Storage bucket you manage. This option stages data in your own Google Cloud bucket before inserting it into BigQuery.
2.1. working with Data Integration’s default file zone
Data Integration automatically creates and manages a dedicated service account and Google Cloud Storage bucket. However, you must grant access to Data Integration’s service account.
Procedure to grant permission:
- Sign in to Google Cloud Platform console.
- Navigate to IAM & Admin > IAM.
- Click +GRANT ACCESS.
- Add the Data Integration service account as the principal.
- Assign the BigQuery Admin role and click Save.
- Go to APIs & Services > Library.
- Look for the BigQuery API and the Google Cloud Storage JSON API, and enable each by clicking Enable.
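The console steps above can also be scripted. The sketch below grants the BigQuery Admin role through the Resource Manager API; the project ID and service account email are placeholders for the values shown in the Data Integration connection screen.

```python
# Grant roles/bigquery.admin to Data Integration's service account.
# Assumes google-api-python-client is installed and the caller has
# permission to set IAM policy on the project. Names are placeholders.
from googleapiclient import discovery

PROJECT_ID = "my-gcp-project"  # placeholder
SA_EMAIL = "integration@vendor-project.iam.gserviceaccount.com"  # placeholder

crm = discovery.build("cloudresourcemanager", "v1")

# Read the current policy, append a binding, and write it back.
policy = crm.projects().getIamPolicy(resource=PROJECT_ID, body={}).execute()
policy["bindings"].append(
    {"role": "roles/bigquery.admin", "members": [f"serviceAccount:{SA_EMAIL}"]}
)
crm.projects().setIamPolicy(
    resource=PROJECT_ID, body={"policy": policy}
).execute()
```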
2.2. working with a custom file zone
When you use the Custom File Zone option, Data Integration stages data in a Google Cloud Storage bucket that you manage within your GCP project.
Procedure
- Ensure your GCP user has the Service Account Admin role.
- Create a new service account in the GCP Console:
- Go to IAM & Admin > Service Accounts.
- Click CREATE SERVICE ACCOUNT.
- Set the service account name (for example, Data_Integration User).
- Assign BigQuery Admin and Storage Admin roles.
- Go to the new service account’s details page and copy the Service Account ID / Email.
- Manage keys for the service account by selecting Manage Keys > ADD KEY > Create new key (JSON). This downloads the key file locally (a scripted alternative is sketched after this procedure).
- Upload the JSON key to the Data Integration connection.
- Enter your Google Cloud Storage bucket name in the Default Bucket field. If Data Integration cannot detect the bucket list, manually enter the bucket name (refer to the known issues section if permissions are incorrect).
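If you would rather script the service-account setup than click through the console, here is a minimal sketch using the IAM API. The account ID and output file name are placeholders; role assignment still happens through the project IAM policy, as in the earlier snippet.

```python
# Create a service account and download a JSON key for it.
# Assumes google-api-python-client is installed and the caller holds the
# Service Account Admin role. All names are placeholders.
import base64

from googleapiclient import discovery

PROJECT_ID = "my-gcp-project"         # placeholder
ACCOUNT_ID = "data-integration-user"  # placeholder

iam = discovery.build("iam", "v1")

# Create the service account.
sa = iam.projects().serviceAccounts().create(
    name=f"projects/{PROJECT_ID}",
    body={
        "accountId": ACCOUNT_ID,
        "serviceAccount": {"displayName": "Data_Integration User"},
    },
).execute()
print("Service Account ID / Email:", sa["email"])

# Create a JSON key; the response carries the key material base64-encoded.
key = iam.projects().serviceAccounts().keys().create(
    name=f"projects/-/serviceAccounts/{sa['email']}", body={}
).execute()
with open("service-account-key.json", "wb") as f:
    f.write(base64.b64decode(key["privateKeyData"]))
```

The resulting service-account-key.json is the file you upload in the Data Integration connection screen.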
Additional configuration options
Data type widening (Schema Drift)
When the data type of a column changes in the source table, Data Integration automatically adjusts the target table to accommodate a "wider" data type. For example:
- If a column changes from an Integer to a String, the target data type becomes a String so that it can still hold the integer values.
- If a column changes from a String to a Date, it remains a String in the target, since a String can represent any date value.
You can turn this behavior on or off using the toggle in the Additional Options section (both behaviors are illustrated in the sketch below):
- If ON, the River adjusts data types automatically.
- If OFF, the River fails when encountering data type mismatches.
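The following is an illustrative sketch of the widening rule, not Data Integration’s actual implementation: a mismatch resolves to a String when the toggle is ON, and raises an error when it is OFF.

```python
# Illustrative only: a simplified model of data type widening.
def widen(target_type: str, source_type: str, widening_enabled: bool) -> str:
    """Return the data type the target column should take."""
    if target_type == source_type:
        return target_type
    if not widening_enabled:
        # Toggle OFF: the River fails on a type mismatch.
        raise TypeError(f"Type mismatch: {target_type} vs {source_type}")
    # Toggle ON: a String can encode any value, so widen to String.
    return "String"

print(widen("Integer", "String", widening_enabled=True))  # -> String
print(widen("String", "Date", widening_enabled=True))     # -> String
```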
Notifications
To receive warnings about data type changes, set the On Warning toggle to True in the settings.
Known issues
- **Storage admin permissions**: The default "Storage Admin" role may lack the storage.buckets.get permission. If this occurs (a scripted version of the workaround follows this list):
  - Duplicate the "Storage Admin" role and add the missing permission.
  - Assign the custom role to your service account.
- **Location consistency**: When using BigQuery as a target, ensure that the connection and the BigQuery Dataset ID are in the same location (region). Mismatched locations result in errors such as:
  **Error**: Cannot read and write in different locations.
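Here is a hedged sketch of the custom-role workaround via the IAM API; the role ID and the permission list are placeholders (include the permissions your pipelines actually need, plus storage.buckets.get).

```python
# Create a custom role that includes storage.buckets.get.
# Assumes google-api-python-client is installed. Names are placeholders,
# and the permission list below is illustrative, not exhaustive.
from googleapiclient import discovery

PROJECT_ID = "my-gcp-project"  # placeholder

iam = discovery.build("iam", "v1")
role = iam.projects().roles().create(
    parent=f"projects/{PROJECT_ID}",
    body={
        "roleId": "customStorageAdmin",  # placeholder
        "role": {
            "title": "Custom Storage Admin",
            "includedPermissions": [
                "storage.buckets.get",  # the permission that may be missing
                "storage.buckets.list",
                "storage.objects.create",
                "storage.objects.delete",
                "storage.objects.get",
                "storage.objects.list",
            ],
        },
    },
).execute()
print("Created role:", role["name"])
```

To confirm location consistency before running a River, you can print the dataset’s location with the BigQuery client library (names are placeholders) and compare it to the region selected in the connection screen.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder
dataset = client.get_dataset("my_dataset")          # placeholder
print(dataset.location)  # must match the region chosen in the connection
```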