Creating a connection to Google BigQuery as a target

This guide provides step-by-step instructions for setting up a connection between Data Integration and Google BigQuery as a target. You will learn how to configure your Google Cloud Platform (GCP) account, set up the necessary resources, and create a connection in Data Integration. The guide also covers options for managing service accounts and Google Cloud Storage buckets, whether you choose to use Data Integration’s managed resources or configure your own via a Custom File Zone.

Additionally, you will find details on handling schema drift and troubleshooting common issues. By following this guide, you'll be able to successfully transfer data from Data Integration to Google BigQuery.

Prerequisites

Before using this guide, ensure you have signed up for Google Cloud Platform (GCP) and have a console admin user to manage the necessary resources. To proceed, complete the following:

  • BigQuery API: Enable it for your GCP project.
  • Google Cloud Storage JSON API: Enable it for your GCP project.
  • BigQuery dataset: Ensure you have an available dataset, or create a new one if needed.
  • Google Cloud Storage bucket: Required only if you opt for the Custom File Zone setup.

Step 1: Create a Google BigQuery connection in Data Integration

  1. In the Data Integration console, navigate to Connections > New Connection and select Google BigQuery (Target).
  2. Retrieve your Google Cloud Platform Project ID and Project Number from the GCP console home page.
  3. Enter these values in the Data Integration connection screen.
  4. Select the region where your BigQuery dataset is located.

Step 2: Choose service account management (File Zone Setup)

You can either use Data Integration’s default service account and storage bucket or opt to manage your own via the Custom File Zone toggle:

  • Default (Custom File Zone OFF): Data Integration automatically provisions a service account and Google Cloud Storage bucket.
  • Custom file zone (Custom File Zone ON): If you toggle this ON, Data Integration will use a Google Cloud Storage bucket you manage. This option stages data in your own Google Cloud bucket before inserting it into BigQuery.

2.1. Working with Data Integration’s default file zone

Data Integration automatically creates and manages a dedicated service account and Google Cloud Storage bucket. However, you need to grant access to Data Integration’s service account.

Steps to grant permission:

  1. Sign in to Google Cloud Platform Console.
  2. Navigate to IAM & Admin > IAM.
  3. Click +GRANT ACCESS.
  4. Add the Data Integration service account as the principal.
  5. Assign the BigQuery Admin role and click Save.
  6. Go to APIs & Services > Library.
  7. Search for the BigQuery API and the Google Cloud Storage JSON API, and enable each by clicking Enable.
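Under the hood, steps 3–5 add an IAM policy binding to your project's policy. A minimal sketch of that binding, assuming the standard IAM policy JSON structure (the service-account email below is a placeholder, not Data Integration's real account):

```python
# Sketch: the IAM policy binding added by the grant in steps 3-5.
# The service-account email is a placeholder for illustration only.
SERVICE_ACCOUNT = "di-connector@example-project.iam.gserviceaccount.com"

binding = {
    "role": "roles/bigquery.admin",  # the BigQuery Admin role
    "members": [f"serviceAccount:{SERVICE_ACCOUNT}"],
}

def add_binding(policy: dict, binding: dict) -> dict:
    """Append a binding to an IAM policy document, merging members
    when the role is already present."""
    for existing in policy.setdefault("bindings", []):
        if existing["role"] == binding["role"]:
            existing["members"] = sorted(
                set(existing["members"]) | set(binding["members"])
            )
            return policy
    policy["bindings"].append(binding)
    return policy

policy = {"bindings": []}
add_binding(policy, binding)
print(policy["bindings"][0]["role"])  # roles/bigquery.admin
```

Note the `serviceAccount:` prefix on the member: IAM distinguishes service accounts from users and groups by this prefix.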

2.2. Working with a custom file zone

When you use the Custom File Zone option, Data Integration stages data in a Google Cloud Storage bucket that you manage within your GCP project.

Steps to set up:

  1. Ensure your GCP user has the Service Account Admin role (roles/iam.serviceAccountAdmin).
  2. Create a new service account in the GCP Console:
    • Go to IAM & Admin > Service Accounts.
    • Click CREATE SERVICE ACCOUNT.
    • Set the service account name (e.g., Data_Integration User).
    • Assign BigQuery Admin and Storage Admin roles.
  3. Copy the Service Account ID / Email.
  4. Manage keys for the service account by selecting Manage Keys > ADD KEY > Create new key (JSON). This will download the key locally.
  5. Upload the JSON key to the Data Integration connection.
  6. Enter your Google Cloud Storage bucket name in the Default Bucket field. If Data Integration cannot detect the bucket list, enter the bucket name manually (see the Known issues section if permissions are incorrect).
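Before uploading the key in step 5, you can sanity-check that the downloaded file is a well-formed service-account key. A minimal sketch, assuming the standard GCP JSON key format (the sample values are placeholders, not real credentials):

```python
import json

# Fields present in every GCP service-account JSON key file.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def validate_key(raw: str) -> str:
    """Return the service-account email if the key looks valid,
    otherwise raise ValueError."""
    key = json.loads(raw)
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"key is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"expected a service_account key, got {key['type']!r}")
    return key["client_email"]

# Placeholder key material for illustration only.
sample = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\\n...",
    "client_email": "di-user@example-project.iam.gserviceaccount.com",
})
print(validate_key(sample))  # di-user@example-project.iam.gserviceaccount.com
```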

Additional configuration options

Data type widening (Schema Drift)

When the data type of a column changes in the source table, Data Integration automatically adjusts the target table to accommodate a "wider" data type. For example:

  • If a column changes from an Integer to a String, the target data type becomes a String, which can still represent the existing integer values.
  • If a column changes from a String to a Date, the target column remains a String, since String is the wider of the two types.

You can enable or disable this behavior using the toggle in the Additional Options section:

  • If ON, the River will adjust data types automatically.
  • If OFF, the River will fail when encountering data type mismatches.
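The widening rule above can be sketched as a lookup over type pairs. This only models the two documented cases plus a catch-all; the full set of type pairs Data Integration handles is an assumption:

```python
# Sketch of the data-type widening rule described above.
# Only the documented cases are modeled explicitly.
WIDEN = {
    ("INTEGER", "STRING"): "STRING",  # integer -> string: widen to STRING
    ("STRING", "DATE"): "STRING",     # string -> date: stays STRING
}

def target_type(old: str, new: str, widening_enabled: bool = True) -> str:
    """Return the target column type after a source type change."""
    if old == new:
        return old
    if not widening_enabled:
        # Toggle OFF: the River fails on a data type mismatch.
        raise TypeError(f"data type mismatch: {old} -> {new} (widening is OFF)")
    # Fall back to STRING as the widest type (assumption for undocumented pairs).
    return WIDEN.get((old, new), "STRING")

print(target_type("INTEGER", "STRING"))  # STRING
print(target_type("STRING", "DATE"))     # STRING
```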

Notifications

To receive warnings about data type changes, set the On Warning toggle to True in the settings.

Known issues

  • Storage admin permissions: The default "Storage Admin" role may lack the storage.buckets.get permission. If this occurs:

    • Duplicate the "Storage Admin" role and add the missing permission.
    • Assign the custom role to your service account.
  • Location consistency: When using BigQuery as a target, ensure that the connection and BigQuery Dataset ID are in the same location (region). Mismatched locations will result in errors such as:

    Error: Cannot read and write in different locations.
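A simple pre-flight check can catch the location mismatch before a run fails. A minimal sketch (the region names used are illustrative):

```python
# Guard against the "Cannot read and write in different locations" error
# by comparing the connection region with the dataset location up front.
def check_locations(connection_region: str, dataset_location: str) -> None:
    # Compare case-insensitively (e.g. "US" vs "us" refer to the same location).
    if connection_region.lower() != dataset_location.lower():
        raise ValueError(
            f"connection region {connection_region!r} does not match "
            f"dataset location {dataset_location!r}: "
            "Cannot read and write in different locations."
        )

check_locations("europe-west2", "europe-west2")  # OK, no error raised
```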
