Google BigQuery as a target
Configure Google BigQuery as a target in Source to Target Rivers to load data from supported source systems.
Prerequisites
Before you begin, ensure that you have:
- A valid Google BigQuery connection
- A target dataset that already exists in BigQuery
- Permission to create and load tables in the dataset
Refer to the Create a Google BigQuery connection topic to establish a connection.
Configuring Google BigQuery as the target
The configuration is organized into two tabs:
- Target
- Schema
Use the Target tab to specify where data is loaded in Google BigQuery and to configure optional staging and metadata settings.
- Navigate to the Target tab and select Google BigQuery (Target).
- Under Target Connection, select your BigQuery connection.
- Click Test Connection to ensure the target credentials are valid.
- In Data Loading Settings, select the destination Dataset ID from the drop-down menu.
- Click Advanced Settings to expand the configuration panel.
- (Optional) Turn on Add Rivery Metadata to include tracking columns like Rivery_last_update in your target table.
- Configure the Custom File Zone (Staging):
  - Bucket Name: Select the cloud storage bucket used for intermediate staging. For example, data-pipeline-staging.
  - File Zone Path: Define the directory structure using variables or text. For example, {river_name}_{river_id}.
  - File Zone Folders Period Partition: Select the folder partitioning frequency, such as By Day, from the drop-down menu.
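To illustrate how a File Zone Path template such as {river_name}_{river_id} combines with a By Day period partition, here is a minimal sketch; the variable names and folder layout are assumptions for illustration, not the product's internal implementation.

```python
# Sketch: resolve a File Zone Path template into a concrete staging prefix,
# appending a date-based folder for "By Day" period partitioning.
# The variable set and folder layout are illustrative assumptions.
from datetime import date

def resolve_file_zone_path(template: str, river_name: str, river_id: str,
                           partition: date) -> str:
    path = template.format(river_name=river_name, river_id=river_id)
    # "By Day" partitioning appends year/month/day folders.
    return f"{path}/{partition:%Y/%m/%d}"

print(resolve_file_zone_path("{river_name}_{river_id}", "orders_sync", "r42",
                             date(2024, 5, 1)))
```

With a By Month or By Year setting, only the coarser folder levels would be appended.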
Configure how Data Integration retrieves data from your selected source based on your synchronization requirements. At this stage, select the extraction method to use:
- Standard Extraction is suitable for scheduled or periodic data loads.
- CDC provides near-real-time replication by capturing INSERT, UPDATE, and DELETE events from SAP.
Standard Extraction
Standard extraction flow
Change Data Capture (CDC)
Configure CDC migration
- Performs a one-time full load of historical data. Automatically disables after the River runs successfully.
- Skips historical data and captures only new changes going forward.
Select tables for CDC
Common table configuration
Applies to both Standard Extraction and CDC extraction methods.
Configure table mapping
Use this procedure to map source columns to the target table and configure match keys, partitioning, and clustering.
Procedure
- From the table configuration page, select the Mapping tab.
- Review the source columns and verify the Target Column Name, Type, and Mode.
- Clear the checkbox for any column you do not want to load to the target.
- (Optional) Use Search to locate a column or click Reload Metadata to refresh the column list.
- To define match keys:
- Click Match Key.
- Select one or more columns and move them to Match Key Columns.
- Ensure at least one match key is defined when using Upsert Merge loading mode.
- To configure partitioning:
- Locate a column of type DATE or TIMESTAMP.
- Select the Partition checkbox.
- In Partition Granularity, select Year, Month, Day, or Hour (Hour is available only for TIMESTAMP columns).
- To configure clustering:
- Click Cluster.
- Select up to four columns and move them to the cluster list.
- Arrange the columns in the required order.
- (Optional) Click Add Calculated Column to define a derived column.
- Click Apply Changes.
Configure table source settings
To define how data is extracted from the selected source table, perform the following steps:
Procedure
- From the table configuration page, click the Table Source Settings tab.
- In the Extraction Method section, select one of the following:
- Incremental – Extracts only new or updated records based on the configured incremental field.
- All – Extracts all records from the table.
- In the Exporter Chunk Size field, specify the number of records to export per chunk.
- (Optional) In the Filters field, enter a filter expression to limit the data extracted from the source.
- Click Apply Changes to save the table source settings.
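Conceptually, an incremental extraction with a filter narrows the source query to rows newer than the last loaded value of the incremental field. The following is a sketch only; the table, column, and filter names are hypothetical, and the actual query is generated by the platform.

```
-- Hypothetical shape of an incremental extraction with a filter expression.
SELECT *
FROM source_schema.orders
WHERE updated_at > @last_loaded_updated_at  -- incremental field watermark
  AND status = 'COMPLETED';                 -- user-defined filter expression
```

Selecting All instead of Incremental would drop the watermark condition and extract every row, subject only to the filter.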
Configure table target settings
Use this procedure to define how data is written to the target table.
Procedure
-
From the table configuration page, click the Table Target Settings tab.
-
In the Target Table Name field, enter the table name.
-
Under Split Tables By, select one of the following:
- Don’t Split: Loads all data into a single target table.
- Timestamp: Splits data into multiple target tables based on a timestamp value.
- Expression: Splits data based on a custom expression.
-
In the SQL Dialect field, select the SQL dialect to use:
- Standard SQL (default)
- Legacy SQL (if supported by the target system)
Switching between Legacy SQL and Standard SQL
Legacy SQL and Standard SQL differ in supported features, syntax, and behavior. These differences must be carefully considered when switching between dialects.
Table reference syntax
Standard SQL uses the following format:
```
bigquery-public-data.samples.shakespeare
```
For more information, refer to the BigQuery documentation on migrating from Legacy SQL to Standard SQL.
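For comparison, the same public table referenced in each dialect; this is a sketch of the documented conventions (square brackets with a colon in Legacy SQL, backticks with dots in Standard SQL):

```
-- Legacy SQL: square brackets, colon between project and dataset
SELECT word FROM [bigquery-public-data:samples.shakespeare] LIMIT 5;

-- Standard SQL: backticks, dots throughout
SELECT word FROM `bigquery-public-data.samples.shakespeare` LIMIT 5;
```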
RECORD (STRUCT) fields
In Standard SQL, columns of type RECORD are automatically flattened. In Legacy SQL, RECORD columns are not flattened.
When switching from Legacy SQL to Standard SQL:
- A single RECORD column is expanded into multiple flattened columns.
- The resulting table schema may change.
If your target table already exists, do not switch back to Legacy SQL mode.
Append and merge considerations
Append and merge operations may behave differently between SQL dialects. Differences in record handling can result in unexpected behavior or data loss.
If a table contains partitions or clusters:
- Queries using Legacy SQL syntax fail.
- Rivers configured with Legacy SQL cannot query partitioned or clustered tables.
Record handling in Standard SQL
When using Standard SQL, record handling depends on whether the logic step is mapped and whether flattening is enabled.
Behavior matrix
| Logic step status | Flattening disabled | Flattening enabled |
| --- | --- | --- |
| Logic step not mapped | Records are loaded as-is | You are prompted to map the step |
| Logic step mapped | You are prompted to modify mapping or disable flattening | Records are flattened |

Behavior summary
- If the logic step is not mapped and flattening is disabled, records are loaded as-is.
- If the logic step is not mapped and flattening is enabled, mapping is required.
- If the logic step is mapped and flattening is enabled, records are flattened.
- If the logic step is mapped and flattening is disabled, you must modify the mapping or disable flattening.
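The matrix above can be condensed into a small decision helper. This is a sketch for clarity only; the returned strings paraphrase the documented behaviors.

```python
def record_handling(step_mapped: bool, flattening_enabled: bool) -> str:
    """Return the Standard SQL record-handling behavior for a logic step."""
    if not step_mapped:
        # Unmapped step: records pass through unless flattening forces mapping.
        return ("prompt: map the step" if flattening_enabled
                else "load records as-is")
    # Mapped step: flattening either applies or conflicts with the mapping.
    return ("flatten records" if flattening_enabled
            else "prompt: modify mapping or disable flattening")

print(record_handling(step_mapped=False, flattening_enabled=False))
```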
Creating or modifying partitions and clusters on an existing table
BigQuery does not support modifying partitioning or clustering on an existing table. To change these settings, the table must be dropped and recreated.
Procedure
Step 1: Create a temporary table
Create a temporary table containing the existing data and define the desired partitioning and clustering.
```
CREATE TABLE <dataset_name>.<tmp_table_name>
PARTITION BY TIMESTAMP_TRUNC(<timestamp_column>, {DAY | HOUR | MONTH | YEAR})
CLUSTER BY <column1>, <column2> AS
SELECT *
FROM <dataset_name>.<table_name>;
```
Example:
```
CREATE TABLE dwh.temp_users
PARTITION BY TIMESTAMP_TRUNC(inserted_at, DAY)
CLUSTER BY user_id AS
SELECT *
FROM dwh.users;
```
Step 2: Drop the original table
After confirming that the data exists in the temporary table, drop the original table.
Until this process is complete, the original table is unavailable.
```
DROP TABLE <dataset_name>.<table_name>;
```
Example:
```
DROP TABLE dwh.users;
```
Step 3: Copy the temporary table
Using the BigQuery console:
- Select the temporary table in the Resources pane.
- Click Copy table.
- In the Destination section:
- Choose the destination dataset
- Enter the original table name
- Complete the copy operation.
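As an alternative to the console, BigQuery also offers a DDL form of table copy that preserves partitioning and clustering. The following is a hedged sketch using the example names from this procedure; verify the statement's availability in your project before relying on it.

```
-- Copy the temporary table back under the original name.
-- CREATE OR REPLACE avoids a separate drop of any leftover table.
CREATE OR REPLACE TABLE dwh.users
COPY dwh.temp_users;
```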
Step 4: Drop the temporary table
```
DROP TABLE <dataset_name>.<tmp_table_name>;
```
Example:
```
DROP TABLE dwh.temp_users;
```
-
Enable Override Default Target Settings to customize target behavior for this table.
-
In the Table Loading Mode field, select how data is loaded into the target:
- Upsert Merge (Default): Matches rows based on the selected key columns and updates existing records or inserts new ones.
- Append Only: Appends incoming data to the target table without updating existing rows.
- Overwrite: Replaces the existing data in the target table with the incoming data during each load.
note
If no key columns are selected, the system automatically applies Append Only loading mode.
Configure merge method for a target table
Use this procedure to define how records are merged into the target table when the Upsert Merge loading mode is selected.
-
In the Merge Method field, select one of the following options:
- Merge (Default): Inserts new rows when they do not exist in the target and updates existing rows using a MERGE clause.
- Switch – Merge: Replaces the entire target table with the incoming data during the load operation.
warning
The values in the selected Merge Key columns must be unique within the source data. If duplicate keys are present in the source, the BigQuery river fails with the following error:
UPDATE/MERGE must match at most one source row for each target row.
-
Click Apply Changes to save the merge configuration.
warning
When using Merge (Optimized with Partition Pruning), updating a partition column value can create duplicate records because the original row may exist in a different partition.
Recommendation:
- Avoid changing partition column values.
- If partition changes are required, ensure only one record exists per key, or use Switch – Merge to rewrite the table.
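The Merge (Default) method corresponds to the standard BigQuery MERGE upsert pattern. The following sketch uses hypothetical table and column names; the actual statement is generated from your match key selection.

```
-- Sketch of the upsert the Merge method performs (names are hypothetical).
MERGE `dwh.users` AS target
USING `dwh.users_staging` AS source
ON target.user_id = source.user_id        -- match key column(s)
WHEN MATCHED THEN
  UPDATE SET target.email = source.email,
             target.updated_at = source.updated_at
WHEN NOT MATCHED THEN
  INSERT (user_id, email, updated_at)
  VALUES (source.user_id, source.email, source.updated_at);
```

This is also where the uniqueness warning above applies: if `dwh.users_staging` contains two rows with the same `user_id`, the MERGE fails because one target row would match multiple source rows.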