New Source to Target experience
The new Source to Target experience is currently available only in private preview.
The "Source to Target Rivers" feature in Data Integration simplifies building robust data pipelines. It lets you extract data from a designated source and load it into a target system. This automation facilitates the detection of incoming data structures, and Data Integration generates the corresponding target tables and columns for data storage without manual intervention.
You can connect MySQL, PostgreSQL, or Oracle to Snowflake or BigQuery using the Data Integration Source to Target Rivers feature. This guide includes step-by-step instructions for setting up and configuring the source and target, letting you establish seamless data flows between your chosen platforms.
Before you begin
Navigate to the Create River section and select Source to Target River from the available options.
Step 1: set up the data source
In the Source step, select the data source from which to ingest data. You can customize the ingestion process using the available parameters.
Data Integration provides a comprehensive list of supported connectors, letting you choose the appropriate source for your data needs.
Creating a connection
After selecting MySQL, PostgreSQL, or Oracle, choose an existing connection or create a new one. If a connection already exists in the Connections tab, it appears in the drop-down menu. To create a new one, select Add new connection.
Test the connection to ensure it is valid before running the River.
Step 2: select data target
In the Target tab, select the Snowflake destination where the source data is loaded. Ensure that the selected Snowflake instance meets your data integration requirements. This step configures the destination for the data flow, ensuring that the source data is directed to the correct Snowflake environment for further processing or analysis.
Creating a target connection
Select an existing Target connection or create a new one. If the Target is a cloud data warehouse, you must specify the database, schema, and target table where the data is loaded.
Data loading settings
Define the database, schema, and target table where data from your selected source lands. Data Integration automatically detects the available databases and schemas for you to choose from.
In "Snowflake" and "BigQuery", the connection form includes a Default Pre-Populated Values option. You can specify the values you want to work with, and the connection automatically remembers and uses these as the default.
Advanced settings
Data Integration provides a variety of advanced settings specific to Snowflake, letting you manage and customize data handling with precision.
- Truncate Columns: Truncates any VARCHAR values that exceed the defined column length. This prevents data overflow and ensures consistency with the column definition.
- Replace Invalid UTF-8 Characters: Replaces invalid UTF-8 characters with the Unicode replacement character. This prevents invalid or corrupted characters from interfering with data processing.
- Add Data Integration metadata: When Data Integration metadata is turned on, additional metadata columns are added to the target table in Snowflake. These include:
  - River_last_update: Tracks the timestamp of the most recent update.
  - River_river_id: Identifies the specific River used for the data load.
  - River_run_id: Stores the unique ID of each run of the River.
  You can also use custom expressions to include additional metadata fields beyond the defaults. A sketch of how these columns might appear in the target table follows this list.
- Custom File Zone: Lets you specify a custom file zone for staging files before loading into Snowflake. This option helps manage file storage and properly handle large data files.
These advanced settings control how data is processed and stored in Snowflake, ensuring data integrity, compatibility, and enhanced metadata tracking.
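As a rough illustration of the Add Data Integration metadata option, the sketch below shows how the three metadata columns might sit alongside your own columns in the Snowflake target table. The table name, business columns, and column types are hypothetical assumptions; only the River_* column names come from the option described above.

```sql
-- Hypothetical Snowflake target table; "orders" and its business columns are
-- examples only. The River_* columns are the metadata columns added when
-- "Add Data Integration metadata" is turned on (their types are assumptions).
CREATE TABLE analytics.public.orders (
    order_id          NUMBER,
    customer_id       NUMBER,
    order_total       NUMBER(12, 2),
    created_at        TIMESTAMP_NTZ,
    River_last_update TIMESTAMP_NTZ,   -- timestamp of the most recent update
    River_river_id    VARCHAR,         -- ID of the River used for the data load
    River_run_id      VARCHAR          -- unique ID of each run of the River
);
```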
Step 3: configure schema
Choose an extraction mode that fits your requirements.
Extraction mode
Data Integration provides multiple extraction modes depending on the selected source. When using a source like an RDBMS (Relational Database Management System), you have two options for extracting data:
- CDC
- Standard Extraction
After making a selection, you can use the Extraction Mode option on the left side to switch between modes.
Change data capture
This mode tracks and extracts the data that has changed since the last extraction, making it efficient for high-volume data sources with frequent updates.
The Data Integration Change Data Capture (CDC) extraction mode monitors the logs or records generated by the source database to capture changes made to the source data. This change data is then collected, transformed, and loaded into the target database, keeping the target in sync with the source. To learn more, refer to the Database River Modes.
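Log-based CDC generally depends on the source database exposing its change log. The snippet below is only a generic sketch of that kind of source-side prerequisite, not the exact setup Data Integration requires; refer to the Database River Modes documentation for the authoritative steps.

```sql
-- PostgreSQL: log-based CDC tools typically read the write-ahead log, which must
-- be set to logical replication (takes effect after a server restart).
ALTER SYSTEM SET wal_level = 'logical';

-- MySQL: row-based binary logging is the usual prerequisite; it is normally set
-- in the server configuration (my.cnf) rather than at runtime, e.g.:
--   log_bin       = mysql-bin
--   binlog_format = ROW

-- Oracle: supplemental logging is commonly required, e.g.:
-- ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
```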

Standard extraction
This mode lets you map and transform data from multiple tables into a unified schema before loading it into the destination. You establish relationships between tables to link and load the data.
The Multi-Tables River mode in Data Integration uses SQL-based queries to perform data transformations. You can configure it to run on a defined schedule or trigger it manually. To learn more, refer to the Database River Modes.
After you select a table, its name appears in the Target configuration. You can then choose and configure the extraction method. Incremental extraction requires the specific settings described below.

Incremental extraction: "created_at" field
If you use the "Created_at" option in the incremental field, you must specify both a Start Date and an End Date to select the time range for data extraction. This ensures incremental load includes only records created within this period.
- Start date and end date configuration:
  - Start Date: Specify the beginning of the time range from which you want to pull data.
  - End Date: Leave this blank if you want the data extraction to continue up to the current River execution time.
- Automatic date updates:
  - After each river run, the Start Date is automatically updated to match the previous End Date. You can leave the End Date empty, letting the next run extract data from the end point of the previous run.
- Time zone offset:
  - Set the appropriate time zone offset to align data extraction with the local time when the River executes (if you set the End Date to empty).
- Days back:
  - Use the Days Back field to specify a number of days before the provided Start Date. This lets Data Integration pull data from a historical point relative to the Start Date.
  - The Start Date is not updated if a river run fails. To alter this behavior, navigate to More Options and select the checkbox to advance the Start Date even after a failed river run.
  - This is not recommended, as it may cause inconsistencies in your data extraction.
To learn more about selecting time periods, refer to the Source to Target River - general overview.
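Conceptually, the Start Date, End Date, and Days Back settings combine into a single time filter on the incremental field. The query below is an illustrative sketch of that filter using a hypothetical table and PostgreSQL-style date syntax; the query Data Integration actually generates may differ.

```sql
-- Hypothetical example: Start Date = 2024-01-01, Days Back = 3, End Date left
-- empty so extraction runs up to the current River execution time.
SELECT *
FROM orders  -- hypothetical source table
WHERE created_at >= TIMESTAMP '2024-01-01 00:00:00' - INTERVAL '3' DAY  -- Start Date minus Days Back
  AND created_at <  CURRENT_TIMESTAMP;                                  -- empty End Date = "now"
-- After a successful run, the Start Date advances to the previous run's end point.
```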
Definitions
These settings apply to all selected tables and modify schema definitions on the "Source" and "Target".
- Advanced Source Definitions: The options vary based on the selected extraction mode.
- Advanced Target Definitions: You can customize the loading mode depending on the target; each target provides its own set of supported loading modes. To learn more about loading modes, refer to the Targets.
Table settings
After selecting a specific table, a Settings page appears with three key options:
- Mapping
- Table Source Settings
- Table Target Settings
Mapping
All columns tab
In the All Columns tab under the Mapping section, you can view the Source and Target columns and their respective data Type and Mode. Click the arrow icon next to a column, update the settings as needed, and click Apply Changes to save each modification. You can also use the Calculated Column feature for advanced customizations. This feature lets you apply expressions, including mathematical and string operations, to data from the Source, which is useful for tailoring the output to specific business needs. For example, use functions to concatenate fields, perform arithmetic calculations, or transform data types. To learn more about using this feature, refer to the Targets.
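For instance, a calculated column could concatenate two source fields, derive a numeric value, or change a data type. The query below is a hypothetical sketch of such expressions (table and column names are invented); the exact expression syntax supported by the Calculated Column feature is described in the Targets documentation.

```sql
-- Hypothetical calculated-column expressions (table and column names are examples):
SELECT
    CONCAT(first_name, ' ', last_name) AS full_name,   -- concatenate string fields
    quantity * unit_price              AS line_total,  -- arithmetic calculation
    CAST(order_date AS DATE)           AS order_day    -- data type transformation
FROM orders;
```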
Match key
The Match Key section lets you define the keys used to match records during data loading. Use the arrow buttons to move the relevant columns to the left table. This step is essential when using the Upsert-Merge loading mode, as at least one match key is required to identify and merge records accurately.
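To see why a match key matters, the statement below sketches the kind of MERGE that Upsert-Merge loading conceptually performs in Snowflake, with a hypothetical order_id match key; the statement Data Integration actually issues may look different.

```sql
-- Conceptual Upsert-Merge: order_id acts as the match key (all names are hypothetical).
MERGE INTO analytics.public.orders AS tgt
USING staging.orders_batch AS src
    ON tgt.order_id = src.order_id            -- match key identifies existing records
WHEN MATCHED THEN UPDATE SET
    order_total = src.order_total,
    updated_at  = src.updated_at
WHEN NOT MATCHED THEN INSERT (order_id, order_total, updated_at)
    VALUES (src.order_id, src.order_total, src.updated_at);
```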
Cluster
The Cluster Key lets you organize data by selecting and moving the desired columns to the left table. These keys are arranged in descending order and help optimize performance when querying and managing large datasets.
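In Snowflake, the selected cluster keys correspond to the table's CLUSTER BY definition. The statement below is a hedged sketch with hypothetical table and column names, showing what choosing cluster keys maps to conceptually.

```sql
-- Hypothetical example: cluster the target table by event_date, then account_id.
ALTER TABLE analytics.public.events CLUSTER BY (event_date, account_id);
```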
Table source settings
The Table Source Settings section lets you define how data is extracted from the source. You can choose between Incremental or All data extraction methods, and configure Advanced Settings such as:
- Update Incremental Date Range on Failures: Determines if the date range should be updated even when an extraction fails.
- Interval Chunk Size: Defines the size of data chunks for extraction.
- Filter Expression: Enter a filter expression to control which data gets extracted.
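A filter expression is typically a condition evaluated against the source rows, effectively narrowing the extraction query. The example below is a hypothetical sketch (column names are invented); check the source documentation for the exact syntax Data Integration expects.

```sql
-- Hypothetical filter expression, limiting extraction to active 2024+ records:
--   status = 'active' AND created_at >= '2024-01-01'
-- Conceptually, it behaves like an extra WHERE condition on the source query:
SELECT * FROM customers
WHERE status = 'active' AND created_at >= '2024-01-01';
```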
Table target settings
This example is specific to Snowflake. Each Target has its own unique settings.
In the Table Target Settings, you can override the default target configuration to customize how the data is stored in the target system. Options include:
- Filter logical key duplication between files: Ensures that duplicate logical keys across files are filtered out.
- Enforce masking policy: Applies masking to sensitive data fields as per policy.
- Support escape character: Supports escape characters in data to handle special characters.
These options offer greater control over how data is handled and stored in the target.
Reload metadata
The Reload Metadata options let you refresh schema metadata within Data Integration. You can choose to:
- Reload Metadata for Selected Schema: This option refreshes the metadata for a specific schema, ensuring any recent changes are reflected.
- Reload Metadata for All Schemas: This option updates the metadata across all schemas within Data Integration. This is useful for applying widespread updates when working with multiple schemas.
Custom query mode
Data Integration provides a Custom Query mode that lets you load data into the platform using personalized SQL queries. Use this mode when your data import requires advanced flexibility and precise control.
Key aspects of the Custom Query mode include:
- User-defined queries: Write SQL queries to define the data to be loaded and determine the needed transformations.
- Data sources: The Custom Query mode supports loading data from various databases and data warehouses, making it suitable for different data environments.
- Automatic scheduling: Schedule queries to run data loads regularly or trigger them on demand, ensuring real-time access to the latest data.
Switching to Custom Query mode redirects you to the older version of Data Integration that supports this feature, and you cannot return to the previous mode.
To learn more about using Custom Query mode, refer to the Database River Modes.
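As an illustration of the kind of user-defined query Custom Query mode accepts, the statement below joins and aggregates two hypothetical source tables before loading; table and column names are invented.

```sql
-- Hypothetical custom query: join orders to customers and aggregate per customer.
SELECT
    c.customer_id,
    c.country,
    COUNT(o.order_id)  AS order_count,
    SUM(o.order_total) AS lifetime_value
FROM customers AS c
LEFT JOIN orders AS o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.country;
```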
Step 4: schedule and settings
Scheduling the river
The River is scheduled to run automatically by default, which is the recommended setting. However, you can customize the schedule according to your preferences.
Timeout settings
You can specify timeout settings for River execution. Set a timeout duration to control how long the River runs before automatically terminating. This ensures that prolonged executions do not hinder system performance.
Notifications
Enable notifications to stay informed about the River's execution status. Enter your email address to receive notifications. To add multiple email addresses, separate them with a comma (,).
Additional river information
In this section, you can include any additional information relevant to the River to enhance clarity and maintain comprehensive documentation.
To learn more, refer to the Settings tab.
That topic describes an earlier UI version, but the feature remains unchanged.
River activation
After completing all configurations for your River, you can activate it. This step lets you verify everything works as expected.
- Activation process: Click Activation to monitor the status and ensure the River initializes correctly.
Sidebar features
Data Integration provides an additional sidebar with important options to enhance user experience.
- River Info: Access detailed information about the current river setup and configuration.
- Version history: Review the changes made to the River over time, including updates and modifications.
- Activities: Monitor and track all activities related to the River, such as data extraction and transformations.
- Variables: Manage and configure variables used within your river.
- Scheduling and notifications: Set up schedules for data runs and receive notifications based on the River's performance and status.
Editing and managing rivers
Only applicable to CDC extraction mode.
When you make changes to the River, you must reactivate it to restore full functionality. Reactivation ensures that updates or modifications are applied correctly, letting the River resume processing data.
Summary page
The Summary page (first page) provides a comprehensive overview of the River’s recent activity, performance metrics, and current configuration settings, letting you quickly review its status and key operational details.
Procedure
- Navigate to the Data Integration console.
- Click Rivers from the left-hand menu.
- Select the specific River you want to edit or review.
- The "Summary" page shows essential information about the River's current state and recent activity.
The Source, Target, and Schema tabs are identical to those used when creating a new River.

Deployments
Deploying configurations from one environment to another is supported for Rivers created using the new Source to Target experience.
The deployment process includes the following details:
- River type and status:
  - You can view the River type in the relevant column.
  - You can view the current status of the "Source Environment" and the intended status of the "Target Environment".
- Original status retention:
  - Rivers created with the new Source to Target experience retain their original status when deployed to the "Target Environment".
  - CDC Rivers are deployed with a Disabled status.
- Existing Rivers in the target environment:
  - If a River already exists in the "Target Environment", its configuration is updated without altering its status. For example, Active Rivers remain active, and their validation state does not reset.

Recommendation: While the process avoids re-validation, it is recommended to manually reactivate the Rivers to ensure proper functionality and integration in the "Target Environment".