CDC 'Point in Time' position
Change Data Capture (CDC) monitors the source database's logs and precisely captures changes to the source data. The CDC Point in Time Position feature gives you better visibility into how a River's streaming process operates. It also supports data recovery and synchronization. Because the CDC log position marks an exact point in the database log, you can use it to locate and retrieve data from a specific point in time.
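To make the idea of a point in time concrete, the following Python sketch shows how a timestamp can be mapped to a log position. It assumes a hypothetical, time-ordered index of (position, commit time) pairs in which positions are plain integers. Real databases use their own position formats, such as an LSN in PostgreSQL or an SCN in Oracle. This is an illustration of the concept, not how Data Integration resolves positions internally.

```python
from bisect import bisect_left
from datetime import datetime, timezone

# Hypothetical sample of log records: (log_position, commit_time).
# Positions are modeled as integers that grow with every write.
LOG_INDEX = [
    (1000, datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)),
    (1450, datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc)),
    (2210, datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)),
    (3075, datetime(2024, 1, 1, 10, 15, tzinfo=timezone.utc)),
]


def position_for_time(index, point_in_time):
    """Return the first log position committed at or after point_in_time.

    Returns None if no record in the index is that recent.
    """
    times = [commit_time for _, commit_time in index]
    # bisect_left finds the first commit time that is >= point_in_time.
    i = bisect_left(times, point_in_time)
    return index[i][0] if i < len(index) else None
```

For example, `position_for_time(LOG_INDEX, datetime(2024, 1, 1, 9, 10, tzinfo=timezone.utc))` returns 2210, the first position committed after 9:10.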
Prerequisites
Before proceeding with the CDC Point in Time Position setup, ensure you have established a functioning CDC connection for your specific database. If you have not done so, refer to the following documentation for CDC setup instructions:
Glossary
- Initial Migration: The process of transferring historical data from the source database to the target data warehouse.
- Streaming Process: The CDC-based River's active retrieval of changes from the source database log.
- Table Status: The status assigned to each selected table. The possible states are described in the Table Configuration Options document.
- Detected Tables: The tables identified during the stream enablement process. Each detected table has the 'waiting for migration' status.
Setting up a new river
After you configure the Source, Schema, and Target settings of a CDC-based River, enable the streaming process.
Procedure
- Activate the Enable Stream toggle at the bottom of the page.
- Data Integration prompts you to choose one of the following sync options:
Automated sync
Automated Sync is the recommended choice for initial setup and low-touch stream management.
- Enable the Enable Stream UI toggle.
- Data Integration establishes a CDC connector (sink) if the enablement process is successful.
- The CDC connector continuously fetches changes from your database, starting from the moment the CDC process is enabled.
- If a complete migration is required, Data Integration initiates a one-time migration process concurrently with the CDC connector establishment and CDC log-based changes retrieval. Note that the duration of the initial migration process depends on the size of the tables in your database that are being migrated.
- The historical data from the initial migration is stored in a managed or custom file zone, as defined in your Target connection.
- Data Integration replicates all historical data from the file zone to your data warehouse (DWH) target table(s).
- After the migration, all changes captured since the CDC connector was established, along with all future changes, stream to the DWH target table(s) according to the River schedule. If you choose to skip the migration process, the first load and all subsequent changes stream directly from the CDC connector to the target table(s).
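The following Python sketch models the steps above. It is not Data Integration's implementation. It assumes a target table keyed by primary key, captured changes represented as hypothetical (position, op, key, row) tuples, and an upsert/delete merge strategy. Because the captured changes are applied in log order after the snapshot, a row that changed while the migration was running still ends up with its latest value.

```python
def load_target(snapshot, captured_changes, run_migration=True):
    """Hypothetical model of a CDC River writing to one target table.

    snapshot: rows copied by the one-time initial migration, keyed by primary key.
    captured_changes: (position, op, key, row) tuples read by the CDC connector
        from the moment the stream was enabled.
    run_migration: False models skipping the initial migration.
    """
    # Start from the migrated snapshot, or from an empty table if skipped.
    target = dict(snapshot) if run_migration else {}
    # Apply captured changes in log order once the migration has finished.
    for _position, op, key, row in sorted(captured_changes, key=lambda c: c[0]):
        if op == "upsert":
            target[key] = row
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target
```

In this model, skipping the migration means that a row which never changes after the stream is enabled never reaches the target. This is why the full migration matters when you need historical data.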

Reinitialize sync
Reinitialize Sync is recommended in the event of a database failure, corrupted log, or other scenarios requiring a log re-sync. When selected, Data Integration points the River’s log position to the source database's current position and captures changes from this point.
Manual sync
This option gives you complete control over the streaming process by letting you set the River's log position yourself. Use it if you want to start retrieving updates from the database log at a specific point, or to restore data in the River from a particular point. When this option is enabled, Data Integration reads data from the log position you enter each time the River runs, according to its schedule. Changes retrieved from the log are replicated to the target data warehouse (DWH) after the initial migration completes, or immediately if the migration is skipped.
Improper use of this option can lead to data loss. Before using it, make sure that you intend to load data starting from a specific point in time and that no changes made before the provided position need to be retrieved.
Manually configure the position
Procedure
- Activate the Enable Stream UI toggle.
- Data Integration will establish a CDC connector based on your manual configuration, provided that the specified log position exists.
- The CDC connector will continuously retrieve database changes, starting from the specified log position.
- If the user opts for a complete migration, Data Integration will initiate a one-time migration process concurrently with the establishment of the CDC connector and the retrieval of changes.
The initial migration increases the run time of your River, depending on the size of the tables being migrated.
- The historical data from the initial migration will be stored in a managed or custom file zone, as defined by your Target connection configuration.
- Data Integration will replicate all historical data from the file zone to the user's designated Data Warehouse (DWH) target table(s).
- Following the completion of the migration process, any changes captured from the time of CDC connector establishment and all future changes will be streamed to the user's DWH target table(s) based on the River's schedule. If you choose to skip the migration process, the initial load and all future changes will be streamed directly from the CDC connector to the target table(s).
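Data Integration establishes the connector only if the specified position exists. How it performs that check is not described here. The Python sketch below shows one plausible form. It assumes integer positions and a database that retains log entries only between a hypothetical `oldest_retained` position and its current position. The sketch rejects a position that has already been purged, as well as one the database has not yet reached.

```python
class PositionNotAvailable(ValueError):
    """Raised when a requested log position is outside the retained log."""


def validate_manual_position(requested, oldest_retained, current):
    """Check a user-supplied start position before enabling the stream.

    All positions are modeled as integers. The retained range stands in
    for the source database's log retention window (an assumption).
    """
    if requested < oldest_retained:
        raise PositionNotAvailable(
            f"position {requested} was already purged "
            f"(oldest retained: {oldest_retained})"
        )
    if requested > current:
        raise PositionNotAvailable(
            f"position {requested} is ahead of the database (current: {current})"
        )
    return requested
```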

Existing river setup
After the initial setup of the streaming process is complete, every operational River maintains a log position that advances continuously as changes are read from your database. To check the current log position:
Procedure
- In the River, go to the Schema tab.
- Select Table Definition. The Tables Definitions pop-up appears.
- Click Advanced Source Definitions, and then select Check Log Position. The latest CDC log position displays.
The Check Log Position option becomes available after the River's first run.
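The format of the displayed position depends on the source database. If your source is PostgreSQL and the position is shown as a WAL LSN (an assumption; check how your source's position is displayed), you can estimate how far the River trails the database by comparing it with the result of `pg_current_wal_lsn()` on the source. The Python sketch below does the conversion. PostgreSQL's own `pg_wal_lsn_diff()` function computes the same difference on the server.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer.

    The part before the slash holds the high 32 bits and the part after it
    holds the low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(river_lsn: str, database_lsn: str) -> int:
    """Bytes of WAL between the River's position and the database's current one."""
    return lsn_to_int(database_lsn) - lsn_to_int(river_lsn)
```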
If the streaming process of an existing River is disabled, or if you want to change the River's position mode, choose one of the following sync options.
Automated sync
The Automated sync option is recommended for managing streams with minimal manual intervention. When you activate it in an existing River, the Data Integration CDC connector automatically retrieves updates, starting from the River's most recent stream position. These updates are pushed immediately, or deferred until the initial migration completes (if you chose to run one).
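Automated sync behaves like resuming from a checkpoint. The class below is a hypothetical model, not Data Integration's API. It assumes integer positions, a log sorted by position, and that reading resumes strictly after the last processed position. Under those assumptions, stopping and restarting the stream neither skips nor repeats changes.

```python
class RiverCheckpoint:
    """Hypothetical model of a River's stored log position (not a real API)."""

    def __init__(self, position):
        # Last log position the River has already processed.
        self.position = position

    def run(self, log):
        """Read every entry after the stored position, then advance it.

        log: list of (position, change) pairs sorted by position.
        """
        new_entries = [(pos, change) for pos, change in log if pos > self.position]
        if new_entries:
            # The checkpoint moves to the last entry read in this run.
            self.position = new_entries[-1][0]
        return [change for _, change in new_entries]
```

For example, a River created with `RiverCheckpoint(10)` and run against `[(10, "a"), (20, "b"), (30, "c")]` reads `["b", "c"]` and stores 30 as its new position. The next run reads only entries appended after 30.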
Reinitialize sync
Use this option when a database failure, log corruption, or any other circumstance requires re-synchronizing the log. When you activate it in an existing River, Data Integration resets the River's log position by aligning the CDC connector's log position with the database's current position. It then captures changes from that point onward.
Any updates retrieved from the log will either be immediately pushed or deferred until after the initial migration process is complete.
Improper use of this option can lead to data loss.
To reinitialize the synchronization process, follow these steps:
- Deactivate the Enable stream UI toggle.
- This action turns off the Data Integration CDC connector.
- The last known CDC position (where Data Integration ceased to retrieve changes) is established as the most recent river position, denoted as 'X' in the diagram.
- Reactivate the Enable stream UI toggle.
- Data Integration re-establishes the CDC connector, disregarding the current known log position ('X') and replacing it with the latest available log position from the user's database.
- The CDC connector continuously fetches any changes from the database starting from the moment of reactivation.
- If you opt to execute a complete migration, Data Integration initiates a one-time migration process concurrently with CDC connector re-establishment and change retrieval. Note that the initial migration process will extend the duration of your river run, depending on the size of the tables being migrated in your database.
- Historical data from the initial migration is stored in a managed or custom file zone, as per the target connection definition.
- Data Integration replicates all historical data from the file zone to the user's Data Warehouse (DWH) target table(s).
- After migration, all changes captured from the time of CDC connector re-establishment and all subsequent changes are streamed to the user's DWH target table(s) based on the River's schedule. If the user chooses to skip the migration process, the initial load and all subsequent changes will be streamed directly from the CDC connector to the target table(s).
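The data-loss risk comes from the gap between 'X' and the database's current position. The hypothetical Python function below assumes integer positions and a log sorted by position. It returns the new River position together with the changes that the connector will never stream. In this model, running the full migration, which copies the tables' current contents, is what brings the target back in line after such a gap.

```python
def reinitialize(log, last_river_position, database_current_position):
    """Model a Reinitialize sync (hypothetical, not a real API).

    The stored River position ('X') is discarded and replaced by the
    database's current position. Changes logged after X, up to and
    including the current position, are never read by the connector.

    Returns (new_river_position, skipped_changes).
    """
    skipped = [
        change
        for pos, change in log
        if last_river_position < pos <= database_current_position
    ]
    return database_current_position, skipped
```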

Manual sync
This option gives you full control over the streaming process, particularly over the River's log position. Use it when you want to retrieve database log changes from a specific starting point, or to restore the River's data from a particular point in time. When you activate it in an existing River, Data Integration discards the River's current log position and replaces it with the log position you provide. Changes from that point are pushed directly, or after the initial migration completes (if you chose to run one).
Improper use of this option can result in data loss. Before using it, make sure you know how to obtain the correct log position from your database.
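The queries below are commonly used to read a source database's current log position. They are collected in a small Python lookup for reference. Which value and format Data Integration expects as manual input is not covered here. Check the documentation for your specific source before entering a position.

```python
# Commonly used queries for reading a source database's current log position.
CURRENT_POSITION_QUERIES = {
    # MySQL binary log file name and offset.
    # MySQL 8.2 and later use SHOW BINARY LOG STATUS instead.
    "mysql": "SHOW MASTER STATUS;",
    # PostgreSQL 10+ write-ahead log sequence number (LSN).
    "postgresql": "SELECT pg_current_wal_lsn();",
    # SQL Server CDC: highest LSN recorded in cdc.lsn_time_mapping.
    "sqlserver": "SELECT sys.fn_cdc_get_max_lsn();",
    # Oracle system change number (SCN).
    "oracle": "SELECT CURRENT_SCN FROM V$DATABASE;",
}


def current_position_query(source_type: str) -> str:
    """Return the query for a source type (case-insensitive)."""
    try:
        return CURRENT_POSITION_QUERIES[source_type.lower()]
    except KeyError:
        raise ValueError(f"no query listed for source type {source_type!r}") from None
```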
To perform a manual synchronization, follow these steps:
- Disable the Enable Stream UI toggle.
- The Data Integration CDC connector (sink) will be deactivated.
- The latest known CDC connector position (where Data Integration stopped retrieving changes) is set as the current River position, denoted as 'X' in the diagram.
- Re-enable the Enable Stream UI toggle.
- Data Integration will re-establish the CDC connector, disregarding the current known log position ('X') and replacing it with the position you specify, denoted as 'Y' in the diagram.
- The CDC connector will continuously retrieve changes from your database, starting from the specified position ('Y').
- If you opt to execute a complete migration, Data Integration will commence a one-time migration process in parallel with CDC connector re-establishment and change retrieval. Note that the initial migration process will extend the duration of your river run, depending on the size of the migrated tables in your database.
- The historical data from the initial migration will be stored in a managed or custom file zone in accordance with the user's Target connection definition.
- Data Integration will replicate all historical data from the file zone to the user's Data Warehouse (DWH) target table(s).
- After migration, all changes captured from the time of CDC connector re-establishment and all subsequent changes will be streamed to the user's DWH target table(s) based on the River's schedule. If the user decides to skip the migration process, the initial load and any subsequent changes will be streamed directly from the CDC connector to the target table(s).
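Because the stored position 'X' is discarded in favor of 'Y', where 'Y' falls relative to 'X' determines the outcome. The hypothetical Python function below assumes integer positions, a log sorted by position, and that reading resumes strictly after the given position. If 'Y' is earlier than 'X', some changes are read again. If 'Y' is later, the changes in between are never read, which is the data-loss case.

```python
def manual_sync_effect(log, last_river_position, user_position):
    """Classify what moving the River from X to a user-supplied Y does.

    log: list of (position, change) pairs sorted by position.
    Returns a (kind, changes) pair:
      - "replay": Y is before X, so changes in (Y, X] are read again;
      - "skip":   Y is after X, so changes in (X, Y] are never read;
      - "none":   Y equals X.
    """
    x, y = last_river_position, user_position
    if y < x:
        return "replay", [c for pos, c in log if y < pos <= x]
    if y > x:
        return "skip", [c for pos, c in log if x < pos <= y]
    return "none", []
```

Whether replayed changes create duplicates in the target depends on how the target table is loaded, for example whether rows are merged by key. That detail is outside this sketch.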
