CDC 'Point in Time' position
Change Data Capture (CDC) monitors the source database's logs and captures each change made to the source data. The CDC Point in Time Position feature gives you better insight into the operational details of a Data Flow's streaming process. It also supports data recovery and synchronization: because Data Integration stores the CDC log position, you can locate and retrieve data starting from a specific point in time.
Prerequisites
Before you set up the CDC Point in Time Position, make sure you have a working CDC connection for your database. If you have not set one up yet, refer to the following topics for CDC setup instructions:
Glossary
- Initial Migration: The process of transferring historical data from the source database to the target data warehouse.
- Streaming Process: The CDC-based Data Flow's active retrieval of changes from the source database log.
- Table Status: The status assigned to each selected table. The Table Configuration Options document describes the possible states.
- Detected Tables: The tables that Data Integration identifies during the stream enablement process. All detected tables have the waiting for migration table status.
Setting up a new Data Flow
After you configure the Source, Schema, and Target settings of a CDC-based Data Flow, enable the streaming process.
Procedure
- Activate the Enable Stream toggle at the bottom of the page.
- Data Integration prompts you to choose a sync option:
Automated sync
Automated Sync is the best choice for initial setup and low-touch stream management.
- Turn on the Enable Stream toggle.
- If enablement succeeds, Data Integration establishes a CDC connector (sink).
- The CDC connector continuously fetches changes from your database, starting from the moment the CDC process is enabled.
- If you require a complete migration, Data Integration starts a one-time migration process concurrently with establishing the CDC connector and retrieving changes from the CDC log. The duration of the initial migration depends on the size of the tables that Data Integration migrates from your database.
- Data Integration stores the historical data from the initial migration in a managed or custom file zone, according to your Target connection definition.
- Data Integration replicates all historical data from the file zone to your DWH target table(s).
- After the migration, all changes captured since the CDC connector was established, along with all future changes, stream to your DWH target table(s) according to the Data Flow schedule. If you skip the migration process, the first load and all subsequent changes stream directly from the CDC connector to the target table(s).
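The following minimal Python sketch illustrates the order in which data reaches the target under Automated sync. It is a simplified model, not Data Integration's implementation: it assumes the target applies changes as key-based upserts and deletes, and the names `apply_automated_sync`, `snapshot`, and `changes` are hypothetical.

```python
from typing import Optional

Row = dict[int, str]
Change = tuple[int, Optional[str]]  # (primary key, new value); None means the row was deleted


def apply_automated_sync(snapshot: Row, changes: list[Change], full_migration: bool) -> Row:
    """Hypothetical model of how a target table is filled under Automated sync."""
    target: Row = {}
    if full_migration:
        # Historical data lands in the file zone first, then is loaded into the target table.
        target.update(snapshot)
    # Changes captured since the CDC connector was established, applied in log order
    # (assumed upsert/delete semantics).
    for key, value in changes:
        if value is None:
            target.pop(key, None)
        else:
            target[key] = value
    return target


snapshot = {1: "a", 2: "b"}
changes = [(2, "B"), (3, "c")]
print(apply_automated_sync(snapshot, changes, full_migration=True))   # {1: 'a', 2: 'B', 3: 'c'}
print(apply_automated_sync(snapshot, changes, full_migration=False))  # {2: 'B', 3: 'c'}
```

In this model, skipping the migration leaves the target with only the rows that changed after the connector was established; row 1, which never changed, does not appear.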

Reinitialize sync
Use Reinitialize sync after a database failure, a corrupted log, or any other scenario that requires re-syncing the log. When you select it, Data Integration sets the Data Flow's log position to the source database's current position and captures changes from that point onward.
Manual sync
This option gives you complete control over the streaming process, specifically over the Data Flow's log position. Use it when you want to start retrieving updates from the database log at a specific point, or to restore data from a particular point in the Data Flow. When you enable this option, Data Integration retrieves data starting from the log position you enter, the next time the Data Flow runs on its schedule. Data Integration replicates any changes retrieved from the log to the target Data Warehouse (DWH) after the initial migration (or immediately if you skip the migration process).
Note that improper use of this option can lead to data loss. Before using this feature, ensure that you intend to load data starting from a specific point in time and that you do not need to retrieve any changes before the position you specify.
Manually configure the position
- Turn on the Enable Stream toggle.
- Data Integration establishes a CDC connector at the log position you specified, provided that position exists in the source database log.
- The CDC connector continuously retrieves database changes, starting from the specified position.
- If you choose a complete migration, Data Integration starts a one-time migration process concurrently with establishing the CDC connector and retrieving changes. The initial migration increases the runtime of your Data Flow, depending on the size of the tables that Data Integration migrates.
- Data Integration stores the historical data from the initial migration in a managed or custom file zone, according to your Target connection configuration.
- Data Integration replicates all historical data from the file zone to your Data Warehouse (DWH) target table(s).
- After the migration completes, Data Integration streams all changes the CDC connector has captured, and all future changes, to your DWH target table(s) based on the Data Flow's schedule. If you skip the migration process, Data Integration streams the initial load and all future changes directly from the CDC connector to the target table(s).
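For a new Data Flow, the three sync options differ mainly in where the CDC connector starts reading the log. The sketch below models log positions as plain integers and the retained log as the range from `oldest_retained` to `current`; both are simplifying assumptions, since real positions are database-specific (for example, a binlog file and offset in MySQL or an LSN in PostgreSQL). The function `start_position_for_new_flow` and its parameters are hypothetical.

```python
from enum import Enum
from typing import Optional


class SyncMode(Enum):
    AUTOMATED = "automated"
    REINITIALIZE = "reinitialize"
    MANUAL = "manual"


def start_position_for_new_flow(
    mode: SyncMode,
    current: int,
    oldest_retained: int,
    manual_position: Optional[int] = None,
) -> int:
    """Hypothetical: where a new Data Flow's CDC connector begins reading the log."""
    if mode in (SyncMode.AUTOMATED, SyncMode.REINITIALIZE):
        # Both options start from the source database's current log position.
        return current
    if manual_position is None:
        raise ValueError("Manual sync requires a log position")
    # The connector is established only if the specified position still exists in the log.
    if not oldest_retained <= manual_position <= current:
        raise ValueError(f"Position {manual_position} is not available in the source log")
    return manual_position


print(start_position_for_new_flow(SyncMode.AUTOMATED, current=500, oldest_retained=100))  # 500
print(start_position_for_new_flow(SyncMode.MANUAL, current=500, oldest_retained=100, manual_position=250))  # 250
```

Rejecting an out-of-range manual position mirrors the condition above that the connector is established only if the specified position exists.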

Existing Data Flow setup
After the initial setup of the streaming process, every active Data Flow maintains a log position that advances as the Data Flow reads changes from your database. You can check the current log position as follows.
Procedure
- In the Data Flow, go to the Schema tab.
- Select Table Definition. The Tables Definitions pop-up appears.
- Click Advanced Source Definitions, and then select Check Log Position. The latest CDC log position displays.
The Check Log Position option becomes available after the Data Flow's first run.
If you disable the streaming process of an existing Data Flow, or want to change its position mode, choose one of the following options when you turn streaming back on.
Automated sync
Use the Automated sync option to manage streams with minimal manual intervention. When you activate it on an existing Data Flow, the Data Integration CDC connector automatically retrieves all updates starting from the most recent stream position. Data Integration either pushes these updates immediately or, if you chose an initial migration, defers them until the migration completes.
Reinitialize sync
Use this option after a database failure, log corruption, or any other circumstance that requires re-synchronizing the log. When you activate it on an existing Data Flow, Data Integration resets the Data Flow's log position, aligning the CDC connector's log position with the database's current position. It then captures changes from that point onward.
Data Integration either immediately pushes any updates retrieved from the log or defers them until after the initial migration process completes.
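Improper use of this option can lead to data loss, because the CDC connector does not retrieve the changes written to the log between the last known position and the moment you reactivate the stream.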
To reinitialize the synchronization process, follow these steps:
- Turn off the Enable Stream toggle.
- This turns off the Data Integration CDC connector.
- Data Integration records the last known CDC position (where it stopped retrieving changes) as the Data Flow's most recent position, denoted 'X' in the diagram.
- Turn the Enable Stream toggle back on.
- Data Integration re-establishes the CDC connector, discarding the last known log position ('X') and replacing it with the latest available log position from your database.
- The CDC connector continuously fetches changes from the database, starting from the moment of reactivation.
- If you opt to execute a complete migration, Data Integration initiates a one-time migration process concurrently with CDC connector re-establishment and change retrieval. Note that the initial migration process will extend the duration of your Data Flow run, depending on the size of the tables Data Integration migrates from your database.
- Data Integration stores the historical data from the initial migration in a managed or custom file zone, according to the target connection definition.
- Data Integration replicates all historical data from the file zone to your Data Warehouse (DWH) target table(s).
- After migration, Data Integration streams all changes captured from the time of CDC connector re-establishment and all subsequent changes to your DWH target table(s) based on the Data Flow's schedule. If you choose to skip the migration process, Data Integration streams the initial load and all subsequent changes directly from the CDC connector to the target table(s).
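To see why Reinitialize sync can lose data, consider the log entries written between the last known position 'X' and the position at reactivation. The minimal sketch below assumes integer positions and a hypothetical list of `(position, change)` log entries; `skipped_by_reinitialize` is a hypothetical helper that returns the entries the re-established connector never reads from the log. In this model, unless a full migration reloads the tables, the effect of these entries does not reach the target.

```python
LogEntry = tuple[int, str]  # (log position, description of the change)


def skipped_by_reinitialize(log: list[LogEntry], last_known_x: int, current_at_reenable: int) -> list[LogEntry]:
    """Hypothetical model of the gap left by Reinitialize sync.

    last_known_x: position of the last entry read before the stream was disabled.
    current_at_reenable: position of the last entry written before the stream was re-enabled.
    The re-established connector reads only entries after current_at_reenable.
    """
    return [entry for entry in log if last_known_x < entry[0] <= current_at_reenable]


log = [(10, "insert id=1"), (11, "update id=1"), (12, "delete id=2"), (13, "insert id=3")]
print(skipped_by_reinitialize(log, last_known_x=10, current_at_reenable=12))
# [(11, 'update id=1'), (12, 'delete id=2')]
```

Entry 13 is written after reactivation, so the connector reads it normally.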

Manual sync
This option gives you full control over the streaming process, particularly over the Data Flow's log position. Use it when you want to retrieve database log changes from a specific starting point or restore the Data Flow's data from a particular point in time. When you activate it on an existing Data Flow, Data Integration discards the Data Flow's current log position and replaces it with the log position you provide. Data Integration pushes changes from this position immediately or, if you chose an initial migration, after the migration completes.
Improper use of this option can result in data loss. Before using it, make sure you know how to retrieve the correct log position from your database.
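Log position formats are database-specific, and this document does not state the format Data Integration expects. As one example, MySQL identifies a binary log position by a binlog file name (such as `mysql-bin.000042`) and a byte offset, which `SHOW MASTER STATUS` (`SHOW BINARY LOG STATUS` in recent MySQL versions) reports for the current position. The hypothetical helpers below assume a `file:offset` string with a single binlog base name and order two such positions, which can help confirm whether a manual position is earlier or later than the Data Flow's current one.

```python
def parse_binlog_position(position: str) -> tuple[int, int]:
    """Hypothetical format 'mysql-bin.000042:1234' -> (file sequence number, byte offset)."""
    file_name, offset = position.rsplit(":", 1)
    # Binlog files share a base name and end in an increasing numeric extension.
    sequence = int(file_name.rsplit(".", 1)[1])
    return sequence, int(offset)


def is_before(a: str, b: str) -> bool:
    """True if position a comes earlier in the log than position b."""
    # Tuples compare by file sequence first, then by offset within the file.
    return parse_binlog_position(a) < parse_binlog_position(b)


print(is_before("mysql-bin.000041:90000", "mysql-bin.000042:4"))  # True
```

Comparing the file sequence before the offset matters: a large offset in an older file is still earlier than a small offset in a newer one.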
Procedure
- Turn off the Enable Stream toggle.
- Data Integration deactivates the CDC connector (sink).
- Data Integration sets the last known CDC connector position (where it stopped retrieving changes) as the current Data Flow position, denoted 'X' in the diagram.
- Turn the Enable Stream toggle back on.
- Data Integration re-establishes the CDC connector, discarding the last known log position ('X') and replacing it with the position you specify, represented as 'Y' in the diagram.
- The CDC connector continuously retrieves changes from your database, starting from position 'Y'.
- If you opt to execute a complete migration, Data Integration will commence a one-time migration process in parallel with CDC connector re-establishment and change retrieval. Note that the initial migration process will extend the duration of your Data Flow run, depending on the size of the migrated tables in your database.
- Data Integration stores the historical data from the initial migration in a managed or custom file zone, according to your Target connection definition.
- Data Integration replicates all historical data from the file zone to your Data Warehouse (DWH) target table(s).
- After migration, Data Integration streams all changes captured from the time of CDC connector re-establishment and all subsequent changes to your DWH target table(s) based on the Data Flow's schedule. If you decide to skip the migration process, Data Integration streams the initial load and any subsequent changes directly from the CDC connector to the target table(s).
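Replacing 'X' with 'Y' moves the point from which the connector reads, and the direction of the move determines what happens to the log entries in between. The sketch below assumes integer positions, a hypothetical log of `(position, change)` entries, and that reading resumes after the given position; `effect_of_manual_position` is a hypothetical helper. If 'Y' is later than 'X', the entries in between are never read; if 'Y' is earlier, they are read again. How the target handles re-read changes depends on the Data Flow's loading behavior, which this sketch does not model.

```python
LogEntry = tuple[int, str]  # (log position, description of the change)


def effect_of_manual_position(log: list[LogEntry], x: int, y: int) -> tuple[str, list[LogEntry]]:
    """Hypothetical model: reading resumes after the given position.

    Returns ("skipped", entries) when Y is later than X, ("reread", entries)
    when Y is earlier than X, and ("none", []) when they are equal.
    """
    if y > x:
        return "skipped", [entry for entry in log if x < entry[0] <= y]
    if y < x:
        return "reread", [entry for entry in log if y < entry[0] <= x]
    return "none", []


log = [(10, "insert id=1"), (11, "update id=1"), (12, "delete id=2"), (13, "insert id=3")]
print(effect_of_manual_position(log, x=11, y=13))
# ('skipped', [(12, 'delete id=2'), (13, 'insert id=3')])
print(effect_of_manual_position(log, x=12, y=10))
# ('reread', [(11, 'update id=1'), (12, 'delete id=2')])
```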
