SQL server change tracking
Change Tracking captures the fact that rows in a table have changed, but does not capture the changed data or how many times a row changed. This enables applications to determine which rows changed and when the change occurred.
Therefore, Change Tracking answers fewer historical questions than Change Data Capture. For those applications that do not require historical information, there is far less storage overhead because Change Tracking does not capture the changed data itself.
Change Tracking saves the change information by keeping the primary key column of each row that changed. It saves a version number for each INSERT/UPDATE/DELETE operation. To retrieve the latest data for changed rows, an application can use the primary key column values to join the source table with the tracked table. In each execution, the application can track the latest changes by version numbers.
Changing tracking extraction in Data Integration
To align the data and the metadata in the first run, Data Integration makes a migration of the chosen table(s) using the Overwrite loading mode. After the initial migration completes successfully, Data Integration saves the database's current change tracking version in the Data Flow settings. The following Data Flow execution extracts data from this version up to the newest version at the current execution time. It loads the data using the Upsert-Merge loading mode into the target table(s).
Enabling change tracking for tables
To track changes, enable change tracking for the database, and then enable it for the specific tables within that database that you want to track. You do not need to change the table definition, and SQL Server does not create any triggers.
To enable change tracking for a DB, use the following ALTER operation:
ALTER DATABASE AdventureWorks2012
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
To enable change tracking for a specific table, use the following ALTER operation:
ALTER TABLE Person.Contact
ENABLE CHANGE_TRACKING
To learn more about change tracking settings, refer to the Microsoft SQL Server docs.
Initiate migration
Procedure
- Navigate to the Data Integration Account.
- Select the Data Flow and connection that you created.
- Go to the Schema tab.
- Select the desired table.
- Click on Table Settings.
- Scroll down and check the Initiate Migration checkbox. By default, this process also overwrites the table to align the SQL Server database with the corresponding Snowflake or Databricks table (optional).
Using change tracking
- Navigate to the Data Integration Console.
- Select the Data Flow and connection that you created.
- Select the Change Tracking option in the Source tab:
-
Before initiating the first Data Flow execution, verify that you have marked all tables as Initiate Migration within the Table Setting tab available in the Schema section.
-
This extracts all the source table's data into your target table and synchronizes the Data Flow with the DB's last change tracking version.
-
The following executions extract data from the last synchronization version until the database's current version.
- In the Activities tab, you can find the versions extracted for each execution.
Hard delete
Microsoft-SQL Change Tracking with Snowflake and Databricks Target Data Flows offers 'Include Deleted Rows' feature.
This option lets you track data that the source table deleted.
Activating Include Deleted Rows
- In Microsoft SQL Server Data Flow, click on Schema.
- Choose a dedicated Source Table.
- Select Table settings.
- Check Initiate Migration, and then Overwrite Table Migration to match the existing database in SQL Server with the Snowflake/Databricks table (optional).
- Skip this step if you want to keep working with the data in Snowflake/Databricks that is marked as deleted.
- After the first Initiate Migration is complete, the checkbox is turned off.
- Click Include Deleted Rows.
- If you select 'Include Deleted Rows', the Data Flow retrieves all changes from the Change Tracking table, including deleted rows.
- It also adds a '__DELETED' column to indicate if the source table deleted the row or not.
- Enable Remove Deleted Rows(optional).
- This option requires you to select 'Upsert - Merge' in Loading Mode.
- By enabling 'Remove Deleted Rows', Data Integration eliminates and removes all rows with the status 'True' (Deleted) from the Snowflake/Databricks Target table.
- Click Run.
- Connect to Snowflake or Databricks to access and view the results.
Troubleshooting
Data restore
After you perform a database restore operation on a table, check the “Initiate Migration” checkbox to initialize the table. This loads all the data from the source table into the target table to make sure the tables are in sync and to prevent data loss.
Cleanup during Data Flow extraction
While Data Integration extracts the data, a database cleanup process may remove change tracking data that is older than the designated retention period.
By the time Data Integration retrieves the changes, the recent synchronization version might no longer be valid. Therefore, an error pops up - Min valid version is greater than the last synchronization version. Please reinitialize the table. To re-initialize the table, navigate to your Data Flow's Schema and choose Set initiate migration for all tables from the three dots menu.
All the chosen tables should have the status Waiting For Migration, which indicates that Data Integration reset your synchronization version.
Verifying change tracking status
To verify that you have enabled Change Tracking for a database and its tables in SQL Server, execute the following queries:
Check change tracking for databases
SELECT
d.name,
ctd.*
FROM
sys.change_tracking_databases ctd
JOIN
sys.databases d
ON d.database_id = ctd.database_id;
Checking change tracking for tables
SELECT OBJECT_NAME(object_id) AS TableName,
is_track_columns_updated_on AS ChangeTrackingEnabled
FROM sys.change_tracking_tables
WHERE OBJECT_NAME(object_id) = 'YourTableName';
These queries will help you determine whether you have enabled Change Tracking at both the database and table levels in your SQL Server environment.
Cleanup during Data Flow extraction
[RVR-MSSQL-RDBMS-301]: Last synchronization version does not exist for this table. Please reinitialize the table data. Please refer to our docs: Page - 'SQL Server Change Tracking'. Section - 'Cleanup During Data Flow Extraction'
This issue arises due to the absence of the last_sync_version. If the system detects no last sync version, it triggers an error message, and you must reset the table to ensure the system can access change tracking data. This situation can occur due to automated cleanup processes, which may arise either frequently or infrequently, depending on specific factors.
The occurrence of this issue stems from excessive cleanup, which can lead to synchronization problems. To address this, the system examines the logs for any indications of issues related to change tracking or the cleanup process in the MSSQL server logs. After resetting and running the Data Flow, it is crucial to monitor the logs closely during subsequent runs to catch any recurrence of this error message.
Understanding how to use the change tracking tables is also essential. Automatic resets can occur due to factors, including the frequency of updates and deletions in the tables.
-
Is there a high volume of update and delete operations without corresponding inserts in this table?
-
Is the version column updated or deleted frequently?
-
Are there concurrency issues, such as multiple processes attempting to synchronize with change tracking simultaneously, if this table serves purposes beyond just this Data Flow? Proper coordination of the synchronization process is necessary to prevent such issues.
Checking changes for the relevant version for a table
In some cases, you may need to retrieve either the latest or the initial change tracking version. This helps to determine if your Data Flow is going out of sync due to the purging of the minimum valid version in the database.
To fetch the minimum valid version
// CHANGE_TRACKING_MIN_VALID_VERSION (Transact-SQL) - SQL Server
SELECT CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('[<schea_name>].[<table_name>]'));
To fetch the current (latest) version on the server
// CHANGE_TRACKING_CURRENT_VERSION (Transact-SQL) - SQL Server SELECT CHANGE_TRACKING_CURRENT_VERSION()
To retrieve changes since a specific version
// CHANGETABLE (Transact-SQL) - SQL Server
DECLARE @last_sync_version bigint;
SET @last_sync_version = <value obtained from query>;
SELECT *,
SYS_CHANGE_VERSION, SYS_CHANGE_OPERATION,
SYS_CHANGE_COLUMNS, SYS_CHANGE_CONTEXT
FROM CHANGETABLE (CHANGES <schema>.<table_name>, @last_sync_version) AS C;
To get all current rows with their associated version
// CHANGETABLE (Transact-SQL) - SQL Server
// Get all current rows with associated version
// This is using CHANGETABLE(VERSION) option in SQL Server
SELECT t.*
c.SYS_CHANGE_VERSION, c.SYS_CHANGE_CONTEXT
FROM <tbl_name> AS t
CROSS APPLY CHANGETABLE
(VERSION <table_name>, ([<primary_key_col>]), (t.[<primary_key_col>])) AS c;