SQL server change tracking

Change Tracking captures the fact that rows in a table have changed, but does not capture the changed data or how many times a row changed. This enables applications to determine which rows changed and when the change occurred.

Change Tracking answers fewer historical questions than Change Data Capture. For those applications that do not require historical information, there is far less storage overhead because Change Tracking does not capture the changed data itself.

Change Tracking saves the change information by keeping the primary key column of each row that changed. It saves a version number for each INSERT/UPDATE/DELETE operation. To retrieve the latest data for changed rows, an application can use the primary key column values to join the source table with the tracked table. In each execution, the application can track the latest changes by version numbers.

Changing tracking extraction in Data Integration

To align the data and the metadata in the first run, Data Integration makes a migration of the chosen table(s) using the Overwrite loading mode. After the initial migration completes successfully, Data Integration saves the database's current change tracking version in the Data Flow settings. The following Data Flow execution extracts data from this version up to the newest version at the current execution time. It loads the data using the Upsert-Merge loading mode into the target table(s).

Enabling change tracking for tables

To track changes, enable change tracking for the database, and then enable it for the specific tables within that database that you want to track. You do not need to change the table definition, and SQL Server does not create any triggers.

To enable change tracking for a DB, use the following ALTER operation:

ALTER DATABASE AdventureWorks2012
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

To enable change tracking for a specific table, use the following ALTER operation:

ALTER TABLE Person.Contact ENABLE CHANGE_TRACKING

To learn more about change tracking settings, refer to the Microsoft SQL Server docs.

Initiate migration

Procedure

Navigate to the Data Integration Account.
Select the Data Flow and connection that you created.
Go to the Schema tab.
Select the desired table.
Click Table Settings.
Scroll down and check the Initiate Migration checkbox. By default, this process also overwrites the table to align the SQL Server database with the corresponding Snowflake or Databricks table (optional).

Using change tracking

Navigate to the Data Integration Console.
Select the Data Flow and connection that you created.
Select the Change Tracking option in the Source tab:

Before initiating the first Data Flow execution, verify that you have marked all tables as Initiate Migration within the Table Setting tab available in the Schema section.
This extracts all the source table's data into your target table and synchronizes the Data Flow with the DB's last change tracking version.
The following executions extract data from the last synchronization version until the database's current version.

In the Activities tab, you can find the versions extracted for each execution.

Hard delete

Microsoft-SQL Change Tracking with Snowflake and Databricks Target Data Flows offers Include Deleted Rows feature.

This option lets you track data that the source table deleted.

Activating Include Deleted Rows

In Microsoft SQL Server Data Flow, click Schema.
Choose a dedicated Source Table.
Select Table settings.
Check Initiate Migration, and then Overwrite Table Migration to match the existing database in SQL Server with the Snowflake/Databricks table (optional).

note

Skip this step if you want to keep working with the data in Snowflake/Databricks that is marked as deleted.
After the first Initiate Migration is complete, the checkbox is turned off.

Click Include Deleted Rows.

note

If you select Include Deleted Rows, the Data Flow retrieves all changes from the Change Tracking table, including deleted rows.
It also adds a __DELETED column to indicate if the source table deleted the row or not.

Enable Remove Deleted Rows(optional).

note

This option requires you to select Upsert - Merge in Loading Mode.
By enabling Remove Deleted Rows, Data Integration eliminates and removes all rows with the status True (Deleted) from the Snowflake/Databricks Target table.

Click Run.
Connect to Snowflake or Databricks to access and view the results.

Troubleshooting

Data restore

After you perform a database restore operation on a table, check the Initiate Migration checkbox to initialize the table. This loads all the data from the source table into the target table to make sure the tables are in sync and to prevent data loss.

Cleanup during Data Flow extraction

While Data Integration extracts the data, a database cleanup process may remove change tracking data that is older than the designated retention period.

By the time Data Integration retrieves the changes, the recent synchronization version might no longer be valid. This causes the error: - Min valid version is greater than the last synchronization version. Reinitialize the table. To re-initialize the table, navigate to your Data Flow's Schema and choose Set initiate migration for all tables from the three dots menu.

All the chosen tables should have the status Waiting For Migration, which indicates that Data Integration reset your synchronization version.

Verifying change tracking status

To verify that you have enabled Change Tracking for a database and its tables in SQL Server, execute the following queries:

Check change tracking for databases

 SELECT
 d.name, 
 ctd.*
 FROM 
 sys.change_tracking_databases ctd
 JOIN 
 sys.databases d 
 ON d.database_id = ctd.database_id;

Checking change tracking for tables

SELECT OBJECT_NAME(object_id) AS TableName,
is_track_columns_updated_on AS ChangeTrackingEnabled
FROM sys.change_tracking_tables
WHERE OBJECT_NAME(object_id) = 'YourTableName';

These queries will help you determine whether you have enabled Change Tracking at both the database and table levels in your SQL Server environment.

Cleanup during Data Flow extraction

[RVR-MSSQL-RDBMS-301]: Last synchronization version does not exist for this table. Reinitialize the table data. Refer to our topics: Page - 'SQL Server Change Tracking'. Section - 'Cleanup During Data Flow Extraction'

This issue arises due to the absence of the last_sync_version. If the system detects no last sync version, it triggers an error message, and you must reset the table to ensure the system can access change tracking data. This situation can occur due to automated cleanup processes, which may arise either frequently or infrequently, depending on specific factors.

The occurrence of this issue stems from excessive cleanup, which can lead to synchronization problems. To address this, the system examines the logs for any indications of issues related to change tracking or the cleanup process in the MSSQL server logs. After resetting and running the Data Flow, it is crucial to monitor the logs closely during subsequent runs to catch any recurrence of this error message.

Understanding how to use the change tracking tables is also essential. Automatic resets can occur due to factors, including the frequency of updates and deletions in the tables.

Questions to consider are:

Is there a high volume of update and delete operations without corresponding inserts in this table?
Is the version column updated or deleted frequently?
Are there concurrency issues, such as multiple processes attempting to synchronize with change tracking simultaneously, if this table serves purposes beyond just this Data Flow? Proper coordination of the synchronization process is necessary to prevent such issues.

Checking changes for the relevant version for a table

In some cases, you may need to retrieve either the latest or the initial change tracking version. This helps to determine if your Data Flow is going out of sync due to the purging of the minimum valid version in the database.

To fetch the minimum valid version

// CHANGE_TRACKING_MIN_VALID_VERSION (Transact-SQL) - SQL Server SELECT CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('[<schea_name>].[<table_name>]'));

To fetch the current (latest) version on the server

// CHANGE_TRACKING_CURRENT_VERSION (Transact-SQL) - SQL Server SELECT CHANGE_TRACKING_CURRENT_VERSION()

To retrieve changes since a specific version

// CHANGETABLE (Transact-SQL) - SQL Server 
DECLARE @last_sync_version bigint;  
SET @last_sync_version = <value obtained from query>;  
SELECT *,
SYS_CHANGE_VERSION, SYS_CHANGE_OPERATION,  
SYS_CHANGE_COLUMNS, SYS_CHANGE_CONTEXT   
FROM CHANGETABLE (CHANGES <schema>.<table_name>, @last_sync_version) AS C;

To get all current rows with their associated version

// CHANGETABLE (Transact-SQL) - SQL Server 
// Get all current rows with associated version  
// This is using CHANGETABLE(VERSION) option in SQL Server
SELECT t.*
c.SYS_CHANGE_VERSION, c.SYS_CHANGE_CONTEXT  
FROM <tbl_name> AS t
CROSS APPLY CHANGETABLE   
(VERSION <table_name>, ([<primary_key_col>]), (t.[<primary_key_col>])) AS c;