Automatic schema drift management
Schema drift refers to the gradual changes in the structure of data in a Source over time. These changes can have a significant impact on the integrity and accuracy of data replicated to a destination Target, such as a data warehouse.
Data Integration provides automated solutions for extracting, transforming, and loading data from various Sources into a Target. One of the key features is the ability to detect and handle schema changes in the Source.
Example of schema drift
Customers table: initial structure
| ID | Name | Address | Phone Number |
|---|---|---|---|
| 1 | Jane | 500 7th Avenue | 5551235555 |
| 2 | John | 16 Great Chapel Street | 5554565556 |
Customers table: after schema drift
| ID | Name | Address | Phone Number | Social Media Handle | |
|---|---|---|---|---|---|
| 1 | Jane | 500 7th Avenue | 555-123-5555 | Null | Null |
| 2 | John | 16 Great Chapel Street | 555-456-5556 | Null | Null |
| 3 | Bob | 14425 Falcon Head Blvd | (+1)555-789-5559 | bob.smith@boomi.com | @bobsmith |
The Customers database initially included four columns, as seen in this example: ID, Name, Address, and Phone Number. However, due to schema drift, the table now has two extra columns, Email and Social Media Handle, and the Phone Number column's data type has been changed from integer to string.
Handling schema changes
When a schema change is detected in the source, Data Integration automatically updates the destination target's schema to match the new structure. This process happens without disrupting data flow or requiring manual intervention, whether through Standard extraction or Change Data Capture (CDC).
Data Integration continuously compares the source's schema with its metadata—a record of the source's structure and attributes—to detect changes. On each run, Data Integration resyncs the metadata with the source, ensuring they are always aligned. The metadata then serves as the foundation for data processing.
New field detection and processing
When Data Integration detects a new field in the source schema, such as ExcludeInMDM, it appears in the Data Integration interface. However, for sources like SQL Server using Change Tracking, Data Integration only updates the target schema (For example, Snowflake) when processing new or modified data. The latest field does not reflect in Snowflake until it is included in new or updated data for the relevant table.
If the new field is created with a default value (For example, 0), this action is not considered a data modification by Change Tracking. Therefore, Data Integration does not recognize it as a change that would trigger a schema update in the target. The new field will remain pending in Data Integration’s schema until the river processes actual new or changed data. Running the river without such data will not trigger a schema change in the target, even if the process completes with warnings.
Column-specific schema changes
- Dropping a Column in the Source Table: If a column is dropped in the source but still exists in the target, the corresponding field in the target table will be populated with null values.
- Renaming a Column in the Source Table: When a column is renamed in the source, Data Integration detects the change during the next river run. This results in a new column being added to the target table with the updated name, while existing records are duplicated with null values in the new column. This approach maintains data integrity while accommodating schema changes.
- Adding a Column in the Source Table: Upon detecting a new column in the source, Data Integration adds it to the target table during the next river run. Existing records in the target table, predating the introduction of the new column, will have null values in this new column.
- Changing Data Type of an Existing Column: If the data type of a source column changes, Data Integration adjusts the target column to a wider data type. For instance, changing a field from an integer to a string in the source will result in the target column being set to a string. However, if a field is changed from a string to a date, it will remain a string in the target.

Sources supporting schema drift
Schema drift is available for the following Sources:
- BigQuery
- MariaDB
- Microsoft SQL Server
- MySQL
- Oracle
- PostgreSQL
- Redshift
- Salesforce
- Teradata
- Vertica
MongoDB considerations
As a non-schema-based database, MongoDB operates differently from traditional databases. To simplify the mapping process when working with MongoDB collections, there are three powerful features. For more details, see streamlining the mapping process.