Data Hub walkthrough

Note

The Data Hub source connector is a limited availability (Beta) release.

In Data Integration, you can use the Data Hub connector to extract golden records from one or more Universe models in your Data Hub repository and load them into a supported target, such as Snowflake or Databricks. For an overview of Data Hub concepts, refer to Boomi Data Hub.

Prerequisites

A configured Data Hub connector connection
At least one Universe model deployed in your Data Hub repository
A supported target connector configured in Data Integration

Setting up a Data Hub Data Flow

Step 1: Setting up your data source

Navigate to the Data Integration Console.
Select Create New Data Flow > Source to Target Flow as your Data Flow type.
Find and select Data Hub in the list of data sources.
Under Selected Data Source, select Data Hub.
Under Source Connection, select the connection you configured. To edit an existing connection or create a new one, click the edit icon next to the connection field.
Click Test Connection to verify that Data Integration can reach your Data Hub repository.

Click Next.

Step 2: Selecting your data target

Under Selected Data Target, select your target connector.
Under Target Connection, select your target connection. Click the edit icon to modify an existing connection or create a new one.
Click Test Connection to verify the target connection.
Under Data Loading Settings, enter the values for your target destination:

Field	Description	Required
Database	The target database to load data into.	Yes
Schema	The target schema within the database.	Yes
Advanced Settings	Optional advanced loading configuration. Click to expand.	No

Click Next.

Step 3: Configuring the schema

The Configure Schema step is where you select Universe models, set their extraction methods, and configure mapping and loading settings.

Selecting models and configuring extraction

The Configure Schema step displays all Universe models available in the connected repository as rows in a table. Each row contains the following columns:

Column	Description
Model	The Universe model name. Select the checkbox to include it in the Data Flow. Click the model name to open its detailed settings.
Target Table	The destination table name in the target. Auto-generated from the model name by default.
Extract Method	The extraction method for this model: All or Incremental.
Incremental Field	The field used as the cursor for incremental extraction. Required when Extract Method is Incremental.
Incremental Type	The data type of the incremental field.
Start Value	The incremental field value from which extraction starts. Only records updated after this value are extracted.
End Value	The optional end value for incremental extraction. Leave blank to extract until the current run time.
Loading Mode	The loading mode for this model. Inherited from Tables Definitions unless overridden per model. Default: Upsert Merge.

To select and configure models:

Select the checkbox next to each Universe model you want to include.
In the Extract Method column, select All or Incremental for each model.
If you selected Incremental, fill in the Incremental Field, Incremental Type, and Start Value columns for that model.
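Taken together, the Incremental Field, Start Value, and End Value columns define a window over the golden records. The following is a minimal Python sketch of that filtering idea, not the connector's implementation. It assumes a hypothetical timestamp field named `updated_at`, treats the start value as exclusive (records updated after it) and the end value as inclusive, and uses plain dictionaries as stand-ins for golden records.

```python
from datetime import datetime, timezone
from typing import Optional


def in_incremental_window(
    record: dict,
    field: str,
    start: datetime,
    end: Optional[datetime] = None,
    run_time: Optional[datetime] = None,
) -> bool:
    """Return True if the record falls inside the extraction window.

    Assumptions (not confirmed by the connector): the start value is
    exclusive, the end value is inclusive, and a blank end value means
    "up to the current run time".
    """
    if end is None:
        # Blank End Value: extract until the current run time.
        end = run_time or datetime.now(timezone.utc)
    return start < record[field] <= end


# Hypothetical golden records with a hypothetical updated_at field.
records = [
    {"id": "1", "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": "2", "updated_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
    {"id": "3", "updated_at": datetime(2024, 6, 1, tzinfo=timezone.utc)},
]
start = datetime(2024, 2, 1, tzinfo=timezone.utc)
run_time = datetime(2024, 5, 1, tzinfo=timezone.utc)

selected = [r["id"] for r in records
            if in_incremental_window(r, "updated_at", start, run_time=run_time)]
print(selected)  # ['2']
```

Record 1 is on or before the start value and record 3 is after the run time, so only record 2 is inside the window.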

Note

Each extracted record includes an is_enddated column. Active records are tagged is_enddated = false. Soft-deleted records are tagged is_enddated = true. Use this column in your target to distinguish active from deleted records. To include end-dated records in the extraction, click the model name and enable Include End-Dated in the Table Source Settings tab.
For large datasets, use Incremental extraction. It retrieves only records updated after the start value, reducing load time on each run.
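When Include End-Dated is enabled, the target can hold both active and soft-deleted rows. The minimal Python sketch below shows one way to separate them downstream using the `is_enddated` column described above. The rows are plain dictionaries standing in for target table rows, and the `id` and `name` fields and sample values are hypothetical.

```python
from typing import Iterable


def split_by_end_date(rows: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Partition loaded rows into active rows and soft-deleted (end-dated) rows."""
    active, end_dated = [], []
    for row in rows:
        # is_enddated = true marks a soft-deleted golden record.
        (end_dated if row["is_enddated"] else active).append(row)
    return active, end_dated


rows = [
    {"id": "C-1", "name": "Acme", "is_enddated": False},
    {"id": "C-2", "name": "Globex", "is_enddated": True},
    {"id": "C-3", "name": "Initech", "is_enddated": False},
]
active, end_dated = split_by_end_date(rows)
print([r["id"] for r in active])     # ['C-1', 'C-3']
print([r["id"] for r in end_dated])  # ['C-2']
```

In the target warehouse itself, the equivalent is usually a filter on `is_enddated` in a view or query.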

Configuring Tables Definitions (optional)

Click Tables Definitions in the toolbar to apply settings across all models in the Data Flow.

Field	Description	Required	Default
Table Prefix	A character or phrase added to the beginning of each target table name.	No	—
Default Loading Mode	The loading mode applied to all models unless overridden in Table Target Settings.	No	Upsert Merge
Merge Method	The merge strategy applied when Loading Mode is Upsert Merge.	No	Merge
Filter Logical Key Duplication Between Files	Filters out records that share a logical key within the current source pull. Use only when duplicates are expected in the source but not in the target table.	No	Off
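To illustrate what Filter Logical Key Duplication Between Files does conceptually, here is a minimal Python sketch that keeps one record per logical key within a single source pull. This page does not say which duplicate the connector keeps. The sketch assumes the last occurrence wins, and the key field `id` and the sample records are hypothetical.

```python
def filter_logical_key_duplicates(records: list[dict],
                                  key_fields: tuple[str, ...]) -> list[dict]:
    """Keep one record per logical key within the current pull.

    Assumption: the last occurrence of each key wins. Keys keep the
    position of their first appearance because dicts preserve the
    insertion order of keys.
    """
    latest: dict[tuple, dict] = {}
    for record in records:
        key = tuple(record[f] for f in key_fields)
        latest[key] = record  # a later record overwrites an earlier one
    return list(latest.values())


pull = [
    {"id": "C-1", "name": "Acme"},
    {"id": "C-2", "name": "Globex"},
    {"id": "C-1", "name": "Acme Corp"},
]
result = filter_logical_key_duplicates(pull, ("id",))
print(result)  # [{'id': 'C-1', 'name': 'Acme Corp'}, {'id': 'C-2', 'name': 'Globex'}]
```

The sketch compares records only within the current pull, not against rows already in the target table. This matches the guidance to use the option only when duplicates are expected in the source.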

Applying bulk actions (optional)

Use Bulk Actions to apply extraction and loading settings across multiple Universe models at once, instead of configuring each model individually. Refer to Using bulk actions for more information.

Configuring model settings

Click a model name in the table to open its detailed settings. The model panel contains three tabs: Mapping, Table Source Settings, and Table Target Settings.

Mapping tab

The Mapping tab shows the column-level mapping between Data Hub source fields and target table columns.

Use the Search field to find specific columns.
Click Reload Model Metadata to refresh the schema from the Data Hub repository.
Click Add Calculated Column to add a custom computed field to the mapping.
Use the All Columns, Match Key, and Cluster tabs to filter the column view.

Each mapping row contains the following fields:

Column	Description
Source Column Name / Expression	The field name as it appears in the Data Hub golden record, or the expression that defines a calculated column.
Target Column Name	The field name in the destination table. Editable.
Type	The data type of the field (STRING, TIMESTAMP, and so on).
Mode	Whether the field accepts null values. Default: NULLABLE.
Cluster Key	Assigns this field as a cluster key in the target. Used for query optimization.

Table Source Settings tab

The Table Source Settings tab controls how data is extracted from Data Hub for this model.

Enable Include End-Dated to include soft-deleted records in the extraction. When disabled, only active records (is_enddated = false) are extracted.

Under Extraction Method, choose how records are extracted for this model:

Option	Description
All	Retrieves all golden records for this model on every run. The connector maintains no state between runs. Use for initial loads or small datasets.
Incremental	Retrieves only records updated after the configured start value. Use for large datasets.
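The two options differ in whether anything carries over between runs. The Python sketch below models that difference. All re-reads every record and saves nothing. Incremental reads only records past a saved cursor, seeded from the Start Value, and then advances the cursor to the newest value it saw. This page does not describe how the connector stores its position between runs, so the `state` dictionary, the `updated_at` field, and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def run_extraction(records: list[dict], method: str, state: dict,
                   field: str = "updated_at") -> list[dict]:
    """Simulate one Data Flow run against an in-memory list of golden records.

    `state` is a hypothetical stand-in for whatever persists between runs;
    for Incremental it must start with the configured Start Value.
    """
    if method == "All":
        # Stateless: every run returns every record and nothing is saved.
        return list(records)
    if method != "Incremental":
        raise ValueError(f"unknown extract method: {method}")
    cursor = state["cursor"]
    batch = [r for r in records if r[field] > cursor]
    if batch:
        # Advance the cursor so the next run starts after the newest record seen.
        state["cursor"] = max(r[field] for r in batch)
    return batch


records = [
    {"id": "1", "updated_at": datetime(2024, 1, 10, tzinfo=timezone.utc)},
    {"id": "2", "updated_at": datetime(2024, 1, 20, tzinfo=timezone.utc)},
]
state = {"cursor": datetime(2024, 1, 1, tzinfo=timezone.utc)}  # Start Value

first = run_extraction(records, "Incremental", state)
records.append({"id": "3", "updated_at": datetime(2024, 2, 5, tzinfo=timezone.utc)})
second = run_extraction(records, "Incremental", state)
everything = run_extraction(records, "All", {})

print([r["id"] for r in first])   # ['1', '2']
print([r["id"] for r in second])  # ['3']
print(len(everything))            # 3
```

The second Incremental run returns only the record added after the first run, while All returns the full set every time.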

Table Target Settings tab

The Table Target Settings tab controls how data is loaded into the target table for this model.

Field	Description	Required	Default
Target Table Name	Overrides the target table name for this model only.	No	Inherited from model name
Override Default Target Settings	When enabled, allows per-model overrides of the loading mode and merge settings below.	No	Off
Table Loading Mode	The loading mode for this model. Available when Override is enabled. Append Only is applied automatically if no key columns are selected.	Conditional	Upsert Merge
Merge Method	The merge strategy for this model. Available when Override is enabled.	Conditional	Merge
Filter Logical Key Duplication Between Files	Filters out records that share a logical key within the current source pull for this model. Use only when duplicates are expected in the source but not in the target table.	No	Off
Enforce Masking Policy	Preserves the data masking policy applied at the column level in the target table. Requires copy permission on the masking policy and at least one column with an active masking policy.	No	Off
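The loading settings above can be read as a small decision. The per-model Table Loading Mode applies only when Override Default Target Settings is enabled. Otherwise, the Default Loading Mode from Tables Definitions applies. The Python sketch below encodes that precedence. It assumes the "no key columns means Append Only" rule applies whether or not Override is enabled. It also contrasts Upsert Merge with Append Only on an in-memory table keyed by a hypothetical `id` column. It is a conceptual model only and does not represent the Merge Method options or how the target warehouse executes a merge.

```python
def resolve_loading_mode(default_mode: str, override: bool, table_mode: str,
                         key_columns: list[str]) -> str:
    """Return the effective loading mode for one model (assumed precedence)."""
    if not key_columns:
        # No key columns means there is nothing to merge on.
        return "Append Only"
    # The per-model Table Loading Mode applies only when Override is enabled.
    return table_mode if override else default_mode


def load(target: list[dict], batch: list[dict], mode: str,
         key_columns: list[str]) -> list[dict]:
    """Apply one batch to an in-memory target table and return the result."""
    if mode == "Append Only":
        # Every incoming row is added, even if its key already exists.
        return target + batch
    if mode == "Upsert Merge":
        rows = {tuple(row[c] for c in key_columns): row for row in target}
        for row in batch:
            # Replace the row with the same key, or insert a new one.
            rows[tuple(row[c] for c in key_columns)] = row
        return list(rows.values())
    raise ValueError(f"mode not modeled in this sketch: {mode}")


target = [{"id": "C-1", "name": "Acme"}]
batch = [{"id": "C-1", "name": "Acme Corp"}, {"id": "C-2", "name": "Globex"}]

mode = resolve_loading_mode("Upsert Merge", override=False,
                            table_mode="Append Only", key_columns=["id"])
print(mode)  # Upsert Merge
print(load(target, batch, mode, ["id"]))
# [{'id': 'C-1', 'name': 'Acme Corp'}, {'id': 'C-2', 'name': 'Globex'}]
print(resolve_loading_mode("Upsert Merge", False, "Upsert Merge", []))  # Append Only
print(len(load(target, batch, "Append Only", [])))                      # 3
```

With Upsert Merge, the updated golden record replaces the existing row. With Append Only, the same batch leaves two rows for `C-1`.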

Click Next.

Tip

If your repository contains 100,000 or more golden records, activate Accelerated Query in Boomi Data Hub before running extractions. Accelerated Query significantly improves Repository API query performance, which can reduce extraction time for large datasets. Refer to Activating accelerated query for golden records for more information.

Step 4: Scheduling and activating your data flow

Under Schedule Data Flow, enable scheduling, then set the run frequency. All times are in UTC.
Under Set Custom Timeout, enable the option to specify your own timeout. By default, the timeout is set automatically based on table size and ranges from 12 hours to 7 days.
Under Notifications, configure email alerts for pipeline events:

Option	Description
Failure	Sends an email when the data flow fails.
Warning	Sends an email when the data flow completes with warnings.
Run Threshold	Sends an email when a run exceeds a defined duration.

Under Data Flow Info, enter a name for the data flow. Optionally assign it to a group and add a description.
Click Activate to save and activate the data flow.
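Schedule times are entered in UTC, so it can help to convert a local run time before you set the frequency. The sketch below uses Python's standard `zoneinfo` module, which requires Python 3.9 or later and IANA time zone data on the system (for example, the `tzdata` package on Windows). The time zone and run time are examples only.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def local_to_utc(year: int, month: int, day: int, hour: int, minute: int,
                 tz_name: str) -> datetime:
    """Convert a wall-clock time in an IANA time zone to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(ZoneInfo("UTC"))


winter = local_to_utc(2024, 1, 15, 2, 0, "America/New_York")
summer = local_to_utc(2024, 7, 15, 2, 0, "America/New_York")
print(winter.strftime("%H:%M UTC"))  # 07:00 UTC
print(summer.strftime("%H:%M UTC"))  # 06:00 UTC
```

A fixed UTC schedule therefore shifts by one hour in local time when daylight saving time begins or ends.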