Data Hub walkthrough

Note: The Data Hub source connector is available as a limited availability (Beta) release.

Using the Data Hub connector in Data Integration, you can extract golden records from one or more Universe models in your Data Hub repository and load them into a supported target, such as Snowflake or Databricks. For an overview of Data Hub concepts, refer to Boomi Data Hub.

Prerequisites

  • A configured Data Hub connector connection
  • At least one Universe model deployed in your Data Hub repository
  • A supported target connector configured in Data Integration

Setting up a Data Hub Data Flow

Step 1: Setting up your data source

  1. Navigate to the Data Integration Console.
  2. Select Create New Data Flow > Source to Target Flow as your Data Flow type.
  3. Find and select Data Hub in the list of data sources.
  4. Under Selected Data Source, select Data Hub.
  5. Under Source Connection, select the connection you configured. To edit an existing connection or create a new one, click the edit icon next to the connection field.
  6. Click Test Connection to verify that Data Integration can reach your Data Hub repository.
  7. Click Next.

Step 2: Selecting your data target

  1. Under Selected Data Target, select your target connector.
  2. Under Target Connection, select your target connection. Click the edit icon to modify an existing connection or create a new one.
  3. Click Test Connection to verify the target connection.
  4. Under Data Loading Settings, enter the values for your target destination:

| Field | Description | Required |
| --- | --- | --- |
| Database | The target database to load data into. | Yes |
| Schema | The target schema within the database. | Yes |
| Advanced Settings | Optional advanced loading configuration. Click to expand. | No |

  5. Click Next.

Step 3: Configuring the schema

The Configure Schema step is where you select Universe models, set their extraction methods, and configure mapping and loading settings.

Selecting models and configuring extraction

The Configure Schema step displays all Universe models available in the connected repository as rows in a table. Each row contains the following columns:

| Column | Description |
| --- | --- |
| Model | The Universe model name. Select the checkbox to include it in the Data Flow. Click the model name to open its detailed settings. |
| Target Table | The destination table name in the target. Auto-generated from the model name by default. |
| Extract Method | The extraction method for this model: All or Incremental. |
| Incremental Field | The field used as the cursor for incremental extraction. Required when Extract Method is Incremental. |
| Incremental Type | The data type of the incremental field. |
| Start Value | The start value for incremental extraction. |
| End Value | The optional end value for incremental extraction. Leave blank to extract until the current run time. |
| Loading Mode | The loading mode for this model. Inherited from Tables Definitions unless overridden per model. Default: Upsert Merge. |

To select and configure models:

  1. Select the checkbox next to each Universe model you want to include.
  2. In the Extract Method column, select All or Incremental for each model.
  3. If you selected Incremental, fill in the Incremental Field, Incremental Type, and Start Value columns for that model.

Note:

  • Each extracted record includes an is_enddated column. Active records are tagged is_enddated = false. Soft-deleted records are tagged is_enddated = true. Use this column in your target to distinguish active from deleted records. To include end-dated records in the extraction, click the model name and enable Include End-Dated in the Table Source Settings tab.
  • For large datasets, use Incremental extraction. It retrieves only records updated after the start value, reducing load time on each run.
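
As an illustration of the first point, the sketch below splits a batch of extracted records into active and soft-deleted sets using is_enddated. The record layout (plain dictionaries with an `id` field) and the helper name `split_by_enddated` are assumptions made for this example, not the connector's actual output format.

```python
from typing import Iterable


def split_by_enddated(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Separate active records from soft-deleted ones using is_enddated."""
    active, deleted = [], []
    for record in records:
        # is_enddated = true marks a soft-deleted (end-dated) record.
        if record.get("is_enddated", False):
            deleted.append(record)
        else:
            active.append(record)
    return active, deleted


# Hypothetical sample rows; only the is_enddated column comes from the documentation.
sample = [
    {"id": "a1", "is_enddated": False},
    {"id": "b2", "is_enddated": True},
    {"id": "c3", "is_enddated": False},
]
active, deleted = split_by_enddated(sample)
print([r["id"] for r in active])   # ['a1', 'c3']
print([r["id"] for r in deleted])  # ['b2']
```

In a real target, the same distinction would typically be expressed as a filter on the is_enddated column in a query or view.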

Configuring Tables Definitions (optional)

Click Tables Definitions in the toolbar to apply settings across all models in the Data Flow.

| Field | Description | Required | Default |
| --- | --- | --- | --- |
| Table Prefix | A character or phrase added to the beginning of each target table name. | No | |
| Default Loading Mode | The loading mode applied to all models unless overridden in Table Target Settings. | No | Upsert Merge |
| Merge Method | The merge strategy applied when Loading Mode is Upsert Merge. | No | Merge |
| Filter Logical Key Duplication Between Files | Filters out duplicate records in the current source pull. Use only when duplicates are expected in the source but not in the target table. | No | Off |
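
To make two of these settings concrete, the sketch below models Table Prefix as simple string concatenation and Filter Logical Key Duplication Between Files as keeping one record per logical key within a single pull. Both behaviors are assumptions for illustration: this walkthrough does not state how the prefix is joined to the table name or which duplicate is kept (the last one seen is kept here), and the function names are hypothetical.

```python
def prefixed_table_name(prefix: str, table: str) -> str:
    """Assumed behavior: the prefix is prepended to the table name as-is."""
    return f"{prefix}{table}"


def filter_duplicate_keys(records: list[dict], key_fields: list[str]) -> list[dict]:
    """Keep one record per logical key within a single source pull.

    Assumption: when several records share a key, the last one seen wins.
    """
    latest: dict[tuple, dict] = {}
    for record in records:
        key = tuple(record[field] for field in key_fields)
        latest[key] = record  # later records overwrite earlier ones
    return list(latest.values())


# Hypothetical pull containing two records with the same logical key (id = 1).
pull = [
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": "Globex"},
    {"id": 1, "name": "Acme Corp"},
]
print(prefixed_table_name("hub_", "customers"))  # hub_customers
print(filter_duplicate_keys(pull, ["id"]))
```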

Applying bulk actions (optional)

Use Bulk Actions to apply extraction and loading settings across multiple Universe models at once, instead of configuring each model individually. Refer to Using bulk actions for more information.

Configuring model settings

Click a model name in the table to open its detailed settings. The model panel contains three tabs: Mapping, Table Source Settings, and Table Target Settings.

Mapping tab

The Mapping tab shows the column-level mapping between Data Hub source fields and target table columns.

  • Use the Search field to find specific columns.
  • Click Reload Model Metadata to refresh the schema from the Data Hub repository.
  • Click Add Calculated Column to add a custom computed field to the mapping.
  • Use the All Columns, Match Key, and Cluster tabs to filter the column view.

Each mapping row contains the following fields:

| Column | Description |
| --- | --- |
| Source Column Name / Expression | The field name as it appears in the Data Hub golden record. |
| Target Column Name | The field name in the destination table. Editable. |
| Type | The data type of the field (STRING, TIMESTAMP, and so on). |
| Mode | Whether the field accepts null values. Default: NULLABLE. |
| Cluster Key | Assigns this field as a cluster key in the target. Used for query optimization. |

Table Source Settings tab

The Table Source Settings tab controls how data is extracted from Data Hub for this model.

Enable Include End-Dated to include soft-deleted records in the extraction. When disabled, only active records (is_enddated = false) are extracted.

Under Extraction Method, select the extract method for this model:

| Option | Description |
| --- | --- |
| All | Retrieves all golden records for this model on every run. The connector maintains no state between runs. Use for initial loads or small datasets. |
| Incremental | Retrieves only records updated after the configured start value. Use for large datasets. |
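
The difference between the two methods can be sketched as a filter over a cursor field. This is a conceptual model only: the `updated_at` field name, the in-memory record list, and the treatment of the end value as an inclusive upper bound are assumptions, not documented connector behavior.

```python
from datetime import datetime, timezone
from typing import Optional


def extract(records: list[dict], method: str, cursor_field: str = "updated_at",
            start: Optional[datetime] = None,
            end: Optional[datetime] = None) -> list[dict]:
    """Return the records a run would retrieve under each extract method."""
    if method == "All":
        # All: every record on every run, with no state kept between runs.
        return list(records)
    if method != "Incremental":
        raise ValueError(f"unknown extract method: {method}")
    if start is None:
        raise ValueError("Incremental extraction requires a start value")
    # A blank end value means "up to the current run time".
    upper = end if end is not None else datetime.now(timezone.utc)
    return [r for r in records if start < r[cursor_field] <= upper]


# Hypothetical golden records with an assumed cursor field.
utc = timezone.utc
records = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=utc)},
    {"id": 2, "updated_at": datetime(2024, 2, 1, tzinfo=utc)},
    {"id": 3, "updated_at": datetime(2024, 3, 1, tzinfo=utc)},
]
window = extract(records, "Incremental",
                 start=datetime(2024, 1, 15, tzinfo=utc),
                 end=datetime(2024, 2, 15, tzinfo=utc))
print([r["id"] for r in window])     # [2]
print(len(extract(records, "All")))  # 3
```
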

Table Target Settings tab

The Table Target Settings tab controls how data is loaded into the target table for this model.

| Field | Description | Required | Default |
| --- | --- | --- | --- |
| Target Table Name | Overrides the target table name for this model only. | No | Inherited from model name |
| Override Default Target Settings | When enabled, allows per-model overrides of the loading mode and merge settings below. | No | Off |
| Table Loading Mode | The loading mode for this model. Available when Override is enabled. Append Only is applied automatically if no key columns are selected. | Conditional | Upsert Merge |
| Merge Method | The merge strategy for this model. Available when Override is enabled. | Conditional | Merge |
| Filter Logical Key Duplication Between Files | Filters out duplicate records in the current source pull for this model. Use only when duplicates are expected in the source but not in the target table. | No | Off |
| Enforce Masking Policy | Preserves the data masking policy applied at the column level in the target table. Requires copy permission on the masking policy and at least one column with an active masking policy. | No | Off |
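
The interaction between key columns and the loading mode can be illustrated with an in-memory sketch. Upsert merge is modeled here as "update the row with a matching key, otherwise insert", which is the general meaning of an upsert; how the target actually executes the merge is not described in this walkthrough. The function name `load` and the list-of-dictionaries table are hypothetical.

```python
def load(target: list[dict], incoming: list[dict], key_columns: list[str]) -> list[dict]:
    """Load incoming rows into target and return the resulting table."""
    if not key_columns:
        # No key columns selected: fall back to append-only loading.
        return target + incoming
    merged = {tuple(row[k] for k in key_columns): row for row in target}
    for row in incoming:
        merged[tuple(row[k] for k in key_columns)] = row  # update or insert
    return list(merged.values())


# Hypothetical target table and incoming batch.
table = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}]
batch = [{"id": 2, "city": "Milan"}, {"id": 3, "city": "Lima"}]

print(load(table, batch, ["id"]))   # 3 rows, with id 2 updated to Milan
print(len(load(table, batch, [])))  # 4
```

With a key, the row for id 2 is replaced rather than duplicated. Without a key, both versions of id 2 end up in the table, which is why append-only loads can accumulate duplicates.
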

When you finish configuring the selected models, click Next.

Tip: If your repository contains 100,000 or more golden records, activate Accelerated Query in Boomi Data Hub before running extractions. Accelerated Query significantly improves Repository API query performance, which can reduce extraction time for large datasets. Refer to Activating accelerated query for golden records for more information.

Step 4: Scheduling and activating your data flow

  1. Under Schedule Data Flow, enable scheduling, then set the run frequency. All times are in UTC.
  2. Under Set Custom Timeout, enable the option if you want to specify a custom timeout. By default, the timeout is set automatically based on table size and ranges from 12 hours to 7 days.
  3. Under Notifications, configure email alerts for pipeline events:

| Option | Description |
| --- | --- |
| Failure | Sends an email when the data flow fails. |
| Warning | Sends an email when the data flow completes with warnings. |
| Run Threshold | Sends an email when a run exceeds a defined duration. |

  4. Under Data Flow Info, enter a name for the data flow. Optionally assign it to a group and add a description.
  5. Click Activate to save and activate the data flow.
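
Because schedule times are entered in UTC, it can help to convert a local run time before setting the frequency. The sketch below does this with Python's standard library. The fixed UTC-5 offset and the 09:00 run time are made-up example values, and a fixed offset does not account for daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone: a fixed UTC-5 offset (no daylight saving handling).
local_zone = timezone(timedelta(hours=-5))


def to_utc(local_time: datetime) -> datetime:
    """Convert a timezone-aware local time to UTC."""
    return local_time.astimezone(timezone.utc)


local_run = datetime(2024, 1, 15, 9, 0, tzinfo=local_zone)
utc_run = to_utc(local_run)
print(utc_run.strftime("%H:%M"))  # 14:00
```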