Best practices for designing models

Review the following best practices and tips to help you design and create models.

General

  • Save a model as a draft and delay publishing until you are confident about the field types since you cannot change the field types after publishing.

  • Consider designing the ‘reference’ models before publishing the first version of a base model if possible. These are the models that will be referenced from the base model. You can’t reference them from the base model (via a Reference Field) unless you create a draft of the model(s) first.

  • Designate the Record Title Format where applicable. This will help tremendously when visualizing the referenced golden record(s) in Hub Data Stewardship scenarios.

  • Create simple match rules when possible. Complex match rules with multiple OR clauses can negatively affect performance.

Sources

  • Start by loading initial data from the source that will contribute the most trusted information to the golden record. This creates a good baseline for golden record creation. For ongoing synchronization, we recommend sending data from the most trusted source last or using source ranking to indicate the most trusted source.

  • Exclude fields from your model that are subject to frequent change, such as source metadata fields (for example, “Last Modified Date”).

  • Before marking a field as required in a model, compare the field requirement across your different contributing sources. It may not be a required field in all your sources. Golden records are quarantined when required fields are missing values. Be sure it is a rule you want to implement across all contributing sources.

  • If a field’s format requirements vary, implement the strictest formatting rules. For example, if one source has a maximum of 50 characters while another source has 100, require a 50-character maximum for the field.
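
The strictest-rule tip above can be sketched in a few lines of Python. This is only an illustration: the source names `crm` and `erp` and the helper `strictest_max_length` are hypothetical and not part of the product.

```python
# Hypothetical per-source maximum lengths for the same text field.
source_max_lengths = {"crm": 50, "erp": 100}


def strictest_max_length(limits: dict) -> int:
    """Return the smallest maximum length across all contributing sources."""
    return min(limits.values())


model_max_length = strictest_max_length(source_max_lengths)
print(model_max_length)  # 50
```

Values from the more permissive source that exceed this maximum would need to be handled before they are sent to the model.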

Size

Be aware of the size limitations for models. You cannot publish and deploy models that exceed the size limits. Calculate both the model size and the row size to ensure that your models do not exceed either limit. The calculations for both are described below.

Calculating model size

The total size of all the fields in a model, excluding the id field, reference fields, and repeatable (collection) fields, cannot exceed 65,423 bytes. How you calculate the model's total byte size depends on the character encoding that the DataHub repository uses: UTF8MB3 or UTF8MB4. Repositories that support UTF8MB4 include a "UTF8MB4 Supported" label on the repository card on the Repositories page. If your repository is hosted in a Hub cloud that supports UTF8MB4, your golden records support supplemental 4-byte characters and symbols. UTF8MB4 also improves matching performance and accuracy.

note

4-byte UTF8MB4 support will be rolled out to all existing Hub clouds in the near future, as the existing 3-byte encoding is currently deprecated. Read Connecting to the Boomi Runtime Clouds, Hub Clouds, and Event Streams Clouds for more information on Hub clouds.

Calculate total UTF8MB3 model byte size

To calculate the total size:

  1. Count the number of fields for each data type. For example, you might find that four of your fields are integers, four are date/time fields, and two are text fields.

  2. Add the byte size for each field in the model. Refer to the following table for the number of bytes per data type.

The table details the model total byte size for the following clouds that have UTF8MB3 enabled:

  • USA East Hub Cloud

  • GBR Hub Cloud

  • DEU Hub Cloud 01

  • ANZ Hub Cloud

    Data type      Bytes
    Text           Maximum text length x 3 bytes. If the result is less than 255, add 1 byte; otherwise add 2 bytes
    Integer        4 bytes
    Float          8 bytes
    Date/Time      8 bytes
    Date           3 bytes
    Time           3 bytes
    Boolean        1 byte
    Enumeration    767 bytes
    Long Text      12 bytes

For example:

  • 4 integer fields x 4 bytes = 16 bytes

  • 4 date/time fields x 8 bytes = 32 bytes

  • 1 text field x (maximum length 10 characters x 3 bytes) + 1 = 30 + 1 = 31 bytes

  • 1 text field x (maximum length 100 characters x 3 bytes) + 2 = 300 + 2 = 302 bytes

  • Total size of the model = 381 bytes

  • Since 381 is smaller than 65,423, this model is within the limit with respect to the model size calculation, and the row size calculation needs to be evaluated next.
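
The UTF8MB3 calculation above can be checked with a short script. The following is a minimal sketch, not an official tool: the function and constant names are hypothetical, the byte sizes are copied from the UTF8MB3 table, and the 255 threshold is assumed to apply to the length-times-3 product, as the worked example suggests (a 100-character field adds 2 bytes).

```python
# Byte sizes per data type, copied from the UTF8MB3 table in this section.
FIXED_BYTES_UTF8MB3 = {
    "integer": 4,
    "float": 8,
    "date/time": 8,
    "date": 3,
    "time": 3,
    "boolean": 1,
    "enumeration": 767,
    "long text": 12,
}
MODEL_SIZE_LIMIT = 65_423  # bytes
BYTES_PER_CHAR = 3         # UTF8MB3


def text_field_bytes(max_length: int) -> int:
    """Size of one text field: length x 3, plus 1 or 2 extra bytes."""
    data_bytes = max_length * BYTES_PER_CHAR
    # Assumption: the threshold is compared against the byte product.
    return data_bytes + (1 if data_bytes < 255 else 2)


def model_size(field_counts: dict, text_max_lengths: list) -> int:
    """Sum the sizes of the non-text fields and of every text field."""
    total = sum(FIXED_BYTES_UTF8MB3[kind] * count
                for kind, count in field_counts.items())
    total += sum(text_field_bytes(length) for length in text_max_lengths)
    return total


# Same model as the worked example: 4 integers, 4 date/time, 2 text fields.
size = model_size({"integer": 4, "date/time": 4}, [10, 100])
print(size, size <= MODEL_SIZE_LIMIT)  # 381 True
```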

Calculate total UTF8MB4 model byte size

The table details the model total byte size for the following clouds that have UTF8MB4 enabled:

  • USA East Hub Cloud 02

  • Singapore Hub Cloud 01

  • Boomi Japan Hub Cloud 01

  • Canada Hub Cloud 01

  • ANZ Local Hub Cloud 01

    Data type      Bytes
    Text           Maximum text length x 4 bytes. If the result is less than 255, add 1 byte; otherwise add 2 bytes
    Integer        4 bytes
    Float          8 bytes
    Date/Time      8 bytes
    Date           3 bytes
    Time           3 bytes
    Boolean        1 byte
    Enumeration    1022 bytes
    Long Text      12 bytes

For example:

  • 4 integer fields x 4 bytes = 16 bytes

  • 4 date/time fields x 8 bytes = 32 bytes

  • 1 text field x (maximum length 10 characters x 4 bytes) + 1 = 40 + 1 = 41 bytes

  • 1 text field x (maximum length 100 characters x 4 bytes) + 2 = 400 + 2 = 402 bytes

  • Total size of the model = 491 bytes

  • Since 491 is smaller than 65,423, this model is within the limit with respect to the model size calculation, and the row size calculation needs to be evaluated next.
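
To see how the encoding changes the result for the same model, the two tables in this section can be combined into one lookup. This is a hedged sketch with hypothetical names (`ENCODINGS`, `COMMON_BYTES`, `model_size`); it again assumes that the 255 threshold is compared against the length-times-bytes-per-character product, which matches both worked examples.

```python
# Values that differ by encoding, copied from the two tables in this section.
ENCODINGS = {
    "UTF8MB3": {"bytes_per_char": 3, "enumeration": 767},
    "UTF8MB4": {"bytes_per_char": 4, "enumeration": 1022},
}
# Sizes that are identical in both tables.
COMMON_BYTES = {
    "integer": 4, "float": 8, "date/time": 8, "date": 3,
    "time": 3, "boolean": 1, "long text": 12,
}
MODEL_SIZE_LIMIT = 65_423  # bytes


def model_size(encoding, field_counts, text_max_lengths):
    """Total model byte size for the given encoding."""
    enc = ENCODINGS[encoding]
    sizes = {**COMMON_BYTES, "enumeration": enc["enumeration"]}
    total = sum(sizes[kind] * count for kind, count in field_counts.items())
    for max_length in text_max_lengths:
        data_bytes = max_length * enc["bytes_per_char"]
        # Assumption: 1 extra byte below 255 bytes, otherwise 2.
        total += data_bytes + (1 if data_bytes < 255 else 2)
    return total


counts = {"integer": 4, "date/time": 4}
text_lengths = [10, 100]
utf8mb3_size = model_size("UTF8MB3", counts, text_lengths)
utf8mb4_size = model_size("UTF8MB4", counts, text_lengths)
print(utf8mb3_size, utf8mb4_size)  # 381 491
```

Because text and enumeration fields take more bytes under UTF8MB4, a model that is close to the 65,423-byte limit under UTF8MB3 is worth recalculating with the UTF8MB4 values.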

Calculate row size of a model

Model data is stored in a database table. The row size of a model, excluding the id field, reference fields, and repeatable (collection) fields, cannot exceed 8,006 bytes.

To calculate the row size:

  1. Count the number of fields for each data type. For example, you might find that four of your fields are integers, four are date/time fields, and two are text fields.

  2. Multiply the byte size by the number of fields for each data type, then add the results. Refer to the following table for the number of bytes per data type:

    Data type      Bytes       Comment
    Text           41 bytes    Can be slightly less for text fields with a maximum length less than 10
    Integer        4 bytes
    Float          8 bytes
    Date/Time      8 bytes
    Date           3 bytes
    Time           3 bytes
    Boolean        1 byte
    Enumeration    41 bytes
    Long Text      41 bytes

For example:

  • 4 integer fields x 4 bytes = 16 bytes

  • 2 text fields x 41 = 82 bytes

  • 4 date/time fields x 8 = 32 bytes

  • Total byte size = 130

  3. Calculate the following formula and round up to the nearest whole number: (N+2)/8, where N is the number of fields in the model, excluding the id field, reference fields, and repeatable (collection) fields.

For example:

Using the above example: (10 fields + 2)/8 = 12/8 = 1.5, which rounds up to 2.

  4. Add the total byte size and the formula result.

For example: 130 + 2 = 132 bytes. The row size is therefore 132 bytes and is less than the limit of 8,006 bytes.
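
The row size steps above can be sketched as a small Python function. The names `ROW_BYTES`, `ROW_SIZE_LIMIT`, and `row_size` are hypothetical; the byte values come from the row size table, and text fields are counted at the full 41 bytes even though short fields can be slightly smaller.

```python
import math

# Row-size bytes per data type, copied from the row size table above.
ROW_BYTES = {
    "text": 41, "integer": 4, "float": 8, "date/time": 8, "date": 3,
    "time": 3, "boolean": 1, "enumeration": 41, "long text": 41,
}
ROW_SIZE_LIMIT = 8_006  # bytes


def row_size(field_counts: dict) -> int:
    """Row size = sum of per-field bytes + ceil((N + 2) / 8)."""
    total_bytes = sum(ROW_BYTES[kind] * count
                      for kind, count in field_counts.items())
    # N excludes the id field, reference fields, and repeatable fields.
    n_fields = sum(field_counts.values())
    return total_bytes + math.ceil((n_fields + 2) / 8)


# Same model as the worked example: 4 integers, 2 text, 4 date/time fields.
size = row_size({"integer": 4, "text": 2, "date/time": 4})
print(size, size <= ROW_SIZE_LIMIT)  # 132 True
```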
