
Salesforce walkthrough

important

Salesforce API v62 is supported.

Salesforce is a cloud-based CRM platform. Using Data Integration, you can extract data from Salesforce and load it into your target database.

Configuring the Source

Procedure

  1. Navigate to the Data Integration Account.
  2. Click + Create River from the top right-hand corner of the Data Integration page.
  3. Choose Source to Target River as your river type.
  4. In the Search tab, enter Salesforce and select it.
  5. Define a Salesforce connection.
    note

    If you do not yet have a Salesforce connection in your Data Integration account, you can create a new connection by clicking + New Connection.

Pulling data from Salesforce
  • Salesforce data is organized in tables called entities.
  • The entities can be regular Salesforce entities or custom ones.
  • You can pull all the data from a given table, or only part of it, by filtering on an incremental field. For example, to retrieve the accounts created since 1.1.25, filter on the createdDate field and pull only records whose value is later than 1.1.25 (see the query sketch below).
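
For example, a sketch of the query such a filter corresponds to (illustrative only; Data Integration generates the actual filter for you):

SELECT Id, Name, CreatedDate
FROM Account
WHERE CreatedDate > 2025-01-01T00:00:00Z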
Bulk, SOQL, and Metadata

There are three ways to extract data from Salesforce with Data Integration:

  1. Bulk API - The newer, preferred way to extract large sets of data from Salesforce. It is limited to 10,000 batches in a 24-hour sliding window.
  2. SOAP API/SOQL - Extract data using the SOAP API; this method tends to be slower than the Bulk API.
  3. Metadata - A report that lets you extract metadata information on one or more entities. Each row in a Metadata report represents the definition of a field from the selected entities. The 'picklist values' field holds the naming conventions between the API and the UI for the closed list of picklist values. The Metadata report is useful when comparing API naming conventions to those in the UI.
Details

Additional features for Bulk API mode

The Bulk API offers an additional feature, available only with the All extraction method, called PK Chunking.

  • PK Chunking is an automatic primary key chunking that splits bulk queries on large tables into chunks based on the record IDs, or primary keys of the queried records.
  • Supported objects: Account, Campaign, CampaignMember, Case, CaseHistory, Contact, Event, EventRelation, Lead, LoginHistory, Opportunity, Task, User, and custom objects. A custom object is any object whose API name ends with __c.
  • The available range for PK chunking is between 100,000 and 250,000 records per chunk.
  • PK chunking enhances the extraction of large datasets (see the header sketch below).
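
For reference, when calling the Bulk API directly, PK chunking is requested through Salesforce's Sforce-Enable-PKChunking request header; Data Integration sets this for you when PK Chunking is enabled. A minimal sketch with a chunk size in the supported range:

Sforce-Enable-PKChunking: chunkSize=100000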
Bulk API limitations
  • Batches for data loads can consist of a single file not larger than 10 MB.
  • A batch can contain a maximum of 10,000 records.
  • A batch can contain a maximum of 10,000,000 characters for all the data in a batch.
  • A field can contain a maximum of 32,000 characters.
  • A limit of 10,000 batches applies within a 24-hour sliding window.

Configuring a Salesforce River

important

Rivers built using the Salesforce Legacy connector cannot be automatically transitioned to the new Salesforce connector.

Choose your River mode from the following:

  • Multi-Tables: Load multiple tables (entities) simultaneously from Salesforce to your target.
  • Custom Query: Use Custom Query mode to load data from a single Salesforce object using a SOQL query.

Custom Query mode

note

Custom Query mode always uses the SOAP API, as SOQL execution is supported only through SOAP.

Creating a Custom Query

  1. Select Custom Query under Choose your River mode.
  2. Enter a valid SOQL query in the Custom Query text box.
  3. Select the Extraction Method:
    • All: Fetches all results returned by the query.
    • Incremental: Fetches only records added or updated after the previous run (using “last days back” logic).
  4. Configure the Mapping Attribute to map only the fields you want to load.

Writing SOQL Filters

You can use SOQL to:

  • Filter by field values
  • Apply multiple logical conditions (AND / OR)
  • Filter by dates
  • Handle NULL and boolean conditions
  • Query parent or child relationships
  • Sort, limit, or group results
  • Combine conditions with parentheses for advanced filtering
Example: Custom Query with Filters

The following example retrieves Accounts where the Billing Country is United States and the Annual Revenue is greater than 100,000, using logical operators:

SELECT Id, Name, BillingCountry, AnnualRevenue
FROM Account
WHERE BillingCountry = 'United States'
AND AnnualRevenue > 100000
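
A second illustrative query exercising more of the capabilities listed above (a NULL check, a relative date literal, parentheses, sorting, and a limit; the field choices are examples only):

SELECT Id, Name, Email, CreatedDate
FROM Contact
WHERE (Email != null AND CreatedDate >= LAST_N_DAYS:30)
OR AccountId = null
ORDER BY CreatedDate DESC
LIMIT 100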
Extract method

Choose how data will be extracted in Custom Query mode:

  • All: Fetches all data returned by the custom SOQL query. Recommended when performing initial full loads or when incremental logic is not required.
  • Incremental: Extracts only new or updated records since the previous run. Recommended for large datasets to optimize performance and reduce API usage.
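
Conceptually, an incremental run narrows your custom query with a date condition derived from the previous run. A sketch, assuming LastModifiedDate tracks changes and the previous run covered data up to 2025-01-01 (the exact condition Data Integration generates may differ):

SELECT Id, Name, AnnualRevenue
FROM Account
WHERE LastModifiedDate > 2025-01-01T00:00:00Z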
Incremental Extraction Settings

When using Incremental extraction mode, configure the following options to control how data is extracted from Salesforce over time.

Incremental Configuration Fields

  • Incremental Field * – Select the Salesforce field used to track changes (for example, LastModifiedDate, CreatedDate, or numeric sequence fields).
  • Incremental Type * – Choose the type of the selected field (such as Timestamp, Date, or Running Number).
  • Start Date * – Defines when the incremental extraction begins (select a date and an optional time).
  • End Date – Defines when the incremental extraction ends. Leave blank to extract up to the most recent values.
  • Include End Value – Enable to include records matching the end value. Disable to include them in the next run.
  • Interval chunks size (optional) – Splits the extraction into multiple intervals when large amounts of data are returned, improving stability and performance.
  • Split your chunk by – Choose the interval type: Don’t split, Weekly, Monthly, or Yearly.
  • Interval Size – Enter the numeric value for the selected split type (e.g., 7 days or 1 month).
  • Update incremental date range also on failures – When enabled, advances the start date even if the previous run failed. (Not recommended)
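
As an illustration of interval chunking, a Monthly split with an Interval Size of 1 over January through March 2025 would run the extraction as consecutive range conditions on the incremental field (placeholder field name; the actual conditions are generated for you):

CreatedDate >= 2025-01-01T00:00:00Z AND CreatedDate < 2025-02-01T00:00:00Z
CreatedDate >= 2025-02-01T00:00:00Z AND CreatedDate < 2025-03-01T00:00:00Z
CreatedDate >= 2025-03-01T00:00:00Z AND CreatedDate < 2025-04-01T00:00:00Z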

Mapping attribute

Select the fields to pull from the selected entity.

  1. Click Auto Mapping to automatically load available fields.
  2. Review the field list and keep only the fields you need.
  3. To add a field, click + Add Field.
  4. To remove a field, select it and click the trash icon.
  5. (Optional) Enable Keep double underscores in auto mapping if you want to preserve __c naming.

Only the fields shown in the mapping table are extracted.
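
For instance, if the mapping keeps only Id, Name, and a hypothetical custom field Region__c, the extraction is effectively limited to:

SELECT Id, Name, Region__c
FROM Account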

Multi-table mode

Load multiple tables simultaneously from Salesforce to your target. You can choose Bulk API or SOAP API as the Extraction API.

Extraction API

After selecting the extraction API, the metadata in the Schema tab is pulled according to your selection.

note

When switching between these options, the metadata of tables and columns in the Schema tab will be updated accordingly.

Auto-detect New Fields In Each Run

By default, Data Integration automatically updates table metadata before each run and adds any newly detected fields. If you disable this option, the river will run using the saved metadata without refreshing it. To update metadata manually, click Reload Metadata in the Schema tab. When new fields are detected, existing target field names and data types remain unchanged.

Include Deleted Rows (default value for all selected tables)

Enable this option to include Salesforce records that are marked as deleted in the extraction.
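
Under the hood, Salesforce flags soft-deleted records with the system field IsDeleted and returns them only through queryAll-style requests rather than regular queries. A sketch of the extra rows this option adds (illustrative query):

SELECT Id, Name, IsDeleted
FROM Account
WHERE IsDeleted = TRUE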

Configuring the Target

Define how the extracted data should be loaded into your preferred target system.

The target configuration varies depending on the destination you choose. Data Integration provides a dedicated guide covering all target setup options. For more information, refer to Targets overview.

Configuring the Schema

Follow the steps below to configure the Schema for your Salesforce River:

  1. Navigate to the Schema tab to view all Salesforce objects available for extraction.
  2. Each object appears in the grid with the following details:
    • Source Table
    • Target Table
    • Status (Tracked / Not Tracked)
    • Loading Mode
    • Extract Method
    • Time Period
  3. Use the Search tables field to quickly locate any Salesforce entity by name.
  4. Select the tables to load.
  5. Click the Time Period menu (⋯ icon) and choose one of the following:
    • Show all – Display all available Salesforce tables.
    • Show selected tables – Display only the tables you have selected.
    • Edit time period for all tables – Apply a single extraction time range across all selected tables.
  6. Click Reload Metadata to refresh the list of objects and fields.
  7. Click Edit on the right side of the selected row. You will now see two tabs:
    • Columns
    • Table Settings

Columns settings

All Salesforce fields for the selected object are listed here.

  1. Check or uncheck the columns you want to load. All selected columns are included in the target table.
  2. Edit the Target Column field if you want a different column name in the target.
  3. Click + Add Calculated Column to define a column using an expression.
  4. Click Reload Table Metadata if new fields were added in Salesforce or permissions changed.

Table settings

On the Table Settings tab, you can perform the following:

  • Change the loading mode.
  • Change the extraction method. If you select Incremental, you can choose which field is used as the increment.
  • Filter by an expression used as a WHERE clause to fetch the selected data from the table.
  • Set PK Chunking for entities that support primary key chunking.
  • Enable the option to include deleted rows in the extracted data.
Filter

Apply any filter to act as a WHERE clause while pulling the data.

note
  • It is recommended to pull all the data from Salesforce without filters, and then filter it using Logic in Data Integration. If you decide to use filters in Salesforce, ensure that you maintain the filter syntax supported by Salesforce.
  • Combine multiple filters using the operators AND and OR.
  • String values must be quoted; numbers, dates, and boolean values must not be quoted. For example, billingcountry='United States' OR billingpostalcode='48226' AND isdeleted=FALSE (a further example with a date literal follows below).
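
For example, a filter combining a quoted string with an unquoted date literal and an unquoted boolean:

billingcountry='United States' AND createddate > 2025-01-01T00:00:00Z AND isdeleted=FALSE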
Target settings

Use the Target Settings section to define how the data from Salesforce will be written into your target.

  • Target Table Name - Specify or edit the name of the table that will be created or updated in the target.
  • Table Type - Choose the table type. The default is FACT, but you may change it if your target requires a different structure.
  • Loading Mode - Select how data should be loaded into the target:
    • Upsert – Merge (Default): Updates matching rows and inserts new ones.
    • Append Only: Adds all records as new rows.
    • Overwrite: Replaces the entire table.
  • Filter Logical Key Duplication Between Files (Optional) - Enable this only when duplicates may appear in the source but should not exist in the target. This option filters duplicate rows during ingestion.

To pull the data from Salesforce:

  1. Select the Start Date: Data Integration pulls only data whose selected incremental field is later than this start date.
  2. Select the End Date: Data Integration pulls only data whose selected incremental field is earlier than this end date. Leave the End Date field empty to retrieve data up to the time the river runs.
  3. After the river runs, the start date is updated with the value of the end date, and the end date is cleared. The next run extracts data later than the current end date.
  4. Include End Value: Enable it to include records matching the end value in the results. If you turn off this checkbox, those records will be pulled in the next run.
note

The Start Date does not advance if a River run is unsuccessful. If you want to remove this default setting, click More options and select the checkbox to advance the start date even if the River run is unsuccessful (Not recommended).
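
Conceptually, each run translates the current window into a range condition on the incremental field; a sketch with {start} and {end} as placeholders for the stored dates (the exact generated syntax may differ):

LastModifiedDate > {start} AND LastModifiedDate <= {end}   (Include End Value enabled)
LastModifiedDate > {start} AND LastModifiedDate < {end}    (Include End Value disabled)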

Settings

  1. Go to the Settings tab.
  2. Configure scheduling:
    • Click Schedule Me! to define frequency (e.g., daily, hourly).
  3. Set timeouts:
    • Default execution time is 12 hours (Rivery can extend to 48 hours for large tables).
  4. Configure notifications:
    • Enable alerts On Failure, On Warnings, or On Run Time Threshold.

Save and Run

  1. After all configurations are complete, click Save and Run the River.

Monitor the execution logs to confirm successful extraction and loading. For more information, refer to the Activity Logs.
