Skip to main content
Feedback

Google Sheets walkthrough

Prerequisites

A Google Sheets connection.

Pulling data from Google Sheets using Data Integration

Procedure

  1. Navigate to the Data Integration Account.

  2. Select Create New River.

  3. Choose Data Source to Target as your river type.

  4. In the General Info tab, name your river, describe it, and choose a group.

  5. Select the Source tab, find Google Sheets in the list of data sources, and select it. (under Organization)

  6. Under Source Connection, select the connection you created, or create a new one.

  7. Enter the GUID of the single file you want to retrieve data from (compatible with shared files).

note

The ID is the value that appears between /d/ and /edit in your spreadsheet URL. For example, in the following Google Sheets URL:docs.google.com/spreadsheets/d/ 1qpyC0XzvTcKT6EISywvqESX3A0MwQoFDE8p-Bll4hps/edit#gid=0

The ID of this spreadsheet is 1qpyC0XzvTcKT6EISywvqESX3A0MwQoFDE8p-Bll4hps.

There are two ways to extract data:

  • All - If you choose All enter the following parameters:

    • Type in a null representative word (for example, null).
    • Skip to step 1.
  • Incremental load by file modified timestamp - If you choose Incremental, enter the following parameters:

    • Choose a Date Range and Time period.
note
  • Start Date is mandatory.
  • Data can be retrieved for the date range specified between the start and end dates.
  • If you leave the end date blank, the data will be pulled at the current time of the river's run.
  • Date timezone: UTC.
  • The Start Date won't be advanced if a River run is unsuccessful.

If you want to turn off this default setting, click More Options and select the checkbox to advance the start date even if the River run is unsuccessful (Not recommended).

  • Select merged headers if they exist.

    • If multiple columns have a joint header (merged cells), then the leftmost cell is passed on to the target even if it is empty.
    • Example starting from this Google spreadsheet:
    • The result merged table in the target is:
    • Specify whether the sheet has a header. If a header is present, specify its position.
    • Set the sheet's columns by name or position number (for example, 1,2,3).
      • For multiple sheets, separate them with commas. Leave empty to include all.
      • According to name:
      • According to position:
  • Define file manipulations(Optional).

Source Auto Mapping

Data Integration automatically sets your mapping based on the data returned from the sheets you selected.
You may also modify the generated fields according to:

  • Choose the fields you want to fetch in the Mapping table, and add fields on your own, or remove unwanted fields.
  • Select the fill empty checkbox if you want empty cells to get values from the above closest cell.
  • Define each field as a source or static. The source is a field derived from your Google Sheet, and static is a field for adding an expression.
    • Static puts a repeated value in each cell in the respective field alias.

Using regular expressions

You can apply regular expressions (regex) in the Expression column of the field mapping. You can extract, transform, or partially match portions of the source value during processing.

The source evaluates the regex against the field value using Python’s re.match engine.

  • If the expression matches, the matched portion of the value is returned.
  • If the expression does not match, the result is returned as NULL.

For more information about supported regex syntax, refer to Python’s documentation.

warning
  • Defining an existing source field erases its previous source. If you erased a source field by setting it as static and want to restore it, add it manually or click Clear All and then restart auto-mapping.
  • The expressions may contain strings or functions that Google Sheets supports.
On this Page