Amazon Redshift walkthrough

You can integrate data from a Redshift database into a cloud target using Data Integration.

Connecting to Redshift in the Data Integration console

To connect to your Redshift Database, refer to the Redshift connection.

Pulling Redshift data into a Target

Using Data Integration, you can pull data from your Redshift tables and send that data into your target database.

Procedure

Select Create New River from the top right of the Data Integration page.
Choose Data Source to Target as your river type.
In the General Info tab, name your river and give it a description.
Go to the Source tab, find Redshift in the list of data sources, and select it.
Define a Source Connection (this is the connection created earlier in the process). If you do not have a Redshift connection in your Data Integration account, you can create a new connection by clicking Create New Connection.
Choose your River mode.
- Multi-Tables: Load multiple tables simultaneously from Redshift to your target.
- Custom Query: Create a custom query and load it into your target.

Database migration

For a detailed walkthrough of Multi-Tables mode, refer to the Database River modes topic.

Pulling data using a custom query

You can also use the Custom Query to define a query data pull from Data Integration. You can use any query that works in the source, using a specific SELECT query without any other statements. Data Integration is not compatible with multi-statements or SQL Script in the custom query field.

Configuring Extract method

Using Data Integration, you can pull your data incrementally or pull the complete data that exists in the table:

All: Fetch all data in the table using Chunks.
Incremental: Option to run over a column in a table or your custom query to SELECT. You can filter by Start and End dates to run over it. You can choose to run over the date using Daily, Monthly, Weekly, or Yearly chunks.

Define the incremental Field to use in the Incremental Field section. After choosing the incremental field, select the Incremental Type and the dates/values you would like to fetch.

note

Data Integration manages the increments over the runs using the Maximum value in the data. You can always retrieve the entire dataset since the last run, which prevents data gaps. You need to configure your river once.

Recommended: Define your incremental field in Data Integration over a field with an Index or Partitions key in the table.

Limit and auto mapping

After defining the extract method, you can choose a limit of top N rows to fetch. Data Integration sets your Schema using the Auto Mapping feature. You can also select fields you want to fetch in the Mapping table and add fields on your own.

Multi-Table mode

Load multiple tables simultaneously from SQL Server to your target. There are two Default Extraction Modes: Standard Extraction and Log-Based.

On the Table Settings tab, you can:

Change the loading mode
Change the extraction method. If you select Incremental, define which field will be used to define the increment.
Filter by expression that is used as a WHERE clause to fetch the selected data from the table.

Scheduling the river

After creating a river, navigate to the Schedule tab and click Schedule Me.
Choose the frequency at which to schedule the river.
To notify certain users about a river failure or warning, enable notifications.
You can edit your {Mail_Alert_Group} in the Variables page in the left-hand pane of the browser.

Monitoring the river

During the river run, or after the run has completed, you can monitor the river in its Activities tab.

In this tab, you can monitor the status of the current river run. For the Multi-Table mode, you can monitor at the table-level.

By toggling between Run View and Target View, you can view the river results grouped either by time of run or by target location. Refer to the Targets section to find out how to load the data into your target data warehouse.