Amazon Redshift walkthrough
You can integrate data from a Redshift database into a cloud target using Data Integration.
Connecting to Redshift in the Data Integration console
To connect to your Redshift Database, refer to the Redshift connection.
Pulling Redshift data into a Target
Using Data Integration, you can pull data from your Redshift tables and send that data into your target database.
Procedure
-
Select Create New River from the top right of the Data Integration page.
-
Choose Data Source to Target as your river type.
-
In the General Info tab, name your river and give it a description.
-
Go to the Source tab, find Redshift in the list of data sources, and select it.
-
Define a Source Connection (this is the connection created earlier in the process). If you do not have a Redshift connection in your Data Integration account, you can create a new connection by clicking Create New Connection.
-
Choose your River mode.
- Multi-Tables: Load multiple tables simultaneously from Redshift to your target.
- Custom Query: Create a custom query and load it into your target.
- Legacy River: Choose a single source table to load into a single target.
Database migration
For a detailed walkthrough of Multi-Tables mode, refer to the Database Migration topic.
Pulling data using a custom query
You can also use the Custom Query to define a query data pull from Data Integration. You can use any query that works in the source, using a specific SELECT query without any other statements. Data Integration is not compatible with multi-statements or SQL Script in the custom query field.
Configuring Extract method
Using Data Integration, you can pull your data incrementally or pull the complete data that exists in the table:
-
All: Fetch all data in the table using Chunks.
-
Incremental: Option to run over a column in a table or your custom query to SELECT. You can filter by Start and End dates to run over it. You can choose to run over the date using Daily, Monthly, Weekly, or Yearly chunks.
Define the incremental Field to use in the Incremental Field section. After choosing the incremental field, select the Incremental Type and the dates/values you would like to fetch.
Data Integration manages the increments over the runs using the Maximum value in the data. You can always retrieve the entire dataset since the last run, which prevents data gaps. You need to configure your river once.
Recommended: Define your incremental field in Data Integration over a field with an Index or Partitions key in the table.
Limit and auto mapping
After defining the extract method, you can choose a limit of top N rows to fetch. Data Integration sets your Schema using the Auto Mapping feature. You can also select fields you want to fetch in the "Mapping" table and add fields on your own.
Legacy river mode
This river mode lets you load a single source table into a single target table.
There are steps to define the source table to pull. Data Integration auto-detects available schemas and tables, the extraction method to use, and options for filters or row limits on the data pull.
Multi-Table mode
Load multiple tables simultaneously from SQL Server to your target. There are two Default Extraction Modes: Standard Extraction and Log-Based.
On the Table Settings tab, you can:
- Change the loading mode
- Change the extraction method. If you select Incremental, define which field will be used to define the increment.
- Filter by expression that is used as a WHERE clause to fetch the selected data from the table.
Scheduling the river
-
After creating a river, navigate to the Schedule tab and click Schedule Me.
-
Choose the frequency at which to schedule the river.
-
To notify certain users about a river failure or warning, enable notifications.
-
You can edit your
{Mail_Alert_Group}in the Variables page in the left-hand pane of the browser.
Monitoring the river
During the river run, or after the run has completed, you can monitor the river in its Activities tab.
In this tab, you can monitor the status of the current river run. For the Multi-Table mode, you can monitor at the table-level.
By toggling between Run View and Target View, you can view the river results grouped either by time of run or by target location. Refer to the Targets section to find out how to load the data into your target data warehouse.