
Amazon Redshift as a target connection

By integrating Amazon Redshift with Data Integration, you can easily extract data from various sources, transform it as needed, and load it into Amazon Redshift for further analysis or operational use.

Prerequisites

AWS Superuser account

Sign in to AWS as a superuser.

AWS permissions

Creating a Redshift cluster

Procedure

  1. If you do not have a Redshift cluster, you can create one.
  2. Once your cluster is created, you need to authorize access to the cluster.

Creating your user and schema within the Redshift database

You must define the database that Data Integration uploads data to on the AWS cluster creation page.

Procedure

  1. Connect to your Redshift cluster using the credentials set during cluster creation, using your preferred SQL management tool (for example, Aginity, SQL Workbench/J, or DBeaver).

Ensure the Redshift cluster is externally accessible. Obtain the host address from the Redshift cluster details.

  2. Create the user and schema by running the following SQL commands (replace `<Data_Integration_User>` and `<schema_name>` with your desired names):

     CREATE USER <Data_Integration_User> WITH PASSWORD '<myPwd>' CREATEUSER;
     CREATE SCHEMA IF NOT EXISTS <schema_name> AUTHORIZATION <Data_Integration_User>;
  3. Grant the necessary permissions to the user by running the following commands:

     GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <Data_Integration_User>;
     GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA <schema_name> TO <Data_Integration_User>;
     GRANT USAGE ON SCHEMA <schema_name> TO <Data_Integration_User>;

     Replace `<Data_Integration_User>` with the user you created and `<schema_name>` with the schema you are granting Data Integration permission to use. Repeat these grants for every schema you want Data Integration to access.
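If Data Integration needs several schemas, the three GRANT statements above must be repeated per schema. A minimal Python sketch that generates the statements (the user and schema names below are illustrative, not from this guide):

```python
# Templates for the three grants shown above; {schema} and {user} are placeholders.
GRANTS = (
    "GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
    "GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA {schema} TO {user};",
    "GRANT USAGE ON SCHEMA {schema} TO {user};",
)

def grant_statements(user, schemas):
    """Return one statement per template for each schema, in order."""
    return [g.format(schema=s, user=user) for s in schemas for g in GRANTS]

# Example with hypothetical names:
for stmt in grant_statements("di_user", ["analytics", "staging"]):
    print(stmt)
```

Run the generated statements in your SQL tool against the Redshift cluster.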
note

Data Integration conducts a review of the Access Control Lists (ACLs) to ensure that these permissions are replicated across all tables. The current Grant permissions are available within the raw ACLs, which are structured as shown below:

    data_integration=arwdRxtDPA/data_integration,group test_group=arwdxtD/data_integration,user1_name=wx/data_integration,user1_name=x/data_integration

In this context, the string "arwdRxtDPA" denotes the permissions granted to the Data Integration user. Each character within this string corresponds to a specific action, with 'a' representing INSERT, 'r' representing SELECT, 'w' representing UPDATE, and 'd' representing DELETE. Data Integration interprets this string, generates grant statements, and subsequently executes them individually on the respective tables.
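The letter-to-privilege mapping described above can be sketched as a small parser. Only the four letters named in the text are decoded here; the input is the example entry shown earlier:

```python
# Privilege letters documented above; only the four named in the text are mapped.
ACL_PRIVS = {"a": "INSERT", "r": "SELECT", "w": "UPDATE", "d": "DELETE"}

def parse_acl_entry(entry):
    """Split 'grantee=privs/grantor' and decode the known privilege letters."""
    grantee, _, rest = entry.partition("=")
    privs, _, grantor = rest.partition("/")
    decoded = [ACL_PRIVS[c] for c in privs if c in ACL_PRIVS]
    return grantee, decoded, grantor

print(parse_acl_entry("data_integration=arwdRxtDPA/data_integration"))
# -> ('data_integration', ['INSERT', 'SELECT', 'UPDATE', 'DELETE'], 'data_integration')
```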

Whitelist Data Integration IPs

Procedure

  1. Log in to your AWS account and search for the "Amazon Redshift" service in the AWS console.
  2. Access the Clusters section in the left-hand menu.
  3. Select the cluster you created.
  4. Navigate to the Properties tab and find "VPC security group" under "Network and security settings." Click on the Security Group ID.
  5. In the Inbound rules tab, click Edit inbound rules.
  6. Add a rule of type "Redshift" and specify the required source IP addresses as outlined in the documentation.
  7. Save the rules to whitelist the specified IPs.

Amazon Redshift serverless

Data Integration supports Amazon Redshift Serverless, enabling you to seamlessly load and transform data into a fully managed data warehouse. This integration enables robust analytics capabilities and optimized workflows without requiring infrastructure management.

Key features of Amazon Redshift serverless integration

1. Serverless architecture

  • Eliminates the need for managing clusters by leveraging Amazon Redshift Serverless.
  • Automatically scales based on workload while maintaining cost control through maximum RPU (Redshift Processing Unit) limits.

2. Encryption

  • Ensures data security with always-on encryption powered by AWS Key Management Service (KMS).

3. Data sharing

  • Leverages Redshift's data-sharing capabilities to integrate workflows across both serverless and provisioned environments.

4. Data API support

  • Simplifies connection management using the Amazon Redshift Data API, which is ideal for serverless setups or when you prefer IAM-based authentication over traditional credentials.

5. Flexible connectivity

  • Supports configurable port ranges (for example, 5431–5455 and 8191–8215) for greater connection flexibility.

Configuring Amazon Redshift serverless as a target in Data Integration

Step 1: preparing your Amazon Redshift Serverless workgroup

Setting up workgroup and namespace

  1. Log in to the AWS Management console.
  2. Navigate to Amazon Redshift Serverless.
  3. Create or verify your Workgroup and Namespace configurations.
  4. Note the Workgroup Endpoint and Port (default: 5439, or within the ranges 5431–5455 or 8191–8215).
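A quick way to sanity-check a chosen port against the ranges above (a minimal sketch; the ranges are taken from this page):

```python
# Supported port ranges for a Redshift Serverless workgroup, per the step above.
ALLOWED_RANGES = [(5431, 5455), (8191, 8215)]

def is_valid_workgroup_port(port):
    """True if the port falls within one of the supported ranges."""
    return any(lo <= port <= hi for lo, hi in ALLOWED_RANGES)

print(is_valid_workgroup_port(5439))  # True (the default port)
print(is_valid_workgroup_port(5000))  # False
```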

Configuring security settings

  1. Ensure your VPC security groups or public access settings permit connections from the Data Integration IP range or your network.
  2. Grant access to the necessary databases and schemas within your namespace.
  3. Optionally, enable the Amazon Redshift Data API for simplified connection management. The Data API is especially useful for serverless configurations and for IAM-based authentication.
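As a sketch of what a Data API call looks like for a serverless workgroup, the request below targets a workgroup by name rather than a host, port, and password. The workgroup and database names are assumptions; with boto3 installed, the request would be passed as `boto3.client("redshift-data").execute_statement(**request)`:

```python
# Hypothetical Data API request for a serverless workgroup; no cluster
# endpoint or database password is needed when IAM authentication is used.
request = {
    "WorkgroupName": "my-workgroup",   # assumption: your workgroup name
    "Database": "dev",                 # assumption: your target database
    "Sql": "SELECT current_user;",
}
print(sorted(request))
```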

Step 2: preparing IAM roles or database credentials

Option 1: IAM authentication

  • Attach an IAM role to your Redshift serverless workgroup with policies such as AmazonRedshiftDataFullAccess.
  • Provide the AWS "Access Key" and "Secret Key" for authentication in Data Integration.

Option 2: database user credentials

  • Create a database user with write permissions to the target schema.

Step 3: setting up Redshift serverless as a target in Data Integration

  1. Log in to Data Integration and click on the Connections tab.
  2. Create a new Target connection by selecting Amazon Redshift as the target type.
  3. Fill in the following details:
    • Connection Name: Provide a name for the connection.
    • Workgroup Endpoint (Host): Enter the endpoint of your Redshift Serverless workgroup.
    • Port: Specify the port configured for your workgroup.
    • Database: Enter the database name within the namespace.
    • Authentication: Enter the Username and Password of the database user.
  4. Click Test Connection to ensure Data Integration can successfully connect to your Amazon Redshift Serverless instance.

Step 4: configuring your river

  1. Select the Target: Choose the newly created Redshift Serverless connection as the Target for your River.
  2. Define Target Schema and Table:
    • Specify the "Schema" and "Table" in the database where the system writes data.
    • Data Integration automatically creates tables based on your River configuration if they do not already exist. For more information, refer to the Amazon Redshift Serverless documentation.

Data Integration Redshift connection

Navigate to the Data Integration console, select the Connections tab from the left-hand menu, and find 'Amazon Redshift'.

You can connect to Redshift using one of two methods:

  • Login Credentials
  • SSH Tunnel

Login credentials

  1. Whitelist the Data Integration IPs.
  2. Choose your Connection Name.
  3. Enter Host.
  4. Fill in the Port Number.
  5. Enter your Username and Password.
  6. Click Test Connection to verify the connection. If it succeeds, you can use this connection in Data Integration.
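Before clicking Test Connection, you can pre-check that the host and port are reachable from your network. A minimal sketch using Python's standard library (the endpoint in the comment is a placeholder, not a real cluster):

```python
import socket

def can_reach(host, port, timeout=5.0):
    """Rough pre-check that the Redshift host accepts TCP connections
    on the given port; a failure usually means the IP is not whitelisted
    or the cluster is not externally accessible."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example with a placeholder endpoint:
# can_reach("my-cluster.abc123.us-east-1.redshift.amazonaws.com", 5439)
```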

SSH tunnel (optional)

To connect via an SSH tunnel, refer to the Creating SSH tunnel topic.

note

When using an SSH tunnel, leave the SSH port empty or set it to the default port of 22.

Custom file zone (optional)

Data Integration lets you create a custom File Zone to manage data in your own S3 service; your data is stored in your S3 bucket.

  1. Toggle the Custom File Zone to true.
  2. Click FileZone connection and select the configured Amazon S3 connection.
  3. Choose a Default Bucket from the drop-down list.
  4. Click Test Connection to verify the connection.
  5. If the connection is successful, click Save.