
Amazon Redshift as a target connection

By integrating Amazon Redshift with Data Integration, you can easily extract data from various sources, transform it as needed, and load it into Amazon Redshift for further analysis or operational use.

Prerequisites

AWS Superuser account

Sign in to AWS as a superuser.

AWS permissions

Creating a Redshift cluster

Procedure

  1. If you do not have a Redshift cluster, you can create one.
  2. Once your cluster is created, you need to authorize access to the cluster.

Creating your user and schema within the Redshift database

You must define the database that Data Integration uploads data to on the AWS cluster creation page.

Procedure

  1. Connect to your Redshift cluster using the credentials set during cluster creation, using your preferred SQL management tool (for example, Aginity, SQL Workbench/J, or DBeaver).

Ensure the Redshift cluster is externally accessible. Obtain the host address from the Redshift cluster details.

  2. Create the user and schema by running the following SQL commands (replace `<Data_Integration_User>` and `<schema_name>` with your desired names):

     CREATE USER <Data_Integration_User> WITH PASSWORD '<myPwd>' CREATEUSER;
     CREATE SCHEMA IF NOT EXISTS <schema_name> AUTHORIZATION <Data_Integration_User>;
  3. Grant the necessary permissions to the user by running the following commands:

     GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <Data_Integration_User>;
     GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA <schema_name> TO <Data_Integration_User>;
     GRANT USAGE ON SCHEMA <schema_name> TO <Data_Integration_User>;

     Replace `<Data_Integration_User>` with the user you created and `<schema_name>` with the schema you are granting Data Integration permission to use. Repeat these grants for every schema you want Data Integration to access.
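If Data Integration needs several schemas, the three GRANT statements above must be repeated per schema. A minimal Python sketch that generates the statements (the user and schema names below are illustrative, not from this guide):

```python
# Templates for the three grants shown above; {schema} and {user} are placeholders.
GRANTS = (
    "GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
    "GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA {schema} TO {user};",
    "GRANT USAGE ON SCHEMA {schema} TO {user};",
)

def grant_statements(user, schemas):
    """Return one statement per template for each schema, in order."""
    return [g.format(schema=s, user=user) for s in schemas for g in GRANTS]

# Example with hypothetical names:
for stmt in grant_statements("di_user", ["analytics", "staging"]):
    print(stmt)
```

Run the generated statements in your SQL tool against the Redshift cluster.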
note

Data Integration conducts a review of the Access Control Lists (ACLs) to ensure that these permissions are replicated across all tables. The current Grant permissions are available within the raw ACLs, which are structured as shown below:

    data_integration=arwdRxtDPA/data_integration,group test_group=arwdxtD/data_integration,user1_name=wx/data_integration,user1_name=x/data_integration

In this context, the string "arwdRxtDPA" denotes the permissions granted to the Data Integration user. Each character within this string corresponds to a specific action, with 'a' representing INSERT, 'r' representing SELECT, 'w' representing UPDATE, and 'd' representing DELETE. Data Integration interprets this string, generates grant statements, and subsequently executes them individually on the respective tables.
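The letter-to-privilege mapping described above can be sketched as a small parser. Only the four letters named in the text are decoded here; the input is the example entry shown earlier:

```python
# Privilege letters documented above; only the four named in the text are mapped.
ACL_PRIVS = {"a": "INSERT", "r": "SELECT", "w": "UPDATE", "d": "DELETE"}

def parse_acl_entry(entry):
    """Split 'grantee=privs/grantor' and decode the known privilege letters."""
    grantee, _, rest = entry.partition("=")
    privs, _, grantor = rest.partition("/")
    decoded = [ACL_PRIVS[c] for c in privs if c in ACL_PRIVS]
    return grantee, decoded, grantor

print(parse_acl_entry("data_integration=arwdRxtDPA/data_integration"))
# -> ('data_integration', ['INSERT', 'SELECT', 'UPDATE', 'DELETE'], 'data_integration')
```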

Whitelist Data Integration IPs

Procedure

  1. Log in to your AWS account and search for the "Amazon Redshift" service in the AWS console.
  2. Access the Clusters section in the left-hand menu.
  3. Select the cluster you created.
  4. Navigate to the Properties tab and find "VPC security group" under "Network and security settings." Click on the Security Group ID.
  5. In the Inbound rules tab, click Edit inbound rules.
  6. Add a rule of type "Redshift" and specify the required source IP addresses as outlined in the documentation.
  7. Save the rules to whitelist the specified IPs.

Amazon Redshift serverless

Data Integration supports Amazon Redshift Serverless, enabling you to seamlessly load and transform data into a fully managed data warehouse. This integration enables robust analytics capabilities and optimized workflows without requiring infrastructure management.

Key features of Amazon Redshift serverless integration

1. Serverless architecture

  • Eliminates the need for managing clusters by leveraging Amazon Redshift Serverless.
  • Automatically scales based on workload while maintaining cost control through maximum RPU (Redshift Processing Unit) limits.

2. Encryption

  • Ensures data security with always-on encryption powered by AWS Key Management Service (KMS).

3. Data sharing

  • Leverages Redshift's data-sharing capabilities to integrate workflows across both serverless and provisioned environments.

4. Data API support

  • Simplifies connection management using the Amazon Redshift Data API, which is ideal for serverless setups or when you prefer IAM-based authentication over traditional credentials.

5. Flexible connectivity

  • Supports configurable port ranges (for example, 5431–5455 and 8191–8215) for greater connection flexibility.

Configuring Amazon Redshift serverless as a target in Data Integration

Step 1: preparing your Amazon Redshift Serverless workgroup

Setting up workgroup and namespace

  1. Log in to the AWS Management console.
  2. Navigate to Amazon Redshift Serverless.
  3. Create or verify your Workgroup and Namespace configurations.
  4. Note the Workgroup Endpoint and Port (default: 5439, or within the ranges 5431–5455 or 8191–8215).
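A quick way to sanity-check a chosen port against the ranges above (a minimal sketch; the ranges are taken from this page):

```python
# Supported port ranges for a Redshift Serverless workgroup, per the step above.
ALLOWED_RANGES = [(5431, 5455), (8191, 8215)]

def is_valid_workgroup_port(port):
    """True if the port falls within one of the supported ranges."""
    return any(lo <= port <= hi for lo, hi in ALLOWED_RANGES)

print(is_valid_workgroup_port(5439))  # True (the default port)
print(is_valid_workgroup_port(5000))  # False
```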

Configuring security settings

  1. Ensure your VPC security groups or public access settings permit connections from the Data Integration IP range or your network.
  2. Grant access to the necessary databases and schemas within your namespace.
  3. Optionally, enable the Amazon Redshift Data API for simplified connection management. The Data API is especially useful for serverless configurations and for IAM-based authentication.
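As a sketch of what a Data API call looks like for a serverless workgroup, the request below targets a workgroup by name rather than a host, port, and password. The workgroup and database names are assumptions; with boto3 installed, the request would be passed as `boto3.client("redshift-data").execute_statement(**request)`:

```python
# Hypothetical Data API request for a serverless workgroup; no cluster
# endpoint or database password is needed when IAM authentication is used.
request = {
    "WorkgroupName": "my-workgroup",   # assumption: your workgroup name
    "Database": "dev",                 # assumption: your target database
    "Sql": "SELECT current_user;",
}
print(sorted(request))
```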

Step 2: preparing IAM roles or database credentials

Option 1: IAM authentication

  • Attach an IAM role to your Redshift serverless workgroup with policies such as AmazonRedshiftDataFullAccess.
  • Provide the AWS "Access Key" and "Secret Key" for authentication in Data Integration.

Option 2: database user credentials

  • Create a database user with write permissions to the target schema.

Step 3: setting up Redshift serverless as a target in Data Integration

  1. Log in to Data Integration and click on the Connections tab.
  2. Create a new Target connection by selecting Amazon Redshift as the target type.
  3. Fill in the following details:
    • Connection Name: Provide a name for the connection.
    • Workgroup Endpoint (Host): Enter the endpoint of your Redshift Serverless workgroup.
    • Port: Specify the port configured for your workgroup.
    • Database: Enter the database name within the namespace.
    • Authentication: Enter the Username and Password of the database user.
  4. Click Test Connection to ensure Data Integration can successfully connect to your Amazon Redshift Serverless instance.

Step 4: configuring your river

  1. Select the Target: Choose the newly created Redshift Serverless connection as the Target for your River.
  2. Define Target Schema and Table:
    • Specify the "Schema" and "Table" in the database where the system writes data.
    • Data Integration automatically creates tables based on your River configuration if they do not already exist. For more information, refer to the Amazon Redshift Serverless documentation.

Data Integration Redshift connection

Navigate to the Data Integration console, select the Connections tab from the left-hand menu, and find 'Amazon Redshift'.

You can connect to Redshift using one of two methods:

  • Login Credentials
  • SSH Tunnel

Login credentials

  1. Whitelist the Data Integration IPs.
  2. Choose your Connection Name.
  3. Enter Host.
  4. Fill in the Port Number.
  5. Enter your Username and Password.
  6. Click Test Connection to verify the connection. If it succeeds, you can use this connection in Data Integration.
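Before clicking Test Connection, you can pre-check that the host and port are reachable from your network. A minimal sketch using Python's standard library (the endpoint in the comment is a placeholder, not a real cluster):

```python
import socket

def can_reach(host, port, timeout=5.0):
    """Rough pre-check that the Redshift host accepts TCP connections
    on the given port; a failure usually means the IP is not whitelisted
    or the cluster is not externally accessible."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example with a placeholder endpoint:
# can_reach("my-cluster.abc123.us-east-1.redshift.amazonaws.com", 5439)
```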

SSH tunnel (optional)

To connect via an SSH tunnel, refer to the Creating SSH tunnel topic.

note

When using an SSH tunnel, leave the SSH port empty or set it to the default port of 22.

Custom file zone (optional)

Data Integration lets you create a custom File Zone to manage data in your own S3 service; your data is stored in your S3 bucket.

  1. Toggle the Custom File Zone to true.
  2. Click FileZone connection and select the configured Amazon S3 connection.
  3. Choose a Default Bucket from the drop-down list.
  4. Click Test Connection to verify the connection.
  5. If the connection is successful, click Save.