Amazon S3 as a source connection
Prerequisites
Creating a bucket
A bucket is an object container. To store data in Amazon S3, you must create a bucket and specify a bucket name and an AWS Region. Then, upload your data as objects to the Amazon S3 bucket. Each object has a key (or key name) that serves as the object's unique identifier within the bucket.
To get started, log in to AWS and search for Buckets.
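The same step can also be scripted instead of done in the console. The following is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are already configured; the function names are hypothetical, and the bucket name, Region, and key are values you supply.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build the keyword arguments for boto3's S3 create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default Region and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket_and_upload(bucket_name: str, region: str, key: str, body: bytes) -> None:
    """Create the bucket, then upload one object under the given key."""
    import boto3  # assumption: boto3 is installed (pip install boto3)

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
    # The key is the object's unique identifier within the bucket.
    s3.put_object(Bucket=bucket_name, Key=key, Body=body)
```

Keeping the argument construction in its own function makes the Region handling easy to check without calling AWS.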
Adding an IAM policy
An IAM policy defines a set of permissions. The policy below is an identity-based policy that you attach to an IAM user or role to grant the permissions Data Integration needs.
In the policy, replace <RiveryFileZoneBucket> (including the angle brackets) with the name of your S3 bucket.
The policy is as follows:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RiveryManageFZBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketCORS",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>"
    },
    {
      "Sid": "RiveryManageFZObjects",
      "Effect": "Allow",
      "Action": [
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:PutObjectVersionAcl",
        "s3:PutObjectAcl",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*"
    },
    {
      "Sid": "RiveryHeadBucketsAndGetLists",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
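To avoid hand-editing the placeholder, the policy can be generated from the bucket name. This is a minimal sketch using only the Python standard library; the function name build_filezone_policy is hypothetical, and the statements mirror the policy shown above.

```python
import json


def build_filezone_policy(bucket_name: str) -> str:
    """Return the policy above as JSON, with the bucket placeholder filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "RiveryManageFZBucket",
                "Effect": "Allow",
                "Action": ["s3:GetBucketCORS", "s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to every key under the bucket.
                "Sid": "RiveryManageFZObjects",
                "Effect": "Allow",
                "Action": [
                    "s3:ReplicateObject",
                    "s3:PutObject",
                    "s3:GetObjectAcl",
                    "s3:GetObject",
                    "s3:PutObjectVersionAcl",
                    "s3:PutObjectAcl",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": f"{bucket_arn}/*",
            },
            {
                "Sid": "RiveryHeadBucketsAndGetLists",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

For example, print(build_filezone_policy("my-filezone-bucket")) emits a document ready to paste into the JSON tab, where my-filezone-bucket stands in for your own bucket name.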
Creating a Data Integration user in AWS
To connect to Amazon S3 as a Source or Target from the Data Integration console, create a dedicated AWS user for Data Integration and grant it permission to manage and read the FileZone bucket.
Procedure
- Sign in to the AWS Management Console and open the IAM console.
- Navigate to Users > Add User.
- Enter a user name and set the Access type to Programmatic access.
- Click Next: Permissions.
- In the Set Permissions form, select Attach Existing Policies Directly.
- Click Create Policy.
- Navigate to the JSON tab.
- Copy and paste the following policy, replacing <RiveryFileZoneBucket> with the name of your S3 bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RiveryManageFZBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketCORS",
        "s3:ListBucket",
        "s3:GetBucketAcl",
        "s3:GetBucketPolicy"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>"
    },
    {
      "Sid": "RiveryManageFZObjects",
      "Effect": "Allow",
      "Action": [
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:PutObjectVersionAcl",
        "s3:PutObjectAcl",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*"
    },
    {
      "Sid": "RiveryHeadBucketsAndGetLists",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
- Select Review Policy.
- Name the policy and click Create Policy.
- Refresh the list of policies and select the check box for the policy you just created.
- Click Next: Tags, then Next: Review, and then Create User to complete the process.
- On the summary page, view the user's AWS credentials (Access key ID and Secret access key) and download them as a CSV file. You can do this only once.
- Verify that the policy you created is attached to the user you created. The user can then manage and read the FileZone bucket created for Data Integration.
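The console procedure above can be sketched as a script. This is a minimal sketch, not the product's own tooling: it assumes a boto3 IAM client is passed in (for example, boto3.client("iam")), and the function name and returned dictionary keys are hypothetical.

```python
def create_integration_user(iam, user_name: str, policy_name: str, policy_json: str) -> dict:
    """Create a user, create and attach the policy, and return a new access key.

    `iam` is expected to behave like a boto3 IAM client.
    """
    iam.create_user(UserName=user_name)
    policy = iam.create_policy(PolicyName=policy_name, PolicyDocument=policy_json)
    iam.attach_user_policy(UserName=user_name, PolicyArn=policy["Policy"]["Arn"])
    # The secret access key is returned only by this call, so store it securely.
    key = iam.create_access_key(UserName=user_name)["AccessKey"]
    return {
        "access_key_id": key["AccessKeyId"],
        "secret_access_key": key["SecretAccessKey"],
    }
```

Passing the client in as an argument keeps the function free of any direct SDK import, so it can be exercised with a stand-in object before running it against a real account.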
AWS role chaining
This connection setup involves Role Chaining, a process where one AWS identity assumes another IAM role to access resources. AWS enforces specific session duration limits for chained roles. For more information, refer to Role chaining.
Establishing Amazon S3 as a source connection in Data Integration
Procedure
AWS keys
- Navigate to the Data Integration Account.
- Click Connections and select + New Connection.
- Choose Amazon S3.
- Enter the Connection Name.
- From the drop-down menu, choose your Region.
- Choose AWS Keys as the Credentials Type.
- Enter your AWS Access key ID and Secret access key.
- Assume Role Timeout (Seconds): The default value is 3600. You can adjust this value as needed for your specific configuration.
Note: To ensure connection stability, set the Maximum session duration of the IAM role to 12 hours (43200 seconds) in the AWS Console.
- Click Test Connection to verify the connection. If the test succeeds, you can use this connection in Data Integration.
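The relationship between the Assume Role Timeout and the role's Maximum session duration can be checked before saving the connection. The sketch below is illustrative only: the function and constant names are hypothetical, and the limits reflect AWS STS behavior as documented by AWS (a 900-second minimum for AssumeRole, and a one-hour cap on role-chained sessions), not anything specific to Data Integration.

```python
from typing import List

MIN_ASSUME_ROLE_SECONDS = 900     # lowest DurationSeconds that STS AssumeRole accepts
ROLE_CHAINING_CAP_SECONDS = 3600  # AWS limits role-chained sessions to one hour


def check_assume_role_timeout(
    timeout: int, role_max_session: int = 43200, chained: bool = False
) -> List[str]:
    """Return a list of problems with the requested timeout; empty means none found."""
    problems = []
    if timeout < MIN_ASSUME_ROLE_SECONDS:
        problems.append(f"timeout {timeout}s is below the {MIN_ASSUME_ROLE_SECONDS}s minimum")
    if timeout > role_max_session:
        problems.append(
            f"timeout {timeout}s exceeds the role's maximum session duration "
            f"of {role_max_session}s"
        )
    if chained and timeout > ROLE_CHAINING_CAP_SECONDS:
        problems.append(
            f"timeout {timeout}s exceeds the {ROLE_CHAINING_CAP_SECONDS}s cap for chained roles"
        )
    return problems
```

With the default timeout of 3600 and a role configured for 43200 seconds, the check returns an empty list.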
IAM role - automatic
- Enter the Connection Name.
- From the drop-down menu, choose your Region.
- Select IAM Role - Automatic credentials type.
- To start the AWS CloudFormation Stack, click Launch Stack.
- Replace the External ID in the Parameters section with the one provided in the Data Integration console.
- In the Review tab, select I acknowledge that AWS CloudFormation may create IAM resources, then click Create.
- Copy the value of RiveryAssumeRoleArn from the Outputs tab of the stack.
- Paste the value into the Role ARN field in the Data Integration console.
- Assume Role Timeout (Seconds): The default value is 3600. You can adjust this value as needed for your specific configuration.
Note: To ensure connection stability, set the Maximum session duration of the IAM role to 12 hours (43200 seconds) in the AWS Console.
- Click Test Connection to verify the connection. If the test succeeds, you can use this connection in Data Integration.
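The RiveryAssumeRoleArn value can also be read programmatically once the stack has been created. This is a minimal sketch, assuming the response comes from boto3's CloudFormation describe_stacks call; the function name is hypothetical, and the stack name is whatever you chose when launching the stack.

```python
def get_stack_output(describe_stacks_response: dict, output_key: str = "RiveryAssumeRoleArn") -> str:
    """Pull one output value out of a CloudFormation describe_stacks response."""
    stack = describe_stacks_response["Stacks"][0]
    # Outputs is a list of {"OutputKey": ..., "OutputValue": ...} entries.
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    raise KeyError(f"Stack has no output named {output_key!r}")


# Usage against a real account (requires boto3 and credentials):
#   import boto3
#   response = boto3.client("cloudformation").describe_stacks(StackName="<your-stack-name>")
#   role_arn = get_stack_output(response)
```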
IAM role - manual
- Enter the Connection Name.
- From the drop-down menu, choose your Region.
- Select IAM Role - Manual credentials type.
- Go to the AWS IAM console.
- Click Policies from the menu, and select Create Policy.
a. Navigate to the JSON tab.
b. Copy the following policy, replacing <RiveryFileZoneBucket> with the name of your S3 bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RiveryManageFZBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketCORS",
        "s3:ListBucket",
        "s3:GetBucketAcl",
        "s3:GetBucketPolicy"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>"
    },
    {
      "Sid": "RiveryManageFZObjects",
      "Effect": "Allow",
      "Action": [
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:PutObjectVersionAcl",
        "s3:PutObjectAcl",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*"
    },
    {
      "Sid": "RiveryHeadBucketsAndGetLists",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
c. Paste the policy into the editor, then click Review Policy.
- Name the policy Data Integration-S3-Policy and click Create Policy.
- Click Roles from the menu, and select Create Role.
- Select Another AWS Account and change the Account ID to the one provided in the Data Integration console.
- Check Require External ID, and set External ID to the one provided in the Data Integration console.
- Click Next.
- In the Attach Policy form, select the Data Integration-S3-Policy.
- Set Data Integration-S3-Role as the role name.
- Copy the Role ARN from the role's summary page and paste it into the Role ARN field in the Data Integration console.
- Assume Role Timeout (Seconds): The default value is 3600. You can adjust this value as needed for your specific configuration.
Note: To ensure connection stability, set the Maximum session duration of the IAM role to 12 hours (43200 seconds) in the AWS Console.
- Click Test Connection to verify the connection. If the test succeeds, you can use this connection in Data Integration.
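Selecting Another AWS Account with Require External ID produces a trust policy on the role. The sketch below shows the general shape of that trust policy as AWS defines it; the function name is hypothetical, and the account ID and external ID are the values shown in the Data Integration console, not values known to this guide.

```python
import json


def build_trust_policy(account_id: str, external_id: str) -> str:
    """Return a role trust policy that lets another account assume the role
    only when it presents the expected external ID."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The trusted account, as entered in the Account ID field.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID must match on every AssumeRole request.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Comparing this output with the Trust relationships tab of the role is a quick way to confirm that the account ID and external ID were entered correctly.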