Amazon S3 as a source connection
This topic provides a step-by-step tutorial for creating an Amazon S3 connection.
Prerequisites
Create a Bucket
A bucket is a container for objects. To store data in Amazon S3, you first create a bucket, specifying a bucket name and an AWS Region, and then upload your data to that bucket as objects. Each object has a key (or key name) that uniquely identifies it within the bucket. To begin, log in to AWS and search for Buckets.
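As a rough illustration of the bucket name and object key concepts above, the following sketch checks a candidate name against a subset of the S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, periods, and hyphens; starts and ends with a letter or digit; no adjacent periods; not shaped like an IP address) and builds the URI that identifies an object. The function names and the bucket name `my-filezone-bucket` are hypothetical, and AWS applies further rules that are not checked here.

```python
import ipaddress
import re

# Length (3-63), allowed characters, and first/last character rule.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a name against a subset of the S3 bucket naming rules."""
    if _BUCKET_NAME_RE.fullmatch(name) is None:
        return False  # wrong length, bad characters, or bad first/last character
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True  # not an IP address, so the name passes these checks
    return False  # names formatted like an IP address are not allowed


def s3_uri(bucket: str, key: str) -> str:
    """Build the s3:// URI that identifies one object by bucket and key."""
    return f"s3://{bucket}/{key}"
```

For example, `is_valid_bucket_name("My_Bucket")` is rejected because of the uppercase letter and the underscore, while `s3_uri("my-filezone-bucket", "exports/orders.csv")` shows how the key distinguishes one object within the bucket.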
Add an IAM policy
An IAM policy of the kind used here is an identity-based policy that you attach to an IAM user or role to grant permissions. Create a policy that grants the necessary permissions.
Make sure to replace RiveryFileZoneBucket with the name of your S3 bucket.
Here's the policy's code:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RiveryManageFZBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketCORS",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>"
    },
    {
      "Sid": "RiveryManageFZObjects",
      "Effect": "Allow",
      "Action": [
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:PutObjectVersionAcl",
        "s3:PutObjectAcl",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*"
    },
    {
      "Sid": "RiveryHeadBucketsAndGetLists",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
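If you prefer not to edit the placeholder by hand, the policy above can be generated with the bucket name filled in. The following is a minimal sketch that reproduces the same three statements; the function name `build_filezone_policy` and the bucket name `my-filezone-bucket` are hypothetical and used only for illustration.

```python
import json


def build_filezone_policy(bucket_name: str) -> dict:
    """Return the policy shown above with the bucket placeholder replaced."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permissions apply to the bucket ARN itself.
                "Sid": "RiveryManageFZBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketCORS",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                "Resource": bucket_arn,
            },
            {
                # Object-level permissions apply to every key in the bucket.
                "Sid": "RiveryManageFZObjects",
                "Effect": "Allow",
                "Action": [
                    "s3:ReplicateObject",
                    "s3:PutObject",
                    "s3:GetObjectAcl",
                    "s3:GetObject",
                    "s3:PutObjectVersionAcl",
                    "s3:PutObjectAcl",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": f"{bucket_arn}/*",
            },
            {
                "Sid": "RiveryHeadBucketsAndGetLists",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
        ],
    }


# Print JSON that can be pasted into the policy editor.
print(json.dumps(build_filezone_policy("my-filezone-bucket"), indent=2))
```

Note that the bucket-level statement targets the bucket ARN, while the object-level statement targets the same ARN followed by `/*`; both need the real bucket name.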
Create a Data Integration user in AWS
To connect to the Amazon S3 Source and Target (described in the following section) in the Data Integration console, you must first create an AWS user for Data Integration.
Create an AWS user for Data Integration
Create a user for Data Integration and grant it permission to manage and read the FileZone bucket by following these steps:
- Sign in to the AWS Management Console and open the IAM console.
- Navigate to Users > Add User.
- Enter a user name and set the Access type to Programmatic access.
- Click Next: Permissions.
- In the Set Permissions form, select Attach Existing Policies Directly.
- Click Create Policy.
- Navigate to the JSON tab.
- Copy and paste the following policy, replacing RiveryFileZoneBucket with the name of your S3 bucket:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "RiveryManageFZBucket",
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketCORS",
            "s3:ListBucket",
            "s3:GetBucketAcl",
            "s3:GetBucketPolicy"
          ],
          "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>"
        },
        {
          "Sid": "RiveryManageFZObjects",
          "Effect": "Allow",
          "Action": [
            "s3:ReplicateObject",
            "s3:PutObject",
            "s3:GetObjectAcl",
            "s3:GetObject",
            "s3:PutObjectVersionAcl",
            "s3:PutObjectAcl",
            "s3:ListMultipartUploadParts"
          ],
          "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*"
        },
        {
          "Sid": "RiveryHeadBucketsAndGetLists",
          "Effect": "Allow",
          "Action": "s3:ListAllMyBuckets",
          "Resource": "*"
        }
      ]
    }
- Select Review Policy.
- Give the policy a name and click Create Policy.
- Using the rounded arrows on the upper right, refresh the list of policies and check the policy you just created.
- Click Next: Tags, then Next: Review, and then Create User to complete the process.
- On the summary screen, you'll find the user's AWS credentials (Access key ID and Secret access key), which you can download as a CSV file. This is the only time you'll be able to do so.
- The user should now be able to manage and read the FileZone bucket that was created for Data Integration. Verify that the policy you created is attached to the user you created.
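Once the credentials CSV is downloaded, the two values needed for the connection can be read from it programmatically. The sketch below assumes the file has header columns named `Access key ID` and `Secret access key`; check the header row of your own file, since the exact column names are an assumption here. The function name `read_access_key` is hypothetical, and the sample values are the placeholder keys that AWS uses in its own documentation, not real credentials.

```python
import csv
import io


def read_access_key(csv_text: str) -> dict:
    """Return the first access key pair found in a credentials CSV.

    Assumes header columns named 'Access key ID' and 'Secret access key'.
    """
    # Some downloads start with a byte order mark; drop it so the
    # first header name matches exactly.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    row = next(reader)
    return {
        "access_key_id": row["Access key ID"].strip(),
        "secret_access_key": row["Secret access key"].strip(),
    }


# Sample content using AWS's documented placeholder key values.
sample = (
    "Access key ID,Secret access key\n"
    "AKIAIOSFODNN7EXAMPLE,wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\n"
)
keys = read_access_key(sample)
```

Treat the CSV file as a secret: store it securely and avoid committing it to version control.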
Connection procedure
AWS keys
- Enter the Connection Name.
- From the drop-down menu, choose your Region.
- Select the AWS Keys credentials type.
- Enter your AWS Access key ID and Secret access key.
- Use the Test Connection function to verify that the connection works. If the test succeeds, you can use this connection in Data Integration.
IAM role - automatic
- Enter the Connection Name.
- From the drop-down menu, choose your Region.
- Select the IAM Role - Automatic credentials type.
- Click Launch Stack to launch the AWS CloudFormation stack.
- Replace the External ID in the Parameters section with the one you were given in the Data Integration console.
- In the Review tab, select I acknowledge that AWS CloudFormation may create IAM resources, then click Create.
- Copy the value of RiveryAssumeRoleArn from the Outputs tab of the stack.
- Paste the copied value as the Role ARN Key.
- Use the Test Connection function to verify that the connection works. If the test succeeds, you can use this connection in Data Integration.
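Before pasting the copied value, it can help to confirm that it really is an IAM role ARN and not, for example, a bucket ARN. The following sketch checks the general shape `arn:<partition>:iam::<12-digit account ID>:role/<role name>`; the function name `parse_role_arn` and the example role name are hypothetical.

```python
import re

# General shape of an IAM role ARN; the role part may include a path.
_ROLE_ARN_RE = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)"
)


def parse_role_arn(arn: str) -> dict:
    """Split an IAM role ARN into its parts, or raise ValueError."""
    match = _ROLE_ARN_RE.fullmatch(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    partition, account_id, role = match.groups()
    return {"partition": partition, "account_id": account_id, "role": role}


# Hypothetical role ARN in a placeholder account.
parts = parse_role_arn("arn:aws:iam::123456789012:role/Example-S3-Role")
```

The account ID in the parsed result should be your own AWS account, since the role is created in your account and assumed from outside it.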
IAM role - manual
- Enter the Connection Name.
- From the drop-down menu, choose your Region.
- Select the IAM Role - Manual credentials type.
- Open the AWS IAM console.
- Click Policies on the side menu, and select Create Policy.
  a. Navigate to the JSON tab.
  b. Copy the following policy, replacing RiveryFileZoneBucket with the name of your S3 bucket:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "RiveryManageFZBucket",
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketCORS",
            "s3:ListBucket",
            "s3:GetBucketAcl",
            "s3:GetBucketPolicy"
          ],
          "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>"
        },
        {
          "Sid": "RiveryManageFZObjects",
          "Effect": "Allow",
          "Action": [
            "s3:ReplicateObject",
            "s3:PutObject",
            "s3:GetObjectAcl",
            "s3:GetObject",
            "s3:PutObjectVersionAcl",
            "s3:PutObjectAcl",
            "s3:ListMultipartUploadParts"
          ],
          "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*"
        },
        {
          "Sid": "RiveryHeadBucketsAndGetLists",
          "Effect": "Allow",
          "Action": "s3:ListAllMyBuckets",
          "Resource": "*"
        }
      ]
    }
  c. Paste the policy into the description box, then click Review Policy.
- Name the policy Data Integration-S3-Policy and click Create Policy.
- Click Roles on the side menu, and select Create Role.
- Select Another AWS Account and set the Account ID to the one you were given in the Data Integration console.
- Check Require External ID, and set External ID to the one you were given in the Data Integration console.
- Click Next.
- In the Attach Policy form, attach the Data Integration-S3-Policy.
- Set Data Integration-S3-Role as the role name.
- Copy the Role ARN from the role's window and paste it into the Role ARN field of the connection in Data Integration.
- Use the Test Connection function to verify that the connection works. If the test succeeds, you can use this connection in Data Integration.
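The Another AWS Account and Require External ID steps above result in a trust policy on the role: it allows principals in the given account to call `sts:AssumeRole`, but only when they supply the matching external ID. The sketch below builds a document of that general shape so you can compare it with the Trust relationships tab of the role you created. The function name `build_trust_policy` and the sample account ID and external ID are hypothetical; use the values shown in the Data Integration console.

```python
import json


def build_trust_policy(account_id: str, external_id: str) -> dict:
    """Trust policy for cross-account role assumption with an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must present this external ID.
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


# Placeholder values for illustration only.
trust = build_trust_policy("123456789012", "example-external-id")
print(json.dumps(trust, indent=2))
```

If Test Connection fails for a manually created role, a mismatch between the account ID or external ID in this trust policy and the values from the Data Integration console is one likely cause to check.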