Converting a CSV file to Parquet
You can use Data Integration to convert a CSV file to Parquet in Amazon S3.
Prerequisites
- Bucket
- Policy
- Data Integration user in AWS
Creating a bucket
A bucket is a container for objects. To store data in Amazon S3, you must first create a bucket, specifying a bucket name and an AWS Region. You then upload your data to that bucket as objects. Each object has a key (or key name) that uniquely identifies it within the bucket. To begin, log in to AWS and search for Buckets.
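If you prefer scripting to the console, the bucket creation and upload described above can be sketched with the AWS SDK for Python. This is a minimal sketch, not part of the Data Integration setup itself: it assumes the third-party boto3 package is installed and AWS credentials are already configured, and the function names, bucket name, Region, file path, and key are hypothetical placeholders.

```python
def bucket_request(name, region):
    """Build the keyword arguments for an S3 CreateBucket call."""
    kwargs = {"Bucket": name}
    # us-east-1 is the default Region; S3 rejects it as an explicit
    # LocationConstraint, so the configuration is only added elsewhere.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket_and_upload(name, region, local_path, key):
    """Create the bucket, then upload one local file as an object."""
    import boto3  # third-party dependency (assumption), imported lazily

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**bucket_request(name, region))
    # The key is the object's unique identifier within the bucket.
    s3.upload_file(local_path, name, key)


if __name__ == "__main__":
    # Hypothetical values; running this creates real AWS resources.
    create_bucket_and_upload(
        "my-file-zone-bucket", "eu-west-1", "data.csv", "input/data.csv"
    )
```

The request arguments are built in a separate function so they can be inspected without contacting AWS.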
Policy
A bucket policy is a resource-based policy that lets you grant access permissions to your bucket and the objects it contains. Now that you've created a bucket, create a policy that grants the necessary permissions.
Policy code:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RiveryManageFZBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketCORS",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>"
    },
    {
      "Sid": "RiveryManageFZObjects",
      "Effect": "Allow",
      "Action": [
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:PutObjectVersionAcl",
        "s3:PutObjectAcl",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*"
    },
    {
      "Sid": "RiveryHeadBucketsAndGetLists",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
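The policy above contains the placeholder <RiveryFileZoneBucket> in two places: once for the bucket itself and once, with a trailing /*, for the objects inside it. As a hedged convenience, the following sketch generates the same policy for a given bucket name so that both ARNs stay in sync. The function name build_policy and the example bucket name are hypothetical; the statements mirror the policy code above.

```python
import json


def build_policy(bucket):
    """Return the policy document above with the bucket name filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "RiveryManageFZBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketCORS",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to every key in the bucket.
                "Sid": "RiveryManageFZObjects",
                "Effect": "Allow",
                "Action": [
                    "s3:ReplicateObject",
                    "s3:PutObject",
                    "s3:GetObjectAcl",
                    "s3:GetObject",
                    "s3:PutObjectVersionAcl",
                    "s3:PutObjectAcl",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": f"{bucket_arn}/*",
            },
            {
                "Sid": "RiveryHeadBucketsAndGetLists",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-file-zone-bucket" is a hypothetical bucket name.
    print(json.dumps(build_policy("my-file-zone-bucket"), indent=2))
```

The printed JSON can be pasted into the policy editor in place of the template.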
Data Integration user in AWS
To connect to the Amazon S3 source and target (described in the following section) in the Data Integration console, you must create a Data Integration user in AWS.
Converting with Data Integration
After completing the AWS configuration, create a Data Integration account so that you can connect to the Data Integration console. Then use the Data Integration feature to convert the CSV file to Parquet.