Quick guide to converting a CSV file to Parquet
This is a step-by-step guide for using Data Integration to convert a CSV file to Parquet in Amazon S3.
You will need the following to do so:
- Bucket
- Policy
- Data Integration User in AWS
Bucket
A bucket is a container for objects. To store data in Amazon S3, you first create a bucket, specifying a bucket name and an AWS Region, and then upload your data to that bucket as objects. Each object has a key (or key name) that uniquely identifies it within the bucket. To begin, log in to AWS and search for Buckets.
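The same two steps (create a bucket in a Region, then upload a file as an object under a key) can also be scripted. The following is a minimal sketch, assuming the boto3 SDK and an `s3` client obtained from `boto3.client("s3", ...)`. The function names, bucket name, Region, file path, and key are hypothetical placeholders and are not part of this guide's console flow.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the keyword arguments for the S3 create_bucket call.

    us-east-1 is the default Region and must not be sent as a
    LocationConstraint; any other Region has to be sent explicitly.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket_and_upload(s3, bucket_name, region, local_path, key):
    """Create the bucket, then upload one local file as the object `key`."""
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
    s3.upload_file(local_path, bucket_name, key)
    # Return the resulting object location for convenience.
    return f"s3://{bucket_name}/{key}"


# Example usage (needs boto3 and AWS credentials; every name is a placeholder):
#   import boto3
#   s3 = boto3.client("s3", region_name="eu-west-1")
#   create_bucket_and_upload(s3, "my-filezone-bucket", "eu-west-1",
#                            "orders.csv", "input/orders.csv")
```

Passing the client in as an argument keeps the helper free of credentials handling, so it can be exercised with a stand-in client before it touches a real account.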
Policy
A bucket policy is a resource-based policy that lets you grant access permissions to your bucket and the objects in it. Now that you've created a bucket, create a policy that grants the necessary permissions.
Here's the policy's code; replace <RiveryFileZoneBucket> with the name of your bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RiveryManageFZBucket",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketCORS",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>"
    },
    {
      "Sid": "RiveryManageFZObjects",
      "Effect": "Allow",
      "Action": [
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:PutObjectVersionAcl",
        "s3:PutObjectAcl",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::<RiveryFileZoneBucket>/*"
    },
    {
      "Sid": "RiveryHeadBucketsAndGetLists",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    }
  ]
}
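To avoid editing the bucket name into the policy by hand, the document can be generated. The sketch below rebuilds the policy above from a bucket name using only the Python standard library; `build_policy` and `policy_json` are hypothetical helper names, while the statements, Sids, and actions are copied from the policy shown in this guide.

```python
import json


def build_policy(bucket_name):
    """Return the policy above as a dict, with the bucket name filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permissions apply to the bucket ARN itself.
                "Sid": "RiveryManageFZBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketCORS",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                "Resource": bucket_arn,
            },
            {
                # Object-level permissions apply to every key in the bucket.
                "Sid": "RiveryManageFZObjects",
                "Effect": "Allow",
                "Action": [
                    "s3:ReplicateObject",
                    "s3:PutObject",
                    "s3:GetObjectAcl",
                    "s3:GetObject",
                    "s3:PutObjectVersionAcl",
                    "s3:PutObjectAcl",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": f"{bucket_arn}/*",
            },
            {
                "Sid": "RiveryHeadBucketsAndGetLists",
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
        ],
    }


def policy_json(bucket_name):
    """Serialize the policy as JSON text, ready to paste or submit."""
    return json.dumps(build_policy(bucket_name), indent=2)


# Example usage (the bucket name is a placeholder):
#   print(policy_json("my-filezone-bucket"))
```

The resulting text can be pasted into the AWS policy editor, or, assuming boto3, passed as `PolicyDocument` to the IAM client's `create_policy` call.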
Data Integration user in AWS
To connect to the Amazon S3 source and target (described in the following section) in the Data Integration console, you must first create a Data Integration user in AWS.
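If you prefer to script this step, the sketch below shows one possible way to do it, assuming boto3 and an `iam` client from `boto3.client("iam")`. It also assumes that the user authenticates with an access key pair and that the policy already exists as a managed policy; neither detail is stated in this guide. The function name, user name, and policy ARN are hypothetical.

```python
def create_integration_user(iam, user_name, policy_arn):
    """Create an IAM user, attach the policy, and issue an access key.

    Assumption: the connection is made with an access key ID and secret.
    """
    iam.create_user(UserName=user_name)
    iam.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)
    response = iam.create_access_key(UserName=user_name)
    access_key = response["AccessKey"]
    # The secret is only returned once, at creation time, so hand it back
    # to the caller to store securely.
    return {
        "access_key_id": access_key["AccessKeyId"],
        "secret_access_key": access_key["SecretAccessKey"],
    }


# Example usage (needs boto3 and AWS credentials; names are placeholders):
#   import boto3
#   iam = boto3.client("iam")
#   creds = create_integration_user(
#       iam, "data-integration-user",
#       "arn:aws:iam::123456789012:policy/FileZonePolicy")
```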
Converting with Data Integration
After you've completed all of the necessary AWS configurations, create a Data Integration account so that you can connect to the Data Integration console. Then, using the Data Integration feature, you can convert the CSV file to Parquet.
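For readers who want to see what the conversion produces, or to check a result locally, here is a minimal sketch of a CSV-to-Parquet conversion using the third-party pyarrow library. This is not how Data Integration performs the conversion; it is a stand-alone illustration, and the function names and file names are hypothetical.

```python
from pathlib import PurePosixPath


def parquet_key_for(csv_key):
    """Map an object key such as 'input/orders.csv' to 'input/orders.parquet'."""
    return str(PurePosixPath(csv_key).with_suffix(".parquet"))


def csv_to_parquet(csv_path, parquet_path):
    """Read a local CSV file and write its contents as a Parquet file."""
    # pyarrow is imported here so that parquet_key_for works without it.
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    # read_csv takes column names from the header row and infers column types.
    table = pv.read_csv(csv_path)
    pq.write_table(table, parquet_path)
    return table.num_rows


# Example usage (file names are placeholders):
#   rows = csv_to_parquet("orders.csv", parquet_key_for("orders.csv"))
```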
That's all there is to it; you've completed the quick guide and successfully converted the file.