Google Cloud Storage connection
This topic explains, step by step, how to create a Google Cloud service account and its JSON credential key, create a Google Cloud Storage bucket, and use both to connect Google Cloud Storage to Data Integration. By the end of this guide, Data Integration will be configured to connect to your Google Cloud Storage.
Prerequisites
Ensure you have signed up for Google Cloud Platform and have an admin user for the Google Cloud Console. If you are missing either prerequisite, sign up for Google Cloud Platform before you continue.
Create a service account user for Data Integration
Data Integration uses Google Cloud Storage to upload your source data. Follow these steps to create a service account in the Google Cloud Platform Console with access to the relevant bucket and BigQuery project.
Steps to create a service account
-
Sign in to the Google Cloud Platform Console.
-
Go to IAM & Admin.
-
Click Create Service Account.
-
In the Service Account Wizard, set your Service Account name (e.g., Data_Integration User) and click Create and Continue.
-
Assign roles to the Service Account:
a. Assign the BigQuery Admin role.
b. Click on Add Another Role and assign the Storage Admin role.
c. Copy your Service Account Email for later use in Data Integration.
-
Create a key for the service account:
a. Go to the Service Accounts screen, locate the service account you've just created, and click it.
b. In the service account screen, click Keys.
c. Click Add Key.
d. Choose the key type JSON and click Create.
e. Your JSON secret key will be downloaded. Keep it in a safe place.
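Before uploading the downloaded key to Data Integration, it can help to confirm that the file is intact. The following is a minimal sketch, not part of Data Integration: it checks that the text is valid JSON and carries the fields a Google service account key normally contains (`type`, `project_id`, `private_key`, `client_email`). The function name `check_service_account_key` and the sample values are hypothetical.

```python
import json

# Fields that a Google service account JSON key file normally contains.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key file text (empty if none)."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)
    ]
    # A key created for a service account has type "service_account".
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected key type: {key_type}")
    return problems


if __name__ == "__main__":
    # Placeholder content for illustration only; this is not a real key.
    sample = json.dumps({
        "type": "service_account",
        "project_id": "my-project",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "client_email": "data-integration@my-project.iam.gserviceaccount.com",
    })
    print(check_service_account_key(sample))
```

In practice you would read the downloaded file with `open(path).read()` and pass its text to the function. The `client_email` value in the file is the Service Account Email that the connection form asks for later.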
Enable Cloud Storage and GCS API
-
Go to APIs & Services and click ENABLE APIS AND SERVICES.
-
Search for Google Cloud Storage JSON API and click Enable API.
Create a Google Cloud Storage bucket
Data Integration needs a Google Cloud Storage bucket to serve as a FileZone, the staging area where your data lands before it is loaded into BigQuery. You can also use the FileZone bucket or its objects as a base for other Hadoop or Apache Spark operations run by Google Dataproc, or by your other services.
So, let's create a Google Cloud Storage bucket for Data Integration:
-
Sign in to the Google Cloud Platform Console.
-
Go to Storage > Browse and click Create Bucket.
-
In the wizard:
a. Set the Bucket Name, for example: project_name_data_integration_file_zone.
b. Set your Bucket to be Regional (Multi-Region is not stable for loading) and choose your preferred location.
c. Click Create.
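The wizard rejects names that break Google Cloud Storage naming rules, so it can save a round trip to check a candidate name first. The sketch below covers only a subset of the documented rules (allowed characters, start and end characters, the 3 to 63 character length, and the reserved "goog" and "google" strings); it ignores the special handling of dotted names and is an illustration, not an authoritative validator. The function name `bucket_name_problems` is hypothetical.

```python
import re

# Lowercase letters, digits, dashes, underscores, and dots are allowed;
# the name must start and end with a letter or digit.
_ALLOWED_NAME = re.compile(r"[a-z0-9]([a-z0-9._-]*[a-z0-9])?")


def bucket_name_problems(name: str) -> list[str]:
    """Return the naming rules (from the subset above) that `name` breaks."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if not _ALLOWED_NAME.fullmatch(name):
        problems.append(
            "use only lowercase letters, digits, '-', '_' and '.', "
            "starting and ending with a letter or digit"
        )
    if name.startswith("goog"):
        problems.append("must not begin with 'goog'")
    if "google" in name:
        problems.append("must not contain 'google'")
    return problems


if __name__ == "__main__":
    # The example name from the wizard step passes every check in this sketch.
    print(bucket_name_problems("project_name_data_integration_file_zone"))
```

Bucket names are also globally unique across Google Cloud Storage, which a local check like this cannot verify; the wizard reports a conflict if the name is already taken.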
Configure your Google Cloud Storage bucket in Data Integration
Let’s create a new connection for your Google Cloud Storage.
-
Go to Connections.
-
Click New Connection.
-
From the source list, choose Google Cloud Storage.
-
Enter the credentials for your Google Cloud Platform Service Account:
a. Connection Name.
b. (Optional) Description.
c. Project Id: Can be found in the Google Cloud Platform Home section.
d. (Optional) Project Number: Can be found in the Google Cloud Platform Home section.
e. Service Account email: The Service Account ID that you copied from the Service Account Wizard.
f. Choose file: The JSON credentials file that was generated when you created the key for the service account.
g. Region: The region your bucket was created in.
h. Default bucket: The default bucket Data Integration will use (the one you've created).
-
Click Test Connection at the bottom to test. Once a valid connection is made, save the connection.
If you cannot get a valid connection set up, contact helpme@rivery.io for support.
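A failed connection test is often caused by a form value that does not match the JSON key. As a rough illustration, the sketch below compares the non-optional form fields with the `project_id` and `client_email` fields of the key file. The `GcsConnectionForm` class and `form_problems` function are hypothetical helpers, not part of Data Integration, and the check assumes the service account lives in the same project as the bucket, as it does when you follow this guide.

```python
import json
from dataclasses import dataclass


@dataclass
class GcsConnectionForm:
    """Hypothetical container for the non-optional connection form values."""
    connection_name: str
    project_id: str
    service_account_email: str
    region: str
    default_bucket: str


def form_problems(form: GcsConnectionForm, key_json: str) -> list[str]:
    """Compare the form values with the JSON key text and list any problems."""
    key = json.loads(key_json)
    problems = []
    # Every field held by this sketch is required, so none may be blank.
    for field_name, value in vars(form).items():
        if not value.strip():
            problems.append(f"{field_name} is empty")
    if form.project_id != key.get("project_id"):
        problems.append("Project Id differs from project_id in the key file")
    if form.service_account_email != key.get("client_email"):
        problems.append(
            "Service Account email differs from client_email in the key file"
        )
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    key_text = json.dumps({
        "project_id": "my-project",
        "client_email": "data-integration@my-project.iam.gserviceaccount.com",
    })
    form = GcsConnectionForm(
        connection_name="gcs-filezone",
        project_id="my-project",
        service_account_email="data-integration@my-project.iam.gserviceaccount.com",
        region="us-east1",
        default_bucket="project_name_data_integration_file_zone",
    )
    print(form_problems(form, key_text))
```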
Known issues
- Sometimes the Storage Admin role does not include the storage.buckets.get permission by default.
In this case, create a custom role in GCP: duplicate the Storage Admin role by clicking Create from Role, make sure the custom role includes the storage.buckets.get permission, and then assign this custom role to your service account instead of Storage Admin (see the Create a service account user for Data Integration section of this topic).
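To see whether the service account is affected by this issue, you can compare the permissions it actually holds on the bucket with the ones you expect. The sketch below is a plain helper that reports what is missing. Only storage.buckets.get comes from the known issue above; the object-level entries in `REQUIRED_PERMISSIONS` are an assumption about what a FileZone workflow needs, and the helper name `missing_permissions` is hypothetical. If you use the google-cloud-storage Python client, the `granted` list could come from `Bucket.test_iam_permissions`, which returns the subset of the requested permissions that the caller holds.

```python
# Permissions to look for. storage.buckets.get is the one named in the known
# issue; the object-level permissions are illustrative assumptions.
REQUIRED_PERMISSIONS = (
    "storage.buckets.get",
    "storage.objects.create",
    "storage.objects.get",
    "storage.objects.list",
)


def missing_permissions(granted, required=REQUIRED_PERMISSIONS):
    """Return the required permissions absent from `granted`, in order."""
    granted_set = set(granted)
    return [perm for perm in required if perm not in granted_set]


if __name__ == "__main__":
    # Example: a role that lacks storage.buckets.get, as in the known issue.
    granted = ["storage.objects.create", "storage.objects.get", "storage.objects.list"]
    print(missing_permissions(granted))
```

An empty result means the role already covers everything on the list; any entry returned is a permission to add to the custom role.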
Conclusion
This topic showed you how to create a Service Account user for Data Integration and a Cloud Storage bucket.
You now have a Google Cloud Storage connection that you can use as a target in any river, and also as a source.