Setting up Google Cloud Storage as a target
Set up your Google Cloud credential JSON key, create a Google Cloud Storage bucket, and obtain the credentials needed to use Google Cloud Storage with Data Integration.
Prerequisite
Ensure you have signed up for the Google Cloud Platform and have a console Admin user. If you are missing either of these prerequisites, you can start here.
Create a service account user for Data Integration
Data Integration uploads your source data to a Google Cloud Storage bucket. Create a service account user in the Google Cloud Platform console with access to the relevant bucket and the relevant BigQuery project.
Procedure
- Sign in to the Google Cloud Platform console.
- Go to IAM & Admin > Service Accounts and click CREATE SERVICE ACCOUNT.
- In the CREATE SERVICE ACCOUNT page:
- Set your Service Account name (Data_Integration User) and click CREATE AND CONTINUE.
- Grant the service account access to the project by setting Roles:
- Click on the drop-down list and select BigQuery Admin.
- Click ADD ANOTHER ROLE and repeat the process for Storage Admin.
- Copy your Service Account ID / Email from the service account list. You will enter it in the Data Integration connection.
- Create a key for the service account:
- Go to the service accounts screen, locate the service account you created, and click it.
- On the service account page, click Keys.
- Click Add key.
- Choose the JSON key type and click CREATE.
- Your JSON secret key is downloaded.
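Before entering the key in Data Integration, it can help to confirm that the downloaded file looks like a service account key. The following is a minimal sketch, assuming the standard field names found in Google service account JSON keys. The function names `validate_key` and `validate_key_file` are hypothetical helpers for illustration and are not part of Data Integration or any Google library.

```python
import json

# Fields normally present in a service account JSON key (assumption:
# the standard Google key layout; this is not a Data Integration API).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def validate_key(key: dict) -> list[str]:
    """Return a list of problems found in a parsed key; empty means none found."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


def validate_key_file(path: str) -> list[str]:
    """Load a downloaded JSON key from disk and validate it."""
    with open(path, encoding="utf-8") as handle:
        return validate_key(json.load(handle))
```

Calling `validate_key_file` with the path of the downloaded key returns an empty list when the basic fields are present. The `client_email` value in the file is the Service Account Email that the connection asks for.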
Enabling Cloud Storage and GCS API
- Go to APIs & Services and click ENABLE APIS AND SERVICES.
- Search for Google Cloud Storage JSON API and click Enable API.
Creating a Google Cloud Storage bucket
Data Integration needs a Google Cloud Storage bucket to serve as a FileZone before your data is loaded into BigQuery. You can also use the FileZone bucket or its objects as a base for other Hadoop or Apache Spark operations, such as those provided by Google Dataproc or your other services.
Procedure
- Sign in to the Google Cloud Platform console.
- Go to Storage > Browse, and click CREATE BUCKET.
- In the CREATE BUCKET page:
- Set Bucket Name, for example: project_name_data_integration_file_zone
- Set your bucket to Regional (Multi-Region is not stable for loading) and choose your preferred location.
- Click CREATE.
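Bucket creation fails if the name breaks the Cloud Storage naming rules, so it can be useful to check a candidate name first. The sketch below covers only a subset of those rules (length, allowed characters, and the reserved "goog" and "google" strings) and assumes a name without dots. The function `is_valid_bucket_name` is a hypothetical helper, not a Google or Data Integration API.

```python
import re

# Subset of the Cloud Storage bucket naming rules (assumption: names without
# dots; dotted, domain-style names have extra rules that are not covered here).
# 3 to 63 characters, lowercase letters, digits, dashes, and underscores,
# starting and ending with a letter or digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a bucket name against the basic naming rules described above."""
    if not _NAME_RE.fullmatch(name):
        return False
    # Names may not start with "goog" or contain "google".
    return not name.startswith("goog") and "google" not in name
```

For example, the name `project_name_data_integration_file_zone` used in the procedure passes this check, while a name containing uppercase letters does not.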
Configuring Google Cloud Storage bucket in Data Integration
Create a new connection for Google Cloud Storage and enter the credentials for your Google Cloud Platform service account:
- Connection Name
- Project ID (available in the Google Cloud Platform Home section)
- Project Number (optional; available in the Google Cloud Platform Home section)
- Service Account Email - copy the Service Account ID from the Service Accounts page.
- Region - the region your bucket was created in.
- Set your custom FileZone to save the data in your own staging area (optional).
- Click Test Connection. Once a valid connection is made, save the connection.
Known issues
In some cases, the Storage Admin role does not include the storage.buckets.get permission by default.
In this case, duplicate the Storage Admin role in GCP to create a custom role. Ensure the custom role has the storage.buckets.get permission, then assign it to your service account instead of Storage Admin (refer to the Create a service account user for Data Integration section).
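To confirm whether the service account is actually missing the permission, you can ask Cloud Storage directly. The following is a rough sketch, assuming the `google-cloud-storage` Python client library is installed and that you have the JSON key downloaded earlier. The helper names `missing_permissions` and `check_bucket_access` are hypothetical and are not part of Data Integration.

```python
REQUIRED_PERMISSIONS = ["storage.buckets.get"]


def missing_permissions(required, granted):
    """Return the required permissions that are absent from the granted ones."""
    granted_set = set(granted)
    return [perm for perm in required if perm not in granted_set]


def check_bucket_access(key_path: str, bucket_name: str) -> list[str]:
    """Ask Cloud Storage which required permissions the service account lacks."""
    # Imported here so the helper above works without the library installed.
    from google.cloud import storage

    client = storage.Client.from_service_account_json(key_path)
    bucket = client.bucket(bucket_name)  # local reference only, no API call yet
    granted = bucket.test_iam_permissions(REQUIRED_PERMISSIONS)
    return missing_permissions(REQUIRED_PERMISSIONS, granted)
```

If `check_bucket_access` returns a non-empty list, the listed permissions are the ones to add to the custom role before testing the connection again.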