Amazon RDS PostgreSQL as a target
PostgreSQL is primarily used as an operational database management system. Like other operational RDBMSs, it stores data in a row-based rather than a column-based format. This row-based structure carries trade-offs when processing large volumes of data, but it delivers strong performance for key lookups, indexes, and JOIN operations.
Establish a connection
Refer to the Amazon RDS PostgreSQL connection topic to establish a connection.
Configuring PostgreSQL as a target
- In your Data Source to Target River, select the RDS/Aurora PostgreSQL option in the "Target" step.
- Input the Schema and Table to write into.
- Choose the Loading mode: Upsert-Merge, Append, or Overwrite.
- Overwrite: TRUNCATE existing records in the target Table and INSERT new records.
- Append: INSERT INTO the target Table, matching columns by name, appending new records and retaining existing ones.
- Upsert-Merge: offers two methods:
- Delete-Insert: deletes existing rows (when keys match between new and existing data) and inserts the new data. This method avoids duplication, creates an index for better performance, and is recommended for ongoing merge runs.
- Insert on Conflict: Uses INSERT INTO and ON CONFLICT clauses based on primary keys selected in the table mapping.
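The two Upsert-Merge methods can be sketched in plain SQL. This is an illustrative example, not the exact statements Data Integration generates; the `events` target table, `events_stg` staging table, and `event_id` key are hypothetical names:

```sql
-- Delete-Insert: remove target rows whose keys match the new batch, then insert it.
BEGIN;
DELETE FROM events
USING events_stg
WHERE events.event_id = events_stg.event_id;
INSERT INTO events (event_id, payload)
SELECT event_id, payload FROM events_stg;
COMMIT;

-- Insert on Conflict: requires a primary key or unique index on event_id.
INSERT INTO events (event_id, payload)
SELECT event_id, payload FROM events_stg
ON CONFLICT (event_id)
DO UPDATE SET payload = EXCLUDED.payload;
```

Delete-Insert works without a unique constraint on the target, while Insert on Conflict depends on the primary keys selected in the table mapping to detect conflicts.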
Loading unstructured data into PostgreSQL
PostgreSQL has powerful JSON extraction functions since version 9.4. To manage dynamic loading from unstructured data sources (such as MongoDB), any unstructured column in the mapping remains as a "JSON" data type in the target table. The table contains both structured data types (for example, varchar, integer, and date) and unstructured JSON data types (for example, arrays and records).
Example of handling JSON data in PostgreSQL:

```sql
-- The actions table has an action_calc JSON column with values such as:
-- {"name": "click", "amount": 4}
SELECT action_calc ->> 'name' AS action_name
FROM actions;
-- Result: 'click' (->> extracts the value as text; -> would return it as JSON)
```
Data Integration data load and table metadata handling
Data Integration manages table metadata updates along with data loading.
Staging table loading
- Create a table with a `tmp_` prefix and the correct metadata.
- Load data into the `tmp_` table using `aws_s3.table_import_from_s3` (for RDS/Aurora) or the COPY statement (for others).
- Flatten the data (if JSON type in the source) using JSON functions and casting with the `<column>::<type>` syntax.
- Auto-scale integer columns to the appropriate data type based on their data length (SMALLINT -> INTEGER -> BIGINT -> NUMERIC).
- Drop the old `tmp_` table with the `json_data` field (if applicable).
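The staging steps above can be sketched as follows. This is an illustrative outline under assumed names (`tmp_events`, `my-bucket`, the column list), not the exact statements Data Integration runs; `aws_s3.table_import_from_s3` requires the `aws_s3` extension available on RDS/Aurora PostgreSQL:

```sql
-- 1. Create the tmp_ staging table holding the raw JSON payload.
CREATE TABLE tmp_events (json_data JSON);

-- 2. Bulk-load the file from S3 (RDS/Aurora only; others use COPY).
SELECT aws_s3.table_import_from_s3(
    'tmp_events', '', '(FORMAT csv)',
    aws_commons.create_s3_uri('my-bucket', 'events/batch1.csv', 'us-east-1')
);

-- 3. Flatten the JSON into typed columns with extraction operators and ::casts.
CREATE TABLE tmp_events_flat AS
SELECT (json_data ->> 'event_id')::BIGINT  AS event_id,
       (json_data ->> 'name')              AS name,
       (json_data ->> 'amount')::INTEGER   AS amount
FROM tmp_events;

-- 4. Drop the old tmp_ table with the json_data field.
DROP TABLE tmp_events;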
Load from staging table to target table
You can load from the staging table in one transaction (leveraging PostgreSQL's ability to run DDL and DML inside one transaction).
- Define and update any metadata changes on the target table, excluding dropping columns.
- Start a transaction with a time limit.
- If the Target table does not exist, create it using a `CREATE TABLE ... AS SELECT` statement.
- Add, drop, or alter columns in the Target table.
- Create any primary key/index needed for data uniqueness (in Merge - insert-on-conflict) or for target merge performance (in Merge - Delete-Insert).
- Retain any key set by the user in the overwrite loading mode with the “purge keys” option enabled.
- Load data into the Target table using the specified loading mode.
- Commit the transaction.
- Drop the temporary tables used in the process.
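The single-transaction load above can be sketched as follows. This is a simplified example (Append mode, hypothetical `events` and `tmp_events_flat` tables), shown only to illustrate that PostgreSQL allows DDL and DML in one transaction:

```sql
BEGIN;
-- Metadata changes and the data load share one transaction.
ALTER TABLE events ADD COLUMN IF NOT EXISTS amount INTEGER;
CREATE UNIQUE INDEX IF NOT EXISTS events_pk ON events (event_id);
INSERT INTO events (event_id, name, amount)
SELECT event_id, name, amount FROM tmp_events_flat;
COMMIT;
-- Clean up the staging table after the commit.
DROP TABLE tmp_events_flat;
```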
Data Integration relies on PostgreSQL's default behavior of folding unquoted identifiers to lower case. Consistent case handling is fundamental for creating, updating, and managing schemas, tables, columns, views, and other database objects within Data Integration. Ensure your metadata does not contain names that differ only by case.
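For example, PostgreSQL treats unquoted identifiers that differ only by case as the same object, while quoted identifiers preserve case:

```sql
-- Unquoted identifiers fold to lower case, so these refer to the same table:
CREATE TABLE Events (id INT);   -- created as "events"
SELECT * FROM EVENTS;           -- resolves to "events"

-- A quoted identifier preserves case and would be a distinct object:
-- CREATE TABLE "Events" (id INT);  -- different table from "events"
```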