Amazon RDS PostgreSQL as a target
PostgreSQL is primarily used as an operational database management system. Like other operational RDBMSs, it stores data in a row-based rather than a column-based format. This row-based structure carries trade-offs when processing large volumes of data, but it delivers strong performance for key lookups, indexes, and JOIN operations.
Establish a connection
Refer to the Amazon RDS PostgreSQL connection topic to establish a connection.
Configuring PostgreSQL as a target
- In your Data Source to Target River, select the RDS/Aurora PostgreSQL option in the "Target" step.
- Input the Schema and Table to write into.
- Choose the Loading mode: Upsert-Merge, Append, or Overwrite.
- Overwrite: TRUNCATE existing records in the target Table and INSERT new records.
- Append: INSERT INTO the target Table, matching columns by name, appending new records and retaining existing ones.
- Upsert-Merge: offers two methods:
- Delete-Insert: deletes existing rows (when keys match between new and existing data) and inserts the new data. This method avoids duplication, creates an index for better performance, and is recommended for ongoing merge runs.
- Insert on Conflict: Uses INSERT INTO and ON CONFLICT clauses based on primary keys selected in the table mapping.
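The two Upsert-Merge methods can be sketched in plain SQL. This is an illustrative example, not the exact statements Data Integration generates; the `events` target table, `events_stg` staging table, and `event_id` key are hypothetical names:

```sql
-- Delete-Insert: remove target rows whose keys match the new batch, then insert it.
BEGIN;
DELETE FROM events
USING events_stg
WHERE events.event_id = events_stg.event_id;
INSERT INTO events (event_id, payload)
SELECT event_id, payload FROM events_stg;
COMMIT;

-- Insert on Conflict: requires a primary key or unique index on event_id.
INSERT INTO events (event_id, payload)
SELECT event_id, payload FROM events_stg
ON CONFLICT (event_id)
DO UPDATE SET payload = EXCLUDED.payload;
```

Delete-Insert works without a unique constraint on the target, while Insert on Conflict depends on the primary keys selected in the table mapping to detect conflicts.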
Loading unstructured data into PostgreSQL
PostgreSQL has powerful JSON extraction functions since version 9.4. To manage dynamic loading from unstructured data sources (such as MongoDB), any unstructured column in the mapping remains as a "JSON" data type in the target table. The table contains both structured data types (for example, varchar, integer, and date) and unstructured JSON data types (for example, arrays and records).
Example of handling JSON data in PostgreSQL:

```sql
-- The actions table has an action_calc JSON column with values such as:
-- {"name": "click", "amount": 4}
SELECT action_calc ->> 'name' AS action_name
FROM actions;
-- Result: 'click' (->> extracts the value as text; -> would return it as JSON)
```
Data Integration data load and table metadata handling
Data Integration manages table metadata updates along with data loading.
Staging table loading
- Create a table with a `tmp_` prefix and the correct metadata.
- Load data into the `tmp_` table using `aws_s3.table_import_from_s3` (for RDS/Aurora) or the COPY statement (for others).
- Flatten the data (if JSON type in the source) using JSON functions and casting with the `<column>::<type>` syntax.
- Auto-scale integer columns to the appropriate data type based on their data length (SMALLINT -> INTEGER -> BIGINT -> NUMERIC).
- Drop the old `tmp_` table with the `json_data` field (if applicable).
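The staging steps above can be sketched as follows. This is an illustrative outline under assumed names (`tmp_events`, `my-bucket`, the column list), not the exact statements Data Integration runs; `aws_s3.table_import_from_s3` requires the `aws_s3` extension available on RDS/Aurora PostgreSQL:

```sql
-- 1. Create the tmp_ staging table holding the raw JSON payload.
CREATE TABLE tmp_events (json_data JSON);

-- 2. Bulk-load the file from S3 (RDS/Aurora only; others use COPY).
SELECT aws_s3.table_import_from_s3(
    'tmp_events', '', '(FORMAT csv)',
    aws_commons.create_s3_uri('my-bucket', 'events/batch1.csv', 'us-east-1')
);

-- 3. Flatten the JSON into typed columns with extraction operators and ::casts.
CREATE TABLE tmp_events_flat AS
SELECT (json_data ->> 'event_id')::BIGINT  AS event_id,
       (json_data ->> 'name')              AS name,
       (json_data ->> 'amount')::INTEGER   AS amount
FROM tmp_events;

-- 4. Drop the old tmp_ table with the json_data field.
DROP TABLE tmp_events;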
Load from staging table to target table
You can load from the staging table in one transaction (leveraging PostgreSQL's ability to run DDL and DML inside one transaction).
- Define and update any metadata changes on the target table, excluding dropping columns.
- Start a transaction with a time limit.
- If the Target table does not exist, create it using a `CREATE TABLE ... AS SELECT` statement.
- Add, drop, or alter columns in the Target table.
- Create any primary key/index needed for data uniqueness (in Merge - insert-on-conflict) or for target merge performance (in Merge - Delete-Insert).
- Retain any key set by the user in the overwrite loading mode with the “purge keys” option enabled.
- Load data into the Target table using the specified loading mode.
- Commit the transaction.
- Drop the temporary tables used in the process.
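The single-transaction load above can be sketched as follows. This is a simplified example (Append mode, hypothetical `events` and `tmp_events_flat` tables), shown only to illustrate that PostgreSQL allows DDL and DML in one transaction:

```sql
BEGIN;
-- Metadata changes and the data load share one transaction.
ALTER TABLE events ADD COLUMN IF NOT EXISTS amount INTEGER;
CREATE UNIQUE INDEX IF NOT EXISTS events_pk ON events (event_id);
INSERT INTO events (event_id, name, amount)
SELECT event_id, name, amount FROM tmp_events_flat;
COMMIT;
-- Clean up the staging table after the commit.
DROP TABLE tmp_events_flat;
```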
Data Integration relies on PostgreSQL's default behavior of folding unquoted identifiers to lower case. Consistent case handling is fundamental for creating, updating, and managing schemas, tables, columns, views, and other database objects within Data Integration. Ensure your metadata does not contain names that differ only by case.
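For example, PostgreSQL treats unquoted identifiers that differ only by case as the same object, while quoted identifiers preserve case:

```sql
-- Unquoted identifiers fold to lower case, so these refer to the same table:
CREATE TABLE Events (id INT);   -- created as "events"
SELECT * FROM EVENTS;           -- resolves to "events"

-- A quoted identifier preserves case and would be a distinct object:
-- CREATE TABLE "Events" (id INT);  -- different table from "events"
```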