Skip to main content
Feedback

Configuring PostgreSQL CDC replication slots and publications

Replication is a key feature in database management systems that synchronizes data between multiple database instances. Database teams use replication for high availability, disaster recovery, and data distribution. In some replication setups, publication slots allow you to control the flow of changes between the primary and replica databases.

note
  • Replication slot and publication names in PostgreSQL are case-sensitive. To access data from a replication slot, you must have superuser privileges. On Amazon RDS, this requires full superuser access. On Google Cloud Platform, you must have the cloudsqlsuperuser role to retrieve the necessary REPLICATION permission.
  • To create a publication, ensure you have superuser (or cloudsqlsuperuser) privileges, which grants you the required CREATE permission.

Replication slots configuration

Replication slots enable data replication in PostgreSQL by reserving space on the server to store replicated data before it is consumed. To configure replication slots, you must grant the Data Integration user the REPLICATION role.

If the server does not have a REPLICATION role, you can create one using the following SQL commands:

CREATE ROLE `<replication_role_name>` REPLICATION LOGIN;

Replace <replication_role_name> with a role name of your choice.

Once the role has been created, run the following SQL command to create a new login for Data Integration Log-Based extraction if it does not already exist:

CREATE USER data_integration WITH `<replication_role_name>`;

If the Data Integration user already exists, run the following SQL command to assign the new role:

ALTER USER data_integration WITH `<replication_role_name>`;

Configure replica identity for selected tables

Each table's REPLICA IDENTITY controls what previous-row data PostgreSQL writes to the WAL for UPDATE and DELETE events. This setting determines whether Data Integration can replicate the table correctly.

SettingBehavior
DEFAULTLogs primary key columns only. Default for every table.
INDEXLogs columns of a specified unique index.
FULLLogs every column of the previous row.
NOTHINGLogs nothing. UPDATE and DELETE events will not replicate.

For more information on DELETE events, refer to Databases overview.

When FULL is required

Data Integration requires REPLICA IDENTITY FULL on a selected table when either condition is true:

  • The table has TOAST storage. PostgreSQL stores oversized values (large TEXT, JSONB, BYTEA) out of line. When an UPDATE does not modify a TOASTed column, PostgreSQL omits its value from the WAL. Without FULL, that value cannot be recovered and the column is written as NULL at the destination, overwriting real data.
  • The table has no primary key or unique index. Without an identifying key in the WAL, UPDATE and DELETE events cannot be matched to a target row.

Tables with a primary key and no TOAST-eligible columns can remain on DEFAULT. Set FULL only where required.

Identify which tables need FULL

Run the following query to check a table's TOAST storage and replica identity setting:

SELECT relname,
reltoastrelid, -- non-zero means TOAST is linked
relreplident -- 'd'=Default, 'f'=Full, 'i'=Index, 'n'=Nothing
FROM pg_class
WHERE relname = '<table_name>';

A selected table must have relreplident = 'f' if reltoastrelid is non-zero or it has no primary key or unique index.

Apply the setting

ALTER TABLE <schema>.<table> REPLICA IDENTITY FULL;

What happens if you skip it

When a Data Flow is activated, Data Integration validates replica identity on every selected table. Activation fails for any qualifying table not set to FULL, with the offending table named in the error.

This prevents silent data loss. Without FULL, an UPDATE to a non-TOAST column on a row with a TOASTed column produces a WAL event missing the TOAST value. The destination row is then written with NULL in place of real data, with no error to indicate the loss.

Performance considerations

FULL increases WAL volume because the entire previous row is logged on every UPDATE and DELETE. On high-throughput tables, this can increase WAL size and replication lag. Apply FULL only to tables where it is technically required, not globally.

Publication slots configuration

Use the Create Replication Slot command in PostgreSQL to establish a slot dedicated to publishing data. This command reserves space in the primary database for the publication slot, enabling you to publish data and keep other databases in sync.

The basic syntax to create a publication slot is as follows:

SELECT pg_create_logical_replication_slot('slot_name', 'plugin');

Replace 'slot_name' with a unique name for the publication slot, and 'plugin' with the name of the logical decoding plugin to use for replication.

Custom configurations

Once you have granted the appropriate permissions, Data Integration automatically generates replication slots and publications in a PostgreSQL database as part of its default configuration. You can also create dedicated replication slots or publications for specific Data Flows.

Custom replication slots

With Data Integration, you can set a custom replication slot name. This name identifies the PostgreSQL logical decoding slot that streams changes from a particular plug-in for a specific database or schema.

note

If you do not provide a replication slot, a default one will be automatically created: Data_Integration_<account id>_<{conKeyRefs.DataFlow} id>_source.

Run the following command to view the replication slots and their associated data:

SELECT * FROM pg_replication_slots;

To create a replication slot:

  1. Access the PostgreSQL client by logging in with superuser credentials.
  2. Run the following command:
    SELECT pg_create_logical_replication_slot('data_integration_pgoutput', 'pgoutput');
  3. Assign the replication slot name to the Data Flow.

Custom publication slots

You can create a publication for specific tables so that Data Integration replicates only changes from those tables. You must have CREATE privileges to perform this action.

note

If a specific name is not provided, the default name for the following publication will be created: data_integration_<account id>_<{conKeyRefs.DataFlow} id>_source.

In such cases, Data Integration automatically executes the following command during the initial run and creates a publication for ALL TABLES:

 CREATE PUBLICATION data_integration_<account id>_<{conKeyRefs.DataFlow} id>_source FOR ALL TABLES;

To create a publication for specific tables:

  1. Log in to a PostgreSQL client as a superuser.
  2. Create publication for your tables.
    CREATE PUBLICATION data_integration_publication FOR TABLE movies, films, actors;
    To remove or update the list of tables under a publication:
    ALTER PUBLICATION data_integration_publication ADD/DROP TABLE my_table;
    Optionally, you can create a publication for specific operations on particular tables:
    CREATE PUBLICATION data_integration_insert_only FOR TABLE movies WITH (publish = 'INSERT');
  3. Ensure that your tables appear in the publication.
    SELECT * FROM pg_publication_tables;
  4. Once you have created your publication, assign its name to the Data Flow.
  5. To verify the publication is configured correctly, run the following command:
    SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
    FROM pg_publication
    WHERE pubname = ‘data_integration_insert_only’;

Partitioned tables in PostgreSQL

Data Integration supports PostgreSQL partitioned tables in both Standard Extraction and Change Data Capture (CDC).

On this Page