Configuring PostgreSQL CDC replication slots and publications

Replication is a key feature in database management systems that enables the synchronization of data between multiple database instances. It is commonly used for high availability, disaster recovery, and data distribution. In some replication setups, publication slots are used to control the flow of changes between the Master and replica databases.

note

Replication slot and publication names in PostgreSQL are case-sensitive. Having superuser privileges is essential for accessing data from a replication slot. On Amazon RDS, this requires full superuser access, while on Google Cloud Platform you must have the cloudsqlsuperuser role to obtain the necessary REPLICATION permission.
To create a Publication, make sure to have superuser (or cloudsqlsuperuser) privileges, which grants you the required CREATE permission.

Replication slots configuration

Activating replication slots enables efficient and reliable data replication in PostgreSQL. By authorizing the Data Integration user with the REPLICATION role, you enable the necessary permissions for managing replication slots. These slots play a crucial role in the replication process by reserving space on the server to store replicated data before it is consumed. Proper configuration of replication slots is crucial for maintaining this setup.

To configure replication slots, the Data Integration user must be granted the REPLICATION role. If the server does not have a REPLICATION role, it can be created using the provided SQL commands in your PostgreSQL:

CREATE ROLE `<replication_role_name>` REPLICATION LOGIN;

Ensure to replace <replication_role_name> with a role name of your preference.

Once the role has been created, execute the following SQL command to create a new login for Data Integration Log-Based extraction if it doesn't already exist:

CREATE USER data_integration WITH `<replication_role_name>`;

In case the Data Integration user already exists, use the following SQL command to assign the new role to the user:

ALTER USER data_integration WITH `<replication_role_name>`;

Configure replica identity to each non-keyed or unique table

To ensure that the previous values of a table are retrieved from the Write-Ahead Log (WAL) in Postgresql during DELETE events, it is necessary to configure the replication identity of the tables. Postgres offers 4 different types of REPLICA IDENTITY that dictate how events are written to the logical replication stream:

DEFAULT: This setting includes the previous values of a table's primary key columns only in UPDATE and DELETE events. In the case of UPDATE, it includes only the primary columns with changed values.
NOTHING: With this setting, the UPDATE and DELETE events do not contain any information about the previous values of any table column.
FULL: The previous values of all columns in the table are stored in the UPDATE and DELETE events when this setting is used.
INDEX: UPDATE and DELETE events include the previous values of the columns defined in the index named "index name". However, in the case of UPDATE, only the indexed columns with changed values are included.

For additional details regarding DELETE events, refer to our Databases overview page.

Choose the appropriate replica identity option based on your requirements. For non-keyed or unique tables, the FULL option is often used to ensure accurate replication:

ALTER TABLE `<schema>`.`<table>` REPLICA IDENTITY FULL;

Replace <**schema**> and <**table**> with the actual names of the schema and table you want to modify.

Publication slots configuration

Publication slots configuration involves using the Create Replication Slot command in PostgreSQL to establish a slot dedicated to publishing data. This command serves the purpose of reserving a specific space within the primary database, exclusively intended for the designated publication slot. By creating such a slot, users gain the ability to efficiently and securely publish data to the primary database while ensuring seamless replication and synchronization with other databases.

The basic syntax to create a publication slot is as follows:

SELECT pg_create_logical_replication_slot('slot_name', 'plugin');

Replace 'slot_name' with a unique name for the publication slot, and 'plugin' with the name of the logical decoding plugin you intend to use for replication.

Custom configurations

Once you have granted the appropriate permissions, Data Integration will automatically generate replication slots and publications in a Postgresql database as part of its default configuration. However, you have the option to create dedicated replication slots or publications for specific Rivers according to your preferences.

Custom replication slots

With Data Integration, you have the flexibility to set a custom Replication slot name. This name refers to the PostgreSQL logical decoding slot that is generated to stream changes from a particular plug-in for a specific database/schema.

note

If you do not provide a replication slot, a default one will be automatically created: Data_Integration_<account id>_<river id>_source.

Run this to view the replication slots and the associated data within them:

SELECT * FROM pg_replication_slots;

To create a replication, perform the following commands in your database:

Access the PostgreSQL client by logging in with superuser credentials.

Execute the following command:

SELECT pg_create_logical_replication_slot('data_integration_pgoutput', 'pgoutput');

Assign the replication slot name to the River.

Custom publication slots

You can create a publication for specific tables in your database, where only changes from those tables will be replicated by Data Integration. To perform this action, you will need to have the CREATE privileges.

note

If a specific name is not provided, the default name for the following publication will be created: data_integration_<account id>_<river id>_source.

In such cases, Data Integration automatically executes the following command during the initial run and creates a publication for ALL TABLES:

 CREATE PUBLICATION data_integration_<account id>_<river id>_source FOR ALL TABLES;

To create a publication for specific tables within your database, perform the following commands:

Create publication for your tables.

CREATE PUBLICATION data_integration_publication FOR TABLE movies, films, actors;

To remove or update the list of tables under a publication:

ALTER PUBLICATION data_integration_publication ADD/DROP TABLE my_table;

Optionally, you can also choose to create a publication for specific operations on particular tables:

CREATE PUBLICATION data_integration_insert_only FOR TABLE movies WITH (publish = 'INSERT');

Ensure that your tables are included in the publication.
```
SELECT * FROM pg_publication_tables;
```
Once you have created your publication, assign its name to the River.

To ensure the accurate insertion of the publication, execute the following command:

SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
FROM pg_publication
WHERE pubname = ‘data_integration_insert_only’;

Partitioned tables in PostgreSQL

Data Integration supports PostgreSQL partitioned tables in both Standard Extraction and Change Data Capture (CDC).