Skip to main content
Feedback

MySQL CDC configuration

Prerequisites

  • MySQL 5.6+ or MySQL 8.0+.
  • MySQL server instance running.
  • Permission to 'GRANT' and 'CREATE USER' in MySQL.

If you want to use CDC from a Read Replica, you must enable GTID.

Binlog vs. GTID

MySQL replication is a process that automatically copies data from one MySQL database server (the Master) to one or more MySQL database servers (the Slaves).

You can track these changes in MySQL in two ways:

  • Binlog
  • GTID

Binlog

The binary log is a collection of log files that contain information about the data changes made to a MySQL server instance. Starting the server with the --log-bin option enables the log.

MySQL 3.23.14 introduced the binary log. It contains all data-updating statements. Unless you use row-based logging, it also includes statements that could have potentially updated it (for example, a DELETE that matched no rows). Statements are saved as "Events" that describe the modifications. The binary log also includes information on how long each statement that updates data. Other metadata in the binary log include:

  • Information about the server's current state that is required to reproduce statements correctly.
  • Error messages.
  • Metadata required for binary log maintenance (for example, Rotate Events).

GTID

Global transaction identifiers (GTIDs) are unique identifiers generated for committed transactions in MySQL. GTIDs can be used to troubleshoot Binlog replication.

GTIDs are unique across all DB instances in a replication configuration. GTIDs make replication configuration easier as they eliminate the need to refer to log file positions. GTIDs also simplify tracking replicated transactions and determining whether the source instance and replicas are consistent.

GTIDs are unique identifiers for transactions that occur on a server within a cluster. Using GTIDs streamlines replication and ensures consistency between the primary and replica servers.

Using Binlog or GTID

When you have multiple instances (for example, a cluster replica), use GTID; otherwise, use Binlog.

Configuring MySQL CDC

Configuring CDC on a self-managed MySQL server

You must have access to the server's configuration files if the MySQL database is hosted on a self-managed MySQL server.

Check if the server's 'binlog' is enabled.

Run the following SQL statement to see if the 'binlog' is enabled in the MySQL server:

SELECT variable_value as "BINARY LOGGING STATUS (log-bin)::";
FROM information_schema.global_variables WHERE variable_name='log_bin';

If the outcome is 'OFF,' make the following changes to your MySQL configuration file (my.cnf):

  • Set binlog_format to ROW
  • Set binlog_row_image to FULL
  • Set expire_logs_days to 10
  • Set log_bin to mysql-bin
  • Make sure your server has enough storage to keep your database binlog for ten days.

Configuring CDC on Amazon RDS/Aurora MySQL

If you are using Amazon RDS, configure these properties through the RDS Console.

Creating a parameter group

Procedure

  1. Log in to the Amazon RDS console.
  2. Go to Parameter Groups and then Create Parameter Group.
  3. Create a parameter group to use with the database:
    • Under Parameter group family , select the relevant MySQL version.
    • Click Create after selecting a suitable Group Name and Description.
  4. Set the relevant parameters:
    • Choose a new group from the drop-down menu.
    • Click on Edit Parameters.
  5. Set the following relevant parameters in the parameter group:
    1. Set binlog_format to ROW.
    2. Set binlog_row_image to Full.
  6. Make sure you have enabled automated backups with at least 1 day retention.

Set the duration of your 'binlog' retention period

Data Integration suggests a 10-day 'binlog' retention period. Due to out-of-sync scenarios, you must limit the amount of re-sync (migration) required from the database.

According to the Amazon RDS documentation, the default strategy is to remove the log as quickly as possible so that you can increase the retention period from the database's default setting.

To check the current state of the bin log retention period:

  1. Connect to your MySQL using Workbench.
  2. Run the following command:
mysql> call mysql.rds_show_configuration;
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| name | value | description |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| binlog retention hours | NULL | binlog retention hours specifies the duration in hours before binary logs are automatically deleted. |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)

Query OK, 0 rows affected (0.06 sec)

If the 'binlog retention hours' parameter is NULL or 0, the database erases the log as quickly as possible. Run the following command to increase this configuration value:

call mysql.rds_set_configuration('binlog retention hours', 168);
caution

Increasing binary log retention hours requires more disk space in your source Amazon RDS MySQL database, as the number of log files saved at one time increases.

After establishing the retention period, check the DB instance's storage usage to make sure the binary logs you want to keep are not taking up too much space.

Creating a user for replication

Create a user for CDC permissions

To enable CDC to read from MySQL, you must create a user with specific replication permissions. These permissions ensure the CDC process can read the binary log and track changes.

  • Log in to MySQL as an administrative user.
  • Create a new user (or modify an existing one) for CDC. Run the following SQL commands: CREATE USER 'cdc_user'@'%' IDENTIFIED BY 'password'; GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON . TO 'cdc_user'@'%';

This grants the user the necessary permissions to:

  • REPLICATION SLAVE: Read the binary logs on the replica.
  • REPLICATION CLIENT: Execute commands related to replication.

SELECT: Read data from the database tables monitored by CDC.

Flush privileges to apply the changes:

FLUSH PRIVILEGES;

Configuring CDC from a MySQL replica

Configure Change Data Capture (CDC) from a MySQL Read Replica, when using Global Transaction Identifiers (GTID) for tracking transactions. GTIDs help maintain transactional consistency and ensure CDC can capture changes accurately and in the correct order.

Prerequisites

note

To enable MySQL GTIDs for Amazon RDS or Aurora MySQL, refer to the Amazon documentation for instructions on enabling this feature in your Amazon Aurora MySQL database.

GTID permissions

GTID is a critical feature for ensuring CDC reads data changes correctly and efficiently from a MySQL Read Replica.

Here are the key reasons why GTID is essential:

  • Transaction Consistency: GTIDs ensure that each transaction has a unique identifier. This enables the CDC to capture and process the exact sequence of changes made to the database, thereby avoiding issues such as data loss or duplication.
  • Improved Failover and Resilience: In the event of a failure or restart of the CDC process, GTIDs enable the process to resume from the exact transaction it last processed, preventing data inconsistency during recovery.

Configuring GTID-based CDC from a MySQL replica

Procedure

1. Verifying GTID configuration on the replica

GTID-based replication must be enabled on your MySQL replica. Check the current GTID settings and ensure the following options are enabled:

  • gtid_mode: This must be set to ON to enable GTID replication.
  • enforce_gtid_consistency: This should also be set to ON to ensure consistency of GTIDs across all transactions.

Run the following commands to verify these settings:

SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';

If either of these settings is not enabled, modify the MySQL configuration (my.cnf or my.ini) and restart the replica server.

2. Creating a user for CDC with GTID permissions

To enable CDC to read from the MySQL Read Replica, you must create a user with specific replication permissions. These permissions ensure the CDC process can read the binary log and track changes via GTIDs.

  1. Log in to MySQL as an Administrative user.
  2. Create a new user (or modify an existing one) for CDC. Run the following SQL commands:
CREATE USER 'cdc_user'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'cdc_user'@'%';

This grants the user the necessary permissions to:

  • REPLICATION SLAVE: Read the binary logs on the replica.
  • REPLICATION CLIENT: Execute commands related to replication.
  • SELECT: Read data from the database tables monitored by CDC.
  1. Flush privileges to apply the changes:
FLUSH PRIVILEGES;

3. Enable GTID-based replication

Ensure that GTID replication is enabled on the MySQL replica. You must configure the following settings:

  • gtid_mode : ON
  • enforce_gtid_consistency : ON

These settings can be adjusted in the MySQL configuration file (my.cnf or my.ini) and the server must be restarted for changes to take effect.

gtid_mode = ON enforce_gtid_consistency = ON

4. Verifying the CDC process

Once the user is configured and GTID replication is enabled, monitor the CDC process to ensure it is reading and capturing changes from the replica without issues.

  • Check the CDC logs to confirm that the CDC process is reading the binary logs in the correct sequence and processing transactions accurately.
  • To verify the GTID value, execute the following command on the MySQL Server:
PL/SQL

____

SHOW MASTER STATUS;

Here's an example of the expected output:

`

Limitations

The MySQL CDC does not support transactions that include SAVEPOINT, ROLLBACK TO SAVEPOINT, or RELEASE SAVEPOINT statements.

  • Any transaction containing SAVEPOINT operations is skipped, resulting in permanent data loss downstream.
  • Frameworks such as Ruby on Rails ActiveRecord generate nested transactions using SAVEPOINT automatically. This may cause unexpected data loss if used with MySQL CDC.

Example

SAVEPOINT active_record_1;  
ROLLBACK TO SAVEPOINT active_record_1;
RELEASE SAVEPOINT active_record_1;

Recommendation

Avoid using nested transactions, for example, SAVEPOINT operations, in source applications when CDC is enabled.

Additional considerations

  • Performance: Enabling GTID-based CDC may affect performance on high-throughput systems. Ensure that you test the configuration in a staging environment before deploying it to production systems.

  • Storage: Storing binary logs for extended periods (as required by CDC) consumes additional disk space. Regularly monitor the disk usage to ensure that the system has sufficient resources.

  • Replication consistency: Ensure the replica is configured correctly and synced with the Master to prevent replication lag or inconsistencies during CDC reads.

On this Page