aurora ddl notesLast quarter of 2021 AWS released Aurora version 3. This new version aligns Aurora with the latest MySQL 8 version porting many of the advantages MySQL 8 has over previous versions.

While this brings a lot of new interesting features for Aurora, what we are going to cover here is to see how DDLs behave when using the ONLINE option. With a quick comparison with what happens in MySQL 8 standard and with Group Replication. 

Tests

All tests were run on an Aurora instance r6g.large with secondary availability zone.
The test was composed by:

        4 connections

    • #1 to perform ddl
    • #2 to perform insert data in the table I am altering
    • #3 to perform insert data on a different table 
    • #4 checking the other node operations

In the Aurora instance, a sysbench schema with 10 tables and 5 million rows was created, just to get a bit of traffic. While the test table with 5ml rows as well was:

CREATE TABLE `windmills_test` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `uuid` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `millid` smallint NOT NULL,
  `kwatts_s` int NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `active` tinyint NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`,`active`),
  KEY `IDX_active` (`id`,`active`),
  KEY `kuuid_x` (`uuid`),
  KEY `millid_x` (`millid`),
  KEY `active_x` (`active`),
  KEY `idx_1` (`uuid`,`active`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC

The executed commands:

Connection 1:
    ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
    ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;
    
Connection 2:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connection 3:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills3  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connections 4:
     while [ 1 = 1 ];do echo "$(date +'%T.%3N')";mysql --defaults-file=./my.cnf -h <secondary aurora instance> -D windmills_large -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_large)"|grep -i -v localhost;sleep 1;done
     

Operations:
1) start inserts from connections
2) start commands in connections 4 - 5 on the other nodes
3) execute: DC1-1(root@localhost) [windmills_large]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;

With this, what I was looking to capture is the operation impact in doing a common action as creating an Index. My desired expectation is to have no impact when doing operations that are declared “ONLINE” such as creating an index, as well as data consistency between nodes. 

Let us see what happened…

Results

While running the insert in the same table, performing the alter:

mysql>  ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (16.51 sec)
Records: 0  Duplicates: 0  Warnings: 0

Is NOT stopping the operation in the same table or any other table in the Aurora instance.

We can only identify a minimal performance impact:

[root@ip-10-0-0-11 tmp]# while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done
.347
.283
.278
.297
.291
.317
.686  ← start
<Snip>
.512  ← end
.278
.284
.279

The secondary node is not affected at all, and this is because Aurora managed at storage level the data replication. Given that there is no such thing as Apply from Relaylog, as we have in standard MySQL asynchronous or data replicated with Group Replication.  

The result is that in Aurora 3, we can have zero impact index (or any other ONLINE/INSTANT) operation, with this I include the data replicated in the other instances for High Availability. 

If we compare this with Group replication (see blog):

                               	 	 	  GR         Aurora 3
Time on hold for insert for altering table   	~0.217 sec   ~0.523 sec
Time on hold for insert for another table   	~0.211 sec   ~0.205 sec 

However, keep in mind that MySQL with Group Replication will still need to apply the data on the Secondaries. This means that if your alter was taking 10 hours to build the index, the Secondary nodes will be misaligned with the Source for approximately another 10 hours. 

With Aurora 3 or with PXC, changes will be there when Source has completed the operation.    

What about Percona XtraDB Cluster (PXC)? Well, with PXC we have a different scenario:

                               	 	 	 PXC(NBO)     Aurora 3
Time on hold for insert for altering table   	~120 sec      ~0.523 sec
Time on hold for insert for another table   	~25  sec      ~0.205 sec

We will have a higher impact while doing the Alter operation, but the data will be on all nodes at the same time maintaining a high level of consistency in the cluster. 

Conclusion

Aurora is not for all use, and not for all budgets, however it has some very good aspects like the one we have just seen. The Difference between standard MySQL and Aurora is not in the time of holding/locking (aka operation impact), but on the HA aspects. If I have my data/structure on all my Secondary at the same time of the Source, I will feel much more comfortable, than having to wait an additional T time. 

This is why PXC in that case is a better alternative if you can afford locking time, if not, well Aurora 3 is your solution, just do your math properly and be conservative with the instance resources.