Or how wsrep_on can bring you to have a cluster with usless data.
This is a WARNING article, and it comes out after I have being working on define internal blueprint on how to perform DDL operation using RSU safely.
The fun, if fun we want to call it, comes as usual by the fact that I am a curious guy and I often do things my way and not always following the official instructions.
Anyhow, lets us go straight to the point and describe what can happen on ANY MySQL/Galera installation.
The environment
The test environment, MySQL/Galera (Percona PXC 5.6.20 version).
The cluster was based on three nodes local no geographic distribution, no other replication in place then Galera.
Haproxy on one application node, simple application writing in this table:
Table: tbtest1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
CREATE TABLE: CREATE TABLE 'tbtest1' ( 'auAPP1nc' bigint(11) NOT NULL AUTO_INCREMENT, 'a' int(11) NOT NULL, 'uuid' char(36) COLLATE utf8_bin NOT NULL, 'b' varchar(100) COLLATE utf8_bin NOT NULL, 'c' char(200) COLLATE utf8_bin NOT NULL, 'counter' bigint(20) DEFAULT NULL, 'time' timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, 'partitionid' int(11) NOT NULL DEFAULT '0', 'date' date NOT NULL, 'strrecordtype' char(3) COLLATE utf8_bin DEFAULT NULL, PRIMARY KEY ('auAPP1nc','partitionid'), KEY 'IDX_a' ('a'), KEY 'IDX_uuid' ('uuid') ) ENGINE=InnoDB AUTO_INCREMENT=482 DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
Small app
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
#! /bin/bash -- i=1 while : do echo "$i " mysql -ustress -pxxx -h192.168.0.35 -P 3307 -e "SET @HH=@@HOSTNAME; insert into test.tbtest1 (a,uuid,b,c,strrecordtype,date,partitionid)
values($i,UUID(),@HH,'a','APP1'|'APP2',now(),RAND()*100)"; i=$((i + 1)) if [ $i -eq 100 ] then break fi sleep 0.5; done |
Server Information
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
(root@localhost:pm) [(none)]>\s SHOW global STATUS LIKE 'wsrep_provider_version'; -------------- /home/mysql/templates/PCX-56/bin/mysql Ver 14.14 Distrib 5.6.20-68.0, FOR Linux (i686) USING EditLine wrapper Connection id: 90 Current DATABASE: Current user: root@localhost SSL: NOT IN USE Current pager: stdout USING OUTFILE: '' USING delimiter: ; Server version: 5.6.20-68.0-25.7-log Percona XtraDB Cluster BINARY (GPL) 5.6.20-25.7, Revision 886, wsrep_25.7.r4126 Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: utf8 Db characterset: utf8 Client characterset: utf8 Conn. characterset: utf8 UNIX socket: /home/mysql/instances/galera1-56/mysql.sock Uptime: 2 min 38 sec Threads: 3 Questions: 282 Slow queries: 0 Opens: 78 FLUSH TABLES: 3 Open TABLES: 8 Queries per second avg: 1.784 -------------- +------------------------+---------------+ | Variable_name | Value | +------------------------+---------------+ | wsrep_provider_version | 3.7(r7f44a18) | +------------------------+---------------+ 1 row IN SET (0.01 sec) |
Facts
In MySQL/Galera there is variable that allow us to say to the server to do not replicate. This variable is wsrep_on and when we set it as OFF the server will not replicate any statement to the other node.
This is quite useful when in the need to perform actions on an single node, like when you need to perform DDL on RSU mode.
But this flexibility can bite you quite badly.
I had done a simple small change to the widely use command:
SET wsrep_on=OFF;
I just add GLOBAL:
SET GLOBAL wsrep_on=OFF;
To be honest I was expecting to have the command rejected, but no it was accept and this is what happened:
I had run the small loop (see above) on two application servers, one pointing to HAProxy and writing APP1 in the field strrecordtype, the other pointing directly to the node where I will issue the command with wsrep_on inserting APP2.
The results:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
(root@localhost:pm) [test]>select @@HOSTNAME;select count(*) AS APP2_COUNTER FROM tbtest1 WHERE strrecordtype='APP2'; select count(*) AS APP1_COUNTER FROM tbtest1 WHERE strrecordtype='APP1'; +---------------+ | @@HOSTNAME | +---------------+ | tusacentral03 | +---------------+ 1 row IN SET (0.00 sec) +-------------+ |APP2_COUNTER | +-------------+ | 99 | +-------------+ 1 row IN SET (0.00 sec) +-------------+ |APP1_COUNTER | +-------------+ | 99 | +-------------+ 1 row IN SET (0.00 sec) (root@localhost:pm) [test]> (root@localhost:pm) [test]>SET GLOBAL wsrep_on=OFF; <------------- It should not be GLOBAL (root@localhost:pm) [test]>select @@HOSTNAME;select count(*) AS APP2_COUNTER FROM tbtest1 WHERE strrecordtype='APP2'; select count(*) AS APP1_COUNTER FROM tbtest1 WHERE strrecordtype='APP1'; +---------------+ | @@HOSTNAME | +---------------+ | tusacentral01 | +---------------+ 1 row IN SET (0.00 sec) +-------------+ |APP2_COUNTER | +-------------+ | 0 | +-------------+ 1 row IN SET (0.00 sec) +-------------+ |APP1_COUNTER | +-------------+ | 66 | <-------------------- 1/3 lost because HAProxy think that the node is ok ... +-------------+ 1 row IN SET (0.00 sec) |
As you can see in the tusacentral03 (which is the one where I issue SET GLOBAL wsrep_ON=OFF), I have ALL the records inserted in the local node and ALL the records coming from the others node.
But on the node tusacentral01, I had NO records related to APP2, but more relevant I had lost 1/3 of my total inserts.
Why?
Well this is quite clear, and unfortunately is by design.
If I issue wsrep_ON=OFF with GLOBAL the server will apply the setting to ALL sessions, meaning all session on that will STOP to replicate.
In the source code the section relevant to this is quite clear:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 |
#wsrep_mysqld.cc #line 1395 int wsrep_to_isolation_begin(THD *thd, char *db_, char *table_, const TABLE_LIST* table_list) { /* No isolation for applier or replaying threads. */ if (thd->wsrep_exec_mode == REPL_RECV) return 0; int ret= 0; mysql_mutex_lock(&thd->LOCK_wsrep_thd); if (thd->wsrep_conflict_state == MUST_ABORT) { WSREP_INFO("thread: %lu, %s has been aborted due to multi-master conflict", thd->thread_id, thd->query()); mysql_mutex_unlock(&thd->LOCK_wsrep_thd); return WSREP_TRX_FAIL; } mysql_mutex_unlock(&thd->LOCK_wsrep_thd); DBUG_ASSERT(thd->wsrep_exec_mode == LOCAL_STATE); DBUG_ASSERT(thd->wsrep_trx_meta.gtid.seqno == WSREP_SEQNO_UNDEFINED); if (thd->global_read_lock.can_acquire_protection()) { WSREP_DEBUG("Aborting APP1: Global Read-Lock (FTWRL) in place: %s %lu", thd->query(), thd->thread_id); return -1; } if (wsrep_debug && thd->mdl_context.has_locks()) { WSREP_DEBUG("thread holds MDL locks at TI begin: %s %lu", thd->query(), thd->thread_id); } /* It makes sense to set auto_increment_* to defaults in APP1 operations. Must be done before wsrep_APP1_begin() since Query_log_event encapsulating APP1 statement and auto inc variables for wsrep replication is constructed there. Variables are reset back in THD::reset_for_next_command() before processing of next command. */ if (wsrep_auto_increment_control) { thd->variables.auto_increment_offset = 1; thd->variables.auto_increment_increment = 1; } if (thd->variables.wsrep_on && thd->wsrep_exec_mode==LOCAL_STATE) <------- Here we have a check for wsrep_on { switch (wsrep_OSU_method_options) { case WSREP_OSU_APP1: ret = wsrep_APP1_begin(thd, db_, table_, table_list); break; case WSREP_OSU_APP2: ret = wsrep_APP2_begin(thd, db_, table_); break; } if (!ret) { thd->wsrep_exec_mode= TOTAL_ORDER; } } return ret; } enum wsrep_exec_mode { LOCAL_STATE, REPL_RECV, TOTAL_ORDER, LOCAL_COMMIT }; |
So what happen is that the server check if the thd object has that variable ON and has LOCAL_STATE, if so it replicates, if not it does nothing.
But as said while this makes sense in the SESSION scope, it does not in the GLOBAL.
Not only, setting wsrep_on to OFF in global scope does NOT trigger any further action from MySQL/Galera, like for instance the possible FACT that the node could be desynchronize from the remaining cluster.
The interesting effect of this is that HAProxy has NO WAY to know that the node had stop to replicate, and as such the server can receive the requests but those will not replicate to the other node causing data diversion.
You can say, that a DBA SHOULD know what he is doing, and as such he/her should be MANUALLY desync the node and then issue the command.
My point instead is that I don't see ANY good reason to have wsrep_on as global variable; instead I see this as a very dangerous and conceptually wrong "feature".
Browsing the Codership manual, I noticed that the wsrep_on variable comes with the "L" flag, meaning that the variable is NOT suppose to be GLOBAL.
But it is ...
I also had dig in the code and:
1 2 3 4 5 6 7 8 9 10 11 12 |
wsrep_var.cc #line58 bool wsrep_on_update (sys_var *self, THD* thd, enum_var_type var_type) { if (var_type == OPT_GLOBAL) { // FIXME: this variable probably should be changed only per session thd->variables.wsrep_on = global_system_variables.wsrep_on; } return false; } |
That is interesting isn't it?
Wondering when this comment was inserted and why it was ignored.
Anyhow the source of all problems is here in the wsrep_on variable definition:
1 2 3 4 5 6 7 |
static Sys_var_mybool Sys_wsrep_on ( "wsrep_on", "To enable wsrep replication ", SESSION_VAR(wsrep_on), <----------------------- This allow global CMD_LINE(OPT_ARG), DEFAULT(TRUE), NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(0), ON_UPDATE(wsrep_on_update)); |
The variable was defined as SESSION_VAR instead of SESSION_ONLY, and as such used also in global scope.
As already state, this is from my point of view a conceptual error not a bug, but something that should not exists at all, because in a cluster where I have data certify/replicate/synchronize there should NOT be any option for a DBA/user to bypass at GLOBAL level the data validation/replication process.
To note, and to make things worse, after I had done the test I can easily set wsrep_on back, and my node will continue to act as part of the cluster as if all the nodes are equal, while they are not.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
(root@localhost:pm) [test]>select @@HOSTNAME;select count(*) AS RSU_COUNTER FROM tbtest1
WHERE strrecordtype='RSU';
select count(*) AS TOI_COUNTER FROM tbtest1 WHERE strrecordtype='TOI'; +---------------+ | @@HOSTNAME | +---------------+ | tusacentral03 | +---------------+ 1 row IN SET (0.00 sec) +-------------+ | RSU_COUNTER | +-------------+ | 181 | +-------------+ 1 row IN SET (0.00 sec) +-------------+ | TOI_COUNTER | +-------------+ | 177 | +-------------+ 1 row IN SET (0.00 sec) +---------------+ | @@HOSTNAME | +---------------+ | tusacentral01 | +---------------+ 1 row IN SET (0.00 sec) +-------------+ | RSU_COUNTER | +-------------+ | 77 | +-------------+ 1 row IN SET (0.00 sec) +-------------+ | TOI_COUNTER | +-------------+ | 139 | +-------------+ |
As you can see the cluster continue to insert data using HAProxy and all the node, but it has a data set that is inconsistent.
Conclusions
- Never use SET GLOBAL with wsrep_on
- IF you are so crazy to do so, be sure no one is writing on the node.
- I am sure this is a mistake in the logic and as such this variable should be change from the source, in the code defining the variable SESSION_ONLY and not SESSION_VAR
Or wsrep_on can damage you quite badly.