Sub title: When you do stupid things,  you get stupid things back, but sometime is good to discover odd behaviour.

 

The other day, I was having fun playing with the latest toy, mysql-cluster-gpl-7.3.2.

 

I was setting up a new cluster set, of 6 data nodes and 3 Mysql node, to do so I was using a simple bash file that helps me to generate what I need on my environment.

During that I had a very stupid idea, to have the script to recognize the machines IPs and PORT I have set, then use them as SERVER IDs.

 

I know I know is not a good way to define SERVER ID, anyhow I did it, so the result was that in my my.cnf my server_id looks like:

server-id = 1921680355510

 

Some of you already see what kind of stupid thing this is, but let me not anticipate it.

 

So as usual I start the mysql node first to generate a brand new database, to do so I have first disable the ndbcluster and all went fine.

 

Then I start my management nodes and my data nodes.

 

I was quite happy and I finally attach my MySQL node to the rest of the cluster.

But Nodes failed to starts, I was quite surprise so I open the log and is saw:

 

2013-10-28 10:20:58 2692 [Note] NDB[3]: NodeID: 77, all storage nodes connected
2013-10-28 10:20:58 2692 [ERROR] NDB: server id provided is too large to be represented in opt_server_id_bits or is reserved
mysqld: /usr/src/mysql-cluster-gpl-7.3.2/sql/ndb_component.cc:139: virtual int Ndb_component::deinit(): Assertion 'm_thread_state == TS_STOPPED' failed.
14:20:58 UTC - mysqld got signal 6 ;
 

 

 

What??? My fantastic server_id of 13!!!!! digit is too long?

 

Hey wait a minute?! how many digit??

 

At that point I just hit the keyboard with my head.. realizing what I have done.

headdesk

Server id is an unsigned integer, his max value is 4294967295 with 10 digit.

Once more my friend NDB was right and I was trying to do a stupid thing.

 

But when I realize it, I also wonder, why when activating ndbcluster I get the error and when I have MySQL running without it NO?

 

This kind of checking should be done at common level at MySQL start-up, so why I get it only with MySQL and NDB active?

I dig a little bit and I found the place where NDB is doing the check.

The if condition where the check is executed is quite simple and clear.

 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
....
sql::ha_ndbcluster_binlog.cc
 
int ndbcluster_binlog_start()
{
  DBUG_ENTER("ndbcluster_binlog_start");
 
  if (::server_id == 0)
  {
    sql_print_warning("NDB: server id set to zero - changes logged to "
                      "bin log with server id zero will be logged with "
                      "another server id by slave mysqlds");
  }
 
  /* 
     Check that ServerId is not using the reserved bit or bits reserved
     for application use
  */
  if ((::server_id & 0x1 << 31) ||                             // Reserved bit <--------------------- Check
      !ndbcluster_anyvalue_is_serverid_in_range(::server_id))  // server_id_bits
  {
    sql_print_error("NDB: server id provided is too large to be represented in "
                    "opt_server_id_bits or is reserved");
    DBUG_RETURN(-1);
  }

 

 

But again why the error is not reported in MySQL core?

 

Because MySQL is silently converting my server id to his maximum value, as such my stupid server id of 1921680355510 become 4294967295.

 

Oh my...!

An identifier is managed as a quantity!

 

I have done a very stupid thing setting the server_id to , but honestly I don't like this as well.

 

As we know MySQL silently adjust his behaviour in relation to the SQL_MODE, but the way it adjust the parameters should be something different.

I agree that variables like max_connections CAN be adjusted, but others like server_id, well no, I think is wrong given those are identifier(s) and not quantity.

 

Server ID is an ID, as such it must be validated and if not matching the validation an ERROR must be return and the server should not start at all.

 

Instead MySQL currently is just raising a WARNING and continue to run, not caring if there will be another server with the same id because this "adjustment".

 

131028 14:00:58 mysqld_safe Starting mysqld daemon with databases from /home/mysql/instances/csite3/data
2013-10-28 14:00:58 0 [Warning] option 'server_id': unsigned value 192168075510 adjusted to 4294967295 <----------------------------------------------

I was really "not happy" about this, so I went into the code and check where the server ID is evaluated and modified.
So this is the action calls:

 

mysqld::mysqld_main
	=>mysqld::init_common_variables
		=> my_getopt::init_variables(longopts, init_one_value);
			=> my_getopt::init_one_value
				=> my_getopt::getopt_ll_limit_value

 

So what exactly is happening here?

 

MySQL initialize reading the options in my_getopt and on the base of the data type it pass it to getopt_ll_limit_value.

There whatever numeric argument  is pass, if it exceed the max value is reset to his Max value. At this stage there is no logic to manage values like Server ID, that is an identifier and not quantity.

This is obviously something that can be managed better, and actually the fact that NDB prevent the server to go ahead is by far more correct, this because Server Id is it used internally to recognize the server.

 

I have done a quick check, and I have see that at least the following functions are using the server_id variable, in the new replication mechanism.

 

rpl_master.cc
int register_slave(THD* thd, uchar* packet, uint packet_length)
void unregister_slave(THD* thd, bool only_mine, bool need_lock_slave_list)
bool show_slave_hosts(THD* thd)
static int fake_rotate_event(NET* net, String* packet, char* log_file_name,ulonglong position, const char** errmsg,uint8 checksum_alg_arg)
static int send_heartbeat_event(NET* net, String* packet,const struct event_coordinates *coord,uint8 checksum_alg_arg)
bool com_binlog_dump(THD *thd, char *packet, uint packet_length)
bool com_binlog_dump_gtid(THD *thd, char *packet, uint packet_length)
void mysql_binlog_send(THD* thd, char* log_ident, my_off_t pos,const Gtid_set* slave_gtid_executed)

 

I have not investigate more, to see what will happen if there are two different servers having the same ID, I have not done it, because in any case it is conceptually wrong.

 

Just for fun I have create an horrible patch to manage the mismatch, adding a control on the parameter name.

 

Obviously that should be something different like an attribute of the my_option object that indicate if the option is a quantity an identifier or whatever else, then on that base decide what to do.

Any how also a stupid and horrible things like the check on the name is better then not having it at all:

 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
ulonglong getopt_ull_limit_value(ulonglong num, const struct my_option *optp,
                                 my_bool *fix)
{
  my_bool adjusted= FALSE;
  ulonglong old= num;
  char buf1[255], buf2[255];
  const ulonglong max_of_type=
    max_of_int_range(optp->var_type & GET_TYPE_MASK);
 
	char string1[255], string2[255];
	int result;
	string1=optp->name;
	string2="server_id";
	scanf("%s",string1[10]);
	scanf("%s",string2[10]);
	int chlimit = 255;
	result = strncmp(string1,string2,chlimit);
 
 
 
  if ((ulonglong) num > (ulonglong) optp->max_value &&
      optp->max_value) /* if max value is not set -> no upper limit */
  {
    num= (ulonglong) optp->max_value;
    adjusted= TRUE;
    if(result == 0){
           my_getopt_error_reporter(ERROR_LEVEL,
           "option '%s': unsigned value %s CANNOT BE adjusted to %s please review your settings, and start MySQL again",
           optp->name, llstr(old, buf1), llstr(num, buf2));
           exit(1);
 
    }
  }
  if (num > max_of_type)
  {
    num= max_of_type;
    adjusted= TRUE;
  }
 

 

Running the server with the fix, I will finally had a consisten behaviour, and my favorite MySQL server doing the right thing:

 

131029 13:38:03 mysqld_safe mysqld from pid file /home/mysql/instances/csite3/mysql.pid ended
131029 13:41:11 mysqld_safe Starting mysqld daemon with databases from /home/mysql/instances/csite3/data
2013-10-29 13:41:11 0 [ERROR] 'option server_id' unsigned value 192168075510 CANNOT BE adjusted to 4294967295 please review your settings, and start MySQL again
131029 13:41:12 mysqld_safe mysqld from pid file /home/mysql/instances/csite3/mysql.pid ended

 

 

 

 

Conclusion

From a stupid action it could come out something interesting, at least for me.

I would like to classify this as bug, but given it is a conceptual thing may be is more a what? Feature request? Logic fix?

 

Finally my conclusion is that MySQL is a great product, but is growing fast and sometimes things are left behind, this is a small but possible dangerous one.