My MySQL tips

Binary log and Transaction cache in MySQL 5.1 & 5.5

The Binary Log

The binary log in MySQL has two main declared purpose, replication and PTR (point in time recovery), as declared in the MySQL manual. In the MySQL binary log are stored all that statements that cause a change in the database. In short statements like DDL such as ALTER, CREATE, and so on, and DML like INSERT, UPDATE and DELETE.

As said binary log transform multi thread concurrent data/structure modification in to a serialize steps of activity. To handle this, a lock for the binary log—the LOCK_log mutex—is acquired just before the event is written to the binary log and released just after the event has been written. Because all session threads for the server log statements to the binary log, it is quite common

for several session threads to block on this lock.

It is not in the scope of this discussion to provide a detailed description of the binary log, which could be found at http://dev.mysql.com/doc/refman/5.1/en/binary-log.html.

What it is relevant is that MySQL write "a" log and read "a" log, serializing what is happening in the MASTER in a multi thread scenario. Also relevant is that MySQL has to deal with different behavior in relation to TRANSACTIONAL and NON-TRANSACTIONAL storage engine, trying to behave consistently.

In order to do that MySQL had changes his behavior while evolving, so it changes the way he was and is using the TRANSACTION CACHE also creating not few confusion.

Transaction Cache

What is the Transaction Cache or TC? It is a "per thread" cache used by MySQL to store the statements that will be needed to be written in the binary log.

This cache is active ONLY when the Binary log is active, otherwise it will not be used, and, as it declare, it is a "Transaction" cache so used to store statements executed in a transaction, and that will be consistently flushed in the binary log serializing the multi thread execution.

But what if we also have non transactional storage engine involved in our operations? Well here is where the fun begins, and also where the difference in how MySQL acts makes our life quite complicate.

Actors involved (who control what)

The TC (transaction Cache) is currently controlled by few parameters, and it is important to have them in mind:

log-bin

o this variable is to enable the binary log, if not present all the structure related to it is not active;

binlog_cache_size

o The size of the cache to hold changes to the binary log during a transaction

binlog_stmt_cache_size

o this variable determines the size of the cache for the binary log to hold nontransactional statements issued during a transaction.

binlog_direct_non_transactional_updates

o This variable when enabled, causes updates to non-transactional tables to be written directly to the binary log, rather than to the transaction cache.

Binlog_cache_use vs Binlog_cache_disk_use

o Status Variables to use for checking the efficiency of the cache.

Binlog_stmt_cache_use vs Binlog_stmt_cache_disk_use

o Status Variables to use for checking the efficiency of the cache.

MySQL Transaction cache and MySQL versions

Ok let start trying to define some milestones, like what is what, in which version, and what works and what doesn't.

binlog_cache_size is present from the start but behave different, in:

5.1 controls the whole transaction cache

5.5.3 up to 5.5.8 controls both size of the statement and none cache

5.5.9 and newer controls only the transaction cache demanded to transactions

binlog_stmt_cache_size

5.1 not present

5.5.9 it controls the dimension of the transaction cache demanded to store the NON transactional statements present in a transaction.

binlog_direct_non_transactional_updates

5.1.44 Introduced. it is DISABLE by default. If enable it allows all statements executed, also AFTER a Transaction (we will see later what this means), to be directly flushed to the binary log without going to the TC. It has effect only if the binlog_format is set to STATEMENT or MIXED.

5.5.2 Introduced.

5.5.5 It was modify to be active ONLY if binlog_format is STATEMENT in any other case it is ignored.

This feature, regardless the MySQL version, create more problems then solutions, and honestly I think it should be removed, please see the bug (http://bugs.mysql.com/bug.php?id=51291) to understand what I mean. In any case it has now a very limited scope of action, and better to consider it as not present.

How the whole works

The first point to take in mind, is that the TC is used only during a transaction, also the new TCStatement, is suppose to store only statement that are issued during a transaction, and it will be flush at the moment of the COMMIT, actually it go the queue for acquiring the LOG_LOCK mutex.

All other statements will directly go to the LOG_LOCK mutex queue.

Said that let us see how the TC works and what it does and how.

In general terms when you mix the Transactional and NON-transactional statements these rules apply, unless the infamous

binlog_direct_non_transactional_updates is enable (only 5.1):

1. If the statement is marked as transactional, it is written to the transaction cache.

2. If the statement is not marked as transactional and there are no statements in the

transaction cache, the statement is written directly to the binary log.

3. If the statement is not marked as transactional, but there are statements in the

transaction cache, the statement is written to the transaction cache.

Interesting point here is what and how is define as NON Transactional or as TRANSACTIONAL, from reading the manual and the many reference we have that:

After MySQL 5.1.31

a statement is considered non-transactional if it changes only non-transactional tables;

a statement is transactional if it changes only transactional tables.

Finally a statement that changes both non-transactional and transactional tables is considered “mixed”. Mixed statements, like transactional statements, are cached and logged when the transaction commits.

Before MySQL 5.1.31

a statement was considered non-transactional if the first changes it makes change non-transactional tables;

a statement was considered transactional if the first changes it makes change transactional tables.

In term of code:

Before MySQL 5.1.31

Transactional:

BEGIN;

INSERT INTO innodb_tbl VALUES (a),(b);

INSERT INTO myisam_tbl SELECT * FROM innodb_tbl;

COMMIT;

NON-Transactional (the first command also if inside a declared transaction):

BEGIN;

INSERT INTO myisam_tbl VALUES (a),(b);

INSERT INTO innodb_tbl SELECT * FROM innodb_tbl;

COMMIT;

After MySQL 5.1.31

Transactional:

BEGIN;

INSERT INTO innodb_tbl VALUES (a),(b);

INSERT INTO innodb_tbl SELECT * FROM innodb_tbl;

COMMIT;

NON-Transactional:

INSERT INTO myisam_tbl VALUES (a),(b);

INSERT INTO myisam_tbl SELECT * FROM innodb_tbl;

Mixed:

BEGIN;

INSERT INTO innodb_tbl VALUES (a),(b);

INSERT INTO myisam_tbl SELECT * FROM innodb_tbl;

COMMIT;

A graph could help as well to understand what happens between threads and flush of the TC in the binary log.

In the graphs we have three treads running in parallel, remember that inside mysql the main entity is the THD "The THD class defines a thread descriptor. It contains the information pertinent to the thread that is handling the given request. Each client connection is handled by a thread. Each thread has a descriptor object."

The first thread will do all the operations using TRANSACTION storage engine (InnoDB);

the second thread will use NON-TRANSACTIONAL (MyISAM) in the first insert, then TRANSACTIONAL (InnoDB);

the third statement will do first a TRANSACTIONAL insert and then will use it for a NON-TRANSACTIONAL.

In the first one I have assume the 5.1.31 and over behavior.

--------------------------------------------------------------

The three statements start almost at the same time, and none of them has data in any cache or buffer. Said that it is probably true that statement 1 in Thread2 will be executed (fully) faster, given it will take less as overhead, and giving that is the first statement in the THD2 transaction, it will be treated as NON-TRANSACTIONAL and directly sent to the binary-log's queue.

THD1 is fully TRANSACTIONAL, it will do his internal action then flush to binary-log.

Then THD2 had done also the second part of the operation TRANSACTIONAL and it is flushing it as well.

Last the THD3 which had first insert a record from TRANSACTIONAL table and then use it to populate the NON-TRANSACTIONAL.

Note

It looks more or less fine, isen't it? Well but assume that all the three THD share the InnoDB table and that as per default the ISOLATION LEVEL is Repeatable Read... Do you get it??

Actions and value on the MASTER will do and set values, that are potentially different from the ones in the SLAVE given the different order in the execution.

Immagine to have updates, instead insert and/or WHERE condition by values. Results could be different given the position, and remember, we are still talking about the STATEMENT replication given it is still the default ALSO in 5.5.

Finally remember that the issue it is NOT that THD1/2/3 could potentially change each other value, but the fact that they do something on the MASTER which could potentially different in the SLAVE.

That's it, the changes introduced in MySQL were done to reduce this wrong behavior, or to in some way try to keep consistency.

Let see what and how in the others graphs.

What about using binlog_direct_non_transactional_updates, will it helps?

I admit that I have exaggerated a little bit, in this example but my intention was to show how dangerous this option could be.

THD2 statement 1, as usual will be the first one, then THD1 consistently apply the whole set, but then it could happens that given the option set binlog_direct_non_transactional_updates THD3 will be faster in flushing the NON-TRANSACTIONAL statement, writing to the binary log before THD2, adding fragmentation and breaking the order in the binary log.

As said I have exaggerate a little, but theoretically this is very likely to happen in a standard context.

So in addition to the already found bug(s) (see the reference section), this option for me it is not really an additional help, but an issue generator.

Transaction Cache in MySQL 5.5

From MySQL 5.5.4 the TC is now divide in two parts, one control/store the TRANSACTIONAL statements, another store ONLY the NON_TRANSACTIONAL statements.

What does it means? In theory it means that any statement which was suppose to be part of a TRANSACTION but related to a NON-TRANSACTIONAL table (like statement at the start of a transaction), instead going directly to the binary log should be moved to the TC-statement, and then flushed when the TC is flushed.

The involvement of the binlog_direct_non_transactional_updates complicate a little all the scenario and in addition to that, it also create some major issue in the correct flush, so I am ignoring it here.

Let us see what happens using the new TC mechanism:

THD1 is as usual consistent in the way it flush to binary log. Arbitrarily I have decide that this time THD2 is taking a little bit more time like a millisecond, and that is enough for THD1 to acquire the LOG_LOCK before it.

The scope as you have probably guess was to make more ORDERED the graph, given that this time having one or the other flushing before or after will not really make any difference this time.

This because the TCStatement will be flushed (at least this is what I have seen and understood) when the TC is flushed.

Result is that the flush will take place using same the order on master and on the slave.

Finally THD3 also if will have the possibility to use the TCstatements it will not giving the fact that the statement using a NON-TRANSACTIONAL storage engine is NOT the first statement, so the mixed transaction will be placed in the TC as expected.

Conclusion

There was not so much to say from the beginning, to be honest, the TC and TCstatements are useful only for that case we do mix-transactions. The only significant changes are between after 5.1.31 and before and the introduction of the TCstatement. All the other tentative to add flexibility to this area, had in my vision increased bad behaviors.

It is easy for Master and Slave to store or modify data in a unexpected way, issue that any DBA face probably too often in MySQL.

I am not saying that all that cases should bring back to this specific issue, but for sure it is one of the less known and difficult to track, because many things do influence the how MySQL will flush to binary log.

The golden rule is, do not mix NON-TRANSACTIONAL and TRANSACTIONAL, but if you really have to do that, be sure of doing it in using 5.5.9 and above, which at least gives you more insurance of flushing to binary log the commands consistently.

Last notes

Remember that any statement NON-TRANSACTIONAL in the TC cache will be flushed on COMMIT or ROLLBACK, and that in the latest case a WARNING message should be written in the MySQL error log.

Note that I will post later (when I will have the time) some code and related raw data for documentation purpose.

References:

--------------

http://dev.mysql.com/doc/refman/5.1/en/replication-features-transactions.html

http://dev.mysql.com/doc/refman/5.1/en/replication-options-binary-log.html#sysvar_binlog_cache_size

http://dev.mysql.com/doc/refman/5.1/en/replication-options-binary-log.html#sysvar_binlog_direct_non_transactional_updates

http://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html#sysvar_binlog_stmt_cache_size

http://bugs.mysql.com/bug.php?id=51291

http://www.chriscalender.com/?tag=large-transaction

http://mysqlmusings.blogspot.com/2009/02/mixing-engines-in-transactions.html

How to recover for deleted binlogs

Overview on what a binlog is

The binary log contains “events” that describe database changes such as table creation operations or changes to table data.
It also contains events for statements that potentially could have made changes (for example, a DELETE which matched no rows),
unless row-based logging is used.
The binary log also contains information about how long each statement took that updated data.

The binary log has two important purposes:

For replication, the binary log is used on master replication servers as a record of the statements to be sent to slave servers.
The master server sends the events contained in its binary log to its slaves, which execute those events to make the same data
changes that were made on the master. See Section 16.2, “Replication Implementation”.
Certain data recovery operations require use of the binary log. After a backup has been restored, the events in the binary log
that were recorded after the backup was made are re-executed. These events bring databases up to date from the
point of the backup. See Section 6.5, “Point-in-Time (Incremental) Recovery Using the Binary Log”.

Running a server with binary logging enabled makes performance slightly slower.
However, the benefits of the binary log in enabling you to set up replication and for
restore operations generally outweigh this minor performance decrement.

Binlog main settings

The binlog is controlled by few settings in the my.cnf file, let quickly see them (credits MySQL manual):

log-bin[=base_name]
Enable binary logging. The option value, if given, is the basename for the log sequence.
The server creates binary log files in sequence by adding a numeric suffix to the basename.
It is recommended that you specify a basename, otherwise, MySQL uses host_name-bin as the basename.

Parameter to do or ignore databases

binlog-do-db=db_name
binlog-ignore-db=db_name
replicate-do-db
replicate-ignore-db

sync_binlog
-------------
If the value of this variable is greater than 0, the MySQL server synchronizes its binary log to disk (using fdatasync()) after every sync_binlog writes to the binary log.
There is one write to the binary log per statement if autocommit is enabled, and one write per transaction otherwise.
The default value of sync_binlog is 0, which does no synchronizing to disk—in this case, the server relies
on the operating system to flush the binary log's contents from to time as for any other file.
A value of 1 is the safest choice because in the event of a crash you lose at most one statement or transaction from the binary log.
However, it is also the slowest choice (unless the disk has a battery-backed cache, which makes synchronization very fast).

binlog_cache_size
-----------------
The size of the cache to hold the SQL statements for the binary log during a transaction.
A binary log cache is allocated for each client if the server supports any transactional storage engines and if the server has the binary log enabled (--log-bin option).
If you often use large, multiple-statement transactions, you can increase this cache size to get more performance.

max_binlog_cache_size
---------------------
If a multiple-statement transaction requires more than this many bytes of memory, the server generates a Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage error.
The minimum value is 4096.

max_binlog_size
---------------
If a write to the binary log causes the current log file size to exceed the value of this variable, the server rotates the binary logs(closes the current file and opens the next one).
The minimum value is 4096 bytes. The maximum and default value is 1GB.

expire_logs_days
----------------
The number of days for automatic binary log file removal. The default is 0, which means “no automatic removal.”
Possible removals happen at startup and when the binary log is flushed.

An example of usage of all of them:

##############
#REPLICATION
#############
log_bin            =  /var/mysql/log/binlog51/mysql-bin
relay-log        =  /var/mysql/log/relaylog/relay

expire_logs_days      = 30
max_binlog_size       = 500M
log-slave-updates      = 1
max_binlog_cache_size = 4294967295
sync_binlog              = 1    

#skip-slave-start
replicate-ignore-db=mysql

In this case what we have define is an automatic purge of the Binary logs after 30 days.
30 days is a safe window for the replication to happen, so there should not be any issue here in removing the binary logs.
Not always we are so lucky, and some time we have to clean up our disks to save space.

In this case is a good idea to set-up a procedure to perform checks against the slaves before performing a purge, you can do it by yourself
or trust one of the many script on the web, good starting point is Maatkit.

Now let us assume that for any reason you have purged a binary log from the master, and that the slave was not already there.

What will happen is that you will receive a very nice message from MySQL replication:
"Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted
(you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log),
a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to
know their names by issuing 'SHOW SLAVE STATUS' on this slave."

Nothing unexpected ... you have deleted it with the PURGE so log not there anymore.
Assuming you have the backup of the purged binlog (as it should ALWAYS be), copy back the file(s), and edit the <binlog>.index file, adding the missed entry previously PURGED.

Then check if it sounds fine:

mysql> show binary logs;

It will show the added entry.

Well Nice ALL done !!!

Assuming from that we could restart the slave... So let us try.
What we will get is something frustrating:

...
100929 11:21:47 [Note] Slave SQL thread initialized, starting replication 
in log 'mysql-bin.002671' at position 1073741674, relay log './xxxxxxxxxxxx-relay-bin.000001' position: 4
100929 11:21:47 [Note] Slave I/O thread: connected to master This email address is being protected from spambots. You need JavaScript enabled to view it.:3306',
  replication started in log 'mysql-bin.002671' at position 1073741674
100929 11:21:47 [ERROR] Error reading packet from server: Could not find first log 
file name in binary log index file ( server_errno=1236)
100929 11:21:47 [ERROR] Got fatal error 1236: 'Could not find first log file name 
in binary log index file' from master when reading data from binary log
100929 11:21:47 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.002671', position 1073741674
100929 11:22:08 [Note] Error reading relay log event: slave SQL thread was killed

Why this happens?

The situation seems a little bit confusing, right?
The MySQL server say that it CAN see them, but files are not valid for the replication process, so what is going on?

Well first of HOW I DO FIX it this will help us to understand.
Here a comment, I got comment form some MySQL DBA saying that the way is to RESTART MySQL, true if you do that it will fix it,
but you DO NOT NEED TO restart MySQL in this case, instead do the right thing:

mysql> FLUSH LOGS;

Then restart the slave process on the SLAVE:

mysql> SLAVE START;

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: xxxxx.xxxxxx.xxx
Master_User: xxxx
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.xxxx
Read_Master_Log_Pos: xxxxx
Relay_Log_File: xxxxxxxxxxxx-relay-bin.xxxx
Relay_Log_Pos: xxxxxx
Relay_Master_Log_File: mysql-bin.xxx
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 131915378
Relay_Log_Space: 563985741
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 946273
1 row in set (0.00 sec)

WOW it works ..., well this is expected, but I want to spend one second on WHY it works.

What we have to do in this cases is to read the source code and once more see what is happening there.

Reading the code we will see that FLUSH LOGS, will also call this function, which will instanciate a new MYSQL_LOG object.

The point is all here.

 if (log_type == LOG_BIN) 
 { 
 if (!no_auto_events) 
 { 
 /* 
 We log the whole file name for log file as the user may decide 
 to change base names at some point. 
 */ 
 THD *thd = current_thd; /* may be 0 if we are reacting to SIGHUP */ 
 Rotate_log_event r(thd,new_name+dirname_length(new_name), 
 0, LOG_EVENT_OFFSET, 0); 
 r.write(&log_file); 
 bytes_written += r.data_written; 
 } 
 /* 
 Update needs to be signalled even if there is no rotate event 
 log rotation should give the waiting thread a signal to 
 discover EOF and move on to the next log. 
 */ 
 signal_update(); 
 } 
 old_name=name; 
 save_log_type=log_type; 
 name=0;    // Don't free name 
 close(LOG_CLOSE_TO_BE_OPENED | LOG_CLOSE_INDEX);

The function will close the current BINLOG and also the the <binlog>.index associated, then will ROTATE
the binarylog, and create new one with a new name, finally reopen a new  <binlog>.index.

/* reopen index binlog file, BUG#34582 */
if (!open_index_file(index_file_name, 0))
open(old_name, save_log_type, new_name_ptr,
io_cache_type, no_auto_events, max_size, 1);
my_free(old_name,MYF(0));

Lesson Learn

===============
Also if MySQL allow us to do nice changes on the fly, editing and modifying the text files like binlog.index,
internally these changes are not considered valid unless the descriptors for these objects are refresh.
Refresh that is done in different way one of them is using FLUSH LOGS.

Final Consideration

===================
So far so good and nothing really new except that we have done another dig in MySQL code, to understand what and how it works.
There will be much more to say about, and I admit that I am doing it simpler then it is, but it was not my intention to write an article
on code digging, but only showing how also simple and well know actions like the one here reported, could be misleading, if we do not
have a better understanding.
This also put in evidence that it would have being much better:

refresh the binlog.index automatically when it is modify
do not accept/show the binlog.index modified IF there is not an internal refresh, this is confusing, misleading and not consistent.

Hope this (simple) issue will be fix by Oracle in on of the many code review to go.

void MYSQL_LOG::new_file(bool need_lock)
{
char new_name[FN_REFLEN], *new_name_ptr, *old_name;
enum_log_type save_log_type;

DBUG_ENTER("MYSQL_LOG::new_file");
if (!is_open())
{
DBUG_PRINT("info",("log is closed"));
DBUG_VOID_RETURN;
}

if (need_lock)
pthread_mutex_lock(&LOCK_log);
pthread_mutex_lock(&LOCK_index);

safe_mutex_assert_owner(&LOCK_log);
safe_mutex_assert_owner(&LOCK_index);

/*
if binlog is used as tc log, be sure all xids are "unlogged",
so that on recover we only need to scan one - latest - binlog file
for prepared xids. As this is expected to be a rare event,
simple wait strategy is enough. We're locking LOCK_log to be sure no
new Xid_log_event's are added to the log (and prepared_xids is not
increased), and waiting on COND_prep_xids for late threads to
catch up.
*/
if (prepared_xids)
{
tc_log_page_waits++;
pthread_mutex_lock(&LOCK_prep_xids);
while (prepared_xids)
pthread_cond_wait(&COND_prep_xids, &LOCK_prep_xids);
pthread_mutex_unlock(&LOCK_prep_xids);
}

/* Reuse old name if not binlog and not update log */
new_name_ptr= name;

/*
If user hasn't specified an extension, generate a new log name
We have to do this here and not in open as we want to store the
new file name in the current binary log file.
*/
if (generate_new_name(new_name, name))
goto end;
new_name_ptr=new_name;

if (log_type == LOG_BIN)
{
if (!no_auto_events)
{
/*
We log the whole file name for log file as the user may decide
to change base names at some point.
*/
THD *thd = current_thd; /* may be 0 if we are reacting to SIGHUP */
Rotate_log_event r(thd,new_name+dirname_length(new_name),
0, LOG_EVENT_OFFSET, 0);
r.write(&log_file);
bytes_written += r.data_written;
}
/*
Update needs to be signalled even if there is no rotate event
log rotation should give the waiting thread a signal to
discover EOF and move on to the next log.
*/
signal_update();
}
old_name=name;
save_log_type=log_type;
name=0; // Don't free name
close(LOG_CLOSE_TO_BE_OPENED | LOG_CLOSE_INDEX);

/*
Note that at this point, log_type != LOG_CLOSED (important for is_open()).
*/

/*
new_file() is only used for rotation (in FLUSH LOGS or because size >
max_binlog_size or max_relay_log_size).
If this is a binary log, the Format_description_log_event at the beginning of
the new file should have created=0 (to distinguish with the
Format_description_log_event written at server startup, which should
trigger temp tables deletion on slaves.
*/

/* reopen index binlog file, BUG#34582 */
if (!open_index_file(index_file_name, 0))
open(old_name, save_log_type, new_name_ptr,
io_cache_type, no_auto_events, max_size, 1);
my_free(old_name,MYF(0));

end:
if (need_lock)
pthread_mutex_unlock(&LOCK_log);
pthread_mutex_unlock(&LOCK_index);

DBUG_VOID_RETURN;
}

{joscommentenable}

How and why tmp_table_size and max_heap_table_size are bounded.

Overview

In some cases, MySQL creates internal temporary tables while processing queries.
On the base of the dimensions of the resultset MySQL will use the MEMORY engine AND/OR the MyISAM engine.
The difference is that MEMORY engine will handle the table in memory, while MyISAM will write it on disk.
A table created using the MEMORY engine can be automatically converted by the MySQL server if exceed the define threshold.

Then there are others circumstances which could create temporary tables using MEMORY but that can go to MyISAM (so disk) if too large:

If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue;
DISTINCT combined with ORDER BY may require a temporary table;
In the case of the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.

There are some conditions which will force the temporary table to use MyISAM :

Presence of a BLOB or TEXT column in the table;
Presence of any column in a GROUP BY or DISTINCT clause larger than 512 bytes;
Presence of any column larger than 512 bytes in the SELECT list, if UNION or UNION ALL;

The tables explicitly created with CREATE TABLE ENGINE MEMORY use ONLY the max_heap_table_size system variable to determines how large the table is permitted to grow and there is no conversion to on-disk format.

Why still doing a post on this issue?

Because i still see on MySQL configuration that DBA seems use it as separate assignments.Let us take this from the start, the manual.
In the manual section (http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_tmp_table_size) we can find the tmp_table_size definition.

Command-Line Format --tmp_table_size=#
Config-File Format tmp_table_size
Option Sets Variable Yes,tmp_table_size
Variable Name tmp_table_size
Variable Scope Both
Dynamic Variable Yes
Permitted Values
Type numeric
Default system dependent
Range 1024-4294967295

What we need to take in account, and in mind, are these two element:

Default system dependent
Range 1024-4294967295

Now let us check for the max_heap_table_size

Command-Line Format --max_heap_table_size=#
Config-File Format max_heap_table_size
Option Sets Variable Yes, max_heap_table_size
Variable Name max_heap_table_size
Variable Scope Both
Dynamic Variable Yes
Permitted Values
Type numeric
Default 16777216
Range 16384-4294967295

We can easily identify that both can be dynamically modified (Dynamic Variable Yes), that both the variable could be assign at global scope or per session (Variable Scope Both).

Finally that max_heap_size is by default 16M while tmp_table_size is system dependent.

Let us to try to clarify this digging in the code doing:

<mysql_source_dir>/sql ->
[root@tusacentral07 sql]# grep tmp_table_size *.*

and we found this files.

set_var.cc:static sys_var_thd_ulonglong sys_tmp_table_size(&vars, "tmp_table_size",
set_var.cc: &SV::tmp_table_size);

From that my understanding is that both starts with a value of 16MB full-stop.

In set_var.cc the statement is:

"static sys_var_thd_ulonglong sys_tmp_table_size(&vars, "tmp_table_size",&SV::tmp_table_size);"

we have no explicit reference to allocation.

The place were we have it is in mysqld.cc :

{"tmp_table_size", OPT_TMP_TABLE_SIZE,
"If an internal in-memory temporary table exceeds this size, MySQL will"
" automatically convert it to an on-disk MyISAM table.",
&global_system_variables.tmp_table_size,
&max_system_variables.tmp_table_size, 0, GET_ULL,
REQUIRED_ARG, 16*1024*1024L, 1024, MAX_MEM_TABLE_SIZE, 0, 1, 0},

{"max_heap_table_size", OPT_MAX_HEP_TABLE_SIZE,
"Don't allow creation of heap tables bigger than this.",
&global_system_variables.max_heap_table_size,
&max_system_variables.max_heap_table_size, 0, GET_ULL,
REQUIRED_ARG, 16*1024*1024L, 16384, MAX_MEM_TABLE_SIZE,
MALLOC_OVERHEAD, 1024, 0},

In the MySQL 5.5 and above, we have a clearer place and assignment:

in sys_vars.cc:

static Sys_var_ulonglong Sys_tmp_table_size(
"tmp_table_size",
"If an internal in-memory temporary table exceeds this size, MySQL "
"will automatically convert it to an on-disk MyISAM table",
SESSION_VAR(tmp_table_size), CMD_LINE(REQUIRED_ARG),
VALID_RANGE(1024, (ulonglong)~(intptr)0), DEFAULT(16*1024*1024),
BLOCK_SIZE(1));

static Sys_var_ulonglong Sys_max_heap_table_size(
"max_heap_table_size",
"Don't allow creation of heap tables bigger than this",
SESSION_VAR(max_heap_table_size), CMD_LINE(REQUIRED_ARG),
VALID_RANGE(16384, (ulonglong)~(intptr)0), DEFAULT(16*1024*1024),
BLOCK_SIZE(1024));

But the result is the same, MySQL start from 16MB, no matter what system you have, or if it does, is very well hide from search.

Now let see what the manual say about how this two variables and their relation:

The maximum size of internal in-memory temporary tables. (The actual limit is determined as the minimum of tmp_table_size and max_heap_table_size.)

If an in-memory temporary table exceeds the limit, MySQL automatically converts it to an on-disk MyISAM table.

Increase the value of tmp_table_size (and max_heap_table_size if necessary) if you do many advanced GROUP BY queries and you have lots of memory.

This variable does not apply to user-created MEMORY tables.

Well this one seems clear enough for everyone but to avoid confusion, check the code:

so again

[root@tusacentral07 sql]# grep tmp_table_size *.*

but this time we open the file

sql_select.cc

if (thd->variables.tmp_table_size == ~ (ulonglong) 0) // No limit
share->max_rows= ~(ha_rows) 0;
else
share->max_rows= (ha_rows) (((share->db_type() == heap_hton) ?
min(thd->variables.tmp_table_size,
thd->variables.max_heap_table_size) :
thd->variables.tmp_table_size) /
share->reclength);

Here is were we find the min() functions used from MySQL.

min(thd->variables.tmp_table_size,thd->variables.max_heap_table_size)

So as the manual state MySQL will take the LOWER value assign to the two variables, as the good value for tmp_table_size (only).

To be more clear, if we have in the my.cnf:

tmp_table_size = 32M
max_heap_table_size = 16M

MySQL will assign to tmp_table_size the value of 16M and to max_heap_table_size 16M.

if we have:

tmp_table_size = 16M
max_heap_table_size = 32M

MySQL will assign to tmp_table_size the value of 16M and to max_heap_table_size 32M.

This needs to take in account because I still see at client site a lot of :

tmp_table_size = 32M
max_heap_table_size = 16M

Which makes no sense.

Finally, this variable has huge possible impact on memory consumption.

Just consider that having:

max_connctions = 600
tmp_table_size = 32M
max_heap_table_size = 32M

These settings could potentially generate a memory usage on the server of (600 x 16MB) = 9.3 GB.

Given that the variables are dynamic, and that could be assign at session level, it is good practics to increase the value of tmp_table_size only when is really needed, in the session before performing the operation which will require it.

Seems all to me but If I have miss something here, corrections are welcome.

{joscommentenable}

How to Reset root password in MySQL

I have a problem, It's been months since I used MySQL and (I believe) I had set it up with a root password. Now I can't log on to MySQL as root MySQL user and create a new user or manage an existing user (I can log onto server)...

Familiar situation. :)

Do so:

 service mysql stop

wait until MySQL shuts down. Then run

mysqld_safe --skip-grant-tables &

then you will be able to login as root with no password.

mysql -uroot mysql

In MySQL command line prompt issue the following command:

UPDATE user SET password=PASSWORD("abcd") WHERE user="root";

FLUSH PRIVILEGES;

At this time your root password is reset to "abcd" and MySQL will now
know the privileges and you'll be able to login with your new password:

mysql -uroot -pabcd mysql

How to insert information on Access denied on the MySQL error log

===============================================================

MySQL is really poor in is support for AUDITING.

There will be some new feature in 5.5 or later, in which we will see the AUDIT interface finally implemented.

But ... who knows what will really happen, and who know if it will work or not.

So in the meantime, if you want to have some information printed out (at least), you can use this simple pathc.

What you need to do is modifying the file mysqld.cc in the sql directory.

in:> sql/mysqld.cc
Look for the function : int my_message_sql(uint error, const char *str, myf MyFlags)

change the code in the function with the one here, but also check that it is not inserting new bugs ;-).

int my_message_sql(uint error, const char *str, myf MyFlags)
{
THD *thd;
DBUG_ENTER("my_message_sql");
DBUG_PRINT("error", ("error: %u message: '%s'", error, str));

DBUG_ASSERT(str != NULL);
/*
Code added for writing access denied
*/
if ((global_system_variables.log_warnings > 1) &&
(error == ER_DBACCESS_DENIED_ERROR ||
error == ER_ACCESS_DENIED_ERROR ||
error == ER_TABLEACCESS_DENIED_ERROR ||
error == ER_COLUMNACCESS_DENIED_ERROR ||
error == ER_SPECIFIC_ACCESS_DENIED_ERROR ||
error == ER_PROCACCESS_DENIED_ERROR))
{ // then this is an access-denied error, log it
sql_print_warning(str);
}

Recompile and test it just trying to accessing something with a user that DO NOT have th permission to do it.

You will see the Alert in the log!

So simple so coool

Sidebar

Main Menu Mobile

MySQL

Binary log and Transaction cache in MySQL 5.1 & 5.5

The Binary Log

Transaction Cache

Actors involved (who control what)

MySQL Transaction cache and MySQL versions

How the whole works

Note

What about using binlog_direct_non_transactional_updates, will it helps?

Transaction Cache in MySQL 5.5

Conclusion

Last notes

How to recover for deleted binlogs

Overview on what a binlog is

Binlog main settings

Parameter to do or ignore databases

Lesson Learn

Final Consideration

How and why tmp_table_size and max_heap_table_size are bounded.

Overview

Why still doing a post on this issue?

How to Reset root password in MySQL

How to insert information on Access denied on the MySQL error log

More Articles ...

login