Tuesday, November 22, 2011

Upgrading from Mysql 5.0 to 5.5

Having dealt with Application server upgrades, library upgrades and more in the past, I have to say that this has been one of the smoother component upgrades I have dealt with so far.

Here are some brief findings and code changes I had to go through to get our applications upgraded from Mysql 5.0[58]/ Mysql 5.0[77] to Mysql 5.5[15]

1. INNODB definition

a. The TYPE attribute is unsupported as of Mysql 5.5

b. Schema needs to be updated to use the ENGINE instead.

c. Example: ENGINE=InnoDB (unsupported: TYPE=InnoDB)


2. Timestamp

a. In very old versions of MySQL (prior to 4.1), the TIMESTAMP data type supported a display width, which was silenty ignored beginning with MySQL 4.1. This is deprecated in MySQL 5.1, and removed altogether in MySQL 5.5. These changes in behavior can lead to two problem scenarios when trying to use TIMESTAMP(N) columns with a MySQL 5.5 or later server:

i. When importing a dump file (for example, one created using mysqldump) created in a MySQL 5.0 or earlier server into a server from a newer release series, a CREATE TABLE or ALTER TABLE statement containing TIMESTAMP(N) causes the import to fail with a syntax error.

ii. To fix this problem, edit the dump file in a text editor to replace any instances of TIMESTAMP(N) with TIMESTAMP prior to importing the file. Be sure to use a plain text editor for this, and not a word processor; otherwise, the result is almost certain to be unusable for importing into the MySQL server.


3. GROUP_CONCAT(N)

GROUP_CONCAT when used with Integers used to give us back a byte [] as of Mysql 5.0. Mysql 5.5 treats this differently and does an auto conversion to String. So any code that expected a byte [] will have to be updated.

Here is an example:

SELECT GROUP_CONCAT(ROW_NUMBER ORDER BY ROW_NUMBER) AS ROW_NUMBER FROM test_table;

TO get this to work consistently we will have to make code changes that involve Casting object types in SQL and always returning a String. Any code that read them back as a byte [] will have to be refactored to use a String instead.

For example:

SELECT CAST(GROUP_CONCAT(ROW_NUMBER ORDER BY ROW_NUMBER) AS BINARY) AS ROW_NUMBER FROM test_table;



4 Mysqldump – This has been largely backward compatible, except for one of the options that we have been using, which is -–all. This option was deprecated a while ago and somewhere between 5.1.x and 5.5.x this was taken out. The recommendation was to use –create-options instead.


Apart from these we have had challenges to migrate customers from one installation to the other but that goes without saying. There are enough documentation out there that help with these.

Here are some performance benchmarks using supersmack:


The 5.0.92 Percona machine seems to have much higher wait IO and lower throughput:
bash-3.2# super-smack update-select.smack 30 10000
Query Barrel Report for client smacker
connect: max=2ms min=0ms avg= 1ms from 30 clients
Query_type num_queries max_time min_time q_per_s
select_index 300000 12 0 1740.80
update_index 300000 18 0 1740.80


MySQL 5.5.15
bash-3.2# /usr/local/bin/super-smack update-select.smack 30 10000
Query Barrel Report for client smacker
connect: max=15ms min=0ms avg= 5ms from 30 clients
Query_type num_queries max_time min_time q_per_s
select_index 300000 3 0 5209.22
update_index 300000 12 0 5209.22

bash-3.2# mysql --version
mysql Ver 14.12 Distrib 5.0.92, for unknown-linux-gnu (x86_64) using readline 5.1
bash-3.2# mysqladmin status
Uptime: 99573 Threads: 6 Questions: 973831 Slow queries: 0 Opens: 1290 Flush tables: 1 Open tables: 320 Queries per second avg: 9.780

bash-3.2# mysql --version
mysql Ver 14.14 Distrib 5.5.15, for Linux (x86_64) using EditLine wrapper
bash-3.2# mysqladmin status
Uptime: 99247 Threads: 6 Questions: 1822824 Slow queries: 0 Opens: 1686 Flush tables: 1 Open tables: 96 Queries per second avg: 18.366


Its certainly fast and much improved InnoDB comes as a relief. Performance Schema is exciting too although it will be interesting to see if it adds up to any overhead.

InnoDB and TEMPORARY tables

Test Environment
5.0.92 64 Bit – Percona Mysql Server, 4 CPU, 32 GB RAM, CentOS, VMWare.

I have been looking at ways to optimize performance our J2EE applications while running expensive tasks and stumbled upon an interesting optimization that is described below

For InnoDB users, Its a common practice for developers to use InnoDB as the engine for whatever we use. Common causes for that would be copying over create statements from some other existing tables or maybe just addicted to InnoDB. This Maybe true for even TEMPORARY tables that are used as part of JDBC transactions.

It may well be a mistake in some cases.

Here is a snippet of SQL code:

CREATE TEMPORARY TABLE temp3(user_id VARCHAR(32), total MEDIUMINT, PRIMARY KEY (user_id))ENGINE=InnoDB DEFAULT CHARSET=utf8;

We used these sort of tables heavily within the scope of a transaction and ran some heavy queries out of these, and eventually cleaned them up. Generally these sort of calls were made as part of some iterations of N.

Here is what I changed it to:

CREATE TEMPORARY TABLE temp3(user_id VARCHAR(32), total MEDIUMINT, PRIMARY KEY (user_id))ENGINE=MyISAM DEFAULT CHARSET=utf8;

Without Optimizations:
160 seconds (Avg. of 100 iterations)

With Optimizations:
50 seconds (Avg. of 100 iterations)

That's a Huge Improvement!

Although its known that MyISAM is faster in certain scenarios, and InnoDB works better in others, its a choice the developer has to make while defining tables that do not care for referential integrity, selects vs updates and so forth. Choosing MyISAM for scenarios as described above not only makes it go faster but also transfers some load out of the InnoDB engine out to MyISAM, whereas InnoDB could be better utilized for other parts of your application. Highly recommend to try for yourself and prove.

Another interesting test would be to perform similar tests on Mysql 5.5 which has got a better InnoDB engine. I will try to publish some results on these soon.

Mysql O_DIRECT and InnoDB Performance

TEST Environment

Mysql 5.0.92 - Percona Mysql Server, VMWare, CentOS, 32 GB RAM, 4 CPU, SAN.

In my tests with O_DIRECT set as the InnoDB flush method it was found to be resulting in a slow down for IO bound queries while running with SAN as the MySQL backend storage. A lot of our batch jobs or features like reports use temporary tables that are IO bound.

Some extracts from the Mysql Documentation in support of these findings

[["innodb_flush_method If set to fdatasync (the default), InnoDB uses fsync() to flush both the data and log files. If set to O_DSYNC, InnoDB uses O_SYNC to open and flush the log files, and fsync() to flush the data files. If O_DIRECT is specified (available on some GNU/Linux versions, FreeBSD, and Solaris), InnoDB uses O_DIRECT (or directio() on Solaris) to open the data files, and uses fsync() to flush both the data and log files. Note that InnoDB uses fsync() instead of fdatasync(), and it does not use O_DSYNC by default because there have been problems with it on many varieties of Unix. This variable is relevant only for Unix. On Windows, the flush method is always async_unbuffered and cannot be changed.

Different values of this variable can have a marked effect on InnoDB performance. For example, on some systems where InnoDB data and log files are located on a SAN, it has been found that setting innodb_flush_method to O_DIRECT can degrade performance of simple SELECT statements by a factor of three." ]]


It was observed in my testing that running with defaults gave me at least a 60% improvement. Here are some performance metrics:


With O_DIRECT


[root@host]# vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 21533436 489548 8243880 0 0 2 66 2 13 1 0 99 0 0
0 1 0 21456184 489556 8243964 0 0 13734 4272 2204 2598 6 2 67 24 0
0 1 0 21426052 489564 8243972 0 0 2598 7415 1923 1849 13 2 67 18 0
0 1 0 21403608 489564 8244136 0 0 1130 7932 1885 1739 14 2 68 16 0
1 0 0 21379676 489564 8244084 0 0 1632 8526 1979 1943 13 2 68 17 0
1 0 0 21360580 489572 8244216 0 0 1165 8336 1909 1821 13 2 68 17 0
0 1 0 21340988 489576 8244144 0 0 1114 8719 1963 1946 12 2 68 18 0
0 1 0 21323752 489580 8244336 0 0 694 8711 1940 1854 12 2 68 18 0
0 1 0 21305896 489580 8244248 0 0 778 8207 1888 1791 13 2 68 18 0
0 0 0 21287792 489580 8244444 0 0 787 8398 1916 1799 12 2 68 18 0
0 1 0 21270556 489584 8244392 0 0 611 8661 1926 1853 13 2 68 17 0
0 0 0 20929308 489652 8246276 0 0 0 393 989 91 0 0 100 0 0



Without O_DIRECT (Mysql Defaults)


[root@host]# vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 18700144 489492 8238128 0 0 2 66 2 13 1 0 99 0 0
0 1 0 18681236 489492 8238844 0 0 16 11651 2162 2086 18 3 69 10 0
1 0 0 18662140 489500 8239296 0 0 0 10927 2073 1933 18 2 69 11 0
0 0 0 18643356 489500 8239652 0 0 16 11022 2084 1963 18 3 68 11 0
0 1 0 18624508 489500 8239648 0 0 0 11665 2162 2072 17 3 69 11 0
1 0 0 18606032 489508 8240008 0 0 0 11210 2113 2006 18 2 68 12 0
1 0 0 18587308 489512 8240132 0 0 0 11120 2097 1983 18 3 68 11 0
1 0 0 18567964 489516 8240564 0 0 0 10951 2067 1944 18 2 68 11 0
1 0 0 18550604 489520 8241068 0 0 0 11098 2105 1982 17 3 68 12 0
1 0 0 18531012 489520 8241008 0 0 0 11286 2113 2005 19 3 69 9 0
0 1 0 18512412 489528 8241360 0 0 0 11349 2123 2015 17 3 68 12 0
0 0 0 18282208 489536 8245508 0 0 10 2454 1183 404 3 1 94 2 0


Some of the key differences seen in VMSTAT are highlighted above. Mainly seen in bi, bo and wa.

It is worth testing this further with other tools specifically for IO against SAN or just measuring stats for reads/writes before and after the changes, but as documented by MySQL we just need to try it out ourselves and use what works best on a given platform/environment.


Useful tools we could use to further test this and more:

1. http://googlux.com/bonnie.html

2. http://vegan.net/tony/supersmack/