Thursday, March 22, 2007

Fun with MySQL query optimizer

A few days ago, after a minor bug fix in our site code, suddenly, the load on the database server dropped about 50%. I was obviously interested in what caused such a major improvement and found out a few interesting things. To demonstrate, this fascinating phenomenon, lets create a database:
CREATE DATABASE test;

create a table
CREATE TABLE `table1` (
`f1` int(11) NOT NULL auto_increment,
`f2` char(10) NOT NULL,
PRIMARY KEY (`f1`),
KEY `an_index` (`f2`)
);

and populate this table with some values
DELIMITER $$
DROP PROCEDURE IF EXISTS `test`.`populate_table1`$$
CREATE PROCEDURE `test`.`populate_table1` (ct INT)
BEGIN
PREPARE q1 FROM 'INSERT INTO table1 (f2) SELECT ?';
SET @x = 0;
REPEAT
EXECUTE q1 USING @x;
SET @x = @x + 1;
UNTIL @x >= ct
END REPEAT;
END$$
DELIMITER ;
CALL populate_table1(100000);

And now the evil magic begins (query results skipped for brevity)

 mysql> SELECT COUNT(*) FROM table1
WHERE f2 IN ("100", "101", "102", "103", "104", "105", "106",
"107", "108", "109", "110", "111", "112", "113", "114", "115",
"116", "117", "118", "119", "120", "121", "122", "123", "124",
"125", "126", "127", "128", "129", "130", "131", "132", "133",
"134", "135", "136", "137", "138", "139", "140", "141", "142",
"143", "144", "145", "146", "147", "148", "149", "150", "151",
"152", "153", "154", "155", "156", "157", "158", "159", "160",
"161", "162", "163", "164", "165", "166", "167", "168", "169",
"170", "171", "172", "173", "174", "175", "176", "177", "178",
"179", "180", "181", "182", "183", "184", "185", "186", "187",
"188", "189", "190", "191", "192", "193", "194", "195", "196",
"197", "198", "199", "200");
+----------+
| COUNT(*) |
+----------+
| 101 |
+----------+
1 row in set (0.00 sec)
mysql> SELECT COUNT(*) FROM table1
WHERE f2 IN ("100", "101", "102", "103", "104", "105", "106",
"107", "108", "109", "110", "111", "112", "113", "114", "115",
"116", "117", "118", "119", "120", "121", "122", "123", "124",
"125", "126", "127", "128", "129", "130", "131", "132", "133",
"134", "135", "136", "137", "138", "139", "140", "141", "142",
"143", "144", "145", "146", "147", "148", "149", "150", "151",
"152", "153", "154", "155", "156", "157", "158", "159", 160,
"161", "162", "163", "164", "165", "166", "167", "168", "169",
"170", "171", "172", "173", "174", "175", "176", "177", "178",
"179", "180", "181", "182", "183", "184", "185", "186", "187",
"188", "189", "190", "191", "192", "193", "194", "195", "196",
"197", "198", "199", "200");
+----------+
| COUNT(*) |
+----------+
| 101 |
+----------+
1 row in set (0.16 sec)

Do you see the difference? Probably not. But it just made the query 16 times slower! I spent a long time finding that one (and in our case I was looking at endless lists of zip codes). Let me illustrate.
 mysql> SELECT COUNT(*) FROM table1
WHERE f2 IN (100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122,
123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135,
136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148,
149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161,
162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174,
175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187,
188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200);
+----------+
| COUNT(*) |
+----------+
| 101 |
+----------+
1 row in set (0.16 sec)

Right. Type casting. In the second query I unquoted one of the values in the list in the list. Namely the value "160". Now to the point. I was a bit frustrated and I couldn't believe that a cast of an INT to a CHAR would take so long and it doesn't. A simple explain shows what is wrong. We are not using indexes properly.
 mysql> explain SELECT COUNT(*) FROM table1
WHERE f2 IN (<badly quoted list>)G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: table1
type: index
possible_keys: an_index
key: an_index
key_len: 10
ref: NULL
rows: 100000
Extra: Using where; Using index
1 row in set (0.00 sec)
mysql> explain SELECT * FROM table1
WHERE f2 IN (<well quoted list>)G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: table1
type: range
possible_keys: an_index
key: an_index
key_len: 10
ref: NULL
rows: 116
Extra: Using where; Using index
1 row in set (0.01 sec)

As you can see in the first case (with a type cast) we look through all the 100K values, where in the second case (with quoting fixed) we only look through 116. That is where the performance goes. But why? I asked some people from MySQL AB (good thing we have a support contract). And the answer I got was that it is difficult to use indexes on CHAR fields after a cast from INT, since you can comapre a CHAR to an INT in more than one way. Wow! A quick check in MySQL confirmed my nightmares.

mysql> select 1 = 1;
+-------+
| 1 = 1 |
+-------+
| 1 |
+-------+
1 row in set (0.00 sec)
mysql> select 1 = "1";
+---------+
| 1 = "1" |
+---------+
| 1 |
+---------+
1 row in set (0.00 sec)
mysql> select 1 = "1 ";
+-----------+
| 1 = "1 " |
+-----------+
| 1 |
+-----------+
1 row in set (0.00 sec)
mysql> select 1 = "1asdf";
+-------------+
| 1 = "1asdf" |
+-------------+
| 1 |
+-------------+
1 row in set, 1 warning (0.00 sec)

Whaaa!!! So that explains why MySQL finds it "difficult" to use indexes after a type cast. The question that remains is what MySQL team was smoking when they implemented string comparison in this creative new way.

No comments:

Post a Comment