While optimization is possible with limited knowledge of your system or application, the more you know about your system, the better your optimization will be. This article, the first of two parts, covers some of the different points you will need to know for optimizing MySQL. It is excerpted from chapter six of the book MySQL Administrator's Guide, by MySQL AB (Sams, 2004; ISBN: 0672326345)
In some cases, MySQL can use an index to satisfy an ORDER BY or GROUP BY clause without doing any extra sorting.
The index can also be used even if the ORDER BY doesn't match the index exactly, as long as all the unused index parts and all the extra ORDER BY columns are constants in the WHERE clause. The following queries will use the index to resolve the ORDER BY or GROUP BY part:
SELECT * FROM t1 ORDER BY key_part1,key_part2,... ;
SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2;
SELECT * FROM t1 WHERE key_part1=constant GROUP BY key_part2;
SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 DESC;
SELECT * FROM t1
WHERE key_part1=1 ORDER BY key_part1 DESC, key_part2 DESC;
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still will use indexes to find the rows that match the WHERE clause. These cases include the following:
You use ORDER BY on different keys:
SELECT * FROM t1 ORDER BY key1, key2;
You use ORDER BY on non-consecutive key parts:
SELECT * FROM t1 WHERE key_part2=constant ORDER BY key_part2;
You mix ASC and DESC:
SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;
The key used to fetch the rows is not the same as the one used in the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
You are joining many tables, and the columns in the ORDER BY are not all from the first non-constant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that doesn't have a const join type.)
You have different ORDER BY and GROUP BY expressions.
The type of table index used doesn't store rows in order. For example, this is true for a HASH index in a HEAP table.
In those cases where MySQL must sort the result, it uses the following algorithm:
Read all rows according to key or by table scanning. Rows that don't match the WHERE clause are skipped.
Store the sort key value in a buffer. The size of the buffer is the value of the sort_buffer_size system variable.
When the buffer gets full, run a qsort (quicksort) on it and store the result in a temporary file. Save a pointer to the sorted block. (If all rows fit into the sort buffer, no temporary file is created.)
Repeat the preceding steps until all rows have been read.
Do a multi-merge of up to MERGEBUFF (7) regions to one block in another temporary file. Repeat until all blocks from the first file are in the second file.
Repeat the following until there are fewer than MERGEBUFF2 (15) blocks left.
On the last multi-merge, only the pointer to the row (the last part of the sort key) is written to a result file.
Read the rows in sorted order by using the row pointers in the result file. To optimize this, we read in a big block of row pointers, sort them, and use them to read the rows in sorted order into a row buffer. The size of the buffer is the value of the read_rnd_buffer_size system variable. The code for this step is in the sql/records.cc source file.
With EXPLAIN SELECT ... ORDER BY, you can check whether MySQL can use indexes to resolve the query. It cannot if you see Using filesort in the Extra column. See Section 6.2.1, "EXPLAIN Syntax (Get Information About a SELECT)."
If you want to increase ORDER BY speed, first see whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:
Increase the size of the sort_buffer_size variable.
Increase the size of the read_rnd_buffer_size variable.
Change tmpdir to point to a dedicated filesystem with lots of empty space. If you use MySQL 4.1 or later, this option accepts several paths that are used in round-robin fashion. Paths should be separated by colon characters (':') on Unix and semicolon characters (';') on Windows, NetWare, and OS/2. You can use this feature to spread the load across several directories. Note: The paths should be for directories in filesystems that are located on different physical disks, not different partitions of the same disk.
By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well. If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL. For example:
INSERT INTO foo
SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;
6.2.10 How MySQL Optimizes LIMIT
In some cases, MySQL will handle a query differently when you are using LIMIT row_count and not using HAVING:
If you are selecting only a few rows with LIMIT, MySQL uses indexes in some cases when normally it would prefer to do a full table scan.
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count lines rather than sorting the whole table.
When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.
In some cases, a GROUP BY can be resolved by reading the key in order (or doing a sort on the key) and then calculating summaries until the key value changes. In this case, LIMIT row_count will not calculate any unnecessary GROUP BY values.
As soon as MySQL has sent the required number of rows to the client, it aborts the query unless you are using SQL_CALC_FOUND_ROWS.
LIMIT 0 always quickly returns an empty set. This is useful to check the query or to get the column types of the result columns.
When the server uses temporary tables to resolve the query, the LIMIT row_count is used to calculate how much space is required.
6.2.11 How to Avoid Table Scans
The output from EXPLAIN will show ALL in the type column when MySQL uses a table scan to resolve a query. This usually happens under the following conditions:
The table is so small that it's faster to do a table scan than a key lookup. This is a common case for tables with fewer than 10 rows and a short row length.
There are no usable restrictions in the ON or WHERE clause for indexed columns.
You are comparing indexed columns with constant values and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table and that a table scan would be faster. See Section 6.2.4, "How MySQL Optimizes WHERE Clauses."
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it will probably do a lot of key lookups and that a table scan would be faster.
For small tables, a table scan often is appropriate. For large tables, try the following techniques to avoid having the optimizer incorrectly choose a table scan:
Use ANALYZE TABLE tbl_name to update the key distributions for the scanned table.
Use FORCE INDEX for the scanned table to tell MySQL that table scans are very expensive compared to using the given index.
SELECT * FROM t1, t2 FORCE INDEX (index_for_column)
Start mysqld with the --max-seeks-for-key=1000 option or use SET max_seeks_for_key=1000 to tell the optimizer to assume that no key scan will cause more than 1,000 key seeks. See Section 4.2.3, "Server System Variables."
This article is excerpted from MySQL Administrator's Guide, by MySQL AB (editor) (Sams, 2004; ISBN 0672326345). Check it out at your favorite bookstore today. Buy this book now.