MySQL Optimization, part 1 - 6.2.9 How MySQL Optimizes ORDER BY (
Page 6 of 8 )
In some cases, MySQL can use an index to satisfy an ORDER BY or
GROUP BY clause without doing any extra sorting.
The index can also be used even if the ORDER BY doesn't match the
index exactly, as long as all the unused index parts and all the extra ORDER
BY columns are constants in the WHERE clause. The following
queries will use the index to resolve the ORDER BY or GROUP BY
part:
SELECT * FROM t1 ORDER BY key_part1,key_part2,... ;
SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2;
SELECT * FROM t1 WHERE key_part1=constant GROUP BY key_part2;
SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 DESC;
SELECT * FROM t1
WHERE key_part1=1 ORDER BY key_part1 DESC, key_part2 DESC;
In some cases, MySQL cannot use indexes to resolve the ORDER
BY, although it still will use indexes to find the rows that match the
WHERE clause. These cases include the following:
SELECT * FROM t1 ORDER BY key1, key2;
SELECT * FROM t1 WHERE key_part2=constant ORDER BY key_part2;
SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC;
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
-
You are joining many tables, and the columns in the ORDER BY are not
all from the first non-constant table that is used to retrieve rows. (This is
the first table in the EXPLAIN output that doesn't have a
const join type.)
-
You have different ORDER BY and GROUP BY expressions.
-
The type of table index used doesn't store rows in order. For example, this
is true for a HASH index in a HEAP table.
In those cases where MySQL must sort the result, it uses the following
algorithm:
-
Read all rows according to key or by table scanning. Rows that don't match
the WHERE clause are skipped.
-
Store the sort key value in a buffer. The size of the buffer is the value of
the sort_buffer_size system variable.
-
When the buffer gets full, run a qsort (quicksort) on it and store the result
in a temporary file. Save a pointer to the sorted block. (If all rows fit into
the sort buffer, no temporary file is created.)
-
Repeat the preceding steps until all rows have been read.
-
Do a multi-merge of up to MERGEBUFF (7) regions to one block in
another temporary file. Repeat until all blocks from the first file are in the
second file.
-
Repeat the following until there are fewer than MERGEBUFF2 (15)
blocks left.
-
On the last multi-merge, only the pointer to the row (the last part of the
sort key) is written to a result file.
-
Read the rows in sorted order by using the row pointers in the result file.
To optimize this, we read in a big block of row pointers, sort them, and use
them to read the rows in sorted order into a row buffer. The size of the buffer
is the value of the read_rnd_buffer_size system variable. The code for
this step is in the sql/records.cc source file.
With EXPLAIN SELECT ... ORDER BY, you can check whether MySQL can
use indexes to resolve the query. It cannot if you see Using filesort
in the Extra column. See Section 6.2.1, "EXPLAIN Syntax (Get
Information About a SELECT)."
If you want to increase ORDER BY speed, first see whether you can
get MySQL to use indexes rather than an extra sorting phase. If this is not
possible, you can try the following strategies:
-
Increase the size of the sort_buffer_size variable.
-
Increase the size of the read_rnd_buffer_size variable.
-
Change tmpdir to point to a dedicated filesystem with lots of empty
space. If you use MySQL 4.1 or later, this option accepts several paths that are
used in round-robin fashion. Paths should be separated by colon characters
(':') on Unix and semicolon characters (';') on Windows,
NetWare, and OS/2. You can use this feature to spread the load across several
directories. Note: The paths should be for directories in filesystems
that are located on different physical disks, not different partitions of
the same disk.
By default, MySQL sorts all GROUP BY col1, col2, ...
queries as if you specified ORDER BY col1, col2, ... in
the query as well. If you include an ORDER BY clause explicitly that
contains the same column list, MySQL optimizes it away without any speed
penalty, although the sorting still occurs. If a query includes GROUP
BY but you want to avoid the overhead of sorting the result, you can
suppress sorting by specifying ORDER BY NULL. For example:
INSERT INTO foo
SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;
6.2.10 How MySQL Optimizes LIMIT
In some cases, MySQL will handle a query differently when you are using
LIMIT row_count and not using HAVING:
-
If you are selecting only a few rows with LIMIT, MySQL uses indexes
in some cases when normally it would prefer to do a full table scan.
-
If you use LIMIT row_count with ORDER BY, MySQL ends
the sorting as soon as it has found the first row_count lines
rather than sorting the whole table.
-
When combining LIMIT row_count with DISTINCT, MySQL
stops as soon as it finds row_count unique rows.
-
In some cases, a GROUP BY can be resolved by reading the key in
order (or doing a sort on the key) and then calculating summaries until the key
value changes. In this case, LIMIT row_count will not calculate
any unnecessary GROUP BY values.
-
As soon as MySQL has sent the required number of rows to the client, it
aborts the query unless you are using SQL_CALC_FOUND_ROWS.
-
LIMIT 0 always quickly returns an empty set. This is useful to check
the query or to get the column types of the result columns.
-
When the server uses temporary tables to resolve the query, the LIMIT
row_count is used to calculate how much space is
required.
6.2.11 How to Avoid Table Scans
The output from EXPLAIN will show ALL in the type
column when MySQL uses a table scan to resolve a query. This usually happens
under the following conditions:
-
The table is so small that it's faster to do a table scan than a key lookup.
This is a common case for tables with fewer than 10 rows and a short row
length.
-
There are no usable restrictions in the ON or WHERE clause
for indexed columns.
-
You are comparing indexed columns with constant values and MySQL has
calculated (based on the index tree) that the constants cover too large a part
of the table and that a table scan would be faster. See Section 6.2.4, "How
MySQL Optimizes WHERE Clauses."
-
You are using a key with low cardinality (many rows match the key value)
through another column. In this case, MySQL assumes that by using the key it
will probably do a lot of key lookups and that a table scan would be
faster.
For small tables, a table scan often is appropriate. For large tables, try
the following techniques to avoid having the optimizer incorrectly choose a
table scan:
-
Use ANALYZE TABLE tbl_name to update the key distributions
for the scanned table.
-
Use FORCE INDEX for the scanned table to tell MySQL that table scans
are very expensive compared to using the given index.
SELECT * FROM t1, t2 FORCE INDEX (index_for_column)
WHERE t1.col_name=t2.col_name;
-
Start mysqld with the --max-seeks-for-key=1000 option or
use SET max_seeks_for_key=1000 to tell the optimizer to assume that no
key scan will cause more than 1,000 key seeks. See Section 4.2.3, "Server System
Variables."
This article is excerpted from MySQL
Administrator's Guide, by MySQL AB (editor) (Sams, 2004;
ISBN 0672326345). Check it out at your
favorite bookstore today. Buy
this book now. |