Indexes are used to find rows with specific column values fast. Without an index, MySQL has to start with the first record and then read through the whole table to find the relevant rows. The bigger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially. Note that if you need to access almost all 1,000 rows, it is faster to read sequentially, because that minimizes disk seeks.
Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions are that indexes on spatial column types use R-trees, and MEMORY (HEAP) tables support hash indexes.
Strings are automatically prefix- and end-space compressed.
In general, indexes are used as described in the following discussion. Characteristics specific to hash indexes (as used in MEMORY tables) are described at the end of this section.
Suppose that you issue the following SELECT statement:
mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch the rows.
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
MySQL can't use a partial index if the columns don't form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:
SELECT * FROM tbl_name WHERE col1=val1; SELECT * FROM tbl_name WHERE col2=val2; SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first of the preceding queries uses the index. The second and third queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).
An index is used for columns that you compare with the =, >, >=, <, <=, or BETWEEN operators.
MySQL also uses indexes for LIKE comparisons if the argument to LIKE is a constant string that doesn't start with a wildcard character. For example, the following SELECT statements use indexes:
SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%'; SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';
In the first statement, only rows with 'Patrick' <= key_col < 'Patricl' are considered. In the second statement, only rows with 'Pat' <= key_col < 'Pau' are considered.
The following SELECT statements will not use indexes:
SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%'; SELECT * FROM tbl_name WHERE key_col LIKE other_col;
In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.
MySQL 4.0 and up performs an additional LIKE optimization. If you use ... LIKE '%string%' and string is longer than three characters, MySQL will use the Turbo Boyer-Moore algorithm to initialize the pattern for the string and then use this pattern to perform the search quicker.
Searching using col_name IS NULL will use indexes if col_name is indexed.
Any index that doesn't span all AND levels in the WHERE clause is not used to optimize the query. In other words, to be able to use an index, a prefix of the index must be used in every AND group.
The following WHERE clauses use indexes:
... WHERE index_part1=1 AND index_part2=2 AND other_column=3 /* index = 1 OR index = 2 */ ... WHERE index=1 OR A=10 AND index=2 /* optimized like "index_part1='hello'" */ ... WHERE index_part1='hello' AND index_part3=5 /* Can use index on index1 but not on index2 or index3 */ ... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;
These WHERE clauses do not use indexes:
/* index_part1 is not used */ ... WHERE index_part2=1 AND index_part3=2 /* Index is not used in both AND parts */ ... WHERE index=1 OR A=10 /* No index spans all rows */ ... WHERE index_part1=1 OR index_part2=10
Sometimes MySQL will not use an index, even if one is available. One way this occurs is when the optimizer estimates that using the index would require MySQL to access a large percentage of the rows in the table. (In this case, a table scan is probably much faster, because it will require many fewer seeks.) However, if such a query uses LIMIT to only retrieve part of the rows, MySQL will use an index anyway, because it can much more quickly find the few rows to return in the result.
Hash indexes have somewhat different characteristics than those just discussed:
6.4.6 The MyISAM Key Cache
To minimize disk I/O, the MyISAM storage engine employs a strategy that is used by many database management systems. It exploits a cache mechanism to keep the most frequently accessed table blocks in memory:
This section first describes the basic operation of the MyISAM key cache. Then it discusses changes made in MySQL 4.1 that improve key cache performance and that enable you to better control cache operation:
The key cache mechanism also is used for ISAM tables. However, the significance of this fact is on the wane. ISAM table use has been decreasing since MySQL 3.23 when MyISAM was introduced. MySQL 4.1 carries this trend further; the ISAM storage engine is disabled by default.
You can control the size of the key cache by means of the key_buffer_size system variable. If this variable is set equal to zero, no key cache is used. The key cache also is not used if the key_buffer_size value is too small to allocate the minimal number of block buffers (8).
When the key cache is not operational, index files are accessed using only the native filesystem buffering provided by the operating system. (In other words, table index blocks are accessed using the same strategy as that employed for table data blocks.)
An index block is a contiguous unit of access to the MyISAM index files. Usually the size of an index block is equal to the size of nodes of the index B-tree. (Indexes are represented on disk using a B-tree data structure. Nodes at the bottom of the tree are leaf nodes. Nodes above the leaf nodes are non-leaf nodes.)
All block buffers in a key cache structure are the same size. This size can be equal to, greater than, or less than the size of a table index block. Usually one of these two values is a multiple of the other.
When data from any table index block must be accessed, the server first checks whether it is available in some block buffer of the key cache. If it is, the server accesses data in the key cache rather than on disk. That is, it reads from the cache or writes into it rather than reading from or writing to disk. Otherwise, the server chooses a cache block buffer containing a different table index block (or blocks) and replaces the data there by a copy of required table index block. As soon as the new index block is in the cache, the index data can be accessed.
If it happens that a block selected for replacement has been modified, the block is considered "dirty." In this case, before being replaced, its contents are flushed to the table index from which it came.
Usually the server follows an LRU (Least Recently Used) strategy: When choosing a block for replacement, it selects the least recently used index block. To be able to make such a choice easy, the key cache module maintains a special queue (LRU chain) of all used blocks. When a block is accessed, it is placed at the end of the queue. When blocks need to be replaced, blocks at the beginning of the queue are the least recently used and become the first candidates for eviction.
blog comments powered by Disqus