Optimizing your queries can help them run more efficiently, which can save a significant amount of time. This article covers index optimization and index usage. It is excerpted from chapter 13 of the MySQL Certification Guide, written by Paul Dubois et al. (Sams, 2005; ISBN: 0672328127).
Short index values can be processed more quickly than long ones. Therefore, when you index a column, it's worth asking whether it's sufficient to index partial column values rather than complete values. This technique, known as indexing a column prefix, can be applied to string column types.
Suppose that you're considering creating a table using this definition:
CREATE TABLE t
(
    name CHAR(255),
    INDEX (name)
);
If you index all 255 bytes of the values in the name column, index processing will be relatively slow:
It's necessary to read more information from disk.
Longer values take longer to compare.
The index cache is not as effective because fewer index values fit into it at a time.
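The cache effect in the last point can be illustrated with rough arithmetic. The following is a minimal sketch, not a model of any particular storage engine: the 1024-byte block size is an assumed figure, the helper name keys_per_block is hypothetical, and per-entry overhead such as row pointers is ignored.

```python
# Illustrative arithmetic only. The block size is an assumption and
# per-entry overhead (row pointers, block headers) is ignored.
BLOCK_BYTES = 1024  # hypothetical cache block size


def keys_per_block(key_bytes, block_bytes=BLOCK_BYTES):
    """Return how many fixed-width key values fit in one block."""
    return block_bytes // key_bytes


if __name__ == "__main__":
    print("255-byte keys per block:", keys_per_block(255))
    print("15-byte keys per block:", keys_per_block(15))
```

Under these assumptions, a block holds many times more 15-byte values than 255-byte values, so the same amount of cache memory covers far more index entries.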
It's often possible to overcome these problems by indexing only a prefix of the column values. For example, if you expect column values to be distinct most of the time in the first 15 bytes, index only that many bytes of each value, not all 255 bytes.
To specify a prefix length for a column, follow the column name in the index definition with a number in parentheses. The following table definition is the same as the previous one, except that the index uses just the first 15 bytes of each column value:
CREATE TABLE t
(
    name CHAR(255),
    INDEX (name(15))
);
Indexing a column prefix can speed up query processing, but works best when the prefix values tend to have about the same amount of uniqueness as the original values. Don't use such a short prefix that you produce a very high frequency of duplicate values in the index. It might require some testing to find the optimal balance between long index values that provide good uniqueness versus shorter values that compare more quickly but have more duplicates. To determine the number of records in the table, the number of distinct values in the column, and the number of duplicates, use this query:
SELECT
    COUNT(*) AS 'Total Rows',
    COUNT(DISTINCT name) AS 'Distinct Values',
    COUNT(*) - COUNT(DISTINCT name) AS 'Duplicate Values'
FROM t;
That gives you an estimate of the amount of uniqueness in the name values. Then run a similar query on the prefix values:
SELECT
    COUNT(DISTINCT LEFT(name,n)) AS 'Distinct Prefix Values',
    COUNT(*) - COUNT(DISTINCT LEFT(name,n)) AS 'Duplicate Prefix Values'
FROM t;
That tells you how the uniqueness characteristics change when you use an n-byte prefix of the name values. Run the query with different values of n to determine an acceptable prefix length.
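The same trial-and-error process can be rehearsed offline on a sample of values before touching the server. The sketch below is only an approximation of the queries above: the function names, the sample data, and the threshold parameter are all hypothetical, values are compared case-sensitively (MySQL may not, depending on collation), and NULL values are assumed to be absent.

```python
def prefix_stats(values, n):
    """Return (total rows, distinct n-character prefixes, duplicates).

    Mirrors COUNT(*), COUNT(DISTINCT LEFT(name,n)) and their difference,
    assuming no NULLs and case-sensitive comparison.
    """
    total = len(values)
    distinct = len({v[:n] for v in values})
    return total, distinct, total - distinct


def shortest_adequate_prefix(values, max_len, threshold=0.95):
    """Return the smallest n whose distinct-prefix count reaches
    `threshold` times the distinct count of the full values.

    `threshold` is an arbitrary illustrative cutoff, not a MySQL setting.
    """
    full_distinct = len(set(values))
    for n in range(1, max_len + 1):
        _, distinct, _ = prefix_stats(values, n)
        if distinct >= threshold * full_distinct:
            return n
    return max_len


if __name__ == "__main__":
    # Hypothetical sample data standing in for the name column.
    names = ["anderson", "andersen", "andrews", "baker", "barker", "barnes"]
    for n in range(1, 9):
        print(n, prefix_stats(names, n))
    print("chosen n:", shortest_adequate_prefix(names, 8, threshold=0.8))
```

A result from a sample is only a starting point; the final choice should still be confirmed by running the queries against the real table.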
Note that when an index on a full column is a PRIMARY KEY or UNIQUE index, you might have to change the index to be nonunique if you decide to index prefix values instead. If you index partial column values and require the index to be unique, that means the prefix values must be unique, too.
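To see why a unique prefix index is stricter than a unique full-column index, it can help to list which distinct values would collide at a given prefix length. This is a minimal sketch with a hypothetical helper name and sample data; it compares values case-sensitively, which may differ from the collation MySQL applies.

```python
from collections import defaultdict


def prefix_collisions(values, n):
    """Group distinct values by their first n characters and return
    only the groups that contain more than one distinct value.

    Any non-empty result means a UNIQUE index on an n-character prefix
    would reject rows that a UNIQUE index on the full column accepts.
    """
    groups = defaultdict(set)
    for v in values:
        groups[v[:n]].add(v)
    return {p: sorted(vs) for p, vs in groups.items() if len(vs) > 1}


if __name__ == "__main__":
    # Hypothetical sample data: all three full values are distinct.
    names = ["anderson", "andersen", "baker"]
    print(prefix_collisions(names, 6))  # collisions at a 6-character prefix
    print(prefix_collisions(names, 8))  # no collisions at 8 characters
```

If the result is non-empty for the prefix length you want, either lengthen the prefix or make the index nonunique.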