Enhancing MySQL Query Efficiency

There are a number of things you can do in general to optimize your queries and make them more efficient. This article discusses several of these. It is excerpted from chapter 13 of the MySQL Certification Guide, written by Paul Dubois et al. (Sams, 2005; ISBN: 0672328127).

13.3 General Query Enhancement

The way you write a query often affects how well indexes are used. Use the following principles to make your queries more efficient:

  • Don’t refer to an indexed column within an expression that must be evaluated for every row in the table. Doing so prevents use of the index. Instead, isolate the column onto one side of a comparison when possible. For example, one way to select rows containing date values from the year 1994 and up is as follows:

    SELECT * FROM t WHERE YEAR(d) >= 1994;

    In this case, the value of YEAR(d) must be evaluated for every row in the table, so the index cannot be used. Instead, write the query like this:

    SELECT * FROM t WHERE d >= '1994-01-01';

    In the rewritten expression, the indexed column stands by itself on one side of the comparison and MySQL can apply the index to optimize the query.

    In situations like this, EXPLAIN is useful for verifying that one way of writing a query is better than another. For the two date-selection queries just shown, for example, you might find that EXPLAIN tells you something like this:

    mysql> EXPLAIN SELECT * FROM t WHERE YEAR(d)
    >= 1994G *********************** 1. row *************************** table: t type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 867038 Extra: Using where mysql> EXPLAIN SELECT * FROM t WHERE d >=
    '1994-01-01'G *********************** 1. row *************************** table: t type: range possible_keys: d key: d key_len: 4 ref: NULL rows: 70968 Extra: Using where

    These results indicate that the second query is indeed better from the optimizer’s point of view. MySQL can perform a range scan using the index for the column d, drastically reducing the number of rows that need to be examined. (The rows value drops from 867,038 to 70,968.)

  • When comparing an indexed column to a value, use a value that has the same datatype as the column. For example, you can look for rows containing a numeric id value of 18 with either of the following WHERE clauses:

    WHERE id = 18
    WHERE id = '18'

    MySQL will produce the same result either way, even though the value is specified as a number in one case and as a string in the other case. However, for the string value, MySQL must perform a string-to-number conversion, which might cause an index on the id column not to be used.

  • In certain cases, MySQL can use an index for pattern-matching operations performed with the LIKE operator. This is true if the pattern begins with a literal prefix value rather than with a wildcard character. An index on a name column can be used for a pattern match like this:

    WHERE name LIKE 'de%'

    That’s because the pattern match is logically equivalent to a range search:

    WHERE name >= 'de' AND name < 'df'

    On the other hand, the following pattern makes LIKE more difficult for the optimizer:

    WHERE name LIKE '%de%'

    When a pattern starts with a wildcard character as just shown, MySQL cannot make efficient use of any indexes associated with that column. (Even if an index is used, the entire index must be scanned.)

{mospagebreak title=13.3.1 Optimizing Queries by Limiting Output}

Some optimizations can be done independently of whether indexes are used. A simple but effective technique is to reduce the amount of output a query produces.

One way to eliminate unnecessary output is by using a LIMIT clause. If you don’t need the entire result set, specify how many rows the server should return by including LIMIT in your query. This helps two ways:

  • Less information need be returned over the network to the client.

  • In many cases, LIMIT allows the server to terminate query processing earlier than it would otherwise.

Another way to limit query output is by selecting only the columns you need, rather than using SELECT * to retrieve all columns. Suppose you want information about countries having names that begin with M. The following query produces that information:

SELECT * FROM Country WHERE Name LIKE 'M%';

However, if all you really want to know is the country names, don’t write the query like that. Most of the information retrieved will be irrelevant to what you want to know, resulting in unnecessary server effort and network traffic. Instead, select only the Name column:

SELECT Name FROM Country WHERE Name LIKE 'M%';

The second query is faster because MySQL has to return less information to the client when you select just one column rather than all of them.

In addition, if an index on Name exists, you get even more improvement for two reasons:

  • The index can be used to determine quickly which Name values satisfy the condition in the WHERE clause. This is faster than scanning the entire table.

  • Depending on the storage engine, the server might not read the table rows at all. If the values requested by the query are in the index, by reading the index, MySQL already has the information that the client wants. For example, the MyISAM handler will read the index file to determine which values satisfy the query, and then return them to the client without reading the datafile at all. Doing so is faster than reading both the index file and the datafile.

{mospagebreak title=13.3.2 Optimizing Updates}

The optimizations discussed so far have been shown for SELECT statements, but optimization techniques can be used for statements that update tables, too:

  • For a DELETE or UPDATE statement that uses a WHERE clause, try to write it in a way that allows an index to be used for determining which rows to delete or update. Techniques for this were discussed earlier for SELECT statements; they apply to DELETE or UPDATE as well.

  • EXPLAIN is used with SELECT queries, but you might also find it helpful for analyzing UPDATE and DELETE queries. Write a SELECT statement that has the same WHERE clause as the UPDATE or DELETE and analyze that.

  • Use multiple-row INSERT statements instead of multiple single-row INSERT statements. For example, instead of using three single-row statements like this:

    mysql> INSERT INTO t (id, name)
    VALUES(1,'Bea');
    mysql> INSERT INTO t (id, name)
    VALUES(2,'Belle');
    mysql> INSERT INTO t (id, name)
    VALUES(3,'Bernice');

    You could use a single multiple-row statement that does the same thing:

    mysql> INSERT INTO t (id, name)
    VALUES(1,'Bea'),(2,'Belle'),(3,'Bernice');

    The multiple-row statement is shorter, which is less information to send to the server. More important, it allows the server to perform all the updates at once and flush the index a single time, rather than after each of the individual inserts. This optimization can be used with any storage engine.

    If you’re using an InnoDB table, you can get better performance even for single-row statements by grouping them within a transaction rather than by executing them with autocommit mode enabled:

    mysql> BEGIN;
    mysql> INSERT INTO t (id, name)
    VALUES(1,'Bea');
    mysql> INSERT INTO t (id, name)
    VALUES(2,'Belle');
    mysql> INSERT INTO t (id, name)
    VALUES(3,'Bernice');
    mysql> COMMIT;

    Using a transaction allows InnoDB to flush the changes at commit time. In autocommit mode, InnoDB flushes the changes after each insert.

  • For any storage engine, LOAD DATA INFILE is even faster than multiple-row INSERT statements. For MyISAM in particular, if you’re loading an empty table, MySQL will even automatically disable updating nonunique indexes during the load operation to speed it up more.

  • To replace existing rows, use REPLACE rather than DELETE plus INSERT.

{mospagebreak title=13.3.3 Using Scheduling Modifiers}

For an application that uses MyISAM tables, you can change the priority of statements that retrieve or modify data. This can be useful in situations where the normal scheduling priorities do not reflect the application’s requirements.

Consider an application consisting of a logging process that uses INSERT statements to record information in a log table, and a summary process that periodically issues SELECT queries to generate reports from the log table. Normally, the server will give updates to the table priority over retrievals, so at times of heavy logging activity, report generation might be delayed. If the application places high importance on having the summary process execute as quickly as possible, it can use scheduling modifiers to alter the usual query priorities. Two approaches are possible:

  • To elevate the priority of the summary queries, use SELECT HIGH_PRIORITY rather than SELECT with no modifier. This will move the SELECT ahead of pending INSERT statements that have not yet begin to execute.

  • To reduce the priority of record logging statements, use INSERT with either the LOW_PRIORITY or DELAYED modifier.

Scheduling modifiers are covered further in section 14.3.2, “Query Scheduling Modifiers.”

Please check back next week for the conclusion of this article.

[gp-comments width="770" linklove="off" ]

chat