Analyzing Queries for Speed with EXPLAIN

When you are trying to optimize your queries to run quickly and efficiently, you may encounter queries that really should run faster. That’s where EXPLAIN comes in handy. This article shows you how to use EXPLAIN in query analysis. It is excerpted from chapter 13 of the MySQL Certification Guide, written by Paul Dubois et al. (Sams, 2005; ISBN: 0672328127).

13.2 Using EXPLAIN to Analyze Queries

When a SELECT query does not run as quickly as you think it should, use the EXPLAIN statement to ask the MySQL server for information about how the query optimizer processes the query. This information is useful in several ways:

  • EXPLAIN can provide information that points out the need to add an index.

  • If a table already has indexes, you can use EXPLAIN to find out whether the optimizer is using them. (To see what indexes a table has, use SHOW INDEX, as described in section 13.1.2, “Obtaining Table Index Information.”)

  • If indexes exist but aren’t being used, you can try writing a query different ways. EXPLAIN can tell you whether the rewrites are better for helping the server use the available indexes.

When using EXPLAIN to analyze a query, it’s helpful to have a good understanding of the tables involved. If you need to determine a table’s structure, remember that you can use DESCRIBE to obtain information about a table’s columns, and SHOW INDEX for information about its indexes.

This section describes how EXPLAIN works. Later in the chapter, section 13.3, “General Query Enhancement,” discusses some general query-writing principles that help MySQL use indexes more effectively. You can apply those principles in conjunction with EXPLAIN to determine the best way of writing a query.

13.2.1 Identifying Candidates for Query Analysis

EXPLAIN can be used to analyze any SELECT query, but some query performance characteristics make it especially likely that EXPLAIN will be helpful:

  • When a query that you issue (for example, using the mysql client) clearly takes a long time.

  • When a query appears in the slow query log, particularly if it appears consistently each time it is issued.

Recognize that “slow” can be a relative term. You don’t want to waste time trying to optimize a query that seems slow but is so only for external reasons and is not inherently slow:

  • Queries in the slow log are determined to be slow using wallclock time. Queries will appear more often in the log when the server host is heavily loaded than when it is not, so you should evaluate query execution time against general system activity on that host.

  • A query might appear slow if the machine is very busy, but otherwise perform acceptably. For example, if filesystem backups are taking place, they’ll incur heavy disk activity that impedes the performance of other programs, including the MySQL server. The machine might be processing a heavy load for other reasons, such as if you have a very active Web server running on the same host.

Keeping in mind the preceding considerations, you have a good indicator that a query might be in need of being optimized if you find that it is consistently slow in comparison to other queries no matter when you run it, and you know the machine isn’t just generally bogged down all the time.

Another factor to recognize is that the mere presence of a query in the slow query log does not necessarily mean that the query is slow. If the server is run with the –long-log-format option, the slow query log also will contain queries that execute without using any index. In some cases, such a query may indeed be a prime candidate for optimization (for example, by adding an index). But in other cases, MySQL might elect not to use an existing index simply because a table is so small that scanning all of its rows is just as fast as using an index.

The SHOW PROCESSLIST statement is another useful source of information about query execution. Use it periodically to get information about what queries currently are running. If you notice that a particular query often seems to be causing a backlog by making other queries block, see whether you can optimize it. If you’re successful, it will alleviate the backlog. To get the most information from SHOW PROCESSLIST, you should have the PROCESS privilege. Then the statement will display queries being run by all clients, not just your own queries.

{mospagebreak title=13.2.2 How EXPLAIN Works}

To use EXPLAIN, write your SELECT query as you normally would, but place the keyword EXPLAIN in front of it. As a very simple example, take the following statement:

SELECT 1;

To see what EXPLAIN will do with it, issue the statement like this:

mysql> EXPLAIN SELECT 1;
+----------------+
| Comment        |
+----------------+
| No tables used |
+----------------+

In practice, it’s unlikely that you’d use EXPLAIN very often for a query like that because the output doesn’t tell you anything particularly interesting. However, the example does illustrate an important principle: EXPLAIN can be applied to any SELECT query. One of the implications of this principle is that you can use EXPLAIN with simple queries while you’re learning how to use it and how to interpret its results. You don’t have to begin with a complicated multiple-table join.

With that in mind, consider these two simple single-table queries:

SELECT * FROM Country WHERE Name = 'France';
SELECT * FROM Country WHERE Code = 'FRA';

Both queries produce the same output (information about the country of France), but they are not equally efficient. How do you know? Because EXPLAIN tells you so. When you use EXPLAIN with each of the two queries, it provides the following information about how the MySQL optimizer views them:

mysql> EXPLAIN SELECT * FROM Country WHERE Name =
'France'G ********************* 1. row *************************** table: Country type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 239 Extra: Using where mysql> EXPLAIN SELECT * FROM Country WHERE Code =
'FRA'G ********************* 1. row *************************** table: Country type: const possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: const rows: 1 Extra:

EXPLAIN produces several columns of information. In the example just shown, NULL in the possible_keys and key columns shows for the first query that no index is considered available or usable for processing the query. For the second query, the table’s PRIMARY KEY column (the Code column that contains three-letter country codes) can be used, and is in fact the one that the optimizer would choose. The rows column of the EXPLAIN output shows the effect of this difference. Its value indicates the number of rows that MySQL estimates it will need to examine while processing the query:

  • For the first query, the value is 239, which happens to be the number of rows in the Country table. This value indicates that MySQL would scan all rows of the table, which is inefficient.

  • For the second query, only one row need be examined. This is because MySQL can use the table’s primary key to go directly to the single relevant row.

This example briefly indicates the kind of useful information that EXPLAIN can provide, even for simple queries. The conclusion to draw is that, if possible, you should use the Code column rather than the Name column to look up Country table records. However, the real power of EXPLAIN lies in what it can tell you about joins—SELECT queries that use multiple tables.

EXPLAIN is especially important for join analysis because they have such enormous potential to increase the amount of processing the server must do. If you select from a table with a thousand rows, the server might need to scan all one thousand rows in the worst case. But if you perform a join between two tables with a thousand rows each, the server might need to examine every possible combination of rows, which is one million combinations. That’s a much worse worst case. EXPLAIN can help you reduce the work the server must do to process such a query, so it’s well worth using.

{mospagebreak title=13.2.3 Analyzing a Query}

The following example demonstrates how to use EXPLAIN to analyze and optimize a sample query. The purpose of the query is to answer the question, “Which cities have a population of more than eight million?” and to display for each city its name and population, along with the country name. This question could be answered using only city information, except that to get each country’s name rather than its code, city information must be joined to country information.

The example uses tables created from world database information. Initially, these tables will have no indexes, so EXPLAIN will show that the query is not optimal. The example then adds indexes and uses EXPLAIN to determine the effect of indexing on query performance.

Begin by creating the initial tables, CountryList and CityList. These are derived from the Country and City tables, but need contain only the columns involved in the query:

mysql> CREATE TABLE CountryList
  -> SELECT Code, Name FROM Country;
Query OK, 239 rows affected (0.04 sec)
mysql> CREATE TABLE CityList
 -> SELECT CountryCode, Name, Population FROM City;
Query OK, 4079 rows affected (0.04 sec)

The query that retrieves the desired information in the required format looks like this:

mysql> SELECT CountryList.Name, CityList.Name,
CityList.Population -> FROM CountryList, CityList -> WHERE CountryList.Code = CityList.CountryCode -> AND CityList.Population > 8000000;
+--------------------+------------------+------------+ | Name | Name | Population | +--------------------+------------------+------------+ | Brazil | Saõ Paulo | 9968485 | | Indonesia | Jakarta | 9604900 | | India | Mumbai (Bombay) | 10500000 | | China | Shanghai | 9696300 | | South Korea | Seoul | 9981619 | | Mexico | Ciudad de México | 8591309 | | Pakistan | Karachi | 9269265 | | Turkey | Istanbul | 8787958 | | Russian Federation | Moscow | 8389200 | | United States | New York | 8008278 | +--------------------+------------------+------------+

While the tables are in their initial unindexed state, applying EXPLAIN to the query yields the following result:

mysql> EXPLAIN SELECT CountryList.Name,
CityList.Name, CityList.Population -> FROM CountryList, CityList -> WHERE CountryList.Code = CityList.CountryCode -> AND CityList.Population > 8000000G
********************** 1. row *************************** table: CountryList type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 239 Extra: ********************** 2. row *************************** table: CityList type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 4079 Extra: Using where

The information displayed by EXPLAIN shows that no optimizations could be made:

  • The type value in each row shows how MySQL will read the corresponding table. For CountryList, the value of ALL indicates a full scan of all rows. For CityList, the value of ALL indicates a scan of all its rows to find a match for each CountryList row. In other words, all combinations of rows will be checked to find country code matches between the two tables.

  • The number of row combinations is given by the product of the rows values, where rows represents the optimizer’s estimate of how many rows in a table it will need to check at each stage of the join. In this case, the product is 239 * 4,079 or 974,881.

EXPLAIN shows that MySQL would need to check nearly a million row combinations to produce a query result that contains only 10 rows. Clearly, this query would benefit from the creation of indexes that allow the server to look up information faster.

Good columns to index typically are those that you use for searching, grouping, or sorting records. The query does not have any GROUP BY or ORDER BY clauses, but it does use columns for searching. Specifically:

  • The query uses CountryList.Code and CityList.CountryCode to match records between tables.

  • The query uses CityList.Population to cull records that do not have a large enough population.

To see the effect of indexing, try creating indexes on the columns used to join the tables. In the CountryList table, Code is a primary key that uniquely identifies each row. Add the index using ALTER TABLE:

mysql> ALTER TABLE CountryList ADD PRIMARY KEY
(Code);

In the CityList table, CountryCode is a nonunique index because multiple cities can share the same country code:

mysql> ALTER TABLE CityList ADD INDEX (CountryCode);

After creating the indexes, EXPLAIN reports a somewhat different result:

mysql> EXPLAIN SELECT CountryList.Name,
CityList.Name, CityList.Population -> FROM CountryList, CityList -> WHERE CountryList.Code = CityList.CountryCode -> AND CityList.Population > 8000000G
********************** 1. row *************************** table: CityList type: ALL possible_keys: CountryCode key: NULL key_len: NULL ref: NULL rows: 4079 Extra: Using where ********************** 2. row *************************** table: CountryList type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: CityList.CountryCode rows: 1 Extra:

Observe that EXPLAIN now lists the tables in a different order. CityList appears first, which indicates that MySQL will read rows from that table first and use them to search for matches in the second table, CountryList. The change in table processing order reflects the optimizer’s use of the index information that is now available for executing the query.

MySQL still will scan all rows of the CityList table (its type value is ALL), but now the server can use each of those rows to directly look up the corresponding CountryList row. This is seen by the information displayed for the CountryList table:

  • The type value of eq_ref indicates that an equality test is performed by referring to the column named in the ref field, CityList.CountryCode.

  • The possible_keys value of PRIMARY shows that the optimizer sees the primary key as a candidate for optimizing the query, and the key field indicates that it will actually use the primary key when executing the query.

The result from EXPLAIN shows that indexing CountryList.Code as a primary key improves query performance. However, it still indicates a full scan of the CityList table. The optimizer sees that the index on CountryCode is available, but the key value of NULL indicates that it will not be used. Does that mean the index on the CountryCode column is of no value? It depends. For this query, the index is not used. In general, however, it’s good to index joined columns, so you likely would find for other queries on the CityList table that the index does help.

The product of the rows now is just 4,079. That’s much better than 974,881, but perhaps further improvement is possible. The WHERE clause of the query restricts CityList rows based on their Population values, so try creating an index on that column:

mysql> ALTER TABLE CityList ADD INDEX (Population);

After creating the index, run EXPLAIN again:

mysql> EXPLAIN SELECT CountryList.Name,
CityList.Name, CityList.Population -> FROM CountryList, CityList -> WHERE CountryList.Code = CityList.CountryCode -> AND CityList.Population > 8000000G
*********************** 1. row *************************** table: CityList type: range possible_keys: CountryCode,Population key: Population key_len: 4 ref: NULL rows: 78 Extra: Using where *********************** 2. row *************************** table: CountryList type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: CityList.CountryCode rows: 1 Extra:

The output for the CountryList table is unchanged compared to the previous step. That is not a surprise; MySQL already found that it could use a primary key for lookups, which is very efficient. On the other hand, the result for the CityList table is different. The optimizer now sees two indexes in the table as candidates. Furthermore, the key value shows that it will use the index on Population to look up records. This results in an improvement over a full scan, as seen in the change of the rows value from 4,079 to 78.

The query now is optimized. Note that the product of the rows values, 78, still is larger than the actual number of rows produced by the query (10 rows). This is because the rows values are only estimates. The optimizer cannot give an exact count without actually executing the query.

To summarize:

  • With unindexed tables, the rows product was 974,881.

  • After indexing the join columns, the rows product dropped to 4,079, a 99.6% improvement.

  • After indexing the Population column, the rows product dropped to 78, a further improvement of 98.1% over the previous step.

The example shows that using indexes effectively can substantially reduce the work required by the server to execute a query, and that EXPLAIN is a useful tool for assessing the effect of indexing.

{mospagebreak title=13.2.4 EXPLAIN Output Columns}

To use EXPLAIN productively, it’s important to know the meaning of the columns in each row of output that it produces:

  • table is the name of the table to which the information in the row applies. EXPLAIN produces one row of output for each table named in the SELECT query. The order of the tables named in these rows indicates the order in which MySQL will read the tables to process the query. This is not necessarily the order in which you name them in the FROM clause, because the optimizer attempts to determine which order will result in the most efficient processing. The example in the preceding section shows this: The table order displayed by EXPLAIN changed as indexes were added.

  • type indicates the join type. The value is a measure of how efficiently MySQL can scan the table. The possible type values are described later in this section.

  • possible_keys indicates which of the table’s indexes MySQL considers to be candidates for identifying rows that satisfy the query. This value can be a list of one or more index names, or NULL if there are no candidates. The word PRIMARY indicates that MySQL considers the table’s primary key to be a candidate.

  • key indicates the optimizer’s decision about which of the candidate indexes listed in possible_keys will yield most efficient query execution. If the key value is NULL, it means no index was chosen. This might happen either because there were no candidates or because the optimizer believes it will be just as fast to scan the table rows as to use any of the possible indexes. A table scan might be chosen over an index scan if the table is small, or because the index would yield too high a percentage of the rows in the table to be of much use.

  • key_len indicates how many bytes of index rows are used. From this value, you can derive how many columns from the index are used. For example, if you have an index consisting of three INT columns, each index row contains three 4-byte values. If key_len is 12, you know that the optimizer uses all three columns of the index when processing the query. If key_len is 4 or 8, it uses only the first one or two columns (that is, it uses a leftmost prefix of the index).

    If you’ve indexed partial values of string columns, take that into account when assessing the key_len value. Suppose that you have a composite index on two CHAR(8) columns that indexes only the first 4 bytes of each column. In this case, a key_len value of 8 means that both columns of the index would be used, not just the first column.

  • ref indicates which indexed column or columns are used to choose rows from the table. const means key values in the index are compared to a constant expression, such as in Code='FRA'. NULL indicates that neither a constant nor another column are being used, indicating selection by an expression or range of values. It might also indicate that the column does not contain the value specified by the constant expression. If neither NULL nor const is specified, a table_name.column_name combination will be shown, indicating that the optimizer is looking at column_name in the rows returned from table_name to identify rows for the current table.

  • rows is the optimizer’s estimate of how many rows from the table it will need to examine. The value is an approximation because, in general, MySQL cannot know the exact number of rows without actually executing the query. For a multiple-table query, the product of the rows values is an estimate of the total number of row combinations that need to be read. This product gives you a rough measure of query performance. The smaller the value, the better.

  • Extra provides other information about the join. The possible values are described later in this section.

The type value indicates the join type, but joins may be performed with varying degrees of efficiency. The type value provides a measure of this efficiency by indicating the basis on which rows are selected from each table. The following list shows the possible values, from the best type to the worst:

  • system

    The table has exactly one row.

  • const

    The table has exactly one matching row. This type value is similar to system, except that the table may have other, nonmatching rows. The EXPLAIN output from the query with WHERE Code='FRA' is an example of this:

    mysql> EXPLAIN SELECT * FROM Country WHERE
    Code = 'FRA'G *********************** 1. row *************************** table: Country type: const possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: const rows: 1 Extra:

    The query has a type value of const because only one row out of all its rows need be read. If the table contained only the row for France, there would be no nonmatching rows and the type value would be system rather than const.

    For both system and const, because only one row matches, any columns needed from it can be read once and treated as constants while processing the rest of the query.

  • eq_ref

    Exactly one row is read from the table for each combination of rows from the tables listed earlier by EXPLAIN. This is common for joins where MySQL can use a primary key to identify table rows.

  • ref

    Several rows may be read from the table for each combination of rows from the tables listed earlier by EXPLAIN. This is similar to eq_ref, but can occur when a nonunique index is used to identify table rows or when only a leftmost prefix of an index is used. For example, the CountryLanguage table has a primary key on the CountryCode and Language columns. If you search using only a CountryCode value, MySQL can use that column as a leftmost prefix, but there might be several rows for a country if multiple languages are spoken there.

  • range

    The index is used to select rows that fall within a given range of index values. This is common for inequality comparisons such as id<10.

  • index

    MySQL performs a full scan, but it scans the index rather than the data rows. An index scan is preferable: The index is sorted and index rows usually are shorter than data rows, so index rows can be read in order and more of them can be read at a time.

  • ALL

    A full table scan of all data rows. Typically, this indicates that no optimizations are done and represents the worst case. It is particularly unfortunate when tables listed later in EXPLAIN output have a join type of ALL because that indicates a table scan for every combination of rows selected from the tables processed earlier in the join.

The Extra column provides additional information about how the table is processed. Some values indicate that the query is efficient:

  • Using index

    MySQL can optimize the query by reading values from the index without having to read the corresponding data rows. This optimization is possible when the query selects only columns that are in the index.

  • Where used

    MySQL uses a WHERE clause to identify rows that satisfy the query. Without a WHERE clause, you get all rows from the table.

  • Distinct

    MySQL reads a single row from the table for each combination of rows from the tables listed earlier in the EXPLAIN output.

  • Not exists

    MySQL can perform a LEFT JOIN “missing rows” optimization that quickly eliminates rows from consideration.

By contrast, some Extra values indicate that the query is not efficient:

  • Using filesort

    Rows that satisfy the query must be sorted, which adds an extra processing step.

  • Using temporary

    A temporary table must be created to process the query.

  • Range checked for each record

    MySQL cannot determine in advance which index from the table to use. For each combination of rows selected from previous tables, it checks the indexes in the table to see which one will be best. This is not great, but it’s better than using no index at all.

Using filesort and Using temporary generally are the two indicators of worst performance.

To use EXPLAIN for query analysis, examine its output for clues to ways the query might be improved. Make the change, and then run EXPLAIN again to see how its output changes. Changes might involve rewriting the query or changing the structure of your tables.

The following query rewriting techniques can be useful:

  • If the keys value is NULL even when there are indexes available, you can try adding a USE INDEX option as a hint to the optimizer which index is relevant for the query. To force MySQL to use the index, use FORCE INDEX. To tell MySQL to ignore an index that it chose and choose a different one instead, use IGNORE INDEX. Each of these options is used in the FROM clause, following the table name containing the index you want to control. The option is followed by parentheses containing a comma-separated list of one or more index names. PRIMARY means the table’s primary key.

    SELECT Name FROM CountryList USE
    INDEX(PRIMARY) WHERE Code > 'M'; SELECT Name FROM CountryList IGNORE
    INDEX(Population) WHERE Code < 'B' AND Population > 50000000;

    The keyword KEY may be used instead of INDEX in all three options.

  • If you want to force MySQL to join tables in a particular order, begin the query with SELECT STRAIGHT_JOIN rather than SELECT, and then list the tables in the desired order in the FROM clause.

  • Sometimes a table in a query has an index available, but the query is written in such a way that prevents the index from being used. If you can rewrite the query into an equivalent form that allows use of the index, do so. Some rewriting techniques you can use are given in section 13.3, “General Query Enhancement.”

Another way to provide the optimizer with better information on which to base its decisions is to change the structure of your tables:

  • If the possible_keys value is NULL in the output from EXPLAIN, it means MySQL finds no applicable index for processing the query. See whether an index can be added to the columns that identify which records to retrieve. For example, if you perform a join by matching a column in one table with a column in another, but neither of the columns are indexed, try indexing them.

  • Keep table index statistics up-to-date to help MySQL choose optimal indexes. If the table is a MyISAM table (or an InnoDB table, as of MySQL 4.0.13), you can update its statistics with the ANALYZE TABLE statement. As a table’s contents change, the statistics go out of date and become less useful to the optimizer in making good decisions about query execution strategies. You should run ANALYZE TABLE more frequently for tables that change often than for those that are updated rarely.

Please check back next week for the continuation of this article.

Google+ Comments

Google+ Comments