HomeMySQL Page 3 - SQL Performance and Tuning Considerations
General RDBMS Considerations - MySQL
This article, the first of two parts, will show you how to make the SQL statements you write run faster and more efficiently. It is excerpted from chapter 11 of the book SQL DeMYSTiFied, written by Andy Oppel (McGraw-Hill/Osborne, 2005; ISBN: 0072262249).
This section covers design and tuning considerations that apply to most SQL implementations. As always, consult the documentation for the DBMS you are using for applicability information.
Know Your Optimizer
The query optimizer is the software component in the RDBMS that analyzes an SQL statement to determine the best way to execute it. Most modern optimizers are cost-based, which means that they estimate the cost of all possible ways to execute a statement and then chose the one with the lowest cost. An important component of cost-based optimizers is statistics gathered from the database, such as the number of rows in each table and the number of unique values each indexed column has. However, some optimizers are rule-based, which means that they apply a set of rules to the SQL statement instead of cost estimates when deciding how to best execute it. A few optimizers such as Oracle's allow a choice of either cost-based or rule-based. Here are some considerations regarding query optimizers:
Order of table names Does the order of the table names in the FROM or JOIN clause have any influence on the order in which tables are accessed when performing the joins? This is more likely the case with a rule-based optimizer. Ideally, the DBMS should access the most selective table (the one that will eliminate the most number of rows from the result set) first. For rule-based optimizers, you may find some surprising differences. For example, Oracle version 7 processed the table name list from right to left, but this was changed to left to right order starting with version 8.
Order of search predicates Does the order of the predicates in the WHERE clause have any influence on the order in which the predicates are evaluated? Ideally, the most restrictive predicate (the one that eliminates the most number of rows) should be evaluated first.
Lack of statistics If a cost-based optimizer is being used, what does it do when statistics have not been collected for one or more tables? Some optimizers, such as Oracle's, revert back to rule-based, while others assume default values for the required statistics or simply refuse to use any indexes and do full table scans of all the tables. A full table scan is where the DBMS reads every row in the table to find the desired ones, which of course can be a performance disaster for tables with very large numbers of rows.
Query rewrites Are queries rewritten into more efficient forms by the optimizer? For example, many optimizers automatically rewrite subselects into equivalent joins in order to simplify subsequent processing. In some cases, you may find that certain DBMS options must be enabled in order to allow the optimizer to rewrite queries.
View definition merges For queries that use views, at what point does the DBMS merge the view definition (the query that defines the view) into the SQL statement submitted by the database user? This has obvious implications for the optimizer -the sooner it can evaluate the entire SQL statement, the smarter its decision should be.
Other criteria What other criteria influence the optimizer? For example, some optimizers will favor unique indexes over nonunique ones, and some will favor the use of an index to sequence rows over sorting the result set.