MySQL Optimization, part 1 (
Page 1 of 8 )
While optimization is possible with limited knowledge of your system or application, the more you know about your system, the better your optimization will be. This article, the first of two parts, covers some of the different points you will need to know for optimizing MySQL. It is excerpted from chapter six of the book MySQL Administrator's Guide, by MySQL AB (Sams, 2004; ISBN: 0672326345)
Optimization is a complex task because ultimately it requires understanding
of the entire system to be optimized. Although it may be possible to perform
some local optimizations with little knowledge of your system or application,
the more optimal you want your system to become, the more you will have to know
about it.
This chapter tries to explain and give some examples of different ways to
optimize MySQL. Remember, however, that there are always additional ways to make
the system even faster, although they may require increasing effort to
achieve.
6.1 Optimization Overview
The most important factor in making a system fast is its basic design. You
also need to know what kinds of things your system will be doing, and what your
bottlenecks are.
The most common system bottlenecks are:
-
Disk seeks. It takes time for the disk to find a piece of data. With modern
disks, the mean time for this is usually lower than 10ms, so we can in theory do
about 100 seeks a second. This time improves slowly with new disks and is very
hard to optimize for a single table. The way to optimize seek time is to
distribute the data onto more than one disk.
-
Disk reading and writing. When the disk is at the correct position, we need
to read the data. With modern disks, one disk delivers at least 10-20MB/s
throughput. This is easier to optimize than seeks because you can read in
parallel from multiple disks.
-
CPU cycles. When we have the data in main memory (or if it was already
there), we need to process it to get our result. Having small tables compared to
the amount of memory is the most common limiting factor. But with small tables,
speed is usually not the problem.
-
Memory bandwidth. When the CPU needs more data than can fit in the CPU cache,
main memory bandwidth becomes a bottleneck. This is an uncommon bottleneck for
most systems, but one to be aware of.
6.1.1 MySQL Design Limitations and Tradeoffs
When using the MyISAM storage engine, MySQL uses extremely fast
table locking that allows multiple readers or a single writer. The biggest
problem with this storage engine occurs when you have a steady stream of mixed
updates and slow selects on a single table. If this is a problem for certain
tables, you can use another table type for them. See Chapter 8, "MySQL Storage
Engines and Table Types."
MySQL can work with both transactional and non-transactional tables. To be
able to work smoothly with non-transactional tables (which can't roll back if
something goes wrong), MySQL has the following rules:
-
All columns have default values.
-
If you insert a "wrong" value in a column, such as a too-large numerical
value into a numerical column, MySQL sets the column to the "best possible
value" instead of giving an error. For numerical values, this is 0, the smallest
possible value, or the largest possible value. For strings, this is either the
empty string or the longest possible string that can be stored in the
column.
-
All calculated expressions return a value that can be used instead of
signaling an error condition. For example, 1/0 returns
NULL.
The implication of these rules is that you should not use MySQL to check
column content. Instead, you should check values within your application before
storing them in the database.
6.1.2 Designing Applications for Portability
Because all SQL servers implement different parts of standard SQL, it takes
work to write portable SQL applications. It is very easy to achieve portability
for very simple selects and inserts, but becomes more difficult the more
capabilities you require. If you want an application that is fast with many
database systems, it becomes even harder!
To make a complex application portable, you need to determine which SQL
servers it must work with, then determine what features those servers
support.
All database systems have some weak points. That is, they have different
design compromises that lead to different behavior.
You can use the MySQL crash-me program to find functions, types, and
limits that you can use with a selection of database servers. crash-me
does not check for every possible feature, but it is still reasonably
comprehensive, performing about 450 tests.
An example of the type of information crash-me can provide is that
you shouldn't have column names longer than 18 characters if you want to be able
to use Informix or DB2.
The crash-me program and the MySQL benchmarks are all very database
independent. By taking a look at how they are written, you can get a feeling for
what you have to do to make your own applications database independent. The
programs can be found in the sql-bench directory of MySQL source
distributions. They are written in Perl and use the DBI database interface. Use
of DBI in itself solves part of the portability problem because it provides
database-independent access methods.
For crash-me results, visit http://dev.mysql.com/tech-resources/crash-me.php. See http://dev.mysql.com/tech-resources/benchmarks/ for the
results from the benchmarks.
If you strive for database independence, you need to get a good feeling for
each SQL server's bottlenecks. For example, MySQL is very fast in retrieving and
updating records for MyISAM tables, but will have a problem in mixing
slow readers and writers on the same table. Oracle, on the other hand, has a big
problem when you try to access rows that you have recently updated (until they
are flushed to disk). Transactional databases in general are not very good at
generating summary tables from log tables, because in this case row locking is
almost useless.
To make your application really database independent, you need to
define an easily extendable interface through which you manipulate your data.
Because C++ is available on most systems, it makes sense to use a C++
class-based interface to the databases.
If you use some feature that is specific to a given database system (such as
the REPLACE statement, which is specific to MySQL), you should
implement the same feature for other SQL servers by coding an alternative
method. Although the alternative may be slower, it will allow the other servers
to perform the same tasks.
With MySQL, you can use the /*! */ syntax to add MySQL-specific
keywords to a query. The code inside /**/ will be treated as a comment
(and ignored) by most other SQL servers.
If high performance is more important than exactness, as in some Web
applications, it is possible to create an application layer that caches all
results to give you even higher performance. By letting old results "expire"
after a while, you can keep the cache reasonably fresh. This provides a method
to handle high load spikes, in which case you can dynamically increase the cache
and set the expiration timeout higher until things get back to normal.
In this case, the table creation information should contain information of
the initial size of the cache and how often the table should normally be
refreshed.
An alternative to implementing an application cache is to use the MySQL query
cache. By enabling the query cache, the server handles the details of
determining whether a query result can be reused. This simplifies your
application. See Section 4.10, "The MySQL Query Cache."
6.1.3 What We Have Used MySQL For
This section describes an early application for MySQL.
During MySQL initial development, the features of MySQL were made to fit our
largest customer, which handled data warehousing for a couple of the largest
retailers in Sweden.
From all stores, we got weekly summaries of all bonus card transactions, and
were expected to provide useful information for the store owners to help them
find how their advertising campaigns were affecting their own customers.
The volume of data was quite huge (about seven million summary transactions
per month), and we had data for 4-10 years that we needed to present to the
users. We got weekly requests from our customers, who wanted to get "instant"
access to new reports from this data.
We solved this problem by storing all information per month in compressed
"transaction" tables. We had a set of simple macros that generated summary
tables grouped by different criteria (product group, customer id, store, and so
on) from the tables in which the transactions were stored. The reports were Web
pages that were dynamically generated by a small Perl script. This script parsed
a Web page, executed the SQL statements in it, and inserted the results. We
would have used PHP or mod_perl instead, but they were not available at the
time.
For graphical data, we wrote a simple tool in C that could process SQL query
results and produce GIF images based on those results. This tool also was
dynamically executed from the Perl script that parsed the Web pages.
In most cases, a new report could be created simply by copying an existing
script and modifying the SQL query in it. In some cases, we needed to add more
columns to an existing summary table or generate a new one, but this also was
quite simple because we kept all transaction-storage tables on disk. (This
amounted to about 50GB of transaction tables and 200GB of other customer
data.)
We also let our customers access the summary tables directly with ODBC so
that the advanced users could experiment with the data themselves.
This system worked well and we had no problems handling the data with quite
modest Sun Ultra SPARCstation hardware (2x200MHz). Eventually the system was
migrated to Linux.
This article is excerpted from MySQL
Administrator's Guide, by MySQL AB (editor) (Sams, 2004;
ISBN 0672326345). Check it out at your
favorite bookstore today. Buy
this book now. |