This chapter covers the MySQL architecture, locking and concurrency, and transactions. It also discusses how to select the right engine and looks at each of MySQL's storage engines in detail. (From the book High Performance MYSQL: Optimization, Backups, Replication and Load Balancing, by Jeremy Zawodny and Derek Balling, ISBN: 0596-003064, O'Reilly Media, 2004.)
These issues may seem rather abstract without some sort of real-world context. So let’s consider some common uses for tables in various database applications. For each table, we’ll look at which engine best matches with the table’s needs. The details of each engine are covered in the next section.
Suppose you want to use MySQL to log a record of every telephone call from a central telephone switch in real time. Or maybe you’ve installed mod_log_sql for Apache so you can log all visits to your web site directly in a table. In such an application, speed is probably the most important goal; you don’t want the database to be the bottleneck. Using MyISAM tables works very well because they have very low overhead and can handle inserting thousands of records per second.
Things will get interesting if you decide it’s time to start running reports to summarize the data you’ve logged. Depending on the queries you use, there’s a good chance you will significantly slow the process of inserting records while gathering data for the report. What can you do?
You can use MySQL’s built-in replication (Chapter 7) to clone the data onto a second (slave) server. You can then run your time-and CPU-intensive queries against the data on the slave. This keeps the master free to insert records as fast as it possibly can while also giving you the freedom to run any query you want without worrying about how it could affect the real-time logging.
Another option is to use a MyISAM Merge table. Rather than always logging to the same table, adjust the application to log to a table that contains the name or number of the month in its name, such as web_logs_2004_01 or web_logs_2004_jan. Then define a Merge table that contains the data you’d like to summarize and use it in your queries. If you need to summarize data daily or weekly, the same strategy works; you just need to create tables with more specific names, such as web_logs_ 2004_01_01. While you’re busy running queries against tables that are no longer being written to, your application can log records to its current table uninterrupted. Merge tables are discussed in the later section “MyISAM Merge tables.”
A final possibility is simply to switch to using a table that has more granular locking than MyISAM does. Either BDB or InnoDB works well in this case. Non-MyISAM tables will generally use more CPU and disk space, but that may be a reasonable tradeoff in this case. Also, in the event of a crash, MyISAM tables may take quite a long time to check and repair while InnoDB tables should recover quickly.
Read-only or read-mostly tables
Tables that contain the data used to construct a catalog or listing of some sort (jobs, auctions, real estate, etc.) are usually read from far more often than they are written to. This makes them great candidates for MyISAM.
When you deal with any sort of order processing, transactions are a requirement. Half-completed orders aren’t going to endear customers to your service. Using trans-action-safe table types (InnoDB or BDB), these unfortunate “data surprises” can be avoided. Considering that BDB tables use—at best—locking at the page level, applications with high transaction volumes should consider InnoDB tables.
In the case of order processing, InnoDB has a distinct advantage because it supports referential integrity through the use of foreign keys. These keys allow a field in one table to have an enforced relationship to the key of another table (e.g., an Order record contains a CustomerID field that “points” to the primary key of the Customer table). Foreign keys effectively point to those other tables and indicate that data is maintained in them, and they help you keep data consistent across your tables. (Keep in mind that a foreign key in an InnoDB table must reference another InnoDB table. Currently they can’t cross storage engines.)
You might want to design your tables so that customers can’t be removed without also removing all their orders. Or maybe you’d like to ensure that products aren’t deleted from the catalog table before the orders that reference those products are archived. With InnoDB’s foreign keys, you can.
If you’re collecting stock quotes for your own analysis, MyISAM tables work great. However, if you’re running a high-traffic web service that has a real-time quote feed and thousands of users, a query should never have to wait. At any time, there could be many clients attempting to read and write to the table, so the row-level locking provided by InnoDB tables is the way to go.
If you have sufficient memory, MySQL’s in-memory Heap tables might be an option, too. However, their indexes have some interesting restrictions you need to investigate first. See the “Heap Tables” section in Chapter 4 for more details.
Bulletin boards and threaded discussion forums
Threaded discussions are an interesting problem for MySQL users. There are hundreds of freely available PHP and Perl-based systems available that provide threaded discussions. Many of them aren’t written with database efficiency in mind, so they tend to perform a large number of queries for each request they serve, as well as updates to counters and usage statistics about the various discussions. Many of the systems also use a small number of monolithic tables to store all their data. As a result, a few central tables become the focus of heavy read and write activity, and the locks required to enforce concurrency become a substantial source of contention.
Despite their design shortcomings, most of the systems work well for small and medium loads. However, if a web site grows large enough and generates a significant amount of traffic, it may begin to get very slow. The obvious solution is to switch to a different table type that can handle the heavy read/write volume. Users who have attempted this are sometimes surprised to find that the system runs even more slowly than it did before!
What they don’t realize is that the system is using a particular query, normally something like:
SELECT COUNT(*) FROM table WHERE ...
The problem is that not all engines can run that query quickly. MyISAM tables keep accurate row counts available, so they can. But BDB and InnoDB must actually scan the data to count all the rows. The developers of the popular web site Slashdot (http://slashdot.org/) ran into this problem when they moved their system from MyISAM to InnoDB tables. They spent time going through their code to eliminate all those queries.
MySQL’s query cache, which we’ll cover in more detail in Chapter 5, can often be a big help in situations in which an application issues the same query over and over with the same parameters.
If you ever need to distribute a CD-ROM-or DVD-ROM-based application that uses MySQL data files, consider using MyISAM or Compressed MyISAM tables. They can be easily isolated and copied to other media. Compressed MyISAM tables take far less space than uncompressed ones, but they are read-only. Since the data is going to be on read-only media anyway, there’s little reason not to use compressed tables.
If you've enjoyed what you've seen here, or to get more information, click on the "Buy the book!" graphic. Pick up a copy today!