This chapter covers the MySQL architecture, locking and concurrency, and transactions. It also discusses how to select the right engine and looks at each of MySQL's storage engines in detail. (From the book High Performance MYSQL: Optimization, Backups, Replication and Load Balancing, by Jeremy Zawodny and Derek Balling, ISBN: 0596-003064, O'Reilly Media, 2004.)
The first of those problems is how to deal with concurrency and locking. In any data repository you have to be careful when more than one person, process, or client needs to change data at the same time. Consider, for example, a classic email box on a Unix system. The popular mbox file format is incredibly simple. Email messages are simply concatenated together, one after another. This simple format makes it very easy to read and parse mail messages. It also makes mail delivery easy: just append a new message to the end of the file.
But what happens when two processes try to deliver messages at the same time to the same mailbox? Clearly that can corrupt the mailbox, leaving two interleaved messages at the end of the mailbox file. To prevent corruption, all well-behaved mail delivery systems implement a form of locking to prevent simultaneous delivery from occurring. If a second delivery is attempted while the mailbox is locked, the second process must wait until it can acquire the lock before delivering the message.
This scheme works reasonably well in practice, but it provides rather poor concurrency. Since only a single program may make any changes to the mailbox at any given time, it becomes problematic with a high-volume mailbox, one that receives thousands of messages per minute. This exclusive locking makes it difficult for mail delivery not to become backlogged if someone attempts to read, respond to, and delete messages in that same mailbox. Luckily, few mailboxes are actually that busy.
Reading from the mailbox isnít as troublesome. Thereís nothing wrong with multiple clients reading the same mailbox simultaneously. Since they arenít making changes, nothing is likely to go wrong. But what happens if someone tries to delete message number 25 while programs are reading the mailbox? It depends. A reader could come away with a corrupted or inconsistent view of the mailbox. So to be safe, even reading from a mailbox requires special care.
Database tables are no different. If you think of each mail message as a record and the mailbox itself as a table, itís easy to see that the problem is the same. In many ways, a mailbox is really just a simple database table. Modifying records in a database table is very similar to removing or changing the content of messages in a mailbox file.
The solution to this classic problem is rather simple. Systems that deal with concurrent read/write access typically implement a locking system that consists of two lock types. These locks are usually known as shared locks and exclusive locks,or read locks and write locks.
Without worrying about the actual locking technology, we can describe the concept as follows. Read locks on a resource are shared: many clients may read from the resource at the same time and not interfere with each other. Write locks, on the other hand, are exclusive, because it is safe to have only one client writing to the resource at given time and to prevent all reads when a client is writing. Why? Because the single writer is free to make any changes to the resourceóeven deleting it entirely.
In the database world, locking happens all the time. MySQL has to prevent one client from reading a piece of data while another is changing it. It performs this lock management internally in a way that is transparent much of the time.
One way to improve the concurrency of a shared resource is to be more selective about what is locked. Rather than locking the entire resource, lock only the part that contains the data you need to change. Better yet, lock only the exact piece of data you plan to change. By decreasing the amount of data that is locked at any one time, more changes can occur simultaneouslyóas long as they donít conflict with each other.
The downside of this is that locks arenít free. There is overhead involved in obtaining a lock, checking to see whether a lock is free, releasing a lock, and so on. All this business of lock management can really start to eat away at performance because the system is spending its time performing lock management instead of actually storing and retrieving data. (Similar things happen when too many managers get involved in a software project.)
To achieve the best performance overall, some sort of balance is needed. Most commercial database servers donít give you much choice: you get what is known as row-level locking in your tables. MySQL, on the other hand, offers a choice in the matter. Among the storage engines you can choose from in MySQL, youíll find three different granularities of locking. Letís have a look at them.
The most basic and low-overhead locking strategy available is a table lock, which is analogous to the mailbox locks described earlier. The table as a whole is locked on an all-or-nothing basis. When a client wishes to write to a table (insert, delete, or update, etc.), it obtains a write lock that keeps all other read or write operations at bay for the duration of the operation. Once the write has completed, the table is unlocked to allow those waiting operations to continue. When nobody is writing, readers obtain read locks that allow other readers to do the same.
For a long time, MySQL provided only table locks, and this caused a great deal of concern among database geeks. They warned that MySQL would never scale up beyond toy projects and work in the real world. However, MySQL is so much faster than most commercial databases that table locking doesnít get in the way nearly as much as the naysayers predicted it would.
Part of the reason MySQL doesnít suffer as much as expected is because the majority of applications for which it is used consist primarily of read queries. In fact, the MyISAM engine (MySQLís default) was built assuming that 90% of all queries run against it will be reads. As it turns out, MyISAM tables perform very well as long as the ratio of reads to writes is very high or very low.
A slightly more expensive form of locking that offers greater concurrency than table locking, a page lock is a lock applied to a portion of a table known as a page. All the records that reside on the same page in the table are affected by the lock. Using this scheme, the main factor influencing concurrency is the page size; if the pages in the table are large, concurrency will be worse than with smaller pages. MySQLís BDB (Berkeley DB) tables use page-level locking on 8-KB pages.
The only hot spot in page locking is the last page in the table. If records are inserted there at regular intervals, the last page will be locked frequently.
The locking style that offers the greatest concurrency (and carries the greatest overhead) is the row lock. In most applications, itís relatively rare for several clients to need to update the exact same row at the same time. Row-level locking, as itís commonly known, is available in MySQLís InnoDB tables. InnoDB doesnít use a simple row locking mechanism, however. Instead it uses row-level locking in conjunction with a multiversioning scheme, so letís have a look at that.
If you've enjoyed what you've seen here, or to get more information, click on the "Buy the book!" graphic. Pick up a copy today!