ACID transactions ensure that banks don’t lose your money. By wrapping arbitrarily complex logic into single units of work, the database server takes some of the burden off application developers. The database server’s ACID properties offer guarantees that reduce the need for code guarding against race conditions and handling crash recovery.

The downside of this extra security is that the database server has to do more work. It also means that a database server with ACID transactions will generally require more CPU power, memory, and disk space than one without them. As mentioned earlier, this is where MySQL’s modularity comes into play. Because you can decide on a per-table basis if you need ACID transactions or not, you don’t need to pay the performance penalty on a table that really won’t benefit from transactions.

Isolation Levels

The previous description of isolation was a bit simplistic. Isolation is more complex than it might first appear because of some peculiar cases that can occur. The SQL standard defines four isolation levels with specific rules for which changes are and aren’t visible inside and outside a transaction. Let’s look at each isolation level and the type of problems that can occur.

Read uncommitted

In the read uncommitted isolation level, transactions can view the results of uncommitted transactions. At this level, many problems can occur unless you really, really know what you are doing and have a good reason for doing it. Read uncommitted is rarely used in practice. Reading uncommitted data is also known as a dirty read.

Read committed

The default isolation level for most database systems is read committed. It satisfies the simple definition of isolation used earlier. A transaction will see the results only of transactions that were already committed when it began, and its changes won’t be visible to others until it’s committed. However, there are problems that can occur using that definition.
To visualize the problems, refer to the sample data for the Stock and StockPrice tables as shown in Tables 2-2 and 2-3.
Table 2-2. The Stock table.

Table 2-3. The StockPrice table

stock_id  date        open   high   low    close
1         2002-05-01  21.25  22.30  20.18  21.30
2         2002-05-01  10.01  10.20  10.01  10.18
3         2002-05-01  18.23  19.12  18.10  19.00
4         2002-05-01  45.55  46.99  44.87  45.71
1         2002-05-02  21.30  21.45  20.02  20.21
2         2002-05-02  10.18  10.55  10.10  10.35
3         2002-05-02  19.01  19.88  19.01  19.22
4         2002-05-02  45.69  45.69  44.03  44.30

Imagine you have a Perl script that runs nightly to fetch price data about your favorite stocks. For each stock, it fetches the data and adds a record to the StockPrice table with the day’s numbers. So to update the information for Amazon.com, the transaction might look like this:
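A minimal sketch of such a transaction follows. It assumes the Stock table has id and ticker columns and that Amazon’s ticker is AMZN; the date and prices are invented for illustration.

```sql
-- Sketch only: the id and ticker column names, the ticker value,
-- and the price figures below are assumptions, not data from the tables.
BEGIN;

-- Look up Amazon's id and keep it in a user variable.
SELECT id INTO @id FROM Stock WHERE ticker = 'AMZN';

-- Add the day's numbers for that stock.
INSERT INTO StockPrice (stock_id, date, open, high, low, close)
VALUES (@id, '2002-05-03', 44.30, 45.10, 43.95, 44.80);

COMMIT;
```

The point to notice is the gap between the SELECT and the INSERT: the id is read in one statement and used in the next.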
But what if, between the select and insert, Amazon’s id changes from 4 to 17 and a new stock is added with id 4? Or what if Amazon is removed entirely? You’ll end up inserting a record with the wrong id in the first case. And in the second case, you’ve inserted a record for which there is no longer a corresponding row in the Stock table. Neither of these is what you intended. The problem is that you have a nonrepeatable read in the query. That is, the data you read in the SELECT becomes invalid by the time you execute the INSERT. The repeatable read isolation level exists to solve this problem.

Repeatable read

At the repeatable read isolation level, any rows that are read during a transaction are locked so that they can’t be changed until the transaction finishes. This provides the perfect solution to the problem mentioned in the previous section, in which Amazon’s id can change or vanish entirely. However, this isolation level still leaves the door open to another tricky problem: phantom reads.

Using the same data, imagine that you have a script that performs some analysis based on the data in the StockPrice table. And let’s assume it does this while the nightly update is also running. The analysis script does something like this:
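A minimal sketch of what the analysis script might run, assuming it issues the same range query twice inside a single transaction:

```sql
BEGIN;

-- First pass over the rows of interest.
SELECT * FROM StockPrice WHERE close BETWEEN 10 AND 20;

-- ... the script spends some time working with those rows ...

-- Second pass, using exactly the same condition.
SELECT * FROM StockPrice WHERE close BETWEEN 10 AND 20;

COMMIT;
```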
But suppose that, between those two queries, the nightly update script inserts new rows that happen to match the close BETWEEN 10 and 20 condition. The second query will find more rows than the first one! These additional rows are known as phantom rows (or simply phantoms). They weren’t locked the first time because they didn’t exist when the query ran.

Having said all that, we need to point out that this is a bit more academic than you might think. Phantom rows are such a common problem that InnoDB’s locking (known as next-key locking) prevents this from happening. Rather than locking only the rows you’ve touched in a query, InnoDB actually locks the slot following them in the index structure as well.

Serializable

The highest level of isolation, serializable, solves the phantom read problem by ordering transactions so that they can’t conflict. At this level, a lot of timeouts and lock contention may occur, but the needs of your application may bring you to accept the decreased performance in favor of the data stability that results.

Table 2-4 summarizes the various isolation levels and the drawbacks associated with each one. Keep in mind that as you move down the list, you’re sacrificing concurrency and performance for increased safety.
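Table 2-4. ANSI SQL isolation levels

Isolation level   Dirty reads  Nonrepeatable reads  Phantom reads
Read uncommitted  Yes          Yes                  Yes
Read committed    No           Yes                  Yes
Repeatable read   No           No                   Yes
Serializable      No           No                   No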
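To try the levels out, MySQL lets you pick one with SET TRANSACTION. The following is a minimal sketch, assuming a transactional table type such as InnoDB and reusing the StockPrice query from the phantom read discussion:

```sql
-- Choose the level for all later transactions in this session.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or choose it for the next transaction only
-- (this must be issued before the transaction starts).
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN;
SELECT * FROM StockPrice WHERE close BETWEEN 10 AND 20;
COMMIT;
```

Without the SESSION keyword the setting applies to a single transaction, after which the session falls back to its previous level.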