Many of you are still working with earlier versions of the MySQL database. This article takes a look at MySQL 4.1. It is the first of several parts that examine more recent versions of the software. It is excerpted from chapter eight of Beginning MySQL Database Design and Optimization: From Novice to Professional, written by Jon Stephens and Chad Russell (Apress, ISBN: 1590593324).
So what are the advantages of using subqueries? For one thing, the syntax is often much closer to natural language and human thinking processes. Look again at theWHEREclause of the last query we showed you:
WHERE price IN (SELECT MAX(price) FROM products GROUP BY category_id)
This is remarkably similar to saying in ordinary language “. . . where the price is in the set of maximum prices obtained when we group together the products according to their categories.”
Another reason to use subqueries even when you don’t have to is because they tend to be easier to break down into their logical components. This makes them relatively simple to modify, which can be very handy when you’re working with dynamically generated queries, for example.
In addition, there are situations in which it’s either extremely difficult or even impossible to accomplish a desired task in a single query without using a subquery. You saw such a case in the last example: Without subqueries, obtaining information of this sort (maximum or minimum by category) becomes much more difficult, and requires either the use of a temporary table or a dodge known as the “Max-Concat Trick,” which in this case would look something like this:
SELECT SUBSTRING( MAX( CONCAT(LPAD(p.price, 6, '0'), p.name) ), 7) AS name, c.name AS category, 0.00 + LEFT( MAX( CONCAT(LPAD(p.price, 6, '0'), p.name) ), 6) AS price FROM products p JOIN categories c ON c.id = p.category_id GROUP BY p.category_id;
We’ll leave it as an exercise for you to work out the details, but the basic idea here is that you concatenate each product name with its corresponding price, get the maximum of the resulting set of strings within each category, then split that string back into its parts again. This is highly inefficient, and not a recommended practice.
Subqueries That Return Rows—Derived Tables
We’ve talked about subqueries that return single values and those that return one-dimensional sets or lists of values. The progression doesn’t end there. Beginning with version 4.1, MySQL also supports subqueries that return two-dimensional sets of values that can be identified by column and row—in other words, just as you would access items of data in a table. This type of subquery is known as a derived table and can be used in theFROMclause of the outer query. Its columns may be referenced along with the rest of those used in the outer query, provided we give the derived table an alias.
NOTE Understanding the concept of a derived table is an important step on the road to understanding what an SQL view is and does, as we’ll see later in this chapter when we look at features being implemented in MySQL 5.0.
Perhaps an example will serve to make this clearer, so let’s see how we can use a derived table to tell us how many categories contain more than one product.
NOTE In this particular case, you could also obtain this information by means of a HAVINGclause: SELECT category_id, COUNT(category_id) AS number FROM products GROUP BY category_id HAVING number > 1;.
To obtain the number of products in each category, along with the ID of the category, you’d use aGROUP BYquery, like so:
SELECT category_id, COUNT(category_id) FROM products GROUP BY category_id;
You can’t test any of the values returned by this in theWHEREclause of an outer query because this query returns neither a single value nor a list of values, but a result set. What you can do is to assign it an alias and then use it in theFROMclause; when you do this, MySQL 4.1 versions and above let you treat it just like any other table in that you can refer to its columns usingtable_name.column_namesyntax (or “dot” notation if you prefer to call it that). You can think of this as something like creating a temporary table, except that you do it inside the query at one go and the table disappears as soon as the query has been executed. You can also use aliases on the columns in the derived table as well.
You can see how this works here:
Once you grasp the concept, this is not difficult to understand and modify if and when necessary. For instance, we can now get a set of the categories for which there are more than two products, just by changing the test in theWHEREclause of the outer query:
SELECT c.cid FROM ( SELECT category_id AS cid, COUNT(category_id) AS ct FROM products GROUP BY category_id ) AS c WHERE c.ct > 2;
We can also use this derived table in a join, like so:
If this seems confusing, perhaps you’ll be able to see better just what’s happening here if we break it down by means of a temporary table, as shown here:
First we create the temporary table using aCREATE ... SELECTstatement to extract the necessary data from products. The contents of this temporary table (named tempcatin the preceding example) are shown next—this is exactly what’s contained in the derived table in the query that used the subquery. Next, we do a join on the temporary and categoriestables. The end result is the same as that obtained from the single query we used before.