What’s New In MySQL 4.1 Part One: Overview and Subqueries

The current release of MySQL, version 4.1.10, offers significant improvements over version 4. While it still has some room for improvement, its new features and capabilities should silence the critics who have up until now regarded it as little more than a toy. In this article, the first of two parts, David Fells covers scalar and correlated queries, derived tables, and row level subqueries.

Back in October of 2004, MySQL 4.1.7, the first production release of MySQL, was made available to the public. Like many other developers I had participated in testing MySQL 4.1 from the start, and eagerly awaited the day that the final release would be ready for production. With version 4, and especially version 4.1, MySQL has started to come of age. In this article we will cover all the big changes as well as most of the minor ones that have been seen in version 4 so far, up to and including the current release, 4.1.10.

For years it has been regarded as a toy by SQL Server, DB2 and Oracle afficionados, who pointed out the lack of support for advanced RDBMS features like views, triggers, stored procedures and subqueries. While there are still many features lacking in MySQL, nearly all of them are slated for releate in MySQL 5 – and version 4 does a considerable job closing the gap between fledgling small business database and mature enterprise database. It is true that, since version 3.23, MySQL has been a top contender for the best performance in a number of database benchmarks, largely due to its efficient use of indexes and incredible ability to optimize table joins. Without features like views and subqueries though, MySQL has left complex reporting tasks entirely to whatever host language is taking advantage of it – be it directly through PHP or via import into MS Excel.

MySQL 4.1 had three main focal areas with new features: subqueries and derived tables, character sets, and replication. In addition to these areas, there were many changes to improve standards compliance and portability; some functions were added, as well as some especially useful modifiers for typical statements.

{mospagebreak title=Scalar and Correlated Subqueries}

Subqueries and derived tables are arguably the biggest and most important change in MySQL 4.1. A subquery is, as the name suggests, a query within a query. There are five general types of subqueries in standard SQL, all of which are supported. Subqueries may be used in the context of a row, a table to select from, a table to test membership, a correlated table, or as a scalar.

A scalar subquery is a subquery that returns a single value of a basic data type – meaning that the result of the subquery contains no rows or columns. This can also be thought of as one column from one row. Scalar subqueries may be used in the field list of a SELECT statement, for comparison in a WHERE statement, or in the VALUES list of an INSERT statement. Scalar subqueries are the kind that are most easily replaced by a join, whereas some of the other subquery types cannot be replicated with joins. Here is an example of a scalar subquery.

Here is our sample data:

CREATE TABLE table1 (column1 TINYINT);
INSERT INTO table1 (column1) VALUES (1);

CREATE TABLE table2 (column2 TINYINT);
INSERT INTO table2 (column2) VALUES (2);

Here is our sample subquery:

SELECT (SELECT column1 FROM table1) FROM table2;

The sample query would return a 1, because the value selected from table1 would be 1. If there were multiple rows in table1, the subquery would fail without adding a WHERE clause to restrict the results to a single row. This example is academic, but demonstrates the nature of a scalar subquery. Scalar subqueries can typically be replaced with joins, which are far more efficient in terms of processing time. Keep this in mind when constructing queries – just because subqueries are available does not make them right tool for the job.

The next type of subquery is possibly the most useful, and the most taxing on the server: the correlated subquery. A correlated subquery is a subquery that refers to one or more tables outside of the subquery expression. Consider this example:

SELECT 
     *
FROM
     table1
WHERE
     column1 = (
                SELECT
                     column1
                FROM
                     table2
                WHERE
                     table1.column1 = table2.column1
                )

In this example, the subquery does not contain a reference to table1 in the FROM clause, so the engine goes to the outer query, where it finds table1. If table1 was not in the outer query’s FROM clause, this statement would produce an error. You may have noticed that this subquery is also a scalar subquery – it only returns one value to the outer query. This type of query often cannot be replicated with a join, depending on the complexity of the query.

Subqueries can be nested to 63 levels in theory, though the query optimizer will likely fall apart after four or five if they are correlated. It is important to remember that when nesting subqueries, the innermost subquery will be processed first, working outward. In order to get the data you expect, this often requires aliasing columns as well as tables if you intend to relate to a column in a table beyond the first outer query.

{mospagebreak title=Derived Tables}

The next type of subquery uses what is referred to as a derived table, appearing in the list above as “a table to select from.” In this usage, the subquery expression appears as a part of the FROM clause or to test membership. Subqueries used to test membership will be used with keywords like ANY, EXISTS, and NOT EXISTS in the outer query and may or may not be replaceable by a join. On the flip side are tables to select from, which are easily replaced with joins in nearly any situation. The tables created by these subqueries are called derived tables, which are essentially unnamed, non-persistent views.

Testing membership is one of the core features that any implementation of SQL should provide. SQL is a set oriented language and memberhip tests are fundamental in set theory. While MySQL has provided the UNION operation for as long as I can remember, the EXISTS, NOT EXISTS, and ANY operations are new and much needed. Subqueries that test membership can also be correlated and are especially useful when performing complex reporting operations. In the following example, we will select values from one table where they do not exist in another.

SELECT 
    column1
FROM 
    table 1
WHERE 
    NOT EXISTS
        (
         SELECT
              column1 
         FROM
              table2 
        )

In this example, the inner query is executed first, selecting column1 from table2. The result of that subquery is used in the NOT EXISTS clause which, logically, is the same as using NOT IN and declaring a set of values by hand. In fact, the EXISTS and IN statements can be used interchangeably in this context.

An example of a derived table in the FROM clause is shown below.


SELECT
     column1 
FROM
     (
      SELECT
           column1
      FROM
           table2
     )

The process here is similar to that of a correlated subquery – the engine first processes the inner query, creating a derived table. That derived table is then used in the outer query as the target for record selection. These queries can also be correlated and are typically more useful in that context, but do have uses on their own.

{mospagebreak title=Row Level Subquery}

The final type of subquery to discuss is a row level subquery. Row level subqueries return a single row and may contain any number of columns. They use what are called row constructors. The following two statements demonstrate the two available row constructor syntaxes used in row subqueries.

 SELECT * FROM table1 WHERE (1, 2) = (SELECT column1, column2 FROM table2);
 SELECT * FROM table1 WHERE ROW(1, 2) = (SELECT column1, column2 FROM table2);

The expressions (1, 2) and ROW(1, 2) are equivalent, each creating a logical row for evaluation against the subquery. The following example demonstrates the use of a row subquery to find a row match on three columns in another table.

 SELECT
      column1,
      column2, 
      column3
 FROM
      table1
 WHERE
      (column1, column2, column3)
 IN
      (
       SELECT
            column1,
            column2,
            column3
       FROM
            table2
      )

This example is equivalent to:

 
 
 SELECT
      table1.column1,
      table1.column2,
      table1.column3
 FROM
      table1,
      table2 
 WHERE
      table1.column1 = table2.column1
 AND
      table1.column2 = table2.column2
 AND
      table1.column3 = table2.column3

The subquery uses a membership test with the IN statement in the WHERE clause, whereas the join simply uses multiple conditions in the WHERE clause. Logically, the first example makes more sense, as we are not selecting any rows from table2 – however, the second example will be the faster performer, because MySQL is extremely good at join optimization and not so great at optimizing set operations.

In this article we have been introduced to MySQL 4.1 and gone over the syntax for using the various types of subqueries that are now available. Subqueries are extremely useful tools but are often supurfluous and can be easily replaced by a much more efficient table join. There are of course an equal number of situations where subqueries cannot be simulated with any number of joins. In these cases, subqueries are invaluable. I hope that having been introduced to the syntax, you all will get your hands dirty and experiment with them. In the next article, we will go over all the other changes, including new character set handling, date handling, and a lot of other smaller changes.

[gp-comments width="770" linklove="off" ]

chat