Optimizing Queries with Operators, Branching and Functions, continued

This article will give you a good grounding in operators, branching and functions in MySQL, so you can make the database, instead of your own code, do the bulk of the work. It is the second of three parts, and excerpted from chapter four of Beginning MySQL Database Design and Optimization: From Novice to Professional, written by Jon Stephens and Chad Russell (Apress; ISBN: 1590593324).

MySQL Functions

We are now going to look at a number of MySQL functions. Again, the focus here is on the central premise of this chapter: replacing programming logic with SQL logic is a key optimization skill, and it is almost always the best course of action to make the database perform as much work as possible. You want to minimize the amount of filtering and manipulating data that you do in your programming code.

Of course, MySQL has quite a large number of built-in functions, and some of them have a great many possible arguments as well. We won’t attempt to catalog all of them here—that’s what the MySQL Manual is for, after all—but we will discuss and demonstrate those functions that, in our experience, have proved to be the most important and useful in terms of replacing programming logic with SQL logic.

Math Functions

MySQL has most of the standard mathematical functions you would expect to find in any programming language. These include rounding functions, exponential and logarithmic functions, and even trigonometric functions.

Rounding Functions

All of the following functions round a floating-point number to one with fewer digits to the right of the decimal point or to an integer in one way or another:

CEILING(): Returns the smallest integer that is not less than the argument. CEILING(2.45) returns 3; CEILING(-2.45) returns –2. It can also be written as CEIL() in MySQL 4.0.6 and later.

FLOOR(): Returns the largest integer that’s smaller than the argument. FLOOR(2.65) yields 2; FLOOR(-2.65) returns –3.

TRUNCATE (N, D): Returns N truncated to D decimal places. TRUNCATE (2.45, 1) returns 2.4; TRUNCATE(-2.45, 1) yields –2.4. It is possible for D to be larger than the number of decimals in N; for example, TRUNCATE(2.45, 3) returns 2.450. Note that both arguments are required; TRUNCATE(2.45, 0) evaluates to 2. By using a negative value for D, you can round to the left of the decimal point as well; TRUNCATE(542.5, -1) yields 540 and TRUNCATE(542.5, -2) returns 500.

ROUND(N[,  D]): Rounds N to D decimal places; if D isn’t specified, then 0 is implied. Note that the manner in which rounding is performed is system-dependent. If portability is an issue for your application, or if you need to be able to depend on the exact behavior, consider using TRUNCATE(), FLOOR() , or a user-defined function (UDF) instead. Beginning in MySQL 5.0, you can also define a stored procedure or function that implements the exact rounding algorithm desired. (See Chapter 8 for information about user-defined functions and stored procedures.)

TIP
MySQL also has a convenient function for formatting numbers for display: FORMAT(N,  D) returns a number N formatted in comma-separated format truncated to D decimal places. For example, FORMAT(23456.789,  2) returns the string ‘23,456.78’ .

Powers, Exponents, and Logarithms

To obtain powers of numbers in MySQL, you can use the POW() or POWER() function (the two function names are synonymous). For example, 10 to the power of 2 (that is, 10 * 10 or 102) can be found using POW() as shown here:

The logarithm of X to the base Y is defined as the power to which Y must be raised is order to produce X. For example, the logarithm of 100 to the base 10 is 2, because 10 to the power of 2 is 100. Using mathematical notation, we would write this relationship as this:

log10100 = 2 102 = 100

The most frequently used base for logarithms in mathematics, physics, and engineering is the special value e, where e is calculated by this series:

Logarithms to the base e are known as natural logarithms (loge, often written as ln) and can be obtained in MySQL using the LOG() function. Enterprise applications don’t have that many uses for natural logarithms. However, it can sometimes be quite handy to be able to calculate base-2, base-8, base-10, or base-16 logarithms, which you can do by making use of the fact that logxy = ln y / ln x. Here is an example, in which the second result tells us that we need six digits to express the base 10 number 100 in binary notation.

NOTE
The logarithm of 1 to any base is always 0; the logarithm of any number less than or equal to 0 (to any base) is always undefined and in MySQL will return NULL. Attempting to obtain a logarithm to the base 1 or 0 will also yield a NULL result.

In MySQL 4.0.3, logarithmic functions were improved in a number of ways. The function LN() was added as another means for calculating natural logarithms, and the LOG() function was enhanced so that it became possible to find logarithms bases other than 2 without needing to perform division using the notation LOG(basevalue). Here are a couple examples:

When used with a single argument, LOG() still returns the natural logarithm. The LN() function may take only a single argument and always return the natural logarithm.

TIP  To learn more about natural logarithms and e,a good place to start is on the MathWorld web site at
http://mathworld.wolfram.com/e.html.

The functions LOG2() (logarithm to the base 2) and LOG10() (logarithm to the base 10) were also added in MySQL 4.0.3.

To get the value of a number raised to the power of e, use the EXP() function, as shown in the next example.

NOTE
POW(), EXP(), and LOG() always return decimal values, even if all arguments are integers.

POW(), EXP(), and LOG()

{mospagebreak title=Trigonometric Functions}

The trigonometric functions are useful for calculating lengths, arcs, and angles, as well as plotting curves and performing complex calculations involving waves.

NOTE
We won’t attempt to provide definitions or to explain trigonometry here. If you don’t already know what they are or how to use them, check out the MathWorld web site, starting from
http://mathworld.wolfram.com/Trigonometry. html.

MySQL provides the following trigonometric functions, where X is an angle expressed in radians:

SIN(X): Sine of X

COS(X): Cosine of X

TAN(X): Tangent of X

COT(X): Cotangent of X

NOTE
The secant and cosecant trigonometric functions (often written as sec and csc in mathematical notation) can be obtained in MySQL by using the expressions 1 / COS(X) and 1 / SIN(X) , respectively.

The radian is a unit of angular measure defined such that 180 degrees is equal to p (pi) radians. (The number p is represented in MySQL as PI().) To convert radians to degrees, you can use the DEGREES() function, and to convert from degrees to radians, use the RADIANS() function.

The inverse trigonometric functions all return angle values expressed in radians:

ASIN(X): Arcsine of X

ACOS(X): Arccosine of X

ATAN(X): Arctangent of X

ATAN(X,  Y) or ATAN2(X,  Y): Arctangent of Y / X

NOTE
MySQL stores p and e as double-precision floating-point values. This should be of sufficient accuracy for 99% of all applications requiring the use of these values.

{mospagebreak title=Other Math Functions}

As noted earlier, ABS() returns the absolute value of its argument. The SIGN() function (known in mathematics as the signum function and often written sgn(x) ) tests the sign of its argument, and returns 1 if the argument is positive, –1 if it’s negative, and 0 if the argument is 0.

To obtain the square root of a number, use the SQRT() function, as shown here:

It’s often useful or necessary in programming to generate random numbers. MySQL provides this capability using the RAND() function, which returns a pseudo-random floating-point value from 0 to 1. An integer may be passed to this function as an optional argument; this serves to “seed” the function and allows you to produce a fairly random but repeatable sequence of values. You can also use RAND() to obtain a set of values in random order from a query by including it in an ORDER BY clause.

Conversion Functions

In nearly any programming application, you encounter situations in which it’s necessary to convert data from one type to another. For example, you may have a variable whose value is the string “535” , and you may need to be able to treat this like the number 535 in order to multiply it by another value. Much the same is true in MySQL. In many cases, MySQL handles the conversion for you. For example, if you try to multiply the number 5 by the string “5”, the latter is automatically converted to a number, so that the result is the number 25. However, there are times when MySQL’s built-in type conversions don’t do what you want them to, and you need to force the issue.

MySQL has two types of conversion functions for this purpose: the first is for casting datatypes to other datatypes (and in MySQL 4.01 and newer, for converting strings in one character set to another), and the second is for converting numbers in one base to another.

Type Conversions

While MySQL generally does a pretty good job of converting between datatypes, there may be times when it’s necessary to perform such conversions explicitly. Beginning in MySQL 4.0.2, two functions are provided for this purpose. Both CAST() and CONVERT() take an argument of one type and attempt to return the same value converted to a different datatype. As of MySQL 4.1, CONVERT() can also be used to convert strings from one character set to another.

The CAST() function has the following syntax:

CAST(expression AS type)

where expression is a value or an expression that evaluates to a value, and type is one of the following datatypes:

• BINARY (Note that BINARY colname  is an alias for CAST(colname  AS BINARY))
• CHAR (available in MySQL 4.0.6 and later)
• DATE
• DATETIME
• SIGNED or SIGNED INTEGER
• TIME
• UNSIGNED or UNSIGNED INTEGER

When used to convert between datatypes, the CONVERT() function has this syntax:

CONVERT(expression,  type)

This function can use any of the same values for type as CAST() does.

These functions are most useful when creating a new table using CREATE … SELECT … or copying data between tables using an INSERT … SELECT … query. For instance, suppose that you have a TEXT column in a table that you would like to have sorted using case-sensitivity. Instead of using BINARY textcol  in each of your queries, you can copy the table definition and use this statement:

CREATE newtable  SELECT CAST(textcol  AS BINARY) FROM oldtable;

or this one:

INSERT INTO newtable  SELECT CAST(textcol  AS BINARY) FROM oldtable;

In the latter case, you’ll need to make sure that the column into which you’re selecting was declared as BINARY. (If you copied the definition of the old table, you’ll need to change the column type using ALTER TABLE.)

The second form of CONVERT() has the following syntax:

CONVERT(expression USING encoding)

where encoding is the name of a character set, such as in this example:

SELECT CONVERT(‘I like MySQL’ USING utf8);

You can see which character sets are available on your installation by using SHOW CHARACTER SET; .

NOTE
In this book, we assume the use of the default Latin 1 (latin1) character set and latin1_swedish_ci collation. For information about configuring other character sets and collations, consult the MySQL Manual.

{mospagebreak title=Conversion of Numbers Between Bases}

Four functions convert numbers from one base to another:

BIN(X): Converts a base-10 number X to binary form (base 2).

CONV(X,  N,  M): Converts the number X from base N to base M. The minimum allowed value for N or M is 2; the maximum value for either one is 36 (10 digits + 26 letters of the Latin alphabet for “digits” greater than 9).

HEX(X): Converts a base-10 number X to its hexadecimal equivalent (base 16).

OCT(X): Converts a base-10 number X to octal (base 8).

All four of these functions return their result as a string. More often than not, when you use these functions within more complex expressions, MySQL will automatically convert the result to a number, as shown in the following example.

Keep in mind that numbers containing characters other than the digits 0 through 9 need to be quoted.

CAUTION
Do not confuse CONV() with the CONVERT() function! The latter is used to convert between datatypes, or to convert strings from one character set to another (see the previous section).

Encryption and Encoding Functions

Several MySQL functions encrypt or encode data. You can use these functions for purposes such as keeping passwords secure and ensuring that critical data hasn’t been corrupted.

MD5() and SHA1() are one-way hashing algorithms that are useful for safeguarding application passwords and other sensitive data, as is ENCRYPT() on Unix systems. ENCODE()/DECODE(), AES_ENCRYPT()/ AES_DECRYPT(), and DES_ENCRYPT() / DES_DECRYPT() provide keyword-based encryption and decryption. PASSWORD() and OLD_PASSWORD() duplicate the algorithms used to encrypt MySQL’s own passwords. COMPRESS() can be used to compress lengthy strings prior to storage and so save on space.

MD5(): Calculates a 128-bit checksum for a string and returns it as a 32-digit hexadecimal number using the RSA-MD5 Message Digest Algorithm. This function is supported in or has been ported to a number of programming languages, including C, Java, PHP, Perl, Visual Basic, and even JavaScript. For more information about MD5 and links to ports of it in various languages, visit http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html.

SHA1(): Calculates a 160-bit hash for a string and returns it as a 40-bit hexadecimal number. Like the MD5 algorithm, the Secure Hashing Algorithm 1 (SHA1) has been implemented in a number of different programming languages. For more information about the SHA1 algorithm, see http://www.faqs.org/rfcs/rfc3174.html.

ENCODE() and DECODE(): ENCODE() takes as arguments a data string to be encoded and a password string, and returns a binary-encoded version of the data string that’s the same length as the original. (If you wish to store the encoded string, be sure to use a BLOB column rather than a CHAR or VARCHAR column.) DECODE() also takes two arguments—the encoded string and the same password that was used to encode it—and returns the original string.

ENCRYPT(): Uses the Unix crypt() system call to encrypt a string. This function can take an optional second parameter in addition to the string to be encrypted, a “seed” string of two or more characters. This function does not work on Windows and other operating systems that don’t support the crypt() system call; on these systems, it always returns NULL .

AES_ENCRYPT() and AES_DECRYPT(): Implement the U.S. Government’s Approved Encryption Standard, also known as FIPS-197, and are the most cryptographically secure encoding functions available in MySQL as of this writing. Like ENCODE() and DECODE() , these functions require two arguments; the second argument is a password. For more information, see the AES home page at http://csrc.nist.gov/CryptoToolkit/aes/. This pair of functions was added in MySQL 4.0.2.

DES_ENCRYPT() and DES_DECRYPT(): Implement the U.S. National Institute of Standards and Technology’s Triple-DES algorithm and are available only if MySQL has been compiled with Secure Sockets Layer (SSL) support. For more information, check the MySQL Manual or http://csrc.nist.gov/CryptoToolkit/ Encryption.html#a3DES.

PASSWORD(): Same as that used by MySQL to encrypt passwords set using GRANT commands. It is strongly recommended that you do not use it in your application code, since it is liable to change between major releases. Prior to MySQL 5.0, this function returns a 16-digit hexadecimal number; starting in MySQL 5.0, the algorithm for PASSWORD() has changed, and the function returns a 41-character string that begins with the ‘*’ (asterisk) character followed by a 40-digit hexadecimal number.

OLD_PASSWORD(): In MySQL 5.0, emulates the behavior of PASSWORD() in previous versions of MySQL.

COMPRESS() and UNCOMPRESS(): In MySQL 4.1.1 and later, COMPRESS() compresses a string using the zlib or equivalent library. The compressed string must be stored in a BLOB column. This can be useful for storing large amounts of textual data. The inverse of the COMPRESS() function is UNCOMPRESS(), which was also added in MySQL 4.1.1. You can find out how long a compressed string will be when uncompressed by using UNCOMPRESSED_LENGTH() on it before uncompressing it. If zlib functionality is not available, these functions will return NULL .

TIP
It’s usually better to compress data before trying to encrypt it.

CRC32(): Calculates a 32-bit cyclic redundancy check (CRC) for a string. This can be used as a hedge against data corruption. However, be aware that multiple errors can cancel each other out, so it is not 100% foolproof. (On the other hand, it’s fairly reliable, and the larger the block of data, the less likely this is to happen.) If you’re interested in knowing more, the MathWorld web site has some good technical information on CRCs and how they work at http://mathworld.wolfram.com/ CyclicRedundancyCheck.html .

{mospagebreak title=String Functions and Regular Expressions}

Another way that you can save on programming logic in MySQL-based applications, and gain optimization benefits, is by performing string manipulations as part of your queries. MySQL provides a large number of string operations and supports regular expressions as well. First, let’s look at some of the string functions.

String Operations

MySQL has functions for joining strings, finding and extracting substrings, and performing other useful string operations. To obtain the length of a string (that is, the number of characters it contains), you can use the LENGTH() function for single-byte character sets. For multibyte characters, you must use CHAR_LENGTH() . If internationalization is likely to be a concern in your application, you should definitely use CHAR_LENGTH() .

For joining strings together, you might be tempted to use the + operator. Several programming languages allow you to do this, but when we try it in MySQL, here’s what happens:

In MySQL, the + operator can be used only for adding numbers, and (as we discussed earlier) any strings occurring in an expression involving addition are automatically converted to zero. Instead, use either CONCAT() or CONCAT_WS() .

Both of these functions concatenate strings, but in slightly different ways. CONCAT() takes two or more quoted strings separated by commas and returns a single string containing all of them joined together.

This is a bit better, but is probably not exactly what we had in mind in this case. One possible solution is to insert a new string containing a space in between the strings we’re already using, as shown in this example:

This is a little more like it. But there’s a better way: using the CONCAT_WS() function. The “WS” stands for “with separator” and indicates that the first argument is to be inserted as a separator or delimiter between each of the strings to be joined.

You can see another example of using CONCAT() in the “improved” PHP 4 and Python coding examples in the “Replacing Program Logic with SQL Logic: A Demonstration” section at the beginning of this chapter.

NOTE
Unlike many programming languages, MySQL does not have a concatenation operator like the & operator in Visual Basic or the . operator in PHP or Perl; you must use CONCAT() or CONCAT_WS() to join strings.

To extract substrings from strings, use SUBSTRING(), whose syntax can take either of the forms shown here:

SUBSTRING(string, position[, length])
SUBSTRING(string  FROM position[ FOR length])

This function returns the next length characters of string beginning with the character at position, with the first character in string at position 1. If length is
unspecified, then all remaining characters of string are included in the returned substring. Here are some examples:

There are no differences in behavior between the forms using commas and those using FROM and (optionally) FOR. However, the latter syntax is SQL-92 standard and thus more portable between different databases. In the SQL-92 form, the FROM clause is required.

NOTE
MID() is a synonym for SUBSTRING(),
takes the same arguments, and works in the same way. However, it’s not part of the SQL standard, and so is not likely to be supported in databases other
than MySQL.

You can use a variation on this function, SUBSTRING_INDEX(), to indicate a starting point for a substring in terms of where a given delimiter occurs within the string. Although it’s not standard and not found in any other widely used database, it can be extremely handy on occasion. The syntax for calling this function is as follows:

SUBSTRING_INDEX(string, delimiter, index)

Here, string represents the string from which you wish to extract a substring, delimiter the delimiter string, and index how many occurrences of the delimiter to count. If index is positive, the delimiter count is from the left, and everything to the left of the final delimiter is returned; if it is negative, the delimiter count is from the right, and the substring returned is taken from the right of this delimiter, as you can see in the following examples.

Another way to extract a substring is by using the RIGHT() function. It takes two arguments—a string and an integer length—and returns the rightmost length characters from string.

For locating a substring within a given string, you can use LOCATE() , INSTR() , and POSITION() . LOCATE() has this form:

LOCATE(substring, string[, position])

The function returns the position at which the first occurrence of substring is found in string. If position is specified, the function returns the position at which the first occurrence of substring is found in string following position.

INSTR() works like LOCATE() with two arguments, except that the order of the arguments is reversed.

The POSITION() function works similarly, except that it has the following syntax:

POSITION(substring IN string)

Prior to MySQL 4.0, the functions LOCATE(), INSTR(), and POSITION() were case-sensitive. Since then, they are case-sensitive only if at least one of the string or substring arguments is a binary string.

NOTE
POSITION() is standard SQL. LOCATE() is specific to MySQL. INSTR() is supported for compatibility with Oracle.

Now let’s look at some ways to modify strings. These are similar to some of the string-handling functions found in programming languages like Perl, PHP, Visual Basic, and others.

To change the case of a string to uppercase, use UCASE() or UPPER() . To convert it to lowercase, use LCASE() or LOWER() .

NOTE
UPPER() and LOWER() are standard SQL. UCASE() and LCASE() are for compatibility with Oracle.

A common task you’ll encounter when working with strings is trimming excess spaces from the beginning or end of a string (or both). LTRIM() returns a string with all leading spaces removed; RTRIM() returns a string with all trailing spaces removed. TRIM() is a bit more sophisticated; it can be used to trim all leading or trailing occurrences of any substring from a string. Here’s the complete syntax:

TRIM([[LEADING | TRAILING | BOTH] [substring] FROM] string)

The best way to explain how this function works is simply to provide a few examples. If substring is not specified, a space character is assumed:

If LEADING, TRAILING, or BOTH is not specified, BOTH is assumed:

The substring may be any group of one or more characters:

NOTE
All three trimming functions are standard SQL and are supported in most other widely used databases, including SQL Server, Oracle, and PostgreSQL.

LPAD() and RPAD() do more or less the opposite of LTRIM() and RTRIM(): they pad out a string to a given length with one or more padding characters. LPAD() and RPAD() have the same syntax:

If length is less than the length of string, then string is truncated to length and returned.

NOTE
LPAD() and RPAD() are nonstandard, but are also supported in Oracle. However, unlike with the Oracle versions of these functions, the padchars argument has no default value and must be specified.

To replace all occurrences of one substring with another within a given string, you can use the REPLACE() function, which has the following syntax:

REPLACE(string, substring_from, substring_to)

Here’s an example:

Finally, here are a few miscellaneous string functions you may find useful from time to time:

QUOTE(string): Returns a copy of string in which all quotation marks have been escaped.

REPEAT(string, count): Returns a new string containing string repeated count times. (Returns an empty string if count is less than 1.)

SPACE(count): Returns a string consisting of count spaces. (Returns an empty string if count is less than 1.)

STRCMP(string1, string2): Similar to the C language function of the same name, this returns –1 if string1 sorts before string2, 1 if the reverse is true, and 0 if the two strings are the same. It uses the current character set. This function is case-sensitive before MySQL 4.0; in MySQL 4.0 and later, it is case-insensitive unless at least one of the two strings is binary.

REVERSE(string): Returns a string with all of the characters in string in reverse order. For example, REVERSE (‘tram’) returns the string ‘mart’ .

We have not provided an exhaustive listing of all of the string functions supported in MySQL, but instead covered those we feel are likely to be useful in helping you move the processing of strings from your application code into your MySQL queries. For a complete list, see the MySQL Manual.

{mospagebreak title=Pattern-Matching and Regular Expressions}

An often-overlooked feature of MySQL is its ability to perform pattern-matching, including the use of using regular expressions. For simple pattern-matching, the LIKE operator is used, with the following syntax:

string LIKE pattern

This operator yields either a 1 (TRUE) or 0 (FALSE), depending on whether or not a match for pattern is found in string. The pattern argument supports two wildcards:

• _ (underscore character), for any single character
• % (percent sign), for any group of characters

These wildcards can be used at the beginning, end, or anywhere inside of pattern. Here are some examples:

NOTE
Unlike some other databases, MySQL does not support the grouping ([ ]) or negation (^) operators for use with the LIKE
operator.

MySQL also supports full-fledged regular expressions, similar to that implemented in a number of programming languages (using the POSIX 1003.2
standard syntax). A regular expression is a sequence of characters and/or special characters (known as pattern modifiers) that forms a pattern for which a match is sought in a string. In MySQL, you test for a match using the REGEXP operator
and this syntax:

string REGEXP pattern

This will evaluate as true if a match is found, and false if it is not.

Regular expressions as used in MySQL accept the pattern modifiers shown in Table 4-2.

Table 4-2. MySQL Regular Expression Pattern Modifiers

*

 MODIFIER* DESCRIPTION ^ Outside any group, marks the beginning of the string to be searched. Within a group, negates the pattern; for example, [^a-e] matches any character except the letters a, b, c, d, or e \$ End of the string to be searched . Matches any single character * Zero or more occurrences of the preceding group (equivalent to {0, })

Table 4-2. MySQL Regular Expression Pattern Modifiers (Continued)

 MODIFIER* DESCRIPTION + One or more occurrences of the preceding group (equivalent to {1, }) ? Zero or one occurrence of the preceding group (equivalent to {0, 1}) | Branch operator (OR); for example, dog|cat matches either dog or cat () Encloses an expression; for example, (aeiou) matches the sequence aeiou [] Encloses a set or range; for example, [1-4] matches any one of the digits 1, 2, 3, or 4; [c-g] matches any one of the letters c, d, e, f, or g; [aeiou] matches any one of the letters a, e, i, o, or u. It is possible to combine sets and ranges in a single group; for example, [0-4g] matches any one of the following: 0g, 1g, 2g, 3g, or 4g. {} Quantifies the preceding group; {N} indicates that the group must be repeated N times; {N, M} indicates that the group may be repeated from N to M times.

* To match a literal occurrence of any of the characters shown in the Modifier column, escape it with a backslash; for example, [why?] matches the string “why?” (including the question mark). To match a backslash, use a double backslash: \ .

The following are some examples of regular expressions.

MySQL doesn’t provide any direct means to return matched expressions (only that a match was or wasn’t found), but it’s still possible to use REGEXP in WHERE clauses to achieve the same effect, as in the next example.

The RLIKE operator is also supported as an alias to REGEXP , and it works in the same way.

TIP
For complete details on using regular expressions, see the Unix regexp(7) man page. If you’re a Windows user, you can find this on the Web at http://www.unusualresearch.com/regex/ regexmanpage.htm.