Home arrow MySQL arrow Page 9 - Data Definition Language, Part 1

4.10.2 String Column Types - MySQL

Studying for the MySQL Certification exam? This article, the first of two parts, covers roughly 10 percent of the material that will appear on the exam, including general database and table properties, storage engines and table types, and more. It is excerpted from chapter four of the book MySQL Certification Guide written by Paul Dubois et. al. (Sams, 2004, ISBN: 0672326329).

TABLE OF CONTENTS:
  1. Data Definition Language, Part 1
  2. 4.1 General Database and Table Properties
  3. 4.3 Limits on Number and Size of Database Components
  4. 4.4 Identifier Syntax
  5. 4.5 CREATE DATABASE and DROP DATABASE
  6. 4.7 DROP TABLE
  7. 4.9 Creating and Dropping Indexes
  8. 4.10 Column Types
  9. 4.10.2 String Column Types
  10. 4.10.3 Date and Time Column Types
  11. 4.10.4 Column Options
  12. 4.10.5 Using the AUTO_INCREMENT Column Option
  13. 4.10.6 Automatic Type Conversion and Value Clipping
By: Sams Publishing
Rating: starstarstarstarstar / 41
January 19, 2005

print this article
SEARCH DEV SHED

TOOLS YOU CAN USE

advertisement

The string column types are listed in the following table:

Type

Description

CHAR

Fixed-length string

VARCHAR

Variable-length string

BLOB

Variable-length binary string

TEXT

Variable-length nonbinary string

ENUM

Enumeration consisting of a fixed set of legal values

SET

Set consisting of a fixed set of legal values


When you choose a string datatype, consider the following factors:

  • The maximum length of values you need to store.

  • Whether to use a fixed or variable amount of storage.

  • Whether you need to store binary or nonbinary strings.

  • The number of distinct values required; ENUM or SET may be useful if the set of values is fixed.

The following discussion first describes the general differences between binary and nonbinary strings, and then the specific characteristics of each of the string column datatypes.

4.10.2.1 Binary and Nonbinary String Characteristics

Strings in MySQL may be treated as binary or nonbinary. The two types are each most suited to different purposes.

Binary strings have the following characteristics:

  • A binary string is treated as a string of byte values.

  • Comparisons of binary strings are performed on the basis of those byte values. This has the following implications:

    • Uppercase and lowercase versions of a given character have different byte values, so binary string comparisons are case sensitive.

    • Versions of a character that have different accent marks have different byte values, so binary string comparisons are also accent sensitive.

  • A multi-byte character, if stored as a binary string, is treated simply as multiple individual bytes. Character boundaries of the original data no longer apply.

Nonbinary strings are associated with a character set. The character set affects interpretation of string contents and sorting as follows:

  • A nonbinary string is a string of characters, all of which must belong to a specific character set. Characters may consist of a single byte, or multiple bytes if the character set allows it. For example, each character in the MySQL default character set (Latin-1, also known as ISO-8859-1) requires one byte to store. In contrast, the Japanese SJIS character set contains so many characters that they cannot all be represented in a single byte, so each character requires multiple bytes to store.

  • Nonbinary comparisons are based on the collating (sorting) order of the character set associated with the string.

  • Collating orders, or collations, sometimes treat uppercase and lowercase versions of a given character as equivalent. This means that comparisons using such collations are not case sensitive, so that, for example, 'ABC ', 'Abc ', and 'abc 'are all considered equal.

  • Collations may also treat a given character with different accent marks as equivalent. The result is that comparisons of nonbinary strings may not be accent sensitive. For example, an a with no accent may be considered the same as the and characters.

  • Multi-byte character comparisons are performed in character units, not in byte units.

The preceding remarks regarding case and accent sensitivity are not absolute, just typical. A given character set can be defined with a collating order that's case or accent sensitive, or both. MySQL takes care to create character sets that correspond to the sorting order rules of different languages.

String comparison rules are addressed in more detail in section 6.1.1, "Case Sensitivity in String Comparisons."

The different treatment of binary and nonbinary strings in MySQL is important when it comes to choosing datatypes for table columns. If you want column values to be treated as case and accent insensitive, you should choose a nonbinary column type. Conversely, if you want case and accent sensitive values, choose a binary type. You should also choose a binary type for storing raw data values that consist of untyped bytes.

The CHAR and VARCHAR string column types are nonbinary by default, but can be made binary by including the keyword BINARY in the column definition. Other string types are inherently binary or nonbinary. BLOB columns are always binary, whereas TEXT columns are always nonbinary.

You can mix binary and nonbinary string columns within a single table. For example, assume that you want to create a table named auth_info, to store login name and password authorization information for an application. You want login names to match in any lettercase but passwords to be case sensitive. This statement would accomplish the task:

CREATE TABLE auth_info
(
login  CHAR(16),     # not case sensitive
password CHAR(16) BINARY  # case sensitive
);
4.10.2.2 The CHAR and VARCHAR Column Types

The CHAR and VARCHAR column types hold strings up to the maximum length specified in the column definition. To define a column with either of these datatypes, provide the column name, the keyword CHAR or VARCHAR, the maximum length of acceptable values in parentheses, and possibly the keyword BINARY. The maximum length should be a number from 0 to 255. (One of the sample exercises at the end of this chapter discusses why you might declare a zero-length column.) By default, CHAR and VARCHAR columns contain nonbinary strings. The BINARY modifier causes the values they contain to be treated as binary strings.

The CHAR datatype is a fixed-length type. Values in a CHAR column always take the same amount of storage. A column defined as CHAR(30), for example, requires 30 bytes for each value, even empty values. In contrast, VARCHAR is a variable-length datatype. A VARCHAR column takes only the number of bytes required to store each value, plus one byte per value to record the value's length.

For MySQL 4.0, the length for CHAR and VARCHAR columns is measured in bytes, not characters. There's no difference for single-byte character sets, but the two measures are different for multi-byte character sets. In MySQL 4.1, this will change; column lengths will be measured in characters. For example, CHAR(30) will mean 30 characters, even for multi-byte character sets.

4.10.2.3 The BLOB and TEXT Column Types

The BLOB and TEXT datatypes each come in four different sizes, differing in the maximum length of values they can store. All are variable-length types, so an individual value requires storage equal to the length (in bytes) of the value, plus 1 to 4 bytes to record the length of the value. The following table summarizes these datatypes; L represents the length of a given value.

Type

Storage Required

Maximum Length

TINYBLOB, TINYTEXT

L + 1 byte

255 bytes

BLOB, TEXT

L + 2 bytes

65,535 bytes

MEDIUMBLOB, MEDIUMTEXT

L + 3 bytes

16,777,215 bytes

LONGBLOB, LONGTEXT

L + 4 bytes

4,294,967,295 bytes


BLOB column values are always binary and TEXT column values are always nonbinary. When deciding which of the two to choose for a column, you would normally base your decision on whether you want to treat column values as case sensitive or whether they contain raw bytes rather than characters. BLOB columns are more suitable for case-sensitive strings or for raw data such as images or compressed data. TEXT columns are more suitable for case- insensitive character strings such as textual descriptions.

4.10.2.4 The ENUM and SET Column Types

Two of the string column types, ENUM and SET, are used when the values to be stored in a column are chosen from a fixed set of values. You define columns for both types in terms of string values, but MySQL represents them internally as integers. This leads to very efficient storage, but can have some surprising results unless you keep this string/integer duality in mind.

ENUM is an enumeration type. An ENUM column definition includes a list of allowable values; each value in the list is called a "member" of the list. Every value stored in the column must equal one of the values in the list. A simple (and very common) use for ENUM is to create a two-element list for columns that store yes/no or true/false choices. The following table shows how to declare such columns:

CREATE TABLE booleans
(
yesno   ENUM('Y','N'),
truefalse ENUM('T','F')
);

Enumeration values aren't limited to being single letters or uppercase. The columns could also be defined like this:

CREATE TABLE booleans
(
yesno   ENUM('yes','no'),
truefalse ENUM('true','false')
);

An ENUM column definition may list up to 65,535 members. Enumerations with up to 255 members require one byte of storage per value. Enumerations with 256 to 65,535 members require two bytes per value. The following table contains an enumeration column continent that lists continent names as valid enumeration members:

CREATE TABLE Countries
(
name char(30),
continent ENUM ('Asia','Europe','North America','Africa',
'Oceania','Antarctica','South America')
);

The values in an ENUM column definition are given as a comma-separated list of quoted strings. Internally, MySQL stores the strings as integers, using the values 1 through n for a column with n enumeration members. The following statement assigns the enumeration value 'Africa 'to the continent column; MySQL actually stores the value 4 because 'Africa 'is the fourth continent name listed in the enumeration definition:

INSERT INTO Countries (name,continent) VALUES('Kenya','Africa');

MySQL reserves the internal value 0 as an implicit member of all ENUM columns. It's used to represent illegal values assigned to an enumeration column. For example, if you assign 'USA 'to the continent column, MySQL will store the value 0, rather than any of the values 1 through 7, because 'USA 'is not a valid enumeration member. If you select the column later, MySQL displays 0 values as ''(the empty string).

The SET datatype, like ENUM, is declared using a comma-separated list of quoted strings that define its valid members. But unlike ENUM, a given SET column may be assigned a value consisting of any combination of those members. The following definition contains a list of symptoms exhibited by allergy sufferers:

CREATE TABLE allergy
(
symptom SET('sneezing','runny nose','stuffy head','red eyes')
);

A patient may have any or all (or none) of these symptoms, and symptom values therefore might contain zero to four individual SET members, separated by commas. The following statements set the symptom column to the empty string (no SET members), a single SET member, and multiple SET members, respectively:

INSERT INTO allergy (symptom) VALUES('');
INSERT INTO allergy (symptom) VALUES('stuffy head');
INSERT INTO allergy (symptom) VALUES('sneezing,red eyes');

MySQL represents SET columns as a bitmap using one bit per member, so the elements in the symptom definition have internal values of 1, 2, 4, and 8 (that is, they have the values of bits 0 through 3 in a byte). Internally, MySQL stores the values shown in the preceding INSERT statements as 0 (no bits set), 4 (bit 2 set), and 9 (bits 0 and 3 set; that is, 1 plus 8).

A SET definition may contain up to 64 members. The internal storage required for set values varies depending on the number of SET elements (1, 2, 3, 4, or 8 bytes for sets of up to 8, 16, 24, 32, or 64 members).

If you try to store an invalid list member into a SET column, it's ignored because it does not correspond to any bit in the column definition. For example, setting a symptom value to 'coughing,sneezing,wheezing 'results in an internal value of 1 ( 'sneezing '). The 'coughing 'and 'wheezing 'elements are ignored because they aren't listed in the column definition as legal set members.

As mentioned earlier in this section, the conversion between string and numeric representations of ENUM and SET values can result in surprises if you aren't careful. For example, although you would normally refer to an enumeration column using the string forms of its values, you can also use the internal numeric values. The effect of this can be very subtle if the string values look like numbers. Suppose that you define a table t like this:

CREATE TABLE t (age INT, siblings ENUM('0','1','2','3','>3'));

In this case, the enumeration values are the strings '0 ', '1 ', '2 ', '3 ', and '>3 ', and the matching internal numeric values are 1, 2, 3, 4, and 5, respectively. Now suppose that you issue the following statement:

INSERT INTO t (age,siblings) VALUES(14,'3');

The siblings value is specified here as the string '3 ', and that is the value assigned to the column in the new record. However, you can also specify the siblings value as a number, as follows:

INSERT INTO t (age,siblings) VALUES(14,3);

But in this case, 3 is interpreted as the internal value, which corresponds to the enumeration value '2 '! The same principle applies to retrievals. Consider the following two statements:

SELECT * FROM t WHERE siblings = '3';
SELECT * FROM t WHERE siblings = 3;

In the first case, you get records that have an enumeration value of '3 '. In the second case, you get records where the internal value is 3; that is, records with an enumeration value of '2 '.

This chapter excerpt is from MySQL Certification Guide by Paul Dubois et al. (Sams, 2004, ISBN: 0672326329 ). Check it out at your favorite bookstore today. Buy this book now.



 
 
>>> More MySQL Articles          >>> More By Sams Publishing
 

blog comments powered by Disqus
escort Bursa Bursa escort Antalya eskort
   

MYSQL ARTICLES

- Oracle Unveils MySQL 5.6
- MySQL Vulnerabilities Threaten Databases
- MySQL Cloud Options Expand with Google Cloud...
- MySQL 5.6 Prepped to Handle Demanding Web Use
- ScaleBase Service Virtualizes MySQL Databases
- Oracle Unveils MySQL Conversion Tools
- Akiban Opens Database Software for MySQL Use...
- Oracle Fixes MySQL Bug
- MySQL Databases Vulnerable to Password Hack
- MySQL: Overview of the ALTER TABLE Statement
- MySQL: How to Use the GRANT Statement
- MySQL: Creating, Listing, and Removing Datab...
- MySQL: Create, Show, and Describe Database T...
- MySQL Data and Table Types
- McAfee Releases Audit Plugin for MySQL Users

Developer Shed Affiliates

 


Dev Shed Tutorial Topics: