The string column types are listed in the following table:
When you choose a string datatype, consider the following factors:
The following discussion first describes the general differences between binary and nonbinary strings, and then the specific characteristics of each of the string column datatypes.

4.10.2.1 Binary and Nonbinary String Characteristics

Strings in MySQL may be treated as binary or nonbinary. The two types are each most suited to different purposes. Binary strings have the following characteristics:
Nonbinary strings are associated with a character set. The character set affects interpretation of string contents and sorting as follows:
The preceding remarks regarding case and accent sensitivity are not absolute, just typical. A given character set can be defined with a collating order that's case or accent sensitive, or both. MySQL takes care to create character sets that correspond to the sorting order rules of different languages. String comparison rules are addressed in more detail in section 6.1.1, "Case Sensitivity in String Comparisons."

The different treatment of binary and nonbinary strings in MySQL is important when it comes to choosing datatypes for table columns. If you want column values to be treated as case and accent insensitive, you should choose a nonbinary column type. Conversely, if you want case and accent sensitive values, choose a binary type. You should also choose a binary type for storing raw data values that consist of untyped bytes.

The CHAR and VARCHAR string column types are nonbinary by default, but can be made binary by including the keyword BINARY in the column definition. Other string types are inherently binary or nonbinary. BLOB columns are always binary, whereas TEXT columns are always nonbinary.

You can mix binary and nonbinary string columns within a single table. For example, assume that you want to create a table named auth_info, to store login name and password authorization information for an application. You want login names to match in any lettercase but passwords to be case sensitive. This statement would accomplish the task:

CREATE TABLE auth_info
(
login CHAR(16), # not case sensitive
password CHAR(16) BINARY # case sensitive
);

4.10.2.2 The CHAR and VARCHAR Column Types

The CHAR and VARCHAR column types hold strings up to the maximum length specified in the column definition. To define a column with either of these datatypes, provide the column name, the keyword CHAR or VARCHAR, the maximum length of acceptable values in parentheses, and possibly the keyword BINARY. The maximum length should be a number from 0 to 255. (One of the sample exercises at the end of this chapter discusses why you might declare a zero-length column.) By default, CHAR and VARCHAR columns contain nonbinary strings. The BINARY modifier causes the values they contain to be treated as binary strings.

The CHAR datatype is a fixed-length type. Values in a CHAR column always take the same amount of storage. A column defined as CHAR(30), for example, requires 30 bytes for each value, even empty values. In contrast, VARCHAR is a variable-length datatype. A VARCHAR column takes only the number of bytes required to store each value, plus one byte per value to record the value's length.

For MySQL 4.0, the length for CHAR and VARCHAR columns is measured in bytes, not characters. There's no difference for single-byte character sets, but the two measures are different for multi-byte character sets. In MySQL 4.1, this will change; column lengths will be measured in characters. For example, CHAR(30) will mean 30 characters, even for multi-byte character sets.

4.10.2.3 The BLOB and TEXT Column Types

The BLOB and TEXT datatypes each come in four different sizes, differing in the maximum length of values they can store. All are variable-length types, so an individual value requires storage equal to the length (in bytes) of the value, plus 1 to 4 bytes to record the length of the value. The following table summarizes these datatypes; L represents the length of a given value.
BLOB column values are always binary and TEXT column values are always nonbinary. When deciding which of the two to choose for a column, you would normally base your decision on whether you want to treat column values as case sensitive or whether they contain raw bytes rather than characters. BLOB columns are more suitable for case-sensitive strings or for raw data such as images or compressed data. TEXT columns are more suitable for case-insensitive character strings such as textual descriptions.

4.10.2.4 The ENUM and SET Column Types

Two of the string column types, ENUM and SET, are used when the values to be stored in a column are chosen from a fixed set of values. You define columns for both types in terms of string values, but MySQL represents them internally as integers. This leads to very efficient storage, but can have some surprising results unless you keep this string/integer duality in mind.

ENUM is an enumeration type. An ENUM column definition includes a list of allowable values; each value in the list is called a "member" of the list. Every value stored in the column must equal one of the values in the list. A simple (and very common) use for ENUM is to create a two-element list for columns that store yes/no or true/false choices. The following table shows how to declare such columns:

CREATE TABLE booleans
(
yesno ENUM('Y','N'),
truefalse ENUM('T','F')
);
Enumeration values aren't limited to being single letters or uppercase. The columns could also be defined like this:

CREATE TABLE booleans
(
yesno ENUM('yes','no'),
truefalse ENUM('true','false')
);
An ENUM column definition may list up to 65,535 members. Enumerations with up to 255 members require one byte of storage per value. Enumerations with 256 to 65,535 members require two bytes per value. The following table contains an enumeration column continent that lists continent names as valid enumeration members:

CREATE TABLE Countries
(
name char(30),
continent ENUM ('Asia','Europe','North America','Africa',
'Oceania','Antarctica','South America')
);
The values in an ENUM column definition are given as a comma-separated list of quoted strings. Internally, MySQL stores the strings as integers, using the values 1 through n for a column with n enumeration members. The following statement assigns the enumeration value 'Africa' to the continent column; MySQL actually stores the value 4 because 'Africa' is the fourth continent name listed in the enumeration definition:

INSERT INTO Countries (name,continent) VALUES('Kenya','Africa');
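The string-to-integer mapping can be pictured with a small model. The Python sketch below is only an illustration, not MySQL's implementation: the helper names enum_index and enum_storage_bytes are hypothetical, the lookup is a simplified exact match, and the sizes are the ones stated above. A value that is not a member maps to 0, the reserved internal value described next.

```python
# Illustrative model only; these helpers are hypothetical, not a MySQL API.
CONTINENTS = ['Asia', 'Europe', 'North America', 'Africa',
              'Oceania', 'Antarctica', 'South America']

def enum_index(members, value):
    """Return the 1-based position of value in members, or 0 if absent.

    Simplification: this is an exact string match, whereas MySQL applies
    the column's own comparison rules.
    """
    try:
        return members.index(value) + 1
    except ValueError:
        return 0  # reserved internal value for illegal assignments

def enum_storage_bytes(member_count):
    """Bytes per stored value, using the limits given in the text."""
    if member_count > 65535:
        raise ValueError("an ENUM may list up to 65,535 members")
    return 1 if member_count <= 255 else 2

print(enum_index(CONTINENTS, 'Africa'))      # prints 4
print(enum_storage_bytes(len(CONTINENTS)))   # prints 1
```

With the seven continents above, every stored value fits in a single byte.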
MySQL reserves the internal value 0 as an implicit member of all ENUM columns. It's used to represent illegal values assigned to an enumeration column. For example, if you assign 'USA' to the continent column, MySQL will store the value 0, rather than any of the values 1 through 7, because 'USA' is not a valid enumeration member. If you select the column later, MySQL displays 0 values as '' (the empty string).

The SET datatype, like ENUM, is declared using a comma-separated list of quoted strings that define its valid members. But unlike ENUM, a given SET column may be assigned a value consisting of any combination of those members. The following definition contains a list of symptoms exhibited by allergy sufferers:

CREATE TABLE allergy
(
symptom SET('sneezing','runny nose','stuffy head','red eyes')
);
A patient may have any or all (or none) of these symptoms, and symptom values therefore might contain zero to four individual SET members, separated by commas. The following statements set the symptom column to the empty string (no SET members), a single SET member, and multiple SET members, respectively:

INSERT INTO allergy (symptom) VALUES('');
INSERT INTO allergy (symptom) VALUES('stuffy head');
INSERT INTO allergy (symptom) VALUES('sneezing,red eyes');
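One way to picture what is stored for these three values is a bitmap with one bit per member, taken in definition order. The Python sketch below is a model for illustration only: set_to_bits and bits_to_set are hypothetical helper names, not MySQL functions, and the membership test is a simplified exact match.

```python
# Illustrative model only; these helpers are hypothetical, not a MySQL API.
SYMPTOMS = ['sneezing', 'runny nose', 'stuffy head', 'red eyes']

def set_to_bits(members, value):
    """Encode a comma-separated SET string as an integer bitmap.

    The member at 0-based position i contributes the bit 1 << i.
    Names that are not in the definition are ignored.
    """
    bits = 0
    for name in value.split(','):
        if name in members:
            bits |= 1 << members.index(name)
    return bits

def bits_to_set(members, bits):
    """Decode a bitmap back into a comma-separated string of members."""
    return ','.join(m for i, m in enumerate(members) if bits & (1 << i))

print(set_to_bits(SYMPTOMS, ''))                   # prints 0
print(set_to_bits(SYMPTOMS, 'stuffy head'))        # prints 4
print(set_to_bits(SYMPTOMS, 'sneezing,red eyes'))  # prints 9
print(bits_to_set(SYMPTOMS, 9))                    # prints sneezing,red eyes
```

Because each member owns exactly one bit, any combination of members corresponds to a unique integer.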
MySQL represents SET columns as a bitmap using one bit per member, so the elements in the symptom definition have internal values of 1, 2, 4, and 8 (that is, they have the values of bits 0 through 3 in a byte). Internally, MySQL stores the values shown in the preceding INSERT statements as 0 (no bits set), 4 (bit 2 set), and 9 (bits 0 and 3 set; that is, 1 plus 8).

A SET definition may contain up to 64 members. The internal storage required for set values varies depending on the number of SET elements (1, 2, 3, 4, or 8 bytes for sets of up to 8, 16, 24, 32, or 64 members).

If you try to store an invalid list member into a SET column, it's ignored because it does not correspond to any bit in the column definition. For example, setting a symptom value to 'coughing,sneezing,wheezing' results in an internal value of 1 ('sneezing'). The 'coughing' and 'wheezing' elements are ignored because they aren't listed in the column definition as legal set members.

As mentioned earlier in this section, the conversion between string and numeric representations of ENUM and SET values can result in surprises if you aren't careful. For example, although you would normally refer to an enumeration column using the string forms of its values, you can also use the internal numeric values. The effect of this can be very subtle if the string values look like numbers. Suppose that you define a table t like this:

CREATE TABLE t (age INT, siblings ENUM('0','1','2','3','>3'));
In this case, the enumeration values are the strings '0', '1', '2', '3', and '>3', and the matching internal numeric values are 1, 2, 3, 4, and 5, respectively. Now suppose that you issue the following statement:

INSERT INTO t (age,siblings) VALUES(14,'3');

The siblings value is specified here as the string '3', and that is the value assigned to the column in the new record. However, you can also specify the siblings value as a number, as follows:

INSERT INTO t (age,siblings) VALUES(14,3);

But in this case, 3 is interpreted as the internal value, which corresponds to the enumeration value '2'! The same principle applies to retrievals. Consider the following two statements:

SELECT * FROM t WHERE siblings = '3';
SELECT * FROM t WHERE siblings = 3;

In the first case, you get records that have an enumeration value of '3'. In the second case, you get records where the internal value is 3; that is, records with an enumeration value of '2'.
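The difference between the quoted and unquoted forms can be traced with a small model. The Python sketch below is an illustration under stated assumptions, not MySQL's implementation: enum_assign is a hypothetical helper, a Python str stands in for a quoted SQL value and an int for an unquoted number, and anything illegal is shown as the empty string.

```python
# Illustrative model only; enum_assign is hypothetical, not a MySQL API.
SIBLINGS = ['0', '1', '2', '3', '>3']

def enum_assign(members, value):
    """Return the member string that ends up stored for value.

    A str is matched against the member names; an int is taken as the
    1-based internal index. Illegal values yield '' (internal value 0).
    """
    if isinstance(value, int):
        if 1 <= value <= len(members):
            return members[value - 1]
        return ''
    return value if value in members else ''

print(enum_assign(SIBLINGS, '3'))  # prints 3
print(enum_assign(SIBLINGS, 3))    # prints 2
```

Quoting the value consistently in both INSERT and SELECT statements avoids the mismatch.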