Administering MySQL: International Usage and Log Files

4.7.4 The Character Definition Arrays

If you need to administer MySQL, this article gets you off to a good start. In this section, we discuss localization and international usage, as well as the MySQL log files. The sixth of a multi-part series, it is excerpted from chapter four of the book MySQL Administrator's Guide, written by Paul Dubois (Sams; ISBN: 0672326345).

  1. Administering MySQL: International Usage and Log Files
  2. 4.7.4 The Character Definition Arrays
  3. 4.8 The MySQL Log Files
  4. 4.8.4 The Binary Log
  5. 4.8.5 The Slow Query Log
By: Sams Publishing
June 29, 2006

to_lower[] and to_upper[] are simple arrays that hold the lowercase and uppercase characters corresponding to each member of the character set. For example:

to_lower['A'] should contain 'a'
to_upper['a'] should contain 'A'
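
As a minimal sketch of how such tables could be built and used, the following C fragment fills two 256-entry arrays for the ASCII letters only and maps every other byte to itself. The function names init_case_tables() and str_to_lower() are hypothetical and are not part of the MySQL source; a real character set file lists all values literally.

```c
/* Hypothetical 256-entry case-mapping tables for a single-byte character set.
   Only the ASCII letters are mapped; every other byte maps to itself. */
static unsigned char to_lower[256];
static unsigned char to_upper[256];

/* Fill both tables at run time. A real character set definition would
   spell out all 256 values instead of computing them. */
static void init_case_tables(void)
{
    for (int c = 0; c < 256; c++) {
        to_lower[c] = (unsigned char)c;
        to_upper[c] = (unsigned char)c;
    }
    for (int c = 'A'; c <= 'Z'; c++) {
        to_lower[c] = (unsigned char)(c - 'A' + 'a');
        to_upper[c - 'A' + 'a'] = (unsigned char)c;
    }
}

/* Convert a NUL-terminated string to lowercase in place using the table.
   The cast to unsigned char keeps bytes above 127 from indexing negatively. */
static void str_to_lower(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)to_lower[(unsigned char)*s];
}
```

Indexing through unsigned char matters here: on platforms where char is signed, a byte above 127 would otherwise produce a negative array index.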

sort_order[] is a map indicating how characters should be ordered for comparison and sorting purposes. Quite often (but not for all character sets) this is the same as to_upper[], which means that sorting will be case-insensitive. MySQL will sort characters based on the values of sort_order[] elements. For more complicated sorting rules, see the discussion of string collating in Section 4.7.5, "String Collating Support."
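
To illustrate the idea, here is a hedged sketch of a comparison routine that orders strings by their sort_order[] weights rather than by raw byte values. The table below simply mirrors an ASCII-only to_upper[], and the names init_sort_order() and compare_by_sort_order() are assumptions made for this example, not MySQL functions.

```c
/* Hypothetical sort_order[] table for a single-byte character set.
   It mirrors to_upper[] for ASCII, so comparisons ignore letter case. */
static unsigned char sort_order[256];

static void init_sort_order(void)
{
    for (int c = 0; c < 256; c++)
        sort_order[c] = (unsigned char)c;
    for (int c = 'a'; c <= 'z'; c++)
        sort_order[c] = (unsigned char)(c - 'a' + 'A');
}

/* Compare two NUL-terminated strings by sort weight instead of byte value.
   Returns a value <0, 0, or >0, in the same spirit as strcmp(). */
static int compare_by_sort_order(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    /* Advance while both strings continue and the weights match. */
    while (*p != '\0' && sort_order[*p] == sort_order[*q]) {
        p++;
        q++;
    }
    return (int)sort_order[*p] - (int)sort_order[*q];
}
```

With this table, "apple" sorts before "Banana" even though the byte value of 'a' is greater than that of 'B', which is the case-insensitive behavior described above.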

ctype[] is an array of bit values, with one element per character. (Note that to_lower[], to_upper[], and sort_order[] are indexed by character value, but ctype[] is indexed by character value + 1. This is a legacy convention that lets the array also be indexed with EOF.)

You can find the following bitmask definitions in m_ctype.h:

#define _U   01   /* Uppercase */
#define _L   02   /* Lowercase */
#define _N   04   /* Numeral (digit) */
#define _S   010   /* Spacing character */
#define _P   020   /* Punctuation */
#define _C   040   /* Control character */
#define _B   0100  /* Blank */
#define _X   0200  /* heXadecimal digit */

The ctype[] entry for each character should be the union (bitwise OR) of the applicable bitmask values that describe the character. The masks are octal constants. For example, 'A' is an uppercase character (_U) as well as a hexadecimal digit (_X), so ctype['A'+1] should contain the value:

_U + _X = 01 + 0200 = 0201
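
The following sketch shows one way the masks and the +1 indexing could fit together. The masks are renamed with a CT_ prefix because identifiers such as _U are reserved for the C implementation; that renaming, the ASCII-only table, and the helper has_class() are assumptions for illustration. The sketch also assumes that EOF equals -1, which the legacy convention relies on.

```c
#include <assert.h>

/* Illustrative copies of the m_ctype.h masks (octal), renamed with CT_. */
enum {
    CT_U = 01,   CT_L = 02,   CT_N = 04,   CT_S = 010,
    CT_P = 020,  CT_C = 040,  CT_B = 0100, CT_X = 0200
};

/* 257 entries: slot 0 is reserved for EOF (assumed to be -1),
   slots 1..256 hold the entries for byte values 0..255. */
static unsigned char ctype[257];

/* Classify only ASCII letters and digits; everything else stays 0. */
static void init_ctype(void)
{
    for (int c = 'A'; c <= 'Z'; c++) ctype[c + 1] |= CT_U;
    for (int c = 'a'; c <= 'z'; c++) ctype[c + 1] |= CT_L;
    for (int c = '0'; c <= '9'; c++) ctype[c + 1] |= CT_N | CT_X;
    for (int c = 'A'; c <= 'F'; c++) ctype[c + 1] |= CT_X;
    for (int c = 'a'; c <= 'f'; c++) ctype[c + 1] |= CT_X;
}

/* Test a character class; c may be -1 (EOF) or any value in 0..255. */
static int has_class(int c, int mask)
{
    assert(c >= -1 && c <= 255);
    return (ctype[c + 1] & mask) != 0;
}
```

After init_ctype() runs, ctype['A'+1] holds CT_U | CT_X, which is 0201 in octal, and has_class(-1, mask) safely reads slot 0 instead of indexing before the start of the array.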

4.7.5 String Collating Support

If the sorting rules for your language are too complex to be handled with the simple sort_order[] table, you need to use the string collating functions.

At present, the best documentation for this is the source of the character sets that are already implemented. Look at the big5, czech, gbk, sjis, and tis620 character sets for examples.

You must specify the strxfrm_multiply_MYSET=N value in the special comment at the top of the file. N should be set to the maximum ratio by which strings may grow during my_strxfrm_MYSET (it must be a positive integer).
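
The growth ratio matters because the caller has to size the destination buffer before the transformation runs. The sketch below is a toy transformation, not the real my_strxfrm_MYSET: the function toy_strxfrm(), its signature, and the constant XFRM_MULTIPLY are hypothetical. It assumes latin1 input in which byte 0xDF (sharp s) expands to two weights, so the maximum ratio is 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical growth factor: one source byte yields at most two weights. */
#define XFRM_MULTIPLY 2

/* Toy transform: uppercase ASCII letters and expand 0xDF to "SS".
   dest must hold at least XFRM_MULTIPLY * len bytes.
   Returns the number of bytes written to dest. */
static size_t toy_strxfrm(unsigned char *dest, size_t dest_size,
                          const unsigned char *src, size_t len)
{
    size_t out = 0;

    assert(dest_size >= XFRM_MULTIPLY * len);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];

        if (c == 0xDF) {                       /* one byte becomes two */
            dest[out++] = 'S';
            dest[out++] = 'S';
        } else if (c >= 'a' && c <= 'z') {     /* fold case */
            dest[out++] = (unsigned char)(c - 'a' + 'A');
        } else {
            dest[out++] = c;
        }
    }
    return out;
}
```

A six-byte input containing one 0xDF produces seven bytes of output, which fits comfortably in the twelve-byte buffer that the ratio of 2 calls for.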

4.7.6 Multi-Byte Character Support

If you want to add support for a new character set that includes multi-byte characters, you need to use the multi-byte character functions.

At present, the best documentation on this is the source of the character sets that are already implemented. Look at the euc_kr, gb2312, gbk, sjis, and ujis character sets for examples. These are implemented in the ctype-charset.c files in the strings directory, where charset is the name of the character set.

You must specify the mbmaxlen_MYSET=N value in the special comment at the top of the source file. N should be set to the size in bytes of the largest character in the set.
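
As a rough illustration of what the maximum character length is used for, the sketch below defines a toy two-byte encoding and derives character counts and worst-case storage from it. The lead-byte range 0xA1..0xFE, the constant MBMAXLEN, and the mb_ function names are all assumptions for this example and do not reproduce any real MySQL character set.

```c
#include <stddef.h>

/* Hypothetical value: the widest character in this toy set is two bytes. */
#define MBMAXLEN 2

/* Toy rule: a byte in 0xA1..0xFE starts a two-byte character;
   any other byte is a complete single-byte character. */
static size_t mb_char_len(unsigned char lead)
{
    return (lead >= 0xA1 && lead <= 0xFE) ? 2 : 1;
}

/* Count characters (not bytes) in a buffer of len bytes.
   A truncated trailing character is still counted once. */
static size_t mb_count_chars(const unsigned char *s, size_t len)
{
    size_t chars = 0;
    size_t i = 0;

    while (i < len) {
        i += mb_char_len(s[i]);
        chars++;
    }
    return chars;
}

/* Worst-case storage in bytes for a value of n_chars characters. */
static size_t mb_max_bytes(size_t n_chars)
{
    return n_chars * MBMAXLEN;
}
```

For example, a four-byte buffer holding one single-byte character, one two-byte character, and another single-byte character counts as three characters, and three characters need at most six bytes of storage.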

4.7.7 Problems with Character Sets

If you try to use a character set that is not compiled into your binary, you might run into the following problems:

  • Your program has an incorrect path to the directory where the character sets are stored. (The default is /usr/local/mysql/share/mysql/charsets.) This can be fixed by using the --character-sets-dir option when you run the program in question.

  • The character set is a multi-byte character set that can't be loaded dynamically. In this case, you must recompile the program with support for the character set.

  • The character set is a dynamic character set, but you don't have a configuration file for it. In this case, you should install the configuration file for the character set from a new MySQL distribution.

  • Your Index file doesn't contain the name of the character set. In this case, your program displays the following error message:

    ERROR 1105: File '/usr/local/share/mysql/charsets/?.conf' not found (Errcode: 2)

    To fix this, either get a new Index file or manually add the names of any missing character sets to the current file.

For MyISAM tables, you can check the character set name and number for a table with myisamchk -dvv tbl_name.
