While optimization is possible with limited knowledge of your system or application, the more you know about your system, the better your optimization will be. This article, the second of two parts, covers some of the different points you will need to know for optimizing MySQL. It is excerpted from chapter six of the book MySQL Administrator's Guide, by MySQL AB (Sams, 2004; ISBN: 0672326345).
The following list indicates some of the ways that the mysqld server uses memory. Where applicable, the name of the system variable relevant to the memory use is given:
The key buffer (variable key_buffer_size) is shared by all threads; other buffers used by the server are allocated as needed. See Section 6.5.2, "Tuning Server Parameters."
Each connection uses some thread-specific space:
A stack (default 64KB, variable thread_stack)
A connection buffer (variable net_buffer_length)
A result buffer (variable net_buffer_length)
The connection buffer and result buffer are dynamically enlarged up to max_allowed_packet when needed. While a query is running, a copy of the current query string is also allocated.
All threads share the same base memory.
Only compressed ISAM and MyISAM tables are memory mapped. This is because the 32-bit memory space of 4GB is not large enough for most big tables. When systems with a 64-bit address space become more common, we may add general support for memory mapping.
Each request that performs a sequential scan of a table allocates a read buffer (variable read_buffer_size).
When reading rows in "random" order (for example, after a sort), a random-read buffer may be allocated to avoid disk seeks. (variable read_rnd_buffer_size).
All joins are done in one pass, and most joins can be done without even using a temporary table. Most temporary tables are memory-based (HEAP) tables. Temporary tables with a large record length (calculated as the sum of all column lengths) or that contain BLOB columns are stored on disk.
One problem before MySQL 3.23.2 is that if an internal in-memory heap table exceeds the size of tmp_table_size, the error The table tbl_name is full occurs. From 3.23.2 on, this is handled automatically by changing the in-memory heap table to a disk-based MyISAM table as necessary. To work around this problem for older servers, you can increase the temporary table size by setting the tmp_table_size option to mysqld, or by setting the SQL option SQL_BIG_TABLES in the client program.
In MySQL 3.20, the maximum size of the temporary table is record_buffer*16; if you are using this version, you have to increase the value of record_buffer. You can also start mysqld with the --big-tables option to always store temporary tables on disk. However, this will affect the speed of many complicated queries.
Most requests that perform a sort allocate a sort buffer and zero to two temporary files depending on the result set size. See Section A.4.4, "Where MySQL Stores Temporary Files."
Almost all parsing and calculating is done in a local memory store. No memory overhead is needed for small items, so the normal slow memory allocation and freeing is avoided. Memory is allocated only for unexpectedly large strings; this is done with malloc() and free().
For each MyISAM and ISAM table that is opened, the index file is opened once and the data file is opened once for each concurrently running thread. For each concurrent thread, a table structure, column structures for each column, and a buffer of size 3 * N are allocated (where N is the maximum row length, not counting BLOB columns). A BLOB column requires five to eight bytes plus the length of the BLOB data. The MyISAM and ISAM storage engines maintain one extra row buffer for internal use.
For each table having BLOB columns, a buffer is enlarged dynamically to read in larger BLOB values. If you scan a table, a buffer as large as the largest BLOB value is allocated.
Handler structures for all in-use tables are saved in a cache and managed as a FIFO. By default, the cache has 64 entries. If a table has been used by two running threads at the same time, the cache contains two entries for the table. See Section 6.4.8, "How MySQL Opens and Closes Tables."
A FLUSH TABLES statement or mysqladmin flush-tables command closes all tables that are not in use and marks all in-use tables to be closed when the currently executing thread finishes. This effectively frees most in-use memory.
ps and other system status programs may report that mysqld uses a lot of memory. This may be caused by thread stacks on different memory addresses. For example, the Solaris version of ps counts the unused memory between stacks as used memory. You can verify this by checking available swap with swap -s. We have tested mysqld with several memory-leakage detectors (both commercial and open source), so there should be no memory leaks.
6.5.5 How MySQL Uses DNS
When a new client connects to mysqld, mysqld spawns a new thread to handle the request. This thread first checks whether the hostname is in the hostname cache. If not, the thread attempts to resolve the hostname:
If the operating system supports the thread-safe gethostbyaddr_r() and gethostbyname_r() calls, the thread uses them to perform hostname resolution.
If the operating system doesn't support the thread-safe calls, the thread locks a mutex and calls gethostbyaddr() and gethostbyname() instead. In this case, no other thread can resolve hostnames that are not in the hostname cache until the first thread unlocks the mutex.
You can disable DNS hostname lookups by starting mysqld with the --skip-name-resolve option. However, in this case, you can use only IP numbers in the MySQL grant tables.
If you have a very slow DNS and many hosts, you can get more performance by either disabling DNS lookups with --skip-name-resolve or by increasing the HOST_CACHE_SIZE define (default value: 128) and recompiling mysqld.
You can disable the hostname cache by starting the server with the --skip-host-cache option. To clear the hostname cache, issue a FLUSH HOSTS statement or execute the mysqladmin flush-hosts command.
If you want to disallow TCP/IP connections entirely, start mysqld with the --skip-networking option.
6.6 Disk Issues
Disk seeks are a big performance bottleneck. This problem becomes more apparent when the amount of data starts to grow so large that effective caching becomes impossible. For large databases where you access data more or less randomly, you can be sure that you will need at least one disk seek to read and a couple of disk seeks to write things. To minimize this problem, use disks with low seek times.
Increase the number of available disk spindles (and thereby reduce the seek overhead) by either symlinking files to different disks or striping the disks:
Using symbolic links
This means that, for MyISAM tables, you symlink the index file and/or data file from their usual location in the data directory to another disk (that may also be striped). This makes both the seek and read times better, assuming that the disk is not used for other purposes as well. See Section 6.6.1, "Using Symbolic Links."
Striping means that you have many disks and put the first block on the first disk, the second block on the second disk, and the Nth block on the (N mod number_of_disks) disk, and so on. This means if your normal data size is less than the stripe size (or perfectly aligned), you will get much better performance. Striping is very dependent on the operating system and the stripe size, so benchmark your application with different stripe sizes. See Section 6.1.5, "Using Your Own Benchmarks."
The speed difference for striping is very dependent on the parameters. Depending on how you set the striping parameters and number of disks, you may get differences measured in orders of magnitude. You have to choose to optimize for random or sequential access.
For reliability you may want to use RAID 0+1 (striping plus mirroring), but in this case, you will need 2*N drives to hold N drives of data. This is probably the best option if you have the money for it! However, you may also have to invest in some volume-management software to handle it efficiently.
A good option is to vary the RAID level according to how critical a type of data is. For example, store semi-important data that can be regenerated on a RAID 0 disk, but store really important data such as host information and logs on a RAID 0+1 or RAID N disk. RAID N can be a problem if you have many writes, due to the time required to update the parity bits.
On Linux, you can get much more performance by using hdparm to configure your disk's interface. (Up to 100% under load is not uncommon.) The following hdparm options should be quite good for MySQL, and probably for many other applications:
hdparm -m 16 -d 1
Note that performance and reliability when using this command depends on your hardware, so we strongly suggest that you test your system thoroughly after using hdparm. Please consult the hdparm man page for more information. If hdparm is not used wisely, filesystem corruption may result, so back up everything before experimenting!
You can also set the parameters for the filesystem that the database uses:
If you don't need to know when files were last accessed (which is not really useful on a database server), you can mount your filesystems with the -o noatime option. That skips updates to the last access time in inodes on the filesystem, which avoids some disk seeks.
On many operating systems, you can set a filesystem to be updated asynchronously by mounting it with the -o async option. If your computer is reasonably stable, this should give you more performance without sacrificing too much reliability. (This flag is on by default on Linux.)
6.6.1 Using Symbolic Links
You can move tables and databases from the database directory to other locations and replace them with symbolic links to the new locations. You might want to do this, for example, to move a database to a file system with more free space or increase the speed of your system by spreading your tables to a different disk.
The recommended way to do this is to just symlink databases to a different disk. Symlink tables only as a last resort.