This section lists a number of miscellaneous tips for improving query processing speed:
Use persistent connections to the database to avoid connection overhead. If you can't use persistent connections and you are initiating many new connections to the database, you may want to change the value of the thread_cache_size variable. See Section 6.5.2, "Tuning Server Parameters."
Always check whether all your queries really use the indexes you have created in the tables. In MySQL, you can do this with the EXPLAIN statement. See Section 6.2.1, "EXPLAIN Syntax (Get Information About a SELECT)."
Try to avoid complex SELECT queries on MyISAM tables that are updated frequently, to avoid problems with table locking that occur due to contention between readers and writers.
With MyISAM tables that have no deleted rows, you can insert rows at the end at the same time that another query is reading from the table. If this is important for you, you should consider using the table in ways that avoid deleting rows. Another possibility is to run OPTIMIZE TABLE after you have deleted a lot of rows.
Use ALTER TABLE ... ORDER BY expr1, expr2, ... if you mostly retrieve rows in expr1, expr2, ... order. By using this option after extensive changes to the table, you may be able to get higher performance.
- In some cases, it may make sense to introduce a column that is "hashed" based on information from other columns. If this column is short and reasonably unique, it may be much faster than a big index on many columns. In MySQL, it's very easy to use this extra column:
SELECT * FROM tbl_name
AND col1='constant' AND col2='constant';
For MyISAM tables that change a lot, you should try to avoid all variable-length columns (VARCHAR, BLOB, and TEXT). The table will use dynamic record format if it includes even a single variable-length column. See Chapter 8, "MySQL Storage Engines and Table Types."
It's normally not useful to split a table into different tables just because the rows get "big." To access a row, the biggest performance hit is the disk seek to find the first byte of the row. After finding the data, most modern disks can read the whole row fast enough for most applications. The only cases where it really matters to split up a table is if it's a MyISAM table with dynamic record format (see above) that you can change to a fixed record size, or if you very often need to scan the table but do not need most of the columns. See Chapter 8, "MySQL Storage Engines and Table Types."
- If you very often need to calculate results such as counts based on information from a lot of rows, it's probably much better to introduce a new table and update the counter in real time. An update of the following form is very fast:
UPDATE tbl_name SET count_col=count_col+1 WHERE key_col=constant;
This is really important when you use MySQL storage engines such as MyISAM and ISAM that have only table-level locking (multiple readers / single writers). This will also give better performance with most databases, because the row locking manager in this case will have less to do.
If you need to collect statistics from large log tables, use summary tables instead of scanning the entire log table. Maintaining the summaries should be much faster than trying to calculate statistics "live." It's much faster to regenerate new summary tables from the logs when things change (depending on business decisions) than to have to change the running application!
If possible, you should classify reports as "live" or "statistical," where data needed for statistical reports is created only from summary tables that are generated periodically from the live data.
Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL needs to do and improves the insert speed.
In some cases, it's convenient to pack and store data into a BLOB column. In this case, you must add some extra code in your application to pack and unpack information in the BLOB values, but this may save a lot of accesses at some stage. This is practical when you have data that doesn't conform to a rows-and-columns table structure.
Normally, you should try to keep all data non-redundant (what is called "third normal form" in database theory). However, do not be afraid to duplicate information or create summary tables if necessary to gain more speed.
Stored procedures or UDFs (user-defined functions) may be a good way to get more performance for some tasks. However, if you use a database system that does not support these capabilities, you should always have another way to perform the same tasks, even if the alternative method is slower.
You can always gain something by caching queries or answers in your application and then performing many inserts or updates together. If your database supports table locks (like MySQL and Oracle), this should help to ensure that the index cache is only flushed once after all updates.
Use INSERT DELAYED when you do not need to know when your data is written. This speeds things up because many records can be written with a single disk write.
Use INSERT LOW_PRIORITY when you want to give SELECT statements higher priority than your inserts.
Use SELECT HIGH_PRIORITY to get retrievals that jump the queue. That is, the SELECT is done even if there is another client waiting to do a write.
Use multiple-row INSERT statements to store many rows with one SQL statement (many SQL servers support this).
Use LOAD DATA INFILE to load large amounts of data. This is faster than using INSERT statements.
Use AUTO_INCREMENT columns to generate unique values.
Use OPTIMIZE TABLE once in a while to avoid fragmentation with MyISAM tables when using a dynamic table format. See Section 8.1.3, "MyISAM Table Storage Formats."
Use HEAP tables when possible to get more speed. See Chapter 8, "MySQL Storage Engines and Table Types."
When using a normal Web server setup, images should be stored as files. That is, store only a file reference in the database. The main reason for this is that a normal Web server is much better at caching files than database contents, so it's much easier to get a fast system if you are using files.
Use in-memory tables for non-critical data that is accessed often, such as information about the last displayed banner for users who don't have cookies enabled in their Web browser.
Columns with identical information in different tables should be declared to have identical data types. Before MySQL 3.23, you get slow joins otherwise.
Try to keep column names simple. For example, in a table named customer, use a column name of name instead of customer_name. To make your names portable to other SQL servers, you should keep them shorter than 18 characters.
If you need really high speed, you should take a look at the low-level interfaces for data storage that the different SQL servers support! For example, by accessing the MySQL MyISAM storage engine directly, you could get a speed increase of two to five times compared to using the SQL interface. To be able to do this, the data must be on the same server as the application, and usually it should only be accessed by one process (because external file locking is really slow). One could eliminate these problems by introducing low-level MyISAM commands in the MySQL server (this could be one easy way to get more performance if needed). By carefully designing the database interface, it should be quite easy to support this type of optimization.
If you are using numerical data, it's faster in many cases to access information from a database (using a live connection) than to access a text file. Information in the database is likely to be stored in a more compact format than in the text file, so accessing it will involve fewer disk accesses. You will also save code in your application because you don't have to parse your text files to find line and column boundaries.
Replication can provide a performance benefit for some operations. You can distribute client retrievals among replication servers to split up the load. To avoid slowing down the master while making backups, you can make backups using a slave server. See Chapter 5, "Replication in MySQL."
Declaring a MyISAM table with the DELAY_KEY_WRITE=1 table option makes index updates faster because they are not flushed to disk until the table is closed. The downside is that if something kills the server while such tables are open, you should ensure that they are okay by running the server with the --myisam-recover option, or by running myisamchk before restarting the server. (However, even in this case, you should not lose anything by using DELAY_KEY_WRITE, because the key information can always be generated from the data rows.)