Indexes can greatly improve data access times. However, always keep in mind that indexes take up storage space and they have to be maintained. Here are some considerations related to the use of indexes to improve query performance:
- Avoid indexes on frequently updated columns. Creating an index on a column that is frequently updated doubles up the amount of writes required when the column is updated. Always remember that when column data is updated, the DBMS must also update any indexes that include that column.
- Create only selective indexes. Index selectivity is a ratio of the number of distinct values a column has divided by the number of rows in a table. For example, if a table has 1000 rows and a column has 800 distinct values, the selectivity of the index is 0.8, which is considered good. However, a column such as gender that only has two distinct values (M and F) has very poor selectivity (.002 in this case). Unique indexes always have a selectivity ratio of 1.0, which is the best possible. A good rule of thumb is to avoid indexes with a selectivity of less than 0.33 unless they are indexes especially designed for low selectivity such as bit-map indexes.
- Foreign key indexes improve joins. With most optimizers, an index on a foreign key column greatly improves join performance, and it can enable additional join methods for the optimizer to use.
- Index columns frequently used in predicates. For large tables, every query should contain a WHERE predicate that references an indexed column. Therefore, it is best to ﬁnd the columns that are most frequently referenced by predicates and to index them.
- Don't overindex. As a rule of thumb, don't create more than three or four indexes for any table. As already stated, indexes take up storage and must be maintained. Too many indexes on a table can cripple the performance of an INSERT or UPDATE issued against that table.
- Avoid overlapping indexes. Nearly every RDBMS can use an index even when the WHERE predicate references only the ﬁrst column of the index. Therefore, overlapping indexes (those that have the same leading column) are redundant and unnecessary.
- Consider unique indexes. With some RDBMSs, such as DB2, unique indexes are so superior that DBAs often add otherwise unnecessary columns to an index just to make it unique.
- Drop indexes for bulk loads. For mass loads of data into a table, consider dropping some of the indexes and re-creating them after the load is complete. This can save a substantial amount of time in some DBMSs.
Here are a few tuning considerations that are particular to the MySQL DBMS:
- Storage engine MySQL has a unique feature that provides multiple storage engines, one of which may be selected for each new table. These storage engines include MyISAM (a replacement for the original ISAM), HEAP, MERGE, InnoDB, and BDB (Berkeley DB). The differences are too involved to explain here, but there are several chapters of the MySQL Reference Manual devoted to them. Sufﬁce it to say that the choice of storage engine has a profound effect on the optimizer and the performance of SQL statements issued against the table.
- Hash indexes MySQL supports hash indexes where the data values are sent through a hashing algorithm before being added to the index. While this technique scatters sequentially assigned values so new rows of data end up more uniformly spread out in the index, hash indexes have the disadvantage of not being useful as a replacement for sorts because the index entries are not in key sequence.
Please check back next week for the conclusion of this article.