SQL Performance and Tuning Considerations, concluded

This article, the second of two parts, will show you how to make the SQL statements you write run faster and more efficiently. It is excerpted from chapter 11 of the book SQL DeMYSTiFied, written by Andy Oppel (McGraw-Hill/Osborne, 2005; ISBN: 0072262249).

Oracle Considerations

This section covers some tuning considerations that are specific to Oracle databases.

Oracle Index Considerations

Oracle provides some additional indexing options that are worth consideration:

  • Function-based indexes  Normally a predicate such as WHERE UPPER(MOVIE_TITLE) LIKE ‘BIG%’ precludes the use of an index. However, Oracle allows an index to be created on a function, such as UPPER(MOVIE_TITLE), which then is usable for queries that reference the function. There are several prerequisites for the use of this option, so you may need some help from your Oracle DBA before you can use it. The syntax for a function-based index simply uses the function specification instead of the column name in the ON clause of the CREATE INDEX statement:

    CREATE INDEX IX_MOVIE_TITLE_UPPER
     
    ON MOVIE (UPPER(MOVIE_TITLE));
  • Bit-map indexes  Oracle bit-map indexes are designed to handle columns where the cardinality is low (that is, where there are relatively few data values among many rows). A bit-map index contains records that have one bit for every row in the indexed table and one index record for every possible data value in the indexed column. A bit is “on” (a binary 1) if the corresponding row in the table contains the value that the index record represents, and “off” (a binary 0) if not. For example, if the MOVIE table contained 5000 rows and you built a bit-map index on the MPAA_RATING_CODE column, the index would contain records that were 5000 bits (roughly 625 bytes) in size, and there would be 6 such records (one for each value of MPAA_RATING_CODE). The first index record would represent the first MPAA_RATING_CODE value (“G”) and bits in that record would be “on” when the corresponding row in the MOVIE table had a rating of “G”. The DBMS can use matrix algebra to quickly find desired data rows, particularly when the search predicates reference multiple columns that have bit-map indexes. A bit-map index is created using the normal CREATE INDEX syntax with the keyword BITMAP used instead of the keyword UNIQUE:

    CREATE BITMAP INDEX IX_MOVIE_MPAA_RATING
     
    ON MOVIE (MPAA_RATING_CODE);
  • Index organized tables  It’s a good practice to create an index on the primary key of every table. However, for tables with only a few columns, such as reference tables and small intersection tables, this always seems wasteful because most or all of the data in the table is repeated in the index. Oracle provides a nice solution that allows the entire table to be stored in the index. Essentially, an index organized table (IOT) is a primary key index and a table rolled into a single structure. While you can create additional indexes on an IOT, in those cases you might be better off using a conventional table. Here is an example that creates a reference table for media formats as an IOT:

    CREATE TABLE MEDIA_FORMAT (
     
    MEDIA_FORMAT_CODE    CHAR(1),
     
    MEDIA_FORMAT_DESC    VARCHAR(50),
     
    CONSTRAINT PK_MEDIA_FORMAT
          PRIMARY KEY (MEDIA_FORMAT_CODE))
      ORGANIZATION INDEX;

{mospagebreak title=Using EXPLAIN PLAN}

In Oracle, the SQL EXPLAIN PLAN statement analyzes an SQL statement and posts analysis results to a special plan table. The plan table must be created exactly as specified by Oracle, so it is best to use the script they provide for this purpose, which can be found under the Oracle Home directory as  /RDBMS/ADMIN/ catplan.sql. After running the EXPLAIN PLAN statement, you must then retrieve the results from the plan table using a SELECT statement. Fortunately, Oracle’s Enterprise Manager has a GUI version available that makes query tuning a lot easier.

Here is an example of an EXPLAIN PLAN statement:

EXPLAIN PLAN
 
SET STATEMENT_ID = ‘STMT_1′
 
FOR
 
SELECT MOVIE_ID, 
        MOVIE_GENRE.MOVIE_GENRE_DESCRIPTION AS GENRE, 
        
MOVIE.MPAA_RATING_CODE AS RATING,
        MPAA_RATING.MPAA_RATING_DESCRIPTION AS RATING_DESC
  FROM MOVIE
      
JOIN MOVIE_GENRE ON MOVIE.MOVIE_GENRE_CODE =
            MOVIE_GENRE.MOVIE_GENRE_CODE

       JOIN MPAA_RATING ON MOVIE.MPAA_RATING_CODE =

            MPAA_RATING.MPAA_RATING_CODE
  WHERE MOVIE_ID < 6
  ORDER BY MOVIE_ID;
Explained.

NOTE:

  • STATEMENT_ID is any character string that the statement author wishes to use to identify the explain results. This feature allows multiple explains to be run with the STATEMENT_ID as the identifier that keeps the information about each execution plan separate in the plan table.
  • The statement to be explained follows the FOR keyword and can be any valid SQL statement.
  • When the EXPLAIN PLAN statement is run, the SQL statement is not actually executed. Instead of a result set containing rows of data, only the message “Explained.” is returned to the user. This indicates that the explain plan information has been successfully written to the plan table.

Following is the statement commonly used to retrieve and display the execution plan. It is a complex SQL statement, but the only thing that has to be changed when it is run is the STATEMENT_ID. The CONNECT BY clause is an Oracle proprietary SQL extension that joins a recursive relationship in a table through all iterations (from child to parent to grandparent, and so forth). This SQL was included here to illustrate one method of viewing explain plan results. As mentioned before, there is also a GUI tool in Oracle’s Enterprise Manager.

SELECT rtrim(substr(LPAD(‘ ‘,2*(LEVEL-1))||operation,1,30))||’ ‘
                 
||rtrim(options)||’ ‘||rtrim(object_name)|| ‘ ‘
                
||'(cost= ‘||cost||’, cardinality=’||cardinality||’)’
                
“Query Plan”
 
FROM plan_table
 START WITH id = 0
       
AND upper(statement_id) = upper(‘STMT_1′)
CONNECT BY PRIOR id = parent_id
         AND upper(statement_id) = upper(‘STMT_1′);
Query Plan
——————————————-
SELECT STATEMENT  (cost= 10, cardinality=5)
  SORT ORDER BY (cost= 10, cardinality=5)

    HASH JOIN (cost= 9, cardinality=5)
     
MERGE JOIN (cost= 5, cardinality=5)
        TABLE ACCESS BY INDEX ROWID MPAA_RATING
                      
(cost= 2, cardinality=6)
         
INDEX FULL SCAN SYS_C005440
                       (cost= 1, cardinality=6)
       
SORT JOIN (cost= 3, cardinality=5)
          TABLE ACCESS BY INDEX ROWID MOVIE
                       (cost= 2, cardinality=5)
           
INDEX RANGE SCAN SYS_C005449
                       (cost= 1, cardinality=5)
     
TABLE ACCESS FULL MOVIE_GENRE
                      
(cost= 3, cardinality=16)
10 rows selected.

Here are some key points regarding the query plan that appears in the result set:

  • The indentation shows the order of execution, with the innermost steps being performed first.
  • The cost values show a relative cost. The numbers have no meaning beyond their relative differences. For example, a cost of 10 represents a step that uses twice the resources of a step that has a cost of 5.
  • The cardinality values show the estimated number of rows that are processed by the step.
  • “TABLE ACCESS FULL” indicates a full table scan where all rows in the table are read sequentially.
  • “INDEX RANGE SCAN” indicates the scan of a portion of the rows in an index.
  • “TABLE ACCESS BY INDEX” indicates access to a table using the index shown.
  • The sorts and joins should be self evident by the step names.

{mospagebreak title=Microsoft SQL Server Considerations}

This section covers some tuning considerations that are specific to Microsoft SQL Server databases.

SQL Server Query Performance Considerations

Here are some considerations for running queries in Microsoft SQL Server databases:

  • Access via stored procedures.  Placing frequently used SQL queries in stored procedures can yield significant performance gains. In fact, in older versions of Microsoft SQL Server, it was essential to do so in order to achieve reasonable performance because there was no SQL cache to hold recently used SQL statements. In SQL Server 2005, the SQL cache allows the DBMS to recognize a statement similar to one that has already been run and to bypass statement preparation and optimization steps when the statement is reused from the cache. Nonetheless, any statement that is accessed through a stored procedure is precompiled and optimized, so it will always perform well (even if it is not found in the cache at the time it is run). This is one reason why seasoned developers strive to do most or all of their SQL Server database access using stored procedures.
  • Avoid nulls in unique indexes.  Unlike most other SQL implementations, unique indexes in SQL Server do not handle null column values well. You can define a unique index that includes a column that is allowed to be null, but the second index entry that contains a null is considered a duplicate of the first, and therefore results in a duplicate key error. Most other SQL implementations avoid this issue by not indexing null values.
  • Consider clustered indexes.  A clustered index causes the table rows to be physically ordered by the indexed column(s). This can help considerably in search and join operations, but it’s at the expense of table maintenance, particularly when new rows force existing rows to be moved to maintain their sequence.

{mospagebreak title=Displaying Execution Plans Using SQL Query Analyzer}

Microsoft provides a facility for explaining query execution plans just as Oracle and DB2 do. In Microsoft SQL Server 2000, the SQL Query Analyzer tool has a button labeled “Display Estimated Execution Plan” that graphically displays how the SQL statement will be executed. This feature is also accessible from the Query menu item as the option Show Execution Plan. These items may have different names in other versions of Microsoft SQL Server. In SQL Server 2005, there does not appear to be a tool called SQL Query Analyzer, and it is not clear what the tool will be called when the product is formally released.

In the Query Analyzer tool, the default mode is Object Browser. To display an execution plan, click the Display Estimated Execution Plan icon or press CTRL-L. The execution plan is displayed graphically as shown in Figure 11-1. Each step is shown as an icon and the number near the icon shows the percentage of the total query cost that applies to that step. You can place your cursor pointer over any icon to see a pop-up window showing more detailed information for that step.


Figure 11-1.  SQL Query Analyzer output

DB2 UDB Considerations

Like Oracle and Microsoft SQL Server, DB2 also provides a utility to display query execution plans. IBM provides a script called EXPLAIN.DDL that creates the tables needed to hold the output of the explain facility. Once the explain tables are created, the explain facility can be used to create execution plans for queries, and the plans can then be displayed using either Visual Explain in the Control Center or by querying the explain tables using SQL.

Tuning DML Statements

DML (Data Manipulation Language) statements generally produce fewer performance problems than query statements. However, there can be issues.

For INSERT statements, there are two main considerations:

  • Ensuring adequate free space for new rows Tablespaces that are short on space present problems as the DBMS searches for free space to hold rows being inserted. Moreover, inserts do not usually put rows into the table in primary key sequence because there usually isn’t free space in exactly the right places. Therefore, reorganizing the table, which is essentially a process of unloading the rows to a flat file, re-creating the table, and then reloading the table can improve both insert and query performance.
  • Index maintenance  Every time a row is inserted into a table, a corresponding entry must be inserted into every index built on the table (although null values are usually not indexed). The more indexes, the more overhead every insert will require. Index free space can usually be tuned just as table free space can.

    UPDATE statements have the following considerations:
  • Index maintenance  If columns that are indexed are updated, the corresponding index entries must also be updated. In general, updating primary key values has particularly bad performance implications, so much so that some RDBMSs prohibit them.
  • Row expansion  When columns are updated in such a way that the row grows significantly in size, the row may no longer fit in its original location, and there may not be free space around the row for it to expand in place (other rows might be right up against the one just updated). When this occurs, the row must either be moved to another location in the data file where it will fit or be split with the expanded part of the row placed in a new location, connected to the original location by a pointer. Both of these situations are not only expensive when they occur but are also detrimental to the performance of subsequent queries that touch those rows. Table reorganizations can resolve the issue, but it’s better to prevent the problem by designing the application so that rows tend not to grow in size after they are inserted.

DELETE statements are the least likely to present performance issues. However, a table that participates as a parent in a relationship that is defined with the ON DELETE CASCADE option can perform poorly if there are many child rows to delete. Moreover, index maintenance also comes into play here because index entries must be removed for any deleted rows.

{mospagebreak title=Quiz}

Choose the correct responses to each of the multiple-choice questions. Note that there may be more than one correct response to each question.

  1. Performance requirements

    a. Should be set after the SQL statements are tuned

    b. Provide a way to identify statements that need tuning

    c. Are best when they contain complex criteria

    d. Provide a way to know when to stop tweaking a query

    e. Are developed just to make the auditors happy
  2. Disk reads and writes can be minimized by

    a. Allocating buffers of sufficient size

    b. Placing all the database files on one disk drive

    c. Putting the entire database in memory

    d. Adding indexes for all the important table columns

    e. Spreading files across all available disk drives
  3. The computer system can be tuned by

    a. Collecting database statistics

    b. Applying available security patches

    c. Selecting fast and reliable hardware

    d. Following the DBMS tuning recommendations

    e. Consulting a tuning guide for the operating system
  4. Efficient table design includes

    a. Using VARCHAR for all variable-length character columns

    b. Using the smallest possible numeric data type that holds the data values

    c. Using triggers whenever possible

    d. Using identical data types for primary keys and their matching foreign keys

    e. Using identical data types for all primary key columns
  5. A query execution plan

    a. Describes how the DBMS will execute a query

    b. Is stored in the SQL cache

    c. Is created using the explain plan feature of the RDBMS

    d. Requires a plan table to hold the explain results

    e. Requires the use of a stored procedure
  6. The query optimizer

    a. Creates a query execution plan in the plan table

    b. Determines the best way to execute an SQL statement

    c. May use statistics gathered from the database

    d. May use rules applied to the way the statement was written

    e. Can be cost-based or rule-based
  7. In order to design proper queries, the developer should

    a. Know the characteristics of the data in the database

    b. Match data types in predicates

    c. Use hints as much as possible

    d. Avoid unnecessary tables and columns

    e. Maximize the number of rows in each result set
  8. Table scans can be avoided by

    a. Including a WHERE clause that references an indexed column

    b. Using GROUP BY instead of DISTINCT

    c. Making sure that statistics are up to date

    d. Making at least one predicate references the leading column of an index

    e. Avoiding unnecessary columns
  9. An index cannot be used when

    a. The WHERE clause references the second column of an index

    b. A LIKE clause references a comparison string that contains a wildcard (except in the first position of the string)

    c. The NOT operator is used in a predicate

    d. An SQL function is included in a column comparison (except when they match a function-based index)

    e. The NOT EQUAL operator is used in a predicate
  10. Considerations for using indexes include

    a. Placing indexes on all frequently updated columns

    b. Placing indexes on foreign key columns

    c. Avoiding overlapping indexes

    d. Creating indexes on columns that have only a few possible values

    e. Avoiding unique indexes
  11. Tuning considerations for MySQL include

    a. Function-based indexes

    b. Hash indexes

    c. Clustering indexes

    d. Storage engine options

    e. Bit-map indexes
  12. Tuning considerations for Oracle include

    a. Function-based indexes

    b. Hash indexes

    c. Clustering indexes

    d. Storage engine options

    e. Bit-map indexes
  13. Tuning considerations for Microsoft SQL Server include

    a. Function-based indexes

    b. Hash indexes

    c. Clustering indexes

    d. Storage engine options

    e. Bit-map indexes
  14. An explain plan in Oracle

    a. Requires the use of a plan table

    b. Contains a PLAN_ID to uniquely identify it

    c. Can be viewed using Enterprise Manager

    d. Can be viewed with SQL by selecting it from the plan table

    e. Is created using the CREATE PLAN statement
  15. An execution plan in Microsoft SQL Server

    a. Requires the use of a plan table

    b. Can be displayed using an option in SQL Query Analyzer

    c. Displays the execution plan in a text format

    d. Displays the execution plan in a graphical format

    e. Can be viewed with SQL by selecting it from the plan table
  16. When tuning INSERT statements, one should consider

    a. Index maintenance

    b. Row expansion

    c. The CASCADE option

    d. Adequate free space

    e. Query rewrites 
  17. When tuning UPDATE statements, one should consider

    a. Index maintenance

    b. Row expansion

    c. The CASCADE option

    d. Adequate free space

    e. Query rewrites 
  18. When tuning DELETE statements, one should consider

    a. Index maintenance

    b. Row expansion

    c. The CASCADE option

    d. Adequate free space

    e. Query rewrites
  19. Data types should match

    a. For all primary key columns

    b. Between primary key and corresponding secondary key columns

    c. Between primary key and corresponding foreign key columns

    d. Between column values and literal values compared in predicates

    e. For all function-based indexes
  20. The most likely cause of query performance problems is

    a. A poorly tuned operating system

    b. A poorly written SQL statement

    c. Trigger overhead

    d. Index maintenance overhead

    e. Row expansion
[gp-comments width="770" linklove="off" ]

chat sex hikayeleri Ensest hikaye