When dealing with databases, uptime is not the only issue; if the database is up but slow, the end user will still be unhappy. Fortunately, there are ways to tune an Oracle database to deal with this issue, which is the subject of this article. It is excerpted from chapter three of Oracle Database 10g High Availability with RAC, Flashback and Data Guard, written by Matthew Hart and Scott Jesse (McGraw-Hill/Osborne, 2004; ISBN: 0072254289).
When we talk about high availability, we must take into account more than just the basic question of “is the database up?” Much more than just that simple question is at stake. What really matters is the perceptions of the users. If the database is up but responsiveness of the database is poor, then from the end user perspective, uptime is affected. This is not a revelation to anyone, but it bears repeating, as this is the premise of this chapter, and the reason for its existence within the covers of a high-availability book. Oracle Database 10g has added numerous tuning enhancements, as well as manageability enhancements that will make it easier to detect bottlenecks in the database, and to alleviate those bottlenecks, sometimes automatically.
In this chapter, we will discuss some of these key features, including a better understanding of what they are, and how to set them up to enable the HA DBA to take full advantage of them. Since manageability is a key component of this as well, we will also base many of our examples on the Oracle Database 10g Enterprise Manager. This latest version of EM is web-based, and is greatly enhanced to make identifying and fixing bottlenecks easier and more intuitive than ever before.
Oracle Database 10g has made great strides in utilizing the information that the database knows about its own inner workings, and turning that knowledge into an infrastructure that will allow the HA DBA to easily and automatically take advantage of that information. This intelligent infrastructure, as termed by Oracle, begins with the Automatic Workload Repository. The Workload Repository gathers information on activity in the database at regular intervals. Each run of the Workload Repository is followed by an automatic run of The Automatic Database Diagnostic Monitor (ADDM), which interprets the information supplied by AWR and makes recommendations. One can dig deeper into these recommendations by then manually running one of several advisors that Oracle has provided. In addition, an active session history, or ASH, is maintained to track all events that sessions are waiting for, to allow real-time information gathering. Altogether this is referred to as the intelligent infrastructure. Combined, these tools give the HA DBA more immediate access than ever before into the inner workings of the Oracle RDBMS.
MMON Background Process
The MMON background process is a new background process introduced in Oracle Database 10g, responsible for the automatic monitoring that goes on within the database. MMON is responsible for kicking off the Automatic Database Diagnostic Monitor at the end of each AWR run, to check for bottlenecks or problems identified within the report. In addition, MMON monitors the ALERT_QUE, a queue owned by sys, once a minute, for any new information/alerts. MMON can be used to either automatically take a corrective action or to send an alert via the ALERT_QUE. These alerts are then displayed on the database home page in Enterprise Manager. Additional subscribers (such as third-party or home-grown tools) can be added to the ALERT_ QUE using the DBMS_AQADM.ADD_SUBSCRIBER procedure, allowing alerts to be sent to multiple locations. If the ALERT_QUE is unavailable for some reason, alerts will go to the alert.log for the instance.
AWR: Automatic Workload Repository
We begin by looking at one of the key components of these enhancements in Oracle Database 10g—the Automatic Workload Repository, or AWR. For those of you who are familiar with StatsPack from previous releases, AWR will have a familiar look and feel to it. For those who are new to Oracle, or perhaps did not take advantage of StatsPack in previous releases, we will begin with an overview of how it works. Later in this section, we will talk about what is new between AWR and the older versions of StatsPack.
What Is AWR?
AWR is essentially a job-based scheduled collection of statistics, gathered and stored in the Oracle database, containing information about the database itself. This metadata repository by default is kept in a new tablespace in Oracle Database 10g, called SYSAUX. The script catawr.sql creates the Workload Repository, with objects owned by sys. Workload Repository object names are prefaced with WRH$, WRI$, or WRM$. When creating a new database in Oracle Database 10g, the Workload Repository schema and the job to gather statistics are automatically created as part of the database creation. By default, statistics collection will run every 60 minutes, gathering information from within the database on stats such as I/O waits and other wait events, CPU used per session, sorts, I/O rates on various datafiles, and so forth. In order for this to occur, the parameter STATISTICS_LEVEL must be set to ALL or TYPICAL (TYPICAL being the default). At the end of a run, the repository is updated with current information for that period. Reports can then be generated, using begin and end values corresponding to previous Workload Repository runs, to determine what was happening within the database during a given period of time.
Since AWR runs are scheduled by default when the database is created, nothing special need be done to enable this functionality. As mentioned above, AWR runs at 60-minute intervals, collecting stats and storing that information. However, as you can imagine, this can lead to a rather large repository over time, so by default, the data will be purged after seven days. Should you wish to change these defaults—either the frequency of the runs, or how soon the repository is purged—it is easy enough to do so using Enterprise Manager. Should you wish to disable AWR altogether, you may do that as well; however, we strongly recommend that you do not do this. The overhead for AWR is minimal, and the gains achieved by allowing the statistic gathering can come in handy at the most unexpected times.
Command-Line API for AWR
As mentioned in the previous section, the functionality of the Workload Repository can best be utilized within Enterprise Manager (EM). Thus, the majority of examples in this chapter will be EM-centric. However, for those of you who are command-line junkies, the API for the Workload Repository is a new one, aptly named DBMS_ WORKLOAD_REPOSITORY. This can be used to create snapshots, drop snapshots, build baselines, and so forth. We will not be delving into the details of the DBMS_ WORKLOAD_REPOSITORY package, but for those of you who are so inclined (or are desperately in need of some heavy reading), please refer to the Oracle Database 10g Release 1 Guide titled “PL/SQL Packages and Types Reference,” which is part of the Oracle documentation set.
Viewing an AWR Report
OK, so the Workload Repository is populated every 60 minutes with statistics, and this happens automatically. That’s nice. Now what? What do I do with this information? The answer is that you define times outlined by AWR runs, in which to turn this information into a report. When creating an AWR report, you define the beginning and ending interval for the reports using snapshot IDs from the Workload Repository. Your report will begin at one snapshot and end at a later snapshot.
An AWR report can be viewed in Enterprise Manager, using the administration page. From there, select Automatic Workload Repository, from under the Workload section, and then click on the underlined number next to Snapshots. Note that here you can also change the interval for AWR—we will cover these steps in more detail in an HA Workshop later in this section. As mentioned previously, you can determine the interval of the reports, with the beginning of the interval going back as far as the repository keeps the data. Reports can be viewed using a full report view in HTML format, or you can view the data with pertinent information summarized in the details view. If you prefer the old text look of the reports, as in the Oracle9i and Oracle8i StatsPack Reports, then a text report must be generated using the AWR_REPORT_ TEXT function of the DBMS_WORLOAD_REPOSITORY command-line API.
Interpreting the Workload Repository Report Output
The report, when viewed in its full form, will begin with a heading, which provides the DB information such as DB_Name, DBID, instance_name, and so on. The begin and end snap times will also be shown, with the total elapsed time for the report displayed in minutes. This interval should not be too long, or else the results will be diluted. The default interval of 60 minutes is usually a good starting point, but when troubleshooting a specific issue, you may want to lower that interval to 30 minutes to get more granularity in your data. After the heading information, the report is then broken out into several sections, covering various different types of statistics of value.
When reviewing the report, the Report Summary section at the top will give a quick overview of the database performance. The ending cache sizes will be shown, as well as the load profile, instance efficiency percentages, shared pool statistics, and the top five wait events. This report summary gives a quick look at the system, allowing an immediate review of the most common sources of bottlenecks. The Instance Efficiency section gives a quick look at the buffer and library cache hit ratios, sorts, parsing information, and so on, and allows you to immediately see if these areas of memory need to be increased. The load profile gives you an overall picture of how much of certain types of activity is going on in the database, such as physical reads and writes, sorts, parses, logons, and so forth. The top five timed events tell us where we are spending most of our time waiting, and are also categorized now into wait classes, which we will discuss later in this chapter. The wait classes give the HA DBA another opportunity for a quick look at the situation. For example, if the top five wait events are all I/O related, it can be made readily apparent by showing that all five events are part of the same wait class.
If you are running in a RAC environment, the next section will contain a quick look at RAC statistics to give you an idea of the overall performance of your cluster. This section provides information on the workload characteristics of your cluster, and gives insight into how much traffic is going across the interconnect, how frequently instances need to read from the remote cache vs. the local cache, and what the average times are for these operations.
After the top sections, the main report body will follow. Here you will have links to various other sections of the report that are of interest, including sections on SQL statements ordered by various stats such as buffer gets, reads, executions, file and tablespace I/O stats, SGA stats, and so forth. These sections will help pinpoint specific areas that are common bottlenecks, by highlighting poorly tuned or frequently executed SQL, and/or by bringing to light I/O hot spots where disk response time is not acceptable. Each report contains a virtual goldmine of information on what is happening within the database, and what may be causing slowdowns at various stages of operation. Our purpose here is not to discuss each individual section, but rather to provide an overview of how to gather and review the reports. Automating the interpretation of these reports via ADDM will be discussed later in this chapter.