HomeOracle Page 5 - Oracle and Availability: Illustrated Downtime Scenarios
Out of Space in the Woodscrew Tablespace - Oracle
Ever have to recover your servers from a flooded basement? Move a database to an new platform and need the weekend to test it but users need the database today? We will illustrate various database problems and which piece of technology could be employed to prevent the outage or to recover from it quickly. (From the book Oracle Database 10g High Availability with RAC, Flashback & Data Guard by Matthew Hart and Scott Jesse, ISBN: 0072254289, McGraw-Hill/Osborne, 2004.)
Availability can be significantly hampered by routine database management tasks. Take something as straightforward as datafile management. Where the files reside, how big they are, and on what disks, can lead to significant availability problems.
Horatio's Woodscrew database recently ran out of disk space while autoextending a datafile in the WS_APP_DATA tablespace. They had turned autoextend on so that they would not have to be faced with disk management. But business has been booming, and the orders and customer tables are growing dramatically. In addition, new reports are being generated by upper management looking to mine the data for more edge in the marketplace. In a word, the single datafile on a single disk volume has begun to fall short of their needs. They recently had disk problems, and the system administrator is reporting excessive I/O on WS_APP_DATA datafiles. A disk burnout is imminent.
However, reorganizing the datafiles across new disk volumes not only means an explicit investigation into which tables, exactly, are getting hit the hardest, but also the outage that would be required while the tablespace is reorganized across new datafiles on new volumes. Such a massive undertaking could take days.
Use ASM to Simplify Disk and File Management
In Oracle Database 10g, automatic storage management, or ASM, greatly simplifies file management by removing the burden of laying out files to avoid hot spots, and also by simplifying file creation and sizing. ASM is a volume manager for Oracle files, which gives the ability to stripe and mirror files with very little effort on the part of the DBA or the sysadmin. In addition, ASM is constantly monitoring for I/O hot spots, and it will automatically direct reads of allocation units or segments to disks that are least heavily used, to avoid the occurrence of hot spots and maintain I/O performance. Aside from this, if a disk fails, or if a new disk is added to an ASM disk group, ASM will automatically rebalance the existing files, to maintain the distribution of I/O. ASM will be discussed in more detail in Chapter 3.
Downtime for Hardware Fixes
Database availability can be taken out of the hands of the DBA when an outage is related to a hardware problem on the database server. At Horatio's Woodscrews, they have had users complaining about losing their connection to the database intermittently. The DBA could find nothing wrong with their setup, but the system administrator ran some diagnostics that pointed to a flaky network interface card. The solution is to replace the network card as soon as possible. Doing so means taking the database down, shutting down the entire server, and replacing the card. With month-end reports running around the clock in order to complete on time, shutting down the database will mean that the reports will have to wait until after the database comes back up.
RAC Clusters Mask Problems Involving a Single Node
With Real Application Clusters, multiple nodes are accessing the database at the same time. As such, if a node fails due to hardware or operating system problems, that node can be taken offline and repaired while users are still accessing the database through the remaining nodes in the cluster. Once repaired, the node can be restarted and will rejoin the cluster automatically, making the instance again available to users. We discuss the setup and configuration of a RAC cluster in Chapter 4.
Transparent Application Failover (TAF) Allows Queries to Be Failed Over to Another Node and Restarted Automatically
With Real Application Clusters and transparent application failover, should an instance on one node crash, it is possible for users connected to that instance to automatically fail over to one of the remaining nodes in the cluster, and have queries be restarted—and then to continue on uninterrupted. This is discussed in Chapter 11.
This chapter is from Oracle Database 10g High Availability with RAC, Flashback & Data Guard, by Hart and Jesse. (McGraw-Hill/Osborne, 2004, ISBN: 0072254289). Check it out at your favorite bookstore today. Buy this book now.