In this chapter, we will be covering topics and features available in Oracle Database 10g with which you will need to be familiar when working with large databases. These features are among the more advanced that you will encounter, but theyíre necessary, as databases are growing larger and larger. When you start working with Oracle, you will find yourself facing the trials and tribulations associated with large databases sooner rather than later. The quicker you understand the features and know where and when to use them, the more effective you will be.
CRITICAL SKILL 9.1
What Is a Large Database?
Letís start by describing what we mean by a large database. ďLargeĒ is a relative term that changes over time. What was large five or ten years ago is small by todayís standards, and what is large today will be peanuts a few years from now. Each release of Oracle has included new features and enhancements to address the need to store more and more data. For example, Oracle8i was released in 1999 and could handle databases with terabytes (1024 gigabytes) of data. In 2001, Oracle9i was released and could deal with up to 500 petabytes (1024 terabytes). Oracle Database 10g now offers support for exabyte (1024 petabytes) databases. You wonít come across too many databases with exabytes of data right now, but in the future at least we know Oracle will support them.
The most obvious examples of large database implementations are data warehouses and decision support systems. These environments usually have tables with millions or billions of rows, or wide tables with large numbers of columns and many rows. There are also many OLTP systems that are very large and can benefit from the features we are about to cover. Since weíve got many topics to get through, letís jump right in and start with data partitioning.
Why and How to Use Data Partitioning
As our user communities require more and more detailed information in order to remain competitive, it has fallen to us as database designers and administrators to help ensure that the information is managed efficiently and can be retrieved for analysis effectively. In this section, we will discuss partitioning data, and why it is so important when working with large databases. Afterward, weíll follow the steps required to make it all work.
Why Use Data Partitioning
Letís start by defining what we mean by data partitioning. In its simplest form, it is a way of breaking up or subsetting data into smaller units that can be managed and accessed separately. It has been around for a long time both as a design technique and as a technology. Letís look at some of the issues that gave rise to the need for partitioning and the solutions to these issues.
Tables containing very large numbers of rows have always posed problems and challenges for DBAs, application developers, and end-users alike. For the DBA, the problems centered on the maintenance and manageability of the underlying data files that contain the data for these tables. For the application developers and end users, the issues were query performance and data availability.
To mitigate these issues, the standard database design technique was to create physically separate tables, identical in structure (for example, columns), but with each containing a subset of the total data (we will refer to this design technique as nonpartitioned). These tables could be referred to directly or through a series of views. This technique solved some of the problems, but still meant maintenance for the DBA to create new tables and/or views as new subsets of data were acquired. In addition, if access to the entire dataset was required, a view was needed to join all subsets together.
Figure 9-1 illustrates this design technique. In this sample, separate tables with identical structures have been created to hold monthly sales information for 2005. Views have also been defined to group the monthly information into quarters using a union query. The quarterly views themselves are then grouped together into a view that represents the entire year. The same structures would be created for each year of data. In order to obtain data for a particular month or quarter, an end user would have to know which table or view to use.
Similar to the technique illustrated in Figure 9-1, the partitioning technology offered by Oracle Database 10g is a method of breaking up large amounts of data into smaller, more manageable chunks. But, unlike the nonpartitioned technique, it is transparent to
FIGURE 9-1. Partitioning using separate physical tables
the end user, offering improved performance and reduced maintenance. Figure9-2 illustrates the same SALES table, but implemented using Oracle Database10gís partitioning option. From the end userís perspective, there is only one table called SALES and all that is required to access data from the correct partition is a date (or a month and year).
Oracle partitioning was first introduced in Oracle8 and is only available with the Enterprise Edition. Many improvements have been made since then and Oracle Database 10g contains all the latest features. The remainder of this section discusses these features in more detail.
When administering large databases, DBAs are required to determine the most efficient and effective way to configure the underlying data files that support the tables in the database. The decisions made will affect data accessibility and availability as well as back up and recovery.
Some of the benefits to database manageability when using partitioned tables include the following:
One of the main reasons for partitioning a table is to improve I/O response time when selecting data from the table. Having a tableís data partitioned into subsets can yield much faster query results when you are looking for data that is limited to one subset of the total. Letís look at an example to illustrate.
Assume the SALES table contains 100 million records representing daily sales revenue for the three years 2005 to 2007 inclusive. We want to know what the total revenue is for February 2006. The query might look something like this:
Using a nonpartitioned table design, all 100 million rows would need to be scanned to determine if they belong to the date criteria. Using a partitioned table design based on monthly partitions with about 2.8 million rows for each month, only those rows in the February 2006 partition (and therefore only about 2.8 million rows) would be scanned. The process of eliminating data not belonging to the subset defined by the query criteria is referred to as partition pruning.
With the basic concepts of partitioning and why we use it under our belts, we can now discuss the finer details of how to implement partitioning.
blog comments powered by Disqus