HomeOracle Page 2 - Large Database Features In Oracle
Implement Data Partitioning - Oracle
When you are working with large databases with Oracle Database 10g, there are certain topics with which you need to be familiar. These topics are covered in this article, the first of two parts. It is excerpted from chapter nine of the book Oracle Database 10g A Beginner's Guide, written by Ian Abramson et al (McGraw-Hill/Osborne, 2004; ISBN: 0072230789).
Implementing data partitioning in Oracle Database 10g is a process that requires careful planning to ensure success. You will need to understand your database environment, hardware, structures, and data before you can make the appropriate decisions. In the next few sections, we will outline the steps you will take when partitioning. Let’s start by looking at the characteristics of the candidate table.
Analyze the Candidate Table
The first step in the partitioning process is to analyze and understand the candidate table, its environment and its uses. Following are some criteria to consider.
Table Structure and Data Contents You will need to look at what attributes are available and what is the distribution of the data within each attribute. You must consider currently available data as well as projected future data. The distribution of data over each attribute is important because you want to ensure that the resulting data subsets are evenly distributed across the defined partitions.
Consider a table called PHONE_USAGE that contains detailed mobile phone call records with over 300 million records per month. It has many attributes including the toll type (toll_type_cd) and the date of call (call_date). Table 9-1 shows a sample row count for a month by toll_type_cd. As you can see, using this attribute would probably not be an ideal choice by which to create subsets since the distribution is heavily skewed toward LOCAL calls.
Table 9-2 looks at the distribution of the same data by the day of the week (for example, Sunday to Saturday based on call_date).
We can see that the day of the week provides a relatively even distribution that is more suitable for partitioning.
How the Data Will Be Accessed To access the data, you will need to know what the commonest data selection criteria are. This is perhaps the most important part of the analysis because, as we stated earlier, query performance is the most noticeable
toll_type_cd
Record Count (Sample Month)
INTNL
27,296,802
CONTNL US
52,227,998
LOCAL
189,554,584
NRTH AMRCA
36,367,841
TABLE 9-1. Row Distribution by toll_type_cd Attribute
Day of the Week
Record Count (Sample Month)
(Based on call_date)
SUN
41,635,356
MON
44,235,019
TUE
42,875,502
WED
43,235,721
THU
43,922,997
FRI
45,005,293
SAT
44,537,337
TABLE 9-2. Row Distribution by Day of the Week
gain of data partitioning. In order for this to be realized, your data subsets need to be defined according to the commonest selection criteria so that unnecessary partitions can be pruned from the result set. The selection criteria will be determined largely by your user community and can be determined using historical query patterns (if available) or consulting business requirements.
Referring to our example using the SALES table, our analysis of query patterns for a three-month period (averaging 400 queries per month) yields the results shown in Table 9-3.
Times Used in Query Selection Criteria
Attribute
(Average/Month)
prod_id
33
cust_id
40
time_id
355
channel_id
55
promo_id
298
quantity_sold
25
amount_sold
20
TABLE 9-3.Query Frequency of SALES Table Attributes
The analysis tells us that time_id and promo_id are both frequently used as query predicates. We would use this information along with the corresponding row distribution to determine which attribute would result in the better partitioning strategy.
Hardware Configuration Factors such as the number of physical disks and disk controllers will contribute to the effectiveness of your partitioning strategy. Generally, the greater the number of disks and/or controllers, the better—you can spread the partitions over more hardware to improve I/O performance.
Identify the Partition Key
Once you understand the characteristics of your candidate table, the next step in the partitioning process is to select the attribute(s) of the candidate table that will define the partition subsets and how the subsets will be defined. The selected attributes will form the partition key. Only one set of attributes can be chosen to partition the data. This is an important decision, since it will affect the manageability and usability of the table.
The results of your analysis of the candidate table should provide you with a good idea of which attributes to use. The best attributes will be those that satisfy the most criteria. Keep in mind though, that the adage “you can satisfy some of the criteria some of the time, but you can’t satisfy all of the criteria all of the time” applies here. Despite your best efforts and planning, there will still be situations when the table will be treated as if it were nonpartitioned. Take, for example, a perfectly valid query submitted by the user community that does not include the attributes of the partition key as part of the selection criteria, or groups by the partition key. In this case, data from the entire table (that is, all partitions) would be scanned in order to satisfy the request.