Home arrow Oracle arrow Page 2 - Large Database Features In Oracle

Implement Data Partitioning - Oracle

When you are working with large databases with Oracle Database 10g, there are certain topics with which you need to be familiar. These topics are covered in this article, the first of two parts. It is excerpted from chapter nine of the book Oracle Database 10g A Beginner's Guide, written by Ian Abramson et al (McGraw-Hill/Osborne, 2004; ISBN: 0072230789).

TABLE OF CONTENTS:
  1. Large Database Features In Oracle
  2. Implement Data Partitioning
  3. Select the Type of Partitioning
  4. Define the Indexing Strategy
  5. Project 9-1 Creating a Range-Partitioned Table and a Local Partitioned Index
  6. Data Compression
  7. Parallel Processing Configuration
By: McGraw-Hill/Osborne
Rating: starstarstarstarstar / 26
December 08, 2005

print this article
SEARCH DEV SHED

TOOLS YOU CAN USE

advertisement

Implementing data partitioning in Oracle Database 10g is a process that requires careful planning to ensure success. You will need to understand your database environment, hardware, structures, and data before you can make the appropriate decisions. In the next few sections, we will outline the steps you will take when partitioning. Let’s start by looking at the characteristics of the candidate table.

Analyze the Candidate Table

The first step in the partitioning process is to analyze and understand the candidate table, its environment and its uses. Following are some criteria to consider.

Table Structure and Data Contents  You will need to look at what attributes are available and what is the distribution of the data within each attribute. You must consider currently available data as well as projected future data. The distribution of data over each attribute is important because you want to ensure that the resulting data subsets are evenly distributed across the defined partitions.

Consider a table called PHONE_USAGE that contains detailed mobile phone call records with over 300 million records per month. It has many attributes including the toll type (toll_type_cd) and the date of call (call_date). Table 9-1 shows a sample row count for a month by toll_type_cd. As you can see, using this attribute would probably not be an ideal choice by which to create subsets since the distribution is heavily skewed toward LOCAL calls.

Table 9-2 looks at the distribution of the same data by the day of the week (for example, Sunday to Saturday based on call_date).

We can see that the day of the week provides a relatively even distribution that is more suitable for partitioning.

How the Data Will Be Accessed  To access the data, you will need to know what the commonest data selection criteria are. This is perhaps the most important part of the analysis because, as we stated earlier, query performance is the most noticeable

toll_type_cd

Record Count (Sample Month)

INTNL

27,296,802

CONTNL US

52,227,998

LOCAL

189,554,584

NRTH AMRCA

36,367,841

TABLE 9-1.  Row Distribution by toll_type_cd Attribute

Day of the Week

Record Count (Sample Month)

(Based on call_date)

SUN

41,635,356

MON

44,235,019

TUE

42,875,502

WED

43,235,721

THU

43,922,997

FRI

45,005,293

SAT

44,537,337

TABLE 9-2.  Row Distribution by Day of the Week

gain of data partitioning. In order for this to be realized, your data subsets need to be defined according to the commonest selection criteria so that unnecessary partitions can be pruned from the result set. The selection criteria will be determined largely by your user community and can be determined using historical query patterns (if available) or consulting business requirements.

Referring to our example using the SALES table, our analysis of query patterns for a three-month period (averaging 400 queries per month) yields the results shown in Table 9-3.

 

Times Used in Query Selection Criteria

Attribute

(Average/Month)

prod_id

33

cust_id

40

time_id

355

channel_id

55

promo_id

298

quantity_sold

25

amount_sold

20

TABLE 9-3.  Query Frequency of SALES Table  
                    Attributes

The analysis tells us that time_id and promo_id are both frequently used as query predicates. We would use this information along with the corresponding row distribution to determine which attribute would result in the better partitioning strategy.

Hardware Configuration  Factors such as the number of physical disks and disk controllers will contribute to the effectiveness of your partitioning strategy. Generally, the greater the number of disks and/or controllers, the better—you can spread the partitions over more hardware to improve I/O performance.

Identify the Partition Key

Once you understand the characteristics of your candidate table, the next step in the partitioning process is to select the attribute(s) of the candidate table that will define the partition subsets and how the subsets will be defined. The selected attributes will form the partition key. Only one set of attributes can be chosen to partition the data. This is an important decision, since it will affect the manageability and usability of the table.

The results of your analysis of the candidate table should provide you with a good idea of which attributes to use. The best attributes will be those that satisfy the most criteria. Keep in mind though, that the adage “you can satisfy some of the criteria some of the time, but you can’t satisfy all of the criteria all of the time” applies here. Despite your best efforts and planning, there will still be situations when the table will be treated as if it were nonpartitioned. Take, for example, a perfectly valid query submitted by the user community that does not include the attributes of the partition key as part of the selection criteria, or groups by the partition key. In this case, data from the entire table (that is, all partitions) would be scanned in order to satisfy the request.



 
 
>>> More Oracle Articles          >>> More By McGraw-Hill/Osborne
 

blog comments powered by Disqus
escort Bursa Bursa escort Antalya eskort
   

ORACLE ARTICLES

- Oracle Java Security Woes Continue
- Oracle's New IaaS Cloud Option: There's a Ca...
- Oracle Acquires Eloqua to Boost Cloud Presen...
- Choosing Innovation: Oracle Survey Insights
- Oracle Fixes Privilege Escalation Bug
- Oracle`s Communications Service Availability...
- Oracle Releases Exalytics, Taleo Plans
- Oracle Releases Communications Network Integ...
- Oracle Releases Communications Data Model 11...
- Oracle Releases PeopleSoft PeopleTools 8.52
- Oracle Integrates Cloudera Apache Distro, My...
- Oracle Releases MySQL 5.5.18
- Oracle Announces NoSQL Database Availability
- Sorting Database Columns With the SELECT Sta...
- Retrieving Table Data with the LIKE Operator

Developer Shed Affiliates

 


Dev Shed Tutorial Topics: