Large Database Features In Oracle

When you are working with large databases with Oracle Database 10g, there are certain topics with which you need to be familiar. These topics are covered in this article, the first of two parts. It is excerpted from chapter nine of the book Oracle Database 10g A Beginner’s Guide, written by Ian Abramson et al (McGraw-Hill/Osborne, 2004; ISBN: 0072230789).

In this chapter, we will be covering topics and features available in Oracle Database 10g with which you will need to be familiar when working with large databases. These features are among the more advanced that you will encounter, but they’re necessary, as databases are growing larger and larger. When you start working with Oracle, you will find yourself facing the trials and tribulations associated with large databases sooner rather than later. The quicker you understand the features and know where and when to use them, the more effective you will be.

CRITICAL SKILL 9.1

What Is a Large Database?

Let’s start by describing what we mean by a large database. “Large” is a relative term that changes over time. What was large five or ten years ago is small by today’s standards, and what is large today will be peanuts a few years from now. Each release of Oracle has included new features and enhancements to address the need to store more and more data. For example, Oracle8i was released in 1999 and could handle databases with terabytes (1024 gigabytes) of data. In 2001, Oracle9i was released and could deal with up to 500 petabytes (1024 terabytes). Oracle Database 10g now offers support for exabyte (1024 petabytes) databases. You won’t come across too many databases with exabytes of data right now, but in the future at least we know Oracle will support them.

The most obvious examples of large database implementations are data warehouses and decision support systems. These environments usually have tables with millions or billions of rows, or wide tables with large numbers of columns and many rows. There are also many OLTP systems that are very large and can benefit from the features we are about to cover. Since we’ve got many topics to get through, let’s jump right in and start with data partitioning.


NOTE

Many of the topics discussed in this chapter could, each on their own, take an entire book to cover completely. Since this is an introductory book, specifics for some topics have been omitted. Real-world experiences and additional reading will build on this material.


CRITICAL SKILL 9.2

Why and How to Use Data Partitioning

As our user communities require more and more detailed information in order to remain competitive, it has fallen to us as database designers and administrators to help ensure that the information is managed efficiently and can be retrieved for analysis effectively. In this section, we will discuss partitioning data, and why it is so important when working with large databases. Afterward, we’ll follow the steps required to make it all work.

Why Use Data Partitioning

Let’s start by defining what we mean by data partitioning. In its simplest form, it is a way of breaking up or subsetting data into smaller units that can be managed and accessed separately. It has been around for a long time both as a design technique and as a technology. Let’s look at some of the issues that gave rise to the need for partitioning and the solutions to these issues.

Tables containing very large numbers of rows have always posed problems and challenges for DBAs, application developers, and end-users alike. For the DBA, the problems centered on the maintenance and manageability of the underlying data files that contain the data for these tables. For the application developers and end users, the issues were query performance and data availability.

To mitigate these issues, the standard database design technique was to create physically separate tables, identical in structure (for example, columns), but with each containing a subset of the total data (we will refer to this design technique as nonpartitioned). These tables could be referred to directly or through a series of views. This technique solved some of the problems, but still meant maintenance for the DBA to create new tables and/or views as new subsets of data were acquired. In addition, if access to the entire dataset was required, a view was needed to join all subsets together.

Figure 9-1 illustrates this design technique. In this sample, separate tables with identical structures have been created to hold monthly sales information for 2005. Views have also been defined to group the monthly information into quarters using a union query. The quarterly views themselves are then grouped together into a view that represents the entire year. The same structures would be created for each year of data. In order to obtain data for a particular month or quarter, an end user would have to know which table or view to use.

Similar to the technique illustrated in Figure 9-1, the partitioning technology offered by Oracle Database 10g is a method of breaking up large amounts of data into smaller, more manageable chunks. But, unlike the nonpartitioned technique, it is transparent to

FIGURE 9-1.  Partitioning using separate physical tables

the end user, offering improved performance and reduced maintenance. Figure9-2 illustrates the same SALES table, but implemented using Oracle Database10g’s partitioning option. From the end user’s perspective, there is only one table called SALES and all that is required to access data from the correct partition is a date (or a month and year).

Oracle partitioning was first introduced in Oracle8 and is only available with the Enterprise Edition. Many improvements have been made since then and Oracle Database 10g contains all the latest features. The remainder of this section discusses these features in more detail.

Manageability

When administering large databases, DBAs are required to determine the most efficient and effective way to configure the underlying data files that support the tables in the database. The decisions made will affect data accessibility and availability as well as back up and recovery.

Some of the benefits to database manageability when using partitioned tables include the following:                                 

                                                

 
          FIGURE 9-2. Partitioning using Oracle 
              Database 10g partitioning
 

  • The size of each underlying data file is generally smaller for partitioned tables than nonpartitioned tables. This allows for easier and quicker backups.
  • Historical partitions can be made read-only and will not need to be backed up more than once. This also means faster backups.
  • The structure of a partitioned table needs to be defined only once. As new subsets of data are acquired, they will be assigned to the correct partition, based on the partition option chosen.
  • Individual tablespaces and/or their data files can be taken offline for maintenance or archiving without affecting access to other subsets of data. For example, assuming data for a table is partitioned by month (later in this chapter, we will discuss the different types of partitioning) and only 13 months of data is to be kept online at any one time, when a new month is acquired, the earliest month is archived and dropped from the table. This is accomplished using the command alter table abc drop partition xyz and does not affect access to the remaining 12 months of data.

Ask the Expert

 Q:  Can you use the analyze table command to gather statistics on partitioned tables?

 A:  No; at least not correctly. The supplied DBMS_STATS package should be used instead to gather statistics on partitioned tables. The analyze table command does not gather all required statistics for partitioned tables, in particular global statistics. In addition, the analyze command will eventually be phased out (for all types of table and indexes) and only those statistics gathered by the DBMS_STATS package will be used by the cost-based optimizer.

  •  Other commands that would normally apply at the table level can also be applied to a particular partition of the table. These include, but are not limited to: delete, insert, select, truncate, and update.

Performance

One of the main reasons for partitioning a table is to improve I/O response time when selecting data from the table. Having a table’s data partitioned into subsets can yield much faster query results when you are looking for data that is limited to one subset of the total. Let’s look at an example to illustrate.

Assume the SALES table contains 100 million records representing daily sales revenue for the three years 2005 to 2007 inclusive. We want to know what the total revenue is for February 2006. The query might look something like this:

select sum(amount_sold)
from sales
where time_id between to_date(‘2006-02-01′, ‘YYYY-MM-DD’) and
to_date(‘2006-02-28′, ‘YYYY-MM-DD’);

Using a nonpartitioned table design, all 100 million rows would need to be scanned to determine if they belong to the date criteria. Using a partitioned table design based on monthly partitions with about 2.8 million rows for each month, only those rows in the February 2006 partition (and therefore only about 2.8 million rows) would be scanned. The process of eliminating data not belonging to the subset defined by the query criteria is referred to as partition pruning.

With the basic concepts of partitioning and why we use it under our belts, we can now discuss the finer details of how to implement partitioning.

{mospagebreak title=Implement Data Partitioning}

Implementing data partitioning in Oracle Database 10g is a process that requires careful planning to ensure success. You will need to understand your database environment, hardware, structures, and data before you can make the appropriate decisions. In the next few sections, we will outline the steps you will take when partitioning. Let’s start by looking at the characteristics of the candidate table.

Analyze the Candidate Table

The first step in the partitioning process is to analyze and understand the candidate table, its environment and its uses. Following are some criteria to consider.

Table Structure and Data Contents  You will need to look at what attributes are available and what is the distribution of the data within each attribute. You must consider currently available data as well as projected future data. The distribution of data over each attribute is important because you want to ensure that the resulting data subsets are evenly distributed across the defined partitions.

Consider a table called PHONE_USAGE that contains detailed mobile phone call records with over 300 million records per month. It has many attributes including the toll type (toll_type_cd) and the date of call (call_date). Table 9-1 shows a sample row count for a month by toll_type_cd. As you can see, using this attribute would probably not be an ideal choice by which to create subsets since the distribution is heavily skewed toward LOCAL calls.

Table 9-2 looks at the distribution of the same data by the day of the week (for example, Sunday to Saturday based on call_date).

We can see that the day of the week provides a relatively even distribution that is more suitable for partitioning.

How the Data Will Be Accessed  To access the data, you will need to know what the commonest data selection criteria are. This is perhaps the most important part of the analysis because, as we stated earlier, query performance is the most noticeable

toll_type_cd

Record Count (Sample Month)

INTNL

27,296,802

CONTNL US

52,227,998

LOCAL

189,554,584

NRTH AMRCA

36,367,841

TABLE 9-1.  Row Distribution by toll_type_cd Attribute

Day of the Week

Record Count (Sample Month)

(Based on call_date)

SUN

41,635,356

MON

44,235,019

TUE

42,875,502

WED

43,235,721

THU

43,922,997

FRI

45,005,293

SAT

44,537,337

TABLE 9-2.  Row Distribution by Day of the Week

gain of data partitioning. In order for this to be realized, your data subsets need to be defined according to the commonest selection criteria so that unnecessary partitions can be pruned from the result set. The selection criteria will be determined largely by your user community and can be determined using historical query patterns (if available) or consulting business requirements.

Referring to our example using the SALES table, our analysis of query patterns for a three-month period (averaging 400 queries per month) yields the results shown in Table 9-3.

 

Times Used in Query Selection Criteria

Attribute

(Average/Month)

prod_id

33

cust_id

40

time_id

355

channel_id

55

promo_id

298

quantity_sold

25

amount_sold

20

TABLE 9-3.  Query Frequency of SALES Table  
                    Attributes

The analysis tells us that time_id and promo_id are both frequently used as query predicates. We would use this information along with the corresponding row distribution to determine which attribute would result in the better partitioning strategy.

Hardware Configuration  Factors such as the number of physical disks and disk controllers will contribute to the effectiveness of your partitioning strategy. Generally, the greater the number of disks and/or controllers, the better—you can spread the partitions over more hardware to improve I/O performance.

Identify the Partition Key

Once you understand the characteristics of your candidate table, the next step in the partitioning process is to select the attribute(s) of the candidate table that will define the partition subsets and how the subsets will be defined. The selected attributes will form the partition key. Only one set of attributes can be chosen to partition the data. This is an important decision, since it will affect the manageability and usability of the table.

The results of your analysis of the candidate table should provide you with a good idea of which attributes to use. The best attributes will be those that satisfy the most criteria. Keep in mind though, that the adage “you can satisfy some of the criteria some of the time, but you can’t satisfy all of the criteria all of the time” applies here. Despite your best efforts and planning, there will still be situations when the table will be treated as if it were nonpartitioned. Take, for example, a perfectly valid query submitted by the user community that does not include the attributes of the partition key as part of the selection criteria, or groups by the partition key. In this case, data from the entire table (that is, all partitions) would be scanned in order to satisfy the request.

{mospagebreak title=Select the Type of Partitioning}

After you have selected the partition key, the next step in the partitioning process is to decide which type of partitioning you want to implement on the candidate table. Oracle Database 10g provides four ways to partition data:

  • Range partitioning
  • List partitioning
  • Hash partitioning
  • Composite partitioning

The type of partitioning you choose will depend on the results of your analysis of the candidate table. The commonest type of partitioning is range partitioning and will be covered here in the most detail. Let’s look at the characteristics of each.

Range Partitioning  Range partitioning has been around the longest of all partitioning types and is the one implemented most often. In most cases, the ranges are based on some date component, such as quarters, months or, in the case of very large data volumes, days. (Theoretically, you can go down to any level of time—hours, minutes, and so on—assuming you have a time component. But the maintenance implications of defining this many partitions make it unrealistic.) The ranges selected will again be based on the results of your analysis of the table, using dates, numeric values, or character values. Following is an example based on the SALES table we saw earlier in the chapter.


NOTE

The partitioning examples presented in this chapter do not address all of the command options available. They are meant to give you a taste of what is available.


To create our SALES table as nonpartitioned, we would use the standard create table statement as shown in this listing:

create table sales (
prod_id        number        not null,
cust_id        number        not null,
time_id        date          not null, channel_id     number        not null,
promo_id       number        not null, quantity_sold  number (10,2) not null, amount_sold    number (10,2) not null)

tablespace example pctfree 5 initrans 1 maxtrans 255
storage ( initial 65536 minextents 1 maxextents 2147483645);

Based on our analysis of the usage patterns and row distribution, we have decided that the optimal partition strategy for this table is based on sales month. We will now redefine the SALES table using time_id as our partition key to create monthly partitions for January 2005 to December 2007, inclusive. Creation of data partitions is accomplished using extensions of the create table statement. The following listing shows the creation of the table with range partitions. Explanations of the important lines are given in Table 9-4.

1 create table sales (
2  prod_id           number        not null,
3  cust_id           number        not null,
4  time_id           date          not null,
5  channel_id        number        not null,
6  promo_id          number        not null,
7  quantity_sold     number (10,2) not null, 8  amount_sold       number (10,2) not null) 9  storage (initial 65536 minextents 1 
   maxextents 2147483645)
10 partition by range (time_id)
11 (partition sales_200501 values less than
12                      (to_date(‘2005-02-
   01′,’YYYY-MM-DD’))
13                      tablespace
   sales_ts_200501,
14 partition sales_200502 values less than 15                      (to_date(‘2005-03-
   01′,’YYYY-MM-DD’))
16                      tablespace
   sales_ts_200502,
17 partition sales_200503 values less than 18                      (to_date(‘2005-04-
   01′,’YYYY-MM-DD’))
19                      tablespace
   sales_ts_200503,

113 partition sales_200711 values less than 114                     (to_date(‘2007-12-
    01′,’YYYY-MM-DD’))
115                     tablespace
    sales_ts_200711,
116 partition sales_200712 values less than 117                     (to_date(‘2008-01-
    01′,’YYYY-MM-DD’))
118                     tablespace
    sales_ts_200712,
119 partition sales_max values less than
    (maxvalue)
120                     tablespace
    sales_ts_max);

Lines

Important Points

9

Defines the default table-level storage parameters that will apply to all

 

partitions. It is possible to override these defaults at the partition level

 

in favor of specific parameters required for a particular partition.

10

Defines the type of partitioning (for example, range) and the partition

 

key (for instance, time_id).

11–118

Define each partition based on the values of time_id (repetitive

 

lines for Apr 2005 to Oct 2007 omitted for brevity’s sake). For each

 

partition, the upper boundary of the partition key value is specified

 

(as defined by the values less than clause), as well as the name of the

 

tablespace where the subset is to be stored. Values must be specified

 

in ascending order and cannot overlap. It is good practice to give

 

meaningful names to both the partitions and tablespaces.

TABLE 9-4.  Explanationof Range Partitioning Syntax

Lines

Important Points

119–120

Define the default partition for any values of time_id that are

 

greater than the last defined partition (for instance, dates greater

 

than June 30, 2006). maxvalue is an Oracle keyword that results

 

in a value that is higher than any other possible value of the partition

 

key. It will also include null values if the situation exists.

TABLE 9-4.  Explanation of Range Partitioning  
                   Syntax (continued)
 


NOTE

Lines 11 to 13 define the first partition to hold data where time_id is less than February 1, 2005. Our intention in this example is that this first partition will only hold data for January 2005 (our data analysis tells us that there is no data before this date). However, if there happens to be data prior to January 2005, it will also be placed in this partition and may skew the row distribution by placing many more rows than intended in this partition.


That completes our discussion on range partitioning. Let’s now have a look at list and hash partitioning.

List Partitioning  There may be cases when, after your analysis of a candidate table, you decide that range partitioning is not the best fit for your table. Another way to subset your data is to use list partitioning, where you group a set of partition key values and assign them to their own tablespace. By using this type of partitioning, you can control the placement of the records in specified partitions, thereby allowing you to group related records together that may not otherwise have a relationship.

As an example, assume we have an INS_COVERAGE table that contains insurance coverages. Our analysis of this table and its usage leads us to decide that we should partition, based on the attribute cov_type_cd, into the buckets shown in Table 9-5.

COV_TYPE_CD

Grouping

TERM 65

Life

UL

Life

TABLE 9-5.  Insurance Coverage Groupings

COV_TYPE_CD ADB

Grouping Life

COLA

GIB

GPO

GIB

WP DIS MF

Disability Disability Investment


TABLE 9-5.  Insurance Coverage Groupings
                   (continued)

The syntax of the create table statement is similar to that for range partitioning. An explanation is provided in Table 9-6.

 1 create table ins_coverage (
 2  plan_id         number         not null,
 3  cust_id         number         not null,
 4  time_id         date           not null,
 5  dist_channel_id number         not null,
 6  cov_type_cd     varchar2(50)   not null,
 7  cov_amt         number (10,2)  not null,
 8  prem_amt        number (10,2)  not null)
 9  storage (initial 65536 minextents 1
    maxextents 2147483645)
10 partition by list (cov_type_cd)
11 (partition cov_life values (‘TERM 65′, 
   ‘UL’, ‘ADB’)
12              tablespace cov_life_ts,
13 partition cov_gib values (‘COLA’, ‘GIB’) 14              tablespace cov_gib_ts,
15 partition cov_dis values (‘WP’, ‘DIS’)
16              tablespace cov_dis_ts,
17 partition cov_inv values (‘MF’)
18              tablespace cov_inv_ts
19 partition cov_other values(default));


TIP

If you discover missing partition keys values that need to be added to existing partition definitions after the table has been created, you can issue and alter table abc modify partition xyz add values (‘value1’, …).


Hash Partitioning  If you determine from your table analysis that neither range nor list partitioning is appropriate for your table, but you still want to reap the benefits

Lines

Important Points

10

Defines the type of partitioning (for example, list) and the partition key (cov_type_cd, for instance). Note that with list partitioning, only one attribute from the table can be chosen as the partition key—in other words, multicolumn partition keys are not permitted.

11–18

Define each partition based on the groups of values of cov_type_cd.

TABLE 9-6.  Explanation of List Partitioning Syntax

offered by partitioning, Oracle Database 10g provides a third partitioning option called hash partitioning. With hash partitioning, you define up to 16 partition key attributes as well as the number of partitions you want to spread the data across. As long as each partition is on its own physical device and most of the queries use the partition key as a predicate, you should see performance gains. Hash partitioning is useful if the distribution of data is unknown or unpredictable.

The following listing is an example of hash partitioning. Table 9-7 explains the important lines.

 1 create table sub_activations (
 2  sub_id           number     not null,
 3  dist_channel_id  number     not null,
 4  act_date         date       not null,
 
5  deact_date       date       not null,
 
6  sales_rep_id     number     not null)
 
7  storage (initial 65536 minextents 1
    maxextents 2147483645)
 
8 partition by hash (sub_id)
 
9 partitions 4
10 store in (subact_ts1, subact_ts2,
   subact_ts3, subact_ts4);

Lines

Important Points

8

Defines the type of partitioning (for instance, hash) and the partition key (for example, sub_id).

9

Specifies the number of partitions over which to spread the data.

10

Specifies the tablespaces into which the partitions will be placed.

TABLE 9-7.  Explanation of Hash Partitioning Syntax

It is beyond our scope to discuss the hashing algorithm used by Oracle Database 10g.We can say, however, that it is based on the number of attributes in the partition key and the number of partitions selected.

Composite Partitioning  The final type of partitioning is a combination of two of the previous types. Combining two types of partitioning is called composite partitioning. There are two valid combinations: range with hash and range with list. Using composite partitioning allows you to take advantage of the features of either hash or list partitioning within the higher groupings of ranges.

A good example of where this type of partitioning is used would be the PHONE_USAGE table we saw in our candidate table analysis. In this case, we have a table that is being loaded with 300 million records per month. We could choose to implement range partitioning by month and then subdivide the monthly partitions into four hash partitions. The following listing shows the SQL syntax that accomplishes this, and Table 9-8 provides the explanation of the important lines.

 1 create table phone_usage
 
2 (sub_id                number,
 
3  call_date             date,
 
4  call_type_id          number,
 
5  called_location       varchar2(50),
 
6  service_carrier_id    number)
 
7 storage (initial 65536 minextents 1
   maxextents 2147483645)
 8 partition by range (call_date)
 9 subpartition by hash(sub_id)
10 subpartition template(
11  subpartition sub1 tablespace ph_usg_ts1, 12  subpartition sub2 tablespace ph_usg_ts2, 13  subpartition sub3 tablespace ph_usg_ts3, 14  subpartition sub4 tablespace ph_usg_ts4) 15 (partition phoneusg_200601 values less
   than
16                      (to_date(‘2006-02-
   01′,’YYYY-MM-DD’)),
17 partition phoneusg_200602 values less
   than
18                      (to_date(‘2006-03-
   01′,’YYYY-MM-DD’)),
19 partition phoneusg_200603 values less
   than
20                      (to_date(‘2006-04-
   01′,’YYYY-MM-DD’)),
21 partition phoneusg_200604 values less
   than
22                      (to_date(‘2006-05-
   01′,’YYYY-MM-DD’)),
23 partition phoneusg_200605 values less
   than
24                      (to_date(‘2006-06-
   01′,’YYYY-MM-DD’)),
25 partition phoneusg_200606 values less
   than
26                      (to_date(‘2006-07-
   01′,’YYYY-MM-DD’)),
27 partition phoneusg_max values less than
   (maxvalue));

Lines

Important Points

8

Defines the higher-level partitioning type (for example, range) and its

 

partition key (for instance, call_date).

9

Specifies the secondary partitioning type (in this case, hash) and its

 

partition key (here, sub_id).

10–14

Specify a template that will be used to define the tablespace names

 

for each subpartition, as well as the tablespace names. The name

 

of each subpartition will be composed of the higher-level partition

 

name concatenated with an underscore, then the subpartition name

 

specified in the template. For example, data for January 2006 will be

 

placed in tablespace ph_usg_ts1 and divided into four subpartitions

 

called PHONEUSG_200601_SUB1, PHONEUSG_200601_SUB2,

 

PHONEUSG_200601_SUB3, and PHONEUSG_200601_SUB4.

15–27

Specify the ranges for the higher-level partition based on the call_date

TABLE 9-8.  Explanation of Composite Partitioning
                        Syntax

We’ve covered all the partitioning types and some of their nuances. Now it’s time to look at how we can get an even bigger boost from our partitioned tables using indexing.

Ask the Expert

Q:  If the partition key of record in a partitioned table is updated and the new value means that the data belongs to a different partition, does Oracle Database 10g automatically move the record to the appropriate partition?

A:  Yes, but the table must have the enable row movement option before the update is made. This option is invoked as either part of the create table
statement, or using an alter table statement. Otherwise, the update statement will generate an Oracle error.

{mospagebreak title=Define the Indexing Strategy}

Okay, so now you have decided how you are going to partition your data. To really get the most out of partitioning, you will need to look at some indexing strategies. There are two types of indexes applicable to partitioned tables: local and global. Let’s take a brief look at each.

Local Partitioned Indexes  Local partitioned indexes are indexes that are partitioned in the exact same manner as the data in their associated table—that is, they have a direct one-to-one relationship with the data partitions and use the same partition key. This association is illustrated in Figure 9-3; as you can see, each partition has its own associated “local” index. This drawing is based on Figure 9-2, which we saw at the beginning of the chapter. It shows how the data and indexes for each monthly subset are related and then joins it with the concept of the local index.

Because of this relationship, the following points apply to local indexes:

  • You cannot explicitly add or drop a partition to/from a local index. Oracle Database 10g automatically adds or drops index partitions when related data partitions are added or dropped. Referring back to Figure 9-3, if we dropped the data partition for January 2005, the corresponding index partition would automatically be dropped as well. Likewise, it we added a new data partition for January 2006, a new index partition would automatically be created.
  • One of the advantages of partitioning data is to allow access to other subsets while maintenance is being carried out on another partition. Since local index partitions are in line with the data partitions, this advantage still exists.
  • Local partitioned indexes require less maintenance than global indexes.

The SQL syntax for creating a local index is presented in the next listing, which refers to the SALES table we created under the Range Partitioning section. Table 9-9 contains an explanation of the syntax.

 1 create index sales_idx_l1 on sales
   (time_id)
 
2 local
 
3 (partition sales_idx_200501 tablespace
   sales_ts_idx_200501,
 
4  partition sales_idx_200502 tablespace
   sales_ts_idx_200502,
 
5  partition sales_idx_200503 tablespace
   sales_ts_idx_200503,

37  partition sales_idx_200711 tablespace
   sales_ts_idx_200711,
38  partition sales_idx_200712 tablespace
   sales_ts_idx_200712,
39  partition sales_idx_max tablespace
   sales_ts_idx_max);
 


FIGURE 9-3.  Local partitioned index

Lines

Important Points

2

Specifies that the index is to be local. This line alone tells Oracle

 

Database 10g that the index is to be partitioned along the same ranges

 

as the data.

3–39

Define the partition names and tablespaces for each partition. These

 

lines are optional, but without them, Oracle Database 10g would use

 

the same partition names as the data and would also place the index

 

in the same tablespaces as the data—a situation that is less than ideal

 

for performance!

TABLE 9-9.  Explanation of Local Partitioned Index 
                     Syntax

Ask the Expert

Q:  After a table and its local indexes have been defined using range partitioning with a default maxvalue partition, how can you add more partitions as new subsets of data are received?

A:  Use an alter table statement to split the default data partitions, adding your new partition ranges. For example, to add a data partition for January 2008
data in the SALES table in the previous listing, issue the following command:

alter table sales
 split partition sales_max at (to_date(‘2008-02-01′,’YYYY-MM-DD’))
 into (partition sales_200801 tablespace sales_ts_200801,
       partition sales_max tablespace sales_ts_max);

This alter table command will result in the default index partition for sales_idx_l1 to also be split. However, it will use the data partition names (for example, sales_200801) and tablespaces (sales_ts_200801, for instance);
remember in the local index example we explicitly specified the partition names and tablespaces for the index. Therefore, the partition names and tablespaces
will need to be adjusted using alter index commands, as follows:

alter index sales_idx_l1
 rename partition sales_200801 to sales_idx_200801;
alter index sales_idx_l1
 rebuild partition sales_idx_200801 tablespace sales_ts_idx_200801;
alter index sales_idx_l1
 rebuild partition sales_idx_max tablespace sales_ts_idx_max;

Some other points about local partitioned indexes:

  1. They can be unique, but only if the data partition key is part of the index key attributes.
  2. Bitmap indexes on partitioned tables must be local.
  3. Subpartitioned indexes are always local.
  • They are best suited for data warehouses and decision support systems.
  •  Local unique indexes also work well in OLTP 
     environments.

Global Partitioned Indexes  Global partitioned indexes are indexes that are not directly associated with the data partitions. Instead, their partitions are defined independently, and the partition key can be different from the data partition key. This association is illustrated in Figure 9-4. This figure is again based on Figure 9-2, seen at the beginning of the chapter. It shows that the data is partitioned by monthly ranges, with a global index partitioned by product.

One advantage of global indexes is that if partition pruning cannot occur for the data partitions due to the predicates of a query, index partition pruning may still be possible with the global partition index. Global partitioned indexes are available as

FIGURE 9-4.  Global partitioned index

either range-based or hash-based. When using range-based global indexes, you must specify a default partition with maxvalue. Let’s look at an example for creating a global partitioned index and then discuss how it would be used.

Referring again to the PHONE_USAGE table we created in the Composite Partitioning section, we have decided to create a global index on the call_type_id. Following is the SQL for this, and the explanation is presented in Table 9-10.

 1 create index phone_usg_idx_g1 on
   phone_usage (call_type_id)
 2 global
 3 partition by range (call_type_id)
 4 (partition ph_usg_idx_g1 values less than
   (2)
 5                   tablespace
   ph_usg_ts_idx_1,
 6 partition ph_usg_idx_g2 values less than
   (3)
 7                   tablespace
   ph_usg_ts_idx_2,
 8 partition ph_usg_idx_g3 values less than
   (4)
 9                   tablespace
   ph_usg_ts_idx_3,
10 partition ph_usg_idx_g4 values less than
   (5)
11                   tablespace
   ph_usg_ts_idx_4,
12 partition ph_usg_idx_gmax values less
   than (maxvalue)
13                   tablespace
   ph_usg_ts_idx_max);

Now, assume the following query is executed against the PHONE_USAGE table:

select count(*)
from phone_usage
where call_id = 3;

Without the global index we just defined, no partition pruning would occur since the query predicate does not refer to the data partition key call_date. But with the global index, only the index entries from the partition ph_usg_idx_g3 would be scanned and therefore only data records related to those entries would be used in the result set.

Lines

Important Points

2

Specifies that the index is to be global.

3

Defines the type of partitioning (here, range) for this index and the partition key (call_type_id, in this case).

4–13

Define the partition names and tablespaces for each partition.

TABLE 9-10.  Explanation for Global Partitioned Index Syntax

Some other points on global partitioned indexes:

  1. They require more maintenance than local indexes, especially when you drop data partitions.
  2. They can be unique.
  3. They cannot be bitmap indexes.
  4. They are best suited for OLTP systems for direct access to specific records.

Prefixed and Nonprefixed Partition Indexes  In your travels through the world of partitioning, you will hear the terms prefixed and nonprefixed partition indexes. These terms apply to both local and global indexes. An index is prefixed when the leftmost column of the index key is the same as the leftmost column of the index partition key. If the columns are not the same, the index is nonprefixed. That’s all well and good, but what affect does it have?

It is a matter of performance—nonprefixed indexes cost more, from a query perspective, than prefixed indexes. When a query is submitted against a partitioned table and the predicate(s) of the query include the index keys of a prefixed index, then pruning of the index partition can occur. If the same index was nonprefixed instead, then all index partitions may need to be scanned. (Scanning of all index partitions will depend on the predicate in the query and the type of index, global or local—if the data partition key is included as a predicate and the index is local, then the index partitions to be scanned will be based on pruned data partitions.)

{mospagebreak title=Project 9-1 Creating a Range-Partitioned Table and a Local Partitioned Index}

Data and index partitioning are an important part in maintaining large databases. We have discussed the reasons for partitioning and shown the steps to implement it. In this project, you will create a range-partitioned table and a related local partitioned index.

Step by Step

  1. Create two tablespaces called inv_ts_2007q1 and inv_2007q2 using the following SQL statements. These will be used to store data partitions.

    create tablespace inv_ts_2007q1
        datafile ‘inv_ts_2007q1_1.dat’ size 10m; create tablespace inv_ts_2007q2
        datafile ‘inv_ts_2007q2_1.dat’ size 10m;
  2. Create two tablespaces called inv_idx_ts_2007q1 and inv_idx_2007q2 using the following SQL statements. These will be used to store index partitions.

    create tablespace inv_idx_ts_2007q1 
       datafile ‘inv_idx_ts_2007q1_f1.dat’ size 10m;
    create tablespace inv_idx_ts_2007q2
       datafile ‘inv_idx_ts_2007q2_f1.dat’ size 10m;
  3. Create a partitioned table called INVOICE using the following listing, based on the following information: 

    a.  Define the table with the columns identified in Table 9-11.

    b.  Use order_date as the partition key, and subset the data into the first and second calendar quarters 2007.

    c.  Define the table with the data partitions and tablespaces identified in Table 9-12.

    d.  Use the enable row movement option:

    create table invoice(
      invoice_id  number,
      customer_id number,
      order_date  date,
      ship_date   date)

    partition by range (order_date) (partition INV_2007Q1 values less than
                             (to_date(2007-04-01′,’YYYY-MM-DD’))
                           tablespace inv_ts_2007Q1,
    partition INV_2007Q2 values less than 
                             (to_date(‘2007-07-01′,’YYYY-MM-DD’))
                            tablespace inv_ts_2007Q2,
    partition inv_max values less than (maxvalue)
                            tablespace inv_ts_max)
    enable row movement;
  4. Create a local partitioned index called inv_order_dt_idx on call_date using the following listing as well as the index partitions and tablespaces identified in Table 9-12.

    create index inv_order_dt_idx on invoice(order_date)
    local
    (partition inv_idx_2007q1 tablespace inv_idx_ts_2007q1,
    partition inv_idx_2007q2  tablespace inv_idx_ts_2007q2,
    partition inv_idx_max tablespace inv_idx_ts_max);

 

Column Name

Data Type

INVOICE_ID

NUMBER

CUSTOMER_ID

NUMBER

ORDER_DATE

DATE

SHIP_DATE

DATE

TABLE 9-11.  INVOICE Table Columns

Project Summary

The steps in this project reinforce some of the more common scenarios you will encounter: range-based partitioning and prefixed local partitioned indexes. Separate tablespaces were used for data and indexes, quarterly partitions were defined, a local index was defined, and the enable row movement was used to allow the database to automatically redistribute rows to their related partitions in the event of an update to the partition key.

Well, we have certainly covered a lot in this section. Having the background information on these topics will serve you well when maintaining and tuning large databases. Before we move on to the next section, let’s take a quick progress check to make sure it all sank in.

 

Partition Name

Tablespace Name

Upper Range Limit

Data Partitions

INV_2007Q1

INV_TS_2007Q1

Apr 1, 2007

 

INV_2007Q2

INV_TS_2007Q2

July 1, 2007

 

INV_MAX

INV_TS_MAX

MAXVALUE

Index Partitions

INV_IDX_2007Q1

INV_IDX_TS_2007Q1

Apr 1, 2007

 

INV_IDX_2007Q2

INV_IDX_TS_2007Q2

July 1, 2007

 

INV_IDX_MAX

INV_IDX_TS_MAX

MAXVALUE

TABLE 9-12.  INVOICE Table Data and Index
                    Partitions

———————————————————-

Progress Check

  1. List at least three DML commands that can be applied to partitions as well as tables.
  2. What does partition pruning mean?
  3. How many table attributes can be used to define the partition key in list partitioning?
  4. Which type of partitioning is most commonly used with a date-based partition key?
  5. Which partitioning types cannot be combined together for composite partitioning?
  6. How many partition keys can be defined for a partitioned table?
  7. Which type of partitioned index has a one-to-one relationship between the data and index partitions?
  8. What is meant by a prefixed partitioned index?

———————————————————-

CRITICAL SKILL 9.3

Compress Your Data

As you load more and more data into your database, performance and storage maintenance can quickly become concerns. Usually at the start of an implementation of a database, data volumes are estimated and projected a year or two ahead. However, often times these estimates turn out to be on the low side and you find yourself


Progress Check Answers
  1. The following DML commands can be applied to partitions as well as tables: delete, insert, select, truncate, and update.
  2. Partition pruning is the process of eliminating data not belonging to the subset defined by the criteria of a query.
  3. Only one table attribute can be used to define the partition key in list partitioning.
  4. Range partitioning is most commonly used with a date-based partition key.
  5. List and hash partitioning cannot be combined for composite partitioning.
  6. Only one partition key may be defined.
  7. Local partitioned indexes have a one-to-one relationship between the data and index partitions.
  8. A partitioned index is prefixed when the leftmost column of the index key is the same as the leftmost column of the index partition key.

scrambling for more space in order to load new data. In addition to the partitioning abilities discussed in the previous section, Oracle Database 10g has the ability to compress your data and indexes to further address the concerns of performance and maintenance.

Compression can be performed at the data or index levels. In this section, we will discuss the options available with Oracle Database 10g and their impacts.

{mospagebreak title=Data Compression}

With data compression, duplicate values in a database block are removed, leaving only a reference to the removed value, which is placed at the beginning of the block. All of the information required to rebuild the data in a block is contained within the block.

By compressing data, physical disk space required is reduced, and disk I/O and memory usage are also reduced, thereby improving performance. However, there are some cases when data compression is not appropriate. The following should be considered when looking at whether or not to compress data:

  1. Does the table exist in an OLTP or data warehousing environment? Data compression is best suited for data that is updated infrequently or, better yet, is read-only. Since most data in a data warehouse is considered read-only, data compression is more compatible with this type of environment.
  2. Does the table have many foreign keys? Foreign keys result in a lot of duplicate values in data. Tables with these structures are ideal candidates for data compression.
  3. How will data be loaded into the table? Even when compression is enabled, data is only compressed during bulk loading (for example, SQL*Loader). If data is loaded using a standard insert into statement, the data will not be compressed.

Compressed Data Objects

Compression can be specified for various data-related objects using the create or alter object commands. Table 9-13 identifies these objects and their first-level parent object, from which default compression properties are inherited if not specified for the base object. For example, if no compression property is specified for a table, it will inherit the property from its tablespace. The same applies to a data partition—if not specified at the partition level, the default property from the table will be used.

Object Type

Compression Property Inheritance Parent

Table

Tablespace

Materialized View

Tablespace

Partition

Table

TABLE 9-13.  Compression Property Inheritance

The following listing demonstrates the creation of a table with compression enabled. Line 7 contains the keyword compress to tell Oracle that data compression is to be enabled.

1 create table commission (
2  sales_rep_id     number,
3  prod_id          number,
4  comm_date        date,
5 comm_amt          number(10,2))
6 tablespace comm_ts pctfree 5 initrans 1
  maxtrans 255
7 compress;

Because compression can be enabled or disabled at different points in an object’s lifetime (say, by using an alter command), and because the compression action only occurs on new data being loaded, it is possible for an object to contain both compressed and uncompressed data at the same time.

Ask the Expert

Q:  Can existing data in a table be compressed and uncompressed?

A:  Yes. There are two methods. The first is by using an alter table statement such as

alter table sales
move compress;

The second method is by using the utilities contained in the dbms_redefinition package.

Index Key Compression

Index key compression works in a similar manner to data compression in that duplicated values are removed from the index entries, but is a little more complicated and has more restrictions and considerations than data compression, partly due to the way indexes are structured. Since the details of these structures are beyond the scope of this book, we will focus on the benefits of, and the mechanisms for, defining index compression.

Compressing indexes offer the same benefits as data compression—that is, reduced storage and improved (usually) performance. However, performance may suffer during index scans as the burden on the CPU is increased in order to rebuild the key values. One restriction we should mention is that index compression cannot be used on a unique index that has only one attribute.

Enabling index compression is done using the create index statement. If you need to compress or uncompress an existing index, you must drop the index first and then re-create it with or without the compression option enabled. The following listing illustrates the syntax for creating a compressed index. Table 9-14 provides an explanation of the syntax.

1 create index comm_sr_prod_idx
2  on commission (sales_rep_id, prod_id)
3  compress 1;

Using data and index compression can provide substantial benefits in the areas of storage and performance. In the next section, we will look at how to improve query performance using Oracle Database 10g’s parallel processing options.

 

Lines

Important Points

1–2

Specify that the index is to be created on columns sales_rep_id and

 

prod_id.

3

Specifies that the index is to be compressed, with the number of

 

prefixing (leading) columns to compress. In this case, we used a value

 

of 1 to indicate that duplicate values of the first column, sales_rep_id,

 

are to be removed.

TABLE 9-14.  Explanation of Index Compression Syntax

CRITICAL SKILL 9.4

Use Parallel Processing to Improve Performance

Improving performance, and by this we usually mean query performance, is always a hot item with database administrators and users. One of the best and easiest ways to boost performance is to take advantage of the parallel processing option offered by Oracle Database 10g (Enterprise Edition only).

Using normal (that is, serial) processing, the data involved in a single request (for example, user query) is handled by one database process. Using parallel processing, the request is broken down into multiple units to be worked on by multiple database processes. Each process looks at only a portion of the total data for the request. Serial and parallel processing are illustrated in Figures 9-5 and 9-6, respectively.

Parallel processing can help improve performance in situations where large amounts of data need to be examined or processed, such as scanning large tables, joining large tables, creating large indexes and scanning partitioned indexes. In order to realize the benefits of parallel processing, your database environment should not already be running at, or near, capacity. Parallel processing requires more processing, memory, and I/O resources than serial processing. Before implementing parallel processing, you may need to add hardware resources. Let’s forge ahead by looking at the Oracle Database 10g components involved in parallel processing.

Parallel Processing Database Components

Oracle Database 10g’s parallel processing components are the parallel execution coordinator and the parallel execution servers. The parallel execution coordinator is responsible for breaking down the request into as many processes as specified by the request. Each process is passed to a parallel execution server for execution during which only a portion of the total data is worked on. The coordinator then assembles the results from each server and presents the complete results to the requester.

FIGURE 9-5.  Serial processing

FIGURE 9-6.  Paralled processing

{mospagebreak title=Parallel Processing Configuration}

Generally, not much configuration is required for Oracle Database 10g to perform parallel processing. There are, however, a number of configuration options that are required and will affect the effectiveness of parallelism.

To begin with, parallel processing is enabled by default for DDL (for example, create and alter) and query (for example, select) commands, but disabled for DML (say, insert, update, delete, merge) commands. If you wish to execute a DML command in parallel mode, you must first issue the following command for the session in which the command is to be executed, as in the following.

alter session enable parallel dml;

Several database initialization parameters affect parallel processing. These are shown next.

Initialization Parameters

When an Oracle instance starts, the parameters in the initialization file are used to define or specify the settings for the instance. Table 9-15 identifies the initialization parameters that affect parallel processing. In many cases, the default values will provide adequate results for your large database. Specifics of your own environment will influence your decisions on the best values to use.

As you can see from Table 9-15, there are dependencies  between parameters. Modifying one may necessitate modifying others. If you modify any of the parallel processing parameters, you may also have to modify the following parameters:

  1. INSTANCE GROUPS
  2. PROCESSES

 

Parameter

Default Setting

Comment

PARALLEL_ADAPTIVE_

True

When set to True, enables an

MULTI_USER

 

adaptive algorithm designed to

 

 

improve performance in multiuser

 

 

environments that use parallel

 

 

processing.

PARALLEL_AUTOMATIC_

False

No longer used. Exists for backward

TUNING

 

compatibility only.

PARALLEL_EXECUTION_

Installation Dependent

Specifies the byte size of messages for

MESSAGE_SIZE

 

parallel processing.

PARALLEL_INSTANCE_

Installation Dependent

Used in Real Application Cluster

GROUP

 

environments to restrict parallel

 

 

query operations to a limited

 

 

number of database instances.

PARALLEL_MAX_SERVERS

# of CPUs available to the

Specifies maximum number of parallel

 

database instance

processes for the database instance.

PARALLEL_MIN_PERCENT

0

Specifies minimum percentage of

 

 

parallel processes required for parallel

 

 

processing. Value is a percentage of

 

 

PARALLEL_MAX_SERVERS.

PARALLEL_MIN_SERVERS

0

Specifies minimum number of parallel

 

 

processes for the database instance.

 

 

Cannot be greater than value of

 

 

PARALLEL_MAX_SERVERS.

PARALLEL_THREADS_

Usually set to 2, depending

Specifies the number of parallel

PER_CPU

on operation system

processes per CPU.


TABLE 9-15.  Initialization Parameters Affecting Parallel
                         Processing

  • SESSIONS
  • TRANSACTIONS

Invoke Parallel Execution

Parallel execution can be applied to tables, views, and materialized views. Assuming all necessary configurations have been made, there are several ways to invoke parallel execution. The first way is during table creation (including materialized views), using the parallel clause. If the table is being created using the results of a subquery, the loading of the table will be parallelized. In addition, by default, all queries that are executed against the table will be parallelized to the same extent. The next listing shows an example of specifying the parallel option for a table creation.

1 create table commission (
2  sales_rep_id     number,
3  prod_id          number,
4  comm_date        date,
5  comm_amt         number(10,2))
6 tablespace comm_ts pctfree 5 initrans 1 
  maxtrans 255
7 parallel;

The import line here is Line 7, specifying the parallel clause. This line could also have included an integer to specify the degree of parallelism—that is, the number of processes that are to be used to execute the parallel process. As the degree of parallelism is omitted in this example, the number of processes used will be calculated as number of CPUs × the value of the PARALLEL_THREADS_PER_CPU initialization parameter. The degree of parallelism for a table or materialized view can be changed using an alter statement.

Parallel processing can also be invoked when the parallel hint is used in a select statement. This hint will override any default parallel processing options specified during table creation. The following listing illustrates the use of the parallel hint. Line 1 contains the parallel hint, specifying the table to be parallelized (commission) and the degree of parallelism (4).

1 select /*+ parallel (commission, 4) */
2  prod_id, sum(comm_amt), count(*)
3 from commission
4 group by prod_id;

In some cases, Oracle Database 10g will alter how, or if, parallel processing is executed. Examples of these include the following:

  • Parallel processing will be disabled for DML commands (for example, insert, update, delete, and merge) on tables with triggers or referential integrity constraints.
  • If a table has a bitmap index, DML commands are always executed using serial processing if the table is nonpartitioned. If the table is partitioned, parallel processing will occur, but Oracle will limit the degree of parallelism to the number of partitions affected by the command.

Parallel processing can have a significant positive impact on performance. Impacts on performance are even greater when you combine range or hash-based partitioning with parallel processing. With this configuration, each parallel process can act on a particular partition. For example, if you had a table partitioned by month, the parallel execution coordinator could divide the work up according to those partitions. This way, partitioning and parallelism work together to provide results even faster.

CRITICAL SKILL 9.5

Use Materialized Views

So far, we have discussed several features and techniques at our disposal to improve performance in large databases. In this section, we will discuss another feature of Oracle Database 10g that we can include in our arsenal: materialized views.

Originally called snapshots, materialized views were introduced in Oracle8 and are only available in the Enterprise Edition. Like a regular view, the data in a materialized view are the results of a query. However, the results of a regular view are transitory—they are lost once the query is complete and if needed again, the query must be reexecuted. In contrast, the results from a materialized view are kept and physically stored in a database object that resembles a table. This feature means that the underlying query only needs to be executed once and then the results are available to all who need them.

From a database perspective, materialized views are treated like tables:

  1. You can perform most DML and query commands such as insert, delete, update and select.
  2. They can be partitioned.
  3. They can be compressed.
  4. They can be parallelized.
  5. You can create indexes on them.

Materialized views are different in other ways and have some interesting features associated with them. Before we talk about those, let’s look at some ways to use materialized views.

Uses for Materialized Views

Materialized views are used as a performance enhancing technique. Following are some usage examples. In this section, we will be discussing the first three uses, as they are applicable to our topic of large databases.

  1. Performing data summarization (for example, sums, averages)
  2. Prejoining tables
  3. Performing CPU-intensive calculations
  4. Replicating and distributing data

In large databases, particularly data warehousing environments, there is always a need to summarize, join, perform calculations, or do all three at once, on large numbers of records for reporting and analysis purposes. To improve performance in the past, a combination of views and physical tables were usually implemented that contained the results of these operations. The summary tables would require some type of extraction, transformation, and load (ETL) process to populate and refresh them. In addition to the base tables containing the detailed data, the users would need to know which combinations of the views and/or summary tables to use. These structures are illustrated in Figure 9-7.

Using materialized views has several advantages over more traditional methods. These include the following:

  1. Materialized views have a built-in data refresh process, which eliminates the need for custom ETL.
  2. As we said earlier, the data in materialized views can be partitioned, using the same techniques that apply to tables.
  3. Materialized views are transparent to the users. This is probably the most attractive feature of using materialized views, and we will expand more on this in the next section when we discuss automatic query rewriting.

Figure 9-8 illustrates summarization using materialized views.

FIGURE 9-7.  Summarization using views and summary tables

[gp-comments width="770" linklove="off" ]

chat sex hikayeleri Ensest hikaye