Home arrow Oracle arrow Page 6 - Large Database Features In Oracle, conclusion

Windowing Functions - Oracle

When you are working with large databases with Oracle Database 10g, there are certain topics with which you need to be familiar. These topics are covered in this article, which is the second of two parts. It is excerpted from chapter nine of the book Oracle Database 10g A Beginner's Guide, written by Ian Abramson et al (McGraw-Hill/Osborne, 2004; ISBN: 0072230789).

TABLE OF CONTENTS:
  1. Large Database Features In Oracle, conclusion
  2. RAC Architecture
  3. Self-Managing Databases
  4. cube
  5. Analysis Functions
  6. Windowing Functions
  7. Project 9-2 Using Analytic SQL Functions and Models
By: McGraw-Hill/Osborne
Rating: starstarstarstarstar / 6
December 15, 2005

print this article
SEARCH DEV SHED

TOOLS YOU CAN USE

advertisement

Before we get into the details, we need to define a couple of terms: analytic partitioning and analytic window.

  1. Analytic partitioning is the division of the results of an analytic function into groups within which the analytic function operates. This is accomplished using the partition by clause of the analytic function. Do not confuse this partitioning with data partitioning discussed earlier in this chapter. Analytic partitioning can be used with any of the analytic functions we have discussed so far.
  2. An analytic window is a subset of an analytic partition in which the values of each row depend on the values of other rows in the window. There are two types of windows: physical and logical. A physical window is defined by a specified number of rows. A logical window is defined by the order by values.

Windowing functions can only be used in the select and order by clauses. They can be used to calculate the following:

  1. Moving sum
  2. Moving average
  3. Moving min/max
  4. Cumulative sum
  5. Statistical functions

Let’s look at an example of a moving sum function. The following shows the listing and results for calculating the moving sum from our SALES table by product category for a six-month period:

select b.prod_category,
       to_char(a.time_id, 'yyyy-mm'),
       sum(a.quantity_sold),
       sum(sum(a.quantity_sold)) over (partition by b.prod_category
                               
order by to_char(a.time_id, 'yyyy-mm')
                               
rows unbounded preceding) as cume_sum
from sales a, products b
where a.prod_id = b.prod_id
and b.prod_category_id between 202 and 204 and to_char(a.time_id, 'yyyy-mm') between '2001-01' and '2001-06'
group by b.prod_category, to_char(a.time_id, 'yyyy-mm')
order by b.prod_category, to_char(a.time_id, 'yyyy-mm');

 

PROD_CATEGORY

TO_CHAR

SUM(A.QUANTITY_SOLD)

CUME_SUM

-------------

-------

-----------------

--------

Hardware 2001-01281281
Hardware 2001-02306587
Hardware 2001-034421029
Hardware 2001-044391468
Hardware 2001-054131881
Hardware 2001-064292310
Peripherals and Accessories  2001-0154395439
Peripherals and Accessories2001-02598411423
Peripherals and Accessories2001-03510416527
Peripherals and Accessories2001-04561922146
Peripherals and Accessories2001-05495527101
Peripherals and Accessories2001-06548632587
Photo2001-0128022802
Photo 2001-0222205022
Photo 2001-0329828004
Photo 2001-04282410828
Photo 2001-05235913187
Photo 2001-06322516412

As you can see in the results, the moving sum is contained within each product category and resets when a new product category starts. In the past, windowing analysis used to require third-party products such as spreadsheet applications. Having the capabilities to perform these functions right in the database can streamline analysis and report generation efforts.

Other Functions

There are many other functions included with Oracle Database 10g that can be used to analyze data in large databases. While we will not be going into any detail for these, they are listed here for completeness:

  1. Statistical functions, including:
  2. Linear regression functions
  3. Descriptive statistics functions
  4. Hypothetical testing and crosstab statistics functions (contained a new PL/SQL package called dbms_statistics)
  5. first/last functions
  6. lag/lead functions
  7. Reporting aggregate functions
  8. Inverse percentile functions
  9. Hypothetical rank and distribution functions

As we stated earlier, we as database administrators do not have to know the theory behind the functions provided by Oracle Database 10g, or even how to use their results. However, we should be able to let our users know what capabilities are available. Knowing this, our users will be able to take advantage of these functions and construct efficient queries. In the next section, we will be discussing a new feature in Oracle Database 10g—SQL models.

CRITICAL SKILL 9.10

Create SQL Models

One of the more powerful data analysis features introduced in Oracle Database 10g is SQL models. SQL models allow a user to create multidimensional arrays from query results. Formulas, both simple and complex, can then be applied to the arrays to generate results in which the user is interested. SQL models allow inter-row calculations to be applied without doing expensive self-joins.

SQL models are similar to other multidimensional structures used in business intelligence applications. However, because they are part of the database, they can take advantage of Oracle Database 10g’s built-in features of scalability, manageability security, and so on. In addition, using SQL models, there is no need to transfer large amounts of data to external business intelligence applications.

A SQL model is defined by the model extension of the select statement. Columns of a query result are classified into one of three groups:

  1. Partitioning  This is the same as the analytic partitioning we defined in the Windowing Functions section.
  2. Dimensions  These are the attributes used to describe or fully qualify a measure within a partition. Examples could include product, sales rep id, and phone call type.
  3. Measures  These are the numeric (usually) values to which calculations are applied. Examples could include quantity sold, commission amount, and call duration.

One of the main applications of SQL models is projecting or forecasting measures based on existing measures. Let’s look at an example of the model clause to illustrate. The listing and its results show an aggregate query using the SALES table:

select c.channel_desc, p.prod_category, t.calendar_year year,
      
sum(s.quantity_sold) quantity_sold from sales s, products p, channels c, times t
where s.prod_id = p.prod_id
and   s.channel_id = c.channel_id
and   s.time_id = t.time_id
and   c.channel_desc = 'Direct Sales'
group by c.channel_desc, p.prod_category, t.calendar_year
order by c.channel_desc, p.prod_category, t.calendar_year;

CHANNEL_DESC

PROD_CATEGORY

YEAR

QUANTITY_SOLD

------------

--------------

----

-------------

Direct Sales

Electronics

1998

7758

Direct Sales

Electronics

1999

15007

...

 

 

 

Direct Sales

Hardware

2000

1970

Direct Sales

Hardware

2001

2399

Direct Sales

Peripherals and Accessories

1998

44258

...

 

 

 

Direct Sales

Software/Other

2000

64483

Direct Sales

Software/Other

2001

49146

In the results, we see the historical aggregate quantity_sold for each year by product category for the ‘Direct Sales’ channel. We can use the model clause to project the quantity_sold. In the following listing, we will project values for 2002 for the product category Hardware in the channel. The quantity_sold will be based on the previous year’s value (2001), plus 10 percent. Table 9-18 explains the syntax of the listing.

 1  select channel_desc, prod_category,
   year, quantity_sold
 2  from
 3  (select c.channel_desc, p.prod_category,
   t.calendar_year year,
 4          sum(s.quantity_sold)
   quantity_sold
 5  from sales s, products p, channels c,
   times t
 6  where s.prod_id = p.prod_id
 7  and   s.channel_id = c.channel_id
 8  and   s.time_id = t.time_id
 9  group by c.channel_desc,
   p.prod_category, t.calendar_year) sales
10 where channel_desc = 'Direct Sales'
11 model
12   partition by (channel_desc)
13   dimension by (prod_category, year)
14   measures (quantity_sold)
15   rules (quantity_sold['Hardware', 2002] 16                   = quantity_sold
   ['Hardware', 2001] * 1.10)
17 order by channel_desc, prod_category, 
   year;

 

Lines

Important Points

3–9

Define an in-line select that will be the source for the query. It is basically the same query we started with before the model clause.

10

Specifies that we are only going to look at ‘Direct Sales’ channels.

11

Specifies the model clause.

12

Specifies the partition by clause (in this case, channel_desc).

13

Specifies the dimension by clause (here, prod_category and year). These elements will fully qualify the measure within the channel_desc partition.

14

Specifies the measures clause (quantity_sold).

15–16

Specify the rules clause—that is, calculations we want to perform on the measure. In this example, we are referring to a specific cell of quantity_sold, described by the dimensions prod_category (Hardware) and year (2002).

TABLE 9-18.  Explanation of model Clause Syntax

 

Following are the results of the previous query. Notice that a new row has been added for Hardware in 2002. Its quantity_sold is 2638.9, which is the previous year’s value (2399) plus 10 percent.

CHANNEL_DESC

PROD_CATEGORY

YEAR

QUANTITY_SOLD

------------

-----------------

----

-------------

Direct Sales

Electronics

1998

7758

Direct Sales

Electronics

1999

15007

...

 

 

 

Direct Sales

Hardware

2000

1970

Direct Sales

Hardware

2001

2399

Direct Sales

Hardware

2002

2638.9

Direct Sales

Peripherals and Accessories

1998

44258

...

 

 

 

Direct Sales

Software/Other

2000

64483

Direct Sales

Software/Other

2001

49146

The model clause has many variations and allows for very powerful calculations. Let’s point out some of the characteristics and/or features you should be aware of. Supported functionalities include the following:

  • Looping (for example, FOR loops)
  • Recursive calculations
  • Regression calculations
  • Nested cell references
  • Dimension wildcards and ranges
  • The model clause does not update any base table, although in theory you could create a table or materialized view from the results of the query using the model clause.

Restrictions include the following:

  1. The rules clause cannot include any analytic SQL or windowing functions.
  2. A maximum of 20,000 rules may be specified. This may seem like plenty, but a FOR loop is expanded into many single-cell rules at execution time.

 



 
 
>>> More Oracle Articles          >>> More By McGraw-Hill/Osborne
 

blog comments powered by Disqus
escort Bursa Bursa escort Antalya eskort
   

ORACLE ARTICLES

- Oracle Java Security Woes Continue
- Oracle's New IaaS Cloud Option: There's a Ca...
- Oracle Acquires Eloqua to Boost Cloud Presen...
- Choosing Innovation: Oracle Survey Insights
- Oracle Fixes Privilege Escalation Bug
- Oracle`s Communications Service Availability...
- Oracle Releases Exalytics, Taleo Plans
- Oracle Releases Communications Network Integ...
- Oracle Releases Communications Data Model 11...
- Oracle Releases PeopleSoft PeopleTools 8.52
- Oracle Integrates Cloudera Apache Distro, My...
- Oracle Releases MySQL 5.5.18
- Oracle Announces NoSQL Database Availability
- Sorting Database Columns With the SELECT Sta...
- Retrieving Table Data with the LIKE Operator

Developer Shed Affiliates

 


Dev Shed Tutorial Topics: