Large Database Features In Oracle, conclusion - Windowing Functions (Page 6 of 7 )
Before we get into the details, we need to define a couple of terms: analytic partitioning and analytic window.
- Analytic partitioning is the division of the results of an analytic function into groups within which the analytic function operates. This is accomplished using the partition by clause of the analytic function. Do not confuse this partitioning with data partitioning discussed earlier in this chapter. Analytic partitioning can be used with any of the analytic functions we have discussed so far.
- An analytic window is a subset of an analytic partition in which the values of each row depend on the values of other rows in the window. There are two types of windows: physical and logical. A physical window is defined by a specified number of rows. A logical window is defined by the order by values.
Windowing functions can only be used in the select and order by clauses. They can be used to calculate the following:
- Moving sum
- Moving average
- Moving min/max
- Cumulative sum
- Statistical functions
Let’s look at an example of a moving sum function. The following shows the listing and results for calculating the moving sum from our SALES table by product category for a six-month period:
select b.prod_category,
to_char(a.time_id, 'yyyy-mm'),
sum(a.quantity_sold),
sum(sum(a.quantity_sold)) over (partition by b.prod_category
order by to_char(a.time_id, 'yyyy-mm')
rows unbounded preceding) as cume_sum
from sales a, products b
where a.prod_id = b.prod_id
and b.prod_category_id between 202 and 204 and to_char(a.time_id, 'yyyy-mm') between '2001-01' and '2001-06'
group by b.prod_category, to_char(a.time_id, 'yyyy-mm')
order by b.prod_category, to_char(a.time_id, 'yyyy-mm');
PROD_CATEGORY | TO_CHAR | SUM(A.QUANTITY_SOLD) | CUME_SUM |
------------- | ------- | ----------------- | -------- |
| Hardware | 2001-01 | 281 | 281 |
| Hardware | 2001-02 | 306 | 587 |
| Hardware | 2001-03 | 442 | 1029 |
| Hardware | 2001-04 | 439 | 1468 |
| Hardware | 2001-05 | 413 | 1881 |
| Hardware | 2001-06 | 429 | 2310 |
| Peripherals and Accessories | 2001-01 | 5439 | 5439 |
| Peripherals and Accessories | 2001-02 | 5984 | 11423 |
| Peripherals and Accessories | 2001-03 | 5104 | 16527 |
| Peripherals and Accessories | 2001-04 | 5619 | 22146 |
| Peripherals and Accessories | 2001-05 | 4955 | 27101 |
| Peripherals and Accessories | 2001-06 | 5486 | 32587 |
| Photo | 2001-01 | 2802 | 2802 |
| Photo | 2001-02 | 2220 | 5022 |
| Photo | 2001-03 | 2982 | 8004 |
| Photo | 2001-04 | 2824 | 10828 |
| Photo | 2001-05 | 2359 | 13187 |
| Photo | 2001-06 | 3225 | 16412 |
As you can see in the results, the moving sum is contained within each product category and resets when a new product category starts. In the past, windowing analysis used to require third-party products such as spreadsheet applications. Having the capabilities to perform these functions right in the database can streamline analysis and report generation efforts.
Other Functions
There are many other functions included with Oracle Database 10g that can be used to analyze data in large databases. While we will not be going into any detail for these, they are listed here for completeness:
- Statistical functions, including:
- Linear regression functions
- Descriptive statistics functions
- Hypothetical testing and crosstab statistics functions (contained a new PL/SQL package called dbms_statistics)
- first/last functions
- lag/lead functions
- Reporting aggregate functions
- Inverse percentile functions
- Hypothetical rank and distribution functions
As we stated earlier, we as database administrators do not have to know the theory behind the functions provided by Oracle Database 10g, or even how to use their results. However, we should be able to let our users know what capabilities are available. Knowing this, our users will be able to take advantage of these functions and construct efficient queries. In the next section, we will be discussing a new feature in Oracle Database 10g—SQL models.
CRITICAL SKILL 9.10
Create SQL Models
One of the more powerful data analysis features introduced in Oracle Database 10g is SQL models. SQL models allow a user to create multidimensional arrays from query results. Formulas, both simple and complex, can then be applied to the arrays to generate results in which the user is interested. SQL models allow inter-row calculations to be applied without doing expensive self-joins.
SQL models are similar to other multidimensional structures used in business intelligence applications. However, because they are part of the database, they can take advantage of Oracle Database 10g’s built-in features of scalability, manageability security, and so on. In addition, using SQL models, there is no need to transfer large amounts of data to external business intelligence applications.
A SQL model is defined by the model extension of the select statement. Columns of a query result are classified into one of three groups:
- Partitioning This is the same as the analytic partitioning we defined in the Windowing Functions section.
- Dimensions These are the attributes used to describe or fully qualify a measure within a partition. Examples could include product, sales rep id, and phone call type.
- Measures These are the numeric (usually) values to which calculations are applied. Examples could include quantity sold, commission amount, and call duration.
One of the main applications of SQL models is projecting or forecasting measures based on existing measures. Let’s look at an example of the model clause to illustrate. The listing and its results show an aggregate query using the SALES table:
select c.channel_desc, p.prod_category, t.calendar_year year,
sum(s.quantity_sold) quantity_sold from sales s, products p, channels c, times t
where s.prod_id = p.prod_id
and s.channel_id = c.channel_id
and s.time_id = t.time_id
and c.channel_desc = 'Direct Sales'
group by c.channel_desc, p.prod_category, t.calendar_year
order by c.channel_desc, p.prod_category, t.calendar_year;
CHANNEL_DESC | PROD_CATEGORY | YEAR | QUANTITY_SOLD |
------------ | -------------- | ---- | ------------- |
Direct Sales | Electronics | 1998 | 7758 |
Direct Sales | Electronics | 1999 | 15007 |
... | | | |
Direct Sales | Hardware | 2000 | 1970 |
Direct Sales | Hardware | 2001 | 2399 |
Direct Sales | Peripherals and Accessories | 1998 | 44258 |
... | | | |
Direct Sales | Software/Other | 2000 | 64483 |
Direct Sales | Software/Other | 2001 | 49146 |
In the results, we see the historical aggregate quantity_sold for each year by product category for the ‘Direct Sales’ channel. We can use the model clause to project the quantity_sold. In the following listing, we will project values for 2002 for the product category Hardware in the channel. The quantity_sold will be based on the previous year’s value (2001), plus 10 percent. Table 9-18 explains the syntax of the listing.
1 select channel_desc, prod_category,
year, quantity_sold
2 from
3 (select c.channel_desc, p.prod_category,
t.calendar_year year,
4 sum(s.quantity_sold)
quantity_sold
5 from sales s, products p, channels c,
times t
6 where s.prod_id = p.prod_id
7 and s.channel_id = c.channel_id
8 and s.time_id = t.time_id
9 group by c.channel_desc,
p.prod_category, t.calendar_year) sales
10 where channel_desc = 'Direct Sales'
11 model
12 partition by (channel_desc)
13 dimension by (prod_category, year)
14 measures (quantity_sold)
15 rules (quantity_sold['Hardware', 2002] 16 = quantity_sold
['Hardware', 2001] * 1.10)
17 order by channel_desc, prod_category,
year;
Lines | Important Points |
3–9 | Define an in-line select that will be the source for the query. It is basically the same query we started with before the model clause. |
10 | Specifies that we are only going to look at ‘Direct Sales’ channels. |
11 | Specifies the model clause. |
12 | Specifies the partition by clause (in this case, channel_desc). |
13 | Specifies the dimension by clause (here, prod_category and year). These elements will fully qualify the measure within the channel_desc partition. |
14 | Specifies the measures clause (quantity_sold). |
15–16 | Specify the rules clause—that is, calculations we want to perform on the measure. In this example, we are referring to a specific cell of quantity_sold, described by the dimensions prod_category (Hardware) and year (2002). |
TABLE 9-18. Explanation of model Clause Syntax
Following are the results of the previous query. Notice that a new row has been added for Hardware in 2002. Its quantity_sold is 2638.9, which is the previous year’s value (2399) plus 10 percent.
CHANNEL_DESC | PROD_CATEGORY | YEAR | QUANTITY_SOLD |
------------ | ----------------- | ---- | ------------- |
Direct Sales | Electronics | 1998 | 7758 |
Direct Sales | Electronics | 1999 | 15007 |
... | | | |
Direct Sales | Hardware | 2000 | 1970 |
Direct Sales | Hardware | 2001 | 2399 |
Direct Sales | Hardware | 2002 | 2638.9 |
Direct Sales | Peripherals and Accessories | 1998 | 44258 |
... | | | |
Direct Sales | Software/Other | 2000 | 64483 |
Direct Sales | Software/Other | 2001 | 49146 |
The model clause has many variations and allows for very powerful calculations. Let’s point out some of the characteristics and/or features you should be aware of. Supported functionalities include the following:
- Looping (for example, FOR loops)
- Recursive calculations
- Regression calculations
- Nested cell references
- Dimension wildcards and ranges
- The model clause does not update any base table, although in theory you could create a table or materialized view from the results of the query using the model clause.
Restrictions include the following:
- The rules clause cannot include any analytic SQL or windowing functions.
- A maximum of 20,000 rules may be specified. This may seem like plenty, but a FOR loop is expanded into many single-cell rules at execution time.
Next: Project 9-2 Using Analytic SQL Functions and Models >>
More Oracle Articles
More By McGraw-Hill/Osborne
|
This article is excerpted from chapter nine of the book Oracle Database 10g A Beginner's Guide, written by Ian Abramson et al (Mcgraw-Hill/Osborne, 2004; ISBN: 0072230789). Check it out at your favorite bookstore. Buy this book now.
|
|