HomeOracle Page 6 - Large Database Features In Oracle, conclusion
Windowing Functions - Oracle
When you are working with large databases with Oracle Database 10g, there are certain topics with which you need to be familiar. These topics are covered in this article, which is the second of two parts. It is excerpted from chapter nine of the book Oracle Database 10g A Beginner's Guide, written by Ian Abramson et al (McGraw-Hill/Osborne, 2004; ISBN: 0072230789).
Before we get into the details, we need to define a couple of terms: analytic partitioning and analytic window.
Analytic partitioning is the division of the results of an analytic function into groups within which the analytic function operates. This is accomplished using the partition by clause of the analytic function. Do not confuse this partitioning with data partitioning discussed earlier in this chapter. Analytic partitioning can be used with any of the analytic functions we have discussed so far.
An analytic window is a subset of an analytic partition in which the values of each row depend on the values of other rows in the window. There are two types of windows: physical and logical. A physical window is defined by a specified number of rows. A logical window is defined by the order by values.
Windowing functions can only be used in the select and order by clauses. They can be used to calculate the following:
Moving sum
Moving average
Moving min/max
Cumulative sum
Statistical functions
Let’s look at an example of a moving sum function. The following shows the listing and results for calculating the moving sum from our SALES table by product category for a six-month period:
select b.prod_category, to_char(a.time_id, 'yyyy-mm'), sum(a.quantity_sold), sum(sum(a.quantity_sold)) over (partition by b.prod_category order by to_char(a.time_id, 'yyyy-mm') rows unbounded preceding) as cume_sum from sales a, products b where a.prod_id = b.prod_id and b.prod_category_id between 202 and 204 and to_char(a.time_id, 'yyyy-mm') between '2001-01' and '2001-06' group by b.prod_category, to_char(a.time_id, 'yyyy-mm') order by b.prod_category, to_char(a.time_id, 'yyyy-mm');
PROD_CATEGORY
TO_CHAR
SUM(A.QUANTITY_SOLD)
CUME_SUM
-------------
-------
-----------------
--------
Hardware
2001-01
281
281
Hardware
2001-02
306
587
Hardware
2001-03
442
1029
Hardware
2001-04
439
1468
Hardware
2001-05
413
1881
Hardware
2001-06
429
2310
Peripherals and Accessories
2001-01
5439
5439
Peripherals and Accessories
2001-02
5984
11423
Peripherals and Accessories
2001-03
5104
16527
Peripherals and Accessories
2001-04
5619
22146
Peripherals and Accessories
2001-05
4955
27101
Peripherals and Accessories
2001-06
5486
32587
Photo
2001-01
2802
2802
Photo
2001-02
2220
5022
Photo
2001-03
2982
8004
Photo
2001-04
2824
10828
Photo
2001-05
2359
13187
Photo
2001-06
3225
16412
As you can see in the results, the moving sum is contained within each product category and resets when a new product category starts. In the past, windowing analysis used to require third-party products such as spreadsheet applications. Having the capabilities to perform these functions right in the database can streamline analysis and report generation efforts.
Other Functions
There are many other functions included with Oracle Database 10g that can be used to analyze data in large databases. While we will not be going into any detail for these, they are listed here for completeness:
Statistical functions, including:
Linear regression functions
Descriptive statistics functions
Hypothetical testing and crosstab statistics functions (contained a new PL/SQL package called dbms_statistics)
first/last functions
lag/lead functions
Reporting aggregate functions
Inverse percentile functions
Hypothetical rank and distribution functions
As we stated earlier, we as database administrators do not have to know the theory behind the functions provided by Oracle Database 10g, or even how to use their results. However, we should be able to let our users know what capabilities are available. Knowing this, our users will be able to take advantage of these functions and construct efficient queries. In the next section, we will be discussing a new feature in Oracle Database 10g—SQL models.
CRITICAL SKILL 9.10
Create SQL Models
One of the more powerful data analysis features introduced in Oracle Database 10g is SQL models. SQL models allow a user to create multidimensional arrays from query results. Formulas, both simple and complex, can then be applied to the arrays to generate results in which the user is interested. SQL models allow inter-row calculations to be applied without doing expensive self-joins.
SQL models are similar to other multidimensional structures used in business intelligence applications. However, because they are part of the database, they can take advantage of Oracle Database 10g’s built-in features of scalability, manageability security, and so on. In addition, using SQL models, there is no need to transfer large amounts of data to external business intelligence applications.
A SQL model is defined by the model extension of the select statement. Columns of a query result are classified into one of three groups:
Partitioning This is the same as the analytic partitioning we defined in the Windowing Functions section.
Dimensions These are the attributes used to describe or fully qualify a measure within a partition. Examples could include product, sales rep id, and phone call type.
Measures These are the numeric (usually) values to which calculations are applied. Examples could include quantity sold, commission amount, and call duration.
One of the main applications of SQL models is projecting or forecasting measures based on existing measures. Let’s look at an example of the model clause to illustrate. The listing and its results show an aggregate query using the SALES table:
select c.channel_desc, p.prod_category, t.calendar_year year, sum(s.quantity_sold) quantity_sold from sales s, products p, channels c, times t where s.prod_id = p.prod_id and s.channel_id = c.channel_id and s.time_id = t.time_id and c.channel_desc = 'Direct Sales' group by c.channel_desc, p.prod_category, t.calendar_year order by c.channel_desc, p.prod_category, t.calendar_year;
CHANNEL_DESC
PROD_CATEGORY
YEAR
QUANTITY_SOLD
------------
--------------
----
-------------
Direct Sales
Electronics
1998
7758
Direct Sales
Electronics
1999
15007
...
Direct Sales
Hardware
2000
1970
Direct Sales
Hardware
2001
2399
Direct Sales
Peripherals and Accessories
1998
44258
...
Direct Sales
Software/Other
2000
64483
Direct Sales
Software/Other
2001
49146
In the results, we see the historical aggregate quantity_sold for each year by product category for the ‘Direct Sales’ channel. We can use the model clause to project the quantity_sold. In the following listing, we will project values for 2002 for the product category Hardware in the channel. The quantity_sold will be based on the previous year’s value (2001), plus 10 percent. Table 9-18 explains the syntax of the listing.
1 select channel_desc, prod_category, year, quantity_sold 2 from 3 (select c.channel_desc, p.prod_category, t.calendar_year year, 4 sum(s.quantity_sold) quantity_sold 5 from sales s, products p, channels c, times t 6 where s.prod_id = p.prod_id 7 and s.channel_id = c.channel_id 8 and s.time_id = t.time_id 9 group by c.channel_desc, p.prod_category, t.calendar_year) sales 10 where channel_desc = 'Direct Sales' 11 model 12 partition by (channel_desc) 13 dimension by (prod_category, year) 14 measures (quantity_sold) 15 rules (quantity_sold['Hardware', 2002] 16 = quantity_sold ['Hardware', 2001] * 1.10) 17 order by channel_desc, prod_category, year;
Lines
Important Points
3–9
Define an in-line select that will be the source for the query. It is basically the same query we started with before the model clause.
10
Specifies that we are only going to look at ‘Direct Sales’ channels.
11
Specifies the model clause.
12
Specifies the partition by clause (in this case, channel_desc).
13
Specifies the dimension by clause (here, prod_category and year). These elements will fully qualify the measure within the channel_desc partition.
14
Specifies the measures clause (quantity_sold).
15–16
Specify the rules clause—that is, calculations we want to perform on the measure. In this example, we are referring to a specific cell of quantity_sold, described by the dimensions prod_category (Hardware) and year (2002).
TABLE 9-18.Explanation of model Clause Syntax
Following are the results of the previous query. Notice that a new row has been added for Hardware in 2002. Its quantity_sold is 2638.9, which is the previous year’s value (2399) plus 10 percent.
CHANNEL_DESC
PROD_CATEGORY
YEAR
QUANTITY_SOLD
------------
-----------------
----
-------------
Direct Sales
Electronics
1998
7758
Direct Sales
Electronics
1999
15007
...
Direct Sales
Hardware
2000
1970
Direct Sales
Hardware
2001
2399
Direct Sales
Hardware
2002
2638.9
Direct Sales
Peripherals and Accessories
1998
44258
...
Direct Sales
Software/Other
2000
64483
Direct Sales
Software/Other
2001
49146
The model clause has many variations and allows for very powerful calculations. Let’s point out some of the characteristics and/or features you should be aware of. Supported functionalities include the following:
Looping (for example, FOR loops)
Recursive calculations
Regression calculations
Nested cell references
Dimension wildcards and ranges
The model clause does not update any base table, although in theory you could create a table or materialized view from the results of the query using the model clause.
Restrictions include the following:
The rules clause cannot include any analytic SQL or windowing functions.
A maximum of 20,000 rules may be specified. This may seem like plenty, but a FOR loop is expanded into many single-cell rules at execution time.