Oracle
  Home arrow Oracle arrow Page 5 - Large Database Features In Oracle, con...
Dev Shed Forums 
Administration  
Apache  
BrainDump  
DHTML  
Flash  
Java  
JavaScript  
Multimedia  
MySQL  
Oracle  
Perl  
PHP  
Practices  
Python  
Reviews  
Security  
Style-Sheets  
Web Services  
XML  
Zend  
Zope  
Forums Sitemap 
IBM® developerWorks 
Dedicated Servers 
E-Commerce Hosting 
Linux Web Hosting 
Managed Hosting 
Small Business Hosting 
Download TestComplete 
VPS Hosting 
Weekly Newsletter

 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
ORACLE

Large Database Features In Oracle, conclusion
By: McGraw-Hill/Osborne
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 4 stars4 stars4 stars4 stars4 stars / 5
    2005-12-15

    Table of Contents:
  • Large Database Features In Oracle, conclusion
  • RAC Architecture
  • Self-Managing Databases
  • cube
  • Analysis Functions
  • Windowing Functions
  • Project 9-2 Using Analytic SQL Functions and Models

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
     
    ADVERTISEMENT

    Route your faxes to your email inbox. Private, secure fax numbers available from CallWave. Choose your fax number.

    Large Database Features In Oracle, conclusion - Analysis Functions
    (Page 5 of 7 )

    Oracle Database 10g provides a number of ranking and statistical functions that previously would have required some pretty heavy SQL to perform, or an extract to a third-party application. In this section, we will look at the analysis function available and provide examples of their use where appropriate.


    NOTE

    Some of the functions in this section are based on complex statistical calculations. Don’t be worried if you are unfamiliar with these concepts. It is more important for you to know that these functions exist than it is to understand the theory behind them.


    Ranking Functions

    Ranking functions provide the ability to rank a row of a query result relative to the other rows in the result set. Common examples of uses for these functions include identifying the top ten selling products for a period, or classifying or grouping a salesperson’s commissions into one of four buckets. The ranking functions included in Oracle Database 10g are

    1. rank
    2. dense_rank
    3. cume_dist
    4. percent_rank
    5. ntile
    6. row_number

    rank and dense_rank Functions  The simplest ranking functions are rank and dense_rank. These functions are very similar and determine the ordinal position of each row within the query result set. The difference between these two functions is that rank will leave a gap in the sequence when there is a tie for position, whereas dense_rank does not leave a gap. The results of the following listing illustrate the difference between the two.

    select prod_id,
           sum(quantity_sold),
           rank () over (order by sum(quantity_sold) desc) as rank,
           dense_rank () over (order by sum(quantity_sold) desc) as
    dense_rank
    from sales
    where to_char(time_id, 'yyyy-mm') = '2001-06'
    group by prod_id;
    PROD_ID SUM(QUANTITY_SOLD) RANK  DENSE_RANK
    ------ ---------------- ---- ---------
         24                 762   1          1 
         30                 627   2          2
        147                 578   3          3
        
    33                 552   4          4
        
    40                 550   5          5
       
    133                 550   5          5
        
    48                 541   7          6
       
    120                 538   8          7
        
    23                 535   9          8
       
    119                 512  10          9
       
    124                 503  11         10
       
    140                 484  12         11
       
    148                 472  13         12
       
    139                 464  14         13
       
    123                 459  15         14
       
    131                 447  16         15
        
    25                 420  17         16
       
    135                 415  18         17
       
    137                 407  19         18
       
    146                 401  20         19

    As you can see, the ordinal position 6 does not exist as a value for rank, but it does for dense_rank. If from this result set you wanted to see the top ten listings for prod_id, you would use the original query as a subquery, as in the following listing:

    select * from
     
    (select prod_id,
            
    sum(quantity_sold),
            
    rank () over (order by sum(quantity_sold) desc) as rank,
            
    dense_rank () over (order by sum(quantity_sold) desc) as
    dense_rank
      from sales
      where to_char(time_id, 'yyyy-mm') = '2001-06'
      group by prod_id)
    where rank < 11;

    To see the bottom ten prod_ids, use the same query, but change the order by option from descending (desc) to ascending (asc). The next two functions, cume_dist and percent_rank, are statistical in nature but still part of the family of ranking functions.

    cume_dist and percent_rank Functions  The cume_dist function calculates the cumulative distribution of a value in a group of values. Since this is not a statistics beginner’s guide, we will not attempt to provide the theoretical background on cumulative distribution. However, we can offer these points:

    • The range of values returned by cume_dist is always between 0 and 1.
    • The value returned by cume_dist is always the same in the case of tie values in the group.
    • The formula for cumulative distribution is

    # of rows with values = value of row being evaluated 
                         # of rows being evaluated

    Looking at the query we used when discussing rank, the following listing calculates the cumulative distribution for quantity_sold. The results immediately follow the listing.

    select prod_id,
       sum(quantity_sold),
       cume_dist () over (order by sum(quantity_sold) asc) as
    cume_dist
    from sales
    where to_char(time_id, 'yyyy-mm') = '2001-06'
    group by prod_id
    order by sum(quantity_sold) desc;
    PROD_ID SUM(QUANTITY_SOLD)  CUME_DIST
    ------- ------------------ -----------
         24                762          1
         30                627 .985915493
        147                578 .971830986
         33                552 .957746479
         40                550 .943661972
        133                550 .943661972
         48                541 .915492958
        120                538 .901408451
        
    23                535 .887323944
        119                512 .873239437
        124                503 .85915493
        140                484 .845070423
        148                472 .830985915
        139                464 .816901408
        123                459 .802816901
        131                447 .788732394
        
    25                420 .774647887
        135                415 .76056338
        137                407 .746478873
        146                401 .732394366

    The percent_rank function is similar to the cume_dist function, but calculates a percentage ranking of a value relative to its group. Again, without getting into the theory, we can make some points about percent_rank:

    • The range of values returned by the function is always between 0 and 1.
    • The row with a rank of 1 will have a percent rank of 0.
    • The formula for calculating percent rank is

    rank of row within its group – 1
       # of rows in the group – 1

    The next listing and its results demonstrate the percent_rank function using the base query we have been using in this section.

    select prod_id,
           sum(quantity_sold),
           rank () over (order by sum(quantity_sold) desc) as rank,
           percent_rank ()
                  
    over (order by sum(quantity_sold) asc) as
    percent_rank
    from sales
    where to_char(time_id, 'yyyy-mm') = '2001-06'
    group by prod_id
    order by sum(quantity_sold) desc;

    PROD_ID

    SUM(QUANTITY_SOLD)

    RANK

    PERCENT_RANK

    --------------

    ----------------------

    ------

    -------------------

    24

    762

    1

    1

    30

    627

    2

    .985714286

    147

    578

    3

    .971428571

    33

    552

    4

    .957142857

    40

    550

    5

    .928571429

    133

    550

    5

    .928571429

    48

    541

    7

    .914285714

    120

    538

    8

    .9

    23

    535

    9

    .885714286

    119

    512

    10

    .871428571

    124

    503

    11

    .857142857

    140

    484

    12

    .842857143

    148

    472

    13

    .828571429

    139

    464

    14

    .814285714

    123

    459

    15

    .8

    131

    447

    16

    .785714286

    25

    420

    17

    .771428571

    135

    415

    18

    .757142857

    137

    407

    19

    .742857143

    146

    401

    20

    .728571429

     

    The ntile Function  The ntile function divides a result set into a number of buckets specified at query time by the user, and then assigns each row in the result set a bucket number. The most common number of buckets used are 3 (tertiles), 4 (quartiles), and 10 (deciles). Each bucket will have the same number of rows, except in the case when the number of rows does not divide evenly by the number of buckets. In this case, each of the leftover rows will be assigned to buckets with the lowest bucket numbers until all leftover rows are assigned. For example, if four buckets were specified and the number of rows in the result set was 98, buckets 1 and 2 would have 25 rows each and buckets 3 and 4 would have 24 rows each.

     

    Let’s look at an example. Using our base query of amount_sold in the SALES table, we want to look at amount_sold by product subcategory and rank the amounts into four buckets. Here’s the SQL:

    select b.prod_subcategory,
          
    sum(a.quantity_sold),
          
    ntile(4) over (ORDER BY SUM(a.quantity_sold) desc) as
    quartile
    from sales a, products b
    where a.prod_id = b.prod_id
    and to_char(a.time_id, 'yyyy-mm') = '2001-06'
    group by b.prod_subcategory;

    As you can see in the following results, the number of product subcategories was not evenly divisible by the number of buckets specified (in this case, 4). Therefore, six subcategories were assigned to the first quartile (bucket 1) and five subcategories were assigned to the second, third, and fourth quartiles.

    PROD_SUM  
    SUBCATEGORY   (A.QUANTITY_SOLD)   QUARTILE   ROWNUMBER   
    ---------------------------------------
    Accessories  323011 
    Y Box Games  257212 
    Recordable CDs  227813 
    Camera Batteries  219214 
    Recordable DVD Discs211515
    Documentation193116
    Modems/Fax131427
    CD-ROM107628
    Y Box Accessories105029 
    Printer Supplies    956210 
    Memory748211
    Camera Media664312
    Home Audio370313
    Game Consoles  352314
    Operating Systems343315
    Bulk Pack Diskettes270316
    Portable PCs215417
    Desktop PCs214418
    Camcorders196419
    Monitors178420
    Cameras173421

    More Oracle Articles
    More By McGraw-Hill/Osborne


       · This article is an excerpt from the book "Oracle Database 10g A Beginner's Guide,"...
     

    Buy this book now. This article is excerpted from chapter nine of the book Oracle Database 10g A Beginner's Guide, written by Ian Abramson et al (Mcgraw-Hill/Osborne, 2004; ISBN: 0072230789). Check it out at your favorite bookstore. Buy this book now.

       

    ORACLE ARTICLES

    - Tuning PL/SQL Code
    - Debugging PL/SQL Code
    - Testing PL/SQL Code
    - Working With PL/SQL Code
    - Conditional Compilation for Oracle Database ...
    - Compile-Time Warnings for Oracle DB 10g
    - Compiling PL/SQL Code for an Oracle Database
    - Troubleshooting PL/SQL Code
    - Managing PL/SQL Code
    - Data Manipulation and More for HTML DB Appli...
    - Oracle Database Fundamentals
    - Adding Processes to HTML DB Applications
    - Adding Computations, Processes, and Validati...
    - Sub-templates and More with Oracle HTML DB
    - Focusing on Templates in Oracle HTML DB

     
    Accelerating Trading Partner Performance
     
    Competing on Analytics
     
    Cost Effective Scaling with Virtualization and Coyote Point Systems
     
    Five Checkpoints to Implementing IP Telephony
     
    Hosted Email Security: Staying Ahead of New Threats
     




    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 4 hosted by Hostway