Suppose that you run an analysis consisting of a set of retrievals that each perform a complex SELECT of a set of records (perhaps using an expensive join), and that differ only in the way they summarize the records. That's inefficient because it unnecessarily does the work of selecting the records repeatedly. A better technique is to select the records once, and then use them to generate the summaries. In such a situation, consider the following strategy:
The following example creates a summary table containing the average GNP value of countries in each continent. Then it compares the summary information to individual countries to find those countries with a GNP much less than the average and much more than the average. First, create the summary table: mysql> CREATE TABLE ContinentGNP -> SELECT Continent, AVG(GNP) AS AvgGNP -> FROM Country GROUP BY Continent; mysql> SELECT * FROM ContinentGNP; +---------------+---------------+ | Continent | AvgGNP | +---------------+---------------+ | Asia | 150105.725490 | | Europe | 206497.065217 | | North America | 261854.789189 | | Africa | 10006.465517 | | Oceania | 14991.953571 | | Antarctica | 0.000000 | | South America | 107991.000000 | +---------------+---------------+ Next, compare the summary table to the original table to find countries that have a GNP less than 1% of the continental average: mysql> SELECT -> Country.Continent, Country.Name, -> Country.GNP AS CountryGNP, -> ContinentGNP.AvgGNP AS ContinentAvgGNP -> FROM Country, ContinentGNP -> WHERE -> Country.Continent = ContinentGNP.Continent -> AND Country.GNP < ContinentGNP.AvgGNP * .01 -> ORDER BY Country.Continent, Country.Name; +-----------+---------------------------+------------+-----------------+ | Continent | Name | CountryGNP | ContinentAvgGNP | +-----------+---------------------------+------------+-----------------+ | Asia | Bhutan | 372.00 | 150105.725490 | | Asia | East Timor | 0.00 | 150105.725490 | | Asia | Laos | 1292.00 | 150105.725490 | | Asia | Maldives | 199.00 | 150105.725490 | | Asia | Mongolia | 1043.00 | 150105.725490 | | Europe | Andorra | 1630.00 | 206497.065217 | | Europe | Faroe Islands | 0.00 | 206497.065217 | | Europe | Gibraltar | 258.00 | 206497.065217 | | Europe | Holy See (Vat. City State)| 9.00 | 206497.065217 | | Europe | Liechtenstein | 1119.00 | 206497.065217 | ... Use the summary table again to find countries that have a GNP more than 10 times the continental average: mysql> SELECT -> Country.Continent, Country.Name, -> Country.GNP AS CountryGNP, -> ContinentGNP.AvgGNP AS ContinentAvgGNP -> FROM Country, ContinentGNP -> WHERE -> Country.Continent = ContinentGNP.Continent -> AND Country.GNP > ContinentGNP.AvgGNP * 10 -> ORDER BY Country.Continent, Country.Name; +---------------+---------------+------------+-----------------+ | Continent | Name | CountryGNP | ContinentAvgGNP | +---------------+---------------+------------+-----------------+ | Asia | Japan | 3787042.00 | 150105.725490 | | Europe | Germany | 2133367.00 | 206497.065217 | | North America | United States | 8510700.00 | 261854.789189 | | Africa | South Africa | 116729.00 | 10006.465517 | | Oceania | Australia | 351182.00 | 14991.953571 | +---------------+---------------+------------+-----------------+ The technique of using a summary table has several benefits:
Use of summary tables has the disadvantage that the records they contain are up-to-date only as long as the original values remain unchanged, and thus so are any summaries calculated from them. If the original table rarely or never changes, this might be only a minor concern. For many applications, summaries that are close approximations are sufficiently accurate. The summary table technique can be applied at multiple levels. Create a summary table that holds the results of an initial summary, and then summarize that table in different ways to produce secondary summaries. This avoids the computational expense of generating the initial summary repeatedly. When a summary consists of a single value, you need not create a table at all. Use a SQL variable to hold the value. Then you can use the value for comparison purposes in subsequent queries without having to calculate it again.
blog comments powered by Disqus |