MySQL
  Home arrow MySQL arrow Page 3 - MySQL wizardry
Dev Shed Forums 
Administration  
AJAX  
Apache  
BrainDump  
DHTML  
Flash  
Java  
JavaScript  
Multimedia  
MySQL  
Oracle  
Perl  
PHP  
Practices  
Python  
Reviews  
Security  
Style-Sheets  
Web Services  
XML  
Zend  
Zope  
Forums Sitemap 
IBM® developerWorks 
Sun Developer Network 
Dedicated Servers 
E-Commerce Hosting 
Linux Web Hosting 
Managed Hosting 
Small Business Hosting 
Moblin 
JMSL Numerical Library 
VPS Hosting 
Weekly Newsletter

 
Developer Updates  
Free Website Content 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Write For Us Get Paid 
Request Media Kit
Contact Us 
Site Map 
Privacy Policy 
Support 
 USERNAME
 
 PASSWORD
 
 
  >>> SIGN UP!  
  Lost Password? 
MYSQL

MySQL wizardry
By: Giuseppe Maxia
  • Search For More Articles!
  • Disclaimer
  • Author Terms
  • Rating: 5 stars5 stars5 stars5 stars5 stars / 37
    2001-06-05

    Table of Contents:
  • MySQL wizardry
  • Opening the path
  • Some help from SQL itself
  • Conclusion

  • Rate this Article: Poor Best 
      ADD THIS ARTICLE TO:
      Del.ici.ous Digg
      Blink Simpy
      Google Spurl
      Y! MyWeb Furl
    Email Me Similar Content When Posted
    Add Developer Shed Article Feed To Your Site
    Email Article To Friend
    Print Version Of Article
    PDF Version Of Article
     
     
    ADVERTISEMENT


    MySQL wizardry - Some help from SQL itself


    (Page 3 of 4 )

    The Wizard smiled. "If you like to, you're welcome. However, you could get some help from the database engine itself, provided that you ask nicely." And while he was speaking he typed a very cryptic statement:

    mysql> SELECT CONCAT(', SUM(IF(dept = "',dept,'", 1,0)) AS `',dept,'`')
    FROM departments;
    CONCAT(', SUM(IF(dept = "',dept,'", 1,0)) AS `',dept,'`')
    , SUM(IF(dept = "Development", 1,0)) AS `Development`
    , SUM(IF(dept = "Personnel", 1,0)) AS `Personnel`
    , SUM(IF(dept = "Research", 1,0)) AS `Research`
    , SUM(IF(dept = "Sales", 1,0)) AS `Sales`
    , SUM(IF(dept = "Training", 1,0)) AS `Training`
    5 rows in set (0.00 sec)

    "You know," he went on, "you can also use SQL to produce SQL code. This is one of the cases. You have in front of you the list of columns that you should include in your query. Now, with some cut-and-paste, we could get the result you want. Here."

    mysql> SELECT location
    , SUM(IF(dept = "Development", 1,0)) AS `Development` 
    , SUM(IF(dept = "Personnel", 1,0)) AS `Personnel`
    , SUM(IF(dept = "Research", 1,0)) AS `Research`
    , SUM(IF(dept = "Sales", 1,0)) AS `Sales`
    , SUM(IF(dept = "Training", 1,0)) AS `Training`
    , COUNT(*) AS total
    FROM locations INNER JOIN employees USING (loc_code)
    INNER JOIN departments USING (dept_code)
    GROUP BY location;
    locationDevelopmentPersonnelResearchSalesTrainingtotal
    Boston200103
    Cagliari003003
    London010012
    Manchester000202
    Marseille200103
    Milano000112
    New York300104
    Paris100102
    Roma010113
    9 rows in set (0.36 sec)

    The wizard was now unstoppable. He had reached the stage where he simply couldn't help giving away his knowledge. "Before we go on," he said, taking possession of my keyboard, "let's see how this same method can do more than counting. Let's replace those 1s with a numeric field, and do real summing up."

    mysql> SELECT location
    , SUM(IF(dept = "Development", salary,0)) AS `Development`
    , SUM(IF(dept = "Personnel", salary,0)) AS `Personnel`
    , SUM(IF(dept = "Research", salary,0)) AS `Research`
    , SUM(IF(dept = "Sales", salary,0)) AS `Sales`
    , SUM(IF(dept = "Training", salary,0)) AS `Training`
    , SUM(salary) AS total
    FROM locations INNER JOIN employees USING (loc_code)
    INNER JOIN departments USING (dept_code)
    GROUP BY location;
    locationDevelopmentPersonnelResearchSalesTrainingtotal
    Boston11900005950017850
    Cagliari00168000016800
    London0570000570011400
    Manchester00011550011550
    Marseille11150005800016950
    Milano0005550490010450
    New York17850006100023950
    Paris5700005400011100
    Roma0500005500510015600
    9 rows in set (0.00 sec)

    "Don't forget to change also the total field. A simple SUM without IF, and your total is ready."

    I was looking at the screen, which was showing what seemed to be the complete solution to my problem, but the wizard was shaking his head. "As a matter of fact," he was saying, there is something that we can improve here. We have two queries, in which we are reading the departments table. So we are reading it twice twice. The whole process could be improved, by querying for department code the first time, and omitting the join with departments the second time."

    mysql> SELECT CONCAT(', SUM(IF(dept_code = "',dept_code,'", 1,0)) AS `',dept,'`')
    FROM departments;


    "Here. Let's get the columns once more. Good. And there it is. This one looks better."

    SELECT location
    , SUM(IF(dept_code = "1", 1,0)) AS `Personnel`
    , SUM(IF(dept_code = "2", 1,0)) AS `Training`
    , SUM(IF(dept_code = "3", 1,0)) AS `Research`
    , SUM(IF(dept_code = "4", 1,0)) AS `Sales`
    , SUM(IF(dept_code = "5", 1,0)) AS `Development`
    , COUNT(*) AS total
    FROM locations INNER JOIN employees USING (loc_code)
    GROUP BY location;


    He changed the previous two SQL statements, executed them, with some cut-and-pasting in the middle, and got exactly the same result. Now he was explaining me why he did it. "The first query is scanning all the departments table, and we know that its results will be used to build the second query containing the employees table, which has already a department code. Therefore, we can skip the join with departments, since the only purpose of that join was to get the names of the departments."
    "I see" was the only comment I could offer, Since I was overwhelmed by his continuous insight. I got the idea, and I couldn't help thinking that he must have done that before. All those pieces of information were coming just too fast for me. Luckily, all logs were on, so I knew that I would be able to get all the statements back when he would leave. Which was not the case yet. The wizard was now ready to give me his philosophical view of cross tabulating.

    {mospagebreak title=The golden rules} "Rule number one:" -- wizards have always a rule n. 1 for everything -- "cross-tabulating is a simple algorithm with a complex implementation. Once you know the principle, you can do everything you need for your statistics. The difficult part is making the process automatic. You don't know in advance which values you should use for your columns. Hence, you have to find such values every time, before creating the second query. The real problem with cross-tabs is that there are no simple cases. Even the ones that look as such, have sticky problems. And you should consider this as rule number two."

    I was about to lose heart, but he had more for me. "On the other hand," he continued, "This algorithm is so flexible that will let you do things that your ordinary spreadsheet won't let you. For example, let's suppose that you want a x-tab with the number of employees by gender and the total of their salaries as well. In your common graphical tool you can do either of them. With this system, you can combine both. It's no difficult, just complicated. Now, if you fetch some more coffee, I'll show you."
    And when I came back from the kitchenette, there was it.

    mysql> SELECT location
    , SUM(IF(gender='M',1,0)) AS M
    , SUM(IF(gender='M',salary,0)) AS salary_M
    , SUM(IF(gender='F',1,0)) AS F
    , SUM(IF(gender='F',salary,0)) AS salary_F
    , COUNT(*) AS empl
    , SUM(salary) AS tot_salary
    FROM locations INNER JOIN employees USING (loc_code) GROUP BY location;
    locationMsalary_MFsalary_Fempltot_salary
    Boston15800212050317850
    Cagliari15600211200316800
    London1570015700211400
    Manchester1560015950211550
    Marseille15550211400316950
    Milano21045000210450
    New York211950212000423950
    Paris1540015700211100
    Roma31560000315600
    9 rows in set (0.00 sec)

    He took a sip from his mug and added, "And of course nobody could prevent you from inserting a totally unrelated line into your query, like this one:
    mysql> SELECT location ,SUM(IF(gender='M',1,0)) AS M 
    , SUM(IF(gender='M',salary,0)) AS salary_M 
    , SUM(IF(gender='F',1,0)) AS F 
    , SUM(IF(gender='F',salary,0)) AS salary_F 
    , SUM(IF(dept_code = "4", 1,0)) AS `Sales` 
    , COUNT(*) AS empl 
    , SUM(salary) AS tot_salary FROM locations 
    INNER JOIN employees USING (loc_code) GROUP BY location;
    locationMsalary_MFsalary_FSalesempltot_salary
    Boston158002120501317850
    Cagliari156002112000316800
    London15700157000211400
    Manchester15600159502211550
    Marseille155502114001316950
    Milano210450001210450
    New York2119502120001423950
    Paris15400157001211100
    Roma315600001315600
    9 rows in set (0.00 sec)

    "Here we have results coming from three different sources: the counting of employees by gender, the sum of their salaries by gender, and the total number of the employees in the Sales department, regardless of their gender. I know that this particular example doesn't make much sense, but this is something that people in the management, for reasons that completely escape my intellectual hold, want to put together. Don't underestimate such possibility, which could prove to be useful in many occasions."

    I knew by then that his enthusiasm was growing thinner, and I should have expected him to leave abruptly any moment. I had to refuel his good disposition and ask some juicy questions. "This is really wonderful," I said, "but in a real application you would not use SQL alone. How can I implement this algorithm with a general purpose language?"

    Putting the pieces together

    The wizard startled, as if awaken from a troubled sleep and suddenly his eyes were sparkling with renewed interest. "Yes, of course," he replied. "If we stick to one-level cross-tabulations, this algorithm will fit nicely into any high level language. We should avoid the multi-level ones, for the moment, because the general concept is more important than the thorny details. Let me draw you a flow-chart diagram."
    I had already relinquished my desktop, and I stayed quietly at his side, while he was skillfully drawing this diagram, which he explained almost as fast as he was designing.

    "The required actions are quite simple. However, you need some planning before starting. First, you have to identify the source for the columns. It could be the same source from where you need to count or sum, but in a well organized and normalized database it should be in a separated table. Either way, your first action with the database will be to query for distinct column values. Then you will merge such values within the summary statements composing your query. And at that point you'll be ready to execute. You can only omit the initial query if you are 100% sure that your values have not changed. This could be the case, for example, if your column values are in a read-only table. But usually this is not the case, or else cross-tabulations wouldn't be that hard. Well, let's start. Any preferred language?"

    This was a false democratic question, which I knew by experience. I could mention any language and he would be proficient in it, but he would reject on some ground, until I would eventually manage to name the language he had in mind. Having gone through the motions before, I had my answer ready: "I think that Perl would serve the purpose," I said hastily. He nodded, approving my wisdom, and fired a copy of vim from a xterm. "Sorry if you are an Emacs guy," but his voice betrayed his complete lack of sorrow, "Nothing like vim to highlight Perl syntax. I am sure you can follow me into this, and besides, I have the helm." That, I knew for sure. My computer was firmly in his possession, and I started doubting that I would ever have it back.

    The wizard, unaware of my anxiety, was already writing

    #!/usr/bin/perl -w use DBI; use strict;


    Now, you would expect your average wizard to be adventurous and careless, keen of programming without constraints. Not this one. My wizard is a wizard because he behaves like one, but, being a database wizard, he is also very strict in matter of coding. More than once he gave me a speech on the importance of catching the errors in advance, before they can catch you. Since I knew that lecture by heart, I did not comment. My only contribution was to provide the IP address of my server and the location where the DBI could look for my user-name and password. I suspect that the wizard already knew that much, but he wanted to make me feel important by letting me provide some tiny contribution to the script.

    my %params;
    while (<>) { # gets the configuration from the input pipe next if /^#/; # skips comments my ($name,$value) = split '=',$_; chomp $value; $params{$name} = $value ; } $params{database} or die "no valid input given\n";
    "We are going to read the parameters into a hash from a configuration file. They are too many for the command line, and it's better to save them to a file, which will be even clearer. In a production case, you would rather store those parameters into a database table. Four our purposes, we are going to use a text file. Actually, let's write it down, before we continue. I think it would be better to start with the simple case, location by gender."

    title=location-gender database=xcompany row_name=location row_alias=town col_name=gender col_alias=gender col_value=1 col_from=FROM employees col_where= col_order= row_from=FROM locations INNER JOIN employees USING(loc_code) row_where= row_order=ORDER BY loc_sort_order row_group=group by location
    He saved those parameters into a location-gender-count.xtab and explained: "See, our parameters are the description of the final query. In order to build the query, we need to set its parts. In this particular case, we are not going to use a WHERE and a ORDER BY clause for the columns, but you know that we may need them for the others, so we'll leave them. Our parsing mechanism can easily take care of the empty strings. In this script I assume that all these parameters are properly set in the configuration file. I will skip all the error checking on that, and you can implement it later on."

    my $dbh = DBI->connect("DBI:mysql:$params{database};" ."host=172.16.35.1;mysql_read_default_file=$ENV{HOME}/.my.cnf") or die "can't connect $! \n";
    "This is fairly simple. Just the connection to the database. Now we have a database handle $dbh and with it we can go for the first query."

    my $row_statement = ""; my $sth = $dbh->prepare("SELECT DISTINCT $params{col_name}, $params{col_alias} " . $params{col_from} . " " . $params{col_where} . $params{col_order}); $sth->execute(); while (my ($colname, $colalias) = $sth->fetchrow_array()) { $row_statement .= ", SUM(IF($params{col_name} = \"${colname}\"," . " $params{col_value}, 0)) AS `${colalias}` \n" } $sth->finish(); $dbh->disconnect();
    I recognized the proceedings. He was instructing Perl to do the same thing that he was previously doing manually. I told him that much. "Yes, exactly," he approved. "And in addition to that, using a high level language will grant us some more freedom. Since we have already collected the field list from our configuration file, we are not in a hurry to compose our second query. Having the parameters in a hash will also give us some amount of control, since any missing parameter from the configuration file will be duly reported by the Perl compiler. At this point, the only thing we need to do is to gather all the pieces together and produce our final query."

    # ------------ add total column and row details $row_statement .= ", " . (($params{col_value} eq "1")? "COUNT(*)" : "SUM($params{col_value})") . " AS total \n" . $params{row_from} . $params{row_where} ; print "use $params{database};\n"; $row_statement =~ s/\n\n/\n/g; # remove double EOL print "SELECT $params{row_name} AS $params{row_alias} \n", "$row_statement\n$params{row_group} \n $params{row_order} ;\n"; # the cross table print "SELECT 'TOTAL'\n$row_statement;\n"; # the total line
    He continued explaining. "Since you have seen the manual examples and the flow-chart, what we are doing shouldn't come as a surprise to you. We add a total line, which will be a real SUM or a COUNT, depending on the parameter that was entered. Then we add the FROM and WHERE clause, and we don't care if this last one is empty. The resulting script will include a USE database statement, followed by a query that should look much like the one we entered manually with cut-and-paste."
    I wanted desperately to enter the discussion, so I told him what was troubling me. "Why you did not complete the $row_statement with the GROUP and ORDER-BY clauses? Why did you stop at the WHERE level?"
    Almost annoyed by my lack of insight, he replied "You should see it by yourself. The $row_statement as I made it is what we have in common between the normal selection and the total line. To get the grand total you should not GROUP, and since we are going to get only one line, it doesn't make much sense to ORDER it. Does it? Now, since you mentioned the total, this solution is just for a quick demonstration, which I can do with a single line of Perl. If your tables are those big babies with millions of record that you might have in your data warehouse, then you'd better send the output of this query to a temporary table and then get the grand total from there. For tables with less than one hundred thousand records, MySQL will slurp this kind of cross-tabs in a blink."

    He got up from my chair, and I knew that my lesson was over. "Wait a minute." I said hastily. " This script doesn't execute the cross-tab query. How do you use it?"

    Still halfway from my chair, he bent to the keyboard, and typed

    $ perl xtab.pl < location-gender-count.xtab | mysql -t

    "Just add your password and host, if you need it, and it 
    will give you the X-tab." 

    "But? wait! I have a few questions about multi-level cross-tabulations."
    He smiled in a devilish way. "I am sure you do. But I have a girlfriend who wants to discuss database theory with me," he looked at his watch and added "in exactly ten minutes at that new Chinese restaurant downtown, and I shouldn't make her wait. I think you have enough food for thought to fill your weekend. Give me a call next week, and perhaps, if I can spare half an hour, I can give you a hand." And while offering his metaphoric limb, he extended his real one to shake mine, and off he went before I could say 'son of a wizard!'



    More MySQL Articles
    More By Giuseppe Maxia


     

       

    MYSQL ARTICLES

    - Take Some Load off MySQL with MemCached
    - MySQL Table Prefix Changer Tool in PHP
    - Using the SIGNAL Statement for Error Handling
    - Error Handling Examples
    - Error Handling
    - Completing a Search Engine with MySQL and PH...
    - Paginating Result Sets for a Search Engine B...
    - Building a Search Engine with MySQL and PHP 5
    - Using Boolean Operators for Full Text and Bo...
    - PHP, MySQL and the PEAR Database
    - Working with PHP and MySQL
    - Getting PHP to Talk to MySQL
    - Creating an RSS Reader: the Reader
    - MySQL Security Overview
    - Creating the Admin Script for a PHP/MySQL Bl...





    © 2003-2008 by Developer Shed. All rights reserved. DS Cluster 1 hosted by Hostway