MySQL wizardry
"I called in the Wizard on a Friday evening. It was almost six p.m. and
I was afraid he had already left for the day. Instead, he answered at
the first ring. Wizards never go home early. He recognized me and asked
how he could help. I told him. He listened patiently, without asking silly
questions in between, and finally said 'I think I could provide you with
some useful tool. See you in half an hour' and hung up."
Cross tabulations are
statistical reports where you de-normalize your data and show results grouped by
one field, having one column for each distinct value of a second
field.
Basic problem definition. Starting from a list of values, we want to group them
by field A and create a column for each distinct value of field B.
The desired result is a table with one column for field A, several columns (one
for each distinct value of field B), and a total column.
According to some authoritative sources (Joe Celko, "SQL for Smarties"), we
should use specialized (and expensive) statistical tools to achieve this purpose
with a database server.
My recent experience with MySQL has shown that you don't have to invest a
fortune to have server-side cross-tabulations. The way I found the solution was
littered with errors and disappointment, and in retrospect it would make for
quite a boring story. This is the chronicle of how I would have liked to find a
solution.
A call for help - Defining the problem
I called in the Wizard on a Friday evening. It was almost six p.m. and I was
afraid he had already left for the day. Instead, he answered at the first ring.
Wizards never go home early.
He recognized me and asked how he could help. I told him. He listened patiently,
without asking silly questions in between, and finally said "I think I could
provide you with some useful tool. See you in half an hour" and hung up.
Twenty-five minutes later he was in my office, sitting in front
of me and mentioning coffee. Then he started to ask questions.
"So, what
exactly do you need to do?"
"I need to get a cross-tab out of my database."
"How did you solve the problem so far?"
"In the beginning, I used to
export my data to a spreadsheet, and let the users do the work. You know,
nowadays these spreadsheets can do almost anything, provided that you are smart
enough. Therefore, I gave my users the possibility of exporting records to their
favorite application so that they could twist the data as they liked."
"And
why can't you do it anymore?"
"We're growing, you know. When I started the
database, we had just a few hundred records, and everything worked just fine.
But then we hit the market really hard, and before I realized it, we had more
than ninety thousand records going down to the spreadsheets. The users
complained that it was slow and asked me to integrate the x-tab into the main
application. They say that, after all, these nice desktop databases that I don't
want to mention -- the Wizard nodded approvingly -- have such features, and my
application should have them as well."
"Then you did
what they asked for. You created a function to integrate the cross-tabulation
into your application." He was reading me like an open book.
"Yes, of
course I did it, even though it was not easy. To minimize the amount of data
transferred from the server to the client I was only dealing with summarized
data, you know, GROUP BY two fields. And then I found a way of translating the
values of the second field into columns and summing up the data."
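
As a rough illustration of the "GROUP BY two fields" step, the summarized query could look something like the sketch below. It borrows the employees and locations tables and the loc_code join column from the example query later in this article. The alias how_many is made up for illustration, and the narrator's real query is not shown in the text.

```sql
-- Sketch only: one row per (location, gender) pair, with its head count.
-- Table and column names come from the example later in the article;
-- how_many is a hypothetical alias.
SELECT location, gender, COUNT(*) AS how_many
FROM locations INNER JOIN employees USING (loc_code)
GROUP BY location, gender;
```

A pair with no matching employees produces no row at all (with the sample data shown later, there is no row for women in Roma), so client code that turns gender values into columns has to fill in those zeros itself.
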
The Wizard
looked at his coffee mug for a long time and said "You seem to have solved your
problem then. Haven't you?"
"Well, not exactly. Dealing with a large matrix
of data is not what the C language seems to be made for. I mean, it is, but I
can't cope with it as well as I can work out a few spreadsheet macros. And the
algorithm has to take care of peculiar cases where the server does not return a
value for the second field. Then the management wanted to have the cross-tab
broken down by additional rows and columns, and they want such reports to be
available in the main application by Monday, and I don't think my algorithm
could cope with such a request. This is why I asked for your help."
I looked at him expectantly. I knew that now he could do one of two things: he
could either say that the problem was trivial and uninteresting and leave me in
the lurch, or he could ask to see the algorithm, tell me what was wrong with it,
and give me a simple and efficient solution that would make my application
scalable enough to solve the darn problem. He had done that before for me. He
had looked at my application, found the reason for inefficiency, suggested a
simpler approach, and left me happy, wiser and puzzled.
That day,
though, he did none of that. He didn't want to see the code at all. Instead, he
asked for a refill of coffee and told me "Give me a tour of your company." I
understood what he wanted. He wasn't looking for a walk among the desks, but he
wanted to see the data. I explained that I could not show him the real data,
since my boss was really concerned about the competition learning what we are up
to, but happily I had some dummy data that I was using with our real structure
when I was testing the database, and I showed him that.
"This is the
design of the main tables in the DB," I explained. "We have an employees table,
related by code to departments and locations, which is in turn related to
countries. The sales table has references to the employees and categories
tables. Only employees belonging to the Sales department can be involved in
sales, but they can sell anything, from software to services to education."
The Wizard studied the diagram for a
few minutes, nodding from time to time, as if recognizing an old friend. Then he
looked at me and said "OK. Let's do some cross tabulation. Do you have anything
especially urgent?"
"As a matter of fact, I have more than one specific
urgent job, but if you don't mind, I would like to see the solution of a simple
one, so that I can work it out on a more complex one."
"It's fine with me," he said. "Show me your case."
"OK. I will query two tables,
employees and locations, and get the list of employees with the town where they
work."
I opened an xterm on my Linux box and connected to MySQL. There I entered:
mysql> SELECT name, gender, location FROM
locations INNER JOIN employees USING (loc_code);
+-----------+--------+------------+
| name      | gender | location   |
+-----------+--------+------------+
| Luigi     | M      | Roma       |
| Mario     | M      | Roma       |
| Fred      | M      | Milano     |
| Cinzia    | F      | Cagliari   |
| Marco     | M      | Cagliari   |
| Jim       | M      | Roma       |
| John      | M      | Milano     |
| Sue       | F      | Cagliari   |
| Maria     | F      | Paris      |
| Giselle   | F      | Marseille  |
| Sonia     | F      | Marseille  |
| Jacques   | M      | Marseille  |
| Paul      | M      | Paris      |
| Jennifer  | F      | Manchester |
| Julie     | F      | New York   |
| Christine | F      | London     |
| Don       | M      | London     |
| Sam       | M      | Manchester |
| Colette   | F      | New York   |
| Connie    | F      | Boston     |
| Guy       | M      | Boston     |
| Steve     | M      | New York   |
| Antonio   | M      | New York   |
| Nina      | F      | Boston     |
+-----------+--------+------------+
24 rows in set (0.00 sec)
"What I would like to have," I explained, "is a row for each town, with a column
for each gender and a total column."
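
For reference, one widely used way to get this shape directly from the server is conditional aggregation: one SUM over a CASE expression per gender, plus COUNT(*) for the total. The sketch below is a generic illustration and not necessarily the approach the Wizard ends up proposing. It assumes gender only ever holds 'M' or 'F', as in the sample output, and the column aliases M, F and total are made up for illustration.

```sql
-- Sketch only: one row per town, one column per gender, plus a total.
-- Assumes gender is always 'M' or 'F'; the aliases are hypothetical.
SELECT location,
       SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS M,
       SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) AS F,
       COUNT(*) AS total
FROM locations INNER JOIN employees USING (loc_code)
GROUP BY location;
```

With the sample rows above, Roma would come out with 3 under M, 0 under F and a total of 3. Each additional distinct value in the second field needs its own SUM(...) column, which is the part that becomes tedious to write by hand.
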