MySQL wizardry - Opening the path (Page 2 of 4)
The Wizard took a sheet of paper and drew a table.
"We will go for it manually," he said. "This way, we are going to understand what to ask the database engine to do." It was typical of the Wizard. When he was in this explaining mood, I had better let him talk.
"Let's start with the first row. It's Roma. The employee is male, so we write 1 under M and 0 under F. Then we get the second line. It's again Roma. Which gender is this one? If we have a male, then we are going to add 1 to the value under M, and add a 0 under F, and so on."
| Town | M | F | total |
| Roma | 1+1 | 0+0 | |
He looked at me, as if expecting me to see the light and have a magic understanding of the algorithm he was hinting at. My blank stare must have told him that I was still lost. "Don't you get it? It's simple. We sum to M if the employee is male, and we sum to F if she's a female." He stressed the words sum and if. Then he grabbed my keyboard and modified my previous statement:
mysql> SELECT location, SUM(IF(gender='M',1,0)) AS M, SUM(IF(gender='F',1,0)) AS F
FROM locations INNER JOIN employees USING (loc_code) GROUP BY location;
| location | M | F |
| Boston | 1 | 2 |
| Cagliari | 1 | 2 |
| London | 1 | 1 |
| Manchester | 1 | 1 |
| Marseille | 1 | 2 |
| Milano | 2 | 0 |
| New York | 2 | 2 |
| Paris | 1 | 1 |
| Roma | 3 | 0 |
9 rows in set (0.00 sec)
"So we are telling the engine to do exactly the same thing that we would have done manually. Sum ... if. Only the engine will do it faster."
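As an aside, IF() is a MySQL function. The same "sum ... if" idea can also be written with the standard CASE expression, which should carry over to other database engines. The following is only a sketch, assuming the same locations and employees tables and the same loc_code, location and gender columns used in the statement above:

```sql
-- Same cross-tab as the Wizard's statement, with CASE in place of IF().
-- Each row contributes 1 to the matching column and 0 to the other one.
SELECT location,
       SUM(CASE WHEN gender = 'M' THEN 1 ELSE 0 END) AS M,
       SUM(CASE WHEN gender = 'F' THEN 1 ELSE 0 END) AS F
FROM locations INNER JOIN employees USING (loc_code)
GROUP BY location;
```

On the data shown here it should return the same nine rows as the IF() version.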
I said "Wow!" but my mind was racing to see how this incredibly simple statement could be of help. "What about the total column?" I asked.
"Oh, that. Here you are." And he modified the statement once more:
mysql> SELECT location, SUM(IF(gender='M',1,0)) AS M,
SUM(IF(gender='F',1,0)) AS F, COUNT(*) AS total
FROM locations INNER JOIN employees USING (loc_code)
GROUP BY location;
| location | M | F | total |
| Boston | 1 | 2 | 3 |
| Cagliari | 1 | 2 | 3 |
| London | 1 | 1 | 2 |
| Manchester | 1 | 1 | 2 |
| Marseille | 1 | 2 | 3 |
| Milano | 2 | 0 | 2 |
| New York | 2 | 2 | 4 |
| Paris | 1 | 1 | 2 |
| Roma | 3 | 0 | 3 |
9 rows in set (0.00 sec)
"I don't think I really understand, though," I said. "We need to count, but we are summing up. How come?"
"From the SQL point of view, we are doing the same thing. COUNT of star and SUM of one are the same thing. Try it yourself. Type a 'select COUNT star from employees'."
mysql> SELECT COUNT(*) from employees;
1 row in set (0.00 sec)
"Now replace COUNT of star with SUM of one."
mysql> SELECT SUM(1) from employees;
1 row in set (0.00 sec)
"It's the same!" I said, excited.
"No, actually it's not. COUNT of star is optimized by MySQL: on a MyISAM table, with no WHERE clause, it is read from the table descriptor, without actually counting the records. You can't notice the difference in such a small table. If you had one million records, and you were actually counting by groups, you would see that SUM takes a couple of milliseconds more than COUNT, and I think we can live with that. Notice that we could not use COUNT in our cross-tab, because it would have counted all the rows anyway. Try it."
mysql> SELECT location, COUNT(IF(gender='M',1,0)) AS M,
COUNT(IF(gender='F',1,0)) AS F,
COUNT(*) AS total
FROM locations INNER JOIN employees USING (loc_code)
GROUP BY location;
(warning: gives WRONG results!)
| location | M | F | total |
| Boston | 3 | 3 | 3 |
| Cagliari | 3 | 3 | 3 |
| London | 2 | 2 | 2 |
| Manchester | 2 | 2 | 2 |
| Marseille | 3 | 3 | 3 |
| Milano | 2 | 2 | 2 |
| New York | 4 | 4 | 4 |
| Paris | 2 | 2 | 2 |
| Roma | 3 | 3 | 3 |
9 rows in set (0.00 sec)
"See? That's why we have to sum up, instead of counting. COUNT is a dumb function which will count any piece of junk it finds. SUM has some grace, in its choice."
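A note on why COUNT fails here: COUNT(expr) counts every row for which expr is not NULL, and IF(gender='M',1,0) always yields 1 or 0, never NULL, so every row is counted in every column. The sketch below is a variant rather than the Wizard's own statement. It assumes the same tables and replaces the 0 with NULL, so that COUNT skips the rows that do not match:

```sql
-- COUNT ignores NULL, so only the rows matching each condition are counted.
SELECT location,
       COUNT(IF(gender = 'M', 1, NULL)) AS M,
       COUNT(IF(gender = 'F', 1, NULL)) AS F,
       COUNT(*) AS total
FROM locations INNER JOIN employees USING (loc_code)
GROUP BY location;
```

This should give the same M and F figures as the SUM version. The SUM/IF form remains the more general one, because it can add up any value, not only ones.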
It looked so trivial that I was ashamed of myself for not having found it on my own. But suddenly I saw something that didn't seem right to me. "Here we have a simple case, where we know all the values that will go into the columns. But what should we do if we don't know? What if we want the departments instead?"
The Wizard took a glance at the diagram and typed:
mysql> SELECT dept from departments;
| dept |
| Development |
| Personnel |
| Research |
| Sales |
| Training |
5 rows in set (0.01 sec)
"Yeah. I see," I said, with a hint of disappointment in my voice. "You mean that I have to compose the query manually, entering a SUM/IF statement for each value in departments?"
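One way to avoid typing those expressions by hand is to let the engine write them. The following is only a sketch of that idea, not necessarily the route the Wizard takes. It uses nothing beyond the departments table and its dept column shown above, and it builds one SUM/IF expression per department as a string:

```sql
-- Build the text of one SUM(IF(...)) column per department.
-- Doubled single quotes ('') inside a string literal produce one quote.
SELECT CONCAT('SUM(IF(dept = ''', dept, ''', 1, 0)) AS `', dept, '`') AS column_expr
FROM departments;
```

For the Development row, the generated text is SUM(IF(dept = 'Development', 1, 0)) AS `Development`. The generated lines would still have to be pasted into a full SELECT statement that joins employees to departments, and the column used for that join is not shown in this part of the article.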