MySQL Query Optimizations and Schema Design
Schema Design - MySQL
Performance is something for which we all strive. This applies to the lives of DBAs too, since their first and foremost task is to achieve a high level of performance from their databases. SQL professionals can't stress enough that spending quality time optimizing both schema design and queries should be a top priority. In this article we are going to cover both.
The architecture of a database is represented by the schema, which describes how the database is structured. In the case of relational databases, the schema contains the tables, the fields specified within each table, and, of course, the relationships between them.
It makes perfect sense to design a useful schema. However, usability is just one of the factors that DBAs should look to while designing. Of course, it must be usable because otherwise we couldn't say that the database is done. But then comes the other factor, which is just as important: performance. Lots of newcomers to the field of databases settle for a design as soon as they find out that their schema is usable. That's not a wise choice.
As a result, I think we need to consider performance while designing the schema. The performance pointers that we need to account for are related to the way in which queries are executed, disk I/O, the way MySQL and DBMSs in general work (limitations, features, possibilities, etc.), and so forth.
Database normalization is indeed amazing, but there is a general guideline which should be followed. We can sum it up like this: begin with normalization and then de-normalize later on, where it pays off. Oh, and please, do not take the process of normalization to extremes. Keep things simple, but no simpler than necessary. Over-normalizing may over-complicate the entire database and, in the end, decrease performance, since queries need more joins to put the data back together.
As I mentioned earlier, think of your queries (at least the ones that can be found within the requirements) when designing your schema. Do not pick larger data types than you need. Use smallint or heck, even tinyint, whenever possible. Don't just jump right in with bigint and feel confident that at least it can take whatever you can give it. Every row pays for the extra bytes in storage, memory, and I/O, and that performance hit isn't worth it.
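To make the data type advice concrete, here is a minimal sketch that picks the narrowest MySQL integer type for an expected range of values. The storage sizes in the table are MySQL's documented ones; the helper name `smallest_int_type` is hypothetical and exists only for this illustration.

```python
# Storage size in bytes of MySQL's integer types, narrowest first.
MYSQL_INT_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("MEDIUMINT", 3),
    ("INT", 4),
    ("BIGINT", 8),
]


def smallest_int_type(min_value, max_value):
    """Return (type_name, bytes_per_value) for the narrowest type covering the range.

    Hypothetical helper for illustration; it is not part of MySQL or any library.
    """
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    for name, size in MYSQL_INT_TYPES:
        bits = size * 8
        if min_value >= 0:
            # Non-negative data can use the UNSIGNED variant, doubling the upper bound.
            if max_value <= 2**bits - 1:
                return name + " UNSIGNED", size
        elif min_value >= -(2 ** (bits - 1)) and max_value <= 2 ** (bits - 1) - 1:
            return name, size
    raise ValueError("range does not fit in any MySQL integer type")


if __name__ == "__main__":
    for low, high in [(0, 150), (0, 70000), (-1, 40000), (0, 5_000_000_000)]:
        print((low, high), smallest_int_type(low, high))
```

As a rough feel for the stakes: a column stored as bigint (8 bytes) instead of smallint (2 bytes) spends 6 extra bytes per row, which is about 60 MB of raw column data across 10 million rows, before any index on that column is counted.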
Designing a schema requires answering questions. But as always, you need to know which ones are the "necessary" questions that ought to be answered. A few examples: Where will the data come from? What do the users want to accomplish with the database? What facts should be measured? Are the dimensions going to change over time? Is a family of fact tables needed? And so forth. For more, check this guide.
Experienced DBAs also recommend storing IPv4 addresses as int unsigned rather than as strings, since an IPv4 address is just a 32-bit number and fits in 4 bytes. Always think before choosing a data type for a specific requirement: is there a way you could simplify in order to store less?
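The IPv4 conversion can be sketched in a few lines using Python's standard `ipaddress` module. The function names `ipv4_to_int` and `int_to_ipv4` are hypothetical; on the MySQL side, the built-in `INET_ATON()` and `INET_NTOA()` functions perform the same conversion in both directions.

```python
import ipaddress


def ipv4_to_int(address):
    """Convert a dotted-quad IPv4 string to the 32-bit integer you would store."""
    return int(ipaddress.IPv4Address(address))


def int_to_ipv4(value):
    """Convert the stored 32-bit integer back to a dotted-quad string."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    stored = ipv4_to_int("192.168.0.1")
    print(stored)               # 3232235521
    print(int_to_ipv4(stored))  # 192.168.0.1
```

The dotted form needs up to 15 characters as a string, while the integer form always takes 4 bytes and keeps range comparisons numeric.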
Last but not least, you need to realize that splitting "large" tables into multiple "smaller" ones is a cost-effective technique. We consider a table large when it has a lot of rows and/or columns. Rather than creating one huge table with dozens of columns and a massive number of rows, group the data by how it is actually used and divide it into several tables. In the end, you will gain in both performance and clarity by doing so.
Without getting into technicalities, splitting tables this way can improve performance because the buffer pages aren't filled with data a query doesn't need (as happens when the huge table is read in its entirety). Disk I/O is reduced as a result, and as you'd guess, response times tend to improve, partly due to reduced seek times.
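A small sketch of such a split follows, under some loud assumptions: the table and column names (`users`, `user_profiles`, and so on) are hypothetical, and Python's built-in `sqlite3` module stands in for MySQL only so that the example runs anywhere. The same layout, a narrow frequently-read table plus a wide rarely-read one sharing a primary key, carries over to MySQL.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Narrow table: the columns almost every request reads.
    CREATE TABLE users (
        id         INTEGER PRIMARY KEY,
        username   TEXT NOT NULL,
        last_login TEXT
    );
    -- Wide table: bulky columns that only a few pages need.
    CREATE TABLE user_profiles (
        user_id     INTEGER PRIMARY KEY REFERENCES users(id),
        biography   TEXT,
        preferences TEXT
    );
""")
conn.execute("INSERT INTO users VALUES (?, ?, ?)", (1, "alice", "2024-01-01"))
conn.execute(
    "INSERT INTO user_profiles VALUES (?, ?, ?)",
    (1, "A long biography ...", '{"theme": "dark"}'),
)


def hot_lookup(connection, user_id):
    """Frequent query: touches only the narrow table."""
    return connection.execute(
        "SELECT username, last_login FROM users WHERE id = ?", (user_id,)
    ).fetchone()


def full_profile(connection, user_id):
    """Rare query: joins the wide table back in when it is really needed."""
    return connection.execute(
        "SELECT u.username, p.biography, p.preferences "
        "FROM users AS u JOIN user_profiles AS p ON p.user_id = u.id "
        "WHERE u.id = ?",
        (user_id,),
    ).fetchone()


if __name__ == "__main__":
    print(hot_lookup(conn, 1))
    print(full_profile(conn, 1))
```

Whether a split like this pays off depends on the access pattern: if most queries end up needing the join anyway, the extra table only adds cost, so it is worth checking against the real workload.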
Then again, you also gain clarity advantages. Imagine troubleshooting that huge table...! I'd prefer to know beforehand, if there's an issue, from which table it came, and then work only on that.