MySQL Query Optimizations and Schema Design

Performance is something for which we all strive. This applies to the lives of DBAs too, since their first and foremost task is to achieve a high level of performance from their databases. SQL professionals can’t stress enough that spending quality time to optimize both schema design and queries should be considered top priorities. In this article we are going to cover these two.

This is the third segment of the MySQL Performance Optimization four-part article series. Before we begin, I urge you to go back and check out the earlier parts.

Within the first article, the most important benchmarking concepts and strategies were pointed out. The second segment presented benchmarking applications and other useful tools to apply what we have learned and test our own performance.

As promised, in this third article we’re going to delve into optimizations and present the most popular tips and hints regarding schema design and query optimization. Grasp the concepts first; then you will be able to adapt them to your own needs (i.e., your database requirements).

Right after this part, the final part will cover really useful server tuning techniques and advice. Our purpose is to help you get the most out of your server.

Now that you already know what to expect from this segment, let’s begin. Enjoy!

{mospagebreak title=Schema Design}

The architecture of a database is represented by its schema. Within it you can find the important details of the database’s structure. In the case of relational databases, the schema specifies the tables, the fields within each table, and, of course, the relationships between them.

It makes perfect sense to design a usable schema. However, usability is just one of the factors that DBAs should consider while designing. Of course, the schema must be usable, because otherwise we couldn’t say the database is done. But then comes the other factor, which is just as important: performance. Lots of newcomers to the field of databases settle for a design as soon as they find that their schema is usable. That’s not a wise choice.

As a result, I think we need to consider performance while designing the schema. The performance pointers that we need to account for are related to the way in which queries are executed, disk I/O, the way MySQL and DBMS work (limitations, features, possibilities, etc.), and so forth.

Database normalization is indeed valuable, but there is a general guideline which should be followed. We can sum it up like this: begin with normalization, then de-normalize later on where it pays off. And please, do not take normalization to extremes. Keep things simple, but no simpler than necessary; over-normalizing may complicate the entire database and, in the end, decrease performance.

As I mentioned earlier, think of your queries (at least the ones found in the requirements) when designing your schema. Do not over-size data types when it isn’t necessary. Use smallint, or even tinyint, whenever possible. Don’t just reach for bigint and feel confident that at least it can take whatever you throw at it; the performance cost isn’t worth it.
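To make the idea concrete, here is a minimal sketch (the table and column names are made up for illustration) of sizing columns to fit the data instead of defaulting to large types:

```sql
-- Hypothetical example: columns sized to fit instead of defaulting to BIGINT
CREATE TABLE page_views (
    view_id   INT UNSIGNED NOT NULL AUTO_INCREMENT, -- up to ~4.29 billion rows
    page_id   SMALLINT UNSIGNED NOT NULL,           -- up to 65,535 distinct pages
    status    TINYINT UNSIGNED NOT NULL DEFAULT 0,  -- a handful of status codes (0-255)
    viewed_at DATETIME NOT NULL,
    PRIMARY KEY (view_id)
);
```

Every byte saved per row is multiplied by the row count, and narrower rows mean more rows per buffer page and per disk read.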

Designing a schema requires answering questions. But as always, you need to know which are the “necessary” questions that ought to be answered. A few examples: Where will the data come from? What do the users want to accomplish with the database? What facts should be measured? Are the dimensions going to change over time? Is a family of fact tables needed? And so forth.

DBA experts also recommend storing IPv4 addresses in an int unsigned column rather than as strings. Always think before allocating a data type for a specific requirement: is there a way you could simplify in order to store less?
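MySQL’s built-in INET_ATON() and INET_NTOA() functions handle the conversion between the dotted-quad string and the 4-byte integer (the table name here is hypothetical):

```sql
-- Store the dotted-quad string as a 4-byte integer
CREATE TABLE visitors (
    ip INT UNSIGNED NOT NULL
);

INSERT INTO visitors (ip) VALUES (INET_ATON('192.168.0.1'));  -- stores 3232235521

-- Convert back to the readable form when displaying
SELECT INET_NTOA(ip) FROM visitors;
```

An int unsigned takes 4 bytes, versus up to 15 bytes for the varchar equivalent, and integer comparisons and range scans are cheaper than string ones.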

Last but not least, you need to realize that splitting “large” tables into multiple “smaller” ones can be a cost-effective approach. We consider a table large when it has a lot of rows and/or columns. Rather than creating one huge table with dozens of columns, orient the design around specific needs and divide it into several tables. In the end, you will gain a lot by doing so, in both performance and clarity.

Without getting into technicalities, splitting tables this way improves performance because the buffer pages aren’t filled with data you don’t need (as they would be when the huge table is read in its entirety). Disk I/O is therefore minimized and, as you’d guess, response times improve thanks to reduced seek times.

Then again, you also gain clarity advantages. Imagine troubleshooting that huge table…! I’d prefer to know beforehand, if there’s an issue, from which table it came, and then work only on that.
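A common way to apply this is a vertical split: keep the small, frequently-read columns in the hot table and move bulky, rarely-read ones into a companion table sharing the same key. The tables and columns below are hypothetical:

```sql
-- Hot table: small columns that nearly every query touches
CREATE TABLE users (
    user_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
    username VARCHAR(32) NOT NULL,
    email    VARCHAR(255) NOT NULL,
    PRIMARY KEY (user_id)
);

-- Companion table: bulky columns fetched only on profile pages
CREATE TABLE user_profiles (
    user_id   INT UNSIGNED NOT NULL,  -- same key as users.user_id
    biography TEXT,
    avatar    BLOB,
    PRIMARY KEY (user_id)
);

-- Join in the bulky data only when it is actually needed
SELECT u.username, p.biography
FROM users u
JOIN user_profiles p ON p.user_id = u.user_id
WHERE u.user_id = 42;
```

Most queries now scan short rows, and the TEXT/BLOB data is read from disk only for the handful of queries that ask for it.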

{mospagebreak title=Query Optimizations}

Once you have done your best to design the best schema you can, it’s time to get the most out of your queries. There are a few commands, tips, hints, and pointers you should know about when it comes to query optimization. As always, though, adapt everything to your own needs: these are general guidelines that may or may not help in your particular case.

First, let’s present the explain statement. This is a very useful command that shows you how MySQL would execute a select statement, which helps you judge which queries and indexes improve (or hurt) performance. In a nutshell, explain is a synonym of describe; it returns the query execution plan, and that plan is the starting point for optimization.

It is perhaps one of the most commonly used statements when optimizing queries. The command has a number of output columns and options, so please make sure that you check the official documentation in the MySQL Reference Manual.
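Usage is as simple as prefixing the query (the table and column here are hypothetical):

```sql
EXPLAIN SELECT username
FROM users
WHERE email = 'someone@example.com';
```

In the output, pay particular attention to the type column (ALL means a full table scan), possible_keys versus key (which index, if any, was actually chosen), and rows (roughly how many rows MySQL expects to examine). A full scan on a large table is usually the cue to add an index on the column in the where clause.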

Moving on, priority modifiers can be used to jump the queue, or to push back the execution of a specific statement. The most common cases are select high_priority and insert low_priority. The former makes a select jump ahead of any write statements waiting for the same table, while the latter makes an insert wait until no other clients are reading from the table.
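Here is what the two look like in practice; note that these modifiers only matter with storage engines that use table-level locking, such as MyISAM (the table names are made up):

```sql
-- Jump ahead of queued write statements waiting on the same table
SELECT HIGH_PRIORITY * FROM news WHERE id = 1;

-- Wait until no clients are reading the table before performing the write
INSERT LOW_PRIORITY INTO hits (page_id) VALUES (7);
```

On a read-heavy MyISAM table, low-priority writes keep readers responsive at the cost of the writer potentially waiting a long time.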

Being frugal with disk I/O is also a great habit. How can you do that? Well, one way is by relying on insert delayed where it fits, namely in those situations where you don’t need the inserted data right away. The rows are queued in memory rather than written to disk immediately; later on, batches of queued rows are written together, without additional seeks or I/O activity, whenever the table is not in use by other threads.
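The syntax is just another modifier on the insert; bear in mind that insert delayed works only with certain engines (MyISAM among them), and the queued rows live in memory, so they are lost if the server crashes before they are flushed (the table here is hypothetical):

```sql
-- The client gets control back immediately; the row is queued in memory
-- and flushed in batches when the table is not in use by other threads
INSERT DELAYED INTO access_log (page_id, hit_time)
VALUES (7, NOW());
```

That trade-off is exactly why it suits logging-style data you don’t need to read back right away.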

From time to time you may run into situations where you have to delete a lot of rows from a MySQL table. If so, defragmenting the table afterwards with the optimize table statement may help a bit. After a reasonably large change to the table, check how this command affects the performance of your queries: benchmark before and after, compare the results, and see for yourself (there are no guarantees, though).
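The pattern is simply (table name hypothetical):

```sql
-- Bulk delete leaves holes in the data file...
DELETE FROM access_log WHERE hit_time < '2008-01-01';

-- ...so reclaim the unused space and defragment the table
OPTIMIZE TABLE access_log;
```

Be aware that optimize table locks the table while it runs, so schedule it for a quiet period on busy servers.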

Simplify, simplify, simplify. Break complex permissions down into more numerous but simpler ones; doing so can reduce the overhead of permission checking, since MySQL consults its grant tables for every statement. Speaking of simplicity, please do eliminate unnecessary parentheses from queries (in where clauses, for example). Extra parentheses can help visually, but they do add up (only nano- or milliseconds each, but still).
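For example, one broader grant is cheaper to check than a pile of narrow ones (the database, user, and host below are hypothetical):

```sql
-- Many table-level grants mean more grant-table entries to check per query:
--   GRANT SELECT ON shop.orders    TO 'reporter'@'localhost';
--   GRANT SELECT ON shop.customers TO 'reporter'@'localhost';
--   ...one per table

-- A single database-level grant keeps the permission check simple
GRANT SELECT ON shop.* TO 'reporter'@'localhost';
```

Of course, only consolidate where the broader privilege is actually acceptable from a security standpoint.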

Then again, try to experiment and play around with caches. Don’t forget about the SQL_NO_CACHE modifier, which tells MySQL not to serve or store that particular query through its built-in query cache. Giving explicit caching hints can pay off a great deal if you know what you are doing; in that case, bypassing the automatic behavior is necessary to get the most out of your hints.

However, the above presumes that you have the query cache activated in the first place, which we can’t take for granted. If you have no idea whether it is enabled or not, read the excellent article on query caching published on MySQL’s official website; it covers everything you should know about the feature.
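A quick way to check, and a sketch of the hints in action (the tables are hypothetical):

```sql
-- Is the query cache enabled and how large is it?
SHOW VARIABLES LIKE 'query_cache%';

-- Bypass the cache for a query whose result changes on every run
SELECT SQL_NO_CACHE NOW(), COUNT(*) FROM access_log;

-- Explicitly ask for caching (required when query_cache_type is set to DEMAND)
SELECT SQL_CACHE username FROM users WHERE user_id = 42;
```

With query_cache_type set to DEMAND, nothing is cached unless you say SQL_CACHE, which is the cleanest way to cache only the queries you have chosen deliberately.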

Finally, it’s up to you to identify slow queries and research them in order to alleviate the problem. You should enable MySQL’s slow query log. Once it’s enabled, you will be able to see which queries take too much time just by glancing over the log file. Extra data can be found there too, such as Lock_time, which describes how long the query spent waiting for locks.
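Enabling it is a matter of configuration (the log path below is an example of my choosing):

```sql
-- my.cnf, MySQL 5.0-era option names:
--   log-slow-queries = /var/log/mysql/slow.log
--   long_query_time  = 2

-- On MySQL 5.1 and later it can also be switched on at runtime:
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 2;  -- log anything slower than 2 seconds
```

Start with a generous threshold and tighten it as you clear out the worst offenders.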

As a final suggestion, please look into the mysqldumpslow Perl script that comes bundled with MySQL. This nifty utility reports how often each slow query runs. That is important, because knowing both how long a slow query takes and how many times it runs lets you decide the priorities according to which you will optimize.
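A typical invocation looks like this (the log path matches the example configuration above and is, again, just an example):

```shell
# Top 10 entries sorted by occurrence count (-s c); mysqldumpslow
# abstracts literal values so similar queries are grouped together
mysqldumpslow -s c -t 10 /var/log/mysql/slow.log
```

Sorting by count first, rather than by time, surfaces the queries whose total cost across the day is highest even when each individual run looks harmless.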

Ultimately, queries that put a heavy load (spikes) on the server and run quite often should also be optimized, even if they don’t take much time individually. Under heavy load the server gets overwhelmed and may not be as responsive as expected. Cache such queries: try memcached or caching tables. Oh, and yes, cache objects, not result sets; the former can be re-used, while the latter cannot.

{mospagebreak title=Taking a Break}

That’s right; we’ve arrived at the end of this part. During our short journey we have seen the importance of query optimization and schema design. You see, the most critical distinction in the real world is that performance comes into play, and in order to achieve high-performing servers, responsive queries, and efficient operation, “just all right” is never enough.

The most useful advice regarding database design planning is that we should think as early as possible. We shouldn’t sacrifice the best just for a currently working setup that gives us “acceptable” performance. We need to do our best to get the most out of what we have because later on down the road we (and the management) will be glad we did so. Lots of companies end up struggling, not knowing what to do when it’s too late.

And then, to alleviate the “sudden” panic, they hire DBA experts and professionals, who most of the time recommend a total revamp. Why? Because the foundation on which everything was based wasn’t strong. At the beginning it was decent and satisfying, but as the company’s needs have grown... well, imagine a building becoming larger, taller, and wider, and you get my drift!

The total revamp usually requires a clean design, and this may not be possible without losing resources (money, time, etc.). Altogether, the conclusion is that, regardless of the situation, it is always much more profitable to think big and plan for the future right from the beginning. But don’t exaggerate; be realistic, think through the requirements several times, and do your best. Never settle for less. All will be well.

In closing, I’d like to invite you to join our experienced community of technology professionals in all areas of IT, from software and hardware up to consumer electronics, at the Dev Hardware Forums. Also be sure to check out the community at our sister site, Dev Shed Forums. We are friendly and we’ll do our best to help you.
