If you develop MySQL-driven applications on a frequent basis, you’ll know that steering clear of record duplication issues is a must, especially when it comes to maintaining the integrity and consistency of the data manipulated behind the scenes. While the problem can be the origin of a huge variety of serious issues, it’s possible to solve them in a pretty simple manner.
If you’re a skilled programmer (regardless of the language of your choice) you can tackle the issue at the application level, even though this approach may lead you to write several chunks of boilerplate code, breaking down the commandment stated by the DRY (Don’t Repeat Yourself) principle. On the other hand, it’s possible to resolve the problem at the database level as well, even at the expense of dealing with the complexities of database architecture design.
While each method has its own pros and cons, in this tutorial I decided to discuss the latter, which in my opinion is slightly more efficient, as it’s not coupled to any particular programming language. In line with this concept, in the introductory chapter I demonstrated how easy it is to prevent duplicates in MySQL tables by using a simple combination of multiple primary keys along with the “INSERT IGNORE” statement.
If you missed that article or need a refresher, you can view it here:
The implementation of this method is reduced to defining one or more columns of the target table as primary keys (which actually avoids the record duplication), and inserting new rows by appending the “IGNORE” clause to a regular “INSERT”, something that makes MySQL discard any errors that might arise during the insertion process.
There’s a number of occasions when you won’t want to have multiple primary keys coexisting in the same table, which would make the previous method useless. But fear not, as there’s a couple of additional approaches that can be used to avoid duplicate rows without having to utilize several primary keys. The first of these methods bases its functionality on the “REPLACE INTO” statement, while the second relies on the “UNIQUE” clause to assure that no row duplication will occur.
In this last part of the tutorial series, I’ll be taking a close look at the aforementioned approaches, so that you can put them to work.
Review Time: Preventing Record Duplication Using Multiple Primary Keys and the “INSERT IGNORE” Statement
Tt’s really simple to prevent duplicated records by defining multiple primary keys in the target table. If this concept sounds somewhat confusing to you, be sure to look at the following code snippet, which applies this approach to a trivial MySQL table that stores information about some hypothetical users:
As the table’ structure shows, the “first_name”, “last_name” and “email” columns have been defined as primary keys, which makes it impossible to assign duplicate values to them. This works fine as long as you run multiple INSERTS only once, like the ones shown below:
However, if for some reason you try to execute the same bunch of statements again, you’ll get an error from MySQL, which is the expected behavior, but not the desired one. It’s feasible, though, to work around this “issue” and perform the corresponding insertions in the following way:
Since the “IGNORE” clause does exactly what its name suggests - to make the server ignore the errors caused when attempting to duplicate rows - the previous set of INSERTS won’t cause any complaints.
At this point you should know how to use primary keys in conjunction with the functionality of “IGNORE” in order to avoid row duplication issues. But, as I expressed at the beginning, MySQL provides a few other methods that allow you to achieve the same result. Moreover, if you feel rather uncomfortable using multiple primary keys, then you may want to take a close look at the “REPLACE” statement, which permits you to replace old records with new ones.
For obvious reasons, “REPLACE” can be used for preventing row duplication without having to alter the definition of the target table. In the upcoming section I’ll be showing you how to utilize this statement with our previous “users” table.
Replacing Old Records with New Ones: Introducing the “REPLACE” Statement
Although the idea is merely pragmatic, you should think of the “REPLACE” statement as a sort of counterpart for “INSERT IGNORE”. While the latter will disregard any attempt to assign duplicate values to columns in a table, the former will replace old values with new ones. Obviously, this comes in handy for avoiding duplicates without having to deal with multiple primary keys.
To explain the functionality of the “REPLACE” command, below I modified the initial definition of the “users” table, which looks like this:
Now the table doesn’t specify any primary keys. However, if you try to populate it by using the “REPLACE” statement, this could be done using a batch of commands similar to the ones shown in the following code sample:
As one might expect, the first time the above script is executed, REPLACE will behave like a common INSERT. However, if the script is run several times, the old values will be replaced by the new ones, even if they are identical, in this way avoiding duplicates. In summary, we’ve managed to implement a simple –yet effective - method for preventing record duplication, based only on the functionality provided by the “REPLACE” statement.
Now that you grasped the driving logic of the earlier approach, it’s time to move on and discuss the last one of this roundtrip, which in my personal opinion has the simplest implementation. As you may have heard, MySQL provides the “UNIQUE” clause, which permits you to specify that a column must be unique.
In the next section I’ll be demonstrating how to utilize the “UNIQUE” clause with the already familiar “users” table.
Covering a Fnal Mthod: Uing the “UNIQUE” Cause
MySQL makes it possible to specify that the value assigned to a column (or columns) must be unique. This can be easily done with the “UNIQUE” clause, whose use in pretty similar to its cousin “PRIMARY KEY”.
For you to more easily grasp how to utilize the functionality provided by “UNIQUE”, below I redefined the sample “users” table, whose structure now looks as follows:
In the previous example, this approach has been applied to all the columns of the target table, but logically it’s possible to be more selective and use “UNIQUE” with only a few.
And now that you grabbed the underlying logic of this final example, at this point you should have a pretty clear idea on how to prevent record duplication in MySQL tables. In this particular case, I showcased fourth different methods that permit to accomplish this in a pretty easy fashion, but surely you’ll be aware of some others that haven’t been covered here for the sake of brevity.
We’ve come to the end of this tutorial. As you just saw, preventing your MySQL tables from suffering duplicate issues is a fairly straightforward process that can be achieved with minor hassles, at least at the database level. However, if you feel more comfortable tackling the problem at the application level, there’s nothing wrong with using that approach, especially if your programs must switch over different RDBMS.
See you in the next MySQL tutorial!
blog comments powered by Disqus