Avoiding Duplicate Records in MySQL

In this MySQL database tutorial, you will be learning how to avoid duplicate records using “Primary Key” and the “Insert Ignore” SQL statement.

With such an impressive number of programs using MySQL as the underlying persistence mechanism, it seems that the development of database-driven applications is a fairly straightforward process that can be tackled with little or no hassles and having only a minimal background in database architecture design.

While this concept can be applied to relatively simple cases, the truth is that the process can be pretty challenging (and eventually daunting) when it comes to persisting and maintaining large amounts of data whose integrity and consistency must be preserved in the most extreme conditions. In such situations, not only is it necessary to deal with the intrinsic complexities exposed by the database schema itself, but it’s mandatory to implement mechanisms that ensure the integrity will be maintained over time, regardless of the operations performed by the application(s) that interact with the storage layer.

Of course, there are many factors that can affect (unintentionally or deliberately) the integrity of persisted data, which are way too extensive for being covered in a single tutorial (you can buy a book from the many available out there and dig deeper into the topic). However, a common issue that all database-driven programs must address is record duplication. Even in trivial and banal cases, like a personal blog database, record duplication can be a real nightmare that can end up bringing the efficiency and reliability of an application to its knees.

The good news is that preventing your MySQL database from suffering the side effects of record duplication isn’t as complicated and intimidating as it sounds. What’s more, the problem can be tackled both at application and database-level, with each approach owning its own pros and cons. However, the latter gives you more independency, as the solution isn’t strictly coupled to the language on which the application is seated upon.

In the lines to come I’ll be discussing a few straightforward approaches that you can use, in order to maintain your valuable data.

Having said that, now it’s time to get rid of the preliminaries and start looking into the first two methods that you can utilize for preventing record duplication: the “PRIMARY KEY” clause and the “INSERT IGNORE” SQL statement. Let’s get started!  

Taking the first step: creating a sample MySQL table    

Since my intention here is to showcase some simple – yet effective – methods that can be used for avoiding duplication of records in MySQL databases, the first logical step that must be taken is to create a sample table that permits us to demonstrate the efficiency of the methods in question.

To suit this basic requirement, below I defined a testing table called “users”, which has the following structure. Check it out:

DROP TABLE IF EXISTS `test`.`users`;
CREATE TABLE  `test`.`users` (
  `first_name` varchar(45) default NULL,
  `last_name` varchar(45) default NULL,
  `email` varchar(45) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

As you can see, the above table is comprised of only three columns, which store the first and last name of some users, and their email addresses as well. In addition, you may have noticed that the table uses the MyISAM database engine; if you feel more comfortable with other engines like “InnoDB” (which supports transactions), feel free to amend the previous code sample to fit your needs.

So far, I managed to create this simple table, which in its current state is pretty useless, as it doesn’t contain any meaningful data. Thus, it’s time to fix this issue and add to it some fictional users. The following code bit does exactly that:

INSERT INTO users (first_name, last_name, email) VALUES ('Sandra', 'Smith', 'sandra@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Daniel', 'Norton', 'daniel@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Susan', 'Wilson', 'susan@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Jenny', 'Adams', 'jenny@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Thomas', 'Everson', 'thomas@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Mary', 'Brown', 'mary@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Julie', 'Owen', 'julie@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Matthew', 'Burke', 'matthew@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Alfred', 'Wayne', 'alfred@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Margaret', 'White', 'margaret@domain.com');

At this point things are starting to look a bit more interesting, as our “users” table now has been populated with some sample data. Unfortunately, not all is so pleasant and delightful in this seemingly harmless scenario, as if for some unknown reason the previous set of statements are executed again, each row corresponding to every fictional user will be effectively duplicated.

As noted in the introduction, preventing this from happening is much easier than you might think. In the following segment I’ll be altering the definition of the “users” table, which will make use of the “PRIMARY KEY” clause to avoid any possible record duplication.

{mospagebreak title=MySQL Primary Keys and Insert Ignore Statement}

A quick and dirty approach to prevent record duplication: using the “PRIMARY KEY” clause

One of the simplest approaches that can be used for preventing duplicated records consists of defining some of the fields of the target table (or all of them, if you want to) as primary keys. Considering that a column which is a primary key can have only one value assigned to it, the problem can be solved in a jiffy.

With that being said, let me show you how to implement this approach with the “users” table defined before. Here’s how its enhanced definition looks:

DROP TABLE IF EXISTS `test`.`users`;
CREATE TABLE  `test`.`users` (
  `first_name` varchar(45) NOT NULL,
  `last_name` varchar(45) NOT NULL,
  `email` varchar(45) NOT NULL,
  PRIMARY KEY  (`first_name`,`last_name`,`email`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

As seen above, now the table’s “first_name”, “last_name” and “email” columns have been created as primary keys (even though this may sound rather strange at first, a table can have multiple primary keys), which implies that the values assigned to the columns in question can’t be duplicated.

But what impact does this have in the practice? Well, now it’s possible to run the below SQL snippet any number of times and feel confident that the INSERT statements won’t produce record duplication:

INSERT INTO users (first_name, last_name, email) VALUES ('Sandra', 'Smith', 'sandra@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Daniel', 'Norton', 'daniel@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Susan', 'Wilson', 'susan@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Jenny', 'Adams', 'jenny@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Thomas', 'Everson', 'thomas@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Mary', 'Brown', 'mary@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Julie', 'Owen', 'julie@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Matthew', 'Burke', 'matthew@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Alfred', 'Wayne', 'alfred@domain.com');
INSERT INTO users (first_name, last_name, email) VALUES ('Margaret', 'White', 'margaret@domain.com');

That seems to work like a charm! But before you launch your MySQL client and try out this method, it’s fair to stress that it has a tiny – yet important - catch. If you attempt to assign the same values to the table’s primary keys, they won’t be duplicated at all, but MySQL will issue an error.

In short, we’ve solved one problem, only to face a new one. The question that comes up here is: how can we use our shiny multiple primary keys without getting an error from the database server? This problem can be solved using an “INSERT” statement along with the “IGNORE” clause, which makes MySQL behave silently when attempting to assign multiple values to a primary key.

Avoiding MySQL errors: using the “INSERT IGNORE” statement

As I explained in the preceding segment, it’s really simple to make MySQL discard the errors that it generates when trying to insert multiple values into a column defined as a primary key. In reality, the fix is as easy as performing the insertion process using the INSERT IGNORE statement instead of a common INSERT. As one might expect, the “IGNORE” clause tells the database server to ignore any error raised during the insertion operation(s), something demonstrated by the following code fragment:     

INSERT IGNORE INTO users (first_name, last_name, email) VALUES ('Sandra', 'Smith', 'sandra@domain.com');
INSERT IGNORE INTO users (first_name, last_name, email) VALUES ('Daniel', 'Norton', 'daniel@domain.com');
INSERT IGNORE INTO users (first_name, last_name, email) VALUES ('Susan', 'Wilson', 'susan@domain.com');
INSERT IGNORE INTO users (first_name, last_name, email) VALUES ('Jenny', 'Adams', 'jenny@domain.com');
INSERT IGNORE INTO users (first_name, last_name, email) VALUES ('Thomas', 'Everson', 'thomas@domain.com');
INSERT IGNORE INTO users (first_name, last_name, email) VALUES ('Mary', 'Brown', 'mary@domain.com');
INSERT IGNORE INTO users (first_name, last_name, email) VALUES ('Julie', 'Owen', 'julie@domain.com');
INSERT IGNORE INTO users (first_name, last_name, email) VALUES ('Matthew', 'Burke', 'matthew@domain.com');
INSERT IGNORE INTO users (first_name, last_name, email) VALUES ('Alfred', 'Wayne', 'alfred@domain.com');
INSERT IGNORE INTO users (first_name, last_name, email) VALUES ('Margaret', 'White', 'margaret@domain.com');

If you execute the previous set of commands several times (and assuming that the definition of the “users” table remains the same), no record duplication will occur, and best of all, MySQL won’t issue any errors due to the use of the “IGNORE” clause. In short, we’ve managed to implement a simple method that prevents our tables from having duplicated rows, even at expenses of using multiple primary keys.

It’s valid to note that there exists a few other methods that can be used for fixing record duplication issues - we’ll discuss these in the last part of this tutorial series.

Final thoughts

That’s all for the moment. Over this introductory part of this two-part tutorial, I discussed a couple of approachable methods that can be used for preventing record duplication in a MySQL table. As you just saw, the use of multiple primary keys along with the “INSERT IGNORE” statement addresses this issue in a pretty efficient manner, and most importantly, without making the database server to throw ugly and unexpected errors.

Nonetheless, the above ones aren’t the only approaches that you can utilize for avoiding record duplication at database level. In fact, it’s possible to solve the problem by using the “REPLACE INTO” statement and even the simple “UNIQUE” clause.

However, the fine details concerning the use of these alternative methods will be covered in the final tutorial.

Don’t miss the last part!

[gp-comments width="770" linklove="off" ]
antalya escort bayan antalya escort bayan