Using Relevance Rankings for Full Text and Boolean Searches with MySQL

If you’re a web developer who is searching for a step-by-step guide on how to quickly implement full text and Boolean searches with MySQL, then look no further. This group of articles might be what you need. Welcome to the second tutorial of the series that began with "Performing Full Text and Boolean Searches with MySQL."

Introduction

As the above title claims, this series shows you the key points for speeding up the execution of traditional SELECT statements by using the powerful capabilities of full text indexes and Boolean operators. These operators are fully supported by the newest versions of the popular open-source MySQL database server.

Now that you know what this series of articles is about, let me briefly explain how full text and Boolean searches work, in case you don’t yet have a solid background in these helpful features. It’s worth keeping in mind that they’re present not only in MySQL, but in most production-level database systems, like Oracle and Microsoft SQL Server.

Basically, implementing a full text index for a specific database table requires the FULLTEXT MySQL keyword, which specifies that one or more fields of that table will support this feature. When a search query is run against the indexed fields, all the search terms in the query that are three or fewer characters long (with the default server settings) are discarded from it, along with very common words. This process is commonly known as "noise word removal."
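To make the word-length rule more tangible, here is a minimal PHP sketch that keeps only the terms long enough to be indexed. It assumes the default MyISAM setting ft_min_word_len = 4; the function name keepIndexableTerms() is hypothetical, and the sketch deliberately ignores MySQL's separate stopword list, so treat it as an approximation of what the server does rather than a copy of it.

```php
<?php
/**
 * Rough illustration of the minimum-word-length rule: keep only the terms
 * that are at least $minLength characters long. The default of 4 mirrors
 * the default ft_min_word_len; MySQL's stopword list is not modeled here.
 */
function keepIndexableTerms(string $query, int $minLength = 4): array
{
    // Split on anything that is not a letter, a digit or an underscore.
    $terms = preg_split('/[^\p{L}\p{N}_]+/u', $query, -1, PREG_SPLIT_NO_EMPTY);

    // strlen() counts bytes, which equals characters for plain ASCII terms.
    // array_values() reindexes the array after filtering.
    return array_values(array_filter($terms, function ($term) use ($minLength) {
        return strlen($term) >= $minLength;
    }));
}
```

Calling keepIndexableTerms('PHP is a great language') returns only "great" and "language", since the other three terms are shorter than four characters.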

Logically, as you may have guessed, defining these types of indexes for a given MySQL database table helps to improve the performance of large search queries. As an additional benefit, when the indexes in question are used in conjunction with a MATCH ... AGAINST expression, each row in the results comes with a relevance ranking computed from the search terms in the query.

It’s also important to mention that full text indexes support Boolean operators, including the plus (+) and minus (-) signs, among others. This makes it very convenient to specify which search terms must be present, and which ones must be absent, when performing a search query against a given database table.

Of course, discussing the numerous benefits in using full text and Boolean searches with MySQL and only covering the theoretical aspect of the subject is a rather pointless process. Therefore in this second part of the series I’m going to demonstrate with several easy-to-grasp examples how to work with relevance rankings, in this way diving a bit deeper into this exciting database-related terrain.

Now, let’s get rid of the preliminaries and keep learning more about using full-text and Boolean searches in MySQL. It’s going to be instructive, trust me!

Developing a basic MySQL-driven search engine

To illustrate clearly how to retrieve different relevance rankings from MySQL when performing a full-text search against a specified database table, I’m going to use the same search engine that was built in the first article of the series. As you probably recall, it was composed of two simple source files.

The first file was responsible for displaying the web form used to enter search strings. The second one was tasked with executing the actual full-text queries against a sample "USERS" database table.

Naturally, in this case I’m going to modify slightly the SELECT statement that returns the corresponding database results to handle the aforementioned relevance rankings. However, as you’ll see for yourself in the next few lines, the rest of the search application will remain nearly the same.

Having explained how this practical example will be developed, I’m going to create the mentioned "USERS" database table by specifying the corresponding full-text indexes for it. This simple process is demonstrated by the SQL statement below:

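-- MyISAM is assumed here: it is the storage engine whose full-text
-- indexes apply the 50% threshold discussed later in this article
CREATE TABLE users
(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  firstname VARCHAR(64),
  lastname VARCHAR(64),
  email VARCHAR(64),
  comments TEXT,
  FULLTEXT(firstname,lastname,comments)
) ENGINE=MyISAM;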

As you can see, the definition of the table specifies that three of its fields, namely "firstname," "lastname" and "comments," are covered by a single full-text index, created via the FULLTEXT keyword that you learned in the previous tutorial of the series. So far, so good, right?

Now, the next step consists of populating the above table with some trivial records, like the ones shown below:

("users" database table)

id  firstname  lastname  email                 comments

1   Alejandro  Gervasio  alejandro@domain.com  MySQL is great for building a search engine
2   John       Williams  john@domain.com       PHP is a server side scripting language
3   Susan      Norton    susan@domain.com      JavaScript is good to manipulate documents
4   Julie      Wilson    julie@domain.com      MySQL is the best open source database server
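If you'd rather load these rows from PHP than type them by hand, the following sketch is one way to do it. It assumes the PDO extension with the MySQL driver is available, which is not what the rest of this article uses; the function name insertUsers() and the connection details in the usage note are hypothetical.

```php
<?php
// Sample rows taken from the "users" table shown above
// (the id column is filled in by AUTO_INCREMENT).
$users = [
    ['Alejandro', 'Gervasio', 'alejandro@domain.com', 'MySQL is great for building a search engine'],
    ['John', 'Williams', 'john@domain.com', 'PHP is a server side scripting language'],
    ['Susan', 'Norton', 'susan@domain.com', 'JavaScript is good to manipulate documents'],
    ['Julie', 'Wilson', 'julie@domain.com', 'MySQL is the best open source database server'],
];

/**
 * Insert the given rows through one prepared statement.
 * $pdo is assumed to be an open PDO connection to the database
 * that holds the "users" table. Returns the number of inserted rows.
 */
function insertUsers(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO users (firstname, lastname, email, comments) VALUES (?, ?, ?, ?)'
    );
    $inserted = 0;
    foreach ($rows as $row) {
        // Each row supplies the four positional parameters in order.
        $stmt->execute($row);
        $inserted += $stmt->rowCount();
    }
    return $inserted;
}
```

With a connection such as new PDO('mysql:host=localhost;dbname=test', 'user', 'password') (placeholder credentials), calling insertUsers($pdo, $users) should load all four sample rows.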

Having inserted some sample data into the previous database table, it’s time to show the source code of the two files that make up this MySQL-driven search engine. These files look like this:

(definition of form.htm file)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Working with relevance results</title>
<style type="text/css">
body{
  padding: 0;
  margin: 0;
  background: #fff;
}

h1{
  font: bold 16px Arial, Helvetica, sans-serif;
  color: #000;
  text-align: center;
}

p{
  font: bold 11px Tahoma, Arial, Helvetica, sans-serif;
  color: #000;
}

#formcontainer{
  width: 40%;
  padding: 10px;
  margin-left: auto;
  margin-right: auto;
  background: #6cf;
}
</style>
</head>
<body>
  <h1>Working with relevance results</h1>
  <div id="formcontainer">
    <form action="search.php" method="get">
      <p>Enter search term here : <input type="text" name="searchterm" title="Enter search term here" /><input type="submit" name="search" value="Search Now!" /></p>
    </form>
  </div>
</body>
</html>

(definition of search.php file)

<?php
// Note: the mysql_* functions used below belong to the original MySQL
// extension, which was removed in PHP 7; newer versions need mysqli or PDO.

// define 'MySQL' class
class MySQL{
  private $conId;
  private $host;
  private $user;
  private $password;
  private $database;
  private $result;
  const OPTIONS=4;

  public function __construct($options=array()){
    if(count($options)!=self::OPTIONS){
      throw new Exception('Invalid number of connection parameters');
    }
    foreach($options as $parameter=>$value){
      if(!$value){
        throw new Exception('Invalid parameter '.$parameter);
      }
      $this->{$parameter}=$value;
    }
    $this->connectDB();
  }

  // connect to MySQL
  private function connectDB(){
    if(!$this->conId=mysql_connect($this->host,$this->user,$this->password)){
      throw new Exception('Error connecting to the server');
    }
    if(!mysql_select_db($this->database,$this->conId)){
      throw new Exception('Error selecting database');
    }
  }

  // expose the connection identifier (the property itself is private,
  // so the 'Result' class cannot read it directly)
  public function getConId(){
    return $this->conId;
  }

  // run query
  public function query($query){
    if(!$this->result=mysql_query($query,$this->conId)){
      throw new Exception('Error performing query '.$query);
    }
    return new Result($this,$this->result);
  }

  // escape a value for use inside a quoted SQL string
  public function escapeString($value){
    return mysql_real_escape_string($value,$this->conId);
  }
}

// define 'Result' class
class Result {
  private $mysql;
  private $result;

  public function __construct($mysql,$result){
    $this->mysql=$mysql;
    $this->result=$result;
  }

  // fetch row
  public function fetchRow(){
    return mysql_fetch_assoc($this->result);
  }

  // count rows
  public function countRows(){
    if(!$rows=mysql_num_rows($this->result)){
      return false;
    }
    return $rows;
  }

  // count affected rows
  public function countAffectedRows(){
    if(!$rows=mysql_affected_rows($this->mysql->getConId())){
      throw new Exception('Error counting affected rows');
    }
    return $rows;
  }

  // get ID from last-inserted row
  public function getInsertID(){
    if(!$id=mysql_insert_id($this->mysql->getConId())){
      throw new Exception('Error getting ID');
    }
    return $id;
  }

  // seek row
  public function seekRow($row=0){
    if(!is_int($row)||$row<0){
      throw new Exception('Invalid result set offset');
    }
    if(!mysql_data_seek($this->result,$row)){
      throw new Exception('Error seeking data');
    }
  }
}

try{
  // connect to MySQL
  $db=new MySQL(array('host'=>'host','user'=>'user','password'=>'password','database'=>'database'));
  // read the search term (empty string if the form was not submitted)
  $searchterm=$db->escapeString(isset($_GET['searchterm'])?$_GET['searchterm']:'');
  // retrieve each user along with a relevance ranking
  $result=$db->query("SELECT firstname, MATCH(firstname,lastname,comments) AGAINST('$searchterm') AS relevance FROM users");
  if(!$result->countRows()){
    echo 'No results were found.';
  }
  else{
    echo '<h2>Users returned are the following:</h2>';
    while($row=$result->fetchRow()){
      echo '<p>Name: '.$row['firstname'].' Relevance: '.$row['relevance'].'</p>';
    }
  }
}
catch(Exception $e){
  echo $e->getMessage();
  exit();
}
?>

Despite the rather lengthy listing of the last PHP file, you should pay particular attention to the way that the search query has been constructed:

$result=$db->query("SELECT firstname, MATCH(firstname,lastname,comments)
AGAINST('$searchterm') AS relevance FROM users");

In this case, I used the already familiar MATCH and AGAINST keywords (covered in the preceding article of the series) to return a relevance ranking for every row of the sample "USERS" table, computed from the search terms entered in the search form. However, this ranking will be better understood if I show you some results produced by the previous PHP file for different search terms.

That being said, here are the corresponding database results:

// displays the following when entering the 'Alejandro' search term
/*
Users returned are the following:
Name: Alejandro Relevance: 1.0167628961849
Name: John Relevance: 0
Name: Susan Relevance: 0
Name: Julie Relevance: 0
*/

// displays the following when entering the 'Susan' search term
/*
Users returned are the following:
Name: Alejandro Relevance: 0
Name: John Relevance: 0
Name: Susan Relevance: 1.0277009445163
Name: Julie Relevance: 0
*/

// displays the following when entering the 'John' search term
/*
Users returned are the following:
Name: Alejandro Relevance: 0
Name: John Relevance: 1.0277009445163
Name: Susan Relevance: 0
Name: Julie Relevance: 0
*/

// displays the following when entering the 'Julie' search term
/*
Users returned are the following:
Name: Alejandro Relevance: 0
Name: John Relevance: 0
Name: Susan Relevance: 0
Name: Julie Relevance: 1.0167628961849
*/

As you can see, the examples above show how to retrieve relevance rankings for different search terms entered in the search form. The ranking is a non-negative floating-point number: it’s zero for the rows that don’t match, and a positive value that varies with the search string for the rows that do. Quite simple, right?
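In a real search page you would usually want only the matching rows, with the best match first. The sketch below shows one way to get that, under a couple of assumptions: it uses PDO prepared statements instead of the MySQL class from this article, and the names RANKED_SEARCH_SQL and rankedSearch() are hypothetical. Repeating the MATCH() expression in the WHERE clause drops the rows whose relevance is zero, and ORDER BY sorts the rest.

```php
<?php
// Matching rows only, best match first. The search term is bound under two
// different names because a named placeholder cannot be reused when PDO's
// emulated prepares are turned off.
const RANKED_SEARCH_SQL =
    'SELECT firstname,
            MATCH(firstname, lastname, comments) AGAINST(:term_select) AS relevance
       FROM users
      WHERE MATCH(firstname, lastname, comments) AGAINST(:term_where)
      ORDER BY relevance DESC';

/**
 * $pdo is assumed to be an open PDO connection to the database that holds
 * the "users" table. Returns rows shaped like
 * ['firstname' => ..., 'relevance' => ...].
 */
function rankedSearch(PDO $pdo, string $term): array
{
    $stmt = $pdo->prepare(RANKED_SEARCH_SQL);
    $stmt->execute([':term_select' => $term, ':term_where' => $term]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Binding the term as a parameter also removes the need to escape it by hand. According to the MySQL manual, the optimizer notices that the two MATCH() calls are identical and runs the full-text search only once, so the repetition should not cost a second search.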

Okay, at this point I believe that the previous results should give you a better idea of how to return relevance values using full-text searches. So what is the next step that must be taken on this educational journey?

Well, since I assume that you’re interested in learning a bit more about how MySQL handles relevance rankings, in the following section I’m going to show you a concrete example that illustrates how to work with the so-called "50%" threshold.

Does this sound complex to you? Fear not, since it’s much simpler than you think! Just keep reading to learn more on this topic.

Determining the 50 percent threshold

As I stated in the section that you just read, it’s important to know how MySQL handles different relevance rankings. This leads me straight into introducing the concept of a feature called the 50% threshold.

Basically, this means that if a search word is present in 50 percent or more (hence its name) of the table rows searched, MySQL considers it too common to be useful and treats it like a noise word: the word simply doesn’t match, so it contributes nothing to the relevance of any row.

Taken together, the 50% threshold and the elimination of noise words give you a clear idea of how MySQL discards search terms with low relevance from the very beginning, which noticeably speeds up the execution of search queries.
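The arithmetic behind the threshold can be sketched in a few lines of PHP. This is only a simplified model of what the server does for MyISAM full-text indexes in natural language mode: the function name isTooCommon() is hypothetical, it matches whole words case-insensitively, and it looks only at the "comments" values of the four sample rows.

```php
<?php
// "comments" column of the four sample rows in the "users" table.
$comments = [
    'MySQL is great for building a search engine',
    'PHP is a server side scripting language',
    'JavaScript is good to manipulate documents',
    'MySQL is the best open source database server',
];

/**
 * Returns true when $term occurs in at least half of $rows, that is, when
 * the 50% threshold would make the term too common to count as a match.
 * Simplified model: whole-word, case-insensitive matching only.
 */
function isTooCommon(array $rows, string $term): bool
{
    $pattern = '/\b' . preg_quote($term, '/') . '\b/i';

    // preg_grep() returns the rows that match the pattern.
    $hits = count(preg_grep($pattern, $rows));

    // "hits / total >= 0.5" written without floating-point division.
    return $hits * 2 >= count($rows);
}
```

With this data, isTooCommon($comments, 'mysql') is true (two of the four rows contain the word), while isTooCommon($comments, 'php') is false (only one row does).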

Now that you have learned a bit of the theory surrounding the 50% threshold, let me show you a concrete example that demonstrates how a search term that is present in at least 50% of the existing database rows is automatically ignored by MySQL when it computes the results.

To illustrate how this database row removal process works, I’m going to use the same source files that were shown in the previous section, so this specific example can be more easily grasped.

The source files are the same pair listed in the previous section. The only difference is in form.htm, where the page title and the main heading now read "Testing the MySQL 50% threshold"; search.php is unchanged.

So far, so good. Since these source files should be very familiar to you by now, pay close attention to the results produced by the search query when the search term "mysql" is entered in the web form.

// PHP file displays the following output
/*
Users returned are the following:

Name: Alejandro Relevance: 0

Name: John Relevance: 0

Name: Susan Relevance: 0

Name: Julie Relevance: 0
*/

As you can see, MySQL has ignored the search term altogether and assigned a relevance of 0 to every row, since the word "mysql" is present in two of the four table rows, which is exactly 50% of them. Now, are you starting to grasp the logic behind the 50% threshold? I bet you are!

All right, at this point I think you understand how MySQL removes diverse search terms based on the 50% threshold algorithm. Thus, it’s time to move on and read the last section of this tutorial, where I’m going to set up an additional example to further clarify the concept that surrounds the implementation of the aforementioned 50% threshold.

To see how this final example will be built, keep reading.

Building an additional example

In line with the concepts covered in the previous section, the last example that I’m going to show you here demonstrates how MySQL returns different relevance rankings according to the search terms entered in a web form.

In this case, I’m going to use two concatenated search words to return a couple of relevance values from the previous "USERS" database table, instead of only one. The pair of source files is exactly the one listed at the beginning of the article (form.htm and search.php), so there’s no need to repeat the listings here.

Now, after listing the two previous source files, please study the following results returned by MySQL after entering the search string "alejandro+susan" in the corresponding web form:

// PHP file displays the following:
/*
Users returned are the following:
Name: Alejandro Relevance: 1.0167628961849
Name: John Relevance: 0
Name: Susan Relevance: 1.0277009445163
Name: Julie Relevance: 0
*/

As you can see, in the previous case the search query has returned two different relevance rankings in accordance with the inputted search terms. Hopefully, this last example should give you a better idea of the way that MySQL handles relevance values.
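If you wonder why a single string such as "alejandro+susan" produces two rankings, the following sketch may help. It relies on two assumptions: the helper splitSearchTerms() is hypothetical, and its splitting rule (anything other than letters, digits and underscores separates words) is only a rough approximation of how a natural-language full-text search breaks a string into terms. The calls to urlencode() and parse_str() simply reproduce what happens to the plus sign on its way from the browser to PHP.

```php
<?php
/**
 * Split a search string into lowercase word tokens. Anything that is not
 * a letter, a digit or an underscore acts as a separator. This is a
 * simplified model, not MySQL's actual full-text parser.
 */
function splitSearchTerms(string $search): array
{
    return preg_split('/[^\p{L}\p{N}_]+/u', strtolower($search), -1, PREG_SPLIT_NO_EMPTY);
}

// A "+" typed into the form field is sent as %2B and reaches PHP
// as a literal plus sign.
parse_str('searchterm=' . urlencode('alejandro+susan'), $typedInForm);

// A "+" written directly into the URL is decoded by PHP as a space.
parse_str('searchterm=alejandro+susan', $typedInUrl);
```

Either way, splitSearchTerms() yields the two terms "alejandro" and "susan", which is consistent with the two non-zero rankings shown above.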

As usual with many of my articles on PHP development, feel free to modify all the code samples shown here, so you can start quickly implementing full-text searches in your own web applications.

Final thoughts

In this second article of the series, I provided you with a basic introduction to using relevance rankings when performing full-text searches with MySQL. In the final part I’m going to complete this interesting subject by teaching you how to use Boolean operators with your search queries.

You’ve been warned, so don’t miss it!
