Collections and Sorting Continued

In the first part of this series we implemented a basic sortable collection class. We used a Bubble Sort algorithm to order the elements in the collection, which came with a disclaimer about how slow that sort is. This article examines the primary sorting algorithms with code examples, along with some empirical data on how they perform relative to one another and to the size of the data set in question.

We are going to implement the following sort algorithms for our tests:

  1. Bubble Sort (Implemented in part one)
  2. Heap Sort
  3. Insertion Sort
  4. Merge Sort
  5. Quick Sort
  6. Selection Sort
  7. Shell Sort

We will also create a function to fill up our collection with random data in order to test the sort algorithms with a sufficiently large data set. The sort algorithms listed above are the ones that every computer science student learns in college and are the primary sort algorithms found in real-world applications. Before we actually write code to implement them, let’s discuss a few basic facts.

These algorithms can be grouped into two categories based on their algorithmic complexity:

  1. Algorithms with O(n^2) complexity, also called quadratic complexity: bubble, insertion, selection and shell sorts. (Shell sort's exact bound depends on the increment sequence; with the simple sequence used in this article it is still quadratic.) Algorithms of quadratic complexity are agonizingly slow with large data sets. A data set with 10,000 elements takes roughly 100 times longer to process than a data set with 1,000 elements, and a set with 1,000 elements takes roughly 100 times longer than a set with 100 elements, and so on.
  2. Algorithms with O(n log n) complexity, also called n log n complexity: heap, merge and quick sorts (quick sort is n log n on average, though it can degrade to quadratic in the worst case). n log n complexity is only slightly worse than linear complexity and far better than quadratic. An algorithm completing in linear time would be preferable, but for sorts that work by comparing elements this is accepted as an impossibility in the general case. The log n factor grows very slowly: it is on the order of the number of bits required to store the integer n.
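
To put rough numbers on the two categories, here is a small sketch that prints idealized operation counts for each. The function names quadraticCost() and nLogNCost() are made up for this illustration, and constant factors are ignored, so treat the output as a picture of growth rates rather than a prediction of real running times.

```php
<?php
// Idealized cost of a quadratic algorithm on $n items (constant factors ignored)
function quadraticCost($n)
{
    return $n * $n;
}

// Idealized cost of an n log n algorithm on $n items (log base 2)
function nLogNCost($n)
{
    return $n * log($n, 2);
}

// Print both costs for a few data set sizes
foreach (array(100, 1000, 10000) as $n)
{
    printf("n = %5d   n^2 = %11d   n log n = %9.0f\n",
        $n, quadraticCost($n), nLogNCost($n));
}
```

Each tenfold increase in n multiplies the quadratic cost by 100, while the n log n cost grows by only a little more than a factor of ten.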

As you may have guessed, n log n complexity implies an inherently faster algorithm than one of quadratic complexity; the tradeoff is in the code itself. Faster algorithms in the case of sorting involve recursion, multiple arrays, and complicated data structures, but they run circles around their slower cousins. Choosing the proper sort algorithm is a subject unto itself, but in this article we will cover the general factors to be considered when choosing a sort algorithm.

First though, we need to whip up a touch of code to create a big data set. In the example below, set $numItems to however many data values you want in the collection.

function makeWord()
{
      $letters = array('a', 'b', 'c', 'd', 'e', 
            'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 
            'v', 'w', 'x', 'y', 'z');

      $return = '';
      for ($i = 0; $i < 20; $i++)
      {
            $return .= $letters[rand(0, count($letters) - 1)];
      }

      return $return;
}

$people = new People();
$numItems = 10000;
for ($i = 0; $i < $numItems; $i++)
      $people->Append(new Person(makeWord()));

Sort Algorithms of Quadratic Complexity

Bubble sort, insertion sort, selection sort and shell sort are algorithms of quadratic complexity. In the first part of this series, we implemented the bubble sort with the following code in the Sort() method of our People collection class:

  public function Sort()
  {
        for ($i = $this->data->count() - 1; $i >= 0; $i--)
        {
              $flipped = false;
              for ($j = 0; $j < $i; $j++)
              {
                    if (strcmp($this->data[$j]->GetSortKey(), 
                          $this->data[$j + 1]->GetSortKey()) > 0)
                    { 
                          $tmp = $this->data[$j];
                          $this->data->offsetSet($j, $this->data[$j + 1]);
                          $this->data->offsetSet($j + 1, $tmp);
                          $flipped = true;
                    }
              }
              // No swaps in this pass means the data is already sorted
              if (!$flipped)
                    return;
        }
  }

Very simple and straightforward. I will not explain the code for the algorithms in this article; I leave that to you, the reader, as an exercise. You will benefit from learning how these algorithms work if you do not know already! Let’s have a look at the code for an insertion sort:

  public function InsertionSort()
  {
        for ($i = 1; $i < $this->data->count(); $i++)
        {
              $j = $i;
              $tmp = $this->data[$i];
              // Shift larger elements one position to the right
              while (($j > 0) && (
                    strcmp($this->data[$j - 1]->GetSortKey(), 
                          $tmp->GetSortKey()) > 0)
                    )
              {
                    $this->data->offsetSet($j, $this->data[$j - 1]);
                    $j--;
              }
              $this->data->offsetSet($j, $tmp);
        }
  }

Now a selection sort:

  public function SelectionSort()
  {
        for ($i = 0; $i < $this->data->count(); $i++)
        {
              // Find the smallest remaining element
              $min = $i;

              for ($j = $i + 1; $j < $this->data->count(); $j++)
              {
                    if (strcmp($this->data[$j]->GetSortKey(), 
                          $this->data[$min]->GetSortKey()) < 0)
                    {
                          $min = $j;
                    }
              }

              // Swap it into position $i
              $tmp = $this->data[$min];
              $this->data->offsetSet($min, $this->data[$i]);
              $this->data->offsetSet($i, $tmp);
        }
  }

And for the last quadratic complexity sort, the shell sort:

  public function ShellSort()
  {
        $increment = 3;

        while ($increment > 0)
        {
              for ($i = 0; $i < $this->data->count(); $i++)
              {
                    $tmp = $this->data[$i];
                    $j = $i;

                    // Shift larger elements $increment positions to the right
                    while (($j >= $increment) && (
                          strcmp($this->data[$j - $increment]->GetSortKey(), 
                                $tmp->GetSortKey()) > 0)
                          )
                    {
                          $this->data->offsetSet($j, 
                                $this->data[$j - $increment]);
                          $j -= $increment;
                    }
                    $this->data->offsetSet($j, $tmp);
              }

              // Shrink the increment: 3, then 1, then 0 to stop
              $increment = (int) ($increment / 2);
        }
  }

All of these sorts are slow. The shell sort is dramatically faster than the rest, but still slow relative to the n log n algorithms. Before we delve into performance, though, I want to show the code for the n log n complexity algorithms.

Sort Algorithms of n log n Complexity

Heap Sort:

  public function HeapSort()
  {
        $count = $this->data->count();

        // Build a max-heap, starting from the last parent node
        for ($i = (int) ($count / 2) - 1; $i >= 0; $i--)
              $this->HeapSortSiftDown($i, $count);

        // Move the largest element to the end, then restore the heap
        for ($i = $count - 1; $i >= 1; $i--)
        {
              $tmp = $this->data[0];
              $this->data->offsetSet(0, $this->data[$i]);
              $this->data->offsetSet($i, $tmp);
              $this->HeapSortSiftDown(0, $i);
        }
  }

  // Sifts the element at $i down within the first $arraySize elements.
  // With zero-based indexes, the children of $i are 2i + 1 and 2i + 2.
  private function HeapSortSiftDown($i, $arraySize)
  {
        while ($i * 2 + 1 < $arraySize)
        {
              // Pick the larger of the two children (if there are two)
              $maxChild = $i * 2 + 1;
              if (($maxChild + 1 < $arraySize) &&
                    (strcmp($this->data[$maxChild + 1]->GetSortKey(),
                          $this->data[$maxChild]->GetSortKey()) > 0))
                    $maxChild++;

              // Stop once the parent is at least as large as that child
              if (strcmp($this->data[$i]->GetSortKey(),
                    $this->data[$maxChild]->GetSortKey()) >= 0)
                    return;

              $tmp = $this->data[$i];
              $this->data->offsetSet($i, $this->data[$maxChild]);
              $this->data->offsetSet($maxChild, $tmp);
              $i = $maxChild;
        }
  }

Merge Sort:

  public function MergeSort()
  {
        $this->MSort($this->data, array(), 0, $this->data->count() - 1);
  }

  private function MSort($data, $temp, $left, $right)
  {
        if ($right > $left)
        {
              // Integer midpoint; plain division would yield a float
              $mid = (int) (($right + $left) / 2);
              $this->MSort($data, $temp, $left, $mid);
              $this->MSort($data, $temp, $mid + 1, $right);

              $this->Merge($data, $temp, $left, $mid + 1, $right);
        }
  }

  private function Merge($data, $temp, $left, $mid, $right)
  {
        $leftEnd = $mid - 1;
        $tmpPos = $left;
        $numElements = $right - $left + 1;

        // Take the smaller head element from either half
        while (($left <= $leftEnd) && ($mid <= $right))
        {
              if (strcmp($data[$left]->GetSortKey(),
                    $data[$mid]->GetSortKey()) <= 0)
              {
                    $temp[$tmpPos] = $data[$left];
                    $tmpPos++;
                    $left++;
              }
              else
              {
                    $temp[$tmpPos] = $data[$mid];
                    $tmpPos++;
                    $mid++;
              }
        }

        // Copy whatever remains of either half
        while ($left <= $leftEnd)
        {
              $temp[$tmpPos] = $data[$left];
              $left++;
              $tmpPos++;
        }
        while ($mid <= $right)
        {
              $temp[$tmpPos] = $data[$mid];
              $mid++;
              $tmpPos++;
        }

        // Copy the merged run back into the collection
        for ($i = 0; $i < $numElements; $i++)
        {
              $data->offsetSet($right, $temp[$right]);
              $right--;
        }
  }

Quick Sort:

      public function QuickSort()
      {
            // Nothing to do for an empty or single-element collection
            if ($this->data->count() > 1)
                  $this->QSort($this->data, 0, $this->data->count() - 1);
      }

      private function QSort($data, $left, $right)
      {
            $lHold = $left;
            $rHold = $right;
            $pivot = $data[$left];

            while ($left < $right)
            {
                  // Scan from the right for an element smaller than the pivot
                  while ((strcmp($data[$right]->GetSortKey(),
                        $pivot->GetSortKey()) >= 0) && 
                        ($left < $right))
                  {
                        $right--;
                  }
                  if ($left != $right)
                  {
                        $data->offsetSet($left, $data[$right]);
                        $left++;
                  }
                  // Scan from the left for an element larger than the pivot
                  while ((strcmp($data[$left]->GetSortKey(),
                        $pivot->GetSortKey()) <= 0) && 
                        ($left < $right))
                  {
                        $left++;
                  }
                  if ($left != $right)
                  {
                        $data->offsetSet($right, $data[$left]);
                        $right--;
                  }
            }

            // Drop the pivot into its final position and sort each side
            $data->offsetSet($left, $pivot);
            $pivotIndex = $left;
            $left = $lHold;
            $right = $rHold;
            if ($left < $pivotIndex)
                  $this->QSort($data, $left, $pivotIndex - 1);
            if ($right > $pivotIndex)
                  $this->QSort($data, $pivotIndex + 1, $right);
      }

In order to test these algorithms, I simply added them to my class and then in Sort(), called whichever one I wanted to use.
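
If you would like to time a sort yourself, something along these lines may help. It is only a sketch: timeSort() and insertionSortArray() are hypothetical helpers invented for this example, and they work on plain arrays of strings rather than on the People collection, so that the snippet runs on its own. The md5() calls are just a convenient way to get repeatable, random-looking strings.

```php
<?php
// Hypothetical helper: runs $sort on a copy of $data, verifies the result,
// and returns the elapsed time in seconds.
function timeSort($sort, array $data)
{
    $start = microtime(true);
    $sorted = call_user_func($sort, $data);
    $elapsed = microtime(true) - $start;

    // Sanity check: no element may be greater than its successor
    for ($i = 1; $i < count($sorted); $i++)
    {
        if (strcmp($sorted[$i - 1], $sorted[$i]) > 0)
            throw new RuntimeException('Result is not sorted');
    }

    return $elapsed;
}

// Stand-in for one of the collection's methods: insertion sort on a plain array
function insertionSortArray(array $items)
{
    for ($i = 1; $i < count($items); $i++)
    {
        $tmp = $items[$i];
        $j = $i;
        while (($j > 0) && (strcmp($items[$j - 1], $tmp) > 0))
        {
            $items[$j] = $items[$j - 1];
            $j--;
        }
        $items[$j] = $tmp;
    }

    return $items;
}

// Build a repeatable data set of random-looking strings
$words = array();
for ($i = 0; $i < 500; $i++)
    $words[] = md5((string) $i);

printf("insertion sort: %.4f s\n", timeSort('insertionSortArray', $words));
```

Running the same call with larger data sets, and with different sort functions, gives a rough feel for how each algorithm scales on your own machine.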

Metrics

Here are some lovely graphs borrowed from http://linux.wku.edu/~lamonml/algor/sort/sort.html. The first graph shows execution time, represented in seconds on the vertical axis and the number of items in a data set on the horizontal axis. The second graph shows execution time in tenths of a second. You can perhaps guess which sort of complexity is represented by each graph!

 

The n log n algorithms absolutely crush the quadratic algorithms in terms of speed.

Conclusion

In this article we have reviewed the primary sorting algorithms and their general properties, including how quickly they complete relative to the size of the data set to be sorted. We have also touched on the general factors involved in choosing an appropriate algorithm. The metrics presented in this article were borrowed from http://linux.wku.edu/~lamonml/algor/sort/sort.html, whose author was kind enough to compile them, knowing that eventually I would need them. So, thanks to wku.edu.

As I wrote this article, it occurred to me that we could go one step further and throw in an implementation of the Strategy design pattern using the contents of this article. We will create a SortStrategy class and create a method in the base Collection class to select a SortStrategy class based on the data it contains. Since we have reviewed the best and worst of our two classes of algorithms and know how complex (or simple) they are, we will pick the “best” from each class and go from there.
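
As a rough preview of that idea, here is one way it might look. This is a guess at the shape, not the code from the next article: only the name SortStrategy comes from the text above, while the sort() signature, the two concrete classes, the chooseSortStrategy() function and the cutoff of 16 items are all assumptions made for illustration. It also works on plain arrays of strings rather than on the Collection class, to keep the sketch self-contained.

```php
<?php
// Hypothetical interface: the method signature is an assumption
interface SortStrategy
{
    public function sort(array $items);
}

// Quadratic, but simple and perfectly adequate for very small inputs
class InsertionSortStrategy implements SortStrategy
{
    public function sort(array $items)
    {
        for ($i = 1; $i < count($items); $i++)
        {
            $tmp = $items[$i];
            $j = $i;
            while (($j > 0) && (strcmp($items[$j - 1], $tmp) > 0))
            {
                $items[$j] = $items[$j - 1];
                $j--;
            }
            $items[$j] = $tmp;
        }
        return $items;
    }
}

// n log n on average; a simplified quick sort that builds new arrays
class QuickSortStrategy implements SortStrategy
{
    public function sort(array $items)
    {
        if (count($items) < 2)
            return $items;

        // Use the first element as the pivot and partition the rest
        $pivot = array_shift($items);
        $less = array();
        $rest = array();
        foreach ($items as $item)
        {
            if (strcmp($item, $pivot) < 0)
                $less[] = $item;
            else
                $rest[] = $item;
        }

        return array_merge($this->sort($less), array($pivot), $this->sort($rest));
    }
}

// Hypothetical selector; the cutoff of 16 items is arbitrary
function chooseSortStrategy(array $items)
{
    if (count($items) <= 16)
        return new InsertionSortStrategy();
    return new QuickSortStrategy();
}

$names = array('mallory', 'alice', 'trent', 'bob');
$strategy = chooseSortStrategy($names);
print_r($strategy->sort($names));
```

The calling code never needs to know which algorithm it received, which is the point of the pattern: the selection rule can change without touching anything that asks for a sort.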
