In the first part of this series we implemented a basic sortable collection class. We used a Bubble Sort algorithm to order the elements in the collection, which came with a disclaimer regarding what a slow sort it is. This article will examine the primary sorting algorithms with code examples, along with some empirical data on how they perform relative to one another and how that performance changes with the size of the data set.
We are going to implement the following sort algorithms for our tests:
Bubble Sort (Implemented in part one)
Heap Sort
Insertion Sort
Merge Sort
Quick Sort
Selection Sort
Shell Sort
We will also create a function to fill up our collection with random data in order to test the sort algorithms with a sufficiently large data set. The sort algorithms listed above are the ones that every computer science student learns in college and are the primary sort algorithms found in real-world applications. Before we actually write code to implement them, let’s discuss a few basic facts.
These algorithms can be grouped into two categories based on their algorithmic complexity:
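Algorithms with O(n²) complexity, also called quadratic complexity – bubble, insertion, selection and shell sorts. (Shell sort is a borderline case: it is quadratic in the worst case with its original gap sequence, and other gap sequences do better.) Algorithms of quadratic complexity are agonizingly slow with large data sets, because the running time grows with the square of the number of elements. A data set with 10,000 elements takes roughly 100 times longer to process than a data set with 1,000 elements, and a set with 1,000 elements takes roughly 100 times longer than a set with 100 elements, and so on.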
Algorithms with O(n log n) complexity, also called n log n complexity – heap, merge and quick sorts. (Quick sort is O(n log n) on average, but degrades to O(n²) in its worst case.) n log n complexity is only modestly worse than linear complexity and vastly better than quadratic. An algorithm completing in linear time would be preferable, but for general comparison-based sorting this is provably impossible. To get a feel for the log n factor: the number of bits required to store an integer n grows as log n, and an n log n algorithm does roughly that much work for each of its n elements.
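To make the gap between the two categories concrete, here is a minimal sketch, assuming PHP as the language of this series (suggested by the $numItems variable used later). The functions estimateQuadratic() and estimateLinearithmic() are hypothetical helpers invented for this illustration; they are not part of the collection class from part one. They simply evaluate n² and n log2 n as rough stand-ins for the amount of work each category does, ignoring constant factors.

```php
<?php
// Hypothetical helpers for illustration only: rough operation counts,
// ignoring the constant factors that real implementations carry.

// Work done by a quadratic sort on $n elements: n * n.
function estimateQuadratic(int $n): float
{
    return (float) $n * $n;
}

// Work done by an n log n sort on $n elements: n * log2(n).
function estimateLinearithmic(int $n): float
{
    return $n * log($n, 2);
}

// Print both estimates side by side for growing data set sizes.
printf("%8s  %14s  %12s\n", 'n', 'n^2', 'n log2 n');
foreach ([100, 1000, 10000] as $n) {
    printf(
        "%8d  %14.0f  %12.0f\n",
        $n,
        estimateQuadratic($n),
        estimateLinearithmic($n)
    );
}
```

Each tenfold increase in n multiplies the quadratic estimate by 100, while the n log n estimate grows by only about 13 to 15 times over the same steps. These are estimates of growth, not measured timings.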
As you may have guessed, n log n complexity implies an inherently faster algorithm than one of quadratic complexity, at least once the data set grows large; the tradeoff is in the code itself. Faster sorting algorithms typically involve recursion, auxiliary arrays, or more complicated data structures, but they run circles around their slower cousins. Choosing the proper sort algorithm is a subject unto itself, but in this article we will cover the general factors to be considered when choosing a sort algorithm.
First though, we need to whip up a touch of code to create a big data set. In the example below, set $numItems to however many data values you want in the collection.