# Sort This Sort That

One of the main activities in algorithms and programs, whose purpose is manipulating data, is sorting. So if you write these kinds of programs-what kind of method for sorting data is best? In this article I will present some algorithms for sorting and their advantages and weaknesses.

One of the main problems with sorting performance is moving data.  Shuffling records can be time-expensive, but also can be avoided. I will discuss some ways of sorting keys without moving records. For this demonstration, the techniques will sort data which are only single fields acting as keys. In live databases, you will use whole records that include the key fields. (Keys are fields that you can use the >, <, and = symbols for comparison purposes.)

A disclaimer before I begin: For describing these algorithms I will use pseudo-language. In this way, concepts of these algorithms will be understandable to anyone regardless of the programming language.

{mospagebreak title=Sorting}

Let’s begin with sorting techniques. There are two groups of sorting methods: inner and outer. Inner is often used when you have data to hold in operative memory, and outer sorting is used on large quantities of data located on peripherals (such as hard disks) organized in files.

Inner Sorting

Assume that we have our keys placed in an array (a[1], a[0], … , a[n]).  Of course you can use something else to store your keys, perhaps linked-list, but in this case a simple array is good. (Linked lists are slower due to sequential data reading each time.)

The first group of inner-sorting methods is termed “sorting by comparing”. There are four method types:

• inserting methods
• selection methods
• replacing methods
• reconnecting methods

Reconnecting methods is also used for outer–sorting.

{mospagebreak title=Inserting Methods}

These methods are based on step-by-step array sorting. You have two parts of an array, sorted and unsorted, so you take one element from the unsorted part and put it in the right place of sorted part.

Let’s see an example of direct inserting method or insertion-sort algorithm:

[code]
INSERTION-SORT
for i=2 to n do
k=a[i]
j=i-1
while( j>0) and (a[j]>k) do
a[j+1] = a[j]
j=j-1
end_while
a[j+1]=k
end_for
[/code]

This is how this algorithm works. After i iterations of the for loop, our array has sorted and unsorted parts. (See image below.)  In the next iteration we take the first element from the unsorted part and place it in variable k. In the while loop we compare this variable with the last element in the sorted part of our array.

If the value of the element in the while loop is bigger than the value of the for loop we move it one place above in our array. Now we take the next element from the sorted part and compare it with variable k.  If this element is bigger we move it up one position in the array.

We repeat this process until we find an element value bigger than the value of variable k. At this moment we stop the while loop, and place our value of variable k one place behind found value. After our while loop is over (when i reaches value n) our array will be sorted. In this case, in ascending order.

This method is most effective if our array of keys is smaller and almost sorted.  The worst thing about this algorithm is that it requires much record moving. This leads to the conclusion that our array should be organized as a linked-list if our records are big. In this way we can just reconnect pointers.

{mospagebreak title=Decrementing Increment or Just Shell-Sort}

Shell-sorting is a better solution than insertion-sort, but it also contains insertion-sorting. First it splits our unsorted array into groups. In each group are elements whose positions in the array are on equidistant positions.

The number of spaces between elements in groups is called increment h1. Tbe number of groups, naturally, suits the value of increment. Now each group is sorted by insertion-sort method, after that our array is h1-sorted.

In our next step our increment is decremented and its value is h2.  Then we form smaller numbers of groups with more elements per group. Our algorithm ends when our increment reaches value one. This means that now we just use insertion-sort on the whole array, after that our array is completely sorted.

Let’s see what this algorithm looks like:

[code]
SHELL-SORT:
for i =1 to n do
inc=h[i]
for j=inc+1 to n do
y=a[j]
k=j-inc
while ((k>=1) and (y<=a[k])) do
a[k+inc] = a[k]
k=k-inc
end_while
end_for
end_for
[/code]

In this algorithm we assume that we have our increments in an array h[1:t]. Notice that everything inside the second for loop is just insertion-sort method used on groups.

It’s not obvious that this method has better performances than insertion-sort, but it does. In the beginning our increment h is big so groups are smaller, that means that speed increases on each group using insertion-sort. Each consecutive step our groups are bigger, but now those groups are sorted, facilitating insertion-sort on remaining groups. See step-by-step how this works below:

{mospagebreak title=Selection Methods}

The main strategy for the selection methods is to find, in each step, smaller or greater elements in unsorted arrays. Then place these elements as the first elements of sorted arrays. This strategy assumes that our array will have two parts: sorted and unsorted.

First method from this group is selection-sort:

[code]
SELECTION-SORT:
for i=1 to n-1 do
min=a[i]
pos=i
for j=i+1 to n do
if(a[j]<min) then
min=a[j]
pos=j
end_if
end_for
a[pos]=a[i]
a[i]=min
end_for
[/code]

Here we have, just like in the insertion-sort method, sorted and unsorted parts of our array. In the beginning the whole array is unsorted so we find the smallest value in our array and exchange it for the first element in the array.

After i-1 iterations we have the sorted part (a[1], … , a[i-1]) and the unsorted part (a[i], … , a[n]). In the next iteration we find the smallest value in the unsorted part and change its place with the first element from the sorted part, in that way this element becomes a part of the sorted part.

A critical part of this algorithm is that in each step it needs to find the smallest value from the unsorted part of our array.  This can be accomplished by splitting the unsorted part in groups and finding the smallest value in each group, termed the “local minimum”. Now finding the smallest value of the whole unsorted part is finding the smallest value of all local minimums. In the next step we just need to find one local minimum.

{mospagebreak title=Sorting with Binary Tree}

Using the Sorting with Binary Tree method I will cover creating the binary tree and the algorithm that goes through the whole tree and gets values from it in ascending order.

The binary tree is created out of nodes and connections between these nodes. Each node has a data field and two pointers (these pointers present connections between nodes) to two other nodes: called the left and right child. If the node doesn’t have a left or right child its pointer is set to null. The first node of this binary tree (the one that has no parent nodes) is called the root of the binary tree.

To create our binary tree we take the first element and make a root out of it. Then the next element from the array will be placed as the left or right child of the root depending on its value. If it is bigger or equal we put it as a right child, or if it is smaller we put it as a left child. For all other elements we do the same – keeping this child-parent order for all other nodes, as presented in the next picture.

{mospagebreak title=Left(node) and Right(node)}

First, something about the procedure that goes through this binary tree creating sorted array. We have a procedure get_value(node) that accepts as an argument the type of node and gets the node’s data value and puts it in an array.  There are also two procedures named left(node) and right(node) that return the left or right child node of the parent node set as the argument (this means pointers to the left or right child-node). If this is clear, see the procedure for sorting.

[code]
BINARY_TREE_SORT(root)
if(root!=null) then
BINARY_TREE_SORT(left(root))
get_value(root)
BINARY_TREE_SORT(right(root))
end_if
[/code]

As it is shown this procedure is recursive and accepts one argument of type node. When we start this procedure it recursively calls itself until it reaches the further-most left child-node that has no left child of it’s own. Then with a get_value procedure, it stores that node’s data value in the array and again calls binary_tree_sort procedure for that node’s further-most right child.  This recursion goes on until all elements (all recursion ends) are visited. This algorithm is efficient for all kinds of binary trees and also known as “inorder tree visit”.

Conclusion

I have presented two types of sorting methods: inserting and selection methods. In the next article we will go further with inner-sorting methods and I will try to present to you some linear-complex methods for sorting such as counting-sort, bucket-sort and distribution sort. Until then enjoy your sorting!