The Sorting Problem, Part 2: Stable comparison sorts.

Note: ‘upto’ = inclusive.

1) Bubble sort:

Start from one end of your input array of ‘n’ elements. Keep a pointer (logical or actual) called ‘i’ at the current index. On every iteration of i, start another pointer ‘j’ from 1 and walk it to the end of the array. If a[j-1]>a[j], swap the two.
Doing this, the elements start arriving in sorted order….at the opposite end of the array from the one we started at. With every iteration of i, one more element settles into its final position at the end.

Hence, as the last i elements are already sorted, we can stop j before them.


for i=0 upto A.length-1, do:
    for j=1 upto A.length - i - 1, do: 
        if A[j-1]>A[j] then: 
            swap (A[j-1], A[j])

Bubble sort can be modified with a ‘flag’ value in the if-statement, so that if no swaps happen in one traversal through the array, we exit the loops because the array is sorted. This reduces our best-case time from Ω(n^2) to Ω(n). Our worst-case time is still O(n^2) though, when the array is reverse-sorted. It does not use any extra space.
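The flagged version can be sketched in Python (a minimal sketch, nothing more):

```python
def bubble_sort(a):
    """Bubble sort with the early-exit flag described above."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        # after i passes, the last i elements are already in place,
        # so j stops before them
        for j in range(1, n - i):
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                swapped = True
        if not swapped:   # no swaps in a full pass: the array is sorted
            break
    return a
```

On already-sorted input the first pass makes no swaps and the flag ends the sort after O(n) work.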

Over the years, people have come out with various Bubble sort variations:

  • Cocktail-Shaker sort, a slightly more efficient version of Bubble, which bubbles elements first toward one end, and then the other. It can run somewhat faster than plain bubble in practice, but that is still O(n^2) and hence it is not really worth mentioning more about. Since it only swaps adjacent out-of-order elements, it is stable.
  • There also exists Comb sort, which is yet another bubble sort modification. It runs bubble sort with different gap (‘comb’) lengths, decreasing the gap down to one in the last pass, where it is just ordinary bubble sort. This sort exploits a property both insertion and bubble sort have: nearly O(n) runtime on almost-sorted data. However, comb sort is unstable.
  • Odd-even sort is yet another bubble-sort derivative. No one really cares about it, either.


2) Insertion sort:

Start i from the left end of your input array, at index 1. Hold the value of A[i] in a temporary variable, ‘key’. Keep shifting elements one place to the right (it is fine to overwrite A[i], as we have its value saved in ‘key’) until you find the proper place to put ‘key’. Then take your next A[i] as your new ‘key’ and repeat until sorted.


Many people get insertion sort wrong; we don’t scan for the smallest remaining element and append it to the sorted prefix…that’s selection sort.


for i = 1 to length(A)
    key = A[i]
    j = i
    while j > 0 and A[j-1] > key
        A[j] = A[j-1]
        j = j - 1
    A[j] = key

The time complexity of insertion sort, like bubble sort’s, is O(n^2) when the array is reverse-sorted, and Ω(n) when it is already sorted. The average case is Θ(n^2).
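The pseudocode above maps almost line-for-line to Python; here is a runnable sketch:

```python
def insertion_sort(a):
    """Insertion sort: shift larger elements right, then drop ``key`` in."""
    for i in range(1, len(a)):
        key = a[i]           # the value being inserted this round
        j = i
        while j > 0 and a[j - 1] > key:
            a[j] = a[j - 1]  # shift one place to the right
            j -= 1
        a[j] = key           # key lands in its sorted position
    return a
```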


Insertion sort variations (Note: some are unstable):

  • Shell sort:
    Despite being Θ(n^2) on average, insertion sort is usually very fast on almost-sorted data. It is also very fast on very small amounts of data: it will often beat O(nlogn) sorts on arrays of fewer than about 10 elements.
    This unique property of insertion sort gave rise to Shell sort, which runs insertion sort multiple times, on different sets of elements. The first pass considers the array made up of every ‘a’th element, and sorts it accordingly. The next pass takes every ‘b’th element (where b<a) and sorts that. Then every ‘c’th element (where c<b<a), and so on.
    a, b, c, d…. are called the ‘gaps’ (or ‘spans’); they are usually stored in an array of their own, and how they are selected is a subject in its own right. In the last pass of Shell sort, the gap is 1 and it is basically insertion sort, but by then the data is almost sorted, and insertion sort runs very fast on it.
    Unfortunately, Shell sort is unstable (i.e. it sorts, but it may change the relative positions of keys of equal values), and it is really not very useful because its performance depends on how we select our gaps. Most logically, we should select them in such a way that our largest gap gets us subarrays of <10 elements each. But like I said, it’s complicated.
    Likewise, calculating Shell sort’s complexity is complicated and depends on the gap sequence, but one thing is for sure: with the usual sequences there are Θ(logn) passes of O(n) each even on sorted data, so its best case is O(nlogn). (Note that the famous Ω(nlogn) lower bound for comparison sorts applies to the worst and average cases, not the best.)

Shell sort, by the way, is named after Donald Shell and has nothing to do with the UNIX shell.


  • Gnome sort is a mixture of insertion and bubble sort, devised in 2000. It proceeds like insertion sort, but instead of shifting elements forward like insertion, it swaps adjacent elements like bubble. It is, again, O(n^2). Gnome sort is stable because it changes so little in the way insertion sort works.
  • Library sort is essentially insertion sort with gaps left in the array, like a librarian leaves spaces in rows of books. It is supposed to be Θ(n.logn), but how to leave the gaps is not really specified. It is hence unrealistic.
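To make the gap idea behind Shell sort concrete, here is a minimal Python sketch. The simple halving gap sequence is my assumption here; real implementations use better-studied sequences:

```python
def shell_sort(a):
    """Shell sort: gapped insertion-sort passes with a shrinking gap.
    The halving gap sequence below is the simplest choice, not the best."""
    n = len(a)
    gap = n // 2
    while gap > 0:
        # one insertion-sort pass over each interleaved subarray
        for i in range(gap, n):
            key = a[i]
            j = i
            while j >= gap and a[j - gap] > key:
                a[j] = a[j - gap]   # shift by 'gap' instead of by 1
                j -= gap
            a[j] = key
        gap //= 2                   # last pass (gap == 1) is plain insertion sort
    return a
```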


3) Merge sort:

This is one of the simplest O(n.logn) sorting algorithms. It runs like this:


 If merging is wrong I don’t wanna be right.

I did not include the full code for merge sort here, as it is quite large. Merge sort can be easily implemented iteratively, but, by design, it is recursive, as it follows the divide-and-conquer procedure. The iterative version is easier to implement, in my opinion.

It has two basic procedures:

  1. Divide: select the size of each of the two subarrays to merge, going from 1 to 2 to 4 to 8 etc….double the size of the subarrays to merge every iteration.
  2. Merge/combine: put the two sorted arrays in order to get a larger sorted array. Note: an array of size 1 is ‘sorted’.

Pretty simple, right?

One of the problems is figuring out why it is an O(nlogn) sorting algorithm. That requires a picture:

This is, honestly, a pretty shoddy picture, because it is really confusing for someone who has never seen Merge sort. Unfortunately, it is also the most accurate one I could find.

Think about it this way: you have 8 elements. What’s log(8) base 2? It’s 3. That’s the depth of your merging. Dividing takes really no time: just pass i + size as the length of the subarray to your merging function. Hence dividing is O(1).

Merging depends on the size. At the last level, it has to merge 4 elements with 4 elements, which is the most it ever will have to, so merging that level is O(n) exactly. For sizes less than n/2, say size 2, you merge each pair of subarrays of length ‘size’, from the start till the end of the array. This is, again, O(n) exactly per level. So, O(n) per level × O(log n) levels = O(n.log n). You need an auxiliary array of size ‘n’ to merge into, or else you will overwrite elements and hence lose their values. The data is copied back to the main array every time we double the size variable. (Note: alternatively, we can just swap the roles of the main and auxiliary arrays on each pass.)

Merge sort will work most cleanly when n is an exact power of 2; however, it will work on every ‘n’…the last subarray will just come up short.
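The whole bottom-up scheme described above can be sketched in Python like this (a minimal version; the size-doubling loop is the ‘divide’ step, the inner loop is the merge):

```python
def merge_sort(a):
    """Bottom-up merge sort: merge sorted runs of size 1, 2, 4, 8, ..."""
    n = len(a)
    aux = a[:]                # the O(n) auxiliary array
    size = 1
    while size < n:
        for lo in range(0, n, 2 * size):
            mid = min(lo + size, n)
            hi = min(lo + 2 * size, n)   # the last subarray may come up short
            i, j = lo, mid
            # merge a[lo:mid] and a[mid:hi] into aux[lo:hi]
            for k in range(lo, hi):
                if j >= hi or (i < mid and a[i] <= a[j]):  # '<=' keeps it stable
                    aux[k] = a[i]; i += 1
                else:
                    aux[k] = a[j]; j += 1
        a, aux = aux, a       # reuse the old main array as the next pass's aux
        size *= 2
    return a
```

Taking from the left run on ties (`<=`) is the one-line choice that keeps the sort stable.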

Time complexity:

Merge sort runs like hell. It is competitive with Quicksort on average, and much faster than Heapsort in practice. It is very fast. And it is also stable: with a simple modification via an if-statement (take from the left subarray on ties), our merging function can maintain the relative order of elements with equal key values.

The real problem with merge sort is the auxiliary memory: O(n) extra space is unacceptable in many cases, especially when dealing with databases holding hundreds of gigabytes of data, or other memory-limited systems.

There is one exception, though: merge sort on a linked list. If you enter your data into a linked list, merging takes only O(1) auxiliary space. And it will still run in O(n.log n).

Merge sort variations:

  • Timsort, which was developed in 2002 by Tim Peters for the Python programming language. It is now the standard sorting algorithm in Python and is used in both Java and Android. It is a combination of insertion sort and merge sort, and runs in O(n.logn) in the worst case, and O(n) in the best case (plain merge sort runs in O(n.logn) even in the best case). Timsort is stable, unlike Shell sort.
  • There is also another insertion-sort/merge-sort hybrid, called Block sort. It runs in-place in O(n.logn) time with Ω(n) as best case. It is a pretty good optimization of merge sort down to O(1) space; however, it does require rotations of blocks of data. It is also stable.


Sorting linked lists:

Linked lists, as you probably know, are data structures where every Node is an object/structure that stores a bunch of data. One of the member variables of each node is a pointer to the next Node in the sequence. Sometimes there is also a pointer to the previous node, in which case it is called a doubly linked list.

If in a singly linked list the last node points to the first, it is called a circular singly linked list. If in a doubly linked list the last node and first node point to each other, it is called a circular doubly linked list.

With linked lists, we allocate each node dynamically and not all at once like in an array. However, like arrays, linked lists are linear data structures. The main difference between linked lists and arrays is that in linked lists, access time for any element at random is O(n), whereas in an array it is O(1); in fact, it is exactly one step: A[x].

This is because, in the computer’s memory, creating an array allocates memory sequentially, whereas in a linked list, every node is created in a separate location in memory, which can only be accessed via a pointer to that node….a pointer stored in the previous node of our linked list (and also in the next node, in a doubly linked list). We start accessing the nodes via a pointer to the start of the list, called the ‘list’ or ‘head’ pointer. From there, we must traverse the list to reach any other node in the list. Thus,

An array has random access and a linked list has linear access.

If we lose the pointer to a node, we can’t ever find that node again in memory. This is inconvenient compared to arrays, where we only need the first element’s position in memory, and then all positions in memory are computed relative to it (<- this is what A[x] means: we access the ‘x’th element relative to A[0]).

However, linked lists do trump arrays in one respect: allocation time. In a linked list, if we make a new node, we can splice it in basically anywhere we want. Arrays, meanwhile, are immovable without copying. Inserting a new element into the middle of an array requires creating a brand new array of n+1 elements, and copying everything into it. This is because:

An array has bulk allocation whereas a linked list has discrete allocation.
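The node structure, and the O(1) splice-in insertion described above, can be sketched like this (Python references play the role of pointers here):

```python
class Node:
    """One node of a singly linked list: some data, plus a pointer
    (reference) to the next node. Only the pointers tie the list together."""
    def __init__(self, data):
        self.data = data
        self.next = None

def insert_after(node, new_node):
    """O(1) insertion: rewire two pointers, no copying of elements."""
    new_node.next = node.next
    node.next = new_node

def to_list(head):
    """Linear access: the only way to reach a node is to walk from the head."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out
```

Compare this with the array case, where the same insertion costs a full O(n) copy.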

Arrays and linked lists are, therefore, sort of like the Yin and Yang of data structures.

Yin and Yang stands for two forces which seem opposite,

but which are actually interconnected and complementary.


One of the really cool things about insertion sort and bubble sort: you can run them both on linked lists. In fact, insertion in sorted order into a linked list is basically insertion sort. You just have to modify your algorithm a bit. You can also do this with merge sort.

Here is a simple sort of a linked list by insertion sort. It uses the standard procedure INSERT-SORTED, which inserts a node ‘z’ into its sorted position in a presorted linked list. This will sort in O(n^2) time because, as you can see, it exactly mirrors insertion sort on an array.

    INSERTION-SORT-LIST(List):
        if List.head == NULL or List.head->next == NULL
            return                        // 0 or 1 nodes: already sorted
        i = List.head                     // ‘i’ marks the end of the sorted part
        /* In insertion sort on an array, we go backwards from the ith
           element to the first element and 'insert' into the correct
           place. Here, we go forward from the first node to the ith node,
           and insert into the correct place. The two algorithms are
           virtually the same; the element might be inserted anywhere
           between the first and ith position in both. */
        while i->next != NULL
            j = i->next                   // the next node to insert
            if j->data >= i->data
                i = j                     // j is already in its place
            else
                i->next = j->next         // detach j from the list
                INSERT-SORTED(j, List)    // insert j into the sorted part

    INSERT-SORTED(z, List):               // the part List.head…i is sorted
        if z->data <= List.head->data
            z->next = List.head           // z belongs at the front
            List.head = z
        else
            prev = List.head
            while prev->next->data < z->data
                prev = prev->next
            z->next = prev->next          // splice z in after ‘prev’
            prev->next = z

For bubble and insertion sort, the efficiency doesn’t change by using a linked list. It is just the same as running on an array: they don’t need extra space and will run in O(n^2) time. They take roughly the same amount of actual time, too.

The same cannot be said about Merge sort…….Merge sort actually gets more efficient when you run it on a linked list. Merge sort’s biggest limitation was its auxiliary space; with a linked list, rearranging elements is easy. Granted, you will need a few pointers, but it does not need the massive O(n) auxiliary array that was needed for sorting an array! Yay! 😀 Proving this, however, is left as an “exercise to the reader” 😛

Quicksort and Heapsort, too, can be run on linked lists, but the truth is, doing so is wonky and shoddy and weird. These are unstable algorithms, which should be a big hint that they don’t access data linearly: their efficiency depends on the random access an array provides. In a linked list, accessing an arbitrary element is O(n). For Quicksort this isn’t really too much of a problem, and running it on a linked list can be (and has been) done… but it adds a lot of overhead to the constant. And Quicksort’s low constant factor on its average-case complexity is the only real reason we ever use it in the first place. As for Heapsort, using a linked list would really be unacceptable in the ‘heapify’ stage, and puts a big dent in its already bloated efficiency, perhaps even pushing it to O(n^2) or beyond.

So, why is it that stable sorting algorithms work really well with linked lists?

This is conjecture, but I really think it is because a stable sort compares each element with the other unsorted ones in order. Insertion and bubble do this very obviously; merge does it less obviously, while merging. Stable sorts compare two objects with the same key value directly, and then place them in their original relative positions. This is best done by going from the start of the array to the end, i.e. by accessing your array linearly. As a linked list has linear access, it is hence a good match for stable sorting algorithms.


Welp, that’s all for now, folks!

I’m going to continue this post in “The Sorting Problem, Part 3: unstable Comparison sorting”, where I will deal with Quicksort, Heapsort, and why they were invented in the first place.

Also stay tuned for “The Sorting Problem, Part 4: Why Not Radix?” where I deal with the strange, giggling non-comparative sorts. So, make sure you keep your thumbs pointed to the stars, and your browsers open!

About the author:

Abhishek Divekar,

as a kid, really wanted a dog. But his parents bought him a computer instead. And so he decided to trade one for the other, and has loved computers unconditionally ever since. They, meanwhile, have barely just tolerated him. At least, he reasons, they would never eat his clothes or spill fluids all over his bed. Except for that one time, with the Dell.

The Sorting Problem.


Genius mathematicians and Algorithms 101 students concur; sorting is “Not that easy”.

Well, I guess that clears things up.

Thanks for reading.

 One Abe in a bar is worth two Homers in a bush.

But seriously, there are a couple of things that make sorting really hard:

  • You can’t just get it done in one go; you have to go through your array/tuple multiple times.
  • Every time you move an element, you must check that you don’t accidentally push it into the wrong place and overwrite some other data.
  • Worse, you have to get a machine to do it. And computers are really strange sometimes.
  • And you want your sort to be really fast, too.

These are all reasons why sorting is taught in Algo 101 classes and why brilliant mathematicians are still working on the problem.

That doesn’t mean we have managed to find one perfect way of doing it yet; sorting is still really difficult.


Okay, maybe not that difficult.

So, maybe it’s a good idea to start this post with why and how we sort data.

Disclaimer: this post begins with the following long, boring, definition on sorting.

Via Wikipedia:

The main purpose of sorting information is to optimise its usefulness for specific tasks. In general, there are two ways of grouping information: by category e.g. a shopping catalogue where items are compiled together under headings such as ‘home’, ‘sport & leisure’, ‘women’s clothes’ etc. (nominal scale) and by the intensity of some property, such as price, e.g. from the cheapest to most expensive (ordinal scale).

Richard Saul Wurman, in his book Information Anxiety, proposes that the most common sorting purposes are name, by location and by time (these are actually special cases of category and hierarchy). Together these give the acronym LATCH (Location, Alphabetical, Time, Category, Hierarchy) and can be used to describe just about every type of ordered information.

The opposite of sorting, rearranging a sequence of items in a random or meaningless order, is called shuffling.

For sorting, either a weak order, “should not come after”, can be specified, or a strict weak order, “should come before”. For the sorting to be unique, these two are restricted to a total order and a strict total order, respectively.

Generally objects can be sorted based on a property. Such a component or property is called a sort key. For example, the items are books, the sort key is the title, subject or author, and the order is alphabetical. A new sort key can be created from two or more sort keys by lexicographical order. The first is then called the primary sort key, the second the secondary sort key, etc. For example, addresses could be sorted using the city as primary sort key, and the street as secondary sort key…….

…Basically, the small thing should come before the big thing. What the small thing is, depends on the situation.

Sometimes it is an integer. Sometimes it is a character. Sometimes it is a whole string of numbers, or even the addresses of houses.

Sometimes, we even put the big thing before the small thing. This is called sorting in descending order. By default though, when we sort, we sort in ascending order, because it makes a nice sound.



Types of sorting algorithms:

  1. Comparative sorting algorithms:

    these include (but are not restricted to) insertion, bubble, Shell, merge, heap and quicksort. In these types, every comparison compares entire keys with one another, and the algorithm rearranges the elements (either in-place or using auxiliary, i.e. extra, memory).
    It has been proved and explained that the lower bound for all comparison-based sorting algorithms is Ω(n.logn) in the worst (and average) case. There are modifications that can be made to comparison-based sorting algorithms so that they sort in O(n) time when the array is already sorted or reverse-sorted, but for a random input array, a comparison-based sorting algorithm will not run much better than Θ(n.logn).

  2. Non-comparative sorting algorithms:

    the major ones are bucket sort, counting sort, and radix sort. Here, we don’t compare the numbers directly…instead we do something else. All the non-comparative sorting algorithms use auxiliary memory, i.e. additional memory (other than our input array). All of them are similar to hashing, and most of them run in linear time. This is yet another example of the space-time tradeoff that exists in the development of algorithms.
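As a taste of the non-comparative family, here is a minimal counting sort sketch. (This bare-integer form just shows the idea and the extra memory; the stable, key-based form used inside radix sort adds prefix sums.)

```python
def counting_sort(a, k):
    """Sort integers in the range 0..k without comparing elements:
    count how many times each value occurs, then replay the counts.
    Time O(n + k), auxiliary space O(k)."""
    count = [0] * (k + 1)
    for x in a:
        count[x] += 1          # tally each value; no element comparisons
    out = []
    for value, c in enumerate(count):
        out.extend([value] * c)  # emit each value as many times as it occurred
    return out
```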

Space and time complexity of all sorting algorithms


I really encourage you to see the above link, which is about the space and time complexities of all the various sorting algorithms. It helps a lot to see how they stack up.

Also, I really, really recommend you see this short, informative, fun Youtube video about how fast different sorting algorithms will run in real time. Every algorithms student and programmer should see it. Here is also a playlist of all of them individually and slowed down.

Other than time and space complexity, there is one more criterion that you need to consider while choosing which sorting algorithm to use…





Now, an important type of sorting algorithms are stable sorting algorithms, i.e. they maintain the relative order of objects with equal keys. A key is the parameter of the objects with respect to which you wish to sort them.

For example, suppose you are given a list of students, presorted by roll-call number, and we now sort them by name (first name only). Then the Alice who is #14 will come before the Alice who is #27, and Robert Ludlum, who is #60, will come after Robert Stark, who is #35 (remember, we are sorting by first name only), because that is the relative order they were in before we decided to sort the children by name.
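The classroom example can be demonstrated directly, since Python’s built-in sort is stable (it is Timsort under the hood):

```python
# Students as (roll number, first name), presorted by roll number.
students = [(14, "Alice"), (27, "Alice"), (35, "Robert"),
            (42, "Zoe"), (60, "Robert")]

# Sorting by first name only: equal names keep their roll-number order,
# because sorted() is stable.
by_name = sorted(students, key=lambda s: s[1])
```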


Stability is more important than you think.

Tower of Hanoi: advanced mode.

Take another example:

You are buying a new cellphone. You check out various candidates, weigh different features and operating systems, but in the end it all comes down to one thing: money.

You make a table of all your possible candidates with their respective prices. Initially, you’d probably make a list that looks like this:

Samesung DS4 $530
Samesung NotDifferentAtAll
Samesung Not
Samesung Not3
Samesung DS5
macromin Easel $270
Mapple myPhone $1450
Mapple myPhone 5
Mapple myPhone 6 $1150
Karcoal Uranium $330
Voogle Xenus 5
HTTPS Two $890
HTTPS Desire 982 $800
Okai Candella L432 – 320 uberpixel Zion Lens $1000
Blueberry 3.1416 $490

As you can see, these are all grouped by device manufacturer, but they are not sorted otherwise.

Suppose you want to sort this data by price. You could use a stable or unstable sorting algorithm to do this, and you would get a table like this:

macromin Easel
Karcoal Uranium
Blueberry 3.1416
Samesung DS4 $530
Samesung DS5
Voogle Xenus 5 $650
Samesung Not
Samesung Not3
Samesung NotDifferentAtAll
HTTPS Desire 982
Mapple myPhone 5
Okai Candella L432 – 320 uberpixel Zion Lens
Mapple myPhone 6
Mapple myPhone $1450

Now, suppose you want to group them back by company name, but you don’t want them to be jumbled back into how they were in the first list. In other words, you still want them to be sorted by price. This means when you sort the second time, you need to make sure that phones with the same company name retain their relative positions. Meaning:

When sorting with respect to multiple keys in succession, stable sorting algorithms are required.
We first sort with respect to the least important parameter, and then successively the more important parameters.

Our need would require you to run two sorts in succession: first by price (the less important key) and next by company name (the more important one), to get an arrangement like this:

Blueberry 3.1416
HTTPS Desire 982
Karcoal Uranium
macromin Easel $270
Mapple myPhone 5
Mapple myPhone 6 $1150
Mapple myPhone
Okai Candella L432 – 320 uberpixel Zion Lens
Samesung DS4
Samesung DS5
Samesung Not
Samesung Not3
Samesung NotDifferentAtAll
Voogle Xenus 5

You can tell which keys are considered most important by looking at which are the broadest categories: company name, for example, is the broadest here. Only in case of a conflict within company name do we need to fall back on further keys. Keeping data sorted with respect to multiple keys helps searching tremendously, as we can use binary search on each key in turn. Companies like Amazon allow you to shop for items in an entirely similar way.

Using stable sorting algorithms hence makes searching faster.
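The two-pass trick above can be demonstrated with Python’s stable `list.sort` (using only the phones whose prices the table actually lists):

```python
phones = [
    ("Samesung", "DS4", 530), ("macromin", "Easel", 270),
    ("Mapple", "myPhone", 1450), ("Mapple", "myPhone 6", 1150),
    ("Karcoal", "Uranium", 330), ("HTTPS", "Two", 890),
    ("HTTPS", "Desire 982", 800),
]
# First sort by the LESS important key (price)...
phones.sort(key=lambda p: p[2])
# ...then by the MORE important key (company). Because .sort() is stable,
# phones of the same company stay ordered by price.
# (Plain string comparison puts "macromin" last: ASCII uppercase letters
# sort before lowercase ones.)
phones.sort(key=lambda p: p[0])
```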

Stable sorting algorithms also have more uses.

  • Radix sort, for example, completely relies on stability to sort (go ahead, try to run Radix while purposely not maintaining stability. The data will never end up sorted). So do Bucket sort and Proxmap sort, as well as many more non-comparative sorting algorithms.
  • More than that, we usually like stable sorting algorithms because they make sense, and when they are executed one after the other, using different keys, you get a unique and useful result. Suppose you have a list of folders on your computer and you sort them first by ‘name’, and then by ‘date modified’. At the top you will get your most recently modified files/folders, not in some random order, but neatly arranged by name.
  • Another important thing about stable sorting algorithms is that you can run them on linked lists.

 Here is a list of all the known stable sorting algorithms. They are mostly varieties of insertion sort, bubble sort, merge sort, radix and bucket sort.

 Conclusion: sorting may be hard, but it is important. And sometimes fun, too.

I am now going to break up my notes on sorting algorithms into three parts, based on what I think are the important categories they fall into, in three separate posts:

  • Stable Comparative sorts
  • Unstable Comparative sorts
  • Non-comparative sorts

Let’s get on with part 1…