The Sorting Problem.

BREAKING NEWS:

Genius mathematicians and Algorithms 101 students concur; sorting is “Not that easy”.

Well, I guess that clears things up.

Thanks for reading.

 One Abe in a bar is worth two Homers in a bush.

But seriously, there are a couple of things that make sorting really hard:

  • You can’t just get it done with in one go; you have to go through your array/tuple multiple times.
  • Every time you move an element, you must check that you don’t accidentally push it into the wrong place and overwrite some other data.
  • Worse, you have to get a machine to do it. And computers are really strange sometimes.
  • And you want your sort to be really fast, too.

These are all reasons why sorting is taught in Algo 101 classes and why brilliant mathematicians are still working on the problem.

It doesn’t mean we have managed to find one perfect way of doing it yet; sorting is still really difficult.

 

Okay, maybe not that difficult.


So, maybe it’s a good idea to start this post with why and how we sort data.

Disclaimer: this post begins with the following long, boring, definition on sorting.

Via Wikipedia:

The main purpose of sorting information is to optimise its usefulness for specific tasks. In general, there are two ways of grouping information: by category e.g. a shopping catalogue where items are compiled together under headings such as ‘home’, ‘sport & leisure’, ‘women’s clothes’ etc. (nominal scale) and by the intensity of some property, such as price, e.g. from the cheapest to most expensive (ordinal scale).

Richard Saul Wurman, in his book Information Anxiety, proposes that the most common sorting purposes are name, by location and by time (these are actually special cases of category and hierarchy). Together these give the acronym LATCH (Location, Alphabetical, Time, Category, Hierarchy) and can be used to describe just about every type of ordered information.

The opposite of sorting, rearranging a sequence of items in a random or meaningless order, is called shuffling.

For sorting, either a weak order “should not come after” or a strict weak order “should come before”, a strong order. For the sorting to be unique, these two are restricted to a total order and a strict total order, respectively.

Generally objects can be sorted based on a property. Such a component or property is called a sort key. For example, the items are books, the sort key is the title, subject or author, and the order is alphabetical. A new sort key can be created from two or more sort keys by lexicographical order. The first is then called the primary sort key, the second the secondary sort key, etc. For example, addresses could be sorted using the city as primary sort key, and the street as secondary sort key…….

…Basically, the small thing should come before the big thing. What the small thing is, depends on the situation.

Sometimes it is an integer. Sometimes it is a character. Sometimes it is a whole string of numbers, or even the addresses of houses.

Sometimes, we even put the big thing before the small thing. This is called sorting in descending order. By default though, when we sort, we sort in ascending order, because it makes a nice sound.

 

 

Types of sorting algorithms:

  1. Comparative sorting algorithms:

    these include (but are not restricted to) insertion, bubble, heap, Shell, merge, heap and quicksort. In these types, every comparison involves comparing entire numbers with one another, and rearranges the numbers (either in-place or using auxiliary i.e. extra memory).
    It has been proved and explained that the average lower-bound of all comparison-based sorting algorithms is Ω(n.logn). There are modifications that can be made to comparison-based sorting algorithms so that they sort in O(n) time when the array is sorted or reverse sorted, but for any random input array, a comparison-based sorting algorithm will not run much better than Θ(n.logn).

  2. Non-comparative sorting algorithms:

    the major ones are bucket sort, counting sort, and radix. Here, we don’t compare the numbers directly…instead we do something else. All the non-comparative sorting algorithms use auxiliary memory, i.e. additional memory (other than our input array). All of them are similar to hashing. And most of them run in linear timeThis is yet another example of the space-time tradeoff that exists in the development of algorithms.

Space and time complexity of all sorting algorithms

 

I really encourage you to see the above link, which is about the space and time complexities of all the various sorting algorithms. It helps a lot to see how they stack up.

Also, I really, really recommend you see this short, informative, fun Youtube video about how fast different sorting algorithms will run in real time. Every algorithms student and programmer should see it. Here is also a playlist of all of them individually and slowed down.

Other than time and space complexity, there is one more criterion that you need to consider while choosing which sorting algorithm to use…

 

 

 


Stability:

Now, an important type of sorting algorithms are stable sorting algorithms, i.e. they maintain the relative order of objects with equal keys. A key is the parameter of the objects with respect to which you wish to sort them.

For example, if you are given a selection of students, presorted by roll-call number. We now sort them by name (first name only). Then, Alice who is #14 will come before Alice who is #27. Robert Ludlum who is #60 will come after Robert Stark who is #35 (remember, we are sorting by first name only) because that is the relative order they were in before we decided to sort the children by name.

 

Stability is more important than you think.

Tower of Hanoi: advanced mode.

Take another example:

You are buying a new cellphone. You check out various candidates weigh different features and operating systems, but in the end it all comes down to one thing: money.

You make a table of all your possible candidates with their respective prices. Initially, you’d probably make a list that looks like this:

Samesung DS4 $530
Samesung NotDifferentAtAll
$750
Samesung Not
$650
Samesung Not3
$750
Samesung DS5
$630
macromin Easel $270
Mapple myPhone 6.0.0.0.2 $1450
Mapple myPhone 5
$900
Mapple myPhone 6 $1150
Karcoal Uranium $330
Voogle Xenus 5
$650
HTTPS Two $890
HTTPS Desire 982 $800
Okai Candella L432 – 320 uberpixel Zion Lens $1000
Blueberry 3.1416 $490

As you can see, these are all grouped by device manufacturer, but they are not sorted otherwise.

Suppose you want to sort this data by price. You could use a stable or unstable sorting algorithm to do this, and you would get a table like this:

macromin Easel
$270
Karcoal Uranium
$330
Blueberry 3.1416
$490
Samesung DS4 $530
Samesung DS5
$630
Voogle Xenus 5 $650
Samesung Not
$650
Samesung Not3
$750
Samesung NotDifferentAtAll
$750
HTTPS Desire 982
$800
HTTPS Two
$890
Mapple myPhone 5
$900
Okai Candella L432 – 320 uberpixel Zion Lens
$1000
Mapple myPhone 6
$1150
Mapple myPhone 6.0.0.0.2 $1450

Now, suppose you want to group them back into company name, but you don’t want them to be jumbled back into how they were in the first one. In other words, you still want them to be sorted by price. This means when you sort a second time, you need to make sure that phones with the same company name retain their relative positions. Meaning:

When sorting with respect to multiple keys in succession, stable sorting algorithms are required.
We first sort with respect to the least important parameter, and then successively the more important parameters.

Our need would require you to run two sorts in succession, first by price (the less important one) and next by company name (the more important one), to get an arrangement like this:

Blueberry 3.1416
$490
HTTPS Desire 982
$800
HTTPS Two
$890
Karcoal Uranium
$330
macromin Easel $270
Mapple myPhone 5
$900
Mapple myPhone 6 $1150
Mapple myPhone 6.0.0.0.2
$1450
Okai Candella L432 – 320 uberpixel Zion Lens
$1000
Samesung DS4
$530
Samesung DS5
$630
Samesung Not
$650
Samesung Not3
$750
Samesung NotDifferentAtAll
$750
Voogle Xenus 5
$650

You can tell which parameters are considered most important by looking at which are the broadest categories. Company name, for example, is the broadest here. Only in case of conflict of company name do we need to add more search parameters. Keeping data sorted with respect to multiple parameters helps searching tremendously, as we can use binary search on each parameter. Companies like Amazon etc. allow you to shop for items in an entirely similar way.

Using stable sorting algorithms hence makes searching faster.

Stable sorting algorithms also have more uses.

  • Radix sort, for example, completely relies on stability to sort (go ahead, try to run Radix by purposely not maintaining stability. The data will never be sorted). So does Bucket sort and Proxmap sort, as well as many more non-comparative sorting algorithms.
  • More than that, we usually like stable sorting algorithms because they make sense, and when they are executed one after the order, using different types of different keys, you get a unique and interesting result. Suppose, you have a list of folders on your computer and you sort them first by ‘name’, and then ‘date modified’. You will get at the top your most recently modified files/folders, not in some random order, but neatly arranged by name.
  • Another important thing about stable sorting algorithms is that you can run them on linked lists.

 Here is a list of all the known stable sorting algorithms. The are mostly varieties of insertion sort, bubble sort, merge sort, radix and bucket sort.


 Conclusion: sorting may be hard, but it is important. And sometimes fun, too.

I am now going to break up my notes on sorting algorithms into three parts, based on what I think are the important categories they fall into. They will be divided in three separate posts. They are:

  • Stable Comparative sorts
  • Unstable Comparative sorts
  • Non-comparative sorts

Let’s get on with part 1…

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s