I started this blog for those interested in computer science, and coming from that background, I expect you’ll be interested in how an algorithm’s complexity really makes a difference. I also expect that you know what I’m talking about when I say complexity. If you don’t, it would really be a bother to have to explain it all over again, but luckily I don’t have to!
Here are a few links from Wikipedia and Cornell University to get you started.
Note: this table contains exact values, but really only the order of magnitude matters, because the same algorithm might be coded in different ways, by different programmers, and hence the exact function of ‘n’ which describes the algorithm’s running time may differ from code to code. However, if the method of approaching the problem in all of these codes is the same, the asymptotic notations O(n), O(n.log n) etc. will be the same as well.
Observing function values at various values of ‘n’:
| n | n! | 2^n | n^3 | n^2.log(n) [base 2] | n^2 | n.log(n) [base 2] | n | log(n) [base 2] |
|---|---|---|---|---|---|---|---|---|
| 21 | **5.1091 E19** | **2.097 E6** | 9,261 | 1,937.012 | 441 | 92.239 | 21 | 4.392 |
| 28 | 3.048 E29 | 2.684 E8 | 21,952 | 3,768.966 | 784 | 134.606 | 28 | 4.807 |
| 36 | 3.72 E41 | 6.872 E10 | 46,656 | 6,700.223 | 1,296 | 186.117 | 36 | 5.17 |
| 45 | 1.962 E56 | 3.5124 E13 | 91,125 | 11,121.003 | 2,025 | 247.133 | 45 | 5.492 |
| 55 | 1.26964 E73 | 3.603 E16 | 166,375 | 17,488.613 | 3,025 | 317.975 | 55 | 5.781 |
| 66 | 5.443 E92 | 7.739 E19 | 287,496 | 26,329 | 4,356 | 398.93 | 66 | 6.044 |
| 78 | 1.132 E115 | 3.022 E23 | 474,552 | 38,240 | 6,084 | 490.26 | 78 | 6.285 |
| 91 | 1.352 E140 | 2.476 E27 | 753,571 | 53,891 | 8,281 | 592.21 | 91 | 6.508 |
| 105 | 1.081 E168 | 4.056 E31 | **1.157 E6** | 74,024 | 11,025 | 704.996 | 105 | 6.714 |
| 120 | 6.69 E198 | 1.329 E36 | 1.728 E6 | 99,459 | 14,400 | 828.83 | 120 | 6.907 |
| 153 | 2.0 E269 | 1.142 E46 | 3.58 E6 | 169,889 | 23,409 | 1,110.4 | 153 | 7.257 |
| 190 | 9.68 E351 | 1.569 E57 | 6.859 E6 | 273,271 | 36,100 | 1,438.3 | 190 | 7.570 |
| 210 | 1.06 E398 | 1.6455 E63 | 9.261 E6 | 340,198 | 44,100 | 1,620 | 210 | 7.714 |
| skip +21 to +23 | .. | .. | .. | .. | .. | .. | .. | .. |
| 300 | 3.0606 E614 | 2.037 E90 | 2.7 E7 | 740,793 | 90,000 | 2,468 | 300 | 8.229 |
| 351 | 4.3 E742 | 4.59 E105 | 4.324 E7 | **1.042 E6** | 123,201 | 2,968 | 351 | 8.455 |
| skip +27 to +30 | … | … | … | … | … | … | … | … |
| 496 | 2.0 E1123 | 2.05 E149 | 1.22 E8 | 2.203 E6 | 246,016 | 4,441 | 496 | 8.954 |
| skip +32 to +35 | … | … | … | … | … | … | … | … |
| 666 | 1.0106 E1593 | 3.062 E200 | 2.954 E8 | 4.16 E6 | 443,556 | 6,246 | 666 | 9.379 |
| skip +37 to +39 | … | … | … | … | … | … | … | … |
| 820 | 1.15 E2035 | 7.0 E246 | 5.5139 E8 | 6.5085 E6 | 672,400 | 7,937 | 820 | 9.6795 |
| skip +41 to +44 | … | … | … | … | … | … | … | … |
| 1035 | 7.5 E2672 | 3.682 E311 | 1.109 E9 | 1.073 E7 | **1.071 E6** | 10,365 | 1035 | 10.0154 |
| skip +46 to +54 | …. | …. | …. | …. | …. | …. | …. | …. |
| 1540 | 9.2 E4241 | 3.86 E463 | 3.652 E9 | 2.51 E7 | 2.372 E6 | 16,306 | 1540 | 10.589 |
| skip +56 to +68 | …. | …. | …. | …. | …. | …. | …. | …. |
| 2415 | 1.0 E7123 | 9.7 E726 | 1.4084 E10 | 6.554 E7 | 5.83 E6 | 27,139 | 2415 | 11.238 |
| skip +70 to +88 | …. | …. | …. | …. | …. | …. | …. | …. |
| 4005 | 1.9 E12,691 | 4.2 E1205 | 6.424 E10 | 1.92 E8 | 1.604 E7 | 47,930 | 4005 | 11.968 |
| skip +90 to +114 | ….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. |
| 6670 | 3.1 E22,612 | 7.4 E2007 | 2.9674 E11 | 5.65 E8 | 4.45 E7 | 84,732 | 6670 | 12.7035 |
| skip +116 to +140 | ….. | ….. | ….. | ….. | ….. | ….. | ….. | ….. |
| 10,011 | 2.865 E35,703 | 4.086 E3013 | 1.0033 E12 | 1.332 E9 | 1.002 E8 | 133,039 | 10,011 | 13.289 |
| 25,000 | 5.4 E99,093 | 5.6 E7525 | 1.5625 E13 | 9.131 E9 | 6.25 E8 | 365,241 | 25,000 | 14.61 |
| 100,000 | 2.8 E456,573 | 9.99 E30,102 | E15 | 1.661 E11 | E10 | **1.661 E6** | 100,000 | 16.61 |
| 250,000 | 3.0 E1,240,914 | 3.2 E75,257 | 1.5625 E16 | 1.121 E12 | 6.25 E10 | 4.483 E6 | 250,000 | 17.93 |
| E6 | 8.264 E(5,565,708) | 9.9 E(301,029) | E18 | 1.993 E13 | E12 | 1.993 E7 | **E6** | 19.932 |
I really love this table because it shows just how massive n! and 2^n really are in comparison with the other functions.
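If you want to play with these numbers yourself, a few lines of Python reproduce any row of the table (a quick sketch; the float values will round slightly differently in the last digit):

```python
import math

# Compute one row of the growth-rate table for a given n.
def growth_row(n):
    return {
        "n!": math.factorial(n),
        "2^n": 2 ** n,
        "n^3": n ** 3,
        "n^2.log(n)": n ** 2 * math.log2(n),
        "n^2": n ** 2,
        "n.log(n)": n * math.log2(n),
        "n": n,
        "log(n)": math.log2(n),
    }

row = growth_row(21)
print(round(row["n.log(n)"], 3))   # 92.239, matching the table
```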
I’ve also put in bold the first value in each column where the number of steps required for that function exceeds a million:
- For n^2, it happens when the input is around a thousand. For n.log(n) (base 2), it is around a hundred thousand. But for 2^n it happens at n=20, and for n!, at n=10. This just goes to show why complexity is key in writing your algorithms. There is a set of problems called NP-complete problems (the Travelling Salesman problem is a famous one) for which no polynomial-time algorithm is known; even with dynamic programming, the best exact algorithms take O(2^n) or O(n!) time. Whether they can be solved in polynomial time is the million-dollar P vs NP question, and it has been proven that a polynomial-time solution to any one of them would indirectly solve all the others.
- For log(n) (base 2), the crossing point isn’t shown, but basic logarithm maths should point you to the last row of the 2^n column. See that number (9.9 E301,029)? That’s how many inputs we must have before an O(log n) program takes more than a million steps.
Putting that into perspective, the distance from the Earth to the Sun is around 1.5 E23 ….. picometers.
This gives you an idea of just how efficient binary search, inserting into a binary search tree, etc., really are. If we have 100 billion sorted integers (which, at 4 bytes each, will take about 400 GB of hard drive space), binary search will find the right number in just 37 steps.
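To see those 37 steps without actually storing 100 billion numbers, here is a small binary search sketch that counts its probes; the step count for the huge case comes straight from the logarithm:

```python
import math

# Binary search over a sorted list, counting how many probes it makes.
def binary_search(arr, target):
    lo, hi, steps = 0, len(arr) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid, steps
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

# 1,000 sorted even numbers; the target sits at index 617.
idx, steps = binary_search(list(range(0, 2000, 2)), 1234)
print(idx, steps)   # 617 10

# For n = 100 billion, no list needed, just the logarithm:
print(math.ceil(math.log2(100_000_000_000)))   # 37
```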
- The simple logarithmic function is just about as close to O(1) as you can get, which is why a binary search tree is so very, very efficient at finding an element, inserting an element, deleting an element, etc. The heap, the structure most commonly used to implement heap-sort (an O(n.log(n)) sorting algorithm), is also a binary tree.
In fact, there are very few applications in which the cost of an operation on a balanced Binary Search Tree exceeds O(n). That’s why the BST is my favorite data structure.
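A minimal (unbalanced) BST sketch shows why: search and insert just walk one root-to-leaf path, so the cost is the height of the tree, which stays around log(n) as long as the tree stays balanced:

```python
# A minimal BST sketch: each operation walks one root-to-leaf path.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    steps = 0
    while root is not None:
        steps += 1
        if key == root.key:
            return True, steps
        root = root.left if key < root.key else root.right
    return False, steps

root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
print(search(root, 60))   # (True, 3): visits 50, then 70, then 60
```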
I’ve often wondered about whether it’s worthwhile making a ternary search tree, so I drew up a graph using the great graphing tool, Desmos graphing calculator.
In the big picture (see below), you can see what I found: it doesn’t make much difference which base you use; logarithmic time is pretty similar.
Maths note: log(2) ~=0.3 (base 10)
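That maths note is really the change-of-base rule in disguise; a couple of lines make the constant-factor point concrete:

```python
import math

# Change of base: log3(n) = log2(n) / log2(3), so switching from a
# binary to a ternary tree only shaves a constant (~1.585) off the depth.
n = 1_000_000
print(math.log2(n))                    # ~19.93 levels, binary
print(math.log(n, 3))                  # ~12.58 levels, ternary
print(math.log2(n) / math.log(n, 3))   # ~1.585 = log2(3), for any n
```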
I think I’ve made my point, but if you’re still unconvinced, check this out:
You probably know that merge and heap sort are so very efficient with their O(n.log(n)) worst-time complexity, but I think the numbers do the real talking:
- To reach a million steps, O(n.log(n))-complexity merge/heap sort requires an input of n=100,000 elements.
- Bubble/insertion/quick sort with O(n^2) complexity require only an input of n=1,000 to reach 1 million steps.
That’s a hundredfold gap . . . but then you notice that non-comparative sorts like radix, bucket and counting sort would sort in O(n), i.e. time proportional to the number of elements in the input array . . . whereas it has been proven that comparison-based sorting algorithms can do no better than O(n.log n).
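To make that O(n) claim concrete, here is a counting sort sketch; it assumes the keys are small non-negative integers, which is exactly the trade-off non-comparative sorts make:

```python
# Counting sort: a non-comparative O(n + k) sort, assuming the keys
# are small non-negative integers (k = the largest key).
def counting_sort(arr):
    if not arr:
        return []
    counts = [0] * (max(arr) + 1)
    for x in arr:                        # one O(n) pass to tally keys
        counts[x] += 1
    out = []
    for value, c in enumerate(counts):   # one O(k) pass to rebuild
        out.extend([value] * c)
    return out

print(counting_sort([4, 1, 3, 1, 0, 4, 2]))   # [0, 1, 1, 2, 3, 4, 4]
```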
So why do the other sorting algorithms even exist?
It’s worth noting that even the best O(n.log(n)) comparative and O(n) non-comparative sorts have a space complexity of O(n), as compared to the O(1) space complexity of sorting algorithms like bubble and insertion sort, which take O(n^2) time. Also, they all have their own special sets of weaknesses.
Which is pretty bad. I mean, okay, all sorting algorithms must take at least O(n) space, to store their data structure of ‘n’ elements, but the auxiliary (i.e. ‘extra’) space required for most O(n.log(n)) and O(n) sorts is also O(n). A recursive implementation eats stack space too, because the previous states are saved on the system’s function call stack: about O(log n) frames for a balanced recursion like merge sort, and up to O(n) frames for quicksort in its worst case.
The exception is heap-sort, where if you build the heap in-place (i.e. you modify the original array you are given), the space-complexity becomes O(1) i.e. you only need a fixed number of variables required to implement the sort, and your time-complexity doesn’t exceed O(n.log(n)) steps.
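Here is what that in-place heap-sort looks like as a sketch: the array itself becomes the heap, and the only extra storage is a handful of index variables:

```python
# In-place heapsort: build a max-heap inside the array, then repeatedly
# swap the maximum to the end. O(n.log n) time, O(1) auxiliary space.
def sift_down(a, start, end):
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        if child + 1 <= end and a[child] < a[child + 1]:
            child += 1                    # pick the larger child
        if a[root] < a[child]:
            a[root], a[child] = a[child], a[root]
            root = child
        else:
            return

def heapsort(a):
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):   # heapify in place, O(n)
        sift_down(a, start, n - 1)
    for end in range(n - 1, 0, -1):           # n extractions, O(log n) each
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end - 1)

nums = [5, 1, 9, 3, 7, 2]
heapsort(nums)
print(nums)   # [1, 2, 3, 5, 7, 9]
```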
So my Completely Unrequested Recommendation (CUR) is: use Heap Sort! 😛
In actuality, there exist so many different sorting algorithms mostly because when you consider things like the average case, the limitations of different systems, and functionality…
We need to use different sorts in different situations.
Quicksort may have quadratic time…however, that is only in the worst case. In the average case, it is O(n.logn). Heapsort, meanwhile, will be O(n.logn) always, because of the indirect, heap-based route it takes towards sorting an array. In real life, quicksort is used very often; Java’s Arrays.sort(), for example, uses a dual-pivot quicksort for primitive types (and a stable merge-sort variant for objects).
Then there’s stability.
Merge, Radix, bubble and insertion are stable, but Heap sort is not. So for specific applications, you need a different kind of sort.
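To see stability in action: Python’s built-in sort (Timsort, a merge-sort derivative) keeps records with equal keys in their original input order:

```python
# Stability: equal keys keep their input order. Python's sorted()
# (Timsort, a merge-sort derivative) guarantees this.
records = [("apple", 3), ("pear", 1), ("plum", 3), ("fig", 1)]
by_count = sorted(records, key=lambda r: r[1])
print(by_count)
# [('pear', 1), ('fig', 1), ('apple', 3), ('plum', 3)]
# 'pear' stays before 'fig', and 'apple' before 'plum', as in the input.
```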
And another thing:
People keep finding new ways to improve them.
For example: Quicksort gives an average-case complexity of O(n.log(n)) (base 2), which is optimal, and achievable by selecting your ‘pivot’ well. Then someone came along and realized that if you choose your pivot as the median of your first, last and middle elements, you get slightly better results. And if you select your pivot randomly, it gets safer still, as you would foil the plans of someone who purposely wanted to hobble your system by feeding it a worst-case input every time (it’s happened…people are weird).
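A sketch of that median-of-three trick (note that this list-building version trades away the in-place property for clarity; picking the pivot randomly instead is the defence against adversarial inputs):

```python
# Quicksort with a median-of-three pivot. This sketch builds new lists
# for clarity, so it is not in-place like a production quicksort.
def quicksort(a):
    if len(a) <= 1:
        return a
    # Median of the first, middle and last elements as the pivot.
    first, mid, last = a[0], a[len(a) // 2], a[-1]
    pivot = sorted([first, mid, last])[1]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([8, 3, 5, 1, 9, 2, 7]))   # [1, 2, 3, 5, 7, 8, 9]
```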
Still, if you know anything about algorithms, you know that calculating average case correctly is as difficult as teaching a dog how to play a piano . . . and not half as rewarding!
Welp, that’s all folks! I’m going to be trying to leaf around with all the concepts of trees in my next post, so keep your thumbs pointed to the stars and your browsers open!