Dynamic Programming [1]: Little steps

Let’s start this post with a perfectly reasonable assumption:

You are a rabbit farmer, and on your rabbit farm, you have rabbits.

That’s good. That’s a good direction. Yeah.

Rabbits, as you know, multiply. You started with two, but after half a year, you have somewhere north of thirty of the furry little munchkins. Being a diligent farmer, you must calculate how many rabbits you will have after a few more months (in order to stock up on things such as rabbit feed, poop-scoops and order slips from mystic tribesmen).

You started with two, but you noticed a pattern: after every month, the number of new rabbits you have is the total number of rabbits you had two months ago. In other words, your new total number of rabbits is the sum of the last two months’ totals: 2, 3, 5, 8, 13, 21, 34, 55… and if you know anything about maths, you’ll see where this thread of thought is going.

My first experience with the Fibonacci sequence occurred in a wonderful little book series named Murderous Maths, a few of which I still own today and cannot recommend highly enough (except by saying that it tickled my grey matter). In the book (I forget which one) a bunch of British characters all find themselves in different British situations, all of which revolve around the number sequence 0, 1, 1, 2, 3, 5, 8, 13, 21 etc. One of them, Rodney, is actually farming rabbits. I thought it was a hoot at the time; but, as I found out while researching this article, the author of Murderous Maths had taken his cues from history: Fibonacci himself posed a rabbit-breeding puzzle in his book Liber Abaci, which is what led to the now-famous series.

It also led to the very famous golden ratio:
\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.61803\,39887\cdots

Divide any Fibonacci number by the one before it, and you should approximately get this ratio (the approximation gets better as the numbers grow).

All of this is mostly immaterial, except that the Fibonacci sequence seems to be the most basic introduction to dynamic programming that anyone can think of.

Dynamic programming is (for want of better words) everywhere. Seriously. Go just about anywhere on the web and they’ll tell you how it relates to the Fibonacci sequence [basically, you should be smart enough to know that: to calculate the next month’s number of rabbits as time progresses, you don’t need to start from scratch every time; you can re-use your old results].
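To make that concrete, here’s a minimal sketch in Python (the seeds 2 and 3 are just our farm’s first two monthly totals, an illustrative assumption): each month we keep only the last two totals and re-use them, instead of recomputing the whole history from scratch.

def rabbits_after(months):
    prev, curr = 2, 3                     # totals for the last two months
    for _ in range(months):
        prev, curr = curr, prev + curr    # this month's total = sum of the last two
    return prev

print([rabbits_after(m) for m in range(8)])   # [2, 3, 5, 8, 13, 21, 34, 55]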

I don’t think that beating that horse will do any good. So I’m going to do a slightly harder problem, one that is not as immediately intuitive: calculating binomial coefficients.

The binomial theorem (which I’m sure you recall in perfect clarity) is used for many things: one of which is to find the coefficients of the terms in the expansion of (x+y)^n. The r’th coefficient is given by:

\binom{n}{r} = \frac{n!}{r!\,(n-r)!}

This is also the number of ways in which you can select ‘r’ items out of ‘n’ items; e.g. if you have 10 balls labelled A, B, C, D…, in how many ways can you select any 3 balls? The answer is 10C3 = 10!/(3!·7!) = 120 ways (more than you’d expect).

If you’re particularly mathsy, you might even know that these coefficients are the terms of Pascal’s triangle, with n = 0, 1, 2, … being the distances of the rows from the top of the triangle.

Pascal’s triangle: square version and party version.

Take a look at the second figure, and you will find it tells you many useful things: the third column goes 1, 3, 6, 10, 15, … which are (1), (1+2), (1+2+3), (1+2+3+4), (1+2+3+4+5) etc., i.e. the triangular numbers. The fourth column? Tetrahedral numbers. The fifth row? Treat the entries as digits of a single number (carrying wherever an entry has two digits) and you get (11)^5; the same holds for every row. The sum of the numbers of the n’th row is 2^n. The diagonals, if you notice, are the same as the columns. There’s a lot to be learned from this triangle, and calculating it efficiently is an important job.
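If you’d rather check those claims than take my word for them, here’s a quick sketch (Python’s math.comb generates the row; numbering rows from zero is my convention):

from math import comb

n = 5
row = [comb(n, k) for k in range(n + 1)]   # the n'th row of the triangle
print(row)                                 # [1, 5, 10, 10, 5, 1]
print(sum(row) == 2**n)                    # row sums are powers of two: True

# The (11)^n claim, with carrying: treat each entry as a base-10 "digit".
value = 0
for entry in row:
    value = value * 10 + entry             # two-digit entries carry automatically
print(value == 11**n)                      # True (161051 on both sides)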

A job for….Dynamic Programming.

(I remember, the coffee was fantastic)

Dynamic Programming involves re-using old results. Take a look at the second Pascal’s triangle again. You can calculate any row from the row directly above it: the j’th element of a row is the sum of the j’th and (j-1)’th elements of the previous row, for 0 < j < n (and the two ends of every row are 1).

So, you can use dynamic programming to calculate all the binomial coefficients this way. This is worth doing, because calculating n!/((n-r)!·r!) directly can easily go out of bounds for data-type ranges (e.g. 100! = 9.332622E+157).

I’m sure it must be obvious what to do:

Binom_coeff_tab(n, r):    //tab for tabulation; returns nCr
    int pascals_tri[n+1][n+1]    //assume these are all zeroed-out
    pascals_tri[0][0] = 1
    for (i=1; i <= n; i++)        //rows
        for (j=0; j <= i; j++)    //columns
            if (j==0 || j==i)
                pascals_tri[i][j] = 1
            else
                pascals_tri[i][j] = pascals_tri[i-1][j-1] + pascals_tri[i-1][j]
    return pascals_tri[n][r]    //nCr, the coefficient we want
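Since the timings later in this post come from Python, here’s the same tabulation as runnable Python (a direct translation of the pseudocode above; the names are mine):

def binom_coeff_tab(n, r):
    # build Pascal's triangle bottom-up, row by row
    tri = [[0] * (n + 1) for _ in range(n + 1)]
    tri[0][0] = 1
    for i in range(1, n + 1):              # rows
        for j in range(i + 1):             # columns
            if j == 0 or j == i:
                tri[i][j] = 1
            else:
                tri[i][j] = tri[i-1][j-1] + tri[i-1][j]
    return tri[n][r]

print(binom_coeff_tab(10, 3))   # 120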

QED, you say. And it is. But the point of this was to learn Dynamic Programming, so you’re going to have to learn how to use recursion. You may remember recursion: that odd beast which enchants you with its utter simplicity. You fall in love with its ability to turn the most complicated problems into ten lines of code… until you try to write one on your own. Then, you hate it with a vengeance, as all humans hate what they do not understand.

Monologue aside, recursion has a big part to play in DP; all DP-related subproblems are defined most succinctly in recursive form. But Haskell-lovers should put down their glasses of champagne. It turns out that though almost all DP problems are defined recursively, they are often not implemented that way. In the broad sense, there are two ways of going about Dynamic Programming problems:

  1. Memoization:

    This is a purely recursive, top-down approach. It’s also called caching, and, in general, it uses less space than the next method. In memoization, whenever we need the solution to a subproblem, we first look into a lookup table (I say table, but really only a few values are filled). If the precomputed value is there, we return it; otherwise we calculate the value and put the result in the lookup table so that it can be re-used later. Memoization calculates values on-demand. This means that, computationally, it should be faster, as you only calculate the values you need. However, there may be overheads, and the code is more complex. (There’s a short generic sketch of this pattern just after this list.)

  2. Tabulation:

    This is a very straightforward, bottom-up approach. You start from the base case and iteratively build an n-dimensional table of all the values computed so far. It computes values for all the sub-problems. Based on the results in the table, the solution to the “top” / original problem is then computed. It is usually faster than memoization because of the smaller overhead, and it’s more predictable, too. But you do have to calculate a lot of unnecessary values.

(source for the above)
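To make the pattern concrete, here’s that generic memoization sketch, run on our rabbit numbers from earlier (the dict-as-lookup-table and the seed values 2 and 3 are my choices; functools.lru_cache is the standard-library decorator that does the same bookkeeping for you):

from functools import lru_cache

memo = {0: 2, 1: 3}                  # base cases, pre-filled in the lookup table
def rabbits(month):
    if month not in memo:            # not precomputed? calculate and store it
        memo[month] = rabbits(month - 1) + rabbits(month - 2)
    return memo[month]               # either way, serve it from the table

@lru_cache(maxsize=None)             # same idea, with the caching done for us
def rabbits_cached(month):
    if month < 2:
        return (2, 3)[month]
    return rabbits_cached(month - 1) + rabbits_cached(month - 2)

print(rabbits(7), rabbits_cached(7))   # 55 55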

So, which do I use?

  • If all subproblems must be solved at least once, a bottom-up (i.e. tabulation) dynamic-programming algorithm usually outperforms a top-down memoized algorithm by a constant factor.

    • No stack overhead for recursion and less programming overhead for maintaining the table.
    • There are some problems for which the regular pattern of table accesses in the dynamic-programming algorithm can be exploited to reduce the time or space requirements even further.
  • If some subproblems in the subproblem space need not be solved at all, the memoized solution has the advantage of solving only those subproblems that are definitely required.
    Memoization can have tables too (as I’m about to show you). But memoization works in a top-down fashion, rather than bottom-up.

The binomial-coefficient code we saw above was a tabulation solution. Let’s try memoization now.

In DP, we define the problem as a recurrence relation between sub-problems. What is our problem? Get the binomial coefficient C(i, j), i.e. the coefficient of the j’th term of the i’th row of Pascal’s triangle. This happens to be iCj, by the way, i.e. i!/((i-j)!·j!).

We denoted C(i, j) as pascals_tri[i][j] in the code above. We noticed that pascals_tri[i][j] = pascals_tri[i-1][j-1] + pascals_tri[i-1][j]; let’s do the same with our binomial coefficient, and say:

C(i, j) = C(i-1, j-1) + C(i-1, j)    if 0 < j < i
C(i, j) = 1                          if j = 0 or j = i

This is our recurrence relation (the problem in terms of the subproblems). Now, if asked to calculate iCj, we run the following recursive program.

Binom_coeff_rec(i, j):
    if (j==0 or j==i)
        return 1
    else
        return Binom_coeff_rec(i-1, j-1) + Binom_coeff_rec(i-1, j)

This works. But it’s extremely slow, as the screenshot I’ve provided demonstrates.

Dynamic programming: tabulation vs. vanilla recursion.

As inputs, this program takes a number N and uses it for both arguments: i becomes N, and j becomes ⌊N/2⌋, which picks out the largest binomial coefficient in that row (this is just to demonstrate how it works).

As N becomes bigger, the simple recursion becomes VERY slow, while tabulation performs fairly well. This is because the simple recursive program runs in exponential time, i.e. O(2^n), whereas the tabulation runs in O(n^2).

To put that in perspective, 30^2 is 900, whereas 2^30 is ~1 billion.

(Not gonna say it, not gonna say it….)

Note: the measured times won’t reflect the two algorithms’ complexity ratio exactly, because of the Python environment (and other things like caching); but by comparing the times taken for different values of N, you can tell that the algorithms scale exponentially and quadratically, respectively.
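If you’d like to reproduce the scaling rather than squint at my screenshot, here’s a rough sketch of the experiment (the exact times will vary by machine; math.comb is only there as a sanity check):

import time
from math import comb

def binom_rec(i, j):
    # vanilla recursion: no results are re-used
    if j == 0 or j == i:
        return 1
    return binom_rec(i - 1, j - 1) + binom_rec(i - 1, j)

for n in (16, 20, 24):
    start = time.perf_counter()
    result = binom_rec(n, n // 2)
    elapsed = time.perf_counter() - start
    assert result == comb(n, n // 2)   # sanity check
    print(n, f"{elapsed:.3f}s")        # time grows roughly 16x per step of 4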

So, simple recursion, while top-down, is quite terrible. That’s because we haven’t memoized our results, and we keep re-calculating them. Let’s try this again, but with the following memoization code:

Binom_coeff_mem(table, i, j):
    if (j==0 or j==i)
        table[i][j] = 1
    else
        if (table[i-1][j-1]==0)    table[i-1][j-1] = Binom_coeff_mem(table, i-1, j-1)
        if (table[i-1][j]==0)      table[i-1][j] = Binom_coeff_mem(table, i-1, j)
        table[i][j] = table[i-1][j-1] + table[i-1][j]
    return table[i][j]

Binom_coeff_mem_driver(i, j):    //driver program for the above recursive function
    int table[i+1][i+1]    //(i+1)×(i+1) so indices 0..i are valid; assume all zeroed-out
    return Binom_coeff_mem(table, i, j)
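In runnable Python, the same memoization looks like this (a direct translation; I’ve used None instead of 0 as the ‘not yet computed’ sentinel, which is a touch cleaner):

def binom_coeff_mem(table, i, j):
    if table[i][j] is None:                    # not precomputed: fill it in
        if j == 0 or j == i:
            table[i][j] = 1
        else:
            table[i][j] = (binom_coeff_mem(table, i - 1, j - 1)
                           + binom_coeff_mem(table, i - 1, j))
    return table[i][j]

def binom_coeff_mem_driver(i, j):
    table = [[None] * (i + 1) for _ in range(i + 1)]
    return binom_coeff_mem(table, i, j)

print(binom_coeff_mem_driver(10, 3))   # 120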

I ran this program, and, as expected, it was much, much faster than the simple recursive one. In fact, it is expected to run faster than the tabulation (iterative) one, because it does not need to calculate all the values!

To give you some intuition about why this happens, I added a line of code to the program that prints the entire table after each stage:

Dynamic Programming: tabulation vs. memoization

As you can see, the tabulation algorithm has to calculate ALL the values of ALL the rows before getting to the one it wants. The upside is that the code is very simple and quite fast. But the memoized code is even faster, because it only has to calculate the values it wants, i.e. the ones along the diagonal.

I know, the picture says that the memoized code is actually slower than the tabulated one… by a lot. This is because I’ve taken fairly small numbers, and so the overhead of calling the function recursively is taking its toll. However, I tried it again with larger numbers, and the result was as expected: memoization beat tabulation (though not by much).

The numbers get massive… imagine computing them with factorials!

Properties of Dynamic Programming:

Something you might have been wondering: why doesn’t the recursive method (or even the memoized one) blow out the computer’s stack? You’d expect it to, since for N=30, the plain recursion makes 2^30 = ~1 billion recursive function calls. The truth is that, while the number of function calls may be a LOT, they’re shallow: the maximum depth they can attain is N, because that’s the ‘depth’ of the table. Also, in the memoized version, most calls don’t even go one level deeper, because we’ve already memoized the result, so the call is just retrieving it.
This thought leads to an important feature of DP-problems:
  • Overlapping sub-problems:

    Dynamic Programming problems are all recursive in nature, as they all develop from some recurrence relation. However, not all recursive problems can use tabulation/memoization: divide-and-conquer algorithms such as quicksort and mergesort do not fall under the category of DP because the subproblems they generate are all distinct. When quicksort comes back to the same array indices the next time, the elements there aren’t the same, so the problem is not the same, and the memoized solution to the old problem is of no use. In other words, the subproblems do not overlap.
    A good sign that you should use Dynamic Programming is a small subproblem space: you keep having to calculate the same answers again and again, so you can save time by storing them.
    Looking at the recursion tree of a problem, you can instantly tell if it has overlapping subproblems, because the same values keep re-appearing in the branches. In the pruned recursion tree (obtained by using Dynamic Programming), you can see just how stark the difference is.

  • Optimal substructure:

    If the optimal solution to a problem can be constructed from the optimal solutions of its subproblems (i.e. smaller instances of the same problem), then it’s a strong hint that Dynamic Programming should be used.
    Optimal substructure is what we’re aiming for when generating the recurrence relation: we want one that solves the problem optimally by using the optimum of the smaller problems.

Source for the above: CLRS


As an example of the above two properties, consider the Rod-Cutting problem:

Rod-Cutting:

You are supposed to cut a metal rod into smaller rods. Provided is a reference table which tells you the price of a rod piece of length ‘X’; e.g. a rod of length 1 inch is worth $1, a rod of length 2 inches is worth $1.5, 3 inches might be $4.5, and so on.
Given a rod of finite length L, and assuming that every cut is free, how do you divide the rod so as to maximize your profit?

The total number of possible ways to cut the rod is exponential: a rod of length N has N-1 positions at which you could cut it, and each position is independently cut or left alone, which gives 2^(N-1) possible combinations of cuts.

As we want to transform exponential running time into polynomial, the obvious thing to do is use Dynamic Programming.

Let’s review the problem iteratively:

  • The base case: if you have a rod of length zero, your maximum profit is zero.
  • Check out the smallest subproblem possible: you have a rod of one inch. There’s only one way to ‘cut’ it: into a rod of one inch, with zero cuts (cuts are free anyway).
  • If you have a rod of two inches, you have two choices: cut it into two pieces (1”, 1”), or leave it as one piece (2”). Depending on the reference table, you choose the alternative that maximizes the profit you get from that.
  • Now, suppose you have a 3-inch rod (yes, okay, I heard it). There are multiple ways to cut it: three pieces of lengths (1”, 1”, 1”), two pieces (1”, 2”), or one piece (3”). If you look carefully, these overlap with the previous problems; your choices are really: keep it as 3”; or set aside a 1” piece and re-use your previously calculated value of the best possible cutting of a 2” rod (i.e. either (1”, 1”) or (2”)); or set aside a 2” piece and re-use the best cutting of a 1” rod. Again, select the maximum of these choices as “the most optimal way of cutting a rod of length 3 inches”.
  • Go onto four inches, and the scenario changes, and the algorithm starts to emerge.
    • You can either keep it as 4”
    • You could set aside a 1” piece, and re-use the best way to cut a 3” rod.
    • You could set aside a 2” piece, and re-use the best way to cut a 2” rod.
    • You could set aside a 3” piece, and “re-use” the best way to cut a 1” rod (i.e. your base case)
      Thus, you find yourself with four options, and choose the best one.
  • 5”: You can see the algorithm in play:
    max( cost(5”),  cost(1”)+bestCost(4”),  cost(2”)+bestCost(3”),  cost(3”) + bestCost(2”), cost(4”) + bestCost(1”) )

This is all a bit weird-looking, so we’ll generalize it as a recurrence relation:

Rod-cutting:

bestCost(i) = max( cost(j) + bestCost(i - j) ),    for 1 <= j <= i

where bestCost( 0 ) = 0 is the base case.

The code for Rod-Cutting would be as follows:

bestCost_driver():
    n = 8
    int saved_bestCosts[n]    //assume zeroed-out
    price_array = { 1, 5, 8, 9, 10, 17, 17, 20 }
    return bestCost(price_array, saved_bestCosts, n)    // output: 22

bestCost(price_array, saved_bestCosts, n):
    if (n==0)    return 0
    if (saved_bestCosts[n-1] != 0)
        return saved_bestCosts[n-1]    // the most optimal way to cut a rod of length n
    else    //haven't saved it, must calculate
        max_val = price_array[n-1]    //price of the n-inch rod, uncut (this might be the optimal)
        for (i=0; i < n; i++)
            max_val = max(max_val, price_array[i] + bestCost(price_array, saved_bestCosts, n-i-1))
        saved_bestCosts[n-1] = max_val    //save the result
        return max_val
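And as runnable Python (a direct translation of the pseudocode above, again with None as the ‘not yet solved’ sentinel; the names are mine):

def best_cost(prices, saved, n):
    # max revenue for a rod of length n; prices[k-1] is the price of a k-inch piece
    if n == 0:
        return 0
    if saved[n] is not None:                   # already solved this subproblem
        return saved[n]
    max_val = prices[n - 1]                    # option: sell the rod uncut
    for i in range(1, n):                      # option: cut off an i-inch piece
        max_val = max(max_val, prices[i - 1] + best_cost(prices, saved, n - i))
    saved[n] = max_val                         # save the result for re-use
    return max_val

prices = [1, 5, 8, 9, 10, 17, 17, 20]
n = 8
print(best_cost(prices, [None] * (n + 1), n))   # 22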


Problems where Dynamic Programming is used:

The classics include the 0/1 knapsack problem, longest common subsequence, edit distance, matrix-chain multiplication, and shortest-path algorithms like Bellman-Ford and Floyd-Warshall.

So there you have it: a small intro to Dynamic Programming.

Since DP covers such a wide range of problems, I’m not going to be doing them systematically; I’ll take a few related problems in separate posts.

About the author:

Abhishek Divekar,

is a big fan of English rock band the Arctic Monkeys, even though he does not understand any of their music videos. He’s also offering 2,000 Rupiah to anyone who can explain why Humbug is their favorite album of all time (to cover the cost of a get well soon card).
