Let’s start this post with a perfectly reasonable assumption:
You are a rabbit farmer, and on your rabbit farm, you have rabbits.
That’s good. That’s a good direction. Yeah.
Rabbits, as you know, multiply. You started with two, but after half a year, you have somewhere north of thirty of the furry little munchkins. Being a diligent farmer, you must calculate how many rabbits you will have after a few more months (in order to stock up on things such as rabbit feed, poop scoops and order slips from mystic tribesmen).
You started with two, but you noticed a pattern: after every month, the number of new rabbits you have is the total number of rabbits you had two months ago. In other words, your new total number of rabbits is the sum of the last two months’ totals: 2, 3, 5, 8, 13, 21, 34, 55 … and if you know anything about maths, you’ll see where this thread of thought is going.
My first experience with the Fibonacci sequence occurred in a wonderful little book series named Murderous Maths, a few of which I still own today and cannot recommend highly enough (except by saying that it tickled my grey matter). In the book (I forget which one) a bunch of British characters all find themselves in different British situations, all of which revolve around the number sequence 0, 1, 1, 2, 3, 5, 8, 13, 21 etc. One of them, Rodney, is actually farming rabbits. I thought it was a hoot at the time; but, as I found out while researching this article, the author of Murderous Maths had taken his cues from history: Fibonacci himself posed a rabbit-breeding puzzle (in his book Liber Abaci), which is what led to the now famous series.
It also led to the very famous golden ratio, φ = (1 + √5)/2 ≈ 1.618: divide any Fibonacci number by the one before it, and you should approximately get this value.
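A quick Python sketch of that convergence (the function name is my own):

```python
def fib_ratios(count):
    # Ratio of each Fibonacci number to the one before it.
    a, b = 1, 1
    ratios = []
    for _ in range(count):
        a, b = b, a + b
        ratios.append(b / a)
    return ratios

phi = (1 + 5 ** 0.5) / 2  # the golden ratio, ~1.618
print(fib_ratios(20)[-1])  # already very close to phi
```

Twenty terms in, the ratio agrees with φ to well beyond six decimal places.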
All of this is mostly immaterial, except that the Fibonacci sequence seems to be the most basic introduction to dynamic programming that anyone can think of.
Dynamic programming is (for want of better words) everywhere. Seriously. Go just about anywhere on the web and they’ll tell you how it relates to the Fibonacci sequence [basically, you should be smart enough to know that: to calculate the next month’s number of rabbits as time progresses, you don’t need to start from scratch every time; you can reuse your old results].
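A minimal sketch of that reuse in Python (the function name is my own; the totals follow the post’s 2, 3, 5, 8, … sequence):

```python
def rabbits_after(months):
    # Each new monthly total is the sum of the previous two totals,
    # so we only carry the last two values forward instead of
    # recomputing everything from scratch.
    prev, curr = 2, 3
    for _ in range(months):
        prev, curr = curr, prev + curr
    return curr
```

For example, `rabbits_after(6)` gives 55, matching the sequence above.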
I don’t think that beating that horse will do any good. So I’m going to do a slightly harder problem, and one that is not as immediately intuitive: calculating Binomial coefficients.
The Binomial theorem (which I’m sure you recall in perfect clarity) is used for many things, one of which is to find the coefficients of the terms in the expansion of (x+y)^n. The r’th coefficient is given by nCr = n!/(r!(n-r)!).
This is also the number of ways in which you can select ‘r’ items out of ‘n’ items; e.g. if you have 10 balls labelled A, B, C, D…, in how many ways can you select any 3 balls? The answer is 10C3 = 10!/(3!7!) = 120 ways (more than you’d expect).
If you’re particularly mathsy, you might even know that these coefficients are the terms of Pascal’s triangle, with n = 0, 1, 2, … being each row’s distance from the top of the triangle.
Pascal’s triangle: square version and party version.
Take a look at the second figure, and you will find it tells you many useful things: the third column goes 1, 3, 6, 10, 15…, the triangular numbers (1), (1+2), (1+2+3), (1+2+3+4), and so on. The fourth column? Tetrahedral numbers. The fifth row? Treat the numbers as digits (carrying over wherever an entry has more than one digit) and you get 11^5; the same holds for every row. The sum of the numbers of the n’th row is 2^n. The diagonals, if you notice, are the same as the columns. There’s a lot to be learned from this triangle, and calculating it efficiently is an important job.
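You can check a couple of those properties in a few lines of Python (names are my own):

```python
def pascal_rows(n):
    # Build the first n rows of Pascal's triangle iteratively.
    rows = [[1]]
    for i in range(1, n):
        prev = rows[-1]
        rows.append([1] + [prev[j - 1] + prev[j] for j in range(1, i)] + [1])
    return rows

rows = pascal_rows(8)
# The sum of row n is 2**n.
assert all(sum(row) == 2 ** i for i, row in enumerate(rows))
# The third column (index 2) gives the triangular numbers 1, 3, 6, 10, ...
assert [row[2] for row in rows[2:]] == [1, 3, 6, 10, 15, 21]
```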
A job for….Dynamic Programming.
Dynamic Programming involves using old results. Take a look at the second Pascal’s triangle again. You can calculate any row from the row directly above it: the j’th element of row n is the sum of the (j-1)’th and j’th elements of row n-1, for 0 < j < n.
So, you can use dynamic programming to calculate all the binomial coefficients; this also avoids computing n!/((n-r)! r!) directly, which can easily go out of bounds for datatype ranges (e.g. 100! ≈ 9.33×10^157).
I’m sure it must be obvious what to do:
Binom_coeff_tab(n, r): // tab for tabulation
    int pascals_tri[n+1][n+1] // assume these are all zeroed out
    pascals_tri[0][0] = 1
    for (i=1; i <= n; i++) // rows
        for (j=0; j <= i; j++) // columns
            if (j==0 or j==i) pascals_tri[i][j] = 1
            else pascals_tri[i][j] = pascals_tri[i-1][j-1] + pascals_tri[i-1][j]
    return pascals_tri[n][r] // nCr, the coefficient we want
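For reference, here is the tabulation above as a runnable Python sketch (function and variable names are my own):

```python
def binom_coeff_tab(n, r):
    # Bottom-up tabulation: fill Pascal's triangle row by row,
    # reusing the row above for every interior entry.
    table = [[0] * (n + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(i + 1):
            if j == 0 or j == i:
                table[i][j] = 1
            else:
                table[i][j] = table[i - 1][j - 1] + table[i - 1][j]
    return table[n][r]
```

Matching the ball example earlier, `binom_coeff_tab(10, 3)` returns 120.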
QED, you say. And it is. But the point of this was to learn Dynamic Programming, so you’re going to have to learn how to use recursion. You may remember recursion; that odd beast which enchants you with its utter simplicity. You fall in love with its ability to turn the most complicated problems into ten lines of code…until you try to make one on your own. Then, you hate it with a vengeance, as all humans hate what they do not understand.
Monologue aside, recursion has a big part to play in DP; all DP-related subproblems are defined most succinctly in recursive form. But Haskell lovers should put down their glasses of champagne. It turns out that though almost all DP problems are defined recursively, they are often not implemented that way. In the broad sense, there are two ways of going about Dynamic Programming problems:

Memoization:
This is a purely recursive, top-down approach. It’s also called caching, and, in general, it uses less space than the next method. In memoization, whenever we need the solution to a subproblem, we first look in a lookup table (I say table, but really only a few values are filled). If the precomputed value is there, we return it; otherwise, we calculate the value and put the result in the lookup table so that it can be reused later. Memoization calculates the values on demand. This means that, computationally, it should be faster, as you only calculate the values you need. However, there may be overheads, and the code is more complex.

Tabulation:
This is a very straightforward, bottom-up approach. You start from the base case, and iteratively build an n-dimensional table of all the values computed so far. It computes values for all the subproblems. Based on the results in the table, the solution to the “top” / original problem is then computed. It is usually faster than memoization because of the smaller overhead. It’s more predictable, too. But you do have to calculate a lot of unnecessary values.
So, which do I use?

If all subproblems must be solved at least once, a bottom-up (i.e. tabulation) dynamic-programming algorithm usually outperforms a top-down memoized algorithm by a constant factor:
- No stack overhead for recursion, and less overhead for maintaining the table.
- There are some problems for which the regular pattern of table accesses in the dynamic-programming algorithm can be exploited to reduce the time or space requirements even further.

If some subproblems in the subproblem space need not be solved at all, the memoized solution has the advantage of solving only those subproblems that are definitely required.
Memoization can have tables too (as I’m about to show you). But memoization works in a top-down fashion, rather than bottom-up.
The binomial coefficient code we saw was a tabulation solution. Let’s try memoization now.
In DP, we define the problem as a recurrence relation between subproblems. What is our problem? Get the binomial coefficient C(i, j), i.e. the coefficient of the j’th term of the i’th row of Pascal’s triangle. This happens to be iCj, by the way, i.e. i!/((i-j)! j!)
We denoted C(i, j) as pascals_tri[i][j] in the code above. We noticed that pascals_tri[i][j] = pascals_tri[i-1][j-1] + pascals_tri[i-1][j]; let’s do the same with our binomial coefficient, and say:
C(i, j) = C(i-1, j-1) + C(i-1, j)   if 0 < j < i
        = 1                         if j = 0 or j = i
This is our recurrence relation (the problem in terms of the subproblems). Now, if asked to calculate iCj, we run the following recursive program.
Binom_coeff_rec(i, j):
    if (j==0 or j==i)
        return 1
    else return Binom_coeff_rec(i-1, j-1) + Binom_coeff_rec(i-1, j)
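For comparison, the same recurrence as plain Python (correct, but exponential in time):

```python
def binom_coeff_rec(i, j):
    # Plain recursion: recomputes the same subproblems over and over,
    # which is why the running time is roughly O(2^n).
    if j == 0 or j == i:
        return 1
    return binom_coeff_rec(i - 1, j - 1) + binom_coeff_rec(i - 1, j)
```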
This works. But it’s extremely slow, as the screenshot I’ve provided demonstrates.
Dynamic programming: tabulation vs. vanilla recursion.
As input, this program takes a number N, and uses it as both i and j: i becomes N, and j becomes [N/2], which gives the largest binomial coefficient in that row (this is just to demonstrate how it works).
As N becomes bigger, the simple recursion becomes VERY slow, while tabulation performs fairly well. This is because the simple recursive program runs in exponential time i.e. O(2^n), whereas the tabulation runs in O(n^2).
To put that in perspective, 30^2 is 900, whereas 2^30 is ~1 billion.
Note: the ratio of the two algorithms’ time complexity is not represented accurately in the time they take, because of the Python environment (and other things like caching etc); but comparing the times taken for different values of N, you can tell that the algorithms scale exponentially and quadratically, respectively.
So, simple recursion, while topdown, is quite terrible. That’s because we haven’t memoized our results, and we keep recalculating them. Let’s try this again, but with the following memoization code:
Binom_coeff_mem(table, i, j):
    if (j==0 or j==i)
        table[i][j] = 1
    else
        if (table[i-1][j-1]==0) table[i-1][j-1] = Binom_coeff_mem(table, i-1, j-1)
        if (table[i-1][j]==0) table[i-1][j] = Binom_coeff_mem(table, i-1, j)
        table[i][j] = table[i-1][j-1] + table[i-1][j]
    return table[i][j]

Binom_coeff_mem_driver(i, j): // driver program for the above recursive function
    int table[i+1][i+1] // assume all are zeroed out
    return Binom_coeff_mem(table, i, j)
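A Python sketch of the same top-down idea; here a dictionary stands in for the zeroed table (names are my own):

```python
def binom_coeff_mem(i, j, cache=None):
    # Top-down memoization: compute a coefficient only if it is not
    # already in the cache, then store it so it can be reused.
    if cache is None:
        cache = {}
    if j == 0 or j == i:
        return 1
    if (i, j) not in cache:
        cache[(i, j)] = (binom_coeff_mem(i - 1, j - 1, cache)
                         + binom_coeff_mem(i - 1, j, cache))
    return cache[(i, j)]
```

Using a dictionary means only the entries actually visited ever exist, which mirrors the “only a few values are filled” point from the memoization section.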
I ran this program, and, as expected, it was much, much faster than the simple recursive one. In fact, it can even run faster than the tabulation (iterative) one, because it does not need to calculate all the values!
To give you some intuition about why this happens, I added a line of code in the program that prints the entire table after each stage:
Properties of Dynamic Programming:

Overlapping subproblems:
Dynamic Programming problems are all recursive in nature, as they all develop from some recurrence relation. However, not all recursive problems can use tabulation/memoization: divide-and-conquer algorithms such as quicksort and mergesort do not fall under the category of DP because the algorithm changes the subproblem with each iteration. When you come back to the same array indices the next time, the elements aren’t the same, so the problem is not the same, and the memoized solution to the old problem is of no use. In other words, the subproblems do not overlap.
A good reason to use Dynamic Programming is when your subproblem space is small: you keep having to calculate the same answers again and again, and you can save time by storing them.
Looking at the recursion tree of a problem, you can instantly tell if it has overlapping subproblems, because the leaves keep spreading to the same values. In the pruned recursion tree (obtained by using Dynamic Programming), you can see just how stark the difference is. 
Optimal substructure:
If the optimal solution to a problem can be constructed from the optimal solutions of its subproblems (i.e. subsets of the problem), then it’s a strong hint that Dynamic Programming should be used.
Optimal substructure is what we’re aiming for when generating the recurrence relation: we want one that solves the problem optimally by using the optimum of the smaller problems.
Source for the above: CLRS
As an example of the above two paradigms, consider the Rod-Cutting problem:
Rod-Cutting:
You are supposed to cut a metal rod into smaller rods. Provided is a reference table which tells you the price of a rod piece of length ‘X’; e.g. a rod of length 1 inch is worth $1, a rod of length 2 inches is worth $1.5, 3 inches might be $4.5, and so on.
Given a rod of finite length L, and assuming that every cut is free, how do you divide the rod so as to maximize your profit?
The total number of possible ways to cut the rod is exponential: each of the N-1 possible cut positions can either be cut or left alone, giving 2^(N-1) combinations.
As we want to transform exponential running time into polynomial, the obvious thing to do is use Dynamic Programming.
Let’s review the problem iteratively:
- The base case: if you have a rod of length zero, your maximum profit is zero.
- Check out the smallest subproblem possible: you have a rod of one inch. There’s only one way to ‘cut’ it: into a rod of one inch, with zero cuts (cuts are free anyway).
- If you have a rod of two inches, you have two choices: cut it into two pieces (1”, 1”), or leave it as one piece (2”). Depending on the reference table, you choose the alternative that maximizes your profit.
- Now, suppose you have a 3-inch rod (yes, okay, I heard it). There are multiple ways to cut it: three pieces of lengths (1”, 1”, 1”), two pieces (1”, 2”), or one piece (3”). If you look carefully, this overlaps with the previous problem. Your choices are: keep it as 3”; or set aside a 1” piece and reuse your previously calculated best cutting of a 2” rod (i.e. either (1”, 1”) or (2”)); or set aside a 2” piece and add the best cutting of a 1” rod (the ‘most optimum’ of N=1). Again, select the maximum of these choices as “the most optimum way of cutting a rod of length 3 inches”.
- Go on to four inches, and the scenario changes, and the algorithm starts to emerge.
- You can keep it as 4”.
- You could set aside a 1” piece, and reuse the best way to cut a 3” rod.
- You could set aside a 2” piece, and reuse the best way to cut a 2” rod.
- You could set aside a 3” piece, and “reuse” the best way to cut a 1” rod (i.e. your base case).
Thus, you find yourself with four options, and choose the best one.
- 5”: You can see the algorithm in play:
max( cost(5”), cost(1”)+bestCost(4”), cost(2”)+bestCost(3”), cost(3”) + bestCost(2”), cost(4”) + bestCost(1”) )
This is all a bit weird-looking, so we’ll generalize it as a recurrence relation:
Rod-cutting:
bestCost(i) = max( cost(j) + bestCost( i – j ) ), for 1 <= j <= i
where bestCost( 0 ) = 0 is the base case.
The code for Rod-Cutting would be as follows:
bestCost_driver( ):
    n = 8
    int saved_bestCosts[n] // assume zeroed out
    price_array = { 1, 5, 8, 9, 10, 17, 17, 20 }
    return bestCost(price_array, saved_bestCosts, n) // output: 22

bestCost(price_array, saved_bestCosts, n):
    if (n==0) return 0
    if (saved_bestCosts[n-1] != 0)
        return saved_bestCosts[n-1] // the most optimal way to cut a rod of length n
    else // haven’t saved, must calculate
        max_val = price_array[n-1] // the price of the whole n-inch rod (this might be the optimum)
        for (i=0; i < n; i++)
            max_val = max(max_val, price_array[i] + bestCost(price_array, saved_bestCosts, n-i-1))
        saved_bestCosts[n-1] = max_val // save the result
        return max_val
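The same rod-cutting memoization as a runnable Python sketch (the price list is the one from the driver above; names are my own):

```python
def best_cost(prices, n, saved=None):
    # Top-down rod cutting: bestCost(n) is the max over every first
    # piece of length j+1 of prices[j] + bestCost(n - j - 1),
    # with results memoized in `saved`.
    if saved is None:
        saved = {}
    if n == 0:
        return 0
    if n not in saved:
        saved[n] = max(prices[j] + best_cost(prices, n - j - 1, saved)
                       for j in range(n))
    return saved[n]

prices = [1, 5, 8, 9, 10, 17, 17, 20]  # price of pieces of length 1..8
print(best_cost(prices, 8))  # 22, as in the driver above
```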
Problems where Dynamic Programming is used:
- Shortest-path problems (all of them, including Dijkstra’s, save previous results; as the graph does not change between iterations, saving the results makes sense).
- String and sequence algorithms, including the Longest Common Subsequence problem (used in Git and other version control software, as well as to compare DNA strands), the Edit distance problem (used in autocorrect on phones), the Maximum subarray problem, and many more.
- Multiple matrix multiplications might meet most metered metrics (see: Matrix-chain multiplication, which uses Dynamic Programming to find the cheapest order in which to multiply a chain of matrices).