CIS 501
Spring 2003

Assignment 02L - Lessons to be Learned from Assignment 2

Assignment 2 (the merge-sort part) is a pretty difficult, complex assignment. The final code is not long, but it IS complicated. I just want to take a few minutes here and point out some issues related to this assignment. You may find this useful in guiding your study for the exam. Consider these issues as things to remember for the future.

  1. This is an assignment in which group discussion and group work can be highly beneficial. So … if you are not in a group, I will help you get into one. Let me know by e-mail, as soon as possible. If you have changed your group in any way from the last time, also let me know.
  2. We always run around like lunatics preaching top-down approaches to programming. But the top-down approach does not work so well in this case. In fact, it is often the case that the top-down approach, by itself, is difficult to use. In this problem, and others, unless you have a clear picture of how the bottom, most detailed pieces of a program will work, it is nearly impossible to decide what the upper pieces will look like.
  3. We started out today taking a top-down approach, writing out, in English, what needed to be done in the outer two loops of the sort-merge function. But then we spent the rest of the class talking about the lowest level code – the function (called merge_two_runs) that merges two runs of length k (one from each input file) and writes the resulting run to an output file. If you take the time to fully understand how this function works, I would bet that the remaining work needed to finish the entire sort-merge problem will be minimal.

  4. As we began working on the merge_two_runs function, we constantly kept a focus on a sample set of data, partially sorted (worked through two Phases) and we examined what needed to be done to get Phase 3 working. We started small, with the very first step needed – namely reading a data item from each of the two input files, and then comparing these two data items. As we went along, we
  5. When we felt confident that the basic algorithm for merge_two_runs looked pretty good, we then began to look at the test for end of run and the copy operations. We actually figured out the test for end of run mechanism, but not the copy. Note that these two pieces of the sort-merge problem would fall below the merge_two_runs subproblem. In other words, they are subproblems of the subproblem.
  6. (continuation of 4) If I asked you now to draw a hierarchy chart, the sort-merge function problem would appear at the top. Underneath would be an initialization box (including a distribution function that would appear underneath the initialization box), a termination box, and a process box. The process box, which contains steps to be repeated, might read something like this:
  7. for k = 1 to k == log2n, repeat Phase k: merge runs of length 2 k-1 into runs of length 2k.

    Underneath this step is another step to be repeated and it is essentially the merge_two_runs step. Under this step is at the very least, two more steps: determine when the end of a run is reached, and copy the remaining run to the output file. YOU SHOULD DRAW YOURSELF A COPY OF THIS HIERARCHY CHART AND ADD WHATEVER ITEMS YOU SEE FIT TO ADD.

  8. FINALLY: Whenever you program, you should always try to anticipate what we sometimes call the "end point cases" for your algorithms. For example, in the handout e-mailed to you yesterday, it was suggested (see item 2ii) that power_c and power_d should be equal. Well, there may be an end point case for which this is not true. Consider for example, the case where count_c = 16 and count_d = 17. Compute power_c and power_d. In theory, one of these values is 4 and the other is 5. Which value do we want in this case? What values of number and n do we need in this case? (Hint: n should be 64). How would you rewrite the note at the end of 2ii to correct this problem?