CIS 501
Spring 2003

Assignment Number 02

On Sorting

Reading:
Complete the reading specified in Section B of the Syllabus

Exercises:
Headington and Riley - Chapter Exercises at end of Chapter 12:

#12.2a, c, h, i
#12.3
#12.4 a, c, h, i
#12.10 a and d.
Programming Project: [May be done alone or in pairs, but I want to know 1) who was given which tasks at the start and 2) who actually did what they were supposed to do. Turn the program in via e-mail along with a separate e-mail to me describing the process, including items 1 and 2) above. A copy of the paper describing the process also should be turned in to me.]

Write a program to perform a binary, 4-way (balanced) merge-sort using four files named fileA.dat, fileB.dat, fileC.dat, fileD.dat. You may assume that the following data is initially contained on fileA.dat:

67   71   61   57   59   47   41   43   37   29   31   23   17   19   13   7   11   5   2   3

Here, n = 20. Your algorithm should begin by distributing alternating runs (of length 1) to files fileC.dat and fileD.dat.

                               /------> fileC.dat
     fileA.dat -----><
                               \------> fileD.dat

Then you perform the first pass of the merge sort, by merging runs of length 1 from fileC.dat and fileD.dat to fileA.dat and fileB.dat.

   fileC.dat ------->\                 /---------> fileA.dat
                 (merge) >-------->< (distribute -- while you merge)
   fileD.dat ------->/                \---------> fileB.dat

Next, go on to pass 2, which merges runs of length 2 on fileA.dat and fileB.dat back to fileC.dat and fileD.dat. You continue in this fashion until you reach a pass, say pass_total, for which the merge-sort results in all the data on one file totally sorted (with the other file being empty). Note that past_total = ceil(log2(n)), which in our example had better be 5 -- and you should be able to explain why?

Hints:

  1. No arrays are needed here.

  2. You may find it helpful to have separate functions for
  3. merging runs (you may want the inner loop for this as a separate function)
  4. displaying an entire file
  5. displaying a run on a file (for testing purposes at least)
  6. distributing runs from one file to another (executed once)
  7. Clever use of nested loops (mainly nested for loops) will do most of the work.
  8. Once again, the outer loop executes once for each required pass. The total number of required passes, pass_total, can even be computed ahead of time (this is not true for the natural merge-sort).
  9. The length of a run for pass p can be computed ahead of time as len = 2(p-1). Thus, for each inner loop, the length of each run (except perhaps for the last run in a file) is known when the pass begins. Note: len = int (pow (2, p-1))
  10. The header for the inner for loop for pass p might look something like this:

    for (int ix = 0, ix < n/2, ix = ix + len)

    Within this loop you need a third nested loop which does all the dirty work. This loop (perhaps written as a separate function) merges one run of length len = 2(p-1) from one file, with a run of the same length from the second file. (Note that the length of the resulting merged run is now 2p ). This all works fine until you get to the end of a file. where we will have a problem with the "tail sequence" of a run (that portion of a run on one file that still remains after the end-of-file has been reached on the other file).

  11. We can handle a tail sequence by checking for the end-of-file before each read in the inner merge loop.
  12. We have two simple cases to worry about:
    case 1: we encounter the end of file on file 1.first.
    case 2: we encounter the end of file on file 2 first.

    We handle these cases in the same way. In case 1, copy the remaining data on file 2 to the file currently being used for output. In case 2, we copy the remaining data on file 1 to the current output file.

  13. The tail copy loop completes a pass.

  14. The best way to switch the input and output files is to use 2 true/false flags to alternately determine which of 4 calls to the inner merge to use -- for example:

    // Merge a run of length len from streamA and and one from streamB
    // to form a run of length 2*len written on streamD.
    // Use flags such as inAB and out1.

    if (inAB && !out1) inner_merge (len, streamA, streamB, streamD);

  15. There will be lots of argument passing required, especially involving the streams to be processed. Life will be easiest if you treat each of the four streams in your program as an fstream (NOT as ifstream or ofstream).

    fstream streamA, streamB, streamC, streamD;

    Remember to pass these streams as type fstream& arguments.

  16. To attach a file to a stream in a way that it can be used for read only, use the following open call ("r" specifies the file mode):

    streamA.open ("fileA.dat", "r");

    To attach a file for write only, use

    streamA.open ("fileA.dat", "w");

  17. Also remember that the end-of-file flag is set ONLY when you try to read past the end of a file (when you attempt to perform a read and there is no more data to be read). Only then is this flag set.

  18. When you are finished with a particular pass ( including the distribution pass), you need be able to reuse the files again, but in different modes. To do this, you have to close each file and then reopen it in the new different mode. Thus, for example, if you have finished reading from streamA (in read mode) and now want to write to streamA, you need to do close streamA

    streamA.close( );

    and reopen it using

    streamA.open ("fileA.dat", "w");

  19. Don’t forget to close all files when you are done with them.

Due Date: To be determined.