CIS 083
Assignment Number 09-F97
On Sorting
Reading: Complete the reading for Section G of the Syllabus
Exercises: Headington and Riley -- Chapter Exercises at end of Chapter 12:
- #12.2 a, c, h, i
- #12.3
- #12.4 a, c, h, i
- #12.10 a and d
Programming Project: [May be done alone, in pairs, in trips, or
quads, but I want to know 1) who was given which tasks at the start and 2) who
actually did what they were supposed to do. Turn the program in to the lab
assistant along with a separate piece of paper describing the process,
including items 1 and 2) above. A copy of the paper describing the process
also should be turned in to me.]
Write a program to perform a binary, 4-way (balanced) merge-sort using four
files named fileA.dat, fileB.dat, fileC.dat, fileD.dat. You may assume that
the following data is initially contained on fileA.dat:
3 2 5 11 7 13 19 17 23 31 29 37 43 41 47 59 57 61 71 67
Here, n = 20. Your algorithm should begin by distributing alternating runs
(of length 1) to files fileC.dat and fileD.dat.
/------> fileC.dat
fileA.dat -----><
\------> fileD.dat
Then you perform the first pass of the merge sort, by merging runs of
length 1 from fileC.dat and fileD.dat to fileA.dat and fileB.dat.
fileC.dat ------->\ /---------> fileA.dat
(merge) >-------->< (distribute -- while you merge)
fileD.dat ------->/ \---------> fileB.dat
Next, go on to pass 2, which merges runs of length 2 on fileA.dat and
fileB.dat back to fileC.dat and fileD.dat. You continue in this fashion until
you reach a pass, say pass_total, for which the merge-sort results in all the
data on one file totally sorted (with the other file being empty).
Note that past_total = ceil(log2(n)),
which in our example had better be 5 -- and you
should be able to explain why?
Hints:
- No arrays are needed here.
- You may find it helpful to have separate functions for
- merging runs (you may want the inner loop for this as a separate function)
- displaying an entire file
- displaying a run on a file (for testing purposes at least)
- distributing runs from one file to another (executed once)
- Clever use of nested loops (mainly nested for loops) will do most of the
work.
- Once again, the outer loop executes once for each required pass. The total
number of required passes, pass_total, can even be computed ahead of time
(this is not true for the natural merge-sort).
-
The length of a run for pass p can be computed ahead of time as len = 2
(p-1)
.
Thus, for each inner loop, the length of each run (except perhaps for the
last run in a file) is known when the pass begins. Note: len = int (pow (2, p-
1))
The header for the inner for loop for pass p might look something like
this:
for (int ix = 0, ix < n/2, ix = ix + len)
Within this loop you need a third nested loop which does all the
dirty work. This loop (perhaps written as a separate function) merges one
run of length len = 2
(p-1) from one file, with
a run of the same length from the second file. (Note that the length of
the resulting merged run is now 2
p ).
This all works fine until you get to the end of a file, where we will have a
problem with the "tail sequence" of a run (that portion of a run on one file
that still remains after the end-of-file has been reached on the other file).
- We can handle a tail sequence by checking for the end-of-file before each
read in the inner merge loop.
- We have two simple cases to worry about:
case 1: we encounter the end of file on file 1.first.
case 2: we encounter the end of file on file 2 first.
We handle these cases in the same way. In case 1, copy the remaining
data on file 2 to the file currently being used for output. In case 2, we copy
the remaining data on file 1 to the current output file.
The tail copy loop completes a pass.
The best way to switch the input and output files is to use 2 true/false
flags to alternately determine which of 4 calls to the inner merge to use --
for example:
- Merge a run of length len from streamA and and one from streamB to
form a run of length 2*len written on streamD.
- Use flags such as inAB and out1.
if (inAB && !out1) inner_merge (len, streamA, streamB, streamD);
There will be lots of argument passing required, especially involving the
streams to be processed. Life will be easiest if you treat each of the four
streams in your program as an fstream (NOT as ifstream or ofstream).
fstream streamA, streamB, streamC, streamD;
Remember to pass these streams as type fstream& arguments.
To attach a file to a stream in a way that it can be used for read only,
use the following open call ("r" specifies the file mode):
streamA.open ("fileA.dat", "r");
To attach a file for write only, use
streamA.open ("fileA.dat", "w");
Also remember that the end-of-file flag is set ONLY when you try to read
past the end of a file (when you attempt to perform a read and there is no
more data to be read). Only then is this flag set.
When you are finished with a particular pass ( including the distribution
pass), you need be able to reuse the files again, but in different modes. To
do this, you have to close each file and then reopen it in the new different
mode. Thus, for example, if you have finished reading from streamA (in read
mode) and now want to write to streamA, you need to do close streamA
streamA.close( );
and reopen it using
streamA.open ("fileA.dat", "w");
Don't forget to close all files when you are done with them.
Due Date: To be determined.