Comments on the thread pool example in "Java Network Programming" by Harold

  1. The static int field filesCompressed in GzipThread is accessed by concurrent threads. It is incremented with a synchronized method incrementFilesCompressed. It would be more efficient to just make the field volatile. Then we are guaranteed that accesses are atomic and to the most recently updated value without the synchronization overhead. Note that in this program the update and test are done within a synchronised block. Thus even "volatile" is redundant since the block with its locking and unlocking guaranties writing to shared memory of the thread local temporaries.
  2. Vector or ArrayList: the former is synchronized, the latter isn't. Since the accesses to the pool are already within synchronized blocks, there is no reason for the added overhead of Vector. And of course compiling with Java 5 we get warnings with either one.
  3. In some cases the program will fail to GZip some files. The original code has been modified by adding print statements to see what is going on, and by putting a sleep statement in the threads. That should not affect the functionality of the code since we cannot really rely on how threads are scheduled. You will see that some files are not GZed, or the GZip file is truncated.
  4. I also think that there is a possible (very improbable) deadlock.
    Suppose that main of GZipAllFiles is preempted before the statement
           filesToBeCompressed = totalFiles;
    that all GZipThreads but one have taken all the existing files and are sleeping on pool.wait(), and that the last GZipThread, call it moo, now executes until immediately before
           try { pool.wait();}
    where it is preempted.
    Now main resumes and sends an interrupt to all the pool threads. Moo is not waiting and is not in an interruptible call. If the interrupt (as it should be) is ignored then moo resumes and executes
           try { pool.wait();}
    and stays there waiting and blocking the whole program.
This program is complex. To rely on the interrupt mechanism is to rely on luck (the hope that the threads are in the try block at the time of the interrupt). It is better to see this problem as a Producer-Consumer problem and interpose a protected buffer between them. For termination either the buffer can contain a termination condition, or, better, the Producer can insert a termination tokens in the buffer (say null). A possible alternative implementation is here
Interestingly enough this implementation is much slower (>20 times!) than the solution presented by Harold! The difference is particularly significant in system time. My solution seems to force many more context switches than the Harold solution.
But if I replace in GZipThread.java the lines
            int b;
            while ((b = in.read()) != -1) out.write(b);
with the lines
            int n = in.available();
            byte[] buf = new byte[n];
            in.read(buf);
            out.write(buf, 0, n);
I get performance as good as in Harold [The Harold's solution can be improved a little by making in it the same change. After the change in both Harold's and in my program, the performance of the two programs are equivalent]. The Java Virtual machine must have rules for scheduling IO and threads that depend on the program structure in ways that are obscure to me.

By the way, in Java 1.5 there are now in the package java.util.concurrent a series of useful classes. Among them ArrayBlockingQueue, which is exactly a protected bounded buffer for objects. Interestingly it is an implementation that does not allow the insertion into the queue of null objects. One more version of the program, no using ArrayBlockingQueue and a trick for graceful termination, is here