Comments on the thread pool example in "Java Network Programming" by Harold
- The static int field filesCompressed in GzipThread
is accessed by concurrent threads. It is incremented with a
synchronized method, incrementFilesCompressed.
One might be tempted to make the field volatile instead, to avoid the
synchronization overhead: volatile guarantees that each individual read
and write is atomic and sees the most recently written value. But it
does not make the increment (a read-modify-write) atomic, so volatile
alone would allow lost updates.
In any case, in this program the update and the test are done within a
synchronized block, so "volatile" would be redundant: releasing the
lock already guarantees that the thread-local temporaries are written
back to shared memory.
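A minimal sketch of the idiom under discussion (GzipCounter and its fields are made-up stand-ins, not the book's actual class):

```java
// Sketch of a shared counter incremented by worker threads.
// With a merely volatile field the increments could be lost;
// the synchronized methods make each increment one atomic step.
public class GzipCounter {
    private static int filesCompressed = 0;

    // The lock makes the read-modify-write atomic and publishes
    // the new value to other threads on release.
    public static synchronized void incrementFilesCompressed() {
        filesCompressed++;
    }

    public static synchronized int getFilesCompressed() {
        return filesCompressed;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) incrementFilesCompressed();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        // Always 4000 with synchronization; a bare volatile int
        // could print less because of interleaved increments.
        System.out.println(getFilesCompressed());
    }
}
```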
- Vector or ArrayList: the former is synchronized, the latter isn't.
Since the accesses to the pool are already within synchronized blocks,
there is no reason for the added overhead of Vector.
And of course, compiling with Java 5 we get unchecked (raw-type) warnings with either one.
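A small sketch of the point, assuming (as in the book's code) that every access to the pool goes through a synchronized block; the class and method names here are mine:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// A plain ArrayList guarded by external synchronization: since all
// access already holds the lock, Vector's per-call locking would be
// pure overhead. The generic type also avoids the raw-type warnings
// Java 5 issues for both Vector and ArrayList.
public class PoolSketch {
    private final List<File> pool = new ArrayList<File>();

    public void add(File f) {
        synchronized (pool) {
            pool.add(f);
            pool.notifyAll();   // wake any thread waiting for work
        }
    }

    public File take() throws InterruptedException {
        synchronized (pool) {
            while (pool.isEmpty()) pool.wait();
            return pool.remove(pool.size() - 1);
        }
    }

    public static void main(String[] args) throws Exception {
        PoolSketch p = new PoolSketch();
        p.add(new File("a.txt"));
        System.out.println(p.take().getName());
    }
}
```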
- In some cases the program will fail to gzip some files.
The original code has been modified by adding print statements to see
what is going on, and by putting a sleep statement in the threads. That
should not affect the correctness of the code, since we cannot rely on
how threads are scheduled. You will see that some files are not
gzipped, or that the resulting GZip file is truncated.
- I also think that there is a possible (very improbable) deadlock.
Suppose that main of GZipAllFiles is preempted before
the statement
filesToBeCompressed = totalFiles;
that all GZipThreads but one have taken all the existing files and are
sleeping in pool.wait(), and that the last GZipThread, call it moo,
now executes until immediately before
try { pool.wait(); }
where it is preempted.
Now main resumes and sends an interrupt to all the pool threads.
Moo is not waiting and is not in an interruptible call, so the
interrupt merely sets its interrupt status. (If that status is still
set when moo reaches the wait, pool.wait() throws InterruptedException
on entry; but if the pending interrupt is cleared or swallowed along
the way, it is effectively ignored.) In that case moo resumes,
executes
try { pool.wait(); }
and stays there waiting, blocking the whole program.
This program is complex.
To rely on the interrupt mechanism is to rely on luck (the hope that
the threads are in the try block at the time of the interrupt).
It is better to see this as a Producer-Consumer problem and to
interpose a protected buffer between the two sides. For termination,
either the buffer can carry a termination condition,
or, better, the Producer can insert a termination token into the
buffer (say null).
A possible alternative implementation is here
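As a rough sketch of the buffer-with-termination-token idea (the class and method names are mine, not taken from the implementation linked above):

```java
import java.util.LinkedList;
import java.util.Queue;

// A small protected bounded buffer between producer and consumer,
// using null as the termination token: the consumer exits cleanly
// when it takes null, so no interrupts are needed at all.
public class BufferSketch {
    private final Queue<String> buffer = new LinkedList<String>();
    private static final int CAPACITY = 4;

    public synchronized void put(String item) throws InterruptedException {
        while (buffer.size() == CAPACITY) wait();
        buffer.add(item);       // LinkedList accepts null
        notifyAll();
    }

    public synchronized String take() throws InterruptedException {
        while (buffer.isEmpty()) wait();
        String item = buffer.poll();
        notifyAll();
        return item;
    }

    public static void main(String[] args) throws InterruptedException {
        final BufferSketch b = new BufferSketch();
        Thread consumer = new Thread(() -> {
            try {
                String item;
                while ((item = b.take()) != null) { // null means "done"
                    System.out.println("consumed " + item);
                }
            } catch (InterruptedException ignored) {}
        });
        consumer.start();
        b.put("file1");
        b.put("file2");
        b.put(null);            // termination token
        consumer.join();
    }
}
```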
Interestingly enough, this implementation is much slower (more than 20
times!) than the solution presented by Harold. The difference is
particularly significant in system time: my solution seems to force
many more context switches than Harold's.
But if I replace in GZipThread.java the lines
int b;
while ((b = in.read()) != -1) out.write(b);
with the lines
int n = in.available();              // only an estimate of what is left
byte[] buf = new byte[Math.max(n, 8192)];
while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
(the loop is needed because available() is only an estimate and a
single read(buf) may return fewer bytes than requested),
I get performance as good as Harold's. [Harold's solution can be
improved a little by making the same change in it; after the change,
the performance of the two programs is equivalent.] The Java Virtual
Machine must have rules for scheduling I/O and threads that depend on
the program structure in ways that are obscure to me.
By the way, Java 1.5 now provides, in the package java.util.concurrent,
a series of useful classes. Among them is ArrayBlockingQueue, which is
exactly a protected bounded buffer for objects. Interestingly, its
implementation does not allow the insertion of null objects into the
queue. One more version of the program, now using ArrayBlockingQueue
and a trick for graceful termination, is
here
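Since ArrayBlockingQueue rejects nulls, the termination trick has to use a dedicated sentinel object (a "poison pill") instead. A sketch of that scheme, with names of my own invention rather than those of the version linked above:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// ArrayBlockingQueue as the protected bounded buffer. A unique
// sentinel String serves as the termination token, since the queue
// does not accept null.
public class QueueSketch {
    private static final String POISON = new String("POISON"); // unique identity

    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<String> queue =
            new ArrayBlockingQueue<String>(4);

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String item = queue.take();  // blocks until available
                    if (item == POISON) break;   // graceful termination
                    System.out.println("gzip " + item);
                }
            } catch (InterruptedException ignored) {}
        });
        consumer.start();

        queue.put("a.txt");
        queue.put("b.txt");
        queue.put(POISON);   // one pill per consumer thread
        consumer.join();
    }
}
```

Note the identity comparison (==) against the sentinel: because POISON is created with new, no legitimate work item can be mistaken for it.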