[Biojava-dev] Fwd: Bug in org/biojava/utils/io/UncompressInputStream.java

Richard Holland holland at ebi.ac.uk
Tue Apr 10 06:37:36 EDT 2007


Why are these files in compress/uncompress format? Is it proprietary
software creating them, or a legacy system of some kind? Wouldn't gzip
give better results both in terms of compression ratios and performance
as it is far more up-to-date?

I believe that the JDK doesn't support LZW because LZW was patented, and
that patent expired only very recently (between 2003 and 2006, depending
on where you live and in what form you use LZW):


It's one of those wonderful cases where the patent enforcement caused
the algorithm it was protecting to get dumped and forgotten because
nobody wanted to pay for it. Apart from *nix compress/uncompress and
inside the GIF format I'm not sure it's actually used anywhere else any
more.

Technically we infringed the patent by including LZW support in BioJava,
but now the patent has expired we no longer need to worry.

Question is, do we need to fix this inherently computer-science problem
which is entirely unrelated to biology or bioinformatics, or can we just
get people to use an alternative library instead which supports it
better and is more generic? They are out there, for instance:



Andy Yates wrote:
> Seems very strange this does. I don't know much about decompression but
> by the looks of things LZW isn't supported by the JDK.
> Richard Holland wrote:
> AFAIK the Zip algorithm is just LZW with bells on, so it should produce
> exactly the same results.
> Chris Dagdigian wrote:
>>>> Passing on this email that came to me ...
>>>> Regards,
>>>> Chris Dagdigian
>>>> OBF
>>>> Begin forwarded message:
>>>>> From: "Miguel Duarte" <malduarte at gmail.com>
>>>>> Date: April 6, 2007 2:16:52 PM EDT
>>>>> To: dag at sonsorol.org
>>>>> Subject: Bug in org/biojava/utils/io/UncompressInputStream.java
>>>>> Hi Chris,
>>>>> From
>>>>> http://sourceforge.net/project/shownotes.php?release_id=314770&group_id=18598,
>>>>> I've learned that you're maintaining the class
>>>>> org/biojava/utils/io/UncompressInputStream.java. If that's not the
>>>>> case please forward this mail to the maintainer.
>>>>> I've discovered a nasty bug: With some read block sizes the algorithm
>>>>> truncates a few bytes from the end of the stream. I've verified this
>>>>> comparing the gzip/uncompress output for some files versus what
>>>>> org/biojava/utils/io/UncompressInputStream.java generates.
>>>>> Unfortunately I've not discovered the bug yet, but I can contribute
>>>>> with the attached test case. How to verify the bug:
>>>>> uncompress BH_03834.MCR.Z with gzip and with UncompressInputStream and
>>>>> compare the results.
>>>>> Thanks,
>>>>> Miguel Duarte
>>>> ------------------------------------------------------------------------
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
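
The check Miguel describes (read the same stream with different block
sizes and compare against known-good output) could be sketched roughly
as below. This is only an illustration, not code from BioJava: the class
BlockSizeCheck, the StreamFactory interface and both method names are
made up for the example, and it uses nothing beyond java.io and
java.util.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of BioJava.
final class BlockSizeCheck {

    /** Hypothetical factory so the stream under test can be reopened for each block size. */
    public interface StreamFactory {
        InputStream open() throws IOException;
    }

    /** Drains the stream via read(byte[], int, int) using the given block size. */
    public static byte[] readAll(InputStream in, int blockSize) throws IOException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[blockSize];
        int n;
        // Stop only at end of stream; short reads are allowed and simply appended.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns the first block size whose output differs from the reference, or -1 if all match. */
    public static int firstFailingBlockSize(StreamFactory factory, byte[] expected, int[] blockSizes)
            throws IOException {
        for (int size : blockSizes) {
            try (InputStream in = factory.open()) {
                if (!Arrays.equals(expected, readAll(in, size))) {
                    return size;
                }
            }
        }
        return -1;
    }
}
```

For the reported case, the factory would wrap BH_03834.MCR.Z in an
UncompressInputStream (assuming its constructor accepts an InputStream)
and "expected" would be the bytes written by the command-line
uncompress. A result other than -1 names a block size that reproduces
the truncation.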