Test reading concatenated/multiblock gzips works #171
Nice to see that all the work on dnaio is getting attention. Should we also support writing blocked gzip (BGZF) in the future? It is required for writing the BAM format, so it is worth considering. (Or perhaps it belongs in dnaio rather than xopen, since it is such a bioinformatics-specific feature.)
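At the container level, blocked gzip is a series of independent gzip members written one after another, and a standard gzip reader decompresses them as one continuous stream. The following is a minimal sketch using Python's standard `gzip` module rather than xopen; the helper `compress_in_blocks` is hypothetical. It does not produce valid BGZF: real BGZF additionally stores each block's compressed size in a `BC` extra subfield of the gzip header, which these plain members lack.

```python
import gzip


def compress_in_blocks(data: bytes, block_size: int) -> bytes:
    """Compress data as a series of independent gzip members.

    Hypothetical helper: it only mimics the block structure of BGZF and
    does NOT write the BC extra subfield that BGZF readers require.
    """
    members = []
    for start in range(0, len(data), block_size):
        # Each chunk becomes a complete gzip member (header, deflate data, trailer).
        members.append(gzip.compress(data[start:start + block_size]))
    return b"".join(members)


payload = b"hello\nworld\n"
blob = compress_in_blocks(payload, block_size=6)
# gzip.decompress reads all concatenated members and joins their output.
assert gzip.decompress(blob) == payload
```

Reading concatenated members is exactly what this PR tests; writing them block by block is the part that would need a dedicated interface.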
It’s a bit special, so I think it would be OK not to support it directly if we cannot find a simple interface for it. It already seems to work to close the file and re-open it in append mode (I guess this won’t be very efficient for many small blocks, but at least it’s possible):

```python
from xopen import xopen

# Write the first block and close the file
f = xopen("out.gz", mode="wb")
f.write(b"hello\n")
f.close()

# Re-open in append mode to add a second block
f = xopen("out.gz", mode="ab")
f.write(b"world\n")
f.close()
```

Hm, but if I look at the generated file, it actually contains four gzip headers in total. Is that something that python-isal does?
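One way to count the gzip members in a file such as `out.gz` is to decode it member by member with `zlib` instead of searching for the magic bytes `\x1f\x8b`, which can also occur by chance inside compressed data. A minimal sketch using only the standard library (the helper `count_gzip_members` is hypothetical, and the file is written with `gzip.open` rather than xopen):

```python
import gzip
import os
import tempfile
import zlib


def count_gzip_members(data: bytes) -> int:
    """Count the gzip members in a (possibly concatenated) gzip byte string."""
    count = 0
    while data:
        # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
        decomp = zlib.decompressobj(31)
        decomp.decompress(data)
        if not decomp.eof:
            raise ValueError("truncated gzip member")
        count += 1
        # Bytes after the end of this member belong to the next member.
        data = decomp.unused_data
    return count


path = os.path.join(tempfile.mkdtemp(), "out.gz")
with gzip.open(path, "wb") as f:
    f.write(b"hello\n")
with gzip.open(path, "ab") as f:  # appending starts a new member
    f.write(b"world\n")
with open(path, "rb") as f:
    print(count_gzip_members(f.read()))  # 2
```

With the standard library, the write-then-append pattern yields one member per open/close cycle, so any count beyond that points at the writer.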
Could be; in that case it is a bug worth investigating. Using igzip.open rather than xopen, I count two gzip headers. Using igzip_threaded.open, there are indeed multiple headers. The output of zcat is still correct. Let me check how it works in the code. The flush call in the multithreaded gzip writer writes all the data but also ends the gzip stream, and it seems that flush is called twice. The code should probably be changed: flush should not end the gzip stream, since that is inconsistent with how the single-threaded implementation works.
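For comparison, the single-threaded behavior can be checked with Python's standard `gzip` module: `GzipFile.flush()` performs a zlib sync flush, so the data written so far becomes decodable, but no trailer is written and the member stays open. A minimal sketch that exercises the standard library, not python-isal (the helper `count_gzip_members` is hypothetical):

```python
import gzip
import io
import zlib


def count_gzip_members(data: bytes) -> int:
    """Count the gzip members in a (possibly concatenated) gzip byte string."""
    count = 0
    while data:
        decomp = zlib.decompressobj(31)  # 31 = gzip wrapper, 32 KiB window
        decomp.decompress(data)
        if not decomp.eof:
            raise ValueError("truncated gzip member")
        count += 1
        data = decomp.unused_data
    return count


buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(b"hello\n")
    f.flush()  # pushes compressed bytes out, but the member stays open
    f.write(b"world\n")
    f.flush()  # flushing again still does not start a new member

data = buf.getvalue()  # GzipFile does not close a caller-supplied fileobj
assert gzip.decompress(data) == b"hello\nworld\n"
print(count_gzip_members(data))  # 1
```

A writer whose flush ends the stream would instead produce one member per flush, which matches the extra headers seen with the threaded writer.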
I created a bug report. I don’t have time to fix this now, but it will be fixed in the future.
Thanks! I’ll subscribe to that issue.
To support what I claim here: nf-core/detaxizer#62 (comment)