Investigate using the fasta index instead of splitting #216

Instead of splitting the FASTA and writing the pieces out to new files, can we be smarter about how we use the FASTA index, or use a samtools-generated index (`.fai`) instead? The main issue with Biopython's reference index feature is that it still reads the entire FASTA into memory, which is not what we want. What if, instead of actually splitting the file, we just generated a list of coordinate ranges and split the sequence up that way? Same notion of fragments of, say, 500,000 bases, but we would just pass those fragments of the sequence into `single_runner`. We already map out the sequence to find non-N regions; if we did that first, we could then split up the resulting index of regions. These are all ideas we can try.
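A minimal sketch of the idea, assuming a samtools-style `.fai` file already exists next to the FASTA (for example, one produced by `samtools faidx`). It uses only the standard library: `read_fai` parses the five `.fai` columns (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH), `fetch` seeks straight to a base range instead of loading the whole file, `non_n_regions` streams through a sequence one block at a time to find runs of non-N bases, and `chunk_regions` cuts those runs into fragments of about 500,000 bases. All of these names, and the block and chunk sizes, are hypothetical and not part of the existing codebase.

```python
import re
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple

# Hypothetical helpers for illustration; not part of the existing codebase.


class FaiEntry(NamedTuple):
    """One row of a samtools .fai index."""
    length: int      # sequence length in bases
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases per sequence line
    line_width: int  # bytes per sequence line, including the line ending


def read_fai(fai_path: str) -> Dict[str, FaiEntry]:
    """Parse a .fai file into {sequence name: FaiEntry}."""
    index: Dict[str, FaiEntry] = {}
    with open(fai_path) as handle:
        for line in handle:
            if not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            index[fields[0]] = FaiEntry(*(int(x) for x in fields[1:5]))
    return index


def _byte_pos(entry: FaiEntry, pos: int) -> int:
    """Byte offset of 0-based base `pos`, skipping over line endings."""
    return entry.offset + (pos // entry.line_bases) * entry.line_width + pos % entry.line_bases


def fetch(fasta_path: str, entry: FaiEntry, start: int, end: int) -> str:
    """Return bases [start, end) (0-based, half-open) by seeking, not reading the whole file."""
    end = min(end, entry.length)
    if start >= end:
        return ""
    byte_start = _byte_pos(entry, start)
    byte_end = _byte_pos(entry, end - 1) + 1
    with open(fasta_path, "rb") as handle:
        handle.seek(byte_start)
        raw = handle.read(byte_end - byte_start)
    # Remove line endings that fall inside the requested span.
    return raw.decode("ascii").replace("\n", "").replace("\r", "")


def non_n_regions(fasta_path: str, entry: FaiEntry,
                  block: int = 1_000_000) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) runs of non-N bases, holding only one block in memory."""
    pending = None
    for block_start in range(0, entry.length, block):
        seq = fetch(fasta_path, entry, block_start, block_start + block)
        for match in re.finditer(r"[^Nn]+", seq):
            start, end = block_start + match.start(), block_start + match.end()
            if pending is not None and pending[1] == start:
                # The run continues across a block boundary: extend it.
                pending = (pending[0], end)
            else:
                if pending is not None:
                    yield pending
                pending = (start, end)
    if pending is not None:
        yield pending


def chunk_regions(regions: Iterable[Tuple[int, int]],
                  chunk_size: int = 500_000) -> Iterator[Tuple[int, int]]:
    """Split each (start, end) region into pieces of at most chunk_size bases."""
    for start, end in regions:
        for chunk_start in range(start, end, chunk_size):
            yield (chunk_start, min(chunk_start + chunk_size, end))
```

Used together, something like `[(name, s, e) for name, entry in read_fai(fai).items() for s, e in chunk_regions(non_n_regions(fasta, entry))]` gives a list of coordinate ranges without writing any split files; each fragment could then be read with `fetch` and handed to `single_runner`. How `single_runner` would accept a fragment is an assumption, since its signature isn't described here. If adding a dependency is acceptable, `pysam.FastaFile(path).fetch(reference, start, end)` provides the same `.fai`-backed random access.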

