Investigate using the fasta index instead of splitting #216

@joshfactorial

Description

Instead of splitting the reference and writing out files, can we be smarter about how we use the fasta index, or use a samtools-generated fasta index? The main issue with Biopython's reference index feature is that it still reads the entire fasta into memory, which is not what we want. What if, instead of actually splitting the file, we generated a list of indices and split the sequence up that way? It is a similar notion, with chunks of, say, 500,000 bases, but we would just pass fragments of the sequence into the single_runner. We already map out the sequence to find non-N regions, so we could do that first and then split up the resulting index. These are all ideas we can try.
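To make the idea concrete, here is a rough sketch of index-based chunking, not a proposed implementation. It assumes an uncompressed fasta with a `.fai` index in the five-column layout that `samtools faidx` writes (name, length, offset, bases per line, bytes per line). All names here (`FaiEntry`, `parse_fai`, `fetch_slice`, `non_n_regions`, `chunk_regions`) are hypothetical and do not exist in the codebase. How the resulting coordinates would be handed to single_runner is left open.

```python
import re
from typing import BinaryIO, Dict, Iterable, Iterator, List, NamedTuple, Tuple

Region = Tuple[int, int]  # half-open [start, end), 0-based


class FaiEntry(NamedTuple):
    """One row of a .fai index (hypothetical helper type)."""
    length: int     # number of bases in the contig
    offset: int     # byte offset of the first base in the fasta
    linebases: int  # bases per line
    linewidth: int  # bytes per line, including the line terminator


def parse_fai(lines: Iterable[str]) -> Dict[str, FaiEntry]:
    """Parse .fai rows (tab-separated) into a dict keyed by contig name."""
    index = {}
    for line in lines:
        name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")[:5]
        index[name] = FaiEntry(int(length), int(offset), int(linebases), int(linewidth))
    return index


def fetch_slice(handle: BinaryIO, entry: FaiEntry, start: int, end: int) -> str:
    """Read bases [start, end) of one contig by seeking, without loading the rest."""
    end = min(end, entry.length)
    if start >= end:
        return ""
    # Byte position of a base = offset + full lines before it + column in its line.
    first = entry.offset + (start // entry.linebases) * entry.linewidth + start % entry.linebases
    last = entry.offset + ((end - 1) // entry.linebases) * entry.linewidth + (end - 1) % entry.linebases
    handle.seek(first)
    raw = handle.read(last - first + 1)
    # Drop the line terminators that fall inside the requested span.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


def non_n_regions(sequence: str) -> List[Region]:
    """Return half-open spans of the sequence that contain no N/n bases."""
    return [match.span() for match in re.finditer(r"[^Nn]+", sequence)]


def chunk_regions(regions: Iterable[Region], chunk_size: int = 500_000) -> Iterator[Region]:
    """Split each region into windows of at most chunk_size bases."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start, end in regions:
        for chunk_start in range(start, end, chunk_size):
            yield chunk_start, min(chunk_start + chunk_size, end)
```

With something like this, each worker would receive only a contig name and a `(start, end)` pair and could call `fetch_slice` on its own file handle, so no intermediate files get written. One caveat: `non_n_regions` as written still takes a whole contig string, so the mapping step would either need to run per contig or be adapted to scan the file in windows.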
