Instead of splitting the file and writing the pieces back out, can we be smarter about how we use the fasta index, or use a samtools-generated fasta index (.fai)? The main issue with Biopython's reference index feature is that it still reads each entire sequence record into memory when it is accessed, so a whole chromosome ends up in memory, which is not what we want. What if, instead of actually splitting the file, we just generated a list of indices and split the sequence up that way? It's the same idea of chunks of, say, 500,000 bases, but we would pass fragments of the sequence into the single_runner instead of separate files. We already map out the sequence to find non-N regions, so we could do that first and then split up the resulting list of regions. These are all ideas we can try.
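A rough sketch of the index-only approach follows. It assumes a samtools-style `.fai` file whose tab-separated columns are NAME, LENGTH, OFFSET, LINEBASES and LINEWIDTH. The sketch streams each record's lines to find non-N runs, cuts those runs into windows of at most 500,000 bases, and fetches a single window by seeking to its byte offset. All function names here (`read_fai`, `sequence_lines`, `non_n_regions`, `split_intervals`, `plan_chunks`, `fetch`) are hypothetical. Coordinates are 0-based and half-open. How the fragments are actually handed to single_runner is not specified here.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

Interval = Tuple[int, int]          # 0-based, half-open [start, end)
FaiEntry = Tuple[int, int, int, int]  # (length, offset, line_bases, line_width)

_N_RUN = re.compile(r"[Nn]+")


def read_fai(fai_path: str) -> Dict[str, FaiEntry]:
    """Parse a samtools-style .fai index into {name: (length, offset, line_bases, line_width)}."""
    index = {}
    with open(fai_path) as fh:
        for row in fh:
            name, length, offset, line_bases, line_width = row.rstrip("\n").split("\t")[:5]
            index[name] = (int(length), int(offset), int(line_bases), int(line_width))
    return index


def sequence_lines(fasta_path: str, entry: FaiEntry) -> Iterator[str]:
    """Stream one record's sequence lines, starting at its .fai offset, without loading the file."""
    _, offset, _, _ = entry
    with open(fasta_path, "rb") as fh:
        fh.seek(offset)
        for raw in fh:
            if raw.startswith(b">"):  # next record begins
                break
            yield raw.decode("ascii")


def non_n_regions(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield maximal runs of non-N bases; N runs spanning line breaks are handled."""
    pos = 0        # absolute position of the current line's first base
    run_start = 0  # start of the current candidate non-N run
    for line in lines:
        line = line.strip()
        for m in _N_RUN.finditer(line):
            n_start, n_end = pos + m.start(), pos + m.end()
            if n_start > run_start:
                yield (run_start, n_start)
            run_start = n_end
        pos += len(line)
    if pos > run_start:
        yield (run_start, pos)


def split_intervals(regions: Iterable[Interval], chunk_size: int = 500_000) -> Iterator[Interval]:
    """Cut each region into consecutive windows of at most chunk_size bases."""
    for start, end in regions:
        for s in range(start, end, chunk_size):
            yield (s, min(s + chunk_size, end))


def plan_chunks(fasta_path: str, fai_path: str, chunk_size: int = 500_000) -> List[Tuple[str, int, int]]:
    """Build the list of (record_name, start, end) work items; no sequence is kept in memory."""
    plan = []
    for name, entry in read_fai(fai_path).items():
        regions = non_n_regions(sequence_lines(fasta_path, entry))
        plan.extend((name, s, e) for s, e in split_intervals(regions, chunk_size))
    return plan


def fetch(fasta_path: str, entry: FaiEntry, start: int, end: int) -> str:
    """Read only the bytes covering [start, end) of one record, using the .fai line geometry."""
    length, offset, line_bases, line_width = entry
    end = min(end, length)
    if start >= end:
        return ""
    byte_start = offset + (start // line_bases) * line_width + start % line_bases
    last = end - 1
    byte_end = offset + (last // line_bases) * line_width + last % line_bases + 1
    with open(fasta_path, "rb") as fh:
        fh.seek(byte_start)
        raw = fh.read(byte_end - byte_start)
    # Drop line terminators that fall inside the window.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

With this approach, the list returned by `plan_chunks` is the only thing built up front. Each worker would call `fetch` for its own window, so no intermediate files are written and no worker holds more than one window of sequence. If single_runner needs overlapping windows, for example so that features are not cut at a boundary, `split_intervals` is the place to add that overlap.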