Hi developers,
I’m using modkit pileup (v0.6.0) to extract confident 5mC calls in BED format from Nanopore data. I’ve successfully generated the .bed.gz output, and I understand the meaning of the columns based on your documentation. However, I’m still unsure about how to properly filter this output to retain high-confidence methylated sites.
My main question is: what are the recommended filtering criteria for downstream use of 5mC calls from modkit pileup?
Are there any published or internal best practices for balancing sensitivity and specificity in calling modified bases (especially 5mC in CpG context)?
Should thresholds be chosen dynamically per-sample, e.g., based on modkit summary output?
For example, in my .bed.gz I have:
contig_23 28713 28714 m 23 + 28713 28714 255,0,0 23 4.35 1 22 0 1 9 1 1
From what I understand:
Nmod = 1
Nvalid_cov = 23
fraction_modified = 4.35%
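Nfail = 9 (16th column, if I read the documented column order correctly; the 1s in columns 15, 17 and 18 would then be Ndelete, Ndiff and Nnocall)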
So this doesn’t pass strict thresholds, but I’m trying to define the optimal cutoff.
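To make the question concrete, here is a minimal Python sketch of the kind of per-site filter I have in mind. It assumes the 18-column bedMethyl order described in the modkit documentation (Nvalid_cov in column 10, percent modified in column 11, then Nmod, Ncanonical, Nother_mod, Ndelete, Nfail, Ndiff, Nnocall). The names `BedMethylRow`, `parse_row` and `passes_filter`, and the values `min_cov`, `min_percent` and `max_fail_frac`, are my own placeholders, not recommended settings and not part of modkit.

```python
from typing import NamedTuple


class BedMethylRow(NamedTuple):
    """One bedMethyl record (assumed 18-column layout; hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    mod_code: str
    strand: str
    n_valid_cov: int
    percent_modified: float
    n_mod: int
    n_canonical: int
    n_other_mod: int
    n_delete: int
    n_fail: int
    n_diff: int
    n_nocall: int


def parse_row(line: str) -> BedMethylRow:
    """Split on any whitespace so tab- and space-separated rows both parse."""
    f = line.split()
    if len(f) != 18:
        raise ValueError(f"expected 18 fields, got {len(f)}")
    # Columns 12-18 are the integer counts, in the assumed documented order.
    counts = [int(x) for x in f[11:18]]
    return BedMethylRow(f[0], int(f[1]), int(f[2]), f[3], f[5],
                        int(f[9]), float(f[10]), *counts)


def passes_filter(row: BedMethylRow,
                  min_cov: int = 10,           # placeholder threshold
                  min_percent: float = 50.0,   # placeholder threshold
                  max_fail_frac: float = 0.5   # placeholder threshold
                  ) -> bool:
    """Keep sites with enough valid coverage, a high modified percentage,
    and not too many calls that failed the confidence filter."""
    total = row.n_valid_cov + row.n_fail
    fail_frac = row.n_fail / total if total else 1.0
    return (row.n_valid_cov >= min_cov
            and row.percent_modified >= min_percent
            and fail_frac <= max_fail_frac)


example = ("contig_23 28713 28714 m 23 + 28713 28714 255,0,0 "
           "23 4.35 1 22 0 1 9 1 1")
row = parse_row(example)
print(row.n_valid_cov, row.percent_modified, row.n_mod, row.n_fail)
print("keep site:", passes_filter(row))
```

What I don't know is which of these three criteria matter most in practice, and what values are reasonable for each.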
Any clarification on which values are most robust and biologically meaningful would be much appreciated. I’d also love to hear how others in the community are doing this in practice.
Thanks again!