Clarification on recommended filtering thresholds for confident 5mC calls from modkit pileup #574

@Alexkortsi

Hi developers,

I’m using modkit pileup (v0.6.0) to extract confident 5mC calls in BED format from Nanopore data. I’ve successfully generated the .bed.gz output, and I understand the meaning of the columns based on your documentation. However, I’m still unsure about how to properly filter this output to retain high-confidence methylated sites.

My main question is: what are the recommended filtering criteria for downstream use of 5mC calls from modkit pileup?
Are there any published or internal best practices for balancing sensitivity and specificity in calling modified bases (especially 5mC in CpG context)?
Should thresholds be chosen dynamically per-sample, e.g., based on modkit summary output?

For example, my .bed.gz contains this row:

contig_23 28713 28714 m 23 + 28713 28714 255,0,0 23 4.35 1 22 0 1 9 1 1

From what I understand:

Nmod = 1
Nvalid_cov = 23
fraction_modified = 4.35%
Nfail = 9 (the 16th field, if I follow the documented column order; the 1 just before it would be Ndelete)
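
To make my reading of the columns concrete, here is a minimal Python sketch of how I map the whitespace-separated fields of that row to names. The column order is my assumption based on the bedMethyl description in the modkit documentation, and `parse_bedmethyl_row` and the field names are illustrative only, not part of modkit.

```python
# Assumed bedMethyl column order (my reading of the modkit docs, not verified
# against every version). Names are illustrative, not official identifiers.
BEDMETHYL_COLUMNS = [
    "chrom", "start", "end", "mod_code", "score", "strand",
    "thick_start", "thick_end", "color",
    "n_valid_cov", "percent_modified", "n_mod", "n_canonical",
    "n_other_mod", "n_delete", "n_fail", "n_diff", "n_nocall",
]

# Fields that should be parsed as integers.
INT_FIELDS = {
    "start", "end", "score", "thick_start", "thick_end", "n_valid_cov",
    "n_mod", "n_canonical", "n_other_mod", "n_delete", "n_fail",
    "n_diff", "n_nocall",
}


def parse_bedmethyl_row(line):
    """Split one bedMethyl row on whitespace and return a dict of named fields."""
    fields = line.split()
    if len(fields) != len(BEDMETHYL_COLUMNS):
        raise ValueError(
            f"expected {len(BEDMETHYL_COLUMNS)} fields, got {len(fields)}"
        )
    row = dict(zip(BEDMETHYL_COLUMNS, fields))
    for name in INT_FIELDS:
        row[name] = int(row[name])
    row["percent_modified"] = float(row["percent_modified"])
    return row


example = "contig_23 28713 28714 m 23 + 28713 28714 255,0,0 23 4.35 1 22 0 1 9 1 1"
row = parse_bedmethyl_row(example)

# Sanity check: the reported percentage should match Nmod / Nvalid_cov.
recomputed = round(100 * row["n_mod"] / row["n_valid_cov"], 2)
print(row["n_mod"], row["n_valid_cov"], row["percent_modified"], recomputed)
```

The recomputed percentage (Nmod divided by Nvalid_cov) agrees with the value in the file, which is why I believe I am reading at least those three columns correctly.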

So this doesn’t pass strict thresholds, but I’m trying to define the optimal cutoff.
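
The kind of filter I have in mind looks roughly like the sketch below. `MIN_VALID_COV` and `MIN_PERCENT_MODIFIED` are placeholder values I made up for illustration, not recommendations from the modkit documentation; choosing them sensibly is exactly what I am asking about. The column positions assume the documented bedMethyl layout, and `passes_filter` is a hypothetical helper name.

```python
# Placeholder thresholds, chosen arbitrarily for illustration only.
MIN_VALID_COV = 10           # minimum Nvalid_cov (10th column)
MIN_PERCENT_MODIFIED = 80.0  # minimum percent modified (11th column)


def passes_filter(line, min_cov=MIN_VALID_COV, min_pct=MIN_PERCENT_MODIFIED):
    """Return True if a bedMethyl row meets both coverage and percent cutoffs."""
    fields = line.split()
    n_valid_cov = int(fields[9])          # assumed position of Nvalid_cov
    percent_modified = float(fields[10])  # assumed position of percent modified
    return n_valid_cov >= min_cov and percent_modified >= min_pct


example = "contig_23 28713 28714 m 23 + 28713 28714 255,0,0 23 4.35 1 22 0 1 9 1 1"
print(passes_filter(example))  # False: coverage is sufficient, but 4.35% < 80%
```

With these placeholder values the example row is rejected on the percent-modified criterion alone, even though its coverage of 23 would pass.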

Any clarification on which values are most robust and biologically meaningful would be much appreciated. I’d also love to hear how others in the community are doing this in practice.

Thanks again!
