Skip to content

pileup command is giving BED files that dmr command says invalid-bedmethyl-data. #580

@ArnavBharti

Description

@ArnavBharti

pileup command is giving BED files that dmr command says invalid-bedmethyl-data. I have attached the output of DMR and the commands I have run.

OUTPUT (DMR)

> reading reference FASTA at "<genome_fasta_name>.fasta"
> 1521 common sequence(s) between FASTA and both samples
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 1094523 a records and 905404 b records, calculating max coverages for 95th percentile
> calculated max coverage for a: 6 and b: 21
> running with replicates, but not matched samples
> batch failed: invalid data, valid coverage (1) is not equal to the sum of canonical and modified counts (0), [BedMethylLine { chrom: "AAKM01000036", interval: Interval { start: 7056, stop: 7057, val: () }, raw_mod_code: Code('h'), strand: Positive, count_methylated: 0, valid_coverage: 1, count_canonical: 0, count_other: 1, count_delete: 0, count_fail: 0, count_diff: 0, count_nocall: 0 }] chrom: AAKM01000036 starting at 7056, stopping
> 12 batches processed                                                                                                                                                                 > Error! invalid-bedmethyl-data

PILEUP LOGS

[modkit-logging/src/lib.rs::69][2026-02-07 10:41:20][DEBUG] command line: modkit pileup J_CC_5hmC.bam J_CC_5hmC.bed.gz --modified-bases 5hmC --reference /home/<genome_fasta_name>.fasta --log J_CC_5hmC.log --bgzf
[modkit-core/src/pileup/subcommand.rs::802][2026-02-07 10:41:20][INFO] discarded 1107 contigs with zero aligned reads
[modkit-core/src/pileup/subcommand.rs::576][2026-02-07 10:41:20][INFO] parsed 1 base modification(s). Base modifications other than 'C:h' will be counted as 'N_other'.
[modkit-core/src/pileup/subcommand.rs::653][2026-02-07 10:41:20][INFO] adding single-base motif: 'C 0'
[modkit-core/src/pileup/subcommand.rs::944][2026-02-07 10:41:20][INFO] using bgzf compression with 4 compression threads
[modkit-core/src/command_utils.rs::102][2026-02-07 10:41:20][INFO] attempting to sample 10042 reads
[modkit-core/src/reads_sampler/mod.rs::49][2026-02-07 10:41:20][DEBUG] found BAM index, sampling reads in 1000000 base pair chunks
[modkit-core/src/reads_sampler/sampling_schedule.rs::261][2026-02-07 10:41:20][DEBUG] removed 0 contigs from schedule with <= 1 reads
[modkit-core/src/reads_sampler/sampling_schedule.rs::141][2026-02-07 10:41:20][DEBUG] derived sampling schedule, sampling total 11547 reads from 1641 contigs, 0 unmapped reads
[modkit-core/src/reads_sampler/sampling_schedule.rs::164][2026-02-07 10:41:20][DEBUG] schedule
 chrom  count/frac

DMR Command

modkit dmr pair \
  -a UNC_ED_5hmC.bed.gz \
  -a UNC_YR1_5hmC.bed.gz \
  -a UNC_YR2_5hmC.bed.gz \
  -a UNC_EL_5hmC.bed.gz \
  -a UNC_FY_5hmC.bed.gz \
  -b CM_EL_5hmC.bed.gz \
  -b CM_EJ_5hmC.bed.gz \
  -o DMR_UNC_vs_CM_5hmC.bed \
  --ref "${REF}" \
  --base C \
  --threads "${THREADS}" \
  --log-filepath DMR_UNC_vs_CM_5hmC.log

PILEUP Command. mod = 5hmC

pileup () {
    local bam=$1
    local mod=$2
    local prefix=$3
    log "Running modkit pileup (${mod}) for ${prefix}"
    modkit pileup \
        "${bam}" \
        "${prefix}_${mod}.bed.gz" \
        --modified-bases "${mod}" \
        --reference "${REF}" \
        --log "${prefix}_${mod}.log" \
        --bgzf
}

I had run modified basecalling for 5mC 5hmC to create modified BAM files. @ArtRand

Metadata

Metadata

Assignees

No one assigned

    Labels

    troubleshootingworkflow and data preparation questions

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions