Skip to content

Illumina error profile is flipped? #8

@hoelzer

Description

@hoelzer

Hey,

thanks for the great simulation tool! We simulated reads using this command:

conda activate SWAMPy
python simulate_metagenome.py \
    --genomes_file ../example/viralref.fasta \
    --temp_folder ../example/${sample}_temp \
    --genome_abundances ../example/${sample}.tsv \
    --primer_set a4 \
    --output_folder ../eebg24_highq \
    --output_filename_prefix ${sample} \
    --n_reads  1000000 \
    --seqSys MSv3 \
    --read_length 250 \
    --seed 10 \
    --amplicon_distribution  dirichlet_1 \
    --amplicon_pseudocounts 200 \
    --unique_insertion_rate 0.00002 \
    --unique_deletion_rate 0.000115 \
    --unique_substitution_rate 0.002485 \
    --recurrent_insertion_rate 0.00002 \
    --recurrent_deletion_rate 0 \
    --recurrent_substitution_rate 0.003357 \
    --max_insertion_length 14 \
    --subs_VAF_alpha 0.36,0.27 \
    --del_VAF_alpha 0.59,0.42 \
    --ins_VAF_alpha 0.33,0.45 \
    --r_subs_VAF_alpha 0.36,0.27 \
    --r_del_VAF_alpha 0.59,0.42 \
    --r_ins_VAF_alpha 0.33,0.45 

but when we check the quality pattern with FASTQC we see such profiles for the paired-end reads:

Forward read

Screenshot 2024-07-10 at 18 04 31

Reverse read

Screenshot 2024-07-10 at 18 05 28

Usually, the read quality drops towards the 5' end and not the 3' end (see https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). So it looks flipped? Although the general error profile looks quite good otherwise!

Did we do something wrong in the simulation, or is this a bug?

(we see the same pattern for all reads we simulated)

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions