- Preferably macOS, Linux, or another Unix-like OS (tested on Ubuntu and macOS)
- Rust installation
  - Needed for installing boquila and, later, for calculating the nucleotide profiles of our results
- Cutadapt
- Python installation and the following packages
  - plotly=5.3.1
  - pingouin=0.4.0
  - pandas=1.2.3
We are using Escherichia coli XR-seq data from Adebali, O. et al. (2017).
- XR-seq data can be downloaded from here:
  ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR512/007/SRR5125157/SRR5125157.fastq.gz
- The Escherichia coli reference genome can be downloaded from here; save it as ecoli_genome.fasta
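The XR-seq download is gzip-compressed, while the later commands refer to SRR5125157.fastq. A minimal sketch for decompressing it from Python, equivalent to running gunzip on the file. The helper name `gunzip_file` is illustrative and not part of boquila.

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path):
    """Decompress a gzip file to dst_path.

    The data is streamed in chunks, so a large FASTQ file is never
    loaded into memory all at once.
    """
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


# Usage (assumes the file was downloaded into the working directory):
# gunzip_file("SRR5125157.fastq.gz", "SRR5125157.fastq")
```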
There are two methods. Both require Cargo, the Rust package manager, which should be installed automatically when you install Rust.
- Installing with cargo
  Cargo will build and install the binary, by default to $HOME/.cargo/bin/
      cargo install --branch main --git https://github.com/CompGenomeLab/boquila.git
- Building from source
  - Clone the repository
      git clone https://github.com/CompGenomeLab/boquila.git
  - Then build with cargo
      cd boquila
      cargo build --release
      ./target/release/boquila --version
      0.6.1
  For convenience, you can copy the executable ./target/release/boquila to some directory in your PATH.
- We need to trim the reads to remove adapter sequences; for that we'll be using cutadapt
      cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTNNNNNNACGATCTCGTATGCCGTCTTCTGCTTG -o SRR5125157_cutadapt.fastq SRR5125157.fastq
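To illustrate what 3' adapter trimming does, here is a simplified sketch. It is not cutadapt's algorithm: cutadapt uses error-tolerant alignment, while this version only accepts exact matches, treats `N` in the adapter as a wildcard, and requires an overlap of at least 3 bases. The function names are hypothetical.

```python
def matches(read_part, adapter):
    """Compare base by base; an N in the adapter matches any base.

    zip() stops at the shorter input, so this also handles an adapter
    that runs off the end of the read.
    """
    return all(a == "N" or a == r for r, a in zip(read_part, adapter))


def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove the adapter, and anything after it, from the 3' end of a read.

    Exact matching only (simplification). The leftmost position where
    the read tail agrees with the start of the adapter is used.
    """
    for start in range(len(read) - min_overlap + 1):
        if matches(read[start:], adapter):
            return read[:start]
    return read  # no adapter found, read is returned unchanged


# Toy example with a short, made-up adapter:
# trim_3prime_adapter("ACGTACGTACGTATGGAATTCACGGTTT", "TGGAATTCNNGG")
```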
- (Optional) We need to create a regions file to tell boquila which parts of the genome should be used while simulating reads. We will be using the whole genome.
  The regions file for this tutorial is already available here.
  - We can use awk to get the chromosomes and their lengths from the reference genome
      awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen += length($0)}END{print seqlen}' ecoli_genome.fasta
    then create a .ron formatted file with the results; an example file can be found here.
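The awk one-liner above can also be sketched in Python for readers who prefer it. The function name `fasta_lengths` is illustrative. Unlike the awk version, which prints the full header line, this sketch keeps only the identifier up to the first whitespace. It does not write the .ron file; follow the example file for that layout.

```python
def fasta_lengths(path):
    """Return a list of (sequence_name, length) pairs in file order."""
    lengths = []
    name, seqlen = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    lengths.append((name, seqlen))
                fields = line[1:].split()
                name, seqlen = (fields[0] if fields else ""), 0
            elif line:
                # Sequence lines may be wrapped, so lengths are summed.
                seqlen += len(line)
    if name is not None:
        lengths.append((name, seqlen))
    return lengths


# Usage:
# for name, length in fasta_lengths("ecoli_genome.fasta"):
#     print(name, length)
```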
We can use the following command to run the simulation
    boquila SRR5125157_cutadapt.fastq --ref ecoli_genome.fasta --regions ecoli.ron --seed 7 > SRR5125157_sim.fastq

We can use the nuc_profile utility in boquila to quickly get profiles of our files.
- We need to clone the repository and compile nuc_profile
  - Clone the boquila repository if you didn't previously.
      git clone https://github.com/CompGenomeLab/boquila.git
  - Build nuc_profile
      cd boquila/examples/nuc_profile
      cargo build --release
    The binary will be available at boquila/examples/nuc_profile/target/release/
    Again, for convenience, you can copy the executable boquila/examples/nuc_profile/target/release/nuc_profile to some directory in your PATH.
- Run the profiler for both files
      nuc_profile -R SRR5125157_cutadapt.fastq --len 13 > input_profile.tsv
      nuc_profile -R SRR5125157_sim.fastq --len 13 > simulation_profile.tsv
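To make clear what a nucleotide profile is, the sketch below computes per-position base fractions for reads of one fixed length. It is an illustration only and is not the nuc_profile implementation. It assumes that `--len 13` restricts the profile to reads of length 13, and it makes no claim about the exact TSV layout nuc_profile writes. The function names are hypothetical.

```python
from collections import Counter


def read_sequences(fastq_path):
    """Yield the sequence line of every 4-line FASTQ record."""
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip().upper()


def nucleotide_profile(sequences, length):
    """Return one dict per position with the fraction of A, C, G and T.

    Only reads whose length equals `length` are counted. An empty list
    is returned when no read has that length.
    """
    counts = [Counter() for _ in range(length)]
    total = 0
    for seq in sequences:
        if len(seq) != length:
            continue
        total += 1
        for pos, base in enumerate(seq):
            counts[pos][base] += 1
    if total == 0:
        return []
    return [{b: counts[pos][b] / total for b in "ACGT"} for pos in range(length)]


# Usage:
# profile = nucleotide_profile(read_sequences("SRR5125157_cutadapt.fastq"), 13)
```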
- For plots and statistical analysis, you can use this Python notebook provided in the examples directory.
- For statistical analysis we are using Hotelling's T-square test
  - Hotelling's T2: 0.001973, p_value: 1.0
- After completing the steps in the notebook provided above, you should see the following plots.
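For orientation, the two-sample form of Hotelling's T-square statistic is written out below. Whether the notebook uses this form or the paired one is not stated here, so treat the choice as an assumption. Here $\bar{x}_1$ and $\bar{x}_2$ are the mean vectors of the two groups, $S_1$ and $S_2$ their sample covariance matrices, $n_1$ and $n_2$ the group sizes, and $p$ the number of variables.

$$
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,(\bar{x}_1 - \bar{x}_2)^{\top} S_p^{-1} (\bar{x}_1 - \bar{x}_2),
\qquad
S_p = \frac{(n_1 - 1)\,S_1 + (n_2 - 1)\,S_2}{n_1 + n_2 - 2}
$$

$$
F = \frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\,T^2 \;\sim\; F_{p,\; n_1 + n_2 - p - 1}
$$

A statistic close to zero with a p-value of 1.0, as reported above, means the test finds no evidence of a difference between the input and simulated profiles.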
