Boquila example usage

Requirements

Preferably macOS, Linux, or another Unix-like OS (tested on Ubuntu and macOS)
Rust installation
- For installing boquila and later calculating nucleotide profiles of our results
Cutadapt
Python installation and the following packages
- plotly=5.3.1
- pingouin=0.4.0
- pandas=1.2.3

Raw Data

We use Escherichia coli XR-seq data from Adebali, O. et al. (2017).

The XR-seq data can be downloaded from ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR512/007/SRR5125157/SRR5125157.fastq.gz
The Escherichia coli reference genome can be downloaded from here; save it as ecoli_genome.fasta

Install Boquila

There are two methods. Both require Cargo, the Rust package manager, which is installed automatically with Rust.

Installing with cargo: Cargo builds and installs the binary, by default to $HOME/.cargo/bin/
```
cargo install --branch main --git https://github.com/CompGenomeLab/boquila.git
```
Building from source: once the build finishes, you can, for convenience, copy the executable ./target/release/boquila to a directory in your PATH.
- Clone the repository
```
git clone https://github.com/CompGenomeLab/boquila.git
```
- Then build with cargo
```
cd boquila
cargo build --release
./target/release/boquila --version
0.6.1
```

Preparing Data

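We need to trim the reads to remove adapter sequences; for that, we will use cutadapt. The XR-seq download is gzip-compressed, while the command below takes the decompressed file SRR5125157.fastq.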

cutadapt -a TGGAATTCTCGGGTGCCAAGGAACTCCAGTNNNNNNACGATCTCGTATGCCGTCTTCTGCTTG -o SRR5125157_cutadapt.fastq SRR5125157.fastq
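
To make clear what the `-a` option does, here is a minimal Python sketch of 3' adapter trimming. It is a simplification and not cutadapt's algorithm: cutadapt also tolerates mismatches and indels, while this sketch only does exact matching with `N` in the adapter treated as a wildcard. The function name `trim_3prime_adapter` and the `min_overlap` parameter are made up for illustration (the default of 3 mirrors cutadapt's default minimum overlap), and the demo read is invented.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove `adapter` and everything after it from the 3' end of `read`.

    Exact matching only; 'N' in the adapter matches any base. The adapter
    may be cut off by the end of the read (partial match at the 3' end).
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # remaining suffix is too short to count as an adapter match
        window = read[start:start + overlap]
        if all(a == "N" or a == r for a, r in zip(adapter[:overlap], window)):
            return read[:start]
    return read


# Adapter from the cutadapt command above.
ADAPTER = "TGGAATTCTCGGGTGCCAAGGAACTCCAGTNNNNNNACGATCTCGTATGCCGTCTTCTGCTTG"

# Invented read: a 13-nt insert followed by the first 10 bases of the adapter.
demo_read = "ACGTACGTACGTA" + ADAPTER[:10]
print(trim_3prime_adapter(demo_read, ADAPTER))
```

For real data, use cutadapt itself; the sketch only shows why the trimmed reads end up shorter than the raw reads.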

(Optional) Create a regions file to tell boquila which parts of the genome to use when simulating reads. We will use the whole genome.

The regions file for this tutorial, ecoli.ron, is already available here

We can use awk to get the chromosomes and their lengths from the reference genome.
```
awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen += length($0)}END{print seqlen}' ecoli_genome.fasta
```
Then create a .ron-formatted file with the results; an example file can be found here
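
If awk is not available, the same chromosome lengths can be computed with a few lines of Python. This is a rough equivalent of the awk one-liner above, not part of boquila; the function name `fasta_lengths` is our own, and the in-memory FASTA below is a made-up stand-in for ecoli_genome.fasta.

```python
import io


def fasta_lengths(handle):
    """Return a list of (name, length) pairs, one per FASTA record."""
    records = []
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            # Keep only the first whitespace-delimited token of the header.
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


# Made-up two-record FASTA; for the real genome, pass open("ecoli_genome.fasta").
demo = io.StringIO(">chrA demo record\nACGT\nAC\n>chrB\nGGG\n")
for name, length in fasta_lengths(demo):
    print(name, length, sep="\t")
```

The printed names and lengths are the values that go into the regions file.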

Run Boquila

We can use the following command to run the simulation:

boquila SRR5125157_cutadapt.fastq --ref ecoli_genome.fasta --regions ecoli.ron --seed 7 > SRR5125157_sim.fastq

Profile Plots and Statistical Analysis

We can use the nuc_profile utility in the boquila repository to quickly get nucleotide profiles of our files.

We need to clone the repository and compile nuc_profile:
1. Clone the boquila repository if you have not already.
```
git clone https://github.com/CompGenomeLab/boquila.git
```
2. Build nuc_profile
```
cd boquila/examples/nuc_profile
cargo build --release
```
  The binary will be available at boquila/examples/nuc_profile/target/release/. Again, for convenience, you can copy the executable boquila/examples/nuc_profile/target/release/nuc_profile to a directory in your PATH.
Run the profiler for both files:

nuc_profile -R SRR5125157_cutadapt.fastq --len 13 > input_profile.tsv
nuc_profile -R SRR5125157_sim.fastq --len 13 > simulation_profile.tsv
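
For readers who want to see what a nucleotide profile is, here is a minimal Python sketch that computes per-position base fractions from a FASTQ file. It is not the nuc_profile implementation and does not reproduce its output format. It assumes that `--len 13` restricts the profile to reads of exactly that length, which we have not checked against the nuc_profile source, and that each FASTQ record spans exactly four lines. The function name `nucleotide_profile` and the demo reads are invented.

```python
import io
from collections import Counter


def nucleotide_profile(handle, read_len):
    """Per-position A/C/G/T fractions over reads of exactly `read_len` bases."""
    counts = [Counter() for _ in range(read_len)]
    n_reads = 0
    for i, line in enumerate(handle):
        if i % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        seq = line.strip().upper()
        if len(seq) != read_len:
            continue  # assumption: reads of other lengths are ignored
        n_reads += 1
        for pos, base in enumerate(seq):
            counts[pos][base] += 1
    if n_reads == 0:
        return []
    return [
        {base: counts[pos][base] / n_reads for base in "ACGT"}
        for pos in range(read_len)
    ]


# Invented FASTQ with two 3-nt reads and one 2-nt read (skipped for read_len=3).
demo = io.StringIO("@r1\nACG\n+\nIII\n@r2\nACT\n+\nIII\n@r3\nTT\n+\nII\n")
print("pos", "A", "C", "G", "T", sep="\t")
for pos, row in enumerate(nucleotide_profile(demo, 3), start=1):
    print(pos, *(f"{row[b]:.2f}" for b in "ACGT"), sep="\t")
```

Running the same computation on the input and simulated reads gives two tables that can be compared position by position.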

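For plots and statistical analysis, you can use the Python notebook provided in the examples directory (plots.ipynb).
For the statistical analysis we use Hotelling's T-squared test. A p-value of 1.0 means the test finds no significant difference between the two profiles.
- Hotelling's T2: 0.001973, p_value: 1.0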
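
As background, the textbook two-sample Hotelling's T-squared statistic can be sketched in plain Python as below. This is not the notebook's code: the requirements list pingouin, which provides `multivariate_ttest` for this test, whereas the sketch uses only the standard library and returns the T-squared value and its F transformation without a p-value. All function names and the demo data are our own. Note that if all four base fractions are used as columns they sum to 1, which makes the pooled covariance singular for this simple solver, so the demo uses two columns.

```python
def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of the centered rows."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_t2(x, y):
    """Return (T2, F) for two samples given as lists of equal-length rows."""
    n1, n2 = len(x), len(y)
    mx, my = mean_vector(x), mean_vector(y)
    sx, sy = scatter_matrix(x, mx), scatter_matrix(y, my)
    p = len(mx)
    pooled = [[(sx[i][j] + sy[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(mx, my)]
    w = solve(pooled, diff)  # pooled^-1 * diff without forming the inverse
    t2 = (n1 * n2) / (n1 + n2) * sum(d * wi for d, wi in zip(diff, w))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat


# Invented two-column samples (e.g. fractions of two bases at four positions).
sample_a = [[0.20, 0.30], [0.25, 0.28], [0.22, 0.31], [0.24, 0.27]]
sample_b = [[0.21, 0.29], [0.24, 0.30], [0.23, 0.30], [0.25, 0.26]]
print(hotelling_t2(sample_a, sample_b))
```

Under the null hypothesis the F value follows an F distribution with p and n1 + n2 - p - 1 degrees of freedom, which is where the p-value comes from.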
After completing the steps in the notebook, you should see the following plots (profiles.png).
