jas-st/reComBat-seq

reComBat-seq

reComBat-seq is a batch effect adjustment tool for bulk RNA-seq count data with underdetermined experimental designs. Building on the negative binomial regression framework of ComBat-seq[2], reComBat-seq adds Elastic Net regularization to address the convergence problems that arise with confounded data. It takes raw, untransformed count matrices as input and requires a known batch variable.

By using regularized negative binomial regression, reComBat-seq models batch effects while keeping parameter estimation stable even when the design matrix is rank-deficient. The adjusted data remain integer counts, so they stay compatible with widely used differential expression tools such as edgeR[1] and DESeq2.
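To make "rank-deficient" concrete, the sketch below builds a toy design in which batch and disease are perfectly confounded (every sample in batch B is diseased) and checks its rank with base R's `qr()`. The sample labels are made up, and this is only an illustration, not necessarily how reComBat-seq assembles its internal design matrix.

```r
# Toy metadata (hypothetical): batch B contains only diseased samples,
# so batch and disease are perfectly confounded.
batch   <- factor(c("A", "A", "B", "B"))
disease <- factor(c("ctrl", "ctrl", "dmd", "dmd"))

# Design with an intercept, a batch effect and the wanted covariate.
design <- model.matrix(~ batch + disease)
print(design)

# The batchB and diseasedmd columns are identical, so the design has
# 3 columns but rank 2: an unregularized fit cannot tell the batch
# effect apart from the disease effect.
design_rank <- qr(design)$rank
cat("columns:", ncol(design), " rank:", design_rank, "\n")
```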

The objective extends the standard negative binomial regression model with an elastic net penalty, where $NLL(\beta)$ denotes the negative log-likelihood:

$$L(\beta) = NLL(\beta) + \lambda \bigg( \alpha \| \beta \|_1 + \bigg( \frac{1-\alpha}{2} \bigg) \| \beta \|_2^2 \bigg)$$
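A minimal base R sketch of the objective above, useful for evaluating it on toy data. It assumes a log link, a known dispersion `phi` (so the variance is $\mu + \phi\mu^2$) and an optional offset such as log library sizes. The function names are hypothetical, and this is not the package's internal code. Following the equation as written, every coefficient, including an intercept, is penalized.

```r
# Elastic net penalty from the objective above:
# lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)
elastic_net_penalty <- function(beta, lambda, alpha) {
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

# Negative binomial negative log-likelihood with a log link.
# Assumptions: 'phi' is a known dispersion (variance = mu + phi * mu^2)
# and 'offset' (e.g. log library sizes) defaults to zero.
nb_nll <- function(beta, y, X, phi, offset = 0) {
  mu <- exp(drop(X %*% beta) + offset)
  -sum(dnbinom(y, size = 1 / phi, mu = mu, log = TRUE))
}

# Penalized objective: L(beta) = NLL(beta) + penalty
penalized_objective <- function(beta, y, X, phi, lambda, alpha, offset = 0) {
  nb_nll(beta, y, X, phi, offset) + elastic_net_penalty(beta, lambda, alpha)
}

# Toy example (made-up counts): intercept plus one indicator covariate
y <- c(10, 12, 30, 28)
X <- cbind(1, c(0, 0, 1, 1))
beta <- c(log(11), log(29 / 11))
obj <- penalized_objective(beta, y, X, phi = 0.1, lambda = 0.5, alpha = 0.5)
print(obj)
```

Because the $\ell_1$ term is not differentiable at zero, elastic net fits are usually computed with coordinate descent or proximal methods rather than a generic gradient-based optimizer.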

Parallelisation is implemented via OpenMP. OpenMP is not available by default on macOS and must be installed separately (e.g. via Homebrew: brew install libomp). If OpenMP is unavailable, reComBat-seq falls back to single-threaded execution.

Installation

Before installing reComBat-seq, install edgeR from Bioconductor:

```r
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("edgeR")
```

reComBat-seq can then be installed from GitHub:

```r
# install.packages("devtools")
devtools::install_github("jas-st/reComBat-seq")
```

Usage

The tutorial folder contains examples, including the code used to generate the plots below from the breast cancer data of the ComBat-seq[2] paper and the code used to correct the muscular dystrophy data from our study. Using reComBat-seq starts from a raw count matrix. You then need to specify the batch variable to remove and all covariates whose variation should be preserved (wanted.variation). Typical wanted covariates are biological variables such as disease status or cell identity. The code below shows the basic usage:

```r
library(reComBatseq)

# read in the expression table and the sample metadata
raw.counts <- read.table(
    'muscular_dystrophy.exp.tsv',
    sep = '\t',
    quote = '',
    header = TRUE,
    row.names = 'X'
)

meta <- read.table(
    'muscular_dystrophy.meta.tsv',
    sep = '\t',
    quote = '',
    header = TRUE,
    row.names = 'X'
)

# apply the reComBat-seq correction, keeping disease as the wanted covariate;
# t() turns the samples x genes table into the genes x samples matrix expected
corrected.counts <- reComBat.seq(
    t(raw.counts),
    batch = meta$sra_study_acc,
    wanted.variation = meta['Disease']
)
```
[Figure: Raw Data | Corrected Data (Singular Design)]

Arguments

  • counts - raw count matrix from genomic studies (genes x samples)
  • batch - vector containing the batch assignment of each sample
  • wanted.variation - a data.frame containing the covariates whose variation you want to preserve
  • num.threads - number of threads for parallel gene-wise regression using OpenMP; defaults to a single thread
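Before running a correction, it can help to confirm that the inputs have the shapes listed above. The sketch below uses simulated counts and a hypothetical helper, `check_recombat_inputs()`, which is not part of reComBat-seq; only the commented-out call at the end uses the documented arguments.

```r
# Hypothetical helper: check that inputs match the argument descriptions
check_recombat_inputs <- function(counts, batch, wanted.variation) {
  stopifnot(
    is.matrix(counts),
    all(counts >= 0), all(counts == round(counts)),  # raw, untransformed counts
    length(batch) == ncol(counts),                    # one batch label per sample
    is.data.frame(wanted.variation),
    nrow(wanted.variation) == ncol(counts)            # one row per sample
  )
  invisible(TRUE)
}

# Toy data: 5 genes x 4 samples of simulated counts
set.seed(1)
counts <- matrix(rpois(20, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))
batch  <- c("b1", "b1", "b2", "b2")
wanted <- data.frame(Disease = c("ctrl", "dmd", "ctrl", "dmd"))

check_recombat_inputs(counts, batch, wanted)

# With reComBat-seq installed, the correction would then be called as:
# corrected <- reComBat.seq(counts, batch = batch,
#                           wanted.variation = wanted, num.threads = 4)
```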

The regularization can be adjusted via the following parameters:

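  • lambda.reg - controls the strength of the regularization, $\lambda$ in the equation above
  • alpha.reg - the elastic net mixing parameter, $\alpha$ in the equation above; $\alpha = 1$ gives a pure lasso ($\ell_1$) penalty and $\alpha = 0$ a pure ridge ($\ell_2$) penalty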
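Why a positive lambda.reg stabilizes estimation can be seen in the simpler least-squares analogue. With a confounded design, $X^\top X$ is singular; in the pure ridge case ($\alpha = 0$), minimizing $\frac{1}{2}\|y - X\beta\|_2^2 + \frac{\lambda}{2}\|\beta\|_2^2$ leads to $(X^\top X + \lambda I)\beta = X^\top y$, which is solvable for any $\lambda > 0$. This is only an analogy with made-up data: reComBat-seq fits negative binomial models with an elastic net penalty, not this closed form.

```r
# Perfectly confounded toy design (hypothetical): columns 2 and 3 are identical
X <- cbind(intercept = 1, batchB = c(0, 0, 1, 1), diseasedmd = c(0, 0, 1, 1))
y <- c(1.0, 1.2, 3.1, 2.9)

XtX <- crossprod(X)  # X'X is 3 x 3 but only rank 2
print(qr(XtX)$rank)

# Unregularized normal equations fail because X'X is singular
ols <- tryCatch(solve(XtX, crossprod(X, y)), error = function(e) NULL)

# Ridge: (X'X + lambda * I) beta = X'y has a unique solution for lambda > 0
lambda <- 0.1
beta_ridge <- solve(XtX + lambda * diag(ncol(X)), crossprod(X, y))
print(beta_ridge)
```

Note that the ridge solution splits the shared effect equally between batchB and diseasedmd: regularization makes the fit well defined, but it cannot by itself tell which of two perfectly confounded variables caused the shift.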

References

  1. Chen Y, Chen L, Lun ATL, Baldoni P, Smyth GK (2025). edgeR v4: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and larger datasets. Nucleic Acids Research, 53(2), gkaf018. doi:10.1093/nar/gkaf018.
  2. Zhang Y, Parmigiani G, Johnson WE (2020). ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genomics and Bioinformatics, 2(3), lqaa078. doi:10.1093/nargab/lqaa078.

About

Extension to ComBat-seq using regularized negative binomial regression.
