Benchmarking Foundation Models and Parameter-Efficient Fine-Tuning for Prognosis Prediction in Medical Imaging

The first systematic benchmark of fine-tuning strategies — Full Fine-Tuning (FFT), Linear Probing (LP), and Parameter-Efficient Fine-Tuning (PEFT) — applied to CNNs and Foundation Models for COVID-19 prognosis prediction from chest X-rays, under realistic clinical constraints of data scarcity and class imbalance.

Filippo Ruffini · Elena Mulero Ayllon · Linlin Shen · Paolo Soda · Valerio Guarrasi

Unit of Artificial Intelligence and Computer Systems, Università Campus Bio-Medico di Roma · College of Computer Science and Software Engineering, Shenzhen University · Department of Diagnostics and Intervention, Umeå University


Overview

This repository accompanies the paper "Benchmarking Foundation Models and Parameter-Efficient Fine-Tuning for Prognosis Prediction in Medical Imaging" (published in Computer Methods and Programs in Biomedicine, doi: 10.1016/j.cmpb.2025.106...).

We introduce the first large-scale benchmark that systematically evaluates fine-tuning strategies for clinical prognosis prediction from chest X-rays. The benchmark is structured around three central research questions:

What are the most effective fine-tuning strategies for adapting CNN and FM architectures to prognosis tasks?
Which fine-tuning strategy offers the best efficiency–effectiveness trade-off when applied to FMs?
Can PEFT strategies provide robust adaptation in Few-Shot Learning (FSL) scenarios under prognosis data constraints?

We compare 3 CNN architectures and 8 Foundation Models across 6 fine-tuning strategies (FFT, LP, LoRA, VeRA, BitFit, IA³) on 6 prognostic tasks derived from 4 publicly available COVID-19 CXR datasets, in both full-data and few-shot regimes. The total benchmark required approximately 3,200 GPU-hours on 10 NVIDIA Tesla A40 GPUs.

Key findings: CNNs with FFT remain the most reliable option for severely imbalanced or extremely small datasets. FMs combined with PEFT (especially LoRA and BitFit) are competitive on larger datasets and scale more efficiently. In few-shot settings, LP offers the most stable generalization. No single strategy is universally optimal — the choice depends on dataset size, class balance, and model scale.

Framework

Figure 1: Benchmark pipeline. The experimental framework is structured into five main stages:

(1) Dataset Selection: four publicly available COVID-19 CXR datasets covering mortality, severity and ICU admission across varying imbalance levels and sample sizes.
(2) Model Categorization: CNNs pretrained on ImageNet and FMs pretrained via self-supervised or contrastive learning on general or biomedical data.
(3) Fine-Tuning Strategies: FFT (upper bound), LP (lower bound), and four PEFT methods (LoRA, VeRA, BitFit, and IA³).
(4) Training Regimes: full-data and few-shot (k ∈ {2, 4, 8, 16, 32} samples per class).
(5) Inference & Evaluation: MCC as primary metric (robust to class imbalance) and PR-AUC as complementary metric.
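
MCC is the primary metric because accuracy can look strong on imbalanced tasks even when the minority class is never predicted. The sketch below illustrates this with the standard MCC formula on confusion-matrix counts. The two confusion matrices are hypothetical, chosen to mirror an 85% / 15% split like AFC_m; they are not results from the paper.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a marginal is empty and MCC is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical cohort of 100 patients, 15 in the positive (minority) class.
always_majority = dict(tp=0, fp=0, tn=85, fn=15)   # never predicts the minority
informative = dict(tp=10, fp=5, tn=80, fn=5)       # finds 10 of 15 positives

for name, cm in [("always_majority", always_majority), ("informative", informative)]:
    print(f"{name}: accuracy={accuracy(**cm):.2f}  MCC={mcc(**cm):.2f}")
```

The always-majority classifier reaches 85% accuracy with an MCC of 0, while the second classifier scores about 0.61, which is the behaviour that motivates MCC for these tasks.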

Models

Model	Architecture	Pretraining	Data	#Params (M)	Category
ResNet-18	CNN	Supervised	ImageNet	11.7	CNN
ResNet-50	CNN	Supervised	ImageNet	23.5	CNN
DenseNet-121	CNN	Supervised	ImageNet	7.9	CNN
DINOv2-S	ViT-S/14	Self-supervised	LVD-142M	21	FM
DINOv2-B	ViT-B/14	Self-supervised	LVD-142M	86	FM
DINOv2-L	ViT-L/14	Self-supervised	LVD-142M	300	FM
CLIP-Large	ViT-L/14	Contrastive	LAION-400M	300	FM
MedCLIP (ResNet)	ResNet-50	Contrastive	CheXpert+MIMIC	23	FM
MedCLIP (Swin)	Swin-T	Contrastive	CheXpert+MIMIC	27	FM
PubMedCLIP	ViT-B/32	Contrastive	ROCO	86	FM
BioMedCLIP	ViT-B/16	Contrastive	PMC-15M	86	FM

PEFT compatibility matrix

	LoRA	VeRA	IA³	BitFit	LP	FFT
ResNet-18/50	✓			✓	✓	✓
DenseNet-121	✓			✓	✓	✓
DINOv2-S/B/L	✓	✓	✓	✓	✓	✓
CLIP-Large	✓	✓	✓	✓	✓	✓
MedCLIP (ResNet)	✓			✓	✓	✓
MedCLIP (Swin)	✓		✓	✓	✓	✓
PubMedCLIP	✓	✓	✓	✓	✓	✓
BioMedCLIP	✓	✓	✓	✓	✓	✓
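
When scripting sweeps, it can help to encode the matrix above as data so that unsupported (model, strategy) pairs are skipped up front. The following is a minimal sketch: the dictionary is a direct transcription of the table rows, while the layout and the helper names `strategies_for` and `valid_pairs` are illustrative assumptions, not part of the repository.

```python
ALL = {"LoRA", "VeRA", "IA3", "BitFit", "LP", "FFT"}

# One entry per row of the compatibility matrix (row labels kept as in the table).
COMPAT = {
    "ResNet-18/50": {"LoRA", "BitFit", "LP", "FFT"},
    "DenseNet-121": {"LoRA", "BitFit", "LP", "FFT"},
    "DINOv2-S/B/L": set(ALL),
    "CLIP-Large": set(ALL),
    "MedCLIP (ResNet)": {"LoRA", "BitFit", "LP", "FFT"},
    "MedCLIP (Swin)": {"LoRA", "IA3", "BitFit", "LP", "FFT"},
    "PubMedCLIP": set(ALL),
    "BioMedCLIP": set(ALL),
}


def strategies_for(model_row: str) -> list:
    """Return the supported strategies for a table row, sorted for stable output."""
    return sorted(COMPAT[model_row])


def valid_pairs() -> list:
    """Enumerate every supported (row, strategy) pair in the matrix."""
    return [(row, s) for row in COMPAT for s in strategies_for(row)]


print(strategies_for("MedCLIP (Swin)"))
print(len(valid_pairs()), "supported row-strategy pairs")
```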

Datasets

Four publicly available COVID-19 CXR datasets are used, yielding six prognostic tasks. Download sources are listed under Data availability.

Dataset	Task ID	Task	Samples	Patients	Centers	Class distribution	Validation
AIforCOVID (Soda et al., 2021)	AFC	Severity (Mild vs. Severe)	1585	1585	6	53% / 47%	LOCO
AIforCOVID	AFC_m	Mortality (Alive vs. Deceased)	1585	1585	6	85% / 15%	LOCO
COVID-19-AR (Desai et al., 2020)	CAR	ICU Admission (Yes vs. No)	99	99	1	71% / 29%	5-fold CV
CoCross (Kilintzis et al., 2022)	CC	ICU Outcome (Alive vs. Deceased)	389	150	1	63% / 37%	5-fold CV
Stony Brook COVID-19 (Saltz et al., 2021)	NY_small	Mortality (1 CXR/patient)	1365	1365	1	87% / 13%	5-fold CV
Stony Brook COVID-19	NY_all	Mortality (all CXRs)	13639	1365	1	64% / 36%	5-fold CV

Together, these datasets cover a wide range of real-world prognostic conditions: from balanced (53% / 47%) to severely imbalanced (87% / 13%) classes, from small (N=99) to large (N=13,639) cohorts, and from single-center to multi-center settings.
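
LOCO (Leave-One-Center-Out) validation holds out all samples from one center per fold, so each test set comes from a hospital unseen during training. A minimal sketch of such a split follows, assuming only a per-sample list of center labels. The function name and the toy labels are hypothetical, and the repository's own implementation may differ.

```python
from collections import defaultdict


def leave_one_center_out(centers):
    """Yield (held_out_center, train_idx, test_idx) once per distinct center."""
    by_center = defaultdict(list)
    for idx, center in enumerate(centers):
        by_center[center].append(idx)
    for held_out in sorted(by_center):
        test_idx = by_center[held_out]
        # Training set: every sample whose center is not the held-out one.
        train_idx = sorted(
            i for c, idxs in by_center.items() if c != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical center labels for eight samples from three hospitals.
centers = ["A", "A", "B", "C", "B", "C", "C", "A"]
folds = list(leave_one_center_out(centers))
for held_out, train_idx, test_idx in folds:
    print(f"test center {held_out}: train={train_idx} test={test_idx}")
```

With six centers, as in AIforCOVID, this scheme produces six folds.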

Repository layout

.
├── src/                                    # Core codebase
│   ├── eval/
│   │   └── classification/
│   │       ├── linear.py                   # Main training entry point (Hydra)
│   │       ├── features_extraction.py      # Feature extraction for LP
│   │       └── ml_training.py              # Classical ML baselines
│   ├── data/
│   │   ├── datasets/                       # Per-dataset torch.Dataset classes
│   │   │   ├── aiforcovid.py               # AIforCOVID (AFC, AFC_m)
│   │   │   ├── car.py                      # COVID-19-AR (CAR)
│   │   │   ├── cocross.py                  # CoCross (CC)
│   │   │   └── ny.py                       # Stony Brook NY (NY_small, NY_all)
│   │   ├── loaders.py                      # DataLoader factory
│   │   ├── samplers.py                     # Class-balanced sampler
│   │   └── augmentations.py                # CXR augmentation pipeline
│   ├── models/                             # CNN and FM wrappers + PEFT injection
│   ├── configs/PEFT_runs/
│   │   ├── config.yaml                     # Hydra root config
│   │   └── experiment/
│   │       ├── databases/                  # Per-dataset configs (AFC, CAR, CC, NY, …)
│   │       ├── models/                     # Per-model configs (resnet_18, vitb14, …)
│   │       ├── paths/                      # System path profiles (local.yaml)
│   │       ├── validation_strategy/        # hold_out / 5fold / loCo
│   │       └── linear_probing_*.yaml       # Experiment presets (PEFT method × regime)
│   ├── bash/
│   │   ├── run_all.sh                      # Full benchmark reproduction (all datasets × regimes)
│   │   ├── multiple_linear.sh              # Per-dataset batch launcher (MAX_JOBS=4 concurrent)
│   │   ├── linear.sh                       # Single-run worker (called by launch_bash.py)
│   │   ├── extractor.sh                    # Feature-extraction worker
│   │   ├── debug_linear.sh                 # Quick single-model debug run
│   │   ├── AGGREGATE_RESULTS.sh            # Aggregate all datasets after runs finish
│   │   └── launch_bash.py                  # Job dispatcher (local background processes)
│   ├── preprocessing/                      # Per-dataset preprocessing scripts
│   │   ├── AFC/
│   │   ├── CoCross/
│   │   ├── COVID-19-AR/
│   │   └── COVID-NY/
│   └── postprocess/
│       ├── aggregate_results/              # Fold aggregation scripts
│       └── interface/                      # Dash-based interactive results explorer
├── figures/
│   ├── final_method.pdf                    # Figure 1 — pipeline overview
│   ├── fine_tuning_comparison.pdf          # Figure 2 — fine-tuning comparison boxplot
│   ├── CNN_vs_FM/                          # Figure 3 — CNN vs FM per-dataset plots
│   ├── ALL/                                # Figure 4 — all-FM PEFT scatter plots
│   └── CE95/                               # Appendix — 95% CI plots per dataset
└── requirements.txt                        # Pinned Python dependencies

Setup

1. Clone the repository

git clone https://github.com/fruffini/PEFT_Prognosis.git
cd PEFT_Prognosis

2. Install Python and create the environment

Python 3.10 is required. The full set of pinned versions used in the paper is in requirements.txt.

# Create and activate the virtual environment
python3.10 -m venv PEFT_env
source PEFT_env/bin/activate      # Windows: PEFT_env\Scripts\activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

Key packages installed:

Package	Version	Role
`torch`	2.2.2	Deep learning framework
`torchvision`	0.17.2	Vision datasets and transforms
`peft`	0.11.1	HuggingFace PEFT (LoRA, VeRA, IA³)
`transformers`	4.41.1	CLIP, DINOv2, PubMedCLIP, BioMedCLIP
`open-clip-torch`	2.24.0	OpenCLIP / BioMedCLIP loading
`timm`	0.9.16	Model utilities
`MedCLIP`	0.0.3	MedCLIP vision encoders
`hydra-core`	1.3.0	Configuration management
`wandb`	0.17.1	Experiment tracking
`scikit-learn`	1.4.2	Classical ML baselines and metrics

GPU note. The benchmark was run on NVIDIA A40 GPUs (48 GB). For 16–24 GB cards, set a smaller batch_size in the dataset config or use torch.cuda.amp. CPU-only runs are possible but very slow.

3. Configure paths

Open src/configs/PEFT_runs/experiment/paths/system/local.yaml and set the two paths for your machine:

# src/configs/PEFT_runs/experiment/paths/system/local.yaml
data_base_path: /path/to/your/data/processed   # root of preprocessed datasets
output_path:    /path/to/your/results           # where runs and checkpoints are saved

All Hydra commands automatically pick up these paths via experiment/paths/system@_global_=local.

4. Download the datasets

All four datasets are publicly available and free to download. Create the directory tree below, then download each dataset into its raw folder.

data/
└── raw/
    ├── AIforCOVID/          ← download here
    ├── COVID-19-AR/         ← download here
    ├── CoCross/             ← download here
    └── StonyBrook-COVID19/  ← download here
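
The tree above can also be created programmatically. This is a small sketch using only the standard library; the folder names come from the tree, while the function name `make_raw_tree` and the choice of root directory are assumptions.

```python
from pathlib import Path

# Raw dataset folders, named exactly as in the directory tree.
RAW_DIRS = ["AIforCOVID", "COVID-19-AR", "CoCross", "StonyBrook-COVID19"]


def make_raw_tree(root):
    """Create <root>/data/raw/<dataset>/ for each dataset and return the paths."""
    raw = Path(root) / "data" / "raw"
    created = []
    for name in RAW_DIRS:
        target = raw / name
        # exist_ok=True makes the call safe to repeat.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

Calling `make_raw_tree(".")` from the directory that should hold `data/` creates the four empty folders, ready to receive the downloads described below.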

AIforCOVID (AFC / AFC_m)

1 585 patients · 6 Italian hospitals · two tasks: severity (balanced) and mortality (imbalanced)

Register and request access at the AIforCOVID portal or download directly from the Zenodo record.
Place the downloaded archive under data/raw/AIforCOVID/.

Preprocess:

python src/preprocessing/AFC/preprocess_AFC.py

COVID-19-AR (CAR)

99 CXRs · rural US population · ICU admission · imbalanced (71% / 29%)

Download from The Cancer Imaging Archive (TCIA) — no registration required.
Place the DICOM/PNG files under data/raw/COVID-19-AR/.

Preprocess:

python src/preprocessing/COVID-19-AR/preprocess_CAR.py

CoCross (CC)

389 CXRs · longitudinal ICU monitoring · ICU outcome

Download from the CoCross dataset page (supplementary data link in the paper) or directly from the Zenodo record.
Place files under data/raw/CoCross/.

Preprocess:

python src/preprocessing/CoCross/preprocess_CC.py

Stony Brook COVID-19 (NY_small / NY_all)

13 639 CXRs · 1 365 patients · mortality · two sampling variants

Download from TCIA collection TCIA.BBAG-2690 — free TCIA account required.
Place the downloaded images under data/raw/StonyBrook-COVID19/.

Preprocess (generates both the single-CXR-per-patient split NY, reported as task NY_small, and the full longitudinal split NY_all):

```
python src/preprocessing/COVID-NY/preprocess_NY.py
```

After running the four scripts above, the processed splits appear under data/processed/:

data/processed/
├── AFC/            # AIforCOVID — severity
├── AFC_death/      # AIforCOVID — mortality
├── CAR/            # COVID-19-AR
├── CoCross/        # CoCross
├── NY/             # Stony Brook — 1 CXR/patient
└── NY_all/         # Stony Brook — all CXRs

Each directory contains the image files and a CSV manifest with split assignments consumed by the dataset classes in src/data/datasets/.
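
Before launching long runs, a quick sanity check of the processed layout can save time. The sketch below assumes the six directory names listed above and assumes the manifest is a `*.csv` file at the top level of each directory; the manifest file name is not specified here, so the check only looks for any CSV. The function name `check_processed` is hypothetical.

```python
from pathlib import Path

# Directory names as listed under data/processed/.
EXPECTED_DIRS = ["AFC", "AFC_death", "CAR", "CoCross", "NY", "NY_all"]


def check_processed(data_base_path):
    """Return {dataset: problem} for every directory that looks incomplete."""
    base = Path(data_base_path)
    problems = {}
    for name in EXPECTED_DIRS:
        folder = base / name
        if not folder.is_dir():
            problems[name] = "missing directory"
        elif not any(folder.glob("*.csv")):
            # Assumption: the manifest is a CSV directly inside the folder.
            problems[name] = "no CSV manifest found"
    return problems
```

An empty dictionary from `check_processed("/path/to/your/data/processed")` means every expected directory exists and contains at least one CSV file.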

5. Download pretrained model weights

All model weights are downloaded automatically on first use by the respective library. No manual download is needed except for models behind a Hugging Face access gate.

Model	Auto-download	Source
ResNet-18 / ResNet-50 / DenseNet-121	✓ `torchvision`	PyTorch Hub
DINOv2-S / B / L	✓ `torch.hub`	facebookresearch/dinov2
CLIP-Large	✓ `open_clip`	mlfoundations/open_clip
MedCLIP (ResNet / Swin)	✓ `MedCLIP.from_pretrained()`	RyanWangZf/MedCLIP
BioMedCLIP	✓ `open_clip` (HF hub)	microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
PubMedCLIP	✓ `transformers` (HF hub)	flaviagiammarino/pubmed-clip-vit-base-patch32

For BioMedCLIP and PubMedCLIP (Hugging Face), log in once so the download can proceed:

pip install huggingface_hub
huggingface-cli login   # generate a token at https://huggingface.co/settings/tokens

Weights are cached in ~/.cache/huggingface/ and ~/.cache/torch/ after the first run.

Using the repository

1. Run a single experiment (Hydra CLI)

The main entry point is src/eval/classification/linear.py. All configuration is handled by Hydra — override any parameter on the command line.

source PEFT_env/bin/activate

# Linear Probing — ResNet-18 on AIforCOVID (AFC)
python src/eval/classification/linear.py \
    experiment/databases@db=AFC \
    experiment/models@_global_=resnet_18 \
    experiment=linear_probing_none_test_all \
    experiment/validation_strategy@_global_=loCo

# LoRA (rank=8) — DINOv2-B on Stony Brook NY (all CXRs)
python src/eval/classification/linear.py \
    experiment/databases@db=NY_all \
    experiment/models@_global_=vitb14_pretrain \
    experiment=linear_probing_LoRa_8_test_all \
    experiment/validation_strategy@_global_=5fold

# Full Fine-Tuning — BioMedCLIP on CoCross
python src/eval/classification/linear.py \
    experiment/databases@db=CoCross \
    experiment/models@_global_=biomedclip \
    experiment=full_finetuning \
    experiment/validation_strategy@_global_=5fold

# BitFit — Few-Shot (FSL) — CLIP-Large on AFC
python src/eval/classification/linear.py \
    experiment/databases@db=AFC \
    experiment/models@_global_=clip_large \
    experiment=linear_probing_fitbit_FSL_test_all \
    experiment/validation_strategy@_global_=loCo

Available experiment presets (src/configs/PEFT_runs/experiment/):

Config file	Strategy	Regime
`linear_probing_none_test_all.yaml`	LP	Full data
`full_finetuning.yaml`	FFT	Full data
`linear_probing_LoRa_{4,8,16}_test_all.yaml`	LoRA (r=4/8/16)	Full data
`linear_probing_VeRA_{4,8,16}_test_all.yaml`	VeRA (r=4/8/16)	Full data
`linear_probing_fitbit_test_all.yaml`	BitFit	Full data
`linear_probing_IA3_test_all.yaml`	IA³	Full data
`linear_probing_none_FSL_test_all.yaml`	LP	Few-Shot
`full_finetuning_FSL.yaml`	FFT	Few-Shot
`linear_probing_LoRa_FSL_{4,8,16}_test_all.yaml`	LoRA FSL	Few-Shot
`linear_probing_VeRA_FSL_{4,8,16}_test_all.yaml`	VeRA FSL	Few-Shot
`linear_probing_fitbit_FSL_test_all.yaml`	BitFit FSL	Few-Shot
`linear_probing_IA3_FSL_test_all.yaml`	IA³ FSL	Few-Shot
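
The few-shot presets train on k samples per class. As a rough illustration of what that subsetting involves, the sketch below draws a class-balanced subset from a list of labels. It is not the repository's sampler: the function name, the seeding, and the decision to raise an error when a class has fewer than k samples are all assumptions.

```python
import random
from collections import defaultdict


def sample_k_per_class(labels, k, seed=0):
    """Return sorted indices of a subset with exactly k samples per class."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) < k:
            raise ValueError(f"class {label!r} has only {len(pool)} samples, need {k}")
        chosen.extend(rng.sample(pool, k))
    return sorted(chosen)


# Hypothetical training fold with an 85% / 15% class split.
labels = [0] * 85 + [1] * 15
subset = sample_k_per_class(labels, k=8)
print(len(subset), "samples,", sum(labels[i] for i in subset), "from the minority class")
```

Note that with only 15 minority samples in this toy fold, k = 16 cannot be satisfied, which is one way small, imbalanced cohorts constrain the few-shot regime.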

Available model configs (experiment/models/): resnet_18, resnet_50, dense121, vitb14_pretrain, vitl14_pretrain, vits14_pretrain, clip_large, medclip_resnet50, medclip_vision, pubmedclip, biomedclip

Available database configs (experiment/databases/): AFC, AFC_death, CAR, CoCross, NY_small, NY_all

Available validation strategies (experiment/validation_strategy/): loCo (Leave-One-Center-Out, for multi-centric AFC/AFC_m), 5fold (5-fold CV, for single-center datasets), hold_out

System path profiles (experiment/paths/system@_global_=): local (set paths in src/configs/PEFT_runs/experiment/paths/system/local.yaml)

Outputs (per-fold predictions, metrics, checkpoints) land under results/<DATASET>/<MODEL>/<STRATEGY>/.
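
The single-run commands above all follow one pattern of four Hydra overrides, so they are easy to generate from Python when looping over models or presets. This is a minimal sketch: the override strings are copied from the examples above, while `build_command` is a hypothetical helper rather than something shipped in the repository.

```python
import shlex

ENTRY_POINT = "src/eval/classification/linear.py"


def build_command(db, model, experiment, validation):
    """Compose the argument list for one run of the main entry point."""
    return [
        "python", ENTRY_POINT,
        f"experiment/databases@db={db}",
        f"experiment/models@_global_={model}",
        f"experiment={experiment}",
        f"experiment/validation_strategy@_global_={validation}",
    ]


# Example: the three LoRA ranks for DINOv2-B on NY_all with 5-fold CV.
commands = [
    build_command("NY_all", "vitb14_pretrain", f"linear_probing_LoRa_{r}_test_all", "5fold")
    for r in (4, 8, 16)
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root with the virtual environment active.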

2. Run batch sweeps

The multiple_linear.sh launcher fans out all model × fine-tuning combinations for a given dataset and regime:

# Full-data benchmark on AIforCOVID (severity)
bash src/bash/multiple_linear.sh -d AFC -v L -e VANILLA

# Few-Shot benchmark on AIforCOVID
bash src/bash/multiple_linear.sh -d AFC -v L -e FSL

# Counterfactual FSL (CFSL)
bash src/bash/multiple_linear.sh -d AFC -v L -e CFSL

# Replace -d with any dataset ID: AFC | AFC_death | CAR | CoCross | NY | NY_all

The -v flag controls validation strategy (L = LOCO, 5 = 5-fold), and -e controls the regime (VANILLA = full data, FSL = few-shot, CFSL = counterfactual FSL).
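
To see which launcher calls a full sweep implies, the flags can be enumerated from Python. The sketch below pairs each dataset ID with the validation strategy given in the Datasets table (LOCO for the two AIforCOVID tasks, 5-fold CV otherwise). The mapping and the helper name are assumptions for illustration, and the sketch does not claim to reproduce what run_all.sh does internally.

```python
LAUNCHER = "src/bash/multiple_linear.sh"

# Dataset ID (as accepted by -d) -> validation flag (L = LOCO, 5 = 5-fold).
VALIDATION_FLAG = {
    "AFC": "L",
    "AFC_death": "L",
    "CAR": "5",
    "CoCross": "5",
    "NY": "5",
    "NY_all": "5",
}
REGIMES = ("VANILLA", "FSL", "CFSL")


def sweep_commands(datasets=None, regimes=REGIMES):
    """Return one launcher argument list per (dataset, regime) combination."""
    datasets = list(VALIDATION_FLAG) if datasets is None else datasets
    return [
        ["bash", LAUNCHER, "-d", d, "-v", VALIDATION_FLAG[d], "-e", e]
        for d in datasets
        for e in regimes
    ]


commands = sweep_commands()
print(len(commands), "launcher invocations")
print(" ".join(commands[0]))
```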

To run the complete benchmark in one shot (all datasets × all regimes), use the top-level script:

bash src/bash/run_all.sh

3. Aggregate results across folds

After all folds complete, aggregate per-fold results into summary tables:

python src/postprocess/aggregate_results/aggregate_results.py \
    experiment/databases@db=AFC \
    experiment/paths/system@_global_=local

Aggregated tables are written under results/aggregated_results/.
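
The summary statistic reported for each configuration is a mean ± std across folds. The sketch below shows that reduction on made-up numbers. The per-fold values are hypothetical, the ×100 scaling is inferred from the headline table, and the use of the sample standard deviation is an assumption; the aggregation script may differ on both points.

```python
from statistics import mean, stdev


def summarize(fold_scores, scale=100.0):
    """Return (mean, sample std) of per-fold scores after scaling."""
    scaled = [s * scale for s in fold_scores]
    return mean(scaled), stdev(scaled)


def format_summary(fold_scores):
    m, s = summarize(fold_scores)
    return f"{m:.1f} ± {s:.1f}"


# Hypothetical per-fold MCC values for one (model, strategy, dataset) run.
fold_mcc = [0.40, 0.50, 0.45, 0.55, 0.35]
print(format_summary(fold_mcc))
```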

4. Explore results interactively

A Dash-based web interface is included for interactive exploration of aggregated results:

python src/postprocess/interface/index.py
# Open http://localhost:8050 in your browser

The interface includes per-dataset scatter plots of MCC vs. % parameters trained, CNN vs. FM comparisons, and model-level breakdowns.

5. Reproduce paper figures

The figures in figures/CNN_vs_FM/, figures/ALL/, and figures/CE95/ were generated from the aggregated results using the plotting scripts under src/postprocess/. To regenerate them, first populate results/aggregated_results/ (step 3), then re-create the plots either from within the interactive interface (step 4) or with the postprocess plotting scripts.

Results

Fine-tuning comparison across datasets

Figure 2 — Distribution of MCC scores per fine-tuning method and dataset. Each box summarizes the mean performance of all models fine-tuned with a given technique on a specific dataset. FFT is consistently strong on small/imbalanced tasks; LP and BitFit offer the most stable lightweight alternatives; LoRA and VeRA show higher variance.

CNN vs. Foundation Model comparison

Figure 3 — CNN vs. FM per dataset (MCC). Each subplot shows the mean test MCC for all fine-tuning methods applicable to both architecture families. Each model is represented by its own symbol and each method by a distinct color. FFT (★) is shown separately. Key observations: (a) on balanced AFC, PEFT and FFT are competitive across architectures; (b) on imbalanced AFC_m, PEFT degrades sharply for most models, with only DINOv2 variants retaining reasonable performance; (e–f) dataset size is a critical factor — PEFT improves markedly from NY_small to NY_all.

All-FM PEFT analysis

Figure 4 — MCC vs. % parameters trained (FM only). X-axis: fraction of model parameters updated during fine-tuning. Y-axis: mean MCC on the test set. Each point is a (model, method) pair. The plot reveals that performance does not scale monotonically with parameter count: BitFit and LP (far left) often match or outperform mid-range PEFT configurations (LoRA, VeRA), while FFT (far right) dominates on small, imbalanced tasks.
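
To give a feel for the x-axis, the fraction of trained parameters under LoRA can be estimated by hand: a rank-r adapter on a d×d weight adds two matrices of shapes r×d and d×r, so 2·d·r parameters. The sketch below applies this to a ViT-B backbone (hidden size 768, 12 transformer blocks, about 86 M parameters as in the Models table). It assumes adapters on two projection matrices per block (for example query and value) and ignores the classification head; which modules the benchmark actually adapts is not stated here.

```python
def lora_trainable_params(hidden_dim, num_layers, rank, adapted_per_layer=2):
    """Extra parameters added by LoRA on square (hidden_dim x hidden_dim) weights."""
    per_matrix = 2 * hidden_dim * rank  # A is rank x d, B is d x rank
    return num_layers * adapted_per_layer * per_matrix


def trainable_percent(trainable, total):
    return 100.0 * trainable / total


VIT_B_TOTAL = 86_000_000  # approximate, from the Models table

for rank in (4, 8, 16):
    n = lora_trainable_params(hidden_dim=768, num_layers=12, rank=rank)
    print(f"r={rank}: {n:,} trainable params ({trainable_percent(n, VIT_B_TOTAL):.2f}%)")
```

Under these assumptions every LoRA rank used in the benchmark updates well under 1% of the backbone, which is why such points sit near the left edge of the plot.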

Headline numbers (MCC, mean ± std across folds)

Selected top results from the paper, with MCC reported as a percentage (×100); full tables are in the published article:

Model	Task	Best strategy	MCC
DINOv2-S	CAR	LoRA (r=4)	77.2 ± 16.4
PubMedCLIP	NY_small	FFT	65.0 ± 25.2
DenseNet-121	CAR	FFT	70.4 ± 36.8
DINOv2-S	AFC	FFT	49.6 ± 13.8
DenseNet-121	AFC_m	FFT	51.8 ± 31.1
CLIP-Large	NY_all	LoRA (r=8)	43.5 ± 10.2
DINOv2-L	NY_all	BitFit	45.1 ± 9.5
BioMedCLIP	NY_small	FFT	43.8 ± 7.8

Full results for all model–task–strategy combinations are reported in the published paper (MCC and PR-AUC tables, Wilcoxon signed-rank pairwise comparisons).

95% confidence intervals (Appendix)

Appendix figures — MCC with 95% confidence intervals per dataset. Detailed per-model, per-method MCC plots with 95% CI are available for all six tasks in figures/CE95/.

Key conclusions

CNNs remain reliable in extreme low-data regimes. ResNet and DenseNet models fine-tuned with FFT outperform FMs consistently on very small (N < 200) or severely imbalanced datasets, due to their compact architecture and strong inductive biases.
FMs with PEFT excel as data availability increases. LoRA and BitFit enable efficient adaptation of large pretrained models with minimal parameter updates; they are competitive or superior on larger datasets (NY_all, AFC).
PEFT is sensitive to class imbalance. Severe imbalance (e.g., AFC_m at 85–15%) degrades PEFT performance sharply, while more balanced data restores competitiveness.
LP is the most stable few-shot strategy. In FSL settings (k ≤ 32 shots per class), LP consistently outperforms other methods on average, offering a computationally inexpensive and robust solution.
No single fine-tuning strategy is universally optimal. Model architecture, dataset scale, and class balance jointly determine which approach works best — the benchmark provides an actionable decision map.

Compute

Experiments were conducted on a high-performance computing cluster equipped with 10 NVIDIA Tesla A40 GPUs (48 GB each) via the National Academic Infrastructure for Supercomputing in Sweden (NAISS). Total benchmark cost: approximately 3,200 GPU-hours (~6 GPU-hours per model–strategy–dataset run on average).

Data availability

All four datasets used in this benchmark are publicly available and can be downloaded from their original sources:

Dataset	Download
AIforCOVID	AIforCOVID portal
COVID-19-AR	The Cancer Imaging Archive / Radiology: AI
CoCross	CoCross dataset
Stony Brook COVID-19	TCIA collection TCIA.BBAG-2690

No patient-level data is included in this repository.

Citation

If you use this code or build on this benchmark, please cite:

@article{ruffini2025benchmarking,
  title   = {Benchmarking Foundation Models and Parameter-Efficient Fine-Tuning
             for Prognosis Prediction in Medical Imaging},
  author  = {Ruffini, Filippo and Mulero Ayllon, Elena and Shen, Linlin and
             Soda, Paolo and Guarrasi, Valerio},
  journal = {Computer Methods and Programs in Biomedicine},
  year    = {2025},
  doi     = {10.1016/j.cmpb.2025.106...},
  url     = {https://www.sciencedirect.com/science/article/pii/S016926072500611X}
}

Please also cite the foundation models you use:

DINOv2 — Oquab et al., github.com/facebookresearch/dinov2
CLIP — Radford et al., openai.com/research/clip
MedCLIP — Wang et al., github.com/RyanWangZf/MedCLIP
BioMedCLIP — Zhang et al., HuggingFace
PubMedCLIP — Eslami et al., HuggingFace
HuggingFace PEFT — huggingface.co/docs/peft

License

All source code, configurations, documentation, and figures in this repository are released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) — see creativecommons.org/licenses/by-nc/4.0 for the full text.

✅ Allowed — academic research, teaching, non-profit clinical research, personal study, modification and redistribution with attribution.
❌ Not allowed without a separate licence — incorporation into commercial products, paid clinical decision-support systems, or any other commercial exploitation.

For commercial licensing contact the corresponding authors.
