This repository contains scripts to download six public datasets and reproduce radiomics experiments on these six datasets and two other public datasets. The six public datasets are described in the following paper:
Starmans, M. P. A. et al. (2021). The WORC* database: MRI and CT scans, segmentations, and clinical labels for 932 patients from six radiomics studies. Submitted, preprint available from https://doi.org/10.1101/2021.08.19.21262238
The data used for six of the datasets can be found at https://xnat.health-ri.nl/data/projects/worc.
The experiments are described in the following paper:
Starmans, M. P. A. et al. (2021). Reproducible radiomics through automated machine learning validated on twelve clinical applications. Submitted, preprint available from https://arxiv.org/abs/2108.08618.
When using parts of this code or the above datasets, please cite the two papers mentioned above. Feel free to also cite the code from this repository:
Martijn P.A. Starmans. WORCDatabase. Zenodo (2021). Available from: https://github.com/MStarmans91/WORCDatabase. DOI: http://doi.org/10.5281/zenodo.5119040
The experiments only require the WORC Python package to be installed, which can be done using pip. In the original WORC paper, version 3.6.3 was used:
pip install "WORC==3.6.3"
The requirements for WORC itself can be found at https://github.com/MStarmans91/WORC.
The six public datasets as published in the Data in Brief paper can be
downloaded using the provided datadownloader.py script. These are
downloaded from the XNAT repository at https://xnat.health-ri.nl/data/projects/worc.
Simply import one of the following functions from the script and run it:
- download_Lipo
- download_Desmoid
- download_Liver
- download_GIST
- download_CRLM
- download_Melanoma
Additionally, we provide downloaders for two of the three previously publicly published datasets used in the WORC MEDIA paper:
- download_HeadAndNeck
- download_Glioma
Documentation for these functions can be found in the docstrings of the functions.
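As a minimal sketch of this import-and-run pattern, the helper below looks up a downloader by dataset name and calls it. The helper name run_downloader is hypothetical, and the sketch assumes that datadownloader.py is importable from the working directory and that each function can be called without arguments. Check the docstrings for the actual parameters.

```python
import importlib

# Dataset suffixes taken from the function names listed above.
DATASETS = ["Lipo", "Desmoid", "Liver", "GIST", "CRLM", "Melanoma",
            "HeadAndNeck", "Glioma"]


def run_downloader(dataset, module=None):
    """Call download_<dataset> from the datadownloader module.

    Hypothetical helper: assumes the downloader can be called without
    arguments; see its docstring for the real options.
    """
    if dataset not in DATASETS:
        raise ValueError(f"Unknown dataset: {dataset}")
    if module is None:
        # Assumption: datadownloader.py is on the Python path.
        module = importlib.import_module("datadownloader")
    return getattr(module, f"download_{dataset}")()
```

For example, run_downloader("Lipo") would call download_Lipo from the script.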
The features used in WORC are stored in CSV format both on XNAT and in this GitHub repository.
The files in the crossvalidationsplits folder contain the 100x random-split train-test cross-validation setup used in the experiments of the original WORC paper mentioned above. If you want to compare the performance of another method with WORC, we strongly recommend using the cross-validation splits from these files so that you have exactly the same evaluation setup.
For each dataset, a file called crossvalidationsplits_[DATASETNAME].csv is provided.
For each cross-validation iteration, a column for training and a column for testing are provided,
including the anonymized identities of each patient included in either the train or test
set (e.g., 0_train, 0_test). These files are also stored on the Health-RI XNAT,
see https://xnat.health-ri.nl/data/projects/worc/resources.
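The split files can be read with the standard library alone. The sketch below is one possible reader, not part of this repository. It assumes comma-separated files, columns named `<i>_train` and `<i>_test` as in the example above, and empty cells padding the shorter column. The patient IDs in the demo are made up.

```python
import csv
import io


def read_splits(csv_file):
    """Return a list of (train_ids, test_ids) tuples, one per iteration.

    Assumes columns named '<i>_train' and '<i>_test', with the shorter
    column of each pair padded by empty cells.
    """
    reader = csv.DictReader(csv_file)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            if value:  # skip padding cells
                columns[name].append(value)

    n_iterations = sum(1 for name in columns if name.endswith("_train"))
    return [(columns[f"{i}_train"], columns[f"{i}_test"])
            for i in range(n_iterations)]


# Demo with made-up patient IDs and two iterations.
demo = io.StringIO("0_train,0_test,1_train,1_test\nA,C,B,A\nB,,C,\n")
splits = read_splits(demo)
print(splits[0])  # (['A', 'B'], ['C'])
```

For a real file, pass an open file handle instead of the in-memory demo, for example one opened with open(path, newline="").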
For each image and lesion segmentation, a set of 564 radiomics features was extracted using WORC, which internally uses the PyRadiomics (71 features) and PREDICT (493 features) toolboxes. For each dataset, a file called WORCDatabase_[DATASETNAME]\_features\_[WORC\_Version].csv is provided in the features folder. In each file, each row represents a different image and lesion, and each column provides one of the 564 radiomics features. The final column contains the subject name and associated lesion (e.g. [DATASETNAME]_[SUBJECTID]_[LESIONID]). The first row contains the header with the feature names; see the WORC paper and documentation for details. The WORC software version is included in the file name to support future updates when needed. These files are also stored on the Health-RI XNAT, see https://xnat.health-ri.nl/data/projects/worc/resources.
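The layout described above (header row, one row per lesion, identifier in the final column) can be parsed as in the following sketch. This is an illustration rather than code from this repository: the function name, the feature names, and the IDs in the demo are made up, and it assumes comma-separated numeric values.

```python
import csv
import io


def read_features(csv_file):
    """Split a feature table into feature names, values, and lesion IDs.

    Assumes a header row, numeric feature columns, and the subject/lesion
    identifier in the final column.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    feature_names = header[:-1]
    values, ids = [], []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        # Empty cells are stored as NaN instead of raising an error.
        values.append([float(v) if v else float("nan") for v in row[:-1]])
        ids.append(row[-1])
    return feature_names, values, ids


# Demo with two made-up features and two made-up lesions.
demo = io.StringIO(
    "feat_a,feat_b,ID\n"
    "1.0,2.5,Demo_001_0\n"
    "0.5,3.0,Demo_002_0\n"
)
names, values, ids = read_features(demo)
print(names, ids)
```

The returned lists can be converted to a NumPy array or pandas DataFrame if those packages are available.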
To reproduce the eight default experiments presented in the WORC paper,
i.e., those with the default WORC settings on the above mentioned eight datasets, simply import
and run the run_experiment function from the default_experiments.py script.
The dataset argument can be used to switch between the eight datasets. See
the docstring of the function or the --help option when running the
function on the command line for further documentation.
For the six public datasets as published in the Data in Brief paper, the data is directly fed from XNAT into WORC. For the two previously publicly published datasets used in the WORC paper, some additional steps are performed, see below.
For the glioma dataset, radiomics features are supplied instead of raw imaging data. As these have been created using Python2 and a specific version of pandas, these files cannot directly be fed into WORC. Therefore, we have created two scripts to convert the feature files to a format that can be fed into WORC, which can be found in the helpers folder:
- convert_features_hdf5_to_csv: this script should be run in a Python2 environment with pandas 0.19.0 and the tables package installed. The .hdf5 files containing the features are converted to .csv files.
- convert_features_csv_to_hdf5.py: this script should be run in a Python3 environment with pandas installed (any version, as long as you use the same one when running the WORC experiment). The .csv files created by the previous script are converted back to .hdf5 files, but now with the same pandas version as you are using for WORC. The features are also renamed to correspond with the formatting used in WORC.
Documentation for these functions can be found in the respective files. Note that you first have to download the features from http://dx.doi.org/10.17632/rssf5nxxby.3.
Afterwards, we recommend using the SimpleWORC module as explained in
the WORC Tutorial (https://github.com/MStarmans91/WORCTutorial) to run the
actual WORC experiment.
For the head and neck dataset, the data can be found at https://xnat.health-ri.nl/data/projects/stwstrategyhn1.
This data can be directly fed into WORC. However, in the
experiment conducted in the paper, we predict the T-stage based on GTV-1 segmentations,
which are both missing for some of the patients. Hence, before running the experiment,
we first check for which patients these are available and only include those.
This is automatically done in the run_experiment function when using the
head and neck dataset.
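The selection step can be illustrated with the sketch below. It is not the code used inside run_experiment. The function name and both input dictionaries are hypothetical stand-ins for whatever the experiment reads from XNAT.

```python
def select_complete_patients(labels, segmentations):
    """Keep patients with both a T-stage label and a GTV-1 segmentation.

    Hypothetical inputs: `labels` maps patient ID to T-stage (None if
    missing); `segmentations` maps patient ID to a list of segmentation
    names available for that patient.
    """
    return sorted(
        patient for patient, t_stage in labels.items()
        if t_stage is not None and "GTV-1" in segmentations.get(patient, [])
    )


# Made-up example: only P1 has both the label and the segmentation.
labels = {"P1": 2, "P2": None, "P3": 4}
segmentations = {"P1": ["GTV-1"], "P2": ["GTV-1"], "P3": ["GTV-2"]}
included = select_complete_patients(labels, segmentations)
print(included)  # ['P1']
```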
By default, WORC optimizes the radiomics workflow construction using a
random search of 1000 iterations and creates an ensemble of the top 50
workflows. In the WORC paper, this was compared to various other settings
for the number of random search iterations, top N for the ensembling,
other ensembling methods, and Bayesian optimization using SMAC. These
experiments can also be performed by using the run_automl_experiment function from
the automlcomparison.py script and manipulating the following arguments:
- The use_smac argument can be set to True to use SMAC.
- The smac_budget argument can be set to 'low', 'medium', or 'high' to change the time budget of SMAC.
- The ensembling_method argument can be changed to use the other ensembling methods, see https://worc.readthedocs.io/en/latest/static/configuration.html#ensemble.
- The ensembling_size argument can be changed to determine the ensemble size if the top_N method is used.
- The RS_iterations argument can be changed to determine the number of random search iterations if SMAC is not used.
- The radiomics_sota argument can be set to True to use the radiomics baseline (PyRadiomics + LASSO + Logistic Regression).
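One way to keep these arguments together is to collect them in a dictionary before calling the function. The helper below is a hypothetical sketch and not part of automlcomparison.py. The argument names come from the list above. The values 1000 and 50 mirror the WORC defaults mentioned earlier, while the defaults for smac_budget and ensembling_method are guesses that should be checked against the function's docstring.

```python
def automl_settings(use_smac=False, smac_budget="low",
                    ensembling_method="top_N", ensembling_size=50,
                    RS_iterations=1000, radiomics_sota=False):
    """Collect keyword arguments for run_automl_experiment in a dict.

    Hypothetical helper: the default values here are assumptions, not
    the documented defaults of run_automl_experiment.
    """
    if smac_budget not in ("low", "medium", "high"):
        raise ValueError("smac_budget must be 'low', 'medium', or 'high'")
    return {
        "use_smac": use_smac,
        "smac_budget": smac_budget,
        "ensembling_method": ensembling_method,
        "ensembling_size": ensembling_size,
        "RS_iterations": RS_iterations,
        "radiomics_sota": radiomics_sota,
    }


# Example: Bayesian optimization with SMAC and the largest time budget.
settings = automl_settings(use_smac=True, smac_budget="high")
print(settings)
```

The dictionary could then be unpacked as run_automl_experiment(**settings), together with any other arguments the function requires.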
To support regression testing, the standardized performance evaluation files from WORC
are provided in the pastperformances folder in files called WORCDatabase_performance_[DATASETNAME]_[WORC\_Version].json. Among others, these include several performance metrics for each of the 100x random-split cross-validation iterations mentioned above, and their mean and 95% confidence intervals as reported in the original radiomics study. More details are given in the WORC documentation. The WORC software version is included in the file name to support regression testing in the future.
See the WORC FAQ: https://worc.readthedocs.io/en/latest/static/faq.html
The authors thank Laurens Groenendijk for his assistance in processing the data and in the anonymization procedures. Martijn P. A. Starmans acknowledges funding from the research program STRaTeGy with project numbers 14929 and 14930, which is (partly) financed by the Netherlands Organization for Scientific Research (NWO). Part of this study was financed by the Stichting Coolsingel (reference number 567), a Dutch non-profit foundation. This study is supported by EuCanShare and EuCanImage (European Union’s Horizon 2020 research and innovation programme under grant agreement Nr. 825903 and Nr. 952103, respectively).