DPSP Analyse Data Science

Welcome to the DSPS Analyse team's Data Science repository! This repository stores all of our Data Science projects.

Setup

The installation process is intended to be frictionless: it works on macOS, Linux and Windows (via WSL) and handles all the dependencies.

Clone the repository (SSH)

git clone git@github.com:NHSDigital/dtos-analyse-data-science.git

Prerequisites

The following software packages, or their equivalents, are expected to be installed and configured:

  • Docker container runtime or a compatible tool, e.g. Podman,
  • GNU make 3.82 or later,
  • pip package manager for Python,

Note

The version of GNU make available by default on macOS is earlier than 3.82. You will need to upgrade it, or certain make tasks will fail. On macOS, install Homebrew first, then install make:

brew install make

Homebrew then prints instructions for updating your $PATH variable so that the newly installed version is used. If you are using dotfiles, this is all done for you.
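As a rough sketch of that $PATH change, the snippet below prepends Homebrew's GNU make directory to PATH. It assumes Homebrew's usual layout, where the unprefixed make binary lives under opt/make/libexec/gnubin. The prepend_path helper is hypothetical and not part of this repository. If the instructions Homebrew prints on your machine differ, follow those instead.

```shell
# Hypothetical helper (not part of this repository): put a directory at the
# front of PATH unless it is already listed there.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

# Assumption: Homebrew installs the unprefixed GNU make into this gnubin directory.
if command -v brew >/dev/null 2>&1; then
  prepend_path "$(brew --prefix)/opt/make/libexec/gnubin"
  export PATH
  make --version | head -n 1     # should now report GNU Make 3.82 or later
fi
```

Adding these lines to your shell profile (for example ~/.zshrc) would make the change persistent.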

  • GNU sed and GNU grep are required for the scripted command-line output processing,
  • GNU coreutils and GNU binutils may be required to build dependencies like Python, which may need to be compiled during installation,

Note

For macOS users, installation of the GNU toolchain has been scripted and automated as part of the dotfiles project. Please see this script for details.

  • Python, required to run Git hooks,
  • jq, a lightweight and flexible command-line JSON processor.
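Before running the configuration step, it can help to confirm that the prerequisites listed above are on your PATH. The script below is a minimal sketch, not part of this repository: the function names version_ge, have_any and check_prerequisites are hypothetical, and the alternatives pip3 and python3 are assumptions about how those tools may be named on your system. It checks only that sed and grep exist, not that they are the GNU variants.

```shell
#!/usr/bin/env bash
# Hypothetical helper script, not part of this repository.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when at least one of the named commands is on PATH.
have_any() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 && return 0
  done
  return 1
}

check_prerequisites() {
  local missing=0 make_version group

  # Each entry lists acceptable alternatives, e.g. Docker or Podman.
  for group in "docker podman" "pip pip3" "python python3" "sed" "grep" "jq"; do
    # shellcheck disable=SC2086  # word splitting of $group is intended
    if have_any $group; then
      echo "ok:      $group"
    else
      echo "missing: $group"
      missing=1
    fi
  done

  # GNU make must be 3.82 or later; the first line of `make --version`
  # looks like "GNU Make 4.3".
  make_version="$(make --version 2>/dev/null | sed -n '1s/^GNU Make \([0-9.]*\).*/\1/p')"
  if [ -n "$make_version" ] && version_ge "$make_version" 3.82; then
    echo "ok:      GNU make $make_version"
  else
    echo "missing: GNU make 3.82 or later"
    missing=1
  fi

  return "$missing"
}

check_prerequisites || echo "Install the missing tools before running make config."
```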

Configuration

Install and configure the toolchain dependencies:

make config

Usage

Inside the projects folder, each project has its own README, which explains the purpose of the project and how to install and run it.

Testing

To run tests on your local branch (these are the same tests that run automatically on commit, and remotely on GitHub):

make githooks-run

Design

Each project in the projects folder is self-contained, with its own README, pyproject.toml and Dockerfile. Projects can be developed on local machines, using Poetry for virtual environments and for package and dependency management. Alternatively, Podman/Docker can be used to run the scripts.

Contacts

Contact screening-team-analyse-data-science on Slack

Licence

Unless stated otherwise, the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation.

Any HTML or Markdown documentation is © Crown Copyright and available under the terms of the Open Government Licence v3.0.
