Improve newer-GPU compatibility and optional kernel fallbacks#14

Open
Rajesh-Nimmalapudi wants to merge 1 commit into IntelliGen-AI:main from Rajesh-Nimmalapudi:fix/newer-gpu-compatibility

Conversation

@Rajesh-Nimmalapudi

Summary

This PR improves IntelliFold compatibility on newer NVIDIA GPUs while keeping
the default inference path unchanged for existing users.

The changes in this PR:

  • relax the hard torch==2.6.0 pin so users can install a PyTorch build that
    supports newer GPUs
  • move deepspeed from a mandatory dependency to an optional extra
  • keep baseline inference available when DeepSpeed is installed but cannot be
    imported
  • fall back to the standard attention path when DS4Sci is explicitly requested
    but unavailable
  • remove hardcoded older CUDA arch flags from the fast_layernorm extension
    build
  • document a clearer installation path for newer GPU architectures and clarify
    that DS4Sci and fast_layernorm are optional kernel paths

Motivation

The current packaging defaults and kernel-build defaults make installation on
newer NVIDIA GPUs harder than necessary:

  • strict torch pinning prevents users from selecting a newer torch/CUDA build
    for newer devices
  • deepspeed is treated as mandatory even though baseline inference does not
    require it
  • fast_layernorm hardcodes an older CUDA architecture set

This PR keeps the baseline path conservative while removing those avoidable
compatibility blockers.

What changed

Packaging

  • setup.py
    • changed torch==2.6.0 to torch>=2.6.0
    • moved deepspeed into extras_require["deepspeed"]
  • environment.yaml
    • changed torch==2.6.0 to torch>=2.6.0
    • removed mandatory deepspeed
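
A minimal sketch of the packaging change (illustrative; the actual setup.py may declare more dependencies):

```python
# Illustrative sketch of the dependency change described above.
# Before: torch was pinned exactly and deepspeed was a mandatory dependency.
install_requires = [
    "torch>=2.6.0",  # relaxed from torch==2.6.0 so newer CUDA builds install
]

# deepspeed moves into an optional extra, selected at install time
# via `pip install <package>[deepspeed]`.
extras_require = {
    "deepspeed": ["deepspeed"],
}
```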

Runtime behavior

  • intellifold/openfold/model/primitives.py
    • treat DeepSpeed as optional at import time
  • intellifold/openfold/utils/kernel/deepspeed_compat.py
    • centralize DS4Sci availability checks
  • intellifold/openfold/inference_config.py
  • intellifold/openfold/v2_inference_config.py
  • intellifold/openfold/v2_flash_inference_config.py
    • disable DS4Sci cleanly when it is requested but unavailable
    • keep the standard path active instead of carrying a broken DS4Sci request
      into model execution
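
The optional-import and fallback pattern described above can be sketched as follows. The helper names are illustrative, not necessarily the ones used in deepspeed_compat.py; the DS4Sci import path matches the one DeepSpeed ships:

```python
# Sketch: treat DeepSpeed as optional at import time and centralize the
# DS4Sci availability check (illustrative names; real module is
# intellifold/openfold/utils/kernel/deepspeed_compat.py).
try:
    from deepspeed.ops.deepspeed4science import DS4Sci_EvoformerAttention  # noqa: F401
    _DS4SCI_AVAILABLE = True
except ImportError:
    DS4Sci_EvoformerAttention = None
    _DS4SCI_AVAILABLE = False


def ds4sci_available() -> bool:
    """Single availability check shared by the inference configs."""
    return _DS4SCI_AVAILABLE


def resolve_use_ds4sci(requested: bool) -> bool:
    """Disable DS4Sci cleanly when it is requested but unavailable,
    so the standard attention path stays active instead of carrying a
    broken DS4Sci request into model execution."""
    if requested and not ds4sci_available():
        return False
    return requested
```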

Kernel build behavior

  • intellifold/openfold/utils/layer_norm/torch_ext_compile.py
    • removed hardcoded TORCH_CUDA_ARCH_LIST="7.0;8.0"
    • removed the fixed sm_70/sm_80/sm_86/sm_90 list
    • respect TORCH_CUDA_ARCH_LIST when it is provided
    • otherwise defer to PyTorch's default extension-build behavior
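
The arch-flag selection can be sketched as a small helper (illustrative; the actual logic lives in torch_ext_compile.py):

```python
import os

def cuda_arch_list():
    """Return the user-provided arch list, or None to defer to PyTorch.

    The build previously forced TORCH_CUDA_ARCH_LIST="7.0;8.0" (an
    sm_70/sm_80/sm_86/sm_90 set), which excluded newer architectures.
    Now an explicitly provided TORCH_CUDA_ARCH_LIST is respected; when
    none is set, torch.utils.cpp_extension chooses flags for the
    detected GPU.
    """
    return os.environ.get("TORCH_CUDA_ARCH_LIST") or None
```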

Documentation

  • added a short newer-GPU installation note to README.md
  • added an explicit newer-GPU install path to docs/installation.md
  • clarified kernel build expectations in docs/kernels.md
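
The documented newer-GPU install path presumably resembles the following; the package spec and CUDA wheel index are illustrative examples, not the exact commands from the docs:

```shell
# Illustrative newer-GPU install path (exact commands may differ).
# 1. Pick a torch build matching your CUDA stack (cu124 shown as an example):
pip install "torch>=2.6.0" --index-url https://download.pytorch.org/whl/cu124
# 2. Install the package; deepspeed is now an optional extra:
pip install .                # baseline inference path only
pip install ".[deepspeed]"   # opt in to the DS4Sci kernel path
```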

Validation

I validated the branch locally with:

  • baseline runtime on a newer NVIDIA GPU using a newer torch/CUDA stack
  • fast_layernorm JIT build and runtime on the same GPU
  • DS4Sci-request fallback behavior from a clean installed package
  • package build and install from this branch into a clean target directory

Scope

This PR does not make DS4Sci the default path. DS4Sci remains optional, and
this change is focused on compatibility and safer optional-kernel behavior.
