Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a Too many open files error that occurs during demultiplexing when cutadapt needs more file descriptors than the default ulimit of 1024 allows. When many adapters are used (e.g., 24 forward × 24 reverse in paired-end mode requires ~1250 file descriptors), the process fails. The fix raises the soft open-file limit to 6000 before cutadapt is invoked.
Changes:
- Raise the soft
ulimitfor open files to 6000 in the single-end demultiplexing script - Raise the soft
ulimitfor open files to 6000 in the paired-end demultiplexing script
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/pipecraft-core/service_scripts/demux_single_end_data.sh |
Adds ulimit -S -n 6000 before the workflow starts to prevent "Too many open files" errors during single-end demultiplexing |
src/pipecraft-core/service_scripts/demux_paired_end_data.sh |
Adds ulimit -S -n 6000 before the workflow starts to prevent "Too many open files" errors during paired-end demultiplexing |
Two moderate concerns with this approach:
-
Soft vs. hard limit:
ulimit -Sonly modifies the soft limit. If the Docker container's hard limit for open files happens to be below 6000 (which can occur depending on the Docker/host configuration since no explicitUlimitsare set inHostConfigwhen containers are created), the command will fail and the actual limit will remain at the default 1024, meaning the bug is not fixed in those environments. -
Hardcoded value may be insufficient for large multiplexing runs: The value 6000 is adequate for the example in the PR description (24×24 adapters, paired-end → ~1250 FDs), but for larger experiments (e.g., 96×96 dual adapters on paired-end data), the required file descriptors could exceed 6000. A more conservative choice like 65536 (a common production default) would make this more future-proof.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| run_python_module=$"python3 /scripts/submodules/assign_sample_names.demuxModule.py $indexes_file" | ||
|
|
||
| # Increase the number of open files limit | ||
| ulimit -S -n 6000 |
There was a problem hiding this comment.
The command ulimit -S -n 6000 only raises the soft limit for open files. The soft limit can only be raised up to the current hard limit. If the container's hard limit is lower than 6000 (which on some Docker/OS configurations may equal the default of 1024), this command will fail with an error like bash: ulimit: open files: cannot modify limit: Operation not permitted or silently do nothing.
To ensure this works reliably across all deployment environments, consider also setting the hard limit (ulimit -H -n 6000), which requires root/privileged access (common in containers), or — more robustly — setting the nofile ulimit at the Docker container creation level in HostConfig.Ulimits. Additionally, if users run large multiplexing experiments (e.g., 96×96 adapters for paired-end), they would need 96×96×2 ≈ 18,432 file descriptors, which would exceed the hardcoded 6000 limit. Consider making this value dynamically computed based on the number of adapters or using a higher ceiling such as 65536.
| ulimit -S -n 6000 | |
| # Allow override via DESIRED_NOFILE; default to a higher ceiling suitable for large multiplexing | |
| DESIRED_NOFILE="${DESIRED_NOFILE:-65536}" | |
| # Get current hard limit | |
| current_hard_limit=$(ulimit -H -n 2>/dev/null || echo "") | |
| if [ -n "$current_hard_limit" ]; then | |
| # Try to raise hard limit to desired value (may fail without sufficient privileges) | |
| if [ "$current_hard_limit" != "unlimited" ] && [ "$current_hard_limit" -lt "$DESIRED_NOFILE" ] 2>/dev/null; then | |
| ulimit -H -n "$DESIRED_NOFILE" 2>/dev/null || true | |
| current_hard_limit=$(ulimit -H -n 2>/dev/null || echo "") | |
| fi | |
| # Clamp soft limit to the (possibly updated) hard limit | |
| target_soft_limit="$DESIRED_NOFILE" | |
| if [ "$current_hard_limit" != "unlimited" ] && [ "$current_hard_limit" -lt "$target_soft_limit" ] 2>/dev/null; then | |
| target_soft_limit="$current_hard_limit" | |
| fi | |
| ulimit -S -n "$target_soft_limit" 2>/dev/null || true | |
| fi |
| # Increase the number of open files limit | ||
| ulimit -S -n 6000 | ||
|
|
There was a problem hiding this comment.
Same concern as in the paired-end script: ulimit -S -n 6000 only raises the soft limit. If the container's hard limit is below 6000, this command will fail. For a single-end scenario with many adapters (e.g., 96+), even 6000 might not be sufficient (96 adapters × 1 output file each + communication overhead could be fine, but the risk grows with scale).
Consider using a higher, more future-proof value (e.g., 65536) or dynamically computing the required value based on the adapter count.
| # Increase the number of open files limit | |
| ulimit -S -n 6000 | |
| # Increase the number of open files limit (respect hard limit and avoid lowering current soft limit) | |
| desired_nofiles=65536 | |
| hard_limit=$(ulimit -H -n 2>/dev/null) | |
| soft_limit=$(ulimit -S -n 2>/dev/null) | |
| # Determine target_limit: do not exceed hard limit (if numeric), and do not go below current soft limit | |
| if [ "$hard_limit" = "unlimited" ] || [ "$hard_limit" = "unlimited\n" ]; then | |
| target_limit=$desired_nofiles | |
| else | |
| target_limit=$desired_nofiles | |
| if [ "$hard_limit" -lt "$desired_nofiles" ] 2>/dev/null; then | |
| target_limit=$hard_limit | |
| fi | |
| fi | |
| # Never lower the existing soft limit | |
| if [ "$soft_limit" -gt "$target_limit" ] 2>/dev/null; then | |
| target_limit=$soft_limit | |
| fi | |
| ulimit -S -n "$target_limit" 2>/dev/null || { | |
| printf 'Warning: unable to increase open file limit; continuing with current limits.\n' >&2 | |
| } |
Demultiplexing fails with
ERROR: [Errno 24] Too many open filesThis is related to the number of open files allowed for a user.
E.g., for 24 forward and 24 reverse adapters and paired-end data, you need about 25252=1250 file descriptors, plus a couple for communication between each process (ref).
The default
ulimitis 1024.