Draft
- working towards cleaner input configs - started moving more functionality into ExecutionDefinition - added lots of TODO things and questions
WQ priority can be controlled using the context manager `with SetWQPriority:` which will set the 'priority' resource_spec argument. Very basic implementation, and we will need to be careful with how `wq_resources` is called and used
remove the annoying '000' nesting level and place all executor output directly in psiflow_internal
- further cleanup of execution - redo `ModelEvaluation` + threadpool (workqueue still needs work) - make bash app template for all bash apps. It is now possible to specify where tmpdirs are created through the `tmpdir_root` config option. Also, you can specify whether tmpdirs should be removed after the tasks finish (for debugging purposes). ATM, `ModelTraining` and `ReferenceEvaluation` are moderately broken, most likely
(Re)implementing some logic
- to dynamically scale up MD resources depending on walkers/hamiltonians (capped by the 'max_resource_multiplier' option)
- to decide how many clients to spawn for an MD run to avoid resource oversubscription (the 'allow_oversubscription' option)
Also 'log_dfk_tasks' for debugging
figure out how many clients can be used in a simulation
Update ModelTraining - this will be adapted when we update MACE etc. Update ReferenceEvaluation - updated memory_limit - allow creating Reference instances that ask for fewer cores than specified in ReferenceEvaluation (eliminating the need for CP2K/CP2K_small/...)
- update modules to work with the new execution module and syntax - fix tests
This PR overhauls several parts of the `execution` module, aiming to make future maintenance easier. It contains several components.

## Config specification
We reworked the configuration syntax quite dramatically. Some internals of the execution environment have also been updated. Below, we provide an overview of the new way to specify a YAML config file.

The YAML config consists of several components: Parsl config options, global psiflow settings, and `ExecutionDefinition` blocks.
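A sketch of how such a config file could be laid out (illustrative only; apart from the options discussed later in this description, the keys and values below are assumptions, and the Parsl executor keys are not shown):

```yaml
# Illustrative sketch, not definitive syntax.
# Global psiflow settings (see 'Debugging' below):
tmpdir_root: /tmp/psiflow
keep_tmpdirs: false
garbage_collect: true

# ExecutionDefinition blocks; the Parsl options (threadpool/workqueue
# executor, local/remote execution) go alongside these but are omitted here.
ModelEvaluation:
  cores_per_task: 1
  timeout: 300                  # illustrative value
  max_resource_multiplier: 4    # illustrative value
  allow_oversubscription: true
ModelTraining:
  cores_per_task: 8             # illustrative value
CP2K:                           # a ReferenceEvaluation block, named after the code
  cores_per_task: 64
  memory_limit: 4               # in GB
```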
Next, you define various `ExecutionDefinition` blocks. We first introduce the generic syntax. You have the exclusive choice between threadpool/workqueue executors and local/remote execution.

The executor types function differently: `WorkQueueExecutor` will (try to) respect total resource allocations, only starting new tasks when resources free up, while `ThreadPoolExecutor` only looks at the number of threads in use and can seriously oversubscribe CPU cores or GPUs.

An `ExecutionDefinition` block comes in three flavours (`ModelEvaluation`, `ModelTraining`, and `ReferenceEvaluation`) that each work slightly differently and accept some custom keys.

### ModelEvaluation
Keys: `timeout` (float), `max_resource_multiplier` (int), `allow_oversubscription` (bool).

The `timeout` parameter specifies how long the i-PI server will wait before assuming a calculation has died; it then stops the simulation and cleans up nicely. This is useful if you expect unstable simulations, but choosing the timeout too short might end simulations prematurely.

Because i-PI MD can combine any number of walkers (e.g., in replica exchange or PIMD) with any number of Hamiltonians (e.g., in thermodynamic integration), `ModelEvaluation` will try to scale computational resources as needed (see the table below). Assume we specify one core per task on a machine with eight cores in total: WQ will assign one core to a basic MD simulation (one walker, one Hamiltonian), which means eight independent simulations fit on our machine simultaneously.
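The config snippet for this example is not reproduced verbatim; a fragment consistent with the numbers used here (one core per task, eight cores in total) might look like:

```yaml
# Hypothetical fragment: with 8 cores available and cores_per_task: 1,
# WQ sees 8 task slots. The other keys are illustrative.
ModelEvaluation:
  cores_per_task: 1
  max_resource_multiplier: 4
  allow_oversubscription: true
```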
For coupled walkers, WQ will multiply the resource assignment accordingly, without exceeding available resources. Similar behaviour holds for simulations with multiple MACE Hamiltonians. You can limit this multiplicative factor by specifying `max_resource_multiplier` in the `ModelEvaluation` block.

By default, psiflow will spawn one client for every walker-hamiltonian pair (so `walkers × hamiltonians` in total). This can lead to serious resource oversubscription, which is usually undesirable (i.e., clients fighting for CPU cores and reducing simulation speed). You can flip the `allow_oversubscription` flag to false to limit the maximum number of i-PI clients in a simulation. The table below illustrates this idea, where M and O represent `max_resource_multiplier` and `allow_oversubscription`.

In this example, every client wants to use a single core (`cores_per_task: 1`), but the logic is implemented in terms of WQ task 'slots': the minimal ratio between the `*_per_task` directives and the total resources (eight here).
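The slot arithmetic described above can be made concrete with a small standalone sketch (not psiflow's actual implementation; the function names are invented for illustration):

```python
# Standalone sketch of the WQ 'slot' arithmetic described above;
# psiflow's real logic may differ in its details.

def n_slots(total: dict, per_task: dict) -> int:
    """A 'slot' is the minimal ratio between total resources and the
    *_per_task directives, e.g. 8 cores total / 1 core per task = 8 slots."""
    return min(total[key] // per_task[key] for key in per_task)

def resource_multiplier(walkers: int, hamiltonians: int,
                        max_resource_multiplier: int, slots: int) -> int:
    # coupled walkers / extra hamiltonians multiply the resource assignment,
    # capped by max_resource_multiplier and by what the machine offers
    return min(walkers * hamiltonians, max_resource_multiplier, slots)

def n_clients(walkers: int, hamiltonians: int,
              allow_oversubscription: bool, slots: int) -> int:
    # by default: one i-PI client per walker-hamiltonian pair
    clients = walkers * hamiltonians
    if not allow_oversubscription:
        clients = min(clients, slots)  # avoid clients fighting for cores
    return clients
```

For instance, `n_slots({"cores": 8}, {"cores": 1})` yields the eight slots of this example, and with `allow_oversubscription` disabled, a run with four walkers and three Hamiltonians would be capped at eight clients instead of twelve.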
### ModelTraining

Used for MLP training. Does not accept any special keys at the moment. This might change when the `psiflow.models` module is overhauled, somewhere in the near future.

### ReferenceEvaluation
Supported flavours: `CP2K`, `GPAW`, `ORCA`. Key: `memory_limit` (float).

In the YAML config, `ReferenceEvaluation` blocks need to be named after the corresponding ab initio software. Because these calculations can be very memory hungry, you can specify `memory_limit` (in GB). Tasks exceeding that limit will be killed, and psiflow will continue without having other tasks fail. This functionality relies on `systemd-run`, which is available on many HPCs (TIER-2, TIER-1, LUMI) but probably not everywhere.

Additionally, we added the `n_cores` option to all `Reference` classes. It overrides the `cores_per_task` value, allowing you to separate 'small' and 'large' ab initio calculations. Suppose the YAML config sets `cores_per_task: 64` for CP2K; then all CP2K single points will be executed on 64 cores. This could be undesirable if you have structures of different sizes. In the psiflow script, you can then create additional `Reference` instances with a smaller `n_cores` to use your computational resources more efficiently.
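The override semantics can be sketched with a tiny self-contained mock (this is not psiflow's real `CP2K` class; only the `n_cores` behaviour is taken from the text above):

```python
# Illustrative mock of the n_cores override; psiflow's actual Reference
# classes take different constructor arguments.
from typing import Optional

class Reference:
    def __init__(self, cores_per_task: int, n_cores: Optional[int] = None):
        # n_cores, when given, overrides cores_per_task from the YAML config
        self.cores = n_cores if n_cores is not None else cores_per_task

cp2k_large = Reference(64)              # follows cores_per_task: 64 from the config
cp2k_small = Reference(64, n_cores=16)  # 'small' single points on 16 cores only
```

This removes the need for separate CP2K/CP2K_small/... blocks in the config itself.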
## psiflow_internal and logging

We restructured the `psiflow_internal` directory slightly, which should make navigation less cumbersome. Additionally, we implemented a psiflow logger that writes to `psiflow.log` and logs psiflow-specific messages (configuration warnings, task execution states, etc.). Over time, it should become the main log file to check workflow progress.

## Debugging
There are several new options to debug your workflow:

- `tmpdir_root: /some/path` and `keep_tmpdirs: true` will keep the working directory of every Parsl task, allowing you to inspect the input and output files it generates.
- `garbage_collect: false` makes the Parsl DFK store every task record it encounters. You can then use `log_dfk_tasks` from `psiflow.utils.logging` to print an overview of these task records (task name, input dependencies, outputs, ...), allowing you to see which tasks psiflow submits under the hood.

## WQ priority
*(experimental feature)*

WQ tasks can be given a priority value that determines the order in which WQ schedules them. In psiflow, this logic is hidden from users. However, we implemented a very basic `SetWQPriority` context manager, which sets the priority of tasks submitted within its scope. This has only been tested superficially and will probably not work in complex `join_app` scenarios, where new tasks are defined at some undefined moment in the future.
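The intended usage pattern can be illustrated with a self-contained mock (the exact import path and signature of psiflow's `SetWQPriority` are not shown in this description, so everything below is an illustrative sketch):

```python
# Self-contained mock of a SetWQPriority-style context manager; psiflow's
# real implementation and import path may differ.
import contextvars

_wq_priority: contextvars.ContextVar = contextvars.ContextVar(
    "wq_priority", default=None
)

class SetWQPriority:
    """Sets the 'priority' entry of the WQ resource specification for all
    tasks submitted inside the with-block."""
    def __init__(self, priority: int):
        self.priority = priority

    def __enter__(self):
        self._token = _wq_priority.set(self.priority)
        return self

    def __exit__(self, *exc_info):
        _wq_priority.reset(self._token)

def wq_resource_spec(cores: int) -> dict:
    # mimics how a submit wrapper could inject 'priority' into resource_spec
    spec = {"cores": cores}
    priority = _wq_priority.get()
    if priority is not None:
        spec["priority"] = priority
    return spec
```

Inside `with SetWQPriority(10):`, tasks would then be submitted with `{"cores": 1, "priority": 10}` as their resource specification, and the priority is dropped again once the block exits.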
## Misc

`docker://cp2k/cp2k:2025.2_mpich_x86_64_psmp` works with this PR.

## TODO
- Independent WQ tasks placed on the same machine can fight for resources, leading to slowdowns or worse. We are discussing with WQ how to solve this problem: Work Queue: Hardware-level isolation between tasks? (cooperative-computing-lab/cctools#4370)
- The `psiflow.config` module needs to be updated (or removed).
- The new YAML config structure needs new examples.
- The `psiflow.serialization` module is becoming a big mess.