Draft
- working towards cleaner input configs - started moving more functionality into ExecutionDefinition - added lots of TODO things and questions
WQ priority can be controlled using the context manager `with SetWQPriority:` which will set the 'priority' resource_spec argument. Very basic implementation, and we will need to be careful with how `wq_resources` is called and used
remove the annoying '000' nesting level and place all executor output directly in psiflow_internal
- further cleanup of execution - redo `ModelEvaluation` + threadpool (workqueue still needs work) - make bash app template for all bash apps. It is now possible to specify where tmpdirs are created through the `tmpdir_root` config option. Also, you can specify whether tmpdirs should be removed after the tasks finish (for debugging purposes). ATM, `ModelTraining` and `ReferenceEvaluation` are moderately broken, most likely
(Re)implementing some logic
- to dynamically scale up MD resources depending on walkers/hamiltonians (capped by the 'max_resource_multiplier' option)
- to decide how many clients to spawn for an MD run to avoid resource oversubscription (the 'allow_oversubscription' option)
Also 'log_dfk_tasks' for debugging
figure out how many clients can be used in a simulation
Update ModelTraining - this will be adapted when we update MACE etc. Update ReferenceEvaluation - updated memory_limit - allow creating Reference instances that ask for fewer cores than specified in ReferenceEvaluation (eliminating the need for CP2K/CP2K_small/...)
- update modules to work with the new execution module and syntax - fix tests
This PR overhauls several parts of the `execution` module, aiming to make future maintenance easier. It contains several components.

## Config specification
We reworked the configuration syntax quite dramatically. Some internals of the execution environment have also been updated. Below, we provide an overview of the new way to specify a YAML config file.

The YAML config consists of several components: Parsl config options, global psiflow settings, and `ExecutionDefinition` blocks.
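A sketch of how such a config file could be laid out (illustrative only; apart from the options discussed later in this description, the keys and values below are assumptions, and the Parsl executor keys are not shown):

```yaml
# Illustrative sketch, not definitive syntax.
# Global psiflow settings (see 'Debugging' below):
tmpdir_root: /tmp/psiflow
keep_tmpdirs: false
garbage_collect: true

# ExecutionDefinition blocks; the Parsl options (threadpool/workqueue
# executor, local/remote execution) go alongside these but are omitted here.
ModelEvaluation:
  cores_per_task: 1
  timeout: 300                  # illustrative value
  max_resource_multiplier: 4    # illustrative value
  allow_oversubscription: true
ModelTraining:
  cores_per_task: 8             # illustrative value
CP2K:                           # a ReferenceEvaluation block, named after the code
  cores_per_task: 64
  memory_limit: 4               # in GB
```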
Next, you define various `ExecutionDefinition` blocks. We first introduce the generic syntax. You have the exclusive choice between threadpool/workqueue executors and local/remote execution.

The executor types function differently: `WorkQueueExecutor` will (try to) respect total resource allocations, only starting new tasks when resources free up, while `ThreadPoolExecutor` only looks at the number of threads in use and can seriously oversubscribe CPU cores or GPUs.

An `ExecutionDefinition` block comes in three flavours (`ModelEvaluation`, `ModelTraining`, and `ReferenceEvaluation`) that each work slightly differently and accept some custom keys.

### ModelEvaluation
Keys: `timeout` (float), `max_resource_multiplier` (int), `allow_oversubscription` (bool).

The `timeout` parameter specifies how long the i-PI server will wait before assuming a calculation has died; it then stops the simulation and cleans up nicely. This is useful if you expect unstable simulations, but choosing the timeout too short might end simulations prematurely.

Because i-PI MD can combine any number of walkers (e.g., in replica exchange or PIMD) with any number of Hamiltonians (e.g., in thermodynamic integration), `ModelEvaluation` will try to scale computational resources as needed (see the table below). Assume we specify one core per task on a machine with eight cores in total: WQ will assign one core to a basic MD simulation (one walker, one Hamiltonian), which means eight independent simulations fit on our machine simultaneously.
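The config snippet for this example is not reproduced verbatim; a fragment consistent with the numbers used here (one core per task, eight cores in total) might look like:

```yaml
# Hypothetical fragment: with 8 cores available and cores_per_task: 1,
# WQ sees 8 task slots. The other keys are illustrative.
ModelEvaluation:
  cores_per_task: 1
  max_resource_multiplier: 4
  allow_oversubscription: true
```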
For coupled walkers, WQ will multiply the resource assignment accordingly, without exceeding available resources. Similar behaviour holds for simulations with multiple MACE Hamiltonians. You can limit this multiplicative factor by specifying `max_resource_multiplier` in the `ModelEvaluation` block.

By default, psiflow will spawn one client for every walker-hamiltonian pair (so `walkers × hamiltonians` in total). This can lead to serious resource oversubscription, which is usually undesirable (i.e., clients fighting for CPU cores and reducing simulation speed). You can flip the `allow_oversubscription` flag to false to limit the maximum number of i-PI clients in a simulation. The table below illustrates this idea, where M and O represent `max_resource_multiplier` and `allow_oversubscription`.

In this example, every client wants to use a single core (`cores_per_task: 1`), but the logic is implemented in terms of WQ task 'slots': the minimal ratio between the `*_per_task` directives and the total resources (eight here).
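The slot arithmetic described above can be made concrete with a small standalone sketch (not psiflow's actual implementation; the function names are invented for illustration):

```python
# Standalone sketch of the WQ 'slot' arithmetic described above;
# psiflow's real logic may differ in its details.

def n_slots(total: dict, per_task: dict) -> int:
    """A 'slot' is the minimal ratio between total resources and the
    *_per_task directives, e.g. 8 cores total / 1 core per task = 8 slots."""
    return min(total[key] // per_task[key] for key in per_task)

def resource_multiplier(walkers: int, hamiltonians: int,
                        max_resource_multiplier: int, slots: int) -> int:
    # coupled walkers / extra hamiltonians multiply the resource assignment,
    # capped by max_resource_multiplier and by what the machine offers
    return min(walkers * hamiltonians, max_resource_multiplier, slots)

def n_clients(walkers: int, hamiltonians: int,
              allow_oversubscription: bool, slots: int) -> int:
    # by default: one i-PI client per walker-hamiltonian pair
    clients = walkers * hamiltonians
    if not allow_oversubscription:
        clients = min(clients, slots)  # avoid clients fighting for cores
    return clients
```

For instance, `n_slots({"cores": 8}, {"cores": 1})` yields the eight slots of this example, and with `allow_oversubscription` disabled, a run with four walkers and three Hamiltonians would be capped at eight clients instead of twelve.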
### ModelTraining

Used for MLP training. Does not accept any special keys at the moment. This might change when the `psiflow.models` module is overhauled, somewhere in the near future.

### ReferenceEvaluation
Supported flavours: `CP2K`, `GPAW`, `ORCA`. Key: `memory_limit` (float).

In the YAML config, `ReferenceEvaluation` blocks need to be named after the corresponding ab initio software. Because these calculations can be very memory hungry, you can specify `memory_limit` (in GB). Tasks exceeding that limit will be killed, and psiflow will continue without having other tasks fail. This functionality relies on `systemd-run`, which is available on many HPCs (TIER-2, TIER-1, LUMI) but probably not everywhere.

Additionally, we added the `n_cores` option to all `Reference` classes. It overrides the `cores_per_task` value, allowing you to separate 'small' and 'large' ab initio calculations. Suppose the YAML config sets `cores_per_task: 64` for CP2K; then all CP2K single points will be executed on 64 cores. This could be undesirable if you have structures of different sizes. In the psiflow script, you can then create additional `Reference` instances with a smaller `n_cores` to use your computational resources more efficiently.
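The override semantics can be sketched with a tiny self-contained mock (this is not psiflow's real `CP2K` class; only the `n_cores` behaviour is taken from the text above):

```python
# Illustrative mock of the n_cores override; psiflow's actual Reference
# classes take different constructor arguments.
from typing import Optional

class Reference:
    def __init__(self, cores_per_task: int, n_cores: Optional[int] = None):
        # n_cores, when given, overrides cores_per_task from the YAML config
        self.cores = n_cores if n_cores is not None else cores_per_task

cp2k_large = Reference(64)              # follows cores_per_task: 64 from the config
cp2k_small = Reference(64, n_cores=16)  # 'small' single points on 16 cores only
```

This removes the need for separate CP2K/CP2K_small/... blocks in the config itself.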
## psiflow_internal and logging

We restructured the `psiflow_internal` directory slightly, which should make navigation less cumbersome. Additionally, we implemented a psiflow logger that writes to `psiflow.log` and logs psiflow-specific messages (configuration warnings, task execution states, etc.). Over time, it should become the main log file to check workflow progress.

## Debugging
There are several new options to debug your workflow:

- `tmpdir_root: /some/path` and `keep_tmpdirs: true` will keep the working directory of every Parsl task, allowing you to inspect the input and output files it generates.
- `garbage_collect: false` makes the Parsl DFK store every task record it encounters. You can then use `log_dfk_tasks` from `psiflow.utils.logging` to print an overview of these task records (task name, input dependencies, outputs, ...), allowing you to see which tasks psiflow submits under the hood.

## WQ priority
*(experimental feature)*

WQ tasks can be given a priority value that determines the order in which WQ schedules them. In psiflow, this logic is hidden from users. However, we implemented a very basic `SetWQPriority` context manager, which sets the priority of tasks submitted within its scope. This has only been tested superficially and will probably not work in complex `join_app` scenarios, where new tasks are defined at some undefined moment in the future.
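The intended usage pattern can be illustrated with a self-contained mock (the exact import path and signature of psiflow's `SetWQPriority` are not shown in this description, so everything below is an illustrative sketch):

```python
# Self-contained mock of a SetWQPriority-style context manager; psiflow's
# real implementation and import path may differ.
import contextvars

_wq_priority: contextvars.ContextVar = contextvars.ContextVar(
    "wq_priority", default=None
)

class SetWQPriority:
    """Sets the 'priority' entry of the WQ resource specification for all
    tasks submitted inside the with-block."""
    def __init__(self, priority: int):
        self.priority = priority

    def __enter__(self):
        self._token = _wq_priority.set(self.priority)
        return self

    def __exit__(self, *exc_info):
        _wq_priority.reset(self._token)

def wq_resource_spec(cores: int) -> dict:
    # mimics how a submit wrapper could inject 'priority' into resource_spec
    spec = {"cores": cores}
    priority = _wq_priority.get()
    if priority is not None:
        spec["priority"] = priority
    return spec
```

Inside `with SetWQPriority(10):`, tasks would then be submitted with `{"cores": 1, "priority": 10}` as their resource specification, and the priority is dropped again once the block exits.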
## Misc

`docker://cp2k/cp2k:2025.2_mpich_x86_64_psmp` works with this PR.

## TODO
- Independent WQ tasks placed on the same machine can fight for resources, leading to slowdowns or worse. We are discussing with WQ how to solve this problem: Work Queue: Hardware-level isolation between tasks? (cooperative-computing-lab/cctools#4370)
- The `psiflow.config` module needs to be updated (or removed).
- The new YAML config structure needs new examples.
- The `psiflow.serialization` module is becoming a big mess.