hyperpipe: parsimonious mode + hyperpipe driver#132

Merged
oshaughn merged 25 commits into oshaughn:rift_O4d from
oshaughnessy-junior:rift_O4d_junior_parsimonious
May 15, 2026

@oshaughn
Owner

  • hyperpipe driver script
  • parsimonious placement options, instead of draws from full posterior, for very costly models

Richard O'Shaughnessy and others added 25 commits May 12, 2026 11:56
Build a hyperpipeline analog to util_RIFT_pseudo_pipe.py with first-class
support for:

  * ini/Hydra-based configuration (default: bin/hyperpipe_conf.yaml,
    schema documented in RIFT/hyperpipe/config.py)
  * flexible multi-event input via the 'marg-list:' section --- one entry
    per (likelihood driver, event), heterogeneous batch sizes per entry,
    optional per-entry coord-module override
  * CIP-mirror coordinate-transformation framework
    (RIFT/hyperpipe/coords.py): coord modules emit
    --supplementary-coordinate-code, --parameter,
    --integration-parameter-range, --parameter-implied,
    --parameter-nofit, plus the supplementary-likelihood-factor trio
  * marg-driver toolkit (RIFT/hyperpipe/drivers) so downstream users
    don't re-derive the fragile argument strings / output-file names /
    contract points the hyperpipe needs them to honor

The legacy util_RIFT_hyperpipe.py shipped with the Gaussian demo was a
~200-line hand-rolled prototype; it forgot --test-args, --general-retries,
--general-request-disk, --request-memory-marg, --eos-post-explode-jobs,
--use-full-submit-paths, and passed a single --event-file flag pointing
at a list-of-events file (broken for multi-event analyses). The rewrite
delegates to the new RIFT.hyperpipe package and emits all of these
correctly, one --event-file per marg-list entry.

A pixi-based live test suite lives at test/hyperpipe/ (full lalsuite
stack in the pixi env). RIFT_ROOT is auto-detected from the test's own
location, so the suite works from any RIFT clone without per-user
configuration.

New
  RIFT/hyperpipe/{__init__,config,coords,marg_list}.py
  RIFT/hyperpipe/drivers/{__init__,base,gaussian}.py
  bin/util_HyperMargGaussian.py     console-script shim
  bin/hyperpipe_conf.yaml           default Hydra config
  test/hyperpipe/{pixi.toml,README.md}
  test/hyperpipe/tests/{conftest,standalone_check,demo_dryrun}.py
  test/hyperpipe/tests/test_{coords,marg_list,config,drivers,hydra_integration}.py

Modified
  RIFT/__init__.py                  added 'hyperpipe' to __all__
  bin/util_RIFT_hyperpipe.py        full rewrite, thin Hydra entry

Test
  cd MonteCarloMarginalizeCode/Code/test/hyperpipe
  pixi install && pixi run test
…preview)

Parsimonious-placement project: tracer-particle iterative grid update
intended to reduce the number of expensive likelihood evaluations
required at each adaptive iteration.

New code:
  - RIFT.misc.tracer_placement: engine package with SMC+MALA, birth-death,
    and SMC-MALA+BD samplers; pluggable RF / RBF / polynomial / quadratic
    fits via tracer_placement.fits.build(method, X, Y, sigma=None).
  - bin/util_HyperparameterTracerUpdate.py: drop-in alternative to
    util_HyperparameterPuffball.py for the hyperpipeline.
  - bin/util_ParameterTracerUpdate.py: drop-in alternative to
    util_ParameterPuffball.py for the event-level iterator.

Both tools refit their own surrogate from the same .dat file CIP reads
(no CIP coupling, no surrogate-pickle transfers, no DAG-generator
changes required for the basic drop-in workflow). They also accept
--update-method puffball as an exact-regression fallback so the
behaviour reduces to the existing puffball when needed.

Default pipeline behaviour is unchanged: nothing invokes the new
tools unless the user points --puff-exe at them.

Background / theory notes / Tier-1 prototype results / rollout plan
live in the parsimonious-placement project folder
(20260513-Me-ParsimoniousPlacementOptions/parsimonious_placement_plan.md).
A follow-up commit adds the create_eos_posterior_pipeline patch
for the optional --tracer-only-marg cost-halving workflow.
Optional workflow for use with the tracer-placement tools added in the
previous commit (util_HyperparameterTracerUpdate.py /
util_ParameterTracerUpdate.py). Two new CLI flags:

  --tracer-only-marg
      Skip MARG_* (grid-X.dat) nodes on non-final iterations; only
      MARG_PUFF_* (grid_puff-X.dat) runs. Halves the per-iteration
      likelihood-evaluation cost when the puff'd grid is itself
      already a posterior-tracking ensemble (which it is under
      tracer placement).

  --tracer-final-marg-iterations N        (default 1)
      Number of trailing iterations on which MARG still runs, so the
      final-posterior diagnostic stays based on a cleanly-sampled
      grid. Set to 0 to skip MARG on every iteration after iteration 0
      (which always runs MARG).

Gate is conservative: iteration 0 always runs MARG (no grid_puff-0.dat
exists yet), and the last N iterations always run MARG. With the
default N=1 and 5 total iterations, MARG runs on iters {0, 4} and is
skipped on {1, 2, 3} -- a 1.8x reduction in MARG-equivalent
evaluations.
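
The gate described above can be sketched as a small predicate. This is
an illustrative reconstruction, not the code in
create_eos_posterior_pipeline: the function name marg_runs_on and its
arguments are hypothetical, and iterations are assumed to be numbered
0 .. n_iterations-1 as in the worked example.

```python
def marg_runs_on(it, n_iterations, tracer_only_marg=False, n_final=1):
    """Return True if MARG_* nodes would be created for iteration `it`.

    Hypothetical helper mirroring the gate described in the text.
    """
    if not tracer_only_marg:
        return True                       # default: the gate is a no-op
    if it == 0:
        return True                       # no grid_puff-0.dat exists yet
    return it >= n_iterations - n_final   # trailing n_final iterations


kept = [it for it in range(5) if marg_runs_on(it, 5, tracer_only_marg=True)]
skipped = [it for it in range(5) if it not in kept]
print(kept, skipped)   # [0, 4] [1, 2, 3]
```

With tracer_only_marg left at False the predicate returns True for every
iteration, which is the no-op behaviour existing pipelines rely on.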

Default behaviour is unchanged: --tracer-only-marg is False unless
explicitly set, so the gate collapses to a no-op for every existing
pipeline. The only structural change in the DAG generator is a
conditional around the MARG_* node-creation block in the iteration
loop (with a print() noting the skip for easy log diagnosis). No
changes to MARG_PUFF wiring, consolidation, EOS-post, or anything else.

Rationale and worked example in
20260513-Me-ParsimoniousPlacementOptions/rift_integration/workflow_tracer_only_marg.md.
…AG-gen

Hydra-level integration of the parsimonious-placement workflow into the
hyperpipeline driver. Previously a tracer run required the user to (a) point
puff.exe at util_HyperparameterTracerUpdate.py and (b) escape tracer
hyperparameters through puff.extra-args as a free-form string, and there was
no Hydra-level switch for --tracer-only-marg at all.

This commit makes both first-class:

  * arch.tracer-only-marg          -> --tracer-only-marg
  * arch.tracer-final-marg-iterations
                                   -> --tracer-final-marg-iterations
  * puff.settings.update-method        -> --update-method
  * puff.settings.tracer-fit-method    -> --tracer-fit-method
  * puff.settings.n-mala-steps         -> --n-mala-steps
  * puff.settings.target-ess-frac      -> --target-ess-frac
  * puff.settings.birth-death-rate     -> --birth-death-rate
  * puff.settings.inj-file-prev        -> --inj-file-prev
  * puff.settings.rng-seed             -> --rng-seed
  * puff.settings.{state-in,state-out} -> --state-{in,out}
  * puff.settings.no-union-refit (bool) -> --no-union-refit
  * puff.settings.regularize (bool)    -> --regularize

Implementation:
  * _build_puff_args(): extend the puff section to consume an optional
    puff.settings: sub-block, mirroring the pattern post.settings: already
    uses. Null / empty values are skipped; bool toggles pass through truthy().
    Legacy puff.extra-args still appended verbatim.
  * cmd_parts assembly: append --tracer-only-marg (and the optional
    --tracer-final-marg-iterations N) when arch.tracer-only-marg is truthy.
    Mirrors the existing explode-marg-jobs handling.
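
A minimal sketch of the settings-to-flags pattern described above. It is
not the driver's _build_puff_args(): the helper name
build_settings_args, the local truthy() stand-in, and the choice of
which keys are bool toggles are assumptions made for illustration.

```python
def build_settings_args(settings, bool_keys=("no-union-refit", "regularize")):
    """Turn a puff.settings-style mapping into a list of CLI tokens."""

    def truthy(value):
        # Stand-in for the driver's truthy(): accept bools and common strings.
        if isinstance(value, str):
            return value.strip().lower() in ("1", "true", "yes", "on")
        return bool(value)

    tokens = []
    for key, value in (settings or {}).items():
        if value is None or value == "":
            continue                        # null / empty: omit the flag
        if key in bool_keys:
            if truthy(value):
                tokens.append("--" + key)   # bare toggle, no value
            continue
        tokens.extend(["--" + key, str(value)])
    return tokens


legacy = {"update-method": None, "regularize": False}
tracer = {"update-method": "smc-mala-bd", "tracer-fit-method": "rf",
          "rng-seed": None, "regularize": True}
print(build_settings_args(legacy))
print(build_settings_args(tracer))
```

The legacy mapping yields an empty list, so a config that leaves every
key null emits no tracer flags.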

The default hyperpipe_conf.yaml (and the in-package DEFAULT_CONFIG_YAML
schema in RIFT/hyperpipe/config.py) gain the new keys with safe defaults
(tracer-only-marg: false, all puff.settings values null) so existing user
configs are unchanged in behavior.

An example user config is added:
  demo/hyperpipe/hyperpipe_conf_tracer.yaml
demonstrating an end-to-end tracer + tracer-only-marg run on the Gaussian
toy. Selectable via:
  util_RIFT_hyperpipe.py --config-name hyperpipe_conf_tracer

Tested via stand-alone simulation of _build_puff_args and the cmd_parts
forwarding (no actual Hydra invocation needed):
  legacy config -> no tracer flags emitted (back-compat)
  tracer config -> --update-method smc-mala-bd --tracer-fit-method rf ...
                   plus --tracer-only-marg --tracer-final-marg-iterations 1
Merge branch 'rift_O4d_parsimonious_v2' into rift_O4d_junior_parsimonious
Closes the pathway gap that prevented H1 from working end-to-end: the
tracer was wedged into the puff slot (post EOS_POST + JOIN_POST) and was
reading grid-{k+1}.dat -- which is EOS_POST posterior samples with a
stale "# lnL sigma_lnL ..." header, not the ILE-evaluated (lambda, lnL)
table the tracer needs. The tracer would silently succeed with garbage
inputs (column 0 / column 1 in the posterior-samples file are arbitrary,
not lnL / sigma) and produce a placement that contained zero information
about the actual likelihood. This commit fixes the data flow, adds an
acquisition-function placement pathway for the conservative-deployment
regime, and resolves a merge-conflict marker at lines 45-53 of
create_eos_posterior_pipeline left over from the
rift_O4d_parsimonious_v2 merge.

---- pipeline rewiring -----------------------------------------------

create_eos_posterior_pipeline gains --puff-input-source {posterior,
marg_net}, default posterior (legacy behavior preserved bit-for-bit).

  posterior (legacy): puff reads grid-{k+1}.dat (= EOS_POST posterior
    samples via JOIN_POST), writes grid_puff-{k+1}.dat, MARG_PUFF
    re-evaluates the puffed grid. Sensible for the baseline puffball
    since the puffball doesn't need lnL inputs.

  marg_net (tracer): puff/tracer node consumes all.marg_net (cumulative
    ILE-evaluated table emitted by UNIFY) and writes grid-{k+1}.dat
    directly. EOS_POST still runs each iteration but writes to
    posterior-{k+1}.dat (downstream-free, diagnostic only). The
    MARG_PUFF lane is suppressed entirely -- the tracer's chosen grid
    IS the next MARG input. One MARG lane per iteration in this mode,
    matching the expensive-ILE design.

The puff node's parent is rewired off unify_node (not parent_fit_node-
after-EOS_POST) in marg_net mode, so tracer + EOS_POST run in parallel.

Also: puff nodes are now gated on --puff-exe AND --puff-args together.
--puff-args without --puff-exe is rejected with a clear error rather
than silently dropping nodes.

---- tracer tool -----------------------------------------------------

util_HyperparameterTracerUpdate.py

  * --force-away is now wired through. After the engine returns X_out,
    a Mahalanobis decimation pass over the input grid's covariance
    drops near-duplicates. K=0 disables. Mirrors PUFF semantics --
    self-avoidance was missing before; SMC-MALA-BD on its own matches
    *density* (log rho_hat - log pi), not pairwise separation, so two
    particles could sit arbitrarily close. In the expensive-ILE regime
    near-duplicates are pure waste.

  * --update-method gains "ucb" (in addition to smc-mala-bd, smc-mala,
    birth-death, puffball).

  * --ucb-kappa FLOAT (default 2.0): UCB exploration weight in
    score(x) = mu(x) + kappa * sigma(x).

  * --ucb-n-candidates INT (default 20000): pool size for greedy batch
    selection.
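
The --force-away decimation pass can be pictured with the sketch below.
It assumes a greedy rule (keep a point only if it lies at least K
Mahalanobis units from every point already kept); the tool's exact rule
may differ, and the function name decimate_mahalanobis is hypothetical.

```python
import numpy as np


def decimate_mahalanobis(X_out, X_grid, k):
    """Greedily drop rows of X_out closer than k (Mahalanobis) to a kept row.

    The metric is built from the covariance of the input grid X_grid.
    """
    if k <= 0:
        return X_out                               # k = 0 disables the pass
    cov = np.atleast_2d(np.cov(X_grid, rowvar=False))
    inv_cov = np.linalg.pinv(cov)                  # pinv tolerates degenerate grids
    kept = []
    for i, x in enumerate(X_out):
        if kept:
            d = X_out[kept] - x                    # offsets to points kept so far
            dist2 = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            if dist2.min() < k * k:
                continue                           # near-duplicate: drop it
        kept.append(i)
    return X_out[kept]
```

The result depends on the order of X_out, since earlier rows take
precedence over later near-duplicates.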

---- engine: UCB + uncertainty plumbing ------------------------------

RIFT/misc/tracer_placement/

  fits/_base.py
    + predict_with_std(Z) -> (mean, std). Default: (predict(Z), zeros).
    + class flag has_uncertainty (default False).
    + class flag smooth_gradient (default True). Tells UCB which local
      polish to use.

  fits/_rf.py
    + predict_with_std returns tree-disagreement std. Not a calibrated
      posterior, but qualitatively correct: large where trees disagree
      (unexplored), small where they agree (well-sampled). Sufficient
      for UCB at zero added cost.
    + has_uncertainty = True, smooth_gradient = False.

  samplers/ucb.py  (new)
    Acquisition-function placement:
      1. Build candidate pool: 25% jittered current particles +
         75% uniform draws from the prior box.
      2. Score = mu + kappa * sigma.
      3. Greedy descending-score select with Mahalanobis distance
         self-avoidance (population-covariance metric).
      4. Local polish per point: gradient ascent for smooth fits
         (RBF / quadratic / GP), coordinate-hop for piecewise-constant
         fits (RF). Strategy selected from FitBase.smooth_gradient.
    Default kappa = 2.0, n_candidates = 20000, polish_steps = 20.
    No kappa tapering, no PUFF reactivation -- keeps the conservative
    pathway minimal for the expensive-simulation regime where there
    is no resource budget for posterior sampling on top.

  samplers/__init__.py: export ucb_place.

When the surrogate has no uncertainty estimate (e.g. quadratic fit),
UCB warns the user and degenerates to greedy mean-maximization rather
than failing. RF is the recommended fit method.
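
Steps 2 and 3 of the UCB placement (scoring and greedy self-avoiding
selection) can be sketched as follows. This is an illustration under
stated assumptions, not samplers/ucb.py: a stack of per-tree predictions
stands in for the RF surrogate, the candidate pool is taken as given,
the local polish is omitted, and ucb_select and min_sep are hypothetical
names.

```python
import numpy as np


def ucb_select(candidates, tree_preds, n_select, kappa=2.0, min_sep=0.5):
    """Greedy UCB batch selection with a Mahalanobis self-avoidance radius.

    candidates : (n, d) array of candidate points
    tree_preds : (n_trees, n) array, one row of predictions per tree
    Returns the indices of the chosen candidates, best score first.
    """
    mu = tree_preds.mean(axis=0)
    sigma = tree_preds.std(axis=0)           # tree-disagreement spread
    score = mu + kappa * sigma               # score(x) = mu(x) + kappa * sigma(x)
    cov = np.atleast_2d(np.cov(candidates, rowvar=False))
    inv_cov = np.linalg.pinv(cov)            # population-covariance metric
    chosen = []
    for i in np.argsort(-score):             # descending score
        if chosen:
            d = candidates[chosen] - candidates[i]
            dist2 = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            if dist2.min() < min_sep ** 2:
                continue                     # too close to an earlier pick
        chosen.append(int(i))
        if len(chosen) == n_select:
            break
    return chosen
```

If every row of tree_preds is identical, sigma is zero everywhere and
the selection reduces to greedy mean-maximization, the degenerate case
described above for fits without an uncertainty estimate.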

---- backwards compatibility -----------------------------------------

  * --puff-input-source defaults to posterior; legacy DAGs are
    unchanged.
  * --update-method puffball still reproduces puff exactly.
  * existing smc-mala-bd / smc-mala / birth-death paths untouched.
  * RF fit's predict() is unchanged; new method is additive.

---- testing ---------------------------------------------------------

Engine and tracer tool exercised end-to-end via the CLI on a hand-built
.dat: smc-mala-bd, ucb, puffball methods all produce valid output files
with the correct header / column count. UCB-on-quadratic correctly
warns about no-uncertainty and degenerates to greedy mean-max.

Full DAG generation could not be smoke-tested in the dev sandbox
(scipy not available) but the pipeline parses cleanly and the
patched branches were inspected at the right line ranges.
Two related additions on top of commit 04 (pathway fix + UCB):

  (1) The convergence test (convergence_test_samples.py) is now wired up
      correctly in tracer mode. In legacy puff mode it compared
      grid-{k+1}.dat vs grid-{k}.dat -- both held posterior samples back
      then. In tracer mode grid-{k}.dat is a *placement* grid (lnL
      columns zero); the actual posterior lives in posterior-{k}.dat.
      The test now points at posterior-{k}.dat / posterior-{k+1}.dat in
      tracer mode automatically. No new CLI flag needed; the routing
      follows --puff-input-source.

  (2) The Hydra driver (util_RIFT_hyperpipe.py) now forwards every
      tracer- and convergence-related flag that commit 03's message
      promised but did not actually deliver:

         arch.tracer-only-marg              -> --tracer-only-marg
         arch.tracer-final-marg-iterations  -> --tracer-final-marg-iterations
         puff.input-source                  -> --puff-input-source
         puff.settings.update-method        -> --update-method
         puff.settings.tracer-fit-method    -> --tracer-fit-method
         puff.settings.ucb-kappa            -> --ucb-kappa
         puff.settings.ucb-n-candidates     -> --ucb-n-candidates
         puff.settings.n-mala-steps         -> --n-mala-steps
         puff.settings.target-ess-frac      -> --target-ess-frac
         puff.settings.birth-death-rate     -> --birth-death-rate
         puff.settings.inj-file-prev        -> --inj-file-prev
         puff.settings.rng-seed             -> --rng-seed
         puff.settings.{state-in,state-out} -> --state-{in,out}
         puff.settings.no-union-refit       -> --no-union-refit (bool)
         puff.settings.regularize           -> --regularize (bool)
         test.settings.iteration-threshold  -> --iteration-threshold
         test.settings.write-file-on-success-> --write-file-on-success
         test.settings.test-output          -> --test-output
         test.settings.always-succeed       -> --always-succeed (bool)
         test.settings.verbose              -> --verbose (bool)

      Defaults are null/false throughout so legacy configs are byte-
      identical in their generated DAG.

---- create_eos_posterior_pipeline ----------------------------------

In tracer mode (puff_uses_marg_net), point the convergence test's
--samples files at posterior-{k}.dat / posterior-{k+1}.dat instead of
grid-{k}.dat / grid-{k+1}.dat. The DAG topology is otherwise unchanged:
test_node is still inserted between JOIN_POST and the puff/tracer node,
so it depends on the EOS_POST output (which is what we're checking the
convergence of) and gates the next placement.
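
The file routing amounts to a one-line switch on the input source. The
helper below is a hypothetical sketch (the function name and the return
convention are assumptions); only the file-name stems come from the text
above.

```python
def convergence_sample_files(it, puff_input_source="posterior"):
    """Return the pair of sample files compared after iteration `it`."""
    # Tracer mode (marg_net): the posterior lives in posterior-{k}.dat.
    # Legacy mode (posterior): grid-{k}.dat already holds posterior samples.
    stem = "posterior" if puff_input_source == "marg_net" else "grid"
    return ("{}-{}.dat".format(stem, it), "{}-{}.dat".format(stem, it + 1))


print(convergence_sample_files(2))
print(convergence_sample_files(2, "marg_net"))
```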

---- util_RIFT_hyperpipe.py -----------------------------------------

* _build_puff_args(): consume an optional puff.settings: sub-block,
  mirroring the existing post.settings: pattern. Null values are
  skipped; bool toggles use hyper_config.truthy(). All keys are
  additive (legacy puffball ignores them).

* _build_test_args(): consume an optional test.settings: sub-block.
  Same pattern.

* cmd_parts assembly: append --tracer-only-marg,
  --tracer-final-marg-iterations, and --puff-input-source from the
  arch and puff blocks. All conditional on the user setting the
  corresponding key (null = omit the flag, preserving legacy behavior).

---- hyperpipe_conf.yaml --------------------------------------------

Schema additions (all null/false defaults):

  arch.tracer-only-marg
  arch.tracer-final-marg-iterations
  puff.input-source
  puff.settings (12 keys, see commit message body)
  test.settings (5 keys)

---- demo/hyperpipe/hyperpipe_conf_tracer.yaml (new) ---------------

A worked example showing the tracer workflow end-to-end. Uses
util_HyperparameterTracerUpdate.py with update-method=smc-mala-bd and
puff.input-source=marg_net. Includes the convergence test with
always-succeed=true (keep DAG running through full diagnostic sweep).

Run:
  cd MonteCarloMarginalizeCode/Code/demo/hyperpipe
  util_RIFT_hyperpipe.py --config-name hyperpipe_conf_tracer

Override for UCB:
  util_RIFT_hyperpipe.py --config-name hyperpipe_conf_tracer \
    puff.settings.update-method=ucb puff.settings.ucb-kappa=2.0

---- testing --------------------------------------------------------

* _build_puff_args / _build_test_args verified by direct invocation
  with three configs (legacy / tracer-smc / tracer-ucb): each emits
  the exact flag set expected, no extraneous flags, legacy unchanged.
* hyperpipe_conf.yaml + hyperpipe_conf_tracer.yaml parse as valid YAML.
* util_RIFT_hyperpipe.py + create_eos_posterior_pipeline AST-clean.
* Full DAG generation could not be smoke-tested in the dev sandbox
  (scipy not available); for verification on host, the demo Makefile
  in rift_integration/demo_hyperpipe_tracer also gained a
  --test-args plumbing in this same edit pass (project-side, not in
  this commit) plus `make convergence` and `make cross-compare`
  post-processing targets.
…iles

Hydra's native CLI requires the user to know about, and supply, two
separate flags (``--config-dir DIR --config-name NAME``) just to point
the driver at a config file that lives anywhere other than the
installed scripts directory. Worse, the second flag rejects a
``.yaml`` extension (Hydra appends one itself), so the obvious
``--config-name myconf.yaml`` fails with a confusing "Cannot find
primary config" error. This commit makes the driver accept what
users actually expect: a single ``--config PATH`` argument that just
works, anywhere on disk.

If you pass

    util_RIFT_hyperpipe.py --config /full/path/to/my_run.yaml ...

the new shim translates the path into the equivalent of

    --config-dir=/full/path/to --config-name=my_run

before Hydra parses sys.argv, and the file is loaded.

Cases handled:

  * Absolute path with .yaml:   --config /tmp/run.yaml
  * Relative path with .yaml:   --config ./d/run.yaml
  * Relative path with .yml:    --config ./d/run.yml      (or -c)
  * Equals form:                --config=/tmp/run.yaml
  * Short form:                 -c /tmp/run.yaml
  * Bare name (no path):        --config myconf
        falls through to Hydra's default search path -- equivalent to
        --config-name myconf.
  * User-mistake .yaml suffix:  --config-name myconf.yaml
        the trailing .yaml is stripped automatically.
  * Hydra-native flags:         --config-dir=/d --config-name=foo
        pass through untouched; the shim is idempotent.

Error handling: if the resolved path does not exist (any of the
explicit-path cases above), fail with a clear message naming the file
we looked for, instead of letting Hydra emit its generic "Cannot find
primary config" search-path dump.
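
A sketch of the argv translation, written from the case list above
rather than from the driver source. The function names, the injectable
exists argument, and the choice to treat a bare "run.yaml" (no
directory) as a path in the current directory are assumptions; the
sketch also assumes every flag is followed by its value.

```python
import os

YAML_SUFFIXES = (".yaml", ".yml")


def strip_suffix(name):
    """Drop a trailing .yaml/.yml, since Hydra appends the extension itself."""
    root, ext = os.path.splitext(name)
    return root if ext in YAML_SUFFIXES else name


def translate_config_argv(argv, exists=os.path.isfile):
    """Rewrite --config/-c PATH into --config-dir=DIR --config-name=NAME."""
    out = []
    args = iter(argv)
    for arg in args:
        if arg in ("--config", "-c"):
            value = next(args)
        elif arg.startswith("--config="):
            value = arg[len("--config="):]
        elif arg == "--config-name":
            out.append("--config-name=" + strip_suffix(next(args)))
            continue
        elif arg.startswith("--config-name="):
            out.append("--config-name=" + strip_suffix(arg[len("--config-name="):]))
            continue
        else:
            out.append(arg)    # overrides and Hydra-native flags pass through
            continue
        directory, filename = os.path.split(value)
        if directory or os.path.splitext(filename)[1] in YAML_SUFFIXES:
            path = os.path.abspath(value)
            if not exists(path):
                raise FileNotFoundError("config file not found: " + path)
            out.append("--config-dir=" + os.path.dirname(path))
        out.append("--config-name=" + strip_suffix(filename))
    return out
```

Running the function on its own output leaves the arguments unchanged,
which is the idempotence property noted for Hydra-native flags.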

---- demo/hyperpipe/hyperpipe_conf_tracer.yaml ----------------------

Updated the header comment to recommend the new shim:

    util_RIFT_hyperpipe.py --config /path/to/hyperpipe_conf_tracer.yaml

instead of the old --config-name advice that only worked if the file
was shipped in bin/.

---- bin/hyperpipe_conf_tracer.yaml ---------------------------------

A previous step (rev'd locally; never released in a commit) also
copied hyperpipe_conf_tracer.yaml into bin/ so Hydra's default search
path would find it. With this shim that copy is redundant and
misleading (configs are user-owned, not install-shipped); the rollup
script removes the bin/ copy if present. The canonical documentation
copy stays under demo/hyperpipe/.

---- testing --------------------------------------------------------

The shim was exercised on 10 inputs (absolute, relative, .yaml/.yml
extension, short form, equals form, bare name, --config-name with
.yaml stripped, missing-file error, Hydra-native pass-through,
mixed shim + override). All produced the expected argv translation.
util_RIFT_hyperpipe.py AST-parses cleanly.
Why
---
create_eos_posterior_pipeline reads --eos-post-args, --puff-args, and
--test-args via:

    args = ' '.join(file_contents)
    args = ' '.join(args.split(' ')[1:])    # <-- DROPS FIRST TOKEN

This is the same first-token-drop convention used by the legacy CIP /
EOS pipeline driver (where the first token is conventionally the
executable name). The hyperpipe driver was writing real flags as the
first token, so create_eos_posterior_pipeline silently ate them:

    args_test.txt:     "--parameter x ..."   ->  pipeline sees "x ..."
    args_eos_post.txt: "--parameter x ..."   ->  pipeline sees "x ..."
    args_puff.txt:     "--force-away 0.25 ..." -> pipeline sees "0.25 ..."

For args_test.txt this manifested as convergence_test_samples receiving
its first --parameter eaten, breaking the JS / lame comparison setup.

Fix
---
Prepend a single dummy token "X" to each of the three args files at
write time. Done at the write site (not inside _build_*_args) because
the dummy is a pipeline-transport concern, not a logical-args concern.

The marg-list args file is *not* touched: create_eos_posterior_pipeline
reads it via --marg-event-args-list-file, which preserves each line
verbatim (no split[1:]).
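
The effect of the dummy token can be reproduced in isolation. The two
lines inside pipeline_view are the ones quoted above; the wrapper
function and the sample argument string are illustrative.

```python
def pipeline_view(file_contents):
    """Mimic how the pipeline reads an args file: join, then drop token 0."""
    args = ' '.join(file_contents)
    return ' '.join(args.split(' ')[1:])


raw = "--parameter x --parameter y"
broken = pipeline_view([raw])          # first flag is eaten
fixed = pipeline_view(["X " + raw])    # dummy token absorbs the drop
print(repr(broken))   # 'x --parameter y'
print(repr(fixed))    # '--parameter x --parameter y'
```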

Files
-----
  M  MonteCarloMarginalizeCode/Code/bin/util_RIFT_hyperpipe.py
The fix
-------
Earlier commits implemented tracer mode backwards: they skipped MARG on
intermediate iterations and kept MARG_PUFF, when the intent is the
opposite -- MARG should run every iteration so the tracer always has a
fresh all.marg_net to consume, and the MARG_PUFF lane (the redundant
evaluation of a separately-puffed grid) is what should be suppressed.

This commit:

  * create_eos_posterior_pipeline:
      - Deletes the _skip_marg_this_iter block. MARG_* nodes are added
        on every iteration.
      - Wraps both MARG_PUFF .sub-file writers (single-event and
        multi-event paths) in `if puff_args and not puff_uses_marg_net`
        so MARG_PUFF leaves no trace in tracer mode (.sub file or DAG
        node).
      - --tracer-only-marg / --tracer-final-marg-iterations kept as
        deprecated no-ops for back-compat. Help text updated.

  * bin/hyperpipe_conf.yaml:
      - Drops the arch.tracer-only-marg + arch.tracer-final-marg-iterations
        keys.

  * bin/util_RIFT_hyperpipe.py:
      - Drops the cmd_parts plumbing for the two deprecated flags. The
        driver no longer forwards --tracer-only-marg / --tracer-final-
        marg-iterations even if a user still sets them in yaml.

  * demo/hyperpipe/hyperpipe_conf_tracer.yaml:
      - Removes arch.tracer-only-marg + arch.tracer-final-marg-iterations.
      - Adds puff.input-source: marg_net (the actual tracer-pathway
        switch -- without this the pipeline falls back to legacy
        posterior wiring + MARG_PUFF lane).
      - Adds general.request-memory: 200 (toy demo footprint; the
        16 GB GW-PE default was way too heavy for a 3-D Gaussian).
      - Updates the example CLI to use the new --config PATH shim.
      - Rewrites the header comment to describe the actual workflow:
        MARG every iteration; tracer consumes all.marg_net.

Why the symptom went away
-------------------------
Before: "tracer-only-marg: skipping MARG_* nodes for iteration ..." was
printed every intermediate iteration, then MARG_PUFF nodes tried to
read grid_puff-{k}.dat -- which was never written because PUFF either
did not run (no --puff-input-source marg_net) or wrote elsewhere.

After: MARG runs every iteration -> all.marg_net is always populated ->
tracer consumes it -> writes grid-{k+1}.dat directly. No MARG_PUFF
anywhere.

Files
-----
  M  MonteCarloMarginalizeCode/Code/bin/create_eos_posterior_pipeline
  M  MonteCarloMarginalizeCode/Code/bin/hyperpipe_conf.yaml
  M  MonteCarloMarginalizeCode/Code/bin/util_RIFT_hyperpipe.py
  M  MonteCarloMarginalizeCode/Code/demo/hyperpipe/hyperpipe_conf_tracer.yaml
Why
---
PUFF.sub was being written with whatever the user passed on
--puff-exe, typically a bare command name like
"util_HyperparameterTracerUpdate.py". Condor on some test machines
does not search $PATH when launching local-universe jobs, so PUFF
fails to start while every other job in the DAG (MARG, MARG_PUFF,
EOS_POST_worker, JOIN_POST, TEST) runs fine -- those resolve their
executables via `which()` before handing them to dag_utils.

Fix
---
Mirror the eospost_exe resolution pattern: if opts.puff_exe is a
relative name, run `dag_utils.which` and store the absolute path in
exe_here. If it's already absolute, leave it alone. The OSG branch
is unchanged (still uses `which` for its own relevant_path).
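
The resolution rule can be sketched as below. shutil.which from the
standard library is used as a stand-in for dag_utils.which, whose exact
behaviour is not shown here; resolve_exe is a hypothetical name.

```python
import os
import shutil


def resolve_exe(name, which=shutil.which):
    """Return an absolute path for `name`, leaving absolute paths alone."""
    if os.path.isabs(name):
        return name                      # already absolute: pass through
    found = which(name)                  # search $PATH at DAG-generation time
    if found is None:
        raise FileNotFoundError("executable not found on PATH: " + name)
    return os.path.abspath(found)
```

Resolving at DAG-generation time means the .sub file carries a full
path, so the job no longer depends on how Condor handles $PATH.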

Files
-----
  M  MonteCarloMarginalizeCode/Code/bin/create_eos_posterior_pipeline
Why
---
util_HyperparameterTracerUpdate.py builds its internal prior_box from
the DATA bounding box (X.min, X.max) + 10% pad, NOT from the user's
prior. SMC-MALA / SMC-MALA-BD then clip particle proposals to that
prior_box. So with a narrow initial grid -- e.g. the demo's
blind_gaussian_3d_xy_plus.dat which seeds a corner of
[-5,-2]x[2,5]x[2,5] inside a [-7,7]^3 prior -- particles literally
cannot escape the padded bounding box of the initial grid.

Earlier project-folder Makefile tests (T1_init_grid.dat) worked only
because that seed grid already spanned the full [-7,7]^3 prior; the
"convex hull confinement" was masked by the broad seed.

Fix
---
In _build_puff_args, after emitting --force-away / --puff-factor /
--parameter via coord_spec.to_puff_args(), append one
--downselect-parameter NAME --downselect-parameter-range [LO,HI] pair
per fitting parameter, pulling (LO,HI) from coord_spec.parameter_ranges
(i.e. yaml post.coords-sample).

The tracer drop-ins (util_*TracerUpdate.py) read these to widen
prior_box from data-bbox to user-prior. The legacy puffball
(util_HyperparameterPuffball.py) accepts the same flags and uses them
as a post-puff downselect mask -- behaviour is byte-identical there as
long as the initial grid already lives inside coords-sample, which is
always the case.

Range values are emitted unquoted ("[-7,7]" not "'[-7,7]'") because
create_eos_posterior_pipeline wraps any [..] in single quotes when
staging args_puff.txt for Condor (lines 191-192 of that file).

Verification
------------
With a 1000-point seed grid confined to x:[-5,-2], y:[2,5], z:[2,5]
and yaml coords-sample x:[-7,7] y:[-7,7] z:[-7,7]:
  before this patch: prior_box = [-5.3,-1.7] x [1.7,5.3] x [1.7,5.3]
                     particles stuck inside seed-grid hull
  after  this patch: prior_box = [-7,7]^3
                     particles free to explore the full prior
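
The before/after numbers can be checked with a short sketch. The helper
names are hypothetical and the clipping step is a simplified stand-in
for what the samplers do; only the 10% pad and the ranges come from the
text above.

```python
import numpy as np


def data_bbox(X, pad_frac=0.10):
    """Bounding box of the data, padded by pad_frac of the width per side."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    pad = pad_frac * (hi - lo)
    return np.column_stack([lo - pad, hi + pad])


def prior_box(X, user_ranges=None):
    """Use the user prior when supplied, else fall back to the padded bbox."""
    if user_ranges is not None:
        return np.asarray(user_ranges, dtype=float)
    return data_bbox(X)


def clip_to_box(particles, box):
    """Clip particle proposals to the box, one (lo, hi) row per dimension."""
    return np.clip(particles, box[:, 0], box[:, 1])


seed = np.array([[-5.0, 2.0, 2.0], [-2.0, 5.0, 5.0]])   # corners of the seed grid
print(prior_box(seed))                    # rows: [-5.3,-1.7], [1.7,5.3], [1.7,5.3]
print(prior_box(seed, [[-7, 7]] * 3))     # rows: [-7, 7] in each dimension
```

A proposal at the origin is clipped onto the face of the padded box in
the first case and left untouched in the second.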

Files
-----
  M  MonteCarloMarginalizeCode/Code/bin/util_RIFT_hyperpipe.py
…est dirs

Bug
---
In the directory-creation loop the outer iteration variable `indx` was
being reassigned to 0 inside the body and reused as the per-event
counter:

    for indx in np.arange(it_start, opts.n_iterations+1):
        ...
        indx = 0                                  # CLOBBER
        for event in opts.event_file:
            mkdir(ile_dir + "/event_{}".format(indx))
            indx += 1
        ...
        if opts.test_args:
            test_dir = ".../iteration_" + str(indx) + "_test"   # WRONG
            mkdir(test_dir); mkdir(test_dir+'/logs')

By the time test_dir is built, indx == len(opts.event_file) (typically
1 for the hyperpipe demo). So instead of getting iteration_0_test ...
iteration_N_test, the pipeline creates iteration_1_test once per outer
iteration (which is a no-op after the first), and the test nodes for
iterations other than 1 land in directories that do not exist. With
test_args wired in tracer mode the convergence-test jobs then either
fail to start (initialdir missing) or write log files into a non-
existent path.

The outer-loop control flow itself was unaffected, because
`for indx in np.arange(...)` reassigns indx at the top of each
iteration regardless.

Fix
---
Rename the outer loop variable to `it_indx` (the actual semantic) and
the per-event counter to `event_indx` via enumerate(). Test directory
naming now uses it_indx, matching the iteration-numbered _marg / _post
/ _con dirs.
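
The shape of the fixed loop, reduced to the naming logic. This sketch
only lists the directories instead of creating them; the function name
and the exact directory strings are illustrative, not copied from
create_eos_posterior_pipeline.

```python
def planned_directories(it_start, n_iterations, event_files, with_test=True):
    """List the per-iteration directories the loop would create."""
    dirs = []
    for it_indx in range(it_start, n_iterations + 1):
        base = "iteration_{}".format(it_indx)
        # enumerate() gives the per-event counter its own name, so the
        # outer iteration index is never overwritten.
        for event_indx, _event in enumerate(event_files):
            dirs.append("{}_marg/event_{}".format(base, event_indx))
        if with_test:
            dirs.append(base + "_test")    # uses it_indx, not the event counter
    return dirs


print(planned_directories(0, 2, ["ev0.dat"]))
```

Each iteration now gets its own _test directory, instead of every
iteration mapping to the one named after the event count.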

Files
-----
  M  MonteCarloMarginalizeCode/Code/bin/create_eos_posterior_pipeline
Same fix as commit 09 (PUFF) applied to the convergence-test job:
test.sub was being written with whatever the user passed on --test-exe,
typically the bare command name "convergence_test_samples". Condor on
some test machines does not search $PATH for local-universe jobs, so
test jobs fail to start while MARG / EOS_POST / PUFF all run fine.

Mirror the eospost_exe pattern: if opts.test_exe is a relative name,
resolve via dag_utils.which() and pass the absolute path to
write_test_sub. Absolute paths pass through untouched.

Files
-----
  M  MonteCarloMarginalizeCode/Code/bin/create_eos_posterior_pipeline
Two breakages
-------------
1. `make rundir` ran `util_RIFT_hyperpipe.py` with no `--config`, so
   Hydra loaded the install-shipped bin/hyperpipe_conf.yaml -- which has
   `init.file: null` and `general.rundir: null`, so
   `RIFT.hyperpipe.config.validate_config` rejects it before any DAG is
   written. The user's local demo yaml was never read.

2. The local demo `hyperpipe_conf.yaml` had two keys with spaces instead
   of hyphens:
     - arch.explode marg jobs   -> arch.explode-marg-jobs
     - puff.settings.puff factor -> puff.puff-factor
                                    (also at the wrong nesting level;
                                    `puff-factor` is a top-level puff field
                                    consumed by coord_spec.to_puff_args(),
                                    not a puff.settings key.)

   So even if (1) were fixed, `explode-marg-jobs` and `puff-factor` would
   silently default. The remaining defaults are fine for the toy demo.

Fix
---
  * Makefile: `rundir` now invokes `util_RIFT_hyperpipe.py --config
    hyperpipe_conf.yaml`, matching the `rundir_tracer` target style.
  * demo hyperpipe_conf.yaml: normalised indentation, renamed the
    spaces-in-keys, moved puff-factor / force-away to the canonical
    top-level puff: location, dropped puff.settings (which was the only
    survivor and was the wrong nesting in the first place), added
    `general.request-memory: 200` (the GW-PE-tuned 16 GB default is
    overkill for the 3-D Gaussian toy and was forcing every MARG node
    into 16 GB Condor slots), added a brief header comment.

Files
-----
  M  MonteCarloMarginalizeCode/Code/demo/hyperpipe/Makefile
  M  MonteCarloMarginalizeCode/Code/demo/hyperpipe/hyperpipe_conf.yaml
@oshaughn
Owner Author

Tested on cardassia. Largely orthogonal to any production code.

@oshaughn oshaughn merged commit 8224382 into oshaughn:rift_O4d May 15, 2026
8 of 9 checks passed