GitHub - enlorenz/powersched

powersched

python ./checkenv.py - Check the environment.

python ./testenv.py - Short training run.

python ./train.py - Open-ended training run (runs until stopped) that writes TensorBoard logs and saves intermediate models.

python ./train_iter.py - Sequentially launch ./train.py with different weights.

./train.py accepts a --render argument with the value "human" or "none" ("none" is the default). "none" trains silently, while "human" runs intentionally slower and adds debug output plus a graph after each episode.
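As an illustration of the option described above, here is a minimal argparse sketch. It is an assumption about how such a flag could be declared, not a copy of what ./train.py actually does; the function name parse_args and the help text are made up for this example.

```python
import argparse


def parse_args(argv=None):
    """Parse a --render option restricted to "human" or "none" (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Sketch of the --render option")
    parser.add_argument(
        "--render",
        choices=["human", "none"],  # the two values named in this README
        default="none",             # silent training unless asked otherwise
        help='"none" trains silently; "human" is slower and shows per-episode output',
    )
    return parser.parse_args(argv)
```

With this sketch, parse_args([]) yields render == "none", and parse_args(["--render", "human"]) selects the slower, more verbose mode.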

Curriculum Training

The current training setup is intentionally curriculum-based. The target behavior is not merely "use fewer nodes" or "wait longer"; it is the more specific policy:

execute little work during expensive hours,
defer safely while cheap hours are still ahead,
then clear backlog aggressively during cheap hours,
while keeping overdue backlog and job loss near zero.
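
To make the target behavior concrete, the four points above can be written as a hand-coded rule. This is only a sketch under stated assumptions: the function name, its arguments, the "one backlog unit needs one node for one hour" simplification, and the use of the window median as the cheap/expensive cutoff are all hypothetical and are not taken from the environment code.

```python
from statistics import median


def target_nodes(price_now, forecast_24h, backlog, overdue, max_nodes):
    """Return how many nodes to run this hour under a defer-then-clear rule.

    Hypothetical simplification: each backlog unit occupies one node for one hour.
    """
    if backlog <= 0:
        return 0  # nothing to serve

    if overdue > 0:
        # Overdue work is served regardless of price to keep job loss near zero.
        return min(max_nodes, backlog)

    # Assumed definition of "cheap": at or below the median of the current
    # price together with the forecast window.
    cutoff = median([price_now, *forecast_24h])

    if price_now <= cutoff:
        # Cheap hour: clear backlog aggressively.
        return min(max_nodes, backlog)

    # Expensive hour: since price_now is above the median, at least one
    # forecast hour is at or below the cutoff, so deferring is safe here.
    return 0
```

A learned policy is expected to be less binary than this rule; the sketch only fixes the direction of each decision.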

This is now encoded directly in the environment and reward design:

the agent sees a 24h price forecast window,
cheap-hour execution is rewarded and expensive-hour execution is penalized,
cheap hours penalize under-service when backlog exists,
overdue backlog after the 24h grace period becomes intrinsically bad,
and end-of-episode pending and overdue metrics make "saving money by not serving work" visible.
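
The per-step reward terms listed above could be combined roughly as follows. This is an illustrative sketch, not the repository's reward function: the name shaped_reward, the weights, and the linear form of each term are assumptions chosen only to show the sign of each contribution.

```python
def shaped_reward(executed, backlog, overdue, is_cheap, max_nodes,
                  w_exec=1.0, w_under=0.5, w_overdue=2.0):
    """Combine the README's reward terms for one hour (hypothetical weights).

    executed:  units of work run this hour
    backlog:   units waiting at the start of the hour
    overdue:   units already past the 24h grace period
    is_cheap:  whether this hour counts as a cheap hour
    """
    reward = 0.0

    # Cheap-hour execution is rewarded, expensive-hour execution is penalized.
    reward += w_exec * executed if is_cheap else -w_exec * executed

    # Cheap hours penalize under-service when backlog exists.
    if is_cheap and backlog > 0:
        servable = min(backlog, max_nodes)
        reward -= w_under * max(servable - executed, 0)

    # Overdue backlog is bad regardless of the current price.
    reward -= w_overdue * overdue

    return reward
```

Under these assumed weights, idling in a cheap hour with four servable units scores -2.0, while serving all four scores 4.0, so "saving money by not serving work" is never the better choice in a cheap hour.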

The practical reason for using a curriculum instead of only training longer is that the full problem has several easy but wrong local optima:

serve immediately and ignore price timing,
trickle a small amount of work continuously,
or over-defer until backlog becomes unstable.

Those behaviors can produce tolerable short-horizon rewards, so simply running PPO for more steps does not guarantee discovery of the desired defer-then-clear policy. The curriculum reduces variance and improves credit assignment by first teaching the core phase behavior under deterministic logic prices and only then adding load, burstiness, realistic arrivals, price noise, and finally real prices.

Current intended sequence:

Stage A: flat arrivals + logic prices.
Stage B: high-load flat arrivals + logic prices.
Stage C: expensive-half-heavy or bursty arrivals + logic prices.
Stage D: main arrivals + logic prices.
Stage E: main arrivals + noisy logic prices.
Stage F: main arrivals + real prices.
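
The stage sequence above can be expressed as data plus a small driver loop. This is a sketch only: the Stage class, the CURRICULUM list, and the train_stage callback are hypothetical names for illustration, and the sketch says nothing about how the repository actually switches stages or when a stage is considered finished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    arrivals: str  # arrival pattern used in this stage
    prices: str    # price source used in this stage


# Stage order as listed in this README.
CURRICULUM = [
    Stage("A", "flat", "logic"),
    Stage("B", "high-load flat", "logic"),
    Stage("C", "expensive-half-heavy or bursty", "logic"),
    Stage("D", "main", "logic"),
    Stage("E", "main", "noisy logic"),
    Stage("F", "main", "real"),
]


def run_curriculum(stages, train_stage):
    """Train the stages in order, passing each stage's model to the next.

    train_stage(stage, model) is a caller-supplied (hypothetical) function that
    continues training from `model` (None for the first stage) and returns the
    updated model.
    """
    model = None
    for stage in stages:
        model = train_stage(stage, model)
    return model
```

Carrying the model from one stage to the next is the point of the design: each stage starts from a policy that already shows the phase behavior learned under the simpler conditions.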

In short: more steps on the full problem mostly improve whatever basin the optimizer already occupies; the curriculum is meant to make the correct basin discoverable first.

For a more formal write-up, see analysis/curriculum_argument.md.

