LR scheduler progress can be inconsistent with dynamic batching in Megatron actor training

## Summary

There appears to be a mismatch between how the LR scheduler budget is computed and how it is consumed in the Megatron actor training path when dynamic batching is enabled.

The scheduler budget appears to be computed from nominal/static batch semantics, for example based on `rollout_batch_size`, `ppo_epochs`, `per_device_train_batch_size`, and `gradient_accumulation_steps`.

However, with `use_dynamic_batching_in_train: true`, the actor train batch is first split into token-bounded dynamic micro-batches. These micro-batches are then grouped by `gradient_accumulation_steps`, so the number of scheduler-consuming optimizer updates can scale roughly as `ceil(num_dynamic_micro_batches / gradient_accumulation_steps)`, rather than being determined only by the nominal sample-level batch formula.
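To make the gap concrete, here is a minimal sketch of the two counts for one data-parallel rank. It is not the actual implementation: the greedy packing rule, the helper names, the nominal formula, and the sequence lengths are all assumptions chosen for illustration.

```python
import math

def pack_by_token_budget(seq_lens, max_tokens):
    """Greedily pack sequences into micro-batches of at most max_tokens tokens.

    Hypothetical packing rule; a sequence longer than max_tokens gets its own micro-batch.
    """
    micro_batches, current, current_tokens = [], [], 0
    for n in seq_lens:
        if current and current_tokens + n > max_tokens:
            micro_batches.append(current)
            current, current_tokens = [], 0
        current.append(n)
        current_tokens += n
    if current:
        micro_batches.append(current)
    return micro_batches

def nominal_updates(num_samples, per_device_train_batch_size, gradient_accumulation_steps):
    """Optimizer updates expected under static, sample-level batch semantics (assumed formula)."""
    return math.ceil(num_samples / (per_device_train_batch_size * gradient_accumulation_steps))

def actual_updates(num_micro_batches, gradient_accumulation_steps):
    """Optimizer updates after grouping dynamic micro-batches by gradient accumulation."""
    return math.ceil(num_micro_batches / gradient_accumulation_steps)

# Illustrative numbers only: 16 long-context samples of 6000 tokens each.
seq_lens = [6000] * 16
per_device_train_batch_size = 4
gradient_accumulation_steps = 2

nominal = nominal_updates(len(seq_lens), per_device_train_batch_size, gradient_accumulation_steps)

# Sweep the token budget while keeping the logical batch configuration fixed.
updates_by_budget = {}
for max_tokens in (8192, 16384, 32768):
    micro_batches = pack_by_token_budget(seq_lens, max_tokens)
    updates_by_budget[max_tokens] = actual_updates(len(micro_batches), gradient_accumulation_steps)

print("nominal updates:", nominal)
print("actual updates by token budget:", updates_by_budget)
```

In this toy setup the nominal count stays fixed while the actual count changes with the token budget alone, which is the dependence described above.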

As a result, changing dynamic batching behavior may change how fast the LR scheduler consumes its step budget, even when the intended logical training batch semantics are unchanged.

## Potential issue

The code seems to mix two notions of training progress:

- scheduler budget: nominal/static sample-level batch progress
- scheduler consumption: actual optimizer update count after token-level dynamic batching and gradient-accumulation grouping

This can cause the LR schedule to progress faster than intended when dynamic batching produces more actual optimizer updates than the nominal static-batch formula expects.

In one long-context agentic RL run, I observed the saved scheduler state progressing far beyond the configured decay horizon: `num_steps = 8429` against `lr_decay_steps = 2400`, roughly 3.5 times the horizon.

This example is only meant as a symptom. The underlying concern is more general: it may be unexpected for LR scheduler progress to depend on token-level dynamic batching decisions if the logical training batch configuration is unchanged.

## Expected behavior

Changing `max_tokens_per_microbatch_in_train` should affect memory usage and internal batching, but it should not silently accelerate LR decay relative to the nominal training budget.

The LR scheduler should either explicitly follow the actual optimizer update count after dynamic batching, or follow the nominal/static training progress implied by the user configuration.

If the second behavior is intended, then scheduler progress may need to be normalized when dynamic batching changes the actual number of optimizer updates.

## Possible direction

One possible fix direction is to compute, for each actor train batch:

- `nominal_steps`: expected optimizer update count under static batch semantics
- `actual_steps`: actual scheduler-consuming optimizer update count after dynamic batching
- `scheduler_step_increment = nominal_steps / actual_steps`

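Then the scheduler can advance by `scheduler_step_increment` on each actual optimizer update, rather than by one full step. Summed over the `actual_steps` updates of one train batch, the scheduler advances by exactly `nominal_steps`.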

The exact implementation does not have to use this API literally. The main point is to decouple LR scheduler progress from token-level dynamic batching when the logical training progress is unchanged.
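A minimal sketch of this normalization, assuming a plain linear decay and a fractional progress counter. The class name, its methods, and the numbers are hypothetical and do not correspond to the existing Megatron scheduler API.

```python
class NormalizedSchedulerProgress:
    """Toy LR schedule whose progress advances in fractional steps.

    Assumes linear decay from max_lr to min_lr over lr_decay_steps; the real
    scheduler may use a different decay style and warmup.
    """

    def __init__(self, lr_decay_steps, max_lr, min_lr=0.0):
        self.lr_decay_steps = lr_decay_steps
        self.max_lr = max_lr
        self.min_lr = min_lr
        self.progress = 0.0  # measured in nominal steps

    def advance(self, nominal_steps, actual_steps):
        """Call once per actual optimizer update within a train batch."""
        self.progress += nominal_steps / actual_steps

    def lr(self):
        frac = min(self.progress / self.lr_decay_steps, 1.0)
        return self.max_lr - (self.max_lr - self.min_lr) * frac


# Illustrative numbers only: each train batch nominally costs 2 steps,
# but dynamic batching produces 8 actual optimizer updates.
nominal_steps, actual_steps = 2, 8
sched = NormalizedSchedulerProgress(lr_decay_steps=2400, max_lr=1e-6)

naive_steps = 0  # what a one-step-per-update scheduler would have consumed
for _train_batch in range(3):
    for _update in range(actual_steps):
        sched.advance(nominal_steps, actual_steps)
        naive_steps += 1

print("normalized progress:", sched.progress)
print("naive progress:", naive_steps)
```

With these numbers the normalized counter advances by the nominal 2 steps per train batch, while a one-step-per-update scheduler consumes 8. A real fix would also need to decide how fractional progress is stored in and restored from the scheduler checkpoint.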

## Questions

Could you clarify the intended LR scheduler semantics for Megatron actor training with dynamic batching?

1. Should the scheduler follow the actual optimizer update count after dynamic batching?
2. Or should it follow the nominal/static training progress from the configured batch semantics?
3. If the latter is intended, would normalizing scheduler progress under dynamic batching be an acceptable fix direction?
