[Enhancement] Reduce pull/dispatch path allocation via primitive arrays, ThreadLocal reuse, and lambda elimination #10525

### Before Creating the Enhancement Request

- [x] I have confirmed that this should be classified as an enhancement rather than a bug/feature.

### Summary

Reduce allocation in the pull/dispatch path by replacing boxed collections with primitive arrays, reusing DispatchRequest via ThreadLocal, merging mapped file slices, and eliminating CompletableFuture callback lambdas.

### Motivation

JFR profiling on the broker pull/dispatch path reveals several per-message allocation hotspots:

1. **`GetMessageResult`** stores message offsets as `List<Long>`, boxing each `long` into a `Long` object. Under high pull QPS, this creates thousands of short-lived `Long` objects per second, plus `ArrayList` resize overhead.

2. **`DispatchRequest`** — a new `DispatchRequest` object is created for every message dispatched to ConsumeQueue/IndexService/TimerWheel. The object has mutable fields that could be reset and reused via ThreadLocal.

3. **`DefaultMappedFile.selectMappedBuffer`** — creates two separate `ByteBuffer` slices for position+size, then wraps them. Can be merged into a single slice operation.

4. **`DefaultMessageStore.putMessage/putMessages`** — wraps `asyncPutMessage` result in a `thenAccept` lambda callback for stats logging. The lambda captures `this` and `beginTime`, creating a closure object per message.
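
To make the fourth hotspot concrete, here is a minimal sketch of the two shapes. `PutStatsSketch`, `asyncPut`, and `recordStats` are hypothetical stand-ins, not the real `DefaultMessageStore` code, and the put completes synchronously here only to keep the example small. In the callback style, each call creates a closure holding `this` and `beginTime`, and `thenAccept` additionally returns a new dependent `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical miniature of a store; NOT the real DefaultMessageStore. */
final class PutStatsSketch {
    private final LongAdder recorded = new LongAdder();
    private final LongAdder failed = new LongAdder();
    private final LongAdder elapsedNanos = new LongAdder();

    /** Stand-in for asyncPutMessage; completes immediately in this sketch. */
    private CompletableFuture<Boolean> asyncPut(byte[] body) {
        return CompletableFuture.completedFuture(body.length > 0);
    }

    private void recordStats(long beginTime, boolean ok) {
        recorded.increment();
        if (!ok) {
            failed.increment();
        }
        elapsedNanos.add(System.nanoTime() - beginTime);
    }

    /** Callback style: the lambda captures `this` and `beginTime` on every call. */
    CompletableFuture<Boolean> putWithCallback(byte[] body) {
        long beginTime = System.nanoTime();
        CompletableFuture<Boolean> future = asyncPut(body);
        future.thenAccept(ok -> recordStats(beginTime, ok));
        return future;
    }

    /** Inlined style: stats are recorded at the point where the result is produced. */
    CompletableFuture<Boolean> putInline(byte[] body) {
        long beginTime = System.nanoTime();
        boolean ok = body.length > 0;
        recordStats(beginTime, ok);
        return CompletableFuture.completedFuture(ok);
    }

    long recordedCount() {
        return recorded.sum();
    }

    public static void main(String[] args) {
        PutStatsSketch store = new PutStatsSketch();
        store.putWithCallback(new byte[] {1});
        store.putInline(new byte[] {1});
        System.out.println("recorded puts: " + store.recordedCount());
    }
}
```

Both paths record the same stats; only the first allocates a closure and a dependent future per message.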

### Describe the Solution You'd Like

1. `GetMessageResult`: replace `List<Long>` with `long[]` + add `addQueueOffset(long)` method. Right-size the initial capacity via a constructor parameter.
2. `DispatchRequest`: change `final` fields to mutable + add `reset()` method for ThreadLocal reuse.
3. `DefaultMappedFile`: merge dual-slice into single `selectMappedBuffer` operation with cached append slice.
4. `DefaultMessageStore`: remove `thenAccept` callback, inline stats logging into `CommitLog` or caller.
5. `ConsumeQueue`: make `topicQueueKey` a `final` field to avoid per-call computation.
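
As a rough illustration of item 1, the sketch below stores offsets in a growable `long[]` with a caller-supplied initial capacity. `QueueOffsetBuffer` and its accessors are hypothetical names; only `addQueueOffset(long)` comes from the proposal, and the real `GetMessageResult` may be structured differently.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the offset storage inside GetMessageResult. */
final class QueueOffsetBuffer {
    private long[] offsets;
    private int size;

    /** expectedCount lets the caller right-size the array, e.g. to the pull batch size. */
    QueueOffsetBuffer(int expectedCount) {
        this.offsets = new long[Math.max(1, expectedCount)];
    }

    /** Appends one offset without boxing; doubles the array only when it is full. */
    void addQueueOffset(long offset) {
        if (size == offsets.length) {
            offsets = Arrays.copyOf(offsets, size * 2);
        }
        offsets[size++] = offset;
    }

    int size() {
        return size;
    }

    long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return offsets[index];
    }

    public static void main(String[] args) {
        QueueOffsetBuffer buffer = new QueueOffsetBuffer(2);
        for (long offset = 100; offset < 103; offset++) {
            buffer.addQueueOffset(offset);
        }
        System.out.println("stored offsets: " + buffer.size());
    }
}
```

When the constructor argument matches the number of messages actually returned, the array is allocated once and never resized.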

### Describe Alternatives You've Considered

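- Use `LongAdder` instead of `long[]` for offsets. Not applicable: `LongAdder` only accumulates a running sum, while the offsets must be kept individually and in order.
- Keep the `thenAccept` callback but replace the lambda with a method reference. A bound instance method reference still captures `this`, and a non-capturing static reference has no access to `beginTime`, so the allocation is not eliminated.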
- Use object pool instead of ThreadLocal for DispatchRequest — ThreadLocal is simpler and sufficient for single-threaded dispatch.
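
The ThreadLocal reuse pattern preferred above could look roughly like this. `ReusableDispatchRequest`, `acquire`, and the four fields are hypothetical and heavily trimmed; the real `DispatchRequest` carries more state. The sketch assumes, as the issue does, that dispatch for a given request runs entirely on one thread.

```java
/** Hypothetical, trimmed-down stand-in for DispatchRequest. */
final class ReusableDispatchRequest {
    // Fields are mutable so one instance can be refilled for each message.
    String topic;
    int queueId;
    long commitLogOffset;
    int msgSize;

    // One cached instance per dispatching thread.
    private static final ThreadLocal<ReusableDispatchRequest> CACHE =
        ThreadLocal.withInitial(ReusableDispatchRequest::new);

    /** Returns this thread's instance, cleared and refilled for the next message. */
    static ReusableDispatchRequest acquire(String topic, int queueId,
                                           long commitLogOffset, int msgSize) {
        ReusableDispatchRequest request = CACHE.get();
        request.reset();
        request.topic = topic;
        request.queueId = queueId;
        request.commitLogOffset = commitLogOffset;
        request.msgSize = msgSize;
        return request;
    }

    /** Clears every field so no state leaks from the previous message. */
    void reset() {
        topic = null;
        queueId = 0;
        commitLogOffset = 0L;
        msgSize = 0;
    }

    public static void main(String[] args) {
        ReusableDispatchRequest first = acquire("TopicA", 0, 100L, 32);
        ReusableDispatchRequest second = acquire("TopicB", 1, 200L, 64);
        System.out.println("same instance reused: " + (first == second));
    }
}
```

The trade-off is that the returned object must not escape the dispatch call: any consumer that stores it or hands it to another thread would see it overwritten by the next `acquire`.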

### Additional Context

Part of a larger JFR-driven optimization effort. Related PRs: #10443, #10444, #10514, #10524.

