[Enhancement] Reduce pull/dispatch path allocation via primitive arrays, ThreadLocal reuse, and lambda elimination #10525

Description

@wang-jiahua

Before Creating the Enhancement Request

  • I have confirmed that this should be classified as an enhancement rather than a bug/feature.

Summary

Reduce allocation in the pull/dispatch path by replacing boxed collections with primitive arrays, reusing DispatchRequest via ThreadLocal, merging mapped file slices, and eliminating CompletableFuture callback lambdas.

Motivation

JFR profiling on the broker pull/dispatch path reveals several per-message allocation hotspots:

  1. GetMessageResult: stores message offsets as List<Long>, boxing every long into a Long object. Under high pull QPS, this creates thousands of short-lived Long objects per second, plus ArrayList resize overhead.

  2. DispatchRequest: a new DispatchRequest object is created for every message dispatched to ConsumeQueue/IndexService/TimerWheel. Its fields could be made mutable so that one instance is reset and reused via ThreadLocal.

  3. DefaultMappedFile.selectMappedBuffer: creates two separate ByteBuffer slices to apply the position and the size, then wraps the result. The two slices can be merged into a single slice operation.

  4. DefaultMessageStore.putMessage/putMessages: wraps the asyncPutMessage result in a thenAccept lambda callback for stats logging. The lambda captures this and beginTime, creating a closure object per message.
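
As a rough illustration of hotspot 4, the sketch below contrasts a capturing thenAccept callback with inlined stats recording. The class `PutStatsSketch`, its methods, and the 500 ms threshold are hypothetical stand-ins, not RocketMQ code; it assumes a synchronous caller that waits for the result anyway.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a message store; names are illustrative only.
final class PutStatsSketch {
    final AtomicLong totalPuts = new AtomicLong();
    final AtomicLong slowPuts = new AtomicLong();

    // Stand-in for asyncPutMessage: completes immediately with a status code.
    CompletableFuture<Integer> asyncPut(byte[] body) {
        return CompletableFuture.completedFuture(body.length > 0 ? 0 : 1);
    }

    void recordStats(long beginTime) {
        totalPuts.incrementAndGet();
        long elapsedMs = (System.nanoTime() - beginTime) / 1_000_000;
        if (elapsedMs > 500) { // assumed threshold, for illustration
            slowPuts.incrementAndGet();
        }
    }

    // Callback style: the lambda captures `this` and `beginTime`, so each call
    // allocates a closure object and a dependent future.
    int putWithCallback(byte[] body) {
        long beginTime = System.nanoTime();
        CompletableFuture<Integer> future = asyncPut(body);
        future.thenAccept(status -> recordStats(beginTime));
        return future.join();
    }

    // Inlined style: the synchronous caller waits for the result and records
    // stats with a plain method call, so no closure is created.
    int putInlined(byte[] body) {
        long beginTime = System.nanoTime();
        int status = asyncPut(body).join();
        recordStats(beginTime);
        return status;
    }
}
```

Both methods record the same statistics here; the inlined form only works where the caller already blocks on the result, so a fully asynchronous path would need the recording moved to wherever the put completes.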

Describe the Solution You'd Like

  1. GetMessageResult: replace List<Long> with long[] + add addQueueOffset(long) method. Right-size initial capacity with constructor parameter.
  2. DispatchRequest: change final fields to mutable + add reset() method for ThreadLocal reuse.
  3. DefaultMappedFile: merge dual-slice into single selectMappedBuffer operation with cached append slice.
  4. DefaultMessageStore: remove thenAccept callback, inline stats logging into CommitLog or caller.
  5. ConsumeQueue: make topicQueueKey a final field to avoid per-call computation.
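
Item 1 could look roughly like the following. This is a minimal sketch of a long[]-backed offset list with a right-sized initial capacity; the class name `QueueOffsetBuffer` and everything except the `addQueueOffset(long)` method name are assumptions, not the actual GetMessageResult change.

```java
import java.util.Arrays;

// Hypothetical long[]-backed replacement for a List<Long> of queue offsets.
final class QueueOffsetBuffer {
    private long[] offsets;
    private int size;

    // The caller passes the expected message count so the array rarely grows.
    QueueOffsetBuffer(int expectedCount) {
        this.offsets = new long[Math.max(expectedCount, 1)];
    }

    // Appends without boxing; doubles the backing array only when it is full.
    void addQueueOffset(long offset) {
        if (size == offsets.length) {
            offsets = Arrays.copyOf(offsets, size * 2);
        }
        offsets[size++] = offset;
    }

    int size() {
        return size;
    }

    long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return offsets[index];
    }

    // Returns a trimmed copy, preserving insertion order.
    long[] toArray() {
        return Arrays.copyOf(offsets, size);
    }
}
```

Insertion order is preserved, which is the property the offsets need, and no Long objects are created on the append path.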

Describe Alternatives You've Considered

  • Use LongAdder instead of long[] for offsets: not applicable, because the offsets must keep their order and LongAdder only accumulates a sum.
  • Keep the thenAccept callback but use a static method reference: the callback still needs this and beginTime, so a capturing object is still allocated per message.
  • Use an object pool instead of ThreadLocal for DispatchRequest: ThreadLocal is simpler and sufficient for single-threaded dispatch.
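
The ThreadLocal reuse pattern preferred over an object pool can be sketched as below. The class `ReusableDispatch` and its four fields are hypothetical and much smaller than the real DispatchRequest; only the idea of mutable fields plus a reset method comes from the proposal.

```java
// Hypothetical, simplified stand-in for a reusable dispatch request.
final class ReusableDispatch {
    // One instance per thread; safe only because each dispatch thread
    // finishes with the object before asking for it again.
    private static final ThreadLocal<ReusableDispatch> CACHE =
            ThreadLocal.withInitial(ReusableDispatch::new);

    // Fields are mutable so the same instance can describe successive messages.
    String topic;
    int queueId;
    long commitLogOffset;
    int msgSize;

    // Overwrites every field so no state leaks from the previous message.
    ReusableDispatch reset(String topic, int queueId, long commitLogOffset, int msgSize) {
        this.topic = topic;
        this.queueId = queueId;
        this.commitLogOffset = commitLogOffset;
        this.msgSize = msgSize;
        return this;
    }

    // Returns the calling thread's instance, re-initialized for a new message.
    static ReusableDispatch acquire(String topic, int queueId, long commitLogOffset, int msgSize) {
        return CACHE.get().reset(topic, queueId, commitLogOffset, msgSize);
    }
}
```

The trade-off is that consumers of the request must not keep a reference past the dispatch call, since the next `acquire` on the same thread overwrites the fields in place.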

Additional Context

Part of a larger JFR-driven optimization effort. Related PRs: #10443, #10444, #10514, #10524.
