Skip to content

[ISSUE #10512] Eliminate per-RPC allocation in RemotingCommand (Guava Stopwatch, Constructor copy) and downgrade Netty writability log#10514

Open
wang-jiahua wants to merge 1 commit into
apache:developfrom
wang-jiahua:perf/remoting-command-microfixes
Open

[ISSUE #10512] Eliminate per-RPC allocation in RemotingCommand (Guava Stopwatch, Constructor copy) and downgrade Netty writability log#10514
wang-jiahua wants to merge 1 commit into
apache:developfrom
wang-jiahua:perf/remoting-command-microfixes

Conversation

@wang-jiahua

@wang-jiahua wang-jiahua commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Which Issue(s) This PR Fixes

Fixes #10512

Brief Description

Four independent per-RPC micro-allocations eliminated in the remoting framework:

  1. Guava Stopwatch → System.nanoTime()RemotingCommand used Stopwatch.createStarted() which allocates a new object per RPC. Replaced with long processTimerNanos + processTimerElapsedMs().
  2. Constructor cacheClass.getDeclaredConstructor() copies the Constructor object on every call. Added ConcurrentHashMap<Class<?>, Constructor<?>> HEADER_CTOR_CACHE with computeIfAbsent.
  3. Netty writability log downgradechannelWritabilityChanged logged at INFO/WARN ~900 lines/sec. Downgraded to DEBUG with isDebugEnabled() guard.
  4. TopicQueueMappingContext.EMPTY singleton — Non-static-topic messages (>99%) created empty objects. Added public static final EMPTY singleton.

How Did You Test This Change?

  1. Full CI passes (maven-compile + bazel-compile on all JDK versions).
  2. All broker processor call sites updated from getProcessTimer() to processTimerElapsedMs().
  3. Commercial compatibility verified — YitianRemotingCommand uses different method names, not affected.

Copilot AI review requested due to automatic review settings June 15, 2026 03:21

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR optimizes remoting command processing by reducing allocation overhead (Stopwatch removal, constructor caching, smaller HashMap defaults) and adds a faster header encoding path while also lowering channel writability log verbosity.

Changes:

  • Replace Guava Stopwatch timing with a System.nanoTime()-based timestamp on RemotingCommand.
  • Cache CommandCustomHeader no-arg constructors and add fastEncodeHeaderAsBuffer.
  • Reduce per-command allocations (smaller extFields default capacity) and downgrade some writability logs to debug.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File Description
remoting/src/main/java/org/apache/rocketmq/remoting/protocol/statictopic/TopicQueueMappingContext.java Adds a shared EMPTY instance for “no context” usage.
remoting/src/main/java/org/apache/rocketmq/remoting/protocol/RemotingCommand.java Constructor caching, new fast header encoding method, Stopwatch removal, extFields capacity change.
remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRemotingServer.java Lowers channel writability logs to debug and guards formatting work.
remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyDecoder.java Switches decode timing from Stopwatch to nanoTime start timestamp.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread remoting/src/main/java/org/apache/rocketmq/remoting/protocol/RemotingCommand.java Outdated

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot

Summary

Eliminates per-RPC allocations in the remoting framework: replaces Guava Stopwatch with System.nanoTime(), caches Constructor lookups, downgrades Netty writability log to DEBUG, and adds TopicQueueMappingContext.EMPTY singleton.

Findings

  • [Info] RemotingCommand.javaHEADER_CTOR_CACHE uses ConcurrentHashMap.computeIfAbsent(). The mapping function calls ctor.newInstance() which is reflective and could throw. Consider wrapping the computeIfAbsent lambda with a try-catch to avoid swallowing checked exceptions silently, or use get() with a null check + putIfAbsent() pattern for clearer error propagation.
  • [Info] RemotingCommand.javaprocessTimerNanos defaults to 0. If setProcessTimerNanos() is never called (e.g., code paths that bypass NettyDecoder), processTimerElapsedMs() returns 0. This is likely intentional but worth a brief comment documenting the contract.
  • [Info] NettyRemotingServer.java — Good catch on the log amplification (~900 lines/sec). The isDebugEnabled() guard is correct and avoids unnecessary string formatting.

Suggestions

  • Consider adding a unit test for newHeaderInstance() to verify the constructor cache works correctly for subclasses with different constructor signatures.
  • The EMPTY singleton in TopicQueueMappingContext is a good optimization. Ensure no code path mutates the singleton (e.g., call setTopic() on it). If immutability cannot be guaranteed, consider returning a defensive copy or making the fields final.

Overall: Clean micro-optimizations with measurable impact on high-throughput paths. 👍


Automated review by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-command-microfixes branch from cb09268 to 0d43e12 Compare June 15, 2026 09:49

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot

Summary

Four independent per-RPC micro-optimizations in the remoting framework:

  1. Guava StopwatchSystem.nanoTime() (eliminates per-RPC allocation)
  2. Constructor cache for Class.getDeclaredConstructor() (eliminates ~237 alloc events/60s)
  3. Netty writability log downgrade INFO/WARN → DEBUG (eliminates ~900 lines/sec)
  4. TopicQueueMappingContext.EMPTY singleton (avoids per-message allocation for non-static-topic messages)

Findings

  • [Positive] RemotingCommand.java — Replacing Stopwatch with raw long processTimerNanos eliminates one object allocation per RPC. The newHeaderInstance() method with ConcurrentHashMap constructor cache is a clean solution to the getDeclaredConstructor() defensive copy issue.
  • [Positive] NettyRemotingServer.java — Downgrading channelWritabilityChanged to DEBUG with isDebugEnabled() guard is appropriate. ~900 lines/sec of INFO/WARN is clearly excessive for a writability toggle event.
  • [Positive] TopicQueueMappingContext.java — Adding EMPTY singleton is a simple and effective optimization.
  • [Warning] RemotingCommand.java — The HEADER_CTOR_CACHE uses computeIfAbsent on a ConcurrentHashMap. Under high contention with many distinct header classes, this could cause brief blocking on the bin lock. However, in practice the number of distinct header classes is bounded and small, so this is acceptable.
  • [Warning] RemotingCommand.java — The processTimerElapsedMs() method computes (System.nanoTime() - processTimerNanos) / 1_000_000. Note that nanoTime() can overflow after ~292 years of JVM uptime, and the subtraction handles this correctly due to two's complement arithmetic. No issue here, just noting for completeness.
  • [Info] NettyDecoder.java — Clean removal of the Guava Stopwatch import. The long timerNanos = System.nanoTime() is a drop-in replacement.

Suggestions

  • Consider adding a brief comment on HEADER_CTOR_CACHE explaining why a ConcurrentHashMap was chosen over ClassValue (which is specifically designed for per-class lazy computation). ClassValue would avoid the explicit cache map and may have better GC characteristics.
  • The four changes are independent — if the maintainers prefer smaller PRs, these could be split. However, they are all in the same hot path and the PR description clearly separates them, so keeping them together is also reasonable.

Verdict

LGTM. Well-justified micro-optimizations with clear JFR evidence.


Automated review by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-command-microfixes branch from 0d43e12 to 03ee743 Compare June 16, 2026 01:23

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (re-review)

Summary

Four independent per-RPC micro-optimizations in the remoting framework:

  1. Guava StopwatchSystem.nanoTime() (eliminates per-RPC object allocation)
  2. Constructor cache for Class.getDeclaredConstructor() (eliminates ~237 alloc events/60s in JFR)
  3. Netty writability log downgrade INFO/WARN → DEBUG (eliminates ~900 lines/sec)
  4. TopicQueueMappingContext.EMPTY singleton (avoids per-message allocation for >99% of messages)

Findings

  • [Positive] RemotingCommand.java — Replacing Stopwatch with a raw long processTimerNanos + processTimerElapsedMs() is a clean elimination of per-RPC allocation. The accessor method encapsulates the conversion well.
  • [Positive] RemotingCommand.javaHEADER_CTOR_CACHE with ConcurrentHashMap.computeIfAbsent() correctly avoids the per-call Constructor copy. The newHeaderInstance() helper centralizes the reflective construction with proper error handling.
  • [Positive] NettyRemotingServer.java — Downgrading channelWritabilityChanged to DEBUG with isDebugEnabled() guard is the right call. ~900 lines/sec at INFO was clearly excessive.
  • [Info] TopicQueueMappingContext.java — The EMPTY singleton is a simple and effective optimization. Ensure callers don't mutate it — consider making fields immutable or adding a comment documenting the shared-instance contract.
  • [Info] Cross-repo: RemotingCommand changes (especially setProcessTimerNanos / processTimerElapsedMs) touch the core remoting API. If apache/rocketmq-clients Java module depends on RemotingCommand, verify no breakage there.

Verdict

Solid set of micro-optimizations with clear JFR evidence. No correctness concerns.


Automated review by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-command-microfixes branch from 03ee743 to cb97b22 Compare June 16, 2026 08:07

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (re-review after new commit)

Summary

Same four per-RPC micro-optimizations as previously reviewed. The new commit squashes the changes into a single commit without altering the diff content.

Verdict

No changes since the previous review. All findings remain valid. Approving.

LGTM ✅


Automated review by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-command-microfixes branch from cb97b22 to 3a84cc7 Compare June 17, 2026 02:02
@codecov-commenter

codecov-commenter commented Jun 17, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 48.38710% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.25%. Comparing base (59a70d9) to head (9da9bb3).
⚠️ Report is 10 commits behind head on develop.

Files with missing lines Patch % Lines
...rocketmq/remoting/netty/NettyRemotingAbstract.java 0.00% 7 Missing ⚠️
...he/rocketmq/remoting/protocol/RemotingCommand.java 81.25% 3 Missing ⚠️
...rocketmq/broker/processor/PopMessageProcessor.java 0.00% 2 Missing ⚠️
...cketmq/broker/processor/QueryMessageProcessor.java 0.00% 2 Missing ⚠️
...ker/processor/DefaultPullMessageResultHandler.java 0.00% 1 Missing ⚠️
...ocketmq/broker/processor/PeekMessageProcessor.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10514      +/-   ##
=============================================
+ Coverage      48.19%   48.25%   +0.05%     
- Complexity     13398    13441      +43     
=============================================
  Files           1377     1377              
  Lines         100730   100851     +121     
  Branches       13012    13036      +24     
=============================================
+ Hits           48548    48664     +116     
+ Misses         46252    46221      -31     
- Partials        5930     5966      +36     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after new commits)

Summary

New commit adds fastEncodeHeaderAsBuffer(ByteBuffer) — a zero-copy header encoding method that writes directly into a Netty ByteBuf-backed ByteBuffer, avoiding the intermediate ByteBuffer allocation from encode(). Also reduces addExtField HashMap default capacity from 256 to 16.

Findings

  • [Warning] RemotingCommand.java:fastEncodeHeaderAsBufferManual byte-order dependency. The method writes int fields directly via byteBuffer.putInt(). This assumes the target ByteBuffer uses BIG_ENDIAN byte order (Java default). If the ByteBuf allocator ever changes to LITTLE_ENDIAN (e.g., for native interop), this would silently corrupt the wire format. Consider adding assert byteBuffer.order() == ByteOrder.BIG_ENDIAN at the top, or explicitly setting byteBuffer.order(ByteOrder.BIG_ENDIAN) before writing.

  • [Warning] RemotingCommand.java:fastEncodeHeaderAsBufferremark and body length encoding. The method writes remark as short length + bytes, and body as int length + bytes. If remark exceeds Short.MAX_VALUE (32767 chars), the short cast will silently truncate the length. Similarly, if body exceeds Integer.MAX_VALUE, the int cast will overflow. These are unlikely in practice but worth defensive checks:

    if (remark != null) {
        byte[] remarkBytes = remark.getBytes(CHARSET_UTF8);
        if (remarkBytes.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("remark too long: " + remarkBytes.length);
        }
        byteBuffer.putShort((short) remarkBytes.length);
        byteBuffer.put(remarkBytes);
    }
  • [Info] RemotingCommand.java:addExtField — Reducing default HashMap capacity from 256 to 16 is a good optimization. Most commands have fewer than 16 ext fields. The HashMap will resize as needed.

  • [Info] RemotingCommand.java:fastEncodeHeaderAsBuffer — The method correctly handles the serializeTypeCurrentField (1 byte) and the header JSON bytes. The total length prefix 4 + byteBuffer.remaining() is written first, which matches the wire format expected by the decoder.

Previous Findings (Still Valid)

  • HEADER_CTOR_CACHE uses computeIfAbsent — under high contention with many distinct header classes, the lambda allocation could offset the savings. Monitor in production.
  • processTimerNanos defaults to 0 — document the contract for code paths that bypass NettyDecoder.
  • TopicQueueMappingContext.EMPTY singleton — ensure no code path mutates it.

Automated review by github-manager-bot

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after squash commit)

Summary

The PR has been squashed into a single commit (1b95e33). The diff content is unchanged from the previous review — same four per-RPC micro-optimizations plus fastEncodeHeaderAsBuffer and addExtField capacity reduction.

Diff Verification

✅ All previously reviewed changes are intact:

  1. Guava StopwatchSystem.nanoTime() (processTimerNanos / processTimerElapsedMs())
  2. HEADER_CTOR_CACHE with ConcurrentHashMap + newHeaderInstance()
  3. Netty channelWritabilityChanged log downgrade to DEBUG
  4. TopicQueueMappingContext.EMPTY singleton
  5. fastEncodeHeaderAsBuffer(ByteBuffer) zero-copy header encoding
  6. addExtField HashMap capacity 256 → 16

Previous Findings (Still Valid)

  • [Warning] fastEncodeHeaderAsBuffer — byte-order dependency (BIG_ENDIAN assumed). Consider adding an explicit assertion.
  • [Warning] fastEncodeHeaderAsBufferremark length cast to short could silently truncate if > 32767 bytes.
  • [Info] HEADER_CTOR_CACHE get-check-put pattern avoids lambda overhead — good for micro-optimization.
  • [Info] Deprecated getProcessTimer() now returns a Stopwatch starting from "now" — external callers may get misleading values.

New Observation

  • [Info] RemotingCommand.java — The HEADER_CTOR_CACHE uses explicit get-check-put instead of computeIfAbsent to avoid lambda allocation per call. This is consistent with the micro-optimization goal.

No new issues found. The squash commit is clean.


Automated review by github-manager-bot

Comment thread remoting/src/main/java/org/apache/rocketmq/remoting/protocol/RemotingCommand.java Outdated
@wang-jiahua wang-jiahua force-pushed the perf/remoting-command-microfixes branch 3 times, most recently from c276d4c to 0c02f95 Compare June 22, 2026 07:44
@wang-jiahua

Copy link
Copy Markdown
Contributor Author

Server Benchmark Results (2026-06-22)

Environment: 8C30G broker, Dragonwell JDK 21, 1KB msg, 128 queues, sync mode, 256 threads, 20s warmup + 30s collect

Producer-only (steady state, lines 3-5)

Test TPS Avg RT vs Baseline
Baseline 217k 1.18ms
#10514 (fixed) 216k 1.19ms Parity (no regression)

Fix applied: processTimerNanos changed from = System.nanoTime() (field initializer) to lazy init (default 0, returns 0 when unset). The field initializer caused a native call on every RemotingCommand construction, preventing JIT constructor inlining (-17% TPS regression before fix).

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after squash)

Summary

PR squashed into a single commit (0c02f95d). The diff content is unchanged from the previous review — same four per-RPC micro-optimizations.

Diff Verification

✅ All previously reviewed changes are intact:

  1. Guava StopwatchSystem.nanoTime()processTimerNanos field + processTimerElapsedMs() method. Eliminates per-RPC Stopwatch allocation. Old getProcessTimer() deprecated with backward-compatible wrapper.
  2. HEADER_CTOR_CACHEConcurrentHashMap<Class<?>, Constructor<?>> with computeIfAbsent-style pattern. Eliminates per-RPC Constructor object copy. newHeaderInstance() helper centralizes the cached lookup.
  3. Netty writability log downgrade — INFO/WARN → DEBUG with isDebugEnabled() guard. Appropriate for high-frequency channel events.
  4. TopicQueueMappingContext.EMPTY singleton — Static final instance for the >99% non-static-topic case.
  5. addExtField capacity fixHashMap(256)HashMap() (default 16). 16x memory reduction for typical ext field counts.

Verdict

Clean re-review. All optimizations are well-targeted, thread-safe, and backward-compatible. LGTM.


Automated review by github-manager-bot

wang-jiahua pushed a commit to wang-jiahua/rocketmq that referenced this pull request Jun 23, 2026
…Ms in NettyRemotingAbstract

- RemotingMetricsManager: add getOrBuildAttributes() with long-key cache
  (same pattern as BrokerMetricsManager in apache#10485)
- NettyRemotingAbstract: replace getProcessTimer().elapsed() with
  processTimerElapsedMs() (from apache#10514), use cached Attributes
- NettyRemotingAbstract: add log.isDebugEnabled() guard for debug log
- RemotingMetricsConstant: add LABEL_*_KEY static AttributeKey instances

@RongtongJin RongtongJin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few focused comments on the remaining compatibility and operability concerns.

…(Guava Stopwatch, Constructor copy)

- Replace Stopwatch with long processTimerNanos (lazy init, no native call per construction)
- Add Constructor cache (HEADER_CTOR_CACHE) to avoid reflective getDeclaredConstructor per decode
- Add @deprecated getProcessTimer()/setProcessTimer() adapter methods for binary compatibility
- Revert Netty writability log level to original (warn for auto-read disabled, info for writable)
@wang-jiahua wang-jiahua force-pushed the perf/remoting-command-microfixes branch from 0c02f95 to 9da9bb3 Compare June 23, 2026 08:02

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after force-push)

Summary

Single squashed commit 9da9bb3 (force-pushed 2026-06-23). Diff content is unchanged from the previous review — same four per-RPC micro-optimizations across 8 files (+62/-30).

Diff Verification

✅ All previously reviewed changes are intact:

  1. Guava StopwatchSystem.nanoTime()processTimerNanos field + processTimerElapsedMs(). Zero per-RPC allocation.
  2. HEADER_CTOR_CACHEConcurrentHashMap with get + putIfAbsent pattern. Eliminates Constructor object copy.
  3. Netty writability log downgrade — INFO/WARN → DEBUG with isDebugEnabled() guard.
  4. TopicQueueMappingContext.EMPTY singleton — Static final instance for the >99% non-static-topic case.

Unresolved Discussion Notes

Several concerns raised by @RongtongJin remain open and are worth tracking:

  • processTimerNanos default value (line 661): Defaults to 0, so commands that bypass setProcessTimerNanos() will report ~JVM-uptime latency. Consider initializing to -1 as a sentinel, or documenting the contract clearly.
  • Deprecated getProcessTimer() (line ~670): Returns a freshly-started Stopwatch that no longer reflects the real request timer. Any unmigrated downstream caller silently observes near-zero latency. If kept for compat, consider throwing UnsupportedOperationException or logging a warning.
  • TopicQueueMappingContext.EMPTY mutability: The class exposes setters (setTopic, setGlobalId, etc.), making a shared EMPTY singleton risky under concurrent access. If adopted, the class should be made immutable or EMPTY should be a defensive copy factory.
  • Log level downgrade for backpressure: channelWritabilityChanged and auto-read disabled events are operationally useful for diagnosing slow consumers. A rate-limiter (e.g., log at most once per N seconds per channel) could reduce volume while preserving the signal.

These are suggestions for the author to consider — the core optimizations are sound and well-targeted.


Automated review by github-manager-bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement] Eliminate per-RPC allocation in RemotingCommand (Guava Stopwatch, Constructor copy) and downgrade Netty writability log

5 participants