Add an inlining stage for single callsite methods by roberttoyonaga · Pull Request #12899 · oracle/graal

roberttoyonaga · 2026-02-02T14:29:15Z

Summary

This PR adds a new inlining stage for single-callsite methods. We should be able to benefit from inlining these methods without paying the code area price, because a method with only one caller is moved into that caller rather than duplicated. This stage runs completely independently of the normal Trivial inlining stage. It can be turned off using the AOTSingleCallsiteInline hosted option, similar to the existing AOTTrivialInline option.

This inlining stage works by first counting the callsites for each method in the universe, then inlining only the methods that have exactly one callsite. Like the Trivial inlining stage, it runs in rounds. However, usually only 1 or 2 rounds execute. An additional round executes only in the rare case where an inlining candidate exceeds the fallback threshold.
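The counting step described above can be sketched on a toy call graph. This is a minimal illustration, not the code from this PR: the class name `SingleCallsiteSketch`, the string-keyed call graph, and the method names are all hypothetical, and the rounds and the fallback threshold are not modeled.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SingleCallsiteSketch {

    /** Counts, for every callee, how many callsites target it across the whole call graph. */
    static Map<String, Integer> countCallsites(Map<String, List<String>> calls) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> callees : calls.values()) {
            for (String callee : callees) {
                // Each list entry is one callsite, so repeated callees are counted repeatedly.
                counts.merge(callee, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the methods with exactly one callsite: the candidates for this stage. */
    static Set<String> singleCallsiteMethods(Map<String, List<String>> calls) {
        Set<String> candidates = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : countCallsites(calls).entrySet()) {
            if (entry.getValue() == 1) {
                candidates.add(entry.getKey());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Hypothetical universe: caller -> callees, one entry per callsite.
        Map<String, List<String>> calls = new LinkedHashMap<>();
        calls.put("main", List.of("parse", "helper", "helper"));
        calls.put("parse", List.of("readToken"));
        calls.put("readToken", List.of());
        calls.put("helper", List.of());

        // "helper" has two callsites and is skipped; "parse" and "readToken" have one each.
        System.out.println(singleCallsiteMethods(calls));
    }
}
```

In this toy model, inlining `readToken` into `parse` and `parse` into `main` removes both standalone methods, which is why the approach is expected to add little code area.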

Overall, adding this inlining stage improves performance. See a few benchmark results below.

Results

Using Renaissance:

Test details:

Tested on Linux amd64
For stability during the build and during benchmark execution:
- Intel turbo boost disabled
- CPU frequency pinned to 2100000kHz with cpupower
- cpupower frequency-set --governor performance
- Caches dropped before building with sh -c 'echo 3 >/proc/sys/vm/drop_caches'
- Each benchmark was run multiple times. The first few runs are discounted as warm-up.

Definitions:

new: The inliner with single callsite inlining implemented.
old: The old inliner with default settings.
duration: Execution time of the Renaissance benchmark.
% improvement: Calculated as 100 * (old duration - new duration) / old duration.
STDEV: Standard deviation in benchmark run duration.
inline time: Time taken only by the inlining operations.
build time: Total time taken by the image builder.
Peak Build RSS: Peak RSS reported by the image builder.

Benchmark	Duration (ms) [old]	Duration (ms) [new]	% improvement	STDEV (ms) [old]	STDEV (ms) [new]	Code Area (MB) [old]	Code Area (MB) [new]	File Size (MB) [old]	File Size (MB) [new]	Inline Time (s) [old]	Inline Time (s) [new]	Build Time (s) [old]	Build Time (s) [new]	Peak Build RSS (GB) [old]	Peak Build RSS (GB) [new]	Total runs	Warm up runs
mnemonics	11127.89	10368.52	6.824	77.685	70.535	7.42	7.49	19.5	19.5	0.8	1.25	38.85	39.35	1.735	1.82	16	6
reactors	35201.1	33134.3	5.871	564.39	1057	7.86	7.94	21.07	21.13	0.8	1.3	40.1	41.2	1.81	1.93	10	3
future-genetic	2743.19	2684.55	2.138	16.91	19.525	7.54	7.6	19.63	19.69	0.75	1.3	39.05	39.65	1.74	1.815	25	5
par-mnemonics	9192.965	8634.455	6.075	89.955	60.565	7.43	7.49	19.5	19.5	0.8	1.2	38.7	39.35	1.74	1.8	16	6
philosophers	6818.26	6422.29	5.807	245.965	209.43	9.97	10.21	27.69	27.88	1	1.4	46.6	47.6	2.05	2.2	30	10
scala-doku	5911.5	5931.97	-0.346	51.76	21.15	7.49	7.55	19.57	19.57	0.8	1.1	39.1	40	1.76	1.84	20	10
fj-kmeans	6422.1	6488.66	-1.036	122.97	94.03	7.39	7.46	19.38	19.44	0.7	1.2	38.3	40	1.72	1.8	30	20
akka-uct	30300.175	30102.31	0.653	836.825	1013.41	9.52	9.59	24.32	24.32	1	1.5	45.2	45.7	2.03	2.07	10	5
scala-kmeans	722.883	616.075	14.775	3.39	4.1	7.37	7.46	19.38	19.44	0.8	1.3	38.5	39	1.74	1.79	25	5

Using a Quarkus hello-world rest benchmark:

This benchmark has 2 endpoints: "greeting" and "beer". Both return plaintext, but "beer" does a little more work.

Test details:

Tested on Linux amd64
For stability during the build and during benchmark execution:
- Intel turbo boost disabled
- CPU frequency pinned to 2100000kHz with cpupower
- cpupower frequency-set --governor performance
- Caches dropped before building and running with sh -c 'echo 3 >/proc/sys/vm/drop_caches'
- Both the old and new configurations were built and run 5 times in an alternating fashion. The results were averaged.
- Pinned the load driver (Hyperfoil) to 4 CPUs and the Quarkus app to 2 other CPUs

New Definitions:

Req/s: Quarkus app throughput in requests per second.
% improvement: Calculated as 100 * (new throughput - old throughput) / old throughput.

Benchmark	Throughput (req/s) [old]	Throughput (req/s) [new]	% improvement	Code Area (MB) [old]	Code Area (MB) [new]	File Size (MB) [old]	File Size (MB) [new]	Inline Time (s) [old]	Inline Time (s) [new]	Build Time (s) [old]	Build Time (s) [new]	Peak Build RSS (GB) [old]	Peak Build RSS (GB) [new]
"greeting" endpoint	48161.4	53484.8	11.053	20.14	20.52	46.82	47.19	1.388	2.371	81.5	82.429	2.704	2.83
"beer" endpoint	23598.25	26249.5	11.235	Same as above	Same as above	Same as above	Same as above	Same as above	Same as above	Same as above	Same as above	Same as above	Same as above
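The two "% improvement" definitions have opposite signs in the numerator because lower is better for duration while higher is better for throughput. The sketch below applies both formulas to rows copied from the tables above. The class and method names are hypothetical and are not part of the PR.

```java
public class ImprovementSketch {

    /** Duration metric (lower is better): a positive result means the new build is faster. */
    static double durationImprovement(double oldDuration, double newDuration) {
        return 100.0 * (oldDuration - newDuration) / oldDuration;
    }

    /** Throughput metric (higher is better): a positive result means the new build serves more requests. */
    static double throughputImprovement(double oldThroughput, double newThroughput) {
        return 100.0 * (newThroughput - oldThroughput) / oldThroughput;
    }

    public static void main(String[] args) {
        // Inputs are taken from the mnemonics, scala-doku and "greeting" rows.
        System.out.println("mnemonics:  " + durationImprovement(11127.89, 10368.52));
        System.out.println("scala-doku: " + durationImprovement(5911.5, 5931.97));
        System.out.println("greeting:   " + throughputImprovement(48161.4, 53484.8));
    }
}
```

Rounded to three decimals, these agree with the reported 6.824, -0.346 and 11.053. The negative scala-doku value reflects a slightly longer duration with the new inliner.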

Other notes

Improving the Trivial Inlining stage
I also tried improving the Trivial inlining stage by switching from raw node counting to estimatedNodeSize(), which should predict code area more accurately than the raw node count. However, this improved only one Renaissance benchmark (par-mnemonics), so I am not sure whether this change is worth it. You can see the code for that here: roberttoyonaga#4.
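To illustrate why a size estimate can rank candidates differently from a raw node count, here is a toy model. It does not use the Graal API: the `IrNode` class, the node kinds, and the size values are invented for this example, and it only shows the difference between counting nodes and summing per-node estimates.

```java
import java.util.List;

public class NodeCostSketch {

    /** Toy IR node: a kind plus a made-up size estimate (not real Graal values). */
    static final class IrNode {
        final String kind;
        final int estimatedSize;

        IrNode(String kind, int estimatedSize) {
            this.kind = kind;
            this.estimatedSize = estimatedSize;
        }
    }

    /** Raw node counting: every node costs the same. */
    static int rawNodeCount(List<IrNode> graph) {
        return graph.size();
    }

    /** Size-based cost: sum of the per-node estimates. */
    static int estimatedSize(List<IrNode> graph) {
        int total = 0;
        for (IrNode node : graph) {
            total += node.estimatedSize;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical callees with the same number of nodes but very different code size.
        List<IrNode> getter = List.of(new IrNode("LoadField", 1), new IrNode("Return", 2));
        List<IrNode> copier = List.of(new IrNode("ArrayCopy", 64), new IrNode("Return", 2));

        System.out.println("raw count: " + rawNodeCount(getter) + " vs " + rawNodeCount(copier));
        System.out.println("estimated: " + estimatedSize(getter) + " vs " + estimatedSize(copier));
    }
}
```

Under raw counting the two callees look equally cheap, while the summed estimates separate them. Whether that distinction matters in practice depends on how often such cases occur among trivial-inlining candidates.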

Unit tests
I was not able to find any existing tests for the Native Image Inliner. I have manually checked for correctness by building with debug info and inspecting the generated assembly, but I don't think that approach translates well to unit tests. Another option I considered is testing for correctness at the Graal IR level. However, I'm not sure of the best way to go about this, so I decided it was best to ask for advice here before investing too much in a particular approach.

roberttoyonaga · 2026-02-02T15:52:54Z

cc @christianhaeubl
This PR is based on something we talked about a few months ago at the GraalVM summit. It adds an inlining stage for single callsite methods to achieve more inlining without suffering the code area penalty.
We also talked about using estimatedNodeSize() in the Trivial inliner cost calculations. I experimented with this, but unfortunately it did not seem to have much of an impact (roberttoyonaga#4).

christianhaeubl · 2026-02-03T08:03:54Z

@dougxc please assign someone from the compiler side as reviewer.

dougxc · 2026-02-03T10:43:01Z

@axel22 or @boris-spas, could you please have a look at this?

roberttoyonaga · 2026-02-17T15:19:33Z

@axel22 or @boris-spas - Just a gentle ping to keep this on your radar. Have you had a chance to take a look at this?

roberttoyonaga · 2026-03-06T14:14:37Z

@axel22 or @boris-spas, another ping to keep this on your radar.

Have you had time to start looking at this?

thomaswue · 2026-03-20T10:57:32Z

Hi @roberttoyonaga! Thank you for the PR. I will take this over for now and check how we can integrate. We do have larger changes to the inliner pending and I will see how this combines.

roberttoyonaga · 2026-04-15T19:44:34Z

Hi @thomaswue, just a gentle ping. Do you know if there have been any updates on this? Thanks!

roberttoyonaga · 2026-06-04T15:25:31Z

It would be great to get this in before the July release. Since this PR is not really moving, we are considering putting this into the GraalVM for JDK 25.0 maintenance repo (graalvm/graalvm-community-jdk25u#50).
@thomaswue Are there any updates regarding this?

thomaswue · 2026-06-05T10:01:31Z

I am benchmarking this today. Now backporting into 25.0.x is an option to be considered anyway, because if it is integrated, it will be on the 25.1.x innovation release line.

Karm · 2026-06-16T08:04:55Z

I integrated the optional inlining stage into graalvm/graalvm-community-jdk25u#50 (review) to be part of the July release.

thomaswue · 2026-06-18T09:28:34Z

There is a bad interaction between this PR and the -H:Preserve flag or other features that force a method to become a root compilation anyway. I see modest benefits. I played around with expanding this a bit for higher benefits by including non-leaf methods in some cases.

roberttoyonaga · 2026-06-19T15:41:29Z

I've rebased onto master, fixed conflicts, added a test mx scismoketest, switched the feature OFF by default, and marked the feature as "experimental".

oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) Feb 2, 2026

roberttoyonaga added the ibm-redhat-interest and native-image labels Feb 2, 2026

roberttoyonaga force-pushed the SingleCallsiteInlining branch from 216b144 to 644e949 Compare February 2, 2026 14:43

roberttoyonaga marked this pull request as ready for review February 2, 2026 15:41

roberttoyonaga mentioned this pull request Feb 2, 2026

Add an inlining stage for single callsite methods roberttoyonaga/graal#5

Closed

zakkak added the performance label Feb 3, 2026

oubidar-Abderrahim assigned axel22 and boris-spas Feb 19, 2026

thomaswue assigned thomaswue and unassigned axel22 and boris-spas Mar 20, 2026

zakkak mentioned this pull request May 13, 2026

Add an inlining stage for single callsite methods graalvm/graalvm-community-jdk25u#50

Merged

zakkak mentioned this pull request Jun 18, 2026

Add mx test for SingleCallsiteInliner graalvm/graalvm-community-jdk25u#86

Merged

roberttoyonaga added 3 commits June 19, 2026 11:26

single callsite inlining

ddbc103

add mx test for inliner

999c51a

make inliner option off by default and experimental

7d9d5a8

roberttoyonaga force-pushed the SingleCallsiteInlining branch from 8876f3b to 7d9d5a8 Compare June 19, 2026 15:26

fix

5986cb2

roberttoyonaga force-pushed the SingleCallsiteInlining branch from 1cf21dc to 5986cb2 Compare June 19, 2026 15:40

roberttoyonaga wants to merge 4 commits into oracle:master from roberttoyonaga:SingleCallsiteInlining
