Add an inlining stage for single callsite methods#12899
Conversation
216b144 to
644e949
Compare
|
cc @christianhaeubl |
|
@dougxc please assign someone from the compiler side as reviewer. |
|
@axel22 or @boris-spas , could you please have a look at this. |
|
@axel22 or @boris-spas - Just a gentle ping to keep this on your radar. Have you had a chance to take a look at this? |
|
@axel22 or @boris-spas, another ping to keep this on your radar. Have you had time to start looking at this? |
|
Hi @roberttoyonaga! Thank you for the PR. I will take this over for now and check how we can integrate. We do have larger changes to the inliner pending and I will see how this combines. |
|
Hi @thomaswue just a gentle ping. Do you know if there have been any updates with regard to this? Thanks! |
|
It would be great to get this in before the July release. Since this PR is not really moving, we are considering putting this into the GraalVM for JDK 25.0 maintenance repo (graalvm/graalvm-community-jdk25u#50). |
|
I am benchmarking this today. Now backporting into 25.0.x is an option to be considered anyway, because if it is integrated, it will be on the 25.1.x innovation release line. |
|
I integrated the optional inlining stage into graalvm/graalvm-community-jdk25u#50 (review) to be part of the July release. |
|
There is bad interaction of this PR with the -H:Preserve flag or other features that force a method to become a root compilation anyway. I see modest benefits, I played around with expanding this a bit for higher benefits by including non-leaf methods in some cases. |
8876f3b to
7d9d5a8
Compare
1cf21dc to
5986cb2
Compare
|
I've rebased with master, fixed conflicts, added a test |
Summary
This PR adds a new inlining stage for single callsite methods. We should be able to benefit from inlining single callsite methods without paying the code area price. This stage is done completely independently of the normal Trivial inlining stage. It can be turned off using the
AOTSingleCallsiteInlinehosted option, similar to the existingAOTTrivialInlineoption.This inlining stage works by first counting the callsites for each method in the universe, then inlining those methods specifically. Similar to the Trivial inlining stage, rounds are used. However, usually only 1 or 2 rounds will ever execute. An additional round will execute only when an inlining candidate exceeds the fallback threshold (rare).
Overall, adding this inlining stage improves performance. See a few benchmark results below.
Results
Using Renaissance:
Test details:
cpupower frequency-set --governor performancesh -c 'echo 3 >/proc/sys/vm/drop_caches'Definitions:
Using a Quarkus hello-world rest benchmark:
This benchmark has 2 endpoints: "greeting" and "beer". Both return plaintext, but "beer" does a little more work.
Test details:
cpupower frequency-set --governor performancesh -c 'echo 3 >/proc/sys/vm/drop_caches'New Definitions:
Other notes
Improving the Trivial Inlining stage
I also tried improving the Trivial inlining stage by switching from using raw node counting to
estimatedNodeSize(). That should give a more accurate prediction of code area than using the raw node count. However, this only resulted in improvement in one Renaissance benchmark (par-mnemonics), so I am not sure whether this change is worth it. You can see the code for that here roberttoyonaga#4.Unit tests
I was not able to find any existing tests for the Native Image Inliner. I have manually checked for correctness by building with debug info and checking the generated assembly. However, I don't think that approach translates well to unit tests. Another option I was considering is testing for correctness at the Graal IR level. However, I'm not sure the best way to go about doing this so I decided it was best to ask for advice here before investing too much in a particular approach.