-
-
Notifications
You must be signed in to change notification settings - Fork 14.2k
Move shared offload globals and define per-kernel globals once #149788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
9de91f2 to
b357e0c
Compare
| if cx.sess().opts.unstable_opts.offload.contains(&Offload::Enable) | ||
| && !cx.sess().target.is_like_gpu | ||
| { | ||
| cx.offload_globals.replace(Some(OffloadGlobals::declare(&cx))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm a bit unsure about this location. we could also cache these globals and generate them on the first intrinsic call, but that felt like overloading intrinsic codegen a bit too much
i don't have a strong opinion though, so happy to go with whatever u think is best
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also prefer if the intrinsics stay simple and don't need global context about whether they're the first intrinsic or not.
The location seems fine, but can you leave a comment explaining that we have to leave a bunch of globals in the LLVM-IR host module? Similar to e.g. objective-c below. Otherwise lgtm.
|
Thanks for the IR and code cleanup, the struct makes it much nicer. @bors try @rust-timer queue I don't think it will have any impact, but then again, we thought the same when accidentally causing a regression with autodiff, so let's see. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Move shared offload globals and define per-kernel globals once
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (b120fe9): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 481.444s -> 482.856s (0.29%) |
|
Benchmark seems noisy, and the @bors delegate+ |
37df9e8 to
04c2d2b
Compare
|
@bors r+ rollup |
Move shared offload globals and define per-kernel globals once This PR moves the shared LLVM global variables logic out of the `offload` intrinsic codegen and generates kernel-specific variables only ont he first call of the intrinsic. r? `@ZuseZ4`
Move shared offload globals and define per-kernel globals once This PR moves the shared LLVM global variables logic out of the `offload` intrinsic codegen and generates kernel-specific variables only ont he first call of the intrinsic. r? ``@ZuseZ4``
Move shared offload globals and define per-kernel globals once This PR moves the shared LLVM global variables logic out of the `offload` intrinsic codegen and generates kernel-specific variables only ont he first call of the intrinsic. r? ```@ZuseZ4```
Rollup of 7 pull requests Successful merges: - #149633 (Enable `outline-atomics` by default on AArch64 FreeBSD) - #149788 (Move shared offload globals and define per-kernel globals once) - #149989 (Improve filenames encoding and misc) - #150012 (rustc_target: Add `efiapi` ABI support for LoongArch) - #150116 (layout: Store inverse memory index in `FieldsShape::Arbitrary`) - #150151 (Destabilise `target-spec-json`) - #150159 (Split eii macro expansion code) r? `@ghost` `@rustbot` modify labels: rollup
Move shared offload globals and define per-kernel globals once This PR moves the shared LLVM global variables logic out of the `offload` intrinsic codegen and generates kernel-specific variables only ont he first call of the intrinsic. r? `@ZuseZ4` tracking: - rust-lang#131513
This PR moves the shared LLVM global variables logic out of the
offloadintrinsic codegen and generates kernel-specific variables only ont he first call of the intrinsic.r? @ZuseZ4
tracking: