fix(posix): map pthread_key_tPointer to pthread_key_t instead of size_t#13391
fix(posix): map pthread_key_tPointer to pthread_key_t instead of size_t#13391gounthar wants to merge 1 commit into
Conversation
|
Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA).
To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application. When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated. If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public. |
|
Thank you for signing the OCA. |
|
Thanks for the contribution! We'll review this shortly. @wirthi, could you please assign a reviewer? |
|
Appreciate the quick acknowledgment, looking forward to the review. |
|
|
||
| if len(symbols) == 0: | ||
| if not self.staticlibs: | ||
| mx.warn('No static JDK libraries found for riscv64 - generating empty JvmFuncsFallbacks.c') |
There was a problem hiding this comment.
I don't think this warning should mention riscv64.
There was a problem hiding this comment.
Fixed. Replaced the hardcoded 'riscv64' with mx.get_arch() so the warning isn't architecture-specific.
| layout = { | ||
| './': ['file:' + join(base_jdk_home, 'lib', lib_prefix + '*' + lib_suffix)] | ||
| } | ||
| from glob import glob as _glob |
There was a problem hiding this comment.
Please put this at the top of the file.
There was a problem hiding this comment.
Fixed. Removed the local import. glob is already at the top of the file; I clearly didn't look far enough.
There was a problem hiding this comment.
I'm quite curious why you needed that. What jdk did you use to build native image?
There was a problem hiding this comment.
Initially I used openjdk-25-jdk-headless from Debian apt on a Banana Pi F3 runner. That JDK ships with no static .a libraries, which is exactly why the JvmFuncsFallbacks build task hits that branch and why I changed mx.abort() to mx.warn(). Later I rebuilt labs-openjdk for riscv64 from source (tracked in labs-openjdk#34), which produced 38 static .a libs. With those in place the build gets further.
Should have put this in the PR description. My mistake.
There was a problem hiding this comment.
No worries!
I don't know if we really want to support building graal without the static .a libraries though. At least this change would be out of the scope of this PR, and would be touching all the architectures, so maybe it's better to keep this one focused on the pthread issue.
There was a problem hiding this comment.
Agreed, pulled it out. I force-pushed so the branch is now just the pthread_key_t mapping and nothing else: nameOfCType = "pthread_key_t" with @AllowWideningCast on read(), which is what you suggested, and it drops the stack-slot zeroing completely.
The static-lib handling and the libffi arch mapping were really separate riscv64 build concerns, so I will move those to their own PRs if they turn out to be worth it. You know the substratevm side better than I do, so tell me if you would rather I split things differently. Thanks for the steer.
| // StackValue allocates size_t (8 bytes) but pthread_key_t is unsigned int (4 bytes). | ||
| // Zero the slot so the upper 32 bits are clean; otherwise a 64-bit key.read() picks | ||
| // up stack garbage, and riscv64 glibc rejects the resulting key value via pthread_setspecific. | ||
| ((WordPointer) key).write(Word.zero()); |
There was a problem hiding this comment.
Instead of zeroing the value, maybe it would make more sense to have the nameOfCType of pthread_key_tPointer be pthread_key_t.
There was a problem hiding this comment.
Agreed, cleaner. Changing nameOfCType from "size_t" to "pthread_key_t" fixes it at the type level rather than patching around it in createUnmanagedThreadLocal. I want to test this on the F3 first since that's exactly where the @CPointerTo/StackValue/pthread_setspecific interaction breaks. Will update once verified. You probably know better than I do whether this could affect other platforms too.
On riscv64, pthread_key_t is a 4-byte unsigned int, but pthread_key_tPointer was declared with nameOfCType = "size_t" (8 bytes). Reading the key back as a 64-bit value left garbage in the upper 32 bits, which glibc rejected in pthread_setspecific. Declaring the C type as pthread_key_t sizes the pointer correctly; @AllowWideningCast permits the read into the wider UnsignedWord.
81aa24e to
c48878d
Compare
Summary
StackValue.get(pthread_key_tPointer.class)allocatessize_tbytes (8 on 64-bit) becausepthread_key_tPointeris@CPointerTo(nameOfCType = "size_t").pthread_key_createwrites only the 4-bytepthread_key_tvalue (unsigned inton Linux). Thenkey.read()does an 8-byte load, so whatever was sitting on the stack in the upper 32 bits gets folded into the returnedThreadLocalKey.On amd64 this is silent: glibc validates the key with a 32-bit comparison and never touches the upper word. On riscv64, glibc uses a 64-bit
bltuagainstPTHREAD_KEYS_MAX(1024). A key like0xdeadbeef00000002fails that check,pthread_setspecificreturnsEINVAL, and glibc aborts duringIsolateThreadCache.clear()at shutdown:Zeroing the slot before the call fixes this on my setup. I'm not sure it's the right approach though. Zeroing at allocation time might be cleaner, or maybe
pthread_key_tPointershould be sized to reflect the actual 4-byte type rather thansize_t. Happy to take direction from anyone who knows this area better.Related Issues
Fixes #13386
Testing
Tested on a single Banana Pi F3 (SpacemiT K1, rv64gc, Debian). With the fix, a HelloWorld native binary runs and exits cleanly. Without it, it prints output and then aborts with the error above.
I haven't tested other riscv64 boards or glibc versions, so there may be cases I haven't hit. On amd64 the change should be a no-op since the 32-bit glibc comparison never reads the upper bytes, but I haven't run a full test suite to confirm nothing regresses there. A proper regression test would need a riscv64 CI runner, which I don't have access to.
Documentation
Nothing to update. Runtime crash fix, no API change.
Contributor Checklist