Skip to content

fix(posix): map pthread_key_tPointer to pthread_key_t instead of size_t#13391

Open
gounthar wants to merge 1 commit into
oracle:masterfrom
gounthar:fix/riscv64-pthread-key-stackslot-zero
Open

fix(posix): map pthread_key_tPointer to pthread_key_t instead of size_t#13391
gounthar wants to merge 1 commit into
oracle:masterfrom
gounthar:fix/riscv64-pthread-key-stackslot-zero

Conversation

@gounthar

Copy link
Copy Markdown

Summary

StackValue.get(pthread_key_tPointer.class) allocates size_t bytes (8 on 64-bit) because pthread_key_tPointer is @CPointerTo(nameOfCType = "size_t"). pthread_key_create writes only the 4-byte pthread_key_t value (unsigned int on Linux). Then key.read() does an 8-byte load, so whatever was sitting on the stack in the upper 32 bits gets folded into the returned ThreadLocalKey.

On amd64 this is silent: glibc validates the key with a 32-bit comparison and never touches the upper word. On riscv64, glibc uses a 64-bit bltu against PTHREAD_KEYS_MAX (1024). A key like 0xdeadbeef00000002 fails that check, pthread_setspecific returns EINVAL, and glibc aborts during IsolateThreadCache.clear() at shutdown:

pthread_setspecific(key, value): wrong arguments

Zeroing the slot before the call fixes this on my setup. I'm not sure it's the right approach though. Zeroing at allocation time might be cleaner, or maybe pthread_key_tPointer should be sized to reflect the actual 4-byte type rather than size_t. Happy to take direction from anyone who knows this area better.

Related Issues

Fixes #13386

Testing

Tested on a single Banana Pi F3 (SpacemiT K1, rv64gc, Debian). With the fix, a HelloWorld native binary runs and exits cleanly. Without it, it prints output and then aborts with the error above.

I haven't tested other riscv64 boards or glibc versions, so there may be cases I haven't hit. On amd64 the change should be a no-op since the 32-bit glibc comparison never reads the upper bytes, but I haven't run a full test suite to confirm nothing regresses there. A proper regression test would need a riscv64 CI runner, which I don't have access to.

Documentation

Nothing to update. Runtime crash fix, no API change.

Contributor Checklist

  • I have read the contribution guide.
  • I have the right to contribute the submitted material under the project terms.
  • I have updated tests and documentation where appropriate.
  • If I used a coding assistant, I remain responsible for the entire contribution and have reviewed it accordingly.

@oracle-contributor-agreement

Copy link
Copy Markdown

Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA).
The following contributors of this PR have not signed the OCA:

To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application.

When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated.

If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public.

@oracle-contributor-agreement oracle-contributor-agreement Bot added the OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. label Apr 23, 2026
@oracle-contributor-agreement

Copy link
Copy Markdown

Thank you for signing the OCA.

@oracle-contributor-agreement oracle-contributor-agreement Bot added OCA Verified All contributors have signed the Oracle Contributor Agreement. and removed OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. labels Apr 24, 2026
@oubidar-Abderrahim oubidar-Abderrahim self-assigned this Apr 28, 2026
@oubidar-Abderrahim

Copy link
Copy Markdown
Member

Thanks for the contribution!

We'll review this shortly. @wirthi, could you please assign a reviewer?

@wirthi wirthi requested a review from Zeavee April 28, 2026 10:32
@gounthar

Copy link
Copy Markdown
Author

Appreciate the quick acknowledgment, looking forward to the review.


if len(symbols) == 0:
if not self.staticlibs:
mx.warn('No static JDK libraries found for riscv64 - generating empty JvmFuncsFallbacks.c')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this warning should mention riscv64.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Replaced the hardcoded 'riscv64' with mx.get_arch() so the warning isn't architecture-specific.

layout = {
'./': ['file:' + join(base_jdk_home, 'lib', lib_prefix + '*' + lib_suffix)]
}
from glob import glob as _glob

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put this at the top of the file.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Removed the local import. glob is already at the top of the file; I clearly didn't look far enough.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite curious why you needed that. What jdk did you use to build native image?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I used openjdk-25-jdk-headless from Debian apt on a Banana Pi F3 runner. That JDK ships with no static .a libraries, which is exactly why the JvmFuncsFallbacks build task hits that branch and why I changed mx.abort() to mx.warn(). Later I rebuilt labs-openjdk for riscv64 from source (tracked in labs-openjdk#34), which produced 38 static .a libs. With those in place the build gets further.

Should have put this in the PR description. My mistake.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries!

I don't know if we really want to support building graal without the static .a libraries though. At least this change would be out of the scope of this PR, and would be touching all the architectures, so maybe it's better to keep this one focused on the pthread issue.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, pulled it out. I force-pushed so the branch is now just the pthread_key_t mapping and nothing else: nameOfCType = "pthread_key_t" with @AllowWideningCast on read(), which is what you suggested, and it drops the stack-slot zeroing completely.

The static-lib handling and the libffi arch mapping were really separate riscv64 build concerns, so I will move those to their own PRs if they turn out to be worth it. You know the substratevm side better than I do, so tell me if you would rather I split things differently. Thanks for the steer.

// StackValue allocates size_t (8 bytes) but pthread_key_t is unsigned int (4 bytes).
// Zero the slot so the upper 32 bits are clean; otherwise a 64-bit key.read() picks
// up stack garbage, and riscv64 glibc rejects the resulting key value via pthread_setspecific.
((WordPointer) key).write(Word.zero());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of zeroing the value, maybe it would make more sense to have the nameOfCType of pthread_key_tPointer be pthread_key_t.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, cleaner. Changing nameOfCType from "size_t" to "pthread_key_t" fixes it at the type level rather than patching around it in createUnmanagedThreadLocal. I want to test this on the F3 first since that's exactly where the @CPointerTo/StackValue/pthread_setspecific interaction breaks. Will update once verified. You probably know better than I do whether this could affect other platforms too.

On riscv64, pthread_key_t is a 4-byte unsigned int, but pthread_key_tPointer
was declared with nameOfCType = "size_t" (8 bytes). Reading the key back as a
64-bit value left garbage in the upper 32 bits, which glibc rejected in
pthread_setspecific. Declaring the C type as pthread_key_t sizes the pointer
correctly; @AllowWideningCast permits the read into the wider UnsignedWord.
@gounthar gounthar force-pushed the fix/riscv64-pthread-key-stackslot-zero branch from 81aa24e to c48878d Compare May 24, 2026 13:16
@gounthar gounthar changed the title fix(posix): zero pthread_key_t stack slot before pthread_key_create fix(posix): map pthread_key_tPointer to pthread_key_t instead of size_t May 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Native Image] pthread_setspecific: wrong arguments crash on shutdown — riscv64 only, root cause in PosixPlatformThreads.createUnmanagedThreadLocal()

3 participants