Skip to content

Deprioritize cancellation in mergeSuppressed#3924

Open
kyay10 wants to merge 1 commit into
mainfrom
kyay10/deprioritize-cancellation
Open

Deprioritize cancellation in mergeSuppressed#3924
kyay10 wants to merge 1 commit into
mainfrom
kyay10/deprioritize-cancellation

Conversation

@kyay10

@kyay10 kyay10 commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator

The problem

This affects all the "bracketing" functions from fx, as well as resourceScope and autoCloseScope. I'll be speaking about resourceScope for brevity.

The key observation behind this change is that some exceptions get swallowed because, e.g., someone raised inside resourceScope, but onRelease threw a real, non-fatal exception.
I think that's unacceptable; code that completes exceptionally needs to be reported eventually.

Background

A band-aid solution would be to throw suppressed exceptions in arrow.core.raise.fold, but I dislike that. Someone might want to catch the exception from resourceScope, but instead, the exception is "tunnelled" all the way to the fold corresponding to the Raise. Such static tunnelling behaviour is unexpected for the dynamic exceptions we're used to.

Even `kotlinx-coroutines` doesn't rethrow suppressed exceptions
import kotlinx.coroutines.*
import kotlin.time.Duration

suspend fun main() {
    withContext(Dispatchers.Default) {
        val autoCloseable = AutoCloseable { throw RuntimeException("BOOM!") }
        
        val job = async {
            autoCloseable.use {
                awaitCancellation()
            }
        }
        job.cancel()
    }
    println("All coroutines have completed") // finishes just fine!
}

One might argue that it's on stdlib to fix AutoCloseable.use first, then we can follow suit. I think Arrow should still fix this behavior, just like how result { } doesn't catch fatal exceptions, even though stdlib's runCatching does.

There is prior art for this change; Scala's Using has a more complicated priority system for exceptions. Importantly, they deprioritize ControlThrowable.
I don't want to get into the nightmare of ControlThrowable vs Break too much, but I'm treating CancellationException as very similar to ControlThrowable.

History and Justification

Scala 2 treated ControlThrowable specially, similar to how we treat CancellationException. Scala 3 wanted to do away with that, and instead treats Break like any other exception (in fact it's a RuntimeException). Needless to say, I disagree with that, with one reason being the issue at hand: some exceptions may get suppressed because of a break/raise.
Further, the entire Kotlin ecosystem treats CancellationException in a special way, and even Arrow treats it as a "fatal" exception for the purpose of not letting it get caught.
Also, I think all CancellationExceptions are being used for control flow, whether they're RaiseCancellationException or not. LMK if you disagree with this assertion, at least for how it's used in this PR!

Proposed solution(s)

I've categorized this behaviour in terms of 4 "variants" of completion:

  • normal: code returns a value normally
  • return: a non-local return happened. This requires an inline function, of course
  • cancel: a CancellationException was thrown. This includes coroutine cancellation and raise
  • throw: a non-fatal exception was thrown. Fatal exceptions are orthogonal to this discussion, and are thus out of scope (I'll open another PR later to discuss them)

When discussing resourceScope, it's important to note that onRelease cannot complete with a return.
I'm assuming, also, that onRelease completing normally (i.e. with Unit) should not change the completion behavior of resourceScope at all. In other words, I assume that onRelease { } is truly a no-op.
Thus, we care about onRelease completing with a cancellation or exceptionally.

Importantly, mergeSuppressed conflates normal and return completion with just null. This behaviour can be easily changed (since one can use finally to detect non-local-returns). Just keep that in mind when reading the provided code.

This PR has 8 tests for the behavior of resourceScope and onRelease, from the 4 completion types for resourceScope times the 2 interesting completion types for onRelease. "first" and "second" in the table cells refer to which completion is preferred ("first" being the one from the column, i.e. the completion of resourceScope; second being the one from the row, i.e. the onRelease)

Current behavior

normal return cancel throw
cancel second second first first
throw second second first first

Importantly, the main point of this PR is to change that second row so that it's as follows:

normal return cancel throw
throw second second second first

In other words, we seek to prioritize non-fatal exceptions above cancellation

The tricky part, however, is how to deal with that first row.

Proposed behavior

normal return cancel throw
cancel second second second first

which results from the axiom: raise and return should be treated the same.

Axiom Justification

raise and non-local return do the same thing: they let you jump to a statically determined, enclosing block of code. They're "static exceptions", if you will, or even "tunnelled exceptions", as they're called in literature. Even Ross Tate's lifetimes wish to do something similar to Raise (non-local returns becoming objects you can pass around), with so-called "Lexical Aborts" implemented using specially-tagged exceptions (which is how Raise is implemented!).
This axiom means that the return and cancel columns should be identical.

Note that, in the proposed behavior, this means that a raise in onRelease takes precedence over a raise in resourceScope

Scala-like behavior

Scala disagrees on this point. In their view, equal-priority completions should always prefer "first". One can change this PR by swapping the order of other is CancellationException and this is CancellationException, so that this is always prioritized.
However, this results in the following behavior, which breaks the raise-equals-return axiom:

normal return cancel throw
cancel second second first first

This can be fixed. resourceScope can treat non-local-returns specially, e.g. by producing a dummy CancellationException to pass into mergeSuppressed, or a new, more-expressive mergeSuppressed can be created. Regardless, it'd have the following consistent behavior:

normal return cancel throw
cancel second first first first

However, a new annoyance arises: normal completion is treated differently from non-local returns, even when code can look rather similar. For instance, consider the following:

apply {
  resourceScope {
    onClose { somethingThatThrowsCancellation() }
    return@apply // commenting this out changes behavior
  }
}

Arrow is no stranger to such a difference; accumulate makes this distinction. Still, it can be surprising, and it's annoying to implement since mergeSuppressed can't easily accomodate it in its current form.

Thus, I went with prioritizing onRelease's cancellation/raise. I think it makes some sense, too, because of how try-finally works normally. In particular, this interesting behavior happens:

val bool = run {
  try {
    return@run false
  } finally {
    return@run true
  }
}
bool shouldBe true

I.e. finally actually takes precedence. This is the case for exceptions as well, except when using use or try-with-resources in Java. Scala simply extrapolates from Java's try-with-resources and says that all completion behavior should prioritise try over finally; however, I think it makes sense to only apply such behavior to exceptions, and thus treat non-local returns and cancellation differently.

@github-actions

Copy link
Copy Markdown
Contributor

Kover Report

File Coverage [100.00%]
arrow-libs/core/arrow-exception-utils/src/commonMain/kotlin/arrow/core/ThrowableUtils.kt 100.00%
Total Project Coverage 47.91%

@serras

serras commented Jun 22, 2026

Copy link
Copy Markdown
Member

I am divided on this one. On the one hand, the reasoning seems good. On the other hand, this seems to break the behavior of existing clients for a problem that hasn't been reported in practice. @nomisRev what are your thoughts?

@kyay10

kyay10 commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

The important part is the Exception-above-cancellation thing (the throw row). We can go with the Scala-like behaviour for now since it matches current behaviour.
Admittedly, exception-ignoring is "intended" behaviour in that we have a test for it, so users might've depended on it.
On the other hand, such code is very fragile. If it's using raise, then switching to return or similar made the exception surface. If it was about cancellation, then there can easily be cancellation races that would make the exception surface or not, even though the exception is being thrown every time, just suppressed sometimes.

@nomisRev

nomisRev commented Jun 22, 2026

Copy link
Copy Markdown
Member

I was looking at this already last night, and I am also divided... To be honest I was also divided when I originally wrote this code, but in the end I decided to be consistent with use from Kotlin Std. If I remember correct the rationale is: When a cancellation event occurs the coroutine goes into cancellation state, and the cancellation state cannot be reverted. So a subsequent exception doesn't turn it into "failed".

Diverging from this now also seems tricky because it might change behaviour in edge-cases, and I am a bit hesitant to do so unless Kotlin Std changes behavior of use.

this seems to break the behavior of existing clients for a problem that hasn't been reported in practice.

This is another really good point. I also cannot find a similar YouTrack issue for use.

Thank you @kyay10 for your extensive research. If we do not merge can we improve somewhere else i.e. e.printStackTrace(), or leveraging CoroutineExceptionHandler in the coroutine based flows?

Even kotlinx-coroutines doesn't rethrow suppressed exceptions

I do find this example quite troubling, and you make a good point here.

"One might argue that it's on stdlib to fix AutoCloseable.use first, then we can follow suit. I think Arrow should still fix this behavior, just like how result { } doesn't catch fatal exceptions, even though stdlib's runCatching does."

I need to sleep on this a couple more nights 😅 Some other opinions would be really valuable.

@kyay10

kyay10 commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

I'll have to check this, but I recently tried code like:

try { awaitCancellation() } finally { throw RuntimeException() }

And it throws the runtime exception!

You're right, though, this should probably be pushed into stdlib use, and if they change it, we're easily justified to change ours

@kyay10

kyay10 commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator Author

From conversation on Slack, it seems that if the code used RaiseCancellationException specifically, instead of general CancellationException, the behaviour change would be less surprising.
I'll make that change in the upcoming days. It'll likely be duplicated across a few modules (unless we want to move RaiseCancellationException into arrow-exception-utils), but it'll at least mitigate the issue when raise is involved.

@serras

serras commented Jun 27, 2026

Copy link
Copy Markdown
Member

I propose to fix #3926 (the simple fix, not reworking it over resourceScope) and #3925 along the way. That way we ensure that the behavior is aligned.

@kyay10

kyay10 commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator Author

To undermine my own argument: runCatching catching fatal exceptions is actually incredibly useful if you're writing low-level coroutines code. In general, Result is very specific to coroutines. use is different.

@arrow-kt arrow-kt locked and limited conversation to collaborators Jun 28, 2026
@arrow-kt arrow-kt unlocked this conversation Jun 28, 2026
@kyay10

kyay10 commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator Author

Here's the plan: I'll fix #3926 and #3925 in here as well.
I'll make some marker interface in arrow-exception-utils that RaiseCancellationException will implement, maybe marked with opt in just so we have freedom to change it later or even get rid of it (it's effectively a shared internal)

I'll still go with the proposed change, but only do it to that marker interface. In other words, behavior for cancel will be preserved, but raise will change. I'd still like opinions on whether we should go for a Scala-like approach instead, preferring first raise above second raise.

Independently, I'll raise (pun intended) the cancellation swallowing issue to stdlib. If they decide to change their behaviour there, we can easily update our code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants