vmm: cpu: remove Mutex around Vec<VcpuState> to prevent deadlock #152

Closed

phip1611 wants to merge 2 commits into cyberus-technology:gardenlinux from phip1611:fix-cpu-deadlock

Conversation

@phip1611
Member

@phip1611 phip1611 commented Apr 28, 2026

This once and for all resolves the deadlock that was only partially
addressed in cloud-hypervisor#7990 (dfe78a3 and following) and
cloud-hypervisor#8092 (2482f56).

I messed up in #136 by introducing a new deadlock. This PR solves the situation cleanly; details are in the commit messages.

Hints for Reviewers

Closes https://github.com/cobaltcore-dev/cobaltcore/issues/474

@phip1611 phip1611 self-assigned this Apr 28, 2026
This once and for all resolves the deadlock that was only partially
addressed in cloud-hypervisor#7990 (dfe78a3 and following) and
cloud-hypervisor#8092 (2482f56). The first one laid the important basic
plumbing but actually just narrowed down the deadlock. The second PR
tightened the new race window further, but never resolved it.

This commit removes the Mutex around `Vec<VcpuState>`, which was
recently introduced in cloud-hypervisor#7990 (dfe78a3).

The deadlock was:
- CpuManager::pause() -> VcpuState::wait_until_signal_acknowledged()
- the Mutex is held
- the vCPU could not do MMIO on the AcpiCpuHotplugController and therefore
  never acknowledged the signal.

The Mutex around `Vec<VcpuState>` was introduced because:

- CpuManager sometimes needs access to all vCPU states (the whole vec)
  or individual ones
- AcpiCpuHotplugController needs access to individual vCPU states

The mutating operations that required the Mutex in the first place were:

- setting `handle: Option<std::thread::JoinHandle>`
  - from `None` -> `Some` in `CpuManager::start_vcpu()`
  - from `Some` -> `None`:
    - in `AcpiCpuHotplugController::remove_vcpu()`
    - in `CpuManager::shutdown()`
- the two bools `inserting` and `removing`, which were guest-visible and
  used to check whether the guest ACKed CPU hotplug events

I resolved this by moving the Mutex inward so that it only wraps the
`Option` holding the thread handle, and by switching the two raw bools to
`AtomicBool`. After a comprehensive analysis, I am confident this is a
proper and clean solution that doesn't introduce any new (critical) races.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
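
As a rough illustration of the layout described in the commit message, here is a minimal sketch. Field and method names are assumptions for illustration, not the actual cloud-hypervisor definitions:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::thread::JoinHandle;

// Hypothetical per-vCPU state layout (illustrative names only).
struct VcpuState {
    // Only the thread handle needs exclusive access, so the Mutex moves
    // here instead of wrapping the whole Vec<VcpuState>.
    handle: Mutex<Option<JoinHandle<()>>>,
    // Guest-visible hotplug flags become lock-free atomics.
    inserting: AtomicBool,
    removing: AtomicBool,
}

impl VcpuState {
    // "Active" means a vCPU thread handle is currently installed.
    fn active(&self) -> bool {
        self.handle.lock().unwrap().is_some()
    }

    // Whether the guest has ACKed the hot-add (assumption: the guest clears
    // `inserting` via an MMIO write to the hotplug controller). No outer
    // lock is needed to observe this.
    fn insert_acknowledged(&self) -> bool {
        !self.inserting.load(Ordering::Acquire)
    }
}
```
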
After removing the outer `vcpu_states` mutex, a slot can briefly look
inactive before its teardown has actually finished.

During ACPI eject, `remove_vcpu()` joins the thread and therefore drops
the handle before it clears `kill` and `pending_removal`. That makes
`active()` flip to false too early.

`check_pending_removed_vcpu()` used `active() && pending_removal`, so a
concurrent resize could miss an in-flight hot-remove, treat the slot as
reusable, and start a new vCPU while the old teardown still had `kill`
set.

Fix this by treating `pending_removal` as the authoritative "do not
reuse yet" flag and documenting that `active()` is not sufficient
here.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
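
A minimal sketch of the reuse check described above, again with hypothetical names rather than the real fields and functions:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::thread::JoinHandle;

// Illustrative per-vCPU state; names are assumptions, not the real fields.
struct VcpuState {
    handle: Mutex<Option<JoinHandle<()>>>,
    pending_removal: AtomicBool,
}

impl VcpuState {
    fn active(&self) -> bool {
        self.handle.lock().unwrap().is_some()
    }

    // Old check: misses a slot whose thread is already joined (active() is
    // false) while its removal has not finished, so a resize could reuse it.
    fn is_pending_removed_old(&self) -> bool {
        self.active() && self.pending_removal.load(Ordering::Acquire)
    }

    // New check: `pending_removal` alone is authoritative; the slot stays
    // "do not reuse yet" until the teardown clears the flag.
    fn is_pending_removed_new(&self) -> bool {
        self.pending_removal.load(Ordering::Acquire)
    }
}
```
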
@phip1611 phip1611 marked this pull request as ready for review April 28, 2026 13:52
Comment thread on vmm/src/cpu.rs
Comment on lines +739 to +741
if !self.active() {
    return Ok(());
}

active() only checks whether handle is Some. So if remove_vcpu() is called and reaches join_thread(), another caller of signal_vcpus() could see handle == None and stop waiting immediately. But the vCPU may still be running (i.e. join_thread() hasn't finished joining yet).

I don't think it is enough if active just checks whether handle is Some. I guess a safe fix would be to introduce yet another atomic to track whether a thread has exited.
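
A minimal sketch of what such an extra atomic could look like, assuming hypothetical names; the real signal/teardown plumbing is omitted:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Illustrative only: a dedicated flag that the vCPU thread sets as its very
// last action, so callers can tell "handle already taken for joining" apart
// from "thread has really exited".
fn spawn_vcpu(exited: Arc<AtomicBool>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // ... vCPU run loop ...
        exited.store(true, Ordering::Release); // final statement before exit
    })
}

fn has_exited(exited: &AtomicBool) -> bool {
    exited.load(Ordering::Acquire)
}
```
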

Member Author

Let me think this through and get back to you again

Member Author

@olivereanderson can you please help me verify whether these race conditions are problematic? I'm also investigating.


Yes, I will take an additional look now.

Member Author

Hm, I have a feeling that I should take one step back: RwLock<Vec<VcpuState>>. Then callers that need exclusive access to the whole thing can have it, whereas on the critical deadlock path I make sure both parties just use .read() and operate on a shared atomic bool.


I need to think a bit more about this. I am wondering whether this would be a good opportunity to switch the atomic bools out with a single AtomicU64. Let me get back to you on that.

Member Author

this would be a good opportunity to switch the atomic bools out with a single AtomicU64

I think we should not do such drive-by changes, although I agree that the many atomic bools are not optimal.

I was experimenting with RwLock + AtomicBools and I noticed I was thinking the wrong way. Instead of Arc<RwLock<Vec<VcpuState>>> I'll try my next design with Arc<Box<[RwLock<VcpuState>]>>, which seems to be exactly what I need and matches how the code mutates vCPU states.
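
For illustration, a minimal sketch of how such a per-slot layout could be allocated; the names and the fixed slot count are assumptions, not the final design:

```rust
use std::sync::{Arc, RwLock};

// Illustrative only: VcpuState stands in for the real per-vCPU state.
#[derive(Default)]
struct VcpuState;

// The number of vCPU slots is fixed up front, so the container itself needs
// no lock; each slot gets its own RwLock instead of one Mutex around the Vec.
type VcpuStates = Arc<Box<[RwLock<VcpuState>]>>;

fn allocate_states(max_vcpus: usize) -> VcpuStates {
    let states: Vec<RwLock<VcpuState>> = (0..max_vcpus)
        .map(|_| RwLock::new(VcpuState::default()))
        .collect();
    Arc::new(states.into_boxed_slice())
}
```
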

@olivereanderson

olivereanderson commented Apr 30, 2026

I am currently reviewing this, but it might take a couple of days since we need to get it right this time as you mentioned 👍

@olivereanderson left a comment

Great work!

I cannot find any more deadlocks with this patch set!

@olivereanderson

So from what I can tell we still have potential for a deadlock in AcpiCpuHotplugController::remove_vcpu: imagine pause is called just before we get to join_thread in AcpiCpuHotplugController::remove_vcpu; then it could be that the thread is stuck sleeping forever and never completes, so join_thread ends up hanging forever.

@olivereanderson

I also don't see where the run_interrupted flag is cleared after we kill a vCPU. Is it even a good idea to reuse the previous state when starting up a new vCPU? (I guess there must be a reason, but it seems like a source of bugs as well.)

@phip1611
Member Author

phip1611 commented May 13, 2026

I also don't see where the run_interrupted flag is cleared after we kill a vCPU. Is it even a good idea to reuse the previous state when starting up a new vCPU? (I guess there must be a reason, but it seems like a source of bugs as well.)

I think it is just reinitialized when we create a new vCPU. That should be a fix for a different PR. Or am I missing something?

@olivereanderson

I also don't see where the run_interrupted flag is cleared after we kill a vCPU. Is it even a good idea to reuse the previous state when starting up a new vCPU? (I guess there must be a reason, but it seems like a source of bugs as well.)

I think it is just reinitialized when we create a new vCPU. That should be a fix for a different PR. Or am I missing something?

It should likely be a fix for a different PR. I just brought it up because you mentioned in the PR description that this "must be the solution" (to all vCPU related deadlocks presumably?).

@phip1611
Member Author

mentioned in the PR description that this "must be the solution" (to all vCPU related deadlocks presumably?).

Thanks for bringing this up! That was a broader statement than intended. What I wanted to say is: this should fix the deadlock between pausing a vCPU and a vCPU's MMIO access to the AcpiCpuHotplugController.

@phip1611
Member Author

phip1611 commented May 13, 2026

Thanks everyone! You have all been very helpful, especially @amphi and @olivereanderson. I think I now have a solution that works. Let's continue the discussion upstream:

cloud-hypervisor#8210

@phip1611 phip1611 closed this May 13, 2026