vmm: cpu: remove Mutex around Vec<VcpuState> to prevent deadlock #152
phip1611 wants to merge 2 commits into
Conversation
This once and for all resolves the deadlock that was only partially addressed in cloud-hypervisor#7990 (dfe78a3 and following) and cloud-hypervisor#8092 (2482f56). The first one laid the important basic plumbing but actually just narrowed down the deadlock. The second PR tightened the new race window further, but never resolved it.

This commit removes the Mutex around `Vec<VcpuState>`, which was recently introduced in cloud-hypervisor#7990 (dfe78a3).

The deadlock was:
- CpuManager::pause() -> VcpuState::wait_until_signal_acknowledged()
- the Mutex is held
- the vCPU could not do MMIO on the AcpiCpuHotplugController and therefore never acknowledged the signal

The Mutex around `Vec<VcpuState>` was introduced because:
- CpuManager sometimes needs access to all vCPU states (the whole vec) or individual ones
- AcpiCpuHotplugController needs access to individual vCPU states

The mutating operations that caused the Mutex in the first place were:
- setting `handle: Option<std::thread::JoinHandle>`
  - from `None` -> `Some` in `CpuManager::start_vcpu()`
  - from `Some` -> `None`:
    - in `AcpiCpuHotplugController::remove_vcpu()`
    - in `CpuManager::shutdown()`
- the two bools `inserting` and `removing`, which are guest-visible and used to check whether the guest ACKed CPU hotplug events

I resolved this by moving the Mutex to the Option holding the thread handle and switching the two raw bools to `AtomicBool`. After comprehensive analysis, I am confident this is a proper and clean solution that doesn't introduce any new (critical) races.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
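For illustration, a minimal sketch of what the per-vCPU state could look like after this change, assuming simplified field names and types rather than the actual cloud-hypervisor definitions:

```rust
use std::sync::atomic::AtomicBool;
use std::sync::Mutex;
use std::thread::JoinHandle;

// Hypothetical, simplified shape of the per-vCPU state after this change:
// only the thread handle stays behind a Mutex, while the guest-visible
// hotplug flags become lock-free atomics. Names are assumptions based on
// the commit message above.
struct VcpuState {
    // Written in CpuManager::start_vcpu() (None -> Some) and cleared in
    // AcpiCpuHotplugController::remove_vcpu() / CpuManager::shutdown().
    handle: Mutex<Option<JoinHandle<()>>>,
    // Guest-visible hotplug flags, previously plain bools guarded by the
    // outer Mutex around Vec<VcpuState>.
    inserting: AtomicBool,
    removing: AtomicBool,
}
```

With this layout, pause() no longer has to hold a lock that covers the whole vector while waiting for the vCPU, since the flags can be read and written without taking any Mutex.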
After removing the outer `vcpu_states` mutex, a slot can briefly look inactive before its teardown has actually finished. During ACPI eject, `remove_vcpu()` joins the thread and therefore drops the handle before it clears `kill` and `pending_removal`. That makes `active()` flip to false too early.

`check_pending_removed_vcpu()` used `active() && pending_removal`, so a concurrent resize could miss an in-flight hot-remove, treat the slot as reusable, and start a new vCPU while the old teardown still had `kill` set. Fix this by treating `pending_removal` as the authoritative "do not reuse yet" flag and documenting that `active()` is not sufficient here.

On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
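A hedged sketch of the reuse check described above, with assumed field and method names rather than the actual cloud-hypervisor code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::thread::JoinHandle;

// Illustrative only: field names mirror the commit message, not the real code.
struct VcpuState {
    handle: Mutex<Option<JoinHandle<()>>>,
    kill: AtomicBool,
    pending_removal: AtomicBool,
}

impl VcpuState {
    fn active(&self) -> bool {
        // Flips to false as soon as the eject path has joined the thread,
        // which can happen before `kill`/`pending_removal` are cleared.
        self.handle.lock().unwrap().is_some()
    }

    fn reusable(&self) -> bool {
        // `pending_removal` is the authoritative "do not reuse yet" flag:
        // gating this on `active()` would let a concurrent resize grab the
        // slot while teardown is still in flight.
        !self.pending_removal.load(Ordering::Acquire)
    }
}
```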
Force-pushed from c83afc7 to 6b98627.
```rust
if !self.active() {
    return Ok(());
}
```
active() only checks whether handle is Some. So if remove_vcpu() is called and reaches join_thread(), another caller of signal_vcpus() could see handle == None and stop waiting immediately. But the vCPU may still be running (i.e. join_thread() hasn't finished joining yet).
I don't think it is enough if active just checks whether handle is Some. I guess a safe fix would be to introduce yet another atomic to track whether a thread has exited.
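For illustration, the suggested extra atomic could look roughly like this; the names are made up and this is only a sketch of the idea, not the actual code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical helper: the flag is set only by the vCPU thread itself, right
// before it returns, so waiters do not have to rely on `handle` still being
// Some while remove_vcpu() is in the middle of joining the thread.
fn spawn_vcpu(exited: Arc<AtomicBool>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // ... vCPU run loop would go here ...
        exited.store(true, Ordering::Release); // last thing the thread does
    })
}

fn vcpu_still_running(exited: &AtomicBool) -> bool {
    // Safe to call even after the handle has already been taken for joining.
    !exited.load(Ordering::Acquire)
}
```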
Let me think this through and get back to you again
@olivereanderson can you please help me verify whether these race conditions are problematic? I'm also investigating.
Yes, I will take an additional look now.
Hm, I have a feeling I should take a step back: RwLock<Vec<VcpuState>>. Callers that need exclusive access to the whole vector can have that, whereas on the critical deadlock path I make sure both parties only use .read() and operate on a shared atomic bool.
I need to think a bit more about this. I am wondering whether this would be a good opportunity to switch the atomic bools out with a single AtomicU64. Let me get back to you on that.
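Purely as an illustration of that suggestion, and not adopted in this PR, packing the per-vCPU flags into one AtomicU64 with bit masks could look like:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical flag bits; names are assumptions, not cloud-hypervisor code.
const INSERTING: u64 = 1 << 0;
const REMOVING: u64 = 1 << 1;
const PENDING_REMOVAL: u64 = 1 << 2;

fn set_flag(flags: &AtomicU64, mask: u64) {
    flags.fetch_or(mask, Ordering::AcqRel);
}

fn clear_flag(flags: &AtomicU64, mask: u64) {
    flags.fetch_and(!mask, Ordering::AcqRel);
}

fn flag_set(flags: &AtomicU64, mask: u64) -> bool {
    flags.load(Ordering::Acquire) & mask != 0
}
```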
> this would be a good opportunity to switch the atomic bools out with a single AtomicU64

I think we should not do such drive-by changes - although I agree that the many atomic bools are not optimal.
I was experimenting with RwLock + AtomicBools and I noticed I was thinking about it the wrong way. Instead of Arc<RwLock<Vec<VcpuState>>> I'll try my next design with Arc<Box<[RwLock<VcpuState>]>>, which seems to be exactly what I need and matches how the code mutates vCPU states.
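A rough sketch of that layout, assuming simplified types rather than the actual cloud-hypervisor code:

```rust
use std::sync::{Arc, RwLock};
use std::thread::JoinHandle;

// Illustrative only: one RwLock per slot instead of one lock around the whole
// Vec. The slice length is fixed at the configured maximum vCPU count, so the
// collection itself never needs a lock; only individual entries do.
struct VcpuState {
    handle: Option<JoinHandle<()>>,
}

type VcpuStates = Arc<Box<[RwLock<VcpuState>]>>;

fn new_vcpu_states(max_vcpus: usize) -> VcpuStates {
    Arc::new(
        (0..max_vcpus)
            .map(|_| RwLock::new(VcpuState { handle: None }))
            .collect::<Vec<_>>()
            .into_boxed_slice(),
    )
}
```

With one lock per slot, the hotplug controller and the CpuManager each lock only the entry they touch, so pausing one vCPU no longer serializes against another vCPU's MMIO path through a single outer lock.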
I am currently reviewing this, but it might take a couple of days since we need to get it right this time, as you mentioned 👍
olivereanderson left a comment:
Great work!
I cannot find any more deadlocks with this patch set!
So from what I can tell we still have potential for a deadlock in
I also don't see where the
I think it is just reinitialized when we create a new vcpu. Should be a fix for a different PR. Or am I missing something?
It should likely be a fix for a different PR. I just brought it up because you mentioned in the PR description that this "must be the solution" (to all vCPU related deadlocks presumably?).
Thanks for bringing this up! That was a broader statement than intended. What I meant to say is: this should fix the deadlock between pausing a vCPU and that vCPU's MMIO access to the AcpiCpuHotplugController.
Thanks everyone! You have all been very helpful, especially @amphi and @olivereanderson. I think I now have a solution that works. Let's continue the discussion upstream:
This once and for all resolves the deadlock that was only partially addressed in cloud-hypervisor#7990 (dfe78a3 and following) and cloud-hypervisor#8092 (2482f56).
I messed up in #136 by introducing a new deadlock. This resolves the situation cleanly; details are in the commit messages.
Hints for Reviewers
Closes https://github.com/cobaltcore-dev/cobaltcore/issues/474