Fix runtime data races, memory leak, and shutdown safety#1429
Open
bmehta001 wants to merge 6 commits intomicrosoft:mainfrom
Open
Fix runtime data races, memory leak, and shutdown safety#1429bmehta001 wants to merge 6 commits intomicrosoft:mainfrom
bmehta001 wants to merge 6 commits intomicrosoft:mainfrom
Conversation
- HttpClient_Apple: scope Cancel() to m_dataTask only instead of blanket-cancelling every task on the shared session. Fix torn read on m_requests.empty() in CancelAllRequests spin loop. - HttpClientManager: fix torn read on m_httpCallbacks.empty() in cancelAllRequests spin loop — read under lock. - HttpResponseDecoder: add missing delete ctx->httpResponse before nullptr in Abort and RetryNetwork paths (memory leak). Co-authored-by: Copilot <[email protected]>
- Only delete queued tasks after successful join (not after detach, where the thread may still access them — undefined behavior) - Replace catch(...) with std::system_error and std::exception handlers that log error code and message - Log pending queue sizes in both join and detach paths Co-authored-by: Copilot <[email protected]>
Both variables are read and written from different threads during normal upload scheduling. Declare as std::atomic to eliminate data races per the C++ memory model. Add .load() for variadic LOG_TRACE calls. Add comment explaining why unlocked stores in uploadAsync are safe. Co-authored-by: Copilot <[email protected]>
Remove LOG_TRACE from Logger destructor — it triggers a crash on iOS simulator when the recursive_mutex used by logging has already been destroyed during static destruction. Co-authored-by: Copilot <[email protected]>
3f0289c to
de46cb2
Compare
Reject new worker-thread tasks once shutdown starts so queue cleanup cannot race with late producers, and move the TPM scheduled-upload state back under a single mutex so latency/next-upload decisions stay consistent without mixed atomic and mutex access. Files changed: - lib/pal/WorkerThread.cpp - lib/tpm/TransmissionPolicyManager.cpp - lib/tpm/TransmissionPolicyManager.hpp Co-authored-by: Copilot <[email protected]>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR targets runtime correctness in the SDK by addressing thread-safety issues, shutdown safety, and a per-request memory leak in HTTP response handling.
Changes:
- Tighten HTTP request cancellation scoping and fix torn reads in request/callback tracking loops.
- Fix a
SimpleHttpResponseleak on aborted/network-failure decode paths. - Rework worker-thread shutdown behavior to avoid unsafe queue cleanup after
detach()and improve error logging. - Refactor
TransmissionPolicyManagerscheduling state synchronization (mutex + newcancelUploadTaskLocked()helper).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/tpm/TransmissionPolicyManager.hpp | Changes upload-scheduling state fields and adds a locked cancellation helper declaration. |
| lib/tpm/TransmissionPolicyManager.cpp | Moves upload-scheduling state access under a mutex and adjusts cancellation/scheduling flow. |
| lib/pal/WorkerThread.cpp | Makes shutdown/join behavior safer and improves exception handling/logging during join/detach. |
| lib/http/HttpResponseDecoder.cpp | Deletes ctx->httpResponse on Abort/RetryNetwork paths to prevent leaks. |
| lib/http/HttpClient_Apple.mm | Limits cancellation to the instance’s task and fixes a torn read in the shutdown wait loop. |
| lib/http/HttpClientManager.cpp | Fixes a torn read in the shutdown wait loop by locking around empty-check. |
| lib/api/Logger.cpp | Removes destructor logging to avoid iOS static-destruction-order crash. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Keep the scheduled-upload state mutex-based, but stop holding m_scheduledUploadMutex across DeferredCallbackHandle::Cancel so shutdown and pause paths do not block uploadAsync behind the same lock. While touching the path, use std::chrono::milliseconds for the bandwidth-controller reschedule call so ENABLE_BW_CONTROLLER builds cleanly. Co-authored-by: Copilot <[email protected]>
bmehta001
added a commit
to bmehta001/cpp_client_telemetry
that referenced
this pull request
May 1, 2026
Co-authored-by: Copilot <[email protected]>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+152
to
+160
| while (true) | ||
| { | ||
| { | ||
| LOCKGUARD(m_httpCallbacksMtx); | ||
| if (m_httpCallbacks.empty()) | ||
| break; | ||
| } | ||
| std::this_thread::yield(); | ||
| } |
Comment on lines
+114
to
+117
| if (m_shuttingDown) { | ||
| LOG_WARN("Dropping queued task %p during shutdown", item); | ||
| delete item; | ||
| return; |
Comment on lines
+98
to
+105
| // Clean up any tasks remaining in the queues after shutdown. | ||
| // Only safe after join() — the thread has fully exited. | ||
| // After detach(), the thread still needs the shutdown item | ||
| // and may still be accessing the queues. | ||
| if (joined) { | ||
| for (auto task : m_queue) { delete task; } | ||
| m_queue.clear(); | ||
| for (auto task : m_timerQueue) { delete task; } |
Comment on lines
+173
to
+180
| scheduledUploadLock.unlock(); | ||
| if (!cancelUploadTask()) | ||
| { | ||
| LOG_TRACE("Upload either hasn't been scheduled or already done."); | ||
| } | ||
| scheduledUploadLock.lock(); | ||
| if (shouldSkipScheduling()) | ||
| { |
Comment on lines
+173
to
+180
| scheduledUploadLock.unlock(); | ||
| if (!cancelUploadTask()) | ||
| { | ||
| LOG_TRACE("Upload either hasn't been scheduled or already done."); | ||
| } | ||
| scheduledUploadLock.lock(); | ||
| if (shouldSkipScheduling()) | ||
| { |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes several thread-safety and resource-management issues that affect normal SDK operation (not just reinit edge cases). Split out from #1415 per reviewer request so runtime behavior changes stay separate from CI/build/test fixes.
Fix HTTP client torn reads and scoped cancellation (3 files)
HttpClient_Apple.mm:
[session getTasksWithCompletionHandler:]fromCancel(); the async block uses the shared staticNSURLSessionand can cancel tasks belonging to otherHttpClient_Appleinstances (or a newly initialized instance after teardown). Scope cancellation to[m_dataTask cancel]only.m_requests.empty()inCancelAllRequests();std::mapwas read without holdingm_requestsMtxwhileErase/Addmutate it under the lock.HttpClientManager.cpp:
m_httpCallbacks.empty()incancelAllRequests()by reading it underm_httpCallbacksMtx.HttpResponseDecoder.cpp:
delete ctx->httpResponsebefore setting it tonullptrin theAbortandRetryNetworkpaths. TheAcceptedandRetryServerpaths already handled this correctly.Fix WorkerThread shutdown safety (1 file)
Queue()calls are rejected once teardown starts.join().detach()on repeatedJoin()calls when the worker thread is already non-joinable.Make TransmissionPolicyManager scheduling consistently mutex-guarded (2 files)
m_runningLatency,m_scheduledUploadTime, andm_isUploadScheduledunderm_scheduledUploadMutexinstead of mixing atomics with mutex-protected state.m_scheduledUploadMutexacrossDeferredCallbackHandle::Cancel(...)so stop/pause paths do not block againstuploadAsync()on the same lock.std::chrono::milliseconds(delayMs)for the bandwidth-controller reschedule call soENABLE_BW_CONTROLLERbuilds cleanly.Fix Logger destructor crash (1 file)
LOG_TRACEfromLogger::~Logger(); it triggers a static-destruction-order crash on iOS when the logging mutex has already been destroyed.