
Fix LLM callback isolation without serializing requests#4252

Open
VedantMadane wants to merge 3 commits into crewAIInc:main from VedantMadane:fix/llm-callbacks-no-global-mutation

Conversation


@VedantMadane VedantMadane commented Jan 19, 2026

This is a follow-up to #4218 (auto-closed by bot) addressing the same race in LLM callback handling without holding a global lock across the network call.

What changed

  • Stop mutating LiteLLM global callback lists for per-request callbacks.
  • Pass callbacks via the request params ("callbacks") and continue to invoke token usage callbacks from CrewAI response handlers.
  • Make test_llm_callback_replacement deterministic by mocking litellm.completion (removes sleep/heisenbug).
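The per-request pattern described above can be sketched as follows. This is a minimal, hypothetical illustration (the function and key names below are illustrative, not CrewAI's actual `LLM.call` plumbing): callbacks travel inside the request params rather than being appended to a process-global registry, so concurrent requests cannot observe each other's hooks.

```python
# Sketch: attach callbacks to the per-request params instead of mutating a
# global callback list. Names here are illustrative assumptions, not the
# actual CrewAI/LiteLLM API surface.

def build_request_params(model, messages, callbacks=None):
    """Build a params dict carrying its own callbacks; no global state."""
    params = {"model": model, "messages": messages}
    if callbacks:
        # Per-request "callbacks" key, as described in the PR summary.
        params["callbacks"] = list(callbacks)  # copy: no shared mutable state
    return params

received = []
params = build_request_params(
    "gpt-4o",
    [{"role": "user", "content": "hi"}],
    callbacks=[received.append],
)
assert params["callbacks"] == [received.append]
```

Because each call builds a fresh params dict, there is nothing to "restore" afterward and no window in which another request can see the wrong callbacks.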

Why

The approach in #4218 used a class-level lock held across the entire LLM request, which can serialize all concurrent agent calls. This change keeps requests concurrent while still ensuring callback isolation.
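The race being fixed can be reproduced without LiteLLM at all. The sketch below (a stand-alone illustration, not the project's code) shows why global callback mutation breaks down: when two overlapping requests each install a callback on a shared global list, each request's response handler fires both callbacks.

```python
# Minimal repro of the race: callbacks on a process-global list leak between
# overlapping requests. A barrier forces deterministic overlap.
import threading

GLOBAL_CALLBACKS = []              # stand-in for LiteLLM's global list
barrier = threading.Barrier(2)     # forces the two requests to overlap
seen_by_a, seen_by_b = [], []

def fake_request(tag, sink):
    GLOBAL_CALLBACKS.append(sink.append)  # "install" callback globally
    barrier.wait()                        # both requests now in flight
    for cb in list(GLOBAL_CALLBACKS):     # response handler fires callbacks
        cb(tag)
    barrier.wait()                        # both fire before either restores
    GLOBAL_CALLBACKS.remove(sink.append)

t1 = threading.Thread(target=fake_request, args=("A", seen_by_a))
t2 = threading.Thread(target=fake_request, args=("B", seen_by_b))
t1.start(); t2.start(); t1.join(); t2.join()

# Each request observed BOTH tags: cross-request contamination.
assert sorted(seen_by_a) == ["A", "B"]
assert sorted(seen_by_b) == ["A", "B"]
```

Holding a lock for the whole request (the #4218 approach) prevents this overlap, but only by making the two requests run one after the other.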

Fixes #4214.


Note

High Risk
Touches core LLM.call/LLM.acall request plumbing and callback behavior, which can regress token tracking and integrations under concurrency. The async error-handling block appears to contain duplicated/stray code that could break acall at runtime.

Overview
Fixes callback race conditions across concurrent LiteLLM calls by stopping CrewAI from mutating LiteLLM's global callback lists and instead passing callbacks on the per-request params for both sync and async code paths.

Removes the LLM.set_callbacks global-deduplication helper and updates tests: makes test_llm_callback_replacement deterministic by mocking litellm.completion, and adds a new threaded concurrency test to assert callback and token-usage isolation between simultaneous requests.
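The threaded isolation test mentioned above can be sketched roughly as follows. This is an illustrative outline only (the real test exercises `LLM.call` against a mocked `litellm.completion`; the names here are assumptions): with per-request callbacks, each thread should only ever see its own callback fire.

```python
# Sketch of a threaded callback-isolation test. fake_completion stands in
# for a mocked litellm.completion; it fires only the callbacks carried in
# the per-request params, never a global list.
import threading

def fake_completion(params):
    for cb in params.get("callbacks", []):
        cb(params["tag"])

def worker(tag, results):
    sink = []  # per-request sink; nothing shared between threads
    fake_completion({"tag": tag, "callbacks": [sink.append]})
    results[tag] = sink

results = {}
threads = [
    threading.Thread(target=worker, args=(tag, results))
    for tag in ("A", "B", "C", "D")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each request saw exactly its own tag: no cross-contamination.
assert all(results[tag] == [tag] for tag in results)
```

Mocking the completion call (rather than sleeping and hoping for overlap) is what makes the assertion deterministic, which is the same reason the PR mocks `litellm.completion` in `test_llm_callback_replacement`.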

Written by Cursor Bugbot for commit 877d021. This will update automatically on new commits.

@VedantMadane (Author)

Not covered in this PR description:

  1. The lock-scoping alternative (save the previous global callbacks, set new ones, perform the request, then restore) and why we avoided it.
  2. Context-local callback isolation using contextvars or thread-local dispatch.
  3. A true concurrency regression test (multi-threaded or async) that proves no cross-contamination under parallel calls.

If you prefer, I can add a follow-up commit that documents these options or adds a concurrency-focused test.

@VedantMadane VedantMadane force-pushed the fix/llm-callbacks-no-global-mutation branch from 31fdc55 to 35483b6 Compare February 10, 2026 11:02

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


# Conflicts:
#	lib/crewai/src/crewai/llm.py
set_callbacks mutated LiteLLM's global callback lists, the very pattern this PR removes. Its only call sites were deleted; no other callers exist. It was removed to avoid accidental re-introduction of the global-mutation pattern.

Made-with: Cursor
@VedantMadane VedantMadane force-pushed the fix/llm-callbacks-no-global-mutation branch from 877d021 to 003f5a3 Compare March 27, 2026 10:49

Development

Successfully merging this pull request may close these issues.

[BUG] Hidden race condition in LLM callback system causing test failures
