Fix Corrective-RAG never triggering web search (exact-match grade parsing) #246
douxiao398 wants to merge 1 commit into
…actually triggers

The corrective-rag and firecrawl-agent workflows decide whether to run a corrective web search with `if "no" in relevancy_results`. Each element of that list is the grader LLM's full response (lower-cased/stripped), so this only matches when a response is exactly "no". Real outputs like "no.", "No, the document is not relevant", etc. never match, so the corrective web search is silently skipped and the core Corrective-RAG mechanism never fires.

corrective-rag also filtered relevant docs with `result == "yes"`, which drops any doc whose grade isn't exactly "yes" (e.g. "yes, relevant").

Make the parsing robust:
- corrective-rag: treat a doc as relevant when its grade starts with "yes", and trigger the web search when any doc is not a clear "yes".
- firecrawl-agent: it already treats a doc as relevant when the grade contains "yes", so trigger the web search when any doc does not contain "yes" (consistent with that definition).
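The exact-match behavior described above can be reproduced in isolation. This is a minimal sketch, not the workflow code itself: the sample grader responses are taken from the examples in the commit message, and the variable names `old_trigger` and `old_relevant` are made up for illustration.

```python
# Grader responses after lower-casing and stripping, using the examples
# quoted in the commit message (illustrative data, not real model output).
relevancy_results = ["no.", "no, the document is not relevant", "yes, relevant"]

# List membership compares whole elements, not substrings, so this is
# only True if some response is exactly the string "no".
old_trigger = "no" in relevancy_results

# The old relevant-doc filter has the same exact-match problem.
old_relevant = [r for r in relevancy_results if r == "yes"]

print(old_trigger)   # False: the web search would be skipped
print(old_relevant)  # []: the "yes, relevant" document is dropped
```

Two of the three documents were graded as irrelevant, yet the old trigger evaluates to `False`, and the one relevant document is filtered out.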
📝 Walkthrough
Two corrective RAG workflow files update
Changes
Tolerant LLM Relevancy Grading
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
firecrawl-agent/workflow.py (1)
188-193: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Route empty grading results to corrective web search.

At Line 188, an empty `relevancy_results_striped` makes `any(...)` false, so the workflow skips `WebSearchEvent` when retrieval returns no documents. That leaves the user on the non-corrective path with no context.

Suggested fix
- if any("yes" not in result.lower() for result in relevancy_results_striped):
+ if not relevancy_results_striped or any(
+ "yes" not in result.lower() for result in relevancy_results_striped
+ ):
print("DEBUG: Some documents irrelevant, returning WebSearchEvent")
return WebSearchEvent(relevant_text=relevant_text)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@firecrawl-agent/workflow.py` around lines 188 - 193, The condition at line 188 using any() returns False when relevancy_results_striped is empty, causing the workflow to incorrectly route to QueryEvent instead of WebSearchEvent when there are no documents to evaluate. Modify the condition to explicitly check if relevancy_results_striped is empty or if any result does not contain "yes", ensuring that both empty retrieval results and documents with missing "yes" values trigger the WebSearchEvent for corrective web search. You can do this by adding an additional check like `if not relevancy_results_striped or any(...)` to handle the empty case separately and return WebSearchEvent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@corrective-rag/workflow.py`:
- Around line 144-147: The condition at line 144 using `any(not
result.startswith("yes") for result in relevancy_results)` evaluates to False
when relevancy_results is empty, causing the code to incorrectly return
QueryEvent instead of triggering a web search. Modify the condition to
explicitly check if relevancy_results is empty OR if any result doesn't start
with "yes", so that empty retrieval results trigger the WebSearchEvent fallback
path as intended for corrective search behavior.
📒 Files selected for processing (2)
- corrective-rag/workflow.py
- firecrawl-agent/workflow.py
if any(not result.startswith("yes") for result in relevancy_results):
    return WebSearchEvent(relevant_text=relevant_text)
else:
    return QueryEvent(relevant_text=relevant_text, search_text="")
Handle empty retrieval as a web-search trigger.
At Line 144, any(...) is false for an empty relevancy_results, so zero retrieved docs bypass corrective search and go straight to QueryEvent. That misses the fallback path exactly when retrieval fails.
Suggested fix
- if any(not result.startswith("yes") for result in relevancy_results):
+ if not relevancy_results or any(
+ not result.startswith("yes") for result in relevancy_results
+ ):
return WebSearchEvent(relevant_text=relevant_text)
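The empty-list case raised in this comment follows from how Python's built-in `any()` treats an empty iterable. A small sketch, assuming a hypothetical helper named `should_web_search` (it does not exist in the repository) that wraps the suggested condition:

```python
def should_web_search(relevancy_results):
    """Hypothetical helper: True when no documents were graded at all,
    or when at least one grade is not a clear "yes"."""
    return not relevancy_results or any(
        not result.startswith("yes") for result in relevancy_results
    )


# any() over an empty iterable is False, so the condition in the PR
# alone would route an empty retrieval to QueryEvent.
print(any(not r.startswith("yes") for r in []))     # False

print(should_web_search([]))                        # True
print(should_web_search(["yes", "yes, relevant"]))  # False
print(should_web_search(["yes", "no."]))            # True
```

Whether an empty retrieval should trigger the web search is a design decision for the workflow; the sketch only shows that the extra `not relevancy_results` check is what makes it happen.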
The Corrective-RAG demos never actually run their corrective web search, because the trigger condition can't match real LLM output.

In `corrective-rag/workflow.py` (and the same logic in `firecrawl-agent/workflow.py`), each retrieved document is graded by an LLM and the full response is stored. The decision to do a web search is then made with `if "no" in relevancy_results`.

`relevancy_results` is a list of full grader responses, so `"no" in relevancy_results` is a membership test that is only true when a response is exactly the string `"no"`. The grading prompt asks for a binary `yes`/`no`, but in practice the model replies with things like `"no."`, `"No, the document is not relevant"`, etc. None of those equal `"no"`, so the condition is almost never true and the corrective web search is silently skipped: the core Corrective-RAG behavior doesn't fire.

`corrective-rag` has the same issue on the relevant-doc filter (`result == "yes"`), which drops any document whose grade isn't exactly `"yes"` (e.g. `"yes, relevant"`).

Fix: make the grade parsing robust instead of exact-match:
- `corrective-rag`: a document is treated as relevant when its grade starts with `"yes"`, and the web search triggers when any document is not a clear `"yes"`.
- `firecrawl-agent`: it already treated a document as relevant when its grade contains `"yes"`, so I made the trigger consistent with that: web search when any document's grade does not contain `"yes"`. (It had switched the relevant-doc filter to fuzzy matching but left the `"no" in ...` trigger as exact, so the two were inconsistent.)

Easy way to see it before the fix: grade an irrelevant document so the model answers `"No, ..."`. The old code returns `QueryEvent` (no web search) instead of `WebSearchEvent`.
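The prefix-based routing described for `corrective-rag` can be sketched as a standalone function. Everything here is an assumption made for illustration: the two dataclasses are stand-ins for the workflow's real event classes, `route` is a hypothetical name, and joining relevant documents with newlines is a guess at how `relevant_text` is built.

```python
from dataclasses import dataclass


@dataclass
class WebSearchEvent:
    """Stand-in for the workflow's event class (assumed shape)."""
    relevant_text: str


@dataclass
class QueryEvent:
    """Stand-in for the workflow's event class (assumed shape)."""
    relevant_text: str
    search_text: str


def route(documents, relevancy_results):
    """Hypothetical routing step using prefix matching on the grades."""
    grades = [r.lower().strip() for r in relevancy_results]
    # Keep documents whose grade starts with "yes" ("yes", "yes, relevant", ...).
    relevant_text = "\n".join(
        doc for doc, grade in zip(documents, grades) if grade.startswith("yes")
    )
    # Any grade that is not a clear "yes" triggers the corrective search.
    if any(not grade.startswith("yes") for grade in grades):
        return WebSearchEvent(relevant_text=relevant_text)
    return QueryEvent(relevant_text=relevant_text, search_text="")


event = route(
    ["doc about RAG", "doc about cooking"],
    ["Yes, relevant", "No, the document is not relevant"],
)
print(type(event).__name__)  # WebSearchEvent
print(event.relevant_text)   # doc about RAG
```

With the old exact-match checks, the same input would have produced a `QueryEvent` with empty `relevant_text`, since neither grade equals `"yes"` or `"no"`.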