fix: allow text files with non-multimodal models (#5137)#5138
Open
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
Open
fix: allow text files with non-multimodal models (#5137)#5138devin-ai-integration[bot] wants to merge 2 commits intomainfrom
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
Conversation
TextFiles passed via input_files incorrectly triggered a 'Model does not support multimodal input' error for non-vision-capable models. Text files are now inlined as text content in the message instead of being rejected. Changes: - base_llm.py: Rewrite _process_message_files to distinguish text files from binary files; add _is_text_file helper - llm.py: Apply same text-file inlining logic in both sync and async _process_message_files methods - crew.py: Recognize text file MIME types as auto-injectable so they don't require the read_file tool - task.py: Same text-file auto-injection logic in prompt method - tests: Add 17 tests covering text file inlining, image rejection, mixed files, _is_text_file helper, and edge cases Co-Authored-By: João <joao@crewai.com>
Contributor
Author
|
Prompt hidden (unlisted session) |
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Co-Authored-By: João <joao@crewai.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #5137 —
TextFileobjects passed viainput_filesincorrectly raised"Model does not support multimodal input"for non-vision models (e.g.gpt-3.5-turbo,claude-sonnet-4.6).The root cause was that
_process_message_filesrejected all files when the model wasn't multimodal, without distinguishing text files from binary files (images, PDFs, audio, video).Fix: Text files are now inlined as plain text content in the message body instead of being rejected. Non-text files (images, PDFs, etc.) still correctly raise
ValueErrorfor non-multimodal models.Files changed:
base_llm.py— Rewrote_process_message_filesto separate text vs non-text files; added_is_text_file()static helperllm.py— Same inlining logic applied to both sync and async_process_message_filescrew.py/task.py— Updatedis_auto_injected()to recognize text MIME types so text files don't unnecessarily require theread_filetooltest_multimodal.py— 17 new testsReview & Testing Checklist for Human
base_llm.py:648-703,llm.py:2056-2110,llm.py:2155-2209): Verify the logic is identical in all three copies. The async path (_aprocess_message_files) has no dedicated test coverage — consider whether that's acceptable.ValueErroris raised for the image. Confirm this is the desired behavior vs. raising immediately without modifying the message._is_text_file()inbase_llm.pyandtext_prefixestuples increw.py/task.pydefine the same set of text MIME types independently. A drift between these lists could cause inconsistent behavior. Consider whether a shared constant is warranted.except Exceptiononread_text(): If a TextFile fails to read, it silently falls into the non-text bucket and may trigger theValueError. Verify this fallback is appropriate vs. surfacing the read error..txtor.jsonfile viainput_filesto a crew using a non-multimodal model (e.g.gpt-3.5-turbo) and confirm the task completes successfully with the file content visible in the prompt.Notes
test_multimodal.py(e.g.TestLiteLLMMultimodal::test_format_multimodal_content_image) are unrelated to this change — they fail onmainas well due to the minimal test PNG not being processed by the currentcrewai_fileslibrary.Link to Devin session: https://app.devin.ai/sessions/e46e8669f7a5459380d029d403270307