
fix: allow text files with non-multimodal models (#5137) #5138

Open
devin-ai-integration[bot] wants to merge 2 commits into main from devin/1774600272-fix-text-file-multimodal-check


@devin-ai-integration
Contributor

Summary

Fixes #5137. TextFile objects passed via input_files incorrectly raised "Model does not support multimodal input" for non-vision models (e.g. gpt-3.5-turbo, claude-sonnet-4.6).

The root cause was that _process_message_files rejected all files when the model wasn't multimodal, without distinguishing text files from binary files (images, PDFs, audio, video).

Fix: Text files are now inlined as plain text content in the message body instead of being rejected. Non-text files (images, PDFs, etc.) still correctly raise ValueError for non-multimodal models.
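
The behavior described above can be illustrated with a minimal sketch. It is not the code from this PR: `StubFile`, `TEXT_MIME_TYPES`, `is_text_file`, and `inline_text_files` are hypothetical names, and the MIME list is an assumed example rather than the list in `_is_text_file()`. The sketch returns a new message instead of mutating the original.

```python
from dataclasses import dataclass

# Assumed example set; the real list lives in _is_text_file() and may differ.
TEXT_MIME_TYPES = {"application/json", "application/xml"}


@dataclass
class StubFile:
    """Hypothetical stand-in for a file input object."""
    content_type: str
    data: bytes

    def read_text(self) -> str:
        return self.data.decode("utf-8")


def is_text_file(f: StubFile) -> bool:
    """Treat text/* and a few structured-text types as inlinable."""
    return f.content_type.startswith("text/") or f.content_type in TEXT_MIME_TYPES


def inline_text_files(message: dict, files: dict, multimodal: bool) -> dict:
    """Return a copy of `message` with text files appended to its content."""
    if multimodal or not files:
        # A multimodal model would get provider-specific formatting (omitted here).
        return dict(message)
    binary = [name for name, f in files.items() if not is_text_file(f)]
    if binary:
        # Non-text files are still rejected for non-multimodal models.
        raise ValueError(f"Model does not support multimodal input: {binary}")
    parts = [message.get("content", "")]
    for name, f in files.items():
        parts.append(f"--- {name} ---\n{f.read_text()}")
    return {**message, "content": "\n\n".join(p for p in parts if p)}
```

With a `text/plain` file the content is appended under a name header; with an `image/png` file the call raises `ValueError`.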

Files changed:

  • base_llm.py — Rewrote _process_message_files to separate text vs non-text files; added _is_text_file() static helper
  • llm.py — Same inlining logic applied to both sync and async _process_message_files
  • crew.py / task.py — Updated is_auto_injected() to recognize text MIME types so text files don't unnecessarily require the read_file tool
  • test_multimodal.py — 17 new tests
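
As a rough illustration of the crew.py / task.py item: a file only needs the read_file tool when its content cannot be placed in the prompt directly. The sketch below is an assumption about the shape of that decision, not the actual `is_auto_injected()` signature; `TEXT_PREFIXES`, `MULTIMODAL_PREFIXES`, and both parameters are hypothetical.

```python
# Assumed prefixes for illustration; the PR's text_prefixes tuples may differ.
TEXT_PREFIXES = ("text/", "application/json", "application/xml")
MULTIMODAL_PREFIXES = ("image/", "audio/", "video/", "application/pdf")


def is_auto_injected(content_type: str, model_is_multimodal: bool) -> bool:
    """True if the file can go into the prompt without the read_file tool."""
    if content_type.startswith(TEXT_PREFIXES):
        # Text can be inlined for any model, multimodal or not.
        return True
    # Binary formats are injected only when the model accepts them natively.
    return model_is_multimodal and content_type.startswith(MULTIMODAL_PREFIXES)
```

Under these assumptions, `is_auto_injected("text/plain", False)` is true, while `is_auto_injected("image/png", False)` is false.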

Review & Testing Checklist for Human

  • Duplicated inlining logic across 3 locations (base_llm.py:648-703, llm.py:2056-2110, llm.py:2155-2209): Verify the logic is identical in all three copies. The async path (_aprocess_message_files) has no dedicated test coverage — consider whether that's acceptable.
  • Mixed text+image behavior: When both a TextFile and an ImageFile are attached to a non-multimodal model, the text gets inlined as a side-effect before the ValueError is raised for the image. Confirm this is the desired behavior vs. raising immediately without modifying the message.
  • MIME type lists not shared: _is_text_file() in base_llm.py and text_prefixes tuples in crew.py/task.py define the same set of text MIME types independently. A drift between these lists could cause inconsistent behavior. Consider whether a shared constant is warranted.
  • Bare except Exception on read_text(): If a TextFile fails to read, it silently falls into the non-text bucket and may trigger the ValueError. Verify this fallback is appropriate vs. surfacing the read error.
  • Manual E2E test: Pass a .txt or .json file via input_files to a crew using a non-multimodal model (e.g. gpt-3.5-turbo) and confirm the task completes successfully with the file content visible in the prompt.
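
Two of the checklist questions (mixed text+image behavior and the bare `except Exception`) concern ordering and error handling. One possible alternative, sketched here under assumptions, validates every file before touching the message and lets read errors propagate. `StubFile`, `TEXT_PREFIXES`, and `attach_files` are hypothetical names, not identifiers from this PR.

```python
from dataclasses import dataclass

# Assumed list; a single shared constant would avoid drift between modules.
TEXT_PREFIXES = ("text/", "application/json")


@dataclass
class StubFile:
    """Hypothetical stand-in for a file input object."""
    content_type: str
    data: bytes

    def read_text(self) -> str:
        # A decode failure raises UnicodeDecodeError instead of being swallowed.
        return self.data.decode("utf-8")


def attach_files(message: dict, files: dict) -> None:
    """Inline text files into `message` in place for a non-multimodal model."""
    # Step 1: reject unsupported files before any mutation.
    unsupported = [
        name for name, f in files.items()
        if not f.content_type.startswith(TEXT_PREFIXES)
    ]
    if unsupported:
        raise ValueError(f"Model does not support multimodal input: {unsupported}")
    # Step 2: read everything first so a failed read leaves `message` untouched.
    rendered = [f"--- {name} ---\n{f.read_text()}" for name, f in files.items()]
    # Step 3: mutate only after validation and all reads have succeeded.
    message["content"] = "\n\n".join([message.get("content", ""), *rendered])
```

The trade-off is that a mixed text+image request fails without partially modifying the message, and an unreadable text file surfaces its own error rather than the multimodal one.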

Notes

  • The 10 pre-existing test failures in test_multimodal.py (e.g. TestLiteLLMMultimodal::test_format_multimodal_content_image) are unrelated to this change — they fail on main as well due to the minimal test PNG not being processed by the current crewai_files library.

Link to Devin session: https://app.devin.ai/sessions/e46e8669f7a5459380d029d403270307

TextFiles passed via input_files incorrectly triggered a 'Model does not
support multimodal input' error for non-vision-capable models. Text files
are now inlined as text content in the message instead of being rejected.

Changes:
- base_llm.py: Rewrite _process_message_files to distinguish text files
  from binary files; add _is_text_file helper
- llm.py: Apply same text-file inlining logic in both sync and async
  _process_message_files methods
- crew.py: Recognize text file MIME types as auto-injectable so they
  don't require the read_file tool
- task.py: Same text-file auto-injection logic in prompt method
- tests: Add 17 tests covering text file inlining, image rejection,
  mixed files, _is_text_file helper, and edge cases

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration
Contributor Author

Prompt hidden (unlisted session)

@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Co-Authored-By: João <joao@crewai.com>

Development

Successfully merging this pull request may close these issues.

[BUG] / [HELP] "Model does not support multimodal input [...] Use a vision-capable model" for TextFile input
