feat(infrastructure): add VLM base classes and utilities #638
davidberenstein1957 wants to merge 2 commits into
Conversation
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 21212de. Configure here.
```python
top = getattr(tok, "top_logprobs", None) or []
for t in top:
    token_str = (getattr(t, "token", "") or "").lower()
    lp = float(getattr(t, "logprob", -1e9) or -1e9)
```
Logprob zero treated as missing due to falsy check
Medium Severity
The expression float(getattr(t, "logprob", -1e9) or -1e9) uses the or operator to provide a fallback, but 0.0 is falsy in Python. A logprob of 0.0 means P = exp(0) = 1.0 (100% probability), yet 0.0 or -1e9 evaluates to -1e9, turning that probability into P ≈ 0. This silently corrupts probability scoring whenever a token's logprob is exactly zero.
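A minimal sketch of the fix, extracted into a standalone helper (the helper name is hypothetical, not from the PR): replace the `or` fallback with an explicit `None` check so a legitimate 0.0 logprob survives.

```python
import math

def safe_logprob(t, default=-1e9):
    # Explicit None check: 0.0 is falsy but is a valid logprob (P = 1.0),
    # so `getattr(t, "logprob", default) or default` would wrongly replace
    # it with the sentinel default.
    lp = getattr(t, "logprob", None)
    return float(lp) if lp is not None else float(default)

class Tok:
    # Toy stand-in for a token object carrying a logprob attribute.
    def __init__(self, logprob):
        self.logprob = logprob

# A logprob of exactly 0.0 now maps to probability 1.0 instead of ~0.
assert math.exp(safe_logprob(Tok(0.0))) == 1.0
```

The same pattern applies to the `getattr(t, "token", "") or ""` line, which is safe only because the empty string is the intended fallback there.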
```python
self.pooling_mode = pooling_mode
self.skip_instruction = skip_instruction
self.max_length = max_length
self.doc_max_length = 512
```
Constructor ignores doc_max_length parameter, hardcodes 512
Medium Severity
LLM2Vec.__init__ accepts a doc_max_length parameter (line 79) but line 88 assigns self.doc_max_length = 512 instead of self.doc_max_length = doc_max_length. The parameter value is silently discarded, so any doc_max_length loaded from llm2vec_config.json via from_pretrained or passed explicitly has no effect on document truncation behavior.
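A minimal sketch of the one-line fix (the signature and default values here are assumptions for illustration, not the actual pruna constructor):

```python
class LLM2Vec:
    def __init__(self, pooling_mode="mean", skip_instruction=True,
                 max_length=512, doc_max_length=400):
        self.pooling_mode = pooling_mode
        self.skip_instruction = skip_instruction
        self.max_length = max_length
        # Use the parameter instead of the hardcoded 512, so values loaded
        # from llm2vec_config.json (or passed explicitly) take effect.
        self.doc_max_length = doc_max_length
```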
| "peft>=0.18.0,<0.19.0", | ||
| "trl<=0.21.0", | ||
| "termcolor==2.3.0", | ||
| "realesrgan", |
Heavy realesrgan moved from optional to core dependencies
Medium Severity
realesrgan was previously under the optional upscale extra but is now a core dependency in dependencies. This forces all users to install a heavy GPU-oriented package (with native compilation requirements) even if they never use upscaling. The upscale optional extra was simultaneously removed.
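A sketch of the corresponding pyproject.toml fix under standard PEP 621 conventions (the surrounding pins are copied from the diff; the extra's name follows the `upscale` extra the review says was removed): keep realesrgan out of core dependencies and restore it as an optional extra.

```toml
[project]
dependencies = [
    "peft>=0.18.0,<0.19.0",
    "trl<=0.21.0",
    "termcolor==2.3.0",
    # realesrgan removed from core dependencies
]

[project.optional-dependencies]
# Restored extra: installed only via e.g. `pip install pruna[upscale]`
upscale = ["realesrgan"]
```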
- Add BaseVLM abstract interface
- Add LitellmVLM for API-based inference (OpenAI, Anthropic, etc.)
- Add TransformersVLM for local Hugging Face models
- Add StatefulVLMMeanScoresMetric base class for judge metrics
- Add vlm_utils.py with image/batch utilities
- Add pyproject.toml dependency pins (peft, litellm)
- Add unit tests for infrastructure
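To make the role of BaseVLM concrete, here is a toy sketch of what an abstract VLM interface of this shape might look like. Only the class name BaseVLM comes from the PR; the method name, signature, and the EchoVLM backend are illustrative assumptions, not the actual pruna API.

```python
from abc import ABC, abstractmethod

class BaseVLM(ABC):
    """Abstract interface that concrete VLM backends (API-based or local)
    would implement. Sketch only; not the real pruna interface."""

    @abstractmethod
    def generate(self, prompt: str, images: list) -> str:
        """Run inference on a prompt plus a batch of images."""

class EchoVLM(BaseVLM):
    # Trivial backend used purely to illustrate subclassing the interface.
    def generate(self, prompt, images):
        return f"{prompt} ({len(images)} image(s))"
```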
Keep PR #638 focused on VLM infrastructure by removing exports for downstream metric classes and restoring Rapidata export from the base branch. Co-authored-by: Cursor <cursoragent@cursor.com>


Summary
Adds the VLM inference infrastructure used by all downstream VLM judge metrics:
- BaseVLM
- LitellmVLM
- TransformersVLM
- StatefulVLMMeanScoresMetric

Stack Position
- feat/vlm-pr-1-vendor
- feat/vlm-pr-3a-qa-accuracy
- feat/vlm-pr-5-e2e-tests
- feat/metrics-vlm-support

Files
- src/pruna/evaluation/metrics/vlm_base.py
- src/pruna/evaluation/metrics/vlm_utils.py
- tests/evaluation/test_vlm_base_infrastructure.py
- src/pruna/evaluation/metrics/utils.py
- src/pruna/evaluation/metrics/__init__.py
- pyproject.toml

Alignment Notes
This PR is intentionally based on feat/vlm-pr-1-vendor so reviewers only see the infrastructure delta.

Test Plan
Review Focus
Review Flow (Order)
Review the stack in this exact order:
This PR in the flow (2/10)