UNS-481 [FEAT] Add Gemini embedding adapter for Google AI Studio #1891
jaseemjaskp wants to merge 3 commits into main
Add a new embedding adapter for Google's Gemini models using LiteLLM's gemini/ provider prefix. Enables Gemini-based embedding models (e.g. gemini/text-embedding-004) in document processing workflows.
Walkthrough
Adds a Gemini embedding adapter: parameter model/validation, adapter implementation and metadata, JSON configuration schema, module export, and tests for registry, schema, and validation (no runtime integration changes).
Estimated code review effort: 3 (Moderate), ~20 minutes
Pre-merge checks: 2 passed, 1 failed (1 warning)
| Filename | Overview |
|---|---|
| unstract/sdk1/src/unstract/sdk1/adapters/base1.py | Adds GeminiEmbeddingParameters with non-mutating validate/validate_model; validate makes a defensive copy before modifying the dict, consistent with best practices in this file. |
| unstract/sdk1/src/unstract/sdk1/adapters/embedding1/gemini.py | New GeminiEmbeddingAdapter follows the established pattern exactly; MRO, static methods, metadata, and provider string are all correct. |
| unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/gemini.json | JSON schema is consistent with openai.json; required fields, password format, and timeout default (240) all match existing adapter conventions. |
| `unstract/sdk1/src/unstract/sdk1/adapters/embedding1/__init__.py` | Clean import and `__all__` addition; alphabetical ordering preserved. |
| unstract/sdk1/tests/test_gemini_embedding.py | 16 tests covering registration, ID format, schema structure, prefix idempotency, mutation guards, and validation error paths — comprehensive coverage with no gaps. |
Sequence Diagram
sequenceDiagram
participant UI as Frontend UI
participant API as Unstract API
participant Adapter as GeminiEmbeddingAdapter
participant Params as GeminiEmbeddingParameters
participant LiteLLM as LiteLLM (gemini/)
UI->>API: Submit adapter config (adapter_name, api_key, model)
API->>Adapter: validate(adapter_metadata)
Adapter->>Params: shallow copy metadata
Params->>Params: validate_model() — add gemini/ prefix if missing
Params->>Params: Pydantic model_dump() — strip unknown fields
Params-->>API: validated dict {model, api_key, embed_batch_size, timeout, ...}
API-->>UI: Adapter saved
UI->>API: Run embedding task
API->>Adapter: embed(texts)
Adapter->>LiteLLM: litellm.embedding(model="gemini/text-embedding-004", ...)
LiteLLM-->>Adapter: embedding vectors
Adapter-->>API: vectors
API-->>UI: result
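The validation half of the diagram can be sketched in plain Python. This is a minimal illustration and not the SDK's code: `KNOWN_FIELDS`, `normalize_model`, and this `validate` are hypothetical names, a simple key filter stands in for the Pydantic `model_dump()` step, and treating `adapter_name` as an unknown field is an assumption.

```python
from typing import Any

# Hypothetical whitelist standing in for the fields of the Pydantic model.
KNOWN_FIELDS = {"model", "api_key", "embed_batch_size", "timeout"}


def normalize_model(model: str) -> str:
    """Add the gemini/ prefix only when it is missing (idempotent)."""
    return model if model.startswith("gemini/") else f"gemini/{model}"


def validate(adapter_metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a validated copy; the caller's dict is left untouched."""
    metadata = dict(adapter_metadata)  # shallow copy, as in the diagram
    metadata["model"] = normalize_model(metadata["model"])
    # Stand-in for model_dump(): drop anything that is not a known field.
    return {k: v for k, v in metadata.items() if k in KNOWN_FIELDS}


config = {
    "adapter_name": "my-gemini",  # assumed to be an "unknown" field here
    "api_key": "secret",
    "model": "text-embedding-004",
}
validated = validate(config)
```

Because the copy is made before the prefix is added, `config["model"]` keeps the value the user submitted, which is the property the mutation-guard tests are described as checking.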
Reviews (3): Last reviewed commit: "UNS-481 [FIX] Handle None model value in..."
Actionable comments posted: 3
🧹 Nitpick comments (1)
unstract/sdk1/tests/test_gemini_embedding.py (1)
1-1: Harden UUID validation in test_get_id_format.
The current check only validates length. Parsing via uuid.UUID(...) will catch malformed IDs that happen to have the correct length.
Suggested test improvement
 import json
+import uuid
 from unstract.sdk1.adapters.embedding1.gemini import GeminiEmbeddingAdapter
 from unstract.sdk1.adapters.enums import AdapterTypes
@@
 def test_get_id_format(self) -> None:
     adapter_id = GeminiEmbeddingAdapter.get_id()
     assert adapter_id.startswith("gemini|")
-    # UUID part should be 36 chars
     uuid_part = adapter_id.split("|")[1]
-    assert len(uuid_part) == 36
+    assert str(uuid.UUID(uuid_part)) == uuid_part
Also applies to: 14-20
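To illustrate why the round-trip check is stricter than the length check, here is a small standalone sketch. `is_canonical_uuid` is a hypothetical helper written for this illustration; only `uuid.UUID` and `uuid.uuid4` are standard-library calls.

```python
import uuid


def is_canonical_uuid(value: str) -> bool:
    """True only if value parses as a UUID and round-trips unchanged."""
    try:
        return str(uuid.UUID(value)) == value
    except ValueError:
        # uuid.UUID raises ValueError for strings that are not valid hex UUIDs.
        return False


well_formed = str(uuid.uuid4())
# 36 characters with hyphens in the usual places, but not hexadecimal.
malformed = "zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz"
```

The malformed string passes a `len(...) == 36` assertion but fails the parse, which is the gap the nitpick points at.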
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/tests/test_gemini_embedding.py` at line 1, The test test_get_id_format currently only asserts the UUID string length; update it to parse the candidate ID with uuid.UUID(candidate_id) to ensure it's a valid UUID (this will raise ValueError for malformed IDs even if length matches). Locate the test_get_id_format function in the test_gemini_embedding tests and replace or augment the length-only assertion with a uuid.UUID(...) parse (and optionally assert the parsed object's version if needed). Apply the same change to the other similar assertions referenced around lines 14-20 so all UUID checks validate by parsing rather than length alone.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/adapters/base1.py`:
- Around line 959-963: In validate_model, treat adapter_metadata["model"] values
that are empty or blank as missing before prefixing: when reading model from
adapter_metadata in validate_model, normalize by checking if model is falsy or
model.strip() == "" and if so set it to the default "gemini/text-embedding-004";
then proceed with the existing startswith("gemini/") check to prefix only
non-empty names and assign back to adapter_metadata["model"]. This ensures
validate_model does not produce "gemini/" for blank strings.
In `@unstract/sdk1/src/unstract/sdk1/adapters/embedding1/gemini.py`:
- Around line 35-36: The get_icon() function returns a reference to "Gemini.png"
which is missing and causes the UI icon to fail; add a Gemini.png image asset
into the same adapter-icons asset directory where other adapter icons (e.g.,
OpenAI, Bedrock, VertexAI) live so the path returned by get_icon() resolves at
runtime, and ensure the image filename and casing exactly match the string
"/icons/adapter-icons/Gemini.png" used in get_icon().
In `@unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/gemini.json`:
- Around line 35-41: The JSON schema's "timeout" property default (currently
240) is inconsistent with the runtime default used by BaseEmbeddingParameters
(600); update the "timeout" default in
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/gemini.json to 600 so
UI/config-sourced metadata matches the runtime behavior, and keep the
"description" as "Timeout in seconds" to avoid ambiguity.
📒 Files selected for processing (5)
- unstract/sdk1/src/unstract/sdk1/adapters/base1.py
- unstract/sdk1/src/unstract/sdk1/adapters/embedding1/__init__.py
- unstract/sdk1/src/unstract/sdk1/adapters/embedding1/gemini.py
- unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/gemini.json
- unstract/sdk1/tests/test_gemini_embedding.py
def validate_model(adapter_metadata: dict[str, "Any"]) -> str:
    model = adapter_metadata.get("model", "gemini/text-embedding-004")
    if not model.startswith("gemini/"):
        model = f"gemini/{model}"
    adapter_metadata["model"] = model
Handle empty/blank model names before prefixing.
Line 960 falls back only when model is missing, not when it is present but empty (""), which yields gemini/ on Line 962. Please normalize empty/blank values to the default model.
Suggested fix
 @staticmethod
 def validate_model(adapter_metadata: dict[str, "Any"]) -> str:
-    model = adapter_metadata.get("model", "gemini/text-embedding-004")
-    if not model.startswith("gemini/"):
+    model = (adapter_metadata.get("model") or "").strip()
+    if not model:
+        model = "gemini/text-embedding-004"
+    elif not model.startswith("gemini/"):
         model = f"gemini/{model}"
     adapter_metadata["model"] = model
     return model
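The difference between the reviewed code and the suggested fix is easiest to see side by side. The sketch below is a simplified restatement for illustration: `prefix_original` and `prefix_suggested` are hypothetical names, and both omit the write back into `adapter_metadata`.

```python
DEFAULT_MODEL = "gemini/text-embedding-004"


def prefix_original(metadata: dict) -> str:
    """Reviewed behaviour: the default applies only if the key is absent."""
    model = metadata.get("model", DEFAULT_MODEL)
    return model if model.startswith("gemini/") else f"gemini/{model}"


def prefix_suggested(metadata: dict) -> str:
    """Suggested behaviour: blank or None falls back to the default."""
    model = (metadata.get("model") or "").strip()
    if not model:
        return DEFAULT_MODEL
    return model if model.startswith("gemini/") else f"gemini/{model}"
```

With `{"model": ""}` the first function yields the bare prefix `gemini/`, while the second returns the default model name.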
def get_icon() -> str:
    return "/icons/adapter-icons/Gemini.png"
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify that the Gemini icon asset exists somewhere in the repository.
fd -i 'Gemini\.png$'
Repository: Zipstack/unstract
Length of output: 43
🏁 Script executed:
# Search for the icons directory and other icon files
fd -i 'icon' --type d | head -20
Repository: Zipstack/unstract
Length of output: 201
🏁 Script executed:
# Look for any PNG files in icon-related directories
find . -type f -name "*.png" 2>/dev/null | grep -i icon | head -20
Repository: Zipstack/unstract
Length of output: 1118
🏁 Script executed:
# Check the structure of the embedding adapter directory
find unstract/sdk1/src/unstract/sdk1/adapters/embedding1 -type f 2>/dev/null
Repository: Zipstack/unstract
Length of output: 933
🏁 Script executed:
# Look at other adapter implementations to see how they define get_icon()
rg "def get_icon\(\)" -A 1 --type py
Repository: Zipstack/unstract
Length of output: 12624
Add missing Gemini.png icon asset to prevent UI rendering failure.
The Gemini embedding adapter references /icons/adapter-icons/Gemini.png but this icon file does not exist in frontend/public/icons/adapter-icons/. The icon asset must be added to this directory to prevent the adapter icon from failing to render in the UI. Other embedding adapters (OpenAI, Bedrock, VertexAI, etc.) all have corresponding icon files present in this location.
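A check of this kind could also live in a test so that a missing asset is caught before review. The following is a rough sketch under assumptions: `icon_exists` is a hypothetical helper, the temporary directory stands in for `frontend/public/`, and `OpenAI.png` is an assumed filename used only to show the positive case.

```python
import tempfile
from pathlib import Path


def icon_exists(icon_url: str, public_root: Path) -> bool:
    """Map a URL path such as /icons/... onto the public directory and check it."""
    return (public_root / icon_url.lstrip("/")).is_file()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stand-in for frontend/public/
    icons = root / "icons" / "adapter-icons"
    icons.mkdir(parents=True)
    (icons / "OpenAI.png").touch()  # assumed filename, for illustration only
    has_openai = icon_exists("/icons/adapter-icons/OpenAI.png", root)
    has_gemini = icon_exists("/icons/adapter-icons/Gemini.png", root)
```

In a real test the root would point at the repository's public directory and the URL would come from `get_icon()`.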
"timeout": {
  "type": "number",
  "minimum": 0,
  "multipleOf": 1,
  "title": "Timeout",
  "default": 240,
  "description": "Timeout in seconds"
Align schema timeout default with runtime validation default.
Line 40 sets timeout default to 240, but runtime validation currently defaults to 600 via BaseEmbeddingParameters. This creates inconsistent adapter behavior depending on whether config comes from UI schema or direct metadata input.
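One way to keep the two defaults from drifting is a small consistency check. The sketch below is illustrative only: the `properties` nesting of the schema fragment, `RUNTIME_DEFAULT_TIMEOUT`, and both helper names are assumptions; only the values 240 and 600 come from this comment.

```python
import json

# Value this review attributes to BaseEmbeddingParameters (assumed constant name).
RUNTIME_DEFAULT_TIMEOUT = 600

# Assumed shape: the timeout field nested under a top-level "properties" key.
schema_fragment = json.loads(
    '{"properties": {"timeout": {"type": "number", "default": 240}}}'
)


def schema_default(schema: dict, field: str):
    """Return the declared default for a field, or None if there is none."""
    return schema["properties"][field].get("default")


def defaults_agree(schema: dict, field: str, runtime_default) -> bool:
    """True when the UI schema and the runtime agree on the default."""
    return schema_default(schema, field) == runtime_default


mismatch = not defaults_agree(schema_fragment, "timeout", RUNTIME_DEFAULT_TIMEOUT)
```

A test asserting `defaults_agree(...)` would fail on the current 240 and pass once either side is aligned.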
Suggested fix (schema-side)
 "timeout": {
   "type": "number",
   "minimum": 0,
   "multipleOf": 1,
   "title": "Timeout",
-  "default": 240,
+  "default": 600,
   "description": "Timeout in seconds"
 }
…model, update default model
- Copy adapter_metadata before mutating in validate()
- Remove side-effect dict write in validate_model()
- Add blank/empty model validation with ValueError
- Update JSON schema descriptions for model auto-prefix and batch size
- Add tests for no-mutation, blank model, missing api_key, extra fields
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/adapters/base1.py`:
- Around line 959-960: The code coerces adapter_metadata.get("model", "") to
string which turns None into "None" and bypasses the empty check; update the
logic in base1.py around the model variable (where adapter_metadata and model
are referenced) to first read the raw value (e.g., raw_model =
adapter_metadata.get("model")), treat None as missing, then only call
str(...).strip() on non-None values and validate emptiness after stripping; also
add a unit test that passes model=None to ensure "None" does not become a valid
model (preventing invalid values like "gemini/None").
📒 Files selected for processing (3)
- unstract/sdk1/src/unstract/sdk1/adapters/base1.py
- unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/gemini.json
- unstract/sdk1/tests/test_gemini_embedding.py
✅ Files skipped from review due to trivial changes (1)
- unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/gemini.json
🚧 Files skipped from review as they are similar to previous changes (1)
- unstract/sdk1/tests/test_gemini_embedding.py
…mini/None
- Use isinstance check instead of str() coercion to handle None model
- Add test for model=None case
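The pitfall behind this commit, and the shape of an `isinstance`-based guard, can be shown in a few lines. This is a sketch, not the committed code: both function names are hypothetical, and raising on `None` (rather than falling back to a default) is an assumption, since the thread does not say which the final code does.

```python
from typing import Any


def coerce_with_str(raw: Any) -> str:
    """The pitfall: str(None) is the non-empty string 'None'."""
    return str(raw).strip()


def normalize_model(raw: Any) -> str:
    """Reject non-strings and blanks, then add the prefix if it is missing."""
    if not isinstance(raw, str) or not raw.strip():
        # Assumption: invalid input raises instead of using a default model.
        raise ValueError("model must be a non-empty string")
    model = raw.strip()
    return model if model.startswith("gemini/") else f"gemini/{model}"
```

Because `coerce_with_str(None)` returns `"None"`, an emptiness check after coercion passes and the prefix step would produce `gemini/None`; checking the type first closes that path.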
What
- … `gemini/text-embedding-004`) via LiteLLM's `gemini/` provider prefix.
Why
How
- `GeminiEmbeddingParameters` class in `base1.py` with `validate()` and `validate_model()` (idempotent `gemini/` prefix handling)
- `GeminiEmbeddingAdapter` in `embedding1/gemini.py` following the established adapter pattern
- `embedding1/static/gemini.json` for the configuration UI
- `embedding1/__init__.py` with the Gemini import and `__all__` entry
Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
- `litellm` already in the project.
Notes on Testing
- `unstract/sdk1/tests/test_gemini_embedding.py`
- `validate_model()` prefix idempotency, `validate()` integration, `embed_batch_size` defaults
- `Gemini.png` icon asset needs to be added to `frontend/public/icons/adapter-icons/` separately for the UI icon to render
Checklist
I have read and understood the Contribution Guidelines.