[FIX] Unified retry for LLM and embedding providers by chandrasekharan-zipstack · Pull Request #1886 · Zipstack/unstract

chandrasekharan-zipstack · 2026-03-31T17:29:10Z

What

Unified retry with exponential backoff for all LLM and embedding provider calls
Exposed max_retries in UI for embedding adapters (OpenAI, Azure, VertexAI, Ollama) and Ollama LLM — previously missing from the adapter form
DRY consolidation of retry decision logic into a single _get_retry_delay() helper used by all 4 retry paths

Why

max_retries configured in the adapter UI was silently ignored for:
- LLM (httpx-based providers): Anthropic, Vertex AI, Bedrock, Mistral — litellm's retry only works for SDK-based providers (OpenAI/Azure)
- All embedding providers: litellm has no wrapper-level retry for embeddings at all (embedding_with_retries() does not exist)
4 out of 5 embedding adapters did not expose max_retries in the UI, so users could not configure retries even if the backend supported them
Retry decision logic was duplicated across call_with_retry, acall_with_retry, iter_with_retry, and retry_with_exponential_backoff

How

Self-managed retry at the SDK layer wrapping litellm.completion() / litellm.embedding() calls
Disable litellm's own retry to prevent double-retry: max_retries=0 (SDK-level) + num_retries=0 (litellm wrapper)
is_retryable_litellm_error() predicate uses duck-typing (no litellm import) to detect transient errors: status codes 408/429/500/502/503/504, ConnectionError, TimeoutError, and litellm/httpx exception class names
Stream retry (iter_with_retry) only retries before the first chunk is yielded — mid-stream failures propagate immediately
Extracted _get_retry_delay() shared helper — all retry functions delegate retry decisions to this single function

Can this PR break any existing features. If yes, please list possible items. If no, please explain why.

No breaking changes:

Existing adapters without max_retries in saved metadata: Pydantic defaults to None, code converts to 0 — no retry, identical to current behavior
New/edited adapters: UI form pre-fills max_retries=3, users can adjust
JSON schema additions are additive — no existing fields removed or changed (except Bedrock embedding default: 5 to 3 for consistency)
litellm caching, callbacks, and usage tracking still work per attempt since each retry goes through litellm.completion()/litellm.embedding()

Database Migrations

None

Env Config

No new env vars. Existing PLATFORM_SERVICE_MAX_RETRIES, PROMPT_SERVICE_MAX_RETRIES etc. are unaffected.

Relevant Docs

Supersedes approach in PR UN-3344 [FIX] Activate litellm retry for all LLM providers #1867 which only fixed LLM completion by bridging max_retries to num_retries

Related Issues or PRs

UN-3344 [FIX] Activate litellm retry for all LLM providers #1867

Dependencies Versions

No dependency changes

Notes on Testing

Tested locally with docker containers against:

Provider	Error Type	Retried?	Backoff?
OpenAI LLM (SDK-based)	Auth 401	No (correct)	N/A
OpenAI LLM (SDK-based)	Timeout	Yes, 2 retries	1.2s, 2.5s
Anthropic LLM (httpx-based)	Auth 401	No (correct)	N/A
Anthropic LLM (httpx-based)	Timeout	Yes, 2 retries	1.2s, 2.2s
OpenAI Embedding	Auth 401	No (correct)	N/A
OpenAI Embedding	Timeout	Yes, 2 retries	1.0s, 2.3s

Screenshots

N/A — backend-only changes + JSON schema additions

Checklist

I have read and understood the Contribution Guidelines.

litellm's retry only works for SDK-based providers (OpenAI/Azure). httpx-based providers (Anthropic, Vertex, Bedrock, Mistral) and ALL embedding calls silently ignore max_retries. This adds self-managed retry with exponential backoff at the SDK layer, disabling litellm's own retry entirely for consistency. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

coderabbitai · 2026-03-31T17:29:18Z

Summary by CodeRabbit

New Features
- Configurable max retry attempts for embedding and LLM providers (default: 3).
- Unified retry handling for sync, async, and streaming calls with exponential backoff and improved error classification.
Behavior
- Streaming retries occur only before the first streamed item; post-first-item failures surface immediately.
Stability
- Better logging and retry decisions to improve resilience to transient provider errors.

Walkthrough

Adds configurable retry support: adapter JSON schemas gain an optional max_retries property (default 3 in most files, bedrock changed 5→3). Embedding and LLM paths route LiteLLM calls through shared retry helpers while disabling LiteLLM's internal retry.

Changes

Cohort / File(s)	Summary
Embedding Adapter Schemas `unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/azure.json`, `unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/openai.json`, `unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/ollama.json`, `unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/vertexai.json`	Added optional `max_retries` property (number; minimum 0; multipleOf 1; default 3) to each schema.
Bedrock Embedding Schema `unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/bedrock.json`	Updated existing `max_retries` default from `5` to `3` and rephrased the description.
LLM Adapter Schema `unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/ollama.json`	Added optional `max_retries` property (number; minimum 0; multipleOf 1; default 3).
Embedding Module `unstract/sdk1/src/unstract/sdk1/embedding.py`	Wrapped LiteLLM embedding calls with retry helpers (`call_with_retry`/`acall_with_retry`), added private `_pop_retry_params()` to extract and suppress LiteLLM retry params, adjusted async error handling, and added module logging.
LLM Module `unstract/sdk1/src/unstract/sdk1/llm.py`	Routed sync/async/streaming completion calls through `call_with_retry`, `acall_with_retry`, `iter_with_retry`; added `_disable_litellm_retry()` to extract `max_retries` and disable LiteLLM internal retries.
Retry Utilities `unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py`	Introduced comprehensive retry infra: `RETRYABLE_STATUS_CODES`, `is_retryable_litellm_error()`, `_get_retry_delay()`, and helpers `call_with_retry`, `acall_with_retry`, `iter_with_retry`; refactored existing retry decorator, adjusted default exception types, and simplified retry control flow.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client Code
    participant Module as Embedding/LLM Module
    participant RetryUtil as Retry Utils
    participant LiteLLM as LiteLLM
    participant Provider as External Provider

    Client->>Module: Call get_embedding / completion
    Module->>Module: Extract max_retries from kwargs
    Module->>Module: Disable LiteLLM internal retry (max_retries=0, num_retries=0)
    Module->>RetryUtil: Invoke call_with_retry / acall_with_retry / iter_with_retry

    loop Until success or max_retries exhausted
        RetryUtil->>LiteLLM: Execute wrapped call
        LiteLLM->>Provider: Send request
        alt Provider success
            Provider-->>LiteLLM: Return response
            LiteLLM-->>RetryUtil: Return result
            RetryUtil-->>Client: Return result
        else Transient provider error
            Provider-->>LiteLLM: Error
            LiteLLM-->>RetryUtil: Raise exception
            RetryUtil->>RetryUtil: is_retryable_litellm_error()? → True
            RetryUtil->>RetryUtil: Compute backoff delay
            RetryUtil->>RetryUtil: Wait & retry
        else Non-retryable error
            Provider-->>LiteLLM: Error
            LiteLLM-->>RetryUtil: Raise exception
            RetryUtil->>Client: Raise exception
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title '[FIX] Unified retry for LLM and embedding providers' clearly and concisely summarizes the main objective: implementing unified retry logic across LLM and embedding providers.
Description check	✅ Passed	The PR description comprehensively addresses all required template sections including What, Why, How, breaking changes analysis, database migrations, environment config, related issues, dependencies, testing notes, and checklist confirmation.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/unified-retry-llm-embedding

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Move retry loops out of LLM/Embedding classes into generic call_with_retry, acall_with_retry, and iter_with_retry functions in retry_utils.py. Both classes now call these directly instead of maintaining their own retry helper methods. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

greptile-apps · 2026-03-31T17:41:38Z

Greptile Summary

This PR delivers a well-scoped reliability improvement: it replaces the previous patchwork of litellm-managed retries (which silently did nothing for httpx-based LLM providers and all embedding providers) with a self-managed retry layer that wraps every litellm.completion() and litellm.embedding() call. It also closes a UX gap by exposing max_retries in the adapter UI for OpenAI, Azure, VertexAI, Ollama (embedding), and Ollama (LLM).

Key points:

Retry logic is correct end-to-end. _disable_litellm_retry / _pop_retry_params zero out both max_retries=0 and num_retries=0 before each litellm call, eliminating the double-retry risk for Azure/OpenAI SDK-backed providers.
The previously-flagged issues are all resolved. ConnectionError import shadowing is fixed (RequestsConnectionError); the MRO check now covers subclasses; num_retries is explicitly zeroed for embeddings.
_get_retry_delay is a clean DRY consolidation — all four retry paths delegate to a single helper, which makes the backoff and retry-decision logic easy to audit and modify.
iter_with_retry stream semantics are sound — it only retries before the first chunk is yielded, preventing partial-stream delivery to callers.
Two minor P2 style points: whether 500 should remain in RETRYABLE_STATUS_CODES (it is retryable by convention but also returned for persistent errors like context-length violations by some providers), and the implicit None return path in call_with_retry / acall_with_retry that is safe at runtime but opaque to static analysis.

Confidence Score: 5/5

Safe to merge — all previously raised P0/P1 concerns are resolved; remaining findings are P2 design preferences.

All three issues from prior review threads (ConnectionError import shadowing, num_retries double-retry for embeddings, MRO-only name matching) are correctly addressed. The retry control flow is sound, litellm retries are fully disabled before the wrapper loop runs, and backward compatibility is preserved via the None or 0 coercion for adapters without a saved max_retries. Only two P2 suggestions remain.

No files require special attention; the two minor points are both in retry_utils.py.

Important Files Changed

Filename	Overview
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py	Core refactor: adds `is_retryable_litellm_error`, `_get_retry_delay` shared helper, and three new retry wrappers (`call_with_retry`, `acall_with_retry`, `iter_with_retry`); fixes the `ConnectionError` import-shadowing and MRO-only name-match issues raised in prior review threads.
unstract/sdk1/src/unstract/sdk1/embedding.py	Adds `_pop_retry_params` (pops `max_retries`, zeros both `max_retries` and `num_retries` to prevent double-retry) and wraps all four embedding call sites with the new retry helpers.
unstract/sdk1/src/unstract/sdk1/llm.py	Adds static `_disable_litellm_retry` helper and wraps `complete`, `stream_complete`, and `acomplete` calls with the new retry wrappers.
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/vertexai.json	Adds `max_retries` field (default 3) to the Vertex AI embedding adapter UI schema; indentation inside the file is slightly inconsistent with surrounding properties but JSON is semantically valid.
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/bedrock.json	Changes the Bedrock embedding `max_retries` default from 5 → 3 for consistency, and unifies the description wording.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant LLM/Embedding
    participant RetryWrapper
    participant _get_retry_delay
    participant litellm

    Caller->>LLM/Embedding: complete() / get_embedding()
    LLM/Embedding->>LLM/Embedding: _disable_litellm_retry()<br/>pops max_retries, sets max_retries=0 num_retries=0
    LLM/Embedding->>RetryWrapper: call_with_retry(fn, max_retries)
    loop attempt 0..max_retries
        RetryWrapper->>litellm: fn() → litellm.completion/embedding(max_retries=0)
        alt success
            litellm-->>RetryWrapper: response
            RetryWrapper-->>LLM/Embedding: response
        else transient error
            litellm-->>RetryWrapper: exception
            RetryWrapper->>_get_retry_delay: (error, attempt, max_retries)
            alt retryable AND attempt < max_retries
                _get_retry_delay-->>RetryWrapper: delay (exponential+jitter)
                RetryWrapper->>RetryWrapper: sleep(delay)
            else non-retryable OR exhausted
                _get_retry_delay-->>RetryWrapper: None
                RetryWrapper-->>Caller: raise exception
            end
        end
    end
    LLM/Embedding-->>Caller: response

Prompt To Fix All With AI

This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py
Line: 20

Comment:
**Consider excluding `500` from retryable status codes**

`500 Internal Server Error` is included in `RETRYABLE_STATUS_CODES`. Most transient LLM failures land on 429/503/504, while 500 is also returned by several providers for malformed requests or context-length violations that won't recover on retry. Retrying those calls burns quota unnecessarily and adds latency.

If you'd like to keep 500 retryable you may want to tighten it to only the clearly transient set:

```suggestion
RETRYABLE_STATUS_CODES = frozenset({408, 429, 502, 503, 504})
```

Alternatively, keeping 500 is a reasonable defensive choice — just worth a deliberate decision.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py
Line: 122-142

Comment:
**`call_with_retry` (and `acall_with_retry`) have an implicit `None` return in their type signatures**

Both functions are typed as `-> object`, but if the loop body somehow fell through without returning or raising — which can't happen with the current logic, but is opaque to static analysers — they would implicitly return `None`. In practice the control flow is correct (last-attempt exceptions always re-raise), but callers like `embedding.py` do `resp["data"][0]["embedding"]` on the return value without a None-guard. Annotating the return as `-> Any` and adding a final `raise RuntimeError("unreachable")` after the loop would make this explicit and silence any type-checker warnings:

```python
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as e:
            delay = _get_retry_delay(...)
            if delay is None:
                raise
            time.sleep(delay)
    raise RuntimeError("call_with_retry: exhausted retries without raising")  # unreachable
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (5): Last reviewed commit: "[FIX] Address CodeRabbit review: add API..." | Re-trigger Greptile}

unstract/sdk1/src/unstract/sdk1/embedding.py

unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py

- Extract _get_retry_delay() shared helper to eliminate duplicated retry decision logic across call_with_retry, acall_with_retry, iter_with_retry, and retry_with_exponential_backoff - Add num_retries=0 to embedding._pop_retry_params() to fully disable litellm's internal retry for embedding calls - Expose max_retries in UI JSON schemas for embedding adapters (OpenAI, Azure, VertexAI, Ollama) and Ollama LLM — previously the field existed in Pydantic models but wasn't shown to users, silently defaulting to 0 retries - Add debug logging to LLM and Embedding retry parameter extraction - Clarify docstrings distinguishing is_retryable_litellm_error() from is_retryable_error() (different exception hierarchies) - Remove stale noqa: C901 from simplified retry_with_exponential_backoff Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…dapters Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…check - Fix `requests.ConnectionError` shadowing Python's builtin `ConnectionError` in `is_retryable_litellm_error()` — rename import to `RequestsConnectionError` and use `builtins.ConnectionError` / `builtins.TimeoutError` explicitly - Use `__mro__`-based class name check instead of `type(error).__name__` to also catch subclasses of retryable error types - P1 (num_retries not zeroed) was already fixed in prior commit Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py (1)
254-317: Extract the sync retry loop instead of maintaining two copies.

retry_with_exponential_backoff() now reimplements the same attempt/sleep/decision flow as call_with_retry(), which is the SonarCloud failure on Line 254. Pulling the common loop into one helper will lower the branching here and keep the decorator path from drifting away from the SDK path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 254 - 317,
The retry logic in retry_with_exponential_backoff duplicates the
attempt/sleep/decision flow from call_with_retry; extract that shared loop into
a single helper function (e.g., _run_retry_loop or reuse call_with_retry
signature) that encapsulates attempts, calling the target, invoking
_get_retry_delay, sleeping, logging, and raising, then have
retry_with_exponential_backoff's decorator/wrapper call this helper (passing
func, max_retries, base_delay, multiplier, jitter, exceptions, logger_instance,
prefix, retry_predicate) to eliminate the duplicated flow and keep both the
decorator path and call_with_retry using the same implementation.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py`:
- Around line 113-186: The three retry helpers (call_with_retry,
acall_with_retry, iter_with_retry) must validate max_retries is >= 0 before
entering their loops; add an early check in each function (call_with_retry,
acall_with_retry, iter_with_retry) that raises a ValueError (or similar) when
max_retries is negative so we fail fast instead of returning None or yielding
nothing when range(max_retries + 1) is empty; this mirrors the existing
decorator validation and prevents silent no-op behavior.
- Around line 33-57: The isinstance check in is_retryable_litellm_error
currently uses a shadowed ConnectionError (imported from requests.exceptions)
which narrows detection to requests-only errors; update the import/usage so the
check uses the built-in ConnectionError and TimeoutError types (i.e., remove or
rename the requests import and reference the built-ins in
is_retryable_litellm_error) and also add "APITimeoutError" to the
_RETRYABLE_ERROR_NAMES allowlist so OpenAI/OpenAI-mapped 408 timeout exceptions
are recognized by the name-based check.

---

Nitpick comments:
In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py`:
- Around line 254-317: The retry logic in retry_with_exponential_backoff
duplicates the attempt/sleep/decision flow from call_with_retry; extract that
shared loop into a single helper function (e.g., _run_retry_loop or reuse
call_with_retry signature) that encapsulates attempts, calling the target,
invoking _get_retry_delay, sleeping, logging, and raising, then have
retry_with_exponential_backoff's decorator/wrapper call this helper (passing
func, max_retries, base_delay, multiplier, jitter, exceptions, logger_instance,
prefix, retry_predicate) to eliminate the duplicated flow and keep both the
decorator path and call_with_retry using the same implementation.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b73e0a58-1ac6-4696-9a91-27485df74d1b

📥 Commits

Reviewing files that changed from the base of the PR and between 3b1a343 and f83404a.

📒 Files selected for processing (9)

unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/azure.json
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/bedrock.json
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/ollama.json
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/openai.json
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/vertexai.json
unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/ollama.json
unstract/sdk1/src/unstract/sdk1/embedding.py
unstract/sdk1/src/unstract/sdk1/llm.py
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py

coderabbitai · 2026-04-01T13:25:16Z

unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py

+def is_retryable_litellm_error(error: Exception) -> bool:
+    """Check if a litellm/provider API error should trigger a retry.

-    Args:
-        error: The exception to check
+    Distinct from is_retryable_error() which handles requests-library exceptions
+    (requests.ConnectionError, requests.HTTPError.response.status_code, OSError).
+    litellm/openai/httpx have a separate exception hierarchy: status_code lives
+    on the exception itself, and class names like APIConnectionError don't inherit
+    from the requests types. Uses duck-typing to avoid importing litellm directly.
+    """
+    # Python built-in connection / timeout base classes
+    if isinstance(error, ConnectionError | TimeoutError):
+        return True

-    Returns:
-        True if the error should trigger a retry
+    # litellm/openai/httpx exception types that don't inherit from the
+    # built-ins above but still represent transient network failures.
+    if type(error).__name__ in _RETRYABLE_ERROR_NAMES:
+        return True
+
+    # Status-code check covers litellm.RateLimitError (429),
+    # InternalServerError (500), ServiceUnavailableError (503), etc.
+    status_code = getattr(error, "status_code", None)
+    if status_code is not None and status_code in RETRYABLE_STATUS_CODES:
+        return True
+
+    return False


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

For the current official LiteLLM, OpenAI Python SDK, and httpx documentation, what exception classes and attributes represent retryable connection, timeout, and HTTP 408/429/5xx failures? Please confirm whether any of them use response.status_codeinstead ofstatus_code, and whether classes like APITimeoutError, ConnectError, or PoolTimeout can be raised in the paths LiteLLM exposes.

💡 Result:

LiteLLM, OpenAI Python SDK, and httpx each handle retryable failures with specific exception classes and attributes. LiteLLM exposes OpenAI-compatible exceptions that wrap underlying errors, adding attributes like status_code, llm_provider, model, num_retries, and max_retries. It does not directly expose raw httpx exceptions like ConnectError or PoolTimeout in its public API paths; instead, it maps them to APIConnectionError or Timeout. LiteLLM (current official docs and source): - Connection failures: APIConnectionError (inherits openai.APIConnectionError), for network/connection issues including 500 unmapped errors. Retryable. - Timeout failures: Timeout (inherits openai.APITimeoutError), status_code=408 (or provider-specific). - HTTP 408: Mapped to Timeout. - HTTP 429: RateLimitError (inherits openai.RateLimitError), status_code=429. Retryable. - HTTP 5xx: APIError (generic 500), InternalServerError (>=500 unmapped), ServiceUnavailableError (503). All retryable. Attributes: All have .status_code (int, set directly e.g. self.status_code=429), .message, .llm_provider, .model, .litellm_debug_info, .num_retries, .max_retries. Some construct .response as httpx.Response with .status_code. OpenAI Python SDK (v1, _exceptions.py source): - Connection failures: APIConnectionError (inherits APIError). - Timeout failures: APITimeoutError (inherits APIConnectionError). - HTTP 408: APITimeoutError. - HTTP 429: RateLimitError (inherits APIStatusError), status_code=429. - HTTP 5xx: APIError, InternalServerError (inherits APIStatusError). Attributes: .status_code (on APIStatusError subclasses), .request (httpx.Request), .response (on some, with .status_code), .request_id. httpx (official docs): - Connection failures: ConnectError (inherits NetworkError > TransportError > RequestError). - Timeout failures: TimeoutException and subclasses: ConnectTimeout, ReadTimeout, WriteTimeout, PoolTimeout. All inherit TransportError > RequestError. - HTTP 408/429/5xx: Not exceptions by default; raised as HTTPStatusError (with .response.status_code) only if response.raise_for_status called. Attributes: Exceptions have .request (if available). No direct .status_code on exception; HTTPStatusError has .response with .status_code. Responses always use .status_code (not response.status_code). Confirmation: - None use response.status_code on the exception itself; LiteLLM/OpenAI exceptions set .status_code directly (int). httpx.Response has .status_code; no evidence of response.status_code anywhere. - APITimeoutError: Raised by OpenAI SDK, inherited/mapped in LiteLLM as Timeout. - ConnectError, PoolTimeout: httpx exceptions. Can occur internally in LiteLLM/OpenAI (both use httpx), but LiteLLM maps to APIConnectionError/Timeout before exposing in completion paths. Not raised directly by LiteLLM's public API.

Citations:

1: https://github.com/BerriAI/litellm/blob/main/litellm/exceptions.py

2: https://www.mintlify.com/BerriAI/litellm/api/exceptions

3: https://docs.litellm.ai/docs/exception_mapping

4: https://www.python-httpx.org/exceptions/

5: http://docs.litellm.ai/docs/exception_mapping

6: https://mintlify.com/BerriAI/litellm/api/exceptions

7: https://fossies.org/linux/openai-python/src/openai/_exceptions.py

🏁 Script executed:

# Check the import statement around line 13 head -20 unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py | cat -n

Repository: Zipstack/unstract

Length of output: 792

🏁 Script executed:

# Find the _RETRYABLE_ERROR_NAMES definition rg -n "_RETRYABLE_ERROR_NAMES" unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py -A 5 -B 2

Repository: Zipstack/unstract

Length of output: 761

🏁 Script executed:

# Check if there are any other references to these exception names in the file rg -n "APITimeoutError|ConnectError|PoolTimeout" unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py

Repository: Zipstack/unstract

Length of output: 43

🏁 Script executed:

# Get the complete _RETRYABLE_ERROR_NAMES definition sed -n '23,35p' unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py | cat -n

Repository: Zipstack/unstract

Length of output: 429

🏁 Script executed:

# Get the complete code from lines 33-57 including the isinstance check sed -n '33,57p' unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py | cat -n

Repository: Zipstack/unstract

Length of output: 1428

Fix import shadowing and add missing exception class names.

Line 13 imports ConnectionError from requests.exceptions, which shadows the built-in ConnectionError class. This means the isinstance check on line 42 is checking requests.exceptions.ConnectionError instead of the built-in, narrowing its reach to only requests-library errors rather than Python's native connection errors.

Additionally, _RETRYABLE_ERROR_NAMES is missing "APITimeoutError", which is raised by the OpenAI SDK (OpenAI maps HTTP 408 to APITimeoutError). This class is not in the allowlist, so timeouts from the OpenAI SDK will skip the name-based check and only be caught if they carry a status_code attribute set to 408—a fragile fallback.

Consider using the built-in ConnectionError and TimeoutError directly in the isinstance check (available in Python 3.10+), and add "APITimeoutError" to the exception names allowlist to ensure consistent coverage across litellm, openai, and httpx exception hierarchies.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 33 - 57, The isinstance check in is_retryable_litellm_error currently uses a shadowed ConnectionError (imported from requests.exceptions) which narrows detection to requests-only errors; update the import/usage so the check uses the built-in ConnectionError and TimeoutError types (i.e., remove or rename the requests import and reference the built-ins in is_retryable_litellm_error) and also add "APITimeoutError" to the _RETRYABLE_ERROR_NAMES allowlist so OpenAI/OpenAI-mapped 408 timeout exceptions are recognized by the name-based check.

unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py

coderabbitai

🧹 Nitpick comments (2)

unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py (2)

257-324: Consider extracting the inner wrapper function to reduce cognitive complexity.

SonarCloud flags cognitive complexity of 20 (allowed: 15). The nested decorator structure with try/except and conditional logging contributes to this. You could extract the "giving up" logging block or the entire wrapper body into a helper function.

Possible refactor approach

+def _log_final_failure(
+    logger_instance: logging.Logger,
+    func_name: str,
+    attempt: int,
+    prefix: str,
+) -> None:
+    """Log when retry attempts are exhausted."""
+    if attempt > 0:
+        logger_instance.exception(
+            "Giving up '%s' after %d attempt(s) for %s",
+            func_name,
+            attempt + 1,
+            prefix,
+        )
+
 def retry_with_exponential_backoff(...) -> Callable:
     ...
     def decorator(func: Callable) -> Callable:
         `@wraps`(func)
         def wrapper(*args: Any, **kwargs: Any) -> Any:
             for attempt in range(max_retries + 1):
                 try:
                     result = func(*args, **kwargs)
                     ...
                     return result
                 except exceptions as e:
                     delay = _get_retry_delay(...)
                     if delay is None:
-                        if attempt > 0:
-                            logger_instance.exception(
-                                "Giving up '%s' after %d attempt(s) for %s",
-                                func.__name__,
-                                attempt + 1,
-                                prefix,
-                            )
+                        _log_final_failure(logger_instance, func.__name__, attempt, prefix)
                         raise
                     time.sleep(delay)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 257 - 324,
The wrapper inside retry_with_exponential_backoff is too complex; extract the
retry loop or the "giving up" logging and sleep/raise logic into a helper
function to reduce cognitive complexity. Create a private helper (e.g.,
_execute_with_retries or _handle_attempt) that accepts func, args, kwargs,
max_retries, base_delay, multiplier, jitter, exceptions, retry_predicate,
prefix, and logger_instance and returns the result or raises; have wrapper call
this helper and keep wrapper only responsible for `@wraps` and invoking the
helper; ensure the helper uses _get_retry_delay and preserves the existing
logging messages (including the success log in wrapper when attempt > 0 or move
that logging into the helper) so behavior is unchanged.

138-145: Consider tightening type hints for fn parameter.

The current Callable[[], object] is imprecise:

acall_with_retry awaits the result, so fn should return an awaitable
iter_with_retry iterates the result, so fn should return an iterable

This won't cause runtime issues but weakens static type checking.

Suggested type hints

from collections.abc import Awaitable, Iterable
from typing import TypeVar

T = TypeVar("T")

async def acall_with_retry(
    fn: Callable[[], Awaitable[T]],
    *,
    max_retries: int,
    ...
) -> T:

def iter_with_retry(
    fn: Callable[[], Iterable[T]],
    *,
    max_retries: int,
    ...
) -> Generator[T, None, None]:

Also applies to: 160-167

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 138 - 145,
Tighten the type hints for acall_with_retry and iter_with_retry: introduce a
TypeVar T and change the fn parameter for acall_with_retry to Callable[[],
Awaitable[T]] with the function returning T, and change iter_with_retry to
accept Callable[[], Iterable[T]] and return Generator[T, None, None]; import
Awaitable and Iterable from collections.abc (and Generator if not already
present) and TypeVar from typing to make the signatures precise and improve
static checking while keeping existing runtime behavior (refer to function names
acall_with_retry and iter_with_retry and update their return annotations
accordingly).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py`:
- Around line 257-324: The wrapper inside retry_with_exponential_backoff is too
complex; extract the retry loop or the "giving up" logging and sleep/raise logic
into a helper function to reduce cognitive complexity. Create a private helper
(e.g., _execute_with_retries or _handle_attempt) that accepts func, args,
kwargs, max_retries, base_delay, multiplier, jitter, exceptions,
retry_predicate, prefix, and logger_instance and returns the result or raises;
have wrapper call this helper and keep wrapper only responsible for `@wraps` and
invoking the helper; ensure the helper uses _get_retry_delay and preserves the
existing logging messages (including the success log in wrapper when attempt > 0
or move that logging into the helper) so behavior is unchanged.
- Around line 138-145: Tighten the type hints for acall_with_retry and
iter_with_retry: introduce a TypeVar T and change the fn parameter for
acall_with_retry to Callable[[], Awaitable[T]] with the function returning T,
and change iter_with_retry to accept Callable[[], Iterable[T]] and return
Generator[T, None, None]; import Awaitable and Iterable from collections.abc
(and Generator if not already present) and TypeVar from typing to make the
signatures precise and improve static checking while keeping existing runtime
behavior (refer to function names acall_with_retry and iter_with_retry and
update their return annotations accordingly).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 04b772de-c8c2-491a-9b94-28b8bcc53a9a

📥 Commits

Reviewing files that changed from the base of the PR and between f83404a and 43fed18.

📒 Files selected for processing (1)

unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py

…tries - Add APITimeoutError to _RETRYABLE_ERROR_NAMES for explicit OpenAI SDK timeout coverage - Add _validate_max_retries() guard to call_with_retry, acall_with_retry, iter_with_retry to fail fast on negative values instead of silently returning None Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

sonarqubecloud · 2026-04-01T15:31:38Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-04-01T15:31:46Z

Test Results

Summary

✅ Runner Tests: 11 passed, 0 failed (11 total)
✅ SDK1 Tests: 98 passed, 0 failed (98 total)

Runner Tests - Full Report

filepath	function	$$\textcolor{#23d18b}{\tt{passed}}$$	SUBTOTAL
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_logs}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_cleanup}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_cleanup\_skip}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_client\_init}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_image\_exists}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_image}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config\_without\_mount}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_run\_container}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_get\_image\_for\_sidecar}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$	$$\textcolor{#23d18b}{\tt{test\_sidecar\_container}}$$	$$\textcolor{#23d18b}{\tt{1}}$$	$$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{TOTAL}}$$		$$\textcolor{#23d18b}{\tt{11}}$$	$$\textcolor{#23d18b}{\tt{11}}$$

SDK1 Tests - Full Report

coderabbitai

🧹 Nitpick comments (3)

unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py (3)

266-333: Consider reducing cognitive complexity.

SonarCloud flagged this function with cognitive complexity of 20 (allowed 15). The nested conditionals within the exception handlers contribute to this. Consider extracting the "giving up" logging block (lines 319-325) into a helper or simplifying the control flow.

💡 Suggested refactor

+def _log_giving_up(
+    logger_instance: logging.Logger,
+    func_name: str,
+    attempt: int,
+    prefix: str,
+) -> None:
+    """Log when giving up after all retry attempts."""
+    if attempt > 0:
+        logger_instance.exception(
+            "Giving up '%s' after %d attempt(s) for %s",
+            func_name,
+            attempt + 1,
+            prefix,
+        )
+

 def retry_with_exponential_backoff(
     ...
 ) -> Callable:
     def decorator(func: Callable) -> Callable:
         `@wraps`(func)
         def wrapper(*args: Any, **kwargs: Any) -> Any:
             for attempt in range(max_retries + 1):
                 try:
                     result = func(*args, **kwargs)
                     if attempt > 0:
                         logger_instance.info(
                             "Successfully completed '%s' after %d retry attempt(s)",
                             func.__name__,
                             attempt,
                         )
                     return result
                 except exceptions as e:
                     delay = _get_retry_delay(...)
                     if delay is None:
-                        if attempt > 0:
-                            logger_instance.exception(
-                                "Giving up '%s' after %d attempt(s) for %s",
-                                func.__name__,
-                                attempt + 1,
-                                prefix,
-                            )
+                        _log_giving_up(logger_instance, func.__name__, attempt, prefix)
                         raise
                     time.sleep(delay)
                 except Exception:
                     raise
         return wrapper
     return decorator

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 266 - 333,
The retry_with_exponential_backoff wrapper has high cognitive complexity due to
nested exception handling and an inline "giving up" logging block; extract that
block into a small helper (e.g., _log_giving_up or _handle_give_up) and call it
from inside the except exceptions as e: branch when _get_retry_delay returns
None, passing logger_instance, prefix, func.__name__, and attempt so the main
wrapper focuses on control flow only, reducing nesting; keep the existing calls
to _get_retry_delay and time.sleep unchanged and ensure the helper performs the
logger.exception call and any formatting before re-raising the exception.

145-152: Type hint should reflect awaitable return type.

The fn parameter is awaited on line 158, so the type hint should be Callable[[], Awaitable[object]] rather than Callable[[], object].

💡 Suggested fix

+from collections.abc import Awaitable
+
 async def acall_with_retry(
-    fn: Callable[[], object],
+    fn: Callable[[], Awaitable[object]],
     *,
     max_retries: int,
     retry_predicate: Callable[[Exception], bool],
     description: str = "",
     logger_instance: logging.Logger | None = None,
 ) -> object:

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 145 - 152,
Update the type hint for the `fn` parameter in `acall_with_retry` to reflect
that it's awaited: change `Callable[[], object]` to `Callable[[],
Awaitable[object]]`, and add the necessary `Awaitable` import from `typing` (or
`collections.abc`) at the top of the module so the annotation is valid; keep the
function's async signature and return annotation as-is.

168-175: Type hint should reflect iterable return type.

The fn result is iterated on line 186, so the type hint should reflect this, e.g., Callable[[], Iterable[T]].

💡 Suggested fix

+from collections.abc import Iterable
+from typing import TypeVar
+
+T = TypeVar("T")
+
 def iter_with_retry(
-    fn: Callable[[], object],
+    fn: Callable[[], Iterable[T]],
     *,
     max_retries: int,
     retry_predicate: Callable[[Exception], bool],
     description: str = "",
     logger_instance: logging.Logger | None = None,
-) -> Generator:
+) -> Generator[T, None, None]:

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 168 - 175,
The type hint for iter_with_retry is wrong because fn is expected to return an
iterable; update the signature to use a generic TypeVar (e.g., T) and change
fn's type to Callable[[], Iterable[T]] and the function's return type to
Generator[T, None, None] so callers of iter_with_retry and the loop over fn()
have correct typing; ensure you import or declare typing.TypeVar and Iterable
where needed and update any related annotations in iter_with_retry accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py`:
- Around line 266-333: The retry_with_exponential_backoff wrapper has high
cognitive complexity due to nested exception handling and an inline "giving up"
logging block; extract that block into a small helper (e.g., _log_giving_up or
_handle_give_up) and call it from inside the except exceptions as e: branch when
_get_retry_delay returns None, passing logger_instance, prefix, func.__name__,
and attempt so the main wrapper focuses on control flow only, reducing nesting;
keep the existing calls to _get_retry_delay and time.sleep unchanged and ensure
the helper performs the logger.exception call and any formatting before
re-raising the exception.
- Around line 145-152: Update the type hint for the `fn` parameter in
`acall_with_retry` to reflect that it's awaited: change `Callable[[], object]`
to `Callable[[], Awaitable[object]]`, and add the necessary `Awaitable` import
from `typing` (or `collections.abc`) at the top of the module so the annotation
is valid; keep the function's async signature and return annotation as-is.
- Around line 168-175: The type hint for iter_with_retry is wrong because fn is
expected to return an iterable; update the signature to use a generic TypeVar
(e.g., T) and change fn's type to Callable[[], Iterable[T]] and the function's
return type to Generator[T, None, None] so callers of iter_with_retry and the
loop over fn() have correct typing; ensure you import or declare typing.TypeVar
and Iterable where needed and update any related annotations in iter_with_retry
accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 22f7cf9d-7244-483f-90ab-c81ca438acc8

📥 Commits

Reviewing files that changed from the base of the PR and between 43fed18 and c8054e8.

📒 Files selected for processing (1)

unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py

pk-zipstack

@chandrasekharan-zipstack Looks like there are some comments from sonarcloud. Please check them out. LGTM otherwise.

greptile-apps bot reviewed Mar 31, 2026

View reviewed changes

unstract/sdk1/src/unstract/sdk1/embedding.py Show resolved Hide resolved

unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py Outdated Show resolved Hide resolved

unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py Show resolved Hide resolved

chandrasekharan-zipstack and others added 2 commits April 1, 2026 16:07

[FIX] Set max_retries default to 3 for all embedding and Ollama LLM a…

f83404a

…dapters Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

chandrasekharan-zipstack self-assigned this Apr 1, 2026

chandrasekharan-zipstack marked this pull request as ready for review April 1, 2026 13:15

chandrasekharan-zipstack requested review from gaya3-zipstack, hari-kuriakose and pk-zipstack April 1, 2026 13:23

coderabbitai bot reviewed Apr 1, 2026

View reviewed changes

chandrasekharan-zipstack mentioned this pull request Apr 1, 2026

UN-3344 [FIX] Activate litellm retry for all LLM providers #1867

Open

coderabbitai bot reviewed Apr 1, 2026

View reviewed changes

pk-zipstack approved these changes Apr 2, 2026

View reviewed changes

Conversation

chandrasekharan-zipstack commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How

Can this PR break any existing features. If yes, please list possible items. If no, please explain why.

Database Migrations

Env Config

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

Screenshots

Checklist

Uh oh!

coderabbitai bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

greptile-apps bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud bot commented Apr 1, 2026

Quality Gate passed

Uh oh!

github-actions bot commented Apr 1, 2026

Test Results

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

pk-zipstack left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chandrasekharan-zipstack commented Mar 31, 2026 •

edited

Loading

coderabbitai bot commented Mar 31, 2026 •

edited

Loading

greptile-apps bot commented Mar 31, 2026 •

edited

Loading