[FIX] Unified retry for LLM and embedding providers #1886
chandrasekharan-zipstack wants to merge 6 commits into main
Conversation
litellm's retry only works for SDK-based providers (OpenAI/Azure). httpx-based providers (Anthropic, Vertex, Bedrock, Mistral) and ALL embedding calls silently ignore max_retries. This adds self-managed retry with exponential backoff at the SDK layer, disabling litellm's own retry entirely for consistency. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
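As a rough sketch of the self-managed retry described here — the name `call_with_retry` follows the PR, but the signature and backoff constants below are simplified assumptions, not the SDK's exact code:

```python
import random
import time
from collections.abc import Callable


def call_with_retry(
    fn: Callable[[], object],
    *,
    max_retries: int,
    retry_predicate: Callable[[Exception], bool],
    base_delay: float = 1.0,
) -> object:
    """Run fn, retrying transient failures with exponential backoff + jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as e:
            # Give up on non-retryable errors, or once attempts are exhausted.
            if attempt >= max_retries or not retry_predicate(e):
                raise
            # Exponential backoff with jitter scaled to base_delay.
            delay = base_delay * (2**attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
    raise RuntimeError("unreachable")  # loop always returns or raises
```

A caller would pop `max_retries` from the adapter config, pass `max_retries=0` down to litellm, and wrap the litellm call in a closure handed to `call_with_retry`.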
Summary by CodeRabbit
Walkthrough
Adds configurable retry support: adapter JSON schemas gain an optional `max_retries` field.
Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client Code
participant Module as Embedding/LLM Module
participant RetryUtil as Retry Utils
participant LiteLLM as LiteLLM
participant Provider as External Provider
Client->>Module: Call get_embedding / completion
Module->>Module: Extract max_retries from kwargs
Module->>Module: Disable LiteLLM internal retry (max_retries=0, num_retries=0)
Module->>RetryUtil: Invoke call_with_retry / acall_with_retry / iter_with_retry
loop Until success or max_retries exhausted
RetryUtil->>LiteLLM: Execute wrapped call
LiteLLM->>Provider: Send request
alt Provider success
Provider-->>LiteLLM: Return response
LiteLLM-->>RetryUtil: Return result
RetryUtil-->>Client: Return result
else Transient provider error
Provider-->>LiteLLM: Error
LiteLLM-->>RetryUtil: Raise exception
RetryUtil->>RetryUtil: is_retryable_litellm_error()? → True
RetryUtil->>RetryUtil: Compute backoff delay
RetryUtil->>RetryUtil: Wait & retry
else Non-retryable error
Provider-->>LiteLLM: Error
LiteLLM-->>RetryUtil: Raise exception
RetryUtil->>Client: Raise exception
end
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Move retry loops out of LLM/Embedding classes into generic call_with_retry, acall_with_retry, and iter_with_retry functions in retry_utils.py. Both classes now call these directly instead of maintaining their own retry helper methods. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
| Filename | Overview |
|---|---|
| unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py | Core refactor: adds is_retryable_litellm_error, _get_retry_delay shared helper, and three new retry wrappers (call_with_retry, acall_with_retry, iter_with_retry); fixes the ConnectionError import-shadowing and MRO-only name-match issues raised in prior review threads. |
| unstract/sdk1/src/unstract/sdk1/embedding.py | Adds _pop_retry_params (pops max_retries, zeros both max_retries and num_retries to prevent double-retry) and wraps all four embedding call sites with the new retry helpers. |
| unstract/sdk1/src/unstract/sdk1/llm.py | Adds static _disable_litellm_retry helper and wraps complete, stream_complete, and acomplete calls with the new retry wrappers. |
| unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/vertexai.json | Adds max_retries field (default 3) to the Vertex AI embedding adapter UI schema; indentation inside the file is slightly inconsistent with surrounding properties but JSON is semantically valid. |
| unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/bedrock.json | Changes the Bedrock embedding max_retries default from 5 → 3 for consistency, and unifies the description wording. |
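For the streaming wrapper listed in the table above, `iter_with_retry` retries only until the first chunk reaches the caller; a sketch of those semantics follows. The signature and backoff are simplified assumptions, not the PR's exact code:

```python
import time
from collections.abc import Callable, Generator


def iter_with_retry(
    fn: Callable[[], object],
    *,
    max_retries: int,
    retry_predicate: Callable[[Exception], bool],
    base_delay: float = 1.0,
) -> Generator:
    """Retry only before the first chunk is yielded; mid-stream errors propagate."""
    for attempt in range(max_retries + 1):
        yielded = False
        try:
            for chunk in fn():
                yielded = True
                yield chunk
            return
        except Exception as e:
            # Once data has reached the caller we cannot safely replay the stream,
            # so mid-stream failures re-raise immediately.
            if yielded or attempt >= max_retries or not retry_predicate(e):
                raise
            time.sleep(base_delay * (2**attempt))
```

This mirrors the PR note that mid-stream failures propagate immediately: replaying a partially consumed stream would duplicate chunks the caller has already seen.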
Sequence Diagram
sequenceDiagram
participant Caller
participant LLM/Embedding
participant RetryWrapper
participant _get_retry_delay
participant litellm
Caller->>LLM/Embedding: complete() / get_embedding()
LLM/Embedding->>LLM/Embedding: _disable_litellm_retry()<br/>pops max_retries, sets max_retries=0 num_retries=0
LLM/Embedding->>RetryWrapper: call_with_retry(fn, max_retries)
loop attempt 0..max_retries
RetryWrapper->>litellm: fn() → litellm.completion/embedding(max_retries=0)
alt success
litellm-->>RetryWrapper: response
RetryWrapper-->>LLM/Embedding: response
else transient error
litellm-->>RetryWrapper: exception
RetryWrapper->>_get_retry_delay: (error, attempt, max_retries)
alt retryable AND attempt < max_retries
_get_retry_delay-->>RetryWrapper: delay (exponential+jitter)
RetryWrapper->>RetryWrapper: sleep(delay)
else non-retryable OR exhausted
_get_retry_delay-->>RetryWrapper: None
RetryWrapper-->>Caller: raise exception
end
end
end
LLM/Embedding-->>Caller: response
Prompt To Fix All With AI
This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py
Line: 20
Comment:
**Consider excluding `500` from retryable status codes**
`500 Internal Server Error` is included in `RETRYABLE_STATUS_CODES`. Most transient LLM failures land on 429/503/504, while 500 is also returned by several providers for malformed requests or context-length violations that won't recover on retry. Retrying those calls burns quota unnecessarily and adds latency.
If you'd like to keep 500 retryable you may want to tighten it to only the clearly transient set:
```suggestion
RETRYABLE_STATUS_CODES = frozenset({408, 429, 502, 503, 504})
```
Alternatively, keeping 500 is a reasonable defensive choice — just worth a deliberate decision.
How can I resolve this? If you propose a fix, please make it concise.
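To make the trade-off concrete, here is a tiny illustration of what tightening the set changes; `FakeAPIError` is a stand-in for a provider exception carrying `status_code`, not a real litellm class:

```python
class FakeAPIError(Exception):
    """Stand-in for a provider error carrying a status_code attribute."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


# Broad set as currently in the PR vs. the tightened suggestion above.
BROAD = frozenset({408, 429, 500, 502, 503, 504})
TIGHT = frozenset({408, 429, 502, 503, 504})


def is_retryable(error: Exception, codes: frozenset) -> bool:
    """Duck-typed status-code check, as in the PR's retry predicate."""
    return getattr(error, "status_code", None) in codes
```

Under the tightened set, a provider 500 raises immediately instead of burning up to `max_retries` attempts, while 429/503/504 behave identically.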
---
This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py
Line: 122-142
Comment:
**`call_with_retry` (and `acall_with_retry`) have an implicit `None` return in their type signatures**
Both functions are typed as `-> object`, but if the loop body somehow fell through without returning or raising — which can't happen with the current logic, but is opaque to static analysers — they would implicitly return `None`. In practice the control flow is correct (last-attempt exceptions always re-raise), but callers like `embedding.py` do `resp["data"][0]["embedding"]` on the return value without a None-guard. Annotating the return as `-> Any` and adding a final `raise RuntimeError("unreachable")` after the loop would make this explicit and silence any type-checker warnings:
```python
for attempt in range(max_retries + 1):
try:
return fn()
except Exception as e:
delay = _get_retry_delay(...)
if delay is None:
raise
time.sleep(delay)
raise RuntimeError("call_with_retry: exhausted retries without raising") # unreachable
```
How can I resolve this? If you propose a fix, please make it concise.
Reviews (5): Last reviewed commit: "[FIX] Address CodeRabbit review: add API..."
- Extract _get_retry_delay() shared helper to eliminate duplicated retry decision logic across call_with_retry, acall_with_retry, iter_with_retry, and retry_with_exponential_backoff - Add num_retries=0 to embedding._pop_retry_params() to fully disable litellm's internal retry for embedding calls - Expose max_retries in UI JSON schemas for embedding adapters (OpenAI, Azure, VertexAI, Ollama) and Ollama LLM — previously the field existed in Pydantic models but wasn't shown to users, silently defaulting to 0 retries - Add debug logging to LLM and Embedding retry parameter extraction - Clarify docstrings distinguishing is_retryable_litellm_error() from is_retryable_error() (different exception hierarchies) - Remove stale noqa: C901 from simplified retry_with_exponential_backoff Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…dapters Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…check - Fix `requests.ConnectionError` shadowing Python's builtin `ConnectionError` in `is_retryable_litellm_error()` — rename import to `RequestsConnectionError` and use `builtins.ConnectionError` / `builtins.TimeoutError` explicitly - Use `__mro__`-based class name check instead of `type(error).__name__` to also catch subclasses of retryable error types - P1 (num_retries not zeroed) was already fixed in prior commit Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Actionable comments posted: 2
🧹 Nitpick comments (1)
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py (1)
254-317: Extract the sync retry loop instead of maintaining two copies.
`retry_with_exponential_backoff()` now reimplements the same attempt/sleep/decision flow as `call_with_retry()`, which is the SonarCloud failure on line 254. Pulling the common loop into one helper will lower the branching here and keep the decorator path from drifting away from the SDK path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 254 - 317, The retry logic in retry_with_exponential_backoff duplicates the attempt/sleep/decision flow from call_with_retry; extract that shared loop into a single helper function (e.g., _run_retry_loop or reuse call_with_retry signature) that encapsulates attempts, calling the target, invoking _get_retry_delay, sleeping, logging, and raising, then have retry_with_exponential_backoff's decorator/wrapper call this helper (passing func, max_retries, base_delay, multiplier, jitter, exceptions, logger_instance, prefix, retry_predicate) to eliminate the duplicated flow and keep both the decorator path and call_with_retry using the same implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py`:
- Around line 113-186: The three retry helpers (call_with_retry,
acall_with_retry, iter_with_retry) must validate max_retries is >= 0 before
entering their loops; add an early check in each function (call_with_retry,
acall_with_retry, iter_with_retry) that raises a ValueError (or similar) when
max_retries is negative so we fail fast instead of returning None or yielding
nothing when range(max_retries + 1) is empty; this mirrors the existing
decorator validation and prevents silent no-op behavior.
- Around line 33-57: The isinstance check in is_retryable_litellm_error
currently uses a shadowed ConnectionError (imported from requests.exceptions)
which narrows detection to requests-only errors; update the import/usage so the
check uses the built-in ConnectionError and TimeoutError types (i.e., remove or
rename the requests import and reference the built-ins in
is_retryable_litellm_error) and also add "APITimeoutError" to the
_RETRYABLE_ERROR_NAMES allowlist so OpenAI/OpenAI-mapped 408 timeout exceptions
are recognized by the name-based check.
---
Nitpick comments:
In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py`:
- Around line 254-317: The retry logic in retry_with_exponential_backoff
duplicates the attempt/sleep/decision flow from call_with_retry; extract that
shared loop into a single helper function (e.g., _run_retry_loop or reuse
call_with_retry signature) that encapsulates attempts, calling the target,
invoking _get_retry_delay, sleeping, logging, and raising, then have
retry_with_exponential_backoff's decorator/wrapper call this helper (passing
func, max_retries, base_delay, multiplier, jitter, exceptions, logger_instance,
prefix, retry_predicate) to eliminate the duplicated flow and keep both the
decorator path and call_with_retry using the same implementation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b73e0a58-1ac6-4696-9a91-27485df74d1b
📒 Files selected for processing (9)
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/azure.json
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/bedrock.json
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/ollama.json
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/openai.json
unstract/sdk1/src/unstract/sdk1/adapters/embedding1/static/vertexai.json
unstract/sdk1/src/unstract/sdk1/adapters/llm1/static/ollama.json
unstract/sdk1/src/unstract/sdk1/embedding.py
unstract/sdk1/src/unstract/sdk1/llm.py
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py
Reviewed hunk from `retry_utils.py` (`is_retryable_litellm_error`, reconstructed):

```python
def is_retryable_litellm_error(error: Exception) -> bool:
    """Check if a litellm/provider API error should trigger a retry.

    Distinct from is_retryable_error() which handles requests-library exceptions
    (requests.ConnectionError, requests.HTTPError.response.status_code, OSError).
    litellm/openai/httpx have a separate exception hierarchy: status_code lives
    on the exception itself, and class names like APIConnectionError don't inherit
    from the requests types. Uses duck-typing to avoid importing litellm directly.

    Args:
        error: The exception to check

    Returns:
        True if the error should trigger a retry
    """
    # Python built-in connection / timeout base classes
    if isinstance(error, ConnectionError | TimeoutError):
        return True

    # litellm/openai/httpx exception types that don't inherit from the
    # built-ins above but still represent transient network failures.
    if type(error).__name__ in _RETRYABLE_ERROR_NAMES:
        return True

    # Status-code check covers litellm.RateLimitError (429),
    # InternalServerError (500), ServiceUnavailableError (503), etc.
    status_code = getattr(error, "status_code", None)
    if status_code is not None and status_code in RETRYABLE_STATUS_CODES:
        return True

    return False
```
🧩 Analysis chain
🌐 Web query:
For the current official LiteLLM, OpenAI Python SDK, and httpx documentation, what exception classes and attributes represent retryable connection, timeout, and HTTP 408/429/5xx failures? Please confirm whether any of them use `response.status_code` instead of `status_code`, and whether classes like APITimeoutError, ConnectError, or PoolTimeout can be raised in the paths LiteLLM exposes.
💡 Result:
LiteLLM, OpenAI Python SDK, and httpx each handle retryable failures with specific exception classes and attributes. LiteLLM exposes OpenAI-compatible exceptions that wrap underlying errors, adding attributes like status_code, llm_provider, model, num_retries, and max_retries. It does not directly expose raw httpx exceptions like ConnectError or PoolTimeout in its public API paths; instead, it maps them to APIConnectionError or Timeout.

LiteLLM (current official docs and source):
- Connection failures: APIConnectionError (inherits openai.APIConnectionError), for network/connection issues including 500 unmapped errors. Retryable.
- Timeout failures: Timeout (inherits openai.APITimeoutError), status_code=408 (or provider-specific).
- HTTP 408: Mapped to Timeout.
- HTTP 429: RateLimitError (inherits openai.RateLimitError), status_code=429. Retryable.
- HTTP 5xx: APIError (generic 500), InternalServerError (>=500 unmapped), ServiceUnavailableError (503). All retryable.
- Attributes: All have .status_code (int, set directly, e.g. self.status_code=429), .message, .llm_provider, .model, .litellm_debug_info, .num_retries, .max_retries. Some construct .response as an httpx.Response with .status_code.

OpenAI Python SDK (v1, _exceptions.py source):
- Connection failures: APIConnectionError (inherits APIError).
- Timeout failures: APITimeoutError (inherits APIConnectionError).
- HTTP 408: APITimeoutError.
- HTTP 429: RateLimitError (inherits APIStatusError), status_code=429.
- HTTP 5xx: APIError, InternalServerError (inherits APIStatusError).
- Attributes: .status_code (on APIStatusError subclasses), .request (httpx.Request), .response (on some, with .status_code), .request_id.

httpx (official docs):
- Connection failures: ConnectError (inherits NetworkError > TransportError > RequestError).
- Timeout failures: TimeoutException and subclasses ConnectTimeout, ReadTimeout, WriteTimeout, PoolTimeout. All inherit TransportError > RequestError.
- HTTP 408/429/5xx: Not exceptions by default; raised as HTTPStatusError (with .response.status_code) only if response.raise_for_status is called.
- Attributes: Exceptions have .request (if available). No direct .status_code on the exception; HTTPStatusError carries .response with .status_code.

Confirmation:
- None use response.status_code on the exception itself; LiteLLM/OpenAI exceptions set .status_code directly (int). httpx.Response always uses .status_code.
- APITimeoutError: Raised by the OpenAI SDK, inherited/mapped in LiteLLM as Timeout.
- ConnectError, PoolTimeout: httpx exceptions. Can occur internally in LiteLLM/OpenAI (both use httpx), but LiteLLM maps them to APIConnectionError/Timeout before exposing them in completion paths. Not raised directly by LiteLLM's public API.
Citations:
- 1: https://github.com/BerriAI/litellm/blob/main/litellm/exceptions.py
- 2: https://www.mintlify.com/BerriAI/litellm/api/exceptions
- 3: https://docs.litellm.ai/docs/exception_mapping
- 4: https://www.python-httpx.org/exceptions/
- 5: http://docs.litellm.ai/docs/exception_mapping
- 6: https://mintlify.com/BerriAI/litellm/api/exceptions
- 7: https://fossies.org/linux/openai-python/src/openai/_exceptions.py
🏁 Script executed:
# Check the import statement around line 13
head -20 unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py | cat -n
Repository: Zipstack/unstract
Length of output: 792
🏁 Script executed:
# Find the _RETRYABLE_ERROR_NAMES definition
rg -n "_RETRYABLE_ERROR_NAMES" unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py -A 5 -B 2
Repository: Zipstack/unstract
Length of output: 761
🏁 Script executed:
# Check if there are any other references to these exception names in the file
rg -n "APITimeoutError|ConnectError|PoolTimeout" unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py
Repository: Zipstack/unstract
Length of output: 43
🏁 Script executed:
# Get the complete _RETRYABLE_ERROR_NAMES definition
sed -n '23,35p' unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py | cat -n
Repository: Zipstack/unstract
Length of output: 429
🏁 Script executed:
# Get the complete code from lines 33-57 including the isinstance check
sed -n '33,57p' unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py | cat -n
Repository: Zipstack/unstract
Length of output: 1428
Fix import shadowing and add missing exception class names.
Line 13 imports ConnectionError from requests.exceptions, which shadows the built-in ConnectionError class. This means the isinstance check on line 42 is checking requests.exceptions.ConnectionError instead of the built-in, narrowing its reach to only requests-library errors rather than Python's native connection errors.
Additionally, _RETRYABLE_ERROR_NAMES is missing "APITimeoutError", which is raised by the OpenAI SDK (OpenAI maps HTTP 408 to APITimeoutError). This class is not in the allowlist, so timeouts from the OpenAI SDK will skip the name-based check and only be caught if they carry a status_code attribute set to 408—a fragile fallback.
Consider using the built-in ConnectionError and TimeoutError directly in the isinstance check (available in Python 3.10+), and add "APITimeoutError" to the exception names allowlist to ensure consistent coverage across litellm, openai, and httpx exception hierarchies.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 33 - 57,
The isinstance check in is_retryable_litellm_error currently uses a shadowed
ConnectionError (imported from requests.exceptions) which narrows detection to
requests-only errors; update the import/usage so the check uses the built-in
ConnectionError and TimeoutError types (i.e., remove or rename the requests
import and reference the built-ins in is_retryable_litellm_error) and also add
"APITimeoutError" to the _RETRYABLE_ERROR_NAMES allowlist so
OpenAI/OpenAI-mapped 408 timeout exceptions are recognized by the name-based
check.
🧹 Nitpick comments (2)
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py (2)
257-324: Consider extracting the inner `wrapper` function to reduce cognitive complexity.
SonarCloud flags cognitive complexity of 20 (allowed: 15). The nested decorator structure with try/except and conditional logging contributes to this. You could extract the "giving up" logging block or the entire wrapper body into a helper function.
Possible refactor approach
```diff
+def _log_final_failure(
+    logger_instance: logging.Logger,
+    func_name: str,
+    attempt: int,
+    prefix: str,
+) -> None:
+    """Log when retry attempts are exhausted."""
+    if attempt > 0:
+        logger_instance.exception(
+            "Giving up '%s' after %d attempt(s) for %s",
+            func_name,
+            attempt + 1,
+            prefix,
+        )
+
 def retry_with_exponential_backoff(...) -> Callable:
     ...
     def decorator(func: Callable) -> Callable:
         @wraps(func)
         def wrapper(*args: Any, **kwargs: Any) -> Any:
             for attempt in range(max_retries + 1):
                 try:
                     return fn()
                     ...
                 except exceptions as e:
                     delay = _get_retry_delay(...)
                     if delay is None:
-                        if attempt > 0:
-                            logger_instance.exception(
-                                "Giving up '%s' after %d attempt(s) for %s",
-                                func.__name__,
-                                attempt + 1,
-                                prefix,
-                            )
+                        _log_final_failure(logger_instance, func.__name__, attempt, prefix)
                         raise
                     time.sleep(delay)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 257 - 324, The wrapper inside retry_with_exponential_backoff is too complex; extract the retry loop or the "giving up" logging and sleep/raise logic into a helper function to reduce cognitive complexity. Create a private helper (e.g., _execute_with_retries or _handle_attempt) that accepts func, args, kwargs, max_retries, base_delay, multiplier, jitter, exceptions, retry_predicate, prefix, and logger_instance and returns the result or raises; have wrapper call this helper and keep wrapper only responsible for `@wraps` and invoking the helper; ensure the helper uses _get_retry_delay and preserves the existing logging messages (including the success log in wrapper when attempt > 0 or move that logging into the helper) so behavior is unchanged.
138-145: Consider tightening type hints for the `fn` parameter.
The current `Callable[[], object]` is imprecise:
- `acall_with_retry` awaits the result, so `fn` should return an awaitable
- `iter_with_retry` iterates the result, so `fn` should return an iterable

This won't cause runtime issues but weakens static type checking.
Suggested type hints:

```python
from collections.abc import Awaitable, Iterable
from typing import TypeVar

T = TypeVar("T")

async def acall_with_retry(
    fn: Callable[[], Awaitable[T]],
    *,
    max_retries: int,
    ...
) -> T:

def iter_with_retry(
    fn: Callable[[], Iterable[T]],
    *,
    max_retries: int,
    ...
) -> Generator[T, None, None]:
```

Also applies to: 160-167
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 138 - 145, Tighten the type hints for acall_with_retry and iter_with_retry: introduce a TypeVar T and change the fn parameter for acall_with_retry to Callable[[], Awaitable[T]] with the function returning T, and change iter_with_retry to accept Callable[[], Iterable[T]] and return Generator[T, None, None]; import Awaitable and Iterable from collections.abc (and Generator if not already present) and TypeVar from typing to make the signatures precise and improve static checking while keeping existing runtime behavior (refer to function names acall_with_retry and iter_with_retry and update their return annotations accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py`:
- Around line 257-324: The wrapper inside retry_with_exponential_backoff is too
complex; extract the retry loop or the "giving up" logging and sleep/raise logic
into a helper function to reduce cognitive complexity. Create a private helper
(e.g., _execute_with_retries or _handle_attempt) that accepts func, args,
kwargs, max_retries, base_delay, multiplier, jitter, exceptions,
retry_predicate, prefix, and logger_instance and returns the result or raises;
have wrapper call this helper and keep wrapper only responsible for `@wraps` and
invoking the helper; ensure the helper uses _get_retry_delay and preserves the
existing logging messages (including the success log in wrapper when attempt > 0
or move that logging into the helper) so behavior is unchanged.
- Around line 138-145: Tighten the type hints for acall_with_retry and
iter_with_retry: introduce a TypeVar T and change the fn parameter for
acall_with_retry to Callable[[], Awaitable[T]] with the function returning T,
and change iter_with_retry to accept Callable[[], Iterable[T]] and return
Generator[T, None, None]; import Awaitable and Iterable from collections.abc
(and Generator if not already present) and TypeVar from typing to make the
signatures precise and improve static checking while keeping existing runtime
behavior (refer to function names acall_with_retry and iter_with_retry and
update their return annotations accordingly).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 04b772de-c8c2-491a-9b94-28b8bcc53a9a
📒 Files selected for processing (1)
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py
…tries - Add APITimeoutError to _RETRYABLE_ERROR_NAMES for explicit OpenAI SDK timeout coverage - Add _validate_max_retries() guard to call_with_retry, acall_with_retry, iter_with_retry to fail fast on negative values instead of silently returning None Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Test Results
Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
🧹 Nitpick comments (3)
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py (3)
266-333: Consider reducing cognitive complexity.
SonarCloud flagged this function with cognitive complexity of 20 (allowed: 15). The nested conditionals within the exception handlers contribute to this. Consider extracting the "giving up" logging block (lines 319-325) into a helper or simplifying the control flow.
💡 Suggested refactor
```diff
+def _log_giving_up(
+    logger_instance: logging.Logger,
+    func_name: str,
+    attempt: int,
+    prefix: str,
+) -> None:
+    """Log when giving up after all retry attempts."""
+    if attempt > 0:
+        logger_instance.exception(
+            "Giving up '%s' after %d attempt(s) for %s",
+            func_name,
+            attempt + 1,
+            prefix,
+        )
+
 def retry_with_exponential_backoff(
     ...
 ) -> Callable:
     def decorator(func: Callable) -> Callable:
         @wraps(func)
         def wrapper(*args: Any, **kwargs: Any) -> Any:
             for attempt in range(max_retries + 1):
                 try:
                     result = func(*args, **kwargs)
                     if attempt > 0:
                         logger_instance.info(
                             "Successfully completed '%s' after %d retry attempt(s)",
                             func.__name__,
                             attempt,
                         )
                     return result
                 except exceptions as e:
                     delay = _get_retry_delay(...)
                     if delay is None:
-                        if attempt > 0:
-                            logger_instance.exception(
-                                "Giving up '%s' after %d attempt(s) for %s",
-                                func.__name__,
-                                attempt + 1,
-                                prefix,
-                            )
+                        _log_giving_up(logger_instance, func.__name__, attempt, prefix)
                         raise
                     time.sleep(delay)
                 except Exception:
                     raise
         return wrapper
     return decorator
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 266 - 333, The retry_with_exponential_backoff wrapper has high cognitive complexity due to nested exception handling and an inline "giving up" logging block; extract that block into a small helper (e.g., _log_giving_up or _handle_give_up) and call it from inside the except exceptions as e: branch when _get_retry_delay returns None, passing logger_instance, prefix, func.__name__, and attempt so the main wrapper focuses on control flow only, reducing nesting; keep the existing calls to _get_retry_delay and time.sleep unchanged and ensure the helper performs the logger.exception call and any formatting before re-raising the exception.
145-152: Type hint should reflect awaitable return type.
The `fn` parameter is awaited on line 158, so the type hint should be `Callable[[], Awaitable[object]]` rather than `Callable[[], object]`.
💡 Suggested fix

```diff
+from collections.abc import Awaitable
+
 async def acall_with_retry(
-    fn: Callable[[], object],
+    fn: Callable[[], Awaitable[object]],
     *,
     max_retries: int,
     retry_predicate: Callable[[Exception], bool],
     description: str = "",
     logger_instance: logging.Logger | None = None,
 ) -> object:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 145 - 152, Update the type hint for the `fn` parameter in `acall_with_retry` to reflect that it's awaited: change `Callable[[], object]` to `Callable[[], Awaitable[object]]`, and add the necessary `Awaitable` import from `typing` (or `collections.abc`) at the top of the module so the annotation is valid; keep the function's async signature and return annotation as-is.
168-175: Type hint should reflect iterable return type.
The `fn` result is iterated on line 186, so the type hint should reflect this, e.g., `Callable[[], Iterable[T]]`.
💡 Suggested fix

```diff
+from collections.abc import Iterable
+from typing import TypeVar
+
+T = TypeVar("T")
+
 def iter_with_retry(
-    fn: Callable[[], object],
+    fn: Callable[[], Iterable[T]],
     *,
     max_retries: int,
     retry_predicate: Callable[[Exception], bool],
     description: str = "",
     logger_instance: logging.Logger | None = None,
-) -> Generator:
+) -> Generator[T, None, None]:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py` around lines 168 - 175, The type hint for iter_with_retry is wrong because fn is expected to return an iterable; update the signature to use a generic TypeVar (e.g., T) and change fn's type to Callable[[], Iterable[T]] and the function's return type to Generator[T, None, None] so callers of iter_with_retry and the loop over fn() have correct typing; ensure you import or declare typing.TypeVar and Iterable where needed and update any related annotations in iter_with_retry accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py`:
- Around line 266-333: The retry_with_exponential_backoff wrapper has high
cognitive complexity due to nested exception handling and an inline "giving up"
logging block; extract that block into a small helper (e.g., _log_giving_up or
_handle_give_up) and call it from inside the except exceptions as e: branch when
_get_retry_delay returns None, passing logger_instance, prefix, func.__name__,
and attempt so the main wrapper focuses on control flow only, reducing nesting;
keep the existing calls to _get_retry_delay and time.sleep unchanged and ensure
the helper performs the logger.exception call and any formatting before
re-raising the exception.
- Around line 145-152: Update the type hint for the `fn` parameter in
`acall_with_retry` to reflect that it's awaited: change `Callable[[], object]`
to `Callable[[], Awaitable[object]]`, and add the necessary `Awaitable` import
from `typing` (or `collections.abc`) at the top of the module so the annotation
is valid; keep the function's async signature and return annotation as-is.
- Around line 168-175: The type hint for iter_with_retry is wrong because fn is
expected to return an iterable; update the signature to use a generic TypeVar
(e.g., T) and change fn's type to Callable[[], Iterable[T]] and the function's
return type to Generator[T, None, None] so callers of iter_with_retry and the
loop over fn() have correct typing; ensure you import or declare typing.TypeVar
and Iterable where needed and update any related annotations in iter_with_retry
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 22f7cf9d-7244-483f-90ab-c81ca438acc8
📒 Files selected for processing (1)
unstract/sdk1/src/unstract/sdk1/utils/retry_utils.py
pk-zipstack
left a comment
@chandrasekharan-zipstack Looks like there are some comments from sonarcloud. Please check them out. LGTM otherwise.



What
- `max_retries` in UI for embedding adapters (OpenAI, Azure, VertexAI, Ollama) and Ollama LLM — previously missing from the adapter form
- `_get_retry_delay()` helper used by all 4 retry paths

Why
- `max_retries` configured in the adapter UI was silently ignored for:
  - (`embedding_with_retries()` does not exist)
- `max_retries` in the UI, so users could not configure retries even if the backend supported them
- `call_with_retry`, `acall_with_retry`, `iter_with_retry`, and `retry_with_exponential_backoff`

How
- `litellm.completion()` / `litellm.embedding()` calls
- `max_retries=0` (SDK-level) + `num_retries=0` (litellm wrapper)
- `is_retryable_litellm_error()` predicate uses duck-typing (no litellm import) to detect transient errors: status codes 408/429/500/502/503/504, `ConnectionError`, `TimeoutError`, and litellm/httpx exception class names
- (`iter_with_retry`) only retries before the first chunk is yielded — mid-stream failures propagate immediately
- `_get_retry_delay()` shared helper — all retry functions delegate retry decisions to this single function

Can this PR break any existing features? If yes, please list possible items. If no, please explain why.
No breaking changes:
- `max_retries` in saved metadata: Pydantic defaults to `None`, code converts to `0` — no retry, identical to current behavior
- `max_retries=3`, users can adjust
- `litellm.completion()` / `litellm.embedding()`

Database Migrations
Env Config
`PLATFORM_SERVICE_MAX_RETRIES`, `PROMPT_SERVICE_MAX_RETRIES` etc. are unaffected.

Relevant Docs
- `max_retries` to `num_retries`

Related Issues or PRs
Dependencies Versions
Notes on Testing
Tested locally with docker containers against:
Screenshots
N/A — backend-only changes + JSON schema additions
Checklist
I have read and understood the Contribution Guidelines.