Description
LocalAI version:
- localai:latest
- localai:latest-gpu-nvidia-cuda-13
- images pulled today
Environment, CPU architecture, OS, and Version:
- CPU: Intel i7-10700K
- GPU: NVIDIA RTX 3080
- Windows 11, using Docker Engine on WSL
- also tried on macOS (M1 and M3), but the rerankers backend is not supported there
Describe the bug
/v1/rerank not working
I have been trying to use reranker .gguf models, but I always end up with an error 500.
I have tried a couple of combinations, using both the CPU and GPU images with different backends set in the model's .yaml.
Using the gallery model Qwen3-VL-Reranker-8B-GGUF with the GPU image, I get one of two errors depending on the backend. With llama-cpp, which the gallery sets by default:
Jan 18 23:09:02 ERROR Failed to load model modelID="Qwen3-VL-Reranker-8B-GGUF" error=failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF backend="llama-cpp" caller={caller.file="/build/pkg/model/initializers.go" caller.L=179 }
or, after updating the YAML to backend: rerankers (which uses the cuda13-rerankers backend):
api-1 | Jan 19 08:19:44 ERROR Failed to load model modelID="Qwen3-VL-Reranker-8B-GGUF" error=failed to load model with internal loader: could not load model (no success): Unexpected err=TypeError("APIRanker.__init__() missing 1 required positional argument: 'api_key'"), type(err)=<class 'TypeError'> backend="rerankers" caller={caller.file="/build/pkg/model/initializers.go" caller.L=179 }
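As a side note on the second error: the rerankers backend appears to wrap the AnswerDotAI rerankers Python package, and the TypeError comes from its APIRanker class, which handles hosted rerank APIs (Cohere, Jina, etc.) and requires an api_key. A minimal sketch of my understanding, assuming that package; the "cohere" name is just an illustration from its README, not what LocalAI actually passes:

from rerankers import Reranker

# API-backed rankers in the rerankers package need an explicit key,
# e.g. (per the package README):
ranker = Reranker("cohere", api_key="my-key")
# Constructing an API ranker without a key would plausibly fail with the
# same TypeError the backend logs above.

So it looks like the local GGUF model name is being routed to the API-ranker code path instead of a local model path.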
I have also tried GGUF models pulled directly from Hugging Face, with the same issues.
To Reproduce
- Run LocalAI with Docker Compose:
services:
  api:
    image: localai/localai:latest-gpu-nvidia-cuda-13
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
    volumes:
      - ./models:/models:cached
    # For NVIDIA GPUs, uncomment:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
- Download the reranker model Qwen3-VL-Reranker-8B-GGUF from the gallery.
- Call the endpoint with curl:
curl -X POST http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the benefits of exercise?",
    "documents": [
      "Regular exercise can improve cardiovascular health.",
      "Eating a balanced diet is important for overall well-being.",
      "Exercise helps in weight management and builds muscle strength.",
      "Reading books can expand your knowledge and vocabulary."
    ],
    "model": "Qwen3-VL-Reranker-8B-GGUF",
    "top_n": 2
  }'
- To reproduce the second error, update the Qwen3-VL-Reranker-8B-GGUF YAML file to use the rerankers backend; the commented lines below are my edits to the gallery model's YAML:
backend: llama-cpp
#backend: rerankers
description: Imported from https://huggingface.co/mradermacher/Qwen3-VL-Reranker-8B-GGUF
function:
  grammar:
    disable: true
known_usecases:
  - chat
  #- rerank
mmproj: llama-cpp/mmproj/Qwen3-VL-Reranker-8B.mmproj-f16.gguf
name: Qwen3-VL-Reranker-8B-GGUF
options:
  - use_jinja:true
parameters:
  model: llama-cpp/models/Qwen3-VL-Reranker-8B.Q4_K_M.gguf
template:
  use_tokenizer_template: true
The same issues occur with the gallery model jina reranker tiny. A possible cross-check against upstream llama.cpp is sketched below.
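As an extra data point, one could bypass LocalAI and test the GGUF against upstream llama.cpp directly. This is only a sketch, assuming a local llama-server build; as far as I know upstream exposes a --reranking flag and a /v1/rerank endpoint, which would help tell a broken GGUF apart from a broken backend wrapper:

# Serve the same GGUF with reranking enabled (port is arbitrary)
llama-server -m ./models/llama-cpp/models/Qwen3-VL-Reranker-8B.Q4_K_M.gguf --reranking --port 8081

curl http://localhost:8081/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the benefits of exercise?",
    "documents": ["Regular exercise can improve cardiovascular health.", "Reading books can expand your knowledge."]
  }'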
Expected behavior
Expected the endpoint to return a 200 response with the reranked documents.
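For reference, something shaped like the Jina/Cohere rerank responses is what I would expect; the field names and scores below are illustrative, not actual output:

{
  "model": "Qwen3-VL-Reranker-8B-GGUF",
  "results": [
    { "index": 2, "relevance_score": 0.91 },
    { "index": 0, "relevance_score": 0.87 }
  ],
  "usage": { "total_tokens": 42 }
}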
Logs
Additional context