/v1/rerank not working. Issues with backend integration? #8115

@aaskil

Description

LocalAI version:

  • localai:latest
  • localai:latest-gpu-nvidia-cuda-13
  • images pulled today

Environment, CPU architecture, OS, and Version:

  • cpu: intel i7-10700K
  • gpu: nvidia 3080
  • Windows 11, using Docker Engine on WSL
  • also tried on macOS (M1 and M3), but the rerankers backend is not supported there

Describe the bug

/v1/rerank not working

I have been trying to use reranker .gguf models, but I end up with error 500.
I have tried a couple of combinations, using both the CPU and GPU images, with different backends in the .yaml.

Using the gallery model Qwen3-VL-Reranker-8B-GGUF with the GPU image, I get one of two errors depending on the backend. With the llama-cpp backend, which the gallery sets:

Jan 18 23:09:02 ERROR Failed to load model modelID="Qwen3-VL-Reranker-8B-GGUF" error=failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF backend="llama-cpp" caller={caller.file="/build/pkg/model/initializers.go" caller.L=179 }
Or, after updating the YAML to backend: rerankers (using the cuda13-rerankers backend):

api-1 | Jan 19 08:19:44 ERROR Failed to load model modelID="Qwen3-VL-Reranker-8B-GGUF" error=failed to load model with internal loader: could not load model (no success): Unexpected err=TypeError("APIRanker.__init__() missing 1 required positional argument: 'api_key'"), type(err)=<class 'TypeError'> backend="rerankers" caller={caller.file="/build/pkg/model/initializers.go" caller.L=179 }

I have also tried GGUF files from Hugging Face directly, with the same issues.
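
The APIRanker error above suggests the Python rerankers backend treated the model string as an API-hosted reranker (which requires an api_key) rather than a local one. As a sanity check, one could point the rerankers backend at a plain Hugging Face cross-encoder instead of a GGUF path. A minimal sketch, assuming the backend accepts a Hugging Face model id in parameters.model (the id below is illustrative, not from the gallery):

# Hypothetical minimal config to isolate the rerankers backend; the model id
# is an example Hugging Face cross-encoder, not the GGUF from this issue.
name: test-reranker
backend: rerankers
known_usecases:
    - rerank
parameters:
    model: cross-encoder/ms-marco-MiniLM-L-6-v2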

To Reproduce

  • Run LocalAI with Docker Compose:
services:
  api:
    image: localai/localai:latest-gpu-nvidia-cuda-13
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
    volumes:
      - ./models:/models:cached
    # For NVIDIA GPUs, uncomment:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  • download the reranker model Qwen3-VL-Reranker-8B-GGUF from the gallery
  • curl the model with:
curl -X POST http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the benefits of exercise?",
    "documents": [
      "Regular exercise can improve cardiovascular health.",
      "Eating a balanced diet is important for overall well-being.",
      "Exercise helps in weight management and builds muscle strength.",
      "Reading books can expand your knowledge and vocabulary."
    ],
    "model": "Qwen3-VL-Reranker-8B-GGUF",
    "top_n": 2
  }'
  • also try updating the Qwen3-VL-Reranker-8B-GGUF YAML file to use the rerankers backend, which produces the second error (see the toggled variant after the file below)
  • the comments are where I changed the gallery model's YAML file:
backend: llama-cpp
#backend: rerankers
description: Imported from https://huggingface.co/mradermacher/Qwen3-VL-Reranker-8B-GGUF
function:
    grammar:
        disable: true
known_usecases:
    - chat
    #- rerank
mmproj: llama-cpp/mmproj/Qwen3-VL-Reranker-8B.mmproj-f16.gguf
name: Qwen3-VL-Reranker-8B-GGUF
options:
    - use_jinja:true
parameters:
    model: llama-cpp/models/Qwen3-VL-Reranker-8B.Q4_K_M.gguf
template:
    use_tokenizer_template: true
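
For clarity, the rerankers variant that triggers the second error is the same file with the comments toggled:

backend: rerankers
known_usecases:
    - rerank
# all other fields unchanged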

The same issues occur with the jina reranker tiny gallery model.

Expected behavior

Expected to return a response with reranked documents.
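
Roughly a Jina-style rerank response with the top_n=2 highest-scoring documents, e.g. (scores are illustrative and the exact field names are an assumption):

{
  "model": "Qwen3-VL-Reranker-8B-GGUF",
  "results": [
    { "index": 2, "relevance_score": 0.93 },
    { "index": 0, "relevance_score": 0.87 }
  ]
}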

Logs

Additional context
