/v1/rerank not working. Issues with backend integration? #8115

@aaskil

Description

LocalAI version:

  • localai:latest
  • localai:latest-gpu-nvidia-cuda-13
  • images pulled today

Environment, CPU architecture, OS, and Version:

  • cpu: intel i7-10700K
  • gpu: nvidia 3080
  • Windows 11, using Docker Engine on WSL
  • also tried on macOS (M1 and M3), but the rerankers backend is not supported there

Describe the bug

/v1/rerank not working

I have been trying to use reranker .gguf models, but I end up with error 500.
I have tried a couple of combinations, using both the CPU and GPU images, with different backends in the .yaml.

Using the gallery model Qwen3-VL-Reranker-8B-GGUF with the GPU image, I get one of two errors depending on the backend. With the llama-cpp backend, which the gallery sets:

Jan 18 23:09:02 ERROR Failed to load model modelID="Qwen3-VL-Reranker-8B-GGUF" error=failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF backend="llama-cpp" caller={caller.file="/build/pkg/model/initializers.go" caller.L=179 }
Or, after updating the YAML to backend: rerankers (using the cuda13-rerankers backend):

api-1 | Jan 19 08:19:44 ERROR Failed to load model modelID="Qwen3-VL-Reranker-8B-GGUF" error=failed to load model with internal loader: could not load model (no success): Unexpected err=TypeError("APIRanker.__init__() missing 1 required positional argument: 'api_key'"), type(err)=<class 'TypeError'> backend="rerankers" caller={caller.file="/build/pkg/model/initializers.go" caller.L=179 }

I have also tried GGUF files from Hugging Face directly, with the same issues.
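
The APIRanker error above suggests the Python rerankers backend treated the model string as an API-hosted reranker (which requires an api_key) rather than a local one. As a sanity check, one could point the rerankers backend at a plain Hugging Face cross-encoder instead of a GGUF path. A minimal sketch, assuming the backend accepts a Hugging Face model id in parameters.model (the id below is illustrative, not from the gallery):

# Hypothetical minimal config to isolate the rerankers backend; the model id
# is an example Hugging Face cross-encoder, not the GGUF from this issue.
name: test-reranker
backend: rerankers
known_usecases:
    - rerank
parameters:
    model: cross-encoder/ms-marco-MiniLM-L-6-v2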

To Reproduce

  • Run LocalAI with Docker Compose:
services:
  api:
    image: localai/localai:latest-gpu-nvidia-cuda-13
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
    volumes:
      - ./models:/models:cached
    # For NVIDIA GPUs, uncomment:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  • download the reranker model Qwen3-VL-Reranker-8B-GGUF from the gallery
  • curl the model with:
curl -X POST http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the benefits of exercise?",
    "documents": [
      "Regular exercise can improve cardiovascular health.",
      "Eating a balanced diet is important for overall well-being.",
      "Exercise helps in weight management and builds muscle strength.",
      "Reading books can expand your knowledge and vocabulary."
    ],
    "model": "Qwen3-VL-Reranker-8B-GGUF",
    "top_n": 2
  }'
  • also try updating the Qwen3-VL-Reranker-8B-GGUF YAML file to use the rerankers backend, which produces the second error (see the toggled variant after the file below)
  • the comments are where I changed the gallery model's YAML file:
backend: llama-cpp
#backend: rerankers
description: Imported from https://huggingface.co/mradermacher/Qwen3-VL-Reranker-8B-GGUF
function:
    grammar:
        disable: true
known_usecases:
    - chat
    #- rerank
mmproj: llama-cpp/mmproj/Qwen3-VL-Reranker-8B.mmproj-f16.gguf
name: Qwen3-VL-Reranker-8B-GGUF
options:
    - use_jinja:true
parameters:
    model: llama-cpp/models/Qwen3-VL-Reranker-8B.Q4_K_M.gguf
template:
    use_tokenizer_template: true
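
For clarity, the rerankers variant that triggers the second error is the same file with the comments toggled:

backend: rerankers
known_usecases:
    - rerank
# all other fields unchanged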

The same issues occur with the jina reranker tiny gallery model.

Expected behavior

Expected to return a response with reranked documents.
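
Roughly a Jina-style rerank response with the top_n=2 highest-scoring documents, e.g. (scores are illustrative and the exact field names are an assumption):

{
  "model": "Qwen3-VL-Reranker-8B-GGUF",
  "results": [
    { "index": 2, "relevance_score": 0.93 },
    { "index": 0, "relevance_score": 0.87 }
  ]
}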

Logs

Additional context
