feat: add OpenAI-compatible provider for local LLM support#1

Open
marcs7 wants to merge 1 commit into wolfsoftwaresystemsltd:master from marcs7:feat/openai-provider

Conversation


marcs7 commented Mar 19, 2026

Summary

Adds a third AI provider option ("OpenAI Compatible") that works with any OpenAI-compatible API endpoint, enabling local LLM usage via tools like
LiteLLM, Ollama, vLLM, llama.cpp, etc.

This allows WolfStack users to run AI features entirely self-hosted without requiring external API keys from Anthropic or Google.

Changes

Backend (src/ai/mod.rs)

  • Added openai_api_key and openai_base_url fields to AiConfig
  • New call_openai() function implementing OpenAI /v1/chat/completions API
  • Provider routing for "openai" in chat(), health_check(), and analyze_issue()
  • Model listing via /v1/models endpoint
  • API key is optional (supports unauthenticated local endpoints)
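The optional-key behaviour can be sketched as follows — a minimal illustration (not the actual WolfStack code; the function name is hypothetical) of only attaching bearer auth when a non-empty key is configured, so unauthenticated local endpoints work out of the box:

```rust
// Sketch only: an OpenAI-compatible endpoint may run without
// authentication, so the Authorization header is attached only when a
// non-empty API key is configured.
fn auth_header(openai_api_key: Option<&str>) -> Option<String> {
    openai_api_key
        .filter(|k| !k.trim().is_empty())
        .map(|k| format!("Bearer {}", k))
}
```

A blank or missing key simply produces no header, rather than sending an empty `Authorization` value that some local servers reject.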

Backend (src/api/mod.rs)

  • ai_save_config handles new openai_api_key and openai_base_url fields
  • Config syncs across cluster nodes

Frontend (web/index.html)

  • "OpenAI Compatible (Local LLM)" option in provider dropdown
  • Base URL and API Key fields (shown only when OpenAI provider selected)
  • Text input for model name (replaces dropdown when using OpenAI, since model names are user-defined)

Frontend (web/js/app.js)

  • Load/save/test functions handle new fields
  • Dynamic show/hide of OpenAI fields and model input based on provider selection

Testing

Tested with:

  • LiteLLM proxy → llama.cpp (Qwen3-Coder-30B, local Vulkan inference)
  • Chat, health checks, and connection test all working
  • Config persists across restarts and syncs to cluster nodes

Use Case

Many self-hosted users run local LLMs for privacy, cost, or latency reasons. This PR enables WolfStack's AI features (chat assistant, health monitoring,
issue analysis) to work with any OpenAI-compatible endpoint — no cloud API keys required.


wolfsoftwaresystemsltd pushed a commit that referenced this pull request Apr 16, 2026
Folded WolfSecure's scanner logic into the main wolfstack binary as a new src/security.rs module — no more separate plugin, no per-node install dance, no architecture-mismatch pain. Findings flow into the existing System Check UI under a 'Security' category, into the periodic alerting pipeline with per-finding cooldown, and into the AI health-check prompt so the assistant treats active threats as top-priority.

Posture checks:
  - Risky services listening on 0.0.0.0 (Docker 2375, Redis, MongoDB, Postgres, MySQL, Elasticsearch, memcached, Kibana)
  - /etc/wolfstack/ config files that are world- or group-readable
  - Cluster secret too short or low-entropy
  - sshd_config with PermitRootLogin yes / PasswordAuthentication yes
  - SSH exposed without fail2ban or sshguard running

Active-attack detection:
  - SSH brute-force in progress (>=10 failed auths from one IP within the last 5 minutes, via journalctl with auth.log fallback that filters by leading syslog timestamp)
  - Known crypto-miner processes (xmrig, minerd, cpuminer, ethminer, t-rex, nbminer, lolminer, cgminer, and friends) — the #1 post-compromise payload
  - Recent executable files in /tmp or /dev/shm (<24h, classic malware staging)
  - Established outbound connections to RAT / C2 / mining-pool ports (4444, 6667, 9001, 14444, etc) excluding RFC1918 peers
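The outbound-connection heuristic can be sketched like this — an illustrative simplification (function name hypothetical, not the shipped scanner): flag a peer only when its port is on the suspicious list and the address is neither private (RFC 1918) nor loopback:

```rust
use std::net::Ipv4Addr;

// Illustrative sketch: flag an established outbound peer when its port
// is a known RAT/C2/mining-pool port and the address is not an RFC 1918
// or loopback peer (those are normal intra-cluster traffic).
fn is_suspicious_peer(peer: &str, c2_ports: &[u16]) -> bool {
    let Some((ip_str, port_str)) = peer.rsplit_once(':') else {
        return false;
    };
    let (Ok(ip), Ok(port)) = (ip_str.parse::<Ipv4Addr>(), port_str.parse::<u16>()) else {
        return false;
    };
    c2_ports.contains(&port) && !ip.is_private() && !ip.is_loopback()
}
```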

Integration:
  - GET /api/system-check now runs dependency + security checks in parallel via tokio::join!, appending findings to the existing checks array — frontend renders 'Security' as a new category automatically
  - Background tokio task runs the security scan every 5 min, fires alerts via alerting::send_alert() on critical findings (Discord/Slack/Telegram/email), with a per-check-name 1h cooldown so a prolonged attack doesn't spam channels
  - AI health_check() loads security findings and threads them into the LLM prompt so alerts + proposed [ACTION] fixes account for active threats, not just CPU/RAM/disk
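The per-check-name cooldown described above can be sketched as — a hypothetical simplification (struct and method names are illustrative, not the real module): an alert for a given check name fires at most once per window, so a prolonged attack doesn't flood the channels:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative sketch of the per-check-name alert cooldown: fire at
// most once per window per check name.
struct AlertCooldown {
    last_fired: HashMap<String, Instant>,
    window: Duration,
}

impl AlertCooldown {
    fn should_alert(&mut self, check_name: &str, now: Instant) -> bool {
        match self.last_fired.get(check_name) {
            // Still inside the cooldown window: suppress the repeat.
            Some(&t) if now.duration_since(t) < self.window => false,
            // First firing, or window elapsed: record and alert.
            _ => {
                self.last_fired.insert(check_name.to_string(), now);
                true
            }
        }
    }
}
```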

Bugs caught in pre-commit review (all fixed):
  - ssh_is_exposed was reading the wrong field from ss -tlnp (when the Netid column is dropped, field indexing shifts); now uses ss -tulnp with an explicit tcp filter
  - scan_outbound_suspicious assumed 5 fields but ss suppresses State column when filtered — fixed to use fields[3]
  - auth.log fallback used to read the entire log; now tails 2000 lines AND filters by leading syslog timestamp to enforce the 5-minute window
  - Alert cooldown key stripped of dynamic counts so 'SSH brute-force (3 IPs, 47 attempts)' doesn't re-alert every scan
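The fields[3] fix above amounts to the following — a sketch under the assumption (stated in the commit) that filtered `ss` output suppresses the State column, leaving Recv-Q, Send-Q, Local Address:Port, Peer Address:Port:

```rust
// Illustrative sketch of the fields[3] fix: with the State column
// suppressed, a filtered `ss` line whitespace-splits into
//   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
// so the peer address is at index 3, not 4.
fn peer_addr(ss_line: &str) -> Option<&str> {
    ss_line.split_whitespace().nth(3)
}
```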
wolfsoftwaresystemsltd pushed a commit that referenced this pull request May 1, 2026
Three urgent fixes targeting the Discord reports about local AI not
working in WolfStack despite working via curl, plus Gary's restart
bubble bug.

1. Local-model tool calling — recover from content-string emissions
   Reproduced live on this box against Ollama's qwen2.5-coder:32b: the
   model emits tool calls as JSON *inside* `message.content` instead
   of in the structured `tool_calls` array. WolfStack's previous
   parser only read `tool_calls`, so users got raw JSON as the "AI's
   reply" and tool dispatch never fired. Common with smaller or
   not-tool-fine-tuned local models (qwen2.5 below 7B, llama3.2
   variants, gemma tunes, FunctionGemma).

   New `extract_tool_calls_from_content` in `src/ai/mod.rs` recognises
   six common content-side wire formats and translates them into the
   same bracket-tag pipeline the rest of the code uses:
     • bare `{"name": "fn", "arguments": {…}}` object
     • bare array `[{"name": "fn", "arguments": {…}}, …]`
     • OpenAI-shape `{"function": {…}}` inside content
     • Fenced ```json blocks
     • `<tool_call>…</tool_call>` and `<function_call>…</function_call>`
       XML wrappers (FunctionGemma + qwen tool-call dialect)
     • Mistral's `[TOOL_CALLS]` prefix

   **Critical safety property**: takes an `allowed_tool_names` param
   and drops any extracted call whose name isn't on the caller's
   allowlist. Without that, prose like `{"name": "nginx", "status":
   "stopped"}` (a model explaining service state) would synthesise a
   phantom tool call — on a sysadmin platform that's potential
   command-injection-via-AI-response. The allowlist closes that.
   `MAIN_AI_TOOLS` constant gates the chat path; WolfAgents passes
   each agent's `allowed_tools` set.

   The `<tool_call>` XML stripper is anchored at position 0 of the
   trimmed string, so a mid-prose echo of the tag can't trigger
   extraction. The fenced-block close is detected with a proper
   `rfind("```")` instead of trimming all trailing backticks.
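Two of the safety properties above — the position-0 anchor and the allowlist gate — can be sketched as follows. This is a hypothetical simplification: the names are illustrative and the "name" lookup is naive string scanning where the real code would JSON-parse the payload:

```rust
// Naive illustration only: real code would JSON-parse the payload.
fn tool_name(json: &str) -> Option<&str> {
    let after = json.split_once("\"name\"")?.1;
    let after = after.split_once(':')?.1.trim_start();
    after.strip_prefix('"')?.split('"').next()
}

// Recover a <tool_call>…</tool_call> wrapper, but only when it starts
// the trimmed content and names an allowlisted tool.
fn recover_tool_call<'a>(content: &'a str, allowed: &[&str]) -> Option<&'a str> {
    let trimmed = content.trim();
    // Anchored at position 0: a mid-prose echo of the tag never matches.
    let rest = trimmed.strip_prefix("<tool_call>")?;
    let inner = rest.split("</tool_call>").next()?.trim();
    let name = tool_name(inner)?;
    // Allowlist gate: unknown names are dropped, never synthesised into calls.
    allowed.contains(&name).then_some(inner)
}
```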

   14 unit tests across `content_tool_call_tests` covering every
   wire format, the rejection cases, and two regression guards
   pinned by name (`unknown_tool_name_is_dropped`,
   `mid_prose_xml_wrapper_does_not_match`).

2. WolfAgents same recovery path
   `src/wolfagents/agent_loop.rs::openai_tool_loop` had the same
   blind spot — model's content-side tool calls returned to the
   user as raw JSON. Recovery synthesises into the existing
   `tool_calls_json` so the rest of the multi-round loop works
   unchanged. Tool-call IDs use `call_{:016x}` (timestamp + counter
   + agent-id-length mix) so a future swap to OpenAI proper, which
   validates the ID shape, doesn't reject the recovered turn's
   history.
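The ID scheme can be illustrated as — a deliberately simplified sketch: the commit mixes a timestamp, counter, and agent-id length into the value, which is elided here; the point is the `call_` prefix plus 16 hex digits, the shape the commit says OpenAI validates:

```rust
// Hypothetical simplification: the real mix combines timestamp,
// counter, and agent-id length; only the `call_` + 16-hex-digit shape
// matters for providers that validate tool-call ID format.
fn synthetic_tool_call_id(mix: u64) -> String {
    format!("call_{:016x}", mix)
}
```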

3. AI chat bubble visibility on restart (Gary KO4BSR)
   `web/index.html`'s page-load visibility check only inspected
   `has_claude_key || has_gemini_key` — Local / OpenAI / OpenRouter
   users lost the red AI bubble every wolfstack restart and had to
   re-save Settings → AI Agent to bring it back (the Save handler
   force-shows it). Now checks all five provider fields.

Diagnostic logging
- `call_local`: previously-debug-only logs escalated to info/warn
  so the next "local AI not responding" report has actionable
  signal at default log levels.
- Empty-response error message now includes finish_reason +
  body_size so context-overflow on small models is identifiable
  from the UI.
- Body-size measurement at debug level (RUST_LOG=wolfstack::ai=debug
  to enable) — surfaces the #1 small-model failure mode.

Independent code-reviewer pass: two BLOCKERs (mid-prose XML match,
unrestricted tool-name acceptance) + four MAJORs (synthetic ID
format, fenced-block stripping, info-log flooding) all addressed
before commit. All findings have negative-case regression tests
pinning the fix.