Local tool - benchmark and test #3

Open
falkorichter wants to merge 4 commits into main from local-tool

@falkorichter falkorichter commented Apr 7, 2026

  • Added a benchmark for the LLM
  • Reverted workspace/signal-faq-bot/INSTRUCTIONS.md to its original state

Created a ticket at Arthur-Ficial/apfel#47.

@falkorichter falkorichter self-assigned this Apr 7, 2026
@falkorichter (Member, Author) commented

@copilot please remove all the changes to the workspace/signal-faq-bot/INSTRUCTIONS.md file in this PR, reset it to its original state, and update the PR description.

Copilot AI commented Apr 7, 2026

@copilot please remove all the changes to the workspace/signal-faq-bot/INSTRUCTIONS.md file in this PR, reset it to its original state, and update the PR description.

Done. workspace/signal-faq-bot/INSTRUCTIONS.md has been reverted to its original state in commit 6256ac9, and the PR description has been updated.

Copilot AI left a comment

Pull request overview

This PR introduces a local benchmark prompt for evaluating the FAQ bot, and updates the Signal FAQ bot workspace content (instructions + Python dependencies).

Changes:

  • Add benchmark-prompt.md containing a sample user query plus 2-week calendar context and FAQ/instructions text.
  • Update workspace/signal-faq-bot/INSTRUCTIONS.md to embed the full FAQ content and adjust bot guidance.
  • Add workspace/signal-faq-bot/requirements.txt for calendar-related Python dependencies.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
workspace/signal-faq-bot/requirements.txt | Adds Python dependency list for the calendar fetch script.
workspace/signal-faq-bot/INSTRUCTIONS.md | Reworks the bot instruction prompt and inlines the FAQ content.
benchmark-prompt.md | Adds a benchmark prompt combining a sample question with calendar + FAQ/instructions context.

Comment thread benchmark-prompt.md
   Fahrradwerkstatt mit Khashayar
   ℹ️  Das Team der Fahrrad-Selbsthilfewerkstatt unterstützt euch bei der Reparatur eurer Bikes. 

Spenden bitte:  (https://35services.de/startseite/spenden/
Copilot AI Apr 7, 2026

The donation link line has an unclosed parenthesis: "Spenden bitte:  (https://35services.de/startseite/spenden/". Fix the punctuation so the markdown renders cleanly and the URL is easy to click or copy in the benchmark prompt.

Suggested change
- Spenden bitte:  (https://35services.de/startseite/spenden/
+ Spenden bitte: https://35services.de/startseite/spenden/
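Slips like this could be caught before the prompt file is committed. The following is a minimal sketch, not part of this PR: `unbalanced_paren_lines` is a hypothetical helper name, and it assumes a simple per-line check is enough for benchmark-prompt.md (it would flag legitimate multi-line parentheticals too).

```python
def unbalanced_paren_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line whose parentheses do not balance.

    Hypothetical lint helper: checks each line in isolation, so a
    parenthetical that intentionally spans several lines is also reported.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        depth = 0
        for ch in line:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:  # a ")" with no matching "(" before it
                    break
        if depth != 0:
            problems.append((number, line))
    return problems


sample = (
    "Fahrradwerkstatt (mit Khashayar)\n"
    "Spenden bitte:  (https://35services.de/startseite/spenden/\n"
)
flagged = unbalanced_paren_lines(sample)
```

Here `flagged` contains only the second line of `sample`, the one quoted in the review comment.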

Comment thread benchmark-prompt.md
Comment on lines +11 to +16
📅 Wednesday, 08.04.2026 19:00–21:00 Uhr
   Holz Öffnungszeit am Mittwoch
   ℹ️  Schwerpunkt: Holzarbeiten
Die "großen Maschinen" können nur genutzt werden, wenn jemand aus dem Verein anwesend ist, der eingewiesen ist. Wegen Lärmsc

📅 Wednesday, 08.04.2026 19:00–21:00 Uhr
Copilot AI Apr 7, 2026

Several calendar description lines in the benchmark input are cut off mid-word (e.g., “Wegen Lärmsc”, “Die "großen Maschi”). If this file is meant to be reused for consistent evaluation, consider cleaning these snippets (or adding an explicit ellipsis) so the benchmark prompt is well-formed and less ambiguous.
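If the cut-off comes from a fixed character limit in the calendar fetch script (an assumption; this thread does not say where the truncation happens), one option is to cut at a word boundary and append an explicit ellipsis. The sketch below uses a hypothetical helper `truncate_description` and an arbitrary default limit of 120 characters.

```python
def truncate_description(text: str, limit: int = 120, ellipsis: str = "…") -> str:
    """Shorten text to at most `limit` characters plus an explicit ellipsis.

    Hypothetical helper: the name and the default limit are illustrative,
    not taken from the PR.
    """
    text = " ".join(text.split())  # collapse runs of whitespace and newlines
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut lands inside a word, back up to the previous space
    # so the output never ends mid-word.
    if not text[limit].isspace():
        head, sep, _ = cut.rpartition(" ")
        if sep:
            cut = head
    return cut.rstrip(" ,;:") + ellipsis


full = (
    "Das Team der Fahrrad-Selbsthilfewerkstatt unterstützt euch "
    "bei der Reparatur eurer Bikes."
)
short = truncate_description(full, limit=30)
```

With `limit=30` the cut would land inside "Fahrrad-Selbsthilfewerkstatt", so `short` becomes "Das Team der…" rather than a fragment ending mid-word.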

Comment thread benchmark-prompt.md

📅 Monday, 20.04.2026 18:00–20:00 Uhr
   Holz Öffnungszeit Micha
   ℹ️  <span>Achtung: Am ersten Montag im Monat findet eine Vereinssitzung parallel statt. Dann fällt diese Öffungszeit aus.</span><br><br>Die "großen Maschi
Copilot AI Apr 7, 2026

The calendar snippet includes raw HTML (<span>…</span><br><br>) which makes the benchmark input noisy. If this prompt is used to evaluate answer quality, consider stripping HTML from event descriptions in the benchmark data (or normalizing it) so the model isn’t penalized for markup artifacts.

Suggested change
-    ℹ️  <span>Achtung: Am ersten Montag im Monat findet eine Vereinssitzung parallel statt. Dann fällt diese Öffungszeit aus.</span><br><br>Die "großen Maschi
+    ℹ️  Achtung: Am ersten Montag im Monat findet eine Vereinssitzung parallel statt. Dann fällt diese Öffungszeit aus. Die "großen Maschi
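Rather than editing each snippet by hand, the markup could be stripped when the event descriptions are generated. This is a minimal sketch using only Python's standard-library `html.parser`; `strip_html` and `_TextExtractor` are hypothetical names, and it assumes the descriptions contain only simple inline tags such as those shown above.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; line-breaking tags become a single space."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Keep words apart that were separated only by a break or block tag.
        if tag in ("br", "p", "div"):
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(description: str) -> str:
    """Return the description without tags and with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(description)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


raw = (
    "<span>Achtung: Am ersten Montag im Monat findet eine Vereinssitzung "
    "parallel statt. Dann fällt diese Öffungszeit aus.</span><br><br>"
    'Die "großen Maschi'
)
clean = strip_html(raw)
```

For the snippet from this thread, `clean` matches the text in the suggested change: the `<span>` wrapper is dropped and the two `<br>` tags collapse into one space.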


3 participants