feat: Evaluate Prompt Orchestration Conflict Resolution #309

Description

@thinhlpg

Evaluate and validate that USER instructions correctly override SERVER instructions when the two conflict.

Context

The prompt orchestration system composes prompts from multiple sources:

  • SERVER-side (developer-controlled): timing, system prompt, deep research, tools, code assistant, chain-of-thought
  • USER-side (user-controlled): project instruction, custom instruction, memory, tone/style

When these conflict (e.g., a server prompt asks for JSON output but a user instruction asks for Markdown), the user-side setting should take priority.
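The priority rule above can be written down as an expected-outcome helper for the evaluation. This is a minimal sketch, not the orchestration code: the `Instruction` type, the `key` label (e.g. `"output_format"`), and the function names are hypothetical and only illustrate "USER beats SERVER on the same key". The source names are taken from the lists above.

```python
from dataclasses import dataclass

# Source names copied from the Context lists; the grouping is the only fact used here.
SERVER_SOURCES = ("timing", "system prompt", "deep research", "tools",
                  "code assistant", "chain-of-thought")
USER_SOURCES = ("project instruction", "custom instruction", "memory", "tone/style")


@dataclass(frozen=True)
class Instruction:
    source: str  # one of the source names above
    key: str     # hypothetical label for what is being controlled, e.g. "output_format"
    value: str   # what the instruction asks for, e.g. "JSON"


def side(source: str) -> str:
    """Map a prompt source to the side that controls it."""
    if source in SERVER_SOURCES:
        return "SERVER"
    if source in USER_SOURCES:
        return "USER"
    raise ValueError(f"unknown prompt source: {source!r}")


def expected_winners(instructions: list[Instruction]) -> dict[str, Instruction]:
    """Return, per key, the instruction the model is expected to follow.

    The first instruction seen for a key is kept unless a USER-side
    instruction later conflicts with a SERVER-side one. Conflicts between
    two instructions on the same side are not resolved here.
    """
    winners: dict[str, Instruction] = {}
    for ins in instructions:
        ins_side = side(ins.source)  # also validates the source name
        current = winners.get(ins.key)
        if current is None:
            winners[ins.key] = ins
        elif side(current.source) == "SERVER" and ins_side == "USER":
            winners[ins.key] = ins
    return winners
```

For the JSON vs. Markdown example, passing a `system prompt` instruction with value `"JSON"` and a `custom instruction` with value `"Markdown"` under the same key yields the Markdown instruction as the expected winner, which is what the judge would then check the model output against.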

Models to test:

  • Jan 30B Instruct
  • Jan 30B Thinking

Scope

  • Test conflict scenarios: SERVER vs USER instruction priority
  • Test toggle behavior: Enable/disable optional server prompts one-by-one
  • Validate with LLM-as-judge evaluation
  • Document findings and edge cases
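One possible shape for the test matrix and the judge step is sketched below. Everything here is an assumption for illustration: the issue does not say which server prompts are optional, so they are passed in as a parameter, and the judge prompt wording, the verdict labels, and all function names are hypothetical. No model is called; the sketch only builds the inputs and parses a judge reply.

```python
from itertools import product

# Model names from the issue; used only as labels in the matrix.
MODELS = ("Jan 30B Instruct", "Jan 30B Thinking")


def toggle_configs(optional_prompts: list[str]) -> list[dict[str, bool]]:
    """All-on baseline, then one config per prompt with only that prompt disabled."""
    baseline = {name: True for name in optional_prompts}
    configs = [dict(baseline)]
    for name in optional_prompts:
        configs.append({**baseline, name: False})
    return configs


def build_matrix(scenarios: list[str], optional_prompts: list[str],
                 models: tuple[str, ...] = MODELS) -> list[dict]:
    """Cross every model with every conflict scenario and toggle config."""
    return [
        {"model": model, "scenario": scenario, "enabled": config}
        for model, scenario, config in product(
            models, scenarios, toggle_configs(optional_prompts)
        )
    ]


def judge_prompt(server_instruction: str, user_instruction: str, response: str) -> str:
    """Build a hypothetical LLM-as-judge prompt for one conflict case."""
    return (
        "Two instructions conflict. Decide which one the response follows.\n"
        f"SERVER instruction: {server_instruction}\n"
        f"USER instruction: {user_instruction}\n"
        f"Response:\n{response}\n"
        "Answer with exactly one word: USER, SERVER, or NEITHER."
    )


def parse_verdict(judge_output: str) -> str:
    """Normalize the judge reply; anything unexpected is flagged, not guessed."""
    word = judge_output.strip().upper()
    return word if word in {"USER", "SERVER", "NEITHER"} else "UNPARSEABLE"
```

With two optional prompts and one scenario, `build_matrix` produces six cases (two models times three toggle configs). Keeping `UNPARSEABLE` as its own verdict makes malformed judge replies show up in the findings rather than being silently counted as failures.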

Out of Scope

  • Changing the actual prompt orchestration implementation
  • Production deployment
