[Router] 2381 unified interface for semantic_similarity and model classifier#2483

Merged
ramkrishna2910 merged 6 commits into main from router/2381-semantic_similarity_classifier-unified on Jun 30, 2026

@SlawomirNowaczyk

Collaborator

Implements the semantic_similarity classifier for the Lemonade Router and unifies it with the generic classifier type: both now produce a label → score map and are addressed by label + min_score/max_score band. reference_phrases becomes a concept → phrases map, so one classifier can score many concepts in a single embedding pass.
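A minimal sketch of what "addressed by label + min_score/max_score band" could look like, for readers who have not opened the diff. The type names `ScoreSketch` and `BandConditionSketch` are hypothetical, and two behaviors are assumptions of this sketch rather than confirmed engine semantics: a missing label reads as 0.0, and both band bounds are inclusive.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the router's score type: a label -> score map.
struct ScoreSketch {
    std::map<std::string, double> labels;

    // Assumption: a label the classifier did not emit reads as 0.0.
    double score_of(const std::string& label) const {
        const auto it = labels.find(label);
        return it == labels.end() ? 0.0 : it->second;
    }
};

// Hypothetical band condition. It only sees a label and a score, so it needs
// no knowledge of whether the score came from a model or from embeddings.
struct BandConditionSketch {
    std::string label;
    std::optional<double> min_score;  // unset means "no lower bound"
    std::optional<double> max_score;  // unset means "no upper bound"

    // Assumption: both bounds are inclusive.
    bool matches(const ScoreSketch& score) const {
        const double s = score.score_of(label);
        if (min_score && s < *min_score) return false;
        if (max_score && s > *max_score) return false;
        return true;
    }
};
```

With a score of `{coding: 0.82, smalltalk: 0.31}`, a condition on `coding` with `min_score = 0.7` would match, while one on `smalltalk` with `max_score = 0.3` would not.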

Rationale for the breaking change is here: https://github.com/lemonade-sdk/lemonade/blob/router/2381-semantic_similarity_classifier-unified/semantic_classifier_unification_rationale.md

What changed

  • Unified scoring contract. semantic_similarity fills Score::labels with one entry per concept (max cosine over that concept's reference phrases), read back via score_of(label) exactly like classifier. The condition layer (make_classifier_band_condition, leaf-factory label/default_label resolution) has no semantic-vs-model branch.
  • Multi-concept support. reference_phrases is now an object mapping each concept name (the classifier's output label) to its exemplar phrases. Input is embedded once and scored against every concept; reference embeddings are computed once and cached (mutex-guarded, since classifiers are shared across concurrent requests).
  • Removed the magic empty key. The old Score.labels[""] special case is gone. Score::primary() is now strict: it returns the lone entry of a genuinely label-less classifier and 0.0 otherwise, so a condition that omits label can never silently match an arbitrary label of a multi-label score.
  • Shared parsing. make_classifier factors out parse_labels / parse_default_label; the only type-specific step is parse_reference_phrases, which derives labels from the concept keys. semantic_similarity rejects an explicit labels field (concept names are authoritative).
  • Schema. route_policy.schema.json updates reference_phrases to a non-empty object of concept → non-empty string[], and forbids labels on semantic_similarity.
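The multi-concept scoring described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `SemanticSimilaritySketch`, `EmbedFn`, and `cosine` are hypothetical names, and the embedding backend is injected as a plain function so the sketch stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using Embedding = std::vector<double>;
// Hypothetical embedding backend; the real router calls an embedding service.
using EmbedFn = std::function<Embedding(const std::string&)>;
using ReferencePhrases = std::map<std::string, std::vector<std::string>>;

// Cosine similarity; returns 0.0 for mismatched sizes or zero-norm vectors.
inline double cosine(const Embedding& a, const Embedding& b) {
    if (a.size() != b.size()) return 0.0;
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

class SemanticSimilaritySketch {
public:
    SemanticSimilaritySketch(ReferencePhrases phrases, EmbedFn embed)
        : phrases_(std::move(phrases)), embed_(std::move(embed)) {}

    // Embeds the input once and returns concept -> max cosine over its phrases.
    std::map<std::string, double> score(const std::string& input) {
        const auto& refs = reference_embeddings();
        const Embedding query = embed_(input);
        std::map<std::string, double> labels;
        for (const auto& [concept_name, embeddings] : refs) {
            double best = -1.0;  // lower bound of cosine similarity
            for (const auto& e : embeddings) {
                best = std::max(best, cosine(query, e));
            }
            labels[concept_name] = best;
        }
        return labels;
    }

private:
    // Fills the cache on first use. The mutex guards the fill because one
    // classifier instance may serve concurrent requests; after the fill the
    // cache is never modified, so returning a reference is safe.
    const std::map<std::string, std::vector<Embedding>>& reference_embeddings() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!cached_) {
            std::map<std::string, std::vector<Embedding>> built;
            for (const auto& [concept_name, phrases] : phrases_) {
                for (const auto& p : phrases) {
                    built[concept_name].push_back(embed_(p));
                }
            }
            cache_ = std::move(built);  // publish only after a complete build
            cached_ = true;
        }
        return cache_;
    }

    ReferencePhrases phrases_;
    EmbedFn embed_;
    std::mutex mutex_;
    bool cached_ = false;
    std::map<std::string, std::vector<Embedding>> cache_;
};
```

The sketch relies on the schema guarantee that every concept has at least one phrase; an empty phrase list would leave that concept at the -1.0 floor. Building into a local map before publishing means a throwing backend does not leave a half-filled cache behind.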

Frozen-semantics / compatibility

  • reference_phrases shape changes from string[] to {concept: string[]} — a breaking change to the (unreleased) routing schema, intentional and the whole point of the unification.
  • Score::primary() redefined (lone-entry-or-0.0); the ""-key convention is no longer special anywhere in the engine. Worth a row in the frozen-semantics table in README.md.
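To make the redefined `Score::primary()` concrete, here is one way the "lone-entry-or-0.0" rule could be expressed. `StrictScoreSketch` is a hypothetical type, and treating "genuinely label-less" as "exactly one entry in the map" is an assumption of this sketch; the engine may detect the label-less case differently.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the engine's Score type.
struct StrictScoreSketch {
    std::map<std::string, double> labels;

    // Explicit lookup by label; assumption: a missing label reads as 0.0.
    double score_of(const std::string& label) const {
        const auto it = labels.find(label);
        return it == labels.end() ? 0.0 : it->second;
    }

    // Strict primary(): only a single-entry score has an unambiguous value.
    // Anything else (empty or multi-label) yields 0.0, so a condition that
    // omits `label` cannot pick up an arbitrary entry of a multi-label score.
    double primary() const {
        return labels.size() == 1 ? labels.begin()->second : 0.0;
    }
};
```

Under this reading, a multi-label score such as `{coding: 0.8, chat: 0.4}` has `primary() == 0.0`, and callers have to go through `score_of("coding")` to get the 0.8.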

Validation

  • C++ unit tests: test_routing_policy_semantic.cpp (multi-concept scoring, max-cosine, caching, on_error, failure paths) plus updates to test_routing_policy_registry.cpp, test_routing_policy_contract.cpp, test_routing_policy_evaluator.cpp, and the l2_semantic.json fixture / fake_classifier_services.h.
  • Run: ctest --output-on-failure -R "RoutingPolicy(Contract|Evaluator|Registry|Semantic)Test".

Closes

Closes #2381.

@SlawomirNowaczyk SlawomirNowaczyk changed the title [Router] 2381 unified interdace for semantic_similarity and model classifier [Router] 2381 unified interface for semantic_similarity and model classifier Jun 29, 2026
@github-actions github-actions Bot added the enhancement New feature or request label Jun 29, 2026

@eddierichter-amd eddierichter-amd left a comment

Contributor

Looks great. Two small changes requested.

Comment thread CMakeLists.txt
Comment thread src/cpp/resources/schemas/route_policy.schema.json
@ramkrishna2910

Contributor

Decision: we're consolidating on this unified approach for #2381 over the single-concept variant in #2482. Reasoning is timing — the routing schema is still unreleased, so collapsing semantic_similarity onto the generic classifier contract (and dropping the magic empty key) is free now and a breaking, corpus-and-migration cost after release. The rationale doc captures it well, and the multi-concept capability is a real bonus for no ongoing complexity.

Two things to land before merge (both mechanical, per Eddie's review):

  1. CI — add RoutingPolicySemanticTest to the ctest filter (RoutingPolicy(Contract|Evaluator|Registry|Semantic)Test) in both spots in cpp_server_build_test_release.yml.
  2. schema-lock: route_policy.schema.json changed but schema-lock.json wasn't refreshed. Regenerate it in this PR (python test/test_schema_lock.py --update) so the lock guard passes. The schema is still released: false, so this is a clean refresh, not a back-compat break.

Heads-up on sequencing: #2380 (deterministic conditions) also touches route_policy.schema.json + schema-lock.json (tightening keywords_any/keywords_all to minItems: 1). I'll rebase #2380 onto this once it merges and regenerate the lock there, so we don't collide on the hash.

@ramkrishna2910 ramkrishna2910 enabled auto-merge June 29, 2026 23:31
@ramkrishna2910 ramkrishna2910 added this pull request to the merge queue Jun 29, 2026
Merged via the queue into main with commit 71dae50 Jun 30, 2026
119 of 124 checks passed
@ramkrishna2910 ramkrishna2910 deleted the router/2381-semantic_similarity_classifier-unified branch June 30, 2026 00:36

Development

Successfully merging this pull request may close these issues.

[Router] semantic_similarity classifier (M5)

3 participants