Skip to content

feat: add schema modification MCP tools (add-fields, add-field-types)#131

Open
adityamparikh wants to merge 18 commits into
apache:mainfrom
adityamparikh:schema-modification
Open

feat: add schema modification MCP tools (add-fields, add-field-types)#131
adityamparikh wants to merge 18 commits into
apache:mainfrom
adityamparikh:schema-modification

Conversation

@adityamparikh
Copy link
Copy Markdown
Contributor

Summary

  • Adds add-fields and add-field-types MCP tools to extend a Solr collection's schema additively through the MCP layer.
  • Partially closes Add schema and configuration modification tools #30. Replace/delete operations, add-copy-field, add-dynamic-field, and add-codec-factory are deliberately deferred (see spec for rationale).
  • Both tools take List<Map<String, Object>> matching Solr's Schema API JSON shape; batched via SchemaRequest.MultiUpdate. No transformation layer.
  • Both tools are @PreAuthorize("isAuthenticated()") — HTTP enforces auth, stdio bypasses (same pattern as existing tools).
  • Renames metadata/ package to schema/ since it only ever held schema-related types.

Design and plan

Why two tools instead of one combined? LLM tool-use ergonomics: single-purpose tools eliminate cross-wiring risk between field and field-type definitions. The orphan-field-type cost of separation is harmless.

Why no replace/delete? Schema replace/delete silently breaks indexed data without an explicit reindex. AI-driven workflows are the wrong place to expose that footgun without a guardrail design.

Why Map<String, Object> instead of records? Solr field-type shape includes analyzers/tokenizers/filters with class-specific param bags; records collapse to Map<String, Object> at the leaves anyway. Map shape matches SolrJ's AddField(Map) constructor — zero transformation.

Tool descriptions include inline recipes

add-field-types description includes three common recipes inline (case-insensitive exact match, dense vector for semantic search, autocomplete) to give the LLM a diagnostic-to-fix bridge — symptom in user prompt → recipe in description → exact analyzer chain.

Test plan

  • Unit tests (Mockito, @DisabledInNativeImage) — validation, happy path, error propagation, single-analyzer, separate index/query analyzers, DenseVectorField
  • Integration tests (Testcontainers, real Solr) — schema round-trip, custom analyzer behavior at index + query time, DenseVectorField round-trip, duplicate-field / unknown-type error propagation
  • MCP protocol tests in McpClientIntegrationTestBase — end-to-end add-fieldsindex-json-documentssearch workflow, exercised against both HTTP and stdio transports
  • ./gradlew clean build — BUILD SUCCESSFUL, all tests pass
  • ./gradlew jibDockerBuild — Jib JVM image built
  • ./gradlew bootBuildImage -Pnative — Paketo native stdio image built
  • ./gradlew bootBuildImage -Pnative -Pprofile=http — Paketo native HTTP image built
  • ./gradlew nativeTest -Pnative — 168/168 native tests pass (113 Mockito tests appropriately skipped)

The nativeTest run also caught a real native-image reflection gap: Jackson.convertValue(map, AnalyzerDefinition.class) needs reflection metadata for AnalyzerDefinition (and FieldTypeDefinition defensively). Fixed in fix(native): register AnalyzerDefinition and FieldTypeDefinition for reflection.

🤖 Generated with Claude Code

adityamparikh and others added 17 commits May 17, 2026 01:20
Spec and plan for adding add-fields and add-field-types MCP tools per
issue #30. See docs/superpowers/specs/ and docs/superpowers/plans/.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
… tools

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
…oject DTO convention

@JsonIgnoreProperties, @JsonInclude(NON_NULL), and @jsonformat on the
timestamp field match the pattern used by every record in Dtos.java.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
Package previously named "metadata" only contained schema-related types
(SchemaService, SchemaUpdateResult, and their tests). Renaming to "schema"
makes the package name accurate to its contents.

Moves preserve git history via git mv. Imports updated in Main, MainTest,
and McpToolRegistrationTest. Spec and plan paths updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
Closes part of #30. Adds one or more fields per call via SolrJ's
SchemaRequest.MultiUpdate. Input is List<Map<String, Object>> matching
the Solr Schema API add-field JSON shape; validation is limited to
collection name and non-empty list (Solr returns clear errors for
malformed field defs).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
String.valueOf(null) returns the literal "null"; direct cast yields a
real null on missing key, which makes the result honest about input
shape (Solr's error surfaces before any result is returned).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
…elper

Supports single analyzer, separate index/query analyzers, and
non-analyzer field types like DenseVectorField. Manual conversion from
flat input map to SolrJ FieldTypeDefinition because name/class go into
attributes map and analyzers are typed sub-objects.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
…es test

Aligns addFieldTypes_blankCollection_throws with the parallel
addFields_blankCollection_throws test which already covers
null + empty + whitespace.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
…flection

Same pattern as the other @mcptool response records — invisible to AOT
because MCP dispatches via Object.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
End-to-end against real Solr via Testcontainers. Verifies schema
round-trip, custom analyzer behavior, vector field type registration,
and error propagation on duplicate field / unknown type.

SolrJ's MultiUpdate.process() throws natively on Schema API errors,
so no explicit response-body inspection was needed in SchemaService.

Note: SolrJ 10 moved SolrQuery to
org.apache.solr.client.solrj.request.SolrQuery; vectorDimension
attribute is returned as String by the schema API, handled via
toString/parseInt.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
Adds ordered tests 16-18 exercising the add-fields → index → search
workflow through the MCP protocol against both HTTP and stdio transports.
Also asserts add-fields and add-field-types appear in listTools output.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
…n tools

Reflects the metadata→schema package rename and the new add-fields and
add-field-types capabilities.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
…reflection

GraalVM native test caught that Jackson's convertValue(map, AnalyzerDefinition.class)
in SchemaService.toAnalyzerDefinition fails at runtime without reflection metadata.
The spec anticipated this; adding both SolrJ types defensively.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
The get-schema MCP tool returns SolrJ's SchemaRepresentation, which
Spring AI serializes to JSON for MCP clients. Without reflection
metadata in the native image, the JSON Spring AI produces is missing
the fields/fieldTypes/dynamicFields/copyFields arrays — silently
breaking any consumer that introspects the schema.

JVM tests didn't catch this because the pre-existing get-schema test
only asserts the response is non-empty. The new end-to-end shows
workflow in the next commit parses the schema JSON, which surfaced
the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
Adds 9 new ordered MCP-protocol tests (orders 19-27) covering the
full workflow an LLM would drive against the shows-collection use
case from issue #30:

  1. create-collection  → "shows"
  2. add-fields         → 16 user-defined fields (title, platform,
                          genres, release_year, ..., imdb_rating,
                          description, tags)
  3. index-json-documents → 61 docs from src/test/resources/shows.json
                            (Netflix, Prime Video, HBO Max, Disney+,
                             Apple TV+, Hulu, Peacock, Paramount+)
  4. search             → numFound=61
  5. search + facet     → platform facet returns Netflix=20, Prime=20
  6. search + filter    → multi-valued genres:Sci-Fi
  7. search + keyword   → description:apocalyptic with platform filter
  8. get-schema         → all 16 added fields present
  9. get-collection-stats → numDocs=61

Because the test base class is reused by both HTTP and stdio MCP
client transports and is also compiled into the GraalVM native test
binary, these tests exercise all four combinations:

  - JVM + stdio (McpClientStdioIntegrationTest)
  - JVM + HTTP  (McpClientIntegrationTest)
  - Native + stdio (via nativeTest -Pnative)
  - Native + HTTP  (via nativeTest -Pnative)

The shows collection inherits the same _default configset that
prior tests in this class have already modified via schemaless
indexing and add-fields against mcp-client-test. addShowsSchema()
calls get-schema first to filter the desired field list to only
the fields not already present in the shared configset's
managed-schema — which is exactly what the add-fields tool
description tells the LLM to do.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
…keys

Two correctness fixes flagged by the Solr-expert review of PR apache#131:

1. MultiUpdate IS transactional, not sequential
   Per the Solr Schema API reference guide and SolrJ's
   SchemaRequest.MultiUpdate Javadoc, all commands in a single call
   either succeed or fail together. The previous tool descriptions
   wrongly told the LLM "commands run in input order; if one fails
   mid-batch, prior commands remain applied" / "partial application
   possible on failure" — which would lead an LLM to issue partial-
   recovery commands that double-apply on retry. Updated both
   add-fields and add-field-types descriptions and the spec doc to
   say "Solr's Schema API is transactional — if any command in the
   batch fails, none are applied."

2. toAnalyzerDefinition silently dropped unknown analyzer keys
   SolrJ's AnalyzerDefinition only exposes typed setters for
   charFilters, tokenizer, and filters. A naive
   objectMapper.convertValue(raw, AnalyzerDefinition.class) silently
   drops every other top-level analyzer key (class,
   luceneMatchVersion, positionIncrementGap, ...). This broke the
   valid single-class analyzer form
   {"analyzer":{"class":"solr.WhitespaceAnalyzer"}} which the Solr
   Ref Guide documents for the StandardAnalyzer / WhitespaceAnalyzer /
   KeywordAnalyzer / per-language analyzer (ArabicAnalyzer etc.)
   patterns.

   Rewrote the helper to manually split the map: charFilters,
   tokenizer, filters go through the typed setters; everything else
   is preserved via setAttributes(). Added a unit-test regression
   guard that serializes the captured MultiUpdate's wire body and
   asserts the analyzer-level class key is present.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
Apply the same drop-noise treatment to SchemaUpdateResult that PR apache#132
applies to Dtos.java records:

- success (boolean) — always true on return; failures throw before
  reaching the result.
- timestamp (Date) — always "now"; sub-second operation, MCP host
  records call timing already.

After this change, SchemaUpdateResult is just (collection, addedNames):
both fields carry real information. addedNames echoes the field names
back so the LLM can confirm what landed. java.util.Date and
com.fasterxml.jackson.annotation.JsonFormat imports drop from the
record entirely.

Test assertions referencing the dropped fields are removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: adityamparikh <[email protected]>
adityamparikh added a commit to adityamparikh/solr-mcp that referenced this pull request May 22, 2026
…safety

README:
- Add `add-fields` and `add-field-types` to the Tools table (PR apache#131).
- Add a sentence on MCP behavior hints (`readOnlyHint`,
  `destructiveHint`, `idempotentHint`) so client integrators know to
  read them (PR apache#134).
- Add an "MCP Prompts" section listing the six workflow prompts
  introduced in the prompts PR.
- Clarify that `solr://{collection}/schema` autocompletion is
  prefix-filtered, case-insensitive, capped at 100 — describing the
  behavior shipped with the completion PR.

CONTRIBUTING:
- Add a "Null safety" subsection covering the project-wide
  `@NullMarked` contract and NullAway enforcement (PR apache#133).

These docs land alongside the corresponding feature PRs at apache/main.

Signed-off-by: adityamparikh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add schema and configuration modification tools

1 participant