feat: add schema modification MCP tools (add-fields, add-field-types)#131
Open
adityamparikh wants to merge 18 commits into
Open
feat: add schema modification MCP tools (add-fields, add-field-types)#131adityamparikh wants to merge 18 commits into
adityamparikh wants to merge 18 commits into
Conversation
Spec and plan for adding add-fields and add-field-types MCP tools per issue #30. See docs/superpowers/specs/ and docs/superpowers/plans/. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
… tools Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
…oject DTO convention @JsonIgnoreProperties, @JsonInclude(NON_NULL), and @jsonformat on the timestamp field match the pattern used by every record in Dtos.java. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
Package previously named "metadata" only contained schema-related types (SchemaService, SchemaUpdateResult, and their tests). Renaming to "schema" makes the package name accurate to its contents. Moves preserve git history via git mv. Imports updated in Main, MainTest, and McpToolRegistrationTest. Spec and plan paths updated. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
Closes part of #30. Adds one or more fields per call via SolrJ's SchemaRequest.MultiUpdate. Input is List<Map<String, Object>> matching the Solr Schema API add-field JSON shape; validation is limited to collection name and non-empty list (Solr returns clear errors for malformed field defs). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
String.valueOf(null) returns the literal "null"; direct cast yields a real null on missing key, which makes the result honest about input shape (Solr's error surfaces before any result is returned). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
…elper Supports single analyzer, separate index/query analyzers, and non-analyzer field types like DenseVectorField. Manual conversion from flat input map to SolrJ FieldTypeDefinition because name/class go into attributes map and analyzers are typed sub-objects. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
…es test Aligns addFieldTypes_blankCollection_throws with the parallel addFields_blankCollection_throws test which already covers null + empty + whitespace. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
…flection Same pattern as the other @mcptool response records — invisible to AOT because MCP dispatches via Object. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
End-to-end against real Solr via Testcontainers. Verifies schema round-trip, custom analyzer behavior, vector field type registration, and error propagation on duplicate field / unknown type. SolrJ's MultiUpdate.process() throws natively on Schema API errors, so no explicit response-body inspection was needed in SchemaService. Note: SolrJ 10 moved SolrQuery to org.apache.solr.client.solrj.request.SolrQuery; vectorDimension attribute is returned as String by the schema API, handled via toString/parseInt. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
Adds ordered tests 16-18 exercising the add-fields → index → search workflow through the MCP protocol against both HTTP and stdio transports. Also asserts add-fields and add-field-types appear in listTools output. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
…n tools Reflects the metadata→schema package rename and the new add-fields and add-field-types capabilities. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
…reflection GraalVM native test caught that Jackson's convertValue(map, AnalyzerDefinition.class) in SchemaService.toAnalyzerDefinition fails at runtime without reflection metadata. The spec anticipated this; adding both SolrJ types defensively. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
The get-schema MCP tool returns SolrJ's SchemaRepresentation, which Spring AI serializes to JSON for MCP clients. Without reflection metadata in the native image, the JSON Spring AI produces is missing the fields/fieldTypes/dynamicFields/copyFields arrays — silently breaking any consumer that introspects the schema. JVM tests didn't catch this because the pre-existing get-schema test only asserts the response is non-empty. The new end-to-end shows workflow in the next commit parses the schema JSON, which surfaced the gap. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
Adds 9 new ordered MCP-protocol tests (orders 19-27) covering the full workflow an LLM would drive against the shows-collection use case from issue #30: 1. create-collection → "shows" 2. add-fields → 16 user-defined fields (title, platform, genres, release_year, ..., imdb_rating, description, tags) 3. index-json-documents → 61 docs from src/test/resources/shows.json (Netflix, Prime Video, HBO Max, Disney+, Apple TV+, Hulu, Peacock, Paramount+) 4. search → numFound=61 5. search + facet → platform facet returns Netflix=20, Prime=20 6. search + filter → multi-valued genres:Sci-Fi 7. search + keyword → description:apocalyptic with platform filter 8. get-schema → all 16 added fields present 9. get-collection-stats → numDocs=61 Because the test base class is reused by both HTTP and stdio MCP client transports and is also compiled into the GraalVM native test binary, these tests exercise all four combinations: - JVM + stdio (McpClientStdioIntegrationTest) - JVM + HTTP (McpClientIntegrationTest) - Native + stdio (via nativeTest -Pnative) - Native + HTTP (via nativeTest -Pnative) The shows collection inherits the same _default configset that prior tests in this class have already modified via schemaless indexing and add-fields against mcp-client-test. addShowsSchema() calls get-schema first to filter the desired field list to only the fields not already present in the shared configset's managed-schema — which is exactly what the add-fields tool description tells the LLM to do. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
…keys Two correctness fixes flagged by the Solr-expert review of PR apache#131: 1. MultiUpdate IS transactional, not sequential Per the Solr Schema API reference guide and SolrJ's SchemaRequest.MultiUpdate Javadoc, all commands in a single call either succeed or fail together. The previous tool descriptions wrongly told the LLM "commands run in input order; if one fails mid-batch, prior commands remain applied" / "partial application possible on failure" — which would lead an LLM to issue partial- recovery commands that double-apply on retry. Updated both add-fields and add-field-types descriptions and the spec doc to say "Solr's Schema API is transactional — if any command in the batch fails, none are applied." 2. toAnalyzerDefinition silently dropped unknown analyzer keys SolrJ's AnalyzerDefinition only exposes typed setters for charFilters, tokenizer, and filters. A naive objectMapper.convertValue(raw, AnalyzerDefinition.class) silently drops every other top-level analyzer key (class, luceneMatchVersion, positionIncrementGap, ...). This broke the valid single-class analyzer form {"analyzer":{"class":"solr.WhitespaceAnalyzer"}} which the Solr Ref Guide documents for the StandardAnalyzer / WhitespaceAnalyzer / KeywordAnalyzer / per-language analyzer (ArabicAnalyzer etc.) patterns. Rewrote the helper to manually split the map: charFilters, tokenizer, filters go through the typed setters; everything else is preserved via setAttributes(). Added a unit-test regression guard that serializes the captured MultiUpdate's wire body and asserts the analyzer-level class key is present. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
4 tasks
Apply the same drop-noise treatment to SchemaUpdateResult that PR apache#132 applies to Dtos.java records: - success (boolean) — always true on return; failures throw before reaching the result. - timestamp (Date) — always "now"; sub-second operation, MCP host records call timing already. After this change, SchemaUpdateResult is just (collection, addedNames): both fields carry real information. addedNames echoes the field names back so the LLM can confirm what landed. java.util.Date and com.fasterxml.jackson.annotation.JsonFormat imports drop from the record entirely. Test assertions referencing the dropped fields are removed. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> Signed-off-by: adityamparikh <[email protected]>
This was referenced May 19, 2026
adityamparikh
added a commit
to adityamparikh/solr-mcp
that referenced
this pull request
May 22, 2026
…safety README: - Add `add-fields` and `add-field-types` to the Tools table (PR apache#131). - Add a sentence on MCP behavior hints (`readOnlyHint`, `destructiveHint`, `idempotentHint`) so client integrators know to read them (PR apache#134). - Add an "MCP Prompts" section listing the six workflow prompts introduced in the prompts PR. - Clarify that `solr://{collection}/schema` autocompletion is prefix-filtered, case-insensitive, capped at 100 — describing the behavior shipped with the completion PR. CONTRIBUTING: - Add a "Null safety" subsection covering the project-wide `@NullMarked` contract and NullAway enforcement (PR apache#133). These docs land alongside the corresponding feature PRs at apache/main. Signed-off-by: adityamparikh <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
add-fieldsandadd-field-typesMCP tools to extend a Solr collection's schema additively through the MCP layer.add-copy-field,add-dynamic-field, andadd-codec-factoryare deliberately deferred (see spec for rationale).List<Map<String, Object>>matching Solr's Schema API JSON shape; batched viaSchemaRequest.MultiUpdate. No transformation layer.@PreAuthorize("isAuthenticated()")— HTTP enforces auth, stdio bypasses (same pattern as existing tools).metadata/package toschema/since it only ever held schema-related types.Design and plan
docs/superpowers/specs/2026-05-17-schema-modification-design.mddocs/superpowers/plans/2026-05-17-schema-modification.mdWhy two tools instead of one combined? LLM tool-use ergonomics: single-purpose tools eliminate cross-wiring risk between field and field-type definitions. The orphan-field-type cost of separation is harmless.
Why no replace/delete? Schema replace/delete silently breaks indexed data without an explicit reindex. AI-driven workflows are the wrong place to expose that footgun without a guardrail design.
Why
Map<String, Object>instead of records? Solr field-type shape includes analyzers/tokenizers/filters with class-specific param bags; records collapse toMap<String, Object>at the leaves anyway. Map shape matches SolrJ'sAddField(Map)constructor — zero transformation.Tool descriptions include inline recipes
add-field-typesdescription includes three common recipes inline (case-insensitive exact match, dense vector for semantic search, autocomplete) to give the LLM a diagnostic-to-fix bridge — symptom in user prompt → recipe in description → exact analyzer chain.Test plan
@DisabledInNativeImage) — validation, happy path, error propagation, single-analyzer, separate index/query analyzers, DenseVectorFieldMcpClientIntegrationTestBase— end-to-endadd-fields→index-json-documents→searchworkflow, exercised against both HTTP and stdio transports./gradlew clean build— BUILD SUCCESSFUL, all tests pass./gradlew jibDockerBuild— Jib JVM image built./gradlew bootBuildImage -Pnative— Paketo native stdio image built./gradlew bootBuildImage -Pnative -Pprofile=http— Paketo native HTTP image built./gradlew nativeTest -Pnative— 168/168 native tests pass (113 Mockito tests appropriately skipped)The
nativeTestrun also caught a real native-image reflection gap:Jackson.convertValue(map, AnalyzerDefinition.class)needs reflection metadata forAnalyzerDefinition(andFieldTypeDefinitiondefensively). Fixed infix(native): register AnalyzerDefinition and FieldTypeDefinition for reflection.🤖 Generated with Claude Code