Skip to content

Comments

[Feature] Calcite PPL search result highlighting#5141

Open
RyanL1997 wants to merge 11 commits intoopensearch-project:mainfrom
RyanL1997:syntax-highlighting
Open

[Feature] Calcite PPL search result highlighting#5141
RyanL1997 wants to merge 11 commits intoopensearch-project:mainfrom
RyanL1997:syntax-highlighting

Conversation

@RyanL1997
Copy link
Collaborator

@RyanL1997 RyanL1997 commented Feb 12, 2026

Description.

For ease of review, please reference the design doc in issue #5156.

  • Add request-level highlight API for PPL queries. Callers (OSD, API, CLI) can include an optional highlight JSON object in the PPL request body, which the backend forwards as-is to OpenSearch.
  • Highlight config is carried across the thread boundary (REST handler → sql-worker) via AbstractPlan, then set as a ThreadLocal for Calcite planning and execution.
  • Highlight metadata from OpenSearch hits is collected via a side-channel ThreadLocal in OpenSearchIndexEnumerator and merged back into the JDBC/SimpleJson response as a highlights array parallel to datarows.

Related Issues

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds per-request PPL highlighting end-to-end: request parsing, plan-level propagation via ThreadLocal, conditional hidden _highlight column in Calcite schema, attaching HighlightBuilder to OpenSearch requests, extracting highlights from responses, and exposing per-row highlights through QueryResult/formatters with unit and integration tests.

Changes

Cohort / File(s) Summary
Plan context & propagation
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java, core/src/main/java/org/opensearch/sql/executor/execution/AbstractPlan.java
Add ThreadLocal highlightConfig and plan-level highlightConfig field with accessors to carry highlight settings across thread boundary.
Plan execution flow
core/src/main/java/org/opensearch/sql/executor/execution/QueryPlan.java, core/src/main/java/org/opensearch/sql/executor/execution/ExplainPlan.java
Set/clear CalcitePlanContext ThreadLocal around execute/explain to ensure worker thread sees plan highlight config.
PPL request surface
ppl/src/main/java/org/opensearch/sql/ppl/PPLService.java, ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java
Parse highlight from PPL request JSON and attach it to created plan before submission.
Calcite schema / planning
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java, core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
When highlight config exists, add hidden _highlight column (ANY) to row type and preserve it through projection/visitor logic.
OpenSearch request building
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteEnumerableIndexScan.java
Add applyHighlightConfig to build/attach OpenSearch HighlightBuilder from CalcitePlanContext and apply it before executing requests; include in explain output.
Response handling & enumerator
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java, opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java, opensearch/src/main/java/org/opensearch/sql/opensearch/response/OpenSearchResponse.java
Carry _highlight as an opaque per-row value (excluded from schema), special-case resolution in enumerator, and centralize highlight field constant usage.
Protocol & formatting
protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java, protocol/src/main/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatter.java, protocol/src/main/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatter.java
Add QueryResult.highlights() and include highlights in JDBC/Simple JSON outputs when present.
Expressions & constants
core/src/main/java/org/opensearch/sql/expression/HighlightExpression.java
Introduce HIGHLIGHT_FIELD constant and use it instead of hard-coded string.
Tests (unit & integration)
core/src/test/.../ExplainPlanTest.java, core/src/test/.../QueryPlanTest.java, opensearch/src/test/.../AbstractCalciteIndexScanHighlightTest.java, protocol/src/test/.../*, ppl/src/test/.../PPLQueryRequestTest.java, integ-test/src/test/java/.../CalcitePPLHighlightIT.java, integ-test/src/yamlRestTest/resources/rest-api-spec/test/ppl_highlight.yml
Add unit and integration tests for propagation, request-building, highlighter config application, enumerator extraction, QueryResult highlights, and formatted outputs.
Docs
docs/user/ppl/interfaces/endpoint.md
Add Highlight section documenting highlight request object, parameters, examples, response format, and limitations.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PPLService
    participant Planner as CalcitePlanner
    participant Worker as ExecThread
    participant CalciteCtx as CalcitePlanContext
    participant OpenSearch as OpenSearchNode
    participant Enumerator as OpenSearchIndexEnumerator
    participant Protocol as QueryResult/Formatter

    Client->>PPLService: POST /_plugins/_ppl (with optional "highlight")
    PPLService->>Planner: build execution plan
    PPLService->>Planner: set plan.highlightConfig
    Planner->>Worker: submit plan (worker thread)
    Worker->>CalciteCtx: setHighlightConfig(plan.highlightConfig)
    Worker->>OpenSearch: execute search (HighlightBuilder attached)
    OpenSearch-->>Enumerator: return hits with highlight fragments
    Enumerator->>Worker: produce rows including hidden _highlight value
    Worker->>Protocol: build QueryResult (extract highlights())
    Worker->>CalciteCtx: clearHighlightConfig()
    Protocol-->>Client: JSON response (optional "highlights" array)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

calcite

Suggested reviewers

  • LantaoJin
  • penghuo
  • ps48
  • kavithacm
  • derek-ho
  • joshuali925
  • anirudha
  • Swiddis
  • yuancu
  • GumpacG
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.81% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title '[Feature] Calcite PPL search result highlighting' directly and clearly describes the main feature being added—PPL result highlighting via Calcite.
Description check ✅ Passed The description is related to the changeset, explaining the highlighting feature, thread-local propagation, response formatting, and referencing design docs and related issues.
Linked Issues check ✅ Passed The PR successfully implements all key objectives from issues #5156 and #5059: request-level highlight API, thread-local config propagation, conditional _highlight column, HighlightBuilder integration, highlight metadata collection, and response formatting with highlights array.
Out of Scope Changes check ✅ Passed All changes are scoped to highlighting feature implementation: config propagation (AbstractPlan, ThreadLocal), Calcite integration (_highlight column, projection handling), OpenSearch request building, response extraction/formatting, and comprehensive tests. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@RyanL1997 RyanL1997 added enhancement New feature or request PPL Piped processing language feature labels Feb 12, 2026
@RyanL1997 RyanL1997 marked this pull request as ready for review February 19, 2026 00:10
@RyanL1997 RyanL1997 changed the title [WIP][Feature] Calcite PPL search result highlighting eiifcbncngeijlvvcecdekikvhrgrfneuccvrhjrbkju[Feature] Calcite PPL search result highlighting Feb 19, 2026
@RyanL1997 RyanL1997 changed the title eiifcbncngeijlvvcecdekikvhrgrfneuccvrhjrbkju[Feature] Calcite PPL search result highlighting [Feature] Calcite PPL search result highlighting Feb 19, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (10)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/ppl_highlight.yml (3)

36-41: Add index cleanup to teardown.

The teardown disables the Calcite plugin but does not delete the ppl_highlight_test index. This can cause test pollution if tests run multiple times or if other tests use the same index name.

Proposed fix to add index cleanup
 ---
 teardown:
+  - do:
+      indices.delete:
+        index: ppl_highlight_test
+        ignore_unavailable: true
+
   - do:
       query.settings:
         body:
           transient:
             plugins.calcite.enabled : false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/ppl_highlight.yml`
around lines 36 - 41, The teardown currently only disables the Calcite plugin;
add a step to delete the ppl_highlight_test index to avoid test pollution.
Inside the existing teardown block (under teardown -> - do:), add a delete-index
action targeting the index name ppl_highlight_test (use the same REST/DSL action
style as other teardown steps), and make the delete tolerant of missing indices
(ignore_unavailable/allow_no_indices or equivalent) so the teardown does not
fail if the index is already absent. Ensure this runs before or alongside the
plugin disable step so cleanup always occurs.

99-100: Consider adding test case for non-text field behavior.

Per the RFC, only text and keyword fields should produce highlight fragments—non-string fields like code (integer) should not be highlighted. Adding a test that explicitly verifies integer fields are excluded from highlights would strengthen coverage of the documented behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/ppl_highlight.yml`
around lines 99 - 100, Update the ppl_highlight.yml test to explicitly assert
that non-text fields (the integer field named "code") are not included in the
highlights: after the existing checks for highlights (the existing "highlights"
variable and match size assertions), add an assertion that the highlights result
does not contain any entry for "code" (e.g., a negative contains/is_false check
against highlights.code or a predicate that ensures no fragment keys equal
"code"). Reference the "highlights" test output and the field name "code" when
adding the new negative assertion so the test verifies integer fields are
excluded from highlighting.

62-63: Consider adding assertions for highlight content.

The test only verifies that highlights is present, but doesn't validate that the highlighted content actually contains the expected tags (e.g., <em>error</em>) or that the correct fields are highlighted. This makes it possible for a bug to produce malformed highlights while still passing the test.

Consider adding a match assertion to verify at least one highlight contains the expected tagged content.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/ppl_highlight.yml`
around lines 62 - 63, Add a match assertion that validates the actual highlight
HTML/text content rather than only the presence of "highlights": locate the
assertions around "highlights" in ppl_highlight.yml and replace or add to the
existing checks (the {"size": 2} and is_true: highlights lines) with a match
that asserts at least one highlight string contains the expected tag/term (for
example that a highlight value contains "<em>error</em>" or the expected field
name wrapped in <em>…</em>), using the same match/assertion syntax used
elsewhere in the file so the test fails if highlighted content is missing or
malformed.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLHighlightIT.java (1)

23-165: Consider adding error and boundary condition tests.

Per coding guidelines, integration tests should include error condition tests and boundary condition tests. This test class would benefit from:

  1. Error condition test: Invalid/malformed highlight JSON configuration
  2. Boundary condition test: Query that returns empty results (no documents match)
  3. Edge case test: Filter-only query (per RFC, should not produce highlights for purely structured queries)
Suggested additional tests
`@Test`
public void testHighlightWithEmptyResults() throws IOException {
  // Query that matches no documents
  JSONObject result =
      executeQueryWithHighlight(
          "search source=" + TEST_INDEX_ACCOUNT + " \\\"NonExistentTerm12345\\\"",
          "{\"fields\": {\"*\": {}}, \"pre_tags\": [\"<em>\"], \"post_tags\": [\"</em>\"]}");
  
  assertTrue(result.has("highlights"));
  JSONArray highlights = result.getJSONArray("highlights");
  assertEquals(0, highlights.length());
}

`@Test`
public void testHighlightWithFilterOnlyQuery() throws IOException {
  // Filter-only query without full-text search should not produce highlights
  JSONObject result =
      executeQueryWithHighlight(
          "search source=" + TEST_INDEX_ACCOUNT + " | where age > 30",
          "{\"fields\": {\"*\": {}}, \"pre_tags\": [\"<em>\"], \"post_tags\": [\"</em>\"]}");
  
  // Verify behavior per RFC - no highlights for filter-only queries
  JSONArray highlights = result.getJSONArray("highlights");
  for (int i = 0; i < highlights.length(); i++) {
    assertTrue("Filter-only query should not produce highlights", highlights.isNull(i));
  }
}

As per coding guidelines: "Include boundary condition tests (min/max values, empty inputs) for all new functions" and "Include error condition tests (invalid inputs, exceptions) for all new functions".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLHighlightIT.java`
around lines 23 - 165, Add three new integration tests to CalcitePPLHighlightIT:
one that submits malformed highlight JSON via executeQueryWithHighlight (e.g.,
invalid JSON or missing required fields) and asserts the API returns an error
status or error payload; one that runs a full-text search for a non-existent
term using executeQueryWithHighlight and asserts "highlights" exists but the
JSONArray returned by executeQueryWithHighlight has length 0; and one that
executes a filter-only query (e.g., "search source=... | where age > 30") with
executeQueryWithHighlight and asserts highlights are null/empty for each hit
(use executeQueryWithHighlight and iterate the JSONArray as in
testHighlightWithPipedFilter to confirm null entries). Ensure tests use the
existing helper methods executeQueryWithHighlight/executeQueryNoHighlight and
live in the CalcitePPLHighlightIT class.
core/src/main/java/org/opensearch/sql/executor/execution/QueryPlan.java (1)

89-94: Consider extracting setHighlightThreadLocal() to AbstractPlan to reduce duplication.

This helper method is identical in both QueryPlan and ExplainPlan. While the duplication is minor (4 lines), extracting it to AbstractPlan as a protected method would centralize the logic and ensure both subclasses stay in sync.

♻️ Proposed refactor in AbstractPlan

In AbstractPlan.java:

  /**
   * Sets the highlight configuration on the CalcitePlanContext ThreadLocal
   * for use during Calcite planning and execution.
   */
  protected void setHighlightThreadLocal() {
    Map<String, Object> config = getHighlightConfig();
    if (config != null) {
      CalcitePlanContext.setHighlightConfig(config);
    }
  }

Then both QueryPlan and ExplainPlan can simply call setHighlightThreadLocal() without the private helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/executor/execution/QueryPlan.java`
around lines 89 - 94, Extract the duplicated helper into AbstractPlan by moving
the setHighlightThreadLocal() logic there: implement a protected
setHighlightThreadLocal() in AbstractPlan that calls getHighlightConfig() and,
if non-null, invokes CalcitePlanContext.setHighlightConfig(config); then remove
the private setHighlightThreadLocal() implementations from QueryPlan and
ExplainPlan and replace their usages with a call to the inherited protected
setHighlightThreadLocal() method so both subclasses share the same
implementation and stay in sync.
core/src/test/java/org/opensearch/sql/executor/execution/QueryPlanTest.java (1)

156-165: Missing mock setup may cause test instability.

This test calls query.execute() without stubbing queryService.execute(...). Unlike the first test which uses doAnswer, this test has no mock behavior defined. With Mockito's default lenient stubbing in @ExtendWith(MockitoExtension.class), the mock returns null for void methods, but this relies on implicit behavior rather than explicit setup.

Consider adding explicit stubbing for consistency and clarity:

🔧 Suggested improvement
 `@Test`
 public void execute_clears_highlight_threadlocal_after_execution() {
   Map<String, Object> highlightConfig = Map.of("fields", Map.of("*", Map.of()));
+  doAnswer(invocation -> null).when(queryService).execute(any(), any(), any());

   QueryPlan query = new QueryPlan(queryId, queryType, plan, queryService, queryListener);
   query.setHighlightConfig(highlightConfig);
   query.execute();

   assertNull(CalcitePlanContext.getHighlightConfig());
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/test/java/org/opensearch/sql/executor/execution/QueryPlanTest.java`
around lines 156 - 165, The test
execute_clears_highlight_threadlocal_after_execution calls QueryPlan#execute()
but doesn't stub the mock QueryService, which can make the test flaky; add an
explicit Mockito stub for queryService.execute(...) (similar to the other test's
doAnswer) so that when QueryPlan#execute invokes queryService.execute it returns
the expected result or performs the expected side effect, then run the same
assertions (ensure you reference QueryPlan, query.execute(), and
queryService.execute(...) when adding the stub) to guarantee deterministic
behavior.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java (2)

160-169: Missing highlight cleanup in close() may leave stale data.

The reset() method clears the collected highlights, but close() does not. If an enumerator is closed without being reset (e.g., on error paths), stale highlight data from a previous execution could persist in the ThreadLocal and potentially be merged into unrelated subsequent queries on the same thread.

🔧 Suggested fix
 `@Override`
 public void close() {
   iterator = Collections.emptyIterator();
   queryCount = 0;
+  COLLECTED_HIGHLIGHTS.get().clear();
   bgScanner.close();
   if (request != null) {
     client.forceCleanup(request);
     request = null;
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java`
around lines 160 - 169, The close() method in OpenSearchIndexEnumerator doesn't
clear the per-thread highlight state, which can leave stale highlights in
ThreadLocal; update close() (in class OpenSearchIndexEnumerator) to also clear
collected highlights by invoking the same cleanup as reset() (or explicitly
calling collectedHighlights.remove()/clearCollectedHighlights()) before
returning so highlights are not leaked across requests; ensure this runs even
when other cleanup (bgScanner.close(), client.forceCleanup(request)) has already
executed.

40-53: ThreadLocal not removed after use risks memory leaks in thread pools.

The COLLECTED_HIGHLIGHTS ThreadLocal is initialized with withInitial(ArrayList::new) but never calls remove(). In long-lived thread pools (common in OpenSearch), this keeps the ArrayList instance attached to each thread indefinitely.

getAndClearCollectedHighlights() clears the list contents but the ThreadLocal entry persists. Consider calling COLLECTED_HIGHLIGHTS.remove() after retrieving and clearing:

🔧 Suggested fix
 public static List<ExprValue> getAndClearCollectedHighlights() {
   List<ExprValue> result = new ArrayList<>(COLLECTED_HIGHLIGHTS.get());
-  COLLECTED_HIGHLIGHTS.get().clear();
+  COLLECTED_HIGHLIGHTS.remove();
   return result;
 }

 /** Clear collected highlights (call before starting a new execution). */
 public static void clearCollectedHighlights() {
-  COLLECTED_HIGHLIGHTS.get().clear();
+  COLLECTED_HIGHLIGHTS.remove();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java`
around lines 40 - 53, The ThreadLocal COLLECTED_HIGHLIGHTS currently only clears
the held ArrayList but never removes the ThreadLocal entry, risking memory leaks
in pooled threads; update getAndClearCollectedHighlights() to copy contents,
clear the list, and then call COLLECTED_HIGHLIGHTS.remove() to detach the entry
from the current thread, and change clearCollectedHighlights() to call
COLLECTED_HIGHLIGHTS.remove() (instead of just .clear()) so the ThreadLocal is
fully removed when starting a new execution; keep references to
COLLECTED_HIGHLIGHTS, getAndClearCollectedHighlights, and
clearCollectedHighlights to locate the changes.
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java (2)

210-228: Highlights not cleared on exception path.

If statement.executeQuery() or buildResultSet() throws an exception, the collected highlights from OpenSearchIndexEnumerator will not be cleared (since getAndClearCollectedHighlights() is only called in buildResultSet). This could leave stale data in the ThreadLocal for subsequent queries on the same thread.

🔧 Suggested fix
 public void execute(
     RelNode rel, CalcitePlanContext context, ResponseListener<QueryResponse> listener) {
   client.schedule(
       () -> {
         try (PreparedStatement statement = OpenSearchRelRunners.run(context, rel)) {
           OpenSearchIndexEnumerator.clearCollectedHighlights();
           ProfileMetric metric = QueryProfiling.current().getOrCreateMetric(MetricName.EXECUTE);
           long execTime = System.nanoTime();
           ResultSet result = statement.executeQuery();
           QueryResponse response =
               buildResultSet(result, rel.getRowType(), context.sysLimit.querySizeLimit());
           metric.add(System.nanoTime() - execTime);
           listener.onResponse(response);

         } catch (SQLException e) {
+          OpenSearchIndexEnumerator.clearCollectedHighlights();
           throw new RuntimeException(e);
         }
       });
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java`
around lines 210 - 228, Ensure collected highlights are always cleared on the
exception path by moving or adding a call to
OpenSearchIndexEnumerator.clearCollectedHighlights() into a finally block around
the code that runs the query in OpenSearchExecutionEngine.execute (the block
that uses OpenSearchRelRunners.run, statement.executeQuery(), and
buildResultSet). Specifically, wrap the try-with-resources/logic such that after
ResultSet processing (or if an SQLException/any RuntimeException occurs) you
call OpenSearchIndexEnumerator.clearCollectedHighlights() (or
getAndClearCollectedHighlights() if that returns and clears) before rethrowing
or returning, so no ThreadLocal highlights remain for subsequent queries.

284-297: Silent handling of highlight/row count mismatch may mask bugs.

The Math.min(values.size(), collectedHighlights.size()) guard silently ignores mismatches. If the enumerator collected fewer highlights than rows returned (or vice versa), this could indicate a synchronization bug between the Calcite pipeline and the side-channel collection.

Consider logging a warning when counts don't match to aid debugging:

🔧 Suggested improvement
     List<ExprValue> collectedHighlights =
         OpenSearchIndexEnumerator.getAndClearCollectedHighlights();
+    if (!collectedHighlights.isEmpty() && collectedHighlights.size() != values.size()) {
+      logger.warn("Highlight count ({}) does not match row count ({})",
+          collectedHighlights.size(), values.size());
+    }
     for (int i = 0; i < Math.min(values.size(), collectedHighlights.size()); i++) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java`
around lines 284 - 297, The loop currently uses Math.min(values.size(),
collectedHighlights.size()) and silently ignores mismatched counts; update
OpenSearchExecutionEngine after calling
OpenSearchIndexEnumerator.getAndClearCollectedHighlights() to check if
values.size() != collectedHighlights.size() and emit a warning-level log
(including both sizes and context like "highlight/row count mismatch") so
mismatches are visible for debugging; keep the existing merge logic that sets
HighlightExpression.HIGHLIGHT_FIELD into ExprTupleValue.fromExprValueMap(values)
but log the discrepancy before the for-loop to aid tracing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLHighlightIT.java`:
- Around line 56-74: The test testHighlightWithSpecificField currently only
asserts that firstname is not highlighted; update it to also assert that the
address field is present and contains highlight snippets: after obtaining
JSONArray highlights and each non-null JSONObject hl (from
executeQueryWithHighlight result), add a positive assertion that
hl.has("address") is true and that hl.getJSONArray("address").length() > 0 (or
that the address value is non-empty) to ensure the address field is actually
highlighted.

In
`@protocol/src/test/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatterTest.java`:
- Around line 211-258: Rename the two test methods in JdbcResponseFormatterTest:
change format_response_with_highlights() to testFormatResponseWithHighlights()
and format_response_without_highlights() to
testFormatResponseWithoutHighlights(); update any references to these methods
(if used reflectively) so the `@Test-annotated` methods follow the
test<Functionality><Condition> naming convention while keeping their bodies
(uses of QueryResult, formatter.format(response), assertJsonEquals, and
assertEquals checks) unchanged.
- Around line 211-258: Add additional unit tests in JdbcResponseFormatterTest to
cover null/boundary/multi-row/error cases for highlight formatting: create tests
that (1) format an empty QueryResult (no rows) and assert that "highlights" is
absent and total/size are zero, (2) format multiple-row QueryResult where some
rows include a valid "_highlight" and others omit it and assert highlights array
aligns per-row (null or absent entries where appropriate), and (3) pass invalid
or null highlight payloads (e.g., "_highlight" present but value null or wrong
type) to formatter.format and assert graceful handling (no exception and either
omission or null entries) to satisfy NULL/boundary/error/multi-document
coverage; reference existing test helpers and symbols like QueryResult, Schema,
Column, tupleValue, ExprTupleValue, and formatter.format when adding these
cases.

In
`@protocol/src/test/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatterTest.java`:
- Around line 187-219: Two unit test methods in SimpleJsonResponseFormatterTest
do not follow the required naming pattern; rename the methods
formatResponseWithHighlights to testFormatResponseHighlightsPresent and
formatResponseWithoutHighlightsOmitsField to
testFormatResponseHighlightsOmittedWhenAbsent (or the exact pattern
test<Functionality><Condition>) so they match test<Functionality><Condition>;
update any internal references within the SimpleJsonResponseFormatterTest class
(imports, annotations like `@Test` remain unchanged) and ensure the new method
names still run under the existing test framework.
- Around line 187-219: Add unit tests to cover null/boundary/multi-row/error
cases for highlight formatting in SimpleJsonResponseFormatter: create tests that
call SimpleJsonResponseFormatter.format with (1) an empty QueryResult (no rows)
and assert the output omits "highlights" and yields an empty result, (2)
multiple rows where some rows include an "_highlight" ExprTupleValue and others
don't and assert highlights appear only for rows with the field, (3) a row
containing a null "_highlight" value and assert the formatter handles it
gracefully (omits or represents as null per contract), and (4) a row with an
invalid "_highlight" payload type (e.g., non-ExprTupleValue or non-collection
elements) and assert the formatter either throws a documented exception or
sanitizes the output; use the existing helpers QueryResult,
ExprTupleValue.fromExprValueMap, tupleValue, collectionValue and
SimpleJsonResponseFormatter(COMPACT) to construct inputs and assertions.

In
`@protocol/src/test/java/org/opensearch/sql/protocol/response/QueryResultTest.java`:
- Around line 115-198: Rename the JUnit test methods to follow the
test<Functionality><Condition> pattern: change
iterate_excludes_highlight_from_datarows to
testIterateExcludesHighlightFromDatarows, highlights_returns_highlight_data to
testHighlightsReturnsHighlightData,
highlights_returns_null_when_no_highlight_data to
testHighlightsReturnsNullWhenNoHighlightData, and
highlights_returns_null_when_highlight_is_missing to
testHighlightsReturnsNullWhenHighlightIsMissing; update any references/imports
or test framework annotations if needed so the new method names compile and run.
- Around line 115-198: Add unit tests for QueryResult.highlights() to cover
empty input, multi-row mixed highlights, null/invalid highlight payloads, and
boundary cases: 1) empty rows: construct a QueryResult with an empty list and
assert highlights() returns empty list; 2) multi-row mix: build a QueryResult
with multiple ExprTupleValue rows where some rows include a valid "_highlight",
some have ExprValueUtils.LITERAL_MISSING, some have null/invalid types (e.g.,
non-collection), and assert highlights() returns a list whose entries align
per-document (valid maps for valid highlights, null for missing/invalid); 3)
boundary empty highlight: include a "_highlight" whose field value is an empty
collection and assert it returns an empty map or empty collection entry as per
current contract; and 4) error condition: add a test where "_highlight" is an
unexpected type and assert highlights() yields null for that entry (or throws if
the contract requires), referencing QueryResult.highlights() and QueryResultTest
to locate where to add these cases.

---

Nitpick comments:
In `@core/src/main/java/org/opensearch/sql/executor/execution/QueryPlan.java`:
- Around line 89-94: Extract the duplicated helper into AbstractPlan by moving
the setHighlightThreadLocal() logic there: implement a protected
setHighlightThreadLocal() in AbstractPlan that calls getHighlightConfig() and,
if non-null, invokes CalcitePlanContext.setHighlightConfig(config); then remove
the private setHighlightThreadLocal() implementations from QueryPlan and
ExplainPlan and replace their usages with a call to the inherited protected
setHighlightThreadLocal() method so both subclasses share the same
implementation and stay in sync.

In `@core/src/test/java/org/opensearch/sql/executor/execution/QueryPlanTest.java`:
- Around line 156-165: The test
execute_clears_highlight_threadlocal_after_execution calls QueryPlan#execute()
but doesn't stub the mock QueryService, which can make the test flaky; add an
explicit Mockito stub for queryService.execute(...) (similar to the other test's
doAnswer) so that when QueryPlan#execute invokes queryService.execute it returns
the expected result or performs the expected side effect, then run the same
assertions (ensure you reference QueryPlan, query.execute(), and
queryService.execute(...) when adding the stub) to guarantee deterministic
behavior.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLHighlightIT.java`:
- Around line 23-165: Add three new integration tests to CalcitePPLHighlightIT:
one that submits malformed highlight JSON via executeQueryWithHighlight (e.g.,
invalid JSON or missing required fields) and asserts the API returns an error
status or error payload; one that runs a full-text search for a non-existent
term using executeQueryWithHighlight and asserts "highlights" exists but the
JSONArray returned by executeQueryWithHighlight has length 0; and one that
executes a filter-only query (e.g., "search source=... | where age > 30") with
executeQueryWithHighlight and asserts highlights are null/empty for each hit
(use executeQueryWithHighlight and iterate the JSONArray as in
testHighlightWithPipedFilter to confirm null entries). Ensure tests use the
existing helper methods executeQueryWithHighlight/executeQueryNoHighlight and
live in the CalcitePPLHighlightIT class.

In `@integ-test/src/yamlRestTest/resources/rest-api-spec/test/ppl_highlight.yml`:
- Around line 36-41: The teardown currently only disables the Calcite plugin;
add a step to delete the ppl_highlight_test index to avoid test pollution.
Inside the existing teardown block (under teardown -> - do:), add a delete-index
action targeting the index name ppl_highlight_test (use the same REST/DSL action
style as other teardown steps), and make the delete tolerant of missing indices
(ignore_unavailable/allow_no_indices or equivalent) so the teardown does not
fail if the index is already absent. Ensure this runs before or alongside the
plugin disable step so cleanup always occurs.
- Around line 99-100: Update the ppl_highlight.yml test to explicitly assert
that non-text fields (the integer field named "code") are not included in the
highlights: after the existing checks for highlights (the existing "highlights"
variable and match size assertions), add an assertion that the highlights result
does not contain any entry for "code" (e.g., a negative contains/is_false check
against highlights.code or a predicate that ensures no fragment keys equal
"code"). Reference the "highlights" test output and the field name "code" when
adding the new negative assertion so the test verifies integer fields are
excluded from highlighting.
- Around line 62-63: Add a match assertion that validates the actual highlight
HTML/text content rather than only the presence of "highlights": locate the
assertions around "highlights" in ppl_highlight.yml and replace or add to the
existing checks (the {"size": 2} and is_true: highlights lines) with a match
that asserts at least one highlight string contains the expected tag/term (for
example that a highlight value contains "<em>error</em>" or the expected field
name wrapped in <em>…</em>), using the same match/assertion syntax used
elsewhere in the file so the test fails if highlighted content is missing or
malformed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java`:
- Around line 210-228: Ensure collected highlights are always cleared on the
exception path by moving or adding a call to
OpenSearchIndexEnumerator.clearCollectedHighlights() into a finally block around
the code that runs the query in OpenSearchExecutionEngine.execute (the block
that uses OpenSearchRelRunners.run, statement.executeQuery(), and
buildResultSet). Specifically, wrap the try-with-resources/logic such that after
ResultSet processing (or if an SQLException/any RuntimeException occurs) you
call OpenSearchIndexEnumerator.clearCollectedHighlights() (or
getAndClearCollectedHighlights() if that returns and clears) before rethrowing
or returning, so no ThreadLocal highlights remain for subsequent queries.
- Around line 284-297: The loop currently uses Math.min(values.size(),
collectedHighlights.size()) and silently ignores mismatched counts; update
OpenSearchExecutionEngine after calling
OpenSearchIndexEnumerator.getAndClearCollectedHighlights() to check if
values.size() != collectedHighlights.size() and emit a warning-level log
(including both sizes and context like "highlight/row count mismatch") so
mismatches are visible for debugging; keep the existing merge logic that sets
HighlightExpression.HIGHLIGHT_FIELD into ExprTupleValue.fromExprValueMap(values)
but log the discrepancy before the for-loop to aid tracing.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java`:
- Around line 160-169: The close() method in OpenSearchIndexEnumerator doesn't
clear the per-thread highlight state, which can leave stale highlights in
ThreadLocal; update close() (in class OpenSearchIndexEnumerator) to also clear
collected highlights by invoking the same cleanup as reset() (or explicitly
calling collectedHighlights.remove()/clearCollectedHighlights()) before
returning so highlights are not leaked across requests; ensure this runs even
when other cleanup (bgScanner.close(), client.forceCleanup(request)) has already
executed.
- Around line 40-53: The ThreadLocal COLLECTED_HIGHLIGHTS currently only clears
the held ArrayList but never removes the ThreadLocal entry, risking memory leaks
in pooled threads; update getAndClearCollectedHighlights() to copy contents,
clear the list, and then call COLLECTED_HIGHLIGHTS.remove() to detach the entry
from the current thread, and change clearCollectedHighlights() to call
COLLECTED_HIGHLIGHTS.remove() (instead of just .clear()) so the ThreadLocal is
fully removed when starting a new execution; keep references to
COLLECTED_HIGHLIGHTS, getAndClearCollectedHighlights, and
clearCollectedHighlights to locate the changes.

Comment on lines +56 to +74
@Test
public void testHighlightWithSpecificField() throws IOException {
JSONObject result =
executeQueryWithHighlight(
"search source=" + TEST_INDEX_ACCOUNT + " \\\"Street\\\"",
"{\"fields\": {\"address\": {}}, \"pre_tags\": [\"<em>\"], \"post_tags\":"
+ " [\"</em>\"]}");

assertTrue(result.has("highlights"));
JSONArray highlights = result.getJSONArray("highlights");

for (int i = 0; i < highlights.length(); i++) {
if (!highlights.isNull(i)) {
JSONObject hl = highlights.getJSONObject(i);
// Only address field should be highlighted, not other text fields
assertFalse("Should not highlight firstname field", hl.has("firstname"));
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Strengthen assertion to verify address field IS highlighted.

The test only asserts that firstname is NOT highlighted but doesn't positively verify that the address field IS highlighted. This could pass even if highlighting is completely broken for the target field.

Proposed fix to add positive assertion
     for (int i = 0; i < highlights.length(); i++) {
       if (!highlights.isNull(i)) {
         JSONObject hl = highlights.getJSONObject(i);
         // Only address field should be highlighted, not other text fields
         assertFalse("Should not highlight firstname field", hl.has("firstname"));
+        assertTrue("Should highlight address field", hl.has("address"));
       }
     }
+    // Verify at least one non-null highlight was found
+    boolean foundHighlight = false;
+    for (int i = 0; i < highlights.length(); i++) {
+      if (!highlights.isNull(i)) {
+        foundHighlight = true;
+        break;
+      }
+    }
+    assertTrue("Expected at least one highlight entry", foundHighlight);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLHighlightIT.java`
around lines 56 - 74, The test testHighlightWithSpecificField currently only
asserts that firstname is not highlighted; update it to also assert that the
address field is present and contains highlight snippets: after obtaining
JSONArray highlights and each non-null JSONObject hl (from
executeQueryWithHighlight result), add a positive assertion that
hl.has("address") is true and that hl.getJSONArray("address").length() > 0 (or
that the address value is non-empty) to ensure the address field is actually
highlighted.

Comment on lines +211 to +258
@Test
void format_response_with_highlights() {
QueryResult response =
new QueryResult(
new Schema(
ImmutableList.of(
new Column("name", null, STRING), new Column("age", null, INTEGER))),
Collections.singletonList(
ExprTupleValue.fromExprValueMap(
ImmutableMap.of(
"name",
stringValue("John"),
"age",
org.opensearch.sql.data.model.ExprValueUtils.integerValue(20),
"_highlight",
ExprTupleValue.fromExprValueMap(
ImmutableMap.of(
"name",
org.opensearch.sql.data.model.ExprValueUtils.collectionValue(
ImmutableList.of("<em>John</em>"))))))));

assertJsonEquals(
"{"
+ "\"schema\":["
+ "{\"name\":\"name\",\"type\":\"keyword\"},"
+ "{\"name\":\"age\",\"type\":\"integer\"}"
+ "],"
+ "\"datarows\":[[\"John\",20]],"
+ "\"highlights\":[{\"name\":[\"<em>John</em>\"]}],"
+ "\"total\":1,"
+ "\"size\":1,"
+ "\"status\":200}",
formatter.format(response));
}

@Test
void format_response_without_highlights() {
QueryResult response =
new QueryResult(
new Schema(
ImmutableList.of(
new Column("name", null, STRING), new Column("age", null, INTEGER))),
Collections.singletonList(tupleValue(ImmutableMap.of("name", "John", "age", 20))));

// When no highlights, the highlights field should not appear in the JSON
String formatted = formatter.format(response);
assertEquals(false, formatted.contains("\"highlights\""));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Rename tests to match the required naming pattern.

Use test<Functionality><Condition> (e.g., testFormatResponseWithHighlights, testFormatResponseWithoutHighlights). As per coding guidelines, "Test naming should follow pattern test".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@protocol/src/test/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatterTest.java`
around lines 211 - 258, Rename the two test methods in
JdbcResponseFormatterTest: change format_response_with_highlights() to
testFormatResponseWithHighlights() and format_response_without_highlights() to
testFormatResponseWithoutHighlights(); update any references to these methods
(if used reflectively) so the `@Test-annotated` methods follow the
test<Functionality><Condition> naming convention while keeping their bodies
(uses of QueryResult, formatter.format(response), assertJsonEquals, and
assertEquals checks) unchanged.

⚠️ Potential issue | 🟠 Major

Add null/boundary/multi-row/error coverage for highlight formatting.

The new tests only cover a single-row highlight and absence. Add coverage for empty results, multiple rows with mixed highlight presence, and invalid/null highlight payloads to satisfy required per-document test coverage. As per coding guidelines, "NULL input tests must be included for all new functions", "Include boundary condition tests (min/max values, empty inputs) for all new functions", "Include error condition tests (invalid inputs, exceptions) for all new functions", and "Include multi-document tests for per-document operations".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@protocol/src/test/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatterTest.java`
around lines 211 - 258, Add additional unit tests in JdbcResponseFormatterTest
to cover null/boundary/multi-row/error cases for highlight formatting: create
tests that (1) format an empty QueryResult (no rows) and assert that
"highlights" is absent and total/size are zero, (2) format multiple-row
QueryResult where some rows include a valid "_highlight" and others omit it and
assert highlights array aligns per-row (null or absent entries where
appropriate), and (3) pass invalid or null highlight payloads (e.g.,
"_highlight" present but value null or wrong type) to formatter.format and
assert graceful handling (no exception and either omission or null entries) to
satisfy NULL/boundary/error/multi-document coverage; reference existing test
helpers and symbols like QueryResult, Schema, Column, tupleValue,
ExprTupleValue, and formatter.format when adding these cases.

Comment on lines +187 to +219
@Test
void formatResponseWithHighlights() {
QueryResult response =
new QueryResult(
schema,
Collections.singletonList(
ExprTupleValue.fromExprValueMap(
ImmutableMap.of(
"firstname",
stringValue("John"),
"age",
integerValue(20),
"_highlight",
ExprTupleValue.fromExprValueMap(
ImmutableMap.of(
"firstname",
collectionValue(ImmutableList.of("<em>John</em>"))))))));
SimpleJsonResponseFormatter formatter = new SimpleJsonResponseFormatter(COMPACT);
String result = formatter.format(response);
assertTrue(result.contains("\"highlights\""));
assertTrue(result.contains("\"firstname\":[\"<em>John</em>\"]"));
}

@Test
void formatResponseWithoutHighlightsOmitsField() {
QueryResult response =
new QueryResult(
schema,
Collections.singletonList(tupleValue(ImmutableMap.of("firstname", "John", "age", 20))));
SimpleJsonResponseFormatter formatter = new SimpleJsonResponseFormatter(COMPACT);
String result = formatter.format(response);
assertFalse(result.contains("\"highlights\""));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Rename tests to match the required naming pattern.

Use test<Functionality><Condition> (e.g., testFormatResponseHighlightsPresent, testFormatResponseHighlightsOmittedWhenAbsent). As per coding guidelines, "Test naming should follow pattern test".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@protocol/src/test/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatterTest.java`
around lines 187 - 219, Two unit test methods in SimpleJsonResponseFormatterTest
do not follow the required naming pattern; rename the methods
formatResponseWithHighlights to testFormatResponseHighlightsPresent and
formatResponseWithoutHighlightsOmitsField to
testFormatResponseHighlightsOmittedWhenAbsent (or the exact pattern
test<Functionality><Condition>) so they match test<Functionality><Condition>;
update any internal references within the SimpleJsonResponseFormatterTest class
(imports, annotations like `@Test` remain unchanged) and ensure the new method
names still run under the existing test framework.

⚠️ Potential issue | 🟠 Major

Add null/boundary/multi-row/error coverage for highlight formatting.

The new highlight tests only cover a single-row happy path and the “no highlights” case. Add coverage for empty results, multiple rows with mixed highlight presence, and invalid/null highlight payloads to satisfy required test coverage for new per-document behavior. As per coding guidelines, "NULL input tests must be included for all new functions", "Include boundary condition tests (min/max values, empty inputs) for all new functions", "Include error condition tests (invalid inputs, exceptions) for all new functions", and "Include multi-document tests for per-document operations".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@protocol/src/test/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatterTest.java`
around lines 187 - 219, Add unit tests to cover null/boundary/multi-row/error
cases for highlight formatting in SimpleJsonResponseFormatter: create tests that
call SimpleJsonResponseFormatter.format with (1) an empty QueryResult (no rows)
and assert the output omits "highlights" and yields an empty result, (2)
multiple rows where some rows include an "_highlight" ExprTupleValue and others
don't and assert highlights appear only for rows with the field, (3) a row
containing a null "_highlight" value and assert the formatter handles it
gracefully (omits or represents as null per contract), and (4) a row with an
invalid "_highlight" payload type (e.g., non-ExprTupleValue or non-collection
elements) and assert the formatter either throws a documented exception or
sanitizes the output; use the existing helpers QueryResult,
ExprTupleValue.fromExprValueMap, tupleValue, collectionValue and
SimpleJsonResponseFormatter(COMPACT) to construct inputs and assertions.

Comment on lines +115 to +198
@Test
void iterate_excludes_highlight_from_datarows() {
QueryResult response =
new QueryResult(
schema,
Collections.singletonList(
ExprTupleValue.fromExprValueMap(
ImmutableMap.of(
"name",
ExprValueUtils.stringValue("John"),
"age",
ExprValueUtils.integerValue(20),
"_highlight",
ExprTupleValue.fromExprValueMap(
ImmutableMap.of(
"name",
ExprValueUtils.collectionValue(
ImmutableList.of("<em>John</em>"))))))),
Cursor.None);

for (Object[] objects : response) {
// datarows should only have schema columns, not _highlight
assertArrayEquals(new Object[] {"John", 20}, objects);
}
}

@Test
void highlights_returns_highlight_data() {
QueryResult response =
new QueryResult(
schema,
Collections.singletonList(
ExprTupleValue.fromExprValueMap(
ImmutableMap.of(
"name",
ExprValueUtils.stringValue("John"),
"age",
ExprValueUtils.integerValue(20),
"_highlight",
ExprTupleValue.fromExprValueMap(
ImmutableMap.of(
"name",
ExprValueUtils.collectionValue(
ImmutableList.of("<em>John</em>"))))))),
Cursor.None);

List<Map<String, Object>> highlights = response.highlights();
assertEquals(1, highlights.size());
assertEquals(ImmutableMap.of("name", ImmutableList.of("<em>John</em>")), highlights.get(0));
}

@Test
void highlights_returns_null_when_no_highlight_data() {
QueryResult response =
new QueryResult(
schema,
Collections.singletonList(tupleValue(ImmutableMap.of("name", "John", "age", 20))),
Cursor.None);

List<Map<String, Object>> highlights = response.highlights();
assertEquals(1, highlights.size());
assertNull(highlights.get(0));
}

@Test
void highlights_returns_null_when_highlight_is_missing() {
QueryResult response =
new QueryResult(
schema,
Collections.singletonList(
ExprTupleValue.fromExprValueMap(
ImmutableMap.of(
"name",
ExprValueUtils.stringValue("John"),
"age",
ExprValueUtils.integerValue(20),
"_highlight",
ExprValueUtils.LITERAL_MISSING))),
Cursor.None);

List<Map<String, Object>> highlights = response.highlights();
assertEquals(1, highlights.size());
assertNull(highlights.get(0));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Rename tests to match the required naming pattern.

Use test<Functionality><Condition> (e.g., testIterateExcludesHighlightFromDatarows, testHighlightsReturnsNullWhenMissing). As per coding guidelines, "Test naming should follow pattern test".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@protocol/src/test/java/org/opensearch/sql/protocol/response/QueryResultTest.java`
around lines 115 - 198, Rename the JUnit test methods to follow the
test<Functionality><Condition> pattern: change
iterate_excludes_highlight_from_datarows to
testIterateExcludesHighlightFromDatarows, highlights_returns_highlight_data to
testHighlightsReturnsHighlightData,
highlights_returns_null_when_no_highlight_data to
testHighlightsReturnsNullWhenNoHighlightData, and
highlights_returns_null_when_highlight_is_missing to
testHighlightsReturnsNullWhenHighlightIsMissing; update any references/imports
or test framework annotations if needed so the new method names compile and run.

⚠️ Potential issue | 🟠 Major

Add null/boundary/multi-row/error coverage for highlights().

Current tests cover a single-row happy path and missing highlights, but not empty input, mixed multi-row alignment, or invalid/null highlight payloads. Add those to satisfy required coverage for the new per-document behavior. As per coding guidelines, "NULL input tests must be included for all new functions", "Include boundary condition tests (min/max values, empty inputs) for all new functions", "Include error condition tests (invalid inputs, exceptions) for all new functions", and "Include multi-document tests for per-document operations".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@protocol/src/test/java/org/opensearch/sql/protocol/response/QueryResultTest.java`
around lines 115 - 198, Add unit tests for QueryResult.highlights() to cover
empty input, multi-row mixed highlights, null/invalid highlight payloads, and
boundary cases: 1) empty rows: construct a QueryResult with an empty list and
assert highlights() returns empty list; 2) multi-row mix: build a QueryResult
with multiple ExprTupleValue rows where some rows include a valid "_highlight",
some have ExprValueUtils.LITERAL_MISSING, some have null/invalid types (e.g.,
non-collection), and assert highlights() returns a list whose entries align
per-document (valid maps for valid highlights, null for missing/invalid); 3)
boundary empty highlight: include a "_highlight" whose field value is an empty
collection and assert it returns an empty map or empty collection entry as per
current contract; and 4) error condition: add a test where "_highlight" is an
unexpected type and assert highlights() yields null for that entry (or throws if
the contract requires), referencing QueryResult.highlights() and QueryResultTest
to locate where to add these cases.

*
* @return highlight JSONObject from request, or null if not specified
*/
public JSONObject getHighlight() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Add doc to describe highligh parameters. include, what field can be highlight.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added at: docs/user/ppl/interfaces/endpoint.md (link)

// is collected as a side channel in OpenSearchIndexEnumerator and merged here.
List<ExprValue> collectedHighlights =
OpenSearchIndexEnumerator.getAndClearCollectedHighlights();
for (int i = 0; i < Math.min(values.size(), collectedHighlights.size()); i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlight-to-row positional misalignment when Calcite operators sit above the index scan. The COLLECTED_HIGHLIGHTS ThreadLocal collects one entry per row the enumerator produces (raw scan order). After Calcite processing, buildResultSet iterates the final ResultSet, which may contain fewer rows in different order if non-pushable operators (filter, sort, dedup, head/tail) were applied above the scan.

Copy link
Collaborator Author

@RyanL1997 RyanL1997 Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I've addressed this by replacing the COLLECTED_HIGHLIGHTS ThreadLocal side-channel entirely. Instead, _highlight is now a hidden column in the Calcite RowType (SqlTypeName.ANY).

How it works: CalciteLogicalIndexScan.buildInitialSchema() appends _highlight to the scan's schema when highlight config is present. OpenSearchResponse.addHighlightsToBuilder() builds the highlight data as an ExprTupleValue and puts it directly into the row tuple. OpenSearchIndexEnumerator.current() carries it as an opaque ExprValue in the Calcite row array.

Because _highlight is now a regular column, Calcite operators (filter, sort, dedup, head/tail) naturally preserve it alongside the row data — no positional misalignment is possible. When a filter removes a row, its highlight goes with it. When sort reorders rows, highlights move with them.

On the output side, OpenSearchExecutionEngine.buildResultSet() extracts _highlight from each row and excludes it from the response schema (hidden from users). QueryResult.highlights() produces a parallel array for the formatters. | fields projections also preserve _highlight automatically via CalciteRelNodeVisitor.visitProject(), matching DSL behavior where _source filtering never affects highlights.

- "</em>"

- match: {"size": 2}
- is_true: highlights
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertions only check is_true/is_false, no content validation

* After execution, {@link #getAndClearCollectedHighlights()} retrieves the collected data so it
* can be merged back into the ExprTupleValues for the JDBC response.
*/
private static final ThreadLocal<List<ExprValue>> COLLECTED_HIGHLIGHTS =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is highlighting intended only for single-source search commands, or should it work with joins/subqueries?

COLLECTED_HIGHLIGHTS is a single static ThreadLocal shared across all enumerator instances. If a query involves a join or union of two OpenSearch indices, multiple enumerator instances will append to the same list.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Highlighting is single-source only, consistent with DSL where highlighting is inherently single-index per request — there's no DSL equivalent of highlighting across a join.

Regarding the COLLECTED_HIGHLIGHTS ThreadLocal concern — this is no longer applicable. We replaced the ThreadLocal side-channel entirely with a hidden _highlight column approach (see the response on the positional misalignment comment). There is no shared static list anymore.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1)

264-305: ⚠️ Potential issue | 🟠 Major

Preserve _highlight column through projections to maintain downstream availability.

pushDownProject rebuilds the schema strictly from selectedColumns indices. Since _highlight is a hidden column (added by buildInitialSchema when highlight config exists) and is not typically part of user projections, it gets dropped from the new schema. This breaks downstream code that expects _highlight to be present (e.g., tuple.get(HIGHLIGHT_FIELD) in QueryResult and OpenSearchIndexEnumerator).

Re-append _highlight to the schema when it exists in the current row type but is absent from selectedColumns, similar to the suggested approach in the review.

Suggested fix
-    RelDataType newSchema = builder.build();
+    // Preserve hidden _highlight column if present in input schema
+    boolean highlightPresent =
+        fieldList.stream().anyMatch(f -> HIGHLIGHT_FIELD.equals(f.getName()));
+    boolean highlightSelected =
+        selectedColumns.stream()
+            .anyMatch(i -> HIGHLIGHT_FIELD.equals(fieldList.get(i).getName()));
+    if (highlightPresent && !highlightSelected) {
+      builder.add(HIGHLIGHT_FIELD, SqlTypeName.ANY);
+    }
+    RelDataType newSchema = builder.build();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java`
around lines 264 - 305, pushDownProject currently rebuilds newSchema from
selectedColumns and drops the hidden _highlight field; detect if getRowType()
contains HIGHLIGHT_FIELD but newSchema.getFieldNames() does not, and append the
_highlight field back into the builder before builder.build(). Concretely, in
CalciteLogicalIndexScan.pushDownProject use
getRowType().getField(HIGHLIGHT_FIELD) (or scan getRowType().getFieldList() to
find the field) to retrieve the RelDataTypeField for HIGHLIGHT_FIELD and call
builder.add(...) for it if it exists and its name is missing from
selectedColumns/newSchema, so that newSchema and the created
CalciteLogicalIndexScan preserve the _highlight column for downstream code that
references HIGHLIGHT_FIELD.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java`:
- Around line 264-305: pushDownProject currently rebuilds newSchema from
selectedColumns and drops the hidden _highlight field; detect if getRowType()
contains HIGHLIGHT_FIELD but newSchema.getFieldNames() does not, and append the
_highlight field back into the builder before builder.build(). Concretely, in
CalciteLogicalIndexScan.pushDownProject use
getRowType().getField(HIGHLIGHT_FIELD) (or scan getRowType().getFieldList() to
find the field) to retrieve the RelDataTypeField for HIGHLIGHT_FIELD and call
builder.add(...) for it if it exists and its name is missing from
selectedColumns/newSchema, so that newSchema and the created
CalciteLogicalIndexScan preserve the _highlight column for downstream code that
references HIGHLIGHT_FIELD.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 411-418: When highlight is enabled
(CalcitePlanContext.getHighlightConfig() != null) the highlight preservation
block in CalciteRelNodeVisitor can add HIGHLIGHT_FIELD twice into expandedFields
(duplicate of expandProjectFields' AllFields expansion); modify the logic around
currentFields/index hlIndex and expandedFields to first scan expandedFields for
an existing RexInputRef that references the same input index (or any existing
projection that already corresponds to HIGHLIGHT_FIELD) and only add
context.relBuilder.field(hlIndex) if no matching RexInputRef exists, so the
projection is not duplicated and downstream tuple.get(HIGHLIGHT_FIELD) continues
to work.

Comment on lines +411 to +418
// When highlight is active, include _highlight in the projection so it survives
// through the Calcite pipeline. This matches DSL behavior where _source filtering
// does not affect highlights.
if (CalcitePlanContext.getHighlightConfig() != null) {
int hlIndex = currentFields.indexOf(HIGHLIGHT_FIELD);
if (hlIndex >= 0) {
expandedFields.add(context.relBuilder.field(hlIndex));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's see the full context of the visitProject method and the code around lines 411-418
cd core/src/main/java/org/opensearch/sql/calcite && wc -l CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 97


🏁 Script executed:

# Read the visitProject method context (expanded range to understand the logic)
cd core/src/main/java/org/opensearch/sql/calcite && sed -n '380,430p' CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 1792


🏁 Script executed:

# Search for how _highlight is used downstream (extraction/handling)
rg "HIGHLIGHT_FIELD|_highlight|getHighlight" -n -A 3 -B 1 --type java

Repository: opensearch-project/sql

Length of output: 50378


🏁 Script executed:

# Check if there are similar guards against duplicate projection in the codebase
rg "expandedFields|alreadyProjected|anyMatch.*RexInputRef" -n --type java

Repository: opensearch-project/sql

Length of output: 1744


🏁 Script executed:

# Search for tests related to highlight projection
rg "highlight.*[Pp]roject|visitProject.*highlight" -n --type java

Repository: opensearch-project/sql

Length of output: 365


🏁 Script executed:

# Check if _highlight can be explicitly selected in user queries
# Look for test cases with explicit _highlight selection
rg "SELECT.*_highlight|project.*_highlight" -n --type java -B 2 -A 2 | head -50

Repository: opensearch-project/sql

Length of output: 1538


🏁 Script executed:

# Check expandProjectFields to see if it could include _highlight from user input
cd core/src/main/java/org/opensearch/sql/calcite && sed -n '440,495p' CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 2319


🏁 Script executed:

# Search for how currentFields is populated and whether _highlight would be present
cd core/src/main/java/org/opensearch/sql/calcite && sed -n '393,425p' CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 1328


🏁 Script executed:

# Check if there are tests specifically covering _highlight in projections with other fields
rg "highlight.*project|project.*highlight" -n --type java -A 5 | head -60

Repository: opensearch-project/sql

Length of output: 3601


🏁 Script executed:

# Check if _highlight is classified as a metadata field
rg "isMetadataField|METADATA|_highlight" -n --type java -B 2 -A 2 | grep -A 5 "isMetadataField"

Repository: opensearch-project/sql

Length of output: 4271


🏁 Script executed:

# Get the isMetadataField implementation
cd core/src/main/java/org/opensearch/sql/calcite && rg "isMetadataField" -A 5 CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 1952


🏁 Script executed:

# Check if _highlight is in METADATAFIELD_TYPE_MAP
rg "METADATAFIELD_TYPE_MAP|_highlight" -n --type java | grep -B 5 -A 5 "METADATAFIELD_TYPE_MAP"

Repository: opensearch-project/sql

Length of output: 7296


🏁 Script executed:

# Find where METADATAFIELD_TYPE_MAP is defined
fd "OpenSearchConstants.java" --type f | head -1 | xargs cat | head -100

Repository: opensearch-project/sql

Length of output: 1137


🏁 Script executed:

# Verify that _highlight is NOT in the metadata field map and would be included in wildcard projections
cd core/src/main/java/org/opensearch/sql/opensearch/storage && cat -n OpenSearchIndex.java | sed -n '65,75p'

Repository: opensearch-project/sql

Length of output: 172


🏁 Script executed:

# Check if there are any existing tests that combine SELECT * with highlight
rg "SELECT \*|AllFields" -n --type java integ-test/src/test/java/org/opensearch/sql/sql/HighlightFunctionIT.java

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Find the correct path for OpenSearchIndex.java
fd "OpenSearchIndex.java" --type f

Repository: opensearch-project/sql

Length of output: 243


🏁 Script executed:

# Check the METADATAFIELD_TYPE_MAP definition to confirm _highlight is NOT in it
rg "METADATAFIELD_TYPE_MAP\|_highlight" opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java -A 10 -B 2

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check if tests verify SELECT * with highlight scenarios
rg "SELECT.*\*.*highlight|highlight.*SELECT" integ-test/ -n --type java -A 3 -B 1

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Look for expandWildcardPattern to understand if _highlight gets included
rg "expandWildcardPattern|WildcardUtils" -n --type java -A 5 | head -50

Repository: opensearch-project/sql

Length of output: 5426


🏁 Script executed:

# Verify the exact scenario: when ALL fields are selected, does _highlight get added to expandedFields?
# Trace through: AllFields case adds all non-metadata fields
# _highlight is in currentFields (added by CalciteLogicalIndexScan when highlight config is set)
# _highlight is NOT in METADATAFIELD_TYPE_MAP, so !isMetadataField returns true
# So _highlight WOULD be added to expandedFields via the AllFields expansion
# Then the highlight logic adds it AGAIN at line 417

# Confirm _highlight is added to schema when highlight is active
rg "buildSchema|HIGHLIGHT_FIELD" opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java -B 3 -A 3

Repository: opensearch-project/sql

Length of output: 1710


🏁 Script executed:

# Check if there are integration tests that cover SELECT * with highlight
fd "HighlightFunctionIT.java" --type f -exec cat {} \; | head -150

Repository: opensearch-project/sql

Length of output: 253


🏁 Script executed:

# Read the HighlightFunctionIT.java test file
cat integ-test/src/test/java/org/opensearch/sql/sql/HighlightFunctionIT.java | head -200

Repository: opensearch-project/sql

Length of output: 7756


🏁 Script executed:

# Let me verify the exact duplicate scenario one more time by checking both code paths:
# Path 1: AllFields expansion (lines 479-482)
# Path 2: Explicit highlight logic (lines 414-417)

# Confirm that expandProjectFields is called BEFORE the highlight logic
cd core/src/main/java/org/opensearch/sql/calcite && sed -n '393,425p' CalciteRelNodeVisitor.java | cat -n

Repository: opensearch-project/sql

Length of output: 1559


🏁 Script executed:

# Now trace through a hypothetical SELECT * query with highlight:
# 1. Project node has AllFields as projectList
# 2. visitProject is called
# 3. isSingleAllFieldsProject returns true
# 4. handleAllFieldsProject is called (not visitProject logic at 411-420)

# Check handleAllFieldsProject
cd core/src/main/java/org/opensearch/sql/calcite && sed -n '424,445p' CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 951


🏁 Script executed:

# Check what happens in the non-AllFields-only case
# If someone does: SELECT name, * with highlight
# Then isSingleAllFieldsProject returns FALSE
# Then expandProjectFields is called
# Then highlight logic runs

# Verify: when is expandProjectFields called?
cd core/src/main/java/org/opensearch/sql/calcite && sed -n '395,404p' CalciteRelNodeVisitor.java

Repository: opensearch-project/sql

Length of output: 397


🏁 Script executed:

# Final verification: construct the exact scenario that would trigger the bug
# SELECT name, * with highlight enabled
# Trace: 
# 1. Project has [Field("name"), AllFields()]
# 2. isSingleAllFieldsProject returns FALSE (size=2, not 1)
# 3. expandProjectFields processes both:
#    - Field "name" → added to expandedFields
#    - AllFields → expands to all currentFields (which includes _highlight)
# 4. Then highlight logic adds _highlight AGAIN

# Check if there are tests covering SELECT <field>, * scenario
rg "SELECT.*,.*\*" integ-test/ core/src/test -n --type java | head -20

Repository: opensearch-project/sql

Length of output: 3138


Guard against duplicate _highlight projection when using SELECT with mixed explicit fields and wildcards.

When highlight is enabled and a user queries with mixed explicit fields and AllFields (e.g., SELECT name, *), the code can add _highlight twice to the projection:

  1. expandProjectFields includes _highlight via AllFields expansion (since _highlight is not in METADATAFIELD_TYPE_MAP)
  2. The highlight preservation logic at lines 414-417 adds it again unconditionally

Calcite may rename duplicate columns (e.g., _highlight0, _highlight1), which breaks downstream extraction that uses exact name matching via tuple.get(HIGHLIGHT_FIELD). The suggested guard checking for existing RexInputRef matches prevents this duplication.

🔧 Suggested guard
       if (CalcitePlanContext.getHighlightConfig() != null) {
         int hlIndex = currentFields.indexOf(HIGHLIGHT_FIELD);
-        if (hlIndex >= 0) {
-          expandedFields.add(context.relBuilder.field(hlIndex));
+        if (hlIndex >= 0) {
+          boolean alreadyProjected =
+              expandedFields.stream()
+                  .anyMatch(
+                      rex ->
+                          (rex instanceof RexInputRef ref && ref.getIndex() == hlIndex)
+                              || (rex instanceof RexCall call
+                                  && call.getKind() == SqlKind.AS
+                                  && call.getOperands().get(0) instanceof RexInputRef ref
+                                  && ref.getIndex() == hlIndex));
+          if (!alreadyProjected) {
+            expandedFields.add(context.relBuilder.field(hlIndex));
+          }
         }
       }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// When highlight is active, include _highlight in the projection so it survives
// through the Calcite pipeline. This matches DSL behavior where _source filtering
// does not affect highlights.
if (CalcitePlanContext.getHighlightConfig() != null) {
int hlIndex = currentFields.indexOf(HIGHLIGHT_FIELD);
if (hlIndex >= 0) {
expandedFields.add(context.relBuilder.field(hlIndex));
}
// When highlight is active, include _highlight in the projection so it survives
// through the Calcite pipeline. This matches DSL behavior where _source filtering
// does not affect highlights.
if (CalcitePlanContext.getHighlightConfig() != null) {
int hlIndex = currentFields.indexOf(HIGHLIGHT_FIELD);
if (hlIndex >= 0) {
boolean alreadyProjected =
expandedFields.stream()
.anyMatch(
rex ->
(rex instanceof RexInputRef ref && ref.getIndex() == hlIndex)
|| (rex instanceof RexCall call
&& call.getKind() == SqlKind.AS
&& call.getOperands().get(0) instanceof RexInputRef ref
&& ref.getIndex() == hlIndex));
if (!alreadyProjected) {
expandedFields.add(context.relBuilder.field(hlIndex));
}
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
around lines 411 - 418, When highlight is enabled
(CalcitePlanContext.getHighlightConfig() != null) the highlight preservation
block in CalciteRelNodeVisitor can add HIGHLIGHT_FIELD twice into expandedFields
(duplicate of expandProjectFields' AllFields expansion); modify the logic around
currentFields/index hlIndex and expandedFields to first scan expandedFields for
an existing RexInputRef that references the same input index (or any existing
projection that already corresponds to HIGHLIGHT_FIELD) and only add
context.relBuilder.field(hlIndex) if no matching RexInputRef exists, so the
projection is not duplicated and downstream tuple.get(HIGHLIGHT_FIELD) continues
to work.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
docs/user/ppl/interfaces/endpoint.md (2)

107-116: Clarify dual text and keyword highlighting behavior.

Line 108 shows both "firstname": ["<em>Holmes</em>"] and "firstname.keyword": ["<em>Holmes</em>"] in the highlights. While line 76 mentions that wildcard matching includes .keyword subfields, users may find it surprising to see both the analyzed text field and the keyword subfield highlighted for the same match.

Consider adding a brief note in the example explanation (after line 116) to clarify why both appear and when users might want to filter this (e.g., by specifying explicit fields instead of "*").

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/user/ppl/interfaces/endpoint.md` around lines 107 - 116, Add a brief
explanatory note after the example that clarifies why both analyzed and keyword
subfield highlights appear: explain that using a wildcard ("*") matches both the
analyzed field (e.g., "firstname") and its unmapped keyword subfield (e.g.,
"firstname.keyword"), causing duplicate highlight entries for the same token,
and advise that users can avoid this by specifying explicit fields to highlight
(instead of "*") or by excluding ".keyword" fields when they only want
analyzed-text highlights; reference the "highlights" example showing "firstname"
and "firstname.keyword" and mention the wildcard behavior.

136-138: Consider expanding abbreviated example output.

Using ... for brevity in Example 2 is acceptable, but it may be clearer to either:

  1. Show the minimal complete schema (just the fields that matter for this example), or
  2. Add a note indicating what's omitted (e.g., // schema and datarows same as Example 1).

This helps readers understand the complete structure without needing to reference Example 1.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/user/ppl/interfaces/endpoint.md` around lines 136 - 138, Example 2 in
endpoint.md uses "schema" and "datarows" with "..." which hides structure;
update the Example 2 JSON so readers can understand it by either (a) replacing
the "..." under "schema" and "datarows" with a minimal complete schema and one
or two representative datarows (include the actual field names used in Example
1), or (b) add an explicit comment/note beneath the JSON such as "// schema and
datarows same as Example 1" that points to the source of the omitted content;
locate the block containing "schema" and "datarows" in Example 2 and apply one
of these two changes to make the example self-contained.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/user/ppl/interfaces/endpoint.md`:
- Around line 102-106: Several response examples in the docs are inconsistent
about including a "status" field (the snippet with "datarows" and the very first
POST example lack it while other examples include "status": 200); update all
example response JSON blocks to include a "status": 200 field so every example
(including the initial POST example and the block containing "datarows" and the
examples around lines referenced in the review) follows the same format,
ensuring the inserted key is placed at the same level as other top-level
response fields for uniformity.
- Around line 97-106: Example 1's query ("search source=accounts \"Holmes\"") is
missing an explicit fields clause, leaving the returned columns ambiguous;
update that example to include a fields clause listing the schema fields
(firstname, lastname, address) so the expected output is deterministic and
matches the earlier POST example pattern.
- Line 163: Update the sentence to explicitly state that highlighting is
supported only for single-source queries and that its behavior with joins,
subqueries, or multi-source queries (e.g., using the operator "| join") is
unsupported and untested: clarify that the system does not guarantee an error or
consistent output in those cases and that highlighting may be silently omitted
or produce unexpected results, so users should avoid combining highlighting with
joins, subqueries, or multi-source queries.

---

Nitpick comments:
In `@docs/user/ppl/interfaces/endpoint.md`:
- Around line 107-116: Add a brief explanatory note after the example that
clarifies why both analyzed and keyword subfield highlights appear: explain that
using a wildcard ("*") matches both the analyzed field (e.g., "firstname") and
its unmapped keyword subfield (e.g., "firstname.keyword"), causing duplicate
highlight entries for the same token, and advise that users can avoid this by
specifying explicit fields to highlight (instead of "*") or by excluding
".keyword" fields when they only want analyzed-text highlights; reference the
"highlights" example showing "firstname" and "firstname.keyword" and mention the
wildcard behavior.
- Around line 136-138: Example 2 in endpoint.md uses "schema" and "datarows"
with "..." which hides structure; update the Example 2 JSON so readers can
understand it by either (a) replacing the "..." under "schema" and "datarows"
with a minimal complete schema and one or two representative datarows (include
the actual field names used in Example 1), or (b) add an explicit comment/note
beneath the JSON such as "// schema and datarows same as Example 1" that points
to the source of the omitted content; locate the block containing "schema" and
"datarows" in Example 2 and apply one of these two changes to make the example
self-contained.

Comment on lines +97 to +106
"schema": [
{ "name": "firstname", "type": "string" },
{ "name": "lastname", "type": "string" },
{ "name": "address", "type": "string" }
],
"datarows": [
["Holmes", "Morgan", "123 Main St"],
["Jane", "Holmes", "456 Oak Ave"],
["John", "Smith", "880 Holmes Lane"]
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n docs/user/ppl/interfaces/endpoint.md | head -150

Repository: opensearch-project/sql

Length of output: 5348


Add explicit field selection to make the query output consistent.

The query search source=accounts "Holmes" omits a fields clause but the schema shows only three fields (firstname, lastname, address). This creates ambiguity about what fields an unrestricted query returns. Add the fields clause:

-  "query": "search source=accounts \"Holmes\"",
+  "query": "search source=accounts \"Holmes\" | fields firstname, lastname, address",

This aligns Example 1 with the explicit field selection pattern shown in the earlier POST example at line 16 and makes the expected output complete and unambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/user/ppl/interfaces/endpoint.md` around lines 97 - 106, Example 1's
query ("search source=accounts \"Holmes\"") is missing an explicit fields
clause, leaving the returned columns ambiguous; update that example to include a
fields clause listing the schema fields (firstname, lastname, address) so the
expected output is deterministic and matches the earlier POST example pattern.

Comment on lines +102 to +106
"datarows": [
["Holmes", "Morgan", "123 Main St"],
["Jane", "Holmes", "456 Oak Ave"],
["John", "Smith", "880 Holmes Lane"]
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Response format inconsistency: status field.

Example 1 (line 114) includes "status": 200 in the response, but the initial POST example (lines 51-53) and Example 2 (line 146) also include "status": 200, while the very first example (lines 21-53) does not show a status field at all.

For consistency and to avoid user confusion, ensure all example responses follow the same format. Either include status in all examples or document when it appears.

As per coding guidelines, documentation should follow existing patterns and structure consistently.

Also applies to: 139-143

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/user/ppl/interfaces/endpoint.md` around lines 102 - 106, Several
response examples in the docs are inconsistent about including a "status" field
(the snippet with "datarows" and the very first POST example lack it while other
examples include "status": 200); update all example response JSON blocks to
include a "status": 200 field so every example (including the initial POST
example and the block containing "datarows" and the examples around lines
referenced in the review) follows the same format, ensuring the inserted key is
placed at the same level as other top-level response fields for uniformity.

- Highlighting is supported only in the Calcite engine.
- The backend forwards the highlight config as-is to OpenSearch. The same highlighting behavior and limitations as [OpenSearch's highlighting API](https://opensearch.org/docs/latest/search-plugins/searching-data/highlight/) apply.
- Piped commands (`where`, `sort`, `head`, `dedup`) narrow or reorder the result set but do not affect which terms are highlighted.
- Highlighting works with **single-source queries only**, consistent with DSL where highlighting is inherently single-index per request. Behavior with joins (`| join`), subqueries, and multi-source queries is not validated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "*.java" -o -name "*.kt" | head -30

Repository: opensearch-project/sql

Length of output: 2727


🏁 Script executed:

# Search for test files related to highlighting
fd -e java -path "*/test/*" "highlight" 2>/dev/null | head -20

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Look for highlighting implementation and how it handles joins/multi-source
rg -l "highlight" --type=java -i | head -20

Repository: opensearch-project/sql

Length of output: 1766


🏁 Script executed:

# Search for join-related tests
fd -e java -path "*/test/*" "join" 2>/dev/null | head -10

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

cat ./sql/src/test/java/org/opensearch/sql/sql/antlr/HighlightTest.java

Repository: opensearch-project/sql

Length of output: 1162


🏁 Script executed:

# Search for highlighting implementation in PPL/SQL parsers
rg -C5 "highlight" --type=java ./ppl ./sql | head -100

Repository: opensearch-project/sql

Length of output: 10904


🏁 Script executed:

# Search for join-related highlighting or multi-source validation
rg -C3 "highlight" --type=java | rg -C3 "join|multi.*source|subquery" | head -50

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Look at the PPL parser/domain to see highlight handling
cat ./ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java 2>/dev/null | head -200

Repository: opensearch-project/sql

Length of output: 3849


🏁 Script executed:

# Search for join implementation in SQL/PPL execution
rg -C5 "join" --type=java -i | grep -C3 "highlight" | head -50

Repository: opensearch-project/sql

Length of output: 638


🏁 Script executed:

# Look for any validation or error handling when processing highlights
rg -C3 "highlight" --type=java | grep -C2 "validate|error|fail|check" | head -50

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Search for how multi-source or subquery scenarios handle highlighting
fd -e java | xargs rg "getHighlight|highlight" | head -30

Repository: opensearch-project/sql

Length of output: 4339


🏁 Script executed:

# Check if there's any join-related test with highlight
find . -name "*JoinTest.java" -o -name "*MultiSourceTest.java" | head -10

Repository: opensearch-project/sql

Length of output: 140


🏁 Script executed:

# Examine the join test file to see if highlight is tested with joins
cat ./ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLJoinTest.java | head -200

Repository: opensearch-project/sql

Length of output: 9103


🏁 Script executed:

# Look for the highlight integration test to see what it covers
cat ./integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLHighlightIT.java | head -300

Repository: opensearch-project/sql

Length of output: 6838


🏁 Script executed:

# Search for any restrictions or validation related to highlight and joins
rg "highlight" --type=java -C3 | grep -C3 "restrict|join|single" | head -40

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check the documentation file mentioned in the review
cat ./docs/user/ppl/interfaces/endpoint.md | sed -n '155,175p'

Repository: opensearch-project/sql

Length of output: 1699


🏁 Script executed:

# Search for any validation that rejects or restricts multi-source highlights
rg "highlight" --type=java -C5 | grep -C3 "SemanticCheck|validate|exception|error|throw" | head -40

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check if there's any multi-source query validation for highlight
rg "multi.*source|subquery" --type=java -C3 | grep -C2 "highlight" | head -30

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check if there's validation in query execution or planner that prevents multi-source with highlight
rg "pushDownHighlight|highlight" --type=java -C5 | grep -C3 "source|join|table" | head -50

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Look for any semantic checks or validation rules for highlight
fd -e java -path "*/semantic*" -o -path "*/validator*" | head -10

Repository: opensearch-project/sql

Length of output: 234


🏁 Script executed:

# Search for where highlight is actually processed/validated
rg "class.*Highlight" --type=java | head -20

Repository: opensearch-project/sql

Length of output: 1525


🏁 Script executed:

# Examine HighlightAnalyzer to see if there's validation for multi-source scenarios
cat ./core/src/main/java/org/opensearch/sql/analysis/HighlightAnalyzer.java

Repository: opensearch-project/sql

Length of output: 1586


🏁 Script executed:

# Check the test that throws SemanticCheckException for highlight
cat ./opensearch/src/test/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilderTest.java | grep -C15 "SemanticCheckException.*highlight"

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Look at SQL highlight integration tests to see what's tested
cat ./integ-test/src/test/java/org/opensearch/sql/sql/HighlightFunctionIT.java

Repository: opensearch-project/sql

Length of output: 7756


🏁 Script executed:

# Get more context on the SemanticCheckException test for highlight
cat ./opensearch/src/test/java/org/opensearch/sql/opensearch/request/OpenSearchRequestBuilderTest.java | grep -B20 -A5 "SemanticCheckException" | grep -B20 -A5 "highlight"

Repository: opensearch-project/sql

Length of output: 2170


🏁 Script executed:

# Search for any validation that explicitly prevents multi-source highlight
rg "pushDownHighlight" --type=java -B5 -A10 | head -80

Repository: opensearch-project/sql

Length of output: 9815


🏁 Script executed:

# Search for how logical plans are executed when joins are involved with highlight
rg "LogicalHighlight|LogicalJoin" --type=java -C3 | head -60

Repository: opensearch-project/sql

Length of output: 7701


🏁 Script executed:

# Check if there's any rule or optimizer that handles highlight with joins
fd -e java -path "*/optimizer/rule*" | xargs rg "highlight" 2>/dev/null | head -20

Repository: opensearch-project/sql

Length of output: 2768


Clarify what happens when highlighting is used with joins or multi-source queries.

The phrase "Behavior with joins (| join), subqueries, and multi-source queries is not validated" is ambiguous. Based on the codebase, highlighting with multi-source queries is genuinely untested. Clarify whether:

  • The system rejects these queries with an error,
  • Highlighting is silently omitted from results, or
  • Behavior is undefined and may produce unexpected results.

This helps users avoid confusion when combining highlight with joins or subqueries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/user/ppl/interfaces/endpoint.md` at line 163, Update the sentence to
explicitly state that highlighting is supported only for single-source queries
and that its behavior with joins, subqueries, or multi-source queries (e.g.,
using the operator "| join") is unsupported and untested: clarify that the
system does not guarantee an error or consistent output in those cases and that
highlighting may be silently omitted or produce unexpected results, so users
should avoid combining highlighting with joins, subqueries, or multi-source
queries.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request feature PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE RFC] PPL Search Result Highlighting [FEATURE] Supporting Query Highlight Feature into PPL API

2 participants