Skip to content

feat(env): format-preserving byte-level YAML splice fast path#133

Merged
fentas merged 10 commits into
mainfrom
feat/format-preserving-emitter
Apr 8, 2026
Merged

feat(env): format-preserving byte-level YAML splice fast path#133
fentas merged 10 commits into
mainfrom
feat/format-preserving-emitter

Conversation

@fentas

@fentas fentas commented Apr 7, 2026

Copy link
Copy Markdown
Owner

Follow-up from PR #126 review: every successful merge sync used to produce a noisy git diff because `spliceYAMLStructural` round-tripped the whole document through `yaml.Marshal`, which normalizes whitespace, quoting, and field ordering even for keys the splice didn't touch.

What's new

`spliceYAMLByteLevel` is a new fast path that walks `local` byte-by-byte and only re-emits the in-scope keys:

  1. Parse local for `yaml.v3` Node Line/Column metadata.
  2. Compute byte ranges for every top-level key.
  3. Walk local:
    • out-of-scope range → copy verbatim
    • in-scope, in merged → emit `yaml.Marshal` of just that one key:value pair
    • in-scope, not in merged → drop (deletion)
  4. Append additions (merged keys not in local) at the end.

The output is byte-identical for every unchanged byte. Only the in-scope keys go through the yaml.v3 emitter.

Dispatch order in `spliceYAML`

  1. Byte-level splice (best fidelity, NEW)
  2. Structural splice (fallback for shapes byte-level can't handle — empty doc, non-mapping root, exotic Line metadata)
  3. Text splice (conflict path, unchanged)

Test

`TestSpliceYAMLByteLevel_PreservesOutOfScopeBytesVerbatim` uses a local file with non-default formatting:

```yaml

Top of file

binaries:
kubectl: {} # 4-space indent
kustomize: {}

envs:
"github.com/keep/me": # an inline comment
files:
"a.yaml": "docs/a.yaml" # double-quoted keys
# trailing comment inside envs
```

After splicing in a new `tilt:` binary, all of the formatting choices above (4-space indent, quoted keys, inline comment, trailing comment, header comment) survive byte-for-byte. Before this PR, every one of them would be reformatted by yaml.v3.

This is the format-preserving emitter mentally tracked from the #126 review loop.

🤖 Generated with Claude Code

Follow-up from PR #126 review: every successful merge sync produced
a noisy git diff because spliceYAMLStructural round-tripped the
whole document through yaml.Marshal, which would normalize
whitespace, quoting, and field ordering even for keys the splice
didn't touch.

New byte-level fast path (spliceYAMLByteLevel):

  1. Parse local for yaml.Node Line/Column metadata.
  2. Compute byte ranges for every top-level key.
  3. Walk local byte-by-byte:
       - out-of-scope key range  → copy verbatim
       - in-scope key, in merged → emit yaml.Marshal of just that
                                    one key:value pair
       - in-scope key, not merged → drop (deletion)
  4. Append additions (merged keys not in local) at the end.

The output preserves whitespace, quoting, and comments BYTE-FOR-BYTE
on every byte that wasn't touched. Only the in-scope keys go through
the yaml.v3 emitter.

Dispatch order in spliceYAML is now:
  1. Byte-level splice (best fidelity)
  2. Structural splice (existing fallback for shapes byte-level
     can't handle — empty doc, non-mapping root, exotic Line metadata)
  3. Text splice (conflict path, unchanged)

New helper marshalSingleTopLevelKey serializes one key:value pair as
a top-level YAML mapping entry. Used for both substitutions and
appends in the byte-level path.

New regression test
TestSpliceYAMLByteLevel_PreservesOutOfScopeBytesVerbatim uses a
local file with non-default formatting (4-space indent,
double-quoted keys, inline comments) to prove that none of those
choices get normalized away when an unrelated top-level key is
spliced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new YAML splice fast path that preserves untouched bytes verbatim during select + merge env syncs, reducing noisy diffs caused by full-document yaml.v3 re-emission.

Changes:

  • Introduces spliceYAMLByteLevel and updates spliceYAML dispatch order to prefer byte-level splicing, then structural, then text/conflict splice.
  • Adds a unit test intended to validate that out-of-scope YAML formatting is preserved when splicing in-scope changes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
pkg/env/splice.go Adds byte-level YAML splice strategy and adjusts dispatch to prioritize format-preserving behavior.
pkg/env/splice_test.go Adds a new test case for byte-level splice preservation behavior.

Comment thread pkg/env/splice_test.go Outdated
Comment thread pkg/env/splice.go Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread pkg/env/splice.go
Comment thread pkg/env/splice.go
- spliceYAML doc comment now accurately describes the byte-level
  path: out-of-scope bytes are byte-identical, but in-scope ranges
  are re-encoded as full key:value pairs and may have key
  quoting/style and key-line comments rewritten.
- TestSpliceYAMLByteLevel_PreservesOutOfScopeBytesVerbatim now
  asserts exact byte equality on the envs: slice instead of
  substring matching, so any reformatter touching that region
  fails the test even when the surrounding lines still appear.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread pkg/env/splice.go
Comment thread pkg/env/splice.go
- Validate top-level key byte ranges before slicing in
  spliceYAMLByteLevel. yaml.v3 Line/Column metadata can be
  unreliable for flow-style mappings or single-line documents,
  and an out-of-bounds slice would panic at runtime instead of
  letting spliceYAML fall back to the structural / text path.
- Preserve CRLF line endings when the local file uses them.
  yaml.v3 always emits LF, so without this the spliced regions
  would produce a mixed-ending file (CRLF in preserved bytes,
  LF in spliced regions) — noisy in diffs and bad for
  Windows-aware tooling. New usesCRLF helper detects the local
  file's convention and the byte-level splice rewrites emitted
  blocks (and the appended-keys separator newline) to match.
- Strategy doc comment now refers to marshalSingleTopLevelKey
  instead of yaml.Marshal, which is what the implementation
  actually uses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread pkg/env/splice.go Outdated
Comment thread pkg/env/splice.go
Comment thread pkg/env/splice.go Outdated
… round 3)

- Range validation now requires startByte < endByte (was allowing
  startByte == endByte, which let zero-length yaml.v3 metadata for
  flow-style mappings sneak through and emit garbage).
- Hoist the usesCRLF call out of the per-key splice loop. Scanning
  the entire file inside the loop made the splice O(n*k); the
  line-ending convention can't change between keys, so detecting
  it once is enough.
- New TestSpliceYAMLByteLevel_PreservesCRLFLineEndings test covers
  the CRLF normalization path end-to-end: a CRLF local file stays
  CRLF after the splice with no bare LFs anywhere in the output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread pkg/env/splice.go Outdated
Comment thread pkg/env/splice_test.go
- spliceYAMLByteLevel doc now says explicitly that the in-scope
  key line is also re-emitted (not preserved from local), so key
  quoting/style and key-line comments may change. Out-of-scope
  bytes still keep their original formatting.
- TestSpliceYAMLByteLevel_PreservesCRLFLineEndings now checks the
  envs:-section bytes.Index results before slicing. A regression
  that dropped envs: from either side previously would have
  panicked instead of producing a useful failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fentas fentas requested a review from Copilot April 7, 2026 19:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread pkg/env/splice_test.go Outdated
…nd 5)

The naive bytes.Contains(\\n) && !bytes.Contains(\\r\\n) check was
always false on a CRLF file because CRLF itself contains \\n. The
explicit per-byte loop below is the real check; remove the dead
guard and document why.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread pkg/env/splice.go
…nd 6)

The implementation uses yaml.NewEncoder().Encode() and the
marshalSingleTopLevelKey helper, not yaml.Marshal directly.
Update the splice strategy doc comments so the wording matches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pkg/env/splice.go:36

  • The spliceSelectedScope doc comment still describes only two strategies (structural + text), but spliceYAML now has a byte-level fast path that is the default for successful YAML merges. Please update this top-level comment to reflect the three-path dispatch so callers understand when output is byte-preserving vs re-encoded.
// Two paths:
//
//   - Structural splice (fast path): when `merged` parses as valid YAML,
//     rewrite the local Node tree in place (replace scoped key values,
//     append new scoped keys, remove vanished scoped keys). Out-of-scope
//     comments and layout are preserved because their Nodes are untouched.
//     Note: the yaml.v3 encoder re-emits the whole document, so the output won't
//     be byte-identical to the input even for unchanged keys — this is a
//     limitation of yaml.v3 that the structural splice cannot work around
//     here. A format-preserving emitter is tracked as a separate
//     follow-up.

Comment thread pkg/env/splice.go
Comment thread pkg/env/splice_test.go Outdated
…133 round 7)

- New TestSpliceYAMLByteLevel_AppendsNewScopedKey covers the
  "additions" branch: local has only out-of-scope keys, merged
  introduces a new scoped top-level key, and the splice must
  append it at EOF with byte-identical preservation of the
  existing local content and a separating newline.
- Update the stale comment in
  TestSpliceYAMLByteLevel_PreservesOutOfScopeBytesVerbatim that
  still mentioned yaml.Marshal — the implementation uses the
  yaml.v3 encoder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread pkg/env/splice.go Outdated
…round 8)

The doc still described only the structural fast path + text
fallback and called the format-preserving emitter a "separate
follow-up". This PR added the byte-level fast path; rewrite the
top-level comment to describe the three-path dispatch in order of
fidelity (byte-level → structural → text) and drop the stale
follow-up wording.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread pkg/env/splice.go Outdated
Previously usesCRLF returned true if the file contained ANY \r\n,
which meant a mostly-LF file with a stray CRLF would have spliced
regions rewritten to CRLF — increasing the mixing instead of
fixing it.

New rule: a file is CRLF only when at least one \r\n is present
AND every \n is preceded by \r. Mixed-ending files stay LF so the
splice produces output that matches yaml.v3's emitter and the
dominant convention. New TestUsesCRLF covers all five edge cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@fentas fentas merged commit 94cd711 into main Apr 8, 2026
12 checks passed
@fentas fentas deleted the feat/format-preserving-emitter branch April 8, 2026 06:23
fentas pushed a commit that referenced this pull request Apr 8, 2026
🤖 I have created a release *beep* *boop*
---


## [4.14.0](v4.13.0...v4.14.0)
(2026-04-08)


### Features

* **cli:** b env resolve for merge conflicts
([#125](#125) phase 4)
([#138](#138))
([b3d00c8](b3d00c8))
* **cli:** plan-json polish — destructive row in error + effectiveDryRun
helper ([#132](#132))
([4ad5dfa](4ad5dfa))
* **cli:** surface conflict markers in env status
([#139](#139))
([1e54c73](1e54c73))
* **env:** format-preserving byte-level YAML splice fast path
([#133](#133))
([94cd711](94cd711))
* **env:** JSON splice support for scoped selects
([#135](#135))
([89ccf9d](89ccf9d))
* **env:** orphan-file delete plumbing
([#125](#125) phase 3)
([#137](#137))
([5891781](5891781))
* **env:** per-key b.pin annotation
([#125](#125) phase 2)
([#136](#136))
([9ed59db](9ed59db))
* **env:** structural 3-way merge for YAML/JSON
([#134](#134))
([fd787b0](fd787b0))
* **env:** wrapKeyFor extracts leading identifier for filter expressions
([#131](#131))
([f542f28](f542f28))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants