#3038 Phase 0 spike: dual-stack System.Text.Json on the eager path (93% corpus parity)#3041
Draft
michaelcfanning wants to merge 1 commit into
Draft
#3038 Phase 0 spike: dual-stack System.Text.Json on the eager path (93% corpus parity)#3041michaelcfanning wants to merge 1 commit into
michaelcfanning wants to merge 1 commit into
Conversation
The frozen object model now carries STJ attributes ALONGSIDE the existing
Newtonsoft ones (via a `using Stj = ...` alias), so both serializers work
against the same model during the transition. The Newtonsoft path
(SarifLog.Load/LoadDeferred/Save and every existing call site) is
untouched and behaves identically.
scripts/StjAttributeTransform.ps1
One-time, idempotent mechanical transform of the 84 model files under
Autogenerated/, NotYetAutoGenerated/, VersionOne/Autogenerated/:
[DataMember(Name=x,...)] -> + [Stj.JsonPropertyName(x)]
[JsonProperty(DefaultValueH..)] -> + [Stj.JsonIgnore(WhenWritingDefault)]
[JsonIgnore] -> + [Stj.JsonIgnore]
The 84-file transform is one command, not 84 hand-edits.
src/Sarif/Readers/SystemTextJson/SarifJsonConverters.cs
9 JsonConverter<T> ports: Uri, DateTime, Version, SarifVersion,
BigInteger, plain-enum factory, [Flags]-enum factory, and the
PropertyBag/SerializedPropertyInfo pair (#3038 flags this as the
harder case; JsonElement.GetRawText() + Utf8JsonWriter.WriteRawValue()
are exactly the capture-raw-JSON primitives that pattern needs).
src/Sarif/Readers/SystemTextJson/SarifJson.cs
SarifJson.Load/Save/Serialize + configured Options. Three
TypeInfoResolver modifiers replicate the contract semantics
Newtonsoft gets for free from [DataContract]:
- EnforceDataContractOptIn: drop any property without
[JsonPropertyName] (suppresses SarifNodeKind, Tags, etc.)
- HonorDefaultValueAttribute: ShouldSerialize at the
[DefaultValue(x)] target, not the CLR default
- AdmitInternalPropertiesBag: include the internal Properties
bag every node overrides
SarifJsonContext is the [JsonSerializable] source-gen root for
NativeAOT; it compiles, but is parked behind reflection mode for
the spike (re-enabled once the transform also stamps [Stj.JsonIgnore]
on non-DataMember properties so the opt-in modifier can retire and
fast-path serialization comes back).
src/Test.UnitTests.Sarif/Readers/SystemTextJson/SarifJsonRoundTripTests.cs
Parity harness: STJ self-idempotency + STJ-vs-Newtonsoft semantic
equality (JToken.DeepEquals, the existing FileDiffingUnitTests bar)
across every embedded *.sarif fixture (76 v2 logs).
Measured Phase-0 result on the 76-fixture v2 corpus + 5 targeted tests:
75 / 81 pass (93%)
Progression across four root-cause fixes:
28 -> 29 (resource-path bug in test harness)
-> 36 (DataContract opt-in modifier; clears SarifNodeKind/Tags leak)
-> 34 (global WhenWritingDefault for EmitDefaultValue=false)
-> 75 (BigInteger converter; clears ~45 in one shot)
Remaining 6, categorized:
- 4x PrereleaseCompatibilityTransformer fixtures (pre-2.1.0 schema
version-upgrade edge cases; not representative SARIF)
- 1x Result.Kind [DefaultValue(ResultKind.Fail)] suppression
(the modifier should cover it; needs a closer look at where the
attribute lives across the Autogenerated/NotYetAutoGenerated split)
- 1x SerializedPropertyInfo null-literal handling: Newtonsoft stores
the JSON literal text; the STJ converter currently stores a CLR
null for JsonTokenType.Null. ~2-line fix.
Zero regression on the existing Newtonsoft suite: 959/960 pass; the 1
failure is FileSystem_IsSymbolicLinkTestCases, an environment-
dependent symlink test unrelated to serialization.
What this de-risks (per #3038 phasing):
- The frozen-model attribute transform IS mechanical (one script).
- The PropertyBag converter ports cleanly via STJ's raw-JSON
primitives; no research risk.
- The test bar is semantic (JToken.DeepEquals), not byte-exact, so
STJ/Newtonsoft formatting differences are not a validation cost.
- The deferred-reader port and the public-API break are genuinely the
long poles; everything before them is hours, not weeks.
Co-Authored-By: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Phase-0 spike for #3038: dual-stack System.Text.Json on the eager (de)serialization path, with measured corpus parity. The Newtonsoft path is untouched — this is purely additive.
tl;dr for the "weeks of effort" estimate
The full issue (deferred-reader rewrite + public-API break + AOT publish + measurement) is genuinely multi-week. Phase 0 is not. This branch gets the eager STJ path to 93% corpus parity in ~4 new files + one transform script, with the remaining 7% pinned to three narrow root causes. The scary parts of the issue turned out to be tractable:
StjAttributeTransform.ps1) adds STJ attributes alongside Newtonsoft via ausing Stj = …alias. Both serializers work against the same model. 84 files, one command.JsonElement.GetRawText()+Utf8JsonWriter.WriteRawValue()are exactly the capture-raw-JSON primitivesSerializedPropertyInfoneeds. ~40 lines.JToken.DeepEquals), not byte-exact. STJ's escaping/number-format/indent differences never reach the assertion.What's here
scripts/StjAttributeTransform.ps1— the mechanical transform (already applied to the 84 model files in this commit; re-runnable, idempotent)src/Sarif/Readers/SystemTextJson/SarifJsonConverters.cs— 9JsonConverter<T>portssrc/Sarif/Readers/SystemTextJson/SarifJson.cs—SarifJson.Load/Save/Serialize/Options+ 3 contract modifiers +[JsonSerializable]source-gen contextsrc/Test.UnitTests.Sarif/Readers/SystemTextJson/SarifJsonRoundTripTests.cs— parity harness over every embedded*.sariffixtureMeasured
75 / 81 pass (93%) on the v2 corpus + targeted property-bag tests.
Progression across four root-cause fixes during the spike: 28 → 29 → 36 → 34 → 75. Each jump was one well-understood gap:
[DataContract]opt-in →EnforceDataContractOptInmodifier[DataMember(EmitDefaultValue=false)]on every property → globalWhenWritingDefault[DefaultValue(x)]suppresses atx, not CLR default →HonorDefaultValueAttributemodifierLocation.Id : BigIntegerhas no built-in STJ converter →StjBigIntegerConverterRemaining 6, categorized (none are research risk):
PrereleaseCompatibilityTransformerfixtures — pre-2.1.0 schema-upgrade edge cases, not representative SARIFResult.Kind[DefaultValue(ResultKind.Fail)]suppression — the modifier should cover it; the attribute may live on theNotYetAutoGenerated/partial half (which the csproj excludes from compile)SerializedPropertyInfonull-literal handling — Newtonsoft stores the JSON literal"null"; STJ converter currently stores CLRnull. ~2-line fix.Zero regression on the existing Newtonsoft suite: 959/960 pass; the 1 failure is
FileSystem_IsSymbolicLinkTestCases, an environment-dependent symlink test unrelated to serialization.What this does NOT do (the actual long poles)
LoadDeferred,SarifDeferredContractResolver,JsonPositionedTextReader) — still Newtonsoft, off the default path, sequenced last per Migrate JSON serialization Newtonsoft.Json -> System.Text.Json (source-gen) to unblock NativeAOT #3038.Newtonsoft.Json.JsonConverter, 66 public members expose Newtonsoft types. This branch adds; it doesn't remove. The break plan is Phase 2.SarifJsonContext(the source-gen root) compiles, but is parked behind reflection mode until the transform script also stamps[Stj.JsonIgnore]on non-[DataMember]public properties so the opt-in modifier can retire and fast-path comes back. That's the next mechanical step, then<PublishAot>true</PublishAot>on the multitool and measure.How to extend
🤖 Generated with Claude Code