Add digit separators #1160

seizethedave · 2024-06-22T03:19:52Z

This PR adds digit separators (1_000) to Jsonnet's numeric constants.

Accompanying issue with format proposal: #1155

sbarzowski · 2024-06-23T12:30:06Z

I'm in favor of this. We need to also:

- Update the docs. At least jsonnet/doc/ref/spec.html (line 4 in 2bca3a0). Ideally also the tutorial. See https://github.com/google/jsonnet?tab=readme-ov-file#locally-serving-the-website for working on documentation.
- Add some end-to-end examples. Just dump some .jsonnet files in test_suite and run https://github.com/google/jsonnet/blob/master/test_suite/refresh_golden.sh.
- We'll need to update the go-jsonnet implementation (separate PR). It should be straightforward.

seizethedave · 2024-06-23T15:12:34Z

Thanks @sbarzowski, I'll get cracking on those.

seizethedave · 2024-07-04T21:03:14Z

@sbarzowski I think I'm ready for another round of feedback on this one. I suspect these might need a little more help:

- a more formal treatment of the numeric grammar in the docs
- note how std.parseInt and related functions will not honor numbers with underscores

Thanks!

seizethedave · 2024-08-14T16:05:53Z

Anything I can do to push this forward @sbarzowski? Thanks!

johnbartholomew · 2025-02-23T02:08:08Z

Sorry for all the delays on this. I do hope to review this soon; it seems like a useful improvement.

seizethedave · 2025-08-12T15:51:44Z

@johnbartholomew I'll try to find a few minutes to get this PR back into a healthy/mergeable state. If you have any pre-review feedback or changes I can get those going.

johnbartholomew · 2026-01-21T16:48:25Z

core/lexer.cpp

                    case 'e':
                    case 'E': state = AFTER_E; break;

+                    case '_': state = AFTER_UNDERSCORE; goto skip_char;


Underscore must not be allowed after a leading zero.

JSON numbers that start with a 0 may only be 0 itself, or a fraction starting 0., or a (fairly pointless...) 0 with an exponent (e.g., 0e5). If you go to the AFTER_UNDERSCORE state after a leading zero then it can be used to produce an invalid number, e.g., 0_5 is read as 05 which is not allowed.

(The reason it's not allowed is that it's ambiguous: JavaScript reads numbers with a leading zero as octal numbers, but JSON does not support octal numbers, so to avoid a misleading difference between JSON and JavaScript, leading zeros are simply forbidden.)
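To make the leading-zero point concrete, here is a minimal sketch of the rule applied to the integer part of a literal. It is not the code in core/lexer.cpp: `valid_integer_part` and `strip_separators` are hypothetical helpers, and the requirement that every underscore be followed by a digit is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical helper: true if `s` is an acceptable integer part.
// Assumption of this sketch: each '_' must be followed by a digit.
bool valid_integer_part(const std::string &s)
{
    if (s.empty() || !is_digit(s[0]))
        return false;
    // Nothing may follow a leading zero inside the integer part:
    // neither a digit ("05") nor a separator ("0_5").
    if (s[0] == '0')
        return s.size() == 1;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (is_digit(s[i]))
            continue;
        bool digit_follows = i + 1 < s.size() && is_digit(s[i + 1]);
        if (s[i] != '_' || !digit_follows)
            return false;
    }
    return true;
}

// Hypothetical helper: drops the separators from a validated integer part.
std::string strip_separators(const std::string &s)
{
    assert(valid_integer_part(s));  // only call this on validated input
    std::string digits;
    for (char c : s)
        if (c != '_')
            digits += c;
    return digits;
}
```

With these helpers, `1_000` validates and strips to `1000`, while `0_5` is rejected up front. Without that check, stripping the separator from `0_5` would leave `05`, which is exactly the leading-zero form that JSON forbids.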

johnbartholomew · 2026-01-21T16:53:27Z

core/lexer.cpp

+                    case '6':
+                    case '7':
+                    case '8':
+                    case '9': state = AFTER_ONE_TO_NINE; break;


Unfortunately, this is not correct. We need to maintain the distinction between state AFTER_ONE_TO_NINE (which is the state used in the integer part of the number) and state AFTER_DIGIT which is used after the decimal point. Having a single AFTER_UNDERSCORE state which transitions to AFTER_ONE_TO_NINE means that you allow an invalid repetition of the decimal point, e.g., this accepts 12.34_56.78_90.12 which should not be allowed.

I see you already handled this correctly for the exponent part, where you have a separate AFTER_EXP_UNDERSCORE state; you just need a similar AFTER_FRAC_UNDERSCORE state to correspond with AFTER_DIGIT.

(For what it's worth, the existing state names AFTER_ONE_TO_NINE and AFTER_DIGIT are really confusing!)
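A reduced sketch of the state machine shows why the separate fractional-part underscore state matters. The names AFTER_ONE_TO_NINE, AFTER_DIGIT and AFTER_FRAC_UNDERSCORE come from the comment above; everything else (the other state names, `number_prefix_length`, the error messages, leaving out exponents) is an assumption made to keep the example short, not the actual lexer.cpp code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

enum State {
    BEGIN,
    AFTER_ZERO,
    AFTER_ONE_TO_NINE,      // inside the integer part
    AFTER_INT_UNDERSCORE,   // separator seen in the integer part (hypothetical name)
    AFTER_DOT,
    AFTER_DIGIT,            // inside the fractional part
    AFTER_FRAC_UNDERSCORE,  // separator seen in the fractional part
};

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Returns how many leading characters of `src` form a single number token.
// Exponents are omitted from this sketch.
std::size_t number_prefix_length(const std::string &src)
{
    State state = BEGIN;
    std::size_t i = 0;
    for (; i < src.size(); ++i) {
        char c = src[i];
        switch (state) {
            case BEGIN:
                if (c == '0') state = AFTER_ZERO;
                else if (is_digit(c)) state = AFTER_ONE_TO_NINE;
                else throw std::runtime_error("not a number");
                break;
            case AFTER_ZERO:
                if (c == '.') state = AFTER_DOT;
                else if (c == '_') throw std::runtime_error("junk after leading zero");
                else return i;  // any other character ends the token in this sketch
                break;
            case AFTER_ONE_TO_NINE:
                if (is_digit(c)) state = AFTER_ONE_TO_NINE;
                else if (c == '_') state = AFTER_INT_UNDERSCORE;
                else if (c == '.') state = AFTER_DOT;
                else return i;
                break;
            case AFTER_INT_UNDERSCORE:
                if (is_digit(c)) state = AFTER_ONE_TO_NINE;
                else throw std::runtime_error("junk after '_'");
                break;
            case AFTER_DOT:
                if (is_digit(c)) state = AFTER_DIGIT;
                else throw std::runtime_error("junk after decimal point");
                break;
            case AFTER_DIGIT:
                if (is_digit(c)) state = AFTER_DIGIT;
                else if (c == '_') state = AFTER_FRAC_UNDERSCORE;
                else return i;  // a second '.' is not part of this number
                break;
            case AFTER_FRAC_UNDERSCORE:
                // Return to the fractional state, never to AFTER_ONE_TO_NINE,
                // so that another decimal point cannot be accepted.
                if (is_digit(c)) state = AFTER_DIGIT;
                else throw std::runtime_error("junk after '_'");
                break;
        }
    }
    assert(i == src.size());  // the loop only falls through at end of input
    if (state == AFTER_ZERO || state == AFTER_ONE_TO_NINE || state == AFTER_DIGIT)
        return i;
    throw std::runtime_error("incomplete number");
}
```

For the input `12.34_56.78_90.12` this stops after `12.34_56` (8 characters) instead of swallowing the later decimal points. If AFTER_FRAC_UNDERSCORE instead transitioned to AFTER_ONE_TO_NINE, the second `.` would be accepted as if the lexer were still in the integer part.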

johnbartholomew · 2026-01-21T18:04:18Z

Well, "reviewing this soon" turned into ignoring it for an entire year 😭

This is basically very good and I should have merged it a long time ago! There are a couple of errors in the state machine, I believe, which I have commented on. But on a positive note, there have been almost no other changes to lexer.cpp in the meantime, so this change still rebases cleanly.

On the basis that this has been sitting in the queue for far, far too long already, I will go ahead and rebase this myself, add a couple of fixes, and merge it.

…tion

There are some cases which are a little strange but lexically valid.

- `1.2.3.4` lexically this tokenises as `1.2` DOT `3.4`, because a dot in the fractional or exponent part of a number is simply treated the same as any other possible terminating character (any character that isn't part of the valid number lexical syntax)
- `1e2.34` lexically is `1e2` DOT `34` (same as the first case)
- `1e2e34` lexically is `1e2` (number) `e34` (identifier)

These behaviours are basically preserved/extrapolated in the case of digit separators, so for example `1_2.3_4.5_6` is lexically parsed as `12.34` DOT `56`. And `1e2_3e4` is lexically parsed as `1e23` (number), `e4` (identifier). These both look very confusing, but it probably doesn't matter because those token sequences are, I think, not valid syntactically so they'll just be rejected by the parser.

Note that in JSON (and jsonnet), leading zeros are not allowed in numeric literals. This behaviour is explicitly kept with digit separators, so `0_5` is explicitly rejected. The alternatives are:

- Treat underscore after an initial zero the same as any terminator character, so `0_5` lexes as tokens `0` followed by identifier `_5`.
- Allow underscore, thereby breaking the no-leading-zeros rule, so `0_5` tokenises as `05`.

Either option seems confusing, hence it seems better to explicitly reject an underscore after an initial zero.
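The tokenisations described in this commit message can be reproduced with a small toy tokenizer. This is only a sketch under stated assumptions: `tokenize`, `scan_number`, `scan_digits` and the `NUMBER`/`DOT`/`IDENT` string labels are invented for illustration, the tokenizer handles only numbers, dots and identifiers, and it assumes every underscore must be followed by a digit. It is written as plain scanning functions rather than the state machine used in lexer.cpp.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Consumes digits with optional single '_' separators between them,
// appending only the digits to `out`.
static void scan_digits(const std::string &s, std::size_t &pos, std::string &out)
{
    if (pos >= s.size() || !is_digit(s[pos]))
        throw std::runtime_error("expected a digit");
    while (pos < s.size()) {
        if (s[pos] == '_') {
            if (pos + 1 >= s.size() || !is_digit(s[pos + 1]))
                throw std::runtime_error("junk after '_'");
            ++pos;  // drop the separator from the token text
        } else if (!is_digit(s[pos])) {
            break;  // any other character terminates this run of digits
        }
        out += s[pos++];
    }
}

// Scans one number starting at `pos` and returns its text without separators.
static std::string scan_number(const std::string &s, std::size_t &pos)
{
    assert(pos < s.size() && is_digit(s[pos]));  // caller checked the first char
    std::string text;
    if (s[pos] == '0') {
        text += s[pos++];
        // "0_5" is rejected rather than read as "05" or as "0" then "_5".
        if (pos < s.size() && s[pos] == '_')
            throw std::runtime_error("junk after leading zero");
    } else {
        scan_digits(s, pos, text);
    }
    if (pos < s.size() && s[pos] == '.') {
        text += s[pos++];
        scan_digits(s, pos, text);
    }
    if (pos < s.size() && (s[pos] == 'e' || s[pos] == 'E')) {
        text += s[pos++];
        if (pos < s.size() && (s[pos] == '+' || s[pos] == '-'))
            text += s[pos++];
        scan_digits(s, pos, text);
    }
    return text;
}

// Splits `s` into labelled tokens such as "NUMBER 12.34", "DOT", "IDENT e4".
std::vector<std::string> tokenize(const std::string &s)
{
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < s.size()) {
        char c = s[pos];
        if (is_digit(c)) {
            tokens.push_back("NUMBER " + scan_number(s, pos));
        } else if (c == '.') {
            tokens.push_back("DOT");
            ++pos;
        } else if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            std::string id;
            while (pos < s.size() &&
                   (std::isalnum(static_cast<unsigned char>(s[pos])) || s[pos] == '_'))
                id += s[pos++];
            tokens.push_back("IDENT " + id);
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    return tokens;
}
```

In this sketch, `tokenize("1_2.3_4.5_6")` yields `NUMBER 12.34`, `DOT`, `NUMBER 56`, and `tokenize("1e2_3e4")` yields `NUMBER 1e23`, `IDENT e4`, while `tokenize("0_5")` throws. A parser sitting on top would then be responsible for rejecting the odd-looking token sequences.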

johnbartholomew · 2026-01-21T18:31:41Z

Oh, @seizethedave I see you actually have seen the review comments already!

Would you prefer to do the updates yourself, or would you prefer that I add fixes? I'm happy to do either.

seizethedave · 2026-01-21T19:59:34Z

> Well, "reviewing this soon" turned into ignoring it for an entire year 😭

Hey, I'm familiar with that operating style.

@johnbartholomew I will take you up on the offer to rebase/fix/submit, thank you very much! Good catches on the state machine.

seizethedave mentioned this pull request Jun 23, 2024

Add digit separators to Jsonnet google/go-jsonnet#760

Open

seizethedave marked this pull request as ready for review July 4, 2024 21:00

MrG9090 approved these changes Jul 13, 2024

MrG9090 approved these changes Oct 22, 2024

MrG9090 approved these changes Oct 29, 2024

MrG9090 approved these changes Nov 6, 2024

MrG9090 approved these changes Nov 14, 2024

MrG9090 approved these changes Jan 15, 2025

johnbartholomew self-assigned this Feb 23, 2025

MrG9090 approved these changes Mar 9, 2025

seizethedave added 10 commits January 21, 2026 16:34

Add some support for underscore separators.

b3befa2

Tests and fixes.

278fcbc

More tests.

214eb82

More tests.

052bacd

Simpler to not special-case consecutive _s.

5a549e7

Update docs and tutorial to include digit separators.

5702a24

newline

58cf8c1

Add test suite jsonnets for digit separators.

386e73a

parseInt->parseJson

37a7b79

Regenerate golden file.

7784da1

johnbartholomew reviewed Jan 21, 2026

johnbartholomew force-pushed the digitsep branch from b7aa8bd to 82ebe7d Compare January 21, 2026 20:19

johnbartholomew added 2 commits January 21, 2026 20:30

fix: jsonnetfmt the digit-separator test suite inputs

59b023e

doc: update docs from source

f78b553

johnbartholomew merged commit f78b553 into google:master Jan 21, 2026
9 checks passed

johnbartholomew mentioned this pull request Jan 21, 2026

Support underscores or other separators in numeric literals. #1155

Closed
