38 changes: 27 additions & 11 deletions src/Lua/CodeAnalysis/Syntax/Lexer.cs
@@ -79,6 +79,11 @@ public bool MoveNext()
{
case ' ':
case '\t':
while (offset < Source.Length && (Source.Span[offset] == ' ' || Source.Span[offset] == '\t'))
{
Advance(1);
}

return MoveNext();
case '\n':
current = SyntaxToken.EndOfLine(position);
@@ -110,16 +115,16 @@ public bool MoveNext()
current = SyntaxToken.Addition(position);
return true;
case '-':
// comment
if (c2 == '-')
// handle comments iteratively
while (offset < span.Length && c2 == '-')
{
var pos = position;
Advance(1);
Advance(1); // consume first '-'

// block comment
if (span.Length > offset + 1 && span[offset] is '[' && span[offset + 1] is '[' or '=')
if (span.Length > offset + 1 && span[offset] == '[' && (span[offset + 1] == '[' || span[offset + 1] == '='))
{
Advance(1);
Advance(1); // consume second '-'
var (_, _, isTerminated) = ReadUntilLongBracketEnd(ref span);
Comment on lines 121 to 128

Copilot AI Feb 24, 2026

The inline comments on the Advance(1) calls are misleading. At this point the first '-' has already been consumed before the switch, so Advance(1) here consumes the second '-', and the subsequent Advance(1) inside the block-comment branch consumes the '[' that starts the long-bracket delimiter. Please update these comments to reflect what is actually being consumed to avoid future regressions.
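For reference, the Lua reference manual defines an opening long bracket of level n as `[`, then n `=` signs, then `[`. A comment whose `--` is immediately followed by such a bracket is a long comment; any other comment is a short comment that runs to the end of the line. The condition in the diff appears to inspect only the two characters after `--`, so validating the `=` run is presumably left to ReadUntilLongBracketEnd, which is not shown here. A minimal sketch of recognizing the full delimiter, using a hypothetical helper LongBracketLevel that is not part of Lexer.cs:

```csharp
using System;

// Hypothetical helper (not in Lexer.cs): returns the level n of a Lua opening
// long bracket "[" + n × "=" + "[" that starts at index i, or -1 if the text
// at i is not an opening long bracket (i.e. the comment is a short comment).
static int LongBracketLevel(ReadOnlySpan<char> s, int i)
{
    if (i >= s.Length || s[i] != '[') return -1;
    int j = i + 1;
    while (j < s.Length && s[j] == '=') j++;          // count the '=' signs
    return (j < s.Length && s[j] == '[') ? j - i - 1 : -1;
}

Console.WriteLine(LongBracketLevel("[[ body ]]", 0));      // 0
Console.WriteLine(LongBracketLevel("[==[ body ]==]", 0));  // 2
Console.WriteLine(LongBracketLevel("[=x", 0));             // -1: "--[=x" is a short comment
```

When the helper returns a non-negative level n, the matching closing bracket is `]`, then n `=` signs, then `]`.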

Copilot uses AI. Check for mistakes.
if (!isTerminated)
{
@@ -131,13 +136,24 @@ public bool MoveNext()
ReadUntilEOL(ref span, ref offset, out _);
}

return MoveNext();
}
else
{
current = SyntaxToken.Subtraction(position);
return true;
// prepare for next iteration
if (offset < span.Length)
{
c2 = (offset + 1 < span.Length) ? span[offset + 1] : char.MinValue;
if (span[offset] != '-') break; // next char is not a comment, exit loop
}
Comment on lines 119 to 144

Copilot AI Feb 24, 2026

Within the while loop, position is captured once before entering the '-' case and is never recomputed after skipping a full comment line. If the loop consumes multiple comments (or reaches code after a comment), any diagnostics raised from later iterations will report the wrong start position. A safer approach is to avoid trying to process multiple logical tokens inside the same '-' switch arm; after skipping one comment, restart the main lexing loop so startOffset/position are recalculated for the next token.

else
{
break;
}
}

// after skipping comments, if we reached end, return false
if (offset >= span.Length)
return false;

current = SyntaxToken.Subtraction(position); // if single '-' remains
return true;
Comment on lines 118 to 156

Copilot AI Feb 24, 2026

This change is intended to prevent stack overflows on inputs with many -- comments, but the current test suite only checks a few small comment cases. Please add a regression test that feeds the lexer a large number of comment lines (and optionally comments followed by real code) to ensure MoveNext() can iterate through them without a stack overflow and still produces correct tokens afterward.

Comment on lines 118 to 156

Copilot AI Feb 24, 2026

The comment-skipping logic in the '-' case does not continue lexing after a comment unless the file ends. After skipping a line/block comment, this code falls through to current = SyntaxToken.Subtraction(position) without consuming the current character at offset, so inputs like "--comment\nprint(1)" will incorrectly emit a subtraction token (and can get stuck reprocessing the same character). Restructure this branch so that when a comment is recognized (c2 == '-'), it skips the comment and then restarts tokenization from the new offset (iteratively, not by emitting Subtraction).
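A minimal sketch of the suggested shape, assuming a simplified string-based lexer: each loop iteration records its own start offset, and whitespace and comments are skipped with `continue` rather than by recursing into MoveNext() or falling through to SyntaxToken.Subtraction. Lex and its (Kind, Start) tuples are hypothetical stand-ins, not the types in Lexer.cs, and newlines are simply skipped instead of being emitted as EndOfLine tokens. The usage part feeds it 100,000 comment lines followed by real code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical mini-lexer (not the Lexer from this PR) showing the loop shape:
// every token, whitespace character, or comment starts a fresh iteration.
static (List<(string Kind, int Start)> Tokens, List<(string Message, int Start)> Errors) Lex(string src)
{
    var tokens = new List<(string Kind, int Start)>();
    var errors = new List<(string Message, int Start)>();
    int i = 0;
    while (i < src.Length)
    {
        int start = i;                                   // recomputed on every iteration
        char c = src[i];
        if (char.IsWhiteSpace(c)) { i++; continue; }     // newlines dropped for brevity

        if (c == '-' && i + 1 < src.Length && src[i + 1] == '-')
        {
            i += 2;                                      // consume "--"
            // Long comment if "--" is followed by "[", n × "=", "[".
            int j = i + 1;
            while (j < src.Length && src[j] == '=') j++;
            if (i < src.Length && src[i] == '[' && j < src.Length && src[j] == '[')
            {
                string close = "]" + new string('=', j - i - 1) + "]";
                int end = src.IndexOf(close, j + 1, StringComparison.Ordinal);
                if (end < 0) { errors.Add(("unfinished long comment", start)); i = src.Length; }
                else i = end + close.Length;
            }
            else
            {
                while (i < src.Length && src[i] != '\n') i++;   // short comment: skip to end of line
            }
            continue;                                    // restart lexing; never emit Subtraction here
        }

        if (c == '-') { tokens.Add(("Subtraction", start)); i++; continue; }
        if (char.IsLetter(c) || c == '_')
        {
            while (i < src.Length && (char.IsLetterOrDigit(src[i]) || src[i] == '_')) i++;
            tokens.Add(("Name", start));
            continue;
        }
        if (char.IsDigit(c))
        {
            while (i < src.Length && char.IsDigit(src[i])) i++;
            tokens.Add(("Number", start));
            continue;
        }
        tokens.Add((c.ToString(), start));               // any other character as a one-char token
        i++;
    }
    return (tokens, errors);
}

// Stress input: many comment lines, then code; the loop uses no recursion at all.
var sb = new StringBuilder();
for (int n = 0; n < 100_000; n++) sb.Append("-- comment ").Append(n).Append('\n');
sb.Append("x = a - 1 --[==[ long ]==] print(x)");
string input = sb.ToString();

var (toks, errs) = Lex(input);
Console.WriteLine(string.Join(" ", toks.ConvertAll(t => t.Kind)));
// prints: Name = Name Subtraction Number Name ( Name )
Console.WriteLine(errs.Count);   // 0
```

Because `start` is taken at the top of each iteration, a diagnostic such as the unfinished long comment reports the offset of that comment's own `--`, not the offset where an earlier comment began.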

case '*':
current = SyntaxToken.Multiplication(position);
return true;