Disclaimer: I asked Copilot to write up this bug report, as it implemented the workaround.
Description:
When using the Gemini bridge with streaming enabled, RawHttpResult::getDataStream() fails with JsonException ("Control character error" or "Syntax error") because it assumes each HTTP chunk contains complete, parseable JSON.
In practice, the HTTP transport layer can split the response at arbitrary byte boundaries. The Gemini API returns responses as JSON arrays ([{...}, {...}]), and these can be split mid-object across multiple HTTP chunks.
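The failure mode can be shown without any network access. The following is a minimal sketch, assuming a made-up payload shaped like the array described above and an arbitrary 20-byte split (both are illustrative, not captured from Gemini):

```php
<?php
// Hypothetical payload standing in for a Gemini streaming response body.
$payload = '[{"candidates":[{"index":0}]},' . "\r\n" . '{"candidates":[{"index":1}]}]';

// Simulate a transport that cuts the body at arbitrary byte offsets.
$chunks = str_split($payload, 20);

$failures = 0;
foreach ($chunks as $chunk) {
    try {
        // Decoding a fragment on its own is what breaks.
        json_decode($chunk, true, flags: \JSON_THROW_ON_ERROR);
    } catch (\JsonException $e) {
        ++$failures;
    }
}

// Only the concatenation of all chunks is valid JSON.
$decoded = json_decode(implode('', $chunks), true, flags: \JSON_THROW_ON_ERROR);
```

Every fragment fails to decode in isolation, while the reassembled body decodes into two elements.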
Steps to Reproduce:
- Configure Gemini platform with streaming enabled
- Make a chat request that triggers a tool call (these tend to have larger responses with base64 signatures)
- The response is split across multiple HTTP chunks
Expected Behavior:
The stream should buffer incoming data and only yield complete JSON objects.
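One way to get this behavior is to track brace depth across chunks and decode an object as soon as it closes, rather than waiting for the whole array. The sketch below is not the library's code: `decodeJsonObjects()` is a hypothetical helper, and it assumes every top-level element of the stream is a JSON object, as in the `[{...}, {...}]` shape described above.

```php
<?php
/**
 * Hypothetical helper: yields each complete top-level JSON object found in
 * a sequence of arbitrarily split string chunks.
 *
 * @param iterable<string> $chunks
 */
function decodeJsonObjects(iterable $chunks): \Generator
{
    $buffer = '';      // bytes of the object currently being assembled
    $depth = 0;        // nesting depth of { } outside of strings
    $inString = false; // true while scanning inside a JSON string
    $escaped = false;  // true right after a backslash inside a string

    foreach ($chunks as $chunk) {
        $length = strlen($chunk);
        for ($i = 0; $i < $length; ++$i) {
            $char = $chunk[$i];

            // Between objects: skip "[", "]", "," and whitespace.
            if (0 === $depth && '{' !== $char) {
                continue;
            }

            $buffer .= $char;

            if ($inString) {
                if ($escaped) {
                    $escaped = false;
                } elseif ('\\' === $char) {
                    $escaped = true;
                } elseif ('"' === $char) {
                    $inString = false;
                }
                continue;
            }

            if ('"' === $char) {
                $inString = true;
            } elseif ('{' === $char) {
                ++$depth;
            } elseif ('}' === $char && 0 === --$depth) {
                // The object just closed, so the buffer is complete JSON.
                yield json_decode($buffer, true, flags: \JSON_THROW_ON_ERROR);
                $buffer = '';
            }
        }
    }
}

// Usage: chunks split mid-string, with braces inside a string value.
$chunks = ['[{"candidates":[{"text":"a}', '{b"}]},', "\r\n" . '{"candidates"', ':[]}]'];
$items = iterator_to_array(decodeJsonObjects($chunks), false);
```

Because braces inside string values are ignored, a split in the middle of a base64 signature or other string does not end an object early, and each element is yielded as soon as its closing brace arrives.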
Actual Behavior:
The stream attempts to parse each HTTP chunk as complete JSON, throwing JsonException on partial data:
JsonException: Control character error, possibly incorrectly encoded
Debug Output:
The raw chunks received show the split:
- Chunk 1: [{"candidates": [...partial JSON with base64 signature...
- Chunk 2: ...base64 continuation...
- Chunk 3: ...more base64...
- Chunk 4: ]
Root Cause:
In RawHttpResult.php:
public function getDataStream(): iterable
{
    foreach ((new EventSourceHttpClient())->stream($this->response) as $chunk) {
        // ...
        $jsonDelta = $chunk instanceof ServerSentEvent ? $chunk->getData() : $chunk->getContent();
        // This assumes $jsonDelta is complete JSON, but HTTP chunks can split anywhere
        $deltas = explode(",\r\n", $jsonDelta);
        foreach ($deltas as $delta) {
            yield json_decode($delta, true, flags: \JSON_THROW_ON_ERROR); // Fails on partial JSON
        }
    }
}
Suggested Fix:
Buffer the incoming data and only parse when a complete JSON array/object is detected:
public function getDataStream(): iterable
{
    $buffer = '';
    foreach ((new EventSourceHttpClient())->stream($this->response) as $chunk) {
        if ($chunk->isFirst() || $chunk->isLast()) {
            continue;
        }
        if ($chunk instanceof ServerSentEvent && '[DONE]' === $chunk->getData()) {
            continue;
        }
        $buffer .= $chunk instanceof ServerSentEvent ? $chunk->getData() : $chunk->getContent();
        // Try to parse when we likely have complete JSON (ends with ] for arrays)
        $trimmed = trim($buffer);
        if (str_starts_with($trimmed, '[') && str_ends_with($trimmed, ']')) {
            $decoded = json_decode($trimmed, true);
            if (json_last_error() === JSON_ERROR_NONE) {
                foreach ($decoded as $item) {
                    yield $item;
                }
                $buffer = '';
            }
        }
    }
}
Workaround:
We implemented a custom ResultConverter that uses our own stream parser with buffering, bypassing RawHttpResult::getDataStream() entirely.
Environment:
- symfony/ai-platform version: dev-main (commit d7c8e4c9d3ebca8a670e06223067a8b58b5cb91d)
- PHP version: 8.4.15
- Gemini model: gemini-2.5-flash