Refactor multiple flows to enhance performance, boost scalability, and ensure stability. by luuquangvu · Pull Request #131 · Nativu5/Gemini-FastAPI

luuquangvu · 2026-03-21T04:10:07Z

This PR is still a work in progress and uses features that aren't yet officially available in the Gemini-API library, so we'll need to wait for the library's official update before merging. Feel free to try it out and share any feedback or report any issues you encounter. Thanks!

Here are some highlights of the changes:

The entire logic for storing conversation history has been rewritten, aiming for compatibility with various endpoints and easy scalability in the future.
The logic of the endpoints has been rewritten, and now all endpoints work correctly with both streaming and non-streaming flows.
Compatible with the latest library updates, including the ability to download full-size images and enable video or music generation.
All cookie-related errors will be fully resolved, and users will get a clear notification if the server invalidates cookies, making it simple to know when to refresh them manually.

…to Python literals - Added `ast.literal_eval` as a fallback for parsing environment variables when JSON decoding fails. - Improved error handling and logging for invalid configurations. - Ensured proper cleanup of environment variables post-parsing.
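The JSON-first parsing with an `ast.literal_eval` fallback described in this fragment could look roughly like the sketch below. The function name and the `model_name` key in the usage note are hypothetical and not taken from the PR; only `json.loads` and `ast.literal_eval` are real standard-library calls.

```python
import ast
import json
from typing import Any


def parse_env_value(raw: str) -> Any:
    """Parse a config string as JSON, falling back to a Python literal."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass  # not valid JSON, try a Python literal next
    try:
        # literal_eval only accepts literals (dict, list, str, numbers, ...),
        # so it does not execute arbitrary code.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"Invalid configuration value: {raw!r}") from exc
```

With this sketch, both `'[{"model_name": "a"}]'` (JSON) and `"[{'model_name': 'a'}]"` (Python literal with single quotes) parse to the same list, while malformed input raises a `ValueError` that the caller can log.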

… text file attachment - When multiple chunks are sent simultaneously, Google will immediately invalidate the access token and reject the request - When a prompt contains a structured format like JSON, splitting it can break the format and may cause the model to misunderstand the context - Another minor tweak as Copilot suggested

…ith reusable sessions. - Ensure that PR HanaokaYuzu/Gemini-API#220 is merged before proceeding with this PR. - Introducing a new feature for real-time streaming responses. - Fully resolve the problem with reusable sessions. - Break down similar flow logic into helper functions. - All endpoints now support inline Markdown images. - Switch large prompts to use BytesIO to avoid reading and writing to disk.

…ith reusable sessions. - Ensure that PR HanaokaYuzu/Gemini-API#220 is merged before proceeding with this PR. - Introducing a new feature for real-time streaming responses. - Fully resolve the problem with reusable sessions. - Break down similar flow logic into helper functions. - All endpoints now support inline Markdown images. - Switch large prompts to use BytesIO to avoid reading and writing to disk. - Remove duplicate images when saving and responding.
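Two of the notes above (sending a large prompt as a single text attachment rather than as multiple chunks, and using BytesIO to avoid disk reads and writes) can be illustrated with a minimal sketch. The function name, the `MAX_INLINE_CHARS` threshold, and the `message.txt` filename are assumptions made for illustration, not values from the PR, and the hand-off to the Gemini-API client is left out because its interface is not described here.

```python
import io

MAX_INLINE_CHARS = 100_000  # hypothetical threshold, not from the PR


def prepare_prompt(prompt: str, filename: str = "message.txt"):
    """Return (text, attachment); a large prompt becomes one in-memory file."""
    if len(prompt) <= MAX_INLINE_CHARS:
        return prompt, None
    # Keep the whole prompt in one buffer so structured content such as JSON
    # is never split across chunks, and nothing is written to disk.
    buffer = io.BytesIO(prompt.encode("utf-8"))
    buffer.name = filename  # plain attribute; some upload helpers look for it
    return "", buffer
```

Keeping the prompt in a single buffer matches the reasoning given in the fragment above: one attachment means one request, and the structured format stays intact.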

Vigno04 · 2026-04-14T10:19:12Z

I think one improvement you could make is stripping unnecessary tokens before sending messages to the chat.

For example, when using Open WebUI, messages look like this when viewed in the Gemini web UI:

<|im_start|>user

help me evaluate ...

<|im_end|>

<|im_start|>assistant

As you can see, the message includes all the special tags. However, Gemini already adds these tags again on the backend, since the message is sent as part of a web UI chat request.

Removing them would have a few benefits:

Based on tests with tiktoken (using Gemma), it saves roughly 30 tokens per request
It may improve model performance by avoiding duplicated start/end tokens
It could make the traffic less detectable by Google, as the message would resemble a more standard chat format

Overall, stripping these redundant tokens seems like a simple optimization with multiple advantages. I am writing it here because opening a pull request for such a small feature seems a bit pointless, and I also wanted a second opinion on the matter.
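For reference, the stripping proposed here could be sketched as follows. This is only an illustration of the suggestion, not something the PR does; the regex and function name are hypothetical. Note that it discards the role names, so a multi-turn transcript would lose its speaker boundaries.

```python
import re

# Matches "<|im_start|>role" headers and "<|im_end|>" terminators,
# including one trailing newline if present.
CHATML_TAG_RE = re.compile(
    r"<\|im_start\|>[ \t]*\w*[ \t]*\n?|<\|im_end\|>[ \t]*\n?"
)


def strip_chatml(text: str) -> str:
    """Remove ChatML wrapper tags and collapse leftover blank lines."""
    cleaned = CHATML_TAG_RE.sub("", text)
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip()
```

Applied to the sample above, this leaves only the user's text, which is the saving the comment describes.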

luuquangvu · 2026-04-14T11:36:51Z

@Vigno04 Thank you for your feedback. For why the ChatML tags and the seemingly unnecessary system hints are needed, see previous issues such as #59. Removing them might save a few tokens, but it won't work with some clients that require tool calling.

luuquangvu added 30 commits December 31, 2025 13:46

refactor: Simplify Gemini model environment variable parsing with JSO…

61c5f3b

…N support - Replaced prefix-based parsing with a root key approach. - Added JSON parsing to handle list-based model configurations. - Improved handling of errors and cleanup of environment variables.

fix: Improve regex patterns in helper module

476b9dd

- Adjusted `TOOL_CALL_RE` regex pattern for better accuracy.

docs: Update README files to include custom model configuration and e…

35c1e99

…nvironment variable setup

fix: Remove unused headers from HTTP client in helper module

9b81621

fix: Update README and README.zh to clarify model configuration via e…

32a48dc

…nvironment variables; enhance error logging in config validation

Update README and README.zh to clarify model configuration via JSON s…

0c00b08

…tring or list structure for enhanced flexibility in automated environments

Merge branch 'Nativu5:main' into main

e2233f4

Refactor: compress JSON content to save tokens and streamline sending…

b599d99

… multiple chunks

Refactor: Modify the LMDB store to fix issues where no conversation i…

186b844

…s found in either the raw or cleaned history.

Refactor: Modify the LMDB store to fix issues where no conversation i…

6dd1fec

…s found.

Refactor: Update all functions to use orjson for better performance

20ed245

Update project dependencies

f67fe63

Fix IDE warnings

889f2d2

Incorrect IDE warnings

66b6202

Refactor: Modify the LMDB store to fix issues where no conversation i…

3297f53

…s found.

Refactor: Centralized the mapping of the 'developer' role to 'system'…

5399b26

… for better Gemini compatibility.

Refactor: Modify the LMDB store to fix issues where no conversation i…

de01c78

…s found.

Refactor: Modify the LMDB store to fix issues where no conversation i…

1964147

…s found.

Refactor: Modify the LMDB store to fix issues where no conversation i…

8c5c749

…s found.

Refactor: Avoid reusing an existing chat session if its idle time exc…

ce67d66

…eeds METADATA_TTL_MINUTES.

Refactor: Update the LMDB store to resolve issues preventing conversa…

3d32d12

…tion from being saved

Refactor: Update the _prepare_messages_for_model helper to omit the s…

2eb9f05

…ystem instruction when reusing a session to save tokens.

Enable streaming responses and fully resolve the problem with reusabl…

bdd893f

…e sessions. - Ensure that PR HanaokaYuzu/Gemini-API#220 is merged before proceeding with this PR.

Merge branch 'Nativu5:main' into main

a51f75c

Merge branch 'main' of https://github.com/luuquangvu/Gemini-FastAPI

767f0b3

Update the account quotas logic to make it more display-friendly

df512c3

luuquangvu added 18 commits April 14, 2026 19:52

Add a background task to send HTTP/2 PING frames

506c78f

Explicitly use HTTP/2 and include SSRF protection

90aefdb

Explicitly use HTTP/2 and include SSRF protection

1156d22

Include the impersonate parameter

e9c7ca7

Optimize the codebase by applying Sourcery suggestions

Periodically check for dead clients in the pool and revive them based…

d6aef65

… on exponential backoff

Explicitly use HTTP/3 with fallback and update libraries

b8cd546

Experiment with resolving the issue of cookies becoming inactive afte…

3230a81

…r a certain period of time

Experiment with resolving the issue of cookies becoming inactive afte…

bf765bc

…r a certain period of time

Update health_check

2b605b4

Experiment with resolving the issue of cookies becoming inactive afte…

7481322

…r a certain period of time

Update README to avoid using chrome based

26f5ad4

Workaround for new device-bound session mechanism

9d56d33

Stop background tasks when the account status is not available

a2f1ebd

Add support for the new Flash Lite model

539c398

Tagged arguments preserve JSON-compatible types

228ca5e

Implement new extended thinking level

d7ca588

Implement new extended thinking level

cafd046

Update to fix new extended thinking mode

ad11981

luuquangvu mentioned this pull request May 21, 2026

Can 3.5 Flash be used? #149

Open

luuquangvu added 8 commits May 21, 2026 23:12

Increase the close_delay to allow cookies to refresh before going idle

90fb370

Include Pyright in the type-checking pipeline and small improvements

04abb9b

Implement a new helper to get compute usage based info

1abc647

Update helper to get compute-based usage limits info

103606c

Cache the available model list to prevent it from becoming empty afte…

c0b59d9

…r being idle

Apply strict rules for a structured JSON format

c448cfb

Fix leaked structured JSON format when using streaming mode

3816709

Improve performance by switching from a stateless to a stateful parse…

a1d9c53

…r to prevent CPU spikes when handling large output frames

luuquangvu wants to merge 288 commits into Nativu5:main from luuquangvu:main
