
Refactor multiple flows to enhance performance, boost scalability, and ensure stability. #131

Draft
luuquangvu wants to merge 288 commits into Nativu5:main from luuquangvu:main

Conversation

@luuquangvu
Collaborator

This PR is still a work in progress and uses features that aren't yet officially available in the Gemini-API library, so we'll need to wait for the library's official update before merging. Feel free to try it out and share any feedback or report any issues you encounter. Thanks!

Here are some highlights of the changes:

  • The entire logic for storing conversation history has been rewritten, aiming for compatibility with various endpoints and easy scalability in the future.
  • The logic of the endpoints has been rewritten, and now all endpoints work correctly with both streaming and non-streaming flows.
  • Compatible with the latest library updates, including the ability to download full-size images and enable video or music generation.
  • All cookie-related errors are resolved, and users get a clear notification when the server invalidates cookies, so they know when to refresh them manually.

…N support

- Replaced prefix-based parsing with a root key approach.
- Added JSON parsing to handle list-based model configurations.
- Improved handling of errors and cleanup of environment variables.
…to Python literals

- Added `ast.literal_eval` as a fallback for parsing environment variables when JSON decoding fails.
- Improved error handling and logging for invalid configurations.
- Ensured proper cleanup of environment variables post-parsing.
- Adjusted `TOOL_CALL_RE` regex pattern for better accuracy.
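
The JSON-first parsing with an `ast.literal_eval` fallback described in these commit notes can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `parse_env_value` and the sample values are hypothetical.

```python
import ast
import json
from typing import Any


def parse_env_value(raw: str) -> Any:
    """Parse an environment-variable string (hypothetical helper).

    Tries strict JSON first, then falls back to a Python literal so that
    values written with single quotes are still accepted.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        # literal_eval only evaluates literals (lists, dicts, strings,
        # numbers, ...); it does not execute arbitrary expressions.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"Cannot parse configuration value: {raw!r}") from exc
```

With this sketch, both `'[{"model_name": "a"}]'` (valid JSON) and `"[{'model_name': 'a'}]"` (a Python literal) parse to the same list of dictionaries, while a value that is neither raises `ValueError`.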
…nvironment variables; enhance error logging in config validation
…tring or list structure for enhanced flexibility in automated environments
…s found in either the raw or cleaned history.
…ystem instruction when reusing a session to save tokens.
… text file attachment

- When multiple chunks are sent simultaneously, Google will immediately invalidate the access token and reject the request
- When a prompt contains a structured format like JSON, splitting it can break the format and may cause the model to misunderstand the context
- Another minor tweak as Copilot suggested
…e sessions.

- Ensure that PR HanaokaYuzu/Gemini-API#220 is merged before proceeding with this PR.
…ith reusable sessions.

- Ensure that PR HanaokaYuzu/Gemini-API#220 is merged before proceeding with this PR.
- Introducing a new feature for real-time streaming responses.
- Fully resolve the problem with reusable sessions.
- Break down similar flow logic into helper functions.
- All endpoints now support inline Markdown images.
- Switch large prompts to use BytesIO to avoid reading and writing to disk.
- Remove duplicate images when saving and responding.
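
The switch to `BytesIO` for large prompts mentioned in this list could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name `prompt_to_file` and the default file name `message.txt` are hypothetical and are not taken from this PR or from the Gemini-API library.

```python
import io


def prompt_to_file(prompt: str, name: str = "message.txt") -> io.BytesIO:
    """Wrap a large prompt in an in-memory file object (hypothetical helper).

    Nothing is written to disk; the buffer can be handed to any upload
    routine that accepts a binary file-like object.
    """
    buffer = io.BytesIO(prompt.encode("utf-8"))
    # Some upload helpers read a file name from the .name attribute.
    buffer.name = name
    return buffer
```

Because the buffer lives in memory, there is no temporary file to create or clean up afterwards.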
@Vigno04
Contributor

Vigno04 commented Apr 14, 2026

I think one improvement you could make is stripping unnecessary tokens before sending messages to the chat.

For example, when using Open WebUI, I see messages formatted like this when I look at them in the Gemini web UI:

 <|im_start|>user

help me evaluate ...

<|im_end|>

<|im_start|>assistant 

As you can see, it includes all the special tags. However, Gemini already reintroduces these tags on the backend, since the message is sent as part of a web UI chat request.

Removing them would have a few benefits:

  • Based on tests with tiktoken (using Gemma), it saves about 30 tokens per request
  • It may improve model performance by avoiding duplicated start/end tokens
  • It could make the traffic less detectable by Google, as the message would resemble a more standard chat format

Overall, stripping these redundant tokens seems like a simple optimization with multiple advantages. I'm writing it here because opening a pull request for such a small feature seems a bit pointless, and I also wanted a second opinion on the matter.
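
The stripping suggested here could be sketched as below. This is a minimal sketch, not code from the PR: the name `strip_chatml` and the list of role names are assumptions, and it only handles the `<|im_start|>role ... <|im_end|>` layout shown in the sample above.

```python
import re

# Matches "<|im_start|>" plus an optional role name and trailing whitespace,
# or "<|im_end|>" plus trailing whitespace. The role list is an assumption.
CHATML_TAG_RE = re.compile(
    r"<\|im_start\|>(?:system|user|assistant|tool)?\s*|<\|im_end\|>\s*"
)


def strip_chatml(text: str) -> str:
    """Remove ChatML wrapper tags and return the bare message text."""
    return CHATML_TAG_RE.sub("", text).strip()
```

Applied to the sample message above, this leaves only the user's text. It would have to stay optional, since dropping the tags also drops the role boundaries that some clients may depend on.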

@luuquangvu
Collaborator Author

@Vigno04 Thank you for your feedback. For why we need to add ChatML tags and seemingly unnecessary system hints, you can refer back to previous issues such as #59. Removing them might save a few tokens, but it would break some clients that require tool calling.
