Disk-index-rag#16
Open
THENIROCK wants to merge 23 commits into
Open
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a Tauri+React UI shell with a WebSocket bridge backend while preserving the existing overlay event contract, and update the Qt overlay visuals plus typing/runtime fixes needed for stable startup. Made-with: Cursor
Brings in the new Tauri+React voice UI (Ali/ui-app/, ui/web_overlay.py, wake word demos) while keeping the disk access / file resolution work on this branch. Only conflict was Ali/requirements.txt, where PyQt6 (island) and PySide6+websockets (new overlay) deps are both kept. Made-with: Cursor
… fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The visual planner now emits a single new action type — browser_task —
and the orchestrator dispatches it to a vendored Chrome-extension sub-agent
via MCP stdio (executors/browser/agent_client.py). The sub-agent owns the
full agent loop on the user's real Chrome (navigation, DOM reading, tool
calls, confirmation gate); Cactus picks the task description once per voice
command rather than per step.
Changes in Ali/:
executors/browser/
agent/ NEW — vendored Hanzi Browse extension + MCP
server (stripped of cloud/license/telemetry
surface). Chrome extension runs against your
real signed-in Chrome via chrome.debugger.
agent_client.py NEW — Python MCP-stdio client. Same surface
every executor uses: run_task / get_task /
send_message / cancel / poll_until_...
browser.py DELETED — replaced by the sub-agent.
adapters/ DELETED — site-specific YC Apply Playwright
flows. The sub-agent handles these directly
from a natural-language task string now.
scripts/cactus_server.py NEW — FastAPI sidecar wrapping Cactus's
Python SDK (bare ctypes → HTTP). Lets the
extension talk to on-device Gemma 4 via
provider='cactus' in chrome.storage.local;
default is Google AI Studio via
provider='google'.
orchestrator/visual_planner.py
Simplified ALLOWED_ACTION_TYPES to browser_task
+ ask_user + complete + abort. Updated prompt
+ fallback path to emit browser_task with a
complete natural-language task description
(plus ${resume}/${contact_X} slot placeholders
that the orchestrator resolves before sending).
orchestrator/orchestrator.py
_execute_action now dispatches browser_task to
the LocalAgentClient sub-agent. Resolves slot
placeholders locally so sensitive paths/names
never reach whatever LLM drafts the task
string. Relays awaiting_confirmation back to
the user and sends their yes/no reply through.
tests/test_core_contracts.py
Updated fallback test to assert the new
browser_task delegation.
requirements.txt Drops playwright; adds fastapi, uvicorn, mcp
(Python MCP client).
config/settings.py Adds CACTUS_VL_MODEL / CACTUS_SIDECAR_URL /
AGENT_NODE_BIN.
.gitignore Ignores node_modules and the built TS dist
under the vendored agent tree.
docs/cactus-findings.md NEW — one-day-of-hacking write-up: dylib path
fix, actual Python SDK signatures, measured
gemma-4-E2B prefill rates on M1 Pro, the
read_page DOM-size cliff that pushed the
default browser LLM to Gemini 2.5 Flash, and
where Cactus actually fits in this product.
Tests: 7/7 Ali contracts pass. Smoke-verified LocalAgentClient spawns the
Node MCP server, connects to the relay, and the Chrome extension registers
as expected.
Adds LocalAgentClient (MCP-stdio) + full browser sub-agent extension. Resolved conflict by keeping Korin's run_script/author_script/compose_mail handlers alongside Hanzi's browser_task delegation path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ali: swap Playwright for Chrome-extension browser sub-agent
User-visible changes: - Remove the "Powered by Hanzi Browse" overlay badge that used to appear in the top-right of every page the agent drove. The content script still renders the pulsing glow + stop button; only the branded badge is gone. - Rename [Hanzi Browse] DevTools console-log prefixes to [browser-agent]. - Page titles: Hanzi Browse → Voice Agent — Browser (sidepanel, sidepanel-preact), Welcome to Hanzi Browse → Welcome (onboarding). - MCP server startup log: "Starting Hanzi Browse MCP Server v2.0..." → "Starting browser sub-agent MCP server v2.0..." - Doc headers in service-worker, sidepanel.js, server/index.ts, cli.ts, cli/exit-codes.ts all dropped the brand. Left alone on purpose (changing these breaks runtime or costs nothing in demo optics): - com.hanzi_browse.oauth_host native-messaging host ID (bundle ID) - HANZI_BROWSE_* env var names (compat with existing env) - ~/.hanzi-browse session dir (compat with existing session files) - Chrome Web Store EXTENSION_URL fallback - integration.test.ts references (tests we don't run) - UPSTREAM.md / cactus-findings.md factual origin references Rebuilt server/dist/* accordingly.
Improve wake-word reliability and intent normalization, add conversational response and local URL open paths, and harden file resolution fallbacks so common commands avoid vision prompt loops. Made-with: Cursor
Add a laptop-wide disk index to enable local-first retrieval of user files, enhancing the ability to answer knowledge-based questions. Introduce new settings for index configuration and scanning, and update intent handling to support the new ASK_KNOWLEDGE goal. Implement background indexing and provide a menu option for users to rebuild the index. Update documentation to reflect these changes.
- Enhance `.gitignore` to exclude additional local cache and index files. - Modify `preflight.py` to improve startup checks, including a new diagnostic print statement and a more efficient module availability check. - Update SQL schema to include a new `vector` column for embeddings and a partial index for unembedded chunks. - Increment schema version to 2 to reflect changes in the database structure. - Refine file resolution logic to filter out synthetic data-source hits from the candidate list. - Expand README with detailed instructions on indexing options and configurations. - Improve menu bar functionality to better handle index rebuilding and status updates. - Enhance error handling and logging in various components for better diagnostics.
…d SpeechRecognition dependency for wake word functionality. Implemented backtick and Right Option key callbacks for improved command activation. Updated sentence-transformers version constraint to avoid compatibility issues with macOS.
- Introduced regex patterns to strip ANSI escape sequences and CLI end markers from the raw output. - Updated the `_clean_reply` function to improve the processing of model responses, ensuring only the relevant answer is retained. - Added detailed docstring to clarify the cleaning process and the structure of the output.
- Enhanced the command activation process by refining the handling of backtick and Right Option key inputs. - Improved wake word detection reliability to ensure more accurate activation of voice commands. - Updated documentation to reflect changes in command handling and wake word functionality.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
some rag fixes and disk access ui fixes