
Disk-index-rag#16

Open
THENIROCK wants to merge 23 commits into cactus-compute:main from Watcher1223:disk-index-rag

@THENIROCK

Some RAG fixes and disk-access UI fixes.

Watcher1223 and others added 23 commits April 18, 2026 13:43
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a Tauri+React UI shell with a WebSocket bridge backend while preserving the existing overlay event contract, and update the Qt overlay visuals plus typing/runtime fixes needed for stable startup.

Made-with: Cursor
Brings in the new Tauri+React voice UI (Ali/ui-app/, ui/web_overlay.py,
wake word demos) while keeping the disk access / file resolution work on
this branch. Only conflict was Ali/requirements.txt, where PyQt6 (island)
and PySide6+websockets (new overlay) deps are both kept.

Made-with: Cursor
… fixes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The visual planner now emits a single new action type — browser_task —
and the orchestrator dispatches it to a vendored Chrome-extension sub-agent
via MCP stdio (executors/browser/agent_client.py). The sub-agent owns the
full agent loop on the user's real Chrome (navigation, DOM reading, tool
calls, confirmation gate); Cactus picks the task description once per voice
command rather than per step.
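A minimal sketch of the dispatch described above, for illustration only. It assumes a plan action is a plain dict with a `type` key, and it uses a hypothetical `FakeSubAgentClient` in place of the real `LocalAgentClient`, whose actual method signatures are not shown in this PR (only the method name `run_task` appears in the listing below). The four action type names come from the `visual_planner.py` notes further down.

```python
from dataclasses import dataclass, field

# Action types the simplified visual planner may emit (names from this PR).
ALLOWED_ACTION_TYPES = {"browser_task", "ask_user", "complete", "abort"}


@dataclass
class FakeSubAgentClient:
    """Hypothetical stand-in for the MCP-stdio sub-agent client; it only records tasks."""
    sent: list = field(default_factory=list)

    def run_task(self, task: str) -> str:
        self.sent.append(task)
        return f"task-{len(self.sent)}"  # hypothetical task id


def execute_action(action: dict, client: FakeSubAgentClient) -> str:
    """Route one planner action; a browser_task is handed off once per voice command."""
    kind = action.get("type")
    if kind not in ALLOWED_ACTION_TYPES:
        raise ValueError(f"unsupported action type: {kind!r}")
    if kind == "browser_task":
        # The whole natural-language task goes to the sub-agent in one call;
        # the sub-agent owns navigation, DOM reading, and per-step tool calls.
        return client.run_task(action["task"])
    if kind == "ask_user":
        return f"ASK: {action['question']}"
    return kind  # "complete" or "abort" ends the command
```

Rejecting unknown types up front keeps a planner hallucination (say, a leftover per-step `click` action) from silently doing nothing.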

Changes in Ali/:

  executors/browser/
    agent/                    NEW — vendored Hanzi Browse extension + MCP
                              server (stripped of cloud/license/telemetry
                              surface). Chrome extension runs against your
                              real signed-in Chrome via chrome.debugger.
    agent_client.py           NEW — Python MCP-stdio client. Same surface
                              every executor uses: run_task / get_task /
                              send_message / cancel / poll_until_...
    browser.py                DELETED — replaced by the sub-agent.
    adapters/                 DELETED — site-specific YC Apply Playwright
                              flows. The sub-agent handles these directly
                              from a natural-language task string now.

  scripts/cactus_server.py    NEW — FastAPI sidecar wrapping Cactus's
                              Python SDK (bare ctypes → HTTP). Lets the
                              extension talk to on-device Gemma 4 via
                              provider='cactus' in chrome.storage.local;
                              default is Google AI Studio via
                              provider='google'.

  orchestrator/visual_planner.py
                              Simplified ALLOWED_ACTION_TYPES to browser_task
                              + ask_user + complete + abort. Updated prompt
                              + fallback path to emit browser_task with a
                              complete natural-language task description
                              (plus ${resume}/${contact_X} slot placeholders
                              that the orchestrator resolves before sending).
  orchestrator/orchestrator.py
                              _execute_action now dispatches browser_task to
                              the LocalAgentClient sub-agent. Resolves slot
                              placeholders locally so sensitive paths/names
                              never reach whatever LLM drafts the task
                              string. Relays awaiting_confirmation back to
                              the user and sends their yes/no reply through.
  tests/test_core_contracts.py
                              Updated fallback test to assert the new
                              browser_task delegation.

  requirements.txt            Drops playwright; adds fastapi, uvicorn, mcp
                              (Python MCP client).
  config/settings.py          Adds CACTUS_VL_MODEL / CACTUS_SIDECAR_URL /
                              AGENT_NODE_BIN.
  .gitignore                  Ignores node_modules and the built TS dist
                              under the vendored agent tree.

  docs/cactus-findings.md     NEW — one-day-of-hacking write-up: dylib path
                              fix, actual Python SDK signatures, measured
                              gemma-4-E2B prefill rates on M1 Pro, the
                              read_page DOM-size cliff that pushed the
                              default browser LLM to Gemini 2.5 Flash, and
                              where Cactus actually fits in this product.
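The `${resume}` / `${contact_X}` placeholders happen to match the syntax of Python's standard `string.Template`, so one plausible way to resolve them locally is sketched below. This is an assumption, not the orchestrator's actual code; `resolve_slots` and the slot values are hypothetical.

```python
from string import Template


def resolve_slots(task: str, slots: dict[str, str]) -> str:
    """Fill ${name} placeholders locally, right before the task goes to the sub-agent.

    Template.substitute raises KeyError for any placeholder missing from `slots`,
    so an unresolved ${...} can never leak through. A literal dollar sign in the
    task must be written as $$ (a bare "$5" raises ValueError).
    """
    return Template(task).substitute(slots)
```

Because the drafting LLM only ever sees the placeholder names, the real file path and contact name exist only in the locally resolved string.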

Tests: 7/7 Ali contracts pass. Smoke-verified LocalAgentClient spawns the
Node MCP server, connects to the relay, and the Chrome extension registers
as expected.
Adds LocalAgentClient (MCP-stdio) + full browser sub-agent extension.
Resolved conflict by keeping Korin's run_script/author_script/compose_mail
handlers alongside Hanzi's browser_task delegation path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ali: swap Playwright for Chrome-extension browser sub-agent
User-visible changes:
- Remove the "Powered by Hanzi Browse" overlay badge that used to appear
  in the top-right of every page the agent drove. The content script still
  renders the pulsing glow + stop button; only the branded badge is gone.
- Rename [Hanzi Browse] DevTools console-log prefixes to [browser-agent].
- Page titles: Hanzi Browse → Voice Agent — Browser (sidepanel, sidepanel-preact),
  Welcome to Hanzi Browse → Welcome (onboarding).
- MCP server startup log: "Starting Hanzi Browse MCP Server v2.0..." →
  "Starting browser sub-agent MCP server v2.0..."
- Doc headers in service-worker, sidepanel.js, server/index.ts, cli.ts,
  cli/exit-codes.ts all dropped the brand.

Left alone on purpose (changing these would break the runtime, or leaving
them costs nothing in demo optics):
  - com.hanzi_browse.oauth_host native-messaging host ID (bundle ID)
  - HANZI_BROWSE_* env var names (compat with existing env)
  - ~/.hanzi-browse session dir (compat with existing session files)
  - Chrome Web Store EXTENSION_URL fallback
  - integration.test.ts references (tests we don't run)
  - UPSTREAM.md / cactus-findings.md factual origin references

Rebuilt server/dist/* accordingly.
Improve wake-word reliability and intent normalization, add conversational-response and local-URL-open paths, and harden file-resolution fallbacks so common commands avoid vision prompt loops.

Made-with: Cursor
Add a laptop-wide disk index to enable local-first retrieval of user files, enhancing the ability to answer knowledge-based questions. Introduce new settings for index configuration and scanning, and update intent handling to support the new ASK_KNOWLEDGE goal. Implement background indexing and provide a menu option for users to rebuild the index. Update documentation to reflect these changes.
- Enhance `.gitignore` to exclude additional local cache and index files.
- Modify `preflight.py` to improve startup checks, including a new diagnostic print statement and a more efficient module availability check.
- Update SQL schema to include a new `vector` column for embeddings and a partial index for unembedded chunks.
- Increment schema version to 2 to reflect changes in the database structure.
- Refine file resolution logic to filter out synthetic data-source hits from the candidate list.
- Expand README with detailed instructions on indexing options and configurations.
- Improve menu bar functionality to better handle index rebuilding and status updates.
- Enhance error handling and logging in various components for better diagnostics.
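The schema bullets above (a nullable `vector` column, a partial index over unembedded chunks, and a version bump to 2) can be sketched as follows. This assumes SQLite, a `chunks(id, path, text)` table, `PRAGMA user_version` for versioning, and float32 vectors stored as a BLOB; all table, column, and function names other than `vector` are hypothetical.

```python
import sqlite3
from array import array

SCHEMA_VERSION = 2  # v2 adds the `vector` column and the unembedded-chunk index


def migrate(conn: sqlite3.Connection) -> None:
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version < 1:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            " id INTEGER PRIMARY KEY,"
            " path TEXT NOT NULL,"
            " text TEXT NOT NULL)"
        )
    if version < 2:
        # NULL means "not embedded yet".
        conn.execute("ALTER TABLE chunks ADD COLUMN vector BLOB")
        # Partial index: only rows still waiting for an embedding are indexed,
        # so the background embedder can find pending work without a full scan.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_chunks_unembedded"
            " ON chunks(id) WHERE vector IS NULL"
        )
    conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    conn.commit()


def pending_chunks(conn: sqlite3.Connection, limit: int = 32) -> list:
    """Return up to `limit` (id, text) rows that still need an embedding."""
    return conn.execute(
        "SELECT id, text FROM chunks WHERE vector IS NULL ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()


def store_vector(conn: sqlite3.Connection, chunk_id: int, vec: list) -> None:
    """Persist an embedding as packed float32 bytes; the row leaves the partial index."""
    blob = array("f", vec).tobytes()
    conn.execute("UPDATE chunks SET vector = ? WHERE id = ?", (blob, chunk_id))
    conn.commit()
```

Since the index only covers `vector IS NULL`, it shrinks as the background pass catches up, which keeps "what is left to embed?" cheap even on a laptop-wide index.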
…d SpeechRecognition dependency for wake word functionality. Implemented backtick and Right Option key callbacks for improved command activation. Updated sentence-transformers version constraint to avoid compatibility issues with macOS.
- Introduced regex patterns to strip ANSI escape sequences and CLI end markers from the raw output.
- Updated the `_clean_reply` function to improve the processing of model responses, ensuring only the relevant answer is retained.
- Added detailed docstring to clarify the cleaning process and the structure of the output.
- Enhanced the command activation process by refining the handling of backtick and Right Option key inputs.
- Improved wake word detection reliability to ensure more accurate activation of voice commands.
- Updated documentation to reflect changes in command handling and wake word functionality.
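The reply-cleaning bullets above can be illustrated with a small regex pass. This is a sketch, not the PR's `_clean_reply`: the ANSI pattern covers standard CSI and OSC escape sequences, while the `[END]` marker is a hypothetical placeholder because the CLI's real end marker is not shown here.

```python
import re

# CSI sequences such as "\x1b[32m" (color) or "\x1b[2K" (erase line), plus
# OSC sequences (e.g. terminal titles) terminated by BEL or ESC-backslash.
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)")

END_MARKER = "[END]"  # hypothetical; substitute the CLI's actual marker


def clean_reply(raw: str) -> str:
    """Strip terminal escapes, then keep only the text before the end marker."""
    text = ANSI_RE.sub("", raw)
    # Anything after the marker (timings, token counts) is CLI chatter, not answer.
    text = text.split(END_MARKER, 1)[0]
    return text.strip()
```

Stripping escapes before searching for the marker matters: a colored marker like `\x1b[2m[END]\x1b[0m` would otherwise not match a plain-text search.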

3 participants