v0.4.0 -- per-edge LLM scoring + view_tree nested JSON #10
Merged
…_sql

- recall_memory docstring: prepend "Primary recall tool" framing.
- view_tree docstring: add explicit "when to use" line, named as the entity-driven default; positioned ahead of search_sql for any "what's around this entity" question.
- search_sql docstring: tightened to one assertive line: aggregates only, never for recall/discovery/understanding/neighbourhood.
- system_prompt.md TOOL PRIORITY block: promote view_tree from a buried bullet to a named slot (#2), making the entity-driven vs query-driven split explicit. SQL stays as #5 exception.
- skills/braindb-agent/SKILL.md: prose paragraph rewritten as a numbered priority list matching the other skill's shape.
- skills/braindb/SKILL.md: split tree out of the "structure lookups" bullet into its own #2 slot; SQL bullet explicitly forbids "around this entity" questions (those are tree's job).
- BRAINDB_GUIDE.md: added a top-level ⚠ TOOL PRIORITY block so the reference doc isn't the weakest spot in the guidance chain.

Net token impact on agent prompts: ~+20 tokens per call (one extra phrase in three tool docstrings). Skill markdowns and the guide are not loaded into prompts. No behaviour change in any code path; this is pure messaging.
…ultiplier

Two related fixes to the recall ranking, both confirmed against live data where every depth-1 graph result was pinning at the same literal 0.27 regardless of which seed it came from.

1. Propagate seed similarity through graph hops. The graph CTE now carries a `seed_origin_id` column from each seed down through every recursive row. In context.py, the score component for a graph-discovered (non-seed) entity is now the score of its origin seed, inherited via that column, instead of the literal `0.3` fallback that made every graph-only entity rank identically. Before this fix a perfect-match seed (sim=1.0) and a weak-match seed (sim=0.3) produced the same depth-1 neighbour rank. Worse: a weak seed (sim<0.27) was outranked by its own neighbours because the fallback floor was higher than the seed's real score. Both gone now.
2. Soften the depth multiplier. The hardcoded depth step in the recursive CTE goes from [1.0 / 0.6 / 0.3] to [1.0 / 0.8 / 0.6]. Deeper hops still decay but no longer collapse: depth-2 and depth-3 items can now reach final_rank values that exceed the min_relevance threshold and surface in results, instead of vanishing as they do today.

Net effect (for a seed with similarity 1.0):
  depth 0: 1.00 -> 1.00 (unchanged)
  depth 1: 0.27 -> 0.80
  depth 2: 0.12 -> 0.51
  depth 3: 0.05 -> 0.31

For a seed with similarity 0.5:
  depth 0: 0.50 -> 0.50 (unchanged)
  depth 1: 0.27 -> 0.40 (now correctly lower than seed)
  depth 2: 0.12 -> 0.26
  depth 3: 0.05 -> 0.15

No new tables, no migration, no config flags, no module reorganisation. Just two surgical edits: one extra column in the CTE, one Python lookup swap, two constants nudged up.
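
A minimal sketch of the seed-inheritance idea in the commit message above. The row shape (`id`, `depth`, `seed_origin_id`), the `rank_rows` helper, and the way the multiplier table is indexed by depth are all assumptions made for illustration; this is not the code in context.py, and it does not try to reproduce the exact figures in the table, which presumably also depend on edge scores.

```python
# Hypothetical depth table built from the new [1.0 / 0.8 / 0.6] step.
# Indexing it by depth 0/1/2 is an assumption of this sketch.
DEPTH_MULTIPLIER = {0: 1.0, 1: 0.8, 2: 0.6}
OLD_FALLBACK = 0.3  # the literal that graph-only entities used to receive


def rank_rows(rows, seed_scores, inherit=True):
    """Return {entity_id: rank} for graph rows.

    rows: dicts with 'id', 'depth', 'seed_origin_id' (hypothetical shape).
    seed_scores: {seed_id: similarity} produced by the search step.
    inherit: True uses the origin seed's score, False uses the old fallback.
    """
    ranks = {}
    for row in rows:
        if row["id"] in seed_scores:
            base = seed_scores[row["id"]]              # the entity is a seed
        elif inherit:
            base = seed_scores[row["seed_origin_id"]]  # inherit from origin
        else:
            base = OLD_FALLBACK                        # pre-fix behaviour
        rank = base * DEPTH_MULTIPLIER.get(row["depth"], 0.6)
        # Keep the best rank if the same entity is reached more than once.
        ranks[row["id"]] = max(ranks.get(row["id"], 0.0), rank)
    return ranks


rows = [
    {"id": "strong", "depth": 0, "seed_origin_id": "strong"},
    {"id": "weak", "depth": 0, "seed_origin_id": "weak"},
    {"id": "n1", "depth": 1, "seed_origin_id": "strong"},
    {"id": "n2", "depth": 1, "seed_origin_id": "weak"},
]
seeds = {"strong": 1.0, "weak": 0.2}

old = rank_rows(rows, seeds, inherit=False)
new = rank_rows(rows, seeds, inherit=True)
```

With the fallback, `n1` and `n2` rank identically and `n2` outranks its own weak seed; with inheritance, the neighbour of the strong seed ranks above the neighbour of the weak one, and neither outranks its seed.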
Round-1 elevated view_tree under a category label ("default for entity-
driven neighbourhood exploration"). Benchmarking showed both Claude (via
the curl skill) and the in-house Qwen agent picked tree zero times across
5 questions x 2 paths = 10 runs. The category framing was too abstract.
This round describes WHAT tree does (capability) with a SUGGESTIVE "when"
hint, not a rigid trigger. The agent keeps full judgment about whether to
use it — we just make the value clearer:
    reveals an entity's connections in one call: relations + 1-N hop
    neighbours + edge scores. Especially useful when you have an entity ID
    (from a previous result) and want its graph context.
No "INSTEAD OF" commands. No decision-rule blocks. No examples. The
shape mirrors what already worked for the search_sql demotion (capability
+ bounded use, agent decides).
Net token impact on the agent's system prompt: ~-20 to -50 tokens (this
is a shrink, not a bloat). Same edit applied across:
- braindb/agent/tools.py (view_tree docstring — in agent prompt)
- braindb/agent/prompts/system_prompt.md (TOOL PRIORITY block entry #2)
- skills/braindb/SKILL.md (user-facing skill)
- skills/braindb-agent/SKILL.md (user-facing skill)
- BRAINDB_GUIDE.md (reference guide top block)
No code changes. No behavioural change in any code path. Pure messaging.
Verified separately on the same benchmark question set next.
…abels

Three minimal fixes to the agent's view_tree tool. Round-2a benchmarks showed Claude started using tree (0/5 -> 3/5) but the Qwen agent didn't, and even on Path A one of three tree calls (q4) didn't pay off. Looking at the actual implementation revealed why: tree was advertised with a max_depth argument but ignored it (single-hop SQL), so an agent asking view_tree(id, max_depth=2) only got depth-1 connections.

Fixes:
1. max_depth respected. Single-hop SQL replaced with a recursive CTE that walks bidirectionally (as the single-hop already did via the OR clause) and stops at the requested depth. A cycle-visited array prevents loops.
2. Depth grouping in output. "DEPTH N (count):" headers between sections. Within a depth, rows sorted by edge_score desc. Same line shape as before; only headers are new.
3. Wiki labels use canonical_name. The wiki:meta comment header was being truncated as if it were content body. Extract canonical_name via a small regex; everything else keeps the existing 80-char content truncation.

No graph.py change. No system_prompt change (already committed in round-2a). No schema change. ~35 lines net in tools.py. Verified separately on the same 5 benchmark questions next.
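
A rough illustration of fix 1, run against an in-memory SQLite database so it is self-contained. The `relations` table, its columns, and the `neighbours` helper are hypothetical. Because SQLite has no array type, a comma-delimited path string stands in here for the cycle-visited array mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relations (from_id TEXT, to_id TEXT, edge_score REAL);
INSERT INTO relations VALUES
  ('a', 'b', 0.9),
  ('c', 'a', 0.7),   -- incoming edge, only found by walking both directions
  ('b', 'd', 0.5),
  ('d', 'e', 0.4),
  ('e', 'b', 0.3);   -- closes the cycle b -> d -> e -> b
""")

# The OR in the JOIN walks each edge in both directions; the CASE picks the
# endpoint that is not the current node. `path` records visited ids so a
# cycle cannot be walked twice, and `depth < :max_depth` bounds the recursion.
WALK = """
WITH RECURSIVE walk(id, depth, path) AS (
    SELECT :root, 0, ',' || :root || ','
    UNION ALL
    SELECT
        CASE WHEN r.from_id = w.id THEN r.to_id ELSE r.from_id END,
        w.depth + 1,
        w.path
          || CASE WHEN r.from_id = w.id THEN r.to_id ELSE r.from_id END
          || ','
    FROM walk w
    JOIN relations r ON r.from_id = w.id OR r.to_id = w.id
    WHERE w.depth < :max_depth
      AND instr(
            w.path,
            ',' || CASE WHEN r.from_id = w.id THEN r.to_id ELSE r.from_id END || ','
          ) = 0
)
SELECT id, MIN(depth) FROM walk WHERE depth > 0 GROUP BY id ORDER BY 2, 1
"""


def neighbours(conn, root, max_depth):
    """Return (id, shallowest_depth) pairs within max_depth hops of root."""
    return conn.execute(WALK, {"root": root, "max_depth": max_depth}).fetchall()
```

Calling `neighbours(conn, "a", 1)` returns only `b` and `c`; raising `max_depth` to 2 adds `d` and `e`, and larger values terminate with the same result because of the path check.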
… tool

Round-2a benchmarks showed the Qwen agent (Path B) still picked tree 0/5 times despite the new wording. Hypothesis: "graph context" reads as a niche specialist feature to a smaller model, so it falls back to recall.

The reframing makes tree sound like the GENERAL tool for the thing the agent actually wants to do once it has an entity in hand: explore around it. "Explore around this entity" is the everyday framing; "graph context" is the jargon framing.

Same line count, same shape, no bloat. Just a verb change in the system_prompt.md TOOL PRIORITY block entry #2. Verified separately in the next benchmark.
…our drift)
Round-2c benchmarks revealed that "tree" was served by two divergent
implementations: the HTTP endpoint (routers/memory.py::entity_tree) did
a single-hop SQL that silently ignored max_depth, while the agent's
view_tree tool ran a proper recursive CTE. Same name, same input,
different behaviour. Path A (HTTP) and Path B (agent tool) were not
looking at the same data.
This commit extracts one source of truth:
- braindb/services/tree.py NEW: build_entity_tree(conn, entity_id,
max_depth) recursive CTE walks bidirectionally and respects
max_depth. Returns {"root": {...}, "connections": [...]} with the
same shape the HTTP endpoint always advertised — the frontend Graph
tab keeps reading it unchanged.
- routers/memory.py::entity_tree shrinks from ~60 lines to 8: just
calls build_entity_tree.
- agent/tools.py::view_tree shrinks: drops its own recursive CTE
(added in bccf2b4), calls build_entity_tree, keeps only the text
rendering (depth headers, [out]/[in] arrows, _tree_label for wikis).
Behavioural effects:
- HTTP /memory/tree/<id>?max_depth=N now actually walks N hops. Quick
spot check: tree on the "value-investing" keyword used to return ~20
depth-1 connections; now returns 156 connections (20 d1 + 136 d2).
- Frontend Graph tab: same field names, same direction values
("outgoing"/"incoming"), more nodes visible at depth 2. No JS change
needed.
- Agent view_tree tool: returns the same text shape we shipped in
bccf2b4; underlying data now comes from the shared service.
Tests: tests/test_search.py — all 6 tests pass (shape-agnostic check
on /memory/tree was already there; refactor preserves the shape).
Net diff: +60 / -90 across 3 files. Code SHRINK.
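
One way to picture the text-rendering half that stays in the agent tool after this refactor. The connection dict keys used here (`label`, `depth`, `direction`, `edge_score`) and the `render_tree` name are assumptions for the sketch; only the "DEPTH N (count):" headers, the [out]/[in] arrows, the edge_score ordering, and the 80-char truncation come from the commit messages.

```python
from itertools import groupby


def render_tree(connections):
    """Render flat connection dicts as depth-grouped text.

    Each dict is assumed to carry 'label', 'depth', 'direction'
    ("outgoing" or "incoming") and 'edge_score'.
    """
    # Sort by depth first, then by edge_score descending within a depth.
    ordered = sorted(connections, key=lambda c: (c["depth"], -c["edge_score"]))
    lines = []
    for depth, group in groupby(ordered, key=lambda c: c["depth"]):
        rows = list(group)
        lines.append(f"DEPTH {depth} ({len(rows)}):")
        for c in rows:
            arrow = "[out]" if c["direction"] == "outgoing" else "[in]"
            lines.append(f"  {arrow} {c['label'][:80]} (score {c['edge_score']:.2f})")
    return "\n".join(lines)


sample = [
    {"label": "node-a", "depth": 1, "direction": "outgoing", "edge_score": 0.90},
    {"label": "node-b", "depth": 2, "direction": "incoming", "edge_score": 0.40},
    {"label": "node-c", "depth": 1, "direction": "incoming", "edge_score": 0.95},
]
text = render_tree(sample)
```

Keeping the renderer separate from the data builder is what lets the HTTP endpoint and the agent tool share one traversal while presenting it differently.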
…es both scores

* tools.py: create_relation gains importance_score parameter; INSERT writes both relevance_score and importance_score (column was NULL for all agent-created rows since day one).
* ingest_watcher.py: stripped dictated certainty/importance/relevance_score literals from chunk-extraction and central_review prompts -- LLM judges per the tool docstring. AGENT_TIMEOUT now env-overridable, default 1200.
* services/graph.py: per-hop multiplier now r.relevance_score * COALESCE(r.importance_score, 0.5) * depth_penalty; is_bidirectional dropped from the JOIN (always walks both directions, matches tree.py).
* system_prompt.md: importance_score added to create_relation param list.
* 4 new tests lock the behaviour: persistence, watcher-no-dictation, importance_score moves rank, unidirectional edges walk backwards.

All 142 tests pass. Path A bench 5/5 PASS in 14s (zero view_tree). Path B bench 5/5 PASS in 1090s at 1200s timeout. Variance verified on live ingest of the AI Dark Output article.
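
The per-hop formula above is small enough to restate in Python. This is a sketch of the arithmetic only, with a hypothetical `hop_multiplier` helper; the real computation lives in the SQL of services/graph.py.

```python
DEFAULT_IMPORTANCE = 0.5  # the COALESCE fallback for NULL importance_score


def hop_multiplier(relevance_score, importance_score, depth_penalty):
    """relevance_score * COALESCE(importance_score, 0.5) * depth_penalty."""
    # COALESCE replaces only NULL, so an explicit 0.0 must stay 0.0.
    # That is why this tests `is None` instead of using `or`.
    if importance_score is None:
        importance = DEFAULT_IMPORTANCE
    else:
        importance = importance_score
    return relevance_score * importance * depth_penalty


legacy_edge = hop_multiplier(0.8, None, 1.0)  # row written before this fix
scored_edge = hop_multiplier(0.8, 1.0, 1.0)   # LLM judged the edge important
muted_edge = hop_multiplier(0.8, 0.0, 1.0)    # LLM judged it unimportant
```

Edges created before this commit have a NULL importance_score, so they fall back to the neutral 0.5, while newly scored edges can move the rank in either direction.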
* pyproject.toml: add pytest-asyncio==0.23.7 to [dev]. Existing tests use @pytest.mark.asyncio decorators (test_handoff_hooks, test_runhooks_countdown, test_final_answer_rename) but the plugin was not listed in deps, so `pip install -e ".[dev]"` left them skipped silently on a clean install.
* tests/test_ingest.py: the three datasource-ingest tests used fixed content strings, so a previous run's row in the DB caused dedup-by-hash to fire and the 201 assertion to fail on subsequent runs. Prepend a per-run uuid to the content so each invocation is genuinely fresh.

No production-code change. 134/134 pass (8 wiki_jobs_grouping deselected; those use the host port mapping and run from the host per tests/README.md, not from inside the api container).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
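
The test_ingest.py fix can be pictured as follows. The hash function (SHA-256 here), the in-memory `seen_hashes` set standing in for the database, and the helper names are assumptions; the source only says that dedup is by content hash and that a per-run uuid is prepended.

```python
import hashlib
import uuid

seen_hashes = set()  # stands in for rows left in the DB by earlier runs


def ingest(content):
    """Return 201 for new content, 200 when dedup-by-hash fires (hypothetical codes)."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return 200
    seen_hashes.add(digest)
    return 201


def fresh_content(base):
    """Prefix the fixture text with a per-run uuid so its hash is unique."""
    return f"{uuid.uuid4()} {base}"


fixed = "datasource fixture text"
first_run = ingest(fixed)                  # new row
second_run = ingest(fixed)                 # same hash, dedup fires
third_run = ingest(fresh_content(fixed))   # unique hash, new row again
```

The fixed string behaves as intended only on the first run against a given database, which is the failure mode the commit describes; the uuid prefix makes every run look like the first.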
…ired wikis

* services/tree.py: one build_entity_tree function. Recursive CTE carries parent_id + accumulated_score (relevance × COALESCE(importance_score, 0.5) × depth_penalty, same formula as graph.py). DISTINCT ON (id) ORDER BY id, accumulated_score DESC -- multi-path first-wins by best score. Skip tagged_with edges + target.entity_type='keyword' by default; skip wikis_ext.retired_at IS NOT NULL. New shape: root keyed by entity_type, children arrays per node, _truncated last-child marker.
* routers/memory.py: /memory/tree/<id> returns the nested shape; new query params include_keywords, top_k (default 40), min_path_score.
* agent/tools.py::view_tree: returns json.dumps(tree) directly; _tree_label helper removed.
* system_prompt.md: view_tree blurb updated to describe nested JSON.

Path A 5/5 PASS, view_tree 0/5 -> 1/5 (the agent reaches for tree now that the shape is structured). Path B 5/5 PASS, 1090s -> 773s (-29%), 54 tool calls -> 40 (-26%), zero delegate calls on q4 (was 2).

Two latent bugs caught by the new shape and fixed in this commit: keyword children leaking through non-tagged_with edges; duplicate retired-wiki siblings.

Frontend Graph tab will be broken until graph.js consumes the new shape -- follow-up commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
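
To make the nested shape concrete, here is a sketch that turns flat CTE-style rows into a tree. Several details are guesses and labeled as such: the row keys, the `build_nested` name, applying `top_k` per parent, and the payload of the `_truncated` marker. Only the best-score-wins deduplication, the `children` arrays, the root keyed by entity_type, and the existence of a last-child `_truncated` marker come from the commit message.

```python
def build_nested(root, rows, top_k=40):
    """Nest flat rows ({'id', 'parent_id', 'accumulated_score'}) under root."""
    # Equivalent of DISTINCT ON (id) ... ORDER BY accumulated_score DESC:
    # keep one row per id, the one reached by the best-scoring path.
    best = {}
    for row in rows:
        kept = best.get(row["id"])
        if kept is None or row["accumulated_score"] > kept["accumulated_score"]:
            best[row["id"]] = row

    by_parent = {}
    for row in best.values():
        by_parent.setdefault(row["parent_id"], []).append(row)

    def attach(node_id):
        kids = sorted(by_parent.get(node_id, []),
                      key=lambda r: -r["accumulated_score"])
        children = [
            {"id": r["id"], "score": r["accumulated_score"],
             "children": attach(r["id"])}
            for r in kids[:top_k]  # assumption: top_k caps each child list
        ]
        if len(kids) > top_k:
            # Hypothetical marker payload: how many siblings were dropped.
            children.append({"_truncated": len(kids) - top_k})
        return children

    return {root["entity_type"]: {"id": root["id"], "children": attach(root["id"])}}


root = {"id": "r", "entity_type": "wiki"}
rows = [
    {"id": "a", "parent_id": "r", "accumulated_score": 0.9},
    {"id": "b", "parent_id": "r", "accumulated_score": 0.4},  # weaker path to b
    {"id": "b", "parent_id": "a", "accumulated_score": 0.6},  # better path wins
    {"id": "c", "parent_id": "a", "accumulated_score": 0.3},
]
full = build_nested(root, rows)
clipped = build_nested(root, rows, top_k=1)
```

In `full`, `b` appears once, under `a`, because that path scored higher; in `clipped`, the child list of `a` ends with the marker instead of `c`.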
Bumps pyproject.toml 0.2.0 -> 0.4.0 (catches up from the v0.3.0 release which shipped without a pyproject bump). Adds the CHANGELOG entry. Refreshes user-facing docs (README, BRAINDB_GUIDE, both skills) for the new nested-JSON tree shape. No DB migration. No env-var changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
v0.4.0 — per-edge LLM scoring + view_tree nested JSON
Headline: a focused pass on recall quality and the `view_tree` tool. The per-edge LLM judgment that was missing on `create_relation` is now wired through to graph scoring, and `view_tree` returns a nested JSON tree the agent can actually navigate (vs the depth-grouped text that silently clipped 70% of connections on popular wikis).

Changed

- `view_tree` / `GET /api/v1/memory/tree/<id>`: nested JSON shape. Root keyed by `entity_type`, `children` arrays per node, multi-path first-wins by best accumulated path score, keyword + retired-wiki noise filtered by default, `_truncated` last-child marker when more remain. One shared builder (`build_entity_tree` in `braindb/services/tree.py`) for the HTTP endpoint and the agent tool. New optional query params: `include_keywords` (default `false`), `top_k` (default `40`), `min_path_score` (default `0.0`).
- `create_relation` writes both edge scores. The `importance_score` column had been NULL for every agent-created row since day one; the parameter is now on the tool, the watcher's extraction prompt no longer dictates literal score values (the LLM judges per docstring), and the graph CTE multiplies `relevance_score × COALESCE(importance_score, 0.5) × depth_penalty` per hop.
- `is_bidirectional` is now ignored by graph traversal; every edge walks both ways.
- … `system_prompt.md`, both skill files, `README.md`, and `BRAINDB_GUIDE.md` for the new tree shape.

Fixed

- `view_tree` keyword noise through non-`tagged_with` edges.
- `view_tree` duplicate retired-wiki siblings: tree CTE now skips `wikis_ext.retired_at IS NOT NULL`.
- … `tests/test_ingest.py`: content is uuid'd per run.
- … `pytest-asyncio==0.23.7` to `[dev]`.

Bench

- … `view_tree` usage 0 → 1-2 calls (the structured shape is now usable in practice).
- … (`/agent/query`): 5/5 PASS, −25% wall-clock, −26% tool calls, zero `delegate` calls on the hardest question (was 2). Numbers in `benchmarks/runs/round-2f_comparison.md`.

Upgrading from v0.3.0

No DB migration. No env-var changes. The wiki maintainer's existing retired-wiki pipeline now also gates `view_tree` traversal. `pyproject.toml` version field was at `0.2.0` in v0.3.0's tagged release (the bump was missed); this release catches it up to `0.4.0`.

Test plan

- … docker exec).