diff --git a/CHANGELOG.md b/CHANGELOG.md index a676740..ee114e4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,29 @@ # Changelog +## [2025-10-17T06:42:38-04:00 (America/New_York)] +### Changed +- Added `entity_label` to triplet CSV rows generated by `scripts/generate_synthetic_dataset.py` and refreshed ingestion + documentation (`docs/retrieval.md`, `README.md`, `docs/operations.md`, `docs/testing.md`, `SETUP.md`) plus planning collateral + (`PROJECT.md`, `PLAN.md`, `ROADMAP.md`, `SOT.md`, `ENVIRONMENT_NEEDS.md`, `NEEDED_FOR_TESTING.md`, `PLANNING_THOUGHTS.md`, + `ISSUES.md`, `TODO.md`, `RESUME_NOTES.md`) so synthetic dataset guidance stays accurate. + +## [2025-10-16T22:44:21-04:00 (America/New_York)] +### Changed +- Simplified roadmap section headings in `ROADMAP.md` by removing week estimates from the horizon labels to + emphasise qualitative prioritisation. + +## [2025-10-16T21:44:46-04:00 (America/New_York)] +### Added +- Documented a synthetic dataset ingestion workflow in `docs/retrieval.md` (including sample loader code) so benchmarking + runs can hydrate graph drivers without recomputing embeddings. + +### Changed +- Expanded operations, setup, and environment guides (`docs/operations.md`, `SETUP.md`, `ENVIRONMENT_NEEDS.md`, + `NEEDED_FOR_TESTING.md`) with batching/verification tips for loading generated JSONL/CSV corpora. +- Updated core documentation and planning artifacts (`README.md`, `PROJECT.md`, `PLAN.md`, `ROADMAP.md`, `SOT.md`, + `RECOMMENDATIONS.md`, `PLANNING_THOUGHTS.md`, `ISSUES.md`, `RESUME_NOTES.md`, `TODO.md`) to reference the ingestion workflow + and capture the follow-up automation task. 
+ ## [2025-10-16T20:39:06-04:00 (America/New_York)] ### Added - Added live integration coverage for Memgraph, Neo4j, and Redis via `meshmind/tests/test_integration_live.py` and configured diff --git a/ENVIRONMENT_NEEDS.md b/ENVIRONMENT_NEEDS.md index 95abca7..0b8ed3e 100644 --- a/ENVIRONMENT_NEEDS.md +++ b/ENVIRONMENT_NEEDS.md @@ -24,7 +24,10 @@ consolidation heuristics and pagination under load. The new `scripts/generate_synthetic_dataset.py` utility produces JSONL/CSV corpora (defaults: 10k memories, 20k triplets, 384-dim embeddings) that can be copied to - shared storage for on-demand benchmarking. + shared storage for on-demand benchmarking. Triplet rows now embed `entity_label`, + so pairing the shared datasets with the ingestion workflow documented in + `docs/retrieval.md` lets operators seed environments quickly without recomputing + embeddings or rewriting CSV headers. - Maintain outbound package download access to PyPI and vendor repositories; this session confirmed package installation works when the network is open, and future sessions need the same capability to refresh locks or install new optional diff --git a/ISSUES.md b/ISSUES.md index db7bf9b..adba87d 100644 --- a/ISSUES.md +++ b/ISSUES.md @@ -35,7 +35,9 @@ - [ ] Validate the new Docker Compose stacks (root and `meshmind/tests/docker/`) on an environment with container support and document host requirements (ports, resources). ## Low Priority / Nice to Have +- [x] Align synthetic dataset triplet CSV headers with `Triplet` schema (added `entity_label`) and refresh ingestion docs. +- [x] Remove week-based horizon estimates from roadmap headings to avoid implying precise delivery dates in planning docs. - [x] Offer alternative storage backends (in-memory driver, SQLite, etc.) for easier local development. - [x] Provide an administrative dashboard or CLI commands for listing namespaces, counts, and maintenance statistics (CLI admin subcommands now expose predicates, telemetry, and graph checks). 
-- [ ] Publish onboarding guides and troubleshooting FAQs for contributors. +- [ ] Publish onboarding guides and troubleshooting FAQs for contributors (synthetic dataset ingestion docs landed in `docs/retrieval.md`, but a broader newcomer guide is still pending). - [ ] Explore plugin registration for embeddings and retrieval strategies to reduce manual wiring. diff --git a/NEEDED_FOR_TESTING.md b/NEEDED_FOR_TESTING.md index c58f70f..9257f22 100644 --- a/NEEDED_FOR_TESTING.md +++ b/NEEDED_FOR_TESTING.md @@ -69,7 +69,7 @@ external services are unavailable. - Use `meshmind/testing` fakes (`FakeMemgraphDriver`, `FakeRedisBroker`, `FakeEmbeddingEncoder`, `FakeLLMClient`) in tests or demos to eliminate external infrastructure requirements. Integration suites marked with `@pytest.mark.integration` exercise live Memgraph/Neo4j/Redis instances and expect the docker stack to be running. - Invoke `meshmind admin predicates` and `meshmind admin maintenance --max-attempts --base-delay --run ` during local runs to inspect predicate registries, telemetry, and tune maintenance retries without external services. -- Use the benchmarking utilities in `scripts/` (`evaluate_importance.py`, `consolidation_benchmark.py`, `benchmark_pagination.py`) to validate heuristics and driver performance offline before connecting to live infrastructure. Generate large corpora with `scripts/generate_synthetic_dataset.py` when you need ≥10k memories for stress tests. +- Use the benchmarking utilities in `scripts/` (`evaluate_importance.py`, `consolidation_benchmark.py`, `benchmark_pagination.py`) to validate heuristics and driver performance offline before connecting to live infrastructure. Generate large corpora with `scripts/generate_synthetic_dataset.py` when you need ≥10k memories for stress tests; triplet CSV rows now ship with `entity_label`, so the ingestion workflow in `docs/retrieval.md` can hydrate graph drivers without extra mutation. 
- Seed demo data as needed using the `examples/extract_preprocess_store_example.py` script after configuring environment variables. - Create a `.env` file storing the environment variables above for consistent local configuration. diff --git a/PLAN.md b/PLAN.md index ac41fde..c2f62b8 100644 --- a/PLAN.md +++ b/PLAN.md @@ -1,5 +1,7 @@ # Plan of Action +Roadmap milestones now reference qualitative horizons (Near/Mid/Long-Term) instead of week estimates to focus this plan on sequencing rather than timeboxing. + ## Phase 1 – Stabilize Runtime Basics ✅ 1. **Dependency Guards** – Implemented lazy driver factories, optional imports, and clear ImportErrors for missing packages. 2. **Default Encoder Registration** – Bootstraps register encoders/entities automatically and the CLI invokes them on startup. @@ -20,7 +22,8 @@ 2. **Maintenance Tasks** – Tasks emit telemetry, persist consolidation/compression results, and now retry conflicting writes with configurable exponential backoff (`MAINTENANCE_MAX_ATTEMPTS`, `MAINTENANCE_BASE_DELAY_SECONDS`). Synthetic benchmark scripts, the new `scripts/generate_synthetic_dataset.py`, and integration tests against live Memgraph/Neo4j validate behaviour on larger - workloads; next, replay production-like datasets to tune thresholds. + workloads. Fresh documentation in `docs/retrieval.md` and `docs/operations.md` now describes how to ingest those synthetic datasets + (with triplet CSVs that include `entity_label`) into the target backend; next, replay production-like datasets to tune thresholds. 3. **Importance Scoring Improvements** – Heuristic scoring is live, records distribution metrics via telemetry, and ships with `scripts/evaluate_importance.py` for synthetic/offline evaluation. Next: incorporate real feedback loops or LLM-assisted ranking to tune weights over time. 
diff --git a/PLANNING_THOUGHTS.md b/PLANNING_THOUGHTS.md index 8942954..1d165db 100644 --- a/PLANNING_THOUGHTS.md +++ b/PLANNING_THOUGHTS.md @@ -14,7 +14,7 @@ - **Pydantic Model Policy** – Follow the documented plan (target Pydantic 2.12+, refresh locks when 3.13 wheels land, record migration guidance) to avoid resurrecting compatibility shims. ## Upcoming Research -- Benchmark consolidation heuristics on synthetic datasets representing customer scale and capture telemetry snapshots (seed data via `scripts/generate_synthetic_dataset.py`). +- Benchmark consolidation heuristics on synthetic datasets representing customer scale and capture telemetry snapshots (seed data via `scripts/generate_synthetic_dataset.py`—whose triplet CSV now includes `entity_label`—and load it using the ingestion workflow documented in `docs/retrieval.md`). - Compare graph query latency across in-memory, SQLite, Memgraph, and Neo4j drivers when using pagination and filtering. - Evaluate rerank quality across LLM providers using a labelled evaluation set to determine optimal default models. - Investigate options for secure secret storage (e.g., Vault, AWS Secrets Manager) to standardise API key management. diff --git a/PROJECT.md b/PROJECT.md index ee60a39..cc319c3 100644 --- a/PROJECT.md +++ b/PROJECT.md @@ -78,7 +78,7 @@ - Docker Compose now provisions Memgraph, Neo4j, and Redis; integration-specific stacks (including the Celery worker) live under `meshmind/tests/docker/`. `pytest -m integration` exercises live services once the stack is running. See `ENVIRONMENT_NEEDS.md` and `SETUP.md` for enabling optional services locally. -- `scripts/generate_synthetic_dataset.py` produces large JSONL/CSV corpora (defaults: 10k memories, 20k triplets, 384-dim embeddings) to stress retrieval and consolidation flows prior to ingesting real datasets. 
+- `scripts/generate_synthetic_dataset.py` produces large JSONL/CSV corpora (defaults: 10k memories, 20k triplets, 384-dim embeddings) to stress retrieval and consolidation flows prior to ingesting real datasets. Triplet rows ship with `entity_label` so the ingestion workflow documented in `docs/retrieval.md` hydrates graph drivers without additional preprocessing. ## Roadmap Highlights - Push graph-backed retrieval deeper into the drivers (vector similarity, structured filters) so the new server-side filtering/pagination evolves into full backend-native ranking. diff --git a/README.md b/README.md index af6f219..6861178 100644 --- a/README.md +++ b/README.md @@ -202,7 +202,11 @@ Tasks instantiate the driver lazily, emit structured logs/metrics, and persist c ## Benchmarking & Evaluation - **Synthetic dataset generation** – `scripts/generate_synthetic_dataset.py` creates large JSONL/CSV corpora of memories/triplets (defaults: 10k memories, 20k triplets, 384-dim embeddings) so you can stress retrieval, consolidation, - and integration flows before ingesting real data. + and integration flows before ingesting real data. Triplet rows now ship with `entity_label` to match + `meshmind.core.types.Triplet`. +- **Synthetic dataset ingestion** – Follow the workflow documented in `docs/retrieval.md` to load the generated JSONL/CSV + payloads into MeshMind via the Python client. The operations guide walks through batching tips and post-ingestion + verification so benchmark runs start from a consistent baseline. - **Importance scoring** – `scripts/evaluate_importance.py` runs the heuristic against JSON or synthetic datasets and reports descriptive statistics for quick regression checks. 
- **Consolidation throughput** – `scripts/consolidation_benchmark.py` generates synthetic workloads to measure batch merging diff --git a/RECOMMENDATIONS.md b/RECOMMENDATIONS.md index ffb65ba..3600b01 100644 --- a/RECOMMENDATIONS.md +++ b/RECOMMENDATIONS.md @@ -30,7 +30,9 @@ ## Documentation & Onboarding - Keep `README.md`, `SOT.md`, `docs/`, and onboarding guides synchronized with each release; document rerank, retrieval, and - registry flows with diagrams when possible. + registry flows with diagrams when possible. The new synthetic dataset ingestion workflow in `docs/retrieval.md` should be + incorporated into future onboarding materials. +- Keep roadmap horizons qualitative (Near/Mid/Long-Term) instead of week-based estimates so planning docs emphasise sequencing and flexibility. - Maintain the troubleshooting section for optional tooling (ruff, pyright, typeguard, toml-sort, yamllint) now referenced in the Makefile and expand it as new developer utilities are introduced. Keep `SETUP.md` synchronized when dependencies change. - Provide walkthroughs for configuring LLM reranking, including sample prompts and response expectations. diff --git a/RESUME_NOTES.md b/RESUME_NOTES.md index 6d3726f..607fafd 100644 --- a/RESUME_NOTES.md +++ b/RESUME_NOTES.md @@ -10,9 +10,11 @@ ## Latest Changes +- Removed week-based estimates from roadmap section headings and refreshed planning docs (`PLAN.md`, `SOT.md`, `RECOMMENDATIONS.md`, `ISSUES.md`, `TODO.md`) to emphasise qualitative sequencing. - Added live integration coverage (`meshmind/tests/test_integration_live.py`) for Memgraph, Neo4j, and Redis, introduced a pytest marker configuration, and documented the workflow across README/SETUP/docs. - Generated a fresh `uv.lock`, pinned `.python-version` to 3.12, and updated install docs to standardise on `uv sync --all-extras`. -- Created `scripts/generate_synthetic_dataset.py` for large JSONL/CSV corpora and referenced it across benchmarking docs. 
+- Created `scripts/generate_synthetic_dataset.py` for large JSONL/CSV corpora, added `entity_label` to triplet CSV rows, and referenced it across benchmarking docs. +- Documented the synthetic dataset ingestion workflow across `docs/retrieval.md`, `docs/operations.md`, README, and supporting planning guides so benchmarks can load corpora without recomputing embeddings. - Updated documentation and planning collateral (README.md, SETUP.md, docs/development.md, docs/testing.md, docs/operations.md, PROJECT.md, PLAN.md, RECOMMENDATIONS.md, ROADMAP.md, ENVIRONMENT_NEEDS.md, NEEDED_FOR_TESTING.md, SOT.md, PLANNING_THOUGHTS.md, DUMMIES.md, TODO.md, RESUME_NOTES.md) to reflect the integration workflow, dataset generation, and the new Pydantic policy. ## Environment State @@ -26,5 +28,5 @@ 1. Address remaining `TODO.md` priority items (backend-native vector similarity, Celery worker integration, grpcurl end-to-end tests) now that graph services are accessible locally. 2. Automate the integration suite in CI and capture resource requirements for shared infrastructure. 3. Prepare grpcurl-based smoke tests for `meshmind serve-grpc` and plan protobuf client packaging once integration coverage extends beyond the Python stub. -4. Feed findings from large synthetic datasets into retry/backoff defaults and document recommended values in `ENVIRONMENT_NEEDS.md`. +4. Feed findings from large synthetic datasets into retry/backoff defaults and document recommended values in `ENVIRONMENT_NEEDS.md`, validating the new ingestion workflow as part of those runs. 5. Continue tracking shim retirements in `DUMMIES.md` and follow the cleanup plan in `CLEANUP.md` so remaining fakes can be removed when infrastructure allows. diff --git a/ROADMAP.md b/ROADMAP.md index be874ae..1c8d38d 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -5,21 +5,21 @@ - Support multiple graph backends (in-memory, SQLite, Memgraph, Neo4j) with consistent telemetry, maintenance, and LLM orchestration knobs. 
- Provide developers with reproducible tooling, comprehensive documentation, and automation scripts that keep local and CI environments aligned. -## Near-Term (0–2 Weeks) +## Near-Term - Automate the new integration suite (`pytest -m integration`) in CI so Memgraph/Neo4j/Redis regressions fail fast. -- Finalize maintenance write policies by implementing retry/backoff semantics and measuring consolidation accuracy against representative datasets (now aided by `scripts/generate_synthetic_dataset.py`). +- Finalize maintenance write policies by implementing retry/backoff semantics and measuring consolidation accuracy against representative datasets (now aided by `scripts/generate_synthetic_dataset.py`, whose triplet CSV exposes `entity_label`, and the documented ingestion workflow in `docs/retrieval.md`). - Publish ROADMAP and PLANNING_THOUGHTS artifacts, and seed the `research/` folder with competitive analysis to ground prioritization discussions. - Expand automated smoke tests for REST `/memories/counts`, CLI `meshmind admin counts`, and provisioning scripts to ensure guardrails stay trustworthy. - Capture outstanding shim retirement work (FastAPI tests now live; continue tracking FakeLLM/Fake drivers) in CLEANUP.md with precise acceptance criteria for each removal. -## Mid-Term (2–6 Weeks) +## Mid-Term - Run load tests against SQLite and hosted graph backends to tune pagination defaults, consolidation heuristics, and token compression strategies. - Implement backend-native vector similarity queries and schema indexes so embeddings never leave the database during scoring. - Finalise the gRPC surface by building on the new asyncio server helpers—exercise the `meshmind serve-grpc` CLI entry point within Docker Compose, publish generated clients (Python + additional languages), and add integration smoke tests so external agents can integrate without the in-process stub. 
- Instrument observability exports (Prometheus/OpenTelemetry) and wire dashboards/alerts for ingestion latency, queue depth, and error rates. - Replace compatibility shims with official Pydantic/FastAPI packages once dependency constraints are lifted, and backfill validation coverage. -## Long-Term (6+ Weeks) +## Long-Term - Build evaluation loops—analytics dashboards and LLM-assisted reviews—that continuously score memory importance heuristics and rerank quality. - Introduce human-in-the-loop tooling for conflict resolution, allowing operators to approve merges or override automated maintenance plans. - Explore federated deployments that synchronise multiple MeshMind instances, including replication strategies and eventual-consistency guarantees. diff --git a/SETUP.md b/SETUP.md index 942497a..b574dae 100644 --- a/SETUP.md +++ b/SETUP.md @@ -80,7 +80,10 @@ docker compose -f meshmind/tests/docker/memgraph.yml up -d ``` > Need synthetic load? Run `python scripts/generate_synthetic_dataset.py build/datasets/benchmark` -> to seed JSONL/CSV fixtures before loading them into Memgraph/Neo4j for stress tests. +> to seed JSONL/CSV fixtures before loading them into Memgraph/Neo4j for stress tests. Triplet rows +> now include `entity_label`, so the ingestion workflow in `docs/retrieval.md` can materialize +> `Triplet` models without mutating CSV fields. Follow the ingestion steps when copying fixtures so +> benchmarks reuse the same namespace/layout. ### 3.2 Cleaning up diff --git a/SOT.md b/SOT.md index 4f0d8d6..f20dbdc 100644 --- a/SOT.md +++ b/SOT.md @@ -28,11 +28,12 @@ Supporting assets: - `SETUP.md`: End-to-end provisioning instructions covering Python deps, environment variables, and Compose workflows. - `run/install_setup.sh`, `run/maintenance_setup.sh`: Automation scripts for provisioning fresh environments and refreshing cached workspaces. 
- `scripts/evaluate_importance.py`, `scripts/consolidation_benchmark.py`, `scripts/benchmark_pagination.py`: Evaluation and benchmarking tools for importance heuristics, consolidation throughput, and driver pagination performance. -- `scripts/generate_synthetic_dataset.py`: Produces large JSONL/CSV corpora (defaults: 10k memories, 20k triplets, 384-dim embeddings) for integration and benchmark scenarios. +- `scripts/generate_synthetic_dataset.py`: Produces large JSONL/CSV corpora (defaults: 10k memories, 20k triplets, 384-dim embeddings) for integration and benchmark scenarios. Triplet rows include `entity_label`, so the ingestion workflow in `docs/retrieval.md` stores the generated payloads without recomputing embeddings or mutating CSV fields. - `.github/workflows/ci.yml`: GitHub Actions workflow running linting/formatting checks and pytest. - `pyproject.toml`: Project metadata and dependency list (pins Python `>=3.11,<3.13`; see compatibility notes in `ISSUES.md`). - Documentation (`PROJECT.md`, `PLAN.md`, `SOT.md`, `README.md`, etc.) describing the system and roadmap. - Strategic context (`ROADMAP.md`, `PLANNING_THOUGHTS.md`, `research/overview.md`) summarising milestones, planning questions, and competitor analysis. + Roadmap horizons now use qualitative labels (Near/Mid/Long-Term) without week estimates to emphasise sequencing over exact timing. - `DUMMIES.md`: Catalog of temporary shims (REST/gRPC stubs, Celery dummies, fake drivers) with removal guidance and a retired section for historical compatibility layers. diff --git a/TODO.md b/TODO.md index 6bd8279..ad6167b 100644 --- a/TODO.md +++ b/TODO.md @@ -2,6 +2,8 @@ ## Completed +- [x] Ensure `scripts/generate_synthetic_dataset.py` emits `entity_label` for triplet CSV rows and refresh ingestion docs. +- [x] Remove week estimate qualifiers from roadmap horizon headings to keep milestone labels qualitative. 
- [x] Implement dependency guards and lazy imports for optional packages (`pymgclient`, `tiktoken`, `celery`, `sentence-transformers`). - [x] Add bootstrap helper for default encoder registration and call it from the CLI. - [x] Update OpenAI encoder implementation to align with latest SDK responses and retry semantics. @@ -73,6 +75,7 @@ - [x] Add packaging tests to guarantee `meshmind/protos/memory_service.proto` ships with the distribution and exposes the expected service definition. - [x] Document runtime and operational guidance for the gRPC server across README, SETUP, `docs/api.md`, and `docs/operations.md`. - [x] Add Makefile and CI targets (`make protos`, `make protos-check`) plus scripts to regenerate/verify protobuf bindings, failing CI when drift occurs. +- [x] Document ingestion workflows for the synthetic dataset generator across `docs/retrieval.md` and operations guides so benchmarking instructions stay cohesive. - [x] Replace the REST stub with the concrete FastAPI application and migrate smoke tests to `fastapi.testclient.TestClient`. - [x] Remove Celery dummy fallbacks by requiring the real app/beat imports and keeping docker-compose stacks in sync. - [x] Add a `serve-grpc` CLI subcommand and verify it delegates to the runtime helpers. @@ -95,9 +98,9 @@ - [ ] Add integration tests that spin up `meshmind serve-grpc` and exercise ingestion/search via grpcurl to complement the unit-level coverage (blocked until network-accessible infrastructure is ready). - [ ] Publish protobuf-generated client artifacts (Python wheel or language-neutral bundles) so external services can consume the API once infrastructure is available. - [ ] Automate the live integration suite (`pytest -m integration`) in CI so Memgraph/Neo4j/Redis regressions fail fast. -- [ ] Document ingestion workflows for the synthetic dataset generator across `docs/retrieval.md` and operations guides so benchmarking instructions stay cohesive. 
- [ ] Document the retired REST/Celery shims in release notes and communicate migration steps to downstream integrators. - [ ] Capture gRPC CLI usage examples (including docker-compose orchestration) in `docs/api.md` and `docs/operations.md` once integration smoke tests complete. +- [ ] Automate ingestion of synthetic dataset payloads (JSONL/CSV) via a CLI or script wrapper so benchmarking runs do not require custom snippets. ## Recommended Waiting for Approval Tasks diff --git a/docs/operations.md b/docs/operations.md index b0e652d..e379240 100644 --- a/docs/operations.md +++ b/docs/operations.md @@ -72,7 +72,10 @@ This guide covers operational tasks for MeshMind deployments. - `make benchmarks` runs the synthetic benchmarking scripts (`scripts/evaluate_importance.py`, `scripts/consolidation_benchmark.py`, `scripts/benchmark_pagination.py`) with fast defaults and stores JSON summaries in `build/benchmarks/`. - Override script flags to stress specific backends (for example `--backend neo4j` or higher iteration counts) once live services are provisioned, and capture findings in `FINDINGS.md` / `ENVIRONMENT_NEEDS.md`. -- Use `scripts/generate_synthetic_dataset.py` to produce large JSONL/CSV corpora (defaults: 10k memories, 20k triplets, 384-dim embeddings) before loading data into Memgraph/Neo4j for stress testing. +- Use `scripts/generate_synthetic_dataset.py` to produce large JSONL/CSV corpora (defaults: 10k memories, 20k triplets, 384-dim embeddings) before loading data into Memgraph/Neo4j for stress testing. Pair the generator with the ingestion snippet from `docs/retrieval.md` to hydrate graph backends quickly without recomputing embeddings. Triplet payloads now include `entity_label` so they align with `Triplet` validation without extra preprocessing. When loading via the MeshMind client: + - Batch writes (for example in chunks of 500 memories/triplets) to keep request payload sizes manageable. 
+  - Align namespaces across the JSONL/CSV payloads and retrieval queries so pagination filters remain effective.
+  - Call `meshmind.cli.admin counts --namespace ` after ingestion to confirm memory distribution before executing benchmarks.
 
 ## Deployment Considerations
diff --git a/docs/retrieval.md b/docs/retrieval.md
index ecfed61..1d444ee 100644
--- a/docs/retrieval.md
+++ b/docs/retrieval.md
@@ -50,6 +50,62 @@ batch processing patterns.
 - `rerank_model` / `rerank_endpoint`: explicit overrides that take precedence over environment defaults when reranking.
 - `fields`: optional mapping for textual searches (regex, exact, fuzzy) to target metadata keys.
+
+## Synthetic Dataset Ingestion Workflow
+
+Large-scale retrieval experiments rely on synthetic corpora so benchmarks stay reproducible. Use the following workflow to
+seed data generated by `scripts/generate_synthetic_dataset.py` into your target backend:
+
+1. Generate the corpus:
+
+   ```bash
+   python scripts/generate_synthetic_dataset.py build/datasets/benchmark \
+       --memories 10000 \
+       --triplets 20000 \
+       --namespace benchmark
+   ```
+
+   This produces `memories.jsonl` (memory payloads) and `triplets.csv` (relationships) under `build/datasets/benchmark/`.
+
+2. Load memories with a short Python helper. The snippet below deserialises the JSONL payload and stores the objects directly
+   through the MeshMind client:
+
+   ```python
+   from __future__ import annotations
+
+   from pathlib import Path
+
+   from meshmind.client import MeshMind
+   from meshmind.core.types import Memory
+
+
+   def load_memories(path: Path, namespace: str, batch_size: int = 500) -> None:
+       mm = MeshMind()
+       batch: list[Memory] = []
+       with path.open("r", encoding="utf-8") as handle:
+           for line in handle:
+               payload = Memory.model_validate_json(line)
+               payload.namespace = namespace
+               batch.append(payload)
+               if len(batch) >= batch_size:
+                   mm.store_memories(list(batch))
+                   batch.clear()
+       if batch:
+           mm.store_memories(list(batch))
+
+
+   load_memories(Path("build/datasets/benchmark/memories.jsonl"), namespace="benchmark")
+   ```
+
+3. Persist relationships in a similar fashion using `MeshMind.store_triplets` and the generated CSV payload (for example, with
+   `csv.DictReader`). Each row now includes `subject`, `predicate`, `object`, `namespace`, `entity_label`, and `metadata`, so the
+   ingestion helper can instantiate `Triplet(**row)` without additional mutation.
+
+4. Run retrieval queries (`meshmind search`, REST/gRPC calls, or the MeshMind Python client) targeting the `benchmark`
+   namespace and optional `entity_labels` to exercise vector, hybrid, and metadata filters against the seeded dataset.
+
+The same JSONL/CSV payloads can be adapted for bulk ingestion APIs exposed by the REST/gRPC services if you prefer remote
+loading. Make sure to keep namespaces aligned so pagination and label filters remain effective across benchmarking runs.
+
 ## Extending Retrieval
 
 1. Add a new module under `meshmind/retrieval` with a function that accepts `(query, memories, **kwargs)`.
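Step 3 above can be sketched with a stdlib-only reader. The `read_triplet_rows` helper name is illustrative (not part of MeshMind), and it assumes the `metadata` column holds serialised JSON that should be decoded before model validation, since `csv.DictReader` returns every column as a string:

```python
from __future__ import annotations

import csv
import json
from pathlib import Path
from typing import Iterator


def read_triplet_rows(path: Path) -> Iterator[dict]:
    """Yield triplet rows from the generated CSV with `metadata` decoded from JSON."""
    with path.open("r", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            # DictReader yields every column as a string; metadata is serialised
            # JSON, so decode it before handing the row to validation.
            row["metadata"] = json.loads(row["metadata"]) if row.get("metadata") else {}
            yield row
```

Each yielded dict carries the `subject`, `predicate`, `object`, `namespace`, `entity_label`, and `metadata` keys, so batches of `Triplet(**row)` models can be handed to `MeshMind.store_triplets` in the same chunked fashion as the memory loader.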
diff --git a/docs/testing.md b/docs/testing.md
index a28bb5d..c432f29 100644
--- a/docs/testing.md
+++ b/docs/testing.md
@@ -69,7 +69,8 @@ The command stores JSON summaries under `build/benchmarks/`:
 Adjust the script flags (for example `--backend`, `--iterations`, or `--count`) to stress alternative drivers or larger
 datasets; see `scripts/*.py` for supported options. Document notable findings in `FINDINGS.md` or `ENVIRONMENT_NEEDS.md`
 when tuning defaults for new environments. Use `scripts/generate_synthetic_dataset.py` to generate large JSONL/CSV corpora
-before loading them into Memgraph/Neo4j for scale testing.
+before loading them into Memgraph/Neo4j for scale testing; the triplet CSV now includes `entity_label` so importing with
+`Triplet(**row)` succeeds without extra preprocessing.
 
 ## Adding Tests
diff --git a/scripts/generate_synthetic_dataset.py b/scripts/generate_synthetic_dataset.py
index 397e221..de5cc91 100644
--- a/scripts/generate_synthetic_dataset.py
+++ b/scripts/generate_synthetic_dataset.py
@@ -56,11 +56,14 @@ def _write_jsonl(path: Path, rows: Iterable[dict[str, object]]) -> None:
 
 
 def _write_triplets(path: Path, rows: Iterable[dict[str, object]]) -> None:
     path.parent.mkdir(parents=True, exist_ok=True)
     with path.open("w", encoding="utf-8") as handle:
-        handle.write("subject,predicate,object,namespace,metadata\n")
+        handle.write("subject,predicate,object,namespace,entity_label,metadata\n")
         for row in rows:
             metadata = json.dumps(row.get("metadata", {}), ensure_ascii=False)
             handle.write(
-                f"{row['subject']},{row['predicate']},{row['object']},{row['namespace']},{metadata}\n"
+                (
+                    f"{row['subject']},{row['predicate']},{row['object']},{row['namespace']},"
+                    f"{row['entity_label']},{metadata}\n"
+                )
             )
@@ -79,16 +82,19 @@ def generate_dataset(
     memory_rows = []
     triplet_rows = []
     entity_ids: list[str] = []
+    entity_labels: dict[str, str] = {}
     for _ in range(memories):
         uid = str(uuid4())
         entity_ids.append(uid)
+        label = random.choice(["Note", "Task", "Observation"])
+        entity_labels[uid] = label
         memory_rows.append(
             {
                 "uuid": uid,
                 "namespace": namespace,
                 "name": _random_text(3).title(),
-                "entity_label": random.choice(["Note", "Task", "Observation"]),
+                "entity_label": label,
                 "content": _random_text(random.randint(20, 60)),
                 "embedding": _random_embedding(embedding_dim),
                 "metadata": _random_metadata(),
@@ -103,6 +109,7 @@ def generate_dataset(
                 "predicate": random.choice(["references", "follows", "relates_to", "duplicates"]),
                 "object": obj,
                 "namespace": namespace,
+                "entity_label": entity_labels[subj],
                 "metadata": {
                     "confidence": round(random.uniform(0.5, 0.99), 2),
                     "notes": _random_text(6),