Merge branch 'main' into feat-nvidia-embedding-remote-sync

2026-06-12 07:38:27 +08:00 · 2026-06-12 07:38:27 +08:00 · b19f486d50
commit b19f486d50
parent 6021ea34ac 636602409c
52 changed files with 7602 additions and 889 deletions
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -32,13 +32,12 @@ jobs:

      - uses: actions/setup-node@v4
        with:
-          node-version: 22
+          node-version: 24
          registry-url: https://registry.npmjs.org
+          package-manager-cache: false

      - run: npm run build
      - run: npm publish --provenance --access public
-        env:
-          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}

      - name: Extract release notes
        id: notes
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -19,6 +19,198 @@
  and query expansion models.
 - Embedding: use approximate token counts in external embedding mode so
  chunking does not load a local GGUF tokenizer.
+### Documentation
+
+- README: documented collection filtering (`-c` semantics), the `collection
+  show`/`include`/`exclude`/`update-cmd` subcommands, the `--intent`/`--no-rerank`/
+  `-C`/`--full-path` search flags, the `--format <kind>` output selector (with the
+  legacy `--json`/`--csv`/`--md`/`--xml`/`--files` booleans noted as aliases),
+  `vector-search`/`deep-search` aliases, embed
+  memory flags (`--max-docs-per-batch`/`--max-batch-mb`), a sample `--explain`
+  score trace, the `qmd doctor`/`qmd init` commands, the `get` `:from:count`
+  suffix and `--no-line-numbers`, an MCP tool parameter reference, and a
+  Benchmarking section for `qmd bench`.
+- docs/SYNTAX.md: removed the non-existent `q` MCP parameter example (the `query`
+  tool and REST endpoint accept only the `searches` array) and added a Scoping
+  section.
+- README: removed the misleading `qmd update --pull` example. The `--pull` flag is
+  parsed but never consumed (`updateCollections()` ignores it); the real mechanism
+  for running `git pull` before re-indexing is a per-collection `update` command,
+  set via `qmd collection update-cmd`.
+
+### Fixed
+
+- MCP server instructions now tell agents to scope with the plural `collections`
+  parameter (matching the schema). The previous singular `collection` hint led
+  agents to pass a parameter that Zod silently strips, producing unscoped results.
+  The `get` instruction line also now documents the full `file.md:from:count`
+  range suffix instead of only the single-line `file.md:100` offset.
+
+- Filesystem paths with special characters (`#`, `&`, spaces, `[]`, `()`, etc.)
+  now round-trip correctly through index → search → get. Previously
+  `reindexCollection` called `handelize()` on relative paths before storing
+  them, turning `# Meeting - 234232 3432 __ 5.md` into
+  `Meeting-234232-3432-5.md` and making `qmd get <actual-path>`,
+  `qmd get --full-path`, and `qmd ls` return dead or garbled paths. Paths are
+  now stored verbatim. Existing indexes auto-migrate on the next `qmd update`.
+
+- FTS5 search now correctly matches dotted version strings like `2026.4.10`. The
+  `porter unicode61` tokenizer splits on dots (storing `2026`, `4`, `10` as
+  separate tokens), but the query sanitizer was stripping dots and producing
+  `2026410` which never matched. Dotted terms are now split and ANDed together
+  so version-string searches work as expected (#563).
+- HTTP REST endpoints `/query` and `/search` now return `qmd://collection/path`
+  URIs in the `file` field, matching the output format used by the CLI and MCP
+  resource URIs. Previously the raw `displayPath` (`collection/path`) was
+  returned without the scheme prefix (#576).
+- The embed session `maxDuration` is now env-configurable via
+  `QMD_EMBED_MAX_DURATION_MS` (default: 30 min). This prevents large-corpus
+  embeddings from being aborted by the hardcoded 30-minute ceiling (#673).
+
+## [2.5.3] - 2026-05-28
+
+### Features
+
+- `qmd get` now accepts a `:from:count` suffix on a path or docid (e.g.
+  `qmd get "#abc123:120:40"` reads 40 lines starting at line 120). Explicit
+  `--from`/`-l` flags still override the suffix. The MCP `get` tool accepts the
+  same suffix.
+- `qmd get` and `qmd multi-get` are now **line-numbered by default** and print
+  the document's `#docid` and `qmd://` path in the output header. Disable line
+  numbers with `--no-line-numbers`. The MCP `get`/`multi_get` tools default
+  `lineNumbers` to `true` to match.
+- `qmd multi-get` now includes the `#docid` in every output format
+  (`--md`, `--json`, `--csv`, `--xml`, `--files`, and the default CLI view),
+  consistent with `qmd search`.
+- `qmd get` and `qmd multi-get` accept `--full-path`, which replaces the
+  `qmd://` path + `#docid` with the document's on-disk filesystem path (handy for
+  piping into `Read`/`Edit`/an editor). Falls back to the canonical `qmd://` +
+  docid header when the file no longer exists on disk.
+- `qmd search` / `qmd query` now show a clearer hit identifier: the default CLI
+  view (and the new `**file:**` line in `--md` output) always prints the full
+  `qmd://collection/path` URI so you can pipe it straight back into `qmd get`.
+- `qmd search` / `qmd query` accept `--full-path` with the same semantics as
+  `qmd get`: the result label becomes the file's on-disk path — `./`-prefixed
+  relative path when the file lives in a subfolder of `$PWD`, absolute realpath
+  otherwise — and the per-result `#docid` is dropped because the path is the
+  identifier. The leading `./` is intentional so the output is unambiguously a
+  filesystem path. Applies to all output formats.
+- `qmd get` and `qmd multi-get` now also use the `./`-prefixed convention when
+  `--full-path` renders a path under `$PWD`, matching `search`/`query`.
+- New `--format <kind>` flag selects the output format (`cli` | `json` | `csv` |
+  `md` | `xml` | `files`) for `search`, `query`, and `multi-get`. The legacy
+  boolean aliases (`--json`/`--csv`/`--md`/`--xml`/`--files`) still work but are
+  no longer in `--help`; prefer `--format`.
+
+### Fixes
+
+- Launcher: source-mode runner selection now prefers Node + tsx over Bun when
+  both `package-lock.json` and `bun.lock` are present in the package root,
+  mirroring the dist-mode "npm priority" rule. Fixes pnpm-global installs that
+  copy the entire working tree (including `.git` and `bun.lock`) into the
+  install dir and previously routed through Bun, causing ABI mismatches with
+  the Node-built `better-sqlite3` / `sqlite-vec` native modules.
+- Darwin Metal: llama-using commands (`query`, `vsearch`, `embed`) no longer
+  dump a multi-kB GGML/Metal backtrace at process exit even when output
+  succeeded. The libggml-metal static `ggml_metal_device` destructor asserts
+  `[rsets->data count] == 0` during `__cxa_finalize_ranges`, but the
+  buffer-free path never calls the symmetric `ggml_metal_device_rsets_rm`
+  to remove released rsets from the device collection (upstream
+  ggml-org/llama.cpp#22593, one-line fix open as PR #22595). The assertion
+  only fires when `process.exit()` skips Node's `beforeExit` hook, which is
+  what node-llama-cpp uses to auto-dispose Metal contexts. Primary fix:
+  `finishSuccessfulCliCommand` now sets `process.exitCode = 0` and returns
+  instead of calling `process.exit(0)`, so `beforeExit` fires and the native
+  binding cleans up before libc's static destructor runs. Defense-in-depth:
+  the launcher (`bin/qmd`) and the npm test driver (`scripts/test-all.mjs`
+  + the `test:bun` / `test:unit` package.json scripts) also set
+  `GGML_METAL_NO_RESIDENCY=1` on darwin before spawning node/bun, covering
+  error paths and tests that still terminate via `process.exit()`. The env
+  var must be set before node/bun start — libggml-metal reads it via libc
+  `getenv` at module-load time, and Bun does not propagate `process.env`
+  mutations to libc `setenv` — so it lives in the launcher rather than in
+  test-preload. Residency sets give no measurable speedup for QMD's
+  short-lived CLI workflow (benchmarked on M3 Pro). Opt back in with
+  `QMD_METAL_KEEP_RESIDENCY=1` for long-lived qmd processes (e.g. the MCP
+  daemon may benefit on hot reload) or to triage the upstream fix.
+  `qmd doctor` reports the mitigation state. Minimal reproduction:
+  `scripts/repro-metal-rsets-crash.mjs`.
+
+### Docs
+
+- qmd skill: emphasize reading line ranges with `get`'s built-in
+  `:from:count` suffix / `--from`/`-l` flags instead of piping through
+  `sed`/`head`/`tail`; cite the docid and line numbers now present in retrieval
+  output; and author structured `intent:`/`lex:`/`vec:`/`hyde:` queries yourself
+  rather than relying on built-in query expansion.
+
+## [2.5.2] - 2026-05-22
+
+### Fixes
+
+- Launcher: Rewrite `bin/qmd` as a Node-based shebang polyglot to fix global npm installation execution failures on Windows (#668 / #452), while supporting seamless fallback to Bun in Node-less environments.
+
+
+## [2.5.1] - 2026-05-20
+
+### Changes
+
+- Release: publish from GitHub Actions via npm Trusted Publishing/OIDC instead of a long-lived `NPM_TOKEN` secret.
+
+## [2.5.0] - 2026-05-19
+
+### Changes
+
+- Dependencies: update core SQLite/config/chunking packages (`better-sqlite3`, `yaml`, `web-tree-sitter`, `tree-sitter-go`, and `tree-sitter-python`) while keeping incompatible `zod`, `tsx`, and `vitest` majors pinned.
+- Agent skills: add `qmd skills list|get|path` to serve version-matched runtime skill instructions from the installed CLI, and make `qmd skill install` write a stable discovery stub so installed agent skills do not go stale after QMD upgrades.
+- CLI: add `qmd doctor` for index/runtime diagnostics, including SQLite/sqlite-vec versions, embedding fingerprint freshness, mixed-fingerprint detection, safe legacy fingerprint adoption, and content-hash sampling.
+
+### Fixes
+
+- Launcher: prefer runnable TypeScript source in git checkouts even when ignored `dist/` artifacts exist, while packaged installs continue to run `dist/`.
+- GPU: keep node-llama-cpp's documented `gpu: "auto"` initialization as the primary path, then perform no-build packaged CUDA/Vulkan/Metal probes only if auto falls back to CPU.
+- CLI: move GPU/CPU runtime diagnostics out of `qmd status`; use `qmd doctor` for device probing and related environment guidance.
+- CLI: point unexpected command/setup failures toward `qmd doctor` so diagnostics are the default next step when QMD behaves incorrectly.
+- Doctor: explicitly warn when `content_vectors` contains multiple non-empty embedding fingerprint names, with the per-fingerprint document/chunk breakdown.
+- Embed: make the TTY progress line label byte-based input progress explicitly, show embedded chunks as a count, and shorten the displayed model name.
+- Embed: retain per-chunk failure details, retry failed chunks after later successful embeds and again when no other chunks remain, clear recovered errors, and cap retries to avoid endless loops.
+- Tests: expand the container smoke harness to cover npm-global, npx-style, and Bun-global install scenarios, always checking auto and `QMD_FORCE_CPU=1` doctor modes, with opt-in tiny `qmd embed` and GPU probe runs for supported container runtimes.
+- Embedding: fingerprint vector metadata using the active embedding model and formatting/chunking parameters so stale vectors are treated as pending after search semantics change. Legacy `content_vectors` columns are migrated lazily on first vector-health/write use to preserve fast QMD startup.
+
+- Skill: expand the packaged QMD skill with retrieval-first workflows, structured query examples, wiki/source collection guidance, and safe fallbacks when model-backed search is unavailable.
+- Tests: make `bun run test` execute the local unit suite under both Node/Vitest and Bun (`test:node` + `test:bun`) so runtime-specific regressions are caught before CI.
+- Model config: centralize embedding/rerank/generation model resolution so `qmd embed`, `status`, `query`, `vsearch`, `pull`, SDK vector search, and `bench` use the same active `.qmd/index.yaml` model hints and environment fallbacks.
+- GPU/status: `qmd status` now uses the same embedding model identity as `qmd embed` when computing pending embeddings, so URI-backed embeddings are not incorrectly reported as pending under the legacy `embeddinggemma` alias.
+- GPU status: `qmd status` now always shows GPU mode/configuration without unsafe native probing, and CPU-fallback warnings point to `QMD_STATUS_DEVICE_PROBE=1 qmd status` for an actual backend probe. The no-GPU warning is emitted once per process instead of once per LLM instance during benchmarks.
+- GPU: add `QMD_FORCE_CPU=1` / `--no-gpu` to bypass CUDA/Vulkan/Metal probing entirely, and route native llama.cpp stdout noise to stderr so JSON output stays parseable during search/query commands.
+- Snippet line numbers: `qmd_query` (MCP), HTTP `/query`, and `qmd query`
+  (CLI JSON output and snippet headers) now return absolute source-file
+  line numbers instead of chunk-local ones, so the `line` field can be
+  passed back to `qmd_get` as `fromLine` without a separate lookup.
+  Snippet selection remains scoped to the best matching chunk
+  (preserves #149).
+- CLI: `qmd query --full` now emits the full document body in all output
+  formats (json, csv, md, xml), restoring the documented behavior of the
+  flag. Previously it returned only the best matching chunk (~3.6KB max
+  per result). Output payload for `--full` queries is now proportional
+  to total document size.
+- macOS Metal: `qmd query --json` now flushes successful JSON output and uses a safe immediate-exit path on Darwin to avoid ggml Metal finalizer aborts; other commands still dispose LLM contexts/models before the llama runtime. #368
+- Embedding: require complete chunk coverage before treating a document as
+  embedded, remove partial vectors when chunk/session failures leave a
+  document incomplete, and keep `qmd status` pending counts honest after
+  interrupted long embed runs. #637 #378
+- Embedding: `qmd embed -c <collection>` now scopes pending-doc selection
+  to the requested collection instead of embedding global pending work.
+  Scoped `--force` clears only collection-owned vectors, preserves shared
+  hashes referenced by sibling collections, and drops `vectors_vec` only
+  when the scoped clear empties all vectors.
+- Hybrid search: weight RRF lists by query type so original FTS and original vector evidence get the intended 2x boost, instead of accidentally boosting the first lexical expansion. #591
+- MCP: seed llama.cpp/GGML quiet env vars before launching `qmd mcp` so native logs cannot pollute stdio JSON-RPC framing. #593
+- CLI: remove CommonJS `require()` calls from ESM index path normalization so `qmd --index <path>` no longer crashes with `ERR_AMBIGUOUS_MODULE_SYNTAX` on Node 22+. #634
+- Windows CUDA: serialize llama.cpp embedding/reranking contexts by default to avoid intermittent `ggml-cuda.cu:98` crashes in `qmd query`; set `QMD_EMBED_PARALLELISM` to opt back into parallel contexts if your driver is stable. #519
+- MCP: make `qmd mcp --index <name>` use the selected index for both foreground and daemon HTTP servers instead of falling back to the default store. #343
+- Embedding: respect `QMD_EMBED_MODEL` consistently for vector indexing and vector-backed search, with default-model fallback when unset.
+- Config: use one home-directory resolver for YAML config and the default SQLite cache path, avoiding Windows CLI/MCP split-brain when `HOME` is unset.
 - GPU: respect explicit `QMD_LLAMA_GPU=metal|vulkan|cuda` backend overrides instead of always using auto GPU selection. #529
 - Fix: preserve original filename case in `handelize()`. The previous
  `.toLowerCase()` call made indexed paths unreachable on case-sensitive
@@ -27,6 +219,18 @@
 - CLI: make `qmd status` skip native `node-llama-cpp` device probing by
  default so status stays safe on machines with broken or unsupported GPU
  drivers. Set `QMD_STATUS_DEVICE_PROBE=1` to opt in.
+- CLI: lazy-load `node-llama-cpp` so lightweight commands such as
+  `qmd status` do not import native ML dependencies or trigger llama.cpp
+  builds on ARM/no-GPU machines. #491
+- Store: keep content rows referenced by inactive documents during orphan
+  cleanup so `qmd update` preserves soft-deleted tombstones for removed
+  files. #585
+- Packaging: install AST grammar WASM packages as required dependencies so
+  Bun global installs include TypeScript/TSX/JavaScript grammars, and add a
+  `smoke:package-grammars` verification command. #595
+- Launcher: add wrapper smoke coverage for scoped package, npm/npx,
+  Homebrew/Linuxbrew, Bun global symlink layouts, and `$BUN_INSTALL`
+  false-positive runtime selection regressions. #351 #353 #354 #356 #358 #359

 ## [2.1.0] - 2026-04-05

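The FTS5 changelog entry above says dotted terms are now split and ANDed together. A minimal sketch of that idea follows. It assumes a hypothetical helper `splitDottedTerm` (QMD's actual sanitizer is not shown in this diff) and assumes each fragment is emitted as a quoted FTS5 string joined with `AND`:

```javascript
// Hypothetical sketch, not QMD's real sanitizer: turn a dotted term such as
// "2026.4.10" into an FTS5 expression that requires every fragment. This
// mirrors how the unicode61 tokenizer stores "2026", "4", and "10" as
// separate tokens.
function quoteFtsString(fragment) {
  // FTS5 string literals escape an embedded double quote by doubling it.
  return `"${fragment.replace(/"/g, '""')}"`;
}

function splitDottedTerm(term) {
  // Drop empty fragments produced by leading, trailing, or repeated dots.
  const parts = term.split(".").filter((part) => part.length > 0);
  return parts.map(quoteFtsString).join(" AND ");
}

console.log(splitDottedTerm("2026.4.10")); // "2026" AND "4" AND "10"
console.log(splitDottedTerm("release"));   // "release"
```

Requiring every fragment keeps the query close to the original intent, whereas stripping the dots produced a single token (`2026410`) that the index never contains.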
--- a/README.md
+++ b/README.md
@@ -135,6 +135,30 @@ LLM models stay loaded in VRAM across requests. Embedding/reranking contexts are

 Point any MCP client at `http://localhost:8181/mcp` to connect.

+#### MCP Tool Parameters
+
+| Tool | Parameter | Type | Notes |
+|------|-----------|------|-------|
+| `query` | `searches` | array | Typed sub-queries (`lex`/`vec`/`hyde`), 1–10. **Required.** First gets 2x weight. |
+| `query` | `collections` | string[] | Filter by collection names (OR). **Array only** — singular `collection` is silently ignored. |
+| `query` | `intent` | string | Disambiguation context (does not search on its own) |
+| `query` | `limit` | number | Max results (default 10) |
+| `query` | `minScore` | number | Minimum relevance 0–1 (default 0) |
+| `query` | `candidateLimit` | number | Max candidates to rerank (default 40) |
+| `query` | `rerank` | boolean | Run LLM reranking (default **true**); set false for RRF-only |
+| `get` | `file` | string | Path, docid (`#abc123`), or `path:from:count` (e.g. `#abc123:120:40`) |
+| `get` | `fromLine` | number | Start line (1-indexed); overrides the `:from` suffix |
+| `get` | `maxLines` | number | Limit returned lines |
+| `get` | `lineNumbers` | boolean | Prefix lines with numbers (default **true**) |
+| `multi_get` | `pattern` | string | Glob pattern or comma-separated list |
+| `multi_get` | `maxBytes` | number | Skip files larger than N (default 10240) |
+| `multi_get` | `maxLines` | number | Limit lines per file |
+| `multi_get` | `lineNumbers` | boolean | Prefix lines with numbers (default **true**) |
+
+Unknown parameters are silently ignored (not rejected) — double-check names if
+results seem unscoped. The HTTP `/query` and `/search` endpoints return
+`qmd://collection/path` URIs in the `file` field, matching the CLI and MCP output.
+
 ### SDK / Library Usage

 Use QMD as a library in your own Node.js or Bun applications.
@@ -575,6 +599,17 @@ qmd collection rename myproject my-project
 # List files in a collection
 qmd ls notes
 qmd ls notes/subfolder
+
+# Show collection details (path, glob mask, include status, context count)
+qmd collection show notes
+
+# Include or exclude a collection from default (unscoped) queries
+qmd collection include notes
+qmd collection exclude notes
+
+# Run a command before every `qmd update` (e.g. git pull); empty arg clears it
+qmd collection update-cmd notes 'git pull --rebase'
+qmd collection update-cmd notes
 ```

 ### Generate Vector Embeddings
@@ -591,6 +626,10 @@ qmd embed --chunk-strategy auto

 # Also works with query for consistent chunk selection
 qmd query "auth flow" --chunk-strategy auto
+
+# Memory control for large corpora / constrained systems
+qmd embed --max-docs-per-batch 50   # cap docs per embedding batch
+qmd embed --max-batch-mb 64         # cap batch size in MB
 ```

 **AST-aware chunking** (`--chunk-strategy auto`) uses tree-sitter to chunk code
@@ -652,6 +691,9 @@ qmd vsearch "how to login"
 qmd query "user authentication"
 ```

+Two aliases exist for the semantic/hybrid modes: `vector-search` (→ `vsearch`)
+and `deep-search` (→ `query`).
+
 ### Options

 ```sh
@@ -664,24 +706,45 @@ qmd query "user authentication"
 --line-numbers     # Add line numbers to output
 --explain          # Include retrieval score traces (query, JSON/CLI output)
 --index <name>     # Use named index
+--intent "<text>"  # Disambiguation context (e.g. "web page load times")
+--no-rerank        # Skip LLM reranking (RRF scores only; faster on CPU)
+-C, --candidate-limit <n>  # Max candidates to rerank (default: 40)
+--full-path        # Emit on-disk filesystem paths instead of qmd:// URIs

 # Output formats (for search and multi-get)
--files            # Output: docid,score,filepath,context
--json             # JSON output with snippets
--csv              # CSV output
--md               # Markdown output
--xml              # XML output
+--format <kind>    # cli (default) | json | csv | md | xml | files
+                   # (--json, --csv, --md, --xml, --files are legacy aliases)

 # Get options
-qmd get <file>[:line]  # Get document, optionally starting at line
-l <num>               # Maximum lines to return
--from <num>           # Start from line number
+qmd get <file>[:from[:count]]  # Get document; optional start line and count
+-l <num>                       # Maximum lines to return
+--from <num>                   # Start line (overrides the :from suffix)
+--no-line-numbers              # Disable line numbering (on by default)

 # Multi-get options
 -l <num>           # Maximum lines per file
 --max-bytes <num>  # Skip files larger than N bytes (default: 10KB)
 ```

+### Collection Filtering
+
+The `-c`/`--collection` flag filters results by collection **name** (as shown by
+`qmd collection list`). Collections are a global registry — you can search any
+collection from any directory:
+
+```sh
+qmd search "auth" -c notes           # single collection
+qmd search "auth" -c notes -c docs   # multiple collections (OR)
+```
+
+With no `-c` flag, all default-included collections are searched. Collections
+marked excluded (`qmd collection exclude <name>`) are skipped unless named
+explicitly with `-c`.
+
+> **Note:** With multiple `-c` flags, results come from a global top-K pool and are
+> then filtered. If one collection dominates the rankings, matches from smaller
+> collections may not appear at the default limit — raise `-n` or use `--all`.
+
 ### Output Format

 Default output is colorized CLI format (respects `NO_COLOR` env).
@@ -759,17 +822,48 @@ qmd query --json --explain "quarterly reports"
 qmd --index work search "quarterly reports"
 ```

+The `--explain` flag attaches a score breakdown to each result: the FTS/vector
+backend scores plus the RRF fusion math (rank, weight, top-rank bonus) and every
+sub-query's contribution. Abbreviated:
+
+```json
+{
+  "docid": "#6c90f0",
+  "score": 0.89,
+  "file": "qmd://qmd/README.md",
+  "explain": {
+    "ftsScores": [0.892, 0.907],
+    "vectorScores": [0.540, 0.484],
+    "rrf": {
+      "rank": 1,
+      "weight": 0.75,
+      "baseScore": 0.123,
+      "topRankBonus": 0.05,
+      "totalScore": 0.173,
+      "contributions": [
+        { "source": "fts", "queryType": "original", "query": "reranking",
+          "rank": 1, "weight": 2, "backendScore": 0.892, "rrfContribution": 0.0328 }
+      ]
+    }
+  }
+}
+```
+
 ### Index Maintenance

 ```sh
 # Show index status and collections with contexts
 qmd status

-# Re-index all collections
+# Re-index all collections. If a collection has a configured update command
+# (e.g. `git pull`), it runs first — set one with `qmd collection update-cmd`.
 qmd update

-# Re-index with git pull first (for remote repos)
-qmd update --pull
+# Diagnose the install (runtime, sqlite-vec, embedding fingerprints, GPU probe)
+qmd doctor
+
+# Initialize a project-local index in the current directory
+qmd init

 # Get document by filepath (with fuzzy matching suggestions)
 qmd get notes/meeting.md
@@ -780,6 +874,13 @@ qmd get "#abc123"
 # Get document starting at line 50, max 100 lines
 qmd get notes/meeting.md:50 -l 100

+# Read 40 lines starting at line 120 via the :from:count suffix (works with docids)
+qmd get notes/meeting.md:120:40
+qmd get "#abc123:120:40"
+
+# get / multi-get are line-numbered by default; disable with --no-line-numbers
+qmd get notes/meeting.md --no-line-numbers
+
 # Get multiple documents by glob pattern
 qmd multi-get "journals/2025-05*.md"

@@ -796,6 +897,75 @@ qmd multi-get "docs/*.md" --json
 qmd cleanup
 ```

+### Benchmarking
+
+Measure search quality across all four backends with `qmd bench` and a fixture file
+of queries with known-relevant documents.
+
+**From a git checkout**, an example fixture and its test corpus ship in the repo:
+
+```sh
+# One-time setup (indexes the repo's test corpus into its own collection)
+qmd collection add test/eval-docs --name eval-docs
+qmd embed -c eval-docs
+
+# Run the benchmark (table output)
+qmd bench src/bench/fixtures/example.json
+
+# JSON output for programmatic analysis
+qmd bench src/bench/fixtures/example.json --json
+```
+
+> The example fixture (`src/bench/fixtures/example.json`) and its test corpus
+> (`test/eval-docs/`) exist only in a git checkout — they are **not** part of the
+> published npm package. If you installed via `npm`/`npx`, write your own fixture
+> (see below) against a collection you have already indexed:
+>
+> ```sh
+> qmd bench my-fixture.json -c my-collection
+> ```
+
+Each query runs against four backends, reporting precision@k, recall, MRR, and F1:
+
+| Backend | What it tests | LLM required |
+|---------|---------------|--------------|
+| `bm25` | Keyword search only (FTS5) | No |
+| `vector` | Semantic similarity only | Embedding model |
+| `hybrid` | BM25 + vector fusion (no reranking) | Embedding model |
+| `full` | Full pipeline with LLM reranking | All three models |
+
+**Score interpretation:** `1.00` = perfect (all expected docs in top results),
+`0.00` = complete miss. The example fixture typically shows bm25 ~0.50, vector
+~0.70, and hybrid/full ~1.00 — a concrete demonstration of why hybrid search beats
+either backend alone.
+
+**Custom fixtures** are JSON:
+
+```json
+{
+  "description": "My benchmark",
+  "version": 1,
+  "collection": "my-collection",
+  "queries": [
+    {
+      "id": "find-auth",
+      "query": "authentication flow",
+      "type": "semantic",
+      "expected_files": ["docs/auth-design.md"],
+      "expected_in_top_k": 3
+    }
+  ]
+}
+```
+
+`expected_files` are collection-relative paths as shown by `qmd ls`. The `type`
+field (`exact`, `semantic`, `topical`, `cross-domain`, `alias`) labels queries for
+grouping — it does not change search behavior.
+
+> **Heads-up:** if the fixture's collection isn't indexed, bench currently runs to
+> completion and reports all zeros with no warning. Verify setup with
+> `qmd ls <collection>` first.
+
 ## Data Storage

 Index stored in: `~/.cache/qmd/index.sqlite`
@@ -817,6 +987,9 @@ llm_cache       -- Cached LLM responses (query expansion, rerank scores)
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `XDG_CACHE_HOME` | `~/.cache` | Cache directory location |
+| `QMD_LLAMA_GPU` | `auto` | Force llama.cpp GPU backend (`metal`, `vulkan`, `cuda`) or disable GPU with `false` |
+| `QMD_FORCE_CPU` | unset | Set to `1`/`true` to force CPU mode before any CUDA/Vulkan/Metal probing. Equivalent CLI flag: `--no-gpu`. |
+| `QMD_EMBED_PARALLELISM` | automatic | Override embedding/reranking context parallelism (1-8). Windows CUDA defaults to `1` because parallel CUDA contexts can crash with `ggml-cuda.cu:98`; use Vulkan or raise this only if your driver is stable. |

 ## How It Works

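The `--explain` sample in the README hunk above lists a contribution with `rank: 1`, `weight: 2`, and `rrfContribution: 0.0328`. A minimal sketch of weighted reciprocal rank fusion that reproduces that figure is shown here. It assumes the conventional constant `k = 60` and 1-based ranks. QMD's actual constants and function names are not shown in this diff, so `RRF_K`, `rrfContribution`, and `fuse` are hypothetical:

```javascript
// Hypothetical sketch of weighted reciprocal rank fusion (RRF).
// Assumption: k = 60 and 1-based ranks; QMD's real constants may differ.
const RRF_K = 60;

function rrfContribution(rank, weight = 1) {
  return weight / (RRF_K + rank);
}

// Fuse several ranked lists. Each list is { weight, docs: [docid, ...] },
// with docs ordered best-first. Returns [docid, score] pairs, best-first.
function fuse(lists) {
  const scores = new Map();
  for (const { weight, docs } of lists) {
    docs.forEach((docid, index) => {
      const previous = scores.get(docid) ?? 0;
      scores.set(docid, previous + rrfContribution(index + 1, weight));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

const fused = fuse([
  { weight: 2, docs: ["#a", "#b"] }, // original query list, boosted 2x
  { weight: 1, docs: ["#b", "#a"] }, // expansion query list
]);

console.log(rrfContribution(1, 2).toFixed(4)); // 0.0328
console.log(fused[0][0]); // #a
```

With the 2x weight on the original query's list, `#a` outranks `#b` even though the expansion list prefers `#b`, which is the effect the hybrid-search changelog entry describes for original FTS and vector evidence.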
--- a/bin/qmd
+++ b/bin/qmd
@@ -31,3 +31,165 @@ elif [ -x "$HOME/.bun/bin/bun" ]; then
 else
  exec node "$DIR/dist/cli/qmd.js" "$@"
 fi
+#!/usr/bin/env node
+// 2>/dev/null; if command -v node >/dev/null 2>&1; then exec node "$0" "$@"; else exec bun "$0" "$@"; fi
+// Cross-platform launcher for qmd.
+//
+// Previously this was a POSIX shell script with `#!/bin/sh`, which meant npm
+// on Windows generated shims that tried to route through `/bin/sh` — a path
+// that doesn't exist on Windows, so `qmd` failed immediately after a global
+// install. Rewriting the launcher in Node.js lets npm generate native
+// cmd/ps1/sh shims that invoke `node` directly on every platform.
+
+import { spawn, spawnSync } from "node:child_process";
+import { existsSync, realpathSync } from "node:fs";
+import { dirname, resolve } from "node:path";
+import { fileURLToPath } from "node:url";
+
+// Resolve symlinks so global installs (npm link / npm install -g) can find
+// the actual package directory instead of the global bin directory.
+const self = realpathSync(fileURLToPath(import.meta.url));
+const pkgDir = resolve(dirname(self), "..");
+const jsEntry = resolve(pkgDir, "dist/cli/qmd.js");
+const tsEntry = resolve(pkgDir, "src/cli/qmd.ts");
+
+// MCP stdio reserves stdout exclusively for JSON-RPC frames. node-llama-cpp
+// / llama.cpp / ggml can write native logs directly to stdout before JS-level
+// log handlers are attached, so seed the native quiet env before Node/Bun imports
+// the CLI and its LLM modules. Preserve explicit user values when provided.
+if (process.argv[2] === "mcp") {
+  process.env.LLAMA_LOG_LEVEL = process.env.LLAMA_LOG_LEVEL || "error";
+  process.env.GGML_LOG_LEVEL = process.env.GGML_LOG_LEVEL || "error";
+  process.env.GGML_BACKEND_SILENT = process.env.GGML_BACKEND_SILENT || "1";
+}
+
+// libggml-metal on macOS uses "residency sets" to keep allocated model memory
+// resident across inference requests (180-second keep_alive timer). The
+// process-static device destructor that runs during libc exit() asserts the
+// residency set is empty (ggml-org/llama.cpp#22593); the keep_alive hasn't
+// expired by exit, so the assertion fails and ggml_abort dumps a multi-kB
+// stack trace to stderr even when the user-visible results were already
+// emitted correctly. No JS-side dispose can prevent it because the static
+// destructor runs in __cxa_finalize_ranges, after every JS-reachable cleanup.
+//
+// For QMD's short-lived CLI workflow, residency sets provide no observable
+// performance benefit (subsequent requests don't reuse the warm mapping —
+// measured: identical wall time with and without on M3 Pro), so disable them
+// by default on darwin. The env var must be set BEFORE the native llama.cpp
+// binding loads, which is why it lives here in the launcher rather than in
+// the JS entry point. Opt back in with QMD_METAL_KEEP_RESIDENCY=1 if you
+// run long-lived qmd processes (the MCP daemon may benefit on hot reload)
+// or are triaging an upstream Metal teardown fix.
+if (process.platform === "darwin" && process.env.QMD_METAL_KEEP_RESIDENCY !== "1") {
+  process.env.GGML_METAL_NO_RESIDENCY = process.env.GGML_METAL_NO_RESIDENCY || "1";
+}
+
+function hasBun() {
+  try {
+    const res = spawnSync("bun", ["--version"], { stdio: "ignore", shell: process.platform === "win32" });
+    return res.status === 0;
+  } catch {
+    return false;
+  }
+}
+
+// In published packages, bin/qmd must run dist/. In a git checkout, however,
+// dist/ is often ignored and can be stale after git reset or branch switches.
+// Prefer source mode only for checkouts so ./bin/qmd reflects the checked-out
+// source without changing packaged/runtime behavior.
+//
+// Critical: source-mode detection must NOT trigger when a package manager
+// installed us. `pnpm install -g .` (and `npm install -g .`) copy the entire
+// working tree — including .git/, bun.lock, package-lock.json, src/, and even
+// node_modules/ — into <prefix>/node_modules/@tobilu/qmd/, so .git and a
+// lockfile being present is not a reliable "this is a working tree" signal.
+// What IS reliable: a package-manager install always lands the package
+// directory inside a `node_modules/` segment; a bare working-tree checkout
+// (with `bun link` or a direct path invocation) does not. Gate source mode
+// on that. Allow QMD_SOURCE_MODE=1 / =0 as an explicit override for the
+// rare case where the heuristic disagrees with the user.
+const sourceOverride = process.env.QMD_SOURCE_MODE;
+const looksInstalled = pkgDir.split(/[\\/]/).includes("node_modules");
+const sourceAllowed = sourceOverride === "1"
+  || (sourceOverride !== "0" && !looksInstalled);
+
+let useSourceMode = false;
+let sourceRunner = null;
+let sourceArgs = [];
+
+if (sourceAllowed && existsSync(resolve(pkgDir, ".git")) && existsSync(tsEntry)) {
+  // Lockfile-driven runner selection — mirror the dist-mode logic below so
+  // source mode picks the same runtime the user's deps were installed for.
+  // package-lock.json wins over bun.lock when both are present: pnpm/npm
+  // installs ship the Node-ABI native modules (better-sqlite3, sqlite-vec),
+  // and running Bun against them produces ABI mismatches. This also fixes
+  // pnpm-global installs, which copy the whole working tree — including .git
+  // and bun.lock — into the install dir and used to route through Bun even
+  // when the user installed via npm/pnpm.
+  const hasNpmLock = existsSync(resolve(pkgDir, "package-lock.json"));
+  const hasBunLock = existsSync(resolve(pkgDir, "bun.lock")) || existsSync(resolve(pkgDir, "bun.lockb"));
+  const tsxEntry = resolve(pkgDir, "node_modules/tsx/dist/cli.mjs");
+  const tsxAvailable = existsSync(tsxEntry);
+
+  if (hasNpmLock && tsxAvailable) {
+    useSourceMode = true;
+    sourceRunner = "node";
+    sourceArgs = [tsxEntry, tsEntry, ...process.argv.slice(2)];
+  } else if (hasBunLock && hasBun()) {
+    useSourceMode = true;
+    sourceRunner = "bun";
+    sourceArgs = [tsEntry, ...process.argv.slice(2)];
+  } else if (tsxAvailable) {
+    useSourceMode = true;
+    sourceRunner = "node";
+    sourceArgs = [tsxEntry, tsEntry, ...process.argv.slice(2)];
+  }
+}
+
+if (!useSourceMode && !existsSync(jsEntry)) {
+  console.error(`qmd is not built: missing ${jsEntry}`);
+  console.error("Run: bun install && bun run build");
+  console.error("Or:  npm install && npm run build");
+  console.error("After building, run: qmd doctor");
+  process.exit(1);
+}
+
+// Detect the package manager that installed dependencies by checking lockfiles.
+// $BUN_INSTALL is intentionally NOT checked — it only indicates that bun exists
+// on the system, not that it was used to install this package (see #361).
+//
+// package-lock.json takes priority: if it exists, npm installed the native
+// modules for Node. The repo ships bun.lock, so without this check, source
+// builds that use npm would be incorrectly routed to bun, causing ABI
+// mismatches with better-sqlite3 / sqlite-vec (see #381).
+let runnerName = "node";
+if (existsSync(resolve(pkgDir, "package-lock.json"))) {
+  runnerName = "node";
+} else if (existsSync(resolve(pkgDir, "bun.lock")) || existsSync(resolve(pkgDir, "bun.lockb"))) {
+  runnerName = "bun";
+} else {
+  runnerName = "node";
+}
+
+const runner = useSourceMode ? sourceRunner : runnerName;
+const args = useSourceMode ? sourceArgs : [jsEntry, ...process.argv.slice(2)];
+const needsShell = (runner === "bun") && process.platform === "win32";
+
+const child = spawn(runner, args, {
+  stdio: "inherit",
+  shell: needsShell,
+});
+
+child.on("exit", (code, signal) => {
+  if (signal) {
+    process.kill(process.pid, signal);
+  } else {
+    process.exit(code ?? 0);
+  }
+});
+
+child.on("error", (err) => {
+  const name = useSourceMode ? sourceRunner : runnerName;
+  console.error(`qmd: failed to launch ${name}: ${err.message}`);
+  process.exit(1);
+});
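The launcher's source-mode gate and lockfile-driven runner choice can be restated as pure functions, which makes the decision table easier to check in isolation. This is a minimal sketch, not code from bin/qmd: the function names `looksInstalled`, `sourceAllowed`, and `pickSourceRunner` and the shape of the options object are hypothetical, and the sketch splits paths on both `/` and `\` as an assumption so that Windows-style paths are covered.

```javascript
// Hypothetical helpers that mirror the launcher's decisions; none of these
// names exist in bin/qmd.

// True when some path segment is exactly "node_modules".
// Assumption: accept both "/" and "\" as separators.
function looksInstalled(pkgDir) {
  return pkgDir.split(/[\\/]/).includes("node_modules");
}

// QMD_SOURCE_MODE=1 forces source mode on, =0 forces it off;
// any other value falls back to the node_modules heuristic.
function sourceAllowed(pkgDir, override) {
  return override === "1" || (override !== "0" && !looksInstalled(pkgDir));
}

// Runner choice for source mode, in the same priority order as the diff:
// package-lock.json + tsx, then a Bun lockfile + bun, then tsx alone.
// Returns null when no runner fits (the launcher then falls back to dist/).
function pickSourceRunner({ hasNpmLock, hasBunLock, tsxAvailable, bunAvailable }) {
  if (hasNpmLock && tsxAvailable) return "node";
  if (hasBunLock && bunAvailable) return "bun";
  if (tsxAvailable) return "node";
  return null;
}
```

Keeping the filesystem probes (`existsSync`, `spawnSync`) outside these functions is a design choice of the sketch only; it lets each branch be exercised with plain booleans.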
--- a/bun.lock
+++ b/bun.lock
@ -6,13 +6,17 @@
      "name": "2025-12-07-bm25-q",
      "dependencies": {
        "@modelcontextprotocol/sdk": "1.29.0",
-        "better-sqlite3": "12.8.0",
+        "better-sqlite3": "12.10.0",
        "fast-glob": "3.3.3",
        "node-llama-cpp": "3.18.1",
        "picomatch": "4.0.4",
        "sqlite-vec": "0.1.9",
-        "web-tree-sitter": "0.26.7",
-        "yaml": "2.8.3",
+        "tree-sitter-go": "0.25.0",
+        "tree-sitter-python": "0.25.0",
+        "tree-sitter-rust": "0.24.0",
+        "tree-sitter-typescript": "0.23.2",
+        "web-tree-sitter": "0.26.8",
+        "yaml": "2.9.0",
        "zod": "4.2.1",
      },
      "devDependencies": {
@ -26,10 +30,6 @@
        "sqlite-vec-linux-arm64": "0.1.9",
        "sqlite-vec-linux-x64": "0.1.9",
        "sqlite-vec-windows-x64": "0.1.9",
-        "tree-sitter-go": "0.23.4",
-        "tree-sitter-python": "0.23.4",
-        "tree-sitter-rust": "0.24.0",
-        "tree-sitter-typescript": "0.23.2",
      },
      "peerDependencies": {
        "typescript": "^5.9.3",
@ -247,7 +247,7 @@

    "base64-js": ["base64-js@1.5.1", "", {}, "sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA=="],

-    "better-sqlite3": ["better-sqlite3@12.8.0", "", { "dependencies": { "bindings": "^1.5.0", "prebuild-install": "^7.1.1" } }, "sha512-RxD2Vd96sQDjQr20kdP+F+dK/1OUNiVOl200vKBZY8u0vTwysfolF6Hq+3ZK2+h8My9YvZhHsF+RSGZW2VYrPQ=="],
+    "better-sqlite3": ["better-sqlite3@12.10.0", "", { "dependencies": { "bindings": "^1.5.0", "prebuild-install": "^7.1.1" } }, "sha512-CyzaZRQKyHkB2ZInfTTl2nvT33EbDpjkLEbE8/Zck3Ll6O0qqvuGdrJ45HgtH+HykRg88ITY3AdreBGN70aBSQ=="],

    "bindings": ["bindings@1.5.0", "", { "dependencies": { "file-uri-to-path": "1.0.0" } }, "sha512-p2q/t/mhvuOj/UeLlV6566GD/guowlr0hHxClI0W9m7MWYkL1F0hLo+0Aexs9HSPCtR1SXQ0TD3MMKrXZajbiQ=="],

@ -401,7 +401,7 @@

    "get-proto": ["get-proto@1.0.1", "", { "dependencies": { "dunder-proto": "^1.0.1", "es-object-atoms": "^1.0.0" } }, "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g=="],

-    "get-tsconfig": ["get-tsconfig@4.13.6", "", { "dependencies": { "resolve-pkg-maps": "^1.0.0" } }, "sha512-shZT/QMiSHc/YBLxxOkMtgSid5HFoauqCE3/exfsEcwg1WkeqjG+V40yBbBrsD+jW2HDXcs28xOfcbm2jI8Ddw=="],
+    "get-tsconfig": ["get-tsconfig@4.14.0", "", { "dependencies": { "resolve-pkg-maps": "^1.0.0" } }, "sha512-yTb+8DXzDREzgvYmh6s9vHsSVCHeC0G3PI5bEXNBHtmshPnO+S5O7qgLEOn0I5QvMy6kpZN8K1NKGyilLb93wA=="],

    "github-from-package": ["github-from-package@0.0.0", "", {}, "sha512-SyHy3T1v2NUXn29OsWdxmK6RwHD+vkj3v8en8AOBZ1wBQ/hCAQ5bAQTD02kW4W9tUp/3Qh6J8r9EvntiyCmOOw=="],

@ -509,7 +509,7 @@

    "node-abi": ["node-abi@3.87.0", "", { "dependencies": { "semver": "^7.3.5" } }, "sha512-+CGM1L1CgmtheLcBuleyYOn7NWPVu0s0EJH2C4puxgEZb9h8QpR9G2dBfZJOAUhi7VQxuBPMd0hiISWcTyiYyQ=="],

-    "node-addon-api": ["node-addon-api@8.5.0", "", {}, "sha512-/bRZty2mXUIFY/xU5HLvveNHlswNJej+RnxBjOMkidWfwZzgTbPG1E3K5TOxRLOR+5hX7bSofy8yf1hZevMS8A=="],
+    "node-addon-api": ["node-addon-api@8.7.0", "", {}, "sha512-9MdFxmkKaOYVTV+XVRG8ArDwwQ77XIgIPyKASB1k3JPq3M8fGQQQE3YpMOrKm6g//Ktx8ivZr8xo1Qmtqub+GA=="],

    "node-api-headers": ["node-api-headers@1.8.0", "", {}, "sha512-jfnmiKWjRAGbdD1yQS28bknFM1tbHC1oucyuMPjmkEs+kpiu76aRs40WlTmBmyEgzDM76ge1DQ7XJ3R5deiVjQ=="],

@ -687,11 +687,11 @@

    "toidentifier": ["toidentifier@1.0.1", "", {}, "sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA=="],

-    "tree-sitter-go": ["tree-sitter-go@0.23.4", "", { "dependencies": { "node-addon-api": "^8.2.1", "node-gyp-build": "^4.8.2" }, "peerDependencies": { "tree-sitter": "^0.21.1" }, "optionalPeers": ["tree-sitter"] }, "sha512-iQaHEs4yMa/hMo/ZCGqLfG61F0miinULU1fFh+GZreCRtKylFLtvn798ocCZjO2r/ungNZgAY1s1hPFyAwkc7w=="],
+    "tree-sitter-go": ["tree-sitter-go@0.25.0", "", { "dependencies": { "node-addon-api": "^8.3.1", "node-gyp-build": "^4.8.4" }, "peerDependencies": { "tree-sitter": "^0.25.0" }, "optionalPeers": ["tree-sitter"] }, "sha512-APBc/Dq3xz/e35Xpkhb1blu5UgW+2E3RyGWawZSCNcbGwa7jhSQPS8KsUupuzBla8PCo8+lz9W/JDJjmfRa2tw=="],

    "tree-sitter-javascript": ["tree-sitter-javascript@0.23.1", "", { "dependencies": { "node-addon-api": "^8.2.2", "node-gyp-build": "^4.8.2" }, "peerDependencies": { "tree-sitter": "^0.21.1" }, "optionalPeers": ["tree-sitter"] }, "sha512-/bnhbrTD9frUYHQTiYnPcxyHORIw157ERBa6dqzaKxvR/x3PC4Yzd+D1pZIMS6zNg2v3a8BZ0oK7jHqsQo9fWA=="],

-    "tree-sitter-python": ["tree-sitter-python@0.23.4", "", { "dependencies": { "node-addon-api": "^8.2.1", "node-gyp-build": "^4.8.2" }, "peerDependencies": { "tree-sitter": "^0.21.1" }, "optionalPeers": ["tree-sitter"] }, "sha512-MbmUAl7y5UCUWqHscHke7DdRDwQnVNMNKQYQc4Gq2p09j+fgPxaU8JVsuOI/0HD3BSEEe5k9j3xmdtIWbDtDgw=="],
+    "tree-sitter-python": ["tree-sitter-python@0.25.0", "", { "dependencies": { "node-addon-api": "^8.5.0", "node-gyp-build": "^4.8.4" }, "peerDependencies": { "tree-sitter": "^0.25.0" }, "optionalPeers": ["tree-sitter"] }, "sha512-eCmJx6zQa35GxaCtQD+wXHOhYqBxEL+bp71W/s3fcDMu06MrtzkVXR437dRrCrbrDbyLuUDJpAgycs7ncngLXw=="],

    "tree-sitter-rust": ["tree-sitter-rust@0.24.0", "", { "dependencies": { "node-addon-api": "^8.2.2", "node-gyp-build": "^4.8.4" }, "peerDependencies": { "tree-sitter": "^0.22.1" }, "optionalPeers": ["tree-sitter"] }, "sha512-NWemUDf629Tfc90Y0Z55zuwPCAHkLxWnMf2RznYu4iBkkrQl2o/CHGB7Cr52TyN5F1DAx8FmUnDtCy9iUkXZEQ=="],

@ -725,7 +725,7 @@

    "vitest": ["vitest@3.2.4", "", { "dependencies": { "@types/chai": "^5.2.2", "@vitest/expect": "3.2.4", "@vitest/mocker": "3.2.4", "@vitest/pretty-format": "^3.2.4", "@vitest/runner": "3.2.4", "@vitest/snapshot": "3.2.4", "@vitest/spy": "3.2.4", "@vitest/utils": "3.2.4", "chai": "^5.2.0", "debug": "^4.4.1", "expect-type": "^1.2.1", "magic-string": "^0.30.17", "pathe": "^2.0.3", "picomatch": "^4.0.2", "std-env": "^3.9.0", "tinybench": "^2.9.0", "tinyexec": "^0.3.2", "tinyglobby": "^0.2.14", "tinypool": "^1.1.1", "tinyrainbow": "^2.0.0", "vite": "^5.0.0 || ^6.0.0 || ^7.0.0-0", "vite-node": "3.2.4", "why-is-node-running": "^2.3.0" }, "peerDependencies": { "@edge-runtime/vm": "*", "@types/debug": "^4.1.12", "@types/node": "^18.0.0 || ^20.0.0 || >=22.0.0", "@vitest/browser": "3.2.4", "@vitest/ui": "3.2.4", "happy-dom": "*", "jsdom": "*" }, "optionalPeers": ["@edge-runtime/vm", "@types/debug", "@types/node", "@vitest/browser", "@vitest/ui", "happy-dom", "jsdom"], "bin": { "vitest": "vitest.mjs" } }, "sha512-LUCP5ev3GURDysTWiP47wRRUpLKMOfPh+yKTx3kVIEiu5KOMeqzpnYNsKyOoVrULivR8tLcks4+lga33Whn90A=="],

-    "web-tree-sitter": ["web-tree-sitter@0.26.7", "", {}, "sha512-KiZhelTvBA/ziUHEO7Emb75cGVAq8iGZNabYaZm53Zpy50NsXyOW+xSHlwHt5CVg/TRPZBfeVLTTobF0LjFJ1w=="],
+    "web-tree-sitter": ["web-tree-sitter@0.26.8", "", {}, "sha512-4sUwi7ZyOrIk5KLgYLkc2A/F0LFMQnBhfb+2Cdl7ik4ePJ6JD+fk4ofI2sA5eGawBKBaK4Vntt7Ww5KcEsay4A=="],

    "which": ["which@6.0.1", "", { "dependencies": { "isexe": "^4.0.0" }, "bin": { "node-which": "bin/which.js" } }, "sha512-oGLe46MIrCRqX7ytPUf66EAYvdeMIZYn3WaocqqKZAxrBpkqHfL/qvTyJ/bTk5+AqHCjXmrv3CEWgy368zhRUg=="],

@ -739,7 +739,7 @@

    "yallist": ["yallist@5.0.0", "", {}, "sha512-YgvUTfwqyc7UXVMrB+SImsVYSmTS8X/tSrtdNZMImM+n7+QTriRXyXim0mBrTXNeqzVF0KWGgHPeiyViFFrNDw=="],

-    "yaml": ["yaml@2.8.3", "", { "bin": { "yaml": "bin.mjs" } }, "sha512-AvbaCLOO2Otw/lW5bmh9d/WEdcDFdQp2Z2ZUH3pX9U2ihyUY0nvLv7J6TrWowklRGPYbB/IuIMfYgxaCPg5Bpg=="],
+    "yaml": ["yaml@2.9.0", "", { "bin": { "yaml": "bin.mjs" } }, "sha512-2AvhNX3mb8zd6Zy7INTtSpl1F15HW6Wnqj0srWlkKLcpYl/gMIMJiyuGq2KeI2YFxUPjdlB+3Lc10seMLtL4cA=="],

    "yargs": ["yargs@17.7.2", "", { "dependencies": { "cliui": "^8.0.1", "escalade": "^3.1.1", "get-caller-file": "^2.0.5", "require-directory": "^2.1.1", "string-width": "^4.2.3", "y18n": "^5.0.5", "yargs-parser": "^21.1.1" } }, "sha512-7dSzzRQ++CKnNI/krKnYRV7JKKPUXMEh61soaHKg9mrWEhzFWhFnxPxGl+69cD1Ou63C13NUPCnmIcrvqCuM6w=="],

@ -773,8 +773,6 @@

    "micromatch/picomatch": ["picomatch@2.3.1", "", {}, "sha512-JU3teHTNjmE2VCGFzuY8EXzCDVwEqB2a8fsIvwaStHhAWJEeVd1o1QD80CU6+ZdEXXSLbSsuLwJjkCBWqRQUVA=="],

-    "node-llama-cpp/node-addon-api": ["node-addon-api@8.7.0", "", {}, "sha512-9MdFxmkKaOYVTV+XVRG8ArDwwQ77XIgIPyKASB1k3JPq3M8fGQQQE3YpMOrKm6g//Ktx8ivZr8xo1Qmtqub+GA=="],
-
    "ora/cli-spinners": ["cli-spinners@3.4.0", "", {}, "sha512-bXfOC4QcT1tKXGorxL3wbJm6XJPDqEnij2gQ2m7ESQuE+/z9YFIWnl/5RpTiKWbMq3EVKR4fRLJGn6DVfu0mpw=="],

    "postcss/nanoid": ["nanoid@3.3.11", "", { "bin": { "nanoid": "bin/nanoid.cjs" } }, "sha512-N8SpfPUnUp1bK+PMYW8qSWdl9U+wwNWI4QKxOYDy9JAro3WMX7p2OeVRF9v+347pnakNevPmiHhNmZ2HbFA76w=="],
@ -793,9 +791,13 @@

    "tinyglobby/picomatch": ["picomatch@4.0.3", "", {}, "sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q=="],

-    "vite/picomatch": ["picomatch@4.0.3", "", {}, "sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q=="],
+    "tree-sitter-javascript/node-addon-api": ["node-addon-api@8.5.0", "", {}, "sha512-/bRZty2mXUIFY/xU5HLvveNHlswNJej+RnxBjOMkidWfwZzgTbPG1E3K5TOxRLOR+5hX7bSofy8yf1hZevMS8A=="],

-    "vitest/picomatch": ["picomatch@4.0.3", "", {}, "sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q=="],
+    "tree-sitter-rust/node-addon-api": ["node-addon-api@8.5.0", "", {}, "sha512-/bRZty2mXUIFY/xU5HLvveNHlswNJej+RnxBjOMkidWfwZzgTbPG1E3K5TOxRLOR+5hX7bSofy8yf1hZevMS8A=="],
+
+    "tree-sitter-typescript/node-addon-api": ["node-addon-api@8.5.0", "", {}, "sha512-/bRZty2mXUIFY/xU5HLvveNHlswNJej+RnxBjOMkidWfwZzgTbPG1E3K5TOxRLOR+5hX7bSofy8yf1hZevMS8A=="],
+
+    "vite/picomatch": ["picomatch@4.0.3", "", {}, "sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q=="],

    "wrap-ansi/ansi-styles": ["ansi-styles@4.3.0", "", { "dependencies": { "color-convert": "^2.0.1" } }, "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg=="],

--- a/docs/SYNTAX.md
+++ b/docs/SYNTAX.md
@ -127,26 +127,41 @@ Without intent, "performance" is ambiguous (web-perf? team health? fitness?). Wi
 - Empty lines are ignored
 - Leading/trailing whitespace is trimmed

-## MCP/HTTP API
+## Scoping

-The `query` tool accepts a query document:
+Restrict queries to specific collections with `-c` (CLI) or `collections` (MCP/SDK):

-```json
-{
-  "q": "lex: CAP theorem\nvec: consistency vs availability",
-  "collections": ["docs"],
-  "limit": 10
-}
+```bash
+# CLI — by collection name (see `qmd collection list`)
+qmd query -c docs "how does auth work"
+qmd query -c docs -c notes $'lex: auth\nvec: authentication flow'
 ```

-Or structured format:
+For MCP / HTTP, pass a plural `collections` array (OR match):
+
+```json
+{ "searches": [ { "type": "lex", "query": "auth" } ], "collections": ["docs", "notes"] }
+```
+
+`-c`/`collections` matches by collection name and works from any directory.
+Multiple values are OR-combined. Without scoping, all default-included collections
+are searched; collections marked excluded (`qmd collection exclude <name>`) are
+skipped unless explicitly named. In MCP the parameter is the plural `collections`
+array — a singular `collection` is silently ignored.
+
+## MCP/HTTP API
+
+The `query` tool (and the REST `/query` endpoint) accepts a structured query with a
+`searches` array. There is no `q` string parameter — `searches` is required:

 ```json
 {
  "searches": [
    { "type": "lex", "query": "CAP theorem" },
    { "type": "vec", "query": "consistency vs availability" }
-  ]
+  ],
+  "collections": ["docs"],
+  "limit": 10
 }
 ```
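
The scoping rules and the required `searches` array described in this docs/SYNTAX.md change can be sketched as two small functions. This is an illustration under stated assumptions, not qmd's implementation: `resolveCollections` and `buildQueryBody` are hypothetical names, the `{ name, excluded }` record shape is invented for the example, and treating an empty `collections` array as "no scoping" is an assumption the docs do not state.

```javascript
// Hypothetical helpers; neither function exists in qmd.

// `all` is a list of { name, excluded } records (assumed shape).
// `requested` is the plural `collections` array from a request, if any.
function resolveCollections(all, requested) {
  if (Array.isArray(requested) && requested.length > 0) {
    // Explicit names are OR-combined and may name excluded collections.
    const wanted = new Set(requested);
    return all.filter((c) => wanted.has(c.name)).map((c) => c.name);
  }
  // No scoping: every collection that is not marked excluded.
  // Assumption: an empty array is treated the same as an absent one.
  return all.filter((c) => !c.excluded).map((c) => c.name);
}

// Builds a request body. `searches` is required; there is no `q` string.
function buildQueryBody(searches, { collections, limit } = {}) {
  if (!Array.isArray(searches)) {
    throw new TypeError("`searches` is required and must be an array");
  }
  const body = { searches };
  if (collections !== undefined) body.collections = collections;
  if (limit !== undefined) body.limit = limit;
  return body;
}
```

A caller might combine them as `buildQueryBody([{ type: "lex", query: "auth" }], { collections: ["docs", "notes"] })`, which yields the same shape as the JSON example in the Scoping section.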

--- a/flake.nix
+++ b/flake.nix
@ -44,8 +44,8 @@
        });

        nodeModulesHashes = {
-          x86_64-linux = "sha256-D0ezO4vqq4iswcAMU2DCql9ZAQvh3me6N9aDB5roq4w=";
-          aarch64-darwin = "sha256-qU+9KdR/nTocelyANS09I/4yaQ+7s1LvJNqB27IOK/c=";
+          x86_64-linux = "sha256-sVXoNWIcx1RYRtRWB4F2j7x8/cabFBKq+plFhPU7tBc=";
+          aarch64-darwin = "sha256-gDyJ5boyH44SeXlKo+W4G36GSUejyXP5PFvW+dFS1Mk=";

          # Populate these on first build for additional hosts if/when needed.
          aarch64-linux = pkgs.lib.fakeHash;
--- a/package.json
+++ b/package.json
@ -1,6 +1,6 @@
 {
  "name": "@tobilu/qmd",
-  "version": "2.1.0",
+  "version": "2.5.3",
  "description": "Query Markup Documents - On-device hybrid search for markdown files with BM25, vector search, and LLM reranking",
  "type": "module",
  "main": "dist/index.js",
@ -17,13 +17,23 @@
  "files": [
    "bin/",
    "dist/",
+    "skills/",
+    "scripts/build.mjs",
+    "scripts/check-package-grammars.mjs",
+    "scripts/package-smoke.mjs",
+    "scripts/test-all.mjs",
    "LICENSE",
    "CHANGELOG.md"
  ],
  "scripts": {
    "prepare": "[ -d .git ] && ./scripts/install-hooks.sh || true",
-    "build": "tsc -p tsconfig.build.json && printf '#!/usr/bin/env node\n' | cat - dist/cli/qmd.js > dist/cli/qmd.tmp && mv dist/cli/qmd.tmp dist/cli/qmd.js && chmod +x dist/cli/qmd.js",
-    "test": "vitest run --reporter=verbose test/",
+    "build": "node scripts/build.mjs",
+    "test": "node scripts/test-all.mjs",
+    "test:types": "node ./node_modules/typescript/bin/tsc -p tsconfig.build.json --noEmit",
+    "test:node": "node ./node_modules/vitest/vitest.mjs run --reporter=verbose --testTimeout 60000",
+    "test:bun": "bun test --timeout 60000 --preload ./src/test-preload.ts",
+    "test:unit": "CI=true node ./node_modules/vitest/vitest.mjs run --reporter=verbose --testTimeout 60000 test/ && CI=true bun test --timeout 60000 --preload ./src/test-preload.ts test/",
+    "test:package": "node scripts/package-smoke.mjs",
    "qmd": "tsx src/cli/qmd.ts",
    "index": "tsx src/cli/qmd.ts index",
    "vector": "tsx src/cli/qmd.ts vector",
@ -31,7 +41,8 @@
    "vsearch": "tsx src/cli/qmd.ts vsearch",
    "rerank": "tsx src/cli/qmd.ts rerank",
    "inspector": "npx @modelcontextprotocol/inspector tsx src/cli/qmd.ts mcp",
-    "release": "./scripts/release.sh"
+    "release": "./scripts/release.sh",
+    "smoke:package-grammars": "node scripts/check-package-grammars.mjs"
  },
  "publishConfig": {
    "access": "public"
@ -46,13 +57,17 @@
  },
  "dependencies": {
    "@modelcontextprotocol/sdk": "1.29.0",
-    "better-sqlite3": "12.8.0",
+    "better-sqlite3": "12.10.0",
    "fast-glob": "3.3.3",
    "node-llama-cpp": "3.18.1",
    "picomatch": "4.0.4",
    "sqlite-vec": "0.1.9",
-    "web-tree-sitter": "0.26.7",
-    "yaml": "2.8.3",
+    "tree-sitter-go": "0.25.0",
+    "tree-sitter-python": "0.25.0",
+    "tree-sitter-rust": "0.24.0",
+    "tree-sitter-typescript": "0.23.2",
+    "web-tree-sitter": "0.26.8",
+    "yaml": "2.9.0",
    "zod": "4.2.1"
  },
  "optionalDependencies": {
@ -60,11 +75,7 @@
    "sqlite-vec-darwin-x64": "0.1.9",
    "sqlite-vec-linux-arm64": "0.1.9",
    "sqlite-vec-linux-x64": "0.1.9",
-    "sqlite-vec-windows-x64": "0.1.9",
-    "tree-sitter-go": "0.23.4",
-    "tree-sitter-python": "0.23.4",
-    "tree-sitter-rust": "0.24.0",
-    "tree-sitter-typescript": "0.23.2"
+    "sqlite-vec-windows-x64": "0.1.9"
  },
  "devDependencies": {
    "@types/better-sqlite3": "7.6.13",
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@ -12,8 +12,8 @@ importers:
        specifier: 1.29.0
        version: 1.29.0(zod@4.2.1)
      better-sqlite3:
-        specifier: 12.8.0
-        version: 12.8.0
+        specifier: 12.10.0
+        version: 12.10.0
      fast-glob:
        specifier: 3.3.3
        version: 3.3.3
@ -26,15 +26,27 @@ importers:
      sqlite-vec:
        specifier: 0.1.9
        version: 0.1.9
+      tree-sitter-go:
+        specifier: 0.25.0
+        version: 0.25.0
+      tree-sitter-python:
+        specifier: 0.25.0
+        version: 0.25.0
+      tree-sitter-rust:
+        specifier: 0.24.0
+        version: 0.24.0
+      tree-sitter-typescript:
+        specifier: 0.23.2
+        version: 0.23.2
      typescript:
        specifier: ^5.9.3
        version: 5.9.3
      web-tree-sitter:
-        specifier: 0.26.7
-        version: 0.26.7
+        specifier: 0.26.8
+        version: 0.26.8
      yaml:
-        specifier: 2.8.3
-        version: 2.8.3
+        specifier: 2.9.0
+        version: 2.9.0
      zod:
        specifier: 4.2.1
        version: 4.2.1
@ -47,7 +59,7 @@ importers:
        version: 4.21.0
      vitest:
        specifier: 3.2.4
-        version: 3.2.4(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.8.3)
+        version: 3.2.4(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.9.0)
    optionalDependencies:
      sqlite-vec-darwin-arm64:
        specifier: 0.1.9
@ -64,18 +76,6 @@ importers:
      sqlite-vec-windows-x64:
        specifier: 0.1.9
        version: 0.1.9
-      tree-sitter-go:
-        specifier: 0.23.4
-        version: 0.23.4
-      tree-sitter-python:
-        specifier: 0.23.4
-        version: 0.23.4
-      tree-sitter-rust:
-        specifier: 0.24.0
-        version: 0.24.0
-      tree-sitter-typescript:
-        specifier: 0.23.2
-        version: 0.23.2

 packages:

@ -273,36 +273,42 @@ packages:
    engines: {node: '>=20.0.0'}
    cpu: [arm64, x64]
    os: [linux]
+    libc: [glibc]

  '@node-llama-cpp/linux-armv7l@3.18.1':
    resolution: {integrity: sha512-BrJL2cGo0pN5xd5nw+CzTn2rFMpz9MJyZZPUY81ptGkF2uIuXT2hdCVh56i9ImQrTwBfq1YcZL/l/Qe/1+HR/Q==}
    engines: {node: '>=20.0.0'}
    cpu: [arm, x64]
    os: [linux]
+    libc: [glibc]

  '@node-llama-cpp/linux-x64-cuda-ext@3.18.1':
    resolution: {integrity: sha512-VqyKhAVHPCpFzh0f1koCBgpThL+04QOXwv0oDQ8s8YcpfMMOXQlBhTB0plgTh0HrPExoObfTS4ohkrbyGgmztQ==}
    engines: {node: '>=20.0.0'}
    cpu: [x64]
    os: [linux]
+    libc: [glibc]

  '@node-llama-cpp/linux-x64-cuda@3.18.1':
    resolution: {integrity: sha512-qOaYP4uwsUoBHQ/7xSOvyJIuXapS57Al+Sudgi00f96ldNZLKe1vuSGptAi5LTM2lIj66PKm6h8PlRWctwsZ2g==}
    engines: {node: '>=20.0.0'}
    cpu: [x64]
    os: [linux]
+    libc: [glibc]

  '@node-llama-cpp/linux-x64-vulkan@3.18.1':
    resolution: {integrity: sha512-SIaNTK5pUPhwJD0gmiQfHa8OrRctVMmnqu+slJrz2Mzgg/XrwFndJlS9hvc+jSjTXCouwf7sYeQaaJWvQgBh/A==}
    engines: {node: '>=20.0.0'}
    cpu: [x64]
    os: [linux]
+    libc: [glibc]

  '@node-llama-cpp/linux-x64@3.18.1':
    resolution: {integrity: sha512-tRmWcsyvAcqJHQHXHsaOkx6muGbcirA9nRdNgH6n7bjGUw4VuoBD3dChyNF3/Ktt7ohB9kz+XhhyZjbDHpXyMA==}
    engines: {node: '>=20.0.0'}
    cpu: [x64]
    os: [linux]
+    libc: [glibc]

  '@node-llama-cpp/mac-arm64-metal@3.18.1':
    resolution: {integrity: sha512-cyZTdsUMlvuRlGmkkoBbN3v/DT6NuruEqoQYd9CqIrPyLa1xLNBTSKIZ9SgRnw23iCOj4URfITvRP+2pu63LuQ==}
@ -375,24 +381,28 @@ packages:
    engines: {node: '>= 10'}
    cpu: [arm64]
    os: [linux]
+    libc: [glibc]

  '@reflink/reflink-linux-arm64-musl@0.1.19':
    resolution: {integrity: sha512-37iO/Dp6m5DDaC2sf3zPtx/hl9FV3Xze4xoYidrxxS9bgP3S8ALroxRK6xBG/1TtfXKTvolvp+IjrUU6ujIGmA==}
    engines: {node: '>= 10'}
    cpu: [arm64]
    os: [linux]
+    libc: [musl]

  '@reflink/reflink-linux-x64-gnu@0.1.19':
    resolution: {integrity: sha512-jbI8jvuYCaA3MVUdu8vLoLAFqC+iNMpiSuLbxlAgg7x3K5bsS8nOpTRnkLF7vISJ+rVR8W+7ThXlXlUQ93ulkw==}
    engines: {node: '>= 10'}
    cpu: [x64]
    os: [linux]
+    libc: [glibc]

  '@reflink/reflink-linux-x64-musl@0.1.19':
    resolution: {integrity: sha512-e9FBWDe+lv7QKAwtKOt6A2W/fyy/aEEfr0g6j/hWzvQcrzHCsz07BNQYlNOjTfeytrtLU7k449H1PI95jA4OjQ==}
    engines: {node: '>= 10'}
    cpu: [x64]
    os: [linux]
+    libc: [musl]

  '@reflink/reflink-win32-arm64-msvc@0.1.19':
    resolution: {integrity: sha512-09PxnVIQcd+UOn4WAW73WU6PXL7DwGS6wPlkMhMg2zlHHG65F3vHepOw06HFCq+N42qkaNAc8AKIabWvtk6cIQ==}
@ -444,66 +454,79 @@ packages:
    resolution: {integrity: sha512-L+34Qqil+v5uC0zEubW7uByo78WOCIrBvci69E7sFASRl0X7b/MB6Cqd1lky/CtcSVTydWa2WZwFuWexjS5o6g==}
    cpu: [arm]
    os: [linux]
+    libc: [glibc]

  '@rollup/rollup-linux-arm-musleabihf@4.60.1':
    resolution: {integrity: sha512-n83O8rt4v34hgFzlkb1ycniJh7IR5RCIqt6mz1VRJD6pmhRi0CXdmfnLu9dIUS6buzh60IvACM842Ffb3xd6Gg==}
    cpu: [arm]
    os: [linux]
+    libc: [musl]

  '@rollup/rollup-linux-arm64-gnu@4.60.1':
    resolution: {integrity: sha512-Nql7sTeAzhTAja3QXeAI48+/+GjBJ+QmAH13snn0AJSNL50JsDqotyudHyMbO2RbJkskbMbFJfIJKWA6R1LCJQ==}
    cpu: [arm64]
    os: [linux]
+    libc: [glibc]

  '@rollup/rollup-linux-arm64-musl@4.60.1':
    resolution: {integrity: sha512-+pUymDhd0ys9GcKZPPWlFiZ67sTWV5UU6zOJat02M1+PiuSGDziyRuI/pPue3hoUwm2uGfxdL+trT6Z9rxnlMA==}
    cpu: [arm64]
    os: [linux]
+    libc: [musl]

  '@rollup/rollup-linux-loong64-gnu@4.60.1':
    resolution: {integrity: sha512-VSvgvQeIcsEvY4bKDHEDWcpW4Yw7BtlKG1GUT4FzBUlEKQK0rWHYBqQt6Fm2taXS+1bXvJT6kICu5ZwqKCnvlQ==}
    cpu: [loong64]
    os: [linux]
+    libc: [glibc]

  '@rollup/rollup-linux-loong64-musl@4.60.1':
    resolution: {integrity: sha512-4LqhUomJqwe641gsPp6xLfhqWMbQV04KtPp7/dIp0nzPxAkNY1AbwL5W0MQpcalLYk07vaW9Kp1PBhdpZYYcEw==}
    cpu: [loong64]
    os: [linux]
+    libc: [musl]

  '@rollup/rollup-linux-ppc64-gnu@4.60.1':
    resolution: {integrity: sha512-tLQQ9aPvkBxOc/EUT6j3pyeMD6Hb8QF2BTBnCQWP/uu1lhc9AIrIjKnLYMEroIz/JvtGYgI9dF3AxHZNaEH0rw==}
    cpu: [ppc64]
    os: [linux]
+    libc: [glibc]

  '@rollup/rollup-linux-ppc64-musl@4.60.1':
    resolution: {integrity: sha512-RMxFhJwc9fSXP6PqmAz4cbv3kAyvD1etJFjTx4ONqFP9DkTkXsAMU4v3Vyc5BgzC+anz7nS/9tp4obsKfqkDHg==}
    cpu: [ppc64]
    os: [linux]
+    libc: [musl]

  '@rollup/rollup-linux-riscv64-gnu@4.60.1':
    resolution: {integrity: sha512-QKgFl+Yc1eEk6MmOBfRHYF6lTxiiiV3/z/BRrbSiW2I7AFTXoBFvdMEyglohPj//2mZS4hDOqeB0H1ACh3sBbg==}
    cpu: [riscv64]
    os: [linux]
+    libc: [glibc]

  '@rollup/rollup-linux-riscv64-musl@4.60.1':
    resolution: {integrity: sha512-RAjXjP/8c6ZtzatZcA1RaQr6O1TRhzC+adn8YZDnChliZHviqIjmvFwHcxi4JKPSDAt6Uhf/7vqcBzQJy0PDJg==}
    cpu: [riscv64]
    os: [linux]
+    libc: [musl]

  '@rollup/rollup-linux-s390x-gnu@4.60.1':
    resolution: {integrity: sha512-wcuocpaOlaL1COBYiA89O6yfjlp3RwKDeTIA0hM7OpmhR1Bjo9j31G1uQVpDlTvwxGn2nQs65fBFL5UFd76FcQ==}
    cpu: [s390x]
    os: [linux]
+    libc: [glibc]

  '@rollup/rollup-linux-x64-gnu@4.60.1':
    resolution: {integrity: sha512-77PpsFQUCOiZR9+LQEFg9GClyfkNXj1MP6wRnzYs0EeWbPcHs02AXu4xuUbM1zhwn3wqaizle3AEYg5aeoohhg==}
    cpu: [x64]
    os: [linux]
+    libc: [glibc]

  '@rollup/rollup-linux-x64-musl@4.60.1':
    resolution: {integrity: sha512-5cIATbk5vynAjqqmyBjlciMJl1+R/CwX9oLk/EyiFXDWd95KpHdrOJT//rnUl4cUcskrd0jCCw3wpZnhIHdD9w==}
    cpu: [x64]
    os: [linux]
+    libc: [musl]

  '@rollup/rollup-openbsd-x64@4.60.1':
    resolution: {integrity: sha512-cl0w09WsCi17mcmWqqglez9Gk8isgeWvoUZ3WiJFYSR3zjBQc2J5/ihSjpl+VLjPqjQ/1hJRcqBfLjssREQILw==}
@ -628,9 +651,9 @@ packages:
  base64-js@1.5.1:
    resolution: {integrity: sha512-AKpaYlHn8t4SVbOHCy+b5+KKgvR4vrsD8vbvrbiQJps7fKDTkjkDry6ji0rUJjC0kzbNePLwzxq8iypo41qeWA==}

-  better-sqlite3@12.8.0:
-    resolution: {integrity: sha512-RxD2Vd96sQDjQr20kdP+F+dK/1OUNiVOl200vKBZY8u0vTwysfolF6Hq+3ZK2+h8My9YvZhHsF+RSGZW2VYrPQ==}
-    engines: {node: 20.x || 22.x || 23.x || 24.x || 25.x}
+  better-sqlite3@12.10.0:
+    resolution: {integrity: sha512-CyzaZRQKyHkB2ZInfTTl2nvT33EbDpjkLEbE8/Zck3Ll6O0qqvuGdrJ45HgtH+HykRg88ITY3AdreBGN70aBSQ==}
+    engines: {node: 20.x || 22.x || 23.x || 24.x || 25.x || 26.x}

  bindings@1.5.0:
    resolution: {integrity: sha512-p2q/t/mhvuOj/UeLlV6566GD/guowlr0hHxClI0W9m7MWYkL1F0hLo+0Aexs9HSPCtR1SXQ0TD3MMKrXZajbiQ==}
@ -943,8 +966,8 @@ packages:
    resolution: {integrity: sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==}
    engines: {node: '>= 0.4'}

-  get-tsconfig@4.13.7:
-    resolution: {integrity: sha512-7tN6rFgBlMgpBML5j8typ92BKFi2sFQvIdpAqLA2beia5avZDrMs0FLZiM5etShWq5irVyGcGMEA1jcDaK7A/Q==}
+  get-tsconfig@4.14.0:
+    resolution: {integrity: sha512-yTb+8DXzDREzgvYmh6s9vHsSVCHeC0G3PI5bEXNBHtmshPnO+S5O7qgLEOn0I5QvMy6kpZN8K1NKGyilLb93wA==}

  github-from-package@0.0.0:
    resolution: {integrity: sha512-SyHy3T1v2NUXn29OsWdxmK6RwHD+vkj3v8en8AOBZ1wBQ/hCAQ5bAQTD02kW4W9tUp/3Qh6J8r9EvntiyCmOOw==}
@ -1536,10 +1559,10 @@ packages:
    resolution: {integrity: sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA==}
    engines: {node: '>=0.6'}

-  tree-sitter-go@0.23.4:
-    resolution: {integrity: sha512-iQaHEs4yMa/hMo/ZCGqLfG61F0miinULU1fFh+GZreCRtKylFLtvn798ocCZjO2r/ungNZgAY1s1hPFyAwkc7w==}
+  tree-sitter-go@0.25.0:
+    resolution: {integrity: sha512-APBc/Dq3xz/e35Xpkhb1blu5UgW+2E3RyGWawZSCNcbGwa7jhSQPS8KsUupuzBla8PCo8+lz9W/JDJjmfRa2tw==}
    peerDependencies:
-      tree-sitter: ^0.21.1
+      tree-sitter: ^0.25.0
    peerDependenciesMeta:
      tree-sitter:
        optional: true
@ -1552,10 +1575,10 @@ packages:
      tree-sitter:
        optional: true

-  tree-sitter-python@0.23.4:
-    resolution: {integrity: sha512-MbmUAl7y5UCUWqHscHke7DdRDwQnVNMNKQYQc4Gq2p09j+fgPxaU8JVsuOI/0HD3BSEEe5k9j3xmdtIWbDtDgw==}
+  tree-sitter-python@0.25.0:
+    resolution: {integrity: sha512-eCmJx6zQa35GxaCtQD+wXHOhYqBxEL+bp71W/s3fcDMu06MrtzkVXR437dRrCrbrDbyLuUDJpAgycs7ncngLXw==}
    peerDependencies:
-      tree-sitter: ^0.21.1
+      tree-sitter: ^0.25.0
    peerDependenciesMeta:
      tree-sitter:
        optional: true
@ -1691,8 +1714,8 @@ packages:
      jsdom:
        optional: true

-  web-tree-sitter@0.26.7:
-    resolution: {integrity: sha512-KiZhelTvBA/ziUHEO7Emb75cGVAq8iGZNabYaZm53Zpy50NsXyOW+xSHlwHt5CVg/TRPZBfeVLTTobF0LjFJ1w==}
+  web-tree-sitter@0.26.8:
+    resolution: {integrity: sha512-4sUwi7ZyOrIk5KLgYLkc2A/F0LFMQnBhfb+2Cdl7ik4ePJ6JD+fk4ofI2sA5eGawBKBaK4Vntt7Ww5KcEsay4A==}

  which@2.0.2:
    resolution: {integrity: sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==}
@ -1724,8 +1747,8 @@ packages:
    resolution: {integrity: sha512-YgvUTfwqyc7UXVMrB+SImsVYSmTS8X/tSrtdNZMImM+n7+QTriRXyXim0mBrTXNeqzVF0KWGgHPeiyViFFrNDw==}
    engines: {node: '>=18'}

-  yaml@2.8.3:
-    resolution: {integrity: sha512-AvbaCLOO2Otw/lW5bmh9d/WEdcDFdQp2Z2ZUH3pX9U2ihyUY0nvLv7J6TrWowklRGPYbB/IuIMfYgxaCPg5Bpg==}
+  yaml@2.9.0:
+    resolution: {integrity: sha512-2AvhNX3mb8zd6Zy7INTtSpl1F15HW6Wnqj0srWlkKLcpYl/gMIMJiyuGq2KeI2YFxUPjdlB+3Lc10seMLtL4cA==}
    engines: {node: '>= 14.6'}
    hasBin: true

@ -2060,13 +2083,13 @@ snapshots:
      chai: 5.3.3
      tinyrainbow: 2.0.0

-  '@vitest/mocker@3.2.4(vite@7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.8.3))':
+  '@vitest/mocker@3.2.4(vite@7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.9.0))':
    dependencies:
      '@vitest/spy': 3.2.4
      estree-walker: 3.0.3
      magic-string: 0.30.21
    optionalDependencies:
-      vite: 7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.8.3)
+      vite: 7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.9.0)

  '@vitest/pretty-format@3.2.4':
    dependencies:
@ -2130,7 +2153,7 @@ snapshots:

  base64-js@1.5.1: {}

-  better-sqlite3@12.8.0:
+  better-sqlite3@12.10.0:
    dependencies:
      bindings: 1.5.0
      prebuild-install: 7.1.3
@ -2474,7 +2497,7 @@ snapshots:
      dunder-proto: 1.0.1
      es-object-atoms: 1.1.1

-  get-tsconfig@4.13.7:
+  get-tsconfig@4.14.0:
    dependencies:
      resolve-pkg-maps: 1.0.0

@ -2654,8 +2677,7 @@ snapshots:

  node-api-headers@1.8.0: {}

-  node-gyp-build@4.8.4:
-    optional: true
+  node-gyp-build@4.8.4: {}

  node-llama-cpp@3.18.1(typescript@5.9.3):
    dependencies:
@ -3113,41 +3135,36 @@ snapshots:

  toidentifier@1.0.1: {}

-  tree-sitter-go@0.23.4:
+  tree-sitter-go@0.25.0:
    dependencies:
      node-addon-api: 8.7.0
      node-gyp-build: 4.8.4
-    optional: true

  tree-sitter-javascript@0.23.1:
    dependencies:
      node-addon-api: 8.7.0
      node-gyp-build: 4.8.4
-    optional: true

-  tree-sitter-python@0.23.4:
+  tree-sitter-python@0.25.0:
    dependencies:
      node-addon-api: 8.7.0
      node-gyp-build: 4.8.4
-    optional: true

  tree-sitter-rust@0.24.0:
    dependencies:
      node-addon-api: 8.7.0
      node-gyp-build: 4.8.4
-    optional: true

  tree-sitter-typescript@0.23.2:
    dependencies:
      node-addon-api: 8.7.0
      node-gyp-build: 4.8.4
      tree-sitter-javascript: 0.23.1
-    optional: true

  tsx@4.21.0:
    dependencies:
      esbuild: 0.27.7
-      get-tsconfig: 4.13.7
+      get-tsconfig: 4.14.0
    optionalDependencies:
      fsevents: 2.3.3

@ -3177,13 +3194,13 @@ snapshots:

  vary@1.1.2: {}

-  vite-node@3.2.4(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.8.3):
+  vite-node@3.2.4(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.9.0):
    dependencies:
      cac: 6.7.14
      debug: 4.4.3
      es-module-lexer: 1.7.0
      pathe: 2.0.3
-      vite: 7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.8.3)
+      vite: 7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.9.0)
    transitivePeerDependencies:
      - '@types/node'
      - jiti
@ -3198,7 +3215,7 @@ snapshots:
      - tsx
      - yaml

-  vite@7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.8.3):
+  vite@7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.9.0):
    dependencies:
      esbuild: 0.27.7
      fdir: 6.5.0(picomatch@4.0.4)
@ -3210,13 +3227,13 @@ snapshots:
      '@types/node': 25.5.2
      fsevents: 2.3.3
      tsx: 4.21.0
-      yaml: 2.8.3
+      yaml: 2.9.0

-  vitest@3.2.4(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.8.3):
+  vitest@3.2.4(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.9.0):
    dependencies:
      '@types/chai': 5.2.3
      '@vitest/expect': 3.2.4
-      '@vitest/mocker': 3.2.4(vite@7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.8.3))
+      '@vitest/mocker': 3.2.4(vite@7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.9.0))
      '@vitest/pretty-format': 3.2.4
      '@vitest/runner': 3.2.4
      '@vitest/snapshot': 3.2.4
@ -3234,8 +3251,8 @@ snapshots:
      tinyglobby: 0.2.15
      tinypool: 1.1.1
      tinyrainbow: 2.0.0
-      vite: 7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.8.3)
-      vite-node: 3.2.4(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.8.3)
+      vite: 7.3.2(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.9.0)
+      vite-node: 3.2.4(@types/node@25.5.2)(tsx@4.21.0)(yaml@2.9.0)
      why-is-node-running: 2.3.0
    optionalDependencies:
      '@types/node': 25.5.2
@ -3253,7 +3270,7 @@ snapshots:
      - tsx
      - yaml

-  web-tree-sitter@0.26.7: {}
+  web-tree-sitter@0.26.8: {}

  which@2.0.2:
    dependencies:
@ -3280,7 +3297,7 @@ snapshots:

  yallist@5.0.0: {}

-  yaml@2.8.3: {}
+  yaml@2.9.0: {}

  yargs-parser@21.1.1: {}

--- a/scripts/build.mjs
+++ b/scripts/build.mjs
@ -0,0 +1,29 @@
+#!/usr/bin/env node
+import { spawnSync } from "node:child_process";
+import { chmodSync, readFileSync, renameSync, writeFileSync } from "node:fs";
+import { join } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const root = join(fileURLToPath(new URL("..", import.meta.url)));
+
+function run(command, args, options = {}) {
+  const result = spawnSync(command, args, {
+    cwd: root,
+    stdio: "inherit",
+    shell: process.platform === "win32",
+    ...options,
+  });
+  if (result.status !== 0) {
+    process.exit(result.status ?? 1);
+  }
+}
+
+run(process.execPath, [join(root, "node_modules", "typescript", "bin", "tsc"), "-p", "tsconfig.build.json"]);
+
+const cliPath = join(root, "dist", "cli", "qmd.js");
+const tmpPath = `${cliPath}.tmp`;
+const built = readFileSync(cliPath, "utf8");
+const withoutExistingShebang = built.startsWith("#!") ? built.slice(built.indexOf("\n") + 1) : built;
+writeFileSync(tmpPath, `#!/usr/bin/env node\n${withoutExistingShebang}`);
+renameSync(tmpPath, cliPath);
+chmodSync(cliPath, 0o755);
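
Review note on `scripts/build.mjs`: the shebang rewrite can be checked in isolation. The sketch below copies only the string logic from the script; `withNodeShebang` is a hypothetical name used for illustration and is not part of this change.

```javascript
// Hypothetical helper (not in the PR): same string logic as build.mjs, minus file I/O.
function withNodeShebang(source) {
  // Drop an existing first-line shebang, if any, so repeated builds do not stack them.
  const body = source.startsWith("#!") ? source.slice(source.indexOf("\n") + 1) : source;
  return `#!/usr/bin/env node\n${body}`;
}

const once = withNodeShebang('console.log("hi");\n');
const twice = withNodeShebang(once);

// Running the rewrite twice should give the same text as running it once.
console.log(once === twice);
```

One edge worth knowing: if the file were a lone shebang line with no trailing newline, `indexOf("\n")` returns -1 and `slice(0)` keeps the old shebang. Compiled `tsc` output should not hit this case.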
--- a/scripts/check-package-grammars.mjs
+++ b/scripts/check-package-grammars.mjs
@ -0,0 +1,29 @@
+#!/usr/bin/env node
+import { createRequire } from "node:module";
+
+const require = createRequire(import.meta.url);
+
+const grammars = [
+  "tree-sitter-typescript/tree-sitter-typescript.wasm",
+  "tree-sitter-typescript/tree-sitter-tsx.wasm",
+  "tree-sitter-python/tree-sitter-python.wasm",
+  "tree-sitter-go/tree-sitter-go.wasm",
+  "tree-sitter-rust/tree-sitter-rust.wasm",
+];
+
+let ok = true;
+for (const grammar of grammars) {
+  try {
+    const resolved = require.resolve(grammar);
+    console.log(`ok ${grammar} -> ${resolved}`);
+  } catch (err) {
+    ok = false;
+    console.error(`missing ${grammar}`);
+    console.error(err instanceof Error ? err.message : String(err));
+  }
+}
+
+if (!ok) {
+  console.error("\nAST grammar package smoke check failed. Run `bun install` locally or repair a broken global install with the matching `bun add tree-sitter-...@<version>` command shown by `qmd status`.");
+  process.exit(1);
+}
--- a/scripts/package-smoke.mjs
+++ b/scripts/package-smoke.mjs
@ -0,0 +1,65 @@
+#!/usr/bin/env node
+import { spawnSync } from "node:child_process";
+import { existsSync, readFileSync, statSync } from "node:fs";
+import { join } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const root = fileURLToPath(new URL("..", import.meta.url));
+const pkg = JSON.parse(readFileSync(join(root, "package.json"), "utf8"));
+
+function run(label, command, args, options = {}) {
+  console.log(`==> ${label}`);
+  const { quiet, ...spawnOptions } = options;
+  const result = spawnSync(command, args, {
+    cwd: root,
+    stdio: quiet ? "pipe" : "inherit",
+    shell: process.platform === "win32",
+    ...spawnOptions,
+  });
+  if (result.status !== 0) {
+    console.error(`Package smoke failed: ${label}`);
+    if (quiet) {
+      if (result.stdout) process.stderr.write(result.stdout);
+      if (result.stderr) process.stderr.write(result.stderr);
+    }
+    process.exit(result.status ?? 1);
+  }
+}
+
+function assertPath(path, label = path) {
+  const full = join(root, path);
+  if (!existsSync(full)) {
+    console.error(`Package smoke failed: missing ${label} (${path})`);
+    process.exit(1);
+  }
+  return full;
+}
+
+run("build compiled package", process.execPath, ["scripts/build.mjs"]);
+run("AST grammar runtime packages", process.execPath, ["scripts/check-package-grammars.mjs"]);
+
+for (const entry of pkg.files ?? []) {
+  assertPath(entry.replace(/\/$/, ""), `package.json files[] entry ${entry}`);
+}
+
+for (const [name, binPath] of Object.entries(pkg.bin ?? {})) {
+  const full = assertPath(binPath, `bin ${name}`);
+  const mode = statSync(full).mode;
+  if ((mode & 0o111) === 0) {
+    console.error(`Package smoke failed: bin ${name} is not executable (${binPath})`);
+    process.exit(1);
+  }
+}
+
+assertPath("dist/index.js", "compiled main export");
+assertPath("dist/index.d.ts", "compiled type export");
+assertPath("dist/cli/qmd.js", "compiled CLI");
+
+run("compiled CLI under Node", process.execPath, ["dist/cli/qmd.js", "--help"], { quiet: true });
+run("package wrapper", "sh", ["bin/qmd", "--help"], { quiet: true });
+
+if (process.env.QMD_SKIP_BUN_SMOKE === "1") {
+  console.log("==> compiled CLI under Bun (skipped by QMD_SKIP_BUN_SMOKE=1)");
+} else {
+  run("compiled CLI under Bun", "bun", ["dist/cli/qmd.js", "--help"], { quiet: true });
+}
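
Review note on `scripts/package-smoke.mjs`: the bin check masks the `stat` mode with `0o111`, which covers the owner, group, and other execute bits. A minimal sketch of that test, using a hypothetical `isExecutable` helper that is not part of this change:

```javascript
// Hypothetical helper (not in the PR): true when any execute bit is set.
// 0o111 = owner (0o100) | group (0o010) | other (0o001).
function isExecutable(mode) {
  return (mode & 0o111) !== 0;
}

console.log(isExecutable(0o100755)); // regular file, rwxr-xr-x
console.log(isExecutable(0o100644)); // regular file, rw-r--r--
```

The check passes when any one of the three bits is set, so it is a loose sanity check and not a guarantee that the current user can run the file.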
--- a/scripts/release.sh
+++ b/scripts/release.sh
@ -93,7 +93,7 @@ echo ""

 # --- Rename [Unreleased] -> [X.Y.Z] - date, add fresh [Unreleased] ---

-sed -i '' "s/^## \[Unreleased\].*/## [$NEW] - $DATE/" CHANGELOG.md
+perl -0pi -e 's/^## \[Unreleased\].*/## ['"$NEW"'] - '"$DATE"'/m' CHANGELOG.md

 # Insert a new empty [Unreleased] section after the header
 awk '
--- a/scripts/repro-metal-rsets-crash.mjs
+++ b/scripts/repro-metal-rsets-crash.mjs
@ -0,0 +1,118 @@
+#!/usr/bin/env node
+/**
+ * Minimal reproduction of llama.cpp issue ggml-org/llama.cpp#22593:
+ *
+ *   ggml-metal-device.m:612: GGML_ASSERT([rsets->data count] == 0) failed
+ *
+ * Root cause (per the upstream issue and proposed fix PR #22595):
+ *   `ggml_metal_buffer_rset_free` releases the per-buffer residency set object
+ *   but does NOT call the symmetric `ggml_metal_device_rsets_rm`. So the
+ *   device's `rsets->data` array accumulates dangling references. When the
+ *   process exits and libc fires the process-static `ggml_metal_device`
+ *   destructor in `__cxa_finalize_ranges`, the destructor asserts the
+ *   array is empty — and it isn't.
+ *
+ * Observed downstream behavior:
+ *   - With EXPLICIT `dispose()` of every JS handle in order, the assertion
+ *     does NOT fire. node-llama-cpp's dispose path tears the Metal buffers
+ *     down before the static dtor runs, so the device's rsets array is
+ *     empty by exit time. (Tested locally — clean exit.)
+ *   - With NO dispose (the typical real-world case: synchronous `exit()`,
+ *     `--watch` mode, `process.exit()` after results are written, or any
+ *     code path where GC + finalizers race with libc exit), the rset
+ *     references linger until the static dtor fires, and the assertion
+ *     trips.
+ *
+ * What this script does:
+ *   1. Load node-llama-cpp + a small GGUF model on the Metal backend.
+ *      This allocates at least one Metal buffer → calls rsets_add internally.
+ *   2. Run an inference (creating an embedding context populates buffers
+ *      that the dispose path would normally clean up).
+ *   3. Skip explicit dispose. Just let the process exit.
+ *
+ * Expected behavior on macOS 15+ with Apple Silicon, current llama.cpp
+ * (bundled in node-llama-cpp 3.18.1, llama.cpp tag b8390):
+ *   - Without GGML_METAL_NO_RESIDENCY:
+ *       Script writes "ok" and main() returns, then ggml_abort fires the
+ *       assertion, prints a multi-kB backtrace, and the process exits with
+ *       SIGABRT (exit code 134).
+ *   - With GGML_METAL_NO_RESIDENCY=1:
+ *       Clean exit code 0. Residency-set code path is skipped entirely.
+ *   - With --dispose flag (manual cleanup):
+ *       Clean exit code 0 even without the env var, as long as JS dispose()
+ *       runs successfully before libc exit.
+ *
+ * Usage:
+ *   # Reproduce the crash (no dispose, no env var)
+ *   node scripts/repro-metal-rsets-crash.mjs
+ *
+ *   # Verify the documented workaround
+ *   GGML_METAL_NO_RESIDENCY=1 node scripts/repro-metal-rsets-crash.mjs
+ *
+ *   # Verify that explicit dispose also avoids the crash
+ *   node scripts/repro-metal-rsets-crash.mjs --dispose
+ *
+ * Refs:
+ *   https://github.com/ggml-org/llama.cpp/issues/22593  (root-cause analysis)
+ *   https://github.com/ggml-org/llama.cpp/pull/22595    (one-line fix, open)
+ *   https://github.com/tobi/qmd/issues/368              (downstream report)
+ *   https://github.com/tobi/qmd/issues/674              (downstream, current)
+ *   https://github.com/tobi/qmd/pull/600                (downstream workaround PR)
+ */
+
+import { existsSync } from "node:fs";
+import { homedir } from "node:os";
+import { resolve } from "node:path";
+
+const DEFAULT_MODEL = resolve(
+  homedir(),
+  ".cache/qmd/models/hf_ggml-org_embeddinggemma-300M-Q8_0.gguf",
+);
+
+const args = process.argv.slice(2);
+const wantsDispose = args.includes("--dispose");
+const modelPath = args.find((a) => !a.startsWith("--")) ?? DEFAULT_MODEL;
+
+if (!existsSync(modelPath)) {
+  console.error(`Model not found: ${modelPath}`);
+  console.error("Pass a path to any local GGUF as the first argument, or run `qmd embed` once to populate the default cache path.");
+  process.exit(2);
+}
+
+console.error(
+  `[repro] GGML_METAL_NO_RESIDENCY=${process.env.GGML_METAL_NO_RESIDENCY ?? "(unset)"}`,
+);
+console.error(`[repro] dispose=${wantsDispose}`);
+console.error(`[repro] loading: ${modelPath}`);
+
+const { getLlama } = await import("node-llama-cpp");
+
+const llama = await getLlama();
+const model = await llama.loadModel({ modelPath });
+const context = await model.createEmbeddingContext();
+
+console.error(`[repro] backend: ${llama.gpu}`);
+
+// Run actual inference so the buffer-allocation path is hit.
+await context.getEmbeddingFor("repro text");
+
+if (wantsDispose) {
+  console.error("[repro] explicit dispose…");
+  await context.dispose();
+  await model.dispose();
+  await llama.dispose();
+}
+
+console.error("[repro] main() returning via process.exit(0)");
+console.log("ok");
+
+// CRITICAL: use process.exit(), not `return`. node-llama-cpp registers a
+// `process.once('beforeExit', …)` hook that auto-disposes WeakRef'd Llama
+// instances when the event loop empties naturally. `process.exit()` skips
+// `beforeExit`, so the rsets stay populated until libc's `exit()` fires the
+// static dtor — which is when the upstream assertion bug trips.
+//
+// CLI tools (qmd query, qmd vsearch, qmd embed, etc.) all call process.exit()
+// after writing results, which is why every real downstream report crashes
+// even though the minimal "let main return" version does not.
+process.exit(0);
--- a/scripts/test-all.mjs
+++ b/scripts/test-all.mjs
@ -0,0 +1,38 @@
+#!/usr/bin/env node
+import { spawnSync } from "node:child_process";
+import { join } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const root = fileURLToPath(new URL("..", import.meta.url));
+
+// Mirror bin/qmd's darwin Metal residency mitigation for test subprocesses.
+// libggml-metal asserts on a non-empty residency set during its static
+// destructor (ggml-org/llama.cpp#22593, fix open as #22595) and dumps a
+// multi-kB backtrace at process exit even when tests pass. The env var must
+// be set BEFORE the subprocess starts because libggml-metal reads it via
+// libc getenv at module-load time. Opt out with QMD_METAL_KEEP_RESIDENCY=1.
+const darwinMetalEnv =
+  process.platform === "darwin" && process.env.QMD_METAL_KEEP_RESIDENCY !== "1"
+    ? { GGML_METAL_NO_RESIDENCY: "1" }
+    : {};
+
+function run(label, command, args, options = {}) {
+  console.log(`==> ${label}`);
+  const { env: extraEnv, ...spawnOptions } = options;
+  const result = spawnSync(command, args, {
+    cwd: root,
+    stdio: "inherit",
+    shell: process.platform === "win32",
+    env: { ...process.env, ...darwinMetalEnv, ...(extraEnv ?? {}) },
+    ...spawnOptions,
+  });
+  if (result.status !== 0) {
+    console.error(`Test task failed: ${label}`);
+    process.exit(result.status ?? 1);
+  }
+}
+
+run("TypeScript build typecheck", process.execPath, [join(root, "node_modules", "typescript", "bin", "tsc"), "-p", "tsconfig.build.json", "--noEmit"]);
+run("Vitest suite under Node", process.execPath, [join(root, "node_modules", "vitest", "vitest.mjs"), "run", "--reporter=verbose", "--testTimeout", "60000", "test/"], { env: { CI: "true" } });
+run("Bun test suite", "bun", ["test", "--timeout", "60000", "--preload", "./src/test-preload.ts", "test/"], { env: { CI: "true" } });
+run("Package smoke", process.execPath, ["scripts/package-smoke.mjs"]);
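
Review note on `scripts/test-all.mjs`: the subprocess environment is built from three object spreads, so later sources win. The sketch below restates that precedence as a pure function; `buildEnv` and its parameters are hypothetical names for illustration only.

```javascript
// Hypothetical helper (not in the PR): mirrors the env merge in test-all.mjs.
function buildEnv(baseEnv, platform, extraEnv = {}) {
  // On darwin, disable Metal residency sets unless the caller opted out.
  const darwinMetalEnv =
    platform === "darwin" && baseEnv.QMD_METAL_KEEP_RESIDENCY !== "1"
      ? { GGML_METAL_NO_RESIDENCY: "1" }
      : {};
  // Spread order sets precedence: base < darwin mitigation < per-task extras.
  return { ...baseEnv, ...darwinMetalEnv, ...extraEnv };
}

console.log(buildEnv({}, "darwin", { CI: "true" }));
console.log(buildEnv({ QMD_METAL_KEEP_RESIDENCY: "1" }, "darwin"));
console.log(buildEnv({}, "linux"));
```

Because the per-task `env` is spread last, a task could still override `GGML_METAL_NO_RESIDENCY` explicitly if it ever needed to.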
--- a/skills/qmd/SKILL.md
+++ b/skills/qmd/SKILL.md
@ -1,144 +1,295 @@
 ---
 name: qmd
-description: Search markdown knowledge bases, notes, and documentation using QMD. Use when users ask to search notes, find documents, or look up information.
+description: Search local markdown knowledge bases, notes, docs, and wikis with QMD. Use when users ask to find notes, retrieve documents, inspect a wiki, answer from indexed markdown, or set up QMD access.
 license: MIT
 compatibility: Requires qmd CLI or MCP server. Install via `npm install -g @tobilu/qmd`.
 metadata:
  author: tobi
-  version: "2.0.0"
+  version: "2.2.0"
 allowed-tools: Bash(qmd:*), mcp__qmd__*
 ---

-# QMD - Quick Markdown Search
+# QMD - Query Markdown Documents

-Local search engine for markdown content.
+## How search works

-## Status
+QMD searches local markdown collections: notes, docs, wikis, transcripts, and
+project knowledge bases. Use it before web search when the answer may already be
+in indexed local files.

-!`qmd status 2>/dev/null || echo "Not installed: npm install -g @tobilu/qmd"`
+The workflow is always:

-## MCP: `query`
+1. Search for candidate documents.
+2. Retrieve the full source with `qmd get` or `qmd multi-get`.
+3. Answer from retrieved text, citing paths or docids.
+
+Do not answer from snippets alone when the user needs facts, decisions, quotes,
+or nuance. Snippets are only leads.
+
+Typical loop:
+
+```bash
+qmd search "merchant reality support interviews" -n 5
+# leads: #abc123 concepts/customer-proximity.md; #def432 sources/merchant-call.md
+qmd multi-get "#abc123,#def432" --format md
+```
+
+**Default to structured `qmd query` with `intent:`, `lex:`, `vec:`, and `hyde:`
+fields that you write yourself.** You are a better query expander than the
+built-in model: you know the user's actual goal, the domain vocabulary, and the
+nearby-but-wrong concepts to avoid. Do not just paste the user's words into
+`qmd query "..."` and hope the expansion model guesses right — supply the
+`intent:` and craft the lexical and semantic terms deliberately (see
+[Pick the right search mode](#pick-the-right-search-mode)).
+
+When reporting what you retrieved, a compact note is enough; do not paste whole
+files unless needed:
+
+```text
+Retrieved:
+- #abc123 concepts/customer-proximity.md
+- #def432 sources/merchant-call.md
+```
+
+## Pick the right search mode
+
+Use **BM25 lexical search** when you know exact words, titles, names, code
+symbols, or rare phrases:
+
+```bash
+qmd search "cockpit OKR Goodhart" -n 10
+qmd search '"AI Before Headcount"' -c concepts -n 5
+```
+
+Use **`qmd query` with structured fields** when the user describes an idea
+indirectly, uses different wording than the source, or needs conceptual recall.
+**This is the default mode — write the fields yourself rather than leaning on
+query expansion.** Combine exact anchors with semantic recall:
+
+```bash
+qmd query $'intent: Find the concept note about metrics as instruments without letting OKRs replace judgment.\nlex: cockpit instruments OKR Goodhart metrics judgment\nvec: data informed not metric driven product judgment\nhyde: A concept note says metrics are useful like cockpit instruments, but leaders should remain data-informed rather than metric-driven because OKRs and dashboards can Goodhart product judgment.'
+```
+
+Structured query fields (you author each one — do not delegate this to the
+expansion model):
+
+- `intent:` states what you are trying to find **and what to avoid**. Always
+  supply this. It steers ranking away from nearby-but-wrong concepts.
+- `lex:` exact terms, aliases, titles, code symbols, and rare words you expect
+  in the source. This is your own keyword expansion.
+- `vec:` paraphrases the idea in natural language, in source-like wording.
+- `hyde:` describes the document or answer that would satisfy the request.
+
+You do not need all four every time, but you should almost always write at least
+`intent:` plus one of `lex:`/`vec:`. A bare `qmd query "the user's sentence"`
+throws away the context only you have and relies on the built-in expander to
+reconstruct it — prefer the structured form.
+
+If you genuinely have nothing to expand (a single rare token, a verbatim phrase),
+that is a job for `qmd search`, not bare `qmd query`. To inspect how a structured query ranks:
+
+```bash
+qmd query --format json --explain $'intent: ...\nlex: ...\nvec: ...'  # inspect ranking
+```
+
+If `qmd query` is slow or model/GPU setup fails, fall back to `qmd search` with
+better lexical terms.
+
+## Retrieve sources
+
+Search results include docids like `#abc123` and `qmd://...` paths. Fetch them:
+
+```bash
+qmd get "#abc123"
+qmd get qmd://concepts/ai-before-headcount.md
+qmd multi-get "#abc123,#def432" --format md
+qmd multi-get 'concepts/{ai-before-headcount.md,data-informed-not-metric-driven.md}' --format md
+qmd multi-get 'sources/podcast-2025-*.md' -l 80
+```
+
+Use `multi-get` when comparing several hits or gathering context across pages.
+
+### Output is line-numbered and carries the docid — cite both
+
+`get` and `multi-get` are **line-numbered by default** and always print the
+document's `#docid` and `qmd://` path. So `get` output looks like:
+
+```text
+qmd://concepts/note.md  #abc123
+---
+
+1: # Metrics as instruments
+2:
+3: Treat dashboards like cockpit instruments...
+```
+
+Cite the docid and exact line numbers in your answer, and use the numbers to ask
+for the next slice. Pass `--no-line-numbers` only when you need raw content to
+copy verbatim (e.g. reproducing a code block).
+
+When you need to open or edit the underlying file (e.g. hand a path to `Read`,
+`Edit`, or an editor), add `--full-path`. It replaces the `qmd://` URL + docid
+header with the document's on-disk path, falling back to the canonical header if
+the file no longer exists on disk:
+
+```text
+$ qmd get "#abc123" --full-path
+/Users/you/notes/concepts/note.md
+---
+
+1: # Metrics as instruments
+```
+
+`--full-path` works the same way on `qmd search` and `qmd query`: result paths
+become the file's on-disk path — `./`-prefixed relative path when the file is
+inside `$PWD`, absolute realpath otherwise — and the per-result `#docid` is
+dropped because the path is the identifier. The leading `./` is intentional so
+the output is unambiguously a filesystem path and cannot be mistaken for a bare
+collection-relative string. Default search/query output still uses `qmd://`
+URIs; only opt into `--full-path` when you specifically need a path you can hand
+to a non-QMD tool.
+
+### Read line ranges with the `:from:count` suffix — never pipe through `sed`/`head`/`tail`
+
+`qmd get` slices files itself. Use the suffix or flags; do **not** shell out to
+`sed -n`, `head`, `tail`, or `awk` to pull a line range. Piping defeats docid
+resolution, virtual-path lookups, line numbering, and the header, and it is
+slower and more error-prone.
+
+The most compact form is a `:from:count` suffix right on the path or docid —
+prefer it:
+
+```bash
+qmd get "#abc123:120:40"                  # 40 lines starting at line 120
+qmd get qmd://concepts/note.md:200:60     # lines 200–259
+qmd get "#abc123:120"                      # from line 120 to end of file
+qmd get "#abc123" --from 120 -l 40         # equivalent, using flags
+```
+
+Suffix and flags:
+
+- `<path>:<from>:<count>` — start at line `<from>`, read `<count>` lines. **Best
+  for reading around a search hit.**
+- `<path>:<from>` — start at `<from>`, read to end of file.
+- `--from <line>` / `-l <lines>` — flag equivalents. Explicit flags override the
+  suffix, so `... :5:2 -l 1` reads 1 line.
+- `--no-line-numbers` — drop the `N:` prefixes (line numbers are on by default).
+
+Wrong: `qmd get "#abc123" | sed -n '120,159p'`
+Right: `qmd get "#abc123:120:40"`
+
+Search results include a `:line` anchor on each hit — feed it straight into
+`qmd get path:line:<n>` to read a window around the match (line numbers in the
+output will start at `line`).
+
+## Discover what is indexed
+
+```bash
+qmd collection list
+qmd ls
+qmd status
+```
+
+Add collection filters when broad searches drift into the wrong corpus:
+
+```bash
+qmd search "headcount autonomous agents" -c concepts -n 10
+qmd query "merchant support product reality" -c concepts -c sources -n 10
+```
+
+Omit `-c` to search everything.
+
+## MCP Tool: `query`
+
+When using the MCP server, prefer structured searches:

 ```json
 {
  "searches": [
-    { "type": "lex", "query": "CAP theorem consistency" },
-    { "type": "vec", "query": "tradeoff between consistency and availability" }
+    { "type": "lex", "query": "cockpit OKR Goodhart" },
+    { "type": "vec", "query": "data informed not metric driven product judgment" },
+    { "type": "hyde", "query": "A concept note explains that metrics are useful as instruments, but leaders should not let OKRs or dashboards replace judgment." }
  ],
-  "collections": ["docs"],
+  "intent": "Find the concept note about using metrics as instruments without becoming metric-driven.",
+  "collections": ["concepts"],
  "limit": 10
 }
 ```

-### Query Types
+Query types:

-| Type | Method | Input |
-|------|--------|-------|
-| `lex` | BM25 | Keywords — exact terms, names, code |
-| `vec` | Vector | Question — natural language |
-| `hyde` | Vector | Answer — hypothetical result (50-100 words) |
+- `lex` — BM25 keyword search. Best for exact terms, names, titles, and code.
+- `vec` — vector semantic search. Best for natural-language concepts.
+- `hyde` — vector search using a hypothetical answer/document passage.

-### Writing Good Queries
+## Query craft

-**lex (keyword)**
- 2-5 terms, no filler words
- Exact phrase: `"connection pool"` (quoted)
- Exclude terms: `performance -sports` (minus prefix)
- Code identifiers work: `handleError async`
+Good QMD searches mix three things:

-**vec (semantic)**
- Full natural language question
- Be specific: `"how does the rate limiter handle burst traffic"`
- Include context: `"in the payment service, how are refunds processed"`
+1. **Title/alias anchors:** exact page titles, named entities, phrases.
+2. **Semantic paraphrase:** how a human would describe the idea.
+3. **Negative space:** enough intent to avoid nearby-but-wrong concepts.

-**hyde (hypothetical document)**
- Write 50-100 words of what the *answer* looks like
- Use the vocabulary you expect in the result
-
-**expand (auto-expand)**
- Use a single-line query (implicit) or `expand: question` on its own line
- Lets the local LLM generate lex/vec/hyde variations
- Do not mix `expand:` with other typed lines — it's either a standalone expand query or a full query document
-
-### Intent (Disambiguation)
-
-When a query term is ambiguous, add `intent` to steer results:
-
-```json
-{
-  "searches": [
-    { "type": "lex", "query": "performance" }
-  ],
-  "intent": "web page load times and Core Web Vitals"
-}
-```
-
-Intent affects expansion, reranking, chunk selection, and snippet extraction. It does not search on its own — it's a steering signal that disambiguates queries like "performance" (web-perf vs team health vs fitness).
-
-### Combining Types
-
-| Goal | Approach |
-|------|----------|
-| Know exact terms | `lex` only |
-| Don't know vocabulary | Use a single-line query (implicit `expand:`) or `vec` |
-| Best recall | `lex` + `vec` |
-| Complex topic | `lex` + `vec` + `hyde` |
-| Ambiguous query | Add `intent` to any combination above |
-
-First query gets 2x weight in fusion — put your best guess first.
-
-### Lex Query Syntax
-
-| Syntax | Meaning | Example |
-|--------|---------|---------|
-| `term` | Prefix match | `perf` matches "performance" |
-| `"phrase"` | Exact phrase | `"rate limiter"` |
-| `-term` | Exclude | `performance -sports` |
-
-Note: `-term` only works in lex queries, not vec/hyde.
-
-### Collection Filtering
-
-```json
-{ "collections": ["docs"] }              // Single
-{ "collections": ["docs", "notes"] }     // Multiple (OR)
-```
-
-Omit to search all collections.
-
-## Other MCP Tools
-
-| Tool | Use |
-|------|-----|
-| `get` | Retrieve doc by path or `#docid` |
-| `multi_get` | Retrieve multiple by glob/list |
-| `status` | Collections and health |
-
-## CLI
+Examples:

 ```bash
-qmd query "question"              # Auto-expand + rerank
-qmd query $'lex: X\nvec: Y'       # Structured
-qmd query $'expand: question'     # Explicit expand
-qmd query --json --explain "q"    # Show score traces (RRF + rerank blend)
-qmd search "keywords"             # BM25 only (no LLM)
-qmd get "#abc123"                 # By docid
-qmd multi-get "journals/2026-*.md" -l 40  # Batch pull snippets by glob
-qmd multi-get notes/foo.md,notes/bar.md   # Comma-separated list, preserves order
+# Exact-ish title lookup
+qmd search '"arm the rebels" merchants tools big companies' -c concepts
+
+# Semantic concept lookup
+qmd query $'intent: Find the customer proximity concept, not generic customer delight.\nlex: support pseudonymous merchant customer interviews\nvec: founder stays close to merchant reality through support and product use'
+
+# Source lookup
+qmd search "six-week cadence WhatsApp merchant relationships Shawn Ryan" -c sources -n 10
 ```

-## HTTP API
+## Setup and maintenance

-```bash
-curl -X POST http://localhost:8181/query \
-  -H "Content-Type: application/json" \
-  -d '{"searches": [{"type": "lex", "query": "test"}]}'
-```
-
-## Setup
+Only mutate indexes when the user asked for setup or maintenance. Searching and
+retrieving are safe; collection/index mutation is not a casual first step.

 ```bash
 npm install -g @tobilu/qmd
 qmd collection add ~/notes --name notes
+qmd update
 qmd embed
 ```
+
+Health and diagnostics:
+
+```bash
+qmd doctor
+qmd status
+qmd pull
+```
+
+`qmd doctor` checks config, model cache, device/GPU setup, vector fingerprints,
+and common environment overrides. If a model-backed command fails, run
+`qmd doctor` before changing configuration.
+
+## MCP setup
+
+See `references/mcp-setup.md` for Claude Code, Claude Desktop, OpenClaw, and HTTP
+server configuration.
+
+## Pitfalls
+
+- **Do not stop at snippets.** Fetch documents before making claims.
+- **Do not slice files with `sed`/`head`/`tail`.** Use the `path:from:count`
+  suffix (e.g. `qmd get "#abc123:120:40"`) or `--from`/`-l`. Output is already
+  line-numbered; piping breaks docid resolution, the header, and virtual paths.
+- **Do not lean on query expansion.** Write `intent:`/`lex:`/`vec:`/`hyde:`
+  yourself. A bare `qmd query "user sentence"` discards the context only you
+  have. You expand the query; the model just ranks.
+- **Do not overuse semantic search.** If you know exact titles or terms, BM25 is
+  faster and often better.
+- **Do not mutate indexes casually.** `qmd collection add`, `qmd update`, and
+  `qmd embed` change local state and can be expensive.
+- **Model-backed commands can be environment-sensitive.** If `qmd query`,
+  `qmd vsearch`, or reranking fails because local models/GPU are unavailable,
+  use `qmd search` and stronger lexical/structured terms.
+- **Ambiguous user wording needs intent.** Add `intent:` rather than hoping query
+  expansion guesses the right domain.
+- **Collection names matter.** Search `concepts` for synthesized wiki pages,
+  `sources` for transcripts/raw source pages, and docs collections for code or
+  project documentation.
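
Review note on `skills/qmd/SKILL.md`: the `:from:count` suffix documented above can be illustrated with a small parser. This is a sketch under stated assumptions, not QMD's actual implementation: `parseLineSuffix` and `lastLine` are hypothetical names, and the real CLI may treat edge cases (for example file names that end in `:<digits>`) differently.

```javascript
// Hypothetical parser (not QMD's code): splits "<ref>:<from>[:<count>]".
// The lazy prefix skips the colon in "qmd://" because it is not followed by a digit.
function parseLineSuffix(spec) {
  const m = /^(.*?):(\d+)(?::(\d+))?$/.exec(spec);
  if (!m) return { ref: spec, from: null, count: null };
  return {
    ref: m[1],
    from: Number(m[2]),
    count: m[3] === undefined ? null : Number(m[3]),
  };
}

// Last line covered by a window, or null when reading to end of file.
function lastLine({ from, count }) {
  return from === null || count === null ? null : from + count - 1;
}

const window = parseLineSuffix("qmd://concepts/note.md:200:60");
console.log(window, lastLine(window));
console.log(parseLineSuffix("#abc123:120"));
console.log(parseLineSuffix("#abc123"));
```

Under this reading, `:200:60` covers lines 200 through 259, which matches the comment in the SKILL.md example.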
--- a/src/ast.ts
+++ b/src/ast.ts
@ -63,15 +63,22 @@ export function detectLanguage(filepath: string): SupportedLanguage | null {
 /**
 * Maps language to the npm package and wasm filename for the grammar.
 */
-const GRAMMAR_MAP: Record<SupportedLanguage, { pkg: string; wasm: string }> = {
-  typescript: { pkg: "tree-sitter-typescript", wasm: "tree-sitter-typescript.wasm" },
-  tsx:        { pkg: "tree-sitter-typescript", wasm: "tree-sitter-tsx.wasm" },
-  javascript: { pkg: "tree-sitter-typescript", wasm: "tree-sitter-typescript.wasm" },
-  python:     { pkg: "tree-sitter-python",     wasm: "tree-sitter-python.wasm" },
-  go:         { pkg: "tree-sitter-go",         wasm: "tree-sitter-go.wasm" },
-  rust:       { pkg: "tree-sitter-rust",        wasm: "tree-sitter-rust.wasm" },
+const GRAMMAR_MAP: Record<SupportedLanguage, { pkg: string; wasm: string; version: string }> = {
+  typescript: { pkg: "tree-sitter-typescript", wasm: "tree-sitter-typescript.wasm", version: "0.23.2" },
+  tsx:        { pkg: "tree-sitter-typescript", wasm: "tree-sitter-tsx.wasm",        version: "0.23.2" },
+  javascript: { pkg: "tree-sitter-typescript", wasm: "tree-sitter-typescript.wasm", version: "0.23.2" },
+  python:     { pkg: "tree-sitter-python",     wasm: "tree-sitter-python.wasm",     version: "0.23.4" },
+  go:         { pkg: "tree-sitter-go",         wasm: "tree-sitter-go.wasm",         version: "0.23.4" },
+  rust:       { pkg: "tree-sitter-rust",       wasm: "tree-sitter-rust.wasm",       version: "0.24.0" },
 };

+export function formatGrammarLoadError(language: SupportedLanguage, err: unknown): string {
+  const grammar = GRAMMAR_MAP[language];
+  const detail = err instanceof Error ? err.message : String(err);
+  return `${grammar.pkg}/${grammar.wasm} failed to load (${detail}); falling back to regex chunking. ` +
+    `Repair a broken global install with: bun add ${grammar.pkg}@${grammar.version}`;
+}
+
 // =============================================================================
 // Per-Language Query Definitions
 // =============================================================================
@ -176,6 +183,9 @@ let initPromise: Promise<void> | null = null;
 /** Languages that have already failed to load — warn only once per process. */
 const failedLanguages = new Set<string>();

+/** Last grammar load error by language, for status output. */
+const grammarLoadErrors = new Map<SupportedLanguage, string>();
+
 /** Cached grammar load promises. */
 const grammarCache = new Map<string, Promise<LanguageType>>();

@ -228,7 +238,9 @@ async function loadGrammar(language: SupportedLanguage): Promise<LanguageType |
  } catch (err) {
    failedLanguages.add(language);
    grammarCache.delete(wasmKey);
-    console.warn(`[qmd] Failed to load tree-sitter grammar for ${language}: ${err}`);
+    const message = formatGrammarLoadError(language, err);
+    grammarLoadErrors.set(language, message);
+    console.warn(`[qmd] AST grammar unavailable for ${language}: ${message}`);
    return null;
  }
 }
@ -345,7 +357,7 @@ export async function getASTStatus(): Promise<{
        getQuery(lang, grammar);
        languages.push({ language: lang, available: true });
      } else {
-        languages.push({ language: lang, available: false, error: "grammar failed to load" });
+        languages.push({ language: lang, available: false, error: grammarLoadErrors.get(lang) ?? "grammar failed to load" });
      }
    } catch (err) {
      languages.push({
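
Note on the grammar-status change above: the hunks record one formatted message per failed language and surface it from `getASTStatus()` in place of the generic string. A minimal sketch of that flow follows, assuming a two-entry grammar map. `recordFailure` and `statusError` are hypothetical names used only for illustration; they are not functions in the diff.

```typescript
// Sketch only: a reduced copy of the map and formatter from the diff.
type SupportedLanguage = "python" | "rust";
type GrammarInfo = { pkg: string; wasm: string; version: string };

const GRAMMAR_MAP: Record<SupportedLanguage, GrammarInfo> = {
  python: { pkg: "tree-sitter-python", wasm: "tree-sitter-python.wasm", version: "0.23.4" },
  rust: { pkg: "tree-sitter-rust", wasm: "tree-sitter-rust.wasm", version: "0.24.0" },
};

function formatGrammarLoadError(language: SupportedLanguage, err: unknown): string {
  const grammar = GRAMMAR_MAP[language];
  // `err` is unknown, so narrow before reading `.message`.
  const detail = err instanceof Error ? err.message : String(err);
  return `${grammar.pkg}/${grammar.wasm} failed to load (${detail}); falling back to regex chunking. ` +
    `Repair a broken global install with: bun add ${grammar.pkg}@${grammar.version}`;
}

// Last load error per language, kept for status output.
const grammarLoadErrors = new Map<SupportedLanguage, string>();

// Hypothetical helper: what the catch block in loadGrammar() does with the error.
function recordFailure(language: SupportedLanguage, err: unknown): string {
  const message = formatGrammarLoadError(language, err);
  grammarLoadErrors.set(language, message);
  return message;
}

// Hypothetical helper: what the status path reports for an unavailable grammar.
function statusError(language: SupportedLanguage): string {
  return grammarLoadErrors.get(language) ?? "grammar failed to load";
}

recordFailure("rust", new Error("ENOENT"));
const rustStatus = statusError("rust");
const pythonStatus = statusError("python");
```

Languages that never recorded an error still fall back to the generic "grammar failed to load" string.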
--- a/src/bench-rerank.ts
+++ b/src/bench-rerank.ts
@ -260,16 +260,18 @@ async function main() {
      const r = await benchmarkConfig(model, llama, docs, p, true);
      results.push(r);
      process.stdout.write(` ${r.medianMs.toFixed(0)}ms (${r.docsPerSec.toFixed(1)} docs/s)\n`);
-    } catch (e: any) {
-      process.stdout.write(` failed: ${e.message}\n`);
+    } catch (e: unknown) {
+      const message = e instanceof Error ? e.message : String(e);
+      process.stdout.write(` failed: ${message}\n`);
      // Try without flash
      process.stdout.write(`  [${p} ctx, no flash] running...`);
      try {
        const r = await benchmarkConfig(model, llama, docs, p, false);
        results.push(r);
        process.stdout.write(` ${r.medianMs.toFixed(0)}ms (${r.docsPerSec.toFixed(1)} docs/s)\n`);
-      } catch (e2: any) {
-        process.stdout.write(` failed: ${e2.message}\n`);
+      } catch (e2: unknown) {
+        const message = e2 instanceof Error ? e2.message : String(e2);
+        process.stdout.write(` failed: ${message}\n`);
      }
    }
  }
--- a/src/bench/bench.ts
+++ b/src/bench/bench.ts
@ -22,6 +22,7 @@ import {
  type QMDStore,
  type SearchResult,
  type HybridQueryResult,
+  type ExpandedQuery,
 } from "../index.js";
 import { scoreResults } from "./score.js";
 import type {
@ -34,35 +35,130 @@ import type {

 type Backend = {
  name: string;
-  run: (store: QMDStore, query: string, limit: number, collection?: string) => Promise<string[]>;
+  run: (store: QMDStore, query: BenchmarkQuery, limit: number, collection?: string) => Promise<string[]>;
 };

+type ParsedStructuredQuery = {
+  searches: ExpandedQuery[];
+  intent?: string;
+};
+
+function parseStructuredQuery(query: string): ParsedStructuredQuery | undefined {
+  const lines = query.split("\n").map((line, idx) => ({
+    trimmed: line.trim(),
+    number: idx + 1,
+  })).filter(line => line.trimmed.length > 0);
+
+  if (lines.length === 0) return undefined;
+
+  const prefixRe = /^(lex|vec|hyde):\s*/i;
+  const intentRe = /^intent:\s*/i;
+  const searches: ExpandedQuery[] = [];
+  let intent: string | undefined;
+
+  for (const line of lines) {
+    if (intentRe.test(line.trimmed)) {
+      if (intent !== undefined) {
+        throw new Error(`Line ${line.number}: only one intent: line is allowed per benchmark query.`);
+      }
+      intent = line.trimmed.replace(intentRe, "").trim();
+      if (!intent) {
+        throw new Error(`Line ${line.number}: intent: must include text.`);
+      }
+      continue;
+    }
+
+    const match = line.trimmed.match(prefixRe);
+    if (match) {
+      const type = match[1]!.toLowerCase() as "lex" | "vec" | "hyde";
+      const text = line.trimmed.slice(match[0].length).trim();
+      if (!text) {
+        throw new Error(`Line ${line.number} (${type}:) must include text.`);
+      }
+      searches.push({ type, query: text, line: line.number });
+      continue;
+    }
+
+    if (lines.length === 1) {
+      return undefined;
+    }
+
+    throw new Error(`Line ${line.number} is missing a lex:/vec:/hyde:/intent: prefix.`);
+  }
+
+  if (intent && searches.length === 0) {
+    throw new Error("intent: cannot appear alone. Add at least one lex:, vec:, or hyde: line.");
+  }
+
+  return searches.length > 0 ? { searches, intent } : undefined;
+}
+
+function uniqueFiles(files: string[], limit: number): string[] {
+  const seen = new Set<string>();
+  const out: string[] = [];
+  for (const file of files) {
+    if (seen.has(file)) continue;
+    seen.add(file);
+    out.push(file);
+    if (out.length >= limit) break;
+  }
+  return out;
+}
+
 const BACKENDS: Backend[] = [
  {
    name: "bm25",
    run: async (store, query, limit, collection) => {
-      const results = await store.searchLex(query, { limit, collection });
+      const structured = parseStructuredQuery(query.query);
+      const lexQueries = structured?.searches.filter(q => q.type === "lex");
+      if (structured) {
+        const files: string[] = [];
+        for (const lex of lexQueries ?? []) {
+          const results = await store.searchLex(lex.query, { limit, collection });
+          files.push(...results.map((r: SearchResult) => r.filepath));
+        }
+        return uniqueFiles(files, limit);
+      }
+
+      const results = await store.searchLex(query.query, { limit, collection });
      return results.map((r: SearchResult) => r.filepath);
    },
  },
  {
    name: "vector",
    run: async (store, query, limit, collection) => {
-      const results = await store.searchVector(query, { limit, collection });
+      const structured = parseStructuredQuery(query.query);
+      const vectorQueries = structured?.searches.filter(q => q.type === "vec" || q.type === "hyde");
+      if (structured) {
+        const files: string[] = [];
+        for (const vectorQuery of vectorQueries ?? []) {
+          const results = await store.searchVector(vectorQuery.query, { limit, collection });
+          files.push(...results.map((r: SearchResult) => r.filepath));
+        }
+        return uniqueFiles(files, limit);
+      }
+
+      const results = await store.searchVector(query.query, { limit, collection });
      return results.map((r: SearchResult) => r.filepath);
    },
  },
  {
    name: "hybrid",
    run: async (store, query, limit, collection) => {
-      const results = await store.search({ query, limit, collection, rerank: false });
+      const structured = parseStructuredQuery(query.query);
+      const results = structured
+        ? await store.search({ queries: structured.searches, intent: structured.intent, limit, collection, rerank: false })
+        : await store.search({ query: query.query, limit, collection, rerank: false });
      return results.map((r: HybridQueryResult) => r.file);
    },
  },
  {
    name: "full",
    run: async (store, query, limit, collection) => {
-      const results = await store.search({ query, limit, collection, rerank: true });
+      const structured = parseStructuredQuery(query.query);
+      const results = structured
+        ? await store.search({ queries: structured.searches, intent: structured.intent, limit, collection, rerank: true })
+        : await store.search({ query: query.query, limit, collection, rerank: true });
      return results.map((r: HybridQueryResult) => r.file);
    },
  },
@ -79,18 +175,23 @@ async function runQuery(

  let resultFiles: string[];
  try {
-    resultFiles = await backend.run(store, query.query, limit, collection);
-  } catch (err: any) {
+    resultFiles = await backend.run(store, query, limit, collection);
+  } catch {
    // Backend may not be available (e.g., no embeddings for vector search)
    return {
      precision_at_k: 0,
      recall: 0,
+      recall_at_1: 0,
+      recall_at_3: 0,
+      recall_at_5: 0,
      mrr: 0,
      f1: 0,
      hits_at_k: 0,
      total_expected: query.expected_files.length,
      latency_ms: Date.now() - start,
      top_files: [],
+      matched_files: [],
+      unmatched_expected_files: query.expected_files,
    };
  }

@ -111,14 +212,14 @@ function formatTable(results: QueryResult[]): string {
  const num = (n: number) => n.toFixed(2).padStart(5);

  lines.push(
-    `${pad("Query", 25)} ${pad("Backend", 8)} ${pad("P@k", 6)} ${pad("Recall", 7)} ${pad("MRR", 6)} ${pad("F1", 6)} ${pad("ms", 8)}`
+    `${pad("Query", 25)} ${pad("Backend", 8)} ${pad("P@k", 6)} ${pad("R@1", 6)} ${pad("R@3", 6)} ${pad("R@5", 6)} ${pad("MRR", 6)} ${pad("F1", 6)} ${pad("ms", 8)}`
  );
-  lines.push("-".repeat(70));
+  lines.push("-".repeat(88));

  for (const r of results) {
    for (const [backend, br] of Object.entries(r.backends)) {
      lines.push(
-        `${pad(r.id, 25)} ${pad(backend, 8)} ${num(br.precision_at_k)} ${num(br.recall)}  ${num(br.mrr)} ${num(br.f1)} ${String(Math.round(br.latency_ms)).padStart(7)}ms`
+        `${pad(r.id, 25)} ${pad(backend, 8)} ${num(br.precision_at_k)} ${num(br.recall_at_1)} ${num(br.recall_at_3)} ${num(br.recall_at_5)} ${num(br.mrr)} ${num(br.f1)} ${String(Math.round(br.latency_ms)).padStart(7)}ms`
      );
    }
    lines.push("");
@ -138,13 +239,16 @@ function computeSummary(results: QueryResult[]): BenchmarkResult["summary"] {
    }
  }

-  for (const name of backendNames) {
-    let totalP = 0, totalR = 0, totalMrr = 0, totalF1 = 0, totalLat = 0, count = 0;
+  for (const name of Array.from(backendNames)) {
+    let totalP = 0, totalR = 0, totalR1 = 0, totalR3 = 0, totalR5 = 0, totalMrr = 0, totalF1 = 0, totalLat = 0, count = 0;
    for (const r of results) {
      const br = r.backends[name];
      if (!br) continue;
      totalP += br.precision_at_k;
      totalR += br.recall;
+      totalR1 += br.recall_at_1;
+      totalR3 += br.recall_at_3;
+      totalR5 += br.recall_at_5;
      totalMrr += br.mrr;
      totalF1 += br.f1;
      totalLat += br.latency_ms;
@ -154,6 +258,9 @@ function computeSummary(results: QueryResult[]): BenchmarkResult["summary"] {
      summary[name] = {
        avg_precision: totalP / count,
        avg_recall: totalR / count,
+        avg_recall_at_1: totalR1 / count,
+        avg_recall_at_3: totalR3 / count,
+        avg_recall_at_5: totalR5 / count,
        avg_mrr: totalMrr / count,
        avg_f1: totalF1 / count,
        avg_latency_ms: totalLat / count,
@ -166,7 +273,7 @@ function computeSummary(results: QueryResult[]): BenchmarkResult["summary"] {

 export async function runBenchmark(
  fixturePath: string,
-  options: { json?: boolean; collection?: string; backends?: string[] } = {},
+  options: { json?: boolean; collection?: string; backends?: string[]; dbPath?: string; configPath?: string } = {},
 ): Promise<BenchmarkResult> {
  // Load fixture
  const raw = readFileSync(resolve(fixturePath), "utf-8");
@ -177,7 +284,10 @@ export async function runBenchmark(
  }

  // Open store
-  const store = await createStore({ dbPath: getDefaultDbPath() });
+  const store = await createStore({
+    dbPath: options.dbPath ?? getDefaultDbPath(),
+    ...(options.configPath ? { configPath: options.configPath } : {}),
+  });

  // Filter backends if requested
  const activeBackends = options.backends
@ -232,7 +342,7 @@ export async function runBenchmark(
    const num = (n: number) => n.toFixed(3).padStart(6);
    for (const [name, s] of Object.entries(summary)) {
      console.log(
-        `  ${pad(name, 8)} P@k=${num(s.avg_precision)} Recall=${num(s.avg_recall)} MRR=${num(s.avg_mrr)} F1=${num(s.avg_f1)} Avg=${Math.round(s.avg_latency_ms)}ms`
+        `  ${pad(name, 8)} P@k=${num(s.avg_precision)} R@1=${num(s.avg_recall_at_1)} R@3=${num(s.avg_recall_at_3)} R@5=${num(s.avg_recall_at_5)} MRR=${num(s.avg_mrr)} F1=${num(s.avg_f1)} Avg=${Math.round(s.avg_latency_ms)}ms`
      );
    }
  }
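
Note on the structured benchmark queries above: a fixture query may now be either a single plain line or a multi-line document of `lex:`/`vec:`/`hyde:` lines plus at most one `intent:` line. The sketch below is a condensed restatement of the parsing rules in the diff, not the shipped function. The name `parseStructuredQuerySketch` and the local `Search`/`Parsed` types are assumptions standing in for `parseStructuredQuery` and `ExpandedQuery`.

```typescript
type SearchType = "lex" | "vec" | "hyde";
type Search = { type: SearchType; query: string; line: number };
type Parsed = { searches: Search[]; intent: string | undefined };

function parseStructuredQuerySketch(query: string): Parsed | undefined {
  // Keep original 1-based line numbers so errors point at the fixture text.
  const lines = query
    .split("\n")
    .map((raw, idx) => ({ text: raw.trim(), number: idx + 1 }))
    .filter(l => l.text.length > 0);
  if (lines.length === 0) return undefined;

  const searches: Search[] = [];
  let intent: string | undefined;

  for (const l of lines) {
    const intentMatch = l.text.match(/^intent:\s*/i);
    if (intentMatch) {
      if (intent !== undefined) throw new Error(`Line ${l.number}: only one intent: line is allowed.`);
      intent = l.text.slice(intentMatch[0].length).trim();
      if (!intent) throw new Error(`Line ${l.number}: intent: must include text.`);
      continue;
    }
    const m = l.text.match(/^(lex|vec|hyde):\s*/i);
    if (m) {
      const text = l.text.slice(m[0].length).trim();
      if (!text) throw new Error(`Line ${l.number} must include text.`);
      searches.push({ type: m[1]!.toLowerCase() as SearchType, query: text, line: l.number });
      continue;
    }
    // A single unprefixed line is a plain query: the caller uses it as-is.
    if (lines.length === 1) return undefined;
    throw new Error(`Line ${l.number} is missing a lex:/vec:/hyde:/intent: prefix.`);
  }

  if (intent && searches.length === 0) throw new Error("intent: cannot appear alone.");
  return searches.length > 0 ? { searches, intent } : undefined;
}

const parsed = parseStructuredQuerySketch(
  "intent: find auth docs\nlex: oauth token\n\nvec: how do tokens refresh",
);
const plain = parseStructuredQuerySketch("oauth token refresh");
```

Here `parsed` holds two searches plus the intent, while `plain` is `undefined`, which is the signal for the backends to fall back to the unstructured path.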
--- a/src/bench/score.ts
+++ b/src/bench/score.ts
@ -11,7 +11,7 @@
 */
 export function normalizePath(p: string): string {
  if (p.startsWith("qmd://")) {
-    // qmd://collection/path/to/file → path/to/file
+    // qmd://collection/docs/readme.md → docs/readme.md
    const withoutScheme = p.slice("qmd://".length);
    const slashIdx = withoutScheme.indexOf("/");
    p = slashIdx >= 0 ? withoutScheme.slice(slashIdx + 1) : withoutScheme;
@ -31,6 +31,30 @@ export function pathsMatch(result: string, expected: string): boolean {
  return false;
 }

+type ScoreMetrics = {
+  precision_at_k: number;
+  recall: number;
+  recall_at_1: number;
+  recall_at_3: number;
+  recall_at_5: number;
+  mrr: number;
+  f1: number;
+  hits_at_k: number;
+  matched_files: string[];
+  unmatched_expected_files: string[];
+};
+
+function hitsWithin(resultFiles: string[], expectedFiles: string[], k: number): number {
+  const topKResults = resultFiles.slice(0, k);
+  let hits = 0;
+  for (const expected of expectedFiles) {
+    if (topKResults.some(r => pathsMatch(r, expected))) {
+      hits++;
+    }
+  }
+  return hits;
+}
+
 /**
 * Score a set of search results against expected files.
 */
@ -38,21 +62,18 @@ export function scoreResults(
  resultFiles: string[],
  expectedFiles: string[],
  topK: number,
-): { precision_at_k: number; recall: number; mrr: number; f1: number; hits_at_k: number } {
+): ScoreMetrics {
  // Count hits in top-k
-  const topKResults = resultFiles.slice(0, topK);
-  let hitsAtK = 0;
-  for (const expected of expectedFiles) {
-    if (topKResults.some(r => pathsMatch(r, expected))) {
-      hitsAtK++;
-    }
-  }
+  const hitsAtK = hitsWithin(resultFiles, expectedFiles, topK);
+
+  const matchedFiles: string[] = [];
+  const unmatchedExpectedFiles: string[] = [];

-  // Count total hits anywhere
-  let totalHits = 0;
  for (const expected of expectedFiles) {
    if (resultFiles.some(r => pathsMatch(r, expected))) {
-      totalHits++;
+      matchedFiles.push(expected);
+    } else {
+      unmatchedExpectedFiles.push(expected);
    }
  }

@ -67,10 +88,24 @@ export function scoreResults(

  const denominator = Math.min(topK, expectedFiles.length);
  const precision_at_k = denominator > 0 ? hitsAtK / denominator : 0;
-  const recall = expectedFiles.length > 0 ? totalHits / expectedFiles.length : 0;
+  const recall = expectedFiles.length > 0 ? matchedFiles.length / expectedFiles.length : 0;
+  const recall_at_1 = expectedFiles.length > 0 ? hitsWithin(resultFiles, expectedFiles, 1) / expectedFiles.length : 0;
+  const recall_at_3 = expectedFiles.length > 0 ? hitsWithin(resultFiles, expectedFiles, 3) / expectedFiles.length : 0;
+  const recall_at_5 = expectedFiles.length > 0 ? hitsWithin(resultFiles, expectedFiles, 5) / expectedFiles.length : 0;
  const f1 = precision_at_k + recall > 0
    ? 2 * (precision_at_k * recall) / (precision_at_k + recall)
    : 0;

-  return { precision_at_k, recall, mrr, f1, hits_at_k: hitsAtK };
+  return {
+    precision_at_k,
+    recall,
+    recall_at_1,
+    recall_at_3,
+    recall_at_5,
+    mrr,
+    f1,
+    hits_at_k: hitsAtK,
+    matched_files: matchedFiles,
+    unmatched_expected_files: unmatchedExpectedFiles,
+  };
 }
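
Note on the recall@k metrics above: each `recall_at_N` divides the number of expected files found in the first N results by the total number of expected files. A small worked sketch follows, assuming exact string equality in place of `pathsMatch` (the real matcher normalizes paths first). `recallAt` is a hypothetical helper name.

```typescript
// Count expected files that appear within the first k results.
function hitsWithin(resultFiles: string[], expectedFiles: string[], k: number): number {
  const topK = resultFiles.slice(0, k);
  return expectedFiles.filter(expected => topK.includes(expected)).length;
}

// Hypothetical helper: recall@k as computed inline in scoreResults().
function recallAt(resultFiles: string[], expectedFiles: string[], k: number): number {
  return expectedFiles.length > 0
    ? hitsWithin(resultFiles, expectedFiles, k) / expectedFiles.length
    : 0;
}

const results = ["a.md", "x.md", "b.md", "y.md", "c.md"];
const expected = ["a.md", "b.md", "c.md", "d.md"];

const r1 = recallAt(results, expected, 1); // 1 of 4 expected files in the top 1
const r3 = recallAt(results, expected, 3); // 2 of 4 expected files in the top 3
const r5 = recallAt(results, expected, 5); // 3 of 4 expected files in the top 5
```

Because the denominator is always the full expected set, `recall_at_1` cannot reach 1.0 for a query with more than one expected file unless several expected paths match the same result.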
--- a/src/bench/types.ts
+++ b/src/bench/types.ts
@ -37,6 +37,12 @@ export interface BackendResult {
  precision_at_k: number;
  /** Fraction of expected files found anywhere in results */
  recall: number;
+  /** Fraction of expected files matched by the first (top-1) result */
+  recall_at_1: number;
+  /** Fraction of expected files found in the top 3 results */
+  recall_at_3: number;
+  /** Fraction of expected files found in the top 5 results */
+  recall_at_5: number;
  /** Reciprocal rank of first relevant result (1/rank, 0 if not found) */
  mrr: number;
  /** Harmonic mean of precision_at_k and recall */
@ -49,6 +55,10 @@ export interface BackendResult {
  latency_ms: number;
  /** Top result file paths (for inspection) */
  top_files: string[];
+  /** Expected files that were found anywhere in the returned result set */
+  matched_files: string[];
+  /** Expected files missing from the returned result set */
+  unmatched_expected_files: string[];
 }

 export interface QueryResult {
@ -65,6 +75,9 @@ export interface BenchmarkResult {
  summary: Record<string, {
    avg_precision: number;
    avg_recall: number;
+    avg_recall_at_1: number;
+    avg_recall_at_3: number;
+    avg_recall_at_5: number;
    avg_mrr: number;
    avg_f1: number;
    avg_latency_ms: number;
--- a/src/cli/formatter.ts
+++ b/src/cli/formatter.ts
@ -185,8 +185,9 @@ export function searchResultsToMarkdown(
    if (opts.lineNumbers) {
      content = addLineNumbers(content);
    }
+    const fileLine = `**file:** \`${row.displayPath}\`\n`;
    const contextLine = row.context ? `**context:** ${row.context}\n` : "";
-    return `---\n# ${heading}\n\n**docid:** \`#${row.docid}\`\n${contextLine}\n${content}\n`;
+    return `---\n# ${heading}\n\n${fileLine}**docid:** \`#${row.docid}\`\n${contextLine}\n${content}\n`;
  }).join("\n");
 }

--- a/src/cli/qmd.ts
+++ b/src/cli/qmd.ts
--- a/src/collections.ts
+++ b/src/collections.ts
@ -6,8 +6,8 @@
 */

 import { existsSync, mkdirSync, readFileSync, writeFileSync } from "fs";
-import { join, dirname } from "path";
-import { homedir } from "os";
+import { join, dirname, resolve } from "path";
+import { qmdHomedir } from "./paths.js";
 import YAML from "yaml";

 // ============================================================================
@ -101,9 +101,7 @@ export function setConfigSource(source?: { configPath?: string; config?: Collect
 export function setConfigIndexName(name: string): void {
  // Resolve relative paths to absolute paths and sanitize for use as filename
  if (name.includes('/')) {
-    const { resolve } = require('path');
-    const { cwd } = require('process');
-    const absolutePath = resolve(cwd(), name);
+    const absolutePath = resolve(process.cwd(), name);
    // Replace path separators with underscores to create a valid filename
    currentIndexName = absolutePath.replace(/\//g, '_').replace(/^_/, '');
  } else {
@ -120,13 +118,41 @@ function getConfigDir(): string {
  if (process.env.XDG_CONFIG_HOME) {
    return join(process.env.XDG_CONFIG_HOME, "qmd");
  }
-  return join(homedir(), ".config", "qmd");
+  return join(qmdHomedir(), ".config", "qmd");
 }

 function getConfigFilePath(): string {
  return join(getConfigDir(), `${currentIndexName}.yml`);
 }

+/**
+ * Find a project-local QMD config by walking upward from startDir.
+ * The local config lives at .qmd/index.yaml or .qmd/index.yml and,
+ * when used by the CLI, keeps both config and index DB writes inside
+ * the project instead of the global ~/.config / ~/.cache locations.
+ */
+export function findLocalConfigPath(startDir: string = process.cwd()): string | undefined {
+  let dir = resolve(startDir);
+
+  while (true) {
+    const qmdDir = join(dir, ".qmd");
+    const yamlPath = join(qmdDir, "index.yaml");
+    if (existsSync(yamlPath)) return yamlPath;
+
+    const ymlPath = join(qmdDir, "index.yml");
+    if (existsSync(ymlPath)) return ymlPath;
+
+    const parent = dirname(dir);
+    if (parent === dir) return undefined;
+    dir = parent;
+  }
+}
+
+/** Return the local SQLite index path paired with a local .qmd/index.yaml file. */
+export function getLocalDbPath(configPath: string): string {
+  return join(dirname(configPath), "index.sqlite");
+}
+
 /**
 * Ensure config directory exists
 */
@ -161,7 +187,8 @@ export function loadConfig(): CollectionConfig {

  try {
    const content = readFileSync(configPath, "utf-8");
-    const config = YAML.parse(content) as CollectionConfig;
+    const parsed = YAML.parse(content) as CollectionConfig | null | undefined;
+    const config = parsed ?? { collections: {} };

    // Ensure collections object exists
    if (!config.collections) {
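
Note on the project-local config lookup above: `findLocalConfigPath` walks from the start directory toward the filesystem root and returns the first `.qmd/index.yaml` or `.qmd/index.yml` it finds. The sketch below restates that walk as a pure function so it can be exercised without touching the disk. The injected `exists` predicate, the POSIX-only path handling, and the names `findLocalConfigSketch`, `parentDir`, and `localDbPath` are assumptions made for illustration; the diff itself uses `existsSync`, `join`, and `dirname`.

```typescript
// Parent of an absolute POSIX path; the root is its own parent.
function parentDir(dir: string): string {
  const idx = dir.lastIndexOf("/");
  return idx <= 0 ? "/" : dir.slice(0, idx);
}

function findLocalConfigSketch(
  startDir: string,
  exists: (path: string) => boolean,
): string | undefined {
  let dir = startDir;
  while (true) {
    const base = dir === "/" ? "" : dir;
    // index.yaml is checked before index.yml, matching the order in the diff.
    for (const name of ["index.yaml", "index.yml"]) {
      const candidate = `${base}/.qmd/${name}`;
      if (exists(candidate)) return candidate;
    }
    const parent = parentDir(dir);
    if (parent === dir) return undefined; // reached the root without a match
    dir = parent;
  }
}

// The local index database sits next to the config file.
function localDbPath(configPath: string): string {
  return `${parentDir(configPath)}/index.sqlite`;
}

const files = new Set(["/repo/.qmd/index.yml"]);
const found = findLocalConfigSketch("/repo/src/cli", p => files.has(p));
const dbPath = found ? localDbPath(found) : undefined;
const missing = findLocalConfigSketch("/other", p => files.has(p));
```

Starting in `/repo/src/cli`, the walk checks `/repo/src/cli`, `/repo/src`, then `/repo`, where it finds the config.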
--- a/src/db.ts
+++ b/src/db.ts
@ -11,10 +11,16 @@
 * SQLite build before creating any database instances.
 */

-export const isBun = typeof globalThis.Bun !== "undefined";
+export const isBun = "Bun" in globalThis;

-let _Database: any;
-let _sqliteVecLoad: ((db: any) => void) | null;
+export type SQLiteValue = string | number | bigint | Buffer | Uint8Array | Float32Array | null;
+export type SQLiteParams = readonly SQLiteValue[];
+
+type DatabaseConstructor = new (path: string) => Database;
+type LoadableSqliteDatabase = Pick<Database, "loadExtension">;
+
+let _Database: DatabaseConstructor;
+let _sqliteVecLoad: ((db: LoadableSqliteDatabase) => void) | null;

 if (isBun) {
  // Dynamic string prevents tsc from resolving bun:sqlite on Node.js builds
@ -44,15 +50,15 @@ if (isBun) {
    const testDb = new BunDatabase(":memory:");
    testDb.loadExtension(vecPath);
    testDb.close();
-    _sqliteVecLoad = (db: any) => db.loadExtension(vecPath);
+    _sqliteVecLoad = (db: LoadableSqliteDatabase) => db.loadExtension(vecPath);
  } catch {
    // Vector search won't work, but BM25 and other operations are unaffected.
    _sqliteVecLoad = null;
  }
 } else {
-  _Database = (await import("better-sqlite3")).default;
+  _Database = (await import("better-sqlite3")).default as unknown as DatabaseConstructor;
  const sqliteVec = await import("sqlite-vec");
-  _sqliteVecLoad = (db: any) => sqliteVec.load(db);
+  _sqliteVecLoad = (db: LoadableSqliteDatabase) => sqliteVec.load(db as Parameters<typeof sqliteVec.load>[0]);
 }

 /**
@ -70,13 +76,14 @@ export interface Database {
  prepare(sql: string): Statement;
  transaction<T extends (...args: any[]) => any>(fn: T): T;
  loadExtension(path: string): void;
+  transaction<T extends (...args: SQLiteValue[]) => unknown>(fn: T): T;
  close(): void;
 }

 export interface Statement {
-  run(...params: any[]): { changes: number; lastInsertRowid: number | bigint };
-  get(...params: any[]): any;
-  all(...params: any[]): any[];
+  run(...params: SQLiteValue[]): { changes: number; lastInsertRowid: number | bigint };
+  get<T = unknown>(...params: SQLiteValue[]): T | undefined;
+  all<T = unknown>(...params: SQLiteValue[]): T[];
 }

 /**
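
Note on the typed `Statement` interface above: with `get<T>` and `all<T>`, callers name the row shape at the call site instead of receiving `any`. The sketch below shows that calling pattern against an in-memory stand-in. `fakeStatement`, `Row`, and `DocRow` are hypothetical, and `Buffer` is left out of `SQLiteValue` here only so the block does not depend on Node typings.

```typescript
type SQLiteValue = string | number | bigint | Uint8Array | Float32Array | null;

interface Statement {
  get<T = unknown>(...params: SQLiteValue[]): T | undefined;
  all<T = unknown>(...params: SQLiteValue[]): T[];
}

type Row = Record<string, SQLiteValue>;

// Hypothetical in-memory statement: filters rows where row[key] equals the first parameter.
function fakeStatement(rows: Row[], key: string): Statement {
  const matching = (params: SQLiteValue[]): Row[] => rows.filter(r => r[key] === params[0]);
  return {
    get<T = unknown>(...params: SQLiteValue[]): T | undefined {
      // The generic is a caller-side assertion; nothing validates the row shape at runtime.
      return matching(params)[0] as unknown as T | undefined;
    },
    all<T = unknown>(...params: SQLiteValue[]): T[] {
      return matching(params) as unknown as T[];
    },
  };
}

type DocRow = { id: number; path: string };

const stmt = fakeStatement(
  [{ id: 1, path: "docs/readme.md" }, { id: 2, path: "src/db.ts" }],
  "id",
);
const doc = stmt.get<DocRow>(2);
const docPath = doc?.path; // typed as string | undefined, no `any` involved
const none = stmt.all<DocRow>(3);
```

Omitting the type argument yields `unknown`, which forces callers to narrow before use.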
--- a/src/embedded-skills.ts
+++ b/src/embedded-skills.ts
--- a/src/index.ts
+++ b/src/index.ts
@ -23,7 +23,6 @@ import {
  structuredSearch,
  extractSnippet,
  addLineNumbers,
-  DEFAULT_EMBED_MODEL,
  DEFAULT_MULTI_GET_MAX_BYTES,
  reindexCollection,
  generateEmbeddings,
@ -159,6 +158,8 @@ export interface SearchOptions {
  collections?: string[];
  /** Max results (default: 10) */
  limit?: number;
+  /** Max candidates to rerank (default: 40) */
+  candidateLimit?: number;
  /** Minimum score threshold */
  minScore?: number;
  /** Include explain traces */
@ -290,6 +291,8 @@ export interface QMDStore {
  embed(options?: {
    force?: boolean;
    model?: string;
+    /** Restrict embedding to documents in one collection. */
+    collection?: string;
    maxDocsPerBatch?: number;
    maxBatchBytes?: number;
    chunkStrategy?: ChunkStrategy;
@ -400,6 +403,7 @@ export async function createStore(options: StoreOptions): Promise<QMDStore> {
          minScore: opts.minScore,
          explain: opts.explain,
          intent: opts.intent,
+          candidateLimit: opts.candidateLimit,
          skipRerank,
          chunkStrategy: opts.chunkStrategy,
        });
@ -412,12 +416,13 @@ export async function createStore(options: StoreOptions): Promise<QMDStore> {
        minScore: opts.minScore,
        explain: opts.explain,
        intent: opts.intent,
+        candidateLimit: opts.candidateLimit,
        skipRerank,
        chunkStrategy: opts.chunkStrategy,
      });
    },
    searchLex: async (q, opts) => internal.searchFTS(q, opts?.limit, opts?.collection),
-    searchVector: async (q, opts) => internal.searchVec(q, DEFAULT_EMBED_MODEL, opts?.limit, opts?.collection),
+    searchVector: async (q, opts) => internal.searchVec(q, llm.embedModelName, opts?.limit, opts?.collection),
    expandQuery: async (q, opts) => internal.expandQuery(q, undefined, opts?.intent),
    get: async (pathOrDocid, opts) => internal.findDocument(pathOrDocid, opts),
    getDocumentBody: async (pathOrDocid, opts) => {
@ -516,6 +521,7 @@ export async function createStore(options: StoreOptions): Promise<QMDStore> {
      return generateEmbeddings(internal, {
        force: embedOpts?.force,
        model: embedOpts?.model,
+        collection: embedOpts?.collection,
        maxDocsPerBatch: embedOpts?.maxDocsPerBatch,
        maxBatchBytes: embedOpts?.maxBatchBytes,
        chunkStrategy: embedOpts?.chunkStrategy,
--- a/src/llm.ts
+++ b/src/llm.ts
@ -5,16 +5,72 @@
 * local GGUF embeddings plus local text generation and reranking via node-llama-cpp.
 */

-import {
-  getLlama,
-  resolveModelFile,
-  LlamaChatSession,
-  LlamaLogLevel,
-  type Llama,
-  type LlamaModel,
-  type LlamaEmbeddingContext,
-  type Token as LlamaToken,
+import type {
+  Llama,
+  LlamaModel,
+  LlamaEmbeddingContext,
+  Token as LlamaToken,
 } from "node-llama-cpp";
+
+type StdoutChunk = string | Uint8Array;
+type WriteCallback = (err?: Error | null) => void;
+
+type NodeLlamaCppModule = {
+  getLlama: (options: Record<string, unknown>) => Promise<Llama>;
+  getLlamaGpuTypes?: (include?: "supported" | "allValid") => Promise<LlamaGpuMode[]>;
+  resolveModelFile: (model: string, cacheDir: string) => Promise<string>;
+  LlamaChatSession: new (options: { contextSequence: unknown }) => {
+    prompt: (prompt: string, options?: Record<string, unknown>) => Promise<string>;
+  };
+  LlamaLogLevel: { error: unknown };
+};
+
+let nodeLlamaCppImport: Promise<NodeLlamaCppModule> | null = null;
+async function loadNodeLlamaCpp(): Promise<NodeLlamaCppModule> {
+  nodeLlamaCppImport ??= withNativeStdoutRedirectedToStderr(
+    () => import("node-llama-cpp") as Promise<NodeLlamaCppModule>
+  );
+  return nodeLlamaCppImport;
+}
+
+export function setNodeLlamaCppModuleForTest(module: NodeLlamaCppModule | null): void {
+  nodeLlamaCppImport = module ? Promise.resolve(module) : null;
+  failedGpuInitModes.clear();
+  noGpuAccelerationWarningShown = false;
+  cpuForcedPrebuiltFallbackWarningShown = false;
+}
+
+type StdoutWrite = typeof process.stdout.write;
+let nativeStdoutRedirectDepth = 0;
+let originalStdoutWrite: StdoutWrite | null = null;
+
+/**
+ * Some node-llama-cpp native build/probe paths write library noise to stdout.
+ * JSON APIs must reserve stdout for machine-readable payloads, so route that
+ * noise to stderr while native llama initialization is in progress.
+ */
+export async function withNativeStdoutRedirectedToStderr<T>(fn: () => Promise<T>): Promise<T> {
+  if (nativeStdoutRedirectDepth === 0) {
+    originalStdoutWrite = process.stdout.write.bind(process.stdout) as StdoutWrite;
+    process.stdout.write = ((chunk: StdoutChunk, encodingOrCallback?: BufferEncoding | WriteCallback, callback?: WriteCallback) => {
+      if (typeof encodingOrCallback === "function") {
+        return process.stderr.write(chunk, encodingOrCallback);
+      }
+      return process.stderr.write(chunk, encodingOrCallback, callback);
+    }) as StdoutWrite;
+  }
+  nativeStdoutRedirectDepth++;
+  try {
+    return await fn();
+  } finally {
+    nativeStdoutRedirectDepth--;
+    if (nativeStdoutRedirectDepth === 0 && originalStdoutWrite) {
+      process.stdout.write = originalStdoutWrite;
+      originalStdoutWrite = null;
+    }
+  }
+}
+
 import { homedir } from "os";
 import { join } from "path";
 import { existsSync, mkdirSync, statSync, unlinkSync, readdirSync, readFileSync, writeFileSync, openSync, readSync, closeSync } from "fs";
@ -37,7 +93,7 @@ export function isQwen3EmbeddingModel(modelUri: string): boolean {
 * Uses Qwen3-Embedding instruct format when a Qwen embedding model is active.
 */
 export function formatQueryForEmbedding(query: string, modelUri?: string): string {
-  const uri = modelUri ?? process.env.QMD_EMBED_MODEL ?? DEFAULT_EMBED_MODEL;
+  const uri = modelUri ?? resolveEmbedModel();
  if (isQwen3EmbeddingModel(uri)) {
    return `Instruct: Retrieve relevant documents for the given query\nQuery: ${query}`;
  }
@ -50,7 +106,7 @@ export function formatQueryForEmbedding(query: string, modelUri?: string): strin
 * Qwen3-Embedding encodes documents as raw text without special prefixes.
 */
 export function formatDocForEmbedding(text: string, title?: string, modelUri?: string): string {
-  const uri = modelUri ?? process.env.QMD_EMBED_MODEL ?? DEFAULT_EMBED_MODEL;
+  const uri = modelUri ?? resolveEmbedModel();
  if (isQwen3EmbeddingModel(uri)) {
    // Qwen3-Embedding: documents are raw text, no task prefix
    return title ? `${title}\n${text}` : text;
@ -208,6 +264,32 @@ export const DEFAULT_EMBED_MODEL_URI = DEFAULT_EMBED_MODEL;
 export const DEFAULT_RERANK_MODEL_URI = DEFAULT_RERANK_MODEL;
 export const DEFAULT_GENERATE_MODEL_URI = DEFAULT_GENERATE_MODEL;

+export type ModelResolutionConfig = {
+  embed?: string;
+  generate?: string;
+  rerank?: string;
+};
+
+export function resolveEmbedModel(config?: ModelResolutionConfig): string {
+  return config?.embed || process.env.QMD_EMBED_MODEL || DEFAULT_EMBED_MODEL;
+}
+
+export function resolveGenerateModel(config?: ModelResolutionConfig): string {
+  return config?.generate || process.env.QMD_GENERATE_MODEL || DEFAULT_GENERATE_MODEL;
+}
+
+export function resolveRerankModel(config?: ModelResolutionConfig): string {
+  return config?.rerank || process.env.QMD_RERANK_MODEL || DEFAULT_RERANK_MODEL;
+}
+
+export function resolveModels(config?: ModelResolutionConfig): Required<ModelResolutionConfig> {
+  return {
+    embed: resolveEmbedModel(config),
+    generate: resolveGenerateModel(config),
+    rerank: resolveRerankModel(config),
+  };
+}
+
 // Local model cache directory
 const MODEL_CACHE_DIR = process.env.XDG_CACHE_HOME
  ? join(process.env.XDG_CACHE_HOME, "qmd", "models")
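
Note on the model resolution helpers above: each `resolve*Model` function applies the same precedence of explicit config, then environment variable, then built-in default. Because the diff uses `||` instead of `??`, an empty string at one level falls through to the next. A sketch of that precedence follows, with the environment passed in as a parameter so the behavior is visible without mutating `process.env`. The default value is a placeholder, not the project's real model URI.

```typescript
type ModelResolutionConfig = { embed?: string };

// Placeholder default for illustration only.
const DEFAULT_EMBED_MODEL = "placeholder-default-embed-model";

function resolveEmbedModelSketch(
  config: ModelResolutionConfig | undefined,
  env: Record<string, string | undefined>,
): string {
  // `||` treats "" like "unset", so an empty config value or env var is skipped.
  return config?.embed || env["QMD_EMBED_MODEL"] || DEFAULT_EMBED_MODEL;
}

const fromConfig = resolveEmbedModelSketch({ embed: "config-model" }, { QMD_EMBED_MODEL: "env-model" });
const fromEnv = resolveEmbedModelSketch({ embed: "" }, { QMD_EMBED_MODEL: "env-model" });
const fromDefault = resolveEmbedModelSketch(undefined, {});
```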
@ -270,37 +352,106 @@ async function getRemoteEtag(ref: HfRef): Promise<string | null> {

 const GGUF_MAGIC = Buffer.from("GGUF");

+export type GgufFileInspection = {
+  exists: boolean;
+  valid: boolean;
+  kind: "missing" | "gguf" | "html" | "invalid";
+  sizeBytes?: number;
+  magic?: string;
+  details: string;
+};
+
+function formatModelFileSize(sizeBytes: number): string {
+  return `${(sizeBytes / 1024).toFixed(0)} KB`;
+}
+
+function printableMagic(header: Buffer): string {
+  const text = header.toString("utf-8");
+  return /^[\x20-\x7e]{1,4}$/.test(text) ? text : `0x${header.toString("hex")}`;
+}
+
+/**
+ * Inspect a potential GGUF model file without mutating it.
+ * Used by doctor for early diagnostics and by runtime validation before load.
+ */
+export function inspectGgufFile(filePath: string): GgufFileInspection {
+  if (!existsSync(filePath)) {
+    return { exists: false, valid: false, kind: "missing", details: "file does not exist" };
+  }
+
+  let sizeBytes = 0;
+  try {
+    sizeBytes = statSync(filePath).size;
+    const fd = openSync(filePath, "r");
+    const sniff = Buffer.alloc(512);
+    try {
+      readSync(fd, sniff, 0, 512, 0);
+    } finally {
+      closeSync(fd);
+    }
+
+    const header = sniff.subarray(0, 4);
+    if (header.equals(GGUF_MAGIC)) {
+      return {
+        exists: true,
+        valid: true,
+        kind: "gguf",
+        sizeBytes,
+        magic: "GGUF",
+        details: `valid GGUF (${formatModelFileSize(sizeBytes)})`,
+      };
+    }
+
+    const magic = printableMagic(header);
+    const text = sniff.toString("utf-8").toLowerCase();
+    const isHtml = text.includes("<!doctype") || text.includes("<html");
+    if (isHtml) {
+      return {
+        exists: true,
+        valid: false,
+        kind: "html",
+        sizeBytes,
+        magic,
+        details: `HTML page, not a GGUF model (${formatModelFileSize(sizeBytes)}); likely proxy/firewall/captive portal response`,
+      };
+    }
+
+    return {
+      exists: true,
+      valid: false,
+      kind: "invalid",
+      sizeBytes,
+      magic,
+      details: `not valid GGUF (expected magic "GGUF", got "${magic}", ${formatModelFileSize(sizeBytes)})`,
+    };
+  } catch (error) {
+    return {
+      exists: true,
+      valid: false,
+      kind: "invalid",
+      sizeBytes,
+      details: `cannot read model file: ${error instanceof Error ? error.message : String(error)}`,
+    };
+  }
+}
+
 /**
 * Validate that a file is actually a GGUF model, not an HTML error page
 * from a proxy, firewall, or failed download.
 * Throws a descriptive error if the file is not valid GGUF.
 */
 function validateGgufFile(filePath: string, modelUri: string): void {
-  if (!existsSync(filePath)) return; // let downstream handle missing files
-
-  // Read header + sniff bytes in one go, then close immediately
-  const fd = openSync(filePath, "r");
-  const sniff = Buffer.alloc(512);
-  try {
-    readSync(fd, sniff, 0, 512, 0);
-  } finally {
-    closeSync(fd);
-  }
-
-  const header = sniff.subarray(0, 4);
-  if (header.equals(GGUF_MAGIC)) return; // valid GGUF
-
-  const text = sniff.toString("utf-8").toLowerCase();
-  const isHtml = text.includes("<!doctype") || text.includes("<html");
-  const got = header.toString("utf-8");
-  const sizeKB = (statSync(filePath).size / 1024).toFixed(0);
+  const inspection = inspectGgufFile(filePath);
+  if (!inspection.exists || inspection.valid) return; // missing files are handled downstream; valid files need no action

  // Remove the bad file so the next attempt re-downloads
-  unlinkSync(filePath);
+  try {
+    unlinkSync(filePath);
+  } catch { /* best effort */ }

-  if (isHtml) {
+  if (inspection.kind === "html") {
    throw new Error(
-      `Downloaded model file is an HTML page, not a GGUF model (${sizeKB} KB).\n` +
+      `Downloaded model file is an HTML page, not a GGUF model (${formatModelFileSize(inspection.sizeBytes ?? 0)}).\n` +
      `Something is intercepting the download from huggingface.co (a proxy, firewall, or captive portal).\n\n` +
      `Model: ${modelUri}\n` +
      `Path:  ${filePath}\n\n` +
@ -313,7 +464,7 @@ function validateGgufFile(filePath: string, modelUri: string): void {
  }

  throw new Error(
-    `Model file is not valid GGUF (expected magic "GGUF", got "${got}", file is ${sizeKB} KB).\n` +
+    `Model file is not valid GGUF (expected magic "GGUF", got "${inspection.magic ?? "unknown"}", file is ${formatModelFileSize(inspection.sizeBytes ?? 0)}).\n` +
    `Model: ${modelUri}\n` +
    `Path:  ${filePath}\n\n` +
    `The file has been removed. Run the command again to re-download.`
@ -364,6 +515,7 @@ export async function pullModels(
      }
    }

+    const { resolveModelFile } = await loadNodeLlamaCpp();
    const path = await resolveModelFile(model, cacheDir);
    validateGgufFile(path, model);
    const sizeBytes = existsSync(path) ? statSync(path).size : 0;
@ -460,9 +612,51 @@ export type LlamaCppConfig = {
 const DEFAULT_INACTIVITY_TIMEOUT_MS = 5 * 60 * 1000;
 const DEFAULT_EXPAND_CONTEXT_SIZE = 2048;

-type LlamaGpuMode = "auto" | "metal" | "vulkan" | "cuda" | false;
+export type LlamaGpuMode = "auto" | "metal" | "vulkan" | "cuda" | false;
+
+type ParallelismOptions = {
+  gpu: string | false;
+  platform?: NodeJS.Platform;
+  computed: number;
+  envValue?: string;
+};
+
+export function resolveParallelismOverride(envValue = process.env.QMD_EMBED_PARALLELISM): number | undefined {
+  const normalized = envValue?.trim() ?? "";
+  if (!normalized) return undefined;
+
+  const parsed = Number(normalized);
+  if (!Number.isInteger(parsed) || parsed < 1) {
+    process.stderr.write(`QMD Warning: invalid QMD_EMBED_PARALLELISM="${envValue}", using automatic parallelism.\n`);
+    return undefined;
+  }
+
+  return Math.min(8, parsed);
+}
+
+export function resolveSafeParallelism(options: ParallelismOptions): number {
+  const override = resolveParallelismOverride(options.envValue);
+  if (override !== undefined) return override;
+
+  // node-llama-cpp/llama.cpp CUDA on Windows is unstable with multiple
+  // simultaneous contexts (ggml-cuda.cu:98 in #519). Vulkan and CPU do not
+  // show the same failure mode, so only serialize Windows CUDA by default.
+  if ((options.platform ?? process.platform) === "win32" && options.gpu === "cuda") {
+    return 1;
+  }
+
+  return Math.max(1, options.computed);
+}
+
+export function resolveLlamaGpuMode(
+  envValue = process.env.QMD_LLAMA_GPU,
+  forceCpuValue = process.env.QMD_FORCE_CPU
+): LlamaGpuMode {
+  const forceCpu = forceCpuValue?.trim().toLowerCase() ?? "";
+  if (forceCpu && !["false", "off", "none", "disable", "disabled", "0"].includes(forceCpu)) {
+    return false;
+  }

-export function resolveLlamaGpuMode(envValue = process.env.QMD_LLAMA_GPU): LlamaGpuMode {
  const normalized = envValue?.trim().toLowerCase() ?? "";
  if (!normalized) return "auto";
  if (["false", "off", "none", "disable", "disabled", "0"].includes(normalized)) return false;
@ -472,6 +666,23 @@ export function resolveLlamaGpuMode(envValue = process.env.QMD_LLAMA_GPU): Llama
  return "auto";
 }

+async function disposeWithTimeout(resourceName: string, dispose: () => Promise<void>, timeoutMs = 1000): Promise<void> {
+  const timeoutPromise = new Promise<"timeout">((resolve) => {
+    setTimeout(() => resolve("timeout"), timeoutMs).unref();
+  });
+
+  try {
+    const result = await Promise.race([dispose(), timeoutPromise]);
+    if (result === "timeout") {
+      process.stderr.write(`QMD Warning: timed out disposing ${resourceName}; continuing shutdown.\n`);
+    }
+  } catch (error) {
+    process.stderr.write(
+      `QMD Warning: failed to dispose ${resourceName} (${error instanceof Error ? error.message : String(error)}); continuing shutdown.\n`
+    );
+  }
+}
+
 function resolveExpandContextSize(configValue?: number): number {
  if (configValue !== undefined) {
    if (!Number.isInteger(configValue) || configValue <= 0) {
@ -493,6 +704,14 @@ function resolveExpandContextSize(configValue?: number): number {
  return parsed;
 }

+const failedGpuInitModes = new Set<LlamaGpuMode>();
+let noGpuAccelerationWarningShown = false;
+let cpuForcedPrebuiltFallbackWarningShown = false;
+
+function isCpuModeRequested(): boolean {
+  return resolveLlamaGpuMode() === false;
+}
+
 export class LlamaCpp implements LLM {
  private readonly _ciMode = !!process.env.CI;
  private llama: Llama | null = null;
@ -530,6 +749,15 @@ export class LlamaCpp implements LLM {
    this.embedApiKey = config.embedApiKey || process.env.QMD_EMBED_API_KEY || process.env.NVIDIA_API_KEY || process.env.OPENAI_API_KEY;
    this.generateModelUri = config.generateModel || process.env.QMD_GENERATE_MODEL || DEFAULT_GENERATE_MODEL;
    this.rerankModelUri = config.rerankModel || process.env.QMD_RERANK_MODEL || DEFAULT_RERANK_MODEL;
+    // STRUCTURAL INVARIANT: the launcher (bin/qmd) sets GGML_METAL_NO_RESIDENCY=1
+    // on darwin BEFORE the native binding loads, which prevents the libggml-metal
+    // static destructor assertion at process exit (ggml-org/llama.cpp#22593).
+    // See isDarwinMetalMitigationActive() for the runtime check exposed to
+    // diagnostics. No constructor-time guard installation is needed.
+
+    this.embedModelUri = resolveEmbedModel({ embed: config.embedModel });
+    this.generateModelUri = resolveGenerateModel({ generate: config.generateModel });
+    this.rerankModelUri = resolveRerankModel({ rerank: config.rerankModel });
    this.modelCacheDir = config.modelCacheDir || MODEL_CACHE_DIR;
    this.expandContextSize = resolveExpandContextSize(config.expandContextSize);
    this.inactivityTimeoutMs = config.inactivityTimeoutMs ?? DEFAULT_INACTIVITY_TIMEOUT_MS;
@ -542,6 +770,13 @@ export class LlamaCpp implements LLM {

  get usesLocalEmbedding(): boolean {
    return isLocalEmbeddingModel(this.embedModelUri);
+  }
+  get generateModelName(): string {
+    return this.generateModelUri;
+  }
+
+  get rerankModelName(): string {
+    return this.rerankModelUri;
  }

  /**
@ -649,33 +884,89 @@ export class LlamaCpp implements LLM {
    if (!this.llama) {
      const gpuMode = resolveLlamaGpuMode();

-      const loadLlama = async (gpu: LlamaGpuMode) =>
-        await getLlama({
-          build: allowBuild ? "autoAttempt" : "never",
+      const { getLlama, getLlamaGpuTypes, LlamaLogLevel } = await loadNodeLlamaCpp();
+      const loadLlama = async (gpu: LlamaGpuMode, sourceBuildAllowed = allowBuild, buildOverride?: "auto" | "never") =>
+        await withNativeStdoutRedirectedToStderr(() => getLlama({
+          // Prefer packaged prebuilt bindings before compiling llama.cpp locally.
+          // node-llama-cpp documents gpu:"auto" as the best default: Metal on
+          // Apple Silicon, CUDA when fully available, Vulkan where available,
+          // then CPU. Use build:"auto" for normal loads and build:"never" for
+          // diagnostic/probe paths that must not compile llama.cpp.
+          build: buildOverride ?? (sourceBuildAllowed ? "auto" : "never"),
          logLevel: LlamaLogLevel.error,
          gpu,
-          skipDownload: !allowBuild,
-        });
+          progressLogs: false,
+          skipDownload: !sourceBuildAllowed,
+        }));
+      const loadCpuCompatibleLlama = async () => {
+        try {
+          return await loadLlama(false, false);
+        } catch (err) {
+          // Some platforms, notably Apple Silicon, ship a Metal prebuilt but no
+          // CPU-only prebuilt. Do a fast no-build lookup for an actual CPU
+          // binding first; if it does not exist, use the packaged auto/Metal
+          // binding and disable model offloading via gpuLayers: 0.
+          if (!cpuForcedPrebuiltFallbackWarningShown) {
+            cpuForcedPrebuiltFallbackWarningShown = true;
+            process.stderr.write(
+              `QMD Warning: CPU-only llama.cpp prebuilt not available (${err instanceof Error ? err.message : String(err)}); using packaged backend with GPU offloading disabled.\n`
+            );
+          }
+          return await loadLlama("auto", false);
+        }
+      };

      let llama: Llama;
      if (gpuMode === false) {
-        llama = await loadLlama(false);
+        llama = await loadCpuCompatibleLlama();
+      } else if (failedGpuInitModes.has(gpuMode)) {
+        process.stderr.write(
+          `QMD Warning: skipping previously failed GPU init${gpuMode === "auto" ? "" : ` for QMD_LLAMA_GPU=${gpuMode}`}, using CPU.\n`
+        );
+        llama = await loadCpuCompatibleLlama();
      } else {
        try {
          llama = await loadLlama(gpuMode);
+
+          // If node-llama-cpp auto-detection chose CPU, do one no-build pass
+          // over all OS-valid packaged GPU backends. This preserves the
+          // documented auto mode for Metal/CUDA/Vulkan while recovering on
+          // systems where a packaged backend can load but detection is too
+          // conservative. Never compile during these extra probes.
+          if (gpuMode === "auto" && llama.gpu === false && getLlamaGpuTypes) {
+            const candidates = (await getLlamaGpuTypes("allValid"))
+              .filter((candidate): candidate is Exclude<LlamaGpuMode, "auto" | false> => candidate !== false && candidate !== "auto");
+            for (const candidate of candidates) {
+              if (failedGpuInitModes.has(candidate)) continue;
+              try {
+                const gpuLlama = await loadLlama(candidate, false, "never");
+                if (gpuLlama.gpu !== false) {
+                  await disposeWithTimeout("CPU llama runtime", () => llama.dispose());
+                  llama = gpuLlama;
+                  break;
+                }
+                await disposeWithTimeout(`${candidate} probe runtime`, () => gpuLlama.dispose());
+              } catch {
+                failedGpuInitModes.add(candidate);
+              }
+            }
+          }
        } catch (err) {
-          // GPU backend (e.g. Vulkan on headless/driverless machines) can throw at init.
-          // Fall back to CPU so qmd still works.
+          // GPU backend (e.g. Vulkan/CUDA on headless/driverless machines) can throw at init.
+          // Fall back to CPU so qmd still works, and cache the failure to avoid repeated
+          // expensive native build/probe attempts in this process.
+          failedGpuInitModes.add(gpuMode);
          process.stderr.write(
            `QMD Warning: GPU init failed${gpuMode === "auto" ? "" : ` for QMD_LLAMA_GPU=${gpuMode}`} (${err instanceof Error ? err.message : String(err)}), falling back to CPU.\n`
          );
-          llama = await loadLlama(false);
+          llama = await loadCpuCompatibleLlama();
        }
      }

-      if (llama.gpu === false) {
+      if (llama.gpu === false && !noGpuAccelerationWarningShown) {
+        noGpuAccelerationWarningShown = true;
        process.stderr.write(
-          "QMD Warning: no GPU acceleration, running on CPU (slow). Run 'qmd status' for details.\n"
+          "QMD Warning: no GPU acceleration, running on CPU (slow). Run 'qmd doctor' for device diagnostics.\n"
        );
      }
      this.llama = llama;
@ -683,6 +974,17 @@ export class LlamaCpp implements LLM {
    return this.llama;
  }

+  private isCpuOffloadForced(): boolean {
+    return isCpuModeRequested();
+  }
+
+  private modelLoadOptions(modelPath: string): { modelPath: string; gpuLayers?: number } {
+    return {
+      modelPath,
+      ...(this.isCpuOffloadForced() ? { gpuLayers: 0 } : {}),
+    };
+  }
+
  /**
   * Resolve a model URI to a local path, downloading if needed.
   * Validates the downloaded file is actually a GGUF model (not an HTML error page
@ -691,6 +993,7 @@ export class LlamaCpp implements LLM {
  private async resolveModel(modelUri: string): Promise<string> {
    this.ensureModelCacheDir();
    // resolveModelFile handles HF URIs and downloads to the cache dir
+    const { resolveModelFile } = await loadNodeLlamaCpp();
    const modelPath = await resolveModelFile(modelUri, this.modelCacheDir);
    validateGgufFile(modelPath, modelUri);
    return modelPath;
@ -713,7 +1016,7 @@ export class LlamaCpp implements LLM {
    this.embedModelLoadPromise = (async () => {
      const llama = await this.ensureLlama();
      const modelPath = await this.resolveModel(this.embedModelUri);
-      const model = await llama.loadModel({ modelPath });
+      const model = await llama.loadModel(this.modelLoadOptions(modelPath));
      this.embedModel = model;
      // Model loading counts as activity - ping to keep alive
      this.touchActivity();
@ -739,21 +1042,23 @@ export class LlamaCpp implements LLM {
  private async computeParallelism(perContextMB: number): Promise<number> {
    const llama = await this.ensureLlama();

-    if (llama.gpu) {
+    if (!this.isCpuOffloadForced() && llama.gpu) {
      try {
        const vram = await llama.getVramState();
        const freeMB = vram.free / (1024 * 1024);
        const maxByVram = Math.floor((freeMB * 0.25) / perContextMB);
-        return Math.max(1, Math.min(8, maxByVram));
+        const computed = Math.max(1, Math.min(8, maxByVram));
+        return resolveSafeParallelism({ gpu: llama.gpu, computed });
      } catch {
-        return 2;
+        return resolveSafeParallelism({ gpu: llama.gpu, computed: 2 });
      }
    }

    // CPU: split cores across contexts. At least 4 threads per context.
    const cores = llama.cpuMathCores || 4;
    const maxContexts = Math.floor(cores / 4);
-    return Math.max(1, Math.min(4, maxContexts));
+    const computed = Math.max(1, Math.min(4, maxContexts));
+    return resolveSafeParallelism({ gpu: false, computed });
  }

  /**
@ -762,7 +1067,7 @@ export class LlamaCpp implements LLM {
   */
  private async threadsPerContext(parallelism: number): Promise<number> {
    const llama = await this.ensureLlama();
-    if (llama.gpu) return 0; // GPU: let the library decide
+    if (!this.isCpuOffloadForced() && llama.gpu) return 0; // GPU: let the library decide
    const cores = llama.cpuMathCores || 4;
    return Math.max(1, Math.floor(cores / parallelism));
  }
@ -830,7 +1135,7 @@ export class LlamaCpp implements LLM {
      this.generateModelLoadPromise = (async () => {
        const llama = await this.ensureLlama();
        const modelPath = await this.resolveModel(this.generateModelUri);
-        const model = await llama.loadModel({ modelPath });
+        const model = await llama.loadModel(this.modelLoadOptions(modelPath));
        this.generateModel = model;
        return model;
      })();
@ -862,7 +1167,7 @@ export class LlamaCpp implements LLM {
    this.rerankModelLoadPromise = (async () => {
      const llama = await this.ensureLlama();
      const modelPath = await this.resolveModel(this.rerankModelUri);
-      const model = await llama.loadModel({ modelPath });
+      const model = await llama.loadModel(this.modelLoadOptions(modelPath));
      this.rerankModel = model;
      // Model loading counts as activity - ping to keep alive
      this.touchActivity();
@ -911,9 +1216,8 @@ export class LlamaCpp implements LLM {
        try {
          this.rerankContexts.push(await model.createRankingContext({
            contextSize: LlamaCpp.RERANK_CONTEXT_SIZE,
-            flashAttention: true,
            ...(threads > 0 ? { threads } : {}),
-          } as any));
+          }));
        } catch {
          if (this.rerankContexts.length === 0) {
            // Flash attention might not be supported — retry without it
@ -1194,6 +1498,7 @@ export class LlamaCpp implements LLM {
    // Create fresh context -> sequence -> session for each call
    const context = await this.generateModel!.createContext();
    const sequence = context.getSequence();
+    const { LlamaChatSession } = await loadNodeLlamaCpp();
    const session = new LlamaChatSession({ contextSequence: sequence });

    const maxTokens = options.maxTokens ?? 150;
@ -1208,7 +1513,7 @@ export class LlamaCpp implements LLM {
        temperature,
        topK: 20,
        topP: 0.8,
-        onTextChunk: (text) => {
+        onTextChunk: (text: string) => {
          result += text;
        },
      });
@ -1274,6 +1579,7 @@ export class LlamaCpp implements LLM {
      contextSize: this.expandContextSize,
    });
    const sequence = genContext.getSequence();
+    const { LlamaChatSession } = await loadNodeLlamaCpp();
    const session = new LlamaChatSession({ contextSequence: sequence });

    try {
@ -1452,17 +1758,18 @@ export class LlamaCpp implements LLM {
    cpuCores: number;
  }> {
    const llama = await this.ensureLlama(options.allowBuild ?? true);
-    const gpuDevices = await llama.getGpuDeviceNames();
+    const cpuForced = this.isCpuOffloadForced();
+    const gpuDevices = cpuForced ? [] : await llama.getGpuDeviceNames();
    let vram: { total: number; used: number; free: number } | undefined;
-    if (llama.gpu) {
+    if (!cpuForced && llama.gpu) {
      try {
        const state = await llama.getVramState();
        vram = { total: state.total, used: state.used, free: state.free };
      } catch { /* no vram info */ }
    }
    return {
-      gpu: llama.gpu,
-      gpuOffloading: llama.supportsGpuOffloading,
+      gpu: cpuForced ? false : llama.gpu,
+      gpuOffloading: !cpuForced && llama.supportsGpuOffloading,
      gpuDevices,
      vram,
      cpuCores: llama.cpuMathCores,
@ -1482,22 +1789,37 @@ export class LlamaCpp implements LLM {
      this.inactivityTimer = null;
    }

-    // Disposing llama cascades to models and contexts automatically
-    // See: https://node-llama-cpp.withcat.ai/guide/objects-lifecycle
-    // Note: llama.dispose() can hang indefinitely, so we use a timeout
-    if (this.llama) {
-      const disposePromise = this.llama.dispose();
-      const timeoutPromise = new Promise<void>((resolve) => setTimeout(resolve, 1000));
-      await Promise.race([disposePromise, timeoutPromise]);
+    // Explicitly dispose in dependency order: contexts first, then models, then llama.
+    // Relying only on llama.dispose() leaves Metal resource sets alive until process
+    // finalization on Apple Silicon, where ggml_metal_device_free can abort after
+    // otherwise-successful CLI output (#368).
+    for (const ctx of this.embedContexts) {
+      await disposeWithTimeout("embedding context", () => ctx.dispose());
+    }
+    this.embedContexts = [];
+
+    for (const ctx of this.rerankContexts) {
+      await disposeWithTimeout("rerank context", () => ctx.dispose());
+    }
+    this.rerankContexts = [];
+
+    if (this.embedModel) {
+      await disposeWithTimeout("embedding model", () => this.embedModel!.dispose());
+      this.embedModel = null;
+    }
+    if (this.generateModel) {
+      await disposeWithTimeout("generation model", () => this.generateModel!.dispose());
+      this.generateModel = null;
+    }
+    if (this.rerankModel) {
+      await disposeWithTimeout("rerank model", () => this.rerankModel!.dispose());
+      this.rerankModel = null;
    }

-    // Clear references
-    this.embedContexts = [];
-    this.rerankContexts = [];
-    this.embedModel = null;
-    this.generateModel = null;
-    this.rerankModel = null;
-    this.llama = null;
+    if (this.llama) {
+      await disposeWithTimeout("llama runtime", () => this.llama!.dispose());
+      this.llama = null;
+    }

    // Clear any in-flight load/create promises
    this.embedModelLoadPromise = null;
@ -1752,6 +2074,66 @@ export function canUnloadLLM(): boolean {
  return defaultSessionManager.canUnload();
 }

+// =============================================================================
+// Darwin Metal exit-crash mitigation
+// =============================================================================
+//
+// libggml-metal on macOS keeps allocated model memory wired via "residency
+// sets" with a 180-second keep_alive timer (added in ggml-org/llama.cpp#11427).
+// The process-static `std::vector<std::unique_ptr<ggml_metal_device>>`
+// destructor fires during libc `exit()` → `__cxa_finalize_ranges` and asserts
+// `[rsets->data count] == 0` — but the keep_alive hasn't expired, so the
+// assertion fails and `ggml_abort` dumps a multi-kilobyte stack trace to
+// stderr after the user-visible output. See ggml-org/llama.cpp#22593.
+//
+// No JS-side dispose call (`llama.dispose()`, `model.dispose()`, etc.) can
+// prevent it: the static destructor runs after every JS-reachable cleanup,
+// and `process.reallyExit` on Node calls libc `exit()` not `_exit()` (it
+// does NOT skip C++ static destructors — verified in
+// node/src/api/environment.cc).
+//
+// The actual fix is to disable residency sets via `GGML_METAL_NO_RESIDENCY=1`,
+// which we set from `bin/qmd` before Node loads the native binding. For QMD's
+// short-lived CLI workflow this has no measurable cost (subsequent calls
+// don't reuse the warm mapping). The functions below report whether that
+// mitigation is in effect — kept here, in the module that depends on the
+// underlying resource, so doctor can answer "is the protection active?"
+// without reaching into env handling directly.
+//
+// Setting `QMD_METAL_KEEP_RESIDENCY=1` opts back into residency sets (with
+// the visible-noise consequences). The legacy `QMD_DISABLE_DARWIN_SAFE_EXIT`
+// env var is accepted as a no-op alias for back-compat; it had no effect on
+// Node prior to this fix.
+
+/**
+ * Whether QMD's darwin Metal exit-crash mitigation is active in this process:
+ *   true  → residency sets disabled, process exit completes silently
+ *   false → either non-darwin, or `QMD_METAL_KEEP_RESIDENCY=1` overrode it,
+ *           in which case the libggml-metal teardown assertion may fire
+ */
+export function isDarwinMetalMitigationActive(): boolean {
+  if (process.platform !== "darwin") return false;
+  if (process.env.QMD_METAL_KEEP_RESIDENCY === "1") return false;
+  return process.env.GGML_METAL_NO_RESIDENCY === "1";
+}
+
+/**
+ * Compatibility shim: previous releases installed a `process.on('exit')` hook
+ * that tried to skip the C++ static destructor by calling `process.reallyExit`.
+ * That mechanism didn't work on Node (Environment::Exit still calls libc
+ * `exit()`), so it was replaced by `GGML_METAL_NO_RESIDENCY=1` from bin/qmd.
+ * Kept as a no-op for code paths that still call it; safe to remove once no
+ * production launcher predates the residency-set fix.
+ */
+export function installDarwinExitGuard(): void {
+  // Intentional no-op. See isDarwinMetalMitigationActive() for the real check.
+}
+
+/** @deprecated Replaced by isDarwinMetalMitigationActive. */
+export function isDarwinExitGuardInstalled(): boolean {
+  return isDarwinMetalMitigationActive();
+}
+
 // =============================================================================
 // Singleton for default LlamaCpp instance
 // =============================================================================
@ -1759,7 +2141,9 @@ export function canUnloadLLM(): boolean {
 let defaultLlamaCpp: LlamaCpp | null = null;

 /**
- * Get the default LlamaCpp instance (creates one if needed)
+ * Get the default LlamaCpp instance (creates one if needed). The darwin Metal
+ * exit-crash mitigation is applied by the launcher (bin/qmd), not by the
+ * constructor; see isDarwinMetalMitigationActive().
 */
 export function getDefaultLlamaCpp(): LlamaCpp {
  if (!defaultLlamaCpp) {
@ -1769,12 +2153,24 @@ export function getDefaultLlamaCpp(): LlamaCpp {
 }

 /**
- * Set a custom default LlamaCpp instance (useful for testing)
+ * Set a custom default LlamaCpp instance (useful for testing). Setting a
+ * non-null instance also calls installDarwinExitGuard(), which is now a
+ * no-op compatibility shim; the real mitigation is GGML_METAL_NO_RESIDENCY=1,
+ * set by bin/qmd before the native binding loads.
 */
 export function setDefaultLlamaCpp(llm: LlamaCpp | null): void {
+  if (llm !== null) installDarwinExitGuard();
  defaultLlamaCpp = llm;
 }

+/**
+ * Peek at the default LlamaCpp instance without instantiating one. Used by
+ * doctor and lifecycle diagnostics.
+ */
+export function hasDefaultLlamaCpp(): boolean {
+  return defaultLlamaCpp !== null;
+}
+
 /**
 * Dispose the default LlamaCpp instance if it exists.
 * Call this before process exit to prevent NAPI crashes.
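
Review note on the `inspectGgufFile` change in the file above: the classification step (magic bytes first, then an HTML sniff, otherwise invalid) can be exercised without touching disk. The sketch below is a minimal illustration of that ordering, not code from the patch. `classifySniff` and `SniffKind` are hypothetical names, and it assumes the patch's `GGUF_MAGIC` is the four ASCII bytes "GGUF". It decodes bytes one-to-one instead of as UTF-8, which is equivalent for the ASCII markers being searched.

```typescript
// Hypothetical helper for illustration only; not part of the patch.
type SniffKind = "gguf" | "html" | "invalid";

// Assumption: the magic is the four ASCII bytes "GGUF" (0x47 0x47 0x55 0x46).
const GGUF_MAGIC_BYTES = [0x47, 0x47, 0x55, 0x46];

function classifySniff(sniff: Uint8Array): SniffKind {
  // A GGUF file starts with the 4-byte magic; anything shorter cannot match.
  const hasMagic =
    sniff.length >= 4 && GGUF_MAGIC_BYTES.every((byte, i) => sniff[i] === byte);
  if (hasMagic) return "gguf";

  // Decode byte-for-byte (Latin-1 style). Only ASCII markers are searched,
  // so this gives the same answer as a UTF-8 decode for this check.
  const text = Array.from(sniff, (b) => String.fromCharCode(b))
    .join("")
    .toLowerCase();

  // Proxies, firewalls, and captive portals typically answer with HTML.
  if (text.includes("<!doctype") || text.includes("<html")) return "html";

  return "invalid";
}
```

Checking the magic before the HTML sniff matters: a valid model whose first 512 bytes happen to contain the text `<html` is still reported as GGUF, which matches the order of the checks in the diff.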
--- a/src/mcp/server.ts
+++ b/src/mcp/server.ts
@ -32,8 +32,6 @@ import {
 import { getConfigPath } from "../collections.js";
 import { enableProductionMode } from "../store.js";

-enableProductionMode();
-
 // =============================================================================
 // Types for structured content
 // =============================================================================
@ -44,6 +42,7 @@ type SearchResultItem = {
  title: string;
  score: number;
  context: string | null;
+  line: number;   // Absolute line in source markdown
  snippet: string;
 };

@ -108,7 +107,6 @@ function getPackageVersion(): string {
 */
 async function buildInstructions(store: QMDStore): Promise<string> {
  const status = await store.getStatus();
-  const contexts = await store.listContexts();
  const globalCtx = await store.getGlobalContext();
  const lines: string[] = [];

@ -117,15 +115,13 @@ async function buildInstructions(store: QMDStore): Promise<string> {
  if (globalCtx) lines.push(`Context: ${globalCtx}`);

  // --- What's searchable? ---
+  // Emit names only — the per-collection doc counts and descriptions can run to ~1.5 KB
+  // across a dozen collections, and the same info is available on demand via the `status` tool.
  if (status.collections.length > 0) {
    lines.push("");
-    lines.push("Collections (scope with `collection` parameter):");
-    for (const col of status.collections) {
-      // Find root context for this collection
-      const rootCtx = contexts.find(c => c.collection === col.name && (c.path === "" || c.path === "/"));
-      const desc = rootCtx ? ` — ${rootCtx.context}` : "";
-      lines.push(`  - "${col.name}" (${col.documents} docs)${desc}`);
-    }
+    const names = status.collections.map(c => c.name).join(", ");
+    lines.push(`Collections (scope with \`collections\` parameter): ${names}`);
+    lines.push("Call the `status` tool for collection descriptions, paths, and per-collection doc counts.");
  }

  // --- Capability gaps ---
@ -155,7 +151,7 @@ async function buildInstructions(store: QMDStore): Promise<string> {
  // --- Retrieval workflow ---
  lines.push("");
  lines.push("Retrieval:");
-  lines.push("  - `get` — single document by path or docid (#abc123). Supports line offset (`file.md:100`).");
+  lines.push("  - `get` — single document by path or docid (#abc123). Supports a line-range suffix: `file.md:100` (from line 100) or `file.md:100:40` (40 lines from line 100).");
  lines.push("  - `multi_get` — batch retrieve by glob (`journals/2025-05*.md`) or comma-separated list.");

  // --- Non-obvious things that prevent mistakes ---
@ -244,6 +240,8 @@ async function createMcpServer(store: QMDStore): Promise<McpServer> {
      title: "Query",
      description: `Search the knowledge base using a query document — one or more typed sub-queries combined for best recall.

+Each result includes a \`line\` field with the absolute 1-indexed line of the best match in the source markdown. To read more context around a hit, call \`get(file, fromLine = max(1, line - 20), maxLines = 80, lineNumbers = true)\`.
+
 ## Query Types

 **lex** — BM25 keyword search. Fast, exact, no LLM needed.
@ -333,6 +331,7 @@ Intent-aware lex (C++ performance, not sports):
        collections: effectiveCollections.length > 0 ? effectiveCollections : undefined,
        limit,
        minScore,
+        candidateLimit,
        rerank,
        intent,
      });
@ -343,13 +342,14 @@ Intent-aware lex (C++ performance, not sports):
        || searches[0]?.query || "";

      const filtered: SearchResultItem[] = results.map(r => {
-        const { line, snippet } = extractSnippet(r.bestChunk, primaryQuery, 300, undefined, undefined, intent);
+        const { line, snippet } = extractSnippet(r.body, primaryQuery, 300, r.bestChunkPos, r.bestChunk.length, intent);
        return {
          docid: `#${r.docid}`,
          file: r.displayPath,
          title: r.title,
          score: Math.round(r.score * 100) / 100,
          context: r.context,
+          line,
          snippet: addLineNumbers(snippet, line),
        };
      });
@ -372,21 +372,31 @@ Intent-aware lex (C++ performance, not sports):
      description: "Retrieve the full content of a document by its file path or docid. Use paths or docids (#abc123) from search results. Suggests similar files if not found.",
      annotations: { readOnlyHint: true, openWorldHint: false },
      inputSchema: {
-        file: z.string().describe("File path or docid from search results (e.g., 'pages/meeting.md', '#abc123', or 'pages/meeting.md:100' to start at line 100)"),
+        file: z.string().describe("File path or docid from search results. Supports a line-range suffix: 'pages/meeting.md:100' starts at line 100; 'pages/meeting.md:100:40' (or '#abc123:100:40') reads 40 lines from line 100."),
        fromLine: z.number().optional().describe("Start from this line number (1-indexed)"),
        maxLines: z.number().optional().describe("Maximum number of lines to return"),
-        lineNumbers: z.boolean().optional().default(false).describe("Add line numbers to output (format: 'N: content')"),
+        lineNumbers: z.boolean().optional().default(true).describe("Add line numbers to output (format: 'N: content'). On by default; set false for raw content."),
      },
    },
    async ({ file, fromLine, maxLines, lineNumbers }) => {
-      // Support :line suffix in `file` (e.g. "foo.md:120") when fromLine isn't provided
+      // Support :line and :from:count suffixes in `file` (e.g. "foo.md:120" or
+      // "foo.md:120:40"). Explicit fromLine/maxLines args take precedence.
      let parsedFromLine = fromLine;
+      let parsedMaxLines = maxLines;
      let lookup = file;
-      const colonMatch = lookup.match(/:(\d+)$/);
-      if (colonMatch && colonMatch[1] && parsedFromLine === undefined) {
-        parsedFromLine = parseInt(colonMatch[1], 10);
-        lookup = lookup.slice(0, -colonMatch[0].length);
+      const rangeMatch = lookup.match(/:(\d+):(\d+)$/);
+      if (rangeMatch) {
+        if (parsedFromLine === undefined) parsedFromLine = parseInt(rangeMatch[1]!, 10);
+        if (parsedMaxLines === undefined) parsedMaxLines = parseInt(rangeMatch[2]!, 10);
+        lookup = lookup.slice(0, -rangeMatch[0].length);
+      } else {
+        const colonMatch = lookup.match(/:(\d+)$/);
+        if (colonMatch && colonMatch[1] && parsedFromLine === undefined) {
+          parsedFromLine = parseInt(colonMatch[1], 10);
+          lookup = lookup.slice(0, -colonMatch[0].length);
+        }
      }
+      if (parsedFromLine !== undefined) parsedFromLine = Math.max(1, parsedFromLine);

      const result = await store.get(lookup, { includeBody: false });

@ -401,7 +411,7 @@ Intent-aware lex (C++ performance, not sports):
        };
      }

-      const body = await store.getDocumentBody(result.filepath, { fromLine: parsedFromLine, maxLines }) ?? "";
+      const body = await store.getDocumentBody(result.filepath, { fromLine: parsedFromLine, maxLines: parsedMaxLines }) ?? "";
      let text = body;
      if (lineNumbers) {
        const startLine = parsedFromLine || 1;
@ -440,7 +450,7 @@ Intent-aware lex (C++ performance, not sports):
        pattern: z.string().describe("Glob pattern or comma-separated list of file paths"),
        maxLines: z.number().optional().describe("Maximum lines per file"),
        maxBytes: z.number().optional().default(10240).describe("Skip files larger than this (default: 10240 = 10KB)"),
-        lineNumbers: z.boolean().optional().default(false).describe("Add line numbers to output (format: 'N: content')"),
+        lineNumbers: z.boolean().optional().default(true).describe("Add line numbers to output (format: 'N: content'). On by default; set false for raw content."),
      },
    },
    async ({ pattern, maxLines, maxBytes, lineNumbers }) => {
@ -540,10 +550,20 @@ Intent-aware lex (C++ performance, not sports):
 // Transport: stdio (default)
 // =============================================================================

-export async function startMcpServer(): Promise<void> {
+export type McpStartupOptions = {
+  dbPath?: string;
+};
+
+export async function startMcpServer(options: McpStartupOptions = {}): Promise<void> {
+  // Opt into production mode only when the MCP server is actually started,
+  // not when this module is merely imported for its exports. Enabling it at
+  // import time used to flip the global production flag and break test
+  // isolation for downstream suites that expect the default (development)
+  // database path behaviour.
+  enableProductionMode();
  const configPath = getConfigPath();
  const store = await createStore({
-    dbPath: getDefaultDbPath(),
+    dbPath: options.dbPath ?? getDefaultDbPath(),
    ...(existsSync(configPath) ? { configPath } : {}),
  });
  const server = await createMcpServer(store);
@ -565,10 +585,17 @@ export type HttpServerHandle = {
 * Start MCP server over Streamable HTTP (JSON responses, no SSE).
 * Binds to localhost only. Returns a handle for shutdown and port discovery.
 */
-export async function startMcpHttpServer(port: number, options?: { quiet?: boolean }): Promise<HttpServerHandle> {
+export async function startMcpHttpServer(
+  port: number,
+  options: ({ quiet?: boolean } & McpStartupOptions) = {},
+): Promise<HttpServerHandle> {
+  // See startMcpServer() for the rationale — flip production mode here so the
+  // HTTP transport resolves the real database path, without leaking state into
+  // callers that only import this module for its exports (e.g. tests).
+  enableProductionMode();
  const configPath = getConfigPath();
  const store = await createStore({
-    dbPath: getDefaultDbPath(),
+    dbPath: options.dbPath ?? getDefaultDbPath(),
    ...(existsSync(configPath) ? { configPath } : {}),
  });

@ -608,9 +635,21 @@ export async function startMcpHttpServer(port: number, options?: { quiet?: boole
    return new Date().toISOString().slice(11, 23); // HH:mm:ss.SSS
  }

+  type JsonRpcLikeBody = {
+    method?: unknown;
+    params?: {
+      name?: unknown;
+      arguments?: Record<string, unknown>;
+    };
+  };
+  type RestSearchInput = {
+    type?: unknown;
+    query?: unknown;
+  };
+
  /** Extract a human-readable label from a JSON-RPC body */
-  function describeRequest(body: any): string {
-    const method = body?.method ?? "unknown";
+  function describeRequest(body: JsonRpcLikeBody): string {
+    const method = typeof body.method === "string" ? body.method : "unknown";
    if (method === "tools/call") {
      const tool = body.params?.name ?? "?";
      const args = body.params?.arguments;
@ -654,7 +693,7 @@ export async function startMcpHttpServer(port: number, options?: { quiet?: boole
      // REST endpoint: POST /query (alias: /search) — structured search without MCP protocol
      if ((pathname === "/query" || pathname === "/search") && nodeReq.method === "POST") {
        const rawBody = await collectBody(nodeReq);
-        const params = JSON.parse(rawBody);
+        const params = JSON.parse(rawBody) as Record<string, unknown>;

        // Validate required fields
        if (!params.searches || !Array.isArray(params.searches)) {
@ -664,35 +703,39 @@ export async function startMcpHttpServer(port: number, options?: { quiet?: boole
        }

        // Map to internal format
-        const queries: ExpandedQuery[] = params.searches.map((s: any) => ({
+        const searches = params.searches as RestSearchInput[];
+        const queries: ExpandedQuery[] = searches.map((s) => ({
          type: s.type as 'lex' | 'vec' | 'hyde',
          query: String(s.query || ""),
        }));

        // Use default collections if none specified
-        const effectiveCollections = params.collections ?? defaultCollectionNames;
+        const effectiveCollections = Array.isArray(params.collections) ? params.collections.map(String) : defaultCollectionNames;

        const results = await store.search({
          queries,
          collections: effectiveCollections.length > 0 ? effectiveCollections : undefined,
-          limit: params.limit ?? 10,
-          minScore: params.minScore ?? 0,
-          intent: params.intent,
+          limit: typeof params.limit === "number" ? params.limit : 10,
+          minScore: typeof params.minScore === "number" ? params.minScore : 0,
+          candidateLimit: typeof params.candidateLimit === "number" ? params.candidateLimit : undefined,
+          intent: typeof params.intent === "string" ? params.intent : undefined,
+          rerank: typeof params.rerank === "boolean" ? params.rerank : undefined,
        });

        // Use first lex or vec query for snippet extraction
-        const primaryQuery = params.searches.find((s: any) => s.type === 'lex')?.query
-          || params.searches.find((s: any) => s.type === 'vec')?.query
-          || params.searches[0]?.query || "";
+        const primaryQuery = searches.find((s) => s.type === 'lex')?.query
+          || searches.find((s) => s.type === 'vec')?.query
+          || searches[0]?.query || "";

        const formatted = results.map(r => {
-          const { line, snippet } = extractSnippet(r.bestChunk, primaryQuery, 300);
+          const { line, snippet } = extractSnippet(r.body, String(primaryQuery), 300, r.bestChunkPos, r.bestChunk.length, typeof params.intent === "string" ? params.intent : undefined);
          return {
            docid: `#${r.docid}`,
-            file: r.displayPath,
+            file: `qmd://${encodeQmdPath(r.displayPath)}`,
            title: r.title,
            score: Math.round(r.score * 100) / 100,
            context: r.context,
+            line,
            snippet: addLineNumbers(snippet, line),
          };
        });
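The `get` handler in this file accepts `:line` and `:from:count` suffixes on `file`, with explicit `fromLine`/`maxLines` arguments taking precedence. The parsing can be sketched in isolation as below. `parseLineSuffix` and its return shape are hypothetical names used only for illustration; the logic is copied from the hunk, not from a separate helper in the qmd source.

```typescript
// Hypothetical standalone helper mirroring the suffix handling in the handler.
type ParsedLookup = {
  lookup: string;
  fromLine: number | undefined;
  maxLines: number | undefined;
};

function parseLineSuffix(file: string, fromLine?: number, maxLines?: number): ParsedLookup {
  let lookup = file;
  let parsedFromLine = fromLine;
  let parsedMaxLines = maxLines;

  // Try the two-number form first so "foo.md:120:40" is not read as ":40".
  const rangeMatch = lookup.match(/:(\d+):(\d+)$/);
  if (rangeMatch) {
    // Explicit arguments win; the suffix only fills in missing values.
    if (parsedFromLine === undefined) parsedFromLine = parseInt(rangeMatch[1]!, 10);
    if (parsedMaxLines === undefined) parsedMaxLines = parseInt(rangeMatch[2]!, 10);
    lookup = lookup.slice(0, -rangeMatch[0].length);
  } else {
    const colonMatch = lookup.match(/:(\d+)$/);
    // As in the handler, a single suffix is stripped only when no explicit
    // fromLine was supplied.
    if (colonMatch && parsedFromLine === undefined) {
      parsedFromLine = parseInt(colonMatch[1]!, 10);
      lookup = lookup.slice(0, -colonMatch[0].length);
    }
  }

  // Line numbers are 1-indexed, so a requested line 0 is clamped to 1.
  if (parsedFromLine !== undefined) parsedFromLine = Math.max(1, parsedFromLine);
  return { lookup, fromLine: parsedFromLine, maxLines: parsedMaxLines };
}
```

One consequence of this ordering: when `fromLine` is passed explicitly together with a single `:line` suffix, the suffix stays in the lookup string, whereas a `:from:count` suffix is always stripped.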
--- a/src/paths.ts
+++ b/src/paths.ts
@ -0,0 +1,5 @@
+import { homedir as osHomedir } from "node:os";
+
+export function qmdHomedir(): string {
+  return process.env.HOME || process.env.USERPROFILE || osHomedir() || "/tmp";
+}
--- a/src/store.ts
+++ b/src/store.ts
--- a/src/test-preload.ts
+++ b/src/test-preload.ts
@ -1,8 +1,19 @@
 /**
 * Test preload file to ensure proper cleanup of native resources.
 *
- * Uses bun:test afterAll to properly dispose of llama.cpp Metal
- * resources before the process exits, avoiding GGML_ASSERT failures.
+ * Uses bun:test afterAll to dispose of llama.cpp Metal resources before
+ * the process exits — necessary on darwin to avoid the upstream rsets
+ * destructor assertion (ggml-org/llama.cpp#22593, fix open as #22595).
+ *
+ * The runner-level mitigation `GGML_METAL_NO_RESIDENCY=1` must be set
+ * BEFORE bun/node starts (libggml-metal reads it via libc getenv at
+ * module load). Bun does not propagate `process.env` writes to libc
+ * setenv, so setting it from here would be a no-op for the native
+ * binding. The env var is injected by:
+ *   - bin/qmd for production CLI runs
+ *   - scripts/test-all.mjs for `npm test`
+ *   - package.json test:bun / test:unit scripts for direct invocation
+ * See CLAUDE.md for invoking `bun test` manually on darwin.
 */
 import { afterAll } from "bun:test";
 import { disposeDefaultLlamaCpp } from "./llm";
--- a/src/types/picomatch.d.ts
+++ b/src/types/picomatch.d.ts
@ -0,0 +1,4 @@
+declare module "picomatch" {
+  export type Matcher = (input: string) => boolean;
+  export default function picomatch(pattern: string | string[], options?: Record<string, unknown>): Matcher;
+}
--- a/test/Containerfile
+++ b/test/Containerfile
@ -13,10 +13,12 @@ ENV PATH="/root/.local/bin:$PATH"
 # Pre-install node and bun
 RUN mise use -g node@latest bun@latest

-# Copy the packed tarball and install via both package managers
-COPY tobilu-qmd-*.tgz /tmp/
-RUN mise exec node@latest -- npm install -g /tmp/tobilu-qmd-*.tgz
-RUN mise exec bun@latest -- bun install -g /tmp/tobilu-qmd-*.tgz
+# Copy the packed tarball and install via both package managers. Keep a stable
+# tarball path for npm-exec/npx-style smoke scenarios.
+COPY tobilu-qmd-*.tgz /tmp/qmd-package.tgz
+RUN cp /tmp/qmd-package.tgz /tmp/tobilu-qmd.tgz
+RUN mise exec node@latest -- npm install -g /tmp/qmd-package.tgz
+RUN mise exec bun@latest -- bun install -g /tmp/qmd-package.tgz

 # Copy test project (src + test + configs) and install deps
 COPY test-src/ /opt/qmd/
--- a/test/ast.test.ts
+++ b/test/ast.test.ts
@ -6,7 +6,7 @@
 */

 import { describe, test, expect } from "vitest";
-import { detectLanguage, getASTBreakPoints, extractSymbols } from "../src/ast.js";
+import { detectLanguage, getASTBreakPoints, extractSymbols, formatGrammarLoadError } from "../src/ast.js";
 import type { SupportedLanguage } from "../src/ast.js";

 // =============================================================================
@ -315,6 +315,16 @@ describe("getASTBreakPoints - error handling", () => {
    // Should either return some partial break points or empty array — not throw
    expect(Array.isArray(points)).toBe(true);
  });
+
+  test("explains missing grammar packages with a repair command", () => {
+    const msg = formatGrammarLoadError(
+      "typescript",
+      new Error("Cannot find module 'tree-sitter-typescript/tree-sitter-typescript.wasm'"),
+    );
+    expect(msg).toContain("tree-sitter-typescript");
+    expect(msg).toContain("bun add tree-sitter-typescript@0.23.2");
+    expect(msg).toContain("falling back to regex");
+  });
 });

 // =============================================================================
--- a/test/bench-score.test.ts
+++ b/test/bench-score.test.ts
@ -99,6 +99,20 @@ describe("scoreResults", () => {
    expect(result.mrr).toBeCloseTo(0.5); // 1/2
  });

+  test("reports recall@1/3/5 and matched documents", () => {
+    const result = scoreResults(
+      ["x.md", "qmd://concepts/a.md", "docs/b.md", "docs/c.md", "docs/d.md"],
+      ["concepts/a.md", "b.md", "missing.md"],
+      3,
+    );
+
+    expect(result.recall_at_1).toBe(0);
+    expect(result.recall_at_3).toBeCloseTo(2 / 3);
+    expect(result.recall_at_5).toBeCloseTo(2 / 3);
+    expect(result.matched_files).toEqual(["concepts/a.md", "b.md"]);
+    expect(result.unmatched_expected_files).toEqual(["missing.md"]);
+  });
+
  test("empty results", () => {
    const result = scoreResults([], ["a.md"], 1);
    expect(result.precision_at_k).toBe(0);
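The new `recall@1/3/5` test expects a `qmd://` result and a path-suffix result to both count as hits. A minimal sketch of a recall computation consistent with those expectations is shown below. The matching rule (strip a `qmd://` prefix, then accept an exact match or a trailing path-segment match) is an assumption inferred from the test data, and `stripScheme`, `matchesExpected`, and `recallAtK` are hypothetical names; the real `scoreResults` may differ.

```typescript
// Hypothetical sketch; not the scoreResults implementation from the bench code.
function stripScheme(p: string): string {
  return p.replace(/^qmd:\/\//, "");
}

// Assumed rule: a result matches when it equals the expected path or ends
// with it on a path-segment boundary ("docs/b.md" matches "b.md").
function matchesExpected(result: string, expected: string): boolean {
  const r = stripScheme(result);
  const e = stripScheme(expected);
  return r === e || r.endsWith("/" + e);
}

// recall@k = (expected files found in the top k results) / (expected files)
function recallAtK(results: string[], expected: string[], k: number): number {
  if (expected.length === 0) return 0;
  const topK = results.slice(0, k);
  const hits = expected.filter((e) => topK.some((r) => matchesExpected(r, e)));
  return hits.length / expected.length;
}
```

With the inputs from the test, only two of the three expected files ever appear in the results, so recall at 3 and at 5 are both 2/3, while the top result alone matches nothing.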
--- a/test/bin-wrapper.test.ts
+++ b/test/bin-wrapper.test.ts
@ -0,0 +1,263 @@
+import { afterEach, describe, expect, test } from "vitest";
+import { chmodSync, copyFileSync, mkdtempSync, mkdirSync, readFileSync, realpathSync, rmSync, symlinkSync, writeFileSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { dirname, join, relative } from "node:path";
+import { execFileSync, spawnSync } from "node:child_process";
+import { fileURLToPath } from "node:url";
+
+const repoRoot = fileURLToPath(new URL("..", import.meta.url));
+const fixtures: string[] = [];
+
+function makeTempFixture() {
+  const root = mkdtempSync(join(tmpdir(), "qmd-bin-wrapper-"));
+  fixtures.push(root);
+  const capturePath = join(root, "capture.txt");
+  const runtimeBin = join(root, "runtime-bin");
+  mkdirSync(runtimeBin, { recursive: true });
+
+  for (const runtime of ["node", "bun"]) {
+    const runtimePath = join(runtimeBin, runtime);
+    if (runtime === "node") {
+      writeFileSync(
+        runtimePath,
+        `#!/bin/sh
+if [ "$(basename "$1")" = "qmd" ]; then
+  exec "${process.execPath}" "$@"
+else
+  {
+    printf '%s\\n' 'node'
+    printf '%s\\n' "$1"
+    shift
+    printf '%s\\n' "$@"
+  } > "$QMD_WRAPPER_CAPTURE"
+fi
+`,
+      );
+    } else {
+      writeFileSync(
+        runtimePath,
+        `#!/bin/sh\n{\n  printf '%s\\n' '${runtime}'\n  printf '%s\\n' "$1"\n  shift\n  printf '%s\\n' "$@"\n} > "$QMD_WRAPPER_CAPTURE"\n`,
+      );
+    }
+    chmodSync(runtimePath, 0o755);
+  }
+
+  return { root, capturePath, runtimeBin };
+}
+
+function makePackage(root: string, packagePath: string, lockfiles: string[] = [], options: { dist?: boolean; source?: boolean; tsx?: boolean; git?: boolean } = {}) {
+  const packageRoot = join(root, packagePath);
+  const includeDist = options.dist ?? true;
+  mkdirSync(join(packageRoot, "bin"), { recursive: true });
+  copyFileSync(join(repoRoot, "bin", "qmd"), join(packageRoot, "bin", "qmd"));
+  chmodSync(join(packageRoot, "bin", "qmd"), 0o755);
+  if (includeDist) {
+    mkdirSync(join(packageRoot, "dist", "cli"), { recursive: true });
+    writeFileSync(join(packageRoot, "dist", "cli", "qmd.js"), "// fixture\n");
+  }
+  if (options.source) {
+    mkdirSync(join(packageRoot, "src", "cli"), { recursive: true });
+    writeFileSync(join(packageRoot, "src", "cli", "qmd.ts"), "// source fixture\n");
+  }
+  if (options.tsx) {
+    mkdirSync(join(packageRoot, "node_modules", "tsx", "dist"), { recursive: true });
+    writeFileSync(join(packageRoot, "node_modules", "tsx", "dist", "cli.mjs"), "// tsx fixture\n");
+  }
+  if (options.git) {
+    mkdirSync(join(packageRoot, ".git"), { recursive: true });
+  }
+  for (const lockfile of lockfiles) {
+    writeFileSync(join(packageRoot, lockfile), "");
+  }
+  return packageRoot;
+}
+
+function symlinkRelative(target: string, linkPath: string) {
+  mkdirSync(dirname(linkPath), { recursive: true });
+  symlinkSync(relative(dirname(linkPath), target), linkPath);
+}
+
+function runWrapper(commandPath: string, runtimeBin: string, capturePath: string, env: Record<string, string> = {}) {
+  rmSync(capturePath, { force: true });
+  execFileSync(commandPath, ["--version"], {
+    env: {
+      ...process.env,
+      ...env,
+      PATH: `${runtimeBin}:${process.env.PATH ?? ""}`,
+      QMD_WRAPPER_CAPTURE: capturePath,
+    },
+    stdio: ["ignore", "pipe", "pipe"],
+  });
+  const [runtime, scriptPath, ...args] = readFileSync(capturePath, "utf8").trimEnd().split("\n");
+  return { runtime, scriptPath, args };
+}
+
+afterEach(() => {
+  for (const fixture of fixtures.splice(0)) {
+    rmSync(fixture, { recursive: true, force: true });
+  }
+});
+
+describe("bin/qmd package wrapper", () => {
+  test("direct package invocation resolves dist/cli/qmd.js from the package root", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "node_modules/@tobilu/qmd");
+
+    const result = runWrapper(join(packageRoot, "bin", "qmd"), runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("node");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "dist", "cli", "qmd.js")));
+    expect(result.args).toEqual(["--version"]);
+  });
+
+  test("npm/Homebrew global bin symlink resolves scoped package path", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "opt/homebrew/lib/node_modules/@tobilu/qmd");
+    const globalBin = join(root, "opt", "homebrew", "bin", "qmd");
+    symlinkRelative(join(packageRoot, "bin", "qmd"), globalBin);
+
+    const result = runWrapper(globalBin, runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("node");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "dist", "cli", "qmd.js")));
+  });
+
+  test("multi-hop global bin symlink chain resolves to the real package root", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "opt/homebrew/lib/node_modules/@tobilu/qmd");
+    const globalBin = join(root, "opt", "homebrew", "bin", "qmd");
+    const shim = join(root, "opt", "homebrew", "Cellar", "qmd", "current", "bin", "qmd");
+    symlinkRelative(join(packageRoot, "bin", "qmd"), shim);
+    symlinkRelative(shim, globalBin);
+
+    const result = runWrapper(globalBin, runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("node");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "dist", "cli", "qmd.js")));
+  });
+
+  test("linuxbrew global bin symlink resolves lib/node_modules scoped package path", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "home/linuxbrew/.linuxbrew/lib/node_modules/@tobilu/qmd");
+    const globalBin = join(root, "home", "linuxbrew", ".linuxbrew", "bin", "qmd");
+    symlinkRelative(join(packageRoot, "bin", "qmd"), globalBin);
+
+    const result = runWrapper(globalBin, runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("node");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "dist", "cli", "qmd.js")));
+  });
+
+  test("npx scoped package .bin symlink resolves @tobilu/qmd package path", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "npm/_npx/abc123/node_modules/@tobilu/qmd");
+    const npxBin = join(root, "npm", "_npx", "abc123", "node_modules", ".bin", "qmd");
+    symlinkRelative(join(packageRoot, "bin", "qmd"), npxBin);
+
+    const result = runWrapper(npxBin, runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("node");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "dist", "cli", "qmd.js")));
+  });
+
+  test("bun global symlink uses bun when package-local bun lockfile exists", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "home/user/.bun/install/global/node_modules/@tobilu/qmd", ["bun.lock"]);
+    const bunBin = join(root, "home", "user", ".bun", "bin", "qmd");
+    symlinkRelative(join(packageRoot, "bin", "qmd"), bunBin);
+
+    const result = runWrapper(bunBin, runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("bun");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "dist", "cli", "qmd.js")));
+  });
+
+  test("ambient BUN_INSTALL alone does not select bun for an npm-installed package", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "opt/homebrew/lib/node_modules/@tobilu/qmd");
+    const globalBin = join(root, "opt", "homebrew", "bin", "qmd");
+    symlinkRelative(join(packageRoot, "bin", "qmd"), globalBin);
+
+    const result = runWrapper(globalBin, runtimeBin, capturePath, { BUN_INSTALL: join(root, ".bun") });
+
+    expect(result.runtime).toBe("node");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "dist", "cli", "qmd.js")));
+  });
+
+  test("package-lock.json takes priority over bun lockfiles", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "node_modules/@tobilu/qmd", ["package-lock.json", "bun.lock"]);
+
+    const result = runWrapper(join(packageRoot, "bin", "qmd"), runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("node");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "dist", "cli", "qmd.js")));
+  });
+
+  test("packaged tree uses dist even if source files are present", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "node_modules/@tobilu/qmd", ["bun.lock"], { source: true });
+
+    const result = runWrapper(join(packageRoot, "bin", "qmd"), runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("bun");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "dist", "cli", "qmd.js")));
+  });
+
+  test("prefers source with bun in a Bun checkout even when dist exists", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "qmd", ["bun.lock"], { source: true, git: true });
+
+    const result = runWrapper(join(packageRoot, "bin", "qmd"), runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("bun");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "src", "cli", "qmd.ts")));
+    expect(result.args).toEqual(["--version"]);
+  });
+
+  test("prefers source through tsx in a Node checkout even when dist exists", () => {
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "qmd", [], { source: true, tsx: true, git: true });
+
+    const result = runWrapper(join(packageRoot, "bin", "qmd"), runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("node");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "node_modules", "tsx", "dist", "cli.mjs")));
+    expect(result.args).toEqual([realpathSync(join(packageRoot, "src", "cli", "qmd.ts")), "--version"]);
+  });
+
+  test("source checkout with both bun.lock and package-lock.json prefers node+tsx", () => {
+    // Mirrors the dist-mode "npm priority" rule: a working tree that has both
+    // lockfiles (because the user ran `npm install` against a repo that also
+    // ships bun.lock) installed native modules for Node's ABI, so source mode
+    // must route through tsx to avoid better-sqlite3 / sqlite-vec mismatches.
+    const { root, runtimeBin, capturePath } = makeTempFixture();
+    const packageRoot = makePackage(root, "qmd", ["bun.lock", "package-lock.json"], { source: true, tsx: true, git: true });
+
+    const result = runWrapper(join(packageRoot, "bin", "qmd"), runtimeBin, capturePath);
+
+    expect(result.runtime).toBe("node");
+    expect(result.scriptPath).toBe(realpathSync(join(packageRoot, "node_modules", "tsx", "dist", "cli.mjs")));
+    expect(result.args).toEqual([realpathSync(join(packageRoot, "src", "cli", "qmd.ts")), "--version"]);
+  });
+
+  test("explains how to build when dist is missing and source cannot run", () => {
+    const { root, runtimeBin } = makeTempFixture();
+    const packageRoot = makePackage(root, "qmd", [], { dist: false });
+
+    const result = spawnSync(join(packageRoot, "bin", "qmd"), ["--version"], {
+      env: {
+        ...process.env,
+        PATH: `${runtimeBin}:${process.env.PATH ?? ""}`,
+      },
+      encoding: "utf8",
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+
+    expect(result.status).toBe(1);
+    expect(result.stderr).toContain("qmd is not built");
+    expect(result.stderr).toContain("bun install && bun run build");
+    expect(result.stderr).toContain("npm install && npm run build");
+    expect(result.stderr).toContain("qmd doctor");
+  });
+});
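Taken together, the wrapper tests above describe a small decision table for which runtime and entry point `bin/qmd` picks. The sketch below restates that table as a pure function. It is inferred only from the test expectations, not from the wrapper script itself, and `PackageLayout`, `Launch`, and `chooseLaunch` are hypothetical names.

```typescript
// Hypothetical model of the wrapper's choices, derived from the tests only.
type PackageLayout = {
  hasDist: boolean;        // dist/cli/qmd.js exists
  hasSource: boolean;      // src/cli/qmd.ts exists
  hasTsx: boolean;         // node_modules/tsx/dist/cli.mjs exists
  isGitCheckout: boolean;  // .git directory present in the package root
  hasPackageLock: boolean; // package-lock.json present
  hasBunLock: boolean;     // a bun lockfile is present
};

type Launch =
  | { runtime: "node" | "bun"; entry: "dist" | "source" | "tsx-source" }
  | { error: string };

function chooseLaunch(p: PackageLayout): Launch {
  // package-lock.json takes priority over bun lockfiles; bun is chosen only
  // when a package-local bun lockfile exists and no npm lockfile does.
  const runtime: "node" | "bun" = !p.hasPackageLock && p.hasBunLock ? "bun" : "node";

  // Source mode applies to git checkouts only; packaged trees use dist even
  // if source files happen to be present.
  if (p.isGitCheckout && p.hasSource) {
    if (runtime === "bun") return { runtime, entry: "source" };
    if (p.hasTsx) return { runtime, entry: "tsx-source" };
  }
  if (p.hasDist) return { runtime, entry: "dist" };
  return { error: "qmd is not built" };
}
```

The case of a git checkout with source but neither a bun lockfile nor tsx is not covered by the tests; the sketch falls back to `dist` there, which is an assumption.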
--- a/test/cli-exit-lifecycle.test.ts
+++ b/test/cli-exit-lifecycle.test.ts
@ -0,0 +1,128 @@
+import { describe, expect, test } from "vitest";
+import { finishSuccessfulCliCommand } from "../src/cli/qmd.ts";
+import { LlamaCpp, isDarwinMetalMitigationActive } from "../src/llm.ts";
+
+describe("CLI successful-exit lifecycle", () => {
+  test("exits 0 after successful output when post-output LLM cleanup fails", async () => {
+    const exitCodes: number[] = [];
+    const stderr: string[] = [];
+    const flushed: string[] = [];
+
+    await finishSuccessfulCliCommand({
+      command: "query",
+      format: "json",
+      cleanup: async () => {
+        throw new Error("ggml_metal_device_free abort simulation");
+      },
+      exit: (code) => {
+        exitCodes.push(code);
+      },
+      stdout: { write: (chunk: string | Uint8Array, cb?: (error?: Error | null) => void) => { flushed.push(String(chunk)); cb?.(); return true; } },
+      stderr: { write: (chunk: string | Uint8Array, cb?: (error?: Error | null) => void) => { stderr.push(String(chunk)); cb?.(); return true; } },
+    });
+
+    expect(exitCodes).toEqual([0]);
+    expect(stderr.join("")).toContain("QMD Warning: cleanup after successful output failed");
+    expect(flushed).toEqual([""]);
+  });
+
+  test("flushes stdout, runs cleanup, flushes stderr, then exits (when exit is provided)", async () => {
+    // The legacy lifecycle order is preserved for callers that pass an
+    // explicit `exit` function — primarily this test, which needs an
+    // observable terminating step.
+    const calls: string[] = [];
+
+    await finishSuccessfulCliCommand({
+      command: "query",
+      format: "json",
+      cleanup: async () => { calls.push("cleanup"); },
+      exit: (code) => { calls.push(`exit:${code}`); },
+      stdout: { write: (_chunk: string | Uint8Array, cb?: (error?: Error | null) => void) => { calls.push("stdout-flush"); cb?.(); return true; } },
+      stderr: { write: (_chunk: string | Uint8Array, cb?: (error?: Error | null) => void) => { calls.push("stderr-flush"); cb?.(); return true; } },
+    });
+
+    expect(calls).toEqual(["stdout-flush", "cleanup", "stderr-flush", "exit:0"]);
+  });
+
+  test("production path: sets process.exitCode=0 and returns instead of calling process.exit", async () => {
+    // The real CLI does NOT pass `exit` — finishSuccessfulCliCommand should set
+    // process.exitCode and return, letting Node's `beforeExit` fire so
+    // node-llama-cpp's auto-dispose runs BEFORE libc's static destructor.
+    // process.exit() skips `beforeExit`, which is what trips the libggml-metal
+    // assertion (ggml-org/llama.cpp#22593) even with explicit dispose.
+    const prevCode = process.exitCode;
+    process.exitCode = 1; // poison the state to verify we set it
+    try {
+      const calls: string[] = [];
+      await finishSuccessfulCliCommand({
+        command: "query",
+        format: "json",
+        cleanup: async () => { calls.push("cleanup"); },
+        stdout: { write: (_c: string | Uint8Array, cb?: (error?: Error | null) => void) => { calls.push("stdout-flush"); cb?.(); return true; } },
+        stderr: { write: (_c: string | Uint8Array, cb?: (error?: Error | null) => void) => { calls.push("stderr-flush"); cb?.(); return true; } },
+      });
+
+      expect(calls).toEqual(["stdout-flush", "cleanup", "stderr-flush"]);
+      expect(process.exitCode).toBe(0);
+    } finally {
+      process.exitCode = prevCode;
+    }
+  });
+
+  test("darwin Metal mitigation reflects launcher-exported env on darwin", () => {
+    // The real mitigation lives in bin/qmd, which sets GGML_METAL_NO_RESIDENCY=1
+    // before Node loads the llama.cpp native binding. The JS-side predicate
+    // just reports whether that env was set (and not overridden by
+    // QMD_METAL_KEEP_RESIDENCY). On non-darwin the function returns false.
+    const expected =
+      process.platform === "darwin" &&
+      process.env.QMD_METAL_KEEP_RESIDENCY !== "1" &&
+      process.env.GGML_METAL_NO_RESIDENCY === "1";
+    expect(isDarwinMetalMitigationActive()).toBe(expected);
+  });
+
+  test("QMD_METAL_KEEP_RESIDENCY=1 disables the mitigation even when GGML_METAL_NO_RESIDENCY is set", () => {
+    const prevKeep = process.env.QMD_METAL_KEEP_RESIDENCY;
+    const prevNoRes = process.env.GGML_METAL_NO_RESIDENCY;
+    try {
+      process.env.QMD_METAL_KEEP_RESIDENCY = "1";
+      process.env.GGML_METAL_NO_RESIDENCY = "1";
+      expect(isDarwinMetalMitigationActive()).toBe(false);
+    } finally {
+      if (prevKeep === undefined) delete process.env.QMD_METAL_KEEP_RESIDENCY;
+      else process.env.QMD_METAL_KEEP_RESIDENCY = prevKeep;
+      if (prevNoRes === undefined) delete process.env.GGML_METAL_NO_RESIDENCY;
+      else process.env.GGML_METAL_NO_RESIDENCY = prevNoRes;
+    }
+  });
+
+  test("disposes Llama resources in dependency order before CLI exit", async () => {
+    const calls: string[] = [];
+    const llm = new LlamaCpp({ inactivityTimeoutMs: 0 });
+    const disposable = (name: string) => ({
+      dispose: async () => {
+        calls.push(name);
+      },
+    });
+
+    Object.assign(llm as unknown as Record<string, unknown>, {
+      embedContexts: [disposable("embed-context")],
+      rerankContexts: [disposable("rerank-context")],
+      embedModel: disposable("embed-model"),
+      generateModel: disposable("generate-model"),
+      rerankModel: disposable("rerank-model"),
+      llama: disposable("llama"),
+    });
+
+    await llm.dispose();
+
+    expect(calls).toEqual([
+      "embed-context",
+      "rerank-context",
+      "embed-model",
+      "generate-model",
+      "rerank-model",
+      "llama",
+    ]);
+  });
+});
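The lifecycle tests above pin down an ordering: flush stdout, run cleanup, flush stderr, then either call the injected `exit` or set `process.exitCode` and return. A minimal sketch of a function with that behaviour follows. `finishSketch`, `Sink`, and `flush` are hypothetical names, the text after the warning prefix is an assumption, and the real `finishSuccessfulCliCommand` takes further options (`command`, `format`) that are omitted here.

```typescript
// Hypothetical sketch of the ordering the tests assert; not the qmd source.
type Sink = {
  write: (chunk: string, cb?: (error?: Error | null) => void) => boolean;
};

// Resolves once the sink has processed an empty write, which means any
// output queued before it has been handed off.
function flush(sink: Sink): Promise<void> {
  return new Promise((resolve) => {
    sink.write("", () => resolve());
  });
}

async function finishSketch(opts: {
  cleanup: () => Promise<void>;
  stdout: Sink;
  stderr: Sink;
  exit?: (code: number) => void;
}): Promise<void> {
  await flush(opts.stdout);
  try {
    await opts.cleanup();
  } catch (err) {
    // The command already succeeded, so a cleanup failure is only a warning.
    const reason = err instanceof Error ? err.message : String(err);
    opts.stderr.write(`QMD Warning: cleanup after successful output failed: ${reason}\n`);
  }
  await flush(opts.stderr);
  if (opts.exit) {
    opts.exit(0);
  } else {
    // Setting exitCode and returning lets Node fire `beforeExit`, which a
    // direct process.exit() call would skip.
    process.exitCode = 0;
  }
}
```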
--- a/test/cli-lazy-llm-import.test.ts
+++ b/test/cli-lazy-llm-import.test.ts
@ -0,0 +1,20 @@
+import { describe, expect, test } from "vitest";
+import { readFileSync } from "fs";
+import { join } from "path";
+
+describe("LLM module loading", () => {
+  test("node-llama-cpp is only dynamically imported by LLM operations", () => {
+    const source = readFileSync(join(process.cwd(), "src", "llm.ts"), "utf-8");
+
+    expect(source).not.toMatch(/import\s+(?!type\b)[\s\S]*?from\s+["']node-llama-cpp["']/);
+    expect(source).toContain('import("node-llama-cpp")');
+  });
+
+  test("importing the CLI for lightweight commands succeeds", async () => {
+    const mod = await import("../src/cli/qmd.ts");
+    expect(mod).toMatchObject({
+      buildEditorUri: expect.any(Function),
+      termLink: expect.any(Function),
+    });
+  });
+});
--- a/test/cli.test.ts
+++ b/test/cli.test.ts
--- a/test/collections-config.test.ts
+++ b/test/collections-config.test.ts
@ -6,15 +6,19 @@
 */

 import { describe, test, expect, beforeEach, afterEach } from "vitest";
+import { mkdtemp, rm, writeFile } from "fs/promises";
+import { tmpdir } from "os";
 import { join } from "path";
-import { homedir } from "os";
-import { getConfigPath, setConfigIndexName } from "../src/collections.js";
+import { qmdHomedir } from "../src/paths.js";
+import { getConfigPath, loadConfig, setConfigIndexName } from "../src/collections.js";

 // Save/restore env vars around each test
 let savedEnv: Record<string, string | undefined>;

 beforeEach(() => {
  savedEnv = {
+    HOME: process.env.HOME,
+    USERPROFILE: process.env.USERPROFILE,
    QMD_CONFIG_DIR: process.env.QMD_CONFIG_DIR,
    XDG_CONFIG_HOME: process.env.XDG_CONFIG_HOME,
  };
@ -38,7 +42,16 @@ describe("getConfigDir via getConfigPath", () => {
  test("defaults to ~/.config/qmd when no env vars are set", () => {
    delete process.env.QMD_CONFIG_DIR;
    delete process.env.XDG_CONFIG_HOME;
-    expect(getConfigPath()).toBe(join(homedir(), ".config", "qmd", "index.yml"));
+    expect(getConfigPath()).toBe(join(qmdHomedir(), ".config", "qmd", "index.yml"));
+  });
+
+  test("uses the same USERPROFILE fallback as default DB path when HOME is unset", () => {
+    delete process.env.HOME;
+    delete process.env.QMD_CONFIG_DIR;
+    delete process.env.XDG_CONFIG_HOME;
+    process.env.USERPROFILE = "/Users/windows-user";
+
+    expect(getConfigPath()).toBe(join("/Users/windows-user", ".config", "qmd", "index.yml"));
  });

  test("QMD_CONFIG_DIR takes highest priority", () => {
@ -71,4 +84,15 @@ describe("getConfigDir via getConfigPath", () => {
    setConfigIndexName("myindex");
    expect(getConfigPath()).toBe(join("/xdg/config", "qmd", "myindex.yml"));
  });
+
+  test("loadConfig treats an empty YAML file as an empty config", async () => {
+    const dir = await mkdtemp(join(tmpdir(), "qmd-empty-config-"));
+    try {
+      process.env.QMD_CONFIG_DIR = dir;
+      await writeFile(join(dir, "index.yml"), "");
+      expect(loadConfig()).toEqual({ collections: {} });
+    } finally {
+      await rm(dir, { recursive: true, force: true });
+    }
+  });
 });
--- a/test/esm-ambiguous-module.test.ts
+++ b/test/esm-ambiguous-module.test.ts
@@ -0,0 +1,27 @@
+import { describe, expect, test } from "vitest";
+import { execFileSync } from "child_process";
+import { mkdtempSync } from "fs";
+import { tmpdir } from "os";
+import { dirname, join, resolve } from "path";
+import { fileURLToPath } from "url";
+
+const repoRoot = resolve(dirname(fileURLToPath(import.meta.url)), "..");
+
+describe("Node ESM entrypoints", () => {
+  test("CLI --index path normalizes via setIndexName/setConfigIndexName under Node 22+", () => {
+    execFileSync(process.execPath, ["scripts/build.mjs"], {
+      cwd: repoRoot,
+      encoding: "utf-8",
+      stdio: "pipe",
+    });
+
+    const indexPath = join(mkdtempSync(join(tmpdir(), "qmd-index-")), "nested", "idx");
+    const output = execFileSync(process.execPath, ["dist/cli/qmd.js", "--index", indexPath, "--version"], {
+      cwd: repoRoot,
+      encoding: "utf-8",
+      stdio: "pipe",
+    });
+
+    expect(output).toContain("qmd ");
+  }, 120_000);
+});
--- a/test/llm.test.ts
+++ b/test/llm.test.ts
@@ -12,6 +12,14 @@ import {
  getDefaultLlamaCpp,
  disposeDefaultLlamaCpp,
  resolveLlamaGpuMode,
+  setNodeLlamaCppModuleForTest,
+  withNativeStdoutRedirectedToStderr,
+  resolveParallelismOverride,
+  resolveSafeParallelism,
+  resolveEmbedModel,
+  resolveGenerateModel,
+  resolveRerankModel,
+  resolveModels,
  withLLMSession,
  canUnloadLLM,
  SessionReleasedError,
@@ -19,6 +27,63 @@ import {
  type ILLMSession,
 } from "../src/llm.js";

+describe("model name resolution", () => {
+  function withModelEnv(env: Record<string, string | undefined>, fn: () => void): void {
+    const previous = {
+      QMD_EMBED_MODEL: process.env.QMD_EMBED_MODEL,
+      QMD_GENERATE_MODEL: process.env.QMD_GENERATE_MODEL,
+      QMD_RERANK_MODEL: process.env.QMD_RERANK_MODEL,
+    };
+    try {
+      for (const [key, value] of Object.entries(env)) {
+        if (value === undefined) delete process.env[key];
+        else process.env[key] = value;
+      }
+      fn();
+    } finally {
+      for (const [key, value] of Object.entries(previous)) {
+        if (value === undefined) delete process.env[key];
+        else process.env[key] = value;
+      }
+    }
+  }
+
+  test("all model roles resolve config hints before env fallbacks", () => {
+    withModelEnv({
+      QMD_EMBED_MODEL: "env-embed",
+      QMD_GENERATE_MODEL: "env-generate",
+      QMD_RERANK_MODEL: "env-rerank",
+    }, () => {
+      const config = {
+        embed: "config-embed",
+        generate: "config-generate",
+        rerank: "config-rerank",
+      };
+      expect(resolveEmbedModel(config)).toBe("config-embed");
+      expect(resolveGenerateModel(config)).toBe("config-generate");
+      expect(resolveRerankModel(config)).toBe("config-rerank");
+      expect(resolveModels(config)).toEqual(config);
+    });
+  });
+
+  test("LlamaCpp constructor uses the same resolver as status/embed/query helpers", () => {
+    withModelEnv({
+      QMD_EMBED_MODEL: "env-embed",
+      QMD_GENERATE_MODEL: "env-generate",
+      QMD_RERANK_MODEL: "env-rerank",
+    }, () => {
+      const llm = new LlamaCpp({
+        embedModel: "config-embed",
+        generateModel: "config-generate",
+        rerankModel: "config-rerank",
+      });
+      expect(llm.embedModelName).toBe(resolveEmbedModel({ embed: "config-embed" }));
+      expect(llm.generateModelName).toBe(resolveGenerateModel({ generate: "config-generate" }));
+      expect(llm.rerankModelName).toBe(resolveRerankModel({ rerank: "config-rerank" }));
+    });
+  });
+});
+
 // =============================================================================
 // Singleton Tests (no model loading required)
 // =============================================================================
@@ -75,6 +140,29 @@ describe("QMD_LLAMA_GPU resolution", () => {
    expect(resolveLlamaGpuMode(" cuda ")).toBe("cuda");
  });

+  test("QMD_FORCE_CPU disables GPU before QMD_LLAMA_GPU auto-detection", () => {
+    const prevForceCpu = process.env.QMD_FORCE_CPU;
+    process.env.QMD_FORCE_CPU = "1";
+    try {
+      expect(resolveLlamaGpuMode(undefined)).toBe(false);
+      expect(resolveLlamaGpuMode("cuda")).toBe(false);
+    } finally {
+      if (prevForceCpu === undefined) delete process.env.QMD_FORCE_CPU;
+      else process.env.QMD_FORCE_CPU = prevForceCpu;
+    }
+  });
+
+  test("QMD_FORCE_CPU ignores false-ish values", () => {
+    const prevForceCpu = process.env.QMD_FORCE_CPU;
+    process.env.QMD_FORCE_CPU = "0";
+    try {
+      expect(resolveLlamaGpuMode(undefined)).toBe("auto");
+    } finally {
+      if (prevForceCpu === undefined) delete process.env.QMD_FORCE_CPU;
+      else process.env.QMD_FORCE_CPU = prevForceCpu;
+    }
+  });
+
  test("warns and falls back to auto for unsupported values", () => {
    const stderrSpy = vi.spyOn(process.stderr, "write").mockReturnValue(true);
    try {
@@ -87,6 +175,201 @@ describe("QMD_LLAMA_GPU resolution", () => {
  });
 });

+describe("native llama stdout containment", () => {
+  test("redirects native stdout noise to stderr while JSON callers are initializing llama", async () => {
+    const stdoutSpy = vi.spyOn(process.stdout, "write").mockReturnValue(true);
+    const stderrSpy = vi.spyOn(process.stderr, "write").mockReturnValue(true);
+    try {
+      await withNativeStdoutRedirectedToStderr(async () => {
+        process.stdout.write("cmake build spam\n");
+        return "ok";
+      });
+
+      expect(stdoutSpy).not.toHaveBeenCalled();
+      expect(stderrSpy).toHaveBeenCalledWith("cmake build spam\n", undefined, undefined);
+    } finally {
+      stdoutSpy.mockRestore();
+      stderrSpy.mockRestore();
+    }
+  });
+
+  test("keeps native GPU failure noise off stdout and caches failed GPU init", async () => {
+    const prevGpu = process.env.QMD_LLAMA_GPU;
+    const prevForceCpu = process.env.QMD_FORCE_CPU;
+    process.env.QMD_LLAMA_GPU = "cuda";
+    delete process.env.QMD_FORCE_CPU;
+
+    const calls: unknown[] = [];
+    const fakeLlama = { gpu: false, cpuMathCores: 4 };
+    setNodeLlamaCppModuleForTest({
+      LlamaLogLevel: { error: "error" },
+      resolveModelFile: vi.fn(),
+      LlamaChatSession: vi.fn() as any,
+      getLlama: vi.fn(async (options: Record<string, unknown>) => {
+        calls.push(options.gpu);
+        if (options.gpu === "cuda") {
+          process.stdout.write("cmake build spam\n");
+          throw new Error("CUDA unavailable");
+        }
+        return fakeLlama as any;
+      }),
+    });
+
+    const stdoutSpy = vi.spyOn(process.stdout, "write").mockReturnValue(true);
+    const stderrSpy = vi.spyOn(process.stderr, "write").mockReturnValue(true);
+    try {
+      const first = new LlamaCpp();
+      const second = new LlamaCpp();
+
+      await (first as any).ensureLlama();
+      await (second as any).ensureLlama();
+
+      expect(stdoutSpy).not.toHaveBeenCalled();
+      expect(stderrSpy).toHaveBeenCalledWith("cmake build spam\n", undefined, undefined);
+      expect(calls).toEqual(["cuda", false, false]);
+      expect(String(stderrSpy.mock.calls.map(call => call[0]).join(""))).toContain("skipping previously failed GPU init");
+    } finally {
+      stdoutSpy.mockRestore();
+      stderrSpy.mockRestore();
+      setNodeLlamaCppModuleForTest(null);
+      if (prevGpu === undefined) delete process.env.QMD_LLAMA_GPU;
+      else process.env.QMD_LLAMA_GPU = prevGpu;
+      if (prevForceCpu === undefined) delete process.env.QMD_FORCE_CPU;
+      else process.env.QMD_FORCE_CPU = prevForceCpu;
+    }
+  });
+
+  test("warns about CPU fallback only once per process", async () => {
+    const prevGpu = process.env.QMD_LLAMA_GPU;
+    const prevForceCpu = process.env.QMD_FORCE_CPU;
+    process.env.QMD_LLAMA_GPU = "false";
+    delete process.env.QMD_FORCE_CPU;
+
+    setNodeLlamaCppModuleForTest({
+      LlamaLogLevel: { error: "error" },
+      resolveModelFile: vi.fn(),
+      LlamaChatSession: vi.fn() as any,
+      getLlama: vi.fn(async () => ({ gpu: false, cpuMathCores: 4 }) as any),
+    });
+
+    const stderrSpy = vi.spyOn(process.stderr, "write").mockReturnValue(true);
+    try {
+      const first = new LlamaCpp();
+      const second = new LlamaCpp();
+
+      await (first as any).ensureLlama();
+      await (second as any).ensureLlama();
+
+      const stderr = String(stderrSpy.mock.calls.map(call => call[0]).join(""));
+      expect(stderr.match(/no GPU acceleration/g)?.length).toBe(1);
+      expect(stderr).toContain("qmd doctor");
+      expect(stderr).not.toContain("QMD_STATUS_DEVICE_PROBE");
+    } finally {
+      stderrSpy.mockRestore();
+      setNodeLlamaCppModuleForTest(null);
+      if (prevGpu === undefined) delete process.env.QMD_LLAMA_GPU;
+      else process.env.QMD_LLAMA_GPU = prevGpu;
+      if (prevForceCpu === undefined) delete process.env.QMD_FORCE_CPU;
+      else process.env.QMD_FORCE_CPU = prevForceCpu;
+    }
+  });
+
+  test("embeds hello world with QMD_FORCE_CPU=1 without throwing", async () => {
+    const prevGpu = process.env.QMD_LLAMA_GPU;
+    const prevForceCpu = process.env.QMD_FORCE_CPU;
+    process.env.QMD_FORCE_CPU = "1";
+    process.env.QMD_LLAMA_GPU = "metal";
+
+    const getEmbeddingFor = vi.fn(async (text: string) => ({
+      vector: new Float32Array([0.1, 0.2, 0.3]),
+      text,
+    }));
+    const createEmbeddingContext = vi.fn(async () => ({
+      getEmbeddingFor,
+      dispose: vi.fn(async () => {}),
+    }));
+    const loadModel = vi.fn(async () => ({
+      trainContextSize: 2048,
+      tokenize: (text: string) => Array.from(text),
+      detokenize: (tokens: string[]) => tokens.join(""),
+      createEmbeddingContext,
+      dispose: vi.fn(async () => {}),
+    }));
+    const getLlama = vi.fn(async (options: Record<string, unknown>) => ({
+      gpu: false,
+      cpuMathCores: 4,
+      loadModel,
+      dispose: vi.fn(async () => {}),
+    }) as any);
+
+    setNodeLlamaCppModuleForTest({
+      LlamaLogLevel: { error: "error" },
+      resolveModelFile: vi.fn(async () => "/tmp/nonexistent-model.gguf"),
+      LlamaChatSession: vi.fn() as any,
+      getLlama,
+    });
+
+    const stderrSpy = vi.spyOn(process.stderr, "write").mockReturnValue(true);
+    const llm = new LlamaCpp();
+    try {
+      const result = await llm.embed("hello world");
+      expect(result).toEqual({
+        embedding: [0.10000000149011612, 0.20000000298023224, 0.30000001192092896],
+        model: llm.embedModelName,
+      });
+      expect(getLlama).toHaveBeenCalledWith(expect.objectContaining({ gpu: false, build: "never" }));
+      expect(loadModel).toHaveBeenCalledWith(expect.objectContaining({ gpuLayers: 0 }));
+      expect(getEmbeddingFor).toHaveBeenCalledWith("hello world");
+    } finally {
+      await llm.dispose();
+      stderrSpy.mockRestore();
+      setNodeLlamaCppModuleForTest(null);
+      if (prevGpu === undefined) delete process.env.QMD_LLAMA_GPU;
+      else process.env.QMD_LLAMA_GPU = prevGpu;
+      if (prevForceCpu === undefined) delete process.env.QMD_FORCE_CPU;
+      else process.env.QMD_FORCE_CPU = prevForceCpu;
+    }
+  });
+});
+
+describe("LLM context parallelism safety", () => {
+  test("defaults Windows CUDA to one context to avoid ggml-cuda.cu:98 crashes", () => {
+    expect(resolveSafeParallelism({
+      gpu: "cuda",
+      platform: "win32",
+      computed: 8,
+      envValue: undefined,
+    })).toBe(1);
+  });
+
+  test("keeps non-Windows and non-CUDA backends on computed parallelism", () => {
+    expect(resolveSafeParallelism({ gpu: "cuda", platform: "linux", computed: 8 })).toBe(8);
+    expect(resolveSafeParallelism({ gpu: "vulkan", platform: "win32", computed: 8 })).toBe(8);
+    expect(resolveSafeParallelism({ gpu: false, platform: "win32", computed: 4 })).toBe(4);
+  });
+
+  test("QMD_EMBED_PARALLELISM overrides the Windows CUDA safety default", () => {
+    expect(resolveSafeParallelism({
+      gpu: "cuda",
+      platform: "win32",
+      computed: 8,
+      envValue: "2",
+    })).toBe(2);
+  });
+
+  test("QMD_EMBED_PARALLELISM rejects invalid values and warns", () => {
+    const stderrSpy = vi.spyOn(process.stderr, "write").mockReturnValue(true);
+    try {
+      expect(resolveParallelismOverride("0")).toBeUndefined();
+      expect(resolveParallelismOverride("bad")).toBeUndefined();
+      expect(stderrSpy).toHaveBeenCalledTimes(2);
+      expect(String(stderrSpy.mock.calls[0]?.[0] || "")).toContain("QMD_EMBED_PARALLELISM");
+    } finally {
+      stderrSpy.mockRestore();
+    }
+  });
+});
+
 describe("LlamaCpp expand context size config", () => {
  const defaultExpandContextSize = 2048;

@@ -820,7 +1103,7 @@ describe.skipIf(!!process.env.CI)("LlamaCpp Integration", () => {
      for (const doc of result.results) {
        console.log(`  ${doc.file}: ${doc.score.toFixed(4)}`);
      }
-    });
+    }, 30000);
  });

  describe("expandQuery", () => {
--- a/test/local-config.test.ts
+++ b/test/local-config.test.ts
@@ -0,0 +1,98 @@
+import { existsSync, mkdtempSync, mkdirSync, writeFileSync, rmSync, realpathSync } from "node:fs";
+import { execFileSync } from "node:child_process";
+import { join } from "node:path";
+import { tmpdir } from "node:os";
+import { afterEach, describe, expect, test } from "vitest";
+import { findLocalConfigPath, getLocalDbPath } from "../src/collections.js";
+
+function cliCommandArgs(command: string): { bin: string; args: string[] } {
+  const cliPath = join(process.cwd(), "src/cli/qmd.ts");
+  if (process.versions.bun) {
+    return { bin: process.execPath, args: [cliPath, command] };
+  }
+  return {
+    bin: process.execPath,
+    args: [join(process.cwd(), "node_modules/tsx/dist/cli.mjs"), cliPath, command],
+  };
+}
+
+const roots: string[] = [];
+
+function tempProject(): string {
+  const root = mkdtempSync(join(tmpdir(), "qmd-local-config-"));
+  roots.push(root);
+  return root;
+}
+
+afterEach(() => {
+  for (const root of roots.splice(0)) {
+    rmSync(root, { recursive: true, force: true });
+  }
+});
+
+describe("local .qmd project config", () => {
+  test("finds .qmd/index.yaml from nested working directories", () => {
+    const root = tempProject();
+    const configPath = join(root, ".qmd", "index.yaml");
+    mkdirSync(join(root, ".qmd"), { recursive: true });
+    writeFileSync(configPath, "collections: {}\n");
+    const nested = join(root, "wiki", "Shopify");
+    mkdirSync(nested, { recursive: true });
+
+    expect(findLocalConfigPath(nested)).toBe(configPath);
+  });
+
+  test("prefers index.yaml over index.yml when both exist", () => {
+    const root = tempProject();
+    mkdirSync(join(root, ".qmd"), { recursive: true });
+    const yaml = join(root, ".qmd", "index.yaml");
+    const yml = join(root, ".qmd", "index.yml");
+    writeFileSync(yaml, "collections: {}\n");
+    writeFileSync(yml, "collections: {}\n");
+
+    expect(findLocalConfigPath(root)).toBe(yaml);
+  });
+
+  test("uses .qmd/index.sqlite next to the local config", () => {
+    const root = tempProject();
+    mkdirSync(join(root, ".qmd"), { recursive: true });
+    const configPath = join(root, ".qmd", "index.yaml");
+    writeFileSync(configPath, "collections: {}\n");
+
+    expect(getLocalDbPath(configPath)).toBe(join(root, ".qmd", "index.sqlite"));
+  });
+
+  test("CLI uses local .qmd config and index instead of global cache", () => {
+    const root = tempProject();
+    mkdirSync(join(root, ".qmd"), { recursive: true });
+    mkdirSync(join(root, "docs"), { recursive: true });
+    writeFileSync(join(root, "docs", "a.md"), "# A\n\nLocal test document.\n");
+    writeFileSync(join(root, ".qmd", "index.yaml"), `collections:\n  docs:\n    path: ${JSON.stringify(join(root, "docs"))}\n    pattern: "**/*.md"\n    context:\n      /: Local test docs\nmodels:\n  embed: local-embed-model\n  rerank: local-rerank-model\n  generate: local-generate-model\n`);
+
+    const home = join(root, "home");
+    const { bin, args } = cliCommandArgs("status");
+    const output = execFileSync(bin, args, {
+      cwd: root,
+      encoding: "utf-8",
+      env: {
+        ...process.env,
+        HOME: home,
+        XDG_CONFIG_HOME: join(home, ".config"),
+        XDG_CACHE_HOME: join(home, ".cache"),
+        QMD_EMBED_MODEL: "env-embed-model",
+        QMD_RERANK_MODEL: "env-rerank-model",
+        QMD_GENERATE_MODEL: "env-generate-model",
+      },
+    });
+
+    const localIndex = join(root, ".qmd", "index.sqlite");
+    expect(output).toContain(`Index: ${realpathSync(localIndex)}`);
+    expect(output).toContain("docs (qmd://docs/)");
+    expect(output).toContain("Embedding:   local-embed-model");
+    expect(output).toContain("Reranking:   local-rerank-model");
+    expect(output).toContain("Generation:  local-generate-model");
+    expect(output).not.toContain("env-embed-model");
+    expect(existsSync(localIndex)).toBe(true);
+    expect(existsSync(join(home, ".cache", "qmd", "index.sqlite"))).toBe(false);
+  });
+});
--- a/test/mcp.test.ts
+++ b/test/mcp.test.ts
@@ -80,6 +80,7 @@ function initTestDatabase(db: Database): void {
      seq INTEGER NOT NULL DEFAULT 0,
      pos INTEGER NOT NULL DEFAULT 0,
      model TEXT NOT NULL,
+      embed_fingerprint TEXT NOT NULL DEFAULT '',
      embedded_at TEXT NOT NULL,
      PRIMARY KEY (hash, seq)
    )
@@ -186,7 +187,7 @@ function seedTestData(db: Database): void {
  for (let i = 0; i < 768; i++) embedding[i] = Math.random();

  for (const doc of docs.slice(0, 4)) { // Skip large file for embeddings
-    db.prepare(`INSERT INTO content_vectors (hash, seq, pos, model, embedded_at) VALUES (?, 0, 0, 'embeddinggemma', ?)`).run(doc.hash, now);
+    db.prepare(`INSERT INTO content_vectors (hash, seq, pos, model, embed_fingerprint, embedded_at) VALUES (?, 0, 0, ?, ?, ?)`).run(doc.hash, DEFAULT_EMBED_MODEL, getEmbeddingFingerprint(DEFAULT_EMBED_MODEL), now);
    db.prepare(`INSERT INTO vectors_vec (hash_seq, embedding) VALUES (?, ?)`).run(`${doc.hash}_0`, embedding);
  }
 }
@@ -211,6 +212,7 @@ import {
  findDocuments,
  getStatus,
  DEFAULT_EMBED_MODEL,
+  getEmbeddingFingerprint,
  DEFAULT_QUERY_MODEL,
  DEFAULT_RERANK_MODEL,
  DEFAULT_MULTI_GET_MAX_BYTES,
@@ -887,6 +889,33 @@ describe("MCP Server", () => {
        expect(typeof col.documents).toBe("number");
      }
    });
+
+    test("REST /query and /search file field uses qmd:// URI prefix (#576)", () => {
+      // Regression test: the HTTP REST endpoint was returning r.displayPath (e.g.
+      // "docs/readme.md") instead of "qmd://docs/readme.md", while the CLI and MCP
+      // resource URIs always use the qmd:// scheme. This simulates the fix: the REST
+      // handler now applies encodeQmdPath and prepends "qmd://".
+      const results = searchFTS(testDb, "readme", 5);
+      expect(results.length).toBeGreaterThan(0);
+
+      // Simulate what the fixed REST handler produces for each result
+      const restResponseItems = results.map(r => ({
+        docid: `#${r.docid}`,
+        file: `qmd://${r.displayPath.split('/').map(s => encodeURIComponent(s)).join('/')}`,
+        title: r.title,
+        score: Math.round(r.score * 100) / 100,
+      }));
+
+      // Every file field must start with qmd://
+      for (const item of restResponseItems) {
+        expect(item.file).toMatch(/^qmd:\/\//);
+      }
+
+      // Spot-check the readme result
+      const readmeItem = restResponseItems.find(item => item.file.includes("readme"));
+      expect(readmeItem).toBeDefined();
+      expect(readmeItem!.file).toBe("qmd://docs/readme.md");
+    });
  });
 });

@@ -913,6 +942,22 @@ describe.skipIf(!!process.env.CI)("MCP HTTP Transport", () => {
    initTestDatabase(db);
    seedTestData(db);

+    // 300 pad lines (37 chars each = 11100 chars) puts the marker past the
+    // first chunk boundary at CHUNK_SIZE_CHARS = 3600.
+    {
+      const padLine = "Pad line for chunk boundary coverage\n";
+      const absLineFixtureBody =
+        padLine.repeat(300) +
+        "UNIQUE_KEYWORD_XYZ marker\n" +
+        padLine.repeat(20);
+      const fixtureHash = "hash-abslines";
+      const now = new Date().toISOString();
+      db.prepare(`INSERT OR IGNORE INTO content (hash, doc, created_at) VALUES (?, ?, ?)`)
+        .run(fixtureHash, absLineFixtureBody, now);
+      db.prepare(`INSERT INTO documents (collection, path, title, hash, created_at, modified_at, active) VALUES ('docs', ?, ?, ?, ?, ?, 1)`)
+        .run("absolute-line-fixture.md", "Absolute Line Fixture", fixtureHash, now, now);
+    }
+
    // Sync config into SQLite
    const httpTestConfig: CollectionConfig = {
      collections: {
@@ -1074,4 +1119,29 @@ describe.skipIf(!!process.env.CI)("MCP HTTP Transport", () => {
    expect(json.result).toBeDefined();
    expect(json.result.content.length).toBeGreaterThan(0);
  });
+
+  test("POST /mcp tools/call query returns absolute source-file line numbers, not chunk-local", async () => {
+    await mcpRequest({
+      jsonrpc: "2.0", id: 1, method: "initialize",
+      params: { protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "test", version: "1.0" } },
+    });
+
+    const { status, json } = await mcpRequest({
+      jsonrpc: "2.0", id: 5, method: "tools/call",
+      params: {
+        name: "query",
+        arguments: {
+          searches: [{ type: "lex", query: "UNIQUE_KEYWORD_XYZ" }],
+          rerank: false,
+        },
+      },
+    });
+    expect(status).toBe(200);
+    const results = json.result.structuredContent.results;
+    expect(results.length).toBeGreaterThan(0);
+    const hit = results.find((r: any) => r.file === "docs/absolute-line-fixture.md");
+    expect(hit).toBeDefined();
+    expect(hit.line).toBe(301);
+    expect(hit.snippet).toMatch(/^\d+: @@ -3\d\d,/);
+  });
 });
--- a/test/package.test.ts
+++ b/test/package.test.ts
@@ -0,0 +1,71 @@
+import { describe, expect, test } from "vitest";
+import { readFileSync } from "node:fs";
+import { join } from "node:path";
+
+const root = new URL("..", import.meta.url);
+const pkg = JSON.parse(readFileSync(new URL("package.json", root), "utf8"));
+
+describe("package test task", () => {
+  test("runs typecheck, unit tests, and package smoke checks", () => {
+    expect(pkg.scripts.test).toContain("scripts/test-all.mjs");
+
+    expect(pkg.scripts["test:types"]).toContain("tsconfig.build.json --noEmit");
+    expect(pkg.scripts["test:unit"]).toContain("vitest.mjs");
+    expect(pkg.scripts["test:unit"]).toContain("bun test");
+    expect(pkg.scripts["test:unit"]).toContain("CI=true");
+
+    expect(pkg.scripts["test:package"]).toContain("scripts/package-smoke.mjs");
+
+    const testAllScript = readFileSync(new URL("scripts/test-all.mjs", root), "utf8");
+    expect(testAllScript).toContain("TypeScript build typecheck");
+    expect(testAllScript).toContain("Vitest suite under Node");
+    expect(testAllScript).toContain("Bun test suite");
+    expect(testAllScript).toContain("Package smoke");
+
+    const packageSmokeScript = readFileSync(new URL("scripts/package-smoke.mjs", root), "utf8");
+    expect(packageSmokeScript).toContain("scripts/build.mjs");
+    expect(packageSmokeScript).toContain("scripts/check-package-grammars.mjs");
+    expect(packageSmokeScript).toContain("compiled CLI under Node");
+    expect(packageSmokeScript).toContain("compiled CLI under Bun");
+    expect(packageSmokeScript).toContain("package wrapper");
+  });
+});
+
+describe("package grammar distribution", () => {
+  test("installs AST grammar wasm packages as required runtime dependencies", () => {
+    for (const dep of ["tree-sitter-typescript", "tree-sitter-python", "tree-sitter-go", "tree-sitter-rust"]) {
+      expect(pkg.dependencies, `${dep} should be a required dependency`).toHaveProperty(dep);
+      expect(pkg.optionalDependencies ?? {}, `${dep} should not be optional`).not.toHaveProperty(dep);
+    }
+  });
+
+  test("documents a packaging smoke check for grammar wasm availability", () => {
+    expect(pkg.scripts, "package.json scripts").toHaveProperty("smoke:package-grammars");
+    expect(String(pkg.scripts["smoke:package-grammars"])).toContain("check-package-grammars");
+
+    expect(pkg.files, "published package files").toContain("scripts/build.mjs");
+    expect(pkg.files, "published package files").toContain("scripts/check-package-grammars.mjs");
+    expect(pkg.files, "published package files").toContain("scripts/package-smoke.mjs");
+    expect(pkg.files, "published package files").toContain("scripts/test-all.mjs");
+    expect(pkg.files, "published package files").toContain("skills/");
+    const qmdSkill = readFileSync(new URL("skills/qmd/SKILL.md", root), "utf8");
+    expect(qmdSkill).toContain("# QMD - Query Markdown Documents");
+    expect(qmdSkill).toContain("## How search works");
+    expect(qmdSkill).toContain("## MCP Tool: `query`");
+    expect(qmdSkill).not.toContain("This file is a discovery stub");
+
+    const firstSixtyLines = qmdSkill.split(/\r?\n/).slice(0, 60).join("\n");
+    expect(firstSixtyLines).toContain("Search for candidate documents");
+    expect(firstSixtyLines).toContain("qmd search");
+    expect(firstSixtyLines).toContain('qmd multi-get "#abc123,#def432"');
+    expect(firstSixtyLines).toContain("Retrieved:");
+    expect(firstSixtyLines).toContain("qmd query");
+    // The skill must teach structured, self-authored queries near the top.
+    expect(firstSixtyLines).toContain("Default to structured");
+
+    const scriptPath = join(root.pathname, "scripts", "check-package-grammars.mjs");
+    const script = readFileSync(scriptPath, "utf8");
+    expect(script).toContain("tree-sitter-typescript/tree-sitter-typescript.wasm");
+    expect(script).toContain("tree-sitter-typescript/tree-sitter-tsx.wasm");
+  });
+});
--- a/test/path-fidelity.test.ts
+++ b/test/path-fidelity.test.ts
@@ -0,0 +1,414 @@
+/**
+ * Path Fidelity Tests
+ *
+ * Verifies that QMD stores literal filesystem paths (not handelized slugs) so
+ * that paths with special characters — spaces, #, &, @, [], (), etc. — round-
+ * trip correctly through index → search → get → full-path.
+ *
+ * This covers the five breakage points found before the literal-path fix:
+ *   1. search --json `file` field shows handelized slug instead of real path
+ *   2. `qmd get --full-path` silently falls back (resolveVirtualPath built
+ *      a non-existent path from the slug, existsSync returned false)
+ *   3. `qmd get <actual-fs-path>` returns "Document not found"
+ *   4. `qmd ls` shows handelized slugs
+ *   5. `toVirtualPath(db, absPath)` returns null
+ *
+ * Also covers backward-compat migration: an index created with the old
+ * handelize-at-index-time code can be updated with `qmd update` and the paths
+ * are renamed to their literal forms in-place.
+ */
+
+import { describe, test, expect, beforeAll, afterAll } from "vitest";
+import { mkdir, mkdtemp, rm, writeFile } from "fs/promises";
+import { existsSync, realpathSync } from "fs";
+import { tmpdir } from "os";
+import { join } from "path";
+import { spawn } from "child_process";
+import { fileURLToPath } from "url";
+import { dirname } from "path";
+import YAML from "yaml";
+import { openDatabase } from "../src/db.js";
+import type { Database } from "../src/db.js";
+import {
+  createStore,
+  toVirtualPath,
+  insertDocument,
+  insertContent,
+  hashContent,
+  handelize,
+  normalizePathSeparators,
+  syncConfigToDb,
+} from "../src/store.js";
+import type { CollectionConfig } from "../src/collections.js";
+
+const thisDir = dirname(fileURLToPath(import.meta.url));
+const projectRoot = join(thisDir, "..");
+const qmdScript = join(projectRoot, "src", "cli", "qmd.ts");
+const isBunRuntime = typeof (globalThis as { Bun?: unknown }).Bun !== "undefined";
+const tsxCli = join(projectRoot, "node_modules", "tsx", "dist", "cli.mjs");
+
+async function runQmd(
+  args: string[],
+  opts: { cwd: string; dbPath: string; configDir: string; env?: Record<string, string> }
+): Promise<{ stdout: string; stderr: string; exitCode: number }> {
+  const runner = isBunRuntime
+    ? { command: process.execPath, args: [qmdScript, ...args] }
+    : { command: process.execPath, args: [tsxCli, qmdScript, ...args] };
+
+  const proc = spawn(runner.command, runner.args, {
+    cwd: opts.cwd,
+    env: {
+      ...process.env,
+      INDEX_PATH: opts.dbPath,
+      QMD_CONFIG_DIR: opts.configDir,
+      PWD: opts.cwd,
+      QMD_DOCTOR_DEVICE_PROBE: "0",
+      ...(opts.env ?? {}),
+    },
+    stdio: ["ignore", "pipe", "pipe"],
+  });
+
+  let stdout = "";
+  let stderr = "";
+  proc.stdout?.on("data", (c: Buffer) => { stdout += c.toString(); });
+  proc.stderr?.on("data", (c: Buffer) => { stderr += c.toString(); });
+  const exitCode = await new Promise<number>((res, rej) => {
+    proc.once("error", rej);
+    proc.on("close", (code) => res(code ?? 1));
+  });
+  return { stdout, stderr, exitCode };
+}
+
+// ---------------------------------------------------------------------------
+// Test environment setup
+// ---------------------------------------------------------------------------
+
+let testDir: string;
+
+// Files with names that previously broke due to handelize() at index time.
+const crazyFiles: Array<{ name: string; content: string }> = [
+  {
+    name: "# Meeting - 234232 3432 __ 5.md",
+    content: "# Meeting - 234232 3432 // 5\n\nSome meeting content with searchterm-alpha.\n",
+  },
+  {
+    name: "Budget & Revenue (Q4) [2024].md",
+    content: "# Budget & Revenue Q4 2024\n\nFinancial overview searchterm-beta.\n",
+  },
+  {
+    name: "normal-file.md",
+    content: "# Normal File\n\nPlain filename, should always work.\n",
+  },
+];
+
+const crazySubFiles: Array<{ name: string; content: string }> = [
+  {
+    name: "Notes #42 - foo@bar.md",
+    content: "# Notes #42\n\nSubdir file with searchterm-gamma.\n",
+  },
+];
+
+beforeAll(async () => {
+  testDir = await mkdtemp(join(tmpdir(), "qmd-path-fidelity-"));
+});
+
+afterAll(async () => {
+  await rm(testDir, { recursive: true, force: true });
+});
+
+// Helper: create a fresh isolated test environment with a corpus of crazy filenames.
+async function createCrazyCollection(prefix: string): Promise<{
+  collectionDir: string;
+  dbPath: string;
+  configDir: string;
+}> {
+  const envDir = join(testDir, prefix);
+  const collectionDir = join(envDir, "corpus");
+  const dbPath = join(envDir, "test.sqlite");
+  const configDir = join(envDir, "config");
+
+  await mkdir(collectionDir, { recursive: true });
+  await mkdir(join(collectionDir, "subdir"), { recursive: true });
+  await mkdir(configDir, { recursive: true });
+
+  // Resolve symlinks so the path matches what getRealPath() stores in the DB.
+  // On macOS /tmp is a symlink to /private/tmp; without this normalisation
+  // toVirtualPath() and --full-path resolution fail.
+  const realCollectionDir = realpathSync(collectionDir);
+
+  for (const f of crazyFiles) {
+    await writeFile(join(collectionDir, f.name), f.content);
+  }
+  for (const f of crazySubFiles) {
+    await writeFile(join(collectionDir, "subdir", f.name), f.content);
+  }
+
+  // Write empty YAML config — `collection add` will populate it
+  await writeFile(join(configDir, "index.yml"), "collections: {}\n");
+
+  return { collectionDir: realCollectionDir, dbPath, configDir };
+}
+
+// ---------------------------------------------------------------------------
+// Unit tests: store-level path storage
+// ---------------------------------------------------------------------------
+
+describe("Path fidelity — store level", () => {
+  test("reindexCollection stores literal relative paths, not handelized slugs", async () => {
+    const { collectionDir, dbPath, configDir } = await createCrazyCollection("store-unit");
+
+    // Run `collection add` to index
+    const add = await runQmd(
+      ["collection", "add", collectionDir, "--name", "crazytest"],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(add.exitCode, `collection add failed: ${add.stderr}`).toBe(0);
+
+    // Inspect the DB directly
+    const db = openDatabase(dbPath);
+    const rows = db.prepare(
+      "SELECT path FROM documents WHERE active = 1 ORDER BY path"
+    ).all() as { path: string }[];
+    db.close();
+
+    const paths = rows.map((r) => r.path);
+
+    // Must contain literal filenames, not handelized slugs
+    expect(paths).toContain("# Meeting - 234232 3432 __ 5.md");
+    expect(paths).toContain("Budget & Revenue (Q4) [2024].md");
+    expect(paths).toContain("normal-file.md");
+    expect(paths).toContain("subdir/Notes #42 - foo@bar.md");
+
+    // Must NOT contain handelized versions
+    expect(paths).not.toContain("Meeting-234232-3432-5.md");
+    expect(paths).not.toContain("Budget-Revenue-Q4-2024.md");
+    expect(paths).not.toContain("subdir/Notes-42-foo-bar.md");
+  });
+
+  test("toVirtualPath returns non-null for crazy-named files", async () => {
+    const { collectionDir, dbPath, configDir } = await createCrazyCollection("store-to-virtual");
+    const add = await runQmd(
+      ["collection", "add", collectionDir, "--name", "crazytest"],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(add.exitCode).toBe(0);
+
+    const rawDb = openDatabase(dbPath);
+    const result = toVirtualPath(rawDb, join(collectionDir, "Budget & Revenue (Q4) [2024].md"));
+    rawDb.close();
+
+    expect(result).not.toBeNull();
+    expect(result).toBe(`qmd://crazytest/Budget & Revenue (Q4) [2024].md`);
+  });
+});
+
+// ---------------------------------------------------------------------------
+// CLI integration tests — the five original breakage points
+// ---------------------------------------------------------------------------
+
+describe("Path fidelity — CLI integration", () => {
+  let collectionDir: string;
+  let dbPath: string;
+  let configDir: string;
+
+  // Index once for the whole describe block (read-only tests share it)
+  beforeAll(async () => {
+    ({ collectionDir, dbPath, configDir } = await createCrazyCollection("cli-shared"));
+    const add = await runQmd(
+      ["collection", "add", collectionDir, "--name", "crazytest"],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(add.exitCode, `collection add failed: ${add.stderr}`).toBe(0);
+  });
+
+  test("(1) search --json file field contains literal path, not handelized slug", async () => {
+    const { stdout, exitCode } = await runQmd(
+      ["search", "searchterm-alpha", "--json"],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(exitCode).toBe(0);
+
+    const results = JSON.parse(stdout) as Array<{ file: string }>;
+    expect(results.length).toBeGreaterThan(0);
+
+    const meetingResult = results.find((r) => r.file.includes("Meeting"));
+    expect(meetingResult).toBeDefined();
+    // Must contain the literal filename fragment
+    expect(meetingResult!.file).toContain("# Meeting - 234232 3432 __ 5.md");
+    // Must not contain the handelized version
+    expect(meetingResult!.file).not.toContain("Meeting-234232-3432-5.md");
+  });
+
+  test("(2) get --full-path resolves to real filesystem path for crazy-named file", async () => {
+    const virtualPath = `qmd://crazytest/Budget & Revenue (Q4) [2024].md`;
+    const { stdout, exitCode } = await runQmd(
+      ["get", virtualPath, "--full-path"],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(exitCode, `get failed: ${stdout}`).toBe(0);
+
+    const header = stdout.split("\n")[0]!;
+    // Should show a real filesystem path, not a qmd:// virtual path
+    expect(header).not.toMatch(/^qmd:\/\//);
+    // Should include the literal filename
+    expect(header).toContain("Budget & Revenue (Q4) [2024].md");
+    // The resolved filesystem path should exist — strip the trailing docid (#abc123)
+    const fsPath = header.trim().replace(/\s+#[a-f0-9]{6}$/, "");
+    // Path may be absolute or relative-to-collectionDir; resolve against collectionDir
+    const absPath = fsPath.startsWith("/") ? fsPath : join(collectionDir, fsPath.replace(/^\.\//, ""));
+    expect(existsSync(absPath), `resolved path does not exist: ${absPath}`).toBe(true);
+  });
+  test("(3) get <actual-fs-path> finds the document", async () => {
+    const fsPath = join(collectionDir, "Budget & Revenue (Q4) [2024].md");
+    const { stdout, exitCode, stderr } = await runQmd(
+      ["get", fsPath],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(exitCode, `get by fs path failed: ${stderr}`).toBe(0);
+    // Header should contain the document identifier
+    expect(stdout).toContain("Budget & Revenue (Q4) [2024].md");
+  });
+
+  test("(3b) get <actual-fs-path> finds subdir file with crazy name", async () => {
+    const fsPath = join(collectionDir, "subdir", "Notes #42 - foo@bar.md");
+    const { stdout, exitCode, stderr } = await runQmd(
+      ["get", fsPath],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(exitCode, `get subdir file failed: ${stderr}`).toBe(0);
+    expect(stdout).toContain("Notes #42 - foo@bar.md");
+  });
+
+  test("(4) ls shows literal paths, not handelized slugs", async () => {
+    const { stdout, exitCode } = await runQmd(
+      ["ls", "crazytest"],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(exitCode).toBe(0);
+
+    // Literal paths must appear
+    expect(stdout).toContain("# Meeting - 234232 3432 __ 5.md");
+    expect(stdout).toContain("Budget & Revenue (Q4) [2024].md");
+    expect(stdout).toContain("Notes #42 - foo@bar.md");
+
+    // Handelized slugs must NOT appear
+    expect(stdout).not.toContain("Meeting-234232-3432-5.md");
+    expect(stdout).not.toContain("Budget-Revenue-Q4-2024.md");
+    expect(stdout).not.toContain("Notes-42-foo-bar.md");
+  });
+
+  test("(5) search --json returns docid that can be fetched back", async () => {
+    const { stdout: searchOut, exitCode: searchExit } = await runQmd(
+      ["search", "searchterm-beta", "--json"],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(searchExit).toBe(0);
+
+    const results = JSON.parse(searchOut) as Array<{ docid: string; file: string }>;
+    expect(results.length).toBeGreaterThan(0);
+
+    const hit = results[0]!;
+    expect(hit.docid).toMatch(/^#[a-f0-9]{6}$/);
+
+    // Fetch by docid — must work
+    const { stdout: getOut, exitCode: getExit } = await runQmd(
+      ["get", hit.docid],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(getExit, `get by docid failed`).toBe(0);
+    expect(getOut).toContain("Budget & Revenue (Q4) [2024].md");
+  });
+
+  test("normal filenames are still stored correctly (regression)", async () => {
+    const { stdout, exitCode } = await runQmd(
+      ["search", "Plain filename", "--json"],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(exitCode).toBe(0);
+    const results = JSON.parse(stdout) as Array<{ file: string }>;
+    const hit = results.find((r) => r.file.includes("normal-file"));
+    expect(hit).toBeDefined();
+    expect(hit!.file).toContain("normal-file.md");
+  });
+});
+
+// ---------------------------------------------------------------------------
+// Migration test: old handelized DB upgraded by `qmd update`
+// ---------------------------------------------------------------------------
+
+describe("Path fidelity — migration from handelized index", () => {
+  test("qmd update migrates handelized paths to literal paths in existing index", async () => {
+    const { collectionDir, dbPath, configDir } = await createCrazyCollection("migration");
+
+    // Manually build an old-style DB using handelize() (simulates pre-fix index)
+    const store = createStore(dbPath);
+    const now = new Date().toISOString();
+    // Write and sync a config that points at the collection so `qmd update` knows where it is
+    const migrationYaml = `collections:\n  crazytest:\n    path: "${collectionDir}"\n    mask: "**/*.md"\n`;
+    await writeFile(join(configDir, "index.yml"), migrationYaml);
+    const config = YAML.parse(migrationYaml) as CollectionConfig;
+    syncConfigToDb(store.db, config);
+
+    // Insert documents with handelized paths (old behavior)
+    for (const f of crazyFiles) {
+      const relPath = normalizePathSeparators(f.name);
+      const handelized = handelize(relPath);
+      const hash = await hashContent(f.content);
+      insertContent(store.db, hash, f.content, now);
+      insertDocument(store.db, "crazytest", handelized, `Title ${f.name}`, hash, now, now);
+    }
+    const subFile = crazySubFiles[0]!;
+    const subRel = `subdir/${subFile.name}`;
+    const subHandelized = handelize(subRel);
+    const subHash = await hashContent(subFile.content);
+    insertContent(store.db, subHash, subFile.content, now);
+    insertDocument(store.db, "crazytest", subHandelized, "Sub title", subHash, now, now);
+    store.close();
+
+    // Verify the old DB has handelized paths
+    const dbBefore = openDatabase(dbPath);
+    const pathsBefore = (dbBefore.prepare(
+      "SELECT path FROM documents WHERE active = 1 ORDER BY path"
+    ).all() as { path: string }[]).map((r) => r.path);
+    dbBefore.close();
+
+    expect(pathsBefore).toContain("Meeting-234232-3432-5.md");
+    expect(pathsBefore).toContain("Budget-Revenue-Q4-2024.md");
+    expect(pathsBefore).not.toContain("# Meeting - 234232 3432 __ 5.md");
+
+    // Run `qmd update` with the new code — should migrate paths in-place
+    const update = await runQmd(
+      ["update"],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(update.exitCode, `qmd update failed: ${update.stderr}`).toBe(0);
+
+    // Verify the DB now has literal paths
+    const dbAfter = openDatabase(dbPath);
+    const pathsAfter = (dbAfter.prepare(
+      "SELECT path FROM documents WHERE active = 1 ORDER BY path"
+    ).all() as { path: string }[]).map((r) => r.path);
+    dbAfter.close();
+
+    expect(pathsAfter).toContain("# Meeting - 234232 3432 __ 5.md");
+    expect(pathsAfter).toContain("Budget & Revenue (Q4) [2024].md");
+    expect(pathsAfter).toContain("normal-file.md");
+    expect(pathsAfter).toContain("subdir/Notes #42 - foo@bar.md");
+
+    // Handelized slugs must be gone
+    expect(pathsAfter).not.toContain("Meeting-234232-3432-5.md");
+    expect(pathsAfter).not.toContain("Budget-Revenue-Q4-2024.md");
+
+    // Search must work after migration
+    const { stdout: searchOut, exitCode: searchExit } = await runQmd(
+      ["search", "searchterm-alpha", "--json"],
+      { cwd: collectionDir, dbPath, configDir }
+    );
+    expect(searchExit).toBe(0);
+    const results = JSON.parse(searchOut) as Array<{ file: string }>;
+    expect(results.length).toBeGreaterThan(0);
+    const meetingResult = results.find((r) => r.file.includes("Meeting"));
+    expect(meetingResult).toBeDefined();
+    expect(meetingResult!.file).toContain("# Meeting - 234232 3432 __ 5.md");
+  });
+});
--- a/test/sdk.test.ts
+++ b/test/sdk.test.ts
@ -614,6 +614,20 @@ describe("search (unified API)", () => {
    expect(results.length).toBeGreaterThan(0);
  });

+  test("search() forwards candidateLimit to structured search", async () => {
+    const results = await store.search({
+      queries: [
+        { type: "lex", query: "authentication" },
+        { type: "lex", query: "meeting" },
+      ],
+      limit: 5,
+      candidateLimit: 1,
+      rerank: false,
+    });
+
+    expect(results).toHaveLength(1);
+  });
+
  // Tests below use search({ query: ... }) which triggers LLM query expansion
  describe.skipIf(!!process.env.CI)("with LLM query expansion", () => {
    test("search() with query and rerank:false returns results", async () => {
@ -982,6 +996,92 @@ describe("embed", () => {
    }
  });

+  test("store.embed scopes pending documents to the requested collection", async () => {
+    const store = await createStore({
+      dbPath: freshDbPath(),
+      config: {
+        collections: {
+          docs: { path: docsDir, pattern: "**/*.md" },
+          notes: { path: notesDir, pattern: "**/*.md" },
+        },
+      },
+    });
+
+    const fakeLlm = createFakeEmbedLlm();
+    setDefaultLlamaCpp(createFakeTokenizer() as any);
+    store.internal.llm = fakeLlm as any;
+
+    try {
+      await store.update();
+      const result = await store.embed({ collection: "docs" });
+
+      const vectorCounts = store.internal.db.prepare(`
+        SELECT d.collection, COUNT(DISTINCT v.hash) AS count
+        FROM documents d
+        LEFT JOIN content_vectors v ON v.hash = d.hash AND v.seq = 0
+        WHERE d.active = 1
+        GROUP BY d.collection
+        ORDER BY d.collection
+      `).all() as Array<{ collection: string; count: number }>;
+
+      expect(result.docsProcessed).toBe(3);
+      expect(result.chunksEmbedded).toBe(3);
+      expect(vectorCounts).toEqual([
+        { collection: "docs", count: 3 },
+        { collection: "notes", count: 0 },
+      ]);
+    } finally {
+      setDefaultLlamaCpp(null);
+      await store.close();
+    }
+  });
+
+  test("store.embed with force only clears the requested collection", async () => {
+    const store = await createStore({
+      dbPath: freshDbPath(),
+      config: {
+        collections: {
+          docs: { path: docsDir, pattern: "**/*.md" },
+          notes: { path: notesDir, pattern: "**/*.md" },
+        },
+      },
+    });
+
+    const fakeLlm = createFakeEmbedLlm();
+    setDefaultLlamaCpp(createFakeTokenizer() as any);
+    store.internal.llm = fakeLlm as any;
+
+    const vectorCounts = () => store.internal.db.prepare(`
+      SELECT d.collection, COUNT(DISTINCT v.hash) AS count
+      FROM documents d
+      LEFT JOIN content_vectors v ON v.hash = d.hash AND v.seq = 0
+      WHERE d.active = 1
+      GROUP BY d.collection
+      ORDER BY d.collection
+    `).all() as Array<{ collection: string; count: number }>;
+
+    try {
+      await store.update();
+      await store.embed();
+      expect(vectorCounts()).toEqual([
+        { collection: "docs", count: 3 },
+        { collection: "notes", count: 3 },
+      ]);
+
+      const result = await store.embed({ force: true, collection: "docs" });
+
+      expect(result.docsProcessed).toBe(3);
+      expect(result.chunksEmbedded).toBe(3);
+      expect(vectorCounts()).toEqual([
+        { collection: "docs", count: 3 },
+        { collection: "notes", count: 3 },
+      ]);
+    } finally {
+      setDefaultLlamaCpp(null);
+      await store.close();
+    }
+  });
+
  test("store.embed rejects invalid batch limits", async () => {
    const store = await createStore({
      dbPath: freshDbPath(),
--- a/test/smoke-install.sh
+++ b/test/smoke-install.sh
@ -1,17 +1,28 @@
 #!/usr/bin/env bash
-# Build a container image with qmd installed via npm and bun, then run smoke tests.
-# Works with docker or podman (whichever is available).
+# Build a clean container image from the current checkout package and exercise
+# install/runtime scenarios under npm, npx, and Bun. Supports optional qmd embed
+# and GPU probes, but keeps those expensive/device-specific checks opt-in.
 #
 # Usage:
-#   test/smoke-install.sh              # build + run all smoke tests
-#   test/smoke-install.sh --build      # build image only
-#   test/smoke-install.sh --shell      # drop into container shell
-#   test/smoke-install.sh -- CMD...    # run arbitrary command in container
+#   test/smoke-install.sh                         # build + run default smoke scenarios
+#   test/smoke-install.sh --build                 # build image only
+#   test/smoke-install.sh --shell                 # drop into container shell
+#   test/smoke-install.sh --scenario node         # run one scenario (node|npx|bun|all)
+#   test/smoke-install.sh --with-embed            # also run tiny qmd embed smoke tests
+#   test/smoke-install.sh --with-gpu              # also probe GPU in doctor/embed scenarios
+#   QMD_SMOKE_GPU_BACKEND=cuda|vulkan|auto        # backend for --with-gpu (default: auto)
+#   test/smoke-install.sh --no-build              # reuse existing image
+#   test/smoke-install.sh -- CMD...               # run arbitrary command in container
+#
+# GPU notes:
+#   Docker uses:  --gpus all
+#   Podman uses:  --device nvidia.com/gpu=all
+#   If your podman setup uses a different CDI device name, override with:
+#     QMD_SMOKE_GPU_ARGS='--device nvidia.com/gpu=all' test/smoke-install.sh --with-gpu
 set -euo pipefail

 cd "$(dirname "$0")/.."

-# Pick container runtime
 if command -v podman &>/dev/null; then
  CTR=podman
 elif command -v docker &>/dev/null; then
@ -21,10 +32,50 @@ else
  exit 1
 fi

-IMAGE=qmd-smoke
+IMAGE=${QMD_SMOKE_IMAGE:-qmd-smoke}
+SCENARIO=all
+DO_BUILD=1
+WITH_EMBED=0
+WITH_GPU=0
+GPU_BACKEND=${QMD_SMOKE_GPU_BACKEND:-auto}
+declare -a ARBITRARY_CMD=()
+
+usage() {
+  sed -n '2,21p' "$0" | sed 's/^# \{0,1\}//'
+}
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --build) DO_BUILD=1; BUILD_ONLY=1; shift ;;
+    --no-build) DO_BUILD=0; shift ;;
+    --shell) SHELL_ONLY=1; shift ;;
+    --scenario) SCENARIO="${2:-}"; shift 2 ;;
+    --with-embed) WITH_EMBED=1; shift ;;
+    --with-gpu) WITH_GPU=1; shift ;;
+    --help|-h) usage; exit 0 ;;
+    --) shift; ARBITRARY_CMD=("$@"); break ;;
+    *) echo "Unknown argument: $1" >&2; usage >&2; exit 1 ;;
+  esac
+done
+
+BUILD_ONLY=${BUILD_ONLY:-0}
+SHELL_ONLY=${SHELL_ONLY:-0}
+
+gpu_args() {
+  if [[ $WITH_GPU -ne 1 ]]; then return 0; fi
+  if [[ -n "${QMD_SMOKE_GPU_ARGS:-}" ]]; then
+    # shellcheck disable=SC2206
+    echo ${QMD_SMOKE_GPU_ARGS}
+    return 0
+  fi
+  case "$CTR" in
+    docker) echo "--gpus all" ;;
+    podman) echo "--device nvidia.com/gpu=all" ;;
+  esac
+}

 build_image() {
-  echo "==> Building TypeScript..."
+  echo "==> Building TypeScript package..."
  npm run build --silent

  echo "==> Packing tarball..."
@ -32,32 +83,35 @@ build_image() {
  TARBALL=$(npm pack --pack-destination test/ 2>/dev/null | tail -1)
  echo "    $TARBALL"

-  # Copy project files into build context so vitest/bun tests can run inside
+  echo "==> Preparing container test project..."
  rm -rf test/test-src
-  mkdir -p test/test-src/src test/test-src/test
-  cp src/*.ts test/test-src/src/
+  mkdir -p test/test-src/test
+  cp -r src test/test-src/
  cp -r dist test/test-src/
-  cp test/*.test.ts test/test-src/test/
+  cp -r test/*.test.ts test/test-src/test/
  cp package.json tsconfig.json tsconfig.build.json test/test-src/

-  echo "==> Building container image ($CTR)..."
+  echo "==> Building container image ($CTR): $IMAGE"
  $CTR build -f test/Containerfile -t "$IMAGE" test/

-  # Clean up
  rm -f test/tobilu-qmd-*.tgz
  rm -rf test/test-src
  echo "==> Image ready: $IMAGE"
 }

 run() {
-  $CTR run --rm "$IMAGE" bash -c "$*"
+  local args=()
+  # Intentionally word-split GPU args: container CLIs expect separate flags.
+  # shellcheck disable=SC2206
+  args=( $(gpu_args) )
+  $CTR run --rm "${args[@]}" "$IMAGE" bash -lc "$*"
 }

 PASS=0
 FAIL=0

-ok()   { printf "  %-50s OK\n" "$1"; PASS=$((PASS + 1)); }
-fail() { printf "  %-50s FAIL\n" "$1"; FAIL=$((FAIL + 1)); echo "$2" | sed 's/^/    /'; }
+ok()   { printf "  %-58s OK\n" "$1"; PASS=$((PASS + 1)); }
+fail() { printf "  %-58s FAIL\n" "$1"; FAIL=$((FAIL + 1)); echo "$2" | sed 's/^/    /'; }

 smoke_test() {
  local label="$1"; shift
@ -73,97 +127,136 @@ smoke_test_output() {
  local label="$1"; local expect="$2"; shift 2
  local out
  out=$(run "$@" 2>&1) || true
-  if echo "$out" | grep -q "$expect"; then
+  if grep -q "$expect" <<<"$out"; then
    ok "$label"
  else
    fail "$label" "$out"
  fi
 }

-run_smoke_tests() {
-  # ------------------------------------------------------------------
-  # Node (npm-installed qmd)
-  # ------------------------------------------------------------------
+fixture_setup='rm -rf /tmp/qmd-fixture /tmp/qmd-cache /tmp/qmd-config /tmp/qmd-models; mkdir -p /tmp/qmd-fixture; printf "# Smoke Doc\n\nGPU and CPU embedding smoke test.\n" > /tmp/qmd-fixture/doc.md; export XDG_CACHE_HOME=/tmp/qmd-cache QMD_CONFIG_DIR=/tmp/qmd-config'
+
+gpu_env() {
+  case "$GPU_BACKEND" in
+    auto|"") echo "" ;;
+    cuda|vulkan|metal) echo "QMD_LLAMA_GPU=$GPU_BACKEND" ;;
+    *) echo "Unsupported QMD_SMOKE_GPU_BACKEND=$GPU_BACKEND" >&2; exit 1 ;;
+  esac
+}
+
+run_doctor_smoke() {
+  local label="$1" bin="$2" extra_env="${3:-}"
+  smoke_test_output "$label doctor" "QMD Doctor" \
+    "$fixture_setup; $extra_env $bin doctor"
+}
+
+run_collection_smoke() {
+  local label="$1" bin="$2" extra_env="${3:-}"
+  smoke_test "$label collection add/list/status" \
+    "$fixture_setup; cd /tmp/qmd-fixture; $extra_env $bin collection add . --name smoke; $extra_env $bin collection list; $extra_env $bin status"
+}
+
+run_embed_smoke() {
+  local label="$1" bin="$2" extra_env="${3:-}"
+  [[ $WITH_EMBED -eq 1 ]] || return 0
+  smoke_test "$label qmd embed tiny fixture" \
+    "$fixture_setup; cd /tmp/qmd-fixture; $extra_env $bin collection add . --name smoke; $extra_env $bin embed --max-docs-per-batch 1 --max-batch-mb 1; $extra_env $bin doctor"
+}
+
+run_runtime_matrix() {
+  local label="$1" bin="$2" path_env="$3"
+  smoke_test_output "$label qmd help" "Usage:" "$path_env; $bin"
+  run_doctor_smoke "$label auto" "$path_env; $bin"
+  run_doctor_smoke "$label force-cpu" "$path_env; $bin" "QMD_FORCE_CPU=1"
+  run_collection_smoke "$label" "$path_env; $bin" "QMD_FORCE_CPU=1"
+  run_embed_smoke "$label force-cpu" "$path_env; $bin" "QMD_FORCE_CPU=1"
+  run_embed_smoke "$label auto" "$path_env; $bin"
+  if [[ $WITH_GPU -eq 1 ]]; then
+    local ge
+    ge=$(gpu_env)
+    run_doctor_smoke "$label gpu-$GPU_BACKEND" "$path_env; $bin" "$ge"
+    run_embed_smoke "$label gpu-$GPU_BACKEND" "$path_env; $bin" "$ge"
+  fi
+}
+
+run_node_scenario() {
  local NODE_BIN='$(mise where node@latest)/bin'
-  echo "=== Node (npm install) ==="
-
-  smoke_test_output "qmd shows help" "Usage:" \
-    "export PATH=$NODE_BIN:\$PATH; qmd"
-
-  smoke_test "qmd collection list" \
-    "export PATH=$NODE_BIN:\$PATH; qmd collection list"
-
-  smoke_test "qmd status" \
-    "export PATH=$NODE_BIN:\$PATH; qmd status"
-
-  smoke_test "sqlite-vec loads" \
-    "export PATH=$NODE_BIN:\$PATH;
-     NPM_GLOBAL=\$(npm root -g);
-     node -e \"
-      const {openDatabase, loadSqliteVec} = await import('\$NPM_GLOBAL/@tobilu/qmd/dist/db.js');
+  local bin='qmd'
+  echo "=== Node: npm install -g packed tarball ==="
+  run_runtime_matrix "node" "$bin" "export PATH=$NODE_BIN:\$PATH"
+  smoke_test "node sqlite-vec loads" \
+    "export PATH=$NODE_BIN:\$PATH; NPM_GLOBAL=\$(npm root -g); node -e \"
+      const {openDatabase, loadSqliteVec} = await import('\$NPM_GLOBAL/@tobilu/qmd/dist/db.js');
      const db = openDatabase(':memory:');
      loadSqliteVec(db);
      const r = db.prepare('SELECT vec_version() as v').get();
      console.log('sqlite-vec', r.v);
      if (!r.v) process.exit(1);
    \""
-
-  smoke_test "vitest (node)" \
+  smoke_test "node vitest store subset" \
    "export PATH=$NODE_BIN:\$PATH; cd /opt/qmd && npx vitest run --reporter=verbose test/store.test.ts 2>&1 | tail -5"
+}

-  # ------------------------------------------------------------------
-  # Bun (bun-installed qmd)
-  # ------------------------------------------------------------------
+run_npx_scenario() {
+  local NODE_BIN='$(mise where node@latest)/bin'
+  local bin='npm exec --yes --package /tmp/tobilu-qmd.tgz -- qmd'
+  echo "=== Node: npm exec/npx-style packed tarball ==="
+  run_runtime_matrix "npx-style" "$bin" "export PATH=$NODE_BIN:\$PATH"
+}
+
+run_bun_scenario() {
+  local NODE_BIN='$(mise where node@latest)/bin'
  local BUN_BIN='$(mise where bun@latest)/bin'
-  echo ""
-  echo "=== Bun (bun install) ==="
-
-  smoke_test_output "qmd shows help" "Usage:" \
-    "export PATH=$BUN_BIN:$NODE_BIN:\$PATH; \$HOME/.bun/bin/qmd"
-
-  smoke_test "qmd collection list" \
-    "export PATH=$BUN_BIN:$NODE_BIN:\$PATH; \$HOME/.bun/bin/qmd collection list"
-
-  smoke_test "qmd status" \
-    "export PATH=$BUN_BIN:$NODE_BIN:\$PATH; \$HOME/.bun/bin/qmd status"
-
-  smoke_test "sqlite-vec loads (bun)" \
+  local bin='$HOME/.bun/bin/qmd'
+  echo "=== Bun: bun install -g packed tarball ==="
+  run_runtime_matrix "bun" "$bin" "export PATH=$BUN_BIN:$NODE_BIN:\$PATH"
+  smoke_test "bun sqlite-vec loads" \
    "export PATH=$BUN_BIN:\$PATH; bun -e \"
-      const {openDatabase, loadSqliteVec} = await import('\$HOME/.bun/install/global/node_modules/@tobilu/qmd/dist/db.js');
+      const {openDatabase, loadSqliteVec} = await import('\$HOME/.bun/install/global/node_modules/@tobilu/qmd/dist/db.js');
      const db = openDatabase(':memory:');
      loadSqliteVec(db);
      const r = db.prepare('SELECT vec_version() as v').get();
      console.log('sqlite-vec', r.v);
      if (!r.v) process.exit(1);
    \""
-
-  smoke_test "bun test store" \
+  smoke_test "bun test store subset" \
    "export PATH=$BUN_BIN:\$PATH; cd /opt/qmd && bun test --preload ./src/test-preload.ts --timeout 30000 test/store.test.ts 2>&1 | tail -10"
+}

-  # ------------------------------------------------------------------
+run_smoke_tests() {
+  case "$SCENARIO" in
+    node) run_node_scenario ;;
+    npx) run_npx_scenario ;;
+    bun) run_bun_scenario ;;
+    all) run_node_scenario; echo; run_npx_scenario; echo; run_bun_scenario ;;
+    *) echo "Unknown scenario: $SCENARIO" >&2; exit 1 ;;
+  esac
  echo ""
  echo "=== Results: $PASS passed, $FAIL failed ==="
  [[ $FAIL -eq 0 ]]
 }

-# Parse arguments
-case "${1:-}" in
-  --build)
-    build_image
-    ;;
-  --shell)
-    build_image
-    echo "==> Dropping into container shell..."
-    $CTR run --rm -it "$IMAGE" bash
-    ;;
-  --)
-    shift
-    run "$@"
-    ;;
-  *)
-    build_image
-    echo ""
-    echo "==> Running smoke tests..."
-    run_smoke_tests
-    ;;
-esac
+if [[ $DO_BUILD -eq 1 ]]; then
+  build_image
+fi
+
+if [[ ${#ARBITRARY_CMD[@]} -gt 0 ]]; then
+  run "${ARBITRARY_CMD[*]}"
+  exit $?
+fi
+
+if [[ $BUILD_ONLY -eq 1 ]]; then
+  exit 0
+fi
+
+if [[ $SHELL_ONLY -eq 1 ]]; then
+  echo "==> Dropping into container shell..."
+  # shellcheck disable=SC2206
+  gpu=( $(gpu_args) )
+  $CTR run --rm -it "${gpu[@]}" "$IMAGE" bash
+  exit $?
+fi
+
+echo ""
+echo "==> Running smoke tests..."
+run_smoke_tests
--- a/test/store.test.ts
+++ b/test/store.test.ts
@ -9,7 +9,7 @@
 import { describe, test, expect, beforeAll, afterAll, beforeEach, afterEach, vi } from "vitest";
 import { openDatabase, loadSqliteVec } from "../src/db.js";
 import type { Database } from "../src/db.js";
-import { unlink, mkdtemp, rmdir, writeFile } from "node:fs/promises";
+import { unlink, mkdtemp, rmdir, writeFile, rm, mkdir, rename } from "node:fs/promises";
 import { tmpdir } from "node:os";
 import { join } from "node:path";
 import YAML from "yaml";
@ -26,6 +26,7 @@ import {
  extractTitle,
  formatQueryForEmbedding,
  formatDocForEmbedding,
+  getEmbeddingFingerprint,
  chunkDocument,
  chunkDocumentByTokens,
  chunkDocumentAsync,
@ -46,13 +47,22 @@ import {
  normalizeDocid,
  isDocid,
  syncConfigToDb,
+  reindexCollection,
  STRONG_SIGNAL_MIN_SCORE,
  STRONG_SIGNAL_MIN_GAP,
+  insertContent,
+  insertDocument,
  generateEmbeddings,
+  getHybridRrfWeights,
+  _resetProductionModeForTesting,
+  hybridQuery,
+  structuredSearch,
+  vectorSearchQuery,
  type Store,
  type DocumentResult,
  type SearchResult,
  type RankedResult,
+  type RankedListMeta,
 } from "../src/store.js";
 import type { CollectionConfig } from "../src/collections.js";

@ -156,18 +166,18 @@ async function insertTestDocument(
  const hash = opts.hash || await hashContent(body);

  // Insert content (with OR IGNORE for deduplication)
-  db.prepare(`
-    INSERT OR IGNORE INTO content (hash, doc, created_at)
-    VALUES (?, ?, ?)
-  `).run(hash, body, now);
+  insertContent(db, hash, body, now);

-  // Insert document
-  const result = db.prepare(`
-    INSERT INTO documents (collection, path, title, hash, created_at, modified_at, active)
-    VALUES (?, ?, ?, ?, ?, ?, ?)
-  `).run(collectionName, path, title, hash, now, now, active);
+  insertDocument(db, collectionName, path, title, hash, now, now);
+  const row = db.prepare(`
+    SELECT id FROM documents WHERE collection = ? AND path = ?
+  `).get(collectionName, path) as { id: number } | undefined;

-  return Number(result.lastInsertRowid);
+  if (active === 0 && row) {
+    db.prepare(`UPDATE documents SET active = 0 WHERE id = ?`).run(row.id);
+  }
+
+  return row?.id ?? 0;
 }

 /** Sync YAML config file to SQLite store_collections in the current test store */
@ -277,7 +287,9 @@ afterAll(async () => {

 describe("Store Creation", () => {
  test("createStore throws without explicit path in test mode", () => {
-    // In test mode, createStore without path should throw to prevent accidental writes
+    // In test mode, createStore without path should throw to prevent accidental writes.
+    // Other tests may enable production mode in the same Bun process, so reset first.
+    _resetProductionModeForTesting();
    const originalIndexPath = process.env.INDEX_PATH;
    delete process.env.INDEX_PATH;

@ -300,19 +312,127 @@ describe("Store Creation", () => {

    // Check tables exist
    const tables = store.db.prepare(`
-      SELECT name FROM sqlite_master WHERE type='table' ORDER BY name
+      SELECT name FROM sqlite_master
+      WHERE type='table'
+      ORDER BY name
    `).all() as { name: string }[];

    const tableNames = tables.map(t => t.name);
    expect(tableNames).toContain("documents");
    expect(tableNames).toContain("documents_fts");
    expect(tableNames).toContain("content_vectors");
+    expect(tableNames).toContain("content");
    expect(tableNames).toContain("llm_cache");
    // Note: path_contexts table removed in favor of YAML-based context storage

    await cleanupTestDb(store);
  });

+  test("createStore defers content_vectors embed_fingerprint migration until embedding health needs it", async () => {
+    const dbPath = join(testDir, `legacy-${Date.now()}-${Math.random().toString(36).slice(2)}.sqlite`);
+    const model = "hf:test/embed-model.gguf";
+    const legacyDb = openDatabase(dbPath);
+    legacyDb.exec(`
+      CREATE TABLE content (
+        hash TEXT PRIMARY KEY,
+        doc TEXT NOT NULL,
+        created_at TEXT NOT NULL
+      );
+      CREATE TABLE documents (
+        id INTEGER PRIMARY KEY AUTOINCREMENT,
+        collection TEXT NOT NULL,
+        path TEXT NOT NULL,
+        title TEXT,
+        hash TEXT NOT NULL,
+        created_at TEXT NOT NULL,
+        modified_at TEXT NOT NULL,
+        active INTEGER NOT NULL DEFAULT 1,
+        FOREIGN KEY (hash) REFERENCES content(hash) ON DELETE CASCADE,
+        UNIQUE(collection, path)
+      );
+      CREATE TABLE content_vectors (
+        hash TEXT NOT NULL,
+        seq INTEGER NOT NULL DEFAULT 0,
+        pos INTEGER NOT NULL DEFAULT 0,
+        model TEXT NOT NULL,
+        total_chunks INTEGER NOT NULL DEFAULT 1,
+        embedded_at TEXT NOT NULL,
+        PRIMARY KEY (hash, seq)
+      )
+    `);
+    const now = new Date().toISOString();
+    legacyDb.prepare(`INSERT INTO content (hash, doc, created_at) VALUES (?, ?, ?)`).run("hash1", "# Legacy\nbody", now);
+    legacyDb.prepare(`INSERT INTO documents (collection, path, title, hash, created_at, modified_at, active) VALUES (?, ?, ?, ?, ?, ?, 1)`).run("test", "legacy.md", "Legacy", "hash1", now, now);
+    legacyDb.prepare(`INSERT INTO content_vectors (hash, seq, pos, model, total_chunks, embedded_at) VALUES (?, ?, ?, ?, ?, ?)`).run("hash1", 0, 0, model, 1, now);
+    legacyDb.close();
+
+    const store = createStore(dbPath);
+    let columns = store.db.prepare(`PRAGMA table_info(content_vectors)`).all() as { name: string }[];
+    expect(columns.map(col => col.name)).not.toContain("embed_fingerprint");
+
+    expect(store.getHashesNeedingEmbedding(model)).toBe(1);
+
+    columns = store.db.prepare(`PRAGMA table_info(content_vectors)`).all() as { name: string }[];
+    const migratedRow = store.db.prepare(`SELECT embed_fingerprint FROM content_vectors WHERE hash = ?`).get("hash1") as { embed_fingerprint: string };
+    expect(columns.map(col => col.name)).toContain("embed_fingerprint");
+    expect(migratedRow.embed_fingerprint).toBe("");
+
+    await cleanupTestDb(store);
+  });
+
+  test("content_vectors column repair runs the full ALTER series and retries the failed operation", async () => {
+    const dbPath = join(testDir, `legacy-no-seq-${Date.now()}-${Math.random().toString(36).slice(2)}.sqlite`);
+    const model = "hf:test/embed-model.gguf";
+    const legacyDb = openDatabase(dbPath);
+    legacyDb.exec(`
+      CREATE TABLE content (
+        hash TEXT PRIMARY KEY,
+        doc TEXT NOT NULL,
+        created_at TEXT NOT NULL
+      );
+      CREATE TABLE documents (
+        id INTEGER PRIMARY KEY AUTOINCREMENT,
+        collection TEXT NOT NULL,
+        path TEXT NOT NULL,
+        title TEXT,
+        hash TEXT NOT NULL,
+        created_at TEXT NOT NULL,
+        modified_at TEXT NOT NULL,
+        active INTEGER NOT NULL DEFAULT 1,
+        FOREIGN KEY (hash) REFERENCES content(hash) ON DELETE CASCADE,
+        UNIQUE(collection, path)
+      );
+      CREATE TABLE content_vectors (
+        hash TEXT NOT NULL,
+        model TEXT NOT NULL,
+        embed_fingerprint TEXT NOT NULL DEFAULT '',
+        total_chunks INTEGER NOT NULL DEFAULT 1,
+        embedded_at TEXT NOT NULL
+      )
+    `);
+    legacyDb.close();
+
+    const store = createStore(dbPath);
+    let columns = store.db.prepare(`PRAGMA table_info(content_vectors)`).all() as { name: string }[];
+    expect(columns.map(col => col.name)).not.toContain("seq");
+    expect(columns.map(col => col.name)).not.toContain("pos");
+
+    store.ensureVecTable(3);
+    store.insertEmbedding("hash1", 1, 42, new Float32Array([1, 2, 3]), model, new Date().toISOString(), 2);
+
+    columns = store.db.prepare(`PRAGMA table_info(content_vectors)`).all() as { name: string }[];
+    const columnNames = columns.map(col => col.name);
+    expect(columnNames).toEqual(expect.arrayContaining(["seq", "pos", "model", "embed_fingerprint", "total_chunks", "embedded_at"]));
+    expect(store.db.prepare(`SELECT seq, pos, model, total_chunks FROM content_vectors WHERE hash = ?`).get("hash1")).toEqual({
+      seq: 1,
+      pos: 42,
+      model,
+      total_chunks: 2,
+    });
+
+    await cleanupTestDb(store);
+  });
+
  test("createStore sets WAL journal mode", async () => {
    const store = await createTestStore();
    const result = store.db.prepare("PRAGMA journal_mode").get() as { journal_mode: string };
@ -1250,6 +1370,61 @@ describe("FTS Search", () => {
    await cleanupTestDb(store);
  });

+  test("searchFTS finds CJK documents by exact and mixed queries", async () => {
+    const store = await createTestStore();
+    const collectionName = await createTestCollection();
+
+    await insertTestDocument(store.db, collectionName, {
+      name: "zh",
+      title: "中文检索说明",
+      body: "这里介绍 vector 数据库和关键词检索。",
+      displayPath: "cjk/zh.md",
+    });
+    await insertTestDocument(store.db, collectionName, {
+      name: "ja",
+      title: "日本語検索メモ",
+      body: "この文書は検索品質とトークン化について説明します。",
+      displayPath: "cjk/ja.md",
+    });
+    await insertTestDocument(store.db, collectionName, {
+      name: "ko",
+      title: "한국어 검색 노트",
+      body: "이 문서는 검색 품질과 토큰화 문제를 설명합니다.",
+      displayPath: "cjk/ko.md",
+    });
+
+    expect(store.searchFTS("关键词检索", 10).map(r => r.displayPath)).toContain(`${collectionName}/cjk/zh.md`);
+    expect(store.searchFTS("検索品質", 10).map(r => r.displayPath)).toContain(`${collectionName}/cjk/ja.md`);
+    expect(store.searchFTS("검색 품질", 10).map(r => r.displayPath)).toContain(`${collectionName}/cjk/ko.md`);
+    expect(store.searchFTS("vector 关键词", 10).map(r => r.displayPath)).toContain(`${collectionName}/cjk/zh.md`);
+
+    await cleanupTestDb(store);
+  });
+
+  test("searchFTS keeps English behavior while indexing CJK text", async () => {
+    const store = await createTestStore();
+    const collectionName = await createTestCollection();
+
+    await insertTestDocument(store.db, collectionName, {
+      name: "english",
+      title: "Vector Search Notes",
+      body: "The quick brown fox explains vector search and BM25 ranking.",
+      displayPath: "english.md",
+    });
+    await insertTestDocument(store.db, collectionName, {
+      name: "zh",
+      title: "中文检索说明",
+      body: "这里介绍向量数据库和关键词检索。",
+      displayPath: "zh.md",
+    });
+
+    const foxResults = store.searchFTS("quick fox", 10);
+    expect(foxResults.map(r => r.displayPath)).toContain(`${collectionName}/english.md`);
+    expect(foxResults.map(r => r.displayPath)).not.toContain(`${collectionName}/zh.md`);
+
+    await cleanupTestDb(store);
+  });
+
  test("searchFTS handles special characters in query", async () => {
    const store = await createTestStore();
    const collectionName = await createTestCollection();
@ -1429,6 +1604,39 @@ describe("FTS Search", () => {

    await cleanupTestDb(store);
  });
+
+  test("searchFTS matches dotted version strings like 2026.4.10 (#563)", async () => {
+    // Regression test: the porter unicode61 tokenizer splits on dots, so the index
+    // stores "2026", "4", and "10" as separate tokens. Before the fix, sanitizeFTS5Term
+    // stripped the dots, producing "2026410", which never matched anything.
+    const store = await createTestStore();
+    const collectionName = await createTestCollection();
+
+    await insertTestDocument(store.db, collectionName, {
+      name: "release-notes",
+      title: "Release Notes",
+      body: "## Release 2026.4.10\n\nThis version introduces new features and bug fixes.",
+      displayPath: "test/release-notes.md",
+    });
+
+    // A document that does NOT contain the version string
+    await insertTestDocument(store.db, collectionName, {
+      name: "other-doc",
+      title: "Other Document",
+      body: "Unrelated content about gardening and cooking.",
+      displayPath: "test/other.md",
+    });
+
+    const results = store.searchFTS("2026.4.10", 10);
+    expect(results.length).toBeGreaterThan(0);
+    expect(results.map(r => r.displayPath)).toContain(`${collectionName}/test/release-notes.md`);
+
+    // Partial version should also work
+    const partial = store.searchFTS("2026.4", 10);
+    expect(partial.map(r => r.displayPath)).toContain(`${collectionName}/test/release-notes.md`);
+
+    await cleanupTestDb(store);
+  });
 });

 // =============================================================================
@ -1647,6 +1855,21 @@ describe("Document Retrieval", () => {
      expect(body).toBeNull();
      await cleanupTestDb(store);
    });
+
+    test("getDocumentBody clamps negative fromLine to top of document", async () => {
+      const store = await createTestStore();
+      const collectionName = await createTestCollection({ pwd: "/path" });
+      await insertTestDocument(store.db, collectionName, {
+        name: "mydoc",
+        displayPath: "mydoc.md",
+        body: "Line 1\nLine 2\nLine 3\nLine 4\nLine 5",
+      });
+
+      const body = store.getDocumentBody({ filepath: "/path/mydoc.md" }, -19, 80);
+      expect(body).toBe("Line 1\nLine 2\nLine 3\nLine 4\nLine 5");
+
+      await cleanupTestDb(store);
+    });
  });

  describe("findDocuments (multi-get)", () => {
@ -1903,6 +2126,26 @@ describe("Snippet Extraction", () => {
    expect(linesAfter).toBe(2);   // Fourth, Fifth
  });

+  test("extractSnippet with leading blank/frontmatter lines reports 1 before, not 0", () => {
+    // Regression: a user looked at `@@ -2,4 @@ (1 before, 72 after)` and
+    // suspected "1 before" was wrong because the match appeared to be the
+    // topmost visible line. "Before" is counted from the snippet's absolute
+    // starting line in the file, not from the visible portion of the snippet,
+    // so when the snippet starts at line 2, "1 before" is the correct count.
+    // Lock that in with a 77-line document whose match sits on line 3.
+    const otherLines = Array.from({ length: 72 }, (_, i) => `body line ${i + 6}`).join("\n");
+    const body = `---\ntitle: Notes\n# Heading with keyword\nIntro paragraph.\nMore intro lines.\n${otherLines}`;
+
+    const { line, linesBefore, snippetLines, linesAfter, snippet } =
+      extractSnippet(body, "keyword", 500);
+
+    expect(line).toBe(3);             // match is on line 3
+    expect(linesBefore).toBe(1);      // exactly one line above the 4-line snippet window
+    expect(snippetLines).toBe(4);     // lines 2..5 form the snippet
+    expect(linesAfter).toBe(72);      // remaining body
+    expect(snippet).toContain("@@ -2,4 @@ (1 before, 72 after)");
+  });
+
  test("extractSnippet at document end shows 0 after", () => {
    const body = "First\nSecond\nThird\nFourth\nFifth keyword";
    const { linesBefore, linesAfter, snippetLines, line } = extractSnippet(body, "keyword", 500);
@ -1935,6 +2178,33 @@ describe("Snippet Extraction", () => {
    expect(line).toBe(51); // "Target keyword" is line 51
    expect(linesBefore).toBeGreaterThan(40); // Many lines before
  });
+
+  test("extractSnippet anchors on chunkPos when lexical scoring finds no match", () => {
+    // The snippet tokenizer does not strip FTS5 syntax, so a quoted-phrase query
+    // tokenizes into terms with embedded quotes that never appear in the body text.
+    // bestScore stays at 0 even though the reranker correctly identified a chunk;
+    // the fallback should anchor on chunkPos rather than defaulting to line 1.
+    const padLine = "Lorem ipsum dolor sit amet\n";
+    const padding = padLine.repeat(100);
+    const body = padding + "chunk content here\nmore chunk content\n" + padding;
+    const chunkPos = padding.length;
+
+    const { line } = extractSnippet(body, '"unrelated quoted phrase"', 200, chunkPos);
+
+    expect(line).toBeGreaterThan(50);
+    expect(line).toBeLessThan(110);
+  });
+
+  test("extractSnippet with chunkPos=0 falls back to full-body scan when chunk has no match", () => {
+    // chunkPos=0 may be the chunk selector's bestIdx=0 default rather than a real
+    // first-chunk hit, so the fallback must consider matches outside chunk 0.
+    const padding = "Lorem ipsum dolor sit amet\n".repeat(200);
+    const body = padding + "TARGET_KEYWORD line content\ntail line\n";
+
+    const { line } = extractSnippet(body, "TARGET_KEYWORD", 200, 0);
+
+    expect(line).toBe(201);
+  });
 });

 // =============================================================================
@ -1988,6 +2258,38 @@ describe("Reciprocal Rank Fusion", () => {
    expect(fused[0]!.file).toBe("doc1");
  });

+  test("hybrid RRF weights boost original vector evidence over expansion-only hits", () => {
+    const originalFtsOnly = makeResult("original-fts-only.md", 0.95);
+    const expansionOnly = makeResult("lex-expansion-only.md", 0.95);
+    const originalVector = makeResult("original-vector.md", 0.95);
+
+    // Mirrors hybridQuery's common list order when a lex expansion exists:
+    // original FTS, lex expansion FTS, original vector.
+    const rankedLists = [
+      [originalFtsOnly],
+      [expansionOnly],
+      [originalVector],
+    ];
+    const rankedListMeta: RankedListMeta[] = [
+      { source: "fts", queryType: "original", query: "user query" },
+      { source: "fts", queryType: "lex", query: "lex expansion" },
+      { source: "vec", queryType: "original", query: "user query" },
+    ];
+
+    const positionBasedWeights = rankedLists.map((_, i) => i < 2 ? 2.0 : 1.0);
+    const buggyOrder = reciprocalRankFusion(rankedLists, positionBasedWeights);
+
+    expect(buggyOrder.findIndex(r => r.file === "lex-expansion-only.md"))
+      .toBeLessThan(buggyOrder.findIndex(r => r.file === "original-vector.md"));
+
+    const semanticWeights = getHybridRrfWeights(rankedListMeta);
+    const fixedOrder = reciprocalRankFusion(rankedLists, semanticWeights);
+
+    expect(semanticWeights).toEqual([2.0, 1.0, 2.0]);
+    expect(fixedOrder.findIndex(r => r.file === "original-vector.md"))
+      .toBeLessThan(fixedOrder.findIndex(r => r.file === "lex-expansion-only.md"));
+  });
+
  test("RRF adds top-rank bonus", () => {
    // doc1 is #1 in list1, doc2 is #2 in list1
    const list1 = [makeResult("doc1", 0.9), makeResult("doc2", 0.8)];
@ -2020,6 +2322,65 @@ describe("Reciprocal Rank Fusion", () => {
  });
 });

+// =============================================================================
+// Reindex Collection Tests
+// =============================================================================
+
+describe("Reindex Collection", () => {
+  test("preserves document id and embeddings when file path changes only by case", async () => {
+    const store = await createTestStore();
+    const collectionName = "docs";
+    const collectionPath = join(testDir, `case-rename-${Date.now()}-${Math.random().toString(36).slice(2)}`);
+    await mkdir(collectionPath, { recursive: true });
+
+    const originalPath = join(collectionPath, "README.md");
+    const renamedPath = join(collectionPath, "readme.md");
+    const body = "# Case Rename\n\nContent that should keep the same embedding.";
+    await writeFile(originalPath, body);
+
+    const firstResult = await reindexCollection(store, collectionPath, "**/*.md", collectionName);
+    expect(firstResult.indexed).toBe(1);
+
+    const before = store.db.prepare(`
+      SELECT id, path, hash FROM documents
+      WHERE collection = ? AND active = 1
+    `).get(collectionName) as { id: number; path: string; hash: string };
+    expect(before.path).toBe("README.md");
+
+    store.db.prepare(`
+      INSERT INTO content_vectors (hash, seq, pos, model, embedded_at)
+      VALUES (?, 0, 0, 'test-model', ?)
+    `).run(before.hash, new Date().toISOString());
+
+    await rename(originalPath, renamedPath);
+
+    const secondResult = await reindexCollection(store, collectionPath, "**/*.md", collectionName);
+    expect(secondResult.indexed).toBe(0);
+    expect(secondResult.unchanged).toBe(1);
+    expect(secondResult.removed).toBe(0);
+
+    const afterRows = store.db.prepare(`
+      SELECT id, path, hash, active FROM documents
+      WHERE collection = ?
+      ORDER BY id
+    `).all(collectionName) as { id: number; path: string; hash: string; active: number }[];
+    expect(afterRows).toHaveLength(1);
+    expect(afterRows[0]).toMatchObject({ id: before.id, path: "readme.md", hash: before.hash, active: 1 });
+
+    const vectorCount = store.db.prepare(`
+      SELECT COUNT(*) AS count FROM content_vectors WHERE hash = ?
+    `).get(before.hash) as { count: number };
+    expect(vectorCount.count).toBe(1);
+
+    const ftsRows = store.db.prepare(`
+      SELECT rowid, filepath FROM documents_fts WHERE rowid = ?
+    `).all(before.id) as { rowid: number; filepath: string }[];
+    expect(ftsRows).toEqual([{ rowid: before.id, filepath: "docs/readme.md" }]);
+
+    await cleanupTestDb(store);
+  });
+});
+
 // =============================================================================
 // Index Status Tests
 // =============================================================================
@ -2082,6 +2443,43 @@ describe("Index Status", () => {
    await cleanupTestDb(store);
  });

+  test("embedding health is scoped to the active embed model", async () => {
+    const store = await createTestStore();
+    const collectionName = await createTestCollection();
+    const activeModel = "hf:active/embed-model.gguf";
+    const staleModel = "hf:stale/embed-model.gguf";
+    const now = new Date().toISOString();
+
+    store.llm = { embedModelName: activeModel } as any;
+    store.ensureVecTable(3);
+    await insertTestDocument(store.db, collectionName, { name: "doc1", hash: "hash1" });
+    store.insertEmbedding("hash1", 0, 0, new Float32Array([1, 2, 3]), staleModel, now, 1);
+
+    expect(store.getHashesNeedingEmbedding()).toBe(1);
+    expect(store.getStatus().needsEmbedding).toBe(1);
+    expect(store.getIndexHealth().needsEmbedding).toBe(1);
+    expect(store.getHashesNeedingEmbedding(staleModel)).toBe(0);
+
+    await cleanupTestDb(store);
+  });
+
+  test("embedding health treats stale fingerprints as needing re-embedding", async () => {
+    const store = await createTestStore();
+    const collectionName = await createTestCollection();
+    const model = "hf:test/embed-model.gguf";
+    const now = new Date().toISOString();
+
+    store.llm = { embedModelName: model } as any;
+    store.ensureVecTable(3);
+    await insertTestDocument(store.db, collectionName, { name: "doc1", hash: "hash1" });
+    store.insertEmbedding("hash1", 0, 0, new Float32Array([1, 2, 3]), model, now, 1, "stale1");
+
+    expect(getEmbeddingFingerprint(model)).toMatch(/^[a-f0-9]{6}$/);
+    expect(store.getHashesNeedingEmbedding()).toBe(1);
+
+    await cleanupTestDb(store);
+  });
+
  test("getIndexHealth returns health info", async () => {
    const store = await createTestStore();
    const collectionName = await createTestCollection();
@ -2256,6 +2654,33 @@ describe("Vector Table", () => {

    await cleanupTestDb(store);
  });
+
+  test("insertEmbedding is idempotent for an existing vec0 hash_seq (#598)", async () => {
+    const store = await createTestStore();
+    store.ensureVecTable(2);
+
+    const hash = "existinghashseq";
+    const first = new Float32Array([0.1, 0.2]);
+    const second = new Float32Array([0.3, 0.4]);
+    const now = new Date().toISOString();
+
+    store.db.prepare(`INSERT INTO vectors_vec (hash_seq, embedding) VALUES (?, ?)`).run(`${hash}_0`, first);
+
+    // Reproduces sqlite-vec's broken conflict handling: vec0 does not honor OR REPLACE.
+    expect(() => {
+      store.db.prepare(`INSERT OR REPLACE INTO vectors_vec (hash_seq, embedding) VALUES (?, ?)`).run(`${hash}_0`, second);
+    }).toThrow(/UNIQUE constraint failed/i);
+
+    // QMD must therefore use DELETE + INSERT when upserting the vector row.
+    expect(() => store.insertEmbedding(hash, 0, 0, second, "test-model", now)).not.toThrow();
+
+    const vectorCount = store.db.prepare(`SELECT COUNT(*) AS count FROM vectors_vec WHERE hash_seq = ?`).get(`${hash}_0`) as { count: number };
+    const metadataCount = store.db.prepare(`SELECT COUNT(*) AS count FROM content_vectors WHERE hash = ? AND seq = 0`).get(hash) as { count: number };
+    expect(vectorCount.count).toBe(1);
+    expect(metadataCount.count).toBe(1);
+
+    await cleanupTestDb(store);
+  });
 });

 // =============================================================================
@ -2263,6 +2688,47 @@ describe("Vector Table", () => {
 // =============================================================================

 describe("Integration", () => {
+  test("reindexCollection soft-deletes removed files and preserves inactive content (#585)", async () => {
+    const store = await createTestStore();
+    const collectionDir = await mkdtemp(join(testDir, "orphan-regression-"));
+    const collectionName = "orphan-regression";
+
+    try {
+      for (let i = 1; i <= 5; i++) {
+        await writeFile(join(collectionDir, `doc-${i}.md`), `# Doc ${i}\n\nUnique body ${i}`);
+      }
+
+      await createTestCollection({ pwd: collectionDir, glob: "**/*.md", name: collectionName });
+
+      const initial = await reindexCollection(store, collectionDir, "**/*.md", collectionName);
+      expect(initial.indexed).toBe(5);
+      expect(initial.removed).toBe(0);
+
+      await rm(join(collectionDir, "doc-3.md"));
+      await rm(join(collectionDir, "doc-4.md"));
+      await rm(join(collectionDir, "doc-5.md"));
+
+      const afterDelete = await reindexCollection(store, collectionDir, "**/*.md", collectionName);
+      expect(afterDelete.removed).toBe(3);
+
+      const counts = store.db.prepare(`
+        SELECT
+          SUM(CASE WHEN active = 1 THEN 1 ELSE 0 END) AS active,
+          SUM(CASE WHEN active = 0 THEN 1 ELSE 0 END) AS inactive,
+          COUNT(*) AS total
+        FROM documents
+        WHERE collection = ?
+      `).get(collectionName) as { active: number; inactive: number; total: number };
+      const contentCount = store.db.prepare(`SELECT COUNT(*) AS count FROM content`).get() as { count: number };
+
+      expect(counts).toEqual({ active: 2, inactive: 3, total: 5 });
+      expect(contentCount.count).toBe(5);
+    } finally {
+      await rm(collectionDir, { recursive: true, force: true });
+      await cleanupTestDb(store);
+    }
+  });
+
  test("full document lifecycle: create, search, retrieve", async () => {
    const store = await createTestStore();
    const collectionName = await createTestCollection({ pwd: "/test/notes", glob: "**/*.md" });
@ -2802,6 +3268,219 @@ describe("Embedding batching", () => {
    }
  });

+  test("generateEmbeddings uses the active llm embed model when no explicit model is passed", async () => {
+    const store = await createTestStore();
+    const db = store.db;
+    const fakeLlm = createFakeEmbedLlm();
+    const model = "hf:env/embed-model.gguf";
+
+    setDefaultLlamaCpp(createFakeTokenizer() as any);
+    store.llm = { ...fakeLlm, embedModelName: model } as any;
+
+    try {
+      await insertTestDocument(db, "docs", { name: "one", body: "# One\n\nAlpha" });
+
+      const result = await generateEmbeddings(store);
+
+      expect(result.chunksEmbedded).toBe(1);
+      expect(fakeLlm.embedCalls[0]?.options?.model).toBe(model);
+      expect(fakeLlm.embedBatchModelCalls).toEqual([{ model }]);
+      expect(db.prepare(`SELECT DISTINCT model FROM content_vectors`).all()).toEqual([{ model }]);
+    } finally {
+      setDefaultLlamaCpp(null);
+      await cleanupTestDb(store);
+    }
+  });
+
+  test("generateEmbeddings does not mark a partially embedded multi-chunk document complete", async () => {
+    const store = await createTestStore();
+    const db = store.db;
+    let embedCalls = 0;
+    const fakeLlm = {
+      async embed(_text: string, _options?: { model?: string }) {
+        embedCalls++;
+        return embedCalls === 1
+          ? { embedding: [0.1, 0.2, 0.3], model: "fake-embed" }
+          : null;
+      },
+      async embedBatch(texts: string[], _options?: { model?: string }) {
+        return texts.map((_text, index) => index === 0
+          ? { embedding: [1, 2, 3], model: "fake-embed" }
+          : null
+        );
+      },
+    };
+
+    setDefaultLlamaCpp(createFakeTokenizer() as any);
+    store.llm = fakeLlm as any;
+
+    try {
+      await insertTestDocument(db, "docs", {
+        name: "long-doc",
+        body: "# Long doc\n\n" + "partial embedding regression ".repeat(260),
+      });
+
+      const result = await generateEmbeddings(store);
+
+      expect(result.errors).toBeGreaterThan(0);
+      expect(result.failures?.[0]?.attempts).toBe(3);
+      expect(db.prepare(`SELECT COUNT(*) as count FROM content_vectors`).get()).toEqual({ count: 0 });
+      expect(db.prepare(`SELECT COUNT(*) as count FROM vectors_vec`).get()).toEqual({ count: 0 });
+      expect(store.getHashesNeedingEmbedding()).toBe(1);
+      expect(store.getStatus().needsEmbedding).toBe(1);
+    } finally {
+      setDefaultLlamaCpp(null);
+      await cleanupTestDb(store);
+    }
+  });
+
+  test("generateEmbeddings clears chunk errors after successful retry", async () => {
+    const store = await createTestStore();
+    const db = store.db;
+    const fakeLlm = {
+      async embed(_text: string, _options?: { model?: string }) {
+        return { embedding: [0.1, 0.2, 0.3], model: "fake-embed" };
+      },
+      async embedBatch(texts: string[], _options?: { model?: string }) {
+        return texts.map((_text, index) => index === 0
+          ? { embedding: [1, 2, 3], model: "fake-embed" }
+          : null
+        );
+      },
+    };
+
+    setDefaultLlamaCpp(createFakeTokenizer() as any);
+    store.llm = fakeLlm as any;
+
+    try {
+      await insertTestDocument(db, "docs", {
+        name: "retry-doc",
+        body: "# Retry doc\n\n" + "transient embedding failure ".repeat(260),
+      });
+
+      const result = await generateEmbeddings(store);
+
+      expect(result.errors).toBe(0);
+      expect(result.failures).toEqual([]);
+      expect(db.prepare(`SELECT COUNT(*) as count FROM content_vectors`).get()).toEqual({ count: result.chunksEmbedded });
+      expect(store.getHashesNeedingEmbedding()).toBe(0);
+    } finally {
+      setDefaultLlamaCpp(null);
+      await cleanupTestDb(store);
+    }
+  });
+
+  test("generateEmbeddings opens a long-lived LLM session for embed runs", async () => {
+    const store = await createTestStore();
+    const fakeLlm = createFakeEmbedLlm();
+    const sessionSpy = vi.spyOn(llmModule, "withLLMSessionForLlm");
+
+    setDefaultLlamaCpp(createFakeTokenizer() as any);
+    store.llm = fakeLlm as any;
+
+    try {
+      await insertTestDocument(store.db, "docs", { name: "one", body: "# One\n\nAlpha" });
+
+      await generateEmbeddings(store);
+
+      expect(sessionSpy).toHaveBeenCalledWith(
+        fakeLlm,
+        expect.any(Function),
+        expect.objectContaining({ maxDuration: 30 * 60 * 1000, name: "generateEmbeddings" }),
+      );
+    } finally {
+      sessionSpy.mockRestore();
+      setDefaultLlamaCpp(null);
+      await cleanupTestDb(store);
+    }
+  });
+
+  test("vectorSearchQuery uses the active llm embed model for vector lookups", async () => {
+    const store = await createTestStore();
+    const model = "hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf";
+    const searchVecSpy = vi.fn(async () => [] as SearchResult[]) as any;
+
+    store.db.exec(`CREATE TABLE vectors_vec (hash_seq TEXT PRIMARY KEY, embedding BLOB)`);
+    store.llm = { embedModelName: model } as any;
+    store.searchVec = searchVecSpy as any;
+    store.expandQuery = vi.fn(async () => []) as any;
+
+    try {
+      await vectorSearchQuery(store, "custom query", { limit: 7, minScore: 0 });
+
+      expect(searchVecSpy).toHaveBeenCalledTimes(1);
+      expect(searchVecSpy.mock.calls[0]?.[0]).toBe("custom query");
+      expect(searchVecSpy.mock.calls[0]?.[1]).toBe(model);
+      expect(searchVecSpy.mock.calls[0]?.[2]).toBe(7);
+    } finally {
+      await cleanupTestDb(store);
+    }
+  });
+
+  test("hybridQuery uses the active llm embed model for precomputed vector lookups", async () => {
+    const store = await createTestStore();
+    const model = "hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf";
+    const embedBatchSpy = vi.fn(async (texts: string[]) => texts.map(() => ({
+      embedding: [1, 2, 3],
+      model,
+    })));
+    const searchVecSpy = vi.fn(async () => [] as SearchResult[]) as any;
+
+    store.db.exec(`CREATE TABLE vectors_vec (hash_seq TEXT PRIMARY KEY, embedding BLOB)`);
+    store.llm = {
+      embedModelName: model,
+      embedBatch: embedBatchSpy,
+    } as any;
+    store.searchVec = searchVecSpy as any;
+    store.searchFTS = vi.fn(() => []) as any;
+    store.expandQuery = vi.fn(async () => []) as any;
+
+    try {
+      await hybridQuery(store, "hybrid query", { limit: 5, minScore: 0, skipRerank: true });
+
+      expect(embedBatchSpy).toHaveBeenCalledTimes(1);
+      expect(searchVecSpy).toHaveBeenCalledTimes(1);
+      expect(searchVecSpy.mock.calls[0]?.[0]).toBe("hybrid query");
+      expect(searchVecSpy.mock.calls[0]?.[1]).toBe(model);
+      expect(searchVecSpy.mock.calls[0]?.[5]).toEqual([1, 2, 3]);
+    } finally {
+      await cleanupTestDb(store);
+    }
+  });
+
+  test("structuredSearch uses the active llm embed model for precomputed vector lookups", async () => {
+    const store = await createTestStore();
+    const model = "hf:Qwen/Qwen3-Embedding-0.6B-GGUF/Qwen3-Embedding-0.6B-Q8_0.gguf";
+    const embedBatchSpy = vi.fn(async (texts: string[]) => texts.map(() => ({
+      embedding: [1, 2, 3],
+      model,
+    })));
+    const searchVecSpy = vi.fn(async () => [] as SearchResult[]) as any;
+
+    store.db.exec(`CREATE TABLE vectors_vec (hash_seq TEXT PRIMARY KEY, embedding BLOB)`);
+    store.llm = {
+      embedModelName: model,
+      embedBatch: embedBatchSpy,
+    } as any;
+    store.searchVec = searchVecSpy as any;
+
+    try {
+      await structuredSearch(store, [{ type: "vec", query: "structured query" }], {
+        limit: 5,
+        minScore: 0,
+        skipRerank: true,
+      });
+
+      expect(embedBatchSpy).toHaveBeenCalledTimes(1);
+      expect(searchVecSpy).toHaveBeenCalledTimes(1);
+      expect(searchVecSpy.mock.calls[0]?.[0]).toBe("structured query");
+      expect(searchVecSpy.mock.calls[0]?.[1]).toBe(model);
+      expect(searchVecSpy.mock.calls[0]?.[5]).toEqual([1, 2, 3]);
+    } finally {
+      await cleanupTestDb(store);
+    }
+  });
+
  test("generateEmbeddings rejects invalid batch limits", async () => {
    const store = await createTestStore();

--- a/test/structured-search.test.ts
+++ b/test/structured-search.test.ts
@ -361,17 +361,73 @@ describe("lex query syntax", () => {
      expect(validateSemanticQuery("what is the CAP theorem")).toBeNull();
    });

-    test("rejects negation syntax", () => {
+    test("rejects negation at start of query", () => {
+      expect(validateSemanticQuery("-redis connection pooling")).toContain("Negation");
+    });
+
+    test("rejects negation after space", () => {
      expect(validateSemanticQuery("performance -sports")).toContain("Negation");
+    });
+
+    test("rejects negated quoted phrase", () => {
      expect(validateSemanticQuery('-"exact phrase"')).toContain("Negation");
    });

+    test("rejects multiple negations", () => {
+      expect(validateSemanticQuery("error handling -java -python")).toContain("Negation");
+    });
+
+    test("rejects negation after leading whitespace", () => {
+      expect(validateSemanticQuery("  -term at start")).toContain("Negation");
+    });
+
+    test("rejects negation after tab", () => {
+      expect(validateSemanticQuery("foo\t-bar")).toContain("Negation");
+    });
+
+    test("accepts hyphenated compound words", () => {
+      expect(validateSemanticQuery("long-lived server shared across clients")).toBeNull();
+      expect(validateSemanticQuery("real-time voice processing pipeline")).toBeNull();
+      expect(validateSemanticQuery("how does the rate-limiter handle burst traffic")).toBeNull();
+      expect(validateSemanticQuery("self-hosted deployment options")).toBeNull();
+      expect(validateSemanticQuery("multi-client session architecture")).toBeNull();
+      expect(validateSemanticQuery("cross-platform compatibility")).toBeNull();
+      expect(validateSemanticQuery("non-blocking I/O model")).toBeNull();
+      expect(validateSemanticQuery("in-memory caching strategy")).toBeNull();
+      expect(validateSemanticQuery("write-ahead log for crash recovery")).toBeNull();
+      expect(validateSemanticQuery("copy-on-write semantics")).toBeNull();
+    });
+
+    test("accepts multiple hyphens in a phrase", () => {
+      expect(validateSemanticQuery("state-of-the-art embedding models")).toBeNull();
+      expect(validateSemanticQuery("end-to-end testing")).toBeNull();
+      expect(validateSemanticQuery("man-in-the-middle attack prevention")).toBeNull();
+    });
+
+    test("accepts multiple hyphenated words in one query", () => {
+      expect(validateSemanticQuery("built-in vs add-on features")).toBeNull();
+    });
+
+    test("accepts short hyphenated terms", () => {
+      expect(validateSemanticQuery("A-B testing for ML models")).toBeNull();
+      expect(validateSemanticQuery("e-commerce platform")).toBeNull();
+    });
+
+    test("accepts bare hyphen without word character", () => {
+      expect(validateSemanticQuery("-")).toBeNull();
+    });

    test("accepts hyde-style hypothetical answers", () => {
      expect(validateSemanticQuery(
        "The CAP theorem states that a distributed system cannot simultaneously provide consistency, availability, and partition tolerance."
      )).toBeNull();
    });
+
+    test("accepts hyde with hyphenated words", () => {
+      expect(validateSemanticQuery(
+        "HTTP transport runs a single long-lived daemon shared across all clients, avoiding per-session model re-loading."
+      )).toBeNull();
+    });
  });

  describe("validateLexQuery", () => {
--- a/tsconfig.build.json
+++ b/tsconfig.build.json
@ -4,7 +4,7 @@
    "noEmit": false,
    "outDir": "dist",
    "declaration": true,
-    "noImplicitAny": false
+    "noImplicitAny": true
  },
  "include": ["src/**/*.ts"],
  "exclude": ["src/**/*.test.ts", "src/test-preload.ts", "src/bench-*.ts"]