Documents previously undocumented surface area raised by onboarding feedback
and the bench discoverability report:
- README: collection filtering (-c semantics), collection show/include/exclude/
update-cmd, --intent/--no-rerank/-C/--full-path, --format <kind> (legacy
output booleans noted as aliases), vector-search/deep-search aliases, embed
memory flags, a sample --explain trace, MCP tool parameter reference, qmd
doctor/init, get :from:count + --no-line-numbers, and a Benchmarking section
for qmd bench.
- README: removed the misleading `qmd update --pull` example; --pull is parsed
but never consumed, so it points to `qmd collection update-cmd` (the real
per-collection pre-reindex mechanism) instead.
- docs/SYNTAX.md: drop the non-existent `q` MCP parameter (the query tool/REST
endpoint accept only `searches`); add a Scoping section.
- server.ts: buildInstructions now advertises the plural `collections` parameter
to match the schema (singular was silently stripped, yielding unscoped
results), and the `get` instruction documents the full file.md:from:count
range suffix instead of only file.md:100.
Refs #25, #181, #217, #372, #520, #576
On macOS /tmp is a symlink to /private/tmp. mkdtemp returns /tmp/...
but getRealPath(resolve(pwd)) in collectionAdd resolves symlinks and
stores /private/tmp/... in the DB. toVirtualPath and --full-path
resolution then fail because the test-side collectionDir path doesn't
match the DB-side path. Fix by calling realpathSync on the test
collectionDir before passing it to CLI commands and assertions.
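The test-side fix can be sketched roughly as follows. This is a minimal illustration, not the actual test helper: the name makeCanonicalTempDir is hypothetical, and only mkdtempSync/realpathSync from node:fs are assumed.

```typescript
import { mkdtempSync, realpathSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: create a scratch directory and canonicalize it so the
// test-side path matches what a symlink-resolving indexer stores in the DB
// (for example /private/tmp/... rather than /tmp/... on macOS).
export function makeCanonicalTempDir(prefix: string): string {
  const raw = mkdtempSync(join(tmpdir(), prefix));
  return realpathSync(raw);
}
```

Because the returned path is already canonical, resolving it again is a no-op, so string comparisons against DB-side paths no longer depend on the platform's symlink layout.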
Filenames with special characters (#, &, spaces, [], (), etc.) now
round-trip correctly through index → search → get → full-path.
Root cause: reindexCollection() called handelize() on the relative path
before storing it in documents.path, turning
'# Meeting - 234232 3432 __ 5.md' → 'Meeting-234232-3432-5.md'
This broke all downstream operations that needed to reconstruct the
real filesystem path from the DB record.
Changes:
- Remove handelize() from reindexCollection() in store.ts (index time)
- Remove handelize() from update command path in cli/qmd.ts
- findOrMigrateLegacyDocument now tries both the raw path and the handelized
variant so existing indexes auto-migrate on the next qmd update
- resolveVirtualPath, toVirtualPath, detectCollectionFromPath all work
correctly once the DB stores literal paths
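The lookup-then-migrate idea in the list above might look something like this sketch. It uses an in-memory Map in place of the documents table, and the names DocRow, findOrMigrate and legacySlug are hypothetical; the real handelize() rules are not reproduced here, so the slug function is passed in.

```typescript
// Hypothetical row shape; the real documents table has more columns.
type DocRow = { id: number; path: string };

// Look up a document by its literal path first. If that misses, try the
// legacy slug form and, when found, rewrite the row to the literal path.
export function findOrMigrate(
  rows: Map<string, DocRow>,
  literalPath: string,
  legacySlug: (p: string) => string, // stand-in for the old slug function
): DocRow | undefined {
  const direct = rows.get(literalPath);
  if (direct) return direct;

  const legacyKey = legacySlug(literalPath);
  const legacy = rows.get(legacyKey);
  if (!legacy) return undefined;

  // Migrate in place: same row id, new literal path.
  rows.delete(legacyKey);
  legacy.path = literalPath;
  rows.set(literalPath, legacy);
  return legacy;
}
```

Keeping the row id stable during migration is a design assumption of the sketch; it means anything keyed on the id survives the path rewrite.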
Tests (test/path-fidelity.test.ts — 10/10):
- Store level: DB contains literal paths, not handelized slugs
- toVirtualPath returns non-null for crazy-named files
- (1) search --json file field shows literal path
- (2) get --full-path resolves to a real on-disk path
- (3) get <actual-fs-path> finds the document
- (3b) subdir file with crazy name also works
- (4) ls shows literal paths
- (5) search docid can be fetched back
- Normal filenames still work (regression)
- Migration: qmd update on a handelized index rewrites paths to literal
The libggml-metal static destructor asserts on a non-empty residency-set
collection during __cxa_finalize_ranges, dumping a multi-kB GGML backtrace
after successful output (ggml-org/llama.cpp#22593, one-line fix open as
PR #22595). The assertion only trips when process.exit() skips Node's
beforeExit hook — which is exactly the hook node-llama-cpp registers to
auto-dispose its native handles.
Primary fix: finishSuccessfulCliCommand now sets process.exitCode = 0
and returns instead of calling process.exit(0). The event loop drains,
beforeExit fires, native Metal resources tear down in order, and the
process exits cleanly even without the workaround env var.
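As a rough illustration of the exit-path change, the sketch below contrasts setting process.exitCode with calling process.exit(). The function names are hypothetical and the dispose callback stands in for whatever native cleanup the binding registers.

```typescript
// Hypothetical: register cleanup that should run once the event loop drains.
// Node emits 'beforeExit' only when the process exits naturally, not when
// process.exit() is called explicitly.
export function registerCleanup(dispose: () => void): void {
  process.once("beforeExit", dispose);
}

// Hypothetical success path: record the status and return. The process then
// exits on its own after pending work finishes, so 'beforeExit' still fires.
export function finishWithoutForcedExit(): void {
  process.exitCode = 0;
}
```

The trade-off is that any stray timer or open handle now keeps the process alive, so this approach assumes the command has no lingering work once it returns.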
Defense-in-depth retained: bin/qmd and scripts/test-all.mjs still export
GGML_METAL_NO_RESIDENCY=1 on darwin for error paths and tests that
terminate via process.exit(). Opt back in with QMD_METAL_KEEP_RESIDENCY=1.
Also: correct upstream issue refs (was #17869 → now #22593/#22595).
Add scripts/repro-metal-rsets-crash.mjs minimal reproduction.
The libggml-metal static device destructor asserts on a non-empty
residency set during libc `exit()` → `__cxa_finalize_ranges`
(ggml-org/llama.cpp#17869). The residency set's 180 s keep_alive timer
hasn't expired by exit, so `GGML_ASSERT([rsets->data count] == 0)`
fails and `ggml_abort` dumps a multi-kB backtrace to stderr after the
user-visible output. Every llama-using CLI command (`query`,
`vsearch`, `embed`) was affected, plus the `bun test` runner.
No JS-side dispose path can prevent it: the static destructor runs
after every JS-reachable cleanup, and Node's `reallyExit` calls libc
`exit()` not `_exit()` (verified in node/src/api/environment.cc),
so it does NOT skip C++ static destructors as we'd assumed.
The actual fix is to disable residency sets via
`GGML_METAL_NO_RESIDENCY=1` before the native binding loads. For
QMD's short-lived CLI workflow there's no measurable cost
(benchmarked: identical wall time with and without on M3 Pro).
Three propagation points are needed:
- `bin/qmd` exports the env var before spawning node/bun. This
covers all production CLI invocations.
- `src/test-preload.ts` mirrors the launcher for `bun test` runs.
Bun does NOT sync `process.env` mutations to libc `setenv()`
(verified empirically — Node does, via uv_os_setenv), so on Bun we
reach for `bun:ffi` to call `setenv()` directly. vitest forks
per-test-file so its parent never loads the binding.
- `qmd doctor` reports the mitigation state via the new
`isDarwinMetalMitigationActive()` predicate so users can verify it
in their environment.
Opt back in with `QMD_METAL_KEEP_RESIDENCY=1` (long-lived qmd
processes, MCP daemon hot reload, upstream fix triage). The old
`QMD_DISABLE_DARWIN_QUERY_JSON_SAFE_EXIT` is removed — its per-command
bypass mechanism didn't actually work on Node (it called
`process.reallyExit` which goes through libc exit) and is fully
replaced by the launcher env var.
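A predicate in the spirit of `isDarwinMetalMitigationActive()` could be sketched as below. The exact logic of the real function is not shown in this message, so the rules here (darwin only, opt-out wins, env var must equal "1") are assumptions, and the function name is deliberately different.

```typescript
// Hypothetical predicate: is the residency-set mitigation in effect?
// platform and env are parameters so the check can be exercised off-darwin.
export function mitigationActiveSketch(
  platform: string,
  env: Record<string, string | undefined>,
): boolean {
  if (platform !== "darwin") return false; // assumption: only relevant on macOS
  if (env.QMD_METAL_KEEP_RESIDENCY === "1") return false; // explicit opt back in
  return env.GGML_METAL_NO_RESIDENCY === "1";
}
```

A caller would typically pass `process.platform` and `process.env`.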
Removed the old broken `installDarwinExitGuard()` mechanism from
LlamaCpp; kept the function name as a no-op shim for back-compat.
Source-mode runner selection now mirrors the dist-mode 'npm priority' rule:
if both package-lock.json and bun.lock are present in the package root,
use Node + tsx instead of Bun. pnpm/npm installs ship Node-ABI native
modules (better-sqlite3, sqlite-vec), and routing through Bun produces
ABI mismatches.
This also fixes pnpm-global installs, which copy the entire working tree
(including .git and bun.lock) into <prefix>/node_modules/@tobilu/qmd/.
The old logic saw .git + bun.lock + bun-on-PATH and routed to Bun
against the Node-installed native modules.
Adds a regression test covering the both-lockfiles source-checkout case.
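The selection rule might be sketched as follows. Only the both-lockfiles case is taken from this message; the fallback branches, the Runner type, and the function name are assumptions for illustration.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

type Runner = "node-tsx" | "bun";

// Hypothetical sketch of source-mode runner selection.
// `exists` is injectable so the rule can be tested without touching disk.
export function pickSourceRunner(
  root: string,
  bunOnPath: boolean,
  exists: (p: string) => boolean = existsSync,
): Runner {
  const hasNpmLock = exists(join(root, "package-lock.json"));
  const hasBunLock = exists(join(root, "bun.lock"));

  // 'npm priority': both lockfiles present means Node-ABI native modules.
  if (hasNpmLock && hasBunLock) return "node-tsx";
  // Assumption: a Bun-only checkout with bun available runs under Bun.
  if (hasBunLock && bunOnPath) return "bun";
  return "node-tsx";
}
```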
--full-path now ./-prefixes any path that resolves under $PWD, both for
search/query results and for get/multi-get headers. This makes the
output unambiguously a filesystem path — a bare 'notes/foo.md' could be
misread as a collection-relative qmd:// fragment, but './notes/foo.md'
cannot. Absolute realpaths (when the file is outside $PWD) are
unchanged. Extracted as renderFullPath() and reused across the three
call sites so the policy stays consistent.
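The ./-prefix policy can be illustrated with a small sketch. This is not the actual renderFullPath() body; the function name and signature below are hypothetical, and the input is assumed to be an already-resolved absolute path.

```typescript
import { isAbsolute, relative, sep } from "node:path";

// Hypothetical sketch: render an absolute path as ./relative when it lives
// under cwd, otherwise return it unchanged.
export function renderFullPathSketch(absPath: string, cwd: string): string {
  const rel = relative(cwd, absPath);
  const outside =
    rel === "" ||                 // the path is cwd itself
    rel === ".." ||
    rel.startsWith(".." + sep) || // climbs out of cwd
    isAbsolute(rel);              // different root or drive
  return outside ? absPath : "." + sep + rel;
}
```

Checking for a `..` segment prefix rather than a bare `..` string prefix avoids misclassifying a file whose name merely starts with two dots.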
New --format <kind> flag selects output format for search/query and
multi-get (cli|json|csv|md|xml|files). The legacy boolean aliases
(--json/--csv/--md/--xml/--files) still work for back-compat but are
removed from --help; the skill is updated to use --format.
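One way the flag resolution could work is sketched below. The Flags shape, the function name, and the rule that --format takes precedence over a legacy boolean are assumptions; only the set of format names comes from this message.

```typescript
const FORMATS = ["cli", "json", "csv", "md", "xml", "files"] as const;
type Format = (typeof FORMATS)[number];

// Hypothetical parsed-flags shape: --format plus the legacy booleans.
interface Flags {
  format?: string;
  json?: boolean;
  csv?: boolean;
  md?: boolean;
  xml?: boolean;
  files?: boolean;
}

export function resolveFormat(flags: Flags): Format {
  if (flags.format !== undefined) {
    const match = FORMATS.find((f) => f === flags.format);
    if (!match) throw new Error(`unknown --format: ${flags.format}`);
    return match; // assumption: explicit --format wins over legacy booleans
  }
  for (const f of FORMATS) {
    if (f !== "cli" && flags[f]) return f; // legacy alias such as --json
  }
  return "cli";
}
```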
ANSI colors and OSC 8 hyperlinks are already gated on process.stdout
.isTTY, so piped/agentic invocations get clean plain-text output with
no escape sequences. Verified via od -c on a piped 'qmd search' run.
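The TTY gating described above amounts to something like the following sketch. The helper names are hypothetical; the escape sequences are the standard SGR bold and OSC 8 hyperlink forms.

```typescript
// Hypothetical helpers: emit escape sequences only when writing to a TTY.
export function bold(text: string, isTTY: boolean): string {
  return isTTY ? `\x1b[1m${text}\x1b[0m` : text;
}

export function hyperlink(label: string, url: string, isTTY: boolean): string {
  // OSC 8: ESC ] 8 ; ; URL ST label ESC ] 8 ; ; ST, with ST written as ESC \
  return isTTY ? `\x1b]8;;${url}\x1b\\${label}\x1b]8;;\x1b\\` : label;
}
```

A caller would pass `Boolean(process.stdout.isTTY)`, since isTTY is undefined rather than false when stdout is a pipe.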
`qmd://` URIs remain the default identifier in search and query output
(across all formats: cli, --json, --md, --csv, --xml, --files). The
default CLI view now consistently prints the full qmd:// URI as the
visible label so it can be piped straight into `qmd get`, and --md
output gains a **file:** line for the same reason.
--full-path (already on get/multi-get) now also applies to search and
query: the per-result label becomes the file's on-disk path — relative
to $PWD when the file is in a subfolder of the current directory,
absolute realpath otherwise — and the per-result #docid is dropped
because the path is the identifier. Falls back to qmd:// when the file
is no longer resolvable on disk.
Also locks in @@ -line,count @@ header arithmetic with a regression test
that mirrors the user-reported 77-line / '1 before, 72 after' scenario.
- Make structured `qmd query` with intent:/lex:/vec:/hyde: the default
search mode, and emphasize that the caller authors the expansion
rather than leaning on the built-in query-expansion model.
- Tell the caller to cite the #docid and exact line numbers now
printed by get/multi-get, and to slice files with the :from:count
suffix or --from/-l instead of piping through sed/head/tail.
- Document --full-path for handing the on-disk path to editor tools.
- Bump skill version to 2.2.0 and record the behavior changes under
## [Unreleased] in CHANGELOG.md.
- Update the package smoke test that pinned the old 'structured
queries' wording to match the new, more specific intro phrasing.
Redesign the get/multi-get retrieval surface so callers can cite what
they retrieved and request follow-up slices without piping through sed:
- Output is line-numbered by default; opt out with --no-line-numbers.
- Header always identifies the document by qmd:// path + #docid. The
MCP get/multi_get tools default lineNumbers=true to match.
- qmd get and the MCP get tool accept a :from:count suffix on a path
or docid (e.g. '#abc123:120:40' reads 40 lines from line 120).
Explicit --from/-l flags still override the suffix.
- qmd multi-get now includes #docid in every output format (--md,
--json, --csv, --xml, --files, default CLI), matching qmd search.
- New --full-path flag swaps the qmd:// + docid header for the
document's on-disk path (handy for piping into Read/Edit/editors);
falls back to the canonical header when the file no longer exists.
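The :from:count suffix and its interaction with explicit flags could be parsed along these lines. This is a sketch under assumptions: the names are hypothetical, line numbers are treated as 1-based, and a trailing run of one or two numeric segments is always read as a range (a path that itself ends in :digits would be ambiguous).

```typescript
interface RangeRef {
  target: string; // path, qmd:// URI, or #docid without the suffix
  from?: number;  // first line to read (assumed 1-based)
  count?: number; // number of lines to read
}

// Split 'target:from' or 'target:from:count' into its parts.
export function parseRangeSuffix(ref: string): RangeRef {
  const m = /^(.*?):(\d+)(?::(\d+))?$/.exec(ref);
  if (!m) return { target: ref };
  const out: RangeRef = { target: m[1], from: Number(m[2]) };
  if (m[3] !== undefined) out.count = Number(m[3]);
  return out;
}

// Explicit flag values, when given, override whatever the suffix said.
export function applyOverrides(ref: RangeRef, from?: number, count?: number): RangeRef {
  return { ...ref, from: from ?? ref.from, count: count ?? ref.count };
}
```

For example, `parseRangeSuffix("#abc123:120:40")` yields target `#abc123`, from 120, count 40, matching the example in the list above.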