The MCP status tool shows collection paths but not names,
making it impossible for agents to discover valid collection
filter values. The CLI 'qmd status' already shows names.
Add col.name prefix to each collection line in the status
tool response.
Resolve conflicts: combine AST chunking args (filepath, chunkStrategy)
with abort signal parameter from #458.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflict: use CTE approach from #455 with updated BM25
weights (1.5, 4.0, 1.0) from #462.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR #475 changed handelize() to preserve original case and dots,
but the tests still expected lowercase output. Update assertions
to match the new behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The MCP query tool always ran LLM reranking, even for lex-only queries.
On CPU-only infrastructure (e.g. Railway), the reranker adds 60-120s
per query. The SDK and CLI already support skipping reranking, but the
MCP server did not expose this option.
Add a `rerank` boolean parameter (default: true) to the MCP query
tool's input schema, forwarded to store.search() as the existing
`rerank` option.
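A minimal sketch of the forwarding described above. The `Store`, `QueryArgs`, and `handleQuery` shapes are hypothetical stand-ins (the real handler is likely async and validates input through the tool schema); only the `rerank` option name and its default of true come from this change.

```typescript
// Hypothetical shapes; the real MCP handler and store API may differ.
interface SearchOptions {
  rerank: boolean;
}

interface Store {
  search(query: string, options: SearchOptions): string[];
}

interface QueryArgs {
  query: string;
  rerank?: boolean;
}

function handleQuery(store: Store, args: QueryArgs): string[] {
  // An omitted `rerank` keeps the previous behavior: reranking stays on.
  const rerank = args.rerank ?? true;
  return store.search(args.query, { rerank });
}
```

Callers on CPU-only hosts would pass `rerank: false` to skip the slow step; everyone else sees no change.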
Fixes #477
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The handelize() regex replaced all non-letter/non-number chars with
dashes, including dots in the filename stem. This mangled session
filenames like "topic-1773595309.753009.md" to "topic-1773595309-753009.md",
breaking memory_get path resolution (file not found on disk).
Fix: add dot to the preserved character class in the filename regex.
After deploying, run qmd-reindex.sh to rebuild indexes with correct paths.
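As an illustration of the character-class change, here is a hedged sketch. `sanitizeFilename` is a hypothetical stand-in, not the real handelize(), which does more than this; the only point shown is that adding the dot to the preserved class keeps the timestamp fraction intact.

```typescript
// Hypothetical stand-in for the filename step of handelize().
function sanitizeFilename(name: string): string {
  return name
    // Keep Unicode letters, numbers, and dots; any other run becomes one dash.
    .replace(/[^\p{L}\p{N}.]+/gu, "-")
    // Trim dashes left over at either end.
    .replace(/^-+|-+$/g, "");
}
```

With the dot preserved, "topic-1773595309.753009.md" passes through unchanged, so the indexed path matches the file on disk.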
Hyphenated terms like multi-agent, DEC-0054, and gpt-4 were being stripped
of hyphens and concatenated (e.g., "multiagent"), so they missed matches.
They are now split into FTS5 phrase queries ("multi agent") so the porter
tokenizer matches them correctly.
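The rewrite can be sketched as follows. `toFtsTerm` is a hypothetical helper name, and the sketch assumes the term contains no double quotes; it only shows the hyphen-to-phrase step, not the full query builder.

```typescript
// Hypothetical helper: turn one user term into an FTS5 query fragment.
function toFtsTerm(term: string): string {
  const parts = term.split("-").filter((p) => p.length > 0);
  if (parts.length <= 1) {
    // No hyphen (or only stray hyphens): pass the bare term through.
    return parts[0] ?? "";
  }
  // A double-quoted string is an FTS5 phrase: the tokens must be adjacent.
  return `"${parts.join(" ")}"`;
}
```

For example, `toFtsTerm("multi-agent")` yields `"multi agent"` (including the quotes), while `toFtsTerm("plain")` is returned as is.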
The bm25() call only had 2 weights for 3 columns (filepath, title, body),
giving body an implicit weight of 0. Add proper weights: filepath=1.5,
title=4.0, body=1.0 so title matches are boosted and body content is scored.
After the session's max duration timer fires (30 min), the embedding loop
continued iterating over all remaining chunks. Each embed call threw
SessionReleasedError, was caught, incremented errors, and the loop moved
to the next chunk — burning 100% CPU for days with zero useful output.
Three targeted fixes:
1. Check session.isValid before each batch iteration in the embedding loop,
breaking early when the session has been aborted.
2. Pass the session's AbortSignal to chunkDocumentByTokens so tokenization
also respects session expiry instead of running unbounded.
3. Add an error-rate circuit breaker: if >80% of processed chunks fail,
abort early rather than grinding through the remaining work.
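Fixes 1 and 3 can be sketched together as below. The `Session` shape, `embedAll`, the batch size, and the `minSamples` warm-up are assumptions for illustration; fix 2 (passing the AbortSignal into tokenization) is not shown.

```typescript
// Hypothetical shapes; the real session and embed APIs may differ.
interface Session {
  isValid: boolean;
}

interface EmbedResult {
  embedded: number;
  errors: number;
  aborted: boolean;
}

async function embedAll(
  chunks: string[],
  session: Session,
  embed: (text: string) => Promise<void>,
  batchSize = 8,
  maxErrorRate = 0.8,
  minSamples = 10, // assumed warm-up so one early failure does not trip the breaker
): Promise<EmbedResult> {
  let embedded = 0;
  let errors = 0;

  for (let i = 0; i < chunks.length; i += batchSize) {
    // Fix 1: stop as soon as the session has expired or been aborted.
    if (!session.isValid) {
      return { embedded, errors, aborted: true };
    }

    for (const chunk of chunks.slice(i, i + batchSize)) {
      try {
        await embed(chunk);
        embedded++;
      } catch {
        errors++;
      }
    }

    // Fix 3: circuit breaker on the running error rate.
    const processed = embedded + errors;
    if (processed >= minSamples && errors / processed > maxErrorRate) {
      return { embedded, errors, aborted: true };
    }
  }

  return { embedded, errors, aborted: false };
}
```

Either guard alone would have stopped the runaway loop; the breaker also covers failure modes that do not invalidate the session.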
Fixes #440
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sqlite-vec's vec0 virtual tables silently ignore the OR REPLACE conflict
clause. When a crash interrupts embedding mid-way, chunks that were
inserted into vectors_vec but not content_vectors get re-selected by
getHashesForEmbedding, causing a UNIQUE constraint error on re-embed.
Two changes:
1. Insert content_vectors first so getHashesForEmbedding won't re-select
the hash if a crash occurs between the two inserts.
2. Use DELETE + INSERT for vectors_vec instead of INSERT OR REPLACE.
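The two changes amount to the statement order sketched here. The `Db` interface, `storeEmbedding`, and all column names (`hash`, `seq`, `model`, `hash_seq`, `embedding`) are assumptions for illustration; only the table names and the ordering come from this change.

```typescript
// Minimal database surface for this sketch; hypothetical, not a real driver API.
interface Db {
  run(sql: string, params: unknown[]): void;
}

function storeEmbedding(
  db: Db,
  hash: string,
  seq: number,
  embedding: Float32Array,
  model: string,
): void {
  const hashSeq = `${hash}_${seq}`;

  // 1. Record the chunk in content_vectors first, so a crash after this point
  //    leaves the hash marked as embedded and it is not re-selected.
  db.run(
    "INSERT OR REPLACE INTO content_vectors (hash, seq, model) VALUES (?, ?, ?)",
    [hash, seq, model],
  );

  // 2. vec0 ignores OR REPLACE, so remove any stale row explicitly, then insert.
  db.run("DELETE FROM vectors_vec WHERE hash_seq = ?", [hashSeq]);
  db.run("INSERT INTO vectors_vec (hash_seq, embedding) VALUES (?, ?)", [
    hashSeq,
    embedding,
  ]);
}
```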
Fixes #445
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MODEL_CACHE_DIR was hardcoded to ~/.cache/qmd/models/, ignoring the
XDG_CACHE_HOME environment variable. This was inconsistent with the rest
of the codebase (store.ts, cli/qmd.ts) which already respects XDG paths.
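A sketch of the lookup, assuming the usual XDG rule (use XDG_CACHE_HOME when set and non-empty, otherwise fall back to ~/.cache). `resolveModelCacheDir` is a hypothetical name, and the environment and home directory are passed in so the function stays pure.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper; the real code may build the path differently.
function resolveModelCacheDir(env: Env, homeDir: string): string {
  const xdg = env["XDG_CACHE_HOME"];
  // Unset or empty XDG_CACHE_HOME falls back to ~/.cache.
  const base = xdg !== undefined && xdg.length > 0 ? xdg : `${homeDir}/.cache`;
  return `${base}/qmd/models`;
}
```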
Fixes #425
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When searchFTS combines FTS5 MATCH with a collection filter (d.collection = ?)
in the same WHERE clause, SQLite's query planner abandons the FTS5 index and
falls back to a full scan. This turns an 8ms query into a 17+ second query on
large collections (16K+ documents).
The fix wraps the FTS5 query in a CTE so it runs first with proper index usage,
then filters by collection on the materialized results.
Benchmarks on a 16,258-document collection:
  Before: qmd search "knowctl" -c <collection> → 19.8s
  After:  qmd search "knowctl" -c <collection> → 0.4s
The CTE fetches limit*10 candidates from the FTS index to ensure enough results
survive collection filtering. Without a collection filter, the query plan was
already optimal, so no CTE overhead is added in that case.
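The query shape can be sketched like this. The table and column names (`documents_fts`, `documents`, `id`, `path`, `collection`) and `buildFtsQuery` itself are assumptions for illustration; the real searchFTS SQL selects more columns and may weight bm25() differently.

```typescript
// Table and column names here are assumptions for illustration only.
interface SearchQuery {
  sql: string;
  params: (string | number)[];
}

function buildFtsQuery(match: string, limit: number, collection?: string): SearchQuery {
  if (collection === undefined) {
    // No collection filter: the plain query already uses the FTS5 index.
    return {
      sql: [
        "SELECT d.id, d.path, bm25(documents_fts) AS score",
        "FROM documents_fts",
        "JOIN documents d ON d.id = documents_fts.rowid",
        "WHERE documents_fts MATCH ?",
        "ORDER BY score",
        "LIMIT ?",
      ].join("\n"),
      params: [match, limit],
    };
  }

  // With a filter: run MATCH alone inside the CTE, over-fetch candidates,
  // then apply the collection filter to that small result set.
  return {
    sql: [
      "WITH fts AS (",
      "  SELECT rowid, bm25(documents_fts) AS score",
      "  FROM documents_fts",
      "  WHERE documents_fts MATCH ?",
      "  ORDER BY score",
      "  LIMIT ?",
      ")",
      "SELECT d.id, d.path, fts.score",
      "FROM fts",
      "JOIN documents d ON d.id = fts.rowid",
      "WHERE d.collection = ?",
      "ORDER BY fts.score",
      "LIMIT ?",
    ].join("\n"),
    params: [match, limit * 10, collection, limit],
  };
}
```

The over-fetch factor is a trade-off: a collection that holds only a small share of the matches can still return fewer than `limit` rows.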
Add opt-in AST-aware chunk boundary detection for code files using
web-tree-sitter. When enabled with `--chunk-strategy auto`, code files
(.ts, .tsx, .js, .jsx, .py, .go, .rs) are chunked at function, class,
and import boundaries instead of arbitrary text positions. Default
behavior (`regex`) is unchanged — no surprises on upgrade.
In testing on QMD's own codebase, AST mode split 42% fewer function
bodies across chunk boundaries compared to regex-only chunking.
Usage:
  qmd embed --chunk-strategy auto
  qmd query "search terms" --chunk-strategy auto
What's included:
- Language detection from file extension with support for TypeScript,
JavaScript (including arrow functions and function expressions),
Python, Go, and Rust
- Per-language tree-sitter queries with scored break points aligned to
the existing markdown scale (class=100, function=90, type=80, import=60)
- AST break points merged with regex break points — highest score wins
at each position, so embedded markdown (comments, docstrings) still
benefits from regex patterns
- Refactored chunking core: chunkDocumentWithBreakPoints() extracted,
mergeBreakPoints() added, async chunkDocumentAsync() wrapper for AST
- ChunkStrategy type ("auto" | "regex") threaded through
generateEmbeddings(), hybridQuery(), structuredSearch(), CLI, and SDK
- getASTStatus() health check wired into `qmd status`
- Parse failures log a warning and fall back to regex — never crash
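The "highest score wins at each position" merge can be sketched as below. The `BreakPoint` fields and this signature are assumptions; the real mergeBreakPoints() may carry more metadata per break point.

```typescript
// Assumed shape of a break point; the real type may have more fields.
interface BreakPoint {
  pos: number;
  score: number;
  type: string;
}

function mergeBreakPoints(regexPoints: BreakPoint[], astPoints: BreakPoint[]): BreakPoint[] {
  const best = new Map<number, BreakPoint>();
  for (const bp of regexPoints.concat(astPoints)) {
    const current = best.get(bp.pos);
    // Keep only the highest-scoring break point at each position.
    if (current === undefined || bp.score > current.score) {
      best.set(bp.pos, bp);
    }
  }
  // Return positions in document order for the chunker.
  return Array.from(best.values()).sort((a, b) => a.pos - b.pos);
}
```

Because both sources share one score scale, a function boundary (90) at a given offset overrides a weaker regex match there, while regex-only positions pass through untouched.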
Hardening:
- Grammar packages are optionalDependencies with pinned versions to
prevent ABI breaks from semver drift
- web-tree-sitter is a direct dependency (pinned)
- Errors are logged (not silently swallowed) for debuggability
- Tested on both Node.js and Bun (Bun is actually faster)
Testing:
- 26 unit tests (test/ast.test.ts) — all 4 languages, error handling
- 7 integration tests (test/store.test.ts) — merge, equivalence, bypass
- Standalone test-ast-chunking.mjs with 63 synthetic tests and a
real-collection performance scanner (npx tsx test-ast-chunking.mjs ~/code)
- Validated end-to-end with qmd embed + qmd query on QMD's own codebase
- Zero markdown regressions across all test paths
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The default context size of 2048 tokens was too small for longer documents
(session transcripts, CJK text, large markdown files). After truncation, the
Qwen3 reranker template
adds more overhead than the original 200-token estimate, causing node-llama-cpp
to throw 'input lengths exceed context size'.
Fixes: tobi/qmd#91, tobi/qmd#290, tobi/qmd#291, tobi/qmd#314
- Add missing subprocess import (NameError on any quantize path)
- Replace broken optimum-cli quantize calls with direct onnxruntime:
Q4 uses MatMulNBitsQuantizer, Q8 uses quantize_dynamic
- Add onnxconverter-common to deps for FP16 (was silently swallowed)
- Make FP16 fail loudly on missing dep instead of silently uploading FP32
- README and transformers_js_config now reflect actual quantize_type
instead of always hardcoding Q4
- Remove dead _convert_fp16_external function
- Use no_post_process=True for ONNX export to avoid protobuf serialize error
- Add --validate and --validate-only flags for inference verification
- Fix position_ids in validation feed (required by Qwen3 ONNX export)
- Use optimum-cli for quantization to handle external data format
- Fix optimum dependency to optimum[onnxruntime]
Tested: export + validation passes on CPU, KV cache present (56 tensors).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add convert_onnx.py that mirrors convert_gguf.py's structure:
- Loads base Qwen3 model, merges SFT + GRPO adapters
- Exports to ONNX via Optimum (text-generation-with-past task)
- Supports Q4 (MatMulNBits), Q8, FP16, and FP32 output
- Uploads to separate HF repo (e.g. tobil/qmd-query-expansion-1.7B-ONNX)
- Writes Transformers.js compatibility config
- Includes model card with usage example
Usage:
  uv run convert_onnx.py --size 1.7B
  uv run convert_onnx.py --size 1.7B --quantize q4 --no-upload
Also adds `just convert-onnx` and `just convert-gguf` tasks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bun.lock still resolved better-sqlite3 to 11.x after package.json was
bumped to ^12.4.5 in v2.0.0. This breaks sandboxed builds (e.g. Nix
with bun2nix) where network access is unavailable to resolve the
mismatch.
CI and the publish workflow now use --frozen-lockfile so drift is caught
immediately. The release script also validates lockfile consistency
before tagging.
Closes #386
When a chunk exceeds the embedding model's context window (trainContextSize),
node-llama-cpp's getEmbeddingFor() triggers a native SIGABRT in GGML/Metal,
crashing the entire process.
Fix: Add truncateToContextSize() guard in embed() and embedBatch() that uses
the model's own tokenizer to check token count before calling getEmbeddingFor().
Oversized text is truncated to (trainContextSize - 4) tokens with a warning,
preserving partial embedding coverage instead of crashing.
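A sketch of the guard, under stated assumptions: the `Tokenizer` interface is a hypothetical stand-in for the model's tokenizer, the signature is illustrative, and truncation here triggers for anything above the (trainContextSize - 4) limit. The caller is expected to log the warning when `truncated` is true.

```typescript
// Hypothetical tokenizer surface; the real model API may differ.
interface Tokenizer {
  tokenize(text: string): number[];
  detokenize(tokens: number[]): string;
}

interface Truncated {
  text: string;
  truncated: boolean;
}

function truncateToContextSize(
  text: string,
  tokenizer: Tokenizer,
  trainContextSize: number,
  margin = 4,
): Truncated {
  const limit = Math.max(0, trainContextSize - margin);
  const tokens = tokenizer.tokenize(text);
  if (tokens.length <= limit) {
    return { text, truncated: false };
  }
  // Keep the leading tokens so the embedding still covers the start of the chunk.
  return { text: tokenizer.detokenize(tokens.slice(0, limit)), truncated: true };
}
```

Counting with the model's own tokenizer matters because character-based estimates are unreliable for token-dense text.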
Fixes #303
The bin/qmd wrapper checks for bun.lock to select the runtime, but since
bun.lock is committed to the repo, source builds using npm install are
incorrectly routed to Bun — causing native module ABI mismatches (#381)
and sqlite-vec crashes (#380).
Add package-lock.json as a higher-priority signal: if it exists, npm
installed the dependencies and Node should be used. Also fix
cleanupOrphanedVectors() to use the existing isSqliteVecAvailable()
guard instead of checking sqlite_master, which can report the virtual
table even when the vec0 module isn't loaded.
Fixes #381, fixes #380
Continuation of #362 (runtime detection false positives)
The caret range ^4.2.1 allows npm to resolve zod 4.3.x, which has
breaking type changes against @modelcontextprotocol/sdk. Source builds
fail with TypeScript errors. Pinning to exact 4.2.1 resolves this.
See: https://github.com/tobi/qmd/issues/379
On macOS, bun:sqlite uses Apple's system SQLite which is compiled with
SQLITE_OMIT_LOAD_EXTENSION, preventing sqlite-vec from loading. The v2.0
refactor also silently swallowed extension loading failures, losing the
actionable error messages that existed pre-2.0.
- Call Database.setCustomSQLite() on macOS to use Homebrew's SQLite
- Eagerly validate extension loading at init, not at first query
- Throw with platform-specific fix instructions in loadSqliteVec()
- Log warning in store.ts instead of silently catching
Fixes #363
On WSL, paths like /c/work/... are valid drvfs mount points, not Git
Bash drive-letter shortcuts. The existing code in isAbsolutePath() and
resolve() detected /c/ as a Windows C: path, converting drvfs paths to
C:/work/... which broke indexing entirely.
Fix: detect WSL via WSL_DISTRO_NAME or WSL_INTEROP environment variables
and skip the Git Bash /c/ -> C: branch on WSL. Native Linux path handling
continues as before.
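The guard can be sketched as follows. `isWSL` and `gitBashToWindowsPath` are hypothetical names, and the sketch models only the WSL check described above; any other platform conditions the real isAbsolutePath() and resolve() apply are omitted.

```typescript
type Env = Record<string, string | undefined>;

// WSL sets at least one of these variables in the shell environment.
function isWSL(env: Env): boolean {
  return Boolean(env["WSL_DISTRO_NAME"] || env["WSL_INTEROP"]);
}

// Hypothetical helper modeling the Git Bash /c/ -> C: conversion.
function gitBashToWindowsPath(p: string, env: Env): string {
  if (isWSL(env)) {
    // On WSL, /c/... is a real drvfs mount point: leave it alone.
    return p;
  }
  const m = /^\/([a-zA-Z])\/(.*)$/.exec(p);
  if (m === null) {
    return p;
  }
  const [, drive = "", rest = ""] = m;
  return `${drive.toUpperCase()}:/${rest}`;
}
```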
Exposes the existing skipRerank option as a --no-rerank CLI flag for
qmd query. On CPU-only machines, reranking takes 120s+ for 20 chunks -
this flag lets users get RRF-fused results without the reranking penalty.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>