When qmd update runs against an index created before case-preservation,
documents may exist under lowercase paths (e.g. "skill.md" for a file
actually named "SKILL.md"). Add findOrMigrateLegacyDocument() that:
- Falls back to a lowercase lookup when the canonical path is not found
- Renames the document path in-place via UPDATE OR IGNORE
- Manually rebuilds the FTS entry (FTS5 INSERT OR REPLACE does not
reliably update existing rows via triggers)
- Handles UNIQUE conflicts gracefully (returns null on conflict)
Embeddings are keyed by content hash, so the rename preserves all
existing vectors — no re-embedding required.
Both the CLI indexer and the library reindexer share the same helper,
eliminating the duplication that a previous review flagged.
Includes integration tests for: successful migration, already-lowercase
no-op, and UNIQUE conflict handling.
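The lookup-then-migrate flow can be sketched as follows. This is a minimal illustration that assumes in-memory maps in place of the SQLite documents table, FTS table and vector table. The `MemoryStore` class and the `renameDocumentPath` helper are hypothetical, and the guarded rename only stands in for `UPDATE OR IGNORE`.

```typescript
// Hypothetical in-memory stand-ins for the documents table, the FTS index,
// and the vector table. The real helper works against SQLite.
interface Doc {
  path: string;
  hash: string; // content hash; vectors are keyed by this, not by path
  body: string;
}

class MemoryStore {
  docs = new Map<string, Doc>(); // path -> document (path is unique)
  fts = new Map<string, string>(); // path -> indexed text
  vectors = new Map<string, number[]>(); // content hash -> embedding

  insert(doc: Doc): void {
    this.docs.set(doc.path, doc);
    this.fts.set(doc.path, doc.body);
  }
}

// Mirrors UPDATE OR IGNORE: if `to` is already taken, nothing changes and
// the caller is told the rename did not happen.
function renameDocumentPath(store: MemoryStore, from: string, to: string): boolean {
  const doc = store.docs.get(from);
  if (!doc || store.docs.has(to)) return false;
  const renamed: Doc = { ...doc, path: to };
  store.docs.delete(from);
  store.docs.set(to, renamed);
  // Rebuild the FTS entry explicitly instead of relying on triggers.
  store.fts.delete(from);
  store.fts.set(to, renamed.body);
  // store.vectors is untouched: the content hash did not change.
  return true;
}

// Look up `path`; if only a legacy lowercase copy exists, move it to `path`.
// Returns null when nothing matches or the rename hits a conflict.
function findOrMigrateLegacyDocument(store: MemoryStore, path: string): Doc | null {
  const canonical = store.docs.get(path);
  if (canonical) return canonical;

  const legacyPath = path.toLowerCase();
  if (legacyPath === path) return null; // already lowercase: nothing to migrate

  if (!renameDocumentPath(store, legacyPath, path)) return null;
  return store.docs.get(path) ?? null;
}
```

Because the vector map is keyed by `hash`, the rename never touches it. This matches the reasoning above that no re-embedding is needed.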
Cover ~25 community PRs including embedding stability fixes, BM25
field weight and hyphenation fixes, reranker context sizing, launcher
reliability, XDG compliance, and the --no-rerank flag.
Adds a benchmark harness that measures search quality across backends.
Given a fixture file with queries and expected results, it runs each
query through BM25, vector, hybrid (no rerank), and full pipeline,
then reports precision@k, recall, MRR, F1, and latency.
This is primarily a regression testing tool — users create fixtures
for their own vaults to catch quality regressions after config or
index changes. Ships with an example fixture against the eval-docs
test collection to demonstrate the format.
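The metrics named above have standard definitions, sketched below for a single query. The function names are hypothetical, exact string equality stands in for the harness's path matching, and recall is computed over the same top-k window as precision. The harness itself may define these details differently.

```typescript
// Hypothetical scoring helpers. Exact string equality stands in for the
// harness's path matching.
function hitsInTopK(results: string[], expected: string[], k: number): number {
  const wanted = new Set(expected);
  // Count each relevant document once, even if it appears twice in results.
  return new Set(results.slice(0, k).filter((r) => wanted.has(r))).size;
}

function precisionAtK(results: string[], expected: string[], k: number): number {
  return k > 0 ? hitsInTopK(results, expected, k) / k : 0;
}

function recallAtK(results: string[], expected: string[], k: number): number {
  const total = new Set(expected).size;
  return total > 0 ? hitsInTopK(results, expected, k) / total : 0;
}

// Reciprocal rank of the first relevant result (0 if none). MRR is the mean
// of this value over all queries in a fixture.
function reciprocalRank(results: string[], expected: string[]): number {
  const wanted = new Set(expected);
  const idx = results.findIndex((r) => wanted.has(r));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

function meanReciprocalRank(ranks: number[]): number {
  return ranks.length > 0 ? ranks.reduce((a, b) => a + b, 0) / ranks.length : 0;
}

// Harmonic mean of precision and recall.
function f1(precision: number, recall: number): number {
  return precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;
}
```

Take results `["a.md", "b.md", "c.md", "d.md"]`, expected `["b.md", "e.md"]` and k = 3. Precision is 1/3, recall is 1/2, the reciprocal rank is 1/2 and F1 is 0.4.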
New files:
src/bench/bench.ts — main runner
src/bench/score.ts — precision, recall, MRR, F1, path matching
src/bench/types.ts — fixture and result types
src/bench/fixtures/ — example fixture
test/bench-score.test.ts — unit tests for scoring (16 tests)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Test nix flake builds in CI
* Update outdated bun.lock file
* fix: restore toLowerCase() in handelize and update tests
* Fix flake to use proper FODs
---------
Co-authored-by: Tobias Lütke <tobi@shopify.com>
Resolve conflicts: combine AST chunking args (filepath, chunkStrategy)
with abort signal parameter from #458.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add opt-in AST-aware chunk boundary detection for code files using
web-tree-sitter. When enabled with `--chunk-strategy auto`, code files
(.ts, .tsx, .js, .jsx, .py, .go, .rs) are chunked at function, class,
and import boundaries instead of arbitrary text positions. Default
behavior (`regex`) is unchanged — no surprises on upgrade.
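The opt-in rule can be sketched as an extension lookup gated on the strategy. Only the `ChunkStrategy` union and the extension list come from the description above. `Language`, `detectLanguage` and `chooseChunker` are hypothetical names, and the real code may resolve languages differently (for example, loading a separate TSX grammar).

```typescript
type ChunkStrategy = "auto" | "regex";
// Hypothetical language identifiers for the grammars listed above.
type Language = "typescript" | "javascript" | "python" | "go" | "rust";

const EXTENSION_LANGUAGES: Record<string, Language> = {
  ".ts": "typescript",
  ".tsx": "typescript", // a real parser would load the TSX grammar here
  ".js": "javascript",
  ".jsx": "javascript",
  ".py": "python",
  ".go": "go",
  ".rs": "rust",
};

function detectLanguage(filepath: string): Language | null {
  const base = filepath.slice(filepath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a dotfile such as ".env"
  return EXTENSION_LANGUAGES[base.slice(dot).toLowerCase()] ?? null;
}

// "regex" (the default) never touches tree-sitter; "auto" uses AST chunking
// only for recognised code files and keeps regex for everything else.
function chooseChunker(filepath: string, strategy: ChunkStrategy): "ast" | "regex" {
  if (strategy === "regex") return "regex";
  return detectLanguage(filepath) !== null ? "ast" : "regex";
}
```

Markdown files resolve to `"regex"` under both strategies, which is why upgrading changes nothing for note collections.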
In testing on QMD's own codebase, AST mode split 42% fewer function
bodies across chunk boundaries compared to regex-only chunking.
Usage:
qmd embed --chunk-strategy auto
qmd query "search terms" --chunk-strategy auto
What's included:
- Language detection from file extension with support for TypeScript,
JavaScript (including arrow functions and function expressions),
Python, Go, and Rust
- Per-language tree-sitter queries with scored break points aligned to
the existing markdown scale (class=100, function=90, type=80, import=60)
- AST break points merged with regex break points — highest score wins
at each position, so embedded markdown (comments, docstrings) still
benefits from regex patterns
- Refactored chunking core: chunkDocumentWithBreakPoints() extracted,
mergeBreakPoints() added, async chunkDocumentAsync() wrapper for AST
- ChunkStrategy type ("auto" | "regex") threaded through
generateEmbeddings(), hybridQuery(), structuredSearch(), CLI, and SDK
- getASTStatus() health check wired into `qmd status`
- Parse failures log a warning and fall back to regex — never crash
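The merge rule from the list above ("highest score wins at each position") reduces to a per-position maximum. The `BreakPoint` shape and this `mergeBreakPoints` body are a sketch under assumed field names, not the extracted implementation. Ties keep the regex point because regex points are inserted first, which is a choice made for this sketch. When the AST list is empty, for example after a parse failure, only the regex points come back.

```typescript
// Hypothetical shape of a candidate split point: character offset, score on
// the shared scale (e.g. class=100, function=90), and a label for debugging.
interface BreakPoint {
  pos: number;
  score: number;
  type: string;
}

// Keep the highest-scoring break point at each position. Regex points are
// inserted first, so on a tie the regex point is kept.
function mergeBreakPoints(regexPoints: BreakPoint[], astPoints: BreakPoint[]): BreakPoint[] {
  const best = new Map<number, BreakPoint>();
  for (const bp of [...regexPoints, ...astPoints]) {
    const current = best.get(bp.pos);
    if (current === undefined || bp.score > current.score) best.set(bp.pos, bp);
  }
  // Return in document order so the chunker can scan left to right.
  return Array.from(best.values()).sort((a, b) => a.pos - b.pos);
}
```

Suppose a blank-line break (a low regex score) and a function boundary (90) fall at the same offset. The function boundary wins there, while a regex-only heading elsewhere in a docstring is still kept.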
Hardening:
- Grammar packages are optionalDependencies with pinned versions to
prevent ABI breaks from semver drift
- web-tree-sitter is a direct dependency (pinned)
- Errors are logged (not silently swallowed) for debuggability
- Tested on both Node.js and Bun (Bun ran faster in these tests)
Testing:
- 26 unit tests (test/ast.test.ts) — all 4 languages, error handling
- 7 integration tests (test/store.test.ts) — merge, equivalence, bypass
- Standalone test-ast-chunking.mjs with 63 synthetic tests and a
real-collection performance scanner (npx tsx test-ast-chunking.mjs ~/code)
- Validated end-to-end with qmd embed + qmd query on QMD's own codebase
- Zero markdown regressions across all test paths
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bun.lock still resolved better-sqlite3 to 11.x after package.json was
bumped to ^12.4.5 in v2.0.0. This breaks sandboxed builds (e.g. Nix
with bun2nix) where network access is unavailable to resolve the
mismatch.
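To make the drift concrete: a caret range `^12.4.5` admits only 12.x versions at or above 12.4.5, so no 11.x resolution can satisfy it. The check below is a simplified stand-in for semver that handles plain `X.Y.Z` versions with a major of at least 1. It is not what bun or the release script actually runs, and `11.10.0` in the example is an illustrative 11.x version.

```typescript
// Simplified stand-in for semver, covering plain "X.Y.Z" versions only.
function parseVersion(v: string): [number, number, number] {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// "^X.Y.Z" with X >= 1 means ">= X.Y.Z and < (X+1).0.0".
function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) throw new Error(`not a caret range: ${range}`);
  const [maj, min, pat] = parseVersion(version);
  const [rMaj, rMin, rPat] = parseVersion(range.slice(1));
  if (rMaj === 0) throw new Error("0.x caret ranges are out of scope for this sketch");
  if (maj !== rMaj) return false;
  if (min !== rMin) return min > rMin;
  return pat >= rPat;
}
```

For example, `satisfiesCaret("11.10.0", "^12.4.5")` is false, which is exactly the mismatch a network-less build cannot resolve on its own.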
CI and the publish workflow now use --frozen-lockfile so drift is caught
immediately. The release script also validates lockfile consistency
before tagging.
Closes #386
- Move model info from --help to `qmd status` with live HuggingFace
links derived from actual configured URIs
- Pre-push hook: handle non-interactive shells gracefully, resolve
annotated tags correctly for CI checks
- Add tsc build step (tsconfig.build.json) so npm package ships
compiled JS instead of raw TypeScript requiring tsx at runtime
- Update qmd wrapper and daemon spawn to use dist/qmd.js in
production while keeping tsx for development
- Add self-installing pre-push hook validating v* tag pushes:
package.json version match, changelog entry, CI status
- Add release.sh script that renames [Unreleased] to versioned
entry, bumps package.json, commits, and tags
- Add extract-changelog.sh for cumulative GitHub release notes
- Update publish workflow with build step and GitHub release creation
- Flesh out CHANGELOG.md with full history from 0.1.0 through 1.0.0
in Keep-a-Changelog format with PR/contributor attributions
- Add release standards and changelog guidelines to CLAUDE.md
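The tag and changelog checks described above can be sketched as plain string validation. These are hypothetical helpers, not the hook or `release.sh` themselves. They assume Keep-a-Changelog headings of the form `## [1.2.0] - YYYY-MM-DD` and leave out the CI-status query, which depends on the GitHub API.

```typescript
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True if the changelog has a "## [version]" heading at the start of a line.
function hasChangelogEntry(changelog: string, version: string): boolean {
  return new RegExp(`^## \\[${escapeRegExp(version)}\\]`, "m").test(changelog);
}

// Returns a list of problems; an empty list means the v* tag may be pushed.
function validateReleaseTag(tag: string, pkgVersion: string, changelog: string): string[] {
  if (!/^v\d+\.\d+\.\d+$/.test(tag)) return [`tag ${tag} is not of the form vX.Y.Z`];
  const errors: string[] = [];
  const version = tag.slice(1);
  if (version !== pkgVersion) {
    errors.push(`tag ${tag} does not match package.json version ${pkgVersion}`);
  }
  if (!hasChangelogEntry(changelog, version)) {
    errors.push(`CHANGELOG.md has no entry for ${version}`);
  }
  return errors;
}

// Release step: turn the "## [Unreleased]" heading into a versioned entry.
function promoteUnreleased(changelog: string, version: string, date: string): string {
  const heading = /^## \[Unreleased\]/m;
  if (!heading.test(changelog)) throw new Error("no [Unreleased] section");
  return changelog.replace(heading, `## [${version}] - ${date}`);
}
```

Running `promoteUnreleased` before tagging is what lets `validateReleaseTag` pass for the new version, since the changelog entry check looks for the versioned heading.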