ai-workspace-services/qmd

Author	SHA1	Message	Date
Kim Junmo	fee576bf98	fix: migrate legacy lowercase paths on reindex When qmd update runs against an index created before case-preservation, documents may exist under lowercase paths (e.g. "skill.md" for a file actually named "SKILL.md"). Add findOrMigrateLegacyDocument() that: - Falls back to a lowercase lookup when the canonical path is not found - Renames the document path in-place via UPDATE OR IGNORE - Manually rebuilds the FTS entry (FTS5 INSERT OR REPLACE does not reliably update existing rows via triggers) - Handles UNIQUE conflicts gracefully (returns null on conflict) Embeddings are keyed by content hash, so the rename preserves all existing vectors — no re-embedding required. Both the CLI indexer and the library reindexer share the same helper, eliminating the duplication that a previous review flagged. Includes integration tests for: successful migration, already-lowercase no-op, and UNIQUE conflict handling.	2026-04-09 08:25:00 +09:00
Kim Junmo	9fb9de4fd2	fix: preserve original case in handelize() The blanket .toLowerCase() in handelize() drops filename casing, which breaks path resolution on case-sensitive filesystems (Linux). Files like README.md, CHANGELOG.md, and SKILL.md become unreachable when the index stores them as readme.md, changelog.md, skill.md. Since FTS5 already performs case-insensitive matching via the unicode61 tokenizer, lowercasing the stored path provides no search benefit — it only corrupts the metadata used to locate files on disk. Remove .toLowerCase() and update all affected test expectations.	2026-04-09 07:59:22 +09:00
Tobi Lutke	66e70c028e	fix(test): reset _productionMode in getDefaultDbPath test Bun runs all test files in a single process, so module-level state leaks between files. The getDefaultDbPath test now resets the _productionMode flag before asserting it throws, fixing the flaky failure on Bun (ubuntu-latest) in CI.	2026-04-05 18:39:51 -04:00
Tobi Lutke	32e504c883	fix(test): remove duplicate path/handelize tests from store.test.ts These tests are already in store.helpers.unit.test.ts. The duplicates in store.test.ts failed in CI because _productionMode module state leaked from earlier tests in the same bun process, causing getDefaultDbPath to return a path instead of throwing.	2026-04-05 18:31:17 -04:00
JohnRichardEnders	50ce17bbfa	feat(llm): resolve models as config > env > default Separate hardcoded default from env var in DEFAULT_EMBED_MODEL so the constructor can resolve: config param > env var > hardcoded default. Also add env var support for QMD_GENERATE_MODEL and QMD_RERANK_MODEL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 18:00:08 -04:00
dan mackinlay	1bada2eba6	Add explicit TTY link output tests	2026-04-05 17:58:09 -04:00
dan mackinlay	06f5642252	Fix stale ls test expectation	2026-04-05 17:56:26 -04:00
dan mackinlay	636631225e	Add clickable OSC8 editor links for CLI search results	2026-04-05 17:56:26 -04:00
James Risberg	33fae1c4f5	chore: migrate AST chunking tests to vitest Replace standalone test-ast-chunking.mjs (823 lines, custom check() harness, invisible to CI) with proper vitest integration tests. All unique assertions preserved; duplicates already in ast.test.ts dropped. Performance benchmarks and real-collection scanner removed (dev tools, not regression tests). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 17:19:59 -04:00
John R Milinovich	b7a5a86a9b	feat(cli): add `qmd bench` command for search quality benchmarks Adds a benchmark harness that measures search quality across backends. Given a fixture file with queries and expected results, it runs each query through BM25, vector, hybrid (no rerank), and full pipeline, then reports precision@k, recall, MRR, F1, and latency. This is primarily a regression testing tool — users create fixtures for their own vaults to catch quality regressions after config or index changes. Ships with an example fixture against the eval-docs test collection to demonstrate the format. New files: src/bench/bench.ts — main runner src/bench/score.ts — precision, recall, MRR, F1, path matching src/bench/types.ts — fixture and result types src/bench/fixtures/ — example fixture test/bench-score.test.ts — unit tests for scoring (16 tests) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 17:17:59 -04:00
Tobias Lütke	76a2f0fb31	Merge pull request #506 from danmackinlay/fix-505-json-line-output feat: Include line in --json search output # Conflicts: # CHANGELOG.md	2026-04-05 17:16:05 -04:00
Tobias Lütke	9c9de94bd8	fix(handelize): restore lowercase + convert dots to dashes - Restore .toLowerCase() in handelize (was dropped, both test files expected it inconsistently) - Convert dots to dashes in filename body (e.g. v2.0 -> v2-0), keeping only the extension dot. Tobi confirmed this is the intended behavior. - Align both test/store.test.ts and test/store.helpers.unit.test.ts to match (they had diverged, one expected case-preserved, one lowercase) - Adjust 'ensureVecTable recreates' test to expect throw behavior (matches #501 dimension-mismatch fix)	2026-04-05 17:12:53 -04:00
Surma	2de225c9e7	Test nix flake builds in CI (#487) * Test nix flake builds in CI * Update outdated bun.lock file * fix: restore toLowerCase() in handelize and update tests * Fix flake to use proper FODs --------- Co-authored-by: Tobias Lütke <tobi@shopify.com>	2026-04-05 16:59:27 -04:00
Tobias Lütke	828823d20a	fix: restore toLowerCase() in handelize + align tests with post-#501 behavior - Restore .toLowerCase() in handelize (was dropped somewhere, tests expect it) - Update dimension-mismatch test to expect throw instead of silent rebuild (matches new behavior from #501) - Fix one stale test expectation for preserved dots in filenames	2026-04-05 16:56:06 -04:00
Antonio Mello	ef062e1b54	fix(multi-get): support brace expansion patterns in glob matching (#424) Brace expansion patterns like `{doc1,doc2}.md` or `collection/{a,b}.md` were incorrectly parsed as comma-separated file lists instead of being passed to the glob matcher (picomatch). This happened because the comma-detection heuristic only checked for `*` and `?` but not `{`. Also adds `collection/path` matching in `matchFilesByGlob` so patterns like `my-collection/{file1,file2}.md` work — previously the glob only matched against `qmd://collection/path` (virtual) and `path` (relative to collection root), missing the `collection/path` form. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 16:45:33 -04:00
LJY	698b44fe87	Fix qmd embed model selection (#494)	2026-04-05 16:45:04 -04:00
Matt Van Horn	1ad3388132	fix(store): preserve underscores in BM25 search terms (#404) sanitizeFTS5Term stripped all non-letter/non-number characters including underscores, causing snake_case identifiers like `my_variable` to become `myvariable` and silently fail BM25 matches. Add underscore to the preserved character set in the Unicode regex. Export the function and add unit tests covering snake_case, contractions, punctuation stripping, and unicode. Fixes #305 Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 16:44:14 -04:00
dan mackinlay	c22d00829b	Add line to JSON search output	2026-04-05 10:08:57 +00:00
Tobias Lütke	1fb2e2819e	Merge origin/main into feat/ast-aware-chunking Resolve conflicts: combine AST chunking args (filepath, chunkStrategy) with abort signal parameter from #458. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 20:00:49 -04:00
Tobias Lütke	dd27f499c7	Merge pull request #463 from goldsr09/fix/hyphenated-lex-queries Fix hyphenated tokens in FTS5 lex queries	2026-03-28 19:58:22 -04:00
Tobias Lütke	08566ec316	Merge pull request #462 from goldsr09/fix/bm25-field-weights Fix BM25 field weights to include all 3 FTS columns	2026-03-28 19:56:04 -04:00
Tobias Lütke	8d343b9da1	Update handelize tests for case/dot preservation (#475) PR #475 changed handelize() to preserve original case and dots, but the tests still expected lowercase output. Update assertions to match the new behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 19:54:18 -04:00
Ryan	7b9bd01226	fix: handle hyphenated tokens in FTS5 lex queries Hyphenated terms like multi-agent, DEC-0054, gpt-4 were being stripped of hyphens and concatenated (e.g., "multiagent") which missed matches. Now they're split into FTS5 phrase queries ("multi agent") so the porter tokenizer matches them correctly.	2026-03-24 20:13:52 -04:00
Ryan	fa214db367	fix: correct BM25 field weights to include all 3 FTS columns The bm25() call only had 2 weights for 3 columns (filepath, title, body), giving body an implicit weight of 0. Add proper weights: filepath=1.5, title=4.0, body=1.0 so title matches are boosted and body content is scored.	2026-03-24 20:12:45 -04:00
James Risberg	244ddf5ecb	feat: AST-aware chunking for code files via tree-sitter Add opt-in AST-aware chunk boundary detection for code files using web-tree-sitter. When enabled with `--chunk-strategy auto`, code files (.ts, .tsx, .js, .jsx, .py, .go, .rs) are chunked at function, class, and import boundaries instead of arbitrary text positions. Default behavior (`regex`) is unchanged — no surprises on upgrade. In testing on QMD's own codebase, AST mode split 42% fewer function bodies across chunk boundaries compared to regex-only chunking. Usage: qmd embed --chunk-strategy auto qmd query "search terms" --chunk-strategy auto What's included: - Language detection from file extension with support for TypeScript, JavaScript (including arrow functions and function expressions), Python, Go, and Rust - Per-language tree-sitter queries with scored break points aligned to the existing markdown scale (class=100, function=90, type=80, import=60) - AST break points merged with regex break points — highest score wins at each position, so embedded markdown (comments, docstrings) still benefits from regex patterns - Refactored chunking core: chunkDocumentWithBreakPoints() extracted, mergeBreakPoints() added, async chunkDocumentAsync() wrapper for AST - ChunkStrategy type ("auto" | "regex") threaded through generateEmbeddings(), hybridQuery(), structuredSearch(), CLI, and SDK - getASTStatus() health check wired into `qmd status` - Parse failures log a warning and fall back to regex — never crash Hardening: - Grammar packages are optionalDependencies with pinned versions to prevent ABI breaks from semver drift - web-tree-sitter is a direct dependency (pinned) - Errors are logged (not silently swallowed) for debuggability - Tested on both Node.js and Bun (Bun is actually faster) Testing: - 26 unit tests (test/ast.test.ts) — all 4 languages, error handling - 7 integration tests (test/store.test.ts) — merge, equivalence, bypass - Standalone test-ast-chunking.mjs with 63 synthetic tests and a real-collection performance scanner (npx tsx test-ast-chunking.mjs ~/code) - Validated end-to-end with qmd embed + qmd query on QMD's own codebase - Zero markdown regressions across all test paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 01:22:39 -04:00
Tobias Lütke	5f6821629b	Merge pull request #385 from rymalia/fix/launcher-lockfile-priority fix: prioritize package-lock.json in launcher to prevent Bun false positive	2026-03-14 08:08:03 -04:00
Tobias Lütke	5b48bcb6c1	Merge pull request #389 from sonwr/fix-issue-380-cleanup-no-sqlite-vec fix: skip cleanup when sqlite-vec is unavailable	2026-03-14 08:07:11 -04:00
programcaicai	809aa36172	fix: bound memory usage during embed	2026-03-13 17:39:17 +08:00
sonwr	7df09e8235	fix: skip vector cleanup when sqlite-vec is unavailable	2026-03-12 13:51:20 +00:00
Ryan Malia	28903d8eba	fix: prioritize package-lock.json in launcher to prevent Bun false positive The bin/qmd wrapper checks for bun.lock to select the runtime, but since bun.lock is committed to the repo, source builds using npm install are incorrectly routed to Bun — causing native module ABI mismatches (#381) and sqlite-vec crashes (#380). Add package-lock.json as a higher-priority signal: if it exists, npm installed the dependencies and Node should be used. Also fix cleanupOrphanedVectors() to use the existing isSqliteVecAvailable() guard instead of checking sqlite_master, which can report the virtual table even when the vec0 module isn't loaded. Fixes #381, fixes #380 Continuation of #362 (runtime detection false positives)	2026-03-12 01:46:38 -07:00
nkkko	b16d77146a	feat(skill): install packaged qmd skill	2026-03-10 23:18:15 +01:00
Tobi Lutke	55f16460d0	fix(ci): guard LLM calls in CI and increase test timeouts Add _ciMode flag to LlamaCpp that throws immediately on embedBatch, generate, expandQuery, and rerank when CI=true — prevents silent 30s timeouts. Skip MCP HTTP Transport tests in CI (they instantiate a real LlamaCpp). Bump vitest/bun test timeouts to 60s for slower CI runners.	2026-03-10 13:28:37 -04:00
Tobi Lutke	ed0249fd6b	fix(test): increase timeout for SDK search tests that trigger LLM expansion These tests load the query expansion model on first call, which consistently exceeds the 30s timeout on CI runners.	2026-03-10 12:59:46 -04:00
Tobi Lutke	c68904fe08	refactor: move CLI and MCP to subdirectories, MCP consumes SDK Move frontends into src/cli/ and src/mcp/ to separate them from the core library. The MCP server is fully rewritten to import only from the SDK (src/index.ts) — zero direct store.ts/collections.ts/llm.ts access. - src/qmd.ts → src/cli/qmd.ts - src/formatter.ts → src/cli/formatter.ts - src/mcp.ts → src/mcp/server.ts (rewritten to use QMDStore SDK) - New src/maintenance.ts: Maintenance class for CLI housekeeping - SDK gains: getDocumentBody(), getDefaultCollectionNames(), extractSnippet/addLineNumbers/DEFAULT_MULTI_GET_MAX_BYTES exports, getDefaultDbPath re-export, InternalStore type export - package.json bin/scripts updated for new paths - All 692 tests pass	2026-03-10 11:39:55 -04:00
Tobi Lutke	839d774a06	feat: redesign SDK search API with unified search() and ExpandedQuery type Replace three separate search methods (query, search, structuredSearch) with a single search(options) that accepts either a query string (auto-expanded) or pre-expanded queries. Add searchLex/searchVector convenience methods and expandQuery for manual control. Unify StructuredSubSearch and ExpandedQuery into a single ExpandedQuery type with { type, query } used throughout the pipeline. Add skipRerank option to hybridQuery and structuredSearch for fast no-LLM searches. New SDK surface: - search({ query, intent, rerank, limit, ... }) - search({ queries: expanded }) - searchLex(query, opts) - searchVector(query, opts) - expandQuery(query, { intent })	2026-03-10 11:04:45 -04:00
Tobi Lutke	040c6fa904	feat: add SDK/library mode for programmatic access Allow QMD to be used as a library (`import { createStore } from '@tobilu/qmd'`) in addition to CLI and MCP modes. The constructor requires explicit dbPath and either a configPath (YAML file) or inline config object — no defaults assumed, making it safe to embed in any application. - Add src/index.ts entry point with QMDStore interface exposing search, retrieval, collection/context management, and index health - Add setConfigSource() to collections.ts for inline config support (in-memory config with no file I/O) - Add main/types/exports fields to package.json - Add SDK documentation section to README - Add 56 unit tests covering constructor, collections, contexts, search, document retrieval, config isolation, YAML persistence, and lifecycle	2026-03-08 15:59:22 -04:00
Tobi Lutke	ad38c1f698	feat: add intent parameter for query disambiguation Add optional `intent` parameter that steers query expansion, reranking, chunk selection, and snippet extraction without searching on its own. When a query like "performance" is ambiguous (web-perf vs team health vs fitness), intent provides background context that disambiguates results across all pipeline stages: - expandQuery: includes intent in LLM prompt ("Query intent: {intent}") - rerank: prepends intent to rerank query for Qwen3-Reranker - chunk selection: intent terms scored at 0.5x weight vs query terms - snippet extraction: intent terms scored at 0.3x weight - strong-signal bypass: disabled when intent provided Available via CLI (--intent flag or intent: line in query documents), MCP (intent field on query tool), and programmatic API. Adapted from PR #180 (thanks @vyalamar).	2026-03-07 19:27:29 -04:00
Tobi Lutke	e3549dab1a	perf(rerank): cap parallelism, deduplicate chunks, cache by content - Cap rerank contexts at 4 to avoid VRAM exhaustion on high-core machines - Deduplicate identical chunk texts before sending to reranker - Cache rerank scores by chunk content instead of file path — same text from different files now shares a single reranker call - Add truncation cache to avoid re-tokenizing duplicate documents	2026-03-07 15:57:36 -04:00
vyalamar	b068ad0dd6	feat(query): add --explain score traces for hybrid search	2026-03-07 14:35:10 -04:00
Tobias Lütke	8bd93366ad	Merge pull request #228 from amsminn/fix-empty-results-format fix(cli): prevent parser breakage on empty results across output formats	2026-03-07 14:25:16 -04:00
Tobias Lütke	ee08997f23	Merge pull request #313 from 0xble/fix/expand-context-size-config fix(llm): make query expansion context size configurable	2026-03-07 14:25:04 -04:00
Tobias Lütke	a28163fb2c	Merge pull request #304 from sebkouba/feature/collection-ignore feat: add ignore patterns for collections	2026-03-07 14:25:02 -04:00
Tobias Lütke	e6b50cfca9	Merge pull request #308 from debugerman/fix/handelize-emoji-crash fix(store): handle emoji-only filenames in handelize (#302)	2026-03-07 14:24:59 -04:00
Brian Le	0dec1df047	fix(llm): make expansion context size configurable	2026-03-06 16:35:33 -05:00
Brian Le	49d5b4f450	fix(index): deactivate stale docs on empty collection updates	2026-03-06 16:29:52 -05:00
Ning	dc777e3be0	fix(store): handle emoji-only filenames in handelize (#302) Convert emoji codepoints to hex representation (e.g. 🐘 → 1f418) instead of crashing, so files like 🐘.md can be indexed without halting the entire update process. Fixes #302	2026-03-06 14:24:24 +08:00
Sebastian Kouba	fde542cd0d	feat: add ignore patterns for collections Add an optional 'ignore' field to collection config that accepts an array of glob patterns to exclude from indexing. This allows collections to skip specific subdirectories without needing separate collections. Example YAML config: personal: path: ~/personal_synced pattern: '*/.md' ignore: - 'Sessions/' - 'archive/' The ignore patterns are passed to fast-glob's ignore option alongside the existing hardcoded excludes (node_modules, .git, etc). Already-indexed files matching new ignore patterns are deactivated on the next update. Changes: - Add ignore?: string[] to Collection interface - Pass ignore patterns through to fast-glob in indexFiles() - Show ignore patterns in collection list/status output - 5 new CLI integration tests covering the feature	2026-03-05 19:17:44 +01:00
CHAEWAN KIM	b024693f5d	Merge branch 'main' into fix-empty-results-format	2026-02-23 22:36:21 -08:00
Tobi Lütke	5233e676d9	fix(rerank): truncate documents exceeding 2048-token context size node-llama-cpp throws a hard error when any document + query + template overhead exceeds the ranking context size. Truncate oversized documents using the rerank model's tokenizer before passing them to rankAll(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 12:41:59 -05:00
Tobi Lutke	64ef25e1f6	Document query grammar and add skill helpers	2026-02-22 13:36:08 -04:00
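
Commit b7a5a86a9b says that `qmd bench` reports precision@k, recall, MRR, and F1 for each search backend. As a rough illustration only, the sketch below computes those metrics from their textbook definitions. The function names, the exact-string path matching, and the assumption that ranked paths are unique are all hypothetical; none of this is taken from src/bench/score.ts, which may match paths and handle edge cases differently.

```typescript
// Hypothetical helpers illustrating the standard metric definitions.
// These are NOT the functions in src/bench/score.ts.

/** precision@k: share of the first k ranked paths that are expected. */
function precisionAtK(ranked: string[], expected: Set<string>, k: number): number {
  if (k <= 0) return 0;
  const hits = ranked.slice(0, k).filter((path) => expected.has(path)).length;
  return hits / k;
}

/** recall@k: share of the expected paths that appear in the first k results. */
function recallAtK(ranked: string[], expected: Set<string>, k: number): number {
  if (expected.size === 0) return 0;
  const hits = ranked.slice(0, k).filter((path) => expected.has(path)).length;
  return hits / expected.size;
}

/** Reciprocal rank: 1 / (1-based position of the first expected path), or 0. */
function reciprocalRank(ranked: string[], expected: Set<string>): number {
  const index = ranked.findIndex((path) => expected.has(path));
  return index === -1 ? 0 : 1 / (index + 1);
}

/** MRR: mean of the per-query reciprocal ranks. */
function meanReciprocalRank(reciprocalRanks: number[]): number {
  if (reciprocalRanks.length === 0) return 0;
  const sum = reciprocalRanks.reduce((total, rr) => total + rr, 0);
  return sum / reciprocalRanks.length;
}

/** F1: harmonic mean of precision and recall (0 when both are 0). */
function f1(precision: number, recall: number): number {
  const denominator = precision + recall;
  return denominator === 0 ? 0 : (2 * precision * recall) / denominator;
}
```

For a ranked list `a.md, b.md, c.md, d.md` with expected paths `b.md`, `d.md`, and `z.md`, precision@2 is 1/2, recall@4 is 2/3, and the reciprocal rank is 1/2, because the first expected path sits in second position.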
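
Commit 50ce17bbfa describes model resolution as "config > env > default". A minimal sketch of that precedence follows. It assumes a plain lookup over an environment-like record. The helper name `resolveModel`, the placeholder default value, and the choice to treat empty strings as unset are assumptions made for illustration, not details confirmed by the log. Only the variable names `QMD_GENERATE_MODEL` and `QMD_RERANK_MODEL` come from the commit message.

```typescript
// Hypothetical placeholder; the real default model identifiers are not given in the log.
const EXAMPLE_DEFAULT_RERANK_MODEL = "example-rerank-model";

/**
 * Resolve a model name with the precedence: config param > env var > hardcoded default.
 * Empty or whitespace-only values are treated as unset (an assumption of this sketch).
 */
function resolveModel(
  configValue: string | undefined,
  envValue: string | undefined,
  hardcodedDefault: string,
): string {
  const isSet = (value: string | undefined): value is string =>
    value !== undefined && value.trim() !== "";

  if (isSet(configValue)) return configValue;
  if (isSet(envValue)) return envValue;
  return hardcodedDefault;
}

/** Example wiring for the rerank model, reading from an env-like record. */
function resolveRerankModel(
  config: { rerankModel?: string },
  env: Record<string, string | undefined>,
): string {
  return resolveModel(config.rerankModel, env["QMD_RERANK_MODEL"], EXAMPLE_DEFAULT_RERANK_MODEL);
}
```

Keeping the hardcoded default separate from the environment lookup, as the commit describes, lets an explicit config value win even when the environment variable is also set.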

68 Commits