ai-workspace-services/qmd

Author	SHA1	Message	Date
Tobias Lütke	3295294be3	Merge pull request #532 from kuishou68/fix-qmd-uri-index-query fix: include custom index in qmd:// links	2026-04-10 20:47:55 -04:00
Tobias Lütke	46c4dfdaac	Merge pull request #545 from kuishou68/fix-sqlite-vec-actionable-guidance fix(store): surface actionable sqlite-vec guidance	2026-04-10 20:47:16 -04:00
Bek	e4990e470e	Harden embedding overflow handling	2026-04-10 16:02:46 -04:00
kuishou68	0adbdeb337	fix(store): surface actionable sqlite-vec guidance	2026-04-09 10:13:40 +08:00
Tobias Lütke	171e9e3e65	Merge pull request #530 from kuishou68/fix-status-no-build-probe	2026-04-08 21:19:56 -04:00
Jeff Gardner	1ecb5c9f96	Fix QMD_LLAMA_GPU backend override handling	2026-04-07 18:49:22 +02:00
cocoon	8404cc3bb1	fix(uri): include index in custom qmd links	2026-04-07 23:26:19 +08:00
cocoon	26e3d0c077	fix(status): avoid build attempts during device probe	2026-04-07 23:18:58 +08:00
Tobi Lutke	66e70c028e	fix(test): reset _productionMode in getDefaultDbPath test Bun runs all test files in a single process, so module-level state leaks between files. The getDefaultDbPath test now resets the _productionMode flag before asserting it throws, fixing the flaky failure on Bun (ubuntu-latest) in CI.	2026-04-05 18:39:51 -04:00
Tobi Lutke	32e504c883	fix(test): remove duplicate path/handelize tests from store.test.ts These tests are already in store.helpers.unit.test.ts. The duplicates in store.test.ts failed in CI because _productionMode module state leaked from earlier tests in the same bun process, causing getDefaultDbPath to return a path instead of throwing.	2026-04-05 18:31:17 -04:00
JohnRichardEnders	50ce17bbfa	feat(llm): resolve models as config > env > default Separate hardcoded default from env var in DEFAULT_EMBED_MODEL so the constructor can resolve: config param > env var > hardcoded default. Also add env var support for QMD_GENERATE_MODEL and QMD_RERANK_MODEL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 18:00:08 -04:00
dan mackinlay	1bada2eba6	Add explicit TTY link output tests	2026-04-05 17:58:09 -04:00
dan mackinlay	06f5642252	Fix stale ls test expectation	2026-04-05 17:56:26 -04:00
dan mackinlay	636631225e	Add clickable OSC8 editor links for CLI search results	2026-04-05 17:56:26 -04:00
James Risberg	33fae1c4f5	chore: migrate AST chunking tests to vitest Replace standalone test-ast-chunking.mjs (823 lines, custom check() harness, invisible to CI) with proper vitest integration tests. All unique assertions preserved; duplicates already in ast.test.ts dropped. Performance benchmarks and real-collection scanner removed (dev tools, not regression tests). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 17:19:59 -04:00
John R Milinovich	b7a5a86a9b	feat(cli): add `qmd bench` command for search quality benchmarks Adds a benchmark harness that measures search quality across backends. Given a fixture file with queries and expected results, it runs each query through BM25, vector, hybrid (no rerank), and full pipeline, then reports precision@k, recall, MRR, F1, and latency. This is primarily a regression testing tool — users create fixtures for their own vaults to catch quality regressions after config or index changes. Ships with an example fixture against the eval-docs test collection to demonstrate the format. New files: src/bench/bench.ts — main runner src/bench/score.ts — precision, recall, MRR, F1, path matching src/bench/types.ts — fixture and result types src/bench/fixtures/ — example fixture test/bench-score.test.ts — unit tests for scoring (16 tests) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 17:17:59 -04:00
Tobias Lütke	76a2f0fb31	Merge pull request #506 from danmackinlay/fix-505-json-line-output feat: Include line in --json search output # Conflicts: # CHANGELOG.md	2026-04-05 17:16:05 -04:00
Tobias Lütke	9c9de94bd8	fix(handelize): restore lowercase + convert dots to dashes - Restore .toLowerCase() in handelize (was dropped, both test files expected it inconsistently) - Convert dots to dashes in filename body (e.g. v2.0 -> v2-0), keeping only the extension dot. Tobi confirmed this is the intended behavior. - Align both test/store.test.ts and test/store.helpers.unit.test.ts to match (they had diverged, one expected case-preserved, one lowercase) - Adjust 'ensureVecTable recreates' test to expect throw behavior (matches #501 dimension-mismatch fix)	2026-04-05 17:12:53 -04:00
Surma	2de225c9e7	Test nix flake builds in CI (#487) * Test nix flake builds in CI * Update outdated bun.lock file * fix: restore toLowerCase() in handelize and update tests * Fix flake to use proper FODs --------- Co-authored-by: Tobias Lütke <tobi@shopify.com>	2026-04-05 16:59:27 -04:00
Tobias Lütke	828823d20a	fix: restore toLowerCase() in handelize + align tests with post-#501 behavior - Restore .toLowerCase() in handelize (was dropped somewhere, tests expect it) - Update dimension-mismatch test to expect throw instead of silent rebuild (matches new behavior from #501) - Fix one stale test expectation for preserved dots in filenames	2026-04-05 16:56:06 -04:00
Antonio Mello	ef062e1b54	fix(multi-get): support brace expansion patterns in glob matching (#424) Brace expansion patterns like `{doc1,doc2}.md` or `collection/{a,b}.md` were incorrectly parsed as comma-separated file lists instead of being passed to the glob matcher (picomatch). This happened because the comma-detection heuristic only checked for `*` and `?` but not `{`. Also adds `collection/path` matching in `matchFilesByGlob` so patterns like `my-collection/{file1,file2}.md` work — previously the glob only matched against `qmd://collection/path` (virtual) and `path` (relative to collection root), missing the `collection/path` form. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 16:45:33 -04:00
LJY	698b44fe87	Fix qmd embed model selection (#494)	2026-04-05 16:45:04 -04:00
Matt Van Horn	1ad3388132	fix(store): preserve underscores in BM25 search terms (#404) sanitizeFTS5Term stripped all non-letter/non-number characters including underscores, causing snake_case identifiers like `my_variable` to become `myvariable` and silently fail BM25 matches. Add underscore to the preserved character set in the Unicode regex. Export the function and add unit tests covering snake_case, contractions, punctuation stripping, and unicode. Fixes #305 Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 16:44:14 -04:00
dan mackinlay	c22d00829b	Add line to JSON search output	2026-04-05 10:08:57 +00:00
Tobias Lütke	1fb2e2819e	Merge origin/main into feat/ast-aware-chunking Resolve conflicts: combine AST chunking args (filepath, chunkStrategy) with abort signal parameter from #458. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 20:00:49 -04:00
Tobias Lütke	dd27f499c7	Merge pull request #463 from goldsr09/fix/hyphenated-lex-queries Fix hyphenated tokens in FTS5 lex queries	2026-03-28 19:58:22 -04:00
Tobias Lütke	08566ec316	Merge pull request #462 from goldsr09/fix/bm25-field-weights Fix BM25 field weights to include all 3 FTS columns	2026-03-28 19:56:04 -04:00
Tobias Lütke	8d343b9da1	Update handelize tests for case/dot preservation (#475) PR #475 changed handelize() to preserve original case and dots, but the tests still expected lowercase output. Update assertions to match the new behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 19:54:18 -04:00
Ryan	7b9bd01226	fix: handle hyphenated tokens in FTS5 lex queries Hyphenated terms like multi-agent, DEC-0054, gpt-4 were being stripped of hyphens and concatenated (e.g., "multiagent") which missed matches. Now they're split into FTS5 phrase queries ("multi agent") so the porter tokenizer matches them correctly.	2026-03-24 20:13:52 -04:00
Ryan	fa214db367	fix: correct BM25 field weights to include all 3 FTS columns The bm25() call only had 2 weights for 3 columns (filepath, title, body), giving body an implicit weight of 0. Add proper weights: filepath=1.5, title=4.0, body=1.0 so title matches are boosted and body content is scored.	2026-03-24 20:12:45 -04:00
James Risberg	244ddf5ecb	feat: AST-aware chunking for code files via tree-sitter Add opt-in AST-aware chunk boundary detection for code files using web-tree-sitter. When enabled with `--chunk-strategy auto`, code files (.ts, .tsx, .js, .jsx, .py, .go, .rs) are chunked at function, class, and import boundaries instead of arbitrary text positions. Default behavior (`regex`) is unchanged — no surprises on upgrade. In testing on QMD's own codebase, AST mode split 42% fewer function bodies across chunk boundaries compared to regex-only chunking. Usage: qmd embed --chunk-strategy auto qmd query "search terms" --chunk-strategy auto What's included: - Language detection from file extension with support for TypeScript, JavaScript (including arrow functions and function expressions), Python, Go, and Rust - Per-language tree-sitter queries with scored break points aligned to the existing markdown scale (class=100, function=90, type=80, import=60) - AST break points merged with regex break points — highest score wins at each position, so embedded markdown (comments, docstrings) still benefits from regex patterns - Refactored chunking core: chunkDocumentWithBreakPoints() extracted, mergeBreakPoints() added, async chunkDocumentAsync() wrapper for AST - ChunkStrategy type ("auto" | "regex") threaded through generateEmbeddings(), hybridQuery(), structuredSearch(), CLI, and SDK - getASTStatus() health check wired into `qmd status` - Parse failures log a warning and fall back to regex — never crash Hardening: - Grammar packages are optionalDependencies with pinned versions to prevent ABI breaks from semver drift - web-tree-sitter is a direct dependency (pinned) - Errors are logged (not silently swallowed) for debuggability - Tested on both Node.js and Bun (Bun is actually faster) Testing: - 26 unit tests (test/ast.test.ts) — all 4 languages, error handling - 7 integration tests (test/store.test.ts) — merge, equivalence, bypass - Standalone test-ast-chunking.mjs with 63 synthetic tests and a real-collection performance scanner (npx tsx test-ast-chunking.mjs ~/code) - Validated end-to-end with qmd embed + qmd query on QMD's own codebase - Zero markdown regressions across all test paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 01:22:39 -04:00
Tobias Lütke	5f6821629b	Merge pull request #385 from rymalia/fix/launcher-lockfile-priority fix: prioritize package-lock.json in launcher to prevent Bun false positive	2026-03-14 08:08:03 -04:00
Tobias Lütke	5b48bcb6c1	Merge pull request #389 from sonwr/fix-issue-380-cleanup-no-sqlite-vec fix: skip cleanup when sqlite-vec is unavailable	2026-03-14 08:07:11 -04:00
programcaicai	809aa36172	fix: bound memory usage during embed	2026-03-13 17:39:17 +08:00
sonwr	7df09e8235	fix: skip vector cleanup when sqlite-vec is unavailable	2026-03-12 13:51:20 +00:00
Ryan Malia	28903d8eba	fix: prioritize package-lock.json in launcher to prevent Bun false positive The bin/qmd wrapper checks for bun.lock to select the runtime, but since bun.lock is committed to the repo, source builds using npm install are incorrectly routed to Bun — causing native module ABI mismatches (#381) and sqlite-vec crashes (#380). Add package-lock.json as a higher-priority signal: if it exists, npm installed the dependencies and Node should be used. Also fix cleanupOrphanedVectors() to use the existing isSqliteVecAvailable() guard instead of checking sqlite_master, which can report the virtual table even when the vec0 module isn't loaded. Fixes #381, fixes #380 Continuation of #362 (runtime detection false positives)	2026-03-12 01:46:38 -07:00
nkkko	b16d77146a	feat(skill): install packaged qmd skill	2026-03-10 23:18:15 +01:00
Tobi Lutke	55f16460d0	fix(ci): guard LLM calls in CI and increase test timeouts Add _ciMode flag to LlamaCpp that throws immediately on embedBatch, generate, expandQuery, and rerank when CI=true — prevents silent 30s timeouts. Skip MCP HTTP Transport tests in CI (they instantiate a real LlamaCpp). Bump vitest/bun test timeouts to 60s for slower CI runners.	2026-03-10 13:28:37 -04:00
Tobi Lutke	ed0249fd6b	fix(test): increase timeout for SDK search tests that trigger LLM expansion These tests load the query expansion model on first call, which consistently exceeds the 30s timeout on CI runners.	2026-03-10 12:59:46 -04:00
Tobi Lutke	c68904fe08	refactor: move CLI and MCP to subdirectories, MCP consumes SDK Move frontends into src/cli/ and src/mcp/ to separate them from the core library. The MCP server is fully rewritten to import only from the SDK (src/index.ts) — zero direct store.ts/collections.ts/llm.ts access. - src/qmd.ts → src/cli/qmd.ts - src/formatter.ts → src/cli/formatter.ts - src/mcp.ts → src/mcp/server.ts (rewritten to use QMDStore SDK) - New src/maintenance.ts: Maintenance class for CLI housekeeping - SDK gains: getDocumentBody(), getDefaultCollectionNames(), extractSnippet/addLineNumbers/DEFAULT_MULTI_GET_MAX_BYTES exports, getDefaultDbPath re-export, InternalStore type export - package.json bin/scripts updated for new paths - All 692 tests pass	2026-03-10 11:39:55 -04:00
Tobi Lutke	839d774a06	feat: redesign SDK search API with unified search() and ExpandedQuery type Replace three separate search methods (query, search, structuredSearch) with a single search(options) that accepts either a query string (auto-expanded) or pre-expanded queries. Add searchLex/searchVector convenience methods and expandQuery for manual control. Unify StructuredSubSearch and ExpandedQuery into a single ExpandedQuery type with { type, query } used throughout the pipeline. Add skipRerank option to hybridQuery and structuredSearch for fast no-LLM searches. New SDK surface: - search({ query, intent, rerank, limit, ... }) - search({ queries: expanded }) - searchLex(query, opts) - searchVector(query, opts) - expandQuery(query, { intent })	2026-03-10 11:04:45 -04:00
Tobi Lutke	040c6fa904	feat: add SDK/library mode for programmatic access Allow QMD to be used as a library (`import { createStore } from '@tobilu/qmd'`) in addition to CLI and MCP modes. The constructor requires explicit dbPath and either a configPath (YAML file) or inline config object — no defaults assumed, making it safe to embed in any application. - Add src/index.ts entry point with QMDStore interface exposing search, retrieval, collection/context management, and index health - Add setConfigSource() to collections.ts for inline config support (in-memory config with no file I/O) - Add main/types/exports fields to package.json - Add SDK documentation section to README - Add 56 unit tests covering constructor, collections, contexts, search, document retrieval, config isolation, YAML persistence, and lifecycle	2026-03-08 15:59:22 -04:00
Tobi Lutke	ad38c1f698	feat: add intent parameter for query disambiguation Add optional `intent` parameter that steers query expansion, reranking, chunk selection, and snippet extraction without searching on its own. When a query like "performance" is ambiguous (web-perf vs team health vs fitness), intent provides background context that disambiguates results across all pipeline stages: - expandQuery: includes intent in LLM prompt ("Query intent: {intent}") - rerank: prepends intent to rerank query for Qwen3-Reranker - chunk selection: intent terms scored at 0.5x weight vs query terms - snippet extraction: intent terms scored at 0.3x weight - strong-signal bypass: disabled when intent provided Available via CLI (--intent flag or intent: line in query documents), MCP (intent field on query tool), and programmatic API. Adapted from PR #180 (thanks @vyalamar).	2026-03-07 19:27:29 -04:00
Tobi Lutke	e3549dab1a	perf(rerank): cap parallelism, deduplicate chunks, cache by content - Cap rerank contexts at 4 to avoid VRAM exhaustion on high-core machines - Deduplicate identical chunk texts before sending to reranker - Cache rerank scores by chunk content instead of file path — same text from different files now shares a single reranker call - Add truncation cache to avoid re-tokenizing duplicate documents	2026-03-07 15:57:36 -04:00
vyalamar	b068ad0dd6	feat(query): add --explain score traces for hybrid search	2026-03-07 14:35:10 -04:00
Tobias Lütke	8bd93366ad	Merge pull request #228 from amsminn/fix-empty-results-format fix(cli): prevent parser breakage on empty results across output formats	2026-03-07 14:25:16 -04:00
Tobias Lütke	ee08997f23	Merge pull request #313 from 0xble/fix/expand-context-size-config fix(llm): make query expansion context size configurable	2026-03-07 14:25:04 -04:00
Tobias Lütke	a28163fb2c	Merge pull request #304 from sebkouba/feature/collection-ignore feat: add ignore patterns for collections	2026-03-07 14:25:02 -04:00
Tobias Lütke	e6b50cfca9	Merge pull request #308 from debugerman/fix/handelize-emoji-crash fix(store): handle emoji-only filenames in handelize (#302)	2026-03-07 14:24:59 -04:00
Brian Le	0dec1df047	fix(llm): make expansion context size configurable	2026-03-06 16:35:33 -05:00

74 Commits
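Several commits above (#404, the hyphenated-token fix 7b9bd01226, and the BM25 weight fix fa214db367) describe how lex queries are built and ranked. The TypeScript below is a minimal sketch of those rules, not qmd's code: `sanitizeTerm`, `toFts5Fragment`, `buildLexQuery`, the explicit `AND` join, and the table name `documents_fts` are all hypothetical. It keeps letters, numbers, and underscores, turns a hyphenated token such as `multi-agent` into the FTS5 phrase `"multi agent"`, and records the per-column weights (filepath=1.5, title=4.0, body=1.0) in a `bm25()` expression.

```typescript
// Hypothetical sketch of the lex-query rules from #404 and 7b9bd01226;
// this is NOT qmd's actual sanitizeFTS5Term implementation.

// Keep Unicode letters, numbers, and underscores (so snake_case survives);
// strip every other character. Built via RegExp to use Unicode property escapes.
const DISALLOWED = new RegExp("[^\\p{L}\\p{N}_]", "gu");

function sanitizeTerm(term: string): string {
  return term.replace(DISALLOWED, "");
}

// Convert one whitespace-delimited token into a quoted FTS5 string.
// Hyphenated tokens become multi-word phrases, so "gpt-4" -> "gpt 4".
function toFts5Fragment(token: string): string | null {
  const parts = token
    .split("-")
    .map(sanitizeTerm)
    .filter((p) => p.length > 0);
  if (parts.length === 0) return null;
  // A quoted multi-word string is an FTS5 phrase: words must be adjacent and in order.
  return `"${parts.join(" ")}"`;
}

// Assumption: fragments are combined with an explicit AND.
function buildLexQuery(input: string): string {
  return input
    .split(/\s+/)
    .map((token) => toFts5Fragment(token))
    .filter((f): f is string => f !== null)
    .join(" AND ");
}

// Weights follow FTS5 column order (filepath, title, body) as stated in fa214db367.
// The table name is hypothetical. FTS5 bm25() returns lower values for better matches.
const RANK_EXPR = "bm25(documents_fts, 1.5, 4.0, 1.0)";
```

For example, `buildLexQuery("multi-agent my_variable")` yields `"multi agent" AND "my_variable"`, so neither the hyphenated term nor the snake_case identifier is collapsed into a single word that would miss BM25 matches.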
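Commit 50ce17bbfa resolves models in the order config > env > default. A minimal sketch of that precedence follows, assuming a hypothetical `resolveModel` helper that receives the environment explicitly and treats empty strings as unset. `QMD_RERANK_MODEL` and `QMD_GENERATE_MODEL` are variables named in the commit; the fallback model name used in the example is a placeholder, not qmd's real default.

```typescript
// Hypothetical sketch of config > env > default resolution (not qmd's constructor).
type Env = Record<string, string | undefined>;

function resolveModel(
  configValue: string | undefined, // explicit config/constructor parameter
  envVar: string,                  // e.g. "QMD_RERANK_MODEL"
  fallback: string,                // hardcoded default
  env: Env,                        // environment to consult (e.g. process.env in Node)
): string {
  // 1. An explicit, non-empty config value always wins.
  if (configValue) return configValue;
  // 2. Otherwise use the environment variable if it is set and non-empty.
  const fromEnv = env[envVar];
  if (fromEnv) return fromEnv;
  // 3. Finally fall back to the hardcoded default.
  return fallback;
}
```

In Node you would pass `process.env` as the last argument; keeping the environment a parameter makes the precedence easy to unit test.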
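The `qmd bench` commit (b7a5a86a9b) reports precision@k, recall, MRR, and F1 per backend. The sketch below shows the textbook form of those metrics for one query; it is not the contents of `src/bench/score.ts`. It assumes exact string equality where qmd does its own path matching, measures recall over the top k results, and computes MRR as the mean of per-query reciprocal ranks. All names here are hypothetical.

```typescript
// Hypothetical per-query scoring sketch; exact path equality stands in
// for qmd's real path matching.
interface QueryScore {
  precisionAtK: number;
  recall: number;
  reciprocalRank: number;
  f1: number;
}

function scoreQuery(ranked: string[], expected: string[], k: number): QueryScore {
  const relevant = new Set(expected);
  const topK = ranked.slice(0, k);
  const seen = new Set<string>(); // count each relevant path at most once
  let hits = 0;
  let reciprocalRank = 0;
  topK.forEach((path, i) => {
    if (!relevant.has(path) || seen.has(path)) return;
    seen.add(path);
    hits += 1;
    // Reciprocal rank uses the first relevant hit (ranks are 1-based).
    if (reciprocalRank === 0) reciprocalRank = 1 / (i + 1);
  });
  const precisionAtK = k > 0 ? hits / k : 0;
  const recall = relevant.size > 0 ? hits / relevant.size : 0;
  const f1 =
    precisionAtK + recall > 0 ? (2 * precisionAtK * recall) / (precisionAtK + recall) : 0;
  return { precisionAtK, recall, reciprocalRank, f1 };
}

// MRR over a fixture: mean of the per-query reciprocal ranks.
function meanReciprocalRank(scores: QueryScore[]): number {
  if (scores.length === 0) return 0;
  return scores.reduce((sum, s) => sum + s.reciprocalRank, 0) / scores.length;
}
```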
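The AST chunking commit (244ddf5ecb) merges AST break points with regex break points so that the highest score wins at each position. A hypothetical `mergeBreakPointsSketch` over an assumed `{ pos, score }` shape illustrates that rule; qmd's actual `mergeBreakPoints()` signature and types may differ. The scores 100/90/60 match the class/function/import values quoted in the commit; the regex score of 20 in the example is made up.

```typescript
// Assumed break-point shape: character offset plus boundary strength.
interface BreakPoint {
  pos: number;
  score: number;
}

// Hypothetical sketch: keep the highest score at each position, sorted by offset.
function mergeBreakPointsSketch(ast: BreakPoint[], regex: BreakPoint[]): BreakPoint[] {
  const best = new Map<number, number>();
  for (const bp of ast.concat(regex)) {
    const current = best.get(bp.pos);
    if (current === undefined || bp.score > current) best.set(bp.pos, bp.score);
  }
  return Array.from(best, ([pos, score]) => ({ pos, score })).sort((a, b) => a.pos - b.pos);
}
```

Keeping one entry per position means a function boundary found by tree-sitter overrides a weaker regex match at the same offset, while regex-only positions (for example inside comments or docstrings) survive unchanged.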