ai-workspace-services/qmd

Author	SHA1	Message	Date
Bek	e4990e470e	Harden embedding overflow handling	2026-04-10 16:02:46 -04:00
Tobi Lutke	32e504c883	fix(test): remove duplicate path/handelize tests from store.test.ts These tests are already in store.helpers.unit.test.ts. The duplicates in store.test.ts failed in CI because _productionMode module state leaked from earlier tests in the same bun process, causing getDefaultDbPath to return a path instead of throwing.	2026-04-05 18:31:17 -04:00
Tobias Lütke	9c9de94bd8	fix(handelize): restore lowercase + convert dots to dashes - Restore .toLowerCase() in handelize (was dropped, both test files expected it inconsistently) - Convert dots to dashes in filename body (e.g. v2.0 -> v2-0), keeping only the extension dot. Tobi confirmed this is the intended behavior. - Align both test/store.test.ts and test/store.helpers.unit.test.ts to match (they had diverged, one expected case-preserved, one lowercase) - Adjust 'ensureVecTable recreates' test to expect throw behavior (matches #501 dimension-mismatch fix)	2026-04-05 17:12:53 -04:00
Tobias Lütke	828823d20a	fix: restore toLowerCase() in handelize + align tests with post-#501 behavior - Restore .toLowerCase() in handelize (was dropped somewhere, tests expect it) - Update dimension-mismatch test to expect throw instead of silent rebuild (matches new behavior from #501) - Fix one stale test expectation for preserved dots in filenames	2026-04-05 16:56:06 -04:00
Antonio Mello	ef062e1b54	fix(multi-get): support brace expansion patterns in glob matching (#424 ) Brace expansion patterns like `{doc1,doc2}.md` or `collection/{a,b}.md` were incorrectly parsed as comma-separated file lists instead of being passed to the glob matcher (picomatch). This happened because the comma-detection heuristic only checked for `*` and `?` but not `{`. Also adds `collection/path` matching in `matchFilesByGlob` so patterns like `my-collection/{file1,file2}.md` work — previously the glob only matched against `qmd://collection/path` (virtual) and `path` (relative to collection root), missing the `collection/path` form. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 16:45:33 -04:00
LJY	698b44fe87	Fix qmd embed model selection (#494 )	2026-04-05 16:45:04 -04:00
Tobias Lütke	1fb2e2819e	Merge origin/main into feat/ast-aware-chunking Resolve conflicts: combine AST chunking args (filepath, chunkStrategy) with abort signal parameter from #458. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 20:00:49 -04:00
Ryan	fa214db367	fix: correct BM25 field weights to include all 3 FTS columns The bm25() call only had 2 weights for 3 columns (filepath, title, body), giving body an implicit weight of 0. Add proper weights: filepath=1.5, title=4.0, body=1.0 so title matches are boosted and body content is scored.	2026-03-24 20:12:45 -04:00
James Risberg	244ddf5ecb	feat: AST-aware chunking for code files via tree-sitter Add opt-in AST-aware chunk boundary detection for code files using web-tree-sitter. When enabled with `--chunk-strategy auto`, code files (.ts, .tsx, .js, .jsx, .py, .go, .rs) are chunked at function, class, and import boundaries instead of arbitrary text positions. Default behavior (`regex`) is unchanged — no surprises on upgrade. In testing on QMD's own codebase, AST mode split 42% fewer function bodies across chunk boundaries compared to regex-only chunking. Usage: qmd embed --chunk-strategy auto qmd query "search terms" --chunk-strategy auto What's included: - Language detection from file extension with support for TypeScript, JavaScript (including arrow functions and function expressions), Python, Go, and Rust - Per-language tree-sitter queries with scored break points aligned to the existing markdown scale (class=100, function=90, type=80, import=60) - AST break points merged with regex break points — highest score wins at each position, so embedded markdown (comments, docstrings) still benefits from regex patterns - Refactored chunking core: chunkDocumentWithBreakPoints() extracted, mergeBreakPoints() added, async chunkDocumentAsync() wrapper for AST - ChunkStrategy type ("auto" \| "regex") threaded through generateEmbeddings(), hybridQuery(), structuredSearch(), CLI, and SDK - getASTStatus() health check wired into `qmd status` - Parse failures log a warning and fall back to regex — never crash Hardening: - Grammar packages are optionalDependencies with pinned versions to prevent ABI breaks from semver drift - web-tree-sitter is a direct dependency (pinned) - Errors are logged (not silently swallowed) for debuggability - Tested on both Node.js and Bun (Bun is actually faster) Testing: - 26 unit tests (test/ast.test.ts) — all 4 languages, error handling - 7 integration tests (test/store.test.ts) — merge, equivalence, bypass - Standalone test-ast-chunking.mjs with 63 synthetic tests and a real-collection performance scanner (npx tsx test-ast-chunking.mjs ~/code) - Validated end-to-end with qmd embed + qmd query on QMD's own codebase - Zero markdown regressions across all test paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 01:22:39 -04:00
programcaicai	809aa36172	fix: bound memory usage during embed	2026-03-13 17:39:17 +08:00
Tobi Lutke	839d774a06	feat: redesign SDK search API with unified search() and ExpandedQuery type Replace three separate search methods (query, search, structuredSearch) with a single search(options) that accepts either a query string (auto-expanded) or pre-expanded queries. Add searchLex/searchVector convenience methods and expandQuery for manual control. Unify StructuredSubSearch and ExpandedQuery into a single ExpandedQuery type with { type, query } used throughout the pipeline. Add skipRerank option to hybridQuery and structuredSearch for fast no-LLM searches. New SDK surface: - search({ query, intent, rerank, limit, ... }) - search({ queries: expanded }) - searchLex(query, opts) - searchVector(query, opts) - expandQuery(query, { intent })	2026-03-10 11:04:45 -04:00
Tobi Lutke	e3549dab1a	perf(rerank): cap parallelism, deduplicate chunks, cache by content - Cap rerank contexts at 4 to avoid VRAM exhaustion on high-core machines - Deduplicate identical chunk texts before sending to reranker - Cache rerank scores by chunk content instead of file path — same text from different files now shares a single reranker call - Add truncation cache to avoid re-tokenizing duplicate documents	2026-03-07 15:57:36 -04:00
Tobi Lutke	870d3aed3b	test: move all tests to flat test/ directory No more src/models/ and src/integration/ subfolders to forget about. All 9 test files live in test/, one command runs everything: npx vitest run test/ bun test test/	2026-02-15 21:37:47 -04:00

13 Commits