Commit Graph

21 Commits

Author SHA1 Message Date
Haitao Pan
e3711767c6 fix: disable local qmd models by default 2026-05-23 11:04:48 +08:00
Tobi Lütke
cfd640ed34
fix(test): resolve LLM test timeouts by disabling file parallelism
Parallel test files each cold-load their own LLM model, competing for
CPU and causing timeouts even at 120s. Sequential execution eliminates
contention — tests that timed out at 30s now complete in 1-15s.

Made-with: Cursor
2026-04-11 01:21:22 +00:00
Tobias Lütke
525b9970cd
Merge pull request #546 from junmo-kim/fix/handelize-preserve-case
fix: preserve original case in handelize()
2026-04-10 20:48:24 -04:00
Tobias Lütke
3295294be3
Merge pull request #532 from kuishou68/fix-qmd-uri-index-query
fix: include custom index in qmd:// links
2026-04-10 20:47:55 -04:00
Tobias Lütke
46c4dfdaac
Merge pull request #545 from kuishou68/fix-sqlite-vec-actionable-guidance
fix(store): surface actionable sqlite-vec guidance
2026-04-10 20:47:16 -04:00
Bek
e4990e470e Harden embedding overflow handling 2026-04-10 16:02:46 -04:00
kuishou68
0adbdeb337 fix(store): surface actionable sqlite-vec guidance 2026-04-09 10:13:40 +08:00
Kim Junmo
fee576bf98 fix: migrate legacy lowercase paths on reindex
When qmd update runs against an index created before case-preservation,
documents may exist under lowercase paths (e.g. "skill.md" for a file
actually named "SKILL.md"). Add findOrMigrateLegacyDocument() that:

- Falls back to a lowercase lookup when the canonical path is not found
- Renames the document path in-place via UPDATE OR IGNORE
- Manually rebuilds the FTS entry (FTS5 INSERT OR REPLACE does not
  reliably update existing rows via triggers)
- Handles UNIQUE conflicts gracefully (returns null on conflict)

Embeddings are keyed by content hash, so the rename preserves all
existing vectors — no re-embedding required.

Both the CLI indexer and the library reindexer share the same helper,
eliminating the duplication that a previous review flagged.

Includes integration tests for: successful migration, already-lowercase
no-op, and UNIQUE conflict handling.
2026-04-09 08:25:00 +09:00
cocoon
8404cc3bb1 fix(uri): include index in custom qmd links 2026-04-07 23:26:19 +08:00
Tobi Lutke
32e504c883
fix(test): remove duplicate path/handelize tests from store.test.ts
These tests are already in store.helpers.unit.test.ts. The duplicates
in store.test.ts failed in CI because _productionMode module state
leaked from earlier tests in the same bun process, causing
getDefaultDbPath to return a path instead of throwing.
2026-04-05 18:31:17 -04:00
Tobias Lütke
9c9de94bd8 fix(handelize): restore lowercase + convert dots to dashes
- Restore .toLowerCase() in handelize (was dropped, both test files
  expected it inconsistently)
- Convert dots to dashes in filename body (e.g. v2.0 -> v2-0), keeping
  only the extension dot. Tobi confirmed this is the intended behavior.
- Align both test/store.test.ts and test/store.helpers.unit.test.ts to
  match (they had diverged, one expected case-preserved, one lowercase)
- Adjust 'ensureVecTable recreates' test to expect throw behavior
  (matches #501 dimension-mismatch fix)
2026-04-05 17:12:53 -04:00
Tobias Lütke
828823d20a fix: restore toLowerCase() in handelize + align tests with post-#501 behavior
- Restore .toLowerCase() in handelize (was dropped somewhere, tests expect it)
- Update dimension-mismatch test to expect throw instead of silent rebuild
  (matches new behavior from #501)
- Fix one stale test expectation for preserved dots in filenames
2026-04-05 16:56:06 -04:00
Antonio Mello
ef062e1b54
fix(multi-get): support brace expansion patterns in glob matching (#424)
Brace expansion patterns like `{doc1,doc2}.md` or `collection/{a,b}.md`
were incorrectly parsed as comma-separated file lists instead of being
passed to the glob matcher (picomatch). This happened because the
comma-detection heuristic only checked for `*` and `?` but not `{`.

Also adds `collection/path` matching in `matchFilesByGlob` so patterns
like `my-collection/{file1,file2}.md` work — previously the glob only
matched against `qmd://collection/path` (virtual) and `path` (relative
to collection root), missing the `collection/path` form.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 16:45:33 -04:00
LJY
698b44fe87
Fix qmd embed model selection (#494) 2026-04-05 16:45:04 -04:00
Tobias Lütke
1fb2e2819e Merge origin/main into feat/ast-aware-chunking
Resolve conflicts: combine AST chunking args (filepath, chunkStrategy)
with abort signal parameter from #458.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 20:00:49 -04:00
Ryan
fa214db367 fix: correct BM25 field weights to include all 3 FTS columns
The bm25() call only had 2 weights for 3 columns (filepath, title, body),
giving body an implicit weight of 0. Add proper weights: filepath=1.5,
title=4.0, body=1.0 so title matches are boosted and body content is scored.
2026-03-24 20:12:45 -04:00
James Risberg
244ddf5ecb feat: AST-aware chunking for code files via tree-sitter
Add opt-in AST-aware chunk boundary detection for code files using
web-tree-sitter. When enabled with `--chunk-strategy auto`, code files
(.ts, .tsx, .js, .jsx, .py, .go, .rs) are chunked at function, class,
and import boundaries instead of arbitrary text positions. Default
behavior (`regex`) is unchanged — no surprises on upgrade.

In testing on QMD's own codebase, AST mode split 42% fewer function
bodies across chunk boundaries compared to regex-only chunking.

Usage:
  qmd embed --chunk-strategy auto
  qmd query "search terms" --chunk-strategy auto

What's included:
- Language detection from file extension with support for TypeScript,
  JavaScript (including arrow functions and function expressions),
  Python, Go, and Rust
- Per-language tree-sitter queries with scored break points aligned to
  the existing markdown scale (class=100, function=90, type=80, import=60)
- AST break points merged with regex break points — highest score wins
  at each position, so embedded markdown (comments, docstrings) still
  benefits from regex patterns
- Refactored chunking core: chunkDocumentWithBreakPoints() extracted,
  mergeBreakPoints() added, async chunkDocumentAsync() wrapper for AST
- ChunkStrategy type ("auto" | "regex") threaded through
  generateEmbeddings(), hybridQuery(), structuredSearch(), CLI, and SDK
- getASTStatus() health check wired into `qmd status`
- Parse failures log a warning and fall back to regex — never crash

Hardening:
- Grammar packages are optionalDependencies with pinned versions to
  prevent ABI breaks from semver drift
- web-tree-sitter is a direct dependency (pinned)
- Errors are logged (not silently swallowed) for debuggability
- Tested on both Node.js and Bun (Bun is actually faster)

Testing:
- 26 unit tests (test/ast.test.ts) — all 4 languages, error handling
- 7 integration tests (test/store.test.ts) — merge, equivalence, bypass
- Standalone test-ast-chunking.mjs with 63 synthetic tests and a
  real-collection performance scanner (npx tsx test-ast-chunking.mjs ~/code)
- Validated end-to-end with qmd embed + qmd query on QMD's own codebase
- Zero markdown regressions across all test paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 01:22:39 -04:00
programcaicai
809aa36172 fix: bound memory usage during embed 2026-03-13 17:39:17 +08:00
Tobi Lutke
839d774a06
feat: redesign SDK search API with unified search() and ExpandedQuery type
Replace three separate search methods (query, search, structuredSearch)
with a single search(options) that accepts either a query string
(auto-expanded) or pre-expanded queries. Add searchLex/searchVector
convenience methods and expandQuery for manual control.

Unify StructuredSubSearch and ExpandedQuery into a single ExpandedQuery
type with { type, query } used throughout the pipeline. Add skipRerank
option to hybridQuery and structuredSearch for fast no-LLM searches.

New SDK surface:
- search({ query, intent, rerank, limit, ... })
- search({ queries: expanded })
- searchLex(query, opts)
- searchVector(query, opts)
- expandQuery(query, { intent })
2026-03-10 11:04:45 -04:00
Tobi Lutke
e3549dab1a
perf(rerank): cap parallelism, deduplicate chunks, cache by content
- Cap rerank contexts at 4 to avoid VRAM exhaustion on high-core machines
- Deduplicate identical chunk texts before sending to reranker
- Cache rerank scores by chunk content instead of file path — same text
  from different files now shares a single reranker call
- Add truncation cache to avoid re-tokenizing duplicate documents
2026-03-07 15:57:36 -04:00
Tobi Lutke
870d3aed3b
test: move all tests to flat test/ directory
No more src/models/ and src/integration/ subfolders to forget about.
All 9 test files live in test/, one command runs everything:

  npx vitest run test/
  bun test test/
2026-02-15 21:37:47 -04:00