86 Commits

Author SHA1 Message Date
Haitao Pan
47bd3ded44 feat(pg): add switchable PostgreSQL backend + OpenClaw/Hermes memory bridge
Add an optional PostgreSQL backend (QMD_BACKEND=pg) alongside the
unchanged default SQLite path. PG store uses pgvector (HNSW) for vectors
and pg_jieba + pg_trgm for full-text/Chinese tokenization, with a
namespace column isolating multi-agent memory (openclaw/hermes).

- src/pg/: config, db-pg, schema bootstrap, memory store
- MCP memory_add/memory_search/memory_get tools; qmd pg status + memory CLI
- connection via QMD_PG_URL/DATABASE_URL/qmd config, stunnel TLS 5443
- tests: pg-config (unit) + pg-memory integration (gated on QMD_PG_URL) + pg-compose
- docs/plan: plan, usage, test report, changelog; track docs/**/*.md

SQLite path: zero regression (typecheck clean, 249 passed / 6 skipped).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 19:13:04 +08:00
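A minimal sketch of how the backend switch and connection lookup described above might be resolved. It assumes the connection sources are tried in the order the commit lists them (QMD_PG_URL, then DATABASE_URL, then qmd config); the function names and the `QmdConfig.pgUrl` field are hypothetical, not the actual `src/pg/config` API.

```typescript
// Hypothetical sketch: function names and the QmdConfig shape are illustrative,
// not the actual API of qmd's src/pg/config module.
type Backend = "sqlite" | "pg";
type Env = Record<string, string | undefined>;

interface QmdConfig {
  pgUrl?: string; // assumed name for a connection string stored in qmd config
}

// SQLite stays the default; only QMD_BACKEND=pg opts into PostgreSQL.
function resolveBackend(env: Env): Backend {
  return env.QMD_BACKEND?.trim().toLowerCase() === "pg" ? "pg" : "sqlite";
}

// Assumed precedence (the order the commit lists them): QMD_PG_URL, DATABASE_URL, qmd config.
function resolvePgUrl(env: Env, config: QmdConfig): string | undefined {
  for (const candidate of [env.QMD_PG_URL, env.DATABASE_URL, config.pgUrl]) {
    const value = candidate?.trim();
    if (value) return value; // first non-empty source wins
  }
  return undefined;
}

const env: Env = { QMD_BACKEND: "pg", DATABASE_URL: "postgres://localhost/qmd" };
console.log(resolveBackend(env), resolvePgUrl(env, {}));
```

Keeping `sqlite` as the fallback for any value other than `pg` matches the commit's goal of leaving the default path unchanged.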
Haitao Pan
77024f7904 feat: add NVIDIA embedding API support and QMD remote sync 2026-06-12 07:32:43 +08:00
Haitao Pan
e3711767c6 fix: disable local qmd models by default 2026-05-23 11:04:48 +08:00
Haitao Pan
7c17c8bcce feat: default to NVIDIA embeddings 2026-05-09 16:50:04 +08:00
Haitao Pan
fbad5791e3 feat: support NVIDIA embedding API 2026-05-09 16:44:47 +08:00
Haitao Pan
49fc83ebe2 Default embeddings to external API 2026-05-07 16:19:18 +08:00
Tobias Lütke
e8de7cab02 fix(cli): make status device probe opt-in 2026-04-21 21:45:52 -04:00
Tobi Lütke
cfd640ed34 fix(test): resolve LLM test timeouts by disabling file parallelism
Parallel test files each cold-load their own LLM model, competing for
CPU and causing timeouts even at 120s. Sequential execution eliminates
contention — tests that timed out at 30s now complete in 1-15s.

Made-with: Cursor
2026-04-11 01:21:22 +00:00
Tobias Lütke
525b9970cd Merge pull request #546 from junmo-kim/fix/handelize-preserve-case
fix: preserve original case in handelize()
2026-04-10 20:48:24 -04:00
Tobias Lütke
3295294be3 Merge pull request #532 from kuishou68/fix-qmd-uri-index-query
fix: include custom index in qmd:// links
2026-04-10 20:47:55 -04:00
Tobias Lütke
46c4dfdaac Merge pull request #545 from kuishou68/fix-sqlite-vec-actionable-guidance
fix(store): surface actionable sqlite-vec guidance
2026-04-10 20:47:16 -04:00
Bek
e4990e470e Harden embedding overflow handling 2026-04-10 16:02:46 -04:00
Kim Junmo
bb5becaf81 Merge remote-tracking branch 'origin/main' into fix/handelize-preserve-case
# Conflicts:
#	CHANGELOG.md
2026-04-09 18:27:16 +09:00
kuishou68
0adbdeb337 fix(store): surface actionable sqlite-vec guidance 2026-04-09 10:13:40 +08:00
Tobias Lütke
171e9e3e65 Merge pull request #530 from kuishou68/fix-status-no-build-probe 2026-04-08 21:19:56 -04:00
Kim Junmo
fee576bf98 fix: migrate legacy lowercase paths on reindex
When qmd update runs against an index created before case-preservation,
documents may exist under lowercase paths (e.g. "skill.md" for a file
actually named "SKILL.md"). Add findOrMigrateLegacyDocument() that:

- Falls back to a lowercase lookup when the canonical path is not found
- Renames the document path in-place via UPDATE OR IGNORE
- Manually rebuilds the FTS entry (FTS5 INSERT OR REPLACE does not
  reliably update existing rows via triggers)
- Handles UNIQUE conflicts gracefully (returns null on conflict)

Embeddings are keyed by content hash, so the rename preserves all
existing vectors — no re-embedding required.

Both the CLI indexer and the library reindexer share the same helper,
eliminating the duplication that a previous review flagged.

Includes integration tests for: successful migration, already-lowercase
no-op, and UNIQUE conflict handling.
2026-04-09 08:25:00 +09:00
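The lookup-then-migrate flow above can be sketched against an in-memory map standing in for the documents table. This is a hypothetical illustration: the real helper runs `UPDATE OR IGNORE` and rebuilds the FTS row, and its UNIQUE-conflict branch cannot arise in a plain `Map`, so that branch is not modeled here.

```typescript
// Hypothetical in-memory stand-in for the documents table; the real helper issues
// SQL (UPDATE OR IGNORE) and rebuilds the FTS row, which this sketch does not model.
interface Doc {
  path: string;
  hash: string; // embeddings are keyed by content hash, so a rename keeps vectors valid
}

function findOrMigrateLegacy(docs: Map<string, Doc>, canonicalPath: string): Doc | null {
  // 1. Canonical (case-preserved) path already indexed: nothing to migrate.
  const existing = docs.get(canonicalPath);
  if (existing) return existing;

  // 2. Fall back to the legacy lowercase path written by older indexes.
  const legacyPath = canonicalPath.toLowerCase();
  if (legacyPath === canonicalPath) return null; // already lowercase: no-op
  const legacy = docs.get(legacyPath);
  if (!legacy) return null;

  // 3. Rename in place; the hash (and therefore every stored vector) is untouched.
  docs.delete(legacyPath);
  const migrated: Doc = { ...legacy, path: canonicalPath };
  docs.set(canonicalPath, migrated);
  return migrated;
}
```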
Kim Junmo
9fb9de4fd2 fix: preserve original case in handelize()
The blanket .toLowerCase() in handelize() drops filename casing,
which breaks path resolution on case-sensitive filesystems (Linux).
Files like README.md, CHANGELOG.md, and SKILL.md become unreachable
when the index stores them as readme.md, changelog.md, skill.md.

Since FTS5 already performs case-insensitive matching via the
unicode61 tokenizer, lowercasing the stored path provides no search
benefit — it only corrupts the metadata used to locate files on disk.

Remove .toLowerCase() and update all affected test expectations.
2026-04-09 07:59:22 +09:00
Jeff Gardner
1ecb5c9f96 Fix QMD_LLAMA_GPU backend override handling 2026-04-07 18:49:22 +02:00
cocoon
8404cc3bb1 fix(uri): include index in custom qmd links 2026-04-07 23:26:19 +08:00
cocoon
26e3d0c077 fix(status): avoid build attempts during device probe 2026-04-07 23:18:58 +08:00
Tobi Lutke
66e70c028e fix(test): reset _productionMode in getDefaultDbPath test
Bun runs all test files in a single process, so module-level state
leaks between files. The getDefaultDbPath test now resets the
_productionMode flag before asserting it throws, fixing the flaky
failure on Bun (ubuntu-latest) in CI.
2026-04-05 18:39:51 -04:00
Tobi Lutke
32e504c883 fix(test): remove duplicate path/handelize tests from store.test.ts
These tests are already in store.helpers.unit.test.ts. The duplicates
in store.test.ts failed in CI because _productionMode module state
leaked from earlier tests in the same bun process, causing
getDefaultDbPath to return a path instead of throwing.
2026-04-05 18:31:17 -04:00
JohnRichardEnders
50ce17bbfa feat(llm): resolve models as config > env > default
Separate hardcoded default from env var in DEFAULT_EMBED_MODEL so the
constructor can resolve: config param > env var > hardcoded default.
Also add env var support for QMD_GENERATE_MODEL and QMD_RERANK_MODEL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:00:08 -04:00
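The "config > env > default" order reads naturally as a three-step fallback. In this sketch QMD_GENERATE_MODEL and QMD_RERANK_MODEL come from the commit, while QMD_EMBED_MODEL, the placeholder default names, and `resolveModel` itself are assumptions for illustration.

```typescript
// Hypothetical sketch of "config > env > default". QMD_GENERATE_MODEL and
// QMD_RERANK_MODEL come from the commit; QMD_EMBED_MODEL and the default
// model names are placeholders.
type ModelRole = "embed" | "generate" | "rerank";
type Env = Record<string, string | undefined>;

const ENV_VAR: Record<ModelRole, string> = {
  embed: "QMD_EMBED_MODEL", // assumed name
  generate: "QMD_GENERATE_MODEL",
  rerank: "QMD_RERANK_MODEL",
};

const HARDCODED_DEFAULT: Record<ModelRole, string> = {
  embed: "placeholder-embed-model",
  generate: "placeholder-generate-model",
  rerank: "placeholder-rerank-model",
};

function resolveModel(role: ModelRole, configValue: string | undefined, env: Env): string {
  if (configValue) return configValue; // 1. explicit config / constructor argument
  const fromEnv = env[ENV_VAR[role]];
  if (fromEnv) return fromEnv; // 2. environment variable
  return HARDCODED_DEFAULT[role]; // 3. hardcoded default, kept separate from the env lookup
}
```

Keeping the hardcoded default in its own table, rather than baking the env lookup into the constant, is what lets a config value override an env var that is already set.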
dan mackinlay
1bada2eba6 Add explicit TTY link output tests 2026-04-05 17:58:09 -04:00
dan mackinlay
06f5642252 Fix stale ls test expectation 2026-04-05 17:56:26 -04:00
dan mackinlay
636631225e Add clickable OSC8 editor links for CLI search results 2026-04-05 17:56:26 -04:00
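OSC 8 hyperlinks wrap visible text in `ESC ] 8 ; ; URI ST ... ESC ] 8 ; ; ST`, where ST is `ESC \`. A minimal sketch, assuming a `file://` URI and plain-text output when stdout is not a TTY; `formatResultPath` is a hypothetical helper, and the URI scheme qmd actually uses for editor links is not specified in the commit.

```typescript
import { pathToFileURL } from "node:url";

const ESC = "\u001b";
const ST = ESC + "\\"; // OSC string terminator: ESC followed by a backslash

// Wrap `text` in an OSC 8 hyperlink pointing at `uri`.
function osc8(uri: string, text: string): string {
  return `${ESC}]8;;${uri}${ST}${text}${ESC}]8;;${ST}`;
}

// Hypothetical helper: emit a clickable link only on a terminal, so piped
// output stays plain text.
function formatResultPath(absPath: string, isTTY: boolean): string {
  return isTTY ? osc8(pathToFileURL(absPath).href, absPath) : absPath;
}

console.log(formatResultPath("/tmp/notes/readme.md", Boolean(process.stdout.isTTY)));
```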
James Risberg
33fae1c4f5 chore: migrate AST chunking tests to vitest
Replace standalone test-ast-chunking.mjs (823 lines, custom check()
harness, invisible to CI) with proper vitest integration tests.

All unique assertions preserved; duplicates already in ast.test.ts
dropped. Performance benchmarks and real-collection scanner removed
(dev tools, not regression tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 17:19:59 -04:00
John R Milinovich
b7a5a86a9b feat(cli): add qmd bench command for search quality benchmarks
Adds a benchmark harness that measures search quality across backends.
Given a fixture file with queries and expected results, it runs each
query through BM25, vector, hybrid (no rerank), and full pipeline,
then reports precision@k, recall, MRR, F1, and latency.

This is primarily a regression testing tool — users create fixtures
for their own vaults to catch quality regressions after config or
index changes. Ships with an example fixture against the eval-docs
test collection to demonstrate the format.

New files:
  src/bench/bench.ts       — main runner
  src/bench/score.ts       — precision, recall, MRR, F1, path matching
  src/bench/types.ts       — fixture and result types
  src/bench/fixtures/      — example fixture
  test/bench-score.test.ts — unit tests for scoring (16 tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 17:17:59 -04:00
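The metrics the bench reports have standard definitions, sketched below for a single ranked result list. Exact string equality stands in for the path matching in `src/bench/score.ts`, which is not reproduced; all function names are hypothetical.

```typescript
// Hypothetical scoring helpers; exact string equality stands in for the real
// path matching in src/bench/score.ts, which is not reproduced here.
function precisionAtK(results: string[], expected: Set<string>, k: number): number {
  if (k <= 0) return 0;
  const hits = results.slice(0, k).filter((r) => expected.has(r)).length;
  return hits / k;
}

function recallAtK(results: string[], expected: Set<string>, k: number): number {
  if (expected.size === 0) return 0;
  // Count distinct relevant paths so duplicate results cannot push recall above 1.
  const found = new Set(results.slice(0, k).filter((r) => expected.has(r)));
  return found.size / expected.size;
}

function f1(precision: number, recall: number): number {
  return precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
}

// Reciprocal rank of the first relevant hit; averaging it over queries gives MRR.
function reciprocalRank(results: string[], expected: Set<string>): number {
  const idx = results.findIndex((r) => expected.has(r));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

function meanReciprocalRank(runs: Array<{ results: string[]; expected: Set<string> }>): number {
  if (runs.length === 0) return 0;
  return runs.reduce((sum, run) => sum + reciprocalRank(run.results, run.expected), 0) / runs.length;
}

const results = ["a.md", "b.md", "c.md"];
const expected = new Set(["b.md", "d.md"]);
const p = precisionAtK(results, expected, 3); // 1 relevant hit in the top 3
const r = recallAtK(results, expected, 3); // 1 of 2 expected documents found
console.log(p, r, f1(p, r), reciprocalRank(results, expected));
```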
Tobias Lütke
76a2f0fb31 Merge pull request #506 from danmackinlay/fix-505-json-line-output
feat: Include line in --json search output

# Conflicts:
#	CHANGELOG.md
2026-04-05 17:16:05 -04:00
Tobias Lütke
9c9de94bd8 fix(handelize): restore lowercase + convert dots to dashes
- Restore .toLowerCase() in handelize (was dropped, both test files
  expected it inconsistently)
- Convert dots to dashes in filename body (e.g. v2.0 -> v2-0), keeping
  only the extension dot. Tobi confirmed this is the intended behavior.
- Align both test/store.test.ts and test/store.helpers.unit.test.ts to
  match (they had diverged, one expected case-preserved, one lowercase)
- Adjust 'ensureVecTable recreates' test to expect throw behavior
  (matches #501 dimension-mismatch fix)
2026-04-05 17:12:53 -04:00
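The two rules restored here (lowercase, and dots in the filename body become dashes while the extension dot stays) can be sketched as follows. `handelizeSketch` is hypothetical and covers only these two rules; the real `handelize()` may normalize other characters as well.

```typescript
// Hypothetical sketch covering only the two rules named in this commit; the
// real handelize() may normalize other characters too.
function handelizeSketch(filePath: string): string {
  const lower = filePath.toLowerCase();
  const slash = lower.lastIndexOf("/");
  const dir = lower.slice(0, slash + 1); // "" when there is no directory part
  const base = lower.slice(slash + 1);
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return lower; // no extension (or a dotfile): leave the basename alone
  // Keep the final (extension) dot; every earlier dot in the basename becomes a dash.
  return dir + base.slice(0, dot).replace(/\./g, "-") + base.slice(dot);
}
```

Directory names are left alone, matching the commit's wording that only the filename body is affected.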
Surma
2de225c9e7 Test nix flake builds in CI (#487)
* Test nix flake builds in CI

* Update outdated bun.lock file

* fix: restore toLowerCase() in handelize and update tests

* Fix flake to use proper FODs

---------

Co-authored-by: Tobias Lütke <tobi@shopify.com>
2026-04-05 16:59:27 -04:00
Tobias Lütke
828823d20a fix: restore toLowerCase() in handelize + align tests with post-#501 behavior
- Restore .toLowerCase() in handelize (was dropped somewhere, tests expect it)
- Update dimension-mismatch test to expect throw instead of silent rebuild
  (matches new behavior from #501)
- Fix one stale test expectation for preserved dots in filenames
2026-04-05 16:56:06 -04:00
Antonio Mello
ef062e1b54 fix(multi-get): support brace expansion patterns in glob matching (#424)
Brace expansion patterns like `{doc1,doc2}.md` or `collection/{a,b}.md`
were incorrectly parsed as comma-separated file lists instead of being
passed to the glob matcher (picomatch). This happened because the
comma-detection heuristic only checked for `*` and `?` but not `{`.

Also adds `collection/path` matching in `matchFilesByGlob` so patterns
like `my-collection/{file1,file2}.md` work — previously the glob only
matched against `qmd://collection/path` (virtual) and `path` (relative
to collection root), missing the `collection/path` form.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 16:45:33 -04:00
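A hypothetical reconstruction of the heuristic described above: an argument containing `*`, `?` or `{` is handed to the glob matcher, and only otherwise is a comma treated as a list separator. `matchCandidates` shows the three path forms a pattern is tested against after the fix. The glob matching itself (picomatch) is left out, and all names here are illustrative.

```typescript
// Hypothetical reconstruction: `{` now counts as a glob character, so
// "{doc1,doc2}.md" is no longer split on its commas.
function looksLikeGlob(pattern: string): boolean {
  return /[*?{]/.test(pattern);
}

type MultiGetArg = { kind: "glob"; pattern: string } | { kind: "list"; files: string[] };

function parseMultiGetArg(arg: string): MultiGetArg {
  if (looksLikeGlob(arg)) return { kind: "glob", pattern: arg };
  const files = arg.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  return { kind: "list", files };
}

// Strings a document can be matched against; the fix adds the middle
// "collection/path" form alongside the virtual URI and the relative path.
function matchCandidates(collection: string, relPath: string): string[] {
  return [`qmd://${collection}/${relPath}`, `${collection}/${relPath}`, relPath];
}
```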
LJY
698b44fe87 Fix qmd embed model selection (#494) 2026-04-05 16:45:04 -04:00
Matt Van Horn
1ad3388132 fix(store): preserve underscores in BM25 search terms (#404)
sanitizeFTS5Term stripped all non-letter/non-number characters including
underscores, causing snake_case identifiers like `my_variable` to become
`myvariable` and silently fail BM25 matches.

Add underscore to the preserved character set in the Unicode regex.
Export the function and add unit tests covering snake_case, contractions,
punctuation stripping, and unicode.

Fixes #305

Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 16:44:14 -04:00
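The effect of adding underscore to the preserved character set can be shown with two Unicode-aware regexes. This is a sketch of the described change, not the literal regex in `sanitizeFTS5Term`, which may treat other characters (such as apostrophes) differently.

```typescript
// Before the fix (as described): only Unicode letters and numbers survived.
const sanitizeBefore = (term: string): string => term.replace(/[^\p{L}\p{N}]/gu, "");

// After: underscore joins the preserved set, so snake_case identifiers stay intact.
const sanitizeAfter = (term: string): string => term.replace(/[^\p{L}\p{N}_]/gu, "");

console.log(sanitizeBefore("my_variable"), sanitizeAfter("my_variable"));
```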
dan mackinlay
c22d00829b Add line to JSON search output 2026-04-05 10:08:57 +00:00
Tobias Lütke
1fb2e2819e Merge origin/main into feat/ast-aware-chunking
Resolve conflicts: combine AST chunking args (filepath, chunkStrategy)
with abort signal parameter from #458.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 20:00:49 -04:00
Tobias Lütke
dd27f499c7 Merge pull request #463 from goldsr09/fix/hyphenated-lex-queries
Fix hyphenated tokens in FTS5 lex queries
2026-03-28 19:58:22 -04:00
Tobias Lütke
08566ec316 Merge pull request #462 from goldsr09/fix/bm25-field-weights
Fix BM25 field weights to include all 3 FTS columns
2026-03-28 19:56:04 -04:00
Tobias Lütke
8d343b9da1 Update handelize tests for case/dot preservation (#475)
PR #475 changed handelize() to preserve original case and dots,
but the tests still expected lowercase output. Update assertions
to match the new behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 19:54:18 -04:00
Ryan
7b9bd01226 fix: handle hyphenated tokens in FTS5 lex queries
Hyphenated terms like multi-agent, DEC-0054, gpt-4 were being stripped
of hyphens and concatenated (e.g., "multiagent") which missed matches.
Now they're split into FTS5 phrase queries ("multi agent") so the porter
tokenizer matches them correctly.
2026-03-24 20:13:52 -04:00
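A minimal sketch of the hyphen handling described above: each hyphenated token becomes a quoted FTS5 phrase of its parts. `hyphenToPhrase` is a hypothetical name. Embedded double quotes are doubled, which is how FTS5 escapes them inside a quoted string.

```typescript
// Hypothetical sketch: "multi-agent" becomes the FTS5 phrase "multi agent"
// instead of the concatenated bareword multiagent.
function hyphenToPhrase(token: string): string {
  const parts = token.split("-").filter((p) => p.length > 0);
  if (parts.length < 2) return token; // nothing to split
  // Double any embedded quotes per FTS5 string rules, then quote the phrase.
  return `"${parts.join(" ").replace(/"/g, '""')}"`;
}

console.log(["multi-agent", "DEC-0054", "gpt-4"].map(hyphenToPhrase).join(" "));
```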
Ryan
fa214db367 fix: correct BM25 field weights to include all 3 FTS columns
The bm25() call only had 2 weights for 3 columns (filepath, title, body),
giving body an implicit weight of 0. Add proper weights: filepath=1.5,
title=4.0, body=1.0 so title matches are boosted and body content is scored.
2026-03-24 20:12:45 -04:00
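One way to keep the weight list from drifting out of sync with the FTS columns is to derive the `bm25()` call from a per-column map. This is a sketch under stated assumptions: the table name `documents_fts` and the helper are hypothetical, and only the column names and weights come from the commit. FTS5's `bm25()` takes its weights positionally, in column declaration order.

```typescript
// Hypothetical sketch: derive the bm25() argument list from a column -> weight
// map so every FTS column gets an explicit weight. Table name is assumed.
const FTS_COLUMNS = ["filepath", "title", "body"] as const;
type FtsColumn = (typeof FTS_COLUMNS)[number];

const BM25_WEIGHTS: Record<FtsColumn, number> = { filepath: 1.5, title: 4.0, body: 1.0 };

function bm25Expr(table: string, weights: Record<FtsColumn, number>): string {
  // bm25() weights are positional, one per column in declaration order.
  const args = FTS_COLUMNS.map((column) => String(weights[column]));
  return `bm25(${table}, ${args.join(", ")})`;
}

console.log(bm25Expr("documents_fts", BM25_WEIGHTS));
```

Because `Record<FtsColumn, number>` requires every key, leaving out a column's weight becomes a type error rather than a silent scoring change.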
James Risberg
244ddf5ecb feat: AST-aware chunking for code files via tree-sitter
Add opt-in AST-aware chunk boundary detection for code files using
web-tree-sitter. When enabled with `--chunk-strategy auto`, code files
(.ts, .tsx, .js, .jsx, .py, .go, .rs) are chunked at function, class,
and import boundaries instead of arbitrary text positions. Default
behavior (`regex`) is unchanged — no surprises on upgrade.

In testing on QMD's own codebase, AST mode split 42% fewer function
bodies across chunk boundaries compared to regex-only chunking.

Usage:
  qmd embed --chunk-strategy auto
  qmd query "search terms" --chunk-strategy auto

What's included:
- Language detection from file extension with support for TypeScript,
  JavaScript (including arrow functions and function expressions),
  Python, Go, and Rust
- Per-language tree-sitter queries with scored break points aligned to
  the existing markdown scale (class=100, function=90, type=80, import=60)
- AST break points merged with regex break points — highest score wins
  at each position, so embedded markdown (comments, docstrings) still
  benefits from regex patterns
- Refactored chunking core: chunkDocumentWithBreakPoints() extracted,
  mergeBreakPoints() added, async chunkDocumentAsync() wrapper for AST
- ChunkStrategy type ("auto" | "regex") threaded through
  generateEmbeddings(), hybridQuery(), structuredSearch(), CLI, and SDK
- getASTStatus() health check wired into `qmd status`
- Parse failures log a warning and fall back to regex — never crash

Hardening:
- Grammar packages are optionalDependencies with pinned versions to
  prevent ABI breaks from semver drift
- web-tree-sitter is a direct dependency (pinned)
- Errors are logged (not silently swallowed) for debuggability
- Tested on both Node.js and Bun (Bun is actually faster)

Testing:
- 26 unit tests (test/ast.test.ts) — all 4 languages, error handling
- 7 integration tests (test/store.test.ts) — merge, equivalence, bypass
- Standalone test-ast-chunking.mjs with 63 synthetic tests and a
  real-collection performance scanner (npx tsx test-ast-chunking.mjs ~/code)
- Validated end-to-end with qmd embed + qmd query on QMD's own codebase
- Zero markdown regressions across all test paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 01:22:39 -04:00
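The "highest score wins at each position" rule for merging AST and regex break points can be sketched as a per-position max. The `BreakPoint` shape and `mergeBreakPointsSketch` are hypothetical; the real `mergeBreakPoints()` signature may differ.

```typescript
// Hypothetical shape: the real BreakPoint type and mergeBreakPoints() signature
// in qmd may differ.
interface BreakPoint {
  pos: number; // character offset where a chunk may end
  score: number; // higher = better boundary (e.g. class=100, function=90)
}

function mergeBreakPointsSketch(regex: BreakPoint[], ast: BreakPoint[]): BreakPoint[] {
  const best = new Map<number, number>();
  for (const bp of [...regex, ...ast]) {
    const prev = best.get(bp.pos);
    if (prev === undefined || bp.score > prev) best.set(bp.pos, bp.score); // keep the max per position
  }
  // Return the merged points ordered by position for the chunker to scan.
  return Array.from(best, ([pos, score]) => ({ pos, score })).sort((a, b) => a.pos - b.pos);
}
```

Positions found only by the regex pass (for example markdown headings inside doc comments) survive the merge unchanged, which is why embedded markdown keeps its break points.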
Tobias Lütke
5f6821629b Merge pull request #385 from rymalia/fix/launcher-lockfile-priority
fix: prioritize package-lock.json in launcher to prevent Bun false positive
2026-03-14 08:08:03 -04:00
Tobias Lütke
5b48bcb6c1 Merge pull request #389 from sonwr/fix-issue-380-cleanup-no-sqlite-vec
fix: skip cleanup when sqlite-vec is unavailable
2026-03-14 08:07:11 -04:00
programcaicai
809aa36172 fix: bound memory usage during embed 2026-03-13 17:39:17 +08:00
sonwr
7df09e8235 fix: skip vector cleanup when sqlite-vec is unavailable 2026-03-12 13:51:20 +00:00
Ryan Malia
28903d8eba fix: prioritize package-lock.json in launcher to prevent Bun false positive
The bin/qmd wrapper checks for bun.lock to select the runtime, but since
bun.lock is committed to the repo, source builds using npm install are
incorrectly routed to Bun — causing native module ABI mismatches (#381)
and sqlite-vec crashes (#380).

Add package-lock.json as a higher-priority signal: if it exists, npm
installed the dependencies and Node should be used. Also fix
cleanupOrphanedVectors() to use the existing isSqliteVecAvailable()
guard instead of checking sqlite_master, which can report the virtual
table even when the vec0 module isn't loaded.

Fixes #381, fixes #380
Continuation of #362 (runtime detection false positives)
2026-03-12 01:46:38 -07:00
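The launcher's new decision order can be illustrated in TypeScript. The real `bin/qmd` wrapper is not reproduced here, so this shows only the priority of the checks. `pickRuntime` is hypothetical, and falling back to Node when neither lockfile exists is an assumption. The `exists` parameter is injectable so the logic can be exercised without touching disk.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

type Runtime = "node" | "bun";

// Hypothetical illustration of the check order only; not the actual bin/qmd code.
function pickRuntime(pkgDir: string, exists: (path: string) => boolean = existsSync): Runtime {
  // npm wrote package-lock.json, so native modules were built for Node's ABI.
  if (exists(join(pkgDir, "package-lock.json"))) return "node";
  // bun.lock is committed to the repo, so it is only consulted second.
  if (exists(join(pkgDir, "bun.lock"))) return "bun";
  return "node"; // assumption: default to Node when neither lockfile is present
}
```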
nkkko
b16d77146a feat(skill): install packaged qmd skill 2026-03-10 23:18:15 +01:00
Tobi Lutke
55f16460d0 fix(ci): guard LLM calls in CI and increase test timeouts
Add _ciMode flag to LlamaCpp that throws immediately on embedBatch,
generate, expandQuery, and rerank when CI=true — prevents silent 30s
timeouts. Skip MCP HTTP Transport tests in CI (they instantiate a real
LlamaCpp). Bump vitest/bun test timeouts to 60s for slower CI runners.
2026-03-10 13:28:37 -04:00
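The fail-fast guard described above follows a simple pattern: a flag derived from `CI=true` is checked at the top of every model call. This sketch uses a hypothetical `CiGuardedModel` class and shows only `embedBatch`. The commit applies the same check to `generate`, `expandQuery`, and `rerank` on LlamaCpp, whose real implementation is not shown here.

```typescript
// Hypothetical sketch of the guard pattern; the class name and method body
// are illustrative, not LlamaCpp's actual implementation.
type Env = Record<string, string | undefined>;

class CiGuardedModel {
  private readonly ciMode: boolean;

  constructor(env: Env) {
    this.ciMode = env.CI === "true";
  }

  // Throw before any model load so CI fails fast instead of hitting a 30s timeout.
  private assertNotCi(operation: string): void {
    if (this.ciMode) throw new Error(`${operation} is disabled when CI=true`);
  }

  async embedBatch(texts: string[]): Promise<number[][]> {
    this.assertNotCi("embedBatch");
    return texts.map(() => []); // placeholder: a real implementation would run the model here
  }
}
```

In practice this would be constructed with `process.env`; passing the environment in explicitly keeps the sketch testable.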