469 Commits

Author SHA1 Message Date
Antonio
840a614223 fix: respect XDG_CACHE_HOME for model cache directory
MODEL_CACHE_DIR was hardcoded to ~/.cache/qmd/models/, ignoring the
XDG_CACHE_HOME environment variable. This was inconsistent with the rest
of the codebase (store.ts, cli/qmd.ts), which already respects XDG paths.

Fixes #425

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 11:08:21 -03:00
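The fix in 840a614223 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `resolveModelCacheDir` is a hypothetical helper name, and treating an empty `XDG_CACHE_HOME` as unset is an assumption.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical helper (not taken from the commit): returns
// $XDG_CACHE_HOME/qmd/models when the variable is set and non-empty,
// otherwise falls back to ~/.cache/qmd/models.
function resolveModelCacheDir(env: Env = process.env): string {
  const xdg = env.XDG_CACHE_HOME;
  const base = xdg !== undefined && xdg !== "" ? xdg : join(homedir(), ".cache");
  return join(base, "qmd", "models");
}
```

Passing the environment as a parameter keeps the lookup easy to test without mutating `process.env`.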
Mike Bannister
bc80e72a06 chore: update bun.lock after dependency install 2026-03-23 11:49:25 -04:00
Mike Bannister
939d15652c fix: use CTE in searchFTS to prevent query planner regression with collection filter
When searchFTS combines FTS5 MATCH with a collection filter (d.collection = ?)
in the same WHERE clause, SQLite's query planner abandons the FTS5 index and
falls back to a full scan. This turns an 8ms query into a 17+ second query on
large collections (16K+ documents).

The fix wraps the FTS5 query in a CTE so it runs first with proper index usage,
then filters by collection on the materialized results.

Benchmarks on a 16,258-document collection:
  Before: qmd search "knowctl" -c <collection> → 19.8s
  After:  qmd search "knowctl" -c <collection> → 0.4s

The CTE fetches limit*10 candidates from the FTS index to ensure enough results
survive collection filtering. Without a collection filter, the query plan was
already optimal, so no CTE overhead is added in that case.
2026-03-23 11:35:22 -04:00
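The query shape described in 939d15652c might look something like the sketch below. The table and column names (`documents_fts`, `documents`, `id`, `path`, `collection`) and the function name `buildSearchFtsQuery` are assumptions made for illustration; only the CTE-then-filter structure and the `limit*10` candidate count come from the commit message.

```typescript
interface FtsQuery {
  sql: string;
  params: (string | number)[];
}

// Hypothetical query builder; schema names are illustrative only.
function buildSearchFtsQuery(match: string, limit: number, collection?: string): FtsQuery {
  if (collection === undefined) {
    // No collection filter: a plain FTS5 query, no CTE added.
    return {
      sql: `SELECT d.id, d.path, bm25(documents_fts) AS score
FROM documents_fts
JOIN documents d ON d.id = documents_fts.rowid
WHERE documents_fts MATCH ?
ORDER BY score
LIMIT ?`,
      params: [match, limit],
    };
  }
  // With a collection filter: run the FTS5 MATCH inside a CTE first,
  // over-fetching limit*10 candidates, then filter by collection outside.
  return {
    sql: `WITH fts AS (
  SELECT rowid, bm25(documents_fts) AS score
  FROM documents_fts
  WHERE documents_fts MATCH ?
  ORDER BY score
  LIMIT ?
)
SELECT d.id, d.path, fts.score
FROM fts
JOIN documents d ON d.id = fts.rowid
WHERE d.collection = ?
ORDER BY fts.score
LIMIT ?`,
    params: [match, limit * 10, collection, limit],
  };
}
```

Note that over-fetching by a fixed factor is a heuristic: if fewer than `limit` of the candidates belong to the requested collection, the result set will be short.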
James Risberg
244ddf5ecb feat: AST-aware chunking for code files via tree-sitter
Add opt-in AST-aware chunk boundary detection for code files using
web-tree-sitter. When enabled with `--chunk-strategy auto`, code files
(.ts, .tsx, .js, .jsx, .py, .go, .rs) are chunked at function, class,
and import boundaries instead of arbitrary text positions. Default
behavior (`regex`) is unchanged — no surprises on upgrade.

In testing on QMD's own codebase, AST mode split 42% fewer function
bodies across chunk boundaries compared to regex-only chunking.

Usage:
  qmd embed --chunk-strategy auto
  qmd query "search terms" --chunk-strategy auto

What's included:
- Language detection from file extension with support for TypeScript,
  JavaScript (including arrow functions and function expressions),
  Python, Go, and Rust
- Per-language tree-sitter queries with scored break points aligned to
  the existing markdown scale (class=100, function=90, type=80, import=60)
- AST break points merged with regex break points — highest score wins
  at each position, so embedded markdown (comments, docstrings) still
  benefits from regex patterns
- Refactored chunking core: chunkDocumentWithBreakPoints() extracted,
  mergeBreakPoints() added, async chunkDocumentAsync() wrapper for AST
- ChunkStrategy type ("auto" | "regex") threaded through
  generateEmbeddings(), hybridQuery(), structuredSearch(), CLI, and SDK
- getASTStatus() health check wired into `qmd status`
- Parse failures log a warning and fall back to regex — never crash

Hardening:
- Grammar packages are optionalDependencies with pinned versions to
  prevent ABI breaks from semver drift
- web-tree-sitter is a direct dependency (pinned)
- Errors are logged (not silently swallowed) for debuggability
- Tested on both Node.js and Bun (Bun is actually faster)

Testing:
- 26 unit tests (test/ast.test.ts) — all 4 languages, error handling
- 7 integration tests (test/store.test.ts) — merge, equivalence, bypass
- Standalone test-ast-chunking.mjs with 63 synthetic tests and a
  real-collection performance scanner (npx tsx test-ast-chunking.mjs ~/code)
- Validated end-to-end with qmd embed + qmd query on QMD's own codebase
- Zero markdown regressions across all test paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 01:22:39 -04:00
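The "highest score wins at each position" merge from 244ddf5ecb can be illustrated with a short sketch. The commit names a `mergeBreakPoints()` function, but the `BreakPoint` shape and the signature below are assumptions, not the actual implementation.

```typescript
// Assumed shape: a character offset in the document and a break score.
interface BreakPoint {
  pos: number;
  score: number;
}

// Sketch of a merge where AST and regex break points are combined and,
// when both propose the same position, the higher score is kept.
function mergeBreakPoints(ast: BreakPoint[], regex: BreakPoint[]): BreakPoint[] {
  const best = new Map<number, BreakPoint>();
  for (const bp of [...ast, ...regex]) {
    const current = best.get(bp.pos);
    if (current === undefined || bp.score > current.score) {
      best.set(bp.pos, bp);
    }
  }
  // Return break points in document order.
  return [...best.values()].sort((a, b) => a.pos - b.pos);
}
```

For example, an AST function boundary scored 90 at offset 10 would override a regex break scored 20 at the same offset, while a regex-only break at offset 5 is kept unchanged.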
Jarvis
783359f55c
fix: increase RERANK_CONTEXT_SIZE default 2048→4096, make configurable via QMD_RERANK_CONTEXT_SIZE env var, fix RERANK_TEMPLATE_OVERHEAD underestimate 200→512
Default 2048 was too small for longer documents (session transcripts, CJK
text, large markdown files). After truncation the Qwen3 reranker template
adds more overhead than the original 200-token estimate, causing node-llama-cpp
to throw 'input lengths exceed context size'.

Fixes: tobi/qmd#91 tobi/qmd#290 tobi/qmd#291 tobi/qmd#314
2026-03-21 20:59:11 -07:00
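A sketch of how the configurable context size from 783359f55c could be read is shown below. The default of 4096, the overhead of 512, and the `QMD_RERANK_CONTEXT_SIZE` variable come from the commit message; the helper names and the way the document budget is computed (context size minus template overhead minus query tokens) are assumptions.

```typescript
const DEFAULT_RERANK_CONTEXT_SIZE = 4096;
const RERANK_TEMPLATE_OVERHEAD = 512;

// Hypothetical helper: read the context size from the environment,
// falling back to the default when the value is missing or invalid.
function rerankContextSize(env: Record<string, string | undefined> = process.env): number {
  const raw = env.QMD_RERANK_CONTEXT_SIZE;
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_RERANK_CONTEXT_SIZE;
}

// Assumed budget formula: tokens left for the document after reserving
// room for the reranker template and the query itself.
function maxDocumentTokens(contextSize: number, queryTokens: number): number {
  return Math.max(0, contextSize - RERANK_TEMPLATE_OVERHEAD - queryTokens);
}
```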
Tobias Lütke
2b8f329d7e
Merge pull request #370 from mvanhorn/osc/231-no-rerank-cli-flag
feat(cli): add --no-rerank flag to skip reranking in qmd query
2026-03-14 08:10:38 -04:00
Tobias Lütke
4721e07975
Merge pull request #371 from oysteinkrog/fix/wsl-drvfs-path-detection
Fix QMD when running under WSL (Windows Subsystem for Linux)
2026-03-14 08:10:25 -04:00
Tobias Lütke
95dc295433
Merge pull request #377 from serhii12/fix/sqlite-vec-macos-bun-error-handling
fix(db): add macOS Homebrew SQLite support for Bun and restore actionable errors
2026-03-14 08:08:50 -04:00
Tobias Lütke
43660c468e
Merge pull request #382 from rymalia/fix/zod-version-pin
fix: pin zod to exact 4.2.1 to prevent tsc build failure
2026-03-14 08:08:20 -04:00
Tobias Lütke
5f6821629b
Merge pull request #385 from rymalia/fix/launcher-lockfile-priority
fix: prioritize package-lock.json in launcher to prevent Bun false positive
2026-03-14 08:08:03 -04:00
Tobias Lütke
f35b4e19e0
Merge pull request #393 from lskun/fix/embed-context-overflow
fix: truncate oversized text before embedding to prevent GGML crash
2026-03-14 08:07:47 -04:00
Tobias Lütke
a13a84fb28
Merge pull request #396 from Mic92/qmd-fix
sync stale bun.lock, guard against future lockfile drift
2026-03-14 08:07:25 -04:00
Tobias Lütke
5b48bcb6c1
Merge pull request #389 from sonwr/fix-issue-380-cleanup-no-sqlite-vec
fix: skip cleanup when sqlite-vec is unavailable
2026-03-14 08:07:11 -04:00
Tobias Lütke
7ab1497ebb
Merge pull request #395 from ProgramCaiCai/fix/embed-batching-memory
Bound qmd embed memory usage with default batched processing
2026-03-14 08:06:51 -04:00
Tobias Lütke
398eadf15b
Merge pull request #399 from shreyaskarnik/feat/onnx-conversion
Add ONNX conversion script for Transformers.js deployment
2026-03-14 08:05:10 -04:00
Shreyas Karnik
df8d625c00
fix: map quantize_type to valid Transformers.js dtype values
--quantize none now emits dtype: "fp32" in the README instead of
dtype: "none", matching Transformers.js documented values (fp32,
fp16, q8, q4).
2026-03-13 12:57:19 -07:00
Shreyas Karnik
b05d8863ca
fix: quantization paths, missing imports, and hardcoded metadata
- Add missing subprocess import (NameError on any quantize path)
- Replace broken optimum-cli quantize calls with direct onnxruntime:
  Q4 uses MatMulNBitsQuantizer, Q8 uses quantize_dynamic
- Add onnxconverter-common to deps for FP16 (the missing-dependency error was silently swallowed)
- Make FP16 fail loudly on missing dep instead of silently uploading FP32
- README and transformers_js_config now reflect actual quantize_type
  instead of always hardcoding Q4
- Remove dead _convert_fp16_external function
2026-03-13 12:45:48 -07:00
Shreyas Karnik
e1ce37c989
fix: handle 2GB protobuf limit, add validation, fix input feeds
- Use no_post_process=True for ONNX export to avoid protobuf serialize error
- Add --validate and --validate-only flags for inference verification
- Fix position_ids in validation feed (required by Qwen3 ONNX export)
- Use optimum-cli for quantization to handle external data format
- Fix optimum dependency to optimum[onnxruntime]

Tested: export + validation passes on CPU, KV cache present (56 tensors).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 12:30:26 -07:00
Shreyas Karnik
2df95ac9ba
feat: add ONNX conversion script for Transformers.js deployment
Add convert_onnx.py that mirrors convert_gguf.py's structure:
- Loads base Qwen3 model, merges SFT + GRPO adapters
- Exports to ONNX via Optimum (text-generation-with-past task)
- Supports Q4 (MatMulNBits), Q8, FP16, and FP32 output
- Uploads to separate HF repo (e.g. tobil/qmd-query-expansion-1.7B-ONNX)
- Writes Transformers.js compatibility config
- Includes model card with usage example

Usage:
    uv run convert_onnx.py --size 1.7B
    uv run convert_onnx.py --size 1.7B --quantize q4 --no-upload

Also adds `just convert-onnx` and `just convert-gguf` tasks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 11:50:03 -07:00
Jörg Thalheim
8c4b4b335d sync stale bun.lock, guard against future lockfile drift
bun.lock still resolved better-sqlite3 to 11.x after package.json was
bumped to ^12.4.5 in v2.0.0. This breaks sandboxed builds (e.g. Nix
with bun2nix) where network access is unavailable to resolve the
mismatch.

CI and the publish workflow now use --frozen-lockfile so drift is caught
immediately. The release script also validates lockfile consistency
before tagging.

Closes #386
2026-03-13 13:34:17 +01:00
programcaicai
809aa36172 fix: bound memory usage during embed 2026-03-13 17:39:17 +08:00
edy
9718d3767c fix: truncate oversized text before embedding to prevent GGML crash
When a chunk exceeds the embedding model's context window (trainContextSize),
node-llama-cpp's getEmbeddingFor() triggers a native SIGABRT in GGML/Metal,
crashing the entire process.

Fix: Add truncateToContextSize() guard in embed() and embedBatch() that uses
the model's own tokenizer to check token count before calling getEmbeddingFor().
Oversized text is truncated to (trainContextSize - 4) tokens with a warning,
preserving partial embedding coverage instead of crashing.

Fixes #303
2026-03-13 09:35:17 +08:00
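The guard described in 9718d3767c could be sketched as below. The commit names `truncateToContextSize()` and the `trainContextSize - 4` limit; the `Tokenizer` interface, the parameter list, and the choice to truncate anything above that limit (rather than only text above the full context size) are assumptions for illustration.

```typescript
// Assumed minimal tokenizer interface; the real code uses the model's own tokenizer.
interface Tokenizer {
  tokenize(text: string): number[];
  detokenize(tokens: number[]): string;
}

const SAFETY_MARGIN = 4;

// Sketch: count tokens before embedding and truncate oversized input
// to (trainContextSize - 4) tokens, warning instead of crashing.
function truncateToContextSize(text: string, tokenizer: Tokenizer, trainContextSize: number): string {
  const maxTokens = Math.max(0, trainContextSize - SAFETY_MARGIN);
  const tokens = tokenizer.tokenize(text);
  if (tokens.length <= maxTokens) {
    return text; // fits, leave untouched
  }
  console.warn(`Truncating input from ${tokens.length} to ${maxTokens} tokens`);
  return tokenizer.detokenize(tokens.slice(0, maxTokens));
}
```

Counting with the model's tokenizer rather than by characters matters because token counts vary widely across languages and content types.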
sonwr
7df09e8235 fix: skip vector cleanup when sqlite-vec is unavailable 2026-03-12 13:51:20 +00:00
Ryan Malia
28903d8eba fix: prioritize package-lock.json in launcher to prevent Bun false positive
The bin/qmd wrapper checks for bun.lock to select the runtime, but since
bun.lock is committed to the repo, source builds using npm install are
incorrectly routed to Bun — causing native module ABI mismatches (#381)
and sqlite-vec crashes (#380).

Add package-lock.json as a higher-priority signal: if it exists, npm
installed the dependencies and Node should be used. Also fix
cleanupOrphanedVectors() to use the existing isSqliteVecAvailable()
guard instead of checking sqlite_master, which can report the virtual
table even when the vec0 module isn't loaded.

Fixes #381, fixes #380
Continuation of #362 (runtime detection false positives)
2026-03-12 01:46:38 -07:00
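The lockfile priority from 28903d8eba can be expressed as a small decision function. The actual launcher is the bin/qmd shell wrapper; this TypeScript version is only a sketch of the ordering, and `pickRuntime`, the `hasFile` callback, and the default of Node when no lockfile is present are assumptions.

```typescript
type Runtime = "node" | "bun";

// Sketch of the launcher's decision order. `hasFile` is a hypothetical
// callback that reports whether a file exists in the install directory.
function pickRuntime(hasFile: (name: string) => boolean): Runtime {
  // package-lock.json means npm installed the dependencies: use Node.
  if (hasFile("package-lock.json")) return "node";
  // Otherwise a Bun lockfile indicates a Bun install.
  if (hasFile("bun.lock") || hasFile("bun.lockb")) return "bun";
  // Assumed default when no lockfile is found.
  return "node";
}
```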
Ryan Malia
e988c5d286 fix: pin zod to exact 4.2.1 to fix tsc build failure (#379)
The caret range ^4.2.1 allows npm to resolve zod 4.3.x, which has
breaking type changes against @modelcontextprotocol/sdk. Source builds
fail with TypeScript errors. Pinning to exact 4.2.1 resolves this.

See: https://github.com/tobi/qmd/issues/379
2026-03-11 21:43:09 -07:00
serhii12
52ebafce11 fix(db): add macOS Homebrew SQLite support for Bun and restore actionable errors
On macOS, bun:sqlite uses Apple's system SQLite which is compiled with
SQLITE_OMIT_LOAD_EXTENSION, preventing sqlite-vec from loading. The v2.0
refactor also silently swallowed extension loading failures, losing the
actionable error messages that existed pre-2.0.

- Call Database.setCustomSQLite() on macOS to use Homebrew's SQLite
- Eagerly validate extension loading at init, not at first query
- Throw with platform-specific fix instructions in loadSqliteVec()
- Log warning in store.ts instead of silently catching

Fixes #363
2026-03-11 12:55:20 -03:00
Øystein Krog
26d4ebfa56 fix: skip Git Bash path detection on WSL
On WSL, paths like /c/work/... are valid drvfs mount points, not Git
Bash drive-letter shortcuts. The existing code in isAbsolutePath() and
resolve() detected /c/ as a Windows C: path, converting drvfs paths to
C:/work/... which broke indexing entirely.

Fix: detect WSL via WSL_DISTRO_NAME or WSL_INTEROP environment variables
and skip the Git Bash /c/ -> C: branch on WSL. Native Linux path handling
continues as before.
2026-03-11 12:25:28 +01:00
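The WSL check from 26d4ebfa56 might be sketched like this. The environment variable names come from the commit message; `isWSL` and `normalizeGitBashPath` are hypothetical helpers standing in for the logic the commit places in isAbsolutePath() and resolve().

```typescript
type Env = Record<string, string | undefined>;

// WSL sets WSL_DISTRO_NAME and/or WSL_INTEROP in the environment.
function isWSL(env: Env = process.env): boolean {
  return Boolean(env.WSL_DISTRO_NAME || env.WSL_INTEROP);
}

// Hypothetical helper: convert a Git Bash drive path (/c/work/...) to
// C:/work/..., but leave it alone on WSL where /c/... is a real mount point.
function normalizeGitBashPath(path: string, env: Env = process.env): string {
  if (isWSL(env)) return path;
  const match = /^\/([a-zA-Z])\/(.*)$/.exec(path);
  return match ? `${match[1].toUpperCase()}:/${match[2]}` : path;
}
```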
Matt Van Horn
11b3f17fba feat(cli): add --no-rerank flag to skip reranking in qmd query
Exposes the existing skipRerank option as a --no-rerank CLI flag for
qmd query. On CPU-only machines, reranking takes 120s+ for 20 chunks;
this flag lets users get RRF-fused results without the reranking penalty.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 23:41:43 -07:00
Tobias Lütke
ae3604cb88
Merge pull request #362 from syedair/fix/launcher-bun-install-false-positive
fix: remove $BUN_INSTALL check from launcher to prevent false Bun detection
2026-03-10 21:38:41 -04:00
Syed Humair
b0a14b18ad fix: remove $BUN_INSTALL check from launcher to prevent false Bun detection
When Bun is installed on the system but QMD was installed via npm,
$BUN_INSTALL is always set (typically to ~/.bun), causing the launcher
to incorrectly run QMD under Bun. This leads to ABI mismatches with
native modules (better-sqlite3, sqlite-vec) that were compiled for Node,
breaking vector operations with "no such module: vec0".

Only check for bun.lock/bun.lockb files, which reliably indicate that
QMD was actually installed with Bun.

Fixes #361

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 05:24:44 +04:00
Tobi Lutke
21a5dcc853
release: v2.0.1 2026-03-10 20:59:27 -04:00
Tobi Lutke
1207fe7776
docs: write changelog for 2.0.1 2026-03-10 20:58:45 -04:00
Tobias Lütke
55c951b15e
Merge pull request #349 from byheaven/fix/qwen3-embedding-model-filename-case
docs: fix Qwen3-Embedding GGUF filename case (404 on download)
2026-03-10 20:08:53 -04:00
Tobias Lütke
22ba426dc0
Merge pull request #352 from nibzard/fix-global-launcher-path
Fix launcher path for global installs
2026-03-10 20:08:04 -04:00
Tobias Lütke
e710e9d2b9
Merge pull request #355 from nibzard/feat-skill-install-clean
Add skill install command
2026-03-10 20:07:47 -04:00
nkkko
b16d77146a feat(skill): install packaged qmd skill 2026-03-10 23:18:15 +01:00
nkkko
9f4c71c783 fix(cli): resolve symlinked global launcher path 2026-03-10 22:38:20 +01:00
Tobi Lutke
55f16460d0
fix(ci): guard LLM calls in CI and increase test timeouts
Add _ciMode flag to LlamaCpp that throws immediately on embedBatch,
generate, expandQuery, and rerank when CI=true — prevents silent 30s
timeouts. Skip MCP HTTP Transport tests in CI (they instantiate a real
LlamaCpp). Bump vitest/bun test timeouts to 60s for slower CI runners.
2026-03-10 13:28:37 -04:00
Tobi Lutke
ed0249fd6b
fix(test): increase timeout for SDK search tests that trigger LLM expansion
These tests load the query expansion model on first call, which
consistently exceeds the 30s timeout on CI runners.
2026-03-10 12:59:46 -04:00
Tobi Lutke
8478ddb666
release: v2.0.0 2026-03-10 11:53:25 -04:00
Tobi Lutke
a444c86382
docs: rewrite SDK section for 2.0, fix MCP tool names, add changelog
- Expand SDK documentation from ~70 lines to comprehensive coverage:
  store creation modes, unified search(), retrieval, collections,
  context, indexing, types, and lifecycle
- Fix MCP tools section: old names (qmd_search, qmd_deep_search)
  replaced with actual registered names (query, get, multi_get, status)
- Write 2.0.0 changelog under [Unreleased]
2026-03-10 11:53:12 -04:00
Tobi Lutke
b252219add
fix(deps): bump better-sqlite3 to ^12.4.5, add runtime-aware bin wrapper
- Bump better-sqlite3 from ^11 to ^12.4.5 for Node 25 support (prebuilds
  + V8 API compat). Closes #257.
- Add bin/qmd shell wrapper that detects bun vs node install and execs
  with the matching runtime, preventing native module ABI mismatches
  when installed via bun. Closes #319.
2026-03-10 11:43:00 -04:00
Tobi Lutke
c68904fe08
refactor: move CLI and MCP to subdirectories, MCP consumes SDK
Move frontends into src/cli/ and src/mcp/ to separate them from the
core library. The MCP server is fully rewritten to import only from
the SDK (src/index.ts) — zero direct store.ts/collections.ts/llm.ts
access.

- src/qmd.ts → src/cli/qmd.ts
- src/formatter.ts → src/cli/formatter.ts
- src/mcp.ts → src/mcp/server.ts (rewritten to use QMDStore SDK)
- New src/maintenance.ts: Maintenance class for CLI housekeeping
- SDK gains: getDocumentBody(), getDefaultCollectionNames(),
  extractSnippet/addLineNumbers/DEFAULT_MULTI_GET_MAX_BYTES exports,
  getDefaultDbPath re-export, InternalStore type export
- package.json bin/scripts updated for new paths
- All 692 tests pass
2026-03-10 11:39:55 -04:00
Tobi Lutke
839d774a06
feat: redesign SDK search API with unified search() and ExpandedQuery type
Replace three separate search methods (query, search, structuredSearch)
with a single search(options) that accepts either a query string
(auto-expanded) or pre-expanded queries. Add searchLex/searchVector
convenience methods and expandQuery for manual control.

Unify StructuredSubSearch and ExpandedQuery into a single ExpandedQuery
type with { type, query } used throughout the pipeline. Add skipRerank
option to hybridQuery and structuredSearch for fast no-LLM searches.

New SDK surface:
- search({ query, intent, rerank, limit, ... })
- search({ queries: expanded })
- searchLex(query, opts)
- searchVector(query, opts)
- expandQuery(query, { intent })
2026-03-10 11:04:45 -04:00
YuBai
740b17b485 docs: fix Qwen3-Embedding GGUF filename case in README and llm.ts
HuggingFace filenames are case-sensitive. The documented filename
'qwen3-embedding-0.6b-q8_0.gguf' (lowercase) returns 404. The correct
filename is 'Qwen3-Embedding-0.6B-Q8_0.gguf' (original case from the
HuggingFace repo).

Co-Authored-By: Oz <oz-agent@warp.dev>
2026-03-10 18:54:36 +08:00
Tobi Lutke
032f26edca
release: v1.1.6 2026-03-09 17:23:14 -04:00
Tobi Lutke
0c83dc1593
docs: write changelog for 1.1.6 2026-03-09 17:22:54 -04:00
Tobi Lutke
040c6fa904
feat: add SDK/library mode for programmatic access
Allow QMD to be used as a library (`import { createStore } from '@tobilu/qmd'`)
in addition to CLI and MCP modes. The constructor requires explicit dbPath and
either a configPath (YAML file) or inline config object — no defaults assumed,
making it safe to embed in any application.

- Add src/index.ts entry point with QMDStore interface exposing search,
  retrieval, collection/context management, and index health
- Add setConfigSource() to collections.ts for inline config support
  (in-memory config with no file I/O)
- Add main/types/exports fields to package.json
- Add SDK documentation section to README
- Add 56 unit tests covering constructor, collections, contexts, search,
  document retrieval, config isolation, YAML persistence, and lifecycle
2026-03-08 15:59:22 -04:00
Tobi Lutke
4fa11682db
fix: update Store type to match intent parameter signatures 2026-03-07 21:30:31 -04:00
Tobi Lutke
ba97c03b02
docs: credit Ilya Grigorik in 1.1.5 changelog 2026-03-07 21:26:32 -04:00