The handelize() regex replaced all non-letter/non-number chars with
dashes, including dots in the filename stem. This mangled session
filenames like "topic-1773595309.753009.md" to "topic-1773595309-753009.md",
breaking memory_get path resolution (file not found on disk).
Fix: add dot to the preserved character class in the filename regex.
After deploying, run qmd-reindex.sh to rebuild indexes with correct paths.
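Illustration of the character-class change, as a minimal sketch. `handelizeFilename` is a hypothetical, simplified stand-in for the filename step of handelize(); the ASCII-only regex and the extension handling are assumptions, not the project's actual code.

```typescript
// Hypothetical, simplified stand-in for the filename step of handelize().
// Assumptions: ASCII-only input; the extension is split off before cleaning.
function handelizeFilename(fileName: string, preserveDots: boolean): string {
  // The fix amounts to adding "." to the preserved (negated) character class.
  const unsafe = preserveDots ? /[^A-Za-z0-9.]+/g : /[^A-Za-z0-9]+/g;
  const dot = fileName.lastIndexOf(".");
  const stem = dot > 0 ? fileName.slice(0, dot) : fileName;
  const ext = dot > 0 ? fileName.slice(dot) : "";
  // Collapse each run of unsafe characters to one dash, then trim edge dashes.
  const cleaned = stem.replace(unsafe, "-").replace(/^-+|-+$/g, "");
  return cleaned + ext;
}

// Old behavior: the dot inside the stem becomes a dash.
const mangledName = handelizeFilename("topic-1773595309.753009.md", false);
// Fixed behavior: the dot inside the stem survives.
const preservedName = handelizeFilename("topic-1773595309.753009.md", true);
```

With the dot preserved, the indexed path matches the file on disk again.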
- Add missing subprocess import (NameError on any quantize path)
- Replace broken optimum-cli quantize calls with direct onnxruntime:
  Q4 uses MatMulNBitsQuantizer, Q8 uses quantize_dynamic
- Add onnxconverter-common to deps for FP16 (the missing-dependency
  error was silently swallowed)
- Make FP16 fail loudly on missing dep instead of silently uploading FP32
- README and transformers_js_config now reflect actual quantize_type
  instead of always hardcoding Q4
- Remove dead _convert_fp16_external function
- Use no_post_process=True for ONNX export to avoid protobuf serialize error
- Add --validate and --validate-only flags for inference verification
- Fix position_ids in validation feed (required by Qwen3 ONNX export)
- Use optimum-cli for quantization to handle external data format
- Fix optimum dependency to optimum[onnxruntime]
Tested: export + validation passes on CPU, KV cache present (56 tensors).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add convert_onnx.py that mirrors convert_gguf.py's structure:
- Loads base Qwen3 model, merges SFT + GRPO adapters
- Exports to ONNX via Optimum (text-generation-with-past task)
- Supports Q4 (MatMulNBits), Q8, FP16, and FP32 output
- Uploads to separate HF repo (e.g. tobil/qmd-query-expansion-1.7B-ONNX)
- Writes Transformers.js compatibility config
- Includes model card with usage example
Usage:
uv run convert_onnx.py --size 1.7B
uv run convert_onnx.py --size 1.7B --quantize q4 --no-upload
Also adds `just convert-onnx` and `just convert-gguf` tasks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bun.lock still resolved better-sqlite3 to 11.x after package.json was
bumped to ^12.4.5 in v2.0.0. This breaks sandboxed builds (e.g. Nix
with bun2nix) where network access is unavailable to resolve the
mismatch.
CI and the publish workflow now use --frozen-lockfile so drift is caught
immediately. The release script also validates lockfile consistency
before tagging.
Closes #386
When a chunk exceeds the embedding model's context window (trainContextSize),
node-llama-cpp's getEmbeddingFor() triggers a native SIGABRT in GGML/Metal,
crashing the entire process.
Fix: Add truncateToContextSize() guard in embed() and embedBatch() that uses
the model's own tokenizer to check token count before calling getEmbeddingFor().
Oversized text is truncated to (trainContextSize - 4) tokens with a warning,
preserving partial embedding coverage instead of crashing.
Fixes #303
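The guard can be sketched roughly as follows. This is not the committed implementation: the `Tokenizer` interface, the `warn` callback, and the toy `charTokenizer` are hypothetical and exist only to keep the example self-contained; node-llama-cpp's real tokenizer API may differ.

```typescript
// Hypothetical tokenizer interface (assumption, not node-llama-cpp's API).
interface Tokenizer {
  tokenize(text: string): number[];
  detokenize(tokens: number[]): string;
}

// Truncate text so it never exceeds (trainContextSize - 4) tokens.
function truncateToContextSize(
  text: string,
  tokenizer: Tokenizer,
  trainContextSize: number,
  warn: (message: string) => void,
): string {
  const maxTokens = Math.max(trainContextSize - 4, 0);
  const tokens = tokenizer.tokenize(text);
  if (tokens.length <= maxTokens) return text; // fits, pass through unchanged
  warn(`Truncating ${tokens.length} tokens to ${maxTokens} before embedding`);
  return tokenizer.detokenize(tokens.slice(0, maxTokens));
}

// Toy tokenizer for illustration only: one token per UTF-16 code unit.
const charTokenizer: Tokenizer = {
  tokenize: (text) => text.split("").map((ch) => ch.charCodeAt(0)),
  detokenize: (tokens) => tokens.map((t) => String.fromCharCode(t)).join(""),
};
```

Counting with the model's own tokenizer matters because character or word counts do not map reliably to token counts.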
The bin/qmd wrapper checks for bun.lock to select the runtime, but since
bun.lock is committed to the repo, source builds using npm install are
incorrectly routed to Bun — causing native module ABI mismatches (#381)
and sqlite-vec crashes (#380).
Add package-lock.json as a higher-priority signal: if it exists, npm
installed the dependencies and Node should be used. Also fix
cleanupOrphanedVectors() to use the existing isSqliteVecAvailable()
guard instead of checking sqlite_master, which can report the virtual
table even when the vec0 module isn't loaded.
Fixes #381, fixes #380
Continuation of #362 (runtime detection false positives)
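The lockfile priority order can be sketched as a pure function. The real wrapper is a shell script; `selectRuntime`, the injected `exists` callback, and the Node fallback when no lockfile is found are assumptions made for illustration.

```typescript
type Runtime = "node" | "bun";

// `exists` is injected so the sketch does not touch the filesystem.
function selectRuntime(exists: (file: string) => boolean): Runtime {
  // package-lock.json wins: npm installed the dependencies, so use Node.
  if (exists("package-lock.json")) return "node";
  // Otherwise a Bun lockfile indicates a Bun install.
  if (exists("bun.lock") || exists("bun.lockb")) return "bun";
  // Assumption: default to Node when no lockfile is present.
  return "node";
}

// A source checkout built with npm has both files; Node must be chosen.
const sourceBuildFiles = ["bun.lock", "package-lock.json"];
const sourceBuildRuntime = selectRuntime(
  (file) => sourceBuildFiles.indexOf(file) !== -1,
);
```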
The caret range ^4.2.1 allows npm to resolve zod 4.3.x, which has
breaking type changes against @modelcontextprotocol/sdk. Source builds
fail with TypeScript errors. Pinning to exact 4.2.1 resolves this.
See: https://github.com/tobi/qmd/issues/379
On macOS, bun:sqlite uses Apple's system SQLite which is compiled with
SQLITE_OMIT_LOAD_EXTENSION, preventing sqlite-vec from loading. The v2.0
refactor also silently swallowed extension loading failures, losing the
actionable error messages that existed pre-2.0.
- Call Database.setCustomSQLite() on macOS to use Homebrew's SQLite
- Eagerly validate extension loading at init, not at first query
- Throw with platform-specific fix instructions in loadSqliteVec()
- Log warning in store.ts instead of silently catching
Fixes #363
On WSL, paths like /c/work/... are valid drvfs mount points, not Git
Bash drive-letter shortcuts. The existing code in isAbsolutePath() and
resolve() detected /c/ as a Windows C: path, converting drvfs paths to
C:/work/... which broke indexing entirely.
Fix: detect WSL via WSL_DISTRO_NAME or WSL_INTEROP environment variables
and skip the Git Bash /c/ -> C: branch on WSL. Native Linux path handling
continues as before.
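A minimal sketch of the WSL check. `normalizeDrivePath` is a hypothetical helper standing in for the relevant branch of isAbsolutePath()/resolve(); the environment is passed in explicitly, and the platform check that the real code would also need is omitted.

```typescript
type Env = Record<string, string | undefined>;

// WSL is detected through either of the two environment variables.
function isWSL(env: Env): boolean {
  return Boolean(env["WSL_DISTRO_NAME"] || env["WSL_INTEROP"]);
}

// Hypothetical helper: convert Git Bash "/c/work" to "C:/work", except on WSL,
// where "/c/work" is a real drvfs mount point and must be left untouched.
function normalizeDrivePath(inputPath: string, env: Env): string {
  if (isWSL(env)) return inputPath;
  const match = /^\/([a-zA-Z])\/(.*)$/.exec(inputPath);
  if (!match) return inputPath;
  const drive = match[1] ?? "";
  const rest = match[2] ?? "";
  return `${drive.toUpperCase()}:/${rest}`;
}
```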
Exposes the existing skipRerank option as a --no-rerank CLI flag for
qmd query. On CPU-only machines, reranking takes 120s+ for 20 chunks -
this flag lets users get RRF-fused results without the reranking penalty.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
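For context, reciprocal rank fusion (RRF) can be sketched as below. This is the generic technique, not QMD's code: `rrfFuse` is a hypothetical name, and the constant k = 60 is a commonly used default assumed here, not a value taken from the project.

```typescript
// Generic reciprocal rank fusion: each list contributes 1 / (k + rank)
// per document, with rank starting at 1. k = 60 is an assumed default.
function rrfFuse(rankedLists: string[][], k: number = 60): string[] {
  const scores: Record<string, number> = {};
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      scores[docId] = (scores[docId] ?? 0) + 1 / (k + index + 1);
    });
  }
  // Highest fused score first.
  return Object.keys(scores).sort(
    (a, b) => (scores[b] ?? 0) - (scores[a] ?? 0),
  );
}

// Two rankings (e.g. lexical and vector) fused without any reranker call.
const fusedIds = rrfFuse([
  ["a", "b", "c"],
  ["b", "c", "a"],
]);
```

Fusion like this is pure arithmetic over ranks, which is why skipping the reranker removes the LLM cost entirely.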
When Bun is installed on the system but QMD was installed via npm,
$BUN_INSTALL is always set (typically to ~/.bun), causing the launcher
to incorrectly run QMD under Bun. This leads to ABI mismatches with
native modules (better-sqlite3, sqlite-vec) that were compiled for Node,
breaking vector operations with "no such module: vec0".
Only check for bun.lock/bun.lockb files, which reliably indicate that
QMD was actually installed with Bun.
Fixes #361
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add _ciMode flag to LlamaCpp that throws immediately on embedBatch,
generate, expandQuery, and rerank when CI=true — prevents silent 30s
timeouts. Skip MCP HTTP Transport tests in CI (they instantiate a real
LlamaCpp). Bump vitest/bun test timeouts to 60s for slower CI runners.
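The fail-fast guard might look like this sketch. `CiAwareLlm` is a hypothetical stand-in for LlamaCpp: the methods are synchronous placeholders and the environment is injected, both assumptions made to keep the example self-contained.

```typescript
// Hypothetical stand-in for the LlamaCpp class; methods are placeholders.
class CiAwareLlm {
  private readonly ciMode: boolean;

  constructor(env: Record<string, string | undefined>) {
    this.ciMode = env["CI"] === "true";
  }

  // Throw immediately instead of waiting for a model-load timeout.
  private assertNotCi(operation: string): void {
    if (this.ciMode) {
      throw new Error(`${operation} is unavailable when CI=true`);
    }
  }

  embedBatch(texts: string[]): number[][] {
    this.assertNotCi("embedBatch");
    return texts.map(() => []); // placeholder result
  }

  rerank(docs: string[]): string[] {
    this.assertNotCi("rerank");
    return docs; // placeholder result
  }
}
```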
- Bump better-sqlite3 from ^11 to ^12.4.5 for Node 25 support (prebuilds
  + V8 API compat). Closes #257.
- Add bin/qmd shell wrapper that detects bun vs node install and execs
  with the matching runtime, preventing native module ABI mismatches
  when installed via bun. Closes #319.
Move frontends into src/cli/ and src/mcp/ to separate them from the
core library. The MCP server is fully rewritten to import only from
the SDK (src/index.ts) — zero direct store.ts/collections.ts/llm.ts
access.
- src/qmd.ts → src/cli/qmd.ts
- src/formatter.ts → src/cli/formatter.ts
- src/mcp.ts → src/mcp/server.ts (rewritten to use QMDStore SDK)
- New src/maintenance.ts: Maintenance class for CLI housekeeping
- SDK gains: getDocumentBody(), getDefaultCollectionNames(),
  extractSnippet/addLineNumbers/DEFAULT_MULTI_GET_MAX_BYTES exports,
  getDefaultDbPath re-export, InternalStore type export
- package.json bin/scripts updated for new paths
- All 692 tests pass
Replace three separate search methods (query, search, structuredSearch)
with a single search(options) that accepts either a query string
(auto-expanded) or pre-expanded queries. Add searchLex/searchVector
convenience methods and expandQuery for manual control.
Unify StructuredSubSearch and ExpandedQuery into a single ExpandedQuery
type with { type, query } used throughout the pipeline. Add skipRerank
option to hybridQuery and structuredSearch for fast no-LLM searches.
New SDK surface:
- search({ query, intent, rerank, limit, ... })
- search({ queries: expanded })
- searchLex(query, opts)
- searchVector(query, opts)
- expandQuery(query, { intent })
HuggingFace filenames are case-sensitive. The documented filename
'qwen3-embedding-0.6b-q8_0.gguf' (lowercase) returns 404. The correct
filename is 'Qwen3-Embedding-0.6B-Q8_0.gguf' (original case from the
HuggingFace repo).
Co-Authored-By: Oz <oz-agent@warp.dev>
Allow QMD to be used as a library (`import { createStore } from '@tobilu/qmd'`)
in addition to CLI and MCP modes. The constructor requires explicit dbPath and
either a configPath (YAML file) or inline config object — no defaults assumed,
making it safe to embed in any application.
- Add src/index.ts entry point with QMDStore interface exposing search,
retrieval, collection/context management, and index health
- Add setConfigSource() to collections.ts for inline config support
(in-memory config with no file I/O)
- Add main/types/exports fields to package.json
- Add SDK documentation section to README
- Add 56 unit tests covering constructor, collections, contexts, search,
document retrieval, config isolation, YAML persistence, and lifecycle
Add optional `intent` parameter that steers query expansion, reranking,
chunk selection, and snippet extraction without itself being run as a search.
When a query like "performance" is ambiguous (web-perf vs team health vs
fitness), intent provides background context that disambiguates results
across all pipeline stages:
- expandQuery: includes intent in LLM prompt ("Query intent: {intent}")
- rerank: prepends intent to rerank query for Qwen3-Reranker
- chunk selection: intent terms scored at 0.5x weight vs query terms
- snippet extraction: intent terms scored at 0.3x weight
- strong-signal bypass: disabled when intent provided
Available via CLI (--intent flag or intent: line in query documents),
MCP (intent field on query tool), and programmatic API.
Adapted from PR #180 (thanks @vyalamar).
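The term weighting can be illustrated with a minimal sketch. `scoreChunk` and its substring matching are hypothetical simplifications; only the 0.5x (chunk selection) and 0.3x (snippet extraction) intent weights come from the description above.

```typescript
// Hypothetical scorer: query terms count fully, intent terms at a reduced weight.
// Assumption: a term "matches" when it appears as a case-insensitive substring.
function scoreChunk(
  chunk: string,
  queryTerms: string[],
  intentTerms: string[],
  intentWeight: number,
): number {
  const text = chunk.toLowerCase();
  let score = 0;
  for (const term of queryTerms) {
    if (text.indexOf(term.toLowerCase()) !== -1) score += 1;
  }
  for (const term of intentTerms) {
    if (text.indexOf(term.toLowerCase()) !== -1) score += intentWeight;
  }
  return score;
}

// "performance" alone is ambiguous; the intent terms break the tie.
const webChunkScore = scoreChunk(
  "Web performance: cut latency", ["performance"], ["web", "latency"], 0.5,
);
const teamChunkScore = scoreChunk(
  "Team performance review", ["performance"], ["web", "latency"], 0.5,
);
```

Because intent terms carry less weight than query terms, they can reorder chunks that already match the query but cannot outrank a chunk on their own as easily.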