ai-workspace-services/qmd

Author	SHA1	Message	Date
Tobias Lütke	f35b4e19e0	Merge pull request #393 from lskun/fix/embed-context-overflow fix: truncate oversized text before embedding to prevent GGML crash	2026-03-14 08:07:47 -04:00
Tobias Lütke	a13a84fb28	Merge pull request #396 from Mic92/qmd-fix sync stale bun.lock, guard against future lockfile drift	2026-03-14 08:07:25 -04:00
Tobias Lütke	5b48bcb6c1	Merge pull request #389 from sonwr/fix-issue-380-cleanup-no-sqlite-vec fix: skip cleanup when sqlite-vec is unavailable	2026-03-14 08:07:11 -04:00
Tobias Lütke	7ab1497ebb	Merge pull request #395 from ProgramCaiCai/fix/embed-batching-memory Bound qmd embed memory usage with default batched processing	2026-03-14 08:06:51 -04:00
Tobias Lütke	398eadf15b	Merge pull request #399 from shreyaskarnik/feat/onnx-conversion Add ONNX conversion script for Transformers.js deployment	2026-03-14 08:05:10 -04:00
Shreyas Karnik	df8d625c00	fix: map quantize_type to valid Transformers.js dtype values --quantize none now emits dtype: "fp32" in the README instead of dtype: "none", matching Transformers.js documented values (fp32, fp16, q8, q4).	2026-03-13 12:57:19 -07:00
Shreyas Karnik	b05d8863ca	fix: quantization paths, missing imports, and hardcoded metadata - Add missing subprocess import (NameError on any quantize path) - Replace broken optimum-cli quantize calls with direct onnxruntime: Q4 uses MatMulNBitsQuantizer, Q8 uses quantize_dynamic - Add onnxconverter-common to deps for FP16 (was silently swallowed) - Make FP16 fail loudly on missing dep instead of silently uploading FP32 - README and transformers_js_config now reflect actual quantize_type instead of always hardcoding Q4 - Remove dead _convert_fp16_external function	2026-03-13 12:45:48 -07:00
Shreyas Karnik	e1ce37c989	fix: handle 2GB protobuf limit, add validation, fix input feeds - Use no_post_process=True for ONNX export to avoid protobuf serialize error - Add --validate and --validate-only flags for inference verification - Fix position_ids in validation feed (required by Qwen3 ONNX export) - Use optimum-cli for quantization to handle external data format - Fix optimum dependency to optimum[onnxruntime] Tested: export + validation passes on CPU, KV cache present (56 tensors). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 12:30:26 -07:00
Shreyas Karnik	2df95ac9ba	feat: add ONNX conversion script for Transformers.js deployment Add convert_onnx.py that mirrors convert_gguf.py's structure: - Loads base Qwen3 model, merges SFT + GRPO adapters - Exports to ONNX via Optimum (text-generation-with-past task) - Supports Q4 (MatMulNBits), Q8, FP16, and FP32 output - Uploads to separate HF repo (e.g. tobil/qmd-query-expansion-1.7B-ONNX) - Writes Transformers.js compatibility config - Includes model card with usage example Usage: uv run convert_onnx.py --size 1.7B uv run convert_onnx.py --size 1.7B --quantize q4 --no-upload Also adds `just convert-onnx` and `just convert-gguf` tasks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 11:50:03 -07:00
Jörg Thalheim	8c4b4b335d	sync stale bun.lock, guard against future lockfile drift bun.lock still resolved better-sqlite3 to 11.x after package.json was bumped to ^12.4.5 in v2.0.0. This breaks sandboxed builds (e.g. Nix with bun2nix) where network access is unavailable to resolve the mismatch. CI and the publish workflow now use --frozen-lockfile so drift is caught immediately. The release script also validates lockfile consistency before tagging. Closes #386	2026-03-13 13:34:17 +01:00
programcaicai	809aa36172	fix: bound memory usage during embed	2026-03-13 17:39:17 +08:00
edy	9718d3767c	fix: truncate oversized text before embedding to prevent GGML crash When a chunk exceeds the embedding model's context window (trainContextSize), node-llama-cpp's getEmbeddingFor() triggers a native SIGABRT in GGML/Metal, crashing the entire process. Fix: Add truncateToContextSize() guard in embed() and embedBatch() that uses the model's own tokenizer to check token count before calling getEmbeddingFor(). Oversized text is truncated to (trainContextSize - 4) tokens with a warning, preserving partial embedding coverage instead of crashing. Fixes #303	2026-03-13 09:35:17 +08:00
sonwr	7df09e8235	fix: skip vector cleanup when sqlite-vec is unavailable	2026-03-12 13:51:20 +00:00
Tobias Lütke	ae3604cb88	Merge pull request #362 from syedair/fix/launcher-bun-install-false-positive fix: remove $BUN_INSTALL check from launcher to prevent false Bun detection	2026-03-10 21:38:41 -04:00
Syed Humair	b0a14b18ad	fix: remove $BUN_INSTALL check from launcher to prevent false Bun detection When Bun is installed on the system but QMD was installed via npm, $BUN_INSTALL is always set (typically to ~/.bun), causing the launcher to incorrectly run QMD under Bun. This leads to ABI mismatches with native modules (better-sqlite3, sqlite-vec) that were compiled for Node, breaking vector operations with "no such module: vec0". Only check for bun.lock/bun.lockb files, which reliably indicate that QMD was actually installed with Bun. Fixes #361 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 05:24:44 +04:00
Tobi Lutke	21a5dcc853	release: v2.0.1	2026-03-10 20:59:27 -04:00
Tobi Lutke	1207fe7776	docs: write changelog for 2.0.1	2026-03-10 20:58:45 -04:00
Tobias Lütke	55c951b15e	Merge pull request #349 from byheaven/fix/qwen3-embedding-model-filename-case docs: fix Qwen3-Embedding GGUF filename case (404 on download)	2026-03-10 20:08:53 -04:00
Tobias Lütke	22ba426dc0	Merge pull request #352 from nibzard/fix-global-launcher-path Fix launcher path for global installs	2026-03-10 20:08:04 -04:00
Tobias Lütke	e710e9d2b9	Merge pull request #355 from nibzard/feat-skill-install-clean Add skill install command	2026-03-10 20:07:47 -04:00
nkkko	b16d77146a	feat(skill): install packaged qmd skill	2026-03-10 23:18:15 +01:00
nkkko	9f4c71c783	fix(cli): resolve symlinked global launcher path	2026-03-10 22:38:20 +01:00
Tobi Lutke	55f16460d0	fix(ci): guard LLM calls in CI and increase test timeouts Add _ciMode flag to LlamaCpp that throws immediately on embedBatch, generate, expandQuery, and rerank when CI=true — prevents silent 30s timeouts. Skip MCP HTTP Transport tests in CI (they instantiate a real LlamaCpp). Bump vitest/bun test timeouts to 60s for slower CI runners.	2026-03-10 13:28:37 -04:00
Tobi Lutke	ed0249fd6b	fix(test): increase timeout for SDK search tests that trigger LLM expansion These tests load the query expansion model on first call, which consistently exceeds the 30s timeout on CI runners.	2026-03-10 12:59:46 -04:00
Tobi Lutke	8478ddb666	release: v2.0.0	2026-03-10 11:53:25 -04:00
Tobi Lutke	a444c86382	docs: rewrite SDK section for 2.0, fix MCP tool names, add changelog - Expand SDK documentation from ~70 lines to comprehensive coverage: store creation modes, unified search(), retrieval, collections, context, indexing, types, and lifecycle - Fix MCP tools section: old names (qmd_search, qmd_deep_search) replaced with actual registered names (query, get, multi_get, status) - Write 2.0.0 changelog under [Unreleased]	2026-03-10 11:53:12 -04:00
Tobi Lutke	b252219add	fix(deps): bump better-sqlite3 to ^12.4.5, add runtime-aware bin wrapper - Bump better-sqlite3 from ^11 to ^12.4.5 for Node 25 support (prebuilds + V8 API compat). Closes #257. - Add bin/qmd shell wrapper that detects bun vs node install and execs with the matching runtime, preventing native module ABI mismatches when installed via bun. Closes #319.	2026-03-10 11:43:00 -04:00
Tobi Lutke	c68904fe08	refactor: move CLI and MCP to subdirectories, MCP consumes SDK Move frontends into src/cli/ and src/mcp/ to separate them from the core library. The MCP server is fully rewritten to import only from the SDK (src/index.ts) — zero direct store.ts/collections.ts/llm.ts access. - src/qmd.ts → src/cli/qmd.ts - src/formatter.ts → src/cli/formatter.ts - src/mcp.ts → src/mcp/server.ts (rewritten to use QMDStore SDK) - New src/maintenance.ts: Maintenance class for CLI housekeeping - SDK gains: getDocumentBody(), getDefaultCollectionNames(), extractSnippet/addLineNumbers/DEFAULT_MULTI_GET_MAX_BYTES exports, getDefaultDbPath re-export, InternalStore type export - package.json bin/scripts updated for new paths - All 692 tests pass	2026-03-10 11:39:55 -04:00
Tobi Lutke	839d774a06	feat: redesign SDK search API with unified search() and ExpandedQuery type Replace three separate search methods (query, search, structuredSearch) with a single search(options) that accepts either a query string (auto-expanded) or pre-expanded queries. Add searchLex/searchVector convenience methods and expandQuery for manual control. Unify StructuredSubSearch and ExpandedQuery into a single ExpandedQuery type with { type, query } used throughout the pipeline. Add skipRerank option to hybridQuery and structuredSearch for fast no-LLM searches. New SDK surface: - search({ query, intent, rerank, limit, ... }) - search({ queries: expanded }) - searchLex(query, opts) - searchVector(query, opts) - expandQuery(query, { intent })	2026-03-10 11:04:45 -04:00
YuBai	740b17b485	docs: fix Qwen3-Embedding GGUF filename case in README and llm.ts HuggingFace filenames are case-sensitive. The documented filename 'qwen3-embedding-0.6b-q8_0.gguf' (lowercase) returns 404. The correct filename is 'Qwen3-Embedding-0.6B-Q8_0.gguf' (original case from the HuggingFace repo). Co-Authored-By: Oz <oz-agent@warp.dev>	2026-03-10 18:54:36 +08:00
Tobi Lutke	032f26edca	release: v1.1.6	2026-03-09 17:23:14 -04:00
Tobi Lutke	0c83dc1593	docs: write changelog for 1.1.6	2026-03-09 17:22:54 -04:00
Tobi Lutke	040c6fa904	feat: add SDK/library mode for programmatic access Allow QMD to be used as a library (`import { createStore } from '@tobilu/qmd'`) in addition to CLI and MCP modes. The constructor requires explicit dbPath and either a configPath (YAML file) or inline config object — no defaults assumed, making it safe to embed in any application. - Add src/index.ts entry point with QMDStore interface exposing search, retrieval, collection/context management, and index health - Add setConfigSource() to collections.ts for inline config support (in-memory config with no file I/O) - Add main/types/exports fields to package.json - Add SDK documentation section to README - Add 56 unit tests covering constructor, collections, contexts, search, document retrieval, config isolation, YAML persistence, and lifecycle	2026-03-08 15:59:22 -04:00
Tobi Lutke	4fa11682db	fix: update Store type to match intent parameter signatures	2026-03-07 21:30:31 -04:00
Tobi Lutke	ba97c03b02	docs: credit Ilya Grigorik in 1.1.5 changelog	2026-03-07 21:26:32 -04:00
Tobi Lutke	da9cf691fd	release: v1.1.5	2026-03-07 20:16:32 -04:00
Tobi Lutke	66fb5b1d98	docs: write changelog for 1.1.5	2026-03-07 20:15:54 -04:00
Tobi Lutke	ad38c1f698	feat: add intent parameter for query disambiguation Add optional `intent` parameter that steers query expansion, reranking, chunk selection, and snippet extraction without searching on its own. When a query like "performance" is ambiguous (web-perf vs team health vs fitness), intent provides background context that disambiguates results across all pipeline stages: - expandQuery: includes intent in LLM prompt ("Query intent: {intent}") - rerank: prepends intent to rerank query for Qwen3-Reranker - chunk selection: intent terms scored at 0.5x weight vs query terms - snippet extraction: intent terms scored at 0.3x weight - strong-signal bypass: disabled when intent provided Available via CLI (--intent flag or intent: line in query documents), MCP (intent field on query tool), and programmatic API. Adapted from PR #180 (thanks @vyalamar).	2026-03-07 19:27:29 -04:00
Tobi Lutke	b838f74c8c	release: v1.1.2	2026-03-07 15:58:28 -04:00
Tobi Lutke	0ff9bec129	docs: write changelog for 1.1.2	2026-03-07 15:58:14 -04:00
Tobi Lutke	e3549dab1a	perf(rerank): cap parallelism, deduplicate chunks, cache by content - Cap rerank contexts at 4 to avoid VRAM exhaustion on high-core machines - Deduplicate identical chunk texts before sending to reranker - Cache rerank scores by chunk content instead of file path — same text from different files now shares a single reranker call - Add truncation cache to avoid re-tokenizing duplicate documents	2026-03-07 15:57:36 -04:00
Tobi Lutke	44d7145bfe	Merge pull request #242 from vyalamar/feat/query-explain-score-traces feat(query): add --explain score traces for hybrid retrieval	2026-03-07 14:35:26 -04:00
vyalamar	b068ad0dd6	feat(query): add --explain score traces for hybrid search	2026-03-07 14:35:10 -04:00
Tobias Lütke	7904ab9a9d	Merge pull request #273 from daocoding/feature/configurable-embed-model feat: add QMD_EMBED_MODEL env var for multilingual embedding support	2026-03-07 14:28:59 -04:00
Tobias Lütke	cb5d84ff07	Merge pull request #225 from ilepn/fix/sqlite-vec-windows-package-name fix(package.json): correct Windows sqlite-vec package name + add linux-arm64	2026-03-07 14:28:55 -04:00
Tobias Lütke	a4b641d8e3	Merge pull request #255 from pandysp/feat/expose-candidate-limit feat: expose candidateLimit as MCP tool parameter and CLI flag	2026-03-07 14:28:52 -04:00
Tobias Lütke	e3bc5ccdc3	Merge pull request #286 from joelev/fix/multi-session-http fix: support multiple concurrent HTTP clients	2026-03-07 14:28:50 -04:00
Tobias Lütke	8bd93366ad	Merge pull request #228 from amsminn/fix-empty-results-format fix(cli): prevent parser breakage on empty results across output formats	2026-03-07 14:25:16 -04:00
Tobias Lütke	0b3fb07a8f	Merge pull request #230 from Balneario-de-Cofrentes/fix/tty-progress-guard fix(cli): suppress progress bars when not TTY	2026-03-07 14:25:13 -04:00
Tobias Lütke	271feb7791	Merge pull request #253 from jimmynail/fix/skip-unreadable-files fix: skip unreadable files during indexing instead of crashing	2026-03-07 14:25:11 -04:00
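
Commit 9718d3767c above describes a guard that checks the token count before embedding and truncates oversized text to (trainContextSize - 4) tokens. The following is a minimal sketch of that idea, not the code from the commit: the `Tokenizer` interface, the toy `WordTokenizer`, the `warn` callback, and the return shape are all hypothetical, and it assumes that any text longer than the margin-adjusted limit gets truncated.

```typescript
// Hypothetical tokenizer interface. The commit uses the embedding model's own
// tokenizer; this stand-in only exists to keep the sketch self-contained.
interface Tokenizer {
  tokenize(text: string): number[];
  detokenize(tokens: number[]): string;
}

// Margin taken from the commit message: truncate to (trainContextSize - 4).
const SAFETY_MARGIN = 4;

interface TruncateResult {
  text: string;
  truncated: boolean;
}

// Returns the text unchanged when it fits, otherwise the first
// (trainContextSize - SAFETY_MARGIN) tokens converted back to text.
function truncateToContextSize(
  text: string,
  tokenizer: Tokenizer,
  trainContextSize: number,
  warn: (message: string) => void = () => {},
): TruncateResult {
  const maxTokens = Math.max(0, trainContextSize - SAFETY_MARGIN);
  const tokens = tokenizer.tokenize(text);
  if (tokens.length <= maxTokens) {
    return { text, truncated: false };
  }
  warn(`Truncating ${tokens.length} tokens to ${maxTokens} before embedding`);
  return { text: tokenizer.detokenize(tokens.slice(0, maxTokens)), truncated: true };
}

// Toy whitespace tokenizer (assumption, for illustration only).
class WordTokenizer implements Tokenizer {
  private vocab: string[] = [];
  private ids = new Map<string, number>();

  tokenize(text: string): number[] {
    return text
      .split(/\s+/)
      .filter((word) => word.length > 0)
      .map((word) => {
        let id = this.ids.get(word);
        if (id === undefined) {
          id = this.vocab.length;
          this.vocab.push(word);
          this.ids.set(word, id);
        }
        return id;
      });
  }

  detokenize(tokens: number[]): string {
    return tokens.map((token) => this.vocab[token]).join(" ");
  }
}
```

Truncating at the token level rather than by character count keeps the guard aligned with what the model actually counts, at the cost of one extra tokenization pass per chunk.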


354 Commits
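
Commit b0a14b18ad changes the launcher to pick Bun only when a bun.lock or bun.lockb file is present, and to ignore $BUN_INSTALL. The real launcher is the bin/qmd shell wrapper; the sketch below only restates the decision rule in TypeScript. The function name `detectRuntime`, the `packageDir` argument, and the injected `exists` callback are hypothetical, and which directory the launcher actually inspects is an assumption.

```typescript
type Runtime = "bun" | "node";

// Lockfile names taken from the commit message.
const BUN_LOCKFILES = ["bun.lock", "bun.lockb"];

// Decides which runtime should execute QMD.
// `exists` is injected (hypothetical design choice) so the rule can be
// exercised without touching the file system.
// The BUN_INSTALL environment variable is deliberately not consulted:
// per the commit, it is set whenever Bun is installed on the system, even
// when QMD itself was installed via npm.
function detectRuntime(
  packageDir: string,
  exists: (path: string) => boolean,
): Runtime {
  const hasBunLockfile = BUN_LOCKFILES.some((name) =>
    exists(`${packageDir}/${name}`),
  );
  return hasBunLockfile ? "bun" : "node";
}
```

In a real launcher the `exists` callback would be backed by a file system check; defaulting to Node when no lockfile is found matches the commit's goal of avoiding ABI mismatches with native modules compiled for Node.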