ai-workspace-services/qmd

Author	SHA1	Message	Date
Tobias Lütke	67e2aab18c	Merge pull request #206 from tobi/liquidai-query-expansion	2026-02-18 08:42:01 -04:00
Tobias Lütke	b142be8cdc	Merge pull request #205 from tobi/mate/dataset-v3-improvements	2026-02-18 08:41:36 -04:00
Tobias Lütke	1a67e1a093	Merge pull request #208 from tobi/fix/json-output-and-index-paths	2026-02-17 20:27:49 -04:00
Tobi Lütke	1007b46fcc	fix: return empty JSON array when no search results with --json flag Previously, 'qmd search --json' would output plain text 'No results found.' when no matches were found, which is invalid JSON. Now it correctly outputs an empty JSON array [] when using --json format. Fixed in all search commands: search, vsearch, and query.	2026-02-17 10:46:31 -05:00
Tobi Lütke	00bcfbbd34	fix: resolve relative paths in --index flag to prevent malformed config paths When using --index with relative paths like './index/my-project', the path was stored directly as the config filename, resulting in paths like: /home/user/.config/qmd/./index/my-project.yml This caused 'no such file or directory' errors. Now relative paths are resolved and normalized by replacing path separators with underscores.	2026-02-17 10:46:31 -05:00
Tobi Lütke	48f0917269	feat(finetune): hyde-first ordering, relative paths, structured format Dataset improvements: - Reorder output to put hyde first for better retrieval priming - Convert absolute paths to relative paths in scripts - Add convert_to_structured.py for structured data format - Add qmd_expansion_v3_structured.jsonl with type/query objects - Update schema.py with reorder_hyde_first() helper - Verify data now validates hyde-first ordering Training data regenerated with new ordering (100% validation success).	2026-02-17 06:31:35 -05:00
Tobi Lütke	57f7caa93b	feat: add LiquidAI LFM2 support for query expansion Add training configuration and documentation for using LiquidAI's LFM2-1.2B as an alternative base model for query expansion fine-tuning. LFM2 benefits: - 2x faster decode/prefill vs standard transformers - Optimized for edge/on-device inference - Good at agentic tasks, RAG, and data extraction Changes: - Add configs/sft_lfm2.yaml with LFM2-specific LoRA target modules - Add jobs/sft_lfm2.py for HuggingFace Jobs training - Update llm.ts with LFM2 GGUF model URIs - Add documentation for LFM2 training workflow LFM2 uses a hybrid architecture (convolutions + attention) requiring different LoRA targets: q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, w3	2026-02-17 06:20:57 -05:00
Tobi Lütke	a282ce7a26	feat(finetune): improve query expansion dataset v3 Dataset improvements: - fix_hyde.py: Replace generic template hyde entries with query-specific ones using GPT-4o-mini (removed 'comprehensive guide covers everything' pattern) - fix_lex_filler.py: Remove filler words (overview, tutorial, guide, examples, documentation, best practices) that were padding rather than genuine search intent - qmd_expansion_v3.jsonl: Improved dataset with 1,498 high-quality entries Training data preparation: - convert_to_chatml.py: Convert to ChatML format for LFM2.5 training - verify_data.py: Validation script to ensure data quality - train-lfm2/: Ready-to-use training data (90/10 train/val split) Data quality metrics: - 100% success rate (all entries properly formatted) - Query length: 6-65 chars (avg: 29.3) - Response length: 307-777 chars (avg: 539.5) - All entries contain lex, vec, and hyde expansions	2026-02-17 06:19:59 -05:00
Tobi Lutke	640ac13cd0	fix: support multiple -c collection filters in search commands Closes #191 (thanks @openclaw)	2026-02-16 14:03:53 -04:00
Tobi Lutke	8c2282c979	fix: respect XDG_CONFIG_HOME in collection config path Closes #190 (thanks @openclaw)	2026-02-16 14:03:49 -04:00
Tobi Lutke	1781e7bf61	fix: pre-push hook writes to stderr, no interactive prompts Git hooks can't rely on tty access. Remove all interactive prompting — just validate and exit non-zero on failure.	2026-02-16 11:53:52 -04:00
Tobi Lutke	51c03d9445	release: v1.0.6	2026-02-16 09:08:34 -04:00
Tobi Lutke	63f3b68559	feat: show models in status, improve pre-push hook - Move model info from --help to `qmd status` with live HuggingFace links derived from actual configured URIs - Pre-push hook: handle non-interactive shells gracefully, resolve annotated tags correctly for CI checks	2026-02-16 09:08:28 -04:00
Tobi Lutke	8dd6cdcebf	fix: hide bun:sqlite import from tsc on Node.js builds Concatenate the module specifier at runtime ('bun:' + 'sqlite') so tsc doesn't try to resolve it during compilation on Node.js CI runners.	2026-02-16 08:56:02 -04:00
Tobi Lutke	6d399bc50a	release: v1.0.5	2026-02-16 08:47:23 -04:00
Tobi Lutke	e39848c030	chore: gitignore .claude/	2026-02-16 08:47:20 -04:00
Tobi Lutke	614c8d6328	docs: write changelog for v1.0.5 Build now ships compiled JS, new release skill and tooling.	2026-02-16 08:46:46 -04:00
Tobi Lutke	7fb69a5ca2	feat: release skill with changelog-driven workflow and git hooks - Add /release skill with full process: hook install, changelog validation, git history review, preview, and release execution - Skill auto-populates [Unreleased] from git history when empty - Install hook script symlinks pre-push for tag validation - Register skills/ dir in .pi/settings.json for pi discovery	2026-02-16 08:46:10 -04:00
Tobi Lutke	09803a75b7	feat: compile to JS for npm, release system, full changelog - Add tsc build step (tsconfig.build.json) so npm package ships compiled JS instead of raw TypeScript requiring tsx at runtime - Update qmd wrapper and daemon spawn to use dist/qmd.js in production while keeping tsx for development - Add self-installing pre-push hook validating v* tag pushes: package.json version match, changelog entry, CI status - Add release.sh script that renames [Unreleased] to versioned entry, bumps package.json, commits, and tags - Add extract-changelog.sh for cumulative GitHub release notes - Update publish workflow with build step and GitHub release creation - Flesh out CHANGELOG.md with full history from 0.1.0 through 1.0.0 in Keep-a-Changelog format with PR/contributor attributions - Add release standards and changelog guidelines to CLAUDE.md	2026-02-16 08:42:32 -04:00
Tobi Lutke	77c6eba159	fix: publish workflow bun test timeout and npm auth	2026-02-15 23:02:33 -04:00
Tobias Lütke	7acba1c451	Merge pull request #178 from tobi/release/v1.0.0 Release v1.0.0	2026-02-15 22:59:04 -04:00
Tobi Lutke	2780dfb5d0	fix: increase bun test timeout to 30s via CLI flag The default 5s timeout is too short for CLI subprocess tests in CI.	2026-02-15 21:59:18 -04:00
Tobi Lutke	93f277c5e3	fix: MCP session support and cross-runtime test compat - mcp.ts: add sessionIdGenerator to HTTP transport (fixes "stateless transport cannot be reused" error in CI) - test-preload.ts: set 30s default timeout for bun test runner (matches vitest config, prevents CLI subprocess test timeouts) - mcp.test.ts: use == null check instead of toBeUndefined for SQLite get() result (bun:sqlite returns null, better-sqlite3 returns undefined)	2026-02-15 21:54:25 -04:00
Tobi Lutke	edc9a87234	fix: correct test paths after moving to test/ directory - cli.test.ts: fix qmdScript path from <root>/qmd.ts to <root>/src/qmd.ts (broke when tests moved from src/integration/ to test/) - mcp.test.ts: forward Mcp-Session-Id header per MCP Streamable HTTP spec	2026-02-15 21:46:45 -04:00
Tobi Lutke	870d3aed3b	test: move all tests to flat test/ directory No more src/models/ and src/integration/ subfolders to forget about. All 9 test files live in test/, one command runs everything: npx vitest run test/ bun test test/	2026-02-15 21:37:47 -04:00
Tobi Lutke	dcedfb5268	feat: cross-runtime SQLite compat layer (bun:sqlite + better-sqlite3) Add src/db.ts that dynamically imports bun:sqlite under Bun and better-sqlite3 under Node.js. Exports openDatabase(), loadSqliteVec(), and a shared Database interface. - sqlite-vec loading is now optional — FTS works without it, vector ops throw a clear error if unavailable - CI tests both runtimes: Node 22/23 via vitest, Bun via bun test - All 104 unit tests pass on both Node and Bun	2026-02-15 17:15:47 -04:00
Tobi Lutke	c685f7ac71	ci: switch from bun test to vitest on Node.js All test files now use vitest + better-sqlite3 imports. bun test can't load the better-sqlite3 native addon (symbol error on Linux, segfault on macOS). Run vitest on Node 22/23.	2026-02-15 17:04:58 -04:00
Tobi Lutke	dc64166a2a	release: v1.0.0 Node.js compatibility, parallel embedding/reranking, flash attention, GPU auto-detection, and restructured test suite.	2026-02-15 17:02:00 -04:00
Tobi Lutke	294fc76d9f	Merge remote-tracking branch 'origin/nodejs'	2026-02-15 16:58:48 -04:00
Tobi Lutke	9b89a51d10	test: split integration/model suites Split test suites for explicit runtime execution. - Move model-related tests under `src/models/`. - Move CLI/integration tests under `src/integration/`. - Add `src/store.helpers.unit.test.ts` for helper unit coverage. - Add shared Vitest config with default timeout and suite organization. - Remove legacy flat test files from `src/` root. - Keep core test commands in scripts supporting unit/models/integration runs.	2026-02-15 16:57:13 -04:00
Tobi Lutke	4df5505bd6	Merge origin/nodejs: Node.js compat, perf improvements, vitest Brings in Node.js compatibility (tsx, vitest), GPU auto-detection, parallel embedding/reranking contexts, and flash attention support. Preserves @tobilu/qmd package scope and publish config from main.	2026-02-15 16:52:30 -04:00
Tobi Lutke	b88c10bf83	docs: show bun/node install and package scope Document both Node and Bun execution paths. - Update install examples to `@tobilu/qmd` for npm and bun. - Add npx/bunx one-off usage examples. - Reflect Bun as first-class supported runtime in requirements.	2026-02-15 16:45:35 -04:00
Tobi Lutke	13e8473455	docs: update node usage and bump version Update README installation and quick-start commands to Node examples. - replace bun install/link commands with npm-based Node workflow - bump package version to 0.9.9 for CLI and MCP metadata - keep Bun guidance as optional development/runtime note	2026-02-15 16:44:47 -04:00
Tobi Lutke	ee58a685de	ci: use trusted publishing (OIDC provenance)	2026-02-15 15:15:08 -04:00
Tobi Lutke	00ff084fd9	chore: fix bin path, add author, use token-based npm publish	2026-02-15 15:14:45 -04:00
Tobi Lutke	5d73752b47	chore: rename package scope to @tobilu/qmd	2026-02-15 15:07:26 -04:00
Tobi Lutke	53bf2ebf10	ci: use npm trusted publishing (OIDC) instead of token	2026-02-15 14:59:09 -04:00
Tobi Lutke	73985a2aaa	test: skip all model-dependent tests in CI Token-based chunking, vector search, hybrid search, and store LlamaCpp integration tests all require model downloads.	2026-02-15 14:46:41 -04:00
Tobi Lutke	ed4df97122	test: skip LLM integration tests in CI Model download + GPU inference won't work on CI runners. Uses describe.skipIf(CI) for LlamaCpp Integration, LLM Session Management, vector search, and deep search tests.	2026-02-15 14:42:20 -04:00
Tobi Lutke	2279389415	chore: set up npm publishing as @tobi/qmd v0.9.0 - Scope package to @tobi/qmd, version 0.9.0 - Add files whitelist, publishConfig, repo metadata - Add CI workflow (bun tests on ubuntu + macos, bun latest + 1.1.0) - Add publish workflow (triggers on v* tags, publishes to npm) - Add release script for version bumping + changelog generation - Add LICENSE (MIT) and initial CHANGELOG.md - Update install instructions to use @tobi/qmd	2026-02-15 14:31:23 -04:00
Tobi Lutke	31dd977c32	fix: handle dense content (code) that tokenizes to more than expected The 4 chars/token estimate is accurate for prose but code can be 1.7-2 chars/token. This caused chunks to exceed the embedding model's 2048 token context limit. - Use 3 chars/token as initial estimate (balanced for mixed content) - Add safety net: re-chunk any chunks that still exceed token limit - Use actual chars/token ratio when re-chunking for accuracy Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-15 14:20:09 -04:00
Tobi Lutke	537d15a9e6	fix: proper cleanup of Metal GPU resources in tests Add test-preload.ts with global afterAll hook that ensures llama.cpp Metal resources are properly disposed before process exit, avoiding GGML_ASSERT failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-15 14:20:09 -04:00
Tobi Lutke	2d2f53034d	fix: use max chunk size for snippet search window extractSnippet was using the snippet output length (500 chars) to determine the search window, which was too small even for fixed chunks. With variable-length smart chunks, this could miss relevant content entirely. Now uses CHUNK_SIZE_CHARS as fallback, ensuring the entire chunk region is searched regardless of actual chunk length. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-15 14:20:09 -04:00
Tobi Lutke	32112256c1	docs: document smart chunking algorithm in README Add Smart Chunking section explaining break point scoring, distance decay formula, and code fence protection. Update token counts from 800 to 900 throughout. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-15 14:20:09 -04:00
Tobi Lutke	f0e87a454a	feat: smart chunking with scored markdown break points Replace hard 800-token boundary chunking with scoring algorithm that finds natural document break points. Chunks now end at headings, code blocks, and paragraph boundaries when possible. - Add break point scoring: h1=100, h2=90, h3=80, codeblock=80, blank=20 - Use squared distance decay so headings win even at window edge - Protect code fences from being split - Increase chunk size to 900 tokens to accommodate smart boundaries - Add comprehensive tests for chunking functions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-15 14:20:09 -04:00
Tobi Lütke	392934e78a	perf: CPU parallelism via multi-context thread splitting Our assumption that CPU can't benefit from multiple contexts was wrong. The withLock in node-llama-cpp serializes within a single context, but separate contexts with split threads run on different cores in true parallel. Key changes: - computeParallelism() now returns >1 on CPU (cores / 4, max 4) - threadsPerContext() splits math cores evenly across contexts - Both embed and rerank contexts get proper thread counts - Benchmark updated to test CPU parallelism Before (CPU, 40 docs): 9.7s (4.1 docs/s) — 6 threads, 1 context After (CPU, 40 docs): 2.3s (17.2 docs/s) — 32 threads, 8 contexts Two fixes stacked: 1. Thread count: default was 6 (library hardcode), now uses all math cores — 2× improvement alone 2. Multi-context: splitting cores across 8 contexts gives another 2.2× on top End-to-end 'qmd query' on CPU: 10.3s → 2.9s CPU benchmark (Threadripper PRO 7975WX, 32 math cores): 1 ctx: 5001ms (8.0 docs/s) 2 ctx: 3585ms (11.2 docs/s) 1.4× 4 ctx: 2874ms (13.9 docs/s) 1.7× 8 ctx: 2323ms (17.2 docs/s) 2.2×	2026-02-15 11:21:45 -05:00
Tobi Lütke	bf42223086	bench: add reranker benchmark (bench-rerank.ts) Standalone benchmark for the reranking pipeline. Reports: - System info (CPU, GPU, VRAM) - Model VRAM usage - Per-config: parallelism, flash attention, median time, throughput (docs/s), VRAM per context, total VRAM, peak RSS - Speedup relative to baseline (1 context) Usage: bun src/bench-rerank.ts # full (40 docs, 3 iters, 1/2/4/8 ctx) bun src/bench-rerank.ts --quick # quick (10 docs, 1 iter) bun src/bench-rerank.ts --docs 100 # custom doc count Results on this machine: CUDA: 254ms/40 docs (8 ctx), 688ms (1 ctx) = 2.7x speedup CPU: 9697ms/40 docs (1 ctx) = 38x slower than single GPU ctx	2026-02-15 10:51:09 -05:00
Tobi Lütke	0a941c442f	perf: flash attention, right-sized contexts, cleaner GPU detection Holistic tuning pass on context and GPU configuration: GPU detection: - Use getLlamaGpuTypes() to discover available backends at runtime instead of try/catch loop. Prefer CUDA > Metal > Vulkan > CPU. - getLlama({gpu:'auto'}) returns false even when CUDA is available (node-llama-cpp issue), so we can't rely on it. Context tuning: - Rerank context: 2048 tokens (was auto=40960). The Qwen3 reranker template adds ~200 tokens overhead, chunks are ~800, query ~50. Total ~1050 tokens, so 2048 gives comfortable margin. VRAM per context: ~960 MB (was 11.6 GB with auto). - Flash attention enabled for rerank contexts (~20% less VRAM). Falls back gracefully if flash attention not supported. - Embed context: kept at model default (2048 for nomic-embed). Platform considerations: - CUDA (server): up to 8 parallel contexts, flash attention - Metal (MacBook): 1-4 contexts depending on unified memory - Vulkan: detected and used if CUDA/Metal unavailable - CPU: single context (parallelism has no benefit due to locks) Context size was 1024 initially but Qwen3's reranker template is verbose (system prompt + instruct + think tags) — some inputs exceeded 1024 tokens. Bumped to 2048 for safety.	2026-02-15 10:34:39 -05:00
Tobi Lütke	4ac95b5e26	perf: adaptive parallel contexts for embed + rerank, fix VRAM waste Holistic overhaul of context management: 1. Parallel embedding contexts: embedBatch now splits work across multiple EmbeddingContexts (same pattern as reranking). Each context is ~143 MB. Benchmarked 6x speedup on 20 texts with 4 contexts vs 1. 2. Rerank context size: was using auto (40960 tokens = 11.6 GB per context!). Reranking chunks are ~800 tokens max, so 1024 is plenty. Now 711 MB per context — 16x less VRAM. 4 contexts went from 46 GB to 2.8 GB. 3. Adaptive parallelism via computeParallelism(): checks available VRAM and allocates at most 25% of free VRAM for contexts, capped at 8. Falls back to 1 on CPU (no benefit from multiple contexts with node-llama-cpp's withLock serialization). Gracefully handles allocation failures — uses however many contexts succeeded. VRAM budget per operation: - Embed: N × 143 MB (nomic-embed, 2048 ctx) - Rerank: N × 711 MB (Qwen3-Reranker-0.6B, 1024 ctx) - Generate: ~1.1 GB (qmd-expansion-1.7B, fresh ctx per call) Works across: - Large GPU boxes (4x A6000, 190 GB): allocates up to 8 contexts - Consumer GPUs (16 GB): 2-4 contexts fit comfortably - Apple Metal (8-16 GB unified): 1-4 contexts depending on memory - CPU-only: single context (parallelism has no benefit)	2026-02-15 10:27:01 -05:00
Tobi Lütke	0a0e1e6f29	perf: parallel reranking with multiple contexts (2.7x speedup) node-llama-cpp's LlamaRankingContext uses a single sequence with a withLock() guard, making rankAll() effectively sequential despite using Promise.all(). Each document evaluation erases the context, evaluates tokens, and extracts the logit — all serialized. Fix: create 4 parallel ranking contexts from the same model (model weights are shared, only KV cache is duplicated). Split documents across contexts and evaluate in parallel via Promise.all(). Benchmarks (40 chunks, CUDA, 4x A6000): - 1 context: 898ms (baseline) - 2 contexts: 460ms (2.0x) - 4 contexts: 338ms (2.7x) ← sweet spot - 8 contexts: 458ms (VRAM contention) End-to-end 'qmd query' time: 7.5s → 3.7s Gracefully handles VRAM limits — if creating the Nth context fails, falls back to however many were successfully created.	2026-02-15 10:19:55 -05:00
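
The multi-context pattern described in commit 0a0e1e6f29 (split documents across several ranking contexts, evaluate them via Promise.all(), and fall back to however many contexts were created) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `RankContext`, `createContexts`, and `rankParallel` are hypothetical names, and `RankContext` is a stand-in interface rather than the node-llama-cpp API.

```typescript
// Hypothetical stand-in for a ranking context (assumption, not the node-llama-cpp API).
// Each context is assumed to process one document at a time.
interface RankContext {
  score(query: string, doc: string): Promise<number>;
}

// Try to create `wanted` contexts; if creating the Nth one fails
// (for example, out of VRAM), keep the ones that were created so far.
async function createContexts(
  wanted: number,
  factory: () => Promise<RankContext>,
): Promise<RankContext[]> {
  const contexts: RankContext[] = [];
  for (let i = 0; i < wanted; i++) {
    try {
      contexts.push(await factory());
    } catch {
      break; // stop at the first failure and use what we have
    }
  }
  return contexts;
}

// Assign documents round-robin to contexts. Each context works through its
// own share sequentially, while the contexts themselves run concurrently.
async function rankParallel(
  contexts: RankContext[],
  query: string,
  docs: string[],
): Promise<number[]> {
  if (contexts.length === 0) {
    throw new Error("no ranking context available");
  }
  const indexed = docs.map((doc, index) => ({ doc, index }));
  const scores = new Array<number>(docs.length).fill(0);
  await Promise.all(
    contexts.map(async (ctx, c) => {
      const share = indexed.filter(({ index }) => index % contexts.length === c);
      for (const { doc, index } of share) {
        scores[index] = await ctx.score(query, doc);
      }
    }),
  );
  return scores; // scores[i] corresponds to docs[i]
}
```

The commit's benchmark reports 4 contexts as the sweet spot on that machine, so a caller would presumably request 4 and accept fewer if allocation fails.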

234 Commits