ai-workspace-services/qmd

Author	SHA1	Message	Date
Tobias Lütke	e6b50cfca9	Merge pull request #308 from debugerman/fix/handelize-emoji-crash fix(store): handle emoji-only filenames in handelize (#302)	2026-03-07 14:24:59 -04:00
Brian Le	49d5b4f450	fix(index): deactivate stale docs on empty collection updates	2026-03-06 16:29:52 -05:00
Ning	dc777e3be0	fix(store): handle emoji-only filenames in handelize (#302). Convert emoji codepoints to hex representation (e.g. 🐘 → 1f418) instead of crashing, so files like 🐘.md can be indexed without halting the entire update process. Fixes #302	2026-03-06 14:24:24 +08:00
Tobi Lütke	5233e676d9	fix(rerank): truncate documents exceeding 2048-token context size. node-llama-cpp throws a hard error when any document + query + template overhead exceeds the ranking context size. Truncate oversized documents using the rerank model's tokenizer before passing them to rankAll(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 12:41:59 -05:00
Tobi Lutke	64ef25e1f6	Document query grammar and add skill helpers	2026-02-22 13:36:08 -04:00
Tobi Lutke	c7e8ea02a5	test: restructure container smoke tests for interactive use. Replaces the inner test script with an outer driver that runs individual podman/docker commands against a pre-built image. Tests sqlite-vec loading and store unit tests under both node and bun runtimes. Supports --build (image only), --shell (interactive), and -- CMD (arbitrary command) for debugging install issues in isolation.	2026-02-22 11:09:36 -04:00
Tobi Lutke	0b57711d32	refactor: replace bash wrapper with standard #!/usr/bin/env node shebang. The qmd bin was a custom bash script that discovered node via hardcoded fallback paths (mise, asdf, nvm, homebrew). This was nonstandard and caused ABI mismatches when installed via bun (native modules compiled for bun but executed with node). Now uses the standard npm bin convention: dist/qmd.js with a node shebang, added by the build script. The isMain guard resolves symlinks so it works when npm/bun create symlinked bin entries. Also converts all dynamic require() calls in tests to ESM imports, and adds container-based smoke tests (test/smoke-install.sh) that verify install + run under both node and bun via mise in a Debian container.	2026-02-22 11:09:36 -04:00
Tobi Lütke	3b87e3e224	feat: query document format, lex phrase/negation syntax, training data. The 'query document' is now a first-class concept in QMD: a structured document with typed sub-queries that combine for best recall. Query types: lex (BM25 keyword search with phrase and negation syntax); vec (semantic vector search for natural-language questions); hyde (hypothetical document: write the expected answer); expand (auto-expansion via local LLM, max 1, default for plain queries). Lex syntax, with full BM25 operator support: "exact phrase" (verbatim match, no prefix); -term (exclude documents containing term); -"exact phrase" (exclude documents containing phrase). Examples: "C++ performance" optimization -sports -athlete; "connection pool" timeout -redis; "machine learning" -sports -athlete. MCP tool description rewritten: the 'query' tool description now fully teaches AI agents the query document format, lex syntax, and strategy for combining types, with worked examples that include intent-aware lex (C++ performance, not sports), which is critical for disambiguation in dense corpora. Unit tests: 11 new lex parser tests covering plain terms, quoted phrases, negation, and combinations; intent-aware disambiguation (performance -sports -athlete); only-negation returning null (FTS5 constraint); and empty/whitespace handling. Training data: 12 new intent-aware examples for the next model training round, using real technical topics with lex phrase+negation combinations and covering C++ perf, Python memory, DB connections, rate limiting, SQL optimization, ML overfitting, Docker, JWT, async/await, git conflicts, Kubernetes, and React state. Each shows how context/intent shapes lex query construction (e.g. performance with C++ context → -sports -athlete exclusions).	2026-02-19 06:52:58 -05:00
Tobi Lütke	4649069e62	feat: add expand: type, rename to query, document syntax. BREAKING CHANGES: MCP tool renamed (structured_search → query); HTTP endpoint renamed (/search → /query). New features: expand: type auto-expands via local LLM (max 1 per query); docs/SYNTAX.md provides a formal grammar for query documents; lex syntax ("phrase", -negation) is documented. Query types: lex, vec, hyde, expand. Default (no prefix) = expand (backwards compatible).	2026-02-18 22:22:50 -05:00
Tobi Lütke	de3a83a553	refactor: remove OR operator from lex queries. Simplify to just terms, "phrases", and -negation.	2026-02-18 22:17:52 -05:00
Tobi Lütke	efb39616e6	feat(lex): add query syntax for exact phrases, negation, and OR. Lex queries now support: "exact phrase" (quoted exact matching, no prefix); -term or -"phrase" (exclude from results); term1 OR term2 (match either term). Semantic queries (vec/hyde) validate and reject these operators with helpful error messages. Examples: performance -sports → matches "performance" excluding "sports"; "machine learning" → exact phrase match; auth OR authentication → matches either term.	2026-02-18 22:14:09 -05:00
Tobi Lütke	19284ddb80	refactor(mcp): remove deprecated search tools, keep only structured_search. BREAKING CHANGE: MCP tools search, vector_search, and deep_search removed; use structured_search with lex/vec/hyde queries instead. Removes the search, vector_search, and deep_search MCP tool registrations; updates MCP instructions to focus on structured_search; updates skill docs to reflect the simplified API; renames test describes to reflect that they test store functions. CLI commands (qmd search, vsearch, query) are unchanged for backwards compatibility.	2026-02-18 21:50:25 -05:00
Tobi Lütke	db44e1a5bc	test: add comprehensive tests for structured search. 32 tests covering: the parseStructuredQuery parser (24 tests: plain queries returning null, single/multiple prefixed queries, mixed plain + prefixed lines, error on multiple plain lines, whitespace handling, edge cases such as colons in text); StructuredSubSearch type validation (3 tests); and structuredSearch function basics (5 tests: empty searches, no matches, limit/minScore options).	2026-02-18 21:39:40 -05:00
Tobi Lutke	648779a04d	fix(test): reset currentIndexName between test files. collections-config.test.ts set currentIndexName to "myindex" in its last test but restored only env vars in afterEach, not the module variable. Under bun test (single process), this leaked into mcp.test.ts, causing it to look for myindex.yml instead of index.yml. Fix: call setConfigIndexName("index") in afterEach, and add a defensive reset in mcp.test.ts beforeAll.	2026-02-18 15:53:58 -04:00
Tobi Lutke	640ac13cd0	fix: support multiple -c collection filters in search commands. Closes #191 (thanks @openclaw)	2026-02-16 14:03:53 -04:00
Tobi Lutke	8c2282c979	fix: respect XDG_CONFIG_HOME in collection config path. Closes #190 (thanks @openclaw)	2026-02-16 14:03:49 -04:00
Tobi Lutke	93f277c5e3	fix: MCP session support and cross-runtime test compat. mcp.ts: add sessionIdGenerator to HTTP transport (fixes the "stateless transport cannot be reused" error in CI); test-preload.ts: set a 30s default timeout for the bun test runner (matches vitest config, prevents CLI subprocess test timeouts); mcp.test.ts: use an == null check instead of toBeUndefined for the SQLite get() result (bun:sqlite returns null, better-sqlite3 returns undefined).	2026-02-15 21:54:25 -04:00
Tobi Lutke	edc9a87234	fix: correct test paths after moving to test/ directory. cli.test.ts: fix qmdScript path from <root>/qmd.ts to <root>/src/qmd.ts (broke when tests moved from src/integration/ to test/); mcp.test.ts: forward Mcp-Session-Id header per MCP Streamable HTTP spec.	2026-02-15 21:46:45 -04:00
Tobi Lutke	870d3aed3b	test: move all tests to flat test/ directory. No more src/models/ and src/integration/ subfolders to forget about. All 9 test files live in test/, and one command runs everything: npx vitest run test/ or bun test test/	2026-02-15 21:37:47 -04:00
Tobi Lutke	431f6e505b	Fix qmd embed crash and resolve all TypeScript errors. Fix ReferenceError in vectorIndex(): firstResult was used but never defined; added code to embed the first chunk to get the embedding dimensions. Fix 87 TypeScript errors across the codebase: formatter.ts (define MultiGetFile type locally; it was missing from store.ts); collections.ts (add non-null assertion for array access); mcp.ts (fix StatusResult type to match store.ts CollectionInfo, add list parameter to ResourceTemplate, fix undefined checks); qmd.ts (fix boolean/string type coercions, undefined array access); llm.test.ts (update expandQuery tests for Queryable[] return type, fix array access assertions); store.test.ts (add non-null assertions for array access in tests); eval-harness.ts (fix array access assertion).	2025-12-31 13:32:30 -04:00
Tobi Lutke	945d4b4572	Add 6 synthetic evaluation documents. Topics covered: API design principles, startup fundraising memo, distributed systems overview, product launch retrospective, machine learning primer, remote work policy. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-21 13:10:35 -04:00
Tobi Lutke	7828566333	Add evaluation harness with synthetic test documents. Includes 6 public-style documents covering diverse topics and 18 test queries: 6 easy (exact keyword matches), 6 medium (semantic/conceptual queries), and 6 hard (partial recall, indirect references). Measures Hit@1, Hit@3, and Hit@5 by difficulty, and tests both search (BM25) and query (hybrid) modes. Run: bun test/eval-harness.ts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-21 13:10:24 -04:00

22 Commits
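
Commit dc777e3be0 changes handelize so that emoji in a filename become hex codepoints (🐘 → 1f418) and no longer crash the update. Below is a minimal TypeScript sketch of that idea. It is not qmd's handelize. The name `slugify`, the lowercase step, and the dash-joining rules are all assumptions made for illustration.

```typescript
// Hypothetical sketch, not qmd's handelize: build a handle-safe slug from a
// filename stem, replacing each emoji with its hex codepoint instead of failing.
function slugify(stem: string): string {
  return stem
    // Each pictographic codepoint becomes "-<hex>-" so adjacent emoji stay separable.
    .replace(/\p{Extended_Pictographic}/gu, (ch) => `-${ch.codePointAt(0)!.toString(16)}-`)
    .toLowerCase()
    // Collapse any run of non-letter, non-digit characters into a single "-".
    .replace(/[^\p{L}\p{N}]+/gu, "-")
    // Trim leading and trailing dashes.
    .replace(/^-+|-+$/g, "");
}

console.log(slugify("🐘")); // "1f418"
console.log(slugify("Meeting Notes 🐘")); // "meeting-notes-1f418"
```

With this approach, an emoji-only stem still yields a non-empty handle. Joined emoji (ZWJ sequences, variation selectors) come out as dash-separated hex values.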
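
Commit 5233e676d9 truncates documents using the rerank model's tokenizer so that document + query + template overhead fits the 2048-token ranking context. The sketch below illustrates that budget arithmetic. It does not use node-llama-cpp. The `Tokenizer` interface stands in for the real model tokenizer, and the `templateOverhead` default of 32 tokens is an assumed value.

```typescript
// Hypothetical sketch: fit each document into a reranker's context window.
// Tokenizer is a stand-in for the rerank model's real tokenizer.
interface Tokenizer {
  tokenize(text: string): number[];
  detokenize(tokens: number[]): string;
}

function truncateForRerank(
  docs: string[],
  query: string,
  tok: Tokenizer,
  contextSize = 2048,
  templateOverhead = 32, // assumed allowance for the ranking prompt template
): string[] {
  // Tokens left for each document once the query and template are accounted for.
  const budget = contextSize - tok.tokenize(query).length - templateOverhead;
  if (budget <= 0) throw new Error("query alone exceeds the rerank context size");
  return docs.map((doc) => {
    const tokens = tok.tokenize(doc);
    // Leave short documents untouched; cut long ones at the token budget.
    return tokens.length <= budget ? doc : tok.detokenize(tokens.slice(0, budget));
  });
}
```

This version truncates by tokens rather than characters. A character cutoff can still overflow the context for token-dense text such as code or non-Latin scripts.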
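
Commits 4649069e62 and db44e1a5bc describe the query document rules:
- one typed sub-query per line (lex/vec/hyde/expand);
- unprefixed text defaults to expand, with at most one expand;
- an error on multiple plain lines;
- a single plain query returns null;
- colons inside the text are not treated as prefixes.

The sketch below shows one reading of those rules. `parseQueryDocument` and its types are hypothetical and are not qmd's parseStructuredQuery. The rejection of empty prefixed lines is an added assumption.

```typescript
// Hypothetical sketch of a query-document parser (not qmd's parseStructuredQuery).
type QueryType = "lex" | "vec" | "hyde" | "expand";
interface SubQuery { type: QueryType; query: string }

function parseQueryDocument(input: string): SubQuery[] | null {
  const lines = input.split("\n").map((l) => l.trim()).filter((l) => l.length > 0);
  const subs: SubQuery[] = [];
  let plainLines = 0;
  for (const line of lines) {
    // Only a known type plus ":" at the start of a line counts as a prefix,
    // so colons elsewhere in the text are left alone.
    const m = /^(lex|vec|hyde|expand):\s*(.*)$/.exec(line);
    if (m) {
      const query = (m[2] ?? "").trim();
      if (query === "") throw new Error(`empty ${m[1]}: line`); // assumed rule
      subs.push({ type: m[1] as QueryType, query });
    } else {
      // Unprefixed text defaults to expand.
      plainLines++;
      subs.push({ type: "expand", query: line });
    }
  }
  if (plainLines > 1) throw new Error("only one unprefixed line is allowed");
  if (subs.filter((s) => s.type === "expand").length > 1) {
    throw new Error("at most one expand sub-query is allowed");
  }
  // A purely plain query (or empty input) is not a query document.
  return plainLines === lines.length ? null : subs;
}
```

Returning null for plain input lets a caller keep its existing single-query path and only switch to structured search when at least one typed line is present.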
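
The lex syntax in 3b87e3e224 and de3a83a553 consists of terms, "phrases", and -negation, with "only-negation returns null (FTS5 constraint)". The sketch below shows one way such a query could compile to an SQLite FTS5 MATCH expression. FTS5's NOT is binary, so exclusions need something positive to subtract from. `buildFts5Query` is a hypothetical name. Two behaviours are assumptions suggested by the "no prefix" note on phrases: bare terms get prefix matching, and exclusions do not.

```typescript
// Hypothetical sketch: compile a lex query into an SQLite FTS5 MATCH string.
type LexToken = { text: string; phrase: boolean; negated: boolean };

function tokenizeLex(query: string): LexToken[] {
  const tokens: LexToken[] = [];
  // Alt 1: optional "-" then a double-quoted phrase; alt 2: optional "-" then a bare word.
  const re = /(-?)"([^"]*)"|(-?)(\S+)/g;
  for (const m of query.matchAll(re)) {
    const phrase = m[2] !== undefined;
    const text = ((phrase ? m[2] : m[4]) ?? "").trim();
    const negated = (phrase ? m[1] : m[3]) === "-";
    // Drop tokens with no letters or digits, e.g. a lone "-" or an empty "".
    if (!/[\p{L}\p{N}]/u.test(text)) continue;
    tokens.push({ text, phrase, negated });
  }
  return tokens;
}

// FTS5 string literal: wrap in double quotes, doubling embedded quotes.
const quote = (s: string): string => `"${s.replace(/"/g, '""')}"`;

function buildFts5Query(query: string): string | null {
  const tokens = tokenizeLex(query);
  // Assumption: bare terms are prefix matches ("term"*); phrases are verbatim.
  const positives = tokens
    .filter((t) => !t.negated)
    .map((t) => (t.phrase ? quote(t.text) : `${quote(t.text)}*`));
  const negatives = tokens.filter((t) => t.negated).map((t) => quote(t.text));
  // FTS5's NOT is binary, so a query of only exclusions has nothing to subtract from.
  if (positives.length === 0) return null;
  let expr = `(${positives.join(" ")})`;
  for (const n of negatives) expr += ` NOT ${n}`;
  return expr;
}

console.log(buildFts5Query('"connection pool" timeout -redis'));
// ("connection pool" "timeout"*) NOT "redis"
```

Quoting every token also keeps terms such as C++ from being read as FTS5 operators or barewords.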