Convert emoji codepoints to hex representation (e.g. 🐘 → 1f418) instead
of crashing, so files like 🐘.md can be indexed without halting the
entire update process.
Fixes #302
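Here is a minimal sketch of the conversion. It assumes every emoji in a filename is a single code point matched by the Unicode Extended_Pictographic property. The helper name emojiToHex is hypothetical, not QMD's actual function:

```typescript
// Hypothetical helper: replace each pictographic code point with its
// lowercase hex form, e.g. "🐘.md" -> "1f418.md". The `u` flag makes the
// regex match whole code points, so surrogate pairs are handled correctly.
function emojiToHex(input: string): string {
  return input.replace(/\p{Extended_Pictographic}/gu, (ch) =>
    ch.codePointAt(0)!.toString(16),
  );
}
```

This sketch concatenates adjacent emoji with no separator. It also only partly converts multi-code-point sequences (ZWJ sequences, variation selectors), so a real implementation may need a separator or grapheme-level handling.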
node-llama-cpp throws a hard error when any document + query + template
overhead exceeds the ranking context size. Truncate oversized documents
using the rerank model's tokenizer before passing them to rankAll().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
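Here is a sketch of the truncation step. It assumes a tokenizer that exposes tokenize/detokenize over numeric token IDs, which is an assumed interface and not necessarily node-llama-cpp's exact API. It also assumes the caller supplies an estimate of the template overhead in tokens. truncateForRerank is a hypothetical name:

```typescript
// Assumed tokenizer shape; the real node-llama-cpp API may differ.
interface Tokenizer {
  tokenize(text: string): number[];
  detokenize(tokens: number[]): string;
}

// Trim each document so that query + document + template overhead fits in
// the rerank context. Documents already within budget are returned unchanged.
function truncateForRerank(
  tokenizer: Tokenizer,
  query: string,
  documents: string[],
  contextSize: number,
  templateOverhead: number, // hypothetical: tokens used by the rerank template
): string[] {
  const budget = contextSize - tokenizer.tokenize(query).length - templateOverhead;
  if (budget <= 0) {
    throw new Error("query and template alone exceed the rerank context size");
  }
  return documents.map((doc) => {
    const tokens = tokenizer.tokenize(doc);
    return tokens.length <= budget ? doc : tokenizer.detokenize(tokens.slice(0, budget));
  });
}
```

Detokenizing and re-tokenizing does not always round-trip exactly, so a real implementation may want to keep a few tokens of slack below the computed budget.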
Replaces the inner test script with an outer driver that runs individual
podman/docker commands against a pre-built image. It tests sqlite-vec
loading and runs the store unit tests under both node and bun runtimes.
Supports --build (image only), --shell (interactive), and -- CMD
(arbitrary command) for debugging install issues in isolation.
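The following TypeScript model makes the driver's interface concrete by showing its argument handling. It is illustrative only, because the real driver is a shell script. The mode names, the use of bash for --shell, and the image name in the usage are assumptions:

```typescript
// Illustrative model of the smoke-test driver's CLI modes.
type Engine = "podman" | "docker";

type DriverMode =
  | { kind: "test" } // default: run the smoke tests in the image
  | { kind: "build" } // --build: build the image only
  | { kind: "shell" } // --shell: interactive shell in the image
  | { kind: "command"; argv: string[] }; // -- CMD: run an arbitrary command

function parseDriverArgs(args: string[]): DriverMode {
  const sep = args.indexOf("--");
  if (sep !== -1) {
    const argv = args.slice(sep + 1);
    if (argv.length === 0) throw new Error("expected a command after --");
    return { kind: "command", argv };
  }
  if (args.includes("--build")) return { kind: "build" };
  if (args.includes("--shell")) return { kind: "shell" };
  return { kind: "test" };
}

// Container invocation for the modes that run inside the pre-built image.
function containerArgs(engine: Engine, image: string, mode: DriverMode): string[] | null {
  switch (mode.kind) {
    case "shell":
      return [engine, "run", "--rm", "-it", image, "bash"];
    case "command":
      return [engine, "run", "--rm", image, ...mode.argv];
    default:
      return null; // build and test are separate steps not modeled here
  }
}
```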
The qmd bin was a custom bash script that discovered node via hardcoded
fallback paths (mise, asdf, nvm, homebrew). This was nonstandard and
caused ABI mismatches when installed via bun (native modules compiled
for bun but executed with node).
Now uses the standard npm bin convention: dist/qmd.js with a node
shebang, added by the build script. The isMain guard resolves symlinks
so it works when npm/bun create symlinked bin entries.
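Here is a minimal sketch of both pieces, assuming an ESM entry point. isMainModule and withNodeShebang are hypothetical names. The guard passes when argv[1] is a symlink created by npm or bun because it compares realpaths on both sides:

```typescript
import { realpathSync } from "node:fs";
import { fileURLToPath } from "node:url";

// True when the module at moduleUrl is the script Node was asked to run,
// even if argv1 is a symlink (as npm/bun create for bin entries).
// Intended use in the entry point:
//   if (isMainModule(import.meta.url, process.argv[1])) main();
function isMainModule(moduleUrl: string, argv1: string | undefined): boolean {
  if (!argv1) return false;
  try {
    return realpathSync(argv1) === realpathSync(fileURLToPath(moduleUrl));
  } catch {
    return false; // e.g. one of the paths does not exist on disk
  }
}

// Build-step sketch: prepend a node shebang unless one is already present.
function withNodeShebang(source: string): string {
  return source.startsWith("#!") ? source : `#!/usr/bin/env node\n${source}`;
}
```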
Also converts all dynamic require() calls in tests to ESM imports, and
adds container-based smoke tests (test/smoke-install.sh) that verify
install + run under both node and bun via mise in a Debian container.
The 'query document' is now a first-class concept in QMD: a structured
document with typed sub-queries that combine for best recall.
## Query types
- lex: BM25 keyword search with phrase and negation syntax
- vec: Semantic vector search (natural language questions)
- hyde: Hypothetical document (write the expected answer)
- expand: Auto-expand via local LLM (max 1, default for plain queries)
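Here is a sketch of a query document as TypeScript types. It implements the plain-query default and the max-one expand rule from the list above. The names SubQuery and toQueryDocument are hypothetical:

```typescript
type SubQueryType = "lex" | "vec" | "hyde" | "expand";

interface SubQuery {
  type: SubQueryType;
  query: string;
}

// A plain string becomes a single expand sub-query (the default for plain
// queries); structured input may contain at most one expand entry.
function toQueryDocument(input: string | SubQuery[]): SubQuery[] {
  if (typeof input === "string") {
    return [{ type: "expand", query: input }];
  }
  const expands = input.filter((q) => q.type === "expand").length;
  if (expands > 1) {
    throw new Error(`at most one expand sub-query is allowed, got ${expands}`);
  }
  return input;
}
```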
## Lex syntax
Full BM25 operator support:
- "exact phrase": verbatim match, no prefix
- -term: exclude documents containing term
- -"exact phrase": exclude documents containing phrase
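Here is a sketch of how this syntax could map onto an SQLite FTS5 MATCH expression. The following are assumptions and do not describe QMD's actual parser:
- the tokenizing regex
- the buildFts5Query name
- prefix matching only for plain positive terms
- dropping stray quote characters

Returning null for negation-only input mirrors the FTS5 constraint that NOT needs a positive left operand:

```typescript
// Sketch: turn a lex query into an FTS5 MATCH expression.
function buildFts5Query(input: string): string | null {
  // Split into -"phrase", "phrase", or bare whitespace-delimited tokens.
  const tokens = input.match(/-?"[^"]*"|\S+/g) ?? [];
  const positive: string[] = [];
  const negative: string[] = [];

  for (const raw of tokens) {
    const negated = raw.length > 1 && raw.startsWith("-");
    const body = negated ? raw.slice(1) : raw;
    const isPhrase = body.length >= 2 && body.startsWith('"') && body.endsWith('"');
    // Drop stray quotes so user input cannot break FTS5 string syntax.
    const text = (isPhrase ? body.slice(1, -1) : body).replace(/"/g, "").trim();
    if (text === "") continue;

    if (negated) {
      negative.push(`"${text}"`); // exclusions match exactly, no prefix
    } else {
      positive.push(isPhrase ? `"${text}"` : `"${text}"*`); // "term"* = prefix
    }
  }

  // FTS5's NOT is binary, so a query with only exclusions cannot be expressed.
  if (positive.length === 0) return null;

  let query = `(${positive.join(" ")})`;
  for (const term of negative) query += ` NOT ${term}`;
  return query;
}
```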
Examples:
"C++ performance" optimization -sports -athlete
"connection pool" timeout -redis
"machine learning" -sports -athlete
## MCP tool description rewritten
The 'query' tool description now fully teaches AI agents the query
document format, lex syntax, and strategy for combining types.
Includes worked examples, among them intent-aware lex (C++ performance,
not sports), which is critical for disambiguation in dense corpora.
## Unit tests
11 new lex parser tests covering:
- plain terms, quoted phrases, negation, combined
- intent-aware disambiguation (performance -sports -athlete)
- only-negation returns null (FTS5 constraint)
- empty/whitespace handling
## Training data
12 new intent-aware examples for the next model training round:
- Real technical topics with lex phrase+negation combinations
- Covers: C++ perf, Python memory, DB connections, rate limiting,
SQL optimization, ML overfitting, Docker, JWT, async/await,
git conflicts, Kubernetes, React state
- Each shows how context/intent shapes lex query construction
(e.g. performance with C++ context → -sports -athlete exclusions)
Lex queries now support:
- "exact phrase": quoted exact matching (no prefix)
- -term or -"phrase": exclude from results
- term1 OR term2: match either term
Semantic queries (vec/hyde) validate and reject these operators
with helpful error messages.
Examples:
performance -sports → matches "performance" excluding "sports"
"machine learning" → exact phrase match
auth OR authentication → matches either term
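Here is a sketch of the vec/hyde validation. It assumes simple patterns are enough to detect the three operator forms above. validateSemanticQuery and its messages are hypothetical:

```typescript
// Return an error message if a vec/hyde query uses lex-only operators,
// or null if it looks like plain natural language.
function validateSemanticQuery(query: string): string | null {
  if (query.includes('"')) {
    return 'Quoted phrases are lex syntax; use a lex query for "exact phrase" matching.';
  }
  // A dash at the start of a word, so "state-of-the-art" is not flagged.
  if (/(^|\s)-\S/.test(query)) {
    return "Negation (-term) is lex syntax; move exclusions to a lex query.";
  }
  // Uppercase OR as a standalone word; lowercase "or" is ordinary prose.
  if (/\bOR\b/.test(query)) {
    return "OR is lex syntax; write the question in natural language instead.";
  }
  return null;
}
```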
BREAKING CHANGE: MCP tools search, vector_search, deep_search removed.
Use structured_search with lex/vec/hyde queries instead.
- Remove search, vector_search, deep_search MCP tool registrations
- Update MCP instructions to focus on structured_search
- Update skill docs to reflect simplified API
- Rename test describes to reflect they test store functions
- CLI commands (qmd search, vsearch, query) remain unchanged for backwards compatibility
collections-config.test.ts set currentIndexName to "myindex" in its
last test, but its afterEach only restored env vars, not the module
variable. Under bun test (single process), this leaked into mcp.test.ts,
causing it to look for myindex.yml instead of index.yml.
Fix: call setConfigIndexName("index") in afterEach to reset it, and add a
defensive reset in mcp.test.ts beforeAll.
- mcp.ts: add sessionIdGenerator to HTTP transport (fixes "stateless
transport cannot be reused" error in CI)
- test-preload.ts: set 30s default timeout for bun test runner (matches
vitest config, prevents CLI subprocess test timeouts)
- mcp.test.ts: use == null check instead of toBeUndefined for SQLite
get() result (bun:sqlite returns null, better-sqlite3 returns undefined)
- cli.test.ts: fix qmdScript path from <root>/qmd.ts to <root>/src/qmd.ts
(broke when tests moved from src/integration/ to test/)
- mcp.test.ts: forward Mcp-Session-Id header per MCP Streamable HTTP spec
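Here is a sketch of the header handling from the last bullet. It follows the MCP Streamable HTTP rule that a client must echo a session ID returned in the Mcp-Session-Id header on subsequent requests. The helper names are hypothetical. The isMissing helper shows why == null suits the SQLite case: it is true for both null (bun:sqlite) and undefined (better-sqlite3):

```typescript
// Minimal structural type so the sketch works with fetch's Response or a stub.
interface HasHeaders {
  headers: { get(name: string): string | null };
}

// Copy the session ID assigned by the server (typically on the initialize
// response) into the headers for the next request.
function withSessionHeader(
  previous: HasHeaders,
  headers: Record<string, string> = {},
): Record<string, string> {
  const sessionId = previous.headers.get("mcp-session-id");
  return sessionId ? { ...headers, "Mcp-Session-Id": sessionId } : { ...headers };
}

// Loose equality is true for both null and undefined, but not for 0 or "".
function isMissing(row: unknown): boolean {
  return row == null;
}
```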
No more src/models/ and src/integration/ subfolders to forget about.
All 9 test files now live in test/, so a single command per runner runs everything:
npx vitest run test/
bun test test/
- Fix ReferenceError in vectorIndex(): firstResult was used but never
  defined. Added code that embeds the first chunk to determine the
  embedding dimensions.
- Fix 87 TypeScript errors across the codebase:
  - formatter.ts: Define MultiGetFile type locally (was missing from store.ts)
  - collections.ts: Add non-null assertion for array access
  - mcp.ts: Fix StatusResult type to match store.ts CollectionInfo,
    add list parameter to ResourceTemplate, fix undefined checks
  - qmd.ts: Fix boolean/string type coercions, undefined array access
  - llm.test.ts: Update expandQuery tests for Queryable[] return type,
    fix array access assertions
  - store.test.ts: Add non-null assertions for array access in tests
  - eval-harness.ts: Fix array access assertion
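The vectorIndex() ReferenceError fix in the first bullet can be sketched as three steps: embed the first chunk, read the vector length, and size the sqlite-vec vec0 table from it. The Embed type, planVectorTable, and the vectors_vec table name are hypothetical. Only the vec0(embedding float[N]) column syntax comes from sqlite-vec:

```typescript
// Hypothetical embedding function type; QMD's real embedding API may differ.
type Embed = (text: string) => Promise<number[]>;

interface VectorTablePlan {
  dims: number; // vector dimension reported by the model
  first: number[]; // first chunk's embedding, reusable so it is not computed twice
  ddl: string; // sqlite-vec table definition sized to dims
}

// Embed the first chunk once to discover the dimension, then build the DDL.
async function planVectorTable(chunks: string[], embed: Embed): Promise<VectorTablePlan> {
  const firstChunk = chunks[0];
  if (firstChunk === undefined) throw new Error("no chunks to embed");
  const first = await embed(firstChunk);
  const dims = first.length;
  return {
    dims,
    first,
    ddl: `CREATE VIRTUAL TABLE IF NOT EXISTS vectors_vec USING vec0(embedding float[${dims}])`,
  };
}
```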