Separate hardcoded default from env var in DEFAULT_EMBED_MODEL so the
constructor can resolve: config param > env var > hardcoded default.
Also add env var support for QMD_GENERATE_MODEL and QMD_RERANK_MODEL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
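
The resolution order described in that commit (config param > env var > hardcoded default) could be sketched roughly as follows. This is an illustration, not the project's code: `resolveModel`, the `Env` type, and the example model names are hypothetical. Only the env var names QMD_GENERATE_MODEL and QMD_RERANK_MODEL come from the message.

```typescript
// Hypothetical helper; the real constructor may be structured differently.
type Env = Record<string, string | undefined>;

function resolveModel(
  configValue: string | undefined,
  envVarName: string,
  hardcodedDefault: string,
  env: Env,
): string {
  // Assumption: empty strings count as "unset", hence || rather than ??.
  return configValue || env[envVarName] || hardcodedDefault;
}

// Example: the env var wins over the default, and config wins over both.
const exampleEnv: Env = { QMD_RERANK_MODEL: "rerank-from-env" };
const rerankModel = resolveModel(undefined, "QMD_RERANK_MODEL", "rerank-default", exampleEnv);
const generateModel = resolveModel("generate-from-config", "QMD_GENERATE_MODEL", "generate-default", exampleEnv);
```

Keeping the hardcoded default as a separate constant means the env var is read at resolution time, so it cannot shadow an explicit config value.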
Add _ciMode flag to LlamaCpp that throws immediately on embedBatch,
generate, expandQuery, and rerank when CI=true — prevents silent 30s
timeouts. Skip MCP HTTP Transport tests in CI (they instantiate a real
LlamaCpp). Bump vitest/bun test timeouts to 60s for slower CI runners.
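
A minimal sketch of the fail-fast idea, assuming the flag is derived from the CI environment variable once and checked at the top of each inference method. `makeCiGuard` and `embedBatchSketch` are hypothetical names, not the actual LlamaCpp members.

```typescript
// Hypothetical guard factory; the real class keeps a _ciMode flag instead.
function makeCiGuard(env: Record<string, string | undefined>): (operation: string) => void {
  const ciMode = env["CI"] === "true";
  return (operation: string): void => {
    if (ciMode) {
      // Fail right away instead of waiting for a model load to time out.
      throw new Error(`${operation} is not available when CI=true`);
    }
  };
}

// Sketch of a guarded method: the check runs before any model work starts.
async function embedBatchSketch(
  guard: (operation: string) => void,
  texts: string[],
): Promise<number[][]> {
  guard("embedBatch");
  return texts.map(() => []); // placeholder for the real embedding call
}
```

The same guard call would sit at the top of generate, expandQuery, and rerank.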
- Cap rerank contexts at 4 to avoid VRAM exhaustion on high-core machines
- Deduplicate identical chunk texts before sending to reranker
- Cache rerank scores by chunk content instead of file path, so the same
  text from different files shares a single reranker call
- Add truncation cache to avoid re-tokenizing duplicate documents
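
The deduplication and content-keyed caching in the list above might look something like this. It is a synchronous sketch with hypothetical names (`rerankUnique`, `Scorer`); the real reranker call is presumably asynchronous. Including the query in the cache key is an assumption, since the message only says scores are cached by chunk content.

```typescript
// Hypothetical scorer: returns one score per input text, in order.
type Scorer = (query: string, texts: string[]) => number[];

function cacheKey(query: string, text: string): string {
  return JSON.stringify([query, text]); // assumption: key includes the query
}

function rerankUnique(
  query: string,
  texts: string[],
  cache: Map<string, number>,
  scoreBatch: Scorer,
): number[] {
  // Drop duplicate texts, then drop anything already scored for this query.
  const pending = Array.from(new Set(texts)).filter(
    (text) => !cache.has(cacheKey(query, text)),
  );
  if (pending.length > 0) {
    const scores = scoreBatch(query, pending);
    pending.forEach((text, i) => {
      const score = scores[i];
      if (score === undefined) throw new Error("scorer returned too few scores");
      cache.set(cacheKey(query, text), score);
    });
  }
  // Map every original text (duplicates included) back to its cached score.
  return texts.map((text) => cache.get(cacheKey(query, text)) as number);
}
```

Because the key is the text rather than the file path, two files containing the same chunk cost one reranker call between them.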
node-llama-cpp throws a hard error when any document + query + template
overhead exceeds the ranking context size. Truncate oversized documents
using the rerank model's tokenizer before passing them to rankAll().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
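
As a rough illustration of the truncation step in that commit: compute the token budget left after the query and template overhead, then cut the document to fit. The `Tokenizer` interface and `truncateToBudget` are hypothetical stand-ins for whatever the rerank model's tokenizer exposes, and the overhead value is supplied by the caller rather than taken from the source.

```typescript
// Hypothetical tokenizer shape; not the node-llama-cpp API.
interface Tokenizer {
  tokenize(text: string): number[];
  detokenize(tokens: number[]): string;
}

function truncateToBudget(
  doc: string,
  queryTokenCount: number,
  templateOverhead: number,
  contextSize: number,
  tokenizer: Tokenizer,
): string {
  // Tokens left for the document once query and template are accounted for.
  const budget = contextSize - queryTokenCount - templateOverhead;
  if (budget <= 0) {
    throw new Error("query and template overhead alone exceed the context size");
  }
  const tokens = tokenizer.tokenize(doc);
  if (tokens.length <= budget) return doc; // already fits, leave untouched
  return tokenizer.detokenize(tokens.slice(0, budget));
}
```

Truncating in token space rather than by character count keeps the result under the limit regardless of how the tokenizer splits the text.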
No more src/models/ and src/integration/ subfolders to forget about.
All 9 test files live in test/, and one command per runner runs everything:
npx vitest run test/
bun test test/