14 Commits

Author SHA1 Message Date
Haitao Pan
e3711767c6 fix: disable local qmd models by default 2026-05-23 11:04:48 +08:00
Haitao Pan
7c17c8bcce feat: default to NVIDIA embeddings 2026-05-09 16:50:04 +08:00
Haitao Pan
fbad5791e3 feat: support NVIDIA embedding API 2026-05-09 16:44:47 +08:00
Haitao Pan
49fc83ebe2 Default embeddings to external API 2026-05-07 16:19:18 +08:00
Bek
e4990e470e Harden embedding overflow handling 2026-04-10 16:02:46 -04:00
Tobias Lütke
171e9e3e65 Merge pull request #530 from kuishou68/fix-status-no-build-probe 2026-04-08 21:19:56 -04:00
Jeff Gardner
1ecb5c9f96 Fix QMD_LLAMA_GPU backend override handling 2026-04-07 18:49:22 +02:00
cocoon
26e3d0c077 fix(status): avoid build attempts during device probe 2026-04-07 23:18:58 +08:00
JohnRichardEnders
50ce17bbfa feat(llm): resolve models as config > env > default
Separate hardcoded default from env var in DEFAULT_EMBED_MODEL so the
constructor can resolve: config param > env var > hardcoded default.
Also add env var support for QMD_GENERATE_MODEL and QMD_RERANK_MODEL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:00:08 -04:00
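
The precedence described in commit 50ce17bbfa (config > env > default) can be sketched roughly as follows. This is a minimal illustration, not the project's code: `resolveModel`, `EnvVars`, and the placeholder default string are hypothetical names, and treating empty strings as unset is an assumption. Only the precedence order and the variable name `QMD_RERANK_MODEL` come from the commit message.

```typescript
// Hypothetical helper: only the precedence order is taken from the commit message.
type EnvVars = Record<string, string | undefined>;

function resolveModel(
  configValue: string | undefined,
  envVarName: string,
  hardcodedDefault: string,
  env: EnvVars,
): string {
  // 1. An explicit config parameter wins (assumption: blank strings count as unset).
  const fromConfig = configValue?.trim();
  if (fromConfig) return fromConfig;

  // 2. Otherwise fall back to the environment variable.
  const fromEnv = env[envVarName]?.trim();
  if (fromEnv) return fromEnv;

  // 3. Otherwise use the hardcoded default.
  return hardcodedDefault;
}

// Placeholder default, not the project's real model identifier.
const EXAMPLE_DEFAULT_RERANK_MODEL = "example-rerank-model";

const chosen = resolveModel(
  undefined,
  "QMD_RERANK_MODEL",
  EXAMPLE_DEFAULT_RERANK_MODEL,
  { QMD_RERANK_MODEL: "model-from-env" },
);
console.log(chosen);
```

Passing the environment in as an argument, instead of reading a global, keeps the helper easy to test; keeping the hardcoded default separate from the env lookup is what lets the config parameter override both.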
Tobi Lutke
55f16460d0 fix(ci): guard LLM calls in CI and increase test timeouts
Add _ciMode flag to LlamaCpp that throws immediately on embedBatch,
generate, expandQuery, and rerank when CI=true — prevents silent 30s
timeouts. Skip MCP HTTP Transport tests in CI (they instantiate a real
LlamaCpp). Bump vitest/bun test timeouts to 60s for slower CI runners.
2026-03-10 13:28:37 -04:00
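
The fail-fast guard described in commit 55f16460d0 might look something like the sketch below. The class name `CiGuardedLlm`, the helper `assertNotCi`, and the stand-in return values are hypothetical; only the `_ciMode` flag name, the `CI=true` condition, and the guarded method names `embedBatch` and `generate` come from the commit message. The real `LlamaCpp` class is not reproduced here.

```typescript
// Hypothetical stand-in for an LLM wrapper; the method bodies are placeholders.
class CiGuardedLlm {
  private readonly _ciMode: boolean;

  constructor(env: Record<string, string | undefined>) {
    // Assumption: the flag is derived once from the CI environment variable.
    this._ciMode = env["CI"] === "true";
  }

  private assertNotCi(operation: string): void {
    if (this._ciMode) {
      throw new Error(`${operation} is disabled when CI=true`);
    }
  }

  // Not declared `async`, so the guard throws synchronously
  // instead of returning a rejected promise.
  embedBatch(texts: string[]): Promise<number[][]> {
    this.assertNotCi("embedBatch");
    return Promise.resolve(texts.map((text) => [text.length]));
  }

  generate(prompt: string): Promise<string> {
    this.assertNotCi("generate");
    return Promise.resolve(`echo: ${prompt}`);
  }
}
```

Throwing before any model work starts means a test that accidentally reaches the LLM fails at once with a clear message, which is the behavior the commit contrasts with a silent 30s timeout.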
Tobi Lutke
e3549dab1a perf(rerank): cap parallelism, deduplicate chunks, cache by content
- Cap rerank contexts at 4 to avoid VRAM exhaustion on high-core machines
- Deduplicate identical chunk texts before sending to reranker
- Cache rerank scores by chunk content instead of file path — same text
  from different files now shares a single reranker call
- Add truncation cache to avoid re-tokenizing duplicate documents
2026-03-07 15:57:36 -04:00
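
The deduplication and content-keyed caching described in commit e3549dab1a can be illustrated with a small sketch. Everything here is hypothetical: `ContentKeyedRerankCache`, the `Scorer` type, and the synchronous scorer are stand-ins chosen to keep the example short, and the parallelism cap and truncation cache are not modeled.

```typescript
// Hypothetical scorer: returns one score per text, in the same order.
type Scorer = (query: string, texts: string[]) => number[];

interface Chunk {
  file: string;
  text: string;
}

class ContentKeyedRerankCache {
  private readonly scores = new Map<string, number>();
  private readonly scorer: Scorer;

  constructor(scorer: Scorer) {
    this.scorer = scorer;
  }

  rerank(query: string, chunks: Chunk[]): number[] {
    // The cache key depends on query and chunk text only, never on the file path.
    const keyOf = (text: string): string => JSON.stringify([query, text]);

    // Deduplicate identical texts, then keep only those not scored before.
    const uniqueTexts = [...new Set(chunks.map((chunk) => chunk.text))];
    const pending = uniqueTexts.filter((text) => !this.scores.has(keyOf(text)));

    if (pending.length > 0) {
      const fresh = this.scorer(query, pending);
      pending.forEach((text, index) => {
        const score = fresh[index];
        if (score === undefined) throw new Error("scorer returned too few scores");
        this.scores.set(keyOf(text), score);
      });
    }

    // Every chunk, including duplicates, reads its score from the cache.
    return chunks.map((chunk) => {
      const score = this.scores.get(keyOf(chunk.text));
      if (score === undefined) throw new Error("missing score");
      return score;
    });
  }
}
```

With this shape, two files containing the same text trigger a single scorer call, and a later query over a third file with that text is served from the cache.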
Brian Le
0dec1df047 fix(llm): make expansion context size configurable 2026-03-06 16:35:33 -05:00
Tobi Lütke
5233e676d9 fix(rerank): truncate documents exceeding 2048-token context size
node-llama-cpp throws a hard error when any document + query + template
overhead exceeds the ranking context size. Truncate oversized documents
using the rerank model's tokenizer before passing them to rankAll().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 12:41:59 -05:00
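
The truncation step described in commit 5233e676d9 could be sketched as below. The `Tokenizer` interface, the function `truncateToBudget`, and the `templateOverhead` value of 64 tokens are assumptions made for illustration; the commit only states that document + query + template overhead must fit in the 2048-token ranking context and that the rerank model's tokenizer is used to truncate.

```typescript
// Hypothetical tokenizer interface; a real model tokenizer would be plugged in here.
interface Tokenizer {
  tokenize(text: string): number[];
  detokenize(tokens: number[]): string;
}

function truncateToBudget(
  document: string,
  query: string,
  tokenizer: Tokenizer,
  contextSize = 2048,
  templateOverhead = 64, // assumed value, not taken from the commit
): string {
  // Tokens left for the document once the query and template are accounted for.
  const budget = contextSize - tokenizer.tokenize(query).length - templateOverhead;
  if (budget <= 0) return "";

  const tokens = tokenizer.tokenize(document);
  if (tokens.length <= budget) return document; // already fits, leave untouched

  return tokenizer.detokenize(tokens.slice(0, budget));
}

// Toy tokenizer for demonstration: one token per UTF-16 code unit.
const charTokenizer: Tokenizer = {
  tokenize: (text) => Array.from({ length: text.length }, (_, i) => text.charCodeAt(i)),
  detokenize: (tokens) => String.fromCharCode(...tokens),
};

console.log(truncateToBudget("abcdefghij", "qq", charTokenizer, 8, 2));
```

Truncating in token space, not by character count, is what keeps the result aligned with the limit the ranking context actually enforces.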
Tobi Lutke
870d3aed3b test: move all tests to flat test/ directory
No more src/models/ and src/integration/ subfolders to forget about.
All 9 test files live in test/, one command runs everything:

  npx vitest run test/
  bun test test/
2026-02-15 21:37:47 -04:00