ai-workspace-services/qmd

Author	SHA1	Message	Date
Haitao Pan	e3711767c6	fix: disable local qmd models by default	2026-05-23 11:04:48 +08:00
Haitao Pan	7c17c8bcce	feat: default to NVIDIA embeddings	2026-05-09 16:50:04 +08:00
Haitao Pan	fbad5791e3	feat: support NVIDIA embedding API	2026-05-09 16:44:47 +08:00
Haitao Pan	49fc83ebe2	Default embeddings to external API	2026-05-07 16:19:18 +08:00
Bek	e4990e470e	Harden embedding overflow handling	2026-04-10 16:02:46 -04:00
Tobias Lütke	171e9e3e65	Merge pull request #530 from kuishou68/fix-status-no-build-probe	2026-04-08 21:19:56 -04:00
Jeff Gardner	1ecb5c9f96	Fix QMD_LLAMA_GPU backend override handling	2026-04-07 18:49:22 +02:00
cocoon	26e3d0c077	fix(status): avoid build attempts during device probe	2026-04-07 23:18:58 +08:00
JohnRichardEnders	50ce17bbfa	feat(llm): resolve models as config > env > default. Separate hardcoded default from env var in DEFAULT_EMBED_MODEL so the constructor can resolve: config param > env var > hardcoded default. Also add env var support for QMD_GENERATE_MODEL and QMD_RERANK_MODEL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 18:00:08 -04:00
Tobi Lutke	55f16460d0	fix(ci): guard LLM calls in CI and increase test timeouts. Add _ciMode flag to LlamaCpp that throws immediately on embedBatch, generate, expandQuery, and rerank when CI=true, preventing silent 30s timeouts. Skip MCP HTTP Transport tests in CI (they instantiate a real LlamaCpp). Bump vitest/bun test timeouts to 60s for slower CI runners.	2026-03-10 13:28:37 -04:00
Tobi Lutke	e3549dab1a	perf(rerank): cap parallelism, deduplicate chunks, cache by content. Cap rerank contexts at 4 to avoid VRAM exhaustion on high-core machines; deduplicate identical chunk texts before sending to reranker; cache rerank scores by chunk content instead of file path, so the same text from different files now shares a single reranker call; add truncation cache to avoid re-tokenizing duplicate documents.	2026-03-07 15:57:36 -04:00
Brian Le	0dec1df047	fix(llm): make expansion context size configurable	2026-03-06 16:35:33 -05:00
Tobi Lütke	5233e676d9	fix(rerank): truncate documents exceeding 2048-token context size. node-llama-cpp throws a hard error when any document + query + template overhead exceeds the ranking context size. Truncate oversized documents using the rerank model's tokenizer before passing them to rankAll(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 12:41:59 -05:00
Tobi Lutke	870d3aed3b	test: move all tests to flat test/ directory. No more src/models/ and src/integration/ subfolders to forget about. All 9 test files live in test/, one command runs everything: npx vitest run test/ (or bun test test/).	2026-02-15 21:37:47 -04:00

14 Commits
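
Commit 50ce17bbfa describes resolving each model as config param > env var > hardcoded default. The sketch below illustrates that precedence only and is not the repository's code: the function name resolveModel, the placeholder model strings, and the choice to treat blank strings as unset are all assumptions. QMD_RERANK_MODEL is the one name taken from the commit message.

```typescript
// Illustrative sketch only: resolveModel and the model strings are hypothetical,
// not taken from the qmd source.

type Env = Record<string, string | undefined>;

// Treat undefined and blank strings as "not set" (an assumption of this sketch).
function normalize(value: string | undefined): string | undefined {
  if (value === undefined) return undefined;
  const trimmed = value.trim();
  return trimmed === "" ? undefined : trimmed;
}

// Precedence: explicit config value, then environment variable, then hardcoded default.
function resolveModel(
  configValue: string | undefined,
  envValue: string | undefined,
  hardcodedDefault: string,
): string {
  return normalize(configValue) ?? normalize(envValue) ?? hardcodedDefault;
}

// Usage with an explicit env object, so the example does not depend on process.env.
const exampleEnv: Env = { QMD_RERANK_MODEL: "env-rerank-model" };

const fromConfig = resolveModel("config-rerank-model", exampleEnv["QMD_RERANK_MODEL"], "default-rerank-model");
const fromEnv = resolveModel(undefined, exampleEnv["QMD_RERANK_MODEL"], "default-rerank-model");
const fromDefault = resolveModel(undefined, undefined, "default-rerank-model");

console.log(fromConfig, fromEnv, fromDefault);
```

Keeping the hardcoded default separate from the env lookup, as the commit message describes, is what lets a constructor argument override the environment instead of being masked by it.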