469 Commits

Author SHA1 Message Date
Tobi Lutke
da9cf691fd
release: v1.1.5 2026-03-07 20:16:32 -04:00
Tobi Lutke
66fb5b1d98
docs: write changelog for 1.1.5 2026-03-07 20:15:54 -04:00
Tobi Lutke
ad38c1f698
feat: add intent parameter for query disambiguation
Add optional `intent` parameter that steers query expansion, reranking,
chunk selection, and snippet extraction without searching on its own.

When a query like "performance" is ambiguous (web-perf vs team health vs
fitness), intent provides background context that disambiguates results
across all pipeline stages:

- expandQuery: includes intent in LLM prompt ("Query intent: {intent}")
- rerank: prepends intent to rerank query for Qwen3-Reranker
- chunk selection: intent terms scored at 0.5x weight vs query terms
- snippet extraction: intent terms scored at 0.3x weight
- strong-signal bypass: disabled when intent provided

Available via CLI (--intent flag or intent: line in query documents),
MCP (intent field on query tool), and programmatic API.

Adapted from PR #180 (thanks @vyalamar).
2026-03-07 19:27:29 -04:00
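The chunk-selection weighting described in ad38c1f698 can be illustrated with a minimal sketch. The names `tokenize`, `scoreChunk`, and `selectBestChunk` are hypothetical and not taken from the qmd source; only the 0.5x intent weight comes from the commit message, and skipping intent terms that already appear in the query is an assumption.

```typescript
// Hypothetical helpers; none of these names come from the qmd codebase.
const QUERY_WEIGHT = 1.0;
const INTENT_WEIGHT = 0.5; // chunk-selection weight stated in the commit message

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Score a chunk by distinct matching terms: query terms count fully,
// intent terms count at half weight (assumption: terms already in the
// query are not counted a second time as intent terms).
function scoreChunk(chunk: string, query: string, intent?: string): number {
  const chunkTerms = new Set(tokenize(chunk));
  const queryTerms = new Set(tokenize(query));
  let score = 0;
  queryTerms.forEach((term) => {
    if (chunkTerms.has(term)) score += QUERY_WEIGHT;
  });
  if (intent) {
    new Set(tokenize(intent)).forEach((term) => {
      if (!queryTerms.has(term) && chunkTerms.has(term)) score += INTENT_WEIGHT;
    });
  }
  return score;
}

// Pick the highest-scoring chunk; ties keep the earliest chunk.
function selectBestChunk(chunks: string[], query: string, intent?: string): string | undefined {
  let best: string | undefined;
  let bestScore = -1;
  for (const chunk of chunks) {
    const s = scoreChunk(chunk, query, intent);
    if (s > bestScore) {
      best = chunk;
      bestScore = s;
    }
  }
  return best;
}
```

With the ambiguous query "performance", both a team-health chunk and a web-performance chunk tie on the query term alone; an intent such as "web page latency" tips the selection toward the web chunk without the intent terms acting as a search by themselves.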
Tobi Lutke
b838f74c8c
release: v1.1.2 2026-03-07 15:58:28 -04:00
Tobi Lutke
0ff9bec129
docs: write changelog for 1.1.2 2026-03-07 15:58:14 -04:00
Tobi Lutke
e3549dab1a
perf(rerank): cap parallelism, deduplicate chunks, cache by content
- Cap rerank contexts at 4 to avoid VRAM exhaustion on high-core machines
- Deduplicate identical chunk texts before sending to reranker
- Cache rerank scores by chunk content instead of file path — same text
  from different files now shares a single reranker call
- Add truncation cache to avoid re-tokenizing duplicate documents
2026-03-07 15:57:36 -04:00
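The deduplication and content-keyed caching from e3549dab1a might look roughly like the sketch below. `Scorer` and `ContentRerankCache` are hypothetical stand-ins for the real reranker call and cache, and including the query in the cache key is an assumption rather than something the commit states.

```typescript
// Hypothetical sketch; `Scorer` stands in for the real reranker call.
type Scorer = (query: string, texts: string[]) => number[];

class ContentRerankCache {
  private scores = new Map<string, number>();

  // Key by query plus chunk text, so the file path plays no role.
  // (Including the query in the key is an assumption.)
  private key(query: string, text: string): string {
    return JSON.stringify([query, text]);
  }

  rerank(query: string, chunks: string[], scorer: Scorer): number[] {
    // Collect each distinct, not-yet-cached text exactly once.
    const pending: string[] = [];
    const seen = new Set<string>();
    chunks.forEach((text) => {
      const k = this.key(query, text);
      if (!this.scores.has(k) && !seen.has(k)) {
        seen.add(k);
        pending.push(text);
      }
    });
    // Only the deduplicated, uncached texts reach the reranker.
    if (pending.length > 0) {
      const fresh = scorer(query, pending);
      pending.forEach((text, i) => {
        this.scores.set(this.key(query, text), fresh[i] ?? 0);
      });
    }
    // Map scores back onto the original chunk order, duplicates included.
    return chunks.map((text) => this.scores.get(this.key(query, text)) ?? 0);
  }
}
```

Two files that contain the same chunk text produce the same key, so the second occurrence costs no additional reranker call.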
Tobi Lutke
44d7145bfe
Merge pull request #242 from vyalamar/feat/query-explain-score-traces
feat(query): add --explain score traces for hybrid retrieval
2026-03-07 14:35:26 -04:00
vyalamar
b068ad0dd6
feat(query): add --explain score traces for hybrid search 2026-03-07 14:35:10 -04:00
Tobias Lütke
7904ab9a9d
Merge pull request #273 from daocoding/feature/configurable-embed-model
feat: add QMD_EMBED_MODEL env var for multilingual embedding support
2026-03-07 14:28:59 -04:00
Tobias Lütke
cb5d84ff07
Merge pull request #225 from ilepn/fix/sqlite-vec-windows-package-name
fix(package.json): correct Windows sqlite-vec package name + add linux-arm64
2026-03-07 14:28:55 -04:00
Tobias Lütke
a4b641d8e3
Merge pull request #255 from pandysp/feat/expose-candidate-limit
feat: expose candidateLimit as MCP tool parameter and CLI flag
2026-03-07 14:28:52 -04:00
Tobias Lütke
e3bc5ccdc3
Merge pull request #286 from joelev/fix/multi-session-http
fix: support multiple concurrent HTTP clients
2026-03-07 14:28:50 -04:00
Tobias Lütke
8bd93366ad
Merge pull request #228 from amsminn/fix-empty-results-format
fix(cli): prevent parser breakage on empty results across output formats
2026-03-07 14:25:16 -04:00
Tobias Lütke
0b3fb07a8f
Merge pull request #230 from Balneario-de-Cofrentes/fix/tty-progress-guard
fix(cli): suppress progress bars when not TTY
2026-03-07 14:25:13 -04:00
Tobias Lütke
271feb7791
Merge pull request #253 from jimmynail/fix/skip-unreadable-files
fix: skip unreadable files during indexing instead of crashing
2026-03-07 14:25:11 -04:00
Tobias Lütke
ee08997f23
Merge pull request #313 from 0xble/fix/expand-context-size-config
fix(llm): make query expansion context size configurable
2026-03-07 14:25:04 -04:00
Tobias Lütke
a28163fb2c
Merge pull request #304 from sebkouba/feature/collection-ignore
feat: add ignore patterns for collections
2026-03-07 14:25:02 -04:00
Tobias Lütke
e6b50cfca9
Merge pull request #308 from debugerman/fix/handelize-emoji-crash
fix(store): handle emoji-only filenames in handelize (#302)
2026-03-07 14:24:59 -04:00
Tobias Lütke
72e96d16c0
Merge pull request #312 from 0xble/fix/empty-collection-deactivate
fix(index): deactivate stale docs on empty collection updates
2026-03-07 14:24:57 -04:00
Tobias Lütke
f75c668e46
Merge pull request #310 from giladgd/nodeLlamaCppUseBuildAutoAttempt
feat: use `build: "autoAttempt"` on `getLlama`
2026-03-07 14:24:52 -04:00
Tobias Lütke
6934c464db
Merge pull request #311 from gi11es/patch-1
Fix claude plugin setup syntax
2026-03-07 14:24:49 -04:00
Gilad S.
607ab7a402
fix: remove unused config 2026-03-07 07:48:13 +02:00
Brian Le
0dec1df047
fix(llm): make expansion context size configurable 2026-03-06 16:35:33 -05:00
Brian Le
49d5b4f450
fix(index): deactivate stale docs on empty collection updates 2026-03-06 16:29:52 -05:00
Tobi Lutke
2ae1baba2f
release: v1.1.1 2026-03-06 14:48:47 -04:00
Tobi Lutke
60721658c0
docs: write changelog for 1.1.1 2026-03-06 14:48:35 -04:00
Gilles Dubuc
7f8e33e0a9
Fix plugin install syntax 2026-03-06 12:14:16 +01:00
Gilles Dubuc
75589d77f3
Fix claude marketplace syntax 2026-03-06 12:12:14 +01:00
Gilad S.
3095041e0f
feat: use build: "autoAttempt" on getLlama 2026-03-06 07:02:50 +00:00
Ning
dc777e3be0
fix(store): handle emoji-only filenames in handelize (#302)
Convert emoji codepoints to hex representation (e.g. 🐘 → 1f418) instead
of crashing, so files like 🐘.md can be indexed without halting the
entire update process.

Fixes #302
2026-03-06 14:24:24 +08:00
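The emoji-to-hex conversion in dc777e3be0 can be sketched as follows. `emojiToHex` is a hypothetical stand-in for the relevant branch of `handelize`, and treating every code point above U+FFFF as an emoji is a simplifying assumption, not the actual detection rule.

```typescript
// Hypothetical stand-in for the emoji handling in handelize().
// Assumption: any code point above U+FFFF is treated as an emoji.
function emojiToHex(name: string): string {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(name)
    .map((ch) => {
      const cp = ch.codePointAt(0) ?? 0;
      return cp > 0xffff ? cp.toString(16) : ch;
    })
    .join("");
}
```

For the filename in the commit message, the elephant emoji is U+1F418, so `emojiToHex("🐘")` yields `1f418`.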
Sebastian Kouba
fde542cd0d
feat: add ignore patterns for collections
Add an optional 'ignore' field to collection config that accepts an array
of glob patterns to exclude from indexing. This allows collections to skip
specific subdirectories without needing separate collections.

Example YAML config:
  personal:
    path: ~/personal_synced
    pattern: '**/*.md'
    ignore:
      - 'Sessions/**'
      - 'archive/**'

The ignore patterns are passed to fast-glob's ignore option alongside the
existing hardcoded excludes (node_modules, .git, etc). Already-indexed
files matching new ignore patterns are deactivated on the next update.

Changes:
- Add ignore?: string[] to Collection interface
- Pass ignore patterns through to fast-glob in indexFiles()
- Show ignore patterns in collection list/status output
- 5 new CLI integration tests covering the feature
2026-03-05 19:17:44 +01:00
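A rough sketch of how the `ignore` field from fde542cd0d could be combined with the hardcoded excludes, and how already-indexed files could be flagged for deactivation. The `Collection` shape follows the commit message; `DEFAULT_EXCLUDES`, `buildIgnoreList`, and `findStalePaths` are hypothetical names, and the exact default exclude globs are assumed.

```typescript
interface Collection {
  path: string;
  pattern: string;
  ignore?: string[]; // optional field added by this commit
}

// Assumed defaults; the commit only names node_modules and .git explicitly.
const DEFAULT_EXCLUDES = ["**/node_modules/**", "**/.git/**"];

// Build the list that would be handed to the glob library's ignore option.
function buildIgnoreList(collection: Collection): string[] {
  return [...DEFAULT_EXCLUDES, ...(collection.ignore ?? [])];
}

// Files that are indexed but no longer returned by the glob scan
// (for example because a new ignore pattern matches them) are stale.
function findStalePaths(indexed: string[], found: string[]): string[] {
  const present = new Set(found);
  return indexed.filter((p) => !present.has(p));
}
```

The combined list is what would be passed as the `ignore` option to fast-glob; any indexed path missing from the next scan result is a candidate for deactivation.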
Joel Johnson
383a2e5cf1
fix: support multiple concurrent HTTP clients
The HTTP MCP server creates a single Transport + McpServer pair at
startup. Once the first client initializes, all subsequent clients
are rejected with "Server already initialized" — making the HTTP
mode unusable for reconnect, crash recovery, or multi-client scenarios.

Replace the singleton with a per-session architecture: each initialize
request creates its own McpServer + Transport pair, stored in a
sessions Map keyed by session ID. The shared Store (SQLite) is
stateless and safe for concurrent access.

Key changes:
- createSession() factory creates fresh McpServer + Transport per client
- POST /mcp routes by mcp-session-id header to existing sessions
- New initialize requests (no session header) create new sessions
- Unknown session IDs return 404 per MCP spec
- Missing session IDs return 400
- onsessioninitialized callback stores sessions at the right time
- transport.onclose cleans up the sessions Map
- Shutdown iterates all active sessions

Tested with 3+ concurrent clients, session cleanup via DELETE,
cross-session isolation, and rapid session creation.

Fixes #195
Closes #163

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 22:22:49 -05:00
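The routing rules listed in 383a2e5cf1 can be modeled without any HTTP or MCP dependencies. In this sketch `Session` stands in for an McpServer + Transport pair, and `SessionRegistry`, `route`, and the generated ID format are hypothetical; only the 400/404 behavior and the Map keyed by session ID come from the commit message.

```typescript
// Hypothetical sketch; Session stands in for an McpServer + Transport pair.
interface Session {
  id: string;
}

type RouteResult =
  | { status: 200; session: Session }
  | { status: 400 | 404; error: string };

class SessionRegistry {
  private sessions = new Map<string, Session>();
  private nextId = 1;

  // sessionId mirrors the mcp-session-id header; isInitialize marks an
  // initialize request.
  route(sessionId: string | undefined, isInitialize: boolean): RouteResult {
    if (sessionId === undefined) {
      // No header: only an initialize request may create a session.
      if (!isInitialize) return { status: 400, error: "missing session ID" };
      const session: Session = { id: `session-${this.nextId++}` };
      this.sessions.set(session.id, session);
      return { status: 200, session };
    }
    const existing = this.sessions.get(sessionId);
    if (!existing) return { status: 404, error: "unknown session ID" };
    return { status: 200, session: existing };
  }

  // Mirrors the transport.onclose cleanup of the sessions Map.
  close(sessionId: string): void {
    this.sessions.delete(sessionId);
  }

  count(): number {
    return this.sessions.size;
  }
}
```

Because every initialize request gets its own entry, a reconnecting client simply starts a new session instead of hitting a "Server already initialized" rejection.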
Big (daocoding)
b71649b12d
feat: add QMD_EMBED_MODEL env var for multilingual embedding support
The default embeddinggemma-300M model is English-centric and produces
poor embeddings for CJK (Chinese, Japanese, Korean) text. This change
allows overriding the embedding model via the QMD_EMBED_MODEL environment
variable.

Changes:
- DEFAULT_EMBED_MODEL now reads from QMD_EMBED_MODEL env var (fallback to
  embeddinggemma-300M for backward compatibility)
- getDefaultLlamaCpp() passes QMD_EMBED_MODEL to LlamaCpp config when set
- formatQueryForEmbedding() and formatDocForEmbedding() detect Qwen3-Embedding
  models and apply the correct prompt format (Qwen3 uses task-instruction
  format; embeddinggemma uses nomic-style prefix format)
- store.ts: pass model URI to format functions so format selection is
  consistent between indexing and query time
- README: document QMD_EMBED_MODEL with Qwen3-Embedding example

Recommended multilingual model:
  QMD_EMBED_MODEL=hf:Qwen/Qwen3-Embedding-0.6B-GGUF/qwen3-embedding-0.6b-q8_0.gguf

After changing the model, run: qmd embed -f
2026-03-01 12:41:09 -05:00
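The environment-variable fallback and model detection from b71649b12d can be sketched like this. `resolveEmbedModel` and `usesQwen3Format` are hypothetical names, the environment is passed in explicitly to keep the example self-contained, and the substring check is an assumed detection rule. The prompt formats themselves are not reproduced here because the commit message does not spell them out.

```typescript
// Fallback model name as given in the commit message.
const FALLBACK_EMBED_MODEL = "embeddinggemma-300M";

// Resolve the embedding model: QMD_EMBED_MODEL wins when set and non-empty.
function resolveEmbedModel(env: Record<string, string | undefined>): string {
  const override = env["QMD_EMBED_MODEL"];
  return override !== undefined && override.trim() !== "" ? override : FALLBACK_EMBED_MODEL;
}

// Assumed detection rule: a case-insensitive match on the model URI.
function usesQwen3Format(modelUri: string): boolean {
  return /qwen3-embedding/i.test(modelUri);
}
```

In Node this would be called as `resolveEmbedModel(process.env)`. The same resolved URI should be used at indexing and query time so the format choice stays consistent, which is also why the commit asks for `qmd embed -f` after switching models.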
Tobias Lütke
40610c3aa6
Merge pull request #256 from rkbadhan/reward-design
fix(reward): tighten entity detection, filler penalty, stricter diversity
2026-02-26 06:15:49 -05:00
rkbadhan
4511b9bd4d
fix(reward): tighten entity detection, add filler penalty, stricter diversity
- Compound entity chaining now stops one level deep. Previously "TDS
  motorsports team history" would inflate the expected entity set with
  "team" and "history", causing false-positive entity-preservation
  penalties during GRPO. Now only {tds, motorsports} are detected.

- Add INTERIOR_FILLER_WORDS penalty (-3/line): lex lines containing
  "overview" or "basics" absent from the original query are penalised.
  Targets template-generator noise, e.g. "ancient overview rome timeline".

- Raise is_diverse threshold 2→3: requires 3 unique words between lex
  lines before they count as diverse. Reduces reward for near-duplicate
  pairs like "auth setup" / "auth configuration".

- Broaden quoted-phrase bonus: was gated on named entities existing;
  now any multi-word query earns +3 for using quotes in lex lines.
  Better incentivises BM25-aware syntax like "memory leak" python.

Fixes scoring noise identified while working on issue #247.
2026-02-24 19:46:23 +05:30
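The raised `is_diverse` threshold in 4511b9bd4d can be illustrated with a small sketch. The reward code itself lives in the Python finetune pipeline; this TypeScript version is only an illustration, `uniqueWordCount` and `isDiverse` are hypothetical names, and reading "unique words between lex lines" as the size of the symmetric difference of the two word sets is an assumption.

```typescript
// Count words that appear in exactly one of the two lines
// (assumption: this is what "unique words between lex lines" means).
function uniqueWordCount(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const wordsB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  let count = 0;
  wordsA.forEach((w) => {
    if (!wordsB.has(w)) count++;
  });
  wordsB.forEach((w) => {
    if (!wordsA.has(w)) count++;
  });
  return count;
}

// Threshold 3 matches the new value in the commit; 2 was the old one.
function isDiverse(a: string, b: string, threshold: number = 3): boolean {
  return uniqueWordCount(a, b) >= threshold;
}
```

Under this reading, "auth setup" and "auth configuration" differ by two words, so the pair counted as diverse at the old threshold of 2 and no longer does at 3.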
Andreas Spannagel
87bd968d7b
feat: expose candidateLimit as MCP tool parameter and CLI flag
Reranking 40 chunks takes ~2 min on CPU (the default candidateLimit).
The option already exists in hybridQuery()/structuredSearch() but was
never surfaced to users. This adds:

- `candidateLimit` param to the MCP `query` tool inputSchema
- `candidateLimit` field to the REST /query endpoint
- `--candidate-limit` / `-C` CLI flag for `qmd query`

Default stays 40 (no behavior change). Users on CPU-only machines can
lower it for a speed/recall tradeoff. Complements #231.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 14:13:49 +01:00
CHAEWAN KIM
b024693f5d
Merge branch 'main' into fix-empty-results-format 2026-02-23 22:36:21 -08:00
Kit
32cd83b470
fix: skip unreadable files during indexing instead of crashing
On macOS with iCloud Drive (especially shared folders), some files may
appear in the filesystem but return EAGAIN (error -11) when read via
Node's readFileSync. This happens when iCloud has evicted the file
content but the file metadata remains visible.

Previously this crashed the entire update process. Now we catch the
error and skip the file, allowing the remaining files to index
successfully.

Affects: iCloud Drive shared folders on macOS
Error: 'Unknown system error -11: Unknown system error -11, read'
Reproduces with: Node.js v25.x, readFileSync on evicted iCloud files
2026-02-23 14:40:09 -08:00
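The skip-on-error behavior from 32cd83b470 amounts to a try/catch around each read. In this sketch the reader is injected so the example runs without touching the filesystem; `ReadFile` and `readAllSkippingErrors` are hypothetical names, not the project's actual indexing function.

```typescript
// Injected reader; in Node this could wrap readFileSync(path, "utf8").
type ReadFile = (path: string) => string;

interface ReadOutcome {
  docs: { path: string; content: string }[];
  skipped: string[];
}

// Read every path, recording failures instead of aborting the whole run.
function readAllSkippingErrors(paths: string[], read: ReadFile): ReadOutcome {
  const outcome: ReadOutcome = { docs: [], skipped: [] };
  for (const path of paths) {
    try {
      outcome.docs.push({ path, content: read(path) });
    } catch {
      // For example EAGAIN on an evicted iCloud file: skip and continue.
      outcome.skipped.push(path);
    }
  }
  return outcome;
}
```

Collecting the skipped paths separately makes it possible to report them after the update instead of losing them silently.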
Tobi Lütke
d6f3688d91
Remove grpo command from default train entrypoint 2026-02-22 15:29:09 -05:00
Tobi Lütke
189916d6fb
Move GRPO training out of default finetune pipeline 2026-02-22 15:26:23 -05:00
Tobi Lütke
cbeeb1f89b
Add wall-clock checkpoints and full eval defaults 2026-02-22 15:02:02 -05:00
Tobi Lütke
5233e676d9
fix(rerank): truncate documents exceeding 2048-token context size
node-llama-cpp throws a hard error when any document + query + template
overhead exceeds the ranking context size. Truncate oversized documents
using the rerank model's tokenizer before passing them to rankAll().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 12:41:59 -05:00
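The truncation in 5233e676d9 is a budget calculation: document tokens may use whatever the context size leaves after the query and template overhead. The sketch below uses a whitespace splitter as a stand-in tokenizer, which is an assumption made only to keep the example runnable; the commit uses the rerank model's own tokenizer. `countTokens` and `truncateForRerank` are hypothetical names.

```typescript
// Stand-in tokenizer: whitespace-separated words, NOT the rerank model's tokenizer.
function countTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Trim a document so that document + query + template overhead fits the context.
function truncateForRerank(
  doc: string,
  query: string,
  contextSize: number = 2048,
  templateOverhead: number = 0,
): string {
  const budget = contextSize - countTokens(query) - templateOverhead;
  if (budget <= 0) return "";
  const tokens = doc.split(/\s+/).filter(Boolean);
  // Documents already within budget are returned unchanged.
  return tokens.length <= budget ? doc : tokens.slice(0, budget).join(" ");
}
```

Truncating before the ranking call trades some tail content for robustness, since a single oversized document would otherwise fail the whole batch.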
Tobi Lutke
1d7d167b29
finetune: strict Pydantic schema, one canonical data format
Replace ad-hoc JSON parsing with a strict Pydantic model
(TrainingExample with typed OutputPair). All data loading goes
through load_examples() which fails loudly on invalid data.

- Convert v3_structured.jsonl from "searches" to "output" format
- Rewrite all consumer scripts (prepare, validate, score, analyze)
  to load through the Pydantic schema
- Prepared train/val files are ephemeral build artifacts
- Restore LFM2 and GEPA experiments under experiments/
- Add pydantic>=2.0 to dependencies
2026-02-22 13:39:00 -04:00
Tobi Lutke
3950055708
finetune: quoted phrases, negation, and entity preservation (#247)
Training data:
- Expand lex phrases/negation examples from 12 to 74 with intent field
- Add 50 personal entity examples (meetings, emails, projects with names)

Reward function:
- Detect entities at position 0 (fixes "Bob asked about deploy")
- Per-entity coverage penalty: -20 per entity absent from all lex+vec
- Phrase quoting bonus: +3 when lex uses quotes for multi-word terms
- Expanded stopwords to reduce false positive entity detection

Eval queries: add 21 test queries for personal entities, quoted phrases,
and negation/disambiguation scenarios.
2026-02-22 13:38:59 -04:00
Tobi Lutke
599935754b
finetune: remove orphaned files and abandoned experiments
Remove one-off data generator/fix scripts, superseded data files (v2, v3
replaced by v3_structured), LFM2 experiment, GEPA directory, duplicate
job scripts, and historical docs. Clean up Justfile.

These are restored under experiments/ in a later commit.
2026-02-22 13:38:59 -04:00
Tobi Lutke
64ef25e1f6
Document query grammar and add skill helpers 2026-02-22 13:36:08 -04:00
Tobi Lutke
0e0feb6f2b
release: v1.1.0 2026-02-22 11:09:36 -04:00
Tobi Lutke
60564b34f8
chore: gitignore package-lock.json 2026-02-22 11:09:36 -04:00
Tobi Lutke
1765b6870d
docs: write changelog for 1.1.0
Query document format, lex phrase/negation syntax, standard node
shebang, collection management commands, and formal SYNTAX.md spec.
2026-02-22 11:09:36 -04:00
Tobi Lutke
c7e8ea02a5
test: restructure container smoke tests for interactive use
Replaces the inner test script with an outer driver that runs individual
podman/docker commands against a pre-built image. Tests sqlite-vec
loading and store unit tests under both node and bun runtimes.

Supports --build (image only), --shell (interactive), and -- CMD
(arbitrary command) for debugging install issues in isolation.
2026-02-22 11:09:36 -04:00