Commit Graph

31 Commits

Author SHA1 Message Date
Tobias Lütke
55c951b15e
Merge pull request #349 from byheaven/fix/qwen3-embedding-model-filename-case
docs: fix Qwen3-Embedding GGUF filename case (404 on download)
2026-03-10 20:08:53 -04:00
Tobi Lutke
a444c86382
docs: rewrite SDK section for 2.0, fix MCP tool names, add changelog
- Expand SDK documentation from ~70 lines to comprehensive coverage:
  store creation modes, unified search(), retrieval, collections,
  context, indexing, types, and lifecycle
- Fix MCP tools section: old names (qmd_search, qmd_deep_search)
  replaced with actual registered names (query, get, multi_get, status)
- Write 2.0.0 changelog under [Unreleased]
2026-03-10 11:53:12 -04:00
YuBai
740b17b485
docs: fix Qwen3-Embedding GGUF filename case in README and llm.ts
HuggingFace filenames are case-sensitive. The documented filename
'qwen3-embedding-0.6b-q8_0.gguf' (lowercase) returns 404. The correct
filename is 'Qwen3-Embedding-0.6B-Q8_0.gguf' (original case from the
HuggingFace repo).

Co-Authored-By: Oz <oz-agent@warp.dev>
2026-03-10 18:54:36 +08:00
Tobi Lutke
040c6fa904
feat: add SDK/library mode for programmatic access
Allow QMD to be used as a library (`import { createStore } from '@tobilu/qmd'`)
in addition to CLI and MCP modes. The constructor requires explicit dbPath and
either a configPath (YAML file) or inline config object — no defaults assumed,
making it safe to embed in any application.

- Add src/index.ts entry point with QMDStore interface exposing search,
  retrieval, collection/context management, and index health
- Add setConfigSource() to collections.ts for inline config support
  (in-memory config with no file I/O)
- Add main/types/exports fields to package.json
- Add SDK documentation section to README
- Add 56 unit tests covering constructor, collections, contexts, search,
  document retrieval, config isolation, YAML persistence, and lifecycle
2026-03-08 15:59:22 -04:00
vyalamar
b068ad0dd6
feat(query): add --explain score traces for hybrid search
2026-03-07 14:35:10 -04:00
Tobias Lütke
7904ab9a9d
Merge pull request #273 from daocoding/feature/configurable-embed-model
feat: add QMD_EMBED_MODEL env var for multilingual embedding support
2026-03-07 14:28:59 -04:00
Gilles Dubuc
7f8e33e0a9
Fix plugin install syntax
2026-03-06 12:14:16 +01:00
Gilles Dubuc
75589d77f3
Fix claude marketplace syntax
2026-03-06 12:12:14 +01:00
Big (daocoding)
b71649b12d
feat: add QMD_EMBED_MODEL env var for multilingual embedding support
The default embeddinggemma-300M model is English-centric and produces
poor embeddings for CJK (Chinese, Japanese, Korean) text. This change
allows overriding the embedding model via the QMD_EMBED_MODEL environment
variable.

Changes:
- DEFAULT_EMBED_MODEL now reads from QMD_EMBED_MODEL env var (fallback to
  embeddinggemma-300M for backward compatibility)
- getDefaultLlamaCpp() passes QMD_EMBED_MODEL to LlamaCpp config when set
- formatQueryForEmbedding() and formatDocForEmbedding() detect Qwen3-Embedding
  models and apply the correct prompt format (Qwen3 uses task-instruction
  format; embeddinggemma uses nomic-style prefix format)
- store.ts: pass model URI to format functions so format selection is
  consistent between indexing and query time
- README: document QMD_EMBED_MODEL with Qwen3-Embedding example

Recommended multilingual model:
  QMD_EMBED_MODEL=hf:Qwen/Qwen3-Embedding-0.6B-GGUF/qwen3-embedding-0.6b-q8_0.gguf

After changing the model, run: qmd embed -f
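
The model override and prompt-format selection described above might look roughly like the sketch below. It is an illustration only, not the code in llm.ts or store.ts: the helper names `resolveEmbedModel` and `promptFormatFor`, the format labels, and the case-insensitive `qwen3-embedding` match are all assumptions, and the actual prompt strings are deliberately left out.

```typescript
// Hypothetical sketch; helper names, labels, and the matching rule are assumptions.
type EmbedPromptFormat = "qwen3-task-instruction" | "nomic-style-prefix";

// Fallback model name as given in the commit message.
const FALLBACK_EMBED_MODEL = "embeddinggemma-300M";

// The env var wins when set to a non-empty value; otherwise use the fallback.
function resolveEmbedModel(env: Record<string, string | undefined>): string {
  const override = env["QMD_EMBED_MODEL"]?.trim();
  return override ? override : FALLBACK_EMBED_MODEL;
}

// Derive the prompt format from the model URI, so indexing and querying
// select the same format as long as they pass the same URI.
function promptFormatFor(modelUri: string): EmbedPromptFormat {
  return /qwen3-embedding/i.test(modelUri)
    ? "qwen3-task-instruction"
    : "nomic-style-prefix";
}
```

Passing the resolved URI to both the document and query formatters is what keeps the two sides consistent; re-embedding is still needed after a model change because existing vectors were produced by the old model.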
2026-03-01 12:41:09 -05:00
Tobi Lutke
09803a75b7
feat: compile to JS for npm, release system, full changelog
- Add tsc build step (tsconfig.build.json) so npm package ships
  compiled JS instead of raw TypeScript requiring tsx at runtime
- Update qmd wrapper and daemon spawn to use dist/qmd.js in
  production while keeping tsx for development
- Add self-installing pre-push hook validating v* tag pushes:
  package.json version match, changelog entry, CI status
- Add release.sh script that renames [Unreleased] to versioned
  entry, bumps package.json, commits, and tags
- Add extract-changelog.sh for cumulative GitHub release notes
- Update publish workflow with build step and GitHub release creation
- Flesh out CHANGELOG.md with full history from 0.1.0 through 1.0.0
  in Keep-a-Changelog format with PR/contributor attributions
- Add release standards and changelog guidelines to CLAUDE.md
2026-02-16 08:42:32 -04:00
Tobi Lutke
4df5505bd6
Merge origin/nodejs: Node.js compat, perf improvements, vitest
Brings in Node.js compatibility (tsx, vitest), GPU auto-detection,
parallel embedding/reranking contexts, and flash attention support.
Preserves @tobilu/qmd package scope and publish config from main.
2026-02-15 16:52:30 -04:00
Tobi Lutke
b88c10bf83
docs: show bun/node install and package scope
Document both Node and Bun execution paths.
- Update install examples to `@tobilu/qmd` for npm and bun.
- Add npx/bunx one-off usage examples.
- Reflect Bun as first-class supported runtime in requirements.
2026-02-15 16:45:35 -04:00
Tobi Lutke
13e8473455
docs: update node usage and bump version
Update README installation and quick-start commands to Node examples.
- replace bun install/link commands with npm-based Node workflow
- bump package version to 0.9.9 for CLI and MCP metadata
- keep Bun guidance as optional development/runtime note
2026-02-15 16:44:47 -04:00
Tobi Lutke
5d73752b47
chore: rename package scope to @tobilu/qmd
2026-02-15 15:07:26 -04:00
Tobi Lutke
2279389415
chore: set up npm publishing as @tobi/qmd v0.9.0
- Scope package to @tobi/qmd, version 0.9.0
- Add files whitelist, publishConfig, repo metadata
- Add CI workflow (bun tests on ubuntu + macos, bun latest + 1.1.0)
- Add publish workflow (triggers on v* tags, publishes to npm)
- Add release script for version bumping + changelog generation
- Add LICENSE (MIT) and initial CHANGELOG.md
- Update install instructions to use @tobi/qmd
2026-02-15 14:31:23 -04:00
Tobi Lutke
32112256c1
docs: document smart chunking algorithm in README
Add Smart Chunking section explaining break point scoring, distance
decay formula, and code fence protection. Update token counts from
800 to 900 throughout.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-15 14:20:09 -04:00
Tobi Lütke
03a25d69d9
Add QMD architecture diagram to README
Generated with PaperBanana (Gemini 3 Pro). Shows query expansion
fanning HyDE+Vec into vector searches, Lex into BM25, merged via
reciprocal rank fusion and LLM reranking.
2026-02-14 19:17:11 -05:00
Ilya Grigorik
785bbcf319
MCP: Streamable HTTP, scoring fixes, tool improvements (#149)
* feat: MCP HTTP transport with daemon lifecycle

  Add streaming HTTP transport as an alternative to stdio for the MCP
  server. A long-lived HTTP server avoids reloading 3 GGUF models (~2GB)
  on every client connection, reducing warm query latency from ~16s (CLI)
  to ~10s.

  New CLI surface:
    qmd mcp --http [--port N]   # foreground, default port 3000
    qmd mcp --http --daemon     # background, PID in ~/.cache/qmd/mcp.pid
    qmd mcp stop                # stop daemon via PID file
    qmd status                  # now shows MCP daemon liveness

  Server implementation (mcp.ts):
  - Extract createMcpServer(store) shared by stdio and HTTP transports
  - HTTP transport uses WebStandardStreamableHTTPServerTransport with
    JSON responses (stateless, no SSE)
  - /health endpoint with uptime, /mcp for MCP protocol, 404 otherwise
  - Request logging to stderr with timestamps, tool names, query args

  Daemon lifecycle (qmd.ts):
  - PID file + log file management with stale PID detection
  - Absolute paths in Bun.spawn (process.execPath + import.meta.path)
    so daemon works regardless of cwd
  - mkdirSync for cache dir on fresh installs
  - Removes top-level SIGTERM/SIGINT handlers before starting HTTP
    server so async cleanup in mcp.ts actually runs

  Move hybridQuery() and vectorSearchQuery() into store.ts as standalone
  functions that take a Store as first argument. Both CLI and MCP now
  call the identical pipeline, eliminating the class of bugs where one
  copy drifts from the other.

  Shared pipeline (store.ts):
  - hybridQuery(): BM25 probe → expand → FTS+vec search → RRF →
    chunk → rerank (chunks only) → position-aware blending → dedup
  - vectorSearchQuery(): expand → vec search → dedup → sort
  - SearchHooks interface for optional progress callbacks
  - Constants: STRONG_SIGNAL_MIN_SCORE, STRONG_SIGNAL_MIN_GAP,
    RERANK_CANDIDATE_LIMIT (40), addLineNumbers()

  Bugs fixed by unification:
  - MCP now gets strong-signal short-circuit (was CLI-only)
  - Reranker candidate limit unified at 40 (MCP had 30)
  - File dedup added to hybrid query (MCP was missing it)
  - Collection filter pushed into searchVec DB query
  - Filter-then-slice ordering fixed (MCP was slice-then-filter)
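
The filter-then-slice fix in the list above can be illustrated with a small sketch. The `Hit` shape and the collection-based predicate are assumptions made for the example; the commit does not say which filter was involved.

```typescript
// Hypothetical result shape; field names are assumptions for illustration.
interface Hit {
  file: string;
  collection: string;
  score: number;
}

// Buggy order: truncate first, then filter. Matching hits ranked below
// the cut-off are lost even though fewer than `limit` results remain.
function sliceThenFilter(hits: Hit[], collection: string, limit: number): Hit[] {
  return hits.slice(0, limit).filter((h) => h.collection === collection);
}

// Fixed order: filter first, then truncate to the requested limit.
function filterThenSlice(hits: Hit[], collection: string, limit: number): Hit[] {
  return hits.filter((h) => h.collection === collection).slice(0, limit);
}

// Hits are assumed to be sorted by descending score already.
const sampleHits: Hit[] = [
  { file: "a.md", collection: "work", score: 0.9 },
  { file: "b.md", collection: "work", score: 0.8 },
  { file: "c.md", collection: "notes", score: 0.7 },
  { file: "d.md", collection: "notes", score: 0.6 },
];
```

With a limit of 2 and the `notes` collection, the slice-first variant returns nothing, while the filter-first variant returns `c.md` and `d.md`.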

* feat: type-routed query expansion — lex→FTS, vec/hyde→vector

  expandQuery() now returns typed ExpandedQuery[] instead of string[],
  preserving the lex/vec/hyde type info from the LLM's GBNF-structured
  output. hybridQuery() and vectorSearchQuery() route searches by type:
  lex queries go to FTS only, vec/hyde go to vector only.

  Previously, every expanded query ran through BOTH backends — keyword
  variants wasted embedding forward passes, semantic paraphrases wasted
  BM25 lookups. Type routing eliminates ~4 calls/query with zero quality
  loss (cross-backend noise actually hurt RRF fusion).

  Cache format changed from newline-separated text to JSON (preserves
  types). Old cache entries gracefully re-expand on first access.

  CLI expansion tree now shows query types:
    ├─ original query
    ├─ lex: keyword variant
    ├─ vec: semantic meaning
    └─ hyde: hypothetical document...

  Benchmark (5 queries, 1756-doc index, warm LLM, Apple Silicon):

    Metric              Old (untyped)  New (typed)  Delta
    Avg backend calls   10.0           6.0          -40%
    Total wall time     1278ms         549ms        -57%
    Avg saved/query     —              —            146ms

    "authentication setup"          12 → 7 calls   511 → 112ms
    "database migration strategy"   10 → 6 calls   182 → 106ms
    "how to handle errors in API"   10 → 6 calls   216 → 121ms
    "meeting notes from last week"  10 → 6 calls   228 → 110ms
    "performance optimization"       8 → 5 calls   141 → 100ms

  Savings come from skipped embed() calls (~30-80ms each). FTS is
  synchronous SQLite (~0ms), so lex→FTS routing is free while
  vec/hyde→vector-only avoids wasted embedding passes.
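
A minimal sketch of the type routing described above, under stated assumptions: the field names on `ExpandedQuery` and the helpers `routeByType` and `backendCalls` are hypothetical. The call-count model also assumes the original query still goes to both backends; that is an inference from the benchmark numbers (for example 10 → 6 calls with four expansions), not something the commit states.

```typescript
// Field names are assumptions; only the lex/vec/hyde types come from the commit.
type QueryType = "lex" | "vec" | "hyde";

interface ExpandedQuery {
  type: QueryType;
  text: string;
}

interface RoutedQueries {
  fts: string[];    // keyword variants, searched with BM25 only
  vector: string[]; // semantic variants, searched with embeddings only
}

// Send lex queries to full-text search and vec/hyde queries to vector search.
function routeByType(queries: ExpandedQuery[]): RoutedQueries {
  const routed: RoutedQueries = { fts: [], vector: [] };
  for (const q of queries) {
    if (q.type === "lex") {
      routed.fts.push(q.text);
    } else {
      routed.vector.push(q.text);
    }
  }
  return routed;
}

// Count backend calls, assuming the original query hits both backends.
function backendCalls(expansionCount: number, typed: boolean): number {
  const originalQueryCalls = 2;
  const callsPerExpansion = typed ? 1 : 2;
  return originalQueryCalls + callsPerExpansion * expansionCount;
}
```

Under this model, four expansions cost 10 calls untyped and 6 typed, five cost 12 and 7, and three cost 8 and 5, which lines up with the per-query figures in the benchmark.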

* fix: MCP query snippets now use reranker's best chunk, not full body

  extractSnippet() was scanning the entire document body for keyword
  matches to build the snippet. But hybridQuery() already identified
  the most relevant chunk via cross-attention reranking — rescanning
  the full body is redundant and can land on a less relevant section
  if the query terms appear elsewhere in the document.

  CLI was already using bestChunk (set during the refactor). MCP was
  still using body — a pre-existing inconsistency, not a regression.

* feat: dynamic MCP instructions + tool annotations

  The MCP server now generates instructions at startup from actual index
  state and injects them into the initialize response. LLMs see collection
  names, document counts, content descriptions, and search strategy
  guidance in their system prompt — zero tool calls needed for orientation.

  Previously, the only guidance was generic static tool descriptions and
  a user-invocable "query" prompt that no LLM would discover on its own.
  An LLM connecting to QMD had no idea what collections existed, what they
  contained, or how to scope searches effectively.

* change default port to 8181

* fix: BM25 score normalization was inverted

  The normalization formula `1 / (1 + |bm25|)` is a decreasing function of
  match strength. FTS5 BM25 scores are negative where more negative = better
  match (e.g., -10 is strong, -0.5 is weak). The formula mapped:

    strong match (raw -10) → 1/(1+10) =  9%   ← should be highest
    weak match   (raw -0.5) → 1/(1+0.5) = 67%  ← should be lowest

  Three downstream effects:
  1. `--min-score 0.5` (or MCP minScore: 0.5) filtered OUT strong matches
     and kept only weak ones. The MCP instructions recommend this threshold.
  2. CLI `formatScore()` color bands never showed green for BM25 results
     (best matches scored ~9%, green threshold is 70%).
  3. The strong signal optimization in hybridQuery (skip ~2s LLM expansion
     when BM25 already has a clear winner) was dead code — strong matches
     scored ~0.09, never reaching the 0.85 threshold.

  Fix: `|x| / (1 + |x|)` — same (0,1) range, monotonic, no per-query
  normalization needed, but now correctly maps strong → high, weak → low.

  The normalization was born broken (Math.max(0, x) clamped all
  negative BM25 to 0 → every score = 1.0), then PR #76 changed to
  Math.abs which made scores vary but inverted the direction. Neither
  state was ever correct.
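
The two formulas can be compared side by side. This is a sketch with hypothetical function names, not the code from store.ts; only the formulas themselves come from the message above.

```typescript
// Old formula: 1 / (1 + |bm25|). It decreases as the match gets stronger,
// so the best matches receive the lowest scores.
function normalizeBm25Inverted(raw: number): number {
  return 1 / (1 + Math.abs(raw));
}

// Fixed formula: |x| / (1 + |x|). Same (0, 1) range, but increasing
// in match strength, so strong matches score high.
function normalizeBm25(raw: number): number {
  const strength = Math.abs(raw);
  return strength / (1 + strength);
}
```

For a raw FTS5 score of -10 the fixed formula gives 10/11 (about 0.91), which clears the 0.85 strong-signal threshold mentioned above; for -0.5 it gives 1/3. The old formula produced about 0.09 and 0.67 for the same inputs.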

* fix: rerank cache key ignores chunk content

  The rerank cache key was (query, file, model) but the actual text sent
  to the reranker is a keyword-selected chunk that varies by query terms.
  Two different queries hitting the same file can select different chunks,
  but the second query gets a stale cached score from the first chunk.

  Example:
    Query "auth flow" → selects chunk about authentication → score 0.92
    Query "auth tokens" → same file, selects chunk about tokens
      → cache HIT on (query, file, model) → returns 0.92 from wrong chunk

  Fix: include full chunk text in cache key. getCacheKey() already
  SHA-256 hashes its inputs, so this adds no key bloat — just
  disambiguation. Old cache entries become natural misses (different key
  shape) and re-warm on next query.
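
One way to build such a key is sketched below. The function name `rerankCacheKey` and the JSON encoding of the parts are assumptions; the real getCacheKey() may combine its inputs differently. The point is only that the chunk text takes part in the hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hash query, file, model, and the chunk text together.
// JSON-encoding the parts keeps field boundaries unambiguous, and the
// SHA-256 digest keeps the key a fixed 64 hex characters however long
// the chunk is.
function rerankCacheKey(
  query: string,
  file: string,
  model: string,
  chunkText: string,
): string {
  return createHash("sha256")
    .update(JSON.stringify([query, file, model, chunkText]))
    .digest("hex");
}
```

Two lookups that differ only in the selected chunk now produce different keys, so a score computed for one chunk cannot be returned for another.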

* rename MCP tools for clarity, rewrite descriptions for LLM tool selection

  Rename MCP tools: vsearch → vector_search, query → deep_search.
  LLMs see these names — self-documenting names reduce reliance on
  descriptions for tool selection. CLI commands stay unchanged
  (qmd vsearch, qmd query) — different namespace, users type those.

  Rewrite all search tool descriptions to be action-oriented:
    - search: "Search by keyword. Finds documents containing exact
      words and phrases in the query."
    - vector_search: "Search by meaning. Finds relevant documents even
      when they use different words than the query — handles synonyms,
      paraphrases, and related concepts."
    - deep_search: "Deep search. Auto-expands the query into variations,
      searches each by keyword and meaning, and reranks for top hits
      across all results."

  Rewrite instructions ladder — each tool says what it does, no
  "start here" / "escalate as needed" strategy language.

  Delete the "query" prompt (registerPrompt) — it restated what
  descriptions + instructions already cover. No LLM proactively
  calls prompts/get to learn how to use tools.

* suppress HTTP server logs during tests
2026-02-10 16:37:33 -05:00
Matt Galligan
63028fd5e9
feat: add Claude Code plugin support with inline status check (#99)
- Add marketplace.json for Claude Code plugin installation
- Simplify skill status check to inline `qmd status` (portable across agents)
- Update SKILL.md MCP section, reference mcp-setup.md for manual config
- Clean up mcp-setup.md (remove redundant prerequisites)
- Rename MCP-SETUP.md to mcp-setup.md

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 14:14:24 -05:00
Tobi Lutke
17c201ea81
fix: correct QMD acronym to Query Markup Documents
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 13:22:54 -05:00
Tobi Lütke
46ff098361
Change only: format to only:lex (no space after colon)
2026-01-31 16:23:28 +00:00
Tobias Lütke
eb1b77c8cb
Deploy fine-tuned GRPO model as default query expansion (#67)
* Add query expansion model finetuning infrastructure

- Training scripts for Qwen3-0.6B and 1.7B models
- Dataset generation from s-emanuilov/query-expansion
- Evaluation scripts comparing finetuned vs baseline models
- GRPO RL training script (optional improvement)
- Export script for GGUF conversion

Results:
- 0.6B finetuned: 95% format compliance (lex/vec/hyde)
- Baseline: 0% format compliance
- Dataset: 5,157 examples on HuggingFace Hub

Models available at:
- tobil/qmd-query-expansion-0.6B (recommended)
- tobil/qmd-query-expansion-train (dataset)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix GRPO training script for TRL API compatibility

- Use max_completion_length instead of max_new_tokens
- Use processing_class instead of tokenizer
- Use args instead of config for GRPOTrainer
- Add __name__ attribute to reward function class
- Accept **kwargs in reward function for extra TRL args
- Add new LoRA adapter after merging SFT weights

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update README with final evaluation results

- 0.6B SFT: 95% format compliance (best)
- 0.6B GRPO: 0% (catastrophic forgetting from RL)
- 1.7B v2: training completed, evaluation pending
- Added GRPO evaluation results

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add comprehensive scoring system for query expansion

New scoring criteria (0-100 points):
- Format (30): Must have lex: and vec: prefixes
- Diversity (30): Multiple types, no echoing query, diverse expansions
- Hyde (20): Optional, concise, no newlines, no word repetition
- Quality (20): Lex=keywords, vec=natural language

See SCORING.md for full documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add HuggingFace login and comprehensive scoring to GRPO v2 training

- Add explicit HF_TOKEN login before training
- Use SCORING.md criteria as RL reward function
- Conservative training: LR 1e-6, LoRA rank 4
- Reward scores: good=0.94, bad=0.38

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Refactor finetune folder: train/rl scripts with YAML configs

Major changes:
- train.py: Generic SFT training script using YAML config
- rl.py: Generic GRPO training script using YAML config
- configs/: YAML configs per training run (sft_v4.yaml, grpo_v4.yaml)
- dataset/: Data preparation scripts moved here
- tui.py: Interactive model testing interface

Training results:
- SFT v4: 98.8% avg score (all Excellent)
- GRPO v4: 0% (failed - model drifted to verbose explanations)

Removed per-model scripts (train_0.6B.py, train_1.7B.py, etc)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add named entity extraction to GRPO reward function

Key changes:
- Extract named entities (acronyms, proper nouns, technical terms)
- Heavy penalty (-30) when lex queries miss named entities
- Penalty (-15) for generic filler phrases like "find information about"
- Compound entity detection (TDS motorsports -> both words)
- Update GRPO config with KL regularization (beta=0.04)
- Lower learning rate (5e-7) and add max_steps (200)

Test results:
- "who is TDS motorsports" good: 1.00, bad: 0.30 (was 0.75)
- "how to use React hooks" good: 0.87, bad: 0.45 (was 0.75)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add chat template leakage detection to reward function

Zero reward for outputs containing:
- <|im_start|>, <|im_end|> tokens
- <think>, </think> tags (Qwen3 thinking mode)
- Role markers like \nassistant\n, \nuser\n
- <|endoftext|> token
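
The leakage check listed above could be sketched as a plain substring scan. The training code itself is Python; this TypeScript version, including the name `hasTemplateLeakage`, is an illustrative assumption that uses only the markers named in the list.

```typescript
// Markers taken from the list above; any occurrence counts as leakage.
const LEAK_MARKERS: string[] = [
  "<|im_start|>",
  "<|im_end|>",
  "<think>",
  "</think>",
  "\nassistant\n",
  "\nuser\n",
  "<|endoftext|>",
];

// True when the model output contains chat-template artifacts.
function hasTemplateLeakage(output: string): boolean {
  return LEAK_MARKERS.some((marker) => output.includes(marker));
}

// Gate a reward: leaked outputs get zero, others keep their score.
function gateReward(output: string, score: number): number {
  return hasTemplateLeakage(output) ? 0 : score;
}
```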

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Strict format validation: every line must be lex:/vec:/hyde:

Any line that doesn't start with a valid prefix now returns 0.0
instead of just counting as a penalty. This prevents any prose,
explanations, bullet points, or other invalid content.
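
A sketch of the strict check, again in TypeScript rather than the Python used for training. The function name is hypothetical, and skipping blank lines and rejecting empty output are assumptions the commit does not spell out.

```typescript
const VALID_PREFIXES: string[] = ["lex:", "vec:", "hyde:"];

// True only when every non-blank line starts with a valid prefix.
// Blank lines are ignored and empty output is rejected (both assumptions).
function passesStrictFormat(output: string): boolean {
  const lines = output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) {
    return false;
  }
  return lines.every((line) =>
    VALID_PREFIXES.some((prefix) => line.startsWith(prefix)),
  );
}

// Hard gate: any invalid line zeroes the reward instead of reducing it.
function strictFormatReward(output: string, score: number): number {
  return passesStrictFormat(output) ? score : 0.0;
}
```

Returning 0.0 outright, rather than subtracting a penalty, means a single line of prose or a bullet marker cannot be offset by otherwise good expansions.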

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Clean up evaluation files

- Remove old versioned evaluation files (0.6B, 1.7B, baseline)
- Rename evaluation_v4.json -> evaluation_sft.json
- Rename evaluation_v4_grpo.json -> evaluation_grpo_failed.json

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Refactor evals into separate run and score scripts

New structure:
- evals/run.py: Generate model outputs to JSONL
- evals/score.py: Score outputs with detailed breakdown
- evals/queries.txt: Test queries (26 total)

Features:
- Supports both HF Hub and local model paths
- Named entity preservation scoring
- Chat template leakage detection
- Strict format validation (every line must be lex:/vec:/hyde:)
- Generic phrase detection

Usage:
  uv run evals/run.py --model tobil/qmd-query-expansion-0.6B-v4
  uv run evals/score.py evals/results_*.jsonl

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix GRPO model loading to use SFT base first

The GRPO adapter was trained on merged SFT weights, so loading it
directly on the base model results in 0% score. Added --sft-model
parameter to evals/run.py to load SFT first, then apply GRPO adapter.

With correct loading: GRPO scores 89.7% (all 26 queries Excellent).

Updated README with correct GRPO score and loading instructions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix TUI to load GRPO models with SFT base first

GRPO adapters were trained on merged SFT weights, so they need SFT
loaded and merged first before applying the GRPO adapter.

Updated MODELS config to include sft_base path for GRPO models,
and load_model() now handles the SFT -> merge -> GRPO flow.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update README for unified model repository structure

All models (0.6B, 1.7B, 4B) with SFT and GRPO variants now go into
a single HuggingFace repo (tobil/qmd-query-expansion) with subfolders
for each size and training method.

Updated loading examples to show subfolder-based model loading.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update README with separate model repos

Changed from subfolder approach to separate repos per model since
trainer.push_to_hub() doesn't support subfolder argument.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Add 1.7B and 4B GRPO training and GGUF conversion scripts

Training scripts for GRPO fine-tuning:
- train_1.7B_grpo.py: GRPO training for Qwen3-1.7B
- train_4B_grpo.py: GRPO training for Qwen3-4B

GGUF conversion scripts:
- convert_1.7B_gguf.py: Merge SFT+GRPO adapters and convert to GGUF
- convert_4B_gguf.py: Merge SFT+GRPO adapters and convert to GGUF

All scripts use PEP 723 inline dependencies for HuggingFace Jobs.

Models published:
- tobil/qmd-query-expansion-1.7B-sft
- tobil/qmd-query-expansion-1.7B-grpo
- tobil/qmd-query-expansion-1.7B-gguf
- tobil/qmd-query-expansion-4B-sft
- tobil/qmd-query-expansion-4B-grpo
- tobil/qmd-query-expansion-4B-gguf

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Remove beads issue tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Remove beads reference from CLAUDE.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix GRPO reward function to handle think blocks and end tokens

- Strip <|im_end|> token from completions (model output includes it)
- Change think_penalty to skipped_think bonus (+20 for not using think)
- Adjust max_possible to account for bonus (120/140)
- Fix typo in chat template artifact check

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Make TUI model list dynamic from HuggingFace Hub

- Fetch available qmd-query-expansion models from tobil/ on Hub
- Auto-detect model size (0.6B, 1.7B, 4B) and use correct base model
- Group models by type (SFT vs GRPO) in menu
- Skip GGUF repos in model listing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Fix GRPO training: apply chat template to prompts

The SFT model was trained with chat template format but GRPO was
passing raw prompts. Now prompts are formatted with tokenizer.apply_chat_template()
so the model sees the same format it learned during SFT.

Also update extract_query_from_prompt to strip chat template artifacts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Finetune 2.0: consolidate and simplify the entire training pipeline

Consolidate ~2,800 lines of duplicated code across 12 files into 5 clean,
well-documented files targeting Qwen3-1.7B end-to-end.

Key changes:
- Extract reward function into single source of truth (reward.py)
  Previously duplicated 3x with divergent bugs across rl.py,
  train_1.7B_grpo.py, and train_4B_grpo.py
- Unify training into one script with sft/grpo subcommands (train.py)
  Replaces train.py + rl.py + train_1.7B_grpo.py + train_4B_grpo.py
- Merge eval generate+score into single eval.py
  Replaces evals/run.py + evals/score.py
- Parameterize GGUF conversion by --size (convert_gguf.py)
  Replaces convert_1.7B_gguf.py + convert_4B_gguf.py
- Fix critical bug: rl.py silently ignored beta/temperature from config,
  causing the exact catastrophic drift its own comments warned about
- Fix prompt consistency: all files use /no_think chat template format
- Retarget configs from 0.6B to 1.7B
- Comprehensive README documenting the full pipeline

Removed: rl.py, train_1.7B_grpo.py, train_4B_grpo.py, convert_1.7B_gguf.py,
convert_4B_gguf.py, tui.py, evals/run.py, evals/score.py

Net: -3,429 lines, +382 lines

Co-Authored-By: Claude (claude-fudge-eap-cc) <noreply@anthropic.com>

* Add HF Jobs scripts, temporal query examples, and training results

- jobs/sft.py and jobs/grpo.py: self-contained scripts for
  `hf jobs uv run` (no local GPU needed)
- 12 temporal/recency query examples in training data (e.g. "recent
  news about Shopify" -> lex with years 2025/2026)
- 4 temporal test queries in evals/queries.txt
- README updated with HF Jobs workflow, training results, and
  updated file structure
- Remove .beads tracking

SFT and GRPO successfully trained on A10G via HF Jobs:
  SFT: eval loss 0.321, token accuracy 92.4%
  GRPO: mean reward 0.757, 200 steps, KL 0.00048

Co-Authored-By: Claude (claude-fudge-eap-cc) <noreply@anthropic.com>

* Deploy fine-tuned GRPO model as default for query expansion

Switch from generic Qwen3-1.7B-Q8_0 (~2.2GB) to fine-tuned
qmd-query-expansion-1.7B-q4_k_m (~1.1GB). The fine-tuned Q4
scores 91.7% avg with 30/30 Excellent, outperforming the base Q8.

- Update default generate model in src/llm.ts
- Update README model table, architecture diagram, config block
- Add v2 training data, eval scripts, and quantize job
- Remove superseded v1 training data (5,742 → 1,000 examples)
- Update finetune README with v2 results and file structure

Co-Authored-By: Claude (claude-fudge-eap-cc) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 23:25:17 -08:00
George Zhang
c8f72de12e
docs: fix query expansion model size (Qwen3-1.7B, not 0.6B)
The code uses Qwen3-1.7B (~2.2GB) for query expansion, but the README
documented Qwen3-0.6B (~640MB) in three places:
- Model requirements table
- Architecture diagram
- Code configuration sample

This caused confusion when users saw a 2GB+ download instead of 640MB.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 04:09:27 +08:00
Tobi Lutke
d383b5c226
Migrate to node-llama-cpp and add structured query expansion
- Replace Ollama HTTP API with node-llama-cpp for local GGUF models
- Add structured query expansion using JSON schema grammar:
  - Generates lexical query (for BM25), vector query, and HyDE
  - Tree-style CLI output showing query types
- Fix vector search: use cosine distance instead of L2
- Format queries with embeddinggemma nomic-style prompts
- Rename ollama_cache table to llm_cache
- Add disposeDefaultLlamaCpp() for clean process exit

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 18:03:41 -04:00
Tobi Lutke
a3703c069a
Add path normalization, output format tests, and fix test isolation
- Add support for collection/path.md format in get command (checks if
  first component is a known collection before treating as filesystem path)
- Add comprehensive output format tests verifying qmd:// URIs, docid,
  and context in JSON, CSV, MD, XML, files, and CLI formats
- Add path normalization tests for various input formats:
  qmd://, //, qmd:////, collection/path, and path:line suffix
- Add isolated test environments (createIsolatedTestEnv) to prevent
  YAML config conflicts between test suites
- Add test fixture files test1.md and test2.md for path tests
- Update runQmd helper to accept custom configDir parameter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 14:45:18 -04:00
Tobi Lutke
529e989d83
Refactor: Move TypeScript source files to src/ directory
Move all .ts files to src/ to clean up the project root:
- Created src/ directory and moved all TypeScript source and test files
- Updated qmd shell wrapper to point to src/qmd.ts
- Updated package.json scripts to use src/ paths
- Updated documentation (CLAUDE.md, README.md) to reflect new structure
- All imports remain relative within src/, no changes needed
- Tests pass with same results (192 pass, 75 fail - existing issues)

This improves project organization and makes the root directory cleaner.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 17:12:09 -05:00
Tobi Lutke
bab46dacb2
Refactor: extract store, LLM, and formatter modules with comprehensive tests
- Extract store.ts: database operations, search, document retrieval
  - createStore() factory pattern for clean DB lifecycle management
  - Unified DocumentResult type with optional body loading
  - Snippet extraction with diff-style headers (@@ -line,count @@)

- Extract llm.ts: LLM abstraction layer with Ollama implementation
  - Clean interface for embed, generate, rerank operations
  - High-level rerankerLogprobsCheck with logprob-based scoring
  - Query expansion support

- Extract formatter.ts: output formatting utilities
  - Support for CLI, JSON, CSV, MD, XML formats
  - MCP-specific CSV formatting

- Extract mcp.ts: MCP server using createStore() pattern
  - Single DB connection for server lifetime (fixes closed DB errors)
  - URL-decode resource paths for proper space/special char handling

- Add comprehensive test suites (215 tests total)
  - store.test.ts: 96 tests covering all store operations
  - llm.test.ts: 60 tests for LLM abstraction
  - mcp.test.ts: 59 tests for MCP endpoints and resources
  - All tests use mocked Ollama (errors on unmocked calls)

- Add bun run inspector script for MCP debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 16:33:32 -05:00
Tobi Lutke
25ac53848f
Add MCP server for AI agent integration
- Add `qmd mcp` command to start stdio-based MCP server
- Expose tools: qmd_search, qmd_vsearch, qmd_query, qmd_get, qmd_status
- Add index health warnings for unembedded docs and stale indexes
- Return CSV format with text/csv mime type for search results
- Add MCP documentation and configuration examples to README
- Add @modelcontextprotocol/sdk and zod dependencies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-08 14:59:56 -05:00
Tobi Lutke
877917487d
Update README with quick start guide and agentic workflow examples
- Add new tagline: on-device search engine for everything you need to remember
- Add Quick Start section with walkthrough of indexing multiple directories
- Add "Using with AI Agents" section showing --json and --files workflows
- Update output format example to reflect new CLI format with Title/Context/Score
- Document --all flag for returning all matches
2025-12-08 12:54:57 -05:00
Tobi Lutke
42ab3f6c10
Update README to reflect current implementation
- Fix architecture diagram: show BM25+Vector for all query variations
- Add position-aware blending percentages to diagram
- Update CLI commands: add → index, add-context, cleanup, status
- Document chunked embeddings (~6KB pieces with hash/seq/pos)
- Update schema section with new tables (path_contexts, ollama_cache)
- Rewrite How It Works flows with accurate pipeline details
- Fix output format examples to show ~/... paths
- Add --files and --json output options

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-08 09:31:20 -05:00
Tobi Lutke
39193ea252
Initial commit: QMD - Quick Markdown Search
A CLI tool for searching markdown knowledge bases using hybrid retrieval:
- BM25 full-text search via SQLite FTS5
- Vector semantic search via sqlite-vec + Ollama embeddings
- LLM re-ranking with qwen3-reranker (logprobs-based scoring)
- Reciprocal Rank Fusion with weighted queries and position-aware blending

Features:
- `qmd add .` - Index markdown files in current directory
- `qmd embed` - Generate vector embeddings
- `qmd search` - BM25 full-text search
- `qmd vsearch` - Vector similarity search
- `qmd query` - Hybrid search with query expansion + reranking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 19:16:16 -05:00