ai-workspace-services/qmd

Author	SHA1	Message	Date
Shreyas Karnik	b05d8863ca	fix: quantization paths, missing imports, and hardcoded metadata - Add missing subprocess import (NameError on any quantize path) - Replace broken optimum-cli quantize calls with direct onnxruntime: Q4 uses MatMulNBitsQuantizer, Q8 uses quantize_dynamic - Add onnxconverter-common to deps for FP16 (was silently swallowed) - Make FP16 fail loudly on missing dep instead of silently uploading FP32 - README and transformers_js_config now reflect actual quantize_type instead of always hardcoding Q4 - Remove dead _convert_fp16_external function	2026-03-13 12:45:48 -07:00
Shreyas Karnik	e1ce37c989	fix: handle 2GB protobuf limit, add validation, fix input feeds - Use no_post_process=True for ONNX export to avoid protobuf serialize error - Add --validate and --validate-only flags for inference verification - Fix position_ids in validation feed (required by Qwen3 ONNX export) - Use optimum-cli for quantization to handle external data format - Fix optimum dependency to optimum[onnxruntime] Tested: export + validation passes on CPU, KV cache present (56 tensors). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 12:30:26 -07:00
Shreyas Karnik	2df95ac9ba	feat: add ONNX conversion script for Transformers.js deployment Add convert_onnx.py that mirrors convert_gguf.py's structure: - Loads base Qwen3 model, merges SFT + GRPO adapters - Exports to ONNX via Optimum (text-generation-with-past task) - Supports Q4 (MatMulNBits), Q8, FP16, and FP32 output - Uploads to separate HF repo (e.g. tobil/qmd-query-expansion-1.7B-ONNX) - Writes Transformers.js compatibility config - Includes model card with usage example Usage: uv run convert_onnx.py --size 1.7B uv run convert_onnx.py --size 1.7B --quantize q4 --no-upload Also adds `just convert-onnx` and `just convert-gguf` tasks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 11:50:03 -07:00
rkbadhan	4511b9bd4d	fix(reward): tighten entity detection, add filler penalty, stricter diversity - Compound entity chaining now stops one level deep. Previously "TDS motorsports team history" would inflate the expected entity set with "team" and "history", causing false-positive entity-preservation penalties during GRPO. Now only {tds, motorsports} are detected. - Add INTERIOR_FILLER_WORDS penalty (-3/line): lex lines containing "overview" or "basics" absent from the original query are penalised. Targets template-generator noise, e.g. "ancient overview rome timeline". - Raise is_diverse threshold 2→3: requires 3 unique words between lex lines before they count as diverse. Reduces reward for near-duplicate pairs like "auth setup" / "auth configuration". - Broaden quoted-phrase bonus: was gated on named entities existing; now any multi-word query earns +3 for using quotes in lex lines. Better incentivises BM25-aware syntax like "memory leak" python. Fixes scoring noise identified while working on issue #247.	2026-02-24 19:46:23 +05:30
Tobi Lütke	d6f3688d91	Remove grpo command from default train entrypoint	2026-02-22 15:29:09 -05:00
Tobi Lütke	189916d6fb	Move GRPO training out of default finetune pipeline	2026-02-22 15:26:23 -05:00
Tobi Lütke	cbeeb1f89b	Add wall-clock checkpoints and full eval defaults	2026-02-22 15:02:02 -05:00
Tobi Lutke	1d7d167b29	finetune: strict Pydantic schema, one canonical data format Replace ad-hoc JSON parsing with a strict Pydantic model (TrainingExample with typed OutputPair). All data loading goes through load_examples() which fails loudly on invalid data. - Convert v3_structured.jsonl from "searches" to "output" format - Rewrite all consumer scripts (prepare, validate, score, analyze) to load through the Pydantic schema - Prepared train/val files are ephemeral build artifacts - Restore LFM2 and GEPA experiments under experiments/ - Add pydantic>=2.0 to dependencies	2026-02-22 13:39:00 -04:00
Tobi Lutke	3950055708	finetune: quoted phrases, negation, and entity preservation (#247 ) Training data: - Expand lex phrases/negation examples from 12 to 74 with intent field - Add 50 personal entity examples (meetings, emails, projects with names) Reward function: - Detect entities at position 0 (fixes "Bob asked about deploy") - Per-entity coverage penalty: -20 per entity absent from all lex+vec - Phrase quoting bonus: +3 when lex uses quotes for multi-word terms - Expanded stopwords to reduce false positive entity detection Eval queries: add 21 test queries for personal entities, quoted phrases, and negation/disambiguation scenarios.	2026-02-22 13:38:59 -04:00
Tobi Lutke	599935754b	finetune: remove orphaned files and abandoned experiments Remove one-off data generator/fix scripts, superseded data files (v2, v3 replaced by v3_structured), LFM2 experiment, GEPA directory, duplicate job scripts, and historical docs. Clean up Justfile. These are restored under experiments/ in a later commit.	2026-02-22 13:38:59 -04:00
Tobi Lütke	8d73eda4de	data: add 48 sports acronym training examples Covers UFC, NFL, NBA, NHL, MLB, F1, MLS, IMSA, WEC, NASCAR, PGA, ATP, WTA, FIFA. Fixes query expansion failures like UFC → 'united fighting club'.	2026-02-22 09:37:25 -05:00
Tobi Lütke	3b87e3e224	feat: query document format, lex phrase/negation syntax, training data The 'query document' is now a first-class concept in QMD: a structured document with typed sub-queries that combine for best recall. ## Query types - lex: BM25 keyword search with phrase and negation syntax - vec: Semantic vector search (natural language questions) - hyde: Hypothetical document (write the expected answer) - expand: Auto-expand via local LLM (max 1, default for plain queries) ## Lex syntax Full BM25 operator support: "exact phrase" verbatim match, no prefix -term exclude documents containing term -"exact phrase" exclude documents containing phrase Examples: "C++ performance" optimization -sports -athlete "connection pool" timeout -redis "machine learning" -sports -athlete ## MCP tool description rewritten The 'query' tool description now fully teaches AI agents the query document format, lex syntax, and strategy for combining types. Includes worked examples including intent-aware lex (C++ performance, not sports) which is critical for disambiguation in dense corpora. ## Unit tests 11 new lex parser tests covering: - plain terms, quoted phrases, negation, combined - intent-aware disambiguation (performance -sports -athlete) - only-negation returns null (FTS5 constraint) - empty/whitespace handling ## Training data 12 new intent-aware examples for next model training round: - Real technical topics with lex phrase+negation combinations - Covers: C++ perf, Python memory, DB connections, rate limiting, SQL optimization, ML overfitting, Docker, JWT, async/await, git conflicts, Kubernetes, React state - Each shows how context/intent shapes lex query construction (e.g. performance with C++ context → -sports -athlete exclusions)	2026-02-19 06:52:58 -05:00
Tobias Lütke	67e2aab18c	Merge pull request #206 from tobi/liquidai-query-expansion	2026-02-18 08:42:01 -04:00
Tobi Lütke	48f0917269	feat(finetune): hyde-first ordering, relative paths, structured format Dataset improvements: - Reorder output to put hyde first for better retrieval priming - Convert absolute paths to relative paths in scripts - Add convert_to_structured.py for structured data format - Add qmd_expansion_v3_structured.jsonl with type/query objects - Update schema.py with reorder_hyde_first() helper - Verify data now validates hyde-first ordering Training data regenerated with new ordering (100% validation success).	2026-02-17 06:31:35 -05:00
Tobi Lütke	57f7caa93b	feat: add LiquidAI LFM2 support for query expansion Add training configuration and documentation for using LiquidAI's LFM2-1.2B as an alternative base model for query expansion fine-tuning. LFM2 benefits: - 2x faster decode/prefill vs standard transformers - Optimized for edge/on-device inference - Good at agentic tasks, RAG, and data extraction Changes: - Add configs/sft_lfm2.yaml with LFM2-specific LoRA target modules - Add jobs/sft_lfm2.py for HuggingFace Jobs training - Update llm.ts with LFM2 GGUF model URIs - Add documentation for LFM2 training workflow LFM2 uses a hybrid architecture (convolutions + attention) requiring different LoRA targets: q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, w3	2026-02-17 06:20:57 -05:00
Tobi Lütke	a282ce7a26	feat(finetune): improve query expansion dataset v3 Dataset improvements: - fix_hyde.py: Replace generic template hyde entries with query-specific ones using GPT-4o-mini (removed 'comprehensive guide covers everything' pattern) - fix_lex_filler.py: Remove filler words (overview, tutorial, guide, examples, documentation, best practices) that were padding rather than genuine search intent - qmd_expansion_v3.jsonl: Improved dataset with 1,498 high-quality entries Training data preparation: - convert_to_chatml.py: Convert to ChatML format for LFM2.5 training - verify_data.py: Validation script to ensure data quality - train-lfm2/: Ready-to-use training data (90/10 train/val split) Data quality metrics: - 100% success rate (all entries properly formatted) - Query length: 6-65 chars (avg: 29.3) - Response length: 307-777 chars (avg: 539.5) - All entries contain lex, vec, and hyde expansions	2026-02-17 06:19:59 -05:00
Tobi Lütke	102ff861d3	fix: use Qwen3 recommended sampling params to prevent repetition loops - Changed temperature from 0/0.1 to 0.7 (Qwen3 non-thinking mode default) - Added topK=20, topP=0.8 per Qwen3 docs - Added repeatPenalty with presencePenalty=0.5 for query expansion - Fixes infinite loop on acronyms like DHH, BFCM Qwen3 docs explicitly warn: 'DO NOT use greedy decoding, as it can lead to performance degradation and endless repetitions'	2026-02-01 03:24:20 +00:00
Tobi Lütke	bf1b8fc90a	lots of training stuff	2026-01-31 23:02:23 +00:00
Tobi Lutke	739038e1a7	docs: add explicit HuggingFace repo destinations - List all HuggingFace repos in CLAUDE.md (model, gguf, sft, grpo, train) - Update jobs scripts to use tobil/qmd-query-expansion-train (no -v2) - Clarify rules: no versioned repos, update in place Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-31 12:26:02 -05:00
Tobi Lutke	38073799c0	chore: clean up finetune folder and fix training workflow - Remove versioned files (sft_v4.yaml, prepare_v4_dataset.py, train_v2/) - Update configs to use local data/train/ directory - Add glob pattern support to prepare_data.py and train.py - Update .gitignore to properly ignore outputs/ and data/train*/ - Document data preparation step in CLAUDE.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-31 12:21:09 -05:00
Tobi Lutke	533f0eed37	docs: add finetune CLAUDE.md and update training workflow - Add finetune/CLAUDE.md documenting the training pipeline - Update configs to output to local outputs/ directory (gitignored) - Document that all data/*.jsonl files are training data - Document local CUDA training vs HuggingFace Jobs cloud training - Enforce eval requirement before any model upload - Single model repo (no -v1, -v2, -v4 versioning) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-31 12:15:56 -05:00
Tobi Lutke	7de18ee066	Merge main into finetune Brings in: - /only: variants for single-type expansions - LLM session management for lifecycle safety - skills.sh integration for AI agent discovery - Various bug fixes for vector search and embeddings Merge conflicts resolved by keeping hyde-first format ordering from finetune branch while accepting expanded templates and new features from main. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-31 12:10:22 -05:00
Tobi Lutke	785620467a	refactor: reorder output format to put hyde line first Move the hyde (hypothetical document) line to the beginning of the output format, before lex and vec lines. This better reflects the logical flow where the hypothetical document is generated first and then informs the keyword/semantic expansions. Also adds auto-download of eval_common.py in training scripts for standalone HuggingFace Jobs execution. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-31 12:09:04 -05:00
Tobi Lütke	8cc7d8c138	Add sampled /only: variants (399) for training balance	2026-01-31 16:29:02 +00:00
Tobi Lütke	20aef8a3e9	Change format to /only:lex (slash prefix)	2026-01-31 16:24:18 +00:00
Tobi Lütke	46ff098361	Change only: format to only:lex (no space after colon)	2026-01-31 16:23:28 +00:00
Tobi Lütke	806a0cfc14	Add 'only:' mode support for single-type expansions - generate_only_variants.py: Creates training data where queries end with 'only: lex', 'only: vec', or 'only: hyde' and output contains ONLY that type - reward.py: Updated scorer to handle 'only:' mode separately - Penalizes presence of unwanted types - Type-specific quality checks - Filters templated low-quality hyde outputs - 4,444 high-quality 'only:' variants from v2 + handcrafted data	2026-01-31 16:15:59 +00:00
Tobi Lutke	5cf4958bfa	Add HuggingFace model card YAML metadata to finetune README Co-Authored-By: Claude (claude-fudge-eap-cc) <noreply@anthropic.com>	2026-01-28 23:33:55 -08:00
Tobias Lütke	eb1b77c8cb	Deploy fine-tuned GRPO model as default query expansion (#67 ) * Add query expansion model finetuning infrastructure - Training scripts for Qwen3-0.6B and 1.7B models - Dataset generation from s-emanuilov/query-expansion - Evaluation scripts comparing finetuned vs baseline models - GRPO RL training script (optional improvement) - Export script for GGUF conversion Results: - 0.6B finetuned: 95% format compliance (lex/vec/hyde) - Baseline: 0% format compliance - Dataset: 5,157 examples on HuggingFace Hub Models available at: - tobil/qmd-query-expansion-0.6B (recommended) - tobil/qmd-query-expansion-train (dataset) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Fix GRPO training script for TRL API compatibility - Use max_completion_length instead of max_new_tokens - Use processing_class instead of tokenizer - Use args instead of config for GRPOTrainer - Add __name__ attribute to reward function class - Accept *kwargs in reward function for extra TRL args - Add new LoRA adapter after merging SFT weights Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Update README with final evaluation results - 0.6B SFT: 95% format compliance (best) - 0.6B GRPO: 0% (catastrophic forgetting from RL) - 1.7B v2: training completed, evaluation pending - Added GRPO evaluation results Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add comprehensive scoring system for query expansion New scoring criteria (0-100 points): - Format (30): Must have lex: and vec: prefixes - Diversity (30): Multiple types, no echoing query, diverse expansions - Hyde (20): Optional, concise, no newlines, no word repetition - Quality (20): Lex=keywords, vec=natural language See SCORING.md for full documentation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add HuggingFace login and comprehensive scoring to GRPO v2 training - Add explicit HF_TOKEN login before training - Use SCORING.md criteria as RL reward function - Conservative training: LR 1e-6, LoRA rank 4 - Reward scores: good=0.94, bad=0.38 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Refactor finetune folder: train/rl scripts with YAML configs Major changes: - train.py: Generic SFT training script using YAML config - rl.py: Generic GRPO training script using YAML config - configs/: YAML configs per training run (sft_v4.yaml, grpo_v4.yaml) - dataset/: Data preparation scripts moved here - tui.py: Interactive model testing interface Training results: - SFT v4: 98.8% avg score (all Excellent) - GRPO v4: 0% (failed - model drifted to verbose explanations) Removed per-model scripts (train_0.6B.py, train_1.7B.py, etc) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add named entity extraction to GRPO reward function Key changes: - Extract named entities (acronyms, proper nouns, technical terms) - Heavy penalty (-30) when lex queries miss named entities - Penalty (-15) for generic filler phrases like "find information about" - Compound entity detection (TDS motorsports -> both words) - Update GRPO config with KL regularization (beta=0.04) - Lower learning rate (5e-7) and add max_steps (200) Test results: - "who is TDS motorsports" good: 1.00, bad: 0.30 (was 0.75) - "how to use React hooks" good: 0.87, bad: 0.45 (was 0.75) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add chat template leakage detection to reward function Zero reward for outputs containing: - <\|im_start\|>, <\|im_end\|> tokens - <think>, </think> tags (Qwen3 thinking mode) - Role markers like \nassistant\n, \nuser\n - <\|endoftext\|> token Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Strict format validation: every line must be lex:/vec:/hyde: Any line that doesn't start with a valid prefix now returns 0.0 instead of just counting as a penalty. This prevents any prose, explanations, bullet points, or other invalid content. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Clean up evaluation files - Remove old versioned evaluation files (0.6B, 1.7B, baseline) - Rename evaluation_v4.json -> evaluation_sft.json - Rename evaluation_v4_grpo.json -> evaluation_grpo_failed.json Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Refactor evals into separate run and score scripts New structure: - evals/run.py: Generate model outputs to JSONL - evals/score.py: Score outputs with detailed breakdown - evals/queries.txt: Test queries (26 total) Features: - Supports both HF Hub and local model paths - Named entity preservation scoring - Chat template leakage detection - Strict format validation (every line must be lex:/vec:/hyde:) - Generic phrase detection Usage: uv run evals/run.py --model tobil/qmd-query-expansion-0.6B-v4 uv run evals/score.py evals/results_.jsonl Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Fix GRPO model loading to use SFT base first The GRPO adapter was trained on merged SFT weights, so loading it directly on the base model results in 0% score. Added --sft-model parameter to evals/run.py to load SFT first, then apply GRPO adapter. With correct loading: GRPO scores 89.7% (all 26 queries Excellent). Updated README with correct GRPO score and loading instructions. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Fix TUI to load GRPO models with SFT base first GRPO adapters were trained on merged SFT weights, so they need SFT loaded and merged first before applying the GRPO adapter. Updated MODELS config to include sft_base path for GRPO models, and load_model() now handles the SFT -> merge -> GRPO flow. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Update README for unified model repository structure All models (0.6B, 1.7B, 4B) with SFT and GRPO variants now go into a single HuggingFace repo (tobil/qmd-query-expansion) with subfolders for each size and training method. Updated loading examples to show subfolder-based model loading. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Update README with separate model repos Changed from subfolder approach to separate repos per model since trainer.push_to_hub() doesn't support subfolder argument. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Add 1.7B and 4B GRPO training and GGUF conversion scripts Training scripts for GRPO fine-tuning: - train_1.7B_grpo.py: GRPO training for Qwen3-1.7B - train_4B_grpo.py: GRPO training for Qwen3-4B GGUF conversion scripts: - convert_1.7B_gguf.py: Merge SFT+GRPO adapters and convert to GGUF - convert_4B_gguf.py: Merge SFT+GRPO adapters and convert to GGUF All scripts use PEP 723 inline dependencies for HuggingFace Jobs. Models published: - tobil/qmd-query-expansion-1.7B-sft - tobil/qmd-query-expansion-1.7B-grpo - tobil/qmd-query-expansion-1.7B-gguf - tobil/qmd-query-expansion-4B-sft - tobil/qmd-query-expansion-4B-grpo - tobil/qmd-query-expansion-4B-gguf Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Remove beads issue tracking Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Remove beads reference from CLAUDE.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Fix GRPO reward function to handle think blocks and end tokens - Strip <\|im_end\|> token from completions (model output includes it) - Change think_penalty to skipped_think bonus (+20 for not using think) - Adjust max_possible to account for bonus (120/140) - Fix typo in chat template artifact check Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Make TUI model list dynamic from HuggingFace Hub - Fetch available qmd-query-expansion models from tobil/ on Hub - Auto-detect model size (0.6B, 1.7B, 4B) and use correct base model - Group models by type (SFT vs GRPO) in menu - Skip GGUF repos in model listing Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Fix GRPO training: apply chat template to prompts The SFT model was trained with chat template format but GRPO was passing raw prompts. Now prompts are formatted with tokenizer.apply_chat_template() so the model sees the same format it learned during SFT. Also update extract_query_from_prompt to strip chat template artifacts. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * Finetune 2.0: consolidate and simplify the entire training pipeline Consolidate ~2,800 lines of duplicated code across 12 files into 5 clean, well-documented files targeting Qwen3-1.7B end-to-end. Key changes: - Extract reward function into single source of truth (reward.py) Previously duplicated 3x with divergent bugs across rl.py, train_1.7B_grpo.py, and train_4B_grpo.py - Unify training into one script with sft/grpo subcommands (train.py) Replaces train.py + rl.py + train_1.7B_grpo.py + train_4B_grpo.py - Merge eval generate+score into single eval.py Replaces evals/run.py + evals/score.py - Parameterize GGUF conversion by --size (convert_gguf.py) Replaces convert_1.7B_gguf.py + convert_4B_gguf.py - Fix critical bug: rl.py silently ignored beta/temperature from config, causing the exact catastrophic drift its own comments warned about - Fix prompt consistency: all files use /no_think chat template format - Retarget configs from 0.6B to 1.7B - Comprehensive README documenting the full pipeline Removed: rl.py, train_1.7B_grpo.py, train_4B_grpo.py, convert_1.7B_gguf.py, convert_4B_gguf.py, tui.py, evals/run.py, evals/score.py Net: -3,429 lines, +382 lines Co-Authored-By: Claude (claude-fudge-eap-cc) <noreply@anthropic.com> * Add HF Jobs scripts, temporal query examples, and training results - jobs/sft.py and jobs/grpo.py: self-contained scripts for `hf jobs uv run` (no local GPU needed) - 12 temporal/recency query examples in training data (e.g. "recent news about Shopify" -> lex with years 2025/2026) - 4 temporal test queries in evals/queries.txt - README updated with HF Jobs workflow, training results, and updated file structure - Remove .beads tracking SFT and GRPO successfully trained on A10G via HF Jobs: SFT: eval loss 0.321, token accuracy 92.4% GRPO: mean reward 0.757, 200 steps, KL 0.00048 Co-Authored-By: Claude (claude-fudge-eap-cc) <noreply@anthropic.com> * Deploy fine-tuned GRPO model as default for query expansion Switch from generic Qwen3-1.7B-Q8_0 (~2.2GB) to fine-tuned qmd-query-expansion-1.7B-q4_k_m (~1.1GB). The fine-tuned Q4 scores 91.7% avg with 30/30 Excellent, outperforming the base Q8. - Update default generate model in src/llm.ts - Update README model table, architecture diagram, config block - Add v2 training data, eval scripts, and quantize job - Remove superseded v1 training data (5,742 → 1,000 examples) - Update finetune README with v2 results and file structure Co-Authored-By: Claude (claude-fudge-eap-cc) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 23:25:17 -08:00
Tobi Lutke	8572c2fd94	Deploy fine-tuned GRPO model as default for query expansion Switch from generic Qwen3-1.7B-Q8_0 (~2.2GB) to fine-tuned qmd-query-expansion-1.7B-q4_k_m (~1.1GB). The fine-tuned Q4 scores 91.7% avg with 30/30 Excellent, outperforming the base Q8. - Update default generate model in src/llm.ts - Update README model table, architecture diagram, config block - Add v2 training data, eval scripts, and quantize job - Remove superseded v1 training data (5,742 → 1,000 examples) - Update finetune README with v2 results and file structure Co-Authored-By: Claude (claude-fudge-eap-cc) <noreply@anthropic.com>	2026-01-28 23:24:58 -08:00
Tobi Lutke	5ab78d00a2	Add HF Jobs scripts, temporal query examples, and training results - jobs/sft.py and jobs/grpo.py: self-contained scripts for `hf jobs uv run` (no local GPU needed) - 12 temporal/recency query examples in training data (e.g. "recent news about Shopify" -> lex with years 2025/2026) - 4 temporal test queries in evals/queries.txt - README updated with HF Jobs workflow, training results, and updated file structure - Remove .beads tracking SFT and GRPO successfully trained on A10G via HF Jobs: SFT: eval loss 0.321, token accuracy 92.4% GRPO: mean reward 0.757, 200 steps, KL 0.00048 Co-Authored-By: Claude (claude-fudge-eap-cc) <noreply@anthropic.com>	2026-01-28 15:46:44 -08:00
Tobi Lutke	354744af53	Finetune 2.0: consolidate and simplify the entire training pipeline Consolidate ~2,800 lines of duplicated code across 12 files into 5 clean, well-documented files targeting Qwen3-1.7B end-to-end. Key changes: - Extract reward function into single source of truth (reward.py) Previously duplicated 3x with divergent bugs across rl.py, train_1.7B_grpo.py, and train_4B_grpo.py - Unify training into one script with sft/grpo subcommands (train.py) Replaces train.py + rl.py + train_1.7B_grpo.py + train_4B_grpo.py - Merge eval generate+score into single eval.py Replaces evals/run.py + evals/score.py - Parameterize GGUF conversion by --size (convert_gguf.py) Replaces convert_1.7B_gguf.py + convert_4B_gguf.py - Fix critical bug: rl.py silently ignored beta/temperature from config, causing the exact catastrophic drift its own comments warned about - Fix prompt consistency: all files use /no_think chat template format - Retarget configs from 0.6B to 1.7B - Comprehensive README documenting the full pipeline Removed: rl.py, train_1.7B_grpo.py, train_4B_grpo.py, convert_1.7B_gguf.py, convert_4B_gguf.py, tui.py, evals/run.py, evals/score.py Net: -3,429 lines, +382 lines Co-Authored-By: Claude (claude-fudge-eap-cc) <noreply@anthropic.com>	2026-01-28 14:00:36 -08:00
Tobi Lutke	9b3a209a97	Fix GRPO training: apply chat template to prompts The SFT model was trained with chat template format but GRPO was passing raw prompts. Now prompts are formatted with tokenizer.apply_chat_template() so the model sees the same format it learned during SFT. Also update extract_query_from_prompt to strip chat template artifacts. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 17:21:22 -05:00
Tobi Lutke	3ea85eff50	Make TUI model list dynamic from HuggingFace Hub - Fetch available qmd-query-expansion models from tobil/ on Hub - Auto-detect model size (0.6B, 1.7B, 4B) and use correct base model - Group models by type (SFT vs GRPO) in menu - Skip GGUF repos in model listing Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 17:17:40 -05:00
Tobi Lutke	891f3262cf	Fix GRPO reward function to handle think blocks and end tokens - Strip <\|im_end\|> token from completions (model output includes it) - Change think_penalty to skipped_think bonus (+20 for not using think) - Adjust max_possible to account for bonus (120/140) - Fix typo in chat template artifact check Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 16:32:13 -05:00
Tobi Lutke	8a1c4cdab0	Add 1.7B and 4B GRPO training and GGUF conversion scripts Training scripts for GRPO fine-tuning: - train_1.7B_grpo.py: GRPO training for Qwen3-1.7B - train_4B_grpo.py: GRPO training for Qwen3-4B GGUF conversion scripts: - convert_1.7B_gguf.py: Merge SFT+GRPO adapters and convert to GGUF - convert_4B_gguf.py: Merge SFT+GRPO adapters and convert to GGUF All scripts use PEP 723 inline dependencies for HuggingFace Jobs. Models published: - tobil/qmd-query-expansion-1.7B-sft - tobil/qmd-query-expansion-1.7B-grpo - tobil/qmd-query-expansion-1.7B-gguf - tobil/qmd-query-expansion-4B-sft - tobil/qmd-query-expansion-4B-grpo - tobil/qmd-query-expansion-4B-gguf Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 11:35:27 -05:00
Tobi Lutke	b9b1b39a76	Update README with separate model repos Changed from subfolder approach to separate repos per model since trainer.push_to_hub() doesn't support subfolder argument. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 08:13:30 -05:00
Tobi Lutke	312c281109	Update README for unified model repository structure All models (0.6B, 1.7B, 4B) with SFT and GRPO variants now go into a single HuggingFace repo (tobil/qmd-query-expansion) with subfolders for each size and training method. Updated loading examples to show subfolder-based model loading. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 01:00:17 -05:00
Tobi Lutke	2648512b7c	Fix TUI to load GRPO models with SFT base first GRPO adapters were trained on merged SFT weights, so they need SFT loaded and merged first before applying the GRPO adapter. Updated MODELS config to include sft_base path for GRPO models, and load_model() now handles the SFT -> merge -> GRPO flow. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 00:47:59 -05:00
Tobi Lutke	f96766cce8	Fix GRPO model loading to use SFT base first The GRPO adapter was trained on merged SFT weights, so loading it directly on the base model results in 0% score. Added --sft-model parameter to evals/run.py to load SFT first, then apply GRPO adapter. With correct loading: GRPO scores 89.7% (all 26 queries Excellent). Updated README with correct GRPO score and loading instructions. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 00:46:07 -05:00
Tobi Lutke	f6a6716c44	Refactor evals into separate run and score scripts New structure: - evals/run.py: Generate model outputs to JSONL - evals/score.py: Score outputs with detailed breakdown - evals/queries.txt: Test queries (26 total) Features: - Supports both HF Hub and local model paths - Named entity preservation scoring - Chat template leakage detection - Strict format validation (every line must be lex:/vec:/hyde:) - Generic phrase detection Usage: uv run evals/run.py --model tobil/qmd-query-expansion-0.6B-v4 uv run evals/score.py evals/results_*.jsonl Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 00:40:33 -05:00
Tobi Lutke	857a85ab58	Clean up evaluation files - Remove old versioned evaluation files (0.6B, 1.7B, baseline) - Rename evaluation_v4.json -> evaluation_sft.json - Rename evaluation_v4_grpo.json -> evaluation_grpo_failed.json Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 00:20:39 -05:00
Tobi Lutke	dc8f5a2335	Strict format validation: every line must be lex:/vec:/hyde: Any line that doesn't start with a valid prefix now returns 0.0 instead of just counting as a penalty. This prevents any prose, explanations, bullet points, or other invalid content. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 00:08:08 -05:00
Tobi Lutke	2ad507a86e	Add chat template leakage detection to reward function Zero reward for outputs containing: - <\|im_start\|>, <\|im_end\|> tokens - <think>, </think> tags (Qwen3 thinking mode) - Role markers like \nassistant\n, \nuser\n - <\|endoftext\|> token Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 00:07:12 -05:00
Tobi Lutke	6062dc769f	Add named entity extraction to GRPO reward function Key changes: - Extract named entities (acronyms, proper nouns, technical terms) - Heavy penalty (-30) when lex queries miss named entities - Penalty (-15) for generic filler phrases like "find information about" - Compound entity detection (TDS motorsports -> both words) - Update GRPO config with KL regularization (beta=0.04) - Lower learning rate (5e-7) and add max_steps (200) Test results: - "who is TDS motorsports" good: 1.00, bad: 0.30 (was 0.75) - "how to use React hooks" good: 0.87, bad: 0.45 (was 0.75) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-25 00:05:40 -05:00
Tobi Lutke	32706a720f	Refactor finetune folder: train/rl scripts with YAML configs Major changes: - train.py: Generic SFT training script using YAML config - rl.py: Generic GRPO training script using YAML config - configs/: YAML configs per training run (sft_v4.yaml, grpo_v4.yaml) - dataset/: Data preparation scripts moved here - tui.py: Interactive model testing interface Training results: - SFT v4: 98.8% avg score (all Excellent) - GRPO v4: 0% (failed - model drifted to verbose explanations) Removed per-model scripts (train_0.6B.py, train_1.7B.py, etc) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 20:26:46 -05:00
Tobi Lutke	d32e13c172	Add HuggingFace login and comprehensive scoring to GRPO v2 training - Add explicit HF_TOKEN login before training - Use SCORING.md criteria as RL reward function - Conservative training: LR 1e-6, LoRA rank 4 - Reward scores: good=0.94, bad=0.38 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 17:37:49 -05:00
Tobi Lutke	c35dbd6cbd	Add comprehensive scoring system for query expansion New scoring criteria (0-100 points): - Format (30): Must have lex: and vec: prefixes - Diversity (30): Multiple types, no echoing query, diverse expansions - Hyde (20): Optional, concise, no newlines, no word repetition - Quality (20): Lex=keywords, vec=natural language See SCORING.md for full documentation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 11:00:55 -05:00
Tobi Lutke	994a094546	Update README with final evaluation results - 0.6B SFT: 95% format compliance (best) - 0.6B GRPO: 0% (catastrophic forgetting from RL) - 1.7B v2: training completed, evaluation pending - Added GRPO evaluation results Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 10:45:48 -05:00
Tobi Lutke	0353994e7d	Fix GRPO training script for TRL API compatibility - Use max_completion_length instead of max_new_tokens - Use processing_class instead of tokenizer - Use args instead of config for GRPOTrainer - Add __name__ attribute to reward function class - Accept **kwargs in reward function for extra TRL args - Add new LoRA adapter after merging SFT weights Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-23 22:25:09 -05:00

1 2

51 Commits