Commit Graph

54 Commits

Author SHA1 Message Date
Tobi Lutke
c35dbd6cbd
Add comprehensive scoring system for query expansion
New scoring criteria (0-100 points):
- Format (30): Must have lex: and vec: prefixes
- Diversity (30): Multiple types, no echoing query, diverse expansions
- Hyde (20): Optional, concise, no newlines, no word repetition
- Quality (20): lex = keywords, vec = natural language

See SCORING.md for full documentation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 11:00:55 -05:00
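
The four criteria in the commit above can be sketched as a single scoring function. This is only an illustration of the point split (30/30/20/20) named in the message, not the repository's scorer: the function name `score_expansion`, the sub-point split inside each criterion, and every threshold (`MAX_HYDE_WORDS`, `MAX_WORD_REPEATS`, `MAX_LEX_WORDS`, `MIN_VEC_WORDS`) are assumptions, since SCORING.md is not reproduced here.

```python
import re
from collections import Counter

PREFIXES = ("lex", "vec", "hyde")

# Hypothetical thresholds; the real values would come from SCORING.md.
MAX_HYDE_WORDS = 40     # "concise"
MAX_WORD_REPEATS = 3    # "no word repetition"
MAX_LEX_WORDS = 5       # lex entries should look like keywords
MIN_VEC_WORDS = 3       # vec entries should look like natural language


def score_expansion(query, output):
    """Score one model output on a 0-100 scale; returns per-criterion points."""
    items, stray = [], 0
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        kind, sep, content = line.partition(":")
        kind, content = kind.strip().lower(), content.strip()
        if sep and kind in PREFIXES and content:
            items.append((kind, content))
        else:
            stray += 1  # a line without a valid prefix, e.g. a hyde spilling over

    scores = {"format": 0, "diversity": 0, "hyde": 0, "quality": 0}
    if not items:
        scores["total"] = 0
        return scores

    by_kind = {k: [c for kk, c in items if kk == k] for k in PREFIXES}
    contents = [c.lower() for _, c in items]

    # Format (30): both lex: and vec: lines must be present.
    if by_kind["lex"] and by_kind["vec"]:
        scores["format"] = 30

    # Diversity (30): multiple types, no echo of the query, no duplicates.
    if sum(1 for k in PREFIXES if by_kind[k]) >= 2:
        scores["diversity"] += 10
    if query.strip().lower() not in contents:
        scores["diversity"] += 10
    if len(set(contents)) == len(contents):
        scores["diversity"] += 10

    # Hyde (20): optional, so a missing hyde line keeps full points (assumption).
    hyde = 20
    if by_kind["hyde"]:
        words = re.findall(r"\w+", by_kind["hyde"][0].lower())
        if len(words) > MAX_HYDE_WORDS:
            hyde -= 10
        if stray:
            hyde -= 5
        if words and max(Counter(words).values()) > MAX_WORD_REPEATS:
            hyde -= 5
    scores["hyde"] = hyde

    # Quality (20): short keyword-style lex, sentence-style vec.
    if by_kind["lex"] and all(len(c.split()) <= MAX_LEX_WORDS for c in by_kind["lex"]):
        scores["quality"] += 10
    if by_kind["vec"] and all(len(c.split()) >= MIN_VEC_WORDS for c in by_kind["vec"]):
        scores["quality"] += 10

    scores["total"] = sum(scores[k] for k in ("format", "diversity", "hyde", "quality"))
    return scores
```

Calling `score_expansion("auth config", model_output)` returns a dictionary with one entry per criterion plus `total`, which makes it easy to see which criterion a given output loses points on.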
Tobi Lutke
994a094546
Update README with final evaluation results
- 0.6B SFT: 95% format compliance (best)
- 0.6B GRPO: 0% (catastrophic forgetting from RL)
- 1.7B v2: training completed, evaluation pending
- Add GRPO evaluation results

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 10:45:48 -05:00
Tobi Lutke
0353994e7d
Fix GRPO training script for TRL API compatibility
- Use max_completion_length instead of max_new_tokens
- Use processing_class instead of tokenizer
- Use args instead of config for GRPOTrainer
- Add __name__ attribute to reward function class
- Accept **kwargs in reward function for extra TRL args
- Add new LoRA adapter after merging SFT weights

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 22:25:09 -05:00
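
Two of the fixes in the commit above concern the reward function itself: giving a class-based reward a `__name__` attribute and accepting `**kwargs`. A minimal sketch of that shape follows, written without importing TRL so it runs on its own. The class name `FormatReward`, the default name string, and the binary 1.0/0.0 reward are hypothetical; the handling of both plain-string and message-list completions is an assumption about what the trainer may pass.

```python
class FormatReward:
    """Callable reward: 1.0 if a completion has both lex: and vec: lines, else 0.0."""

    def __init__(self, name="format_reward"):
        # Plain functions carry __name__ automatically; class instances do not,
        # so it is set explicitly for code that reads reward_func.__name__.
        self.__name__ = name

    @staticmethod
    def _has_prefix(text, prefix):
        # True when any line of the completion starts with "<prefix>:".
        return any(
            line.strip().lower().startswith(prefix + ":")
            for line in text.splitlines()
        )

    def __call__(self, completions, **kwargs):
        # **kwargs absorbs any extra keyword arguments the trainer supplies
        # (for example prompts or dataset columns) that this reward ignores.
        rewards = []
        for completion in completions:
            # Assumption: a completion is either a string or a list of
            # chat messages shaped like [{"role": ..., "content": ...}].
            text = completion if isinstance(completion, str) else completion[0]["content"]
            ok = self._has_prefix(text, "lex") and self._has_prefix(text, "vec")
            rewards.append(1.0 if ok else 0.0)
        return rewards
```

An instance can then be called like a function, for example `FormatReward()(prompts=["q"], completions=["lex: a\nvec: b"])`, and returns one float per completion.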
Tobi Lutke
7cca164dd9
Add query expansion model finetuning infrastructure
- Training scripts for Qwen3-0.6B and 1.7B models
- Dataset generation from s-emanuilov/query-expansion
- Evaluation scripts comparing finetuned vs baseline models
- GRPO RL training script (optional improvement)
- Export script for GGUF conversion

Results:
- 0.6B finetuned: 95% format compliance (lex/vec/hyde)
- Baseline: 0% format compliance
- Dataset: 5,157 examples on HuggingFace Hub

Model and dataset available at:
- tobil/qmd-query-expansion-0.6B (recommended)
- tobil/qmd-query-expansion-train (dataset)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 19:47:06 -05:00