qmd/finetune/configs
Tobi Lutke 6062dc769f
Add named entity extraction to GRPO reward function
Key changes:
- Extract named entities (acronyms, proper nouns, technical terms)
- Heavy penalty (-30) when lex queries miss named entities
- Penalty (-15) for generic filler phrases like "find information about"
- Compound entity detection (TDS motorsports -> both words)
- Update GRPO config with KL regularization (beta=0.04)
- Lower learning rate (5e-7) and add max_steps (200)
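
A minimal sketch of how the entity and filler penalties described above might be computed. Only the two penalty values (-30 and -15) and the example queries come from this commit message. The function names, the stopword and filler lists, the capitalization heuristic, and the choice to apply each penalty at most once are assumptions made for illustration, not the repository's actual implementation.

```python
import re

# Hypothetical word lists; the commit message does not show the real ones.
STOPWORDS = {"who", "what", "when", "where", "why", "how", "is", "are",
             "the", "a", "an", "to", "use", "of", "in", "for", "and"}
FILLER_PHRASES = ("find information about",)

ENTITY_PENALTY = -30  # value taken from the commit message
FILLER_PENALTY = -15  # value taken from the commit message


def extract_named_entities(query: str) -> set[str]:
    """Return lowercase tokens that look like named entities (heuristic)."""
    tokens = re.findall(r"[A-Za-z0-9]+", query)
    entities = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in STOPWORDS:
            continue
        is_acronym = len(tok) >= 2 and tok.isupper()
        # Skip a capitalized first word, which may just start the sentence.
        is_proper = tok[0].isupper() and i > 0
        if is_acronym or is_proper:
            entities.add(tok.lower())
            # Compound detection: the next non-stopword joins the entity,
            # so "TDS motorsports" yields both words.
            if i + 1 < len(tokens) and tokens[i + 1].lower() not in STOPWORDS:
                entities.add(tokens[i + 1].lower())
    return entities


def entity_penalty(query: str, lex_queries: list[str]) -> int:
    """Sum the penalties for a set of generated lex queries.

    Assumption: each penalty applies at most once per set of lex queries.
    """
    lex_text = " ".join(lex_queries).lower()
    lex_tokens = set(re.findall(r"[a-z0-9]+", lex_text))
    penalty = 0
    entities = extract_named_entities(query)
    if entities and not entities <= lex_tokens:
        penalty += ENTITY_PENALTY  # at least one named entity is missing
    if any(phrase in lex_text for phrase in FILLER_PHRASES):
        penalty += FILLER_PENALTY  # generic filler phrase detected
    return penalty
```

Under these assumptions, lex queries that keep both "TDS" and "motorsports" incur no penalty, while a lex query such as "find information about motorsports company" incurs both. How the penalties map onto the normalized scores listed under the test results is not specified here.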

Test results:
- "who is TDS motorsports" good: 1.00, bad: 0.30 (was 0.75)
- "how to use React hooks" good: 0.87, bad: 0.45 (was 0.75)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 00:05:40 -05:00
grpo_v4.yaml Add named entity extraction to GRPO reward function 2026-01-25 00:05:40 -05:00
sft_v4.yaml Refactor finetune folder: train/rl scripts with YAML configs 2026-01-24 20:26:46 -05:00