qmd/configs at dc8f5a2335d0fcf7f2c9ee96548d170ec27e2d88 - qmd


Tobi Lutke 6062dc769f Add named entity extraction to GRPO reward function	2026-01-25 00:05:40 -05:00
grpo_v4.yaml	Add named entity extraction to GRPO reward function	2026-01-25 00:05:40 -05:00
sft_v4.yaml	Refactor finetune folder: train/rl scripts with YAML configs	2026-01-24 20:26:46 -05:00

Add named entity extraction to GRPO reward function

Key changes:
- Extract named entities (acronyms, proper nouns, technical terms)
- Heavy penalty (-30) when lex queries miss named entities
- Penalty (-15) for generic filler phrases like "find information about"
- Compound entity detection (TDS motorsports -> both words)
- Update GRPO config with KL regularization (beta=0.04)
- Lower learning rate (5e-7) and add max_steps (200)

Test results:
- "who is TDS motorsports" good: 1.00, bad: 0.30 (was 0.75)
- "how to use React hooks" good: 0.87, bad: 0.45 (was 0.75)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
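
The reward shaping described in the commit message above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's reward function: the names `extract_entities`, `entity_penalty`, `STOPWORDS`, and `FILLER_PHRASES` are hypothetical, the entity heuristics are simplified, and the sketch assumes each penalty is applied once per sample, which the commit message does not specify.

```python
import re

# Hypothetical filler list; the commit message names only this one phrase.
FILLER_PHRASES = ("find information about",)

# Hypothetical stopword list so that question words are never treated as entities.
STOPWORDS = {"who", "what", "how", "is", "to", "use", "the", "a", "an", "of"}


def extract_entities(query: str) -> set[str]:
    """Return lowercased tokens that look like named entities (simplified heuristic)."""
    tokens = re.findall(r"[A-Za-z0-9]+", query)
    entities: set[str] = set()
    for i, tok in enumerate(tokens):
        if tok.lower() in STOPWORDS:
            continue
        is_acronym = len(tok) >= 2 and tok.isupper()
        is_proper = tok[0].isupper()
        if is_acronym or is_proper:
            entities.add(tok.lower())
            # Compound entity (assumption): the word right after an acronym
            # belongs to the same entity, e.g. "TDS motorsports".
            if is_acronym and i + 1 < len(tokens):
                nxt = tokens[i + 1]
                if nxt.lower() not in STOPWORDS:
                    entities.add(nxt.lower())
    return entities


def entity_penalty(query: str, lex_queries: list[str]) -> int:
    """Return a non-positive reward adjustment for the generated lex queries."""
    penalty = 0
    lex_text = " ".join(lex_queries).lower()
    lex_tokens = set(re.findall(r"[a-z0-9]+", lex_text))
    entities = extract_entities(query)
    # Heavy penalty (-30) when any extracted entity is missing from the lex queries.
    if entities and not entities <= lex_tokens:
        penalty -= 30
    # Smaller penalty (-15) for generic filler phrasing.
    if any(phrase in lex_text for phrase in FILLER_PHRASES):
        penalty -= 15
    return penalty
```

With these assumptions, `entity_penalty("who is TDS motorsports", ["TDS motorsports"])` incurs no penalty, while lex queries that drop "TDS" or "motorsports" take the -30 adjustment. How the raw penalties are normalized into the 0 to 1 scores quoted in the test results is not shown in the commit message and is not modeled here.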
