Key changes:
- Extract named entities (acronyms, proper nouns, technical terms)
- Heavy penalty (-30) when lex queries miss named entities
- Penalty (-15) for generic filler phrases like "find information about"
- Compound entity detection (TDS motorsports -> both words)
- Update GRPO config with KL regularization (beta=0.04)
- Lower learning rate (5e-7) and add max_steps (200)

Test results:
- "who is TDS motorsports" good: 1.00, bad: 0.30 (was 0.75)
- "how to use React hooks" good: 0.87, bad: 0.45 (was 0.75)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

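
The entity and filler penalties listed in the commit message can be illustrated with a minimal sketch. This is not the repository's reward code: the function names, the stopword list, the capitalization heuristic, and the choice to apply the -30 penalty once per query (rather than once per missing entity) are all assumptions made for illustration. Only the penalty values and the example filler phrase come from the commit message, and the sketch returns raw penalties rather than the normalized scores shown under "Test results".

```python
import re

# Penalty values quoted in the commit message.
MISSING_ENTITY_PENALTY = -30
FILLER_PENALTY = -15

# Assumption: the commit names only this one filler phrase.
FILLER_PHRASES = ("find information about",)

# Assumption: a small stopword set so question words are never entities.
STOPWORDS = {"who", "what", "how", "why", "when", "where",
             "is", "are", "to", "the", "a", "an", "of", "in"}


def extract_entities(query: str) -> set[str]:
    """Return lowercased tokens that look like named entities.

    Heuristic (hypothetical): a token counts if it starts with an
    uppercase letter or is an all-caps acronym. The word directly after
    such a token is also kept, so "TDS motorsports" yields both words.
    """
    entities: set[str] = set()
    prev_named = False
    for tok in re.findall(r"[A-Za-z0-9]+", query):
        lower = tok.lower()
        if lower in STOPWORDS:
            prev_named = False
            continue
        named = tok[0].isupper() or (len(tok) >= 2 and tok.isupper())
        if named or prev_named:
            entities.add(lower)
        prev_named = named
    return entities


def entity_penalty(query: str, lex_queries: list[str]) -> int:
    """Sum the penalties that apply to a set of generated lex queries."""
    lex_text = " ".join(lex_queries).lower()
    lex_tokens = set(re.findall(r"[a-z0-9]+", lex_text))

    penalty = 0
    # Assumption: one flat penalty if any entity is missing.
    if extract_entities(query) - lex_tokens:
        penalty += MISSING_ENTITY_PENALTY
    if any(phrase in lex_text for phrase in FILLER_PHRASES):
        penalty += FILLER_PENALTY
    return penalty
```

Under these assumptions, `entity_penalty("who is TDS motorsports", ["TDS motorsports team"])` incurs no penalty, while a lex query such as `"find information about racing"` drops both entity words and uses a filler phrase, so both penalties apply.
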
||
|---|---|---|
| grpo_v4.yaml | ||
| sft_v4.yaml | ||