- Add finetune/CLAUDE.md documenting the training pipeline - Update configs to output to local outputs/ directory (gitignored) - Document that all data/*.jsonl files are training data - Document local CUDA training vs HuggingFace Jobs cloud training - Enforce eval requirement before any model upload - Single model repo (no -v1, -v2, -v4 versioning) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
4.7 KiB
QMD Query Expansion Fine-Tuning
Overview
Train Qwen3-1.7B to expand search queries into structured hyde:/lex:/vec: output for QMD's hybrid retrieval pipeline.
Output Format
hyde: A hypothetical document passage that would answer the query.
lex: keyword1
lex: keyword2
vec: semantic query reformulation
vec: another semantic variation
hyde:always comes FIRST (one line max)lex:lines for BM25 keyword search (1-3 lines, short keywords)vec:lines for vector similarity search (1-3 lines, natural language)
Model Repository
Single destination: tobil/qmd-query-expansion-1.7B
- No versioned directories (
-v1,-v2,-v4, etc.) - No separate
-sftor-grporepos for final models - Update the main repo only when eval scores improve
- GGUF variants go to
tobil/qmd-query-expansion-1.7B-gguf
Training Data
All JSONL files in data/ are training data:
data/
├── qmd_expansion_v2.jsonl
├── qmd_expansion_handcrafted_only.jsonl
├── qmd_only_sampled.jsonl
├── qmd_only_variants.jsonl
└── ... any additional .jsonl files
All .jsonl files in data/ should be concatenated for training runs.
Each JSONL line: {"input": "query", "output": "hyde:...\nlex:...\nvec:..."}
Data Generation Tools
| Script | Purpose |
|---|---|
dataset/generate_data.py |
Generate via Claude API (high quality) |
dataset/generate_data_offline.py |
Transform from HuggingFace datasets |
dataset/prepare_data.py |
Format for Qwen3 chat template |
dataset/clean_data.py |
Detect and fix technical term issues |
generate_only_variants.py |
Generate /only:lex and /only:vec variants |
Local Training Output
All training outputs go to outputs/ (gitignored):
outputs/
├── sft/ # SFT checkpoint
└── grpo/ # GRPO checkpoint
Training Pipeline
Always use Qwen3-1.7B as the base model unless explicitly stated otherwise.
Training can run locally (requires CUDA GPU) or via HuggingFace Jobs (cloud GPU, no local hardware needed).
Stage 1: SFT
# Local (requires CUDA)
uv run train.py sft --config configs/sft.yaml
# Output: outputs/sft/
# Cloud (HuggingFace Jobs - no local GPU needed)
hf jobs uv run --flavor a10g-large --secrets HF_TOKEN --timeout 2h jobs/sft.py
Stage 2: GRPO
# Local (requires CUDA)
uv run train.py grpo --config configs/grpo.yaml
# Output: outputs/grpo/
# Cloud (HuggingFace Jobs - no local GPU needed)
hf jobs uv run --flavor a10g-large --secrets HF_TOKEN --timeout 4h jobs/grpo.py
HuggingFace Jobs
If no local CUDA device is available, use hf jobs to run training in the cloud:
hf jobs ps # List running jobs
hf jobs logs <job-id> # Stream logs
hf jobs inspect <job-id> # Check status
hf jobs cancel <job-id> # Cancel a job
The jobs/ directory contains self-contained scripts that include all dependencies inline.
Evaluation
# Eval local model
uv run eval.py --model ./outputs/grpo
# Eval HuggingFace model
uv run eval.py --model tobil/qmd-query-expansion-1.7B
# Save eval results to file
uv run eval.py --model ./outputs/grpo -o eval_results.json
Quality Scoring
reward.py is the single source of truth for scoring:
# Self-test the reward function
uv run reward.py
See SCORING.md for the full rubric.
Deployment Rules
Never upload without eval. Every model push must include eval results.
Checklist
- Train SFT on all
data/*.jsonl→outputs/sft/ - Train GRPO on top of SFT →
outputs/grpo/ - Run eval on local model:
uv run eval.py --model ./outputs/grpo -o eval_results.json - Compare against current deployed model's eval
- If eval improves:
- Push to
tobil/qmd-query-expansion-1.7B - Include eval output in the model card / commit message
- Push to
- Convert to GGUF and update
tobil/qmd-query-expansion-1.7B-gguf - Update
src/llm.tsDEFAULT_GENERATE_MODEL if repo name changed
Key Files
finetune/
├── reward.py # Scoring function (single source of truth)
├── train.py # Unified SFT + GRPO training
├── eval.py # Generate and score expansions
├── convert_gguf.py # GGUF conversion
├── SCORING.md # Detailed scoring rubric
├── CLAUDE.md # This file
├── data/ # All training JSONL files
├── outputs/ # Local training outputs (gitignored)
├── dataset/ # Data generation scripts
├── jobs/ # Self-contained HuggingFace Jobs scripts
├── configs/ # Training configs (sft.yaml, grpo.yaml)
└── evals/ # Test queries and results