History

Tobi Lutke 0353994e7d Fix GRPO training script for TRL API compatibility - Use max_completion_length instead of max_new_tokens - Use processing_class instead of tokenizer - Use args instead of config for GRPOTrainer - Add __name__ attribute to reward function class - Accept **kwargs in reward function for extra TRL args - Add new LoRA adapter after merging SFT weights Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>		2026-01-23 22:25:09 -05:00
..
data	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
.gitignore	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
evaluate_baseline.py	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
evaluate_model.py	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
evaluation_0.6B_v2.json	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
evaluation_0.6B.json	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
evaluation_1.7B_v2.json	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
evaluation_1.7B.json	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
evaluation_baseline_0.6B.json	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
export_gguf.py	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
generate_data_offline.py	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
generate_data.py	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
prepare_data.py	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
README.md	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
train_0.6B.py	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
train_1.7B_v2.py	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
train_1.7B.py	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00
train_grpo.py	Fix GRPO training script for TRL API compatibility	2026-01-23 22:25:09 -05:00
train_hf_job.py	Add query expansion model finetuning infrastructure	2026-01-23 19:47:06 -05:00

README.md

QMD Query Expansion Model Finetuning

Finetune small Qwen models for QMD's query expansion task.

Goal

Train models that convert user queries into retrieval-optimized outputs:

Input: "how to configure authentication"

Output:
lex: authentication setup
lex: auth configuration
vec: how to set up user authentication in the application
hyde: To configure authentication, set the AUTH_SECRET environment variable and enable the auth middleware in your application config.

Output Format

Type	Purpose	Count
`lex`	BM25 keyword variations	1-3
`vec`	Semantic reformulations	1-3
`hyde`	Hypothetical document passage	0-1

Trained Models

Model	HuggingFace	Format Compliance	Status
Qwen3-0.6B (finetuned)	tobil/qmd-query-expansion-0.6B	95%	Recommended
Qwen3-1.7B (finetuned)	tobil/qmd-query-expansion-1.7B	0%	Training issues
Qwen3-0.6B (baseline)	-	0%	Untrained

Training Dataset

Dataset: tobil/qmd-query-expansion-train
Source: Transformed from s-emanuilov/query-expansion (CC BY 4.0)
Size: 5,157 examples (train: 4,641, eval: 516)
Format: Chat messages with user query and assistant response in lex/vec/hyde format

Directory Structure

finetune/
├── README.md                 # This file
├── DATASETS.md               # Dataset research findings
├── TRAINING_JOBS.md          # HuggingFace Jobs tracking
├── generate_data_offline.py  # Transform s-emanuilov dataset to QMD format
├── prepare_data.py           # Upload to HuggingFace Hub
├── train_0.6B.py             # Training script for 0.6B model
├── train_1.7B.py             # Training script for 1.7B model
├── train_grpo.py             # GRPO RL training (optional)
├── evaluate_model.py         # Evaluate finetuned models
├── evaluate_baseline.py      # Evaluate base models
├── data/
│   ├── qmd_expansion.jsonl   # Generated training data
│   └── train/                # Prepared chat format
└── evaluation_*.json         # Evaluation results

Quick Start

1. Generate Training Data

# Transform s-emanuilov dataset to QMD format (no API needed)
uv run generate_data_offline.py

2. Prepare and Upload Dataset

# Convert to chat format and upload to HuggingFace Hub
uv run prepare_data.py

3. Train on HuggingFace Jobs

# Train Qwen3-0.6B (recommended)
hf jobs uv run --flavor a10g-large --timeout 3h --secrets HF_TOKEN \
  "https://huggingface.co/tobil/qmd-training-scripts/resolve/main/train_0.6B.py"

4. Evaluate

# Evaluate finetuned model
uv run evaluate_model.py --model tobil/qmd-query-expansion-0.6B --base-model Qwen/Qwen3-0.6B

# Compare to baseline
uv run evaluate_baseline.py --model Qwen/Qwen3-0.6B --num-queries 10

5. Export to GGUF

# Convert to GGUF for node-llama-cpp (TODO)
uv run export_gguf.py --model tobil/qmd-query-expansion-0.6B --quantization Q8_0

Training Configuration

Parameter	Value
Method	LoRA (rank 16, alpha 32)
Learning Rate	2e-4
Epochs	3
Batch Size	4 (with 4x gradient accumulation)
Max Seq Length	512
Target Modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

Prompt Format

The models are trained on this simple prompt format:

Expand this search query:

{query}

The model responds with lex/vec/hyde lines directly.

Evaluation Results

0.6B Finetuned Model (95% format compliance)

Sample outputs:

Query	Output
`how to configure authentication`	lex: steps for setting up authentication vec: steps for setting up authentication in cloud services hyde: The process of configure authentication...
`kubernetes vs docker swarm`	lex: kubernetes and docker swarm vec: kubernetes vs docker swarm hyde: Kubernetes vs docker swarm is an important concept...
`cors error fix`	lex: how to fix cors vec: how to fix cors issues in web apps hyde: The topic of cors error fix guide...

Baseline Model (0% format compliance)

The untrained model generates random prose, code blocks, or repetitive text with no understanding of the lex/vec/hyde format.

Future Work

Export to GGUF for local inference
Integrate into QMD as default query expansion model
GRPO training for improved diversity (optional)
Fix 1.7B training issues