QMD - Quick Markdown Search
A CLI tool for searching markdown knowledge bases using hybrid retrieval that combines BM25 full-text search, vector semantic search, and LLM re-ranking.
Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│                             QMD Search Pipeline                             │
└─────────────────────────────────────────────────────────────────────────────┘
                              ┌─────────────────┐
                              │   User Query    │
                              └────────┬────────┘
                                       │
                        ┌──────────────┴──────────────┐
                        ▼                             ▼
                ┌────────────────┐            ┌────────────────┐
                │ Query Expansion│            │  Direct Query  │
                │  (qwen3:0.6b)  │            │  (×2 weight)   │
                └───────┬────────┘            └───────┬────────┘
                        │                             │
                        │ 1 alternative query         │
                        └──────────────┬──────────────┘
                                       │
                    ┌──────────────────┼──────────────────┐
                    ▼                  ▼                  ▼
            ┌────────────────┐ ┌────────────────┐ ┌────────────────┐
            │   FTS Search   │ │   FTS Search   │ │   FTS Search   │
            │     (BM25)     │ │     (BM25)     │ │     (BM25)     │
            └───────┬────────┘ └───────┬────────┘ └───────┬────────┘
                    │                  │                  │
            ┌───────┴────────┐ ┌───────┴────────┐ ┌───────┴────────┐
            │ Vector Search  │ │ Vector Search  │ │ Vector Search  │
            │(embeddinggemma)│ │(embeddinggemma)│ │(embeddinggemma)│
            └───────┬────────┘ └───────┬────────┘ └───────┬────────┘
                    │                  │                  │
                    └──────────────────┼──────────────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │  RRF Fusion + Bonus   │
                           │ (Top-rank preserved)  │
                           │      Top 30 Kept      │
                           └───────────┬───────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │    LLM Re-ranking     │
                           │   (qwen3-reranker)    │
                           │   Yes/No + logprobs   │
                           └───────────┬───────────┘
                                       │
                                       ▼
                           ┌───────────────────────┐
                           │ Position-Aware Blend  │
                           │   (RRF + Reranker)    │
                           └───────────────────────┘
Score Normalization & Fusion
Search Backends
| Backend | Raw Score | Conversion | Range |
|---|---|---|---|
| FTS (BM25) | SQLite FTS5 BM25 | Math.abs(score) | 0 to ~25+ |
| Vector | Cosine distance | 1 / (1 + distance) | 0.0 to 1.0 |
| Reranker | LLM 0-10 rating | score / 10 | 0.0 to 1.0 |
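The conversions in the table can be written out as small helpers. This is a minimal sketch: the function names are hypothetical and not taken from qmd.ts, and only the formulas come from the table.

```typescript
// Hypothetical helper names; only the formulas come from the table above.

// SQLite FTS5 reports BM25 so that better matches are numerically lower
// (negative), so the absolute value gives a "higher is better" score.
function normalizeBm25(rawScore: number): number {
  return Math.abs(rawScore);
}

// Maps a cosine distance (0 = identical direction) into the range (0, 1].
function normalizeVectorDistance(distance: number): number {
  return 1 / (1 + distance);
}

// Maps a 0-10 rating into the range 0.0 to 1.0.
function normalizeRerankerRating(rating: number): number {
  return rating / 10;
}
```

Note that the BM25 value is unbounded, so it is not directly comparable with the other two without a further step such as rank-based fusion.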
Fusion Strategy
The query command uses Reciprocal Rank Fusion (RRF) with position-aware blending:
- Query Expansion: Original query (×2 for weighting) + 1 LLM variation
- Parallel Retrieval: Each query searches both FTS and vector indexes
- RRF Fusion: Combine all result lists using score = Σ(1/(k+rank+1)) where k=60
- Top-Rank Bonus: Documents ranking #1 in any list get +0.05, #2-3 get +0.02
- Top-K Selection: Take top 30 candidates for reranking
- Re-ranking: LLM scores each document (yes/no with logprobs confidence)
- Position-Aware Blending:
  - RRF rank 1-3: 75% retrieval, 25% reranker (preserves exact matches)
  - RRF rank 4-10: 60% retrieval, 40% reranker
  - RRF rank 11+: 40% retrieval, 60% reranker (trust reranker more)
Why this approach: Pure RRF can dilute exact matches when expanded queries don't match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from destroying high-confidence retrieval results.
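The fusion and blending steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in qmd.ts: all names are hypothetical, the ×2 weighting is modeled as a per-list multiplier, the bonus is taken from a document's best rank across lists, and the retrieval score passed to the blend is assumed to be already scaled to 0.0-1.0. Only the constants (k=60, the bonuses, the blend weights) come from the list.

```typescript
// Sketch only: names are hypothetical, constants come from the list above.
interface RankedList {
  docIds: string[]; // document ids, best match first
  weight: number; // assumed: 2 for the original query's lists, 1 for the variation's
}

const RRF_K = 60;

function rrfFuse(lists: RankedList[]): Map<string, number> {
  const scores = new Map<string, number>();
  const bestRank = new Map<string, number>(); // best 0-based rank seen in any list

  for (const list of lists) {
    list.docIds.forEach((id, rank) => {
      // rank is 0-based, so the top document contributes weight / (k + 1).
      const contribution = list.weight / (RRF_K + rank + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
      bestRank.set(id, Math.min(bestRank.get(id) ?? Infinity, rank));
    });
  }

  // Top-rank bonus: +0.05 for a #1 placement, +0.02 for #2-3 (assumed: best rank only).
  bestRank.forEach((rank, id) => {
    const bonus = rank === 0 ? 0.05 : rank <= 2 ? 0.02 : 0;
    scores.set(id, (scores.get(id) ?? 0) + bonus);
  });
  return scores;
}

// rrfRank is the 1-based position of the document in the fused list.
function blendWeights(rrfRank: number): { retrieval: number; reranker: number } {
  if (rrfRank <= 3) return { retrieval: 0.75, reranker: 0.25 };
  if (rrfRank <= 10) return { retrieval: 0.6, reranker: 0.4 };
  return { retrieval: 0.4, reranker: 0.6 };
}

function blendScore(rrfRank: number, retrievalScore: number, rerankerScore: number): number {
  const w = blendWeights(rrfRank);
  return w.retrieval * retrievalScore + w.reranker * rerankerScore;
}
```

With these weights, a document in the top three of the fused list keeps most of its retrieval score even if the reranker disagrees, while documents further down are ordered mainly by the reranker.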
Score Interpretation
| Score | Meaning |
|---|---|
| 0.8 - 1.0 | Highly relevant |
| 0.5 - 0.8 | Moderately relevant |
| 0.2 - 0.5 | Somewhat relevant |
| 0.0 - 0.2 | Low relevance |
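For scripting against the scores, the table can be expressed as a lookup. This is a minimal sketch with a hypothetical function name; the table does not say which band a boundary value belongs to, so the lower bound of each band is assumed to be inclusive.

```typescript
type Relevance =
  | "Highly relevant"
  | "Moderately relevant"
  | "Somewhat relevant"
  | "Low relevance";

// Assumption: each band includes its lower bound (e.g. 0.8 is "Highly relevant").
function interpretScore(score: number): Relevance {
  if (score >= 0.8) return "Highly relevant";
  if (score >= 0.5) return "Moderately relevant";
  if (score >= 0.2) return "Somewhat relevant";
  return "Low relevance";
}
```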
Requirements
System Requirements
- Bun >= 1.0.0
- macOS: Homebrew SQLite (for extension support)
  brew install sqlite
- Ollama running locally (default: http://localhost:11434)
Ollama Models
QMD uses three models (auto-pulled if missing):
| Model | Purpose | Size |
|---|---|---|
| embeddinggemma | Vector embeddings | ~1.6GB |
| ExpedientFalcon/qwen3-reranker:0.6b-q8_0 | Re-ranking (trained) | ~640MB |
| qwen3:0.6b | Query expansion | ~400MB |
# Pre-pull models (optional)
ollama pull embeddinggemma
ollama pull ExpedientFalcon/qwen3-reranker:0.6b-q8_0
ollama pull qwen3:0.6b
Installation
bun install
Usage
Index Markdown Files
# Index all .md files in current directory
qmd index
# Index with custom glob pattern
qmd index "**/*.md"
# Index specific directory
qmd index "docs/**/*.md"
Generate Vector Embeddings
# Embed all indexed documents
qmd embed
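Per document, embedding amounts to formatting the text with the EmbeddingGemma prompt (see EmbeddingGemma Prompt Format) and posting it to Ollama's /api/embed endpoint (see Embedding Flow). The sketch below illustrates that round trip. The function names are hypothetical and the error handling is an assumption; the prompt strings, endpoint, and model name come from this README.

```typescript
// Hypothetical helper names; prompt formats are the ones listed in this README.
function formatQueryForEmbedding(query: string): string {
  return `task: search result | query: ${query}`;
}

function formatDocumentForEmbedding(title: string, content: string): string {
  return `title: ${title} | text: ${content}`;
}

// Sends one formatted string to Ollama and returns its embedding vector.
async function embedText(
  text: string,
  ollamaUrl: string = "http://localhost:11434",
  model: string = "embeddinggemma",
): Promise<number[]> {
  const res = await fetch(`${ollamaUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: text }),
  });
  if (!res.ok) {
    throw new Error(`Ollama embed request failed with status ${res.status}`);
  }
  // /api/embed returns one vector per input under "embeddings".
  const data = (await res.json()) as { embeddings: number[][] };
  return data.embeddings[0];
}
```

A call such as `embedText(formatDocumentForEmbedding("Guide", body))` would need a running Ollama instance with the model pulled.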
Search Commands
┌──────────────────────────────────────────────────────────────────┐
│                           Search Modes                           │
├──────────┬───────────────────────────────────────────────────────┤
│ search   │ BM25 full-text search only                            │
│ vsearch  │ Vector semantic search only                           │
│ query    │ Hybrid: FTS + Vector + Query Expansion + Re-ranking   │
└──────────┴───────────────────────────────────────────────────────┘
# Full-text search (fast, keyword-based)
qmd search "authentication flow"
# Vector search (semantic similarity)
qmd vsearch "how to login"
# Hybrid search with re-ranking (best quality)
qmd query "user authentication"
Options
-n <num>           # Number of results (default: 5)
--min-score <num>  # Minimum score threshold (default: 0)
--full             # Show full document content
-csv               # CSV output (for piping/scripting)
-md                # Output as markdown
-xml               # Output as XML
--index <name>     # Use named index
Output Format
Default output is colorized CLI format (respects NO_COLOR env):
93% docs/guide.md:42
│ This section covers the **craftsmanship** of building
│ quality software with attention to detail.
│ See also: engineering principles
67% notes/meeting.md:15
│ Discussion about code quality and craftsmanship
│ in the development process.
- Score: Color-coded (green >70%, yellow >40%, dim otherwise)
- Path: Shortened relative to current directory
- Line: Line number where match was found (omitted for vector-only results)
- Snippet: Context around match with query terms highlighted
Examples
# Get 10 results with minimum score 0.3
qmd query -n 10 --min-score 0.3 "API design patterns"
# Output as markdown for LLM context
qmd search -md --full "error handling"
# Use separate index for different knowledge base
qmd --index work search "quarterly reports"
Manage Collections
# List all indexed collections
qmd list
# Show database statistics
qmd stats
# Forget a collection
qmd forget
Data Storage
Index stored in: ~/.cache/qmd/index.sqlite
Schema
collections      -- Indexed directories and glob patterns
documents        -- Markdown content with metadata
documents_fts    -- FTS5 full-text index
content_vectors  -- Embedding cache (by content hash)
vectors_vec      -- sqlite-vec vector index
Environment Variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_URL | http://localhost:11434 | Ollama API endpoint |
| XDG_CACHE_HOME | ~/.cache | Cache directory location |
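As an illustration of how these variables and their defaults might resolve, here is a minimal sketch. The function and type names are hypothetical, and the environment and home directory are passed in as parameters so the logic can be exercised without touching the real process environment.

```typescript
type Env = Record<string, string | undefined>;

interface ResolvedConfig {
  ollamaUrl: string;
  indexPath: string;
}

// Hypothetical helper: applies the defaults from the table above and builds
// the index location documented under Data Storage.
function resolveConfig(env: Env, homeDir: string): ResolvedConfig {
  const ollamaUrl = env["OLLAMA_URL"] || "http://localhost:11434";
  const cacheHome = env["XDG_CACHE_HOME"] || `${homeDir}/.cache`;
  return { ollamaUrl, indexPath: `${cacheHome}/qmd/index.sqlite` };
}
```

In a Bun or Node script this could be called as `resolveConfig(process.env, os.homedir())`.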
How It Works
Indexing Flow
Markdown Files ──► Parse Title ──► Hash Content ──► Store in SQLite
                        │                                    │
                        └─► FTS5 Index ◄─────────────────────┘
Embedding Flow
Document ──► Format for EmbeddingGemma ──► Ollama API ──► Store Vector
             "title: X | text: Y"          /api/embed
Query Flow (Hybrid)
Query ──► Expand (1 variation) ──► FTS + Vector (per query)
                                              │
                                              ▼
                                 RRF fusion + top-rank bonus
                                              │
                                              ▼
                                      Top 30 candidates
                                              │
                                              ▼
                               LLM re-rank (yes/no + logprobs)
                                              │
                                              ▼
                                    Position-aware blend
                                              │
                                              ▼
                                    Final ranked results
Model Configuration
Models are configured as constants in qmd.ts:
const DEFAULT_EMBED_MODEL = "embeddinggemma";
const DEFAULT_RERANK_MODEL = "ExpedientFalcon/qwen3-reranker:0.6b-q8_0";
const DEFAULT_QUERY_MODEL = "qwen3:0.6b";
EmbeddingGemma Prompt Format
// For queries
"task: search result | query: {query}"
// For documents
"title: {title} | text: {content}"
Qwen3-Reranker
A dedicated reranker model trained on relevance classification:
System: Judge whether the Document meets the requirements based on the Query
and the Instruct provided. Note that the answer can only be "yes" or "no".
User: <Instruct>: Given a search query, determine if the document is relevant...
<Query>: {query}
<Document>: {doc}
- Uses logprobs: true to extract token probabilities
- Outputs yes/no with confidence score (0.0 - 1.0)
- num_predict: 1 - Only need the yes/no token
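One way to turn the single yes/no token and its log probability into a 0.0 - 1.0 score is sketched below. This is an assumption about the mapping, not the exact logic in qmd.ts: the function name is hypothetical, the caller is assumed to have already extracted the token and its logprob from the model response, and a "no" answer is scored as the complement of its probability.

```typescript
// Hypothetical helper. Assumes the first generated token and its log
// probability have already been pulled out of the model response.
function rerankConfidence(token: string, logprob: number): number {
  const probability = Math.exp(logprob); // probability of the emitted token
  const answer = token.trim().toLowerCase();
  if (answer === "yes") return probability;
  // Assumed: treat yes/no as the only two outcomes, so P(yes) is about 1 - P(no).
  if (answer === "no") return 1 - probability;
  // Any other token is treated as not relevant in this sketch.
  return 0;
}
```

Under this mapping a confident "yes" scores near 1.0, a confident "no" scores near 0.0, and an uncertain answer of either kind lands near 0.5.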
Qwen3 (Query Expansion)
- num_predict: 150 - For generating query variations
License
MIT