feat: compile to JS for npm, release system, full changelog

- Add tsc build step (tsconfig.build.json) so npm package ships compiled JS instead of raw TypeScript requiring tsx at runtime
- Update qmd wrapper and daemon spawn to use dist/qmd.js in production while keeping tsx for development
- Add self-installing pre-push hook validating v* tag pushes: package.json version match, changelog entry, CI status
- Add release.sh script that renames [Unreleased] to versioned entry, bumps package.json, commits, and tags
- Add extract-changelog.sh for cumulative GitHub release notes
- Update publish workflow with build step and GitHub release creation
- Flesh out CHANGELOG.md with full history from 0.1.0 through 1.0.0 in Keep-a-Changelog format with PR/contributor attributions
- Add release standards and changelog guidelines to CLAUDE.md
2026-02-16 08:42:05 -04:00
commit 09803a75b7
parent 77c6eba159
18 changed files with 725 additions and 121 deletions
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -33,6 +33,23 @@ jobs:
          node-version: 22
          registry-url: https://registry.npmjs.org

+      - run: npm run build
      - run: npm publish --provenance --access public
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
+
+      - name: Extract release notes
+        id: notes
+        run: |
+          VERSION="${GITHUB_REF_NAME#v}"
+          NOTES=$(./scripts/extract-changelog.sh "$VERSION")
+          # Write to file for gh release (avoids quoting issues)
+          echo "$NOTES" > /tmp/release-notes.md
+
+      - name: Create GitHub release
+        run: |
+          gh release create "$GITHUB_REF_NAME" \
+            --title "$GITHUB_REF_NAME" \
+            --notes-file /tmp/release-notes.md
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
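Review note: the `VERSION="${GITHUB_REF_NAME#v}"` line in the "Extract release notes" step relies on shell prefix removal. A minimal sketch of that expansion follows; `tag_to_version` is a hypothetical helper name used only for illustration and does not exist in the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the workflow's VERSION="${GITHUB_REF_NAME#v}" line.
tag_to_version() {
  local ref_name="$1"
  # ${var#pattern} removes the shortest matching prefix, here a single leading "v".
  printf '%s\n' "${ref_name#v}"
}

tag_to_version "v1.0.5"
tag_to_version "1.0.5"
```

A ref name without a leading `v` passes through unchanged, so both calls print the same version string.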
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 node_modules/
+dist/
 .npmrc
 *.sqlite
 .DS_Store
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,68 +1,275 @@
 # Changelog

-All notable changes to QMD will be documented in this file.
+## [Unreleased]

 ## [1.0.0] - 2026-02-15

-### Node.js Compatibility
+QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking
+through parallel GPU contexts. GPU auto-detection replaces the unreliable
+`gpu: "auto"` with explicit CUDA/Metal/Vulkan probing.

-QMD now runs on both **Node.js (>=22)** and **Bun**. Install with `npm install -g @tobilu/qmd` or `bun install -g @tobilu/qmd` — your choice. The `qmd` wrapper auto-detects Node.js via `tsx` and works out of the box with mise, asdf, nvm, and Homebrew installs.
+### Changes

-### Performance
-
-- **Parallel embedding & reranking** — multiple contexts split work across CPU cores (or VRAM on GPU), delivering up to **2.7x faster reranking** and significantly faster embedding on multi-core machines
-- **Flash attention** — ~20% less VRAM per reranking context, enabling more parallel contexts on GPU
-- **Right-sized contexts** — reranker context dropped from 40960 to 2048 tokens (17x less memory), since chunks are capped at ~900 tokens
-- **Adaptive parallelism** — automatically scales context count based on available VRAM (GPU) or CPU math cores
-- **CPU thread splitting** — each context runs on its own cores for true parallelism instead of contending on a single context
-
-### GPU Auto-Detection
-
-- Probes for CUDA, Metal, and Vulkan at startup — uses the best available backend
-- Falls back gracefully to CPU with a warning if GPU init fails
-- `qmd status` now shows device info (GPU type, VRAM usage)
-
-### Test Suite
-
-- Tests split into `src/*.test.ts` (unit), `src/models/*.test.ts` (model), and `src/integration/*.test.ts` (CLI/integration)
-- Vitest config for Node.js; bun test still works for Bun
-- New `eval-bm25` and `store.helpers.unit` test suites
+- Runtime: support Node.js (>=22) alongside Bun via a cross-runtime SQLite
+  abstraction layer (`src/db.ts`). `bun:sqlite` on Bun, `better-sqlite3` on
+  Node. The `qmd` wrapper auto-detects a suitable Node.js install via PATH,
+  then falls back to mise, asdf, nvm, and Homebrew locations.
+- Performance: parallel embedding & reranking via multiple LlamaContext
+  instances — up to 2.7x faster on multi-core machines.
+- Performance: flash attention for ~20% less VRAM per reranking context,
+  enabling more parallel contexts on GPU.
+- Performance: right-sized reranker context (40960 → 2048 tokens, 17x less
+  memory) since chunks are capped at ~900 tokens.
+- Performance: adaptive parallelism — context count computed from available
+  VRAM (GPU) or CPU math cores rather than hardcoded.
+- GPU: probe for CUDA, Metal, Vulkan explicitly at startup instead of
+  relying on node-llama-cpp's `gpu: "auto"`. `qmd status` shows device info.
+- Tests: reorganized into flat `test/` directory with vitest for Node.js and
+  bun test for Bun. New `eval-bm25` and `store.helpers.unit` suites.

 ### Fixes

-- Prevent VRAM waste from duplicate context creation during concurrent loads
-- Collection-aware FTS filtering for scoped keyword search
-
----
+- Prevent VRAM waste from duplicate context creation during concurrent
+  `embedBatch` calls — initialization lock now covers the full path.
+- Collection-aware FTS filtering so scoped keyword search actually restricts
+  results to the requested collection.

 ## [0.9.0] - 2026-02-15

-Initial public release.
+First published release on npm as `@tobilu/qmd`. MCP HTTP transport with
+daemon mode cuts warm query latency from ~16s to ~10s by keeping models
+loaded between requests.

-### Features
+### Changes

-- **Hybrid search pipeline** — BM25 full-text + vector similarity + LLM reranking with Reciprocal Rank Fusion
-- **Smart chunking** — scored markdown break points keep sections, paragraphs, and code blocks intact (~900 tokens/chunk, 15% overlap)
-- **Query expansion** — fine-tuned Qwen3 1.7B model generates search variations for better recall
-- **Cross-encoder reranking** — Qwen3-Reranker scores candidates with position-aware blending
-- **Vector embeddings** — EmbeddingGemma 300M via node-llama-cpp, all on-device
-- **MCP server** — stdio and HTTP transports for Claude Desktop, Claude Code, and any MCP client
-- **Collection management** — index multiple directories with glob patterns
-- **Context annotations** — add descriptions to collections and paths for richer search
-- **Document IDs** — 6-char content hash for stable references across re-indexes
-- **Multi-get** — retrieve multiple documents by glob pattern, comma list, or docids
-- **Multiple output formats** — JSON, CSV, Markdown, XML, files list
-- **Claude Code plugin** — inline status checks and MCP integration
+- MCP: HTTP transport with daemon lifecycle — `qmd mcp --http --daemon`
+  starts a background server, `qmd mcp stop` shuts it down. Models stay warm
+  in VRAM between queries. #149 (thanks @igrigorik)
+- Search: type-routed query expansion preserves lex/vec/hyde type info and
+  routes to the appropriate backend. Eliminates ~4 wasted backend calls per
+  query (10.0 → 6.0 calls, 1278ms → 549ms). #149 (thanks @igrigorik)
+- Search: unified pipeline — extracted `hybridQuery()` and
+  `vectorSearchQuery()` to `store.ts` so CLI and MCP share identical logic.
+  Fixes a class of bugs where results differed between the two. #149 (thanks
+  @igrigorik)
+- MCP: dynamic instructions generated at startup from actual index state —
+  LLMs see collection names, doc counts, and content descriptions. #149
+  (thanks @igrigorik)
+- MCP: tool renames (vsearch → vector_search, query → deep_search) with
+  rewritten descriptions for better tool selection. #149 (thanks @igrigorik)
+- Integration: Claude Code plugin with inline status checks and MCP
+  integration. #99 (thanks @galligan)

 ### Fixes

-- Handle dense content (code) that tokenizes beyond expected chunk size
-- Proper cleanup of Metal GPU resources
-- SQLite-vec readiness verification after extension load
-- Reactivate deactivated documents on re-index
-- BM25 score normalization with Math.abs
-- Bun UTF-8 path corruption workaround
+- BM25 score normalization — formula was inverted (`1/(1+|x|)` instead of
+  `|x|/(1+|x|)`), so strong matches scored *lowest*. Broke `--min-score`
+  filtering and made the "strong signal" short-circuit dead code. #76 (thanks
+  @dgilperez)
+- Normalize Unicode paths to NFC for macOS compatibility. #82 (thanks
+  @c-stoeckl)
+- Handle dense content (code) that tokenizes beyond expected chunk size.
+- Proper cleanup of Metal GPU resources on process exit.
+- SQLite-vec readiness verification after extension load.
+- Reactivate deactivated documents on re-index instead of creating duplicates.
+- Bun UTF-8 path corruption workaround for non-ASCII filenames.
+- Disable following symlinks in glob.scan to avoid infinite loops.
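Review note on the BM25 normalization fix in the list above: a small sketch of why the inverted formula ranked strong matches lowest. It assumes raw scores are negative with larger magnitude meaning a stronger match (as SQLite FTS5's `bm25()` reports them). The function names are hypothetical, and the project's real code is TypeScript, so this only illustrates the arithmetic.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from store.ts.

# Corrected form: |x| / (1 + |x|) grows toward 1 as the match gets stronger.
normalize_bm25() {
  awk -v x="$1" 'BEGIN { a = (x < 0) ? -x : x; printf "%.3f\n", a / (1 + a) }'
}

# Inverted form described in the fix: 1 / (1 + |x|) shrinks as the match gets stronger.
normalize_bm25_inverted() {
  awk -v x="$1" 'BEGIN { a = (x < 0) ? -x : x; printf "%.3f\n", 1 / (1 + a) }'
}

for raw in -10 -0.5; do
  echo "raw=$raw fixed=$(normalize_bm25 "$raw") inverted=$(normalize_bm25_inverted "$raw")"
done
```

With the corrected formula a raw score of -10 maps to 0.909 and -0.5 maps to 0.333. The inverted formula gives 0.091 and 0.667, so a `--min-score` threshold would have discarded the best matches first.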

+## [0.8.0] - 2026-01-28
+
+Fine-tuned query expansion model trained with GRPO replaces the stock Qwen3
+0.6B. The training pipeline scores expansions on named entity preservation,
+format compliance, and diversity — producing noticeably better lexical
+variations and HyDE documents.
+
+### Changes
+
+- LLM: deploy GRPO-trained (Group Relative Policy Optimization) query
+  expansion model, hosted on HuggingFace and auto-downloaded on first use.
+  Better preservation of proper nouns and technical terms in expansions.
+- LLM: `/only:lex` mode for single-type expansions — useful when you know
+  which search backend will help.
+- LLM: HyDE output moved to first position so vector search can start
+  embedding while other expansions generate.
+- LLM: session lifecycle management via `withLLMSession()` pattern — ensures
+  cleanup even on failure, similar to database transactions.
+- Integration: org-mode title extraction support. #50 (thanks @sh54)
+- Integration: SQLite extension loading in Nix devshell. #48 (thanks @sh54)
+- Integration: AI agent discovery via skills.sh. #64 (thanks @Algiras)
+
+### Fixes
+
+- Use sequential embedding on CPU-only systems — parallel contexts caused a
+  race condition where contexts competed for CPU cores, making things slower.
+  #54 (thanks @freeman-jiang)
+- Fix `collectionName` column in vector search SQL (was still using old
+  `collectionId` from before YAML migration). #61 (thanks @jdvmi00)
+- Fix Qwen3 sampling params to prevent repetition loops — stock
+  temperature/top-p caused occasional infinite repeat patterns.
+- Add `--index` option to CLI argument parser (was documented but not wired
+  up). #84 (thanks @Tritlo)
+- Fix DisposedError during slow batch embedding. #41 (thanks @wuhup)
+
+## [0.7.0] - 2026-01-09
+
+First community contributions. The project gained external contributors,
+surfacing bugs that only appear in diverse environments — Homebrew sqlite-vec
+paths, case-sensitive model filenames, and sqlite-vec JOIN incompatibilities.
+
+### Changes
+
+- Indexing: native `realpathSync()` replaces `readlink -f` subprocess spawn
+  per file. On a 5000-file collection this eliminates 5000 shell spawns,
+  ~15% faster. #8 (thanks @burke)
+- Indexing: single-pass tokenization — chunking algorithm tokenized each
+  document twice (count then split); now tokenizes once and reuses. #9
+  (thanks @burke)
+
+### Fixes
+
+- Fix `vsearch` and `query` hanging — sqlite-vec's virtual table doesn't
+  support the JOIN pattern used; rewrote to subquery. #23 (thanks @mbrendan)
+- Fix MCP server exiting immediately after startup — process had no active
+  handles keeping the event loop alive. #29 (thanks @mostlydev)
+- Fix collection filter SQL to properly restrict vector search results.
+- Support non-ASCII filenames in collection filter.
+- Skip empty files during indexing instead of crashing on zero-length content.
+- Fix case sensitivity in Qwen3 model filename resolution. #15 (thanks
+  @gavrix)
+- Fix sqlite-vec loading on macOS with Homebrew (`BREW_PREFIX` detection).
+  #42 (thanks @komsit37)
+- Fix Nix flake to use correct `src/qmd.ts` path. #7 (thanks @burke)
+- Fix docid lookup with quotes support in get command. #36 (thanks
+  @JoshuaLelon)
+- Fix query expansion model size in documentation. #38 (thanks @odysseus0)
+
+## [0.6.0] - 2025-12-28
+
+Replaced Ollama HTTP API with node-llama-cpp for all LLM operations. Ollama
+adds convenience but also a running server dependency. node-llama-cpp loads
+GGUF models directly in-process — zero external dependencies. Models
+auto-download from HuggingFace on first use.
+
+### Changes
+
+- LLM: structured query expansion via JSON schema grammar constraints.
+  Model produces typed expansions — **lexical** (BM25 keywords), **vector**
+  (semantic rephrasings), **HyDE** (hypothetical document excerpts) — so each
+  routes to the right backend instead of sending everything everywhere.
+- LLM: lazy model loading with 2-minute inactivity auto-unload. Keeps memory
+  low when idle while avoiding ~3s model load on every query.
+- Search: conditional query expansion — when BM25 returns strong results, the
+  expensive LLM expansion is skipped entirely.
+- Search: multi-chunk reranking — documents with multiple relevant chunks
+  scored by aggregating across all chunks rather than best single chunk.
+- Search: cosine distance for vector search (was L2).
+- Search: embeddinggemma nomic-style prompt formatting.
+- Testing: evaluation harness with synthetic test documents and Hit@K metrics
+  for BM25, vector, and hybrid RRF.
+
+## [0.5.0] - 2025-12-13
+
+Collections and contexts moved from SQLite tables to YAML at
+`~/.config/qmd/index.yml`. SQLite was overkill for config — you can't share
+it, and it's opaque. YAML is human-readable and version-controllable. The
+migration was extensive (35+ commits) because every part of the system that
+touched collections or contexts had to be updated.
+
+### Changes
+
+- Config: YAML-based collections and contexts replace SQLite tables.
+  `collections` and `path_contexts` tables dropped from schema. Collections
+  support an optional `update:` command (e.g., `git pull`) before re-index.
+- CLI: `qmd collection add/list/remove/rename` commands with `--name` and
+  `--mask` glob pattern support.
+- CLI: `qmd ls` virtual file tree — list collections, files in a collection,
+  or files under a path prefix.
+- CLI: `qmd context add/list/check/rm` with hierarchical context inheritance.
+  A query to `qmd://notes/2024/jan/` inherits context from `notes/`,
+  `notes/2024/`, and `notes/2024/jan/`.
+- CLI: `qmd context add / "text"` for global context across all collections.
+- CLI: `qmd context check` audit command to find paths without context.
+- Paths: `qmd://` virtual URI scheme for portable document references.
+  `qmd://notes/ideas.md` works regardless of where the collection lives on
+  disk. Works in `get`, `multi-get`, `ls`, and context commands.
+- CLI: document IDs (docid) — first 6 chars of content hash for stable
+  references. Shown as `#abc123` in search results, usable with `get` and
+  `multi-get`.
+- CLI: `--line-numbers` flag for get command output.
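Review note on the hierarchical context inheritance described in the list above: a minimal sketch of how the ancestor prefixes of a `qmd://` path could be enumerated. `context_prefixes` is a hypothetical name; the real lookup is implemented in TypeScript and may differ, for example in how the global `/` context is handled.

```bash
#!/usr/bin/env bash
# Sketch only: prints every path prefix whose context a qmd:// URI would inherit.
context_prefixes() {
  local uri="$1" path prefix="" part
  local -a parts
  path="${uri#qmd://}"   # drop the scheme
  path="${path%/}"       # drop one trailing slash, if present
  IFS='/' read -r -a parts <<< "$path"
  for part in "${parts[@]}"; do
    prefix+="$part/"
    printf '%s\n' "$prefix"
  done
}

context_prefixes "qmd://notes/2024/jan/"
```

For the example in the changelog this prints `notes/`, `notes/2024/`, and `notes/2024/jan/`, one per line, from the most general prefix to the most specific.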
+
+## [0.4.0] - 2025-12-10
+
+MCP server for AI agent integration. Without it, agents had to shell out to
+`qmd search` and parse CLI output. The monolithic `qmd.ts` (1840 lines) was
+split into focused modules with the project's first test suite (215 tests).
+
+### Changes
+
+- MCP: stdio server with tools for search, vector search, hybrid query,
+  document retrieval, and status. Runs over stdio transport for Claude
+  Desktop and MCP clients.
+- MCP: spec-compliant with June 2025 MCP specification — removed non-spec
+  `mimeType`, added `isError: true` to errors, `structuredContent` for
+  machine-readable results, proper URI encoding.
+- MCP: simplified tool naming (`qmd_search` → `search`) since MCP already
+  namespaces by server.
+- Architecture: extract `store.ts` (1221 LOC), `llm.ts` (539 LOC),
+  `formatter.ts` (359 LOC), `mcp.ts` (503 LOC) from monolithic `qmd.ts`.
+- Testing: 215 tests (store: 96, llm: 60, mcp: 59) with mocked Ollama for
+  fast, deterministic runs. Before this: zero tests.
+
+## [0.3.0] - 2025-12-08
+
+Document chunking for vector search. A 5000-word document about many topics
+gets a single embedding that averages everything together, matching poorly for
+specific queries. Chunking produces one embedding per ~900-token section with
+focused semantic signal.
+
+### Changes
+
+- Search: markdown-aware chunking — prefers heading boundaries, then paragraph
+  breaks, then sentence boundaries. 15% overlap between chunks ensures
+  cross-boundary queries still match.
+- Search: multi-chunk scoring bonus (+0.02 per additional chunk, capped at
+  +0.1 for 5+ chunks). Documents relevant in multiple sections rank higher.
+- CLI: display paths show collection-relative paths and extracted titles
+  (from H1 headings or YAML frontmatter) instead of raw filesystem paths.
+- CLI: `--all` flag returns all matches (use with `--min-score` to filter).
+- CLI: byte-based progress bar with ETA for `embed` command.
+- CLI: human-readable time formatting ("15m 4s" instead of "904.2s").
+- CLI: documents >64KB truncated with warning during embedding.
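Review note on the multi-chunk scoring bonus in the list above: a sketch of the arithmetic, assuming "additional chunk" means every matching chunk after the first, so the bonus is `min(0.02 * (n - 1), 0.1)`. The changelog wording leaves the exact cap point open, and `chunk_bonus` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: bonus for a document with n matching chunks, under the assumption above.
chunk_bonus() {
  awk -v n="$1" 'BEGIN {
    extra = (n > 1) ? n - 1 : 0      # chunks beyond the first
    bonus = 0.02 * extra
    if (bonus > 0.1) bonus = 0.1     # cap stated in the changelog
    printf "%.2f\n", bonus
  }'
}

for n in 1 3 10; do
  echo "chunks=$n bonus=$(chunk_bonus "$n")"
done
```

Under this reading a single-chunk document gets no bonus, three chunks earn 0.04, and anything from six chunks upward is capped at 0.10.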
+
+## [0.2.0] - 2025-12-08
+
+### Changes
+
+- CLI: `--json`, `--csv`, `--files`, `--md`, `--xml` output format flags.
+  `--json` for programmatic access, `--files` for piping, `--md`/`--xml` for
+  LLM consumption, `--csv` for spreadsheets.
+- CLI: `qmd status` shows index health — document count, size, embedding
+  coverage, time since last update.
+- Search: weighted RRF — original query gets 2x weight relative to expanded
+  queries since the user's actual words are a more reliable signal.
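Review note on the weighted RRF entry above: a sketch of Reciprocal Rank Fusion where the original query's list counts double. Each input line is `weight doc1 doc2 ...` in rank order. The constant `k = 60` is the value commonly used for RRF and is an assumption here, as is the function name `weighted_rrf`; the changelog states neither.

```bash
#!/usr/bin/env bash
# Sketch only: score(doc) = sum over lists of weight / (k + rank), rank starting at 1.
weighted_rrf() {
  awk -v k=60 '
    {
      w = $1
      for (i = 2; i <= NF; i++) score[$i] += w / (k + i - 1)
    }
    END { for (d in score) printf "%.6f %s\n", score[d], d }
  ' | sort -rn
}

# First list is the original query (weight 2); the others are expansions (weight 1).
weighted_rrf <<'EOF'
2 a.md b.md c.md
1 c.md a.md
1 b.md d.md
EOF
```

In this example `a.md` ranks first because its top position in the double-weighted original-query list outweighs the expansions, which is the behaviour the changelog entry describes.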
+
+## [0.1.0] - 2025-12-07
+
+Initial implementation. Built in a single day for searching personal markdown
+notes, journals, and meeting transcripts.
+
+### Changes
+
+- Search: SQLite FTS5 with BM25 ranking. Chose SQLite over Elasticsearch
+  because QMD is a personal tool — single binary, no server dependencies.
+- Search: sqlite-vec for vector similarity. Same rationale: in-process, no
+  external vector database.
+- Search: Reciprocal Rank Fusion to combine BM25 and vector results. RRF is
+  parameter-free and handles missing signals gracefully.
+- LLM: Ollama for embeddings, reranking, and query expansion. Later replaced
+  with node-llama-cpp in 0.6.0.
+- CLI: `qmd add`, `qmd embed`, `qmd search`, `qmd vsearch`, `qmd query`,
+  `qmd get`. ~1800 lines of TypeScript in a single `qmd.ts` file.
+
+[Unreleased]: https://github.com/tobi/qmd/compare/v1.0.0...HEAD
 [1.0.0]: https://github.com/tobi/qmd/releases/tag/v1.0.0
-[0.9.0]: https://github.com/tobi/qmd/releases/tag/v0.9.0
+[0.9.0]: https://github.com/tobi/qmd/compare/v0.8.0...v0.9.0

--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -149,4 +149,17 @@ bun test --preload ./src/test-preload.ts test/
 ## Do NOT compile

 - Never run `bun build --compile` - it overwrites the shell wrapper and breaks sqlite-vec
-- The `qmd` file is a shell script that runs `bun src/qmd.ts` - do not replace it
+- The `qmd` file is a shell script that runs compiled JS from `dist/` - do not replace it
+- `npm run build` compiles TypeScript to `dist/` via `tsc -p tsconfig.build.json`
+
+## Releasing
+
+Use `/release <version>` to cut a release. Full changelog standards,
+release workflow, and git hook setup are documented in the
+[release skill](skills/release/SKILL.md).
+
+Key points:
+- Add changelog entries under `## [Unreleased]` **as you make changes**
+- The release script renames `[Unreleased]` → `[X.Y.Z] - date` at release time
+- Credit external PRs with `#NNN (thanks @username)`
+- GitHub releases roll up the full minor series (e.g. 1.2.0 through 1.2.3)
--- a/README.md
+++ b/README.md
@@ -6,6 +6,8 @@ QMD combines BM25 full-text search, vector semantic search, and LLM re-ranking

 ![QMD Architecture](assets/qmd-architecture.png)

+You can read more about QMD's progress in the [CHANGELOG](CHANGELOG.md).
+
 ## Quick Start

 ```sh
@@ -23,7 +25,7 @@ qmd collection add ~/notes --name notes
 qmd collection add ~/Documents/meetings --name meetings
 qmd collection add ~/work/docs --name docs

-# Add context to help with search results
+# Add context to help with search results. Each piece of context is returned whenever a matching sub-document is returned. This works as a tree. This is the key feature of QMD, as it allows LLMs to make much better contextual choices when selecting documents. Don't sleep on it!
 qmd context add qmd://notes "Personal notes and ideas"
 qmd context add qmd://meetings "Meeting transcripts and notes"
 qmd context add qmd://docs "Work documentation"
--- a/package.json
+++ b/package.json
@@ -7,14 +7,14 @@
    "qmd": "qmd"
  },
  "files": [
-    "src/**/*.ts",
-    "!src/**/*.test.ts",
-    "!src/test-preload.ts",
+    "dist/",
    "qmd",
    "LICENSE",
    "CHANGELOG.md"
  ],
  "scripts": {
+    "prepare": "[ -d .git ] && ./scripts/install-hooks.sh || true",
+    "build": "tsc -p tsconfig.build.json",
    "test": "vitest run --reporter=verbose test/",
    "qmd": "tsx src/qmd.ts",
    "index": "tsx src/qmd.ts index",
--- a/qmd
+++ b/qmd
@@ -43,4 +43,4 @@ while [[ -L "$SOURCE" ]]; do
 done
 SCRIPT_DIR="$(cd -P "$(dirname "$SOURCE")" && pwd)"

-exec "$NODE" --import tsx "$SCRIPT_DIR/src/qmd.ts" "$@"
+exec "$NODE" "$SCRIPT_DIR/dist/qmd.js" "$@"
--- a/scripts/extract-changelog.sh
+++ b/scripts/extract-changelog.sh
@@ -0,0 +1,78 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Extract cumulative release notes from CHANGELOG.md.
+#
+# For a given version (e.g. 1.0.5), extracts all entries from the current
+# minor series back to x.x.0 (e.g. 1.0.0 through 1.0.5). This means each
+# GitHub release restates the full arc of changes for the minor series.
+#
+# The [Unreleased] section is included — it contains the content that will
+# become [X.Y.Z] when the release script runs. If the version is already
+# released, [Unreleased] may be empty and is omitted.
+#
+# Fails if neither [Unreleased] nor [X.Y.Z] has content in the changelog.
+#
+# Usage: scripts/extract-changelog.sh <version>
+# Example: scripts/extract-changelog.sh 1.0.5
+#   -> extracts [Unreleased] + [1.0.5], [1.0.4], ..., [1.0.0]
+
+VERSION="${1:?Usage: extract-changelog.sh <version>}"
+
+# Parse major.minor.patch from version
+IFS='.' read -r MAJOR MINOR PATCH <<< "$VERSION"
+
+if [[ ! -f CHANGELOG.md ]]; then
+  echo "CHANGELOG.md not found" >&2
+  exit 1
+fi
+
+# Extract [Unreleased] section and all [X.Y.Z] sections matching our minor series.
+OUTPUT=""
+CAPTURING=false
+UNRELEASED_CONTENT=""
+IN_UNRELEASED=false
+
+while IFS= read -r line; do
+  if [[ "$line" =~ ^##\ \[Unreleased\] ]]; then
+    CAPTURING=true
+    IN_UNRELEASED=true
+  elif [[ "$line" =~ ^##\ \[([0-9]+\.[0-9]+\.[0-9]+)\] ]]; then
+    IN_UNRELEASED=false
+    ENTRY_VERSION="${BASH_REMATCH[1]}"
+    IFS='.' read -r E_MAJOR E_MINOR E_PATCH <<< "$ENTRY_VERSION"
+    if [[ "$E_MAJOR" == "$MAJOR" && "$E_MINOR" == "$MINOR" ]]; then
+      CAPTURING=true
+      OUTPUT+="$line"$'\n'
+    else
+      CAPTURING=false
+    fi
+  elif [[ "$line" =~ ^##\  ]]; then
+    IN_UNRELEASED=false
+    CAPTURING=false
+  elif $CAPTURING; then
+    if $IN_UNRELEASED; then
+      UNRELEASED_CONTENT+="$line"$'\n'
+    else
+      OUTPUT+="$line"$'\n'
+    fi
+  fi
+done < CHANGELOG.md
+
+# Only include [Unreleased] if it has non-blank content
+TRIMMED=$(echo "$UNRELEASED_CONTENT" | sed '/^[[:space:]]*$/d')
+if [[ -n "$TRIMMED" ]]; then
+  OUTPUT="## [Unreleased]"$'\n'"$UNRELEASED_CONTENT$OUTPUT"
+fi
+
+# Fail if we got nothing
+TRIMMED_OUTPUT=$(echo "$OUTPUT" | sed '/^[[:space:]]*$/d')
+if [[ -z "$TRIMMED_OUTPUT" ]]; then
+  echo "error: no changelog content found for $VERSION" >&2
+  echo "Expected either:" >&2
+  echo "  ## [Unreleased]  (with content)" >&2
+  echo "  ## [$VERSION] - YYYY-MM-DD" >&2
+  exit 1
+fi
+
+printf '%s' "$OUTPUT"
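Review note on the minor-series rollup in `extract-changelog.sh`: the core of the script is the comparison of major and minor components between the requested version and each changelog heading. A minimal sketch of just that filter follows; `in_minor_series` and `series_headings` are hypothetical helper names, and the sample headings are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: succeeds when two versions share the same major.minor.
in_minor_series() {
  local target="$1" entry="$2"
  local t_major t_minor t_patch e_major e_minor e_patch
  IFS='.' read -r t_major t_minor t_patch <<< "$target"
  IFS='.' read -r e_major e_minor e_patch <<< "$entry"
  [[ "$t_major" == "$e_major" && "$t_minor" == "$e_minor" ]]
}

# Reads changelog text on stdin and prints the versions a rollup would keep.
series_headings() {
  local target="$1" line entry
  while IFS= read -r line; do
    if [[ "$line" =~ ^##\ \[([0-9]+\.[0-9]+\.[0-9]+)\] ]]; then
      entry="${BASH_REMATCH[1]}"
      if in_minor_series "$target" "$entry"; then
        printf '%s\n' "$entry"
      fi
    fi
  done
}

series_headings 1.0.5 <<'EOF'
## [Unreleased]
## [1.0.5] - 2026-03-01
## [1.0.0] - 2026-02-15
## [0.9.0] - 2026-02-15
EOF
```

For a requested version of 1.0.5 the sketch keeps 1.0.5 and 1.0.0 and skips 0.9.0, matching the "1.0.0 through 1.0.5" behaviour described in the script's header comment.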
--- a/scripts/install-hooks.sh
+++ b/scripts/install-hooks.sh
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Self-installing git hooks for qmd
+# Called from package.json "prepare" script after bun install
+
+REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+HOOKS_DIR="$REPO_ROOT/.git/hooks"
+
+if [[ ! -d "$HOOKS_DIR" ]]; then
+  echo "Not a git repository, skipping hook install"
+  exit 0
+fi
+
+# Install pre-push hook
+cp "$REPO_ROOT/scripts/pre-push" "$HOOKS_DIR/pre-push"
+chmod +x "$HOOKS_DIR/pre-push"
+
+echo "Installed git hooks: pre-push"
--- a/scripts/pre-push
+++ b/scripts/pre-push
@@ -0,0 +1,112 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Pre-push hook: validates v* tag pushes before they reach the remote.
+#
+# Checks:
+#   1. package.json version matches the tag
+#   2. CHANGELOG.md has a "## [{version}] - {date}" entry
+#   3. CI passed upstream on GitHub for the tagged commit
+#
+# Installed automatically by: bun install (via prepare script)
+
+while read -r local_ref local_sha remote_ref remote_sha; do
+  # Only validate v* tag pushes
+  if [[ "$local_ref" != refs/tags/v* ]]; then
+    continue
+  fi
+
+  # Skip tag deletions
+  if [[ "$local_sha" == "0000000000000000000000000000000000000000" ]]; then
+    continue
+  fi
+
+  TAG="${local_ref#refs/tags/}"
+  VERSION="${TAG#v}"
+
+  echo "Validating release $TAG..."
+
+  # --- 1. package.json version must match the tag ---
+  PKG_VERSION=$(jq -r .version package.json)
+  if [[ "$PKG_VERSION" != "$VERSION" ]]; then
+    echo ""
+    echo "ABORT: package.json version is $PKG_VERSION but tag is $TAG"
+    echo "Run: jq --arg v '$VERSION' '.version = \$v' package.json > tmp && mv tmp package.json"
+    exit 1
+  fi
+  echo "  package.json version: $PKG_VERSION"
+
+  # --- 2. CHANGELOG.md must have an entry for this version ---
+  if [[ ! -f CHANGELOG.md ]]; then
+    echo ""
+    echo "ABORT: CHANGELOG.md not found"
+    exit 1
+  fi
+
+  if ! grep -q "^## \[$VERSION\] - " CHANGELOG.md; then
+    echo ""
+    echo "ABORT: CHANGELOG.md has no entry for this release"
+    echo "Expected heading: ## [$VERSION] - $(date +%Y-%m-%d)"
+    echo ""
+    echo "Write the changelog entry first. See CLAUDE.md for guidelines."
+    exit 1
+  fi
+  echo "  CHANGELOG.md: entry found"
+
+  # --- 3. CI must have passed on GitHub for this commit ---
+  COMMIT=$(git rev-parse "$TAG^{commit}" 2>/dev/null || git rev-parse HEAD)
+
+  if ! command -v gh &>/dev/null; then
+    echo "  CI check: skipped (gh CLI not installed)"
+    continue
+  fi
+
+  # Check GitHub Actions check runs
+  CHECK_JSON=$(gh api "repos/{owner}/{repo}/commits/$COMMIT/check-runs" 2>/dev/null || echo "")
+
+  if [[ -z "$CHECK_JSON" ]]; then
+    echo "  CI check: skipped (could not reach GitHub API)"
+    continue
+  fi
+
+  TOTAL=$(echo "$CHECK_JSON" | jq -r '.total_count // 0' 2>/dev/null || echo "0")
+
+  if [[ "$TOTAL" -eq 0 ]] 2>/dev/null; then
+    # No checks found — commit may not be pushed yet
+    MAIN_SHA=$(git rev-parse origin/main 2>/dev/null || echo "")
+    if [[ "$COMMIT" != "$MAIN_SHA" ]]; then
+      echo ""
+      echo "WARNING: No CI runs found for $COMMIT and it's not the tip of origin/main."
+      echo "Push the commit to main first and wait for CI to pass."
+      read -p "Push tag anyway? [y/N] " -n 1 -r </dev/tty
+      echo ""
+      [[ $REPLY =~ ^[Yy]$ ]] || exit 1
+    else
+      echo "  CI check: no runs found (commit is on main)"
+    fi
+  else
+    FAILED=$(echo "$CHECK_JSON" | jq '[.check_runs // [] | .[] | select(.conclusion == "failure")] | length' 2>/dev/null || echo "0")
+    PENDING=$(echo "$CHECK_JSON" | jq '[.check_runs // [] | .[] | select(.status != "completed")] | length' 2>/dev/null || echo "0")
+
+    if [[ "$FAILED" -gt 0 ]] 2>/dev/null; then
+      echo ""
+      echo "ABORT: CI failed for commit $COMMIT"
+      echo "Check: https://github.com/tobi/qmd/commit/$COMMIT"
+      exit 1
+    fi
+
+    if [[ "$PENDING" -gt 0 ]] 2>/dev/null; then
+      echo ""
+      echo "WARNING: CI still running for commit $COMMIT ($PENDING pending)"
+      read -p "Push tag anyway? [y/N] " -n 1 -r </dev/tty
+      echo ""
+      [[ $REPLY =~ ^[Yy]$ ]] || exit 1
+    else
+      echo "  CI check: all passed"
+    fi
+  fi
+
+  echo "All checks passed for $TAG"
+done
+
+exit 0
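Review note on the hook's input handling: git feeds a pre-push hook one line per ref on stdin, in the form `<local ref> <local sha> <remote ref> <remote sha>`. A minimal sketch of the tag filter at the top of the loop, useful for dry-running the logic without pushing; `release_versions_from_push` is a hypothetical name and the SHAs below are placeholders.

```bash
#!/usr/bin/env bash
# Sketch only: prints the version for every v* tag push found on stdin.
release_versions_from_push() {
  local local_ref local_sha remote_ref remote_sha tag
  local zero="0000000000000000000000000000000000000000"
  while read -r local_ref local_sha remote_ref remote_sha; do
    [[ "$local_ref" == refs/tags/v* ]] || continue   # ignore branches and other tags
    [[ "$local_sha" != "$zero" ]] || continue        # ignore deletions
    tag="${local_ref#refs/tags/}"
    printf '%s\n' "${tag#v}"
  done
}

release_versions_from_push <<'EOF'
refs/heads/main 1111111111111111111111111111111111111111 refs/heads/main 2222222222222222222222222222222222222222
refs/tags/v1.0.5 3333333333333333333333333333333333333333 refs/tags/v1.0.5 0000000000000000000000000000000000000000
EOF
```

Only the `v1.0.5` line survives the filter, so a plain branch push never reaches the version, changelog, or CI checks.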
--- a/scripts/release.sh
+++ b/scripts/release.sh
@@ -2,6 +2,11 @@
 set -euo pipefail

 # QMD Release Script
+#
+# Renames the [Unreleased] section in CHANGELOG.md to the new version,
+# bumps package.json, commits, and creates a tag. The actual publish
+# happens via GitHub Actions when the tag is pushed.
+#
 # Usage: ./scripts/release.sh [patch|minor|major|<version>]
 # Examples:
 #   ./scripts/release.sh patch     # 0.9.0 -> 0.9.1
@@ -41,74 +46,60 @@ bump_version() {
 }

 NEW=$(bump_version "$CURRENT" "$BUMP")
+DATE=$(date +%Y-%m-%d)
 echo "New version:     $NEW"
 echo ""

-# Confirm
+# --- Validate CHANGELOG.md ---
+
+if [[ ! -f CHANGELOG.md ]]; then
+  echo "Error: CHANGELOG.md not found" >&2
+  exit 1
+fi
+
+# An [Unreleased] heading must be present
+if ! grep -q "^## \[Unreleased\]" CHANGELOG.md; then
+  echo "Error: no [Unreleased] section in CHANGELOG.md" >&2
+  echo "" >&2
+  echo "Add your changes under an [Unreleased] heading first:" >&2
+  echo "" >&2
+  echo "  ## [Unreleased]" >&2
+  echo "" >&2
+  echo "  ### Changes" >&2
+  echo "  - Your change here" >&2
+  exit 1
+fi
+
+# --- Preview release notes ---
+
+echo "--- Release notes (will appear on GitHub) ---"
+./scripts/extract-changelog.sh "$NEW"
+echo "--- End ---"
+echo ""
+
+# --- Confirm ---
+
 read -p "Release v$NEW? [y/N] " -n 1 -r
 echo ""
 [[ $REPLY =~ ^[Yy]$ ]] || { echo "Aborted."; exit 1; }

-# Gather commits since last tag (or all if no tags)
-LAST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
-if [[ -n "$LAST_TAG" ]]; then
-  RANGE="$LAST_TAG..HEAD"
-else
-  RANGE="HEAD"
-fi
+# --- Rename [Unreleased] -> [X.Y.Z] - date, add fresh [Unreleased] ---

-echo ""
-echo "Commits since ${LAST_TAG:-beginning}:"
-git log "$RANGE" --oneline --no-decorate
-echo ""
+sed -i.bak "s/^## \[Unreleased\].*/## [$NEW] - $DATE/" CHANGELOG.md && rm -f CHANGELOG.md.bak

-# Generate changelog entry
-DATE=$(date +%Y-%m-%d)
-ENTRY="## [$NEW] - $DATE"$'\n'$'\n'
+# Insert a new empty [Unreleased] section after the header
+awk '
+  /^## \['"$NEW"'\]/ && !done {
+    print "## [Unreleased]\n"
+    done = 1
+  }
+  { print }
+' CHANGELOG.md > CHANGELOG.md.tmp && mv CHANGELOG.md.tmp CHANGELOG.md

-# Collect conventional commits
-FEATS=$(git log "$RANGE" --oneline --no-decorate --grep="^feat" | sed 's/^[a-f0-9]* feat[:(]/- /' | sed 's/)$//' || true)
-FIXES=$(git log "$RANGE" --oneline --no-decorate --grep="^fix" | sed 's/^[a-f0-9]* fix[:(]/- /' | sed 's/)$//' || true)
-OTHER=$(git log "$RANGE" --oneline --no-decorate --grep="^feat" --grep="^fix" --grep="^docs" --grep="^chore" --grep="^refactor" --invert-grep | sed 's/^[a-f0-9]* /- /' || true)
+# --- Bump version and commit ---

-if [[ -n "$FEATS" ]]; then
-  ENTRY+="### Features"$'\n'$'\n'"$FEATS"$'\n'$'\n'
-fi
-if [[ -n "$FIXES" ]]; then
-  ENTRY+="### Fixes"$'\n'$'\n'"$FIXES"$'\n'$'\n'
-fi
-if [[ -n "$OTHER" ]]; then
-  ENTRY+="### Other"$'\n'$'\n'"$OTHER"$'\n'$'\n'
-fi
-
-# Add link reference
-LINK="[$NEW]: https://github.com/tobi/qmd/compare/v$CURRENT...v$NEW"
-
-# Show what will be added
-echo "--- Changelog entry ---"
-echo "$ENTRY"
-echo "$LINK"
-echo "--- End ---"
-echo ""
-read -p "Looks good? [y/N] " -n 1 -r
-echo ""
-[[ $REPLY =~ ^[Yy]$ ]] || { echo "Aborted."; exit 1; }
-
-# Update package.json version
 jq --arg v "$NEW" '.version = $v' package.json > package.json.tmp && mv package.json.tmp package.json

-# Prepend changelog entry (after the header line)
-if [[ -f CHANGELOG.md ]]; then
-  # Insert after "# Changelog" header and any blank lines
-  awk -v entry="$ENTRY$LINK" '
-    /^# Changelog/ { print; getline; print; print ""; print entry; print ""; next }
-    { print }
-  ' CHANGELOG.md > CHANGELOG.md.tmp && mv CHANGELOG.md.tmp CHANGELOG.md
-else
-  echo "# Changelog"$'\n'$'\n'"$ENTRY$LINK" > CHANGELOG.md
-fi
-
-# Commit and tag
 git add package.json CHANGELOG.md
 git commit -m "release: v$NEW"
 git tag -a "v$NEW" -m "v$NEW"
@@ -116,9 +107,6 @@ git tag -a "v$NEW" -m "v$NEW"
 echo ""
 echo "Created commit and tag v$NEW"
 echo ""
-echo "Next steps:"
-echo "  git push origin main --tags   # push to GitHub"
-echo "  npm publish                   # publish to npm"
+echo "Next: push to trigger the publish workflow"
 echo ""
-echo "Or both at once:"
-echo "  git push origin main --tags && npm publish"
+echo "  git push origin main --tags"
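Review note (a sketch, not part of this commit): the awk step that re-inserts `[Unreleased]` can be tried in isolation on a throwaway changelog. The function name `insert_unreleased` and the stdin/stdout interface are hypothetical. Unlike the script above, this version matches the heading as a literal prefix instead of a regex, so the dots in a version such as `1.0.0` are not treated as wildcards.

```shell
# Hypothetical helper: reads a changelog on stdin and prints it with a fresh
# "## [Unreleased]" heading inserted above the first "## [VERSION]" heading.
insert_unreleased() {
  awk -v ver="$1" '
    # index(...) == 1 is a literal "starts with" test, so no regex escaping is needed.
    index($0, "## [" ver "]") == 1 && !done {
      print "## [Unreleased]"
      print ""
      done = 1
    }
    { print }
  '
}
```

Usage would look like `insert_unreleased 1.0.0 < CHANGELOG.md > CHANGELOG.md.tmp`, followed by the same `mv` as in the script.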
--- a/skills/release/SKILL.md
+++ b/skills/release/SKILL.md
@ -0,0 +1,114 @@
+---
+name: release
+description: Manage releases for this project. Validates changelog, installs git hooks, and cuts releases. Use when user says "/release", "release 1.0.5", "cut a release", or asks about the release process. NOT auto-invoked by the model.
+disable-model-invocation: true
+---
+
+# Release
+
+Cut a release, validate the changelog, and ensure git hooks are installed.
+
+## Usage
+
+`/release 1.0.5` or `/release patch` (bumps patch from current version).
+
+## Process
+
+When the user triggers `/release <version>`:
+
+1. **Install hooks** — run `scripts/install-hooks.sh` (idempotent)
+2. **Validate changelog** — confirm `## [Unreleased]` has content
+3. **Preview** — show the user what will be released (unreleased content + minor series rollup via `scripts/extract-changelog.sh`)
+4. **Ask for confirmation** — do NOT proceed without explicit user approval
+5. **Run `scripts/release.sh <version>`** — renames `[Unreleased]`, bumps version, commits, tags
+6. **Remind** — tell the user to `git push origin main --tags`
+
+If any step fails, stop and explain. Never force-push or skip validation.
+
+## Changelog Standard
+
+The changelog lives in `CHANGELOG.md` and follows [Keep a Changelog](https://keepachangelog.com/) conventions.
+
+### Heading format
+
+- `## [Unreleased]` — accumulates entries between releases
+- `## [X.Y.Z] - YYYY-MM-DD` — released versions
+
+The release script renames `[Unreleased]` → `[X.Y.Z] - date` and inserts a
+fresh empty `[Unreleased]` section automatically.
+
+### Structure of a release entry
+
+Each version entry has two parts:
+
+**1. Highlights (optional, 1-4 sentences of prose)**
+
+Immediately after the version heading, before any `###` section. This is the
+elevator pitch — what would you tell someone in 30 seconds? Only include for
+releases with significant changes. Skip for small patches.
+
+```markdown
+## [1.1.0] - 2026-03-01
+
+QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking
+through parallel contexts. GPU auto-detection replaces the unreliable
+`gpu: "auto"` with explicit CUDA/Metal/Vulkan probing.
+```
+
+**2. Detailed changelog (`### Changes` and `### Fixes`)**
+
+```markdown
+### Changes
+
+- Runtime: support Node.js (>=22) alongside Bun. The `qmd` wrapper
+  auto-detects a suitable install via PATH. #149 (thanks @igrigorik)
+- Performance: parallel embedding & reranking — up to 2.7x faster on
+  multi-core machines.
+
+### Fixes
+
+- Prevent VRAM waste from duplicate context creation during concurrent
+  `embedBatch` calls. #152 (thanks @jkrems)
+```
+
+### Writing guidelines
+
+- **Explain the why, not just the what.** The changelog is for users.
+- **Include numbers.** "2.7x faster", "17x less memory".
+- **Group by theme, not by file.** "Performance" not "Changes to llm.ts".
+- **Don't list every commit.** Aggregate related changes.
+- **Credit contributors:** end bullets with `#NNN (thanks @username)` for
+  external PRs. No need to credit the repo owner.
+
+### What not to include
+
+- Internal refactors with no user-visible effect
+- Dependency bumps (unless fixing a user-facing bug)
+- CI/tooling changes (unless affecting the release artifact)
+- Test additions (unless validating a fix worth mentioning)
+
+## GitHub Release Notes
+
+Each GitHub release includes the full changelog for the **minor series** back
+to X.Y.0. Releasing v1.2.3 includes entries for 1.2.3, 1.2.2, 1.2.1, and
+1.2.0. The `scripts/extract-changelog.sh` script handles this, and the
+publish workflow (`publish.yml`) calls it to populate the GitHub release.
+
+## Git Hooks
+
+The pre-push hook (`scripts/pre-push`) blocks `v*` tag pushes unless:
+
+1. `package.json` version matches the tag
+2. `CHANGELOG.md` has a `## [X.Y.Z] - date` entry for the version
+3. CI passed on GitHub for the tagged commit
+
+Run `skills/release/scripts/install-hooks.sh` to install the hook (it also
+runs automatically via the `prepare` script on `bun install`).
+
+## Scripts
+
+- [`scripts/install-hooks.sh`](scripts/install-hooks.sh) — install/update git hooks
+- Project scripts used during release:
+  - `scripts/release.sh` — rename [Unreleased], bump version, commit, tag
+  - `scripts/extract-changelog.sh` — extract minor series notes for GitHub release
+  - `scripts/pre-push` — pre-push validation hook
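Review note (illustration only): `scripts/extract-changelog.sh` is not shown in this part of the diff, so the following is a sketch of the minor-series rule described in SKILL.md, not the script's actual implementation. The function name `extract_minor_series` is hypothetical, and the sketch assumes headings of the form `## [X.Y.Z] - date`.

```shell
# Hypothetical sketch: print every changelog entry in the same minor series as
# VERSION, at or below VERSION. Reads the changelog on stdin.
extract_minor_series() {
  awk -v ver="$1" '
    BEGIN { split(ver, want, ".") }   # want[1]=major, want[2]=minor, want[3]=patch
    index($0, "## [") == 1 {
      # Pull X.Y.Z out of "## [X.Y.Z] - date"; "## [Unreleased]" fails the n == 3 test.
      close_at = index($0, "]")
      n = split(substr($0, 5, close_at - 5), got, ".")
      keep = (n == 3 && got[1] == want[1] && got[2] == want[2] && got[3] + 0 <= want[3] + 0)
    }
    keep { print }
  '
}
```

With this rule, `extract_minor_series 1.2.3 < CHANGELOG.md` would print the 1.2.3 through 1.2.0 entries and skip `[Unreleased]` and the 1.1.x series.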
--- a/skills/release/scripts/install-hooks.sh
+++ b/skills/release/scripts/install-hooks.sh
@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Install git hooks for release validation.
+# Idempotent — safe to run multiple times.
+
+REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || true)
+if [[ -z "$REPO_ROOT" ]]; then
+  echo "Error: not in a git repository" >&2
+  exit 1
+fi
+
+HOOKS_DIR="$REPO_ROOT/.git/hooks"
+SOURCE="$REPO_ROOT/scripts/pre-push"
+
+if [[ ! -f "$SOURCE" ]]; then
+  echo "Error: scripts/pre-push not found at $SOURCE" >&2
+  exit 1
+fi
+
+# Install pre-push hook
+if [[ -L "$HOOKS_DIR/pre-push" ]] && [[ "$(readlink "$HOOKS_DIR/pre-push")" == "$SOURCE" ]]; then
+  echo "pre-push hook: already installed (symlink)"
+elif [[ -f "$HOOKS_DIR/pre-push" ]]; then
+  # Existing hook that isn't our symlink — back it up
+  BACKUP="$HOOKS_DIR/pre-push.backup.$(date +%s)"
+  echo "pre-push hook: backing up existing hook to $(basename "$BACKUP")"
+  mv "$HOOKS_DIR/pre-push" "$BACKUP"
+  ln -sf "$SOURCE" "$HOOKS_DIR/pre-push"
+  echo "pre-push hook: installed (symlink → scripts/pre-push)"
+else
+  ln -sf "$SOURCE" "$HOOKS_DIR/pre-push"
+  echo "pre-push hook: installed (symlink → scripts/pre-push)"
+fi
+
+# Ensure the source is executable
+chmod +x "$SOURCE"
+echo "Done."
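Review note (illustration only): the hook installed above is described in SKILL.md as checking that the tag matches `package.json` and that `CHANGELOG.md` has a matching entry. The sketch below shows one way those two checks could be expressed; `scripts/pre-push` itself may do this differently. Both function names are hypothetical, and the CI check is left out because it needs network access.

```shell
# Hypothetical checks mirroring rules 1 and 2 of the pre-push hook.

# Rule 1: the tag, minus its leading "v", must equal the package.json version.
tag_matches_version() {   # usage: tag_matches_version v1.0.0 1.0.0
  [ "${1#v}" = "$2" ]
}

# Rule 2: the changelog must contain a "## [X.Y.Z] - date" heading.
# grep -F treats the brackets and dots literally; the match is not anchored
# to the start of the line, which a stricter check might want.
changelog_has_entry() {   # usage: changelog_has_entry 1.0.0 CHANGELOG.md
  grep -qF "## [$1] - " "$2"
}
```

The `${1#v}` expansion is the same prefix-stripping idiom the publish workflow uses for `GITHUB_REF_NAME`.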
--- a/src/llm.ts
+++ b/src/llm.ts
@ -494,6 +494,7 @@ export class LlamaCpp implements LLM {
      // Detect available GPU types and use the best one.
      // We can't rely on gpu:"auto" — it returns false even when CUDA is available
      // (likely a binary/build config issue in node-llama-cpp).
+      // @ts-expect-error node-llama-cpp API compat
      const gpuTypes = await getLlamaGpuTypes();
      // Prefer CUDA > Metal > Vulkan > CPU
      const preferred = (["cuda", "metal", "vulkan"] as const).find(g => gpuTypes.includes(g));
@ -733,7 +734,7 @@ export class LlamaCpp implements LLM {
            contextSize: LlamaCpp.RERANK_CONTEXT_SIZE,
            flashAttention: true,
            ...(threads > 0 ? { threads } : {}),
-          }));
+          } as any));
        } catch {
          if (this.rerankContexts.length === 0) {
            // Flash attention might not be supported — retry without it
@ -828,7 +829,7 @@ export class LlamaCpp implements LLM {
      if (n === 1) {
        // Single context: sequential (no point splitting)
        const context = contexts[0]!;
-        const embeddings = [];
+        const embeddings: ({ embedding: number[]; model: string } | null)[] = [];
        for (const text of texts) {
          try {
            const embedding = await context.getEmbeddingFor(text);
--- a/src/mcp.ts
+++ b/src/mcp.ts
@ -682,6 +682,6 @@ export async function startMcpHttpServer(port: number, options?: { quiet?: boole
 }

 // Run if this is the main module
-if (fileURLToPath(import.meta.url) === process.argv[1] || process.argv[1]?.endsWith("/mcp.ts")) {
+if (fileURLToPath(import.meta.url) === process.argv[1] || process.argv[1]?.endsWith("/mcp.ts") || process.argv[1]?.endsWith("/mcp.js")) {
  startMcpServer().catch(console.error);
 }
--- a/src/qmd.ts
+++ b/src/qmd.ts
@ -2207,7 +2207,7 @@ async function showVersion(): Promise<void> {
 }

 // Main CLI - only run if this is the main module
-if (fileURLToPath(import.meta.url) === process.argv[1] || process.argv[1]?.endsWith("/qmd.ts")) {
+if (fileURLToPath(import.meta.url) === process.argv[1] || process.argv[1]?.endsWith("/qmd.ts") || process.argv[1]?.endsWith("/qmd.js")) {
  const cli = parseCLI();

  if (cli.values.version) {
@ -2489,8 +2489,11 @@ if (fileURLToPath(import.meta.url) === process.argv[1] || process.argv[1]?.endsW
          mkdirSync(cacheDir, { recursive: true });
          const logPath = resolve(cacheDir, "mcp.log");
          const logFd = openSync(logPath, "w"); // truncate — fresh log per daemon run
-          const tsxLoader = pathJoin(dirname(fileURLToPath(import.meta.url)), "..", "node_modules", "tsx", "dist", "esm", "index.mjs");
-          const child = nodeSpawn(process.execPath, ["--import", tsxLoader, fileURLToPath(import.meta.url), "mcp", "--http", "--port", String(port)], {
+          const selfPath = fileURLToPath(import.meta.url);
+          const spawnArgs = selfPath.endsWith(".ts")
+            ? ["--import", pathJoin(dirname(selfPath), "..", "node_modules", "tsx", "dist", "esm", "index.mjs"), selfPath, "mcp", "--http", "--port", String(port)]
+            : [selfPath, "mcp", "--http", "--port", String(port)];
+          const child = nodeSpawn(process.execPath, spawnArgs, {
            stdio: ["ignore", logFd, logFd],
            detached: true,
          });
--- a/src/store.ts
+++ b/src/store.ts
@ -23,7 +23,7 @@ import {
  formatDocForEmbedding,
  type RerankDocument,
  type ILLMSession,
-} from "./llm";
+} from "./llm.js";
 import {
  findContextForPath as collectionsFindContextForPath,
  addContext as collectionsAddContext,
@ -37,7 +37,7 @@ import {
  setGlobalContext,
  loadConfig as collectionsLoadConfig,
  type NamedCollection,
-} from "./collections";
+} from "./collections.js";

 // =============================================================================
 // Configuration
--- a/tsconfig.build.json
+++ b/tsconfig.build.json
@ -0,0 +1,11 @@
+{
+  "extends": "./tsconfig.json",
+  "compilerOptions": {
+    "noEmit": false,
+    "outDir": "dist",
+    "declaration": true,
+    "noImplicitAny": false
+  },
+  "include": ["src/**/*.ts"],
+  "exclude": ["src/**/*.test.ts", "src/test-preload.ts", "src/bench-*.ts"]
+}