release: v1.0.0
Node.js compatibility, parallel embedding/reranking, flash attention, GPU auto-detection, and restructured test suite.
parent 294fc76d9f
commit dc64166a2a
34
CHANGELOG.md
@@ -2,6 +2,39 @@

All notable changes to QMD will be documented in this file.

## [1.0.0] - 2026-02-15

### Node.js Compatibility

QMD now runs on both **Node.js (>=22)** and **Bun**. Install with `npm install -g @tobilu/qmd` or `bun install -g @tobilu/qmd`, whichever you prefer. The `qmd` wrapper auto-detects Node.js via `tsx` and works out of the box with mise, asdf, nvm, and Homebrew installs.

### Performance

- **Parallel embedding & reranking**: multiple contexts split work across CPU cores (or VRAM on GPU), delivering up to **2.7x faster reranking** and significantly faster embedding on multi-core machines
- **Flash attention**: ~20% less VRAM per reranking context, enabling more parallel contexts on GPU
- **Right-sized contexts**: reranker context dropped from 40960 to 2048 tokens (17x less memory), since chunks are capped at ~900 tokens
- **Adaptive parallelism**: automatically scales context count based on available VRAM (GPU) or CPU math cores
- **CPU thread splitting**: each context runs on its own cores for true parallelism instead of contending on a single context

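As a rough illustration of the adaptive parallelism and CPU thread splitting described above, the TypeScript sketch below picks a context count from free VRAM (GPU) or from CPU math cores. Every name here (`planParallelism`, `Resources`, `vramPerContextBytes`, `maxContexts`) and the two-threads-per-context floor are assumptions made for the example, not QMD's actual code.

```typescript
// Hypothetical sketch: these types and names are illustrative, not QMD's API.
interface Resources {
  freeVramBytes: number | null; // null when no GPU backend is in use
  cpuMathCores: number;
}

interface ParallelPlan {
  contexts: number;
  threadsPerContext: number; // 0 means "not applicable" (GPU path)
}

function planParallelism(
  res: Resources,
  vramPerContextBytes: number,
  maxContexts = 8, // assumed upper bound
): ParallelPlan {
  if (res.freeVramBytes !== null) {
    // GPU: fit as many contexts as free VRAM allows, but always at least one.
    const fit = Math.floor(res.freeVramBytes / vramPerContextBytes);
    return { contexts: Math.max(1, Math.min(maxContexts, fit)), threadsPerContext: 0 };
  }
  // CPU: give each context its own slice of cores so contexts do not contend.
  const minThreadsPerContext = 2; // assumed floor
  const byCores = Math.floor(res.cpuMathCores / minThreadsPerContext);
  const contexts = Math.max(1, Math.min(maxContexts, byCores));
  const threadsPerContext = Math.max(1, Math.floor(res.cpuMathCores / contexts));
  return { contexts, threadsPerContext };
}
```

Under these assumptions, an 8-core CPU machine would get four contexts with two threads each, while a GPU with room for three contexts' worth of VRAM would get three.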
### GPU Auto-Detection

- Probes for CUDA, Metal, and Vulkan at startup and uses the best available backend
- Falls back gracefully to CPU with a warning if GPU init fails
- `qmd status` now shows device info (GPU type, VRAM usage)

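The probe-then-fall-back behaviour listed above could look roughly like the following TypeScript sketch. The `Probe` callback, the `pickBackend` name, and the preference order CUDA, Metal, Vulkan are assumptions for illustration; the changelog does not say how QMD ranks backends or which library calls it uses.

```typescript
type Backend = "cuda" | "metal" | "vulkan" | "cpu";

// Hypothetical probe: resolves to true when the backend initialises successfully.
type Probe = (backend: Backend) => Promise<boolean>;

async function pickBackend(
  probe: Probe,
  warn: (msg: string) => void = console.warn,
): Promise<Backend> {
  const preferred: Backend[] = ["cuda", "metal", "vulkan"]; // assumed order
  for (const backend of preferred) {
    try {
      if (await probe(backend)) return backend;
    } catch (err) {
      // A failed GPU init is reported but never fatal.
      warn(`GPU init failed for ${backend}: ${String(err)}`);
    }
  }
  warn("No usable GPU backend found; falling back to CPU");
  return "cpu";
}
```

Injecting the probe keeps the selection logic testable without real GPU hardware.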
### Test Suite

- Tests split into `src/*.test.ts` (unit), `src/models/*.test.ts` (model), and `src/integration/*.test.ts` (CLI/integration)
- Vitest config for Node.js; `bun test` still works for Bun
- New `eval-bm25` and `store.helpers.unit` test suites

### Fixes

- Prevent VRAM waste from duplicate context creation during concurrent loads
- Collection-aware FTS filtering for scoped keyword search

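One common way to avoid duplicate creation under concurrent loads is to cache the in-flight promise so that every caller shares a single creation. The TypeScript sketch below shows that general technique; the `shareInflight` helper is hypothetical, and QMD's actual fix may be implemented differently.

```typescript
// Hypothetical helper: wraps an expensive async factory so concurrent callers
// share one in-flight creation instead of each allocating their own resource.
function shareInflight<T>(create: () => Promise<T>): () => Promise<T> {
  let inflight: Promise<T> | null = null;
  return () => {
    if (inflight !== null) return inflight;
    const pending = create().catch((err) => {
      inflight = null; // allow a retry after a failed load
      throw err;
    });
    inflight = pending;
    return pending;
  };
}
```

With this wrapper, three simultaneous calls trigger one `create()` and resolve to the same object, and a failed creation does not poison later attempts.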
---

## [0.9.0] - 2026-02-15

Initial public release.

@@ -30,5 +63,6 @@ Initial public release.
- BM25 score normalization with Math.abs
- Bun UTF-8 path corruption workaround

[1.0.0]: https://github.com/tobi/qmd/releases/tag/v1.0.0
[0.9.0]: https://github.com/tobi/qmd/releases/tag/v0.9.0

@@ -1,6 +1,6 @@
 {
   "name": "@tobilu/qmd",
-  "version": "0.9.9",
+  "version": "1.0.0",
   "description": "Query Markup Documents - On-device hybrid search for markdown files with BM25, vector search, and LLM reranking",
   "type": "module",
   "bin": {