Commit Graph

26 Commits

Author SHA1 Message Date
Haitao Pan
47bd3ded44 feat(pg): add switchable PostgreSQL backend + OpenClaw/Hermes memory bridge
Add an optional PostgreSQL backend (QMD_BACKEND=pg) alongside the
unchanged default SQLite path. PG store uses pgvector (HNSW) for vectors
and pg_jieba + pg_trgm for full-text/Chinese tokenization, with a
namespace column isolating multi-agent memory (openclaw/hermes).

- src/pg/: config, db-pg, schema bootstrap, memory store
- MCP memory_add/memory_search/memory_get tools; qmd pg status + memory CLI
- connection via QMD_PG_URL/DATABASE_URL/qmd config, stunnel TLS 5443
- tests: pg-config (unit) + pg-memory integration (gated on QMD_PG_URL) + pg-compose
- docs/plan: plan, usage, test report, changelog; track docs/**/*.md

SQLite path: zero regression (typecheck clean, 249 passed / 6 skipped).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 19:13:04 +08:00
Haitao Pan
49fc83ebe2 Default embeddings to external API 2026-05-07 16:19:18 +08:00
James Risberg
244ddf5ecb feat: AST-aware chunking for code files via tree-sitter
Add opt-in AST-aware chunk boundary detection for code files using
web-tree-sitter. When enabled with `--chunk-strategy auto`, code files
(.ts, .tsx, .js, .jsx, .py, .go, .rs) are chunked at function, class,
and import boundaries instead of arbitrary text positions. Default
behavior (`regex`) is unchanged — no surprises on upgrade.

In testing on QMD's own codebase, AST mode split 42% fewer function
bodies across chunk boundaries compared to regex-only chunking.

Usage:
  qmd embed --chunk-strategy auto
  qmd query "search terms" --chunk-strategy auto

What's included:
- Language detection from file extension with support for TypeScript,
  JavaScript (including arrow functions and function expressions),
  Python, Go, and Rust
- Per-language tree-sitter queries with scored break points aligned to
  the existing markdown scale (class=100, function=90, type=80, import=60)
- AST break points merged with regex break points — highest score wins
  at each position, so embedded markdown (comments, docstrings) still
  benefits from regex patterns
- Refactored chunking core: chunkDocumentWithBreakPoints() extracted,
  mergeBreakPoints() added, async chunkDocumentAsync() wrapper for AST
- ChunkStrategy type ("auto" | "regex") threaded through
  generateEmbeddings(), hybridQuery(), structuredSearch(), CLI, and SDK
- getASTStatus() health check wired into `qmd status`
- Parse failures log a warning and fall back to regex — never crash
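
The "highest score wins at each position" merge described above can be sketched as follows. This is an illustration only: the `BreakPoint` shape and the name `mergeBreakPointsSketch` are assumptions, not the signature of the real mergeBreakPoints().

```typescript
// Hypothetical break point shape; the real type in QMD may differ.
interface BreakPoint {
  pos: number;   // offset in the document
  score: number; // higher = better place to cut
  type: string;  // e.g. "function", "h2", "blank"
}

// Merge two break point lists, keeping the highest score per position.
function mergeBreakPointsSketch(regex: BreakPoint[], ast: BreakPoint[]): BreakPoint[] {
  const best = new Map<number, BreakPoint>();
  for (const bp of [...regex, ...ast]) {
    const seen = best.get(bp.pos);
    if (seen === undefined || bp.score > seen.score) best.set(bp.pos, bp);
  }
  // Return positions in document order.
  return [...best.values()].sort((a, b) => a.pos - b.pos);
}

const merged = mergeBreakPointsSketch(
  [{ pos: 0, score: 20, type: "blank" }, { pos: 40, score: 90, type: "h2" }],
  [{ pos: 0, score: 60, type: "import" }, { pos: 120, score: 90, type: "function" }],
);
console.log(merged.map((bp) => `${bp.pos}:${bp.type}`).join(" "));
// prints "0:import 40:h2 120:function"
```

Because regex break points stay in the merged list wherever no AST point outscores them, markdown inside comments and docstrings keeps its heading and paragraph boundaries.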

Hardening:
- Grammar packages are optionalDependencies with pinned versions to
  prevent ABI breaks from semver drift
- web-tree-sitter is a direct dependency (pinned)
- Errors are logged (not silently swallowed) for debuggability
- Tested on both Node.js and Bun (Bun is actually faster)

Testing:
- 26 unit tests (test/ast.test.ts) — all 4 languages, error handling
- 7 integration tests (test/store.test.ts) — merge, equivalence, bypass
- Standalone test-ast-chunking.mjs with 63 synthetic tests and a
  real-collection performance scanner (npx tsx test-ast-chunking.mjs ~/code)
- Validated end-to-end with qmd embed + qmd query on QMD's own codebase
- Zero markdown regressions across all test paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 01:22:39 -04:00
Tobi Lutke
c68904fe08
refactor: move CLI and MCP to subdirectories, MCP consumes SDK
Move frontends into src/cli/ and src/mcp/ to separate them from the
core library. The MCP server is fully rewritten to import only from
the SDK (src/index.ts) — zero direct store.ts/collections.ts/llm.ts
access.

- src/qmd.ts → src/cli/qmd.ts
- src/formatter.ts → src/cli/formatter.ts
- src/mcp.ts → src/mcp/server.ts (rewritten to use QMDStore SDK)
- New src/maintenance.ts: Maintenance class for CLI housekeeping
- SDK gains: getDocumentBody(), getDefaultCollectionNames(),
  extractSnippet/addLineNumbers/DEFAULT_MULTI_GET_MAX_BYTES exports,
  getDefaultDbPath re-export, InternalStore type export
- package.json bin/scripts updated for new paths
- All 692 tests pass
2026-03-10 11:39:55 -04:00
Tobi Lutke
09803a75b7
feat: compile to JS for npm, release system, full changelog
- Add tsc build step (tsconfig.build.json) so npm package ships
  compiled JS instead of raw TypeScript requiring tsx at runtime
- Update qmd wrapper and daemon spawn to use dist/qmd.js in
  production while keeping tsx for development
- Add self-installing pre-push hook validating v* tag pushes:
  package.json version match, changelog entry, CI status
- Add release.sh script that renames [Unreleased] to versioned
  entry, bumps package.json, commits, and tags
- Add extract-changelog.sh for cumulative GitHub release notes
- Update publish workflow with build step and GitHub release creation
- Flesh out CHANGELOG.md with full history from 0.1.0 through 1.0.0
  in Keep-a-Changelog format with PR/contributor attributions
- Add release standards and changelog guidelines to CLAUDE.md
2026-02-16 08:42:32 -04:00
Tobi Lutke
870d3aed3b
test: move all tests to flat test/ directory
No more src/models/ and src/integration/ subfolders to forget about.
All 9 test files live in test/, one command runs everything:

  npx vitest run test/
  bun test test/
2026-02-15 21:37:47 -04:00
Tobi Lutke
294fc76d9f
Merge remote-tracking branch 'origin/nodejs' 2026-02-15 16:58:48 -04:00
Tobi Lutke
9b89a51d10
test: split integration/model suites
Split test suites for explicit runtime execution.

- Move model-related tests under `src/models/*`.
- Move CLI/integration tests under `src/integration/*`.
- Add `src/store.helpers.unit.test.ts` for helper unit coverage.
- Add shared Vitest config with default timeout and suite organization.
- Remove legacy flat test files from `src/` root.
- Keep core test commands in scripts supporting unit/models/integration runs.
2026-02-15 16:57:13 -04:00
Tobi Lutke
f0e87a454a
feat: smart chunking with scored markdown break points
Replace hard 800-token boundary chunking with scoring algorithm that
finds natural document break points. Chunks now end at headings,
code blocks, and paragraph boundaries when possible.

- Add break point scoring: h1=100, h2=90, h3=80, codeblock=80, blank=20
- Use squared distance decay so headings win even at window edge
- Protect code fences from being split
- Increase chunk size to 900 tokens to accommodate smart boundaries
- Add comprehensive tests for chunking functions
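
A minimal sketch of squared distance decay, for illustration. The window size, the decay factor, and the name `pickCut` are assumptions chosen for the example and are not taken from the implementation. Only the base scores (h1=100, blank=20) come from the list above.

```typescript
interface BreakPoint {
  pos: number;   // position of the candidate cut
  score: number; // base score, e.g. h1=100, blank=20
}

// Assumed constants, for illustration only.
const WINDOW = 200; // how far before the target we look for a break
const DECAY = 0.7;  // fraction of the base score lost at the window edge

// Pick the break point with the best distance-adjusted score.
function pickCut(points: BreakPoint[], target: number): number {
  let bestPos = target; // fall back to a hard cut at the target
  let bestScore = 0;
  for (const bp of points) {
    const dist = target - bp.pos;
    if (dist < 0 || dist > WINDOW) continue;
    // Squared decay: the penalty grows slowly near the target, fastest at the edge.
    const effective = bp.score * (1 - (dist / WINDOW) ** 2 * DECAY);
    if (effective > bestScore) {
      bestScore = effective;
      bestPos = bp.pos;
    }
  }
  return bestPos;
}

// An h1 at the far edge of the window (100 * 0.3 = 30) still beats
// a blank line right at the target (20 * 1.0 = 20).
console.log(pickCut([{ pos: 700, score: 100 }, { pos: 900, score: 20 }], 900));
// prints 700
```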

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-15 14:20:09 -04:00
Tobi Lütke
96634da39b feat: promote query as primary search command, add CLI aliases
List query first in --help as the recommended search method. Add
vector-search and deep-search as undocumented CLI aliases matching
MCP tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 00:34:29 -05:00
Ilya Grigorik
785bbcf319
MCP: Streamable HTTP, scoring fixes, tool improvements (#149)
* feat: MCP HTTP transport with daemon lifecycle

  Add streaming HTTP transport as an alternative to stdio for the MCP
  server. A long-lived HTTP server avoids reloading 3 GGUF models (~2GB)
  on every client connection, reducing warm query latency from ~16s (CLI)
  to ~10s.

  New CLI surface:
    qmd mcp --http [--port N]   # foreground, default port 3000
    qmd mcp --http --daemon     # background, PID in ~/.cache/qmd/mcp.pid
    qmd mcp stop                # stop daemon via PID file
    qmd status                  # now shows MCP daemon liveness

  Server implementation (mcp.ts):
  - Extract createMcpServer(store) shared by stdio and HTTP transports
  - HTTP transport uses WebStandardStreamableHTTPServerTransport with
    JSON responses (stateless, no SSE)
  - /health endpoint with uptime, /mcp for MCP protocol, 404 otherwise
  - Request logging to stderr with timestamps, tool names, query args

  Daemon lifecycle (qmd.ts):
  - PID file + log file management with stale PID detection
  - Absolute paths in Bun.spawn (process.execPath + import.meta.path)
    so daemon works regardless of cwd
  - mkdirSync for cache dir on fresh installs
  - Removes top-level SIGTERM/SIGINT handlers before starting HTTP
    server so async cleanup in mcp.ts actually runs

  Move hybridQuery() and vectorSearchQuery() into store.ts as standalone
  functions that take a Store as first argument. Both CLI and MCP now
  call the identical pipeline, eliminating the class of bugs where one
  copy drifts from the other.

  Shared pipeline (store.ts):
  - hybridQuery(): BM25 probe → expand → FTS+vec search → RRF →
    chunk → rerank (chunks only) → position-aware blending → dedup
  - vectorSearchQuery(): expand → vec search → dedup → sort
  - SearchHooks interface for optional progress callbacks
  - Constants: STRONG_SIGNAL_MIN_SCORE, STRONG_SIGNAL_MIN_GAP,
    RERANK_CANDIDATE_LIMIT (40), addLineNumbers()

  Bugs fixed by unification:
  - MCP now gets strong-signal short-circuit (was CLI-only)
  - Reranker candidate limit unified at 40 (MCP had 30)
  - File dedup added to hybrid query (MCP was missing it)
  - Collection filter pushed into searchVec DB query
  - Filter-then-slice ordering fixed (MCP was slice-then-filter)

* feat: type-routed query expansion — lex→FTS, vec/hyde→vector

  expandQuery() now returns typed ExpandedQuery[] instead of string[],
  preserving the lex/vec/hyde type info from the LLM's GBNF-structured
  output. hybridQuery() and vectorSearchQuery() route searches by type:
  lex queries go to FTS only, vec/hyde go to vector only.

  Previously, every expanded query ran through BOTH backends — keyword
  variants wasted embedding forward passes, semantic paraphrases wasted
  BM25 lookups. Type routing eliminates ~4 calls/query with zero quality
  loss (cross-backend noise actually hurt RRF fusion).

  Cache format changed from newline-separated text to JSON (preserves
  types). Old cache entries gracefully re-expand on first access.

  CLI expansion tree now shows query types:
    ├─ original query
    ├─ lex: keyword variant
    ├─ vec: semantic meaning
    └─ hyde: hypothetical document...

  Benchmark (5 queries, 1756-doc index, warm LLM, Apple Silicon):

    Metric              Old (untyped)  New (typed)  Delta
    Avg backend calls   10.0           6.0          -40%
    Total wall time     1278ms         549ms        -57%
    Avg saved/query     —              —            146ms

    "authentication setup"          12 → 7 calls   511 → 112ms
    "database migration strategy"   10 → 6 calls   182 → 106ms
    "how to handle errors in API"   10 → 6 calls   216 → 121ms
    "meeting notes from last week"  10 → 6 calls   228 → 110ms
    "performance optimization"       8 → 5 calls   141 → 100ms

  Savings come from skipped embed() calls (~30-80ms each). FTS is
  synchronous SQLite (~0ms), so lex→FTS routing is free while
  vec/hyde→vector-only avoids wasted embedding passes.
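
The routing rule can be sketched as below. The `ExpandedQuery` shape and the `planSearches` helper are assumptions for illustration. Sending the original query to both backends is also an assumption, though it is consistent with the call counts in the benchmark above (1 original plus 4 expansions: 10 calls untyped, 6 typed).

```typescript
type QueryType = "lex" | "vec" | "hyde";

// Hypothetical shape of one expanded query.
interface ExpandedQuery {
  type: QueryType;
  text: string;
}

// Decide which backend each query text is sent to.
function planSearches(original: string, expanded: ExpandedQuery[]) {
  const fts: string[] = [original];    // assumed: original goes to FTS
  const vector: string[] = [original]; // assumed: original goes to vector too
  for (const q of expanded) {
    if (q.type === "lex") fts.push(q.text); // keyword variants: FTS only
    else vector.push(q.text);               // vec and hyde: vector only
  }
  return { fts, vector };
}

const plan = planSearches("authentication setup", [
  { type: "lex", text: "auth login config" },
  { type: "lex", text: "oauth setup" },
  { type: "vec", text: "how to configure user sign-in" },
  { type: "hyde", text: "To set up authentication, first..." },
]);
console.log(plan.fts.length + plan.vector.length);
// prints 6
```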

* fix: MCP query snippets now use reranker's best chunk, not full body

  extractSnippet() was scanning the entire document body for keyword
  matches to build the snippet. But hybridQuery() already identified
  the most relevant chunk via cross-attention reranking — rescanning
  the full body is redundant and can land on a less relevant section
  if the query terms appear elsewhere in the document.

  CLI was already using bestChunk (set during the refactor). MCP was
  still using body — a pre-existing inconsistency, not a regression.

* feat: dynamic MCP instructions + tool annotations

  The MCP server now generates instructions at startup from actual index
  state and injects them into the initialize response. LLMs see collection
  names, document counts, content descriptions, and search strategy
  guidance in their system prompt — zero tool calls needed for orientation.

  Previously, the only guidance was generic static tool descriptions and
  a user-invocable "query" prompt that no LLM would discover on its own.
  An LLM connecting to QMD had no idea what collections existed, what they
  contained, or how to scope searches effectively.

* change default port to 8181

* fix: BM25 score normalization was inverted

  The normalization formula `1 / (1 + |bm25|)` is a decreasing function of
  match strength. FTS5 BM25 scores are negative where more negative = better
  match (e.g., -10 is strong, -0.5 is weak). The formula mapped:

    strong match (raw -10) → 1/(1+10) =  9%   ← should be highest
    weak match   (raw -0.5) → 1/(1+0.5) = 67%  ← should be lowest

  Three downstream effects:
  1. `--min-score 0.5` (or MCP minScore: 0.5) filtered OUT strong matches
     and kept only weak ones. The MCP instructions recommend this threshold.
  2. CLI `formatScore()` color bands never showed green for BM25 results
     (best matches scored ~9%, green threshold is 70%).
  3. The strong signal optimization in hybridQuery (skip ~2s LLM expansion
     when BM25 already has a clear winner) was dead code — strong matches
     scored ~0.09, never reaching the 0.85 threshold.

  Fix: `|x| / (1 + |x|)` — same (0,1) range, monotonic, no per-query
  normalization needed, but now correctly maps strong → high, weak → low.

  The normalization was born broken (Math.max(0, x) clamped all
  negative BM25 to 0 → every score = 1.0), then PR #76 changed to
  Math.abs which made scores vary but inverted the direction. Neither
  state was ever correct.
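
For reference, the two formulas side by side. The function names are illustrative and not the ones used in store.ts.

```typescript
// Old, inverted mapping: stronger (more negative) BM25 gives a LOWER score.
function invertedNormalize(raw: number): number {
  return 1 / (1 + Math.abs(raw));
}

// Fixed mapping: |x| / (1 + |x|), stronger match gives a higher score.
function normalizeBm25(raw: number): number {
  const x = Math.abs(raw);
  return x / (1 + x);
}

console.log(invertedNormalize(-10).toFixed(2)); // prints "0.09"
console.log(normalizeBm25(-10).toFixed(2));     // prints "0.91"
console.log(normalizeBm25(-0.5).toFixed(2));    // prints "0.33"
```

With the fixed mapping a raw score of -10 clears both the 0.5 min-score filter and the 0.85 strong-signal threshold, which the old mapping could never do.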

* fix: rerank cache key ignores chunk content

  The rerank cache key was (query, file, model) but the actual text sent
  to the reranker is a keyword-selected chunk that varies by query terms.
  Two different queries hitting the same file can select different chunks,
  but the second query gets a stale cached score from the first chunk.

  Example:
    Query "auth flow" → selects chunk about authentication → score 0.92
    Query "auth tokens" → same file, selects chunk about tokens
      → cache HIT on (query, file, model) → returns 0.92 from wrong chunk

  Fix: include full chunk text in cache key. getCacheKey() already
  SHA-256 hashes its inputs, so this adds no key bloat — just
  disambiguation. Old cache entries become natural misses (different key
  shape) and re-warm on next query.
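
A sketch of the key change, assuming a helper that SHA-256 hashes its parts. `cacheKeySketch` is a stand-in and not the real getCacheKey(). The separator byte and the sample values are assumptions.

```typescript
import { createHash } from "node:crypto";

// Hash all parts into one fixed-length key. A NUL separator keeps
// ("ab", "c") and ("a", "bc") from colliding.
function cacheKeySketch(...parts: string[]): string {
  const h = createHash("sha256");
  for (const p of parts) {
    h.update(p);
    h.update("\0");
  }
  return h.digest("hex");
}

const query = "auth flow";
const file = "docs/auth.md";
const model = "reranker-v1"; // hypothetical model name

// Old key: the chunk text is not part of the key, so any chunk of the file shares it.
const oldKey = cacheKeySketch(query, file, model);

// New key: the chunk text is included, so different chunks get different entries.
const keyA = cacheKeySketch(query, file, model, "Chunk about authentication...");
const keyB = cacheKeySketch(query, file, model, "Chunk about tokens...");

console.log(keyA !== keyB);                 // prints true
console.log(keyA.length === oldKey.length); // prints true (64 hex chars either way)
```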

* rename MCP tools for clarity, rewrite descriptions for LLM tool selection

  Rename MCP tools: vsearch → vector_search, query → deep_search.
  LLMs see these names — self-documenting names reduce reliance on
  descriptions for tool selection. CLI commands stay unchanged
  (qmd vsearch, qmd query) — different namespace, users type those.

  Rewrite all search tool descriptions to be action-oriented:
    - search: "Search by keyword. Finds documents containing exact
      words and phrases in the query."
    - vector_search: "Search by meaning. Finds relevant documents even
      when they use different words than the query — handles synonyms,
      paraphrases, and related concepts."
    - deep_search: "Deep search. Auto-expands the query into variations,
      searches each by keyword and meaning, and reranks for top hits
      across all results."

  Rewrite instructions ladder — each tool says what it does, no
  "start here" / "escalate as needed" strategy language.

  Delete the "query" prompt (registerPrompt) — it restated what
  descriptions + instructions already cover. No LLM proactively
  calls prompts/get to learn how to use tools.

* suppress HTTP server logs during tests
2026-02-10 16:37:33 -05:00
Tobi Lutke
17c201ea81
fix: correct QMD acronym to Query Markup Documents
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 13:22:54 -05:00
Tobi Lutke
66bb8ed963
Remove beads reference from CLAUDE.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 16:24:14 -05:00
Tobi Lutke
d383b5c226
Migrate to node-llama-cpp and add structured query expansion
- Replace Ollama HTTP API with node-llama-cpp for local GGUF models
- Add structured query expansion using JSON schema grammar:
  - Generates lexical query (for BM25), vector query, and HyDE
  - Tree-style CLI output showing query types
- Fix vector search: use cosine distance instead of L2
- Format queries with embeddinggemma nomic-style prompts
- Rename ollama_cache table to llm_cache
- Add disposeDefaultLlamaCpp() for clean process exit

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-20 18:03:41 -04:00
Tobi Lutke
5e60bd7085
Add docid, line-numbers, handelize, and fix displayPath format
Features:
- Add short document IDs (docid) - first 6 chars of hash - to all search outputs
- Add --line-numbers CLI option and lineNumbers param for MCP tools
- Add handelize() function for token-friendly filenames (lowercase, special chars to dash, preserves extension)
- Convert triple underscore `___` to folder separator in filenames
- Change displayPath format to include collection name (collection/path)
- Make line-numbers default for MCP search snippets
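
A rough sketch of the handelize rules listed above (lowercase, special characters to dash, extension preserved, `___` as folder separator). The exact character classes and edge-case handling in the real handelize() are not stated here, so everything beyond those four rules is an assumption.

```typescript
// Illustrative only; handles a single filename, not a full path.
function handelizeSketch(name: string): string {
  const dot = name.lastIndexOf(".");
  const ext = dot > 0 ? name.slice(dot) : "";        // keep extension as-is
  const stem = dot > 0 ? name.slice(0, dot) : name;
  return (
    stem
      .split("___")                                  // triple underscore = folder separator
      .map((seg) =>
        seg
          .toLowerCase()
          .replace(/[^a-z0-9]+/g, "-")               // assumed: non-alphanumerics become one dash
          .replace(/^-+|-+$/g, ""),                  // assumed: trim leading/trailing dashes
      )
      .join("/") + ext
  );
}

console.log(handelizeSketch("Meeting Notes___Q1 Plan!.md"));
// prints "meeting-notes/q1-plan.md"
```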

Changes:
- store.ts: Add getDocid(), findDocumentByDocid(), handelize() functions
- formatter.ts: Add docid to all formatters, addLineNumbers() helper
- qmd.ts: Add --line-numbers option, use handelize during indexing
- mcp.ts: Remove resource listing, lineNumbers default for snippets
- Update all tests to expect new displayPath format and handelize behavior
- Update CLAUDE.md with docid documentation

All 274 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 13:26:39 -04:00
Tobi Lutke
27fbf91d48
Fix context add to support collection root paths
Problem: Virtual paths like qmd://journals/ were rejected as invalid

Changes:
- Updated parseVirtualPath() regex to make path optional: /^qmd:\/\/([^\/]+)\/?(.*)$/
- Now supports: qmd://name, qmd://name/, qmd://name/path
- Empty path represents collection root context
- Improved help message with collection root example
- Better output message showing "(collection root)" for clarity
- Updated CLAUDE.md documentation
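
How the updated regex behaves on the three supported forms, as a small sketch. The regex is the one quoted above. The return shape and the name `parseVirtualPathSketch` are assumptions.

```typescript
interface VirtualPath {
  collection: string;
  path: string; // empty string = collection root
}

function parseVirtualPathSketch(input: string): VirtualPath | null {
  const m = /^qmd:\/\/([^\/]+)\/?(.*)$/.exec(input);
  if (!m) return null;
  return { collection: m[1] ?? "", path: m[2] ?? "" };
}

for (const p of ["qmd://journals", "qmd://journals/", "qmd://journals/2024"]) {
  console.log(JSON.stringify(parseVirtualPathSketch(p)));
}
// prints:
// {"collection":"journals","path":""}
// {"collection":"journals","path":""}
// {"collection":"journals","path":"2024"}
```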

Test cases verified:
✓ qmd context add qmd://journals/ "..." (with trailing slash)
✓ qmd context add qmd://journals "..." (without trailing slash)
✓ qmd context add qmd://journals/2024 "..." (with specific path)

Fixes issue where users couldn't add context to entire collections

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-13 09:41:31 -05:00
Tobi Lutke
c93d38c5fd
Add 'qmd context check' command to identify missing contexts
This commit implements a diagnostic command to help users find collections
and paths that don't have context strings defined.

Changes:
1. New 'qmd context check' command that:
   - Lists collections without any context configured
   - Identifies top-level directories in collections missing context
   - Provides actionable suggestions with example commands

2. Updated help text and command documentation in CLAUDE.md

3. Enhanced error messages to include the new 'check' subcommand

The command provides helpful output showing:
- Collections without context (with document counts and suggestions)
- Top-level paths within collections that lack context
- Suggestions for adding context using virtual path syntax

This addresses the issue where users had contexts but couldn't find them,
by providing a clear diagnostic tool to identify what's missing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 17:17:36 -05:00
Tobi Lutke
1459b04406
Add git status and --pull flag to qmd update command
This commit adds git integration to the qmd update command:

1. Git repository detection: Checks for .git directory in each collection
2. Git status display: Shows short status for git repositories during update
3. --pull flag: Added optional --pull flag to execute git pull before reindexing
4. Error handling: Gracefully handles git errors without failing the update

When a collection is a git repository:
- Displays "Git repository detected"
- If --pull is specified, runs git pull and shows output (dimmed)
- Shows git status --short output (dimmed)
- If status is clean, shows "Git status: clean"

Usage:
  qmd update           # Update all collections, show git status
  qmd update --pull    # Pull changes first, then update

All git operations are non-blocking - failures are shown as warnings
but don't prevent the update from continuing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 17:15:29 -05:00
Tobi Lutke
529e989d83
Refactor: Move TypeScript source files to src/ directory
Move all .ts files to src/ to clean up the project root:
- Created src/ directory and moved all TypeScript source and test files
- Updated qmd shell wrapper to point to src/qmd.ts
- Updated package.json scripts to use src/ paths
- Updated documentation (CLAUDE.md, README.md) to reflect new structure
- All imports remain relative within src/, no changes needed
- Tests pass with same results (192 pass, 75 fail - existing issues)

This improves project organization and makes the root directory cleaner.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 17:12:09 -05:00
Tobi Lutke
27f9e8b630
Implement qmd collection rename command
- Added collectionRename() function to rename collections
- Updates collection name in database (changes virtual path prefix)
- Added CLI handler for "qmd collection rename <old> <new>"
- Supports alias "mv" for rename command
- Includes comprehensive error handling:
  * Checks if old collection exists
  * Prevents renaming to existing collection name
  * Validates required arguments
- Fixed bug in collectionAdd where --name flag was ignored
  * indexFiles now accepts pwd and name parameters
  * Collection name is properly passed through to getOrCreateCollection
- Updated help text and CLAUDE.md documentation
- Added 4 new tests for rename functionality (all passing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 16:29:49 -05:00
Tobi Lutke
9e5005be81
Remove qmd add command and complete CLI review
- Remove qmd add command (replaced by qmd collection add)
- Remove --drop flag (use collection remove + add instead)
- Update all help text to reflect new command structure
- Update CLAUDE.md documentation
- Update all tests to use collection add (39/42 passing)
- Reorganize help to put collection commands first
- Remove add-context from "do not run automatically" list

The CLI now has consistent command structure:
- qmd collection {add,list,remove} for collection management
- qmd context {add,list,rm} for context management
- qmd ls for browsing collections
- qmd get/multi-get for document retrieval

Closes qmd-c0m

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 16:07:01 -05:00
Tobi Lutke
99aee71903
Update get and multi-get commands for virtual paths
- Update getDocument() to support qmd:// virtual paths and filesystem paths
- Update multiGet() to handle virtual paths in patterns and comma-separated lists
- Update matchFilesByGlob() in store.ts to return virtual paths
- Remove duplicate getContextForFile() function from qmd.ts
- Use collection-scoped getContextForPath() instead of legacy function
- All get and multi-get tests now passing

Closes qmd-vro

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 15:47:42 -05:00
Tobi Lutke
bab46dacb2
Refactor: extract store, LLM, and formatter modules with comprehensive tests
- Extract store.ts: database operations, search, document retrieval
  - createStore() factory pattern for clean DB lifecycle management
  - Unified DocumentResult type with optional body loading
  - Snippet extraction with diff-style headers (@@ -line,count @@)

- Extract llm.ts: LLM abstraction layer with Ollama implementation
  - Clean interface for embed, generate, rerank operations
  - High-level rerankerLogprobsCheck with logprob-based scoring
  - Query expansion support

- Extract formatter.ts: output formatting utilities
  - Support for CLI, JSON, CSV, MD, XML formats
  - MCP-specific CSV formatting

- Extract mcp.ts: MCP server using createStore() pattern
  - Single DB connection for server lifetime (fixes closed DB errors)
  - URL-decode resource paths for proper space/special char handling

- Add comprehensive test suites (215 tests total)
  - store.test.ts: 96 tests covering all store operations
  - llm.test.ts: 60 tests for LLM abstraction
  - mcp.test.ts: 59 tests for MCP endpoints and resources
  - All tests use mocked Ollama (errors on unmocked calls)

- Add bun run inspector script for MCP debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 16:33:32 -05:00
Tobi Lutke
ceb534a30b
added chunking 2025-12-08 09:21:39 -05:00
Tobi Lutke
e963555ff8
Add status command, fix collections, improve CLI output
- Rename 'collections' to 'status' with richer output:
  - Index size
  - Documents count and vector embedding status
  - Time since last update
  - Per-collection stats

- Fix `qmd add .` to use default glob pattern
- Fix duplicate collections with cleanup and INSERT OR IGNORE
- Improve update-all with colored progress output
- Fix 'qmd vector' → 'qmd embed' in help messages
- Implement weighted RRF (2x weight for original query)
- Simplify CLAUDE.md for project-specific instructions
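
Weighted RRF with a 2x weight for the original query can be sketched like this. The constant k = 60 is the conventional RRF default and is an assumption here, as are the function name and input shape.

```typescript
interface RankedList {
  ids: string[];  // document ids, best first
  weight: number; // 2 for the original query, 1 for expansions
}

// Weighted Reciprocal Rank Fusion: score(d) = sum over lists of weight / (k + rank).
function weightedRrf(lists: RankedList[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, i) => {
      const rank = i + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return scores;
}

const fused = weightedRrf([
  { ids: ["a.md", "b.md"], weight: 2 }, // original query
  { ids: ["b.md", "c.md"], weight: 1 }, // expanded query
]);
const ranked = [...fused.entries()].sort((x, y) => y[1] - x[1]).map(([id]) => id);
console.log(ranked.join(" "));
// prints "b.md a.md c.md"
```

Here b.md wins because it appears in both lists, while a.md outranks c.md because its only hit comes from the double-weighted original query.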

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 19:19:34 -05:00
Tobi Lutke
39193ea252
Initial commit: QMD - Quick Markdown Search
A CLI tool for searching markdown knowledge bases using hybrid retrieval:
- BM25 full-text search via SQLite FTS5
- Vector semantic search via sqlite-vec + Ollama embeddings
- LLM re-ranking with qwen3-reranker (logprobs-based scoring)
- Reciprocal Rank Fusion with weighted queries and position-aware blending

Features:
- `qmd add .` - Index markdown files in current directory
- `qmd embed` - Generate vector embeddings
- `qmd search` - BM25 full-text search
- `qmd vsearch` - Vector similarity search
- `qmd query` - Hybrid search with query expansion + reranking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 19:16:16 -05:00