Commit Graph

18 Commits

Author SHA1 Message Date
Tobi Lutke
4782badfd3
Fix migration SQL for proper basename extraction
- Replace complex SQL with application logic to extract basenames
- Use TypeScript path.split() instead of SQL string manipulation
- Consistent with getOrCreateCollection() implementation
- Avoids SQLite function compatibility issues

Closes qmd-bx1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 15:50:36 -05:00
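The fix above moves basename extraction out of SQL and into TypeScript. Below is a minimal sketch of that idea. It assumes stored paths use "/" separators. The helper names and the `Row` shape are hypothetical and are not taken from the qmd source. The migration computes each value in application code and then passes it back to the database as an ordinary parameter.

```typescript
// Hypothetical helper: derive the last path segment in application code,
// avoiding SQLite string functions. Assumes POSIX-style "/" separators.
function basenameFromPath(path: string): string {
  // Drop trailing slashes so "notes/" yields "notes" rather than "".
  const trimmed = path.replace(/\/+$/, "");
  const parts = trimmed.split("/");
  return parts[parts.length - 1] ?? "";
}

// Hypothetical row shape for a migration pass.
interface Row {
  id: number;
  path: string;
}

// Compute the new value per row in TypeScript; the caller would then run a
// parameterized UPDATE for each result instead of manipulating strings in SQL.
function basenameUpdates(rows: Row[]): Array<{ id: number; name: string }> {
  return rows.map((row) => ({ id: row.id, name: basenameFromPath(row.path) }));
}
```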
Tobi Lutke
99aee71903
Update get and multi-get commands for virtual paths
- Update getDocument() to support qmd:// virtual paths and filesystem paths
- Update multiGet() to handle virtual paths in patterns and comma-separated lists
- Update matchFilesByGlob() in store.ts to return virtual paths
- Remove duplicate getContextForFile() function from qmd.ts
- Use collection-scoped getContextForPath() instead of legacy function
- All get and multi-get tests now passing

Closes qmd-vro

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 15:47:42 -05:00
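In this commit, get and multi-get accept both `qmd://` virtual paths and filesystem paths, and multi-get also takes comma-separated lists. The following is a rough sketch of how that input could be classified. It assumes the layout `qmd://<collection>/<path within collection>`, which is not spelled out in the commit. `ParsedPath`, `parsePath` and `parsePathList` are illustrative names only.

```typescript
// Hypothetical result: a virtual path (collection + relative path)
// or a plain filesystem path passed through unchanged.
type ParsedPath =
  | { kind: "virtual"; collection: string; relPath: string }
  | { kind: "filesystem"; path: string };

const VIRTUAL_PREFIX = "qmd://";

function parsePath(input: string): ParsedPath {
  if (!input.startsWith(VIRTUAL_PREFIX)) {
    return { kind: "filesystem", path: input };
  }
  // Assumed layout: qmd://<collection>/<path within collection>
  const rest = input.slice(VIRTUAL_PREFIX.length);
  const slash = rest.indexOf("/");
  if (slash === -1) {
    return { kind: "virtual", collection: rest, relPath: "" };
  }
  return {
    kind: "virtual",
    collection: rest.slice(0, slash),
    relPath: rest.slice(slash + 1),
  };
}

// Split a comma-separated list, trim entries, drop empties, parse each one.
function parsePathList(list: string): ParsedPath[] {
  return list
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map(parsePath);
}
```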
Tobi Lutke
bf65655f84
bd sync: 2025-12-12 15:31:28
2025-12-12 15:31:28 -05:00
Tobi Lutke
e67fb83a17
changes
2025-12-12 15:02:00 -05:00
Tobi Lutke
3b22f88c9f
Make MCP server spec-compliant (2025-06-18)
- Remove mimeType from TextContent (not in spec, only valid on EmbeddedResource)
- Add isError: true to all error responses for proper error detection
- Replace CSV output with structuredContent for machine-readable results
- Add name and title fields to all embedded resources
- Fix URI encoding: preserve slashes, encode special chars (spaces → %20)
- Change template to {+path} for proper nested path support
- Rename tools: qmd_search → search, qmd_get → get, etc.
- Update tests for new response format and spec compliance
2025-12-10 09:13:17 -05:00
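This commit encodes URIs so that slashes are preserved while special characters such as spaces become `%20`. It also switches the template to `{+path}`, which is RFC 6570 reserved expansion and lets "/" pass through unencoded. One straightforward way to get this behavior is to percent-encode each segment on its own. The sketch below does that. The function names are hypothetical.

```typescript
// Encode each path segment separately so "/" separators survive while
// spaces and other reserved characters inside a segment are percent-encoded.
function encodePathForUri(path: string): string {
  return path
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
}

// Inverse operation, e.g. when a client reads a resource URI back.
function decodePathFromUri(encoded: string): string {
  return encoded
    .split("/")
    .map((segment) => decodeURIComponent(segment))
    .join("/");
}
```

With this approach, a nested path such as `pages/My Notes/a b.md` keeps its structure after encoding.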
Tobi Lutke
bab46dacb2
Refactor: extract store, LLM, and formatter modules with comprehensive tests
- Extract store.ts: database operations, search, document retrieval
  - createStore() factory pattern for clean DB lifecycle management
  - Unified DocumentResult type with optional body loading
  - Snippet extraction with diff-style headers (@@ -line,count @@)

- Extract llm.ts: LLM abstraction layer with Ollama implementation
  - Clean interface for embed, generate, rerank operations
  - High-level rerankerLogprobsCheck with logprob-based scoring
  - Query expansion support

- Extract formatter.ts: output formatting utilities
  - Support for CLI, JSON, CSV, MD, XML formats
  - MCP-specific CSV formatting

- Extract mcp.ts: MCP server using createStore() pattern
  - Single DB connection for server lifetime (fixes closed DB errors)
  - URL-decode resource paths for proper space/special char handling

- Add comprehensive test suites (215 tests total)
  - store.test.ts: 96 tests covering all store operations
  - llm.test.ts: 60 tests for LLM abstraction
  - mcp.test.ts: 59 tests for MCP endpoints and resources
  - All tests use mocked Ollama (errors on unmocked calls)

- Add bun run inspector script for MCP debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-09 16:33:32 -05:00
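The refactor mentions snippet extraction with diff-style headers (`@@ -line,count @@`). The sketch below shows one possible version. It takes a window of lines around the first case-insensitive match and labels it with a 1-based start line and a line count, as unified diffs do. The matching rule and the `context` parameter are assumptions, not the behavior of `store.ts`.

```typescript
// Hypothetical sketch: return a window of lines around the first line that
// contains `term`, prefixed with a unified-diff-style hunk header.
function extractSnippet(body: string, term: string, context = 2): string | null {
  const lines = body.split("\n");
  const needle = term.toLowerCase();
  const hit = lines.findIndex((line) => line.toLowerCase().includes(needle));
  if (hit === -1) return null;

  const start = Math.max(0, hit - context);
  const end = Math.min(lines.length, hit + context + 1); // exclusive
  const count = end - start;

  // Header line numbers are 1-based, as in unified diffs.
  const header = `@@ -${start + 1},${count} @@`;
  return [header, ...lines.slice(start, end)].join("\n");
}
```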
Tobi Lutke
542509a098
Add Nix flake for easy installation
- Package QMD as a Nix derivation with bun and sqlite dependencies
- Support `nix profile install`, `nix run`, and `nix develop`
- Set library paths for sqlite extension support
2025-12-09 09:47:50 -05:00
Tobi Lutke
25ac53848f
Add MCP server for AI agent integration
- Add `qmd mcp` command to start stdio-based MCP server
- Expose tools: qmd_search, qmd_vsearch, qmd_query, qmd_get, qmd_status
- Add index health warnings for unembedded docs and stale indexes
- Return CSV format with text/csv mime type for search results
- Add MCP documentation and configuration examples to README
- Add @modelcontextprotocol/sdk and zod dependencies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-08 14:59:56 -05:00
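At this point the MCP server returned search results as CSV with a `text/csv` MIME type. A later commit in this log replaces that output with `structuredContent`. The sketch below shows RFC 4180-style field quoting for such output: a field that contains a comma, quote or newline is wrapped in quotes, and any embedded quotes are doubled. The column names and the `SearchHit` shape are hypothetical.

```typescript
// Hypothetical result row; field names are illustrative only.
interface SearchHit {
  file: string;
  title: string;
  score: number;
  snippet: string;
}

// Quote a field only when needed, doubling embedded quotes (RFC 4180 style).
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(hits: SearchHit[]): string {
  const header = "file,title,score,snippet";
  const rows = hits.map((h) =>
    [h.file, h.title, h.score.toFixed(2), h.snippet].map(csvField).join(","),
  );
  return [header, ...rows].join("\n");
}
```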
Tobi Lutke
877917487d
Update README with quick start guide and agentic workflow examples
- Add new tagline: on-device search engine for everything you need to remember
- Add Quick Start section with walkthrough of indexing multiple directories
- Add "Using with AI Agents" section showing --json and --files workflows
- Update output format example to reflect new CLI format with Title/Context/Score
- Document --all flag for returning all matches
2025-12-08 12:54:57 -05:00
Tobi Lutke
2a1307c38a
Add --all flag and improve display paths and get output
- Add --all flag to search/vsearch for returning all matches
  - Works with --min-score to filter by relevance threshold
  - Useful for bulk exports: qmd search "term" --all --files
- Display paths now always include parent folder for better context
  - e.g., pages/file.md instead of just file.md
- qmd get now outputs folder context header when available
  - Format: "Folder Context: <description>\n---\n<content>"
2025-12-08 12:52:04 -05:00
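Two behaviors from this commit lend themselves to a short sketch: display paths that always keep the parent folder, and the `get` output header, whose format is given in the commit as "Folder Context: <description>\n---\n<content>". Both helpers below are hypothetical illustrations. `displayPath` covers only the parent-folder rule and does not attempt the uniqueness handling from the next commit down.

```typescript
// Keep the parent folder plus the filename, e.g. "pages/file.md".
function displayPath(fullPath: string): string {
  const parts = fullPath.split("/").filter((p) => p.length > 0);
  return parts.slice(-2).join("/");
}

// Prepend the folder context header only when a description exists.
function formatGetOutput(content: string, folderContext?: string): string {
  return folderContext ? `Folder Context: ${folderContext}\n---\n${content}` : content;
}
```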
Tobi Lutke
9d09d5e518
Add display paths, titles to search output and improve CLI format
- Add display_path column with unique index for collection-relative paths
  - Computes shortest unique path starting from filename
  - Auto-migrates existing documents via update-all
- Add title field to all search result types (FTS, vector, hybrid)
- Improve CLI output format:
  - Filepath on first line
  - Title and Context on labeled lines
  - Score on separate line
  - Remove | prefix from snippets for better word wrap
  - Double newline between results
- Add line range support to get command:
  - qmd get file.md:100 (start at line 100)
  - qmd get file.md -l 20 --from 50
- Include title in JSON, CSV, XML output formats
- Fix update-all crash when same file exists in multiple collections
2025-12-08 10:21:33 -05:00
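Here `display_path` is described as the shortest unique path that starts from the filename. A simple way to compute it is to grow each path's suffix leftward, one segment at a time, until no other path in the collection ends with the same suffix. The quadratic sketch below does this. It is illustrative only and is not the migration qmd actually runs.

```typescript
// For each path, extend the suffix from the filename leftward until no other
// path shares it. Identical duplicate paths fall back to their full path.
function shortestUniquePaths(paths: string[]): Map<string, string> {
  const split = paths.map((p) => p.split("/").filter((s) => s.length > 0));
  const result = new Map<string, string>();

  split.forEach((segs, i) => {
    let n = 1;
    while (n < segs.length) {
      const suffix = segs.slice(-n).join("/");
      const clash = split.some(
        (other, j) => j !== i && other.slice(-n).join("/") === suffix,
      );
      if (!clash) break;
      n++;
    }
    result.set(paths[i], segs.slice(-n).join("/"));
  });

  return result;
}
```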
Tobi Lutke
342379610a
Add multi-chunk scoring bonus to vector search
- Documents matching multiple chunks get +0.02 per additional chunk
- Bonus capped at +0.1 (5+ chunks)
- Still uses max chunk score as base
- Rewards documents with broader relevance across content

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-08 09:41:02 -05:00
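The scoring rule above can be written as max(chunk scores) + min(0.02 × (matching chunks − 1), 0.1). Taking "+0.02 per additional chunk" literally, the 0.1 cap is reached once five chunks beyond the first match. The sketch assumes its input holds the scores of the chunks that matched for one document.

```typescript
// Document score = best chunk score + 0.02 per additional matching chunk,
// with the bonus capped at 0.1.
function docScore(chunkScores: number[]): number {
  if (chunkScores.length === 0) return 0;
  const base = Math.max(...chunkScores);
  const bonus = Math.min(0.02 * (chunkScores.length - 1), 0.1);
  return base + bonus;
}
```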
Tobi Lutke
42ab3f6c10
Update README to reflect current implementation
- Fix architecture diagram: show BM25+Vector for all query variations
- Add position-aware blending percentages to diagram
- Update CLI commands: add → index, add-context, cleanup, status
- Document chunked embeddings (~6KB pieces with hash/seq/pos)
- Update schema section with new tables (path_contexts, ollama_cache)
- Rewrite How It Works flows with accurate pipeline details
- Fix output format examples to show ~/... paths
- Add --files and --json output options

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2025-12-08 09:31:20 -05:00
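This README update documents chunked embeddings as roughly 6KB pieces keyed by hash, sequence number and position. The sketch below shows that record layout. It uses fixed-size character windows, which stand in for bytes. The real chunker may measure in bytes or prefer paragraph boundaries, so treat both the 6 × 1024 size and the splitting rule as assumptions.

```typescript
// Hypothetical chunk record mirroring the documented hash/seq/pos fields.
interface Chunk {
  hash: string; // content hash of the parent document
  seq: number; // 0-based chunk index within the document
  pos: number; // character offset where the chunk starts
  text: string;
}

// Split a document into fixed-size windows (characters used as a proxy for bytes).
function chunkDocument(hash: string, body: string, size = 6 * 1024): Chunk[] {
  const chunks: Chunk[] = [];
  for (let pos = 0, seq = 0; pos < body.length; pos += size, seq++) {
    chunks.push({ hash, seq, pos, text: body.slice(pos, pos + size) });
  }
  return chunks;
}
```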
Tobi Lutke
ceb534a30b
added chunking
2025-12-08 09:21:39 -05:00
Tobi Lutke
46010e6342
Improve embed: truncate large docs, better error messages
- Truncate documents > 64KB with warning showing filenames
- Show document title in error messages instead of hash
- Format total time as "15m 4s" instead of "904.2s"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-08 07:55:24 -05:00
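Two of the changes in this commit are small enough to sketch: truncating documents above 64KB, and formatting total time as "15m 4s" instead of "904.2s". The truncation helper below assumes 64KB means 64 × 1024 bytes of UTF-8. It cuts on code-point boundaries so that no multi-byte character is split. Both function names are hypothetical.

```typescript
// Truncate to at most maxBytes of UTF-8 without splitting a character.
function truncateUtf8(text: string, maxBytes = 64 * 1024): string {
  const encoder = new TextEncoder();
  if (encoder.encode(text).length <= maxBytes) return text;
  let used = 0;
  let out = "";
  for (const ch of text) {
    // for...of on a string iterates by code point
    const n = encoder.encode(ch).length;
    if (used + n > maxBytes) break;
    used += n;
    out += ch;
  }
  return out;
}

// Format elapsed seconds as "15m 4s" (or "42s" under a minute).
function formatDuration(seconds: number): string {
  const total = Math.floor(seconds);
  const m = Math.floor(total / 60);
  const s = total % 60;
  return m > 0 ? `${m}m ${s}s` : `${s}s`;
}
```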
Tobi Lutke
6c7e2911a2
Improve embed progress bar with byte-based ETA
- Visual progress bar with filled/empty blocks
- Calculate ETA based on bytes processed (larger files = longer time)
- Show throughput in bytes/sec
- Skip empty documents
- Fix UNIQUE constraint with INSERT OR REPLACE

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 19:24:09 -05:00
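This commit bases the ETA on bytes processed rather than document count, so larger files count proportionally more. The sketch below shows one way to do that. It derives throughput as bytes done divided by elapsed time, and the ETA as remaining bytes divided by throughput. The bar width, block characters and output layout are assumptions.

```typescript
// Render a bar of filled/empty blocks plus percentage, throughput and ETA,
// all derived from bytes processed so far.
function renderProgress(
  doneBytes: number,
  totalBytes: number,
  elapsedSec: number,
  width = 30,
): string {
  const frac = totalBytes > 0 ? Math.min(doneBytes / totalBytes, 1) : 1;
  const filled = Math.round(frac * width);
  const bar = "█".repeat(filled) + "░".repeat(width - filled);

  const rate = elapsedSec > 0 ? doneBytes / elapsedSec : 0; // bytes per second
  const remaining = Math.max(0, totalBytes - doneBytes);
  const etaSec = rate > 0 ? remaining / rate : Infinity;
  const eta = Number.isFinite(etaSec) ? `${Math.ceil(etaSec)}s` : "?";

  return `${bar} ${(frac * 100).toFixed(1)}% ${Math.round(rate)} B/s ETA ${eta}`;
}
```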
Tobi Lutke
e963555ff8
Add status command, fix collections, improve CLI output
- Rename 'collections' to 'status' with richer output:
  - Index size
  - Documents count and vector embedding status
  - Time since last update
  - Per-collection stats

- Fix `qmd add .` to use default glob pattern
- Fix duplicate collections with cleanup and INSERT OR IGNORE
- Improve update-all with colored progress output
- Fix 'qmd vector' → 'qmd embed' in help messages
- Implement weighted RRF (2x weight for original query)
- Simplify CLAUDE.md for project-specific instructions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 19:19:34 -05:00
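This commit introduces weighted Reciprocal Rank Fusion, with 2x weight for the original query. In the usual weighted form, each ranked list adds weight / (k + rank) to every document it contains, where rank is 1-based. The sketch below uses k = 60, the constant from the original RRF paper. That value is an assumption here, since the commit does not say which k qmd uses.

```typescript
// One ranked result list and the weight of the query that produced it.
interface RankedList {
  weight: number; // e.g. 2 for the original query, 1 for expansions
  docs: string[]; // document ids, best first
}

// Weighted RRF: sum weight / (k + rank) across lists, then sort descending.
function weightedRrf(lists: RankedList[], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const { weight, docs } of lists) {
    docs.forEach((doc, i) => {
      scores.set(doc, (scores.get(doc) ?? 0) + weight / (k + i + 1));
    });
  }
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}
```

For example, passing the original query's results with `weight: 2` and each expanded query's results with `weight: 1` gives the 2x bias described above.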
Tobi Lutke
39193ea252
Initial commit: QMD - Quick Markdown Search
A CLI tool for searching markdown knowledge bases using hybrid retrieval:
- BM25 full-text search via SQLite FTS5
- Vector semantic search via sqlite-vec + Ollama embeddings
- LLM re-ranking with qwen3-reranker (logprobs-based scoring)
- Reciprocal Rank Fusion with weighted queries and position-aware blending

Features:
- `qmd add .` - Index markdown files in current directory
- `qmd embed` - Generate vector embeddings
- `qmd search` - BM25 full-text search
- `qmd vsearch` - Vector similarity search
- `qmd query` - Hybrid search with query expansion + reranking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 19:16:16 -05:00