docs(qmd-skill): structured-query-first; cite docid + lines; no sed

- Make structured `qmd query` with intent:/lex:/vec:/hyde: the default
  search mode, and emphasize that the caller authors the expansion
  rather than leaning on the built-in query-expansion model.
- Tell the caller to cite the #docid and exact line numbers now
  printed by get/multi-get, and to slice files with the :from:count
  suffix or --from/-l instead of piping through sed/head/tail.
- Document --full-path for handing the on-disk path to editor tools.
- Bump skill version to 2.2.0 and record the behavior changes under
  ## [Unreleased] in CHANGELOG.md.
- Update the package smoke test that pinned the old 'structured
  queries' wording to match the new, more specific intro phrasing.
Tobi Lutke 2026-05-28 10:56:13 -07:00
parent 41bc3a27d8
commit fa8f904a9d
3 changed files with 130 additions and 20 deletions

@ -2,6 +2,32 @@
## [Unreleased]
### Features
- `qmd get` now accepts a `:from:count` suffix on a path or docid (e.g.
`qmd get "#abc123:120:40"` reads 40 lines starting at line 120). Explicit
`--from`/`-l` flags still override the suffix. The MCP `get` tool accepts the
same suffix.
- `qmd get` and `qmd multi-get` are now **line-numbered by default** and print
the document's `#docid` and `qmd://` path in the output header. Disable line
numbers with `--no-line-numbers`. The MCP `get`/`multi_get` tools default
`lineNumbers` to `true` to match.
- `qmd multi-get` now includes the `#docid` in every output format
(`--md`, `--json`, `--csv`, `--xml`, `--files`, and the default CLI view),
consistent with `qmd search`.
- `qmd get` and `qmd multi-get` accept `--full-path`, which replaces the
`qmd://` path + `#docid` with the document's on-disk filesystem path (handy for
piping into `Read`/`Edit`/an editor). Falls back to the canonical `qmd://` +
docid header when the file no longer exists on disk.
### Docs
- qmd skill: emphasize reading line ranges with `get`'s built-in
`:from:count` suffix / `--from`/`-l` flags instead of piping through
`sed`/`head`/`tail`; cite the docid and line numbers now present in retrieval
output; and author structured `intent:`/`lex:`/`vec:`/`hyde:` queries yourself
rather than relying on built-in query expansion.
## [2.5.2] - 2026-05-22
### Fixes

@ -5,7 +5,7 @@ license: MIT
compatibility: Requires qmd CLI or MCP server. Install via `npm install -g @tobilu/qmd`.
metadata:
author: tobi
version: "2.1.0"
version: "2.2.0"
allowed-tools: Bash(qmd:*), mcp__qmd__*
---
@ -34,8 +34,13 @@ qmd search "merchant reality support interviews" -n 5
qmd multi-get "#abc123,#def432" --md
```
For harder searches, use `qmd query` structured queries with `intent:`, `lex:`,
`vec:`, and `hyde:` fields.
**Default to structured `qmd query` with `intent:`, `lex:`, `vec:`, and `hyde:`
fields that you write yourself.** You are a better query expander than the
built-in model: you know the user's actual goal, the domain vocabulary, and the
nearby-but-wrong concepts to avoid. Do not just paste the user's words into
`qmd query "..."` and hope the expansion model guesses right — supply the
`intent:` and craft the lexical and semantic terms deliberately (see
[Pick the right search mode](#pick-the-right-search-mode)).
When reporting what you retrieved, a compact note is enough; do not paste whole
files unless needed:
@ -56,28 +61,37 @@ qmd search "cockpit OKR Goodhart" -n 10
qmd search '"AI Before Headcount"' -c concepts -n 5
```
Use **hybrid semantic search** when the user describes an idea indirectly, uses
different wording than the source, or needs conceptual recall:
```bash
qmd query "decision quality depends on surfacing assumptions and context" -n 10
qmd query --json --explain "metrics as cockpit instruments but not OKRs"
```
Use **structured queries** for hard searches. They combine exact anchors with
semantic recall:
Use **`qmd query` with structured fields** when the user describes an idea
indirectly, uses different wording than the source, or needs conceptual recall.
**This is the default mode — write the fields yourself rather than leaning on
query expansion.** Combine exact anchors with semantic recall:
```bash
qmd query $'intent: Find the concept note about metrics as instruments without letting OKRs replace judgment.\nlex: cockpit instruments OKR Goodhart metrics judgment\nvec: data informed not metric driven product judgment\nhyde: A concept note says metrics are useful like cockpit instruments, but leaders should remain data-informed rather than metric-driven because OKRs and dashboards can Goodhart product judgment.'
```
Structured query fields:
Structured query fields (you author each one — do not delegate this to the
expansion model):
- `intent:` states what you are trying to find and what to avoid.
- `lex:` uses exact terms, aliases, titles, and rare words.
- `vec:` paraphrases the idea in natural language.
- `intent:` states what you are trying to find **and what to avoid**. Always
supply this. It steers ranking away from nearby-but-wrong concepts.
- `lex:` lists exact terms, aliases, titles, code symbols, and rare words you expect
in the source. This is your own keyword expansion.
- `vec:` paraphrases the idea in natural language, in source-like wording.
- `hyde:` describes the document or answer that would satisfy the request.
You do not need all four every time, but you should almost always write at least
`intent:` plus one of `lex:`/`vec:`. A bare `qmd query "the user's sentence"`
throws away the context only you have and relies on the built-in expander to
reconstruct it — prefer the structured form.
If you genuinely have nothing to expand (a single rare token, a verbatim phrase),
that is a job for `qmd search`, not bare `qmd query`:
```bash
qmd query --json --explain $'intent: ...\nlex: ...\nvec: ...' # inspect ranking
```
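As a sketch of authoring the fields yourself, the shell function below joins whichever fields you supply into the newline-separated form used in the structured example above. `build_query` is a hypothetical helper, not part of qmd; it only builds the string, and the final `qmd query` call is left as a comment.

```bash
# Hypothetical helper (not part of qmd): join the non-empty fields into one
# newline-separated structured query string.
build_query() {
  local intent="$1" lex="$2" vec="$3" hyde="$4"
  local lines=()
  [ -n "$intent" ] && lines+=("intent: $intent")
  [ -n "$lex" ]    && lines+=("lex: $lex")
  [ -n "$vec" ]    && lines+=("vec: $vec")
  [ -n "$hyde" ]   && lines+=("hyde: $hyde")
  local IFS=$'\n'          # "${lines[*]}" joins on the first character of IFS
  printf '%s' "${lines[*]}"
}

# Field text taken from the structured example above; hyde is left empty.
query="$(build_query \
  "Find the concept note about metrics as instruments without letting OKRs replace judgment." \
  "cockpit instruments OKR Goodhart metrics judgment" \
  "data informed not metric driven product judgment" \
  "")"
printf '%s\n' "$query"

# Pass the result as a single quoted argument:
#   qmd query "$query" -n 10
```

Keeping each field in its own argument avoids hand-escaping `\n` inside a `$'...'` string, and an empty field is simply dropped.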
If `qmd query` is slow or model/GPU setup fails, fall back to `qmd search` with
better lexical terms.
@ -87,14 +101,77 @@ Search results include docids like `#abc123` and `qmd://...` paths. Fetch them:
```bash
qmd get "#abc123"
qmd get qmd://concepts/ai-before-headcount.md --full
qmd get qmd://concepts/ai-before-headcount.md
qmd multi-get "#abc123,#def432" --md
qmd multi-get 'concepts/{ai-before-headcount.md,data-informed-not-metric-driven.md}' --md
qmd multi-get 'sources/podcast-2025-*.md' -l 80
```
Use `multi-get` when comparing several hits or gathering context across pages.
Use `--full` when the exact source matters.
### Output is line-numbered and carries the docid — cite both
`get` and `multi-get` are **line-numbered by default** and always print the
document's `#docid` and `qmd://` path. So `get` output looks like:
```text
qmd://concepts/note.md #abc123
---
1: # Metrics as instruments
2:
3: Treat dashboards like cockpit instruments...
```
Cite the docid and exact line numbers in your answer, and use the numbers to ask
for the next slice. Pass `--no-line-numbers` only when you need raw content to
copy verbatim (e.g. reproducing a code block).
When you need to open or edit the underlying file (e.g. hand a path to `Read`,
`Edit`, or an editor), add `--full-path`. It replaces the `qmd://` URL + docid
header with the document's on-disk path, falling back to the canonical header if
the file no longer exists on disk:
```text
$ qmd get "#abc123" --full-path
/Users/you/notes/concepts/note.md
---
1: # Metrics as instruments
```
### Read line ranges with the `:from:count` suffix — never pipe through `sed`/`head`/`tail`
`qmd get` slices files itself. Use the suffix or flags; do **not** shell out to
`sed -n`, `head`, `tail`, or `awk` to pull a line range. Piping defeats docid
resolution, virtual-path lookups, line numbering, and the header, and it is
slower and more error-prone.
The most compact form is a `:from:count` suffix right on the path or docid —
prefer it:
```bash
qmd get "#abc123:120:40" # 40 lines starting at line 120
qmd get qmd://concepts/note.md:200:60 # lines 200-259
qmd get "#abc123:120" # from line 120 to end of file
qmd get "#abc123" --from 120 -l 40 # equivalent, using flags
```
Suffix and flags:
- `<path>:<from>:<count>` — start at line `<from>`, read `<count>` lines. **Best
for reading around a search hit.**
- `<path>:<from>` — start at `<from>`, read to end of file.
- `--from <line>` / `-l <lines>` — flag equivalents. Explicit flags override the
suffix, so `... :5:2 -l 1` reads 1 line.
- `--no-line-numbers` — drop the `N:` prefixes (line numbers are on by default).
Wrong: `qmd get "#abc123" | sed -n '120,160p'`
Right: `qmd get "#abc123:120:40"`
Search results include a `:line` anchor on each hit — feed it straight into
`qmd get path:line:<n>` to read a window around the match (line numbers in the
output will start at `line`).
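If you want context on both sides of a hit rather than a window that starts at the hit, the arithmetic can be sketched as below. `window_spec` and its `before`/`after` defaults are hypothetical, not qmd features; the sketch assumes line numbers are 1-based, as in the sample `get` output above.

```bash
# Hypothetical helper (not part of qmd): build a "target:from:count" spec that
# covers `before` lines ahead of a hit and `after` lines following it.
window_spec() {
  local target="$1" hit="$2" before="${3:-5}" after="${4:-20}"
  local from=$(( hit - before ))
  (( from < 1 )) && from=1                     # clamp to the first line
  local count=$(( hit + after - from + 1 ))    # inclusive span from..hit+after
  printf '%s:%d:%d' "$target" "$from" "$count"
}

spec="$(window_spec "#abc123" 120 10 30)"
printf '%s\n' "$spec"

# Fetch the slice in one call, with no sed/head/tail:
#   qmd get "$spec"
```

Quote the spec when the target is a docid, since an unquoted `#` can start a shell comment.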
## Discover what is indexed
@ -189,6 +266,12 @@ server configuration.
## Pitfalls
- **Do not stop at snippets.** Fetch documents before making claims.
- **Do not slice files with `sed`/`head`/`tail`.** Use the `path:from:count`
suffix (e.g. `qmd get "#abc123:120:40"`) or `--from`/`-l`. Output is already
line-numbered; piping breaks docid resolution, the header, and virtual paths.
- **Do not lean on query expansion.** Write `intent:`/`lex:`/`vec:`/`hyde:`
yourself. A bare `qmd query "user sentence"` discards the context only you
have. You expand the query; the model just ranks.
- **Do not overuse semantic search.** If you know exact titles or terms, BM25 is
faster and often better.
- **Do not mutate indexes casually.** `qmd collection add`, `qmd update`, and

@ -60,7 +60,8 @@ describe("package grammar distribution", () => {
expect(firstSixtyLines).toContain('qmd multi-get "#abc123,#def432"');
expect(firstSixtyLines).toContain("Retrieved:");
expect(firstSixtyLines).toContain("qmd query");
expect(firstSixtyLines).toContain("structured queries");
// The skill must teach structured, self-authored queries near the top.
expect(firstSixtyLines).toContain("Default to structured");
const scriptPath = join(root.pathname, "scripts", "check-package-grammars.mjs");
const script = readFileSync(scriptPath, "utf8");