fix(core): reset steps for promoted prompts (#33452)

2026-06-23 01:01:14 +02:00 · 2026-06-23 01:01:14 +02:00 · dc468bdcfd
commit dc468bdcfd
parent f48f24ec4e
10 changed files with 144 additions and 122 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@ -152,7 +152,7 @@ const table = sqliteTable("session", {
 - Keep `SessionExecution` process-global and Session-ID based. Its local implementation owns the process-local Session coordinator and discovers placement through `SessionStore` plus `LocationServiceMap.get(session.location)` only when a drain starts; no layer should take a Session ID. V2 interruption targets the active process-local ownership chain for that Session; idle or missing interruption is a no-op.
 - Keep `SessionRunner`, model resolution, tool registry, permissions, and filesystem Location-scoped. Omitted `Location.workspaceID` means implicit-local placement; explicit workspace identity remains reserved for future placement semantics.
 - Preserve one explicit `llm.stream(request)` call per provider turn and reload projected history before durable continuation. Do not bridge through legacy `SessionPrompt.loop(...)` or delegate orchestration to an in-memory tool loop.
- Keep local Session drains process-local until clustering is implemented. `SessionRunCoordinator` joins explicit same-Session resumes, coalesces prompt wakeups, and allows different Sessions to run concurrently. Advisory wakes drain eligible durable inbox rows only; post-crash activity recovery requires a separate explicit design before it may retry provider work.
+- Keep local Session drains process-local until clustering is implemented. `SessionRunCoordinator` joins explicit same-Session resumes, coalesces prompt wakeups, and allows different Sessions to run concurrently. Advisory wakes drain eligible durable inbox rows only; post-crash continuation recovery requires a separate explicit design before it may retry provider work. A drain has no durable identity or transcript boundary.
- Keep delivery vocabulary explicit. Prompts steer by default and coalesce into the active activity at the next safe provider-turn boundary. Explicit `queue` inputs open FIFO future activities one at a time after the active activity settles.
+- Keep delivery vocabulary explicit. Prompts steer by default and promote at the next safe provider-turn boundary while the current drain requires continuation. An explicit `queue` input remains pending until the Session would otherwise become idle; promote one queued input at that boundary, then reevaluate continuation before promoting another. Promoting any new user input resets the selected agent's provider-turn allowance; a batch of steers resets it once.
 - Keep EventV2 replay owner claims separate from clustered Session execution ownership.
 - Keep the System Context algebra, registry, and built-ins in `src/system-context`; keep Context Source producers with their observed domains, and keep Session History selection plus Context Epoch persistence Session-owned.
--- a/CONTEXT.md
+++ b/CONTEXT.md
@ -39,6 +39,18 @@ An expected temporary inability to observe a **Context Source** value; the runti
 **Safe Provider-Turn Boundary**:
 The point immediately before a provider call, after durable input promotion and any required tool settlement, where context changes may be admitted chronologically.
 **Admitted Prompt**:
 A durable user input accepted into the Session inbox but not yet included in **Session History**.
 **Prompt Promotion**:
 The durable transition that removes an **Admitted Prompt** from pending input and appends its user message to **Session History**.
 **Provider Turn**:
 One request to a model provider and the response projected from that request.
 **Session Drain**:
 One process-local execution span that promotes eligible input and runs required **Provider Turns** until no immediate continuation remains. A Session Drain has no durable identity or transcript boundary.
 **Model Tool Output**:
 The bounded projection of a Core-executed tool result persisted in Session history and replayed to the model. A tool may shape this projection semantically, but the Tool Registry enforces the final size limit.
@ -67,6 +79,11 @@ The host-supplied environment overlay applied by the server when creating a PTY,
 - Changes from multiple **Context Sources** admitted at one safe boundary combine into one **Mid-Conversation System Message**.
 - Context changes are sampled and admitted lazily at a **Safe Provider-Turn Boundary**, never pushed asynchronously when their source changes.
 - At a **Safe Provider-Turn Boundary**, newly promoted user input or settled tool results precede any combined **Mid-Conversation System Message**.
 - An **Admitted Prompt** is replayable pending input, not yet model-visible **Session History**.
 - **Prompt Promotion** atomically consumes the pending inbox entry and appends its model-visible user message.
 - Steering prompts promote at the next **Safe Provider-Turn Boundary** while the current **Session Drain** still requires continuation. Promoting any newly admitted user input resets the selected agent's provider-turn allowance; multiple prompts promoted at one boundary reset it once.
 - A queued prompt does not promote while the current **Session Drain** requires continuation. The runner promotes one queued prompt when the Session would otherwise become idle, then reevaluates continuation before promoting another.
 - A **Session Drain** is process-local coordination rather than a durable domain entity. Durable recovery must reason from prompts, projected history, provider attempts, and tool state rather than inventing an enclosing execution identity.
 - The first provider turn renders the latest complete **Baseline System Context** and initializes its **Context Snapshot** without emitting a redundant **Mid-Conversation System Message**; unavailable initial context blocks the turn instead of persisting an incomplete baseline.
 - Initial **System Context** preparation precedes the first durable input promotion so an unavailable baseline leaves that input pending and retryable; ordinary reconciliation remains after promotion.
 - Compaction starts a new **Context Epoch** with a freshly rendered **Baseline System Context** and **Context Snapshot**; prior **Mid-Conversation System Messages** remain durable audit history but leave projected model history.
--- a/packages/core/src/session/runner/llm.ts
+++ b/packages/core/src/session/runner/llm.ts
@ -79,7 +79,7 @@ import { MAX_STEPS_PROMPT } from "./max-steps"
 *   - [ ] Update title, summaries, compaction state, and cleanup in bounded background work.
 *
 * Use `llm.stream(request)` for each provider turn. Keep tool execution and continuation here.
- * Durable activity recovery remains a separate future slice with an explicit retry policy.
+ * Durable continuation recovery remains a separate future slice with an explicit retry policy.
 *
 * The current slice loads V2 history, translates it, resolves a model through a core service, and persists one
 * provider turn. Registry definitions are advertised, local tool calls are settled durably, and an
@ -142,9 +142,9 @@ export const layer = Layer.effect(
    type TurnTransition =
      // Automatic compaction completed; rebuild the request from compacted history.
-      | { readonly _tag: "ContinueAfterCompaction" }
+      | { readonly _tag: "ContinueAfterCompaction"; readonly step: number }
      // Overflow compaction completed; rebuild once through the path without overflow recovery.
-      | { readonly _tag: "ContinueAfterOverflowCompaction" }
+      | { readonly _tag: "ContinueAfterOverflowCompaction"; readonly step: number }
    class TurnTransitionError extends Error {
      constructor(readonly transition: TurnTransition) {
@ -152,10 +152,9 @@ export const layer = Layer.effect(
      }
    }
-    const continueAfterCompaction = new TurnTransitionError({ _tag: "ContinueAfterCompaction" })
+    const continueAfterCompaction = (step: number) => new TurnTransitionError({ _tag: "ContinueAfterCompaction", step })
-    const continueAfterOverflowCompaction = new TurnTransitionError({
+    const continueAfterOverflowCompaction = (step: number) =>
-      _tag: "ContinueAfterOverflowCompaction",
+      new TurnTransitionError({ _tag: "ContinueAfterOverflowCompaction", step })
    })
    const loadSystemContext = (agent: AgentV2.Selection) =>
      Effect.all([systemContext.load(), skillGuidance.load(agent), referenceGuidance.load()], {
@ -175,20 +174,23 @@ export const layer = Layer.effect(
      const initialized = yield* SessionContextEpoch.initialize(db, loadSystemContext(agent), session.id)
      const toolFibers = yield* FiberSet.make<void, ToolOutputStore.Error>()
      let needsContinuation = false
      let currentStep = step
      if (promotion) {
        const cutoff = yield* EventV2.latestSequence(db, session.id)
-        if (promotion === "steer") yield* SessionInput.promoteSteers(db, events, session.id, cutoff)
+        let promoted = 0
        if (promotion === "steer") promoted = yield* SessionInput.promoteSteers(db, events, session.id, cutoff)
        if (promotion === "queue") {
-          yield* SessionInput.promoteNextQueued(db, events, session.id)
+          promoted += Number(yield* SessionInput.promoteNextQueued(db, events, session.id))
-          yield* SessionInput.promoteSteers(db, events, session.id, cutoff)
+          promoted += yield* SessionInput.promoteSteers(db, events, session.id, cutoff)
        }
        if (promoted > 0) currentStep = 1
      }
      const system =
        initialized ?? (yield* SessionContextEpoch.prepare(db, events, loadSystemContext(agent), session.id))
      const model = yield* models.resolve(session)
      const entries = yield* SessionHistory.entriesForRunner(db, session.id, system.baselineSeq)
      const context = entries.map((entry) => entry.message)
-      const isLastStep = agent.info?.steps !== undefined && step >= agent.info.steps
+      const isLastStep = agent.info?.steps !== undefined && currentStep >= agent.info.steps
      const toolMaterialization = isLastStep ? undefined : yield* tools.materialize(agent.info?.permissions)
      const promptCacheKey = /^ses_[0-9a-f]{64}$/.test(session.id) ? session.id.slice(4) : session.id
      const request = LLM.request({
@ -202,7 +204,7 @@ export const layer = Layer.effect(
        toolChoice: isLastStep ? "none" : undefined,
      })
      if (yield* compaction.compactIfNeeded({ sessionID: session.id, entries, model, request }))
-        return yield* Effect.die(continueAfterCompaction)
+        return yield* Effect.die(continueAfterCompaction(currentStep))
      const publisher = createLLMEventPublisher(events, {
        sessionID: session.id,
        agent: agent.id,
@ -272,7 +274,7 @@ export const layer = Layer.effect(
            isContextOverflowFailure(overflowFailure ?? failure) &&
            (yield* restore(recoverOverflow({ sessionID: session.id, entries, model, request })))
          )
-            return yield* Effect.die(continueAfterOverflowCompaction)
+            return yield* Effect.die(continueAfterOverflowCompaction(currentStep))
          if (overflowFailure) yield* publish(overflowFailure)
          const llmFailure = failure instanceof LLMError ? failure : undefined
          if (llmFailure && !publisher.hasProviderError()) {
@ -306,7 +308,7 @@ export const layer = Layer.effect(
            yield* withPublication(publisher.failUnsettledTools("Provider did not return a tool result", true))
          if (stream._tag === "Failure") return yield* Effect.failCause(stream.cause)
          if (settled._tag === "Failure") return yield* Effect.failCause(settled.cause)
-          return !publisher.hasProviderError() && needsContinuation
+          return { needsContinuation: !publisher.hasProviderError() && needsContinuation, step: currentStep }
        }),
      )
    }, Effect.scoped)
@ -314,7 +316,7 @@ export const layer = Layer.effect(
      sessionID: SessionSchema.ID,
      promotion: SessionInput.Delivery | undefined,
      step: number,
-    ) => Effect.Effect<boolean, RunError>
+    ) => Effect.Effect<{ readonly needsContinuation: boolean; readonly step: number }, RunError>
    const runAfterOverflowCompaction: RunTurn = Effect.fnUntraced(function* (sessionID, promotion, step) {
      return yield* runTurnAttempt(sessionID, promotion, step).pipe(
@ -324,7 +326,7 @@ export const layer = Layer.effect(
            if (defect.transition._tag === "ContinueAfterOverflowCompaction")
              return yield* Effect.die("Post-compaction provider attempt cannot recover another overflow")
            yield* Effect.yieldNow
-            return yield* runAfterOverflowCompaction(sessionID, undefined, step)
+            return yield* runAfterOverflowCompaction(sessionID, undefined, defect.transition.step)
          }),
        ),
      )
@ -337,8 +339,8 @@ export const layer = Layer.effect(
            if (!(defect instanceof TurnTransitionError)) return yield* Effect.die(defect)
            yield* Effect.yieldNow
            if (defect.transition._tag === "ContinueAfterOverflowCompaction")
-              return yield* runAfterOverflowCompaction(sessionID, undefined, step)
+              return yield* runAfterOverflowCompaction(sessionID, undefined, defect.transition.step)
-            return yield* runTurn(sessionID, undefined, step)
+            return yield* runTurn(sessionID, undefined, defect.transition.step)
          }),
        ),
      )
@ -353,16 +355,19 @@ export const layer = Layer.effect(
      if (!input.force && !hasSteer && !hasQueue) return
      yield* failInterruptedTools(input.sessionID)
      let promotion: SessionInput.Delivery | undefined = hasSteer ? "steer" : hasQueue ? "queue" : undefined
-      let openActivity = input.force || hasSteer || hasQueue
+      let shouldRun = input.force || hasSteer || hasQueue
-      while (openActivity) {
+      while (shouldRun) {
        let needsContinuation = true
-        for (let step = 1; needsContinuation; step++) {
+        let step = 1
-          needsContinuation = yield* runTurn(input.sessionID, promotion, step)
+        while (needsContinuation) {
          const result = yield* runTurn(input.sessionID, promotion, step)
          needsContinuation = result.needsContinuation
          step = result.step + 1
          promotion = "steer"
          if (!needsContinuation) needsContinuation = yield* SessionInput.hasPending(db, input.sessionID, "steer")
        }
-        openActivity = yield* SessionInput.hasPending(db, input.sessionID, "queue")
+        shouldRun = yield* SessionInput.hasPending(db, input.sessionID, "queue")
-        promotion = openActivity ? "queue" : undefined
+        promotion = shouldRun ? "queue" : undefined
      }
    })
--- a/packages/core/test/session-runner.test.ts
+++ b/packages/core/test/session-runner.test.ts
@ -1851,7 +1851,7 @@ describe("SessionRunnerLLM", () => {
    }),
  )
-  it.effect("starts queued input after the active activity settles", () =>
+  it.effect("promotes queued input after continuation ends", () =>
    Effect.gen(function* () {
      yield* setup
      const session = yield* SessionV2.Service
@ -1883,7 +1883,7 @@ describe("SessionRunnerLLM", () => {
      yield* Deferred.await(streamStarted)
      yield* session.prompt({
        sessionID,
-        prompt: new Prompt({ text: "Wait until the next activity" }),
+        prompt: new Prompt({ text: "Wait until continuation ends" }),
        delivery: "queue",
      })
      yield* Deferred.succeed(streamGate, undefined)
@ -1894,7 +1894,7 @@ describe("SessionRunnerLLM", () => {
      expect(requests).toHaveLength(3)
      expect(userTexts(requests[0]!)).toEqual(["Start working"])
      expect(userTexts(requests[1]!)).toEqual(["Start working"])
-      expect(userTexts(requests[2]!)).toEqual(["Start working", "Wait until the next activity"])
+      expect(userTexts(requests[2]!)).toEqual(["Start working", "Wait until continuation ends"])
    }),
  )
@ -1984,7 +1984,7 @@ describe("SessionRunnerLLM", () => {
    }),
  )
-  it.effect("runs queued active inputs as separate FIFO activities", () =>
+  it.effect("promotes queued inputs one at a time in FIFO order", () =>
    Effect.gen(function* () {
      yield* setup
      const session = yield* SessionV2.Service
@ -2027,14 +2027,14 @@ describe("SessionRunnerLLM", () => {
    }),
  )
-  it.effect("opens queued input after idle steering activity settles", () =>
+  it.effect("promotes queued input after steering continuation ends", () =>
    Effect.gen(function* () {
      yield* setup
      const session = yield* SessionV2.Service
-      yield* session.prompt({ sessionID, prompt: new Prompt({ text: "Start steering activity" }), resume: false })
+      yield* session.prompt({ sessionID, prompt: new Prompt({ text: "Start steering" }), resume: false })
      yield* session.prompt({
        sessionID,
-        prompt: new Prompt({ text: "Queue later activity" }),
+        prompt: new Prompt({ text: "Queue for later" }),
        delivery: "queue",
        resume: false,
      })
@ -2056,12 +2056,12 @@ describe("SessionRunnerLLM", () => {
      yield* session.resume(sessionID)
      expect(requests).toHaveLength(2)
-      expect(userTexts(requests[0]!)).toEqual(["Start steering activity"])
+      expect(userTexts(requests[0]!)).toEqual(["Start steering"])
-      expect(userTexts(requests[1]!)).toEqual(["Start steering activity", "Queue later activity"])
+      expect(userTexts(requests[1]!)).toEqual(["Start steering", "Queue for later"])
    }),
  )
-  it.effect("coalesces steers into the active queued activity before starting the next queued activity", () =>
+  it.effect("promotes steers before the next queued input", () =>
    Effect.gen(function* () {
      yield* setup
      const session = yield* SessionV2.Service
@ -2101,8 +2101,8 @@ describe("SessionRunnerLLM", () => {
      streamGate = secondGate
      yield* Deferred.succeed(firstGate, undefined)
      while (requests.length < 2) yield* Effect.yieldNow
-      yield* session.prompt({ sessionID, prompt: new Prompt({ text: "Steer first queued activity" }) })
+      yield* session.prompt({ sessionID, prompt: new Prompt({ text: "Steer before next queued input" }) })
-      yield* session.prompt({ sessionID, prompt: new Prompt({ text: "Also steer first queued activity" }) })
+      yield* session.prompt({ sessionID, prompt: new Prompt({ text: "Also steer before next queued input" }) })
      yield* Deferred.succeed(secondGate, undefined)
      yield* Fiber.join(first)
      streamGate = undefined
@ -2113,14 +2113,14 @@ describe("SessionRunnerLLM", () => {
      expect(userTexts(requests[2]!)).toEqual([
        "Start working",
        "Queue first",
-        "Steer first queued activity",
+        "Steer before next queued input",
-        "Also steer first queued activity",
+        "Also steer before next queued input",
      ])
      expect(userTexts(requests[3]!)).toEqual([
        "Start working",
        "Queue first",
-        "Steer first queued activity",
+        "Steer before next queued input",
-        "Also steer first queued activity",
+        "Also steer before next queued input",
        "Queue second",
      ])
    }),
@ -2354,13 +2354,13 @@ describe("SessionRunnerLLM", () => {
    }),
  )
-  it.effect("starts the first queued activity when woken while idle", () =>
+  it.effect("promotes the first queued input when woken while idle", () =>
    Effect.gen(function* () {
      yield* setup
      const session = yield* SessionV2.Service
      yield* session.prompt({
        sessionID,
-        prompt: new Prompt({ text: "Wait for fresh activity" }),
+        prompt: new Prompt({ text: "Wait in queue" }),
        delivery: "queue",
        resume: false,
      })
@ -2370,30 +2370,7 @@ describe("SessionRunnerLLM", () => {
      yield* Effect.yieldNow
      expect(requests).toHaveLength(1)
-      expect(userTexts(requests[0]!)).toEqual(["Wait for fresh activity"])
+      expect(userTexts(requests[0]!)).toEqual(["Wait in queue"])
    }),
  )
  it.effect("does not spend one activity step budget across queued activities", () =>
    Effect.gen(function* () {
      yield* setup
      const session = yield* SessionV2.Service
      const queued = Array.from({ length: 26 }, (_, index) => `Queued activity ${index + 1}`)
      for (const text of queued) {
        yield* session.prompt({ sessionID, prompt: new Prompt({ text }), delivery: "queue", resume: false })
      }
      requests.length = 0
      responses = queued.map(() => [
        LLMEvent.stepStart({ index: 0 }),
        LLMEvent.stepFinish({ index: 0, reason: "stop" }),
        LLMEvent.finish({ reason: "stop" }),
      ])
      yield* session.resume(sessionID)
      expect(requests).toHaveLength(queued.length)
      expect(userTexts(requests.at(-1)!)).toEqual(queued)
    }),
  )
@ -2768,7 +2745,7 @@ describe("SessionRunnerLLM", () => {
    }),
  )
-  it.effect("interrupts a blocked provider turn without local tool activity", () =>
+  it.effect("interrupts a blocked provider turn without local tool execution", () =>
    Effect.gen(function* () {
      yield* setup
      const session = yield* SessionV2.Service
@ -2828,38 +2805,6 @@ describe("SessionRunnerLLM", () => {
    }),
  )
  it.effect("continues past 25 local tool steps when the agent has no step limit", () =>
    Effect.gen(function* () {
      yield* setup
      const session = yield* SessionV2.Service
      yield* session.prompt({ sessionID, prompt: new Prompt({ text: "Loop forever" }), resume: false })
      requests.length = 0
      authorizations.length = 0
      executions.length = 0
      streamGate = undefined
      streamStarted = undefined
      responses = [
        ...Array.from({ length: 25 }, (_, index) => [
          LLMEvent.stepStart({ index: 0 }),
          LLMEvent.toolCall({ id: `call-echo-${index}`, name: "echo", input: { text: `${index}` } }),
          LLMEvent.stepFinish({ index: 0, reason: "tool-calls" }),
          LLMEvent.finish({ reason: "tool-calls" }),
        ]),
        [
          LLMEvent.stepStart({ index: 0 }),
          LLMEvent.stepFinish({ index: 0, reason: "stop" }),
          LLMEvent.finish({ reason: "stop" }),
        ],
      ]
      yield* session.resume(sessionID)
      expect(requests).toHaveLength(26)
      expect(executions).toHaveLength(25)
    }),
  )
  it.effect("forces a text response on an agent's configured final step", () =>
    Effect.gen(function* () {
      yield* setup
@ -2908,6 +2853,58 @@ describe("SessionRunnerLLM", () => {
    }),
  )
  it.effect("resets the configured step allowance when steering input promotes", () =>
    Effect.gen(function* () {
      yield* setup
      const agents = yield* AgentV2.Service
      yield* agents.transform((editor) =>
        editor.update(AgentV2.ID.make("build"), (agent) => {
          agent.steps = 2
        }),
      )
      const session = yield* SessionV2.Service
      yield* session.prompt({ sessionID, prompt: new Prompt({ text: "Start work" }), resume: false })
      requests.length = 0
      executions.length = 0
      responses = [
        [
          LLMEvent.stepStart({ index: 0 }),
          LLMEvent.toolCall({ id: "call-before-steer", name: "echo", input: { text: "before" } }),
          LLMEvent.stepFinish({ index: 0, reason: "tool-calls" }),
          LLMEvent.finish({ reason: "tool-calls" }),
        ],
        [
          LLMEvent.stepStart({ index: 0 }),
          LLMEvent.toolCall({ id: "call-after-steer", name: "echo", input: { text: "after" } }),
          LLMEvent.stepFinish({ index: 0, reason: "tool-calls" }),
          LLMEvent.finish({ reason: "tool-calls" }),
        ],
        [
          LLMEvent.stepStart({ index: 0 }),
          LLMEvent.stepFinish({ index: 0, reason: "stop" }),
          LLMEvent.finish({ reason: "stop" }),
        ],
      ]
      streamGate = yield* Deferred.make<void>()
      streamStarted = yield* Deferred.make<void>()
      const run = yield* session.resume(sessionID).pipe(Effect.forkChild)
      yield* Deferred.await(streamStarted)
      yield* session.prompt({ sessionID, prompt: new Prompt({ text: "Change direction" }) })
      yield* Deferred.succeed(streamGate, undefined)
      yield* Fiber.join(run)
      streamGate = undefined
      streamStarted = undefined
      expect(requests).toHaveLength(3)
      expect(requests[1]?.toolChoice).toBeUndefined()
      expect(requests[1]?.tools).not.toEqual([])
      expect(requests[2]?.toolChoice).toMatchObject({ type: "none" })
      expect(executions).toEqual(["before", "after"])
    }),
  )
  it.effect("projects provider errors as terminal assistant step failures", () =>
    Effect.gen(function* () {
      yield* setup
--- a/packages/sdk/js/src/v2/gen/sdk.gen.ts
+++ b/packages/sdk/js/src/v2/gen/sdk.gen.ts
@ -5344,7 +5344,7 @@ export class Session3 extends HeyApiClient {
  /**
   * Switch session agent
   *
-   * Switch the agent used by subsequent session activity.
+   * Switch the agent used by subsequent provider turns.
   */
  public switchAgent<ThrowOnError extends boolean = false>(
    parameters: {
@ -5383,7 +5383,7 @@ export class Session3 extends HeyApiClient {
  /**
   * Switch session model
   *
-   * Switch the model used by subsequent session activity.
+   * Switch the model used by subsequent provider turns.
   */
  public switchModel<ThrowOnError extends boolean = false>(
    parameters: {
--- a/packages/sdk/openapi.json
+++ b/packages/sdk/openapi.json
@ -10383,7 +10383,7 @@
            }
          }
        },
-        "description": "Switch the agent used by subsequent session activity.",
+        "description": "Switch the agent used by subsequent provider turns.",
        "summary": "Switch session agent",
        "requestBody": {
          "content": {
@ -10468,7 +10468,7 @@
            }
          }
        },
-        "description": "Switch the model used by subsequent session activity.",
+        "description": "Switch the model used by subsequent provider turns.",
        "summary": "Switch session model",
        "requestBody": {
          "content": {
--- a/packages/server/src/groups/session.ts
+++ b/packages/server/src/groups/session.ts
@ -152,7 +152,7 @@ export const SessionGroup = HttpApiGroup.make("server.session")
        OpenApi.annotations({
          identifier: "v2.session.switchAgent",
          summary: "Switch session agent",
-          description: "Switch the agent used by subsequent session activity.",
+          description: "Switch the agent used by subsequent provider turns.",
        }),
      ),
  )
@ -168,7 +168,7 @@ export const SessionGroup = HttpApiGroup.make("server.session")
        OpenApi.annotations({
          identifier: "v2.session.switchModel",
          summary: "Switch session model",
-          description: "Switch the model used by subsequent session activity.",
+          description: "Switch the model used by subsequent provider turns.",
        }),
      ),
  )
--- a/specs/v2/schema-changelog.md
+++ b/specs/v2/schema-changelog.md
@ -138,7 +138,7 @@ Change:
 Reason:
 - Prompt admission and model-visible promotion must be separate durable operations.
- Steering must promote at safe provider-turn boundaries while queued prompts remain separate FIFO activities.
+- Steering must promote at safe provider-turn boundaries while queued prompts remain pending in FIFO order until continuation would otherwise end.
 Compatibility:
--- a/specs/v2/session.md
+++ b/specs/v2/session.md
@ -42,7 +42,7 @@ SessionExecution.resume(sessionID)
 `SessionExecution` and the read-side `SessionStore` are process-global. `SessionRunner`, catalog, model resolver, tool registry, permission state, and filesystem are cached per Location. No layer takes a Session ID. An omitted `Location.workspaceID` means implicit-local placement; explicit workspace identity remains reserved for future placement semantics.
-The local runner issues one explicit `llm.stream(request)` per provider turn, projects each complete local tool call durably before eagerly starting its structured child execution, awaits every started tool fiber after provider-stream closure, reloads projected history once before continuation, and fails after 25 provider turns within one local drain activity only when work remains. Tool settlement events carry the owning assistant message ID because provider-local call IDs may repeat across turns. Before assembling a provider request, the runner durably fails any local tool still projected as `running` from a previous process with `Tool execution interrupted`; abandoned side effects are never silently replayed.
+The local runner issues one explicit `llm.stream(request)` per provider turn, projects each complete local tool call durably before eagerly starting its structured child execution, awaits every started tool fiber after provider-stream closure, and reloads projected history once before continuation. Promoting any new user input resets the selected agent's configured provider-turn allowance; multiple steers promoted at one boundary reset it once. Tool settlement events carry the owning assistant message ID because provider-local call IDs may repeat across turns. Before assembling a provider request, the runner durably fails any local tool still projected as `running` from a previous process with `Tool execution interrupted`; abandoned side effects are never silently replayed.
 Projected hosted tools preserve call-side and settlement-side provider metadata separately so settlement and interruption recovery cannot erase continuation identifiers. Provider-native reasoning and provider metadata replay only while the historical assistant model matches the selected continuation model; after a model switch, visible reasoning text remains ordinary assistant text and provider-native metadata is omitted.
@ -96,7 +96,7 @@ Ambient project discovery canonicalizes and contains traversal within the projec
 Current Context Epoch follow-ups:
 - Add configured, remote, and nested instruction sources with explicit precedence and removal semantics.
- Add durable post-crash activity recovery for promoted or provider-dispatched work.
+- Add durable post-crash continuation recovery for promoted or provider-dispatched work.
 - Add explicit manual compaction on top of automatic request-budget compaction.
 - Add operational metrics for observation latency, unavailable sources, contention, baseline size, and chronological-update growth.
 - Consider watcher-backed per-file caching only if measurements show direct safe-boundary observation is too expensive.
@ -113,7 +113,7 @@ Compaction keeps the full transcript durable while replacing its active model re
 Repeated compactions update the previous structured summary with newly compacted messages. The runner then reloads projected history and executes the original pending turn.
-When a provider rejects a request as context overflow before durable assistant output or tool activity, the runner attempts one overflow-triggered compaction even when the local estimate did not predict pressure. A completed checkpoint rebuilds the same logical provider turn with one remaining physical attempt. A second overflow, unavailable compaction, or overflow after durable output becomes the ordinary terminal failure; recovery never loops or replays partial side effects. Deterministic old tool-result pruning remains a separate follow-up.
+When a provider rejects a request as context overflow before durable assistant output or tool execution, the runner attempts one overflow-triggered compaction even when the local estimate did not predict pressure. A completed checkpoint rebuilds the same logical provider turn with one remaining physical attempt. A second overflow, unavailable compaction, or overflow after durable output becomes the ordinary terminal failure; recovery never loops or replays partial side effects. Deterministic old tool-result pruning remains a separate follow-up.
 ## V1 Runtime Context Parity
@ -150,18 +150,18 @@ Provider timeout, retry, and watchdog policy is intentionally deferred. The runn
 Inbox delivery is explicit:
 - `steer` inputs promote at the next safe provider-turn boundary, including continuation inside the current drain.
- `queue` inputs form a FIFO of future activities. When the current activity settles, the runner promotes exactly one queued input to open the next activity. Multiple queued inputs remain separate activities.
+- `queue` inputs remain in a FIFO while the current drain requires continuation. When the Session would otherwise become idle, the runner promotes exactly one queued input, then reevaluates continuation before promoting another.
 Execution has two entry points:
 - `run` is an explicit resume. It joins any active execution or starts a forced drain while idle. A forced drain bypasses the no-eligible-input guard, but preparation may still fail before a provider attempt.
 - `wake` reports newly recorded durable inbox work. Repeated wakes coalesce. A wake calls the provider only when it can promote eligible input.
-Post-crash activity recovery is intentionally deferred. A wake does not infer that ambiguous provider work is safe to retry after an input has already been promoted. Explicit `run` may deliberately continue from durable projected history. A future recovery slice should model durable activity identity, provider-dispatch ambiguity, required continuation, queue-opener reservation, retry policy, and visible recovery status together.
+Post-crash continuation recovery is intentionally deferred. A wake does not infer that ambiguous provider work is safe to retry after an input has already been promoted. Explicit `run` may deliberately continue from durable projected history. A future recovery slice should model provider-dispatch ambiguity, required continuation, queued-input promotion, retry policy, and visible recovery status together. It must not assume an enclosing durable execution identity that the Session model does not otherwise need.
 A process-global `SessionRunCoordinator` serializes execution for each local Session while allowing different Sessions to run concurrently. Resumes join active execution, overlapping wakes coalesce into one follow-up, and interruption stops current process-local execution without deleting durable inbox work. The runner enters the Session's current Location when execution starts and fences each new provider turn against that Location.
-Inbox promotion coalesces pending steers in durable admission order and opens one queued activity at a time in FIFO order. Add explicit inbox backlog and steering-batch limits before exposing broad multi-caller admission or untrusted queue growth.
+Inbox promotion coalesces pending steers in durable admission order. Once continuation would otherwise end, it promotes one queued input at a time in FIFO order. Add explicit inbox backlog and steering-batch limits before exposing broad multi-caller admission or untrusted queue growth.
 Eager local-tool execution is intentionally unbounded in the current local slice. This minimizes tool latency but does not increase SQLite settlement throughput: Session-event publication remains serialized per provider turn. Before broadening exposure, revisit per-turn call limits, output truncation, and operational backpressure using observed workloads. The `session.next.*` event schemas remain experimental and unshipped; databases created by earlier experimental builds are disposable rather than compatibility targets.
--- a/specs/v2/todo.md
+++ b/specs/v2/todo.md
@ -26,14 +26,14 @@ through legacy `SessionPrompt.loop(...)`:
  tool results, and assistant output
 - a scoped `ToolRegistry` advertises definitions and the first permission-checked
  `read` built-in
- local continuation reloads projected history and stops after 25 provider turns within one local drain activity
+- local continuation reloads projected history, and promoting new user input resets the selected agent's configured provider-turn allowance
 - concurrent resumes for one Session join one process-local run while different
  Sessions remain concurrent
 Prompt admission now uses a durable `session_input` inbox rather than immediate
-transcript projection. `steer` inputs coalesce into the active activity at the
+transcript projection. `steer` inputs promote at the next safe provider-turn
-next safe provider-turn boundary. `queue` inputs form a FIFO of future activities
+boundary while the current drain requires continuation. `queue` inputs remain in
-that open one at a time.
+a FIFO until the Session would otherwise become idle and then promote one at a time.
 Next reviewed slices:
@ -53,16 +53,16 @@ Next reviewed slices:
 - add durable/clustered interruption, retries, and stale-owner fencing only as
  their slices become concrete
-### Deferred durable activity recovery
+### Deferred durable continuation recovery
 Do not infer that ambiguous provider work is safe to retry from an advisory wake.
 The first inbox-driven runner intentionally omits outer provider-attempt markers
 until they have a concrete consumer and a complete recovery policy.
-Design post-crash activity recovery as one explicit slice. It should model:
+Design post-crash continuation recovery as one explicit slice. It should model:
- durable activity identity and settlement
+- promoted input and projected-history state
- queue-opener reservation and steer assignment
+- queued-input promotion and steering assignment
 - provider-attempt preparation versus provider-dispatch ambiguity
 - required post-tool continuation across process loss
 - explicit `retry` and `abandon` decisions for unknown outcomes
@ -70,6 +70,9 @@ Design post-crash activity recovery as one explicit slice. It should model:
 - retry budget, backoff, visible recovery status, startup discovery, and future
  clustered ownership fencing
 Do not introduce an enclosing durable execution identity solely to group these
 facts; a process-local Session drain has no durable transcript boundary.
 ## Plugin API design - James?
 We need to figure out how we want server plugins to work and what hooks are useful.