feat: commit new adaptive routing

parent dd4a1d2be2
commit 924fa6a3bc

docs/my-website/docs/adaptive_router.md (new file, +149)
@@ -0,0 +1,149 @@
# [BETA] Adaptive Router

:::info

Beta feature. Share feedback on [Discord](https://discord.gg/wuPM9dRgDw) or [Slack](https://join.slack.com/t/litellmossslack/shared_invite/zt-3o7nkuyfr-p_kbNJj8taRfXGgQI1~YyA).

:::

**Requirements:** LiteLLM Proxy with a Postgres database. Quality estimates are stored in Postgres and loaded on startup. Without a database the router still works, but it forgets everything it has learned whenever the proxy restarts.

You have a cheap model and an expensive one. You want to use the cheap one when it's good enough and the expensive one when it actually matters, without hardcoding rules you'll spend months tuning.

The adaptive router does this automatically. It tracks which model performs best for each type of request (code, writing, analysis, etc.) and routes accordingly, balancing quality against cost based on weights you control.

## Quick start

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
    model_info:
      input_cost_per_token: 0.0000025
      adaptive_router_preferences:
        quality_tier: 3  # 1=budget, 2=mid, 3=frontier
        strengths: ["code_generation", "analytical_reasoning"]

  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
    model_info:
      input_cost_per_token: 0.00000015
      adaptive_router_preferences:
        quality_tier: 2
        strengths: ["factual_lookup"]

  - model_name: my-router
    litellm_params:
      model: adaptive_router/smart-router
      adaptive_router_config:
        available_models: ["gpt-4o", "gpt-4o-mini"]
        weights:
          quality: 0.7  # raise if users complain about quality; lower if the bill is too high
          cost: 0.3     # quality + cost must sum to 1.0
```

Send requests to the router by setting `model` to its `model_name` (`my-router` here):

```bash
curl -X POST {{baseURL}}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -d '{
    "model": "my-router",
    "messages": [
      {"role": "user", "content": "build me a python script that parses CSV"},
      {"role": "assistant", "content": "Here is a script using csv.DictReader..."},
      {"role": "user", "content": "now add error handling for missing files"},
      {"role": "assistant", "content": "Wrap the open() call in a try/except FileNotFoundError..."},
      {"role": "user", "content": "perfect, that worked. thanks!"}
    ]
  }'
```

The response includes an `x-litellm-adaptive-router-model` header naming the model that was actually picked. The final "thanks!" turn fires a satisfaction signal, and that signal is what updates the bandit.
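
Signals are regex-based and English-biased (see Known limitations). A minimal sketch of what a regex satisfaction check on the latest user turn could look like. The pattern and the `is_satisfaction` helper are hypothetical illustrations, not LiteLLM's signal code:

```python
import re

# Hypothetical English-only pattern; LiteLLM's real signal regexes may differ.
SATISFACTION_RE = re.compile(
    r"\b(thanks|thank you|perfect|that worked|works now)\b", re.IGNORECASE
)


def is_satisfaction(messages: list) -> bool:
    """True if the most recent user message matches the satisfaction pattern."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return bool(SATISFACTION_RE.search(str(message.get("content") or "")))
    return False  # no user message at all


conversation = [
    {"role": "user", "content": "build me a python script that parses CSV"},
    {"role": "assistant", "content": "Here is a script using csv.DictReader..."},
    {"role": "user", "content": "now add error handling for missing files"},
    {"role": "assistant", "content": "Wrap the open() call in a try/except FileNotFoundError..."},
    {"role": "user", "content": "perfect, that worked. thanks!"},
]
print(is_satisfaction(conversation))      # True: the last turn says "thanks"
print(is_satisfaction(conversation[:3]))  # False: no satisfaction phrase yet
```

Because a pattern like this only sees surface wording, "thanks, but that didn't work" would also count as satisfaction. That is the trade-off of having no LLM judge.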

## Tuning cost vs. quality

The `weights` are your main lever. The two values must sum to 1.0:

| Goal | `quality` | `cost` |
|---|---|---|
| Minimize cost; quality is secondary | 0.3 | 0.7 |
| Balanced | 0.5 | 0.5 |
| Quality-first (default) | 0.7 | 0.3 |
| Quality non-negotiable | 0.9 | 0.1 |
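
This page doesn't spell out how the two weights combine. The sketch below shows one plausible way, assuming each candidate's score is a linear blend `quality_weight * quality_mean + cost_weight * (1 - cost / max_cost)`. The formula, the helper names and the quality numbers are illustrative assumptions. Only the per-token prices come from the Quick start config.

```python
def blended_score(quality_mean: float, cost: float, max_cost: float, weights: dict) -> float:
    """Hypothetical score: weighted sum of quality and relative cheapness, both in [0, 1]."""
    cheapness = 1.0 - cost / max_cost if max_cost > 0 else 1.0
    return weights["quality"] * quality_mean + weights["cost"] * cheapness


def pick_model(candidates: dict, weights: dict) -> str:
    """Return the candidate name with the highest blended score."""
    if abs(weights["quality"] + weights["cost"] - 1.0) > 1e-9:
        raise ValueError("quality and cost weights must sum to 1.0")
    max_cost = max(c["cost"] for c in candidates.values())
    return max(
        candidates,
        key=lambda name: blended_score(
            candidates[name]["quality"], candidates[name]["cost"], max_cost, weights
        ),
    )


# Per-token prices from the Quick start config; the quality numbers are made up.
candidates = {
    "gpt-4o": {"quality": 0.90, "cost": 0.0000025},
    "gpt-4o-mini": {"quality": 0.70, "cost": 0.00000015},
}
print(pick_model(candidates, {"quality": 0.3, "cost": 0.7}))  # gpt-4o-mini
print(pick_model(candidates, {"quality": 0.9, "cost": 0.1}))  # gpt-4o
```

A linear blend is only one choice. If the real router normalizes cost differently (for example on a log scale), the crossover point between the two models would move.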

The router learns over time. For roughly the first 10 observations of each model on each request type, the tiers and strengths you declared dominate its estimates. After that, real performance data takes over.
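
The `initial_cell` hunk later in this commit builds each cell's prior from the declared tier, a strength bonus (capped at 0.95) and `COLD_START_MASS`, which is 10. A minimal sketch of that idea as a Beta prior, followed by a plain Beta-Bernoulli update. The `BASE_TIER_WEIGHT` and `STRENGTH_BONUS` values here are hypothetical placeholders, and `observe` is an illustrative helper, not LiteLLM's `apply_delta`:

```python
from dataclasses import dataclass

COLD_START_MASS = 10.0  # from the initial_cell docstring in this commit
# Hypothetical values; the real BASE_TIER_WEIGHT and STRENGTH_BONUS live in LiteLLM.
BASE_TIER_WEIGHT = {1: 0.5, 2: 0.65, 3: 0.8}
STRENGTH_BONUS = 0.1


@dataclass(frozen=True)
class Cell:
    alpha: float
    beta: float

    @property
    def quality_mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def samples(self) -> float:
        return self.alpha + self.beta


def initial_cell(quality_tier: int, is_strength: bool) -> Cell:
    """Beta prior whose mean comes from the declared tier (plus a strength bonus)."""
    if quality_tier not in BASE_TIER_WEIGHT:
        raise ValueError(f"quality_tier={quality_tier} is not supported")
    bonus = STRENGTH_BONUS if is_strength else 0.0
    mean = min(0.95, BASE_TIER_WEIGHT[quality_tier] + bonus)
    return Cell(alpha=mean * COLD_START_MASS, beta=(1.0 - mean) * COLD_START_MASS)


def observe(cell: Cell, successes: int, failures: int) -> Cell:
    """Standard Beta-Bernoulli update: successes add to alpha, failures to beta."""
    return Cell(cell.alpha + successes, cell.beta + failures)


prior = initial_cell(quality_tier=2, is_strength=False)
after = observe(prior, successes=3, failures=7)
print(f"prior mean {prior.quality_mean:.3f}, after 10 observations {after.quality_mean:.3f}")
# prior mean 0.650, after 10 observations 0.475
```

Because the prior carries a mass of 10, ten real observations pull the estimate about halfway toward the observed 30% success rate. This is why the declared tiers stop mattering much after that point.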

## Force a minimum quality tier per request

If a specific request needs a frontier model regardless of cost, pass this header:

```
x-litellm-min-quality-tier: 3
```

You can also pass `min_quality_tier` in the request metadata instead of as a header.
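
A minimal sketch of both options using only the standard library. It builds the request without sending it. The `build_chat_request` helper is hypothetical. Placing `min_quality_tier` in a top-level `"metadata"` object is an assumption about how request metadata reaches the router.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, messages, router="my-router",
                       min_quality_tier=None, via_header=True):
    """Build (but don't send) a chat completion request for the adaptive router."""
    body = {"model": router, "messages": messages}
    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
    if min_quality_tier is not None:
        if via_header:
            headers["x-litellm-min-quality-tier"] = str(min_quality_tier)
        else:
            # Assumption: request metadata travels as a top-level "metadata" object.
            body["metadata"] = {"min_quality_tier": min_quality_tier}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


msgs = [{"role": "user", "content": "design a rate limiter for a sharded API"}]
req = build_chat_request("http://localhost:4000", "sk-1234", msgs, min_quality_tier=3)
# urllib normalizes stored header names with str.capitalize().
print(req.get_header("X-litellm-min-quality-tier"))  # 3
```

To send it, call `urllib.request.urlopen(req)`. The chosen model can then be read case-insensitively from `response.headers.get("x-litellm-adaptive-router-model")`.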

## What's being learned

The router classifies each request into one of 7 types and tracks how each model performs on each type independently. A model that's great at factual lookup but poor at code will win factual requests and lose code requests, even if it's the cheaper model.

| Type | Example |
|---|---|
| `code_generation` | "write me a Python sort function" |
| `code_understanding` | "explain what this function does" |
| `technical_design` | "how should I design this API?" |
| `analytical_reasoning` | "calculate the probability that..." |
| `writing` | "draft an email to my team about..." |
| `factual_lookup` | "what is the capital of France?" |
| `general` | anything else |

[**See classifier code**](https://github.com/BerriAI/litellm/blob/litellm_adaptive_routing/litellm/router_strategy/adaptive_router/classifier.py)
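
To show the general shape of a rule-based classifier, here is a toy ordered keyword classifier that reproduces the table's examples. It is a hypothetical sketch, and its rules are not the ones in the linked `classifier.py`.

```python
import re

# Hypothetical rules, checked in order (first match wins). Not LiteLLM's classifier.
RULES = [
    ("code_understanding", re.compile(r"\b(explain|what does|what this)\b.*\b(code|function|class|script)\b", re.I)),
    ("code_generation", re.compile(r"\b(write|build|implement|generate)\b.*\b(function|script|class|code|program)\b", re.I)),
    ("technical_design", re.compile(r"\b(design|architecture|architect)\b", re.I)),
    ("analytical_reasoning", re.compile(r"\b(calculate|probability|prove|estimate)\b", re.I)),
    ("writing", re.compile(r"\b(draft|write|rewrite)\b.*\b(email|essay|post|announcement)\b", re.I)),
    ("factual_lookup", re.compile(r"^\s*(what|who|when|where|which)\s+(is|was|are|were)\b", re.I)),
]


def classify(text: str) -> str:
    """Return the first matching request type, or "general" if nothing matches."""
    for request_type, pattern in RULES:
        if pattern.search(text):
            return request_type
    return "general"


examples = {
    "write me a Python sort function": "code_generation",
    "explain what this function does": "code_understanding",
    "how should I design this API?": "technical_design",
    "calculate the probability that two dice sum to 7": "analytical_reasoning",
    "draft an email to my team about the launch": "writing",
    "what is the capital of France?": "factual_lookup",
    "perfect, that worked. thanks!": "general",
}
for text in examples:
    print(f"{classify(text):22} <- {text!r}")
```

Rule order matters. "explain what this function does" would also match a naive code-generation rule if that rule were checked first, so the more specific types come first.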

Learning signals are inspired by [Signals: Trajectory Sampling and Triage for Agentic Interactions](https://arxiv.org/pdf/2604.00356).

## Inspect the current state

```
GET /adaptive_router/{router_name}/state
```

Returns the current quality estimates per model per request type. Useful for understanding why a model is or isn't being picked.

```json
{
  "routers": [
    {
      "router_name": "smart-cheap-router",
      "available_models": ["fast", "smart"],
      "weights": { "quality": 0.7, "cost": 0.3 },
      "cells": [
        {
          "request_type": "analytical_reasoning",
          "model": "fast",
          "quality_mean": 0.5,
          "samples": 10.0
        },
        {
          "request_type": "analytical_reasoning",
          "model": "smart",
          "quality_mean": 0.95,
          "samples": 10.0
        }
      ]
    }
  ]
}
```

`quality_mean` is the key number: it is the router's current estimate of how well that model handles that request type. `samples` is the cell's total evidence mass. It starts at 10 (the cold-start mass) and grows as real observations arrive, so a value of exactly 10 means the cell is still running on its prior.
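
A small sketch for reading a state response. For each router and request type, it reports the model with the highest `quality_mean` and the cells that have not yet moved past the cold-start mass. The `summarize` helper is illustrative. Note that the highest `quality_mean` is not necessarily the router's pick, because the cost weight also counts.

```python
import json

COLD_START_MASS = 10.0  # per the docs, `samples` starts at this value

# Response body copied from the example above.
STATE_BODY = """
{"routers": [{"router_name": "smart-cheap-router",
  "available_models": ["fast", "smart"],
  "weights": {"quality": 0.7, "cost": 0.3},
  "cells": [
    {"request_type": "analytical_reasoning", "model": "fast", "quality_mean": 0.5, "samples": 10.0},
    {"request_type": "analytical_reasoning", "model": "smart", "quality_mean": 0.95, "samples": 10.0}]}]}
"""


def summarize(state: dict) -> dict:
    """Per (router, request_type): best model by quality_mean and cells still at the prior."""
    summary = {}
    for router in state["routers"]:
        for cell in router["cells"]:
            key = (router["router_name"], cell["request_type"])
            entry = summary.setdefault(key, {"best": None, "best_mean": -1.0, "cold": []})
            if cell["quality_mean"] > entry["best_mean"]:
                entry["best"], entry["best_mean"] = cell["model"], cell["quality_mean"]
            # No real observations yet if samples hasn't grown past the cold-start mass.
            if cell["samples"] <= COLD_START_MASS:
                entry["cold"].append(cell["model"])
    return summary


for (router, request_type), info in summarize(json.loads(STATE_BODY)).items():
    print(router, request_type, "best:", info["best"], "still at prior:", info["cold"])
# smart-cheap-router analytical_reasoning best: smart still at prior: ['fast', 'smart']
```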

## Known limitations

- Latency isn't scored: a slow model can still win on quality + cost.
- Signals are regex-based and English-biased; there is no LLM judge.
- Each cell is hard-capped at 200 observations, and there is no decay yet.
- Once a model is picked for a session, other models' turns in that session don't contribute to learning.

docs/my-website/package-lock.json (generated, +1)
@@ -24,6 +24,7 @@
       },
       "devDependencies": {
         "@docusaurus/module-type-aliases": "3.8.1",
+        "ajv": "^8.18.0",
         "dotenv": "16.6.1"
       },
       "engines": {

@@ -30,6 +30,7 @@
   },
   "devDependencies": {
     "@docusaurus/module-type-aliases": "3.8.1",
+    "ajv": "^8.18.0",
     "dotenv": "16.6.1"
   },
   "browserslist": {

@@ -1052,6 +1052,7 @@ const sidebars = {
       },
       items: [
         "routing",
+        "adaptive_router",
         "scheduler",
         "proxy/auto_routing",
         "proxy/load_balancing",

@@ -164,6 +164,7 @@ MCP_STDIO_ALLOWED_COMMANDS: frozenset = frozenset(
 LITELLM_UI_ALLOW_HEADERS = [
     "x-litellm-semantic-filter",
     "x-litellm-semantic-filter-tools",
+    "x-litellm-adaptive-router-model",
 ]
 
 # Gemini model-specific minimal thinking budget constants

@@ -88,6 +88,8 @@ Callers may pass header `x-litellm-min-quality-tier: 3` (or metadata key
   on the same `litellm.Router` raise at init.
 - **Bandit-delta mapping is unvalidated.** `_compute_bandit_delta` is a v0
   guess; expect to retune after the first ~1000 sessions of real traffic.
-- **`request_type` is classified per turn from the latest user message only.**
-  The first turn's classification doesn't carry forward; a multi-turn session
-  may shift bucket between turns.
+- **`request_type` is classified per turn from the latest user message.** For
+  non-GENERAL turns, the current-turn type is used for bandit attribution (so
+  genuine mid-session topic shifts update the correct cell). For GENERAL turns
+  ("thanks!", "ok", "sounds good"), attribution falls back to the session's
+  original type to avoid misattributing closing pleasantries.

@@ -307,11 +307,21 @@ class AdaptiveRouter:
         d_alpha, d_beta = self._compute_bandit_delta(delta)
         print("CALLS D_ALPHA", d_alpha)
         if d_alpha != 0 or d_beta != 0:
-            cell_key = (request_type, model_name)
+            # For non-GENERAL turns, attribute to the current-turn classification
+            # so genuine mid-session topic shifts (e.g. code → math) update the
+            # correct cell. For GENERAL turns ("thanks!", "ok", "sounds good"), fall
+            # back to the session's original type so closing pleasantries don't
+            # misattribute the reward.
+            attribution_type = (
+                request_type
+                if request_type != RequestType.GENERAL
+                else RequestType(state.classified_type)
+            )
+            cell_key = (attribution_type, model_name)
             self._cells[cell_key] = apply_delta(self._cells[cell_key], d_alpha, d_beta)
             await self.queue.add_state_delta(
                 self.router_name,
-                request_type.value,
+                attribution_type.value,
                 model_name,
                 d_alpha,
                 d_beta,
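
The attribution rule in the hunk above reduces to a small pure function. Below is a standalone sketch. The `RequestType` enum is a stand-in that mirrors the seven documented types, not LiteLLM's class. `session_type` plays the role of `state.classified_type`.

```python
from enum import Enum


class RequestType(str, Enum):
    # Stand-in mirroring the seven documented types; not LiteLLM's own enum.
    CODE_GENERATION = "code_generation"
    CODE_UNDERSTANDING = "code_understanding"
    TECHNICAL_DESIGN = "technical_design"
    ANALYTICAL_REASONING = "analytical_reasoning"
    WRITING = "writing"
    FACTUAL_LOOKUP = "factual_lookup"
    GENERAL = "general"


def attribution_type(turn_type: RequestType, session_type: str) -> RequestType:
    """Credit the current turn's type, unless it is GENERAL (e.g. "thanks!")."""
    return turn_type if turn_type != RequestType.GENERAL else RequestType(session_type)


session_type = "code_generation"  # classified when the session started
turns = [RequestType.CODE_GENERATION, RequestType.ANALYTICAL_REASONING, RequestType.GENERAL]
print([attribution_type(t, session_type).value for t in turns])
# ['code_generation', 'analytical_reasoning', 'code_generation']
```

The mid-session switch to math is credited to `analytical_reasoning`, while the closing "thanks!" turn is credited back to the session's original `code_generation` cell.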

@@ -52,6 +52,12 @@ def initial_cell(
     capped at 0.95 to avoid an over-confident prior.
     Total mass = COLD_START_MASS so that ~10 real observations can move it noticeably.
     """
+    if prefs.quality_tier not in BASE_TIER_WEIGHT:
+        valid = sorted(BASE_TIER_WEIGHT)
+        raise ValueError(
+            f"quality_tier={prefs.quality_tier} is not supported; "
+            f"valid tiers are {valid}"
+        )
     base = BASE_TIER_WEIGHT[prefs.quality_tier]
     bonus = STRENGTH_BONUS if request_type in prefs.strengths else 0.0
     mean = min(0.95, base + bonus)

@@ -44,10 +44,12 @@ def _resolve_session_key(kwargs: Dict[str, Any]) -> Optional[str]:
     1. Honor a client-supplied session id (`litellm_session_id` on either
        `litellm_params` or `litellm_params.metadata`, or `session_id` on
        metadata) — backward compat for callers already wired up.
-    2. Otherwise derive a sha256 over (identity fields, first message) so
-       the key is stable across turns of the same conversation.
+    2. Otherwise derive a sha256 over (identity fields, first
+       SIGNAL_GATE_MIN_MESSAGES messages) so the key is stable across turns
+       and only materialises once there is enough context for the bandit to
+       act on (matching the gate in the signal-processing path).
 
-    Returns None if there are no messages (nothing to attribute).
+    Returns None if the conversation is shorter than SIGNAL_GATE_MIN_MESSAGES.
     """
     litellm_params = kwargs.get("litellm_params") or {}
     sid = litellm_params.get("litellm_session_id")

@@ -60,19 +62,22 @@ def _resolve_session_key(kwargs: Dict[str, Any]) -> Optional[str]:
         return str(sid)
 
     messages = kwargs.get("messages") or []
-    if not messages:
+    if len(messages) < SIGNAL_GATE_MIN_MESSAGES:
+        # Don't attribute until we have enough turns to match the signal gate —
+        # ensures the hash is stable (same N messages every time) and avoids
+        # crediting the bandit for conversations that are too short to signal.
         return None
 
     identity = ":".join(
         str(metadata.get(f) or "") if isinstance(metadata, dict) else ""
         for f in _IDENTITY_FIELDS
     )
-    first = messages[0]
+    anchor = messages[:SIGNAL_GATE_MIN_MESSAGES]
     payload = (
         identity
         + "|"
         + json.dumps(
-            {"role": first.get("role"), "content": first.get("content")},
+            [{"role": m.get("role"), "content": m.get("content")} for m in anchor],
             sort_keys=True,
             default=str,
         )
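
The derived-key path in `_resolve_session_key` can be sketched on its own. The sketch hashes identity fields plus the first `SIGNAL_GATE_MIN_MESSAGES` messages and returns `None` for shorter conversations. Several details are assumptions: the contents of `IDENTITY_FIELDS` (the real `_IDENTITY_FIELDS` tuple isn't shown in this diff), the use of a hex digest, and the gate value of 4, which is taken from the demo README.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional

SIGNAL_GATE_MIN_MESSAGES = 4  # value quoted in the demo README
# Hypothetical identity fields; the real _IDENTITY_FIELDS tuple is not shown in this diff.
IDENTITY_FIELDS = ("user_api_key_hash", "user_api_key_team_id")


def session_key(messages: List[Dict[str, Any]], metadata: Dict[str, Any]) -> Optional[str]:
    """Stable per-conversation key, or None while the conversation is below the gate."""
    if len(messages) < SIGNAL_GATE_MIN_MESSAGES:
        return None  # too short to signal; don't attribute yet
    identity = ":".join(str(metadata.get(f) or "") for f in IDENTITY_FIELDS)
    anchor = messages[:SIGNAL_GATE_MIN_MESSAGES]
    payload = identity + "|" + json.dumps(
        [{"role": m.get("role"), "content": m.get("content")} for m in anchor],
        sort_keys=True,
        default=str,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


roles = ["user", "assistant", "user", "assistant", "user", "assistant"]
msgs = [{"role": r, "content": f"turn {i}"} for i, r in enumerate(roles)]
meta = {"user_api_key_hash": "abc123"}
print(session_key(msgs[:3], meta))                             # None
print(session_key(msgs[:4], meta) == session_key(msgs, meta))  # True
```

Hashing a fixed-length prefix is what keeps the key stable as the conversation grows. Later turns never change the first four messages.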

@@ -140,20 +145,23 @@ class AdaptiveRouterPostCallHook(CustomLogger):
     def __init__(self, adaptive_router: AdaptiveRouter) -> None:
         self.adaptive_router = adaptive_router
 
-    async def async_post_call_success_hook(
+    async def async_post_call_response_headers_hook(
         self,
         data: Dict[str, Any],
         user_api_key_dict: Any,
         response: Any,
-    ) -> None:
+        request_headers: Optional[Dict[str, str]] = None,
+        litellm_call_info: Optional[Dict[str, Any]] = None,
+    ) -> Optional[Dict[str, str]]:
         """
-        Surface the chosen logical model picked by the pre-routing hook as the
-        `x-litellm-adaptive-router-model` response header.
+        Surface the chosen logical model as the `x-litellm-adaptive-router-model`
+        response header for both streaming and non-streaming responses.
 
-        The chosen model is stashed on `data["metadata"]` by
-        `AdaptiveRouter.async_pre_routing_hook`. The proxy awaits this hook
-        before reading `_hidden_params["additional_headers"]` for the outgoing
-        HTTP response, so any value we write here flows through.
+        `async_post_call_success_hook` fires after the stream is fully consumed,
+        so writing to `_hidden_params["additional_headers"]` there is too late for
+        streaming — the StreamingResponse headers are already frozen. This hook is
+        called during header construction (before StreamingResponse is built), so
+        the header is included for both paths.
         """
         metadata = data.get("metadata") or {}
         chosen = (

@@ -162,12 +170,8 @@ class AdaptiveRouterPostCallHook(CustomLogger):
             else None
         )
         if not chosen:
-            return
-        hidden_params = getattr(response, "_hidden_params", None)
-        if not isinstance(hidden_params, dict):
-            return
-        hidden_params.setdefault("additional_headers", {})
-        hidden_params["additional_headers"][ADAPTIVE_ROUTER_RESPONSE_HEADER] = chosen
+            return None
+        return {ADAPTIVE_ROUTER_RESPONSE_HEADER: chosen}
 
     async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
         await self._record(kwargs, response_obj, response_status=200)

scripts/adaptive_router_demo/README.md (new file, +157)
@@ -0,0 +1,157 @@
# Adaptive Router: Live Demo

A 5-minute demo of LiteLLM's adaptive router learning, in real time, that
the smart model wins for code while the fast model is fine for facts.

```
┌─ traffic.py ──┐   ┌─ litellm proxy ──────────┐   ┌─ dashboard.html ─┐
│ synthetic     │──▶│ adaptive_router strategy │──▶│ bandit bars +    │
│ chat sessions │   │ /adaptive_router/state   │   │ cost meter +     │
└───────────────┘   └──────────┬───────────────┘   │ activity log     │
                               │                   └──────────────────┘
                     ┌─────────▼───────────┐
                     │ chat.html           │
                     │ interactive chat    │
                     │ with preset         │
                     │ scenarios           │
                     └─────────────────────┘
```

## Files

| File | What it does |
|---|---|
| `dashboard.html` | Live bandit dashboard; polls `/adaptive_router/state` every 500ms |
| `chat.html` | Interactive chat with preset scenarios; sends real requests through the router |
| `traffic.py` | Synthetic traffic generator; drives labeled sessions for an automated demo |

## What you're watching

- **Bandit posteriors**: one Beta(α, β) bar per `(request_type, model)`
  cell. Bars fill up as α grows from positive feedback signals.
- **Pick share**: a softmax estimate of how often the router would currently
  pick each model for that request type.
- **Cost meter**: total spend so far compared with "always use the most
  expensive model". The savings line is the headline number.
- **Activity log**: every signal that moves the bandit, in real time.
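
The "pick share" bar is described as a softmax estimate. A minimal sketch of a softmax over per-model scores follows. The `temperature` value is a hypothetical knob, since the dashboard's actual scaling isn't stated here. The example scores reuse the `quality_mean` values from the state example in the docs.

```python
import math


def pick_share(scores: dict, temperature: float = 0.1) -> dict:
    """Softmax over per-model scores for one request type."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores.values())
    # Subtracting the max keeps math.exp from overflowing; the result is unchanged.
    weights = {model: math.exp((s - top) / temperature) for model, s in scores.items()}
    total = sum(weights.values())
    return {model: w / total for model, w in weights.items()}


share = pick_share({"smart": 0.95, "fast": 0.5})
print({m: round(p, 3) for m, p in share.items()})  # {'smart': 0.989, 'fast': 0.011}
```

A lower temperature makes the bars more winner-take-all. A higher one flattens them toward an even split.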

## 1. Start the proxy

The repo ships with a working example config:

```bash
export OPENAI_API_KEY=sk-...   # the underlying models call OpenAI
uv run litellm \
  --config litellm/proxy/example_config_yaml/adaptive_router_example.yaml \
  --port 4000
```

`DATABASE_URL` is optional; without it, the proxy falls back to a bundled Neon dev DB.
Wait about 15 seconds, until you see `Application startup complete`.

## 2. Chat interactively with the router

Open `chat.html` in a browser, either straight from `file://` or served with `python3 -m http.server` (see step 3):

- Fill in the proxy URL and API key, then click **Connect**.
- Pick a preset scenario:
  - **🐛 Debug my code**: paste broken code and get a fix
  - **💡 Brainstorm a feature**: ideate on a product capability
  - **📚 Explain a concept**: get a clear technical explanation
  - **✍️ Write something**: draft emails, docs, or any prose
- A starter message is pre-filled; edit it or send it as-is.
- Each response shows which model the router picked and the inferred request type (from the `x-litellm-adaptive-router-model` and `x-litellm-request-type` response headers).
- A sidebar gate indicator tells you when the session has accumulated enough messages (4+) for the bandit to start updating.

> **Note on headers:** The browser can only read the model/type headers if the proxy lists them in `Access-Control-Expose-Headers`. LiteLLM exposes them by default. If the info panel shows `check dashboard`, the router still works; you can verify picks in `dashboard.html`.

## 3. Open the dashboard

The dashboard is a single static HTML file. Either:

- **Easy:** double-click `dashboard.html`. Most browsers will load it from
  `file://`, and the LiteLLM proxy's default CORS setting (`*`) accepts it.
- **If your browser blocks `file://` fetches:**

  ```bash
  cd scripts/adaptive_router_demo
  python3 -m http.server 8080
  ```

  Then open <http://localhost:8080/dashboard.html>.

In the connect bar, fill in:

- **Proxy URL:** `http://localhost:4000`
- **Master Key:** the `master_key` from your config (`sk-1234` in the example).

Click **Connect**. The dashboard polls `GET /adaptive_router/state` every
500ms (an admin-only endpoint that returns one snapshot per configured router).

## 4. Drive synthetic traffic

In a second terminal:

```bash
uv run python scripts/adaptive_router_demo/traffic.py \
  --proxy-url http://localhost:4000 \
  --api-key sk-1234 \
  --router smart-cheap-router \
  --rounds 100 \
  --rate 0.5
```

What it does:

- Picks a random `(request_type, prompt)` per round from a small labeled corpus.
- Sends a 5-message conversation, which clears the `SIGNAL_GATE_MIN_MESSAGES=4` gate
  in one round-trip, so the post-call hook runs and updates the bandit.
- Reads the `x-litellm-adaptive-router-model` response header to see which model
  the router picked.
- Rolls a Bernoulli trial against a hard-coded oracle of per-type success rates:
  ```
  code_generation : smart=0.92 fast=0.35
  factual_lookup  : smart=0.90 fast=0.85
  writing         : smart=0.85 fast=0.55
  ```
- On success, sends a follow-up engineered to match the satisfaction
  regex (and to re-classify into the same type). The bandit cell gets +α.
- On failure, sends a neutral follow-up. No signal fires.
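
The traffic loop above can be sketched as two small functions: one builds an opening conversation that already clears the 4-message gate, and one rolls the oracle to choose the follow-up. The oracle numbers are copied from the table above. The conversation wording, the follow-up strings and the function names are hypothetical, because `traffic.py` uses its own.

```python
import random

SIGNAL_GATE_MIN_MESSAGES = 4
# Success rates copied from the oracle table above.
ORACLE = {
    "code_generation": {"smart": 0.92, "fast": 0.35},
    "factual_lookup": {"smart": 0.90, "fast": 0.85},
    "writing": {"smart": 0.85, "fast": 0.55},
}
# Hypothetical follow-up wording; traffic.py uses its own strings.
SATISFIED_FOLLOW_UP = "perfect, that worked. thanks!"
NEUTRAL_FOLLOW_UP = "can you also add a short summary?"


def opening_conversation(prompt: str) -> list:
    """Five messages, so a single request already clears the signal gate."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": "(first answer)"},
        {"role": "user", "content": "can you tighten that up?"},
        {"role": "assistant", "content": "(revised answer)"},
        {"role": "user", "content": f"one more thing about: {prompt}"},
    ]


def follow_up(request_type: str, picked_model: str, rng: random.Random) -> str:
    """Bernoulli roll against the oracle: success -> satisfaction text, else neutral."""
    success = rng.random() < ORACLE[request_type][picked_model]
    return SATISFIED_FOLLOW_UP if success else NEUTRAL_FOLLOW_UP


rng = random.Random(0)
wins = {
    model: sum(follow_up("code_generation", model, rng) == SATISFIED_FOLLOW_UP for _ in range(1000))
    for model in ("smart", "fast")
}
print(wins)  # expectation: about 920 successes for smart vs about 350 for fast
```

Only successes emit a signal here, so the gap between the two models is driven by how often each one earns +α when it is picked.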

After 50–80 rounds you'll see `code_generation` decisively favor `smart`,
while `factual_lookup` stays near a coin flip: the router has learned the
asymmetry from the oracle.

## Tuning knobs

| Knob | Where | What changes |
|---|---|---|
| Quality vs. cost weight | `adaptive_router_config.weights` in proxy yaml | Bias toward quality or savings |
| Per-cell cold-start mass | `COLD_START_MASS` in `litellm/router_strategy/adaptive_router/config.py` | How many real observations it takes to outweigh the prior |
| Avg tokens per request | dashboard input box | How the cost meter estimates spend |
| Oracle | `ORACLE` dict in `traffic.py` | Which model "should" win for which type |
| Sessions to drive | `--rounds` | Total learning budget |
| Throttle | `--rate` | Seconds between sessions |
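
One plausible way the cost meter could turn pick counts and the "avg tokens per request" box into a savings figure is sketched below. It assumes spend is estimated as picks × `input_cost_per_token` × average tokens, compared against sending every request to the priciest model. The `cost_meter` function is hypothetical, and the prices are the ones from the Quick start config in `adaptive_router.md`.

```python
def cost_meter(picks: dict, input_cost_per_token: dict, avg_tokens: int) -> dict:
    """Estimated spend vs. routing every request to the most expensive model."""
    requests = sum(picks.values())
    spend = sum(n * input_cost_per_token[m] * avg_tokens for m, n in picks.items())
    baseline = requests * max(input_cost_per_token.values()) * avg_tokens
    savings_pct = 100.0 * (1.0 - spend / baseline) if baseline else 0.0
    return {"spend": spend, "baseline": baseline, "savings_pct": savings_pct}


prices = {"gpt-4o": 0.0000025, "gpt-4o-mini": 0.00000015}  # from the Quick start config
meter = cost_meter({"gpt-4o": 40, "gpt-4o-mini": 60}, prices, avg_tokens=1000)
print(f"spend ${meter['spend']:.3f} vs ${meter['baseline']:.3f}, saved {meter['savings_pct']:.1f}%")
# spend $0.109 vs $0.250, saved 56.4%
```

With every price at zero, both spend and baseline come out as $0. That matches the "Cost meter stays at $0" symptom described under Troubleshooting.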

## Multi-router

If your proxy has more than one `auto_router/adaptive_router` deployment,
the dashboard shows a router dropdown above the bars. Each router is
independent; the cost meter is per-router and resets when you switch.

## Troubleshooting

- **"Disconnected" / HTTP 401 in the dashboard**: wrong master key.
- **HTTP 403**: your key isn't `proxy_admin`. The state endpoint is
  admin-only, so use the master key.
- **HTTP 404 from `/adaptive_router/state`**: the proxy started, but no
  `auto_router/adaptive_router` deployment is in the model list.
- **Bars don't move**: check the proxy logs for `record_turn` activity.
  A common cause is requests with fewer than 4 messages, which the signal
  gate skips. `traffic.py` already builds 5-message conversations,
  so this only happens if you've changed the script.
- **Cost meter stays at $0**: your model deployments don't have
  `input_cost_per_token` set in `litellm_params`. Add it.
- **CORS error in the dashboard console**: set `LITELLM_CORS_ORIGINS=*`
  on the proxy (the default), or serve `dashboard.html` with
  `python3 -m http.server` instead of opening it from `file://`.

scripts/adaptive_router_demo/chat.html (new file, +838)
@@ -0,0 +1,838 @@
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Adaptive Router — Chat</title>
  <style>
    :root {
      --bg: #0b0f17;
      --panel: #131a26;
      --panel-2: #1b2433;
      --fg: #e7ecf3;
      --muted: #8a95a8;
      --accent: #5dd6a4;
      --accent-2: #6fb6ff;
      --warn: #f6b94d;
      --bad: #ff6b6b;
      --bar-bg: #233047;
      --border: #25324a;
    }
    * { box-sizing: border-box; }
    body {
      margin: 0;
      font-family: -apple-system, BlinkMacSystemFont, "SF Pro Display",
        "Segoe UI", Roboto, Inter, sans-serif;
      background: var(--bg);
      color: var(--fg);
      font-size: 14px;
      line-height: 1.45;
      height: 100vh;
      display: flex;
      flex-direction: column;
    }

    header {
      padding: 14px 24px;
      border-bottom: 1px solid var(--border);
      display: flex;
      align-items: center;
      gap: 16px;
      background: var(--panel);
      flex-shrink: 0;
    }
    header h1 { margin: 0; font-size: 17px; font-weight: 600; }
    .dot { width: 8px; height: 8px; border-radius: 50%; background: var(--bad); display: inline-block; margin-right: 6px; }
    .dot.ok { background: var(--accent); }
    .status { color: var(--muted); font-size: 12px; }
    .header-link { margin-left: auto; color: var(--accent-2); font-size: 12px; text-decoration: none; }
    .header-link:hover { text-decoration: underline; }

    .connect {
      padding: 12px 24px;
      display: flex; gap: 10px; align-items: center;
      background: var(--panel-2);
      border-bottom: 1px solid var(--border);
      flex-shrink: 0;
      flex-wrap: wrap;
    }
    .connect label { color: var(--muted); font-size: 12px; }
    .connect input, .connect select {
      background: #0e1422;
      color: var(--fg);
      border: 1px solid var(--border);
      padding: 6px 10px;
      border-radius: 6px;
      font-size: 13px;
      font-family: inherit;
    }
    .connect input[type=text] { width: 210px; }
    .connect input[type=password] { width: 180px; }
    .connect button.btn-connect {
      background: var(--accent-2);
      color: #0b0f17;
      border: none;
      padding: 7px 14px;
      border-radius: 6px;
      font-weight: 600;
      cursor: pointer;
      font-size: 13px;
    }

    /* scenario strip */
    .scenarios {
      padding: 10px 24px;
      display: flex; gap: 8px; flex-wrap: wrap;
      background: var(--panel);
      border-bottom: 1px solid var(--border);
      flex-shrink: 0;
    }
    .scenarios .sc-btn {
      background: var(--panel-2);
      border: 1px solid var(--border);
      color: var(--fg);
      padding: 7px 14px;
      border-radius: 20px;
      cursor: pointer;
      font-size: 13px;
      font-family: inherit;
      transition: border-color 0.15s, background 0.15s;
    }
    .scenarios .sc-btn:hover { border-color: var(--accent-2); background: #1e2d42; }
    .scenarios .sc-btn.active { border-color: var(--accent-2); background: #1a2d45; color: var(--accent-2); }
    .scenarios .sc-btn.new { border-color: var(--border); color: var(--muted); }
    .scenarios .sc-btn.new:hover { border-color: var(--warn); color: var(--warn); background: #1f1a10; }

    /* main layout */
    .workspace {
      display: grid;
      grid-template-columns: 1fr 300px;
      flex: 1;
      min-height: 0;
    }
    @media (max-width: 900px) {
      .workspace { grid-template-columns: 1fr; }
      .info-panel { display: none; }
    }

    /* chat panel */
    .chat-panel {
      display: flex;
      flex-direction: column;
      min-height: 0;
      border-right: 1px solid var(--border);
    }

    .messages {
      flex: 1;
      overflow-y: auto;
      padding: 20px 24px;
      display: flex;
      flex-direction: column;
      gap: 16px;
    }
    .messages .empty-state {
      margin: auto;
      text-align: center;
      color: var(--muted);
    }
    .messages .empty-state h2 {
      font-size: 18px;
      font-weight: 600;
      color: var(--fg);
      margin: 0 0 8px;
    }
    .messages .empty-state p {
      font-size: 13px;
      margin: 0;
      max-width: 360px;
    }

    .msg {
      display: flex;
      gap: 12px;
      align-items: flex-start;
    }
    .msg.assistant { flex-direction: row; }
    .msg.user { flex-direction: row-reverse; }

    .avatar {
      width: 28px; height: 28px;
      border-radius: 50%;
      display: flex; align-items: center; justify-content: center;
      font-size: 13px;
      flex-shrink: 0;
    }
    .msg.user .avatar { background: var(--accent-2); color: #0b0f17; font-weight: 700; }
    .msg.assistant .avatar { background: var(--accent); color: #0b0f17; }

    .bubble {
      max-width: 75%;
      padding: 10px 14px;
      border-radius: 12px;
      font-size: 13px;
      line-height: 1.55;
      white-space: pre-wrap;
      word-break: break-word;
    }
    .msg.user .bubble {
      background: #1a2d45;
      border: 1px solid var(--border);
      border-top-right-radius: 4px;
    }
    .msg.assistant .bubble {
      background: var(--panel);
      border: 1px solid var(--border);
      border-top-left-radius: 4px;
    }
    .bubble code {
      font-family: ui-monospace, "SF Mono", Menlo, monospace;
      background: rgba(255,255,255,0.06);
      padding: 1px 4px;
      border-radius: 3px;
      font-size: 12px;
    }
    .bubble pre {
      background: #0e1422;
      border: 1px solid var(--border);
      border-radius: 6px;
      padding: 10px 12px;
      overflow-x: auto;
      margin: 8px 0 0;
    }
    .bubble pre code {
      background: none;
      padding: 0;
      font-size: 12px;
    }

    .msg-meta {
      font-size: 11px;
      color: var(--muted);
      margin-top: 4px;
      padding: 0 2px;
    }
    .msg.user .msg-meta { text-align: right; }
    .msg-model { color: var(--accent); font-weight: 600; }
    .msg-type { color: var(--accent-2); }

    .thinking {
      display: flex; gap: 4px; align-items: center;
      padding: 8px 0;
    }
    .thinking span {
      width: 6px; height: 6px; border-radius: 50%;
      background: var(--muted);
      animation: blink 1.2s infinite;
    }
    .thinking span:nth-child(2) { animation-delay: 0.2s; }
    .thinking span:nth-child(3) { animation-delay: 0.4s; }
    @keyframes blink {
      0%, 80%, 100% { opacity: 0.2; }
      40% { opacity: 1; }
    }

    /* input area */
    .input-area {
      padding: 14px 24px;
      border-top: 1px solid var(--border);
      background: var(--panel);
      flex-shrink: 0;
      display: flex;
      flex-direction: column;
      gap: 8px;
    }
    .input-row {
      display: flex;
      gap: 10px;
      align-items: flex-end;
    }
    textarea {
      flex: 1;
      background: #0e1422;
      color: var(--fg);
      border: 1px solid var(--border);
      border-radius: 8px;
      padding: 10px 12px;
      font-size: 13px;
      font-family: inherit;
      resize: none;
      outline: none;
      min-height: 44px;
      max-height: 160px;
      line-height: 1.45;
    }
    textarea:focus { border-color: var(--accent-2); }
    textarea:disabled { opacity: 0.5; }
    .btn-send {
      background: var(--accent);
      color: #0b0f17;
      border: none;
      padding: 10px 18px;
      border-radius: 8px;
      font-weight: 700;
      font-size: 13px;
      cursor: pointer;
      font-family: inherit;
      flex-shrink: 0;
      align-self: flex-end;
    }
    .btn-send:disabled { opacity: 0.5; cursor: default; }

    .input-hint {
      font-size: 11px;
      color: var(--muted);
    }

    /* info panel */
    .info-panel {
      background: var(--panel);
      padding: 18px 16px;
      overflow-y: auto;
      display: flex;
      flex-direction: column;
      gap: 16px;
    }
    .info-block h3 {
      margin: 0 0 10px;
      font-size: 11px;
      text-transform: uppercase;
      letter-spacing: 0.6px;
      color: var(--muted);
      font-weight: 600;
    }
    .info-row {
      display: flex;
      justify-content: space-between;
      align-items: center;
      font-size: 12px;
      margin-bottom: 6px;
    }
    .info-row .label { color: var(--muted); }
    .info-row .val { font-family: ui-monospace, monospace; color: var(--fg); font-weight: 600; }
    .info-row .val.accent { color: var(--accent); }
    .info-row .val.accent-2 { color: var(--accent-2); }
    .info-row .val.warn { color: var(--warn); }

    .signal-gate {
      padding: 8px 10px;
      border-radius: 6px;
      font-size: 12px;
      line-height: 1.4;
    }
    .signal-gate.waiting {
      background: rgba(246, 185, 77, 0.08);
      border: 1px solid rgba(246, 185, 77, 0.3);
      color: var(--warn);
    }
    .signal-gate.learning {
      background: rgba(93, 214, 164, 0.08);
      border: 1px solid rgba(93, 214, 164, 0.3);
      color: var(--accent);
    }

    .scenario-desc {
      font-size: 12px;
      color: var(--muted);
      line-height: 1.5;
    }

    .divider {
      border: none;
      border-top: 1px solid var(--border);
      margin: 0;
    }

    .link-dash {
      display: block;
      text-align: center;
      color: var(--accent-2);
      font-size: 12px;
      text-decoration: none;
      padding: 8px;
      border: 1px solid var(--border);
      border-radius: 6px;
      margin-top: auto;
    }
    .link-dash:hover { border-color: var(--accent-2); background: #1a2d45; }
  </style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<header>
|
||||
<h1>⚡ Adaptive Router — Chat</h1>
|
||||
<span class="status"><span id="conn-dot" class="dot"></span><span id="conn-label">Disconnected</span></span>
|
||||
<a class="header-link" href="dashboard.html">→ Open live dashboard</a>
|
||||
</header>
|
||||
|
||||
<div class="connect">
|
||||
<label>Proxy URL <input id="proxy-url" type="text" value="http://localhost:4000" /></label>
|
||||
<label>API Key <input id="api-key" type="password" placeholder="sk-1234" /></label>
|
||||
<label>Router
|
||||
<input id="router" type="text" value="smart-cheap-router" style="width:160px" />
|
||||
</label>
|
||||
<button class="btn-connect" id="connect-btn">Connect</button>
|
||||
</div>
|
||||
|
||||
<div class="scenarios" id="scenario-bar">
|
||||
<button class="sc-btn" data-id="debug_code">🐛 Debug my code</button>
|
||||
<button class="sc-btn" data-id="brainstorm_feature">💡 Brainstorm a feature</button>
|
||||
<button class="sc-btn" data-id="explain_concept">📚 Explain a concept</button>
|
||||
<button class="sc-btn" data-id="write_something">✍️ Write something</button>
|
||||
<button class="sc-btn new" id="new-chat-btn">+ New chat</button>
|
||||
</div>
|
||||
|
||||
<div class="workspace">
|
||||
<div class="chat-panel">
|
||||
<div class="messages" id="messages">
|
||||
<div class="empty-state">
|
||||
<h2>Pick a scenario to start</h2>
|
||||
<p>Choose one of the presets above or connect to the proxy and type your own message. The adaptive router will pick the best model for each turn.</p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="input-area">
|
||||
<div class="input-row">
|
||||
<textarea id="input" rows="1" placeholder="Send a message… (Shift+Enter for new line)" disabled></textarea>
|
||||
<button class="btn-send" id="send-btn" disabled>Send</button>
|
||||
</div>
|
||||
<div class="input-hint" id="input-hint">Connect first to start chatting.</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<aside class="info-panel">
|
||||
<div class="info-block">
|
||||
<h3>Session</h3>
|
||||
<div class="info-row"><span class="label">ID</span><span class="val" id="info-session-id">—</span></div>
|
||||
<div class="info-row"><span class="label">Messages</span><span class="val" id="info-msg-count">0</span></div>
|
||||
<div class="info-row"><span class="label">Scenario</span><span class="val accent-2" id="info-scenario">none</span></div>
|
||||
</div>
|
||||
|
||||
<div id="gate-status" class="signal-gate waiting" style="display:none">
|
||||
⏳ <b>Learning starts at 4 messages.</b> Keep chatting — the router will start updating its bandit after your next reply.
|
||||
</div>
|
||||
|
||||
<hr class="divider" />
|
||||
|
||||
<div class="info-block">
|
||||
<h3>Last response</h3>
|
||||
<div class="info-row"><span class="label">Model picked</span><span class="val accent" id="info-model">—</span></div>
|
||||
<div class="info-row"><span class="label">Request type</span><span class="val accent-2" id="info-req-type">—</span></div>
|
||||
<div class="info-row"><span class="label">Latency</span><span class="val" id="info-latency">—</span></div>
|
||||
</div>
|
||||
|
||||
<hr class="divider" />
|
||||
|
||||
<div class="info-block">
|
||||
<h3>How it works</h3>
|
||||
<p class="scenario-desc">
|
||||
Each message goes through the <b>adaptive router</b>, which classifies your request type (code, writing, factual…) and uses a Thompson-sampling bandit to pick the model with the best balance of quality and cost for that category.<br><br>
|
||||
After 4+ messages, positive feedback signals (✓ in the dashboard's activity log) update the bandit. Watch the bars move in the <a href="dashboard.html" style="color:var(--accent-2)">live dashboard</a>.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<a class="link-dash" href="dashboard.html">📊 Live bandit dashboard →</a>
|
||||
</aside>
|
||||
</div>
|
||||
|
||||
<script>
|
||||
// ---- scenarios -------------------------------------------------------
|
||||
const SCENARIOS = {
|
||||
debug_code: {
|
||||
label: "Debug my code",
|
||||
system: "You are an expert debugging assistant. Be concise. Identify the bug, explain why it's wrong, and provide the corrected code.",
|
||||
starter: "I have a Python function that should return the sum of a list, but it always returns 0:\n\n```python\ndef sum_list(items):\n total = 0\n for item in items:\n total + item\n return total\n\nprint(sum_list([1, 2, 3])) # prints 0, expected 6\n```\n\nWhat's wrong with it?",
|
||||
},
|
||||
brainstorm_feature: {
|
||||
label: "Brainstorm a feature",
|
||||
system: "You are a product thinking partner. Help explore feature ideas with concrete examples, trade-offs, and implementation considerations. Be specific and opinionated.",
|
||||
starter: "I'm building a note-taking app for developers. What are 5 differentiated features that would make it stand out from Notion or Obsidian? Focus on things that would genuinely solve developer pain points.",
|
||||
},
|
||||
explain_concept: {
|
||||
label: "Explain a concept",
|
||||
system: "You are a clear, concise technical educator. Explain concepts with simple language, good analogies, and concrete examples. Avoid unnecessary jargon.",
|
||||
starter: "Can you explain how the Thompson Sampling algorithm works and why it's better than epsilon-greedy for multi-armed bandit problems? Use a concrete example if it helps.",
|
||||
},
|
||||
write_something: {
|
||||
label: "Write something",
|
||||
system: "You are a skilled writer. Produce clear, professional text tailored to the requested format and tone. Match the voice the user asks for.",
|
||||
starter: "Write a short Slack message to my team letting them know our weekly standup is moving from 9am to 10am starting next Monday. Keep it brief, friendly, and include a clear ask for them to update their calendars.",
|
||||
},
|
||||
};
|
||||
|
||||
// ---- state -----------------------------------------------------------
|
||||
const STATE = {
|
||||
connected: false,
|
||||
proxyUrl: "",
|
||||
apiKey: "",
|
||||
router: "",
|
||||
sessionId: "",
|
||||
messages: [], // [{role, content}] — sent to the API
|
||||
scenario: null,
|
||||
sending: false,
|
||||
msgCount: 0, // turns in current session
|
||||
lastModel: null,
|
||||
lastReqType: null,
|
||||
};
|
||||
|
||||
// ---- session ---------------------------------------------------------
|
||||
function newSession() {
|
||||
STATE.sessionId = "chat-" + Math.random().toString(36).slice(2, 10);
|
||||
STATE.messages = [];
|
||||
STATE.msgCount = 0;
|
||||
STATE.lastModel = null;
|
||||
STATE.lastReqType = null;
|
||||
renderInfo();
|
||||
renderGateStatus();
|
||||
}
|
||||
|
||||
// ---- persistence -----------------------------------------------------
|
||||
function ssGet(k) { try { return sessionStorage.getItem(k) || ""; } catch { return ""; } }
|
||||
function ssSet(k, v) { try { sessionStorage.setItem(k, v); } catch {} }
|
||||
|
||||
// ---- rendering -------------------------------------------------------
|
||||
function setConn(ok, label) {
|
||||
document.getElementById("conn-dot").className = "dot" + (ok ? " ok" : "");
|
||||
document.getElementById("conn-label").textContent = ok ? "Connected" : label || "Disconnected";
|
||||
}
|
||||
|
||||
function renderInfo() {
|
||||
const shortId = STATE.sessionId ? STATE.sessionId.slice(-8) : "—";
|
||||
document.getElementById("info-session-id").textContent = shortId;
|
||||
document.getElementById("info-msg-count").textContent = STATE.msgCount;
|
||||
document.getElementById("info-scenario").textContent = STATE.scenario ? SCENARIOS[STATE.scenario].label : "none";
|
||||
document.getElementById("info-model").textContent = STATE.lastModel || "—";
|
||||
document.getElementById("info-req-type").textContent = STATE.lastReqType || "—";
|
||||
}
|
||||
|
||||
function renderGateStatus() {
|
||||
const el = document.getElementById("gate-status");
|
||||
if (STATE.msgCount === 0) { el.style.display = "none"; return; }
|
||||
el.style.display = "";
|
||||
if (STATE.msgCount < 4) {
|
||||
el.className = "signal-gate waiting";
|
||||
el.innerHTML = `⏳ <b>${4 - STATE.msgCount} more message${4 - STATE.msgCount > 1 ? "s" : ""} until learning kicks in.</b> The bandit updates after 4+ turns.`;
|
||||
} else {
|
||||
el.className = "signal-gate learning";
|
||||
el.innerHTML = `✅ <b>Bandit is learning!</b> Each reply is now updating the router's model quality estimates.`;
|
||||
}
|
||||
}
|
||||
|
||||
function appendEmptyState() {
|
||||
const msgs = document.getElementById("messages");
|
||||
msgs.innerHTML = `<div class="empty-state">
|
||||
<h2>Pick a scenario to start</h2>
|
||||
<p>Choose one of the presets above or type your own message. The adaptive router picks the best model for each turn.</p>
|
||||
</div>`;
|
||||
}
|
||||
|
||||
function clearMessages() {
|
||||
document.getElementById("messages").innerHTML = "";
|
||||
}
|
||||
|
||||
function appendMessage(role, content, meta) {
|
||||
const msgs = document.getElementById("messages");
|
||||
const div = document.createElement("div");
|
||||
div.className = `msg ${role}`;
|
||||
div.dataset.role = role;
|
||||
|
||||
const avatar = document.createElement("div");
|
||||
avatar.className = "avatar";
|
||||
avatar.textContent = role === "user" ? "U" : "AI";
|
||||
|
||||
const wrap = document.createElement("div");
|
||||
|
||||
const bubble = document.createElement("div");
|
||||
bubble.className = "bubble";
|
||||
|
||||
renderBubble(bubble, content);
|
||||
|
||||
wrap.appendChild(bubble);
|
||||
|
||||
if (meta) {
|
||||
const metaEl = document.createElement("div");
|
||||
metaEl.className = "msg-meta";
|
||||
metaEl.innerHTML = meta;
|
||||
wrap.appendChild(metaEl);
|
||||
}
|
||||
|
||||
div.appendChild(avatar);
|
||||
div.appendChild(wrap);
|
||||
msgs.appendChild(div);
|
||||
msgs.scrollTop = msgs.scrollHeight;
|
||||
return bubble;
|
||||
}
|
||||
|
||||
function renderBubble(el, text) {
|
||||
// Minimal markdown: fenced code blocks and inline code.
|
||||
const escaped = text
|
||||
.replace(/&/g, "&amp;")
|
||||
.replace(/</g, "&lt;")
|
||||
.replace(/>/g, "&gt;");
|
||||
|
||||
const withBlocks = escaped.replace(
|
||||
/```(\w*)\n?([\s\S]*?)```/g,
|
||||
(_, lang, code) => `<pre><code>${code.trimEnd()}</code></pre>`
|
||||
);
|
||||
const withInline = withBlocks.replace(/`([^`]+)`/g, "<code>$1</code>");
|
||||
el.innerHTML = withInline;
|
||||
}
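// Sketch (hypothetical helper, not called anywhere on this page): the same
// minimal-markdown rules as renderBubble, written as a pure string function so
// the escaping can be unit-tested without a DOM. Escaping &, < and > BEFORE
// adding <pre>/<code> tags is what keeps model output from injecting markup.
function renderMarkdownLite(text) {
  const escaped = text
    .replace(/&/g, "&amp;") // must run first, or it would re-escape &lt; and &gt;
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // Fenced blocks: an optional language tag is accepted and ignored.
  const withBlocks = escaped.replace(
    /```(\w*)\n?([\s\S]*?)```/g,
    (_match, _lang, code) => `<pre><code>${code.trimEnd()}</code></pre>`
  );
  // Inline `code` spans.
  return withBlocks.replace(/`([^`]+)`/g, "<code>$1</code>");
}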
|
||||
|
||||
function appendThinking() {
|
||||
const msgs = document.getElementById("messages");
|
||||
const div = document.createElement("div");
|
||||
div.className = "msg assistant";
|
||||
div.id = "thinking-bubble";
|
||||
|
||||
const avatar = document.createElement("div");
|
||||
avatar.className = "avatar";
|
||||
avatar.textContent = "AI";
|
||||
|
||||
const bubble = document.createElement("div");
|
||||
bubble.className = "bubble";
|
||||
bubble.innerHTML = `<div class="thinking"><span></span><span></span><span></span></div>`;
|
||||
|
||||
div.appendChild(avatar);
|
||||
div.appendChild(bubble);
|
||||
msgs.appendChild(div);
|
||||
msgs.scrollTop = msgs.scrollHeight;
|
||||
return bubble;
|
||||
}
|
||||
|
||||
function removeThinking() {
|
||||
const el = document.getElementById("thinking-bubble");
|
||||
if (el) el.remove();
|
||||
}
|
||||
|
||||
// ---- chat send -------------------------------------------------------
|
||||
async function sendMessage(text) {
|
||||
if (STATE.sending || !text.trim()) return;
|
||||
STATE.sending = true;
|
||||
setSendEnabled(false);
|
||||
|
||||
// Add user message to history and UI
|
||||
STATE.messages.push({ role: "user", content: text });
|
||||
STATE.msgCount++;
|
||||
appendMessage("user", text);
|
||||
renderInfo();
|
||||
renderGateStatus();
|
||||
|
||||
const thinkingBubble = appendThinking();
|
||||
const t0 = Date.now();
|
||||
|
||||
try {
|
||||
const body = {
|
||||
model: STATE.router,
|
||||
messages: STATE.messages,
|
||||
stream: true,
|
||||
metadata: { litellm_session_id: STATE.sessionId },
|
||||
};
|
||||
|
||||
const resp = await fetch(`${STATE.proxyUrl}/v1/chat/completions`, {
|
||||
method: "POST",
|
||||
headers: {
|
||||
"Authorization": `Bearer ${STATE.apiKey}`,
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
body: JSON.stringify(body),
|
||||
});
|
||||
|
||||
if (!resp.ok) {
|
||||
const err = await resp.text().catch(() => `HTTP ${resp.status}`);
|
||||
removeThinking();
|
||||
appendMessage("assistant", `Error ${resp.status}: ${err}`);
|
||||
STATE.sending = false;
|
||||
setSendEnabled(true);
|
||||
return;
|
||||
}
|
||||
|
||||
// Read the routing headers. Browsers only expose them to this script if the proxy lists them in Access-Control-Expose-Headers (CORS).
|
||||
const chosenModel = resp.headers.get("x-litellm-adaptive-router-model");
|
||||
const reqType = resp.headers.get("x-litellm-request-type");
|
||||
STATE.lastModel = chosenModel || "check dashboard";
|
||||
STATE.lastReqType = reqType || "—";
|
||||
|
||||
// Stream the response.
|
||||
removeThinking();
|
||||
const assistantBubble = appendMessage("assistant", "");
|
||||
let fullContent = "";
|
||||
|
||||
const reader = resp.body.getReader();
|
||||
const decoder = new TextDecoder();
|
||||
let buffer = "";
|
||||
|
||||
while (true) {
|
||||
const { done, value } = await reader.read();
|
||||
if (done) break;
|
||||
buffer += decoder.decode(value, { stream: true });
|
||||
|
||||
// Process complete SSE lines.
|
||||
const lines = buffer.split("\n");
|
||||
buffer = lines.pop(); // last fragment may be incomplete
|
||||
|
||||
for (const line of lines) {
|
||||
if (!line.startsWith("data: ")) continue;
|
||||
const raw = line.slice(6).trim();
|
||||
if (raw === "[DONE]") break;
|
||||
try {
|
||||
const chunk = JSON.parse(raw);
|
||||
const delta = chunk.choices?.[0]?.delta?.content || "";
|
||||
fullContent += delta;
|
||||
renderBubble(assistantBubble, fullContent);
|
||||
document.getElementById("messages").scrollTop = document.getElementById("messages").scrollHeight;
|
||||
} catch { /* skip a malformed or non-JSON data line */ }
|
||||
}
|
||||
}
|
||||
|
||||
const latency = ((Date.now() - t0) / 1000).toFixed(2) + "s";
|
||||
document.getElementById("info-latency").textContent = latency;
|
||||
|
||||
// Add assistant turn meta
|
||||
const metaParts = [];
|
||||
if (chosenModel) metaParts.push(`<span class="msg-model">${chosenModel}</span>`);
|
||||
if (reqType) metaParts.push(`<span class="msg-type">${reqType}</span>`);
|
||||
metaParts.push(latency);
|
||||
if (metaParts.length) {
|
||||
const metaEl = document.createElement("div");
|
||||
metaEl.className = "msg-meta";
|
||||
metaEl.innerHTML = metaParts.join(" · ");
|
||||
assistantBubble.parentNode.appendChild(metaEl);
|
||||
}
|
||||
|
||||
STATE.messages.push({ role: "assistant", content: fullContent });
|
||||
STATE.msgCount++;
|
||||
renderInfo();
|
||||
renderGateStatus();
|
||||
} catch (e) {
|
||||
removeThinking();
|
||||
appendMessage("assistant", `Request failed: ${e.message}`);
|
||||
}
|
||||
|
||||
STATE.sending = false;
|
||||
setSendEnabled(true);
|
||||
}
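// Sketch (hypothetical helper, not called by sendMessage): the SSE buffering
// loop above, factored into a pure function so it can be tested offline. It
// assumes OpenAI-style "data: {json}" lines ending with "data: [DONE]", the
// same shape sendMessage expects. Feed it `rest + newlyDecodedText` each time.
function parseSseBuffer(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // trailing fragment may be an incomplete line
  let text = "";
  let done = false;
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue; // skip blank lines and other fields
    const raw = line.slice(6).trim();
    if (raw === "[DONE]") { done = true; break; }
    try {
      text += JSON.parse(raw).choices?.[0]?.delta?.content || "";
    } catch { /* skip a malformed data line */ }
  }
  return { text, rest, done };
}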
|
||||
|
||||
// ---- input controls --------------------------------------------------
|
||||
function setSendEnabled(enabled) {
|
||||
const ta = document.getElementById("input");
|
||||
const btn = document.getElementById("send-btn");
|
||||
ta.disabled = !enabled || !STATE.connected;
|
||||
btn.disabled = !enabled || !STATE.connected;
|
||||
}
|
||||
|
||||
function setHint(text) {
|
||||
document.getElementById("input-hint").textContent = text;
|
||||
}
|
||||
|
||||
// ---- scenario selection ----------------------------------------------
|
||||
function activateScenario(id) {
|
||||
STATE.scenario = id;
|
||||
document.querySelectorAll(".sc-btn[data-id]").forEach(b => {
|
||||
b.classList.toggle("active", b.dataset.id === id);
|
||||
});
|
||||
|
||||
newSession();
|
||||
clearMessages();
|
||||
|
||||
const s = SCENARIOS[id];
|
||||
if (s.system) STATE.messages.push({ role: "system", content: s.system });
|
||||
|
||||
const ta = document.getElementById("input");
|
||||
ta.value = s.starter;
|
||||
ta.style.height = "auto";
|
||||
ta.style.height = Math.min(ta.scrollHeight, 160) + "px";
|
||||
ta.focus();
|
||||
|
||||
renderInfo();
|
||||
setHint(`Scenario: "${s.label}". Edit the starter if you like, then hit Send.`);
|
||||
}
|
||||
|
||||
// ---- connect ---------------------------------------------------------
|
||||
function connect() {
|
||||
const url = document.getElementById("proxy-url").value.trim().replace(/\/$/, "");
|
||||
const key = document.getElementById("api-key").value.trim();
|
||||
const router = document.getElementById("router").value.trim();
|
||||
|
||||
if (!url || !key || !router) {
|
||||
alert("Please fill in Proxy URL, API Key, and Router name.");
|
||||
return;
|
||||
}
|
||||
|
||||
STATE.proxyUrl = url;
|
||||
STATE.apiKey = key;
|
||||
STATE.router = router;
|
||||
STATE.connected = true;
|
||||
|
||||
ssSet("ar_proxy_url", url);
|
||||
ssSet("ar_api_key", key);
|
||||
ssSet("ar_router", router);
|
||||
|
||||
setConn(true);
|
||||
setSendEnabled(true);
|
||||
setHint("Pick a scenario above or type your own message.");
|
||||
newSession();
|
||||
appendEmptyState();
|
||||
renderInfo();
|
||||
}
|
||||
|
||||
// ---- textarea auto-resize & keyboard submit --------------------------
|
||||
document.getElementById("input").addEventListener("input", function () {
|
||||
this.style.height = "auto";
|
||||
this.style.height = Math.min(this.scrollHeight, 160) + "px";
|
||||
});
|
||||
|
||||
document.getElementById("input").addEventListener("keydown", function (e) {
|
||||
if (e.key === "Enter" && !e.shiftKey) {
|
||||
e.preventDefault();
|
||||
doSend();
|
||||
}
|
||||
});
|
||||
|
||||
function doSend() {
|
||||
const ta = document.getElementById("input");
|
||||
const text = ta.value.trim();
|
||||
if (!text) return;
|
||||
ta.value = "";
|
||||
ta.style.height = "auto";
|
||||
sendMessage(text);
|
||||
}
|
||||
|
||||
// ---- wiring ----------------------------------------------------------
|
||||
document.getElementById("connect-btn").addEventListener("click", connect);
|
||||
document.getElementById("send-btn").addEventListener("click", doSend);
|
||||
document.getElementById("new-chat-btn").addEventListener("click", () => {
|
||||
STATE.scenario = null;
|
||||
document.querySelectorAll(".sc-btn[data-id]").forEach(b => b.classList.remove("active"));
|
||||
newSession();
|
||||
clearMessages();
|
||||
appendEmptyState();
|
||||
const ta = document.getElementById("input");
|
||||
ta.value = "";
|
||||
ta.style.height = "auto";
|
||||
setHint("Type anything — the router will classify it and pick the best model.");
|
||||
renderInfo();
|
||||
});
|
||||
|
||||
document.querySelectorAll(".sc-btn[data-id]").forEach(btn => {
|
||||
btn.addEventListener("click", () => {
|
||||
if (!STATE.connected) {
|
||||
alert("Connect to the proxy first (fill in the form above and click Connect).");
|
||||
return;
|
||||
}
|
||||
activateScenario(btn.dataset.id);
|
||||
});
|
||||
});
|
||||
|
||||
// ---- restore session storage -----------------------------------------
|
||||
window.addEventListener("DOMContentLoaded", () => {
|
||||
const u = ssGet("ar_proxy_url"); if (u) document.getElementById("proxy-url").value = u;
|
||||
const k = ssGet("ar_api_key"); if (k) document.getElementById("api-key").value = k;
|
||||
const r = ssGet("ar_router"); if (r) document.getElementById("router").value = r;
|
||||
newSession();
|
||||
renderInfo();
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
635
scripts/adaptive_router_demo/dashboard.html
Normal file
@ -0,0 +1,635 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8" />
|
||||
<title>Adaptive Router — Live</title>
|
||||
<style>
|
||||
:root {
|
||||
--bg: #0b0f17;
|
||||
--panel: #131a26;
|
||||
--panel-2: #1b2433;
|
||||
--fg: #e7ecf3;
|
||||
--muted: #8a95a8;
|
||||
--accent: #5dd6a4;
|
||||
--accent-2: #6fb6ff;
|
||||
--warn: #f6b94d;
|
||||
--bad: #ff6b6b;
|
||||
--bar-bg: #233047;
|
||||
--border: #25324a;
|
||||
}
|
||||
* { box-sizing: border-box; }
|
||||
body {
|
||||
margin: 0;
|
||||
font-family: -apple-system, BlinkMacSystemFont, "SF Pro Display",
|
||||
"Segoe UI", Roboto, Inter, sans-serif;
|
||||
background: var(--bg);
|
||||
color: var(--fg);
|
||||
font-size: 14px;
|
||||
line-height: 1.45;
|
||||
}
|
||||
header {
|
||||
padding: 18px 28px;
|
||||
border-bottom: 1px solid var(--border);
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 16px;
|
||||
background: var(--panel);
|
||||
}
|
||||
header h1 {
|
||||
margin: 0;
|
||||
font-size: 18px;
|
||||
font-weight: 600;
|
||||
letter-spacing: 0.2px;
|
||||
}
|
||||
header .dot {
|
||||
width: 8px; height: 8px; border-radius: 50%;
|
||||
background: var(--bad); display: inline-block; margin-right: 6px;
|
||||
}
|
||||
header .dot.ok { background: var(--accent); }
|
||||
header .status { color: var(--muted); font-size: 12px; }
|
||||
|
||||
.connect {
|
||||
padding: 16px 28px;
|
||||
display: flex; gap: 12px; align-items: center;
|
||||
background: var(--panel-2);
|
||||
border-bottom: 1px solid var(--border);
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
.connect input, .connect select {
|
||||
background: #0e1422;
|
||||
color: var(--fg);
|
||||
border: 1px solid var(--border);
|
||||
padding: 7px 10px;
|
||||
border-radius: 6px;
|
||||
font-size: 13px;
|
||||
font-family: inherit;
|
||||
}
|
||||
.connect input[type=text] { width: 240px; }
|
||||
.connect input[type=password] { width: 200px; }
|
||||
.connect input[type=number] { width: 70px; }
|
||||
.connect button {
|
||||
background: var(--accent-2);
|
||||
color: #0b0f17;
|
||||
border: none;
|
||||
padding: 7px 14px;
|
||||
border-radius: 6px;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
}
|
||||
.connect label { color: var(--muted); font-size: 12px; }
|
||||
|
||||
main {
|
||||
display: grid;
|
||||
grid-template-columns: 1fr 360px;
|
||||
gap: 20px;
|
||||
padding: 20px 28px 40px;
|
||||
max-width: 1400px;
|
||||
}
|
||||
@media (max-width: 1000px) { main { grid-template-columns: 1fr; } }
|
||||
|
||||
.panel {
|
||||
background: var(--panel);
|
||||
border: 1px solid var(--border);
|
||||
border-radius: 10px;
|
||||
padding: 18px 20px;
|
||||
}
|
||||
.panel h2 {
|
||||
margin: 0 0 14px;
|
||||
font-size: 13px;
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.6px;
|
||||
color: var(--muted);
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
.rt-group {
|
||||
margin-bottom: 18px;
|
||||
padding-bottom: 14px;
|
||||
border-bottom: 1px dashed var(--border);
|
||||
}
|
||||
.rt-group:last-child { border-bottom: none; margin-bottom: 0; }
|
||||
|
||||
.rt-title {
|
||||
font-weight: 600;
|
||||
font-size: 13px;
|
||||
color: var(--fg);
|
||||
margin-bottom: 8px;
|
||||
display: flex; justify-content: space-between;
|
||||
}
|
||||
.rt-title .meta { color: var(--muted); font-weight: 400; font-size: 12px; }
|
||||
|
||||
.row {
|
||||
display: grid;
|
||||
grid-template-columns: 80px 1fr 130px;
|
||||
align-items: center;
|
||||
gap: 10px;
|
||||
margin-bottom: 4px;
|
||||
font-size: 13px;
|
||||
}
|
||||
.row .name { color: var(--muted); font-family: ui-monospace, monospace; }
|
||||
.row .name.lead { color: var(--accent); font-weight: 600; }
|
||||
.row .num { color: var(--muted); font-family: ui-monospace, monospace;
|
||||
text-align: right; font-size: 12px; }
|
||||
|
||||
.bar { height: 14px; background: var(--bar-bg); border-radius: 4px;
|
||||
position: relative; overflow: hidden; }
|
||||
.bar > .fill {
|
||||
height: 100%;
|
||||
background: linear-gradient(90deg, var(--accent), var(--accent-2));
|
||||
border-radius: 4px;
|
||||
transition: width 0.4s ease;
|
||||
}
|
||||
.bar > .conf {
|
||||
position: absolute; top: 0; bottom: 0; width: 1px;
|
||||
background: rgba(255,255,255,0.4);
|
||||
}
|
||||
.bar > .conf.lo { background: rgba(255,255,255,0.5); }
|
||||
.bar > .conf.hi { background: rgba(255,255,255,0.5); }
|
||||
|
||||
.pick-pct {
|
||||
margin-top: 4px;
|
||||
font-size: 11px;
|
||||
color: var(--muted);
|
||||
padding-left: 90px;
|
||||
}
|
||||
|
||||
.cost-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 12px; }
|
||||
.cost-card {
|
||||
background: var(--panel-2);
|
||||
border-radius: 8px;
|
||||
padding: 14px 16px;
|
||||
}
|
||||
.cost-card .label { color: var(--muted); font-size: 11px;
|
||||
text-transform: uppercase; letter-spacing: 0.5px; }
|
||||
.cost-card .value { font-size: 22px; font-weight: 600; margin-top: 4px;
|
||||
font-family: ui-monospace, monospace; }
|
||||
.cost-card .value.big { font-size: 28px; }
|
||||
.cost-card .sub { color: var(--muted); font-size: 12px; margin-top: 2px; }
|
||||
.cost-card.good { border: 1px solid rgba(93, 214, 164, 0.35); }
|
||||
.cost-card.warn { border: 1px solid rgba(246, 185, 77, 0.35); }
|
||||
.cost-card.bad { border: 1px solid rgba(255, 107, 107, 0.35); }
|
||||
.savings {
|
||||
margin-top: 12px;
|
||||
padding: 10px 14px;
|
||||
background: rgba(93, 214, 164, 0.08);
|
||||
border: 1px solid rgba(93, 214, 164, 0.3);
|
||||
border-radius: 8px;
|
||||
color: var(--accent);
|
||||
font-weight: 600;
|
||||
font-size: 14px;
|
||||
}
|
||||
.savings.warn {
|
||||
background: rgba(246, 185, 77, 0.08);
|
||||
border-color: rgba(246, 185, 77, 0.3);
|
||||
color: var(--warn);
|
||||
}
|
||||
.savings.bad {
|
||||
background: rgba(255, 107, 107, 0.08);
|
||||
border-color: rgba(255, 107, 107, 0.3);
|
||||
color: var(--bad);
|
||||
}
|
||||
.savings .verdict-sub {
|
||||
display: block;
|
||||
color: var(--muted);
|
||||
font-weight: 400;
|
||||
font-size: 12px;
|
||||
margin-top: 4px;
|
||||
}
|
||||
.panel-explainer {
|
||||
margin: -8px 0 14px;
|
||||
color: var(--muted);
|
||||
font-size: 12px;
|
||||
line-height: 1.5;
|
||||
padding: 10px 12px;
|
||||
background: var(--panel-2);
|
||||
border-radius: 6px;
|
||||
border-left: 3px solid var(--accent-2);
|
||||
}
|
||||
.panel-explainer b { color: var(--fg); font-weight: 600; }
|
||||
|
||||
.activity {
|
||||
max-height: 360px; overflow-y: auto;
|
||||
font-family: ui-monospace, monospace;
|
||||
font-size: 12px;
|
||||
}
|
||||
.activity-row {
|
||||
padding: 6px 8px;
|
||||
border-bottom: 1px solid var(--border);
|
||||
color: var(--muted);
|
||||
display: grid;
|
||||
grid-template-columns: 70px 1fr;
|
||||
gap: 8px;
|
||||
}
|
||||
.activity-row .ts { color: #4f5d77; }
|
||||
.activity-row .alpha { color: var(--accent); }
|
||||
.activity-row .beta { color: var(--bad); }
|
||||
|
||||
.queue {
|
||||
display: flex; gap: 16px; flex-wrap: wrap;
|
||||
font-size: 12px; color: var(--muted);
|
||||
margin-top: 8px;
|
||||
}
|
||||
.queue span b { color: var(--fg); font-weight: 600; }
|
||||
|
||||
.empty {
|
||||
color: var(--muted); font-style: italic;
|
||||
text-align: center; padding: 20px;
|
||||
}
|
||||
|
||||
.pill {
|
||||
background: var(--panel-2);
|
||||
color: var(--muted);
|
||||
padding: 3px 8px;
|
||||
border-radius: 12px;
|
||||
font-size: 11px;
|
||||
}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<header>
|
||||
<h1>⚡ Adaptive Router — Live</h1>
|
||||
<span class="status"><span id="conn-dot" class="dot"></span><span id="conn-label">Disconnected</span></span>
|
||||
<span id="poll-info" class="status"></span>
|
||||
</header>
|
||||
|
||||
<div class="connect">
|
||||
<label>Proxy URL <input id="proxy-url" type="text" value="http://localhost:4000" /></label>
|
||||
<label>Master Key <input id="api-key" type="password" placeholder="sk-1234" /></label>
|
||||
<label>Avg tokens/req <input id="avg-tokens" type="number" value="500" min="1" /></label>
|
||||
<label>Poll ms <input id="poll-ms" type="number" value="500" min="100" /></label>
|
||||
<button id="connect-btn">Connect</button>
|
||||
<select id="router-select" style="display:none;"></select>
|
||||
</div>
|
||||
|
||||
<main>
|
||||
<section class="panel" id="bandit-panel">
|
||||
<h2>How well each model performs, by request type</h2>
|
||||
<div class="panel-explainer">
|
||||
Each bar shows the <b>fraction of recent feedback that was positive</b>
|
||||
for that model on that kind of request. Wider = better. The number
|
||||
next to it (<b>"N signals"</b>) is how much real feedback the bar is
|
||||
based on — more signals means the router is more confident.
|
||||
It picks higher-quality bars first, with cost as a tiebreaker.
|
||||
</div>
|
||||
<div id="cells" class="empty">Connect to see live bandit state.</div>
|
||||
</section>
|
||||
|
||||
<aside style="display:flex; flex-direction:column; gap:20px;">
|
||||
<section class="panel">
|
||||
<h2>Are the savings worth it?</h2>
|
||||
<div class="panel-explainer">
|
||||
<b>Cost saved</b> compares what you spent with what always picking the most
|
||||
expensive model would have cost. <b>Quality kept</b> is the average quality of
|
||||
the model that was actually picked, divided by the average
|
||||
quality of the best-known model for each request type.
|
||||
<i>If quality kept stays high while cost saved is high, you're
|
||||
winning. If quality drops fast, you're saving money but making
|
||||
users mad.</i>
|
||||
</div>
|
||||
<div class="cost-grid">
|
||||
<div class="cost-card good">
|
||||
<div class="label">💰 Cost saved</div>
|
||||
<div class="value big" id="metric-cost-pct">—</div>
|
||||
<div class="sub" id="metric-cost-sub">no traffic yet</div>
|
||||
</div>
|
||||
<div class="cost-card good">
|
||||
<div class="label">⭐ Quality kept</div>
|
||||
<div class="value big" id="metric-quality-pct">—</div>
|
||||
<div class="sub" id="metric-quality-sub">vs best-known model</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="savings" id="verdict">Send some traffic to see how the router is balancing cost and quality.</div>
|
||||
<div class="queue" id="queue-info"></div>
|
||||
</section>
|
||||
|
||||
<section class="panel">
|
||||
<h2>Activity (last 30)</h2>
|
||||
<div id="activity" class="activity empty">Waiting for signals…</div>
|
||||
</section>
|
||||
</aside>
|
||||
</main>
|
||||
|
||||
<script>
|
||||
// ---- helpers ---------------------------------------------------------
|
||||
const REQ_TYPE_ORDER = [
|
||||
"code_generation",
|
||||
"code_understanding",
|
||||
"technical_design",
|
||||
"analytical_reasoning",
|
||||
"writing",
|
||||
"factual_lookup",
|
||||
"general",
|
||||
];
|
||||
|
||||
function fmtPct(n) { return (n * 100).toFixed(0) + "%"; }
|
||||
function fmtUSD(n) { return "$" + n.toFixed(4); }
|
||||
function nowTS() {
|
||||
const d = new Date();
|
||||
return d.toTimeString().slice(0, 8);
|
||||
}
|
||||
function ssGet(k) { try { return sessionStorage.getItem(k) || ""; } catch { return ""; } }
|
||||
function ssSet(k, v) { try { sessionStorage.setItem(k, v); } catch {} }
|
||||
|
||||
// ---- state -----------------------------------------------------------
|
||||
const STATE = {
|
||||
proxyUrl: "",
|
||||
apiKey: "",
|
||||
pollMs: 500,
|
||||
avgTokens: 500,
|
||||
timer: null,
|
||||
routers: [], // last snapshot list
|
||||
selectedRouter: null, // name
|
||||
prevCells: new Map(), // (router, rt, model) -> {alpha,beta,samples}
|
||||
costAdaptive: 0,
|
||||
costBaseline: 0,
|
||||
totalRequests: 0,
|
||||
activity: [], // [{ts, msg, kind}]
|
||||
};
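// Sketch (hypothetical helper; the dashboard's own metric code may differ):
// the two headline numbers from the "Are the savings worth it?" panel.
//   cost saved   = 1 - spend / (spend if every request used the priciest model)
//   quality kept = mean(quality of picked model) / mean(quality of best-known model)
// ASSUMPTION: per-request spend is approximated as avgTokens * cost-per-token,
// and each entry of `picks` carries these hypothetical fields.
function savingsSummary(picks, avgTokens) {
  let spent = 0, baseline = 0, quality = 0, bestQuality = 0;
  for (const p of picks) {
    spent += avgTokens * p.pickedCostPerToken;
    baseline += avgTokens * p.maxCostPerToken;
    quality += p.pickedQuality;
    bestQuality += p.bestQuality;
  }
  return {
    spentUSD: spent,
    baselineUSD: baseline,
    // Ratios of sums equal ratios of means because both use the same count.
    costSaved: baseline > 0 ? 1 - spent / baseline : 0,
    qualityKept: bestQuality > 0 ? quality / bestQuality : 0,
  };
}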
|
||||
|
||||
// ---- rendering -------------------------------------------------------
function renderRouters(routers) {
  const sel = document.getElementById("router-select");
  if (routers.length <= 1) {
    sel.style.display = "none";
  } else {
    sel.style.display = "";
    if (sel.options.length !== routers.length) {
      sel.innerHTML = "";
      for (const r of routers) {
        const opt = document.createElement("option");
        opt.value = r.router_name; opt.textContent = r.router_name;
        sel.appendChild(opt);
      }
      sel.value = STATE.selectedRouter || routers[0].router_name;
    }
  }
  // With zero routers there is nothing to select; without this guard,
  // routers[0] throws and pollOnce reports a spurious "Disconnected".
  if (!STATE.selectedRouter && routers.length > 0) {
    STATE.selectedRouter = routers[0].router_name;
  }
}

function pickShare(rows) {
  // Approximate prob each model wins a Thompson-sample draw against the others.
  // Simple proxy: softmax over quality_mean with temperature=0.05. It ignores
  // how many signals back each mean, so it can over- or under-state the leader.
  if (rows.length === 0) return {};
  const T = 0.05;
  const expv = rows.map(r => Math.exp(r.quality_mean / T));
  const sum = expv.reduce((a, b) => a + b, 0);
  const out = {};
  rows.forEach((r, i) => out[r.model] = expv[i] / sum);
  return out;
}

function renderCells(router) {
  const container = document.getElementById("cells");
  container.classList.remove("empty");
  const byType = new Map();
  for (const c of router.cells) {
    if (!byType.has(c.request_type)) byType.set(c.request_type, []);
    byType.get(c.request_type).push(c);
  }
  const order = REQ_TYPE_ORDER.filter(t => byType.has(t));
  for (const t of byType.keys()) if (!order.includes(t)) order.push(t);

  let html = "";
  for (const rt of order) {
    const rows = byType.get(rt).sort((a, b) => b.quality_mean - a.quality_mean);
    const shares = pickShare(rows);
    const lead = rows[0];
    // Strip the cold-start prior mass (alpha=5 + beta=5 = 10) per cell, clamped at 0.
    const totalSignals = rows.reduce((s, r) => s + Math.max(0, r.samples - 10), 0);
    html += `<div class="rt-group">`;
    html += `<div class="rt-title"><span>${rt}</span>`;
    html += `<span class="meta">${totalSignals.toFixed(0)} learning signals</span>`;
    html += `</div>`;
    for (const r of rows) {
      const pct = fmtPct(r.quality_mean);
      const share = fmtPct(shares[r.model] || 0);
      const observed = Math.max(0, r.samples - 10); // strip cold-start prior mass
      const isLead = r.model === lead.model;
      const sigLabel = observed === 0 ? "no signals yet" : `${observed.toFixed(0)} signals`;
      // Tooltip exposes raw Beta(α,β) for power users.
      const tip = `Beta(α=${r.alpha.toFixed(1)}, β=${r.beta.toFixed(1)}). ` +
        `Started at α=5, β=5 (cold-start prior), so the bar reflects ` +
        `${observed.toFixed(0)} real feedback signals so far.`;
      html += `<div class="row" title="${tip}">`;
      html += `<div class="name ${isLead ? 'lead' : ''}">${r.model}</div>`;
      html += `<div class="bar"><div class="fill" style="width:${(r.quality_mean*100).toFixed(1)}%"></div></div>`;
      html += `<div class="num">${pct} good · ${sigLabel}</div>`;
      html += `</div>`;
      html += `<div class="pick-pct">→ ${share} of picks (Thompson softmax estimate)</div>`;
    }
    html += `</div>`;
  }
  container.innerHTML = html;
}

function computeQualityKept(router) {
  // For each request type: pick_count_per_cell × quality_mean_per_cell,
  // summed and divided by total picks, gives "average quality delivered".
  // Compare against best-cell quality per type (weighted by picks in that type).
  // "picks" here is the per-cell signal count (samples minus the 10-sample
  // prior), used as a proxy for how often each cell was chosen.
  const byType = new Map();
  for (const c of router.cells) {
    if (!byType.has(c.request_type)) byType.set(c.request_type, []);
    byType.get(c.request_type).push(c);
  }
  let totalPicks = 0, weightedDelivered = 0, weightedBest = 0;
  for (const cells of byType.values()) {
    const bestQ = Math.max(...cells.map(c => c.quality_mean));
    for (const c of cells) {
      const picks = Math.max(0, c.samples - 10);
      if (picks === 0) continue;
      totalPicks += picks;
      weightedDelivered += picks * c.quality_mean;
      weightedBest += picks * bestQ;
    }
  }
  if (totalPicks === 0 || weightedBest === 0) return null;
  return {
    delivered: weightedDelivered / totalPicks,
    best: weightedBest / totalPicks,
    keptPct: weightedDelivered / weightedBest,
    totalPicks,
  };
}

function renderTradeoff(router) {
  const costEl = document.getElementById("metric-cost-pct");
  const costSub = document.getElementById("metric-cost-sub");
  const qualEl = document.getElementById("metric-quality-pct");
  const qualSub = document.getElementById("metric-quality-sub");
  const verdict = document.getElementById("verdict");

  // ---- Cost side ---------------------------------------------------
  let costSavedPct = null;
  if (STATE.costBaseline > 0) {
    costSavedPct = 1 - STATE.costAdaptive / STATE.costBaseline;
    costEl.textContent = (costSavedPct * 100).toFixed(0) + "%";
    costSub.textContent = `${fmtUSD(STATE.costAdaptive)} spent vs ${fmtUSD(STATE.costBaseline)} baseline`;
  } else {
    costEl.textContent = "—";
    costSub.textContent = "no traffic yet";
  }

  // ---- Quality side ------------------------------------------------
  const q = computeQualityKept(router);
  if (q) {
    qualEl.textContent = (q.keptPct * 100).toFixed(0) + "%";
    qualSub.textContent =
      `delivered ${(q.delivered*100).toFixed(0)}% vs best-known ${(q.best*100).toFixed(0)}%`;
  } else {
    qualEl.textContent = "—";
    qualSub.textContent = "vs best-known model";
  }

  // ---- Color-code the cards ----------------------------------------
  const costCard = costEl.closest(".cost-card");
  const qualCard = qualEl.closest(".cost-card");
  costCard.className = "cost-card " + (costSavedPct === null ? "good"
    : costSavedPct >= 0.30 ? "good"
    : costSavedPct >= 0.05 ? "warn" : "bad");
  qualCard.className = "cost-card " + (!q ? "good"
    : q.keptPct >= 0.90 ? "good"
    : q.keptPct >= 0.75 ? "warn" : "bad");

  // ---- Verdict line ------------------------------------------------
  if (costSavedPct === null || !q) {
    verdict.className = "savings";
    verdict.textContent = "Send some traffic to see how the router is balancing cost and quality.";
    return;
  }
  const savedTxt = costSavedPct >= 0
    ? `${(costSavedPct*100).toFixed(0)}% cheaper`
    : `${((-costSavedPct)*100).toFixed(0)}% MORE expensive (still exploring)`;
  const qualLost = (1 - q.keptPct) * 100;
  let line, cls;
  if (q.keptPct >= 0.95 && costSavedPct >= 0.30) {
    cls = "savings"; line = `✅ Big win: ${savedTxt}, lost only ${qualLost.toFixed(0)}% quality.`;
  } else if (q.keptPct >= 0.85 && costSavedPct >= 0.10) {
    cls = "savings"; line = `✅ Good trade: ${savedTxt}, gave up ${qualLost.toFixed(0)}% quality.`;
  } else if (q.keptPct >= 0.75) {
    cls = "savings warn"; line = `⚠️ Mixed: ${savedTxt}, but ${qualLost.toFixed(0)}% quality lost. Consider raising the quality weight.`;
  } else {
    cls = "savings bad"; line = `❌ Saving money, hurting users: ${savedTxt} but ${qualLost.toFixed(0)}% quality lost. Raise quality weight in the router config.`;
  }
  verdict.className = cls;
  verdict.innerHTML = line +
    `<span class="verdict-sub">Based on ${q.totalPicks} feedback signals across ${STATE.totalRequests} routed requests.</span>`;
}

function renderQueue(router) {
  const q = router.queue || {};
  document.getElementById("queue-info").innerHTML =
    `<span>state pending: <b>${q.state_pending ?? 0}</b></span>` +
    `<span>session pending: <b>${q.session_pending ?? 0}</b></span>` +
    `<span>sticky live: <b>${router.sticky_sessions_live ?? 0}</b></span>` +
    `<span>weights: q=<b>${router.weights?.quality ?? "?"}</b> c=<b>${router.weights?.cost ?? "?"}</b></span>`;
}

function renderActivity() {
  const el = document.getElementById("activity");
  if (STATE.activity.length === 0) {
    el.classList.add("empty");
    el.textContent = "Waiting for signals…";
    return;
  }
  el.classList.remove("empty");
  el.innerHTML = STATE.activity.map(a =>
    `<div class="activity-row"><span class="ts">${a.ts}</span><span>${a.msg}</span></div>`
  ).join("");
}

// ---- diff & cost accounting -----------------------------------------
function processDiff(router, costsByModel) {
  // Cost estimate: each new sample counts as one request of STATE.avgTokens
  // tokens. The baseline prices the same traffic at the most expensive model.
  const maxCost = Math.max(0, ...Object.values(costsByModel));
  for (const c of router.cells) {
    const key = `${router.router_name}|${c.request_type}|${c.model}`;
    const prev = STATE.prevCells.get(key);
    if (prev) {
      const dA = c.alpha - prev.alpha;
      const dB = c.beta - prev.beta;
      const dPicks = c.samples - prev.samples; // the 10-sample prior cancels out
      if (dA > 0.001 || dB > 0.001) {
        const tag = dA > dB
          ? `<span class="alpha">+${dA.toFixed(0)} 👍</span>`
          : `<span class="beta">+${dB.toFixed(0)} 👎</span>`;
        const qNow = (c.alpha / (c.alpha + c.beta) * 100).toFixed(0);
        STATE.activity.unshift({
          ts: nowTS(),
          msg: `${c.request_type} → <b>${c.model}</b> ${tag} (now ${qNow}% good)`,
        });
        STATE.activity = STATE.activity.slice(0, 30);
      }
      if (dPicks > 0) {
        const cost = costsByModel[c.model] || 0;
        STATE.costAdaptive += dPicks * cost * STATE.avgTokens;
        STATE.costBaseline += dPicks * maxCost * STATE.avgTokens;
        STATE.totalRequests += dPicks;
      }
    }
    STATE.prevCells.set(key, {
      alpha: c.alpha, beta: c.beta, samples: c.samples
    });
  }
}

// ---- polling ---------------------------------------------------------
async function pollOnce() {
  try {
    const r = await fetch(`${STATE.proxyUrl}/adaptive_router/state`, {
      headers: { "Authorization": `Bearer ${STATE.apiKey}` },
    });
    if (!r.ok) {
      setConn(false, `HTTP ${r.status}`);
      return;
    }
    const data = await r.json();
    setConn(true, `Polling every ${STATE.pollMs}ms`);
    STATE.routers = data.routers || [];
    renderRouters(STATE.routers);
    const router = STATE.routers.find(r => r.router_name === STATE.selectedRouter)
      || STATE.routers[0];
    if (!router) return;
    processDiff(router, router.model_costs || {});
    renderCells(router);
    renderQueue(router);
    renderTradeoff(router);
    renderActivity();
  } catch (e) {
    setConn(false, e.message);
  }
}

function setConn(ok, msg) {
  document.getElementById("conn-dot").className = "dot" + (ok ? " ok" : "");
  document.getElementById("conn-label").textContent = ok ? "Connected" : "Disconnected";
  document.getElementById("poll-info").textContent = msg || "";
}

function startPolling() {
  if (STATE.timer) clearInterval(STATE.timer);
  pollOnce();
  STATE.timer = setInterval(pollOnce, STATE.pollMs);
}

// ---- wiring ----------------------------------------------------------
document.getElementById("connect-btn").addEventListener("click", () => {
  STATE.proxyUrl = document.getElementById("proxy-url").value.trim().replace(/\/$/, "");
  STATE.apiKey = document.getElementById("api-key").value.trim();
  STATE.pollMs = parseInt(document.getElementById("poll-ms").value, 10) || 500;
  STATE.avgTokens = parseInt(document.getElementById("avg-tokens").value, 10) || 500;
  ssSet("ar_proxy_url", STATE.proxyUrl);
  ssSet("ar_api_key", STATE.apiKey);
  startPolling();
});

document.getElementById("router-select").addEventListener("change", (e) => {
  STATE.selectedRouter = e.target.value;
  STATE.prevCells.clear();
});

window.addEventListener("DOMContentLoaded", () => {
  const u = ssGet("ar_proxy_url"); if (u) document.getElementById("proxy-url").value = u;
  const k = ssGet("ar_api_key"); if (k) document.getElementById("api-key").value = k;
});
</script>

</body>
</html>
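The dashboard's `pickShare` labels its output a "Thompson softmax estimate": a softmax over each cell's `quality_mean` at temperature 0.05, which ignores how many signals back each mean. For comparison, here is a minimal Python sketch that estimates the actual Thompson-sampling win probability by Monte Carlo. It assumes each cell is a Beta(α, β) posterior and that the model with the highest single draw wins; the real router also weighs cost, which this quality-only sketch ignores. `thompson_share` and `softmax_share` are hypothetical helpers, not part of the dashboard.

```python
import math
import random
from typing import Dict, Tuple


def thompson_share(
    cells: Dict[str, Tuple[float, float]], draws: int = 20_000, seed: int = 0
) -> Dict[str, float]:
    """Monte Carlo estimate of P(model wins one Thompson draw).

    `cells` maps model name -> (alpha, beta) of its Beta posterior.
    Quality-only: the real router also weighs cost, which this ignores.
    """
    rng = random.Random(seed)
    wins = {m: 0 for m in cells}
    for _ in range(draws):
        # One posterior sample per model; the highest sample wins this round.
        samples = {m: rng.betavariate(a, b) for m, (a, b) in cells.items()}
        wins[max(samples, key=samples.get)] += 1
    return {m: w / draws for m, w in wins.items()}


def softmax_share(
    cells: Dict[str, Tuple[float, float]], temp: float = 0.05
) -> Dict[str, float]:
    """Same proxy as the dashboard's pickShare: softmax over posterior means."""
    means = {m: a / (a + b) for m, (a, b) in cells.items()}
    top = max(means.values())  # subtract the max for numerical stability
    exps = {m: math.exp((q - top) / temp) for m, q in means.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


if __name__ == "__main__":
    # Same posterior means (0.6 vs 0.4), different amounts of evidence.
    few = {"smart": (9.0, 6.0), "fast": (6.0, 9.0)}
    many = {"smart": (90.0, 60.0), "fast": (60.0, 90.0)}
    for name, cells in (("few signals", few), ("many signals", many)):
        print(name, "thompson:", thompson_share(cells), "softmax:", softmax_share(cells))
```

With equal means the softmax gives the same split regardless of evidence, while the Monte Carlo share comes out below the softmax figure when posteriors are wide (few signals) and above it when they are narrow (many signals).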
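Both headline metrics on the dashboard come from the same per-cell count: `samples` minus the 10-sample cold-start prior. Here is a Python sketch of the arithmetic behind `computeQualityKept` and the cost accounting in `processDiff`, computed from a single snapshot instead of incremental diffs (which matches the dashboard's running totals only if it has been polling since every cell sat at its prior). The `Cell` dataclass and both function names are hypothetical; the field names mirror what the dashboard reads from `/adaptive_router/state`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

PRIOR_MASS = 10  # alpha=5 + beta=5 cold-start prior, as the dashboard assumes


@dataclass
class Cell:
    request_type: str
    model: str
    samples: float       # includes the prior mass
    quality_mean: float  # alpha / (alpha + beta)


def cost_saved(
    cells: List[Cell], cost_per_token: Dict[str, float], avg_tokens: int = 500
) -> Optional[float]:
    """1 - adaptive/baseline; the baseline sends the same traffic to the priciest model."""
    max_cost = max(cost_per_token.values(), default=0.0)
    adaptive = baseline = 0.0
    for c in cells:
        n = max(0.0, c.samples - PRIOR_MASS)
        adaptive += n * cost_per_token.get(c.model, 0.0) * avg_tokens
        baseline += n * max_cost * avg_tokens
    return None if baseline == 0 else 1 - adaptive / baseline


def quality_kept(cells: List[Cell]) -> Optional[Tuple[float, float, float]]:
    """Returns (delivered, best, kept), each weighted by observed signals per cell."""
    by_type: Dict[str, List[Cell]] = defaultdict(list)
    for c in cells:
        by_type[c.request_type].append(c)
    total = delivered = best = 0.0
    for group in by_type.values():
        best_q = max(c.quality_mean for c in group)  # best-known model for this type
        for c in group:
            n = max(0.0, c.samples - PRIOR_MASS)
            total += n
            delivered += n * c.quality_mean
            best += n * best_q
    if total == 0 or best == 0:
        return None
    return delivered / total, best / total, delivered / best
```

Because `adaptive` and `baseline` are both scaled by `avg_tokens`, it cancels out of the percentage; it only changes the dollar amounts shown next to it.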
271
scripts/adaptive_router_demo/eval.py
Normal file
@@ -0,0 +1,271 @@
|
||||
# ruff: noqa: T201
|
||||
"""
|
||||
Adaptive router evaluator — LLM-as-judge harness.
|
||||
|
||||
For each test case:
|
||||
1. Sends the prompt to the adaptive router.
|
||||
2. Reads which model was picked (x-litellm-adaptive-router-model header).
|
||||
3. Asks the judge model whether the response meets the ideal criteria.
|
||||
4. Prints PASS or FAIL with one line of reasoning.
|
||||
|
||||
Run:
|
||||
uv run python scripts/adaptive_router_demo/eval.py \
|
||||
--proxy-url http://localhost:4000 \
|
||||
--api-key sk-1234 \
|
||||
--router smart-cheap-router \
|
||||
--judge-model smart
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import sys
|
||||
import uuid
|
||||
from dataclasses import dataclass
|
||||
from typing import Dict, List, Optional, Tuple
|
||||
|
||||
import httpx
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
# Test cases
# ---------------------------------------------------------------------------
@dataclass
class EvalCase:
    category: str
    prompt: str
    ideal: str  # criteria the judge checks the response against


EVAL_CASES: List[EvalCase] = [
    # code_generation
    EvalCase(
        category="code_generation",
        prompt="Write a Python function that flattens a nested list of arbitrary depth.",
        ideal=(
            "A Python function (def flatten(...)) that accepts a list which may "
            "contain nested lists to arbitrary depth and returns a single flat list "
            "with all elements in order. Must handle at least two levels of nesting."
        ),
    ),
    EvalCase(
        category="code_generation",
        prompt="Write a Python decorator that retries a function up to 3 times on exception.",
        ideal=(
            "A Python decorator that wraps a callable, catches exceptions, and "
            "retries the call up to 3 times before re-raising. Should use functools.wraps "
            "or equivalent to preserve the wrapped function's metadata."
        ),
    ),
    EvalCase(
        category="code_generation",
        prompt="Write a SQL query that returns the top 5 customers by total order value.",
        ideal=(
            "A valid SQL SELECT query that JOINs an orders or order_items table with a "
            "customers table, groups by customer, sums order value, orders descending, "
            "and limits to 5 rows."
        ),
    ),
    # factual_lookup
    EvalCase(
        category="factual_lookup",
        prompt="What is the capital of New Zealand?",
        ideal="The answer must state Wellington as the capital of New Zealand.",
    ),
    EvalCase(
        category="factual_lookup",
        prompt="In what year did World War II end?",
        ideal="The answer must state 1945 as the year World War II ended.",
    ),
    EvalCase(
        category="factual_lookup",
        prompt="What is the chemical symbol for gold?",
        ideal="The answer must include 'Au' as the chemical symbol for gold.",
    ),
    # writing
    EvalCase(
        category="writing",
        prompt=(
            "Write a short, polite email declining a meeting request because of "
            "a scheduling conflict."
        ),
        ideal=(
            "A professional email that: (1) thanks the sender for the invitation, "
            "(2) clearly declines, (3) mentions a scheduling conflict as the reason, "
            "and (4) offers to reschedule or suggests an alternative. Tone must be polite."
        ),
    ),
    EvalCase(
        category="writing",
        prompt="Write a one-paragraph product description for noise-cancelling headphones.",
        ideal=(
            "A marketing paragraph for noise-cancelling headphones that mentions "
            "noise cancellation as a feature, highlights at least one other benefit "
            "(comfort, audio quality, battery life, or similar), and ends with a "
            "persuasive call to action or closing statement."
        ),
    ),
]

# Matches the satisfaction regex in signals.py (_SATISFACTION_PATTERNS).
SATISFY_FOLLOWUP = "great, thanks!"
# Does not match any signal regex, so the bandit does not move.
NEUTRAL_FOLLOWUP = "ok, noted"
# Fabricated assistant turn that pads the conversation past the signal gate.
FAB_ASSISTANT = "Got it. Working on that now."

JUDGE_SYSTEM = (
    "You are a strict but fair evaluator. Your job is to decide whether a model "
    "response meets the stated requirements. Reply with exactly two lines:\n"
    "Line 1: PASS or FAIL\n"
    "Line 2: One sentence of reasoning (≤ 25 words)."
)


def _judge_user(prompt: str, ideal: str, actual: str) -> str:
    return (
        f"Question sent to model:\n{prompt}\n\n"
        f"Requirements the response must meet:\n{ideal}\n\n"
        f"Actual model response:\n{actual}\n\n"
        "Does the response meet the requirements? Reply PASS or FAIL."
    )


# ---------------------------------------------------------------------------
# HTTP helpers
# ---------------------------------------------------------------------------
async def _chat(
    client: httpx.AsyncClient,
    proxy_url: str,
    api_key: str,
    model: str,
    messages: List[Dict[str, str]],
    session_id: Optional[str] = None,
) -> Tuple[str, str]:
    """
    Returns (response_text, chosen_model_header).
    chosen_model_header is empty for non-router calls.
    """
    body: Dict = {"model": model, "messages": messages}
    if session_id:
        body["metadata"] = {"litellm_session_id": session_id}

    resp = await client.post(
        f"{proxy_url}/v1/chat/completions",
        json=body,
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=60.0,
    )
    resp.raise_for_status()
    data = resp.json()
    # `content` can be null (e.g. for tool-call responses); treat that as empty text.
    text = data["choices"][0]["message"].get("content") or ""
    chosen = resp.headers.get("x-litellm-adaptive-router-model", "")
    return text, chosen


# ---------------------------------------------------------------------------
# Evaluation loop
# ---------------------------------------------------------------------------
async def evaluate(
    proxy_url: str,
    api_key: str,
    router: str,
    judge_model: str,
) -> None:
    passed = 0
    failed = 0

    async with httpx.AsyncClient() as client:
        for i, case in enumerate(EVAL_CASES, 1):
            print(f"\n[{i}/{len(EVAL_CASES)}] category={case.category}")
            print(f"  prompt   : {case.prompt[:80]}{'…' if len(case.prompt) > 80 else ''}")

            session_id = f"eval-{uuid.uuid4()}"

            # Round 1: single-turn real request, to get the actual LLM response to judge.
            try:
                response, chosen = await _chat(
                    client, proxy_url, api_key, router,
                    [{"role": "user", "content": case.prompt}],
                    session_id=session_id,
                )
            except Exception as exc:  # noqa: BLE001
                print(f"  ERROR calling router: {exc}", file=sys.stderr)
                failed += 1
                continue

            print(f"  model    : {chosen or router}")
            print(f"  response : {response[:120].replace(chr(10), ' ')}{'…' if len(response) > 120 else ''}")

            # Judge the real response.
            judge_msgs = [
                {"role": "system", "content": JUDGE_SYSTEM},
                {"role": "user", "content": _judge_user(case.prompt, case.ideal, response)},
            ]
            try:
                verdict, _ = await _chat(
                    client, proxy_url, api_key, judge_model, judge_msgs,
                )
            except Exception as exc:  # noqa: BLE001
                print(f"  ERROR calling judge: {exc}", file=sys.stderr)
                failed += 1
                continue

            # Parse verdict: the first non-empty line should be PASS or FAIL.
            lines = [ln.strip() for ln in verdict.splitlines() if ln.strip()]
            first = lines[0].upper() if lines else ""
            reason = lines[1] if len(lines) > 1 else ""
            is_pass = "PASS" in first

            if is_pass:
                passed += 1
                print(f"  verdict  : \033[32mPASS\033[0m  {reason}")
            else:
                failed += 1
                print(f"  verdict  : \033[31mFAIL\033[0m  {reason}")

            # Round 2: 5-message conversation on the same session_id so the bandit fires.
            # On PASS → satisfaction follow-up (+alpha). On FAIL → neutral (no signal).
            # This is a real LLM call; its response is ignored.
            follow_up = SATISFY_FOLLOWUP if is_pass else NEUTRAL_FOLLOWUP
            bandit_msgs = [
                {"role": "user", "content": case.prompt},
                {"role": "assistant", "content": response},
                {"role": "user", "content": "ok continue"},
                {"role": "assistant", "content": FAB_ASSISTANT},
                {"role": "user", "content": follow_up},
            ]
            try:
                await _chat(
                    client, proxy_url, api_key, router, bandit_msgs,
                    session_id=session_id,
                )
            except Exception as exc:  # noqa: BLE001
                print(f"  WARNING: bandit update failed: {exc}", file=sys.stderr)

    total = passed + failed
    print(f"\n{'=' * 60}")
    print(f"Results: {passed}/{total} passed ({failed} failed)")
    if passed == total:
        print("All test cases passed: the adaptive router is working well!")
    elif passed >= total * 0.8:
        print("Most test cases passed; minor issues to investigate.")
    else:
        print("Significant failures: check router config and model availability.")
    print("=" * 60)


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------
def main() -> None:
    ap = argparse.ArgumentParser(description="Evaluate the adaptive router with LLM-as-judge.")
    ap.add_argument("--proxy-url", default="http://localhost:4000")
    ap.add_argument("--api-key", required=True, help="proxy API key")
    ap.add_argument("--router", default="smart-cheap-router", help="adaptive router model name")
    ap.add_argument("--judge-model", default="smart", help="model name for the judge (via proxy)")
    args = ap.parse_args()

    asyncio.run(evaluate(args.proxy_url, args.api_key, args.router, args.judge_model))


if __name__ == "__main__":
    main()
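`evaluate` counts any first line containing `PASS` as a pass. That tolerates a judge echoing the template (`Line 1: PASS`), but a first line such as `FAIL (would PASS with a LIMIT)` would also score as a pass, and an empty or off-format reply silently counts as a FAIL. A stricter parser is one option; the sketch below is a hypothetical helper (`parse_verdict` is not part of `eval.py`) and assumes the judge roughly follows the two-line format requested in `JUDGE_SYSTEM`.

```python
import re
from typing import Optional, Tuple

# Optional "Line 1:" prefix and markdown bold, then a standalone PASS or FAIL token.
_VERDICT_RE = re.compile(r"^(?:line\s*1\s*:\s*)?\**\s*(PASS|FAIL)\b", re.IGNORECASE)
# Optional "Line 2:" prefix on the reasoning line.
_REASON_PREFIX_RE = re.compile(r"^line\s*2\s*:\s*", re.IGNORECASE)


def parse_verdict(text: str) -> Tuple[Optional[bool], str]:
    """Return (is_pass, reason); is_pass is None when the verdict is unparseable."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None, ""
    m = _VERDICT_RE.match(lines[0])
    if m is None:
        # Surface the odd first line so the caller can log it.
        return None, lines[0]
    reason = lines[1] if len(lines) > 1 else ""
    reason = _REASON_PREFIX_RE.sub("", reason)
    return m.group(1).upper() == "PASS", reason
```

Returning `None` for an unparseable verdict lets the caller report it separately instead of folding judge-format errors into the FAIL count.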
227
scripts/adaptive_router_demo/traffic.py
Normal file
@@ -0,0 +1,227 @@
|
||||
"""
|
||||
Synthetic traffic generator for the adaptive_router demo dashboard.
|
||||
|
||||
What it does:
|
||||
- Sends labeled multi-turn chat requests to the proxy's adaptive router.
|
||||
- For each turn, peeks at the `x-litellm-adaptive-router-model` response
|
||||
header to learn which underlying model was picked.
|
||||
- Draws a Bernoulli outcome from a hard-coded ORACLE table that says
|
||||
"model M succeeds at request type T with probability p".
|
||||
- Sends a final follow-up turn whose user message is engineered to
|
||||
BOTH classify into the same RequestType AND match the
|
||||
satisfaction regex on success (so the bandit's `(type, model)` cell
|
||||
gets +alpha). On failure we send a neutral follow-up so no signal
|
||||
fires — over time, models the oracle favors accumulate alpha faster.
|
||||
|
||||
Why this shape:
|
||||
- The post-call hook gates signal recording on len(messages) >= 4.
|
||||
A single 5-message request passes the gate in one round-trip, which
|
||||
keeps the demo cheap.
|
||||
- Mock responses (`mock_response=...`) skip the real LLM call but still
|
||||
flow through routing + post-call hooks, so no API keys / no spend.
|
||||
|
||||
Run:
|
||||
uv run python scripts/adaptive_router_demo/traffic.py \\
|
||||
--proxy-url http://localhost:4000 \\
|
||||
--api-key sk-1234 \\
|
||||
--router smart-cheap-router \\
|
||||
--rounds 100 \\
|
||||
--rate 0.5
|
||||
|
||||
Open `dashboard.html` in a browser alongside this and watch the bars move.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import asyncio
|
||||
import random
|
||||
import sys
|
||||
import uuid
|
||||
from typing import Dict, List, Tuple
|
||||
|
||||
import httpx
|
||||
|
||||
# ---- prompts (paired with the RequestType the classifier will assign) ----
# Each prompt is engineered to (a) classify into the listed type and (b) make
# sense as a user request. Keep prompts short to limit token cost.
PROMPTS: Dict[str, List[str]] = {
    "code_generation": [
        "Write a Python function that flattens a nested list",
        "Create a TypeScript function that debounces another function",
        "Build a Rust function that parses a CSV string",
        "Generate a SQL function that returns running totals",
    ],
    "factual_lookup": [
        "What is the capital of New Zealand?",
        "When was the Treaty of Westphalia signed?",
        "Who is the current Secretary General of the UN?",
        "Where is Mount Kilimanjaro located?",
    ],
    "writing": [
        "Write an email declining a meeting politely",
        "Draft a paragraph introducing a product launch",
        "Compose a short blog post about morning routines",
        "Rewrite this sentence to be more concise: ...",
    ],
}

# Engineered satisfaction follow-ups. Each one is designed to:
#   (1) match the satisfaction regex (thanks/great/works/perfect/etc.), AND
#   (2) re-classify into the SAME RequestType as the first prompt
# so that signals attribute to the right (type, model) bandit cell.
SATISFY: Dict[str, str] = {
    "code_generation": "thanks, that works! now write me a python function that does the inverse",
    "factual_lookup": "perfect, thanks! who is the current prime minister?",
    "writing": "great, thanks! now write a follow-up email confirming attendance",
}

# Neutral follow-up: does not match any signal regex, does not move the bandit.
NEUTRAL_FOLLOWUP = "ok, noted"

# Oracle: P(success | request_type, model). Tunable.
# Defaults: smart dominates code/writing; both are fine for factual_lookup.
# Keys must match the model names the router reports in the
# x-litellm-adaptive-router-model header; unknown models fall back to p=0.5.
ORACLE: Dict[str, Dict[str, float]] = {
    "code_generation": {"smart": 0.92, "fast": 0.35},
    "factual_lookup": {"smart": 0.90, "fast": 0.85},
    "writing": {"smart": 0.85, "fast": 0.55},
}

# Fabricated assistant turn: content doesn't matter for the hook, only the role.
FAB_ASSISTANT = "Got it. Working on that now."


def _build_messages(prompt: str, last_user: str) -> List[Dict[str, str]]:
    """5-message conversation that passes the SIGNAL_GATE_MIN_MESSAGES=4 gate."""
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": FAB_ASSISTANT},
        {"role": "user", "content": "ok continue"},
        {"role": "assistant", "content": FAB_ASSISTANT},
        {"role": "user", "content": last_user},
    ]


async def _send(
    client: httpx.AsyncClient,
    proxy_url: str,
    api_key: str,
    router: str,
    session_id: str,
    messages: List[Dict[str, str]],
    mock_response: str,
) -> Tuple[bool, str]:
    """Returns (ok, chosen_model)."""
    body = {
        "model": router,
        "messages": messages,
        "metadata": {"litellm_session_id": session_id},
        "mock_response": mock_response,
    }
    try:
        r = await client.post(
            f"{proxy_url}/v1/chat/completions",
            json=body,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=15.0,
        )
        r.raise_for_status()
    except Exception as e:  # noqa: BLE001
        print(f"  request failed: {e}", file=sys.stderr)
        return False, ""
    chosen = r.headers.get("x-litellm-adaptive-router-model", "")
    return True, chosen


async def _drive_one_session(
    client: httpx.AsyncClient,
    proxy_url: str,
    api_key: str,
    router: str,
    request_type: str,
    prompt: str,
) -> str:
    """Run one labeled session. Returns the chosen model (for logging)."""
    session_id = f"demo-{uuid.uuid4()}"

    # The oracle outcome depends on which model gets picked, and we can't know
    # the pick before sending. The router is sticky per session, so we use two
    # round-trips on the same session_id and both are served by (and credited
    # to) the same model.
    #
    # Round 1: neutral follow-up. Passes the message-count gate but fires no
    # signal; the response header tells us the pick.
    ok, chosen = await _send(
        client, proxy_url, api_key, router, session_id,
        _build_messages(prompt, NEUTRAL_FOLLOWUP),
        mock_response=FAB_ASSISTANT,
    )
    if not ok or not chosen:
        return ""

    # Decide outcome from oracle.
    p = ORACLE.get(request_type, {}).get(chosen, 0.5)
    success = random.random() < p
    follow_up = SATISFY[request_type] if success else NEUTRAL_FOLLOWUP

    # Round 2: include the round-1 turns + a new follow-up. On success the
    # follow-up matches satisfaction → +alpha for (request_type, chosen).
    history = _build_messages(prompt, NEUTRAL_FOLLOWUP) + [
        {"role": "assistant", "content": FAB_ASSISTANT},
        {"role": "user", "content": follow_up},
    ]
    await _send(
        client, proxy_url, api_key, router, session_id, history,
        mock_response=FAB_ASSISTANT,
    )
    return chosen


async def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("--proxy-url", default="http://localhost:4000")
    ap.add_argument("--api-key", required=True, help="proxy key with /v1/chat/completions perms")
    ap.add_argument("--router", default="smart-cheap-router")
    ap.add_argument("--rounds", type=int, default=100)
    ap.add_argument("--rate", type=float, default=0.5,
                    help="seconds between sessions; lower = faster")
    ap.add_argument("--types", default="code_generation,factual_lookup,writing",
                    help="comma-separated subset of request types to drive")
    args = ap.parse_args()

    types = [t.strip() for t in args.types.split(",") if t.strip() in PROMPTS]
    if not types:
        print(f"ERROR: no valid types. Choose from: {list(PROMPTS)}", file=sys.stderr)
        sys.exit(2)

    print(f"driving {args.rounds} sessions across types: {types}")
    print(f"oracle: {ORACLE}")
    print(f"proxy: {args.proxy_url}  router: {args.router}\n")

    counts: Dict[Tuple[str, str], int] = {}
    async with httpx.AsyncClient() as client:
        for i in range(args.rounds):
            rt = random.choice(types)
            prompt = random.choice(PROMPTS[rt])
            chosen = await _drive_one_session(
                client, args.proxy_url, args.api_key, args.router, rt, prompt,
            )
            if chosen:
                counts[(rt, chosen)] = counts.get((rt, chosen), 0) + 1
            if (i + 1) % 10 == 0:
                summary = ", ".join(
                    f"{t}/{m}={n}" for (t, m), n in sorted(counts.items())
                )
                print(f"  round {i + 1}/{args.rounds}  picks: {summary}")
            await asyncio.sleep(args.rate)

    print("\nfinal pick distribution:")
    for (rt, m), n in sorted(counts.items()):
        print(f"  {rt:22s} → {m:8s} {n}")


if __name__ == "__main__":
    asyncio.run(main())
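To sanity-check what the dashboard should converge to without running a proxy, here is a minimal offline simulation driven by the same `ORACLE` table. It assumes a plain Beta-Bernoulli Thompson sampler per (request type, model) cell with the α=5, β=5 cold-start prior the dashboard describes. Unlike the real router it ignores cost weights, so treat it as an idealized, quality-only reference rather than a model of the proxy's internals. `simulate` and `failure_adds_beta` are hypothetical names.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Same oracle as traffic.py: P(success | request_type, model).
ORACLE: Dict[str, Dict[str, float]] = {
    "code_generation": {"smart": 0.92, "fast": 0.35},
    "factual_lookup": {"smart": 0.90, "fast": 0.85},
    "writing": {"smart": 0.85, "fast": 0.55},
}
PRIOR = (5.0, 5.0)  # cold-start Beta(alpha, beta), as the dashboard describes


def simulate(rounds: int = 300, seed: int = 0, failure_adds_beta: bool = False) -> Counter:
    """Quality-only Beta-Bernoulli Thompson sampling against ORACLE.

    failure_adds_beta=False mirrors traffic.py, where a failed session sends a
    neutral follow-up and records no signal at all.
    """
    rng = random.Random(seed)
    post: Dict[Tuple[str, str], List[float]] = {
        (rt, m): list(PRIOR) for rt, models in ORACLE.items() for m in models
    }
    picks: Counter = Counter()
    for _ in range(rounds):
        rt = rng.choice(list(ORACLE))
        # Thompson step: draw once from each model's posterior, pick the max.
        draws = {m: rng.betavariate(*post[(rt, m)]) for m in ORACLE[rt]}
        chosen = max(draws, key=draws.get)
        picks[(rt, chosen)] += 1
        if rng.random() < ORACLE[rt][chosen]:
            post[(rt, chosen)][0] += 1  # success: +alpha
        elif failure_adds_beta:
            post[(rt, chosen)][1] += 1  # optional: failure: +beta
    return picks


if __name__ == "__main__":
    for (rt, m), n in sorted(simulate().items()):
        print(f"  {rt:22s} -> {m:8s} {n}")
```

With `failure_adds_beta=False` the posteriors only ever gain α, so every picked model's mean drifts upward; expect the split between two similar models (such as the `factual_lookup` pair) to vary from seed to seed.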