litellm/scripts
Yassin Kortam a6494e6fe3
perf: eliminate per-request callback scanning on proxy hot path (#27858)
- Introduce `_CallbackCapabilities` dataclass and `ProxyLogging._callback_capabilities()` static method that inspects `litellm.callbacks` once and caches capability flags keyed on (list length, member ids); the cache invalidates automatically when the callback list mutates and adds no per-request iteration overhead
- Replace O(n) `litellm.callbacks` walks in `async_pre_call_hook`, `during_call_hook`, `async_post_call_streaming_iterator_hook`, `async_post_call_streaming_hook`, and `post_call_response_headers_hook` with fast-path exits when no relevant callbacks are registered
- Add `needs_iterator_wrap()` and `needs_per_chunk_streaming_hook()` instance methods to decouple iterator-level wrapping from per-chunk hook execution; avoids `get_response_string` materialization per chunk when no guardrail or chunk-hook callback is active
- Introduce `_fast_serialize_simple_model_response_stream()` using `orjson` for common single-choice text streaming chunks, bypassing the full Pydantic serializer; falls back to `model_dump_json` for tool calls, logprobs, usage, and provider-specific fields
- Add early-return in `_restamp_streaming_chunk_model` when downstream model already matches the requested model, avoiding unnecessary string comparisons on every chunk
- Fix stale zero-cost cache bug in `_is_model_cost_zero`: move the per-router `_zero_cost_cache` dict onto the `Router` instance and clear it in `_invalidate_model_group_info_cache` so in-place pricing updates via `upsert_deployment` immediately resume budget enforcement
- Add `scripts/benchmark_chat_completions_perf.py`: standalone async benchmarking tool with a mock OpenAI provider and LiteLLM proxy process management; measures non-streaming RPS, streaming TTFT, and full-stream latency, with repeat/median run support
- Add comprehensive unit tests covering capability detection, cache invalidation, fast-path correctness, zero-cost cache regression, and the no-callback streaming fast path
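
The capability caching described in the first two bullets can be pictured with a minimal sketch. It is not the code from this commit: `CallbackCapabilities`, `CapabilityCache`, `PreCallOnly`, and the two flags are hypothetical names, and detecting hooks with `hasattr` is an assumption made only to keep the example short. Building the key still touches each list member once, but that is much cheaper than inspecting every callback for every hook.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class CallbackCapabilities:
    """Hypothetical capability flags; the real dataclass may track others."""

    has_pre_call_hook: bool
    has_streaming_hook: bool


class CapabilityCache:
    """Caches capability flags keyed on (list length, member ids)."""

    def __init__(self) -> None:
        self._key: Optional[Tuple[int, Tuple[int, ...]]] = None
        self._value: Optional[CallbackCapabilities] = None
        self.scans = 0  # number of full scans, kept for demonstration only

    def get(self, callbacks: List[Any]) -> CallbackCapabilities:
        key = (len(callbacks), tuple(id(cb) for cb in callbacks))
        if self._value is None or key != self._key:
            # The list changed (or this is the first call): rescan once.
            self.scans += 1
            self._value = CallbackCapabilities(
                has_pre_call_hook=any(
                    hasattr(cb, "async_pre_call_hook") for cb in callbacks
                ),
                has_streaming_hook=any(
                    hasattr(cb, "async_post_call_streaming_hook") for cb in callbacks
                ),
            )
            self._key = key
        return self._value


class PreCallOnly:
    """Hypothetical callback that implements only the pre-call hook."""

    async def async_pre_call_hook(self, *args: Any, **kwargs: Any) -> None:
        return None


callbacks: List[Any] = []
cache = CapabilityCache()
empty_caps = cache.get(callbacks)
cache.get(callbacks)  # same key, served from the cache
scans_after_hit = cache.scans
callbacks.append(PreCallOnly())  # mutation changes the key
caps = cache.get(callbacks)
```

With flags like these, a hook such as `async_pre_call_hook` could return immediately when `has_pre_call_hook` is false, which is the kind of fast-path exit the second bullet describes.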

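The zero-cost cache fix can be illustrated the same way. In this sketch `MiniRouter` is a hypothetical stand-in, not the LiteLLM `Router`: its `upsert_deployment` takes a simplified, assumed signature, and `_invalidate_caches` plays the role the commit assigns to `_invalidate_model_group_info_cache`. The point it shows is that a cache owned by the instance and cleared on every pricing update cannot keep reporting a model as free after its price changes.

```python
from typing import Dict


class MiniRouter:
    """Hypothetical stand-in for a router; not the LiteLLM `Router` API."""

    def __init__(self) -> None:
        self._costs: Dict[str, float] = {}
        # The cache lives on the instance, so each router owns its own copy.
        self._zero_cost_cache: Dict[str, bool] = {}

    def upsert_deployment(self, model: str, cost_per_token: float) -> None:
        # Simplified, assumed signature: record the price, then drop caches.
        self._costs[model] = cost_per_token
        self._invalidate_caches()

    def _invalidate_caches(self) -> None:
        self._zero_cost_cache.clear()

    def is_model_cost_zero(self, model: str) -> bool:
        cached = self._zero_cost_cache.get(model)
        if cached is None:
            cached = self._costs.get(model, 0.0) == 0.0
            self._zero_cost_cache[model] = cached
        return cached


router = MiniRouter()
router.upsert_deployment("demo-model", 0.0)
was_free = router.is_model_cost_zero("demo-model")
router.upsert_deployment("demo-model", 0.002)  # in-place pricing update
still_free = router.is_model_cost_zero("demo-model")
```

Without the `clear()` call in `_invalidate_caches`, the second lookup would return the stale `True`, and budget enforcement for the model would stay switched off.
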
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
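
The streaming serialization fast path in the message above follows a "cheap path with fallback" pattern, sketched below under stated assumptions. The standard-library `json` module stands in for `orjson` so the example runs without extra dependencies, chunks are plain dicts rather than Pydantic models, and `fast_serialize_chunk` and `serialize_chunk` are hypothetical names, not the functions added by the commit.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Assumption: only these delta keys count as a "simple" text chunk.
_SIMPLE_DELTA_KEYS = {"content", "role"}


def fast_serialize_chunk(chunk: Dict[str, Any]) -> Optional[str]:
    """Return compact JSON for a simple chunk, or None to request a fallback."""
    choices = chunk.get("choices") or []
    if len(choices) != 1 or chunk.get("usage") is not None:
        return None
    choice = choices[0]
    delta = choice.get("delta") or {}
    if choice.get("logprobs") is not None:
        return None
    if not set(delta) <= _SIMPLE_DELTA_KEYS:
        return None  # e.g. tool calls or provider-specific fields
    return json.dumps(chunk, separators=(",", ":"))


def serialize_chunk(chunk: Dict[str, Any]) -> Tuple[str, str]:
    """Try the fast path first; otherwise use the general serializer."""
    fast = fast_serialize_chunk(chunk)
    if fast is not None:
        return fast, "fast"
    # Stand-in for the full serializer (`model_dump_json` in the commit).
    return json.dumps(chunk), "fallback"


text_chunk = {"id": "c1", "choices": [{"index": 0, "delta": {"content": "Hi"}}]}
tool_chunk = {
    "id": "c2",
    "choices": [{"index": 0, "delta": {"tool_calls": [{"id": "t1"}]}}],
}
text_json, text_path = serialize_chunk(text_chunk)
tool_json, tool_path = serialize_chunk(tool_chunk)
```

The fast path deliberately refuses anything it does not recognize, so the worst case for an unusual chunk is the cost of the general serializer rather than a wrong payload.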
2026-05-14 09:28:31 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| adaptive_router_demo | feat: commit new adaptive routing | 2026-04-18 21:29:39 -07:00 |
| health_check | style: run black formatter on files from main merge | 2026-04-17 13:02:59 -07:00 |
| benchmark_chat_completions_perf.py | perf: eliminate per-request callback scanning on proxy hot path (#27858) | 2026-05-14 09:28:31 -07:00 |
| benchmark_mock.py | style: run black formatter on files from main merge | 2026-04-17 13:02:59 -07:00 |
| benchmark_proxy_vs_provider.py | style: run black formatter on files from main merge | 2026-04-17 13:02:59 -07:00 |
| create_litellm_branch.ps1 | feat: add script to create branches with litellm_ prefix (#17606) | 2025-12-06 10:41:39 -08:00 |
| create_litellm_branch.sh | enhance: create_litellm_branch tool to be more robust (#17874) | 2025-12-12 05:35:50 -08:00 |
| create_team_key_and_submit_guardrail.sh | feat(guardrails): team-based guardrail registration and approval workflow (#22459) | 2026-03-02 22:06:49 -08:00 |
| eval_compression.py | Prompt Compression - add it to the proxy (#25729) | 2026-04-20 15:08:00 -07:00 |
| install.sh | build: migrate packaging, CI, and Docker from Poetry to uv (#25007) | 2026-04-09 11:46:23 -07:00 |
| mock_bedrock_passthrough_target.py | Refactor Bedrock response stream shape handling (#27257) | 2026-05-06 17:39:38 -07:00 |
| mock_grayswan_timeout_server.py | implement failopen option default to True on grayswan guardrail (#18266) | 2026-01-06 15:17:05 +05:30 |
| mutation_report.py | ci: add manually-triggered mutation testing workflow (#27576) | 2026-05-11 15:19:57 -07:00 |
| test_agent_mcp_endpoints.sh | Agents - assign tools (#22064) | 2026-02-25 11:44:30 -08:00 |
| test_guardrails_register_endpoints.sh | feat(guardrails): team-based guardrail registration and approval workflow (#22459) | 2026-03-02 22:06:49 -08:00 |
| test_tool_allowlist_script.py | style: run black formatter on files from main merge | 2026-04-17 13:02:59 -07:00 |
| tpm_headline_test.sh | fix: atomic TPM rate limit (#27001) | 2026-05-05 16:58:07 -07:00 |
| verify_adaptive_router.py | feat: add adaptive routing to litellm | 2026-04-18 16:35:17 -07:00 |