litellm

Yassin Kortam 2eab9ee2c0 perf: reduce per-request and per-chunk overhead across Anthropic streaming hot paths (#28289)

* perf: reduce per-request and per-chunk overhead across Anthropic streaming hot paths

- Introduce pure-text fast-path in `_build_complete_streaming_response` that collapses O(N) `content_block_delta` events into a single equivalent SSE event before conversion, eliminating per-output-token Pydantic `ModelResponseStream` construction; non-text streams (tool_use, thinking, citations) fall back to the unchanged legacy path
- Skip agentic streaming wrapper entirely when no callback overrides `async_should_run_agentic_loop`; the wrapper buffered every chunk and rebuilt the SSE response only to call hooks that all return `(False, {})`, a pure no-op for the default config
- Serialize request body once (`json.dumps`) for both the pre-call log input and the wire, instead of twice; avoids a full O(payload) scan per request, significant for long-context Claude Code histories
- Add fast path in `async_streaming_data_generator` that bypasses the per-chunk `async_post_call_streaming_hook` coroutine await, response-string materialization, and cost-injection call when no callback/guardrail/cost-injection is active (the default config)
- Resolve `_DD_STREAMING_TRACE_ENABLED` once at import time; eliminate per-chunk `NullSpan` context manager allocation when Datadog tracing is disabled (the default)
- Memoize `get_type_hints(AnthropicMessagesRequestOptionalParams)` with `@lru_cache(maxsize=1)`; resolves once per process instead of once per `/v1/messages` request (~80µs each)
- Hoist `cost_injection_active` out of the per-chunk loop in `chunk_processor`; eliminates repeated `getattr` + endpoint-type checks on every streamed byte chunk
- Extract `_build_passthrough_logging_result` from `_route_streaming_logging_to_handler` as a standalone static method to facilitate future off-loop dispatch
- Convert `async_sse_data_generator` from an `async for: yield` trampoline to a direct return of the underlying generator, removing one async-generator layer per streamed chunk
- Skip redundant `strip_empty_text_blocks_from_anthropic_messages` scan in `anthropic_messages_handler` when the async wrapper already sanitized (signalled via `_litellm_messages_presanitized` sentinel, popped before reaching provider params)
- Gate debug log `f-string` evaluation behind `isEnabledFor(DEBUG)` in both the streaming generator and the transformation layer to avoid serializing entire message payloads on every request at non-debug log levels
- Add benchmark script (`scripts/benchmark_anthropic_messages_perf.py`) with a local mock Anthropic SSE provider for reproducible TTFT and TPM measurement across commits/branches
- Add parity tests asserting fast-path and legacy-path produce byte-identical logged/billed payloads, plus unit tests for agentic hook detection, pre-serialized body reuse, and memoized key resolution

* perf: address greptile review for anthropic streaming hot path

- Bail to legacy in `_collapse_pure_text_chunks` when content_block_delta events from different block indexes are observed without an intervening flush. Anthropic sends blocks strictly sequentially, but defensive bail prevents silent text-merging if the protocol ever interleaves.
- Replace leaf-class `__dict__` check for `async_post_call_streaming_hook` in `_callback_capabilities` with a function-identity comparison that walks the MRO. A vendor base class can carry the override and the registered class can add nothing else; before this PR the hook was unconditionally invoked, so an inherited-override miss would silently drop the hook on the streaming path.
- Add unit tests for both behaviors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mypy): narrow model_name to str in cost-injection branch

The hoisted cost_injection_active flag in chunk_processor encodes the `bool(model_name)` requirement but mypy can't track that invariant through the local, so the per-chunk `_process_chunk_with_cost_injection(chunk, model_name)` calls flagged Optional[str] vs str. Pin a typed non-None local inside the cost-injection branch so mypy narrows correctly without changing runtime behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-23 12:15:59 -07:00
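
The pure-text fast path and the defensive bail-out described in the commit message can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the code from the PR: the function name `collapse_pure_text_deltas`, the `None` return meaning "fall back to the legacy path", and the choice to treat every non-delta event as a flush point are all hypothetical. Only the event shapes (`content_block_delta` carrying a `text_delta` with an `index`) follow Anthropic's streaming format.

```python
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def collapse_pure_text_deltas(events: List[Event]) -> Optional[List[Event]]:
    """Merge each run of text deltas into one event.

    Returns None when the stream is not pure text or looks interleaved,
    which in this sketch means "use the legacy per-chunk path instead".
    """
    collapsed: List[Event] = []
    open_index: Optional[int] = None  # block index currently being merged
    buffered: List[str] = []          # text pieces collected for that block

    def flush() -> None:
        # Emit one merged delta for the open block, if there is one.
        nonlocal open_index
        if open_index is not None:
            collapsed.append(
                {
                    "type": "content_block_delta",
                    "index": open_index,
                    "delta": {"type": "text_delta", "text": "".join(buffered)},
                }
            )
            buffered.clear()
            open_index = None

    for event in events:
        if event.get("type") != "content_block_delta":
            # Assumption: any non-delta event (start, stop, ...) ends the run.
            flush()
            collapsed.append(event)
            continue
        delta = event.get("delta") or {}
        index = event.get("index")
        if delta.get("type") != "text_delta" or not isinstance(index, int):
            return None  # tool_use, thinking, citations, malformed: not pure text
        if open_index is not None and index != open_index:
            return None  # different block index without a flush: bail out
        open_index = index
        buffered.append(delta.get("text", ""))

    flush()
    return collapsed
```

With this shape, a caller would try the collapse first and run the unchanged per-chunk conversion whenever it gets `None` back, so non-text streams never take the shortcut.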
adaptive_router_demo	feat: commit new adaptive routing	2026-04-18 21:29:39 -07:00
health_check	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
benchmark_anthropic_messages_perf.py	perf: reduce per-request and per-chunk overhead across Anthropic streaming hot paths (#28289)	2026-05-23 12:15:59 -07:00
benchmark_chat_completions_perf.py	perf: eliminate per-request callback scanning on proxy hot path (#27858)	2026-05-14 09:28:31 -07:00
benchmark_mock.py	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
benchmark_proxy_vs_provider.py	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
create_litellm_branch.ps1	feat: add script to create branches with litellm_ prefix (#17606)	2025-12-06 10:41:39 -08:00
create_litellm_branch.sh	enhance: create_litellm_branch tool to be more robust (#17874)	2025-12-12 05:35:50 -08:00
create_team_key_and_submit_guardrail.sh	feat(guardrails): team-based guardrail registration and approval workflow (#22459)	2026-03-02 22:06:49 -08:00
eval_compression.py	Prompt Compression - add it to the proxy (#25729)	2026-04-20 15:08:00 -07:00
install.sh	build: migrate packaging, CI, and Docker from Poetry to uv (#25007)	2026-04-09 11:46:23 -07:00
mock_bedrock_passthrough_target.py	Refactor Bedrock response stream shape handling (#27257)	2026-05-06 17:39:38 -07:00
mock_grayswan_timeout_server.py	implement failopen option default to True on grayswan guardrail (#18266)	2026-01-06 15:17:05 +05:30
mutation_report.py	ci: add manually-triggered mutation testing workflow (#27576)	2026-05-11 15:19:57 -07:00
test_agent_mcp_endpoints.sh	Agents - assign tools (#22064)	2026-02-25 11:44:30 -08:00
test_guardrails_register_endpoints.sh	feat(guardrails): team-based guardrail registration and approval workflow (#22459)	2026-03-02 22:06:49 -08:00
test_tool_allowlist_script.py	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
tpm_headline_test.sh	fix: atomic TPM rate limit (#27001)	2026-05-05 16:58:07 -07:00
verify_adaptive_router.py	feat: add adaptive routing to litellm	2026-04-18 16:35:17 -07:00