* perf(realtime): eliminate redundant per-frame JSON work on OpenAI realtime relay

  The GA realtime support added in #27110 made backend_to_client_send_messages parse every backend frame up to three times for beta clients (OpenAI-Beta: realtime=v1), build a discarded Pydantic object per frame for logging, and re-serialize even frames that need no translation. For high-frequency response.output_audio.delta frames carrying multi-KB base64 payloads, that serialized CPU work on the hottest relay path drove the latency regression between v1.83.14 and v1.88.1 for gpt-realtime-1.5 and gpt-realtime-2.

  This parses each frame once via _parse_backend_event and threads the dict into _handle_raw_backend_message, store_message, and _translate_event_to_beta; short-circuits store_message before the Pydantic build for events not in the logged set; returns the original event unchanged from _translate_event_to_beta when no rename applies so the raw frame is forwarded without re-serialization; and only json.dumps when the type is actually renamed.

* fix(realtime): widen store_message type hint to accept plain dict

  The parse-once refactor passes the dict produced by _parse_backend_event into store_message, but the parameter was typed as str | bytes | OpenAIRealtimeEvents (a union of TypedDicts), which mypy does not consider compatible with a plain dict. Add dict to the accepted union; the body already handles it.

---------

Co-authored-by: Miguel Armenta <maarmenta92@gmail.com>

||
|---|---|---|
| agent_tests | ||
| audio_tests | ||
| basic_proxy_startup_tests | ||
| batches_tests | ||
| benchmarks | ||
| code_coverage_tests | ||
| documentation_tests | ||
| enterprise | ||
| guardrails_tests | ||
| image_gen_tests | ||
| integration | ||
| litellm | ||
| litellm_core_utils | ||
| litellm_utils_tests | ||
| litellm-proxy-extras | ||
| llm_responses_api_testing | ||
| llm_translation | ||
| load_tests | ||
| local_testing | ||
| logging_callback_tests | ||
| mcp_tests | ||
| multi_instance_e2e_tests | ||
| ocr_tests | ||
| old_proxy_tests/tests | ||
| openai_endpoints_tests | ||
| otel_tests | ||
| pass_through_tests | ||
| pass_through_unit_tests | ||
| proxy_admin_ui_tests | ||
| proxy_behavior | ||
| proxy_e2e_anthropic_messages_tests | ||
| proxy_migration_tests | ||
| proxy_security_tests | ||
| proxy_unit_tests | ||
| router_unit_tests | ||
| scim_tests | ||
| search_tests | ||
| spend_tracking_tests | ||
| store_model_in_db_tests | ||
| test_litellm | ||
| unified_google_tests | ||
| vector_store_tests | ||
| windows_tests | ||
| __init__.py | ||
| _flush_vcr_cache.py | ||
| _live_test_helpers.py | ||
| _openai_record_replay_proxy.py | ||
| _vcr_conftest_common.py | ||
| _vcr_redis_persister.py | ||
| eval_swe_bench.py | ||
| gettysburg.wav | ||
| large_text.py | ||
| openai_batch_completions.jsonl | ||
| README.MD | ||
| test_budget_management.py | ||
| test_callbacks_on_proxy.py | ||
| test_config.py | ||
| test_debug_warning.py | ||
| test_default_encoding_non_root.py | ||
| test_end_users.py | ||
| test_entrypoint.py | ||
| test_fallbacks.py | ||
| test_gpt5_azure_temperature_support.py | ||
| test_health.py | ||
| test_keys.py | ||
| test_litellm_proxy_responses_config.py | ||
| test_logging.conf | ||
| test_models.py | ||
| test_new_vector_store_endpoints.py | ||
| test_openai_endpoints.py | ||
| test_organizations.py | ||
| test_otel_thread_leak.py | ||
| test_passthrough_endpoints.py | ||
| test_presidio_latency.py | ||
| test_proxy_server_non_root.py | ||
| test_ratelimit.py | ||
| test_resource_cleanup.py | ||
| test_service_logger_otel.py | ||
| test_spend_logs.py | ||
| test_team_logging.py | ||
| test_team_members.py | ||
| test_team.py | ||
| test_users.py | ||
In total, litellm runs 1000+ tests.
[02/20/2025] Update:
To make it easier to contribute and to map what behavior is tested,
we've started mapping the litellm directory in tests/test_litellm.
This folder can only run mock tests.
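
As a rough idea of what a mock-only test looks like, here is a minimal sketch using only the standard library. The `summarize` function and the `client.complete` method are hypothetical stand-ins for code under test and are not part of litellm; the point is that the test replaces the network-facing dependency with a `Mock` and never makes a real call.

```python
from unittest.mock import Mock


def summarize(client, text: str) -> str:
    """Hypothetical code under test: asks a client for a one-line summary."""
    reply = client.complete(prompt=f"Summarize: {text}")
    return reply.strip()


def test_summarize_uses_mocked_client() -> None:
    # The client is a Mock, so no network request is ever made.
    client = Mock()
    client.complete.return_value = "  short summary \n"

    assert summarize(client, "long text") == "short summary"

    # Verify the dependency was called exactly once with the expected prompt.
    client.complete.assert_called_once_with(prompt="Summarize: long text")
```

A test written this way runs without API keys or network access, which is what makes it suitable for a mock-only folder.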