Litellm oss staging 050626 (#29774)

* Mark xAI models retiring on 2026-05-15 (#28788)

Per https://docs.x.ai/developers/migration/may-15-retirement, xAI is retiring the following slugs on 2026-05-15 (auto-redirect to grok-4.3 with various reasoning efforts; callers continuing to use the old slugs will be billed at grok-4.3 pricing):

  grok-4-1-fast-reasoning{,-latest} -> grok-4.3 (low effort)
  grok-4-1-fast-non-reasoning{,-latest} -> grok-4.3 (none)
  grok-4-fast-reasoning -> grok-4.3 (low effort)
  grok-4-fast-non-reasoning -> grok-4.3 (none)
  grok-4-0709 -> grok-4.3 (low effort)
  grok-code-fast-1{,-0825} -> grok-build-0.1
  grok-3 -> grok-4.3 (none)

Only the direct xai/ slugs are tagged; third-party hosts (azure_ai, oci, vercel_ai_gateway, perplexity/xai) run their own schedules. The grok-3 retirement list explicitly names only the base grok-3 slug; the -mini / -fast / -beta / -latest variants are not listed, so they remain untouched.

* feat(moonshot): advertise json_schema response support on live models (#29683)

litellm.responses() already routes Moonshot through the responses->chat-completions bridge, and Moonshot honors response_format json_schema on chat completions. The cost-map entries left supports_response_schema unset, so discovery layers that gate on that flag dropped Moonshot from structured-output / responses listings even though the capability works end to end.

Set supports_response_schema on the nine models currently live on api.moonshot.ai: kimi-k2.5, kimi-k2.6, the moonshot-v1 8k/32k/128k text and vision-preview variants, and moonshot-v1-auto. Verified against the live API that each honors json_schema and that litellm.responses() returns schema-valid structured output through the bridge.

* chore(moonshot): mark models retired from api.moonshot.ai as deprecated (#29685)

Thirteen Moonshot/Kimi models in the cost map no longer resolve on api.moonshot.ai (all return 404).
Stamp each with its deprecation_date from platform.kimi.ai/docs/models rather than deleting the entries, so historical cost calculation keeps resolving the names while tooling can surface the retirement. Dates: kimi-thinking-preview 2025-11-11; kimi-latest and its 8k/32k/128k context variants 2026-01-28; the kimi-k2 preview/turbo/thinking series 2026-05-25; the moonshot-v1 -0430 snapshots use their own 2024-04-30 snapshot date (Moonshot publishes no discontinuation date for them).

* fix(moonshot): drop temperature for reasoning models (kimi-k2.5/k2.6) (#29687)

Kimi reasoning models reject every temperature except 1; a request with temperature=0.2 returns "invalid temperature: only 1 is allowed for this model". litellm only clamped temperature into [0.3, 1], so any value below 1 still 400'd. Drop the temperature param entirely for reasoning models (gated on supports_reasoning, the same signal transform_request already uses) so the model default is used; the non-reasoning moonshot-v1 models keep the existing clamp.

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(mcp): add per-server timeout configuration (#29672)

* feat(mcp): add per-server timeout configuration

* fix(mcp): address timeout field review comments

- use `is not None` guard instead of `or` for 0.0 edge case
- copy timeout in both LiteLLM_MCPServerTable constructions (health check path + _build_mcp_server_table)
- add timeout Float? column to all three schema.prisma files
- extend round-trip test to cover _build_mcp_server_table direction
- add test for zero timeout not treated as falsy

* fix(mcp): forward timeout in _build_temporary_mcp_server_record

* fix(mcp): return 504 instead of 500 when per-server timeout fires

* test(mcp): add 504 timeout regression test; fix black formatting

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7 (#28567)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.
* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip.

New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg.
Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7

AWS Bedrock documents jp.anthropic.claude-opus-4-7 alongside the existing us./eu./au./global. profiles for Claude Opus 4.7 (ap-northeast-1 Tokyo / ap-northeast-3 Osaka), but the entry is missing from model_prices_and_context_window.json. Tokyo-region users currently get an "unknown model" error when routing through the JP geo profile.

Adds the entry to both the canonical file and the bundled backup, mirroring the recent pattern for sonnet-4-6 (#27831). Pricing matches the other regional profiles (10% premium over base/global).

Regression test pins all six documented profiles (base, global, us, eu, au, jp) and asserts pricing parity between jp. and au. variants.

Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-7.html

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(soniox): add soniox audio transcription integration (#29508)

* feat(openmeter): add OPENMETER_TRUST_REQUEST_USER to prevent forged attribution (#29650)

The OpenMeter callback resolves the CloudEvent subject from kwargs["user"] first, then falls back to the key-bound user_api_key_user_id. For multi-tenant proxy deployments, a client can set `"user": "..."` in the request body and cause their usage to be attributed to that arbitrary string, a billing-attribution forgery risk.

Adds OPENMETER_TRUST_REQUEST_USER env var (default "true" for backward compatibility).
When set to "false", the request-supplied `user` field is ignored and the subject is resolved solely from user_api_key_user_id. Matches the existing env-var-driven config pattern in this file (OPENMETER_API_KEY, OPENMETER_API_ENDPOINT, OPENMETER_EVENT_TYPE).

* feat(search): add you_com as a search provider (#28370)

* feat(search): add you_com as a search provider

Registers You.com Search API as a first-class `search_provider` in the `search_tools` registry, alongside Tavily, Exa, Perplexity, etc.

- New adapter: litellm/llms/you_com/search/transformation.py
  - POSTs to https://ydc-index.io/v1/search
  - Auth: X-API-Key from YOUCOM_API_KEY (or explicit api_key)
  - Maps Perplexity unified spec: max_results -> count, search_domain_filter -> include_domains, country -> country
  - Flattens results.web + results.news into a single SearchResult list; snippet prefers snippets[0], falls back to description; page_age -> date
- Registry: SearchProviders.YOU_COM in litellm/types/utils.py and wired into ProviderConfigManager.get_provider_search_config()
- Pricing entry: model_prices_and_context_window.json (placeholder $0.0; happy to adjust to maintainers' preferred public number)
- Docs: example router config snippet and example proxy yaml updated
- Tests: tests/search_tests/test_you_com_search.py
  - 5 mocked tests (payload shape, domain filter mapping, snippet fallback, news flattening, missing-api-key error)

Refs upstream expansion signal: #15942

* review fixups: normalize api_base, lowercase country, scope env-var to test

Addresses Greptile inline review comments on #28370:

- get_complete_url: strip trailing slashes from api_base *before* the endswith("/v1/search") check, so a custom base like ".../v1/search/" doesn't become ".../v1/search/v1/search".
- transform_search_request: .lower() country before sending, matching Tavily's convention so callers using the unified spec form ("US") get consistent behavior across providers.
- Tests: replace direct os.environ writes with an autouse monkeypatch fixture so YOUCOM_API_KEY is set per-test and removed afterwards. The missing-key test now uses monkeypatch.delenv. New test asserts the trailing-slash normalization above.

Reverts the ARCHITECTURE.md / example yaml edits per the reviewer note that documentation changes belong in the litellm-docs repo.

* support keyless free tier (api.you.com/v1/agents/search) as default

You.com offers an IP-throttled keyless endpoint that returns the same response shape as the keyed one (~100 queries/day, no signup). This is a significant onboarding lever - mirrors the keyless DuckDuckGo/SearXNG providers already in the search_tools registry.

Behavior:

- YOUCOM_API_KEY set -> keyed: POST https://ydc-index.io/v1/search (X-API-Key header)
- no key -> free: POST https://api.you.com/v1/agents/search (no auth)
- YOUCOM_API_BASE override -> honored as-is

Tests:

- New: test_you_com_search_keyless_free_tier - asserts URL + absence of X-API-Key when no key is configured.
- New: test_you_com_search_validate_environment_keyless - asserts the config no longer raises when the key is absent.
- Removed: test_you_com_search_raises_without_api_key (the precondition no longer holds).
- Existing payload/domain-filter/etc tests still cover keyed mode via the autouse YOUCOM_API_KEY fixture.

Verified both endpoints accept POST + return identical JSON shape: results.web[] / results.news[] with title, url, snippets, description, page_age.

* register you_com in provider_endpoints_support.json

Adding `litellm/llms/you_com/` requires a corresponding entry in provider_endpoints_support.json or the code-quality/check_provider_folders_documented CI check fails. Follows the compact tavily/serper pattern - endpoints: { search: true }. Local run of the check now reports "All 114 provider folders are documented".
* move tests under tests/test_litellm/llms/ so CI exercises them

The litellm CI workflows scope unit tests to `tests/test_litellm/...` (see test-unit-llm-providers.yml: `tests/test_litellm/llms` path), so tests living under `tests/search_tests/` are never run in CI - which is why codecov reports 0% patch coverage for the new adapter even though the unit tests exist and pass locally.

Move test_you_com_search.py into `tests/test_litellm/llms/you_com/` so the test-unit-llm-providers job picks it up. 7/7 tests still pass at the new location.

(Sibling search-only providers - tavily, exa_ai, brave, etc. - still live only in `tests/search_tests/` and would benefit from the same move, but that is out of scope for this PR.)

* fix(you_com): pin Accept-Encoding: identity to dodge keyless gzip bug

The keyless free-tier endpoint (api.you.com/v1/agents/search) advertises Content-Encoding: gzip but returns a body that httpx's decoder rejects with `zlib.error: Error -3 while decompressing data: incorrect header check`, surfacing as litellm.APIConnectionError in user code. curl works because it doesn't request compression by default.

Pin Accept-Encoding: identity in validate_environment so the upstream server skips compression entirely. Harmless on the keyed endpoint (ydc-index.io/v1/search) which negotiates content-encoding correctly. The header uses setdefault so a caller-supplied Accept-Encoding still takes precedence.

(Server-side bug has been flagged to the You.com team separately - once fixed there, this workaround can be removed.)

New unit test: test_you_com_search_pins_identity_accept_encoding.

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* docs: fix README typo (#29419)

Correct clear spelling mistakes in documentation without changing behavior.
Confidence: high
Scope-risk: narrow
Tested: git diff --check; uvx codespell on changed files
Not-tested: Full docs build not run; text-only changes

* Fix(langfuse): pass httpx_client to Langfuse in langfuse_prompt_management to respect SSL_VERIFY (#29480)

* fix(langfuse): pass ssl_verify to Langfuse httpx client

* fix_langfuse_

* add unit tests

* addressed comments

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(models): add minimax/MiniMax-M3 to model cost map (#29412)

Add MiniMax's new flagship MiniMax-M3 to the native minimax provider: 512K context, 128K max output, native multimodal (supports_vision), reasoning, prompt caching.

Pricing (USD/M tokens): input 0.6 / output 2.4 / cache read 0.12. M3 has no active prompt-cache-write tier, so cache_creation_input_token_cost is omitted.

Updated both the root model_prices_and_context_window.json (remote source) and the bundled litellm/model_prices_and_context_window_backup.json (local fallback), keeping them in sync.
* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log (#29394)

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log

* fix(logging): extend terminal event handling to ResponseIncompleteEvent and ResponseFailedEvent; fix return type annotation

* feat(provider): Add Neosantara provider as OpenAI Compatible (#29646)

* Add Neosantara provider

* Register Neosantara provider enum

* Address Neosantara provider review feedback

* Add Neosantara packaged endpoint support

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix: address greptile and veria review feedback

- langfuse: guard httpx_client injection behind version check (>= 2.7.3)
- soniox: propagate audio_transcription_duration in _hidden_params for spend tracking
- soniox: give SONIOX_API_BASE env var priority over caller-supplied api_base
- mcp: replace CancelledError catch with asyncio.wait_for + TimeoutError

* chore(mcp): add migration for per-server timeout column

* fix(test): add tool_use_system_prompt_tokens to model prices schema validator

* fix: mcp timeout test uses real asyncio.wait_for timeout; you_com get_complete_url respects resolved api_key

* fix: forward resolved api_key into you_com endpoint selection and apply timeout to soniox polling GETs

The search flow resolves api_key in validate_environment but never passed it into get_complete_url, so a programmatic api_key (with no YOUCOM_API_KEY in the env) set the X-API-Key header yet still selected the keyless free-tier endpoint. Forward api_key through both the search entrypoint and the http handler so the keyed endpoint is chosen.

HTTPHandler.get/AsyncHTTPHandler.get had no timeout parameter, so the Soniox poll and transcript-fetch GETs silently used the client global default instead of the caller timeout. Add a per-request timeout to get() and forward the configured timeout from the Soniox handler.
* fix(soniox): price stt-async-v4 per second so transcriptions are billed

The handler stores audio_transcription_duration in _hidden_params, but the model carried only token cost fields and the response has no token usage, so the transcription cost path fell through to cost_per_second and returned $0. An authenticated caller could transcribe Soniox audio without decrementing their budget. Switch the entry to output_cost_per_second at Soniox's published $0.10/hour async rate so the stored duration produces a real charge.

* fix(langfuse): use a dedicated httpx client for the SDK injection

The httpx_client handed to the Langfuse SDK came from _get_httpx_client(), which returns LiteLLM's globally cached HTTPHandler. If Langfuse closed that client on teardown it would invalidate the shared client used by every other LiteLLM HTTP call. Build a dedicated httpx.Client instead, still resolving SSL verification and client certificate from LiteLLM's configuration.

* fix(soniox): prefer caller-supplied api_base over SONIOX_API_BASE env var

* fix(cohere): support max_completion_tokens on cohere v2 chat (default route) (#29779)

* fix(cohere): support max_completion_tokens on cohere v2 chat

The default cohere_chat route resolves to CohereV2ChatConfig, which did not list or map max_completion_tokens, so get_optional_params raised UnsupportedParamsError for the standard OpenAI parameter (the modern replacement for the deprecated max_tokens). The v1 config already maps it to cohere's max_tokens; mirror that in v2 and add v2 regression tests.

* fix(cohere): make max_completion_tokens take precedence over max_tokens on v2

When both max_tokens and max_completion_tokens are supplied, prefer max_completion_tokens explicitly rather than relying on dict iteration order, and cover both orderings with a regression test.
---------

Co-authored-by: Daniel Yudelevich <4537920+yudelevi@users.noreply.github.com>
Co-authored-by: hectorc98 <hector.chamorroalvarez@adyen.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Dan Lemon <dan@danlemon.com>
Co-authored-by: Saswat <saswatds@users.noreply.github.com>
Co-authored-by: Brian Sparker <brainsparker@users.noreply.github.com>
Co-authored-by: Zhao73 <156770117+Zhao73@users.noreply.github.com>
Co-authored-by: Urain Ahmad Shah <60431964+urainshah@users.noreply.github.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: kape <168134658+kapelame@users.noreply.github.com>
Co-authored-by: danisalvaa <159898202+danisalvaa@users.noreply.github.com>
Co-authored-by: Just R <remixingmagelang@gmail.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
2026-06-06 02:21:51 +05:30 · 2026-06-06 02:21:51 +05:30 · d671a09c20
commit d671a09c20
parent 4a5644d51e
58 changed files with 4615 additions and 63 deletions
--- a/README.md
+++ b/README.md
@@ -407,7 +407,7 @@ Support for more providers. Missing a provider or LLM Platform, raise a [feature
 ### Run in Developer Mode
 #### Services
 1. Setup .env file in root
-2. Run dependant services `docker-compose up db prometheus`
+2. Run dependent services `docker-compose up db prometheus`
 #### Backend
 1. (In root) create virtual environment `python -m venv .venv`
--- a/litellm-proxy-extras/litellm_proxy_extras/migrations/20260605182307_add_timeout_to_mcp_server_table/migration.sql
+++ b/litellm-proxy-extras/litellm_proxy_extras/migrations/20260605182307_add_timeout_to_mcp_server_table/migration.sql
@@ -0,0 +1,3 @@
+-- AlterTable
+ALTER TABLE "LiteLLM_MCPServerTable" ADD COLUMN     "timeout" DOUBLE PRECISION;
--- a/litellm-proxy-extras/litellm_proxy_extras/schema.prisma
+++ b/litellm-proxy-extras/litellm_proxy_extras/schema.prisma
@@ -331,6 +331,7 @@ model LiteLLM_MCPServerTable {
  byok_description      String[] @default([])
  byok_api_key_help_url String?
  source_url            String?
+  timeout               Float?
  // BYOM submission lifecycle
  approval_status  String?   @default("active")
  submitted_by     String?
--- a/litellm/__init__.py
+++ b/litellm/__init__.py
@@ -612,6 +612,7 @@ cerebras_models: Set = set()
 galadriel_models: Set = set()
 nvidia_nim_models: Set = set()
 nvidia_riva_models: Set = set()
+soniox_models: Set = set()
 sambanova_models: Set = set()
 sambanova_embedding_models: Set = set()
 novita_models: Set = set()
@@ -844,6 +845,8 @@ def add_known_models(model_cost_map: Optional[Dict] = None):
            nvidia_nim_models.add(key)
        elif value.get("litellm_provider") == "nvidia_riva":
            nvidia_riva_models.add(key)
+        elif value.get("litellm_provider") == "soniox":
+            soniox_models.add(key)
        elif value.get("litellm_provider") == "sambanova":
            sambanova_models.add(key)
        elif value.get("litellm_provider") == "sambanova-embedding-models":
@@ -1009,6 +1012,7 @@ model_list = list(
    | galadriel_models
    | nvidia_nim_models
    | nvidia_riva_models
+    | soniox_models
    | sambanova_models
    | azure_text_models
    | novita_models
@@ -1109,6 +1113,7 @@ models_by_provider: dict = {
    "galadriel": galadriel_models,
    "nvidia_nim": nvidia_nim_models,
    "nvidia_riva": nvidia_riva_models,
+    "soniox": soniox_models,
    "sambanova": sambanova_models | sambanova_embedding_models,
    "novita": novita_models,
    "nebius": nebius_models | nebius_embedding_models,
--- a/litellm/_lazy_imports_registry.py
+++ b/litellm/_lazy_imports_registry.py
@@ -321,6 +321,7 @@ LLM_CONFIG_NAMES = (
    "LemonadeChatConfig",
    "SnowflakeEmbeddingConfig",
    "AmazonNovaChatConfig",
+    "SonioxAudioTranscriptionConfig",
 )
 # Types that support lazy loading via _lazy_import_types
@@ -1195,6 +1196,10 @@ _LLM_CONFIGS_IMPORT_MAP = {
        ".llms.amazon_nova.chat.transformation",
        "AmazonNovaChatConfig",
    ),
+    "SonioxAudioTranscriptionConfig": (
+        ".llms.soniox.audio_transcription.transformation",
+        "SonioxAudioTranscriptionConfig",
+    ),
 }
 # Import map for utils module lazy imports
--- a/litellm/experimental_mcp_client/client.py
+++ b/litellm/experimental_mcp_client/client.py
@@ -556,7 +556,9 @@ class MCPClient:
            )
            return tool_result
        except asyncio.CancelledError:
-            verbose_logger.warning("MCP client tool call was cancelled")
+            verbose_logger.warning(
+                f"MCP client tool call timed out after {self.timeout}s for {self.server_url}"
+            )
            raise
        except Exception as e:
            import traceback
--- a/litellm/integrations/langfuse/langfuse_prompt_management.py
+++ b/litellm/integrations/langfuse/langfuse_prompt_management.py
@@ -102,6 +102,18 @@ def langfuse_client_init(
    if Version(langfuse.version.__version__) >= Version("2.6.0"):
        parameters["sdk_integration"] = "litellm"
+    if Version(langfuse.version.__version__) >= Version("2.7.3"):
+        import httpx
+        import litellm
+        from ...llms.custom_httpx.http_handler import get_ssl_configuration
+        parameters["httpx_client"] = httpx.Client(
+            verify=get_ssl_configuration(),
+            cert=os.getenv("SSL_CERTIFICATE", litellm.ssl_certificate),
+        )
    client = Langfuse(**parameters)
    return client
--- a/litellm/integrations/openmeter.py
+++ b/litellm/integrations/openmeter.py
@@ -65,7 +65,15 @@ class OpenMeterLogger(CustomLogger):
                "total_tokens": response_obj["usage"].get("total_tokens"),
            }
-        user_param = kwargs.get("user", None)  # end-user passed in via 'user' param
+        # OPENMETER_TRUST_REQUEST_USER (default "true"): when set to "false",
+        # the request-supplied `user` field is ignored and the subject is
+        # resolved solely from the key-bound user_api_key_user_id. Proxies
+        # serving multi-tenant traffic enable this to prevent clients from
+        # forging attribution by setting `user` in the request body.
+        trust_request_user = (
+            os.getenv("OPENMETER_TRUST_REQUEST_USER", "true").lower() != "false"
+        )
+        user_param = kwargs.get("user", None) if trust_request_user else None
        # If no user provided directly, try to get it from token user_id
        if user_param is None:
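The subject resolution in the hunk above can be sketched in isolation. This is an illustrative reduction, assuming a hypothetical `resolve_subject` helper and an injectable `env` mapping for testability; the real logger reads the key-bound user from request metadata, which is passed in directly here.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_subject(
    kwargs: Dict[str, Any],
    key_user_id: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    # Any value other than "false" (case-insensitive) keeps the legacy behavior.
    trust_request_user = (
        env.get("OPENMETER_TRUST_REQUEST_USER", "true").lower() != "false"
    )
    # When the request is not trusted, ignore the client-supplied `user` field.
    user_param = kwargs.get("user") if trust_request_user else None
    # Fall back to the identity bound to the API key.
    return user_param if user_param is not None else key_user_id
```

With the flag set to "false", a client that sends `"user": "mallory"` is still attributed to the key's own user id.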
--- a/litellm/integrations/websearch_interception/ARCHITECTURE.md
+++ b/litellm/integrations/websearch_interception/ARCHITECTURE.md
@@ -244,6 +244,9 @@ search_tools:
  - search_tool_name: "my-tavily-tool"
    litellm_params:
      search_provider: "tavily"
+  - search_tool_name: "my-you-com-tool"
+    litellm_params:
+      search_provider: "you_com"
 ```
 ---
--- a/litellm/litellm_core_utils/get_llm_provider_logic.py
+++ b/litellm/litellm_core_utils/get_llm_provider_logic.py
@@ -659,6 +659,11 @@ def _get_openai_compatible_provider_info(  # noqa: PLR0915
            or get_secret_str("NVIDIA_RIVA_API_KEY")
            or get_secret_str("NVIDIA_NIM_API_KEY")
        )
+    elif custom_llm_provider == "soniox":
+        api_base = (
+            api_base or get_secret_str("SONIOX_API_BASE") or "https://api.soniox.com"
+        )
+        dynamic_api_key = api_key or get_secret_str("SONIOX_API_KEY")
    elif custom_llm_provider == "cerebras":
        api_base = (
            api_base or get_secret("CEREBRAS_API_BASE") or "https://api.cerebras.ai/v1"
--- a/litellm/litellm_core_utils/get_supported_openai_params.py
+++ b/litellm/litellm_core_utils/get_supported_openai_params.py
@@ -341,6 +341,11 @@ def get_supported_openai_params(  # noqa: PLR0915
            return ElevenLabsAudioTranscriptionConfig().get_supported_openai_params(
                model=model
            )
+    elif custom_llm_provider == "soniox":
+        if request_type == "transcription":
+            return litellm.SonioxAudioTranscriptionConfig().get_supported_openai_params(
+                model=model
+            )
    elif custom_llm_provider in litellm._custom_providers:
        if request_type == "chat_completion":
            provider_config = litellm.ProviderConfigManager.get_provider_chat_config(
--- a/litellm/litellm_core_utils/litellm_logging.py
+++ b/litellm/litellm_core_utils/litellm_logging.py
@@ -3503,7 +3503,9 @@ class Logging(LiteLLMLoggingBaseClass):
        else:
            return None
-    def _handle_anthropic_messages_response_logging(self, result: Any) -> ModelResponse:
+    def _handle_anthropic_messages_response_logging(
+        self, result: Any
+    ) -> Union[ModelResponse, ResponsesAPIResponse]:
        """
        Handles logging for Anthropic messages responses.
@@ -3522,6 +3524,15 @@ class Logging(LiteLLMLoggingBaseClass):
            return result
        elif isinstance(result, ModelResponse):
            return result
+        elif isinstance(
+            result,
+            (ResponseCompletedEvent, ResponseIncompleteEvent, ResponseFailedEvent),
+        ):
+            # anthropic_messages() can route to OpenAI Responses API; in that path
+            # the assembled streaming result is one of these terminal events rather than
+            # a ModelResponse. Return the inner response so downstream handlers
+            # (_transform_usage_objects, normalize_logging_result) can process it.
+            return result.response
        httpx_response = self.model_call_details.get("httpx_response", None)
        if httpx_response and isinstance(httpx_response, httpx.Response):
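The terminal-event branch above amounts to unwrapping the `.response` attribute before the usual logging path runs. A small sketch with stand-in dataclasses; `FakeResponse`, `FakeCompletedEvent`, and `unwrap_for_logging` are hypothetical names, not the OpenAI or LiteLLM types.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FakeResponse:
    """Stand-in for the assembled Responses API response."""

    usage: Dict[str, int]


@dataclass
class FakeCompletedEvent:
    """Stand-in for a terminal streaming event that wraps a response."""

    response: FakeResponse


# Tuple of event types whose payload lives on `.response`.
TERMINAL_EVENTS = (FakeCompletedEvent,)


def unwrap_for_logging(result: Any) -> Any:
    # Terminal events carry the real payload on `.response`;
    # any other result passes through unchanged.
    if isinstance(result, TERMINAL_EVENTS):
        return result.response
    return result
```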
--- a/litellm/llms/cohere/chat/v2_transformation.py
+++ b/litellm/llms/cohere/chat/v2_transformation.py
@@ -120,6 +120,7 @@ class CohereV2ChatConfig(OpenAIGPTConfig):
            "stream",
            "temperature",
            "max_tokens",
+            "max_completion_tokens",
            "top_p",
            "frequency_penalty",
            "presence_penalty",
@@ -143,7 +144,12 @@ class CohereV2ChatConfig(OpenAIGPTConfig):
                optional_params["stream"] = value
            if param == "temperature":
                optional_params["temperature"] = value
-            if param == "max_tokens":
+            if (
+                param == "max_tokens"
+                and "max_completion_tokens" not in non_default_params
+            ):
                optional_params["max_tokens"] = value
+            if param == "max_completion_tokens":
+                optional_params["max_tokens"] = value
            if param == "n":
                optional_params["num_generations"] = value
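The precedence rule in the hunk above can be checked on its own. The sketch below lifts just the two token-limit branches into a hypothetical `map_token_limit` function, assuming the caller's parameters arrive as a plain dict; it shows that the result no longer depends on dict iteration order.

```python
from typing import Any, Dict


def map_token_limit(non_default_params: Dict[str, Any]) -> Dict[str, Any]:
    optional_params: Dict[str, Any] = {}
    for param, value in non_default_params.items():
        # Only honor max_tokens when the newer parameter is absent.
        if (
            param == "max_tokens"
            and "max_completion_tokens" not in non_default_params
        ):
            optional_params["max_tokens"] = value
        # max_completion_tokens always maps onto the provider's max_tokens.
        if param == "max_completion_tokens":
            optional_params["max_tokens"] = value
    return optional_params
```

Either ordering of the two keys yields the `max_completion_tokens` value, which is the behavior the commit message says the regression test covers.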
--- a/litellm/llms/custom_httpx/http_handler.py
+++ b/litellm/llms/custom_httpx/http_handler.py
@@ -589,6 +589,7 @@ class AsyncHTTPHandler:
        params: Optional[dict] = None,
        headers: Optional[dict] = None,
        follow_redirects: Optional[bool] = None,
+        timeout: Optional[Union[float, httpx.Timeout]] = None,
    ):
        # Set follow_redirects to UseClientDefault if None
        _follow_redirects = (
@@ -599,7 +600,11 @@ class AsyncHTTPHandler:
        params.update(HTTPHandler.extract_query_params(url))
        response = await self.client.get(
-            url, params=params, headers=headers, follow_redirects=_follow_redirects  # type: ignore
+            url,
+            params=params,
+            headers=headers,  # type: ignore
+            follow_redirects=_follow_redirects,  # type: ignore
+            timeout=timeout if timeout is not None else USE_CLIENT_DEFAULT,
        )
        return response
@@ -1115,6 +1120,7 @@ class HTTPHandler:
        params: Optional[dict] = None,
        headers: Optional[dict] = None,
        follow_redirects: Optional[bool] = None,
+        timeout: Optional[Union[float, httpx.Timeout]] = None,
    ):
        # Set follow_redirects to UseClientDefault if None
        _follow_redirects = (
@@ -1128,6 +1134,7 @@ class HTTPHandler:
            params=params,
            headers=headers,
            follow_redirects=_follow_redirects,
+            timeout=timeout if timeout is not None else USE_CLIENT_DEFAULT,
        )
        return response
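Both `get()` methods above fall back to the client's default only when no timeout was passed. A sketch of why the guard is `is not None` rather than `or`; the sentinel here is a local stand-in for httpx's `USE_CLIENT_DEFAULT`, and `resolve_timeout` is a hypothetical helper.

```python
from typing import Optional, Union

# Local stand-in for httpx.USE_CLIENT_DEFAULT: a unique sentinel meaning
# "defer to whatever the client was constructed with".
USE_CLIENT_DEFAULT = object()


def resolve_timeout(timeout: Optional[float]) -> Union[float, object]:
    # `timeout or USE_CLIENT_DEFAULT` would wrongly discard an explicit 0.0,
    # because 0.0 is falsy; `is not None` keeps it.
    return timeout if timeout is not None else USE_CLIENT_DEFAULT
```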
--- a/litellm/llms/custom_httpx/llm_http_handler.py
+++ b/litellm/llms/custom_httpx/llm_http_handler.py
@@ -1751,6 +1751,7 @@ class BaseLLMHTTPHandler:
            api_base=api_base,
            optional_params=optional_params,
            data=data,
+            api_key=api_key,
        )
        ## LOGGING
@@ -1833,6 +1834,7 @@ class BaseLLMHTTPHandler:
            api_base=api_base,
            optional_params=optional_params,
            data=data,
+            api_key=api_key,
        )
        ## LOGGING
--- a/litellm/llms/moonshot/chat/transformation.py
+++ b/litellm/llms/moonshot/chat/transformation.py
@@ -134,11 +134,15 @@ class MoonshotChatConfig(OpenAIGPTConfig):
        ##########################################
        # temperature limitations
-        # 1. `temperature` on KIMI API is [0, 1] but OpenAI is [0, 2]
-        # 2. If temperature < 0.3 and n > 1, KIMI will raise an exception.
+        # 1. reasoning models (kimi-k2.5, kimi-k2.6, ...) reject every temperature
+        #    except 1, so the param is dropped and the model's default is used
+        # 2. `temperature` on KIMI API is [0, 1] but OpenAI is [0, 2]
+        # 3. If temperature < 0.3 and n > 1, KIMI will raise an exception.
        #       If we enter this condition, we set the temperature to 0.3 as suggested by Moonshot AI
        ##########################################
-        if "temperature" in optional_params:
+        if supports_reasoning(model=model, custom_llm_provider="moonshot"):
+            optional_params.pop("temperature", None)
+        elif "temperature" in optional_params:
            if optional_params["temperature"] > 1:
                optional_params["temperature"] = 1
            if optional_params["temperature"] < 0.3 and optional_params.get("n", 1) > 1:
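The temperature rules in the hunk above can be sketched as a standalone function. This is an illustration under stated assumptions, not the litellm implementation: `is_reasoning_model` is a hypothetical boolean replacing the `supports_reasoning(...)` lookup, the function works on a copy rather than mutating its argument, and the final assignment to `0.3` follows the code comment because the hunk is cut off before that line.

```python
from typing import Any, Dict


def adjust_temperature(
    optional_params: Dict[str, Any], is_reasoning_model: bool
) -> Dict[str, Any]:
    """Apply the Moonshot temperature rules to a copy of the params."""
    params = dict(optional_params)
    if is_reasoning_model:
        # Reasoning models accept only temperature=1, so drop the param
        # and let the model default apply.
        params.pop("temperature", None)
    elif "temperature" in params:
        # Moonshot's range is [0, 1]; OpenAI's is [0, 2].
        if params["temperature"] > 1:
            params["temperature"] = 1
        # Below 0.3 with n > 1 the API raises, so bump to 0.3.
        if params["temperature"] < 0.3 and params.get("n", 1) > 1:
            params["temperature"] = 0.3
    return params
```

A request with `temperature=0.2` against a reasoning model therefore goes out with no temperature at all, while the same value against a non-reasoning model is left alone unless `n > 1`.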
--- a/litellm/llms/openai_like/providers.json
+++ b/litellm/llms/openai_like/providers.json
@@ -115,6 +115,15 @@
      "max_completion_tokens": "max_tokens"
    }
  },
+  "neosantara": {
+    "base_url": "https://api.neosantara.xyz/v1",
+    "api_key_env": "NEOSANTARA_API_KEY",
+    "api_base_env": "NEOSANTARA_API_BASE",
+    "param_mappings": {
+      "max_completion_tokens": "max_tokens"
+    },
+    "supported_endpoints": ["/v1/chat/completions", "/v1/responses"]
+  },
  "tensormesh": {
    "base_url": "https://serverless.tensormesh.ai/v1",
    "api_key_env": "TENSORMESH_INFERENCE_API_KEY",
--- a/litellm/llms/soniox/__init__.py
+++ b/litellm/llms/soniox/__init__.py
@@ -0,0 +1 @@
 """Soniox LLM provider implementation."""
--- a/litellm/llms/soniox/audio_transcription/__init__.py
+++ b/litellm/llms/soniox/audio_transcription/__init__.py
@@ -0,0 +1 @@
 """Soniox audio transcription implementation."""
--- a/litellm/llms/soniox/audio_transcription/handler.py
+++ b/litellm/llms/soniox/audio_transcription/handler.py
@@ -0,0 +1,802 @@
 """
 Handler for Soniox async speech-to-text transcription.
 Soniox's async transcription API requires multiple HTTP calls:
  1. (optional) POST /v1/files                      — upload a local audio file
  2.            POST /v1/transcriptions             — create a transcription job
  3.            GET  /v1/transcriptions/{id}        — poll until status == "completed"
  4.            GET  /v1/transcriptions/{id}/transcript — fetch the transcript
  5. (optional) DELETE /v1/transcriptions/{id}      — cleanup
  6. (optional) DELETE /v1/files/{id}               — cleanup
 Because this does not fit the single-request shape of
 `base_llm_http_handler.audio_transcriptions`, the dispatch in
 `litellm.main.transcription()` routes Soniox requests directly to this
 handler (analogous to the OpenAI / Azure transcription handlers).
 """
 import asyncio
 import math
 import time
 from typing import (
    TYPE_CHECKING,
    Any,
    Coroutine,
    Dict,
    List,
    Optional,
    Tuple,
    Union,
 )
 import httpx
 from litellm.litellm_core_utils.audio_utils.utils import (
    get_audio_file_name,
    process_audio_file,
 )
 from litellm.llms.custom_httpx.http_handler import (
    AsyncHTTPHandler,
    HTTPHandler,
    _get_httpx_client,
    get_async_httpx_client,
 )
 from litellm.llms.soniox.audio_transcription.transformation import (
    SonioxAudioTranscriptionConfig,
 )
 from litellm.llms.soniox.common_utils import (
    SONIOX_DEFAULT_CLEANUP,
    SONIOX_DEFAULT_MAX_POLL_ATTEMPTS,
    SONIOX_DEFAULT_POLL_INTERVAL,
    SONIOX_MAX_POLL_ATTEMPTS,
    SONIOX_MAX_POLL_INTERVAL,
    SONIOX_MIN_POLL_INTERVAL,
    SONIOX_SECRET_FIELDS,
    SonioxException,
    get_soniox_api_base,
 )
 from litellm.types.utils import FileTypes, TranscriptionResponse
 if TYPE_CHECKING:
    from litellm.litellm_core_utils.litellm_logging import (
        Logging as LiteLLMLoggingObj,
    )
 else:
    LiteLLMLoggingObj = Any
 class SonioxAudioTranscriptionHandler:
    """Orchestrates the Soniox async transcription flow."""
    # ------------------------------------------------------------------
    # Public entry points
    # ------------------------------------------------------------------
    def audio_transcriptions(
        self,
        model: str,
        audio_file: Optional[FileTypes],
        optional_params: dict,
        litellm_params: dict,
        model_response: TranscriptionResponse,
        timeout: float,
        max_retries: int,
        logging_obj: LiteLLMLoggingObj,
        api_key: Optional[str],
        api_base: Optional[str],
        client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
        atranscription: bool = False,
        headers: Optional[Dict[str, Any]] = None,
        provider_config: Optional[SonioxAudioTranscriptionConfig] = None,
    ) -> Union[TranscriptionResponse, Coroutine[Any, Any, TranscriptionResponse]]:
        """Sync/async dispatch for Soniox transcription requests.
        Note: ``max_retries`` is accepted for signature compatibility with
        ``litellm.transcription`` but is **not yet implemented** for the Soniox
        async pipeline. Transient HTTP failures during upload, create, poll,
        or fetch will surface immediately. Wrap calls with the standard
        ``litellm.Router`` / ``num_retries`` mechanism for retry behaviour.
        """
        config = provider_config or SonioxAudioTranscriptionConfig()
        if atranscription is True:
            return self._async_audio_transcriptions(
                model=model,
                audio_file=audio_file,
                optional_params=optional_params,
                litellm_params=litellm_params,
                model_response=model_response,
                timeout=timeout,
                logging_obj=logging_obj,
                api_key=api_key,
                api_base=api_base,
                client=client if isinstance(client, AsyncHTTPHandler) else None,
                headers=headers or {},
                provider_config=config,
            )
        return self._sync_audio_transcriptions(
            model=model,
            audio_file=audio_file,
            optional_params=optional_params,
            litellm_params=litellm_params,
            model_response=model_response,
            timeout=timeout,
            logging_obj=logging_obj,
            api_key=api_key,
            api_base=api_base,
            client=client if isinstance(client, HTTPHandler) else None,
            headers=headers or {},
            provider_config=config,
        )
    # ------------------------------------------------------------------
    # Helpers shared between sync and async paths
    # ------------------------------------------------------------------
    def _prepare(
        self,
        audio_file: Optional[FileTypes],
        optional_params: dict,
        litellm_params: dict,
        api_key: Optional[str],
        api_base: Optional[str],
        provider_config: SonioxAudioTranscriptionConfig,
        headers: Dict[str, Any],
    ) -> Tuple[
        Dict[str, str],  # auth headers
        str,  # api_base (no trailing slash)
        Dict[str, Any],  # body for POST /v1/transcriptions (without file_id/audio_url)
        Dict[str, Any],  # handler-only options (poll interval, cleanup, ...)
    ]:
        # Validate env -> auth headers.
        auth_headers = provider_config.validate_environment(
            headers=headers,
            model="",  # unused
            messages=[],
            optional_params=optional_params,
            litellm_params=litellm_params,
            api_key=api_key,
            api_base=api_base,
        )
        base_url = get_soniox_api_base(api_base)
        # Operate on a local copy so we don't mutate the caller's dict
        # (the caller may reuse `optional_params` for retries or logging).
        params = dict(optional_params)
        # Pull handler-only kwargs out of params so they aren't sent
        # to Soniox.
        poll_interval = float(
            params.pop("soniox_polling_interval", SONIOX_DEFAULT_POLL_INTERVAL)
        )
        try:
            max_attempts = int(
                params.pop(
                    "soniox_max_polling_attempts", SONIOX_DEFAULT_MAX_POLL_ATTEMPTS
                )
            )
        except (ValueError, OverflowError):
            max_attempts = SONIOX_DEFAULT_MAX_POLL_ATTEMPTS
        cleanup_raw = params.pop("soniox_cleanup", SONIOX_DEFAULT_CLEANUP)
        if cleanup_raw is None:
            cleanup: List[str] = []
        elif isinstance(cleanup_raw, str):
            cleanup = [cleanup_raw]
        else:
            cleanup = list(cleanup_raw)
        filename_override = params.pop("filename", None)
        # Server-side clamps. Caller-supplied poll settings (from request kwargs)
        # are bounded so an authenticated caller cannot force a worker into a
        # tight poll loop (zero interval) or pin it indefinitely (huge attempt
        # count). Total polling time is bounded by
        #   SONIOX_MAX_POLL_ATTEMPTS * SONIOX_MAX_POLL_INTERVAL.
        if not math.isfinite(poll_interval):
            poll_interval = SONIOX_DEFAULT_POLL_INTERVAL
        clamped_poll_interval = max(
            SONIOX_MIN_POLL_INTERVAL, min(poll_interval, SONIOX_MAX_POLL_INTERVAL)
        )
        clamped_max_attempts = max(1, min(max_attempts, SONIOX_MAX_POLL_ATTEMPTS))
        handler_opts: Dict[str, Any] = {
            "poll_interval": clamped_poll_interval,
            "max_attempts": clamped_max_attempts,
            "cleanup": cleanup,
            "filename_override": filename_override,
            "audio_url": params.pop("audio_url", None),
            "file_id": params.pop("file_id", None),
        }
        # Soniox does not accept `language` directly; map_openai_params should
        # already have translated it, but drop any leftover to be safe.
        params.pop("language", None)
        # response_format is handled by LiteLLM post-processing, not Soniox.
        handler_opts["response_format"] = params.pop("response_format", None)
        return auth_headers, base_url, params, handler_opts
    def _build_create_body(
        self,
        model: str,
        optional_params: dict,
        handler_opts: Dict[str, Any],
        file_id: Optional[str],
    ) -> Dict[str, Any]:
        body: Dict[str, Any] = {"model": model}
        # Soniox-native passthrough fields
        for key, value in optional_params.items():
            if value is None:
                continue
            body[key] = value
        if handler_opts.get("audio_url"):
            body["audio_url"] = handler_opts["audio_url"]
        if file_id:
            body["file_id"] = file_id
        return body
    @staticmethod
    def _redact_body_for_logging(body: Dict[str, Any]) -> Dict[str, Any]:
        """Return a shallow copy of ``body`` with secret fields redacted.
        Soniox's create-transcription body can include
        ``webhook_auth_header_value`` (a shared secret used to authenticate
        webhook callbacks). Forwarding that value to logging callbacks would
        let anyone with read access to those sinks forge webhook requests, so
        we replace any value of a known secret-bearing field with the literal
        ``"[REDACTED]"`` before logging. Non-secret fields are passed through
        unchanged.
        """
        if not body:
            return body
        redacted = dict(body)
        for field in SONIOX_SECRET_FIELDS:
            if field in redacted and redacted[field] is not None:
                redacted[field] = "[REDACTED]"
        return redacted
    @staticmethod
    def _safe_log_pre_call(
        logging_obj: LiteLLMLoggingObj,
        api_key: Optional[str],
        api_base: str,
        body: Dict[str, Any],
    ) -> None:
        try:
            logging_obj.pre_call(
                input=None,
                api_key=api_key,
                additional_args={
                    "api_base": f"{api_base}/v1/transcriptions",
                    "atranscription": True,
                    "complete_input_dict": SonioxAudioTranscriptionHandler._redact_body_for_logging(
                        body
                    ),
                },
            )
        except Exception:
            # Logging hooks are best-effort: a misbehaving callback or third-party
            # observability integration must never break a real Soniox call.
            pass
    @staticmethod
    def _safe_log_post_call(
        logging_obj: LiteLLMLoggingObj,
        audio_file: Optional[FileTypes],
        api_key: Optional[str],
        body: Dict[str, Any],
        original_response: Any,
    ) -> None:
        try:
            logging_obj.post_call(
                input=get_audio_file_name(audio_file) if audio_file else None,
                api_key=api_key,
                additional_args={
                    "complete_input_dict": SonioxAudioTranscriptionHandler._redact_body_for_logging(
                        body
                    )
                },
                original_response=original_response,
            )
        except Exception:
            # Logging hooks are best-effort: a misbehaving callback or third-party
            # observability integration must never break a real Soniox call.
            pass
    @staticmethod
    def _raise_for_response(
        response: httpx.Response,
        provider_config: SonioxAudioTranscriptionConfig,
        action: str,
    ) -> None:
        if response.status_code >= 400:
            try:
                payload = response.json()
                message = (
                    payload.get("error_message")
                    or payload.get("error")
                    or response.text
                )
            except Exception:
                message = response.text
            raise provider_config.get_error_class(
                error_message=f"Soniox {action} failed (HTTP {response.status_code}): {message}",
                status_code=response.status_code,
                headers=response.headers,
            )
    # ------------------------------------------------------------------
    # Sync flow
    # ------------------------------------------------------------------
    def _sync_audio_transcriptions(
        self,
        model: str,
        audio_file: Optional[FileTypes],
        optional_params: dict,
        litellm_params: dict,
        model_response: TranscriptionResponse,
        timeout: float,
        logging_obj: LiteLLMLoggingObj,
        api_key: Optional[str],
        api_base: Optional[str],
        client: Optional[HTTPHandler],
        headers: Dict[str, Any],
        provider_config: SonioxAudioTranscriptionConfig,
    ) -> TranscriptionResponse:
        auth_headers, base_url, opt_params, handler_opts = self._prepare(
            audio_file=audio_file,
            optional_params=optional_params,
            litellm_params=litellm_params,
            api_key=api_key,
            api_base=api_base,
            provider_config=provider_config,
            headers=headers,
        )
        http_client = (
            client
            if isinstance(client, HTTPHandler)
            else (
                _get_httpx_client(
                    params={"ssl_verify": litellm_params.get("ssl_verify", None)},
                )
            )
        )
        file_id = handler_opts.get("file_id")
        uploaded_file_id: Optional[str] = None
        transcription_id: Optional[str] = None
        try:
            if not file_id and not handler_opts.get("audio_url"):
                if audio_file is None:
                    raise SonioxException(
                        message=(
                            "Soniox transcription requires one of: a file argument, "
                            "an `audio_url` kwarg, or a `file_id` kwarg."
                        ),
                        status_code=400,
                        headers=None,
                    )
                uploaded_file_id = self._sync_upload_file(
                    http_client=http_client,
                    base_url=base_url,
                    auth_headers=auth_headers,
                    audio_file=audio_file,
                    filename_override=handler_opts.get("filename_override"),
                    timeout=timeout,
                    provider_config=provider_config,
                )
                file_id = uploaded_file_id
            body = self._build_create_body(model, opt_params, handler_opts, file_id)
            self._safe_log_pre_call(logging_obj, api_key, base_url, body)
            create_resp = http_client.post(
                url=f"{base_url}/v1/transcriptions",
                headers=auth_headers,
                json=body,
                timeout=timeout,
            )
            self._raise_for_response(
                create_resp, provider_config, "create transcription"
            )
            transcription_id = create_resp.json()["id"]
            transcription_meta = self._sync_poll_until_completed(
                http_client=http_client,
                base_url=base_url,
                auth_headers=auth_headers,
                transcription_id=transcription_id,
                poll_interval=handler_opts["poll_interval"],
                max_attempts=handler_opts["max_attempts"],
                timeout=timeout,
                provider_config=provider_config,
            )
            transcript_resp = http_client.get(
                url=f"{base_url}/v1/transcriptions/{transcription_id}/transcript",
                headers=auth_headers,
                timeout=timeout,
            )
            self._raise_for_response(
                transcript_resp, provider_config, "fetch transcript"
            )
            transcript = transcript_resp.json()
            payload = {"transcription": transcription_meta, "transcript": transcript}
            response = provider_config._build_response_from_payload(
                payload,
                model_response=model_response,
                response_format=handler_opts.get("response_format"),
            )
            self._safe_log_post_call(logging_obj, audio_file, api_key, body, payload)
            audio_duration_ms = transcription_meta.get("audio_duration_ms")
            response._hidden_params.update(
                {
                    "model": model,
                    "custom_llm_provider": "soniox",
                    "audio_transcription_duration": (
                        float(audio_duration_ms) / 1000.0
                        if audio_duration_ms is not None
                        else None
                    ),
                }
            )
            return response
        finally:
            self._sync_cleanup(
                http_client=http_client,
                base_url=base_url,
                auth_headers=auth_headers,
                cleanup=handler_opts["cleanup"],
                file_id_to_cleanup=uploaded_file_id,
                transcription_id=transcription_id,
                timeout=timeout,
            )
    def _sync_upload_file(
        self,
        http_client: HTTPHandler,
        base_url: str,
        auth_headers: Dict[str, str],
        audio_file: FileTypes,
        filename_override: Optional[str],
        timeout: float,
        provider_config: SonioxAudioTranscriptionConfig,
    ) -> str:
        processed = process_audio_file(audio_file)
        filename = filename_override or processed.filename
        files = {
            "file": (filename, processed.file_content, processed.content_type),
        }
        # `Authorization` header is fine; httpx sets multipart Content-Type.
        upload_headers = {"Authorization": auth_headers["Authorization"]}
        resp = http_client.post(
            url=f"{base_url}/v1/files",
            headers=upload_headers,
            files=files,
            timeout=timeout,
        )
        self._raise_for_response(resp, provider_config, "upload file")
        return resp.json()["id"]
    def _sync_poll_until_completed(
        self,
        http_client: HTTPHandler,
        base_url: str,
        auth_headers: Dict[str, str],
        transcription_id: str,
        poll_interval: float,
        max_attempts: int,
        timeout: float,
        provider_config: SonioxAudioTranscriptionConfig,
    ) -> Dict[str, Any]:
        for _ in range(max_attempts):
            resp = http_client.get(
                url=f"{base_url}/v1/transcriptions/{transcription_id}",
                headers=auth_headers,
                timeout=timeout,
            )
            self._raise_for_response(resp, provider_config, "poll transcription")
            data = resp.json()
            status = data.get("status")
            if status == "completed":
                return data
            if status == "error":
                raise provider_config.get_error_class(
                    error_message=(
                        f"Soniox transcription {transcription_id} failed: "
                        f"{data.get('error_message') or data.get('error_type') or 'unknown error'}"
                    ),
                    status_code=500,
                    headers=resp.headers,
                )
            time.sleep(poll_interval)
        raise provider_config.get_error_class(
            error_message=(
                f"Soniox transcription {transcription_id} did not complete after "
                f"{max_attempts} polling attempts (interval={poll_interval}s)."
            ),
            status_code=504,
            headers={},
        )
    def _sync_cleanup(
        self,
        http_client: HTTPHandler,
        base_url: str,
        auth_headers: Dict[str, str],
        cleanup: List[str],
        file_id_to_cleanup: Optional[str],
        transcription_id: Optional[str],
        timeout: float,
    ) -> None:
        if not cleanup:
            return
        if "transcription" in cleanup and transcription_id:
            try:
                http_client.delete(
                    url=f"{base_url}/v1/transcriptions/{transcription_id}",
                    headers=auth_headers,
                    timeout=timeout,
                )
            except Exception:
                # Cleanup is best-effort: a failed delete leaves stale data on
                # Soniox but must not mask the original transcription result
                # (or, on the error path, the original error).
                pass
        if "file" in cleanup and file_id_to_cleanup:
            try:
                http_client.delete(
                    url=f"{base_url}/v1/files/{file_id_to_cleanup}",
                    headers=auth_headers,
                    timeout=timeout,
                )
            except Exception:
                # Cleanup is best-effort; see comment above.
                pass
    # ------------------------------------------------------------------
    # Async flow
    # ------------------------------------------------------------------
    async def _async_audio_transcriptions(
        self,
        model: str,
        audio_file: Optional[FileTypes],
        optional_params: dict,
        litellm_params: dict,
        model_response: TranscriptionResponse,
        timeout: float,
        logging_obj: LiteLLMLoggingObj,
        api_key: Optional[str],
        api_base: Optional[str],
        client: Optional[AsyncHTTPHandler],
        headers: Dict[str, Any],
        provider_config: SonioxAudioTranscriptionConfig,
    ) -> TranscriptionResponse:
        import litellm
        auth_headers, base_url, opt_params, handler_opts = self._prepare(
            audio_file=audio_file,
            optional_params=optional_params,
            litellm_params=litellm_params,
            api_key=api_key,
            api_base=api_base,
            provider_config=provider_config,
            headers=headers,
        )
        http_client = (
            client
            if isinstance(client, AsyncHTTPHandler)
            else (
                get_async_httpx_client(
                    llm_provider=litellm.LlmProviders.SONIOX,
                    params={"ssl_verify": litellm_params.get("ssl_verify", None)},
                )
            )
        )
        file_id = handler_opts.get("file_id")
        uploaded_file_id: Optional[str] = None
        transcription_id: Optional[str] = None
        try:
            if not file_id and not handler_opts.get("audio_url"):
                if audio_file is None:
                    raise SonioxException(
                        message=(
                            "Soniox transcription requires one of: a file argument, "
                            "an `audio_url` kwarg, or a `file_id` kwarg."
                        ),
                        status_code=400,
                        headers=None,
                    )
                uploaded_file_id = await self._async_upload_file(
                    http_client=http_client,
                    base_url=base_url,
                    auth_headers=auth_headers,
                    audio_file=audio_file,
                    filename_override=handler_opts.get("filename_override"),
                    timeout=timeout,
                    provider_config=provider_config,
                )
                file_id = uploaded_file_id
            body = self._build_create_body(model, opt_params, handler_opts, file_id)
            self._safe_log_pre_call(logging_obj, api_key, base_url, body)
            create_resp = await http_client.post(
                url=f"{base_url}/v1/transcriptions",
                headers=auth_headers,
                json=body,
                timeout=timeout,
            )
            self._raise_for_response(
                create_resp, provider_config, "create transcription"
            )
            transcription_id = create_resp.json()["id"]
            transcription_meta = await self._async_poll_until_completed(
                http_client=http_client,
                base_url=base_url,
                auth_headers=auth_headers,
                transcription_id=transcription_id,
                poll_interval=handler_opts["poll_interval"],
                max_attempts=handler_opts["max_attempts"],
                timeout=timeout,
                provider_config=provider_config,
            )
            transcript_resp = await http_client.get(
                url=f"{base_url}/v1/transcriptions/{transcription_id}/transcript",
                headers=auth_headers,
                timeout=timeout,
            )
            self._raise_for_response(
                transcript_resp, provider_config, "fetch transcript"
            )
            transcript = transcript_resp.json()
            payload = {"transcription": transcription_meta, "transcript": transcript}
            response = provider_config._build_response_from_payload(
                payload,
                model_response=model_response,
                response_format=handler_opts.get("response_format"),
            )
            self._safe_log_post_call(logging_obj, audio_file, api_key, body, payload)
            audio_duration_ms = transcription_meta.get("audio_duration_ms")
            response._hidden_params.update(
                {
                    "model": model,
                    "custom_llm_provider": "soniox",
                    "audio_transcription_duration": (
                        float(audio_duration_ms) / 1000.0
                        if audio_duration_ms is not None
                        else None
                    ),
                }
            )
            return response
        finally:
            await self._async_cleanup(
                http_client=http_client,
                base_url=base_url,
                auth_headers=auth_headers,
                cleanup=handler_opts["cleanup"],
                file_id_to_cleanup=uploaded_file_id,
                transcription_id=transcription_id,
                timeout=timeout,
            )
    async def _async_upload_file(
        self,
        http_client: AsyncHTTPHandler,
        base_url: str,
        auth_headers: Dict[str, str],
        audio_file: FileTypes,
        filename_override: Optional[str],
        timeout: float,
        provider_config: SonioxAudioTranscriptionConfig,
    ) -> str:
        processed = process_audio_file(audio_file)
        filename = filename_override or processed.filename
        files = {
            "file": (filename, processed.file_content, processed.content_type),
        }
        upload_headers = {"Authorization": auth_headers["Authorization"]}
        resp = await http_client.post(
            url=f"{base_url}/v1/files",
            headers=upload_headers,
            files=files,
            timeout=timeout,
        )
        self._raise_for_response(resp, provider_config, "upload file")
        return resp.json()["id"]
    async def _async_poll_until_completed(
        self,
        http_client: AsyncHTTPHandler,
        base_url: str,
        auth_headers: Dict[str, str],
        transcription_id: str,
        poll_interval: float,
        max_attempts: int,
        timeout: float,
        provider_config: SonioxAudioTranscriptionConfig,
    ) -> Dict[str, Any]:
        for _ in range(max_attempts):
            resp = await http_client.get(
                url=f"{base_url}/v1/transcriptions/{transcription_id}",
                headers=auth_headers,
                timeout=timeout,
            )
            self._raise_for_response(resp, provider_config, "poll transcription")
            data = resp.json()
            status = data.get("status")
            if status == "completed":
                return data
            if status == "error":
                raise provider_config.get_error_class(
                    error_message=(
                        f"Soniox transcription {transcription_id} failed: "
                        f"{data.get('error_message') or data.get('error_type') or 'unknown error'}"
                    ),
                    status_code=500,
                    headers=resp.headers,
                )
            await asyncio.sleep(poll_interval)
        raise provider_config.get_error_class(
            error_message=(
                f"Soniox transcription {transcription_id} did not complete after "
                f"{max_attempts} polling attempts (interval={poll_interval}s)."
            ),
            status_code=504,
            headers={},
        )
    async def _async_cleanup(
        self,
        http_client: AsyncHTTPHandler,
        base_url: str,
        auth_headers: Dict[str, str],
        cleanup: List[str],
        file_id_to_cleanup: Optional[str],
        transcription_id: Optional[str],
        timeout: float,
    ) -> None:
        if not cleanup:
            return
        if "transcription" in cleanup and transcription_id:
            try:
                await http_client.delete(
                    url=f"{base_url}/v1/transcriptions/{transcription_id}",
                    headers=auth_headers,
                    timeout=timeout,
                )
            except Exception:
                # Cleanup is best-effort: a failed delete leaves stale data on
                # Soniox but must not mask the original transcription result
                # (or, on the error path, the original error).
                pass
        if "file" in cleanup and file_id_to_cleanup:
            try:
                await http_client.delete(
                    url=f"{base_url}/v1/files/{file_id_to_cleanup}",
                    headers=auth_headers,
                    timeout=timeout,
                )
            except Exception:
                # Cleanup is best-effort; see comment above.
                pass
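The clamp-then-poll pattern used by the handler above can be sketched in isolation. This is a simplified illustration, not the handler itself: the bound constants are hypothetical placeholders (the real values live in `litellm.llms.soniox.common_utils`, which is not shown here), `fetch_status` is an assumed callable standing in for the `GET /v1/transcriptions/{id}` request, and built-in exceptions replace the provider error class.

```python
import math
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical bounds, chosen only for illustration.
DEFAULT_POLL_INTERVAL = 1.0
MIN_POLL_INTERVAL = 0.1
MAX_POLL_INTERVAL = 30.0
MAX_POLL_ATTEMPTS = 1800


def clamp_poll_settings(poll_interval: float, max_attempts: int) -> Tuple[float, int]:
    """Bound caller-supplied poll settings so a request cannot spin or hang a worker."""
    if not math.isfinite(poll_interval):
        poll_interval = DEFAULT_POLL_INTERVAL
    interval = max(MIN_POLL_INTERVAL, min(poll_interval, MAX_POLL_INTERVAL))
    attempts = max(1, min(max_attempts, MAX_POLL_ATTEMPTS))
    return interval, attempts


def poll_until_completed(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_interval: float,
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until it reports completion, an error, or attempts run out."""
    for _ in range(max_attempts):
        data = fetch_status()
        status = data.get("status")
        if status == "completed":
            return data
        if status == "error":
            raise RuntimeError(data.get("error_message") or "unknown error")
        sleep(poll_interval)
    raise TimeoutError(f"not completed after {max_attempts} polling attempts")
```

Passing `sleep` as a parameter is a choice made for this sketch so the loop can be exercised without real delays; the handler itself calls `time.sleep` or `asyncio.sleep` directly.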
--- a/litellm/llms/soniox/audio_transcription/transformation.py
+++ b/litellm/llms/soniox/audio_transcription/transformation.py
@@ -0,0 +1,281 @@
 """
 Translates between OpenAI's `/v1/audio/transcriptions` shape and Soniox's
 async transcription API (https://soniox.com/docs/stt/async/async-transcription).
 This config covers parameter mapping, env validation and response shaping.
 The actual orchestration (file upload -> create -> poll -> fetch -> cleanup)
 lives in `litellm.llms.soniox.audio_transcription.handler`, because Soniox's
 async API requires multiple HTTP calls and does not fit the single-request
 contract of `base_llm_http_handler.audio_transcriptions`.
 """
 from typing import Any, Dict, List, Optional, Union
 from httpx import Headers, Response
 from litellm.llms.base_llm.audio_transcription.transformation import (
    AudioTranscriptionRequestData,
    BaseAudioTranscriptionConfig,
 )
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
 from litellm.llms.soniox.common_utils import (
    SonioxException,
    get_soniox_api_base,
    get_soniox_api_key,
    render_soniox_tokens,
    render_soniox_tokens_as_srt,
    render_soniox_tokens_as_vtt,
 )
 from litellm.types.llms.openai import (
    AllMessageValues,
    OpenAIAudioTranscriptionOptionalParams,
 )
 from litellm.types.utils import FileTypes, TranscriptionResponse
 # Soniox-native kwargs the user can pass through `litellm.transcription(..., **kwargs)`
 # in addition to the standard OpenAI params.
 SONIOX_PASSTHROUGH_PARAMS: List[str] = [
    "language_hints",
    "language_hints_strict",
    "enable_language_identification",
    "enable_speaker_diarization",
    "context",
    "translation",
    "client_reference_id",
    "webhook_url",
    "webhook_auth_header_name",
    "webhook_auth_header_value",
    "audio_url",
    "file_id",
 ]
 # Handler-only kwargs (consumed by the handler, not sent to Soniox).
 SONIOX_HANDLER_ONLY_PARAMS: List[str] = [
    "soniox_polling_interval",
    "soniox_max_polling_attempts",
    "soniox_cleanup",
    "filename",
 ]
 class SonioxAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
    """Configuration for Soniox async speech-to-text transcription."""
    def get_supported_openai_params(
        self, model: str
    ) -> List[OpenAIAudioTranscriptionOptionalParams]:
        # `language` is mapped onto Soniox's `language_hints`.
        # `response_format` is handled by LiteLLM (Soniox doesn't support
        # SRT/VTT natively but we synthesize them from token timestamps).
        return ["language", "response_format"]
    def map_openai_params(
        self,
        non_default_params: dict,
        optional_params: dict,
        model: str,
        drop_params: bool,
    ) -> dict:
        # Translate the OpenAI `language` param into Soniox `language_hints`.
        if "language" in non_default_params and non_default_params["language"]:
            language = non_default_params["language"]
            existing_hints = optional_params.get("language_hints")
            if not existing_hints:
                optional_params["language_hints"] = [language]
            elif language not in existing_hints:
                optional_params["language_hints"] = [language] + list(existing_hints)
        # Capture response_format for post-processing (not sent to Soniox API).
        if "response_format" in non_default_params:
            optional_params["response_format"] = non_default_params["response_format"]
        # Pass through Soniox-native kwargs unchanged.
        for key in SONIOX_PASSTHROUGH_PARAMS + SONIOX_HANDLER_ONLY_PARAMS:
            if key in non_default_params and non_default_params[key] is not None:
                optional_params[key] = non_default_params[key]
        return optional_params
    def get_error_class(
        self, error_message: str, status_code: int, headers: Union[dict, Headers]
    ) -> BaseLLMException:
        return SonioxException(
            message=error_message, status_code=status_code, headers=headers
        )
    def validate_environment(
        self,
        headers: dict,
        model: str,
        messages: List[AllMessageValues],
        optional_params: dict,
        litellm_params: dict,
        api_key: Optional[str] = None,
        api_base: Optional[str] = None,
    ) -> dict:
        resolved_key = get_soniox_api_key(api_key)
        if not resolved_key:
            raise SonioxException(
                message=(
                    "Missing Soniox API key. Set the SONIOX_API_KEY environment "
                    "variable or pass api_key=... to litellm.transcription()."
                ),
                status_code=401,
                headers=None,
            )
        merged_headers: Dict[str, str] = {
            "Authorization": f"Bearer {resolved_key}",
        }
        if headers:
            merged_headers.update(headers)
        return merged_headers
    def get_complete_url(
        self,
        api_base: Optional[str],
        api_key: Optional[str],
        model: str,
        optional_params: dict,
        litellm_params: dict,
        stream: Optional[bool] = None,
    ) -> str:
        # The handler builds per-call URLs (uploads, create, poll, fetch, delete);
        # we just return the resolved base.
        return get_soniox_api_base(api_base)
    def transform_audio_transcription_request(
        self,
        model: str,
        audio_file: FileTypes,
        optional_params: dict,
        litellm_params: dict,
    ) -> AudioTranscriptionRequestData:
        """
        Build the JSON body for `POST /v1/transcriptions`.
        The handler is responsible for the file upload (if `audio_file` is bytes)
        and for filling in `file_id`/`audio_url`. This method exists so the
        config can be exercised in isolation by unit tests.
        """
        body: Dict[str, Any] = {"model": model}
        for key in SONIOX_PASSTHROUGH_PARAMS:
            value = optional_params.get(key)
            if value is not None:
                body[key] = value
        return AudioTranscriptionRequestData(
            data=body, files=None, content_type="application/json"
        )
    def transform_audio_transcription_response(
        self,
        raw_response: Response,
        model_response: Optional[TranscriptionResponse] = None,
    ) -> TranscriptionResponse:
        """
        Build a TranscriptionResponse from a Soniox transcript payload.
        `raw_response.json()` may be either:
          - a Soniox transcript object: `{"id": "...", "text": "...", "tokens": [...]}`
          - or a merged envelope: `{"transcription": {...}, "transcript": {...}}`
            produced by the handler so transcription metadata is also available.
        """
        try:
            payload = raw_response.json()
        except Exception as exc:
            raise SonioxException(
                message=f"Failed to parse Soniox response: {exc}",
                status_code=getattr(raw_response, "status_code", 500),
                headers=getattr(raw_response, "headers", None),
            )
        return self._build_response_from_payload(payload, model_response=model_response)
    def _build_response_from_payload(
        self,
        payload: Dict[str, Any],
        model_response: Optional[TranscriptionResponse] = None,
        response_format: Optional[str] = None,
    ) -> TranscriptionResponse:
        """Shared response-building logic (also used by the handler)."""
        transcription_meta: Dict[str, Any] = {}
        transcript: Dict[str, Any]
        if isinstance(payload, dict) and "transcript" in payload:
            transcription_meta = payload.get("transcription") or {}
            transcript = payload.get("transcript") or {}
        else:
            transcript = payload if isinstance(payload, dict) else {}
        tokens: List[Dict[str, Any]] = transcript.get("tokens") or []
        # Decide what to put in `text` based on response_format:
        #   - "srt": render tokens as SRT subtitles (synthesized from timestamps)
        #   - "vtt": render tokens as WebVTT subtitles (synthesized from timestamps)
        #   - "verbose_json": return JSON with word-level timing (handled below)
        #   - "text" / "json" / None: default plain text rendering
        if response_format == "srt" and tokens:
            text = render_soniox_tokens_as_srt(tokens)
        elif response_format == "vtt" and tokens:
            text = render_soniox_tokens_as_vtt(tokens)
        else:
            # Default text rendering (also used for "json", "text",
            # "verbose_json")
            has_speaker = any(t.get("speaker") is not None for t in tokens)
            has_language = any(t.get("language") is not None for t in tokens)
            if (has_speaker or has_language) and tokens:
                text = render_soniox_tokens(tokens)
            elif transcript.get("text"):
                text = transcript["text"]
            elif tokens:
                text = "".join(t.get("text", "") for t in tokens)
            else:
                text = ""
        response = model_response or TranscriptionResponse(text=text)
        response.text = text
        response["task"] = "transcribe"
        # Best-effort metadata fields matching OpenAI's verbose_json shape.
        if transcription_meta.get("audio_duration_ms") is not None:
            try:
                response["duration"] = (
                    float(transcription_meta["audio_duration_ms"]) / 1000.0
                )
            except (TypeError, ValueError):
                pass
        # Surface a representative language if all tokens agree.
        languages = {t.get("language") for t in tokens if t.get("language")}
        if len(languages) == 1:
            response["language"] = next(iter(languages))
        # For verbose_json, include word-level timing from tokens.
        if response_format == "verbose_json" and tokens:
            words: List[Dict[str, Any]] = []
            for token in tokens:
                word_entry: Dict[str, Any] = {"word": token.get("text", "")}
                if token.get("start_ms") is not None:
                    word_entry["start"] = float(token["start_ms"]) / 1000.0
                if token.get("end_ms") is not None:
                    word_entry["end"] = float(token["end_ms"]) / 1000.0
                words.append(word_entry)
            if words:
                response["words"] = words
        # Stash the raw Soniox payload so power-users can read tokens, segments,
        # speaker/language data, etc.
        response._hidden_params.update(
            {
                "soniox_raw": {
                    "transcription": transcription_meta,
                    "transcript": transcript,
                }
            }
        )
        return response
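Reviewer note: two pieces of the config above are easy to check standalone. The sketch below restates them as free functions under assumed names (`merge_language_hint` and `tokens_to_words` are hypothetical helpers, not part of the PR): the first mirrors how `language` is folded into `language_hints`, the second mirrors the millisecond-to-second conversion used for `verbose_json` word entries.

```python
from typing import Any, Dict, List, Optional


def merge_language_hint(
    language: Optional[str], existing_hints: Optional[List[str]]
) -> Optional[List[str]]:
    """Prepend `language` to the hints unless it is already listed."""
    if not language:
        return existing_hints
    if not existing_hints:
        return [language]
    if language in existing_hints:
        return list(existing_hints)
    return [language] + list(existing_hints)


def tokens_to_words(tokens: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert token timings in milliseconds to word entries in seconds."""
    words: List[Dict[str, Any]] = []
    for token in tokens:
        entry: Dict[str, Any] = {"word": token.get("text", "")}
        if token.get("start_ms") is not None:
            entry["start"] = float(token["start_ms"]) / 1000.0
        if token.get("end_ms") is not None:
            entry["end"] = float(token["end_ms"]) / 1000.0
        words.append(entry)
    return words


hints = merge_language_hint("en", ["de", "fr"])
words = tokens_to_words(
    [{"text": "Hi", "start_ms": 120, "end_ms": 480}, {"text": " there"}]
)
```

Tokens without timestamps keep only their `word` key, matching the behaviour of the `verbose_json` branch.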
--- a/litellm/llms/soniox/common_utils.py
+++ b/litellm/llms/soniox/common_utils.py
@ -0,0 +1,274 @@
 """
 Shared utilities for the Soniox provider (https://soniox.com).
 """
 from typing import Any, Dict, List, Optional
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
 # Soniox API base URL.
 SONIOX_API_BASE: str = "https://api.soniox.com"
 # Default polling interval in seconds when waiting for an async transcription
 # to finish. Mirrors the Soniox SDK default.
 SONIOX_DEFAULT_POLL_INTERVAL: float = 1.0
 # Minimum polling interval (in seconds) the server will accept from caller-
 # supplied `soniox_polling_interval` kwargs. Prevents an authenticated caller
 # from forcing a worker into a tight poll loop with a zero/near-zero interval.
 SONIOX_MIN_POLL_INTERVAL: float = 0.5
 # Maximum polling interval (in seconds). Prevents a caller from setting an
 # excessively large or non-finite interval that would keep a worker sleeping
 # far longer than necessary between status checks.
 SONIOX_MAX_POLL_INTERVAL: float = 60.0
 # Default maximum number of polling attempts (1800 attempts * 1s ~= 30 minutes).
 SONIOX_DEFAULT_MAX_POLL_ATTEMPTS: int = 1800
 # Hard upper bound on polling attempts. At `SONIOX_MIN_POLL_INTERVAL` this
 # works out to ~3000s (50 minutes) of polling per request; the cap prevents a
 # caller from pinning a worker indefinitely via a huge attempt count.
 SONIOX_MAX_POLL_ATTEMPTS: int = 6000
 # Default cleanup behaviour: delete both the uploaded file (if any) and the
 # transcription record after the transcript has been fetched.
 SONIOX_DEFAULT_CLEANUP: List[str] = ["file", "transcription"]
 # Body fields that may carry secrets and must be redacted before being
 # forwarded to logging callbacks. Soniox accepts a webhook auth header value
 # alongside the create-transcription request; that value lets the recipient
 # authenticate webhook callbacks and must not leak into observability sinks.
 SONIOX_SECRET_FIELDS: List[str] = ["webhook_auth_header_value"]
 class SonioxException(BaseLLMException):
    """Provider-specific exception class for Soniox."""
    pass
 def get_soniox_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Resolve the Soniox API key from arg or env var."""
    # Local import to avoid a circular import: litellm.secret_managers.main
    # imports from litellm at top-level.
    from litellm.secret_managers.main import get_secret_str
    return api_key or get_secret_str("SONIOX_API_KEY")
 def get_soniox_api_base(api_base: Optional[str] = None) -> str:
    """Resolve the Soniox API base URL from arg or env var (defaults to public API)."""
    from litellm.secret_managers.main import get_secret_str
    base = api_base or get_secret_str("SONIOX_API_BASE") or SONIOX_API_BASE
    return base.rstrip("/")
 def render_soniox_tokens(tokens: List[Dict[str, Any]]) -> str:
    """
    Render a list of Soniox tokens to a readable transcript string.
    Mirrors the behaviour of the official Soniox SDK's `renderTokens` helper:
    - When the speaker changes, a `Speaker N:` tag is inserted.
    - When the language changes, a `[lang]` (or `[Translation] [lang]`) tag is
      inserted.
    If neither speaker nor language information is present on any token (i.e.
    diarization and language identification are disabled), the function simply
    concatenates the token texts.
    """
    if not tokens:
        return ""
    text_parts: List[str] = []
    current_speaker: Optional[Any] = None
    current_language: Optional[Any] = None
    for token in tokens:
        text = token.get("text", "")
        speaker = token.get("speaker")
        language = token.get("language")
        is_translation = token.get("translation_status") == "translation"
        # Speaker changed -> emit a speaker tag.
        if speaker is not None and speaker != current_speaker:
            if current_speaker is not None:
                text_parts.append("\n\n")
            current_speaker = speaker
            current_language = None  # reset language whenever speaker changes
            text_parts.append(f"Speaker {current_speaker}:")
        # Language changed -> emit a language (or translation) tag.
        if language is not None and language != current_language:
            current_language = language
            prefix = "[Translation] " if is_translation else ""
            text_parts.append(f"\n{prefix}[{current_language}] ")
            text = text.lstrip() if isinstance(text, str) else text
        text_parts.append(text)
    return "".join(text_parts)
 # ---------------------------------------------------------------------------
 # SRT / VTT subtitle rendering
 # ---------------------------------------------------------------------------
 # Maximum number of tokens to group into a single subtitle cue.
 _CUE_MAX_TOKENS: int = 15
 # Maximum duration (in ms) for a single cue before forcing a break.
 _CUE_MAX_DURATION_MS: int = 5000
 def _format_timestamp_srt(ms: int) -> str:
    """Format milliseconds as SRT timestamp: HH:MM:SS,mmm"""
    if ms < 0:
        ms = 0
    hours = ms // 3_600_000
    ms %= 3_600_000
    minutes = ms // 60_000
    ms %= 60_000
    seconds = ms // 1_000
    millis = ms % 1_000
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"
 def _format_timestamp_vtt(ms: int) -> str:
    """Format milliseconds as VTT timestamp: HH:MM:SS.mmm"""
    if ms < 0:
        ms = 0
    hours = ms // 3_600_000
    ms %= 3_600_000
    minutes = ms // 60_000
    ms %= 60_000
    seconds = ms // 1_000
    millis = ms % 1_000
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
 def _group_tokens_into_cues(
    tokens: List[Dict[str, Any]],
 ) -> List[Dict[str, Any]]:
    """
    Group Soniox tokens into subtitle cues.
    Each cue has:
      - start_ms: int
      - end_ms: int
      - text: str
    Grouping heuristics:
      - A new cue starts once the current cue holds _CUE_MAX_TOKENS tokens.
      - A new cue starts once a token begins _CUE_MAX_DURATION_MS or more
        after the current cue's start.
      - A new cue starts when the speaker changes (if diarization is on).
      - Tokens without timestamps are appended to the current cue.
    """
    cues: List[Dict[str, Any]] = []
    current_tokens: List[str] = []
    current_start: Optional[int] = None
    current_end: Optional[int] = None
    current_speaker: Optional[Any] = None
    def _flush() -> None:
        if current_tokens and current_start is not None:
            text = "".join(current_tokens).strip()
            if text:
                cues.append(
                    {
                        "start_ms": current_start,
                        "end_ms": (
                            current_end if current_end is not None else current_start
                        ),
                        "text": text,
                    }
                )
    for token in tokens:
        start_ms = token.get("start_ms")
        end_ms = token.get("end_ms")
        text = token.get("text", "")
        speaker = token.get("speaker")
        # Skip tokens with no timestamp data entirely if we have no cue started
        if start_ms is None and current_start is None:
            continue
        # Speaker change forces a new cue
        if speaker is not None and speaker != current_speaker:
            _flush()
            current_tokens = []
            current_start = start_ms
            current_end = end_ms
            current_speaker = speaker
            current_tokens.append(text)
            continue
        # Duration or token count exceeded -> flush
        should_break = False
        if len(current_tokens) >= _CUE_MAX_TOKENS:
            should_break = True
        elif (
            current_start is not None
            and start_ms is not None
            and (start_ms - current_start) >= _CUE_MAX_DURATION_MS
        ):
            should_break = True
        if should_break:
            _flush()
            current_tokens = []
            current_start = start_ms
            current_end = end_ms
            current_tokens.append(text)
        else:
            if current_start is None:
                current_start = start_ms
            if end_ms is not None:
                current_end = end_ms
            current_tokens.append(text)
    _flush()
    return cues
 def render_soniox_tokens_as_srt(tokens: List[Dict[str, Any]]) -> str:
    """
    Render Soniox tokens as SRT (SubRip) subtitle format.
    Returns an empty string if no tokens have timestamp data.
    """
    cues = _group_tokens_into_cues(tokens)
    if not cues:
        return ""
    lines: List[str] = []
    for idx, cue in enumerate(cues, start=1):
        start = _format_timestamp_srt(cue["start_ms"])
        end = _format_timestamp_srt(cue["end_ms"])
        lines.append(str(idx))
        lines.append(f"{start} --> {end}")
        lines.append(cue["text"])
        lines.append("")  # blank line between cues
    return "\n".join(lines)
 def render_soniox_tokens_as_vtt(tokens: List[Dict[str, Any]]) -> str:
    """
    Render Soniox tokens as WebVTT subtitle format.
    Returns the VTT header even if no cues are present.
    """
    cues = _group_tokens_into_cues(tokens)
    lines: List[str] = ["WEBVTT", ""]
    for cue in cues:
        start = _format_timestamp_vtt(cue["start_ms"])
        end = _format_timestamp_vtt(cue["end_ms"])
        lines.append(f"{start} --> {end}")
        lines.append(cue["text"])
        lines.append("")  # blank line between cues
    return "\n".join(lines)
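Reviewer note: the SRT rendering path can be illustrated with a short standalone sketch. It is not the module's code: `format_srt_timestamp` and `render_srt` are hypothetical names, and the cue list is made-up sample data. The timestamp arithmetic follows the same HH:MM:SS,mmm breakdown as `_format_timestamp_srt`, written here with `divmod`.

```python
from typing import Any, Dict, List


def format_srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = max(ms, 0)  # negative offsets are clamped to zero
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def render_srt(cues: List[Dict[str, Any]]) -> str:
    """Render cues as numbered SRT blocks, each followed by a blank line."""
    lines: List[str] = []
    for idx, cue in enumerate(cues, start=1):
        start = format_srt_timestamp(cue["start_ms"])
        end = format_srt_timestamp(cue["end_ms"])
        lines.extend([str(idx), f"{start} --> {end}", cue["text"], ""])
    return "\n".join(lines)


cues = [
    {"start_ms": 0, "end_ms": 1_500, "text": "Hello there."},
    {"start_ms": 3_723_004, "end_ms": 3_725_000, "text": "Goodbye."},
]
srt = render_srt(cues)
```

For WebVTT the only differences in the module are the `WEBVTT` header, the `.` millisecond separator, and the absence of cue numbers.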
--- a/litellm/llms/you_com/__init__.py
+++ b/litellm/llms/you_com/__init__.py
--- a/litellm/llms/you_com/search/__init__.py
+++ b/litellm/llms/you_com/search/__init__.py
@ -0,0 +1,7 @@
 """
 You.com Search API module.
 """
 from litellm.llms.you_com.search.transformation import YouComSearchConfig
 __all__ = ["YouComSearchConfig"]
--- a/litellm/llms/you_com/search/transformation.py
+++ b/litellm/llms/you_com/search/transformation.py
@ -0,0 +1,193 @@
 """
 Calls You.com's /v1/search endpoint to search the web.
 You.com API Reference: https://you.com/docs/api-reference/search/v1-search
 OpenAPI spec:          https://you.com/specs/openapi_search_v1.yaml
 """
 from typing import Dict, List, Optional, TypedDict, Union
 import httpx
 from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
 from litellm.llms.base_llm.search.transformation import (
    BaseSearchConfig,
    SearchResponse,
    SearchResult,
 )
 from litellm.secret_managers.main import get_secret_str
 class _YouComSearchRequestRequired(TypedDict):
    """Required fields for You.com Search API request."""
    query: str
 class YouComSearchRequest(_YouComSearchRequestRequired, total=False):
    """
    You.com Search API request format.
    Based on: https://you.com/specs/openapi_search_v1.yaml
    """
    count: int
    country: str
    language: str
    freshness: str
    include_domains: List[str]
    exclude_domains: List[str]
    safesearch: str
 class YouComSearchConfig(BaseSearchConfig):
    # Keyed tier (higher rate limits): authenticate with X-API-Key.
    YOU_COM_API_BASE = "https://ydc-index.io"
    # Keyless free tier: IP-throttled (100 queries/day) and requires no auth.
    # Used automatically when YOUCOM_API_KEY is not set.
    YOU_COM_FREE_API_BASE = "https://api.you.com/v1/agents/search"
    @staticmethod
    def ui_friendly_name() -> str:
        return "You.com"
    def validate_environment(
        self,
        headers: Dict,
        api_key: Optional[str] = None,
        api_base: Optional[str] = None,
        **kwargs,
    ) -> Dict:
        """
        Set headers for the You.com Search API.
        If YOUCOM_API_KEY (or an explicit api_key) is present, use the keyed
        endpoint with the `X-API-Key` header. Otherwise fall through to the
        keyless free tier; no auth header is required.
        """
        api_key = api_key or get_secret_str("YOUCOM_API_KEY")
        headers["Content-Type"] = "application/json"
        # Pin Accept-Encoding to identity: the keyless `api.you.com/v1/agents/search`
        # endpoint advertises gzip content-encoding but returns body bytes the
        # decoder rejects, which surfaces as httpx.DecodingError through litellm's
        # http handler. Identity is harmless on the keyed endpoint.
        headers.setdefault("Accept-Encoding", "identity")
        if api_key:
            headers["X-API-Key"] = api_key
        return headers
    def get_complete_url(
        self,
        api_base: Optional[str],
        optional_params: dict,
        data: Optional[Union[Dict, List[Dict]]] = None,
        **kwargs,
    ) -> str:
        """
        Pick the endpoint based on whether an API key is configured.
        - api_base explicit override     -> use it as-is (normalized)
        - YOUCOM_API_KEY set             -> keyed endpoint (ydc-index.io/v1/search)
        - no key                         -> keyless free tier (api.you.com/v1/agents/search)
        """
        if api_base is None:
            api_base = get_secret_str("YOUCOM_API_BASE")
        if api_base is None:
            api_key = kwargs.get("api_key") or get_secret_str("YOUCOM_API_KEY")
            if api_key:
                api_base = self.YOU_COM_API_BASE
            else:
                # Keyless free tier already includes the full path.
                return self.YOU_COM_FREE_API_BASE
        api_base = api_base.rstrip("/")
        if not api_base.endswith("/v1/search") and not api_base.endswith(
            "/v1/agents/search"
        ):
            api_base = f"{api_base}/v1/search"
        return api_base
    def transform_search_request(
        self,
        query: Union[str, List[str]],
        optional_params: dict,
        **kwargs,
    ) -> Dict:
        """
        Transform Search request to You.com API format.
        Perplexity unified spec → You.com mappings:
        - query                 → query
        - max_results           → count
        - search_domain_filter  → include_domains
        - country               → country
        - max_tokens_per_page   → (not applicable, ignored)
        """
        if isinstance(query, list):
            query = " ".join(query)
        request_data: YouComSearchRequest = {
            "query": query,
        }
        if "max_results" in optional_params:
            request_data["count"] = optional_params["max_results"]
        if "search_domain_filter" in optional_params:
            request_data["include_domains"] = optional_params["search_domain_filter"]
        if "country" in optional_params:
            request_data["country"] = optional_params["country"].lower()
        result_data = dict(request_data)
        for param, value in optional_params.items():
            if (
                param not in self.get_supported_perplexity_optional_params()
                and param not in result_data
            ):
                result_data[param] = value
        return result_data
    def transform_search_response(
        self,
        raw_response: httpx.Response,
        logging_obj: LiteLLMLoggingObj,
        **kwargs,
    ) -> SearchResponse:
        """
        Transform You.com API response to LiteLLM unified SearchResponse format.
        You.com → LiteLLM mappings (for both `results.web[]` and `results.news[]`):
        - title       → SearchResult.title
        - url         → SearchResult.url
        - snippets[0] → SearchResult.snippet (falls back to `description`)
        - page_age    → SearchResult.date
        """
        response_json = raw_response.json()
        raw_results = response_json.get("results") or {}
        web_results = raw_results.get("web") or []
        news_results = raw_results.get("news") or []
        results: List[SearchResult] = []
        for item in list(web_results) + list(news_results):
            snippets = item.get("snippets") or []
            snippet = snippets[0] if snippets else item.get("description", "")
            results.append(
                SearchResult(
                    title=item.get("title", ""),
                    url=item.get("url", ""),
                    snippet=snippet,
                    date=item.get("page_age"),
                    last_updated=None,
                )
            )
        return SearchResponse(
            results=results,
            object="search",
        )
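Reviewer note: the endpoint selection in `get_complete_url` reduces to a small decision table. The sketch below is a simplified restatement under assumptions: `resolve_search_url` is a hypothetical free function, and it omits the `YOUCOM_API_BASE` / `YOUCOM_API_KEY` secret lookups that the real method performs, taking both values as plain arguments instead.

```python
from typing import Optional

KEYED_BASE = "https://ydc-index.io"
KEYLESS_URL = "https://api.you.com/v1/agents/search"


def resolve_search_url(api_base: Optional[str], api_key: Optional[str]) -> str:
    """Pick the You.com search URL from an optional base override and key."""
    if api_base is None:
        if not api_key:
            # The keyless free-tier URL already includes the full path.
            return KEYLESS_URL
        api_base = KEYED_BASE
    api_base = api_base.rstrip("/")
    # Append the default path only when the base does not already end in one
    # of the two known search paths.
    if not api_base.endswith(("/v1/search", "/v1/agents/search")):
        api_base = f"{api_base}/v1/search"
    return api_base


keyless = resolve_search_url(None, None)
keyed = resolve_search_url(None, "example-key")
override = resolve_search_url("https://proxy.example/", None)
```

An explicit override always wins, with or without a key; it is only normalized by trimming trailing slashes and appending `/v1/search` when no known path is present.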
--- a/litellm/main.py
+++ b/litellm/main.py
@ -6655,7 +6655,7 @@ async def atranscription(*args, **kwargs) -> TranscriptionResponse:
@client
-def transcription(
+def transcription(  # noqa: PLR0915
    model: str,
    file: FileTypes,
    ## OPTIONAL OPENAI PARAMS ##
@ -6847,6 +6847,35 @@ def transcription(
                else None
            ),
        )
    elif custom_llm_provider == "soniox":
        from litellm.llms.soniox.audio_transcription.handler import (
            SonioxAudioTranscriptionHandler,
        )
        response = SonioxAudioTranscriptionHandler().audio_transcriptions(
            model=model,
            audio_file=file,
            optional_params=optional_params,
            litellm_params=litellm_params_dict,
            model_response=model_response,
            atranscription=atranscription,
            client=(
                client
                if client is not None
                and (
                    isinstance(client, HTTPHandler)
                    or isinstance(client, AsyncHTTPHandler)
                )
                else None
            ),
            timeout=timeout,
            max_retries=max_retries,
            logging_obj=litellm_logging_obj,
            api_base=api_base,
            api_key=api_key,
            headers=extra_headers,
            provider_config=provider_config,  # type: ignore[arg-type]
        )
    elif provider_config is not None:
        response = base_llm_http_handler.audio_transcriptions(
            model=model,
--- a/litellm/model_prices_and_context_window_backup.json
+++ b/litellm/model_prices_and_context_window_backup.json
@ -1463,6 +1463,36 @@
        "supports_output_config": true,
        "bedrock_output_config_effort_ceiling": "xhigh"
    },
    "jp.anthropic.claude-opus-4-7": {
        "cache_creation_input_token_cost": 6.875e-06,
        "cache_read_input_token_cost": 5.5e-07,
        "input_cost_per_token": 5.5e-06,
        "litellm_provider": "bedrock_converse",
        "max_input_tokens": 1000000,
        "max_output_tokens": 128000,
        "max_tokens": 128000,
        "mode": "chat",
        "output_cost_per_token": 2.75e-05,
        "search_context_cost_per_query": {
            "search_context_size_high": 0.01,
            "search_context_size_low": 0.01,
            "search_context_size_medium": 0.01
        },
        "supports_assistant_prefill": false,
        "supports_computer_use": true,
        "supports_function_calling": true,
        "supports_pdf_input": true,
        "supports_prompt_caching": true,
        "supports_reasoning": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true,
        "supports_xhigh_reasoning_effort": true,
        "tool_use_system_prompt_tokens": 346,
        "supports_native_structured_output": true,
        "supports_max_reasoning_effort": true,
        "supports_minimal_reasoning_effort": true
    },
    "anthropic.claude-sonnet-4-6": {
        "cache_creation_input_token_cost": 3.75e-06,
        "cache_creation_input_token_cost_above_1hr": 6e-06,
@ -24096,6 +24126,21 @@
        "max_input_tokens": 200000,
        "max_output_tokens": 8192
    },
    "minimax/MiniMax-M3": {
        "input_cost_per_token": 6e-07,
        "output_cost_per_token": 2.4e-06,
        "cache_read_input_token_cost": 1.2e-07,
        "litellm_provider": "minimax",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_tool_choice": true,
        "supports_prompt_caching": true,
        "supports_reasoning": true,
        "supports_system_messages": true,
        "supports_vision": true,
        "max_input_tokens": 512000,
        "max_output_tokens": 128000
    },
    "mistral.devstral-2-123b": {
        "input_cost_per_token": 4e-07,
        "litellm_provider": "bedrock_converse",
@ -24978,6 +25023,7 @@
    },
    "moonshot/kimi-k2-0711-preview": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-05-25",
        "input_cost_per_token": 6e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 131072,
@ -24992,6 +25038,7 @@
    },
    "moonshot/kimi-k2-0905-preview": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-05-25",
        "input_cost_per_token": 6e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 262144,
@ -25006,6 +25053,7 @@
    },
    "moonshot/kimi-k2-turbo-preview": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-05-25",
        "input_cost_per_token": 1.15e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 262144,
@ -25030,6 +25078,7 @@
        "source": "https://platform.moonshot.ai/docs/guide/kimi-k2-5-quickstart",
        "supports_function_calling": true,
        "supports_reasoning": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_video_input": true,
        "supports_vision": true
@ -25046,12 +25095,14 @@
        "source": "https://platform.kimi.ai/docs/pricing/chat-k26",
        "supports_function_calling": true,
        "supports_reasoning": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_video_input": true,
        "supports_vision": true
    },
    "moonshot/kimi-latest": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-01-28",
        "input_cost_per_token": 2e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 131072,
@ -25066,6 +25117,7 @@
    },
    "moonshot/kimi-latest-128k": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-01-28",
        "input_cost_per_token": 2e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 131072,
@ -25080,6 +25132,7 @@
    },
    "moonshot/kimi-latest-32k": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-01-28",
        "input_cost_per_token": 1e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 32768,
@ -25094,6 +25147,7 @@
    },
    "moonshot/kimi-latest-8k": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-01-28",
        "input_cost_per_token": 2e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 8192,
@ -25108,6 +25162,7 @@
    },
    "moonshot/kimi-thinking-preview": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2025-11-11",
        "input_cost_per_token": 6e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 131072,
@ -25120,6 +25175,7 @@
    },
    "moonshot/kimi-k2-thinking": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-05-25",
        "input_cost_per_token": 6e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 262144,
@ -25135,6 +25191,7 @@
    },
    "moonshot/kimi-k2-thinking-turbo": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-05-25",
        "input_cost_per_token": 1.15e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 262144,
@ -25158,9 +25215,11 @@
        "output_cost_per_token": 5e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true
    },
    "moonshot/moonshot-v1-128k-0430": {
        "deprecation_date": "2024-04-30",
        "input_cost_per_token": 2e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 131072,
@ -25182,6 +25241,7 @@
        "output_cost_per_token": 5e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true
    },
@ -25195,9 +25255,11 @@
        "output_cost_per_token": 3e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true
    },
    "moonshot/moonshot-v1-32k-0430": {
        "deprecation_date": "2024-04-30",
        "input_cost_per_token": 1e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 32768,
@ -25219,6 +25281,7 @@
        "output_cost_per_token": 3e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true
    },
@ -25232,9 +25295,11 @@
        "output_cost_per_token": 2e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true
    },
    "moonshot/moonshot-v1-8k-0430": {
        "deprecation_date": "2024-04-30",
        "input_cost_per_token": 2e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 8192,
@ -25256,6 +25321,7 @@
        "output_cost_per_token": 2e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true
    },
@ -25269,6 +25335,7 @@
        "output_cost_per_token": 5e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true
    },
    "morph/morph-v3-fast": {
@ -36012,7 +36079,8 @@
        "supports_prompt_caching": true,
        "supports_response_schema": false,
        "supports_tool_choice": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-3-beta": {
        "cache_read_input_token_cost": 7.5e-07,
@ -36211,7 +36279,8 @@
        "supports_function_calling": true,
        "supports_prompt_caching": true,
        "supports_tool_choice": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-fast-non-reasoning": {
        "cache_read_input_token_cost": 5e-08,
@ -36228,7 +36297,8 @@
        "supports_function_calling": true,
        "supports_prompt_caching": true,
        "supports_tool_choice": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-0709": {
        "input_cost_per_token": 3e-06,
@ -36244,7 +36314,8 @@
        "supports_function_calling": true,
        "supports_prompt_caching": true,
        "supports_tool_choice": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-latest": {
        "input_cost_per_token": 3e-06,
@ -36302,7 +36373,8 @@
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-1-fast-reasoning-latest": {
        "cache_read_input_token_cost": 5e-08,
@ -36323,7 +36395,8 @@
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-1-fast-non-reasoning": {
        "cache_read_input_token_cost": 5e-08,
@ -36343,7 +36416,8 @@
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-1-fast-non-reasoning-latest": {
        "cache_read_input_token_cost": 5e-08,
@ -36363,7 +36437,8 @@
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4.20-multi-agent-beta-0309": {
        "cache_read_input_token_cost": 2e-07,
@ -36514,7 +36589,8 @@
        "supports_function_calling": true,
        "supports_prompt_caching": true,
        "supports_reasoning": true,
-        "supports_tool_choice": true
+        "supports_tool_choice": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-code-fast-1-0825": {
        "cache_read_input_token_cost": 2e-08,
@ -36529,7 +36605,8 @@
        "supports_function_calling": true,
        "supports_prompt_caching": true,
        "supports_reasoning": true,
-        "supports_tool_choice": true
+        "supports_tool_choice": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-vision-beta": {
        "input_cost_per_image": 5e-06,
@ -41547,5 +41624,18 @@
        "supports_vision": true,
        "supports_native_structured_output": true,
        "supports_pdf_input": true
    },
    "soniox/stt-async-v4": {
        "litellm_provider": "soniox",
        "max_output_tokens": 8000,
        "max_tokens": 8000,
        "input_cost_per_second": 0.0,
        "output_cost_per_second": 0.0000277778,
        "mode": "audio_transcription",
        "source": "https://soniox.com/pricing",
        "supported_endpoints": [
            "/v1/audio/transcriptions"
        ],
        "supports_audio_input": true
    }
-}
+}
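Reviewer note: the `deprecation_date` stamps in the hunks above are only useful if tooling reads them. Below is a minimal sketch of such a check, assuming the date is an ISO `YYYY-MM-DD` string and that a model counts as retired on or after that date. `COST_MAP` is a trimmed, hand-copied subset of the entries touched here and `deprecated_models` is a hypothetical helper, not a litellm API.

```python
from datetime import date
from typing import Dict, List

# Trimmed copies of entries from this diff; only the fields used below are kept.
COST_MAP: Dict[str, dict] = {
    "moonshot/kimi-latest": {
        "litellm_provider": "moonshot",
        "deprecation_date": "2026-01-28",
    },
    "moonshot/kimi-k2-thinking": {
        "litellm_provider": "moonshot",
        "deprecation_date": "2026-05-25",
    },
    "xai/grok-4-0709": {
        "litellm_provider": "xai",
        "deprecation_date": "2026-05-15",
    },
    # No deprecation_date: treated as still live.
    "soniox/stt-async-v4": {"litellm_provider": "soniox"},
}


def deprecated_models(cost_map: Dict[str, dict], today: date) -> List[str]:
    """Return the names whose deprecation_date is on or before `today` (assumed semantics)."""
    retired = []
    for name, entry in cost_map.items():
        raw = entry.get("deprecation_date")
        if raw is not None and date.fromisoformat(raw) <= today:
            retired.append(name)
    return sorted(retired)
```

Because the entries are kept rather than deleted, a lookup by model name still resolves for historical cost calculation; the helper only surfaces which names have passed their date.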
--- a/litellm/provider_endpoints_support_backup.json
+++ b/litellm/provider_endpoints_support_backup.json
@ -1539,6 +1539,23 @@
        "interactions": true
      }
    },
    "neosantara": {
      "display_name": "Neosantara (`neosantara`)",
      "url": "https://docs.litellm.ai/docs/providers/neosantara",
      "endpoints": {
        "chat_completions": true,
        "messages": false,
        "responses": true,
        "embeddings": false,
        "image_generations": false,
        "audio_transcriptions": false,
        "audio_speech": false,
        "moderations": false,
        "batches": false,
        "rerank": false,
        "a2a": false
      }
    },
    "nvidia_nim": {
      "display_name": "Nvidia NIM (`nvidia_nim`)",
      "url": "https://docs.litellm.ai/docs/providers/nvidia_nim",
--- a/litellm/proxy/_experimental/mcp_server/mcp_server_manager.py
+++ b/litellm/proxy/_experimental/mcp_server/mcp_server_manager.py
@ -684,6 +684,7 @@ class MCPServerManager:
                ),
                allow_sampling=bool(server_config.get("allow_sampling", False)),
                allow_elicitation=bool(server_config.get("allow_elicitation", False)),
                timeout=server_config.get("timeout", None),
            )
            self._assign_unique_short_prefix(new_server)
            _warn_internal_delegate_pkce_if_applicable(new_server, source="config")
@ -1096,6 +1097,7 @@ class MCPServerManager:
                credentials_dict.get("subject_token_type") if credentials_dict else None
            )
            or "urn:ietf:params:oauth:token-type:access_token",
            timeout=getattr(mcp_server, "timeout", None),
        )
        _warn_internal_delegate_pkce_if_applicable(new_server, source="database")
        return new_server
@ -1662,7 +1664,9 @@ class MCPServerManager:
                transport_type=transport,
                auth_type=server.auth_type,
                auth_value=auth_value,
-                timeout=MCP_CLIENT_TIMEOUT,
+                timeout=(
                    server.timeout if server.timeout is not None else MCP_CLIENT_TIMEOUT
                ),
                stdio_config=stdio_config,
                extra_headers=extra_headers,
                sampling_callback=sampling_cb,
@ -1690,7 +1694,9 @@ class MCPServerManager:
                transport_type=transport,
                auth_type=server.auth_type,
                auth_value=auth_value,
-                timeout=MCP_CLIENT_TIMEOUT,
+                timeout=(
                    server.timeout if server.timeout is not None else MCP_CLIENT_TIMEOUT
                ),
                extra_headers=extra_headers,
                aws_auth=aws_auth,
                sampling_callback=sampling_cb,
@ -3158,14 +3164,26 @@ class MCPServerManager:
            asyncio.create_task(_call_tool_via_client(client, call_tool_params))
        )
        _timeout = (
            mcp_server.timeout if mcp_server.timeout is not None else MCP_CLIENT_TIMEOUT
        )
        try:
-            mcp_responses = await asyncio.gather(*tasks)
+            mcp_responses = await asyncio.wait_for(
                asyncio.gather(*tasks), timeout=_timeout
            )
        except asyncio.TimeoutError:
            raise HTTPException(
                status_code=504,
                detail={
                    "error": "timeout",
                    "message": f"MCP tool call timed out after {_timeout}s",
                },
            )
        except (
            BlockedPiiEntityError,
            GuardrailRaisedException,
            HTTPException,
        ) as e:
            # Re-raise guardrail exceptions to properly fail the MCP call
            verbose_logger.error(
                f"Guardrail blocked MCP tool call during result check: {str(e)}"
            )
@ -3953,6 +3971,7 @@ class MCPServerManager:
            registration_url=server.registration_url,
            allow_all_keys=server.allow_all_keys,
            instructions=server.instructions,
            timeout=server.timeout,
        )
    async def get_all_mcp_servers_with_health_and_teams(
@ -4052,6 +4071,7 @@ class MCPServerManager:
            byok_api_key_help_url=server.byok_api_key_help_url,
            source_url=server.source_url,
            instructions=server.instructions,
            timeout=server.timeout,
        )
    async def get_all_mcp_servers_unfiltered(self) -> List[LiteLLM_MCPServerTable]:
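Reviewer note: the timeout handling in the hunks above can be exercised in isolation. The sketch below mirrors the two pieces of the change: the `is not None` fallback (so an explicit `0.0` is not replaced by the default) and `asyncio.wait_for` around `asyncio.gather`. `DEFAULT_TIMEOUT`, `ToolCallTimeout`, `resolve_timeout` and `call_with_timeout` are hypothetical stand-ins for `MCP_CLIENT_TIMEOUT` and the `HTTPException(status_code=504)` path; they are not the names used in `MCPServerManager`.

```python
import asyncio
from typing import Awaitable, Iterable, List, Optional

DEFAULT_TIMEOUT = 60.0  # hypothetical stand-in for MCP_CLIENT_TIMEOUT


class ToolCallTimeout(Exception):
    """Hypothetical stand-in for the HTTPException(status_code=504) raised in the diff."""

    def __init__(self, timeout: float) -> None:
        super().__init__(f"MCP tool call timed out after {timeout}s")
        self.status_code = 504


def resolve_timeout(server_timeout: Optional[float]) -> float:
    # `is not None` rather than `or`: an explicit 0.0 must not fall back to the default.
    return server_timeout if server_timeout is not None else DEFAULT_TIMEOUT


async def call_with_timeout(
    tasks: Iterable[Awaitable], server_timeout: Optional[float]
) -> List:
    timeout = resolve_timeout(server_timeout)
    try:
        # wait_for cancels the gathered awaitables when the deadline passes.
        return await asyncio.wait_for(asyncio.gather(*tasks), timeout=timeout)
    except asyncio.TimeoutError:
        raise ToolCallTimeout(timeout)
```

Mapping the timeout to a dedicated error before the broader `except` clause is what lets the caller see a 504 rather than a generic 500.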
--- a/litellm/proxy/_types.py
+++ b/litellm/proxy/_types.py
@ -1300,6 +1300,7 @@ class NewMCPServerRequest(LiteLLMPydanticObjectBase):
    byok_description: List[str] = Field(default_factory=list)
    byok_api_key_help_url: Optional[str] = None
    source_url: Optional[str] = None
    timeout: Optional[float] = None
    # BYOM submission fields — set by the endpoint, not by the caller.
    # Any caller-provided values are silently overridden before persistence.
    approval_status: Optional[str] = Field(
@ -1385,6 +1386,7 @@ class UpdateMCPServerRequest(LiteLLMPydanticObjectBase):
    byok_description: List[str] = Field(default_factory=list)
    byok_api_key_help_url: Optional[str] = None
    source_url: Optional[str] = None
    timeout: Optional[float] = None
    @model_validator(mode="before")
    @classmethod
@ -1460,6 +1462,7 @@ class LiteLLM_MCPServerTable(LiteLLMPydanticObjectBase):
    byok_api_key_help_url: Optional[str] = None
    has_user_credential: Optional[bool] = None
    source_url: Optional[str] = None
    timeout: Optional[float] = None
    # BYOM submission fields
    approval_status: Optional[str] = Field(
        default="active",
--- a/litellm/proxy/example_config_yaml/websearch_interception_config.yaml
+++ b/litellm/proxy/example_config_yaml/websearch_interception_config.yaml
@ -8,6 +8,10 @@ search_tools:
  - search_tool_name: "my-perplexity-search"
    litellm_params:
      search_provider: "perplexity"
  # Alternative provider example (requires YOUCOM_API_KEY):
  # - search_tool_name: "my-you-com-search"
  #   litellm_params:
  #     search_provider: "you_com"
 litellm_settings:
  callbacks: ["websearch_interception"]
--- a/litellm/proxy/management_endpoints/mcp_management_endpoints.py
+++ b/litellm/proxy/management_endpoints/mcp_management_endpoints.py
@ -659,6 +659,7 @@ if MCP_AVAILABLE:
            registration_url=payload.registration_url,
            allow_all_keys=payload.allow_all_keys,
            available_on_public_internet=payload.available_on_public_internet,
            timeout=payload.timeout,
        )
    def get_prisma_client_or_throw(message: str):
--- a/litellm/proxy/public_endpoints/provider_create_fields.json
+++ b/litellm/proxy/public_endpoints/provider_create_fields.json
@ -2570,6 +2570,24 @@
    ],
    "default_model_placeholder": "snowflake/mistral-7b"
  },
  {
    "provider": "Soniox",
    "provider_display_name": "Soniox",
    "litellm_provider": "soniox",
    "credential_fields": [
      {
        "key": "api_key",
        "label": "Soniox API Key",
        "placeholder": null,
        "tooltip": "Currently only the async Speech-to-Text REST API (api.soniox.com) is supported. Realtime STT (stt-rt.soniox.com) and TTS (tts-rt.soniox.com) are not yet available.",
        "required": true,
        "field_type": "password",
        "options": null,
        "default_value": null
      }
    ],
    "default_model_placeholder": "soniox/stt-async-v4"
  },
  {
    "provider": "TEXT_COMPLETION_CODESTRAL",
    "provider_display_name": "Text-Completion-Codestral",
--- a/litellm/proxy/schema.prisma
+++ b/litellm/proxy/schema.prisma
@ -331,6 +331,7 @@ model LiteLLM_MCPServerTable {
  byok_description      String[] @default([])
  byok_api_key_help_url String?
  source_url            String?
  timeout               Float?
  // BYOM submission lifecycle
  approval_status  String?   @default("active")
  submitted_by     String?
--- a/litellm/search/main.py
+++ b/litellm/search/main.py
@ -283,6 +283,7 @@ def search(
        complete_url = search_provider_config.get_complete_url(
            api_base=api_base,
            optional_params=optional_params,
            api_key=api_key,
        )
        # Pre Call logging
--- a/litellm/types/mcp_server/mcp_server_manager.py
+++ b/litellm/types/mcp_server/mcp_server_manager.py
@ -109,6 +109,7 @@ class MCPServer(BaseModel):
    # Defaults to the token's expires_in minus the expiry buffer, or
    # MCP_PER_USER_TOKEN_DEFAULT_TTL when expires_in is absent.
    token_storage_ttl_seconds: Optional[int] = None
    timeout: Optional[float] = None
    # Resolved short-ID tool prefix when LITELLM_USE_SHORT_MCP_TOOL_PREFIX is
    # enabled.  Set by ``MCPServerManager._assign_unique_short_prefix`` at
    # registration time so that natural-hash collisions between two
--- a/litellm/types/utils.py
+++ b/litellm/types/utils.py
@ -3290,6 +3290,7 @@ class LlmProviders(str, Enum):
    GIGACHAT = "gigachat"
    NVIDIA_NIM = "nvidia_nim"
    NVIDIA_RIVA = "nvidia_riva"
    SONIOX = "soniox"
    CEREBRAS = "cerebras"
    AI21_CHAT = "ai21_chat"
    VOLCENGINE = "volcengine"
@ -3374,6 +3375,7 @@ class LlmProviders(str, Enum):
    NANOGPT = "nano-gpt"
    POE = "poe"
    CHUTES = "chutes"
    NEOSANTARA = "neosantara"
    XIAOMI_MIMO = "xiaomi_mimo"
    TENSORMESH = "tensormesh"
    LITELLM_AGENT = "litellm_agent"
@ -3416,6 +3418,7 @@ class SearchProviders(str, Enum):
    DUCKDUCKGO = "duckduckgo"
    SEARCHAPI = "searchapi"
    SERPER = "serper"
    YOU_COM = "you_com"
    APISERPENT = "apiserpent"
--- a/litellm/utils.py
+++ b/litellm/utils.py
@ -8816,6 +8816,12 @@ class ProviderConfigManager:
            )
            return NvidiaRivaAudioTranscriptionConfig()
        elif litellm.LlmProviders.SONIOX == provider:
            from litellm.llms.soniox.audio_transcription.transformation import (
                SonioxAudioTranscriptionConfig,
            )
            return SonioxAudioTranscriptionConfig()
        return None
    @staticmethod
@ -9518,6 +9524,7 @@ class ProviderConfigManager:
        from litellm.llms.searxng.search.transformation import SearXNGSearchConfig
        from litellm.llms.serper.search.transformation import SerperSearchConfig
        from litellm.llms.tavily.search.transformation import TavilySearchConfig
        from litellm.llms.you_com.search.transformation import YouComSearchConfig
        PROVIDER_TO_CONFIG_MAP = {
            SearchProviders.PERPLEXITY: PerplexitySearchConfig,
@ -9533,6 +9540,7 @@ class ProviderConfigManager:
            SearchProviders.DUCKDUCKGO: DuckDuckGoSearchConfig,
            SearchProviders.SEARCHAPI: SearchAPIConfig,
            SearchProviders.SERPER: SerperSearchConfig,
            SearchProviders.YOU_COM: YouComSearchConfig,
            SearchProviders.APISERPENT: APISerpentSearchConfig,
        }
        config_class = PROVIDER_TO_CONFIG_MAP.get(provider, None)
--- a/model_prices_and_context_window.json
+++ b/model_prices_and_context_window.json
@ -1463,6 +1463,36 @@
        "supports_output_config": true,
        "bedrock_output_config_effort_ceiling": "xhigh"
    },
    "jp.anthropic.claude-opus-4-7": {
        "cache_creation_input_token_cost": 6.875e-06,
        "cache_read_input_token_cost": 5.5e-07,
        "input_cost_per_token": 5.5e-06,
        "litellm_provider": "bedrock_converse",
        "max_input_tokens": 1000000,
        "max_output_tokens": 128000,
        "max_tokens": 128000,
        "mode": "chat",
        "output_cost_per_token": 2.75e-05,
        "search_context_cost_per_query": {
            "search_context_size_high": 0.01,
            "search_context_size_low": 0.01,
            "search_context_size_medium": 0.01
        },
        "supports_assistant_prefill": false,
        "supports_computer_use": true,
        "supports_function_calling": true,
        "supports_pdf_input": true,
        "supports_prompt_caching": true,
        "supports_reasoning": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true,
        "supports_xhigh_reasoning_effort": true,
        "tool_use_system_prompt_tokens": 346,
        "supports_native_structured_output": true,
        "supports_max_reasoning_effort": true,
        "supports_minimal_reasoning_effort": true
    },
    "anthropic.claude-sonnet-4-6": {
        "cache_creation_input_token_cost": 3.75e-06,
        "cache_creation_input_token_cost_above_1hr": 6e-06,
@ -24096,6 +24126,21 @@
        "max_input_tokens": 200000,
        "max_output_tokens": 8192
    },
    "minimax/MiniMax-M3": {
        "input_cost_per_token": 6e-07,
        "output_cost_per_token": 2.4e-06,
        "cache_read_input_token_cost": 1.2e-07,
        "litellm_provider": "minimax",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_tool_choice": true,
        "supports_prompt_caching": true,
        "supports_reasoning": true,
        "supports_system_messages": true,
        "supports_vision": true,
        "max_input_tokens": 512000,
        "max_output_tokens": 128000
    },
    "mistral.devstral-2-123b": {
        "input_cost_per_token": 4e-07,
        "litellm_provider": "bedrock_converse",
@ -24978,6 +25023,7 @@
    },
    "moonshot/kimi-k2-0711-preview": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-05-25",
        "input_cost_per_token": 6e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 131072,
@ -24992,6 +25038,7 @@
    },
    "moonshot/kimi-k2-0905-preview": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-05-25",
        "input_cost_per_token": 6e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 262144,
@ -25006,6 +25053,7 @@
    },
    "moonshot/kimi-k2-turbo-preview": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-05-25",
        "input_cost_per_token": 1.15e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 262144,
@ -25030,6 +25078,7 @@
        "source": "https://platform.moonshot.ai/docs/guide/kimi-k2-5-quickstart",
        "supports_function_calling": true,
        "supports_reasoning": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_video_input": true,
        "supports_vision": true
@ -25046,12 +25095,14 @@
        "source": "https://platform.kimi.ai/docs/pricing/chat-k26",
        "supports_function_calling": true,
        "supports_reasoning": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_video_input": true,
        "supports_vision": true
    },
    "moonshot/kimi-latest": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-01-28",
        "input_cost_per_token": 2e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 131072,
@ -25066,6 +25117,7 @@
    },
    "moonshot/kimi-latest-128k": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-01-28",
        "input_cost_per_token": 2e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 131072,
@ -25080,6 +25132,7 @@
    },
    "moonshot/kimi-latest-32k": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-01-28",
        "input_cost_per_token": 1e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 32768,
@ -25094,6 +25147,7 @@
    },
    "moonshot/kimi-latest-8k": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-01-28",
        "input_cost_per_token": 2e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 8192,
@ -25108,6 +25162,7 @@
    },
    "moonshot/kimi-thinking-preview": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2025-11-11",
        "input_cost_per_token": 6e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 131072,
@ -25120,6 +25175,7 @@
    },
    "moonshot/kimi-k2-thinking": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-05-25",
        "input_cost_per_token": 6e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 262144,
@ -25135,6 +25191,7 @@
    },
    "moonshot/kimi-k2-thinking-turbo": {
        "cache_read_input_token_cost": 1.5e-07,
        "deprecation_date": "2026-05-25",
        "input_cost_per_token": 1.15e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 262144,
@ -25158,9 +25215,11 @@
        "output_cost_per_token": 5e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true
    },
    "moonshot/moonshot-v1-128k-0430": {
        "deprecation_date": "2024-04-30",
        "input_cost_per_token": 2e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 131072,
@ -25182,6 +25241,7 @@
        "output_cost_per_token": 5e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true
    },
@ -25195,9 +25255,11 @@
        "output_cost_per_token": 3e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true
    },
    "moonshot/moonshot-v1-32k-0430": {
        "deprecation_date": "2024-04-30",
        "input_cost_per_token": 1e-06,
        "litellm_provider": "moonshot",
        "max_input_tokens": 32768,
@ -25219,6 +25281,7 @@
        "output_cost_per_token": 3e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true
    },
@ -25232,9 +25295,11 @@
        "output_cost_per_token": 2e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true
    },
    "moonshot/moonshot-v1-8k-0430": {
        "deprecation_date": "2024-04-30",
        "input_cost_per_token": 2e-07,
        "litellm_provider": "moonshot",
        "max_input_tokens": 8192,
@ -25256,6 +25321,7 @@
        "output_cost_per_token": 2e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true
    },
@ -25269,6 +25335,7 @@
        "output_cost_per_token": 5e-06,
        "source": "https://platform.moonshot.ai/docs/pricing",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true
    },
    "morph/morph-v3-fast": {
@ -30977,6 +31044,11 @@
        "litellm_provider": "tavily",
        "mode": "search"
    },
    "you_com/search": {
        "input_cost_per_query": 0.0,
        "litellm_provider": "you_com",
        "mode": "search"
    },
    "text-completion-codestral/codestral-2405": {
        "input_cost_per_token": 0.0,
        "litellm_provider": "text-completion-codestral",
@ -36047,7 +36119,8 @@
        "supports_prompt_caching": true,
        "supports_response_schema": false,
        "supports_tool_choice": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-3-beta": {
        "cache_read_input_token_cost": 7.5e-07,
@ -36246,7 +36319,8 @@
        "supports_function_calling": true,
        "supports_prompt_caching": true,
        "supports_tool_choice": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-fast-non-reasoning": {
        "cache_read_input_token_cost": 5e-08,
@ -36263,7 +36337,8 @@
        "supports_function_calling": true,
        "supports_prompt_caching": true,
        "supports_tool_choice": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-0709": {
        "input_cost_per_token": 3e-06,
@ -36279,7 +36354,8 @@
        "supports_function_calling": true,
        "supports_prompt_caching": true,
        "supports_tool_choice": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-latest": {
        "input_cost_per_token": 3e-06,
@ -36337,7 +36413,8 @@
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-1-fast-reasoning-latest": {
        "cache_read_input_token_cost": 5e-08,
@ -36358,7 +36435,8 @@
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-1-fast-non-reasoning": {
        "cache_read_input_token_cost": 5e-08,
@ -36378,7 +36456,8 @@
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4-1-fast-non-reasoning-latest": {
        "cache_read_input_token_cost": 5e-08,
@ -36398,7 +36477,8 @@
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_vision": true,
-        "supports_web_search": true
+        "supports_web_search": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-4.20-multi-agent-beta-0309": {
        "cache_read_input_token_cost": 2e-07,
@ -36549,7 +36629,8 @@
        "supports_function_calling": true,
        "supports_prompt_caching": true,
        "supports_reasoning": true,
-        "supports_tool_choice": true
+        "supports_tool_choice": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-code-fast-1-0825": {
        "cache_read_input_token_cost": 2e-08,
@ -36564,7 +36645,8 @@
        "supports_function_calling": true,
        "supports_prompt_caching": true,
        "supports_reasoning": true,
-        "supports_tool_choice": true
+        "supports_tool_choice": true,
        "deprecation_date": "2026-05-15"
    },
    "xai/grok-vision-beta": {
        "input_cost_per_image": 5e-06,
@ -41756,6 +41838,16 @@
    "output_cost_per_token": 0.0,
    "litellm_provider": "snowflake",
    "mode": "embedding"
  },
  "soniox/stt-async-v4": {
    "litellm_provider": "soniox",
    "max_output_tokens": 8000,
    "max_tokens": 8000,
    "input_cost_per_second": 0.0,
    "output_cost_per_second": 0.0000277778,
    "mode": "audio_transcription",
    "source": "https://soniox.com/pricing",
    "supported_endpoints": ["/v1/audio/transcriptions"],
    "supports_audio_input": true
  }
 }
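Reviewer note: a quick sanity check on the `soniox/stt-async-v4` rates above. The sketch assumes cost is linear in audio duration and that both per-second rates apply to the same duration; `transcription_cost` is a hypothetical helper, not litellm's cost calculator.

```python
# Rates copied from the "soniox/stt-async-v4" entry in this diff.
INPUT_COST_PER_SECOND = 0.0
OUTPUT_COST_PER_SECOND = 0.0000277778


def transcription_cost(duration_seconds: float) -> float:
    """Estimated USD cost for `duration_seconds` of audio (assumed linear billing)."""
    return duration_seconds * (INPUT_COST_PER_SECOND + OUTPUT_COST_PER_SECOND)
```

Under these assumptions, one hour of audio (3600 seconds) comes to roughly 0.10 USD.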
--- a/provider_endpoints_support.json
+++ b/provider_endpoints_support.json
@ -1556,6 +1556,23 @@
        "interactions": true
      }
    },
    "neosantara": {
      "display_name": "Neosantara (`neosantara`)",
      "url": "https://docs.litellm.ai/docs/providers/neosantara",
      "endpoints": {
        "chat_completions": true,
        "messages": false,
        "responses": true,
        "embeddings": false,
        "image_generations": false,
        "audio_transcriptions": false,
        "audio_speech": false,
        "moderations": false,
        "batches": false,
        "rerank": false,
        "a2a": false
      }
    },
    "nlp_cloud": {
      "display_name": "NLP Cloud (`nlp_cloud`)",
      "url": "https://docs.litellm.ai/docs/providers/nlp_cloud",
@ -2081,6 +2098,22 @@
        "interactions": true
      }
    },
    "soniox": {
      "display_name": "Soniox (`soniox`)",
      "url": "https://docs.litellm.ai/docs/providers/soniox",
      "endpoints": {
        "chat_completions": false,
        "messages": false,
        "responses": false,
        "embeddings": false,
        "image_generations": false,
        "audio_transcriptions": true,
        "audio_speech": false,
        "moderations": false,
        "batches": false,
        "rerank": false
      }
    },
    "synthetic": {
      "display_name": "Synthetic (`synthetic`)",
      "endpoints": {
@ -2190,6 +2223,10 @@
        "search": true
      }
    },
    "you_com": {
      "display_name": "You.com (`you_com`)",
      "url": "https://docs.litellm.ai/docs/search/you_com"
    },
    "apiserpent": {
      "display_name": "APISerpent (`apiserpent`)",
      "url": "https://docs.litellm.ai/docs/search/apiserpent",
--- a/schema.prisma
+++ b/schema.prisma
@ -331,6 +331,7 @@ model LiteLLM_MCPServerTable {
  byok_description      String[] @default([])
  byok_api_key_help_url String?
  source_url            String?
  timeout               Float?
  // BYOM submission lifecycle
  approval_status  String?   @default("active")
  submitted_by     String?
--- a/tests/test_litellm/integrations/langfuse/test_langfuse_prompt_management.py
+++ b/tests/test_litellm/integrations/langfuse/test_langfuse_prompt_management.py
@ -1,8 +1,8 @@
 import os
 from unittest.mock import MagicMock, patch
 from litellm.integrations.langfuse.langfuse_prompt_management import (
    LangfusePromptManagement,
    langfuse_client_init,
 )
@ -65,3 +65,44 @@ class TestLangfusePromptManagement:
                mock_run_async.call_args[0][0]
                == langfuse_prompt_management.async_log_failure_event
            )
    def test_langfuse_client_init_passes_dedicated_httpx_client(self):
        import httpx
        from litellm.llms.custom_httpx.http_handler import _get_httpx_client
        shared_client = _get_httpx_client().client
        mock_langfuse_class = MagicMock()
        with (
            patch(
                "litellm.integrations.langfuse.langfuse_prompt_management.resolve_langfuse_credentials",
                return_value=("pk-1234", "sk-1234", "https://localhost"),
            ),
            patch(
                "litellm.integrations.langfuse.langfuse_prompt_management.LangFuseLogger._get_langfuse_flush_interval",
                return_value=1,
            ),
            patch.dict("sys.modules", {"langfuse": self._mock_langfuse}),
            patch(
                "litellm.llms.custom_httpx.http_handler.get_ssl_configuration",
                return_value=False,
            ) as mock_get_ssl,
        ):
            self._mock_langfuse.Langfuse = mock_langfuse_class
            langfuse_client_init(
                langfuse_public_key="pk-1234",
                langfuse_secret="sk-1234",
                langfuse_host="https://localhost",
            )
            mock_langfuse_class.assert_called_once()
            call_kwargs = mock_langfuse_class.call_args[1]
            assert "httpx_client" in call_kwargs
            passed_client = call_kwargs["httpx_client"]
            assert isinstance(passed_client, httpx.Client)
            assert passed_client is not shared_client
            mock_get_ssl.assert_called_once()
        langfuse_client_init.cache_clear()
--- a/tests/test_litellm/integrations/test_openmeter.py
+++ b/tests/test_litellm/integrations/test_openmeter.py
@ -23,6 +23,7 @@ class TestOpenMeterIntegration:
        os.environ.pop("OPENMETER_API_KEY", None)
        os.environ.pop("OPENMETER_API_ENDPOINT", None)
        os.environ.pop("OPENMETER_EVENT_TYPE", None)
        os.environ.pop("OPENMETER_TRUST_REQUEST_USER", None)
    def test_openmeter_logger_initialization(self):
        """Test that OpenMeterLogger initializes correctly with required env vars"""
@ -388,6 +389,75 @@ class TestOpenMeterIntegration:
        assert isinstance(result["subject"], str)
        assert result["subject"] == "12345"
    def test_common_logic_trust_request_user_false_ignores_request_user(self):
        """OPENMETER_TRUST_REQUEST_USER=false makes the key-bound user_id win
        over a request-supplied `user` (forge-attribution mitigation)."""
        os.environ["OPENMETER_TRUST_REQUEST_USER"] = "false"
        logger = OpenMeterLogger()
        kwargs = {
            "user": "forged-by-client",
            "model": "gpt-4",
            "response_cost": 0.002,
            "litellm_call_id": "test-call-id",
            "litellm_params": {
                "metadata": {"user_api_key_user_id": "real-tenant-id"}
            },
        }
        response_obj = {
            "id": "test-response-id",
            "usage": {"prompt_tokens": 20, "completion_tokens": 10, "total_tokens": 30},
        }
        result = logger._common_logic(kwargs, response_obj)
        assert result["subject"] == "real-tenant-id"
        assert result["subject"] != "forged-by-client"
    def test_common_logic_trust_request_user_false_still_raises_without_key_user(self):
        """OPENMETER_TRUST_REQUEST_USER=false still raises when no
        user_api_key_user_id is available — the request `user` is not a
        fallback in this mode."""
        os.environ["OPENMETER_TRUST_REQUEST_USER"] = "false"
        logger = OpenMeterLogger()
        kwargs = {
            "user": "would-have-worked-without-the-flag",
            "model": "gpt-3.5-turbo",
            "response_cost": 0.001,
            "litellm_call_id": "test-call-id",
        }
        response_obj = {"id": "test-response-id"}
        with pytest.raises(Exception, match="OpenMeter: user is required"):
            logger._common_logic(kwargs, response_obj)
    def test_common_logic_trust_request_user_default_preserves_behavior(self):
        """Default (unset OPENMETER_TRUST_REQUEST_USER) keeps request `user`
        taking priority — backward compatibility."""
        # OPENMETER_TRUST_REQUEST_USER intentionally unset
        logger = OpenMeterLogger()
        kwargs = {
            "user": "request-user",
            "model": "gpt-4",
            "response_cost": 0.002,
            "litellm_call_id": "test-call-id",
            "litellm_params": {
                "metadata": {"user_api_key_user_id": "key-user"}
            },
        }
        response_obj = {
            "id": "test-response-id",
            "usage": {"prompt_tokens": 20, "completion_tokens": 10, "total_tokens": 30},
        }
        result = logger._common_logic(kwargs, response_obj)
        assert result["subject"] == "request-user"
    @patch("litellm.integrations.openmeter.HTTPHandler")
    def test_integration_token_user_id_scenario(self, mock_http_handler):
        """Integration test simulating the exact scenario that was failing"""
--- a/tests/test_litellm/litellm_core_utils/test_litellm_logging.py
+++ b/tests/test_litellm/litellm_core_utils/test_litellm_logging.py
@ -3114,3 +3114,44 @@ def test_get_error_information_prefers_message_attribute_over_empty_str():
    )
    assert info["error_message"] == "real failure detail"
    assert info["error_code"] == "401"
@pytest.mark.parametrize(
    "event_cls, event_type",
    [
        ("ResponseCompletedEvent", "response.completed"),
        ("ResponseIncompleteEvent", "response.incomplete"),
        ("ResponseFailedEvent", "response.failed"),
    ],
 )
 def test_handle_anthropic_messages_response_logging_with_terminal_responses_api_events(
    event_cls, event_type
 ):
    """Regression test for #28943: when anthropic_messages routes to OpenAI Responses
    API and stream=True, success_handler receives a terminal ResponsesAPI event instead
    of a ModelResponse. The handler must return the inner ResponsesAPIResponse rather
    than crashing with AnthropicResponse.model_validate."""
    import importlib
    openai_types = importlib.import_module("litellm.types.llms.openai")
    EventClass = getattr(openai_types, event_cls)
    from litellm.types.llms.openai import ResponsesAPIResponse
    logging_obj = LitellmLogging(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hello"}],
        stream=True,
        call_type="anthropic_messages",
        start_time=time.time(),
        litellm_call_id="test-rce-123",
        function_id="test-fn",
    )
    inner_response = ResponsesAPIResponse(
        id="resp_test", created_at=1700000000, output=[]
    )
    event = EventClass(type=event_type, response=inner_response)
    result = logging_obj._handle_anthropic_messages_response_logging(result=event)
    assert result is inner_response
--- a/tests/test_litellm/llms/cohere/chat/test_cohere_transformation.py
+++ b/tests/test_litellm/llms/cohere/chat/test_cohere_transformation.py
@ -6,7 +6,9 @@ sys.path.insert(
    0, os.path.abspath("../../../../..")
 )  # Adds the parent directory to the system path
 import litellm
 from litellm.llms.cohere.chat.transformation import CohereChatConfig
 from litellm.llms.cohere.chat.v2_transformation import CohereV2ChatConfig
 class TestCohereTransform:
@ -49,3 +51,69 @@ class TestCohereTransform:
        # The function should properly map max_tokens if max_completion_tokens is not provided
        assert result == {"temperature": 0.7, "max_tokens": 200}
 class TestCohereV2Transform:
    def setup_method(self):
        self.config = CohereV2ChatConfig()
        self.model = "command-r"
    def test_v2_supports_max_completion_tokens(self):
        """max_completion_tokens must be advertised so get_optional_params does not reject it"""
        assert "max_completion_tokens" in self.config.get_supported_openai_params(
            self.model
        )
    def test_v2_max_tokens_only_still_maps(self):
        """max_tokens alone maps to cohere max_tokens when max_completion_tokens is absent"""
        result = self.config.map_openai_params(
            non_default_params={"temperature": 0.7, "max_tokens": 200},
            optional_params={},
            model=self.model,
            drop_params=False,
        )
        assert result == {"temperature": 0.7, "max_tokens": 200}
    def test_v2_map_max_completion_tokens_overrides_max_tokens(self):
        """max_completion_tokens maps to cohere max_tokens and overrides max_tokens, matching v1"""
        result = self.config.map_openai_params(
            non_default_params={
                "temperature": 0.7,
                "max_tokens": 200,
                "max_completion_tokens": 256,
            },
            optional_params={},
            model=self.model,
            drop_params=False,
        )
        assert result == {"temperature": 0.7, "max_tokens": 256}
    def test_v2_max_completion_tokens_precedence_is_order_independent(self):
        """max_completion_tokens wins over max_tokens regardless of dict ordering"""
        max_tokens_first = self.config.map_openai_params(
            non_default_params={"max_tokens": 200, "max_completion_tokens": 256},
            optional_params={},
            model=self.model,
            drop_params=False,
        )
        max_completion_first = self.config.map_openai_params(
            non_default_params={"max_completion_tokens": 256, "max_tokens": 200},
            optional_params={},
            model=self.model,
            drop_params=False,
        )
        assert max_tokens_first == {"max_tokens": 256}
        assert max_completion_first == {"max_tokens": 256}
    def test_v2_default_route_accepts_max_completion_tokens(self):
        """The default cohere_chat route resolves to v2; max_completion_tokens must not raise"""
        optional_params = litellm.get_optional_params(
            model=self.model,
            custom_llm_provider="cohere_chat",
            max_completion_tokens=256,
        )
        assert optional_params["max_tokens"] == 256
--- a/tests/test_litellm/llms/custom_httpx/test_http_handler.py
+++ b/tests/test_litellm/llms/custom_httpx/test_http_handler.py
@ -798,3 +798,56 @@ def test_get_httpx_client_applies_httpx_timeout_object_without_mocking_handler()
        assert handler.client.timeout == t
    finally:
        handler.close()
 def test_sync_get_forwards_per_request_timeout():
    """HTTPHandler.get(timeout=...) must apply the timeout to that request,
    overriding the client default rather than silently ignoring it."""
    captured = {}
    def mock_handler(request: httpx.Request) -> httpx.Response:
        captured["timeout"] = request.extensions.get("timeout")
        return httpx.Response(200, request=request, json={"ok": True})
    handler = HTTPHandler()
    handler.client.close()
    handler.client = httpx.Client(
        transport=httpx.MockTransport(mock_handler),
        timeout=httpx.Timeout(5.0),
    )
    try:
        handler.get("https://example.com/poll", timeout=99.0)
        assert captured["timeout"] == {
            "connect": 99.0,
            "read": 99.0,
            "write": 99.0,
            "pool": 99.0,
        }
    finally:
        handler.close()
@pytest.mark.asyncio
 async def test_async_get_forwards_per_request_timeout():
    captured = {}
    async def mock_handler(request: httpx.Request) -> httpx.Response:
        captured["timeout"] = request.extensions.get("timeout")
        return httpx.Response(200, request=request, json={"ok": True})
    handler = AsyncHTTPHandler()
    await handler.client.aclose()
    handler.client = httpx.AsyncClient(
        transport=httpx.MockTransport(mock_handler),
        timeout=httpx.Timeout(5.0),
    )
    try:
        await handler.get("https://example.com/poll", timeout=99.0)
        assert captured["timeout"] == {
            "connect": 99.0,
            "read": 99.0,
            "write": 99.0,
            "pool": 99.0,
        }
    finally:
        await handler.close()
--- a/tests/test_litellm/llms/moonshot/test_moonshot_chat_transformation.py
+++ b/tests/test_litellm/llms/moonshot/test_moonshot_chat_transformation.py
@ -9,15 +9,12 @@ import os
 import sys
 from unittest.mock import patch
-sys.path.insert(
+sys.path.insert(0, os.path.abspath("../../../../.."))  # Adds the parent directory to the system path
    0, os.path.abspath("../../../../..")
 )  # Adds the parent directory to the system path
 import pytest
 import litellm
 import litellm.utils
 from litellm import completion
 from litellm.litellm_core_utils.get_model_cost_map import GetModelCostMap
 from litellm.llms.moonshot.chat.transformation import MoonshotChatConfig
@ -208,6 +205,42 @@ class TestMoonshotConfig:
            # Temperature should be preserved
            assert result.get("temperature") == temp
    def test_temperature_dropped_for_reasoning_models(self):
        """Reasoning models (kimi-k2.5, kimi-k2.6) reject any temperature except 1,
        so the param is dropped rather than clamped. A clamp to 0.3/1 would still
        400 when the caller passes e.g. 0.5."""
        config = MoonshotChatConfig()
        with patch(
            "litellm.llms.moonshot.chat.transformation.supports_reasoning",
            return_value=True,
        ):
            for temp in [0.0, 0.5, 1.0, 1.5]:
                result = config.map_openai_params(
                    non_default_params={"temperature": temp},
                    optional_params={},
                    model="kimi-k2.5",
                    drop_params=False,
                )
                assert "temperature" not in result
    def test_temperature_clamped_for_non_reasoning_models(self):
        """Non-reasoning models keep the [0.3, 1] clamp behaviour."""
        config = MoonshotChatConfig()
        with patch(
            "litellm.llms.moonshot.chat.transformation.supports_reasoning",
            return_value=False,
        ):
            result = config.map_openai_params(
                non_default_params={"temperature": 1.5},
                optional_params={},
                model="moonshot-v1-8k",
                drop_params=False,
            )
        assert result.get("temperature") == 1
    def test_tool_choice_required_adds_message(self):
        """Test that tool_choice='required' adds a special message and removes tool_choice"""
        config = MoonshotChatConfig()
@ -232,10 +265,7 @@ class TestMoonshotConfig:
        assert result["messages"][0]["role"] == "user"
        assert result["messages"][0]["content"] == "What's the weather like?"
        assert result["messages"][1]["role"] == "user"
-        assert (
+        assert result["messages"][1]["content"] == "Please select a tool to handle the current issue."
            result["messages"][1]["content"]
            == "Please select a tool to handle the current issue."
        )
        # Check that tool_choice was removed but tools are preserved
        assert "tool_choice" not in result
@ -273,10 +303,7 @@ class TestMoonshotConfig:
        # Check that the message was added
        assert len(result["messages"]) == 2
-        assert (
+        assert result["messages"][1]["content"] == "Please select a tool to handle the current issue."
            result["messages"][1]["content"]
            == "Please select a tool to handle the current issue."
        )
    def test_tool_choice_non_required_preserved(self):
        """Test that non-'required' tool_choice values are preserved"""
@ -501,9 +528,7 @@ class TestMoonshotConfig:
        assert result[0].get("reasoning_content") == "stored thinking"
        # The promoted key must be removed from provider_specific_fields to
        # avoid sending the value twice in the serialised request body
-        assert "reasoning_content" not in (
+        assert "reasoning_content" not in (result[0].get("provider_specific_fields") or {})
            result[0].get("provider_specific_fields") or {}
        )
    def test_reasoning_model_fill_called_from_transform_request(self):
        """transform_request injects reasoning_content end-to-end for reasoning models."""
@ -603,10 +628,7 @@ class TestMoonshotConfig:
        result = config.fill_reasoning_content(messages)
        # reasoning_content should be preserved, not replaced with placeholder
-        assert (
+        assert result[0].get("reasoning_content") == "<thinking>User wants weather</thinking>"
            result[0].get("reasoning_content")
            == "<thinking>User wants weather</thinking>"
        )
    def test_reasoning_content_preserved_in_multi_turn_flow(self):
        """reasoning_content is preserved through multi-turn conversation flow.
@ -650,10 +672,7 @@ class TestMoonshotConfig:
        result = config.fill_reasoning_content(messages)
        # reasoning_content should be preserved in the assistant message
-        assert (
+        assert result[1].get("reasoning_content") == "<thinking>Planning to call weather tool</thinking>"
            result[1].get("reasoning_content")
            == "<thinking>Planning to call weather tool</thinking>"
        )
 class TestKimiK26ModelRegistry:
@ -695,3 +714,33 @@ class TestKimiK26ModelRegistry:
        """kimi-k2.6 should be assigned to the moonshot provider."""
        model_info = model_cost_map["moonshot/kimi-k2.6"]
        assert model_info["litellm_provider"] == "moonshot"
 class TestMoonshotResponseSchemaSupport:
    """Every model currently live on api.moonshot.ai supports json_schema
    response_format, which gates discovery via litellm.responses(). The flag
    must be true so the capability is advertised honestly."""
    LIVE_MODELS = [
        "moonshot/kimi-k2.5",
        "moonshot/kimi-k2.6",
        "moonshot/moonshot-v1-8k",
        "moonshot/moonshot-v1-32k",
        "moonshot/moonshot-v1-128k",
        "moonshot/moonshot-v1-8k-vision-preview",
        "moonshot/moonshot-v1-32k-vision-preview",
        "moonshot/moonshot-v1-128k-vision-preview",
        "moonshot/moonshot-v1-auto",
    ]
    @pytest.fixture(autouse=True)
    def model_cost_map(self):
        return GetModelCostMap.load_local_model_cost_map()
    @pytest.mark.parametrize("model", LIVE_MODELS)
    def test_live_model_supports_response_schema(self, model, model_cost_map):
        assert model_cost_map[model].get("supports_response_schema") is True
    def test_supports_response_schema_utility_reports_true(self, model_cost_map, monkeypatch):
        monkeypatch.setattr(litellm, "model_cost", model_cost_map)
        assert litellm.utils.supports_response_schema(model="moonshot/kimi-k2.5") is True
--- a/tests/test_litellm/llms/neosantara/test_neosantara.py
+++ b/tests/test_litellm/llms/neosantara/test_neosantara.py
@ -0,0 +1,100 @@
 import os
 from unittest.mock import patch
 NEOSANTARA_API_BASE = "https://api.neosantara.xyz/v1"
 def test_neosantara_json_registry():
    import litellm
    from litellm.llms.openai_like.json_loader import JSONProviderRegistry
    assert litellm.LlmProviders.NEOSANTARA.value == "neosantara"
    assert litellm.LlmProviders("neosantara") == litellm.LlmProviders.NEOSANTARA
    assert JSONProviderRegistry.exists("neosantara")
    config = JSONProviderRegistry.get("neosantara")
    assert config is not None
    assert config.base_url == NEOSANTARA_API_BASE
    assert config.api_key_env == "NEOSANTARA_API_KEY"
    assert config.api_base_env == "NEOSANTARA_API_BASE"
    assert config.param_mappings["max_completion_tokens"] == "max_tokens"
    assert "/v1/chat/completions" in config.supported_endpoints
    assert "/v1/responses" in config.supported_endpoints
 def test_neosantara_dynamic_config_env_vars():
    from litellm.llms.openai_like.dynamic_config import create_config_class
    from litellm.llms.openai_like.json_loader import JSONProviderRegistry
    config = create_config_class(JSONProviderRegistry.get("neosantara"))()
    with patch.dict(
        os.environ,
        {
            "NEOSANTARA_API_KEY": "test-key",
            "NEOSANTARA_API_BASE": "https://custom.neosantara.example/v1",
        },
    ):
        api_base, api_key = config._get_openai_compatible_provider_info(None, None)
    assert api_base == "https://custom.neosantara.example/v1"
    assert api_key == "test-key"
 def test_neosantara_provider_detection_by_prefix():
    from litellm.litellm_core_utils.get_llm_provider_logic import get_llm_provider
    model, provider, _, api_base = get_llm_provider("neosantara/gemini-3-flash")
    assert model == "gemini-3-flash"
    assert provider == "neosantara"
    assert api_base == NEOSANTARA_API_BASE
 def test_neosantara_chat_complete_url():
    from litellm.llms.openai_like.dynamic_config import create_config_class
    from litellm.llms.openai_like.json_loader import JSONProviderRegistry
    config = create_config_class(JSONProviderRegistry.get("neosantara"))()
    assert (
        config.get_complete_url(
            api_base=None,
            api_key=None,
            model="gemini-3-flash",
            optional_params={},
            litellm_params={},
        )
        == "https://api.neosantara.xyz/v1/chat/completions"
    )
 def test_neosantara_maps_max_completion_tokens_to_max_tokens():
    from litellm.llms.openai_like.dynamic_config import create_config_class
    from litellm.llms.openai_like.json_loader import JSONProviderRegistry
    config = create_config_class(JSONProviderRegistry.get("neosantara"))()
    optional_params = config.map_openai_params(
        non_default_params={"max_completion_tokens": 7},
        optional_params={},
        model="gemini-3-flash",
        drop_params=False,
    )
    assert optional_params == {"max_tokens": 7}
 def test_neosantara_responses_api_config():
    from litellm.llms.openai.responses.transformation import OpenAIResponsesAPIConfig
    from litellm.utils import ProviderConfigManager
    config = ProviderConfigManager.get_provider_responses_api_config(
        provider="neosantara",
        model="claude-opus-4-6",
    )
    assert isinstance(config, OpenAIResponsesAPIConfig)
    assert config.custom_llm_provider == "neosantara"
    assert (
        config.get_complete_url(api_base=None, litellm_params={})
        == "https://api.neosantara.xyz/v1/responses"
    )
--- a/tests/test_litellm/llms/soniox/init.py
+++ b/tests/test_litellm/llms/soniox/init.py
@ -0,0 +1 @@
 """Soniox provider tests."""
--- a/tests/test_litellm/llms/soniox/audio_transcription/init.py
+++ b/tests/test_litellm/llms/soniox/audio_transcription/init.py
@ -0,0 +1 @@
 """Soniox audio transcription tests."""
--- a/tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_handler.py
+++ b/tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_handler.py
--- a/tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_transformation.py
+++ b/tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_transformation.py
@ -0,0 +1,495 @@
 """Tests for SonioxAudioTranscriptionConfig."""
 import json
 from typing import Any, Dict, Optional
 from unittest.mock import patch
 import httpx
 import pytest
 from litellm.llms.soniox.audio_transcription.transformation import (
    SonioxAudioTranscriptionConfig,
 )
 from litellm.llms.soniox.common_utils import SonioxException
 from litellm.types.utils import TranscriptionResponse
 def _make_response(payload: Dict[str, Any], status_code: int = 200) -> httpx.Response:
    return httpx.Response(
        status_code=status_code,
        content=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
    )
 class TestGetSupportedOpenAIParams:
    def test_should_advertise_language_and_response_format(self):
        cfg = SonioxAudioTranscriptionConfig()
        assert cfg.get_supported_openai_params(model="stt-async-v4") == [
            "language",
            "response_format",
        ]
 class TestMapOpenAIParams:
    def test_should_translate_language_to_language_hints(self):
        cfg = SonioxAudioTranscriptionConfig()
        result = cfg.map_openai_params(
            non_default_params={"language": "en"},
            optional_params={},
            model="stt-async-v4",
            drop_params=False,
        )
        assert result["language_hints"] == ["en"]
    def test_should_prepend_language_to_existing_hints(self):
        cfg = SonioxAudioTranscriptionConfig()
        result = cfg.map_openai_params(
            non_default_params={"language": "en"},
            optional_params={"language_hints": ["fr"]},
            model="stt-async-v4",
            drop_params=False,
        )
        assert result["language_hints"] == ["en", "fr"]
    def test_should_not_duplicate_language_already_in_hints(self):
        cfg = SonioxAudioTranscriptionConfig()
        result = cfg.map_openai_params(
            non_default_params={"language": "en"},
            optional_params={"language_hints": ["en", "fr"]},
            model="stt-async-v4",
            drop_params=False,
        )
        assert result["language_hints"] == ["en", "fr"]
    def test_should_passthrough_soniox_native_kwargs(self):
        cfg = SonioxAudioTranscriptionConfig()
        result = cfg.map_openai_params(
            non_default_params={
                "enable_speaker_diarization": True,
                "enable_language_identification": True,
                "context": "medical conversation",
                "audio_url": "https://example.com/a.wav",
            },
            optional_params={},
            model="stt-async-v4",
            drop_params=False,
        )
        assert result["enable_speaker_diarization"] is True
        assert result["enable_language_identification"] is True
        assert result["context"] == "medical conversation"
        assert result["audio_url"] == "https://example.com/a.wav"
    def test_should_passthrough_handler_only_kwargs(self):
        cfg = SonioxAudioTranscriptionConfig()
        result = cfg.map_openai_params(
            non_default_params={
                "soniox_polling_interval": 0.5,
                "soniox_max_polling_attempts": 10,
                "soniox_cleanup": ["file"],
            },
            optional_params={},
            model="stt-async-v4",
            drop_params=False,
        )
        assert result["soniox_polling_interval"] == 0.5
        assert result["soniox_max_polling_attempts"] == 10
        assert result["soniox_cleanup"] == ["file"]
 class TestValidateEnvironment:
    def test_should_set_bearer_token_from_api_key(self):
        cfg = SonioxAudioTranscriptionConfig()
        headers = cfg.validate_environment(
            headers={},
            model="stt-async-v4",
            messages=[],
            optional_params={},
            litellm_params={},
            api_key="sk-test",
        )
        assert headers["Authorization"] == "Bearer sk-test"
    def test_should_resolve_key_from_env(self, monkeypatch):
        monkeypatch.setenv("SONIOX_API_KEY", "env-key")
        cfg = SonioxAudioTranscriptionConfig()
        headers = cfg.validate_environment(
            headers={},
            model="stt-async-v4",
            messages=[],
            optional_params={},
            litellm_params={},
        )
        assert headers["Authorization"] == "Bearer env-key"
    def test_should_raise_when_no_api_key(self, monkeypatch):
        monkeypatch.delenv("SONIOX_API_KEY", raising=False)
        cfg = SonioxAudioTranscriptionConfig()
        with pytest.raises(SonioxException) as exc_info:
            cfg.validate_environment(
                headers={},
                model="stt-async-v4",
                messages=[],
                optional_params={},
                litellm_params={},
            )
        assert exc_info.value.status_code == 401
    def test_should_merge_caller_headers(self):
        cfg = SonioxAudioTranscriptionConfig()
        headers = cfg.validate_environment(
            headers={"X-Trace-Id": "abc"},
            model="stt-async-v4",
            messages=[],
            optional_params={},
            litellm_params={},
            api_key="sk-test",
        )
        assert headers["X-Trace-Id"] == "abc"
        assert headers["Authorization"] == "Bearer sk-test"
 class TestGetCompleteUrl:
    def test_should_return_default_base(self):
        cfg = SonioxAudioTranscriptionConfig()
        url = cfg.get_complete_url(
            api_base=None,
            api_key="sk-test",
            model="stt-async-v4",
            optional_params={},
            litellm_params={},
        )
        assert url == "https://api.soniox.com"
    def test_should_strip_trailing_slash_from_custom_base(self):
        cfg = SonioxAudioTranscriptionConfig()
        url = cfg.get_complete_url(
            api_base="https://custom.example.com/",
            api_key="sk-test",
            model="stt-async-v4",
            optional_params={},
            litellm_params={},
        )
        assert url == "https://custom.example.com"
 class TestTransformAudioTranscriptionRequest:
    def test_should_build_minimal_body_with_model(self):
        cfg = SonioxAudioTranscriptionConfig()
        result = cfg.transform_audio_transcription_request(
            model="stt-async-v4",
            audio_file=None,
            optional_params={},
            litellm_params={},
        )
        assert result.data == {"model": "stt-async-v4"}
        assert result.files is None
        assert result.content_type == "application/json"
    def test_should_include_passthrough_params_in_body(self):
        cfg = SonioxAudioTranscriptionConfig()
        result = cfg.transform_audio_transcription_request(
            model="stt-async-v4",
            audio_file=None,
            optional_params={
                "audio_url": "https://example.com/a.wav",
                "language_hints": ["en"],
                "enable_speaker_diarization": True,
                "soniox_polling_interval": 0.5,  # handler-only, must NOT appear
            },
            litellm_params={},
        )
        body = result.data
        assert body["audio_url"] == "https://example.com/a.wav"
        assert body["language_hints"] == ["en"]
        assert body["enable_speaker_diarization"] is True
        assert "soniox_polling_interval" not in body
 class TestTransformAudioTranscriptionResponse:
    def test_should_build_response_from_plain_transcript_payload(self):
        cfg = SonioxAudioTranscriptionConfig()
        resp = cfg.transform_audio_transcription_response(
            _make_response({"id": "tx_1", "text": "hello world"}),
        )
        assert resp.text == "hello world"
        assert resp["task"] == "transcribe"
    def test_should_build_response_from_envelope_payload(self):
        cfg = SonioxAudioTranscriptionConfig()
        resp = cfg.transform_audio_transcription_response(
            _make_response(
                {
                    "transcription": {"id": "tx_1", "audio_duration_ms": 2500},
                    "transcript": {"text": "hello world", "tokens": []},
                }
            ),
        )
        assert resp.text == "hello world"
        assert resp["duration"] == pytest.approx(2.5)
    def test_should_render_speaker_tags_when_diarization_present(self):
        cfg = SonioxAudioTranscriptionConfig()
        payload = {
            "transcript": {
                "text": "ignored fallback",
                "tokens": [
                    {"text": "hello", "speaker": 1},
                    {"text": " world", "speaker": 2},
                ],
            }
        }
        resp = cfg._build_response_from_payload(payload)
        assert "Speaker 1:" in resp.text
        assert "Speaker 2:" in resp.text
    def test_should_set_language_when_all_tokens_share_one(self):
        cfg = SonioxAudioTranscriptionConfig()
        payload = {
            "transcript": {
                "tokens": [
                    {"text": "hello", "language": "en"},
                    {"text": " world", "language": "en"},
                ]
            }
        }
        resp = cfg._build_response_from_payload(payload)
        assert resp["language"] == "en"
    def test_should_populate_provided_model_response(self):
        cfg = SonioxAudioTranscriptionConfig()
        model_response = TranscriptionResponse()
        model_response._hidden_params = {"pre": "existing"}
        payload = {"text": "populated"}
        resp = cfg._build_response_from_payload(payload, model_response=model_response)
        assert resp is model_response
        assert resp.text == "populated"
        assert resp._hidden_params["pre"] == "existing"
        assert "soniox_raw" in resp._hidden_params
    def test_should_stash_raw_payload_in_hidden_params(self):
        cfg = SonioxAudioTranscriptionConfig()
        payload = {
            "transcription": {"id": "tx_1"},
            "transcript": {"text": "hi", "tokens": []},
        }
        resp = cfg._build_response_from_payload(payload)
        raw = resp._hidden_params["soniox_raw"]
        assert raw["transcription"]["id"] == "tx_1"
        assert raw["transcript"]["text"] == "hi"
    def test_should_raise_on_invalid_json(self):
        cfg = SonioxAudioTranscriptionConfig()
        bad = httpx.Response(status_code=200, content=b"not json")
        with pytest.raises(SonioxException):
            cfg.transform_audio_transcription_response(bad)
    def test_should_concat_token_texts_when_no_text_field_or_tags(self):
        cfg = SonioxAudioTranscriptionConfig()
        payload = {
            "transcript": {
                "tokens": [
                    {"text": "hello"},
                    {"text": " world"},
                ],
            }
        }
        resp = cfg._build_response_from_payload(payload)
        assert resp.text == "hello world"
    def test_should_return_empty_text_for_empty_payload(self):
        cfg = SonioxAudioTranscriptionConfig()
        resp = cfg._build_response_from_payload({})
        assert resp.text == ""
    def test_should_skip_duration_when_audio_duration_ms_is_invalid(self):
        cfg = SonioxAudioTranscriptionConfig()
        payload = {
            "transcription": {"audio_duration_ms": "not-a-number"},
            "transcript": {"text": "hi", "tokens": []},
        }
        resp = cfg._build_response_from_payload(payload)
        assert "duration" not in resp.model_dump()
 class TestRenderSonioxTokens:
    def test_should_return_empty_string_for_no_tokens(self):
        from litellm.llms.soniox.common_utils import render_soniox_tokens
        assert render_soniox_tokens([]) == ""
 class TestRenderSonioxTokensAsSrt:
    def test_should_render_basic_srt(self):
        from litellm.llms.soniox.common_utils import render_soniox_tokens_as_srt
        tokens = [
            {"text": "Hello ", "start_ms": 0, "end_ms": 500},
            {"text": "world.", "start_ms": 500, "end_ms": 1000},
        ]
        result = render_soniox_tokens_as_srt(tokens)
        assert "1\n" in result
        assert "00:00:00,000 --> " in result
        assert "Hello world." in result
    def test_should_split_cues_on_speaker_change(self):
        from litellm.llms.soniox.common_utils import render_soniox_tokens_as_srt
        tokens = [
            {"text": "Hi.", "start_ms": 0, "end_ms": 1000, "speaker": "1"},
            {"text": "Hey.", "start_ms": 1500, "end_ms": 2500, "speaker": "2"},
        ]
        result = render_soniox_tokens_as_srt(tokens)
        assert "1\n" in result
        assert "2\n" in result
        assert "Hi." in result
        assert "Hey." in result
    def test_should_return_empty_string_for_no_timestamps(self):
        from litellm.llms.soniox.common_utils import render_soniox_tokens_as_srt
        tokens = [{"text": "no timestamps"}]
        result = render_soniox_tokens_as_srt(tokens)
        assert result == ""
    def test_should_return_empty_string_for_empty_tokens(self):
        from litellm.llms.soniox.common_utils import render_soniox_tokens_as_srt
        assert render_soniox_tokens_as_srt([]) == ""
    def test_should_format_long_timestamps_correctly(self):
        from litellm.llms.soniox.common_utils import render_soniox_tokens_as_srt
        tokens = [
            {"text": "Late.", "start_ms": 3661000, "end_ms": 3662000},
        ]
        result = render_soniox_tokens_as_srt(tokens)
        # 3661000 ms = 1 hour, 1 minute, 1 second
        assert "01:01:01,000" in result
 class TestRenderSonioxTokensAsVtt:
    def test_should_render_basic_vtt_with_header(self):
        from litellm.llms.soniox.common_utils import render_soniox_tokens_as_vtt
        tokens = [
            {"text": "Hello ", "start_ms": 0, "end_ms": 500},
            {"text": "world.", "start_ms": 500, "end_ms": 1000},
        ]
        result = render_soniox_tokens_as_vtt(tokens)
        assert result.startswith("WEBVTT\n")
        assert "00:00:00.000 --> " in result
        assert "Hello world." in result
    def test_should_return_header_only_for_empty_tokens(self):
        from litellm.llms.soniox.common_utils import render_soniox_tokens_as_vtt
        result = render_soniox_tokens_as_vtt([])
        assert result.startswith("WEBVTT\n")
        # After strip(), only the WEBVTT header line remains
        lines = result.strip().split("\n")
        assert len(lines) == 1
    def test_should_use_dot_separator_not_comma(self):
        from litellm.llms.soniox.common_utils import render_soniox_tokens_as_vtt
        tokens = [{"text": "Test.", "start_ms": 1500, "end_ms": 2500}]
        result = render_soniox_tokens_as_vtt(tokens)
        # VTT uses dots, not commas
        assert "00:00:01.500" in result
        assert "," not in result.replace("WEBVTT", "")
 class TestBuildResponseWithResponseFormat:
    def test_should_render_srt_when_response_format_is_srt(self):
        cfg = SonioxAudioTranscriptionConfig()
        payload = {
            "transcript": {
                "tokens": [
                    {"text": "Hello ", "start_ms": 0, "end_ms": 500},
                    {"text": "world.", "start_ms": 500, "end_ms": 1000},
                ]
            }
        }
        resp = cfg._build_response_from_payload(payload, response_format="srt")
        assert "00:00:00,000 --> " in resp.text
        assert "Hello world." in resp.text
    def test_should_render_vtt_when_response_format_is_vtt(self):
        cfg = SonioxAudioTranscriptionConfig()
        payload = {
            "transcript": {
                "tokens": [
                    {"text": "Hello ", "start_ms": 0, "end_ms": 500},
                    {"text": "world.", "start_ms": 500, "end_ms": 1000},
                ]
            }
        }
        resp = cfg._build_response_from_payload(payload, response_format="vtt")
        assert resp.text.startswith("WEBVTT\n")
        assert "Hello world." in resp.text
    def test_should_include_words_for_verbose_json(self):
        cfg = SonioxAudioTranscriptionConfig()
        payload = {
            "transcript": {
                "text": "Hello world.",
                "tokens": [
                    {"text": "Hello ", "start_ms": 0, "end_ms": 500},
                    {"text": "world.", "start_ms": 500, "end_ms": 1000},
                ],
            }
        }
        resp = cfg._build_response_from_payload(payload, response_format="verbose_json")
        # text should be plain (not SRT/VTT)
        assert resp.text == "Hello world."
        # words should be populated
        words = resp.get("words")
        assert words is not None
        assert len(words) == 2
        assert words[0]["word"] == "Hello "
        assert words[0]["start"] == 0.0
        assert words[0]["end"] == 0.5
        assert words[1]["start"] == 0.5
        assert words[1]["end"] == 1.0
    def test_should_default_to_plain_text_when_no_response_format(self):
        cfg = SonioxAudioTranscriptionConfig()
        payload = {
            "transcript": {
                "text": "Hello world.",
                "tokens": [
                    {"text": "Hello ", "start_ms": 0, "end_ms": 500},
                    {"text": "world.", "start_ms": 500, "end_ms": 1000},
                ],
            }
        }
        resp = cfg._build_response_from_payload(payload, response_format=None)
        assert resp.text == "Hello world."
    def test_should_fallback_to_plain_text_for_srt_with_no_timestamps(self):
        cfg = SonioxAudioTranscriptionConfig()
        payload = {
            "transcript": {
                "text": "No timestamps here.",
                "tokens": [{"text": "No timestamps here."}],
            }
        }
        # SRT is requested, but the tokens have no start_ms/end_ms, so
        # _group_tokens_into_cues produces no cues. This test only verifies
        # that the call does not crash and that resp.text is a string.
        resp = cfg._build_response_from_payload(payload, response_format="srt")
        assert isinstance(resp.text, str)
 class TestGetErrorClass:
    def test_should_return_soniox_exception(self):
        cfg = SonioxAudioTranscriptionConfig()
        err = cfg.get_error_class(error_message="boom", status_code=500, headers={})
        assert isinstance(err, SonioxException)
        assert err.status_code == 500
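Note on the SRT/VTT assertions above: the timestamp tests expect `01:01:01,000` for 3,661,000 ms in SRT and `00:00:01.500` for 1,500 ms in VTT, and the `verbose_json` test expects millisecond offsets converted to seconds. A minimal sketch of that arithmetic follows. `format_cue_timestamp` and `token_to_word` are hypothetical helpers written for illustration; they are not the functions in `litellm.llms.soniox.common_utils`, whose internals are not shown in this diff.

```python
from typing import Any, Dict


def format_cue_timestamp(total_ms: int, fraction_sep: str) -> str:
    """Format milliseconds as HH:MM:SS<sep>mmm.

    SRT uses "," before the milliseconds and WebVTT uses ".".
    Hypothetical helper, not litellm's implementation.
    """
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{fraction_sep}{millis:03d}"


def token_to_word(token: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a token with start_ms/end_ms into a word entry in seconds.

    The output keys ("word", "start", "end") mirror what the
    verbose_json test asserts on; the conversion itself is an assumption.
    """
    return {
        "word": token["text"],
        "start": token["start_ms"] / 1000,
        "end": token["end_ms"] / 1000,
    }


print(format_cue_timestamp(3_661_000, ","))  # 01:01:01,000
print(format_cue_timestamp(1_500, "."))      # 00:00:01.500
```

Using `divmod` keeps each unit an integer, so the zero padding never has to deal with floating-point rounding.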
--- a/tests/test_litellm/llms/soniox/test_soniox_provider_registration.py
+++ b/tests/test_litellm/llms/soniox/test_soniox_provider_registration.py
@@ -0,0 +1,42 @@
 """Tests verifying Soniox is correctly registered as a litellm provider."""
 import pytest
 import litellm
 class TestProviderRegistration:
    def test_should_expose_soniox_in_llm_providers_enum(self):
        assert litellm.LlmProviders.SONIOX.value == "soniox"
    def test_should_list_soniox_in_provider_list(self):
        assert "soniox" in litellm.provider_list
    def test_should_list_soniox_in_models_by_provider(self):
        assert "soniox" in litellm.models_by_provider
    def test_should_lazy_import_soniox_audio_transcription_config(self):
        cls = litellm.SonioxAudioTranscriptionConfig
        assert cls.__name__ == "SonioxAudioTranscriptionConfig"
        # Calling again should return the same class (cached).
        assert litellm.SonioxAudioTranscriptionConfig is cls
    def test_should_resolve_soniox_via_get_llm_provider(self, monkeypatch):
        monkeypatch.setenv("SONIOX_API_KEY", "test-key")
        model, provider, api_key, api_base = litellm.get_llm_provider(
            model="soniox/stt-async-v4"
        )
        assert provider == "soniox"
        assert model == "stt-async-v4"
        assert api_key == "test-key"
        assert api_base == "https://api.soniox.com"
    def test_should_return_soniox_config_from_provider_config_manager(self):
        from litellm.utils import ProviderConfigManager
        cfg = ProviderConfigManager.get_provider_audio_transcription_config(
            model="stt-async-v4",
            provider=litellm.LlmProviders.SONIOX,
        )
        assert cfg is not None
        assert cfg.__class__.__name__ == "SonioxAudioTranscriptionConfig"
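Note on the lazy-import test above: it checks that reading `litellm.SonioxAudioTranscriptionConfig` twice yields the same class object. One common way to get that behavior is a module-level `__getattr__` (PEP 562) that imports on first access and caches the result on the module. The sketch below shows the pattern on a throwaway module; `make_lazy_module` and the name map are hypothetical, and this diff does not show whether litellm's own loader works exactly this way.

```python
import importlib
import types
from typing import Dict


def make_lazy_module(name: str, lazy_map: Dict[str, str]) -> types.ModuleType:
    """Build a module whose attributes are imported on first access.

    lazy_map maps an attribute name to the module that defines it.
    Hypothetical helper for illustration only.
    """
    mod = types.ModuleType(name)

    def _lazy_getattr(attr: str):
        # Only called when normal attribute lookup on the module fails.
        if attr not in lazy_map:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(lazy_map[attr]), attr)
        # Cache on the module so later reads bypass this hook entirely.
        setattr(mod, attr, value)
        return value

    # PEP 562: a "__getattr__" entry in the module namespace is the fallback.
    mod.__getattr__ = _lazy_getattr
    return mod


demo = make_lazy_module("demo", {"OrderedDict": "collections"})
first = demo.OrderedDict   # triggers the import and caches the class
second = demo.OrderedDict  # served from the module namespace
print(first is second)     # True
```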
--- a/tests/test_litellm/llms/you_com/__init__.py
+++ b/tests/test_litellm/llms/you_com/__init__.py
--- a/tests/test_litellm/llms/you_com/test_you_com_search.py
+++ b/tests/test_litellm/llms/you_com/test_you_com_search.py
@@ -0,0 +1,384 @@
 """
 Tests for You.com Search API integration.
 """
 import os
 import sys
 import pytest
 from unittest.mock import AsyncMock, patch, MagicMock
 sys.path.insert(0, os.path.abspath("../.."))
 import litellm
 class TestYouComSearch:
    """
    Tests for You.com Search functionality with mocked network responses.
    """
    @pytest.fixture(autouse=True)
    def _set_api_key(self, monkeypatch):
        """
        Default fixture: YOUCOM_API_KEY is set, scoped to this test.
        Tests that need the key absent should call `monkeypatch.delenv` themselves.
        """
        monkeypatch.setenv("YOUCOM_API_KEY", "test-api-key")
    @pytest.mark.asyncio
    async def test_you_com_search_request_payload(self):
        """
        Validate the You.com search request payload structure without real API calls.
        """
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "results": {
                "web": [
                    {
                        "title": "Test Result 1",
                        "url": "https://example.com/1",
                        "description": "Brief description 1",
                        "snippets": ["This is a test snippet for result 1"],
                        "page_age": "2025-01-15T00:00:00Z",
                    },
                    {
                        "title": "Test Result 2",
                        "url": "https://example.com/2",
                        "description": "Brief description 2",
                        "snippets": ["This is a test snippet for result 2"],
                        "page_age": "2025-01-10T00:00:00Z",
                    },
                ],
                "news": [],
            },
            "metadata": {
                "search_uuid": "abc-123",
                "query": "latest developments in AI",
                "latency": 0.42,
            },
        }
        with patch(
            "litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
            new_callable=AsyncMock,
        ) as mock_post:
            mock_post.return_value = mock_response
            response = await litellm.asearch(
                query="latest developments in AI",
                search_provider="you_com",
                max_results=5,
            )
            assert mock_post.call_count == 1
            call_args = mock_post.call_args
            assert call_args.kwargs["url"] == "https://ydc-index.io/v1/search"
            headers = call_args.kwargs.get("headers", {})
            assert "X-API-Key" in headers
            assert headers["X-API-Key"] == "test-api-key"
            assert headers["Content-Type"] == "application/json"
            json_data = call_args.kwargs.get("json")
            assert json_data is not None
            assert json_data["query"] == "latest developments in AI"
            # max_results is mapped to You.com's `count` parameter
            assert json_data["count"] == 5
            assert hasattr(response, "results")
            assert hasattr(response, "object")
            assert response.object == "search"
            assert len(response.results) == 2
            first_result = response.results[0]
            assert first_result.title == "Test Result 1"
            assert first_result.url == "https://example.com/1"
            assert first_result.snippet == "This is a test snippet for result 1"
            assert first_result.date == "2025-01-15T00:00:00Z"
    @pytest.mark.asyncio
    async def test_you_com_search_domain_filter_and_country(self):
        """
        Validate that Perplexity-spec optional params map to You.com's parameters:
        - search_domain_filter -> include_domains
        - country              -> country (lowercased to match Tavily's convention)
        """
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "results": {"web": [], "news": []},
            "metadata": {},
        }
        with patch(
            "litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
            new_callable=AsyncMock,
        ) as mock_post:
            mock_post.return_value = mock_response
            await litellm.asearch(
                query="machine learning",
                search_provider="you_com",
                search_domain_filter=["arxiv.org", "nature.com"],
                country="US",
            )
            call_args = mock_post.call_args
            json_data = call_args.kwargs.get("json")
            assert json_data["query"] == "machine learning"
            assert json_data["include_domains"] == ["arxiv.org", "nature.com"]
            # Country is normalized to lowercase, matching Tavily's behavior.
            assert json_data["country"] == "us"
            # search_domain_filter and max_tokens_per_page (perplexity-spec names)
            # should NOT leak through to the upstream payload.
            assert "search_domain_filter" not in json_data
            assert "max_tokens_per_page" not in json_data
    @pytest.mark.asyncio
    async def test_you_com_search_snippet_fallback_to_description(self):
        """
        When `snippets` is missing/empty, snippet falls back to `description`.
        """
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "results": {
                "web": [
                    {
                        "title": "No snippets here",
                        "url": "https://example.com/3",
                        "description": "Fallback description text",
                        "snippets": [],
                        "page_age": None,
                    }
                ],
                "news": [],
            },
            "metadata": {},
        }
        with patch(
            "litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
            new_callable=AsyncMock,
        ) as mock_post:
            mock_post.return_value = mock_response
            response = await litellm.asearch(
                query="anything",
                search_provider="you_com",
            )
            assert len(response.results) == 1
            assert response.results[0].snippet == "Fallback description text"
            assert response.results[0].date is None
    @pytest.mark.asyncio
    async def test_you_com_search_news_results_appended(self):
        """
        News results are appended after web results in the flattened result list.
        """
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "results": {
                "web": [
                    {
                        "title": "Web Result",
                        "url": "https://example.com/web",
                        "snippets": ["web snippet"],
                        "description": "web desc",
                        "page_age": "2025-01-01T00:00:00Z",
                    }
                ],
                "news": [
                    {
                        "title": "News Result",
                        "url": "https://news.example.com/article",
                        "description": "news desc",
                        "page_age": "2025-02-01T00:00:00Z",
                    }
                ],
            },
            "metadata": {},
        }
        with patch(
            "litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
            new_callable=AsyncMock,
        ) as mock_post:
            mock_post.return_value = mock_response
            response = await litellm.asearch(
                query="anything",
                search_provider="you_com",
            )
            assert len(response.results) == 2
            assert response.results[0].title == "Web Result"
            assert response.results[1].title == "News Result"
            # News result has no `snippets` -> falls back to description
            assert response.results[1].snippet == "news desc"
    def test_you_com_search_complete_url_handles_trailing_slash(self):
        """
        get_complete_url must normalize trailing slashes on api_base, so a custom
        base like `https://x.example/v1/search/` does not become
        `https://x.example/v1/search/v1/search`.
        """
        from litellm.llms.you_com.search.transformation import YouComSearchConfig
        config = YouComSearchConfig()
        assert (
            config.get_complete_url(
                api_base="https://x.example/v1/search/", optional_params={}
            )
            == "https://x.example/v1/search"
        )
        assert (
            config.get_complete_url(api_base="https://x.example/", optional_params={})
            == "https://x.example/v1/search"
        )
        # The autouse fixture sets YOUCOM_API_KEY, so the default base is the keyed endpoint.
        assert (
            config.get_complete_url(api_base=None, optional_params={})
            == "https://ydc-index.io/v1/search"
        )
    @pytest.mark.asyncio
    async def test_you_com_search_keyless_free_tier(self, monkeypatch):
        """
        Without YOUCOM_API_KEY, the adapter targets the keyless free-tier
        endpoint and sends no X-API-Key header.
        """
        monkeypatch.delenv("YOUCOM_API_KEY", raising=False)
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "results": {
                "web": [
                    {
                        "title": "Keyless Result",
                        "url": "https://example.com/keyless",
                        "snippets": ["snippet from keyless tier"],
                        "description": "desc",
                        "page_age": "2025-03-01T00:00:00Z",
                    }
                ],
                "news": [],
            },
            "metadata": {},
        }
        with patch(
            "litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
            new_callable=AsyncMock,
        ) as mock_post:
            mock_post.return_value = mock_response
            response = await litellm.asearch(
                query="hello world",
                search_provider="you_com",
            )
            call_args = mock_post.call_args
            assert call_args.kwargs["url"] == "https://api.you.com/v1/agents/search"
            headers = call_args.kwargs.get("headers", {})
            assert "X-API-Key" not in headers
            assert headers["Content-Type"] == "application/json"
            assert len(response.results) == 1
            assert response.results[0].title == "Keyless Result"
    @pytest.mark.asyncio
    async def test_you_com_search_programmatic_api_key_selects_keyed_endpoint(
        self, monkeypatch
    ):
        """
        When the key is passed programmatically (no YOUCOM_API_KEY in the env),
        the keyed endpoint must be selected and the X-API-Key header sent, instead
        of silently falling back to the keyless free tier.
        """
        monkeypatch.delenv("YOUCOM_API_KEY", raising=False)
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "results": {"web": [], "news": []},
            "metadata": {},
        }
        with patch(
            "litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
            new_callable=AsyncMock,
        ) as mock_post:
            mock_post.return_value = mock_response
            await litellm.asearch(
                query="anything",
                search_provider="you_com",
                api_key="my-programmatic-key",
            )
            call_args = mock_post.call_args
            assert call_args.kwargs["url"] == "https://ydc-index.io/v1/search"
            headers = call_args.kwargs.get("headers", {})
            assert headers["X-API-Key"] == "my-programmatic-key"
    def test_you_com_search_complete_url_uses_programmatic_api_key(self, monkeypatch):
        """
        get_complete_url selects the keyed endpoint from a forwarded api_key even
        when YOUCOM_API_KEY is absent from the environment.
        """
        monkeypatch.delenv("YOUCOM_API_KEY", raising=False)
        from litellm.llms.you_com.search.transformation import YouComSearchConfig
        config = YouComSearchConfig()
        assert (
            config.get_complete_url(
                api_base=None, optional_params={}, api_key="my-programmatic-key"
            )
            == "https://ydc-index.io/v1/search"
        )
        assert (
            config.get_complete_url(api_base=None, optional_params={}, api_key=None)
            == "https://api.you.com/v1/agents/search"
        )
    def test_you_com_search_validate_environment_keyless(self, monkeypatch):
        """
        validate_environment must NOT raise when no key is configured —
        the keyless free tier is the default behavior.
        """
        monkeypatch.delenv("YOUCOM_API_KEY", raising=False)
        from litellm.llms.you_com.search.transformation import YouComSearchConfig
        config = YouComSearchConfig()
        headers = config.validate_environment(headers={}, api_key=None)
        assert "X-API-Key" not in headers
        assert headers["Content-Type"] == "application/json"
    def test_you_com_search_pins_identity_accept_encoding(self, monkeypatch):
        """
        The adapter pins Accept-Encoding: identity to work around the keyless
        endpoint advertising gzip content-encoding while returning bytes httpx
        can't decode. Without this, every keyless request raises DecodingError.
        """
        monkeypatch.delenv("YOUCOM_API_KEY", raising=False)
        from litellm.llms.you_com.search.transformation import YouComSearchConfig
        config = YouComSearchConfig()
        headers = config.validate_environment(headers={}, api_key=None)
        assert headers["Accept-Encoding"] == "identity"
        # setdefault: a caller-supplied Accept-Encoding should win
        headers = config.validate_environment(
            headers={"Accept-Encoding": "gzip"}, api_key=None
        )
        assert headers["Accept-Encoding"] == "gzip"
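Note on the You.com tests above: taken together, they pin down three behaviors: endpoint selection by key presence, trailing-slash normalization of a custom `api_base`, and header construction with an overridable `Accept-Encoding: identity`. The sketch below restates those rules as plain functions. The URLs and header names come from the assertions; `resolve_search_url` and `build_headers` are hypothetical names, and the real `YouComSearchConfig` may be structured differently (for example, it also reads `YOUCOM_API_KEY` from the environment, which is left out here).

```python
from typing import Dict, Optional

KEYED_URL = "https://ydc-index.io/v1/search"
KEYLESS_URL = "https://api.you.com/v1/agents/search"
SEARCH_PATH = "/v1/search"


def resolve_search_url(api_base: Optional[str], api_key: Optional[str]) -> str:
    """Pick the request URL (hypothetical helper inferred from the tests)."""
    if api_base is None:
        # No custom base: a key selects the keyed endpoint, else the free tier.
        return KEYED_URL if api_key else KEYLESS_URL
    base = api_base.rstrip("/")
    # Avoid doubling the path when the caller already included it.
    return base if base.endswith(SEARCH_PATH) else base + SEARCH_PATH


def build_headers(headers: Dict[str, str], api_key: Optional[str]) -> Dict[str, str]:
    """Build request headers without mutating the caller's dict."""
    out = dict(headers)
    out["Content-Type"] = "application/json"
    # setdefault lets a caller-supplied Accept-Encoding win.
    out.setdefault("Accept-Encoding", "identity")
    if api_key:
        out["X-API-Key"] = api_key
    return out


print(resolve_search_url("https://x.example/v1/search/", None))
# https://x.example/v1/search
print(resolve_search_url(None, None))
# https://api.you.com/v1/agents/search
```

Passing the key in explicitly, rather than reading the environment inside the helper, is what makes the programmatic-key case testable without touching `os.environ`.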
--- a/tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py
+++ b/tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py
@@ -1,4 +1,5 @@
 import importlib
 import asyncio
 import json
 import logging
 import os
@@ -501,7 +502,12 @@ class TestMCPServerManager:
        captured_extra_headers = None
        async def capture_create_mcp_client(
-            server, mcp_auth_header, extra_headers, stdio_env, subject_token=None, **kwargs
+            server,
+            mcp_auth_header,
+            extra_headers,
+            stdio_env,
+            subject_token=None,
+            **kwargs,
        ):  # pragma: no cover - helper
            nonlocal captured_extra_headers
            captured_extra_headers = extra_headers
@@ -554,7 +560,12 @@ class TestMCPServerManager:
        captured_extra_headers = None
        async def capture_create_mcp_client(
-            server, mcp_auth_header, extra_headers, stdio_env, subject_token=None, **kwargs
+            server,
+            mcp_auth_header,
+            extra_headers,
+            stdio_env,
+            subject_token=None,
+            **kwargs,
        ):  # pragma: no cover - helper
            nonlocal captured_extra_headers
            captured_extra_headers = extra_headers
@@ -610,7 +621,12 @@ class TestMCPServerManager:
        captured_extra_headers = None
        async def capture_create_mcp_client(
-            server, mcp_auth_header, extra_headers, stdio_env, subject_token=None, **kwargs
+            server,
+            mcp_auth_header,
+            extra_headers,
+            stdio_env,
+            subject_token=None,
+            **kwargs,
        ):  # pragma: no cover - helper
            nonlocal captured_extra_headers
            captured_extra_headers = extra_headers
@@ -3066,6 +3082,121 @@ class TestMCPServerTimestamps:
        rebuilt_table = manager._build_mcp_server_table(mcp_server)
        assert rebuilt_table.source_url == "https://github.com/org/mcp-server"
    @pytest.mark.asyncio
    async def test_round_trip_timeout_preserved(self):
        """timeout survives the full round-trip: LiteLLM_MCPServerTable -> MCPServer -> LiteLLM_MCPServerTable."""
        manager = MCPServerManager()
        table_record = LiteLLM_MCPServerTable(
            server_id="timeout-server",
            server_name="timeout_server",
            url="https://example.com/mcp",
            transport=MCPTransport.http,
            timeout=120.0,
        )
        mcp_server = await manager.build_mcp_server_from_table(table_record)
        assert mcp_server.timeout == 120.0
        rebuilt_table = manager._build_mcp_server_table(mcp_server)
        assert rebuilt_table.timeout == 120.0
    @pytest.mark.asyncio
    async def test_create_mcp_client_uses_server_timeout(self):
        """_create_mcp_client must pass server.timeout to MCPClient when set."""
        manager = MCPServerManager()
        server = MCPServer(
            server_id="timeout-client-server",
            name="timeout_client_server",
            url="https://example.com/mcp",
            transport=MCPTransport.http,
            timeout=180.0,
        )
        client = await manager._create_mcp_client(server)
        assert client.timeout == 180.0
    @pytest.mark.asyncio
    async def test_create_mcp_client_falls_back_to_global_timeout(self):
        """_create_mcp_client must fall back to MCP_CLIENT_TIMEOUT when server.timeout is None."""
        from litellm.constants import MCP_CLIENT_TIMEOUT
        manager = MCPServerManager()
        server = MCPServer(
            server_id="default-timeout-server",
            name="default_timeout_server",
            url="https://example.com/mcp",
            transport=MCPTransport.http,
        )
        client = await manager._create_mcp_client(server)
        assert client.timeout == MCP_CLIENT_TIMEOUT
    @pytest.mark.asyncio
    async def test_create_mcp_client_zero_timeout_not_treated_as_falsy(self):
        """server.timeout=0.0 must be passed through, not fall back to MCP_CLIENT_TIMEOUT."""
        manager = MCPServerManager()
        server = MCPServer(
            server_id="zero-timeout-server",
            name="zero_timeout_server",
            url="https://example.com/mcp",
            transport=MCPTransport.http,
            timeout=0.0,
        )
        client = await manager._create_mcp_client(server)
        assert client.timeout == 0.0
    @pytest.mark.asyncio
    async def test_load_servers_from_config_preserves_timeout(self):
        """timeout from proxy config is loaded into MCPServer."""
        manager = MCPServerManager()
        config = {
            "my_server": {
                "url": "https://example.com/mcp",
                "transport": MCPTransport.http,
                "timeout": 90.0,
            }
        }
        await manager.load_servers_from_config(config)
        servers = list(manager.config_mcp_servers.values())
        assert len(servers) == 1
        assert servers[0].timeout == 90.0
    @pytest.mark.asyncio
    async def test_call_regular_mcp_tool_timeout_returns_504(self):
        """When the MCP client call is cancelled (timeout), _call_regular_mcp_tool raises HTTPException 504."""
        from unittest.mock import AsyncMock, patch
        manager = MCPServerManager()
        async def _slow_call(*args, **kwargs):
            await asyncio.sleep(999)
        mock_client = AsyncMock()
        mock_client.call_tool = _slow_call
        server = MCPServer(
            server_id="timeout-tool-server",
            name="timeout_tool_server",
            url="https://example.com/mcp",
            transport=MCPTransport.http,
            timeout=0.01,
        )
        with patch.object(manager, "_create_mcp_client", return_value=mock_client):
            with pytest.raises(HTTPException) as exc_info:
                await manager._call_regular_mcp_tool(
                    mcp_server=server,
                    original_tool_name="some_tool",
                    arguments={},
                    tasks=[],
                    mcp_auth_header=None,
                    mcp_server_auth_headers=None,
                    oauth2_headers=None,
                    raw_headers=None,
                    proxy_logging_obj=None,
                )
        assert exc_info.value.status_code == 504
        assert exc_info.value.detail["error"] == "timeout"
        assert "0.01s" in exc_info.value.detail["message"]
 class TestInternalDelegatePkceWarningLog:
    @pytest.mark.asyncio
--- a/ui/litellm-dashboard/public/assets/logos/soniox.svg
+++ b/ui/litellm-dashboard/public/assets/logos/soniox.svg
@@ -0,0 +1 @@
 <svg viewBox="0 0 100 17.5" width="92" fill="white" xmlns="http://www.w3.org/2000/svg"><title>Soniox</title><path d="m0 14.866 2.1606-3.5214c1.8927 1.2576 3.9669 1.8995 5.6694 1.8995 1.0025 0 1.4606-0.3036 1.4606-0.8847v-0.0607c0-0.6419-0.9161-0.9194-2.6532-1.4138-3.2582-0.8587-5.8509-1.9602-5.8509-5.2995v-0.06938c0-3.5214 2.8088-5.4903 6.6114-5.4903 2.4112 0 4.9089 0.70255 6.8016 1.9342l-1.9791 3.6775c-1.7112-0.95408-3.5693-1.5352-4.8744-1.5352-0.88152 0-1.3396 0.33827-1.3396 0.79796v0.06071c0 0.64184 0.94202 0.95409 2.6792 1.4745 3.2582 0.91939 5.8509 2.0556 5.8509 5.2735v0.0607c0 3.6515-2.7137 5.551-6.741 5.551-2.7656-0.0087-5.5052-0.798-7.7955-2.4546z"></path><path d="m16.135 8.7342v-0.06071c0-4.7184 3.8372-8.6735 9.1436-8.6735 5.2719 0 9.0832 3.8944 9.0832 8.6127v0.06072c0 4.7184-3.8372 8.6735-9.1437 8.6735-5.2718 0-9.0831-3.8944-9.0831-8.6128zm12.583 0v-0.06071c0-2.0209-1.4606-3.7383-3.5088-3.7383-2.1001 0-3.4483 1.6826-3.4483 3.6775v0.06072c0 2.0209 1.4605 3.7383 3.5088 3.7383 2.1087 0 3.4483-1.6827 3.4483-3.6776z"></path><path d="m36.877 0.36428h5.7904v2.3332c1.063-1.3791 2.5927-2.6974 4.9348-2.6974 3.5089 0 5.609 2.3332 5.609 6.0974v10.85h-5.7905v-8.977c0-1.8041-0.942-2.7929-2.3161-2.7929-1.4001 0-2.4372 0.9801-2.4372 2.7929v8.977h-5.7904z"></path><path d="m55.951 0.36426h5.7904v16.584h-5.7904z"></path><path d="m64.29 8.7342v-0.06071c0-4.7184 3.8373-8.6735 9.1437-8.6735 5.2719 0 9.0832 3.8944 9.0832 8.6127v0.06072c0 4.7184-3.8372 8.6735-9.1437 8.6735-5.2719 0-9.0832-3.8944-9.0832-8.6128zm12.592 0v-0.06071c0-2.0209-1.4605-3.7383-3.5088-3.7383-2.1001 0-3.4483 1.6826-3.4483 3.6775v0.06072c0 2.0209 1.4606 3.7383 3.5088 3.7383 2.1088 0 3.4483-1.6827 3.4483-3.6776z"></path><path d="m88.082 8.578-5.4533-8.2138h6.2484l2.4372 4.0765 2.4371-4.0765h6.1275l-5.4274 8.1791 5.5484 8.3959h-6.2225l-2.5582-4.2587-2.5927 4.2587h-6.0929z"></path></svg>
--- a/ui/litellm-dashboard/src/components/provider_info_helpers.tsx
+++ b/ui/litellm-dashboard/src/components/provider_info_helpers.tsx
@@ -87,6 +87,7 @@ export enum Providers {
  Sambanova = "Sambanova",
  SAP = "SAP Generative AI Hub",
  Snowflake = "Snowflake",
  Soniox = "Soniox",
  TEXT_COMPLETION_CODESTRAL = "Text-Completion-Codestral",
  TogetherAI = "TogetherAI",
  TOPAZ = "Topaz",
@@ -195,6 +196,7 @@ export const provider_map: Record<string, string> = {
  Sambanova: "sambanova",
  SAP: "sap",
  Snowflake: "snowflake",
  Soniox: "soniox",
  TEXT_COMPLETION_CODESTRAL: "text-completion-codestral",
  TogetherAI: "together_ai",
  TOPAZ: "topaz",
@@ -286,6 +288,7 @@ export const providerLogoMap: Record<string, string> = {
  [Providers.Sambanova]: `${asset_logos_folder}sambanova.svg`,
  [Providers.SAP]: `${asset_logos_folder}sap.png`,
  [Providers.Snowflake]: `${asset_logos_folder}snowflake.svg`,
  [Providers.Soniox]: `${asset_logos_folder}soniox.svg`,
  [Providers.TEXT_COMPLETION_CODESTRAL]: `${asset_logos_folder}mistral.svg`,
  [Providers.TogetherAI]: `${asset_logos_folder}togetherai.svg`,
  [Providers.TOPAZ]: `${asset_logos_folder}topaz.svg`,
@@ -0,0 +1,3 @@
-- AlterTable
ALTER TABLE "LiteLLM_MCPServerTable" ADD COLUMN "timeout" DOUBLE PRECISION;
@@ -0,0 +1 @@
"""Soniox audio transcription implementation."""