Litellm oss staging 050626 (#29774)

* Mark xAI models retiring on 2026-05-15 (#28788)

Per https://docs.x.ai/developers/migration/may-15-retirement, xAI is
retiring the following slugs on 2026-05-15 (auto-redirect to grok-4.3
with various reasoning efforts; callers continuing to use the old slugs
will be billed at grok-4.3 pricing):

  grok-4-1-fast-reasoning{,-latest}      -> grok-4.3 (low effort)
  grok-4-1-fast-non-reasoning{,-latest}  -> grok-4.3 (none)
  grok-4-fast-reasoning                  -> grok-4.3 (low effort)
  grok-4-fast-non-reasoning              -> grok-4.3 (none)
  grok-4-0709                            -> grok-4.3 (low effort)
  grok-code-fast-1{,-0825}               -> grok-build-0.1
  grok-3                                 -> grok-4.3 (none)

Only the direct xai/ slugs are tagged; third-party hosts (azure_ai,
oci, vercel_ai_gateway, perplexity/xai) run their own schedules. The
grok-3 retirement list explicitly names only the base grok-3 slug — the
-mini / -fast / -beta / -latest variants are not listed, so they remain
untouched.

* feat(moonshot): advertise json_schema response support on live models (#29683)

litellm.responses() already routes Moonshot through the responses->chat-completions
bridge, and Moonshot honors response_format json_schema on chat completions. The
cost-map entries left supports_response_schema unset, so discovery layers that gate
on that flag dropped Moonshot from structured-output / responses listings even though
the capability works end to end.

Set supports_response_schema on the nine models currently live on api.moonshot.ai:
kimi-k2.5, kimi-k2.6, the moonshot-v1 8k/32k/128k text and vision-preview variants,
and moonshot-v1-auto. Verified against the live API that each honors json_schema and
that litellm.responses() returns schema-valid structured output through the bridge.

* chore(moonshot): mark models retired from api.moonshot.ai as deprecated (#29685)

Thirteen Moonshot/Kimi models in the cost map no longer resolve on
api.moonshot.ai (all return 404). Stamp each with its deprecation_date from
platform.kimi.ai/docs/models rather than deleting the entries, so historical
cost calculation keeps resolving the names while tooling can surface the
retirement.

Dates: kimi-thinking-preview 2025-11-11; kimi-latest and its 8k/32k/128k context
variants 2026-01-28; the kimi-k2 preview/turbo/thinking series 2026-05-25; the
moonshot-v1 -0430 snapshots use their own 2024-04-30 snapshot date (Moonshot
publishes no discontinuation date for them).

* fix(moonshot): drop temperature for reasoning models (kimi-k2.5/k2.6) (#29687)

Kimi reasoning models reject every temperature except 1; a request with
temperature=0.2 returns "invalid temperature: only 1 is allowed for this model".
litellm only clamped temperature into [0.3, 1], so any value below 1 still returned HTTP 400.

Drop the temperature param entirely for reasoning models (gated on
supports_reasoning, the same signal transform_request already uses) so the model
default is used; the non-reasoning moonshot-v1 models keep the existing clamp.
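
As a rough illustration of the behavior described above, the following standalone sketch mirrors the temperature handling from the Moonshot hunk in this commit. The function name `adjust_moonshot_temperature` and the boolean `is_reasoning_model` argument are assumptions made for the example; the actual code gates on `supports_reasoning(...)` and mutates `optional_params` in place.

```python
from typing import Any, Dict


def adjust_moonshot_temperature(
    optional_params: Dict[str, Any], is_reasoning_model: bool
) -> Dict[str, Any]:
    """Return a copy of optional_params with Moonshot's temperature rules applied.

    Hypothetical helper for illustration only; not part of LiteLLM.
    """
    params = dict(optional_params)
    if is_reasoning_model:
        # Reasoning models accept only temperature=1, so omit the param
        # and let the API use the model default.
        params.pop("temperature", None)
    elif "temperature" in params:
        # Non-reasoning models: KIMI accepts [0, 1] while OpenAI allows [0, 2].
        if params["temperature"] > 1:
            params["temperature"] = 1
        # KIMI rejects temperature < 0.3 when n > 1, so raise it to 0.3.
        if params["temperature"] < 0.3 and params.get("n", 1) > 1:
            params["temperature"] = 0.3
    return params
```

With this shape, a reasoning-model request carrying `temperature=0.2` goes out with no temperature at all, while a moonshot-v1 request keeps the existing clamp.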

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(mcp): add per-server timeout configuration (#29672)

* feat(mcp): add per-server timeout configuration

* fix(mcp): address timeout field review comments

- use an `is not None` guard instead of `or` so a 0.0 timeout is not treated as unset
- copy timeout in both LiteLLM_MCPServerTable constructions (health check path + _build_mcp_server_table)
- add timeout Float? column to all three schema.prisma files
- extend round-trip test to cover _build_mcp_server_table direction
- add test for zero timeout not treated as falsy
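
The 0.0 edge case from the review comments can be reproduced in isolation. This is a minimal sketch, assuming a hypothetical `DEFAULT_TIMEOUT` fallback and helper names that do not exist in LiteLLM; it only contrasts the two guard styles.

```python
from typing import Optional

DEFAULT_TIMEOUT = 30.0  # hypothetical fallback value, chosen only for the example


def resolve_timeout_with_or(timeout: Optional[float]) -> float:
    # Problematic: 0.0 is falsy, so an explicit zero is replaced by the default.
    return timeout or DEFAULT_TIMEOUT


def resolve_timeout(timeout: Optional[float]) -> float:
    # Fall back only when no value was configured at all.
    return timeout if timeout is not None else DEFAULT_TIMEOUT
```

Both helpers agree for `None` and for positive values; they differ only for `0.0`, which is the case the new test pins.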

* fix(mcp): forward timeout in _build_temporary_mcp_server_record

* fix(mcp): return 504 instead of 500 when per-server timeout fires

* test(mcp): add 504 timeout regression test; fix black formatting

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7 (#28567)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.
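
The double prefix-strip can be illustrated with a toy resolver. This is a sketch under stated assumptions: `resolve_provider_sketch` is a hypothetical stand-in that strips one `provider/` prefix unless a provider is passed explicitly; it is not the real `get_llm_provider()` and does not share its signature or return type.

```python
from typing import Optional, Tuple


def resolve_provider_sketch(
    model: str, custom_llm_provider: Optional[str] = None
) -> Tuple[str, str]:
    """Toy provider resolution: strip one 'provider/' prefix unless the
    provider is already known. Hypothetical, for illustration only."""
    if custom_llm_provider is not None:
        # Provider already resolved upstream: leave the model string alone.
        return model, custom_llm_provider
    provider, _, rest = model.partition("/")
    return rest, provider


deployment = "openai/openai/openai/gpt-5.5"

# First resolution, as done before the bridge is entered.
model, provider = resolve_provider_sketch(deployment)

# Before the fix: the bridge re-resolves without the provider and strips again.
buggy_model, _ = resolve_provider_sketch(model)

# After the fix: the resolved provider is forwarded, so nothing more is stripped.
fixed_model, _ = resolve_provider_sketch(model, custom_llm_provider=provider)
```

Under these assumptions `buggy_model` ends up as `openai/gpt-5.5` and `fixed_model` stays `openai/openai/gpt-5.5`, matching the deployment example above.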

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.
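
The failure mode Greptile flagged is plain Python call semantics and can be shown without LiteLLM. In this sketch, `fake_responses` and the `"stale"` sentinel are hypothetical; only the duplicate-keyword behavior is the point.

```python
from typing import Any, Optional


def fake_responses(**kwargs: Any) -> Optional[str]:
    """Hypothetical stand-in for the downstream call; returns the provider it saw."""
    return kwargs.get("custom_llm_provider")


# request_data already carries the key, as if spread in from other params.
request_data = {"model": "gpt-5.5", "custom_llm_provider": "stale"}

# Passing the same name explicitly and via ** raises at call time.
try:
    fake_responses(custom_llm_provider="openai", **request_data)
    raised = False
except TypeError:
    raised = True

# Assigning on the dict first lets the resolved provider win without a duplicate kwarg.
request_data["custom_llm_provider"] = "openai"
result = fake_responses(**request_data)
```

The overwrite-then-spread form is what the updated regression test exercises by seeding `request_data` with a sentinel value.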

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7

AWS Bedrock documents jp.anthropic.claude-opus-4-7 alongside the
existing us./eu./au./global. profiles for Claude Opus 4.7
(ap-northeast-1 Tokyo / ap-northeast-3 Osaka), but the entry is
missing from model_prices_and_context_window.json. Tokyo-region
users currently get an "unknown model" error when routing through
the JP geo profile.

Adds the entry to both the canonical file and the bundled backup,
mirroring the recent pattern for sonnet-4-6 (#27831). Pricing matches
the other regional profiles (10% premium over base/global).

Regression test pins all six documented profiles (base, global, us, eu,
au, jp) and asserts pricing parity between jp. and au. variants.

Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-7.html

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(soniox): add soniox audio transcription integration (#29508)

* feat(openmeter): add OPENMETER_TRUST_REQUEST_USER to prevent forged attribution (#29650)

The OpenMeter callback resolves the CloudEvent subject from kwargs["user"]
first, then falls back to the key-bound user_api_key_user_id. For
multi-tenant proxy deployments, a client can set `"user": "..."` in the
request body and cause their usage to be attributed to that arbitrary
string — a billing-attribution forgery risk.

Adds OPENMETER_TRUST_REQUEST_USER env var (default "true" for backward
compatibility). When set to "false", the request-supplied `user` field is
ignored and the subject is resolved solely from user_api_key_user_id.

Matches the existing env-var-driven config pattern in this file
(OPENMETER_API_KEY, OPENMETER_API_ENDPOINT, OPENMETER_EVENT_TYPE).
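
A minimal sketch of the subject resolution, assuming a hypothetical standalone helper `resolve_subject` (the real logic lives inline in the OpenMeter logger and reads the key-bound user id from request metadata rather than from an argument):

```python
import os
from typing import Any, Dict, Optional


def resolve_subject(kwargs: Dict[str, Any], key_user_id: Optional[str]) -> Optional[str]:
    """Pick the CloudEvent subject. Hypothetical helper for illustration."""
    # Any value other than "false" (case-insensitive) keeps the legacy behavior.
    trust_request_user = (
        os.getenv("OPENMETER_TRUST_REQUEST_USER", "true").lower() != "false"
    )
    user = kwargs.get("user") if trust_request_user else None
    # Fall back to the id bound to the API key when no trusted user is present.
    if user is None:
        user = key_user_id
    return user
```

With the variable unset, a request body containing `"user": "someone-else"` is attributed to that string; with it set to `false`, the same request is attributed to the key-bound id.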

* feat(search): add you_com as a search provider (#28370)

* feat(search): add you_com as a search provider

Registers You.com Search API as a first-class `search_provider` in the
`search_tools` registry, alongside Tavily, Exa, Perplexity, etc.

- New adapter: litellm/llms/you_com/search/transformation.py
  - POSTs to https://ydc-index.io/v1/search
  - Auth: X-API-Key from YOUCOM_API_KEY (or explicit api_key)
  - Maps Perplexity unified spec: max_results -> count,
    search_domain_filter -> include_domains, country -> country
  - Flattens results.web + results.news into a single SearchResult list;
    snippet prefers snippets[0], falls back to description; page_age -> date
- Registry: SearchProviders.YOU_COM in litellm/types/utils.py and wired
  into ProviderConfigManager.get_provider_search_config()
- Pricing entry: model_prices_and_context_window.json (placeholder $0.0;
  happy to adjust to maintainers' preferred public number)
- Docs: example router config snippet and example proxy yaml updated
- Tests: tests/search_tests/test_you_com_search.py - 5 mocked tests
  (payload shape, domain filter mapping, snippet fallback, news flattening,
  missing-api-key error)
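
The response flattening described in the adapter bullet can be sketched with plain dictionaries. This is an illustration only: `flatten_you_com_results` is a hypothetical name, and the output uses dicts rather than the adapter's `SearchResult` type.

```python
from typing import Any, Dict, List


def flatten_you_com_results(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Merge results.web and results.news into one list (illustrative sketch)."""
    results = payload.get("results") or {}
    flattened: List[Dict[str, Any]] = []
    for section in ("web", "news"):
        for item in results.get(section) or []:
            snippets = item.get("snippets") or []
            flattened.append(
                {
                    "title": item.get("title"),
                    "url": item.get("url"),
                    # Prefer the first snippet, fall back to the description.
                    "snippet": snippets[0] if snippets else item.get("description"),
                    # page_age is surfaced as the result date.
                    "date": item.get("page_age"),
                }
            )
    return flattened
```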

Refs upstream expansion signal: #15942

* review fixups: normalize api_base, lowercase country, scope env-var to test

Addresses Greptile inline review comments on #28370:

- get_complete_url: strip trailing slashes from api_base *before* the
  endswith("/v1/search") check, so a custom base like ".../v1/search/"
  doesn't become ".../v1/search/v1/search".
- transform_search_request: .lower() country before sending, matching
  Tavily's convention so callers using the unified spec form ("US") get
  consistent behavior across providers.
- Tests: replace direct os.environ writes with an autouse monkeypatch
  fixture so YOUCOM_API_KEY is set per-test and removed afterwards.
  The missing-key test now uses monkeypatch.delenv. New test asserts the
  trailing-slash normalization above.
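
The trailing-slash fix amounts to normalizing before the suffix check. A minimal sketch, assuming a hypothetical free function `complete_search_url` rather than the adapter's actual `get_complete_url` method:

```python
def complete_search_url(api_base: str) -> str:
    """Append /v1/search unless the base already ends with it (illustrative)."""
    # Strip trailing slashes first so ".../v1/search/" is recognized as complete.
    base = api_base.rstrip("/")
    if base.endswith("/v1/search"):
        return base
    return f"{base}/v1/search"
```

Checking the suffix before stripping would let `https://ydc-index.io/v1/search/` fall through to the append branch, which is the duplication the review comment describes.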

Reverts the ARCHITECTURE.md / example yaml edits per the reviewer note
that documentation changes belong in the litellm-docs repo.

* support keyless free tier (api.you.com/v1/agents/search) as default

You.com offers an IP-throttled keyless endpoint that returns the same
response shape as the keyed one (~100 queries/day, no signup). This is a
significant onboarding lever - mirrors the keyless DuckDuckGo/SearXNG
providers already in the search_tools registry.

Behavior:
- YOUCOM_API_KEY set        -> keyed:  POST https://ydc-index.io/v1/search
                                       (X-API-Key header)
- no key                    -> free:   POST https://api.you.com/v1/agents/search
                                       (no auth)
- YOUCOM_API_BASE override  -> honored as-is
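
The three cases above reduce to a small selection function. This sketch assumes a hypothetical helper `select_you_com_endpoint` and takes the key and base as arguments instead of reading the environment:

```python
from typing import Optional

KEYED_URL = "https://ydc-index.io/v1/search"
KEYLESS_URL = "https://api.you.com/v1/agents/search"


def select_you_com_endpoint(
    api_key: Optional[str], api_base: Optional[str] = None
) -> str:
    """Choose the search endpoint (illustrative sketch, not the adapter's code)."""
    if api_base:
        # An explicit base override is honored as-is.
        return api_base
    # A key selects the keyed endpoint; no key selects the keyless free tier.
    return KEYED_URL if api_key else KEYLESS_URL
```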

Tests:
- New: test_you_com_search_keyless_free_tier - asserts URL + absence of
  X-API-Key when no key is configured.
- New: test_you_com_search_validate_environment_keyless - asserts the
  config no longer raises when the key is absent.
- Removed: test_you_com_search_raises_without_api_key (the precondition
  no longer holds).
- Existing payload/domain-filter/etc tests still cover keyed mode via
  the autouse YOUCOM_API_KEY fixture.

Verified both endpoints accept POST + return identical JSON shape:
  results.web[] / results.news[] with title, url, snippets, description,
  page_age.

* register you_com in provider_endpoints_support.json

Adding `litellm/llms/you_com/` requires a corresponding entry in
provider_endpoints_support.json or the
code-quality/check_provider_folders_documented CI check fails.

Follows the compact tavily/serper pattern - endpoints: { search: true }.
Local run of the check now reports "All 114 provider folders are documented".

* move tests under tests/test_litellm/llms/ so CI exercises them

The litellm CI workflows scope unit tests to `tests/test_litellm/...`
(see test-unit-llm-providers.yml: `tests/test_litellm/llms` path), so
tests living under `tests/search_tests/` are never run in CI - which is
why codecov reports 0% patch coverage for the new adapter even though
the unit tests exist and pass locally.

Move test_you_com_search.py into `tests/test_litellm/llms/you_com/` so
the test-unit-llm-providers job picks it up. 7/7 tests still pass at
the new location.

(Sibling search-only providers - tavily, exa_ai, brave, etc. - still
live only in `tests/search_tests/` and would benefit from the same
move, but that is out of scope for this PR.)

* fix(you_com): pin Accept-Encoding: identity to dodge keyless gzip bug

The keyless free-tier endpoint (api.you.com/v1/agents/search) advertises
Content-Encoding: gzip but returns a body that httpx's decoder rejects
with `zlib.error: Error -3 while decompressing data: incorrect header
check`, surfacing as litellm.APIConnectionError in user code. curl works
because it doesn't request compression by default.

Pin Accept-Encoding: identity in validate_environment so the upstream
server skips compression entirely. Harmless on the keyed endpoint
(ydc-index.io/v1/search) which negotiates content-encoding correctly.

The header uses setdefault so a caller-supplied Accept-Encoding still
takes precedence. (Server-side bug has been flagged to the You.com team
separately - once fixed there, this workaround can be removed.)
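
The `setdefault` behavior can be sketched as follows. `build_search_headers` is a hypothetical helper; note that a plain dict lookup is case-sensitive, so this example assumes callers spell the header as `Accept-Encoding`.

```python
from typing import Dict, Optional


def build_search_headers(caller_headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Pin Accept-Encoding: identity unless the caller already set it (sketch)."""
    headers = dict(caller_headers or {})
    # setdefault only writes when the key is absent, so a caller value wins.
    headers.setdefault("Accept-Encoding", "identity")
    return headers
```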

New unit test: test_you_com_search_pins_identity_accept_encoding.

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* docs: fix README typo (#29419)

Correct clear spelling mistakes in documentation without changing behavior.

Confidence: high
Scope-risk: narrow
Tested: git diff --check; uvx codespell on changed files
Not-tested: Full docs build not run; text-only changes

* Fix(langfuse): pass httpx_client to Langfuse in langfuse_prompt_management to respect SSL_VERIFY (#29480)

* fix(langfuse): pass ssl_verify to Langfuse httpx client

* fix_langfuse_

* add unit tests

* addressed comments

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(models): add minimax/MiniMax-M3 to model cost map (#29412)

Add MiniMax's new flagship MiniMax-M3 to the native minimax provider:
512K context, 128K max output, native multimodal (supports_vision),
reasoning, prompt caching. Pricing (USD/M tokens): input 0.6 / output
2.4 / cache read 0.12. M3 has no active prompt-cache-write tier, so
cache_creation_input_token_cost is omitted.

Updated both the root model_prices_and_context_window.json (remote
source) and the bundled litellm/model_prices_and_context_window_backup.json
(local fallback), keeping them in sync.

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log (#29394)

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log

* fix(logging): extend terminal event handling to ResponseIncompleteEvent and ResponseFailedEvent; fix return type annotation

* feat(provider): Add Neosantara provider as OpenAI Compatible (#29646)

* Add Neosantara provider

* Register Neosantara provider enum

* Address Neosantara provider review feedback

* Add Neosantara packaged endpoint support

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix: address greptile and veria review feedback

- langfuse: guard httpx_client injection behind version check (>= 2.7.3)
- soniox: propagate audio_transcription_duration in _hidden_params for spend tracking
- soniox: give SONIOX_API_BASE env var priority over caller-supplied api_base
- mcp: replace CancelledError catch with asyncio.wait_for + TimeoutError
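
For the MCP item, the general pattern of bounding a call with `asyncio.wait_for` and translating the timeout into a 504 might look like the sketch below. The helper name and the `(status, result)` tuple are assumptions made for the example; the proxy's real handler raises an HTTP error rather than returning a tuple.

```python
import asyncio
from typing import Any, Awaitable, Optional, Tuple


async def call_with_timeout(
    awaitable: Awaitable[Any], timeout: float
) -> Tuple[int, Optional[Any]]:
    """Run an awaitable under a deadline; map a timeout to 504 (illustrative)."""
    try:
        result = await asyncio.wait_for(awaitable, timeout=timeout)
        return 200, result
    except asyncio.TimeoutError:
        # The upstream tool call did not finish in time: gateway timeout.
        return 504, None
```

Catching the timeout explicitly keeps a slow tool call distinguishable from a genuine task cancellation.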

* chore(mcp): add migration for per-server timeout column

* fix(test): add tool_use_system_prompt_tokens to model prices schema validator

* fix: mcp timeout test uses real asyncio.wait_for timeout; you_com get_complete_url respects resolved api_key

* fix: forward resolved api_key into you_com endpoint selection and apply timeout to soniox polling GETs

The search flow resolves api_key in validate_environment but never passed it
into get_complete_url, so a programmatic api_key (with no YOUCOM_API_KEY in the
env) set the X-API-Key header yet still selected the keyless free-tier endpoint.
Forward api_key through both the search entrypoint and the http handler so the
keyed endpoint is chosen.

HTTPHandler.get/AsyncHTTPHandler.get had no timeout parameter, so the Soniox
poll and transcript-fetch GETs silently used the client global default instead
of the caller timeout. Add a per-request timeout to get() and forward the
configured timeout from the Soniox handler.

* fix(soniox): price stt-async-v4 per second so transcriptions are billed

The handler stores audio_transcription_duration in _hidden_params, but the
model carried only token cost fields and the response has no token usage, so
the transcription cost path fell through to cost_per_second and returned $0.
An authenticated caller could transcribe Soniox audio without decrementing
their budget. Switch the entry to output_cost_per_second at Soniox's published
$0.10/hour async rate so the stored duration produces a real charge.
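
As a quick check of the arithmetic, $0.10 per hour works out to 0.10 / 3600, roughly 0.0000278 dollars per second. The sketch below uses a hypothetical `transcription_cost` helper; LiteLLM's actual cost calculation reads the per-second rate from the cost map entry.

```python
# $0.10 per hour of audio, expressed per second.
OUTPUT_COST_PER_SECOND = 0.10 / 3600


def transcription_cost(duration_seconds: float) -> float:
    """Cost of a transcription billed per second of audio (illustrative)."""
    return duration_seconds * OUTPUT_COST_PER_SECOND
```

Under this rate a 90-second clip costs about a quarter of a cent, so the stored duration yields a non-zero charge.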

* fix(langfuse): use a dedicated httpx client for the SDK injection

The httpx_client handed to the Langfuse SDK came from _get_httpx_client(),
which returns LiteLLM's globally cached HTTPHandler. If Langfuse closed that
client on teardown it would invalidate the shared client used by every other
LiteLLM HTTP call. Build a dedicated httpx.Client instead, still resolving SSL
verification and client certificate from LiteLLM's configuration.

* fix(soniox): prefer caller-supplied api_base over SONIOX_API_BASE env var

* fix(cohere): support max_completion_tokens on cohere v2 chat (default route) (#29779)

* fix(cohere): support max_completion_tokens on cohere v2 chat

The default cohere_chat route resolves to CohereV2ChatConfig, which did not
list or map max_completion_tokens, so get_optional_params raised
UnsupportedParamsError for the standard OpenAI parameter (the modern
replacement for the deprecated max_tokens). The v1 config already maps it to
cohere's max_tokens; mirror that in v2 and add v2 regression tests.

* fix(cohere): make max_completion_tokens take precedence over max_tokens on v2

When both max_tokens and max_completion_tokens are supplied, prefer
max_completion_tokens explicitly rather than relying on dict iteration order,
and cover both orderings with a regression test.
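
The precedence rule can be sketched outside the config class. `map_cohere_max_tokens` is a hypothetical function that mirrors only the two branches touched by this change:

```python
from typing import Any, Dict


def map_cohere_max_tokens(non_default_params: Dict[str, Any]) -> Dict[str, Any]:
    """Map OpenAI token limits to Cohere's max_tokens (illustrative sketch)."""
    optional_params: Dict[str, Any] = {}
    for param, value in non_default_params.items():
        # max_tokens applies only when max_completion_tokens is absent.
        if param == "max_tokens" and "max_completion_tokens" not in non_default_params:
            optional_params["max_tokens"] = value
        # max_completion_tokens always wins, regardless of dict ordering.
        if param == "max_completion_tokens":
            optional_params["max_tokens"] = value
    return optional_params
```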

---------

Co-authored-by: Daniel Yudelevich <4537920+yudelevi@users.noreply.github.com>
Co-authored-by: hectorc98 <hector.chamorroalvarez@adyen.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Dan Lemon <dan@danlemon.com>
Co-authored-by: Saswat <saswatds@users.noreply.github.com>
Co-authored-by: Brian Sparker <brainsparker@users.noreply.github.com>
Co-authored-by: Zhao73 <156770117+Zhao73@users.noreply.github.com>
Co-authored-by: Urain Ahmad Shah <60431964+urainshah@users.noreply.github.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: kape <168134658+kapelame@users.noreply.github.com>
Co-authored-by: danisalvaa <159898202+danisalvaa@users.noreply.github.com>
Co-authored-by: Just R <remixingmagelang@gmail.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
This commit is contained in:
Sameer Kankute 2026-06-06 02:21:51 +05:30 committed by GitHub
parent 4a5644d51e
commit d671a09c20
GPG Key ID: B5690EEEBB952194
58 changed files with 4615 additions and 63 deletions

@ -407,7 +407,7 @@ Support for more providers. Missing a provider or LLM Platform, raise a [feature
### Run in Developer Mode
#### Services
1. Setup .env file in root
2. Run dependant services `docker-compose up db prometheus`
2. Run dependent services `docker-compose up db prometheus`
#### Backend
1. (In root) create virtual environment `python -m venv .venv`

@ -0,0 +1,3 @@
-- AlterTable
ALTER TABLE "LiteLLM_MCPServerTable" ADD COLUMN "timeout" DOUBLE PRECISION;

@ -331,6 +331,7 @@ model LiteLLM_MCPServerTable {
byok_description String[] @default([])
byok_api_key_help_url String?
source_url String?
timeout Float?
// BYOM submission lifecycle
approval_status String? @default("active")
submitted_by String?

@ -612,6 +612,7 @@ cerebras_models: Set = set()
galadriel_models: Set = set()
nvidia_nim_models: Set = set()
nvidia_riva_models: Set = set()
soniox_models: Set = set()
sambanova_models: Set = set()
sambanova_embedding_models: Set = set()
novita_models: Set = set()
@ -844,6 +845,8 @@ def add_known_models(model_cost_map: Optional[Dict] = None):
nvidia_nim_models.add(key)
elif value.get("litellm_provider") == "nvidia_riva":
nvidia_riva_models.add(key)
elif value.get("litellm_provider") == "soniox":
soniox_models.add(key)
elif value.get("litellm_provider") == "sambanova":
sambanova_models.add(key)
elif value.get("litellm_provider") == "sambanova-embedding-models":
@ -1009,6 +1012,7 @@ model_list = list(
| galadriel_models
| nvidia_nim_models
| nvidia_riva_models
| soniox_models
| sambanova_models
| azure_text_models
| novita_models
@ -1109,6 +1113,7 @@ models_by_provider: dict = {
"galadriel": galadriel_models,
"nvidia_nim": nvidia_nim_models,
"nvidia_riva": nvidia_riva_models,
"soniox": soniox_models,
"sambanova": sambanova_models | sambanova_embedding_models,
"novita": novita_models,
"nebius": nebius_models | nebius_embedding_models,

@ -321,6 +321,7 @@ LLM_CONFIG_NAMES = (
"LemonadeChatConfig",
"SnowflakeEmbeddingConfig",
"AmazonNovaChatConfig",
"SonioxAudioTranscriptionConfig",
)
# Types that support lazy loading via _lazy_import_types
@ -1195,6 +1196,10 @@ _LLM_CONFIGS_IMPORT_MAP = {
".llms.amazon_nova.chat.transformation",
"AmazonNovaChatConfig",
),
"SonioxAudioTranscriptionConfig": (
".llms.soniox.audio_transcription.transformation",
"SonioxAudioTranscriptionConfig",
),
}
# Import map for utils module lazy imports

@ -556,7 +556,9 @@ class MCPClient:
)
return tool_result
except asyncio.CancelledError:
verbose_logger.warning("MCP client tool call was cancelled")
verbose_logger.warning(
f"MCP client tool call timed out after {self.timeout}s for {self.server_url}"
)
raise
except Exception as e:
import traceback

@ -102,6 +102,18 @@ def langfuse_client_init(
if Version(langfuse.version.__version__) >= Version("2.6.0"):
parameters["sdk_integration"] = "litellm"
if Version(langfuse.version.__version__) >= Version("2.7.3"):
import httpx
import litellm
from ...llms.custom_httpx.http_handler import get_ssl_configuration
parameters["httpx_client"] = httpx.Client(
verify=get_ssl_configuration(),
cert=os.getenv("SSL_CERTIFICATE", litellm.ssl_certificate),
)
client = Langfuse(**parameters)
return client

@ -65,7 +65,15 @@ class OpenMeterLogger(CustomLogger):
"total_tokens": response_obj["usage"].get("total_tokens"),
}
user_param = kwargs.get("user", None) # end-user passed in via 'user' param
# OPENMETER_TRUST_REQUEST_USER (default "true"): when set to "false",
# the request-supplied `user` field is ignored and the subject is
# resolved solely from the key-bound user_api_key_user_id. Proxies
# serving multi-tenant traffic enable this to prevent clients from
# forging attribution by setting `user` in the request body.
trust_request_user = (
os.getenv("OPENMETER_TRUST_REQUEST_USER", "true").lower() != "false"
)
user_param = kwargs.get("user", None) if trust_request_user else None
# If no user provided directly, try to get it from token user_id
if user_param is None:

@ -244,6 +244,9 @@ search_tools:
- search_tool_name: "my-tavily-tool"
litellm_params:
search_provider: "tavily"
- search_tool_name: "my-you-com-tool"
litellm_params:
search_provider: "you_com"
```
---

@ -659,6 +659,11 @@ def _get_openai_compatible_provider_info( # noqa: PLR0915
or get_secret_str("NVIDIA_RIVA_API_KEY")
or get_secret_str("NVIDIA_NIM_API_KEY")
)
elif custom_llm_provider == "soniox":
api_base = (
api_base or get_secret_str("SONIOX_API_BASE") or "https://api.soniox.com"
)
dynamic_api_key = api_key or get_secret_str("SONIOX_API_KEY")
elif custom_llm_provider == "cerebras":
api_base = (
api_base or get_secret("CEREBRAS_API_BASE") or "https://api.cerebras.ai/v1"

@ -341,6 +341,11 @@ def get_supported_openai_params( # noqa: PLR0915
return ElevenLabsAudioTranscriptionConfig().get_supported_openai_params(
model=model
)
elif custom_llm_provider == "soniox":
if request_type == "transcription":
return litellm.SonioxAudioTranscriptionConfig().get_supported_openai_params(
model=model
)
elif custom_llm_provider in litellm._custom_providers:
if request_type == "chat_completion":
provider_config = litellm.ProviderConfigManager.get_provider_chat_config(

@ -3503,7 +3503,9 @@ class Logging(LiteLLMLoggingBaseClass):
else:
return None
def _handle_anthropic_messages_response_logging(self, result: Any) -> ModelResponse:
def _handle_anthropic_messages_response_logging(
self, result: Any
) -> Union[ModelResponse, ResponsesAPIResponse]:
"""
Handles logging for Anthropic messages responses.
@ -3522,6 +3524,15 @@ class Logging(LiteLLMLoggingBaseClass):
return result
elif isinstance(result, ModelResponse):
return result
elif isinstance(
result,
(ResponseCompletedEvent, ResponseIncompleteEvent, ResponseFailedEvent),
):
# anthropic_messages() can route to OpenAI Responses API; in that path
# the assembled streaming result is one of these terminal events rather than
# a ModelResponse. Return the inner response so downstream handlers
# (_transform_usage_objects, normalize_logging_result) can process it.
return result.response
httpx_response = self.model_call_details.get("httpx_response", None)
if httpx_response and isinstance(httpx_response, httpx.Response):

@ -120,6 +120,7 @@ class CohereV2ChatConfig(OpenAIGPTConfig):
"stream",
"temperature",
"max_tokens",
"max_completion_tokens",
"top_p",
"frequency_penalty",
"presence_penalty",
@ -143,7 +144,12 @@ class CohereV2ChatConfig(OpenAIGPTConfig):
optional_params["stream"] = value
if param == "temperature":
optional_params["temperature"] = value
if param == "max_tokens":
if (
param == "max_tokens"
and "max_completion_tokens" not in non_default_params
):
optional_params["max_tokens"] = value
if param == "max_completion_tokens":
optional_params["max_tokens"] = value
if param == "n":
optional_params["num_generations"] = value

@ -589,6 +589,7 @@ class AsyncHTTPHandler:
params: Optional[dict] = None,
headers: Optional[dict] = None,
follow_redirects: Optional[bool] = None,
timeout: Optional[Union[float, httpx.Timeout]] = None,
):
# Set follow_redirects to UseClientDefault if None
_follow_redirects = (
@ -599,7 +600,11 @@ class AsyncHTTPHandler:
params.update(HTTPHandler.extract_query_params(url))
response = await self.client.get(
url, params=params, headers=headers, follow_redirects=_follow_redirects # type: ignore
url,
params=params,
headers=headers, # type: ignore
follow_redirects=_follow_redirects, # type: ignore
timeout=timeout if timeout is not None else USE_CLIENT_DEFAULT,
)
return response
@ -1115,6 +1120,7 @@ class HTTPHandler:
params: Optional[dict] = None,
headers: Optional[dict] = None,
follow_redirects: Optional[bool] = None,
timeout: Optional[Union[float, httpx.Timeout]] = None,
):
# Set follow_redirects to UseClientDefault if None
_follow_redirects = (
@ -1128,6 +1134,7 @@ class HTTPHandler:
params=params,
headers=headers,
follow_redirects=_follow_redirects,
timeout=timeout if timeout is not None else USE_CLIENT_DEFAULT,
)
return response

@ -1751,6 +1751,7 @@ class BaseLLMHTTPHandler:
api_base=api_base,
optional_params=optional_params,
data=data,
api_key=api_key,
)
## LOGGING
@ -1833,6 +1834,7 @@ class BaseLLMHTTPHandler:
api_base=api_base,
optional_params=optional_params,
data=data,
api_key=api_key,
)
## LOGGING

@ -134,11 +134,15 @@ class MoonshotChatConfig(OpenAIGPTConfig):
##########################################
# temperature limitations
# 1. `temperature` on KIMI API is [0, 1] but OpenAI is [0, 2]
# 2. If temperature < 0.3 and n > 1, KIMI will raise an exception.
# 1. reasoning models (kimi-k2.5, kimi-k2.6, ...) reject every temperature
# except 1, so the param is dropped and the model's default is used
# 2. `temperature` on KIMI API is [0, 1] but OpenAI is [0, 2]
# 3. If temperature < 0.3 and n > 1, KIMI will raise an exception.
# If we enter this condition, we set the temperature to 0.3 as suggested by Moonshot AI
##########################################
if "temperature" in optional_params:
if supports_reasoning(model=model, custom_llm_provider="moonshot"):
optional_params.pop("temperature", None)
elif "temperature" in optional_params:
if optional_params["temperature"] > 1:
optional_params["temperature"] = 1
if optional_params["temperature"] < 0.3 and optional_params.get("n", 1) > 1:

@ -115,6 +115,15 @@
"max_completion_tokens": "max_tokens"
}
},
"neosantara": {
"base_url": "https://api.neosantara.xyz/v1",
"api_key_env": "NEOSANTARA_API_KEY",
"api_base_env": "NEOSANTARA_API_BASE",
"param_mappings": {
"max_completion_tokens": "max_tokens"
},
"supported_endpoints": ["/v1/chat/completions", "/v1/responses"]
},
"tensormesh": {
"base_url": "https://serverless.tensormesh.ai/v1",
"api_key_env": "TENSORMESH_INFERENCE_API_KEY",

@ -0,0 +1 @@
"""Soniox LLM provider implementation."""

@ -0,0 +1 @@
"""Soniox audio transcription implementation."""

@ -0,0 +1,802 @@
"""
Handler for Soniox async speech-to-text transcription.
Soniox's async transcription API requires multiple HTTP calls:
1. (optional) POST /v1/files upload a local audio file
2. POST /v1/transcriptions create a transcription job
3. GET /v1/transcriptions/{id} poll until status == "completed"
4. GET /v1/transcriptions/{id}/transcript fetch the transcript
5. (optional) DELETE /v1/transcriptions/{id} cleanup
6. (optional) DELETE /v1/files/{id} cleanup
Because this does not fit the single-request shape of
`base_llm_http_handler.audio_transcriptions`, the dispatch in
`litellm.main.transcription()` routes Soniox requests directly to this
handler (analogous to the OpenAI / Azure transcription handlers).
"""
import asyncio
import math
import time
from typing import (
TYPE_CHECKING,
Any,
Coroutine,
Dict,
List,
Optional,
Tuple,
Union,
)
import httpx
from litellm.litellm_core_utils.audio_utils.utils import (
get_audio_file_name,
process_audio_file,
)
from litellm.llms.custom_httpx.http_handler import (
AsyncHTTPHandler,
HTTPHandler,
_get_httpx_client,
get_async_httpx_client,
)
from litellm.llms.soniox.audio_transcription.transformation import (
SonioxAudioTranscriptionConfig,
)
from litellm.llms.soniox.common_utils import (
SONIOX_DEFAULT_CLEANUP,
SONIOX_DEFAULT_MAX_POLL_ATTEMPTS,
SONIOX_DEFAULT_POLL_INTERVAL,
SONIOX_MAX_POLL_ATTEMPTS,
SONIOX_MAX_POLL_INTERVAL,
SONIOX_MIN_POLL_INTERVAL,
SONIOX_SECRET_FIELDS,
SonioxException,
get_soniox_api_base,
)
from litellm.types.utils import FileTypes, TranscriptionResponse
if TYPE_CHECKING:
from litellm.litellm_core_utils.litellm_logging import (
Logging as LiteLLMLoggingObj,
)
else:
LiteLLMLoggingObj = Any
class SonioxAudioTranscriptionHandler:
"""Orchestrates the Soniox async transcription flow."""
# ------------------------------------------------------------------
# Public entry points
# ------------------------------------------------------------------
def audio_transcriptions(
self,
model: str,
audio_file: Optional[FileTypes],
optional_params: dict,
litellm_params: dict,
model_response: TranscriptionResponse,
timeout: float,
max_retries: int,
logging_obj: LiteLLMLoggingObj,
api_key: Optional[str],
api_base: Optional[str],
client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
atranscription: bool = False,
headers: Optional[Dict[str, Any]] = None,
provider_config: Optional[SonioxAudioTranscriptionConfig] = None,
) -> Union[TranscriptionResponse, Coroutine[Any, Any, TranscriptionResponse]]:
"""Sync/async dispatch for Soniox transcription requests.
Note: ``max_retries`` is accepted for signature compatibility with
``litellm.transcription`` but is **not yet implemented** for the Soniox
async pipeline. Transient HTTP failures during upload, create, poll,
or fetch will surface immediately. Wrap calls with the standard
``litellm.Router`` / ``num_retries`` mechanism for retry behaviour.
"""
config = provider_config or SonioxAudioTranscriptionConfig()
if atranscription is True:
return self._async_audio_transcriptions(
model=model,
audio_file=audio_file,
optional_params=optional_params,
litellm_params=litellm_params,
model_response=model_response,
timeout=timeout,
logging_obj=logging_obj,
api_key=api_key,
api_base=api_base,
client=client if isinstance(client, AsyncHTTPHandler) else None,
headers=headers or {},
provider_config=config,
)
return self._sync_audio_transcriptions(
model=model,
audio_file=audio_file,
optional_params=optional_params,
litellm_params=litellm_params,
model_response=model_response,
timeout=timeout,
logging_obj=logging_obj,
api_key=api_key,
api_base=api_base,
client=client if isinstance(client, HTTPHandler) else None,
headers=headers or {},
provider_config=config,
)
# ------------------------------------------------------------------
# Helpers shared between sync and async paths
# ------------------------------------------------------------------
def _prepare(
self,
audio_file: Optional[FileTypes],
optional_params: dict,
litellm_params: dict,
api_key: Optional[str],
api_base: Optional[str],
provider_config: SonioxAudioTranscriptionConfig,
headers: Dict[str, Any],
) -> Tuple[
Dict[str, str], # auth headers
str, # api_base (no trailing slash)
Dict[str, Any], # body for POST /v1/transcriptions (without file_id/audio_url)
Dict[str, Any], # handler-only options (poll interval, cleanup, ...)
]:
# Validate env -> auth headers.
auth_headers = provider_config.validate_environment(
headers=headers,
model="", # unused
messages=[],
optional_params=optional_params,
litellm_params=litellm_params,
api_key=api_key,
api_base=api_base,
)
base_url = get_soniox_api_base(api_base)
# Operate on a local copy so we don't mutate the caller's dict
# (the caller may reuse `optional_params` for retries or logging).
params = dict(optional_params)
# Pull handler-only kwargs out of params so they aren't sent
# to Soniox.
poll_interval = float(
params.pop("soniox_polling_interval", SONIOX_DEFAULT_POLL_INTERVAL)
)
try:
max_attempts = int(
params.pop(
"soniox_max_polling_attempts", SONIOX_DEFAULT_MAX_POLL_ATTEMPTS
)
)
except (ValueError, OverflowError):
max_attempts = SONIOX_DEFAULT_MAX_POLL_ATTEMPTS
cleanup_raw = params.pop("soniox_cleanup", SONIOX_DEFAULT_CLEANUP)
if cleanup_raw is None:
cleanup: List[str] = []
elif isinstance(cleanup_raw, str):
cleanup = [cleanup_raw]
else:
cleanup = list(cleanup_raw)
filename_override = params.pop("filename", None)
# Server-side clamps. Caller-supplied poll settings (from request kwargs)
# are bounded so an authenticated caller cannot force a worker into a
# tight poll loop (zero interval) or pin it indefinitely (huge attempt
# count). Total polling time is bounded by
# SONIOX_MAX_POLL_ATTEMPTS * SONIOX_MAX_POLL_INTERVAL.
if not math.isfinite(poll_interval):
poll_interval = SONIOX_DEFAULT_POLL_INTERVAL
clamped_poll_interval = max(
SONIOX_MIN_POLL_INTERVAL, min(poll_interval, SONIOX_MAX_POLL_INTERVAL)
)
clamped_max_attempts = max(1, min(max_attempts, SONIOX_MAX_POLL_ATTEMPTS))
handler_opts: Dict[str, Any] = {
"poll_interval": clamped_poll_interval,
"max_attempts": clamped_max_attempts,
"cleanup": cleanup,
"filename_override": filename_override,
"audio_url": params.pop("audio_url", None),
"file_id": params.pop("file_id", None),
}
# Soniox does not accept `language` directly; map_openai_params should
# already have translated it, but drop any leftover to be safe.
params.pop("language", None)
# response_format is handled by LiteLLM post-processing, not Soniox.
handler_opts["response_format"] = params.pop("response_format", None)
return auth_headers, base_url, params, handler_opts
def _build_create_body(
self,
model: str,
optional_params: dict,
handler_opts: Dict[str, Any],
file_id: Optional[str],
) -> Dict[str, Any]:
body: Dict[str, Any] = {"model": model}
# Soniox-native passthrough fields
for key, value in optional_params.items():
if value is None:
continue
body[key] = value
if handler_opts.get("audio_url"):
body["audio_url"] = handler_opts["audio_url"]
if file_id:
body["file_id"] = file_id
return body
@staticmethod
def _redact_body_for_logging(body: Dict[str, Any]) -> Dict[str, Any]:
"""Return a shallow copy of ``body`` with secret fields redacted.
Soniox's create-transcription body can include
``webhook_auth_header_value`` (a shared secret used to authenticate
webhook callbacks). Forwarding that value to logging callbacks would
let anyone with read access to those sinks forge webhook requests, so
we replace any value of a known secret-bearing field with the literal
``"[REDACTED]"`` before logging. Non-secret fields are passed through
unchanged.
"""
if not body:
return body
redacted = dict(body)
for field in SONIOX_SECRET_FIELDS:
if field in redacted and redacted[field] is not None:
redacted[field] = "[REDACTED]"
return redacted
@staticmethod
def _safe_log_pre_call(
logging_obj: LiteLLMLoggingObj,
api_key: Optional[str],
api_base: str,
body: Dict[str, Any],
) -> None:
try:
logging_obj.pre_call(
input=None,
api_key=api_key,
additional_args={
"api_base": f"{api_base}/v1/transcriptions",
"atranscription": True,
"complete_input_dict": SonioxAudioTranscriptionHandler._redact_body_for_logging(
body
),
},
)
except Exception:
# Logging hooks are best-effort: a misbehaving callback or third-party
# observability integration must never break a real Soniox call.
pass
@staticmethod
def _safe_log_post_call(
logging_obj: LiteLLMLoggingObj,
audio_file: Optional[FileTypes],
api_key: Optional[str],
body: Dict[str, Any],
original_response: Any,
) -> None:
try:
logging_obj.post_call(
input=get_audio_file_name(audio_file) if audio_file else None,
api_key=api_key,
additional_args={
"complete_input_dict": SonioxAudioTranscriptionHandler._redact_body_for_logging(
body
)
},
original_response=original_response,
)
except Exception:
# Logging hooks are best-effort: a misbehaving callback or third-party
# observability integration must never break a real Soniox call.
pass
@staticmethod
def _raise_for_response(
response: httpx.Response,
provider_config: SonioxAudioTranscriptionConfig,
action: str,
) -> None:
if response.status_code >= 400:
try:
payload = response.json()
message = (
payload.get("error_message")
or payload.get("error")
or response.text
)
except Exception:
message = response.text
raise provider_config.get_error_class(
error_message=f"Soniox {action} failed (HTTP {response.status_code}): {message}",
status_code=response.status_code,
headers=response.headers,
)
# ------------------------------------------------------------------
# Sync flow
# ------------------------------------------------------------------
def _sync_audio_transcriptions(
self,
model: str,
audio_file: Optional[FileTypes],
optional_params: dict,
litellm_params: dict,
model_response: TranscriptionResponse,
timeout: float,
logging_obj: LiteLLMLoggingObj,
api_key: Optional[str],
api_base: Optional[str],
client: Optional[HTTPHandler],
headers: Dict[str, Any],
provider_config: SonioxAudioTranscriptionConfig,
) -> TranscriptionResponse:
auth_headers, base_url, opt_params, handler_opts = self._prepare(
audio_file=audio_file,
optional_params=optional_params,
litellm_params=litellm_params,
api_key=api_key,
api_base=api_base,
provider_config=provider_config,
headers=headers,
)
http_client = (
client
if isinstance(client, HTTPHandler)
else (
_get_httpx_client(
params={"ssl_verify": litellm_params.get("ssl_verify", None)},
)
)
)
file_id = handler_opts.get("file_id")
uploaded_file_id: Optional[str] = None
transcription_id: Optional[str] = None
try:
if not file_id and not handler_opts.get("audio_url"):
if audio_file is None:
raise SonioxException(
message=(
"Soniox transcription requires one of: a file argument, "
"an `audio_url` kwarg, or a `file_id` kwarg."
),
status_code=400,
headers=None,
)
uploaded_file_id = self._sync_upload_file(
http_client=http_client,
base_url=base_url,
auth_headers=auth_headers,
audio_file=audio_file,
filename_override=handler_opts.get("filename_override"),
timeout=timeout,
provider_config=provider_config,
)
file_id = uploaded_file_id
body = self._build_create_body(model, opt_params, handler_opts, file_id)
self._safe_log_pre_call(logging_obj, api_key, base_url, body)
create_resp = http_client.post(
url=f"{base_url}/v1/transcriptions",
headers=auth_headers,
json=body,
timeout=timeout,
)
self._raise_for_response(
create_resp, provider_config, "create transcription"
)
transcription_id = create_resp.json()["id"]
transcription_meta = self._sync_poll_until_completed(
http_client=http_client,
base_url=base_url,
auth_headers=auth_headers,
transcription_id=transcription_id,
poll_interval=handler_opts["poll_interval"],
max_attempts=handler_opts["max_attempts"],
timeout=timeout,
provider_config=provider_config,
)
transcript_resp = http_client.get(
url=f"{base_url}/v1/transcriptions/{transcription_id}/transcript",
headers=auth_headers,
timeout=timeout,
)
self._raise_for_response(
transcript_resp, provider_config, "fetch transcript"
)
transcript = transcript_resp.json()
payload = {"transcription": transcription_meta, "transcript": transcript}
response = provider_config._build_response_from_payload(
payload,
model_response=model_response,
response_format=handler_opts.get("response_format"),
)
self._safe_log_post_call(logging_obj, audio_file, api_key, body, payload)
audio_duration_ms = transcription_meta.get("audio_duration_ms")
response._hidden_params.update(
{
"model": model,
"custom_llm_provider": "soniox",
"audio_transcription_duration": (
float(audio_duration_ms) / 1000.0
if audio_duration_ms is not None
else None
),
}
)
return response
finally:
self._sync_cleanup(
http_client=http_client,
base_url=base_url,
auth_headers=auth_headers,
cleanup=handler_opts["cleanup"],
file_id_to_cleanup=uploaded_file_id,
transcription_id=transcription_id,
timeout=timeout,
)
def _sync_upload_file(
self,
http_client: HTTPHandler,
base_url: str,
auth_headers: Dict[str, str],
audio_file: FileTypes,
filename_override: Optional[str],
timeout: float,
provider_config: SonioxAudioTranscriptionConfig,
) -> str:
processed = process_audio_file(audio_file)
filename = filename_override or processed.filename
files = {
"file": (filename, processed.file_content, processed.content_type),
}
# Forward only the `Authorization` header; httpx sets the multipart
# Content-Type (including the boundary) itself.
upload_headers = {"Authorization": auth_headers["Authorization"]}
resp = http_client.post(
url=f"{base_url}/v1/files",
headers=upload_headers,
files=files,
timeout=timeout,
)
self._raise_for_response(resp, provider_config, "upload file")
return resp.json()["id"]
def _sync_poll_until_completed(
self,
http_client: HTTPHandler,
base_url: str,
auth_headers: Dict[str, str],
transcription_id: str,
poll_interval: float,
max_attempts: int,
timeout: float,
provider_config: SonioxAudioTranscriptionConfig,
) -> Dict[str, Any]:
for _ in range(max_attempts):
resp = http_client.get(
url=f"{base_url}/v1/transcriptions/{transcription_id}",
headers=auth_headers,
timeout=timeout,
)
self._raise_for_response(resp, provider_config, "poll transcription")
data = resp.json()
status = data.get("status")
if status == "completed":
return data
if status == "error":
raise provider_config.get_error_class(
error_message=(
f"Soniox transcription {transcription_id} failed: "
f"{data.get('error_message') or data.get('error_type') or 'unknown error'}"
),
status_code=500,
headers=resp.headers,
)
time.sleep(poll_interval)
raise provider_config.get_error_class(
error_message=(
f"Soniox transcription {transcription_id} did not complete after "
f"{max_attempts} polling attempts (interval={poll_interval}s)."
),
status_code=504,
headers={},
)
def _sync_cleanup(
self,
http_client: HTTPHandler,
base_url: str,
auth_headers: Dict[str, str],
cleanup: List[str],
file_id_to_cleanup: Optional[str],
transcription_id: Optional[str],
timeout: float,
) -> None:
if not cleanup:
return
if "transcription" in cleanup and transcription_id:
try:
http_client.delete(
url=f"{base_url}/v1/transcriptions/{transcription_id}",
headers=auth_headers,
timeout=timeout,
)
except Exception:
# Cleanup is best-effort: a failed delete leaves stale data on
# Soniox but must not mask the original transcription result
# (or, on the error path, the original error).
pass
if "file" in cleanup and file_id_to_cleanup:
try:
http_client.delete(
url=f"{base_url}/v1/files/{file_id_to_cleanup}",
headers=auth_headers,
timeout=timeout,
)
except Exception:
# Cleanup is best-effort; see comment above.
pass
# ------------------------------------------------------------------
# Async flow
# ------------------------------------------------------------------
async def _async_audio_transcriptions(
self,
model: str,
audio_file: Optional[FileTypes],
optional_params: dict,
litellm_params: dict,
model_response: TranscriptionResponse,
timeout: float,
logging_obj: LiteLLMLoggingObj,
api_key: Optional[str],
api_base: Optional[str],
client: Optional[AsyncHTTPHandler],
headers: Dict[str, Any],
provider_config: SonioxAudioTranscriptionConfig,
) -> TranscriptionResponse:
import litellm
auth_headers, base_url, opt_params, handler_opts = self._prepare(
audio_file=audio_file,
optional_params=optional_params,
litellm_params=litellm_params,
api_key=api_key,
api_base=api_base,
provider_config=provider_config,
headers=headers,
)
http_client = (
client
if isinstance(client, AsyncHTTPHandler)
else (
get_async_httpx_client(
llm_provider=litellm.LlmProviders.SONIOX,
params={"ssl_verify": litellm_params.get("ssl_verify", None)},
)
)
)
file_id = handler_opts.get("file_id")
uploaded_file_id: Optional[str] = None
transcription_id: Optional[str] = None
try:
if not file_id and not handler_opts.get("audio_url"):
if audio_file is None:
raise SonioxException(
message=(
"Soniox transcription requires one of: a file argument, "
"an `audio_url` kwarg, or a `file_id` kwarg."
),
status_code=400,
headers=None,
)
uploaded_file_id = await self._async_upload_file(
http_client=http_client,
base_url=base_url,
auth_headers=auth_headers,
audio_file=audio_file,
filename_override=handler_opts.get("filename_override"),
timeout=timeout,
provider_config=provider_config,
)
file_id = uploaded_file_id
body = self._build_create_body(model, opt_params, handler_opts, file_id)
self._safe_log_pre_call(logging_obj, api_key, base_url, body)
create_resp = await http_client.post(
url=f"{base_url}/v1/transcriptions",
headers=auth_headers,
json=body,
timeout=timeout,
)
self._raise_for_response(
create_resp, provider_config, "create transcription"
)
transcription_id = create_resp.json()["id"]
transcription_meta = await self._async_poll_until_completed(
http_client=http_client,
base_url=base_url,
auth_headers=auth_headers,
transcription_id=transcription_id,
poll_interval=handler_opts["poll_interval"],
max_attempts=handler_opts["max_attempts"],
timeout=timeout,
provider_config=provider_config,
)
transcript_resp = await http_client.get(
url=f"{base_url}/v1/transcriptions/{transcription_id}/transcript",
headers=auth_headers,
timeout=timeout,
)
self._raise_for_response(
transcript_resp, provider_config, "fetch transcript"
)
transcript = transcript_resp.json()
payload = {"transcription": transcription_meta, "transcript": transcript}
response = provider_config._build_response_from_payload(
payload,
model_response=model_response,
response_format=handler_opts.get("response_format"),
)
self._safe_log_post_call(logging_obj, audio_file, api_key, body, payload)
audio_duration_ms = transcription_meta.get("audio_duration_ms")
response._hidden_params.update(
{
"model": model,
"custom_llm_provider": "soniox",
"audio_transcription_duration": (
float(audio_duration_ms) / 1000.0
if audio_duration_ms is not None
else None
),
}
)
return response
finally:
await self._async_cleanup(
http_client=http_client,
base_url=base_url,
auth_headers=auth_headers,
cleanup=handler_opts["cleanup"],
file_id_to_cleanup=uploaded_file_id,
transcription_id=transcription_id,
timeout=timeout,
)
async def _async_upload_file(
self,
http_client: AsyncHTTPHandler,
base_url: str,
auth_headers: Dict[str, str],
audio_file: FileTypes,
filename_override: Optional[str],
timeout: float,
provider_config: SonioxAudioTranscriptionConfig,
) -> str:
processed = process_audio_file(audio_file)
filename = filename_override or processed.filename
files = {
"file": (filename, processed.file_content, processed.content_type),
}
upload_headers = {"Authorization": auth_headers["Authorization"]}
resp = await http_client.post(
url=f"{base_url}/v1/files",
headers=upload_headers,
files=files,
timeout=timeout,
)
self._raise_for_response(resp, provider_config, "upload file")
return resp.json()["id"]
async def _async_poll_until_completed(
self,
http_client: AsyncHTTPHandler,
base_url: str,
auth_headers: Dict[str, str],
transcription_id: str,
poll_interval: float,
max_attempts: int,
timeout: float,
provider_config: SonioxAudioTranscriptionConfig,
) -> Dict[str, Any]:
for _ in range(max_attempts):
resp = await http_client.get(
url=f"{base_url}/v1/transcriptions/{transcription_id}",
headers=auth_headers,
timeout=timeout,
)
self._raise_for_response(resp, provider_config, "poll transcription")
data = resp.json()
status = data.get("status")
if status == "completed":
return data
if status == "error":
raise provider_config.get_error_class(
error_message=(
f"Soniox transcription {transcription_id} failed: "
f"{data.get('error_message') or data.get('error_type') or 'unknown error'}"
),
status_code=500,
headers=resp.headers,
)
await asyncio.sleep(poll_interval)
raise provider_config.get_error_class(
error_message=(
f"Soniox transcription {transcription_id} did not complete after "
f"{max_attempts} polling attempts (interval={poll_interval}s)."
),
status_code=504,
headers={},
)
async def _async_cleanup(
self,
http_client: AsyncHTTPHandler,
base_url: str,
auth_headers: Dict[str, str],
cleanup: List[str],
file_id_to_cleanup: Optional[str],
transcription_id: Optional[str],
timeout: float,
) -> None:
if not cleanup:
return
if "transcription" in cleanup and transcription_id:
try:
await http_client.delete(
url=f"{base_url}/v1/transcriptions/{transcription_id}",
headers=auth_headers,
timeout=timeout,
)
except Exception:
# Cleanup is best-effort: a failed delete leaves stale data on
# Soniox but must not mask the original transcription result
# (or, on the error path, the original error).
pass
if "file" in cleanup and file_id_to_cleanup:
try:
await http_client.delete(
url=f"{base_url}/v1/files/{file_id_to_cleanup}",
headers=auth_headers,
timeout=timeout,
)
except Exception:
# Cleanup is best-effort; see comment above.
pass
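
The clamp-then-poll behaviour of the handler above can be sketched in isolation. This is a minimal sketch, not the handler itself: `clamp_poll_settings` and `poll_until_completed` are hypothetical names, the status fetcher and sleep function are injected so that no network call is made, and the bounds are copied from the constants this change defines in `common_utils`. The intermediate statuses `"queued"` and `"processing"` are illustrative placeholders; the handler only distinguishes `"completed"` and `"error"`. Unlike the handler, which clamps in `_prepare`, this sketch clamps inside the polling helper for brevity.

```python
import math
from typing import Any, Callable, Dict, List, Tuple

# Bounds copied from the constants defined in this change (common_utils).
MIN_INTERVAL, MAX_INTERVAL, DEFAULT_INTERVAL = 0.5, 60.0, 1.0
MAX_ATTEMPTS = 6000


def clamp_poll_settings(poll_interval: float, max_attempts: int) -> Tuple[float, int]:
    """Bound caller-supplied poll settings (hypothetical helper)."""
    if not math.isfinite(poll_interval):
        # NaN or +/-inf falls back to the default interval.
        poll_interval = DEFAULT_INTERVAL
    interval = max(MIN_INTERVAL, min(poll_interval, MAX_INTERVAL))
    attempts = max(1, min(max_attempts, MAX_ATTEMPTS))
    return interval, attempts


def poll_until_completed(
    fetch_status: Callable[[], Dict[str, Any]],
    sleep: Callable[[float], None],
    poll_interval: float,
    max_attempts: int,
) -> Dict[str, Any]:
    """Poll until the job reports "completed"; raise on "error" or exhaustion."""
    interval, attempts = clamp_poll_settings(poll_interval, max_attempts)
    for _ in range(attempts):
        data = fetch_status()
        if data.get("status") == "completed":
            return data
        if data.get("status") == "error":
            raise RuntimeError(data.get("error_message") or "unknown error")
        sleep(interval)
    raise TimeoutError(f"not completed after {attempts} attempts")


# Usage with a fake status source and a recording "sleep".
responses = iter(
    [{"status": "queued"}, {"status": "processing"}, {"status": "completed", "id": "t1"}]
)
slept: List[float] = []
result = poll_until_completed(
    lambda: next(responses), slept.append, poll_interval=0.0, max_attempts=10
)
```

Injecting `sleep` keeps the sketch testable without real delays. The requested zero interval is raised to the 0.5 s minimum before the loop starts.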


@@ -0,0 +1,281 @@
"""
Translates between OpenAI's `/v1/audio/transcriptions` shape and Soniox's
async transcription API (https://soniox.com/docs/stt/async/async-transcription).
This config covers parameter mapping, env validation and response shaping.
The actual orchestration (file upload -> create -> poll -> fetch -> cleanup)
lives in `litellm.llms.soniox.audio_transcription.handler`, because Soniox's
async API requires multiple HTTP calls and does not fit the single-request
contract of `base_llm_http_handler.audio_transcriptions`.
"""
from typing import Any, Dict, List, Optional, Union
from httpx import Headers, Response
from litellm.llms.base_llm.audio_transcription.transformation import (
AudioTranscriptionRequestData,
BaseAudioTranscriptionConfig,
)
from litellm.llms.base_llm.chat.transformation import BaseLLMException
from litellm.llms.soniox.common_utils import (
SonioxException,
get_soniox_api_base,
get_soniox_api_key,
render_soniox_tokens,
render_soniox_tokens_as_srt,
render_soniox_tokens_as_vtt,
)
from litellm.types.llms.openai import (
AllMessageValues,
OpenAIAudioTranscriptionOptionalParams,
)
from litellm.types.utils import FileTypes, TranscriptionResponse
# Soniox-native kwargs the user can pass through `litellm.transcription(..., **kwargs)`
# in addition to the standard OpenAI params.
SONIOX_PASSTHROUGH_PARAMS: List[str] = [
"language_hints",
"language_hints_strict",
"enable_language_identification",
"enable_speaker_diarization",
"context",
"translation",
"client_reference_id",
"webhook_url",
"webhook_auth_header_name",
"webhook_auth_header_value",
"audio_url",
"file_id",
]
# Handler-only kwargs (consumed by the handler, not sent to Soniox).
SONIOX_HANDLER_ONLY_PARAMS: List[str] = [
"soniox_polling_interval",
"soniox_max_polling_attempts",
"soniox_cleanup",
"filename",
]
class SonioxAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
"""Configuration for Soniox async speech-to-text transcription."""
def get_supported_openai_params(
self, model: str
) -> List[OpenAIAudioTranscriptionOptionalParams]:
# `language` is mapped onto Soniox's `language_hints`.
# `response_format` is handled by LiteLLM (Soniox doesn't support
# SRT/VTT natively but we synthesize them from token timestamps).
return ["language", "response_format"]
def map_openai_params(
self,
non_default_params: dict,
optional_params: dict,
model: str,
drop_params: bool,
) -> dict:
# Translate the OpenAI `language` param into Soniox `language_hints`.
if "language" in non_default_params and non_default_params["language"]:
language = non_default_params["language"]
existing_hints = optional_params.get("language_hints")
if not existing_hints:
optional_params["language_hints"] = [language]
elif language not in existing_hints:
optional_params["language_hints"] = [language] + list(existing_hints)
# Capture response_format for post-processing (not sent to Soniox API).
if "response_format" in non_default_params:
optional_params["response_format"] = non_default_params["response_format"]
# Pass through Soniox-native kwargs unchanged.
for key in SONIOX_PASSTHROUGH_PARAMS + SONIOX_HANDLER_ONLY_PARAMS:
if key in non_default_params and non_default_params[key] is not None:
optional_params[key] = non_default_params[key]
return optional_params
def get_error_class(
self, error_message: str, status_code: int, headers: Union[dict, Headers]
) -> BaseLLMException:
return SonioxException(
message=error_message, status_code=status_code, headers=headers
)
def validate_environment(
self,
headers: dict,
model: str,
messages: List[AllMessageValues],
optional_params: dict,
litellm_params: dict,
api_key: Optional[str] = None,
api_base: Optional[str] = None,
) -> dict:
resolved_key = get_soniox_api_key(api_key)
if not resolved_key:
raise SonioxException(
message=(
"Missing Soniox API key. Set the SONIOX_API_KEY environment "
"variable or pass api_key=... to litellm.transcription()."
),
status_code=401,
headers=None,
)
merged_headers: Dict[str, str] = {
"Authorization": f"Bearer {resolved_key}",
}
if headers:
merged_headers.update(headers)
return merged_headers
def get_complete_url(
self,
api_base: Optional[str],
api_key: Optional[str],
model: str,
optional_params: dict,
litellm_params: dict,
stream: Optional[bool] = None,
) -> str:
# The handler builds per-call URLs (uploads, create, poll, fetch, delete);
# we just return the resolved base.
return get_soniox_api_base(api_base)
def transform_audio_transcription_request(
self,
model: str,
audio_file: FileTypes,
optional_params: dict,
litellm_params: dict,
) -> AudioTranscriptionRequestData:
"""
Build the JSON body for `POST /v1/transcriptions`.
The handler is responsible for the file upload (when a local `audio_file`
is given instead of `file_id`/`audio_url`) and for filling in
`file_id`/`audio_url`. This method exists so the config can be exercised
in isolation by unit tests.
"""
body: Dict[str, Any] = {"model": model}
for key in SONIOX_PASSTHROUGH_PARAMS:
value = optional_params.get(key)
if value is not None:
body[key] = value
return AudioTranscriptionRequestData(
data=body, files=None, content_type="application/json"
)
def transform_audio_transcription_response(
self,
raw_response: Response,
model_response: Optional[TranscriptionResponse] = None,
) -> TranscriptionResponse:
"""
Build a TranscriptionResponse from a Soniox transcript payload.
`raw_response.json()` may be either:
- a Soniox transcript object: `{"id": "...", "text": "...", "tokens": [...]}`
- or a merged envelope: `{"transcription": {...}, "transcript": {...}}`
produced by the handler so transcription metadata is also available.
"""
try:
payload = raw_response.json()
except Exception as exc:
raise SonioxException(
message=f"Failed to parse Soniox response: {exc}",
status_code=getattr(raw_response, "status_code", 500),
headers=getattr(raw_response, "headers", None),
)
return self._build_response_from_payload(payload, model_response=model_response)
def _build_response_from_payload(
self,
payload: Dict[str, Any],
model_response: Optional[TranscriptionResponse] = None,
response_format: Optional[str] = None,
) -> TranscriptionResponse:
"""Shared response-building logic (also used by the handler)."""
transcription_meta: Dict[str, Any] = {}
transcript: Dict[str, Any]
if isinstance(payload, dict) and "transcript" in payload:
transcription_meta = payload.get("transcription") or {}
transcript = payload.get("transcript") or {}
else:
transcript = payload if isinstance(payload, dict) else {}
tokens: List[Dict[str, Any]] = transcript.get("tokens") or []
# Decide what to put in `text` based on response_format:
# - "srt": render tokens as SRT subtitles (synthesized from timestamps)
# - "vtt": render tokens as WebVTT subtitles (synthesized from timestamps)
# - "verbose_json": return JSON with word-level timing (handled below)
# - "text" / "json" / None: default plain text rendering
if response_format == "srt" and tokens:
text = render_soniox_tokens_as_srt(tokens)
elif response_format == "vtt" and tokens:
text = render_soniox_tokens_as_vtt(tokens)
else:
# Default text rendering (also used for "json", "text",
# "verbose_json")
has_speaker = any(t.get("speaker") is not None for t in tokens)
has_language = any(t.get("language") is not None for t in tokens)
if (has_speaker or has_language) and tokens:
text = render_soniox_tokens(tokens)
elif transcript.get("text"):
text = transcript["text"]
elif tokens:
text = "".join(t.get("text", "") for t in tokens)
else:
text = ""
response = model_response or TranscriptionResponse(text=text)
response.text = text
response["task"] = "transcribe"
# Best-effort metadata fields matching OpenAI's verbose_json shape.
if transcription_meta.get("audio_duration_ms") is not None:
try:
response["duration"] = (
float(transcription_meta["audio_duration_ms"]) / 1000.0
)
except (TypeError, ValueError):
pass
# Surface a representative language if all tokens agree.
has_language = any(t.get("language") is not None for t in tokens)
if has_language:
languages = {t.get("language") for t in tokens if t.get("language")}
if len(languages) == 1:
response["language"] = next(iter(languages))
# For verbose_json, include word-level timing from tokens.
if response_format == "verbose_json" and tokens:
words: List[Dict[str, Any]] = []
for token in tokens:
word_entry: Dict[str, Any] = {"word": token.get("text", "")}
if token.get("start_ms") is not None:
word_entry["start"] = float(token["start_ms"]) / 1000.0
if token.get("end_ms") is not None:
word_entry["end"] = float(token["end_ms"]) / 1000.0
words.append(word_entry)
if words:
response["words"] = words
# Stash the raw Soniox payload so power-users can read tokens, segments,
# speaker/language data, etc.
response._hidden_params.update(
{
"soniox_raw": {
"transcription": transcription_meta,
"transcript": transcript,
}
}
)
return response
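
The `verbose_json` branch above converts Soniox token timestamps (milliseconds) into OpenAI-style word entries (seconds). A minimal standalone sketch of that conversion follows. `tokens_to_words` is a hypothetical helper name and the sample tokens are made up for illustration; the field names `text`, `start_ms` and `end_ms` are the ones read by the code above.

```python
from typing import Any, Dict, List


def tokens_to_words(tokens: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Map Soniox-style tokens to word entries with times in seconds."""
    words: List[Dict[str, Any]] = []
    for token in tokens:
        entry: Dict[str, Any] = {"word": token.get("text", "")}
        # Timestamps are optional; emit a key only when the token carries it.
        if token.get("start_ms") is not None:
            entry["start"] = float(token["start_ms"]) / 1000.0
        if token.get("end_ms") is not None:
            entry["end"] = float(token["end_ms"]) / 1000.0
        words.append(entry)
    return words


# Illustrative input: the second token has no timing information.
sample_tokens = [
    {"text": "Hi", "start_ms": 0, "end_ms": 250},
    {"text": " there"},
]
words = tokens_to_words(sample_tokens)
```

A token without timestamps still produces an entry, just without `start`/`end`, so the word list stays aligned with the token list.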


@@ -0,0 +1,274 @@
"""
Shared utilities for the Soniox provider (https://soniox.com).
"""
from typing import Any, Dict, List, Optional
from litellm.llms.base_llm.chat.transformation import BaseLLMException
# Soniox API base URL.
SONIOX_API_BASE: str = "https://api.soniox.com"
# Default polling interval in seconds when waiting for an async transcription
# to finish. Mirrors the Soniox SDK default.
SONIOX_DEFAULT_POLL_INTERVAL: float = 1.0
# Minimum polling interval (in seconds) the server will accept from caller-
# supplied `soniox_polling_interval` kwargs. Prevents an authenticated caller
# from forcing a worker into a tight poll loop with a zero/near-zero interval.
SONIOX_MIN_POLL_INTERVAL: float = 0.5
# Maximum polling interval (in seconds). Prevents a caller from setting an
# excessively large or non-finite interval that would keep a worker sleeping
# far longer than necessary between status checks.
SONIOX_MAX_POLL_INTERVAL: float = 60.0
# Default maximum number of polling attempts (1800 attempts * 1s ~= 30 minutes).
SONIOX_DEFAULT_MAX_POLL_ATTEMPTS: int = 1800
# Hard upper bound on polling attempts. Combined with `SONIOX_MAX_POLL_INTERVAL`
# this caps total sleep time per request at 6000 * 60s = 360,000s (100 hours);
# at `SONIOX_MIN_POLL_INTERVAL` the cap is 6000 * 0.5s = 3000s (50 minutes).
# Either way, a caller cannot pin a worker indefinitely via a huge attempt count.
SONIOX_MAX_POLL_ATTEMPTS: int = 6000
# Default cleanup behaviour: delete both the uploaded file (if any) and the
# transcription record after the transcript has been fetched.
SONIOX_DEFAULT_CLEANUP: List[str] = ["file", "transcription"]
# Body fields that may carry secrets and must be redacted before being
# forwarded to logging callbacks. Soniox accepts a webhook auth header value
# alongside the create-transcription request; that value lets the recipient
# authenticate webhook callbacks and must not leak into observability sinks.
SONIOX_SECRET_FIELDS: List[str] = ["webhook_auth_header_value"]
class SonioxException(BaseLLMException):
"""Provider-specific exception class for Soniox."""
pass
def get_soniox_api_key(api_key: Optional[str] = None) -> Optional[str]:
"""Resolve the Soniox API key from arg or env var."""
# Local import to avoid a circular import: litellm.secret_managers.main
# imports from litellm at top-level.
from litellm.secret_managers.main import get_secret_str
return api_key or get_secret_str("SONIOX_API_KEY")
def get_soniox_api_base(api_base: Optional[str] = None) -> str:
"""Resolve the Soniox API base URL from arg or env var (defaults to public API)."""
from litellm.secret_managers.main import get_secret_str
base = api_base or get_secret_str("SONIOX_API_BASE") or SONIOX_API_BASE
return base.rstrip("/")


def render_soniox_tokens(tokens: List[Dict[str, Any]]) -> str:
    """
    Render a list of Soniox tokens to a readable transcript string.

    Mirrors the behaviour of the official Soniox SDK's `renderTokens` helper:
    - When the speaker changes, a `Speaker N:` tag is inserted.
    - When the language changes, a `[lang]` (or `[Translation] [lang]`) tag is
      inserted on a new line.

    If neither speaker nor language information is present on any token (i.e.
    diarization and language identification are disabled), the function simply
    concatenates the token texts.
    """
    if not tokens:
        return ""

    text_parts: List[str] = []
    current_speaker: Optional[Any] = None
    current_language: Optional[Any] = None

    for token in tokens:
        text = token.get("text", "")
        speaker = token.get("speaker")
        language = token.get("language")
        is_translation = token.get("translation_status") == "translation"

        # Speaker changed -> emit a speaker tag.
        if speaker is not None and speaker != current_speaker:
            if current_speaker is not None:
                text_parts.append("\n\n")
            current_speaker = speaker
            current_language = None  # reset language whenever speaker changes
            text_parts.append(f"Speaker {current_speaker}:")

        # Language changed -> emit a language (or translation) tag.
        if language is not None and language != current_language:
            current_language = language
            prefix = "[Translation] " if is_translation else ""
            text_parts.append(f"\n{prefix}[{current_language}] ")
            text = text.lstrip() if isinstance(text, str) else text

        text_parts.append(text)

    return "".join(text_parts)
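
A small worked example may make the tag-insertion rules in the docstring easier to follow. The sketch below is not part of the PR: `render_tokens_sketch` is a hypothetical stand-in that re-implements the same rules so the snippet runs without LiteLLM installed, and the sample tokens are made up.

```python
from typing import Any, Dict, List, Optional


def render_tokens_sketch(tokens: List[Dict[str, Any]]) -> str:
    """Hypothetical stand-in that follows the same rules as render_soniox_tokens."""
    parts: List[str] = []
    speaker_seen: Optional[Any] = None
    language_seen: Optional[Any] = None
    for token in tokens:
        text = token.get("text", "")
        speaker = token.get("speaker")
        language = token.get("language")
        if speaker is not None and speaker != speaker_seen:
            if speaker_seen is not None:
                parts.append("\n\n")  # blank line between speaker turns
            speaker_seen = speaker
            language_seen = None  # force a fresh language tag for the new speaker
            parts.append(f"Speaker {speaker}:")
        if language is not None and language != language_seen:
            language_seen = language
            is_translation = token.get("translation_status") == "translation"
            prefix = "[Translation] " if is_translation else ""
            parts.append(f"\n{prefix}[{language}] ")
            text = text.lstrip()  # the tag already ends with a space
        parts.append(text)
    return "".join(parts)


# Made-up tokens: two speakers, diarization and language identification on.
SAMPLE_TOKENS = [
    {"text": " Hello", "speaker": "1", "language": "en"},
    {"text": " there.", "speaker": "1", "language": "en"},
    {"text": " Hola.", "speaker": "2", "language": "es"},
]

transcript = render_tokens_sketch(SAMPLE_TOKENS)
print(transcript)
```

With these tokens the transcript has one `Speaker N:` header per turn, each followed by a language tag line. Tokens that carry neither `speaker` nor `language` are simply concatenated.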


# ---------------------------------------------------------------------------
# SRT / VTT subtitle rendering
# ---------------------------------------------------------------------------

# Maximum number of tokens to group into a single subtitle cue.
_CUE_MAX_TOKENS: int = 15

# Maximum duration (in ms) for a single cue before forcing a break.
_CUE_MAX_DURATION_MS: int = 5000


def _format_timestamp_srt(ms: int) -> str:
    """Format milliseconds as SRT timestamp: HH:MM:SS,mmm"""
    if ms < 0:
        ms = 0
    hours = ms // 3_600_000
    ms %= 3_600_000
    minutes = ms // 60_000
    ms %= 60_000
    seconds = ms // 1_000
    millis = ms % 1_000
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def _format_timestamp_vtt(ms: int) -> str:
    """Format milliseconds as VTT timestamp: HH:MM:SS.mmm"""
    if ms < 0:
        ms = 0
    hours = ms // 3_600_000
    ms %= 3_600_000
    minutes = ms // 60_000
    ms %= 60_000
    seconds = ms // 1_000
    millis = ms % 1_000
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def _group_tokens_into_cues(
    tokens: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """
    Group Soniox tokens into subtitle cues.

    Each cue has:
    - start_ms: int
    - end_ms: int
    - text: str

    Grouping heuristics:
    - A new cue starts once the current cue already holds _CUE_MAX_TOKENS tokens.
    - A new cue starts once a token begins _CUE_MAX_DURATION_MS or more after
      the start of the current cue.
    - A new cue starts when the speaker changes (if diarization is on).
    - Tokens without timestamps are appended to the current cue; they are
      skipped if no cue has been started yet.
    """
    cues: List[Dict[str, Any]] = []
    current_tokens: List[str] = []
    current_start: Optional[int] = None
    current_end: Optional[int] = None
    current_speaker: Optional[Any] = None

    def _flush() -> None:
        # Emit the pending cue, if it has a start time and non-empty text.
        if current_tokens and current_start is not None:
            text = "".join(current_tokens).strip()
            if text:
                cues.append(
                    {
                        "start_ms": current_start,
                        "end_ms": (
                            current_end if current_end is not None else current_start
                        ),
                        "text": text,
                    }
                )

    for token in tokens:
        start_ms = token.get("start_ms")
        end_ms = token.get("end_ms")
        text = token.get("text", "")
        speaker = token.get("speaker")

        # Skip tokens with no timestamp data entirely if we have no cue started
        if start_ms is None and current_start is None:
            continue

        # Speaker change forces a new cue
        if speaker is not None and speaker != current_speaker:
            _flush()
            current_tokens = []
            current_start = start_ms
            current_end = end_ms
            current_speaker = speaker
            current_tokens.append(text)
            continue

        # Duration or token count exceeded -> flush
        should_break = False
        if len(current_tokens) >= _CUE_MAX_TOKENS:
            should_break = True
        elif (
            current_start is not None
            and start_ms is not None
            and (start_ms - current_start) >= _CUE_MAX_DURATION_MS
        ):
            should_break = True

        if should_break:
            _flush()
            current_tokens = []
            current_start = start_ms
            current_end = end_ms
            current_tokens.append(text)
        else:
            if current_start is None:
                current_start = start_ms
            if end_ms is not None:
                current_end = end_ms
            current_tokens.append(text)

    _flush()
    return cues


def render_soniox_tokens_as_srt(tokens: List[Dict[str, Any]]) -> str:
    """
    Render Soniox tokens as SRT (SubRip) subtitle format.

    Returns an empty string if no tokens have timestamp data.
    """
    cues = _group_tokens_into_cues(tokens)
    if not cues:
        return ""

    lines: List[str] = []
    for idx, cue in enumerate(cues, start=1):
        start = _format_timestamp_srt(cue["start_ms"])
        end = _format_timestamp_srt(cue["end_ms"])
        lines.append(str(idx))
        lines.append(f"{start} --> {end}")
        lines.append(cue["text"])
        lines.append("")  # blank line between cues
    return "\n".join(lines)


def render_soniox_tokens_as_vtt(tokens: List[Dict[str, Any]]) -> str:
    """
    Render Soniox tokens as WebVTT subtitle format.

    Returns the VTT header even if no cues are present.
    """
    cues = _group_tokens_into_cues(tokens)
    lines: List[str] = ["WEBVTT", ""]
    for cue in cues:
        start = _format_timestamp_vtt(cue["start_ms"])
        end = _format_timestamp_vtt(cue["end_ms"])
        lines.append(f"{start} --> {end}")
        lines.append(cue["text"])
        lines.append("")  # blank line between cues
    return "\n".join(lines)
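
The SRT and VTT renderers above differ only in the millisecond separator and the header. The following sketch, which is not part of the PR, shows the timestamp arithmetic and the SRT layout on hand-built cues. `format_timestamp`, `cues_to_srt`, the `Cue` type, and the sample cues are all hypothetical names and data introduced for illustration.

```python
from typing import List, TypedDict


class Cue(TypedDict):
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int, millis_sep: str) -> str:
    """Format milliseconds as HH:MM:SS<sep>mmm; negative inputs clamp to zero."""
    ms = max(ms, 0)
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{millis_sep}{millis:03d}"


def cues_to_srt(cues: List[Cue]) -> str:
    """Number each cue from 1 and separate cues with a blank line."""
    lines: List[str] = []
    for idx, cue in enumerate(cues, start=1):
        start = format_timestamp(cue["start_ms"], ",")  # SRT uses a comma
        end = format_timestamp(cue["end_ms"], ",")
        lines.extend([str(idx), f"{start} --> {end}", cue["text"], ""])
    return "\n".join(lines)


# Made-up cues for illustration.
SAMPLE_CUES: List[Cue] = [
    {"start_ms": 0, "end_ms": 1500, "text": "Hello there."},
    {"start_ms": 3_723_004, "end_ms": 3_725_000, "text": "Goodbye."},
]

srt_text = cues_to_srt(SAMPLE_CUES)
print(srt_text)
```

Passing `"."` as the separator gives the WebVTT timestamp form; a VTT file additionally starts with a `WEBVTT` header line and does not need the numeric cue index.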

@ -0,0 +1,7 @@
"""
You.com Search API module.
"""
from litellm.llms.you_com.search.transformation import YouComSearchConfig
__all__ = ["YouComSearchConfig"]

@ -0,0 +1,193 @@
"""
Calls You.com's /v1/search endpoint to search the web.
You.com API Reference: https://you.com/docs/api-reference/search/v1-search
OpenAPI spec: https://you.com/specs/openapi_search_v1.yaml
"""
from typing import Dict, List, Optional, TypedDict, Union

import httpx

from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
from litellm.llms.base_llm.search.transformation import (
    BaseSearchConfig,
    SearchResponse,
    SearchResult,
)
from litellm.secret_managers.main import get_secret_str


class _YouComSearchRequestRequired(TypedDict):
    """Required fields for You.com Search API request."""

    query: str


class YouComSearchRequest(_YouComSearchRequestRequired, total=False):
    """
    You.com Search API request format.
    Based on: https://you.com/specs/openapi_search_v1.yaml
    """

    count: int
    country: str
    language: str
    freshness: str
    include_domains: List[str]
    exclude_domains: List[str]
    safesearch: str


class YouComSearchConfig(BaseSearchConfig):
    # Keyed tier (higher rate limits): authenticate with X-API-Key.
    YOU_COM_API_BASE = "https://ydc-index.io"
    # Keyless free tier: IP-throttled (100 queries/day) and requires no auth.
    # Used automatically when YOUCOM_API_KEY is not set.
    YOU_COM_FREE_API_BASE = "https://api.you.com/v1/agents/search"

    @staticmethod
    def ui_friendly_name() -> str:
        return "You.com"

    def validate_environment(
        self,
        headers: Dict,
        api_key: Optional[str] = None,
        api_base: Optional[str] = None,
        **kwargs,
    ) -> Dict:
        """
        Set headers for the You.com Search API.

        If YOUCOM_API_KEY (or an explicit api_key) is present, use the keyed
        endpoint with the `X-API-Key` header. Otherwise fall through to the
        keyless free tier; no auth header is required.
        """
        api_key = api_key or get_secret_str("YOUCOM_API_KEY")
        headers["Content-Type"] = "application/json"
        # Pin Accept-Encoding to identity: the keyless `api.you.com/v1/agents/search`
        # endpoint advertises gzip content-encoding but returns body bytes the
        # decoder rejects, which surfaces as httpx.DecodingError through litellm's
        # http handler. Identity is harmless on the keyed endpoint.
        headers.setdefault("Accept-Encoding", "identity")
        if api_key:
            headers["X-API-Key"] = api_key
        return headers

    def get_complete_url(
        self,
        api_base: Optional[str],
        optional_params: dict,
        data: Optional[Union[Dict, List[Dict]]] = None,
        **kwargs,
    ) -> str:
        """
        Pick the endpoint based on whether an API key is configured.

        - api_base explicit override -> use it as-is (normalized)
        - YOUCOM_API_KEY set -> keyed endpoint (ydc-index.io/v1/search)
        - no key -> keyless free tier (api.you.com/v1/agents/search)
        """
        if api_base is None:
            api_base = get_secret_str("YOUCOM_API_BASE")

        if api_base is None:
            api_key = kwargs.get("api_key") or get_secret_str("YOUCOM_API_KEY")
            if api_key:
                api_base = self.YOU_COM_API_BASE
            else:
                # Keyless free tier already includes the full path.
                return self.YOU_COM_FREE_API_BASE

        api_base = api_base.rstrip("/")
        if not api_base.endswith("/v1/search") and not api_base.endswith(
            "/v1/agents/search"
        ):
            api_base = f"{api_base}/v1/search"
        return api_base

    def transform_search_request(
        self,
        query: Union[str, List[str]],
        optional_params: dict,
        **kwargs,
    ) -> Dict:
        """
        Transform Search request to You.com API format.

        Perplexity unified spec -> You.com mappings:
        - query -> query
        - max_results -> count
        - search_domain_filter -> include_domains
        - country -> country
        - max_tokens_per_page -> (not applicable, ignored)
        """
        if isinstance(query, list):
            query = " ".join(query)

        request_data: YouComSearchRequest = {
            "query": query,
        }

        if "max_results" in optional_params:
            request_data["count"] = optional_params["max_results"]
        if "search_domain_filter" in optional_params:
            request_data["include_domains"] = optional_params["search_domain_filter"]
        if "country" in optional_params:
            request_data["country"] = optional_params["country"].lower()

        # Pass through any provider-specific params that are not part of the
        # unified spec and have not already been mapped above.
        result_data = dict(request_data)
        for param, value in optional_params.items():
            if (
                param not in self.get_supported_perplexity_optional_params()
                and param not in result_data
            ):
                result_data[param] = value
        return result_data

    def transform_search_response(
        self,
        raw_response: httpx.Response,
        logging_obj: LiteLLMLoggingObj,
        **kwargs,
    ) -> SearchResponse:
        """
        Transform You.com API response to LiteLLM unified SearchResponse format.

        You.com -> LiteLLM mappings (for both `results.web[]` and `results.news[]`):
        - title -> SearchResult.title
        - url -> SearchResult.url
        - snippets[0] -> SearchResult.snippet (falls back to `description`)
        - page_age -> SearchResult.date
        """
        response_json = raw_response.json()
        raw_results = response_json.get("results") or {}
        web_results = raw_results.get("web") or []
        news_results = raw_results.get("news") or []

        results: List[SearchResult] = []
        for item in list(web_results) + list(news_results):
            snippets = item.get("snippets") or []
            snippet = snippets[0] if snippets else item.get("description", "")
            results.append(
                SearchResult(
                    title=item.get("title", ""),
                    url=item.get("url", ""),
                    snippet=snippet,
                    date=item.get("page_age"),
                    last_updated=None,
                )
            )

        return SearchResponse(
            results=results,
            object="search",
        )
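
To make the response mapping concrete, here is a minimal sketch on a made-up payload. It is not part of the PR: `map_you_com_results` is a hypothetical helper that returns plain dicts instead of LiteLLM's `SearchResult` objects so that it runs standalone, and the payload is invented to exercise both the `snippets[0]` path and the `description` fallback.

```python
from typing import Any, Dict, List


def map_you_com_results(response_json: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten results.web[] and results.news[] into one list of unified records."""
    raw = response_json.get("results") or {}
    items = list(raw.get("web") or []) + list(raw.get("news") or [])
    mapped: List[Dict[str, Any]] = []
    for item in items:
        snippets = item.get("snippets") or []
        mapped.append(
            {
                "title": item.get("title", ""),
                "url": item.get("url", ""),
                # First snippet wins; otherwise fall back to the description.
                "snippet": snippets[0] if snippets else item.get("description", ""),
                "date": item.get("page_age"),
            }
        )
    return mapped


# Made-up payload for illustration only.
SAMPLE_RESPONSE = {
    "results": {
        "web": [
            {
                "title": "Example page",
                "url": "https://example.com/a",
                "snippets": ["first snippet", "second snippet"],
                "page_age": "2025-01-01T00:00:00",
            }
        ],
        "news": [
            {
                "title": "Example news",
                "url": "https://example.com/b",
                "description": "fallback description",
            }
        ],
    }
}

records = map_you_com_results(SAMPLE_RESPONSE)
for record in records:
    print(record)
```

Web results come first and news results follow, matching the concatenation order in `transform_search_response`. A response with a missing or null `results` key yields an empty list.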

@ -6655,7 +6655,7 @@ async def atranscription(*args, **kwargs) -> TranscriptionResponse:
@client
-def transcription(
+def transcription(  # noqa: PLR0915
model: str,
file: FileTypes,
## OPTIONAL OPENAI PARAMS ##
@ -6847,6 +6847,35 @@ def transcription(
else None
),
)
elif custom_llm_provider == "soniox":
from litellm.llms.soniox.audio_transcription.handler import (
SonioxAudioTranscriptionHandler,
)
response = SonioxAudioTranscriptionHandler().audio_transcriptions(
model=model,
audio_file=file,
optional_params=optional_params,
litellm_params=litellm_params_dict,
model_response=model_response,
atranscription=atranscription,
client=(
client
if client is not None
and (
isinstance(client, HTTPHandler)
or isinstance(client, AsyncHTTPHandler)
)
else None
),
timeout=timeout,
max_retries=max_retries,
logging_obj=litellm_logging_obj,
api_base=api_base,
api_key=api_key,
headers=extra_headers,
provider_config=provider_config, # type: ignore[arg-type]
)
elif provider_config is not None:
response = base_llm_http_handler.audio_transcriptions(
model=model,

@ -1463,6 +1463,36 @@
"supports_output_config": true,
"bedrock_output_config_effort_ceiling": "xhigh"
},
"jp.anthropic.claude-opus-4-7": {
"cache_creation_input_token_cost": 6.875e-06,
"cache_read_input_token_cost": 5.5e-07,
"input_cost_per_token": 5.5e-06,
"litellm_provider": "bedrock_converse",
"max_input_tokens": 1000000,
"max_output_tokens": 128000,
"max_tokens": 128000,
"mode": "chat",
"output_cost_per_token": 2.75e-05,
"search_context_cost_per_query": {
"search_context_size_high": 0.01,
"search_context_size_low": 0.01,
"search_context_size_medium": 0.01
},
"supports_assistant_prefill": false,
"supports_computer_use": true,
"supports_function_calling": true,
"supports_pdf_input": true,
"supports_prompt_caching": true,
"supports_reasoning": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_xhigh_reasoning_effort": true,
"tool_use_system_prompt_tokens": 346,
"supports_native_structured_output": true,
"supports_max_reasoning_effort": true,
"supports_minimal_reasoning_effort": true
},
"anthropic.claude-sonnet-4-6": {
"cache_creation_input_token_cost": 3.75e-06,
"cache_creation_input_token_cost_above_1hr": 6e-06,
@ -24096,6 +24126,21 @@
"max_input_tokens": 200000,
"max_output_tokens": 8192
},
"minimax/MiniMax-M3": {
"input_cost_per_token": 6e-07,
"output_cost_per_token": 2.4e-06,
"cache_read_input_token_cost": 1.2e-07,
"litellm_provider": "minimax",
"mode": "chat",
"supports_function_calling": true,
"supports_tool_choice": true,
"supports_prompt_caching": true,
"supports_reasoning": true,
"supports_system_messages": true,
"supports_vision": true,
"max_input_tokens": 512000,
"max_output_tokens": 128000
},
"mistral.devstral-2-123b": {
"input_cost_per_token": 4e-07,
"litellm_provider": "bedrock_converse",
@ -24978,6 +25023,7 @@
},
"moonshot/kimi-k2-0711-preview": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-05-25",
"input_cost_per_token": 6e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 131072,
@ -24992,6 +25038,7 @@
},
"moonshot/kimi-k2-0905-preview": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-05-25",
"input_cost_per_token": 6e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 262144,
@ -25006,6 +25053,7 @@
},
"moonshot/kimi-k2-turbo-preview": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-05-25",
"input_cost_per_token": 1.15e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 262144,
@ -25030,6 +25078,7 @@
"source": "https://platform.moonshot.ai/docs/guide/kimi-k2-5-quickstart",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_video_input": true,
"supports_vision": true
@ -25046,12 +25095,14 @@
"source": "https://platform.kimi.ai/docs/pricing/chat-k26",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_video_input": true,
"supports_vision": true
},
"moonshot/kimi-latest": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-01-28",
"input_cost_per_token": 2e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 131072,
@ -25066,6 +25117,7 @@
},
"moonshot/kimi-latest-128k": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-01-28",
"input_cost_per_token": 2e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 131072,
@ -25080,6 +25132,7 @@
},
"moonshot/kimi-latest-32k": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-01-28",
"input_cost_per_token": 1e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 32768,
@ -25094,6 +25147,7 @@
},
"moonshot/kimi-latest-8k": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-01-28",
"input_cost_per_token": 2e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 8192,
@ -25108,6 +25162,7 @@
},
"moonshot/kimi-thinking-preview": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2025-11-11",
"input_cost_per_token": 6e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 131072,
@ -25120,6 +25175,7 @@
},
"moonshot/kimi-k2-thinking": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-05-25",
"input_cost_per_token": 6e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 262144,
@ -25135,6 +25191,7 @@
},
"moonshot/kimi-k2-thinking-turbo": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-05-25",
"input_cost_per_token": 1.15e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 262144,
@ -25158,9 +25215,11 @@
"output_cost_per_token": 5e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true
},
"moonshot/moonshot-v1-128k-0430": {
"deprecation_date": "2024-04-30",
"input_cost_per_token": 2e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 131072,
@ -25182,6 +25241,7 @@
"output_cost_per_token": 5e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true
},
@ -25195,9 +25255,11 @@
"output_cost_per_token": 3e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true
},
"moonshot/moonshot-v1-32k-0430": {
"deprecation_date": "2024-04-30",
"input_cost_per_token": 1e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 32768,
@ -25219,6 +25281,7 @@
"output_cost_per_token": 3e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true
},
@ -25232,9 +25295,11 @@
"output_cost_per_token": 2e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true
},
"moonshot/moonshot-v1-8k-0430": {
"deprecation_date": "2024-04-30",
"input_cost_per_token": 2e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 8192,
@ -25256,6 +25321,7 @@
"output_cost_per_token": 2e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true
},
@ -25269,6 +25335,7 @@
"output_cost_per_token": 5e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true
},
"morph/morph-v3-fast": {
@ -36012,7 +36079,8 @@
"supports_prompt_caching": true,
"supports_response_schema": false,
"supports_tool_choice": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-3-beta": {
"cache_read_input_token_cost": 7.5e-07,
@ -36211,7 +36279,8 @@
"supports_function_calling": true,
"supports_prompt_caching": true,
"supports_tool_choice": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-fast-non-reasoning": {
"cache_read_input_token_cost": 5e-08,
@ -36228,7 +36297,8 @@
"supports_function_calling": true,
"supports_prompt_caching": true,
"supports_tool_choice": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-0709": {
"input_cost_per_token": 3e-06,
@ -36244,7 +36314,8 @@
"supports_function_calling": true,
"supports_prompt_caching": true,
"supports_tool_choice": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-latest": {
"input_cost_per_token": 3e-06,
@ -36302,7 +36373,8 @@
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-1-fast-reasoning-latest": {
"cache_read_input_token_cost": 5e-08,
@ -36323,7 +36395,8 @@
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-1-fast-non-reasoning": {
"cache_read_input_token_cost": 5e-08,
@ -36343,7 +36416,8 @@
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-1-fast-non-reasoning-latest": {
"cache_read_input_token_cost": 5e-08,
@ -36363,7 +36437,8 @@
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4.20-multi-agent-beta-0309": {
"cache_read_input_token_cost": 2e-07,
@ -36514,7 +36589,8 @@
"supports_function_calling": true,
"supports_prompt_caching": true,
"supports_reasoning": true,
"supports_tool_choice": true
"supports_tool_choice": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-code-fast-1-0825": {
"cache_read_input_token_cost": 2e-08,
@ -36529,7 +36605,8 @@
"supports_function_calling": true,
"supports_prompt_caching": true,
"supports_reasoning": true,
"supports_tool_choice": true
"supports_tool_choice": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-vision-beta": {
"input_cost_per_image": 5e-06,
@ -41547,5 +41624,18 @@
"supports_vision": true,
"supports_native_structured_output": true,
"supports_pdf_input": true
},
"soniox/stt-async-v4": {
"litellm_provider": "soniox",
"max_output_tokens": 8000,
"max_tokens": 8000,
"input_cost_per_second": 0.0,
"output_cost_per_second": 0.0000277778,
"mode": "audio_transcription",
"source": "https://soniox.com/pricing",
"supported_endpoints": [
"/v1/audio/transcriptions"
],
"supports_audio_input": true
}
}
}
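
For a sense of scale, the `soniox/stt-async-v4` rate of 0.0000277778 per second comes to about $0.10 per hour of audio. The sketch below only illustrates that arithmetic. It assumes cost is the per-second rate multiplied by the audio duration; how LiteLLM's cost calculator actually consumes `input_cost_per_second` and `output_cost_per_second` is not shown in this diff, and `estimate_transcription_cost` is a hypothetical helper.

```python
from typing import Dict

# Pricing fields copied from the soniox/stt-async-v4 entry above.
SONIOX_ENTRY: Dict[str, float] = {
    "input_cost_per_second": 0.0,
    "output_cost_per_second": 0.0000277778,
}


def estimate_transcription_cost(entry: Dict[str, float], audio_seconds: float) -> float:
    """Assumed model: (input rate + output rate) * audio duration in seconds."""
    per_second = entry.get("input_cost_per_second", 0.0) + entry.get(
        "output_cost_per_second", 0.0
    )
    return per_second * audio_seconds


one_hour_cost = estimate_transcription_cost(SONIOX_ENTRY, 3600)
print(f"{one_hour_cost:.4f}")
```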

@ -1539,6 +1539,23 @@
"interactions": true
}
},
"neosantara": {
"display_name": "Neosantara (`neosantara`)",
"url": "https://docs.litellm.ai/docs/providers/neosantara",
"endpoints": {
"chat_completions": true,
"messages": false,
"responses": true,
"embeddings": false,
"image_generations": false,
"audio_transcriptions": false,
"audio_speech": false,
"moderations": false,
"batches": false,
"rerank": false,
"a2a": false
}
},
"nvidia_nim": {
"display_name": "Nvidia NIM (`nvidia_nim`)",
"url": "https://docs.litellm.ai/docs/providers/nvidia_nim",

@ -684,6 +684,7 @@ class MCPServerManager:
),
allow_sampling=bool(server_config.get("allow_sampling", False)),
allow_elicitation=bool(server_config.get("allow_elicitation", False)),
timeout=server_config.get("timeout", None),
)
self._assign_unique_short_prefix(new_server)
_warn_internal_delegate_pkce_if_applicable(new_server, source="config")
@ -1096,6 +1097,7 @@ class MCPServerManager:
credentials_dict.get("subject_token_type") if credentials_dict else None
)
or "urn:ietf:params:oauth:token-type:access_token",
timeout=getattr(mcp_server, "timeout", None),
)
_warn_internal_delegate_pkce_if_applicable(new_server, source="database")
return new_server
@ -1662,7 +1664,9 @@ class MCPServerManager:
transport_type=transport,
auth_type=server.auth_type,
auth_value=auth_value,
-timeout=MCP_CLIENT_TIMEOUT,
+timeout=(
+    server.timeout if server.timeout is not None else MCP_CLIENT_TIMEOUT
+),
stdio_config=stdio_config,
extra_headers=extra_headers,
sampling_callback=sampling_cb,
@ -1690,7 +1694,9 @@ class MCPServerManager:
transport_type=transport,
auth_type=server.auth_type,
auth_value=auth_value,
-timeout=MCP_CLIENT_TIMEOUT,
+timeout=(
+    server.timeout if server.timeout is not None else MCP_CLIENT_TIMEOUT
+),
extra_headers=extra_headers,
aws_auth=aws_auth,
sampling_callback=sampling_cb,
@ -3158,14 +3164,26 @@ class MCPServerManager:
asyncio.create_task(_call_tool_via_client(client, call_tool_params))
)
+_timeout = (
+    mcp_server.timeout if mcp_server.timeout is not None else MCP_CLIENT_TIMEOUT
+)
 try:
-    mcp_responses = await asyncio.gather(*tasks)
+    mcp_responses = await asyncio.wait_for(
+        asyncio.gather(*tasks), timeout=_timeout
+    )
+except asyncio.TimeoutError:
+    raise HTTPException(
+        status_code=504,
+        detail={
+            "error": "timeout",
+            "message": f"MCP tool call timed out after {_timeout}s",
+        },
+    )
except (
BlockedPiiEntityError,
GuardrailRaisedException,
HTTPException,
) as e:
# Re-raise guardrail exceptions to properly fail the MCP call
verbose_logger.error(
f"Guardrail blocked MCP tool call during result check: {str(e)}"
)
@ -3953,6 +3971,7 @@ class MCPServerManager:
registration_url=server.registration_url,
allow_all_keys=server.allow_all_keys,
instructions=server.instructions,
timeout=server.timeout,
)
async def get_all_mcp_servers_with_health_and_teams(
@ -4052,6 +4071,7 @@ class MCPServerManager:
byok_api_key_help_url=server.byok_api_key_help_url,
source_url=server.source_url,
instructions=server.instructions,
timeout=server.timeout,
)
async def get_all_mcp_servers_unfiltered(self) -> List[LiteLLM_MCPServerTable]:
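
The per-server timeout hunks above combine two ideas: fall back to the global default only when the server's `timeout` is `None`, and bound the whole `asyncio.gather` with `asyncio.wait_for`. The standalone sketch below illustrates that pattern and is not part of the PR. `DEFAULT_TIMEOUT_SECONDS` is a placeholder (the real `MCP_CLIENT_TIMEOUT` value is not shown in this diff), `GatewayTimeout` is a hypothetical stand-in for the `HTTPException(status_code=504)` raised in the proxy, and `_fake_tool` simulates a tool call.

```python
import asyncio
from typing import Any, Coroutine, List, Optional

# Placeholder: the real MCP_CLIENT_TIMEOUT value is not shown in this diff.
DEFAULT_TIMEOUT_SECONDS = 60.0


class GatewayTimeout(Exception):
    """Hypothetical stand-in for HTTPException(status_code=504)."""

    status_code = 504


async def gather_with_timeout(
    coros: List[Coroutine[Any, Any, Any]], server_timeout: Optional[float]
) -> List[Any]:
    # `is not None` keeps an explicit 0 from silently falling back to the default.
    timeout = server_timeout if server_timeout is not None else DEFAULT_TIMEOUT_SECONDS
    try:
        # One deadline covers all tool calls; on expiry the pending calls are cancelled.
        return await asyncio.wait_for(asyncio.gather(*coros), timeout=timeout)
    except asyncio.TimeoutError:
        raise GatewayTimeout(f"MCP tool call timed out after {timeout}s")


async def _fake_tool(delay: float, value: str) -> str:
    """Simulated tool call that finishes after `delay` seconds."""
    await asyncio.sleep(delay)
    return value


async def _demo():
    fast = await gather_with_timeout([_fake_tool(0.0, "a"), _fake_tool(0.0, "b")], None)
    try:
        await gather_with_timeout([_fake_tool(1.0, "slow")], 0.01)
        timed_out = False
    except GatewayTimeout:
        timed_out = True
    return fast, timed_out


fast_results, timed_out = asyncio.run(_demo())
print(fast_results, timed_out)
```

Because the deadline wraps the gathered future rather than each task, a single slow tool call fails the whole batch once the timeout elapses.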

@ -1300,6 +1300,7 @@ class NewMCPServerRequest(LiteLLMPydanticObjectBase):
byok_description: List[str] = Field(default_factory=list)
byok_api_key_help_url: Optional[str] = None
source_url: Optional[str] = None
timeout: Optional[float] = None
# BYOM submission fields — set by the endpoint, not by the caller.
# Any caller-provided values are silently overridden before persistence.
approval_status: Optional[str] = Field(
@ -1385,6 +1386,7 @@ class UpdateMCPServerRequest(LiteLLMPydanticObjectBase):
byok_description: List[str] = Field(default_factory=list)
byok_api_key_help_url: Optional[str] = None
source_url: Optional[str] = None
timeout: Optional[float] = None
@model_validator(mode="before")
@classmethod
@ -1460,6 +1462,7 @@ class LiteLLM_MCPServerTable(LiteLLMPydanticObjectBase):
byok_api_key_help_url: Optional[str] = None
has_user_credential: Optional[bool] = None
source_url: Optional[str] = None
timeout: Optional[float] = None
# BYOM submission fields
approval_status: Optional[str] = Field(
default="active",

@ -8,6 +8,10 @@ search_tools:
- search_tool_name: "my-perplexity-search"
litellm_params:
search_provider: "perplexity"
# Alternative provider example (YOUCOM_API_KEY is optional; without it the keyless free tier is used):
# - search_tool_name: "my-you-com-search"
# litellm_params:
# search_provider: "you_com"
litellm_settings:
callbacks: ["websearch_interception"]

@ -659,6 +659,7 @@ if MCP_AVAILABLE:
registration_url=payload.registration_url,
allow_all_keys=payload.allow_all_keys,
available_on_public_internet=payload.available_on_public_internet,
timeout=payload.timeout,
)
def get_prisma_client_or_throw(message: str):

@ -2570,6 +2570,24 @@
],
"default_model_placeholder": "snowflake/mistral-7b"
},
{
"provider": "Soniox",
"provider_display_name": "Soniox",
"litellm_provider": "soniox",
"credential_fields": [
{
"key": "api_key",
"label": "Soniox API Key",
"placeholder": null,
"tooltip": "Currently only the async Speech-to-Text REST API (api.soniox.com) is supported. Realtime STT (stt-rt.soniox.com) and TTS (tts-rt.soniox.com) are not yet available.",
"required": true,
"field_type": "password",
"options": null,
"default_value": null
}
],
"default_model_placeholder": "soniox/stt-async-v4"
},
{
"provider": "TEXT_COMPLETION_CODESTRAL",
"provider_display_name": "Text-Completion-Codestral",

@ -331,6 +331,7 @@ model LiteLLM_MCPServerTable {
byok_description String[] @default([])
byok_api_key_help_url String?
source_url String?
timeout Float?
// BYOM submission lifecycle
approval_status String? @default("active")
submitted_by String?

@ -283,6 +283,7 @@ def search(
complete_url = search_provider_config.get_complete_url(
api_base=api_base,
optional_params=optional_params,
api_key=api_key,
)
# Pre Call logging
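
Forwarding `api_key` here matters because the You.com config picks its endpoint from the key: without it, a key passed explicitly to `search()` would presumably not be visible to `get_complete_url`, which could then select the keyless free tier. The sketch below restates that selection logic as a pure function and is not part of the PR. `resolve_search_url` is a hypothetical name, and the `YOUCOM_API_BASE` / `YOUCOM_API_KEY` environment lookups are deliberately omitted so the behaviour is deterministic.

```python
from typing import Optional

KEYED_BASE = "https://ydc-index.io"
FREE_URL = "https://api.you.com/v1/agents/search"


def resolve_search_url(api_base: Optional[str], api_key: Optional[str]) -> str:
    """Simplified endpoint selection (environment-variable fallbacks omitted)."""
    if api_base is None:
        # No override: keyed endpoint when a key is present, free tier otherwise.
        return f"{KEYED_BASE}/v1/search" if api_key else FREE_URL
    api_base = api_base.rstrip("/")
    if api_base.endswith(("/v1/search", "/v1/agents/search")):
        return api_base  # override already carries a full path
    return f"{api_base}/v1/search"


print(resolve_search_url(None, "my-key"))
print(resolve_search_url(None, None))
print(resolve_search_url("https://proxy.example.com/", None))
```

An explicit `api_base` takes precedence over the key check, so a custom override is normalized and used regardless of whether a key is configured.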

@ -109,6 +109,7 @@ class MCPServer(BaseModel):
# Defaults to the token's expires_in minus the expiry buffer, or
# MCP_PER_USER_TOKEN_DEFAULT_TTL when expires_in is absent.
token_storage_ttl_seconds: Optional[int] = None
timeout: Optional[float] = None
# Resolved short-ID tool prefix when LITELLM_USE_SHORT_MCP_TOOL_PREFIX is
# enabled. Set by ``MCPServerManager._assign_unique_short_prefix`` at
# registration time so that natural-hash collisions between two

@ -3290,6 +3290,7 @@ class LlmProviders(str, Enum):
GIGACHAT = "gigachat"
NVIDIA_NIM = "nvidia_nim"
NVIDIA_RIVA = "nvidia_riva"
SONIOX = "soniox"
CEREBRAS = "cerebras"
AI21_CHAT = "ai21_chat"
VOLCENGINE = "volcengine"
@ -3374,6 +3375,7 @@ class LlmProviders(str, Enum):
NANOGPT = "nano-gpt"
POE = "poe"
CHUTES = "chutes"
NEOSANTARA = "neosantara"
XIAOMI_MIMO = "xiaomi_mimo"
TENSORMESH = "tensormesh"
LITELLM_AGENT = "litellm_agent"
@ -3416,6 +3418,7 @@ class SearchProviders(str, Enum):
DUCKDUCKGO = "duckduckgo"
SEARCHAPI = "searchapi"
SERPER = "serper"
YOU_COM = "you_com"
APISERPENT = "apiserpent"

@ -8816,6 +8816,12 @@ class ProviderConfigManager:
)
return NvidiaRivaAudioTranscriptionConfig()
elif litellm.LlmProviders.SONIOX == provider:
from litellm.llms.soniox.audio_transcription.transformation import (
SonioxAudioTranscriptionConfig,
)
return SonioxAudioTranscriptionConfig()
return None
@staticmethod
@ -9518,6 +9524,7 @@ class ProviderConfigManager:
from litellm.llms.searxng.search.transformation import SearXNGSearchConfig
from litellm.llms.serper.search.transformation import SerperSearchConfig
from litellm.llms.tavily.search.transformation import TavilySearchConfig
from litellm.llms.you_com.search.transformation import YouComSearchConfig
PROVIDER_TO_CONFIG_MAP = {
SearchProviders.PERPLEXITY: PerplexitySearchConfig,
@ -9533,6 +9540,7 @@ class ProviderConfigManager:
SearchProviders.DUCKDUCKGO: DuckDuckGoSearchConfig,
SearchProviders.SEARCHAPI: SearchAPIConfig,
SearchProviders.SERPER: SerperSearchConfig,
SearchProviders.YOU_COM: YouComSearchConfig,
SearchProviders.APISERPENT: APISerpentSearchConfig,
}
config_class = PROVIDER_TO_CONFIG_MAP.get(provider, None)

@ -1463,6 +1463,36 @@
"supports_output_config": true,
"bedrock_output_config_effort_ceiling": "xhigh"
},
"jp.anthropic.claude-opus-4-7": {
"cache_creation_input_token_cost": 6.875e-06,
"cache_read_input_token_cost": 5.5e-07,
"input_cost_per_token": 5.5e-06,
"litellm_provider": "bedrock_converse",
"max_input_tokens": 1000000,
"max_output_tokens": 128000,
"max_tokens": 128000,
"mode": "chat",
"output_cost_per_token": 2.75e-05,
"search_context_cost_per_query": {
"search_context_size_high": 0.01,
"search_context_size_low": 0.01,
"search_context_size_medium": 0.01
},
"supports_assistant_prefill": false,
"supports_computer_use": true,
"supports_function_calling": true,
"supports_pdf_input": true,
"supports_prompt_caching": true,
"supports_reasoning": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_xhigh_reasoning_effort": true,
"tool_use_system_prompt_tokens": 346,
"supports_native_structured_output": true,
"supports_max_reasoning_effort": true,
"supports_minimal_reasoning_effort": true
},
"anthropic.claude-sonnet-4-6": {
"cache_creation_input_token_cost": 3.75e-06,
"cache_creation_input_token_cost_above_1hr": 6e-06,
@ -24096,6 +24126,21 @@
"max_input_tokens": 200000,
"max_output_tokens": 8192
},
"minimax/MiniMax-M3": {
"input_cost_per_token": 6e-07,
"output_cost_per_token": 2.4e-06,
"cache_read_input_token_cost": 1.2e-07,
"litellm_provider": "minimax",
"mode": "chat",
"supports_function_calling": true,
"supports_tool_choice": true,
"supports_prompt_caching": true,
"supports_reasoning": true,
"supports_system_messages": true,
"supports_vision": true,
"max_input_tokens": 512000,
"max_output_tokens": 128000
},
"mistral.devstral-2-123b": {
"input_cost_per_token": 4e-07,
"litellm_provider": "bedrock_converse",
@ -24978,6 +25023,7 @@
},
"moonshot/kimi-k2-0711-preview": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-05-25",
"input_cost_per_token": 6e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 131072,
@ -24992,6 +25038,7 @@
},
"moonshot/kimi-k2-0905-preview": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-05-25",
"input_cost_per_token": 6e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 262144,
@ -25006,6 +25053,7 @@
},
"moonshot/kimi-k2-turbo-preview": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-05-25",
"input_cost_per_token": 1.15e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 262144,
@ -25030,6 +25078,7 @@
"source": "https://platform.moonshot.ai/docs/guide/kimi-k2-5-quickstart",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_video_input": true,
"supports_vision": true
@ -25046,12 +25095,14 @@
"source": "https://platform.kimi.ai/docs/pricing/chat-k26",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_video_input": true,
"supports_vision": true
},
"moonshot/kimi-latest": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-01-28",
"input_cost_per_token": 2e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 131072,
@ -25066,6 +25117,7 @@
},
"moonshot/kimi-latest-128k": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-01-28",
"input_cost_per_token": 2e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 131072,
@ -25080,6 +25132,7 @@
},
"moonshot/kimi-latest-32k": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-01-28",
"input_cost_per_token": 1e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 32768,
@ -25094,6 +25147,7 @@
},
"moonshot/kimi-latest-8k": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-01-28",
"input_cost_per_token": 2e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 8192,
@ -25108,6 +25162,7 @@
},
"moonshot/kimi-thinking-preview": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2025-11-11",
"input_cost_per_token": 6e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 131072,
@ -25120,6 +25175,7 @@
},
"moonshot/kimi-k2-thinking": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-05-25",
"input_cost_per_token": 6e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 262144,
@ -25135,6 +25191,7 @@
},
"moonshot/kimi-k2-thinking-turbo": {
"cache_read_input_token_cost": 1.5e-07,
"deprecation_date": "2026-05-25",
"input_cost_per_token": 1.15e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 262144,
@ -25158,9 +25215,11 @@
"output_cost_per_token": 5e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true
},
"moonshot/moonshot-v1-128k-0430": {
"deprecation_date": "2024-04-30",
"input_cost_per_token": 2e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 131072,
@ -25182,6 +25241,7 @@
"output_cost_per_token": 5e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true
},
@ -25195,9 +25255,11 @@
"output_cost_per_token": 3e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true
},
"moonshot/moonshot-v1-32k-0430": {
"deprecation_date": "2024-04-30",
"input_cost_per_token": 1e-06,
"litellm_provider": "moonshot",
"max_input_tokens": 32768,
@ -25219,6 +25281,7 @@
"output_cost_per_token": 3e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true
},
@ -25232,9 +25295,11 @@
"output_cost_per_token": 2e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true
},
"moonshot/moonshot-v1-8k-0430": {
"deprecation_date": "2024-04-30",
"input_cost_per_token": 2e-07,
"litellm_provider": "moonshot",
"max_input_tokens": 8192,
@ -25256,6 +25321,7 @@
"output_cost_per_token": 2e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true
},
@ -25269,6 +25335,7 @@
"output_cost_per_token": 5e-06,
"source": "https://platform.moonshot.ai/docs/pricing",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true
},
"morph/morph-v3-fast": {
@ -30977,6 +31044,11 @@
"litellm_provider": "tavily",
"mode": "search"
},
"you_com/search": {
"input_cost_per_query": 0.0,
"litellm_provider": "you_com",
"mode": "search"
},
"text-completion-codestral/codestral-2405": {
"input_cost_per_token": 0.0,
"litellm_provider": "text-completion-codestral",
@ -36047,7 +36119,8 @@
"supports_prompt_caching": true,
"supports_response_schema": false,
"supports_tool_choice": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-3-beta": {
"cache_read_input_token_cost": 7.5e-07,
@ -36246,7 +36319,8 @@
"supports_function_calling": true,
"supports_prompt_caching": true,
"supports_tool_choice": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-fast-non-reasoning": {
"cache_read_input_token_cost": 5e-08,
@ -36263,7 +36337,8 @@
"supports_function_calling": true,
"supports_prompt_caching": true,
"supports_tool_choice": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-0709": {
"input_cost_per_token": 3e-06,
@ -36279,7 +36354,8 @@
"supports_function_calling": true,
"supports_prompt_caching": true,
"supports_tool_choice": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-latest": {
"input_cost_per_token": 3e-06,
@ -36337,7 +36413,8 @@
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-1-fast-reasoning-latest": {
"cache_read_input_token_cost": 5e-08,
@ -36358,7 +36435,8 @@
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-1-fast-non-reasoning": {
"cache_read_input_token_cost": 5e-08,
@ -36378,7 +36456,8 @@
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4-1-fast-non-reasoning-latest": {
"cache_read_input_token_cost": 5e-08,
@ -36398,7 +36477,8 @@
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
"supports_web_search": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-4.20-multi-agent-beta-0309": {
"cache_read_input_token_cost": 2e-07,
@ -36549,7 +36629,8 @@
"supports_function_calling": true,
"supports_prompt_caching": true,
"supports_reasoning": true,
"supports_tool_choice": true
"supports_tool_choice": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-code-fast-1-0825": {
"cache_read_input_token_cost": 2e-08,
@ -36564,7 +36645,8 @@
"supports_function_calling": true,
"supports_prompt_caching": true,
"supports_reasoning": true,
"supports_tool_choice": true
"supports_tool_choice": true,
"deprecation_date": "2026-05-15"
},
"xai/grok-vision-beta": {
"input_cost_per_image": 5e-06,
@ -41756,6 +41838,16 @@
"output_cost_per_token": 0.0,
"litellm_provider": "snowflake",
"mode": "embedding"
},
"soniox/stt-async-v4": {
"litellm_provider": "soniox",
"max_output_tokens": 8000,
"max_tokens": 8000,
"input_cost_per_second": 0.0,
"output_cost_per_second": 0.0000277778,
"mode": "audio_transcription",
"source": "https://soniox.com/pricing",
"supported_endpoints": ["/v1/audio/transcriptions"],
"supports_audio_input": true
}
}
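
The `deprecation_date` values added in the hunks above are plain ISO dates, so a consumer of the cost map can filter on them directly. A minimal sketch, assuming a dict shaped like these entries; `retired_models` is a hypothetical helper rather than a litellm API, and the sample copies only three entries from this diff:

```python
from datetime import date

# Trimmed sample shaped like the cost-map entries in this diff.
SAMPLE_COST_MAP = {
    "moonshot/kimi-latest": {"deprecation_date": "2026-01-28", "litellm_provider": "moonshot"},
    "xai/grok-4-0709": {"deprecation_date": "2026-05-15", "litellm_provider": "xai"},
    "moonshot/kimi-k2.6": {"litellm_provider": "moonshot", "supports_response_schema": True},
}


def retired_models(cost_map, today):
    """Return the sorted model names whose deprecation_date is on or before `today`."""
    retired = []
    for name, info in cost_map.items():
        raw = info.get("deprecation_date")
        if raw is None:
            continue  # no retirement scheduled for this entry
        if date.fromisoformat(raw) <= today:
            retired.append(name)
    return sorted(retired)
```

Entries without a `deprecation_date` are treated as live, which matches how the nine live Moonshot models are left untagged here.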

@ -1556,6 +1556,23 @@
"interactions": true
}
},
"neosantara": {
"display_name": "Neosantara (`neosantara`)",
"url": "https://docs.litellm.ai/docs/providers/neosantara",
"endpoints": {
"chat_completions": true,
"messages": false,
"responses": true,
"embeddings": false,
"image_generations": false,
"audio_transcriptions": false,
"audio_speech": false,
"moderations": false,
"batches": false,
"rerank": false,
"a2a": false
}
},
"nlp_cloud": {
"display_name": "NLP Cloud (`nlp_cloud`)",
"url": "https://docs.litellm.ai/docs/providers/nlp_cloud",
@ -2081,6 +2098,22 @@
"interactions": true
}
},
"soniox": {
"display_name": "Soniox (`soniox`)",
"url": "https://docs.litellm.ai/docs/providers/soniox",
"endpoints": {
"chat_completions": false,
"messages": false,
"responses": false,
"embeddings": false,
"image_generations": false,
"audio_transcriptions": true,
"audio_speech": false,
"moderations": false,
"batches": false,
"rerank": false
}
},
"synthetic": {
"display_name": "Synthetic (`synthetic`)",
"endpoints": {
@ -2190,6 +2223,10 @@
"search": true
}
},
"you_com": {
"display_name": "You.com (`you_com`)",
"url": "https://docs.litellm.ai/docs/search/you_com"
},
"apiserpent": {
"display_name": "APISerpent (`apiserpent`)",
"url": "https://docs.litellm.ai/docs/search/apiserpent",

@ -331,6 +331,7 @@ model LiteLLM_MCPServerTable {
byok_description String[] @default([])
byok_api_key_help_url String?
source_url String?
timeout Float?
// BYOM submission lifecycle
approval_status String? @default("active")
submitted_by String?

@ -1,8 +1,8 @@
import os
from unittest.mock import MagicMock, patch
from litellm.integrations.langfuse.langfuse_prompt_management import (
LangfusePromptManagement,
langfuse_client_init,
)
@ -65,3 +65,44 @@ class TestLangfusePromptManagement:
mock_run_async.call_args[0][0]
== langfuse_prompt_management.async_log_failure_event
)
def test_langfuse_client_init_passes_dedicated_httpx_client(self):
import httpx
from litellm.llms.custom_httpx.http_handler import _get_httpx_client
shared_client = _get_httpx_client().client
mock_langfuse_class = MagicMock()
with (
patch(
"litellm.integrations.langfuse.langfuse_prompt_management.resolve_langfuse_credentials",
return_value=("pk-1234", "sk-1234", "https://localhost"),
),
patch(
"litellm.integrations.langfuse.langfuse_prompt_management.LangFuseLogger._get_langfuse_flush_interval",
return_value=1,
),
patch.dict("sys.modules", {"langfuse": self._mock_langfuse}),
patch(
"litellm.llms.custom_httpx.http_handler.get_ssl_configuration",
return_value=False,
) as mock_get_ssl,
):
self._mock_langfuse.Langfuse = mock_langfuse_class
langfuse_client_init(
langfuse_public_key="pk-1234",
langfuse_secret="sk-1234",
langfuse_host="https://localhost",
)
mock_langfuse_class.assert_called_once()
call_kwargs = mock_langfuse_class.call_args[1]
assert "httpx_client" in call_kwargs
passed_client = call_kwargs["httpx_client"]
assert isinstance(passed_client, httpx.Client)
assert passed_client is not shared_client
mock_get_ssl.assert_called_once()
langfuse_client_init.cache_clear()

@ -23,6 +23,7 @@ class TestOpenMeterIntegration:
os.environ.pop("OPENMETER_API_KEY", None)
os.environ.pop("OPENMETER_API_ENDPOINT", None)
os.environ.pop("OPENMETER_EVENT_TYPE", None)
os.environ.pop("OPENMETER_TRUST_REQUEST_USER", None)
def test_openmeter_logger_initialization(self):
"""Test that OpenMeterLogger initializes correctly with required env vars"""
@ -388,6 +389,75 @@ class TestOpenMeterIntegration:
assert isinstance(result["subject"], str)
assert result["subject"] == "12345"
def test_common_logic_trust_request_user_false_ignores_request_user(self):
"""OPENMETER_TRUST_REQUEST_USER=false makes the key-bound user_id win
over a request-supplied `user` (forge-attribution mitigation)."""
os.environ["OPENMETER_TRUST_REQUEST_USER"] = "false"
logger = OpenMeterLogger()
kwargs = {
"user": "forged-by-client",
"model": "gpt-4",
"response_cost": 0.002,
"litellm_call_id": "test-call-id",
"litellm_params": {
"metadata": {"user_api_key_user_id": "real-tenant-id"}
},
}
response_obj = {
"id": "test-response-id",
"usage": {"prompt_tokens": 20, "completion_tokens": 10, "total_tokens": 30},
}
result = logger._common_logic(kwargs, response_obj)
assert result["subject"] == "real-tenant-id"
assert result["subject"] != "forged-by-client"
def test_common_logic_trust_request_user_false_still_raises_without_key_user(self):
"""OPENMETER_TRUST_REQUEST_USER=false still raises when no
user_api_key_user_id is available; the request `user` is not a
fallback in this mode."""
os.environ["OPENMETER_TRUST_REQUEST_USER"] = "false"
logger = OpenMeterLogger()
kwargs = {
"user": "would-have-worked-without-the-flag",
"model": "gpt-3.5-turbo",
"response_cost": 0.001,
"litellm_call_id": "test-call-id",
}
response_obj = {"id": "test-response-id"}
with pytest.raises(Exception, match="OpenMeter: user is required"):
logger._common_logic(kwargs, response_obj)
def test_common_logic_trust_request_user_default_preserves_behavior(self):
"""Default (unset OPENMETER_TRUST_REQUEST_USER) keeps request `user`
taking priority, preserving backward compatibility."""
# OPENMETER_TRUST_REQUEST_USER intentionally unset
logger = OpenMeterLogger()
kwargs = {
"user": "request-user",
"model": "gpt-4",
"response_cost": 0.002,
"litellm_call_id": "test-call-id",
"litellm_params": {
"metadata": {"user_api_key_user_id": "key-user"}
},
}
response_obj = {
"id": "test-response-id",
"usage": {"prompt_tokens": 20, "completion_tokens": 10, "total_tokens": 30},
}
result = logger._common_logic(kwargs, response_obj)
assert result["subject"] == "request-user"
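
The three tests above pin down a precedence rule for the OpenMeter subject. A minimal sketch of that rule, assuming only the behavior the tests assert; `resolve_subject` and its `trust_request_user` parameter are hypothetical names, and the real logger reads the flag from the environment and may raise a different exception type:

```python
def resolve_subject(kwargs, trust_request_user=True):
    """Pick the billing subject following the precedence the tests describe."""
    metadata = (kwargs.get("litellm_params") or {}).get("metadata") or {}
    key_user = metadata.get("user_api_key_user_id")
    request_user = kwargs.get("user")

    if trust_request_user:
        # Default mode: the request-supplied `user` wins, the key-bound id is the fallback.
        subject = request_user or key_user
    else:
        # Hardened mode: only the key-bound id counts; the request `user` is never a fallback.
        subject = key_user

    if not subject:
        raise ValueError("OpenMeter: user is required")
    return str(subject)
```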
@patch("litellm.integrations.openmeter.HTTPHandler")
def test_integration_token_user_id_scenario(self, mock_http_handler):
"""Integration test simulating the exact scenario that was failing"""

@ -3114,3 +3114,44 @@ def test_get_error_information_prefers_message_attribute_over_empty_str():
)
assert info["error_message"] == "real failure detail"
assert info["error_code"] == "401"
@pytest.mark.parametrize(
"event_cls, event_type",
[
("ResponseCompletedEvent", "response.completed"),
("ResponseIncompleteEvent", "response.incomplete"),
("ResponseFailedEvent", "response.failed"),
],
)
def test_handle_anthropic_messages_response_logging_with_terminal_responses_api_events(
event_cls, event_type
):
"""Regression test for #28943: when anthropic_messages routes to OpenAI Responses
API and stream=True, success_handler receives a terminal ResponsesAPI event instead
of a ModelResponse. The handler must return the inner ResponsesAPIResponse rather
than crashing with AnthropicResponse.model_validate."""
import importlib
openai_types = importlib.import_module("litellm.types.llms.openai")
EventClass = getattr(openai_types, event_cls)
from litellm.types.llms.openai import ResponsesAPIResponse
logging_obj = LitellmLogging(
model="gpt-4o",
messages=[{"role": "user", "content": "hello"}],
stream=True,
call_type="anthropic_messages",
start_time=time.time(),
litellm_call_id="test-rce-123",
function_id="test-fn",
)
inner_response = ResponsesAPIResponse(
id="resp_test", created_at=1700000000, output=[]
)
event = EventClass(type=event_type, response=inner_response)
result = logging_obj._handle_anthropic_messages_response_logging(result=event)
assert result is inner_response
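
The regression test above expects terminal Responses API events to be unwrapped to their inner response. A minimal sketch of that unwrapping, using stand-in dataclasses instead of the litellm types; `FakeResponse`, `FakeEvent` and `unwrap_terminal_event` are hypothetical names for illustration only:

```python
from dataclasses import dataclass, field

# The three terminal event types exercised by the parametrized test.
TERMINAL_EVENT_TYPES = {"response.completed", "response.incomplete", "response.failed"}


@dataclass
class FakeResponse:
    id: str
    output: list = field(default_factory=list)


@dataclass
class FakeEvent:
    type: str
    response: FakeResponse


def unwrap_terminal_event(result):
    """Return the inner response for terminal events, otherwise the input unchanged."""
    if getattr(result, "type", None) in TERMINAL_EVENT_TYPES:
        return result.response
    return result
```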

@ -6,7 +6,9 @@ sys.path.insert(
0, os.path.abspath("../../../../..")
) # Adds the parent directory to the system path
import litellm
from litellm.llms.cohere.chat.transformation import CohereChatConfig
from litellm.llms.cohere.chat.v2_transformation import CohereV2ChatConfig
class TestCohereTransform:
@ -49,3 +51,69 @@ class TestCohereTransform:
# The function should properly map max_tokens if max_completion_tokens is not provided
assert result == {"temperature": 0.7, "max_tokens": 200}
class TestCohereV2Transform:
def setup_method(self):
self.config = CohereV2ChatConfig()
self.model = "command-r"
def test_v2_supports_max_completion_tokens(self):
"""max_completion_tokens must be advertised so get_optional_params does not reject it"""
assert "max_completion_tokens" in self.config.get_supported_openai_params(
self.model
)
def test_v2_max_tokens_only_still_maps(self):
"""max_tokens alone maps to cohere max_tokens when max_completion_tokens is absent"""
result = self.config.map_openai_params(
non_default_params={"temperature": 0.7, "max_tokens": 200},
optional_params={},
model=self.model,
drop_params=False,
)
assert result == {"temperature": 0.7, "max_tokens": 200}
def test_v2_map_max_completion_tokens_overrides_max_tokens(self):
"""max_completion_tokens maps to cohere max_tokens and overrides max_tokens, matching v1"""
result = self.config.map_openai_params(
non_default_params={
"temperature": 0.7,
"max_tokens": 200,
"max_completion_tokens": 256,
},
optional_params={},
model=self.model,
drop_params=False,
)
assert result == {"temperature": 0.7, "max_tokens": 256}
def test_v2_max_completion_tokens_precedence_is_order_independent(self):
"""max_completion_tokens wins over max_tokens regardless of dict ordering"""
max_tokens_first = self.config.map_openai_params(
non_default_params={"max_tokens": 200, "max_completion_tokens": 256},
optional_params={},
model=self.model,
drop_params=False,
)
max_completion_first = self.config.map_openai_params(
non_default_params={"max_completion_tokens": 256, "max_tokens": 200},
optional_params={},
model=self.model,
drop_params=False,
)
assert max_tokens_first == {"max_tokens": 256}
assert max_completion_first == {"max_tokens": 256}
def test_v2_default_route_accepts_max_completion_tokens(self):
"""The default cohere_chat route resolves to v2; max_completion_tokens must not raise"""
optional_params = litellm.get_optional_params(
model=self.model,
custom_llm_provider="cohere_chat",
max_completion_tokens=256,
)
assert optional_params["max_tokens"] == 256
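
The precedence these v2 tests describe can be made order-independent by resolving the two limit keys after the main loop. A minimal sketch under that assumption; `map_token_limits` is a hypothetical stand-in and not the `CohereV2ChatConfig.map_openai_params` implementation:

```python
def map_token_limits(non_default_params):
    """Map OpenAI-style limits to a single `max_tokens`, independent of dict order."""
    mapped = {}
    for key, value in non_default_params.items():
        if key in ("max_tokens", "max_completion_tokens"):
            continue  # resolved after the loop so iteration order cannot matter
        mapped[key] = value

    if "max_completion_tokens" in non_default_params:
        # max_completion_tokens always wins when both are present.
        mapped["max_tokens"] = non_default_params["max_completion_tokens"]
    elif "max_tokens" in non_default_params:
        mapped["max_tokens"] = non_default_params["max_tokens"]
    return mapped
```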

@ -798,3 +798,56 @@ def test_get_httpx_client_applies_httpx_timeout_object_without_mocking_handler()
assert handler.client.timeout == t
finally:
handler.close()
def test_sync_get_forwards_per_request_timeout():
"""HTTPHandler.get(timeout=...) must apply the timeout to that request,
overriding the client default rather than silently ignoring it."""
captured = {}
def mock_handler(request: httpx.Request) -> httpx.Response:
captured["timeout"] = request.extensions.get("timeout")
return httpx.Response(200, request=request, json={"ok": True})
handler = HTTPHandler()
handler.client.close()
handler.client = httpx.Client(
transport=httpx.MockTransport(mock_handler),
timeout=httpx.Timeout(5.0),
)
try:
handler.get("https://example.com/poll", timeout=99.0)
assert captured["timeout"] == {
"connect": 99.0,
"read": 99.0,
"write": 99.0,
"pool": 99.0,
}
finally:
handler.close()
@pytest.mark.asyncio
async def test_async_get_forwards_per_request_timeout():
captured = {}
async def mock_handler(request: httpx.Request) -> httpx.Response:
captured["timeout"] = request.extensions.get("timeout")
return httpx.Response(200, request=request, json={"ok": True})
handler = AsyncHTTPHandler()
await handler.client.aclose()
handler.client = httpx.AsyncClient(
transport=httpx.MockTransport(mock_handler),
timeout=httpx.Timeout(5.0),
)
try:
await handler.get("https://example.com/poll", timeout=99.0)
assert captured["timeout"] == {
"connect": 99.0,
"read": 99.0,
"write": 99.0,
"pool": 99.0,
}
finally:
await handler.close()

@ -9,15 +9,12 @@ import os
import sys
from unittest.mock import patch
sys.path.insert(
0, os.path.abspath("../../../../..")
) # Adds the parent directory to the system path
sys.path.insert(0, os.path.abspath("../../../../..")) # Adds the parent directory to the system path
import pytest
import litellm
import litellm.utils
from litellm import completion
from litellm.litellm_core_utils.get_model_cost_map import GetModelCostMap
from litellm.llms.moonshot.chat.transformation import MoonshotChatConfig
@ -208,6 +205,42 @@ class TestMoonshotConfig:
# Temperature should be preserved
assert result.get("temperature") == temp
def test_temperature_dropped_for_reasoning_models(self):
"""Reasoning models (kimi-k2.5, kimi-k2.6) reject any temperature except 1,
so the param is dropped rather than clamped. A clamp to 0.3/1 would still
400 when the caller passes e.g. 0.5."""
config = MoonshotChatConfig()
with patch(
"litellm.llms.moonshot.chat.transformation.supports_reasoning",
return_value=True,
):
for temp in [0.0, 0.5, 1.0, 1.5]:
result = config.map_openai_params(
non_default_params={"temperature": temp},
optional_params={},
model="kimi-k2.5",
drop_params=False,
)
assert "temperature" not in result
def test_temperature_clamped_for_non_reasoning_models(self):
"""Non-reasoning models keep the [0.3, 1] clamp behaviour."""
config = MoonshotChatConfig()
with patch(
"litellm.llms.moonshot.chat.transformation.supports_reasoning",
return_value=False,
):
result = config.map_openai_params(
non_default_params={"temperature": 1.5},
optional_params={},
model="moonshot-v1-8k",
drop_params=False,
)
assert result.get("temperature") == 1
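
The two temperature tests above contrast dropping the parameter with clamping it. A minimal sketch of that split; `map_temperature` is a hypothetical helper, and the unconditional lower bound of 0.3 is an assumption taken from the docstring, since the tests shown here only exercise the upper bound:

```python
def map_temperature(temperature, is_reasoning_model):
    """Return the params to send; an empty dict means temperature is omitted."""
    if is_reasoning_model:
        # Reasoning models reject anything but 1, so the param is dropped, not clamped.
        return {}
    # Assumption: non-reasoning models clamp into [0.3, 1] in both directions.
    clamped = min(max(temperature, 0.3), 1)
    return {"temperature": clamped}
```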
def test_tool_choice_required_adds_message(self):
"""Test that tool_choice='required' adds a special message and removes tool_choice"""
config = MoonshotChatConfig()
@ -232,10 +265,7 @@ class TestMoonshotConfig:
assert result["messages"][0]["role"] == "user"
assert result["messages"][0]["content"] == "What's the weather like?"
assert result["messages"][1]["role"] == "user"
assert (
result["messages"][1]["content"]
== "Please select a tool to handle the current issue."
)
assert result["messages"][1]["content"] == "Please select a tool to handle the current issue."
# Check that tool_choice was removed but tools are preserved
assert "tool_choice" not in result
@ -273,10 +303,7 @@ class TestMoonshotConfig:
# Check that the message was added
assert len(result["messages"]) == 2
assert (
result["messages"][1]["content"]
== "Please select a tool to handle the current issue."
)
assert result["messages"][1]["content"] == "Please select a tool to handle the current issue."
def test_tool_choice_non_required_preserved(self):
"""Test that non-'required' tool_choice values are preserved"""
@ -501,9 +528,7 @@ class TestMoonshotConfig:
assert result[0].get("reasoning_content") == "stored thinking"
# The promoted key must be removed from provider_specific_fields to
# avoid sending the value twice in the serialised request body
assert "reasoning_content" not in (
result[0].get("provider_specific_fields") or {}
)
assert "reasoning_content" not in (result[0].get("provider_specific_fields") or {})
def test_reasoning_model_fill_called_from_transform_request(self):
"""transform_request injects reasoning_content end-to-end for reasoning models."""
@ -603,10 +628,7 @@ class TestMoonshotConfig:
result = config.fill_reasoning_content(messages)
# reasoning_content should be preserved, not replaced with placeholder
assert (
result[0].get("reasoning_content")
== "<thinking>User wants weather</thinking>"
)
assert result[0].get("reasoning_content") == "<thinking>User wants weather</thinking>"
def test_reasoning_content_preserved_in_multi_turn_flow(self):
"""reasoning_content is preserved through multi-turn conversation flow.
@ -650,10 +672,7 @@ class TestMoonshotConfig:
result = config.fill_reasoning_content(messages)
# reasoning_content should be preserved in the assistant message
assert (
result[1].get("reasoning_content")
== "<thinking>Planning to call weather tool</thinking>"
)
assert result[1].get("reasoning_content") == "<thinking>Planning to call weather tool</thinking>"
class TestKimiK26ModelRegistry:
@ -695,3 +714,33 @@ class TestKimiK26ModelRegistry:
"""kimi-k2.6 should be assigned to the moonshot provider."""
model_info = model_cost_map["moonshot/kimi-k2.6"]
assert model_info["litellm_provider"] == "moonshot"
class TestMoonshotResponseSchemaSupport:
"""Every model currently live on api.moonshot.ai supports json_schema
response_format, which gates discovery via litellm.responses(). The flag
must be true so the capability is advertised honestly."""
LIVE_MODELS = [
"moonshot/kimi-k2.5",
"moonshot/kimi-k2.6",
"moonshot/moonshot-v1-8k",
"moonshot/moonshot-v1-32k",
"moonshot/moonshot-v1-128k",
"moonshot/moonshot-v1-8k-vision-preview",
"moonshot/moonshot-v1-32k-vision-preview",
"moonshot/moonshot-v1-128k-vision-preview",
"moonshot/moonshot-v1-auto",
]
@pytest.fixture(autouse=True)
def model_cost_map(self):
return GetModelCostMap.load_local_model_cost_map()
@pytest.mark.parametrize("model", LIVE_MODELS)
def test_live_model_supports_response_schema(self, model, model_cost_map):
assert model_cost_map[model].get("supports_response_schema") is True
def test_supports_response_schema_utility_reports_true(self, model_cost_map, monkeypatch):
monkeypatch.setattr(litellm, "model_cost", model_cost_map)
assert litellm.utils.supports_response_schema(model="moonshot/kimi-k2.5") is True

@ -0,0 +1,100 @@
import os
from unittest.mock import patch
NEOSANTARA_API_BASE = "https://api.neosantara.xyz/v1"
def test_neosantara_json_registry():
import litellm
from litellm.llms.openai_like.json_loader import JSONProviderRegistry
assert litellm.LlmProviders.NEOSANTARA.value == "neosantara"
assert litellm.LlmProviders("neosantara") == litellm.LlmProviders.NEOSANTARA
assert JSONProviderRegistry.exists("neosantara")
config = JSONProviderRegistry.get("neosantara")
assert config is not None
assert config.base_url == NEOSANTARA_API_BASE
assert config.api_key_env == "NEOSANTARA_API_KEY"
assert config.api_base_env == "NEOSANTARA_API_BASE"
assert config.param_mappings["max_completion_tokens"] == "max_tokens"
assert "/v1/chat/completions" in config.supported_endpoints
assert "/v1/responses" in config.supported_endpoints
def test_neosantara_dynamic_config_env_vars():
from litellm.llms.openai_like.dynamic_config import create_config_class
from litellm.llms.openai_like.json_loader import JSONProviderRegistry
config = create_config_class(JSONProviderRegistry.get("neosantara"))()
with patch.dict(
os.environ,
{
"NEOSANTARA_API_KEY": "test-key",
"NEOSANTARA_API_BASE": "https://custom.neosantara.example/v1",
},
):
api_base, api_key = config._get_openai_compatible_provider_info(None, None)
assert api_base == "https://custom.neosantara.example/v1"
assert api_key == "test-key"
def test_neosantara_provider_detection_by_prefix():
from litellm.litellm_core_utils.get_llm_provider_logic import get_llm_provider
model, provider, _, api_base = get_llm_provider("neosantara/gemini-3-flash")
assert model == "gemini-3-flash"
assert provider == "neosantara"
assert api_base == NEOSANTARA_API_BASE
def test_neosantara_chat_complete_url():
from litellm.llms.openai_like.dynamic_config import create_config_class
from litellm.llms.openai_like.json_loader import JSONProviderRegistry
config = create_config_class(JSONProviderRegistry.get("neosantara"))()
assert (
config.get_complete_url(
api_base=None,
api_key=None,
model="gemini-3-flash",
optional_params={},
litellm_params={},
)
== "https://api.neosantara.xyz/v1/chat/completions"
)
def test_neosantara_maps_max_completion_tokens_to_max_tokens():
from litellm.llms.openai_like.dynamic_config import create_config_class
from litellm.llms.openai_like.json_loader import JSONProviderRegistry
config = create_config_class(JSONProviderRegistry.get("neosantara"))()
optional_params = config.map_openai_params(
non_default_params={"max_completion_tokens": 7},
optional_params={},
model="gemini-3-flash",
drop_params=False,
)
assert optional_params == {"max_tokens": 7}
def test_neosantara_responses_api_config():
from litellm.llms.openai.responses.transformation import OpenAIResponsesAPIConfig
from litellm.utils import ProviderConfigManager
config = ProviderConfigManager.get_provider_responses_api_config(
provider="neosantara",
model="claude-opus-4-6",
)
assert isinstance(config, OpenAIResponsesAPIConfig)
assert config.custom_llm_provider == "neosantara"
assert (
config.get_complete_url(api_base=None, litellm_params={})
== "https://api.neosantara.xyz/v1/responses"
)
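
The Neosantara tests above rely on two small conventions: a `provider/model` prefix and a base URL joined with an endpoint path. A minimal sketch of both, assuming nothing beyond the values asserted in the tests; `split_provider_model` and `complete_url` are hypothetical helpers, not the litellm functions under test:

```python
NEOSANTARA_API_BASE = "https://api.neosantara.xyz/v1"


def split_provider_model(model, known_providers):
    """Split 'provider/model' when the prefix is a known provider."""
    provider, sep, rest = model.partition("/")
    if sep and provider in known_providers:
        return provider, rest
    return None, model  # no recognised prefix: leave the model string untouched


def complete_url(api_base, endpoint):
    """Join a base URL and an endpoint path, falling back to the default base."""
    base = (api_base or NEOSANTARA_API_BASE).rstrip("/")
    return base + "/" + endpoint.lstrip("/")
```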

@ -0,0 +1 @@
"""Soniox provider tests."""

@ -0,0 +1 @@
"""Soniox audio transcription tests."""

@ -0,0 +1,495 @@
"""Tests for SonioxAudioTranscriptionConfig."""
import json
from typing import Any, Dict, Optional
from unittest.mock import patch
import httpx
import pytest
from litellm.llms.soniox.audio_transcription.transformation import (
SonioxAudioTranscriptionConfig,
)
from litellm.llms.soniox.common_utils import SonioxException
from litellm.types.utils import TranscriptionResponse
def _make_response(payload: Dict[str, Any], status_code: int = 200) -> httpx.Response:
return httpx.Response(
status_code=status_code,
content=json.dumps(payload).encode("utf-8"),
headers={"content-type": "application/json"},
)
class TestGetSupportedOpenAIParams:
def test_should_advertise_language_and_response_format(self):
cfg = SonioxAudioTranscriptionConfig()
assert cfg.get_supported_openai_params(model="stt-async-v4") == [
"language",
"response_format",
]
class TestMapOpenAIParams:
def test_should_translate_language_to_language_hints(self):
cfg = SonioxAudioTranscriptionConfig()
result = cfg.map_openai_params(
non_default_params={"language": "en"},
optional_params={},
model="stt-async-v4",
drop_params=False,
)
assert result["language_hints"] == ["en"]
def test_should_prepend_language_to_existing_hints(self):
cfg = SonioxAudioTranscriptionConfig()
result = cfg.map_openai_params(
non_default_params={"language": "en"},
optional_params={"language_hints": ["fr"]},
model="stt-async-v4",
drop_params=False,
)
assert result["language_hints"] == ["en", "fr"]
def test_should_not_duplicate_language_already_in_hints(self):
cfg = SonioxAudioTranscriptionConfig()
result = cfg.map_openai_params(
non_default_params={"language": "en"},
optional_params={"language_hints": ["en", "fr"]},
model="stt-async-v4",
drop_params=False,
)
assert result["language_hints"] == ["en", "fr"]
def test_should_passthrough_soniox_native_kwargs(self):
cfg = SonioxAudioTranscriptionConfig()
result = cfg.map_openai_params(
non_default_params={
"enable_speaker_diarization": True,
"enable_language_identification": True,
"context": "medical conversation",
"audio_url": "https://example.com/a.wav",
},
optional_params={},
model="stt-async-v4",
drop_params=False,
)
assert result["enable_speaker_diarization"] is True
assert result["enable_language_identification"] is True
assert result["context"] == "medical conversation"
assert result["audio_url"] == "https://example.com/a.wav"
def test_should_passthrough_handler_only_kwargs(self):
cfg = SonioxAudioTranscriptionConfig()
result = cfg.map_openai_params(
non_default_params={
"soniox_polling_interval": 0.5,
"soniox_max_polling_attempts": 10,
"soniox_cleanup": ["file"],
},
optional_params={},
model="stt-async-v4",
drop_params=False,
)
assert result["soniox_polling_interval"] == 0.5
assert result["soniox_max_polling_attempts"] == 10
assert result["soniox_cleanup"] == ["file"]
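
The first three `TestMapOpenAIParams` cases describe how an OpenAI `language` value is folded into Soniox `language_hints`. A minimal sketch of that merge; `merge_language_hint` is a hypothetical helper, and leaving the existing order untouched when the language is already present is an assumption consistent with, but not fully determined by, the tests:

```python
def merge_language_hint(language, existing_hints=None):
    """Prepend `language` to the hints unless it is already listed."""
    hints = list(existing_hints or [])  # copy so the caller's list is not mutated
    if language in hints:
        return hints
    return [language] + hints
```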
class TestValidateEnvironment:
def test_should_set_bearer_token_from_api_key(self):
cfg = SonioxAudioTranscriptionConfig()
headers = cfg.validate_environment(
headers={},
model="stt-async-v4",
messages=[],
optional_params={},
litellm_params={},
api_key="sk-test",
)
assert headers["Authorization"] == "Bearer sk-test"
def test_should_resolve_key_from_env(self, monkeypatch):
monkeypatch.setenv("SONIOX_API_KEY", "env-key")
cfg = SonioxAudioTranscriptionConfig()
headers = cfg.validate_environment(
headers={},
model="stt-async-v4",
messages=[],
optional_params={},
litellm_params={},
)
assert headers["Authorization"] == "Bearer env-key"
def test_should_raise_when_no_api_key(self, monkeypatch):
monkeypatch.delenv("SONIOX_API_KEY", raising=False)
cfg = SonioxAudioTranscriptionConfig()
with pytest.raises(SonioxException) as exc_info:
cfg.validate_environment(
headers={},
model="stt-async-v4",
messages=[],
optional_params={},
litellm_params={},
)
assert exc_info.value.status_code == 401
def test_should_merge_caller_headers(self):
cfg = SonioxAudioTranscriptionConfig()
headers = cfg.validate_environment(
headers={"X-Trace-Id": "abc"},
model="stt-async-v4",
messages=[],
optional_params={},
litellm_params={},
api_key="sk-test",
)
assert headers["X-Trace-Id"] == "abc"
assert headers["Authorization"] == "Bearer sk-test"
class TestGetCompleteUrl:
def test_should_return_default_base(self):
cfg = SonioxAudioTranscriptionConfig()
url = cfg.get_complete_url(
api_base=None,
api_key="sk-test",
model="stt-async-v4",
optional_params={},
litellm_params={},
)
assert url == "https://api.soniox.com"
def test_should_strip_trailing_slash_from_custom_base(self):
cfg = SonioxAudioTranscriptionConfig()
url = cfg.get_complete_url(
api_base="https://custom.example.com/",
api_key="sk-test",
model="stt-async-v4",
optional_params={},
litellm_params={},
)
assert url == "https://custom.example.com"
class TestTransformAudioTranscriptionRequest:
def test_should_build_minimal_body_with_model(self):
cfg = SonioxAudioTranscriptionConfig()
result = cfg.transform_audio_transcription_request(
model="stt-async-v4",
audio_file=None,
optional_params={},
litellm_params={},
)
assert result.data == {"model": "stt-async-v4"}
assert result.files is None
assert result.content_type == "application/json"
def test_should_include_passthrough_params_in_body(self):
cfg = SonioxAudioTranscriptionConfig()
result = cfg.transform_audio_transcription_request(
model="stt-async-v4",
audio_file=None,
optional_params={
"audio_url": "https://example.com/a.wav",
"language_hints": ["en"],
"enable_speaker_diarization": True,
"soniox_polling_interval": 0.5, # handler-only, must NOT appear
},
litellm_params={},
)
body = result.data
assert body["audio_url"] == "https://example.com/a.wav"
assert body["language_hints"] == ["en"]
assert body["enable_speaker_diarization"] is True
assert "soniox_polling_interval" not in body
class TestTransformAudioTranscriptionResponse:
def test_should_build_response_from_plain_transcript_payload(self):
cfg = SonioxAudioTranscriptionConfig()
resp = cfg.transform_audio_transcription_response(
_make_response({"id": "tx_1", "text": "hello world"}),
)
assert resp.text == "hello world"
assert resp["task"] == "transcribe"
def test_should_build_response_from_envelope_payload(self):
cfg = SonioxAudioTranscriptionConfig()
resp = cfg.transform_audio_transcription_response(
_make_response(
{
"transcription": {"id": "tx_1", "audio_duration_ms": 2500},
"transcript": {"text": "hello world", "tokens": []},
}
),
)
assert resp.text == "hello world"
assert resp["duration"] == pytest.approx(2.5)
def test_should_render_speaker_tags_when_diarization_present(self):
cfg = SonioxAudioTranscriptionConfig()
payload = {
"transcript": {
"text": "ignored fallback",
"tokens": [
{"text": "hello", "speaker": 1},
{"text": " world", "speaker": 2},
],
}
}
resp = cfg._build_response_from_payload(payload)
assert "Speaker 1:" in resp.text
assert "Speaker 2:" in resp.text
def test_should_set_language_when_all_tokens_share_one(self):
cfg = SonioxAudioTranscriptionConfig()
payload = {
"transcript": {
"tokens": [
{"text": "hello", "language": "en"},
{"text": " world", "language": "en"},
]
}
}
resp = cfg._build_response_from_payload(payload)
assert resp["language"] == "en"
def test_should_populate_provided_model_response(self):
cfg = SonioxAudioTranscriptionConfig()
model_response = TranscriptionResponse()
model_response._hidden_params = {"pre": "existing"}
payload = {"text": "populated"}
resp = cfg._build_response_from_payload(payload, model_response=model_response)
assert resp is model_response
assert resp.text == "populated"
assert resp._hidden_params["pre"] == "existing"
assert "soniox_raw" in resp._hidden_params
def test_should_stash_raw_payload_in_hidden_params(self):
cfg = SonioxAudioTranscriptionConfig()
payload = {
"transcription": {"id": "tx_1"},
"transcript": {"text": "hi", "tokens": []},
}
resp = cfg._build_response_from_payload(payload)
raw = resp._hidden_params["soniox_raw"]
assert raw["transcription"]["id"] == "tx_1"
assert raw["transcript"]["text"] == "hi"
def test_should_raise_on_invalid_json(self):
cfg = SonioxAudioTranscriptionConfig()
bad = httpx.Response(status_code=200, content=b"not json")
with pytest.raises(SonioxException):
cfg.transform_audio_transcription_response(bad)
def test_should_concat_token_texts_when_no_text_field_or_tags(self):
cfg = SonioxAudioTranscriptionConfig()
payload = {
"transcript": {
"tokens": [
{"text": "hello"},
{"text": " world"},
],
}
}
resp = cfg._build_response_from_payload(payload)
assert resp.text == "hello world"
def test_should_return_empty_text_for_empty_payload(self):
cfg = SonioxAudioTranscriptionConfig()
resp = cfg._build_response_from_payload({})
assert resp.text == ""
def test_should_skip_duration_when_audio_duration_ms_is_invalid(self):
cfg = SonioxAudioTranscriptionConfig()
payload = {
"transcription": {"audio_duration_ms": "not-a-number"},
"transcript": {"text": "hi", "tokens": []},
}
resp = cfg._build_response_from_payload(payload)
assert "duration" not in resp.model_dump()
class TestRenderSonioxTokens:
def test_should_return_empty_string_for_no_tokens(self):
from litellm.llms.soniox.common_utils import render_soniox_tokens
assert render_soniox_tokens([]) == ""
class TestRenderSonioxTokensAsSrt:
def test_should_render_basic_srt(self):
from litellm.llms.soniox.common_utils import render_soniox_tokens_as_srt
tokens = [
{"text": "Hello ", "start_ms": 0, "end_ms": 500},
{"text": "world.", "start_ms": 500, "end_ms": 1000},
]
result = render_soniox_tokens_as_srt(tokens)
assert "1\n" in result
assert "00:00:00,000 --> " in result
assert "Hello world." in result
def test_should_split_cues_on_speaker_change(self):
from litellm.llms.soniox.common_utils import render_soniox_tokens_as_srt
tokens = [
{"text": "Hi.", "start_ms": 0, "end_ms": 1000, "speaker": "1"},
{"text": "Hey.", "start_ms": 1500, "end_ms": 2500, "speaker": "2"},
]
result = render_soniox_tokens_as_srt(tokens)
assert "1\n" in result
assert "2\n" in result
assert "Hi." in result
assert "Hey." in result
def test_should_return_empty_string_for_no_timestamps(self):
from litellm.llms.soniox.common_utils import render_soniox_tokens_as_srt
tokens = [{"text": "no timestamps"}]
result = render_soniox_tokens_as_srt(tokens)
assert result == ""
def test_should_return_empty_string_for_empty_tokens(self):
from litellm.llms.soniox.common_utils import render_soniox_tokens_as_srt
assert render_soniox_tokens_as_srt([]) == ""
def test_should_format_long_timestamps_correctly(self):
from litellm.llms.soniox.common_utils import render_soniox_tokens_as_srt
tokens = [
{"text": "Late.", "start_ms": 3661000, "end_ms": 3662000},
]
result = render_soniox_tokens_as_srt(tokens)
# 3661000 ms = 1 hour, 1 minute, 1 second
assert "01:01:01,000" in result
class TestRenderSonioxTokensAsVtt:
def test_should_render_basic_vtt_with_header(self):
from litellm.llms.soniox.common_utils import render_soniox_tokens_as_vtt
tokens = [
{"text": "Hello ", "start_ms": 0, "end_ms": 500},
{"text": "world.", "start_ms": 500, "end_ms": 1000},
]
result = render_soniox_tokens_as_vtt(tokens)
assert result.startswith("WEBVTT\n")
assert "00:00:00.000 --> " in result
assert "Hello world." in result
def test_should_return_header_only_for_empty_tokens(self):
from litellm.llms.soniox.common_utils import render_soniox_tokens_as_vtt
result = render_soniox_tokens_as_vtt([])
assert result.startswith("WEBVTT\n")
# Only header + blank line
lines = result.strip().split("\n")
assert len(lines) == 1
def test_should_use_dot_separator_not_comma(self):
from litellm.llms.soniox.common_utils import render_soniox_tokens_as_vtt
tokens = [{"text": "Test.", "start_ms": 1500, "end_ms": 2500}]
result = render_soniox_tokens_as_vtt(tokens)
# VTT uses dots, not commas
assert "00:00:01.500" in result
assert "," not in result.replace("WEBVTT", "")
class TestBuildResponseWithResponseFormat:
def test_should_render_srt_when_response_format_is_srt(self):
cfg = SonioxAudioTranscriptionConfig()
payload = {
"transcript": {
"tokens": [
{"text": "Hello ", "start_ms": 0, "end_ms": 500},
{"text": "world.", "start_ms": 500, "end_ms": 1000},
]
}
}
resp = cfg._build_response_from_payload(payload, response_format="srt")
assert "00:00:00,000 --> " in resp.text
assert "Hello world." in resp.text
def test_should_render_vtt_when_response_format_is_vtt(self):
cfg = SonioxAudioTranscriptionConfig()
payload = {
"transcript": {
"tokens": [
{"text": "Hello ", "start_ms": 0, "end_ms": 500},
{"text": "world.", "start_ms": 500, "end_ms": 1000},
]
}
}
resp = cfg._build_response_from_payload(payload, response_format="vtt")
assert resp.text.startswith("WEBVTT\n")
assert "Hello world." in resp.text
def test_should_include_words_for_verbose_json(self):
cfg = SonioxAudioTranscriptionConfig()
payload = {
"transcript": {
"text": "Hello world.",
"tokens": [
{"text": "Hello ", "start_ms": 0, "end_ms": 500},
{"text": "world.", "start_ms": 500, "end_ms": 1000},
],
}
}
resp = cfg._build_response_from_payload(payload, response_format="verbose_json")
# text should be plain (not SRT/VTT)
assert resp.text == "Hello world."
# words should be populated
words = resp.get("words")
assert words is not None
assert len(words) == 2
assert words[0]["word"] == "Hello "
assert words[0]["start"] == 0.0
assert words[0]["end"] == 0.5
assert words[1]["start"] == 0.5
assert words[1]["end"] == 1.0
def test_should_default_to_plain_text_when_no_response_format(self):
cfg = SonioxAudioTranscriptionConfig()
payload = {
"transcript": {
"text": "Hello world.",
"tokens": [
{"text": "Hello ", "start_ms": 0, "end_ms": 500},
{"text": "world.", "start_ms": 500, "end_ms": 1000},
],
}
}
resp = cfg._build_response_from_payload(payload, response_format=None)
assert resp.text == "Hello world."
def test_should_fallback_to_plain_text_for_srt_with_no_timestamps(self):
cfg = SonioxAudioTranscriptionConfig()
payload = {
"transcript": {
"text": "No timestamps here.",
"tokens": [{"text": "No timestamps here."}],
}
}
# SRT was requested, but the tokens carry no start_ms/end_ms, so
# _group_tokens_into_cues produces no cues and the SRT rendering is empty.
resp = cfg._build_response_from_payload(payload, response_format="srt")
# The text may therefore be an empty SRT string rather than the plain
# transcript; this test only verifies that the call does not crash and
# that resp.text is still a string.
assert isinstance(resp.text, str)
class TestGetErrorClass:
def test_should_return_soniox_exception(self):
cfg = SonioxAudioTranscriptionConfig()
err = cfg.get_error_class(error_message="boom", status_code=500, headers={})
assert isinstance(err, SonioxException)
assert err.status_code == 500
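
The SRT and VTT tests above pin down the cue timestamp format: `HH:MM:SS,mmm` for SRT and `HH:MM:SS.mmm` for VTT. A minimal sketch of that conversion follows. `format_cue_timestamp` is a hypothetical helper written for illustration, not the function used in `litellm.llms.soniox.common_utils`.

```python
def format_cue_timestamp(ms: int, fraction_sep: str = ",") -> str:
    """Format a millisecond offset as HH:MM:SS<sep>mmm.

    Hypothetical helper: SRT uses "," before the milliseconds, VTT uses ".".
    """
    hours, rem = divmod(ms, 3_600_000)      # 1 hour = 3,600,000 ms
    minutes, rem = divmod(rem, 60_000)      # 1 minute = 60,000 ms
    seconds, millis = divmod(rem, 1_000)    # 1 second = 1,000 ms
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{fraction_sep}{millis:03d}"


srt_stamp = format_cue_timestamp(3_661_000)       # "01:01:01,000"
vtt_stamp = format_cue_timestamp(1_500, ".")      # "00:00:01.500"
```

The two sample values mirror the inputs used in `test_should_format_long_timestamps_correctly` and `test_should_use_dot_separator_not_comma`.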

@ -0,0 +1,42 @@
"""Tests verifying Soniox is correctly registered as a litellm provider."""
import pytest
import litellm
class TestProviderRegistration:
def test_should_expose_soniox_in_llm_providers_enum(self):
assert litellm.LlmProviders.SONIOX.value == "soniox"
def test_should_list_soniox_in_provider_list(self):
assert "soniox" in litellm.provider_list
def test_should_list_soniox_in_models_by_provider(self):
assert "soniox" in litellm.models_by_provider
def test_should_lazy_import_soniox_audio_transcription_config(self):
cls = litellm.SonioxAudioTranscriptionConfig
assert cls.__name__ == "SonioxAudioTranscriptionConfig"
# Calling again should return the same class (cached).
assert litellm.SonioxAudioTranscriptionConfig is cls
def test_should_resolve_soniox_via_get_llm_provider(self, monkeypatch):
monkeypatch.setenv("SONIOX_API_KEY", "test-key")
model, provider, api_key, api_base = litellm.get_llm_provider(
model="soniox/stt-async-v4"
)
assert provider == "soniox"
assert model == "stt-async-v4"
assert api_key == "test-key"
assert api_base == "https://api.soniox.com"
def test_should_return_soniox_config_from_provider_config_manager(self):
from litellm.utils import ProviderConfigManager
cfg = ProviderConfigManager.get_provider_audio_transcription_config(
model="stt-async-v4",
provider=litellm.LlmProviders.SONIOX,
)
assert cfg is not None
assert cfg.__class__.__name__ == "SonioxAudioTranscriptionConfig"
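
`test_should_lazy_import_soniox_audio_transcription_config` relies on the attribute being resolved on first access and the same object being returned afterwards. One common way to get that behavior is a name-to-module table plus a cache, sketched below. The names `_LAZY_ATTRS`, `_cache`, and `load_lazy` are assumptions for illustration, and the standard-library `OrderedDict` stands in for the config class. This is not a description of how litellm implements its lazy imports.

```python
import importlib
from typing import Any, Dict

# Hypothetical table: attribute name -> module that defines it.
_LAZY_ATTRS: Dict[str, str] = {"OrderedDict": "collections"}
_cache: Dict[str, Any] = {}


def load_lazy(name: str) -> Any:
    """Import the owning module on first access, then serve from the cache."""
    if name in _cache:
        return _cache[name]
    module_path = _LAZY_ATTRS.get(name)
    if module_path is None:
        raise AttributeError(f"no lazy attribute named {name!r}")
    value = getattr(importlib.import_module(module_path), name)
    _cache[name] = value
    return value


first = load_lazy("OrderedDict")
second = load_lazy("OrderedDict")  # served from the cache, same object
```

A function like this can be exposed as a module-level `__getattr__` (PEP 562) so that plain attribute access on the package triggers the lookup.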

@ -0,0 +1,384 @@
"""
Tests for You.com Search API integration.
"""
import os
import sys
import pytest
from unittest.mock import AsyncMock, patch, MagicMock
sys.path.insert(0, os.path.abspath("../.."))
import litellm
class TestYouComSearch:
"""
Tests for You.com Search functionality with mocked network responses.
"""
@pytest.fixture(autouse=True)
def _set_api_key(self, monkeypatch):
"""
Default fixture: YOUCOM_API_KEY is set, scoped to this test.
Tests that need the key absent should call `monkeypatch.delenv` themselves.
"""
monkeypatch.setenv("YOUCOM_API_KEY", "test-api-key")
@pytest.mark.asyncio
async def test_you_com_search_request_payload(self):
"""
Validate the You.com search request payload structure without real API calls.
"""
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"results": {
"web": [
{
"title": "Test Result 1",
"url": "https://example.com/1",
"description": "Brief description 1",
"snippets": ["This is a test snippet for result 1"],
"page_age": "2025-01-15T00:00:00Z",
},
{
"title": "Test Result 2",
"url": "https://example.com/2",
"description": "Brief description 2",
"snippets": ["This is a test snippet for result 2"],
"page_age": "2025-01-10T00:00:00Z",
},
],
"news": [],
},
"metadata": {
"search_uuid": "abc-123",
"query": "latest developments in AI",
"latency": 0.42,
},
}
with patch(
"litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
new_callable=AsyncMock,
) as mock_post:
mock_post.return_value = mock_response
response = await litellm.asearch(
query="latest developments in AI",
search_provider="you_com",
max_results=5,
)
assert mock_post.call_count == 1
call_args = mock_post.call_args
assert call_args.kwargs["url"] == "https://ydc-index.io/v1/search"
headers = call_args.kwargs.get("headers", {})
assert "X-API-Key" in headers
assert headers["X-API-Key"] == "test-api-key"
assert headers["Content-Type"] == "application/json"
json_data = call_args.kwargs.get("json")
assert json_data is not None
assert json_data["query"] == "latest developments in AI"
# max_results is mapped to You.com's `count` parameter
assert json_data["count"] == 5
assert hasattr(response, "results")
assert hasattr(response, "object")
assert response.object == "search"
assert len(response.results) == 2
first_result = response.results[0]
assert first_result.title == "Test Result 1"
assert first_result.url == "https://example.com/1"
assert first_result.snippet == "This is a test snippet for result 1"
assert first_result.date == "2025-01-15T00:00:00Z"
@pytest.mark.asyncio
async def test_you_com_search_domain_filter_and_country(self):
"""
Validate that Perplexity-spec optional params map to You.com's parameters:
- search_domain_filter -> include_domains
- country -> country (lowercased to match Tavily's convention)
"""
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"results": {"web": [], "news": []},
"metadata": {},
}
with patch(
"litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
new_callable=AsyncMock,
) as mock_post:
mock_post.return_value = mock_response
await litellm.asearch(
query="machine learning",
search_provider="you_com",
search_domain_filter=["arxiv.org", "nature.com"],
country="US",
)
call_args = mock_post.call_args
json_data = call_args.kwargs.get("json")
assert json_data["query"] == "machine learning"
assert json_data["include_domains"] == ["arxiv.org", "nature.com"]
# Country is normalized to lowercase, matching Tavily's behavior.
assert json_data["country"] == "us"
# search_domain_filter and max_tokens_per_page (perplexity-spec names)
# should NOT leak through to the upstream payload.
assert "search_domain_filter" not in json_data
assert "max_tokens_per_page" not in json_data
@pytest.mark.asyncio
async def test_you_com_search_snippet_fallback_to_description(self):
"""
When `snippets` is missing/empty, snippet falls back to `description`.
"""
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"results": {
"web": [
{
"title": "No snippets here",
"url": "https://example.com/3",
"description": "Fallback description text",
"snippets": [],
"page_age": None,
}
],
"news": [],
},
"metadata": {},
}
with patch(
"litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
new_callable=AsyncMock,
) as mock_post:
mock_post.return_value = mock_response
response = await litellm.asearch(
query="anything",
search_provider="you_com",
)
assert len(response.results) == 1
assert response.results[0].snippet == "Fallback description text"
assert response.results[0].date is None
@pytest.mark.asyncio
async def test_you_com_search_news_results_appended(self):
"""
News results are flattened in after web results.
"""
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"results": {
"web": [
{
"title": "Web Result",
"url": "https://example.com/web",
"snippets": ["web snippet"],
"description": "web desc",
"page_age": "2025-01-01T00:00:00Z",
}
],
"news": [
{
"title": "News Result",
"url": "https://news.example.com/article",
"description": "news desc",
"page_age": "2025-02-01T00:00:00Z",
}
],
},
"metadata": {},
}
with patch(
"litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
new_callable=AsyncMock,
) as mock_post:
mock_post.return_value = mock_response
response = await litellm.asearch(
query="anything",
search_provider="you_com",
)
assert len(response.results) == 2
assert response.results[0].title == "Web Result"
assert response.results[1].title == "News Result"
# News result has no `snippets` -> falls back to description
assert response.results[1].snippet == "news desc"
def test_you_com_search_complete_url_handles_trailing_slash(self):
"""
get_complete_url must normalize trailing slashes on api_base, so a custom
base like `https://x.example/v1/search/` does not become
`https://x.example/v1/search/v1/search`.
"""
from litellm.llms.you_com.search.transformation import YouComSearchConfig
config = YouComSearchConfig()
assert (
config.get_complete_url(
api_base="https://x.example/v1/search/", optional_params={}
)
== "https://x.example/v1/search"
)
assert (
config.get_complete_url(api_base="https://x.example/", optional_params={})
== "https://x.example/v1/search"
)
# With an API key configured, default base is the keyed endpoint.
assert (
config.get_complete_url(api_base=None, optional_params={})
== "https://ydc-index.io/v1/search"
)
@pytest.mark.asyncio
async def test_you_com_search_keyless_free_tier(self, monkeypatch):
"""
Without YOUCOM_API_KEY, the adapter targets the keyless free-tier
endpoint and sends no X-API-Key header.
"""
monkeypatch.delenv("YOUCOM_API_KEY", raising=False)
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"results": {
"web": [
{
"title": "Keyless Result",
"url": "https://example.com/keyless",
"snippets": ["snippet from keyless tier"],
"description": "desc",
"page_age": "2025-03-01T00:00:00Z",
}
],
"news": [],
},
"metadata": {},
}
with patch(
"litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
new_callable=AsyncMock,
) as mock_post:
mock_post.return_value = mock_response
response = await litellm.asearch(
query="hello world",
search_provider="you_com",
)
call_args = mock_post.call_args
assert call_args.kwargs["url"] == "https://api.you.com/v1/agents/search"
headers = call_args.kwargs.get("headers", {})
assert "X-API-Key" not in headers
assert headers["Content-Type"] == "application/json"
assert len(response.results) == 1
assert response.results[0].title == "Keyless Result"
@pytest.mark.asyncio
async def test_you_com_search_programmatic_api_key_selects_keyed_endpoint(
self, monkeypatch
):
"""
When the key is passed programmatically (no YOUCOM_API_KEY in the env),
the keyed endpoint must be selected and the X-API-Key header sent, instead
of silently falling back to the keyless free tier.
"""
monkeypatch.delenv("YOUCOM_API_KEY", raising=False)
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.json.return_value = {
"results": {"web": [], "news": []},
"metadata": {},
}
with patch(
"litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
new_callable=AsyncMock,
) as mock_post:
mock_post.return_value = mock_response
await litellm.asearch(
query="anything",
search_provider="you_com",
api_key="my-programmatic-key",
)
call_args = mock_post.call_args
assert call_args.kwargs["url"] == "https://ydc-index.io/v1/search"
headers = call_args.kwargs.get("headers", {})
assert headers["X-API-Key"] == "my-programmatic-key"
def test_you_com_search_complete_url_uses_programmatic_api_key(self, monkeypatch):
"""
get_complete_url selects the keyed endpoint from a forwarded api_key even
when YOUCOM_API_KEY is absent from the environment.
"""
monkeypatch.delenv("YOUCOM_API_KEY", raising=False)
from litellm.llms.you_com.search.transformation import YouComSearchConfig
config = YouComSearchConfig()
assert (
config.get_complete_url(
api_base=None, optional_params={}, api_key="my-programmatic-key"
)
== "https://ydc-index.io/v1/search"
)
assert (
config.get_complete_url(api_base=None, optional_params={}, api_key=None)
== "https://api.you.com/v1/agents/search"
)
def test_you_com_search_validate_environment_keyless(self, monkeypatch):
"""
validate_environment must NOT raise when no key is configured:
the keyless free tier is the default behavior.
"""
monkeypatch.delenv("YOUCOM_API_KEY", raising=False)
from litellm.llms.you_com.search.transformation import YouComSearchConfig
config = YouComSearchConfig()
headers = config.validate_environment(headers={}, api_key=None)
assert "X-API-Key" not in headers
assert headers["Content-Type"] == "application/json"
def test_you_com_search_pins_identity_accept_encoding(self, monkeypatch):
"""
The adapter pins Accept-Encoding: identity to work around the keyless
endpoint advertising gzip content-encoding while returning bytes httpx
can't decode. Without this, every keyless request raises DecodingError.
"""
monkeypatch.delenv("YOUCOM_API_KEY", raising=False)
from litellm.llms.you_com.search.transformation import YouComSearchConfig
config = YouComSearchConfig()
headers = config.validate_environment(headers={}, api_key=None)
assert headers["Accept-Encoding"] == "identity"
# setdefault: a caller-supplied Accept-Encoding should win
headers = config.validate_environment(
headers={"Accept-Encoding": "gzip"}, api_key=None
)
assert headers["Accept-Encoding"] == "gzip"
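
Taken together, the You.com tests above describe three rules: the default endpoint depends on whether an API key is available, a custom `api_base` is normalized so `/v1/search` appears exactly once, and `Accept-Encoding: identity` is a default that a caller-supplied header overrides. The sketch below restates those rules under stated assumptions. `resolve_search_url` and `build_headers` are hypothetical names, and the key is passed explicitly here, whereas the real adapter also reads `YOUCOM_API_KEY` from the environment.

```python
from typing import Dict, Optional

# Endpoint URLs as asserted in the tests above.
KEYED_URL = "https://ydc-index.io/v1/search"
KEYLESS_URL = "https://api.you.com/v1/agents/search"
SEARCH_PATH = "/v1/search"


def resolve_search_url(api_base: Optional[str], api_key: Optional[str]) -> str:
    """Pick the endpoint (hypothetical helper, not the adapter's code)."""
    if api_base is None:
        # A key selects the keyed endpoint; no key means the free tier.
        return KEYED_URL if api_key else KEYLESS_URL
    base = api_base.rstrip("/")
    # Append the path only if the custom base does not already end with it.
    return base if base.endswith(SEARCH_PATH) else base + SEARCH_PATH


def build_headers(headers: Dict[str, str], api_key: Optional[str]) -> Dict[str, str]:
    """Merge caller headers with defaults (hypothetical helper)."""
    merged = dict(headers)
    merged["Content-Type"] = "application/json"
    # setdefault keeps a caller-supplied Accept-Encoding intact.
    merged.setdefault("Accept-Encoding", "identity")
    if api_key:
        merged["X-API-Key"] = api_key
    return merged
```

Using `setdefault` rather than plain assignment is what lets the caller's `Accept-Encoding: gzip` win in the last assertion of `test_you_com_search_pins_identity_accept_encoding`.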

@ -1,4 +1,5 @@
import importlib
import asyncio
import json
import logging
import os
@ -501,7 +502,12 @@ class TestMCPServerManager:
captured_extra_headers = None
async def capture_create_mcp_client(
-server, mcp_auth_header, extra_headers, stdio_env, subject_token=None, **kwargs
+server,
+mcp_auth_header,
+extra_headers,
+stdio_env,
+subject_token=None,
+**kwargs,
): # pragma: no cover - helper
nonlocal captured_extra_headers
captured_extra_headers = extra_headers
@ -554,7 +560,12 @@ class TestMCPServerManager:
captured_extra_headers = None
async def capture_create_mcp_client(
-server, mcp_auth_header, extra_headers, stdio_env, subject_token=None, **kwargs
+server,
+mcp_auth_header,
+extra_headers,
+stdio_env,
+subject_token=None,
+**kwargs,
): # pragma: no cover - helper
nonlocal captured_extra_headers
captured_extra_headers = extra_headers
@ -610,7 +621,12 @@ class TestMCPServerManager:
captured_extra_headers = None
async def capture_create_mcp_client(
-server, mcp_auth_header, extra_headers, stdio_env, subject_token=None, **kwargs
+server,
+mcp_auth_header,
+extra_headers,
+stdio_env,
+subject_token=None,
+**kwargs,
): # pragma: no cover - helper
nonlocal captured_extra_headers
captured_extra_headers = extra_headers
@ -3066,6 +3082,121 @@ class TestMCPServerTimestamps:
rebuilt_table = manager._build_mcp_server_table(mcp_server)
assert rebuilt_table.source_url == "https://github.com/org/mcp-server"
@pytest.mark.asyncio
async def test_round_trip_timeout_preserved(self):
"""timeout survives the full round-trip: LiteLLM_MCPServerTable -> MCPServer -> LiteLLM_MCPServerTable."""
manager = MCPServerManager()
table_record = LiteLLM_MCPServerTable(
server_id="timeout-server",
server_name="timeout_server",
url="https://example.com/mcp",
transport=MCPTransport.http,
timeout=120.0,
)
mcp_server = await manager.build_mcp_server_from_table(table_record)
assert mcp_server.timeout == 120.0
rebuilt_table = manager._build_mcp_server_table(mcp_server)
assert rebuilt_table.timeout == 120.0
@pytest.mark.asyncio
async def test_create_mcp_client_uses_server_timeout(self):
"""_create_mcp_client must pass server.timeout to MCPClient when set."""
manager = MCPServerManager()
server = MCPServer(
server_id="timeout-client-server",
name="timeout_client_server",
url="https://example.com/mcp",
transport=MCPTransport.http,
timeout=180.0,
)
client = await manager._create_mcp_client(server)
assert client.timeout == 180.0
@pytest.mark.asyncio
async def test_create_mcp_client_falls_back_to_global_timeout(self):
"""_create_mcp_client must fall back to MCP_CLIENT_TIMEOUT when server.timeout is None."""
from litellm.constants import MCP_CLIENT_TIMEOUT
manager = MCPServerManager()
server = MCPServer(
server_id="default-timeout-server",
name="default_timeout_server",
url="https://example.com/mcp",
transport=MCPTransport.http,
)
client = await manager._create_mcp_client(server)
assert client.timeout == MCP_CLIENT_TIMEOUT
@pytest.mark.asyncio
async def test_create_mcp_client_zero_timeout_not_treated_as_falsy(self):
"""server.timeout=0.0 must be passed through, not fall back to MCP_CLIENT_TIMEOUT."""
manager = MCPServerManager()
server = MCPServer(
server_id="zero-timeout-server",
name="zero_timeout_server",
url="https://example.com/mcp",
transport=MCPTransport.http,
timeout=0.0,
)
client = await manager._create_mcp_client(server)
assert client.timeout == 0.0
@pytest.mark.asyncio
async def test_load_servers_from_config_preserves_timeout(self):
"""timeout from proxy config is loaded into MCPServer."""
manager = MCPServerManager()
config = {
"my_server": {
"url": "https://example.com/mcp",
"transport": MCPTransport.http,
"timeout": 90.0,
}
}
await manager.load_servers_from_config(config)
servers = list(manager.config_mcp_servers.values())
assert len(servers) == 1
assert servers[0].timeout == 90.0
@pytest.mark.asyncio
async def test_call_regular_mcp_tool_timeout_returns_504(self):
"""When the MCP client call is cancelled (timeout), _call_regular_mcp_tool raises HTTPException 504."""
from unittest.mock import AsyncMock, patch
manager = MCPServerManager()
async def _slow_call(*args, **kwargs):
await asyncio.sleep(999)
mock_client = AsyncMock()
mock_client.call_tool = _slow_call
server = MCPServer(
server_id="timeout-tool-server",
name="timeout_tool_server",
url="https://example.com/mcp",
transport=MCPTransport.http,
timeout=0.01,
)
with patch.object(manager, "_create_mcp_client", return_value=mock_client):
with pytest.raises(HTTPException) as exc_info:
await manager._call_regular_mcp_tool(
mcp_server=server,
original_tool_name="some_tool",
arguments={},
tasks=[],
mcp_auth_header=None,
mcp_server_auth_headers=None,
oauth2_headers=None,
raw_headers=None,
proxy_logging_obj=None,
)
assert exc_info.value.status_code == 504
assert exc_info.value.detail["error"] == "timeout"
assert "0.01s" in exc_info.value.detail["message"]
class TestInternalDelegatePkceWarningLog:
@pytest.mark.asyncio
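
The timeout tests added in this hunk hinge on two details: a per-server timeout of `0.0` must not be replaced by the global default, and a timed-out tool call must surface as a 504. A minimal sketch of both, assuming hypothetical names (`DEFAULT_TIMEOUT` stands in for `MCP_CLIENT_TIMEOUT`, and `GatewayTimeout` stands in for the `HTTPException` raised by the manager):

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

DEFAULT_TIMEOUT = 60.0  # hypothetical stand-in for MCP_CLIENT_TIMEOUT


class GatewayTimeout(Exception):
    """Hypothetical stand-in for an HTTP 504 error."""

    status_code = 504


def resolve_timeout(server_timeout: Optional[float]) -> float:
    # Compare against None explicitly: `server_timeout or DEFAULT_TIMEOUT`
    # would wrongly discard a configured value of 0.0.
    return DEFAULT_TIMEOUT if server_timeout is None else server_timeout


async def call_with_timeout(
    call: Callable[[], Awaitable[Any]], timeout: float
) -> Any:
    """Await `call()` and translate a timeout into a 504-style error."""
    try:
        return await asyncio.wait_for(call(), timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise GatewayTimeout(f"tool call exceeded {timeout}s") from exc
```

The explicit `is None` check is the design choice that `test_create_mcp_client_zero_timeout_not_treated_as_falsy` guards against regressing.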

@ -0,0 +1 @@
<svg viewBox="0 0 100 17.5" width="92" fill="white" xmlns="http://www.w3.org/2000/svg"><title>Soniox</title><path d="m0 14.866 2.1606-3.5214c1.8927 1.2576 3.9669 1.8995 5.6694 1.8995 1.0025 0 1.4606-0.3036 1.4606-0.8847v-0.0607c0-0.6419-0.9161-0.9194-2.6532-1.4138-3.2582-0.8587-5.8509-1.9602-5.8509-5.2995v-0.06938c0-3.5214 2.8088-5.4903 6.6114-5.4903 2.4112 0 4.9089 0.70255 6.8016 1.9342l-1.9791 3.6775c-1.7112-0.95408-3.5693-1.5352-4.8744-1.5352-0.88152 0-1.3396 0.33827-1.3396 0.79796v0.06071c0 0.64184 0.94202 0.95409 2.6792 1.4745 3.2582 0.91939 5.8509 2.0556 5.8509 5.2735v0.0607c0 3.6515-2.7137 5.551-6.741 5.551-2.7656-0.0087-5.5052-0.798-7.7955-2.4546z"></path><path d="m16.135 8.7342v-0.06071c0-4.7184 3.8372-8.6735 9.1436-8.6735 5.2719 0 9.0832 3.8944 9.0832 8.6127v0.06072c0 4.7184-3.8372 8.6735-9.1437 8.6735-5.2718 0-9.0831-3.8944-9.0831-8.6128zm12.583 0v-0.06071c0-2.0209-1.4606-3.7383-3.5088-3.7383-2.1001 0-3.4483 1.6826-3.4483 3.6775v0.06072c0 2.0209 1.4605 3.7383 3.5088 3.7383 2.1087 0 3.4483-1.6827 3.4483-3.6776z"></path><path d="m36.877 0.36428h5.7904v2.3332c1.063-1.3791 2.5927-2.6974 4.9348-2.6974 3.5089 0 5.609 2.3332 5.609 6.0974v10.85h-5.7905v-8.977c0-1.8041-0.942-2.7929-2.3161-2.7929-1.4001 0-2.4372 0.9801-2.4372 2.7929v8.977h-5.7904z"></path><path d="m55.951 0.36426h5.7904v16.584h-5.7904z"></path><path d="m64.29 8.7342v-0.06071c0-4.7184 3.8373-8.6735 9.1437-8.6735 5.2719 0 9.0832 3.8944 9.0832 8.6127v0.06072c0 4.7184-3.8372 8.6735-9.1437 8.6735-5.2719 0-9.0832-3.8944-9.0832-8.6128zm12.592 0v-0.06071c0-2.0209-1.4605-3.7383-3.5088-3.7383-2.1001 0-3.4483 1.6826-3.4483 3.6775v0.06072c0 2.0209 1.4606 3.7383 3.5088 3.7383 2.1088 0 3.4483-1.6827 3.4483-3.6776z"></path><path d="m88.082 8.578-5.4533-8.2138h6.2484l2.4372 4.0765 2.4371-4.0765h6.1275l-5.4274 8.1791 5.5484 8.3959h-6.2225l-2.5582-4.2587-2.5927 4.2587h-6.0929z"></path></svg>

@ -87,6 +87,7 @@ export enum Providers {
Sambanova = "Sambanova",
SAP = "SAP Generative AI Hub",
Snowflake = "Snowflake",
Soniox = "Soniox",
TEXT_COMPLETION_CODESTRAL = "Text-Completion-Codestral",
TogetherAI = "TogetherAI",
TOPAZ = "Topaz",
@ -195,6 +196,7 @@ export const provider_map: Record<string, string> = {
Sambanova: "sambanova",
SAP: "sap",
Snowflake: "snowflake",
Soniox: "soniox",
TEXT_COMPLETION_CODESTRAL: "text-completion-codestral",
TogetherAI: "together_ai",
TOPAZ: "topaz",
@ -286,6 +288,7 @@ export const providerLogoMap: Record<string, string> = {
[Providers.Sambanova]: `${asset_logos_folder}sambanova.svg`,
[Providers.SAP]: `${asset_logos_folder}sap.png`,
[Providers.Snowflake]: `${asset_logos_folder}snowflake.svg`,
[Providers.Soniox]: `${asset_logos_folder}soniox.svg`,
[Providers.TEXT_COMPLETION_CODESTRAL]: `${asset_logos_folder}mistral.svg`,
[Providers.TogetherAI]: `${asset_logos_folder}togetherai.svg`,
[Providers.TOPAZ]: `${asset_logos_folder}topaz.svg`,