litellm/litellm/__init__.py
Sameer Kankute d671a09c20
Litellm oss staging 050626 (#29774)
* Mark xAI models retiring on 2026-05-15 (#28788)

Per https://docs.x.ai/developers/migration/may-15-retirement, xAI is
retiring the following slugs on 2026-05-15 (auto-redirect to grok-4.3
with various reasoning efforts; callers continuing to use the old slugs
will be billed at grok-4.3 pricing):

  grok-4-1-fast-reasoning{,-latest}      -> grok-4.3 (low effort)
  grok-4-1-fast-non-reasoning{,-latest}  -> grok-4.3 (none)
  grok-4-fast-reasoning                  -> grok-4.3 (low effort)
  grok-4-fast-non-reasoning              -> grok-4.3 (none)
  grok-4-0709                            -> grok-4.3 (low effort)
  grok-code-fast-1{,-0825}               -> grok-build-0.1
  grok-3                                 -> grok-4.3 (none)

Only the direct xai/ slugs are tagged; third-party hosts (azure_ai,
oci, vercel_ai_gateway, perplexity/xai) run their own schedules. The
grok-3 retirement list explicitly names only the base grok-3 slug — the
-mini / -fast / -beta / -latest variants are not listed, so they remain
untouched.

* feat(moonshot): advertise json_schema response support on live models (#29683)

litellm.responses() already routes Moonshot through the responses->chat-completions
bridge, and Moonshot honors response_format json_schema on chat completions. The
cost-map entries left supports_response_schema unset, so discovery layers that gate
on that flag dropped Moonshot from structured-output / responses listings even though
the capability works end to end.

Set supports_response_schema on the nine models currently live on api.moonshot.ai:
kimi-k2.5, kimi-k2.6, the moonshot-v1 8k/32k/128k text and vision-preview variants,
and moonshot-v1-auto. Verified against the live API that each honors json_schema and
that litellm.responses() returns schema-valid structured output through the bridge.

* chore(moonshot): mark models retired from api.moonshot.ai as deprecated (#29685)

Thirteen Moonshot/Kimi models in the cost map no longer resolve on
api.moonshot.ai (all return 404). Stamp each with its deprecation_date from
platform.kimi.ai/docs/models rather than deleting the entries, so historical
cost calculation keeps resolving the names while tooling can surface the
retirement.

Dates: kimi-thinking-preview 2025-11-11; kimi-latest and its 8k/32k/128k context
variants 2026-01-28; the kimi-k2 preview/turbo/thinking series 2026-05-25; the
moonshot-v1 -0430 snapshots use their own 2024-04-30 snapshot date (Moonshot
publishes no discontinuation date for them).

* fix(moonshot): drop temperature for reasoning models (kimi-k2.5/k2.6) (#29687)

Kimi reasoning models reject every temperature except 1; a request with
temperature=0.2 returns "invalid temperature: only 1 is allowed for this model".
litellm only clamped temperature into [0.3, 1], so any value below 1 still 400'd.

Drop the temperature param entirely for reasoning models (gated on
supports_reasoning, the same signal transform_request already uses) so the model
default is used; the non-reasoning moonshot-v1 models keep the existing clamp.
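
A minimal sketch of the behavior described above, assuming a simplified clamp for the non-reasoning models. `adjust_temperature` and its arguments are hypothetical names for illustration, not the LiteLLM implementation:

```python
from typing import Any, Dict


def adjust_temperature(params: Dict[str, Any], supports_reasoning: bool) -> Dict[str, Any]:
    """Return a copy of params with temperature handled per model type."""
    out = dict(params)
    if "temperature" not in out:
        return out
    if supports_reasoning:
        # Reasoning models accept only temperature=1, so omit the param
        # and let the provider apply its own default.
        out.pop("temperature")
    else:
        # Assumed clamp range [0.3, 1] for the non-reasoning moonshot-v1 models.
        out["temperature"] = min(1.0, max(0.3, float(out["temperature"])))
    return out
```

Dropping the parameter rather than forcing it to 1 keeps the request valid even if the provider later changes the allowed default.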

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(mcp): add per-server timeout configuration (#29672)

* feat(mcp): add per-server timeout configuration

* fix(mcp): address timeout field review comments

- use is not None guard instead of or for 0.0 edge case
- copy timeout in both LiteLLM_MCPServerTable constructions (health check path + _build_mcp_server_table)
- add timeout Float? column to all three schema.prisma files
- extend round-trip test to cover _build_mcp_server_table direction
- add test for zero timeout not treated as falsy
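
The `is not None` guard in the first bullet can be illustrated with a small sketch. The function names and the default value are hypothetical, chosen only to show why `or` mishandles `0.0`:

```python
from typing import Optional

DEFAULT_TIMEOUT_SECONDS = 30.0  # hypothetical default, for illustration only


def timeout_with_or(timeout: Optional[float]) -> float:
    # Buggy pattern: 0.0 is falsy, so an explicit zero is replaced by the default.
    return timeout or DEFAULT_TIMEOUT_SECONDS


def timeout_with_none_check(timeout: Optional[float]) -> float:
    # Only a missing value (None) falls back to the default.
    return timeout if timeout is not None else DEFAULT_TIMEOUT_SECONDS
```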

* fix(mcp): forward timeout in _build_temporary_mcp_server_record

* fix(mcp): return 504 instead of 500 when per-server timeout fires

* test(mcp): add 504 timeout regression test; fix black formatting
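
A rough sketch of mapping a per-server timeout to a 504, assuming the call is wrapped in `asyncio.wait_for`. `call_with_timeout`, `slow_tool`, and `fast_tool` are hypothetical stand-ins, and the (status, body) tuple is a simplification of a real HTTP response:

```python
import asyncio
from typing import Awaitable, Tuple


async def call_with_timeout(awaitable: Awaitable[str], timeout: float) -> Tuple[int, str]:
    """Await the call, translating a timeout into a 504 instead of a generic 500."""
    try:
        result = await asyncio.wait_for(awaitable, timeout=timeout)
        return 200, result
    except asyncio.TimeoutError:
        # Gateway Timeout: the upstream server did not answer in time.
        return 504, "upstream MCP server timed out"


async def slow_tool() -> str:
    # Stand-in for an MCP tool call that exceeds the configured timeout.
    await asyncio.sleep(0.2)
    return "done"


async def fast_tool() -> str:
    # Stand-in for an MCP tool call that returns immediately.
    return "done"
```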

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7 (#28567)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.
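
The double prefix-strip can be reproduced with a toy resolver. `resolve_provider` is a hypothetical, heavily simplified stand-in for provider resolution, not the behavior of `get_llm_provider()` itself:

```python
from typing import Optional, Tuple


def resolve_provider(
    model: str, custom_llm_provider: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Strip one leading 'provider/' prefix unless the provider is already known."""
    if custom_llm_provider is not None:
        # Provider already resolved upstream: leave the model string untouched.
        return model, custom_llm_provider
    if "/" not in model:
        return model, None
    provider, _, rest = model.partition("/")
    return rest, provider


deployment = "openai/openai/openai/gpt-5.5"
# First resolution, as done once at the completion() entry point.
model, provider = resolve_provider(deployment)
# Bridge without forwarding the provider: a second prefix is stripped.
buggy_model, _ = resolve_provider(model)
# Bridge with the provider forwarded: the model string is preserved.
fixed_model, _ = resolve_provider(model, custom_llm_provider=provider)
```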

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.
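
The failure mode and the fix can be shown in isolation. `fake_responses` is a hypothetical placeholder for the downstream call; only the keyword-argument mechanics are the point here:

```python
from typing import Any, Dict


def fake_responses(**kwargs: Any) -> Dict[str, Any]:
    # Placeholder for the downstream call; simply echoes its keyword arguments.
    return kwargs


def call_with_duplicate_kwarg(request_data: Dict[str, Any], provider: str) -> Dict[str, Any]:
    # Raises TypeError when request_data already contains custom_llm_provider.
    return fake_responses(custom_llm_provider=provider, **request_data)


def call_with_overwrite(request_data: Dict[str, Any], provider: str) -> Dict[str, Any]:
    # Assign into the dict first so the resolved provider wins without duplication.
    data = dict(request_data)
    data["custom_llm_provider"] = provider
    return fake_responses(**data)
```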

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7

AWS Bedrock documents jp.anthropic.claude-opus-4-7 alongside the
existing us./eu./au./global. profiles for Claude Opus 4.7
(ap-northeast-1 Tokyo / ap-northeast-3 Osaka), but the entry is
missing from model_prices_and_context_window.json. Tokyo-region
users currently get an "unknown model" error when routing through
the JP geo profile.

Adds the entry to both the canonical file and the bundled backup,
mirroring the recent pattern for sonnet-4-6 (#27831). Pricing matches
the other regional profiles (10% premium over base/global).

Regression test pins all six documented profiles (base, global, us, eu,
au, jp) and asserts pricing parity between jp. and au. variants.

Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-7.html

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(soniox): add soniox audio transcription integration (#29508)

* feat(openmeter): add OPENMETER_TRUST_REQUEST_USER to prevent forged attribution (#29650)

The OpenMeter callback resolves the CloudEvent subject from kwargs["user"]
first, then falls back to the key-bound user_api_key_user_id. For
multi-tenant proxy deployments, a client can set `"user": "..."` in the
request body and cause their usage to be attributed to that arbitrary
string — a billing-attribution forgery risk.

Adds OPENMETER_TRUST_REQUEST_USER env var (default "true" for backward
compatibility). When set to "false", the request-supplied `user` field is
ignored and the subject is resolved solely from user_api_key_user_id.
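
A sketch of the subject resolution described above. `resolve_subject` is a hypothetical helper, the environment is passed in as a mapping so the example stays testable, and the exact string comparison used to parse the flag is an assumption:

```python
from typing import Any, Mapping, Optional


def resolve_subject(
    kwargs: Mapping[str, Any], key_user_id: Optional[str], env: Mapping[str, str]
) -> Optional[str]:
    """Pick the billing subject for an event."""
    # Default "true" keeps the historical behavior of trusting the request body.
    trust_request_user = (
        env.get("OPENMETER_TRUST_REQUEST_USER", "true").strip().lower() != "false"
    )
    if trust_request_user and kwargs.get("user"):
        return kwargs["user"]
    # Otherwise attribute usage to the identity bound to the API key.
    return key_user_id
```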

Matches the existing env-var-driven config pattern in this file
(OPENMETER_API_KEY, OPENMETER_API_ENDPOINT, OPENMETER_EVENT_TYPE).

* feat(search): add you_com as a search provider (#28370)

* feat(search): add you_com as a search provider

Registers You.com Search API as a first-class `search_provider` in the
`search_tools` registry, alongside Tavily, Exa, Perplexity, etc.

- New adapter: litellm/llms/you_com/search/transformation.py
  - POSTs to https://ydc-index.io/v1/search
  - Auth: X-API-Key from YOUCOM_API_KEY (or explicit api_key)
  - Maps Perplexity unified spec: max_results -> count,
    search_domain_filter -> include_domains, country -> country
  - Flattens results.web + results.news into a single SearchResult list;
    snippet prefers snippets[0], falls back to description; page_age -> date
- Registry: SearchProviders.YOU_COM in litellm/types/utils.py and wired
  into ProviderConfigManager.get_provider_search_config()
- Pricing entry: model_prices_and_context_window.json (placeholder $0.0;
  happy to adjust to maintainers' preferred public number)
- Docs: example router config snippet and example proxy yaml updated
- Tests: tests/search_tests/test_you_com_search.py - 5 mocked tests
  (payload shape, domain filter mapping, snippet fallback, news flattening,
  missing-api-key error)
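
The flattening step listed above might look roughly like this. Plain dicts stand in for the `SearchResult` type, and `flatten_results` is a hypothetical name rather than the adapter's actual method:

```python
from typing import Any, Dict, List


def flatten_results(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Merge results.web and results.news into one list of simple result dicts."""
    results = payload.get("results") or {}
    items = (results.get("web") or []) + (results.get("news") or [])
    flattened = []
    for item in items:
        snippets = item.get("snippets") or []
        flattened.append(
            {
                "title": item.get("title"),
                "url": item.get("url"),
                # Prefer the first snippet, fall back to the description.
                "snippet": snippets[0] if snippets else item.get("description"),
                "date": item.get("page_age"),
            }
        )
    return flattened
```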

Refs upstream expansion signal: #15942

* review fixups: normalize api_base, lowercase country, scope env-var to test

Addresses Greptile inline review comments on #28370:

- get_complete_url: strip trailing slashes from api_base *before* the
  endswith("/v1/search") check, so a custom base like ".../v1/search/"
  doesn't become ".../v1/search/v1/search".
- transform_search_request: .lower() country before sending, matching
  Tavily's convention so callers using the unified spec form ("US") get
  consistent behavior across providers.
- Tests: replace direct os.environ writes with an autouse monkeypatch
  fixture so YOUCOM_API_KEY is set per-test and removed afterwards.
  The missing-key test now uses monkeypatch.delenv. New test asserts the
  trailing-slash normalization above.
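
The trailing-slash normalization in the first bullet can be sketched as follows. `normalize_search_url` is a hypothetical name, and appending the path when it is missing is an assumption about what the adapter does in that case:

```python
def normalize_search_url(api_base: str) -> str:
    """Strip trailing slashes first, then append /v1/search only if it is missing."""
    base = api_base.rstrip("/")
    if base.endswith("/v1/search"):
        return base
    return f"{base}/v1/search"
```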

Reverts the ARCHITECTURE.md / example yaml edits per the reviewer note
that documentation changes belong in the litellm-docs repo.

* support keyless free tier (api.you.com/v1/agents/search) as default

You.com offers an IP-throttled keyless endpoint that returns the same
response shape as the keyed one (~100 queries/day, no signup). This is a
significant onboarding lever - mirrors the keyless DuckDuckGo/SearXNG
providers already in the search_tools registry.

Behavior:
- YOUCOM_API_KEY set        -> keyed:  POST https://ydc-index.io/v1/search
                                       (X-API-Key header)
- no key                    -> free:   POST https://api.you.com/v1/agents/search
                                       (no auth)
- YOUCOM_API_BASE override  -> honored as-is
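
The selection rules above, as a small sketch. `select_endpoint` is a hypothetical helper that takes the already-resolved key and base as arguments instead of reading the environment:

```python
from typing import Dict, Optional, Tuple

KEYED_URL = "https://ydc-index.io/v1/search"
KEYLESS_URL = "https://api.you.com/v1/agents/search"


def select_endpoint(
    api_key: Optional[str], api_base: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a search request."""
    headers: Dict[str, str] = {}
    if api_key:
        # Only the keyed mode sends an auth header.
        headers["X-API-Key"] = api_key
    if api_base:
        # An explicit base override is honored as-is.
        return api_base, headers
    return (KEYED_URL if api_key else KEYLESS_URL), headers
```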

Tests:
- New: test_you_com_search_keyless_free_tier - asserts URL + absence of
  X-API-Key when no key is configured.
- New: test_you_com_search_validate_environment_keyless - asserts the
  config no longer raises when the key is absent.
- Removed: test_you_com_search_raises_without_api_key (the precondition
  no longer holds).
- Existing payload/domain-filter/etc tests still cover keyed mode via
  the autouse YOUCOM_API_KEY fixture.

Verified both endpoints accept POST + return identical JSON shape:
  results.web[] / results.news[] with title, url, snippets, description,
  page_age.

* register you_com in provider_endpoints_support.json

Adding `litellm/llms/you_com/` requires a corresponding entry in
provider_endpoints_support.json or the
code-quality/check_provider_folders_documented CI check fails.

Follows the compact tavily/serper pattern - endpoints: { search: true }.
Local run of the check now reports "All 114 provider folders are documented".

* move tests under tests/test_litellm/llms/ so CI exercises them

The litellm CI workflows scope unit tests to `tests/test_litellm/...`
(see test-unit-llm-providers.yml: `tests/test_litellm/llms` path), so
tests living under `tests/search_tests/` are never run in CI - which is
why codecov reports 0% patch coverage for the new adapter even though
the unit tests exist and pass locally.

Move test_you_com_search.py into `tests/test_litellm/llms/you_com/` so
the test-unit-llm-providers job picks it up. 7/7 tests still pass at
the new location.

(Sibling search-only providers - tavily, exa_ai, brave, etc. - still
live only in `tests/search_tests/` and would benefit from the same
move, but that is out of scope for this PR.)

* fix(you_com): pin Accept-Encoding: identity to dodge keyless gzip bug

The keyless free-tier endpoint (api.you.com/v1/agents/search) advertises
Content-Encoding: gzip but returns a body that httpx's decoder rejects
with `zlib.error: Error -3 while decompressing data: incorrect header
check`, surfacing as litellm.APIConnectionError in user code. curl works
because it doesn't request compression by default.

Pin Accept-Encoding: identity in validate_environment so the upstream
server skips compression entirely. Harmless on the keyed endpoint
(ydc-index.io/v1/search) which negotiates content-encoding correctly.

The header uses setdefault so a caller-supplied Accept-Encoding still
takes precedence. (Server-side bug has been flagged to the You.com team
separately - once fixed there, this workaround can be removed.)
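
A minimal illustration of the `setdefault` behavior described above. `build_headers` is a hypothetical helper, and a plain dict is case-sensitive, so this sketch assumes callers spell the header as `Accept-Encoding`:

```python
from typing import Dict, Optional


def build_headers(headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Pin Accept-Encoding to identity unless the caller already set it."""
    out = dict(headers or {})
    # setdefault leaves an existing caller-supplied value untouched.
    out.setdefault("Accept-Encoding", "identity")
    return out
```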

New unit test: test_you_com_search_pins_identity_accept_encoding.

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* docs: fix README typo (#29419)

Correct clear spelling mistakes in documentation without changing behavior.

Confidence: high
Scope-risk: narrow
Tested: git diff --check; uvx codespell on changed files
Not-tested: Full docs build not run; text-only changes

* Fix(langfuse): pass httpx_client to Langfuse in langfuse_prompt_management to respect SSL_VERIFY (#29480)

* fix(langfuse): pass ssl_verify to Langfuse httpx client

* fix_langfuse_

* add unit tests

* addressed comments

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(models): add minimax/MiniMax-M3 to model cost map (#29412)

Add MiniMax's new flagship MiniMax-M3 to the native minimax provider:
512K context, 128K max output, native multimodal (supports_vision),
reasoning, prompt caching. Pricing (USD/M tokens): input 0.6 / output
2.4 / cache read 0.12. M3 has no active prompt-cache-write tier, so
cache_creation_input_token_cost is omitted.

Updated both the root model_prices_and_context_window.json (remote
source) and the bundled litellm/model_prices_and_context_window_backup.json
(local fallback), keeping them in sync.

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log (#29394)

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log

* fix(logging): extend terminal event handling to ResponseIncompleteEvent and ResponseFailedEvent; fix return type annotation

* feat(provider): Add Neosantara provider as OpenAI Compatible (#29646)

* Add Neosantara provider

* Register Neosantara provider enum

* Address Neosantara provider review feedback

* Add Neosantara packaged endpoint support

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix: address greptile and veria review feedback

- langfuse: guard httpx_client injection behind version check (>= 2.7.3)
- soniox: propagate audio_transcription_duration in _hidden_params for spend tracking
- soniox: give SONIOX_API_BASE env var priority over caller-supplied api_base
- mcp: replace CancelledError catch with asyncio.wait_for + TimeoutError

* chore(mcp): add migration for per-server timeout column

* fix(test): add tool_use_system_prompt_tokens to model prices schema validator

* fix: mcp timeout test uses real asyncio.wait_for timeout; you_com get_complete_url respects resolved api_key

* fix: forward resolved api_key into you_com endpoint selection and apply timeout to soniox polling GETs

The search flow resolves api_key in validate_environment but never passed it
into get_complete_url, so a programmatic api_key (with no YOUCOM_API_KEY in the
env) set the X-API-Key header yet still selected the keyless free-tier endpoint.
Forward api_key through both the search entrypoint and the http handler so the
keyed endpoint is chosen.

HTTPHandler.get/AsyncHTTPHandler.get had no timeout parameter, so the Soniox
poll and transcript-fetch GETs silently used the client global default instead
of the caller timeout. Add a per-request timeout to get() and forward the
configured timeout from the Soniox handler.

* fix(soniox): price stt-async-v4 per second so transcriptions are billed

The handler stores audio_transcription_duration in _hidden_params, but the
model carried only token cost fields and the response has no token usage, so
the transcription cost path fell through to cost_per_second and returned $0.
An authenticated caller could transcribe Soniox audio without decrementing
their budget. Switch the entry to output_cost_per_second at Soniox's published
$0.10/hour async rate so the stored duration produces a real charge.
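
The per-second rate follows directly from the hourly price. A short sketch of the arithmetic, with hypothetical constant and function names:

```python
SONIOX_ASYNC_USD_PER_HOUR = 0.10  # published async rate quoted above
OUTPUT_COST_PER_SECOND = SONIOX_ASYNC_USD_PER_HOUR / 3600  # about 2.78e-05 USD


def transcription_cost(duration_seconds: float) -> float:
    """Cost of a transcription given the stored audio duration in seconds."""
    return duration_seconds * OUTPUT_COST_PER_SECOND
```

Under this rate, one hour of audio costs about $0.10 and one minute costs about $0.0017.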

* fix(langfuse): use a dedicated httpx client for the SDK injection

The httpx_client handed to the Langfuse SDK came from _get_httpx_client(),
which returns LiteLLM's globally cached HTTPHandler. If Langfuse closed that
client on teardown it would invalidate the shared client used by every other
LiteLLM HTTP call. Build a dedicated httpx.Client instead, still resolving SSL
verification and client certificate from LiteLLM's configuration.

* fix(soniox): prefer caller-supplied api_base over SONIOX_API_BASE env var

* fix(cohere): support max_completion_tokens on cohere v2 chat (default route) (#29779)

* fix(cohere): support max_completion_tokens on cohere v2 chat

The default cohere_chat route resolves to CohereV2ChatConfig, which did not
list or map max_completion_tokens, so get_optional_params raised
UnsupportedParamsError for the standard OpenAI parameter (the modern
replacement for the deprecated max_tokens). The v1 config already maps it to
cohere's max_tokens; mirror that in v2 and add v2 regression tests.

* fix(cohere): make max_completion_tokens take precedence over max_tokens on v2

When both max_tokens and max_completion_tokens are supplied, prefer
max_completion_tokens explicitly rather than relying on dict iteration order,
and cover both orderings with a regression test.
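
A sketch of explicit precedence that does not depend on dict iteration order. `map_token_limit` is a hypothetical helper, not the `CohereV2ChatConfig` method:

```python
from typing import Any, Dict


def map_token_limit(non_default_params: Dict[str, Any]) -> Dict[str, Any]:
    """Map OpenAI-style token limits onto Cohere's max_tokens."""
    optional_params: Dict[str, Any] = {}
    max_completion_tokens = non_default_params.get("max_completion_tokens")
    max_tokens = non_default_params.get("max_tokens")
    if max_completion_tokens is not None:
        # The modern parameter always wins when both are supplied.
        optional_params["max_tokens"] = max_completion_tokens
    elif max_tokens is not None:
        optional_params["max_tokens"] = max_tokens
    return optional_params
```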

---------

Co-authored-by: Daniel Yudelevich <4537920+yudelevi@users.noreply.github.com>
Co-authored-by: hectorc98 <hector.chamorroalvarez@adyen.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Dan Lemon <dan@danlemon.com>
Co-authored-by: Saswat <saswatds@users.noreply.github.com>
Co-authored-by: Brian Sparker <brainsparker@users.noreply.github.com>
Co-authored-by: Zhao73 <156770117+Zhao73@users.noreply.github.com>
Co-authored-by: Urain Ahmad Shah <60431964+urainshah@users.noreply.github.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: kape <168134658+kapelame@users.noreply.github.com>
Co-authored-by: danisalvaa <159898202+danisalvaa@users.noreply.github.com>
Co-authored-by: Just R <remixingmagelang@gmail.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
2026-06-05 13:51:51 -07:00


### Hide pydantic namespace conflict warnings globally ###
from __future__ import annotations
import warnings
warnings.filterwarnings("ignore", message=".*conflict with protected namespace.*")
# Suppress Pydantic 2.11+ deprecation warning about accessing model_fields on instances
# This warning can accumulate during streaming and cause memory leaks
warnings.filterwarnings(
    "ignore", message=".*Accessing the.*attribute on the instance is deprecated.*"
)
### INIT VARIABLES #########################
import threading
import os
# Load .env before any other litellm imports so env vars (e.g. LITELLM_UI_SESSION_DURATION) are available
import dotenv as _dotenv
if os.getenv("LITELLM_MODE", "DEV") == "DEV":
    _dotenv.load_dotenv()
from typing import (
    Callable,
    List,
    Optional,
    Dict,
    Union,
    Any,
    Literal,
    get_args,
    TYPE_CHECKING,
    Tuple,
    overload,
    Type,
)
from litellm.types.integrations.datadog import DatadogInitParams
from litellm._logging import (
    set_verbose,
    _turn_on_debug,
    verbose_logger,
    json_logs,
    _turn_on_json,
    log_level,
)
import re
from litellm.constants import (
    DEFAULT_BATCH_SIZE,
    DEFAULT_FLUSH_INTERVAL_SECONDS,
    ROUTER_MAX_FALLBACKS,
    DEFAULT_MAX_RETRIES,
    DEFAULT_REPLICATE_POLLING_RETRIES,
    DEFAULT_REPLICATE_POLLING_DELAY_SECONDS,
    LITELLM_CHAT_PROVIDERS,
    HUMANLOOP_PROMPT_CACHE_TTL_SECONDS,
    OPENAI_CHAT_COMPLETION_PARAMS,
    OPENAI_CHAT_COMPLETION_PARAMS as _openai_completion_params,  # backwards compatibility
    OPENAI_FINISH_REASONS,
    OPENAI_FINISH_REASONS as _openai_finish_reasons,  # backwards compatibility
    openai_compatible_endpoints,
    openai_compatible_providers,
    openai_text_completion_compatible_providers,
    _openai_like_providers,
    replicate_models,
    clarifai_models,
    huggingface_models,
    empower_models,
    together_ai_models,
    baseten_models,
    WANDB_MODELS,
    REPEATED_STREAMING_CHUNK_LIMIT,
    request_timeout,
    open_ai_embedding_models,
    cohere_embedding_models,
    bedrock_embedding_models,
    known_tokenizer_config,
    BEDROCK_INVOKE_PROVIDERS_LITERAL,
    BEDROCK_EMBEDDING_PROVIDERS_LITERAL,
    BEDROCK_CONVERSE_MODELS,
    DEFAULT_MAX_TOKENS,
    DEFAULT_SOFT_BUDGET,
    DEFAULT_ALLOWED_FAILS,
)
import httpx
# register_async_client_cleanup is lazy-loaded and called on first access
litellm_mode = os.getenv("LITELLM_MODE", "DEV") # "PRODUCTION", "DEV"
####################################################
if set_verbose:
    _turn_on_debug()
####################################################
### Callbacks /Logging / Success / Failure Handlers #####
CALLBACK_TYPES = Union[str, Callable, "CustomLogger"] # CustomLogger is lazy-loaded
input_callback: List[CALLBACK_TYPES] = []
success_callback: List[CALLBACK_TYPES] = []
failure_callback: List[CALLBACK_TYPES] = []
service_callback: List[CALLBACK_TYPES] = []
audit_log_callbacks: List[CALLBACK_TYPES] = []
# logging_callback_manager is lazy-loaded via __getattr__
_custom_logger_compatible_callbacks_literal = Literal[
    "lago",
    "openmeter",
    "logfire",
    "literalai",
    "litellm_agent",
    "dynamic_rate_limiter",
    "dynamic_rate_limiter_v3",
    "langsmith",
    "prometheus",
    "otel",
    "datadog",
    "datadog_metrics",
    "datadog_llm_observability",
    "galileo",
    "braintrust",
    "arize",
    "arize_phoenix",
    "langtrace",
    "gcs_bucket",
    "azure_storage",
    "opik",
    "argilla",
    "mlflow",
    "langfuse",
    "langfuse_otel",
    "weave_otel",
    "pagerduty",
    "humanloop",
    "azure_sentinel",
    "gcs_pubsub",
    "agentops",
    "anthropic_cache_control_hook",
    "generic_api",
    "resend_email",
    "sendgrid_email",
    "smtp_email",
    "deepeval",
    "s3_v2",
    "aws_sqs",
    "vector_store_pre_call_hook",
    "dotprompt",
    "bitbucket",
    "gitlab",
    "cloudzero",
    "focus",
    "vantage",
    "posthog",
    "levo",
    "compression_interception",
]
cold_storage_custom_logger: Optional[_custom_logger_compatible_callbacks_literal] = None
logged_real_time_event_types: Optional[Union[List[str], Literal["*"]]] = None
_known_custom_logger_compatible_callbacks: List = list(
    get_args(_custom_logger_compatible_callbacks_literal)
)
callbacks: List[
    Union[
        Callable, _custom_logger_compatible_callbacks_literal, "CustomLogger"
    ]  # CustomLogger is lazy-loaded
] = []
callback_settings: Dict[str, Dict[str, Any]] = {}
initialized_langfuse_clients: int = 0
langfuse_default_tags: Optional[List[str]] = None
langsmith_batch_size: Optional[int] = None
prometheus_initialize_budget_metrics: Optional[bool] = False
prometheus_latency_buckets: Optional[List[float]] = None
require_auth_for_metrics_endpoint: Optional[bool] = True
argilla_batch_size: Optional[int] = None
datadog_use_v1: Optional[bool] = False # if you want to use v1 datadog logged payload.
gcs_pub_sub_use_v1: Optional[bool] = (
    False  # if you want to use v1 gcs pubsub logged payload
)
generic_api_use_v1: Optional[bool] = (
    False  # if you want to use v1 generic api logged payload
)
argilla_transformation_object: Optional[Dict[str, Any]] = None
_async_input_callback: List[
    Union[str, Callable, "CustomLogger"]
] = (  # CustomLogger is lazy-loaded
    []
)  # internal variable - async custom callbacks are routed here.
_async_success_callback: List[
    Union[str, Callable, "CustomLogger"]
] = (  # CustomLogger is lazy-loaded
    []
)  # internal variable - async custom callbacks are routed here.
_async_failure_callback: List[
    Union[str, Callable, "CustomLogger"]
] = (  # CustomLogger is lazy-loaded
    []
)  # internal variable - async custom callbacks are routed here.
pre_call_rules: List[Callable] = []
post_call_rules: List[Callable] = []
turn_off_message_logging: Optional[bool] = False
standard_logging_payload_excluded_fields: Optional[List[str]] = (
    None  # Fields to exclude from StandardLoggingPayload before callbacks receive it
)
log_raw_request_response: bool = False
redact_messages_in_exceptions: Optional[bool] = False
redact_user_api_key_info: Optional[bool] = False
filter_invalid_headers: Optional[bool] = False
add_user_information_to_llm_headers: Optional[bool] = (
    None  # adds user_id, team_id, token hash (params from StandardLoggingMetadata) to request headers
)
store_audit_logs = False # Enterprise feature, allow users to see audit logs
skip_system_message_in_guardrail: bool = False
skip_tool_message_in_guardrail: bool = False
### end of callbacks #############
email: Optional[str] = (
    None  # Not used anymore, will be removed in next MAJOR release - https://github.com/BerriAI/litellm/discussions/648
)
token: Optional[str] = (
    None  # Not used anymore, will be removed in next MAJOR release - https://github.com/BerriAI/litellm/discussions/648
)
telemetry = True
max_tokens: int = DEFAULT_MAX_TOKENS # OpenAI Defaults
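# Note: os.getenv returns a string when the variable is set, and bool() of any
# non-empty string is True, so values like "false" or "0" still enable these flags.
# Leave the variables unset (or empty) to keep them disabled.
drop_params = bool(os.getenv("LITELLM_DROP_PARAMS", False))
modify_params = bool(os.getenv("LITELLM_MODIFY_PARAMS", False))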
use_chat_completions_url_for_anthropic_messages: bool = bool(
    os.getenv("LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES", False)
)  # When True, routes OpenAI /v1/messages requests to chat/completions instead of the Responses API
route_all_chat_openai_to_responses: bool = (
    os.getenv("LITELLM_ROUTE_ALL_CHAT_OPENAI_TO_RESPONSES", "false").lower() == "true"
)  # When True, routes all OpenAI /chat/completions requests through the Responses API bridge
# When True, Gemini/Vertex Live setup is deferred until client `session.update`.
# Default False preserves historical behavior (auto-send setup on connect).
gemini_live_defer_setup: bool = (
    os.getenv("LITELLM_GEMINI_LIVE_DEFER_SETUP", "false").lower() == "true"
)
use_legacy_interactions_schema: bool = (
    os.getenv("LITELLM_USE_LEGACY_INTERACTIONS_SCHEMA", "false").lower() == "true"
)  # When True, sends Api-Revision: 2026-05-07 to Google so responses use the legacy `outputs`
# schema instead of the new `steps` schema. Remove this flag after June 8, 2026.
retry = True
### AUTH ###
api_key: Optional[str] = None
openai_key: Optional[str] = None
groq_key: Optional[str] = None
gigachat_key: Optional[str] = None
xai_key: Optional[str] = None
databricks_key: Optional[str] = None
openai_like_key: Optional[str] = None
azure_key: Optional[str] = None
anthropic_key: Optional[str] = None
replicate_key: Optional[str] = None
bytez_key: Optional[str] = None
cohere_key: Optional[str] = None
infinity_key: Optional[str] = None
clarifai_key: Optional[str] = None
maritalk_key: Optional[str] = None
ai21_key: Optional[str] = None
ollama_key: Optional[str] = None
openrouter_key: Optional[str] = None
datarobot_key: Optional[str] = None
predibase_key: Optional[str] = None
huggingface_key: Optional[str] = None
vertex_project: Optional[str] = None
vertex_location: Optional[str] = None
predibase_tenant_id: Optional[str] = None
togetherai_api_key: Optional[str] = None
cloudflare_api_key: Optional[str] = None
vercel_ai_gateway_key: Optional[str] = None
baseten_key: Optional[str] = None
llama_api_key: Optional[str] = None
aleph_alpha_key: Optional[str] = None
nlp_cloud_key: Optional[str] = None
novita_api_key: Optional[str] = None
snowflake_key: Optional[str] = None
gradient_ai_api_key: Optional[str] = None
nebius_key: Optional[str] = None
wandb_key: Optional[str] = None
heroku_key: Optional[str] = None
cometapi_key: Optional[str] = None
ovhcloud_key: Optional[str] = None
lemonade_key: Optional[str] = None
sap_service_key: Optional[str] = None
amazon_nova_api_key: Optional[str] = None
inception_key: Optional[str] = None
common_cloud_provider_auth_params: dict = {
    "params": ["project", "region_name", "token"],
    "providers": ["vertex_ai", "bedrock", "watsonx", "azure", "vertex_ai_beta"],
}
use_litellm_proxy: bool = (
    False  # when True, requests will be sent to the specified litellm proxy endpoint
)
use_client: bool = False
ssl_verify: Union[str, bool] = True
ssl_security_level: Optional[str] = None
ssl_certificate: Optional[str] = None
user_url_validation: bool = True
user_url_allowed_hosts: List[str] = []
provider_url_destination_allowed_hosts: List[str] = []
ssl_ecdh_curve: Optional[str] = (
    None  # Set to 'X25519' to disable PQC and improve performance
)
disable_streaming_logging: bool = False
disable_token_counter: bool = False
disable_add_transform_inline_image_block: bool = False
disable_add_user_agent_to_request_tags: bool = False
disable_anthropic_gemini_context_caching_transform: bool = False
disable_vertex_batch_output_transformation: bool = False
extra_spend_tag_headers: Optional[List[str]] = None
in_memory_llm_clients_cache: "LLMClientCache"
safe_memory_mode: bool = False
enable_azure_ad_token_refresh: Optional[bool] = False
# Proxy Authentication - auto-obtain/refresh OAuth2/JWT tokens for LiteLLM Proxy
proxy_auth: Optional[Any] = None
### DEFAULT AZURE API VERSION ###
AZURE_DEFAULT_API_VERSION = "2025-02-01-preview" # this is updated to the latest
### DEFAULT WATSONX API VERSION ###
WATSONX_DEFAULT_API_VERSION = "2024-03-13"
### COHERE EMBEDDINGS DEFAULT TYPE ###
COHERE_DEFAULT_EMBEDDING_INPUT_TYPE: "COHERE_EMBEDDING_INPUT_TYPES" = "search_document"
### CREDENTIALS ###
credential_list: List["CredentialItem"] = []
### GUARDRAILS ###
llamaguard_model_name: Optional[str] = None
openai_moderations_model_name: Optional[str] = None
presidio_ad_hoc_recognizers: Optional[str] = None
google_moderation_confidence_threshold: Optional[float] = None
llamaguard_unsafe_content_categories: Optional[str] = None
blocked_user_list: Optional[Union[str, List]] = None
banned_keywords_list: Optional[Union[str, List]] = None
llm_guard_mode: Literal["all", "key-specific", "request-specific"] = "all"
guardrail_name_config_map: Dict[str, GuardrailItem] = {}
include_cost_in_streaming_usage: bool = False
reasoning_auto_summary: bool = False
### PROMPTS ####
from litellm.types.prompts.init_prompts import PromptSpec
prompt_name_config_map: Dict[str, PromptSpec] = {}
##################
### PREVIEW FEATURES ###
enable_preview_features: bool = False
return_response_headers: bool = (
    False  # get response headers from LLM API providers, e.g. x-remaining-requests
)
enable_json_schema_validation: bool = False
enable_model_config_credential_overrides: bool = False
enable_key_alias_format_validation: bool = (
    False  # opt-in validation of key_alias format on /key/generate and /key/update
)
enable_gemini_default_thinking_level_low: bool = (
    False  # opt-in: force thinkingLevel low/minimal for Gemini 3 thinking param mapping
)
####################
logging: bool = True
enable_loadbalancing_on_batch_endpoints: Optional[bool] = None
enable_caching_on_provider_specific_optional_params: bool = (
False # feature-flag for caching on optional params - e.g. 'top_k'
)
caching: bool = (
False # Not used anymore, will be removed in next MAJOR release - https://github.com/BerriAI/litellm/discussions/648
)
caching_with_models: bool = (
    False  # Not used anymore, will be removed in next MAJOR release - https://github.com/BerriAI/litellm/discussions/648
)
cache: Optional["Cache"] = (
None # cache object <- use this - https://docs.litellm.ai/docs/caching
)
default_in_memory_ttl: Optional[float] = None
default_redis_ttl: Optional[float] = None
default_redis_batch_cache_expiry: Optional[float] = None
model_alias_map: Dict[str, str] = {}
model_group_settings: Optional["ModelGroupSettings"] = None
max_budget: float = 0.0 # set the max budget across all providers
budget_duration: Optional[str] = (
None # proxy only - resets budget after fixed duration. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").
)
default_soft_budget: float = (
DEFAULT_SOFT_BUDGET # by default all litellm proxy keys have a soft budget of 50.0
)
forward_traceparent_to_llm_provider: bool = False
_current_cost = 0.0 # private variable, used if max budget is set
error_logs: Dict = {}
add_function_to_prompt: bool = (
False # if function calling not supported by api, append function call details to system prompt
)
client_session: Optional[httpx.Client] = None
aclient_session: Optional[httpx.AsyncClient] = None
model_fallbacks: Optional[List] = None # Deprecated for 'litellm.fallbacks'
model_cost_map_url: str = os.getenv(
"LITELLM_MODEL_COST_MAP_URL",
"https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json",
)
blog_posts_url: str = os.getenv(
"LITELLM_BLOG_POSTS_URL",
"https://docs.litellm.ai/blog/rss.xml",
)
anthropic_beta_headers_url: str = os.getenv(
"LITELLM_ANTHROPIC_BETA_HEADERS_URL",
"https://raw.githubusercontent.com/BerriAI/litellm/main/litellm/anthropic_beta_headers_config.json",
)
suppress_debug_info = False
dynamodb_table_name: Optional[str] = None
s3_callback_params: Optional[Dict] = None
s3_audit_callback_params: Optional[Dict] = None
datadog_llm_observability_params: Optional[Union[DatadogLLMObsInitParams, Dict]] = None
datadog_params: Optional[Union[DatadogInitParams, Dict]] = None
aws_sqs_callback_params: Optional[Dict] = None
generic_logger_headers: Optional[Dict] = None
default_key_generate_params: Optional[Dict] = None
default_key_max_budget_alert_emails: Optional[Dict[str, list]] = None
upperbound_key_generate_params: Optional[LiteLLM_UpperboundKeyGenerateParams] = None
key_generation_settings: Optional["StandardKeyGenerationConfig"] = None
default_internal_user_params: Optional[Dict] = None
default_team_params: Optional[Union[DefaultTeamSSOParams, Dict]] = None
default_team_settings: Optional[List] = None
max_user_budget: Optional[float] = None
default_max_internal_user_budget: Optional[float] = None
max_internal_user_budget: Optional[float] = None
max_ui_session_budget: Optional[float] = 0.25 # $0.25 USD budgets for UI Chat sessions
internal_user_budget_duration: Optional[str] = None
tag_budget_config: Optional[Dict[str, "BudgetConfig"]] = None
max_end_user_budget: Optional[float] = None
max_end_user_budget_id: Optional[str] = None
# When True, end-user IDs extracted from requests are validated against
# LiteLLM_EndUserTable / LiteLLM_UserTable. Values that do not resolve to a
# known row are dropped before reaching spend logs. Defaults to False for
# backwards compatibility — arbitrary client-supplied identifiers still
# pass through unchanged.
validate_end_user_id_in_db: bool = False
disable_end_user_cost_tracking: Optional[bool] = None
disable_end_user_cost_tracking_prometheus_only: Optional[bool] = None
enable_end_user_cost_tracking_prometheus_only: Optional[bool] = None
custom_prometheus_metadata_labels: List[str] = []
custom_prometheus_tags: List[str] = []
prometheus_metrics_config: Optional[List] = None
prometheus_emit_stream_label: bool = False
prometheus_user_budget_label_include_email_alias: bool = False
prometheus_end_user_metrics_max_series_per_metric: Optional[int] = 10000
prometheus_end_user_metrics_ttl_seconds: Optional[float] = 3600.0
prometheus_end_user_metrics_cleanup_interval_seconds: Optional[float] = 60.0
disable_add_prefix_to_prompt: bool = (
False # used by anthropic, to disable adding prefix to prompt
)
disable_copilot_system_to_assistant: bool = (
False # If false (default), converts all 'system' role messages to 'assistant' for GitHub Copilot compatibility. Set to true to disable this behavior.
)
public_mcp_servers: Optional[List[str]] = None
public_mcp_hub_strict_whitelist: bool = True
public_model_groups: Optional[List[str]] = None
public_agent_groups: Optional[List[str]] = None
# Supports both old format (Dict[str, str]) and new format (Dict[str, Dict[str, Any]])
# New format: { "displayName": { "url": "...", "index": 0 } }
# Old format: { "displayName": "url" } (for backward compatibility)
public_model_groups_links: Dict[str, Union[str, Dict[str, Any]]] = {}
#### REQUEST PRIORITIZATION #######
priority_reservation: Optional[Dict[str, Union[float, "PriorityReservationDict"]]] = (
None
)
# priority_reservation_settings is lazy-loaded via __getattr__
# Only declare for type checking - at runtime __getattr__ handles it
if TYPE_CHECKING:
    priority_reservation_settings: Optional["PriorityReservationSettings"] = None
######## Networking Settings ########
use_aiohttp_transport: bool = (
True # Older variable, aiohttp is now the default. use disable_aiohttp_transport instead.
)
aiohttp_trust_env: bool = False  # set to True to use HTTP_PROXY / HTTPS_PROXY env var settings
disable_aiohttp_transport: bool = False # Set this to true to use httpx instead
disable_aiohttp_trust_env: bool = (
False # When False, aiohttp will respect HTTP(S)_PROXY env vars
)
force_ipv4: bool = (
False # when True, litellm will force ipv4 for all LLM requests. Some users have seen httpx ConnectionError when using ipv6.
)
network_mock: bool = False # When True, use mock transport — no real network calls
####### STOP SEQUENCE LIMIT #######
disable_stop_sequence_limit: bool = False # when True, stop sequence limit is disabled
#### RETRIES ####
num_retries: Optional[int] = None # per model endpoint
max_fallbacks: Optional[int] = None
default_fallbacks: Optional[List] = None
fallbacks: Optional[List] = None
context_window_fallbacks: Optional[List] = None
content_policy_fallbacks: Optional[List] = None
allowed_fails: int = 3
allow_dynamic_callback_disabling: bool = True
num_retries_per_request: Optional[int] = (
None # for the request overall (incl. fallbacks + model retries)
)
####### SECRET MANAGERS #####################
secret_manager_client: Optional[Any] = (
None # list of instantiated key management clients - e.g. azure kv, infisical, etc.
)
_google_kms_resource_name: Optional[str] = None
_key_management_system: Optional["KeyManagementSystem"] = None
# Note: KeyManagementSettings must be eagerly imported because _key_management_settings
# is accessed during import time in secret_managers/main.py
# We'll import it after the lazy import system is set up
# We can't define it here because KeyManagementSettings is lazy-loaded
#### PII MASKING ####
output_parse_pii: bool = False
#############################################
from litellm.litellm_core_utils.get_model_cost_map import get_model_cost_map
model_cost = get_model_cost_map(url=model_cost_map_url)
cost_discount_config: Dict[str, float] = (
{}
) # Provider-specific cost discounts {"vertex_ai": 0.05} = 5% discount
cost_margin_config: Dict[str, Union[float, Dict[str, float]]] = (
{}
) # Provider-specific or global cost margins. Examples:
# Percentage: {"openai": 0.10} = 10% margin
# Fixed: {"openai": {"fixed_amount": 0.001}} = $0.001 per request
# Global: {"global": 0.05} = 5% global margin on all providers
# Combined: {"vertex_ai": {"percentage": 0.08, "fixed_amount": 0.0005}}
custom_prompt_dict: Dict[str, dict] = {}
check_provider_endpoint = False
####### THREAD-SPECIFIC DATA ####################
class MyLocal(threading.local):
    def __init__(self):
        self.user = "Hello World"
_thread_context = MyLocal()
def identify(event_details):
    # Store user in thread local data
    if "user" in event_details:
        _thread_context.user = event_details["user"]
####### ADDITIONAL PARAMS ################### configurable params if you use proxy models like Helicone, map spend to org id, etc.
api_base: Optional[str] = None
headers = None
api_version: Optional[str] = None
organization = None
project = None
config_path = None
vertex_ai_safety_settings: Optional[dict] = None
####### COMPLETION MODELS ###################
from typing import Set
open_ai_chat_completion_models: Set = set()
open_ai_text_completion_models: Set = set()
cohere_models: Set = set()
cohere_chat_models: Set = set()
mistral_chat_models: Set = set()
text_completion_codestral_models: Set = set()
text_completion_inception_models: Set = set()
anthropic_models: Set = set()
openrouter_models: Set = set()
datarobot_models: Set = set()
vertex_language_models: Set = set()
vertex_vision_models: Set = set()
vertex_chat_models: Set = set()
vertex_code_chat_models: Set = set()
vertex_ai_image_models: Set = set()
vertex_ai_video_models: Set = set()
vertex_text_models: Set = set()
vertex_code_text_models: Set = set()
vertex_embedding_models: Set = set()
vertex_anthropic_models: Set = set()
vertex_llama3_models: Set = set()
vertex_deepseek_models: Set = set()
vertex_ai_ai21_models: Set = set()
vertex_mistral_models: Set = set()
vertex_openai_models: Set = set()
vertex_minimax_models: Set = set()
vertex_moonshot_models: Set = set()
vertex_zai_models: Set = set()
ai21_models: Set = set()
ai21_chat_models: Set = set()
nlp_cloud_models: Set = set()
aleph_alpha_models: Set = set()
bedrock_models: Set = set()
bedrock_converse_models: Set = set(BEDROCK_CONVERSE_MODELS)
fal_ai_models: Set = set()
fireworks_ai_models: Set = set()
fireworks_ai_embedding_models: Set = set()
deepinfra_models: Set = set()
perplexity_models: Set = set()
watsonx_models: Set = set()
gemini_models: Set = set()
xai_models: Set = set()
zai_models: Set = set()
deepseek_models: Set = set()
runwayml_models: Set = set()
azure_ai_models: Set = set()
jina_ai_models: Set = set()
voyage_models: Set = set()
infinity_models: Set = set()
heroku_models: Set = set()
databricks_models: Set = set()
cloudflare_models: Set = set()
codestral_models: Set = set()
friendliai_models: Set = set()
featherless_ai_models: Set = set()
palm_models: Set = set()
groq_models: Set = set()
azure_models: Set = set()
azure_anthropic_models: Set = set()
azure_text_models: Set = set()
anyscale_models: Set = set()
cerebras_models: Set = set()
galadriel_models: Set = set()
nvidia_nim_models: Set = set()
nvidia_riva_models: Set = set()
soniox_models: Set = set()
sambanova_models: Set = set()
sambanova_embedding_models: Set = set()
novita_models: Set = set()
assemblyai_models: Set = set()
snowflake_models: Set = set()
gradient_ai_models: Set = set()
llama_models: Set = set()
nscale_models: Set = set()
nebius_models: Set = set()
nebius_embedding_models: Set = set()
aiml_models: Set = set()
deepgram_models: Set = set()
elevenlabs_models: Set = set()
dashscope_models: Set = set()
moonshot_models: Set = set()
publicai_models: Set = set()
v0_models: Set = set()
morph_models: Set = set()
lambda_ai_models: Set = set()
inception_models: Set = set()
hyperbolic_models: Set = set()
black_forest_labs_models: Set = set()
recraft_models: Set = set()
cometapi_models: Set = set()
oci_models: Set = set()
vercel_ai_gateway_models: Set = set()
volcengine_models: Set = set()
wandb_models: Set = set(WANDB_MODELS)
ovhcloud_models: Set = set()
ovhcloud_embedding_models: Set = set()
lemonade_models: Set = set()
docker_model_runner_models: Set = set()
amazon_nova_models: Set = set()
stability_models: Set = set()
github_copilot_models: Set = set()
chatgpt_models: Set = set()
minimax_models: Set = set()
aws_polly_models: Set = set()
gigachat_models: Set = set()
llamagate_models: Set = set()
reducto_models: Set = set()
bedrock_mantle_models: Set = set()
def is_bedrock_pricing_only_model(key: str) -> bool:
    """
    Identifies keys that are in the model_prices_and_context_window.json file for pricing purposes only, so callers can exclude them:
    keys with the pattern 'bedrock/<region>/<model>' and keys containing 'month-commitment'.
    Args:
        key (str): A key to filter.
    Returns:
        bool: True if the key matches the Bedrock pattern or contains 'month-commitment', False otherwise.
    """
    # Regex to match 'bedrock/<region>/<model>'
    bedrock_pattern = re.compile(r"^bedrock/[a-zA-Z0-9_-]+/.+$")
    if "month-commitment" in key:
        return True
    is_match = bedrock_pattern.match(key)
    return is_match is not None
def is_openai_finetune_model(key: str) -> bool:
    """
    Excludes model cost keys with the pattern 'ft:<model>'. These are in the model_prices_and_context_window.json file for pricing purposes only.
    Args:
        key (str): A key to filter.
    Returns:
        bool: True if the key matches the OpenAI finetune pattern, False otherwise.
    """
    return key.startswith("ft:") and key.count(":") <= 1
def add_known_models(model_cost_map: Optional[Dict] = None):
    _map = model_cost_map if model_cost_map is not None else model_cost
    for key, value in _map.items():
        # Look the provider up once; every branch below dispatches on it.
        provider = value.get("litellm_provider")
        if provider == "openai" and not is_openai_finetune_model(key):
            open_ai_chat_completion_models.add(key)
        elif provider == "text-completion-openai":
            open_ai_text_completion_models.add(key)
        elif provider == "azure_text":
            azure_text_models.add(key)
        elif provider == "cohere":
            cohere_models.add(key)
        elif provider == "cohere_chat":
            cohere_chat_models.add(key)
        elif provider == "mistral":
            mistral_chat_models.add(key)
        elif provider == "anthropic":
            anthropic_models.add(key)
        elif provider == "empower":
            empower_models.add(key)
        elif provider == "openrouter":
            openrouter_models.add(key)
        elif provider == "vercel_ai_gateway":
            vercel_ai_gateway_models.add(key)
        elif provider == "datarobot":
            datarobot_models.add(key)
        elif provider == "vertex_ai-text-models":
            vertex_text_models.add(key)
        elif provider == "vertex_ai-code-text-models":
            vertex_code_text_models.add(key)
        elif provider == "vertex_ai-language-models":
            vertex_language_models.add(key)
        elif provider == "vertex_ai-vision-models":
            vertex_vision_models.add(key)
        elif provider == "vertex_ai-chat-models":
            vertex_chat_models.add(key)
        elif provider == "vertex_ai-code-chat-models":
            vertex_code_chat_models.add(key)
        elif provider == "vertex_ai-embedding-models":
            vertex_embedding_models.add(key)
        elif provider == "vertex_ai-anthropic_models":
            key = key.replace("vertex_ai/", "")
            vertex_anthropic_models.add(key)
        elif provider == "vertex_ai-llama_models":
            key = key.replace("vertex_ai/", "")
            vertex_llama3_models.add(key)
        elif provider == "vertex_ai-deepseek_models":
            key = key.replace("vertex_ai/", "")
            vertex_deepseek_models.add(key)
        elif provider == "vertex_ai-mistral_models":
            key = key.replace("vertex_ai/", "")
            vertex_mistral_models.add(key)
        elif provider == "vertex_ai-ai21_models":
            key = key.replace("vertex_ai/", "")
            vertex_ai_ai21_models.add(key)
        elif provider == "vertex_ai-image-models":
            key = key.replace("vertex_ai/", "")
            vertex_ai_image_models.add(key)
        elif provider == "vertex_ai-video-models":
            key = key.replace("vertex_ai/", "")
            vertex_ai_video_models.add(key)
        elif provider == "vertex_ai-openai_models":
            key = key.replace("vertex_ai/", "")
            vertex_openai_models.add(key)
        elif provider == "vertex_ai-minimax_models":
            key = key.replace("vertex_ai/", "")
            vertex_minimax_models.add(key)
        elif provider == "vertex_ai-moonshot_models":
            key = key.replace("vertex_ai/", "")
            vertex_moonshot_models.add(key)
        elif provider == "vertex_ai-zai_models":
            key = key.replace("vertex_ai/", "")
            vertex_zai_models.add(key)
        elif provider == "ai21":
            if value.get("mode") == "chat":
                ai21_chat_models.add(key)
            else:
                ai21_models.add(key)
        elif provider == "nlp_cloud":
            nlp_cloud_models.add(key)
        elif provider == "aleph_alpha":
            aleph_alpha_models.add(key)
        elif provider == "bedrock" and not is_bedrock_pricing_only_model(key):
            bedrock_models.add(key)
        elif provider == "bedrock_converse":
            bedrock_converse_models.add(key)
        elif provider == "deepinfra":
            deepinfra_models.add(key)
        elif provider == "perplexity":
            perplexity_models.add(key)
        elif provider == "watsonx":
            watsonx_models.add(key)
        elif provider == "gemini":
            gemini_models.add(key)
        elif provider == "fireworks_ai":
            # ignore the 'up-to', '-to-' model names -> not real models. just for cost tracking based on model params.
            if "-to-" not in key and "fireworks-ai-default" not in key:
                fireworks_ai_models.add(key)
        elif provider == "fireworks_ai-embedding-models":
            # ignore the 'up-to', '-to-' model names -> not real models. just for cost tracking based on model params.
            if "-to-" not in key:
                fireworks_ai_embedding_models.add(key)
        elif provider == "text-completion-codestral":
            text_completion_codestral_models.add(key)
        elif provider == "text-completion-inception":
            text_completion_inception_models.add(key)
        elif provider == "xai":
            xai_models.add(key)
        elif provider == "zai":
            zai_models.add(key)
        elif provider == "fal_ai":
            fal_ai_models.add(key)
        elif provider == "deepseek":
            deepseek_models.add(key)
        elif provider == "runwayml":
            runwayml_models.add(key)
        elif provider == "meta_llama":
            llama_models.add(key)
        elif provider == "nscale":
            nscale_models.add(key)
        elif provider == "azure_ai":
            azure_ai_models.add(key)
        elif provider == "voyage":
            voyage_models.add(key)
        elif provider == "infinity":
            infinity_models.add(key)
        elif provider == "databricks":
            databricks_models.add(key)
        elif provider == "cloudflare":
            cloudflare_models.add(key)
        elif provider == "codestral":
            codestral_models.add(key)
        elif provider == "friendliai":
            friendliai_models.add(key)
        elif provider == "palm":
            palm_models.add(key)
        elif provider == "groq":
            groq_models.add(key)
        elif provider == "azure":
            azure_models.add(key)
        elif provider == "azure_anthropic":
            azure_anthropic_models.add(key)
        elif provider == "anyscale":
            anyscale_models.add(key)
        elif provider == "cerebras":
            cerebras_models.add(key)
        elif provider == "galadriel":
            galadriel_models.add(key)
        elif provider == "nvidia_nim":
            nvidia_nim_models.add(key)
        elif provider == "nvidia_riva":
            nvidia_riva_models.add(key)
        elif provider == "soniox":
            soniox_models.add(key)
        elif provider == "sambanova":
            sambanova_models.add(key)
        elif provider == "sambanova-embedding-models":
            sambanova_embedding_models.add(key)
        elif provider == "novita":
            novita_models.add(key)
        elif provider == "nebius-chat-models":
            nebius_models.add(key)
        elif provider == "nebius-embedding-models":
            nebius_embedding_models.add(key)
        elif provider == "aiml":
            aiml_models.add(key)
        elif provider == "assemblyai":
            assemblyai_models.add(key)
        elif provider == "jina_ai":
            jina_ai_models.add(key)
        elif provider == "snowflake":
            snowflake_models.add(key)
        elif provider == "gradient_ai":
            gradient_ai_models.add(key)
        elif provider == "featherless_ai":
            featherless_ai_models.add(key)
        elif provider == "deepgram":
            deepgram_models.add(key)
        elif provider == "elevenlabs":
            elevenlabs_models.add(key)
        elif provider == "heroku":
            heroku_models.add(key)
        elif provider == "dashscope":
            dashscope_models.add(key)
        elif provider == "moonshot":
            moonshot_models.add(key)
        elif provider == "publicai":
            publicai_models.add(key)
        elif provider == "v0":
            v0_models.add(key)
        elif provider == "morph":
            morph_models.add(key)
        elif provider == "lambda_ai":
            lambda_ai_models.add(key)
        elif provider == "inception":
            inception_models.add(key)
        elif provider == "hyperbolic":
            hyperbolic_models.add(key)
        elif provider == "black_forest_labs":
            black_forest_labs_models.add(key)
        elif provider == "recraft":
            recraft_models.add(key)
        elif provider == "cometapi":
            cometapi_models.add(key)
        elif provider == "oci":
            oci_models.add(key)
        elif provider == "volcengine":
            volcengine_models.add(key)
        elif provider == "wandb":
            wandb_models.add(key)
        elif provider == "ovhcloud":
            ovhcloud_models.add(key)
        elif provider == "ovhcloud-embedding-models":
            ovhcloud_embedding_models.add(key)
        elif provider == "lemonade":
            lemonade_models.add(key)
        elif provider == "docker_model_runner":
            docker_model_runner_models.add(key)
        elif provider == "amazon_nova":
            amazon_nova_models.add(key)
        elif provider == "stability":
            stability_models.add(key)
        elif provider == "github_copilot":
            github_copilot_models.add(key)
        elif provider == "chatgpt":
            chatgpt_models.add(key)
        elif provider == "minimax":
            minimax_models.add(key)
        elif provider == "aws_polly":
            aws_polly_models.add(key)
        elif provider == "gigachat":
            gigachat_models.add(key)
        elif provider == "llamagate":
            llamagate_models.add(key)
        elif provider == "reducto":
            reducto_models.add(key)
        elif provider == "bedrock_mantle":
            bedrock_mantle_models.add(key)
add_known_models()
# known openai compatible endpoints - we'll eventually move this list to the model_prices_and_context_window.json dictionary
# this is maintained for Exception Mapping
# used for Cost Tracking & Token counting
# https://azure.microsoft.com/en-in/pricing/details/cognitive-services/openai-service/
# Azure returns gpt-35-turbo in their responses, we need to map this to azure/gpt-35-turbo for token counting
azure_llms = {
"gpt-35-turbo": "azure/gpt-35-turbo",
"gpt-35-turbo-16k": "azure/gpt-35-turbo-16k",
"gpt-35-turbo-instruct": "azure/gpt-35-turbo-instruct",
"azure/gpt-41": "gpt-4.1",
"azure/gpt-41-mini": "gpt-4.1-mini",
"azure/gpt-41-nano": "gpt-4.1-nano",
}
azure_embedding_models = {
"ada": "azure/ada",
}
petals_models = [
"petals-team/StableBeluga2",
]
ollama_models = ["llama2"]
maritalk_models = ["maritalk"]
model_list = list(
open_ai_chat_completion_models
| open_ai_text_completion_models
| cohere_models
| cohere_chat_models
| anthropic_models
| set(replicate_models)
| openrouter_models
| datarobot_models
| set(huggingface_models)
| vertex_chat_models
| vertex_text_models
| ai21_models
| ai21_chat_models
| set(together_ai_models)
| set(baseten_models)
| aleph_alpha_models
| nlp_cloud_models
| set(ollama_models)
| bedrock_models
| deepinfra_models
| perplexity_models
| set(maritalk_models)
| runwayml_models
| vertex_language_models
| watsonx_models
| gemini_models
| text_completion_codestral_models
| text_completion_inception_models
| xai_models
| zai_models
| fal_ai_models
| deepseek_models
| azure_ai_models
| voyage_models
| infinity_models
| databricks_models
| cloudflare_models
| codestral_models
| friendliai_models
| palm_models
| groq_models
| azure_models
| azure_anthropic_models
| anyscale_models
| cerebras_models
| galadriel_models
| nvidia_nim_models
| nvidia_riva_models
| soniox_models
| sambanova_models
| azure_text_models
| novita_models
| assemblyai_models
| jina_ai_models
| snowflake_models
| gradient_ai_models
| llama_models
| featherless_ai_models
| nscale_models
| deepgram_models
| elevenlabs_models
| dashscope_models
| moonshot_models
| publicai_models
| v0_models
| morph_models
| lambda_ai_models
| inception_models
| black_forest_labs_models
| recraft_models
| cometapi_models
| oci_models
| heroku_models
| vercel_ai_gateway_models
| volcengine_models
| wandb_models
| ovhcloud_models
| lemonade_models
| docker_model_runner_models
| reducto_models
| bedrock_mantle_models
| set(clarifai_models)
)
model_list_set = set(model_list)
# provider_list is lazy-loaded via __getattr__ to avoid importing LlmProviders at import time
models_by_provider: dict = {
"openai": open_ai_chat_completion_models | open_ai_text_completion_models,
"text-completion-openai": open_ai_text_completion_models,
"cohere": cohere_models | cohere_chat_models,
"cohere_chat": cohere_chat_models,
"anthropic": anthropic_models,
"replicate": replicate_models,
"huggingface": huggingface_models,
"together_ai": together_ai_models,
"baseten": baseten_models,
"openrouter": openrouter_models,
"vercel_ai_gateway": vercel_ai_gateway_models,
"datarobot": datarobot_models,
"vertex_ai": vertex_chat_models
| vertex_text_models
| vertex_anthropic_models
| vertex_vision_models
| vertex_language_models
| vertex_deepseek_models
| vertex_minimax_models
| vertex_moonshot_models
| vertex_zai_models,
"ai21": ai21_models,
"bedrock": bedrock_models | bedrock_converse_models,
"petals": petals_models,
"ollama": ollama_models,
"ollama_chat": ollama_models,
"deepinfra": deepinfra_models,
"perplexity": perplexity_models,
"maritalk": maritalk_models,
"watsonx": watsonx_models,
"gemini": gemini_models,
"fireworks_ai": fireworks_ai_models | fireworks_ai_embedding_models,
"aleph_alpha": aleph_alpha_models,
"text-completion-codestral": text_completion_codestral_models,
"text-completion-inception": text_completion_inception_models,
"xai": xai_models,
"zai": zai_models,
"fal_ai": fal_ai_models,
"deepseek": deepseek_models,
"runwayml": runwayml_models,
"mistral": mistral_chat_models,
"azure_ai": azure_ai_models,
"voyage": voyage_models,
"infinity": infinity_models,
"databricks": databricks_models,
"cloudflare": cloudflare_models,
"codestral": codestral_models,
"nlp_cloud": nlp_cloud_models,
"friendliai": friendliai_models,
"palm": palm_models,
"groq": groq_models,
"azure": azure_models | azure_text_models,
"azure_anthropic": azure_anthropic_models,
"azure_text": azure_text_models,
"anyscale": anyscale_models,
"cerebras": cerebras_models,
"galadriel": galadriel_models,
"nvidia_nim": nvidia_nim_models,
"nvidia_riva": nvidia_riva_models,
"soniox": soniox_models,
"sambanova": sambanova_models | sambanova_embedding_models,
"novita": novita_models,
"nebius": nebius_models | nebius_embedding_models,
"aiml": aiml_models,
"assemblyai": assemblyai_models,
"jina_ai": jina_ai_models,
"snowflake": snowflake_models,
"gradient_ai": gradient_ai_models,
"meta_llama": llama_models,
"nscale": nscale_models,
"featherless_ai": featherless_ai_models,
"deepgram": deepgram_models,
"elevenlabs": elevenlabs_models,
"heroku": heroku_models,
"dashscope": dashscope_models,
"moonshot": moonshot_models,
"publicai": publicai_models,
"v0": v0_models,
"morph": morph_models,
"lambda_ai": lambda_ai_models,
"inception": inception_models,
"hyperbolic": hyperbolic_models,
"black_forest_labs": black_forest_labs_models,
"recraft": recraft_models,
"cometapi": cometapi_models,
"oci": oci_models,
"volcengine": volcengine_models,
"wandb": wandb_models,
"ovhcloud": ovhcloud_models | ovhcloud_embedding_models,
"lemonade": lemonade_models,
"clarifai": clarifai_models,
"amazon_nova": amazon_nova_models,
"stability": stability_models,
"github_copilot": github_copilot_models,
"chatgpt": chatgpt_models,
"minimax": minimax_models,
"aws_polly": aws_polly_models,
"gigachat": gigachat_models,
"llamagate": llamagate_models,
"reducto": reducto_models,
"bedrock_mantle": bedrock_mantle_models,
}
# mapping for those models which have larger equivalents
longer_context_model_fallback_dict: dict = {
# openai chat completion models
"gpt-3.5-turbo": "gpt-3.5-turbo-16k",
"gpt-3.5-turbo-0301": "gpt-3.5-turbo-16k-0301",
"gpt-3.5-turbo-0613": "gpt-3.5-turbo-16k-0613",
"gpt-4": "gpt-4-32k",
"gpt-4-0314": "gpt-4-32k-0314",
"gpt-4-0613": "gpt-4-32k-0613",
# anthropic
"claude-instant-1": "claude-2",
"claude-instant-1.2": "claude-2",
# vertexai
"chat-bison": "chat-bison-32k",
"chat-bison@001": "chat-bison-32k",
"codechat-bison": "codechat-bison-32k",
"codechat-bison@001": "codechat-bison-32k",
# openrouter
"openrouter/openai/gpt-3.5-turbo": "openrouter/openai/gpt-3.5-turbo-16k",
"openrouter/anthropic/claude-instant-v1": "openrouter/anthropic/claude-2",
}
####### EMBEDDING MODELS ###################
all_embedding_models = (
open_ai_embedding_models
| set(cohere_embedding_models)
| set(bedrock_embedding_models)
| vertex_embedding_models
| fireworks_ai_embedding_models
| nebius_embedding_models
| sambanova_embedding_models
| ovhcloud_embedding_models
)
####### IMAGE GENERATION MODELS ###################
openai_image_generation_models = ["dall-e-2", "dall-e-3"]
####### VIDEO GENERATION MODELS ###################
openai_video_generation_models = ["sora-2"]
# timeout is lazy-loaded via __getattr__
# get_llm_provider is lazy-loaded via __getattr__
# remove_index_from_tool_calls is lazy-loaded via __getattr__
# Import KeyManagementSettings here (before utils import) because _key_management_settings
# is accessed during import time in secret_managers/main.py (via dd_tracing -> datadog -> _service_logger -> utils)
from litellm.types.secret_managers.main import KeyManagementSettings
_key_management_settings: KeyManagementSettings = KeyManagementSettings()
# client must be imported immediately as it's used as a decorator at function definition time
from .utils import client
# Note: Most other utils imports are lazy-loaded via __getattr__ to avoid loading utils.py
# (which imports tiktoken) at import time
from .llms.custom_llm import CustomLLM
from .llms.anthropic.common_utils import AnthropicModelInfo
from .llms.ai21.chat.transformation import AI21ChatConfig, AI21ChatConfig as AI21Config
from .llms.deprecated_providers.palm import (
PalmConfig,
) # here to prevent breaking changes
from .llms.deprecated_providers.aleph_alpha import AlephAlphaConfig
from .llms.gemini.common_utils import GeminiModelInfo
from .llms.vertex_ai.vertex_embeddings.transformation import (
VertexAITextEmbeddingConfig,
)
vertexAITextEmbeddingConfig = VertexAITextEmbeddingConfig()
from .llms.bedrock.embed.amazon_titan_v2_transformation import (
AmazonTitanV2Config,
)
from .llms.topaz.common_utils import TopazModelInfo
# OpenAIOSeriesConfig is lazy loaded - openaiOSeriesConfig will be created on first access
# OpenAIGPTConfig, OpenAIGPT5Config, etc. are lazy loaded - instances will be created on first access
from .llms.xai.common_utils import XAIModelInfo
# PublicAI now uses JSON-based configuration (see litellm/llms/openai_like/providers.json)
# All remaining configs are now lazy loaded - see _lazy_imports_registry.py
# Import LlmProviders here (before main import) because it's imported during import time
# in multiple places including openai.py (via main import)
from litellm.types.utils import LlmProviders
## Lazy loading this is not straightforward, will leave it here for now.
from .main import * # type: ignore
from .compression import compress # type: ignore[no-redef]
# Skills API
from .skills.main import (
create_skill,
acreate_skill,
list_skills,
alist_skills,
get_skill,
aget_skill,
delete_skill,
adelete_skill,
)
from .evals.main import (
create_eval,
acreate_eval,
list_evals,
alist_evals,
get_eval,
aget_eval,
delete_eval,
adelete_eval,
cancel_eval,
acancel_eval,
create_run,
acreate_run,
list_runs,
alist_runs,
get_run,
aget_run,
delete_run,
adelete_run,
cancel_run,
acancel_run,
)
from .integrations import *
from .llms.custom_httpx.async_client_cleanup import close_litellm_async_clients
from .exceptions import (
AuthenticationError,
InvalidRequestError,
BadRequestError,
ImageFetchError,
NotFoundError,
PermissionDeniedError,
RateLimitError,
ServiceUnavailableError,
BadGatewayError,
OpenAIError,
ContextWindowExceededError,
ContentPolicyViolationError,
BudgetExceededError,
APIError,
Timeout,
APIConnectionError,
UnsupportedParamsError,
APIResponseValidationError,
UnprocessableEntityError,
InternalServerError,
JSONSchemaValidationError,
LITELLM_EXCEPTION_TYPES,
MockException,
)
from .budget_manager import BudgetManager
from .proxy.proxy_cli import run_server
from .router import Router
from .assistants.main import *
from .batches.main import *
from .images.main import *
from .videos.main import *
from .batch_completion.main import * # type: ignore
from .rerank_api.main import *
from .llms.anthropic.experimental_pass_through.messages.handler import *
from .responses.main import *
# Interactions API is available as litellm.interactions module
# Usage: litellm.interactions.create(), litellm.interactions.get(), etc.
from . import interactions
from .interactions.agents.main import (
acreate as acreate_agent,
create as create_agent,
alist as alist_agents,
list as list_agents,
aget as aget_agent,
get as get_agent,
adelete as adelete_agent,
delete as delete_agent,
alist_versions as alist_agent_versions,
list_versions as list_agent_versions,
)
from .containers.main import *
from .ocr.main import *
from .rag.main import *
from .search.main import *
from .realtime_api.main import (
_arealtime,
acreate_realtime_client_secret,
arealtime_calls,
)
from .responses.main import _aresponses_websocket
from .fine_tuning.main import *
from .files.main import *
from .vector_store_files.main import (
acreate as avector_store_file_create,
adelete as avector_store_file_delete,
alist as avector_store_file_list,
aretrieve as avector_store_file_retrieve,
aretrieve_content as avector_store_file_content,
aupdate as avector_store_file_update,
create as vector_store_file_create,
delete as vector_store_file_delete,
list as vector_store_file_list,
retrieve as vector_store_file_retrieve,
retrieve_content as vector_store_file_content,
update as vector_store_file_update,
)
from .scheduler import *
### ADAPTERS ###
from .types.adapter import AdapterItem
import litellm.anthropic_interface as anthropic
adapters: List[AdapterItem] = []
### Vector Store Registry ###
from .vector_stores.vector_store_registry import (
VectorStoreRegistry,
VectorStoreIndexRegistry,
)
vector_store_registry: Optional[VectorStoreRegistry] = None
vector_store_index_registry: Optional[VectorStoreIndexRegistry] = None
### RAG ###
from . import rag
### CUSTOM LLMs ###
from .types.llms.custom_llm import CustomLLMItem
custom_provider_map: List[CustomLLMItem] = []
_custom_providers: List[str] = (
[]
) # internal helper util, used to track names of custom providers
disable_hf_tokenizer_download: Optional[bool] = (
    None  # disable Hugging Face tokenizer download. Defaults to OpenAI's cl100k_base encoding
)
global_disable_no_log_param: bool = False
### CLI UTILITIES ###
from litellm.litellm_core_utils.cli_token_utils import get_litellm_gateway_api_key
### PASSTHROUGH ###
from .passthrough import allm_passthrough_route, llm_passthrough_route
from .google_genai import agenerate_content
### GLOBAL CONFIG ###
global_bitbucket_config: Optional[Dict[str, Any]] = None
def set_global_bitbucket_config(config: Dict[str, Any]) -> None:
    """Set global BitBucket configuration for prompt management."""
    global global_bitbucket_config
    global_bitbucket_config = config
### GLOBAL CONFIG ###
global_gitlab_config: Optional[Dict[str, Any]] = None
def set_global_gitlab_config(config: Dict[str, Any]) -> None:
    """Set global GitLab configuration for prompt management."""
    global global_gitlab_config
    global_gitlab_config = config
# Lazy loading system for heavy modules to reduce initial import time and memory usage
if TYPE_CHECKING:
from litellm.types.utils import ModelInfo as _ModelInfoType
from litellm.types.utils import PriorityReservationSettings
from litellm.llms.custom_httpx.http_handler import AsyncHTTPHandler, HTTPHandler
from litellm.caching.caching import Cache
# Type stubs for lazy-loaded configs to help mypy
from .llms.bedrock.chat.converse_transformation import (
AmazonConverseConfig as AmazonConverseConfig,
)
from .llms.openai_like.chat.handler import (
OpenAILikeChatConfig as OpenAILikeChatConfig,
)
from .llms.galadriel.chat.transformation import (
GaladrielChatConfig as GaladrielChatConfig,
)
from .llms.github.chat.transformation import GithubChatConfig as GithubChatConfig
from .llms.azure_ai.anthropic.transformation import (
AzureAnthropicConfig as AzureAnthropicConfig,
)
from .llms.bytez.chat.transformation import BytezChatConfig as BytezChatConfig
from .llms.compactifai.chat.transformation import (
CompactifAIChatConfig as CompactifAIChatConfig,
)
from .llms.empower.chat.transformation import EmpowerChatConfig as EmpowerChatConfig
from .llms.minimax.chat.transformation import MinimaxChatConfig as MinimaxChatConfig
from .llms.aiohttp_openai.chat.transformation import (
AiohttpOpenAIChatConfig as AiohttpOpenAIChatConfig,
)
from .llms.huggingface.chat.transformation import (
HuggingFaceChatConfig as HuggingFaceChatConfig,
)
from .llms.huggingface.embedding.transformation import (
HuggingFaceEmbeddingConfig as HuggingFaceEmbeddingConfig,
)
from .llms.oobabooga.chat.transformation import OobaboogaConfig as OobaboogaConfig
from .llms.maritalk import MaritalkConfig as MaritalkConfig
from .llms.openrouter.chat.transformation import (
OpenrouterConfig as OpenrouterConfig,
)
from .llms.datarobot.chat.transformation import DataRobotConfig as DataRobotConfig
from .llms.anthropic.chat.transformation import AnthropicConfig as AnthropicConfig
from .llms.bedrock.claude_platform.transformation import (
BedrockClaudePlatformConfig as BedrockClaudePlatformConfig,
)
from .llms.bedrock.claude_platform.messages_transformation import (
BedrockClaudePlatformMessagesConfig as BedrockClaudePlatformMessagesConfig,
)
from .llms.anthropic.completion.transformation import (
AnthropicTextConfig as AnthropicTextConfig,
)
from .llms.groq.stt.transformation import GroqSTTConfig as GroqSTTConfig
from .llms.triton.completion.transformation import TritonConfig as TritonConfig
from .llms.triton.completion.transformation import (
TritonGenerateConfig as TritonGenerateConfig,
)
from .llms.triton.completion.transformation import (
TritonInferConfig as TritonInferConfig,
)
from .llms.triton.embedding.transformation import (
TritonEmbeddingConfig as TritonEmbeddingConfig,
)
from .llms.huggingface.rerank.transformation import (
HuggingFaceRerankConfig as HuggingFaceRerankConfig,
)
from .llms.databricks.chat.transformation import (
DatabricksConfig as DatabricksConfig,
)
from .llms.databricks.embed.transformation import (
DatabricksEmbeddingConfig as DatabricksEmbeddingConfig,
)
from .llms.predibase.chat.transformation import PredibaseConfig as PredibaseConfig
from .llms.replicate.chat.transformation import ReplicateConfig as ReplicateConfig
from .llms.snowflake.chat.transformation import SnowflakeConfig as SnowflakeConfig
from .llms.cohere.rerank.transformation import (
CohereRerankConfig as CohereRerankConfig,
)
from .llms.cohere.rerank_v2.transformation import (
CohereRerankV2Config as CohereRerankV2Config,
)
from .llms.azure_ai.rerank.transformation import (
AzureAIRerankConfig as AzureAIRerankConfig,
)
from .llms.infinity.rerank.transformation import (
InfinityRerankConfig as InfinityRerankConfig,
)
from .llms.jina_ai.rerank.transformation import (
JinaAIRerankConfig as JinaAIRerankConfig,
)
from .llms.deepinfra.rerank.transformation import (
DeepinfraRerankConfig as DeepinfraRerankConfig,
)
from .llms.hosted_vllm.rerank.transformation import (
HostedVLLMRerankConfig as HostedVLLMRerankConfig,
)
from .llms.nvidia_nim.rerank.transformation import (
NvidiaNimRerankConfig as NvidiaNimRerankConfig,
)
from .llms.nvidia_nim.rerank.ranking_transformation import (
NvidiaNimRankingConfig as NvidiaNimRankingConfig,
)
from .llms.vertex_ai.rerank.transformation import (
VertexAIRerankConfig as VertexAIRerankConfig,
)
from .llms.fireworks_ai.rerank.transformation import (
FireworksAIRerankConfig as FireworksAIRerankConfig,
)
from .llms.voyage.rerank.transformation import (
VoyageRerankConfig as VoyageRerankConfig,
)
from .llms.watsonx.rerank.transformation import (
IBMWatsonXRerankConfig as IBMWatsonXRerankConfig,
)
from .llms.clarifai.chat.transformation import ClarifaiConfig as ClarifaiConfig
from .llms.ai21.chat.transformation import AI21ChatConfig as AI21ChatConfig
from .llms.meta_llama.chat.transformation import LlamaAPIConfig as LlamaAPIConfig
from .llms.together_ai.completion.transformation import (
TogetherAITextCompletionConfig as TogetherAITextCompletionConfig,
)
from .llms.cloudflare.chat.transformation import (
CloudflareChatConfig as CloudflareChatConfig,
)
from .llms.novita.chat.transformation import NovitaConfig as NovitaConfig
from .llms.petals.completion.transformation import PetalsConfig as PetalsConfig
from .llms.ollama.chat.transformation import OllamaChatConfig as OllamaChatConfig
from .llms.ollama.completion.transformation import OllamaConfig as OllamaConfig
from .llms.sagemaker.completion.transformation import (
SagemakerConfig as SagemakerConfig,
)
from .llms.sagemaker.chat.transformation import (
SagemakerChatConfig as SagemakerChatConfig,
)
from .llms.sagemaker.nova.transformation import (
SagemakerNovaConfig as SagemakerNovaConfig,
)
from .llms.cohere.chat.transformation import CohereChatConfig as CohereChatConfig
from .llms.anthropic.experimental_pass_through.messages.transformation import (
AnthropicMessagesConfig as AnthropicMessagesConfig,
)
from .llms.bedrock.messages.invoke_transformations.anthropic_claude3_transformation import (
AmazonAnthropicClaudeMessagesConfig as AmazonAnthropicClaudeMessagesConfig,
)
from .llms.bedrock.messages.mantle_transformation import (
AmazonMantleMessagesConfig as AmazonMantleMessagesConfig,
)
from .llms.together_ai.chat import TogetherAIConfig as TogetherAIConfig
from .llms.nlp_cloud.chat.handler import NLPCloudConfig as NLPCloudConfig
from .llms.vertex_ai.gemini.vertex_and_google_ai_studio_gemini import (
VertexGeminiConfig as VertexGeminiConfig,
)
from .llms.gemini.chat.transformation import (
GoogleAIStudioGeminiConfig as GoogleAIStudioGeminiConfig,
)
from .llms.vertex_ai.vertex_ai_partner_models.anthropic.transformation import (
VertexAIAnthropicConfig as VertexAIAnthropicConfig,
)
from .llms.vertex_ai.vertex_ai_partner_models.llama3.transformation import (
VertexAILlama3Config as VertexAILlama3Config,
)
from .llms.vertex_ai.vertex_ai_partner_models.ai21.transformation import (
VertexAIAi21Config as VertexAIAi21Config,
)
from .llms.bedrock.chat.invoke_handler import (
AmazonCohereChatConfig as AmazonCohereChatConfig,
)
from .llms.bedrock.common_utils import (
AmazonBedrockGlobalConfig as AmazonBedrockGlobalConfig,
)
from .llms.bedrock.chat.invoke_transformations.amazon_ai21_transformation import (
AmazonAI21Config as AmazonAI21Config,
)
from .llms.bedrock.chat.invoke_transformations.amazon_nova_transformation import (
AmazonInvokeNovaConfig as AmazonInvokeNovaConfig,
)
from .llms.bedrock.chat.invoke_transformations.amazon_qwen2_transformation import (
AmazonQwen2Config as AmazonQwen2Config,
)
from .llms.bedrock.chat.invoke_transformations.amazon_qwen3_transformation import (
AmazonQwen3Config as AmazonQwen3Config,
)
from .llms.bedrock.chat.invoke_transformations.anthropic_claude2_transformation import (
AmazonAnthropicConfig as AmazonAnthropicConfig,
)
from .llms.bedrock.chat.invoke_transformations.anthropic_claude3_transformation import (
AmazonAnthropicClaudeConfig as AmazonAnthropicClaudeConfig,
)
from .llms.bedrock.chat.invoke_transformations.amazon_cohere_transformation import (
AmazonCohereConfig as AmazonCohereConfig,
)
from .llms.bedrock.chat.invoke_transformations.amazon_llama_transformation import (
AmazonLlamaConfig as AmazonLlamaConfig,
)
from .llms.bedrock.chat.invoke_transformations.amazon_deepseek_transformation import (
AmazonDeepSeekR1Config as AmazonDeepSeekR1Config,
)
from .llms.bedrock.chat.invoke_transformations.amazon_mistral_transformation import (
AmazonMistralConfig as AmazonMistralConfig,
)
from .llms.bedrock.chat.invoke_transformations.amazon_moonshot_transformation import (
AmazonMoonshotConfig as AmazonMoonshotConfig,
)
from .llms.bedrock.chat.invoke_transformations.amazon_titan_transformation import (
AmazonTitanConfig as AmazonTitanConfig,
)
from .llms.bedrock.chat.invoke_transformations.amazon_twelvelabs_pegasus_transformation import (
AmazonTwelveLabsPegasusConfig as AmazonTwelveLabsPegasusConfig,
)
from .llms.bedrock.chat.invoke_transformations.base_invoke_transformation import (
AmazonInvokeConfig as AmazonInvokeConfig,
)
from .llms.bedrock.chat.invoke_transformations.amazon_openai_transformation import (
AmazonBedrockOpenAIConfig as AmazonBedrockOpenAIConfig,
)
from .llms.bedrock.image_generation.amazon_stability1_transformation import (
AmazonStabilityConfig as AmazonStabilityConfig,
)
from .llms.bedrock.image_generation.amazon_stability3_transformation import (
AmazonStability3Config as AmazonStability3Config,
)
from .llms.bedrock.image_generation.amazon_nova_canvas_transformation import (
AmazonNovaCanvasConfig as AmazonNovaCanvasConfig,
)
from .llms.bedrock.embed.amazon_titan_g1_transformation import (
AmazonTitanG1Config as AmazonTitanG1Config,
)
from .llms.bedrock.embed.amazon_titan_multimodal_transformation import (
AmazonTitanMultimodalEmbeddingG1Config as AmazonTitanMultimodalEmbeddingG1Config,
)
from .llms.cohere.chat.v2_transformation import (
CohereV2ChatConfig as CohereV2ChatConfig,
)
from .llms.bedrock.embed.cohere_transformation import (
BedrockCohereEmbeddingConfig as BedrockCohereEmbeddingConfig,
)
from .llms.bedrock.embed.twelvelabs_marengo_transformation import (
TwelveLabsMarengoEmbeddingConfig as TwelveLabsMarengoEmbeddingConfig,
)
from .llms.bedrock.embed.amazon_nova_transformation import (
AmazonNovaEmbeddingConfig as AmazonNovaEmbeddingConfig,
)
from .llms.openai.openai import (
OpenAIConfig as OpenAIConfig,
MistralEmbeddingConfig as MistralEmbeddingConfig,
)
from .llms.openai.image_variations.transformation import (
OpenAIImageVariationConfig as OpenAIImageVariationConfig,
)
from .llms.deepgram.audio_transcription.transformation import (
DeepgramAudioTranscriptionConfig as DeepgramAudioTranscriptionConfig,
)
from .llms.nvidia_riva.audio_transcription.transformation import (
NvidiaRivaAudioTranscriptionConfig as NvidiaRivaAudioTranscriptionConfig,
)
from .llms.topaz.image_variations.transformation import (
TopazImageVariationConfig as TopazImageVariationConfig,
)
from litellm.llms.openai.completion.transformation import (
OpenAITextCompletionConfig as OpenAITextCompletionConfig,
)
from .llms.groq.chat.transformation import GroqChatConfig as GroqChatConfig
from .llms.bedrock_mantle.chat.transformation import (
BedrockMantleChatConfig as BedrockMantleChatConfig,
)
from .llms.a2a.chat.transformation import A2AConfig as A2AConfig
from .llms.voyage.embedding.transformation import (
VoyageEmbeddingConfig as VoyageEmbeddingConfig,
)
from .llms.voyage.embedding.transformation_contextual import (
VoyageContextualEmbeddingConfig as VoyageContextualEmbeddingConfig,
)
from .llms.infinity.embedding.transformation import (
InfinityEmbeddingConfig as InfinityEmbeddingConfig,
)
from .llms.perplexity.embedding.transformation import (
PerplexityEmbeddingConfig as PerplexityEmbeddingConfig,
)
from .llms.azure_ai.chat.transformation import (
AzureAIStudioConfig as AzureAIStudioConfig,
)
from .llms.mistral.chat.transformation import MistralConfig as MistralConfig
from .llms.openai.responses.transformation import (
OpenAIResponsesAPIConfig as OpenAIResponsesAPIConfig,
)
from .llms.azure.responses.transformation import (
AzureOpenAIResponsesAPIConfig as AzureOpenAIResponsesAPIConfig,
)
from .llms.azure.responses.o_series_transformation import (
AzureOpenAIOSeriesResponsesAPIConfig as AzureOpenAIOSeriesResponsesAPIConfig,
)
from .llms.xai.responses.transformation import (
XAIResponsesAPIConfig as XAIResponsesAPIConfig,
)
from .llms.litellm_proxy.responses.transformation import (
LiteLLMProxyResponsesAPIConfig as LiteLLMProxyResponsesAPIConfig,
)
from .llms.volcengine.responses.transformation import (
VolcEngineResponsesAPIConfig as VolcEngineResponsesAPIConfig,
)
from .llms.manus.responses.transformation import (
ManusResponsesAPIConfig as ManusResponsesAPIConfig,
)
from .llms.perplexity.responses.transformation import (
PerplexityResponsesConfig as PerplexityResponsesConfig,
)
from .llms.databricks.responses.transformation import (
DatabricksResponsesAPIConfig as DatabricksResponsesAPIConfig,
)
from .llms.openrouter.responses.transformation import (
OpenRouterResponsesAPIConfig as OpenRouterResponsesAPIConfig,
)
from .llms.bedrock_mantle.responses.transformation import (
BedrockMantleResponsesAPIConfig as BedrockMantleResponsesAPIConfig,
)
from .llms.gemini.interactions.transformation import (
GoogleAIStudioInteractionsConfig as GoogleAIStudioInteractionsConfig,
)
from .llms.openai.chat.o_series_transformation import (
OpenAIOSeriesConfig as OpenAIOSeriesConfig,
OpenAIOSeriesConfig as OpenAIO1Config,
)
from .llms.anthropic.skills.transformation import (
AnthropicSkillsConfig as AnthropicSkillsConfig,
)
from .llms.base_llm.skills.transformation import (
BaseSkillsAPIConfig as BaseSkillsAPIConfig,
)
from .llms.gradient_ai.chat.transformation import (
GradientAIConfig as GradientAIConfig,
)
from .llms.openai.chat.gpt_transformation import OpenAIGPTConfig as OpenAIGPTConfig
from .llms.openai.chat.gpt_5_transformation import (
OpenAIGPT5Config as OpenAIGPT5Config,
)
from .llms.openai.transcriptions.whisper_transformation import (
OpenAIWhisperAudioTranscriptionConfig as OpenAIWhisperAudioTranscriptionConfig,
)
from .llms.openai.transcriptions.gpt_transformation import (
OpenAIGPTAudioTranscriptionConfig as OpenAIGPTAudioTranscriptionConfig,
)
from .llms.openai.chat.gpt_audio_transformation import (
OpenAIGPTAudioConfig as OpenAIGPTAudioConfig,
)
from .llms.nvidia_nim.chat.transformation import NvidiaNimConfig as NvidiaNimConfig
from .llms.nvidia_nim.embed import (
NvidiaNimEmbeddingConfig as NvidiaNimEmbeddingConfig,
)
# Type stubs for lazy-loaded config instances
openaiOSeriesConfig: OpenAIOSeriesConfig
openAIGPTConfig: OpenAIGPTConfig
openAIGPTAudioConfig: OpenAIGPTAudioConfig
openAIGPT5Config: OpenAIGPT5Config
nvidiaNimConfig: NvidiaNimConfig
nvidiaNimEmbeddingConfig: NvidiaNimEmbeddingConfig
# Import config classes that need type stubs (for mypy) - import with _ prefix to avoid circular reference
from .llms.vllm.completion.transformation import VLLMConfig as _VLLMConfig
from .llms.deepseek.chat.transformation import (
DeepSeekChatConfig as _DeepSeekChatConfig,
)
from .llms.sap.chat.transformation import (
GenAIHubOrchestrationConfig as _GenAIHubOrchestrationConfig,
)
from .llms.sap.embed.transformation import (
GenAIHubEmbeddingConfig as _GenAIHubEmbeddingConfig,
)
from .llms.azure.chat.o_series_transformation import (
AzureOpenAIO1Config as _AzureOpenAIO1Config,
)
from .llms.perplexity.chat.transformation import (
PerplexityChatConfig as _PerplexityChatConfig,
)
from .llms.nscale.chat.transformation import NscaleConfig as _NscaleConfig
from .llms.watsonx.chat.transformation import (
IBMWatsonXChatConfig as _IBMWatsonXChatConfig,
)
from .llms.watsonx.completion.transformation import (
IBMWatsonXAIConfig as _IBMWatsonXAIConfig,
)
from .llms.litellm_proxy.chat.transformation import (
LiteLLMProxyChatConfig as _LiteLLMProxyChatConfig,
)
from .llms.deepinfra.chat.transformation import DeepInfraConfig as _DeepInfraConfig
from .llms.llamafile.chat.transformation import (
LlamafileChatConfig as _LlamafileChatConfig,
)
from .llms.lm_studio.chat.transformation import (
LMStudioChatConfig as _LMStudioChatConfig,
)
from .llms.lm_studio.embed.transformation import (
LmStudioEmbeddingConfig as _LmStudioEmbeddingConfig,
)
from .llms.watsonx.embed.transformation import (
IBMWatsonXEmbeddingConfig as _IBMWatsonXEmbeddingConfig,
)
from .llms.vertex_ai.gemini.vertex_and_google_ai_studio_gemini import (
VertexGeminiConfig as _VertexGeminiConfig,
)
# Type stubs for lazy-loaded config classes (to help mypy understand types)
VLLMConfig: Type[_VLLMConfig]
DeepSeekChatConfig: Type[_DeepSeekChatConfig]
GenAIHubOrchestrationConfig: Type[_GenAIHubOrchestrationConfig]
GenAIHubEmbeddingConfig: Type[_GenAIHubEmbeddingConfig]
AzureOpenAIO1Config: Type[_AzureOpenAIO1Config]
PerplexityChatConfig: Type[_PerplexityChatConfig]
NscaleConfig: Type[_NscaleConfig]
IBMWatsonXChatConfig: Type[_IBMWatsonXChatConfig]
IBMWatsonXAIConfig: Type[_IBMWatsonXAIConfig]
LiteLLMProxyChatConfig: Type[_LiteLLMProxyChatConfig]
DeepInfraConfig: Type[_DeepInfraConfig]
LlamafileChatConfig: Type[_LlamafileChatConfig]
LMStudioChatConfig: Type[_LMStudioChatConfig]
LmStudioEmbeddingConfig: Type[_LmStudioEmbeddingConfig]
IBMWatsonXEmbeddingConfig: Type[_IBMWatsonXEmbeddingConfig]
VertexAIConfig: Type[_VertexGeminiConfig] # Alias for VertexGeminiConfig
from .llms.featherless_ai.chat.transformation import (
FeatherlessAIConfig as FeatherlessAIConfig,
)
from .llms.cerebras.chat import CerebrasConfig as CerebrasConfig
from .llms.baseten.chat import BasetenConfig as BasetenConfig
from .llms.sambanova.chat import SambanovaConfig as SambanovaConfig
from .llms.sambanova.embedding.transformation import (
SambaNovaEmbeddingConfig as SambaNovaEmbeddingConfig,
)
from .llms.fireworks_ai.chat.transformation import (
FireworksAIConfig as FireworksAIConfig,
)
from .llms.fireworks_ai.completion.transformation import (
FireworksAITextCompletionConfig as FireworksAITextCompletionConfig,
)
from .llms.fireworks_ai.audio_transcription.transformation import (
FireworksAIAudioTranscriptionConfig as FireworksAIAudioTranscriptionConfig,
)
from .llms.fireworks_ai.embed.fireworks_ai_transformation import (
FireworksAIEmbeddingConfig as FireworksAIEmbeddingConfig,
)
from .llms.friendliai.chat.transformation import (
FriendliaiChatConfig as FriendliaiChatConfig,
)
from .llms.jina_ai.embedding.transformation import (
JinaAIEmbeddingConfig as JinaAIEmbeddingConfig,
)
from .llms.xai.chat.transformation import XAIChatConfig as XAIChatConfig
from .llms.zai.chat.transformation import ZAIChatConfig as ZAIChatConfig
from .llms.aiml.chat.transformation import AIMLChatConfig as AIMLChatConfig
from .llms.volcengine.chat.transformation import (
VolcEngineChatConfig as VolcEngineChatConfig,
VolcEngineChatConfig as VolcEngineConfig,
)
from .llms.codestral.completion.transformation import (
CodestralTextCompletionConfig as CodestralTextCompletionConfig,
)
from .llms.inception.completion.transformation import (
InceptionTextCompletionConfig as InceptionTextCompletionConfig,
)
from .llms.azure.azure import (
AzureOpenAIAssistantsAPIConfig as AzureOpenAIAssistantsAPIConfig,
)
from .llms.heroku.chat.transformation import HerokuChatConfig as HerokuChatConfig
from .llms.cometapi.chat.transformation import CometAPIConfig as CometAPIConfig
from .llms.azure.chat.gpt_transformation import (
AzureOpenAIConfig as AzureOpenAIConfig,
)
from .llms.azure.chat.gpt_5_transformation import (
AzureOpenAIGPT5Config as AzureOpenAIGPT5Config,
)
from .llms.azure.completion.transformation import (
AzureOpenAITextConfig as AzureOpenAITextConfig,
)
from .llms.azure.audio_transcription.transformation import (
AzureSpeechAudioTranscriptionConfig as AzureSpeechAudioTranscriptionConfig,
)
from .llms.hosted_vllm.chat.transformation import (
HostedVLLMChatConfig as HostedVLLMChatConfig,
)
from .llms.hosted_vllm.embedding.transformation import (
HostedVLLMEmbeddingConfig as HostedVLLMEmbeddingConfig,
)
from .llms.hosted_vllm.responses.transformation import (
HostedVLLMResponsesAPIConfig as HostedVLLMResponsesAPIConfig,
)
from .llms.github_copilot.chat.transformation import (
GithubCopilotConfig as GithubCopilotConfig,
)
from .llms.github_copilot.responses.transformation import (
GithubCopilotResponsesAPIConfig as GithubCopilotResponsesAPIConfig,
)
from .llms.github_copilot.embedding.transformation import (
GithubCopilotEmbeddingConfig as GithubCopilotEmbeddingConfig,
)
from .llms.chatgpt.chat.transformation import ChatGPTConfig as ChatGPTConfig
from .llms.chatgpt.responses.transformation import (
ChatGPTResponsesAPIConfig as ChatGPTResponsesAPIConfig,
)
from .llms.gigachat.chat.transformation import GigaChatConfig as GigaChatConfig
from .llms.gigachat.embedding.transformation import (
GigaChatEmbeddingConfig as GigaChatEmbeddingConfig,
)
from .llms.nebius.chat.transformation import NebiusConfig as NebiusConfig
from .llms.wandb.chat.transformation import WandbConfig as WandbConfig
from .llms.dashscope.chat.transformation import (
DashScopeChatConfig as DashScopeChatConfig,
)
from .llms.dashscope.embed.transformation import (
DashScopeEmbeddingConfig as DashScopeEmbeddingConfig,
)
from .llms.dashscope.rerank.transformation import (
DashScopeRerankConfig as DashScopeRerankConfig,
)
from .llms.moonshot.chat.transformation import (
MoonshotChatConfig as MoonshotChatConfig,
)
from .llms.docker_model_runner.chat.transformation import (
DockerModelRunnerChatConfig as DockerModelRunnerChatConfig,
)
from .llms.v0.chat.transformation import V0ChatConfig as V0ChatConfig
from .llms.oci.chat.transformation import OCIChatConfig as OCIChatConfig
from .llms.oci.embed.transformation import OCIEmbeddingConfig as OCIEmbeddingConfig
from .llms.morph.chat.transformation import MorphChatConfig as MorphChatConfig
from .llms.ragflow.chat.transformation import RAGFlowConfig as RAGFlowConfig
from .llms.lambda_ai.chat.transformation import (
LambdaAIChatConfig as LambdaAIChatConfig,
)
from .llms.inception.chat.transformation import (
InceptionChatConfig as InceptionChatConfig,
)
from .llms.hyperbolic.chat.transformation import (
HyperbolicChatConfig as HyperbolicChatConfig,
)
from .llms.vercel_ai_gateway.chat.transformation import (
VercelAIGatewayConfig as VercelAIGatewayConfig,
)
from .llms.ovhcloud.chat.transformation import (
OVHCloudChatConfig as OVHCloudChatConfig,
)
from .llms.ovhcloud.embedding.transformation import (
OVHCloudEmbeddingConfig as OVHCloudEmbeddingConfig,
)
from .llms.cometapi.embed.transformation import (
CometAPIEmbeddingConfig as CometAPIEmbeddingConfig,
)
from .llms.lemonade.chat.transformation import (
LemonadeChatConfig as LemonadeChatConfig,
)
from .llms.snowflake.embedding.transformation import (
SnowflakeEmbeddingConfig as SnowflakeEmbeddingConfig,
)
from .llms.amazon_nova.chat.transformation import (
AmazonNovaChatConfig as AmazonNovaChatConfig,
)
from litellm.caching.llm_caching_handler import LLMClientCache
from litellm.types.llms.bedrock import COHERE_EMBEDDING_INPUT_TYPES
from litellm.types.utils import (
BudgetConfig,
CredentialItem,
PriorityReservationDict,
StandardKeyGenerationConfig,
)
from litellm.types.guardrails import GuardrailItem
from litellm.types.proxy.management_endpoints.ui_sso import (
DefaultTeamSSOParams,
LiteLLM_UpperboundKeyGenerateParams,
)
# Cost calculator functions
cost_per_token: Callable[..., Tuple[float, float]]
completion_cost: Callable[..., float]
response_cost_calculator: Any
modify_integration: Any
# Utils functions - type stubs for truly lazy loaded functions only
# (functions NOT imported via "from .main import *")
get_response_string: Callable[..., str]
supports_function_calling: Callable[..., bool]
supports_web_search: Callable[..., bool]
supports_url_context: Callable[..., bool]
supports_response_schema: Callable[..., bool]
supports_parallel_function_calling: Callable[..., bool]
supports_vision: Callable[..., bool]
supports_audio_input: Callable[..., bool]
supports_audio_output: Callable[..., bool]
supports_system_messages: Callable[..., bool]
supports_reasoning: Callable[..., bool]
acreate: Callable[..., Any]
get_max_tokens: Callable[..., int]
get_model_info: Callable[..., _ModelInfoType] # type: ignore[no-redef]
register_prompt_template: Callable[..., None]
validate_environment: Callable[..., dict]
check_valid_key: Callable[..., bool]
register_model: Callable[..., None]
encode: Callable[..., list]
decode: Callable[..., str]
_calculate_retry_after: Callable[..., float]
_should_retry: Callable[..., bool]
get_supported_openai_params: Callable[..., Optional[list]]
get_api_base: Callable[..., Optional[str]]
get_first_chars_messages: Callable[..., str]
get_provider_fields: Callable[..., List]
get_valid_models: Callable[..., list]
remove_index_from_tool_calls: Callable[..., None]
# Response types - truly lazy loaded only (not in main.py or elsewhere)
ModelResponseListIterator: Type[Any]
# HTTP handler singletons (created lazily via __getattr__ at runtime)
module_level_aclient: AsyncHTTPHandler
module_level_client: HTTPHandler
# Bedrock tool name mappings instance (lazy-loaded)
from litellm.caching.caching import InMemoryCache
bedrock_tool_name_mappings: InMemoryCache
# Azure exception class (lazy-loaded)
from litellm.llms.azure.common_utils import AzureOpenAIError
# Secret manager types (lazy-loaded)
from litellm.types.secret_managers.main import (
KeyManagementSystem,
KeyManagementSettings, # Not lazy-loaded - needed for _key_management_settings initialization
)
# Custom logger class (lazy-loaded)
from litellm.integrations.custom_logger import CustomLogger
# Datadog LLM observability params (lazy-loaded)
from litellm.types.integrations.datadog_llm_obs import DatadogLLMObsInitParams
# Logging callback manager class and instance (lazy-loaded)
from litellm.litellm_core_utils.logging_callback_manager import (
LoggingCallbackManager,
)
logging_callback_manager: LoggingCallbackManager
# provider_list is lazy-loaded
from litellm.types.utils import LlmProviders
provider_list: List[Union[LlmProviders, str]]
# Note: AmazonConverseConfig and OpenAILikeChatConfig are imported above in TYPE_CHECKING block
# Track if async client cleanup has been registered (for lazy loading)
_async_client_cleanup_registered = False
# Eager loading for backwards compatibility with VCR and other HTTP recording tools.
# When LITELLM_DISABLE_LAZY_LOADING is set, lazy-loaded attributes are loaded at import time.
# For now, this only affects encoding (tiktoken), as it was the only reported issue.
# See: https://github.com/BerriAI/litellm/issues/18659
if os.getenv("LITELLM_DISABLE_LAZY_LOADING", "").lower() in ("1", "true", "yes", "on"):
    # Load encoding at import time (pre-#18070 behavior) so that it is
    # initialized before VCR starts recording HTTP requests
    from .main import encoding
def __getattr__(name: str) -> Any:
    """Lazy import handler with cached registry for improved performance."""
    global _async_client_cleanup_registered
    # Register async client cleanup on first access (only once)
    if not _async_client_cleanup_registered:
        from litellm.llms.custom_httpx.async_client_cleanup import (
            register_async_client_cleanup,
        )
        register_async_client_cleanup()
        _async_client_cleanup_registered = True
    # Use cached registry from _lazy_imports instead of importing tuples every time
    from ._lazy_imports import _get_lazy_import_registry
    registry = _get_lazy_import_registry()
    # Check if name is in registry and call the cached handler function
    if name in registry:
        handler_func = registry[name]
        return handler_func(name)
    # Lazy load encoding from main.py to avoid heavy tiktoken import
    if name == "encoding":
        from ._lazy_imports import _get_litellm_globals
        _globals = _get_litellm_globals()
        # Check if already cached
        if "encoding" not in _globals:
            from .main import encoding as _encoding
            _globals["encoding"] = _encoding
        return _globals["encoding"]
    # Lazy load bedrock_tool_name_mappings instance
    if name == "bedrock_tool_name_mappings":
        from ._lazy_imports import _get_litellm_globals
        _globals = _get_litellm_globals()
        # Check if already cached
        if "bedrock_tool_name_mappings" not in _globals:
            from .llms.bedrock.chat.invoke_handler import (
                bedrock_tool_name_mappings as _bedrock_tool_name_mappings,
            )
            _globals["bedrock_tool_name_mappings"] = _bedrock_tool_name_mappings
        return _globals["bedrock_tool_name_mappings"]
    # Lazy load AzureOpenAIError exception class
    if name == "AzureOpenAIError":
        from ._lazy_imports import _get_litellm_globals
        _globals = _get_litellm_globals()
        # Check if already cached
        if "AzureOpenAIError" not in _globals:
            from .llms.azure.common_utils import AzureOpenAIError as _AzureOpenAIError
            _globals["AzureOpenAIError"] = _AzureOpenAIError
        return _globals["AzureOpenAIError"]
    # Lazy load openaiOSeriesConfig instance
    if name == "openaiOSeriesConfig":
        from ._lazy_imports import _get_litellm_globals
        _globals = _get_litellm_globals()
        if "openaiOSeriesConfig" not in _globals:
            # Import the config class and instantiate it
            config_class = __getattr__("OpenAIOSeriesConfig")
            _globals["openaiOSeriesConfig"] = config_class()
        return _globals["openaiOSeriesConfig"]
    # Lazy load other config instances
    _config_instances = {
        "openAIGPTConfig": "OpenAIGPTConfig",
        "openAIGPTAudioConfig": "OpenAIGPTAudioConfig",
        "openAIGPT5Config": "OpenAIGPT5Config",
        "nvidiaNimConfig": "NvidiaNimConfig",
        "nvidiaNimEmbeddingConfig": "NvidiaNimEmbeddingConfig",
    }
    if name in _config_instances:
        from ._lazy_imports import _get_litellm_globals
        _globals = _get_litellm_globals()
        if name not in _globals:
            # Import the config class and instantiate it
            config_class = __getattr__(_config_instances[name])
            _globals[name] = config_class()
        return _globals[name]
    # Handle OpenAIO1Config alias
    if name == "OpenAIO1Config":
        return __getattr__("OpenAIOSeriesConfig")
    # Lazy load provider_list
    if name == "provider_list":
        from ._lazy_imports import _get_litellm_globals
        _globals = _get_litellm_globals()
        # Check if already cached
        if "provider_list" not in _globals:
            # LlmProviders is eagerly imported above, so we can import it directly
            from litellm.types.utils import LlmProviders
            _globals["provider_list"] = list(LlmProviders)
        return _globals["provider_list"]
    # Lazy load priority_reservation_settings instance
    if name == "priority_reservation_settings":
        from ._lazy_imports import _get_litellm_globals
        _globals = _get_litellm_globals()
        # Check if already cached
        if "priority_reservation_settings" not in _globals:
            # Import the class and instantiate it
            PriorityReservationSettings = __getattr__("PriorityReservationSettings")
            _globals["priority_reservation_settings"] = PriorityReservationSettings()
        return _globals["priority_reservation_settings"]
    # Lazy load logging_callback_manager instance
    if name == "logging_callback_manager":
        from ._lazy_imports import _get_litellm_globals
        _globals = _get_litellm_globals()
        # Check if already cached
        if "logging_callback_manager" not in _globals:
            # Import the class and instantiate it
            LoggingCallbackManager = __getattr__("LoggingCallbackManager")
            _globals["logging_callback_manager"] = LoggingCallbackManager()
        return _globals["logging_callback_manager"]
    # Lazy load _service_logger module
    if name == "_service_logger":
        from ._lazy_imports import _get_litellm_globals
        _globals = _get_litellm_globals()
        # Check if already cached
        if "_service_logger" not in _globals:
            # Import the module lazily
            import litellm._service_logger
            _globals["_service_logger"] = litellm._service_logger
        return _globals["_service_logger"]
    # Lazy load evals module functions
    if name in [
        "acreate_eval",
        "alist_evals",
        "aget_eval",
        "aupdate_eval",
        "adelete_eval",
        "acancel_eval",
        "create_eval",
        "list_evals",
        "get_eval",
        "update_eval",
        "delete_eval",
        "cancel_eval",
        "acreate_run",
        "alist_runs",
        "aget_run",
        "acancel_run",
        "adelete_run",
        "create_run",
        "list_runs",
        "get_run",
        "cancel_run",
        "delete_run",
    ]:
        from litellm.evals.main import (
            acreate_eval,
            alist_evals,
            aget_eval,
            aupdate_eval,
            adelete_eval,
            acancel_eval,
            create_eval,
            list_evals,
            get_eval,
            update_eval,
            delete_eval,
            cancel_eval,
            acreate_run,
            alist_runs,
            aget_run,
            acancel_run,
            adelete_run,
            create_run,
            list_runs,
            get_run,
            cancel_run,
            delete_run,
        )
        return locals()[name]
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
# ALL_LITELLM_RESPONSE_TYPES is lazy-loaded via __getattr__ to avoid loading utils at import time