PR was blocked by .github/workflows/guard-fork-dependencies.yml: fork PRs
cannot modify uv.lock. Reverting:
- uv.lock + pyproject.toml black bump (24.10.0 -> 26.3.1) and the 295
files of mechanical Black 26 reformat coupled to it
- pyproject.toml diskcache extra change (kept the runtime mitigation in
litellm/caching/disk_cache.py via JSONDisk)
Kept:
- Dockerfile cache narrowing (drops ~660 MB of uv build cache that
surfaced cached setuptools as CVE findings)
- litellm/caching/disk_cache.py: dc.JSONDisk to neutralize CVE-2025-69872
- ui/litellm-dashboard/package-lock.json + litellm-js/spend-logs/package-lock.json:
next/postcss/hono/uuid CVE bumps (these are not blocked by the fork guard)
- tests/test_litellm/caching/test_disk_cache.py
- tests/code_coverage_tests/liccheck.ini: harmless black authorization
Black + gitpython + langchain dep upgrades will need a follow-up from a
maintainer pushing a branch in the canonical BerriAI/litellm repo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strip VCR wiring from the batches test conftest. Drops:
- import of `_vcr_conftest_common` helpers
- the `vcr_config` fixture, `pytest_recording_configure`,
`_vcr_outcome_gate`, `pytest_runtest_makereport`
- the `apply_vcr_auto_marker_to_items` call in
`pytest_collection_modifyitems`
- `VerboseReporterState` / its `pytest_configure` /
`pytest_runtest_logreport` hooks (purely VCR-verdict plumbing)
Why: every test in this directory creates ephemeral OpenAI / Bedrock /
vLLM resources whose IDs change per run (file-XXX, batch-XXX,
ft-XXX, ...). VCR's path/query/body matchers don't match across runs,
so `record_mode="new_episodes"` was silently passing through to the
live API and recording many new cassette entries every run. The result
was cassette bloat with no replay benefit.
Behaviour after this change is identical to running the directory
without `CASSETTE_REDIS_URL` set: tests that have keys hit live APIs,
tests that don't continue to skip via their existing skipif markers.
Conftest now keeps only path setup and the session-scoped `event_loop`
fixture.
OpenAI announced gpt-3.5-turbo-0125 (and fine-tuning of gpt-3.5-turbo
in general) for shutdown on 2026-10-23, with the announcement landing
2026-04-22. The hard-fail date is ~5 months out, but the timing fits the
recent uptick in flakes for this test, and OpenAI may already be running
the deprecated model's pipeline on deprioritized infra.
Bump to gpt-4o-mini-2024-07-18 — currently supported for fine-tuning,
no announced shutdown. Updates the live test plus the mocked test for
consistency. Belt-and-suspenders with the existing propagation-retry
helper.
Previous fix polled `litellm.afile_retrieve` for `status == "processed"`
before calling the fine-tuning endpoint. That doesn't actually solve
the race:
- OpenAI's `FileObject.status` field is deprecated per the SDK type and
not authoritative — it can read "processed" before the file is usable.
- The retrieve and fine-tuning endpoints don't share a consistency
model, so retrieve succeeding tells you nothing about FT visibility.
Replace with a retry around the actual `acreate_fine_tuning_job` call
that catches the OpenAI 400 `'file-... does not exist'` and backs off
exponentially (1s → cap 8s, 12 attempts, ~70s total budget). The
operation succeeding is the only reliable signal that propagation
finished.
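A minimal sketch of the retry described above, assuming the propagation
race can be recognised from the exception text alone. The helper name
`retry_until_propagated` and the injectable `sleep` parameter are
hypothetical and exist only for illustration; the real helper may check
the status code as well.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_until_propagated(
    create_job: Callable[[], Awaitable[T]],
    *,
    attempts: int = 12,
    base_delay: float = 1.0,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry create_job() while it fails with a 'does not exist' error."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await create_job()
        except Exception as exc:
            # Assumption: matching on the message is enough to spot the race.
            # Anything else, or the final attempt, is re-raised unchanged.
            if "does not exist" not in str(exc) or attempt == attempts:
                raise
            await sleep(delay)
            # Double the delay after each failure, capped at max_delay.
            delay = min(delay * 2, max_delay)
    raise RuntimeError("unreachable: attempts must be >= 1")
```

With the defaults, the sleeps between the 12 attempts are 1, 2, 4 and then
eight waits of 8 seconds, which adds up to 71 seconds and matches the ~70s
budget quoted above.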
OpenAI file uploads are eventually consistent — a freshly uploaded file
may briefly 404 from `retrieve` and is rejected by the fine-tuning
endpoint with `'file-... does not exist'` until processing finishes.
The async fine-tuning test called `acreate_fine_tuning_job` immediately
after `acreate_file` and flaked on this race.
Add a polling helper that waits up to ~30s for `status=processed` (and
short-circuits on `error`), called between upload and FT job creation.
Covers the same propagation lag as the `await asyncio.sleep(1)` in the
sister batches test, but is more robust against longer delays.
OpenAI deprecated the gpt-4o-realtime-preview-2024-10-01 snapshot,
which caused these E2E tests to fail consistently in CI. Bump to the
unversioned gpt-4o-realtime-preview alias to match the sibling
test_openai_realtime_simple.py and stay current as OpenAI rolls the
alias forward.
- Add `_strip_image_b64_payloads` filter: rewrites `data[*].b64_json` in
image-gen responses to a 4-byte placeholder before the cassette is saved.
Image-edit and image-gen cassettes (193 MB / 184 MB / 104 MB / ...) will
shrink to <100 KB on next record. Tests assert response shape only, so
coverage is preserved.
- Add `_normalize_multipart_boundary` filter: replaces httpx's per-request
random multipart boundary with a fixed string in both Content-Type header
and body bytes. Audio-transcription / Whisper tests have been effectively
unmocked — every CI run hit live providers and was silently capped at
MAX_EPISODES_PER_CASSETTE=50. Both record and replay now see identical
bytes; the safe_body matcher works.
- Fix test_evals_api.py body poisoning: replace `int(time.time())` in eval
names with `hashlib.sha1(test_node_name)[:12]`, add a function-scoped
`managed_eval` fixture that creates and deletes the eval, and switch
`get_eval` / `update_eval` from `list_evals().data[0].id` (which made
the URL vary by run) to `managed_eval.id`. Net coverage gain: delete is
now actually exercised.
- Swap arxiv PDF URL in BaseOCRTest for the in-repo `dummy.pdf` (589 B)
served via sha-pinned jsdelivr.
- Swap etsystatic image URL in BaseLLMChatTest.test_image_url for the
in-repo LiteLLM logo (9.2 KB) served via the same jsdelivr pin.
- Add `tests/llm_translation/test_vcr_filters.py` with 14 unit tests
covering both new filters: replacement, idempotency, nesting, content-
length update, two-distinct-boundaries-converge-after-normalize, etc.
Cassettes recorded with the prior patterns will mismatch on the first CI
run after merge; recommend flushing the cassette Redis once (post-merge)
so re-records save under the new format from the start.
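For illustration, the two filters could look roughly like the sketch below.
This is not the code in the repo: the function names drop the leading
underscore, the placeholder and fixed boundary values are made up, and the
content-length update covered by the unit tests is left out.

```python
import json
import re
from typing import Tuple

PLACEHOLDER_B64 = "AAAA"  # hypothetical 4-byte placeholder
FIXED_BOUNDARY = "vcr-fixed-boundary"  # hypothetical fixed boundary string

_BOUNDARY_RE = re.compile(r'boundary="?([^";,\s]+)"?')


def strip_image_b64_payloads(body: bytes) -> bytes:
    """Replace data[*].b64_json with a tiny placeholder; leave other bodies alone."""
    try:
        payload = json.loads(body)
    except ValueError:
        return body  # not JSON, nothing to strip
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, list):
        return body
    changed = False
    for item in data:
        if not isinstance(item, dict):
            continue
        value = item.get("b64_json")
        if isinstance(value, str) and value != PLACEHOLDER_B64:
            item["b64_json"] = PLACEHOLDER_B64
            changed = True
    # Returning the original bytes when nothing changed keeps the filter idempotent.
    return json.dumps(payload).encode() if changed else body


def normalize_multipart_boundary(content_type: str, body: bytes) -> Tuple[str, bytes]:
    """Swap the random multipart boundary for a fixed one in header and body."""
    if not content_type.lower().startswith("multipart/"):
        return content_type, body
    match = _BOUNDARY_RE.search(content_type)
    if match is None:
        return content_type, body
    original = match.group(1)
    new_header = content_type.replace(original, FIXED_BOUNDARY)
    new_body = body.replace(original.encode(), FIXED_BOUNDARY.encode())
    return new_header, new_body
```

Two requests that differ only in their random boundary come out
byte-identical after normalization, which is what lets a body matcher
replay them from the cassette.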
* feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_context_window.json
xAI's docs page now lists grok-4.3 as the recommended chat / coding model:
"We strongly recommend all API callers use grok-4.3. It is the most
intelligent and fastest model we've built." (https://docs.x.ai/docs/models)
Pricing/specs sourced from xAI's published model metadata:
- input: $1.25 / 1M tokens (<=200k), $2.50 / 1M tokens (>200k)
- output: $2.50 / 1M tokens (<=200k), $5.00 / 1M tokens (>200k)
- cached: $0.20 / 1M tokens (<=200k), $0.40 / 1M tokens (>200k)
- context: 1,000,000 tokens
- capabilities: vision, reasoning, function calling, structured outputs,
prompt caching, web search
Adds two entries: `xai/grok-4.3` (canonical) and `xai/grok-4.3-latest` (alias),
mirroring the pattern used for the rest of the xAI/Grok-4 family.
* test(xai): add model_info test for grok-4.3 + sync backup cost map
- Mirror xai/grok-4.3 and xai/grok-4.3-latest entries into
litellm/model_prices_and_context_window_backup.json so the bundled
model cost map matches the canonical model_prices_and_context_window.json.
- Add tests/test_litellm/test_xai_grok_4_3_model_metadata.py covering
pricing tiers, capability flags, context window, provider routing,
and parity between the main and backup cost maps.
- Point 'source' at the live xAI models page (the per-model URL
https://docs.x.ai/docs/models/grok-4.3 currently 404s).
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
---------
Co-authored-by: shin-watcher <shin-watcher@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* Refactor Bedrock response stream shape handling
- Introduced a module-level constant `BEDROCK_RESPONSE_STREAM_SHAPE` to cache the response stream shape, eliminating the need for per-instance caching in `BedrockEventStreamDecoderBase`.
- Updated relevant methods to utilize the new constant, improving performance by avoiding redundant loading of the shape.
- Added tests to ensure the shape is loaded correctly at import time and is consistent across different modules.
- Added a new mock server script for testing Bedrock pass-through functionality.
* Refactor response parsing for Bedrock and SageMaker
- Improved code readability by formatting the parsing method calls in `AWSEventStreamDecoder` for both Bedrock and SageMaker response stream shapes.
- Added blank lines for better separation of code blocks in `invoke_handler.py` and `common_utils.py` to enhance maintainability.
* Enhance error handling for Bedrock and SageMaker response stream shape loading
- Wrapped the loading logic in `_load_bedrock_response_stream_shape` and `_load_sagemaker_response_stream_shape` with try-except blocks to gracefully handle exceptions.
- Added logging to warn when the response stream shape cannot be pre-loaded, ensuring the module imports cleanly.
- Updated tests to verify that loading failures return `None` instead of propagating exceptions.
* Implement error handling for missing response stream shapes in Bedrock and SageMaker
- Added checks in `_parse_message_from_event` methods to raise appropriate errors when `BEDROCK_RESPONSE_STREAM_SHAPE` or `SAGEMAKER_RESPONSE_STREAM_SHAPE` is None, ensuring clearer error reporting.
- Updated logging messages to reflect the unavailability of event-stream decoding for both Bedrock and SageMaker.
- Enhanced unit tests to verify that the correct exceptions are raised when the response stream shapes are not loaded.
* [Chore] CI: Block PRs that drop overall code coverage
Tighten Codecov project status threshold from 1% to 0% so any drop in
overall project coverage relative to the base commit fails the
codecov/project check. `target: auto` keeps the bar floating with the
codebase, so no manual maintenance is needed as coverage moves up over time.
* [Chore] CI: Always post Codecov status regardless of CI outcome
Set codecov.require_ci_to_pass: false and codecov.notify.wait_for_ci:
false so Codecov posts the codecov/project and codecov/patch checks as
soon as the expected uploads arrive, instead of withholding them when
unrelated CI jobs fail. The coverage-regression check is independent
of test pass/fail, and CI failures are already enforced by their own
required-status checks.
The assert-shard-coverage guard in test-unit-proxy-db.yml failed because
test_request_size_limit_middleware.py was added under tests/proxy_unit_tests/
but not referenced by any matrix entry. Assign it to the proxy-runtime
shard, which already covers other server-runtime tests (proxy_routes,
proxy_gunicorn, server_root_path).
* fix(auth): pass team_id in member-level model access check
_check_team_member_model_access calls _can_object_call_model without
team_id, so access groups defined via model_info.access_groups cannot
resolve for team-scoped DB models (their internal router name is
model_name_<team>_<uuid>, not the public name). The team-level check
already passes team_id; this mirrors that.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test(auth): add tests for member-level access group resolution with team_id
Eight tests covering _can_object_call_model and
_check_team_member_model_access with team-scoped DB models:
- access group resolves when team_id is passed
- access group fails without team_id (pre-fix behavior)
- literal model name still works with team_id (no regression)
- denied model still denied with team_id
- second model in group also reachable
- end-to-end member access via access group (mocked membership)
- end-to-end member denied for model not in allowed list
- no-override member inherits team-level check
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* proxy: hot-reload config YAML when --reload is set
Uvicorn's --reload only watches *.py by default, so editing the
--config YAML did not restart the proxy. _get_reload_options() now
extends reload_dirs/reload_includes with the config file's directory
and basename when --config is provided.
* proxy: qualify reload_includes with absolute config path
Address Greptile review on PR #27274. When the --config file lives
outside cwd, reload_includes previously stored only the basename, which
meant uvicorn/watchfiles would also reload on edits to any same-named
file inside cwd. Use the absolute config path as the include pattern in
that case so only the actual proxy config triggers a restart.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(proxy): use basename for reload_includes config pattern
Uvicorn's resolve_reload_patterns() calls pathlib.Path.glob(), which
raises NotImplementedError on absolute patterns (uvicorn discussion
2156). Passing config_abs (an absolute path) when the config file lived
outside cwd crashed startup under --reload. The config_dir is already
added to reload_dirs, so using just the basename as the include pattern
is sufficient to match the specific config file.
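As a sketch of where the three reload commits above end up, the options can
be derived like this. The function below is a simplified stand-in for
`_get_reload_options()`; its exact signature and return shape are
assumptions.

```python
import os
from typing import Dict, List, Optional


def get_reload_options(config: Optional[str]) -> Dict[str, List[str]]:
    """Build extra uvicorn reload options so edits to the config file trigger a restart."""
    options: Dict[str, List[str]] = {}
    if config:
        config_abs = os.path.abspath(config)
        # Watch the directory that holds the config file...
        options["reload_dirs"] = [os.path.dirname(config_abs)]
        # ...but include only the basename: absolute include patterns crash
        # uvicorn's pattern resolution, as described above.
        options["reload_includes"] = [os.path.basename(config_abs)]
    return options
```

The result is meant to be merged into the keyword arguments passed to
`uvicorn.run(..., reload=True)`, which accepts `reload_dirs` and
`reload_includes`.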
* fix: make it reload app when yaml changes
* style: remove unneeded comments
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* Include model name + configured TPM/RPM in priority rate-limit 429 errors (#27215)
* Include model name + configured TPM/RPM in priority rate-limit 429 errors
The current 429 message ('Priority-based rate limit exceeded. Priority: prod,
Rate limit type: tokens, Remaining: -664145, Model saturation: 86.3%') doesn't
tell the operator which model was hit or what the configured limit is, so they
can't tell whether the priority allocation needs tuning or the model TPM is
just too small.
Add Model, Model TPM, and Model RPM to both the priority-based 429 and the
sibling Model-capacity 429 in dynamic_rate_limiter_v3._check_rate_limits.
Pure error-message change — no behavior or schema impact.
* test: assert priority 429 includes model name + configured TPM/RPM
Adds a regression test for the new fields in the priority-based 429 detail
('Model:', 'Model TPM:', 'Model RPM:'). Verified locally that the test
fails against the unpatched dynamic_rate_limiter_v3.py and passes after
the patch.
---------
Co-authored-by: shin-watcher <ext-agent-shin@berri.ai>
* Update litellm/proxy/hooks/dynamic_rate_limiter_v3.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update litellm/proxy/hooks/dynamic_rate_limiter_v3.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: shin-watcher <ext-agent-shin@berri.ai>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* ci(circleci): enable Rerun Failed Tests for all pytest suites
Migrated every pytest-based CircleCI job that uploads JUnit results to use
'circleci tests run' instead of invoking pytest directly. This is the
prerequisite for CircleCI's 'Rerun failed tests' feature to be available
on each job in the pipeline.
For each job:
- Glob test files via 'circleci tests glob' and pipe them into
'circleci tests run --command="xargs ... pytest ..."' so the agent can
feed the failed-test subset on rerun.
- Preserve all original pytest flags (parallelism, timeouts, retries,
coverage, junit output paths).
- For jobs that previously lacked 'store_test_results' (proxy spend
accuracy, proxy_build_from_pip, db_migration_disable_update_check),
add the step so JUnit XML is uploaded and rerun is actually wired up.
- Replace the dynamic IGNORE_DIRS shell array in llm_translation_testing
with a 'grep -v' filter on the glob output, matching the previous
behavior of skipping tests/llm_translation/realtime.
- For 'build_and_test', glob 'tests/test_*.py' (top-level only) which
matches the prior 'tests/*.py' shell glob; the long list of
'--ignore=tests/<subdir>' flags was vestigial and is dropped.
Jobs already using 'circleci tests run' (local_testing_part1/2,
litellm_router_testing) are unchanged.
* fix(ci): convert classnames to file paths on rerun
CircleCI's Rerun Failed Tests sends each previously failed test as a
JUnit classname (e.g. 'tests.otel_tests.test_key_logging_callbacks'),
but pytest needs a file path. Without the awk preprocessing step, reruns
fail with 'file or directory not found'.
Mirror the awk transform that local_testing_part1, local_testing_part2,
and litellm_router_testing already use, so rerun works in every job
that this PR migrated to 'circleci tests run'.
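The transform itself is small. The CI jobs do it in awk; the Python below
is only an equivalent illustration, and it assumes module-level tests (a
classname with a trailing test class component would need extra handling).

```python
def classname_to_path(name: str) -> str:
    """Turn a JUnit classname into a pytest file path; pass real paths through."""
    if "/" in name or name.endswith(".py"):
        return name  # first run: the glob already produced a file path
    return name.replace(".", "/") + ".py"
```

For example, `tests.otel_tests.test_key_logging_callbacks` becomes
`tests/otel_tests/test_key_logging_callbacks.py`, while a path coming from
the initial glob is left alone.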
* ci: drop -x from OTEL pytest run so all failures are reported
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
* fix(hosted_vllm): normalize custom tools for chat completions
Convert custom tool definitions into OpenAI function tools before forwarding hosted_vllm chat requests to avoid provider-side validation failures. Add a regression test and include a local curl verification screenshot.
Made-with: Cursor
* Fix black issue
* Fix hosted vllm custom tool schema fallback
* fix black
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Several tests parametrized over (model, api_key, ...) tuples or raw
token strings, causing pytest to embed those values in the test ID
and print them in CI logs. Refactored each affected test to keep the
same coverage without putting key material into parametrize.
- audio_tests/test_audio_speech.py: split env-var keys into separate
azure/openai test functions sharing a helper; sync_mode parametrize
preserved.
- audio_tests/test_whisper.py: split into openai_whisper /
azure_whisper functions sharing a helper; response_format parametrize
preserved.
- local_testing/test_embedding.py: single-case parametrize inlined.
- proxy_unit_tests/test_user_api_key_auth.py: 5 header parametrize
cases split into 5 named tests sharing an _assert helper.
- proxy_unit_tests/test_proxy_utils.py: 4 api_key_value cases split
into 4 named tests.
- test_litellm/proxy/auth/test_user_api_key_auth.py: 5 key-prefix
cases (Bearer / Basic / lowercase bearer / raw / AWS SigV4) split
into 5 named tests.
Verified: black clean; 14 refactored unit tests pass; pytest collects
audio/embedding tests with safe IDs (no key material in test IDs).
* feat(audio_transcription): add NVIDIA Riva STT provider
Adds nvidia_riva as a new audio transcription provider, supporting both
NVCF-hosted and self-hosted Riva ASR deployments via gRPC streaming.
- Auto-resamples input audio to 16 kHz mono LINEAR_PCM (soundfile + numpy,
audioread fallback) so callers can send any common format.
- Maps OpenAI params: language (en -> en-US), response_format (text/json/
verbose_json), timestamp_granularities=["word"] -> enable_word_time_offsets,
word offsets converted ms -> s for verbose_json.
- Auth: NVCF when nvcf_function_id is set (SSL on by default), self-hosted
otherwise (SSL off by default), with explicit use_ssl override.
- gRPC errors wrapped via NvidiaRivaException -> litellm exception classes.
- Optional deps gated behind [stt-nvidia-riva] extra (nvidia-riva-client,
soundfile, audioread, numpy).
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(nvidia_riva): address PR review feedback
- handler: forward call-level `timeout` to streaming_response_generator
(kwarg-detected via inspect for older riva-client compat) so a stalled
Riva server cannot block the caller indefinitely.
- audio_utils: spill bytes to a tempfile before audioread.audio_open;
most audioread backends (FFmpeg, GStreamer) require a real filesystem
path and previously raised TypeError on BytesIO, breaking the mp3/m4a
fallback path.
- audio_utils: prefer soxr / scipy.signal.resample_poly for resampling
(anti-aliased polyphase) when installed, falling back to linear only
as a last resort. Avoids aliasing on 44.1/48 kHz -> 16 kHz downsamples.
- transformation: bare `es` now maps to es-ES (Castilian) instead of
es-US, matching BCP-47 conventions.
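A small sketch of the parameter mapping described in these two commits,
limited to the two language defaults named above and the ms -> s conversion
for word offsets. Function names and the mapping table are illustrative,
not the provider's actual code.

```python
# Only the bare-language defaults mentioned in the commit messages.
_LANGUAGE_DEFAULTS = {"en": "en-US", "es": "es-ES"}


def to_riva_language_code(language: str) -> str:
    """Expand a bare language code to a region-qualified one when a default is known."""
    if "-" in language:
        return language  # already region-qualified, e.g. "fr-FR"
    return _LANGUAGE_DEFAULTS.get(language.lower(), language)


def word_to_verbose_json(word: str, start_ms: int, end_ms: int) -> dict:
    """Convert a word with millisecond offsets into a verbose_json entry in seconds."""
    return {"word": word, "start": start_ms / 1000.0, "end": end_ms / 1000.0}
```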
Co-authored-by: Cursor <cursoragent@cursor.com>
* chore: trigger CI re-run [stabilize loop 1/3]
* Update litellm/llms/nvidia_riva/audio_transcription/transformation.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* chore: trigger CI re-run [stabilize loop 1/3]
* fix code qa
* fix lint
* fix mypy
* fix mypy
* Fix NVIDIA Riva ASR service lookup
* Fix NVIDIA Riva transcription payload logging
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: oss-pr-review-agent-shin[bot] <281797381+oss-pr-review-agent-shin[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* fix(anthropic, mcp): sanitize tool names to match Anthropic's `^[a-zA-Z0-9_-]{1,128}$`
Tool names with characters like `/` or `.` (commonly produced by the
OpenAPI -> MCP generator from `operationId`s such as
`actions/download-job-logs-for-workflow-run`) caused Anthropic to reject
requests with `tools.N.custom.name: String should match pattern
'^[a-zA-Z0-9_-]{1,128}$'`.
Two layers of fix:
1. Anthropic transformation: build a per-request forward map (original ->
sanitized, disambiguated by suffix on collisions) and a reverse map
(only for names actually rewritten). Forward map is applied to tool
defs, `tool_choice`, and historical assistant tool_calls in messages.
Reverse map is threaded through both the non-streaming and streaming
response paths so callers continue to see their original tool names
in `tool_use` blocks.
2. OpenAPI -> MCP generator: sanitize `operationId` (and the
method+path fallback) at registration time so generated MCP tools are
valid for any strict-name provider, not just Anthropic. The dashboard
preview endpoint applies the same sanitization for parity.
Includes unit tests covering: collision disambiguation between
`foo_bar` and `foo/bar` in the same request, reverse-map only firing
for actually-rewritten names, message rewrite for historical tool_calls,
streaming chunk_parser reverse-mapping, and sanitization of OpenAPI
operationIds plus the preview endpoint output.
Made-with: Cursor
* fix(anthropic): build tool-name maps in transform_request, not optional_params
The previous patch stashed the per-request forward and reverse tool-name
maps under ``optional_params["_anthropic_tool_name_forward_map"]`` and
``optional_params["_anthropic_tool_name_map"]``. ``optional_params`` is
the dict that becomes the JSON body via ``data = {**optional_params}``,
so those internal keys leaked over the wire and Anthropic 400'd with:
_anthropic_tool_name_forward_map: Extra inputs are not permitted
Worse, this meant *every* request whose tool list contained any name with
an invalid character (the exact case the patch was meant to fix) regressed
into a confusing meta-error pointing at LiteLLM's internal map instead of
the offending tool.
Fix: move all tool-name sanitization into ``transform_request``, which is
the single chokepoint already shared by ``AnthropicConfig``,
``AmazonAnthropicConfig`` (Bedrock invoke), ``VertexAIAnthropicConfig``,
and ``AzureAnthropicConfig`` (all call ``super().transform_request`` /
``AnthropicConfig.transform_request(self, ...)``). New static helper
``_sanitize_tool_names_in_request`` walks the already-Anthropic-shaped
``optional_params["tools"]`` (only ``type=="custom"`` entries -- hosted
tool names are reserved by Anthropic and must not be touched), builds
the per-request forward/reverse maps, and applies the forward map in
place to ``tools[*].name`` and ``tool_choice.name``. The reverse map is
stashed exclusively on ``litellm_params`` (which is never serialized to
a provider) under ``_anthropic_tool_name_map`` for the response paths
to consume.
Side effect of this restructure: ``map_openai_params`` is now a pure
OpenAI->Anthropic param translator with no side-channel state, which
matches its contract everywhere else in the codebase.
Tests: replaced the now-incorrect "stashes maps in optional_params"
tests with regressions that assert no underscore-prefixed keys appear
in either ``optional_params`` after ``map_openai_params`` or in the
final ``transform_request`` body. Added end-to-end coverage for:
sanitization in ``transform_request``, ``tool_choice`` rewriting,
historical ``tool_calls`` rewriting in messages, and hosted-tool
passthrough.
Made-with: Cursor
* fix(anthropic): always sanitize empty text content blocks
Anthropic 400s on `{"role": "user", "content": ""}` with:
"messages: text content blocks must be non-empty"
LiteLLM already had `_sanitize_empty_text_content` to rewrite empty text
to a placeholder, but it was gated behind `litellm.modify_params=True`.
With that flag off (default), empty content from upstream agent
frameworks (e.g. pydantic-ai) flowed straight through and tripped the
Anthropic validator.
Fix:
- Always run `_sanitize_empty_text_content` at the top of
`anthropic_messages_pt`, independent of `modify_params`. There is no
way to "pass through" an empty text block, so this is non-optional.
The richer tool-call sanitizations (Cases A/B/D, which actually
mutate conversation structure) remain gated on `modify_params`.
- Extend `_sanitize_empty_text_content` to also handle list-of-blocks
content (`[{"type": "text", "text": ""}]`), not just string content.
Adds 3 regression tests covering string content, list-of-blocks
content, and the no-op case (non-empty messages with modify_params off).
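Sketch of the two content shapes handled, assuming a placeholder string
(the real placeholder text is not shown here, so `PLACEHOLDER` is made up)
and treating only the exact empty string as empty:

```python
PLACEHOLDER = "[empty]"  # hypothetical; stands in for the real placeholder text


def sanitize_empty_text_content(message: dict) -> dict:
    """Return a message whose empty text content is replaced by a placeholder."""
    content = message.get("content")
    if isinstance(content, str):
        # String content: {"role": "user", "content": ""}
        return {**message, "content": PLACEHOLDER} if content == "" else message
    if isinstance(content, list):
        # List-of-blocks content: [{"type": "text", "text": ""}]
        new_blocks = []
        changed = False
        for block in content:
            is_empty_text = (
                isinstance(block, dict)
                and block.get("type") == "text"
                and block.get("text") == ""
            )
            if is_empty_text:
                new_blocks.append({**block, "text": PLACEHOLDER})
                changed = True
            else:
                new_blocks.append(block)
        return {**message, "content": new_blocks} if changed else message
    return message
```

Non-empty messages are returned as the same object, which corresponds to
the no-op case in the regression tests.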
Made-with: Cursor
* fix(anthropic): drop dead tool-name forward-map params, fix mypy + caller-mutation
- remove unused `name_forward_map` param from `_map_tool_choice`,
`_map_tool_helper`, `_map_tools` and the `_apply_anthropic_tool_name_forward`
helper. Production sanitization runs in `_sanitize_tool_names_in_request`
at `transform_request`; these params were never threaded through.
- handler.py: use `ANTHROPIC_TOOL_NAME_REVERSE_MAP_KEY` constant instead of
the hardcoded `"_anthropic_tool_name_map"` string.
- fix mypy `"object" has no attribute "__iter__"` in
`_rewrite_tool_names_in_messages` by guarding `tool_calls` with
`isinstance(..., list)`.
- `_sanitize_tool_names_in_request`: build a new tools list with copy-on-
change entries (and copy `tool_choice` on rewrite) so a caller reusing
the same tool list/dicts across requests doesn't see its inputs
permanently rewritten.
- doc-comment `_build_request_tool_name_maps` clarifying it operates on
OpenAI-format tools (vs `_sanitize_tool_names_in_request` which runs
on Anthropic-format tools post-`_map_tools`).
- tests: drop 3 tests pinning the now-removed param paths; add coverage
for tool_calls + None function_call rewrite and caller-dict immutability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(mcp): inherit stored credentials in test/tools/list for edit flow
When editing an existing MCP server, the Tool Configuration preview
calls POST /mcp-rest/test/tools/list with server_id but no credentials
(management API redacts them). The endpoint now calls
_inherit_credentials_from_existing_server() so stored bearer tokens
and OAuth2 M2M credentials are loaded from global_mcp_server_manager
automatically — tools load without re-entering credentials.
New servers (no server_id) and requests with explicit credentials are
unaffected (function is a no-op in both cases).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(mcp): show all tools in edit panel, not just allowed tools
Edit flow was passing externalTools (from GET /tools/list, filtered by
allowed_tools) to MCPToolConfiguration, disabling the internal hook.
Remove the external props so the internal hook fires via
POST /test/tools/list, which returns all tools unfiltered. Combined
with the credential inheritance fix, tools load automatically without
re-entering credentials and all tools are visible for re-configuration.
existingAllowedTools still pre-checks previously allowed tools.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix order-dependent collision in _build_anthropic_tool_name_maps
Use a two-pass approach: first pre-register all already-valid tool names
in the 'used' set, then sanitize/disambiguate names that need rewriting.
This ensures valid names always have priority regardless of input order,
preventing duplicate tool names on the wire when e.g. 'foo/bar' appears
before 'foo_bar' in the tool list.
Add regression test for the reversed ordering case.
* Fix OpenAPI tool name collision: disambiguate sanitized names with numeric suffixes
sanitize_openapi_tool_name replaces all invalid chars with '_', but when
two operationIds differ only by sanitized characters (e.g. 'foo/list' and
'foo.list' both become 'foo_list'), the second registration silently
overwrites the first in the tool registry.
Add collision disambiguation in register_tools_from_openapi that appends
_2, _3, ... suffixes when a sanitized name is already taken, mirroring
the existing logic in _build_anthropic_tool_name_maps.
* Fix preview endpoint missing collision disambiguation for tool names
Add used_names tracking and _2/_3 suffix disambiguation to
_preview_openapi_tools, matching the logic in register_tools_from_openapi.
Without this, two operationIds that sanitize to the same string (e.g.
'foo/list' and 'foo.list' both becoming 'foo_list') would show duplicate
names in the preview while registration would disambiguate them.
* Align preview HTTP method order with register_tools_from_openapi
The preview endpoint and register_tools_from_openapi both use
order-dependent collision disambiguation (_2, _3 suffixes). When the
iteration order differs, two operations on the same path with sanitized
names that collide get different suffixes in preview vs registration,
so the dashboard shows names that don't match what actually got
registered.
Also adds a regression test that fails on the swapped order.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* Skip duplicate originals in _build_anthropic_tool_name_maps
If the same invalid tool name appeared twice in original_names (e.g.
['foo/bar', 'foo/bar']), the second occurrence overwrote the forward
map entry with a freshly-suffixed name (foo_bar_2), leaving foo_bar
orphaned in 'used' with no reverse mapping. _sanitize_tool_names_in_request
then rewrote both tool entries to foo_bar_2, and Anthropic 400'd on
duplicate tool names.
Skip the rewrite if forward already has the original mapped.
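Putting the collision fixes together, the map builder can be sketched as
follows. This is an illustration of the two-pass approach plus the
duplicate skip, not the body of `_build_anthropic_tool_name_maps`; the
name `build_tool_name_maps` and the truncation details are assumptions.

```python
import re
from typing import Dict, List, Tuple

_VALID_NAME = re.compile(r"[a-zA-Z0-9_-]{1,128}")
_INVALID_CHAR = re.compile(r"[^a-zA-Z0-9_-]")


def build_tool_name_maps(original_names: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (forward, reverse) maps; reverse only holds names that were rewritten."""
    forward: Dict[str, str] = {}
    reverse: Dict[str, str] = {}
    used = set()

    # Pass 1: already-valid names claim their slot first, regardless of order.
    for name in original_names:
        if _VALID_NAME.fullmatch(name):
            forward[name] = name
            used.add(name)

    # Pass 2: sanitize the rest and disambiguate with _2, _3, ... suffixes.
    for name in original_names:
        if name in forward:
            continue  # valid name, or a duplicate original that is already mapped
        base = _INVALID_CHAR.sub("_", name)[:128] or "_"
        candidate, n = base, 2
        while candidate in used:
            suffix = f"_{n}"
            candidate = base[: 128 - len(suffix)] + suffix
            n += 1
        used.add(candidate)
        forward[name] = candidate
        reverse[candidate] = name
    return forward, reverse
```

With `['foo/bar', 'foo_bar']` the valid `foo_bar` keeps its name and
`foo/bar` becomes `foo_bar_2`; with `['foo/bar', 'foo/bar']` both entries
share the single mapping to `foo_bar`.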
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* feat(realtime): OpenAI Realtime GA support and beta compatibility
- Normalize beta-style session.update to GA for upstream OpenAI; optional GA→beta
event translation when client sends OpenAI-Beta: realtime=v1
- Default upstream WebSocket without OpenAI-Beta; forward header when client opts in
- Extend OpenAI realtime types for GA event names and conversation item shapes
- Relax LiteLLMRealtimeStreamLoggingObject.results to List[Any] for GA events
- Update proxy client_secrets fallback to omit beta header; dashboard RealtimePlayground
- Add unit tests for remap, translation, and beta header helper
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix results
* fix greptile
* Fix mypy issues
* Remove unused class constants _GA_TEXT_DELTA_TYPES and _GA_AUDIO_DELTA_TYPES
These frozensets were defined as class-level constants in realtime_streaming.py
but never referenced anywhere in the codebase. Removing dead code.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix(realtime): use GA-shaped session.update in guardrail injections
The guardrail VAD injection code sent a beta-style session.update with a
flat turn_detection field:
{"session": {"turn_detection": {"create_response": false}}}
When the upstream OpenAI backend operates in GA mode (no OpenAI-Beta
header forwarded), it requires the nested GA shape:
{"session": {"type": "realtime", "audio": {"input": {"turn_detection": {"create_response": false}}}}}
The _remap_beta_session_to_ga helper was only applied to client-
originated session.update messages in client_ack_messages. Internally-
generated session.updates (sent via _send_to_backend) in two paths:
- _handle_raw_backend_message (raw/no provider_config path, line 518)
- backend_to_client_send_messages provider_config path (line 481)
bypassed the remap, so GA upstreams ignored or rejected them, breaking
audio transcription guardrails for all non-beta clients.
Fix: add _make_disable_auto_response_message() helper that always emits
the correct GA-shaped session.update, and replace both injection sites
with it.
Update existing tests to assert the GA nested shape instead of the old
flat beta shape, and add a new unit test for the helper itself.
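For reference, a rough sketch of the two shapes involved. The function
bodies below are assumptions and not the code in this change:
make_disable_auto_response_message mirrors the GA payload quoted above, with
the "type": "session.update" envelope field added, and
remap_turn_detection_to_ga is a hypothetical stand-in that only handles the
turn_detection field.

```python
import json
from typing import Any, Dict


def make_disable_auto_response_message() -> str:
    """Sketch: build a GA-shaped session.update that disables auto responses."""
    payload = {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "audio": {"input": {"turn_detection": {"create_response": False}}},
        },
    }
    return json.dumps(payload)


def remap_turn_detection_to_ga(session: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: move a flat beta turn_detection under audio.input."""
    remapped = {k: v for k, v in session.items() if k != "turn_detection"}
    remapped.setdefault("type", "realtime")
    if "turn_detection" in session:
        # Copy nested dicts so the caller's session is not mutated.
        audio = dict(remapped.get("audio") or {})
        audio_input = dict(audio.get("input") or {})
        audio_input["turn_detection"] = session["turn_detection"]
        audio["input"] = audio_input
        remapped["audio"] = audio
    return remapped
```

Building the message in one helper means both injection sites emit the same
shape regardless of which path they run on.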
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* Log realtime session type
* Fix beta realtime session payloads
* Fix realtime audio format remapping edge case
* Fix Azure realtime beta session shape
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* refactor(BaseAWSLLM): implement shared IAM cache and static credential caching
- Introduced a process-wide shared IAM cache to optimize credential management across instances.
- Added a method to handle caching of static credentials, ensuring only long-lived credentials are cached.
- Updated the get_credentials method to utilize the new caching mechanism for static credential flows.
- Enhanced unit tests to verify the correct behavior of the shared cache and static credential usage.
* refactor(BaseAWSLLM): enhance IAM credential caching and update related tests
- Improved the process-wide IAM credential caching mechanism to better handle static and AssumeRole credentials.
- Renamed the caching method for clarity and updated comments to reflect the new caching behavior.
- Added a fixture to ensure the IAM cache is flushed between tests to prevent leakage of cached entries.
- Updated unit tests to verify the correct behavior of the shared IAM cache, particularly for static credentials and role assumptions.
* refactor(BaseAWSLLM): clarify IAM credential caching behavior and enhance tests
- Updated documentation to specify that only static and ambient environment credentials are cached, excluding AssumeRole and other credential types.
- Modified the caching logic to ensure that AssumeRole credentials are not stored in the IAM cache, requiring STS calls for each request.
- Enhanced unit tests to verify that AssumeRole credentials are not cached and to ensure proper behavior of the IAM cache across different scenarios.
* Improve code readability of the AWS auth path
* refactor(BaseAWSLLM): enhance IAM credential caching documentation and add tests
- Updated comments to clarify the behavior of the in-process IAM credential cache, specifying the TTL for static and ambient credentials.
- Added new unit tests to verify the caching behavior for ambient environment credentials across instances and ensure that static access key sessions are constructed only once when cached.
- Ensured that temporary session tokens and AWS profiles are not cached, validating the expected behavior through additional tests.
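A simplified sketch of the caching rule described in these commits, assuming
a plain in-memory dict with a TTL. The names AuthParams, is_cacheable and
SharedCredentialCache are hypothetical and the TTL value is left to the
caller; this is not the BaseAWSLLM implementation.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class AuthParams:
    """Hypothetical bundle of the auth inputs that select a credential flow."""
    access_key_id: Optional[str] = None
    secret_access_key: Optional[str] = None
    session_token: Optional[str] = None
    profile_name: Optional[str] = None
    role_arn: Optional[str] = None


def is_cacheable(params: AuthParams) -> bool:
    # Only static access keys and ambient environment credentials are cached;
    # AssumeRole, temporary session tokens and profiles are resolved each time.
    return not (params.role_arn or params.session_token or params.profile_name)


class SharedCredentialCache:
    """Process-wide cache sketch: one instance shared by all callers."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[AuthParams, Tuple[float, object]] = {}

    def get_or_create(self, params: AuthParams, factory: Callable[[], object]) -> object:
        if not is_cacheable(params):
            return factory()
        with self._lock:
            now = self._clock()
            entry = self._entries.get(params)
            if entry is not None and entry[0] > now:
                return entry[1]
            value = factory()
            self._entries[params] = (now + self._ttl, value)
            return value

    def flush(self) -> None:
        """Drop all entries, e.g. from a test fixture between tests."""
        with self._lock:
            self._entries.clear()
```

A real cache key would likely avoid holding the raw secret; it is kept here
only to keep the sketch short.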
* refactor(BaseAWSLLM): improve IAM credential handling and add tests for role assumption
- Updated comments to clarify the behavior of IAM credential caching, particularly regarding the handling of ambient credentials and role assumptions.
- Enhanced unit tests to verify that the caching mechanism correctly distinguishes between a role the process is already running as and a new role assumption, ensuring that cached environment credentials are not reused incorrectly.
- Added a new test case to validate the behavior when switching roles, confirming that the system correctly uses AssumeRole when the role changes.