Commit Graph

96 Commits

Author SHA1 Message Date
Mateo Wang
33c363d4d4
Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic (#29847)
* test(ci): extend record/replay proxy to chat, embeddings, moderations, rerank, anthropic

The record/replay proxy that took the gpt-image-1 spend E2E off the live OpenAI
path now fronts every provider, so the other real-provider E2Es stop paying for
and depending on live calls each commit. It keys per upstream and selects a
non-OpenAI provider by a /__recorder_upstream/<host>/ path prefix carried on the
model's api_base, since some litellm handlers (cohere rerank) drop custom
request headers. Wired into build_and_test (chat, embeddings, moderations,
image), the otel job (cohere rerank), and the anthropic-messages job via a
reusable start_openai_record_replay_proxy command.

Dropped the time.time()/uuid prompt cache-busters in the build_and_test chat
tests, whose config has the response cache off, so identical requests are
recordable. The image spend test now asserts a repeat call still bills spend,
failing loudly if the proxy response cache is ever turned on.

Responses, the anthropic passthrough, bedrock, and fake-endpoint tests are left
live: their lifecycles, api_base assertions, providers, or fake targets make a
stateless body-keyed cache either break them or add nothing.

* docs(ci): note the recorder command's OpenAI default upstream and prefix override

Addresses a review note: the shared start_openai_record_replay_proxy command
defaults the upstream to OpenAI, so a non-OpenAI model must carry the
/__recorder_upstream/<host>/ prefix on its api_base. Document that in the
command description so a future caller does not assume the default follows the
provider.
2026-06-06 14:33:42 -07:00
Mateo Wang
84247d954d
test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound (#29787)
* test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound

The dockerized spend test test_key_info_spend_values_image_generation curls
the proxy for a gpt-image-1 image, which wildcard-routes to real api.openai.com
on every commit; an OpenAI outage then reddens unrelated PRs and each run pays
for an image.

Add an in-repo record/replay reverse proxy (tests/_openai_record_replay_proxy.py)
that sits between the proxy and OpenAI. The first run, and the first after the
recording lapses, records live; subsequent runs replay from the shared Redis
cassette store. The proxy keeps its real separate-process HTTP topology; only
the image model's api_base is pointed at the recorder in CI via
IMAGE_GEN_RECORDER_BASE_URL, which is unset elsewhere so it falls back to
api.openai.com.

Recordings lapse 24h after write and are never refreshed on read, matching the
VCR persister contract, so provider drift is still caught. Replayed responses
drop upstream framing/server headers (content-length, transfer-encoding,
content-encoding, date, server) so the re-serving layer recomputes them,
honoring the Bedrock content-length lesson.

* test(ci): close recorder http client on app shutdown

Add a Starlette lifespan that closes the self-created httpx.AsyncClient on
teardown, and leave caller-injected clients untouched so reuse across
create_app calls is not broken. Covers the unclosed-client ResourceWarning
raised in review.
2026-06-05 10:27:23 -07:00
Mateo Wang
2c733c00f5
chore(ci): modernize model references in tests and configs (#27856)
* test: modernize models used in CircleCI e2e test suites

Replaces obsolete models (gpt-4o, gpt-4o-mini, gpt-3.5-turbo,
claude-3-5-sonnet-20240620, claude-sonnet-4-20250514) with current
equivalents across the e2e_openai_endpoints and
proxy_e2e_anthropic_messages_tests CircleCI jobs.

- gpt-4o -> gpt-5.5 (responses API e2e tests)
- gpt-4o-mini -> gpt-5-mini (websocket responses, oai_misc_config)
- gpt-4o-mini-2024-07-18 -> gpt-4.1-mini-2025-04-14 (fine-tuning,
  still actively fine-tunable)
- gpt-4 / gpt-3.5-turbo target_model_names example -> gpt-5.5 /
  gpt-5-mini
- bedrock claude-3-5-sonnet-20240620 batch entry -> haiku-4-5-20251001
  (also aligning oai_misc_config model_name with what
  test_bedrock_batches_api.py actually requests)
- bedrock claude-sonnet-4-20250514 (deprecated, retires 2026-06-15)
  -> claude-sonnet-4-5-20250929

* test: point bedrock-claude-sonnet-4 alias at Sonnet 4.6, not 4.5

Greptile/Cursor flagged that after the previous commit, the
bedrock-claude-sonnet-4 alias collided with bedrock-claude-sonnet-4.5
(both pointed to claude-sonnet-4-5-20250929). Rename to
bedrock-claude-sonnet-4.6 and point it at the Sonnet 4.6 Bedrock ID
(us.anthropic.claude-sonnet-4-6, already in the litellm model
registry) so the alias name matches the underlying model version.

* test: modernize models across remaining CI-mounted configs & tests

Expands the modernization sweep to all CircleCI-mounted proxy configs
and to test directories where the model literal is a fixture/route key
(not the test's subject).

Config changes:
- proxy_server_config.yaml: bump gpt-3.5-turbo / gpt-3.5-turbo-1106 /
  gpt-4o / gemini-1.5-flash / dall-e-3 underlying models; rename
  gpt-3.5-turbo-end-user-test alias to gpt-5-mini-end-user-test; bump
  text-embedding-ada-002 underlying to text-embedding-3-small. User-
  facing aliases (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, etc.)
  preserved for backward compatibility with tests.
- simple_config.yaml, otel_test_config.yaml, spend_tracking_config.yaml:
  bump gpt-3.5-turbo underlying to gpt-5-mini.
- pass_through_config.yaml: claude-3-5-sonnet / claude-3-7-sonnet /
  claude-3-haiku entries replaced with claude-sonnet-4-5 / claude-
  haiku-4-5 / claude-opus-4-7.
- oai_misc_config.yaml: align alias name with the gpt-5-mini rename.

Test changes (proactive: claude-sonnet-4-20250514 / claude-opus-4-
20250514 retire 2026-06-15):
- tests/llm_translation/test_anthropic_completion.py: bump 3 references
  + paired Vertex AI ID to claude-sonnet-4-5.
- tests/llm_translation/test_optional_params.py: bump 2 references.
- tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py
  and test_bedrock_anthropic_messages_test.py: bump router fixtures
  using the deprecated model IDs.
- tests/pass_through_unit_tests/base_anthropic_messages_tool_search_test.py:
  modernize docstring examples.
- tests/test_end_users.py: update references to renamed alias.

* test: modernize placeholder model literals in router_unit_tests

Mass replace_all on fixture/placeholder model literals across the
router_unit_tests/ suite (model name is a routing key / label, not the
test subject). Sub-agent sweep so far — additional commits will follow
for logging_callback_tests/, enterprise/, top-level tests/test_*.py,
and other CI-mounted dirs.

Mappings applied:
- gpt-3.5-turbo -> gpt-5-mini
- gpt-4 (bare) -> gpt-5.5
- gpt-4o (bare) -> gpt-5
- text-embedding-ada-002 -> text-embedding-3-small
- claude-3-sonnet-20240229 / claude-3-opus-20240229 /
  claude-3-haiku-20240307 / claude-3-5-sonnet-20240620 ->
  claude-sonnet-4-5-20250929 / claude-opus-4-7 /
  claude-haiku-4-5-20251001 as appropriate

Explicitly preserved:
- gpt-4o-mini-* variants (transcribe, tts, etc.) where they're current
- gpt-4-turbo / gpt-4-vision-preview / gpt-4-0613 (subject literals)
- JSONL batch body literals
- Mock LLM response model fields (must match upstream)
- Fake/mock identifiers

* test: modernize placeholder model literals across remaining CI suites

Sub-agent sweep across logging_callback_tests/, guardrails_tests/,
enterprise/, pass_through_unit_tests/, otel_tests/,
llm_responses_api_testing/, batches_tests/, spend_tracking_tests/,
litellm_utils_tests/, unified_google_tests/, and a few top-level
tests/test_*.py files where the model literal is a fixture or
placeholder (router model_list, mock standard logging payload, mock
callback data) rather than the test's subject.

Mappings applied (see scope notes below):
- gpt-3.5-turbo -> gpt-5-mini
- gpt-4 (bare) -> gpt-5.5
- gpt-4o (bare) -> gpt-5.5 (corrected from initial gpt-5 — bare gpt-5
  is not a valid OpenAI alias; only gpt-5.5 / gpt-5.4 / gpt-5.2-codex
  / gpt-5-mini exist)
- gpt-4o-mini (bare) -> gpt-5-mini
- text-embedding-ada-002 -> text-embedding-3-small
- claude-3-sonnet-20240229 -> claude-sonnet-4-5-20250929
- claude-3-opus-20240229 -> claude-opus-4-7
- claude-3-haiku-20240307 -> claude-haiku-4-5-20251001
- claude-3-5-sonnet-20240620/20241022 -> claude-sonnet-4-5-20250929
- claude-3-7-sonnet-20250219 -> claude-sonnet-4-6
- gemini-1.5-flash -> gemini-2.5-flash
- gemini-1.5-pro -> gemini-2.5-pro

Explicitly preserved (not modernized):
- llm_translation/ tests where model is the SUBJECT (provider-specific
  translation/transformation logic). Only the deprecated 20250514
  references were already bumped in a prior commit.
- Cost-calc / tokenizer subject tests in test_utils.py (skip-ranges
  documented by the sub-agent).
- Bedrock model IDs in test_health_check.py path-stripping tests.
- JSONL batch request bodies and mock LLM response bodies (must match
  upstream literal).
- Langfuse expected-request-body JSON fixtures (cost values are exact-
  match-asserted; changing the model would shift response_cost).
- gpt-3.5-turbo-instruct (text-completion endpoint; no modern OpenAI
  equivalent).
- Top-level tests calling the proxy through user-facing aliases
  (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, dall-e-3) — aliases
  in proxy_server_config.yaml stay; only the underlying model was
  bumped.
- tests/test_gpt5_azure_temperature_support.py (the test's whole point
  is model-name handling).
- Fake / mock / openai/fake identifiers.

Notable side fixes:
- test_spend_accuracy_tests.py: UPSTREAM_MODEL now matches what
  spend_tracking_config.yaml's proxy actually routes to (gpt-5-mini),
  resolving a latent inconsistency.
- proxy_server_config.yaml: bare `gpt-5` alias renamed to `gpt-5.5`
  (bare gpt-5 is not a valid OpenAI alias).
- test_batches_logging_unit_tests.py: explicit_models list entries
  kept distinct (gpt-5-mini + gpt-5.5) after bulk rename.

* test: fix CI failures from model modernization sweep

CI surfaced 4 categories of regression from the bulk modernization:

1. Azure deployment names are customer-specific. Reverted:
   - tests/litellm_utils_tests/test_health_check.py: azure/text-
     embedding-3-small -> azure/text-embedding-ada-002 (the CI Azure
     account does not have a text-embedding-3-small deployment).
   - tests/logging_callback_tests/test_custom_callback_router.py:
     same revert for two router fixtures driving aembedding.

2. gpt-5 family does not accept temperature != 1. Tests that pass a
   custom temperature swapped from gpt-5-mini to gpt-4.1-mini (modern
   non-reasoning OpenAI mini that still accepts temperature/logprobs):
   - tests/logging_callback_tests/test_datadog.py
   - tests/logging_callback_tests/test_langsmith_unit_test.py
   - tests/logging_callback_tests/test_otel_logging.py

3. proxy_server_config.yaml's gpt-3.5-turbo-large alias was routing to
   gpt-5.5 (a reasoning model that rejects logprobs). The proxy test
   tests/test_openai_endpoints.py::test_chat_completion_streaming
   exercises logprobs/top_logprobs through that alias. Bumped the
   underlying model to gpt-4.1 (non-reasoning, still modern).

4. tests/logging_callback_tests/test_gcs_pub_sub.py asserts against a
   pinned JSON fixture (gcs_pub_sub_body/spend_logs_payload.json) with
   hardcoded model="gpt-4o" and a model-specific spend value. Reverted
   the litellm.acompletion calls in the test to model="gpt-4o" so the
   fixture's exact-match assertions still hold.

5. tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py:
   anthropic.messages.create routing to openai/gpt-5-mini returned an
   empty content[0] with max_tokens=100 (reasoning-token consumption).
   Swapped to openai/gpt-4.1-mini.

* test: fix Assistants API model + 2 cursor[bot] review nits

1. pass_through_unit_tests/test_custom_logger_passthrough.py: gpt-5.5
   isn't accepted by the /v1/assistants endpoint
   ("unsupported_model"). Switch to gpt-4.1-mini (modern, Assistants-
   API-supported, non-reasoning).

2. example_config_yaml/pass_through_config.yaml: the previous sweep
   bumped the claude-3-7-sonnet alias to claude-opus-4-7, which is a
   tier change (Sonnet -> Opus). Map to claude-sonnet-4-6 to keep the
   Sonnet tier intact. (Cursor bugbot review.)

3. example_config_yaml/simple_config.yaml: model_name was left as
   gpt-3.5-turbo while the underlying was bumped to gpt-5-mini, which
   muddles the "simple" example. Make both sides gpt-5-mini so the
   most basic example is a straight 1:1 mapping again. (Cursor bugbot
   review.)

* fix: revert gpt-4/gpt-3.5-turbo alias underlying to non-reasoning models

tests/test_openai_endpoints.py::test_completion calls the proxy alias
"gpt-4" with temperature=0, and other tests call gpt-3.5-turbo with
custom temperature / logprobs / the legacy /v1/completions endpoint.
The earlier modernization mapped both aliases to gpt-5.5 / gpt-5-mini,
which are reasoning models that reject temperature != 1 and don't
expose /v1/completions. Map the aliases to gpt-4.1 / gpt-4.1-mini
(modern non-reasoning OpenAI models) instead — keeps user-facing
aliases preserved while picking a current underlying that still
supports the parameters/endpoints the tests exercise.
2026-05-15 15:44:28 -07:00
Yuneng Jiang
8c8621ece3
fix(tests): swap dall-e to gpt-image-1 after openai deprecation
DALL-E 2 and DALL-E 3 were removed from the OpenAI API on 2026-05-12,
causing e2e image-generation tests to fail with "model does not exist".
Swap all live-API DALL-E references in proxy-backed tests to gpt-image-1
and update the dall-e-2 alias in proxy_server_config.yaml to point at
openai/gpt-image-1 (preserves any historical dall-e-2 callers).
2026-05-12 16:07:59 -07:00
yuneng-jiang
e2779639c0 [Fix] Fix test_users_in_team_budget using model with no pricing data
gpt-3.5-turbo-0301 was removed from the model cost map, so every call
had response_cost=0 and team member spend never increased. The wait
helper also returned True after 3s regardless of whether spend updated.

- Switch fake-openai-endpoint to gpt-3.5-turbo (has pricing in cost map)
- Remove premature early-return in wait_for_team_member_spend_update

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 12:35:20 -07:00
Ishaan Jaffer
d897c5e022 fix team budget checks 2026-01-31 15:28:33 -08:00
Sameer Kankute
d3fb7528d5 revert proxy_server_config.py 2025-12-20 00:20:20 +05:30
Sameer Kankute
d29f4cab59
Merge pull request #17971 from BerriAI/litellm_ocr_deepseek
Add support for ocr for vertex ai deepseek model
2025-12-20 00:13:57 +05:30
Sameer Kankute
ba90985300 Add reasoning effort mapping 2025-12-17 18:03:48 +05:30
Mateo Di Loreto
107ea9043a
[Feature] Download Prisma binaries at build time instead of at runtime for Security Restricted environments (#17695)
* Use config file to enable prometheus metrics

* Revert "Use config file to enable prometheus metrics"

This reverts commit 15ae36e1711791c0ac0a7aa84dcec142951717f5.

* Improve hardened stack and Prisma offline flow

* Document hardened compose usage

* Remove undesired change in fastapi-sso

* Restore dashboard lockfile

* Remove unecessary tempdirs

* Document hardened/offline Docker validation flow
2025-12-16 21:25:53 +05:30
Sameer Kankute
858879919c Add support for ocr for vertex ai deepseek model 2025-12-15 11:45:04 +05:30
abi_jey
a40c6ae0c0 fix: remove the unnecessary config changes 2025-11-26 13:33:52 +00:00
abi_jey
98344417ab fix: tested e2e implementation and added sample config. 2025-11-26 12:37:13 +00:00
Ishaan Jaffer
d6b0c11d5b test fixes, fk azure 2025-10-25 17:15:52 -07:00
Ishaan Jaffer
6ac21ddcec fix build and test gpt-3.5-turbo 2025-10-25 16:39:14 -07:00
Ishaan Jaffer
a6b6e56246 fixes azure 2025-10-25 15:54:30 -07:00
Sameer Kankute
b9585b1db5
Update documentation for enable_caching_on_provider_specific_optional_params (#15885) 2025-10-24 10:22:27 -07:00
Sameer Kankute
dce6cd1051 Add shared healthcheck 2025-10-09 22:18:05 +05:30
Ishaan Jaffer
4054eeea20 test build and test 2025-09-27 09:26:38 -07:00
Ishaan Jaff
9761ba7c7a
[Bug Fix] Responses api session management for streaming responses (#13396)
* fix proxy config

* fix(responses api): fix streaming ID consistency and tool format handling (#12640)

* fix(responses): ensure streaming chunk IDs use consistent encoding format

Fixes streaming ID inconsistency where streaming responses used raw provider IDs
while non-streaming responses used properly encoded IDs with provider context.

Changes:
- Updated LiteLLMCompletionStreamingIterator to accept provider context
- Added _encode_chunk_id() method using same logic as non-streaming responses
- Modified chunk transformation to encode all streaming item_ids with resp_ prefix
- Updated handlers to pass custom_llm_provider and litellm_metadata to streaming iterator

Impact:
- Streaming chunk IDs now format: resp_<base64_encoded_provider_context>
- Enables session continuity when using streaming response IDs as previous_response_id
- Allows provider detection and load balancing with streaming responses
- Maintains backward compatibility with existing streaming functionality

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(types): add explicit Optional[str] type annotation for model_id

This resolves MyPy type checking error where model_id could be None
but wasn't explicitly typed as Optional[str].

* fix(types): handle None case for litellm_metadata access

Prevents 'Item None has no attribute get' error by checking for None
before accessing litellm_metadata dictionary.

* test: add comprehensive tests for streaming ID consistency

Adds unit and E2E tests to verify streaming chunk IDs are properly encoded
with consistent format across streaming responses.

## Tests Added

### Unit Test (test_reasoning_content_transformation.py)
- `test_streaming_chunk_id_encoding()`: Validates the `_encode_chunk_id()` method
  correctly encodes chunk IDs with `resp_` prefix and provider context

### E2E Tests (test_e2e_openai_responses_api.py)
- `test_streaming_id_consistency_across_chunks()`: Tests that all streaming chunk IDs
  are properly encoded across multiple chunks in a real streaming response
- `test_streaming_response_id_as_previous_response_id()`: Tests the core use case -
  using streaming response IDs for session continuity with `previous_response_id`

## Key Testing Approach
- Uses **Gemini** (non-OpenAI model) to test the transformation logic rather than
  OpenAI passthrough, since the streaming ID consistency issue occurs when LiteLLM
  transforms responses rather than just passing through to native OpenAI responses API
- Tests validate that streaming chunk IDs now use same encoding as non-streaming responses
- Verifies session continuity works with streaming responses

Addresses @ishaan-jaff's request for unit tests covering the streaming ID consistency fix.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(lint): remove unused imports in transformation.py

Removes unused imports to fix CI linting errors:
- GenericResponseOutputItem
- OutputFunctionToolCall

* test: remove E2E tests from openai_endpoints_tests

Remove streaming ID consistency E2E tests as requested by @ishaan-jaff.
Keep only the mock/unit test in test_reasoning_content_transformation.py

* revert: remove streaming chunk ID encoding to original behavior

This reverts the streaming chunk ID encoding changes to understand the original issue better.
Original behavior was:
- Streaming chunks: raw provider IDs
- Streaming final response: raw IDs (PROBLEM!)
- Non-streaming final response: encoded IDs (correct)

The real issue: streaming final response IDs were not encoded, breaking session continuity.

* fix(responses): encode streaming final response IDs to match OpenAI behavior

Fixes streaming ID inconsistency to match OpenAI's Responses API behavior:
- Streaming chunks: raw message IDs (like OpenAI's msg_xxx)
- Final response: encoded IDs (like OpenAI's resp_xxx)

This enables session continuity by ensuring streaming final response IDs
have the same encoded format as non-streaming responses, allowing them
to be used as previous_response_id in follow-up requests.

Changes:
- Add custom_llm_provider and litellm_metadata to LiteLLMCompletionStreamingIterator
- Update handlers to pass provider context to streaming iterator
- Apply _update_responses_api_response_id_with_model_id to final streaming response
- Keep streaming chunks as raw IDs to match OpenAI format

Impact:
- Session continuity works with streaming responses
- Load balancing can detect provider from streaming final response IDs
- Format matches OpenAI's Responses API exactly

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* test: update unit test to match correct OpenAI-compatible behavior

Updates the unit test to verify streaming chunk IDs are raw (not encoded)
to match OpenAI's responses API format:
- Streaming chunks: raw message IDs (like msg_xxx)
- Final response: encoded IDs (like resp_xxx)

This reflects the correct behavior implemented in the fix.

---------

Co-authored-by: Claude <noreply@anthropic.com>

* cleanup

* TestBaseResponsesAPIStreamingIterator

---------

Co-authored-by: Javier de la Torre <jatorre@carto.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-08-07 20:13:24 -07:00
Krish Dholakia
9c32525c17
build: update model in test (#10706) 2025-05-09 13:33:11 -07:00
Krrish Dholakia
96e31edad3 build(proxy_server_config.yaml): move to model with higher quota 2025-05-08 22:18:27 -07:00
Ishaan Jaff
97d7a5e78e fix deployment name 2025-04-19 09:23:22 -07:00
Ishaan Jaff
8a1023fa2d test image gen fix in build and test 2025-04-02 21:33:24 -07:00
Ishaan Jaff
6b3bfa2b42
(Feat) - return x-litellm-attempted-fallbacks in responses from litellm proxy (#8558)
* add_fallback_headers_to_response

* test x-litellm-attempted-fallbacks

* unit test attempted fallbacks

* fix add_fallback_headers_to_response

* docs document response headers

* fix file name
2025-02-15 14:54:23 -08:00
Krish Dholakia
6bafdbc546
Litellm dev 01 25 2025 p4 (#8006)
* feat(main.py): use asyncio.sleep for mock_Timeout=true on async request

adds unit testing to ensure proxy does not fail if specific Openai requests hang (e.g. recent o1 outage)

* fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming

Fixes https://github.com/BerriAI/litellm/issues/7942

* Revert "fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming"

This reverts commit 7a052a64e3642616405e71350627e2e4f66615b4.

* fix(deepseek-r-1): return reasoning_content as a top-level param

ensures compatibility with existing tools that use it

* fix: fix linting error
2025-01-26 08:01:05 -08:00
Krish Dholakia
08b124aeb6
Litellm dev 01 25 2025 p2 (#8003)
* fix(base_utils.py): supported nested json schema passed in for anthropic calls

* refactor(base_utils.py): refactor ref parsing to prevent infinite loop

* test(test_openai_endpoints.py): refactor anthropic test to use bedrock

* fix(langfuse_prompt_management.py): add unit test for sync langfuse calls

Resolves https://github.com/BerriAI/litellm/issues/7938#issuecomment-2613293757
2025-01-25 16:50:57 -08:00
Krish Dholakia
513b1904ab
Add attempted-retries and timeout values to response headers + more testing (#7926)
* feat(router.py): add retry headers to response

makes it easy to add testing to ensure model-specific retries are respected

* fix(add_retry_headers.py): clarify attempted retries vs. max retries

* test(test_fallbacks.py): add test for checking if max retries set for model is respected

* test(test_fallbacks.py): assert values for attempted retries and max retries are as expected

* fix(utils.py): return timeout in litellm proxy response headers

* test(test_fallbacks.py): add test to assert model specific timeout used on timeout error

* test: add bad model with timeout to proxy

* fix: fix linting error

* fix(router.py): fix get model list from model alias

* test: loosen test restriction - account for other events on proxy
2025-01-22 22:19:44 -08:00
Krish Dholakia
3a7b13efa2
feat(health_check.py): set upperbound for api when making health check call (#7865)
* feat(health_check.py): set upperbound for api when making health check call

prevent bad model from health check to hang and cause pod restarts

* fix(health_check.py): cleanup task once completed

* fix(constants.py): bump default health check timeout to 1min

* docs(health.md): add 'health_check_timeout' to health docs on litellm

* build(proxy_server_config.yaml): add bad model to health check
2025-01-18 19:47:43 -08:00
Ishaan Jaff
47e12802df
(feat) /batches Add support for using /batches endpoints in OAI format (#7402)
* run azure testing on ci/cd

* update docs on azure batches endpoints

* add input azure.jsonl

* refactor - use separate file for batches endpoints

* fixes for passing custom llm provider to /batch endpoints

* pass custom llm provider to files endpoints

* update azure batches doc

* add info for azure batches api

* update batches endpoints

* use simple helper for raising proxy exception

* update config.yml

* fix imports

* update tests

* use existing settings

* update env var used

* update configs

* update config.yml

* update ft testing
2024-12-24 16:58:05 -08:00
Krish Dholakia
4ac66bd843
LiteLLM Minor Fixes and Improvements (09/07/2024) (#5580)
* fix(litellm_logging.py): set completion_start_time_float to end_time_float if none

Fixes https://github.com/BerriAI/litellm/issues/5500

* feat(_init_.py): add new 'openai_text_completion_compatible_providers' list

Fixes https://github.com/BerriAI/litellm/issues/5558

Handles correctly routing fireworks ai calls when done via text completions

* fix: fix linting errors

* fix: fix linting errors

* fix(openai.py): fix exception raised

* fix(openai.py): fix error handling

* fix(_redis.py): allow all supported arguments for redis cluster (#5554)

* Revert "fix(_redis.py): allow all supported arguments for redis cluster (#5554)" (#5583)

This reverts commit f2191ef4cb049bdb8a32d0846690f4a8519df143.

* fix(router.py): return model alias w/ underlying deployment on router.get_model_list()

Fixes https://github.com/BerriAI/litellm/issues/5524#issuecomment-2336410666

* test: handle flaky tests

---------

Co-authored-by: Jonas Dittrich <58814480+Kakadus@users.noreply.github.com>
2024-09-09 18:54:17 -07:00
Krrish Dholakia
0a016d33e6 Revert "fix(router.py): return model alias w/ underlying deployment on router.get_model_list()"
This reverts commit 638896309c.
2024-09-07 18:04:56 -07:00
Krrish Dholakia
638896309c fix(router.py): return model alias w/ underlying deployment on router.get_model_list()
Fixes https://github.com/BerriAI/litellm/issues/5524#issuecomment-2336410666
2024-09-07 18:01:31 -07:00
Ishaan Jaff
f1ffa82062 fix use provider specific routing 2024-08-07 14:37:20 -07:00
Ishaan Jaff
404360b28d test pass through endpoint 2024-08-06 12:16:00 -07:00
Ishaan Jaff
b35c63001d fix setup for endpoints 2024-07-31 17:09:08 -07:00
Ishaan Jaff
c8dfc95e90 add examples on config 2024-07-31 15:29:06 -07:00
Ishaan Jaff
9863520376 support using */* 2024-07-25 18:48:56 -07:00
Ishaan Jaff
e2397c3b83 fix test_team_2logging langfuse 2024-06-19 21:14:18 -07:00
Ishaan Jaff
d409ffbaa9 fix test_chat_completion_different_deployments 2024-06-17 23:04:48 -07:00
Ishaan Jaff
cb386fda20 test - making mistral embedding request on proxy 2024-06-12 15:10:20 -07:00
Marc Abramowitz
83c242bbb3 Add commented set_verbose line to proxy_config
because I've wanted to do this a couple of times and couldn't remember
the exact syntax.
2024-05-16 15:59:37 -07:00
Krrish Dholakia
54587db402 fix(alerting.py): fix datetime comparison logic 2024-05-14 22:10:09 -07:00
Ishaan Jaff
9bde3ccd1d (ci/cd) fixes 2024-05-13 20:49:02 -07:00
Krrish Dholakia
99e8f0715e test(test_end_users.py): fix end user region routing test 2024-05-11 22:42:43 -07:00
Ishaan Jaff
9c4f1ec3e5 fix - failing test_end_user_specific_region test 2024-05-11 17:05:37 -07:00
Ishaan Jaff
a4695c3010 test - using langfuse as a failure callback 2024-05-10 17:37:32 -07:00
Krrish Dholakia
3d18897d69 feat(router.py): enable filtering model group by 'allowed_model_region' 2024-05-08 22:10:17 -07:00
Ishaan Jaff
6a06aba443 (ci/cd) use db connection limit 2024-05-06 11:15:22 -07:00
Ishaan Jaff
e8d3dd475a fix fake endpoint used on ci/cd 2024-05-06 10:37:39 -07:00