litellm

History

Mateo Wang 533eab4dbd fix(tests/vcr): make Redis cassette cache replay deterministically (zero VCR misses on consecutive runs) (#28826 ) * test(vcr): make Redis-backed cassettes replay deterministically across runs - Pin LITELLM_LOCAL_MODEL_COST_MAP=True in the shared VCR harness so the per-test importlib.reload(litellm) no longer fetches the model cost map from raw.githubusercontent.com. That live fetch was being recorded into cassettes; for tests that subsequently skip it was the only recorded episode, so the persister refused to save it (skipped tests don't persist) and the test re-recorded it live every run (MISS:NOT_PERSISTED). - Compare-time symmetric matcher tolerance for Google OAuth (ya29.) tokens, observability/telemetry payloads, credential-exchange bodies, and volatile UUID/timestamp tokens, so existing cassettes select a recorded episode instead of growing past the 50-episode cap and re-recording live. - Don't record fire-and-forget telemetry (langfuse/arize/otel/...) into non-telemetry tests' cassettes. Several modules set litellm.success_callback at import time, so observability logging is globally enabled and an async flush from the background logging worker lands in an unrelated test's VCR window, saved as a spurious MISS:RECORDED (observed: a Langfuse batch from another completion landing on test_lowest_latency_routing_buffer). Such a request now passes through live (telemetry hosts aren't real-spend hosts); tests that actually assert on telemetry keep recording it. - Dedupe + cap the VCR diagnostic dump so the classification summary survives CircleCI's ~400KB step-output truncation. - Stabilize a non-deterministic rate-limit test body; mark AWS Secrets Manager lifecycle tests VCR-incompatible (uniquely-named secrets can't be replayed). - Mark test_router_text_completion_client VCR-incompatible: it fires 300 identical requests to verify async-client reuse, but vcrpy patches the HTTP transport so replay never exercises the real connection pool the test validates, and recording 300 near-identical episodes overflows the 50-episode cap (MISS:OVERFLOW every run). It hits a free mock endpoint. - Mark the Vertex AI MaaS Mistral OCR tests (vertex_ai/mistral-ocr-2505) VCR-incompatible: the MaaS model is not provisioned in the CI GCP project, so the live :rawPredict call fails and the test skips every run, leaving no cassette to record (MISS:NOT_PERSISTED every run). Sibling direct-Mistral and Azure OCR tests are unaffected and still replay from cache. fix(tests/vcr): refresh cassette TTL on read so replayed cassettes don't expire The Redis VCR persister loaded cassettes with a plain GET, which does not touch the key's TTL. A cassette that is only ever replayed (HIT/NOOP, never re-recorded) therefore expired exactly 24h after its last write, no matter how often it was read. Whichever CI run happened to cross that boundary re-recorded the cassette live and surfaced a spurious VCR MISS on otherwise deterministic cassettes — the residual per-run flakiness floor (a different random subset of read-only cassettes expiring each run). Slide the expiry forward on every successful load (best-effort EXPIRE), so any cassette used at least once per TTL window stays alive indefinitely and the 2nd/3rd run of a day replays cleanly. * fix(tests/vcr): recover from spurious GET-None for existing cassette keys Under concurrent CI load, the persister's load GET was observed returning None for a cassette key that demonstrably existed on the (single, non- clustered) Redis master — an external monitor saw the key present with a healthy TTL at the same instant the in-process client read None. Because None is a valid GET result (not a RedisError), the retry-on-error client config never engaged, so the cassette re-recorded live (a phantom MISS:RECORDED); for flaky/networked tests the failed live call then triggered a pytest rerun, which is why a rotating subset of otherwise deterministic tests missed each run. On a None result, re-check EXISTS and re-read once. If the key really exists, use the recovered value and log [vcr-transient-miss-recovered] (also counted in cassette_cache_health). A genuinely absent key (a new cassette) still falls through to CassetteNotFoundError. * chore(tests/vcr): TEMP diagnostic for persistent-miss cassette load path Logs GET/EXISTS at load time for the three cassettes that re-record every run despite being present in Redis, to capture what the in-process client sees. To be reverted before merge. * chore(tests/vcr): write load diagnostic to Redis (truncation-proof) CI stdout truncates to the last ~400KB, dropping the early loaddbg lines for the alphabetically-first failing test. Push the load probe to a Redis list instead so it survives. To be reverted before merge. * fix(tests/vcr): don't drop stored telemetry episodes during cassette load Root cause of the residual per-run misses on present cassettes: vcrpy's Cassette._load() replays each stored interaction through Cassette.append(), which runs before_record_request on it — and a None return there silently drops that episode. The telemetry-leak suppressor (_should_drop_telemetry_record) returns None for telemetry requests, so when a non-telemetry-named test (or the alphabetically-first test in a worker, whose _current_test_nodeid is still empty) loaded a cassette containing a Langfuse ingestion episode, the episode was dropped on read — forcing an endless live re-record (a phantom MISS:RECORDED on a cassette that was demonstrably present in Redis). Verified by reproducing Cassette._load() against the real cassette: empty/non-telemetry nodeid -> 0 episodes survive; with the guard -> 1 survives. Fix: guard the suppressor with a thread-local set around Cassette._load (via a small idempotent monkeypatch), so the drop only ever stops new incidental telemetry from being recorded and never filters the existing cassette on read. Also drops the speculative GET-None recovery + its diagnostics from the previous commits: the load diagnostic showed GET returns the cassette bytes fine (get=1440B), so the persister never returned a spurious None — the loss happened later in vcrpy's append. The proven TTL-refresh-on-read fix is retained. * fix(tests/vcr): drop incidental telemetry export POSTs to stop rotating async-flush misses litellm's observability loggers flush on a background thread, so a Langfuse ingestion POST scheduled by one telemetry test can fire mid-way through a later telemetry-named test (after that test's own httpx mock has exited) and be recorded by VCR as a phantom episode — a non-deterministic MISS:RECORDED / PARTIAL that rotates onto a different telemetry test from run to run. Telemetry export POSTs are fire-and-forget; no test asserts on a recorded export response except the pass-through proxy test (which forwards a client POST to Langfuse ingestion and replays its 207). So _should_drop_telemetry_record now drops incidental export POSTs for every test except that one. Dropping returns None (live fire-and-forget, never stored), so it can only turn a phantom miss into a harmless live call, never the reverse; recorded read-back GETs that telemetry tests assert on are matched by method and left untouched. * fix(tests/vcr): restore assertion in test_banner_silent_when_vcr_disabled The assertion that the banner is suppressed when VCR is disabled was inadvertently moved into test_diagnostic_log_silent_when_no_dir when the diagnostic-log tests were added, leaving the disabled-VCR test verifying nothing. Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai>		2026-05-26 11:30:44 -07:00
..
base_token_counter_test.py	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
conftest.py	fix(tests/vcr): make Redis cassette cache replay deterministically (zero VCR misses on consecutive runs) (#28826 )	2026-05-26 11:30:44 -07:00
log.txt	Allow passing `thinking` param to litellm proxy via client sdk + Code QA Refactor on get_optional_params (get correct values) (#9386 )	2025-04-07 21:04:11 -07:00
test_aiohttp_handler.py	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
test_anthropic_token_counter.py	[Test] Anthropic: Replace Legacy Claude-4-Sonnet Alias With Haiku 4.5	2026-05-01 19:10:27 -07:00
test_aws_secret_manager.py	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
test_azure_ai_anthropic_token_counter.py	fix: fix ci/cd + handle oidc jwt tokens	2026-03-30 16:12:58 -07:00
test_bedrock_token_counter.py	[Fix] Tests: Align Bedrock count-tokens endpoint assertions with URL-encoded model id	2026-05-01 15:22:40 -07:00
test_cyberark.py	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
test_get_secret.py
test_hashicorp.py	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
test_health_check.py	chore(ci): modernize model references in tests and configs (#27856 )	2026-05-15 15:44:28 -07:00
test_litellm_overhead.py	chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728 )	2026-05-25 12:03:17 -07:00
test_logging_callback_manager.py	feat(logging): add retry settings for generic API logger (#26645 )	2026-04-28 08:38:17 -07:00
test_proxy_budget_reset.py	fix: reset org and tag budgets (#27326 )	2026-05-09 19:15:32 -04:00
test_secret_manager.py	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
test_utils.py	chore(ci): modernize model references in tests and configs (#27856 )	2026-05-15 15:44:28 -07:00
test_validate_tool_choice.py	style: run black formatter on files from main merge	2026-04-17 13:02:59 -07:00
vertex_key.json	test: update to new vertex ai keys	2026-03-28 20:19:05 -07:00