* feat(bedrock_mantle): add SigV4/IAM auth to Responses API route (fixes #29665) (#29788) * feat(responses): add default no-op sign_request to BaseResponsesAPIConfig * feat(responses): call sign_request after body is final, send signed bytes when signed * feat(bedrock_mantle): add SigV4 sign_request via composed BaseAWSLLM (bearer path) * test(bedrock_mantle): cover SigV4 access-key, AssumeRole, body bytes, region/auth consistency * feat(bedrock_mantle): defer auth to sign_request; validate_environment no longer requires bearer * docs(bedrock_mantle): document SigV4 + Bearer auth on Responses route * test(responses): cover fake-stream signing order and mantle bearer arg/env precedence * fix(bedrock_mantle): wrap all botocore credential errors with both-paths guidance * fix(bedrock_mantle): catch specific credential errors, not all BotoCoreError, so STS transport failures are not masked * fix(bedrock_mantle): sign the compact Responses route too, not just create * fix(github-copilot): route per-model on /v1/responses based on model info (#29747) * feat(focus): add GCS destination for FOCUS export (#29751) * test: add failing tests for FocusGCSDestination * feat: add FocusGCSDestination reusing GCSBucketBase auth * feat: register FocusGCSDestination in factory; export from __init__ * fix(focus): preserve GCS_PATH_SERVICE_ACCOUNT when service_account_json not in config * style: apply Black formatting to gcs_destination and tests * style: apply Black formatting to factory.py * fix(bedrock): omit empty additionalModelRequestFields and system from Converse API payload (#29565) Amazon Nova Pro (and other strict Bedrock models) return 400 Malformed input request when additionalModelRequestFields: {} or system: [] are present in the payload. Both fields are optional in CommonRequestObject (total=False) and must be omitted rather than sent as empty structures. 
Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(proxy): recognize *.cognitiveservices.azure.com as OpenAI-compatible in pass-through cost tracking (#29730) * fix(proxy): recognize *.cognitiveservices.azure.com as OpenAI-compatible Azure OpenAI resources created via the newer "Azure AI Foundry" / Cognitive Services pathway live on `*.cognitiveservices.azure.com` subdomains, not the older `openai.azure.com`. Both are valid Azure OpenAI surfaces in production today. The OpenAI pass-through cost-tracking handler hard-codes only the older hostname in five places (four `is_openai_*_route` methods on OpenAIPassthroughLoggingHandler, plus is_openai_route on PassThroughEndpointLogging). As a result, calls from newer Azure deployments are silently classified as "not an OpenAI route", the dispatch into the cost-tracking handler is skipped, and tokens/cost never get extracted into LiteLLM_SpendLogs — the row gets written with prompt_tokens=0, completion_tokens=0, spend=0, model='unknown'. Reproduced 2026-06-04 against a real Azure OpenAI deployment on `*.cognitiveservices.azure.com` proxied through LiteLLM v1.88.0. Fix: factor the hostname check into a single helper `_is_openai_compatible_host` listing all three recognized surfaces (api.openai.com, openai.azure.com, cognitiveservices.azure.com), and have all five call sites delegate to it. Purely additive — never weakens recognition for the originally-supported hostnames. Adds a test `test_is_openai_route_recognizes_cognitiveservices_azure_com` that exercises all four `is_openai_*_route` static methods against `*.cognitiveservices.azure.com` URLs (positive cases per route + a small cross-route negative to confirm route-specific path matching still works on the new hostname). 
Out of scope for this PR (separate followup): - `openai_passthrough_handler` calls chat/completions `transform_response` on Responses API payloads (`output:` not `choices:`), which throws inside the dispatch and drops the SpendLogs row entirely. Recognized + tracked separately. * ci: trigger fresh run Empty commit to re-run checks. The previous auth-and-jwt failure was a transient HuggingFace Hub 429 rate-limit hitting tokenizer downloads in tests/proxy_unit_tests/test_custom_tokenizer_bug.py — unrelated to this PR's scope (hostname recognition in pass-through cost tracking). No code change. --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix(responses): preserve forced-function tool_choice name in Responses to Chat transform (#29812) The Responses API forces a specific function with a top-level name ({"type": "function", "name": "X"}), but _transform_tool_choice only handled the nested Chat Completions shape and fell through to returning "required" for the flat form, silently dropping the function name and degrading a forced function call to force-any-tool. Map the flat Responses shape to the nested Chat shape, keeping the "required" fallback when no name is present. * Preserve x-anthropic-billing-header system blocks for first-party Anthropic (#29584) * Preserve x-anthropic-billing-header system blocks for first-party Anthropic PR #20951 strips system blocks beginning with "x-anthropic-billing-header:" for every Anthropic target. That block is how the first-party Anthropic API recognizes Claude Code subscription (OAuth) traffic, so dropping it makes requests that carry only that block, such as the auto-mode tool-safety classifier, fail with a misleading 429 rate_limit_error; normal turns still work because they also carry the "You are Claude Code" identity block. 
Gate the strip behind should_strip_billing_metadata(), defaulting to False on the first-party AnthropicConfig and AnthropicMessagesConfig so the block is kept, and overridden to True on the providers that reach these transforms and reject the block (Bedrock platform, Vertex, Azure for the chat path; Minimax, Azure, DeepSeek for the messages path). Behavior for those providers is unchanged. * Strip billing header on Bedrock invoke and Vertex messages pass-through Two more subclasses reach the gated strip but inherited keep-by-default. AmazonAnthropicClaudeConfig (Bedrock invoke) calls AnthropicConfig.transform_request, which calls translate_system_message, and VertexAIPartnerModelsAnthropicMessagesConfig (Vertex messages pass-through) calls super().transform_anthropic_messages_request. Override should_strip_billing_metadata() to True on both. Add a parametrized test asserting the flag for every first-party base (False) and provider subclass (True), covering all overrides, plus a translate_system_message regression test for the Bedrock invoke path. * fix(cache): log hashed cache keys (#29890) * fix(ui): save routing groups as list (#29889) * Revert "fix(ui): save routing groups as list (#29889)" (#29928) This reverts commit 9b1f78ffa7a309cabe5e9a7ab5f94d1224d192c9. * feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider (#29842) * feat(parasail): add Parasail as a JSON-configured OpenAI-compatible provider Registers parasail in the openai_like JSON provider loader with both /v1/chat/completions and /v1/responses support. Parasail's Responses API rejects store:true and any request that omits store, so the loader gains a force_store_false special_handling flag; the parasail entry sets it and the generated Responses config overrides store=false on every call. This keeps callers from hitting "State storage not supported" and matches what Parasail's docs require. 
Adds the PARASAIL enum value, listing under openai_compatible_providers, provider documentation at docs/my-website/docs/providers/parasail.md, and a focused unit test file under tests/test_litellm/llms/parasail/ that covers JSON registration, chat URL construction, Responses URL construction with PARASAIL_API_BASE override, and the force_store_false regression in both the caller-sent-store=true and caller-omitted cases. * fix(parasail): register in provider_endpoints_support, drop in-repo docs Greptile review feedback. The provider doc belongs in the litellm-docs repo, not this one's docs/my-website tree; removing it here. Adds the parasail entry to provider_endpoints_support.json so the check_provider_folders_documented.py CI check passes (chat_completions and responses true; others false). * fix: normalize Anthropic passthrough server tool usage (#29827) * test(anthropic): cover server_tool_use dict cost tracking * fix: normalize Anthropic server tool usage (cherry picked from commit 982f726bed7d3ec05e463c5dd3d090bebae91d19) * fix: keep server tool usage subscriptable (cherry picked from commit 70280b9b272455b2f974d08bc697f67f929755bf) --------- Co-authored-by: Genmin <joey@joeyroth.com> * fix(proxy): fix typo generic_role_mappoings -> generic_role_mappings in ui_sso.py (#29753) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * feat(proxy): add disable_budget_reservation general setting (#27639) (#29493) * feat(proxy): add disable_budget_reservation general setting (#27639) * feat(proxy): register disable_budget_reservation in ConfigGeneralSettings (#27639) * docs(proxy): document disable_budget_reservation concurrency tradeoff (#27639) * ci: re-trigger flaky docker build (prisma generate ECONNRESET) * fix(proxy): warn and document budget enforcement tradeoff when disable_budget_reservation is set (#27639) * feat(gemini_tts): adding support to Gemini TTS languageCode parameters (#29623) * Adding support to Gemini 
TTS Language Code parameters * Mapping Gemini TTS languageCode param in Docstring * Use snake_case for language_code input keyMapping Gemini TTS languageCode param in Docstring * Restoring files modified under enterprise/litellm_enterprise due to lint/formatting checks --------- Co-authored-by: João Garrido <joaogarrido@google.com> * feat(guardrails): capture user and model metadata in CrowdStrike AIDR (#29517) * fix(proxy): require OpenAI path segment for shared Azure Cognitive Services domains Address Greptile review: the `*.cognitiveservices.azure.com` / `*.openai.azure.com` domains are shared by every Azure Cognitive Service (Speech, Vision, Language, ...), so a hostname-only substring match misclassified non-OpenAI Azure traffic as OpenAI routes. - Replace the substring host test with suffix matching (rejects look-alike domains like cognitiveservices.azure.com.attacker.example). - Add `_is_openai_compatible_url` that requires an OpenAI-style path marker (`/openai/` or `/v1/`) on the shared Azure domains, and use it in PassThroughEndpointLogging.is_openai_route (previously hostname-only). - Add negative tests for Azure Speech/Vision paths and look-alike domains. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix: support Responses input in Redis semantic cache (#29581) * fix: support responses input in redis semantic cache * test: cover redis semantic prompt extraction * test: handle blank redis semantic text fallbacks * chore: remove async cache dead statement * test: cover redis semantic cache miss paths * fix: filter sensitive cache lookup kwargs * chore: rerun ci after huggingface rate limit * chore(ui): regenerate dashboard API types (npm run gen:api) Sync src/lib/http/schema.d.ts with the proxy OpenAPI spec: adds the disable_budget_reservation general-settings field and picks up the RateLimitError docstring reindent. Fixes the gen:api CI drift check. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(bedrock): assert empty additionalModelRequestFields is omitted The Converse transformer now drops an empty additionalModelRequestFields block instead of sending it as `{}`. Update test_bedrock_top_k_param so models without top_k support (llama3) assert the key is absent rather than equal to an empty dict. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Kent <72616338+kingdoooo@users.noreply.github.com> Co-authored-by: codgician <15964984+codgician@users.noreply.github.com> Co-authored-by: Praveen Ghuge <95286176+pghuge-cloudwiz@users.noreply.github.com> Co-authored-by: Roi <roytev@gmail.com> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Liam Scott <liam@uilliam.com> Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com> Co-authored-by: Ceder Dens <cederdens@gmail.com> Co-authored-by: 冯基魁 <56265583+fengjikui@users.noreply.github.com> Co-authored-by: Kai Huang <kaihuang724@gmail.com> Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com> Co-authored-by: Genmin <joey@joeyroth.com> Co-authored-by: Arnav Bhilwariya <arnavbhilwariya0408@gmail.com> Co-authored-by: Armaan Sandhu <74664101+Ar-maan05@users.noreply.github.com> Co-authored-by: João Garrido <48538534+johngarrido@users.noreply.github.com> Co-authored-by: João Garrido <joaogarrido@google.com> Co-authored-by: Kenan Yildirim <kenan@kenany.me> Co-authored-by: Dávid Balatoni <balcsida@gmail.com>
991 lines
34 KiB
Python
import os
import sys
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

sys.path.insert(
    0, os.path.abspath("../../..")
)  # Adds the parent directory to the system path


# Tests for RedisSemanticCache
def test_redis_semantic_cache_initialization(monkeypatch):
    # Mock the redisvl import
    semantic_cache_mock = MagicMock()
    with patch.dict(
        "sys.modules",
        {
            "redisvl.extensions.llmcache": MagicMock(SemanticCache=semantic_cache_mock),
            "redisvl.utils.vectorize": MagicMock(CustomTextVectorizer=MagicMock()),
        },
    ):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        # Set environment variables
        monkeypatch.setenv("REDIS_HOST", "localhost")
        monkeypatch.setenv("REDIS_PORT", "6379")
        monkeypatch.setenv("REDIS_PASSWORD", "test_password")

        # Initialize the cache with a similarity threshold
        redis_semantic_cache = RedisSemanticCache(similarity_threshold=0.8)

        # Verify the semantic cache was initialized with correct parameters
        assert redis_semantic_cache.similarity_threshold == 0.8

        # Use pytest.approx for floating point comparison to handle precision issues
        assert redis_semantic_cache.distance_threshold == pytest.approx(0.2, abs=1e-10)
        assert redis_semantic_cache.embedding_model == "text-embedding-ada-002"

        # Test initialization with missing similarity_threshold
        with pytest.raises(ValueError, match="similarity_threshold must be provided"):
            RedisSemanticCache()


def test_redis_semantic_cache_get_cache(monkeypatch):
    # Mock the redisvl import and embedding function
    semantic_cache_mock = MagicMock()
    custom_vectorizer_mock = MagicMock()

    with patch.dict(
        "sys.modules",
        {
            "redisvl.extensions.llmcache": MagicMock(SemanticCache=semantic_cache_mock),
            "redisvl.utils.vectorize": MagicMock(
                CustomTextVectorizer=custom_vectorizer_mock
            ),
        },
    ):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        # Set environment variables
        monkeypatch.setenv("REDIS_HOST", "localhost")
        monkeypatch.setenv("REDIS_PORT", "6379")
        monkeypatch.setenv("REDIS_PASSWORD", "test_password")

        # Initialize cache
        redis_semantic_cache = RedisSemanticCache(similarity_threshold=0.8)

        # Mock the llmcache.check method to return a result
        mock_result = [
            {
                "prompt": "What is the capital of France?",
                "response": '{"content": "Paris is the capital of France."}',
                "vector_distance": 0.1,  # Distance of 0.1 means similarity of 0.9
                RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key",
            }
        ]
        redis_semantic_cache.llmcache.check = MagicMock(return_value=mock_result)

        # Mock the embedding function
        with (
            patch(
                "litellm.embedding",
                return_value={"data": [{"embedding": [0.1, 0.2, 0.3]}]},
            ),
            patch.object(
                redis_semantic_cache,
                "_get_cache_key_filter_expression",
                return_value="cache-key-filter",
            ),
        ):
            # Test get_cache with a message
            metadata = {}
            result = redis_semantic_cache.get_cache(
                key="test_key",
                messages=[{"content": "What is the capital of France?"}],
                metadata=metadata,
            )

            # Verify result is properly parsed
            assert result == {"content": "Paris is the capital of France."}
            assert metadata["semantic-similarity"] == pytest.approx(0.9)

            # Verify llmcache.check was called
            redis_semantic_cache.llmcache.check.assert_called_once_with(
                prompt="What is the capital of France?",
                filter_expression="cache-key-filter",
            )


def test_redis_semantic_cache_rejects_unscoped_cache_hit(monkeypatch):
    semantic_cache_mock = MagicMock()
    custom_vectorizer_mock = MagicMock()

    with patch.dict(
        "sys.modules",
        {
            "redisvl.extensions.llmcache": MagicMock(SemanticCache=semantic_cache_mock),
            "redisvl.utils.vectorize": MagicMock(
                CustomTextVectorizer=custom_vectorizer_mock
            ),
        },
    ):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        monkeypatch.setenv("REDIS_HOST", "localhost")
        monkeypatch.setenv("REDIS_PORT", "6379")
        monkeypatch.setenv("REDIS_PASSWORD", "test_password")

        redis_semantic_cache = RedisSemanticCache(similarity_threshold=0.8)
        redis_semantic_cache.llmcache.check = MagicMock(
            return_value=[
                {
                    "prompt": "What is the capital of France?",
                    "response": '{"content": "Paris"}',
                    "vector_distance": 0.1,
                }
            ]
        )

        with patch.object(
            redis_semantic_cache,
            "_get_cache_key_filter_expression",
            return_value="cache-key-filter",
        ):
            metadata = {}
            result = redis_semantic_cache.get_cache(
                key="test_key",
                messages=[{"content": "What is the capital of France?"}],
                metadata=metadata,
            )

            assert result is None
            assert metadata["semantic-similarity"] == 0.0


def test_redis_semantic_cache_set_cache_stores_cache_key_filter(monkeypatch):
    semantic_cache_mock = MagicMock()
    custom_vectorizer_mock = MagicMock()

    with patch.dict(
        "sys.modules",
        {
            "redisvl.extensions.llmcache": MagicMock(SemanticCache=semantic_cache_mock),
            "redisvl.utils.vectorize": MagicMock(
                CustomTextVectorizer=custom_vectorizer_mock
            ),
        },
    ):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        monkeypatch.setenv("REDIS_HOST", "localhost")
        monkeypatch.setenv("REDIS_PORT", "6379")
        monkeypatch.setenv("REDIS_PASSWORD", "test_password")

        redis_semantic_cache = RedisSemanticCache(similarity_threshold=0.8)
        redis_semantic_cache.llmcache.store = MagicMock()

        redis_semantic_cache.set_cache(
            key="test_key",
            value={"content": "Paris"},
            messages=[{"content": "What is the capital of France?"}],
            ttl=60,
        )

        redis_semantic_cache.llmcache.store.assert_called_once_with(
            "What is the capital of France?",
            "{'content': 'Paris'}",
            filters={RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key"},
            ttl=60,
        )


def test_redis_semantic_cache_uses_isolated_index_for_old_schema(monkeypatch):
    fallback_cache_mock = MagicMock()
    semantic_cache_mock = MagicMock(
        side_effect=[
            ValueError("stored index schema differs from requested fields"),
            fallback_cache_mock,
        ]
    )
    custom_vectorizer_mock = MagicMock()

    with patch.dict(
        "sys.modules",
        {
            "redisvl.extensions.llmcache": MagicMock(SemanticCache=semantic_cache_mock),
            "redisvl.utils.vectorize": MagicMock(
                CustomTextVectorizer=custom_vectorizer_mock
            ),
        },
    ):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        monkeypatch.setenv("REDIS_HOST", "localhost")
        monkeypatch.setenv("REDIS_PORT", "6379")
        monkeypatch.setenv("REDIS_PASSWORD", "test_password")

        redis_semantic_cache = RedisSemanticCache(
            similarity_threshold=0.8,
            index_name="existing_index",
        )

        assert redis_semantic_cache.llmcache is fallback_cache_mock
        assert semantic_cache_mock.call_args_list[0].kwargs["name"] == "existing_index"
        assert (
            semantic_cache_mock.call_args_list[1].kwargs["name"]
            == "existing_index_isolated"
        )
        assert semantic_cache_mock.call_args_list[1].kwargs["filterable_fields"] == [
            RedisSemanticCache._cache_key_filterable_field()
        ]


def test_redis_semantic_cache_overwrites_stale_isolated_index(monkeypatch):
    fallback_cache_mock = MagicMock()
    semantic_cache_mock = MagicMock(
        side_effect=[
            ValueError("Existing index schema does not match"),
            ValueError("Existing index schema does not match"),
            fallback_cache_mock,
        ]
    )
    custom_vectorizer_mock = MagicMock()

    with patch.dict(
        "sys.modules",
        {
            "redisvl.extensions.llmcache": MagicMock(SemanticCache=semantic_cache_mock),
            "redisvl.utils.vectorize": MagicMock(
                CustomTextVectorizer=custom_vectorizer_mock
            ),
        },
    ):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        monkeypatch.setenv("REDIS_HOST", "localhost")
        monkeypatch.setenv("REDIS_PORT", "6379")
        monkeypatch.setenv("REDIS_PASSWORD", "test_password")

        redis_semantic_cache = RedisSemanticCache(
            similarity_threshold=0.8,
            index_name="existing_index",
        )

        assert redis_semantic_cache.llmcache is fallback_cache_mock
        assert (
            semantic_cache_mock.call_args_list[2].kwargs["name"]
            == "existing_index_isolated"
        )
        assert semantic_cache_mock.call_args_list[2].kwargs["overwrite"] is True
        assert semantic_cache_mock.call_args_list[2].kwargs["filterable_fields"] == [
            RedisSemanticCache._cache_key_filterable_field()
        ]


def test_redis_semantic_cache_reraises_unexpected_isolated_index_error(monkeypatch):
    semantic_cache_mock = MagicMock(
        side_effect=[
            ValueError("Existing index schema does not match"),
            ValueError("connection failed"),
        ]
    )
    custom_vectorizer_mock = MagicMock()

    with patch.dict(
        "sys.modules",
        {
            "redisvl.extensions.llmcache": MagicMock(SemanticCache=semantic_cache_mock),
            "redisvl.utils.vectorize": MagicMock(
                CustomTextVectorizer=custom_vectorizer_mock
            ),
        },
    ):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        monkeypatch.setenv("REDIS_HOST", "localhost")
        monkeypatch.setenv("REDIS_PORT", "6379")
        monkeypatch.setenv("REDIS_PASSWORD", "test_password")

        with pytest.raises(ValueError, match="connection failed"):
            RedisSemanticCache(
                similarity_threshold=0.8,
                index_name="existing_index",
            )


def test_redis_semantic_cache_reraises_unexpected_index_error():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)
    redis_semantic_cache.distance_threshold = 0.2
    semantic_cache_mock = MagicMock(side_effect=ValueError("connection failed"))

    with pytest.raises(ValueError, match="connection failed"):
        redis_semantic_cache._init_semantic_cache(
            semantic_cache_cls=semantic_cache_mock,
            index_name="existing_index",
            redis_url="redis://localhost:6379",
            cache_vectorizer=MagicMock(),
        )


def test_redis_semantic_cache_matches_bytes_cache_key():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)

    assert redis_semantic_cache._cache_hit_matches_key(
        cache_hit={RedisSemanticCache.CACHE_KEY_FIELD_NAME: b"test_key"},
        key="test_key",
    )


def test_redis_semantic_cache_rejects_pre_isolation_unscoped_hit():
    """Pre-isolation entries with no cache-key field cannot be safely
    reassigned to a caller's scope and are treated as misses."""
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)

    cache_hit = {
        "prompt": "What is the capital of France?",
        "response": '{"content": "Paris"}',
        "vector_distance": 0.1,
    }
    assert not redis_semantic_cache._cache_hit_matches_key(
        cache_hit=cache_hit,
        key="test_key",
    )


def test_redis_semantic_cache_builds_filter_expression(monkeypatch):
    class FakeTag:
        def __init__(self, field_name):
            self.field_name = field_name

        def __eq__(self, value):
            return (self.field_name, value)

    with patch.dict("sys.modules", {"redisvl.query.filter": MagicMock(Tag=FakeTag)}):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)

        assert redis_semantic_cache._get_cache_key_filter_expression("test_key") == (
            RedisSemanticCache.CACHE_KEY_FIELD_NAME,
            "test_key",
        )


@pytest.mark.asyncio
async def test_redis_semantic_cache_async_get_cache(monkeypatch):
    # Mock the redisvl import
    semantic_cache_mock = MagicMock()
    custom_vectorizer_mock = MagicMock()

    with patch.dict(
        "sys.modules",
        {
            "redisvl.extensions.llmcache": MagicMock(SemanticCache=semantic_cache_mock),
            "redisvl.utils.vectorize": MagicMock(
                CustomTextVectorizer=custom_vectorizer_mock
            ),
        },
    ):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        # Set environment variables
        monkeypatch.setenv("REDIS_HOST", "localhost")
        monkeypatch.setenv("REDIS_PORT", "6379")
        monkeypatch.setenv("REDIS_PASSWORD", "test_password")

        # Initialize cache
        redis_semantic_cache = RedisSemanticCache(similarity_threshold=0.8)

        # Mock the async methods
        mock_result = [
            {
                "prompt": "What is the capital of France?",
                "response": '{"content": "Paris is the capital of France."}',
                "vector_distance": 0.1,  # Distance of 0.1 means similarity of 0.9
                RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key",
            }
        ]

        redis_semantic_cache.llmcache.acheck = AsyncMock(return_value=mock_result)
        redis_semantic_cache._get_async_embedding = AsyncMock(
            return_value=[0.1, 0.2, 0.3]
        )

        with patch.object(
            redis_semantic_cache,
            "_get_cache_key_filter_expression",
            return_value="cache-key-filter",
        ):
            # Test async_get_cache with a message
            result = await redis_semantic_cache.async_get_cache(
                key="test_key",
                messages=[{"content": "What is the capital of France?"}],
                metadata={},
            )

            # Verify result is properly parsed
            assert result == {"content": "Paris is the capital of France."}

            # Verify methods were called
            redis_semantic_cache._get_async_embedding.assert_called_once()
            redis_semantic_cache.llmcache.acheck.assert_called_once_with(
                prompt="What is the capital of France?",
                vector=[0.1, 0.2, 0.3],
                filter_expression="cache-key-filter",
            )


@pytest.mark.asyncio
async def test_redis_semantic_cache_async_get_cache_rejects_unscoped_hit(monkeypatch):
    semantic_cache_mock = MagicMock()
    custom_vectorizer_mock = MagicMock()

    with patch.dict(
        "sys.modules",
        {
            "redisvl.extensions.llmcache": MagicMock(SemanticCache=semantic_cache_mock),
            "redisvl.utils.vectorize": MagicMock(
                CustomTextVectorizer=custom_vectorizer_mock
            ),
        },
    ):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        monkeypatch.setenv("REDIS_HOST", "localhost")
        monkeypatch.setenv("REDIS_PORT", "6379")
        monkeypatch.setenv("REDIS_PASSWORD", "test_password")

        redis_semantic_cache = RedisSemanticCache(similarity_threshold=0.8)
        redis_semantic_cache.llmcache.acheck = AsyncMock(
            return_value=[
                {
                    "prompt": "What is the capital of France?",
                    "response": '{"content": "Paris"}',
                    "vector_distance": 0.1,
                }
            ]
        )
        redis_semantic_cache._get_async_embedding = AsyncMock(
            return_value=[0.1, 0.2, 0.3]
        )

        with patch.object(
            redis_semantic_cache,
            "_get_cache_key_filter_expression",
            return_value="cache-key-filter",
        ):
            result = await redis_semantic_cache.async_get_cache(
                key="test_key",
                messages=[{"content": "What is the capital of France?"}],
                metadata={},
            )

            assert result is None


@pytest.mark.asyncio
async def test_redis_semantic_cache_async_set_cache_stores_cache_key_filter(
    monkeypatch,
):
    semantic_cache_mock = MagicMock()
    custom_vectorizer_mock = MagicMock()

    with patch.dict(
        "sys.modules",
        {
            "redisvl.extensions.llmcache": MagicMock(SemanticCache=semantic_cache_mock),
            "redisvl.utils.vectorize": MagicMock(
                CustomTextVectorizer=custom_vectorizer_mock
            ),
        },
    ):
        from litellm.caching.redis_semantic_cache import RedisSemanticCache

        monkeypatch.setenv("REDIS_HOST", "localhost")
        monkeypatch.setenv("REDIS_PORT", "6379")
        monkeypatch.setenv("REDIS_PASSWORD", "test_password")

        redis_semantic_cache = RedisSemanticCache(similarity_threshold=0.8)
        redis_semantic_cache.llmcache.astore = AsyncMock()
        redis_semantic_cache._get_async_embedding = AsyncMock(
            return_value=[0.1, 0.2, 0.3]
        )

        await redis_semantic_cache.async_set_cache(
            key="test_key",
            value={"content": "Paris"},
            messages=[{"content": "What is the capital of France?"}],
            ttl=60,
        )

        redis_semantic_cache.llmcache.astore.assert_called_once_with(
            "What is the capital of France?",
            "{'content': 'Paris'}",
            vector=[0.1, 0.2, 0.3],
            filters={RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key"},
            ttl=60,
        )


def test_redis_semantic_cache_set_cache_uses_responses_string_input():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)
    redis_semantic_cache.llmcache = MagicMock()
    redis_semantic_cache._get_cache_filters = MagicMock(
        return_value={RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key"}
    )
    redis_semantic_cache._get_ttl = MagicMock(return_value=None)

    redis_semantic_cache.set_cache(
        key="test_key",
        value={"content": "Paris"},
        input="What is the capital of France?",
    )

    redis_semantic_cache.llmcache.store.assert_called_once_with(
        "What is the capital of France?",
        "{'content': 'Paris'}",
        filters={RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key"},
    )


def test_redis_semantic_cache_get_cache_uses_responses_string_input():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)
    redis_semantic_cache.similarity_threshold = 0.8
    redis_semantic_cache.llmcache = MagicMock()
    redis_semantic_cache.llmcache.check = MagicMock(
        return_value=[
            {
                "prompt": "What is the capital of France?",
                "response": '{"content": "Paris"}',
                "vector_distance": 0.1,
                RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key",
            }
        ]
    )

    with patch.object(
        redis_semantic_cache,
        "_get_cache_key_filter_expression",
        return_value="cache-key-filter",
    ):
        metadata = {}
        result = redis_semantic_cache.get_cache(
            key="test_key",
            input="What is the capital of France?",
            metadata=metadata,
        )

        assert result == {"content": "Paris"}
        assert metadata["semantic-similarity"] == pytest.approx(0.9)
        redis_semantic_cache.llmcache.check.assert_called_once_with(
            prompt="What is the capital of France?",
            filter_expression="cache-key-filter",
        )


def test_redis_semantic_cache_set_cache_flattens_structured_responses_input():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)
    redis_semantic_cache.llmcache = MagicMock()
    redis_semantic_cache._get_cache_filters = MagicMock(
        return_value={RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key"}
    )
    redis_semantic_cache._get_ttl = MagicMock(return_value=None)

    redis_semantic_cache.set_cache(
        key="test_key",
        value={"content": "Paris"},
        input=[
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "What is the capital of France?"},
                    {"type": "input_text", "text": "Answer briefly."},
                    {
                        "type": "input_image",
                        "image_url": "https://example.com/paris.png",
                    },
                ],
            }
        ],
    )

    redis_semantic_cache.llmcache.store.assert_called_once_with(
        "What is the capital of France?\nAnswer briefly.",
        "{'content': 'Paris'}",
        filters={RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key"},
    )


def test_redis_semantic_cache_prompt_extraction_prefers_messages():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    prompt = RedisSemanticCache._get_prompt_from_kwargs(
        messages=[{"content": "message prompt"}],
        input="responses prompt",
    )

    assert prompt == "message prompt"


def test_redis_semantic_cache_prompt_extraction_handles_model_objects():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    class ModelDumpInput:
        def model_dump(self):
            return {"content": [{"text": "model dump prompt"}]}

    class DictInput:
        def dict(self):
            return {"content": [{"output_text": "dict prompt"}]}

    prompt = RedisSemanticCache._get_prompt_from_kwargs(
        input=[
            ModelDumpInput(),
            DictInput(),
            {"content": [{"input_text": "inline prompt"}]},
            {"content": [{"type": "input_image", "image_url": "https://example.com"}]},
        ]
    )

    assert prompt == "model dump prompt\ndict prompt\ninline prompt"


def test_redis_semantic_cache_prompt_extraction_returns_none_without_text():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    assert RedisSemanticCache._get_prompt_from_kwargs() is None
    assert RedisSemanticCache._get_prompt_from_kwargs(input=None) is None
    assert RedisSemanticCache._get_prompt_from_kwargs(input=" ") is None
    assert (
        RedisSemanticCache._get_prompt_from_kwargs(
            input=[{"type": "input_image", "image_url": "https://example.com"}]
        )
        is None
    )


def test_redis_semantic_cache_prompt_extraction_skips_blank_dict_text_keys():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    prompt = RedisSemanticCache._get_prompt_from_kwargs(
        input={"text": " ", "input_text": "fallback prompt"}
    )

    assert prompt == "fallback prompt"


def test_redis_semantic_cache_prompt_extraction_skips_blank_object_text_keys():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    class ResponseInput:
        text = " "
        input_text = "fallback prompt"

    prompt = RedisSemanticCache._get_prompt_from_kwargs(input=ResponseInput())

    assert prompt == "fallback prompt"


def test_redis_semantic_cache_prompt_extraction_handles_object_content():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    class ResponseInput:
        content = [{"text": "object content prompt"}]

    prompt = RedisSemanticCache._get_prompt_from_kwargs(input=ResponseInput())

    assert prompt == "object content prompt"


def test_redis_semantic_cache_set_cache_skips_blank_responses_input():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)
    redis_semantic_cache.llmcache = MagicMock()

    redis_semantic_cache.set_cache(
        key="test_key",
        value={"content": "Paris"},
        input=" ",
    )

    redis_semantic_cache.llmcache.store.assert_not_called()


def test_redis_semantic_cache_get_cache_sets_similarity_on_blank_responses_input():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)
    redis_semantic_cache.llmcache = MagicMock()
    metadata = {}

    result = redis_semantic_cache.get_cache(
        key="test_key",
        input=" ",
        metadata=metadata,
    )

    assert result is None
    assert metadata["semantic-similarity"] == 0.0
    redis_semantic_cache.llmcache.check.assert_not_called()


def test_redis_semantic_cache_get_cache_sets_similarity_when_no_results():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)
    redis_semantic_cache.llmcache = MagicMock()
    redis_semantic_cache.llmcache.check = MagicMock(return_value=[])

    with patch.object(
        redis_semantic_cache,
        "_get_cache_key_filter_expression",
        return_value="cache-key-filter",
    ):
        metadata = {}
        result = redis_semantic_cache.get_cache(
            key="test_key",
            input="What is the capital of France?",
            metadata=metadata,
        )

    assert result is None
    assert metadata["semantic-similarity"] == 0.0
    redis_semantic_cache.llmcache.check.assert_called_once_with(
        prompt="What is the capital of France?",
        filter_expression="cache-key-filter",
    )


@pytest.mark.asyncio
async def test_redis_semantic_cache_async_paths_use_responses_string_input():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)
    redis_semantic_cache.similarity_threshold = 0.8
    redis_semantic_cache.llmcache = MagicMock()
    redis_semantic_cache.llmcache.astore = AsyncMock()
    redis_semantic_cache.llmcache.acheck = AsyncMock(
        return_value=[
            {
                "prompt": "What is the capital of France?",
                "response": '{"content": "Paris"}',
                "vector_distance": 0.1,
                RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key",
            }
        ]
    )
    redis_semantic_cache._get_cache_filters = MagicMock(
        return_value={RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key"}
    )
    redis_semantic_cache._get_ttl = MagicMock(return_value=None)
    redis_semantic_cache._get_async_embedding = AsyncMock(return_value=[0.1, 0.2, 0.3])

    await redis_semantic_cache.async_set_cache(
        key="test_key",
        value={"content": "Paris"},
        input="What is the capital of France?",
    )

    with patch.object(
        redis_semantic_cache,
        "_get_cache_key_filter_expression",
        return_value="cache-key-filter",
    ):
        metadata = {}
        result = await redis_semantic_cache.async_get_cache(
            key="test_key",
            input="What is the capital of France?",
            metadata=metadata,
        )

    redis_semantic_cache.llmcache.astore.assert_called_once_with(
        "What is the capital of France?",
        "{'content': 'Paris'}",
        vector=[0.1, 0.2, 0.3],
        filters={RedisSemanticCache.CACHE_KEY_FIELD_NAME: "test_key"},
    )
    assert result == {"content": "Paris"}
    assert metadata["semantic-similarity"] == pytest.approx(0.9)
    redis_semantic_cache.llmcache.acheck.assert_called_once_with(
        prompt="What is the capital of France?",
        vector=[0.1, 0.2, 0.3],
        filter_expression="cache-key-filter",
    )


@pytest.mark.asyncio
async def test_redis_semantic_cache_async_paths_set_similarity_on_misses():
    from litellm.caching.redis_semantic_cache import RedisSemanticCache

    redis_semantic_cache = RedisSemanticCache.__new__(RedisSemanticCache)
    redis_semantic_cache.llmcache = MagicMock()
    redis_semantic_cache.llmcache.astore = AsyncMock()
    redis_semantic_cache.llmcache.acheck = AsyncMock(return_value=[])
    redis_semantic_cache._get_async_embedding = AsyncMock(return_value=[0.1, 0.2, 0.3])

    await redis_semantic_cache.async_set_cache(
        key="test_key",
        value={"content": "Paris"},
        input=" ",
    )

    redis_semantic_cache.llmcache.astore.assert_not_called()
    redis_semantic_cache._get_async_embedding.assert_not_called()

    blank_metadata = {}
    blank_result = await redis_semantic_cache.async_get_cache(
        key="test_key",
        input=" ",
        metadata=blank_metadata,
    )

    assert blank_result is None
    assert blank_metadata["semantic-similarity"] == 0.0
    redis_semantic_cache.llmcache.acheck.assert_not_called()
    redis_semantic_cache._get_async_embedding.assert_not_called()

    with patch.object(
        redis_semantic_cache,
        "_get_cache_key_filter_expression",
        return_value="cache-key-filter",
    ):
        miss_metadata = {}
        miss_result = await redis_semantic_cache.async_get_cache(
            key="test_key",
            input="What is the capital of France?",
            metadata=miss_metadata,
        )

    assert miss_result is None
    assert miss_metadata["semantic-similarity"] == 0.0
    redis_semantic_cache.llmcache.acheck.assert_called_once_with(
        prompt="What is the capital of France?",
        vector=[0.1, 0.2, 0.3],
        filter_expression="cache-key-filter",
    )


def test_cache_get_cache_passes_responses_input_to_backend_cache():
    from litellm.caching.caching import Cache

    cache = Cache.__new__(Cache)
    cache.cache = MagicMock()
    cache.cache.get_cache = MagicMock(return_value=None)
    cache.should_use_cache = MagicMock(return_value=True)
    cache.get_cache_key = MagicMock(return_value="test_key")

    metadata = {}
    cache.get_cache(
        input="What is the capital of France?",
        metadata=metadata,
        cache={},
    )

    cache.cache.get_cache.assert_called_once_with(
        "test_key",
        input="What is the capital of France?",
        metadata=metadata,
    )


def test_cache_get_cache_filters_sensitive_kwargs_from_backend_cache():
    from litellm.caching.caching import Cache

    cache = Cache.__new__(Cache)
    cache.cache = MagicMock()
    cache.should_use_cache = MagicMock(return_value=True)
    cache.get_cache_key = MagicMock(return_value="test_key")
    cache._get_cache_logic = MagicMock(return_value={"content": "Paris"})

    def _cache_hit(_cache_key, **cache_kwargs):
        cache_kwargs["metadata"]["semantic-similarity"] = 0.7
        return {"content": "Paris"}

    cache.cache.get_cache = MagicMock(side_effect=_cache_hit)

    metadata = {"user_api_key": "sk-secret", "trace_id": "trace-id"}
    result = cache.get_cache(
        input="What is the capital of France?",
        metadata=metadata,
        cache={"s-maxage": 10},
        api_key="sk-secret",
        headers={"authorization": "Bearer sk-secret"},
    )

    assert result == {"content": "Paris"}
    assert metadata == {
        "user_api_key": "sk-secret",
        "trace_id": "trace-id",
        "semantic-similarity": 0.7,
    }

    forwarded_kwargs = cache.cache.get_cache.call_args.kwargs
    assert forwarded_kwargs == {
        "input": "What is the capital of France?",
        "metadata": {"semantic-similarity": 0.7},
    }
    assert forwarded_kwargs["metadata"] is not metadata
    cache._get_cache_logic.assert_called_once_with(
        cached_result={"content": "Paris"},
        max_age=10,
    )


def test_cache_get_cache_filters_sensitive_kwargs_without_metadata():
    from litellm.caching.caching import Cache

    cache = Cache.__new__(Cache)
    cache.cache = MagicMock()
    cache.cache.get_cache = MagicMock(return_value={"content": "Paris"})
    cache.should_use_cache = MagicMock(return_value=True)
    cache.get_cache_key = MagicMock(return_value="test_key")
    cache._get_cache_logic = MagicMock(return_value={"content": "Paris"})

    result = cache.get_cache(
        input="What is the capital of France?",
        cache={"s-maxage": 10},
        api_key="sk-secret",
        headers={"authorization": "Bearer sk-secret"},
    )

    assert result == {"content": "Paris"}
    cache.cache.get_cache.assert_called_once_with(
        "test_key",
        input="What is the capital of France?",
    )


def test_cache_get_cache_passes_responses_input_to_dynamic_cache():
    from litellm.caching.caching import Cache

    cache = Cache.__new__(Cache)
    cache.should_use_cache = MagicMock(return_value=True)
    cache.get_cache_key = MagicMock(return_value="test_key")
    cache._get_cache_logic = MagicMock(return_value={"content": "Paris"})
    dynamic_cache_object = MagicMock()
    dynamic_cache_object.get_cache = MagicMock(return_value={"content": "Paris"})

    metadata = {}
    result = cache.get_cache(
        dynamic_cache_object=dynamic_cache_object,
        input="What is the capital of France?",
        metadata=metadata,
        cache={},
    )

    assert result == {"content": "Paris"}
    dynamic_cache_object.get_cache.assert_called_once_with(
        "test_key",
        input="What is the capital of France?",
        metadata=metadata,
    )
    cache._get_cache_logic.assert_called_once_with(
        cached_result={"content": "Paris"},
        max_age=float("inf"),
    )