fix(register_model): preserve built-in cache pricing when registering custom overrides under unmapped keys (#30044)
* fix(spend-tracking): fall back to direct spend-counter increment when reservation reconcile fails

  When the reservation-reconcile path in `_reconcile_budget_reservation_for_counter_update` hits a Redis error, it now returns an empty set so that `increment_spend_counters` re-runs the direct increment for the affected counters.

  Previously, the function logged the failure, invalidated the reserved counters, and still returned the reserved counter keys, which caused the caller to skip the direct increment. With the increment skipped and the counter deleted, the next request reseeded the counter from `LiteLLM_VerificationToken.spend`, a column the batched flusher only updates every few seconds. The enforced cross-pod spend value therefore collapsed to a stale snapshot, and budget gating stopped firing for affected keys.

  Adds a regression test that exercises the failure path with a flaky Redis backend and asserts that the actual response cost lands in the shared counter.

* fix(register_model): preserve built-in cache pricing when registering custom overrides under unmapped keys

  When a custom-priced model is registered under a key shape that get_model_info cannot resolve (e.g. litellm_params.model set to bedrock/bedrock/us.anthropic.claude-sonnet-4-6 or another non-canonical alias), register_model previously fell back to an empty existing_model. The merged entry then carried only the fields the user set explicitly (input/output cost, provider) and dropped cache pricing. Downstream, the cost calculator defaulted cache_creation_input_token_cost and cache_read_input_token_cost to 0, silently dropping the bulk of the bill for cache-heavy Anthropic traffic.

  register_model now attempts to resolve a canonical built-in entry by stripping provider prefixes, region prefixes, and provider-specific suffixes before giving up. When a variant resolves, its defaults (notably cache pricing) are inherited while the user's explicit overrides still win. When nothing resolves and the user supplied no cache pricing, it logs a warning instead of silently under-billing.

* fix(router): inherit built-in cache pricing on deployments with partial custom pricing

  A deployment configured with only input_cost_per_token and output_cost_per_token under model_info was registered under its model_info.id with no cache cost fields. The cost calculator then defaulted cache_creation_input_token_cost and cache_read_input_token_cost to 0, silently billing cache_read and cache_creation tokens at zero. For cache-heavy Anthropic traffic, this drops the bulk of the bill.

  When the deployment's litellm_params.model resolves to a built-in cost-map entry, pull the cache pricing fields from there before registering. User-specified cache fields still win on merge; only missing fields are inherited.

  Pairs with the register_model fallback added earlier in this branch: that change handles unmapped key shapes like bedrock/bedrock/x, while this one handles deploy-id keys whose backend model is mapped.

* fix(register_model): inherit only cache pricing on unmapped-key fallback, not provider

  The unmapped-key fallback in register_model copied the entire resolved built-in entry, so registering openai/command-r-plus inherited the cohere built-in's litellm_provider, and get_model_info(custom_llm_provider=openai) could no longer resolve it. Restrict the fallback to the cache-pricing fields, matching the router-side _inherit_builtin_cache_pricing, so the cache-cost dropout stays fixed without clobbering the registered provider.

  Add a direct unit test for Router._inherit_builtin_cache_pricing so the router coverage check sees it, and pin the fixed spend-counter contract: when reservation reconcile fails, the counter must hold the directly incremented cost rather than being left at None.
parent a75ed0079c
commit 410b892f77
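To put "the bulk of the bill" in numbers, here is a rough sketch of the prompt-side cost formula that the register_model regression test further down checks: uncached text tokens at input_cost_per_token, cached tokens at cache_read_input_token_cost, and cache writes at cache_creation_input_token_cost. The per-token prices are hypothetical round numbers, not values read from litellm's cost map, and `prompt_cost` is an illustrative helper only.

```python
# Hypothetical per-token prices in USD (assumptions chosen for round numbers).
INPUT_COST_PER_TOKEN = 3e-6
CACHE_READ_INPUT_TOKEN_COST = 3e-7
CACHE_CREATION_INPUT_TOKEN_COST = 3.75e-6


def prompt_cost(text_tokens: int, cached_tokens: int, cache_creation_tokens: int,
                input_price: float, cache_read_price: float,
                cache_creation_price: float) -> float:
    """Illustrative prompt-side cost: each token class is billed at its own rate."""
    return (
        text_tokens * input_price
        + cached_tokens * cache_read_price
        + cache_creation_tokens * cache_creation_price
    )


# Token mix taken from the regression test: 100 text, 800 cache-read, 200 cache-write.
billed = prompt_cost(100, 800, 200, INPUT_COST_PER_TOKEN,
                     CACHE_READ_INPUT_TOKEN_COST, CACHE_CREATION_INPUT_TOKEN_COST)
# What the calculator effectively charged when both cache prices defaulted to 0.
under_billed = prompt_cost(100, 800, 200, INPUT_COST_PER_TOKEN, 0.0, 0.0)

lost_share = 1 - under_billed / billed
print(f"expected prompt cost: ${billed:.6f}")
print(f"billed with cache prices at 0: ${under_billed:.6f}")
print(f"share of prompt cost dropped: {lost_share:.1%}")
```

With these made-up prices roughly three quarters of the prompt-side cost disappears; the real share depends on actual prices and the token mix.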
@@ -2266,7 +2266,7 @@ async def _reconcile_budget_reservation_for_counter_update(
         )
     except Exception:
         verbose_proxy_logger.warning(
-            "Failed to reconcile budget reservation after persisted spend; invalidating reserved counters and continuing",
+            "Failed to reconcile budget reservation after persisted spend; invalidating reserved counters and falling back to direct increment",
             exc_info=True,
         )
         try:
@@ -2277,6 +2277,7 @@ async def _reconcile_budget_reservation_for_counter_update(
             verbose_proxy_logger.exception(
                 "Failed to invalidate reserved counters after reservation reconciliation failed"
             )
+        return set()
     return reserved_counter_keys


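The return value of `_reconcile_budget_reservation_for_counter_update` acts as a contract with `increment_spend_counters`: any counter key in the returned set is treated as already updated and is skipped by the direct increment. The sketch below models that contract with a plain dict standing in for the shared Redis counter. `reconcile`, `increment_counters`, `RedisTimeout`, and the `fixed` flag are hypothetical names for illustration, not LiteLLM's implementation; only the invalidate-then-return behavior is taken from the hunk above.

```python
from typing import Dict, List, Set


class RedisTimeout(Exception):
    """Hypothetical stand-in for a Redis error raised during reconcile."""


def reconcile(counters: Dict[str, float], keys: List[str], delta: float,
              fail: bool, fixed: bool) -> Set[str]:
    """Apply (actual - reserved) to each reserved counter and return the keys
    that no longer need a direct increment."""
    handled = set(keys)
    try:
        if fail:
            raise RedisTimeout("reconcile increment timed out")
        for key in handled:
            counters[key] = counters.get(key, 0.0) + delta
    except RedisTimeout:
        for key in handled:
            counters.pop(key, None)  # invalidate the reserved counter
        # Old behavior returned the reserved keys; the fix returns set().
        return set() if fixed else handled
    return handled


def increment_counters(counters: Dict[str, float], keys: List[str],
                       reserved: float, actual: float,
                       fail: bool, fixed: bool) -> Dict[str, float]:
    handled = reconcile(counters, keys, actual - reserved, fail, fixed)
    for key in keys:
        if key not in handled:
            # Direct-increment fallback: write the full actual cost.
            counters[key] = counters.get(key, 0.0) + actual
    return counters


key = "spend:key:abc"
print(increment_counters({key: 0.5}, [key], 0.5, 1.0, fail=False, fixed=True))  # {'spend:key:abc': 1.0}
print(increment_counters({key: 0.5}, [key], 0.5, 1.0, fail=True, fixed=False))  # {}
print(increment_counters({key: 0.5}, [key], 0.5, 1.0, fail=True, fixed=True))   # {'spend:key:abc': 1.0}
```

In the failing case the old contract leaves the counter missing, so the next request reseeds it from the lagging DB column; with the fix the counter ends at the actual cost, which is what the new regression test asserts.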
@@ -7788,6 +7788,39 @@ class Router:

         return hash_object.hexdigest()

+    @staticmethod
+    def _inherit_builtin_cache_pricing(
+        model_info: dict, backend_model: str, custom_llm_provider: Optional[str]
+    ) -> None:
+        """Fill missing cache pricing on a custom-priced deployment entry from
+        the backend model's built-in cost map entry, so a deployment that
+        only spells out ``input_cost_per_token``/``output_cost_per_token``
+        does not silently bill cache_read/cache_creation at 0.
+
+        User-specified cache fields always win; only ``None``/missing entries
+        are inherited. No-op when the backend model has no canonical entry.
+        """
+        cache_fields = (
+            "cache_creation_input_token_cost",
+            "cache_creation_input_token_cost_above_1hr",
+            "cache_creation_input_token_cost_above_200k_tokens",
+            "cache_read_input_token_cost",
+            "cache_read_input_token_cost_above_200k_tokens",
+        )
+        if all(model_info.get(f) is not None for f in cache_fields):
+            return
+        try:
+            backend_info = litellm.get_model_info(
+                model=backend_model, custom_llm_provider=custom_llm_provider
+            )
+        except Exception:
+            return
+        for field in cache_fields:
+            if model_info.get(field) is None:
+                backend_value = backend_info.get(field)
+                if backend_value is not None:
+                    model_info[field] = backend_value
+
     def _create_deployment(
         self,
         deployment_info: dict,
@@ -7816,6 +7849,13 @@ class Router:
             if deployment.litellm_params.get(field) is not None:
                 _model_info[field] = deployment.litellm_params[field]

+        if _model_info.get("input_cost_per_token") is not None:
+            Router._inherit_builtin_cache_pricing(
+                model_info=_model_info,
+                backend_model=deployment.litellm_params.model,
+                custom_llm_provider=deployment.litellm_params.custom_llm_provider,
+            )
+
         ## REGISTER MODEL INFO IN LITELLM MODEL COST MAP
         model_id = deployment.model_info.id
         if model_id is not None:
@@ -8562,6 +8602,13 @@ class Router:
             if field_value is not None:
                 _model_info_dict[field] = field_value

+        if _model_info_dict.get("input_cost_per_token") is not None:
+            Router._inherit_builtin_cache_pricing(
+                model_info=_model_info_dict,
+                backend_model=deployment.litellm_params.model,
+                custom_llm_provider=deployment.litellm_params.custom_llm_provider,
+            )
+
         # Register custom pricing in litellm.model_cost.
         # Mirrors _create_deployment() logic to ensure dynamically-added deployments
         # (e.g., loaded from DB) also have their custom pricing registered.
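Both router call sites gate the helper on `input_cost_per_token` being set, so only deployments that carry custom pricing are touched, and the helper fills a cache field only when the user left it unset. The sketch below restates that merge order in isolation. `BUILTIN_COST_MAP`, `lookup_builtin`, and `build_model_info` are hypothetical stand-ins (the real helper calls `litellm.get_model_info` and swallows any exception, where this sketch catches only `KeyError`), and the prices are made up.

```python
from typing import Any, Dict

CACHE_FIELDS = (
    "cache_creation_input_token_cost",
    "cache_read_input_token_cost",
)

# Hypothetical built-in entries with made-up prices.
BUILTIN_COST_MAP: Dict[str, Dict[str, Any]] = {
    "anthropic/example-model": {
        "cache_creation_input_token_cost": 3.75e-6,
        "cache_read_input_token_cost": 3e-7,
    }
}


def lookup_builtin(model: str) -> Dict[str, Any]:
    """Stand-in for litellm.get_model_info: raises KeyError for unknown models."""
    return BUILTIN_COST_MAP[model]


def inherit_cache_pricing(model_info: Dict[str, Any], backend_model: str) -> None:
    if all(model_info.get(f) is not None for f in CACHE_FIELDS):
        return  # every cache field already set by the user
    try:
        backend = lookup_builtin(backend_model)
    except KeyError:
        return  # no canonical entry: leave model_info unchanged
    for f in CACHE_FIELDS:
        if model_info.get(f) is None and backend.get(f) is not None:
            model_info[f] = backend[f]


def build_model_info(backend_model: str, user_model_info: Dict[str, Any]) -> Dict[str, Any]:
    merged = dict(user_model_info)
    # Mirrors the router gate: only custom-priced deployments inherit.
    if merged.get("input_cost_per_token") is not None:
        inherit_cache_pricing(merged, backend_model)
    return merged


partial = build_model_info("anthropic/example-model",
                           {"input_cost_per_token": 3e-6, "output_cost_per_token": 1.5e-5})
explicit = build_model_info("anthropic/example-model",
                            {"input_cost_per_token": 3e-6, "cache_read_input_token_cost": 1e-7})
unknown = build_model_info("unknown/model", {"input_cost_per_token": 3e-6})
print(partial)
print(explicit)
print(unknown)
```

`partial` gains both cache prices, `explicit` keeps its own cache-read price and only inherits the missing cache-creation price, and `unknown` is returned unchanged.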
@@ -2887,6 +2887,61 @@ def _convert_stringified_numbers(value):
     return value


+_BEDROCK_REGION_PREFIXES = (
+    "us.",
+    "eu.",
+    "apac.",
+    "jp.",
+    "au.",
+    "us-gov.",
+    "global.",
+    "ap-northeast-1.",
+)
+
+_CACHE_PRICING_FIELDS = (
+    "cache_creation_input_token_cost",
+    "cache_creation_input_token_cost_above_1hr",
+    "cache_creation_input_token_cost_above_200k_tokens",
+    "cache_read_input_token_cost",
+    "cache_read_input_token_cost_above_200k_tokens",
+)
+
+
+def _resolve_builtin_model_cost_entry(
+    key: str, provider: str
+) -> Optional[Dict[str, Any]]:
+    """Best-effort lookup of a built-in ``model_cost`` entry for a custom key
+    whose shape ``get_model_info`` cannot resolve (double provider prefixes
+    like ``bedrock/bedrock/us.anthropic.claude-sonnet-4-6`` or region aliases).
+
+    Returns a copy of the matching entry so the caller can inherit its defaults
+    (most importantly cache pricing) without mutating the shared built-in.
+    Returns ``None`` when no safe match exists.
+    """
+    candidates: List[str] = []
+    segments = key.split("/")
+    idx = 0
+    while idx < len(segments) - 1 and segments[idx] in LlmProvidersSet:
+        idx += 1
+    candidates.append("/".join(segments[idx:]))
+
+    base = candidates[-1] if candidates else key
+    for region_prefix in _BEDROCK_REGION_PREFIXES:
+        if base.startswith(region_prefix):
+            candidates.append(base[len(region_prefix) :])
+
+    if provider:
+        stripped = _strip_model_name(model=base, custom_llm_provider=provider)
+        if stripped != base:
+            candidates.append(stripped)
+
+    for candidate in candidates:
+        entry = litellm.model_cost.get(candidate)
+        if entry is not None and entry.get("litellm_provider") is not None:
+            return dict(entry)
+    return None
+
+
 def register_model(model_cost: Union[str, dict]): # noqa: PLR0915
     """
     Register new / Override existing models (and their pricing) to specific providers.
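`_resolve_builtin_model_cost_entry` builds an ordered list of lookup candidates: drop leading provider segments, try the remainder with each Bedrock region prefix removed, and return a copy of the first entry that has a `litellm_provider`. Below is a cut-down sketch of that candidate generation. `PROVIDERS`, `REGION_PREFIXES`, and `COST_MAP` are small hypothetical subsets with made-up data, and the provider-specific `_strip_model_name` step is omitted because its rules are not shown in this diff.

```python
from typing import Any, Dict, List, Optional

# Hypothetical subsets; the real provider set and region list are larger.
PROVIDERS = {"bedrock", "anthropic", "openai"}
REGION_PREFIXES = ("us.", "eu.", "apac.", "global.")

# Toy cost map with a made-up price.
COST_MAP: Dict[str, Dict[str, Any]] = {
    "anthropic.claude-x": {
        "litellm_provider": "bedrock",
        "cache_read_input_token_cost": 3e-7,
    }
}


def candidate_keys(key: str) -> List[str]:
    segments = key.split("/")
    idx = 0
    # Strip leading provider segments but always keep the final segment.
    while idx < len(segments) - 1 and segments[idx] in PROVIDERS:
        idx += 1
    base = "/".join(segments[idx:])
    candidates = [base]
    for prefix in REGION_PREFIXES:
        if base.startswith(prefix):
            candidates.append(base[len(prefix):])
    return candidates


def resolve(key: str) -> Optional[Dict[str, Any]]:
    for candidate in candidate_keys(key):
        entry = COST_MAP.get(candidate)
        if entry is not None and entry.get("litellm_provider") is not None:
            return dict(entry)  # copy, so callers cannot mutate the shared entry
    return None


print(candidate_keys("bedrock/bedrock/us.anthropic.claude-x"))
print(resolve("bedrock/bedrock/us.anthropic.claude-x"))
print(resolve("bedrock/made-up-alias"))
```

Requiring `litellm_provider` on the match is what the docstring calls a "safe match": a bare key collision without provider metadata is not inherited from.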
@@ -2933,6 +2988,26 @@ def register_model(model_cost: Union[str, dict]): # noqa: PLR0915
         except Exception:
             existing_model = {}
             model_cost_key = key
+            builtin_entry = _resolve_builtin_model_cost_entry(
+                key=_key_str, provider=provider
+            )
+            if builtin_entry is not None:
+                for field in _CACHE_PRICING_FIELDS:
+                    if (
+                        value.get(field) is None
+                        and builtin_entry.get(field) is not None
+                    ):
+                        existing_model[field] = builtin_entry[field]
+            elif (
+                value.get("cache_creation_input_token_cost") is None
+                and value.get("cache_read_input_token_cost") is None
+            ):
+                verbose_logger.warning(
+                    f"register_model: model={key} not in built-in cost map and no "
+                    "prefix/region variant matched; cache cost fields will default "
+                    "to 0. To track cache cost, add cache_creation_input_token_cost "
+                    "and cache_read_input_token_cost to model_info"
+                )
         # ``get_model_info`` returns ``litellm_provider: None`` when the
         # provider is unknown (e.g. custom deployments registered via
         # ``Router.add_deployment``). Persisting that None into
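The last commit in this squash narrows the unmapped-key fallback to cache pricing only. The sketch contrasts the two strategies on a made-up built-in entry that belongs to a different provider. It assumes the user's fields are merged over the inherited defaults (`{**defaults, **user}`) and that the user entry does not spell out `litellm_provider`; `merge_full_entry` and `merge_cache_only` are hypothetical illustrations, not the `register_model` code.

```python
from typing import Any, Dict

CACHE_PRICING_FIELDS = (
    "cache_creation_input_token_cost",
    "cache_read_input_token_cost",
)


def merge_full_entry(user: Dict[str, Any], builtin: Dict[str, Any]) -> Dict[str, Any]:
    """Earlier fallback: every built-in field becomes a default."""
    return {**builtin, **user}


def merge_cache_only(user: Dict[str, Any], builtin: Dict[str, Any]) -> Dict[str, Any]:
    """Narrowed fallback: only cache prices the user left unset are inherited."""
    defaults = {
        f: builtin[f]
        for f in CACHE_PRICING_FIELDS
        if user.get(f) is None and builtin.get(f) is not None
    }
    return {**defaults, **user}


# Made-up built-in entry owned by another provider, plus a user override.
builtin = {"litellm_provider": "cohere", "mode": "chat", "cache_read_input_token_cost": 1e-7}
user = {"input_cost_per_token": 2e-6, "output_cost_per_token": 8e-6}

print(merge_full_entry(user, builtin).get("litellm_provider"))  # cohere
print(merge_cache_only(user, builtin).get("litellm_provider"))  # None
print(merge_cache_only(user, builtin)["cache_read_input_token_cost"])  # 1e-07
```

The full copy leaks the other provider's `litellm_provider` into the registered entry, which is the `openai/command-r-plus` breakage described in the commit message; the cache-only merge keeps the cache-cost fix without touching provider metadata.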
@@ -192,8 +192,9 @@ async def test_reconcile_budget_reservation_for_counter_update_returns_empty_set
 async def test_reconcile_budget_reservation_for_counter_update_failure_invalidates(
     monkeypatch,
 ):
-    """Reservation reconcile raising must invalidate reserved counters but
-    not propagate the exception."""
+    """Reservation reconcile raising must invalidate reserved counters, swallow
+    the exception, and return an empty set so the caller falls back to the
+    direct spend-counter increment instead of skipping it."""
     import litellm.proxy.spend_tracking.budget_reservation as br

     monkeypatch.setattr(
@@ -213,7 +214,7 @@ async def test_reconcile_budget_reservation_for_counter_update_failure_invalidat
         budget_reservation={"foo": "bar"}, response_cost=1.0
     )

-    assert result == {"spend:key:abc"}
+    assert result == set()
     assert fake_invalidate.called is True


@@ -0,0 +1,87 @@
+"""
+Regression test for enforced-spend underreporting when Redis fails during the
+budget-reservation reconcile step of ``increment_spend_counters``.
+
+Production failure mode: a managed Redis returns an intermittent timeout on the
+reconcile increment. Reconcile deletes (invalidates) the shared counter and
+gives up, but ``increment_spend_counters`` still treats the counter as
+"already reconciled" and skips the direct increment. The actual call cost never
+lands in the enforced counter, so budgets stop gating until the next cold
+reseed pulls a lagging value from the DB.
+
+The fix makes the reconcile path fall back to the direct increment when it
+fails, so the actual cost is always written to the shared counter.
+"""
+
+import pytest
+
+from litellm.caching import DualCache
+from litellm.proxy import proxy_server
+
+
+class _FlakyRedisCache:
+    def __init__(self) -> None:
+        self._store: dict = {}
+        self._increment_calls = 0
+
+    async def async_increment(self, key, value, **kwargs):
+        self._increment_calls += 1
+        if self._increment_calls == 1:
+            raise Exception("Redis timeout")
+        self._store[key] = float(self._store.get(key, 0.0)) + float(value)
+        return self._store[key]
+
+    async def async_get_cache(self, key, *args, **kwargs):
+        return self._store.get(key)
+
+    async def async_delete_cache(self, key, *args, **kwargs):
+        self._store.pop(key, None)
+
+    async def async_set_cache(self, key, value, *args, **kwargs):
+        self._store[key] = float(value)
+        return True
+
+
+@pytest.mark.asyncio
+async def test_direct_increment_runs_when_reservation_reconcile_hits_redis_failure(
+    monkeypatch,
+):
+    hashed_token = "hashed_test_token"
+    counter_key = f"spend:key:{hashed_token}"
+    reserved_cost = 0.5
+    response_cost = 1.0
+
+    flaky_redis = _FlakyRedisCache()
+    flaky_redis._store[counter_key] = reserved_cost
+
+    monkeypatch.setattr(proxy_server, "prisma_client", None)
+    monkeypatch.setattr(proxy_server, "user_api_key_cache", DualCache())
+    monkeypatch.setattr(proxy_server.spend_counter_cache, "redis_cache", flaky_redis)
+    proxy_server.spend_counter_cache.in_memory_cache.set_cache(
+        key=counter_key, value=reserved_cost
+    )
+
+    budget_reservation = {
+        "reserved_cost": reserved_cost,
+        "finalized": False,
+        "entries": [
+            {
+                "counter_key": counter_key,
+                "entity_type": "Key",
+                "entity_id": hashed_token,
+                "reserved_cost": reserved_cost,
+                "applied_adjustment": 0.0,
+            }
+        ],
+    }
+
+    await proxy_server.increment_spend_counters(
+        token=hashed_token,
+        team_id=None,
+        user_id=None,
+        response_cost=response_cost,
+        budget_reservation=budget_reservation,
+    )
+
+    enforced_spend = await flaky_redis.async_get_cache(key=counter_key)
+    assert enforced_spend == response_cost
@@ -6609,7 +6609,12 @@ async def test_increment_spend_counters_finalizes_none_cost_reservation():


 @pytest.mark.asyncio
-async def test_increment_spend_counters_invalidates_bad_reserved_counter_without_failing():
+async def test_increment_spend_counters_falls_back_to_direct_increment_on_bad_reserved_counter():
+    """When the reservation reconcile fails, the reserved counters are
+    invalidated and the actual response cost must still be written via the
+    direct increment fallback. Leaving the counter at ``None`` lets the next
+    request reseed a stale value from the DB and silently stops budget gating,
+    which is the bug this fix addresses."""
     from litellm.caching.dual_cache import DualCache
     from litellm.proxy.proxy_server import increment_spend_counters

@@ -6650,7 +6655,7 @@ async def test_increment_spend_counters_invalidates_bad_reserved_counter_without
             counter_cache.in_memory_cache.get_cache(
                 key="spend:key:key-bad-reserved-counter"
             )
-            is None
+            == 0.25
         )
     finally:
         ps.spend_counter_cache = orig_counter
@@ -301,6 +301,126 @@ def test_register_model_strips_none_litellm_provider_from_get_model_info(monkeyp
         litellm.model_cost.pop(model_key, None)


+def test_register_model_inherits_builtin_cache_pricing_for_unmapped_key():
+    """Registering a custom override under a key shape that
+    ``get_model_info`` cannot resolve (e.g. a double provider prefix like
+    ``bedrock/bedrock/us.anthropic.claude-sonnet-4-6``) must still inherit
+    the built-in cache pricing for the underlying model.
+
+    Before the fix ``register_model`` fell back to an empty ``existing_model``
+    so the merged entry only carried the fields the user set explicitly
+    (input/output cost). ``cache_creation_input_token_cost`` and
+    ``cache_read_input_token_cost`` were absent, and the cost calculator
+    silently charged 0 for every cache token, dropping the bulk of the bill
+    for cache-heavy Anthropic traffic.
+
+    Regression for the cache-pricing dropout under partial overrides.
+    """
+    from litellm.litellm_core_utils.llm_cost_calc.utils import generic_cost_per_token
+    from litellm.types.utils import PromptTokensDetailsWrapper, Usage
+
+    original_model_cost = litellm.model_cost
+    os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"
+    litellm.model_cost = litellm.get_model_cost_map(url="")
+
+    builtin_key = "us.anthropic.claude-sonnet-4-6"
+    registered_key = f"bedrock/bedrock/{builtin_key}"
+    builtin = litellm.model_cost[builtin_key]
+
+    assert builtin["cache_creation_input_token_cost"] > 0
+    assert builtin["cache_read_input_token_cost"] > 0
+
+    try:
+        litellm.register_model(
+            {
+                registered_key: {
+                    "input_cost_per_token": builtin["input_cost_per_token"],
+                    "output_cost_per_token": builtin["output_cost_per_token"],
+                    "litellm_provider": "bedrock",
+                }
+            }
+        )
+
+        registered = litellm.model_cost[registered_key]
+        assert (
+            registered.get("cache_creation_input_token_cost")
+            == builtin["cache_creation_input_token_cost"]
+        )
+        assert (
+            registered.get("cache_read_input_token_cost")
+            == builtin["cache_read_input_token_cost"]
+        )
+        assert registered["litellm_provider"] == "bedrock"
+
+        usage = Usage(
+            prompt_tokens=1100,
+            completion_tokens=100,
+            total_tokens=1200,
+            prompt_tokens_details=PromptTokensDetailsWrapper(
+                cached_tokens=800,
+                text_tokens=100,
+            ),
+            cache_creation_input_tokens=200,
+        )
+
+        input_cost, output_cost = generic_cost_per_token(
+            model=registered_key,
+            usage=usage,
+            custom_llm_provider="bedrock",
+        )
+
+        text_only_cost = builtin["input_cost_per_token"] * 100
+        expected_input_cost = (
+            text_only_cost
+            + builtin["cache_read_input_token_cost"] * 800
+            + builtin["cache_creation_input_token_cost"] * 200
+        )
+        assert abs(input_cost - expected_input_cost) < 1e-12
+        assert abs(output_cost - builtin["output_cost_per_token"] * 100) < 1e-12
+        assert input_cost > text_only_cost + 1e-12
+    finally:
+        litellm.model_cost.pop(registered_key, None)
+        litellm.model_cost = original_model_cost
+        os.environ.pop("LITELLM_LOCAL_MODEL_COST_MAP", None)
+        from litellm.utils import _invalidate_model_cost_lowercase_map
+
+        _invalidate_model_cost_lowercase_map()
+
+
+def test_register_model_warns_when_no_builtin_match_for_cache_pricing(caplog):
+    """When a custom override is registered under a key that neither
+    ``get_model_info`` nor any prefix/region variant can resolve to a
+    built-in entry, ``register_model`` must warn that cache cost fields will
+    default to 0 instead of silently producing an under-billed entry.
+    """
+    import logging
+
+    from litellm._logging import verbose_logger
+
+    registered_key = "bedrock/totally-made-up-model-alias-xyz"
+    litellm.model_cost.pop(registered_key, None)
+
+    try:
+        with caplog.at_level(logging.WARNING, logger=verbose_logger.name):
+            litellm.register_model(
+                {
+                    registered_key: {
+                        "input_cost_per_token": 0.001,
+                        "output_cost_per_token": 0.002,
+                        "litellm_provider": "bedrock",
+                    }
+                }
+            )
+
+        assert any(
+            registered_key in record.message
+            and "cache_creation_input_token_cost" in record.message
+            for record in caplog.records
+        ), "expected a warning naming the unmapped key and the cache cost fields"
+    finally:
+        litellm.model_cost.pop(registered_key, None)
+
+
 def test_register_model_router_add_deployment_custom_pricing_applies():
     """End-to-end regression for https://github.com/BerriAI/litellm/issues/28336.

@@ -344,9 +464,9 @@ def test_register_model_router_add_deployment_custom_pricing_applies():
                 f"{model_key} / {deployment_model}"
             )
         for k in registered_keys:
-            assert _check_provider_match(litellm.model_cost[k], "openai") is True, (
-                f"custom pricing for {k} was dropped by _check_provider_match"
-            )
+            assert (
+                _check_provider_match(litellm.model_cost[k], "openai") is True
+            ), f"custom pricing for {k} was dropped by _check_provider_match"
     finally:
         litellm.model_cost.pop(model_key, None)
         litellm.model_cost.pop(deployment_model, None)
@@ -402,3 +402,138 @@ def test_should_not_downgrade_chatgpt_shared_key_mode_with_alias_override():
         assert bridge_model_info["mode"] == "responses"
     finally:
         _restore_model_cost_entries(model_keys)
+
+
+def test_partial_custom_pricing_inherits_builtin_cache_pricing():
+    """A deployment that overrides only input/output cost on a cache-supporting
+    model must still bill cache_read and cache_creation tokens. Before the
+    fix the deploy-id entry was registered with the user's two fields and
+    nothing else, so the cost calculator silently billed cache tokens at 0.
+    Regression for the prompt-caching cost dropout reported by the customer.
+    """
+    backend_model = "anthropic/claude-sonnet-4-5-20250929"
+    deploy_id = "claude-deploy-partial-pricing"
+
+    builtin_info = litellm.get_model_info(model=backend_model)
+    builtin_cache_create = builtin_info["cache_creation_input_token_cost"]
+    builtin_cache_read = builtin_info["cache_read_input_token_cost"]
+    assert builtin_cache_create is not None and builtin_cache_create > 0
+    assert builtin_cache_read is not None and builtin_cache_read > 0
+
+    model_keys = {
+        deploy_id: litellm.model_cost.get(deploy_id),
+        backend_model: copy.deepcopy(litellm.model_cost.get(backend_model)),
+    }
+    try:
+        Router(
+            model_list=[
+                {
+                    "model_name": "claude-custom",
+                    "litellm_params": {
+                        "model": backend_model,
+                        "api_key": "fake-key",
+                    },
+                    "model_info": {
+                        "id": deploy_id,
+                        "input_cost_per_token": 0.000003,
+                        "output_cost_per_token": 0.000015,
+                    },
+                }
+            ],
+        )
+
+        entry = litellm.model_cost[deploy_id]
+        assert entry["input_cost_per_token"] == 0.000003
+        assert entry["output_cost_per_token"] == 0.000015
+        assert entry.get("cache_creation_input_token_cost") == builtin_cache_create
+        assert entry.get("cache_read_input_token_cost") == builtin_cache_read
+    finally:
+        _restore_model_cost_entries(model_keys)
+
+
+def test_partial_pricing_does_not_overwrite_explicit_cache_fields():
+    """When the user explicitly sets cache_*_input_token_cost on a deployment,
+    those values must not be replaced by the built-in fallback.
+    """
+    backend_model = "anthropic/claude-sonnet-4-5-20250929"
+    deploy_id = "claude-deploy-explicit-cache"
+
+    explicit_cache_create = 0.00001
+    explicit_cache_read = 0.0000005
+    builtin_info = litellm.get_model_info(model=backend_model)
+    assert builtin_info["cache_creation_input_token_cost"] != explicit_cache_create
+    assert builtin_info["cache_read_input_token_cost"] != explicit_cache_read
+
+    model_keys = {
+        deploy_id: litellm.model_cost.get(deploy_id),
+        backend_model: copy.deepcopy(litellm.model_cost.get(backend_model)),
+    }
+    try:
+        Router(
+            model_list=[
+                {
+                    "model_name": "claude-custom-explicit",
+                    "litellm_params": {
+                        "model": backend_model,
+                        "api_key": "fake-key",
+                    },
+                    "model_info": {
+                        "id": deploy_id,
+                        "input_cost_per_token": 0.000003,
+                        "output_cost_per_token": 0.000015,
+                        "cache_creation_input_token_cost": explicit_cache_create,
+                        "cache_read_input_token_cost": explicit_cache_read,
+                    },
+                }
+            ],
+        )
+
+        entry = litellm.model_cost[deploy_id]
+        assert entry.get("cache_creation_input_token_cost") == explicit_cache_create
+        assert entry.get("cache_read_input_token_cost") == explicit_cache_read
+    finally:
+        _restore_model_cost_entries(model_keys)
+
+
+def test_inherit_builtin_cache_pricing_fills_only_missing_fields():
+    """Direct unit test of the helper: missing cache fields are filled from the
+    backend model's built-in entry, while an explicitly set cache field and the
+    user's input/output pricing are left untouched.
+    """
+    backend_model = "anthropic/claude-sonnet-4-5-20250929"
+    builtin_info = litellm.get_model_info(model=backend_model)
+    builtin_cache_create = builtin_info["cache_creation_input_token_cost"]
+    builtin_cache_read = builtin_info["cache_read_input_token_cost"]
+    assert builtin_cache_create is not None and builtin_cache_create > 0
+    assert builtin_cache_read is not None and builtin_cache_read > 0
+
+    explicit_cache_read = builtin_cache_read + 1
+    model_info = {
+        "input_cost_per_token": 0.000003,
+        "cache_read_input_token_cost": explicit_cache_read,
+    }
+
+    Router._inherit_builtin_cache_pricing(
+        model_info=model_info,
+        backend_model=backend_model,
+        custom_llm_provider="anthropic",
+    )
+
+    assert model_info["input_cost_per_token"] == 0.000003
+    assert model_info["cache_read_input_token_cost"] == explicit_cache_read
+    assert model_info["cache_creation_input_token_cost"] == builtin_cache_create
+
+
+def test_inherit_builtin_cache_pricing_noop_for_unknown_backend():
+    """No canonical entry for the backend model means the helper leaves the
+    passed-in dict unchanged rather than raising.
+    """
+    model_info = {"input_cost_per_token": 0.000003}
+
+    Router._inherit_builtin_cache_pricing(
+        model_info=model_info,
+        backend_model="this-backend-model-does-not-exist-x9y8z7",
+        custom_llm_provider=None,
+    )
+
+    assert model_info == {"input_cost_per_token": 0.000003}