Resolve merge conflict by including both CompactifAI and OVHCloud providers

- Keep CompactifAI provider detection logic
- Include new OVHCloud provider from main branch
- Both providers now work correctly with model prefix detection
Tim Elfrink 2025-09-14 23:03:18 +02:00
commit 9521414efa
250 changed files with 14752 additions and 1245 deletions

View File

@ -346,6 +346,7 @@ curl 'http://0.0.0.0:4000/key/generate' \
| [Featherless AI](https://docs.litellm.ai/docs/providers/featherless_ai) | ✅ | ✅ | ✅ | ✅ | | |
| [Nebius AI Studio](https://docs.litellm.ai/docs/providers/nebius) | ✅ | ✅ | ✅ | ✅ | ✅ | |
| [Heroku](https://docs.litellm.ai/docs/providers/heroku) | ✅ | ✅ | | | | |
| [OVHCloud AI Endpoints](https://docs.litellm.ai/docs/providers/ovhcloud) | ✅ | ✅ | | | | |
[**Read the Docs**](https://docs.litellm.ai/docs/)

View File

@ -0,0 +1,256 @@
# LiteLLM Release Notes Generation Instructions
This document provides comprehensive instructions for AI agents to generate release notes for LiteLLM following the established format and style.
## Required Inputs
1. **Release Version** (e.g., `v1.76.3-stable`)
2. **PR Diff/Changelog** - List of PRs with titles and contributors
3. **Previous Version Commit Hash** - To compare model pricing changes
4. **Reference Release Notes** - Previous release notes to follow style/format
## Step-by-Step Process
### 1. Initial Setup and Analysis
```bash
# Check git diff for model pricing changes
git diff <previous_commit_hash> HEAD -- model_prices_and_context_window.json
```
**Key Analysis Points:**
- New models added (look for new entries)
- Deprecated models removed (look for deleted entries)
- Pricing updates (look for cost changes)
- Feature support changes (tool calling, reasoning, etc.)
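The analysis points above can also be approximated programmatically. The following is a minimal sketch, assuming both versions of `model_prices_and_context_window.json` have already been loaded into dictionaries keyed by model name. The function name `diff_model_prices` and the sample entries are hypothetical and are not part of LiteLLM.

```python
def diff_model_prices(old: dict, new: dict) -> dict:
    """Classify model entries as added, removed, or changed between two versions."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = {}
    for model in sorted(set(old) & set(new)):
        # Keep only the fields whose values differ (or exist on one side only).
        fields = {
            key: (old[model].get(key), new[model].get(key))
            for key in sorted(set(old[model]) | set(new[model]))
            if old[model].get(key) != new[model].get(key)
        }
        if fields:
            changed[model] = fields
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data, for illustration only.
old_prices = {
    "model-a": {"input_cost_per_token": 1e-06, "supports_function_calling": False},
    "model-b": {"input_cost_per_token": 2e-06},
}
new_prices = {
    "model-a": {"input_cost_per_token": 5e-07, "supports_function_calling": True},
    "model-c": {"input_cost_per_token": 3e-06},
}

report = diff_model_prices(old_prices, new_prices)
print("New models:", report["added"])
print("Removed models:", report["removed"])
print("Changed fields:", report["changed"])
```

Each entry under `changed` maps a field name to an `(old, new)` pair, which makes pricing updates and feature-flag changes easy to tell apart.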
### 2. Release Notes Structure
Follow this exact structure based on `docs/my-website/release_notes/v1.76.1-stable/index.md`:
```markdown
---
title: "v1.76.X-stable - [Key Theme]"
slug: "v1-76-X"
date: YYYY-MM-DDTHH:mm:ss
authors: [standard author block]
hide_table_of_contents: false
---
## Deploy this version
[Docker and pip installation tabs]
## Key Highlights
[3-5 bullet points of major features]
## Major Changes
[Critical changes users need to know]
## Performance Improvements
[Performance-related changes]
## New Models / Updated Models
[Detailed model tables and provider updates]
## LLM API Endpoints
[API-related features and fixes]
## Management Endpoints / UI
[Admin interface and management changes]
## Logging / Guardrail Integrations
[Observability and security features]
## Performance / Loadbalancing / Reliability improvements
[Infrastructure improvements]
## General Proxy Improvements
[Other proxy-related changes]
## New Contributors
[List of first-time contributors]
## Full Changelog
[Link to GitHub comparison]
```
### 3. Categorization Rules
**Performance Improvements:**
- RPS improvements
- Memory optimizations
- CPU usage optimizations
- Timeout controls
- Worker configuration
**New Models/Updated Models:**
- Extract from model_prices_and_context_window.json diff
- Create tables with: Provider, Model, Context Window, Input Cost, Output Cost, Features
- Group by provider
- Note pricing corrections
- Highlight deprecated models
**Provider Features:**
- Group by provider (Gemini, OpenAI, Anthropic, etc.)
- Link to provider docs: `../../docs/providers/[provider_name]`
- Separate features from bug fixes
**API Endpoints:**
- Images API
- Video Generation (if applicable)
- Responses API
- Passthrough endpoints
- General chat completions
**UI/Management:**
- Authentication changes
- Dashboard improvements
- Team management
- Key management
**Integrations:**
- Logging providers (Datadog, Braintrust, etc.)
- Guardrails
- Cost tracking
- Observability
### 4. Documentation Linking Strategy
**Link to docs when:**
- New provider support added
- Significant feature additions
- API endpoint changes
- Integration additions
**Link format:** `../../docs/[category]/[specific_doc]`
**Common doc paths:**
- `../../docs/providers/[provider]` - Provider-specific docs
- `../../docs/image_generation` - Image generation
- `../../docs/video_generation` - Video generation (if exists)
- `../../docs/response_api` - Responses API
- `../../docs/proxy/logging` - Logging integrations
- `../../docs/proxy/guardrails` - Guardrails
- `../../docs/pass_through/[provider]` - Passthrough endpoints
### 5. Model Table Generation
From git diff analysis, create tables like:
```markdown
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
| -------- | ----- | -------------- | ------------------- | -------------------- | -------- |
| OpenRouter | `openrouter/openai/gpt-4.1` | 1M | $2.00 | $8.00 | Chat completions with vision |
```
**Extract from JSON:**
- `max_input_tokens` → Context Window
- `input_cost_per_token` × 1,000,000 → Input cost
- `output_cost_per_token` × 1,000,000 → Output cost
- `supports_*` fields → Features
- Special pricing fields (per image, per second) for generation models
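As an illustration of the field mapping above, here is a minimal sketch that turns one pricing entry into a table row. The helper names (`format_tokens`, `model_table_row`), the provider and model names, and the sample entry are assumptions made for this example. Special per-image or per-second pricing fields are not handled.

```python
def format_tokens(count: int) -> str:
    """Render a token count compactly, e.g. 128000 -> '128K'."""
    if count >= 1_000_000:
        return f"{count / 1_000_000:g}M"
    if count >= 1_000:
        return f"{count / 1_000:g}K"
    return str(count)


def model_table_row(provider: str, model: str, entry: dict) -> str:
    """Build one markdown table row from a pricing entry."""
    # Per-token costs are multiplied by 1,000,000 to get $/1M tokens.
    input_cost = entry["input_cost_per_token"] * 1_000_000
    output_cost = entry["output_cost_per_token"] * 1_000_000
    # Every truthy `supports_*` flag becomes a feature label.
    features = ", ".join(
        key[len("supports_"):].replace("_", " ")
        for key, value in entry.items()
        if key.startswith("supports_") and value
    )
    return (
        f"| {provider} | `{model}` | {format_tokens(entry['max_input_tokens'])} "
        f"| ${input_cost:.2f} | ${output_cost:.2f} | {features} |"
    )


# Hypothetical entry, for illustration only.
entry = {
    "max_input_tokens": 128000,
    "input_cost_per_token": 2e-06,
    "output_cost_per_token": 8e-06,
    "supports_vision": True,
    "supports_function_calling": True,
    "supports_audio_input": False,
}
row = model_table_row("example-provider", "example-provider/example-model", entry)
print(row)
```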
### 6. PR Categorization Logic
**By Keywords in PR Title:**
- `[Perf]`, `Performance`, `RPS` → Performance Improvements
- `[Bug]`, `[Bug Fix]`, `Fix` → Bug Fixes section
- `[Feat]`, `[Feature]`, `Add support` → Features section
- `[Docs]` → Documentation (usually exclude from main sections)
- Provider names (Gemini, OpenAI, etc.) → Group under provider
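The keyword rules above could be sketched as an ordered list of patterns in which the first match wins. This is only an illustration: the rule order, the regular expressions, the `Uncategorized` fallback, and the sample PR titles are assumptions, and provider-name grouping is left out.

```python
import re

# Ordered (pattern, section) rules; the first matching rule wins.
RULES = [
    (re.compile(r"\[perf\]|performance|\brps\b", re.IGNORECASE), "Performance Improvements"),
    (re.compile(r"\[bug( fix)?\]|\bfix", re.IGNORECASE), "Bug Fixes"),
    (re.compile(r"\[feat(ure)?\]|add support", re.IGNORECASE), "Features"),
    (re.compile(r"\[docs\]", re.IGNORECASE), "Documentation"),
]


def categorize_pr(title: str) -> str:
    """Map a PR title to a release-notes section using keyword rules."""
    for pattern, section in RULES:
        if pattern.search(title):
            return section
    return "Uncategorized"


# Hypothetical PR titles, for illustration only.
for title in [
    "[Perf] Reduce router overhead",
    "[Bug Fix] Handle empty tool_calls",
    "Add support for a new provider",
    "[Docs] Update README",
    "Bump version",
]:
    print(f"{title!r} -> {categorize_pr(title)}")
```

Titles that land in `Uncategorized` are the ones that still need the content-based analysis.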
**By PR Content Analysis:**
- New model additions → New Models section
- UI changes → Management Endpoints/UI
- Logging/observability → Logging/Guardrail Integrations
- Rate limiting/budgets → Performance/Reliability
- Authentication → Management Endpoints
### 7. Writing Style Guidelines
**Tone:**
- Professional but accessible
- Focus on user impact
- Highlight breaking changes clearly
- Use active voice
**Formatting:**
- Use consistent markdown formatting
- Include PR links: `[PR #XXXXX](https://github.com/BerriAI/litellm/pull/XXXXX)`
- Use code blocks for configuration examples
- Bold important terms and section headers
**Warnings/Notes:**
- Add warning boxes for breaking changes
- Include migration instructions when needed
- Provide override options for default changes
### 8. Quality Checks
**Before finalizing:**
- Verify all PR links work
- Check documentation links are valid
- Ensure model pricing is accurate
- Confirm provider names are consistent
- Review for typos and formatting issues
### 9. Common Patterns to Follow
**Performance Changes:**
```markdown
- **+400 RPS Performance Boost** - Description - [PR #XXXXX](link)
```
**New Models:**
Always include pricing table and feature highlights
**Breaking Changes:**
```markdown
:::warning
This release has a known issue...
:::
```
**Provider Features:**
```markdown
- **[Provider Name](../../docs/providers/provider)**
- Feature description - [PR #XXXXX](link)
```
### 10. Missing Documentation Check
**Review for missing docs:**
- New providers without documentation
- New API endpoints without examples
- Complex features without guides
- Integration setup instructions
**Flag for documentation needs:**
- New provider integrations
- Significant API changes
- Complex configuration options
- Migration requirements
## Example Command Workflow
```bash
# 1. Get model changes
git diff <commit> HEAD -- model_prices_and_context_window.json
# 2. Analyze PR list for categorization
# 3. Create release notes following template
# 4. Link to appropriate documentation
# 5. Review for missing documentation needs
```
## Output Requirements
- Follow exact markdown structure from reference
- Include all PR links and contributors
- Provide accurate model pricing tables
- Link to relevant documentation
- Highlight breaking changes with warnings
- Include deployment instructions
- End with full changelog link
This process ensures consistent, comprehensive release notes that help users understand changes and upgrade smoothly.

View File

@ -65,6 +65,7 @@ Use `litellm.get_supported_openai_params()` for an updated list of params for ea
| Github | ✅| ✅ | ✅ | ✅| ✅ | ✅ | ✅ | ✅| ✅ | ✅| ✅|| || ✅ | ✅ (model dependent) | ✅ (model dependent) || ||
| Novita AI| ✅| ✅ || ✅| ✅ | ✅ | ✅ | ✅| ✅ | ✅| || ✅||| |||| ||
| Bytez | ✅| ✅ || ✅| ✅ | | | ✅|| || || || || || ||
| OVHCloud AI Endpoints | ✅ | | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | | | | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
:::note

View File

@ -0,0 +1,380 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# 🆕 OVHCloud AI Endpoints
OVHCloud is a leading French cloud provider in Europe, with a focus on data sovereignty and privacy.
You can explore the latest available models in the [catalog](https://endpoints.ai.cloud.ovh.net/catalog).
:::tip
We support ALL OVHCloud AI Endpoints models. Just set `model=ovhcloud/<any-model-on-ai-endpoints>`, with `ovhcloud/` as the prefix, when sending LiteLLM requests.
For the complete model catalog, visit https://endpoints.ai.cloud.ovh.net/catalog.
:::
## Sample usage
### Chat completion
You can define your API key by setting the `OVHCLOUD_API_KEY` environment variable or by overriding the `api_key` parameter. You can generate a key on the [OVHCloud Manager](https://www.ovh.com/manager).
```python
from litellm import completion
import os
# Our API is free but rate-limited for calls without an API key.
os.environ['OVHCLOUD_API_KEY'] = "your-api-key"
response = completion(
    model = "ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages = [
        {
            "role": "user",
            "content": "Hello, how are you?",
        }
    ],
    max_tokens = 10,
    stop = [],
    temperature = 0.2,
    top_p = 0.9,
    user = "user",
    api_key = "your-api-key" # Optional if set through the environment variable.
)
print(response)
```
### Streaming
Set the parameter `stream` to `True` to stream a response.
```python
from litellm import completion
import os
os.environ['OVHCLOUD_API_KEY'] = "your-api-key"
response = completion(
    model = "ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages = [
        {
            "role": "user",
            "content": "Hello, how are you?",
        }
    ],
    max_tokens = 10,
    stop = [],
    temperature = 0.2,
    top_p = 0.9,
    user = "user",
    api_key = "your-api-key", # Optional if set through the environment variable.
    stream = True
)
# Print each streamed chunk as it arrives
for part in response:
    print(part)
```
### Tool Calling
```python
from litellm import completion
import json
def get_current_weather(location, unit="celsius"):
    if unit == "celsius":
        return {"location": location, "temperature": "22", "unit": "celsius"}
    else:
        return {"location": location, "temperature": "72", "unit": "fahrenheit"}

def print_message(role, content, is_tool_call=False, function_name=None):
    if role == "user":
        print(f"🧑 User: {content}")
    elif role == "assistant":
        if is_tool_call:
            print(f"🤖 Assistant: I will call the function '{function_name}' to get some information.")
        else:
            print(f"🤖 Assistant: {content}")
    elif role == "tool":
        print(f"🔧 Tool ({function_name}): {content}")
    print()

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]
model = "ovhcloud/Meta-Llama-3_3-70B-Instruct"
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and country, e.g. Montréal, Canada",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
},
}
]
print("🌟 Beginning of the conversation")
# Initial user message
print_message("user", messages[0]["content"])
# First request to the model
print("📡 Sending first request to the model...")
response = completion(
model=model,
messages=messages,
tools=tools,
tool_choice="auto",
)
response_message = response.choices[0].message
tool_calls = response_message.tool_calls
if tool_calls:
    available_functions = {
        "get_current_weather": get_current_weather,
    }
    # Display the tool calls suggested by the model
    for tool_call in tool_calls:
        print_message("assistant", "", is_tool_call=True, function_name=tool_call.function.name)
        print(f" 📋 Arguments: {tool_call.function.arguments}")
        print()
    # Add assistant message with tool calls to the conversation history
    assistant_message = {
        "role": "assistant",
        "content": response_message.content,
        "tool_calls": [
            {
                "id": tool_call.id,
                "type": "function",
                "function": {
                    "name": tool_call.function.name,
                    "arguments": tool_call.function.arguments
                }
            } for tool_call in tool_calls
        ]
    }
    messages.append(assistant_message)
    # Execute each tool call and add the results to the conversation history
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)
        print(f"🔧 Executing function '{function_name}'...")
        function_response = function_to_call(
            location=function_args.get("location"),
            # Fall back to Celsius when the model omits the optional unit
            unit=function_args.get("unit", "celsius"),
        )
        # Display tool response
        print_message("tool", json.dumps(function_response, indent=2), function_name=function_name)
        messages.append({
            "tool_call_id": tool_call.id,
            "role": "tool",
            "name": function_name,
            "content": json.dumps(function_response),
        })
    print("📡 Sending second request to the model with results...")
    # Second request with function results
    second_response = completion(
        model=model,
        messages=messages
    )
    # Display final response
    final_content = second_response.choices[0].message.content
    print_message("assistant", final_content)
else:
    print("❌ No function call detected")
    print_message("assistant", response_message.content)
```
### Vision Example
```python
from base64 import b64encode
from mimetypes import guess_type
import litellm
# Auxiliary function to get b64 images
def data_url_from_image(file_path):
    mime_type, _ = guess_type(file_path)
    if mime_type is None:
        raise ValueError("Could not determine MIME type of the file")
    with open(file_path, "rb") as image_file:
        encoded_string = b64encode(image_file.read()).decode("utf-8")
    data_url = f"data:{mime_type};base64,{encoded_string}"
    return data_url

response = litellm.completion(
model = "ovhcloud/Mistral-Small-3.2-24B-Instruct-2506",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": data_url_from_image("your_image.jpg"),
"format": "image/jpeg"
}
}
]
}
],
stream=False
)
print(response.choices[0].message.content)
```
### Structured Output
```python
from litellm import completion
response = completion(
model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
messages=[
{
"role": "system",
"content": (
"You are a specialist in extracting structured data from unstructured text. "
"Your task is to identify relevant entities and categories, then format them "
"according to the requested structure."
),
},
{
"role": "user",
"content": "Room 12 contains books, a desk, and a lamp."
},
],
response_format={
"type": "json_schema",
"json_schema": {
"title": "data",
"name": "data_extraction",
"schema": {
"type": "object",
"properties": {
"section": {"type": "string"},
"products": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["section", "products"],
"additionalProperties": False
},
"strict": False
}
},
stream=False
)
print(response.choices[0].message.content)
```
### Embeddings
```python
from litellm import embedding
response = embedding(
model="ovhcloud/BGE-M3",
input=["sample text to embed", "another sample text to embed"]
)
print(response.data)
```
## Usage with LiteLLM Proxy Server
Here's how to call an OVHCloud AI Endpoints model with the LiteLLM Proxy Server:
1. Modify the config.yaml
```yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: ovhcloud/<your-model-name> # add ovhcloud/ prefix to route as OVHCloud provider
      api_key: api-key # API key sent with requests to your model
```
2. Start the proxy
```bash
$ litellm --config /path/to/config.yaml
```
3. Send Request to LiteLLM Proxy Server
<Tabs>
<TabItem value="openai" label="OpenAI Python v1.0.0+">
```python
import openai
client = openai.OpenAI(
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)
response = client.chat.completions.create(
model="my-model",
messages = [
{
"role": "user",
"content": "what llm are you"
}
],
)
print(response)
```
</TabItem>
<TabItem value="curl" label="curl">
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "my-model",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}'
```
</TabItem>
</Tabs>

View File

@ -8,9 +8,9 @@ LiteLLM supports all models on VLLM.
| Property | Details |
|-------|-------|
| Description | vLLM is a fast and easy-to-use library for LLM inference and serving. [Docs](https://docs.vllm.ai/en/latest/index.html) |
| Provider Route on LiteLLM | `hosted_vllm/` (for OpenAI compatible server), `vllm/` (for vLLM sdk usage) |
| Provider Route on LiteLLM | `hosted_vllm/` (for OpenAI compatible server), `vllm/` ([DEPRECATED] for vLLM sdk usage) |
| Provider Doc | [vLLM ↗](https://docs.vllm.ai/en/latest/index.html) |
| Supported Endpoints | `/chat/completions`, `/embeddings`, `/completions`, `/rerank` |
| Supported Endpoints | `/chat/completions`, `/embeddings`, `/completions`, `/rerank`, `/audio/transcriptions` |
# Quick Start

View File

@ -4,6 +4,10 @@ import TabItem from '@theme/TabItem';
# ✨ SSO for Admin UI
:::info
From v1.76.0, SSO is now Free for up to 5 users.
:::
:::info
✨ SSO is on LiteLLM Enterprise

View File

@ -0,0 +1,212 @@
# Forward Client Headers to LLM API
Control which model groups can forward client headers to the underlying LLM provider APIs.
## Overview
By default, LiteLLM does not forward client headers to LLM provider APIs for security reasons. However, you can selectively enable header forwarding for specific model groups using the `forward_client_headers_to_llm_api` setting.
## Configuration
### Enable Globally
```yaml
general_settings:
  forward_client_headers_to_llm_api: true
```
### Enable for a Model Group
Add the `forward_client_headers_to_llm_api` setting under `model_group_settings` in your configuration:
```yaml
model_list:
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: "your-api-key"
  - model_name: "wildcard-models/*"
    litellm_params:
      model: "openai/*"
      api_key: "your-api-key"
litellm_settings:
  model_group_settings:
    forward_client_headers_to_llm_api:
      - gpt-4o-mini
      - wildcard-models/*
```
## Supported Model Patterns
The configuration supports various model matching patterns:
### 1. Exact Model Names
```yaml
forward_client_headers_to_llm_api:
- gpt-4o-mini
- claude-3-sonnet
```
### 2. Wildcard Patterns
```yaml
forward_client_headers_to_llm_api:
- "openai/*" # All OpenAI models
- "anthropic/*" # All Anthropic models
- "wildcard-group/*" # All models in wildcard-group
```
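To make the pattern semantics concrete, here is a minimal sketch of how exact names and `*` wildcards could be matched against a requested model group. It uses glob-style matching from Python's standard `fnmatch` module as an assumption. LiteLLM's actual matching logic may differ, and the function name `is_forwarding_enabled` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def is_forwarding_enabled(model_group: str, configured: List[str]) -> bool:
    """Return True if the model group matches an exact name or a `*` pattern."""
    # fnmatchcase treats a pattern without wildcards as an exact, case-sensitive match.
    return any(fnmatchcase(model_group, pattern) for pattern in configured)


configured = ["gpt-4o-mini", "openai/*", "wildcard-group/*"]
print(is_forwarding_enabled("gpt-4o-mini", configured))
print(is_forwarding_enabled("openai/gpt-4o", configured))
print(is_forwarding_enabled("anthropic/claude-3-sonnet", configured))
```

Note that `fnmatch` also gives special meaning to `?` and `[...]`, which this sketch does not try to disable.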
### 3. Team Model Aliases
If your team has model aliases configured, the forwarding will work with both the original model name and the alias.
## Forwarded Headers
When enabled for a model group, LiteLLM forwards the following types of headers:
### Custom Headers (x- prefix)
- Any header starting with `x-` (except `x-stainless-*` which can cause OpenAI SDK issues)
- Examples: `x-custom-header`, `x-request-id`, `x-trace-id`
### Provider-Specific Headers
- **Anthropic**: `anthropic-beta` headers
- **OpenAI**: `openai-organization` (when enabled via `forward_openai_org_id: true`)
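The selection rules described so far can be sketched as a small filter. This is not LiteLLM's implementation: the function name `select_forwardable_headers`, the case-insensitive comparison, and the sample headers are assumptions made for illustration.

```python
from typing import Dict


def select_forwardable_headers(
    headers: Dict[str, str], forward_openai_org_id: bool = False
) -> Dict[str, str]:
    """Keep only the client headers that the rules above allow forwarding."""
    forwarded = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered.startswith("x-stainless-"):
            continue  # excluded: can cause OpenAI SDK issues
        if lowered.startswith("x-") or lowered == "anthropic-beta":
            forwarded[name] = value
        elif lowered == "openai-organization" and forward_openai_org_id:
            forwarded[name] = value
    return forwarded


# Hypothetical incoming request headers.
incoming = {
    "Authorization": "Bearer sk-1234",
    "x-trace-id": "abc123",
    "x-stainless-os": "Linux",
    "anthropic-beta": "tools-2024-04-04",
    "openai-organization": "org-123",
}
print(select_forwardable_headers(incoming))
print(select_forwardable_headers(incoming, forward_openai_org_id=True))
```

The `Authorization` header is never selected in this sketch, which matches the guidance that secrets should not travel in forwarded headers.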
### User Information Headers (Optional)
When `add_user_information_to_llm_headers` is enabled, LiteLLM adds:
- `x-litellm-user-id`
- `x-litellm-org-id`
- Other user metadata as `x-litellm-*` headers
## Security Considerations
⚠️ **Important Security Notes:**
1. **Sensitive Data**: Only enable header forwarding for trusted model groups, as headers may contain sensitive information
2. **API Keys**: Never include API keys or secrets in forwarded headers
3. **PII**: Be cautious about forwarding headers that might contain personally identifiable information
4. **Provider Limits**: Some providers have restrictions on custom headers
## Example Use Cases
### 1. Request Tracing
Forward tracing headers to track requests across your system:
```bash
curl -X POST "https://your-proxy.com/v1/chat/completions" \
-H "Authorization: Bearer your-key" \
-H "x-trace-id: abc123" \
-H "x-request-source: mobile-app" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello"}]
}'
```
### 2. Custom Metadata
Pass custom metadata to your LLM provider:
```bash
curl -X POST "https://your-proxy.com/v1/chat/completions" \
-H "Authorization: Bearer your-key" \
-H "x-customer-id: customer-123" \
-H "x-environment: production" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello"}]
}'
```
### 3. Anthropic Beta Features
Enable beta features for Anthropic models:
```bash
curl -X POST "https://your-proxy.com/v1/chat/completions" \
-H "Authorization: Bearer your-key" \
-H "anthropic-beta: tools-2024-04-04" \
-d '{
"model": "claude-3-sonnet",
"messages": [{"role": "user", "content": "Hello"}]
}'
```
## Complete Configuration Example
```yaml
model_list:
  # Fixed model with header forwarding
  - model_name: byok-fixed-gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_base: "https://your-openai-endpoint.com"
      api_key: "your-api-key"
  # Wildcard model group with header forwarding
  - model_name: "byok-wildcard/*"
    litellm_params:
      model: "openai/*"
      api_base: "https://your-openai-endpoint.com"
      api_key: "your-api-key"
  # Standard model without header forwarding
  - model_name: standard-gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: "your-api-key"
litellm_settings:
  # Enable user info headers globally (optional)
  add_user_information_to_llm_headers: true
  model_group_settings:
    forward_client_headers_to_llm_api:
      - byok-fixed-gpt-4o-mini
      - byok-wildcard/*
      # Note: standard-gpt-4 is NOT included, so no headers forwarded
general_settings:
  # Enable OpenAI organization header forwarding (optional)
  forward_openai_org_id: true
```
## Testing Header Forwarding
To test if headers are being forwarded:
1. **Enable Debug Logging**: Set `set_verbose: true` in your config
2. **Check Provider Logs**: Monitor your LLM provider's request logs
3. **Use Webhook Sites**: For testing, you can use webhook.site URLs as api_base to see forwarded headers
## Troubleshooting
### Headers Not Being Forwarded
1. **Check Model Name**: Ensure the model name in your request matches the configuration
2. **Verify Pattern Matching**: Wildcard patterns must match exactly
3. **Review Logs**: Enable verbose logging to see header processing
### Provider Errors
1. **Invalid Headers**: Some providers reject unknown headers
2. **Header Limits**: Providers may have limits on header count/size
3. **Authentication**: Ensure forwarded headers don't conflict with authentication
## Related Features
- [Request Headers](./request_headers.md) - Complete list of supported request headers
- [Response Headers](./response_headers.md) - Headers returned by LiteLLM
- [Team Model Aliases](./team_model_add.md) - Configure model aliases for teams
- [Model Access Control](./model_access.md) - Control which users can access which models
## API Reference
The header forwarding is controlled by the `ModelGroupSettings` configuration:
```python
class ModelGroupSettings(BaseModel):
forward_client_headers_to_llm_api: Optional[List[str]] = None
```
Where each string in the list can be:
- An exact model name (e.g., `"gpt-4o-mini"`)
- A wildcard pattern (e.g., `"openai/*"`)
- A model group name (e.g., `"my-model-group/*"`)

View File

@ -135,6 +135,7 @@ guardrails:
# application_id: "my-app"
# monitor_mode: false
# block_failures: true
# anonymize_input: false
```
### Required Parameters
@ -147,6 +148,7 @@ guardrails:
- **`application_id`**: Your application identifier (defaults to `"litellm"`)
- **`monitor_mode`**: If `true`, logs violations without blocking (defaults to `false`)
- **`block_failures`**: If `true`, blocks requests when guardrail API failures occur (defaults to `true`)
- **`anonymize_input`**: If `true`, replaces sensitive content with anonymized version (defaults to `false`)
## Environment Variables
@ -158,6 +160,7 @@ export NOMA_API_BASE="https://api.noma.security/" # Optional
export NOMA_APPLICATION_ID="my-app" # Optional
export NOMA_MONITOR_MODE="false" # Optional
export NOMA_BLOCK_FAILURES="true" # Optional
export NOMA_ANONYMIZE_INPUT="false" # Optional
```
## Advanced Configuration
@ -190,6 +193,20 @@ guardrails:
block_failures: false # Allow requests to proceed if guardrail API fails
```
### Content Anonymization
Enable anonymization to replace sensitive content instead of blocking:
```yaml
guardrails:
  - guardrail_name: "noma-anonymize"
    litellm_params:
      guardrail: noma
      mode: "pre_call"
      api_key: os.environ/NOMA_API_KEY
      anonymize_input: true # Replace sensitive data with anonymized version
```
### Multiple Guardrails
Apply different configurations for input and output:

View File

@ -0,0 +1,153 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Tool Permission Guardrail
LiteLLM provides a Tool Permission Guardrail that lets you control which **tool calls** a model is allowed to invoke, using configurable allow/deny rules. This offers fine-grained, provider-agnostic control over tool execution (e.g., OpenAI Chat Completions `tool_calls`, Anthropic Messages `tool_use`, MCP tools).
## Quick Start
### 1. Define Guardrails on your LiteLLM config.yaml
Define your guardrails under the `guardrails` section
```yaml
guardrails:
  - guardrail_name: "tool-permission-guardrail"
    litellm_params:
      guardrail: tool_permission
      mode: "post_call"
      rules:
        - id: "allow_bash"
          tool_name: "Bash"
          decision: "allow"
        - id: "allow_github_mcp"
          tool_name: "mcp__github_*"
          decision: "allow"
        - id: "allow_aws_documentation"
          tool_name: "mcp__aws-documentation_*_documentation"
          decision: "allow"
        - id: "deny_read_commands"
          tool_name: "Read"
          decision: "deny"
      default_action: "deny" # Fallback when no rule matches: "allow" or "deny"
      on_disallowed_action: "block" # How to handle disallowed tools: "block" or "rewrite"
```
#### Rule Structure
```yaml
- id: "unique_rule_id" # Unique identifier for the rule
  tool_name: "pattern" # Tool name or pattern to match
  decision: "allow" # "allow" or "deny"
```
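To show how rules of this shape could be evaluated, here is a minimal sketch. It assumes that rules are checked in order, that the first match wins, and that `*` behaves like a glob wildcard. These are assumptions for illustration rather than a description of LiteLLM's internal logic, and the function name `decide` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def decide(tool_name: str, rules: List[Dict[str, str]], default_action: str = "deny") -> str:
    """Return 'allow' or 'deny' for a tool name; the first matching rule wins."""
    for rule in rules:
        if fnmatchcase(tool_name, rule["tool_name"]):
            return rule["decision"].lower()
    # No rule matched, so fall back to the configured default.
    return default_action


rules = [
    {"id": "allow_bash", "tool_name": "Bash", "decision": "allow"},
    {"id": "allow_github_mcp", "tool_name": "mcp__github_*", "decision": "allow"},
    {"id": "deny_read_commands", "tool_name": "Read", "decision": "deny"},
]

for name in ["Bash", "mcp__github_create_issue", "Read", "get_current_weather"]:
    print(name, "->", decide(name, rules))
```

Under these assumptions, a tool such as `get_current_weather` that matches no rule falls through to `default_action`.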
#### Supported values for `mode`
- `pre_call` Run **before** LLM call, on **input**
- `post_call` Run **after** LLM call, on **input & output**
### 2. Start the Proxy
```shell
litellm --config config.yaml --port 4000
```
## Examples
<Tabs>
<TabItem value="block" label="Block Request">
**Block request**
```bash
# Test
curl -X POST "http://localhost:4000/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-master-key-here" \
-d '{
"model": "gpt-5-mini",
"messages": [{"role": "user","content": "What is the weather like in Tokyo today?"}],
"tools": [
{
"type":"function",
"function": {
"name":"get_current_weather",
"description": "Get the current weather in a given location"
}
}
]
}'
```
**Expected response (Denied):**
```json
{
"error":
{
"message": "Guardrail raised an exception, Guardrail: tool-permission-guardrail, Message: Tool 'get_current_weather' denied by default action",
"type": "None",
"param": "None",
"code": "500"
}
}
```
</TabItem>
<TabItem value="rewrite" label="Rewrite Request">
**Rewrite request**
```bash
# Test
curl -X POST "http://localhost:4000/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-master-key-here" \
-d '{
"model": "gpt-5-mini",
"messages": [{"role": "user","content": "What is the weather like in Tokyo today?"}],
"tools": [
{
"type":"function",
"function": {
"name":"get_current_weather",
"description": "Get the current weather in a given location"
}
}
]
}'
```
**Expected response:**
```json
{
"id": "chatcmpl-xxxxxxxxxxxxxxx",
"created": 1757716050,
"model": "gpt-5-mini-2025-08-07",
"object": "chat.completion",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "I cant fetch live weather — I dont have realtime internet access.",
"role": "assistant",
"annotations": []
},
"provider_specific_fields": {}
}
],
"usage": {
"prompt_tokens": 112,
"total_tokens": 735,
"completion_tokens_details": {
"reasoning_tokens": 384
}
},
"service_tier": "default"
}
```
</TabItem>
</Tabs>

View File

@ -2,6 +2,10 @@
Special headers that are supported by LiteLLM.
## Header Forwarding
By default, LiteLLM does not forward client headers to LLM provider APIs. However, you can selectively enable header forwarding for specific model groups. [Learn more about configuring header forwarding](./forward_client_headers.md).
## LiteLLM Headers
`x-litellm-timeout` Optional[float]: The timeout for the request in seconds.
@ -21,11 +25,15 @@ Special headers that are supported by LiteLLM.
`anthropic-version` Optional[str]: The version of the Anthropic API to use.
`anthropic-beta` Optional[str]: The beta version of the Anthropic API to use.
- For `/v1/messages` endpoint, this header is always forwarded to the underlying model.
- For `/chat/completions` endpoint, this will only be forwarded if `forward_client_headers_to_llm_api` is true.
- For `/chat/completions` endpoint, this will only be forwarded if the model is configured in `forward_client_headers_to_llm_api`. [Learn more](./forward_client_headers.md)
## OpenAI Headers
`openai-organization` Optional[str]: The organization to use for the OpenAI API. (currently needs to be enabled via `general_settings::forward_openai_org_id: true`)
## Custom Headers
Custom headers starting with `x-` can be forwarded to LLM provider APIs when the model is configured in `forward_client_headers_to_llm_api`. [Learn more about header forwarding configuration](./forward_client_headers.md).

View File

@ -89,16 +89,20 @@ To track spend and usage for each Open WebUI user, configure both Open WebUI and
2. **Configure LiteLLM to Parse User Headers**
Add the following to your LiteLLM `config.yaml` to specify a header to use for user tracking:
Add the following to your LiteLLM `config.yaml` to specify the request header mapping for user tracking:
```yaml
general_settings:
user_header_name: X-OpenWebUI-User-Id
user_header_mappings:
- header_name: X-OpenWebUI-User-Id
litellm_user_role: internal_user
- header_name: X-OpenWebUI-User-Email
litellm_user_role: customer
```
ⓘ Available tracking options
You can use any of the following headers for `user_header_name`:
You can use any of the following headers as `header_name` in `user_header_mappings`:
- `X-OpenWebUI-User-Id`
- `X-OpenWebUI-User-Email`
- `X-OpenWebUI-User-Name`
@ -109,6 +113,12 @@ To track spend and usage for each Open WebUI user, configure both Open WebUI and
- Users can modify their own usernames
- Administrators can modify both usernames and emails of any account
This video walks through how to map Open WebUI headers to LiteLLM user roles:
<iframe src="https://www.loom.com/embed/a1b6a4635fc0478ba4fd34cae16e2ffd?sid=791c2dcc-7e65-45be-bf7f-27d2601c123e" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen width="840" height="500"></iframe>
<br/>
<br/>
## Render `thinking` content on Open WebUI

View File

@ -0,0 +1,155 @@
---
title: "v1.77.2-stable - Bedrock Batches API"
slug: "v1-77-2"
date: 2025-09-13T10:00:00
authors:
- name: Krrish Dholakia
title: CEO, LiteLLM
url: https://www.linkedin.com/in/krish-d/
image_url: https://pbs.twimg.com/profile_images/1298587542745358340/DZv3Oj-h_400x400.jpg
- name: Ishaan Jaffer
title: CTO, LiteLLM
url: https://www.linkedin.com/in/reffajnaahsi/
image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
hide_table_of_contents: false
---
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Deploy this version
<Tabs>
<TabItem value="docker" label="Docker">
``` showLineNumbers title="docker run litellm"
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.77.2
```
</TabItem>
<TabItem value="pip" label="Pip">
``` showLineNumbers title="pip install litellm"
pip install litellm==1.77.2
```
</TabItem>
</Tabs>
---
## Key Highlights
- **Bedrock Batches API** - Support for creating Batch Inference Jobs on Bedrock using LiteLLM's unified batch API (OpenAI compatible)
- **Qwen API Tiered Pricing** - Cost tracking support for Dashscope (Qwen) models with multiple pricing tiers
## New Models / Updated Models
#### New Model Support
| Provider | Model | Context Window | Pricing ($/1M tokens) | Features |
| ----------- | ------------------------------- | -------------- | --------------------- | -------- |
| DeepInfra | `deepinfra/deepseek-ai/DeepSeek-R1` | 164K | **Input:** $0.70<br/>**Output:** $2.40 | Chat completions, tool calling |
| Heroku | `heroku/claude-4-sonnet` | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | `heroku/claude-3-7-sonnet` | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | `heroku/claude-3-5-sonnet-latest` | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | `heroku/claude-3-5-haiku` | 4K | Contact provider for pricing | Function calling, tool choice |
| Dashscope | `dashscope/qwen-plus-latest` | 1M | **Tiered Pricing:**<br/>• 0-256K tokens: $0.40 / $1.20<br/>• 256K-1M tokens: $1.20 / $3.60 | Function calling, reasoning |
| Dashscope | `dashscope/qwen3-max-preview` | 262K | **Tiered Pricing:**<br/>• 0-32K tokens: $1.20 / $6.00<br/>• 32K-128K tokens: $2.40 / $12.00<br/>• 128K-252K tokens: $3.00 / $15.00 | Function calling, reasoning |
| Dashscope | `dashscope/qwen-flash` | 1M | **Tiered Pricing:**<br/>• 0-256K tokens: $0.05 / $0.40<br/>• 256K-1M tokens: $0.25 / $2.00 | Function calling, reasoning |
| Dashscope | `dashscope/qwen3-coder-plus` | 1M | **Tiered Pricing:**<br/>• 0-32K tokens: $1.00 / $5.00<br/>• 32K-128K tokens: $1.80 / $9.00<br/>• 128K-256K tokens: $3.00 / $15.00<br/>• 256K-1M tokens: $6.00 / $60.00 | Function calling, reasoning, caching |
| Dashscope | `dashscope/qwen3-coder-flash` | 1M | **Tiered Pricing:**<br/>• 0-32K tokens: $0.30 / $1.50<br/>• 32K-128K tokens: $0.50 / $2.50<br/>• 128K-256K tokens: $0.80 / $4.00<br/>• 256K-1M tokens: $1.60 / $9.60 | Function calling, reasoning, caching |
---
#### Features
- **[Bedrock](../../docs/providers/bedrock_batches)**
- Bedrock Batches API - batch processing support with file upload and request transformation - [PR #14518](https://github.com/BerriAI/litellm/pull/14518), [PR #14522](https://github.com/BerriAI/litellm/pull/14522)
- **[VLLM](../../docs/providers/vllm)**
- Added transcription endpoint support - [PR #14523](https://github.com/BerriAI/litellm/pull/14523)
- **[Ollama](../../docs/providers/ollama)**
- `ollama_chat/` - images, thinking, and content as list handling - [PR #14523](https://github.com/BerriAI/litellm/pull/14523)
- **General**
- New debug flag for detailed request/response logging [PR #14482](https://github.com/BerriAI/litellm/pull/14482)
#### Bug Fixes
- **[Azure OpenAI](../../docs/providers/azure)**
- Fixed extra_body injection causing payload rejection in image generation - [PR #14475](https://github.com/BerriAI/litellm/pull/14475)
- **[LM Studio](../../docs/providers/lm-studio)**
- Resolved illegal Bearer header value issue - [PR #14512](https://github.com/BerriAI/litellm/pull/14512)
---
## LLM API Endpoints
#### Bug Fixes
- **[/messages](../../docs/anthropic_unified)**
- Don't send content block after message w/ finish reason + usage block - [PR #14477](https://github.com/BerriAI/litellm/pull/14477)
- **[/generateContent](../../docs/generateContent)**
- Gemini CLI Integration - Fixed token count errors - [PR #14451](https://github.com/BerriAI/litellm/pull/14451), [PR #14417](https://github.com/BerriAI/litellm/pull/14417)
---
## Spend Tracking, Budgets and Rate Limiting
#### Features
- **[Qwen API Tiered Pricing](../../docs/providers/dashscope)** - Added comprehensive tiered cost tracking for Dashscope/Qwen models - [PR #14471](https://github.com/BerriAI/litellm/pull/14471), [PR #14479](https://github.com/BerriAI/litellm/pull/14479)
#### Bug Fixes
- **Provider Budgets** - Fixed provider budget calculations - [PR #14459](https://github.com/BerriAI/litellm/pull/14459)
---
## Management Endpoints / UI
#### Features
- **User Headers Mapping** - New X-LiteLLM Users mapping feature for enhanced user tracking - [PR #14485](https://github.com/BerriAI/litellm/pull/14485)
- **Key Unblocking** - Support for hashed tokens in `/key/unblock` endpoint - [PR #14477](https://github.com/BerriAI/litellm/pull/14477)
- **Model Group Header Forwarding** - Enhanced wildcard model support with documentation - [PR #14528](https://github.com/BerriAI/litellm/pull/14528)
#### Bug Fixes
- **Log Tab Key Alias** - Fixed filtering inaccuracies for failed logs - [PR #14469](https://github.com/BerriAI/litellm/pull/14469), [PR #14529](https://github.com/BerriAI/litellm/pull/14529)
---
## Logging / Guardrail Integrations
#### Features
- **Noma Integration** - Added non-blocking monitor mode with anonymize input support - [PR #14401](https://github.com/BerriAI/litellm/pull/14401)
---
## Performance / Loadbalancing / Reliability improvements
#### Performance
- Removed dynamic creation of static values - [PR #14538](https://github.com/BerriAI/litellm/pull/14538)
- Using `_PROXY_MaxParallelRequestsHandler_v3` by default for optimal throughput - [PR #14450](https://github.com/BerriAI/litellm/pull/14450)
- Improved execution context propagation into logging tasks - [PR #14455](https://github.com/BerriAI/litellm/pull/14455)
---
## New Contributors
* @Sameerlite made their first contribution in [PR #14460](https://github.com/BerriAI/litellm/pull/14460)
* @holzman made their first contribution in [PR #14459](https://github.com/BerriAI/litellm/pull/14459)
* @sashank5644 made their first contribution in [PR #14469](https://github.com/BerriAI/litellm/pull/14469)
* @TomAlon made their first contribution in [PR #14401](https://github.com/BerriAI/litellm/pull/14401)
* @AlexsanderHamir made their first contribution in [PR #14538](https://github.com/BerriAI/litellm/pull/14538)
---
## **[Full Changelog](https://github.com/BerriAI/litellm/compare/v1.77.1.dev.2...v1.77.2.dev)**

View File

@ -49,6 +49,7 @@ const sidebars = {
"proxy/guardrails/secret_detection",
"proxy/guardrails/custom_guardrail",
"proxy/guardrails/prompt_injection",
"proxy/guardrails/tool_permission",
].sort(),
],
},
@ -141,6 +142,7 @@ const sidebars = {
"proxy/clientside_auth",
"proxy/request_headers",
"proxy/response_headers",
"proxy/forward_client_headers",
"proxy/model_discovery",
],
},
@ -487,7 +489,8 @@ const sidebars = {
"providers/bytez",
"providers/heroku",
"providers/oci",
"providers/datarobot",
"providers/datarobot",
"providers/ovhcloud",
],
},
{

View File

@ -1,4 +1,6 @@
from typing import Literal, TypedDict
from typing import Literal
from typing_extensions import TypedDict
class CustomAuthSettings(TypedDict):

View File

@ -6,7 +6,7 @@
"": {
"dependencies": {
"@hono/node-server": "^1.10.1",
"hono": "^4.6.5"
"hono": "^4.9.7"
},
"devDependencies": {
"@types/node": "^20.11.17",
@ -463,9 +463,10 @@
}
},
"node_modules/hono": {
"version": "4.6.5",
"resolved": "https://registry.npmjs.org/hono/-/hono-4.6.5.tgz",
"integrity": "sha512-qsmN3V5fgtwdKARGLgwwHvcdLKursMd+YOt69eGpl1dUCJb8mCd7hZfyZnBYjxCegBG7qkJRQRUy2oO25yHcyQ==",
"version": "4.9.7",
"resolved": "https://registry.npmjs.org/hono/-/hono-4.9.7.tgz",
"integrity": "sha512-t4Te6ERzIaC48W3x4hJmBwgNlLhmiEdEE5ViYb02ffw4ignHNHa5IBtPjmbKstmtKa8X6C35iWwK4HaqvrzG9w==",
"license": "MIT",
"engines": {
"node": ">=16.9.0"
}

View File

@ -4,7 +4,7 @@
},
"dependencies": {
"@hono/node-server": "^1.10.1",
"hono": "^4.6.5"
"hono": "^4.9.7"
},
"devDependencies": {
"@types/node": "^20.11.17",

View File

@ -241,6 +241,7 @@ gradient_ai_api_key: Optional[str] = None
nebius_key: Optional[str] = None
heroku_key: Optional[str] = None
cometapi_key: Optional[str] = None
ovhcloud_key: Optional[str] = None
common_cloud_provider_auth_params: dict = {
"params": ["project", "region_name", "token"],
"providers": ["vertex_ai", "bedrock", "watsonx", "azure", "vertex_ai_beta"],
@ -520,6 +521,8 @@ cometapi_models: Set = set()
oci_models: Set = set()
vercel_ai_gateway_models: Set = set()
volcengine_models: Set = set()
ovhcloud_models: Set = set()
ovhcloud_embedding_models: Set = set()
def is_bedrock_pricing_only_model(key: str) -> bool:
@ -734,6 +737,10 @@ def add_known_models():
oci_models.add(key)
elif value.get("litellm_provider") == "volcengine":
volcengine_models.add(key)
elif value.get("litellm_provider") == "ovhcloud":
ovhcloud_models.add(key)
elif value.get("litellm_provider") == "ovhcloud-embedding-models":
ovhcloud_embedding_models.add(key)
add_known_models()
@ -828,6 +835,7 @@ model_list = list(
| heroku_models
| vercel_ai_gateway_models
| volcengine_models
| ovhcloud_models
)
model_list_set = set(model_list)
@ -909,6 +917,7 @@ models_by_provider: dict = {
"cometapi": cometapi_models,
"oci": oci_models,
"volcengine": volcengine_models,
"ovhcloud": ovhcloud_models | ovhcloud_embedding_models,
}
# mapping for those models which have larger equivalents
@ -943,6 +952,7 @@ all_embedding_models = (
| fireworks_ai_embedding_models
| nebius_embedding_models
| sambanova_embedding_models
| ovhcloud_embedding_models
)
####### IMAGE GENERATION MODELS ###################
@ -1255,6 +1265,8 @@ from .llms.morph.chat.transformation import MorphChatConfig
from .llms.lambda_ai.chat.transformation import LambdaAIChatConfig
from .llms.hyperbolic.chat.transformation import HyperbolicChatConfig
from .llms.vercel_ai_gateway.chat.transformation import VercelAIGatewayConfig
from .llms.ovhcloud.chat.transformation import OVHCloudChatConfig
from .llms.ovhcloud.embedding.transformation import OVHCloudEmbeddingConfig
from .main import * # type: ignore
from .integrations import *
from .llms.custom_httpx.async_client_cleanup import close_litellm_async_clients

View File

@ -2,7 +2,9 @@
Handler for transforming /chat/completions api requests to litellm.responses requests
"""
from typing import TYPE_CHECKING, Any, Coroutine, TypedDict, Union
from typing import TYPE_CHECKING, Any, Coroutine, Union
from typing_extensions import TypedDict
if TYPE_CHECKING:
from litellm import CustomStreamWrapper, LiteLLMLoggingObj, ModelResponse

View File

@ -313,6 +313,7 @@ LITELLM_CHAT_PROVIDERS = [
"morph",
"lambda_ai",
"vercel_ai_gateway",
"ovhcloud",
]
LITELLM_EMBEDDING_PROVIDERS_SUPPORTING_INPUT_ARRAY_OF_TOKENS = [
@ -1023,6 +1024,7 @@ SENTRY_DENYLIST = [
"FIREWORKS_API_KEY",
"FIREWORKS_AI_API_KEY",
"FIREWORKSAI_API_KEY",
"OVHCLOUD_API_KEY",
# Database and Connection Strings
"database_url",
"redis_url",

View File

@ -2,7 +2,9 @@
Handler for transforming /chat/completions api requests to litellm.responses requests
"""
from typing import TYPE_CHECKING, Optional, TypedDict, Union
from typing import TYPE_CHECKING, Optional, Union
from typing_extensions import TypedDict
if TYPE_CHECKING:
from litellm import LiteLLMLoggingObj

View File

@ -31,7 +31,7 @@ class SoftBudgetAlert(BaseBudgetAlertType):
return "Soft Budget Crossed: "
def get_id(self, user_info: CallInfo) -> str:
return "default_id"
return user_info.token or "default_id"
class UserBudgetAlert(BaseBudgetAlertType):

View File

@ -64,7 +64,7 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
asyncio.create_task(self.periodic_flush())
self.flush_lock = asyncio.Lock()
self.log_queue: List[LLMObsPayload] = []
#########################################################
# Handle datadog_llm_observability_params set as litellm.datadog_llm_observability_params
#########################################################
@ -83,22 +83,25 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
"""
dict_datadog_llm_obs_params: Dict = {}
if litellm.datadog_llm_observability_params is not None:
if isinstance(litellm.datadog_llm_observability_params, DatadogLLMObsInitParams):
dict_datadog_llm_obs_params = litellm.datadog_llm_observability_params.model_dump()
if isinstance(
litellm.datadog_llm_observability_params, DatadogLLMObsInitParams
):
dict_datadog_llm_obs_params = (
litellm.datadog_llm_observability_params.model_dump()
)
elif isinstance(litellm.datadog_llm_observability_params, Dict):
# only allow params that are of DatadogLLMObsInitParams
dict_datadog_llm_obs_params = DatadogLLMObsInitParams(**litellm.datadog_llm_observability_params).model_dump()
dict_datadog_llm_obs_params = DatadogLLMObsInitParams(
**litellm.datadog_llm_observability_params
).model_dump()
return dict_datadog_llm_obs_params
async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
try:
verbose_logger.debug(
f"DataDogLLMObs: Logging success event for model {kwargs.get('model', 'unknown')}"
)
payload = self.create_llm_obs_payload(
kwargs, start_time, end_time
)
payload = self.create_llm_obs_payload(kwargs, start_time, end_time)
verbose_logger.debug(f"DataDogLLMObs: Payload: {payload}")
self.log_queue.append(payload)
@ -108,15 +111,13 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
verbose_logger.exception(
f"DataDogLLMObs: Error logging success event - {str(e)}"
)
async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
try:
verbose_logger.debug(
f"DataDogLLMObs: Logging failure event for model {kwargs.get('model', 'unknown')}"
)
payload = self.create_llm_obs_payload(
kwargs, start_time, end_time
)
payload = self.create_llm_obs_payload(kwargs, start_time, end_time)
verbose_logger.debug(f"DataDogLLMObs: Payload: {payload}")
self.log_queue.append(payload)
@ -184,7 +185,6 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
messages = standard_logging_payload["messages"]
messages = self._ensure_string_content(messages=messages)
response_obj = standard_logging_payload.get("response")
metadata = kwargs.get("litellm_params", {}).get("metadata", {})
@ -193,10 +193,12 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
messages
)
)
output_meta = OutputMeta(messages=self._get_response_messages(
response_obj=response_obj,
call_type=standard_logging_payload.get("call_type")
))
output_meta = OutputMeta(
messages=self._get_response_messages(
standard_logging_payload=standard_logging_payload,
call_type=standard_logging_payload.get("call_type"),
)
)
error_info = self._assemble_error_info(standard_logging_payload)
@ -214,7 +216,9 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
output_tokens=float(standard_logging_payload.get("completion_tokens", 0)),
total_tokens=float(standard_logging_payload.get("total_tokens", 0)),
total_cost=float(standard_logging_payload.get("response_cost", 0)),
time_to_first_token=self._get_time_to_first_token_seconds(standard_logging_payload),
time_to_first_token=self._get_time_to_first_token_seconds(
standard_logging_payload
),
)
payload: LLMObsPayload = LLMObsPayload(
@ -251,27 +255,35 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
except Exception:
pass
return None
def _assemble_error_info(self, standard_logging_payload: StandardLoggingPayload) -> Optional[DDLLMObsError]:
def _assemble_error_info(
self, standard_logging_payload: StandardLoggingPayload
) -> Optional[DDLLMObsError]:
"""
Assemble error information for failure cases according to DD LLM Obs API spec
"""
# Handle error information for failure cases according to DD LLM Obs API spec
error_info: Optional[DDLLMObsError] = None
if standard_logging_payload.get("status") == "failure":
# Try to get structured error information first
error_information: Optional[StandardLoggingPayloadErrorInformation] = standard_logging_payload.get("error_information")
error_information: Optional[
StandardLoggingPayloadErrorInformation
] = standard_logging_payload.get("error_information")
if error_information:
error_info = DDLLMObsError(
message=error_information.get("error_message") or standard_logging_payload.get("error_str") or "Unknown error",
message=error_information.get("error_message")
or standard_logging_payload.get("error_str")
or "Unknown error",
type=error_information.get("error_class"),
stack=error_information.get("traceback")
stack=error_information.get("traceback"),
)
return error_info
def _get_time_to_first_token_seconds(self, standard_logging_payload: StandardLoggingPayload) -> float:
def _get_time_to_first_token_seconds(
self, standard_logging_payload: StandardLoggingPayload
) -> float:
"""
Get the time to first token in seconds
@ -280,7 +292,9 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
For non streaming calls, CompletionStartTime is time we get the response back
"""
start_time: Optional[float] = standard_logging_payload.get("startTime")
completion_start_time: Optional[float] = standard_logging_payload.get("completionStartTime")
completion_start_time: Optional[float] = standard_logging_payload.get(
"completionStartTime"
)
end_time: Optional[float] = standard_logging_payload.get("endTime")
if completion_start_time is not None and start_time is not None:
@ -290,19 +304,43 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
else:
return 0.0
def _get_response_messages(
self, response_obj: Any, call_type: Optional[str]
self, standard_logging_payload: StandardLoggingPayload, call_type: Optional[str]
) -> List[Any]:
"""
Get the messages from the response object
for now this handles logging /chat/completions responses
"""
response_obj = standard_logging_payload.get("response")
if response_obj is None:
return []
if call_type in [CallTypes.completion.value, CallTypes.acompletion.value]:
# edge case: handle response_obj is a string representation of a dict
if isinstance(response_obj, str):
try:
import ast
response_obj = ast.literal_eval(response_obj)
except (ValueError, SyntaxError):
try:
# fallback to json parsing
response_obj = json.loads(str(response_obj))
except json.JSONDecodeError:
return []
if call_type in [
CallTypes.completion.value,
CallTypes.acompletion.value,
CallTypes.text_completion.value,
CallTypes.atext_completion.value,
CallTypes.generate_content.value,
CallTypes.agenerate_content.value,
CallTypes.generate_content_stream.value,
CallTypes.agenerate_content_stream.value,
CallTypes.anthropic_messages.value,
]:
try:
# Safely extract message from response_obj, handle failure cases
if isinstance(response_obj, dict) and "choices" in response_obj:
@ -315,102 +353,104 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
return []
return []
def _get_datadog_span_kind(self, call_type: Optional[str]) -> Literal["llm", "tool", "task", "embedding", "retrieval"]:
def _get_datadog_span_kind(
self, call_type: Optional[str]
) -> Literal["llm", "tool", "task", "embedding", "retrieval"]:
"""
Map liteLLM call_type to appropriate DataDog LLM Observability span kind.
Available DataDog span kinds: "llm", "tool", "task", "embedding", "retrieval"
"""
if call_type is None:
return "llm"
# Embedding operations
if call_type in [CallTypes.embedding.value, CallTypes.aembedding.value]:
return "embedding"
# LLM completion operations
# LLM completion operations
if call_type in [
CallTypes.completion.value,
CallTypes.completion.value,
CallTypes.acompletion.value,
CallTypes.text_completion.value,
CallTypes.text_completion.value,
CallTypes.atext_completion.value,
CallTypes.generate_content.value,
CallTypes.generate_content.value,
CallTypes.agenerate_content.value,
CallTypes.generate_content_stream.value,
CallTypes.generate_content_stream.value,
CallTypes.agenerate_content_stream.value,
CallTypes.anthropic_messages.value
CallTypes.anthropic_messages.value,
]:
return "llm"
# Tool operations
if call_type in [CallTypes.call_mcp_tool.value]:
return "tool"
# Retrieval operations
if call_type in [
CallTypes.get_assistants.value,
CallTypes.get_assistants.value,
CallTypes.aget_assistants.value,
CallTypes.get_thread.value,
CallTypes.get_thread.value,
CallTypes.aget_thread.value,
CallTypes.get_messages.value,
CallTypes.get_messages.value,
CallTypes.aget_messages.value,
CallTypes.afile_retrieve.value,
CallTypes.afile_retrieve.value,
CallTypes.file_retrieve.value,
CallTypes.afile_list.value,
CallTypes.afile_list.value,
CallTypes.file_list.value,
CallTypes.afile_content.value,
CallTypes.afile_content.value,
CallTypes.file_content.value,
CallTypes.retrieve_batch.value,
CallTypes.retrieve_batch.value,
CallTypes.aretrieve_batch.value,
CallTypes.retrieve_fine_tuning_job.value,
CallTypes.retrieve_fine_tuning_job.value,
CallTypes.aretrieve_fine_tuning_job.value,
CallTypes.responses.value,
CallTypes.responses.value,
CallTypes.aresponses.value,
CallTypes.alist_input_items.value
CallTypes.alist_input_items.value,
]:
return "retrieval"
# Task operations (batch, fine-tuning, file operations, etc.)
if call_type in [
CallTypes.create_batch.value,
CallTypes.create_batch.value,
CallTypes.acreate_batch.value,
CallTypes.create_fine_tuning_job.value,
CallTypes.create_fine_tuning_job.value,
CallTypes.acreate_fine_tuning_job.value,
CallTypes.cancel_fine_tuning_job.value,
CallTypes.cancel_fine_tuning_job.value,
CallTypes.acancel_fine_tuning_job.value,
CallTypes.list_fine_tuning_jobs.value,
CallTypes.list_fine_tuning_jobs.value,
CallTypes.alist_fine_tuning_jobs.value,
CallTypes.create_assistants.value,
CallTypes.create_assistants.value,
CallTypes.acreate_assistants.value,
CallTypes.delete_assistant.value,
CallTypes.delete_assistant.value,
CallTypes.adelete_assistant.value,
CallTypes.create_thread.value,
CallTypes.create_thread.value,
CallTypes.acreate_thread.value,
CallTypes.add_message.value,
CallTypes.add_message.value,
CallTypes.a_add_message.value,
CallTypes.run_thread.value,
CallTypes.run_thread.value,
CallTypes.arun_thread.value,
CallTypes.run_thread_stream.value,
CallTypes.run_thread_stream.value,
CallTypes.arun_thread_stream.value,
CallTypes.file_delete.value,
CallTypes.file_delete.value,
CallTypes.afile_delete.value,
CallTypes.create_file.value,
CallTypes.create_file.value,
CallTypes.acreate_file.value,
CallTypes.image_generation.value,
CallTypes.image_generation.value,
CallTypes.aimage_generation.value,
CallTypes.image_edit.value,
CallTypes.image_edit.value,
CallTypes.aimage_edit.value,
CallTypes.moderation.value,
CallTypes.moderation.value,
CallTypes.amoderation.value,
CallTypes.transcription.value,
CallTypes.transcription.value,
CallTypes.atranscription.value,
CallTypes.speech.value,
CallTypes.speech.value,
CallTypes.aspeech.value,
CallTypes.rerank.value,
CallTypes.arerank.value
CallTypes.rerank.value,
CallTypes.arerank.value,
]:
return "task"
# Default fallback for unknown or passthrough operations
return "llm"
@ -443,7 +483,9 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
"cache_hit": standard_logging_payload.get("cache_hit", "unknown"),
"cache_key": standard_logging_payload.get("cache_key", "unknown"),
"saved_cache_cost": standard_logging_payload.get("saved_cache_cost", 0),
"guardrail_information": standard_logging_payload.get("guardrail_information", None),
"guardrail_information": standard_logging_payload.get(
"guardrail_information", None
),
}
#########################################################
@ -452,22 +494,32 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
latency_metrics = self._get_latency_metrics(standard_logging_payload)
_metadata.update({"latency_metrics": dict(latency_metrics)})
## extract tool calls and add to metadata
tool_call_metadata = self._extract_tool_call_metadata(standard_logging_payload)
_metadata.update(tool_call_metadata)
_standard_logging_metadata: dict = (
dict(standard_logging_payload.get("metadata", {})) or {}
)
_metadata.update(_standard_logging_metadata)
return _metadata
def _get_latency_metrics(self, standard_logging_payload: StandardLoggingPayload) -> DDLLMObsLatencyMetrics:
def _get_latency_metrics(
self, standard_logging_payload: StandardLoggingPayload
) -> DDLLMObsLatencyMetrics:
"""
Get the latency metrics from the standard logging payload
"""
latency_metrics: DDLLMObsLatencyMetrics = DDLLMObsLatencyMetrics()
# Add latency metrics to metadata
# Time to first token (convert from seconds to milliseconds for consistency)
time_to_first_token_seconds = self._get_time_to_first_token_seconds(standard_logging_payload)
time_to_first_token_seconds = self._get_time_to_first_token_seconds(
standard_logging_payload
)
if time_to_first_token_seconds > 0:
latency_metrics["time_to_first_token_ms"] = time_to_first_token_seconds * 1000
latency_metrics["time_to_first_token_ms"] = (
time_to_first_token_seconds * 1000
)
# LiteLLM overhead time
hidden_params = standard_logging_payload.get("hidden_params", {})
@ -476,11 +528,143 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
latency_metrics["litellm_overhead_time_ms"] = litellm_overhead_ms
# Guardrail overhead latency
guardrail_info: Optional[StandardLoggingGuardrailInformation] = standard_logging_payload.get("guardrail_information")
guardrail_info: Optional[
StandardLoggingGuardrailInformation
] = standard_logging_payload.get("guardrail_information")
if guardrail_info is not None:
_guardrail_duration_seconds: Optional[float] = guardrail_info.get("duration")
_guardrail_duration_seconds: Optional[float] = guardrail_info.get(
"duration"
)
if _guardrail_duration_seconds is not None:
# Convert from seconds to milliseconds for consistency
latency_metrics["guardrail_overhead_time_ms"] = _guardrail_duration_seconds * 1000
return latency_metrics
latency_metrics["guardrail_overhead_time_ms"] = (
_guardrail_duration_seconds * 1000
)
return latency_metrics
def _process_input_messages_preserving_tool_calls(
self, messages: List[Any]
) -> List[Dict[str, Any]]:
"""
Process input messages while preserving tool_calls and tool message types.
This bypasses the lossy string conversion when tool calls are present,
allowing complex nested tool_calls objects to be preserved for Datadog.
"""
processed = []
for msg in messages:
if isinstance(msg, dict):
# Preserve messages with tool_calls or tool role as-is
if "tool_calls" in msg or msg.get("role") == "tool":
processed.append(msg)
else:
# For regular messages, still apply string conversion
converted = (
handle_any_messages_to_chat_completion_str_messages_conversion(
[msg]
)
)
processed.extend(converted)
else:
# For non-dict messages, apply string conversion
converted = (
handle_any_messages_to_chat_completion_str_messages_conversion(
[msg]
)
)
processed.extend(converted)
return processed
@staticmethod
def _tool_calls_kv_pair(tool_calls: List[Dict[str, Any]]) -> Dict[str, Any]:
"""
Extract tool call information into key-value pairs for Datadog metadata.
Similar to OpenTelemetry's implementation but adapted for Datadog's format.
"""
kv_pairs: Dict[str, Any] = {}
for idx, tool_call in enumerate(tool_calls):
try:
# Extract tool call ID
tool_id = tool_call.get("id")
if tool_id:
kv_pairs[f"tool_calls.{idx}.id"] = tool_id
# Extract tool call type
tool_type = tool_call.get("type")
if tool_type:
kv_pairs[f"tool_calls.{idx}.type"] = tool_type
# Extract function information
function = tool_call.get("function")
if function:
function_name = function.get("name")
if function_name:
kv_pairs[f"tool_calls.{idx}.function.name"] = function_name
function_arguments = function.get("arguments")
if function_arguments:
# Store arguments as JSON string for Datadog
if isinstance(function_arguments, str):
kv_pairs[
f"tool_calls.{idx}.function.arguments"
] = function_arguments
else:
import json
kv_pairs[
f"tool_calls.{idx}.function.arguments"
] = json.dumps(function_arguments)
except (KeyError, TypeError, ValueError) as e:
verbose_logger.debug(
f"DataDogLLMObs: Error processing tool call {idx}: {str(e)}"
)
continue
return kv_pairs
def _extract_tool_call_metadata(
self, standard_logging_payload: StandardLoggingPayload
) -> Dict[str, Any]:
"""
Extract tool call information from both input messages and response for Datadog metadata.
"""
tool_call_metadata: Dict[str, Any] = {}
try:
# Extract tool calls from input messages
messages = standard_logging_payload.get("messages", [])
if messages and isinstance(messages, list):
for message in messages:
if isinstance(message, dict) and "tool_calls" in message:
tool_calls = message.get("tool_calls")
if tool_calls:
input_tool_calls_kv = self._tool_calls_kv_pair(tool_calls)
# Prefix with "input_" to distinguish from response tool calls
for key, value in input_tool_calls_kv.items():
tool_call_metadata[f"input_{key}"] = value
# Extract tool calls from response
response_obj = standard_logging_payload.get("response")
if response_obj and isinstance(response_obj, dict):
choices = response_obj.get("choices", [])
for choice in choices:
if isinstance(choice, dict):
message = choice.get("message")
if message and isinstance(message, dict):
tool_calls = message.get("tool_calls")
if tool_calls:
response_tool_calls_kv = self._tool_calls_kv_pair(
tool_calls
)
# Prefix with "output_" to distinguish from input tool calls
for key, value in response_tool_calls_kv.items():
tool_call_metadata[f"output_{key}"] = value
except Exception as e:
verbose_logger.debug(
f"DataDogLLMObs: Error extracting tool call metadata: {str(e)}"
)
return tool_call_metadata

View File

@ -4,9 +4,10 @@ Humanloop integration
https://humanloop.com/
"""
from typing import Any, Dict, List, Optional, Tuple, TypedDict, Union, cast
from typing import Any, Dict, List, Optional, Tuple, Union, cast
import httpx
from typing_extensions import TypedDict
import litellm
from litellm.caching import DualCache

View File

@ -1,5 +1,7 @@
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple, TypedDict
from typing import Any, Dict, List, Optional, Tuple
from typing_extensions import TypedDict
from litellm.types.llms.openai import AllMessageValues
from litellm.types.utils import StandardCallbackDynamicParams

View File

@ -374,6 +374,8 @@ def get_llm_provider( # noqa: PLR0915
custom_llm_provider = "oci"
elif model.startswith("compactifai/"):
custom_llm_provider = "compactifai"
elif model.startswith("ovhcloud/"):
custom_llm_provider = "ovhcloud"
if not custom_llm_provider:
if litellm.suppress_debug_info is False:
print() # noqa

View File

@ -1,6 +1,5 @@
import asyncio
import json
import re
import time
import traceback
import uuid
@ -9,6 +8,9 @@ from typing import Dict, Iterable, List, Literal, Optional, Tuple, Union
import litellm
from litellm._logging import verbose_logger
from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
from litellm.litellm_core_utils.prompt_templates.common_utils import (
_extract_reasoning_content,
)
from litellm.types.llms.databricks import DatabricksTool
from litellm.types.llms.openai import (
ChatCompletionThinkingBlock,
@ -274,49 +276,6 @@ def _handle_invalid_parallel_tool_calls(
return tool_calls
def _parse_content_for_reasoning(
message_text: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
"""
Parse the content for reasoning
Returns:
- reasoning_content: The content of the reasoning
- content: The content of the message
"""
if not message_text:
return None, message_text
reasoning_match = re.match(
r"<(?:think|thinking)>(.*?)</(?:think|thinking)>(.*)", message_text, re.DOTALL
)
if reasoning_match:
return reasoning_match.group(1), reasoning_match.group(2)
return None, message_text
def _extract_reasoning_content(message: dict) -> Tuple[Optional[str], Optional[str]]:
"""
Extract reasoning content and main content from a message.
Args:
message (dict): The message dictionary that may contain reasoning_content
Returns:
tuple[Optional[str], Optional[str]]: A tuple of (reasoning_content, content)
"""
message_content = message.get("content")
if "reasoning_content" in message:
return message["reasoning_content"], message["content"]
elif "reasoning" in message:
return message["reasoning"], message["content"]
elif isinstance(message_content, str):
return _parse_content_for_reasoning(message_content)
return None, message_content
class LiteLLMResponseObjectHandler:
@staticmethod
def convert_to_image_response(

View File

@ -1,7 +1,9 @@
import asyncio
import contextlib
import contextvars
from typing import Coroutine, Optional, TypedDict
from typing import Coroutine, Optional
from typing_extensions import TypedDict
from litellm._logging import verbose_logger

View File

@ -14,6 +14,7 @@ from typing import (
Literal,
Mapping,
Optional,
Tuple,
Union,
cast,
)
@ -869,3 +870,63 @@ def convert_prefix_message_to_non_prefix_messages(
else:
new_messages.append(message)
return new_messages
def _extract_reasoning_content(message: dict) -> Tuple[Optional[str], Optional[str]]:
"""
Extract reasoning content and main content from a message.
Args:
message (dict): The message dictionary that may contain reasoning_content
Returns:
tuple[Optional[str], Optional[str]]: A tuple of (reasoning_content, content)
"""
message_content = message.get("content")
if "reasoning_content" in message:
return message["reasoning_content"], message["content"]
elif "reasoning" in message:
return message["reasoning"], message["content"]
elif isinstance(message_content, str):
return _parse_content_for_reasoning(message_content)
return None, message_content
def _parse_content_for_reasoning(
message_text: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
"""
Parse the content for reasoning
Returns:
- reasoning_content: The content of the reasoning
- content: The content of the message
"""
if not message_text:
return None, message_text
reasoning_match = re.match(
r"<(?:think|thinking)>(.*?)</(?:think|thinking)>(.*)", message_text, re.DOTALL
)
if reasoning_match:
return reasoning_match.group(1), reasoning_match.group(2)
return None, message_text
def extract_images_from_message(message: AllMessageValues) -> List[str]:
"""
Extract images from a message
"""
images = []
message_content = message.get("content")
if isinstance(message_content, list):
for m in message_content:
image_url = m.get("image_url")
if image_url:
if isinstance(image_url, str):
images.append(image_url)
elif isinstance(image_url, dict) and "url" in image_url:
images.append(image_url["url"])
return images

View File

@ -1024,6 +1024,8 @@ class CustomStreamWrapper:
return
def chunk_creator(self, chunk: Any): # type: ignore # noqa: PLR0915
if hasattr(chunk, 'id'):
self.response_id = chunk.id
model_response = self.model_response_creator()
response_obj: Dict[str, Any] = {}
try:

View File

@ -1,6 +1,6 @@
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
from typing import TYPE_CHECKING, Any, List, Optional, Union
import httpx
@ -23,12 +23,13 @@ else:
class AudioTranscriptionRequestData:
"""
Structured data for audio transcription requests.
Attributes:
data: The request data (form data for multipart, json data for regular requests)
files: Optional files dict for multipart form data
content_type: Optional content type override
"""
data: Union[dict, bytes]
files: Optional[dict] = None
content_type: Optional[str] = None
@ -66,13 +67,11 @@ class BaseAudioTranscriptionConfig(BaseConfig, ABC):
audio_file: FileTypes,
optional_params: dict,
litellm_params: dict,
) -> Union[AudioTranscriptionRequestData, Dict]:
) -> AudioTranscriptionRequestData:
raise NotImplementedError(
"AudioTranscriptionConfig needs a request transformation for audio transcription models"
)
def transform_audio_transcription_response(
self,
raw_response: httpx.Response,
@ -110,7 +109,6 @@ class BaseAudioTranscriptionConfig(BaseConfig, ABC):
raise NotImplementedError(
"AudioTranscriptionConfig does not need a response transformation for audio transcription models"
)
def get_provider_specific_params(
self,
@ -141,7 +139,7 @@ class BaseAudioTranscriptionConfig(BaseConfig, ABC):
provider_specific_params[key] = value
return provider_specific_params
def _should_exclude_param(
self,
param_name: str,

View File

@ -14,7 +14,7 @@ from litellm._logging import verbose_logger
from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
from litellm.litellm_core_utils.core_helpers import map_finish_reason
from litellm.litellm_core_utils.litellm_logging import Logging
from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
from litellm.litellm_core_utils.prompt_templates.common_utils import (
_parse_content_for_reasoning,
)
from litellm.litellm_core_utils.prompt_templates.factory import (
@ -397,7 +397,11 @@ class AmazonConverseConfig(BaseConfig):
for param, value in non_default_params.items():
if param == "response_format" and isinstance(value, dict):
optional_params = self._translate_response_format_param(
value=value, model=model, optional_params=optional_params, non_default_params=non_default_params, is_thinking_enabled=is_thinking_enabled
value=value,
model=model,
optional_params=optional_params,
non_default_params=non_default_params,
is_thinking_enabled=is_thinking_enabled,
)
if param == "max_tokens" or param == "max_completion_tokens":
optional_params["maxTokens"] = value
@ -446,11 +450,11 @@ class AmazonConverseConfig(BaseConfig):
)
return optional_params
def _translate_response_format_param(
self,
value: dict,
model: str,
self,
value: dict,
model: str,
optional_params: dict,
non_default_params: dict,
is_thinking_enabled: bool,
@ -504,7 +508,7 @@ class AmazonConverseConfig(BaseConfig):
optional_params["json_mode"] = True
if non_default_params.get("stream", False) is True:
optional_params["fake_stream"] = True
return optional_params
def update_optional_params_with_thinking_tokens(

View File

@ -3,7 +3,7 @@ from typing import Any, List, Optional, cast
from httpx import Response
from litellm import verbose_logger
from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
from litellm.litellm_core_utils.prompt_templates.common_utils import (
_parse_content_for_reasoning,
)
from litellm.llms.base_llm.base_model_iterator import BaseModelResponseIterator

View File

@ -118,7 +118,6 @@ class BaseLLMHTTPHandler:
response: Optional[httpx.Response] = None
for i in range(max(max_retry_on_unprocessable_entity_error, 1)):
try:
response = await async_httpx_client.post(
url=api_base,
headers=headers,
@ -2221,7 +2220,9 @@ class BaseLLMHTTPHandler:
if isinstance(transformed_request, dict) and "method" in transformed_request:
# Handle pre-signed requests (e.g., from Bedrock S3 uploads)
upload_response = getattr(sync_httpx_client, transformed_request["method"].lower())(
upload_response = getattr(
sync_httpx_client, transformed_request["method"].lower()
)(
url=transformed_request["url"],
headers=transformed_request["headers"],
data=transformed_request["data"],
@ -2233,8 +2234,8 @@ class BaseLLMHTTPHandler:
# Handle traditional file uploads
# Ensure transformed_request is a string for httpx compatibility
if isinstance(transformed_request, bytes):
transformed_request = transformed_request.decode('utf-8')
transformed_request = transformed_request.decode("utf-8")
# Use the HTTP method specified by the provider config
http_method = provider_config.file_upload_http_method.upper()
if http_method == "PUT":
@ -2314,7 +2315,7 @@ class BaseLLMHTTPHandler:
)
else:
async_httpx_client = client
#########################################################
# Debug Logging
#########################################################
@ -2330,7 +2331,9 @@ class BaseLLMHTTPHandler:
if isinstance(transformed_request, dict) and "method" in transformed_request:
# Handle pre-signed requests (e.g., from Bedrock S3 uploads)
upload_response = await getattr(async_httpx_client, transformed_request["method"].lower())(
upload_response = await getattr(
async_httpx_client, transformed_request["method"].lower()
)(
url=transformed_request["url"],
headers=transformed_request["headers"],
data=transformed_request["data"],
@ -2342,8 +2345,8 @@ class BaseLLMHTTPHandler:
# Handle traditional file uploads
# Ensure transformed_request is a string for httpx compatibility
if isinstance(transformed_request, bytes):
transformed_request = transformed_request.decode('utf-8')
transformed_request = transformed_request.decode("utf-8")
# Use the HTTP method specified by the provider config
http_method = provider_config.file_upload_http_method.upper()
if http_method == "PUT":
@ -2468,9 +2471,14 @@ class BaseLLMHTTPHandler:
sync_httpx_client = client
try:
if isinstance(transformed_request, dict) and "method" in transformed_request:
if (
isinstance(transformed_request, dict)
and "method" in transformed_request
):
# Handle pre-signed requests (e.g., from Bedrock with AWS auth)
batch_response = getattr(sync_httpx_client, transformed_request["method"].lower())(
batch_response = getattr(
sync_httpx_client, transformed_request["method"].lower()
)(
url=transformed_request["url"],
headers=transformed_request["headers"],
data=transformed_request["data"],
@ -2500,8 +2508,11 @@ class BaseLLMHTTPHandler:
)
# Store original request for response transformation
litellm_params_with_request = {**litellm_params, "original_batch_request": create_batch_data}
litellm_params_with_request = {
**litellm_params,
"original_batch_request": create_batch_data,
}
return provider_config.transform_create_batch_response(
model=model,
raw_response=batch_response,
@ -2531,7 +2542,7 @@ class BaseLLMHTTPHandler:
)
else:
async_httpx_client = client
#########################################################
# Debug Logging
#########################################################
@ -2546,9 +2557,14 @@ class BaseLLMHTTPHandler:
)
try:
if isinstance(transformed_request, dict) and "method" in transformed_request:
if (
isinstance(transformed_request, dict)
and "method" in transformed_request
):
# Handle pre-signed requests (e.g., from Bedrock with AWS auth)
batch_response = await getattr(async_httpx_client, transformed_request["method"].lower())(
batch_response = await getattr(
async_httpx_client, transformed_request["method"].lower()
)(
url=transformed_request["url"],
headers=transformed_request["headers"],
data=transformed_request["data"],
@ -2578,8 +2594,11 @@ class BaseLLMHTTPHandler:
)
# Store original request for response transformation (for async version)
litellm_params_with_request = {**litellm_params, "original_batch_request": create_batch_data or {}}
litellm_params_with_request = {
**litellm_params,
"original_batch_request": create_batch_data or {},
}
return provider_config.transform_create_batch_response(
model=model,
raw_response=batch_response,

View File

@ -1,10 +1,13 @@
from typing import List, Optional
from typing import List, Optional, cast
from litellm.litellm_core_utils.prompt_templates.factory import (
convert_generic_image_chunk_to_openai_image_obj,
convert_to_anthropic_image_obj,
)
from litellm.types.llms.openai import AllMessageValues
from litellm.litellm_core_utils.prompt_templates.image_handling import (
convert_url_to_base64,
)
from litellm.types.llms.openai import AllMessageValues, ChatCompletionFileObject
from litellm.types.llms.vertex_ai import ContentType, PartType
from litellm.utils import supports_reasoning
@ -99,7 +102,8 @@ class GoogleAIStudioGeminiConfig(VertexGeminiConfig):
self, messages: List[AllMessageValues]
) -> List[ContentType]:
"""
Google AI Studio Gemini does not support image urls in messages.
Google AI Studio Gemini does not support HTTP/HTTPS URLs for files.
Convert them to base64 data instead.
"""
for message in messages:
_message_content = message.get("content")
@ -124,4 +128,16 @@ class GoogleAIStudioGeminiConfig(VertexGeminiConfig):
image_obj
)
)
elif element.get("type") == "file":
file_element = cast(ChatCompletionFileObject, element)
file_id = file_element["file"].get("file_id")
if file_id and ("http://" in file_id or "https://" in file_id):
# Convert HTTP/HTTPS file URL to base64 data
try:
base64_data = convert_url_to_base64(file_id)
file_element["file"]["file_data"] = base64_data # type: ignore
file_element["file"].pop("file_id", None) # type: ignore
except Exception:
# If conversion fails, leave as is and let the API handle it
pass
return _gemini_convert_messages_with_history(messages=messages)

View File

@ -0,0 +1,72 @@
"""
Transformation logic for Hosted VLLM rerank
"""
from typing import Optional, Union
import httpx
from litellm.llms.base_llm.audio_transcription.transformation import (
AudioTranscriptionRequestData,
)
from litellm.llms.base_llm.chat.transformation import BaseLLMException
from litellm.llms.openai.transcriptions.whisper_transformation import (
OpenAIWhisperAudioTranscriptionConfig,
)
from litellm.types.utils import FileTypes
class HostedVLLMAudioTranscriptionError(BaseLLMException):
def __init__(
self,
status_code: int,
message: str,
headers: Optional[Union[dict, httpx.Headers]] = None,
):
super().__init__(status_code=status_code, message=message, headers=headers)
class HostedVLLMAudioTranscriptionConfig(OpenAIWhisperAudioTranscriptionConfig):
def __init__(self) -> None:
pass
def get_complete_url(
self,
api_base: Optional[str],
api_key: Optional[str],
model: str,
optional_params: dict,
litellm_params: dict,
stream: Optional[bool] = None,
) -> str:
if api_base:
# Remove trailing slashes and ensure clean base URL
api_base = api_base.rstrip("/")
if not api_base.endswith("/v1/audio/transcriptions"):
api_base = f"{api_base}/v1/audio/transcriptions"
return api_base
raise ValueError("api_base must be provided for Hosted VLLM rerank")
def transform_audio_transcription_request(
self,
model: str,
audio_file: FileTypes,
optional_params: dict,
litellm_params: dict,
) -> AudioTranscriptionRequestData:
"""
Transform the audio transcription request
"""
data = {"model": model, "file": audio_file, **optional_params}
if "response_format" not in data or (
data["response_format"] == "text" or data["response_format"] == "json"
):
data["response_format"] = (
"verbose_json" # ensures 'duration' is received - used for cost calculation
)
return AudioTranscriptionRequestData(
data=data,
)

View File

@ -1,8 +1,9 @@
import os
import uuid
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, TypedDict, Union
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union
import httpx
from typing_extensions import TypedDict
import litellm
from litellm.llms.base_llm.chat.transformation import BaseLLMException

View File

@ -15,8 +15,8 @@ class LMStudioChatConfig(OpenAIGPTConfig):
) -> Tuple[Optional[str], Optional[str]]:
api_base = api_base or get_secret_str("LM_STUDIO_API_BASE") # type: ignore
dynamic_api_key = (
api_key or get_secret_str("LM_STUDIO_API_KEY") or " "
) # vllm does not require an api key
api_key or get_secret_str("LM_STUDIO_API_KEY") or "fake-api-key"
) # LM Studio does not require an api key, but OpenAI client requires non-None value
return api_base, dynamic_api_key
def map_openai_params(

View File

@ -16,9 +16,18 @@ from httpx._models import Headers, Response
from pydantic import BaseModel
import litellm
from litellm.litellm_core_utils.prompt_templates.common_utils import (
_extract_reasoning_content,
convert_content_list_to_str,
extract_images_from_message,
)
from litellm.llms.base_llm.base_model_iterator import BaseModelResponseIterator
from litellm.llms.base_llm.chat.transformation import BaseConfig, BaseLLMException
from litellm.types.llms.ollama import OllamaToolCall, OllamaToolCallFunction
from litellm.types.llms.ollama import (
OllamaChatCompletionMessage,
OllamaToolCall,
OllamaToolCallFunction,
)
from litellm.types.llms.openai import (
AllMessageValues,
ChatCompletionAssistantToolCall,
@ -299,7 +308,23 @@ class OllamaChatConfig(BaseConfig):
)
new_tools.append(ollama_tool_call)
cast(dict, m)["tool_calls"] = new_tools
new_messages.append(m)
reasoning_content, parsed_content = _extract_reasoning_content(
cast(dict, m)
)
content_str = convert_content_list_to_str(cast(AllMessageValues, m))
images = extract_images_from_message(cast(AllMessageValues, m))
ollama_message = OllamaChatCompletionMessage(
role=cast(str, m.get("role")),
)
if reasoning_content is not None:
ollama_message["thinking"] = reasoning_content
if content_str is not None:
ollama_message["content"] = content_str
if images is not None:
ollama_message["images"] = images
new_messages.append(ollama_message)
# Load Config
config = self.get_config()
@ -361,7 +386,7 @@ class OllamaChatConfig(BaseConfig):
del response_json_message["thinking"]
elif response_json_message.get("content") is not None:
# parse reasoning content from content
from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
from litellm.litellm_core_utils.prompt_templates.common_utils import (
_parse_content_for_reasoning,
)

View File

@ -229,7 +229,7 @@ class OllamaConfig(BaseConfig):
model = model.split("/", 1)[1]
api_base = get_secret_str("OLLAMA_API_BASE") or "http://localhost:11434"
api_key = self.get_api_key()
headers = { "Authorization": f"Bearer {api_key}" } if api_key else {}
headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
try:
response = litellm.module_level_client.post(
@ -279,7 +279,7 @@ class OllamaConfig(BaseConfig):
api_key: Optional[str] = None,
json_mode: Optional[bool] = None,
) -> ModelResponse:
from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
from litellm.litellm_core_utils.prompt_templates.common_utils import (
_parse_content_for_reasoning,
)

View File

@ -1,5 +1,8 @@
from typing import List
from litellm.llms.base_llm.audio_transcription.transformation import (
AudioTranscriptionRequestData,
)
from litellm.types.llms.openai import OpenAIAudioTranscriptionOptionalParams
from litellm.types.utils import FileTypes
@ -27,8 +30,12 @@ class OpenAIGPTAudioTranscriptionConfig(OpenAIWhisperAudioTranscriptionConfig):
audio_file: FileTypes,
optional_params: dict,
litellm_params: dict,
) -> dict:
) -> AudioTranscriptionRequestData:
"""
Transform the audio transcription request
"""
return {"model": model, "file": audio_file, **optional_params}
data = {"model": model, "file": audio_file, **optional_params}
return AudioTranscriptionRequestData(
data=data,
)

View File

@ -1,4 +1,4 @@
from typing import Optional, Union
from typing import Optional, Union, cast
import httpx
from openai import AsyncOpenAI, OpenAI
@ -34,6 +34,7 @@ class OpenAIAudioTranscription(OpenAIChatCompletion):
- call openai_aclient.audio.transcriptions.create by default
"""
try:
raw_response = (
await openai_aclient.audio.transcriptions.with_raw_response.create(
**data, timeout=timeout
@ -93,15 +94,14 @@ class OpenAIAudioTranscription(OpenAIChatCompletion):
Handle audio transcription request
"""
if provider_config is not None:
data = provider_config.transform_audio_transcription_request(
transformed_data = provider_config.transform_audio_transcription_request(
model=model,
audio_file=audio_file,
optional_params=optional_params,
litellm_params=litellm_params,
)
if not isinstance(data, dict):
raise ValueError("OpenAI transformation route requires a dict")
data = cast(dict, transformed_data.data)
else:
data = {"model": model, "file": audio_file, **optional_params}

View File

@ -1,8 +1,9 @@
from typing import List, Optional, Union
from httpx import Headers
from httpx import Headers, Response
from litellm.llms.base_llm.audio_transcription.transformation import (
AudioTranscriptionRequestData,
BaseAudioTranscriptionConfig,
)
from litellm.llms.base_llm.chat.transformation import BaseLLMException
@ -11,12 +12,40 @@ from litellm.types.llms.openai import (
AllMessageValues,
OpenAIAudioTranscriptionOptionalParams,
)
from litellm.types.utils import FileTypes
from litellm.types.utils import FileTypes, TranscriptionResponse
from ..common_utils import OpenAIError
class OpenAIWhisperAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
def get_complete_url(
self,
api_base: Optional[str],
api_key: Optional[str],
model: str,
optional_params: dict,
litellm_params: dict,
stream: Optional[bool] = None,
) -> str:
"""
OPTIONAL
Get the complete url for the request
Some providers need `model` in `api_base`
"""
## get the api base, attach the endpoint - v1/audio/transcriptions
# strip trailing slash if present
api_base = api_base.rstrip("/") if api_base else ""
# if endswith "/v1"
if api_base and api_base.endswith("/v1"):
api_base = f"{api_base}/audio/transcriptions"
else:
api_base = f"{api_base}/v1/audio/transcriptions"
return api_base or ""
def get_supported_openai_params(
self, model: str
) -> List[OpenAIAudioTranscriptionOptionalParams]:
@ -72,21 +101,22 @@ class OpenAIWhisperAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
audio_file: FileTypes,
optional_params: dict,
litellm_params: dict,
) -> dict:
) -> AudioTranscriptionRequestData:
"""
Transform the audio transcription request
"""
data = {"model": model, "file": audio_file, **optional_params}
if "response_format" not in data or (
data["response_format"] == "text" or data["response_format"] == "json"
):
data[
"response_format"
] = "verbose_json" # ensures 'duration' is received - used for cost calculation
data["response_format"] = (
"verbose_json" # ensures 'duration' is received - used for cost calculation
)
return data
return AudioTranscriptionRequestData(
data=data,
)
def get_error_class(
self, error_message: str, status_code: int, headers: Union[dict, Headers]
@ -96,3 +126,25 @@ class OpenAIWhisperAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
message=error_message,
headers=headers,
)
def transform_audio_transcription_response(
self,
raw_response: Response,
) -> TranscriptionResponse:
try:
raw_response_json = raw_response.json()
except Exception as e:
raise ValueError(
f"Error transforming response to json: {str(e)}\nResponse: {raw_response.text}"
)
if any(
key in raw_response_json
for key in TranscriptionResponse.model_fields.keys()
):
return TranscriptionResponse(**raw_response_json)
else:
raise ValueError(
"Invalid response format. Received response does not match the expected format. Got: ",
raw_response_json,
)

View File

@ -0,0 +1,141 @@
"""
Support for OVHCloud AI Endpoints `/v1/chat/completions` endpoint.
Our unified API follows the OpenAI standard.
More information on our website: https://endpoints.ai.cloud.ovh.net
"""
from typing import Optional, Union, List
import httpx
from litellm import ModelResponseStream, OpenAIGPTConfig, get_model_info, verbose_logger
from litellm.llms.ovhcloud.utils import OVHCloudException
from litellm.llms.base_llm.base_model_iterator import BaseModelResponseIterator
from litellm.llms.base_llm.chat.transformation import BaseLLMException
from litellm.types.llms.openai import AllMessageValues
class OVHCloudChatConfig(OpenAIGPTConfig):
@property
def custom_llm_provider(self) -> Optional[str]:
return "ovhcloud"
def get_supported_openai_params(self, model: str) -> list:
"""
Details about function calling support can be found here:
https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-function-calling?id=kb_article_view&sysparm_article=KB0071907
"""
supports_function_calling: Optional[bool] = None
try:
model_info = get_model_info(model, custom_llm_provider="ovhcloud")
supports_function_calling = model_info.get(
"supports_function_calling", False
)
except Exception as e:
verbose_logger.debug(f"Error getting supported OpenAI params: {e}")
pass
optional_params = super().get_supported_openai_params(model)
if supports_function_calling is not True:
verbose_logger.debug(
"You can see our models supporting function_calling in our catalog: https://endpoints.ai.cloud.ovh.net/catalog "
)
optional_params.remove("tools")
optional_params.remove("tool_choice")
optional_params.remove("function_call")
optional_params.remove("response_format")
return optional_params
def get_complete_url(
self,
api_base: Optional[str],
api_key: Optional[str],
model: str,
optional_params: dict,
litellm_params: dict,
stream: Optional[bool] = None,
) -> str:
api_base = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" if api_base is None else api_base.rstrip("/")
complete_url = f"{api_base}/chat/completions"
return complete_url
def get_error_class(
self,
error_message: str,
status_code: int,
headers: Union[dict, httpx.Headers]
) -> BaseLLMException:
return OVHCloudException(
message=error_message,
status_code=status_code,
headers=headers,
)
def map_openai_params(
self,
non_default_params: dict,
optional_params: dict,
model: str,
drop_params: bool,
) -> dict:
mapped_openai_params = super().map_openai_params(
non_default_params, optional_params, model, drop_params
)
return mapped_openai_params
def transform_request(
self,
model: str,
messages: List[AllMessageValues],
optional_params: dict,
litellm_params: dict,
headers: dict,
) -> dict:
extra_body = optional_params.pop("extra_body", {})
response = super().transform_request(
model, messages, optional_params, litellm_params, headers
)
response.update(extra_body)
return response
class OVHCloudChatCompletionStreamingHandler(BaseModelResponseIterator):
"""
Handler for OVHCloud AI Endpoints streaming chat completion responses
"""
def chunk_parser(self, chunk: dict) -> ModelResponseStream:
"""
Parse individual chunks from streaming response
"""
try:
if "error" in chunk:
error_chunk = chunk["error"]
error_message = "OVHCloud Error: {}".format(
error_chunk.get("message", "Unknown error")
)
raise OVHCloudException(
message=error_message,
status_code=error_chunk.get("code", 400),
headers={"Content-Type": "application/json"},
)
new_choices = []
for choice in chunk["choices"]:
if "delta" in choice and "reasoning" in choice["delta"]:
choice["delta"]["reasoning_content"] = choice["delta"].get("reasoning")
new_choices.append(choice)
return ModelResponseStream(
id=chunk["id"],
object="chat.completion.chunk",
created=chunk["created"],
usage=chunk.get("usage"),
model=chunk["model"],
choices=new_choices,
)
except KeyError as e:
raise OVHCloudException(
message=f"KeyError: {e}, Got unexpected response from CometAPI: {chunk}",
status_code=400,
headers={"Content-Type": "application/json"},
)
except Exception as e:
raise e

View File

@ -0,0 +1,122 @@
"""
This is OpenAI compatible - no transformation is applied
"""
from typing import List, Optional, Union
import httpx
from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
from litellm.llms.base_llm.chat.transformation import BaseLLMException
from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
from litellm.secret_managers.main import get_secret_str
from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
from litellm.types.utils import EmbeddingResponse, Usage
from ..utils import OVHCloudException
class OVHCloudEmbeddingConfig(BaseEmbeddingConfig):
def __init__(self) -> None:
pass
def get_complete_url(
self,
api_base: Optional[str],
api_key: Optional[str],
model: str,
optional_params: dict,
litellm_params: dict,
stream: Optional[bool] = None,
) -> str:
api_base = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" if api_base is None else api_base.rstrip("/")
complete_url = f"{api_base}/embeddings"
return complete_url
def validate_environment(
self,
headers: dict,
model: str,
messages: List[AllMessageValues],
optional_params: dict,
litellm_params: dict,
api_key: Optional[str] = None,
api_base: Optional[str] = None,
) -> dict:
if api_key is None:
api_key = get_secret_str("OVHCLOUD_API_KEY")
default_headers = {
"Authorization": f"Bearer {api_key}",
"accept": "application/json",
"Content-Type": "application/json",
}
if "Authorization" in headers:
default_headers["Authorization"] = headers["Authorization"]
return {**default_headers, **headers}
def get_supported_openai_params(self, model: str):
return []
def map_openai_params(
self,
non_default_params: dict,
optional_params: dict,
model: str,
drop_params: bool,
):
supported_openai_params = self.get_supported_openai_params(model)
for param, value in non_default_params.items():
if param in supported_openai_params:
optional_params[param] = value
return optional_params
def transform_embedding_request(
self,
model: str,
input: AllEmbeddingInputValues,
optional_params: dict,
headers: dict,
) -> dict:
return {"input": input, "model": model, **optional_params}
def transform_embedding_response(
self,
model: str,
raw_response: httpx.Response,
model_response: EmbeddingResponse,
logging_obj: LiteLLMLoggingObj,
api_key: Optional[str],
request_data: dict,
optional_params: dict,
litellm_params: dict,
) -> EmbeddingResponse:
try:
raw_response_json = raw_response.json()
except Exception:
raise OVHCloudException(
message=raw_response.text,
status_code=raw_response.status_code,
headers=raw_response.headers,
)
model_response.model = raw_response_json.get("model")
model_response.data = raw_response_json.get("data")
model_response.object = raw_response_json.get("object")
usage = Usage(
prompt_tokens=raw_response_json.get("usage", {}).get("prompt_tokens", 0),
total_tokens=raw_response_json.get("usage", {}).get("total_tokens", 0),
)
model_response.usage = usage
return model_response
def get_error_class(
self, error_message: str, status_code: int, headers: Union[dict, httpx.Headers]
) -> BaseLLMException:
return OVHCloudException(
message=error_message, status_code=status_code, headers=headers
)

View File

@ -0,0 +1,6 @@
from litellm.llms.base_llm.chat.transformation import BaseLLMException
class OVHCloudException(BaseLLMException):
"""OVHCloud AI Endpoints exception handling class"""
pass

View File

@ -1,6 +1,7 @@
from typing import Optional, TypedDict, Union
from typing import Optional, Union
import httpx
from typing_extensions import TypedDict
import litellm
from litellm.llms.custom_httpx.http_handler import (

View File

@ -3,7 +3,9 @@ Types for Vertex Embeddings Requests
"""
from enum import Enum
from typing import List, Optional, TypedDict, Union
from typing import List, Optional, Union
from typing_extensions import TypedDict
class TaskType(str, Enum):

View File

@ -164,6 +164,7 @@ from .llms.openai.openai import OpenAIChatCompletion
from .llms.openai.transcriptions.handler import OpenAIAudioTranscription
from .llms.openai_like.chat.handler import OpenAILikeChatHandler
from .llms.openai_like.embedding.handler import OpenAILikeEmbeddingHandler
from .llms.ovhcloud.chat.transformation import OVHCloudChatConfig
from .llms.petals.completion import handler as petals_handler
from .llms.predibase.chat.handler import PredibaseChatCompletion
from .llms.replicate.chat.handler import completion as replicate_chat_completion
@ -259,6 +260,7 @@ sagemaker_chat_completion = SagemakerChatHandler()
bytez_transformation = BytezChatConfig()
heroku_transformation = HerokuChatConfig()
oci_transformation = OCIChatConfig()
ovhcloud_transformation = OVHCloudChatConfig()
####### COMPLETION ENDPOINTS ################
@ -3535,6 +3537,42 @@ def completion( # type: ignore # noqa: PLR0915
pass
elif custom_llm_provider == "ovhcloud" or model in litellm.ovhcloud_models:
api_key = (
api_key
or litellm.ovhcloud_key
or get_secret_str("OVHCLOUD_API_KEY")
or litellm.api_key
)
api_base = (
api_base
or litellm.api_base
or get_secret_str("OVHCLOUD_API_BASE")
or "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"
)
response = base_llm_http_handler.completion(
model=model,
messages=messages,
headers=headers,
model_response=model_response,
api_key=api_key,
api_base=api_base,
acompletion=acompletion,
logging_obj=logging,
optional_params=optional_params,
litellm_params=litellm_params,
timeout=timeout, # type: ignore
client=client,
custom_llm_provider=custom_llm_provider,
encoding=encoding,
stream=stream,
provider_config=ovhcloud_transformation,
)
pass
elif custom_llm_provider == "custom":
url = litellm.api_base or api_base or ""
if url is None or url == "":
@ -4603,6 +4641,28 @@ def embedding( # noqa: PLR0915
aembedding=aembedding,
headers=headers,
)
elif custom_llm_provider == "ovhcloud":
api_key = api_key or litellm.api_key or get_secret_str("OVHCLOUD_API_KEY")
api_base = (
api_base
or litellm.api_base
or get_secret_str("OVHCLOUD_API_BASE")
or "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"
)
response = base_llm_http_handler.embedding(
model=model,
input=input,
custom_llm_provider=custom_llm_provider,
api_base=api_base,
api_key=api_key,
logging_obj=logging,
timeout=timeout,
model_response=EmbeddingResponse(),
optional_params=optional_params,
client=client,
aembedding=aembedding,
litellm_params={},
)
elif custom_llm_provider in litellm._custom_providers:
custom_handler: Optional[CustomLLM] = None
for item in litellm.custom_provider_map:
@ -5297,7 +5357,10 @@ def transcription(
model_response = litellm.utils.TranscriptionResponse()
model, custom_llm_provider, dynamic_api_key, api_base = get_llm_provider(
model=model, custom_llm_provider=custom_llm_provider, api_base=api_base
model=model,
custom_llm_provider=custom_llm_provider,
api_base=api_base,
api_key=api_key,
) # type: ignore
if dynamic_api_key is not None:
@ -5313,6 +5376,7 @@ def transcription(
custom_llm_provider=custom_llm_provider,
**non_default_params,
)
litellm_params_dict = get_litellm_params(**kwargs)
litellm_logging_obj.update_environment_variables(
@ -5377,9 +5441,8 @@ def transcription(
max_retries=max_retries,
litellm_params=litellm_params_dict,
)
elif (
custom_llm_provider == "openai"
or custom_llm_provider in litellm.openai_compatible_providers
elif custom_llm_provider == "openai" or (
custom_llm_provider in litellm.openai_compatible_providers
):
api_base = (
api_base
@ -5394,6 +5457,7 @@ def transcription(
or None # default - https://github.com/openai/openai-python/blob/284c1799070c723c6a553337134148a7ab088dd8/openai/util.py#L105
)
# set API KEY
api_key = api_key or litellm.api_key or litellm.openai_key or get_secret("OPENAI_API_KEY") # type: ignore
response = openai_audio_transcriptions.audio_transcriptions(
model=model,
@ -5410,10 +5474,7 @@ def transcription(
provider_config=provider_config,
litellm_params=litellm_params_dict,
)
elif custom_llm_provider in [
LlmProviders.DEEPGRAM.value,
LlmProviders.ELEVENLABS.value,
]:
elif provider_config is not None:
response = base_llm_http_handler.audio_transcriptions(
model=model,
audio_file=file,
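
Note on the `ovhcloud` completion branch above: the API key and base URL are each resolved with a chained `or`, so the first non-empty source wins. A minimal sketch of that precedence follows, assuming secrets come from plain environment variables (the real `get_secret_str` may consult other secret stores). The helper names `first_set` and `resolve_ovhcloud_settings` are hypothetical and not part of LiteLLM.

```python
import os
from typing import Mapping, Optional, Tuple

# Default base URL copied from the diff above.
DEFAULT_OVHCLOUD_API_BASE = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"


def first_set(*candidates: Optional[str]) -> Optional[str]:
    """Return the first truthy candidate, like a chained `a or b or c`."""
    for value in candidates:
        if value:
            return value
    return None


def resolve_ovhcloud_settings(
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    provider_key: Optional[str] = None,  # stands in for litellm.ovhcloud_key
    global_key: Optional[str] = None,    # stands in for litellm.api_key
    global_base: Optional[str] = None,   # stands in for litellm.api_base
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], str]:
    """Hypothetical helper mirroring the completion branch's fallback order."""
    env = os.environ if env is None else env
    key = first_set(api_key, provider_key, env.get("OVHCLOUD_API_KEY"), global_key)
    base = first_set(
        api_base,
        global_base,
        env.get("OVHCLOUD_API_BASE"),
        DEFAULT_OVHCLOUD_API_BASE,
    )
    return key, base  # base is never None because the default is always set


if __name__ == "__main__":
    print(resolve_ovhcloud_settings(env={"OVHCLOUD_API_KEY": "example-key"}))
```

In this sketch an explicit argument always beats the environment, and the hard-coded endpoint is used only when nothing else is configured.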

View File

@ -20777,5 +20777,207 @@
"metadata": {
"notes": "Volcengine Doubao embedding model - text-240715 version with 2560 dimensions"
}
},
"ovhcloud/Qwen2.5-VL-72B-Instruct": {
"max_tokens": 32000,
"max_input_tokens": 32000,
"max_output_tokens": 32000,
"input_cost_per_token": 9.1e-07,
"output_cost_per_token": 9.1e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": false,
"supports_response_schema": true,
"supports_tool_choice": false,
"supports_vision": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/qwen2-5-vl-72b-instruct"
},
"ovhcloud/llava-v1.6-mistral-7b-hf": {
"max_tokens": 32000,
"max_input_tokens": 32000,
"max_output_tokens": 32000,
"input_cost_per_token": 2.9e-07,
"output_cost_per_token": 2.9e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": false,
"supports_response_schema": true,
"supports_tool_choice": false,
"supports_vision": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/llava-next-mistral-7b"
},
"ovhcloud/gpt-oss-120b": {
"max_tokens": 131000,
"max_input_tokens": 131000,
"max_output_tokens": 131000,
"input_cost_per_token": 8e-08,
"output_cost_per_token": 4e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": false,
"supports_response_schema": true,
"supports_tool_choice": false,
"supports_reasoning": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/gpt-oss-120b"
},
"ovhcloud/Meta-Llama-3_3-70B-Instruct": {
"max_tokens": 131000,
"max_input_tokens": 131000,
"max_output_tokens": 131000,
"input_cost_per_token": 6.7e-07,
"output_cost_per_token": 6.7e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/meta-llama-3-3-70b-instruct"
},
"ovhcloud/Qwen2.5-Coder-32B-Instruct": {
"max_tokens": 32000,
"max_input_tokens": 32000,
"max_output_tokens": 32000,
"input_cost_per_token": 8.7e-07,
"output_cost_per_token": 8.7e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": false,
"supports_response_schema": true,
"supports_tool_choice": false,
"source": "https://endpoints.ai.cloud.ovh.net/models/qwen2-5-coder-32b-instruct"
},
"ovhcloud/Mixtral-8x7B-Instruct-v0.1": {
"max_tokens": 32000,
"max_input_tokens": 32000,
"max_output_tokens": 32000,
"input_cost_per_token": 6.3e-07,
"output_cost_per_token": 6.3e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": false,
"supports_response_schema": true,
"supports_tool_choice": false,
"source": "https://endpoints.ai.cloud.ovh.net/models/mixtral-8x7b-instruct-v0-1"
},
"ovhcloud/Meta-Llama-3_1-70B-Instruct": {
"max_tokens": 131000,
"max_input_tokens": 131000,
"max_output_tokens": 131000,
"input_cost_per_token": 6.7e-07,
"output_cost_per_token": 6.7e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": false,
"supports_response_schema": false,
"supports_tool_choice": false,
"source": "https://endpoints.ai.cloud.ovh.net/models/meta-llama-3-1-70b-instruct"
},
"ovhcloud/Mistral-Small-3.2-24B-Instruct-2506": {
"max_tokens": 128000,
"max_input_tokens": 128000,
"max_output_tokens": 128000,
"input_cost_per_token": 9e-08,
"output_cost_per_token": 2.8e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_vision": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/mistral-small-3-2-24b-instruct-2506"
},
"ovhcloud/DeepSeek-R1-Distill-Llama-70B": {
"max_tokens": 131000,
"max_input_tokens": 131000,
"max_output_tokens": 131000,
"input_cost_per_token": 6.7e-07,
"output_cost_per_token": 6.7e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_reasoning": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/deepseek-r1-distill-llama-70b"
},
"ovhcloud/Llama-3.1-8B-Instruct": {
"max_tokens": 131000,
"max_input_tokens": 131000,
"max_output_tokens": 131000,
"input_cost_per_token": 1e-07,
"output_cost_per_token": 1e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/llama-3-1-8b-instruct"
},
"ovhcloud/Mistral-7B-Instruct-v0.3": {
"max_tokens": 127000,
"max_input_tokens": 127000,
"max_output_tokens": 127000,
"input_cost_per_token": 1e-07,
"output_cost_per_token": 1e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/mistral-7b-instruct-v0-3"
},
"ovhcloud/gpt-oss-20b": {
"max_tokens": 131000,
"max_input_tokens": 131000,
"max_output_tokens": 131000,
"input_cost_per_token": 4e-08,
"output_cost_per_token": 1.5e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": false,
"supports_response_schema": true,
"supports_tool_choice": false,
"supports_reasoning": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/gpt-oss-20b"
},
"ovhcloud/Mistral-Nemo-Instruct-2407": {
"max_tokens": 118000,
"max_input_tokens": 118000,
"max_output_tokens": 118000,
"input_cost_per_token": 1.3e-07,
"output_cost_per_token": 1.3e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/mistral-nemo-instruct-2407"
},
"ovhcloud/Qwen3-32B": {
"max_tokens": 32000,
"max_input_tokens": 32000,
"max_output_tokens": 32000,
"input_cost_per_token": 8e-08,
"output_cost_per_token": 2.3e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": true,
"supports_response_schema": true,
"supports_tool_choice": true,
"supports_reasoning": true,
"source": "https://endpoints.ai.cloud.ovh.net/models/qwen3-32b"
},
"ovhcloud/mamba-codestral-7B-v0.1": {
"max_tokens": 256000,
"max_input_tokens": 256000,
"max_output_tokens": 256000,
"input_cost_per_token": 1.9e-07,
"output_cost_per_token": 1.9e-07,
"litellm_provider": "ovhcloud",
"mode": "chat",
"supports_function_calling": false,
"supports_response_schema": true,
"supports_tool_choice": false,
"source": "https://endpoints.ai.cloud.ovh.net/models/mamba-codestral-7b-v0-1"
}
}
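
Note on the pricing entries above: each model carries `input_cost_per_token` and `output_cost_per_token`, so a request's cost is a weighted sum of its token counts. A minimal sketch follows, using two entries copied from the JSON above. The function `estimate_cost` is a hypothetical illustration, not LiteLLM's own cost calculator.

```python
from typing import Dict

# Two entries copied from the pricing JSON above (cost fields only).
PRICING: Dict[str, Dict[str, float]] = {
    "ovhcloud/gpt-oss-120b": {
        "input_cost_per_token": 8e-08,
        "output_cost_per_token": 4e-07,
    },
    "ovhcloud/Qwen3-32B": {
        "input_cost_per_token": 8e-08,
        "output_cost_per_token": 2.3e-07,
    },
}


def estimate_cost(
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    pricing: Dict[str, Dict[str, float]] = PRICING,
) -> float:
    """Hypothetical helper: tokens multiplied by the per-token prices."""
    entry = pricing[model]  # raises KeyError for models without an entry
    return (
        prompt_tokens * entry["input_cost_per_token"]
        + completion_tokens * entry["output_cost_per_token"]
    )


if __name__ == "__main__":
    # One million input tokens plus one million output tokens.
    print(estimate_cost("ovhcloud/gpt-oss-120b", 1_000_000, 1_000_000))
```

Models whose input and output prices are equal, such as `ovhcloud/Meta-Llama-3_3-70B-Instruct`, reduce to a single rate applied to the total token count.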

File diff suppressed because one or more lines are too long

View File

@ -1 +0,0 @@
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{6580:function(n,e,t){Promise.resolve().then(t.t.bind(t,39974,23)),Promise.resolve().then(t.t.bind(t,2778,23))},2778:function(){},39974:function(n){n.exports={style:{fontFamily:"'__Inter_b0dd8a', '__Inter_Fallback_b0dd8a'",fontStyle:"normal"},className:"__className_b0dd8a"}}},function(n){n.O(0,[919,986,971,117,744],function(){return n(n.s=6580)}),_N_E=n.O()}]);

View File

@ -0,0 +1 @@
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{96443:function(n,e,t){Promise.resolve().then(t.t.bind(t,39974,23)),Promise.resolve().then(t.t.bind(t,2778,23))},2778:function(){},39974:function(n){n.exports={style:{fontFamily:"'__Inter_b0dd8a', '__Inter_Fallback_b0dd8a'",fontStyle:"normal"},className:"__className_b0dd8a"}}},function(n){n.O(0,[919,986,971,117,744],function(){return n(n.s=96443)}),_N_E=n.O()}]);

View File

@ -1 +1 @@
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[418],{11790:function(e,n,t){Promise.resolve().then(t.bind(t,52829))},52829:function(e,n,t){"use strict";t.r(n),t.d(n,{default:function(){return f}});var u=t(57437),s=t(2265),c=t(99376),r=t(72162);function f(){let e=(0,c.useSearchParams)().get("key"),[n,t]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&t(e)},[e]),(0,u.jsx)(r.Z,{accessToken:n})}}},function(e){e.O(0,[50,521,154,162,971,117,744],function(){return e(e.s=11790)}),_N_E=e.O()}]);
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[418],{21024:function(e,n,t){Promise.resolve().then(t.bind(t,52829))},52829:function(e,n,t){"use strict";t.r(n),t.d(n,{default:function(){return f}});var u=t(57437),s=t(2265),c=t(99376),r=t(72162);function f(){let e=(0,c.useSearchParams)().get("key"),[n,t]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&t(e)},[e]),(0,u.jsx)(r.Z,{accessToken:n})}}},function(e){e.O(0,[50,521,154,162,971,117,744],function(){return e(e.s=21024)}),_N_E=e.O()}]);

View File

@ -1 +1 @@
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[25],{58538:function(e,n,u){Promise.resolve().then(u.bind(u,22775))},22775:function(e,n,u){"use strict";u.r(n),u.d(n,{default:function(){return f}});var t=u(57437),s=u(2265),r=u(99376),c=u(36172);function f(){let e=(0,r.useSearchParams)().get("key"),[n,u]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&u(e)},[e]),(0,t.jsx)(c.Z,{accessToken:n,publicPage:!0,premiumUser:!1,userRole:null})}}},function(e){e.O(0,[50,521,866,154,162,172,971,117,744],function(){return e(e.s=58538)}),_N_E=e.O()}]);
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[25],{64563:function(e,n,u){Promise.resolve().then(u.bind(u,22775))},22775:function(e,n,u){"use strict";u.r(n),u.d(n,{default:function(){return f}});var t=u(57437),s=u(2265),r=u(99376),c=u(36172);function f(){let e=(0,r.useSearchParams)().get("key"),[n,u]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&u(e)},[e]),(0,t.jsx)(c.Z,{accessToken:n,publicPage:!0,premiumUser:!1,userRole:null})}}},function(e){e.O(0,[50,521,866,154,162,172,971,117,744],function(){return e(e.s=64563)}),_N_E=e.O()}]);

View File

@ -1 +1 @@
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[744],{20169:function(e,n,t){Promise.resolve().then(t.t.bind(t,12846,23)),Promise.resolve().then(t.t.bind(t,19107,23)),Promise.resolve().then(t.t.bind(t,61060,23)),Promise.resolve().then(t.t.bind(t,4707,23)),Promise.resolve().then(t.t.bind(t,80,23)),Promise.resolve().then(t.t.bind(t,36423,23))}},function(e){var n=function(n){return e(e.s=n)};e.O(0,[971,117],function(){return n(54278),n(20169)}),_N_E=e.O()}]);
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[744],{10264:function(e,n,t){Promise.resolve().then(t.t.bind(t,12846,23)),Promise.resolve().then(t.t.bind(t,19107,23)),Promise.resolve().then(t.t.bind(t,61060,23)),Promise.resolve().then(t.t.bind(t,4707,23)),Promise.resolve().then(t.t.bind(t,80,23)),Promise.resolve().then(t.t.bind(t,36423,23))}},function(e){var n=function(n){return e(e.s=n)};e.O(0,[971,117],function(){return n(54278),n(10264)}),_N_E=e.O()}]);

View File

@ -1,7 +1,7 @@
2:I[19107,[],"ClientPageRoot"]
3:I[30628,["665","static/chunks/3014691f-b7b79b78e27792f3.js","990","static/chunks/13b76428-ebdf3012af0e4489.js","50","static/chunks/50-fe160ecfa8bc4059.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","220","static/chunks/220-5061c4cea850d728.js","154","static/chunks/154-fff436ed72b19a24.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","931","static/chunks/app/page-127adcf8da2b5294.js"],"default",1]
3:I[30628,["665","static/chunks/3014691f-b7b79b78e27792f3.js","990","static/chunks/13b76428-ebdf3012af0e4489.js","50","static/chunks/50-bb8a11a7610535aa.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","220","static/chunks/220-8af5927d18414264.js","154","static/chunks/154-6f752d9e0a5e497b.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","931","static/chunks/app/page-338773f18570e0d6.js"],"default",1]
4:I[4707,[],""]
5:I[36423,[],""]
0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be 
found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be 
found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
1:null

View File

@ -1,7 +1,7 @@
2:I[19107,[],"ClientPageRoot"]
3:I[52829,["50","static/chunks/50-fe160ecfa8bc4059.js","521","static/chunks/521-d97d355792d44830.js","154","static/chunks/154-fff436ed72b19a24.js","162","static/chunks/162-4e7640b4d68e1ae4.js","418","static/chunks/app/model_hub/page-d6e5fb7de2cedde9.js"],"default",1]
3:I[52829,["50","static/chunks/50-bb8a11a7610535aa.js","521","static/chunks/521-d97d355792d44830.js","154","static/chunks/154-6f752d9e0a5e497b.js","162","static/chunks/162-4e7640b4d68e1ae4.js","418","static/chunks/app/model_hub/page-0dbadf20167b786c.js"],"default",1]
4:I[4707,[],""]
5:I[36423,[],""]
0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["model_hub",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["model_hub",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
1:null

View File

@ -1,7 +1,7 @@
2:I[19107,[],"ClientPageRoot"]
3:I[22775,["50","static/chunks/50-fe160ecfa8bc4059.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","154","static/chunks/154-fff436ed72b19a24.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","25","static/chunks/app/model_hub_table/page-e06e934de1021ee4.js"],"default",1]
3:I[22775,["50","static/chunks/50-bb8a11a7610535aa.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","154","static/chunks/154-6f752d9e0a5e497b.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","25","static/chunks/app/model_hub_table/page-f469bae327299fbb.js"],"default",1]
4:I[4707,[],""]
5:I[36423,[],""]
0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["model_hub_table",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub_table",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub_table","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["model_hub_table",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub_table",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub_table","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
1:null

View File

@ -1,7 +1,7 @@
2:I[19107,[],"ClientPageRoot"]
3:I[12011,["665","static/chunks/3014691f-b7b79b78e27792f3.js","50","static/chunks/50-fe160ecfa8bc4059.js","154","static/chunks/154-fff436ed72b19a24.js","461","static/chunks/app/onboarding/page-3c5840c907b0a5c8.js"],"default",1]
3:I[12011,["665","static/chunks/3014691f-b7b79b78e27792f3.js","50","static/chunks/50-bb8a11a7610535aa.js","154","static/chunks/154-6f752d9e0a5e497b.js","461","static/chunks/app/onboarding/page-7828c2c64e97362a.js"],"default",1]
4:I[4707,[],""]
5:I[36423,[],""]
0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["onboarding",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["onboarding",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","onboarding","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["onboarding",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["onboarding",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","onboarding","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
1:null

View File

@ -1,12 +1,20 @@
model_list:
- model_name: fake-openai-endpoint
- model_name: byok-fixed-gpt-4o-mini
litellm_params:
model: openai/fake
api_key: fake-key
api_base: https://exampleopenaiendpoint-production.up.railway.app/
- model_name: wildcard_models/*
model: openai/gpt-4o-mini
api_base: "https://webhook.site/2f385e05-00aa-402b-86d1-efc9261471a5"
api_key: dummy
- model_name: "byok-wildcard/*"
litellm_params:
model: openai/*
- model_name: xai-grok-3
litellm_params:
model: xai/grok-3
- model_name: hosted_vllm/whisper-v3
litellm_params:
model: hosted_vllm/whisper-v3
api_base: "https://webhook.site/2f385e05-00aa-402b-86d1-efc9261471a5"
api_key: dummy

View File

@ -1949,7 +1949,7 @@ class LiteLLM_OrganizationMembershipTable(LiteLLMPydanticObjectBase):
model_config = ConfigDict(protected_namespaces=())
class LiteLLM_OrganizationTableUpdate(LiteLLMPydanticObjectBase):
class LiteLLM_OrganizationTableUpdate(LiteLLM_BudgetTable):
"""Represents user-controllable params for a LiteLLM_OrganizationTable record"""
organization_id: Optional[str] = None

View File

@ -1211,7 +1211,6 @@ def _check_model_access_helper(
models: List[str],
team_model_aliases: Optional[Dict[str, str]] = None,
team_id: Optional[str] = None,
object_type: Literal["user", "team", "key", "org"] = "user",
) -> bool:
## check if model in allowed model names
from collections import defaultdict
@ -1316,7 +1315,6 @@ def _can_object_call_model(
models=models,
team_model_aliases=team_model_aliases,
team_id=team_id,
object_type=object_type,
):
return True

View File

@ -0,0 +1,36 @@
model_list:
- model_name: claude-3-5-sonnet
litellm_params:
model: anthropic/claude-3-5-sonnet-20241022
api_key: os.environ/ANTHROPIC_API_KEY
guardrails:
- guardrail_name: "tool-permission-guardrail"
litellm_params:
guardrail: tool_permission
mode: "post_call"
default_on: true # Apply to all requests by default
rules:
- id: "allow_bash"
tool_name: "Bash"
decision: "allow"
- id: "allow_github_mcp"
tool_name: "mcp__github_*"
decision: "allow"
- id: "allow_aws_documentation"
tool_name: "mcp__aws-documentation_*_documentation"
decision: "allow"
- id: "deny_read_commands"
tool_name: "Read"
decision: "deny"
default_action: "deny" # deny by default if no rule matches
on_disallowed_action: "block" # reject the request when a tool is denied ("rewrite" strips the denied tools instead)
# Optional: Configure general settings
general_settings:
master_key: sk-1234
# Optional: Add logging configuration
litellm_settings:
success_callback: ["langfuse"]
failure_callback: ["langfuse"]
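
The rules in this config are evaluated in order: the first rule whose `tool_name` pattern matches decides the outcome, and `default_action` applies when nothing matches. The following is a minimal standalone sketch of that evaluation, assuming the same `re.escape` plus `re.fullmatch` approach the guardrail's `_matches_pattern` uses. `RULES`, `matches`, and `is_allowed` are illustrative names, not part of LiteLLM.

```python
import re
from typing import Optional, Tuple

# Rules modelled on the example config: (id, tool_name pattern, decision).
# Order matters because the first matching rule wins.
RULES = [
    ("allow_bash", "Bash", "allow"),
    ("allow_github_mcp", "mcp__github_*", "allow"),
    ("allow_aws_documentation", "mcp__aws-documentation_*_documentation", "allow"),
    ("deny_read_commands", "Read", "deny"),
]
DEFAULT_ACTION = "deny"  # used when no rule matches


def matches(tool_name: str, pattern: str) -> bool:
    """Exact match, or glob-style match where '*' stands for any run of characters."""
    if "*" not in pattern:
        return tool_name == pattern
    # Escape regex metacharacters, then turn the escaped '*' back into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, tool_name) is not None


def is_allowed(tool_name: str) -> Tuple[bool, Optional[str]]:
    """Return (allowed, id of the rule that decided, or None for the default)."""
    for rule_id, pattern, decision in RULES:
        if matches(tool_name, pattern):
            return decision == "allow", rule_id
    return DEFAULT_ACTION == "allow", None


if __name__ == "__main__":
    for name in ("Bash", "mcp__github_create_issue", "Read", "Write"):
        print(name, is_allowed(name))
```

Matching is case-sensitive and anchored at both ends, so `bash` does not match the `Bash` rule and falls through to the default action.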

View File

@ -18,6 +18,7 @@ def initialize_guardrail(litellm_params: "LitellmParams", guardrail: "Guardrail"
application_id=litellm_params.application_id,
monitor_mode=litellm_params.monitor_mode,
block_failures=litellm_params.block_failures,
anonymize_input=litellm_params.anonymize_input,
event_hook=litellm_params.mode,
default_on=litellm_params.default_on,
)

View File

@ -5,9 +5,10 @@
#
# +-------------------------------------------------------------+
import asyncio
import copy
import os
from typing import Any, Dict, Literal, Optional, Union
from typing import Any, Dict, Final, Literal, Optional, Union
from urllib.parse import urljoin
from fastapi import HTTPException
@ -24,6 +25,15 @@ from litellm.proxy._types import UserAPIKeyAuth
from litellm.types.guardrails import GuardrailEventHooks
from litellm.types.utils import EmbeddingResponse, ImageResponse
# Constants
USER_ROLE: Final[Literal["user"]] = "user"
ASSISTANT_ROLE: Final[Literal["assistant"]] = "assistant"
SENSITIVE_DATA_DETECTOR_KEYS: Final[list[str]] = ["sensitiveData", "dataDetector"]
# Type aliases
MessageRole = Literal["user", "assistant"]
LLMResponse = Union[Any, ModelResponse, EmbeddingResponse, ImageResponse]
class NomaBlockedMessage(HTTPException):
"""Exception raised when Noma guardrail blocks a message"""
@ -77,6 +87,7 @@ class NomaBlockedMessage(HTTPException):
"allowedTopics",
"bannedTopics",
"topicGuardrails",
"topicDetector", # Mock name for tests
] and isinstance(value, dict):
filtered_topics = {}
for topic, topic_result in value.items():
@ -86,7 +97,7 @@ class NomaBlockedMessage(HTTPException):
if filtered_topics:
result[key] = filtered_topics
elif key == "sensitiveData" and isinstance(value, dict):
elif key in SENSITIVE_DATA_DETECTOR_KEYS and isinstance(value, dict):
filtered_sensitive = {}
for data_type, data_result in value.items():
if self._is_result_true(data_result):
@ -135,6 +146,7 @@ class NomaGuardrail(CustomGuardrail):
application_id: Optional[str] = None,
monitor_mode: Optional[bool] = None,
block_failures: Optional[bool] = None,
anonymize_input: Optional[bool] = None,
**kwargs,
):
self.async_handler = get_async_httpx_client(
@ -162,8 +174,326 @@ class NomaGuardrail(CustomGuardrail):
else:
self.block_failures = block_failures
if anonymize_input is None:
self.anonymize_input = (
os.environ.get("NOMA_ANONYMIZE_INPUT", "false").lower() == "true"
)
else:
self.anonymize_input = anonymize_input
super().__init__(**kwargs)
def _create_background_noma_check(
self,
coro,
) -> None:
"""Create a background task for Noma API calls without blocking the main flow"""
try:
asyncio.create_task(coro)
except Exception as e:
verbose_proxy_logger.error(
f"Failed to create background Noma task: {str(e)}"
)
async def _process_user_message_check(
self,
request_data: dict,
user_auth: UserAPIKeyAuth,
) -> Optional[str]:
"""Shared logic for processing user message checks"""
extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
user_message = await self._extract_user_message(request_data)
if not user_message:
return None
payload = {"request": {"text": user_message}}
response_json = await self._call_noma_api(
payload=payload,
llm_request_id=None,
request_data=request_data,
user_auth=user_auth,
extra_data=extra_data,
)
if self.monitor_mode:
await self._handle_verdict_background(
USER_ROLE, user_message, response_json
)
return user_message
# Check if we should anonymize content
if self._should_anonymize(response_json, USER_ROLE):
anonymized_content = self._extract_anonymized_content(
response_json, USER_ROLE
)
if anonymized_content:
# Replace the user message content with anonymized version
self._replace_user_message_content(request_data, anonymized_content)
verbose_proxy_logger.debug(
f"Noma guardrail anonymized user message: {anonymized_content}"
)
return anonymized_content
await self._check_verdict(USER_ROLE, user_message, response_json)
return user_message
async def _process_llm_response_check(
self,
request_data: dict,
response: LLMResponse,
user_auth: UserAPIKeyAuth,
) -> Optional[str]:
"""Shared logic for processing LLM response checks"""
extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
if not isinstance(response, litellm.ModelResponse):
return None
content = None
for choice in response.choices:
if isinstance(choice, litellm.Choices) and choice.message.content:
content = choice.message.content
break
if not content or not isinstance(content, str):
return None
payload = {"response": {"text": content}}
response_json = await self._call_noma_api(
payload=payload,
llm_request_id=response.id,
request_data=request_data,
user_auth=user_auth,
extra_data=extra_data,
)
if self.monitor_mode:
await self._handle_verdict_background(
ASSISTANT_ROLE, content, response_json
)
return content
# Check if we should anonymize content
if self._should_anonymize(response_json, ASSISTANT_ROLE):
anonymized_content = self._extract_anonymized_content(
response_json, ASSISTANT_ROLE
)
if anonymized_content:
# Replace the LLM response content with anonymized version
self._replace_llm_response_content(response, anonymized_content)
verbose_proxy_logger.debug(
f"Noma guardrail anonymized LLM response: {anonymized_content}"
)
return anonymized_content
await self._check_verdict(ASSISTANT_ROLE, content, response_json)
return content
def _should_only_sensitive_data_failed(self, classification_obj: dict) -> bool:
"""
Check if only sensitive data detectors (PII, PCI, secrets) have result=true in the classification.
Args:
classification_obj: The prompt or response classification object from Noma API
Returns:
True if only sensitiveData detectors have result=true, False otherwise
"""
if not classification_obj:
return False
# Track which detectors have result=true (detected violations)
failed_detectors = []
sensitive_data_detected = False
for key, value in classification_obj.items():
if key in SENSITIVE_DATA_DETECTOR_KEYS and isinstance(value, dict):
# Check if any sensitive data detector has result=true
for data_type, data_result in value.items():
if self._is_result_true(data_result):
sensitive_data_detected = True
# Don't add to failed_detectors as we want to allow these
elif isinstance(value, dict) and "result" in value:
# Check other detectors - these should NOT have result=true
if self._is_result_true(value):
failed_detectors.append(key)
elif isinstance(value, dict):
# Handle nested detectors
for nested_key, nested_value in value.items():
if self._is_result_true(nested_value):
failed_detectors.append(f"{key}.{nested_key}")
# Return True only if sensitive data was detected AND no other detectors have result=true
return sensitive_data_detected and len(failed_detectors) == 0
def _extract_anonymized_content(
self, response_json: dict, message_type: MessageRole
) -> Optional[str]:
"""
Extract anonymized content from Noma API response.
Args:
response_json: The full response from Noma API
message_type: Either 'user' or 'assistant' to determine which content to extract
Returns:
The anonymized content string if available, None otherwise
"""
original_response = response_json.get("originalResponse", {})
if message_type == USER_ROLE:
prompt_data = original_response.get("prompt", {})
anonymized_data = prompt_data.get("anonymizedContent", {})
return anonymized_data.get("anonymized")
elif message_type == ASSISTANT_ROLE:
response_data = original_response.get("response", {})
anonymized_data = response_data.get("anonymizedContent", {})
return anonymized_data.get("anonymized")
return None
def _should_anonymize(self, response_json: dict, message_type: MessageRole) -> bool:
"""
Determine if content should be anonymized based on Noma API response.
Logic:
- If verdict=True: Content is safe, anonymize if anonymized version exists
- If verdict=False: Check if only sensitiveData detectors have result=True
- If yes: Anonymize
- If no: Block (other violations detected)
Args:
response_json: The full response from Noma API
message_type: Either 'user' or 'assistant' to determine which classification to check
Returns:
True if content should be anonymized, False if it should be blocked
"""
# Only anonymize in blocking mode when anonymize_input is enabled
if self.monitor_mode or not self.anonymize_input:
return False
verdict = response_json.get("verdict", True)
# If verdict is True, anonymize (content is considered safe)
if verdict:
return True
# If verdict is False, check if only sensitive data detectors have result=True
original_response = response_json.get("originalResponse", {})
if message_type == USER_ROLE:
classification_obj = original_response.get("prompt", {})
elif message_type == ASSISTANT_ROLE:
classification_obj = original_response.get("response", {})
else:
return False
# Anonymize only if solely sensitive data (PII/PCI/secrets) was detected
return self._should_only_sensitive_data_failed(classification_obj)
def _is_result_true(self, result_obj: Optional[Dict[str, Any]]) -> bool:
"""
Check if a result object has a "result" field that is True.
Args:
result_obj: A dictionary that may contain a "result" field
Returns:
True if the "result" field exists and is True, False otherwise
"""
if not result_obj or not isinstance(result_obj, dict):
return False
return result_obj.get("result") is True
def _replace_user_message_content(
self, request_data: dict, anonymized_content: str
):
"""
Replace the user message content in request data with anonymized version.
Args:
request_data: The original request data
anonymized_content: The anonymized content to replace with
"""
messages = request_data.get("messages", [])
if not messages:
return
# Find and replace the last user message
for i in range(len(messages) - 1, -1, -1):
if messages[i].get("role") == USER_ROLE:
messages[i]["content"] = anonymized_content
break
def _replace_llm_response_content(
self, response: LLMResponse, anonymized_content: str
):
"""
Replace the LLM response content with anonymized version.
Args:
response: The original LLM response
anonymized_content: The anonymized content to replace with
"""
if not isinstance(response, litellm.ModelResponse):
return
# Replace content in all choices
for choice in response.choices:
if isinstance(choice, litellm.Choices) and choice.message.content:
choice.message.content = anonymized_content
async def _check_user_message_background(
self,
request_data: dict,
user_auth: UserAPIKeyAuth,
) -> None:
"""Check user message in background for monitor mode - non-blocking"""
try:
await self._process_user_message_check(request_data, user_auth)
except Exception as e:
verbose_proxy_logger.error(
f"Noma background user message check failed: {str(e)}"
)
async def _check_llm_response_background(
self,
request_data: dict,
response: LLMResponse,
user_auth: UserAPIKeyAuth,
) -> None:
"""Check LLM response in background for monitor mode - non-blocking"""
try:
await self._process_llm_response_check(request_data, response, user_auth)
except Exception as e:
verbose_proxy_logger.error(
f"Noma background response check failed: {str(e)}"
)
async def _handle_verdict_background(
self,
type: MessageRole,
message: str,
response_json: dict,
) -> None:
"""Handle verdict from Noma API in background - logging only, never blocks"""
try:
if not response_json.get("verdict", True):
msg = f"Noma guardrail blocked {type} message: {message}"
verbose_proxy_logger.warning(msg)
else:
msg = f"Noma guardrail allowed {type} message: {message}"
verbose_proxy_logger.info(msg)
except Exception as e:
verbose_proxy_logger.error(
f"Noma background verdict handling failed: {str(e)}"
)
async def async_pre_call_hook(
self,
user_api_key_dict: UserAPIKeyAuth,
@ -191,6 +521,18 @@ class NomaGuardrail(CustomGuardrail):
):
return data
# In monitor mode, run Noma check in background and return immediately
if self.monitor_mode:
try:
self._create_background_noma_check(
self._check_user_message_background(data, user_api_key_dict)
)
except Exception as e:
verbose_proxy_logger.error(
f"Failed to start background Noma pre-call check: {str(e)}"
)
return data
try:
return await self._check_user_message(data, user_api_key_dict)
except NomaBlockedMessage:
@ -198,7 +540,7 @@ class NomaGuardrail(CustomGuardrail):
except Exception as e:
verbose_proxy_logger.error(f"Noma pre-call hook failed: {str(e)}")
if self.block_failures and not self.monitor_mode:
if self.block_failures:
raise
return data
@ -220,6 +562,18 @@ class NomaGuardrail(CustomGuardrail):
if self.should_run_guardrail(data=data, event_type=event_type) is not True:
return data
# In monitor mode, run Noma check in background and return immediately
if self.monitor_mode:
try:
self._create_background_noma_check(
self._check_user_message_background(data, user_api_key_dict)
)
except Exception as e:
verbose_proxy_logger.error(
f"Failed to start background Noma moderation check: {str(e)}"
)
return data
try:
return await self._check_user_message(data, user_api_key_dict)
except NomaBlockedMessage:
@ -227,7 +581,7 @@ class NomaGuardrail(CustomGuardrail):
except Exception as e:
verbose_proxy_logger.error(f"Noma moderation hook failed: {str(e)}")
if self.block_failures and not self.monitor_mode:
if self.block_failures:
raise
return data
@ -235,19 +589,33 @@ class NomaGuardrail(CustomGuardrail):
self,
data: dict,
user_api_key_dict: UserAPIKeyAuth,
response: Union[Any, ModelResponse, EmbeddingResponse, ImageResponse],
response: LLMResponse,
):
event_type: GuardrailEventHooks = GuardrailEventHooks.post_call
if self.should_run_guardrail(data=data, event_type=event_type) is not True:
return response
# In monitor mode, run Noma check in background and return immediately
if self.monitor_mode:
try:
self._create_background_noma_check(
self._check_llm_response_background(
data, response, user_api_key_dict
)
)
except Exception as e:
verbose_proxy_logger.error(
f"Failed to start background Noma post-call check: {str(e)}"
)
return response
try:
return await self._check_llm_response(data, response, user_api_key_dict)
except NomaBlockedMessage:
raise
except Exception as e:
verbose_proxy_logger.error(f"Noma post-call hook failed: {str(e)}")
if self.block_failures and not self.monitor_mode:
if self.block_failures:
raise
return response
@ -257,55 +625,24 @@ class NomaGuardrail(CustomGuardrail):
user_auth: UserAPIKeyAuth,
) -> Union[Exception, str, dict, None]:
"""Check user message for policy violations"""
extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
user_message = await self._extract_user_message(request_data)
user_message = await self._process_user_message_check(request_data, user_auth)
if not user_message:
return request_data
payload = {"request": {"text": user_message}}
response_json = await self._call_noma_api(
payload=payload,
llm_request_id=None,
request_data=request_data,
user_auth=user_auth,
extra_data=extra_data,
)
await self._check_verdict("user", user_message, response_json)
return request_data
async def _check_llm_response(
self,
request_data: dict,
response: Union[Any, ModelResponse, EmbeddingResponse, ImageResponse],
response: LLMResponse,
user_auth: UserAPIKeyAuth,
) -> Union[Exception, ModelResponse, Any]:
"""Check LLM response for policy violations"""
extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
if not isinstance(response, litellm.ModelResponse):
return response
content = None
for choice in response.choices:
if isinstance(choice, litellm.Choices) and choice.message.content:
content = choice.message.content
break
if not content or not isinstance(content, str):
return response
payload = {"response": {"text": content}}
response_json = await self._call_noma_api(
payload=payload,
llm_request_id=response.id,
request_data=request_data,
user_auth=user_auth,
extra_data=extra_data,
content = await self._process_llm_response_check(
request_data, response, user_auth
)
await self._check_verdict("assistant", content, response_json)
if not content:
return response
return response
@ -316,7 +653,7 @@ class NomaGuardrail(CustomGuardrail):
return None
# Get the last user message
user_messages = [msg for msg in messages if msg.get("role") == "user"]
user_messages = [msg for msg in messages if msg.get("role") == USER_ROLE]
if not user_messages:
return None
@ -371,7 +708,7 @@ class NomaGuardrail(CustomGuardrail):
async def _check_verdict(
self,
type: Literal["user", "assistant"],
type: MessageRole,
message: str,
response_json: dict,
) -> None:
@ -379,11 +716,7 @@ class NomaGuardrail(CustomGuardrail):
Check the verdict from the Noma API and raise an exception if needed
"""
if not response_json.get("verdict", True):
msg = str.format(
"Noma guardrail blocked {type} message: {message}",
type=type,
message=message,
)
msg = f"Noma guardrail blocked {type} message: {message}"
if self.monitor_mode:
verbose_proxy_logger.warning(msg)
@ -392,11 +725,7 @@ class NomaGuardrail(CustomGuardrail):
original_response = response_json.get("originalResponse", {})
raise NomaBlockedMessage(original_response)
else:
msg = str.format(
"Noma guardrail allowed {type} message: {message}",
type=type,
message=message,
)
msg = f"Noma guardrail allowed {type} message: {message}"
if self.monitor_mode:
verbose_proxy_logger.info(msg)
else:
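
The anonymization path added to `NomaGuardrail` comes down to a three-way decision: anonymize, block, or allow. The sketch below restates that decision as a pure function under some assumptions. The payload keys (`verdict`, `originalResponse`, `prompt`, `response`, `anonymizedContent`, `anonymized`, `sensitiveData`) are taken from the diff, while the `decide` helper, the `promptInjection` detector name, and the sample payloads are hypothetical and only serve the illustration.

```python
from typing import Any, Dict

SENSITIVE_KEYS = ("sensitiveData", "dataDetector")


def _hit(result: Any) -> bool:
    """True only for a dict whose "result" field is exactly True."""
    return isinstance(result, dict) and result.get("result") is True


def only_sensitive_data_failed(classification: Dict[str, Any]) -> bool:
    """True if a sensitive-data detector fired and no other detector did."""
    sensitive_hit = False
    for key, value in classification.items():
        if not isinstance(value, dict):
            continue
        if key in SENSITIVE_KEYS:
            sensitive_hit = sensitive_hit or any(_hit(v) for v in value.values())
        elif "result" in value:
            if _hit(value):  # a flat, non-sensitive detector fired
                return False
        elif any(_hit(v) for v in value.values()):  # a nested detector fired
            return False
    return sensitive_hit


def decide(
    response_json: Dict[str, Any],
    role: str,
    *,
    monitor_mode: bool = False,
    anonymize_input: bool = True,
) -> str:
    """Return "anonymize", "block", or "allow" for one API response."""
    if monitor_mode:
        return "allow"  # monitor mode only logs, it never blocks or rewrites
    verdict = response_json.get("verdict", True)
    section = "prompt" if role == "user" else "response"
    classification = response_json.get("originalResponse", {}).get(section, {})
    anonymized = classification.get("anonymizedContent", {}).get("anonymized")
    may_anonymize = anonymize_input and (
        verdict or only_sensitive_data_failed(classification)
    )
    if may_anonymize and anonymized:
        return "anonymize"
    return "allow" if verdict else "block"


if __name__ == "__main__":
    pii_only = {
        "verdict": False,
        "originalResponse": {
            "prompt": {
                "sensitiveData": {"email": {"result": True}},
                "promptInjection": {"result": False},  # hypothetical detector
                "anonymizedContent": {"anonymized": "My email is ******"},
            }
        },
    }
    print(decide(pii_only, "user"))
```

With `anonymize_input` disabled, the same payload falls through to the verdict check and is blocked, which matches the behaviour of the hooks before this change.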

View File

@ -0,0 +1,511 @@
from fastapi import HTTPException
import re
from typing import Any, AsyncGenerator, Dict, List, Literal, Optional, Union
from litellm import ChatCompletionToolParam
from litellm._logging import verbose_proxy_logger
from litellm.caching.dual_cache import DualCache
from litellm.exceptions import GuardrailRaisedException
from litellm.integrations.custom_guardrail import (
CustomGuardrail,
log_guardrail_information,
)
from litellm.proxy._types import UserAPIKeyAuth
from litellm.proxy.common_utils.callback_utils import (
add_guardrail_to_applied_guardrails_header,
)
from litellm.types.guardrails import GuardrailEventHooks
from litellm.types.proxy.guardrails.guardrail_hooks.tool_permission import (
PermissionError,
ToolPermissionRule,
ToolResult,
)
from litellm.types.utils import (
ModelResponse,
ModelResponseStream,
LLMResponseTypes,
Choices,
ChatCompletionMessageToolCall,
)
GUARDRAIL_NAME = "tool_permission"
class ToolPermissionGuardrail(CustomGuardrail):
def __init__(
self,
rules: Optional[List[Dict]] = None,
default_action: Literal["deny", "allow"] = "deny",
on_disallowed_action: Literal["block", "rewrite"] = "block",
**kwargs,
):
"""
Initialize the Tool Permission Guardrail
Args:
rules: List of permission rules
default_action: Default action when no rule matches ("allow" or "deny")
on_disallowed_action: What to do when a tool is denied ("block" raises an error, "rewrite" strips the denied tools)
**kwargs: Additional arguments passed to CustomGuardrail
"""
# Set supported event hooks - this guardrail runs on pre_call and post_call
if "supported_event_hooks" not in kwargs:
kwargs["supported_event_hooks"] = [
GuardrailEventHooks.pre_call,
GuardrailEventHooks.post_call,
]
super().__init__(**kwargs)
self.rules: List[ToolPermissionRule] = []
if rules:
for rule_dict in rules:
self.rules.append(ToolPermissionRule(**rule_dict))
self.default_action = default_action
self.on_disallowed_action = on_disallowed_action
verbose_proxy_logger.debug(
"Tool Permission Guardrail initialized with %d rules, default_action: %s",
len(self.rules),
self.default_action,
)
def _matches_pattern(self, tool_name: str, pattern: str) -> bool:
"""
Check if a tool name matches a pattern
Supports patterns like:
- "Bash" - exact match
- "mcp__*" - prefix wildcard (matches names starting with "mcp__")
- "*_read" - suffix wildcard (matches names ending with "_read")
- "mcp__github_*_read" - infix wildcard (matches names like "mcp__github_mark_all_notifications_read")
Args:
tool_name: Name of the tool to check
pattern: Pattern to match against
Returns:
True if the tool name matches the pattern
"""
# Handle exact matches
if tool_name == pattern:
return True
if "*" in pattern:
# Escape regex special chars except '*'
escaped_pattern = re.escape(pattern)
# Turn \* into .*
regex_pattern = escaped_pattern.replace(r"\*", ".*")
return bool(re.fullmatch(regex_pattern, tool_name))
return False
def _check_tool_permission(
self, tool_name: str
) -> tuple[bool, Optional[str], Optional[str]]:
"""
Check if a tool is allowed based on the configured rules
Args:
tool_name: Name of the tool to check
Returns:
Tuple of (is_allowed, rule_id, message)
"""
verbose_proxy_logger.debug(f"Checking permission for tool: {tool_name}")
# Check each rule in order
for rule in self.rules:
if self._matches_pattern(tool_name, rule.tool_name):
is_allowed = rule.decision == "allow"
message = f"Tool '{tool_name}' {'allowed' if is_allowed else 'denied'} by rule '{rule.id}'"
verbose_proxy_logger.debug(message)
return is_allowed, rule.id, message
# No rule matched, use default action
is_allowed = self.default_action == "allow"
message = f"Tool '{tool_name}' {'allowed' if is_allowed else 'denied'} by default action"
verbose_proxy_logger.debug(message)
return is_allowed, None, message
def _extract_tool_calls_from_response(
self, response: ModelResponse
) -> List[ChatCompletionMessageToolCall]:
"""
Extract tool_calls from all choices in a model response.
Args:
response: The model response to analyze
Returns:
List of tool_calls blocks found in the response
"""
tool_calls = []
for choice in response.choices:
if isinstance(choice, Choices):
for tool in choice.message.tool_calls or []:
tool_calls.append(tool)
return tool_calls
def _modify_request_with_permission_errors(
self,
data: dict,
denied_tool_names: List[str],
):
"""
Modify the request to remove denied tools from its "tools" list
Args:
data: The model request to modify
denied_tool_names: Names of the tools that were denied
"""
if not denied_tool_names:
return data
verbose_proxy_logger.info(
f"Blocking {len(denied_tool_names)} unauthorized tool uses"
)
# Collect the denied tool names into a set for fast lookup
error_tool_names = set(denied_tool_names)
# Modify the tools
tools: Optional[List[ChatCompletionToolParam]] = data.get("tools")
if tools is None:
return data
new_tools = []
for tool in tools:
if tool["type"] != "function":
continue
tool_name: str = tool["function"]["name"]
if tool_name not in error_tool_names:
new_tools.append(tool)
data["tools"] = new_tools
return data
def _create_permission_error_result(
self, tool_call: ChatCompletionMessageToolCall, error: PermissionError
) -> ToolResult:
"""
Create a tool_result block for a permission error
Args:
tool_call: The tool call that was denied
error: The permission error details
Returns:
A tool_result block with the error message
"""
error_message = f"Permission denied: {error.message}"
if error.rule_id:
error_message += f" (Rule: {error.rule_id})"
return ToolResult(
tool_use_id=tool_call.id, content=error_message, is_error=True
)
def _modify_response_with_permission_errors(
self,
response: ModelResponse,
denied_tools: List[tuple[ChatCompletionMessageToolCall, PermissionError]],
) -> None:
"""
Modify the response to replace denied tool_calls blocks with error results
Args:
response: The model response to modify
denied_tools: List of (tool_use, error) tuples for denied tools
"""
if not denied_tools:
return
verbose_proxy_logger.info(
f"Blocking {len(denied_tools)} unauthorized tool uses"
)
# Create a mapping of tool_use_id to error result
error_results = {}
for tool_use, error in denied_tools:
error_result = self._create_permission_error_result(tool_use, error)
error_results[tool_use.id] = error_result
# Modify the response content
for choice in response.choices:
if isinstance(choice, Choices):
filtered_tool_calls = []
error_messages = []
# Rewrite tool_calls
for tool_call in choice.message.tool_calls or []:
tool_call_id = tool_call.id
if tool_call_id in error_results:
error_result = error_results[tool_call_id]
error_messages.append(error_result.content)
else:
filtered_tool_calls.append(tool_call)
choice.message.tool_calls = (
filtered_tool_calls if filtered_tool_calls else None
)
# Add error messages to content
if error_messages:
existing_content = choice.message.content
if existing_content:
choice.message.content = (
existing_content + "\n\n" + "\n".join(error_messages)
)
else:
choice.message.content = "\n".join(error_messages)
@log_guardrail_information
async def async_pre_call_hook(
self,
user_api_key_dict: UserAPIKeyAuth,
cache: DualCache,
data: dict,
call_type: Literal[
"completion",
"text_completion",
"embeddings",
"image_generation",
"moderation",
"audio_transcription",
"pass_through_endpoint",
"rerank",
"mcp_call",
],
) -> Union[Exception, str, dict, None]:
"""Check the tools offered in the request before the LLM call"""
verbose_proxy_logger.debug("Tool Permission Guardrail Pre-Call Hook")
from litellm.proxy.common_utils.callback_utils import (
add_guardrail_to_applied_guardrails_header,
)
event_type: GuardrailEventHooks = GuardrailEventHooks.pre_call
if self.should_run_guardrail(data=data, event_type=event_type) is not True:
return data
new_tools: Optional[List[ChatCompletionToolParam]] = data.get("tools")
if new_tools is None:
verbose_proxy_logger.warning(
"Tool Permission Guardrail: not running guardrail. No tools in data"
)
return data
# Check permissions for each tool
denied_tool_names = []
for tool in new_tools:
if tool["type"] != "function":
continue
tool_name: str = tool["function"]["name"]
is_allowed, _, message = self._check_tool_permission(tool_name)
if not is_allowed and message is not None:
verbose_proxy_logger.warning(f"Tool Permission Guardrail: {message}")
if self.on_disallowed_action == "block":
raise HTTPException(
status_code=400,
detail={
"error": "Violated guardrail policy",
"detection_message": message,
},
)
denied_tool_names.append(tool_name)
if denied_tool_names:
data = self._modify_request_with_permission_errors(data, denied_tool_names)
verbose_proxy_logger.debug(
"Tool Permission Guardrail Pre-Call Hook: Permission check complete"
)
add_guardrail_to_applied_guardrails_header(
request_data=data, guardrail_name=self.guardrail_name
)
return data
@log_guardrail_information
async def async_post_call_success_hook(
self,
data: dict,
user_api_key_dict: UserAPIKeyAuth,
response: LLMResponseTypes,
):
"""
Check tool usage permissions after the LLM call
Args:
data: Request data
user_api_key_dict: User API key information (unused but required by interface)
response: The model response to check
"""
if not isinstance(response, ModelResponse):
return
verbose_proxy_logger.debug(
"Tool Permission Guardrail Post-Call Hook: Checking response"
)
if not self.should_run_guardrail(
data=data, event_type=GuardrailEventHooks.post_call
):
verbose_proxy_logger.debug(
"Tool Permission Guardrail: Skipping check (not enabled)"
)
return
# Extract tool_calls from the response
tool_calls = self._extract_tool_calls_from_response(response)
if not tool_calls:
verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
return
verbose_proxy_logger.debug(
f"Tool Permission Guardrail: Found {len(tool_calls)} tool calls"
)
# Check permissions for each tool use
denied_tools = []
for tool_call in tool_calls:
if tool_call.function.name is None:
continue
is_allowed, rule_id, message = self._check_tool_permission(
tool_call.function.name
)
if not is_allowed and message is not None:
verbose_proxy_logger.warning(f"Tool Permission Guardrail: {message}")
if self.on_disallowed_action == "block":
raise GuardrailRaisedException(
guardrail_name=self.guardrail_name,
message=message,
)
denied_tools.append(
(
tool_call,
PermissionError(
tool_name=tool_call.function.name,
rule_id=rule_id,
message=message,
),
)
)
if denied_tools:
self._modify_response_with_permission_errors(response, denied_tools)
verbose_proxy_logger.debug(
"Tool Permission Guardrail Post-Call Hook: All tools allowed"
)
add_guardrail_to_applied_guardrails_header(
request_data=data, guardrail_name=self.guardrail_name
)
async def async_post_call_streaming_iterator_hook(
self,
user_api_key_dict: UserAPIKeyAuth,
response: Any,
request_data: dict,
) -> AsyncGenerator[ModelResponseStream, None]:
"""
Check tool usage permissions after the LLM stream call
Args:
user_api_key_dict: User API key information (unused but required by interface)
response: The model response to check
request_data: The model request (unused but required by interface)
"""
# Import here to avoid circular imports
from litellm.llms.base_llm.base_model_iterator import MockResponseIterator
from litellm.main import stream_chunk_builder
from litellm.types.utils import TextCompletionResponse
# Collect all chunks to process them together
all_chunks: List[ModelResponseStream] = []
async for chunk in response:
all_chunks.append(chunk)
assembled_model_response: Optional[
Union[ModelResponse, TextCompletionResponse]
] = stream_chunk_builder(
chunks=all_chunks,
)
if isinstance(assembled_model_response, ModelResponse):
verbose_proxy_logger.debug("Tool Permission Guardrail: Checking response")
# Extract tool_calls from the response
tool_calls = self._extract_tool_calls_from_response(assembled_model_response)
if not tool_calls:
verbose_proxy_logger.debug(
"Tool Permission Guardrail: No tool uses found"
)
return
verbose_proxy_logger.debug(
f"Tool Permission Guardrail: Found {len(tool_calls)} tool calls"
)
# Check permissions for each tool use
denied_tools = []
for tool_call in tool_calls:
if tool_call.function.name is None:
continue
is_allowed, rule_id, message = self._check_tool_permission(
tool_call.function.name
)
if not is_allowed and message is not None:
verbose_proxy_logger.warning(
f"Tool Permission Guardrail: {message}"
)
if self.on_disallowed_action == "block":
raise GuardrailRaisedException(
guardrail_name=self.guardrail_name,
message=message,
)
denied_tools.append(
(
tool_call,
PermissionError(
tool_name=tool_call.function.name,
rule_id=rule_id,
message=message,
),
)
)
verbose_proxy_logger.debug(
"Tool Permission Guardrail Post-Call Hook: All tools allowed"
)
if denied_tools:
self._modify_response_with_permission_errors(
assembled_model_response, denied_tools
)
mock_response = MockResponseIterator(
model_response=assembled_model_response
)
# Return the reconstructed stream
async for chunk in mock_response:
yield chunk
else:
for chunk in all_chunks:
yield chunk
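
The three hooks above all depend on `_check_tool_permission`, which is not shown in this part of the diff. A minimal sketch of what such a check might look like, assuming first-match-wins rules: the rule keys (`id`, `tool_name` as a glob pattern, `decision`) and the function name are hypothetical and not taken from the commit.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple


def check_tool_permission(
    tool_name: str,
    rules: List[Dict[str, str]],
    default_action: str = "deny",
) -> Tuple[bool, Optional[str], Optional[str]]:
    """Return (is_allowed, rule_id, message); the first matching rule wins."""
    for rule in rules:
        # Hypothetical rule shape: {"id", "tool_name" (glob), "decision"}
        if fnmatchcase(tool_name, rule["tool_name"]):
            allowed = rule["decision"].lower() == "allow"
            message = (
                None
                if allowed
                else f"Tool '{tool_name}' denied by rule '{rule['id']}'"
            )
            return allowed, rule["id"], message
    # No rule matched: fall back to the configured default action.
    allowed = default_action.lower() == "allow"
    message = None if allowed else f"Tool '{tool_name}' denied by default action"
    return allowed, None, message


rules = [
    {"id": "allow_search", "tool_name": "search_*", "decision": "allow"},
    {"id": "deny_shell", "tool_name": "run_shell", "decision": "deny"},
]
```

The returned tuple mirrors the `(is_allowed, rule_id, message)` shape that the hooks unpack, so a denied tool always carries a message that can be logged or raised.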


@ -123,3 +123,18 @@ def initialize_hide_secrets(litellm_params: LitellmParams, guardrail: Guardrail)
return _secret_detection_object
def initialize_tool_permission(litellm_params: LitellmParams, guardrail: Guardrail):
from litellm.proxy.guardrails.guardrail_hooks.tool_permission import (
ToolPermissionGuardrail,
)
_tool_permission_callback = ToolPermissionGuardrail(
guardrail_name=guardrail.get("guardrail_name", ""),
event_hook=litellm_params.mode,
rules=litellm_params.rules,
default_action=getattr(litellm_params, "default_action", "deny"),
on_disallowed_action=getattr(litellm_params, "on_disallowed_action", "block"),
default_on=litellm_params.default_on,
)
litellm.logging_callback_manager.add_litellm_callback(_tool_permission_callback)
return _tool_permission_callback


@ -26,6 +26,7 @@ from .guardrail_initializers import (
initialize_lakera,
initialize_lakera_v2,
initialize_presidio,
initialize_tool_permission,
)
guardrail_initializer_registry = {
@ -34,6 +35,7 @@ guardrail_initializer_registry = {
SupportedGuardrailIntegrations.LAKERA_V2.value: initialize_lakera_v2,
SupportedGuardrailIntegrations.PRESIDIO.value: initialize_presidio,
SupportedGuardrailIntegrations.HIDE_SECRETS.value: initialize_hide_secrets,
SupportedGuardrailIntegrations.TOOL_PERMISSION.value: initialize_tool_permission,
}
guardrail_class_registry: Dict[str, Type[CustomGuardrail]] = {}
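
The registry change above follows a plain dispatch-table pattern: the guardrail type string selects an initializer function. A self-contained sketch of that pattern, assuming stand-in names throughout (the enum, the initializers, and `initialize_guardrail` are hypothetical and simplified; the real initializers take `LitellmParams` and a `Guardrail`).

```python
from enum import Enum
from typing import Any, Callable, Dict


class SupportedIntegrations(Enum):  # hypothetical stand-in enum
    HIDE_SECRETS = "hide_secrets"
    TOOL_PERMISSION = "tool_permission"


def initialize_hide_secrets(params: Dict[str, Any]) -> str:
    return "hide-secrets guardrail"


def initialize_tool_permission(params: Dict[str, Any]) -> str:
    rule_count = len(params.get("rules") or [])
    return f"tool-permission guardrail with {rule_count} rule(s)"


# Map each integration's string value to the function that builds it.
registry: Dict[str, Callable[[Dict[str, Any]], str]] = {
    SupportedIntegrations.HIDE_SECRETS.value: initialize_hide_secrets,
    SupportedIntegrations.TOOL_PERMISSION.value: initialize_tool_permission,
}


def initialize_guardrail(guardrail_type: str, params: Dict[str, Any]) -> str:
    initializer = registry.get(guardrail_type)
    if initializer is None:
        raise ValueError(f"Unsupported guardrail: {guardrail_type}")
    return initializer(params)
```

Adding a new integration then only needs one enum member, one initializer, and one registry entry, which is what the two hunks above do for `TOOL_PERMISSION`.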


@ -1,10 +1,11 @@
import asyncio
import sys
from datetime import datetime, timedelta
from typing import TYPE_CHECKING, Any, List, Literal, Optional, Tuple, TypedDict, Union
from typing import TYPE_CHECKING, Any, List, Literal, Optional, Tuple, Union
from fastapi import HTTPException
from pydantic import BaseModel
from typing_extensions import TypedDict
import litellm
from litellm import DualCache, ModelResponse


@ -14,11 +14,14 @@ from litellm.proxy._types import (
AddTeamCallback,
CommonProxyErrors,
LitellmDataForBackendLLMCall,
LitellmUserRoles,
SpecialHeaders,
TeamCallbackMetadata,
UserAPIKeyAuth,
LitellmUserRoles,
)
# Cache special headers as a frozenset for O(1) lookup performance
_SPECIAL_HEADERS_CACHE = frozenset(v.value.lower() for v in SpecialHeaders._member_map_.values())
from litellm.proxy.auth.route_checks import RouteChecks
from litellm.router import Router
from litellm.types.llms.anthropic import ANTHROPIC_API_HEADERS
@ -54,6 +57,13 @@ def parse_cache_control(cache_control):
return cache_dict
LITELLM_METADATA_ROUTES = (
"batches",
"/v1/messages",
"responses",
"files",
)
def _get_metadata_variable_name(request: Request) -> str:
"""
Helper to return what the "metadata" field should be called in the request data
@ -65,22 +75,10 @@ def _get_metadata_variable_name(request: Request) -> str:
if RouteChecks._is_assistants_api_request(request):
return "litellm_metadata"
LITELLM_METADATA_ROUTES = [
"batches",
"/v1/messages",
"responses",
"files",
]
if any(
[
litellm_metadata_route in request.url.path
for litellm_metadata_route in LITELLM_METADATA_ROUTES
]
):
if any(route in request.url.path for route in LITELLM_METADATA_ROUTES):
return "litellm_metadata"
else:
return "metadata"
return "metadata"
def safe_add_api_version_from_query_params(data: dict, request: Request):
@ -235,14 +233,13 @@ def clean_headers(
"""
Removes litellm api key from headers
"""
special_headers = [v.value.lower() for v in SpecialHeaders._member_map_.values()]
special_headers = special_headers
if litellm_key_header_name is not None:
special_headers.append(litellm_key_header_name.lower())
clean_headers = {}
litellm_key_lower = litellm_key_header_name.lower() if litellm_key_header_name is not None else None
for header, value in headers.items():
if header.lower() not in special_headers:
header_lower = header.lower()
# Check if header should be excluded: either in special headers cache or matches custom litellm key
if (header_lower not in _SPECIAL_HEADERS_CACHE and (litellm_key_lower is None or header_lower != litellm_key_lower)):
clean_headers[header] = value
return clean_headers
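
The `clean_headers` change replaces a per-request list with a module-level `frozenset`, so each membership test is a hash lookup and nothing is rebuilt per call. A minimal sketch of the same idea, assuming a stand-in `SpecialHeaders` enum with hypothetical members (the real enum lives in `litellm.proxy._types`).

```python
from enum import Enum
from typing import Dict, Optional


class SpecialHeaders(Enum):  # hypothetical stand-in for the proxy's enum
    AUTHORIZATION = "Authorization"
    API_KEY = "API-Key"


# Built once at import time; lookups are O(1) on average.
_SPECIAL_HEADERS_CACHE = frozenset(member.value.lower() for member in SpecialHeaders)


def clean_headers(
    headers: Dict[str, str], litellm_key_header_name: Optional[str] = None
) -> Dict[str, str]:
    """Drop special headers and the custom key header, case-insensitively."""
    key_lower = litellm_key_header_name.lower() if litellm_key_header_name else None
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in _SPECIAL_HEADERS_CACHE and name.lower() != key_lower
    }
```

Comparing against `key_lower` separately avoids copying the set just to add one per-request entry.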
@ -272,7 +269,7 @@ class LiteLLMProxyRequestSetup:
if timeout_header is not None:
return float(timeout_header)
return None
@staticmethod
def _get_stream_timeout_from_request(headers: dict) -> Optional[float]:
"""
@ -292,13 +289,14 @@ class LiteLLMProxyRequestSetup:
if num_retries_header is not None:
return int(num_retries_header)
return None
@staticmethod
def _get_spend_logs_metadata_from_request_headers(headers: dict) -> Optional[dict]:
"""
Get the `spend_logs_metadata` from the request headers.
"""
from litellm.litellm_core_utils.safe_json_loads import safe_json_loads
spend_logs_metadata_header = headers.get("x-litellm-spend-logs-metadata", None)
if spend_logs_metadata_header is not None:
return safe_json_loads(spend_logs_metadata_header)
@ -337,16 +335,24 @@ class LiteLLMProxyRequestSetup:
return None
@staticmethod
def add_internal_user_from_user_mapping(general_settings: Optional[Dict], user_api_key_dict: UserAPIKeyAuth, headers: dict) -> UserAPIKeyAuth:
def add_internal_user_from_user_mapping(
general_settings: Optional[Dict],
user_api_key_dict: UserAPIKeyAuth,
headers: dict,
) -> UserAPIKeyAuth:
if general_settings is None:
return user_api_key_dict
user_header_mapping = general_settings.get("user_header_mappings")
if not user_header_mapping:
return user_api_key_dict
header_name = LiteLLMProxyRequestSetup.get_internal_user_header_from_mapping(user_header_mapping)
header_name = LiteLLMProxyRequestSetup.get_internal_user_header_from_mapping(
user_header_mapping
)
if not header_name:
return user_api_key_dict
header_value = LiteLLMProxyRequestSetup._get_case_insensitive_header(headers, header_name)
header_value = LiteLLMProxyRequestSetup._get_case_insensitive_header(
headers, header_name
)
if header_value:
user_api_key_dict.user_id = header_value
return user_api_key_dict
@ -429,15 +435,25 @@ class LiteLLMProxyRequestSetup:
"""
Add headers to the LLM call by model group
"""
from litellm.proxy.auth.auth_checks import _check_model_access_helper
from litellm.proxy.proxy_server import llm_router
data_model = data.get("model")
if (
data_model is not None
and litellm.model_group_settings is not None
and litellm.model_group_settings.forward_client_headers_to_llm_api
is not None
and data_model
in litellm.model_group_settings.forward_client_headers_to_llm_api
and _check_model_access_helper(
model=data_model,
llm_router=llm_router,
models=litellm.model_group_settings.forward_client_headers_to_llm_api,
team_model_aliases=user_api_key_dict.team_model_aliases,
team_id=user_api_key_dict.team_id,
) # handles aliases, wildcards, etc.
):
_headers = LiteLLMProxyRequestSetup.add_headers_to_llm_call(
headers, user_api_key_dict
)
@ -497,8 +513,10 @@ class LiteLLMProxyRequestSetup:
timeout = LiteLLMProxyRequestSetup._get_timeout_from_request(headers)
if timeout is not None:
data["timeout"] = timeout
stream_timeout = LiteLLMProxyRequestSetup._get_stream_timeout_from_request(headers)
stream_timeout = LiteLLMProxyRequestSetup._get_stream_timeout_from_request(
headers
)
if stream_timeout is not None:
data["stream_timeout"] = stream_timeout
@ -507,7 +525,7 @@ class LiteLLMProxyRequestSetup:
data["num_retries"] = num_retries
return data
@staticmethod
def add_litellm_metadata_from_request_headers(
headers: dict,
@ -520,11 +538,16 @@ class LiteLLMProxyRequestSetup:
Relevant issue: https://github.com/BerriAI/litellm/issues/14008
"""
from litellm.proxy._types import LitellmMetadataFromRequestHeaders
metadata_from_headers = LitellmMetadataFromRequestHeaders()
spend_logs_metadata = LiteLLMProxyRequestSetup._get_spend_logs_metadata_from_request_headers(headers)
spend_logs_metadata = (
LiteLLMProxyRequestSetup._get_spend_logs_metadata_from_request_headers(
headers
)
)
if spend_logs_metadata is not None:
metadata_from_headers["spend_logs_metadata"] = spend_logs_metadata
#########################################################################################
# Finally update the requests metadata with the `metadata_from_headers`
#########################################################################################
@ -714,7 +737,6 @@ async def add_litellm_data_to_request( # noqa: PLR0915
from litellm.proxy.proxy_server import llm_router, premium_user
from litellm.types.proxy.litellm_pre_call_utils import SecretFields
_headers = clean_headers(
request.headers,
litellm_key_header_name=(
@ -740,8 +762,6 @@ async def add_litellm_data_to_request( # noqa: PLR0915
if data.get(_metadata_variable_name, None) is None:
data[_metadata_variable_name] = {}
data.update(
LiteLLMProxyRequestSetup.add_litellm_data_for_backend_llm_call(
headers=_headers,
@ -763,7 +783,9 @@ async def add_litellm_data_to_request( # noqa: PLR0915
data=data, headers=_headers, user_api_key_dict=user_api_key_dict
)
user_api_key_dict = LiteLLMProxyRequestSetup.add_internal_user_from_user_mapping(general_settings, user_api_key_dict, _headers)
user_api_key_dict = LiteLLMProxyRequestSetup.add_internal_user_from_user_mapping(
general_settings, user_api_key_dict, _headers
)
# Parse user info from headers
user = LiteLLMProxyRequestSetup.get_user_from_headers(_headers, general_settings)
@ -773,7 +795,6 @@ async def add_litellm_data_to_request( # noqa: PLR0915
if "user" not in data:
data["user"] = user
data["secret_fields"] = SecretFields(raw_headers=dict(request.headers))
## Dynamic api version (Azure OpenAI endpoints) ##


@ -93,7 +93,7 @@ async def _upsert_budget_and_membership(
create_data["tpm_limit"] = tpm_limit
if rpm_limit is not None:
create_data["rpm_limit"] = rpm_limit
new_budget = await tx.litellm_budgettable.create(
data=create_data,
include={"team_membership": True},


@ -925,6 +925,15 @@ async def prepare_key_update_data(
detail="team_id is required for service account keys. Please specify `team_id` in the request body.",
)
non_default_values = {}
# ADD METADATA FIELDS
# Set Management Endpoint Metadata Fields
for field in LiteLLM_ManagementEndpoint_MetadataFields_Premium:
if getattr(data, field, None) is not None:
_set_object_metadata_field(
object_data=data,
field_name=field,
value=getattr(data, field),
)
for k, v in data_json.items():
if (
k in LiteLLM_ManagementEndpoint_MetadataFields
@ -1137,6 +1146,9 @@ async def update_key_fn(
change_initiated_by=user_api_key_dict,
llm_router=llm_router,
)
# Set Management Endpoint Metadata Fields
non_default_values = await prepare_key_update_data(
data=data, existing_key_row=existing_key_row
)


@ -36,6 +36,28 @@ from litellm.proxy.utils import PrismaClient
router = APIRouter()
def handle_nested_budget_structure_in_organization_update_request(raw_data: dict) -> dict:
"""
Transform organization update request to handle UI payload format.
The UI sends nested budget data in 'litellm_budget_table', but our
model expects flat budget fields at the top level.
"""
transformed_data = raw_data.copy()
# Handle nested budget structure from UI
if 'litellm_budget_table' in transformed_data:
budget_data = transformed_data.pop('litellm_budget_table', {})
if budget_data:
# Extract valid budget fields and merge into top level
budget_fields = LiteLLM_BudgetTable.model_fields.keys()
for key, value in budget_data.items():
if key in budget_fields and value is not None:
transformed_data[key] = value
return transformed_data
@router.post(
"/organization/new",
tags=["organization management"],
@ -248,7 +270,7 @@ async def _set_object_permission(
response_model=LiteLLM_OrganizationTableWithMembers,
)
async def update_organization(
data: LiteLLM_OrganizationTableUpdate,
request: Request,
user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
):
"""
@ -270,6 +292,13 @@ async def update_organization(
},
)
# Transform UI payload to expected format
raw_data = await request.json()
raw_data_with_flat_budget_fields = handle_nested_budget_structure_in_organization_update_request(raw_data)
# Create validated data model
data = LiteLLM_OrganizationTableUpdate(**raw_data_with_flat_budget_fields)
if data.updated_by is None:
data.updated_by = user_api_key_dict.user_id
@ -293,6 +322,23 @@ async def update_organization(
existing_organization_row=existing_organization_row,
)
# Handle budget updates if budget fields are provided
budget_fields = {k: v for k, v in data.model_dump().items()
if k in LiteLLM_BudgetTable.model_fields.keys() and v is not None}
if budget_fields and existing_organization_row.budget_id:
await update_budget(
budget_obj=BudgetNewRequest(
budget_id=existing_organization_row.budget_id,
**budget_fields
),
user_api_key_dict=user_api_key_dict,
)
# Remove budget fields from organization update data
for field in LiteLLM_BudgetTable.model_fields.keys():
updated_organization_row.pop(field, None)
response = await prisma_client.db.litellm_organizationtable.update(
where={"organization_id": data.organization_id},
data=updated_organization_row,
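
The payload transformation in `handle_nested_budget_structure_in_organization_update_request` can be illustrated in isolation. This is a sketch under stated assumptions: `BUDGET_FIELDS` is a hypothetical stand-in for `LiteLLM_BudgetTable.model_fields.keys()`, and `flatten_budget` is an illustrative name.

```python
from typing import Any, Dict

# Hypothetical subset of budget field names, standing in for the model's fields.
BUDGET_FIELDS = frozenset({"max_budget", "tpm_limit", "rpm_limit", "budget_duration"})


def flatten_budget(raw_data: Dict[str, Any]) -> Dict[str, Any]:
    """Lift known, non-null budget fields out of 'litellm_budget_table'."""
    transformed = dict(raw_data)  # shallow copy; the caller's dict is untouched
    budget = transformed.pop("litellm_budget_table", None) or {}
    for key, value in budget.items():
        if key in BUDGET_FIELDS and value is not None:
            transformed[key] = value
    return transformed


payload = {
    "organization_id": "org-1",
    "litellm_budget_table": {"max_budget": 100.0, "rpm_limit": None, "unknown": 1},
}
flat = flatten_budget(payload)
```

Unknown keys and null values inside the nested table are dropped, so only recognised budget fields reach model validation.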


@ -5,7 +5,7 @@ This is an enterprise feature and requires a premium license.
"""
import uuid
from typing import Any, Dict, List, Optional, Set, Tuple, TypedDict
from typing import Any, Dict, List, Optional, Set, Tuple
from fastapi import (
APIRouter,
@ -17,6 +17,7 @@ from fastapi import (
Request,
Response,
)
from typing_extensions import TypedDict
import litellm
from litellm._logging import verbose_proxy_logger


@ -2,10 +2,11 @@ import asyncio
import json
import time
from datetime import datetime
from typing import Literal, Optional, TypedDict
from typing import Literal, Optional
from urllib.parse import urlparse
import httpx
from typing_extensions import TypedDict
import litellm
from litellm._logging import verbose_proxy_logger


@ -10,7 +10,6 @@ from fastapi import APIRouter, Depends, HTTPException, status
import litellm
from litellm._logging import verbose_proxy_logger
from litellm.router_strategy.budget_limiter import RouterBudgetLimiting
from litellm.proxy._types import *
from litellm.proxy._types import ProviderBudgetResponse, ProviderBudgetResponseObject
from litellm.proxy.auth.user_api_key_auth import user_api_key_auth
@ -18,6 +17,7 @@ from litellm.proxy.spend_tracking.spend_tracking_utils import (
get_spend_by_team_and_customer,
)
from litellm.proxy.utils import handle_exception_on_proxy
from litellm.router_strategy.budget_limiter import RouterBudgetLimiting
if TYPE_CHECKING:
from litellm.proxy.proxy_server import PrismaClient
@ -1660,6 +1660,12 @@ async def ui_view_spend_logs( # noqa: PLR0915
model: Optional[str] = fastapi.Query(
default=None, description="Filter logs by model"
),
key_alias: Optional[str] = fastapi.Query(
default=None, description="Filter logs by key alias"
),
end_user: Optional[str] = fastapi.Query(
default=None, description="Filter logs by end user"
),
):
"""
View spend logs for UI with pagination support
@ -1728,6 +1734,15 @@ async def ui_view_spend_logs( # noqa: PLR0915
if model is not None:
where_conditions["model"] = model
if key_alias is not None:
where_conditions["metadata"] = {
"path": ["user_api_key_alias"],
"string_contains": key_alias,
}
if end_user is not None:
where_conditions["end_user"] = end_user
if min_spend is not None or max_spend is not None:
where_conditions["spend"] = {}
if min_spend is not None:
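
The new `key_alias` and `end_user` filters extend a dictionary of query conditions. A rough sketch of how such a builder might be assembled as a pure function: the function name is hypothetical, and the `gte`/`lte` spend bounds are an assumption, since the hunk is cut off before that part.

```python
from typing import Any, Dict, Optional


def build_where_conditions(
    model: Optional[str] = None,
    key_alias: Optional[str] = None,
    end_user: Optional[str] = None,
    min_spend: Optional[float] = None,
    max_spend: Optional[float] = None,
) -> Dict[str, Any]:
    """Collect only the filters the caller actually supplied."""
    where: Dict[str, Any] = {}
    if model is not None:
        where["model"] = model
    if key_alias is not None:
        # JSON-path filter on the metadata column, as in the hunk above.
        where["metadata"] = {
            "path": ["user_api_key_alias"],
            "string_contains": key_alias,
        }
    if end_user is not None:
        where["end_user"] = end_user
    if min_spend is not None or max_spend is not None:
        spend: Dict[str, float] = {}
        if min_spend is not None:
            spend["gte"] = min_spend  # assumed operator name
        if max_spend is not None:
            spend["lte"] = max_spend  # assumed operator name
        where["spend"] = spend
    return where
```

Keeping the builder free of I/O makes the filter combinations easy to check without a database.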


@ -4414,7 +4414,7 @@ class Router:
return tpm_key
except Exception as e:
verbose_router_logger.exception(
verbose_router_logger.debug(
"litellm.router.Router::deployment_callback_on_success(): Exception occured - {}".format(
str(e)
)
@ -4562,8 +4562,10 @@ class Router:
parent_otel_span=parent_otel_span,
ttl=RoutingArgs.ttl.value,
)
def _get_metadata_variable_name_from_kwargs(self, kwargs: dict) -> Literal["metadata", "litellm_metadata"]:
def _get_metadata_variable_name_from_kwargs(
self, kwargs: dict
) -> Literal["metadata", "litellm_metadata"]:
"""
Helper to return what the "metadata" field should be called in the request data
@ -5672,11 +5674,11 @@ class Router:
)
if supported_openai_params is None:
supported_openai_params = []
# Get mode from database model_info if available, otherwise default to "chat"
db_model_info = model.get("model_info", {})
mode = db_model_info.get("mode", "chat")
model_info = ModelMapInfo(
key=model_group,
max_tokens=None,
@ -6802,7 +6804,9 @@ class Router:
model=model,
request_kwargs=request_kwargs,
healthy_deployments=healthy_deployments,
metadata_variable_name=self._get_metadata_variable_name_from_kwargs(request_kwargs),
metadata_variable_name=self._get_metadata_variable_name_from_kwargs(
request_kwargs
),
)
if len(healthy_deployments) == 0:


@ -3,7 +3,9 @@ Wrapper around router cache. Meant to handle model cooldown logic
"""
import time
from typing import TYPE_CHECKING, Any, List, Optional, Tuple, TypedDict, Union
from typing import TYPE_CHECKING, Any, List, Optional, Tuple, Union
from typing_extensions import TypedDict
from litellm import verbose_logger
from litellm.caching.caching import DualCache


@ -4,7 +4,9 @@ Wrapper around router cache. Meant to store model id when prompt caching support
import hashlib
import json
from typing import TYPE_CHECKING, Any, List, Optional, TypedDict, Union
from typing import TYPE_CHECKING, Any, List, Optional, Union
from typing_extensions import TypedDict
from litellm.caching.caching import DualCache
from litellm.caching.in_memory_cache import InMemoryCache


@ -1,7 +1,8 @@
from enum import Enum
from typing import Any, Dict, List, Literal, Optional, TypedDict, Union
from typing import Any, Dict, List, Literal, Optional, Union
from pydantic import BaseModel
from typing_extensions import TypedDict
class LiteLLMCacheType(str, Enum):


@ -1,14 +1,10 @@
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Literal, Optional, TypedDict, Union
from typing import Any, Dict, List, Literal, Optional, Union
from pydantic import BaseModel, ConfigDict, Field, SecretStr
from pydantic import BaseModel, ConfigDict, Field
from typing_extensions import Required, TypedDict
from litellm.types.proxy.guardrails.guardrail_hooks.openai.openai_moderation import (
OpenAIModerationGuardrailConfigModel,
)
"""
Pydantic object defining how to set guardrails on litellm proxy
@ -41,6 +37,9 @@ class SupportedGuardrailIntegrations(Enum):
MODEL_ARMOR = "model_armor"
OPENAI_MODERATION = "openai_moderation"
NOMA = "noma"
TOOL_PERMISSION = "tool_permission"
class Role(Enum):
SYSTEM = "system"
@ -312,7 +311,6 @@ class BedrockGuardrailConfigModel(BaseModel):
)
class LakeraV2GuardrailConfigModel(BaseModel):
"""Configuration parameters for the Lakera AI v2 guardrail"""
@ -375,6 +373,22 @@ class NomaGuardrailConfigModel(BaseModel):
default=None,
description="If True, blocks requests on API failures. Defaults to True if not provided",
)
anonymize_input: Optional[bool] = Field(
default=None,
description="If True, replaces sensitive content with anonymized version when only PII/PCI/secrets are detected. Only applies in blocking mode. Defaults to False if not provided",
)
class ToolPermissionGuardrailConfigModel(BaseModel):
"""Configuration parameters for the Tool Permission guardrail"""
rules: Optional[List[Dict]] = Field(
default=None, description="List of permission rules for tool usage"
)
default_action: Optional[str] = Field(
default="Deny",
description="Default action when no rule matches (Allow or Deny)",
)
class BaseLitellmParams(BaseModel): # works for new and patch update guardrails
@ -425,7 +439,8 @@ class BaseLitellmParams(BaseModel): # works for new and patch update guardrails
)
model: Optional[str] = Field(
default=None, description="Optional field if guardrail requires a 'model' parameter"
default=None,
description="Optional field if guardrail requires a 'model' parameter",
)
# Model Armor params
@ -446,7 +461,7 @@ class BaseLitellmParams(BaseModel): # works for new and patch update guardrails
default=True,
description="Whether to fail the request if Model Armor encounters an error",
)
model_config = ConfigDict(extra="allow", protected_namespaces=())
@ -464,6 +479,7 @@ class LitellmParams(
LassoGuardrailConfigModel,
PillarGuardrailConfigModel,
NomaGuardrailConfigModel,
ToolPermissionGuardrailConfigModel,
BaseLitellmParams,
):
guardrail: str = Field(description="The type of guardrail integration to use")

Some files were not shown because too many files have changed in this diff.