Resolve merge conflict by including both CompactifAI and OVHCloud providers
- Keep CompactifAI provider detection logic
- Include new OVHCloud provider from main branch
- Both providers now work correctly with model prefix detection
This commit is contained in:
commit
9521414efa
@ -346,6 +346,7 @@ curl 'http://0.0.0.0:4000/key/generate' \
|
||||
| [Featherless AI](https://docs.litellm.ai/docs/providers/featherless_ai) | ✅ | ✅ | ✅ | ✅ | | |
| [Nebius AI Studio](https://docs.litellm.ai/docs/providers/nebius) | ✅ | ✅ | ✅ | ✅ | ✅ | |
| [Heroku](https://docs.litellm.ai/docs/providers/heroku) | ✅ | ✅ | | | | |
| [OVHCloud AI Endpoints](https://docs.litellm.ai/docs/providers/ovhcloud) | ✅ | ✅ | | | | |
|
||||
|
||||
[**Read the Docs**](https://docs.litellm.ai/docs/)
|
||||
|
||||
|
||||
256
cookbook/misc/RELEASE_NOTES_GENERATION_INSTRUCTIONS.md
Normal file
@ -0,0 +1,256 @@
|
||||
# LiteLLM Release Notes Generation Instructions
|
||||
|
||||
This document provides comprehensive instructions for AI agents to generate release notes for LiteLLM following the established format and style.
|
||||
|
||||
## Required Inputs
|
||||
|
||||
1. **Release Version** (e.g., `v1.76.3-stable`)
|
||||
2. **PR Diff/Changelog** - List of PRs with titles and contributors
|
||||
3. **Previous Version Commit Hash** - To compare model pricing changes
|
||||
4. **Reference Release Notes** - Previous release notes to follow style/format
|
||||
|
||||
## Step-by-Step Process
|
||||
|
||||
### 1. Initial Setup and Analysis
|
||||
|
||||
```bash
|
||||
# Check git diff for model pricing changes
|
||||
git diff <previous_commit_hash> HEAD -- model_prices_and_context_window.json
|
||||
```
|
||||
|
||||
**Key Analysis Points:**
|
||||
- New models added (look for new entries)
|
||||
- Deprecated models removed (look for deleted entries)
|
||||
- Pricing updates (look for cost changes)
|
||||
- Feature support changes (tool calling, reasoning, etc.)
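
The analysis points above can be sketched as a comparison of two parsed versions of the pricing file. This is a minimal illustration, assuming each version has already been loaded into a dict keyed by model name (for example with `json.loads`); the helper name `summarize_model_changes` and the sample entries are hypothetical and are not real pricing data.

```python
def summarize_model_changes(old: dict, new: dict) -> dict:
    """Compare two parsed versions of the pricing file (hypothetical helper)."""
    cost_keys = ("input_cost_per_token", "output_cost_per_token")
    added = sorted(set(new) - set(old))        # new entries
    removed = sorted(set(old) - set(new))      # deleted entries
    repriced = sorted(
        name
        for name in set(old) & set(new)
        if any(old[name].get(key) != new[name].get(key) for key in cost_keys)
    )
    return {"added": added, "removed": removed, "repriced": repriced}


# Made-up sample data, for illustration only.
old = {
    "model-a": {"input_cost_per_token": 1e-06, "output_cost_per_token": 2e-06},
    "model-b": {"input_cost_per_token": 3e-06, "output_cost_per_token": 6e-06},
}
new = {
    "model-a": {"input_cost_per_token": 5e-07, "output_cost_per_token": 2e-06},
    "model-c": {"input_cost_per_token": 1e-06, "output_cost_per_token": 1e-06},
}
changes = summarize_model_changes(old, new)
print(changes)
```

The same loop could be extended to compare `supports_*` fields when feature support changes need to be reported.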
|
||||
|
||||
### 2. Release Notes Structure
|
||||
|
||||
Follow this exact structure based on `docs/my-website/release_notes/v1.76.1-stable/index.md`:
|
||||
|
||||
```markdown
|
||||
---
|
||||
title: "v1.76.X-stable - [Key Theme]"
|
||||
slug: "v1-76-X"
|
||||
date: YYYY-MM-DDTHH:mm:ss
|
||||
authors: [standard author block]
|
||||
hide_table_of_contents: false
|
||||
---
|
||||
|
||||
## Deploy this version
|
||||
[Docker and pip installation tabs]
|
||||
|
||||
## Key Highlights
|
||||
[3-5 bullet points of major features]
|
||||
|
||||
## Major Changes
|
||||
[Critical changes users need to know]
|
||||
|
||||
## Performance Improvements
|
||||
[Performance-related changes]
|
||||
|
||||
## New Models / Updated Models
|
||||
[Detailed model tables and provider updates]
|
||||
|
||||
## LLM API Endpoints
|
||||
[API-related features and fixes]
|
||||
|
||||
## Management Endpoints / UI
|
||||
[Admin interface and management changes]
|
||||
|
||||
## Logging / Guardrail Integrations
|
||||
[Observability and security features]
|
||||
|
||||
## Performance / Loadbalancing / Reliability improvements
|
||||
[Infrastructure improvements]
|
||||
|
||||
## General Proxy Improvements
|
||||
[Other proxy-related changes]
|
||||
|
||||
## New Contributors
|
||||
[List of first-time contributors]
|
||||
|
||||
## Full Changelog
|
||||
[Link to GitHub comparison]
|
||||
```
|
||||
|
||||
### 3. Categorization Rules
|
||||
|
||||
**Performance Improvements:**
|
||||
- RPS improvements
|
||||
- Memory optimizations
|
||||
- CPU usage optimizations
|
||||
- Timeout controls
|
||||
- Worker configuration
|
||||
|
||||
**New Models/Updated Models:**
|
||||
- Extract from model_prices_and_context_window.json diff
|
||||
- Create tables with: Provider, Model, Context Window, Input Cost, Output Cost, Features
|
||||
- Group by provider
|
||||
- Note pricing corrections
|
||||
- Highlight deprecated models
|
||||
|
||||
**Provider Features:**
|
||||
- Group by provider (Gemini, OpenAI, Anthropic, etc.)
|
||||
- Link to provider docs: `../../docs/providers/[provider_name]`
|
||||
- Separate features from bug fixes
|
||||
|
||||
**API Endpoints:**
|
||||
- Images API
|
||||
- Video Generation (if applicable)
|
||||
- Responses API
|
||||
- Passthrough endpoints
|
||||
- General chat completions
|
||||
|
||||
**UI/Management:**
|
||||
- Authentication changes
|
||||
- Dashboard improvements
|
||||
- Team management
|
||||
- Key management
|
||||
|
||||
**Integrations:**
|
||||
- Logging providers (Datadog, Braintrust, etc.)
|
||||
- Guardrails
|
||||
- Cost tracking
|
||||
- Observability
|
||||
|
||||
### 4. Documentation Linking Strategy
|
||||
|
||||
**Link to docs when:**
|
||||
- New provider support added
|
||||
- Significant feature additions
|
||||
- API endpoint changes
|
||||
- Integration additions
|
||||
|
||||
**Link format:** `../../docs/[category]/[specific_doc]`
|
||||
|
||||
**Common doc paths:**
|
||||
- `../../docs/providers/[provider]` - Provider-specific docs
|
||||
- `../../docs/image_generation` - Image generation
|
||||
- `../../docs/video_generation` - Video generation (if it exists)
|
||||
- `../../docs/response_api` - Responses API
|
||||
- `../../docs/proxy/logging` - Logging integrations
|
||||
- `../../docs/proxy/guardrails` - Guardrails
|
||||
- `../../docs/pass_through/[provider]` - Passthrough endpoints
|
||||
|
||||
### 5. Model Table Generation
|
||||
|
||||
From git diff analysis, create tables like:
|
||||
|
||||
```markdown
|
||||
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|
||||
| -------- | ----- | -------------- | ------------------- | -------------------- | -------- |
|
||||
| OpenRouter | `openrouter/openai/gpt-4.1` | 1M | $2.00 | $8.00 | Chat completions with vision |
|
||||
```
|
||||
|
||||
**Extract from JSON:**
|
||||
- `max_input_tokens` → Context Window
|
||||
- `input_cost_per_token` × 1,000,000 → Input cost
|
||||
- `output_cost_per_token` × 1,000,000 → Output cost
|
||||
- `supports_*` fields → Features
|
||||
- Special pricing fields (per image, per second) for generation models
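
The field mapping above can be sketched as a small row builder. This is an illustration under stated assumptions: `format_tokens` and `model_table_row` are hypothetical helpers, and the sample entry simply mirrors the example row shown earlier rather than verified pricing.

```python
def format_tokens(count: int) -> str:
    """Render a token count compactly, e.g. 128000 -> '128K'."""
    if count >= 1_000_000:
        return f"{count / 1_000_000:g}M"
    if count >= 1_000:
        return f"{count / 1_000:g}K"
    return str(count)


def model_table_row(provider: str, model: str, entry: dict) -> str:
    """Build one markdown table row from a pricing-file entry."""
    # Per-token costs are scaled to dollars per 1M tokens.
    input_cost = entry["input_cost_per_token"] * 1_000_000
    output_cost = entry["output_cost_per_token"] * 1_000_000
    # Every truthy supports_* flag becomes a feature label.
    features = ", ".join(
        key[len("supports_"):].replace("_", " ")
        for key, value in sorted(entry.items())
        if key.startswith("supports_") and value
    )
    context = format_tokens(entry["max_input_tokens"])
    return (
        f"| {provider} | `{model}` | {context} "
        f"| ${input_cost:.2f} | ${output_cost:.2f} | {features} |"
    )


# Sample entry for illustration only (not verified pricing).
entry = {
    "max_input_tokens": 1_000_000,
    "input_cost_per_token": 2e-06,
    "output_cost_per_token": 8e-06,
    "supports_vision": True,
    "supports_function_calling": False,
}
row = model_table_row("OpenRouter", "openrouter/openai/gpt-4.1", entry)
print(row)
```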
|
||||
|
||||
### 6. PR Categorization Logic
|
||||
|
||||
**By Keywords in PR Title:**
|
||||
- `[Perf]`, `Performance`, `RPS` → Performance Improvements
|
||||
- `[Bug]`, `[Bug Fix]`, `Fix` → Bug Fixes section
|
||||
- `[Feat]`, `[Feature]`, `Add support` → Features section
|
||||
- `[Docs]` → Documentation (usually exclude from main sections)
|
||||
- Provider names (Gemini, OpenAI, etc.) → Group under provider
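
The keyword rules above can be sketched as a first-match-wins lookup. The rule order, the `Uncategorized` fallback, and the helper name `categorize_pr` are assumptions made for this illustration. Bracketed tags are matched as substrings, while plain words are matched on word boundaries so that `Fix` does not fire on a word such as "prefix".

```python
import re

# (keywords, section) pairs, checked in order; the ordering is an assumption.
RULES = [
    (("[perf]", "performance", "rps"), "Performance Improvements"),
    (("[bug]", "[bug fix]", "fix"), "Bug Fixes"),
    (("[feat]", "[feature]", "add support"), "Features"),
    (("[docs]",), "Documentation"),
]


def keyword_matches(keyword: str, lowered_title: str) -> bool:
    if keyword.startswith("["):
        return keyword in lowered_title
    # Plain words must appear as whole words.
    return re.search(r"\b" + re.escape(keyword) + r"\b", lowered_title) is not None


def categorize_pr(title: str) -> str:
    """Return the release-notes section for a PR title (hypothetical helper)."""
    lowered = title.lower()
    for keywords, section in RULES:
        if any(keyword_matches(keyword, lowered) for keyword in keywords):
            return section
    return "Uncategorized"


for title in ("[Perf] Reduce router overhead", "[Feat] Add model prefix detection"):
    print(title, "->", categorize_pr(title))
```

Titles that land in `Uncategorized` still need the content-based review described next.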
|
||||
|
||||
**By PR Content Analysis:**
|
||||
- New model additions → New Models section
|
||||
- UI changes → Management Endpoints/UI
|
||||
- Logging/observability → Logging/Guardrail Integrations
|
||||
- Rate limiting/budgets → Performance/Reliability
|
||||
- Authentication → Management Endpoints
|
||||
|
||||
### 7. Writing Style Guidelines
|
||||
|
||||
**Tone:**
|
||||
- Professional but accessible
|
||||
- Focus on user impact
|
||||
- Highlight breaking changes clearly
|
||||
- Use active voice
|
||||
|
||||
**Formatting:**
|
||||
- Use consistent markdown formatting
|
||||
- Include PR links: `[PR #XXXXX](https://github.com/BerriAI/litellm/pull/XXXXX)`
|
||||
- Use code blocks for configuration examples
|
||||
- Bold important terms and section headers
|
||||
|
||||
**Warnings/Notes:**
|
||||
- Add warning boxes for breaking changes
|
||||
- Include migration instructions when needed
|
||||
- Provide override options for default changes
|
||||
|
||||
### 8. Quality Checks
|
||||
|
||||
**Before finalizing:**
|
||||
- Verify all PR links work
|
||||
- Check documentation links are valid
|
||||
- Ensure model pricing is accurate
|
||||
- Confirm provider names are consistent
|
||||
- Review for typos and formatting issues
|
||||
|
||||
### 9. Common Patterns to Follow
|
||||
|
||||
**Performance Changes:**
|
||||
```markdown
|
||||
- **+400 RPS Performance Boost** - Description - [PR #XXXXX](link)
|
||||
```
|
||||
|
||||
**New Models:**
|
||||
Always include a pricing table and feature highlights.
|
||||
|
||||
**Breaking Changes:**
|
||||
```markdown
|
||||
:::warning
|
||||
This release has a known issue...
|
||||
:::
|
||||
```
|
||||
|
||||
**Provider Features:**
|
||||
```markdown
|
||||
- **[Provider Name](../../docs/providers/provider)**
|
||||
- Feature description - [PR #XXXXX](link)
|
||||
```
|
||||
|
||||
### 10. Missing Documentation Check
|
||||
|
||||
**Review for missing docs:**
|
||||
- New providers without documentation
|
||||
- New API endpoints without examples
|
||||
- Complex features without guides
|
||||
- Integration setup instructions
|
||||
|
||||
**Flag for documentation needs:**
|
||||
- New provider integrations
|
||||
- Significant API changes
|
||||
- Complex configuration options
|
||||
- Migration requirements
|
||||
|
||||
## Example Command Workflow
|
||||
|
||||
```bash
|
||||
# 1. Get model changes
|
||||
git diff <commit> HEAD -- model_prices_and_context_window.json
|
||||
|
||||
# 2. Analyze PR list for categorization
|
||||
# 3. Create release notes following template
|
||||
# 4. Link to appropriate documentation
|
||||
# 5. Review for missing documentation needs
|
||||
```
|
||||
|
||||
## Output Requirements
|
||||
|
||||
- Follow exact markdown structure from reference
|
||||
- Include all PR links and contributors
|
||||
- Provide accurate model pricing tables
|
||||
- Link to relevant documentation
|
||||
- Highlight breaking changes with warnings
|
||||
- Include deployment instructions
|
||||
- End with full changelog link
|
||||
|
||||
This process ensures consistent, comprehensive release notes that help users understand changes and upgrade smoothly.
|
||||
@ -65,6 +65,7 @@ Use `litellm.get_supported_openai_params()` for an updated list of params for ea
|
||||
| Github | ✅| ✅ | ✅ | ✅| ✅ | ✅ | ✅ | ✅| ✅ | ✅| ✅|| || ✅ | ✅ (model dependent) | ✅ (model dependent) || ||
| Novita AI| ✅| ✅ || ✅| ✅ | ✅ | ✅ | ✅| ✅ | ✅| || ✅||| |||| ||
| Bytez | ✅| ✅ || ✅| ✅ | | | ✅|| || || || || || ||
| OVHCloud AI Endpoints | ✅ | | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | | | | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
||||
|
||||
:::note
|
||||
|
||||
|
||||
380
docs/my-website/docs/providers/ovhcloud.md
Normal file
@ -0,0 +1,380 @@
|
||||
import Tabs from '@theme/Tabs';
|
||||
import TabItem from '@theme/TabItem';
|
||||
|
||||
# 🆕 OVHCloud AI Endpoints
|
||||
OVHCloud is a leading French cloud provider in Europe, with a focus on data sovereignty and privacy.
|
||||
|
||||
You can explore the latest models we made available in our [catalog](https://endpoints.ai.cloud.ovh.net/catalog).
|
||||
|
||||
:::tip
|
||||
|
||||
We support ALL OVHCloud AI Endpoints models. Set `model=ovhcloud/<any-model-on-ai-endpoints>`, using `ovhcloud/` as a prefix, when sending LiteLLM requests.
|
||||
For the complete models catalog, visit https://endpoints.ai.cloud.ovh.net/catalog.
|
||||
|
||||
:::
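
The `ovhcloud/` naming convention can be illustrated with a short string split. This is only a sketch of the convention: `split_provider_prefix` is a hypothetical helper and does not represent LiteLLM's own routing code.

```python
def split_provider_prefix(model: str) -> tuple:
    """Split 'provider/model-name' into (provider, model_name)."""
    provider, separator, name = model.partition("/")
    if not separator or not provider or not name:
        raise ValueError(f"expected '<provider>/<model>', got {model!r}")
    return provider, name


provider, name = split_provider_prefix("ovhcloud/Meta-Llama-3_3-70B-Instruct")
print(provider)  # ovhcloud
print(name)      # Meta-Llama-3_3-70B-Instruct
```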
|
||||
|
||||
## Sample usage
|
||||
### Chat completion
|
||||
You can define your API key by setting the `OVHCLOUD_API_KEY` environment variable or by overriding the `api_key` parameter. You can generate a key on the [OVHCloud Manager](https://www.ovh.com/manager).
|
||||
|
||||
```python
from litellm import completion
import os

# Our API is free but rate-limited for calls without an API key.
os.environ['OVHCLOUD_API_KEY'] = "your-api-key"

response = completion(
    model = "ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages = [
        {
            "role": "user",
            "content": "Hello, how are you?",
        }
    ],
    max_tokens = 10,
    stop = [],
    temperature = 0.2,
    top_p = 0.9,
    user = "user",
    api_key = "your-api-key" # Optional if set through the environment variable.
)

print(response)
```
|
||||
|
||||
### Streaming
|
||||
Set the parameter `stream` to `True` to stream a response.
|
||||
```python
from litellm import completion
import os

os.environ['OVHCLOUD_API_KEY'] = "your-api-key"

response = completion(
    model = "ovhcloud/Meta-Llama-3_3-70B-Instruct",
    messages = [
        {
            "role": "user",
            "content": "Hello, how are you?",
        }
    ],
    max_tokens = 10,
    stop = [],
    temperature = 0.2,
    top_p = 0.9,
    user = "user",
    api_key = "your-api-key", # Optional if set through the environment variable.
    stream = True
)

# Print each streamed chunk as it arrives.
for part in response:
    print(part)
```
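
Streamed chunks can also be assembled into a single string. The sketch below assumes that chunks follow the OpenAI-style shape `chunk.choices[0].delta.content`, which is worth confirming against a real response. `collect_stream_text` and `fake_chunk` are hypothetical helpers, and the fake chunks stand in for a real stream so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream_text(chunks) -> str:
    """Concatenate the text deltas of an OpenAI-style chunk stream."""
    pieces = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        content = getattr(chunk.choices[0].delta, "content", None)
        if content:  # some chunks carry no text
            pieces.append(content)
    return "".join(pieces)


def fake_chunk(text):
    """Build a stand-in chunk with the assumed attribute layout."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [fake_chunk("Hello"), fake_chunk(", world"), fake_chunk(None)]
print(collect_stream_text(demo_stream))
```

With a live call, the object returned by `completion(..., stream=True)` would be passed in place of `demo_stream`.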
|
||||
|
||||
### Tool Calling
|
||||
|
||||
```python
from litellm import completion
import json

def get_current_weather(location, unit="celsius"):
    if unit == "celsius":
        return {"location": location, "temperature": "22", "unit": "celsius"}
    else:
        return {"location": location, "temperature": "72", "unit": "fahrenheit"}

def print_message(role, content, is_tool_call=False, function_name=None):
    if role == "user":
        print(f"🧑 User: {content}")
    elif role == "assistant":
        if is_tool_call:
            print(f"🤖 Assistant: I will call the function '{function_name}' to get some information.")
        else:
            print(f"🤖 Assistant: {content}")
    elif role == "tool":
        print(f"🔧 Tool ({function_name}): {content}")
    print()

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]
model = "ovhcloud/Meta-Llama-3_3-70B-Instruct"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and country, e.g. Montréal, Canada",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

print("🌟 Beginning of the conversation")

# Initial user message
print_message("user", messages[0]["content"])

# First request to the model
print("📡 Sending first request to the model...")
response = completion(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

response_message = response.choices[0].message
tool_calls = response_message.tool_calls

if tool_calls:
    available_functions = {
        "get_current_weather": get_current_weather,
    }

    # Display the tool calls suggested by the model
    for tool_call in tool_calls:
        print_message("assistant", "", is_tool_call=True, function_name=tool_call.function.name)
        print(f" 📋 Arguments: {tool_call.function.arguments}")
        print()

    # Add assistant message with tool calls to the conversation history
    assistant_message = {
        "role": "assistant",
        "content": response_message.content,
        "tool_calls": [
            {
                "id": tool_call.id,
                "type": "function",
                "function": {
                    "name": tool_call.function.name,
                    "arguments": tool_call.function.arguments
                }
            } for tool_call in tool_calls
        ]
    }

    messages.append(assistant_message)

    # Execute each tool call and add the results to the conversation history
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)

        print(f"🔧 Executing function '{function_name}'...")
        function_response = function_to_call(
            location=function_args.get("location"),
            # Fall back to celsius when the model omits the optional unit
            unit=function_args.get("unit") or "celsius",
        )

        # Display tool response
        print_message("tool", json.dumps(function_response, indent=2), function_name=function_name)

        messages.append({
            "tool_call_id": tool_call.id,
            "role": "tool",
            "name": function_name,
            "content": json.dumps(function_response),
        })

    print("📡 Sending second request to the model with results...")

    # Second request with function results
    second_response = completion(
        model=model,
        messages=messages
    )

    # Display final response
    final_content = second_response.choices[0].message.content
    print_message("assistant", final_content)

else:
    print("❌ No function call detected")
    print_message("assistant", response_message.content)
```
|
||||
|
||||
### Vision Example
|
||||
|
||||
```python
from base64 import b64encode
from mimetypes import guess_type
import litellm

# Auxiliary function to get b64 images
def data_url_from_image(file_path):
    mime_type, _ = guess_type(file_path)
    if mime_type is None:
        raise ValueError("Could not determine MIME type of the file")

    with open(file_path, "rb") as image_file:
        encoded_string = b64encode(image_file.read()).decode("utf-8")

    data_url = f"data:{mime_type};base64,{encoded_string}"
    return data_url

response = litellm.completion(
    model = "ovhcloud/Mistral-Small-3.2-24B-Instruct-2506",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": data_url_from_image("your_image.jpg"),
                        "format": "image/jpeg"
                    }
                }
            ]
        }
    ],
    stream=False
)

print(response.choices[0].message.content)
```
|
||||
|
||||
|
||||
### Structured Output
|
||||
|
||||
```python
|
||||
from litellm import completion
|
||||
|
||||
response = completion(
|
||||
model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
|
||||
messages=[
|
||||
{
|
||||
"role": "system",
|
||||
"content": (
|
||||
"You are a specialist in extracting structured data from unstructured text. "
|
||||
"Your task is to identify relevant entities and categories, then format them "
|
||||
"according to the requested structure."
|
||||
),
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Room 12 contains books, a desk, and a lamp."
|
||||
},
|
||||
],
|
||||
response_format={
|
||||
"type": "json_schema",
|
||||
"json_schema": {
|
||||
"title": "data",
|
||||
"name": "data_extraction",
|
||||
"schema": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"section": {"type": "string"},
|
||||
"products": {
|
||||
"type": "array",
|
||||
"items": {"type": "string"}
|
||||
}
|
||||
},
|
||||
"required": ["section", "products"],
|
||||
"additionalProperties": False
|
||||
},
|
||||
"strict": False
|
||||
}
|
||||
},
|
||||
stream=False
|
||||
)
|
||||
|
||||
print(response.choices[0].message.content)
|
||||
```
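
Because the structured reply arrives as a JSON string in the message content, it is worth parsing and checking before use. The sketch below validates the two required keys from the schema above. `parse_extraction` is a hypothetical helper, and the sample string is made up for illustration; it is not real model output.

```python
import json


def parse_extraction(content: str) -> dict:
    """Parse the JSON reply and check the keys required by the schema."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"section", "products"} - set(data)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    if not isinstance(data["products"], list):
        raise ValueError("'products' must be a list")
    return data


# Made-up reply, standing in for response.choices[0].message.content.
sample = '{"section": "Room 12", "products": ["books", "desk", "lamp"]}'
parsed = parse_extraction(sample)
print(parsed["section"], parsed["products"])
```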
|
||||
|
||||
### Embeddings
|
||||
|
||||
```python
|
||||
from litellm import embedding
|
||||
|
||||
response = embedding(
|
||||
model="ovhcloud/BGE-M3",
|
||||
input=["sample text to embed", "another sample text to embed"]
|
||||
)
|
||||
|
||||
print(response.data)
|
||||
```
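
A common next step with embeddings is comparing two vectors. The sketch below computes cosine similarity on toy vectors so that it runs offline. With a real response, the vectors would presumably come from the `embedding` field of each item in `response.data` (the OpenAI-style layout, an assumption to verify); `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration; real embeddings are much longer.
first = [1.0, 0.0, 1.0]
second = [1.0, 1.0, 0.0]
score = cosine_similarity(first, second)
print(round(score, 3))
```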
|
||||
|
||||
## Usage with LiteLLM Proxy Server
|
||||
|
||||
Here's how to call an OVHCloud AI Endpoints model with the LiteLLM Proxy Server:
|
||||
|
||||
1. Modify the config.yaml
|
||||
|
||||
```yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: ovhcloud/<your-model-name>  # add ovhcloud/ prefix to route as OVHCloud provider
      api_key: api-key # your OVHCloud API key
```
|
||||
|
||||
|
||||
2. Start the proxy
|
||||
|
||||
```bash
|
||||
$ litellm --config /path/to/config.yaml
|
||||
```
|
||||
|
||||
3. Send Request to LiteLLM Proxy Server
|
||||
|
||||
<Tabs>
|
||||
|
||||
<TabItem value="openai" label="OpenAI Python v1.0.0+">
|
||||
|
||||
```python
|
||||
import openai
|
||||
client = openai.OpenAI(
|
||||
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
|
||||
base_url="http://0.0.0.0:4000" # litellm-proxy-base url
|
||||
)
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model="my-model",
|
||||
messages = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "what llm are you"
|
||||
}
|
||||
],
|
||||
)
|
||||
|
||||
print(response)
|
||||
```
|
||||
</TabItem>
|
||||
|
||||
<TabItem value="curl" label="curl">
|
||||
|
||||
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "my-model",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ]
}'
```
|
||||
</TabItem>
|
||||
|
||||
</Tabs>
|
||||
@ -8,9 +8,9 @@ LiteLLM supports all models on VLLM.
|
||||
| Property | Details |
|
||||
|-------|-------|
|
||||
| Description | vLLM is a fast and easy-to-use library for LLM inference and serving. [Docs](https://docs.vllm.ai/en/latest/index.html) |
|
||||
| Provider Route on LiteLLM | `hosted_vllm/` (for OpenAI compatible server), `vllm/` (for vLLM sdk usage) |
|
||||
| Provider Route on LiteLLM | `hosted_vllm/` (for OpenAI compatible server), `vllm/` ([DEPRECATED] for vLLM sdk usage) |
|
||||
| Provider Doc | [vLLM ↗](https://docs.vllm.ai/en/latest/index.html) |
|
||||
| Supported Endpoints | `/chat/completions`, `/embeddings`, `/completions`, `/rerank` |
|
||||
| Supported Endpoints | `/chat/completions`, `/embeddings`, `/completions`, `/rerank`, `/audio/transcriptions` |
|
||||
|
||||
|
||||
# Quick Start
|
||||
|
||||
@ -4,6 +4,10 @@ import TabItem from '@theme/TabItem';
|
||||
|
||||
# ✨ SSO for Admin UI
|
||||
|
||||
:::info
|
||||
From v1.76.0, SSO is free for up to 5 users.
|
||||
:::
|
||||
|
||||
:::info
|
||||
|
||||
✨ SSO is on LiteLLM Enterprise
|
||||
|
||||
212
docs/my-website/docs/proxy/forward_client_headers.md
Normal file
@ -0,0 +1,212 @@
|
||||
# Forward Client Headers to LLM API
|
||||
|
||||
Control which model groups can forward client headers to the underlying LLM provider APIs.
|
||||
|
||||
## Overview
|
||||
|
||||
By default, LiteLLM does not forward client headers to LLM provider APIs for security reasons. However, you can selectively enable header forwarding for specific model groups using the `forward_client_headers_to_llm_api` setting.
|
||||
|
||||
## Configuration
|
||||
|
||||
### Enable Globally
|
||||
|
||||
```yaml
general_settings:
  forward_client_headers_to_llm_api: true
```
|
||||
|
||||
### Enable for a Model Group
|
||||
|
||||
Add the `forward_client_headers_to_llm_api` setting under `model_group_settings` in your configuration:
|
||||
|
||||
```yaml
model_list:
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: "your-api-key"
  - model_name: "wildcard-models/*"
    litellm_params:
      model: "openai/*"
      api_key: "your-api-key"

litellm_settings:
  model_group_settings:
    forward_client_headers_to_llm_api:
      - gpt-4o-mini
      - wildcard-models/*
```
|
||||
|
||||
## Supported Model Patterns
|
||||
|
||||
The configuration supports various model matching patterns:
|
||||
|
||||
### 1. Exact Model Names
|
||||
```yaml
|
||||
forward_client_headers_to_llm_api:
|
||||
- gpt-4o-mini
|
||||
- claude-3-sonnet
|
||||
```
|
||||
|
||||
### 2. Wildcard Patterns
|
||||
```yaml
|
||||
forward_client_headers_to_llm_api:
|
||||
- "openai/*" # All OpenAI models
|
||||
- "anthropic/*" # All Anthropic models
|
||||
- "wildcard-group/*" # All models in wildcard-group
|
||||
```
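
The exact-name and wildcard entries can be pictured as shell-style patterns checked against the requested model name. The sketch below uses Python's `fnmatch` for that. Treating the patterns as shell-style globs is an assumption made for this illustration, and `forwarding_enabled` is a hypothetical helper; LiteLLM's actual matching may differ.

```python
from fnmatch import fnmatchcase


def forwarding_enabled(model: str, patterns: list) -> bool:
    """Return True when the model name matches any configured entry."""
    # An entry without '*' only matches the identical name.
    return any(fnmatchcase(model, pattern) for pattern in patterns)


configured = ["gpt-4o-mini", "wildcard-models/*"]
for requested in ("gpt-4o-mini", "wildcard-models/gpt-4o", "standard-gpt-4"):
    print(requested, forwarding_enabled(requested, configured))
```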
|
||||
|
||||
### 3. Team Model Aliases
|
||||
If your team has model aliases configured, the forwarding will work with both the original model name and the alias.
|
||||
|
||||
## Forwarded Headers
|
||||
|
||||
When enabled for a model group, LiteLLM forwards the following types of headers:
|
||||
|
||||
### Custom Headers (x- prefix)
|
||||
- Any header starting with `x-` (except `x-stainless-*` which can cause OpenAI SDK issues)
|
||||
- Examples: `x-custom-header`, `x-request-id`, `x-trace-id`
|
||||
|
||||
### Provider-Specific Headers
|
||||
- **Anthropic**: `anthropic-beta` headers
|
||||
- **OpenAI**: `openai-organization` (when enabled via `forward_openai_org_id: true`)
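
The selection rules above can be sketched as a simple filter over incoming headers. This is an illustration of the documented behavior rather than LiteLLM's implementation: `select_forwardable_headers` is a hypothetical helper, and it leaves out `openai-organization` because that header depends on the separate `forward_openai_org_id` setting.

```python
def select_forwardable_headers(headers: dict) -> dict:
    """Keep x-* headers (except x-stainless-*) and anthropic-beta."""
    selected = {}
    for name, value in headers.items():
        lowered = name.lower()  # header names are case-insensitive
        if lowered.startswith("x-stainless-"):
            continue
        if lowered.startswith("x-") or lowered == "anthropic-beta":
            selected[name] = value
    return selected


incoming = {
    "Authorization": "Bearer sk-1234",
    "Content-Type": "application/json",
    "x-trace-id": "abc123",
    "x-stainless-os": "Linux",
    "anthropic-beta": "tools-2024-04-04",
}
forwarded = select_forwardable_headers(incoming)
print(forwarded)
```

Note that `Authorization` never passes the filter, which matches the guidance to keep API keys out of forwarded headers.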
|
||||
|
||||
### User Information Headers (Optional)
|
||||
When `add_user_information_to_llm_headers` is enabled, LiteLLM adds:
|
||||
- `x-litellm-user-id`
|
||||
- `x-litellm-org-id`
|
||||
- Other user metadata as `x-litellm-*` headers
|
||||
|
||||
## Security Considerations
|
||||
|
||||
⚠️ **Important Security Notes:**
|
||||
|
||||
1. **Sensitive Data**: Only enable header forwarding for trusted model groups, as headers may contain sensitive information
|
||||
2. **API Keys**: Never include API keys or secrets in forwarded headers
|
||||
3. **PII**: Be cautious about forwarding headers that might contain personally identifiable information
|
||||
4. **Provider Limits**: Some providers have restrictions on custom headers
|
||||
|
||||
## Example Use Cases
|
||||
|
||||
### 1. Request Tracing
|
||||
Forward tracing headers to track requests across your system:
|
||||
|
||||
```bash
|
||||
curl -X POST "https://your-proxy.com/v1/chat/completions" \
|
||||
-H "Authorization: Bearer your-key" \
|
||||
-H "x-trace-id: abc123" \
|
||||
-H "x-request-source: mobile-app" \
|
||||
-d '{
|
||||
"model": "gpt-4o-mini",
|
||||
"messages": [{"role": "user", "content": "Hello"}]
|
||||
}'
|
||||
```
|
||||
|
||||
### 2. Custom Metadata
|
||||
Pass custom metadata to your LLM provider:
|
||||
|
||||
```bash
|
||||
curl -X POST "https://your-proxy.com/v1/chat/completions" \
|
||||
-H "Authorization: Bearer your-key" \
|
||||
-H "x-customer-id: customer-123" \
|
||||
-H "x-environment: production" \
|
||||
-d '{
|
||||
"model": "gpt-4o-mini",
|
||||
"messages": [{"role": "user", "content": "Hello"}]
|
||||
}'
|
||||
```
|
||||
|
||||
### 3. Anthropic Beta Features
|
||||
Enable beta features for Anthropic models:
|
||||
|
||||
```bash
|
||||
curl -X POST "https://your-proxy.com/v1/chat/completions" \
|
||||
-H "Authorization: Bearer your-key" \
|
||||
-H "anthropic-beta: tools-2024-04-04" \
|
||||
-d '{
|
||||
"model": "claude-3-sonnet",
|
||||
"messages": [{"role": "user", "content": "Hello"}]
|
||||
}'
|
||||
```
|
||||
|
||||
## Complete Configuration Example
|
||||
|
||||
```yaml
model_list:
  # Fixed model with header forwarding
  - model_name: byok-fixed-gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_base: "https://your-openai-endpoint.com"
      api_key: "your-api-key"

  # Wildcard model group with header forwarding
  - model_name: "byok-wildcard/*"
    litellm_params:
      model: "openai/*"
      api_base: "https://your-openai-endpoint.com"
      api_key: "your-api-key"

  # Standard model without header forwarding
  - model_name: standard-gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: "your-api-key"

litellm_settings:
  # Enable user info headers globally (optional)
  add_user_information_to_llm_headers: true

  model_group_settings:
    forward_client_headers_to_llm_api:
      - byok-fixed-gpt-4o-mini
      - byok-wildcard/*
      # Note: standard-gpt-4 is NOT included, so no headers forwarded

general_settings:
  # Enable OpenAI organization header forwarding (optional)
  forward_openai_org_id: true
```
|
||||
|
||||
## Testing Header Forwarding
|
||||
|
||||
To test if headers are being forwarded:
|
||||
|
||||
1. **Enable Debug Logging**: Set `set_verbose: true` in your config
|
||||
2. **Check Provider Logs**: Monitor your LLM provider's request logs
|
||||
3. **Use Webhook Sites**: For testing, you can use webhook.site URLs as api_base to see forwarded headers
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Headers Not Being Forwarded
|
||||
|
||||
1. **Check Model Name**: Ensure the model name in your request matches the configuration
|
||||
2. **Verify Pattern Matching**: Wildcard patterns must match exactly
|
||||
3. **Review Logs**: Enable verbose logging to see header processing
|
||||
|
||||
### Provider Errors
|
||||
|
||||
1. **Invalid Headers**: Some providers reject unknown headers
|
||||
2. **Header Limits**: Providers may have limits on header count/size
|
||||
3. **Authentication**: Ensure forwarded headers don't conflict with authentication
|
||||
|
||||
## Related Features
|
||||
|
||||
- [Request Headers](./request_headers.md) - Complete list of supported request headers
|
||||
- [Response Headers](./response_headers.md) - Headers returned by LiteLLM
|
||||
- [Team Model Aliases](./team_model_add.md) - Configure model aliases for teams
|
||||
- [Model Access Control](./model_access.md) - Control which users can access which models
|
||||
|
||||
## API Reference
|
||||
|
||||
The header forwarding is controlled by the `ModelGroupSettings` configuration:
|
||||
|
||||
```python
|
||||
class ModelGroupSettings(BaseModel):
|
||||
forward_client_headers_to_llm_api: Optional[List[str]] = None
|
||||
```
|
||||
|
||||
Where each string in the list can be:
|
||||
- An exact model name (e.g., `"gpt-4o-mini"`)
|
||||
- A wildcard pattern (e.g., `"openai/*"`)
|
||||
- A model group name (e.g., `"my-model-group/*"`)
|
||||
@ -135,6 +135,7 @@ guardrails:
|
||||
# application_id: "my-app"
|
||||
# monitor_mode: false
|
||||
# block_failures: true
|
||||
# anonymize_input: false
|
||||
```
|
||||
|
||||
### Required Parameters
|
||||
@ -147,6 +148,7 @@ guardrails:
|
||||
- **`application_id`**: Your application identifier (defaults to `"litellm"`)
|
||||
- **`monitor_mode`**: If `true`, logs violations without blocking (defaults to `false`)
|
||||
- **`block_failures`**: If `true`, blocks requests when guardrail API failures occur (defaults to `true`)
|
||||
- **`anonymize_input`**: If `true`, replaces sensitive content with anonymized version (defaults to `false`)
|
||||
|
||||
## Environment Variables
|
||||
|
||||
@ -158,6 +160,7 @@ export NOMA_API_BASE="https://api.noma.security/" # Optional
|
||||
export NOMA_APPLICATION_ID="my-app" # Optional
|
||||
export NOMA_MONITOR_MODE="false" # Optional
|
||||
export NOMA_BLOCK_FAILURES="true" # Optional
|
||||
export NOMA_ANONYMIZE_INPUT="false" # Optional
|
||||
```
|
||||
|
||||
## Advanced Configuration
|
||||
@ -190,6 +193,20 @@ guardrails:
|
||||
block_failures: false # Allow requests to proceed if guardrail API fails
|
||||
```
|
||||
|
||||
### Content Anonymization
|
||||
|
||||
Enable anonymization to replace sensitive content instead of blocking:
|
||||
|
||||
```yaml
|
||||
guardrails:
|
||||
- guardrail_name: "noma-anonymize"
|
||||
litellm_params:
|
||||
guardrail: noma
|
||||
mode: "pre_call"
|
||||
api_key: os.environ/NOMA_API_KEY
|
||||
anonymize_input: true # Replace sensitive data with anonymized version
|
||||
```
|
||||
|
||||
### Multiple Guardrails
|
||||
|
||||
Apply different configurations for input and output:
|
||||
|
||||
153
docs/my-website/docs/proxy/guardrails/tool_permission.md
Normal file
@ -0,0 +1,153 @@
|
||||
import Image from '@theme/IdealImage';
|
||||
import Tabs from '@theme/Tabs';
|
||||
import TabItem from '@theme/TabItem';
|
||||
|
||||
# Tool Permission Guardrail
|
||||
|
||||
LiteLLM provides a Tool Permission Guardrail that lets you control which **tool calls** a model is allowed to invoke, using configurable allow/deny rules. This offers fine-grained, provider-agnostic control over tool execution (e.g., OpenAI Chat Completions `tool_calls`, Anthropic Messages `tool_use`, MCP tools).
|
||||
|
||||
## Quick Start
|
||||
### 1. Define Guardrails on your LiteLLM config.yaml
|
||||
|
||||
Define your guardrails under the `guardrails` section:
|
||||
```yaml
|
||||
guardrails:
|
||||
- guardrail_name: "tool-permission-guardrail"
|
||||
litellm_params:
|
||||
guardrail: tool_permission
|
||||
mode: "post_call"
|
||||
rules:
|
||||
- id: "allow_bash"
|
||||
tool_name: "Bash"
|
||||
decision: "allow"
|
||||
- id: "allow_github_mcp"
|
||||
tool_name: "mcp__github_*"
|
||||
decision: "allow"
|
||||
- id: "allow_aws_documentation"
|
||||
tool_name: "mcp__aws-documentation_*_documentation"
|
||||
decision: "allow"
|
||||
- id: "deny_read_commands"
|
||||
tool_name: "Read"
|
||||
decision: "deny"
|
||||
default_action: "deny" # Fallback when no rule matches: "allow" or "deny"
|
||||
on_disallowed_action: "block" # How to handle disallowed tools: "block" or "rewrite"
|
||||
```
|
||||
|
||||
#### Rule Structure
|
||||
|
||||
```yaml
|
||||
- id: "unique_rule_id" # Unique identifier for the rule
|
||||
tool_name: "pattern" # Tool name or pattern to match
|
||||
decision: "allow" # "allow" or "deny"
|
||||
```
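To make the rule structure concrete, here is a minimal sketch of how such rules might be evaluated. It assumes glob-style matching (via Python's `fnmatch`) and first-match-wins ordering; neither is confirmed by this document, and `decide` is a hypothetical helper rather than LiteLLM's implementation.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# A subset of the rules from the config above, written as plain dictionaries.
RULES: List[Dict[str, str]] = [
    {"id": "allow_bash", "tool_name": "Bash", "decision": "allow"},
    {"id": "allow_github_mcp", "tool_name": "mcp__github_*", "decision": "allow"},
    {"id": "deny_read_commands", "tool_name": "Read", "decision": "deny"},
]


def decide(tool_name: str, rules: List[Dict[str, str]], default_action: str = "deny") -> str:
    """Return the decision of the first matching rule, else the default action."""
    for rule in rules:
        # Assumption: "*" in tool_name behaves like a shell-style wildcard.
        if fnmatchcase(tool_name, rule["tool_name"]):
            return rule["decision"].lower()
    return default_action


print(decide("mcp__github_create_issue", RULES))  # matched by the wildcard rule
print(decide("get_current_weather", RULES))       # no rule matches, default applies
```

Under these assumptions, a tool that matches no rule falls through to `default_action`, which mirrors the "denied by default action" message in the block example.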
|
||||
|
||||
#### Supported values for `mode`
|
||||
|
||||
- `pre_call` Run **before** LLM call, on **input**
|
||||
- `post_call` Run **after** LLM call, on **input & output**
|
||||
|
||||
### 2. Start the Proxy
|
||||
|
||||
```shell
|
||||
litellm --config config.yaml --port 4000
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="block" label="Block Request">
|
||||
|
||||
**Block request**
|
||||
|
||||
```bash
|
||||
# Test
|
||||
curl -X POST "http://localhost:4000/v1/chat/completions" \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "Authorization: Bearer your-master-key-here" \
|
||||
-d '{
|
||||
"model": "gpt-5-mini",
|
||||
"messages": [{"role": "user","content": "What is the weather like in Tokyo today?"}],
|
||||
"tools": [
|
||||
{
|
||||
"type":"function",
|
||||
"function": {
|
||||
"name":"get_current_weather",
|
||||
"description": "Get the current weather in a given location"
|
||||
}
|
||||
}
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
**Expected response (Denied):**
|
||||
|
||||
```json
|
||||
{
|
||||
"error":
|
||||
{
|
||||
"message": "Guardrail raised an exception, Guardrail: tool-permission-guardrail, Message: Tool 'get_current_weather' denied by default action",
|
||||
"type": "None",
|
||||
"param": "None",
|
||||
"code": "500"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="rewrite" label="Rewrite Request">
|
||||
|
||||
**Rewrite request**
|
||||
|
||||
```bash
|
||||
# Test
|
||||
curl -X POST "http://localhost:4000/v1/chat/completions" \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "Authorization: Bearer your-master-key-here" \
|
||||
-d '{
|
||||
"model": "gpt-5-mini",
|
||||
"messages": [{"role": "user","content": "What is the weather like in Tokyo today?"}],
|
||||
"tools": [
|
||||
{
|
||||
"type":"function",
|
||||
"function": {
|
||||
"name":"get_current_weather",
|
||||
"description": "Get the current weather in a given location"
|
||||
}
|
||||
}
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
**Expected response:**
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "chatcmpl-xxxxxxxxxxxxxxx",
|
||||
"created": 1757716050,
|
||||
"model": "gpt-5-mini-2025-08-07",
|
||||
"object": "chat.completion",
|
||||
"choices": [
|
||||
{
|
||||
"finish_reason": "stop",
|
||||
"index": 0,
|
||||
"message": {
|
||||
"content": "I can’t fetch live weather — I don’t have real‑time internet access.",
|
||||
"role": "assistant",
|
||||
"annotations": []
|
||||
},
|
||||
"provider_specific_fields": {}
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"prompt_tokens": 112,
|
||||
"total_tokens": 735,
|
||||
"completion_tokens_details": {
|
||||
"reasoning_tokens": 384
|
||||
}
|
||||
},
|
||||
"service_tier": "default"
|
||||
}
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
@ -2,6 +2,10 @@
|
||||
|
||||
Special headers that are supported by LiteLLM.
|
||||
|
||||
## Header Forwarding
|
||||
|
||||
By default, LiteLLM does not forward client headers to LLM provider APIs. However, you can selectively enable header forwarding for specific model groups. [Learn more about configuring header forwarding](./forward_client_headers.md).
|
||||
|
||||
## LiteLLM Headers
|
||||
|
||||
`x-litellm-timeout` Optional[float]: The timeout for the request in seconds.
|
||||
@ -21,11 +25,15 @@ Special headers that are supported by LiteLLM.
|
||||
`anthropic-version` Optional[str]: The version of the Anthropic API to use.
|
||||
`anthropic-beta` Optional[str]: The beta version of the Anthropic API to use.
|
||||
- For the `/v1/messages` endpoint, this header is always forwarded to the underlying model.
|
||||
- For `/chat/completions` endpoint, this will only be forwarded if `forward_client_headers_to_llm_api` is true.
|
||||
- For `/chat/completions` endpoint, this will only be forwarded if the model is configured in `forward_client_headers_to_llm_api`. [Learn more](./forward_client_headers.md)
|
||||
|
||||
## OpenAI Headers
|
||||
|
||||
`openai-organization` Optional[str]: The organization to use for the OpenAI API. (currently needs to be enabled via `general_settings::forward_openai_org_id: true`)
|
||||
|
||||
## Custom Headers
|
||||
|
||||
Custom headers starting with `x-` can be forwarded to LLM provider APIs when the model is configured in `forward_client_headers_to_llm_api`. [Learn more about header forwarding configuration](./forward_client_headers.md).
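As an illustration of the headers described in this file, the sketch below assembles a request carrying `x-litellm-timeout` and one custom `x-` header. The proxy URL, key placeholder, and model name are taken from the curl examples elsewhere in this commit; `build_headers` and the `x-request-source` header are hypothetical names used only for this example.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_headers(
    api_key: str,
    timeout_s: Optional[float] = None,
    extra: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Assemble request headers, accepting only custom headers that start with "x-"."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if timeout_s is not None:
        headers["x-litellm-timeout"] = str(timeout_s)  # timeout in seconds
    for name, value in (extra or {}).items():
        if not name.lower().startswith("x-"):
            raise ValueError(f"custom header {name!r} must start with 'x-'")
        headers[name] = value
    return headers


headers = build_headers(
    "your-master-key-here",
    timeout_s=30,
    extra={"x-request-source": "docs-example"},  # hypothetical custom header
)
body = json.dumps(
    {"model": "gpt-5-mini", "messages": [{"role": "user", "content": "Hello"}]}
).encode()
request = urllib.request.Request(
    "http://localhost:4000/v1/chat/completions",
    data=body,
    headers=headers,
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs without a proxy.
```

Whether the custom header reaches the provider still depends on the model being configured in `forward_client_headers_to_llm_api`, as described above.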
|
||||
|
||||
|
||||
|
||||
|
||||
@ -89,16 +89,20 @@ To track spend and usage for each Open WebUI user, configure both Open WebUI and
|
||||
|
||||
2. **Configure LiteLLM to Parse User Headers**
|
||||
|
||||
Add the following to your LiteLLM `config.yaml` to specify a header to use for user tracking:
|
||||
Add the following to your LiteLLM `config.yaml` to specify the request header mapping for user tracking:
|
||||
|
||||
```yaml
|
||||
general_settings:
|
||||
user_header_name: X-OpenWebUI-User-Id
|
||||
user_header_mappings:
|
||||
- header_name: X-OpenWebUI-User-Id
|
||||
litellm_user_role: internal_user
|
||||
- header_name: X-OpenWebUI-User-Email
|
||||
litellm_user_role: customer
|
||||
```
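The mapping above can be pictured with a small sketch that picks out, for each configured header, the value a request carries and the role it is associated with. This is an illustration under assumptions: `extract_mapped_users` is a hypothetical helper, and how LiteLLM actually resolves or prioritizes these mappings is not stated here.

```python
from typing import Dict, List

# Mirrors the user_header_mappings entries from the config above.
USER_HEADER_MAPPINGS: List[Dict[str, str]] = [
    {"header_name": "X-OpenWebUI-User-Id", "litellm_user_role": "internal_user"},
    {"header_name": "X-OpenWebUI-User-Email", "litellm_user_role": "customer"},
]


def extract_mapped_users(
    headers: Dict[str, str], mappings: List[Dict[str, str]]
) -> Dict[str, str]:
    """Return {role: header value} for every mapped header present in the request."""
    # HTTP header names are case-insensitive, so compare them lowercased.
    lowered = {name.lower(): value for name, value in headers.items()}
    found: Dict[str, str] = {}
    for mapping in mappings:
        value = lowered.get(mapping["header_name"].lower())
        if value:
            found[mapping["litellm_user_role"]] = value
    return found


request_headers = {
    "x-openwebui-user-id": "user-123",
    "X-OpenWebUI-User-Email": "jane@example.com",
}
print(extract_mapped_users(request_headers, USER_HEADER_MAPPINGS))
```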
|
||||
|
||||
ⓘ Available tracking options
|
||||
|
||||
You can use any of the following headers for `user_header_name`:
|
||||
You can use any of the following headers as `header_name` in `user_header_mappings`:
|
||||
- `X-OpenWebUI-User-Id`
|
||||
- `X-OpenWebUI-User-Email`
|
||||
- `X-OpenWebUI-User-Name`
|
||||
@ -109,6 +113,12 @@ To track spend and usage for each Open WebUI user, configure both Open WebUI and
|
||||
- Users can modify their own usernames
|
||||
- Administrators can modify both usernames and emails of any account
|
||||
|
||||
This video walks through how to map Open WebUI headers to LiteLLM user roles.
|
||||
|
||||
<iframe src="https://www.loom.com/embed/a1b6a4635fc0478ba4fd34cae16e2ffd?sid=791c2dcc-7e65-45be-bf7f-27d2601c123e" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen width="840" height="500"></iframe>
|
||||
|
||||
<br/>
|
||||
<br/>
|
||||
|
||||
|
||||
## Render `thinking` content on Open WebUI
|
||||
|
||||
155
docs/my-website/release_notes/v1.77.2-stable/index.md
Normal file
@ -0,0 +1,155 @@
|
||||
---
|
||||
title: "v1.77.2-stable - Bedrock Batches API"
|
||||
slug: "v1-77-2"
|
||||
date: 2025-09-13T10:00:00
|
||||
authors:
|
||||
- name: Krrish Dholakia
|
||||
title: CEO, LiteLLM
|
||||
url: https://www.linkedin.com/in/krish-d/
|
||||
image_url: https://pbs.twimg.com/profile_images/1298587542745358340/DZv3Oj-h_400x400.jpg
|
||||
- name: Ishaan Jaffer
|
||||
title: CTO, LiteLLM
|
||||
url: https://www.linkedin.com/in/reffajnaahsi/
|
||||
image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
|
||||
|
||||
hide_table_of_contents: false
|
||||
---
|
||||
|
||||
import Image from '@theme/IdealImage';
|
||||
import Tabs from '@theme/Tabs';
|
||||
import TabItem from '@theme/TabItem';
|
||||
|
||||
## Deploy this version
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="docker" label="Docker">
|
||||
|
||||
``` showLineNumbers title="docker run litellm"
|
||||
docker run \
|
||||
-e STORE_MODEL_IN_DB=True \
|
||||
-p 4000:4000 \
|
||||
ghcr.io/berriai/litellm:v1.77.2
|
||||
```
|
||||
</TabItem>
|
||||
|
||||
<TabItem value="pip" label="Pip">
|
||||
|
||||
``` showLineNumbers title="pip install litellm"
|
||||
pip install litellm==1.77.2
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
---
|
||||
|
||||
## Key Highlights
|
||||
|
||||
- **Bedrock Batches API** - Support for creating Batch Inference Jobs on Bedrock using LiteLLM's unified batch API (OpenAI compatible)
|
||||
- **Qwen API Tiered Pricing** - Cost tracking support for Dashscope (Qwen) models with multiple pricing tiers
|
||||
|
||||
## New Models / Updated Models
|
||||
|
||||
#### New Model Support
|
||||
|
||||
| Provider | Model | Context Window | Pricing ($/1M tokens) | Features |
|
||||
| ----------- | ------------------------------- | -------------- | --------------------- | -------- |
|
||||
| DeepInfra | `deepinfra/deepseek-ai/DeepSeek-R1` | 164K | **Input:** $0.70<br/>**Output:** $2.40 | Chat completions, tool calling |
|
||||
| Heroku | `heroku/claude-4-sonnet` | 8K | Contact provider for pricing | Function calling, tool choice |
|
||||
| Heroku | `heroku/claude-3-7-sonnet` | 8K | Contact provider for pricing | Function calling, tool choice |
|
||||
| Heroku | `heroku/claude-3-5-sonnet-latest` | 8K | Contact provider for pricing | Function calling, tool choice |
|
||||
| Heroku | `heroku/claude-3-5-haiku` | 4K | Contact provider for pricing | Function calling, tool choice |
|
||||
| Dashscope | `dashscope/qwen-plus-latest` | 1M | **Tiered Pricing:**<br/>• 0-256K tokens: $0.40 / $1.20<br/>• 256K-1M tokens: $1.20 / $3.60 | Function calling, reasoning |
|
||||
| Dashscope | `dashscope/qwen3-max-preview` | 262K | **Tiered Pricing:**<br/>• 0-32K tokens: $1.20 / $6.00<br/>• 32K-128K tokens: $2.40 / $12.00<br/>• 128K-252K tokens: $3.00 / $15.00 | Function calling, reasoning |
|
||||
| Dashscope | `dashscope/qwen-flash` | 1M | **Tiered Pricing:**<br/>• 0-256K tokens: $0.05 / $0.40<br/>• 256K-1M tokens: $0.25 / $2.00 | Function calling, reasoning |
|
||||
| Dashscope | `dashscope/qwen3-coder-plus` | 1M | **Tiered Pricing:**<br/>• 0-32K tokens: $1.00 / $5.00<br/>• 32K-128K tokens: $1.80 / $9.00<br/>• 128K-256K tokens: $3.00 / $15.00<br/>• 256K-1M tokens: $6.00 / $60.00 | Function calling, reasoning, caching |
|
||||
| Dashscope | `dashscope/qwen3-coder-flash` | 1M | **Tiered Pricing:**<br/>• 0-32K tokens: $0.30 / $1.50<br/>• 32K-128K tokens: $0.50 / $2.50<br/>• 128K-256K tokens: $0.80 / $4.00<br/>• 256K-1M tokens: $1.60 / $9.60 | Function calling, reasoning, caching |
|
||||
|
||||
---
|
||||
|
||||
#### Features
|
||||
|
||||
- **[Bedrock](../../docs/providers/bedrock_batches)**
|
||||
- Bedrock Batches API - batch processing support with file upload and request transformation - [PR #14518](https://github.com/BerriAI/litellm/pull/14518), [PR #14522](https://github.com/BerriAI/litellm/pull/14522)
|
||||
- **[VLLM](../../docs/providers/vllm)**
|
||||
- Added transcription endpoint support - [PR #14523](https://github.com/BerriAI/litellm/pull/14523)
|
||||
- **[Ollama](../../docs/providers/ollama)**
|
||||
- `ollama_chat/` - images, thinking, and content as list handling - [PR #14523](https://github.com/BerriAI/litellm/pull/14523)
|
||||
- **General**
|
||||
- New debug flag for detailed request/response logging - [PR #14482](https://github.com/BerriAI/litellm/pull/14482)
|
||||
|
||||
#### Bug Fixes
|
||||
|
||||
- **[Azure OpenAI](../../docs/providers/azure)**
|
||||
- Fixed extra_body injection causing payload rejection in image generation - [PR #14475](https://github.com/BerriAI/litellm/pull/14475)
|
||||
- **[LM Studio](../../docs/providers/lm-studio)**
|
||||
- Resolved illegal Bearer header value issue - [PR #14512](https://github.com/BerriAI/litellm/pull/14512)
|
||||
|
||||
---
|
||||
|
||||
## LLM API Endpoints
|
||||
|
||||
#### Bug Fixes
|
||||
|
||||
- **[/messages](../../docs/anthropic_unified)**
|
||||
- Don't send content block after message w/ finish reason + usage block - [PR #14477](https://github.com/BerriAI/litellm/pull/14477)
|
||||
- **[/generateContent](../../docs/generateContent)**
|
||||
- Gemini CLI Integration - Fixed token count errors - [PR #14451](https://github.com/BerriAI/litellm/pull/14451), [PR #14417](https://github.com/BerriAI/litellm/pull/14417)
|
||||
|
||||
---
|
||||
|
||||
## Spend Tracking, Budgets and Rate Limiting
|
||||
|
||||
#### Features
|
||||
|
||||
- **[Qwen API Tiered Pricing](../../docs/providers/dashscope)** - Added comprehensive tiered cost tracking for Dashscope/Qwen models - [PR #14471](https://github.com/BerriAI/litellm/pull/14471), [PR #14479](https://github.com/BerriAI/litellm/pull/14479)
|
||||
|
||||
#### Bug Fixes
|
||||
|
||||
- **Provider Budgets** - Fixed provider budget calculations - [PR #14459](https://github.com/BerriAI/litellm/pull/14459)
|
||||
|
||||
---
|
||||
|
||||
## Management Endpoints / UI
|
||||
|
||||
#### Features
|
||||
|
||||
- **User Headers Mapping** - New X-LiteLLM Users mapping feature for enhanced user tracking - [PR #14485](https://github.com/BerriAI/litellm/pull/14485)
|
||||
- **Key Unblocking** - Support for hashed tokens in `/key/unblock` endpoint - [PR #14477](https://github.com/BerriAI/litellm/pull/14477)
|
||||
- **Model Group Header Forwarding** - Enhanced wildcard model support with documentation - [PR #14528](https://github.com/BerriAI/litellm/pull/14528)
|
||||
|
||||
#### Bug Fixes
|
||||
|
||||
- **Log Tab Key Alias** - Fixed filtering inaccuracies for failed logs - [PR #14469](https://github.com/BerriAI/litellm/pull/14469), [PR #14529](https://github.com/BerriAI/litellm/pull/14529)
|
||||
|
||||
---
|
||||
|
||||
## Logging / Guardrail Integrations
|
||||
|
||||
#### Features
|
||||
|
||||
- **Noma Integration** - Added non-blocking monitor mode with anonymize input support - [PR #14401](https://github.com/BerriAI/litellm/pull/14401)
|
||||
|
||||
---
|
||||
|
||||
## Performance / Loadbalancing / Reliability improvements
|
||||
|
||||
#### Performance
|
||||
- Removed dynamic creation of static values - [PR #14538](https://github.com/BerriAI/litellm/pull/14538)
|
||||
- Using `_PROXY_MaxParallelRequestsHandler_v3` by default for optimal throughput - [PR #14450](https://github.com/BerriAI/litellm/pull/14450)
|
||||
- Improved execution context propagation into logging tasks - [PR #14455](https://github.com/BerriAI/litellm/pull/14455)
|
||||
|
||||
---
|
||||
|
||||
|
||||
|
||||
## New Contributors
|
||||
* @Sameerlite made their first contribution in [PR #14460](https://github.com/BerriAI/litellm/pull/14460)
|
||||
* @holzman made their first contribution in [PR #14459](https://github.com/BerriAI/litellm/pull/14459)
|
||||
* @sashank5644 made their first contribution in [PR #14469](https://github.com/BerriAI/litellm/pull/14469)
|
||||
* @TomAlon made their first contribution in [PR #14401](https://github.com/BerriAI/litellm/pull/14401)
|
||||
* @AlexsanderHamir made their first contribution in [PR #14538](https://github.com/BerriAI/litellm/pull/14538)
|
||||
|
||||
---
|
||||
|
||||
## **[Full Changelog](https://github.com/BerriAI/litellm/compare/v1.77.1.dev.2...v1.77.2.dev)**
|
||||
@ -49,6 +49,7 @@ const sidebars = {
|
||||
"proxy/guardrails/secret_detection",
|
||||
"proxy/guardrails/custom_guardrail",
|
||||
"proxy/guardrails/prompt_injection",
|
||||
"proxy/guardrails/tool_permission",
|
||||
].sort(),
|
||||
],
|
||||
},
|
||||
@ -141,6 +142,7 @@ const sidebars = {
|
||||
"proxy/clientside_auth",
|
||||
"proxy/request_headers",
|
||||
"proxy/response_headers",
|
||||
"proxy/forward_client_headers",
|
||||
"proxy/model_discovery",
|
||||
],
|
||||
},
|
||||
@ -487,7 +489,8 @@ const sidebars = {
|
||||
"providers/bytez",
|
||||
"providers/heroku",
|
||||
"providers/oci",
|
||||
"providers/datarobot",
|
||||
"providers/datarobot",
|
||||
"providers/ovhcloud",
|
||||
],
|
||||
},
|
||||
{
|
||||
|
||||
@ -1,4 +1,6 @@
|
||||
from typing import Literal, TypedDict
|
||||
from typing import Literal
|
||||
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
|
||||
class CustomAuthSettings(TypedDict):
|
||||
|
||||
9
litellm-js/spend-logs/package-lock.json
generated
@ -6,7 +6,7 @@
|
||||
"": {
|
||||
"dependencies": {
|
||||
"@hono/node-server": "^1.10.1",
|
||||
"hono": "^4.6.5"
|
||||
"hono": "^4.9.7"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^20.11.17",
|
||||
@ -463,9 +463,10 @@
|
||||
}
|
||||
},
|
||||
"node_modules/hono": {
|
||||
"version": "4.6.5",
|
||||
"resolved": "https://registry.npmjs.org/hono/-/hono-4.6.5.tgz",
|
||||
"integrity": "sha512-qsmN3V5fgtwdKARGLgwwHvcdLKursMd+YOt69eGpl1dUCJb8mCd7hZfyZnBYjxCegBG7qkJRQRUy2oO25yHcyQ==",
|
||||
"version": "4.9.7",
|
||||
"resolved": "https://registry.npmjs.org/hono/-/hono-4.9.7.tgz",
|
||||
"integrity": "sha512-t4Te6ERzIaC48W3x4hJmBwgNlLhmiEdEE5ViYb02ffw4ignHNHa5IBtPjmbKstmtKa8X6C35iWwK4HaqvrzG9w==",
|
||||
"license": "MIT",
|
||||
"engines": {
|
||||
"node": ">=16.9.0"
|
||||
}
|
||||
|
||||
@ -4,7 +4,7 @@
|
||||
},
|
||||
"dependencies": {
|
||||
"@hono/node-server": "^1.10.1",
|
||||
"hono": "^4.6.5"
|
||||
"hono": "^4.9.7"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@types/node": "^20.11.17",
|
||||
|
||||
@ -241,6 +241,7 @@ gradient_ai_api_key: Optional[str] = None
|
||||
nebius_key: Optional[str] = None
|
||||
heroku_key: Optional[str] = None
|
||||
cometapi_key: Optional[str] = None
|
||||
ovhcloud_key: Optional[str] = None
|
||||
common_cloud_provider_auth_params: dict = {
|
||||
"params": ["project", "region_name", "token"],
|
||||
"providers": ["vertex_ai", "bedrock", "watsonx", "azure", "vertex_ai_beta"],
|
||||
@ -520,6 +521,8 @@ cometapi_models: Set = set()
|
||||
oci_models: Set = set()
|
||||
vercel_ai_gateway_models: Set = set()
|
||||
volcengine_models: Set = set()
|
||||
ovhcloud_models: Set = set()
|
||||
ovhcloud_embedding_models: Set = set()
|
||||
|
||||
|
||||
def is_bedrock_pricing_only_model(key: str) -> bool:
|
||||
@ -734,6 +737,10 @@ def add_known_models():
|
||||
oci_models.add(key)
|
||||
elif value.get("litellm_provider") == "volcengine":
|
||||
volcengine_models.add(key)
|
||||
elif value.get("litellm_provider") == "ovhcloud":
|
||||
ovhcloud_models.add(key)
|
||||
elif value.get("litellm_provider") == "ovhcloud-embedding-models":
|
||||
ovhcloud_embedding_models.add(key)
|
||||
|
||||
|
||||
add_known_models()
|
||||
@ -828,6 +835,7 @@ model_list = list(
|
||||
| heroku_models
|
||||
| vercel_ai_gateway_models
|
||||
| volcengine_models
|
||||
| ovhcloud_models
|
||||
)
|
||||
|
||||
model_list_set = set(model_list)
|
||||
@ -909,6 +917,7 @@ models_by_provider: dict = {
|
||||
"cometapi": cometapi_models,
|
||||
"oci": oci_models,
|
||||
"volcengine": volcengine_models,
|
||||
"ovhcloud": ovhcloud_models | ovhcloud_embedding_models,
|
||||
}
|
||||
|
||||
# mapping for those models which have larger equivalents
|
||||
@ -943,6 +952,7 @@ all_embedding_models = (
|
||||
| fireworks_ai_embedding_models
|
||||
| nebius_embedding_models
|
||||
| sambanova_embedding_models
|
||||
| ovhcloud_embedding_models
|
||||
)
|
||||
|
||||
####### IMAGE GENERATION MODELS ###################
|
||||
@ -1255,6 +1265,8 @@ from .llms.morph.chat.transformation import MorphChatConfig
|
||||
from .llms.lambda_ai.chat.transformation import LambdaAIChatConfig
|
||||
from .llms.hyperbolic.chat.transformation import HyperbolicChatConfig
|
||||
from .llms.vercel_ai_gateway.chat.transformation import VercelAIGatewayConfig
|
||||
from .llms.ovhcloud.chat.transformation import OVHCloudChatConfig
|
||||
from .llms.ovhcloud.embedding.transformation import OVHCloudEmbeddingConfig
|
||||
from .main import * # type: ignore
|
||||
from .integrations import *
|
||||
from .llms.custom_httpx.async_client_cleanup import close_litellm_async_clients
|
||||
|
||||
@ -2,7 +2,9 @@
|
||||
Handler for transforming /chat/completions api requests to litellm.responses requests
|
||||
"""
|
||||
|
||||
from typing import TYPE_CHECKING, Any, Coroutine, TypedDict, Union
|
||||
from typing import TYPE_CHECKING, Any, Coroutine, Union
|
||||
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from litellm import CustomStreamWrapper, LiteLLMLoggingObj, ModelResponse
|
||||
|
||||
@ -313,6 +313,7 @@ LITELLM_CHAT_PROVIDERS = [
|
||||
"morph",
|
||||
"lambda_ai",
|
||||
"vercel_ai_gateway",
|
||||
"ovhcloud",
|
||||
]
|
||||
|
||||
LITELLM_EMBEDDING_PROVIDERS_SUPPORTING_INPUT_ARRAY_OF_TOKENS = [
|
||||
@ -1023,6 +1024,7 @@ SENTRY_DENYLIST = [
|
||||
"FIREWORKS_API_KEY",
|
||||
"FIREWORKS_AI_API_KEY",
|
||||
"FIREWORKSAI_API_KEY",
|
||||
"OVHCLOUD_API_KEY",
|
||||
# Database and Connection Strings
|
||||
"database_url",
|
||||
"redis_url",
|
||||
|
||||
@ -2,7 +2,9 @@
|
||||
Handler for transforming /chat/completions api requests to litellm.responses requests
|
||||
"""
|
||||
|
||||
from typing import TYPE_CHECKING, Optional, TypedDict, Union
|
||||
from typing import TYPE_CHECKING, Optional, Union
|
||||
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from litellm import LiteLLMLoggingObj
|
||||
|
||||
@ -31,7 +31,7 @@ class SoftBudgetAlert(BaseBudgetAlertType):
|
||||
return "Soft Budget Crossed: "
|
||||
|
||||
def get_id(self, user_info: CallInfo) -> str:
|
||||
return "default_id"
|
||||
return user_info.token or "default_id"
|
||||
|
||||
|
||||
class UserBudgetAlert(BaseBudgetAlertType):
|
||||
|
||||
@ -64,7 +64,7 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
asyncio.create_task(self.periodic_flush())
|
||||
self.flush_lock = asyncio.Lock()
|
||||
self.log_queue: List[LLMObsPayload] = []
|
||||
|
||||
|
||||
#########################################################
|
||||
# Handle datadog_llm_observability_params set as litellm.datadog_llm_observability_params
|
||||
#########################################################
|
||||
@ -83,22 +83,25 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
"""
|
||||
dict_datadog_llm_obs_params: Dict = {}
|
||||
if litellm.datadog_llm_observability_params is not None:
|
||||
if isinstance(litellm.datadog_llm_observability_params, DatadogLLMObsInitParams):
|
||||
dict_datadog_llm_obs_params = litellm.datadog_llm_observability_params.model_dump()
|
||||
if isinstance(
|
||||
litellm.datadog_llm_observability_params, DatadogLLMObsInitParams
|
||||
):
|
||||
dict_datadog_llm_obs_params = (
|
||||
litellm.datadog_llm_observability_params.model_dump()
|
||||
)
|
||||
elif isinstance(litellm.datadog_llm_observability_params, Dict):
|
||||
# only allow params that are of DatadogLLMObsInitParams
|
||||
dict_datadog_llm_obs_params = DatadogLLMObsInitParams(**litellm.datadog_llm_observability_params).model_dump()
|
||||
dict_datadog_llm_obs_params = DatadogLLMObsInitParams(
|
||||
**litellm.datadog_llm_observability_params
|
||||
).model_dump()
|
||||
return dict_datadog_llm_obs_params
|
||||
|
||||
|
||||
async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
|
||||
try:
|
||||
verbose_logger.debug(
|
||||
f"DataDogLLMObs: Logging success event for model {kwargs.get('model', 'unknown')}"
|
||||
)
|
||||
payload = self.create_llm_obs_payload(
|
||||
kwargs, start_time, end_time
|
||||
)
|
||||
payload = self.create_llm_obs_payload(kwargs, start_time, end_time)
|
||||
verbose_logger.debug(f"DataDogLLMObs: Payload: {payload}")
|
||||
self.log_queue.append(payload)
|
||||
|
||||
@ -108,15 +111,13 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
verbose_logger.exception(
|
||||
f"DataDogLLMObs: Error logging success event - {str(e)}"
|
||||
)
|
||||
|
||||
|
||||
async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
|
||||
try:
|
||||
verbose_logger.debug(
|
||||
f"DataDogLLMObs: Logging failure event for model {kwargs.get('model', 'unknown')}"
|
||||
)
|
||||
payload = self.create_llm_obs_payload(
|
||||
kwargs, start_time, end_time
|
||||
)
|
||||
payload = self.create_llm_obs_payload(kwargs, start_time, end_time)
|
||||
verbose_logger.debug(f"DataDogLLMObs: Payload: {payload}")
|
||||
self.log_queue.append(payload)
|
||||
|
||||
@ -184,7 +185,6 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
|
||||
messages = standard_logging_payload["messages"]
|
||||
messages = self._ensure_string_content(messages=messages)
|
||||
response_obj = standard_logging_payload.get("response")
|
||||
|
||||
metadata = kwargs.get("litellm_params", {}).get("metadata", {})
|
||||
|
||||
@ -193,10 +193,12 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
messages
|
||||
)
|
||||
)
|
||||
output_meta = OutputMeta(messages=self._get_response_messages(
|
||||
response_obj=response_obj,
|
||||
call_type=standard_logging_payload.get("call_type")
|
||||
))
|
||||
output_meta = OutputMeta(
|
||||
messages=self._get_response_messages(
|
||||
standard_logging_payload=standard_logging_payload,
|
||||
call_type=standard_logging_payload.get("call_type"),
|
||||
)
|
||||
)
|
||||
|
||||
error_info = self._assemble_error_info(standard_logging_payload)
|
||||
|
||||
@ -214,7 +216,9 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
output_tokens=float(standard_logging_payload.get("completion_tokens", 0)),
|
||||
total_tokens=float(standard_logging_payload.get("total_tokens", 0)),
|
||||
total_cost=float(standard_logging_payload.get("response_cost", 0)),
|
||||
time_to_first_token=self._get_time_to_first_token_seconds(standard_logging_payload),
|
||||
time_to_first_token=self._get_time_to_first_token_seconds(
|
||||
standard_logging_payload
|
||||
),
|
||||
)
|
||||
|
||||
payload: LLMObsPayload = LLMObsPayload(
|
||||
@ -251,27 +255,35 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
except Exception:
|
||||
pass
|
||||
return None
|
||||
|
||||
def _assemble_error_info(self, standard_logging_payload: StandardLoggingPayload) -> Optional[DDLLMObsError]:
|
||||
|
||||
def _assemble_error_info(
|
||||
self, standard_logging_payload: StandardLoggingPayload
|
||||
) -> Optional[DDLLMObsError]:
|
||||
"""
|
||||
Assemble error information for failure cases according to DD LLM Obs API spec
|
||||
"""
|
||||
# Handle error information for failure cases according to DD LLM Obs API spec
|
||||
error_info: Optional[DDLLMObsError] = None
|
||||
|
||||
|
||||
if standard_logging_payload.get("status") == "failure":
|
||||
# Try to get structured error information first
|
||||
error_information: Optional[StandardLoggingPayloadErrorInformation] = standard_logging_payload.get("error_information")
|
||||
|
||||
error_information: Optional[
|
||||
StandardLoggingPayloadErrorInformation
|
||||
] = standard_logging_payload.get("error_information")
|
||||
|
||||
if error_information:
|
||||
error_info = DDLLMObsError(
|
||||
message=error_information.get("error_message") or standard_logging_payload.get("error_str") or "Unknown error",
|
||||
message=error_information.get("error_message")
|
||||
or standard_logging_payload.get("error_str")
|
||||
or "Unknown error",
|
||||
type=error_information.get("error_class"),
|
||||
stack=error_information.get("traceback")
|
||||
stack=error_information.get("traceback"),
|
||||
)
|
||||
return error_info
|
||||
|
||||
def _get_time_to_first_token_seconds(self, standard_logging_payload: StandardLoggingPayload) -> float:
|
||||
def _get_time_to_first_token_seconds(
|
||||
self, standard_logging_payload: StandardLoggingPayload
|
||||
) -> float:
|
||||
"""
|
||||
Get the time to first token in seconds
|
||||
|
||||
@ -280,7 +292,9 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
For non streaming calls, CompletionStartTime is time we get the response back
|
||||
"""
|
||||
start_time: Optional[float] = standard_logging_payload.get("startTime")
|
||||
completion_start_time: Optional[float] = standard_logging_payload.get("completionStartTime")
|
||||
completion_start_time: Optional[float] = standard_logging_payload.get(
|
||||
"completionStartTime"
|
||||
)
|
||||
end_time: Optional[float] = standard_logging_payload.get("endTime")
|
||||
|
||||
if completion_start_time is not None and start_time is not None:
|
||||
@ -290,19 +304,43 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
else:
|
||||
return 0.0
|
||||
|
||||
|
||||
def _get_response_messages(
|
||||
self, response_obj: Any, call_type: Optional[str]
|
||||
self, standard_logging_payload: StandardLoggingPayload, call_type: Optional[str]
|
||||
) -> List[Any]:
|
||||
"""
|
||||
Get the messages from the response object
|
||||
|
||||
for now this handles logging /chat/completions responses
|
||||
"""
|
||||
|
||||
response_obj = standard_logging_payload.get("response")
|
||||
if response_obj is None:
|
||||
return []
|
||||
|
||||
if call_type in [CallTypes.completion.value, CallTypes.acompletion.value]:
|
||||
|
||||
# edge case: handle response_obj is a string representation of a dict
|
||||
if isinstance(response_obj, str):
|
||||
try:
|
||||
import ast
|
||||
|
||||
response_obj = ast.literal_eval(response_obj)
|
||||
except (ValueError, SyntaxError):
|
||||
try:
|
||||
# fallback to json parsing
|
||||
response_obj = json.loads(str(response_obj))
|
||||
except json.JSONDecodeError:
|
||||
return []
|
||||
|
||||
if call_type in [
|
||||
CallTypes.completion.value,
|
||||
CallTypes.acompletion.value,
|
||||
CallTypes.text_completion.value,
|
||||
CallTypes.atext_completion.value,
|
||||
CallTypes.generate_content.value,
|
||||
CallTypes.agenerate_content.value,
|
||||
CallTypes.generate_content_stream.value,
|
||||
CallTypes.agenerate_content_stream.value,
|
||||
CallTypes.anthropic_messages.value,
|
||||
]:
|
||||
try:
|
||||
# Safely extract message from response_obj, handle failure cases
|
||||
if isinstance(response_obj, dict) and "choices" in response_obj:
|
||||
@ -315,102 +353,104 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
return []
|
||||
return []
|
||||
|
||||
def _get_datadog_span_kind(self, call_type: Optional[str]) -> Literal["llm", "tool", "task", "embedding", "retrieval"]:
|
||||
def _get_datadog_span_kind(
|
||||
self, call_type: Optional[str]
|
||||
) -> Literal["llm", "tool", "task", "embedding", "retrieval"]:
|
||||
"""
|
||||
Map liteLLM call_type to appropriate DataDog LLM Observability span kind.
|
||||
|
||||
|
||||
Available DataDog span kinds: "llm", "tool", "task", "embedding", "retrieval"
|
||||
"""
|
||||
if call_type is None:
|
||||
return "llm"
|
||||
|
||||
|
||||
# Embedding operations
|
||||
if call_type in [CallTypes.embedding.value, CallTypes.aembedding.value]:
|
||||
return "embedding"
|
||||
|
||||
# LLM completion operations
|
||||
|
||||
# LLM completion operations
|
||||
if call_type in [
|
||||
CallTypes.completion.value,
|
||||
CallTypes.completion.value,
|
||||
CallTypes.acompletion.value,
|
||||
CallTypes.text_completion.value,
|
||||
CallTypes.text_completion.value,
|
||||
CallTypes.atext_completion.value,
|
||||
CallTypes.generate_content.value,
|
||||
CallTypes.generate_content.value,
|
||||
CallTypes.agenerate_content.value,
|
||||
CallTypes.generate_content_stream.value,
|
||||
CallTypes.generate_content_stream.value,
|
||||
CallTypes.agenerate_content_stream.value,
|
||||
CallTypes.anthropic_messages.value
|
||||
CallTypes.anthropic_messages.value,
|
||||
]:
|
||||
return "llm"
|
||||
|
||||
|
||||
# Tool operations
|
||||
if call_type in [CallTypes.call_mcp_tool.value]:
|
||||
return "tool"
|
||||
|
||||
|
||||
# Retrieval operations
|
||||
if call_type in [
|
||||
CallTypes.get_assistants.value,
|
||||
CallTypes.get_assistants.value,
|
||||
CallTypes.aget_assistants.value,
|
||||
CallTypes.get_thread.value,
|
||||
CallTypes.get_thread.value,
|
||||
CallTypes.aget_thread.value,
|
||||
CallTypes.get_messages.value,
|
||||
CallTypes.get_messages.value,
|
||||
CallTypes.aget_messages.value,
|
||||
CallTypes.afile_retrieve.value,
|
||||
CallTypes.afile_retrieve.value,
|
||||
CallTypes.file_retrieve.value,
|
||||
CallTypes.afile_list.value,
|
||||
CallTypes.afile_list.value,
|
||||
CallTypes.file_list.value,
|
||||
CallTypes.afile_content.value,
|
||||
CallTypes.afile_content.value,
|
||||
CallTypes.file_content.value,
|
||||
CallTypes.retrieve_batch.value,
|
||||
CallTypes.retrieve_batch.value,
|
||||
CallTypes.aretrieve_batch.value,
|
||||
CallTypes.retrieve_fine_tuning_job.value,
|
||||
CallTypes.retrieve_fine_tuning_job.value,
|
||||
CallTypes.aretrieve_fine_tuning_job.value,
|
||||
CallTypes.responses.value,
|
||||
CallTypes.responses.value,
|
||||
CallTypes.aresponses.value,
|
||||
CallTypes.alist_input_items.value
|
||||
CallTypes.alist_input_items.value,
|
||||
]:
|
||||
return "retrieval"
|
||||
|
||||
|
||||
# Task operations (batch, fine-tuning, file operations, etc.)
|
||||
if call_type in [
|
||||
CallTypes.create_batch.value,
|
||||
CallTypes.create_batch.value,
|
||||
CallTypes.acreate_batch.value,
|
||||
CallTypes.create_fine_tuning_job.value,
|
||||
CallTypes.create_fine_tuning_job.value,
|
||||
CallTypes.acreate_fine_tuning_job.value,
|
||||
CallTypes.cancel_fine_tuning_job.value,
|
||||
CallTypes.cancel_fine_tuning_job.value,
|
||||
CallTypes.acancel_fine_tuning_job.value,
|
||||
CallTypes.list_fine_tuning_jobs.value,
|
||||
CallTypes.list_fine_tuning_jobs.value,
|
||||
CallTypes.alist_fine_tuning_jobs.value,
|
||||
CallTypes.create_assistants.value,
|
||||
CallTypes.create_assistants.value,
|
||||
CallTypes.acreate_assistants.value,
|
||||
CallTypes.delete_assistant.value,
|
||||
CallTypes.delete_assistant.value,
|
||||
CallTypes.adelete_assistant.value,
|
||||
CallTypes.create_thread.value,
|
||||
CallTypes.create_thread.value,
|
||||
CallTypes.acreate_thread.value,
|
||||
CallTypes.add_message.value,
|
||||
CallTypes.add_message.value,
|
||||
CallTypes.a_add_message.value,
|
||||
CallTypes.run_thread.value,
|
||||
CallTypes.run_thread.value,
|
||||
CallTypes.arun_thread.value,
|
||||
CallTypes.run_thread_stream.value,
|
||||
CallTypes.run_thread_stream.value,
|
||||
CallTypes.arun_thread_stream.value,
|
||||
CallTypes.file_delete.value,
|
||||
CallTypes.file_delete.value,
|
||||
CallTypes.afile_delete.value,
|
||||
CallTypes.create_file.value,
|
||||
CallTypes.create_file.value,
|
||||
CallTypes.acreate_file.value,
|
||||
CallTypes.image_generation.value,
|
||||
CallTypes.image_generation.value,
|
||||
CallTypes.aimage_generation.value,
|
||||
CallTypes.image_edit.value,
|
||||
CallTypes.image_edit.value,
|
||||
CallTypes.aimage_edit.value,
|
||||
CallTypes.moderation.value,
|
||||
CallTypes.moderation.value,
|
||||
CallTypes.amoderation.value,
|
||||
CallTypes.transcription.value,
|
||||
CallTypes.transcription.value,
|
||||
CallTypes.atranscription.value,
|
||||
CallTypes.speech.value,
|
||||
CallTypes.speech.value,
|
||||
CallTypes.aspeech.value,
|
||||
CallTypes.rerank.value,
|
||||
CallTypes.arerank.value
|
||||
CallTypes.rerank.value,
|
||||
CallTypes.arerank.value,
|
||||
]:
|
||||
return "task"
|
||||
|
||||
|
||||
# Default fallback for unknown or passthrough operations
|
||||
return "llm"
|
||||
|
||||
@ -443,7 +483,9 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
"cache_hit": standard_logging_payload.get("cache_hit", "unknown"),
|
||||
"cache_key": standard_logging_payload.get("cache_key", "unknown"),
|
||||
"saved_cache_cost": standard_logging_payload.get("saved_cache_cost", 0),
|
||||
"guardrail_information": standard_logging_payload.get("guardrail_information", None),
|
||||
"guardrail_information": standard_logging_payload.get(
|
||||
"guardrail_information", None
|
||||
),
|
||||
}
|
||||
|
||||
#########################################################
|
||||
@ -452,22 +494,32 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
latency_metrics = self._get_latency_metrics(standard_logging_payload)
|
||||
_metadata.update({"latency_metrics": dict(latency_metrics)})
|
||||
|
||||
## extract tool calls and add to metadata
|
||||
tool_call_metadata = self._extract_tool_call_metadata(standard_logging_payload)
|
||||
_metadata.update(tool_call_metadata)
|
||||
|
||||
_standard_logging_metadata: dict = (
|
||||
dict(standard_logging_payload.get("metadata", {})) or {}
|
||||
)
|
||||
_metadata.update(_standard_logging_metadata)
|
||||
return _metadata
|
||||
|
||||
def _get_latency_metrics(self, standard_logging_payload: StandardLoggingPayload) -> DDLLMObsLatencyMetrics:
|
||||
def _get_latency_metrics(
|
||||
self, standard_logging_payload: StandardLoggingPayload
|
||||
) -> DDLLMObsLatencyMetrics:
|
||||
"""
|
||||
Get the latency metrics from the standard logging payload
|
||||
"""
|
||||
latency_metrics: DDLLMObsLatencyMetrics = DDLLMObsLatencyMetrics()
|
||||
# Add latency metrics to metadata
|
||||
# Time to first token (convert from seconds to milliseconds for consistency)
|
||||
time_to_first_token_seconds = self._get_time_to_first_token_seconds(standard_logging_payload)
|
||||
time_to_first_token_seconds = self._get_time_to_first_token_seconds(
|
||||
standard_logging_payload
|
||||
)
|
||||
if time_to_first_token_seconds > 0:
|
||||
latency_metrics["time_to_first_token_ms"] = time_to_first_token_seconds * 1000
|
||||
latency_metrics["time_to_first_token_ms"] = (
|
||||
time_to_first_token_seconds * 1000
|
||||
)
|
||||
|
||||
# LiteLLM overhead time
|
||||
hidden_params = standard_logging_payload.get("hidden_params", {})
|
||||
@ -476,11 +528,143 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
|
||||
latency_metrics["litellm_overhead_time_ms"] = litellm_overhead_ms
|
||||
|
||||
# Guardrail overhead latency
|
||||
guardrail_info: Optional[StandardLoggingGuardrailInformation] = standard_logging_payload.get("guardrail_information")
|
||||
guardrail_info: Optional[
|
||||
StandardLoggingGuardrailInformation
|
||||
] = standard_logging_payload.get("guardrail_information")
|
||||
if guardrail_info is not None:
|
||||
_guardrail_duration_seconds: Optional[float] = guardrail_info.get("duration")
|
||||
_guardrail_duration_seconds: Optional[float] = guardrail_info.get(
|
||||
"duration"
|
||||
)
|
||||
if _guardrail_duration_seconds is not None:
|
||||
# Convert from seconds to milliseconds for consistency
|
||||
latency_metrics["guardrail_overhead_time_ms"] = _guardrail_duration_seconds * 1000
|
||||
|
||||
return latency_metrics
|
||||
latency_metrics["guardrail_overhead_time_ms"] = (
|
||||
_guardrail_duration_seconds * 1000
|
||||
)
|
||||
|
||||
return latency_metrics
|
||||
|
||||
def _process_input_messages_preserving_tool_calls(
|
||||
self, messages: List[Any]
|
||||
) -> List[Dict[str, Any]]:
|
||||
"""
|
||||
Process input messages while preserving tool_calls and tool message types.
|
||||
|
||||
This bypasses the lossy string conversion when tool calls are present,
|
||||
allowing complex nested tool_calls objects to be preserved for Datadog.
|
||||
"""
|
||||
processed = []
|
||||
for msg in messages:
|
||||
if isinstance(msg, dict):
|
||||
# Preserve messages with tool_calls or tool role as-is
|
||||
if "tool_calls" in msg or msg.get("role") == "tool":
|
||||
processed.append(msg)
|
||||
else:
|
||||
# For regular messages, still apply string conversion
|
||||
converted = (
|
||||
handle_any_messages_to_chat_completion_str_messages_conversion(
|
||||
[msg]
|
||||
)
|
||||
)
|
||||
processed.extend(converted)
|
||||
else:
|
||||
# For non-dict messages, apply string conversion
|
||||
converted = (
|
||||
handle_any_messages_to_chat_completion_str_messages_conversion(
|
||||
[msg]
|
||||
)
|
||||
)
|
||||
processed.extend(converted)
|
||||
return processed
|
||||
|
||||
@staticmethod
|
||||
def _tool_calls_kv_pair(tool_calls: List[Dict[str, Any]]) -> Dict[str, Any]:
|
||||
"""
|
||||
Extract tool call information into key-value pairs for Datadog metadata.
|
||||
|
||||
Similar to OpenTelemetry's implementation but adapted for Datadog's format.
|
||||
"""
|
||||
kv_pairs: Dict[str, Any] = {}
|
||||
for idx, tool_call in enumerate(tool_calls):
|
||||
try:
|
||||
# Extract tool call ID
|
||||
tool_id = tool_call.get("id")
|
||||
if tool_id:
|
||||
kv_pairs[f"tool_calls.{idx}.id"] = tool_id
|
||||
|
||||
# Extract tool call type
|
||||
tool_type = tool_call.get("type")
|
||||
if tool_type:
|
||||
kv_pairs[f"tool_calls.{idx}.type"] = tool_type
|
||||
|
||||
# Extract function information
|
||||
function = tool_call.get("function")
|
||||
if function:
|
||||
function_name = function.get("name")
|
||||
if function_name:
|
||||
kv_pairs[f"tool_calls.{idx}.function.name"] = function_name
|
||||
|
||||
function_arguments = function.get("arguments")
|
||||
if function_arguments:
|
||||
# Store arguments as JSON string for Datadog
|
||||
if isinstance(function_arguments, str):
|
||||
kv_pairs[
|
||||
f"tool_calls.{idx}.function.arguments"
|
||||
] = function_arguments
|
||||
else:
|
||||
import json
|
||||
|
||||
kv_pairs[
|
||||
f"tool_calls.{idx}.function.arguments"
|
||||
] = json.dumps(function_arguments)
|
||||
except (KeyError, TypeError, ValueError) as e:
|
||||
verbose_logger.debug(
|
||||
f"DataDogLLMObs: Error processing tool call {idx}: {str(e)}"
|
||||
)
|
||||
continue
|
||||
|
||||
return kv_pairs
|
||||
|
||||
def _extract_tool_call_metadata(
|
||||
self, standard_logging_payload: StandardLoggingPayload
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Extract tool call information from both input messages and response for Datadog metadata.
|
||||
"""
|
||||
tool_call_metadata: Dict[str, Any] = {}
|
||||
|
||||
try:
|
||||
# Extract tool calls from input messages
|
||||
messages = standard_logging_payload.get("messages", [])
|
||||
if messages and isinstance(messages, list):
|
||||
for message in messages:
|
||||
if isinstance(message, dict) and "tool_calls" in message:
|
||||
tool_calls = message.get("tool_calls")
|
||||
if tool_calls:
|
||||
input_tool_calls_kv = self._tool_calls_kv_pair(tool_calls)
|
||||
# Prefix with "input_" to distinguish from response tool calls
|
||||
for key, value in input_tool_calls_kv.items():
|
||||
tool_call_metadata[f"input_{key}"] = value
|
||||
|
||||
# Extract tool calls from response
|
||||
response_obj = standard_logging_payload.get("response")
|
||||
if response_obj and isinstance(response_obj, dict):
|
||||
choices = response_obj.get("choices", [])
|
||||
for choice in choices:
|
||||
if isinstance(choice, dict):
|
||||
message = choice.get("message")
|
||||
if message and isinstance(message, dict):
|
||||
tool_calls = message.get("tool_calls")
|
||||
if tool_calls:
|
||||
response_tool_calls_kv = self._tool_calls_kv_pair(
|
||||
tool_calls
|
||||
)
|
||||
# Prefix with "output_" to distinguish from input tool calls
|
||||
for key, value in response_tool_calls_kv.items():
|
||||
tool_call_metadata[f"output_{key}"] = value
|
||||
|
||||
except Exception as e:
|
||||
verbose_logger.debug(
|
||||
f"DataDogLLMObs: Error extracting tool call metadata: {str(e)}"
|
||||
)
|
||||
|
||||
return tool_call_metadata
|
||||
|
||||
@ -4,9 +4,10 @@ Humanloop integration
|
||||
https://humanloop.com/
|
||||
"""
|
||||
|
||||
from typing import Any, Dict, List, Optional, Tuple, TypedDict, Union, cast
|
||||
from typing import Any, Dict, List, Optional, Tuple, Union, cast
|
||||
|
||||
import httpx
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
import litellm
|
||||
from litellm.caching import DualCache
|
||||
|
||||
@ -1,5 +1,7 @@
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Any, Dict, List, Optional, Tuple, TypedDict
|
||||
from typing import Any, Dict, List, Optional, Tuple
|
||||
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
from litellm.types.llms.openai import AllMessageValues
|
||||
from litellm.types.utils import StandardCallbackDynamicParams
|
||||
|
||||
@ -374,6 +374,8 @@ def get_llm_provider( # noqa: PLR0915
|
||||
custom_llm_provider = "oci"
|
||||
elif model.startswith("compactifai/"):
|
||||
custom_llm_provider = "compactifai"
|
||||
elif model.startswith("ovhcloud/"):
|
||||
custom_llm_provider = "ovhcloud"
|
||||
if not custom_llm_provider:
|
||||
if litellm.suppress_debug_info is False:
|
||||
print() # noqa
|
||||
|
||||
@ -1,6 +1,5 @@
|
||||
import asyncio
|
||||
import json
|
||||
import re
|
||||
import time
|
||||
import traceback
|
||||
import uuid
|
||||
@ -9,6 +8,9 @@ from typing import Dict, Iterable, List, Literal, Optional, Tuple, Union
|
||||
import litellm
|
||||
from litellm._logging import verbose_logger
|
||||
from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
|
||||
from litellm.litellm_core_utils.prompt_templates.common_utils import (
|
||||
_extract_reasoning_content,
|
||||
)
|
||||
from litellm.types.llms.databricks import DatabricksTool
|
||||
from litellm.types.llms.openai import (
|
||||
ChatCompletionThinkingBlock,
|
||||
@ -274,49 +276,6 @@ def _handle_invalid_parallel_tool_calls(
|
||||
return tool_calls
|
||||
|
||||
|
||||
def _parse_content_for_reasoning(
|
||||
message_text: Optional[str],
|
||||
) -> Tuple[Optional[str], Optional[str]]:
|
||||
"""
|
||||
Parse the content for reasoning
|
||||
|
||||
Returns:
|
||||
- reasoning_content: The content of the reasoning
|
||||
- content: The content of the message
|
||||
"""
|
||||
if not message_text:
|
||||
return None, message_text
|
||||
|
||||
reasoning_match = re.match(
|
||||
r"<(?:think|thinking)>(.*?)</(?:think|thinking)>(.*)", message_text, re.DOTALL
|
||||
)
|
||||
|
||||
if reasoning_match:
|
||||
return reasoning_match.group(1), reasoning_match.group(2)
|
||||
|
||||
return None, message_text
|
||||
|
||||
|
||||
def _extract_reasoning_content(message: dict) -> Tuple[Optional[str], Optional[str]]:
|
||||
"""
|
||||
Extract reasoning content and main content from a message.
|
||||
|
||||
Args:
|
||||
message (dict): The message dictionary that may contain reasoning_content
|
||||
|
||||
Returns:
|
||||
tuple[Optional[str], Optional[str]]: A tuple of (reasoning_content, content)
|
||||
"""
|
||||
message_content = message.get("content")
|
||||
if "reasoning_content" in message:
|
||||
return message["reasoning_content"], message["content"]
|
||||
elif "reasoning" in message:
|
||||
return message["reasoning"], message["content"]
|
||||
elif isinstance(message_content, str):
|
||||
return _parse_content_for_reasoning(message_content)
|
||||
return None, message_content
|
||||
|
||||
|
||||
class LiteLLMResponseObjectHandler:
|
||||
@staticmethod
|
||||
def convert_to_image_response(
|
||||
|
||||
@ -1,7 +1,9 @@
|
||||
import asyncio
|
||||
import contextlib
|
||||
import contextvars
|
||||
from typing import Coroutine, Optional, TypedDict
|
||||
from typing import Coroutine, Optional
|
||||
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
from litellm._logging import verbose_logger
|
||||
|
||||
|
||||
@ -14,6 +14,7 @@ from typing import (
|
||||
Literal,
|
||||
Mapping,
|
||||
Optional,
|
||||
Tuple,
|
||||
Union,
|
||||
cast,
|
||||
)
|
||||
@ -869,3 +870,63 @@ def convert_prefix_message_to_non_prefix_messages(
|
||||
else:
|
||||
new_messages.append(message)
|
||||
return new_messages
|
||||
|
||||
|
||||
def _extract_reasoning_content(message: dict) -> Tuple[Optional[str], Optional[str]]:
|
||||
"""
|
||||
Extract reasoning content and main content from a message.
|
||||
|
||||
Args:
|
||||
message (dict): The message dictionary that may contain reasoning_content
|
||||
|
||||
Returns:
|
||||
tuple[Optional[str], Optional[str]]: A tuple of (reasoning_content, content)
|
||||
"""
|
||||
message_content = message.get("content")
|
||||
if "reasoning_content" in message:
|
||||
return message["reasoning_content"], message["content"]
|
||||
elif "reasoning" in message:
|
||||
return message["reasoning"], message["content"]
|
||||
elif isinstance(message_content, str):
|
||||
return _parse_content_for_reasoning(message_content)
|
||||
return None, message_content
|
||||
|
||||
|
||||
def _parse_content_for_reasoning(
|
||||
message_text: Optional[str],
|
||||
) -> Tuple[Optional[str], Optional[str]]:
|
||||
"""
|
||||
Parse the content for reasoning
|
||||
|
||||
Returns:
|
||||
- reasoning_content: The content of the reasoning
|
||||
- content: The content of the message
|
||||
"""
|
||||
if not message_text:
|
||||
return None, message_text
|
||||
|
||||
reasoning_match = re.match(
|
||||
r"<(?:think|thinking)>(.*?)</(?:think|thinking)>(.*)", message_text, re.DOTALL
|
||||
)
|
||||
|
||||
if reasoning_match:
|
||||
return reasoning_match.group(1), reasoning_match.group(2)
|
||||
|
||||
return None, message_text
|
||||
|
||||
|
||||
def extract_images_from_message(message: AllMessageValues) -> List[str]:
|
||||
"""
|
||||
Extract images from a message
|
||||
"""
|
||||
images = []
|
||||
message_content = message.get("content")
|
||||
if isinstance(message_content, list):
|
||||
for m in message_content:
|
||||
image_url = m.get("image_url")
|
||||
if image_url:
|
||||
if isinstance(image_url, str):
|
||||
images.append(image_url)
|
||||
elif isinstance(image_url, dict) and "url" in image_url:
|
||||
images.append(image_url["url"])
|
||||
return images
|
||||
|
||||
@ -1024,6 +1024,8 @@ class CustomStreamWrapper:
|
||||
return
|
||||
|
||||
def chunk_creator(self, chunk: Any): # type: ignore # noqa: PLR0915
|
||||
if hasattr(chunk, 'id'):
|
||||
self.response_id = chunk.id
|
||||
model_response = self.model_response_creator()
|
||||
response_obj: Dict[str, Any] = {}
|
||||
try:
|
||||
|
||||
@ -1,6 +1,6 @@
|
||||
from abc import ABC, abstractmethod
|
||||
from dataclasses import dataclass
|
||||
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
|
||||
from typing import TYPE_CHECKING, Any, List, Optional, Union
|
||||
|
||||
import httpx
|
||||
|
||||
@ -23,12 +23,13 @@ else:
|
||||
class AudioTranscriptionRequestData:
|
||||
"""
|
||||
Structured data for audio transcription requests.
|
||||
|
||||
|
||||
Attributes:
|
||||
data: The request data (form data for multipart, json data for regular requests)
|
||||
files: Optional files dict for multipart form data
|
||||
content_type: Optional content type override
|
||||
"""
|
||||
|
||||
data: Union[dict, bytes]
|
||||
files: Optional[dict] = None
|
||||
content_type: Optional[str] = None
|
||||
@ -66,13 +67,11 @@ class BaseAudioTranscriptionConfig(BaseConfig, ABC):
|
||||
audio_file: FileTypes,
|
||||
optional_params: dict,
|
||||
litellm_params: dict,
|
||||
) -> Union[AudioTranscriptionRequestData, Dict]:
|
||||
) -> AudioTranscriptionRequestData:
|
||||
raise NotImplementedError(
|
||||
"AudioTranscriptionConfig needs a request transformation for audio transcription models"
|
||||
)
|
||||
|
||||
|
||||
|
||||
def transform_audio_transcription_response(
|
||||
self,
|
||||
raw_response: httpx.Response,
|
||||
@ -110,7 +109,6 @@ class BaseAudioTranscriptionConfig(BaseConfig, ABC):
|
||||
raise NotImplementedError(
|
||||
"AudioTranscriptionConfig does not need a response transformation for audio transcription models"
|
||||
)
|
||||
|
||||
|
||||
def get_provider_specific_params(
|
||||
self,
|
||||
@ -141,7 +139,7 @@ class BaseAudioTranscriptionConfig(BaseConfig, ABC):
|
||||
provider_specific_params[key] = value
|
||||
|
||||
return provider_specific_params
|
||||
|
||||
|
||||
def _should_exclude_param(
|
||||
self,
|
||||
param_name: str,
|
||||
|
||||
@ -14,7 +14,7 @@ from litellm._logging import verbose_logger
|
||||
from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
|
||||
from litellm.litellm_core_utils.core_helpers import map_finish_reason
|
||||
from litellm.litellm_core_utils.litellm_logging import Logging
|
||||
from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
|
||||
from litellm.litellm_core_utils.prompt_templates.common_utils import (
|
||||
_parse_content_for_reasoning,
|
||||
)
|
||||
from litellm.litellm_core_utils.prompt_templates.factory import (
|
||||
@ -397,7 +397,11 @@ class AmazonConverseConfig(BaseConfig):
|
||||
for param, value in non_default_params.items():
|
||||
if param == "response_format" and isinstance(value, dict):
|
||||
optional_params = self._translate_response_format_param(
|
||||
value=value, model=model, optional_params=optional_params, non_default_params=non_default_params, is_thinking_enabled=is_thinking_enabled
|
||||
value=value,
|
||||
model=model,
|
||||
optional_params=optional_params,
|
||||
non_default_params=non_default_params,
|
||||
is_thinking_enabled=is_thinking_enabled,
|
||||
)
|
||||
if param == "max_tokens" or param == "max_completion_tokens":
|
||||
optional_params["maxTokens"] = value
|
||||
@ -446,11 +450,11 @@ class AmazonConverseConfig(BaseConfig):
|
||||
)
|
||||
|
||||
return optional_params
|
||||
|
||||
|
||||
def _translate_response_format_param(
|
||||
self,
|
||||
value: dict,
|
||||
model: str,
|
||||
self,
|
||||
value: dict,
|
||||
model: str,
|
||||
optional_params: dict,
|
||||
non_default_params: dict,
|
||||
is_thinking_enabled: bool,
|
||||
@ -504,7 +508,7 @@ class AmazonConverseConfig(BaseConfig):
|
||||
optional_params["json_mode"] = True
|
||||
if non_default_params.get("stream", False) is True:
|
||||
optional_params["fake_stream"] = True
|
||||
|
||||
|
||||
return optional_params
|
||||
|
||||
def update_optional_params_with_thinking_tokens(
|
||||
|
||||
@ -3,7 +3,7 @@ from typing import Any, List, Optional, cast
|
||||
from httpx import Response
|
||||
|
||||
from litellm import verbose_logger
|
||||
from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
|
||||
from litellm.litellm_core_utils.prompt_templates.common_utils import (
|
||||
_parse_content_for_reasoning,
|
||||
)
|
||||
from litellm.llms.base_llm.base_model_iterator import BaseModelResponseIterator
|
||||
|
||||
@ -118,7 +118,6 @@ class BaseLLMHTTPHandler:
|
||||
response: Optional[httpx.Response] = None
|
||||
for i in range(max(max_retry_on_unprocessable_entity_error, 1)):
|
||||
try:
|
||||
|
||||
response = await async_httpx_client.post(
|
||||
url=api_base,
|
||||
headers=headers,
|
||||
@ -2221,7 +2220,9 @@ class BaseLLMHTTPHandler:
|
||||
|
||||
if isinstance(transformed_request, dict) and "method" in transformed_request:
|
||||
# Handle pre-signed requests (e.g., from Bedrock S3 uploads)
|
||||
upload_response = getattr(sync_httpx_client, transformed_request["method"].lower())(
|
||||
upload_response = getattr(
|
||||
sync_httpx_client, transformed_request["method"].lower()
|
||||
)(
|
||||
url=transformed_request["url"],
|
||||
headers=transformed_request["headers"],
|
||||
data=transformed_request["data"],
|
||||
@ -2233,8 +2234,8 @@ class BaseLLMHTTPHandler:
|
||||
# Handle traditional file uploads
|
||||
# Ensure transformed_request is a string for httpx compatibility
|
||||
if isinstance(transformed_request, bytes):
|
||||
transformed_request = transformed_request.decode('utf-8')
|
||||
|
||||
transformed_request = transformed_request.decode("utf-8")
|
||||
|
||||
# Use the HTTP method specified by the provider config
|
||||
http_method = provider_config.file_upload_http_method.upper()
|
||||
if http_method == "PUT":
|
||||
@ -2314,7 +2315,7 @@ class BaseLLMHTTPHandler:
|
||||
)
|
||||
else:
|
||||
async_httpx_client = client
|
||||
|
||||
|
||||
#########################################################
|
||||
# Debug Logging
|
||||
#########################################################
|
||||
@ -2330,7 +2331,9 @@ class BaseLLMHTTPHandler:
|
||||
|
||||
if isinstance(transformed_request, dict) and "method" in transformed_request:
|
||||
# Handle pre-signed requests (e.g., from Bedrock S3 uploads)
|
||||
upload_response = await getattr(async_httpx_client, transformed_request["method"].lower())(
|
||||
upload_response = await getattr(
|
||||
async_httpx_client, transformed_request["method"].lower()
|
||||
)(
|
||||
url=transformed_request["url"],
|
||||
headers=transformed_request["headers"],
|
||||
data=transformed_request["data"],
|
||||
@ -2342,8 +2345,8 @@ class BaseLLMHTTPHandler:
|
||||
# Handle traditional file uploads
|
||||
# Ensure transformed_request is a string for httpx compatibility
|
||||
if isinstance(transformed_request, bytes):
|
||||
transformed_request = transformed_request.decode('utf-8')
|
||||
|
||||
transformed_request = transformed_request.decode("utf-8")
|
||||
|
||||
# Use the HTTP method specified by the provider config
|
||||
http_method = provider_config.file_upload_http_method.upper()
|
||||
if http_method == "PUT":
|
||||
@ -2468,9 +2471,14 @@ class BaseLLMHTTPHandler:
|
||||
sync_httpx_client = client
|
||||
|
||||
try:
|
||||
if isinstance(transformed_request, dict) and "method" in transformed_request:
|
||||
if (
|
||||
isinstance(transformed_request, dict)
|
||||
and "method" in transformed_request
|
||||
):
|
||||
# Handle pre-signed requests (e.g., from Bedrock with AWS auth)
|
||||
batch_response = getattr(sync_httpx_client, transformed_request["method"].lower())(
|
||||
batch_response = getattr(
|
||||
sync_httpx_client, transformed_request["method"].lower()
|
||||
)(
|
||||
url=transformed_request["url"],
|
||||
headers=transformed_request["headers"],
|
||||
data=transformed_request["data"],
|
||||
@ -2500,8 +2508,11 @@ class BaseLLMHTTPHandler:
|
||||
)
|
||||
|
||||
# Store original request for response transformation
|
||||
litellm_params_with_request = {**litellm_params, "original_batch_request": create_batch_data}
|
||||
|
||||
litellm_params_with_request = {
|
||||
**litellm_params,
|
||||
"original_batch_request": create_batch_data,
|
||||
}
|
||||
|
||||
return provider_config.transform_create_batch_response(
|
||||
model=model,
|
||||
raw_response=batch_response,
|
||||
@ -2531,7 +2542,7 @@ class BaseLLMHTTPHandler:
|
||||
)
|
||||
else:
|
||||
async_httpx_client = client
|
||||
|
||||
|
||||
#########################################################
|
||||
# Debug Logging
|
||||
#########################################################
|
||||
@ -2546,9 +2557,14 @@ class BaseLLMHTTPHandler:
|
||||
)
|
||||
|
||||
try:
|
||||
if isinstance(transformed_request, dict) and "method" in transformed_request:
|
||||
if (
|
||||
isinstance(transformed_request, dict)
|
||||
and "method" in transformed_request
|
||||
):
|
||||
# Handle pre-signed requests (e.g., from Bedrock with AWS auth)
|
||||
batch_response = await getattr(async_httpx_client, transformed_request["method"].lower())(
|
||||
batch_response = await getattr(
|
||||
async_httpx_client, transformed_request["method"].lower()
|
||||
)(
|
||||
url=transformed_request["url"],
|
||||
headers=transformed_request["headers"],
|
||||
data=transformed_request["data"],
|
||||
@ -2578,8 +2594,11 @@ class BaseLLMHTTPHandler:
|
||||
)
|
||||
|
||||
# Store original request for response transformation (for async version)
|
||||
litellm_params_with_request = {**litellm_params, "original_batch_request": create_batch_data or {}}
|
||||
|
||||
litellm_params_with_request = {
|
||||
**litellm_params,
|
||||
"original_batch_request": create_batch_data or {},
|
||||
}
|
||||
|
||||
return provider_config.transform_create_batch_response(
|
||||
model=model,
|
||||
raw_response=batch_response,
|
||||
|
||||
@ -1,10 +1,13 @@
|
||||
from typing import List, Optional
|
||||
from typing import List, Optional, cast
|
||||
|
||||
from litellm.litellm_core_utils.prompt_templates.factory import (
|
||||
convert_generic_image_chunk_to_openai_image_obj,
|
||||
convert_to_anthropic_image_obj,
|
||||
)
|
||||
from litellm.types.llms.openai import AllMessageValues
|
||||
from litellm.litellm_core_utils.prompt_templates.image_handling import (
|
||||
convert_url_to_base64,
|
||||
)
|
||||
from litellm.types.llms.openai import AllMessageValues, ChatCompletionFileObject
|
||||
from litellm.types.llms.vertex_ai import ContentType, PartType
|
||||
from litellm.utils import supports_reasoning
|
||||
|
||||
@ -99,7 +102,8 @@ class GoogleAIStudioGeminiConfig(VertexGeminiConfig):
|
||||
self, messages: List[AllMessageValues]
|
||||
) -> List[ContentType]:
|
||||
"""
|
||||
Google AI Studio Gemini does not support image urls in messages.
|
||||
Google AI Studio Gemini does not support HTTP/HTTPS URLs for files.
|
||||
Convert them to base64 data instead.
|
||||
"""
|
||||
for message in messages:
|
||||
_message_content = message.get("content")
|
||||
@ -124,4 +128,16 @@ class GoogleAIStudioGeminiConfig(VertexGeminiConfig):
|
||||
image_obj
|
||||
)
|
||||
)
|
||||
elif element.get("type") == "file":
|
||||
file_element = cast(ChatCompletionFileObject, element)
|
||||
file_id = file_element["file"].get("file_id")
|
||||
if file_id and ("http://" in file_id or "https://" in file_id):
|
||||
# Convert HTTP/HTTPS file URL to base64 data
|
||||
try:
|
||||
base64_data = convert_url_to_base64(file_id)
|
||||
file_element["file"]["file_data"] = base64_data # type: ignore
|
||||
file_element["file"].pop("file_id", None) # type: ignore
|
||||
except Exception:
|
||||
# If conversion fails, leave as is and let the API handle it
|
||||
pass
|
||||
return _gemini_convert_messages_with_history(messages=messages)
|
||||
|
||||
72
litellm/llms/hosted_vllm/transcriptions/transformation.py
Normal file
72
litellm/llms/hosted_vllm/transcriptions/transformation.py
Normal file
@ -0,0 +1,72 @@
|
||||
"""
|
||||
Transformation logic for Hosted VLLM rerank
|
||||
"""
|
||||
|
||||
from typing import Optional, Union
|
||||
|
||||
import httpx
|
||||
|
||||
from litellm.llms.base_llm.audio_transcription.transformation import (
|
||||
AudioTranscriptionRequestData,
|
||||
)
|
||||
from litellm.llms.base_llm.chat.transformation import BaseLLMException
|
||||
from litellm.llms.openai.transcriptions.whisper_transformation import (
|
||||
OpenAIWhisperAudioTranscriptionConfig,
|
||||
)
|
||||
from litellm.types.utils import FileTypes
|
||||
|
||||
|
||||
class HostedVLLMAudioTranscriptionError(BaseLLMException):
|
||||
def __init__(
|
||||
self,
|
||||
status_code: int,
|
||||
message: str,
|
||||
headers: Optional[Union[dict, httpx.Headers]] = None,
|
||||
):
|
||||
super().__init__(status_code=status_code, message=message, headers=headers)
|
||||
|
||||
|
||||
class HostedVLLMAudioTranscriptionConfig(OpenAIWhisperAudioTranscriptionConfig):
|
||||
def __init__(self) -> None:
|
||||
pass
|
||||
|
||||
def get_complete_url(
|
||||
self,
|
||||
api_base: Optional[str],
|
||||
api_key: Optional[str],
|
||||
model: str,
|
||||
optional_params: dict,
|
||||
litellm_params: dict,
|
||||
stream: Optional[bool] = None,
|
||||
) -> str:
|
||||
if api_base:
|
||||
# Remove trailing slashes and ensure clean base URL
|
||||
api_base = api_base.rstrip("/")
|
||||
if not api_base.endswith("/v1/audio/transcriptions"):
|
||||
api_base = f"{api_base}/v1/audio/transcriptions"
|
||||
return api_base
|
||||
raise ValueError("api_base must be provided for Hosted VLLM rerank")
|
||||
|
||||
def transform_audio_transcription_request(
|
||||
self,
|
||||
model: str,
|
||||
audio_file: FileTypes,
|
||||
optional_params: dict,
|
||||
litellm_params: dict,
|
||||
) -> AudioTranscriptionRequestData:
|
||||
"""
|
||||
Transform the audio transcription request
|
||||
"""
|
||||
|
||||
data = {"model": model, "file": audio_file, **optional_params}
|
||||
|
||||
if "response_format" not in data or (
|
||||
data["response_format"] == "text" or data["response_format"] == "json"
|
||||
):
|
||||
data["response_format"] = (
|
||||
"verbose_json" # ensures 'duration' is received - used for cost calculation
|
||||
)
|
||||
|
||||
return AudioTranscriptionRequestData(
|
||||
data=data,
|
||||
)
|
||||
@ -1,8 +1,9 @@
|
||||
import os
|
||||
import uuid
|
||||
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, TypedDict, Union
|
||||
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union
|
||||
|
||||
import httpx
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
import litellm
|
||||
from litellm.llms.base_llm.chat.transformation import BaseLLMException
|
||||
|
||||
@ -15,8 +15,8 @@ class LMStudioChatConfig(OpenAIGPTConfig):
|
||||
) -> Tuple[Optional[str], Optional[str]]:
|
||||
api_base = api_base or get_secret_str("LM_STUDIO_API_BASE") # type: ignore
|
||||
dynamic_api_key = (
|
||||
api_key or get_secret_str("LM_STUDIO_API_KEY") or " "
|
||||
) # vllm does not require an api key
|
||||
api_key or get_secret_str("LM_STUDIO_API_KEY") or "fake-api-key"
|
||||
) # LM Studio does not require an api key, but OpenAI client requires non-None value
|
||||
return api_base, dynamic_api_key
|
||||
|
||||
def map_openai_params(
|
||||
|
||||
@ -16,9 +16,18 @@ from httpx._models import Headers, Response
|
||||
from pydantic import BaseModel
|
||||
|
||||
import litellm
|
||||
from litellm.litellm_core_utils.prompt_templates.common_utils import (
|
||||
_extract_reasoning_content,
|
||||
convert_content_list_to_str,
|
||||
extract_images_from_message,
|
||||
)
|
||||
from litellm.llms.base_llm.base_model_iterator import BaseModelResponseIterator
|
||||
from litellm.llms.base_llm.chat.transformation import BaseConfig, BaseLLMException
|
||||
from litellm.types.llms.ollama import OllamaToolCall, OllamaToolCallFunction
|
||||
from litellm.types.llms.ollama import (
|
||||
OllamaChatCompletionMessage,
|
||||
OllamaToolCall,
|
||||
OllamaToolCallFunction,
|
||||
)
|
||||
from litellm.types.llms.openai import (
|
||||
AllMessageValues,
|
||||
ChatCompletionAssistantToolCall,
|
||||
@ -299,7 +308,23 @@ class OllamaChatConfig(BaseConfig):
|
||||
)
|
||||
new_tools.append(ollama_tool_call)
|
||||
cast(dict, m)["tool_calls"] = new_tools
|
||||
new_messages.append(m)
|
||||
reasoning_content, parsed_content = _extract_reasoning_content(
|
||||
cast(dict, m)
|
||||
)
|
||||
content_str = convert_content_list_to_str(cast(AllMessageValues, m))
|
||||
images = extract_images_from_message(cast(AllMessageValues, m))
|
||||
|
||||
ollama_message = OllamaChatCompletionMessage(
|
||||
role=cast(str, m.get("role")),
|
||||
)
|
||||
if reasoning_content is not None:
|
||||
ollama_message["thinking"] = reasoning_content
|
||||
if content_str is not None:
|
||||
ollama_message["content"] = content_str
|
||||
if images is not None:
|
||||
ollama_message["images"] = images
|
||||
|
||||
new_messages.append(ollama_message)
|
||||
|
||||
# Load Config
|
||||
config = self.get_config()
|
||||
@ -361,7 +386,7 @@ class OllamaChatConfig(BaseConfig):
|
||||
del response_json_message["thinking"]
|
||||
elif response_json_message.get("content") is not None:
|
||||
# parse reasoning content from content
|
||||
from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
|
||||
from litellm.litellm_core_utils.prompt_templates.common_utils import (
|
||||
_parse_content_for_reasoning,
|
||||
)
|
||||
|
||||
|
||||
@ -229,7 +229,7 @@ class OllamaConfig(BaseConfig):
|
||||
model = model.split("/", 1)[1]
|
||||
api_base = get_secret_str("OLLAMA_API_BASE") or "http://localhost:11434"
|
||||
api_key = self.get_api_key()
|
||||
headers = { "Authorization": f"Bearer {api_key}" } if api_key else {}
|
||||
headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
|
||||
|
||||
try:
|
||||
response = litellm.module_level_client.post(
|
||||
@ -279,7 +279,7 @@ class OllamaConfig(BaseConfig):
|
||||
api_key: Optional[str] = None,
|
||||
json_mode: Optional[bool] = None,
|
||||
) -> ModelResponse:
|
||||
from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
|
||||
from litellm.litellm_core_utils.prompt_templates.common_utils import (
|
||||
_parse_content_for_reasoning,
|
||||
)
|
||||
|
||||
|
||||
@ -1,5 +1,8 @@
|
||||
from typing import List
|
||||
|
||||
from litellm.llms.base_llm.audio_transcription.transformation import (
|
||||
AudioTranscriptionRequestData,
|
||||
)
|
||||
from litellm.types.llms.openai import OpenAIAudioTranscriptionOptionalParams
|
||||
from litellm.types.utils import FileTypes
|
||||
|
||||
@ -27,8 +30,12 @@ class OpenAIGPTAudioTranscriptionConfig(OpenAIWhisperAudioTranscriptionConfig):
|
||||
audio_file: FileTypes,
|
||||
optional_params: dict,
|
||||
litellm_params: dict,
|
||||
) -> dict:
|
||||
) -> AudioTranscriptionRequestData:
|
||||
"""
|
||||
Transform the audio transcription request
|
||||
"""
|
||||
return {"model": model, "file": audio_file, **optional_params}
|
||||
data = {"model": model, "file": audio_file, **optional_params}
|
||||
|
||||
return AudioTranscriptionRequestData(
|
||||
data=data,
|
||||
)
|
||||
|
||||
@ -1,4 +1,4 @@
|
||||
from typing import Optional, Union
|
||||
from typing import Optional, Union, cast
|
||||
|
||||
import httpx
|
||||
from openai import AsyncOpenAI, OpenAI
|
||||
@ -34,6 +34,7 @@ class OpenAIAudioTranscription(OpenAIChatCompletion):
|
||||
- call openai_aclient.audio.transcriptions.create by default
|
||||
"""
|
||||
try:
|
||||
|
||||
raw_response = (
|
||||
await openai_aclient.audio.transcriptions.with_raw_response.create(
|
||||
**data, timeout=timeout
|
||||
@ -93,15 +94,14 @@ class OpenAIAudioTranscription(OpenAIChatCompletion):
|
||||
Handle audio transcription request
|
||||
"""
|
||||
if provider_config is not None:
|
||||
data = provider_config.transform_audio_transcription_request(
|
||||
transformed_data = provider_config.transform_audio_transcription_request(
|
||||
model=model,
|
||||
audio_file=audio_file,
|
||||
optional_params=optional_params,
|
||||
litellm_params=litellm_params,
|
||||
)
|
||||
|
||||
if not isinstance(data, dict):
|
||||
raise ValueError("OpenAI transformation route requires a dict")
|
||||
data = cast(dict, transformed_data.data)
|
||||
else:
|
||||
data = {"model": model, "file": audio_file, **optional_params}
|
||||
|
||||
|
||||
@ -1,8 +1,9 @@
|
||||
from typing import List, Optional, Union
|
||||
|
||||
from httpx import Headers
|
||||
from httpx import Headers, Response
|
||||
|
||||
from litellm.llms.base_llm.audio_transcription.transformation import (
|
||||
AudioTranscriptionRequestData,
|
||||
BaseAudioTranscriptionConfig,
|
||||
)
|
||||
from litellm.llms.base_llm.chat.transformation import BaseLLMException
|
||||
@ -11,12 +12,40 @@ from litellm.types.llms.openai import (
|
||||
AllMessageValues,
|
||||
OpenAIAudioTranscriptionOptionalParams,
|
||||
)
|
||||
from litellm.types.utils import FileTypes
|
||||
from litellm.types.utils import FileTypes, TranscriptionResponse
|
||||
|
||||
from ..common_utils import OpenAIError
|
||||
|
||||
|
||||
class OpenAIWhisperAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
|
||||
def get_complete_url(
|
||||
self,
|
||||
api_base: Optional[str],
|
||||
api_key: Optional[str],
|
||||
model: str,
|
||||
optional_params: dict,
|
||||
litellm_params: dict,
|
||||
stream: Optional[bool] = None,
|
||||
) -> str:
|
||||
"""
|
||||
OPTIONAL
|
||||
|
||||
Get the complete url for the request
|
||||
|
||||
Some providers need `model` in `api_base`
|
||||
"""
|
||||
## get the api base, attach the endpoint - v1/audio/transcriptions
|
||||
# strip trailing slash if present
|
||||
api_base = api_base.rstrip("/") if api_base else ""
|
||||
|
||||
# if endswith "/v1"
|
||||
if api_base and api_base.endswith("/v1"):
|
||||
api_base = f"{api_base}/audio/transcriptions"
|
||||
else:
|
||||
api_base = f"{api_base}/v1/audio/transcriptions"
|
||||
|
||||
return api_base or ""
|
||||
|
||||
def get_supported_openai_params(
|
||||
self, model: str
|
||||
) -> List[OpenAIAudioTranscriptionOptionalParams]:
|
||||
@ -72,21 +101,22 @@ class OpenAIWhisperAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
|
||||
audio_file: FileTypes,
|
||||
optional_params: dict,
|
||||
litellm_params: dict,
|
||||
) -> dict:
|
||||
) -> AudioTranscriptionRequestData:
|
||||
"""
|
||||
Transform the audio transcription request
|
||||
"""
|
||||
|
||||
data = {"model": model, "file": audio_file, **optional_params}
|
||||
|
||||
if "response_format" not in data or (
|
||||
data["response_format"] == "text" or data["response_format"] == "json"
|
||||
):
|
||||
data[
|
||||
"response_format"
|
||||
] = "verbose_json" # ensures 'duration' is received - used for cost calculation
|
||||
data["response_format"] = (
|
||||
"verbose_json" # ensures 'duration' is received - used for cost calculation
|
||||
)
|
||||
|
||||
return data
|
||||
return AudioTranscriptionRequestData(
|
||||
data=data,
|
||||
)
|
||||
|
||||
def get_error_class(
|
||||
self, error_message: str, status_code: int, headers: Union[dict, Headers]
|
||||
@ -96,3 +126,25 @@ class OpenAIWhisperAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
|
||||
message=error_message,
|
||||
headers=headers,
|
||||
)
|
||||
|
||||
def transform_audio_transcription_response(
|
||||
self,
|
||||
raw_response: Response,
|
||||
) -> TranscriptionResponse:
|
||||
try:
|
||||
raw_response_json = raw_response.json()
|
||||
except Exception as e:
|
||||
raise ValueError(
|
||||
f"Error transforming response to json: {str(e)}\nResponse: {raw_response.text}"
|
||||
)
|
||||
|
||||
if any(
|
||||
key in raw_response_json
|
||||
for key in TranscriptionResponse.model_fields.keys()
|
||||
):
|
||||
return TranscriptionResponse(**raw_response_json)
|
||||
else:
|
||||
raise ValueError(
|
||||
f"Invalid response format. Received response does not match the expected format. Got: {raw_response_json}"
|
||||
)
|
||||
|
||||
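Note on the `response_format` handling in the transcription request transform: the rule can be sketched in isolation as below. The helper name `coerce_response_format` is an assumption made for illustration; the condition and the `verbose_json` value come from the diff, whose inline comment says the verbose format is requested so that `duration` is returned for cost calculation.

```python
from typing import Any, Dict


def coerce_response_format(optional_params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: upgrade missing/'text'/'json' formats to 'verbose_json'."""
    data = dict(optional_params)  # copy so the caller's dict is left untouched
    if "response_format" not in data or data["response_format"] in ("text", "json"):
        data["response_format"] = "verbose_json"
    return data


print(coerce_response_format({}))
print(coerce_response_format({"response_format": "text"}))
print(coerce_response_format({"response_format": "srt"}))
```

Formats other than `text` and `json` (for example `srt`) pass through unchanged under this rule.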
141
litellm/llms/ovhcloud/chat/transformation.py
Normal file
@ -0,0 +1,141 @@
"""
Support for OVHCloud AI Endpoints `/v1/chat/completions` endpoint.

Our unified API follows the OpenAI standard.
More information on our website: https://endpoints.ai.cloud.ovh.net
"""
from typing import Optional, Union, List

import httpx
from litellm import ModelResponseStream, OpenAIGPTConfig, get_model_info, verbose_logger
from litellm.llms.ovhcloud.utils import OVHCloudException
from litellm.llms.base_llm.base_model_iterator import BaseModelResponseIterator
from litellm.llms.base_llm.chat.transformation import BaseLLMException
from litellm.types.llms.openai import AllMessageValues

class OVHCloudChatConfig(OpenAIGPTConfig):
    @property
    def custom_llm_provider(self) -> Optional[str]:
        return "ovhcloud"

    def get_supported_openai_params(self, model: str) -> list:
        """
        Details about function calling support can be found here:
        https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-function-calling?id=kb_article_view&sysparm_article=KB0071907
        """
        supports_function_calling: Optional[bool] = None
        try:
            model_info = get_model_info(model, custom_llm_provider="ovhcloud")
            supports_function_calling = model_info.get(
                "supports_function_calling", False
            )
        except Exception as e:
            verbose_logger.debug(f"Error getting supported OpenAI params: {e}")
            pass

        optional_params = super().get_supported_openai_params(model)
        if supports_function_calling is not True:
            verbose_logger.debug(
                "You can see our models supporting function_calling in our catalog: https://endpoints.ai.cloud.ovh.net/catalog"
            )
            optional_params.remove("tools")
            optional_params.remove("tool_choice")
            optional_params.remove("function_call")
            optional_params.remove("response_format")
        return optional_params

    def get_complete_url(
        self,
        api_base: Optional[str],
        api_key: Optional[str],
        model: str,
        optional_params: dict,
        litellm_params: dict,
        stream: Optional[bool] = None,
    ) -> str:
        api_base = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" if api_base is None else api_base.rstrip("/")
        complete_url = f"{api_base}/chat/completions"
        return complete_url

    def get_error_class(
        self,
        error_message: str,
        status_code: int,
        headers: Union[dict, httpx.Headers]
    ) -> BaseLLMException:
        return OVHCloudException(
            message=error_message,
            status_code=status_code,
            headers=headers,
        )

    def map_openai_params(
        self,
        non_default_params: dict,
        optional_params: dict,
        model: str,
        drop_params: bool,
    ) -> dict:
        mapped_openai_params = super().map_openai_params(
            non_default_params, optional_params, model, drop_params
        )
        return mapped_openai_params

    def transform_request(
        self,
        model: str,
        messages: List[AllMessageValues],
        optional_params: dict,
        litellm_params: dict,
        headers: dict,
    ) -> dict:
        extra_body = optional_params.pop("extra_body", {})
        response = super().transform_request(
            model, messages, optional_params, litellm_params, headers
        )
        response.update(extra_body)
        return response

class OVHCloudChatCompletionStreamingHandler(BaseModelResponseIterator):
    """
    Handler for OVHCloud AI Endpoints streaming chat completion responses
    """

    def chunk_parser(self, chunk: dict) -> ModelResponseStream:
        """
        Parse individual chunks from streaming response
        """
        try:
            if "error" in chunk:
                error_chunk = chunk["error"]
                error_message = "OVHCloud Error: {}".format(
                    error_chunk.get("message", "Unknown error")
                )
                raise OVHCloudException(
                    message=error_message,
                    status_code=error_chunk.get("code", 400),
                    headers={"Content-Type": "application/json"},
                )

            new_choices = []
            for choice in chunk["choices"]:
                if "delta" in choice and "reasoning" in choice["delta"]:
                    choice["delta"]["reasoning_content"] = choice["delta"].get("reasoning")
                new_choices.append(choice)

            return ModelResponseStream(
                id=chunk["id"],
                object="chat.completion.chunk",
                created=chunk["created"],
                usage=chunk.get("usage"),
                model=chunk["model"],
                choices=new_choices,
            )
        except KeyError as e:
            raise OVHCloudException(
                message=f"KeyError: {e}, Got unexpected response from OVHCloud: {chunk}",
                status_code=400,
                headers={"Content-Type": "application/json"},
            )
        except Exception as e:
            raise e
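Note on the streaming handler: the chunk parser copies a provider-side `reasoning` field on each delta into `reasoning_content`. A rough sketch of that mapping on plain dicts follows; `map_reasoning_fields` and the sample chunk are invented for illustration and do not use any LiteLLM types.

```python
from typing import Any, Dict, List


def map_reasoning_fields(chunk: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: expose delta['reasoning'] as delta['reasoning_content']."""
    new_choices = []
    for choice in chunk["choices"]:
        if "delta" in choice and "reasoning" in choice["delta"]:
            # Keep the original key and add the alias next to it.
            choice["delta"]["reasoning_content"] = choice["delta"]["reasoning"]
        new_choices.append(choice)
    return new_choices


# Made-up chunk shaped like an OpenAI-style streaming payload.
sample_chunk = {
    "id": "chunk-1",
    "choices": [
        {"index": 0, "delta": {"reasoning": "thinking..."}},
        {"index": 1, "delta": {"content": "hello"}},
    ],
}
choices = map_reasoning_fields(sample_chunk)
print(choices[0]["delta"])
print(choices[1]["delta"])
```

Choices without a `reasoning` key pass through untouched, which matches the condition used in the diff.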
122
litellm/llms/ovhcloud/embedding/transformation.py
Normal file
@ -0,0 +1,122 @@
"""
This is OpenAI compatible - no transformation is applied

"""
from typing import List, Optional, Union

import httpx

from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
from litellm.llms.base_llm.chat.transformation import BaseLLMException
from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
from litellm.secret_managers.main import get_secret_str
from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
from litellm.types.utils import EmbeddingResponse, Usage

from ..utils import OVHCloudException


class OVHCloudEmbeddingConfig(BaseEmbeddingConfig):
    def __init__(self) -> None:
        pass

    def get_complete_url(
        self,
        api_base: Optional[str],
        api_key: Optional[str],
        model: str,
        optional_params: dict,
        litellm_params: dict,
        stream: Optional[bool] = None,
    ) -> str:
        api_base = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" if api_base is None else api_base.rstrip("/")
        complete_url = f"{api_base}/embeddings"
        return complete_url

    def validate_environment(
        self,
        headers: dict,
        model: str,
        messages: List[AllMessageValues],
        optional_params: dict,
        litellm_params: dict,
        api_key: Optional[str] = None,
        api_base: Optional[str] = None,
    ) -> dict:
        if api_key is None:
            api_key = get_secret_str("OVHCLOUD_API_KEY")

        default_headers = {
            "Authorization": f"Bearer {api_key}",
            "accept": "application/json",
            "Content-Type": "application/json",
        }

        if "Authorization" in headers:
            default_headers["Authorization"] = headers["Authorization"]

        return {**default_headers, **headers}

    def get_supported_openai_params(self, model: str):
        return []

    def map_openai_params(
        self,
        non_default_params: dict,
        optional_params: dict,
        model: str,
        drop_params: bool,
    ):
        supported_openai_params = self.get_supported_openai_params(model)
        for param, value in non_default_params.items():
            if param in supported_openai_params:
                optional_params[param] = value
        return optional_params

    def transform_embedding_request(
        self,
        model: str,
        input: AllEmbeddingInputValues,
        optional_params: dict,
        headers: dict,
    ) -> dict:
        return {"input": input, "model": model, **optional_params}

    def transform_embedding_response(
        self,
        model: str,
        raw_response: httpx.Response,
        model_response: EmbeddingResponse,
        logging_obj: LiteLLMLoggingObj,
        api_key: Optional[str],
        request_data: dict,
        optional_params: dict,
        litellm_params: dict,
    ) -> EmbeddingResponse:
        try:
            raw_response_json = raw_response.json()
        except Exception:
            raise OVHCloudException(
                message=raw_response.text,
                status_code=raw_response.status_code,
                headers=raw_response.headers,
            )

        model_response.model = raw_response_json.get("model")
        model_response.data = raw_response_json.get("data")
        model_response.object = raw_response_json.get("object")

        usage = Usage(
            prompt_tokens=raw_response_json.get("usage", {}).get("prompt_tokens", 0),
            total_tokens=raw_response_json.get("usage", {}).get("total_tokens", 0),
        )

        model_response.usage = usage
        return model_response

    def get_error_class(
        self, error_message: str, status_code: int, headers: Union[dict, httpx.Headers]
    ) -> BaseLLMException:
        return OVHCloudException(
            message=error_message, status_code=status_code, headers=headers
        )
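Note on `validate_environment` in the embedding config: the header merge can be reproduced with plain dicts to see which side wins. The helper name `build_headers` is an assumption for illustration; the default header set and the final `{**default_headers, **headers}` merge are taken from the diff.

```python
from typing import Dict, Optional


def build_headers(api_key: Optional[str], headers: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: defaults first, caller-supplied headers override them."""
    default_headers = {
        "Authorization": f"Bearer {api_key}",
        "accept": "application/json",
        "Content-Type": "application/json",
    }
    # Later keys win in a dict merge, so caller headers take precedence.
    return {**default_headers, **headers}


print(build_headers("my-key", {}))
print(build_headers("my-key", {"Authorization": "Bearer override"}))
```

Because the merge already lets caller headers win, the explicit `Authorization` copy in the diff appears to be redundant. The sketch also shows that a missing key would produce the literal value `Bearer None`, so an explicit check for an unset key may be worth considering.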
6
litellm/llms/ovhcloud/utils.py
Normal file
@ -0,0 +1,6 @@
from litellm.llms.base_llm.chat.transformation import BaseLLMException


class OVHCloudException(BaseLLMException):
    """OVHCloud AI Endpoints exception handling class"""

    pass
@ -1,6 +1,7 @@
|
||||
from typing import Optional, TypedDict, Union
|
||||
from typing import Optional, Union
|
||||
|
||||
import httpx
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
import litellm
|
||||
from litellm.llms.custom_httpx.http_handler import (
|
||||
|
||||
@ -3,7 +3,9 @@ Types for Vertex Embeddings Requests
|
||||
"""
|
||||
|
||||
from enum import Enum
|
||||
from typing import List, Optional, TypedDict, Union
|
||||
from typing import List, Optional, Union
|
||||
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
|
||||
class TaskType(str, Enum):
|
||||
|
||||
@ -164,6 +164,7 @@ from .llms.openai.openai import OpenAIChatCompletion
|
||||
from .llms.openai.transcriptions.handler import OpenAIAudioTranscription
|
||||
from .llms.openai_like.chat.handler import OpenAILikeChatHandler
|
||||
from .llms.openai_like.embedding.handler import OpenAILikeEmbeddingHandler
|
||||
from .llms.ovhcloud.chat.transformation import OVHCloudChatConfig
|
||||
from .llms.petals.completion import handler as petals_handler
|
||||
from .llms.predibase.chat.handler import PredibaseChatCompletion
|
||||
from .llms.replicate.chat.handler import completion as replicate_chat_completion
|
||||
@ -259,6 +260,7 @@ sagemaker_chat_completion = SagemakerChatHandler()
|
||||
bytez_transformation = BytezChatConfig()
|
||||
heroku_transformation = HerokuChatConfig()
|
||||
oci_transformation = OCIChatConfig()
|
||||
ovhcloud_transformation = OVHCloudChatConfig()
|
||||
####### COMPLETION ENDPOINTS ################
|
||||
|
||||
|
||||
@ -3535,6 +3537,42 @@ def completion( # type: ignore # noqa: PLR0915
|
||||
|
||||
pass
|
||||
|
||||
elif custom_llm_provider == "ovhcloud" or model in litellm.ovhcloud_models:
|
||||
api_key = (
|
||||
api_key
|
||||
or litellm.ovhcloud_key
|
||||
or get_secret_str("OVHCLOUD_API_KEY")
|
||||
or litellm.api_key
|
||||
)
|
||||
|
||||
api_base = (
|
||||
api_base
|
||||
or litellm.api_base
|
||||
or get_secret_str("OVHCLOUD_API_BASE")
|
||||
or "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"
|
||||
)
|
||||
|
||||
response = base_llm_http_handler.completion(
|
||||
model=model,
|
||||
messages=messages,
|
||||
headers=headers,
|
||||
model_response=model_response,
|
||||
api_key=api_key,
|
||||
api_base=api_base,
|
||||
acompletion=acompletion,
|
||||
logging_obj=logging,
|
||||
optional_params=optional_params,
|
||||
litellm_params=litellm_params,
|
||||
timeout=timeout, # type: ignore
|
||||
client=client,
|
||||
custom_llm_provider=custom_llm_provider,
|
||||
encoding=encoding,
|
||||
stream=stream,
|
||||
provider_config=ovhcloud_transformation,
|
||||
)
|
||||
|
||||
pass
|
||||
|
||||
elif custom_llm_provider == "custom":
|
||||
url = litellm.api_base or api_base or ""
|
||||
if url is None or url == "":
|
||||
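Note on the `api_key` and `api_base` resolution in the `ovhcloud` branch: both are chains of `or` fallbacks ending in a hard-coded default base URL. A small sketch of the same first-truthy-wins pattern is below. `resolve_setting` is a hypothetical helper, and `os.environ.get` stands in for `get_secret_str` as a simplifying assumption.

```python
import os
from typing import Optional


def resolve_setting(*candidates: Optional[str], default: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return the first truthy candidate, else the default."""
    for candidate in candidates:
        if candidate:  # skips None and empty strings, like an `or` chain
            return candidate
    return default


# Assumption: environment lookup replaces get_secret_str for this sketch.
api_base = resolve_setting(
    None,  # explicit api_base argument (not given here)
    None,  # module-level litellm.api_base (not set here)
    os.environ.get("OVHCLOUD_API_BASE"),
    default="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1",
)
print(api_base)
```

The order of candidates matters: in the diff, the completion branch checks `litellm.ovhcloud_key` and the environment before `litellm.api_key`, while the embedding branch checks `litellm.api_key` before the environment, so the two paths can resolve to different keys when both are set.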
@ -4603,6 +4641,28 @@ def embedding( # noqa: PLR0915
|
||||
aembedding=aembedding,
|
||||
headers=headers,
|
||||
)
|
||||
elif custom_llm_provider == "ovhcloud":
|
||||
api_key = api_key or litellm.api_key or get_secret_str("OVHCLOUD_API_KEY")
|
||||
api_base = (
|
||||
api_base
|
||||
or litellm.api_base
|
||||
or get_secret_str("OVHCLOUD_API_BASE")
|
||||
or "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"
|
||||
)
|
||||
response = base_llm_http_handler.embedding(
|
||||
model=model,
|
||||
input=input,
|
||||
custom_llm_provider=custom_llm_provider,
|
||||
api_base=api_base,
|
||||
api_key=api_key,
|
||||
logging_obj=logging,
|
||||
timeout=timeout,
|
||||
model_response=EmbeddingResponse(),
|
||||
optional_params=optional_params,
|
||||
client=client,
|
||||
aembedding=aembedding,
|
||||
litellm_params={},
|
||||
)
|
||||
elif custom_llm_provider in litellm._custom_providers:
|
||||
custom_handler: Optional[CustomLLM] = None
|
||||
for item in litellm.custom_provider_map:
|
||||
@ -5297,7 +5357,10 @@ def transcription(
|
||||
model_response = litellm.utils.TranscriptionResponse()
|
||||
|
||||
model, custom_llm_provider, dynamic_api_key, api_base = get_llm_provider(
|
||||
model=model, custom_llm_provider=custom_llm_provider, api_base=api_base
|
||||
model=model,
|
||||
custom_llm_provider=custom_llm_provider,
|
||||
api_base=api_base,
|
||||
api_key=api_key,
|
||||
) # type: ignore
|
||||
|
||||
if dynamic_api_key is not None:
|
||||
@ -5313,6 +5376,7 @@ def transcription(
|
||||
custom_llm_provider=custom_llm_provider,
|
||||
**non_default_params,
|
||||
)
|
||||
|
||||
litellm_params_dict = get_litellm_params(**kwargs)
|
||||
|
||||
litellm_logging_obj.update_environment_variables(
|
||||
@ -5377,9 +5441,8 @@ def transcription(
|
||||
max_retries=max_retries,
|
||||
litellm_params=litellm_params_dict,
|
||||
)
|
||||
elif (
|
||||
custom_llm_provider == "openai"
|
||||
or custom_llm_provider in litellm.openai_compatible_providers
|
||||
elif custom_llm_provider == "openai" or (
|
||||
custom_llm_provider in litellm.openai_compatible_providers
|
||||
):
|
||||
api_base = (
|
||||
api_base
|
||||
@ -5394,6 +5457,7 @@ def transcription(
|
||||
or None # default - https://github.com/openai/openai-python/blob/284c1799070c723c6a553337134148a7ab088dd8/openai/util.py#L105
|
||||
)
|
||||
# set API KEY
|
||||
|
||||
api_key = api_key or litellm.api_key or litellm.openai_key or get_secret("OPENAI_API_KEY") # type: ignore
|
||||
response = openai_audio_transcriptions.audio_transcriptions(
|
||||
model=model,
|
||||
@ -5410,10 +5474,7 @@ def transcription(
|
||||
provider_config=provider_config,
|
||||
litellm_params=litellm_params_dict,
|
||||
)
|
||||
elif custom_llm_provider in [
|
||||
LlmProviders.DEEPGRAM.value,
|
||||
LlmProviders.ELEVENLABS.value,
|
||||
]:
|
||||
elif provider_config is not None:
|
||||
response = base_llm_http_handler.audio_transcriptions(
|
||||
model=model,
|
||||
audio_file=file,
|
||||
|
||||
@ -20777,5 +20777,207 @@
|
||||
"metadata": {
|
||||
"notes": "Volcengine Doubao embedding model - text-240715 version with 2560 dimensions"
|
||||
}
|
||||
},
|
||||
"ovhcloud/Qwen2.5-VL-72B-Instruct": {
|
||||
"max_tokens": 32000,
|
||||
"max_input_tokens": 32000,
|
||||
"max_output_tokens": 32000,
|
||||
"input_cost_per_token": 9.1e-07,
|
||||
"output_cost_per_token": 9.1e-07,
|
||||
"litellm_provider": "ovhcloud",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": false,
|
||||
"supports_response_schema": true,
|
||||
"supports_tool_choice": false,
|
||||
"supports_vision": true,
|
||||
"source": "https://endpoints.ai.cloud.ovh.net/models/qwen2-5-vl-72b-instruct"
|
||||
},
|
||||
"ovhcloud/llava-v1.6-mistral-7b-hf": {
|
||||
"max_tokens": 32000,
|
||||
"max_input_tokens": 32000,
|
||||
"max_output_tokens": 32000,
|
||||
"input_cost_per_token": 2.9e-07,
|
||||
"output_cost_per_token": 2.9e-07,
|
||||
"litellm_provider": "ovhcloud",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": false,
|
||||
"supports_response_schema": true,
|
||||
"supports_tool_choice": false,
|
||||
"supports_vision": true,
|
||||
"source": "https://endpoints.ai.cloud.ovh.net/models/llava-next-mistral-7b"
|
||||
},
|
||||
"ovhcloud/gpt-oss-120b": {
|
||||
"max_tokens": 131000,
|
||||
"max_input_tokens": 131000,
|
||||
"max_output_tokens": 131000,
|
||||
"input_cost_per_token": 8e-08,
|
||||
"output_cost_per_token": 4e-07,
|
||||
"litellm_provider": "ovhcloud",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": false,
|
||||
"supports_response_schema": true,
|
||||
"supports_tool_choice": false,
|
||||
"supports_reasoning": true,
|
||||
"source": "https://endpoints.ai.cloud.ovh.net/models/gpt-oss-120b"
|
||||
},
|
||||
"ovhcloud/Meta-Llama-3_3-70B-Instruct": {
|
||||
"max_tokens": 131000,
|
||||
"max_input_tokens": 131000,
|
||||
"max_output_tokens": 131000,
|
||||
"input_cost_per_token": 6.7e-07,
|
||||
"output_cost_per_token": 6.7e-07,
|
||||
"litellm_provider": "ovhcloud",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true,
|
||||
"supports_response_schema": true,
|
||||
"supports_tool_choice": true,
|
||||
"source": "https://endpoints.ai.cloud.ovh.net/models/meta-llama-3-3-70b-instruct"
|
||||
},
|
||||
"ovhcloud/Qwen2.5-Coder-32B-Instruct": {
|
||||
"max_tokens": 32000,
|
||||
"max_input_tokens": 32000,
|
||||
"max_output_tokens": 32000,
|
||||
"input_cost_per_token": 8.7e-07,
|
||||
"output_cost_per_token": 8.7e-07,
|
||||
"litellm_provider": "ovhcloud",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": false,
|
||||
"supports_response_schema": true,
|
||||
"supports_tool_choice": false,
|
||||
"source": "https://endpoints.ai.cloud.ovh.net/models/qwen2-5-coder-32b-instruct"
|
||||
},
|
||||
"ovhcloud/Mixtral-8x7B-Instruct-v0.1": {
|
||||
"max_tokens": 32000,
|
||||
"max_input_tokens": 32000,
|
||||
"max_output_tokens": 32000,
|
||||
"input_cost_per_token": 6.3e-07,
|
||||
"output_cost_per_token": 6.3e-07,
|
||||
"litellm_provider": "ovhcloud",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": false,
|
||||
"supports_response_schema": true,
|
||||
"supports_tool_choice": false,
|
||||
"source": "https://endpoints.ai.cloud.ovh.net/models/mixtral-8x7b-instruct-v0-1"
|
||||
},
|
||||
"ovhcloud/Meta-Llama-3_1-70B-Instruct": {
|
||||
"max_tokens": 131000,
|
||||
"max_input_tokens": 131000,
|
||||
"max_output_tokens": 131000,
|
||||
"input_cost_per_token": 6.7e-07,
|
||||
"output_cost_per_token": 6.7e-07,
|
||||
"litellm_provider": "ovhcloud",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": false,
|
||||
"supports_response_schema": false,
|
||||
"supports_tool_choice": false,
|
||||
"source": "https://endpoints.ai.cloud.ovh.net/models/meta-llama-3-1-70b-instruct"
|
||||
},
|
||||
"ovhcloud/Mistral-Small-3.2-24B-Instruct-2506": {
|
||||
"max_tokens": 128000,
|
||||
"max_input_tokens": 128000,
|
||||
"max_output_tokens": 128000,
|
||||
"input_cost_per_token": 9e-08,
|
||||
"output_cost_per_token": 2.8e-07,
|
||||
"litellm_provider": "ovhcloud",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true,
|
||||
"supports_response_schema": true,
|
||||
"supports_tool_choice": true,
|
||||
"supports_vision": true,
|
||||
"source": "https://endpoints.ai.cloud.ovh.net/models/mistral-small-3-2-24b-instruct-2506"
|
||||
},
|
||||
"ovhcloud/DeepSeek-R1-Distill-Llama-70B": {
|
||||
"max_tokens": 131000,
|
||||
"max_input_tokens": 131000,
|
||||
"max_output_tokens": 131000,
|
||||
"input_cost_per_token": 6.7e-07,
|
||||
"output_cost_per_token": 6.7e-07,
|
||||
"litellm_provider": "ovhcloud",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true,
|
||||
"supports_response_schema": true,
|
||||
"supports_tool_choice": true,
|
||||
"supports_reasoning": true,
|
||||
"source": "https://endpoints.ai.cloud.ovh.net/models/deepseek-r1-distill-llama-70b"
|
||||
},
|
||||
"ovhcloud/Llama-3.1-8B-Instruct": {
|
||||
        "max_tokens": 131000,
        "max_input_tokens": 131000,
        "max_output_tokens": 131000,
        "input_cost_per_token": 1e-07,
        "output_cost_per_token": 1e-07,
        "litellm_provider": "ovhcloud",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "source": "https://endpoints.ai.cloud.ovh.net/models/llama-3-1-8b-instruct"
    },
    "ovhcloud/Mistral-7B-Instruct-v0.3": {
        "max_tokens": 127000,
        "max_input_tokens": 127000,
        "max_output_tokens": 127000,
        "input_cost_per_token": 1e-07,
        "output_cost_per_token": 1e-07,
        "litellm_provider": "ovhcloud",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "source": "https://endpoints.ai.cloud.ovh.net/models/mistral-7b-instruct-v0-3"
    },
    "ovhcloud/gpt-oss-20b": {
        "max_tokens": 131000,
        "max_input_tokens": 131000,
        "max_output_tokens": 131000,
        "input_cost_per_token": 4e-08,
        "output_cost_per_token": 1.5e-07,
        "litellm_provider": "ovhcloud",
        "mode": "chat",
        "supports_function_calling": false,
        "supports_response_schema": true,
        "supports_tool_choice": false,
        "supports_reasoning": true,
        "source": "https://endpoints.ai.cloud.ovh.net/models/gpt-oss-20b"
    },
    "ovhcloud/Mistral-Nemo-Instruct-2407": {
        "max_tokens": 118000,
        "max_input_tokens": 118000,
        "max_output_tokens": 118000,
        "input_cost_per_token": 1.3e-07,
        "output_cost_per_token": 1.3e-07,
        "litellm_provider": "ovhcloud",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "source": "https://endpoints.ai.cloud.ovh.net/models/mistral-nemo-instruct-2407"
    },
    "ovhcloud/Qwen3-32B": {
        "max_tokens": 32000,
        "max_input_tokens": 32000,
        "max_output_tokens": 32000,
        "input_cost_per_token": 8e-08,
        "output_cost_per_token": 2.3e-07,
        "litellm_provider": "ovhcloud",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_response_schema": true,
        "supports_tool_choice": true,
        "supports_reasoning": true,
        "source": "https://endpoints.ai.cloud.ovh.net/models/qwen3-32b"
    },
    "ovhcloud/mamba-codestral-7B-v0.1": {
        "max_tokens": 256000,
        "max_input_tokens": 256000,
        "max_output_tokens": 256000,
        "input_cost_per_token": 1.9e-07,
        "output_cost_per_token": 1.9e-07,
        "litellm_provider": "ovhcloud",
        "mode": "chat",
        "supports_function_calling": false,
        "supports_response_schema": true,
        "supports_tool_choice": false,
        "source": "https://endpoints.ai.cloud.ovh.net/models/mamba-codestral-7b-v0-1"
    }
}
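The per-token prices in the entries above can be turned into a rough request cost. The following is a minimal sketch, assuming the cost is simply prompt tokens times `input_cost_per_token` plus completion tokens times `output_cost_per_token`. The `estimate_cost` helper and the inlined `PRICES` dictionary are hypothetical illustrations, not LiteLLM's own cost-tracking code.

```python
# Hypothetical illustration: prices copied from two of the ovhcloud entries above.
PRICES = {
    "ovhcloud/gpt-oss-20b": {
        "input_cost_per_token": 4e-08,
        "output_cost_per_token": 1.5e-07,
    },
    "ovhcloud/Qwen3-32B": {
        "input_cost_per_token": 8e-08,
        "output_cost_per_token": 2.3e-07,
    },
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return an estimated cost for one request (assumed linear pricing)."""
    entry = PRICES[model]
    return (
        prompt_tokens * entry["input_cost_per_token"]
        + completion_tokens * entry["output_cost_per_token"]
    )


if __name__ == "__main__":
    # One million prompt tokens plus one million completion tokens on Qwen3-32B.
    cost = estimate_cost("ovhcloud/Qwen3-32B", 1_000_000, 1_000_000)
    print(f"{cost:.2f}")
```

Under this assumption, models with asymmetric pricing (such as `gpt-oss-20b`) cost more per generated token than per prompt token, so output-heavy workloads dominate the bill.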
|
||||
|
||||
File diff suppressed because one or more lines are too long
@ -1 +0,0 @@
|
||||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{6580:function(n,e,t){Promise.resolve().then(t.t.bind(t,39974,23)),Promise.resolve().then(t.t.bind(t,2778,23))},2778:function(){},39974:function(n){n.exports={style:{fontFamily:"'__Inter_b0dd8a', '__Inter_Fallback_b0dd8a'",fontStyle:"normal"},className:"__className_b0dd8a"}}},function(n){n.O(0,[919,986,971,117,744],function(){return n(n.s=6580)}),_N_E=n.O()}]);
|
||||
@ -0,0 +1 @@
|
||||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{96443:function(n,e,t){Promise.resolve().then(t.t.bind(t,39974,23)),Promise.resolve().then(t.t.bind(t,2778,23))},2778:function(){},39974:function(n){n.exports={style:{fontFamily:"'__Inter_b0dd8a', '__Inter_Fallback_b0dd8a'",fontStyle:"normal"},className:"__className_b0dd8a"}}},function(n){n.O(0,[919,986,971,117,744],function(){return n(n.s=96443)}),_N_E=n.O()}]);
|
||||
@ -1 +1 @@
|
||||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[418],{11790:function(e,n,t){Promise.resolve().then(t.bind(t,52829))},52829:function(e,n,t){"use strict";t.r(n),t.d(n,{default:function(){return f}});var u=t(57437),s=t(2265),c=t(99376),r=t(72162);function f(){let e=(0,c.useSearchParams)().get("key"),[n,t]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&t(e)},[e]),(0,u.jsx)(r.Z,{accessToken:n})}}},function(e){e.O(0,[50,521,154,162,971,117,744],function(){return e(e.s=11790)}),_N_E=e.O()}]);
|
||||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[418],{21024:function(e,n,t){Promise.resolve().then(t.bind(t,52829))},52829:function(e,n,t){"use strict";t.r(n),t.d(n,{default:function(){return f}});var u=t(57437),s=t(2265),c=t(99376),r=t(72162);function f(){let e=(0,c.useSearchParams)().get("key"),[n,t]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&t(e)},[e]),(0,u.jsx)(r.Z,{accessToken:n})}}},function(e){e.O(0,[50,521,154,162,971,117,744],function(){return e(e.s=21024)}),_N_E=e.O()}]);
|
||||
@ -1 +1 @@
|
||||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[25],{58538:function(e,n,u){Promise.resolve().then(u.bind(u,22775))},22775:function(e,n,u){"use strict";u.r(n),u.d(n,{default:function(){return f}});var t=u(57437),s=u(2265),r=u(99376),c=u(36172);function f(){let e=(0,r.useSearchParams)().get("key"),[n,u]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&u(e)},[e]),(0,t.jsx)(c.Z,{accessToken:n,publicPage:!0,premiumUser:!1,userRole:null})}}},function(e){e.O(0,[50,521,866,154,162,172,971,117,744],function(){return e(e.s=58538)}),_N_E=e.O()}]);
|
||||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[25],{64563:function(e,n,u){Promise.resolve().then(u.bind(u,22775))},22775:function(e,n,u){"use strict";u.r(n),u.d(n,{default:function(){return f}});var t=u(57437),s=u(2265),r=u(99376),c=u(36172);function f(){let e=(0,r.useSearchParams)().get("key"),[n,u]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&u(e)},[e]),(0,t.jsx)(c.Z,{accessToken:n,publicPage:!0,premiumUser:!1,userRole:null})}}},function(e){e.O(0,[50,521,866,154,162,172,971,117,744],function(){return e(e.s=64563)}),_N_E=e.O()}]);
|
||||
File diff suppressed because one or more lines are too long
@ -1 +1 @@
|
||||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[744],{20169:function(e,n,t){Promise.resolve().then(t.t.bind(t,12846,23)),Promise.resolve().then(t.t.bind(t,19107,23)),Promise.resolve().then(t.t.bind(t,61060,23)),Promise.resolve().then(t.t.bind(t,4707,23)),Promise.resolve().then(t.t.bind(t,80,23)),Promise.resolve().then(t.t.bind(t,36423,23))}},function(e){var n=function(n){return e(e.s=n)};e.O(0,[971,117],function(){return n(54278),n(20169)}),_N_E=e.O()}]);
|
||||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[744],{10264:function(e,n,t){Promise.resolve().then(t.t.bind(t,12846,23)),Promise.resolve().then(t.t.bind(t,19107,23)),Promise.resolve().then(t.t.bind(t,61060,23)),Promise.resolve().then(t.t.bind(t,4707,23)),Promise.resolve().then(t.t.bind(t,80,23)),Promise.resolve().then(t.t.bind(t,36423,23))}},function(e){var n=function(n){return e(e.s=n)};e.O(0,[971,117],function(){return n(54278),n(10264)}),_N_E=e.O()}]);
|
||||
File diff suppressed because one or more lines are too long
@ -1,7 +1,7 @@
|
||||
2:I[19107,[],"ClientPageRoot"]
|
||||
3:I[30628,["665","static/chunks/3014691f-b7b79b78e27792f3.js","990","static/chunks/13b76428-ebdf3012af0e4489.js","50","static/chunks/50-fe160ecfa8bc4059.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","220","static/chunks/220-5061c4cea850d728.js","154","static/chunks/154-fff436ed72b19a24.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","931","static/chunks/app/page-127adcf8da2b5294.js"],"default",1]
|
||||
3:I[30628,["665","static/chunks/3014691f-b7b79b78e27792f3.js","990","static/chunks/13b76428-ebdf3012af0e4489.js","50","static/chunks/50-bb8a11a7610535aa.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","220","static/chunks/220-8af5927d18414264.js","154","static/chunks/154-6f752d9e0a5e497b.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","931","static/chunks/app/page-338773f18570e0d6.js"],"default",1]
|
||||
4:I[4707,[],""]
|
||||
5:I[36423,[],""]
|
||||
0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be 
found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
|
||||
0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be 
found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
|
||||
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
|
||||
1:null
|
||||
|
||||
@ -1,7 +1,7 @@
|
||||
2:I[19107,[],"ClientPageRoot"]
|
||||
3:I[52829,["50","static/chunks/50-fe160ecfa8bc4059.js","521","static/chunks/521-d97d355792d44830.js","154","static/chunks/154-fff436ed72b19a24.js","162","static/chunks/162-4e7640b4d68e1ae4.js","418","static/chunks/app/model_hub/page-d6e5fb7de2cedde9.js"],"default",1]
|
||||
3:I[52829,["50","static/chunks/50-bb8a11a7610535aa.js","521","static/chunks/521-d97d355792d44830.js","154","static/chunks/154-6f752d9e0a5e497b.js","162","static/chunks/162-4e7640b4d68e1ae4.js","418","static/chunks/app/model_hub/page-0dbadf20167b786c.js"],"default",1]
|
||||
4:I[4707,[],""]
|
||||
5:I[36423,[],""]
|
||||
0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["model_hub",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
|
||||
0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["model_hub",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
|
||||
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
|
||||
1:null
|
||||
|
||||
File diff suppressed because one or more lines are too long
@ -1,7 +1,7 @@
|
||||
2:I[19107,[],"ClientPageRoot"]
|
||||
3:I[22775,["50","static/chunks/50-fe160ecfa8bc4059.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","154","static/chunks/154-fff436ed72b19a24.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","25","static/chunks/app/model_hub_table/page-e06e934de1021ee4.js"],"default",1]
|
||||
3:I[22775,["50","static/chunks/50-bb8a11a7610535aa.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","154","static/chunks/154-6f752d9e0a5e497b.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","25","static/chunks/app/model_hub_table/page-f469bae327299fbb.js"],"default",1]
|
||||
4:I[4707,[],""]
|
||||
5:I[36423,[],""]
|
||||
0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["model_hub_table",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub_table",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub_table","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
|
||||
0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["model_hub_table",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub_table",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub_table","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
|
||||
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
|
||||
1:null
|
||||
|
||||
File diff suppressed because one or more lines are too long
@ -1,7 +1,7 @@
|
||||
2:I[19107,[],"ClientPageRoot"]
|
||||
3:I[12011,["665","static/chunks/3014691f-b7b79b78e27792f3.js","50","static/chunks/50-fe160ecfa8bc4059.js","154","static/chunks/154-fff436ed72b19a24.js","461","static/chunks/app/onboarding/page-3c5840c907b0a5c8.js"],"default",1]
|
||||
3:I[12011,["665","static/chunks/3014691f-b7b79b78e27792f3.js","50","static/chunks/50-bb8a11a7610535aa.js","154","static/chunks/154-6f752d9e0a5e497b.js","461","static/chunks/app/onboarding/page-7828c2c64e97362a.js"],"default",1]
|
||||
4:I[4707,[],""]
|
||||
5:I[36423,[],""]
|
||||
0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["onboarding",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["onboarding",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","onboarding","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
|
||||
0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["onboarding",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["onboarding",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","onboarding","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
|
||||
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
|
||||
1:null
|
||||
|
||||
@ -1,12 +1,20 @@
|
||||
model_list:
|
||||
- model_name: fake-openai-endpoint
|
||||
- model_name: byok-fixed-gpt-4o-mini
|
||||
litellm_params:
|
||||
model: openai/fake
|
||||
api_key: fake-key
|
||||
api_base: https://exampleopenaiendpoint-production.up.railway.app/
|
||||
- model_name: wildcard_models/*
|
||||
model: openai/gpt-4o-mini
|
||||
api_base: "https://webhook.site/2f385e05-00aa-402b-86d1-efc9261471a5"
|
||||
api_key: dummy
|
||||
- model_name: "byok-wildcard/*"
|
||||
litellm_params:
|
||||
model: openai/*
|
||||
- model_name: xai-grok-3
|
||||
litellm_params:
|
||||
model: xai/grok-3
|
||||
- model_name: hosted_vllm/whisper-v3
|
||||
litellm_params:
|
||||
model: hosted_vllm/whisper-v3
|
||||
api_base: "https://webhook.site/2f385e05-00aa-402b-86d1-efc9261471a5"
|
||||
api_key: dummy
|
||||
|
||||
|
||||
|
||||
|
||||
@ -1949,7 +1949,7 @@ class LiteLLM_OrganizationMembershipTable(LiteLLMPydanticObjectBase):
|
||||
model_config = ConfigDict(protected_namespaces=())
|
||||
|
||||
|
||||
class LiteLLM_OrganizationTableUpdate(LiteLLMPydanticObjectBase):
|
||||
class LiteLLM_OrganizationTableUpdate(LiteLLM_BudgetTable):
|
||||
"""Represents user-controllable params for a LiteLLM_OrganizationTable record"""
|
||||
|
||||
organization_id: Optional[str] = None
|
||||
|
||||
@ -1211,7 +1211,6 @@ def _check_model_access_helper(
|
||||
models: List[str],
|
||||
team_model_aliases: Optional[Dict[str, str]] = None,
|
||||
team_id: Optional[str] = None,
|
||||
object_type: Literal["user", "team", "key", "org"] = "user",
|
||||
) -> bool:
|
||||
## check if model in allowed model names
|
||||
from collections import defaultdict
|
||||
@ -1316,7 +1315,6 @@ def _can_object_call_model(
|
||||
models=models,
|
||||
team_model_aliases=team_model_aliases,
|
||||
team_id=team_id,
|
||||
object_type=object_type,
|
||||
):
|
||||
return True
|
||||
|
||||
|
||||
@ -0,0 +1,36 @@
|
||||
model_list:
|
||||
- model_name: claude-3-5-sonnet
|
||||
litellm_params:
|
||||
model: anthropic/claude-3-5-sonnet-20241022
|
||||
api_key: os.environ/ANTHROPIC_API_KEY
|
||||
|
||||
guardrails:
|
||||
- guardrail_name: "tool-permission-guardrail"
|
||||
litellm_params:
|
||||
guardrail: tool_permission
|
||||
mode: "post_call"
|
||||
default_on: true # Apply to all requests by default
|
||||
rules:
|
||||
- id: "allow_bash"
|
||||
tool_name: "Bash"
|
||||
decision: "allow"
|
||||
- id: "allow_github_mcp"
|
||||
tool_name: "mcp__github_*"
|
||||
decision: "allow"
|
||||
- id: "allow_aws_documentation"
|
||||
tool_name: "mcp__aws-documentation_*_documentation"
|
||||
decision: "allow"
|
||||
- id: "deny_read_commands"
|
||||
tool_name: "Read"
|
||||
decision: "deny"
|
||||
default_action: "deny" # deny by default if no rule matches
|
||||
on_disallowed_action: "block" # block the request when a tool is denied (alternative: "rewrite")
|
||||
|
||||
# Optional: Configure general settings
|
||||
general_settings:
|
||||
master_key: sk-1234
|
||||
|
||||
# Optional: Add logging configuration
|
||||
litellm_settings:
|
||||
success_callback: ["langfuse"]
|
||||
failure_callback: ["langfuse"]
|
||||
@ -18,6 +18,7 @@ def initialize_guardrail(litellm_params: "LitellmParams", guardrail: "Guardrail"
|
||||
application_id=litellm_params.application_id,
|
||||
monitor_mode=litellm_params.monitor_mode,
|
||||
block_failures=litellm_params.block_failures,
|
||||
anonymize_input=litellm_params.anonymize_input,
|
||||
event_hook=litellm_params.mode,
|
||||
default_on=litellm_params.default_on,
|
||||
)
|
||||
|
||||
@ -5,9 +5,10 @@
|
||||
#
|
||||
# +-------------------------------------------------------------+
|
||||
|
||||
import asyncio
|
||||
import copy
|
||||
import os
|
||||
from typing import Any, Dict, Literal, Optional, Union
|
||||
from typing import Any, Dict, Final, Literal, Optional, Union
|
||||
from urllib.parse import urljoin
|
||||
|
||||
from fastapi import HTTPException
|
||||
@ -24,6 +25,15 @@ from litellm.proxy._types import UserAPIKeyAuth
|
||||
from litellm.types.guardrails import GuardrailEventHooks
|
||||
from litellm.types.utils import EmbeddingResponse, ImageResponse
|
||||
|
||||
# Constants
|
||||
USER_ROLE: Final[Literal["user"]] = "user"
|
||||
ASSISTANT_ROLE: Final[Literal["assistant"]] = "assistant"
|
||||
SENSITIVE_DATA_DETECTOR_KEYS: Final[list[str]] = ["sensitiveData", "dataDetector"]
|
||||
|
||||
# Type aliases
|
||||
MessageRole = Literal["user", "assistant"]
|
||||
LLMResponse = Union[Any, ModelResponse, EmbeddingResponse, ImageResponse]
|
||||
|
||||
|
||||
class NomaBlockedMessage(HTTPException):
|
||||
"""Exception raised when Noma guardrail blocks a message"""
|
||||
@ -77,6 +87,7 @@ class NomaBlockedMessage(HTTPException):
|
||||
"allowedTopics",
|
||||
"bannedTopics",
|
||||
"topicGuardrails",
|
||||
"topicDetector", # Mock name for tests
|
||||
] and isinstance(value, dict):
|
||||
filtered_topics = {}
|
||||
for topic, topic_result in value.items():
|
||||
@ -86,7 +97,7 @@ class NomaBlockedMessage(HTTPException):
|
||||
if filtered_topics:
|
||||
result[key] = filtered_topics
|
||||
|
||||
elif key == "sensitiveData" and isinstance(value, dict):
|
||||
elif key in SENSITIVE_DATA_DETECTOR_KEYS and isinstance(value, dict):
|
||||
filtered_sensitive = {}
|
||||
for data_type, data_result in value.items():
|
||||
if self._is_result_true(data_result):
|
||||
@ -135,6 +146,7 @@ class NomaGuardrail(CustomGuardrail):
|
||||
application_id: Optional[str] = None,
|
||||
monitor_mode: Optional[bool] = None,
|
||||
block_failures: Optional[bool] = None,
|
||||
anonymize_input: Optional[bool] = None,
|
||||
**kwargs,
|
||||
):
|
||||
self.async_handler = get_async_httpx_client(
|
||||
@ -162,8 +174,326 @@ class NomaGuardrail(CustomGuardrail):
|
||||
else:
|
||||
self.block_failures = block_failures
|
||||
|
||||
if anonymize_input is None:
|
||||
self.anonymize_input = (
|
||||
os.environ.get("NOMA_ANONYMIZE_INPUT", "false").lower() == "true"
|
||||
)
|
||||
else:
|
||||
self.anonymize_input = anonymize_input
|
||||
|
||||
super().__init__(**kwargs)
|
||||
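The `anonymize_input` handling above follows an "explicit argument first, environment variable second" pattern. A minimal sketch of that pattern in isolation; `resolve_flag` is a hypothetical helper name for illustration and is not part of the diff:

```python
import os
from typing import Optional


def resolve_flag(explicit: Optional[bool], env_var: str, default: str = "false") -> bool:
    """Return the explicit value when given, otherwise parse the env var.

    Hypothetical helper: mirrors the `if anonymize_input is None` branch above.
    """
    if explicit is not None:
        return explicit
    # Only the case-insensitive string "true" enables the flag.
    return os.environ.get(env_var, default).lower() == "true"


os.environ["NOMA_ANONYMIZE_INPUT"] = "TRUE"
print(resolve_flag(None, "NOMA_ANONYMIZE_INPUT"))   # falls back to the env var
print(resolve_flag(False, "NOMA_ANONYMIZE_INPUT"))  # explicit False wins
```

Note that values such as `"1"` or `"yes"` are treated as false under this comparison.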
|
||||
def _create_background_noma_check(
|
||||
self,
|
||||
coro,
|
||||
) -> None:
|
||||
"""Create a background task for Noma API calls without blocking the main flow"""
|
||||
try:
|
||||
asyncio.create_task(coro)
|
||||
except Exception as e:
|
||||
verbose_proxy_logger.error(
|
||||
f"Failed to create background Noma task: {str(e)}"
|
||||
)
|
||||
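`_create_background_noma_check` schedules the coroutine with `asyncio.create_task` and drops the returned task. The asyncio documentation recommends keeping a reference to such tasks so they are not garbage-collected mid-execution. The sketch below shows one common way to do that; `spawn_background`, `_background_tasks` and `slow_check` are hypothetical names and this is not what the diff implements:

```python
import asyncio

# Strong references to in-flight tasks (hypothetical module-level registry).
_background_tasks: set = set()


def spawn_background(coro) -> asyncio.Task:
    """Schedule a coroutine without awaiting it, keeping a reference to the task."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    # Drop the reference once the task finishes.
    task.add_done_callback(_background_tasks.discard)
    return task


async def slow_check(results: list) -> None:
    """Stand-in for a guardrail API call."""
    await asyncio.sleep(0.01)
    results.append("checked")


async def main() -> list:
    results: list = []
    task = spawn_background(slow_check(results))
    # The caller continues immediately; the check has not run yet.
    assert results == []
    await task  # awaited here only so the demo can observe the result
    return results


print(asyncio.run(main()))
```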
|
||||
async def _process_user_message_check(
|
||||
self,
|
||||
request_data: dict,
|
||||
user_auth: UserAPIKeyAuth,
|
||||
) -> Optional[str]:
|
||||
"""Shared logic for processing user message checks"""
|
||||
extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
|
||||
|
||||
user_message = await self._extract_user_message(request_data)
|
||||
if not user_message:
|
||||
return None
|
||||
|
||||
payload = {"request": {"text": user_message}}
|
||||
response_json = await self._call_noma_api(
|
||||
payload=payload,
|
||||
llm_request_id=None,
|
||||
request_data=request_data,
|
||||
user_auth=user_auth,
|
||||
extra_data=extra_data,
|
||||
)
|
||||
|
||||
if self.monitor_mode:
|
||||
await self._handle_verdict_background(
|
||||
USER_ROLE, user_message, response_json
|
||||
)
|
||||
return user_message
|
||||
|
||||
# Check if we should anonymize content
|
||||
if self._should_anonymize(response_json, USER_ROLE):
|
||||
anonymized_content = self._extract_anonymized_content(
|
||||
response_json, USER_ROLE
|
||||
)
|
||||
if anonymized_content:
|
||||
# Replace the user message content with anonymized version
|
||||
self._replace_user_message_content(request_data, anonymized_content)
|
||||
verbose_proxy_logger.debug(
|
||||
f"Noma guardrail anonymized user message: {anonymized_content}"
|
||||
)
|
||||
return anonymized_content
|
||||
|
||||
await self._check_verdict(USER_ROLE, user_message, response_json)
|
||||
return user_message
|
||||
|
||||
async def _process_llm_response_check(
|
||||
self,
|
||||
request_data: dict,
|
||||
response: LLMResponse,
|
||||
user_auth: UserAPIKeyAuth,
|
||||
) -> Optional[str]:
|
||||
"""Shared logic for processing LLM response checks"""
|
||||
extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
|
||||
|
||||
if not isinstance(response, litellm.ModelResponse):
|
||||
return None
|
||||
|
||||
content = None
|
||||
for choice in response.choices:
|
||||
if isinstance(choice, litellm.Choices) and choice.message.content:
|
||||
content = choice.message.content
|
||||
break
|
||||
|
||||
if not content or not isinstance(content, str):
|
||||
return None
|
||||
|
||||
payload = {"response": {"text": content}}
|
||||
|
||||
response_json = await self._call_noma_api(
|
||||
payload=payload,
|
||||
llm_request_id=response.id,
|
||||
request_data=request_data,
|
||||
user_auth=user_auth,
|
||||
extra_data=extra_data,
|
||||
)
|
||||
|
||||
if self.monitor_mode:
|
||||
await self._handle_verdict_background(
|
||||
ASSISTANT_ROLE, content, response_json
|
||||
)
|
||||
return content
|
||||
|
||||
# Check if we should anonymize content
|
||||
if self._should_anonymize(response_json, ASSISTANT_ROLE):
|
||||
anonymized_content = self._extract_anonymized_content(
|
||||
response_json, ASSISTANT_ROLE
|
||||
)
|
||||
if anonymized_content:
|
||||
# Replace the LLM response content with anonymized version
|
||||
self._replace_llm_response_content(response, anonymized_content)
|
||||
verbose_proxy_logger.debug(
|
||||
f"Noma guardrail anonymized LLM response: {anonymized_content}"
|
||||
)
|
||||
return anonymized_content
|
||||
|
||||
await self._check_verdict(ASSISTANT_ROLE, content, response_json)
|
||||
return content
|
||||
|
||||
def _should_only_sensitive_data_failed(self, classification_obj: dict) -> bool:
|
||||
"""
|
||||
Check if only sensitive data detectors (PII, PCI, secrets) have result=true in the classification.
|
||||
|
||||
Args:
|
||||
classification_obj: The prompt or response classification object from Noma API
|
||||
|
||||
Returns:
|
||||
True if only sensitiveData detectors have result=true, False otherwise
|
||||
"""
|
||||
if not classification_obj:
|
||||
return False
|
||||
|
||||
# Track which detectors have result=true (detected violations)
|
||||
failed_detectors = []
|
||||
sensitive_data_detected = False
|
||||
|
||||
for key, value in classification_obj.items():
|
||||
if key in SENSITIVE_DATA_DETECTOR_KEYS and isinstance(value, dict):
|
||||
# Check if any sensitive data detector has result=true
|
||||
for data_type, data_result in value.items():
|
||||
if self._is_result_true(data_result):
|
||||
sensitive_data_detected = True
|
||||
# Don't add to failed_detectors as we want to allow these
|
||||
|
||||
elif isinstance(value, dict) and "result" in value:
|
||||
# Check other detectors - these should NOT have result=true
|
||||
if self._is_result_true(value):
|
||||
failed_detectors.append(key)
|
||||
|
||||
elif isinstance(value, dict):
|
||||
# Handle nested detectors
|
||||
for nested_key, nested_value in value.items():
|
||||
if self._is_result_true(nested_value):
|
||||
failed_detectors.append(f"{key}.{nested_key}")
|
||||
|
||||
# Return True only if sensitive data was detected AND no other detectors have result=true
|
||||
return sensitive_data_detected and len(failed_detectors) == 0
|
||||
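The classification walk in `_should_only_sensitive_data_failed` can be exercised with plain dictionaries. The following is a standalone sketch of the same logic; the detector names `email` and `promptInjection` are made up for illustration, and the real Noma payload may contain different keys:

```python
from typing import Any, Dict

SENSITIVE_KEYS = ["sensitiveData", "dataDetector"]


def is_true(obj: Any) -> bool:
    """True only for a dict whose "result" field is exactly True."""
    return isinstance(obj, dict) and obj.get("result") is True


def only_sensitive_failed(classification: Dict[str, Any]) -> bool:
    sensitive_hit = False
    other_hits = []
    for key, value in classification.items():
        if key in SENSITIVE_KEYS and isinstance(value, dict):
            # Sensitive-data detectors are tolerated, only recorded.
            if any(is_true(v) for v in value.values()):
                sensitive_hit = True
        elif isinstance(value, dict) and "result" in value:
            # Flat detector: {"result": bool}
            if is_true(value):
                other_hits.append(key)
        elif isinstance(value, dict):
            # Nested detectors: {"name": {"result": bool}}
            other_hits.extend(f"{key}.{k}" for k, v in value.items() if is_true(v))
    return sensitive_hit and not other_hits


pii_only = {
    "sensitiveData": {"email": {"result": True}},
    "promptInjection": {"result": False},
}
mixed = {
    "sensitiveData": {"email": {"result": True}},
    "bannedTopics": {"politics": {"result": True}},
}
clean = {"sensitiveData": {"email": {"result": False}}}

print(only_sensitive_failed(pii_only), only_sensitive_failed(mixed), only_sensitive_failed(clean))
```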
|
||||
def _extract_anonymized_content(
|
||||
self, response_json: dict, message_type: MessageRole
|
||||
) -> Optional[str]:
|
||||
"""
|
||||
Extract anonymized content from Noma API response.
|
||||
|
||||
Args:
|
||||
response_json: The full response from Noma API
|
||||
message_type: Either 'user' or 'assistant' to determine which content to extract
|
||||
|
||||
Returns:
|
||||
The anonymized content string if available, None otherwise
|
||||
"""
|
||||
original_response = response_json.get("originalResponse", {})
|
||||
|
||||
if message_type == USER_ROLE:
|
||||
prompt_data = original_response.get("prompt", {})
|
||||
anonymized_data = prompt_data.get("anonymizedContent", {})
|
||||
return anonymized_data.get("anonymized")
|
||||
elif message_type == ASSISTANT_ROLE:
|
||||
response_data = original_response.get("response", {})
|
||||
anonymized_data = response_data.get("anonymizedContent", {})
|
||||
return anonymized_data.get("anonymized")
|
||||
|
||||
return None
|
||||
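For reference, the lookup path used by `_extract_anonymized_content` implies a response shape like the one below. This is a sketch inferred from the keys read in the code above; the sample text is invented and `extract_anonymized` is a hypothetical standalone equivalent:

```python
from typing import Optional


def extract_anonymized(response_json: dict, role: str) -> Optional[str]:
    """Read originalResponse.<prompt|response>.anonymizedContent.anonymized."""
    section = {"user": "prompt", "assistant": "response"}.get(role)
    if section is None:
        return None
    original = response_json.get("originalResponse", {})
    return original.get(section, {}).get("anonymizedContent", {}).get("anonymized")


# Invented sample payload; only the key names come from the code above.
sample = {
    "verdict": False,
    "originalResponse": {
        "prompt": {"anonymizedContent": {"anonymized": "My email is ******"}},
    },
}

print(extract_anonymized(sample, "user"))
print(extract_anonymized(sample, "assistant"))
```

Every `.get` has a dict default, so a missing level yields `None` instead of raising.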
|
||||
def _should_anonymize(self, response_json: dict, message_type: MessageRole) -> bool:
|
||||
"""
|
||||
Determine if content should be anonymized based on Noma API response.
|
||||
|
||||
Logic:
|
||||
- If verdict=True: Content is safe, anonymize if anonymized version exists
|
||||
- If verdict=False: Check if only sensitiveData detectors have result=True
|
||||
- If yes: Anonymize
|
||||
- If no: Block (other violations detected)
|
||||
|
||||
Args:
|
||||
response_json: The full response from Noma API
|
||||
message_type: Either 'user' or 'assistant' to determine which classification to check
|
||||
|
||||
Returns:
|
||||
True if content should be anonymized, False if it should be blocked
|
||||
"""
|
||||
# Only anonymize in blocking mode when anonymize_input is enabled
|
||||
if self.monitor_mode or not self.anonymize_input:
|
||||
return False
|
||||
|
||||
verdict = response_json.get("verdict", True)
|
||||
# If verdict is True, anonymize (content is considered safe)
|
||||
if verdict:
|
||||
return True
|
||||
|
||||
# If verdict is False, check if only sensitive data detectors have result=True
|
||||
original_response = response_json.get("originalResponse", {})
|
||||
|
||||
if message_type == USER_ROLE:
|
||||
classification_obj = original_response.get("prompt", {})
|
||||
elif message_type == ASSISTANT_ROLE:
|
||||
classification_obj = original_response.get("response", {})
|
||||
else:
|
||||
return False
|
||||
|
||||
# Anonymize only if solely sensitive data (PII/PCI/secrets) was detected
|
||||
return self._should_only_sensitive_data_failed(classification_obj)
|
||||
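The branching in `_should_anonymize` reduces to four booleans. A small sketch that enumerates the combinations can help when reviewing the intended behavior; `should_anonymize` here is a hypothetical pure function, with `only_sensitive_failed` standing in for the result of `_should_only_sensitive_data_failed`:

```python
from itertools import product


def should_anonymize(
    monitor_mode: bool,
    anonymize_input: bool,
    verdict: bool,
    only_sensitive_failed: bool,
) -> bool:
    # Anonymization applies only in blocking mode with the feature enabled.
    if monitor_mode or not anonymize_input:
        return False
    # A passing verdict always qualifies.
    if verdict:
        return True
    # A failing verdict qualifies only when nothing but sensitive data was flagged.
    return only_sensitive_failed


# Print the full truth table for review.
for combo in product([False, True], repeat=4):
    print(combo, "->", should_anonymize(*combo))
```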
|
||||
def _is_result_true(self, result_obj: Optional[Dict[str, Any]]) -> bool:
|
||||
"""
|
||||
Check if a result object has a "result" field that is True.
|
||||
|
||||
Args:
|
||||
result_obj: A dictionary that may contain a "result" field
|
||||
|
||||
Returns:
|
||||
True if the "result" field exists and is True, False otherwise
|
||||
"""
|
||||
if not result_obj or not isinstance(result_obj, dict):
|
||||
return False
|
||||
|
||||
return result_obj.get("result") is True
|
||||
|
||||
def _replace_user_message_content(
|
||||
self, request_data: dict, anonymized_content: str
|
||||
):
|
||||
"""
|
||||
Replace the user message content in request data with anonymized version.
|
||||
|
||||
Args:
|
||||
request_data: The original request data
|
||||
anonymized_content: The anonymized content to replace with
|
||||
"""
|
||||
messages = request_data.get("messages", [])
|
||||
if not messages:
|
||||
return
|
||||
|
||||
# Find and replace the last user message
|
||||
for i in range(len(messages) - 1, -1, -1):
|
||||
if messages[i].get("role") == USER_ROLE:
|
||||
messages[i]["content"] = anonymized_content
|
||||
break
|
||||
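A quick way to see the effect of `_replace_user_message_content` is to run the same loop on a sample request. This sketch uses a hypothetical standalone function and invented messages; only the most recent user message is rewritten:

```python
def replace_last_user_message(request_data: dict, new_content: str) -> None:
    """Overwrite the content of the most recent user message, in place."""
    for message in reversed(request_data.get("messages", [])):
        if message.get("role") == "user":
            message["content"] = new_content
            break


request = {
    "messages": [
        {"role": "user", "content": "first question"},
        {"role": "assistant", "content": "first answer"},
        {"role": "user", "content": "my card number is 1234"},
    ]
}
replace_last_user_message(request, "my card number is ****")
print([m["content"] for m in request["messages"]])
```

Earlier user turns keep their original content, so sensitive data in older messages is not touched by this step.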
|
||||
def _replace_llm_response_content(
|
||||
self, response: LLMResponse, anonymized_content: str
|
||||
):
|
||||
"""
|
||||
Replace the LLM response content with anonymized version.
|
||||
|
||||
Args:
|
||||
response: The original LLM response
|
||||
anonymized_content: The anonymized content to replace with
|
||||
"""
|
||||
if not isinstance(response, litellm.ModelResponse):
|
||||
return
|
||||
|
||||
# Replace content in all choices
|
||||
for choice in response.choices:
|
||||
if isinstance(choice, litellm.Choices) and choice.message.content:
|
||||
choice.message.content = anonymized_content
|
||||
|
||||
async def _check_user_message_background(
|
||||
self,
|
||||
request_data: dict,
|
||||
user_auth: UserAPIKeyAuth,
|
||||
) -> None:
|
||||
"""Check user message in background for monitor mode - non-blocking"""
|
||||
try:
|
||||
await self._process_user_message_check(request_data, user_auth)
|
||||
except Exception as e:
|
||||
verbose_proxy_logger.error(
|
||||
f"Noma background user message check failed: {str(e)}"
|
||||
)
|
||||
|
||||
async def _check_llm_response_background(
|
||||
self,
|
||||
request_data: dict,
|
||||
response: LLMResponse,
|
||||
user_auth: UserAPIKeyAuth,
|
||||
) -> None:
|
||||
"""Check LLM response in background for monitor mode - non-blocking"""
|
||||
try:
|
||||
await self._process_llm_response_check(request_data, response, user_auth)
|
||||
except Exception as e:
|
||||
verbose_proxy_logger.error(
|
||||
f"Noma background response check failed: {str(e)}"
|
||||
)
|
||||
|
||||
async def _handle_verdict_background(
|
||||
self,
|
||||
type: MessageRole,
|
||||
message: str,
|
||||
response_json: dict,
|
||||
) -> None:
|
||||
"""Handle verdict from Noma API in background - logging only, never blocks"""
|
||||
try:
|
||||
if not response_json.get("verdict", True):
|
||||
msg = f"Noma guardrail blocked {type} message: {message}"
|
||||
verbose_proxy_logger.warning(msg)
|
||||
else:
|
||||
msg = f"Noma guardrail allowed {type} message: {message}"
|
||||
verbose_proxy_logger.info(msg)
|
||||
except Exception as e:
|
||||
verbose_proxy_logger.error(
|
||||
f"Noma background verdict handling failed: {str(e)}"
|
||||
)
|
||||
|
||||
async def async_pre_call_hook(
|
||||
self,
|
||||
user_api_key_dict: UserAPIKeyAuth,
|
||||
@ -191,6 +521,18 @@ class NomaGuardrail(CustomGuardrail):
|
||||
):
|
||||
return data
|
||||
|
||||
# In monitor mode, run Noma check in background and return immediately
|
||||
if self.monitor_mode:
|
||||
try:
|
||||
self._create_background_noma_check(
|
||||
self._check_user_message_background(data, user_api_key_dict)
|
||||
)
|
||||
except Exception as e:
|
||||
verbose_proxy_logger.error(
|
||||
f"Failed to start background Noma pre-call check: {str(e)}"
|
||||
)
|
||||
return data
|
||||
|
||||
try:
|
||||
return await self._check_user_message(data, user_api_key_dict)
|
||||
except NomaBlockedMessage:
|
||||
@ -198,7 +540,7 @@ class NomaGuardrail(CustomGuardrail):
|
||||
except Exception as e:
|
||||
verbose_proxy_logger.error(f"Noma pre-call hook failed: {str(e)}")
|
||||
|
||||
if self.block_failures and not self.monitor_mode:
|
||||
if self.block_failures:
|
||||
raise
|
||||
return data
|
||||
|
||||
@ -220,6 +562,18 @@ class NomaGuardrail(CustomGuardrail):
|
||||
if self.should_run_guardrail(data=data, event_type=event_type) is not True:
|
||||
return data
|
||||
|
||||
# In monitor mode, run Noma check in background and return immediately
|
||||
if self.monitor_mode:
|
||||
try:
|
||||
self._create_background_noma_check(
|
||||
self._check_user_message_background(data, user_api_key_dict)
|
||||
)
|
||||
except Exception as e:
|
||||
verbose_proxy_logger.error(
|
||||
f"Failed to start background Noma moderation check: {str(e)}"
|
||||
)
|
||||
return data
|
||||
|
||||
try:
|
||||
return await self._check_user_message(data, user_api_key_dict)
|
||||
except NomaBlockedMessage:
|
||||
@ -227,7 +581,7 @@ class NomaGuardrail(CustomGuardrail):
|
||||
except Exception as e:
|
||||
verbose_proxy_logger.error(f"Noma moderation hook failed: {str(e)}")
|
||||
|
||||
if self.block_failures and not self.monitor_mode:
|
||||
if self.block_failures:
|
||||
raise
|
||||
return data
|
||||
|
||||
@ -235,19 +589,33 @@ class NomaGuardrail(CustomGuardrail):
|
||||
self,
|
||||
data: dict,
|
||||
user_api_key_dict: UserAPIKeyAuth,
|
||||
response: Union[Any, ModelResponse, EmbeddingResponse, ImageResponse],
|
||||
response: LLMResponse,
|
||||
):
|
||||
event_type: GuardrailEventHooks = GuardrailEventHooks.post_call
|
||||
if self.should_run_guardrail(data=data, event_type=event_type) is not True:
|
||||
return response
|
||||
|
||||
# In monitor mode, run Noma check in background and return immediately
|
||||
if self.monitor_mode:
|
||||
try:
|
||||
self._create_background_noma_check(
|
||||
self._check_llm_response_background(
|
||||
data, response, user_api_key_dict
|
||||
)
|
||||
)
|
||||
except Exception as e:
|
||||
verbose_proxy_logger.error(
|
||||
f"Failed to start background Noma post-call check: {str(e)}"
|
||||
)
|
||||
return response
|
||||
|
||||
try:
|
||||
return await self._check_llm_response(data, response, user_api_key_dict)
|
||||
except NomaBlockedMessage:
|
||||
raise
|
||||
except Exception as e:
|
||||
verbose_proxy_logger.error(f"Noma post-call hook failed: {str(e)}")
|
||||
if self.block_failures and not self.monitor_mode:
|
||||
if self.block_failures:
|
||||
raise
|
||||
return response
|
||||
|
||||
@ -257,55 +625,24 @@ class NomaGuardrail(CustomGuardrail):
|
||||
user_auth: UserAPIKeyAuth,
|
||||
) -> Union[Exception, str, dict, None]:
|
||||
"""Check user message for policy violations"""
|
||||
extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
|
||||
|
||||
user_message = await self._extract_user_message(request_data)
|
||||
user_message = await self._process_user_message_check(request_data, user_auth)
|
||||
if not user_message:
|
||||
return request_data
|
||||
|
||||
payload = {"request": {"text": user_message}}
|
||||
response_json = await self._call_noma_api(
|
||||
payload=payload,
|
||||
llm_request_id=None,
|
||||
request_data=request_data,
|
||||
user_auth=user_auth,
|
||||
extra_data=extra_data,
|
||||
)
|
||||
await self._check_verdict("user", user_message, response_json)
|
||||
|
||||
return request_data
|
||||
|
||||
async def _check_llm_response(
|
||||
self,
|
||||
request_data: dict,
|
||||
response: Union[Any, ModelResponse, EmbeddingResponse, ImageResponse],
|
||||
response: LLMResponse,
|
||||
user_auth: UserAPIKeyAuth,
|
||||
) -> Union[Exception, ModelResponse, Any]:
|
||||
"""Check LLM response for policy violations"""
|
||||
extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
|
||||
|
||||
if not isinstance(response, litellm.ModelResponse):
|
||||
return response
|
||||
|
||||
content = None
|
||||
for choice in response.choices:
|
||||
if isinstance(choice, litellm.Choices) and choice.message.content:
|
||||
content = choice.message.content
|
||||
break
|
||||
|
||||
if not content or not isinstance(content, str):
|
||||
return response
|
||||
|
||||
payload = {"response": {"text": content}}
|
||||
|
||||
response_json = await self._call_noma_api(
|
||||
payload=payload,
|
||||
llm_request_id=response.id,
|
||||
request_data=request_data,
|
||||
user_auth=user_auth,
|
||||
extra_data=extra_data,
|
||||
content = await self._process_llm_response_check(
|
||||
request_data, response, user_auth
|
||||
)
|
||||
await self._check_verdict("assistant", content, response_json)
|
||||
if not content:
|
||||
return response
|
||||
|
||||
return response
|
||||
|
||||
@ -316,7 +653,7 @@ class NomaGuardrail(CustomGuardrail):
|
||||
return None
|
||||
|
||||
# Get the last user message
|
||||
user_messages = [msg for msg in messages if msg.get("role") == "user"]
|
||||
user_messages = [msg for msg in messages if msg.get("role") == USER_ROLE]
|
||||
if not user_messages:
|
||||
return None
|
||||
|
||||
@ -371,7 +708,7 @@ class NomaGuardrail(CustomGuardrail):
|
||||
|
||||
async def _check_verdict(
|
||||
self,
|
||||
type: Literal["user", "assistant"],
|
||||
type: MessageRole,
|
||||
message: str,
|
||||
response_json: dict,
|
||||
) -> None:
|
||||
@ -379,11 +716,7 @@ class NomaGuardrail(CustomGuardrail):
|
||||
Check the verdict from the Noma API and raise an exception if needed
|
||||
"""
|
||||
if not response_json.get("verdict", True):
|
||||
msg = str.format(
|
||||
"Noma guardrail blocked {type} message: {message}",
|
||||
type=type,
|
||||
message=message,
|
||||
)
|
||||
msg = f"Noma guardrail blocked {type} message: {message}"
|
||||
|
||||
if self.monitor_mode:
|
||||
verbose_proxy_logger.warning(msg)
|
||||
@ -392,11 +725,7 @@ class NomaGuardrail(CustomGuardrail):
|
||||
original_response = response_json.get("originalResponse", {})
|
||||
raise NomaBlockedMessage(original_response)
|
||||
else:
|
||||
msg = str.format(
|
||||
"Noma guardrail allowed {type} message: {message}",
|
||||
type=type,
|
||||
message=message,
|
||||
)
|
||||
msg = f"Noma guardrail allowed {type} message: {message}"
|
||||
if self.monitor_mode:
|
||||
verbose_proxy_logger.info(msg)
|
||||
else:
|
||||
|
||||
511
litellm/proxy/guardrails/guardrail_hooks/tool_permission.py
Normal file
@ -0,0 +1,511 @@
|
||||
from fastapi import HTTPException
|
||||
|
||||
import re
|
||||
from typing import Any, AsyncGenerator, Dict, List, Literal, Optional, Union
|
||||
|
||||
from litellm import ChatCompletionToolParam
|
||||
|
||||
from litellm._logging import verbose_proxy_logger
|
||||
from litellm.caching.dual_cache import DualCache
|
||||
from litellm.exceptions import GuardrailRaisedException
|
||||
from litellm.integrations.custom_guardrail import (
|
||||
CustomGuardrail,
|
||||
log_guardrail_information,
|
||||
)
|
||||
from litellm.proxy._types import UserAPIKeyAuth
|
||||
from litellm.proxy.common_utils.callback_utils import (
|
||||
add_guardrail_to_applied_guardrails_header,
|
||||
)
|
||||
from litellm.types.guardrails import GuardrailEventHooks
|
||||
from litellm.types.proxy.guardrails.guardrail_hooks.tool_permission import (
|
||||
PermissionError,
|
||||
ToolPermissionRule,
|
||||
ToolResult,
|
||||
)
|
||||
from litellm.types.utils import (
|
||||
ModelResponse,
|
||||
ModelResponseStream,
|
||||
LLMResponseTypes,
|
||||
Choices,
|
||||
ChatCompletionMessageToolCall,
|
||||
)
|
||||
|
||||
GUARDRAIL_NAME = "tool_permission"
|
||||
|
||||
|
||||
class ToolPermissionGuardrail(CustomGuardrail):
|
||||
def __init__(
|
||||
self,
|
||||
rules: Optional[List[Dict]] = None,
|
||||
default_action: Literal["deny", "allow"] = "deny",
|
||||
on_disallowed_action: Literal["block", "rewrite"] = "block",
|
||||
**kwargs,
|
||||
):
|
||||
"""
|
||||
Initialize the Tool Permission Guardrail
|
||||
|
||||
Args:
|
||||
rules: List of permission rules
|
||||
default_action: Default action when no rule matches ("allow" or "deny")
|
||||
on_disallowed_action: Action to take when a tool is denied ("block" or "rewrite")
|
||||
**kwargs: Additional arguments passed to CustomGuardrail
|
||||
"""
|
||||
# Set supported event hooks - this guardrail runs on pre_call and post_call
|
||||
if "supported_event_hooks" not in kwargs:
|
||||
kwargs["supported_event_hooks"] = [
|
||||
GuardrailEventHooks.pre_call,
|
||||
GuardrailEventHooks.post_call,
|
||||
]
|
||||
|
||||
super().__init__(**kwargs)
|
||||
|
||||
self.rules: List[ToolPermissionRule] = []
|
||||
if rules:
|
||||
for rule_dict in rules:
|
||||
self.rules.append(ToolPermissionRule(**rule_dict))
|
||||
|
||||
self.default_action = default_action
|
||||
self.on_disallowed_action = on_disallowed_action
|
||||
|
||||
verbose_proxy_logger.debug(
|
||||
"Tool Permission Guardrail initialized with %d rules, default_action: %s",
|
||||
len(self.rules),
|
||||
self.default_action,
|
||||
)
|
||||
|
||||
def _matches_pattern(self, tool_name: str, pattern: str) -> bool:
|
||||
"""
|
||||
Check if a tool name matches a pattern
|
||||
|
||||
Supports patterns like:
|
||||
- "Bash" - exact match
|
||||
- "mcp__*" - prefix pattern (matches names starting with "mcp__")
|
||||
- "*_read" - suffix wildcard (matches names ending with "_read")
|
||||
- "mcp__github_*_read" - infix wildcard (matches names like "mcp__github_mark_all_notifications_read")
|
||||
|
||||
Args:
|
||||
tool_name: Name of the tool to check
|
||||
pattern: Pattern to match against
|
||||
|
||||
Returns:
|
||||
True if the tool name matches the pattern
|
||||
"""
|
||||
# Handle exact matches
|
||||
if tool_name == pattern:
|
||||
return True
|
||||
|
||||
if "*" in pattern:
|
||||
# Escape regex special chars except '*'
|
||||
escaped_pattern = re.escape(pattern)
|
||||
# Turn \* into .*
|
||||
regex_pattern = escaped_pattern.replace(r"\*", ".*")
|
||||
return bool(re.fullmatch(regex_pattern, tool_name))
|
||||
|
||||
return False
|
||||
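The wildcard handling in `_matches_pattern` can be tried out on its own. Below is a standalone sketch of the same escape-then-substitute approach; `matches_pattern` is a hypothetical free function, and the tool names are examples only:

```python
import re


def matches_pattern(tool_name: str, pattern: str) -> bool:
    """Exact match, or glob-style match where only '*' is a wildcard."""
    if tool_name == pattern:
        return True
    if "*" in pattern:
        # Escape everything, then turn the escaped '*' back into '.*'.
        regex = re.escape(pattern).replace(r"\*", ".*")
        return re.fullmatch(regex, tool_name) is not None
    return False


print(matches_pattern("Bash", "Bash"))
print(matches_pattern("mcp__github_list_issues", "mcp__github_*"))
print(matches_pattern("mcp__github_mark_all_notifications_read", "mcp__github_*_read"))
print(matches_pattern("axb", "a.*"))  # '.' is literal here, not a regex wildcard
```

Because the pattern is escaped first, characters such as `.` or `?` in a rule are matched literally, and matching is case-sensitive.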
|
||||
def _check_tool_permission(
|
||||
self, tool_name: str
|
||||
) -> tuple[bool, Optional[str], Optional[str]]:
|
||||
"""
|
||||
Check if a tool is allowed based on the configured rules
|
||||
|
||||
Args:
|
||||
tool_name: Name of the tool to check
|
||||
|
||||
Returns:
|
||||
Tuple of (is_allowed, rule_id, message)
|
||||
"""
|
||||
verbose_proxy_logger.debug(f"Checking permission for tool: {tool_name}")
|
||||
|
||||
# Check each rule in order
|
||||
for rule in self.rules:
|
||||
if self._matches_pattern(tool_name, rule.tool_name):
|
||||
is_allowed = rule.decision == "allow"
|
||||
message = f"Tool '{tool_name}' {'allowed' if is_allowed else 'denied'} by rule '{rule.id}'"
|
||||
verbose_proxy_logger.debug(message)
|
||||
return is_allowed, rule.id, message
|
||||
|
||||
# No rule matched, use default action
|
||||
is_allowed = self.default_action == "allow"
|
||||
message = f"Tool '{tool_name}' {'allowed' if is_allowed else 'denied'} by default action"
|
||||
verbose_proxy_logger.debug(message)
|
||||
return is_allowed, None, message
|
||||
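Since `_check_tool_permission` returns on the first matching rule, rule order matters. The sketch below illustrates this with plain dictionaries instead of `ToolPermissionRule`; the `check` function and the rule ids in `shadowed` are hypothetical:

```python
import re
from typing import List, Optional, Tuple


def check(tool_name: str, rules: List[dict], default_action: str = "deny") -> Tuple[bool, Optional[str]]:
    """Return (is_allowed, rule_id); the first matching rule decides."""
    for rule in rules:
        regex = re.escape(rule["tool_name"]).replace(r"\*", ".*")
        if re.fullmatch(regex, tool_name):
            return rule["decision"] == "allow", rule["id"]
    # No rule matched: fall back to the default action.
    return default_action == "allow", None


rules = [
    {"id": "allow_bash", "tool_name": "Bash", "decision": "allow"},
    {"id": "allow_github_mcp", "tool_name": "mcp__github_*", "decision": "allow"},
    {"id": "deny_read_commands", "tool_name": "Read", "decision": "deny"},
]
print(check("mcp__github_list_issues", rules))
print(check("Write", rules))  # no rule matches, default deny applies

# A broad deny placed first shadows the more specific allow after it.
shadowed = [
    {"id": "deny_all_mcp", "tool_name": "mcp__*", "decision": "deny"},
    {"id": "allow_github_mcp", "tool_name": "mcp__github_*", "decision": "allow"},
]
print(check("mcp__github_list_issues", shadowed))
```

Placing specific rules before broad ones avoids the shadowing shown in the last call.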
|
||||
def _extract_tool_calls_from_response(
|
||||
self, response: ModelResponse
|
||||
) -> List[ChatCompletionMessageToolCall]:
|
||||
"""
|
||||
Extract tool_calls from all choices in a model response.
|
||||
|
||||
Args:
|
||||
response: The model response to analyze
|
||||
|
||||
Returns:
|
||||
List of tool_calls blocks found in the response
|
||||
"""
|
||||
tool_calls = []
|
||||
|
||||
for choice in response.choices:
|
||||
if isinstance(choice, Choices):
|
||||
for tool in choice.message.tool_calls or []:
|
||||
tool_calls.append(tool)
|
||||
|
||||
return tool_calls
|
||||
|
||||
def _modify_request_with_permission_errors(
|
||||
self,
|
||||
data: dict,
|
||||
denied_tool_names: List[str],
|
||||
):
|
||||
"""
|
||||
Modify the request by removing denied tools from its tool list
|
||||
|
||||
Args:
|
||||
data: The model request to modify
|
||||
denied_tool_names: Names of the tools that were denied
|
||||
"""
|
||||
if not denied_tool_names:
|
||||
return data
|
||||
|
||||
verbose_proxy_logger.info(
|
||||
f"Blocking {len(denied_tool_names)} unauthorized tool uses"
|
||||
)
|
||||
|
||||
# Collect the denied tool names in a set for fast lookup
|
||||
error_tool_names = set()
|
||||
for tool_use in denied_tool_names:
|
||||
error_tool_names.add(tool_use)
|
||||
|
||||
# Modify the tools
|
||||
tools: Optional[List[ChatCompletionToolParam]] = data.get("tools")
|
||||
if tools is None:
|
||||
return data
|
||||
|
||||
new_tools = []
|
||||
for tool in tools:
|
||||
if tool["type"] != "function":
|
||||
continue
|
||||
tool_name: str = tool["function"]["name"]
|
||||
if tool_name not in error_tool_names:
|
||||
new_tools.append(tool)
|
||||
data["tools"] = new_tools
|
||||
return data
|
||||
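The request-side rewrite can be checked against a sample payload. This is a standalone sketch of the filtering in `_modify_request_with_permission_errors`, with a hypothetical function name and invented tools; like the loop above, it also drops entries whose `type` is not `"function"` once any tool has been denied:

```python
from typing import List


def drop_denied_tools(data: dict, denied_tool_names: List[str]) -> dict:
    """Remove denied function tools from data["tools"], in place."""
    tools = data.get("tools")
    if not denied_tool_names or tools is None:
        return data
    denied = set(denied_tool_names)
    data["tools"] = [
        tool
        for tool in tools
        # Mirrors the diff: non-function tools are skipped, not kept.
        if tool["type"] == "function" and tool["function"]["name"] not in denied
    ]
    return data


request = {
    "tools": [
        {"type": "function", "function": {"name": "Bash"}},
        {"type": "function", "function": {"name": "Read"}},
    ]
}
drop_denied_tools(request, ["Read"])
print([t["function"]["name"] for t in request["tools"]])
```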
|
||||
def _create_permission_error_result(
|
||||
self, tool_call: ChatCompletionMessageToolCall, error: PermissionError
|
||||
) -> ToolResult:
|
||||
"""
|
||||
Create a tool_result block for a permission error
|
||||
|
||||
Args:
|
||||
tool_call: The tool call that was denied
|
||||
error: The permission error details
|
||||
|
||||
Returns:
|
||||
A tool_result block with the error message
|
||||
"""
|
||||
error_message = f"Permission denied: {error.message}"
|
||||
if error.rule_id:
|
||||
error_message += f" (Rule: {error.rule_id})"
|
||||
|
||||
return ToolResult(
|
||||
tool_use_id=tool_call.id, content=error_message, is_error=True
|
||||
)
|
||||
|
||||
def _modify_response_with_permission_errors(
|
||||
self,
|
||||
response: ModelResponse,
|
||||
denied_tools: List[tuple[ChatCompletionMessageToolCall, PermissionError]],
|
||||
) -> None:
|
||||
"""
|
||||
Modify the response to replace denied tool_calls blocks with error results
|
||||
|
||||
Args:
|
||||
response: The model response to modify
|
||||
denied_tools: List of (tool_use, error) tuples for denied tools
|
||||
"""
|
||||
if not denied_tools:
|
||||
return
|
||||
|
||||
verbose_proxy_logger.info(
|
||||
f"Blocking {len(denied_tools)} unauthorized tool uses"
|
||||
)
|
||||
|
||||
# Create a mapping of tool_use_id to error result
|
||||
error_results = {}
|
||||
for tool_use, error in denied_tools:
|
||||
error_result = self._create_permission_error_result(tool_use, error)
|
||||
error_results[tool_use.id] = error_result
|
||||
|
||||
# Modify the response content
|
||||
for choice in response.choices:
|
||||
if isinstance(choice, Choices):
|
||||
filtered_tool_calls = []
|
||||
error_messages = []
|
||||
|
||||
# Rewrite tool_calls
|
||||
for tool_call in choice.message.tool_calls or []:
|
||||
tool_call_id = tool_call.id
|
||||
if tool_call_id in error_results:
|
||||
error_result = error_results[tool_call_id]
|
||||
error_messages.append(error_result.content)
|
||||
else:
|
||||
filtered_tool_calls.append(tool_call)
|
||||
|
||||
choice.message.tool_calls = (
|
||||
filtered_tool_calls if filtered_tool_calls else None
|
||||
)
|
||||
|
||||
# Add error messages to content
|
||||
if error_messages:
|
||||
existing_content = choice.message.content
|
||||
if existing_content:
|
||||
choice.message.content = (
|
||||
existing_content + "\n\n" + "\n".join(error_messages)
|
||||
)
|
||||
else:
|
||||
choice.message.content = "\n".join(error_messages)
|
||||
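The response-side rewrite removes denied tool calls and appends their error text to the message content. A sketch of that behavior on a plain dictionary, rather than on `ModelResponse`/`Choices`; the function name, call ids and error strings are invented for illustration:

```python
from typing import Dict


def strip_denied_tool_calls(message: dict, errors_by_call_id: Dict[str, str]) -> None:
    """Drop denied tool calls from a message and append their error text."""
    kept, errors = [], []
    for call in message.get("tool_calls") or []:
        if call["id"] in errors_by_call_id:
            errors.append(errors_by_call_id[call["id"]])
        else:
            kept.append(call)

    # An empty list becomes None, matching the diff above.
    message["tool_calls"] = kept or None

    if errors:
        joined = "\n".join(errors)
        existing = message.get("content")
        message["content"] = f"{existing}\n\n{joined}" if existing else joined


message = {
    "content": "Running two tools.",
    "tool_calls": [
        {"id": "call_1", "function": {"name": "Bash"}},
        {"id": "call_2", "function": {"name": "Read"}},
    ],
}
strip_denied_tool_calls(message, {"call_2": "Permission denied: Tool 'Read' denied"})
print(message["tool_calls"])
print(message["content"])
```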
|
||||
@log_guardrail_information
|
||||
async def async_pre_call_hook(
|
||||
self,
|
||||
user_api_key_dict: UserAPIKeyAuth,
|
||||
cache: DualCache,
|
||||
data: dict,
|
||||
call_type: Literal[
|
||||
"completion",
|
||||
"text_completion",
|
||||
"embeddings",
|
||||
"image_generation",
|
||||
"moderation",
|
||||
"audio_transcription",
|
||||
"pass_through_endpoint",
|
||||
"rerank",
|
||||
"mcp_call",
|
||||
],
|
||||
) -> Union[Exception, str, dict, None]:
|
||||
""" """
|
||||
verbose_proxy_logger.debug("Tool Permission Guardrail Pre-Call Hook")
|
||||
|
||||
from litellm.proxy.common_utils.callback_utils import (
|
||||
add_guardrail_to_applied_guardrails_header,
|
||||
)
|
||||
|
||||
event_type: GuardrailEventHooks = GuardrailEventHooks.pre_call
|
||||
if self.should_run_guardrail(data=data, event_type=event_type) is not True:
|
||||
return data
|
||||
|
||||
new_tools: Optional[List[ChatCompletionToolParam]] = data.get("tools")
|
||||
if new_tools is None:
|
||||
verbose_proxy_logger.warning(
|
||||
"Tool Permission Guardrail: not running guardrail. No tools in data"
|
||||
)
|
||||
return data
|
||||
|
||||
# Check permissions for each tool
|
||||
denied_tool_names = []
|
||||
for tool in new_tools:
|
||||
if tool["type"] != "function":
|
||||
continue
|
||||
tool_name: str = tool["function"]["name"]
|
||||
|
||||
is_allowed, _, message = self._check_tool_permission(tool_name)
|
||||
|
||||
if not is_allowed and message is not None:
|
||||
verbose_proxy_logger.warning(f"Tool Permission Guardrail: {message}")
|
||||
if self.on_disallowed_action == "block":
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail={
|
||||
"error": "Violated guardrail policy",
|
||||
"detection_message": message,
|
||||
},
|
||||
)
|
||||
denied_tool_names.append(tool_name)
|
||||
|
||||
if denied_tool_names:
|
||||
data = self._modify_request_with_permission_errors(data, denied_tool_names)
|
||||
|
||||
verbose_proxy_logger.debug(
|
||||
"Tool Permission Guardrail Pre-Call Hook: Finished checking tools"
|
||||
)
|
||||
|
||||
add_guardrail_to_applied_guardrails_header(
|
||||
request_data=data, guardrail_name=self.guardrail_name
|
||||
)
|
||||
return data
|
||||
|
||||
@log_guardrail_information
|
||||
async def async_post_call_success_hook(
|
||||
self,
|
||||
data: dict,
|
||||
user_api_key_dict: UserAPIKeyAuth,
|
||||
response: LLMResponseTypes,
|
||||
):
|
||||
"""
|
||||
Check tool usage permissions after the LLM call
|
||||
|
||||
Args:
|
||||
data: Request data
|
||||
user_api_key_dict: User API key information (unused but required by interface)
|
||||
response: The model response to check
|
||||
"""
|
||||
if not isinstance(response, ModelResponse):
|
||||
return
|
||||
|
||||
verbose_proxy_logger.debug(
|
||||
"Tool Permission Guardrail Post-Call Hook: Checking response"
|
||||
)
|
||||
|
||||
if not self.should_run_guardrail(
|
||||
data=data, event_type=GuardrailEventHooks.post_call
|
||||
):
|
||||
verbose_proxy_logger.debug(
|
||||
"Tool Permission Guardrail: Skipping check (not enabled)"
|
||||
)
|
||||
return
|
||||
|
||||
# Extract tool_calls from the response
|
||||
tool_calls = self._extract_tool_calls_from_response(response)
|
||||
|
||||
if not tool_calls:
|
||||
verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
|
||||
return
|
||||
|
||||
verbose_proxy_logger.debug(
|
||||
f"Tool Permission Guardrail: Found {len(tool_calls)} tool calls"
|
||||
)
|
||||
|
||||
# Check permissions for each tool use
|
||||
denied_tools = []
|
||||
for tool_call in tool_calls:
|
||||
if tool_call.function.name is None:
|
||||
continue
|
||||
is_allowed, rule_id, message = self._check_tool_permission(
|
||||
tool_call.function.name
|
||||
)
|
||||
|
||||
if not is_allowed and message is not None:
|
||||
verbose_proxy_logger.warning(f"Tool Permission Guardrail: {message}")
|
||||
|
||||
if self.on_disallowed_action == "block":
|
||||
raise GuardrailRaisedException(
|
||||
guardrail_name=self.guardrail_name,
|
||||
message=message,
|
||||
)
|
||||
denied_tools.append(
|
||||
(
|
||||
tool_call,
|
||||
PermissionError(
|
||||
tool_name=tool_call.function.name,
|
||||
rule_id=rule_id,
|
||||
message=message,
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
if denied_tools:
|
||||
self._modify_response_with_permission_errors(response, denied_tools)
|
||||
|
||||
verbose_proxy_logger.debug(
|
||||
"Tool Permission Guardrail Post-Call Hook: Finished checking tool calls"
|
||||
)
|
||||
|
||||
add_guardrail_to_applied_guardrails_header(
|
||||
request_data=data, guardrail_name=self.guardrail_name
|
||||
)
|
||||
|
||||
async def async_post_call_streaming_iterator_hook(
|
||||
self,
|
||||
user_api_key_dict: UserAPIKeyAuth,
|
||||
response: Any,
|
||||
request_data: dict,
|
||||
) -> AsyncGenerator[ModelResponseStream, None]:
|
||||
"""
|
||||
Check tool usage permissions after the LLM stream call
|
||||
|
||||
Args:
|
||||
user_api_key_dict: User API key information (unused but required by interface)
|
||||
response: The model response to check
|
||||
request_data: The model request (unused but required by interface)
|
||||
"""
|
||||
|
||||
# Import here to avoid circular imports
|
||||
from litellm.llms.base_llm.base_model_iterator import MockResponseIterator
|
||||
from litellm.main import stream_chunk_builder
|
||||
from litellm.types.utils import TextCompletionResponse
|
||||
|
||||
# Collect all chunks to process them together
|
||||
all_chunks: List[ModelResponseStream] = []
|
||||
async for chunk in response:
|
||||
all_chunks.append(chunk)
|
||||
|
||||
assembled_model_response: Optional[
|
||||
Union[ModelResponse, TextCompletionResponse]
|
||||
] = stream_chunk_builder(
|
||||
chunks=all_chunks,
|
||||
)
|
||||
if isinstance(assembled_model_response, ModelResponse):
|
||||
verbose_proxy_logger.debug("Tool Permission Guardrail: Checking response")
|
||||
|
||||
# Extract tool_calls from the response
|
||||
tool_calls = self._extract_tool_calls_from_response(assembled_model_response)
|
||||
|
||||
if not tool_calls:
|
||||
verbose_proxy_logger.debug(
|
||||
"Tool Permission Guardrail: No tool uses found"
|
||||
)
|
||||
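# Replay the buffered chunks unchanged; a bare return here would end the stream with no output
|
||||
for chunk in all_chunks:
|
||||
yield chunk
|
||||
return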
|
||||
|
||||
verbose_proxy_logger.debug(
|
||||
f"Tool Permission Guardrail: Found {len(tool_calls)} tool calls"
|
||||
)
|
||||
|
||||
# Check permissions for each tool use
|
||||
denied_tools = []
|
||||
for tool_call in tool_calls:
|
||||
if tool_call.function.name is None:
|
||||
continue
|
||||
is_allowed, rule_id, message = self._check_tool_permission(
|
||||
tool_call.function.name
|
||||
)
|
||||
|
||||
if not is_allowed and message is not None:
|
||||
verbose_proxy_logger.warning(
|
||||
f"Tool Permission Guardrail: {message}"
|
||||
)
|
||||
|
||||
if self.on_disallowed_action == "block":
|
||||
raise GuardrailRaisedException(
|
||||
guardrail_name=self.guardrail_name,
|
||||
message=message,
|
||||
)
|
||||
denied_tools.append(
|
||||
(
|
||||
tool_call,
|
||||
PermissionError(
|
||||
tool_name=tool_call.function.name,
|
||||
rule_id=rule_id,
|
||||
message=message,
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
verbose_proxy_logger.debug(
|
||||
"Tool Permission Guardrail Post-Call Hook: Finished checking tool calls"
|
||||
)
|
||||
|
||||
if denied_tools:
|
||||
self._modify_response_with_permission_errors(
|
||||
assembled_model_response, denied_tools
|
||||
)
|
||||
|
||||
mock_response = MockResponseIterator(
|
||||
model_response=assembled_model_response
|
||||
)
|
||||
# Return the reconstructed stream
|
||||
async for chunk in mock_response:
|
||||
yield chunk
|
||||
else:
|
||||
for chunk in all_chunks:
|
||||
yield chunk
|
||||
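The streaming hook above follows a buffer-then-replay pattern: collect every chunk, inspect the assembled response, then emit either the original chunks or a rebuilt response. The sketch below shows that pattern with plain strings; `guarded_stream`, `rewrite`, and `_demo_source` are hypothetical names, and joining strings stands in for `stream_chunk_builder`.

```python
import asyncio
from typing import AsyncIterator, Callable, List


async def guarded_stream(
    stream: AsyncIterator[str],
    rewrite: Callable[[str], str],
) -> AsyncIterator[str]:
    """Buffer every chunk, inspect the assembled text, then emit a result."""
    chunks: List[str] = []
    async for chunk in stream:
        chunks.append(chunk)

    assembled = "".join(chunks)
    rewritten = rewrite(assembled)

    if rewritten == assembled:
        # Nothing was changed: replay the original chunks as they arrived
        for chunk in chunks:
            yield chunk
    else:
        # Something was rewritten: emit the modified response as a single chunk
        yield rewritten


async def _demo_source() -> AsyncIterator[str]:
    # Hypothetical upstream stream used only for the demonstration
    for piece in ("call ", "tool_x ", "now"):
        yield piece


async def collect(rewrite: Callable[[str], str]) -> List[str]:
    return [c async for c in guarded_stream(_demo_source(), rewrite)]
```

The trade-off is latency: because the whole stream is buffered before anything is yielded, the client receives no output until the upstream response has finished.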
@ -123,3 +123,18 @@ def initialize_hide_secrets(litellm_params: LitellmParams, guardrail: Guardrail)
|
||||
return _secret_detection_object
|
||||
|
||||
|
||||
def initialize_tool_permission(litellm_params: LitellmParams, guardrail: Guardrail):
|
||||
from litellm.proxy.guardrails.guardrail_hooks.tool_permission import (
|
||||
ToolPermissionGuardrail,
|
||||
)
|
||||
|
||||
_tool_permission_callback = ToolPermissionGuardrail(
|
||||
guardrail_name=guardrail.get("guardrail_name", ""),
|
||||
event_hook=litellm_params.mode,
|
||||
rules=litellm_params.rules,
|
||||
default_action=getattr(litellm_params, "default_action", "deny"),
|
||||
on_disallowed_action=getattr(litellm_params, "on_disallowed_action", "block"),
|
||||
default_on=litellm_params.default_on,
|
||||
)
|
||||
litellm.logging_callback_manager.add_litellm_callback(_tool_permission_callback)
|
||||
return _tool_permission_callback
|
||||
|
||||
@ -26,6 +26,7 @@ from .guardrail_initializers import (
|
||||
initialize_lakera,
|
||||
initialize_lakera_v2,
|
||||
initialize_presidio,
|
||||
initialize_tool_permission,
|
||||
)
|
||||
|
||||
guardrail_initializer_registry = {
|
||||
@ -34,6 +35,7 @@ guardrail_initializer_registry = {
|
||||
SupportedGuardrailIntegrations.LAKERA_V2.value: initialize_lakera_v2,
|
||||
SupportedGuardrailIntegrations.PRESIDIO.value: initialize_presidio,
|
||||
SupportedGuardrailIntegrations.HIDE_SECRETS.value: initialize_hide_secrets,
|
||||
SupportedGuardrailIntegrations.TOOL_PERMISSION.value: initialize_tool_permission,
|
||||
}
|
||||
|
||||
guardrail_class_registry: Dict[str, Type[CustomGuardrail]] = {}
|
||||
|
||||
@ -1,10 +1,11 @@
|
||||
import asyncio
|
||||
import sys
|
||||
from datetime import datetime, timedelta
|
||||
from typing import TYPE_CHECKING, Any, List, Literal, Optional, Tuple, TypedDict, Union
|
||||
from typing import TYPE_CHECKING, Any, List, Literal, Optional, Tuple, Union
|
||||
|
||||
from fastapi import HTTPException
|
||||
from pydantic import BaseModel
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
import litellm
|
||||
from litellm import DualCache, ModelResponse
|
||||
|
||||
@ -14,11 +14,14 @@ from litellm.proxy._types import (
|
||||
AddTeamCallback,
|
||||
CommonProxyErrors,
|
||||
LitellmDataForBackendLLMCall,
|
||||
LitellmUserRoles,
|
||||
SpecialHeaders,
|
||||
TeamCallbackMetadata,
|
||||
UserAPIKeyAuth,
|
||||
LitellmUserRoles,
|
||||
)
|
||||
|
||||
# Cache special headers as a frozenset for O(1) lookup performance
|
||||
_SPECIAL_HEADERS_CACHE = frozenset(v.value.lower() for v in SpecialHeaders._member_map_.values())
|
||||
from litellm.proxy.auth.route_checks import RouteChecks
|
||||
from litellm.router import Router
|
||||
from litellm.types.llms.anthropic import ANTHROPIC_API_HEADERS
|
||||
@ -54,6 +57,13 @@ def parse_cache_control(cache_control):
|
||||
return cache_dict
|
||||
|
||||
|
||||
LITELLM_METADATA_ROUTES = (
|
||||
"batches",
|
||||
"/v1/messages",
|
||||
"responses",
|
||||
"files",
|
||||
)
|
||||
|
||||
def _get_metadata_variable_name(request: Request) -> str:
|
||||
"""
|
||||
Helper to return what the "metadata" field should be called in the request data
|
||||
@ -65,22 +75,10 @@ def _get_metadata_variable_name(request: Request) -> str:
|
||||
if RouteChecks._is_assistants_api_request(request):
|
||||
return "litellm_metadata"
|
||||
|
||||
LITELLM_METADATA_ROUTES = [
|
||||
"batches",
|
||||
"/v1/messages",
|
||||
"responses",
|
||||
"files",
|
||||
]
|
||||
|
||||
if any(
|
||||
[
|
||||
litellm_metadata_route in request.url.path
|
||||
for litellm_metadata_route in LITELLM_METADATA_ROUTES
|
||||
]
|
||||
):
|
||||
if any(route in request.url.path for route in LITELLM_METADATA_ROUTES):
|
||||
return "litellm_metadata"
|
||||
else:
|
||||
return "metadata"
|
||||
|
||||
return "metadata"
|
||||
|
||||
|
||||
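The route check above is a plain substring test against the request path. A minimal sketch of that behaviour follows, leaving out the assistants API check; `metadata_field_for` is a hypothetical name used only here.

```python
LITELLM_METADATA_ROUTES = (
    "batches",
    "/v1/messages",
    "responses",
    "files",
)


def metadata_field_for(path: str) -> str:
    """Return the metadata field name for a request path (substring match)."""
    # A generator expression lets any() stop at the first matching route
    if any(route in path for route in LITELLM_METADATA_ROUTES):
        return "litellm_metadata"
    return "metadata"


print(metadata_field_for("/v1/batches"))
print(metadata_field_for("/v1/chat/completions"))
```

Because the match is by substring rather than by path segment, any path that merely contains one of the route strings also matches; for example, a hypothetical `/v1/profiles` path contains `files`.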
def safe_add_api_version_from_query_params(data: dict, request: Request):
|
||||
@ -235,14 +233,13 @@ def clean_headers(
|
||||
"""
|
||||
Removes litellm api key from headers
|
||||
"""
|
||||
special_headers = [v.value.lower() for v in SpecialHeaders._member_map_.values()]
|
||||
special_headers = special_headers
|
||||
if litellm_key_header_name is not None:
|
||||
special_headers.append(litellm_key_header_name.lower())
|
||||
clean_headers = {}
|
||||
|
||||
litellm_key_lower = litellm_key_header_name.lower() if litellm_key_header_name is not None else None
|
||||
|
||||
for header, value in headers.items():
|
||||
if header.lower() not in special_headers:
|
||||
header_lower = header.lower()
|
||||
# Keep the header only if it is not in the special headers cache and does not match the custom litellm key header
|
||||
if (header_lower not in _SPECIAL_HEADERS_CACHE and (litellm_key_lower is None or header_lower != litellm_key_lower)):
|
||||
clean_headers[header] = value
|
||||
return clean_headers
|
||||
|
||||
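The rewritten `clean_headers` replaces a per-call list with a module-level `frozenset` and compares the custom key header separately. A self-contained sketch of that idea is below; the `SpecialHeaders` members are hypothetical placeholder values, not the real enum from `litellm.proxy._types`.

```python
from enum import Enum
from typing import Dict, Optional


class SpecialHeaders(str, Enum):  # hypothetical values for illustration only
    AUTHORIZATION = "Authorization"
    API_KEY = "X-Api-Key"


# Built once at import time, so each request only pays for set lookups
_SPECIAL_HEADERS_CACHE = frozenset(h.value.lower() for h in SpecialHeaders)


def clean_headers(
    headers: Dict[str, str], litellm_key_header_name: Optional[str] = None
) -> Dict[str, str]:
    """Return a copy of headers without special headers or the custom key header."""
    key_lower = (
        litellm_key_header_name.lower() if litellm_key_header_name is not None else None
    )
    cleaned: Dict[str, str] = {}
    for header, value in headers.items():
        header_lower = header.lower()
        if header_lower in _SPECIAL_HEADERS_CACHE or header_lower == key_lower:
            continue  # drop credentials-bearing headers
        cleaned[header] = value
    return cleaned
```

Comparing the custom header name on its own avoids copying and extending the cached set on every call.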
@ -272,7 +269,7 @@ class LiteLLMProxyRequestSetup:
|
||||
if timeout_header is not None:
|
||||
return float(timeout_header)
|
||||
return None
|
||||
|
||||
|
||||
@staticmethod
|
||||
def _get_stream_timeout_from_request(headers: dict) -> Optional[float]:
|
||||
"""
|
||||
@ -292,13 +289,14 @@ class LiteLLMProxyRequestSetup:
|
||||
if num_retries_header is not None:
|
||||
return int(num_retries_header)
|
||||
return None
|
||||
|
||||
|
||||
@staticmethod
|
||||
def _get_spend_logs_metadata_from_request_headers(headers: dict) -> Optional[dict]:
|
||||
"""
|
||||
Get the `spend_logs_metadata` from the request headers.
|
||||
"""
|
||||
from litellm.litellm_core_utils.safe_json_loads import safe_json_loads
|
||||
|
||||
spend_logs_metadata_header = headers.get("x-litellm-spend-logs-metadata", None)
|
||||
if spend_logs_metadata_header is not None:
|
||||
return safe_json_loads(spend_logs_metadata_header)
|
||||
@ -337,16 +335,24 @@ class LiteLLMProxyRequestSetup:
|
||||
return None
|
||||
|
||||
@staticmethod
|
||||
def add_internal_user_from_user_mapping(general_settings: Optional[Dict], user_api_key_dict: UserAPIKeyAuth, headers: dict) -> UserAPIKeyAuth:
|
||||
def add_internal_user_from_user_mapping(
|
||||
general_settings: Optional[Dict],
|
||||
user_api_key_dict: UserAPIKeyAuth,
|
||||
headers: dict,
|
||||
) -> UserAPIKeyAuth:
|
||||
if general_settings is None:
|
||||
return user_api_key_dict
|
||||
user_header_mapping = general_settings.get("user_header_mappings")
|
||||
if not user_header_mapping:
|
||||
return user_api_key_dict
|
||||
header_name = LiteLLMProxyRequestSetup.get_internal_user_header_from_mapping(user_header_mapping)
|
||||
header_name = LiteLLMProxyRequestSetup.get_internal_user_header_from_mapping(
|
||||
user_header_mapping
|
||||
)
|
||||
if not header_name:
|
||||
return user_api_key_dict
|
||||
header_value = LiteLLMProxyRequestSetup._get_case_insensitive_header(headers, header_name)
|
||||
header_value = LiteLLMProxyRequestSetup._get_case_insensitive_header(
|
||||
headers, header_name
|
||||
)
|
||||
if header_value:
|
||||
user_api_key_dict.user_id = header_value
|
||||
return user_api_key_dict
|
||||
@ -429,15 +435,25 @@ class LiteLLMProxyRequestSetup:
|
||||
"""
|
||||
Add headers to the LLM call by model group
|
||||
"""
|
||||
from litellm.proxy.auth.auth_checks import _check_model_access_helper
|
||||
from litellm.proxy.proxy_server import llm_router
|
||||
|
||||
data_model = data.get("model")
|
||||
|
||||
if (
|
||||
data_model is not None
|
||||
and litellm.model_group_settings is not None
|
||||
and litellm.model_group_settings.forward_client_headers_to_llm_api
|
||||
is not None
|
||||
and data_model
|
||||
in litellm.model_group_settings.forward_client_headers_to_llm_api
|
||||
and _check_model_access_helper(
|
||||
model=data_model,
|
||||
llm_router=llm_router,
|
||||
models=litellm.model_group_settings.forward_client_headers_to_llm_api,
|
||||
team_model_aliases=user_api_key_dict.team_model_aliases,
|
||||
team_id=user_api_key_dict.team_id,
|
||||
) # handles aliases, wildcards, etc.
|
||||
):
|
||||
|
||||
_headers = LiteLLMProxyRequestSetup.add_headers_to_llm_call(
|
||||
headers, user_api_key_dict
|
||||
)
|
||||
@ -497,8 +513,10 @@ class LiteLLMProxyRequestSetup:
|
||||
timeout = LiteLLMProxyRequestSetup._get_timeout_from_request(headers)
|
||||
if timeout is not None:
|
||||
data["timeout"] = timeout
|
||||
|
||||
stream_timeout = LiteLLMProxyRequestSetup._get_stream_timeout_from_request(headers)
|
||||
|
||||
stream_timeout = LiteLLMProxyRequestSetup._get_stream_timeout_from_request(
|
||||
headers
|
||||
)
|
||||
if stream_timeout is not None:
|
||||
data["stream_timeout"] = stream_timeout
|
||||
|
||||
@ -507,7 +525,7 @@ class LiteLLMProxyRequestSetup:
|
||||
data["num_retries"] = num_retries
|
||||
|
||||
return data
|
||||
|
||||
|
||||
@staticmethod
|
||||
def add_litellm_metadata_from_request_headers(
|
||||
headers: dict,
|
||||
@ -520,11 +538,16 @@ class LiteLLMProxyRequestSetup:
|
||||
Relevant issue: https://github.com/BerriAI/litellm/issues/14008
|
||||
"""
|
||||
from litellm.proxy._types import LitellmMetadataFromRequestHeaders
|
||||
|
||||
metadata_from_headers = LitellmMetadataFromRequestHeaders()
|
||||
spend_logs_metadata = LiteLLMProxyRequestSetup._get_spend_logs_metadata_from_request_headers(headers)
|
||||
spend_logs_metadata = (
|
||||
LiteLLMProxyRequestSetup._get_spend_logs_metadata_from_request_headers(
|
||||
headers
|
||||
)
|
||||
)
|
||||
if spend_logs_metadata is not None:
|
||||
metadata_from_headers["spend_logs_metadata"] = spend_logs_metadata
|
||||
|
||||
|
||||
#########################################################################################
|
||||
# Finally update the requests metadata with the `metadata_from_headers`
|
||||
#########################################################################################
|
||||
@ -714,7 +737,6 @@ async def add_litellm_data_to_request( # noqa: PLR0915
|
||||
from litellm.proxy.proxy_server import llm_router, premium_user
|
||||
from litellm.types.proxy.litellm_pre_call_utils import SecretFields
|
||||
|
||||
|
||||
_headers = clean_headers(
|
||||
request.headers,
|
||||
litellm_key_header_name=(
|
||||
@ -740,8 +762,6 @@ async def add_litellm_data_to_request( # noqa: PLR0915
|
||||
if data.get(_metadata_variable_name, None) is None:
|
||||
data[_metadata_variable_name] = {}
|
||||
|
||||
|
||||
|
||||
data.update(
|
||||
LiteLLMProxyRequestSetup.add_litellm_data_for_backend_llm_call(
|
||||
headers=_headers,
|
||||
@ -763,7 +783,9 @@ async def add_litellm_data_to_request( # noqa: PLR0915
|
||||
data=data, headers=_headers, user_api_key_dict=user_api_key_dict
|
||||
)
|
||||
|
||||
user_api_key_dict = LiteLLMProxyRequestSetup.add_internal_user_from_user_mapping(general_settings, user_api_key_dict, _headers)
|
||||
user_api_key_dict = LiteLLMProxyRequestSetup.add_internal_user_from_user_mapping(
|
||||
general_settings, user_api_key_dict, _headers
|
||||
)
|
||||
|
||||
# Parse user info from headers
|
||||
user = LiteLLMProxyRequestSetup.get_user_from_headers(_headers, general_settings)
|
||||
@ -773,7 +795,6 @@ async def add_litellm_data_to_request( # noqa: PLR0915
|
||||
if "user" not in data:
|
||||
data["user"] = user
|
||||
|
||||
|
||||
data["secret_fields"] = SecretFields(raw_headers=dict(request.headers))
|
||||
|
||||
## Dynamic api version (Azure OpenAI endpoints) ##
|
||||
|
||||
@ -93,7 +93,7 @@ async def _upsert_budget_and_membership(
|
||||
create_data["tpm_limit"] = tpm_limit
|
||||
if rpm_limit is not None:
|
||||
create_data["rpm_limit"] = rpm_limit
|
||||
|
||||
|
||||
new_budget = await tx.litellm_budgettable.create(
|
||||
data=create_data,
|
||||
include={"team_membership": True},
|
||||
|
||||
@ -925,6 +925,15 @@ async def prepare_key_update_data(
|
||||
detail="team_id is required for service account keys. Please specify `team_id` in the request body.",
|
||||
)
|
||||
non_default_values = {}
|
||||
# ADD METADATA FIELDS
|
||||
# Set Management Endpoint Metadata Fields
|
||||
for field in LiteLLM_ManagementEndpoint_MetadataFields_Premium:
|
||||
if getattr(data, field, None) is not None:
|
||||
_set_object_metadata_field(
|
||||
object_data=data,
|
||||
field_name=field,
|
||||
value=getattr(data, field),
|
||||
)
|
||||
for k, v in data_json.items():
|
||||
if (
|
||||
k in LiteLLM_ManagementEndpoint_MetadataFields
|
||||
@ -1137,6 +1146,9 @@ async def update_key_fn(
|
||||
change_initiated_by=user_api_key_dict,
|
||||
llm_router=llm_router,
|
||||
)
|
||||
|
||||
# Set Management Endpoint Metadata Fields
|
||||
|
||||
non_default_values = await prepare_key_update_data(
|
||||
data=data, existing_key_row=existing_key_row
|
||||
)
|
||||
|
||||
@ -36,6 +36,28 @@ from litellm.proxy.utils import PrismaClient
|
||||
router = APIRouter()
|
||||
|
||||
|
||||
def handle_nested_budget_structure_in_organization_update_request(raw_data: dict) -> dict:
|
||||
"""
|
||||
Transform organization update request to handle UI payload format.
|
||||
|
||||
The UI sends nested budget data in 'litellm_budget_table', but our
|
||||
model expects flat budget fields at the top level.
|
||||
"""
|
||||
transformed_data = raw_data.copy()
|
||||
|
||||
# Handle nested budget structure from UI
|
||||
if 'litellm_budget_table' in transformed_data:
|
||||
budget_data = transformed_data.pop('litellm_budget_table', {})
|
||||
if budget_data:
|
||||
# Extract valid budget fields and merge into top level
|
||||
budget_fields = LiteLLM_BudgetTable.model_fields.keys()
|
||||
for key, value in budget_data.items():
|
||||
if key in budget_fields and value is not None:
|
||||
transformed_data[key] = value
|
||||
|
||||
return transformed_data
|
||||
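The flattening described in the docstring above can be illustrated without Pydantic. In this sketch `BUDGET_FIELDS` is an assumed, hard-coded subset standing in for `LiteLLM_BudgetTable.model_fields.keys()`, and `flatten_budget` is a hypothetical name.

```python
from typing import Any, Dict

# Assumption: a small illustrative subset of budget field names
BUDGET_FIELDS = frozenset(
    {"max_budget", "soft_budget", "tpm_limit", "rpm_limit", "budget_duration"}
)


def flatten_budget(raw_data: Dict[str, Any]) -> Dict[str, Any]:
    """Lift known, non-None budget fields out of 'litellm_budget_table'."""
    flat = dict(raw_data)  # shallow copy, so the caller's dict is not modified
    nested = flat.pop("litellm_budget_table", None) or {}
    for key, value in nested.items():
        if key in BUDGET_FIELDS and value is not None:
            flat[key] = value
    return flat


payload = {
    "organization_id": "org-1",
    "litellm_budget_table": {"max_budget": 10.0, "tpm_limit": None, "unknown": 1},
}
print(flatten_budget(payload))
```

Unknown keys and `None` values inside the nested object are dropped rather than copied, so they cannot overwrite top-level fields.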
|
||||
|
||||
@router.post(
|
||||
"/organization/new",
|
||||
tags=["organization management"],
|
||||
@ -248,7 +270,7 @@ async def _set_object_permission(
|
||||
response_model=LiteLLM_OrganizationTableWithMembers,
|
||||
)
|
||||
async def update_organization(
|
||||
data: LiteLLM_OrganizationTableUpdate,
|
||||
request: Request,
|
||||
user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
|
||||
):
|
||||
"""
|
||||
@ -270,6 +292,13 @@ async def update_organization(
|
||||
},
|
||||
)
|
||||
|
||||
# Transform UI payload to expected format
|
||||
raw_data = await request.json()
|
||||
raw_data_with_flat_budget_fields = handle_nested_budget_structure_in_organization_update_request(raw_data)
|
||||
|
||||
# Create validated data model
|
||||
data = LiteLLM_OrganizationTableUpdate(**raw_data_with_flat_budget_fields)
|
||||
|
||||
if data.updated_by is None:
|
||||
data.updated_by = user_api_key_dict.user_id
|
||||
|
||||
@ -293,6 +322,23 @@ async def update_organization(
|
||||
existing_organization_row=existing_organization_row,
|
||||
)
|
||||
|
||||
# Handle budget updates if budget fields are provided
|
||||
budget_fields = {k: v for k, v in data.model_dump().items()
|
||||
if k in LiteLLM_BudgetTable.model_fields.keys() and v is not None}
|
||||
|
||||
if budget_fields and existing_organization_row.budget_id:
|
||||
await update_budget(
|
||||
budget_obj=BudgetNewRequest(
|
||||
budget_id=existing_organization_row.budget_id,
|
||||
**budget_fields
|
||||
),
|
||||
user_api_key_dict=user_api_key_dict,
|
||||
)
|
||||
|
||||
# Remove budget fields from organization update data
|
||||
for field in LiteLLM_BudgetTable.model_fields.keys():
|
||||
updated_organization_row.pop(field, None)
|
||||
|
||||
response = await prisma_client.db.litellm_organizationtable.update(
|
||||
where={"organization_id": data.organization_id},
|
||||
data=updated_organization_row,
|
||||
|
||||
@ -5,7 +5,7 @@ This is an enterprise feature and requires a premium license.
|
||||
"""
|
||||
|
||||
import uuid
|
||||
from typing import Any, Dict, List, Optional, Set, Tuple, TypedDict
|
||||
from typing import Any, Dict, List, Optional, Set, Tuple
|
||||
|
||||
from fastapi import (
|
||||
APIRouter,
|
||||
@ -17,6 +17,7 @@ from fastapi import (
|
||||
Request,
|
||||
Response,
|
||||
)
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
import litellm
|
||||
from litellm._logging import verbose_proxy_logger
|
||||
|
||||
@ -2,10 +2,11 @@ import asyncio
|
||||
import json
|
||||
import time
|
||||
from datetime import datetime
|
||||
from typing import Literal, Optional, TypedDict
|
||||
from typing import Literal, Optional
|
||||
from urllib.parse import urlparse
|
||||
|
||||
import httpx
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
import litellm
|
||||
from litellm._logging import verbose_proxy_logger
|
||||
|
||||
@ -10,7 +10,6 @@ from fastapi import APIRouter, Depends, HTTPException, status
|
||||
|
||||
import litellm
|
||||
from litellm._logging import verbose_proxy_logger
|
||||
from litellm.router_strategy.budget_limiter import RouterBudgetLimiting
|
||||
from litellm.proxy._types import *
|
||||
from litellm.proxy._types import ProviderBudgetResponse, ProviderBudgetResponseObject
|
||||
from litellm.proxy.auth.user_api_key_auth import user_api_key_auth
|
||||
@ -18,6 +17,7 @@ from litellm.proxy.spend_tracking.spend_tracking_utils import (
|
||||
get_spend_by_team_and_customer,
|
||||
)
|
||||
from litellm.proxy.utils import handle_exception_on_proxy
|
||||
from litellm.router_strategy.budget_limiter import RouterBudgetLimiting
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from litellm.proxy.proxy_server import PrismaClient
|
||||
@ -1660,6 +1660,12 @@ async def ui_view_spend_logs( # noqa: PLR0915
|
||||
model: Optional[str] = fastapi.Query(
|
||||
default=None, description="Filter logs by model"
|
||||
),
|
||||
key_alias: Optional[str] = fastapi.Query(
|
||||
default=None, description="Filter logs by key alias"
|
||||
),
|
||||
end_user: Optional[str] = fastapi.Query(
|
||||
default=None, description="Filter logs by end user"
|
||||
),
|
||||
):
|
||||
"""
|
||||
View spend logs for UI with pagination support
|
||||
@ -1728,6 +1734,15 @@ async def ui_view_spend_logs( # noqa: PLR0915
|
||||
if model is not None:
|
||||
where_conditions["model"] = model
|
||||
|
||||
if key_alias is not None:
|
||||
where_conditions["metadata"] = {
|
||||
"path": ["user_api_key_alias"],
|
||||
"string_contains": key_alias,
|
||||
}
|
||||
|
||||
if end_user is not None:
|
||||
where_conditions["end_user"] = end_user
|
||||
|
||||
if min_spend is not None or max_spend is not None:
|
||||
where_conditions["spend"] = {}
|
||||
if min_spend is not None:
|
||||
|
||||
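The new `key_alias` and `end_user` filters above only add entries to the `where_conditions` dict that is later passed to the database query. A minimal sketch of that construction follows; `build_where_conditions` is a hypothetical helper, and the spend range handling is left out because the diff does not show it in full.

```python
from typing import Any, Dict, Optional


def build_where_conditions(
    model: Optional[str] = None,
    key_alias: Optional[str] = None,
    end_user: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect only the filters that were actually supplied."""
    where_conditions: Dict[str, Any] = {}
    if model is not None:
        where_conditions["model"] = model
    if key_alias is not None:
        # The alias lives inside the JSON metadata column, so it is matched
        # by JSON path with a substring comparison, as in the diff
        where_conditions["metadata"] = {
            "path": ["user_api_key_alias"],
            "string_contains": key_alias,
        }
    if end_user is not None:
        where_conditions["end_user"] = end_user
    return where_conditions


print(build_where_conditions(key_alias="prod", end_user="customer-42"))
```

Note the asymmetry: `end_user` is an exact match on a column, while `key_alias` is a substring match inside a JSON field.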
@ -4414,7 +4414,7 @@ class Router:
|
||||
return tpm_key
|
||||
|
||||
except Exception as e:
|
||||
verbose_router_logger.exception(
|
||||
verbose_router_logger.debug(
|
||||
"litellm.router.Router::deployment_callback_on_success(): Exception occurred - {}".format(
|
||||
str(e)
|
||||
)
|
||||
@ -4562,8 +4562,10 @@ class Router:
|
||||
parent_otel_span=parent_otel_span,
|
||||
ttl=RoutingArgs.ttl.value,
|
||||
)
|
||||
|
||||
def _get_metadata_variable_name_from_kwargs(self, kwargs: dict) -> Literal["metadata", "litellm_metadata"]:
|
||||
|
||||
def _get_metadata_variable_name_from_kwargs(
|
||||
self, kwargs: dict
|
||||
) -> Literal["metadata", "litellm_metadata"]:
|
||||
"""
|
||||
Helper to return what the "metadata" field should be called in the request data
|
||||
|
||||
@ -5672,11 +5674,11 @@ class Router:
|
||||
)
|
||||
if supported_openai_params is None:
|
||||
supported_openai_params = []
|
||||
|
||||
|
||||
# Get mode from database model_info if available, otherwise default to "chat"
|
||||
db_model_info = model.get("model_info", {})
|
||||
mode = db_model_info.get("mode", "chat")
|
||||
|
||||
|
||||
model_info = ModelMapInfo(
|
||||
key=model_group,
|
||||
max_tokens=None,
|
||||
@ -6802,7 +6804,9 @@ class Router:
|
||||
model=model,
|
||||
request_kwargs=request_kwargs,
|
||||
healthy_deployments=healthy_deployments,
|
||||
metadata_variable_name=self._get_metadata_variable_name_from_kwargs(request_kwargs),
|
||||
metadata_variable_name=self._get_metadata_variable_name_from_kwargs(
|
||||
request_kwargs
|
||||
),
|
||||
)
|
||||
|
||||
if len(healthy_deployments) == 0:
|
||||
|
||||
@ -3,7 +3,9 @@ Wrapper around router cache. Meant to handle model cooldown logic
|
||||
"""
|
||||
|
||||
import time
|
||||
from typing import TYPE_CHECKING, Any, List, Optional, Tuple, TypedDict, Union
|
||||
from typing import TYPE_CHECKING, Any, List, Optional, Tuple, Union
|
||||
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
from litellm import verbose_logger
|
||||
from litellm.caching.caching import DualCache
|
||||
|
||||
@ -4,7 +4,9 @@ Wrapper around router cache. Meant to store model id when prompt caching support
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
from typing import TYPE_CHECKING, Any, List, Optional, TypedDict, Union
|
||||
from typing import TYPE_CHECKING, Any, List, Optional, Union
|
||||
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
from litellm.caching.caching import DualCache
|
||||
from litellm.caching.in_memory_cache import InMemoryCache
|
||||
|
||||
@ -1,7 +1,8 @@
|
||||
from enum import Enum
|
||||
from typing import Any, Dict, List, Literal, Optional, TypedDict, Union
|
||||
from typing import Any, Dict, List, Literal, Optional, Union
|
||||
|
||||
from pydantic import BaseModel
|
||||
from typing_extensions import TypedDict
|
||||
|
||||
|
||||
class LiteLLMCacheType(str, Enum):
|
||||
|
||||
@ -1,14 +1,10 @@
|
||||
from datetime import datetime
|
||||
from enum import Enum
|
||||
from typing import Any, Dict, List, Literal, Optional, TypedDict, Union
|
||||
from typing import Any, Dict, List, Literal, Optional, Union
|
||||
|
||||
from pydantic import BaseModel, ConfigDict, Field, SecretStr
|
||||
from pydantic import BaseModel, ConfigDict, Field
|
||||
from typing_extensions import Required, TypedDict
|
||||
|
||||
from litellm.types.proxy.guardrails.guardrail_hooks.openai.openai_moderation import (
|
||||
OpenAIModerationGuardrailConfigModel,
|
||||
)
|
||||
|
||||
"""
|
||||
Pydantic object defining how to set guardrails on litellm proxy
|
||||
|
||||
@ -41,6 +37,9 @@ class SupportedGuardrailIntegrations(Enum):
|
||||
MODEL_ARMOR = "model_armor"
|
||||
OPENAI_MODERATION = "openai_moderation"
|
||||
NOMA = "noma"
|
||||
TOOL_PERMISSION = "tool_permission"
|
||||
|
||||
|
||||
|
||||
class Role(Enum):
|
||||
SYSTEM = "system"
|
||||
@ -312,7 +311,6 @@ class BedrockGuardrailConfigModel(BaseModel):
|
||||
)
|
||||
|
||||
|
||||
|
||||
class LakeraV2GuardrailConfigModel(BaseModel):
|
||||
"""Configuration parameters for the Lakera AI v2 guardrail"""
|
||||
|
||||
@ -375,6 +373,22 @@ class NomaGuardrailConfigModel(BaseModel):
|
||||
default=None,
|
||||
description="If True, blocks requests on API failures. Defaults to True if not provided",
|
||||
)
|
||||
anonymize_input: Optional[bool] = Field(
|
||||
default=None,
|
||||
description="If True, replaces sensitive content with anonymized version when only PII/PCI/secrets are detected. Only applies in blocking mode. Defaults to False if not provided",
|
||||
)
|
||||
|
||||
|
||||
class ToolPermissionGuardrailConfigModel(BaseModel):
|
||||
"""Configuration parameters for the Tool Permission guardrail"""
|
||||
|
||||
rules: Optional[List[Dict]] = Field(
|
||||
default=None, description="List of permission rules for tool usage"
|
||||
)
|
||||
default_action: Optional[str] = Field(
|
||||
default="Deny",
|
||||
description="Default action when no rule matches (Allow or Deny)",
|
||||
)
|
||||
|
||||
|
||||
class BaseLitellmParams(BaseModel): # works for new and patch update guardrails
|
||||
@ -425,7 +439,8 @@ class BaseLitellmParams(BaseModel): # works for new and patch update guardrails
|
||||
)
|
||||
|
||||
model: Optional[str] = Field(
|
||||
default=None, description="Optional field if guardrail requires a 'model' parameter"
|
||||
default=None,
|
||||
description="Optional field if guardrail requires a 'model' parameter",
|
||||
)
|
||||
|
||||
# Model Armor params
|
||||
@ -446,7 +461,7 @@ class BaseLitellmParams(BaseModel): # works for new and patch update guardrails
|
||||
default=True,
|
||||
description="Whether to fail the request if Model Armor encounters an error",
|
||||
)
|
||||
|
||||
|
||||
model_config = ConfigDict(extra="allow", protected_namespaces=())
|
||||
|
||||
|
||||
@ -464,6 +479,7 @@ class LitellmParams(
|
||||
LassoGuardrailConfigModel,
|
||||
PillarGuardrailConfigModel,
|
||||
NomaGuardrailConfigModel,
|
||||
ToolPermissionGuardrailConfigModel,
|
||||
BaseLitellmParams,
|
||||
):
|
||||
guardrail: str = Field(description="The type of guardrail integration to use")
|
||||
|
||||
Some files were not shown because too many files have changed in this diff.