Resolve merge conflict by including both CompactifAI and OVHCloud providers

- Keep CompactifAI provider detection logic
- Include new OVHCloud provider from main branch
- Both providers now work correctly with model prefix detection
2025-09-14 23:03:18 +02:00 · 2025-09-14 23:03:18 +02:00 · 9521414efa
commit 9521414efa
parent 6ac37093e5 03f2be1e20
250 changed files with 14752 additions and 1245 deletions
--- a/README.md
+++ b/README.md
@@ -346,6 +346,7 @@ curl 'http://0.0.0.0:4000/key/generate' \
 | [Featherless AI](https://docs.litellm.ai/docs/providers/featherless_ai)                              | ✅                                                       | ✅                                                                               | ✅                                                                                   | ✅                                                                                 |                                                                               |                                                                         |
 | [Nebius AI Studio](https://docs.litellm.ai/docs/providers/nebius)                             | ✅                                                       | ✅                                                                               | ✅                                                                                   | ✅                                                                                 | ✅                                                                             |                                                                         |
 | [Heroku](https://docs.litellm.ai/docs/providers/heroku)                             | ✅                                                       | ✅                                                                               |                                                                                    |                                                                                  |                                                                              |                                                                         |
+| [OVHCloud AI Endpoints](https://docs.litellm.ai/docs/providers/ovhcloud)                             | ✅                                                       | ✅                                                                               |                                                                                    |                                                                                  |                                                                              |                                                                         |

 [**Read the Docs**](https://docs.litellm.ai/docs/)

--- a/cookbook/misc/RELEASE_NOTES_GENERATION_INSTRUCTIONS.md
+++ b/cookbook/misc/RELEASE_NOTES_GENERATION_INSTRUCTIONS.md
@@ -0,0 +1,256 @@
+# LiteLLM Release Notes Generation Instructions
+
+This document provides comprehensive instructions for AI agents to generate release notes for LiteLLM following the established format and style.
+
+## Required Inputs
+
+1. **Release Version** (e.g., `v1.76.3-stable`)
+2. **PR Diff/Changelog** - List of PRs with titles and contributors
+3. **Previous Version Commit Hash** - To compare model pricing changes
+4. **Reference Release Notes** - Previous release notes to follow style/format
+
+## Step-by-Step Process
+
+### 1. Initial Setup and Analysis
+
+```bash
+# Check git diff for model pricing changes
+git diff <previous_commit_hash> HEAD -- model_prices_and_context_window.json
+```
+
+**Key Analysis Points:**
+- New models added (look for new entries)
+- Deprecated models removed (look for deleted entries)
+- Pricing updates (look for cost changes)
+- Feature support changes (tool calling, reasoning, etc.)
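The analysis points above can be sketched in a few lines of Python once both versions of the pricing file are loaded as dictionaries. This is only an illustration: `diff_model_prices` is a hypothetical helper, and the model names and prices are made up rather than taken from `model_prices_and_context_window.json`.

```python
def diff_model_prices(old: dict, new: dict) -> dict:
    """Summarize added, removed, and changed entries between two pricing maps."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # An entry counts as changed when any of its fields differ.
    changed = sorted(name for name in set(old) & set(new) if old[name] != new[name])
    return {"added": added, "removed": removed, "changed": changed}


# Made-up snapshots standing in for the previous and current JSON files.
old_prices = {
    "provider/model-a": {"input_cost_per_token": 1e-06},
    "provider/model-b": {"input_cost_per_token": 2e-06},
}
new_prices = {
    "provider/model-a": {"input_cost_per_token": 5e-07},  # pricing update
    "provider/model-c": {"input_cost_per_token": 3e-06},  # new model
}

summary = diff_model_prices(old_prices, new_prices)
print(summary)
```

Entries under `changed` still need a manual look to tell pricing updates apart from feature support changes.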
+
+### 2. Release Notes Structure
+
+Follow this exact structure based on `docs/my-website/release_notes/v1.76.1-stable/index.md`:
+
+```markdown
+---
+title: "v1.76.X-stable - [Key Theme]"
+slug: "v1-76-X"
+date: YYYY-MM-DDTHH:mm:ss
+authors: [standard author block]
+hide_table_of_contents: false
+---
+
+## Deploy this version
+[Docker and pip installation tabs]
+
+## Key Highlights
+[3-5 bullet points of major features]
+
+## Major Changes
+[Critical changes users need to know]
+
+## Performance Improvements
+[Performance-related changes]
+
+## New Models / Updated Models
+[Detailed model tables and provider updates]
+
+## LLM API Endpoints
+[API-related features and fixes]
+
+## Management Endpoints / UI
+[Admin interface and management changes]
+
+## Logging / Guardrail Integrations
+[Observability and security features]
+
+## Performance / Loadbalancing / Reliability improvements
+[Infrastructure improvements]
+
+## General Proxy Improvements
+[Other proxy-related changes]
+
+## New Contributors
+[List of first-time contributors]
+
+## Full Changelog
+[Link to GitHub comparison]
+```
+
+### 3. Categorization Rules
+
+**Performance Improvements:**
+- RPS improvements
+- Memory optimizations
+- CPU usage optimizations
+- Timeout controls
+- Worker configuration
+
+**New Models/Updated Models:**
+- Extract from model_prices_and_context_window.json diff
+- Create tables with: Provider, Model, Context Window, Input Cost, Output Cost, Features
+- Group by provider
+- Note pricing corrections
+- Highlight deprecated models
+
+**Provider Features:**
+- Group by provider (Gemini, OpenAI, Anthropic, etc.)
+- Link to provider docs: `../../docs/providers/[provider_name]`
+- Separate features from bug fixes
+
+**API Endpoints:**
+- Images API
+- Video Generation (if applicable)
+- Responses API
+- Passthrough endpoints
+- General chat completions
+
+**UI/Management:**
+- Authentication changes
+- Dashboard improvements
+- Team management
+- Key management
+
+**Integrations:**
+- Logging providers (Datadog, Braintrust, etc.)
+- Guardrails
+- Cost tracking
+- Observability
+
+### 4. Documentation Linking Strategy
+
+**Link to docs when:**
+- New provider support added
+- Significant feature additions
+- API endpoint changes
+- Integration additions
+
+**Link format:** `../../docs/[category]/[specific_doc]`
+
+**Common doc paths:**
+- `../../docs/providers/[provider]` - Provider-specific docs
+- `../../docs/image_generation` - Image generation
+- `../../docs/video_generation` - Video generation (if exists)
+- `../../docs/response_api` - Responses API
+- `../../docs/proxy/logging` - Logging integrations
+- `../../docs/proxy/guardrails` - Guardrails
+- `../../docs/pass_through/[provider]` - Passthrough endpoints
+
+### 5. Model Table Generation
+
+From git diff analysis, create tables like:
+
+```markdown
+| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
+| -------- | ----- | -------------- | ------------------- | -------------------- | -------- |
+| OpenRouter | `openrouter/openai/gpt-4.1` | 1M | $2.00 | $8.00 | Chat completions with vision |
+```
+
+**Extract from JSON:**
+- `max_input_tokens` → Context Window
+- `input_cost_per_token` × 1,000,000 → Input cost
+- `output_cost_per_token` × 1,000,000 → Output cost
+- `supports_*` fields → Features
+- Special pricing fields (per image, per second) for generation models
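As a rough sketch of the mapping above, the following Python turns one JSON entry into a table row. The helper names, the `example/model-x` entry, and the rule for abbreviating the context window are assumptions made for illustration, not part of LiteLLM.

```python
def format_context_window(tokens: int) -> str:
    """Abbreviate round token counts (1000000 -> '1M', 128000 -> '128K')."""
    if tokens >= 1_000_000 and tokens % 1_000_000 == 0:
        return f"{tokens // 1_000_000}M"
    if tokens >= 1_000 and tokens % 1_000 == 0:
        return f"{tokens // 1_000}K"
    return str(tokens)


def model_table_row(provider: str, model: str, entry: dict) -> str:
    """Build one Markdown table row from a pricing entry."""
    # Per-token costs are scaled to dollars per 1M tokens.
    input_cost = entry["input_cost_per_token"] * 1_000_000
    output_cost = entry["output_cost_per_token"] * 1_000_000
    # Every truthy supports_* flag becomes a feature label.
    features = ", ".join(
        sorted(
            key[len("supports_"):].replace("_", " ")
            for key, value in entry.items()
            if key.startswith("supports_") and value
        )
    )
    context = format_context_window(entry["max_input_tokens"])
    return (
        f"| {provider} | `{model}` | {context} | "
        f"${input_cost:.2f} | ${output_cost:.2f} | {features} |"
    )


# Hypothetical entry; the values are illustrative, not real pricing.
example_entry = {
    "max_input_tokens": 1_000_000,
    "input_cost_per_token": 2e-06,
    "output_cost_per_token": 8e-06,
    "supports_vision": True,
    "supports_function_calling": True,
    "supports_reasoning": False,
}

print(model_table_row("Example", "example/model-x", example_entry))
```

Generation models priced per image or per second would need their own columns, which this sketch does not cover.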
+
+### 6. PR Categorization Logic
+
+**By Keywords in PR Title:**
+- `[Perf]`, `Performance`, `RPS` → Performance Improvements
+- `[Bug]`, `[Bug Fix]`, `Fix` → Bug Fixes section
+- `[Feat]`, `[Feature]`, `Add support` → Features section
+- `[Docs]` → Documentation (usually exclude from main sections)
+- Provider names (Gemini, OpenAI, etc.) → Group under provider
+
+**By PR Content Analysis:**
+- New model additions → New Models section
+- UI changes → Management Endpoints/UI
+- Logging/observability → Logging/Guardrail Integrations
+- Rate limiting/budgets → Performance/Reliability
+- Authentication → Management Endpoints
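The keyword rules above could be applied mechanically along these lines. This is a minimal sketch: the rule order, the case-insensitive substring matching, and the `Uncategorized` fallback are assumptions, and the PR titles are invented.

```python
# (keywords, section) pairs, checked in order; the first match wins.
KEYWORD_RULES = [
    (("[perf]", "performance", "rps"), "Performance Improvements"),
    (("[bug]", "[bug fix]", "fix"), "Bug Fixes"),
    (("[feat]", "[feature]", "add support"), "Features"),
    (("[docs]",), "Documentation"),
]


def categorize_pr(title: str) -> str:
    """Return the release-notes section suggested by keywords in a PR title."""
    lowered = title.lower()
    for keywords, section in KEYWORD_RULES:
        # Plain substring checks can over-match (for example "fix" inside
        # "prefix"), so treat the result as a first pass only.
        if any(keyword in lowered for keyword in keywords):
            return section
    return "Uncategorized"


for title in [
    "[Perf] Reduce router overhead",
    "Fix streaming usage for Gemini",
    "[Feat] Add support for new provider",
    "[Docs] Update quick start",
    "Bump dependencies",
]:
    print(f"{categorize_pr(title):<26} {title}")
```

Titles that land in `Uncategorized` are the ones that need the content-based analysis described above.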
+
+### 7. Writing Style Guidelines
+
+**Tone:**
+- Professional but accessible
+- Focus on user impact
+- Highlight breaking changes clearly
+- Use active voice
+
+**Formatting:**
+- Use consistent markdown formatting
+- Include PR links: `[PR #XXXXX](https://github.com/BerriAI/litellm/pull/XXXXX)`
+- Use code blocks for configuration examples
+- Bold important terms and section headers
+
+**Warnings/Notes:**
+- Add warning boxes for breaking changes
+- Include migration instructions when needed
+- Provide override options for default changes
+
+### 8. Quality Checks
+
+**Before finalizing:**
+- Verify all PR links work
+- Check documentation links are valid
+- Ensure model pricing is accurate
+- Confirm provider names are consistent
+- Review for typos and formatting issues
+
+### 9. Common Patterns to Follow
+
+**Performance Changes:**
+```markdown
+- **+400 RPS Performance Boost** - Description - [PR #XXXXX](link)
+```
+
+**New Models:**
+Always include pricing table and feature highlights
+
+**Breaking Changes:**
+```markdown
+:::warning
+This release has a known issue...
+:::
+```
+
+**Provider Features:**
+```markdown
+- **[Provider Name](../../docs/providers/provider)**
+    - Feature description - [PR #XXXXX](link)
+```
+
+### 10. Missing Documentation Check
+
+**Review for missing docs:**
+- New providers without documentation
+- New API endpoints without examples
+- Complex features without guides
+- Integration setup instructions
+
+**Flag for documentation needs:**
+- New provider integrations
+- Significant API changes
+- Complex configuration options
+- Migration requirements
+
+## Example Command Workflow
+
+```bash
+# 1. Get model changes
+git diff <commit> HEAD -- model_prices_and_context_window.json
+
+# 2. Analyze PR list for categorization
+# 3. Create release notes following template
+# 4. Link to appropriate documentation
+# 5. Review for missing documentation needs
+```
+
+## Output Requirements
+
+- Follow exact markdown structure from reference
+- Include all PR links and contributors
+- Provide accurate model pricing tables
+- Link to relevant documentation
+- Highlight breaking changes with warnings
+- Include deployment instructions
+- End with full changelog link
+
+This process ensures consistent, comprehensive release notes that help users understand changes and upgrade smoothly.
--- a/docs/my-website/docs/completion/input.md
+++ b/docs/my-website/docs/completion/input.md
@@ -65,6 +65,7 @@ Use `litellm.get_supported_openai_params()` for an updated list of params for ea
 | Github | ✅| ✅ | ✅ | ✅| ✅ | ✅ | ✅ | ✅| ✅ | ✅| ✅|| || ✅ | ✅ (model dependent) | ✅ (model dependent) || ||
 | Novita AI| ✅| ✅ || ✅| ✅ | ✅ | ✅ | ✅| ✅ | ✅| || ✅||| |||| ||
 | Bytez | ✅| ✅ || ✅| ✅ | | | ✅|| || || || || || ||
+| OVHCloud AI Endpoints | ✅ | | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | | | | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |

 :::note

--- a/docs/my-website/docs/providers/ovhcloud.md
+++ b/docs/my-website/docs/providers/ovhcloud.md
@@ -0,0 +1,380 @@
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# 🆕 OVHCloud AI Endpoints
+OVHCloud is a leading French cloud provider in Europe, with a focus on data sovereignty and privacy.
+
+You can explore the latest models we have made available in our [catalog](https://endpoints.ai.cloud.ovh.net/catalog).
+
+:::tip
+
+We support ALL OVHCloud AI Endpoints models. Just add the `ovhcloud/` prefix to the model name, i.e. set `model=ovhcloud/<any-model-on-ai-endpoints>`, when sending litellm requests.
+For the complete model catalog, visit https://endpoints.ai.cloud.ovh.net/catalog.
+
+:::
+
+## Sample usage
+### Chat completion
+You can define your API key by setting the `OVHCLOUD_API_KEY` environment variable or by overriding the `api_key` parameter. You can generate a key on the [OVHCloud Manager](https://www.ovh.com/manager).
+
+```python
+from litellm import completion
+import os
+
+# Our API is free but rate-limited for calls without an API key.
+os.environ['OVHCLOUD_API_KEY'] = "your-api-key"
+
+response = completion(
+    model = "ovhcloud/Meta-Llama-3_3-70B-Instruct",
+    messages = [
+        {
+            "role": "user",
+            "content": "Hello, how are you?",
+        }
+    ],
+    max_tokens = 10,
+    stop = [],
+    temperature = 0.2,
+    top_p = 0.9,
+    user = "user",
+    api_key = "your-api-key" # Optional if set through the environment variable.
+)
+
+print(response)
+```
+
+### Streaming
+Set the parameter `stream` to `True` to stream a response.
+```python
+from litellm import completion
+import os
+
+os.environ['OVHCLOUD_API_KEY'] = "your-api-key"
+
+response = completion(
+    model = "ovhcloud/Meta-Llama-3_3-70B-Instruct",
+    messages = [
+        {
+            "role": "user",
+            "content": "Hello, how are you?",
+        }
+    ],
+    max_tokens = 10,
+    stop = [],
+    temperature = 0.2,
+    top_p = 0.9,
+    user = "user",
+    api_key = "your-api-key", # Optional if set through the environment variable.
+    stream = True
+)
+
+for part in response:
+    print(part)
+```
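To assemble the streamed text instead of printing raw chunks, something like the sketch below may help. It assumes each chunk follows the OpenAI-style streaming shape (`chunk.choices[0].delta.content`, which can be `None`); `collect_stream_text` and `fake_chunk` are hypothetical helpers, and the fake chunks stand in for a real response so the snippet runs without an API call.

```python
from types import SimpleNamespace


def collect_stream_text(chunks) -> str:
    """Concatenate the text deltas from an iterable of streaming chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        content = getattr(delta, "content", None)
        if content:  # skip empty or None deltas, such as the final chunk
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as a real chunk."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


demo_stream = [fake_chunk("Hello"), fake_chunk(", world"), fake_chunk(None)]
print(collect_stream_text(demo_stream))
```

With a real call, the `response` returned by `completion(..., stream=True)` would be passed in place of `demo_stream`.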
+
+### Tool Calling
+
+```python
+from litellm import completion
+import json
+
+def get_current_weather(location, unit="celsius"):
+    if unit == "celsius":
+        return {"location": location, "temperature": "22", "unit": "celsius"}
+    else:
+        return {"location": location, "temperature": "72", "unit": "fahrenheit"}
+
+def print_message(role, content, is_tool_call=False, function_name=None):
+    if role == "user":
+        print(f"🧑 User: {content}")
+    elif role == "assistant":
+        if is_tool_call:
+            print(f"🤖 Assistant: I will call the function '{function_name}' to get some information.")
+        else:
+            print(f"🤖 Assistant: {content}")
+    elif role == "tool":
+        print(f"🔧 Tool ({function_name}): {content}")
+    print()
+
+messages = [{"role": "user", "content": "What's the weather like in Paris?"}]
+model = "ovhcloud/Meta-Llama-3_3-70B-Instruct"
+
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "get_current_weather",
+            "description": "Get the current weather in a given location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city and country, e.g. Montréal, Canada",
+                    },
+                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                },
+                "required": ["location"],
+            },
+        },
+    }
+]
+
+print("🌟 Beginning of the conversation")
+
+# Initial user message
+print_message("user", messages[0]["content"])
+
+# First request to the model
+print("📡 Sending first request to the model...")
+response = completion(
+    model=model,
+    messages=messages,
+    tools=tools,
+    tool_choice="auto",
+)
+
+response_message = response.choices[0].message
+tool_calls = response_message.tool_calls
+
+if tool_calls:
+    available_functions = {
+        "get_current_weather": get_current_weather,
+    }
+    
+    # Display the tool calls suggested by the model
+    for tool_call in tool_calls:
+        print_message("assistant", "", is_tool_call=True, function_name=tool_call.function.name)
+        print(f"   📋 Arguments: {tool_call.function.arguments}")
+        print()
+    
+    # Add assistant message with tool calls to the conversation history
+    assistant_message = {
+        "role": "assistant",
+        "content": response_message.content,
+        "tool_calls": [
+            {
+                "id": tool_call.id,
+                "type": "function", 
+                "function": {
+                    "name": tool_call.function.name,
+                    "arguments": tool_call.function.arguments
+                }
+            } for tool_call in tool_calls
+        ]
+    }
+    
+    messages.append(assistant_message)
+    
+    # Execute each tool call and add the results to the conversation history
+    for tool_call in tool_calls:
+        function_name = tool_call.function.name
+        function_to_call = available_functions[function_name]
+        function_args = json.loads(tool_call.function.arguments)
+        
+        print(f"🔧 Executing function '{function_name}'...")
+        function_response = function_to_call(
+            location=function_args.get("location"),
+            # Fall back to celsius when the model omits the optional unit
+            unit=function_args.get("unit", "celsius"),
+        )
+        
+        # Display tool response
+        print_message("tool", json.dumps(function_response, indent=2), function_name=function_name)
+        
+        messages.append({
+            "tool_call_id": tool_call.id,
+            "role": "tool",
+            "name": function_name,
+            "content": json.dumps(function_response),
+        })
+    
+    print("📡 Sending second request to the model with results...")
+    
+    # Second request with function results
+    second_response = completion(
+        model=model,
+        messages=messages
+    )
+    
+    # Display final response
+    final_content = second_response.choices[0].message.content
+    print_message("assistant", final_content)
+    
+else:
+    print("❌ No function call detected")
+    print_message("assistant", response_message.content)
+```
+
+### Vision Example
+
+```python
+from base64 import b64encode
+from mimetypes import guess_type
+import litellm
+
+# Auxiliary function to get b64 images
+def data_url_from_image(file_path):
+    mime_type, _ = guess_type(file_path)
+    if mime_type is None:
+        raise ValueError("Could not determine MIME type of the file")
+
+    with open(file_path, "rb") as image_file:
+        encoded_string = b64encode(image_file.read()).decode("utf-8")
+
+    data_url = f"data:{mime_type};base64,{encoded_string}"
+    return data_url
+
+response = litellm.completion(
+    model = "ovhcloud/Mistral-Small-3.2-24B-Instruct-2506", 
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": "What's in this image?"
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": data_url_from_image("your_image.jpg"),
+                        "format": "image/jpeg"
+                    }
+                }
+            ]
+        }
+    ],
+    stream=False
+)
+
+print(response.choices[0].message.content)
+```
+
+
+### Structured Output
+
+```python
+from litellm import completion
+
+response = completion(
+    model="ovhcloud/Meta-Llama-3_3-70B-Instruct",
+    messages=[
+        {
+            "role": "system",
+            "content": (
+                "You are a specialist in extracting structured data from unstructured text. "
+                "Your task is to identify relevant entities and categories, then format them "
+                "according to the requested structure."
+            ),
+        },
+        {
+            "role": "user",
+            "content": "Room 12 contains books, a desk, and a lamp."
+        },
+    ],
+    response_format={
+        "type": "json_schema",
+        "json_schema": {
+            "title": "data",
+            "name": "data_extraction",
+            "schema": {
+                "type": "object",
+                "properties": {
+                    "section": {"type": "string"},
+                    "products": {
+                        "type": "array",
+                        "items": {"type": "string"}
+                    }
+                },
+                "required": ["section", "products"],
+                "additionalProperties": False
+            },
+            "strict": False
+        }
+    },
+    stream=False
+)
+
+print(response.choices[0].message.content)
+```
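The message content comes back as a JSON string, so it usually needs to be parsed and checked before use. The sketch below shows one way to do that for the schema above; `parse_extraction` is a hypothetical helper, and `sample_content` is a hand-written stand-in, not actual model output.

```python
import json

REQUIRED_KEYS = ("section", "products")


def parse_extraction(content: str) -> dict:
    """Parse the JSON string returned by the model and check the required keys."""
    data = json.loads(content)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")
    return data


# Hand-written example of a response that satisfies the schema.
sample_content = '{"section": "Room 12", "products": ["books", "desk", "lamp"]}'

parsed = parse_extraction(sample_content)
print(parsed["section"], parsed["products"])
```

In the example above, `response.choices[0].message.content` would take the place of `sample_content`.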
+
+### Embeddings
+
+```python
+from litellm import embedding
+
+response = embedding(
+    model="ovhcloud/BGE-M3",
+    input=["sample text to embed", "another sample text to embed"]
+)
+
+print(response.data)
+```
+
+## Usage with LiteLLM Proxy Server
+
+Here's how to call an OVHCloud AI Endpoints model with the LiteLLM Proxy Server:
+
+1. Modify the config.yaml 
+
+  ```yaml
+  model_list:
+    - model_name: my-model
+      litellm_params:
+        model: ovhcloud/<your-model-name>  # add ovhcloud/ prefix to route as OVHCloud provider
+        api_key: api-key                   # API key to send to your model
+  ```
+
+
+2. Start the proxy 
+
+  ```bash
+  $ litellm --config /path/to/config.yaml
+  ```
+
+3. Send Request to LiteLLM Proxy Server
+
+  <Tabs>
+
+  <TabItem value="openai" label="OpenAI Python v1.0.0+">
+
+  ```python
+  import openai
+  client = openai.OpenAI(
+      api_key="sk-1234",             # pass litellm proxy key, if you're using virtual keys
+      base_url="http://0.0.0.0:4000" # litellm-proxy-base url
+  )
+
+  response = client.chat.completions.create(
+      model="my-model",
+      messages = [
+          {
+              "role": "user",
+              "content": "what llm are you"
+          }
+      ],
+  )
+
+  print(response)
+  ```
+  </TabItem>
+
+  <TabItem value="curl" label="curl">
+
+  ```shell
+  curl --location 'http://0.0.0.0:4000/chat/completions' \
+      --header 'Authorization: Bearer sk-1234' \
+      --header 'Content-Type: application/json' \
+      --data '{
+      "model": "my-model",
+      "messages": [
+          {
+          "role": "user",
+          "content": "what llm are you"
+          }
+      ]
+  }'
+  ```
+  </TabItem>
+
+  </Tabs>
--- a/docs/my-website/docs/providers/vllm.md
+++ b/docs/my-website/docs/providers/vllm.md
@@ -8,9 +8,9 @@ LiteLLM supports all models on VLLM.
 | Property | Details |
 |-------|-------|
 | Description | vLLM is a fast and easy-to-use library for LLM inference and serving. [Docs](https://docs.vllm.ai/en/latest/index.html) |
-| Provider Route on LiteLLM | `hosted_vllm/` (for OpenAI compatible server), `vllm/` (for vLLM sdk usage) |
+| Provider Route on LiteLLM | `hosted_vllm/` (for OpenAI compatible server), `vllm/` ([DEPRECATED] for vLLM sdk usage) |
 | Provider Doc | [vLLM ↗](https://docs.vllm.ai/en/latest/index.html) |
-| Supported Endpoints | `/chat/completions`, `/embeddings`, `/completions`, `/rerank` |
+| Supported Endpoints | `/chat/completions`, `/embeddings`, `/completions`, `/rerank`, `/audio/transcriptions` |


 # Quick Start
--- a/docs/my-website/docs/proxy/admin_ui_sso.md
+++ b/docs/my-website/docs/proxy/admin_ui_sso.md
@@ -4,6 +4,10 @@ import TabItem from '@theme/TabItem';

 # ✨ SSO for Admin UI

+:::info
+Starting with v1.76.0, SSO is free for up to 5 users.
+:::
+
 :::info

 ✨ SSO is on LiteLLM Enterprise
--- a/docs/my-website/docs/proxy/forward_client_headers.md
+++ b/docs/my-website/docs/proxy/forward_client_headers.md
@@ -0,0 +1,212 @@
+# Forward Client Headers to LLM API
+
+Control which model groups can forward client headers to the underlying LLM provider APIs.
+
+## Overview
+
+By default, LiteLLM does not forward client headers to LLM provider APIs for security reasons. However, you can selectively enable header forwarding for specific model groups using the `forward_client_headers_to_llm_api` setting.
+
+## Configuration
+
+### Enable Globally
+
+```yaml
+general_settings:
+  forward_client_headers_to_llm_api: true
+```
+
+### Enable for a Model Group
+
+Add the `forward_client_headers_to_llm_api` setting under `model_group_settings` in your configuration:
+
+```yaml
+model_list:
+  - model_name: gpt-4o-mini
+    litellm_params:
+      model: openai/gpt-4o-mini
+      api_key: "your-api-key"
+  - model_name: "wildcard-models/*"
+    litellm_params:
+      model: "openai/*"
+      api_key: "your-api-key"
+
+litellm_settings:
+  model_group_settings:
+    forward_client_headers_to_llm_api:
+      - gpt-4o-mini
+      - wildcard-models/*
+```
+
+## Supported Model Patterns
+
+The configuration supports various model matching patterns:
+
+### 1. Exact Model Names
+```yaml
+forward_client_headers_to_llm_api:
+  - gpt-4o-mini
+  - claude-3-sonnet
+```
+
+### 2. Wildcard Patterns
+```yaml
+forward_client_headers_to_llm_api:
+  - "openai/*"          # All OpenAI models
+  - "anthropic/*"       # All Anthropic models
+  - "wildcard-group/*"  # All models in wildcard-group
+```
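To get a feel for how exact names and wildcard patterns behave, the sketch below uses Python's `fnmatch` as a stand-in matcher. It is an illustration only: LiteLLM's own matching logic may differ, and `forwards_headers` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def forwards_headers(model_group: str, patterns: list) -> bool:
    """Return True when the model group matches any configured entry."""
    # fnmatchcase treats "*" as "any run of characters" and compares
    # case-sensitively; entries without "*" only match exactly.
    return any(fnmatchcase(model_group, pattern) for pattern in patterns)


configured = ["gpt-4o-mini", "wildcard-models/*"]

for name in ["gpt-4o-mini", "wildcard-models/gpt-4o", "standard-gpt-4"]:
    print(name, forwards_headers(name, configured))
```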
+
+### 3. Team Model Aliases
+If your team has model aliases configured, the forwarding will work with both the original model name and the alias.
+
+## Forwarded Headers
+
+When enabled for a model group, LiteLLM forwards the following types of headers:
+
+### Custom Headers (x- prefix)
+- Any header starting with `x-` (except `x-stainless-*` which can cause OpenAI SDK issues)
+- Examples: `x-custom-header`, `x-request-id`, `x-trace-id`
+
+### Provider-Specific Headers
+- **Anthropic**: `anthropic-beta` headers
+- **OpenAI**: `openai-organization` (when enabled via `forward_openai_org_id: true`)
+
+### User Information Headers (Optional)
+When `add_user_information_to_llm_headers` is enabled, LiteLLM adds:
+- `x-litellm-user-id`
+- `x-litellm-org-id`
+- Other user metadata as `x-litellm-*` headers
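The custom and provider-specific rules above can be approximated with a small filter. This is a sketch of the described behaviour under stated assumptions, not LiteLLM's implementation: `select_forwarded_headers` is hypothetical, and it ignores `openai-organization` and the optional `x-litellm-*` headers.

```python
def select_forwarded_headers(headers: dict) -> dict:
    """Keep x-* headers (except x-stainless-*) and anthropic-beta."""
    forwarded = {}
    for name, value in headers.items():
        lowered = name.lower()  # header names are case-insensitive
        if lowered.startswith("x-stainless"):
            continue  # excluded because it can cause OpenAI SDK issues
        if lowered.startswith("x-") or lowered == "anthropic-beta":
            forwarded[name] = value
    return forwarded


client_headers = {
    "Authorization": "Bearer your-key",
    "Content-Type": "application/json",
    "x-trace-id": "abc123",
    "x-stainless-lang": "python",
    "anthropic-beta": "tools-2024-04-04",
}

print(select_forwarded_headers(client_headers))
```

Note that `Authorization` is dropped here, in line with the guidance that API keys should never travel in forwarded headers.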
+
+## Security Considerations
+
+⚠️ **Important Security Notes:**
+
+1. **Sensitive Data**: Only enable header forwarding for trusted model groups, as headers may contain sensitive information
+2. **API Keys**: Never include API keys or secrets in forwarded headers
+3. **PII**: Be cautious about forwarding headers that might contain personally identifiable information
+4. **Provider Limits**: Some providers have restrictions on custom headers
+
+## Example Use Cases
+
+### 1. Request Tracing
+Forward tracing headers to track requests across your system:
+
+```bash
+curl -X POST "https://your-proxy.com/v1/chat/completions" \
+  -H "Authorization: Bearer your-key" \
+  -H "x-trace-id: abc123" \
+  -H "x-request-source: mobile-app" \
+  -d '{
+    "model": "gpt-4o-mini",
+    "messages": [{"role": "user", "content": "Hello"}]
+  }'
+```
+
+### 2. Custom Metadata
+Pass custom metadata to your LLM provider:
+
+```bash
+curl -X POST "https://your-proxy.com/v1/chat/completions" \
+  -H "Authorization: Bearer your-key" \
+  -H "x-customer-id: customer-123" \
+  -H "x-environment: production" \
+  -d '{
+    "model": "gpt-4o-mini", 
+    "messages": [{"role": "user", "content": "Hello"}]
+  }'
+```
+
+### 3. Anthropic Beta Features
+Enable beta features for Anthropic models:
+
+```bash
+curl -X POST "https://your-proxy.com/v1/chat/completions" \
+  -H "Authorization: Bearer your-key" \
+  -H "anthropic-beta: tools-2024-04-04" \
+  -d '{
+    "model": "claude-3-sonnet",
+    "messages": [{"role": "user", "content": "Hello"}]
+  }'
+```
+
+## Complete Configuration Example
+
+```yaml
+model_list:
+  # Fixed model with header forwarding
+  - model_name: byok-fixed-gpt-4o-mini
+    litellm_params:
+      model: openai/gpt-4o-mini
+      api_base: "https://your-openai-endpoint.com"
+      api_key: "your-api-key"
+      
+  # Wildcard model group with header forwarding
+  - model_name: "byok-wildcard/*"
+    litellm_params:
+      model: "openai/*"
+      api_base: "https://your-openai-endpoint.com"
+      api_key: "your-api-key"
+      
+  # Standard model without header forwarding
+  - model_name: standard-gpt-4
+    litellm_params:
+      model: openai/gpt-4
+      api_key: "your-api-key"
+
+litellm_settings:
+  # Enable user info headers globally (optional)
+  add_user_information_to_llm_headers: true
+  
+  model_group_settings:
+    forward_client_headers_to_llm_api:
+      - byok-fixed-gpt-4o-mini
+      - byok-wildcard/*
+      # Note: standard-gpt-4 is NOT included, so no headers forwarded
+
+general_settings:
+  # Enable OpenAI organization header forwarding (optional)
+  forward_openai_org_id: true
+```
+
+## Testing Header Forwarding
+
+To test if headers are being forwarded:
+
+1. **Enable Debug Logging**: Set `set_verbose: true` in your config
+2. **Check Provider Logs**: Monitor your LLM provider's request logs
+3. **Use Webhook Sites**: For testing, you can use webhook.site URLs as api_base to see forwarded headers
+
+## Troubleshooting
+
+### Headers Not Being Forwarded
+
+1. **Check Model Name**: Ensure the model name in your request matches the configuration
+2. **Verify Pattern Matching**: Wildcard patterns must match exactly
+3. **Review Logs**: Enable verbose logging to see header processing
+
+### Provider Errors
+
+1. **Invalid Headers**: Some providers reject unknown headers
+2. **Header Limits**: Providers may have limits on header count/size
+3. **Authentication**: Ensure forwarded headers don't conflict with authentication
+
+## Related Features
+
+- [Request Headers](./request_headers.md) - Complete list of supported request headers
+- [Response Headers](./response_headers.md) - Headers returned by LiteLLM
+- [Team Model Aliases](./team_model_add.md) - Configure model aliases for teams
+- [Model Access Control](./model_access.md) - Control which users can access which models
+
+## API Reference
+
+The header forwarding is controlled by the `ModelGroupSettings` configuration:
+
+```python
+class ModelGroupSettings(BaseModel):
+    forward_client_headers_to_llm_api: Optional[List[str]] = None
+```
+
+Where each string in the list can be:
+- An exact model name (e.g., `"gpt-4o-mini"`)
+- A wildcard pattern (e.g., `"openai/*"`)
+- A model group name (e.g., `"my-model-group/*"`)
--- a/docs/my-website/docs/proxy/guardrails/noma_security.md
+++ b/docs/my-website/docs/proxy/guardrails/noma_security.md
@@ -135,6 +135,7 @@ guardrails:
      # application_id: "my-app"
      # monitor_mode: false
      # block_failures: true
+      # anonymize_input: false
 ```

 ### Required Parameters
@@ -147,6 +148,7 @@ guardrails:
 - **`application_id`**: Your application identifier (defaults to `"litellm"`)
 - **`monitor_mode`**: If `true`, logs violations without blocking (defaults to `false`)
 - **`block_failures`**: If `true`, blocks requests when guardrail API failures occur (defaults to `true`)
+- **`anonymize_input`**: If `true`, replaces sensitive content with an anonymized version (defaults to `false`)

 ## Environment Variables

@@ -158,6 +160,7 @@ export NOMA_API_BASE="https://api.noma.security/"   # Optional
 export NOMA_APPLICATION_ID="my-app"                 # Optional
 export NOMA_MONITOR_MODE="false"                    # Optional
 export NOMA_BLOCK_FAILURES="true"                   # Optional
+export NOMA_ANONYMIZE_INPUT="false"                 # Optional
 ```

 ## Advanced Configuration
@@ -190,6 +193,20 @@ guardrails:
      block_failures: false  # Allow requests to proceed if guardrail API fails
 ```

+### Content Anonymization
+
+Enable anonymization to replace sensitive content instead of blocking:
+
+```yaml
+guardrails:
+  - guardrail_name: "noma-anonymize"
+    litellm_params:
+      guardrail: noma
+      mode: "pre_call"
+      api_key: os.environ/NOMA_API_KEY
+      anonymize_input: true  # Replace sensitive data with anonymized version
+```
+
 ### Multiple Guardrails

 Apply different configurations for input and output:
--- a/docs/my-website/docs/proxy/guardrails/tool_permission.md
+++ b/docs/my-website/docs/proxy/guardrails/tool_permission.md
@@ -0,0 +1,153 @@
+import Image from '@theme/IdealImage';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# Tool Permission Guardrail
+
+LiteLLM provides a Tool Permission Guardrail that lets you control which **tool calls** a model is allowed to invoke, using configurable allow/deny rules. This offers fine-grained, provider-agnostic control over tool execution (e.g., OpenAI Chat Completions `tool_calls`, Anthropic Messages `tool_use`, MCP tools).
+
+## Quick Start
+### 1. Define Guardrails on your LiteLLM config.yaml 
+
+Define your guardrails under the `guardrails` section:
+```yaml
+guardrails:
+  - guardrail_name: "tool-permission-guardrail"
+    litellm_params:
+      guardrail: tool_permission
+      mode: "post_call"
+      rules:
+        - id: "allow_bash"
+          tool_name: "Bash"
+          decision: "allow"
+        - id: "allow_github_mcp"
+          tool_name: "mcp__github_*"
+          decision: "allow"
+        - id: "allow_aws_documentation"
+          tool_name: "mcp__aws-documentation_*_documentation"
+          decision: "allow"
+        - id: "deny_read_commands"
+          tool_name: "Read"
+          decision: "deny"
+      default_action: "deny"  # Fallback when no rule matches: "allow" or "deny"
+      on_disallowed_action: "block"  # How to handle disallowed tools: "block" or "rewrite"
+```
+
+#### Rule Structure
+
+```yaml
+- id: "unique_rule_id"           # Unique identifier for the rule
+  tool_name: "pattern"           # Tool name or pattern to match
+  decision: "allow"              # "allow" or "deny"
+```
+
+#### Supported values for `mode`
+
+- `pre_call` Run **before** LLM call, on **input**
+- `post_call` Run **after** LLM call, on **input & output**
+
+### 2. Start the Proxy
+
+```shell
+litellm --config config.yaml --port 4000
+```
+
+## Examples
+
+<Tabs>
+<TabItem value="block" label="Block Request">
+
+**Block request**
+
+```bash
+# Test
+curl -X POST "http://localhost:4000/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer your-master-key-here" \
+  -d '{
+    "model": "gpt-5-mini",
+    "messages": [{"role": "user","content": "What is the weather like in Tokyo today?"}],
+    "tools": [
+      {
+        "type":"function",
+        "function": {
+          "name":"get_current_weather",
+          "description": "Get the current weather in a given location"
+        }
+      }
+    ]
+  }'
+```
+
+**Expected response (Denied):**
+
+```json
+{
+  "error":
+    {
+      "message": "Guardrail raised an exception, Guardrail: tool-permission-guardrail, Message: Tool 'get_current_weather' denied by default action",
+      "type": "None",
+      "param": "None",
+      "code": "500"
+    }
+}
+```
+
+</TabItem>
+<TabItem value="rewrite" label="Rewrite Request">
+
+**Rewrite request**
+
+```bash
+# Test
+curl -X POST "http://localhost:4000/v1/chat/completions" \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer your-master-key-here" \
+  -d '{
+    "model": "gpt-5-mini",
+    "messages": [{"role": "user","content": "What is the weather like in Tokyo today?"}],
+    "tools": [
+      {
+        "type":"function",
+        "function": {
+          "name":"get_current_weather",
+          "description": "Get the current weather in a given location"
+        }
+      }
+    ]
+  }'
+```
+
+**Expected response:**
+
+```json
+{
+	"id": "chatcmpl-xxxxxxxxxxxxxxx",
+	"created": 1757716050,
+	"model": "gpt-5-mini-2025-08-07",
+	"object": "chat.completion",
+	"choices": [
+		{
+			"finish_reason": "stop",
+			"index": 0,
+			"message": {
+				"content": "I can’t fetch live weather — I don’t have real‑time internet access.",
+				"role": "assistant",
+				"annotations": []
+			},
+			"provider_specific_fields": {}
+		}
+	],
+	"usage": {
+		"prompt_tokens": 112,
+		"total_tokens": 735,
+		"completion_tokens_details": {
+			"reasoning_tokens": 384
+		}
+	},
+	"service_tier": "default"
+}
+```
+
+</TabItem>
+</Tabs>
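
Reviewer note: the rule semantics documented in `tool_permission.md` (pattern match on `tool_name`, fall back to `default_action`) can be sketched as below. This is only an illustration under stated assumptions: it assumes shell-style glob matching and first-match-wins ordering, neither of which the docs above confirm, and `RULES` and `decide` are hypothetical names, not LiteLLM APIs.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory copy of some rules from the YAML example above.
# This is not LiteLLM's internal representation.
RULES = [
    {"id": "allow_bash", "tool_name": "Bash", "decision": "allow"},
    {"id": "allow_github_mcp", "tool_name": "mcp__github_*", "decision": "allow"},
    {"id": "deny_read_commands", "tool_name": "Read", "decision": "deny"},
]


def decide(tool_name, rules=RULES, default_action="deny"):
    """Return (decision, rule_id) for the first rule whose pattern matches.

    Assumption: rules are checked in order and the first match wins.
    When no rule matches, default_action is returned with rule_id None.
    """
    for rule in rules:
        # Assumption: "*" behaves like a shell-style wildcard.
        if fnmatchcase(tool_name, rule["tool_name"]):
            return rule["decision"].lower(), rule["id"]
    return default_action, None
```

Under these assumptions, `decide("get_current_weather")` falls through to the default action, which is the situation the "Block Request" tab illustrates.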
--- a/docs/my-website/docs/proxy/request_headers.md
+++ b/docs/my-website/docs/proxy/request_headers.md
@ -2,6 +2,10 @@

 Special headers that are supported by LiteLLM.

+## Header Forwarding
+
+By default, LiteLLM does not forward client headers to LLM provider APIs. However, you can selectively enable header forwarding for specific model groups. [Learn more about configuring header forwarding](./forward_client_headers.md).
+
 ## LiteLLM Headers

 `x-litellm-timeout` Optional[float]: The timeout for the request in seconds.
@ -21,11 +25,15 @@ Special headers that are supported by LiteLLM.
 `anthropic-version` Optional[str]: The version of the Anthropic API to use.  
 `anthropic-beta` Optional[str]: The beta version of the Anthropic API to use.
    - For `/v1/messages` endpoint, this will always be forward the header to the underlying model.
-    - For `/chat/completions` endpoint, this will only be forwarded if `forward_client_headers_to_llm_api` is true.
+    - For `/chat/completions` endpoint, this will only be forwarded if the model is configured in `forward_client_headers_to_llm_api`. [Learn more](./forward_client_headers.md)

 ## OpenAI Headers

 `openai-organization` Optional[str]: The organization to use for the OpenAI API. (currently needs to be enabled via `general_settings::forward_openai_org_id: true`)

+## Custom Headers
+
+Custom headers starting with `x-` can be forwarded to LLM provider APIs when the model is configured in `forward_client_headers_to_llm_api`. [Learn more about header forwarding configuration](./forward_client_headers.md).
+


--- a/docs/my-website/docs/tutorials/openweb_ui.md
+++ b/docs/my-website/docs/tutorials/openweb_ui.md
@ -89,16 +89,20 @@ To track spend and usage for each Open WebUI user, configure both Open WebUI and

 2. **Configure LiteLLM to Parse User Headers**
   
-  Add the following to your LiteLLM `config.yaml` to specify a header to use for user tracking:
+  Add the following to your LiteLLM `config.yaml` to specify the request header mapping for user tracking:

  ```yaml
  general_settings:
-      user_header_name: X-OpenWebUI-User-Id
+    user_header_mappings:
+      - header_name: X-OpenWebUI-User-Id
+        litellm_user_role: internal_user
+      - header_name: X-OpenWebUI-User-Email
+        litellm_user_role: customer
  ```

  ⓘ Available tracking options

-  You can use any of the following headers for `user_header_name`:
+  You can use any of the following headers as `header_name` in `user_header_mappings`:
  - `X-OpenWebUI-User-Id`
  - `X-OpenWebUI-User-Email`
  - `X-OpenWebUI-User-Name`
@ -109,6 +113,12 @@ To track spend and usage for each Open WebUI user, configure both Open WebUI and
  - Users can modify their own usernames
  - Administrators can modify both usernames and emails of any account

+This video walks through how to map Open WebUI headers to LiteLLM user roles:
+
+<iframe src="https://www.loom.com/embed/a1b6a4635fc0478ba4fd34cae16e2ffd?sid=791c2dcc-7e65-45be-bf7f-27d2601c123e" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen width="840" height="500"></iframe>
+
+<br/>
+<br/>


 ## Render `thinking` content on Open WebUI
--- a/docs/my-website/release_notes/v1.77.2-stable/index.md
+++ b/docs/my-website/release_notes/v1.77.2-stable/index.md
@ -0,0 +1,155 @@
+---
+title: "v1.77.2-stable - Bedrock Batches API"
+slug: "v1-77-2"
+date: 2025-09-13T10:00:00
+authors:
+  - name: Krrish Dholakia
+    title: CEO, LiteLLM
+    url: https://www.linkedin.com/in/krish-d/
+    image_url: https://pbs.twimg.com/profile_images/1298587542745358340/DZv3Oj-h_400x400.jpg
+  - name: Ishaan Jaffer
+    title: CTO, LiteLLM
+    url: https://www.linkedin.com/in/reffajnaahsi/
+    image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
+
+hide_table_of_contents: false
+---
+
+import Image from '@theme/IdealImage';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Deploy this version
+
+<Tabs>
+<TabItem value="docker" label="Docker">
+
+``` showLineNumbers title="docker run litellm"
+docker run \
+-e STORE_MODEL_IN_DB=True \
+-p 4000:4000 \
+ghcr.io/berriai/litellm:v1.77.2
+```
+</TabItem>
+
+<TabItem value="pip" label="Pip">
+
+``` showLineNumbers title="pip install litellm"
+pip install litellm==1.77.2
+```
+
+</TabItem>
+</Tabs>
+
+---
+
+## Key Highlights
+
+- **Bedrock Batches API** - Support for creating Batch Inference Jobs on Bedrock using LiteLLM's unified batch API (OpenAI compatible)
+- **Qwen API Tiered Pricing** - Cost tracking support for Dashscope (Qwen) models with multiple pricing tiers
+
+## New Models / Updated Models
+
+#### New Model Support
+
+| Provider    | Model                           | Context Window | Pricing ($/1M tokens) | Features |
+| ----------- | ------------------------------- | -------------- | --------------------- | -------- |
+| DeepInfra   | `deepinfra/deepseek-ai/DeepSeek-R1` | 164K | **Input:** $0.70<br/>**Output:** $2.40 | Chat completions, tool calling |
+| Heroku      | `heroku/claude-4-sonnet`        | 8K | Contact provider for pricing | Function calling, tool choice |
+| Heroku      | `heroku/claude-3-7-sonnet`      | 8K | Contact provider for pricing | Function calling, tool choice |
+| Heroku      | `heroku/claude-3-5-sonnet-latest` | 8K | Contact provider for pricing | Function calling, tool choice |
+| Heroku      | `heroku/claude-3-5-haiku`       | 4K | Contact provider for pricing | Function calling, tool choice |
+| Dashscope   | `dashscope/qwen-plus-latest`    | 1M | **Tiered Pricing:**<br/>• 0-256K tokens: $0.40 / $1.20<br/>• 256K-1M tokens: $1.20 / $3.60 | Function calling, reasoning |
+| Dashscope   | `dashscope/qwen3-max-preview`   | 262K | **Tiered Pricing:**<br/>• 0-32K tokens: $1.20 / $6.00<br/>• 32K-128K tokens: $2.40 / $12.00<br/>• 128K-252K tokens: $3.00 / $15.00 | Function calling, reasoning |
+| Dashscope   | `dashscope/qwen-flash`          | 1M | **Tiered Pricing:**<br/>• 0-256K tokens: $0.05 / $0.40<br/>• 256K-1M tokens: $0.25 / $2.00 | Function calling, reasoning |
+| Dashscope   | `dashscope/qwen3-coder-plus`    | 1M | **Tiered Pricing:**<br/>• 0-32K tokens: $1.00 / $5.00<br/>• 32K-128K tokens: $1.80 / $9.00<br/>• 128K-256K tokens: $3.00 / $15.00<br/>• 256K-1M tokens: $6.00 / $60.00 | Function calling, reasoning, caching |
+| Dashscope   | `dashscope/qwen3-coder-flash`   | 1M | **Tiered Pricing:**<br/>• 0-32K tokens: $0.30 / $1.50<br/>• 32K-128K tokens: $0.50 / $2.50<br/>• 128K-256K tokens: $0.80 / $4.00<br/>• 256K-1M tokens: $1.60 / $9.60 | Function calling, reasoning, caching |
+
+---
+
+#### Features
+
+- **[Bedrock](../../docs/providers/bedrock_batches)**
+    - Bedrock Batches API - batch processing support with file upload and request transformation - [PR #14518](https://github.com/BerriAI/litellm/pull/14518), [PR #14522](https://github.com/BerriAI/litellm/pull/14522)
+- **[VLLM](../../docs/providers/vllm)**
+    - Added transcription endpoint support - [PR #14523](https://github.com/BerriAI/litellm/pull/14523)
+- **[Ollama](../../docs/providers/ollama)**
+    - `ollama_chat/` - images, thinking, and content as list handling - [PR #14523](https://github.com/BerriAI/litellm/pull/14523)
+- **General**
+    - New debug flag for detailed request/response logging [PR #14482](https://github.com/BerriAI/litellm/pull/14482)
+
+#### Bug Fixes
+
+- **[Azure OpenAI](../../docs/providers/azure)**
+    - Fixed extra_body injection causing payload rejection in image generation - [PR #14475](https://github.com/BerriAI/litellm/pull/14475)
+- **[LM Studio](../../docs/providers/lm-studio)**
+    - Resolved illegal Bearer header value issue - [PR #14512](https://github.com/BerriAI/litellm/pull/14512)
+
+---
+
+## LLM API Endpoints
+
+#### Bug Fixes
+
+- **[/messages](../../docs/anthropic_unified)**
+    - Don't send content block after message w/ finish reason + usage block - [PR #14477](https://github.com/BerriAI/litellm/pull/14477)
+- **[/generateContent](../../docs/generateContent)**
+    - Gemini CLI Integration - Fixed token count errors - [PR #14451](https://github.com/BerriAI/litellm/pull/14451), [PR #14417](https://github.com/BerriAI/litellm/pull/14417)
+
+---
+
+## Spend Tracking, Budgets and Rate Limiting
+
+#### Features
+
+- **[Qwen API Tiered Pricing](../../docs/providers/dashscope)** - Added comprehensive tiered cost tracking for Dashscope/Qwen models - [PR #14471](https://github.com/BerriAI/litellm/pull/14471), [PR #14479](https://github.com/BerriAI/litellm/pull/14479)
+
+#### Bug Fixes
+
+- **Provider Budgets** - Fixed provider budget calculations - [PR #14459](https://github.com/BerriAI/litellm/pull/14459)
+
+---
+
+## Management Endpoints / UI
+
+#### Features
+
+- **User Headers Mapping** - New X-LiteLLM Users mapping feature for enhanced user tracking - [PR #14485](https://github.com/BerriAI/litellm/pull/14485)
+- **Key Unblocking** - Support for hashed tokens in `/key/unblock` endpoint - [PR #14477](https://github.com/BerriAI/litellm/pull/14477)
+- **Model Group Header Forwarding** - Enhanced wildcard model support with documentation - [PR #14528](https://github.com/BerriAI/litellm/pull/14528)
+
+#### Bug Fixes
+
+- **Log Tab Key Alias** - Fixed filtering inaccuracies for failed logs - [PR #14469](https://github.com/BerriAI/litellm/pull/14469), [PR #14529](https://github.com/BerriAI/litellm/pull/14529)
+
+---
+
+## Logging / Guardrail Integrations
+
+#### Features
+
+- **Noma Integration** - Added non-blocking monitor mode with anonymize input support - [PR #14401](https://github.com/BerriAI/litellm/pull/14401)
+
+---
+
+## Performance / Loadbalancing / Reliability improvements
+
+#### Performance
+- Removed dynamic creation of static values - [PR #14538](https://github.com/BerriAI/litellm/pull/14538)
+- Using `_PROXY_MaxParallelRequestsHandler_v3` by default for optimal throughput - [PR #14450](https://github.com/BerriAI/litellm/pull/14450)
+- Improved execution context propagation into logging tasks - [PR #14455](https://github.com/BerriAI/litellm/pull/14455)
+
+---
+
+
+
+## New Contributors
+* @Sameerlite made their first contribution in [PR #14460](https://github.com/BerriAI/litellm/pull/14460)
+* @holzman made their first contribution in [PR #14459](https://github.com/BerriAI/litellm/pull/14459)
+* @sashank5644 made their first contribution in [PR #14469](https://github.com/BerriAI/litellm/pull/14469)
+* @TomAlon made their first contribution in [PR #14401](https://github.com/BerriAI/litellm/pull/14401)
+* @AlexsanderHamir made their first contribution in [PR #14538](https://github.com/BerriAI/litellm/pull/14538)
+
+---
+
+## **[Full Changelog](https://github.com/BerriAI/litellm/compare/v1.77.1.dev.2...v1.77.2.dev)**
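
Reviewer note: the tiered pricing rows in the release notes above can be read as a lookup on token count. The sketch below is a rough illustration, not LiteLLM's cost-tracking code. It assumes each price pair is input / output in $ per 1M tokens, that the tier is chosen by the prompt token count, and that the chosen tier's rates apply to the whole request. `QWEN_FLASH_TIERS` and `tiered_cost` are hypothetical names.

```python
# Hypothetical tier table for dashscope/qwen-flash, copied from the table above:
# (upper bound of prompt tokens, input $/1M tokens, output $/1M tokens)
QWEN_FLASH_TIERS = [
    (256_000, 0.05, 0.40),
    (1_000_000, 0.25, 2.00),
]


def tiered_cost(prompt_tokens, completion_tokens, tiers=QWEN_FLASH_TIERS):
    """Return the request cost in dollars.

    Assumption: the first tier whose upper bound covers the prompt size
    sets both the input and the output rate for the request.
    """
    for upper_bound, input_rate, output_rate in tiers:
        if prompt_tokens <= upper_bound:
            total = prompt_tokens * input_rate + completion_tokens * output_rate
            return total / 1_000_000
    raise ValueError("prompt exceeds the largest pricing tier")
```

If the real implementation selects tiers differently (for example, by total tokens), only the comparison inside the loop would need to change.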
--- a/docs/my-website/sidebars.js
+++ b/docs/my-website/sidebars.js
@ -49,6 +49,7 @@ const sidebars = {
          "proxy/guardrails/secret_detection",
          "proxy/guardrails/custom_guardrail",
          "proxy/guardrails/prompt_injection",
+          "proxy/guardrails/tool_permission",
        ].sort(),
      ],
    },
@ -141,6 +142,7 @@ const sidebars = {
            "proxy/clientside_auth",
            "proxy/request_headers",
            "proxy/response_headers",
+            "proxy/forward_client_headers",
            "proxy/model_discovery",
          ],
        },
@ -487,7 +489,8 @@ const sidebars = {
        "providers/bytez",
        "providers/heroku",
        "providers/oci",
-        "providers/datarobot",  
+        "providers/datarobot",
+        "providers/ovhcloud",  
      ],
    },
    {
--- a/enterprise/litellm_enterprise/types/proxy/proxy_server.py
+++ b/enterprise/litellm_enterprise/types/proxy/proxy_server.py
@ -1,4 +1,6 @@
-from typing import Literal, TypedDict
+from typing import Literal
+
+from typing_extensions import TypedDict


 class CustomAuthSettings(TypedDict):
--- a/litellm-js/spend-logs/package-lock.json
+++ b/litellm-js/spend-logs/package-lock.json
@ -6,7 +6,7 @@
    "": {
      "dependencies": {
        "@hono/node-server": "^1.10.1",
-        "hono": "^4.6.5"
+        "hono": "^4.9.7"
      },
      "devDependencies": {
        "@types/node": "^20.11.17",
@ -463,9 +463,10 @@
      }
    },
    "node_modules/hono": {
-      "version": "4.6.5",
-      "resolved": "https://registry.npmjs.org/hono/-/hono-4.6.5.tgz",
-      "integrity": "sha512-qsmN3V5fgtwdKARGLgwwHvcdLKursMd+YOt69eGpl1dUCJb8mCd7hZfyZnBYjxCegBG7qkJRQRUy2oO25yHcyQ==",
+      "version": "4.9.7",
+      "resolved": "https://registry.npmjs.org/hono/-/hono-4.9.7.tgz",
+      "integrity": "sha512-t4Te6ERzIaC48W3x4hJmBwgNlLhmiEdEE5ViYb02ffw4ignHNHa5IBtPjmbKstmtKa8X6C35iWwK4HaqvrzG9w==",
+      "license": "MIT",
      "engines": {
        "node": ">=16.9.0"
      }
--- a/litellm-js/spend-logs/package.json
+++ b/litellm-js/spend-logs/package.json
@ -4,7 +4,7 @@
  },
  "dependencies": {
    "@hono/node-server": "^1.10.1",
-    "hono": "^4.6.5"
+    "hono": "^4.9.7"
  },
  "devDependencies": {
    "@types/node": "^20.11.17",
--- a/litellm/init.py
+++ b/litellm/init.py
@ -241,6 +241,7 @@ gradient_ai_api_key: Optional[str] = None
 nebius_key: Optional[str] = None
 heroku_key: Optional[str] = None
 cometapi_key: Optional[str] = None
+ovhcloud_key: Optional[str] = None
 common_cloud_provider_auth_params: dict = {
    "params": ["project", "region_name", "token"],
    "providers": ["vertex_ai", "bedrock", "watsonx", "azure", "vertex_ai_beta"],
@ -520,6 +521,8 @@ cometapi_models: Set = set()
 oci_models: Set = set()
 vercel_ai_gateway_models: Set = set()
 volcengine_models: Set = set()
+ovhcloud_models: Set = set()
+ovhcloud_embedding_models: Set = set()


 def is_bedrock_pricing_only_model(key: str) -> bool:
@ -734,6 +737,10 @@ def add_known_models():
            oci_models.add(key)
        elif value.get("litellm_provider") == "volcengine":
            volcengine_models.add(key)
+        elif value.get("litellm_provider") == "ovhcloud":
+            ovhcloud_models.add(key)
+        elif value.get("litellm_provider") == "ovhcloud-embedding-models":
+            ovhcloud_embedding_models.add(key)


 add_known_models()
@ -828,6 +835,7 @@ model_list = list(
    | heroku_models
    | vercel_ai_gateway_models
    | volcengine_models
+    | ovhcloud_models
 )

 model_list_set = set(model_list)
@ -909,6 +917,7 @@ models_by_provider: dict = {
    "cometapi": cometapi_models,
    "oci": oci_models,
    "volcengine": volcengine_models,
+    "ovhcloud": ovhcloud_models | ovhcloud_embedding_models,
 }

 # mapping for those models which have larger equivalents
@ -943,6 +952,7 @@ all_embedding_models = (
    | fireworks_ai_embedding_models
    | nebius_embedding_models
    | sambanova_embedding_models
+    | ovhcloud_embedding_models
 )

 ####### IMAGE GENERATION MODELS ###################
@ -1255,6 +1265,8 @@ from .llms.morph.chat.transformation import MorphChatConfig
 from .llms.lambda_ai.chat.transformation import LambdaAIChatConfig
 from .llms.hyperbolic.chat.transformation import HyperbolicChatConfig
 from .llms.vercel_ai_gateway.chat.transformation import VercelAIGatewayConfig
+from .llms.ovhcloud.chat.transformation import OVHCloudChatConfig
+from .llms.ovhcloud.embedding.transformation import OVHCloudEmbeddingConfig
 from .main import *  # type: ignore
 from .integrations import *
 from .llms.custom_httpx.async_client_cleanup import close_litellm_async_clients
--- a/litellm/completion_extras/litellm_responses_transformation/handler.py
+++ b/litellm/completion_extras/litellm_responses_transformation/handler.py
@ -2,7 +2,9 @@
 Handler for transforming /chat/completions api requests to litellm.responses requests
 """

-from typing import TYPE_CHECKING, Any, Coroutine, TypedDict, Union
+from typing import TYPE_CHECKING, Any, Coroutine, Union
+
+from typing_extensions import TypedDict

 if TYPE_CHECKING:
    from litellm import CustomStreamWrapper, LiteLLMLoggingObj, ModelResponse
--- a/litellm/constants.py
+++ b/litellm/constants.py
@ -313,6 +313,7 @@ LITELLM_CHAT_PROVIDERS = [
    "morph",
    "lambda_ai",
    "vercel_ai_gateway",
+    "ovhcloud",
 ]

 LITELLM_EMBEDDING_PROVIDERS_SUPPORTING_INPUT_ARRAY_OF_TOKENS = [
@ -1023,6 +1024,7 @@ SENTRY_DENYLIST = [
    "FIREWORKS_API_KEY",
    "FIREWORKS_AI_API_KEY",
    "FIREWORKSAI_API_KEY",
+    "OVHCLOUD_API_KEY",
    # Database and Connection Strings
    "database_url",
    "redis_url",
--- a/litellm/endpoints/speech/speech_to_completion_bridge/handler.py
+++ b/litellm/endpoints/speech/speech_to_completion_bridge/handler.py
@ -2,7 +2,9 @@
 Handler for transforming /chat/completions api requests to litellm.responses requests
 """

-from typing import TYPE_CHECKING, Optional, TypedDict, Union
+from typing import TYPE_CHECKING, Optional, Union
+
+from typing_extensions import TypedDict

 if TYPE_CHECKING:
    from litellm import LiteLLMLoggingObj
--- a/litellm/integrations/SlackAlerting/budget_alert_types.py
+++ b/litellm/integrations/SlackAlerting/budget_alert_types.py
@ -31,7 +31,7 @@ class SoftBudgetAlert(BaseBudgetAlertType):
        return "Soft Budget Crossed: "

    def get_id(self, user_info: CallInfo) -> str:
-        return "default_id"
+        return user_info.token or "default_id"


 class UserBudgetAlert(BaseBudgetAlertType):
--- a/litellm/integrations/datadog/datadog_llm_obs.py
+++ b/litellm/integrations/datadog/datadog_llm_obs.py
@ -64,7 +64,7 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
            asyncio.create_task(self.periodic_flush())
            self.flush_lock = asyncio.Lock()
            self.log_queue: List[LLMObsPayload] = []
-            
+
            #########################################################
            # Handle datadog_llm_observability_params set as litellm.datadog_llm_observability_params
            #########################################################
@ -83,22 +83,25 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
        """
        dict_datadog_llm_obs_params: Dict = {}
        if litellm.datadog_llm_observability_params is not None:
-            if isinstance(litellm.datadog_llm_observability_params, DatadogLLMObsInitParams):
-                dict_datadog_llm_obs_params = litellm.datadog_llm_observability_params.model_dump()
+            if isinstance(
+                litellm.datadog_llm_observability_params, DatadogLLMObsInitParams
+            ):
+                dict_datadog_llm_obs_params = (
+                    litellm.datadog_llm_observability_params.model_dump()
+                )
            elif isinstance(litellm.datadog_llm_observability_params, Dict):
                # only allow params that are of DatadogLLMObsInitParams
-                dict_datadog_llm_obs_params = DatadogLLMObsInitParams(**litellm.datadog_llm_observability_params).model_dump()
+                dict_datadog_llm_obs_params = DatadogLLMObsInitParams(
+                    **litellm.datadog_llm_observability_params
+                ).model_dump()
        return dict_datadog_llm_obs_params
-            

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        try:
            verbose_logger.debug(
                f"DataDogLLMObs: Logging success event for model {kwargs.get('model', 'unknown')}"
            )
-            payload = self.create_llm_obs_payload(
-                kwargs, start_time, end_time
-            )
+            payload = self.create_llm_obs_payload(kwargs, start_time, end_time)
            verbose_logger.debug(f"DataDogLLMObs: Payload: {payload}")
            self.log_queue.append(payload)

@ -108,15 +111,13 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
            verbose_logger.exception(
                f"DataDogLLMObs: Error logging success event - {str(e)}"
            )
-    
+
    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        try:
            verbose_logger.debug(
                f"DataDogLLMObs: Logging failure event for model {kwargs.get('model', 'unknown')}"
            )
-            payload = self.create_llm_obs_payload(
-                kwargs, start_time, end_time
-            )
+            payload = self.create_llm_obs_payload(kwargs, start_time, end_time)
            verbose_logger.debug(f"DataDogLLMObs: Payload: {payload}")
            self.log_queue.append(payload)

@ -184,7 +185,6 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):

        messages = standard_logging_payload["messages"]
        messages = self._ensure_string_content(messages=messages)
-        response_obj = standard_logging_payload.get("response")

        metadata = kwargs.get("litellm_params", {}).get("metadata", {})

@ -193,10 +193,12 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
                messages
            )
        )
-        output_meta = OutputMeta(messages=self._get_response_messages(
-            response_obj=response_obj,
-            call_type=standard_logging_payload.get("call_type")
-        ))
+        output_meta = OutputMeta(
+            messages=self._get_response_messages(
+                standard_logging_payload=standard_logging_payload,
+                call_type=standard_logging_payload.get("call_type"),
+            )
+        )

        error_info = self._assemble_error_info(standard_logging_payload)

@ -214,7 +216,9 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
            output_tokens=float(standard_logging_payload.get("completion_tokens", 0)),
            total_tokens=float(standard_logging_payload.get("total_tokens", 0)),
            total_cost=float(standard_logging_payload.get("response_cost", 0)),
-            time_to_first_token=self._get_time_to_first_token_seconds(standard_logging_payload),
+            time_to_first_token=self._get_time_to_first_token_seconds(
+                standard_logging_payload
+            ),
        )

        payload: LLMObsPayload = LLMObsPayload(
@ -251,27 +255,35 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
        except Exception:
            pass
        return None
-    
-    def _assemble_error_info(self, standard_logging_payload: StandardLoggingPayload) -> Optional[DDLLMObsError]:
+
+    def _assemble_error_info(
+        self, standard_logging_payload: StandardLoggingPayload
+    ) -> Optional[DDLLMObsError]:
        """
        Assemble error information for failure cases according to DD LLM Obs API spec
        """
        # Handle error information for failure cases according to DD LLM Obs API spec
        error_info: Optional[DDLLMObsError] = None
-        
+
        if standard_logging_payload.get("status") == "failure":
            # Try to get structured error information first
-            error_information: Optional[StandardLoggingPayloadErrorInformation] = standard_logging_payload.get("error_information")
-            
+            error_information: Optional[
+                StandardLoggingPayloadErrorInformation
+            ] = standard_logging_payload.get("error_information")
+
            if error_information:
                error_info = DDLLMObsError(
-                    message=error_information.get("error_message") or standard_logging_payload.get("error_str") or "Unknown error",
+                    message=error_information.get("error_message")
+                    or standard_logging_payload.get("error_str")
+                    or "Unknown error",
                    type=error_information.get("error_class"),
-                    stack=error_information.get("traceback")
+                    stack=error_information.get("traceback"),
                )
        return error_info

-    def _get_time_to_first_token_seconds(self, standard_logging_payload: StandardLoggingPayload) -> float:
+    def _get_time_to_first_token_seconds(
+        self, standard_logging_payload: StandardLoggingPayload
+    ) -> float:
        """
        Get the time to first token in seconds

@ -280,7 +292,9 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
        For non streaming calls, CompletionStartTime is time we get the response back
        """
        start_time: Optional[float] = standard_logging_payload.get("startTime")
-        completion_start_time: Optional[float] = standard_logging_payload.get("completionStartTime")
+        completion_start_time: Optional[float] = standard_logging_payload.get(
+            "completionStartTime"
+        )
        end_time: Optional[float] = standard_logging_payload.get("endTime")

        if completion_start_time is not None and start_time is not None:
@ -290,19 +304,43 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
        else:
            return 0.0

-
    def _get_response_messages(
-        self, response_obj: Any, call_type: Optional[str]
+        self, standard_logging_payload: StandardLoggingPayload, call_type: Optional[str]
    ) -> List[Any]:
        """
        Get the messages from the response object

        for now this handles logging /chat/completions responses
        """
+
+        response_obj = standard_logging_payload.get("response")
        if response_obj is None:
            return []
-        
-        if call_type in [CallTypes.completion.value, CallTypes.acompletion.value]:
+
+        # edge case: response_obj may be a string representation of a dict
+        if isinstance(response_obj, str):
+            try:
+                import ast
+
+                response_obj = ast.literal_eval(response_obj)
+            except (ValueError, SyntaxError):
+                try:
+                    # fallback to json parsing
+                    response_obj = json.loads(str(response_obj))
+                except json.JSONDecodeError:
+                    return []
+
+        if call_type in [
+            CallTypes.completion.value,
+            CallTypes.acompletion.value,
+            CallTypes.text_completion.value,
+            CallTypes.atext_completion.value,
+            CallTypes.generate_content.value,
+            CallTypes.agenerate_content.value,
+            CallTypes.generate_content_stream.value,
+            CallTypes.agenerate_content_stream.value,
+            CallTypes.anthropic_messages.value,
+        ]:
            try:
                # Safely extract message from response_obj, handle failure cases
                if isinstance(response_obj, dict) and "choices" in response_obj:
@ -315,102 +353,104 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
                return []
        return []

-    def _get_datadog_span_kind(self, call_type: Optional[str]) -> Literal["llm", "tool", "task", "embedding", "retrieval"]:
+    def _get_datadog_span_kind(
+        self, call_type: Optional[str]
+    ) -> Literal["llm", "tool", "task", "embedding", "retrieval"]:
        """
        Map liteLLM call_type to appropriate DataDog LLM Observability span kind.
-        
+
        Available DataDog span kinds: "llm", "tool", "task", "embedding", "retrieval"
        """
        if call_type is None:
            return "llm"
-        
+
        # Embedding operations
        if call_type in [CallTypes.embedding.value, CallTypes.aembedding.value]:
            return "embedding"
-        
-        # LLM completion operations  
+
+        # LLM completion operations
        if call_type in [
-            CallTypes.completion.value, 
+            CallTypes.completion.value,
            CallTypes.acompletion.value,
-            CallTypes.text_completion.value, 
+            CallTypes.text_completion.value,
            CallTypes.atext_completion.value,
-            CallTypes.generate_content.value, 
+            CallTypes.generate_content.value,
            CallTypes.agenerate_content.value,
-            CallTypes.generate_content_stream.value, 
+            CallTypes.generate_content_stream.value,
            CallTypes.agenerate_content_stream.value,
-            CallTypes.anthropic_messages.value
+            CallTypes.anthropic_messages.value,
        ]:
            return "llm"
-        
+
        # Tool operations
        if call_type in [CallTypes.call_mcp_tool.value]:
            return "tool"
-            
+
        # Retrieval operations
        if call_type in [
-            CallTypes.get_assistants.value, 
+            CallTypes.get_assistants.value,
            CallTypes.aget_assistants.value,
-            CallTypes.get_thread.value, 
+            CallTypes.get_thread.value,
            CallTypes.aget_thread.value,
-            CallTypes.get_messages.value, 
+            CallTypes.get_messages.value,
            CallTypes.aget_messages.value,
-            CallTypes.afile_retrieve.value, 
+            CallTypes.afile_retrieve.value,
            CallTypes.file_retrieve.value,
-            CallTypes.afile_list.value, 
+            CallTypes.afile_list.value,
            CallTypes.file_list.value,
-            CallTypes.afile_content.value, 
+            CallTypes.afile_content.value,
            CallTypes.file_content.value,
-            CallTypes.retrieve_batch.value, 
+            CallTypes.retrieve_batch.value,
            CallTypes.aretrieve_batch.value,
-            CallTypes.retrieve_fine_tuning_job.value, 
+            CallTypes.retrieve_fine_tuning_job.value,
            CallTypes.aretrieve_fine_tuning_job.value,
-            CallTypes.responses.value, 
+            CallTypes.responses.value,
            CallTypes.aresponses.value,
-            CallTypes.alist_input_items.value
+            CallTypes.alist_input_items.value,
        ]:
            return "retrieval"
-            
+
        # Task operations (batch, fine-tuning, file operations, etc.)
        if call_type in [
-            CallTypes.create_batch.value, 
+            CallTypes.create_batch.value,
            CallTypes.acreate_batch.value,
-            CallTypes.create_fine_tuning_job.value, 
+            CallTypes.create_fine_tuning_job.value,
            CallTypes.acreate_fine_tuning_job.value,
-            CallTypes.cancel_fine_tuning_job.value, 
+            CallTypes.cancel_fine_tuning_job.value,
            CallTypes.acancel_fine_tuning_job.value,
-            CallTypes.list_fine_tuning_jobs.value, 
+            CallTypes.list_fine_tuning_jobs.value,
            CallTypes.alist_fine_tuning_jobs.value,
-            CallTypes.create_assistants.value, 
+            CallTypes.create_assistants.value,
            CallTypes.acreate_assistants.value,
-            CallTypes.delete_assistant.value, 
+            CallTypes.delete_assistant.value,
            CallTypes.adelete_assistant.value,
-            CallTypes.create_thread.value, 
+            CallTypes.create_thread.value,
            CallTypes.acreate_thread.value,
-            CallTypes.add_message.value, 
+            CallTypes.add_message.value,
            CallTypes.a_add_message.value,
-            CallTypes.run_thread.value, 
+            CallTypes.run_thread.value,
            CallTypes.arun_thread.value,
-            CallTypes.run_thread_stream.value, 
+            CallTypes.run_thread_stream.value,
            CallTypes.arun_thread_stream.value,
-            CallTypes.file_delete.value, 
+            CallTypes.file_delete.value,
            CallTypes.afile_delete.value,
-            CallTypes.create_file.value, 
+            CallTypes.create_file.value,
            CallTypes.acreate_file.value,
-            CallTypes.image_generation.value, 
+            CallTypes.image_generation.value,
            CallTypes.aimage_generation.value,
-            CallTypes.image_edit.value, 
+            CallTypes.image_edit.value,
            CallTypes.aimage_edit.value,
-            CallTypes.moderation.value, 
+            CallTypes.moderation.value,
            CallTypes.amoderation.value,
-            CallTypes.transcription.value, 
+            CallTypes.transcription.value,
            CallTypes.atranscription.value,
-            CallTypes.speech.value, 
+            CallTypes.speech.value,
            CallTypes.aspeech.value,
-            CallTypes.rerank.value, 
-            CallTypes.arerank.value
+            CallTypes.rerank.value,
+            CallTypes.arerank.value,
        ]:
            return "task"
-            
+
        # Default fallback for unknown or passthrough operations
        return "llm"

@ -443,7 +483,9 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
            "cache_hit": standard_logging_payload.get("cache_hit", "unknown"),
            "cache_key": standard_logging_payload.get("cache_key", "unknown"),
            "saved_cache_cost": standard_logging_payload.get("saved_cache_cost", 0),
-            "guardrail_information": standard_logging_payload.get("guardrail_information", None),
+            "guardrail_information": standard_logging_payload.get(
+                "guardrail_information", None
+            ),
        }

        #########################################################
@ -452,22 +494,32 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
        latency_metrics = self._get_latency_metrics(standard_logging_payload)
        _metadata.update({"latency_metrics": dict(latency_metrics)})

+        ## extract tool calls and add to metadata
+        tool_call_metadata = self._extract_tool_call_metadata(standard_logging_payload)
+        _metadata.update(tool_call_metadata)
+
        _standard_logging_metadata: dict = (
            dict(standard_logging_payload.get("metadata", {})) or {}
        )
        _metadata.update(_standard_logging_metadata)
        return _metadata

-    def _get_latency_metrics(self, standard_logging_payload: StandardLoggingPayload) -> DDLLMObsLatencyMetrics:
+    def _get_latency_metrics(
+        self, standard_logging_payload: StandardLoggingPayload
+    ) -> DDLLMObsLatencyMetrics:
        """
        Get the latency metrics from the standard logging payload
        """
        latency_metrics: DDLLMObsLatencyMetrics = DDLLMObsLatencyMetrics()
        # Add latency metrics to metadata
        # Time to first token (convert from seconds to milliseconds for consistency)
-        time_to_first_token_seconds = self._get_time_to_first_token_seconds(standard_logging_payload)
+        time_to_first_token_seconds = self._get_time_to_first_token_seconds(
+            standard_logging_payload
+        )
        if time_to_first_token_seconds > 0:
-            latency_metrics["time_to_first_token_ms"] = time_to_first_token_seconds * 1000
+            latency_metrics["time_to_first_token_ms"] = (
+                time_to_first_token_seconds * 1000
+            )

        # LiteLLM overhead time
        hidden_params = standard_logging_payload.get("hidden_params", {})
@ -476,11 +528,143 @@ class DataDogLLMObsLogger(DataDogLogger, CustomBatchLogger):
            latency_metrics["litellm_overhead_time_ms"] = litellm_overhead_ms

        # Guardrail overhead latency
-        guardrail_info: Optional[StandardLoggingGuardrailInformation] = standard_logging_payload.get("guardrail_information")
+        guardrail_info: Optional[
+            StandardLoggingGuardrailInformation
+        ] = standard_logging_payload.get("guardrail_information")
        if guardrail_info is not None:
-            _guardrail_duration_seconds: Optional[float] = guardrail_info.get("duration")
+            _guardrail_duration_seconds: Optional[float] = guardrail_info.get(
+                "duration"
+            )
            if _guardrail_duration_seconds is not None:
                # Convert from seconds to milliseconds for consistency
-                latency_metrics["guardrail_overhead_time_ms"] = _guardrail_duration_seconds * 1000
-            
-        return latency_metrics
+                latency_metrics["guardrail_overhead_time_ms"] = (
+                    _guardrail_duration_seconds * 1000
+                )
+
+        return latency_metrics
+
+    def _process_input_messages_preserving_tool_calls(
+        self, messages: List[Any]
+    ) -> List[Dict[str, Any]]:
+        """
+        Process input messages while preserving tool_calls and tool message types.
+
+        This bypasses the lossy string conversion when tool calls are present,
+        allowing complex nested tool_calls objects to be preserved for Datadog.
+        """
+        processed = []
+        for msg in messages:
+            if isinstance(msg, dict):
+                # Preserve messages with tool_calls or tool role as-is
+                if "tool_calls" in msg or msg.get("role") == "tool":
+                    processed.append(msg)
+                else:
+                    # For regular messages, still apply string conversion
+                    converted = (
+                        handle_any_messages_to_chat_completion_str_messages_conversion(
+                            [msg]
+                        )
+                    )
+                    processed.extend(converted)
+            else:
+                # For non-dict messages, apply string conversion
+                converted = (
+                    handle_any_messages_to_chat_completion_str_messages_conversion(
+                        [msg]
+                    )
+                )
+                processed.extend(converted)
+        return processed
+
+    @staticmethod
+    def _tool_calls_kv_pair(tool_calls: List[Dict[str, Any]]) -> Dict[str, Any]:
+        """
+        Extract tool call information into key-value pairs for Datadog metadata.
+
+        Similar to OpenTelemetry's implementation but adapted for Datadog's format.
+        """
+        kv_pairs: Dict[str, Any] = {}
+        for idx, tool_call in enumerate(tool_calls):
+            try:
+                # Extract tool call ID
+                tool_id = tool_call.get("id")
+                if tool_id:
+                    kv_pairs[f"tool_calls.{idx}.id"] = tool_id
+
+                # Extract tool call type
+                tool_type = tool_call.get("type")
+                if tool_type:
+                    kv_pairs[f"tool_calls.{idx}.type"] = tool_type
+
+                # Extract function information
+                function = tool_call.get("function")
+                if function:
+                    function_name = function.get("name")
+                    if function_name:
+                        kv_pairs[f"tool_calls.{idx}.function.name"] = function_name
+
+                    function_arguments = function.get("arguments")
+                    if function_arguments:
+                        # Store arguments as JSON string for Datadog
+                        if isinstance(function_arguments, str):
+                            kv_pairs[
+                                f"tool_calls.{idx}.function.arguments"
+                            ] = function_arguments
+                        else:
+                            import json
+
+                            kv_pairs[
+                                f"tool_calls.{idx}.function.arguments"
+                            ] = json.dumps(function_arguments)
+            except (KeyError, TypeError, ValueError) as e:
+                verbose_logger.debug(
+                    f"DataDogLLMObs: Error processing tool call {idx}: {str(e)}"
+                )
+                continue
+
+        return kv_pairs
+
+    def _extract_tool_call_metadata(
+        self, standard_logging_payload: StandardLoggingPayload
+    ) -> Dict[str, Any]:
+        """
+        Extract tool call information from both input messages and response for Datadog metadata.
+        """
+        tool_call_metadata: Dict[str, Any] = {}
+
+        try:
+            # Extract tool calls from input messages
+            messages = standard_logging_payload.get("messages", [])
+            if messages and isinstance(messages, list):
+                for message in messages:
+                    if isinstance(message, dict) and "tool_calls" in message:
+                        tool_calls = message.get("tool_calls")
+                        if tool_calls:
+                            input_tool_calls_kv = self._tool_calls_kv_pair(tool_calls)
+                            # Prefix with "input_" to distinguish from response tool calls
+                            for key, value in input_tool_calls_kv.items():
+                                tool_call_metadata[f"input_{key}"] = value
+
+            # Extract tool calls from response
+            response_obj = standard_logging_payload.get("response")
+            if response_obj and isinstance(response_obj, dict):
+                choices = response_obj.get("choices", [])
+                for choice in choices:
+                    if isinstance(choice, dict):
+                        message = choice.get("message")
+                        if message and isinstance(message, dict):
+                            tool_calls = message.get("tool_calls")
+                            if tool_calls:
+                                response_tool_calls_kv = self._tool_calls_kv_pair(
+                                    tool_calls
+                                )
+                                # Prefix with "output_" to distinguish from input tool calls
+                                for key, value in response_tool_calls_kv.items():
+                                    tool_call_metadata[f"output_{key}"] = value
+
+        except Exception as e:
+            verbose_logger.debug(
+                f"DataDogLLMObs: Error extracting tool call metadata: {str(e)}"
+            )
+
+        return tool_call_metadata
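
Review note: the flattening done by `_tool_calls_kv_pair` and the `input_`/`output_` prefixing in `_extract_tool_call_metadata` can be illustrated with a standalone sketch. The helper name `flatten_tool_calls` and the sample payload are hypothetical and not part of LiteLLM; only the key layout (`tool_calls.{idx}.function.name`, and so on) is taken from the diff above.

```python
import json
from typing import Any, Dict, List


def flatten_tool_calls(tool_calls: List[Dict[str, Any]], prefix: str) -> Dict[str, Any]:
    """Hypothetical stand-in for _tool_calls_kv_pair plus the input_/output_ prefix."""
    flat: Dict[str, Any] = {}
    for idx, call in enumerate(tool_calls):
        base = f"{prefix}tool_calls.{idx}"
        if call.get("id"):
            flat[f"{base}.id"] = call["id"]
        if call.get("type"):
            flat[f"{base}.type"] = call["type"]
        function = call.get("function") or {}
        if function.get("name"):
            flat[f"{base}.function.name"] = function["name"]
        arguments = function.get("arguments")
        if arguments:
            # Arguments are stored as a JSON string; dicts are serialized first.
            flat[f"{base}.function.arguments"] = (
                arguments if isinstance(arguments, str) else json.dumps(arguments)
            )
    return flat


# Illustrative payload only; the field values are made up.
sample = [
    {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": {"city": "Paris"}},
    }
]
flat = flatten_tool_calls(sample, prefix="output_")
```

One thing that may be worth checking in the real method: the index restarts at 0 for every message and every choice, so when several input messages (or several response choices) carry `tool_calls`, later entries appear to overwrite earlier ones under the same `input_tool_calls.0.*` / `output_tool_calls.0.*` keys.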
--- a/litellm/integrations/humanloop.py
+++ b/litellm/integrations/humanloop.py
@ -4,9 +4,10 @@ Humanloop integration
 https://humanloop.com/
 """

-from typing import Any, Dict, List, Optional, Tuple, TypedDict, Union, cast
+from typing import Any, Dict, List, Optional, Tuple, Union, cast

 import httpx
+from typing_extensions import TypedDict

 import litellm
 from litellm.caching import DualCache
--- a/litellm/integrations/prompt_management_base.py
+++ b/litellm/integrations/prompt_management_base.py
@ -1,5 +1,7 @@
 from abc import ABC, abstractmethod
-from typing import Any, Dict, List, Optional, Tuple, TypedDict
+from typing import Any, Dict, List, Optional, Tuple
+
+from typing_extensions import TypedDict

 from litellm.types.llms.openai import AllMessageValues
 from litellm.types.utils import StandardCallbackDynamicParams
--- a/litellm/litellm_core_utils/get_llm_provider_logic.py
+++ b/litellm/litellm_core_utils/get_llm_provider_logic.py
@ -374,6 +374,8 @@ def get_llm_provider(  # noqa: PLR0915
            custom_llm_provider = "oci"
        elif model.startswith("compactifai/"):
            custom_llm_provider = "compactifai"
+        elif model.startswith("ovhcloud/"):
+            custom_llm_provider = "ovhcloud"
        if not custom_llm_provider:
            if litellm.suppress_debug_info is False:
                print()  # noqa
--- a/litellm/litellm_core_utils/llm_response_utils/convert_dict_to_response.py
+++ b/litellm/litellm_core_utils/llm_response_utils/convert_dict_to_response.py
@ -1,6 +1,5 @@
 import asyncio
 import json
-import re
 import time
 import traceback
 import uuid
@ -9,6 +8,9 @@ from typing import Dict, Iterable, List, Literal, Optional, Tuple, Union
 import litellm
 from litellm._logging import verbose_logger
 from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
+from litellm.litellm_core_utils.prompt_templates.common_utils import (
+    _extract_reasoning_content,
+)
 from litellm.types.llms.databricks import DatabricksTool
 from litellm.types.llms.openai import (
    ChatCompletionThinkingBlock,
@ -274,49 +276,6 @@ def _handle_invalid_parallel_tool_calls(
        return tool_calls


-def _parse_content_for_reasoning(
-    message_text: Optional[str],
-) -> Tuple[Optional[str], Optional[str]]:
-    """
-    Parse the content for reasoning
-
-    Returns:
-    - reasoning_content: The content of the reasoning
-    - content: The content of the message
-    """
-    if not message_text:
-        return None, message_text
-
-    reasoning_match = re.match(
-        r"<(?:think|thinking)>(.*?)</(?:think|thinking)>(.*)", message_text, re.DOTALL
-    )
-
-    if reasoning_match:
-        return reasoning_match.group(1), reasoning_match.group(2)
-
-    return None, message_text
-
-
-def _extract_reasoning_content(message: dict) -> Tuple[Optional[str], Optional[str]]:
-    """
-    Extract reasoning content and main content from a message.
-
-    Args:
-        message (dict): The message dictionary that may contain reasoning_content
-
-    Returns:
-        tuple[Optional[str], Optional[str]]: A tuple of (reasoning_content, content)
-    """
-    message_content = message.get("content")
-    if "reasoning_content" in message:
-        return message["reasoning_content"], message["content"]
-    elif "reasoning" in message:
-        return message["reasoning"], message["content"]
-    elif isinstance(message_content, str):
-        return _parse_content_for_reasoning(message_content)
-    return None, message_content
-
-
 class LiteLLMResponseObjectHandler:
    @staticmethod
    def convert_to_image_response(
--- a/litellm/litellm_core_utils/logging_worker.py
+++ b/litellm/litellm_core_utils/logging_worker.py
@ -1,7 +1,9 @@
 import asyncio
 import contextlib
 import contextvars
-from typing import Coroutine, Optional, TypedDict
+from typing import Coroutine, Optional
+
+from typing_extensions import TypedDict

 from litellm._logging import verbose_logger

--- a/litellm/litellm_core_utils/prompt_templates/common_utils.py
+++ b/litellm/litellm_core_utils/prompt_templates/common_utils.py
@ -14,6 +14,7 @@ from typing import (
    Literal,
    Mapping,
    Optional,
+    Tuple,
    Union,
    cast,
 )
@ -869,3 +870,63 @@ def convert_prefix_message_to_non_prefix_messages(
        else:
            new_messages.append(message)
    return new_messages
+
+
+def _extract_reasoning_content(message: dict) -> Tuple[Optional[str], Optional[str]]:
+    """
+    Extract reasoning content and main content from a message.
+
+    Args:
+        message (dict): The message dictionary that may contain reasoning_content
+
+    Returns:
+        tuple[Optional[str], Optional[str]]: A tuple of (reasoning_content, content)
+    """
+    message_content = message.get("content")
+    if "reasoning_content" in message:
+        return message["reasoning_content"], message["content"]
+    elif "reasoning" in message:
+        return message["reasoning"], message["content"]
+    elif isinstance(message_content, str):
+        return _parse_content_for_reasoning(message_content)
+    return None, message_content
+
+
+def _parse_content_for_reasoning(
+    message_text: Optional[str],
+) -> Tuple[Optional[str], Optional[str]]:
+    """
+    Parse the content for reasoning
+
+    Returns:
+    - reasoning_content: The content of the reasoning
+    - content: The content of the message
+    """
+    if not message_text:
+        return None, message_text
+
+    reasoning_match = re.match(
+        r"<(?:think|thinking)>(.*?)</(?:think|thinking)>(.*)", message_text, re.DOTALL
+    )
+
+    if reasoning_match:
+        return reasoning_match.group(1), reasoning_match.group(2)
+
+    return None, message_text
+
+
+def extract_images_from_message(message: AllMessageValues) -> List[str]:
+    """
+    Extract images from a message
+    """
+    images = []
+    message_content = message.get("content")
+    if isinstance(message_content, list):
+        for m in message_content:
+            image_url = m.get("image_url")
+            if image_url:
+                if isinstance(image_url, str):
+                    images.append(image_url)
+                elif isinstance(image_url, dict) and "url" in image_url:
+                    images.append(image_url["url"])
+    return images
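
Review note: the `<think>` / `<thinking>` parsing relocated into `common_utils.py` above can be exercised in isolation. The sketch below copies the regular expression from the diff into a hypothetical helper named `split_reasoning`; it is not the LiteLLM function itself, and the sample strings are made up.

```python
import re
from typing import Optional, Tuple

# Same pattern as in the diff: a leading <think>/<thinking> block, then the rest.
_THINK_PATTERN = re.compile(
    r"<(?:think|thinking)>(.*?)</(?:think|thinking)>(.*)", re.DOTALL
)


def split_reasoning(text: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (reasoning, content) for a message string."""
    if not text:
        return None, text
    # match() only succeeds when the tag opens the string.
    match = _THINK_PATTERN.match(text)
    if match:
        return match.group(1), match.group(2)
    return None, text


reasoning, content = split_reasoning("<think>check units</think>The answer is 42.")
no_reasoning, unchanged = split_reasoning("  <think>x</think>y")
```

Because the pattern is applied with `match`, a reasoning block preceded by whitespace or other text is left inside the content, as the second call shows. Note also that the hunk above adds only `Tuple` to the imports of `common_utils.py`; the moved function assumes `re` is already imported in that module, which this diff does not show.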
--- a/litellm/litellm_core_utils/streaming_handler.py
+++ b/litellm/litellm_core_utils/streaming_handler.py
@ -1024,6 +1024,8 @@ class CustomStreamWrapper:
        return

    def chunk_creator(self, chunk: Any):  # type: ignore  # noqa: PLR0915
+        if hasattr(chunk, "id"):
+            self.response_id = chunk.id
        model_response = self.model_response_creator()
        response_obj: Dict[str, Any] = {}
        try:
--- a/litellm/llms/base_llm/audio_transcription/transformation.py
+++ b/litellm/llms/base_llm/audio_transcription/transformation.py
@ -1,6 +1,6 @@
 from abc import ABC, abstractmethod
 from dataclasses import dataclass
-from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
+from typing import TYPE_CHECKING, Any, List, Optional, Union

 import httpx

@ -23,12 +23,13 @@ else:
 class AudioTranscriptionRequestData:
    """
    Structured data for audio transcription requests.
-    
+
    Attributes:
        data: The request data (form data for multipart, json data for regular requests)
        files: Optional files dict for multipart form data
        content_type: Optional content type override
    """
+
    data: Union[dict, bytes]
    files: Optional[dict] = None
    content_type: Optional[str] = None
@ -66,13 +67,11 @@ class BaseAudioTranscriptionConfig(BaseConfig, ABC):
        audio_file: FileTypes,
        optional_params: dict,
        litellm_params: dict,
-    ) -> Union[AudioTranscriptionRequestData, Dict]:
+    ) -> AudioTranscriptionRequestData:
        raise NotImplementedError(
            "AudioTranscriptionConfig needs a request transformation for audio transcription models"
        )

-
-    
    def transform_audio_transcription_response(
        self,
        raw_response: httpx.Response,
@ -110,7 +109,6 @@ class BaseAudioTranscriptionConfig(BaseConfig, ABC):
        raise NotImplementedError(
            "AudioTranscriptionConfig does not need a response transformation for audio transcription models"
        )
-    

    def get_provider_specific_params(
        self,
@ -141,7 +139,7 @@ class BaseAudioTranscriptionConfig(BaseConfig, ABC):
            provider_specific_params[key] = value

        return provider_specific_params
-    
+
    def _should_exclude_param(
        self,
        param_name: str,
--- a/litellm/llms/bedrock/chat/converse_transformation.py
+++ b/litellm/llms/bedrock/chat/converse_transformation.py
@ -14,7 +14,7 @@ from litellm._logging import verbose_logger
 from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
 from litellm.litellm_core_utils.core_helpers import map_finish_reason
 from litellm.litellm_core_utils.litellm_logging import Logging
-from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
+from litellm.litellm_core_utils.prompt_templates.common_utils import (
    _parse_content_for_reasoning,
 )
 from litellm.litellm_core_utils.prompt_templates.factory import (
@ -397,7 +397,11 @@ class AmazonConverseConfig(BaseConfig):
        for param, value in non_default_params.items():
            if param == "response_format" and isinstance(value, dict):
                optional_params = self._translate_response_format_param(
-                    value=value, model=model, optional_params=optional_params, non_default_params=non_default_params, is_thinking_enabled=is_thinking_enabled
+                    value=value,
+                    model=model,
+                    optional_params=optional_params,
+                    non_default_params=non_default_params,
+                    is_thinking_enabled=is_thinking_enabled,
                )
            if param == "max_tokens" or param == "max_completion_tokens":
                optional_params["maxTokens"] = value
@ -446,11 +450,11 @@ class AmazonConverseConfig(BaseConfig):
            )

        return optional_params
-    
+
    def _translate_response_format_param(
-        self, 
-        value: dict, 
-        model: str, 
+        self,
+        value: dict,
+        model: str,
        optional_params: dict,
        non_default_params: dict,
        is_thinking_enabled: bool,
@ -504,7 +508,7 @@ class AmazonConverseConfig(BaseConfig):
        optional_params["json_mode"] = True
        if non_default_params.get("stream", False) is True:
            optional_params["fake_stream"] = True
-        
+
        return optional_params

    def update_optional_params_with_thinking_tokens(
--- a/litellm/llms/bedrock/chat/invoke_transformations/amazon_deepseek_transformation.py
+++ b/litellm/llms/bedrock/chat/invoke_transformations/amazon_deepseek_transformation.py
@ -3,7 +3,7 @@ from typing import Any, List, Optional, cast
 from httpx import Response

 from litellm import verbose_logger
-from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
+from litellm.litellm_core_utils.prompt_templates.common_utils import (
    _parse_content_for_reasoning,
 )
 from litellm.llms.base_llm.base_model_iterator import BaseModelResponseIterator
--- a/litellm/llms/custom_httpx/llm_http_handler.py
+++ b/litellm/llms/custom_httpx/llm_http_handler.py
@ -118,7 +118,6 @@ class BaseLLMHTTPHandler:
        response: Optional[httpx.Response] = None
        for i in range(max(max_retry_on_unprocessable_entity_error, 1)):
            try:
-
                response = await async_httpx_client.post(
                    url=api_base,
                    headers=headers,
@ -2221,7 +2220,9 @@ class BaseLLMHTTPHandler:

        if isinstance(transformed_request, dict) and "method" in transformed_request:
            # Handle pre-signed requests (e.g., from Bedrock S3 uploads)
-            upload_response = getattr(sync_httpx_client, transformed_request["method"].lower())(
+            upload_response = getattr(
+                sync_httpx_client, transformed_request["method"].lower()
+            )(
                url=transformed_request["url"],
                headers=transformed_request["headers"],
                data=transformed_request["data"],
@ -2233,8 +2234,8 @@ class BaseLLMHTTPHandler:
            # Handle traditional file uploads
            # Ensure transformed_request is a string for httpx compatibility
            if isinstance(transformed_request, bytes):
-                transformed_request = transformed_request.decode('utf-8')
-            
+                transformed_request = transformed_request.decode("utf-8")
+
            # Use the HTTP method specified by the provider config
            http_method = provider_config.file_upload_http_method.upper()
            if http_method == "PUT":
@ -2314,7 +2315,7 @@ class BaseLLMHTTPHandler:
            )
        else:
            async_httpx_client = client
-        
+
        #########################################################
        # Debug Logging
        #########################################################
@ -2330,7 +2331,9 @@ class BaseLLMHTTPHandler:

        if isinstance(transformed_request, dict) and "method" in transformed_request:
            # Handle pre-signed requests (e.g., from Bedrock S3 uploads)
-            upload_response = await getattr(async_httpx_client, transformed_request["method"].lower())(
+            upload_response = await getattr(
+                async_httpx_client, transformed_request["method"].lower()
+            )(
                url=transformed_request["url"],
                headers=transformed_request["headers"],
                data=transformed_request["data"],
@ -2342,8 +2345,8 @@ class BaseLLMHTTPHandler:
            # Handle traditional file uploads
            # Ensure transformed_request is a string for httpx compatibility
            if isinstance(transformed_request, bytes):
-                transformed_request = transformed_request.decode('utf-8')
-            
+                transformed_request = transformed_request.decode("utf-8")
+
            # Use the HTTP method specified by the provider config
            http_method = provider_config.file_upload_http_method.upper()
            if http_method == "PUT":
@ -2468,9 +2471,14 @@ class BaseLLMHTTPHandler:
            sync_httpx_client = client

        try:
-            if isinstance(transformed_request, dict) and "method" in transformed_request:
+            if (
+                isinstance(transformed_request, dict)
+                and "method" in transformed_request
+            ):
                # Handle pre-signed requests (e.g., from Bedrock with AWS auth)
-                batch_response = getattr(sync_httpx_client, transformed_request["method"].lower())(
+                batch_response = getattr(
+                    sync_httpx_client, transformed_request["method"].lower()
+                )(
                    url=transformed_request["url"],
                    headers=transformed_request["headers"],
                    data=transformed_request["data"],
@ -2500,8 +2508,11 @@ class BaseLLMHTTPHandler:
            )

        # Store original request for response transformation
-        litellm_params_with_request = {**litellm_params, "original_batch_request": create_batch_data}
-        
+        litellm_params_with_request = {
+            **litellm_params,
+            "original_batch_request": create_batch_data,
+        }
+
        return provider_config.transform_create_batch_response(
            model=model,
            raw_response=batch_response,
@ -2531,7 +2542,7 @@ class BaseLLMHTTPHandler:
            )
        else:
            async_httpx_client = client
-        
+
        #########################################################
        # Debug Logging
        #########################################################
@ -2546,9 +2557,14 @@ class BaseLLMHTTPHandler:
        )

        try:
-            if isinstance(transformed_request, dict) and "method" in transformed_request:
+            if (
+                isinstance(transformed_request, dict)
+                and "method" in transformed_request
+            ):
                # Handle pre-signed requests (e.g., from Bedrock with AWS auth)
-                batch_response = await getattr(async_httpx_client, transformed_request["method"].lower())(
+                batch_response = await getattr(
+                    async_httpx_client, transformed_request["method"].lower()
+                )(
                    url=transformed_request["url"],
                    headers=transformed_request["headers"],
                    data=transformed_request["data"],
@ -2578,8 +2594,11 @@ class BaseLLMHTTPHandler:
            )

        # Store original request for response transformation (for async version)
-        litellm_params_with_request = {**litellm_params, "original_batch_request": create_batch_data or {}}
-        
+        litellm_params_with_request = {
+            **litellm_params,
+            "original_batch_request": create_batch_data or {},
+        }
+
        return provider_config.transform_create_batch_response(
            model=model,
            raw_response=batch_response,
--- a/litellm/llms/gemini/chat/transformation.py
+++ b/litellm/llms/gemini/chat/transformation.py
@ -1,10 +1,13 @@
-from typing import List, Optional
+from typing import List, Optional, cast

 from litellm.litellm_core_utils.prompt_templates.factory import (
    convert_generic_image_chunk_to_openai_image_obj,
    convert_to_anthropic_image_obj,
 )
-from litellm.types.llms.openai import AllMessageValues
+from litellm.litellm_core_utils.prompt_templates.image_handling import (
+    convert_url_to_base64,
+)
+from litellm.types.llms.openai import AllMessageValues, ChatCompletionFileObject
 from litellm.types.llms.vertex_ai import ContentType, PartType
 from litellm.utils import supports_reasoning

@ -99,7 +102,8 @@ class GoogleAIStudioGeminiConfig(VertexGeminiConfig):
        self, messages: List[AllMessageValues]
    ) -> List[ContentType]:
        """
-        Google AI Studio Gemini does not support image urls in messages.
+        Google AI Studio Gemini does not support HTTP/HTTPS URLs for files.
+        Convert them to base64 data instead.
        """
        for message in messages:
            _message_content = message.get("content")
@ -124,4 +128,16 @@ class GoogleAIStudioGeminiConfig(VertexGeminiConfig):
                                    image_obj
                                )
                            )
+                    elif element.get("type") == "file":
+                        file_element = cast(ChatCompletionFileObject, element)
+                        file_id = file_element["file"].get("file_id")
+                        if file_id and file_id.startswith(("http://", "https://")):
+                            # Convert HTTP/HTTPS file URL to base64 data
+                            try:
+                                base64_data = convert_url_to_base64(file_id)
+                                file_element["file"]["file_data"] = base64_data  # type: ignore
+                                file_element["file"].pop("file_id", None)  # type: ignore
+                            except Exception:
+                                # If conversion fails, leave as is and let the API handle it
+                                pass
        return _gemini_convert_messages_with_history(messages=messages)
--- a/litellm/llms/hosted_vllm/transcriptions/transformation.py
+++ b/litellm/llms/hosted_vllm/transcriptions/transformation.py
@ -0,0 +1,72 @@
+"""
+Transformation logic for Hosted VLLM audio transcription
+"""
+
+from typing import Optional, Union
+
+import httpx
+
+from litellm.llms.base_llm.audio_transcription.transformation import (
+    AudioTranscriptionRequestData,
+)
+from litellm.llms.base_llm.chat.transformation import BaseLLMException
+from litellm.llms.openai.transcriptions.whisper_transformation import (
+    OpenAIWhisperAudioTranscriptionConfig,
+)
+from litellm.types.utils import FileTypes
+
+
+class HostedVLLMAudioTranscriptionError(BaseLLMException):
+    def __init__(
+        self,
+        status_code: int,
+        message: str,
+        headers: Optional[Union[dict, httpx.Headers]] = None,
+    ):
+        super().__init__(status_code=status_code, message=message, headers=headers)
+
+
+class HostedVLLMAudioTranscriptionConfig(OpenAIWhisperAudioTranscriptionConfig):
+    def __init__(self) -> None:
+        pass
+
+    def get_complete_url(
+        self,
+        api_base: Optional[str],
+        api_key: Optional[str],
+        model: str,
+        optional_params: dict,
+        litellm_params: dict,
+        stream: Optional[bool] = None,
+    ) -> str:
+        if api_base:
+            # Remove trailing slashes and ensure clean base URL
+            api_base = api_base.rstrip("/")
+            if not api_base.endswith("/v1/audio/transcriptions"):
+                api_base = f"{api_base}/v1/audio/transcriptions"
+            return api_base
+        raise ValueError("api_base must be provided for Hosted VLLM audio transcription")
+
+    def transform_audio_transcription_request(
+        self,
+        model: str,
+        audio_file: FileTypes,
+        optional_params: dict,
+        litellm_params: dict,
+    ) -> AudioTranscriptionRequestData:
+        """
+        Transform the audio transcription request
+        """
+
+        data = {"model": model, "file": audio_file, **optional_params}
+
+        if "response_format" not in data or (
+            data["response_format"] == "text" or data["response_format"] == "json"
+        ):
+            data["response_format"] = (
+                "verbose_json"  # ensures 'duration' is received - used for cost calculation
+            )
+
+        return AudioTranscriptionRequestData(
+            data=data,
+        )
--- a/litellm/llms/huggingface/rerank/transformation.py
+++ b/litellm/llms/huggingface/rerank/transformation.py
@ -1,8 +1,9 @@
 import os
 import uuid
-from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, TypedDict, Union
+from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union

 import httpx
+from typing_extensions import TypedDict

 import litellm
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
--- a/litellm/llms/lm_studio/chat/transformation.py
+++ b/litellm/llms/lm_studio/chat/transformation.py
@ -15,8 +15,8 @@ class LMStudioChatConfig(OpenAIGPTConfig):
    ) -> Tuple[Optional[str], Optional[str]]:
        api_base = api_base or get_secret_str("LM_STUDIO_API_BASE")  # type: ignore
        dynamic_api_key = (
-            api_key or get_secret_str("LM_STUDIO_API_KEY") or " "
-        )  # vllm does not require an api key
+            api_key or get_secret_str("LM_STUDIO_API_KEY") or "fake-api-key"
+        )  # LM Studio does not require an api key, but OpenAI client requires non-None value
        return api_base, dynamic_api_key
    
    def map_openai_params(
--- a/litellm/llms/ollama/chat/transformation.py
+++ b/litellm/llms/ollama/chat/transformation.py
@ -16,9 +16,18 @@ from httpx._models import Headers, Response
 from pydantic import BaseModel

 import litellm
+from litellm.litellm_core_utils.prompt_templates.common_utils import (
+    _extract_reasoning_content,
+    convert_content_list_to_str,
+    extract_images_from_message,
+)
 from litellm.llms.base_llm.base_model_iterator import BaseModelResponseIterator
 from litellm.llms.base_llm.chat.transformation import BaseConfig, BaseLLMException
-from litellm.types.llms.ollama import OllamaToolCall, OllamaToolCallFunction
+from litellm.types.llms.ollama import (
+    OllamaChatCompletionMessage,
+    OllamaToolCall,
+    OllamaToolCallFunction,
+)
 from litellm.types.llms.openai import (
    AllMessageValues,
    ChatCompletionAssistantToolCall,
@ -299,7 +308,23 @@ class OllamaChatConfig(BaseConfig):
                        )
                        new_tools.append(ollama_tool_call)
                cast(dict, m)["tool_calls"] = new_tools
-            new_messages.append(m)
+            reasoning_content, parsed_content = _extract_reasoning_content(
+                cast(dict, m)
+            )
+            content_str = convert_content_list_to_str(cast(AllMessageValues, m))
+            images = extract_images_from_message(cast(AllMessageValues, m))
+
+            ollama_message = OllamaChatCompletionMessage(
+                role=cast(str, m.get("role")),
+            )
+            if reasoning_content is not None:
+                ollama_message["thinking"] = reasoning_content
+            if content_str is not None:
+                ollama_message["content"] = content_str
+            if images is not None:
+                ollama_message["images"] = images
+
+            new_messages.append(ollama_message)

        # Load Config
        config = self.get_config()
@ -361,7 +386,7 @@ class OllamaChatConfig(BaseConfig):
                del response_json_message["thinking"]
            elif response_json_message.get("content") is not None:
                # parse reasoning content from content
-                from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
+                from litellm.litellm_core_utils.prompt_templates.common_utils import (
                    _parse_content_for_reasoning,
                )

--- a/litellm/llms/ollama/completion/transformation.py
+++ b/litellm/llms/ollama/completion/transformation.py
@ -229,7 +229,7 @@ class OllamaConfig(BaseConfig):
            model = model.split("/", 1)[1]
        api_base = get_secret_str("OLLAMA_API_BASE") or "http://localhost:11434"
        api_key = self.get_api_key()
-        headers = { "Authorization": f"Bearer {api_key}" } if api_key else {}
+        headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}

        try:
            response = litellm.module_level_client.post(
@ -279,7 +279,7 @@ class OllamaConfig(BaseConfig):
        api_key: Optional[str] = None,
        json_mode: Optional[bool] = None,
    ) -> ModelResponse:
-        from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
+        from litellm.litellm_core_utils.prompt_templates.common_utils import (
            _parse_content_for_reasoning,
        )

--- a/litellm/llms/openai/transcriptions/gpt_transformation.py
+++ b/litellm/llms/openai/transcriptions/gpt_transformation.py
@ -1,5 +1,8 @@
 from typing import List

+from litellm.llms.base_llm.audio_transcription.transformation import (
+    AudioTranscriptionRequestData,
+)
 from litellm.types.llms.openai import OpenAIAudioTranscriptionOptionalParams
 from litellm.types.utils import FileTypes

@ -27,8 +30,12 @@ class OpenAIGPTAudioTranscriptionConfig(OpenAIWhisperAudioTranscriptionConfig):
        audio_file: FileTypes,
        optional_params: dict,
        litellm_params: dict,
-    ) -> dict:
+    ) -> AudioTranscriptionRequestData:
        """
        Transform the audio transcription request
        """
-        return {"model": model, "file": audio_file, **optional_params}
+        data = {"model": model, "file": audio_file, **optional_params}
+
+        return AudioTranscriptionRequestData(
+            data=data,
+        )
--- a/litellm/llms/openai/transcriptions/handler.py
+++ b/litellm/llms/openai/transcriptions/handler.py
@ -1,4 +1,4 @@
-from typing import Optional, Union
+from typing import Optional, Union, cast

 import httpx
 from openai import AsyncOpenAI, OpenAI
@ -34,6 +34,7 @@ class OpenAIAudioTranscription(OpenAIChatCompletion):
        - call openai_aclient.audio.transcriptions.create by default
        """
        try:
+
            raw_response = (
                await openai_aclient.audio.transcriptions.with_raw_response.create(
                    **data, timeout=timeout
@ -93,15 +94,14 @@ class OpenAIAudioTranscription(OpenAIChatCompletion):
        Handle audio transcription request
        """
        if provider_config is not None:
-            data = provider_config.transform_audio_transcription_request(
+            transformed_data = provider_config.transform_audio_transcription_request(
                model=model,
                audio_file=audio_file,
                optional_params=optional_params,
                litellm_params=litellm_params,
            )

-            if not isinstance(data, dict):
-                raise ValueError("OpenAI transformation route requires a dict")
+            data = cast(dict, transformed_data.data)
        else:
            data = {"model": model, "file": audio_file, **optional_params}

--- a/litellm/llms/openai/transcriptions/whisper_transformation.py
+++ b/litellm/llms/openai/transcriptions/whisper_transformation.py
@ -1,8 +1,9 @@
 from typing import List, Optional, Union

-from httpx import Headers
+from httpx import Headers, Response

 from litellm.llms.base_llm.audio_transcription.transformation import (
+    AudioTranscriptionRequestData,
    BaseAudioTranscriptionConfig,
 )
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
@ -11,12 +12,40 @@ from litellm.types.llms.openai import (
    AllMessageValues,
    OpenAIAudioTranscriptionOptionalParams,
 )
-from litellm.types.utils import FileTypes
+from litellm.types.utils import FileTypes, TranscriptionResponse

 from ..common_utils import OpenAIError


 class OpenAIWhisperAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
+    def get_complete_url(
+        self,
+        api_base: Optional[str],
+        api_key: Optional[str],
+        model: str,
+        optional_params: dict,
+        litellm_params: dict,
+        stream: Optional[bool] = None,
+    ) -> str:
+        """
+        OPTIONAL
+
+        Get the complete url for the request
+
+        Some providers need `model` in `api_base`
+        """
+        ## get the api base, attach the endpoint - v1/audio/transcriptions
+        # strip trailing slash if present
+        api_base = api_base.rstrip("/") if api_base else ""
+
+        # if endswith "/v1"
+        if api_base and api_base.endswith("/v1"):
+            api_base = f"{api_base}/audio/transcriptions"
+        else:
+            api_base = f"{api_base}/v1/audio/transcriptions"
+
+        return api_base or ""
+
    def get_supported_openai_params(
        self, model: str
    ) -> List[OpenAIAudioTranscriptionOptionalParams]:
@ -72,21 +101,22 @@ class OpenAIWhisperAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
        audio_file: FileTypes,
        optional_params: dict,
        litellm_params: dict,
-    ) -> dict:
+    ) -> AudioTranscriptionRequestData:
        """
        Transform the audio transcription request
        """
-
        data = {"model": model, "file": audio_file, **optional_params}

        if "response_format" not in data or (
            data["response_format"] == "text" or data["response_format"] == "json"
        ):
-            data[
-                "response_format"
-            ] = "verbose_json"  # ensures 'duration' is received - used for cost calculation
+            data["response_format"] = (
+                "verbose_json"  # ensures 'duration' is received - used for cost calculation
+            )

-        return data
+        return AudioTranscriptionRequestData(
+            data=data,
+        )

    def get_error_class(
        self, error_message: str, status_code: int, headers: Union[dict, Headers]
@ -96,3 +126,25 @@ class OpenAIWhisperAudioTranscriptionConfig(BaseAudioTranscriptionConfig):
            message=error_message,
            headers=headers,
        )
+
+    def transform_audio_transcription_response(
+        self,
+        raw_response: Response,
+    ) -> TranscriptionResponse:
+        try:
+            raw_response_json = raw_response.json()
+        except Exception as e:
+            raise ValueError(
+                f"Error transforming response to json: {str(e)}\nResponse: {raw_response.text}"
+            )
+
+        if any(
+            key in raw_response_json
+            for key in TranscriptionResponse.model_fields.keys()
+        ):
+            return TranscriptionResponse(**raw_response_json)
+        else:
+            raise ValueError(
+                "Invalid response format. Received response does not match the expected format. "
+                f"Got: {raw_response_json}"
+            )
--- a/litellm/llms/ovhcloud/chat/transformation.py
+++ b/litellm/llms/ovhcloud/chat/transformation.py
@ -0,0 +1,141 @@
+"""
+Support for OVHCloud AI Endpoints `/v1/chat/completions` endpoint.
+
+Our unified API follows the OpenAI standard.
+More information on our website: https://endpoints.ai.cloud.ovh.net
+"""
+from typing import Optional, Union, List
+
+import httpx
+from litellm import ModelResponseStream, OpenAIGPTConfig, get_model_info, verbose_logger
+from litellm.llms.ovhcloud.utils import OVHCloudException
+from litellm.llms.base_llm.base_model_iterator import BaseModelResponseIterator
+from litellm.llms.base_llm.chat.transformation import BaseLLMException
+from litellm.types.llms.openai import AllMessageValues
+
+class OVHCloudChatConfig(OpenAIGPTConfig):
+    @property
+    def custom_llm_provider(self) -> Optional[str]:
+        return "ovhcloud"
+
+    def get_supported_openai_params(self, model: str) -> list:
+        """
+        Details about function calling support can be found here:
+        https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-function-calling?id=kb_article_view&sysparm_article=KB0071907
+        """
+        supports_function_calling: Optional[bool] = None
+        try:
+            model_info = get_model_info(model, custom_llm_provider="ovhcloud")
+            supports_function_calling = model_info.get(
+                "supports_function_calling", False
+            )
+        except Exception as e:
+            verbose_logger.debug(f"Error getting supported OpenAI params: {e}")
+            pass
+
+        optional_params = super().get_supported_openai_params(model)
+        if supports_function_calling is not True:
+            verbose_logger.debug(
+                "Models supporting function calling are listed in the OVHCloud catalog: https://endpoints.ai.cloud.ovh.net/catalog"
+            )
+            optional_params.remove("tools")
+            optional_params.remove("tool_choice")
+            optional_params.remove("function_call")
+            optional_params.remove("response_format")
+        return optional_params
+    
+    def get_complete_url(
+        self,
+        api_base: Optional[str],
+        api_key: Optional[str],
+        model: str,
+        optional_params: dict,
+        litellm_params: dict,
+        stream: Optional[bool] = None,
+    ) -> str:
+        api_base = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" if api_base is None else api_base.rstrip("/")
+        complete_url = f"{api_base}/chat/completions"
+        return complete_url
+    
+    def get_error_class(
+        self, 
+        error_message: str, 
+        status_code: int, 
+        headers: Union[dict, httpx.Headers]
+    ) -> BaseLLMException:
+        return OVHCloudException(
+            message=error_message,
+            status_code=status_code,
+            headers=headers,
+        )
+
+    def map_openai_params(
+        self,
+        non_default_params: dict,
+        optional_params: dict,
+        model: str,
+        drop_params: bool,
+    ) -> dict:
+        mapped_openai_params = super().map_openai_params(
+            non_default_params, optional_params, model, drop_params
+        )
+        return mapped_openai_params
+    
+    def transform_request(
+        self,
+        model: str,
+        messages: List[AllMessageValues],
+        optional_params: dict,
+        litellm_params: dict,
+        headers: dict,
+    ) -> dict:
+        extra_body = optional_params.pop("extra_body", {})
+        response = super().transform_request(
+            model, messages, optional_params, litellm_params, headers
+        )
+        response.update(extra_body)
+        return response
+
+class OVHCloudChatCompletionStreamingHandler(BaseModelResponseIterator):
+    """
+    Handler for OVHCloud AI Endpoints streaming chat completion responses
+    """
+
+    def chunk_parser(self, chunk: dict) -> ModelResponseStream:
+        """
+        Parse individual chunks from streaming response
+        """
+        try:
+            if "error" in chunk:
+                error_chunk = chunk["error"]
+                error_message = "OVHCloud Error: {}".format(
+                    error_chunk.get("message", "Unknown error")
+                )
+                raise OVHCloudException(
+                    message=error_message,
+                    status_code=error_chunk.get("code", 400),
+                    headers={"Content-Type": "application/json"},
+                )
+
+            new_choices = []
+            for choice in chunk["choices"]:
+                if "delta" in choice and "reasoning" in choice["delta"]:
+                    choice["delta"]["reasoning_content"] = choice["delta"].get("reasoning")
+                new_choices.append(choice)
+
+            return ModelResponseStream(
+                id=chunk["id"],
+                object="chat.completion.chunk",
+                created=chunk["created"],
+                usage=chunk.get("usage"),
+                model=chunk["model"],
+                choices=new_choices,
+            )
+        except KeyError as e:
+            raise OVHCloudException(
+                message=f"KeyError: {e}, Got unexpected response from OVHCloud: {chunk}",
+                status_code=400,
+                headers={"Content-Type": "application/json"},
+            )
+        except Exception as e:
+            raise e
--- a/litellm/llms/ovhcloud/embedding/transformation.py
+++ b/litellm/llms/ovhcloud/embedding/transformation.py
@ -0,0 +1,122 @@
+"""
+This is OpenAI compatible - no transformation is applied
+
+"""
+from typing import List, Optional, Union
+
+import httpx
+
+from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
+from litellm.llms.base_llm.chat.transformation import BaseLLMException
+from litellm.llms.base_llm.embedding.transformation import BaseEmbeddingConfig
+from litellm.secret_managers.main import get_secret_str
+from litellm.types.llms.openai import AllEmbeddingInputValues, AllMessageValues
+from litellm.types.utils import EmbeddingResponse, Usage
+
+from ..utils import OVHCloudException
+
+
+class OVHCloudEmbeddingConfig(BaseEmbeddingConfig):
+    def __init__(self) -> None:
+        pass
+
+    def get_complete_url(
+        self,
+        api_base: Optional[str],
+        api_key: Optional[str],
+        model: str,
+        optional_params: dict,
+        litellm_params: dict,
+        stream: Optional[bool] = None,
+    ) -> str:
+        api_base = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1" if api_base is None else api_base.rstrip("/")
+        complete_url = f"{api_base}/embeddings"
+        return complete_url
+
+    def validate_environment(
+        self,
+        headers: dict,
+        model: str,
+        messages: List[AllMessageValues],
+        optional_params: dict,
+        litellm_params: dict,
+        api_key: Optional[str] = None,
+        api_base: Optional[str] = None,
+    ) -> dict:
+        if api_key is None:
+            api_key = get_secret_str("OVHCLOUD_API_KEY")
+
+        default_headers = {
+            "Authorization": f"Bearer {api_key}",
+            "accept": "application/json",
+            "Content-Type": "application/json",
+        }
+
+        if "Authorization" in headers:
+            default_headers["Authorization"] = headers["Authorization"]
+
+        return {**default_headers, **headers}
+
+    def get_supported_openai_params(self, model: str):
+        return []
+
+    def map_openai_params(
+        self,
+        non_default_params: dict,
+        optional_params: dict,
+        model: str,
+        drop_params: bool,
+    ):
+        supported_openai_params = self.get_supported_openai_params(model)
+        for param, value in non_default_params.items():
+            if param in supported_openai_params:
+                optional_params[param] = value
+        return optional_params
+
+    def transform_embedding_request(
+        self,
+        model: str,
+        input: AllEmbeddingInputValues,
+        optional_params: dict,
+        headers: dict,
+    ) -> dict:
+        return {"input": input, "model": model, **optional_params}
+
+    def transform_embedding_response(
+        self,
+        model: str,
+        raw_response: httpx.Response,
+        model_response: EmbeddingResponse,
+        logging_obj: LiteLLMLoggingObj,
+        api_key: Optional[str],
+        request_data: dict,
+        optional_params: dict,
+        litellm_params: dict,
+    ) -> EmbeddingResponse:
+        try:
+            raw_response_json = raw_response.json()
+        except Exception:
+            raise OVHCloudException(
+                message=raw_response.text,
+                status_code=raw_response.status_code,
+                headers=raw_response.headers,
+            )
+
+        model_response.model = raw_response_json.get("model")
+        model_response.data = raw_response_json.get("data")
+        model_response.object = raw_response_json.get("object")
+
+        usage = Usage(
+            prompt_tokens=raw_response_json.get("usage", {}).get("prompt_tokens", 0),
+            total_tokens=raw_response_json.get("usage", {}).get("total_tokens", 0),
+        )
+
+        model_response.usage = usage
+        return model_response
+
+    def get_error_class(
+        self, error_message: str, status_code: int, headers: Union[dict, httpx.Headers]
+    ) -> BaseLLMException:
+        return OVHCloudException(
+            message=error_message, status_code=status_code, headers=headers
+        )
--- a/litellm/llms/ovhcloud/utils.py
+++ b/litellm/llms/ovhcloud/utils.py
@ -0,0 +1,6 @@
+from litellm.llms.base_llm.chat.transformation import BaseLLMException
+
+
+class OVHCloudException(BaseLLMException):
+    """OVHCloud AI Endpoints exception handling class"""
+    pass
--- a/litellm/llms/vertex_ai/text_to_speech/text_to_speech_handler.py
+++ b/litellm/llms/vertex_ai/text_to_speech/text_to_speech_handler.py
@ -1,6 +1,7 @@
-from typing import Optional, TypedDict, Union
+from typing import Optional, Union

 import httpx
+from typing_extensions import TypedDict

 import litellm
 from litellm.llms.custom_httpx.http_handler import (
--- a/litellm/llms/vertex_ai/vertex_embeddings/types.py
+++ b/litellm/llms/vertex_ai/vertex_embeddings/types.py
@ -3,7 +3,9 @@ Types for Vertex Embeddings Requests
 """

 from enum import Enum
-from typing import List, Optional, TypedDict, Union
+from typing import List, Optional, Union
+
+from typing_extensions import TypedDict


 class TaskType(str, Enum):
--- a/litellm/main.py
+++ b/litellm/main.py
@ -164,6 +164,7 @@ from .llms.openai.openai import OpenAIChatCompletion
 from .llms.openai.transcriptions.handler import OpenAIAudioTranscription
 from .llms.openai_like.chat.handler import OpenAILikeChatHandler
 from .llms.openai_like.embedding.handler import OpenAILikeEmbeddingHandler
+from .llms.ovhcloud.chat.transformation import OVHCloudChatConfig
 from .llms.petals.completion import handler as petals_handler
 from .llms.predibase.chat.handler import PredibaseChatCompletion
 from .llms.replicate.chat.handler import completion as replicate_chat_completion
@ -259,6 +260,7 @@ sagemaker_chat_completion = SagemakerChatHandler()
 bytez_transformation = BytezChatConfig()
 heroku_transformation = HerokuChatConfig()
 oci_transformation = OCIChatConfig()
+ovhcloud_transformation = OVHCloudChatConfig()
 ####### COMPLETION ENDPOINTS ################


@ -3535,6 +3537,42 @@ def completion(  # type: ignore # noqa: PLR0915

            pass

+        elif custom_llm_provider == "ovhcloud" or model in litellm.ovhcloud_models:
+            api_key = (
+                api_key
+                or litellm.ovhcloud_key
+                or get_secret_str("OVHCLOUD_API_KEY")
+                or litellm.api_key
+            )
+
+            api_base = (
+                api_base
+                or litellm.api_base
+                or get_secret_str("OVHCLOUD_API_BASE")
+                or "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"
+            )
+
+            response = base_llm_http_handler.completion(
+                model=model,
+                messages=messages,
+                headers=headers,
+                model_response=model_response,
+                api_key=api_key,
+                api_base=api_base,
+                acompletion=acompletion,
+                logging_obj=logging,
+                optional_params=optional_params,
+                litellm_params=litellm_params,
+                timeout=timeout,  # type: ignore
+                client=client,
+                custom_llm_provider=custom_llm_provider,
+                encoding=encoding,
+                stream=stream,
+                provider_config=ovhcloud_transformation,
+            )
+
+            pass
+
        elif custom_llm_provider == "custom":
            url = litellm.api_base or api_base or ""
            if url is None or url == "":
@ -4603,6 +4641,28 @@ def embedding(  # noqa: PLR0915
                aembedding=aembedding,
                headers=headers,
            )
+        elif custom_llm_provider == "ovhcloud":
+            api_key = api_key or litellm.api_key or get_secret_str("OVHCLOUD_API_KEY")
+            api_base = (
+                api_base
+                or litellm.api_base
+                or get_secret_str("OVHCLOUD_API_BASE")
+                or "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"
+            )
+            response = base_llm_http_handler.embedding(
+                model=model,
+                input=input,
+                custom_llm_provider=custom_llm_provider,
+                api_base=api_base,
+                api_key=api_key,
+                logging_obj=logging,
+                timeout=timeout,
+                model_response=EmbeddingResponse(),
+                optional_params=optional_params,
+                client=client,
+                aembedding=aembedding,
+                litellm_params={},
+            )
        elif custom_llm_provider in litellm._custom_providers:
            custom_handler: Optional[CustomLLM] = None
            for item in litellm.custom_provider_map:
@ -5297,7 +5357,10 @@ def transcription(
    model_response = litellm.utils.TranscriptionResponse()

    model, custom_llm_provider, dynamic_api_key, api_base = get_llm_provider(
-        model=model, custom_llm_provider=custom_llm_provider, api_base=api_base
+        model=model,
+        custom_llm_provider=custom_llm_provider,
+        api_base=api_base,
+        api_key=api_key,
    )  # type: ignore

    if dynamic_api_key is not None:
@ -5313,6 +5376,7 @@ def transcription(
        custom_llm_provider=custom_llm_provider,
        **non_default_params,
    )
+
    litellm_params_dict = get_litellm_params(**kwargs)

    litellm_logging_obj.update_environment_variables(
@ -5377,9 +5441,8 @@ def transcription(
            max_retries=max_retries,
            litellm_params=litellm_params_dict,
        )
-    elif (
-        custom_llm_provider == "openai"
-        or custom_llm_provider in litellm.openai_compatible_providers
+    elif custom_llm_provider == "openai" or (
+        custom_llm_provider in litellm.openai_compatible_providers
    ):
        api_base = (
            api_base
@ -5394,6 +5457,7 @@ def transcription(
            or None  # default - https://github.com/openai/openai-python/blob/284c1799070c723c6a553337134148a7ab088dd8/openai/util.py#L105
        )
        # set API KEY
+
        api_key = api_key or litellm.api_key or litellm.openai_key or get_secret("OPENAI_API_KEY")  # type: ignore
        response = openai_audio_transcriptions.audio_transcriptions(
            model=model,
@ -5410,10 +5474,7 @@ def transcription(
            provider_config=provider_config,
            litellm_params=litellm_params_dict,
        )
-    elif custom_llm_provider in [
-        LlmProviders.DEEPGRAM.value,
-        LlmProviders.ELEVENLABS.value,
-    ]:
+    elif provider_config is not None:
        response = base_llm_http_handler.audio_transcriptions(
            model=model,
            audio_file=file,
--- a/litellm/model_prices_and_context_window_backup.json
+++ b/litellm/model_prices_and_context_window_backup.json
@ -20777,5 +20777,207 @@
        "metadata": {
            "notes": "Volcengine Doubao embedding model - text-240715 version with 2560 dimensions"
        }
+    },
+    "ovhcloud/Qwen2.5-VL-72B-Instruct": {
+        "max_tokens": 32000,
+        "max_input_tokens": 32000,
+        "max_output_tokens": 32000,
+        "input_cost_per_token": 9.1e-07,
+        "output_cost_per_token": 9.1e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": false,
+        "supports_response_schema": true,
+        "supports_tool_choice": false,
+        "supports_vision": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/qwen2-5-vl-72b-instruct"
+    },
+    "ovhcloud/llava-v1.6-mistral-7b-hf": {
+        "max_tokens": 32000,
+        "max_input_tokens": 32000,
+        "max_output_tokens": 32000,
+        "input_cost_per_token": 2.9e-07,
+        "output_cost_per_token": 2.9e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": false,
+        "supports_response_schema": true,
+        "supports_tool_choice": false,
+        "supports_vision": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/llava-next-mistral-7b"
+    },
+    "ovhcloud/gpt-oss-120b": {
+        "max_tokens": 131000,
+        "max_input_tokens": 131000,
+        "max_output_tokens": 131000,
+        "input_cost_per_token": 8e-08,
+        "output_cost_per_token": 4e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": false,
+        "supports_response_schema": true,
+        "supports_tool_choice": false,
+        "supports_reasoning": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/gpt-oss-120b"
+    },
+    "ovhcloud/Meta-Llama-3_3-70B-Instruct": {
+        "max_tokens": 131000,
+        "max_input_tokens": 131000,
+        "max_output_tokens": 131000,
+        "input_cost_per_token": 6.7e-07,
+        "output_cost_per_token": 6.7e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": true,
+        "supports_response_schema": true,
+        "supports_tool_choice": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/meta-llama-3-3-70b-instruct"
+    },
+    "ovhcloud/Qwen2.5-Coder-32B-Instruct": {
+        "max_tokens": 32000,
+        "max_input_tokens": 32000,
+        "max_output_tokens": 32000,
+        "input_cost_per_token": 8.7e-07,
+        "output_cost_per_token": 8.7e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": false,
+        "supports_response_schema": true,
+        "supports_tool_choice": false,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/qwen2-5-coder-32b-instruct"
+    },
+    "ovhcloud/Mixtral-8x7B-Instruct-v0.1": {
+        "max_tokens": 32000,
+        "max_input_tokens": 32000,
+        "max_output_tokens": 32000,
+        "input_cost_per_token": 6.3e-07,
+        "output_cost_per_token": 6.3e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": false,
+        "supports_response_schema": true,
+        "supports_tool_choice": false,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/mixtral-8x7b-instruct-v0-1"
+    },
+    "ovhcloud/Meta-Llama-3_1-70B-Instruct": {
+        "max_tokens": 131000,
+        "max_input_tokens": 131000,
+        "max_output_tokens": 131000,
+        "input_cost_per_token": 6.7e-07,
+        "output_cost_per_token": 6.7e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": false,
+        "supports_response_schema": false,
+        "supports_tool_choice": false,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/meta-llama-3-1-70b-instruct"
+    },
+    "ovhcloud/Mistral-Small-3.2-24B-Instruct-2506": {
+        "max_tokens": 128000,
+        "max_input_tokens": 128000,
+        "max_output_tokens": 128000,
+        "input_cost_per_token": 9e-08,
+        "output_cost_per_token": 2.8e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": true,
+        "supports_response_schema": true,
+        "supports_tool_choice": true,
+        "supports_vision": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/mistral-small-3-2-24b-instruct-2506"
+    },
+    "ovhcloud/DeepSeek-R1-Distill-Llama-70B": {
+        "max_tokens": 131000,
+        "max_input_tokens": 131000,
+        "max_output_tokens": 131000,
+        "input_cost_per_token": 6.7e-07,
+        "output_cost_per_token": 6.7e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": true,
+        "supports_response_schema": true,
+        "supports_tool_choice": true,
+        "supports_reasoning": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/deepseek-r1-distill-llama-70b"
+    },
+    "ovhcloud/Llama-3.1-8B-Instruct": {
+        "max_tokens": 131000,
+        "max_input_tokens": 131000,
+        "max_output_tokens": 131000,
+        "input_cost_per_token": 1e-07,
+        "output_cost_per_token": 1e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": true,
+        "supports_response_schema": true,
+        "supports_tool_choice": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/llama-3-1-8b-instruct"
+    },
+    "ovhcloud/Mistral-7B-Instruct-v0.3": {
+        "max_tokens": 127000,
+        "max_input_tokens": 127000,
+        "max_output_tokens": 127000,
+        "input_cost_per_token": 1e-07,
+        "output_cost_per_token": 1e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": true,
+        "supports_response_schema": true,
+        "supports_tool_choice": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/mistral-7b-instruct-v0-3"
+    },
+    "ovhcloud/gpt-oss-20b": {
+        "max_tokens": 131000,
+        "max_input_tokens": 131000,
+        "max_output_tokens": 131000,
+        "input_cost_per_token": 4e-08,
+        "output_cost_per_token": 1.5e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": false,
+        "supports_response_schema": true,
+        "supports_tool_choice": false,
+        "supports_reasoning": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/gpt-oss-20b"
+    },
+    "ovhcloud/Mistral-Nemo-Instruct-2407": {
+        "max_tokens": 118000,
+        "max_input_tokens": 118000,
+        "max_output_tokens": 118000,
+        "input_cost_per_token": 1.3e-07,
+        "output_cost_per_token": 1.3e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": true,
+        "supports_response_schema": true,
+        "supports_tool_choice": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/mistral-nemo-instruct-2407"
+    },
+    "ovhcloud/Qwen3-32B": {
+        "max_tokens": 32000,
+        "max_input_tokens": 32000,
+        "max_output_tokens": 32000,
+        "input_cost_per_token": 8e-08,
+        "output_cost_per_token": 2.3e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": true,
+        "supports_response_schema": true,
+        "supports_tool_choice": true,
+        "supports_reasoning": true,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/qwen3-32b"
+    },
+    "ovhcloud/mamba-codestral-7B-v0.1": {
+        "max_tokens": 256000,
+        "max_input_tokens": 256000,
+        "max_output_tokens": 256000,
+        "input_cost_per_token": 1.9e-07,
+        "output_cost_per_token": 1.9e-07,
+        "litellm_provider": "ovhcloud",
+        "mode": "chat",
+        "supports_function_calling": false,
+        "supports_response_schema": true,
+        "supports_tool_choice": false,
+        "source": "https://endpoints.ai.cloud.ovh.net/models/mamba-codestral-7b-v0-1"
    }
 }
--- a/litellm/proxy/_experimental/out/_next/static/chunks/117-a0da667066d322b6.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/117-a0da667066d322b6.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/154-6f752d9e0a5e497b.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/154-6f752d9e0a5e497b.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/154-fff436ed72b19a24.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/154-fff436ed72b19a24.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/220-5061c4cea850d728.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/220-5061c4cea850d728.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/220-8af5927d18414264.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/220-8af5927d18414264.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/50-bb8a11a7610535aa.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/50-bb8a11a7610535aa.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/50-fe160ecfa8bc4059.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/50-fe160ecfa8bc4059.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/layout-25a743106e1c9456.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/layout-25a743106e1c9456.js
@ -1 +0,0 @@
-(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{6580:function(n,e,t){Promise.resolve().then(t.t.bind(t,39974,23)),Promise.resolve().then(t.t.bind(t,2778,23))},2778:function(){},39974:function(n){n.exports={style:{fontFamily:"'__Inter_b0dd8a', '__Inter_Fallback_b0dd8a'",fontStyle:"normal"},className:"__className_b0dd8a"}}},function(n){n.O(0,[919,986,971,117,744],function(){return n(n.s=6580)}),_N_E=n.O()}]);
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/layout-f4acf18888f0aa20.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/layout-f4acf18888f0aa20.js
@ -0,0 +1 @@
+(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{96443:function(n,e,t){Promise.resolve().then(t.t.bind(t,39974,23)),Promise.resolve().then(t.t.bind(t,2778,23))},2778:function(){},39974:function(n){n.exports={style:{fontFamily:"'__Inter_b0dd8a', '__Inter_Fallback_b0dd8a'",fontStyle:"normal"},className:"__className_b0dd8a"}}},function(n){n.O(0,[919,986,971,117,744],function(){return n(n.s=96443)}),_N_E=n.O()}]);
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/model_hub/page-0dbadf20167b786c.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/model_hub/page-0dbadf20167b786c.js
@ -1 +1 @@
-(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[418],{11790:function(e,n,t){Promise.resolve().then(t.bind(t,52829))},52829:function(e,n,t){"use strict";t.r(n),t.d(n,{default:function(){return f}});var u=t(57437),s=t(2265),c=t(99376),r=t(72162);function f(){let e=(0,c.useSearchParams)().get("key"),[n,t]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&t(e)},[e]),(0,u.jsx)(r.Z,{accessToken:n})}}},function(e){e.O(0,[50,521,154,162,971,117,744],function(){return e(e.s=11790)}),_N_E=e.O()}]);
+(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[418],{21024:function(e,n,t){Promise.resolve().then(t.bind(t,52829))},52829:function(e,n,t){"use strict";t.r(n),t.d(n,{default:function(){return f}});var u=t(57437),s=t(2265),c=t(99376),r=t(72162);function f(){let e=(0,c.useSearchParams)().get("key"),[n,t]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&t(e)},[e]),(0,u.jsx)(r.Z,{accessToken:n})}}},function(e){e.O(0,[50,521,154,162,971,117,744],function(){return e(e.s=21024)}),_N_E=e.O()}]);
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/model_hub_table/page-f469bae327299fbb.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/model_hub_table/page-f469bae327299fbb.js
@ -1 +1 @@
-(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[25],{58538:function(e,n,u){Promise.resolve().then(u.bind(u,22775))},22775:function(e,n,u){"use strict";u.r(n),u.d(n,{default:function(){return f}});var t=u(57437),s=u(2265),r=u(99376),c=u(36172);function f(){let e=(0,r.useSearchParams)().get("key"),[n,u]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&u(e)},[e]),(0,t.jsx)(c.Z,{accessToken:n,publicPage:!0,premiumUser:!1,userRole:null})}}},function(e){e.O(0,[50,521,866,154,162,172,971,117,744],function(){return e(e.s=58538)}),_N_E=e.O()}]);
+(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[25],{64563:function(e,n,u){Promise.resolve().then(u.bind(u,22775))},22775:function(e,n,u){"use strict";u.r(n),u.d(n,{default:function(){return f}});var t=u(57437),s=u(2265),r=u(99376),c=u(36172);function f(){let e=(0,r.useSearchParams)().get("key"),[n,u]=(0,s.useState)(null);return(0,s.useEffect)(()=>{e&&u(e)},[e]),(0,t.jsx)(c.Z,{accessToken:n,publicPage:!0,premiumUser:!1,userRole:null})}}},function(e){e.O(0,[50,521,866,154,162,172,971,117,744],function(){return e(e.s=64563)}),_N_E=e.O()}]);
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/onboarding/page-3c5840c907b0a5c8.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/onboarding/page-3c5840c907b0a5c8.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/onboarding/page-7828c2c64e97362a.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/onboarding/page-7828c2c64e97362a.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/page-127adcf8da2b5294.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/page-127adcf8da2b5294.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/app/page-338773f18570e0d6.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/app/page-338773f18570e0d6.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/main-7e39698e9e999d78.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/main-7e39698e9e999d78.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/main-ad370c92406567cc.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/main-ad370c92406567cc.js
--- a/litellm/proxy/_experimental/out/_next/static/chunks/main-app-4f7318ae681a6d94.js
+++ b/litellm/proxy/_experimental/out/_next/static/chunks/main-app-4f7318ae681a6d94.js
@ -1 +1 @@
-(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[744],{20169:function(e,n,t){Promise.resolve().then(t.t.bind(t,12846,23)),Promise.resolve().then(t.t.bind(t,19107,23)),Promise.resolve().then(t.t.bind(t,61060,23)),Promise.resolve().then(t.t.bind(t,4707,23)),Promise.resolve().then(t.t.bind(t,80,23)),Promise.resolve().then(t.t.bind(t,36423,23))}},function(e){var n=function(n){return e(e.s=n)};e.O(0,[971,117],function(){return n(54278),n(20169)}),_N_E=e.O()}]);
+(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[744],{10264:function(e,n,t){Promise.resolve().then(t.t.bind(t,12846,23)),Promise.resolve().then(t.t.bind(t,19107,23)),Promise.resolve().then(t.t.bind(t,61060,23)),Promise.resolve().then(t.t.bind(t,4707,23)),Promise.resolve().then(t.t.bind(t,80,23)),Promise.resolve().then(t.t.bind(t,36423,23))}},function(e){var n=function(n){return e(e.s=n)};e.O(0,[971,117],function(){return n(54278),n(10264)}),_N_E=e.O()}]);
--- a/litellm/proxy/_experimental/out/_next/static/css/2a9ba80f924f3272.css
+++ b/litellm/proxy/_experimental/out/_next/static/css/2a9ba80f924f3272.css
--- a/litellm/proxy/_experimental/out/_next/static/css/c528590c6415a94c.css
+++ b/litellm/proxy/_experimental/out/_next/static/css/c528590c6415a94c.css
--- a/litellm/proxy/_experimental/out/_next/static/fhuPj8WYsuMGymIUE7Xgu/_buildManifest.js
+++ b/litellm/proxy/_experimental/out/_next/static/fhuPj8WYsuMGymIUE7Xgu/_buildManifest.js
--- a/litellm/proxy/_experimental/out/_next/static/fhuPj8WYsuMGymIUE7Xgu/_ssgManifest.js
+++ b/litellm/proxy/_experimental/out/_next/static/fhuPj8WYsuMGymIUE7Xgu/_ssgManifest.js
--- a/litellm/proxy/_experimental/out/index.html
+++ b/litellm/proxy/_experimental/out/index.html
--- a/litellm/proxy/_experimental/out/index.txt
+++ b/litellm/proxy/_experimental/out/index.txt
@ -1,7 +1,7 @@
 2:I[19107,[],"ClientPageRoot"]
-3:I[30628,["665","static/chunks/3014691f-b7b79b78e27792f3.js","990","static/chunks/13b76428-ebdf3012af0e4489.js","50","static/chunks/50-fe160ecfa8bc4059.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","220","static/chunks/220-5061c4cea850d728.js","154","static/chunks/154-fff436ed72b19a24.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","931","static/chunks/app/page-127adcf8da2b5294.js"],"default",1]
+3:I[30628,["665","static/chunks/3014691f-b7b79b78e27792f3.js","990","static/chunks/13b76428-ebdf3012af0e4489.js","50","static/chunks/50-bb8a11a7610535aa.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","220","static/chunks/220-8af5927d18414264.js","154","static/chunks/154-6f752d9e0a5e497b.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","931","static/chunks/app/page-338773f18570e0d6.js"],"default",1]
 4:I[4707,[],""]
 5:I[36423,[],""]
-0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
+0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
 6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
 1:null
--- a/litellm/proxy/_experimental/out/model_hub.txt
+++ b/litellm/proxy/_experimental/out/model_hub.txt
@ -1,7 +1,7 @@
 2:I[19107,[],"ClientPageRoot"]
-3:I[52829,["50","static/chunks/50-fe160ecfa8bc4059.js","521","static/chunks/521-d97d355792d44830.js","154","static/chunks/154-fff436ed72b19a24.js","162","static/chunks/162-4e7640b4d68e1ae4.js","418","static/chunks/app/model_hub/page-d6e5fb7de2cedde9.js"],"default",1]
+3:I[52829,["50","static/chunks/50-bb8a11a7610535aa.js","521","static/chunks/521-d97d355792d44830.js","154","static/chunks/154-6f752d9e0a5e497b.js","162","static/chunks/162-4e7640b4d68e1ae4.js","418","static/chunks/app/model_hub/page-0dbadf20167b786c.js"],"default",1]
 4:I[4707,[],""]
 5:I[36423,[],""]
-0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["model_hub",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
+0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["model_hub",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
 6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
 1:null
--- a/litellm/proxy/_experimental/out/model_hub_table.html
+++ b/litellm/proxy/_experimental/out/model_hub_table.html
--- a/litellm/proxy/_experimental/out/model_hub_table.txt
+++ b/litellm/proxy/_experimental/out/model_hub_table.txt
@ -1,7 +1,7 @@
 2:I[19107,[],"ClientPageRoot"]
-3:I[22775,["50","static/chunks/50-fe160ecfa8bc4059.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","154","static/chunks/154-fff436ed72b19a24.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","25","static/chunks/app/model_hub_table/page-e06e934de1021ee4.js"],"default",1]
+3:I[22775,["50","static/chunks/50-bb8a11a7610535aa.js","521","static/chunks/521-d97d355792d44830.js","866","static/chunks/866-9e1803a09e9ae8da.js","154","static/chunks/154-6f752d9e0a5e497b.js","162","static/chunks/162-4e7640b4d68e1ae4.js","172","static/chunks/172-0f7049c565983c4d.js","25","static/chunks/app/model_hub_table/page-f469bae327299fbb.js"],"default",1]
 4:I[4707,[],""]
 5:I[36423,[],""]
-0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["model_hub_table",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub_table",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub_table","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
+0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["model_hub_table",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub_table",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub_table","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
 6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
 1:null
--- a/litellm/proxy/_experimental/out/onboarding.html
+++ b/litellm/proxy/_experimental/out/onboarding.html
--- a/litellm/proxy/_experimental/out/onboarding.txt
+++ b/litellm/proxy/_experimental/out/onboarding.txt
@ -1,7 +1,7 @@
 2:I[19107,[],"ClientPageRoot"]
-3:I[12011,["665","static/chunks/3014691f-b7b79b78e27792f3.js","50","static/chunks/50-fe160ecfa8bc4059.js","154","static/chunks/154-fff436ed72b19a24.js","461","static/chunks/app/onboarding/page-3c5840c907b0a5c8.js"],"default",1]
+3:I[12011,["665","static/chunks/3014691f-b7b79b78e27792f3.js","50","static/chunks/50-bb8a11a7610535aa.js","154","static/chunks/154-6f752d9e0a5e497b.js","461","static/chunks/app/onboarding/page-7828c2c64e97362a.js"],"default",1]
 4:I[4707,[],""]
 5:I[36423,[],""]
-0:["FMlZjJYLUentCU02Wj6R_",[[["",{"children":["onboarding",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["onboarding",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","onboarding","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/c528590c6415a94c.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
+0:["fhuPj8WYsuMGymIUE7Xgu",[[["",{"children":["onboarding",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["onboarding",{"children":["__PAGE__",{},[["$L1",["$","$L2",null,{"props":{"params":{},"searchParams":{}},"Component":"$3"}],null],null],null]},[null,["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","onboarding","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/31b7f215e119031e.css","precedence":"next","crossOrigin":"$undefined"}],["$","link","1",{"rel":"stylesheet","href":"/litellm-asset-prefix/_next/static/css/2a9ba80f924f3272.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_b0dd8a","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]],null],null],["$L6",null]]]]
 6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","link","5",{"rel":"icon","href":"./favicon.ico"}],["$","meta","6",{"name":"next-size-adjust"}]]
 1:null
--- a/litellm/proxy/_new_secret_config.yaml
+++ b/litellm/proxy/_new_secret_config.yaml
@ -1,12 +1,20 @@
 model_list:
-  - model_name: fake-openai-endpoint
+  - model_name: byok-fixed-gpt-4o-mini
    litellm_params:
-      model: openai/fake
-      api_key: fake-key
-      api_base: https://exampleopenaiendpoint-production.up.railway.app/
-  - model_name: wildcard_models/*
+      model: openai/gpt-4o-mini
+      api_base: "https://webhook.site/2f385e05-00aa-402b-86d1-efc9261471a5"
+      api_key: dummy
+  - model_name: "byok-wildcard/*"
    litellm_params:
      model: openai/*
  - model_name: xai-grok-3
    litellm_params:
      model: xai/grok-3
+  - model_name: hosted_vllm/whisper-v3
+    litellm_params:
+      model: hosted_vllm/whisper-v3
+      api_base: "https://webhook.site/2f385e05-00aa-402b-86d1-efc9261471a5"
+      api_key: dummy
+
+
+
--- a/litellm/proxy/_types.py
+++ b/litellm/proxy/_types.py
@ -1949,7 +1949,7 @@ class LiteLLM_OrganizationMembershipTable(LiteLLMPydanticObjectBase):
    model_config = ConfigDict(protected_namespaces=())


-class LiteLLM_OrganizationTableUpdate(LiteLLMPydanticObjectBase):
+class LiteLLM_OrganizationTableUpdate(LiteLLM_BudgetTable):
    """Represents user-controllable params for a LiteLLM_OrganizationTable record"""

    organization_id: Optional[str] = None
--- a/litellm/proxy/auth/auth_checks.py
+++ b/litellm/proxy/auth/auth_checks.py
@ -1211,7 +1211,6 @@ def _check_model_access_helper(
    models: List[str],
    team_model_aliases: Optional[Dict[str, str]] = None,
    team_id: Optional[str] = None,
-    object_type: Literal["user", "team", "key", "org"] = "user",
 ) -> bool:
    ## check if model in allowed model names
    from collections import defaultdict
@ -1316,7 +1315,6 @@ def _can_object_call_model(
            models=models,
            team_model_aliases=team_model_aliases,
            team_id=team_id,
-            object_type=object_type,
        ):
            return True

--- a/litellm/proxy/example_config_yaml/tool_permission_example.yaml
+++ b/litellm/proxy/example_config_yaml/tool_permission_example.yaml
@ -0,0 +1,36 @@
+model_list:
+  - model_name: claude-3-5-sonnet
+    litellm_params:
+      model: anthropic/claude-3-5-sonnet-20241022
+      api_key: os.environ/ANTHROPIC_API_KEY
+
+guardrails:
+  - guardrail_name: "tool-permission-guardrail"
+    litellm_params:
+      guardrail: tool_permission
+      mode: "post_call"
+      default_on: true  # Apply to all requests by default
+      rules:
+        - id: "allow_bash"
+          tool_name: "Bash"
+          decision: "allow"
+        - id: "allow_github_mcp"
+          tool_name: "mcp__github_*"
+          decision: "allow"
+        - id: "allow_aws_documentation"
+          tool_name: "mcp__aws-documentation_*_documentation"
+          decision: "allow"
+        - id: "deny_read_commands"
+          tool_name: "Read"
+          decision: "deny"
+      default_action: "deny"  # deny by default if no rule matches
+      on_disallowed_action: "block"  # action taken when a tool call is disallowed ("block" or "rewrite")
+
+# Optional: Configure general settings
+general_settings:
+  master_key: sk-1234
+  
+# Optional: Add logging configuration
+litellm_settings:
+  success_callback: ["langfuse"]
+  failure_callback: ["langfuse"]
--- a/litellm/proxy/guardrails/guardrail_hooks/noma/init.py
+++ b/litellm/proxy/guardrails/guardrail_hooks/noma/init.py
@ -18,6 +18,7 @@ def initialize_guardrail(litellm_params: "LitellmParams", guardrail: "Guardrail"
        application_id=litellm_params.application_id,
        monitor_mode=litellm_params.monitor_mode,
        block_failures=litellm_params.block_failures,
+        anonymize_input=litellm_params.anonymize_input,
        event_hook=litellm_params.mode,
        default_on=litellm_params.default_on,
    )
--- a/litellm/proxy/guardrails/guardrail_hooks/noma/noma.py
+++ b/litellm/proxy/guardrails/guardrail_hooks/noma/noma.py
@ -5,9 +5,10 @@
 #
 # +-------------------------------------------------------------+

+import asyncio
 import copy
 import os
-from typing import Any, Dict, Literal, Optional, Union
+from typing import Any, Dict, Final, Literal, Optional, Union
 from urllib.parse import urljoin

 from fastapi import HTTPException
@ -24,6 +25,15 @@ from litellm.proxy._types import UserAPIKeyAuth
 from litellm.types.guardrails import GuardrailEventHooks
 from litellm.types.utils import EmbeddingResponse, ImageResponse

+# Constants
+USER_ROLE: Final[Literal["user"]] = "user"
+ASSISTANT_ROLE: Final[Literal["assistant"]] = "assistant"
+SENSITIVE_DATA_DETECTOR_KEYS: Final[list[str]] = ["sensitiveData", "dataDetector"]
+
+# Type aliases
+MessageRole = Literal["user", "assistant"]
+LLMResponse = Union[Any, ModelResponse, EmbeddingResponse, ImageResponse]
+

 class NomaBlockedMessage(HTTPException):
    """Exception raised when Noma guardrail blocks a message"""
@ -77,6 +87,7 @@ class NomaBlockedMessage(HTTPException):
                "allowedTopics",
                "bannedTopics",
                "topicGuardrails",
+                "topicDetector",  # Mock name for tests
            ] and isinstance(value, dict):
                filtered_topics = {}
                for topic, topic_result in value.items():
@ -86,7 +97,7 @@ class NomaBlockedMessage(HTTPException):
                if filtered_topics:
                    result[key] = filtered_topics

-            elif key == "sensitiveData" and isinstance(value, dict):
+            elif key in SENSITIVE_DATA_DETECTOR_KEYS and isinstance(value, dict):
                filtered_sensitive = {}
                for data_type, data_result in value.items():
                    if self._is_result_true(data_result):
@ -135,6 +146,7 @@ class NomaGuardrail(CustomGuardrail):
        application_id: Optional[str] = None,
        monitor_mode: Optional[bool] = None,
        block_failures: Optional[bool] = None,
+        anonymize_input: Optional[bool] = None,
        **kwargs,
    ):
        self.async_handler = get_async_httpx_client(
@ -162,8 +174,326 @@ class NomaGuardrail(CustomGuardrail):
        else:
            self.block_failures = block_failures

+        if anonymize_input is None:
+            self.anonymize_input = (
+                os.environ.get("NOMA_ANONYMIZE_INPUT", "false").lower() == "true"
+            )
+        else:
+            self.anonymize_input = anonymize_input
+
        super().__init__(**kwargs)

+    def _create_background_noma_check(
+        self,
+        coro,
+    ) -> None:
+        """Create a background task for Noma API calls without blocking the main flow"""
+        try:
+            asyncio.create_task(coro)
+        except Exception as e:
+            verbose_proxy_logger.error(
+                f"Failed to create background Noma task: {str(e)}"
+            )
+
+    async def _process_user_message_check(
+        self,
+        request_data: dict,
+        user_auth: UserAPIKeyAuth,
+    ) -> Optional[str]:
+        """Shared logic for processing user message checks"""
+        extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
+
+        user_message = await self._extract_user_message(request_data)
+        if not user_message:
+            return None
+
+        payload = {"request": {"text": user_message}}
+        response_json = await self._call_noma_api(
+            payload=payload,
+            llm_request_id=None,
+            request_data=request_data,
+            user_auth=user_auth,
+            extra_data=extra_data,
+        )
+
+        if self.monitor_mode:
+            await self._handle_verdict_background(
+                USER_ROLE, user_message, response_json
+            )
+            return user_message
+
+        # Check if we should anonymize content
+        if self._should_anonymize(response_json, USER_ROLE):
+            anonymized_content = self._extract_anonymized_content(
+                response_json, USER_ROLE
+            )
+            if anonymized_content:
+                # Replace the user message content with anonymized version
+                self._replace_user_message_content(request_data, anonymized_content)
+                verbose_proxy_logger.debug(
+                    f"Noma guardrail anonymized user message: {anonymized_content}"
+                )
+                return anonymized_content
+
+        await self._check_verdict(USER_ROLE, user_message, response_json)
+        return user_message
+
+    async def _process_llm_response_check(
+        self,
+        request_data: dict,
+        response: LLMResponse,
+        user_auth: UserAPIKeyAuth,
+    ) -> Optional[str]:
+        """Shared logic for processing LLM response checks"""
+        extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
+
+        if not isinstance(response, litellm.ModelResponse):
+            return None
+
+        content = None
+        for choice in response.choices:
+            if isinstance(choice, litellm.Choices) and choice.message.content:
+                content = choice.message.content
+                break
+
+        if not content or not isinstance(content, str):
+            return None
+
+        payload = {"response": {"text": content}}
+
+        response_json = await self._call_noma_api(
+            payload=payload,
+            llm_request_id=response.id,
+            request_data=request_data,
+            user_auth=user_auth,
+            extra_data=extra_data,
+        )
+
+        if self.monitor_mode:
+            await self._handle_verdict_background(
+                ASSISTANT_ROLE, content, response_json
+            )
+            return content
+
+        # Check if we should anonymize content
+        if self._should_anonymize(response_json, ASSISTANT_ROLE):
+            anonymized_content = self._extract_anonymized_content(
+                response_json, ASSISTANT_ROLE
+            )
+            if anonymized_content:
+                # Replace the LLM response content with anonymized version
+                self._replace_llm_response_content(response, anonymized_content)
+                verbose_proxy_logger.debug(
+                    f"Noma guardrail anonymized LLM response: {anonymized_content}"
+                )
+                return anonymized_content
+
+        await self._check_verdict(ASSISTANT_ROLE, content, response_json)
+        return content
+
+    def _should_only_sensitive_data_failed(self, classification_obj: dict) -> bool:
+        """
+        Check if only sensitive data detectors (PII, PCI, secrets) have result=true in the classification.
+
+        Args:
+            classification_obj: The prompt or response classification object from Noma API
+
+        Returns:
+            True if only sensitiveData detectors have result=true, False otherwise
+        """
+        if not classification_obj:
+            return False
+
+        # Track which detectors have result=true (detected violations)
+        failed_detectors = []
+        sensitive_data_detected = False
+
+        for key, value in classification_obj.items():
+            if key in SENSITIVE_DATA_DETECTOR_KEYS and isinstance(value, dict):
+                # Check if any sensitive data detector has result=true
+                for data_type, data_result in value.items():
+                    if self._is_result_true(data_result):
+                        sensitive_data_detected = True
+                        # Don't add to failed_detectors as we want to allow these
+
+            elif isinstance(value, dict) and "result" in value:
+                # Check other detectors - these should NOT have result=true
+                if self._is_result_true(value):
+                    failed_detectors.append(key)
+
+            elif isinstance(value, dict):
+                # Handle nested detectors
+                for nested_key, nested_value in value.items():
+                    if self._is_result_true(nested_value):
+                        failed_detectors.append(f"{key}.{nested_key}")
+
+        # Return True only if sensitive data was detected AND no other detectors have result=true
+        return sensitive_data_detected and len(failed_detectors) == 0
+
+    def _extract_anonymized_content(
+        self, response_json: dict, message_type: MessageRole
+    ) -> Optional[str]:
+        """
+        Extract anonymized content from Noma API response.
+
+        Args:
+            response_json: The full response from Noma API
+            message_type: Either 'user' or 'assistant' to determine which content to extract
+
+        Returns:
+            The anonymized content string if available, None otherwise
+        """
+        original_response = response_json.get("originalResponse", {})
+
+        if message_type == USER_ROLE:
+            prompt_data = original_response.get("prompt", {})
+            anonymized_data = prompt_data.get("anonymizedContent", {})
+            return anonymized_data.get("anonymized")
+        elif message_type == ASSISTANT_ROLE:
+            response_data = original_response.get("response", {})
+            anonymized_data = response_data.get("anonymizedContent", {})
+            return anonymized_data.get("anonymized")
+
+        return None
+
+    def _should_anonymize(self, response_json: dict, message_type: MessageRole) -> bool:
+        """
+        Determine if content should be anonymized based on Noma API response.
+
+        Logic:
+        - If verdict=True: Content is safe, anonymize if anonymized version exists
+        - If verdict=False: Check if only sensitiveData detectors have result=True
+          - If yes: Anonymize
+          - If no: Block (other violations detected)
+
+        Args:
+            response_json: The full response from Noma API
+            message_type: Either 'user' or 'assistant' to determine which classification to check
+
+        Returns:
+            True if content should be anonymized, False if it should be blocked
+        """
+        # Only anonymize in blocking mode when anonymize_input is enabled
+        if self.monitor_mode or not self.anonymize_input:
+            return False
+
+        verdict = response_json.get("verdict", True)
+        # If verdict is True, anonymize (content is considered safe)
+        if verdict:
+            return True
+
+        # If verdict is False, check if only sensitive data detectors have result=True
+        original_response = response_json.get("originalResponse", {})
+
+        if message_type == USER_ROLE:
+            classification_obj = original_response.get("prompt", {})
+        elif message_type == ASSISTANT_ROLE:
+            classification_obj = original_response.get("response", {})
+        else:
+            return False
+
+        # Anonymize only if solely sensitive data (PII/PCI/secrets) was detected
+        return self._should_only_sensitive_data_failed(classification_obj)
+
+    def _is_result_true(self, result_obj: Optional[Dict[str, Any]]) -> bool:
+        """
+        Check if a result object has a "result" field that is True.
+
+        Args:
+            result_obj: A dictionary that may contain a "result" field
+
+        Returns:
+            True if the "result" field exists and is True, False otherwise
+        """
+        if not result_obj or not isinstance(result_obj, dict):
+            return False
+
+        return result_obj.get("result") is True
+
+    def _replace_user_message_content(
+        self, request_data: dict, anonymized_content: str
+    ):
+        """
+        Replace the user message content in request data with anonymized version.
+
+        Args:
+            request_data: The original request data
+            anonymized_content: The anonymized content to replace with
+        """
+        messages = request_data.get("messages", [])
+        if not messages:
+            return
+
+        # Find and replace the last user message
+        for i in range(len(messages) - 1, -1, -1):
+            if messages[i].get("role") == USER_ROLE:
+                messages[i]["content"] = anonymized_content
+                break
+
+    def _replace_llm_response_content(
+        self, response: LLMResponse, anonymized_content: str
+    ):
+        """
+        Replace the LLM response content with anonymized version.
+
+        Args:
+            response: The original LLM response
+            anonymized_content: The anonymized content to replace with
+        """
+        if not isinstance(response, litellm.ModelResponse):
+            return
+
+        # Replace content in all choices
+        for choice in response.choices:
+            if isinstance(choice, litellm.Choices) and choice.message.content:
+                choice.message.content = anonymized_content
+
+    async def _check_user_message_background(
+        self,
+        request_data: dict,
+        user_auth: UserAPIKeyAuth,
+    ) -> None:
+        """Check user message in background for monitor mode - non-blocking"""
+        try:
+            await self._process_user_message_check(request_data, user_auth)
+        except Exception as e:
+            verbose_proxy_logger.error(
+                f"Noma background user message check failed: {str(e)}"
+            )
+
+    async def _check_llm_response_background(
+        self,
+        request_data: dict,
+        response: LLMResponse,
+        user_auth: UserAPIKeyAuth,
+    ) -> None:
+        """Check LLM response in background for monitor mode - non-blocking"""
+        try:
+            await self._process_llm_response_check(request_data, response, user_auth)
+        except Exception as e:
+            verbose_proxy_logger.error(
+                f"Noma background response check failed: {str(e)}"
+            )
+
+    async def _handle_verdict_background(
+        self,
+        type: MessageRole,
+        message: str,
+        response_json: dict,
+    ) -> None:
+        """Handle verdict from Noma API in background - logging only, never blocks"""
+        try:
+            if not response_json.get("verdict", True):
+                msg = f"Noma guardrail blocked {type} message: {message}"
+                verbose_proxy_logger.warning(msg)
+            else:
+                msg = f"Noma guardrail allowed {type} message: {message}"
+                verbose_proxy_logger.info(msg)
+        except Exception as e:
+            verbose_proxy_logger.error(
+                f"Noma background verdict handling failed: {str(e)}"
+            )
+
    async def async_pre_call_hook(
        self,
        user_api_key_dict: UserAPIKeyAuth,
@ -191,6 +521,18 @@ class NomaGuardrail(CustomGuardrail):
        ):
            return data

+        # In monitor mode, run Noma check in background and return immediately
+        if self.monitor_mode:
+            try:
+                self._create_background_noma_check(
+                    self._check_user_message_background(data, user_api_key_dict)
+                )
+            except Exception as e:
+                verbose_proxy_logger.error(
+                    f"Failed to start background Noma pre-call check: {str(e)}"
+                )
+            return data
+
        try:
            return await self._check_user_message(data, user_api_key_dict)
        except NomaBlockedMessage:
@ -198,7 +540,7 @@ class NomaGuardrail(CustomGuardrail):
        except Exception as e:
            verbose_proxy_logger.error(f"Noma pre-call hook failed: {str(e)}")

-            if self.block_failures and not self.monitor_mode:
+            if self.block_failures:
                raise
            return data

@ -220,6 +562,18 @@ class NomaGuardrail(CustomGuardrail):
        if self.should_run_guardrail(data=data, event_type=event_type) is not True:
            return data

+        # In monitor mode, run Noma check in background and return immediately
+        if self.monitor_mode:
+            try:
+                self._create_background_noma_check(
+                    self._check_user_message_background(data, user_api_key_dict)
+                )
+            except Exception as e:
+                verbose_proxy_logger.error(
+                    f"Failed to start background Noma moderation check: {str(e)}"
+                )
+            return data
+
        try:
            return await self._check_user_message(data, user_api_key_dict)
        except NomaBlockedMessage:
@ -227,7 +581,7 @@ class NomaGuardrail(CustomGuardrail):
        except Exception as e:
            verbose_proxy_logger.error(f"Noma moderation hook failed: {str(e)}")

-            if self.block_failures and not self.monitor_mode:
+            if self.block_failures:
                raise
            return data

@ -235,19 +589,33 @@ class NomaGuardrail(CustomGuardrail):
        self,
        data: dict,
        user_api_key_dict: UserAPIKeyAuth,
-        response: Union[Any, ModelResponse, EmbeddingResponse, ImageResponse],
+        response: LLMResponse,
    ):
        event_type: GuardrailEventHooks = GuardrailEventHooks.post_call
        if self.should_run_guardrail(data=data, event_type=event_type) is not True:
            return response

+        # In monitor mode, run Noma check in background and return immediately
+        if self.monitor_mode:
+            try:
+                self._create_background_noma_check(
+                    self._check_llm_response_background(
+                        data, response, user_api_key_dict
+                    )
+                )
+            except Exception as e:
+                verbose_proxy_logger.error(
+                    f"Failed to start background Noma post-call check: {str(e)}"
+                )
+            return response
+
        try:
            return await self._check_llm_response(data, response, user_api_key_dict)
        except NomaBlockedMessage:
            raise
        except Exception as e:
            verbose_proxy_logger.error(f"Noma post-call hook failed: {str(e)}")
-            if self.block_failures and not self.monitor_mode:
+            if self.block_failures:
                raise
            return response

@ -257,55 +625,24 @@ class NomaGuardrail(CustomGuardrail):
        user_auth: UserAPIKeyAuth,
    ) -> Union[Exception, str, dict, None]:
        """Check user message for policy violations"""
-        extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
-
-        user_message = await self._extract_user_message(request_data)
+        user_message = await self._process_user_message_check(request_data, user_auth)
        if not user_message:
            return request_data

-        payload = {"request": {"text": user_message}}
-        response_json = await self._call_noma_api(
-            payload=payload,
-            llm_request_id=None,
-            request_data=request_data,
-            user_auth=user_auth,
-            extra_data=extra_data,
-        )
-        await self._check_verdict("user", user_message, response_json)
-
        return request_data

    async def _check_llm_response(
        self,
        request_data: dict,
-        response: Union[Any, ModelResponse, EmbeddingResponse, ImageResponse],
+        response: LLMResponse,
        user_auth: UserAPIKeyAuth,
    ) -> Union[Exception, ModelResponse, Any]:
        """Check LLM response for policy violations"""
-        extra_data = self.get_guardrail_dynamic_request_body_params(request_data)
-
-        if not isinstance(response, litellm.ModelResponse):
-            return response
-
-        content = None
-        for choice in response.choices:
-            if isinstance(choice, litellm.Choices) and choice.message.content:
-                content = choice.message.content
-                break
-
-        if not content or not isinstance(content, str):
-            return response
-
-        payload = {"response": {"text": content}}
-
-        response_json = await self._call_noma_api(
-            payload=payload,
-            llm_request_id=response.id,
-            request_data=request_data,
-            user_auth=user_auth,
-            extra_data=extra_data,
+        content = await self._process_llm_response_check(
+            request_data, response, user_auth
        )
-        await self._check_verdict("assistant", content, response_json)
+        if not content:
+            return response

        return response

@ -316,7 +653,7 @@ class NomaGuardrail(CustomGuardrail):
            return None

        # Get the last user message
-        user_messages = [msg for msg in messages if msg.get("role") == "user"]
+        user_messages = [msg for msg in messages if msg.get("role") == USER_ROLE]
        if not user_messages:
            return None

@ -371,7 +708,7 @@ class NomaGuardrail(CustomGuardrail):

    async def _check_verdict(
        self,
-        type: Literal["user", "assistant"],
+        type: MessageRole,
        message: str,
        response_json: dict,
    ) -> None:
@ -379,11 +716,7 @@ class NomaGuardrail(CustomGuardrail):
        Check the verdict from the Noma API and raise an exception if needed
        """
        if not response_json.get("verdict", True):
-            msg = str.format(
-                "Noma guardrail blocked {type} message: {message}",
-                type=type,
-                message=message,
-            )
+            msg = f"Noma guardrail blocked {type} message: {message}"

            if self.monitor_mode:
                verbose_proxy_logger.warning(msg)
@ -392,11 +725,7 @@ class NomaGuardrail(CustomGuardrail):
                original_response = response_json.get("originalResponse", {})
                raise NomaBlockedMessage(original_response)
        else:
-            msg = str.format(
-                "Noma guardrail allowed {type} message: {message}",
-                type=type,
-                message=message,
-            )
+            msg = f"Noma guardrail allowed {type} message: {message}"
            if self.monitor_mode:
                verbose_proxy_logger.info(msg)
            else:
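Review note: the anonymize-or-block decision added to noma.py above can be sketched standalone as follows. This is a minimal sketch, not the guardrail itself. The function names and the sample detector names (`pii`, `promptInjection`) are hypothetical; the key names and branching are copied from `_should_anonymize` and `_should_only_sensitive_data_failed` in this diff.

```python
from typing import Any, Dict, Optional

# Key names copied from SENSITIVE_DATA_DETECTOR_KEYS in the diff above.
SENSITIVE_KEYS = ("sensitiveData", "dataDetector")


def is_result_true(obj: Optional[Dict[str, Any]]) -> bool:
    """True only when obj is a dict whose "result" field is exactly True."""
    return isinstance(obj, dict) and obj.get("result") is True


def only_sensitive_data_failed(classification: Dict[str, Any]) -> bool:
    """True when a sensitive-data detector fired and no other detector did."""
    sensitive_hit = False
    other_hits = []
    for key, value in classification.items():
        if not isinstance(value, dict):
            continue
        if key in SENSITIVE_KEYS:
            # Sensitive-data detectors are nested one level down.
            if any(is_result_true(v) for v in value.values()):
                sensitive_hit = True
        elif "result" in value:
            # Flat detector: any hit here rules out anonymization.
            if is_result_true(value):
                other_hits.append(key)
        else:
            # Nested non-sensitive detectors.
            other_hits.extend(
                f"{key}.{k}" for k, v in value.items() if is_result_true(v)
            )
    return sensitive_hit and not other_hits


def should_anonymize(
    response_json: Dict[str, Any],
    role: str,
    *,
    monitor_mode: bool = False,
    anonymize_input: bool = True,
) -> bool:
    """Mirror of the decision order in the diff: flags, verdict, detectors."""
    if monitor_mode or not anonymize_input:
        return False
    if response_json.get("verdict", True):
        return True
    section = {"user": "prompt", "assistant": "response"}.get(role)
    if section is None:
        return False
    original = response_json.get("originalResponse", {})
    return only_sensitive_data_failed(original.get(section, {}))


# Hypothetical payload: verdict is False, but only a sensitive-data detector fired.
pii_only = {
    "verdict": False,
    "originalResponse": {
        "prompt": {
            "sensitiveData": {"pii": {"result": True}},
            "promptInjection": {"result": False},
        }
    },
}
print(should_anonymize(pii_only, "user"))
```

If any non-sensitive detector also reports `result: true`, the function returns False and the caller falls through to the blocking path (`_check_verdict` in the diff).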
--- a/litellm/proxy/guardrails/guardrail_hooks/tool_permission.py
+++ b/litellm/proxy/guardrails/guardrail_hooks/tool_permission.py
@ -0,0 +1,511 @@
+from fastapi import HTTPException
+
+import re
+from typing import Any, AsyncGenerator, Dict, List, Literal, Optional, Union
+
+from litellm import ChatCompletionToolParam
+
+from litellm._logging import verbose_proxy_logger
+from litellm.caching.dual_cache import DualCache
+from litellm.exceptions import GuardrailRaisedException
+from litellm.integrations.custom_guardrail import (
+    CustomGuardrail,
+    log_guardrail_information,
+)
+from litellm.proxy._types import UserAPIKeyAuth
+from litellm.proxy.common_utils.callback_utils import (
+    add_guardrail_to_applied_guardrails_header,
+)
+from litellm.types.guardrails import GuardrailEventHooks
+from litellm.types.proxy.guardrails.guardrail_hooks.tool_permission import (
+    PermissionError,
+    ToolPermissionRule,
+    ToolResult,
+)
+from litellm.types.utils import (
+    ModelResponse,
+    ModelResponseStream,
+    LLMResponseTypes,
+    Choices,
+    ChatCompletionMessageToolCall,
+)
+
+GUARDRAIL_NAME = "tool_permission"
+
+
+class ToolPermissionGuardrail(CustomGuardrail):
+    def __init__(
+        self,
+        rules: Optional[List[Dict]] = None,
+        default_action: Literal["deny", "allow"] = "deny",
+        on_disallowed_action: Literal["block", "rewrite"] = "block",
+        **kwargs,
+    ):
+        """
+        Initialize the Tool Permission Guardrail
+
+        Args:
+            rules: List of permission rules
+            default_action: Default action when no rule matches ("allow" or "deny")
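+            on_disallowed_action: What to do when a tool is denied: "block" raises an error, "rewrite" strips the denied tools and continues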
+            **kwargs: Additional arguments passed to CustomGuardrail
+        """
+        # Set supported event hooks - this guardrail runs on pre_call and post_call
+        if "supported_event_hooks" not in kwargs:
+            kwargs["supported_event_hooks"] = [
+                GuardrailEventHooks.pre_call,
+                GuardrailEventHooks.post_call,
+            ]
+
+        super().__init__(**kwargs)
+
+        self.rules: List[ToolPermissionRule] = []
+        if rules:
+            for rule_dict in rules:
+                self.rules.append(ToolPermissionRule(**rule_dict))
+
+        self.default_action = default_action
+        self.on_disallowed_action = on_disallowed_action
+
+        verbose_proxy_logger.debug(
+            "Tool Permission Guardrail initialized with %d rules, default_action: %s",
+            len(self.rules),
+            self.default_action,
+        )
+
+    def _matches_pattern(self, tool_name: str, pattern: str) -> bool:
+        """
+        Check if a tool name matches a pattern
+
+        Supports patterns like:
+        - "Bash" - exact match
+        - "mcp__*" - prefix wildcard (matches names starting with "mcp__")
+        - "*_read" - suffix wildcard (matches names ending with "_read")
+        - "mcp__github_*_read" - infix wildcard (matches names like "mcp__github_mark_all_notifications_read")
+
+        Args:
+            tool_name: Name of the tool to check
+            pattern: Pattern to match against
+
+        Returns:
+            True if the tool name matches the pattern
+        """
+        # Handle exact matches
+        if tool_name == pattern:
+            return True
+
+        if "*" in pattern:
+            # Escape regex special chars except '*'
+            escaped_pattern = re.escape(pattern)
+            # Turn \* into .*
+            regex_pattern = escaped_pattern.replace(r"\*", ".*")
+            return bool(re.fullmatch(regex_pattern, tool_name))
+
+        return False
+
+    def _check_tool_permission(
+        self, tool_name: str
+    ) -> tuple[bool, Optional[str], Optional[str]]:
+        """
+        Check if a tool is allowed based on the configured rules
+
+        Args:
+            tool_name: Name of the tool to check
+
+        Returns:
+            Tuple of (is_allowed, rule_id, message)
+        """
+        verbose_proxy_logger.debug(f"Checking permission for tool: {tool_name}")
+
+        # Check each rule in order
+        for rule in self.rules:
+            if self._matches_pattern(tool_name, rule.tool_name):
+                is_allowed = rule.decision == "allow"
+                message = f"Tool '{tool_name}' {'allowed' if is_allowed else 'denied'} by rule '{rule.id}'"
+                verbose_proxy_logger.debug(message)
+                return is_allowed, rule.id, message
+
+        # No rule matched, use default action
+        is_allowed = self.default_action == "allow"
+        message = f"Tool '{tool_name}' {'allowed' if is_allowed else 'denied'} by default action"
+        verbose_proxy_logger.debug(message)
+        return is_allowed, None, message
+
+    def _extract_tool_calls_from_response(
+        self, response: ModelResponse
+    ) -> List[ChatCompletionMessageToolCall]:
+        """
+        Extract tool_calls from all choices in a model response.
+
+        Args:
+            response: The model response to analyze
+
+        Returns:
+            List of tool_calls blocks found in the response
+        """
+        tool_calls = []
+
+        for choice in response.choices:
+            if isinstance(choice, Choices):
+                for tool in choice.message.tool_calls or []:
+                    tool_calls.append(tool)
+
+        return tool_calls
+
+    def _modify_request_with_permission_errors(
+        self,
+        data: dict,
+        denied_tool_names: List[str],
+    ):
+        """
+        Modify the request to remove denied tools from its "tools" list
+
+        Args:
+            data: The model request to modify
+            denied_tool_names: Names of the tools that were denied
+        """
+        if not denied_tool_names:
+            return data
+
+        verbose_proxy_logger.info(
+            f"Blocking {len(denied_tool_names)} unauthorized tool uses"
+        )
+
+        # Collect the denied tool names into a set for fast lookup
+        error_tool_names = set()
+        for tool_use in denied_tool_names:
+            error_tool_names.add(tool_use)
+
+        # Modify the tools
+        tools: Optional[List[ChatCompletionToolParam]] = data.get("tools")
+        if tools is None:
+            return data
+
+        new_tools = []
+        for tool in tools:
+            # Keep unchecked non-function tools and function tools that were not denied
+            if tool["type"] != "function" or (
+                tool["function"]["name"] not in error_tool_names
+            ):
+                new_tools.append(tool)
+        data["tools"] = new_tools
+        return data
+
+    def _create_permission_error_result(
+        self, tool_call: ChatCompletionMessageToolCall, error: PermissionError
+    ) -> ToolResult:
+        """
+        Create a tool_result block for a permission error
+
+        Args:
+            tool_call: The tool call that was denied
+            error: The permission error details
+
+        Returns:
+            A tool_result block with the error message
+        """
+        error_message = f"Permission denied: {error.message}"
+        if error.rule_id:
+            error_message += f" (Rule: {error.rule_id})"
+
+        return ToolResult(
+            tool_use_id=tool_call.id, content=error_message, is_error=True
+        )
+
+    def _modify_response_with_permission_errors(
+        self,
+        response: ModelResponse,
+        denied_tools: List[tuple[ChatCompletionMessageToolCall, PermissionError]],
+    ) -> None:
+        """
+        Modify the response to replace denied tool_calls blocks with error results
+
+        Args:
+            response: The model response to modify
+            denied_tools: List of (tool_use, error) tuples for denied tools
+        """
+        if not denied_tools:
+            return
+
+        verbose_proxy_logger.info(
+            f"Blocking {len(denied_tools)} unauthorized tool uses"
+        )
+
+        # Create a mapping of tool_use_id to error result
+        error_results = {}
+        for tool_use, error in denied_tools:
+            error_result = self._create_permission_error_result(tool_use, error)
+            error_results[tool_use.id] = error_result
+
+        # Modify the response content
+        for choice in response.choices:
+            if isinstance(choice, Choices):
+                filtered_tool_calls = []
+                error_messages = []
+
+                # Rewrite tool_calls
+                for tool_call in choice.message.tool_calls or []:
+                    tool_call_id = tool_call.id
+                    if tool_call_id in error_results:
+                        error_result = error_results[tool_call_id]
+                        error_messages.append(error_result.content)
+                    else:
+                        filtered_tool_calls.append(tool_call)
+
+                choice.message.tool_calls = (
+                    filtered_tool_calls if filtered_tool_calls else None
+                )
+
+                # Add error messages to content
+                if error_messages:
+                    existing_content = choice.message.content
+                    if existing_content:
+                        choice.message.content = (
+                            existing_content + "\n\n" + "\n".join(error_messages)
+                        )
+                    else:
+                        choice.message.content = "\n".join(error_messages)
+
+    @log_guardrail_information
+    async def async_pre_call_hook(
+        self,
+        user_api_key_dict: UserAPIKeyAuth,
+        cache: DualCache,
+        data: dict,
+        call_type: Literal[
+            "completion",
+            "text_completion",
+            "embeddings",
+            "image_generation",
+            "moderation",
+            "audio_transcription",
+            "pass_through_endpoint",
+            "rerank",
+            "mcp_call",
+        ],
+    ) -> Union[Exception, str, dict, None]:
+        """Check the tools offered in the request against the permission rules before the LLM call"""
+        verbose_proxy_logger.debug("Tool Permission Guardrail Pre-Call Hook")
+
+        from litellm.proxy.common_utils.callback_utils import (
+            add_guardrail_to_applied_guardrails_header,
+        )
+
+        event_type: GuardrailEventHooks = GuardrailEventHooks.pre_call
+        if self.should_run_guardrail(data=data, event_type=event_type) is not True:
+            return data
+
+        new_tools: Optional[List[ChatCompletionToolParam]] = data.get("tools")
+        if new_tools is None:
+            verbose_proxy_logger.warning(
+                "Tool Permission Guardrail: not running guardrail. No tools in data"
+            )
+            return data
+
+        # Check permissions for each tool
+        denied_tool_names = []
+        for tool in new_tools:
+            if tool["type"] != "function":
+                continue
+            tool_name: str = tool["function"]["name"]
+
+            is_allowed, _, message = self._check_tool_permission(tool_name)
+
+            if not is_allowed and message is not None:
+                verbose_proxy_logger.warning(f"Tool Permission Guardrail: {message}")
+                if self.on_disallowed_action == "block":
+                    raise HTTPException(
+                        status_code=400,
+                        detail={
+                            "error": "Violated guardrail policy",
+                            "detection_message": message,
+                        },
+                    )
+                denied_tool_names.append(tool_name)
+
+        if denied_tool_names:
+            data = self._modify_request_with_permission_errors(data, denied_tool_names)
+
+        verbose_proxy_logger.debug(
+            "Tool Permission Guardrail Pre-Call Hook: Finished checking tools"
+        )
+
+        add_guardrail_to_applied_guardrails_header(
+            request_data=data, guardrail_name=self.guardrail_name
+        )
+        return data
+
+    @log_guardrail_information
+    async def async_post_call_success_hook(
+        self,
+        data: dict,
+        user_api_key_dict: UserAPIKeyAuth,
+        response: LLMResponseTypes,
+    ):
+        """
+        Check tool usage permissions after the LLM call
+
+        Args:
+            data: Request data
+            user_api_key_dict: User API key information (unused but required by interface)
+            response: The model response to check
+        """
+        if not isinstance(response, ModelResponse):
+            return
+
+        verbose_proxy_logger.debug(
+            "Tool Permission Guardrail Post-Call Hook: Checking response"
+        )
+
+        if not self.should_run_guardrail(
+            data=data, event_type=GuardrailEventHooks.post_call
+        ):
+            verbose_proxy_logger.debug(
+                "Tool Permission Guardrail: Skipping check (not enabled)"
+            )
+            return
+
+        # Extract tool_calls from the response
+        tool_calls = self._extract_tool_calls_from_response(response)
+
+        if not tool_calls:
+            verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
+            return
+
+        verbose_proxy_logger.debug(
+            f"Tool Permission Guardrail: Found {len(tool_calls)} tool calls"
+        )
+
+        # Check permissions for each tool use
+        denied_tools = []
+        for tool_call in tool_calls:
+            if tool_call.function.name is None:
+                continue
+            is_allowed, rule_id, message = self._check_tool_permission(
+                tool_call.function.name
+            )
+
+            if not is_allowed and message is not None:
+                verbose_proxy_logger.warning(f"Tool Permission Guardrail: {message}")
+
+                if self.on_disallowed_action == "block":
+                    raise GuardrailRaisedException(
+                        guardrail_name=self.guardrail_name,
+                        message=message,
+                    )
+                denied_tools.append(
+                    (
+                        tool_call,
+                        PermissionError(
+                            tool_name=tool_call.function.name,
+                            rule_id=rule_id,
+                            message=message,
+                        ),
+                    )
+                )
+
+        if denied_tools:
+            self._modify_response_with_permission_errors(response, denied_tools)
+
+        verbose_proxy_logger.debug(
+            "Tool Permission Guardrail Post-Call Hook: Finished checking tools"
+        )
+
+        add_guardrail_to_applied_guardrails_header(
+            request_data=data, guardrail_name=self.guardrail_name
+        )
+
+    async def async_post_call_streaming_iterator_hook(
+        self,
+        user_api_key_dict: UserAPIKeyAuth,
+        response: Any,
+        request_data: dict,
+    ) -> AsyncGenerator[ModelResponseStream, None]:
+        """
+        Check tool usage permissions after the LLM stream call
+
+        Args:
+            user_api_key_dict: User API key information (unused but required by interface)
+            response: The model response to check
+            request_data: The model request (unused but required by interface)
+        """
+
+        # Import here to avoid circular imports
+        from litellm.llms.base_llm.base_model_iterator import MockResponseIterator
+        from litellm.main import stream_chunk_builder
+        from litellm.types.utils import TextCompletionResponse
+
+        # Collect all chunks to process them together
+        all_chunks: List[ModelResponseStream] = []
+        async for chunk in response:
+            all_chunks.append(chunk)
+
+        assembled_model_response: Optional[
+            Union[ModelResponse, TextCompletionResponse]
+        ] = stream_chunk_builder(
+            chunks=all_chunks,
+        )
+        if isinstance(assembled_model_response, ModelResponse):
+            verbose_proxy_logger.debug("Tool Permission Guardrail: Checking response")
+
+            # Extract tool_calls from the response
+            tool_calls = self._extract_tool_calls_from_response(assembled_model_response)
+
+            if not tool_calls:
+                verbose_proxy_logger.debug("Tool Permission Guardrail: No tool uses found")
+                for chunk in all_chunks:
+                    yield chunk
+                return
+
+            verbose_proxy_logger.debug(
+                f"Tool Permission Guardrail: Found {len(tool_calls)} tool calls"
+            )
+
+            # Check permissions for each tool use
+            denied_tools = []
+            for tool_call in tool_calls:
+                if tool_call.function.name is None:
+                    continue
+                is_allowed, rule_id, message = self._check_tool_permission(
+                    tool_call.function.name
+                )
+
+                if not is_allowed and message is not None:
+                    verbose_proxy_logger.warning(
+                        f"Tool Permission Guardrail: {message}"
+                    )
+
+                    if self.on_disallowed_action == "block":
+                        raise GuardrailRaisedException(
+                            guardrail_name=self.guardrail_name,
+                            message=message,
+                        )
+                    denied_tools.append(
+                        (
+                            tool_call,
+                            PermissionError(
+                                tool_name=tool_call.function.name,
+                                rule_id=rule_id,
+                                message=message,
+                            ),
+                        )
+                    )
+
+            verbose_proxy_logger.debug(
+                "Tool Permission Guardrail Post-Call Hook: Finished checking tools"
+            )
+
+            if denied_tools:
+                self._modify_response_with_permission_errors(
+                    assembled_model_response, denied_tools
+                )
+
+            mock_response = MockResponseIterator(
+                model_response=assembled_model_response
+            )
+            # Return the reconstructed stream
+            async for chunk in mock_response:
+                yield chunk
+        else:
+            for chunk in all_chunks:
+                yield chunk
--- a/litellm/proxy/guardrails/guardrail_initializers.py
+++ b/litellm/proxy/guardrails/guardrail_initializers.py
@ -123,3 +123,18 @@ def initialize_hide_secrets(litellm_params: LitellmParams, guardrail: Guardrail)
    return _secret_detection_object


+def initialize_tool_permission(litellm_params: LitellmParams, guardrail: Guardrail):
+    from litellm.proxy.guardrails.guardrail_hooks.tool_permission import (
+        ToolPermissionGuardrail,
+    )
+
+    _tool_permission_callback = ToolPermissionGuardrail(
+        guardrail_name=guardrail.get("guardrail_name", ""),
+        event_hook=litellm_params.mode,
+        rules=litellm_params.rules,
+        default_action=getattr(litellm_params, "default_action", "deny"),
+        on_disallowed_action=getattr(litellm_params, "on_disallowed_action", "block"),
+        default_on=litellm_params.default_on,
+    )
+    litellm.logging_callback_manager.add_litellm_callback(_tool_permission_callback)
+    return _tool_permission_callback
--- a/litellm/proxy/guardrails/guardrail_registry.py
+++ b/litellm/proxy/guardrails/guardrail_registry.py
@ -26,6 +26,7 @@ from .guardrail_initializers import (
    initialize_lakera,
    initialize_lakera_v2,
    initialize_presidio,
+    initialize_tool_permission,
 )

 guardrail_initializer_registry = {
@ -34,6 +35,7 @@ guardrail_initializer_registry = {
    SupportedGuardrailIntegrations.LAKERA_V2.value: initialize_lakera_v2,
    SupportedGuardrailIntegrations.PRESIDIO.value: initialize_presidio,
    SupportedGuardrailIntegrations.HIDE_SECRETS.value: initialize_hide_secrets,
+    SupportedGuardrailIntegrations.TOOL_PERMISSION.value: initialize_tool_permission,
 }

 guardrail_class_registry: Dict[str, Type[CustomGuardrail]] = {}
--- a/litellm/proxy/hooks/parallel_request_limiter.py
+++ b/litellm/proxy/hooks/parallel_request_limiter.py
@ -1,10 +1,11 @@
 import asyncio
 import sys
 from datetime import datetime, timedelta
-from typing import TYPE_CHECKING, Any, List, Literal, Optional, Tuple, TypedDict, Union
+from typing import TYPE_CHECKING, Any, List, Literal, Optional, Tuple, Union

 from fastapi import HTTPException
 from pydantic import BaseModel
+from typing_extensions import TypedDict

 import litellm
 from litellm import DualCache, ModelResponse
--- a/litellm/proxy/litellm_pre_call_utils.py
+++ b/litellm/proxy/litellm_pre_call_utils.py
@ -14,11 +14,14 @@ from litellm.proxy._types import (
    AddTeamCallback,
    CommonProxyErrors,
    LitellmDataForBackendLLMCall,
+    LitellmUserRoles,
    SpecialHeaders,
    TeamCallbackMetadata,
    UserAPIKeyAuth,
-    LitellmUserRoles,
 )
+
+# Cache special headers as a frozenset for O(1) lookup performance
+_SPECIAL_HEADERS_CACHE = frozenset(v.value.lower() for v in SpecialHeaders._member_map_.values())
 from litellm.proxy.auth.route_checks import RouteChecks
 from litellm.router import Router
 from litellm.types.llms.anthropic import ANTHROPIC_API_HEADERS
@ -54,6 +57,13 @@ def parse_cache_control(cache_control):
    return cache_dict


+LITELLM_METADATA_ROUTES = (
+    "batches",
+    "/v1/messages",
+    "responses",
+    "files",
+)
+
 def _get_metadata_variable_name(request: Request) -> str:
    """
    Helper to return what the "metadata" field should be called in the request data
@ -65,22 +75,10 @@ def _get_metadata_variable_name(request: Request) -> str:
    if RouteChecks._is_assistants_api_request(request):
        return "litellm_metadata"

-    LITELLM_METADATA_ROUTES = [
-        "batches",
-        "/v1/messages",
-        "responses",
-        "files",
-    ]
-
-    if any(
-        [
-            litellm_metadata_route in request.url.path
-            for litellm_metadata_route in LITELLM_METADATA_ROUTES
-        ]
-    ):
+    if any(route in request.url.path for route in LITELLM_METADATA_ROUTES):
        return "litellm_metadata"
-    else:
-        return "metadata"
+
+    return "metadata"


 def safe_add_api_version_from_query_params(data: dict, request: Request):
@ -235,14 +233,13 @@ def clean_headers(
    """
    Removes litellm api key from headers
    """
-    special_headers = [v.value.lower() for v in SpecialHeaders._member_map_.values()]
-    special_headers = special_headers
-    if litellm_key_header_name is not None:
-        special_headers.append(litellm_key_header_name.lower())
    clean_headers = {}
-
+    litellm_key_lower = litellm_key_header_name.lower() if litellm_key_header_name is not None else None
+    
    for header, value in headers.items():
-        if header.lower() not in special_headers:
+        header_lower = header.lower()
+        # Keep the header unless it is a special header or the custom litellm key header
+        if (header_lower not in _SPECIAL_HEADERS_CACHE and (litellm_key_lower is None or header_lower != litellm_key_lower)):
            clean_headers[header] = value
    return clean_headers

@ -272,7 +269,7 @@ class LiteLLMProxyRequestSetup:
        if timeout_header is not None:
            return float(timeout_header)
        return None
-    
+
    @staticmethod
    def _get_stream_timeout_from_request(headers: dict) -> Optional[float]:
        """
@ -292,13 +289,14 @@ class LiteLLMProxyRequestSetup:
        if num_retries_header is not None:
            return int(num_retries_header)
        return None
-    
+
    @staticmethod
    def _get_spend_logs_metadata_from_request_headers(headers: dict) -> Optional[dict]:
        """
        Get the `spend_logs_metadata` from the request headers.
        """
        from litellm.litellm_core_utils.safe_json_loads import safe_json_loads
+
        spend_logs_metadata_header = headers.get("x-litellm-spend-logs-metadata", None)
        if spend_logs_metadata_header is not None:
            return safe_json_loads(spend_logs_metadata_header)
@ -337,16 +335,24 @@ class LiteLLMProxyRequestSetup:
        return None

    @staticmethod
-    def add_internal_user_from_user_mapping(general_settings: Optional[Dict], user_api_key_dict: UserAPIKeyAuth, headers: dict) -> UserAPIKeyAuth:
+    def add_internal_user_from_user_mapping(
+        general_settings: Optional[Dict],
+        user_api_key_dict: UserAPIKeyAuth,
+        headers: dict,
+    ) -> UserAPIKeyAuth:
        if general_settings is None:
            return user_api_key_dict
        user_header_mapping = general_settings.get("user_header_mappings")
        if not user_header_mapping:
            return user_api_key_dict
-        header_name = LiteLLMProxyRequestSetup.get_internal_user_header_from_mapping(user_header_mapping)
+        header_name = LiteLLMProxyRequestSetup.get_internal_user_header_from_mapping(
+            user_header_mapping
+        )
        if not header_name:
            return user_api_key_dict
-        header_value = LiteLLMProxyRequestSetup._get_case_insensitive_header(headers, header_name)
+        header_value = LiteLLMProxyRequestSetup._get_case_insensitive_header(
+            headers, header_name
+        )
        if header_value:
            user_api_key_dict.user_id = header_value
            return user_api_key_dict
@ -429,15 +435,25 @@ class LiteLLMProxyRequestSetup:
        """
        Add headers to the LLM call by model group
        """
+        from litellm.proxy.auth.auth_checks import _check_model_access_helper
+        from litellm.proxy.proxy_server import llm_router
+
        data_model = data.get("model")
+
        if (
            data_model is not None
            and litellm.model_group_settings is not None
            and litellm.model_group_settings.forward_client_headers_to_llm_api
            is not None
-            and data_model
-            in litellm.model_group_settings.forward_client_headers_to_llm_api
+            and _check_model_access_helper(
+                model=data_model,
+                llm_router=llm_router,
+                models=litellm.model_group_settings.forward_client_headers_to_llm_api,
+                team_model_aliases=user_api_key_dict.team_model_aliases,
+                team_id=user_api_key_dict.team_id,
+            )  # handles aliases, wildcards, etc.
        ):
+
            _headers = LiteLLMProxyRequestSetup.add_headers_to_llm_call(
                headers, user_api_key_dict
            )
@ -497,8 +513,10 @@ class LiteLLMProxyRequestSetup:
        timeout = LiteLLMProxyRequestSetup._get_timeout_from_request(headers)
        if timeout is not None:
            data["timeout"] = timeout
-        
-        stream_timeout = LiteLLMProxyRequestSetup._get_stream_timeout_from_request(headers)
+
+        stream_timeout = LiteLLMProxyRequestSetup._get_stream_timeout_from_request(
+            headers
+        )
        if stream_timeout is not None:
            data["stream_timeout"] = stream_timeout

@ -507,7 +525,7 @@ class LiteLLMProxyRequestSetup:
            data["num_retries"] = num_retries

        return data
-    
+
    @staticmethod
    def add_litellm_metadata_from_request_headers(
        headers: dict,
@ -520,11 +538,16 @@ class LiteLLMProxyRequestSetup:
        Relevant issue: https://github.com/BerriAI/litellm/issues/14008
        """
        from litellm.proxy._types import LitellmMetadataFromRequestHeaders
+
        metadata_from_headers = LitellmMetadataFromRequestHeaders()
-        spend_logs_metadata = LiteLLMProxyRequestSetup._get_spend_logs_metadata_from_request_headers(headers)
+        spend_logs_metadata = (
+            LiteLLMProxyRequestSetup._get_spend_logs_metadata_from_request_headers(
+                headers
+            )
+        )
        if spend_logs_metadata is not None:
            metadata_from_headers["spend_logs_metadata"] = spend_logs_metadata
-        
+
        #########################################################################################
        # Finally update the requests metadata with the `metadata_from_headers`
        #########################################################################################
@ -714,7 +737,6 @@ async def add_litellm_data_to_request(  # noqa: PLR0915
    from litellm.proxy.proxy_server import llm_router, premium_user
    from litellm.types.proxy.litellm_pre_call_utils import SecretFields

-
    _headers = clean_headers(
        request.headers,
        litellm_key_header_name=(
@ -740,8 +762,6 @@ async def add_litellm_data_to_request(  # noqa: PLR0915
    if data.get(_metadata_variable_name, None) is None:
        data[_metadata_variable_name] = {}

-
-
    data.update(
        LiteLLMProxyRequestSetup.add_litellm_data_for_backend_llm_call(
            headers=_headers,
@ -763,7 +783,9 @@ async def add_litellm_data_to_request(  # noqa: PLR0915
        data=data, headers=_headers, user_api_key_dict=user_api_key_dict
    )

-    user_api_key_dict = LiteLLMProxyRequestSetup.add_internal_user_from_user_mapping(general_settings, user_api_key_dict, _headers)
+    user_api_key_dict = LiteLLMProxyRequestSetup.add_internal_user_from_user_mapping(
+        general_settings, user_api_key_dict, _headers
+    )

    # Parse user info from headers
    user = LiteLLMProxyRequestSetup.get_user_from_headers(_headers, general_settings)
@ -773,7 +795,6 @@ async def add_litellm_data_to_request(  # noqa: PLR0915
        if "user" not in data:
            data["user"] = user

-
    data["secret_fields"] = SecretFields(raw_headers=dict(request.headers))

    ## Dynamic api version (Azure OpenAI endpoints) ##
--- a/litellm/proxy/management_endpoints/common_utils.py
+++ b/litellm/proxy/management_endpoints/common_utils.py
@ -93,7 +93,7 @@ async def _upsert_budget_and_membership(
        create_data["tpm_limit"] = tpm_limit
    if rpm_limit is not None:
        create_data["rpm_limit"] = rpm_limit
-    
+
    new_budget = await tx.litellm_budgettable.create(
        data=create_data,
        include={"team_membership": True},
--- a/litellm/proxy/management_endpoints/key_management_endpoints.py
+++ b/litellm/proxy/management_endpoints/key_management_endpoints.py
@ -925,6 +925,15 @@ async def prepare_key_update_data(
            detail="team_id is required for service account keys. Please specify `team_id` in the request body.",
        )
    non_default_values = {}
+    # ADD METADATA FIELDS
+    # Set Management Endpoint Metadata Fields
+    for field in LiteLLM_ManagementEndpoint_MetadataFields_Premium:
+        if getattr(data, field, None) is not None:
+            _set_object_metadata_field(
+                object_data=data,
+                field_name=field,
+                value=getattr(data, field),
+            )
    for k, v in data_json.items():
        if (
            k in LiteLLM_ManagementEndpoint_MetadataFields
@ -1137,6 +1146,9 @@ async def update_key_fn(
                change_initiated_by=user_api_key_dict,
                llm_router=llm_router,
            )
+
+            # Set Management Endpoint Metadata Fields
+
        non_default_values = await prepare_key_update_data(
            data=data, existing_key_row=existing_key_row
        )
--- a/litellm/proxy/management_endpoints/organization_endpoints.py
+++ b/litellm/proxy/management_endpoints/organization_endpoints.py
@ -36,6 +36,28 @@ from litellm.proxy.utils import PrismaClient
 router = APIRouter()


+def handle_nested_budget_structure_in_organization_update_request(raw_data: dict) -> dict:
+    """
+    Transform organization update request to handle UI payload format.
+    
+    The UI sends nested budget data in 'litellm_budget_table', but our
+    model expects flat budget fields at the top level.
+    """
+    transformed_data = raw_data.copy()
+    
+    # Handle nested budget structure from UI
+    if 'litellm_budget_table' in transformed_data:
+        budget_data = transformed_data.pop('litellm_budget_table', {})
+        if budget_data:
+            # Extract valid budget fields and merge into top level
+            budget_fields = LiteLLM_BudgetTable.model_fields.keys()
+            for key, value in budget_data.items():
+                if key in budget_fields and value is not None:
+                    transformed_data[key] = value
+    
+    return transformed_data
+
+
@router.post(
    "/organization/new",
    tags=["organization management"],
@ -248,7 +270,7 @@ async def _set_object_permission(
    response_model=LiteLLM_OrganizationTableWithMembers,
 )
 async def update_organization(
-    data: LiteLLM_OrganizationTableUpdate,
+    request: Request,
    user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
 ):
    """
@ -270,6 +292,13 @@ async def update_organization(
            },
        )

+    # Transform UI payload to expected format
+    raw_data = await request.json()
+    raw_data_with_flat_budget_fields = handle_nested_budget_structure_in_organization_update_request(raw_data)
+    
+    # Create validated data model
+    data = LiteLLM_OrganizationTableUpdate(**raw_data_with_flat_budget_fields)
+    
    if data.updated_by is None:
        data.updated_by = user_api_key_dict.user_id

@ -293,6 +322,23 @@ async def update_organization(
            existing_organization_row=existing_organization_row,
        )

+    # Handle budget updates if budget fields are provided
+    budget_fields = {k: v for k, v in data.model_dump().items() 
+                    if k in LiteLLM_BudgetTable.model_fields.keys() and v is not None}
+    
+    if budget_fields and existing_organization_row.budget_id:
+        await update_budget(
+            budget_obj=BudgetNewRequest(
+                budget_id=existing_organization_row.budget_id,
+                **budget_fields
+            ),
+            user_api_key_dict=user_api_key_dict,
+        )
+    
+    # Remove budget fields from organization update data
+    for field in LiteLLM_BudgetTable.model_fields.keys():
+        updated_organization_row.pop(field, None)
+
    response = await prisma_client.db.litellm_organizationtable.update(
        where={"organization_id": data.organization_id},
        data=updated_organization_row,
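
Note on the `organization_endpoints.py` change above: the new helper lifts budget fields out of the nested `litellm_budget_table` object that the UI sends, so that they sit at the top level where `LiteLLM_OrganizationTableUpdate` expects them. Below is a minimal standalone sketch of that transformation. It assumes a hand-written `BUDGET_FIELDS` set as a hypothetical stand-in for `LiteLLM_BudgetTable.model_fields.keys()`, and the function name `flatten_budget` is illustrative, not part of the commit.

```python
from typing import Any, Dict, Set

# Hypothetical stand-in for LiteLLM_BudgetTable.model_fields.keys();
# the real field set is defined by the litellm model, not here.
BUDGET_FIELDS: Set[str] = {"max_budget", "soft_budget", "tpm_limit", "rpm_limit"}


def flatten_budget(
    raw_data: Dict[str, Any], budget_fields: Set[str] = BUDGET_FIELDS
) -> Dict[str, Any]:
    """Move recognised, non-None budget fields from the nested table to the top level."""
    transformed = raw_data.copy()  # shallow copy, so the caller's dict keeps its keys
    budget_data = transformed.pop("litellm_budget_table", None) or {}
    for key, value in budget_data.items():
        # Unknown keys and explicit None values are dropped, as in the diff.
        if key in budget_fields and value is not None:
            transformed[key] = value
    return transformed


payload = {
    "organization_id": "org-123",
    "litellm_budget_table": {"max_budget": 50.0, "tpm_limit": None, "unknown": 1},
}
flat = flatten_budget(payload)
```

With this payload, only `max_budget` is promoted: `tpm_limit` is skipped because it is `None`, and `unknown` is skipped because it is not a budget field. A nested value overwrites a top-level value of the same name, which matches the assignment order in the helper.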
--- a/litellm/proxy/management_endpoints/scim/scim_v2.py
+++ b/litellm/proxy/management_endpoints/scim/scim_v2.py
@ -5,7 +5,7 @@ This is an enterprise feature and requires a premium license.
 """

 import uuid
-from typing import Any, Dict, List, Optional, Set, Tuple, TypedDict
+from typing import Any, Dict, List, Optional, Set, Tuple

 from fastapi import (
    APIRouter,
@ -17,6 +17,7 @@ from fastapi import (
    Request,
    Response,
 )
+from typing_extensions import TypedDict

 import litellm
 from litellm._logging import verbose_proxy_logger
--- a/litellm/proxy/pass_through_endpoints/llm_provider_handlers/assembly_passthrough_logging_handler.py
+++ b/litellm/proxy/pass_through_endpoints/llm_provider_handlers/assembly_passthrough_logging_handler.py
@ -2,10 +2,11 @@ import asyncio
 import json
 import time
 from datetime import datetime
-from typing import Literal, Optional, TypedDict
+from typing import Literal, Optional
 from urllib.parse import urlparse

 import httpx
+from typing_extensions import TypedDict

 import litellm
 from litellm._logging import verbose_proxy_logger
--- a/litellm/proxy/spend_tracking/spend_management_endpoints.py
+++ b/litellm/proxy/spend_tracking/spend_management_endpoints.py
@ -10,7 +10,6 @@ from fastapi import APIRouter, Depends, HTTPException, status

 import litellm
 from litellm._logging import verbose_proxy_logger
-from litellm.router_strategy.budget_limiter import RouterBudgetLimiting
 from litellm.proxy._types import *
 from litellm.proxy._types import ProviderBudgetResponse, ProviderBudgetResponseObject
 from litellm.proxy.auth.user_api_key_auth import user_api_key_auth
@ -18,6 +17,7 @@ from litellm.proxy.spend_tracking.spend_tracking_utils import (
    get_spend_by_team_and_customer,
 )
 from litellm.proxy.utils import handle_exception_on_proxy
+from litellm.router_strategy.budget_limiter import RouterBudgetLimiting

 if TYPE_CHECKING:
    from litellm.proxy.proxy_server import PrismaClient
@ -1660,6 +1660,12 @@ async def ui_view_spend_logs(  # noqa: PLR0915
    model: Optional[str] = fastapi.Query(
        default=None, description="Filter logs by model"
    ),
+    key_alias: Optional[str] = fastapi.Query(
+        default=None, description="Filter logs by key alias"
+    ),
+    end_user: Optional[str] = fastapi.Query(
+        default=None, description="Filter logs by end user"
+    ),
 ):
    """
    View spend logs for UI with pagination support
@ -1728,6 +1734,15 @@ async def ui_view_spend_logs(  # noqa: PLR0915
        if model is not None:
            where_conditions["model"] = model

+        if key_alias is not None:
+            where_conditions["metadata"] = {
+                "path": ["user_api_key_alias"],
+                "string_contains": key_alias,
+            }
+
+        if end_user is not None:
+            where_conditions["end_user"] = end_user
+
        if min_spend is not None or max_spend is not None:
            where_conditions["spend"] = {}
            if min_spend is not None:
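
Note on the `ui_view_spend_logs` change above: the two new query parameters are translated into filter entries on `where_conditions`. The sketch below isolates that mapping in a plain function so the resulting dictionary shape is easy to inspect. The function name `build_where_conditions` is hypothetical, and the sketch only builds the dictionary; it does not call Prisma.

```python
from typing import Any, Dict, Optional


def build_where_conditions(
    model: Optional[str] = None,
    key_alias: Optional[str] = None,
    end_user: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the filter dict the way the diff does, one optional filter at a time."""
    where_conditions: Dict[str, Any] = {}
    if model is not None:
        where_conditions["model"] = model
    if key_alias is not None:
        # The key alias lives inside the JSON metadata column, so the filter
        # targets the "user_api_key_alias" path and matches on a substring.
        where_conditions["metadata"] = {
            "path": ["user_api_key_alias"],
            "string_contains": key_alias,
        }
    if end_user is not None:
        # end_user is a plain column, so it is matched exactly.
        where_conditions["end_user"] = end_user
    return where_conditions


filters = build_where_conditions(key_alias="prod", end_user="customer-1")
```

Note the asymmetry: `key_alias` is a substring match against a JSON path, while `end_user` is an exact match on a column.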
--- a/litellm/router.py
+++ b/litellm/router.py
@ -4414,7 +4414,7 @@ class Router:
                return tpm_key

        except Exception as e:
-            verbose_router_logger.exception(
+            verbose_router_logger.debug(
                 "litellm.router.Router::deployment_callback_on_success(): Exception occurred - {}".format(
                    str(e)
                )
@ -4562,8 +4562,10 @@ class Router:
            parent_otel_span=parent_otel_span,
            ttl=RoutingArgs.ttl.value,
        )
-    
-    def _get_metadata_variable_name_from_kwargs(self, kwargs: dict) -> Literal["metadata", "litellm_metadata"]:
+
+    def _get_metadata_variable_name_from_kwargs(
+        self, kwargs: dict
+    ) -> Literal["metadata", "litellm_metadata"]:
        """
        Helper to return what the "metadata" field should be called in the request data

@ -5672,11 +5674,11 @@ class Router:
                )
                if supported_openai_params is None:
                    supported_openai_params = []
-                
+
                # Get mode from database model_info if available, otherwise default to "chat"
                db_model_info = model.get("model_info", {})
                mode = db_model_info.get("mode", "chat")
-                
+
                model_info = ModelMapInfo(
                    key=model_group,
                    max_tokens=None,
@ -6802,7 +6804,9 @@ class Router:
            model=model,
            request_kwargs=request_kwargs,
            healthy_deployments=healthy_deployments,
-            metadata_variable_name=self._get_metadata_variable_name_from_kwargs(request_kwargs),
+            metadata_variable_name=self._get_metadata_variable_name_from_kwargs(
+                request_kwargs
+            ),
        )

        if len(healthy_deployments) == 0:
--- a/litellm/router_utils/cooldown_cache.py
+++ b/litellm/router_utils/cooldown_cache.py
@ -3,7 +3,9 @@ Wrapper around router cache. Meant to handle model cooldown logic
 """

 import time
-from typing import TYPE_CHECKING, Any, List, Optional, Tuple, TypedDict, Union
+from typing import TYPE_CHECKING, Any, List, Optional, Tuple, Union
+
+from typing_extensions import TypedDict

 from litellm import verbose_logger
 from litellm.caching.caching import DualCache
--- a/litellm/router_utils/prompt_caching_cache.py
+++ b/litellm/router_utils/prompt_caching_cache.py
@ -4,7 +4,9 @@ Wrapper around router cache. Meant to store model id when prompt caching support

 import hashlib
 import json
-from typing import TYPE_CHECKING, Any, List, Optional, TypedDict, Union
+from typing import TYPE_CHECKING, Any, List, Optional, Union
+
+from typing_extensions import TypedDict

 from litellm.caching.caching import DualCache
 from litellm.caching.in_memory_cache import InMemoryCache
--- a/litellm/types/caching.py
+++ b/litellm/types/caching.py
@ -1,7 +1,8 @@
 from enum import Enum
-from typing import Any, Dict, List, Literal, Optional, TypedDict, Union
+from typing import Any, Dict, List, Literal, Optional, Union

 from pydantic import BaseModel
+from typing_extensions import TypedDict


 class LiteLLMCacheType(str, Enum):
--- a/litellm/types/guardrails.py
+++ b/litellm/types/guardrails.py
@ -1,14 +1,10 @@
 from datetime import datetime
 from enum import Enum
-from typing import Any, Dict, List, Literal, Optional, TypedDict, Union
+from typing import Any, Dict, List, Literal, Optional, Union

-from pydantic import BaseModel, ConfigDict, Field, SecretStr
+from pydantic import BaseModel, ConfigDict, Field
 from typing_extensions import Required, TypedDict

-from litellm.types.proxy.guardrails.guardrail_hooks.openai.openai_moderation import (
-    OpenAIModerationGuardrailConfigModel,
-)
-
 """
 Pydantic object defining how to set guardrails on litellm proxy

@ -41,6 +37,9 @@ class SupportedGuardrailIntegrations(Enum):
    MODEL_ARMOR = "model_armor"
    OPENAI_MODERATION = "openai_moderation"
    NOMA = "noma"
+    TOOL_PERMISSION = "tool_permission"
+
+

 class Role(Enum):
    SYSTEM = "system"
@ -312,7 +311,6 @@ class BedrockGuardrailConfigModel(BaseModel):
    )


-
 class LakeraV2GuardrailConfigModel(BaseModel):
    """Configuration parameters for the Lakera AI v2 guardrail"""

@ -375,6 +373,22 @@ class NomaGuardrailConfigModel(BaseModel):
        default=None,
        description="If True, blocks requests on API failures. Defaults to True if not provided",
    )
+    anonymize_input: Optional[bool] = Field(
+        default=None,
+        description="If True, replaces sensitive content with anonymized version when only PII/PCI/secrets are detected. Only applies in blocking mode. Defaults to False if not provided",
+    )
+
+
+class ToolPermissionGuardrailConfigModel(BaseModel):
+    """Configuration parameters for the Tool Permission guardrail"""
+
+    rules: Optional[List[Dict]] = Field(
+        default=None, description="List of permission rules for tool usage"
+    )
+    default_action: Optional[str] = Field(
+        default="Deny",
+        description="Default action when no rule matches (Allow or Deny)",
+    )


 class BaseLitellmParams(BaseModel):  # works for new and patch update guardrails
@ -425,7 +439,8 @@ class BaseLitellmParams(BaseModel):  # works for new and patch update guardrails
    )

    model: Optional[str] = Field(
-        default=None, description="Optional field if guardrail requires a 'model' parameter"
+        default=None,
+        description="Optional field if guardrail requires a 'model' parameter",
    )

    # Model Armor params
@ -446,7 +461,7 @@ class BaseLitellmParams(BaseModel):  # works for new and patch update guardrails
        default=True,
        description="Whether to fail the request if Model Armor encounters an error",
    )
-    
+
    model_config = ConfigDict(extra="allow", protected_namespaces=())


@ -464,6 +479,7 @@ class LitellmParams(
    LassoGuardrailConfigModel,
    PillarGuardrailConfigModel,
    NomaGuardrailConfigModel,
+    ToolPermissionGuardrailConfigModel,
    BaseLitellmParams,
 ):
    guardrail: str = Field(description="The type of guardrail integration to use")