feat: final improvements for prompt management api

Krrish Dholakia 2025-12-12 11:55:20 -08:00
parent 06a4fd3ef1
commit 26cd194d97
6 changed files with 1323 additions and 1 deletions


@ -0,0 +1,293 @@
# Mock Prompt Management Server
A reference implementation of the [LiteLLM Generic Prompt Management API](https://docs.litellm.ai/docs/adding_provider/generic_prompt_management_api).
This FastAPI server demonstrates how to build a prompt management API that integrates with LiteLLM without requiring a PR to the LiteLLM repository.
## Quick Start
### 1. Install Dependencies
```bash
pip install fastapi uvicorn pydantic
```
### 2. Start the Server
```bash
python mock_prompt_management_server.py
```
The server will start on `http://localhost:8080`
### 3. Test the Endpoint
```bash
# Get a prompt
curl "http://localhost:8080/beta/litellm_prompt_management?prompt_id=hello-world-prompt"
# Get a prompt with authentication
curl "http://localhost:8080/beta/litellm_prompt_management?prompt_id=hello-world-prompt" \
-H "Authorization: Bearer test-token-12345"
# List all prompts
curl "http://localhost:8080/prompts"
# Get prompt variables
curl "http://localhost:8080/prompts/hello-world-prompt/variables"
```
## Using with LiteLLM
### Configuration
Create a `config.yaml` file:
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
prompts:
  - prompt_id: "hello-world-prompt"
    litellm_params:
      prompt_integration: "generic_prompt_management"
      api_base: http://localhost:8080
      api_key: test-token-12345
```
### Start LiteLLM Proxy
```bash
litellm --config config.yaml
```
### Make a Request
```bash
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-3.5-turbo",
"prompt_id": "hello-world-prompt",
"prompt_variables": {
"domain": "data science",
"task": "analyzing customer behavior"
},
"messages": [
{"role": "user", "content": "Please help me get started"}
]
}'
```
## Available Prompts
The server includes several example prompts:
| Prompt ID | Description | Variables |
|-----------|-------------|-----------|
| `hello-world-prompt` | Basic helpful assistant | `domain`, `task` |
| `code-review-prompt` | Code review assistant | `years_experience`, `language`, `code` |
| `customer-support-prompt` | Customer support agent | `company_name`, `customer_message` |
| `data-analysis-prompt` | Data analysis expert | `analysis_type`, `dataset_name`, `data` |
| `creative-writing-prompt` | Creative writing assistant | `genre`, `length`, `topic` |
## Authentication
The server supports optional Bearer token authentication. Valid tokens for testing:
- `test-token-12345`
- `dev-token-67890`
- `prod-token-abcdef`
If no `Authorization` header is provided, requests are allowed (for testing purposes).
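The rules above can be sketched without FastAPI. This is a minimal illustration rather than the server's exact code: `check_authorization` is a hypothetical helper that returns an HTTP status code instead of raising an exception.

```python
from typing import Optional

# Test tokens listed in this README; a real deployment would use a secret store.
VALID_API_TOKENS = {"test-token-12345", "dev-token-67890", "prod-token-abcdef"}


def check_authorization(header: Optional[str]) -> int:
    """Return the HTTP status the rules above imply (hypothetical helper)."""
    if header is None:
        return 200  # no header: allowed, for testing purposes
    if not header.startswith("Bearer "):
        return 401  # header present but not in "Bearer <token>" form
    token = header[len("Bearer "):].strip()
    return 200 if token in VALID_API_TOKENS else 401
```

Returning a status code keeps the sketch testable on its own; the server raises `HTTPException` at the same decision points.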
## API Endpoints
### LiteLLM Spec Endpoints
#### `GET /beta/litellm_prompt_management`
Get a prompt by ID (required by LiteLLM).
**Query Parameters:**
- `prompt_id` (required): The prompt ID
- `project_name` (optional): Project filter
- `slug` (optional): Slug filter
- `version` (optional): Version filter
**Response:**
```json
{
"prompt_id": "hello-world-prompt",
"prompt_template": [
{
"role": "system",
"content": "You are a helpful assistant specialized in {domain}."
},
{
"role": "user",
"content": "Help me with: {task}"
}
],
"prompt_template_model": "gpt-4",
"prompt_template_optional_params": {
"temperature": 0.7,
"max_tokens": 500
}
}
```
### Convenience Endpoints (Not in LiteLLM Spec)
#### `GET /health`
Health check endpoint.
#### `GET /prompts`
List all available prompts.
#### `GET /prompts/{prompt_id}/variables`
Get all variables used in a prompt template.
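As a rough sketch of what this endpoint computes, the snippet below collects placeholder names with the same `\{(\w+)\}` pattern the mock server uses. The function name `extract_variables` is chosen here for illustration only.

```python
import re
from typing import Dict, List


def extract_variables(prompt_template: List[Dict[str, str]]) -> List[str]:
    """Collect the unique {variable} names across all messages, sorted."""
    names = set()
    for message in prompt_template:
        names.update(re.findall(r"\{(\w+)\}", message.get("content", "")))
    return sorted(names)


template = [
    {"role": "system", "content": "You are a helpful assistant specialized in {domain}."},
    {"role": "user", "content": "Help me with: {task}"},
]
print(extract_variables(template))  # ['domain', 'task']
```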
#### `POST /prompts`
Create a new prompt (in-memory only, for testing).
## Example: Full Integration Test
### 1. Start the Mock Server
```bash
python mock_prompt_management_server.py
```
### 2. Test with Python
```python
from litellm import completion
# The completion will:
# 1. Fetch the prompt from your API
# 2. Replace {domain} with "machine learning"
# 3. Replace {task} with "building a recommendation system"
# 4. Merge with your messages
# 5. Use the model and params from the prompt
response = completion(
model="gpt-4",
prompt_id="hello-world-prompt",
prompt_variables={
"domain": "machine learning",
"task": "building a recommendation system"
},
messages=[
{"role": "user", "content": "I have user behavior data from the past year."}
],
# Configure the generic prompt manager
generic_prompt_config={
"api_base": "http://localhost:8080",
"api_key": "test-token-12345",
}
)
print(response.choices[0].message.content)
```
## Customization
### Adding New Prompts
Edit the `PROMPTS_DB` dictionary in `mock_prompt_management_server.py`:
```python
PROMPTS_DB = {
"my-custom-prompt": {
"prompt_id": "my-custom-prompt",
"prompt_template": [
{
"role": "system",
"content": "You are a {role}."
},
{
"role": "user",
"content": "{user_input}"
}
],
"prompt_template_model": "gpt-4",
"prompt_template_optional_params": {
"temperature": 0.8,
"max_tokens": 1000
}
}
}
```
### Using a Database
Replace the `PROMPTS_DB` dictionary with database queries:
```python
@app.get("/beta/litellm_prompt_management")
async def get_prompt(prompt_id: str):
    # Fetch from database (`db` stands for your own async database client)
    prompt = await db.prompts.find_one({"prompt_id": prompt_id})
    if not prompt:
        raise HTTPException(status_code=404, detail="Prompt not found")
    return PromptResponse(**prompt)
```
### Adding Access Control
Use the custom query parameters for access control:
```python
@app.get("/beta/litellm_prompt_management")
async def get_prompt(
    prompt_id: str,
    project_name: Optional[str] = None,
    user_id: Optional[str] = None,
    authorization: Optional[str] = Header(None),
):
    # Raises 401 if the token is invalid (verify_api_key only returns True)
    verify_api_key(authorization)
    # Check if the caller has access to this project
    # (has_project_access is a helper you implement yourself)
    if not has_project_access(authorization, project_name):
        raise HTTPException(status_code=403, detail="Access denied")
    # Fetch and return prompt
    ...
```
## Production Considerations
Before deploying to production:
1. **Use a real database** instead of in-memory storage
2. **Implement proper authentication** with JWT tokens or API keys
3. **Add rate limiting** to prevent abuse
4. **Use HTTPS** for encrypted communication
5. **Add logging and monitoring** for observability
6. **Implement caching** for frequently accessed prompts
7. **Add versioning** for prompt management
8. **Implement access control** based on teams/users
9. **Add input validation** for all parameters
10. **Use environment variables** for configuration
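For item 10, one possible approach is to read the accepted tokens from the environment instead of hard-coding them. The sketch below assumes a comma-separated variable named `PROMPT_API_TOKENS`; that name is hypothetical and is not used by the mock server.

```python
import os
from typing import Mapping, Set


def load_valid_tokens(env: Mapping[str, str] = os.environ) -> Set[str]:
    """Parse a comma-separated token list from PROMPT_API_TOKENS (assumed name)."""
    raw = env.get("PROMPT_API_TOKENS", "")
    # Ignore empty entries and surrounding whitespace
    return {token.strip() for token in raw.split(",") if token.strip()}
```

Passing the mapping as a parameter makes the function easy to test without touching the real environment.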
## Related Documentation
- [Generic Prompt Management API Documentation](https://docs.litellm.ai/docs/adding_provider/generic_prompt_management_api)
- [LiteLLM Prompt Management](https://docs.litellm.ai/docs/proxy/prompt_management)
- [Generic Guardrail API](https://docs.litellm.ai/docs/adding_provider/generic_guardrail_api)
## Questions?
This is a reference implementation for the LiteLLM Generic Prompt Management API. For questions or issues, please open an issue on the [LiteLLM GitHub repository](https://github.com/BerriAI/litellm).


@ -0,0 +1,390 @@
#!/usr/bin/env python3
"""
Mock Prompt Management API Server
This is a FastAPI server that implements the LiteLLM Generic Prompt Management API
for testing and demonstration purposes.
Usage:
python mock_prompt_management_server.py
The server will start on http://localhost:8080
Test the endpoint:
curl "http://localhost:8080/beta/litellm_prompt_management?prompt_id=hello-world-prompt"
"""
import os
import json
from typing import Any, Dict, List, Optional
from fastapi import FastAPI, HTTPException, Header, Query, status
from fastapi.responses import JSONResponse
from pydantic import BaseModel, Field
# ============================================================================
# Response Models
# ============================================================================
class MessageContent(BaseModel):
    """A single message in the prompt template"""

    role: str = Field(..., description="Message role (system, user, assistant)")
    content: str = Field(
        ..., description="Message content with optional {variable} placeholders"
    )


class PromptResponse(BaseModel):
    """Response format for the prompt management API"""

    prompt_id: str = Field(..., description="The ID of the prompt")
    prompt_template: List[MessageContent] = Field(
        ..., description="Array of messages in OpenAI format"
    )
    prompt_template_model: Optional[str] = Field(
        None, description="Optional model to use for this prompt"
    )
    prompt_template_optional_params: Optional[Dict[str, Any]] = Field(
        None, description="Optional parameters like temperature, max_tokens, etc."
    )
# ============================================================================
# Mock Prompt Database
# ============================================================================
PROMPTS_DB = {
"hello-world-prompt": {
"prompt_id": "hello-world-prompt",
"prompt_template": [
{
"role": "system",
"content": "You are a helpful assistant specialized in {domain}.",
},
{"role": "user", "content": "Help me with: {task}"},
],
"prompt_template_model": "gpt-4",
"prompt_template_optional_params": {"temperature": 0.7, "max_tokens": 500},
},
"code-review-prompt": {
"prompt_id": "code-review-prompt",
"prompt_template": [
{
"role": "system",
"content": "You are an expert code reviewer with {years_experience} years of experience in {language}.",
},
{
"role": "user",
"content": "Please review the following code for bugs, security issues, and best practices:\n\n{code}",
},
],
"prompt_template_model": "gpt-4-turbo",
"prompt_template_optional_params": {
"temperature": 0.3,
"max_tokens": 1500,
},
},
"customer-support-prompt": {
"prompt_id": "customer-support-prompt",
"prompt_template": [
{
"role": "system",
"content": "You are a friendly customer support agent for {company_name}. Always be professional, empathetic, and solution-oriented.",
},
{
"role": "user",
"content": "Customer inquiry: {customer_message}",
},
],
"prompt_template_model": "gpt-3.5-turbo",
"prompt_template_optional_params": {
"temperature": 0.8,
"max_tokens": 800,
"top_p": 0.9,
},
},
"data-analysis-prompt": {
"prompt_id": "data-analysis-prompt",
"prompt_template": [
{
"role": "system",
"content": "You are a data scientist expert in {analysis_type} analysis.",
},
{
"role": "user",
"content": "Analyze the following data and provide insights:\n\nDataset: {dataset_name}\nData: {data}",
},
],
"prompt_template_model": "gpt-4",
"prompt_template_optional_params": {
"temperature": 0.5,
"max_tokens": 2000,
},
},
"creative-writing-prompt": {
"prompt_id": "creative-writing-prompt",
"prompt_template": [
{
"role": "system",
"content": "You are a creative writer specializing in {genre} fiction.",
},
{
"role": "user",
"content": "Write a {length} story about: {topic}",
},
],
"prompt_template_model": "gpt-4",
"prompt_template_optional_params": {
"temperature": 0.9,
"max_tokens": 3000,
"top_p": 0.95,
},
},
}
# Valid API tokens for authentication (in production, use a secure token store)
VALID_API_TOKENS = {
"test-token-12345",
"dev-token-67890",
"prod-token-abcdef",
}
# ============================================================================
# FastAPI App
# ============================================================================
app = FastAPI(
title="Mock Prompt Management API",
description="A mock server implementing the LiteLLM Generic Prompt Management API",
version="1.0.0",
)
def verify_api_key(authorization: Optional[str] = Header(None)) -> bool:
    """
    Verify the API key from the Authorization header.

    Args:
        authorization: Authorization header (Bearer token)

    Returns:
        True if valid, raises HTTPException if invalid
    """
    if authorization is None:
        # Allow requests without authentication for testing
        return True
    # Extract token from "Bearer <token>"
    if not authorization.startswith("Bearer "):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid authorization header format. Expected 'Bearer <token>'",
        )
    # Strip only the leading scheme, not later occurrences inside the token
    token = authorization[len("Bearer "):].strip()
    if token not in VALID_API_TOKENS:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid API key",
        )
    return True
@app.get("/beta/litellm_prompt_management", response_model=PromptResponse)
async def get_prompt(
    prompt_id: str = Query(..., description="The ID of the prompt to fetch"),
    project_name: Optional[str] = Query(
        None, description="Optional project name filter"
    ),
    slug: Optional[str] = Query(None, description="Optional slug filter"),
    version: Optional[str] = Query(None, description="Optional version filter"),
    authorization: Optional[str] = Header(None),
) -> PromptResponse:
    """
    Get a prompt by ID with optional filtering.

    This endpoint implements the LiteLLM Generic Prompt Management API specification.

    Args:
        prompt_id: The ID of the prompt to fetch
        project_name: Optional project name for filtering
        slug: Optional slug for filtering
        version: Optional version for filtering
        authorization: Optional Bearer token for authentication

    Returns:
        PromptResponse with the prompt template and configuration

    Raises:
        HTTPException: 401 if authentication fails, 404 if prompt not found
    """
    # Verify authentication
    verify_api_key(authorization)
    # Log the request parameters (useful for debugging)
    print(f"Fetching prompt: {prompt_id}")
    if project_name:
        print(f" Project: {project_name}")
    if slug:
        print(f" Slug: {slug}")
    if version:
        print(f" Version: {version}")
    # Check if prompt exists
    if prompt_id not in PROMPTS_DB:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"Prompt '{prompt_id}' not found. Available prompts: {list(PROMPTS_DB.keys())}",
        )
    # Get the prompt from the database
    prompt_data = PROMPTS_DB[prompt_id]
    # Optional: Apply filtering based on project_name, slug, or version
    # In a real implementation, you might use these to filter prompts by access control
    # or to fetch specific versions from your database
    return PromptResponse(**prompt_data)
@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {
        "status": "healthy",
        "service": "mock-prompt-management-api",
        "version": "1.0.0",
    }
@app.get("/prompts")
async def list_prompts(authorization: Optional[str] = Header(None)):
    """
    List all available prompts.

    This is a convenience endpoint (not part of the LiteLLM spec) for
    discovering available prompts.
    """
    # Verify authentication
    verify_api_key(authorization)
    prompts_list = [
        {
            "prompt_id": pid,
            "model": p.get("prompt_template_model"),
            "has_variables": any(
                "{" in msg.get("content", "") for msg in p.get("prompt_template", [])
            ),
        }
        for pid, p in PROMPTS_DB.items()
    ]
    return {"prompts": prompts_list, "total": len(prompts_list)}
@app.get("/prompts/{prompt_id}/variables")
async def get_prompt_variables(
    prompt_id: str, authorization: Optional[str] = Header(None)
):
    """
    Get all variables in a prompt template.

    This is a convenience endpoint (not part of the LiteLLM spec) for
    discovering what variables a prompt expects.
    """
    import re

    # Verify authentication
    verify_api_key(authorization)
    if prompt_id not in PROMPTS_DB:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"Prompt '{prompt_id}' not found",
        )
    prompt_data = PROMPTS_DB[prompt_id]
    variables = set()
    # Extract variables from the prompt template
    for message in prompt_data["prompt_template"]:
        content = message.get("content", "")
        # Find all {variable} patterns
        found_vars = re.findall(r"\{(\w+)\}", content)
        variables.update(found_vars)
    return {
        "prompt_id": prompt_id,
        "variables": sorted(variables),
        "example_usage": {
            "prompt_id": prompt_id,
            "prompt_variables": {var: f"<{var}_value>" for var in variables},
        },
    }
@app.post("/prompts")
async def create_prompt(
    prompt: PromptResponse, authorization: Optional[str] = Header(None)
):
    """
    Create a new prompt (convenience endpoint for testing).

    This is NOT part of the LiteLLM spec - it's just for testing purposes.
    """
    # Verify authentication
    verify_api_key(authorization)
    if prompt.prompt_id in PROMPTS_DB:
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail=f"Prompt '{prompt.prompt_id}' already exists",
        )
    PROMPTS_DB[prompt.prompt_id] = prompt.dict()
    return {
        "status": "created",
        "prompt_id": prompt.prompt_id,
        "message": "Prompt created successfully (in-memory only)",
    }
# ============================================================================
# Main
# ============================================================================
if __name__ == "__main__":
    import uvicorn

    print("=" * 70)
    print("Mock Prompt Management API Server")
    print("=" * 70)
    print("\nStarting server on http://localhost:8080")
    print(f"\nAvailable prompts: {len(PROMPTS_DB)}")
    for prompt_id in PROMPTS_DB.keys():
        print(f" - {prompt_id}")
    print(f"\nValid API tokens: {len(VALID_API_TOKENS)}")
    print(" - test-token-12345")
    print(" - dev-token-67890")
    print(" - prod-token-abcdef")
    print("\nEndpoints:")
    print(" GET /beta/litellm_prompt_management?prompt_id=<id> (LiteLLM spec)")
    print(" GET /health (health check)")
    print(" GET /prompts (list all prompts)")
    print(
        " GET /prompts/{id}/variables (get prompt variables)"
    )
    print(" POST /prompts (create prompt)")
    print("\nExample usage:")
    print(
        ' curl "http://localhost:8080/beta/litellm_prompt_management?prompt_id=hello-world-prompt"'
    )
    print("\nPress CTRL+C to stop the server")
    print("=" * 70)
    uvicorn.run(app, host="0.0.0.0", port=8080, log_level="info")


@ -0,0 +1,576 @@
# [BETA] Generic Prompt Management API - Integrate Without a PR
## The Problem
As a prompt management provider, integrating with LiteLLM traditionally requires:
- Making a PR to the LiteLLM repository
- Waiting for review and merge
- Maintaining provider-specific code in LiteLLM's codebase
- Updating the integration for changes to your API
## The Solution
The **Generic Prompt Management API** lets you integrate with LiteLLM **instantly** by implementing a simple API endpoint. No PR required.
### Key Benefits
1. **No PR Needed** - Deploy and integrate immediately
2. **Simple Contract** - One GET endpoint, standard JSON response
3. **Variable Substitution** - Support for prompt variables with `{variable}` syntax
4. **Custom Parameters** - Pass provider-specific query params via config
5. **Full Control** - You own and maintain your prompt management API
6. **Model & Parameters Override** - Optionally override model and parameters from your prompts
## Get Started in 3 Steps
### Step 1: Configure LiteLLM
Add to your `config.yaml`:
```yaml
prompts:
  - prompt_id: "simple_prompt"
    litellm_params:
      prompt_integration: "generic_prompt_management"
      api_base: http://localhost:8080
      api_key: os.environ/YOUR_API_KEY
```
### Step 2: Implement Your API Endpoint
```python
from fastapi import FastAPI
from pydantic import BaseModel
app = FastAPI()
@app.get("/beta/litellm_prompt_management")
async def get_prompt(prompt_id: str):
    return {
        "prompt_id": prompt_id,
        "prompt_template": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Help me with {task}"}
        ],
        "prompt_template_model": "gpt-4",
        "prompt_template_optional_params": {"temperature": 0.7}
    }
```
### Step 3: Use in Your App
```python
from litellm import completion
response = completion(
model="gpt-4",
prompt_id="simple_prompt",
prompt_variables={"task": "data analysis"},
messages=[{"role": "user", "content": "I have sales data"}]
)
```
That's it! LiteLLM fetches your prompt, applies variables, and makes the request.
## API Contract
### Endpoint
Implement `GET /beta/litellm_prompt_management`
### Request Format
Your endpoint will receive a GET request with query parameters:
```
GET /beta/litellm_prompt_management?prompt_id={prompt_id}&{custom_params}
```
**Query Parameters:**
- `prompt_id` (required): The ID of the prompt to fetch
- Custom parameters: Any additional parameters you configured in `provider_specific_query_params`
**Example:**
```
GET /beta/litellm_prompt_management?prompt_id=hello-world-prompt-2bac&project_name=litellm&slug=hello-world-prompt-2bac
```
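To make the request shape concrete, here is a small sketch that assembles such a URL from a base URL, a prompt ID, and extra query parameters. The helper `build_prompt_url` is hypothetical; how LiteLLM builds the request internally is not shown in this document.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_prompt_url(
    api_base: str, prompt_id: str, extra_params: Optional[Dict[str, str]] = None
) -> str:
    """Build the GET URL: prompt_id first, then any custom query parameters."""
    params = {"prompt_id": prompt_id, **(extra_params or {})}
    return f"{api_base.rstrip('/')}/beta/litellm_prompt_management?{urlencode(params)}"


url = build_prompt_url(
    "http://localhost:8080",
    "hello-world-prompt-2bac",
    {"project_name": "litellm", "slug": "hello-world-prompt-2bac"},
)
print(url)
```

`urlencode` percent-encodes values, so prompt IDs or parameters containing spaces or `&` stay well-formed.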
### Response Format
```json
{
"prompt_id": "hello-world-prompt-2bac",
"prompt_template": [
{
"role": "system",
"content": "You are a helpful assistant specialized in {domain}."
},
{
"role": "user",
"content": "Help me with {task}"
}
],
"prompt_template_model": "gpt-4",
"prompt_template_optional_params": {
"temperature": 0.7,
"max_tokens": 500,
"top_p": 0.9
}
}
```
**Response Fields:**
- `prompt_id` (string, required): The ID of the prompt
- `prompt_template` (array, required): Array of OpenAI-format messages with optional `{variable}` placeholders
- `prompt_template_model` (string, optional): Model to use for this prompt (overrides client model unless `ignore_prompt_manager_model: true`)
- `prompt_template_optional_params` (object, optional): Additional parameters like temperature, max_tokens, etc. (merged with client params unless `ignore_prompt_manager_optional_params: true`)
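Before wiring an endpoint to LiteLLM, it can help to check a payload against the field list above. The following is a minimal standard-library sketch; `validate_prompt_response` is a hypothetical helper, and the checks cover only the types listed here.

```python
from typing import Any, Dict, List


def validate_prompt_response(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    if not isinstance(payload.get("prompt_id"), str):
        errors.append("prompt_id must be a string")
    template = payload.get("prompt_template")
    if not isinstance(template, list):
        errors.append("prompt_template must be an array")
    else:
        for i, msg in enumerate(template):
            if not (isinstance(msg, dict) and "role" in msg and "content" in msg):
                errors.append(f"prompt_template[{i}] needs 'role' and 'content'")
    model = payload.get("prompt_template_model")
    if model is not None and not isinstance(model, str):
        errors.append("prompt_template_model must be a string when present")
    params = payload.get("prompt_template_optional_params")
    if params is not None and not isinstance(params, dict):
        errors.append("prompt_template_optional_params must be an object when present")
    return errors
```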
## LiteLLM Configuration
Add to `config.yaml`:
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
prompts:
  - prompt_id: "simple_prompt"
    litellm_params:
      prompt_integration: "generic_prompt_management"
      provider_specific_query_params:
        project_name: litellm
        slug: hello-world-prompt-2bac
      api_base: http://localhost:8080
      api_key: os.environ/YOUR_PROMPT_API_KEY # optional
      ignore_prompt_manager_model: true # optional, keep client's model
      ignore_prompt_manager_optional_params: true # optional, don't merge prompt manager's params (e.g. temperature, max_tokens, etc.)
```
### Configuration Parameters
- `prompt_integration`: Must be `"generic_prompt_management"`
- `provider_specific_query_params`: Custom query parameters sent to your API (optional)
- `api_base`: Base URL of your prompt management API
- `api_key`: Optional API key for authentication (sent as `Bearer` token)
- `ignore_prompt_manager_model`: If `true`, use the model specified by client instead of prompt's model (default: `false`)
- `ignore_prompt_manager_optional_params`: If `true`, don't merge prompt's optional params with client params (default: `false`)
## Usage
### Using with LiteLLM SDK
**Basic usage with prompt ID:**
```python
from litellm import completion
response = completion(
model="gpt-4",
prompt_id="simple_prompt",
messages=[{"role": "user", "content": "Additional message"}]
)
```
**With prompt variables:**
```python
response = completion(
model="gpt-4",
prompt_id="simple_prompt",
prompt_variables={
"domain": "data science",
"task": "analyzing customer churn"
},
messages=[{"role": "user", "content": "Please provide a detailed analysis"}]
)
```
The prompt template will have `{domain}` replaced with "data science" and `{task}` replaced with "analyzing customer churn".
### Using with LiteLLM Proxy
**1. Start the proxy with your config:**
```bash
litellm --config /path/to/config.yaml
```
**2. Make requests with prompt_id:**
```bash
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-4",
"prompt_id": "simple_prompt",
"prompt_variables": {
"domain": "healthcare",
"task": "patient risk assessment"
},
"messages": [
{"role": "user", "content": "Analyze the following data..."}
]
}'
```
**3. Using with OpenAI SDK:**
```python
from openai import OpenAI
client = OpenAI(
base_url="http://0.0.0.0:4000",
api_key="sk-1234"
)
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "user", "content": "Analyze the data"}
],
extra_body={
"prompt_id": "simple_prompt",
"prompt_variables": {
"domain": "finance",
"task": "fraud detection"
}
}
)
```
## Implementation Example
See [mock_prompt_management_server.py](https://github.com/BerriAI/litellm/blob/main/cookbook/mock_prompt_management_server/mock_prompt_management_server.py) for a complete reference implementation with multiple example prompts, authentication, and convenience endpoints.
**Minimal FastAPI example:**
```python
from fastapi import FastAPI, HTTPException, Header
from typing import Optional, Dict, Any, List
from pydantic import BaseModel
app = FastAPI()
# In-memory prompt storage (replace with your database)
PROMPTS = {
"hello-world-prompt": {
"prompt_id": "hello-world-prompt",
"prompt_template": [
{
"role": "system",
"content": "You are a helpful assistant specialized in {domain}."
},
{
"role": "user",
"content": "Help me with: {task}"
}
],
"prompt_template_model": "gpt-4",
"prompt_template_optional_params": {
"temperature": 0.7,
"max_tokens": 500
}
},
"code-review-prompt": {
"prompt_id": "code-review-prompt",
"prompt_template": [
{
"role": "system",
"content": "You are an expert code reviewer. Review code for {language}."
},
{
"role": "user",
"content": "Review the following code:\n\n{code}"
}
],
"prompt_template_model": "gpt-4-turbo",
"prompt_template_optional_params": {
"temperature": 0.3,
"max_tokens": 1000
}
}
}
class PromptResponse(BaseModel):
    prompt_id: str
    prompt_template: List[Dict[str, str]]
    prompt_template_model: Optional[str] = None
    prompt_template_optional_params: Optional[Dict[str, Any]] = None


@app.get("/beta/litellm_prompt_management", response_model=PromptResponse)
async def get_prompt(
    prompt_id: str,
    authorization: Optional[str] = Header(None),
    project_name: Optional[str] = None,
    slug: Optional[str] = None,
):
    """
    Get a prompt by ID with optional filtering by project_name and slug.

    Args:
        prompt_id: The ID of the prompt to fetch
        authorization: Optional Bearer token for authentication
        project_name: Optional project name filter
        slug: Optional slug filter
    """
    # Optional: Validate authorization
    if authorization:
        token = authorization.replace("Bearer ", "")
        # Validate your token here
        if not is_valid_token(token):
            raise HTTPException(status_code=401, detail="Invalid API key")
    # Optional: Apply additional filtering based on custom params
    if project_name or slug:
        # You can use these parameters to filter or validate access
        # For example, check if the user has access to this project
        pass
    # Fetch the prompt from your storage
    if prompt_id not in PROMPTS:
        raise HTTPException(
            status_code=404,
            detail=f"Prompt '{prompt_id}' not found"
        )
    prompt_data = PROMPTS[prompt_id]
    return PromptResponse(**prompt_data)


def is_valid_token(token: str) -> bool:
    """Validate API token - implement your logic here"""
    # Example: Check against your database or secret store
    valid_tokens = ["your-secret-token", "another-valid-token"]
    return token in valid_tokens


# Optional: Health check endpoint
@app.get("/health")
async def health_check():
    return {"status": "healthy"}


# Optional: List all prompts endpoint
@app.get("/prompts")
async def list_prompts(authorization: Optional[str] = Header(None)):
    """List all available prompts"""
    if authorization:
        token = authorization.replace("Bearer ", "")
        if not is_valid_token(token):
            raise HTTPException(status_code=401, detail="Invalid API key")
    return {
        "prompts": [
            {"prompt_id": pid, "model": p.get("prompt_template_model")}
            for pid, p in PROMPTS.items()
        ]
    }


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8080)
```
### Running the Example Server
1. Install dependencies:
```bash
pip install fastapi uvicorn
```
2. Save the code above to `prompt_server.py`
3. Run the server:
```bash
python prompt_server.py
```
4. Test the endpoint:
```bash
curl "http://localhost:8080/beta/litellm_prompt_management?prompt_id=hello-world-prompt&project_name=litellm&slug=hello-world-prompt-2bac"
```
Expected response:
```json
{
"prompt_id": "hello-world-prompt",
"prompt_template": [
{
"role": "system",
"content": "You are a helpful assistant specialized in {domain}."
},
{
"role": "user",
"content": "Help me with: {task}"
}
],
"prompt_template_model": "gpt-4",
"prompt_template_optional_params": {
"temperature": 0.7,
"max_tokens": 500
}
}
```
## Advanced Features
### Variable Substitution
LiteLLM automatically substitutes variables in your prompt templates using the `{variable}` syntax. Both `{variable}` and `{{variable}}` formats are supported.
**Example prompt template:**
```json
{
"prompt_template": [
{
"role": "system",
"content": "You are an expert in {domain} with {years} years of experience."
}
]
}
```
**Client request:**
```python
completion(
model="gpt-4",
prompt_id="expert_prompt",
prompt_variables={
"domain": "machine learning",
"years": "10"
}
)
```
**Result:**
```
"You are an expert in machine learning with 10 years of experience."
```
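The substitution step can be approximated as follows. This is an illustrative sketch, not LiteLLM's actual implementation: the `substitute` helper is hypothetical, and leaving unknown placeholders untouched is an assumption made here.

```python
import re
from typing import Dict

# Matches {{ name }} first, then {name}
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}|\{(\w+)\}")


def substitute(content: str, variables: Dict[str, str]) -> str:
    """Replace {name} and {{name}} placeholders with values from `variables`."""

    def replace(match):
        name = match.group(1) or match.group(2)
        # Assumption: placeholders without a matching variable are left as-is
        return str(variables[name]) if name in variables else match.group(0)

    return _PLACEHOLDER.sub(replace, content)


print(
    substitute(
        "You are an expert in {domain} with {years} years of experience.",
        {"domain": "machine learning", "years": "10"},
    )
)
```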
### Caching
LiteLLM automatically caches fetched prompts in memory. The cache key includes:
- `prompt_id`
- `prompt_label` (if provided)
- `prompt_version` (if provided)
This means your API endpoint is only called once per unique prompt configuration.
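A cache keyed this way might look like the sketch below. It only illustrates the keying idea under stated assumptions: `PromptCache` and its `fetch` callback are hypothetical names, and eviction or expiry is out of scope.

```python
from typing import Any, Callable, Dict, Optional, Tuple

CacheKey = Tuple[str, Optional[str], Optional[str]]


class PromptCache:
    """In-memory cache keyed on (prompt_id, prompt_label, prompt_version)."""

    def __init__(self, fetch: Callable[[str], Dict[str, Any]]) -> None:
        self._fetch = fetch  # called only on a cache miss
        self._store: Dict[CacheKey, Dict[str, Any]] = {}

    def get(
        self,
        prompt_id: str,
        prompt_label: Optional[str] = None,
        prompt_version: Optional[str] = None,
    ) -> Dict[str, Any]:
        key = (prompt_id, prompt_label, prompt_version)
        if key not in self._store:
            self._store[key] = self._fetch(prompt_id)
        return self._store[key]
```

A consequence of caching is that edits made on the prompt server may not be visible to a process that has already cached the prompt.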
### Model Override Behavior
**Default behavior (without `ignore_prompt_manager_model`):**
```yaml
prompts:
  - prompt_id: "my_prompt"
    litellm_params:
      prompt_integration: "generic_prompt_management"
      api_base: http://localhost:8080
```
If your API returns `"prompt_template_model": "gpt-4"`, LiteLLM will use `gpt-4` regardless of what the client specified.
**With `ignore_prompt_manager_model: true`:**
```yaml
prompts:
  - prompt_id: "my_prompt"
    litellm_params:
      prompt_integration: "generic_prompt_management"
      api_base: http://localhost:8080
      ignore_prompt_manager_model: true
```
LiteLLM will use the model specified by the client, ignoring the prompt's model.
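The two behaviors reduce to a small decision, sketched here with a hypothetical `resolve_model` helper. Falling back to the client's model when the API returns no `prompt_template_model` is an assumption, based on that field being optional.

```python
from typing import Optional


def resolve_model(
    client_model: str,
    prompt_model: Optional[str],
    ignore_prompt_manager_model: bool = False,
) -> str:
    """Pick the model for the request according to the override rules above."""
    if ignore_prompt_manager_model or not prompt_model:
        return client_model  # flag set, or the prompt did not specify a model
    return prompt_model  # default: the prompt's model wins
```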
### Parameter Merging Behavior
**Default behavior (without `ignore_prompt_manager_optional_params`):**
Client params are merged with prompt params, with prompt params taking precedence:
```python
# Prompt returns: {"temperature": 0.7, "max_tokens": 500}
# Client sends: {"temperature": 0.9, "top_p": 0.95}
# Final params: {"temperature": 0.7, "max_tokens": 500, "top_p": 0.95}
```
**With `ignore_prompt_manager_optional_params: true`:**
Only client params are used:
```python
# Prompt returns: {"temperature": 0.7, "max_tokens": 500}
# Client sends: {"temperature": 0.9, "top_p": 0.95}
# Final params: {"temperature": 0.9, "top_p": 0.95}
```
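Both cases can be expressed as a dictionary merge. The sketch below uses a hypothetical `merge_params` helper to reproduce the two examples above; it is not taken from LiteLLM's source.

```python
from typing import Any, Dict, Optional


def merge_params(
    client_params: Dict[str, Any],
    prompt_params: Optional[Dict[str, Any]],
    ignore_prompt_manager_optional_params: bool = False,
) -> Dict[str, Any]:
    """Merge optional params; prompt params take precedence unless ignored."""
    if ignore_prompt_manager_optional_params or not prompt_params:
        return dict(client_params)
    # Later keys win, so prompt params override client params
    return {**client_params, **prompt_params}


prompt = {"temperature": 0.7, "max_tokens": 500}
client = {"temperature": 0.9, "top_p": 0.95}
print(merge_params(client, prompt))
print(merge_params(client, prompt, ignore_prompt_manager_optional_params=True))
```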
## Security Considerations
1. **Authentication**: Use the `api_key` parameter to secure your prompt management API
2. **Authorization**: Implement team/user-based access control using the custom query parameters
3. **Rate Limiting**: Add rate limiting to prevent abuse of your API
4. **Input Validation**: Validate all query parameters before processing
5. **HTTPS**: Always use HTTPS in production for encrypted communication
6. **Secrets**: Store API keys in environment variables, not in config files
## Use Cases
✅ **Use Generic Prompt Management API when:**
- You want instant integration without waiting for PRs
- You maintain your own prompt management service
- You need full control over prompt versioning and updates
- You want to build custom prompt management features
- You need to integrate with your internal systems
✅ **Common scenarios:**
- Internal prompt management system for your organization
- Multi-tenant prompt management with team-based access control
- A/B testing different prompt versions
- Prompt experimentation and analytics
- Integration with existing prompt engineering workflows
## When to Use This
✅ **Use Generic Prompt Management API when:**
- You want instant integration without waiting for PRs
- You maintain your own prompt management service
- You need full control over updates and features
- You want custom prompt storage and versioning logic
❌ **Make a PR when:**
- You want deeper integration with LiteLLM internals
- Your integration requires complex LiteLLM-specific logic
- You want to be featured as a built-in provider
- You're building a reusable integration for the community
## Troubleshooting
### Prompt not found
- Verify the `prompt_id` matches exactly (case-sensitive)
- Check that your API endpoint is accessible from LiteLLM
- Verify authentication if using `api_key`
### Variables not substituted
- Ensure variables use `{variable}` or `{{variable}}` syntax
- Check that variable names in `prompt_variables` match template exactly
- Variables are case-sensitive
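A quick way to spot mismatches is to compare the placeholders in a template with the keys passed in `prompt_variables`. The `diagnose_variables` helper below is a hypothetical debugging aid, not part of LiteLLM.

```python
import re
from typing import Any, Dict, List


def diagnose_variables(
    prompt_template: List[Dict[str, str]], prompt_variables: Dict[str, Any]
) -> Dict[str, List[str]]:
    """Report placeholders with no value and provided values with no placeholder."""
    expected = set()
    for message in prompt_template:
        # Accepts both {name} and {{name}} placeholders
        expected.update(re.findall(r"\{+\s*(\w+)\s*\}+", message.get("content", "")))
    provided = set(prompt_variables)
    return {
        "missing": sorted(expected - provided),
        "unused": sorted(provided - expected),
    }
```

A name that appears under both `missing` and `unused` with different casing (for example `domain` and `Domain`) usually points to a case mismatch.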
### Model not being overridden
- Check whether `ignore_prompt_manager_model: true` is set in your config; if it is, the model returned by your API is ignored
- Verify that your API returns `prompt_template_model` in the response
### Parameters not being applied
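- Check whether `ignore_prompt_manager_optional_params: true` is set in your config; if it is, the parameters returned by your API are ignored
- Verify that your API returns `prompt_template_optional_params` in the response
- Ensure parameter names match OpenAI's parameter names (for example, `temperature` and `max_tokens`)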
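
The interaction between the two `ignore_*` flags and the fields returned by your API can be pictured with the sketch below. It is a mental model, not LiteLLM's implementation: `apply_prompt_overrides` is a hypothetical function, and the assumption that template parameters win over request parameters on conflicting keys should be checked against your LiteLLM version.

```python
def apply_prompt_overrides(
    request_model,
    request_params,
    prompt_response,
    ignore_prompt_manager_model=False,
    ignore_prompt_manager_optional_params=False,
):
    """Return the (model, params) that would result under the assumed rules."""
    model = request_model
    template_model = prompt_response.get("prompt_template_model")
    # The template model is used only if it is present and not ignored.
    if template_model and not ignore_prompt_manager_model:
        model = template_model

    params = dict(request_params)
    template_params = prompt_response.get("prompt_template_optional_params") or {}
    # Assumption: template params take precedence on conflicting keys.
    if not ignore_prompt_manager_optional_params:
        params.update(template_params)
    return model, params
```

If the model or parameters you expect are missing from the result, the cause is either a flag set to `true` or a response that omits the corresponding field.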
## Questions?
This is a **beta API**. We're actively improving it based on feedback. Open an issue or PR if you need additional capabilities.
## Related Documentation
- [Prompt Management Overview](../proxy/prompt_management.md)
- [Generic Guardrail API](./generic_guardrail_api.md)
- [LiteLLM Proxy Setup](../proxy/quick_start.md)

View File

@ -1185,6 +1185,9 @@ When responding to Computer Use tool calls, include the URL and screenshot:
</TabItem>
</Tabs>
## Thought Signatures
Thought signatures are encrypted representations of the model's internal reasoning process for a given turn in a conversation. By passing thought signatures back to the model in subsequent requests, you provide it with the context of its previous thoughts, allowing it to build upon its reasoning and maintain a coherent line of inquiry.

View File

@ -11,6 +11,7 @@ Run experiments or change the specific model (e.g. from gpt-4o to gpt4o-mini fin
| Native LiteLLM GitOps (.prompt files) | [Get Started](native_litellm_prompt) |
| Langfuse | [Get Started](https://langfuse.com/docs/prompts/get-started) |
| Humanloop | [Get Started](../observability/humanloop) |
| Generic Prompt Management API | [Get Started](../adding_provider/generic_prompt_management_api) |
## Onboarding Prompts via config.yaml
@ -34,7 +35,7 @@ prompts:
- prompt_id: "my_prompt_id"
litellm_params:
prompt_id: "my_prompt_id"
prompt_integration: "dotprompt" # or langfuse, bitbucket, gitlab, custom
prompt_integration: "dotprompt" # or langfuse, bitbucket, gitlab, generic_prompt_management, custom
# integration-specific parameters below
```
@ -46,6 +47,7 @@ The `prompt_integration` field determines where and how prompts are loaded:
- **`langfuse`**: Fetch prompts from Langfuse prompt management
- **`bitbucket`**: Load from BitBucket repository `.prompt` files (team-based access control)
- **`gitlab`**: Load from GitLab repository `.prompt` files (team-based access control)
- **`generic_prompt_management`**: Integrate any prompt management system via a simple API endpoint (no PR required)
- **`custom`**: Use your own custom prompt management implementation

Each integration has its own configuration parameters and access control mechanisms.
@ -207,6 +209,57 @@ System: You are a helpful assistant.
User: {{user_message}}
```
</TabItem>
<TabItem value="generic" label="Generic Prompt Management">
```yaml
prompts:
  - prompt_id: "simple_prompt"
    litellm_params:
      prompt_integration: "generic_prompt_management"
      provider_specific_query_params:
        project_name: litellm
        slug: hello-world-prompt-2bac
      api_base: http://localhost:8080
      api_key: os.environ/BRAINTRUST_API_KEY
      ignore_prompt_manager_model: true # optional
      ignore_prompt_manager_optional_params: true # optional
```
**What you need to implement:**
A GET endpoint at `/beta/litellm_prompt_management` that returns:
```json
{
  "prompt_id": "simple_prompt",
  "prompt_template": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Help me with {task}"
    }
  ],
  "prompt_template_model": "gpt-4",
  "prompt_template_optional_params": {
    "temperature": 0.7,
    "max_tokens": 500
  }
}
```
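
As a rough sketch of the handler logic behind such an endpoint, the function below builds the response above from an in-memory store. The `PROMPTS` dictionary, the `get_prompt_response` name, and the shape of the 404 body are assumptions for illustration; a real service would wire this into its web framework and storage layer.

```python
import json

# Hypothetical in-memory store; a real service would read from a database.
PROMPTS = {
    "simple_prompt": {
        "prompt_template": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Help me with {task}"},
        ],
        "prompt_template_model": "gpt-4",
        "prompt_template_optional_params": {"temperature": 0.7, "max_tokens": 500},
    }
}


def get_prompt_response(prompt_id):
    """Return (status_code, JSON body) for a GET on the prompt endpoint."""
    prompt = PROMPTS.get(prompt_id)
    if prompt is None:
        # Assumption: the error body shape is up to the implementer.
        return 404, json.dumps({"detail": f"Prompt '{prompt_id}' not found"})
    return 200, json.dumps({"prompt_id": prompt_id, **prompt})
```
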
**Benefits:**
- No PR required - integrate any prompt management system
- Full control over your prompt storage and versioning
- Support for variable substitution with `{variable}` syntax
- Custom query parameters for filtering and access control

**Learn more:** [Generic Prompt Management API Documentation](../adding_provider/generic_prompt_management_api)
</TabItem>
</Tabs>

View File

@ -96,6 +96,13 @@ const sidebars = {
type: "category",
label: "[Beta] Prompt Management",
items: [
{
type: "category",
label: "Contributing to Prompt Management",
items: [
"adding_provider/generic_prompt_management_api",
]
},
"proxy/litellm_prompt_management",
"proxy/custom_prompt_management",
"proxy/native_litellm_prompt",