Bedrock Realtime API
Overview
Amazon Bedrock's Nova Sonic model supports real-time bidirectional audio streaming for voice conversations. This tutorial shows how to use it through LiteLLM Proxy.
Setup
1. Configure LiteLLM Proxy
Create a config.yaml file:
model_list:
  - model_name: "bedrock-sonic"
    litellm_params:
      model: bedrock/amazon.nova-sonic-v1:0
      aws_region_name: us-east-1  # or your preferred region
    model_info:
      mode: realtime
2. Start LiteLLM Proxy
litellm --config config.yaml
Basic Text Interaction
import asyncio
import websockets
import json

LITELLM_API_KEY = "sk-1234"  # Your LiteLLM API key
LITELLM_URL = 'ws://localhost:4000/v1/realtime?model=bedrock-sonic'


async def test_text_conversation():
    async with websockets.connect(
        LITELLM_URL,
        additional_headers={
            "Authorization": f"Bearer {LITELLM_API_KEY}"
        }
    ) as ws:
        # Wait for session.created
        response = await ws.recv()
        print(f"Connected: {json.loads(response)['type']}")

        # Configure session
        session_update = {
            "type": "session.update",
            "session": {
                "instructions": "You are a helpful assistant.",
                "modalities": ["text"],
                "temperature": 0.8
            }
        }
        await ws.send(json.dumps(session_update))

        # Send a message
        message = {
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Hello!"}]
            }
        }
        await ws.send(json.dumps(message))

        # Trigger response
        await ws.send(json.dumps({"type": "response.create"}))

        # Listen for response
        while True:
            response = await ws.recv()
            event = json.loads(response)
            if event['type'] == 'response.text.delta':
                print(event['delta'], end='', flush=True)
            elif event['type'] == 'response.done':
                print("\n✓ Complete")
                break


if __name__ == "__main__":
    asyncio.run(test_text_conversation())
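The receive loop in this example prints each delta as it arrives. When the full reply is needed as one string, the same event handling can be factored into a small pure function. The following is a minimal sketch that assumes only the two event types used in the example (`response.text.delta` and `response.done`); the helper name `collect_text` is hypothetical and not part of LiteLLM.

```python
import json
from typing import Iterable


def collect_text(raw_events: Iterable[str]) -> str:
    """Join response.text.delta payloads until response.done is seen.

    `raw_events` is any iterable of JSON strings, for example messages
    already read from the WebSocket. Hypothetical helper, not a LiteLLM API.
    """
    parts = []
    for raw in raw_events:
        event = json.loads(raw)
        if event.get("type") == "response.text.delta":
            parts.append(event.get("delta", ""))
        elif event.get("type") == "response.done":
            break  # the response is finished; ignore anything after it
        # any other event type (e.g. session.created) is skipped
    return "".join(parts)


# Offline demo with hand-written events (no proxy required)
sample = [
    json.dumps({"type": "session.created"}),
    json.dumps({"type": "response.text.delta", "delta": "Hel"}),
    json.dumps({"type": "response.text.delta", "delta": "lo!"}),
    json.dumps({"type": "response.done"}),
]
print(collect_text(sample))
```

Keeping the parsing separate from the socket makes it possible to exercise the event handling without a running proxy.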
Audio Streaming with Voice Conversation
import asyncio
import websockets
import json
import base64
import pyaudio

LITELLM_API_KEY = "sk-1234"
LITELLM_URL = 'ws://localhost:4000/v1/realtime?model=bedrock-sonic'

# Audio configuration
INPUT_RATE = 16000   # Nova Sonic expects 16kHz input
OUTPUT_RATE = 24000  # Nova Sonic outputs 24kHz
CHUNK = 1024


async def audio_conversation():
    # Initialize PyAudio
    p = pyaudio.PyAudio()

    # Input stream (microphone)
    input_stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=INPUT_RATE,
        input=True,
        frames_per_buffer=CHUNK
    )

    # Output stream (speakers)
    output_stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=OUTPUT_RATE,
        output=True,
        frames_per_buffer=CHUNK
    )

    async with websockets.connect(
        LITELLM_URL,
        additional_headers={"Authorization": f"Bearer {LITELLM_API_KEY}"}
    ) as ws:
        # Wait for session.created
        await ws.recv()
        print("✓ Connected")

        # Configure session with audio
        session_update = {
            "type": "session.update",
            "session": {
                "instructions": "You are a friendly voice assistant.",
                "modalities": ["text", "audio"],
                "voice": "matthew",
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16"
            }
        }
        await ws.send(json.dumps(session_update))
        print("🎤 Speak into your microphone...")

        async def send_audio():
            """Capture and send audio from microphone"""
            while True:
                audio_data = input_stream.read(CHUNK, exception_on_overflow=False)
                audio_b64 = base64.b64encode(audio_data).decode('utf-8')
                await ws.send(json.dumps({
                    "type": "input_audio_buffer.append",
                    "audio": audio_b64
                }))
                await asyncio.sleep(0.01)

        async def receive_audio():
            """Receive and play audio responses"""
            while True:
                response = await ws.recv()
                event = json.loads(response)
                if event['type'] == 'response.audio.delta':
                    audio_b64 = event.get('delta', '')
                    if audio_b64:
                        audio_bytes = base64.b64decode(audio_b64)
                        output_stream.write(audio_bytes)
                elif event['type'] == 'response.text.delta':
                    print(event['delta'], end='', flush=True)
                elif event['type'] == 'response.done':
                    print("\n✓ Response complete")

        # Run both tasks concurrently
        await asyncio.gather(send_audio(), receive_audio())


if __name__ == "__main__":
    try:
        asyncio.run(audio_conversation())
    except KeyboardInterrupt:
        print("\n\nGoodbye!")
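To make the chunking in this example concrete, the sketch below works out how much audio one `CHUNK` carries and builds the same `input_audio_buffer.append` payload from raw PCM16 bytes. It uses only the constants from the example; the helper names `chunk_duration_ms` and `make_append_event` are hypothetical and exist only for illustration.

```python
import base64
import json

INPUT_RATE = 16000    # samples per second, as in the example
BYTES_PER_SAMPLE = 2  # PCM16 means 16-bit (2-byte) samples
CHUNK = 1024          # frames per microphone read, as in the example


def chunk_duration_ms(frames: int = CHUNK, rate: int = INPUT_RATE) -> float:
    """Duration of one chunk of mono audio in milliseconds."""
    return 1000.0 * frames / rate


def make_append_event(pcm_bytes: bytes) -> str:
    """Wrap raw PCM16 bytes in an input_audio_buffer.append JSON message."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("utf-8"),
    })


# One chunk of mono silence stands in for a microphone read
silence = b"\x00" * (CHUNK * BYTES_PER_SAMPLE)
event = json.loads(make_append_event(silence))

print(chunk_duration_ms())  # 64.0 ms of audio per chunk
print(len(silence))         # 2048 bytes before base64 encoding
```

With these settings each message carries 64 ms of audio, so the microphone read itself, not the `asyncio.sleep(0.01)`, sets the pace of the send loop.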
Using Tools/Function Calling
import asyncio
import websockets
import json
from datetime import datetime

LITELLM_API_KEY = "sk-1234"
LITELLM_URL = 'ws://localhost:4000/v1/realtime?model=bedrock-sonic'

# Define tools
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]


def get_weather(location: str) -> dict:
    """Simulated weather function"""
    return {
        "location": location,
        "temperature": 72,
        "conditions": "sunny"
    }


async def conversation_with_tools():
    async with websockets.connect(
        LITELLM_URL,
        additional_headers={"Authorization": f"Bearer {LITELLM_API_KEY}"}
    ) as ws:
        # Wait for session.created
        await ws.recv()

        # Configure session with tools
        session_update = {
            "type": "session.update",
            "session": {
                "instructions": "You are a helpful assistant with access to tools.",
                "modalities": ["text"],
                "tools": TOOLS
            }
        }
        await ws.send(json.dumps(session_update))

        # Send a message that requires a tool
        message = {
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "What's the weather in San Francisco?"}]
            }
        }
        await ws.send(json.dumps(message))
        await ws.send(json.dumps({"type": "response.create"}))

        # Handle responses and tool calls
        while True:
            response = await ws.recv()
            event = json.loads(response)
            if event['type'] == 'response.text.delta':
                print(event['delta'], end='', flush=True)
            elif event['type'] == 'response.function_call_arguments.done':
                # Execute the tool
                function_name = event['name']
                arguments = json.loads(event['arguments'])
                print(f"\n🔧 Calling {function_name}({arguments})")
                result = get_weather(**arguments)

                # Send tool result back
                tool_result = {
                    "type": "conversation.item.create",
                    "item": {
                        "type": "function_call_output",
                        "call_id": event['call_id'],
                        "output": json.dumps(result)
                    }
                }
                await ws.send(json.dumps(tool_result))
                await ws.send(json.dumps({"type": "response.create"}))
            elif event['type'] == 'response.done':
                print("\n✓ Complete")
                break


if __name__ == "__main__":
    asyncio.run(conversation_with_tools())
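The example calls `get_weather` directly, whatever `function_name` the event carries. With more than one tool, a lookup table keyed by name is one way to route calls. The sketch below assumes the event fields used in the example (`name`, `arguments`, `call_id`); `TOOL_REGISTRY` and `build_tool_result` are hypothetical names, and the error payload returned for unknown tools is an illustrative choice, not a documented convention.

```python
import json
from typing import Any, Callable, Dict


def get_weather(location: str) -> dict:
    """Simulated weather function, same as in the example."""
    return {"location": location, "temperature": 72, "conditions": "sunny"}


# Hypothetical registry: maps tool names to Python callables
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def build_tool_result(event: dict) -> dict:
    """Run the tool named in the event and build the function_call_output item."""
    name = event["name"]
    func = TOOL_REGISTRY.get(name)
    if func is None:
        # Assumption: report unknown tools back to the model as a JSON error
        output = {"error": f"unknown tool: {name}"}
    else:
        output = func(**json.loads(event["arguments"]))
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": event["call_id"],
            "output": json.dumps(output),
        },
    }


# Offline demo with a hand-written event (no proxy required)
demo_event = {
    "type": "response.function_call_arguments.done",
    "name": "get_weather",
    "arguments": json.dumps({"location": "San Francisco"}),
    "call_id": "call_123",  # made-up id for the demo
}
print(build_tool_result(demo_event)["item"]["output"])
```

Inside the receive loop, the returned dict would be sent with `ws.send(json.dumps(...))`, followed by a `response.create` message as in the example.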
Configuration Options
Voice Options
Available voices: matthew, joanna, ruth, stephen, gregory, amy
Audio Formats
- Input: 16kHz PCM16 (mono)
- Output: 24kHz PCM16 (mono)
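Because the output is raw PCM16 without a header, it cannot be opened directly by most audio players. The sketch below wraps collected output bytes in a WAV container using Python's standard `wave` module, with the 24kHz mono format listed above. The function name `pcm16_to_wav_bytes` is hypothetical.

```python
import io
import wave

OUTPUT_RATE = 24000  # Hz, output sample rate listed above


def pcm16_to_wav_bytes(pcm: bytes, rate: int = OUTPUT_RATE) -> bytes:
    """Wrap raw mono PCM16 audio in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# One second of silence stands in for decoded response.audio.delta payloads
wav_bytes = pcm16_to_wav_bytes(b"\x00\x00" * OUTPUT_RATE)
print(wav_bytes[:4])  # b'RIFF'
```

Writing `wav_bytes` to a file ending in `.wav` gives something that can be played back to check the sample rate by ear: audio saved with the wrong rate sounds too fast or too slow.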
Modalities
- ["text"] - Text only
- ["audio"] - Audio only
- ["text", "audio"] - Both text and audio
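A small builder can catch a mistyped modality before the message reaches the proxy. The sketch below assumes the `session.update` fields shown in the examples of this tutorial; `build_session_update` is a hypothetical helper, and setting the audio fields only when `"audio"` is requested is a design choice of this sketch rather than a documented requirement.

```python
from typing import List, Optional

ALLOWED_MODALITIES = {"text", "audio"}


def build_session_update(
    instructions: str,
    modalities: List[str],
    voice: Optional[str] = None,
) -> dict:
    """Build a session.update message, rejecting unknown or empty modalities."""
    unknown = set(modalities) - ALLOWED_MODALITIES
    if not modalities or unknown:
        raise ValueError(
            f"modalities must be a non-empty subset of {sorted(ALLOWED_MODALITIES)}"
        )
    session = {"instructions": instructions, "modalities": list(modalities)}
    if "audio" in modalities:
        # Same audio formats as the voice conversation example
        session["input_audio_format"] = "pcm16"
        session["output_audio_format"] = "pcm16"
        if voice is not None:
            session["voice"] = voice
    return {"type": "session.update", "session": session}


msg = build_session_update("You are a friendly voice assistant.", ["text", "audio"], voice="matthew")
print(msg["session"]["modalities"])
```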
Example Test Scripts
Complete working examples are available in the LiteLLM repository:
- Basic audio streaming: test_bedrock_realtime_client.py
- Simple text test: test_bedrock_realtime_simple.py
- Tool calling: test_bedrock_realtime_tools.py
Requirements
uv add 'litellm[proxy]' websockets pyaudio
AWS Configuration
Ensure your AWS credentials are configured:
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION_NAME=us-east-1
Or use AWS CLI configuration:
aws configure
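When relying on environment variables, a quick check before starting the proxy can save a round of debugging. The sketch below reports which of the three variables listed above are unset. `missing_aws_vars` is a hypothetical helper, and an empty result does not prove the credentials are valid. If credentials come from `aws configure` instead, these variables may legitimately be absent.

```python
import os
from typing import List, Mapping

# The three variables exported in the section above
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME")


def missing_aws_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Demo against an explicit mapping so no real credentials are touched
example_env = {"AWS_ACCESS_KEY_ID": "your_access_key", "AWS_REGION_NAME": "us-east-1"}
print(missing_aws_vars(example_env))  # ['AWS_SECRET_ACCESS_KEY']
```

Calling `missing_aws_vars()` with no argument checks the current process environment.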
Troubleshooting
Connection Issues
- Ensure LiteLLM proxy is running on the correct port
- Verify AWS credentials are properly configured
- Check that the Bedrock model is available in your region
Audio Issues
- Verify PyAudio is properly installed
- Check microphone/speaker permissions
- Ensure correct sample rates (16kHz input, 24kHz output)
Tool Calling Issues
- Ensure tools are properly defined in session.update
- Verify tool results are sent back with correct call_id
- Check that response.create is sent after tool result