Bedrock Realtime API

Overview

Amazon Bedrock's Nova Sonic model supports real-time bidirectional audio streaming for voice conversations. This tutorial shows how to use it through LiteLLM Proxy.

Setup

1. Configure LiteLLM Proxy

Create a config.yaml file:

model_list:
  - model_name: "bedrock-sonic"
    litellm_params:
      model: bedrock/amazon.nova-sonic-v1:0
      aws_region_name: us-east-1  # or your preferred region
    model_info:
      mode: realtime

2. Start LiteLLM Proxy

litellm --config config.yaml
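The client examples in this tutorial connect to a WebSocket URL whose model query parameter has to match the model_name entry from config.yaml. As a minimal sketch, the URL can be derived from that name instead of being hard-coded. The helper name build_realtime_url and its host and port defaults are assumptions taken from the examples in this tutorial, not part of LiteLLM:

```python
from urllib.parse import urlencode


def build_realtime_url(model_name: str, host: str = "localhost", port: int = 4000) -> str:
    """Return the proxy's realtime WebSocket URL for a configured model_name.

    Hypothetical helper: host and port default to the values used in this tutorial.
    """
    query = urlencode({"model": model_name})
    return f"ws://{host}:{port}/v1/realtime?{query}"


if __name__ == "__main__":
    print(build_realtime_url("bedrock-sonic"))
```

If the proxy runs on another host or port, pass those values explicitly so the client and the proxy stay in sync.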

Basic Text Interaction

import asyncio
import websockets
import json

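LITELLM_API_KEY = "sk-1234"  # Your LiteLLM API key
# The model query parameter must match model_name in config.yaml
LITELLM_URL = "ws://localhost:4000/v1/realtime?model=bedrock-sonic"

async def test_text_conversation():
    # websockets >= 14 takes additional_headers; older releases use extra_headers
    async with websockets.connect(
        LITELLM_URL,
        additional_headers={
            "Authorization": f"Bearer {LITELLM_API_KEY}"
        }
    ) as ws: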
        # Wait for session.created
        response = await ws.recv()
        print(f"Connected: {json.loads(response)['type']}")
        
        # Configure session
        session_update = {
            "type": "session.update",
            "session": {
                "instructions": "You are a helpful assistant.",
                "modalities": ["text"],
                "temperature": 0.8
            }
        }
        await ws.send(json.dumps(session_update))
        
        # Send a message
        message = {
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Hello!"}]
            }
        }
        await ws.send(json.dumps(message))
        
        # Trigger response
        await ws.send(json.dumps({"type": "response.create"}))
        
        # Listen for response
        while True:
            response = await ws.recv()
            event = json.loads(response)
            
            if event['type'] == 'response.text.delta':
                print(event['delta'], end='', flush=True)
            elif event['type'] == 'response.done':
                print("\n✓ Complete")
                break

if __name__ == "__main__":
    asyncio.run(test_text_conversation())
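The receive loop above mixes network I/O with event handling, which makes it hard to check without a running proxy. A minimal sketch of the same logic as a pure function follows. The name collect_text and the sample messages are illustrative assumptions: the messages are hand-written to match the event types used in this tutorial and were not captured from a real session.

```python
import json


def collect_text(raw_messages):
    """Concatenate response.text.delta payloads until response.done is seen."""
    parts = []
    for raw in raw_messages:
        event = json.loads(raw)
        if event["type"] == "response.text.delta":
            parts.append(event["delta"])
        elif event["type"] == "response.done":
            break  # same stop condition as the loop in the example
    return "".join(parts)


if __name__ == "__main__":
    # Hand-written sample messages, for illustration only
    sample = [
        json.dumps({"type": "response.text.delta", "delta": "Hi"}),
        json.dumps({"type": "response.text.delta", "delta": " there"}),
        json.dumps({"type": "response.done"}),
    ]
    print(collect_text(sample))
```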

Audio Streaming with Voice Conversation

import asyncio
import websockets
import json
import base64
import pyaudio

LITELLM_API_KEY = "sk-1234"
LITELLM_URL = 'ws://localhost:4000/v1/realtime?model=bedrock-sonic'

# Audio configuration
INPUT_RATE = 16000   # Nova Sonic expects 16kHz input
OUTPUT_RATE = 24000  # Nova Sonic outputs 24kHz
CHUNK = 1024

async def audio_conversation():
    # Initialize PyAudio
    p = pyaudio.PyAudio()
    
    # Input stream (microphone)
    input_stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=INPUT_RATE,
        input=True,
        frames_per_buffer=CHUNK
    )
    
    # Output stream (speakers)
    output_stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=OUTPUT_RATE,
        output=True,
        frames_per_buffer=CHUNK
    )
    
    async with websockets.connect(
        LITELLM_URL,
        additional_headers={"Authorization": f"Bearer {LITELLM_API_KEY}"}
    ) as ws:
        # Wait for session.created
        await ws.recv()
        print("✓ Connected")
        
        # Configure session with audio
        session_update = {
            "type": "session.update",
            "session": {
                "instructions": "You are a friendly voice assistant.",
                "modalities": ["text", "audio"],
                "voice": "matthew",
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16"
            }
        }
        await ws.send(json.dumps(session_update))
        print("🎤 Speak into your microphone...")
        
        async def send_audio():
            """Capture and send audio from microphone"""
            while True:
                # PyAudio reads block, so run them in a worker thread
                # to keep the event loop free for receive_audio()
                audio_data = await asyncio.to_thread(
                    input_stream.read, CHUNK, exception_on_overflow=False
                )
                audio_b64 = base64.b64encode(audio_data).decode('utf-8')
                await ws.send(json.dumps({
                    "type": "input_audio_buffer.append",
                    "audio": audio_b64
                }))

        async def receive_audio():
            """Receive and play audio responses"""
            while True:
                response = await ws.recv()
                event = json.loads(response)

                if event['type'] == 'response.audio.delta':
                    audio_b64 = event.get('delta', '')
                    if audio_b64:
                        audio_bytes = base64.b64decode(audio_b64)
                        # Playback blocks too, so it also runs in a worker thread
                        await asyncio.to_thread(output_stream.write, audio_bytes)

                elif event['type'] == 'response.text.delta':
                    print(event['delta'], end='', flush=True)

                elif event['type'] == 'response.done':
                    print("\n✓ Response complete")
        # Run both tasks concurrently
        await asyncio.gather(send_audio(), receive_audio())

if __name__ == "__main__":
    try:
        asyncio.run(audio_conversation())
    except KeyboardInterrupt:
        print("\n\nGoodbye!")

Using Tools/Function Calling

import asyncio
import websockets
import json
from datetime import datetime

LITELLM_API_KEY = "sk-1234"
LITELLM_URL = 'ws://localhost:4000/v1/realtime?model=bedrock-sonic'

# Define tools
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

def get_weather(location: str) -> dict:
    """Simulated weather function"""
    return {
        "location": location,
        "temperature": 72,
        "conditions": "sunny"
    }

async def conversation_with_tools():
    async with websockets.connect(
        LITELLM_URL,
        additional_headers={"Authorization": f"Bearer {LITELLM_API_KEY}"}
    ) as ws:
        # Wait for session.created
        await ws.recv()
        
        # Configure session with tools
        session_update = {
            "type": "session.update",
            "session": {
                "instructions": "You are a helpful assistant with access to tools.",
                "modalities": ["text"],
                "tools": TOOLS
            }
        }
        await ws.send(json.dumps(session_update))
        
        # Send a message that requires a tool
        message = {
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "What's the weather in San Francisco?"}]
            }
        }
        await ws.send(json.dumps(message))
        await ws.send(json.dumps({"type": "response.create"}))
        
        # Handle responses and tool calls
        while True:
            response = await ws.recv()
            event = json.loads(response)
            
            if event['type'] == 'response.text.delta':
                print(event['delta'], end='', flush=True)
            
            elif event['type'] == 'response.function_call_arguments.done':
                # Execute the tool
                function_name = event['name']
                arguments = json.loads(event['arguments'])
                
                print(f"\n🔧 Calling {function_name}({arguments})")
                result = get_weather(**arguments)
                
                # Send tool result back
                tool_result = {
                    "type": "conversation.item.create",
                    "item": {
                        "type": "function_call_output",
                        "call_id": event['call_id'],
                        "output": json.dumps(result)
                    }
                }
                await ws.send(json.dumps(tool_result))
                await ws.send(json.dumps({"type": "response.create"}))
            
            elif event['type'] == 'response.done':
                print("\n✓ Complete")
                break

if __name__ == "__main__":
    asyncio.run(conversation_with_tools())

Configuration Options

Voice Options

Available voices: matthew, joanna, ruth, stephen, gregory, amy

Audio Formats

  • Input: 16kHz PCM16 (mono)
  • Output: 24kHz PCM16 (mono)
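To make these formats concrete, the sketch below works out how many bytes and how much playback time one buffer holds. It assumes the CHUNK value of 1024 frames from the audio example; the helper names are illustrative.

```python
BYTES_PER_SAMPLE = 2  # PCM16 stores each sample as a 16-bit integer


def chunk_bytes(frames: int, channels: int = 1) -> int:
    """Size in bytes of a PCM16 buffer holding `frames` frames."""
    return frames * channels * BYTES_PER_SAMPLE


def chunk_duration_ms(frames: int, rate_hz: int) -> float:
    """Duration in milliseconds of `frames` frames at the given sample rate."""
    return 1000.0 * frames / rate_hz


if __name__ == "__main__":
    print(chunk_bytes(1024))                          # 2048 bytes per mono chunk
    print(chunk_duration_ms(1024, 16000))             # 64.0 ms of microphone input
    print(round(chunk_duration_ms(1024, 24000), 2))   # 42.67 ms of speaker output
```

Because the input and output rates differ, the same number of frames covers a different time span in each direction.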

Modalities

  • ["text"] - Text only
  • ["audio"] - Audio only
  • ["text", "audio"] - Both text and audio
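The options in this section all end up in a single session.update event. As a minimal sketch, a small builder can validate the values against the lists above before anything is sent. The function name build_session_update is hypothetical, and the validation rules are an assumption based only on the values documented here; the proxy may accept other values.

```python
# Values copied from the lists in this section
VOICES = {"matthew", "joanna", "ruth", "stephen", "gregory", "amy"}
MODALITIES = {"text", "audio"}


def build_session_update(instructions, modalities, voice=None):
    """Build a session.update event, rejecting values not listed in this section."""
    unknown = set(modalities) - MODALITIES
    if not modalities or unknown:
        raise ValueError(f"unsupported modalities: {sorted(unknown) or modalities}")
    if voice is not None and voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")

    session = {"instructions": instructions, "modalities": list(modalities)}
    if "audio" in modalities:
        # Same audio format fields as the voice conversation example
        session["input_audio_format"] = "pcm16"
        session["output_audio_format"] = "pcm16"
        if voice is not None:
            session["voice"] = voice
    return {"type": "session.update", "session": session}


if __name__ == "__main__":
    print(build_session_update("You are a friendly voice assistant.", ["text", "audio"], "matthew"))
```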

Example Test Scripts

Complete working examples are available in the LiteLLM repository:

  • Basic audio streaming: test_bedrock_realtime_client.py
  • Simple text test: test_bedrock_realtime_simple.py
  • Tool calling: test_bedrock_realtime_tools.py

Requirements

uv add 'litellm[proxy]' websockets pyaudio

AWS Configuration

Ensure your AWS credentials are configured:

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_REGION_NAME=us-east-1

Or use AWS CLI configuration:

aws configure
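When the environment-variable route is used, a quick pre-flight check can catch a missing variable before the proxy starts. The following is a sketch only: missing_aws_env is a hypothetical helper, and it does not see credentials stored by aws configure, which are written to files under ~/.aws rather than to the environment.

```python
import os

# Variable names taken from the export lines in this section
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME")


def missing_aws_env(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All AWS environment variables are set")
```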

Troubleshooting

Connection Issues

  • Ensure LiteLLM proxy is running on the port used in LITELLM_URL (4000 in the examples above)
  • Verify AWS credentials are properly configured
  • Check that amazon.nova-sonic-v1:0 is available in the region set by aws_region_name

Audio Issues

  • Verify PyAudio is properly installed
  • Check microphone/speaker permissions
  • Ensure correct sample rates (16kHz input, 24kHz output)

Tool Calling Issues

  • Ensure tools are properly defined in session.update
  • Verify tool results are sent back with correct call_id
  • Check that response.create is sent after tool result
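The last two checks can be sketched as one function that turns a response.function_call_arguments.done event into the two messages the client has to send back, in order. This assumes the event carries name, arguments, and call_id as in the tool-calling example; TOOL_REGISTRY and handle_tool_call are hypothetical names, and the error payload for unknown tools is an illustrative choice.

```python
import json


def get_weather(location: str) -> dict:
    """Simulated weather function, as in the tool-calling example."""
    return {"location": location, "temperature": 72, "conditions": "sunny"}


# Hypothetical registry mapping tool names to local callables
TOOL_REGISTRY = {"get_weather": get_weather}


def handle_tool_call(event: dict) -> list:
    """Return the messages to send after a tool call, in the required order."""
    func = TOOL_REGISTRY.get(event["name"])
    if func is None:
        result = {"error": f"unknown tool: {event['name']}"}
    else:
        result = func(**json.loads(event["arguments"]))
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],  # must echo the call_id from the event
                "output": json.dumps(result),
            },
        },
        {"type": "response.create"},  # sent after the tool result
    ]


if __name__ == "__main__":
    demo = {"name": "get_weather", "arguments": '{"location": "Paris"}', "call_id": "call_1"}
    for message in handle_tool_call(demo):
        print(json.dumps(message))
```

Looking the function up by name avoids calling the wrong tool once more than one is registered.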