litellm/docs/my-website/docs/integrations/letta.md
stuxf a6c30b30bf
2026-04-09 11:46:23 -07:00

23 KiB

import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

Letta Integration

Letta (formerly MemGPT) is a framework for building stateful LLM agents with persistent memory. This guide shows how to connect Letta to either the LiteLLM Proxy or the LiteLLM SDK, so memory-enabled agents can run against multiple LLM providers.

What is Letta?

Letta allows you to build LLM agents that can:

  • Maintain long-term memory across conversations
  • Use function calling for tool interactions
  • Handle large context windows efficiently
  • Persist agent state and memory

Prerequisites

uv add letta litellm

# The proxy quick start also needs the proxy extras
uv add 'litellm[proxy]'

Quick Start

1. Start LiteLLM Proxy

First, create a configuration file for your LiteLLM proxy:

# config.yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-3-sonnet
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-35-turbo
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: "2023-07-01-preview"

Start the proxy:

litellm --config config.yaml --port 4000

2. Configure Letta with LiteLLM Proxy

Configure Letta to use your LiteLLM proxy endpoint:

from letta import create_client, LLMConfig, EmbeddingConfig

# Create the Letta client
client = create_client()

# Point the default LLM config at the LiteLLM proxy
client.set_default_llm_config(
    LLMConfig(
        model="gpt-4",  # This should match a model_name from your LiteLLM config
        model_endpoint_type="openai",
        model_endpoint="http://localhost:4000",  # Your LiteLLM proxy URL
        context_window=8192
    )
)

# Configure embedding endpoint (optional)
# Requires a matching text-embedding-ada-002 entry in the proxy's model_list
client.set_default_embedding_config(
    EmbeddingConfig(
        embedding_endpoint_type="openai",
        embedding_endpoint="http://localhost:4000",
        embedding_model="text-embedding-ada-002",
        embedding_dim=1536  # Output size of text-embedding-ada-002
    )
)

With LiteLLM SDK:

1. Configure LiteLLM SDK

Set up your API keys and configure LiteLLM:

import os
import litellm

# Set your API keys
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

# Optional: Configure default settings
litellm.set_verbose = True  # For debugging

2. Create Custom LLM Wrapper for Letta

Create a custom LLM wrapper that uses LiteLLM SDK:

import letta
from letta import create_client, LLMConfig
import litellm
from typing import List, Dict, Any

class LiteLLMWrapper:
    def __init__(self, model: str):
        self.model = model
    
    def chat_completions_create(self, messages: List[Dict], **kwargs):
        # Use LiteLLM SDK for completion
        response = litellm.completion(
            model=self.model,
            messages=messages,
            **kwargs
        )
        return response

# Configure Letta with custom LiteLLM wrapper
client = create_client()

# Set up LLM configuration using direct SDK integration
llm_config = LLMConfig(
    model="gpt-4",  # or "claude-3-sonnet", "azure/gpt-35-turbo", etc.
    model_endpoint_type="openai",
    context_window=8192
)

client.set_default_llm_config(llm_config)

3. Create and Use a Letta Agent

import letta
from letta import create_client

# Create Letta client
client = create_client()

# Create a new agent
agent_state = client.create_agent(
    name="my-assistant",
    system="You are a helpful assistant with persistent memory.",
    llm_config=client.get_default_llm_config(),
    embedding_config=client.get_default_embedding_config()
)

# Send a message to the agent
response = client.user_message(
    agent_id=agent_state.id,
    message="Hi! My name is Alice and I love reading science fiction books."
)

print(f"Agent response: {response.messages[-1].text}")

# Send another message - the agent will remember previous context
response = client.user_message(
    agent_id=agent_state.id,
    message="What did I tell you about my interests?"
)

print(f"Agent response: {response.messages[-1].text}")

With LiteLLM SDK:

import letta
from letta import create_client
import litellm
import os

# Set up environment variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"

# Create Letta client with LiteLLM integration
client = create_client()

# Create a new agent
agent_state = client.create_agent(
    name="my-assistant",
    system="You are a helpful assistant with persistent memory.",
    llm_config=client.get_default_llm_config(),
    embedding_config=client.get_default_embedding_config()
)

# Send a message to the agent
response = client.user_message(
    agent_id=agent_state.id,
    message="Hi! My name is Alice and I love reading science fiction books."
)

print(f"Agent response: {response.messages[-1].text}")

# Send another message - the agent will remember previous context
response = client.user_message(
    agent_id=agent_state.id,
    message="What did I tell you about my interests?"
)

print(f"Agent response: {response.messages[-1].text}")

Advanced Configuration

Using Different Models for Different Agents

from letta import LLMConfig, EmbeddingConfig

# Create different LLM configurations pointing to your proxy
gpt4_config = LLMConfig(
    model="gpt-4",
    model_endpoint_type="openai",
    model_endpoint="http://localhost:4000",
    context_window=8192
)

claude_config = LLMConfig(
    model="claude-3-sonnet",
    model_endpoint_type="openai",  # Using OpenAI-compatible endpoint
    model_endpoint="http://localhost:4000",
    context_window=200000
)

# Create agents with different configurations
research_agent = client.create_agent(
    name="research-agent",
    system="You are a research assistant specialized in analysis.",
    llm_config=claude_config  # Use Claude for research tasks
)

creative_agent = client.create_agent(
    name="creative-agent", 
    system="You are a creative writing assistant.",
    llm_config=gpt4_config  # Use GPT-4 for creative tasks
)

With LiteLLM SDK:

import os
import litellm
from letta import LLMConfig, EmbeddingConfig

# Set up API keys for different providers
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

# Create different LLM configurations for direct SDK usage
gpt4_config = LLMConfig(
    model="openai/gpt-4",  # Using LiteLLM model format
    model_endpoint_type="openai",
    context_window=8192
)

claude_config = LLMConfig(
    model="anthropic/claude-3-sonnet-20240229",  # Using LiteLLM model format
    model_endpoint_type="openai",
    context_window=200000
)

# Create agents with different configurations
research_agent = client.create_agent(
    name="research-agent",
    system="You are a research assistant specialized in analysis.",
    llm_config=claude_config  # Use Claude for research tasks
)

creative_agent = client.create_agent(
    name="creative-agent", 
    system="You are a creative writing assistant.",
    llm_config=gpt4_config  # Use GPT-4 for creative tasks
)

Function Calling with Tools

# Define custom tools for your agent
def search_web(query: str) -> str:
    """Search the web for information"""
    # Your web search implementation
    return f"Search results for: {query}"

def save_note(content: str) -> str:
    """Save a note to persistent storage"""
    # Your note saving implementation
    return f"Note saved: {content}"

# Create agent with tools (using proxy endpoint)
agent_state = client.create_agent(
    name="research-assistant",
    system="You are a research assistant that can search the web and save notes.",
    llm_config=client.get_default_llm_config(),
    embedding_config=client.get_default_embedding_config(),
    tools=[search_web, save_note]
)

# The agent can now use these tools
response = client.user_message(
    agent_id=agent_state.id,
    message="Search for recent developments in AI and save important findings."
)

With LiteLLM SDK:

import litellm
import os
from letta import LLMConfig

# Set up API keys
os.environ["OPENAI_API_KEY"] = "your-openai-key"

# Define custom tools for your agent
def search_web(query: str) -> str:
    """Search the web for information"""
    # Your web search implementation
    return f"Search results for: {query}"

def save_note(content: str) -> str:
    """Save a note to persistent storage"""
    # Your note saving implementation
    return f"Note saved: {content}"

# Create agent with tools (using LiteLLM SDK directly)
agent_state = client.create_agent(
    name="research-assistant",
    system="You are a research assistant that can search the web and save notes.",
    llm_config=LLMConfig(
        model="openai/gpt-4",  # Direct model specification
        model_endpoint_type="openai",
        context_window=8192
    ),
    embedding_config=client.get_default_embedding_config(),
    tools=[search_web, save_note]
)

# The agent can now use these tools
response = client.user_message(
    agent_id=agent_state.id,
    message="Search for recent developments in AI and save important findings."
)

Authentication

If your LiteLLM proxy requires authentication:

import os
from letta import LLMConfig, create_client

# Set up authenticated configuration
llm_config = LLMConfig(
    model="gpt-4",
    model_endpoint_type="openai",
    model_endpoint="http://localhost:4000",
    model_wrapper="openai",
    context_window=8192
)

# If using API keys with your proxy
os.environ["OPENAI_API_KEY"] = "your-litellm-proxy-api-key"

client = create_client()
client.set_default_llm_config(llm_config)

For proxy with authentication enabled:

# config.yaml with auth
general_settings:
  master_key: "your-master-key"

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY

# Configure Letta with authenticated proxy
llm_config = LLMConfig(
    model="gpt-4",
    model_endpoint_type="openai",
    model_endpoint="http://localhost:4000",
    context_window=8192,
    api_key="your-master-key"  # Proxy master key
)

With LiteLLM SDK, set up your provider API keys directly:

import os
import litellm

# Set up API keys for different providers
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-api-key" 
os.environ["AZURE_API_KEY"] = "your-azure-api-key"
os.environ["AZURE_API_BASE"] = "https://your-resource.openai.azure.com"
os.environ["AZURE_API_VERSION"] = "2023-07-01-preview"

# Optional: Configure default settings
litellm.api_key = os.environ.get("OPENAI_API_KEY")  # Default key
litellm.set_verbose = True  # For debugging

# Use in Letta configuration
from letta import LLMConfig

llm_config = LLMConfig(
    model="openai/gpt-4",  # Will use OPENAI_API_KEY automatically
    model_endpoint_type="openai",
    context_window=8192
)

# Or for Azure
azure_config = LLMConfig(
    model="azure/gpt-35-turbo", 
    model_endpoint_type="openai",
    context_window=4096
)

Load Balancing and Fallbacks

LiteLLM proxy's load balancing and fallback features work seamlessly with Letta:

# config.yaml with fallbacks
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
    tpm: 40000
    rpm: 500

  - model_name: gpt-4-azure  # Separate model group, used as the fallback target
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: "2023-07-01-preview"
    tpm: 80000
    rpm: 800

router_settings:
  routing_strategy: "usage-based-routing"
  fallbacks: [{"gpt-4": ["gpt-4-azure"]}]  # Fallbacks reference model_name values

The proxy handles all routing, load balancing, and fallbacks transparently for Letta.
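To make the fallback behaviour concrete, here is a toy model of it in plain Python. It is only a sketch of the idea (try the requested model group, then each configured fallback group), not LiteLLM's actual implementation. The group names and the `failing_openai` / `working_azure` functions are hypothetical stand-ins for real provider calls.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a provider call: takes a prompt, returns a reply.
Deployment = Callable[[str], str]


def call_with_fallbacks(
    model_group: str,
    prompt: str,
    groups: Dict[str, List[Deployment]],
    fallbacks: Dict[str, List[str]],
) -> str:
    """Try every deployment in the requested group, then each fallback group."""
    errors = []
    for group in [model_group, *fallbacks.get(model_group, [])]:
        for deployment in groups.get(group, []):
            try:
                return deployment(prompt)
            except Exception as exc:  # a real router would filter by error type
                errors.append(f"{group}: {exc}")
    raise RuntimeError("all deployments failed: " + "; ".join(errors))


def failing_openai(prompt: str) -> str:
    raise TimeoutError("simulated provider outage")


def working_azure(prompt: str) -> str:
    return f"azure answered: {prompt}"


# Hypothetical model groups: one primary group and one fallback group.
groups = {"gpt-4": [failing_openai], "gpt-4-azure": [working_azure]}
fallbacks = {"gpt-4": ["gpt-4-azure"]}

result = call_with_fallbacks("gpt-4", "Hello", groups, fallbacks)
print(result)
```

From Letta's point of view nothing changes: it keeps requesting `gpt-4`, and the retry on another group happens behind the endpoint.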

With LiteLLM SDK, you can set up routing and fallbacks programmatically:

import os

import litellm
from litellm import Router

# Configure router with multiple models
router = Router(
    model_list=[
        {
            "model_name": "gpt-4",
            "litellm_params": {
                "model": "openai/gpt-4",
                "api_key": os.environ["OPENAI_API_KEY"]
            },
            "tpm": 40000,
            "rpm": 500
        },
        {
            "model_name": "gpt-4-azure",  # Separate model group, used as the fallback target
            "litellm_params": {
                "model": "azure/gpt-4", 
                "api_key": os.environ["AZURE_API_KEY"],
                "api_base": os.environ["AZURE_API_BASE"],
                "api_version": "2023-07-01-preview"
            },
            "tpm": 80000,
            "rpm": 800
        }
    ],
    fallbacks=[{"gpt-4": ["gpt-4-azure"]}],  # Fallbacks reference model_name values
    routing_strategy="usage-based-routing"
)

# Create custom completion function for Letta
def custom_completion(messages, model="gpt-4", **kwargs):
    return router.completion(
        model=model,
        messages=messages,
        **kwargs
    )

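# Use with Letta by calling custom_completion from your own wrapper.
# Avoid assigning it to litellm.completion: the Router calls
# litellm.completion internally, so the patch would route each call
# back into custom_completion.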

Monitoring and Observability

Enable logging to track your Letta agents' LLM usage through the proxy:

# config.yaml with logging
model_list:
  # ... your models

litellm_settings:
  success_callback: ["langfuse"]  # or other observability tools
  
environment_variables:
  LANGFUSE_PUBLIC_KEY: "your-key"
  LANGFUSE_SECRET_KEY: "your-secret"

View metrics in the proxy dashboard, or start the proxy with detailed debug logging to inspect individual requests:

# Start proxy with detailed debug logging
litellm --config config.yaml --port 4000 --detailed_debug

Set up observability directly in your SDK integration:

import litellm
import os

# Configure observability callbacks
os.environ["LANGFUSE_PUBLIC_KEY"] = "your-key"
os.environ["LANGFUSE_SECRET_KEY"] = "your-secret"

# Set global callbacks
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

# Optional: Set up custom logging
litellm.set_verbose = True

# Keep a reference to the original function so the wrapper does not call itself
_original_completion = litellm.completion

# Create custom completion wrapper with logging
def logged_completion(messages, model="gpt-4", **kwargs):
    try:
        response = _original_completion(
            model=model,
            messages=messages,
            **kwargs
        )
        # Custom logging logic here if needed
        return response
    except Exception as e:
        # Custom error handling
        print(f"LLM call failed: {e}")
        raise

# Use in Letta configuration
litellm.completion = logged_completion
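As an alternative to wrapping the completion function, usage can be tracked with a custom callback. The sketch below assumes LiteLLM's custom callback signature of `(kwargs, completion_response, start_time, end_time)` and a response object exposing `usage.total_tokens`. The registration line is left as a comment so the example runs without `litellm` installed, and `fake_response` is a hypothetical stand-in for a real response.

```python
from collections import defaultdict
from types import SimpleNamespace

# Running token totals, keyed by model name.
token_totals = defaultdict(int)


def track_usage(kwargs, completion_response, start_time, end_time):
    """Accumulate total tokens per model from a completed call."""
    model = kwargs.get("model", "unknown")
    usage = getattr(completion_response, "usage", None)
    total = getattr(usage, "total_tokens", 0) or 0
    token_totals[model] += total


# Assumed registration (not executed here):
#   import litellm
#   litellm.success_callback = [track_usage]

# Simulate one completed call with a stand-in response object.
fake_response = SimpleNamespace(usage=SimpleNamespace(total_tokens=42))
track_usage({"model": "gpt-4"}, fake_response, None, None)
print(dict(token_totals))
```

Because the callback only reads from its arguments, it can be unit tested like this without making any LLM calls.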

Example: Multi-Agent System

import letta
from letta import create_client, LLMConfig

client = create_client()

# Create specialized agents using proxy endpoints
agents = {}

# Research agent using Claude for analysis
agents['researcher'] = client.create_agent(
    name="researcher",
    system="You are a research specialist. Analyze information thoroughly.",
    llm_config=LLMConfig(
        model="claude-3-sonnet",
        model_endpoint="http://localhost:4000",
        model_endpoint_type="openai"
    )
)

# Writer agent using GPT-4 for content creation
agents['writer'] = client.create_agent(
    name="writer",
    system="You are a content writer. Create engaging, well-structured content.",
    llm_config=LLMConfig(
        model="gpt-4",
        model_endpoint="http://localhost:4000", 
        model_endpoint_type="openai"
    )
)

# Coordinator workflow
def research_and_write_workflow(topic: str):
    # Research phase
    research_response = client.user_message(
        agent_id=agents['researcher'].id,
        message=f"Research the topic: {topic}. Provide key insights and data."
    )
    
    research_results = research_response.messages[-1].text
    
    # Writing phase
    write_response = client.user_message(
        agent_id=agents['writer'].id,
        message=f"Based on this research: {research_results}\n\nWrite an article about {topic}."
    )
    
    return write_response.messages[-1].text

# Execute workflow
article = research_and_write_workflow("The future of AI in healthcare")
print(article)

With LiteLLM SDK:

import letta
from letta import create_client, LLMConfig
import litellm
import os

# Set up environment
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

client = create_client()

# Create specialized agents using direct SDK models
agents = {}

# Research agent using Claude for analysis
agents['researcher'] = client.create_agent(
    name="researcher",
    system="You are a research specialist. Analyze information thoroughly.",
    llm_config=LLMConfig(
        model="anthropic/claude-3-sonnet-20240229",
        model_endpoint_type="openai"
    )
)

# Writer agent using GPT-4 for content creation
agents['writer'] = client.create_agent(
    name="writer",
    system="You are a content writer. Create engaging, well-structured content.",
    llm_config=LLMConfig(
        model="openai/gpt-4",
        model_endpoint_type="openai"
    )
)

# Cost-conscious agent using GPT-3.5
agents['reviewer'] = client.create_agent(
    name="reviewer",
    system="You are an editor. Review and improve content quality.",
    llm_config=LLMConfig(
        model="openai/gpt-3.5-turbo",
        model_endpoint_type="openai"
    )
)

# Enhanced workflow with multiple agents
def enhanced_workflow(topic: str):
    # Research phase
    research_response = client.user_message(
        agent_id=agents['researcher'].id,
        message=f"Research the topic: {topic}. Provide key insights and data."
    )
    
    research_results = research_response.messages[-1].text
    
    # Writing phase
    write_response = client.user_message(
        agent_id=agents['writer'].id,
        message=f"Based on this research: {research_results}\n\nWrite an article about {topic}."
    )
    
    draft_article = write_response.messages[-1].text
    
    # Review phase
    review_response = client.user_message(
        agent_id=agents['reviewer'].id,
        message=f"Please review and improve this article:\n\n{draft_article}"
    )
    
    return review_response.messages[-1].text

# Execute enhanced workflow
article = enhanced_workflow("The future of AI in healthcare")
print(article)
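Both workflows above follow the same pattern: each agent receives a prompt built from the topic and the previous agent's output. A minimal sketch of that pattern as a reusable pipeline is shown below. The `stub_agent` helper is hypothetical and stands in for a call such as `client.user_message(...)` followed by reading the last message's text.

```python
from typing import Callable, List, Tuple

# A stage is (prompt template, send function). The send function stands in
# for sending a message to one agent and returning its reply text.
Stage = Tuple[str, Callable[[str], str]]


def run_pipeline(stages: List[Stage], topic: str) -> str:
    """Feed each stage's output into the next stage's prompt."""
    previous = ""
    for template, send in stages:
        previous = send(template.format(topic=topic, previous=previous))
    return previous


def stub_agent(label: str) -> Callable[[str], str]:
    """Hypothetical agent that echoes its input with a label."""
    def send(message: str) -> str:
        return f"[{label}] {message}"
    return send


stages = [
    ("Research the topic: {topic}. Provide key insights and data.",
     stub_agent("researcher")),
    ("Based on this research: {previous}\n\nWrite an article about {topic}.",
     stub_agent("writer")),
]

article = run_pipeline(stages, "AI in healthcare")
print(article)
```

Adding a review phase then means appending one more stage rather than writing a new workflow function.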

Best Practices

  1. Model Selection: Use appropriate models for different tasks:

    • Claude for analysis and reasoning
    • GPT-4 for creative tasks
    • GPT-3.5-turbo for simple interactions
  2. Proxy Configuration:

    • Set appropriate rate limits and timeouts
    • Use fallbacks for reliability
    • Enable authentication for production
  3. Memory Management: Letta handles memory automatically, but monitor usage with large contexts

  4. Cost Optimization:

    • Use the proxy's budgeting features to control costs
    • Set up rate limiting per user/team
    • Monitor token usage through proxy dashboard
  5. Monitoring: Enable observability to track agent performance and token usage

With LiteLLM SDK:

  1. Model Selection: Choose models based on task requirements:

    • Use openai/gpt-4 for complex reasoning
    • Use anthropic/claude-3-sonnet-20240229 for analysis
    • Use openai/gpt-3.5-turbo for cost-effective simple tasks
  2. Error Handling: Implement robust error handling with retries:

    import litellm
    from litellm import completion
    
    # Set up retry logic
    litellm.num_retries = 3
    litellm.request_timeout = 60
    
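    # Custom error handling
    def safe_completion(**kwargs):
        try:
            return completion(**kwargs)
        except Exception as e:
            print(f"LLM call failed: {e}")
            # Fall back to a cheaper model; override "model" instead of
            # passing it twice, which would raise a TypeError
            fallback_kwargs = {**kwargs, "model": "openai/gpt-3.5-turbo"}
            return completion(**fallback_kwargs)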
    
  3. Cost Management:

    • Use cheaper models for non-critical tasks
    • Implement token counting and budgets
    • Cache responses when appropriate
  4. Performance:

    • Use async operations for concurrent requests
    • Implement connection pooling
    • Monitor response times
  5. Security:

    • Store API keys securely (environment variables)
    • Rotate keys regularly
    • Implement rate limiting
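The cost management items above (token budgets and response caching) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a LiteLLM feature: `CachedBudgetedClient`, `BudgetExceeded`, and `fake_complete` are hypothetical names, and the completion callable is assumed to return the reply text together with the tokens it consumed.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

Messages = List[Dict[str, str]]
# Hypothetical completion callable: returns (reply text, tokens consumed).
CompleteFn = Callable[[str, Messages], Tuple[str, int]]


class BudgetExceeded(RuntimeError):
    """Raised when a new call is attempted after the token budget is spent."""


class CachedBudgetedClient:
    def __init__(self, complete: CompleteFn, token_budget: int) -> None:
        self._complete = complete
        self._token_budget = token_budget
        self._cache: Dict[str, str] = {}
        self.tokens_used = 0

    @staticmethod
    def _cache_key(model: str, messages: Messages) -> str:
        # Identical model + messages always hash to the same key.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def ask(self, model: str, messages: Messages) -> str:
        key = self._cache_key(model, messages)
        if key in self._cache:
            return self._cache[key]  # cache hit: no tokens spent
        if self.tokens_used >= self._token_budget:
            raise BudgetExceeded(f"budget of {self._token_budget} tokens is spent")
        text, tokens = self._complete(model, messages)
        self.tokens_used += tokens
        self._cache[key] = text
        return text


def fake_complete(model: str, messages: Messages) -> Tuple[str, int]:
    # Stand-in for a real call such as litellm.completion(...)
    return f"{model} reply to: {messages[-1]['content']}", 30


budgeted = CachedBudgetedClient(fake_complete, token_budget=50)
question = [{"role": "user", "content": "Hello"}]
budgeted.ask("gpt-4", question)
budgeted.ask("gpt-4", question)  # served from the cache
print(budgeted.tokens_used)
```

Note that the budget is checked before each call, so the final call may overshoot the limit. A stricter variant would estimate the request's token count first.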

Troubleshooting

Connection Issues

# Test your LiteLLM proxy
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
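The same check can be scripted in Python with only the standard library. The sketch below builds the request that the curl command sends. The `build_chat_request` helper and its `api_key` parameter are illustrative names, and the optional key is assumed to be sent as a bearer token, which is how a proxy with a master key would expect it.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    base_url: str,
    model: str,
    content: str,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST request for the proxy's OpenAI-compatible chat endpoint."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": content}]}
    ).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the proxy has authentication enabled.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=body,
        headers=headers,
        method="POST",
    )


request = build_chat_request("http://localhost:4000", "gpt-4", "Hello")
print(request.get_method(), request.full_url)

# To send it against a running proxy:
#   with urllib.request.urlopen(request, timeout=30) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

A connection error at the `urlopen` step points at the proxy itself, while an HTTP error response usually points at the model name or the key.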

Configuration Debugging

# Enable verbose logging
import logging
logging.basicConfig(level=logging.DEBUG)

# Test Letta configuration
from letta import create_client

client = create_client()
print(client.get_default_llm_config())

Common Proxy Issues

  • Port conflicts: Make sure port 4000 isn't in use
  • Model not found: Verify model names match your config.yaml
  • Authentication errors: Check master key configuration
  • Rate limiting: Monitor proxy logs for rate limit hits
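For the "model not found" case, one way to verify names is to compare the models Letta requests against the proxy's model listing. The sketch below assumes the listing follows the OpenAI-style shape (`{"data": [{"id": ...}]}`) returned by a `/v1/models` endpoint. `missing_models` and `sample_payload` are hypothetical names, and the payload is hard-coded so the example runs without a proxy.

```python
from typing import Iterable, List


def missing_models(models_payload: dict, wanted: Iterable[str]) -> List[str]:
    """Return the requested model names that the listing does not contain."""
    available = {entry["id"] for entry in models_payload.get("data", [])}
    return [name for name in wanted if name not in available]


# Hard-coded stand-in for the JSON body of GET /v1/models on the proxy.
sample_payload = {
    "object": "list",
    "data": [
        {"id": "gpt-4", "object": "model"},
        {"id": "claude-3-sonnet", "object": "model"},
    ],
}

print(missing_models(sample_payload, ["gpt-4", "gpt-3.5-turbo"]))
```

Any name in the returned list needs either a matching `model_name` entry in config.yaml or a correction on the Letta side.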

API Key Issues

import os
import litellm

# Check if API keys are set
print("OpenAI Key:", os.environ.get("OPENAI_API_KEY", "Not set"))
print("Anthropic Key:", os.environ.get("ANTHROPIC_API_KEY", "Not set"))

# Test direct LiteLLM call
try:
    response = litellm.completion(
        model="openai/gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print("LiteLLM working:", response.choices[0].message.content)
except Exception as e:
    print("LiteLLM error:", e)

Configuration Debugging

import litellm

# Enable verbose logging
litellm.set_verbose = True

# Test model availability
models = ["openai/gpt-4", "anthropic/claude-3-sonnet-20240229"]
for model in models:
    try:
        response = litellm.completion(
            model=model,
            messages=[{"role": "user", "content": "Test"}],
            max_tokens=10
        )
        print(f"✓ {model} working")
    except Exception as e:
        print(f"✗ {model} failed: {e}")

Common SDK Issues

  • Import errors: Run uv add litellm letta to install both packages
  • Model format: Use provider/model format (e.g., openai/gpt-4)
  • API key format: Different providers have different key formats
  • Rate limits: Implement exponential backoff for retries
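The exponential backoff mentioned in the last item can be sketched as follows. This is a minimal version under simple assumptions: it retries on any exception and doubles the delay each time. The `retry_with_backoff` and `flaky` names are illustrative, and the `sleep` parameter exists only so the delays can be observed without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying failures with delays of base_delay * 2**attempt."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Simulated call that fails twice before succeeding.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"


delays = []
result = retry_with_backoff(flaky, max_retries=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

In practice the wrapped call would be the LLM request, and it is common to add random jitter to each delay and to retry only on rate-limit or timeout errors.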

Resources