Backend - Spend Log Storage for Realtime Calls: - Collect user voice transcripts and text input during WebSocket sessions - Store collected messages in spend logs when store_prompts_in_spend_logs enabled - Capture tool definitions from session.update and tool calls from response.done - Enrich proxy_server_request with tools and response with tool_calls for UI Backend - WebSocket Auth: - Support browser-based auth via Sec-WebSocket-Protocol subprotocol - Echo back subprotocol on WebSocket accept UI - Realtime Playground: - New RealtimePlayground component with WebSocket voice+text chat - Mic recording (PCM16 24kHz), server VAD, audio playback, text input - Handle binary WebSocket frames (Blob/ArrayBuffer decoding) - Add /v1/realtime endpoint option to playground endpoint selector UI - Tools Section for Realtime Logs: - Extract tool calls from realtime response format (response.tool_calls and response.results[].response.output[].type=function_call) Tests: - 15 new backend tests for realtime streaming and spend log storage - 4 new UI tests for realtime tool call extraction Fixes pre-existing build errors: - ToolPolicies.tsx: duplicate import, antd styles type - create_key_button.tsx: missing message import Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
9.3 KiB
INSTRUCTIONS FOR LITELLM
This document provides comprehensive instructions for AI agents working in the LiteLLM repository.
OVERVIEW
LiteLLM is a unified interface for 100+ LLMs that:
- Translates inputs to provider-specific completion, embedding, and image generation endpoints
- Provides consistent OpenAI-format output across all providers
- Includes retry/fallback logic across multiple deployments (Router)
- Offers a proxy server (LLM Gateway) with budgets, rate limits, and authentication
- Supports advanced features like function calling, streaming, caching, and observability
REPOSITORY STRUCTURE
Core Components
litellm/- Main library codellms/- Provider-specific implementations (OpenAI, Anthropic, Azure, etc.)proxy/- Proxy server implementation (LLM Gateway)router_utils/- Load balancing and fallback logictypes/- Type definitions and schemasintegrations/- Third-party integrations (observability, caching, etc.)
Key Directories
tests/- Comprehensive test suitesdocs/my-website/- Documentation websiteui/litellm-dashboard/- Admin dashboard UIenterprise/- Enterprise-specific features
DEVELOPMENT GUIDELINES
MAKING CODE CHANGES
-
Provider Implementations: When adding/modifying LLM providers:
- Follow existing patterns in
litellm/llms/{provider}/ - Implement proper transformation classes that inherit from
BaseConfig - Support both sync and async operations
- Handle streaming responses appropriately
- Include proper error handling with provider-specific exceptions
- Follow existing patterns in
-
Type Safety:
- Use proper type hints throughout
- Update type definitions in
litellm/types/ - Ensure compatibility with both Pydantic v1 and v2
-
Testing:
- Add tests in appropriate
tests/subdirectories - Include both unit tests and integration tests
- Test provider-specific functionality thoroughly
- Consider adding load tests for performance-critical changes
- Add tests in appropriate
MAKING CODE CHANGES FOR THE UI (IGNORE FOR BACKEND)
-
Tremor is DEPRECATED, do not use Tremor components in new features/changes
- The only exception is the Tremor Table component and its required Tremor Table sub components.
-
Use Common Components as much as possible:
- These are usually defined in the
common_componentsdirectory - Use these components as much as possible and avoid building new components unless needed
- These are usually defined in the
-
Testing:
- The codebase uses Vitest and React Testing Library
- Query Priority Order: Use query methods in this order:
getByRole,getByLabelText,getByPlaceholderText,getByText,getByTestId - Always use
screeninstead of destructuring fromrender()(e.g., usescreen.getByText()notgetByText) - Wrap user interactions in
act(): Always wrapfireEventcalls withact()to ensure React state updates are properly handled - Use
querymethods for absence checks: UsequeryBy*methods (notgetBy*) when expecting an element to NOT be present - Test names must start with "should": All test names should follow the pattern
it("should ...") - Mock external dependencies: Check
setupTests.tsfor global mocks and mock child components/networking calls as needed - Structure tests properly:
- First test should verify the component renders successfully
- Subsequent tests should focus on functionality and user interactions
- Use
waitForfor async operations that aren't already awaited
- Avoid using
querySelector: Prefer React Testing Library queries over direct DOM manipulation
IMPORTANT PATTERNS
-
Function/Tool Calling:
- LiteLLM standardizes tool calling across providers
- OpenAI format is the standard, with transformations for other providers
- See
litellm/llms/anthropic/chat/transformation.pyfor complex tool handling
-
Streaming:
- All providers should support streaming where possible
- Use consistent chunk formatting across providers
- Handle both sync and async streaming
-
Error Handling:
- Use provider-specific exception classes
- Maintain consistent error formats across providers
- Include proper retry logic and fallback mechanisms
-
Configuration:
- Support both environment variables and programmatic configuration
- Use
BaseConfigclasses for provider configurations - Allow dynamic parameter passing
PROXY SERVER (LLM GATEWAY)
The proxy server is a critical component that provides:
- Authentication and authorization
- Rate limiting and budget management
- Load balancing across multiple models/deployments
- Observability and logging
- Admin dashboard UI
- Enterprise features
Key files:
litellm/proxy/proxy_server.py- Main server implementationlitellm/proxy/auth/- Authentication logiclitellm/proxy/management_endpoints/- Admin API endpoints
MCP (MODEL CONTEXT PROTOCOL) SUPPORT
LiteLLM supports MCP for agent workflows:
- MCP server integration for tool calling
- Transformation between OpenAI and MCP tool formats
- Support for external MCP servers (Zapier, Jira, Linear, etc.)
- See
litellm/experimental_mcp_client/andlitellm/proxy/_experimental/mcp_server/
RUNNING SCRIPTS
Use poetry run python script.py to run Python scripts in the project environment (for non-test files).
GITHUB TEMPLATES
When opening issues or pull requests, follow these templates:
Bug Reports (.github/ISSUE_TEMPLATE/bug_report.yml)
- Describe what happened vs. expected behavior
- Include relevant log output
- Specify LiteLLM version
- Indicate if you're part of an ML Ops team (helps with prioritization)
Feature Requests (.github/ISSUE_TEMPLATE/feature_request.yml)
- Clearly describe the feature
- Explain motivation and use case with concrete examples
Pull Requests (.github/pull_request_template.md)
- Add at least 1 test in
tests/litellm/ - Ensure
make test-unitpasses
TESTING CONSIDERATIONS
- Provider Tests: Test against real provider APIs when possible
- Proxy Tests: Include authentication, rate limiting, and routing tests
- Performance Tests: Load testing for high-throughput scenarios
- Integration Tests: End-to-end workflows including tool calling
DOCUMENTATION
- Keep documentation in sync with code changes
- Update provider documentation when adding new providers
- Include code examples for new features
- Update changelog and release notes
SECURITY CONSIDERATIONS
- Handle API keys securely
- Validate all inputs, especially for proxy endpoints
- Consider rate limiting and abuse prevention
- Follow security best practices for authentication
ENTERPRISE FEATURES
- Some features are enterprise-only
- Check
enterprise/directory for enterprise-specific code - Maintain compatibility between open-source and enterprise versions
COMMON PITFALLS TO AVOID
- Breaking Changes: LiteLLM has many users - avoid breaking existing APIs
- Provider Specifics: Each provider has unique quirks - handle them properly
- Rate Limits: Respect provider rate limits in tests
- Memory Usage: Be mindful of memory usage in streaming scenarios
- Dependencies: Keep dependencies minimal and well-justified
- UI/Backend Contract Mismatch: When adding a new entity type to the UI, always check whether the backend endpoint accepts a single value or an array. Match the UI control accordingly (single-select vs. multi-select) to avoid silently dropping user selections
- Missing Tests for New Entity Types: When adding a new entity type (e.g., in
EntityUsage,UsageViewSelect), always add corresponding tests in the existing test files and update any icon/component mocks
HELPFUL RESOURCES
- Main documentation: https://docs.litellm.ai/
- Provider-specific docs in
docs/my-website/docs/providers/ - Admin UI for testing proxy features
WHEN IN DOUBT
- Follow existing patterns in the codebase
- Check similar provider implementations
- Ensure comprehensive test coverage
- Update documentation appropriately
- Consider backward compatibility impact
Cursor Cloud specific instructions
Environment
- Poetry is installed in
~/.local/bin; the update script ensures it is onPATH. - Python 3.12, Node 22 are pre-installed.
- The virtual environment lives under
~/.cache/pypoetry/virtualenvs/.
Running the proxy server
Start the proxy with a config file:
poetry run litellm --config dev_config.yaml --port 4000
The proxy takes ~15-20 seconds to fully start (it runs Prisma migrations on boot). Wait for /health to return before sending requests. Without a PostgreSQL DATABASE_URL, the proxy connects to a default Neon dev database embedded in the litellm-proxy-extras package.
Running tests
See CLAUDE.md and the Makefile for standard commands. Key notes:
psycopg-binarymust be installed (poetry run pip install psycopg-binary) because the pytest-postgresql plugin requires it and the lock file only includespsycopg(no binary).- The
--timeoutpytest flag is NOT available; don't pass it. - Unit tests:
poetry run pytest tests/test_litellm/ -x -vv -n 4 - Black
--checkmay report pre-existing formatting issues; this does not block test runs.
Lint
cd litellm && poetry run ruff check .
Ruff is the primary fast linter. For the full lint suite (including mypy, black, circular imports), run make lint per CLAUDE.md.