Adds a rule to CLAUDE.md and AGENTS.md instructing AI coding agents to pause and ask the user before introducing any third-party organization name that does not already appear in the repository. Names already established in the codebase (existing LLM providers, etc.) remain fine to use without prompting.
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Confidentiality: Customer and Company Names in Code
The codebase is public. Before writing any third-party organization name into this repository — in source code, file or directory names, docstrings, comments, tests, fixtures, mock payloads, error messages, log lines, commit messages, or PR descriptions — pause and check:
Already in the codebase (OpenAI, Anthropic, Google, Azure, Bedrock, Fireworks, and other established LLM providers / integrations) — fine to use. Quick check: git grep -i "<name>" — if it returns hits in real code (not just your current diff), the name is established.
Anything else — customers, prospects, partners, new vendor integrations, observability tools, infra vendors, or any organization name that does not already appear in the repo. STOP and surface it to the user. Ask for explicit consent before writing the name into any file, commit message, or PR description. Do not write it speculatively and clean up later. Do not substitute a placeholder and proceed. Do not assume it is safe because it "looks like" a public company. The user must approve first.
What to do instead of a customer-specific reference:
- If you find yourself reaching for a customer name — real or fake — step back. The code shouldn't be customer-specific in the first place. Generalize the feature, or capture the customer motivation in internal docs (Notion / Linear / the internal staging PR description), never in the repo.
- Frame changes by the capability they add, not the customer who asked for it ("add per-team Bedrock guardrail routing", not "add routing for $CUSTOMER").
- Standard "fake value" markers (example.com, localhost, 127.0.0.1, test@example.com) and abstract identifiers (team_a, user_1, tenant_x) are fine; those are not customer stand-ins.
Documentation
Documentation lives in a separate repository: BerriAI/litellm-docs. It is served at docs.litellm.ai. Do not create or edit documentation files in this repository — open doc PRs against BerriAI/litellm-docs instead.
Development Commands
Installation
- make install-dev - Install core development dependencies
- make install-proxy-dev - Install proxy development dependencies with full feature set
- make install-test-deps - Install the full local test environment and generate the Prisma client
Testing
- make test - Run all tests
- make test-unit - Run unit tests (tests/test_litellm) with 4 parallel workers
- make test-integration - Run integration tests (excludes unit tests)
- pytest tests/ - Direct pytest execution
Code Quality
- make lint - Run all linting (Ruff, MyPy, Black, circular imports, import safety)
- make format - Apply Black code formatting
- make lint-ruff - Run Ruff linting only
- make lint-mypy - Run MyPy type checking only
- Before committing, always run uv run black . to format your code. Black formatting is enforced in CI.
Single Test Files
- uv run pytest tests/path/to/test_file.py -v - Run specific test file
- uv run pytest tests/path/to/test_file.py::test_function -v - Run specific test
Running Scripts
- uv run python script.py - Run Python scripts (use for non-test files)
GitHub Issue & PR Templates
When contributing to the project, use the appropriate templates:
Bug Reports (.github/ISSUE_TEMPLATE/bug_report.yml):
- Describe what happened vs. what you expected
- Include relevant log output
- Specify your LiteLLM version
Feature Requests (.github/ISSUE_TEMPLATE/feature_request.yml):
- Describe the feature clearly
- Explain the motivation and use case
Pull Requests (.github/pull_request_template.md):
- Add at least 1 test in tests/litellm/
- Ensure make test-unit passes
Architecture Overview
LiteLLM is a unified interface for 100+ LLM providers with two main components:
Core Library (litellm/)
- Main entry point: litellm/main.py - Contains core completion() function
- Provider implementations: litellm/llms/ - Each provider has its own subdirectory
- Router system: litellm/router.py + litellm/router_utils/ - Load balancing and fallback logic
- Type definitions: litellm/types/ - Pydantic models and type hints
- Integrations: litellm/integrations/ - Third-party observability, caching, logging
- Caching: litellm/caching/ - Multiple cache backends (Redis, in-memory, S3, etc.)
Proxy Server (litellm/proxy/)
- Main server: proxy_server.py - FastAPI application
- Authentication: auth/ - API key management, JWT, OAuth2
- Database: db/ - Prisma ORM with PostgreSQL/SQLite support
- Management endpoints: management_endpoints/ - Admin APIs for keys, teams, models
- Pass-through endpoints: pass_through_endpoints/ - Provider-specific API forwarding
- Guardrails: guardrails/ - Safety and content filtering hooks
- UI Dashboard: Served from _experimental/out/ (Next.js build)
Key Patterns
Provider Implementation
- Providers inherit from base classes in litellm/llms/base.py
- Each provider has transformation functions for input/output formatting
- Support both sync and async operations
- Handle streaming responses and function calling
Error Handling
- Provider-specific exceptions mapped to OpenAI-compatible errors
- Fallback logic handled by Router system
- Comprehensive logging through litellm/_logging.py
Configuration
- YAML config files for proxy server (see proxy/example_config_yaml/)
- Environment variables for API keys and settings
- Database schema managed via Prisma (proxy/schema.prisma)
Development Notes
Code Style
- Uses Black formatter, Ruff linter, MyPy type checker
- Pydantic v2 for data validation
- Async/await patterns throughout
- Type hints required for all public APIs
- Avoid imports within methods — place all imports at the top of the file (module-level). Inline imports inside functions/methods make dependencies harder to trace and hurt readability. The only exception is avoiding circular imports where absolutely necessary.
- Use dict spread for immutable copies: prefer {**original, "key": new_value} over dict(obj) + mutation. The spread produces the final dict in one step and makes intent clear.
- Guard at resolution time: when resolving an optional value through a fallback chain (a or b or ""), raise immediately if the resolved result being empty is an error. Don't pass empty strings or sentinel values downstream for the callee to deal with.
- Extract complex comprehensions to named helpers: a set/dict comprehension that calls into the DB or manager (e.g. "which of these server IDs are OAuth2?") belongs in a named helper function, not inline in the caller.
- FastAPI parameter declarations: mark required query/form params with = Query(...) / = Form(...) explicitly when other params in the same handler are optional. Mixing str (required) with Optional[str] = None in the same signature causes silent 422s when the required param is missing.
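The dict-spread and guard-at-resolution rules above can be sketched as follows. This is a minimal illustration, not code from the repository: with_timeout, resolve_api_base, and the sample values are hypothetical names chosen for the example.

```python
from typing import Optional


def with_timeout(params: dict, timeout: float) -> dict:
    """Return a new dict with "timeout" set; the caller's dict is not mutated."""
    return {**params, "timeout": timeout}


def resolve_api_base(explicit: Optional[str], from_env: Optional[str]) -> str:
    """Resolve through the fallback chain and fail here if nothing was found."""
    api_base = explicit or from_env or ""
    if not api_base:
        # Guard at resolution time instead of handing "" to the callee.
        raise ValueError("api_base is not set: pass it explicitly or configure it")
    return api_base


# Hypothetical sample values, for illustration only.
original = {"model": "example-model", "timeout": 30.0}
updated = with_timeout(original, 60.0)
```

The spread builds the final dict in one expression, so there is no window in which a half-updated copy exists, and the guard keeps the error next to the place where the value was resolved.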
Testing Strategy
- Unit tests in tests/test_litellm/
- Integration tests for each provider in tests/llm_translation/
- Proxy tests in tests/proxy_unit_tests/
- Load tests in tests/load_tests/
- Always add tests when adding new entity types or features. If the existing test file covers other entity types, add corresponding tests for the new one.
- Keep monkeypatch stubs in sync with real signatures: when a function gains a new optional parameter, update every fake_* / stub_* in tests that patch it to also accept that kwarg (even as **kwargs). Stale stubs fail with unexpected keyword argument and mask real bugs.
- Test all branches of name→ID resolution: when adding server/resource lookup that resolves names to UUIDs, test: (1) name resolves and UUID is allowed, (2) name resolves but UUID is not allowed, (3) name does not resolve at all. The silent-fallback path is where access-control bugs hide.
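A small sketch of the last two rules, assuming a hypothetical resolver. resolve_allowed_server, fake_fetch_server, and the sample IDs are illustrative names only and do not correspond to functions in the codebase.

```python
from typing import Any, Dict, Optional, Set


def fake_fetch_server(server_id: str, **kwargs: Any) -> Dict[str, str]:
    """Test stub that tolerates optional kwargs the real function may gain later."""
    return {"server_id": server_id}


def resolve_allowed_server(
    name: str, name_to_id: Dict[str, str], allowed_ids: Set[str]
) -> Optional[str]:
    """Return the server UUID only if the name resolves and the UUID is allowed."""
    server_id = name_to_id.get(name)
    if server_id is None:
        return None  # branch 3: the name does not resolve at all
    if server_id not in allowed_ids:
        return None  # branch 2: the name resolves, but the UUID is not allowed
    return server_id  # branch 1: the name resolves and the UUID is allowed


# Hypothetical fixtures using abstract identifiers.
NAME_TO_ID = {"server_a": "uuid-1", "server_b": "uuid-2"}
ALLOWED = {"uuid-1"}
```

A test module would assert on each of the three branches separately, so a regression in the denied or unresolved path cannot hide behind a passing happy-path test.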
UI / Backend Consistency
- When wiring a new UI entity type to an existing backend endpoint, verify the backend API contract (single value vs. array, required vs. optional params) and ensure the UI controls match — e.g., use a single-select dropdown when the backend accepts a single value, not a multi-select
UI Component Library
- Always use antd for new UI components; we are migrating off of @tremor/react. Do not introduce new Badge, Text, Card, Grid, Title, or other imports from @tremor/react in any new or modified file. Use antd equivalents: Tag for labels, Typography.Text / Typography.Title / Typography.Paragraph for textual content (avoid plain text-only <span>, <p>, <h*> when Typography fits), and Card from antd. Note that antd has no "yellow" Tag color; use "gold" for amber/yellow.
MCP OAuth / OpenAPI Transport Mapping
- available_on_public_internet: false with delegate_auth_to_upstream: true (oauth2, interactive, not client_credentials): LiteLLM still allows the anonymous upstream PKCE path (no proxy API key for /authorize and matching MCP routes). The internal-only flag mainly affects other surfaces (e.g. IP-based discovery). Rely on the upstream IdP and network policy; the dashboard shows a warning when both are set, and the proxy logs a warning when the server is loaded from config or the database.
- TRANSPORT.OPENAPI is a UI-only concept. The backend only accepts "http", "sse", or "stdio". Always map it to "http" before any API call (including pre-OAuth temp-session calls).
- FastAPI validation errors return detail as an array of {loc, msg, type} objects. Error extractors must handle: array (map .msg), string, nested {error: string}, and fallback.
- When an MCP server already has authorization_url stored, skip OAuth discovery (_discovery_metadata): the server URL for OpenAPI MCPs is the spec file, not the API base, and fetching it causes timeouts.
- client_id should be optional in the /authorize endpoint: if the server has a stored client_id in credentials, use that. Never require callers to re-supply it.
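The transport mapping and the error-shape handling described above can be sketched as below. The real helpers live in the UI code, so this Python version only illustrates the logic. The function names, the "openapi" string value for TRANSPORT.OPENAPI, and the fallback text are assumptions made for the example.

```python
from typing import Any

# The three transports the backend accepts, per the rule above.
BACKEND_TRANSPORTS = {"http", "sse", "stdio"}


def to_backend_transport(transport: str) -> str:
    """Map the UI-only transport (assumed to be "openapi") to "http"."""
    mapped = "http" if transport == "openapi" else transport
    if mapped not in BACKEND_TRANSPORTS:
        raise ValueError(f"unsupported transport: {transport!r}")
    return mapped


def extract_error_message(detail: Any, fallback: str = "Unknown error") -> str:
    """Handle the four shapes: array of {loc, msg, type}, string, {error}, other."""
    if isinstance(detail, list):
        messages = [
            str(item["msg"])
            for item in detail
            if isinstance(item, dict) and "msg" in item
        ]
        if messages:
            return "; ".join(messages)
    if isinstance(detail, str) and detail:
        return detail
    if isinstance(detail, dict) and isinstance(detail.get("error"), str):
        return detail["error"]
    return fallback
```

Joining multiple validation messages with "; " is a presentation choice for this sketch; a UI helper might show only the first message instead.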
MCP Credential Storage
- OAuth credentials and BYOK credentials share the litellm_mcpusercredentials table, distinguished by a "type" field in the JSON payload ("oauth2" vs plain string).
- When deleting OAuth credentials, check type before deleting to avoid accidentally deleting a BYOK credential for the same (user_id, server_id) pair.
- Always pass the raw expires_at timestamp to the client; never set it to None for expired credentials. Let the frontend compute the "Expired" display state from the timestamp.
- Use RecordNotFoundError (not bare except Exception) when catching "already deleted" in credential delete endpoints.
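The type check before deletion might look like the sketch below. It assumes the stored credential is available as a string that is either a JSON object with a "type" field or a plain BYOK value. is_oauth_credential is a hypothetical helper name, not an existing function.

```python
import json
from typing import Optional


def is_oauth_credential(raw: Optional[str]) -> bool:
    """True only when the stored payload is a JSON object with type "oauth2"."""
    if not raw:
        return False
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        # Not JSON at all: treat it as a plain-string BYOK credential.
        return False
    return isinstance(payload, dict) and payload.get("type") == "oauth2"
```

A delete endpoint would call this on the fetched row and skip the delete when it returns False, leaving a BYOK credential for the same (user_id, server_id) pair untouched.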
Browser Storage Safety (UI)
- Never write LiteLLM access tokens or API keys to localStorage; use sessionStorage only. localStorage survives browser close and is readable by any injected script (XSS).
- Shared utility functions (e.g. extractErrorMessage) belong in src/utils/. Never define them inline in hooks or duplicate them across files.
Database Migrations
- Prisma handles schema migrations
- Migration files auto-generated with prisma migrate dev
- Always test migrations against both PostgreSQL and SQLite
Proxy database access
- Do not write raw SQL for proxy DB operations. Use Prisma model methods instead of execute_raw / query_raw.
- Use the generated client: prisma_client.db.<model> (e.g. litellm_tooltable, litellm_usertable) with .upsert(), .find_many(), .find_unique(), .update(), .update_many() as appropriate. This avoids schema/client drift, keeps code testable with simple mocks, and matches patterns used in spend logs and other proxy code.
- No N+1 queries. Never query the DB inside a loop. Batch-fetch with {"in": ids} and distribute in-memory.
- Batch writes. Use create_many / update_many / delete_many instead of individual calls (these return counts only; update_many / delete_many no-op silently on missing rows). When multiple separate writes target the same table (e.g. in batch_()), order by primary key to avoid deadlocks.
- Push work to the DB. Filter, sort, group, and aggregate in SQL, not Python. Verify Prisma generates the expected SQL, e.g. prefer group_by over find_many(distinct=...) which does client-side processing.
- Bound large result sets. Prisma materializes full results in memory. For results over ~10 MB, paginate with take / skip or cursor / take, always with an explicit order. Prefer cursor-based pagination (skip is O(n)). Don't paginate naturally small result sets.
- Limit fetched columns on wide tables. Use select to fetch only needed fields. This returns a partial object, so downstream code must not access unselected fields.
- Check index coverage. For new or modified queries, check schema.prisma for a supporting index. Prefer extending an existing index (e.g. @@index([a]) → @@index([a, b])) over adding a new one, unless it's a @@unique. Only add indexes for large/frequent queries.
- Keep schema files in sync. Apply schema changes to all schema.prisma copies (schema.prisma, litellm/proxy/, litellm-proxy-extras/) with a migration under litellm-proxy-extras/litellm_proxy_extras/migrations/.
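The "No N+1 queries" rule can be illustrated with a single batched lookup that is then distributed in memory. The sketch uses an in-memory stand-in so it runs without a database: FakeKeyTable, keys_by_team, and the team_id field are hypothetical, and rows are plain dicts here whereas the generated Prisma client returns model objects.

```python
import asyncio
from collections import defaultdict
from typing import Any, Dict, List


class FakeKeyTable:
    """In-memory stand-in for a Prisma model; counts queries to show batching."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows
        self.query_count = 0

    async def find_many(self, where: Dict[str, Any]) -> List[Dict[str, Any]]:
        self.query_count += 1
        wanted = set(where["team_id"]["in"])
        return [row for row in self.rows if row["team_id"] in wanted]


async def keys_by_team(
    table: FakeKeyTable, team_ids: List[str]
) -> Dict[str, List[Dict[str, Any]]]:
    """One query for all teams, then group the rows in memory."""
    rows = await table.find_many(where={"team_id": {"in": team_ids}})
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        grouped[row["team_id"]].append(row)
    # Teams with no rows still get an entry, so callers need no special case.
    return {team_id: grouped.get(team_id, []) for team_id in team_ids}


table = FakeKeyTable(
    [
        {"token": "key_1", "team_id": "team_a"},
        {"token": "key_2", "team_id": "team_a"},
        {"token": "key_3", "team_id": "team_b"},
    ]
)
result = asyncio.run(keys_by_team(table, ["team_a", "team_b", "team_c"]))
```

The query counter stays at one regardless of how many team IDs are passed, which is the property a loop of per-team lookups would lose.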
Setup Wizard (litellm/setup_wizard.py)
- The wizard is implemented as a single SetupWizard class with @staticmethod methods. Keep it that way. No module-level functions except run_setup_wizard() (the public entrypoint) and pure helpers (color, ANSI).
- Use litellm.utils.check_valid_key(model, api_key) for credential validation. Never roll a custom completion call.
- Do not hardcode provider env-key names or model lists that already exist in the codebase. Add a test_model field to each provider entry to drive check_valid_key; set it to None for providers that can't be validated with a single API key (Azure, Bedrock, Ollama).
Enterprise Features
- Enterprise-specific code in enterprise/ directory
- Optional features enabled via environment variables
- Separate licensing and authentication for enterprise features
CI Supply-Chain Safety
- Never pipe a remote script into a shell (curl ... | bash, wget ... | sh). Download the artifact to a file, verify its SHA-256 checksum, then install.
- Pin every external tool to a specific version with a full URL (not latest or stable). Unversioned downloads silently change under you.
- Verify checksums for all downloaded binaries. Use the provider's official .sha256 / .sha256sum sidecar file when available; otherwise compute and hardcode the digest.
- Prefer reusable CircleCI commands (commands: section) so a tool is installed and verified in exactly one place, then referenced everywhere with - install_<tool> or - wait_for_service.
- Don't add tools just because they were there before. Audit whether an external dependency is still needed. If it can be replaced with a shell one-liner or a tool already in the image, remove it.
- These rules apply to every download in CI: binaries, install scripts, language version managers, package repos. No exceptions.
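The verify step of "download, verify, then install" can be sketched with the standard library. The helper names sha256_of and verify_download are illustrative; in CI the same check is often done with a sha256sum command, and the expected digest would come from the vendor's sidecar file or a pinned constant.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file in chunks so large artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> None:
    """Raise before anything is installed if the digest does not match."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise RuntimeError(
            f"checksum mismatch for {path.name}: "
            f"expected {expected_sha256}, got {actual}"
        )
```

Raising on mismatch makes the CI step fail closed: the install command that follows never runs against an artifact that was not verified.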
HTTP Client Cache Safety
- Never close HTTP/SDK clients on cache eviction. LLMClientCache._remove_key() must not call close() / aclose() on evicted clients; they may still be used by in-flight requests. Doing so causes RuntimeError: Cannot send a request, as the client has been closed. after the 1-hour TTL expires. Cleanup happens at shutdown via close_litellm_async_clients().
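The eviction rule can be illustrated with a toy cache. This is not the LLMClientCache implementation: ClientCache, FakeClient, and the retired list are assumptions made for the sketch, which only shows eviction dropping the cache entry while closing is deferred to an explicit shutdown step.

```python
from typing import Dict, List, Optional


class FakeClient:
    """Stand-in for an HTTP/SDK client with a close() method."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ClientCache:
    """Toy cache: evict() never closes a client, close_all() runs at shutdown."""

    def __init__(self) -> None:
        self._live: Dict[str, FakeClient] = {}
        self._retired: List[FakeClient] = []

    def set(self, key: str, client: FakeClient) -> None:
        self._live[key] = client

    def get(self, key: str) -> Optional[FakeClient]:
        return self._live.get(key)

    def evict(self, key: str) -> None:
        client = self._live.pop(key, None)
        if client is not None:
            # No close() here: an in-flight request may still hold this client.
            self._retired.append(client)

    def close_all(self) -> None:
        """Shutdown hook: close everything, including previously evicted clients."""
        for client in [*self._live.values(), *self._retired]:
            client.close()
        self._live.clear()
        self._retired.clear()
```

Keeping evicted clients in a retired list is one possible way to make them reachable at shutdown; it grows with the number of evictions, so a real implementation would need to weigh that against the cost of tracking them differently.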
Troubleshooting: DB schema out of sync after proxy restart
litellm-proxy-extras runs prisma migrate deploy on startup using its own bundled migration files, which may lag behind schema changes in the current worktree. Symptoms: Unknown column, Invalid prisma invocation, or missing data on new fields.
Diagnose: Run \d "TableName" in psql and compare against schema.prisma — missing columns confirm the issue.
Fix options:
- Create a Prisma migration (permanent): run prisma migrate dev --name <description> in the worktree. The generated file will be picked up by prisma migrate deploy on next startup.
- Apply manually for local dev: psql -d litellm -c "ALTER TABLE ... ADD COLUMN IF NOT EXISTS ..." after each proxy start. Fine for dev, not for production.
- Update litellm-proxy-extras: if the package is installed from PyPI, its migration directory must include the new file. Either update the package or run the migration manually until the next release ships it.