Commit Graph

27 Commits

Author SHA1 Message Date
Mateo Wang
8259d6cd85
fix: small CLAUDE.md nit (#29749) 2026-06-05 06:30:05 +00:00
Mateo Wang
84c4c12f90
fix: small CLAUDE.md nits (#29504)
* fix: small CLAUDE.md nits

* fix: make test rule more concise

* fix: add arrow rule
2026-06-02 09:02:47 -07:00
Sameer Kankute
68952a55d7
docs(agents): clarify when to create new test files (#29472)
* docs(agents): clarify when to create new test files in CLAUDE.md

Document that bug fixes should extend existing mapped test files while new
features may add files under the mirrored tests/test_litellm/ layout.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(agents): clarify test file naming conventions in CLAUDE.md

Address Greptile feedback: document test_<filename>.py vs descriptive
test_*_transformation.py patterns and when to match existing names.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-01 21:10:42 -07:00
Mateo Wang
f27df8d516
docs: hand-written CLAUDE.md; point GEMINI.md and AGENTS.md at it (#29252)
* docs: replace generated CLAUDE.md with hand-written guidance, remove AGENTS.md

Swap the auto-generated CLAUDE.md for a concise hand-written version that captures how we actually want agents to work in this repo: minimal comments, simplicity first, meaningful tests with a high mutation kill rate, PRs based off litellm_internal_staging rather than main, and curl against a live proxy as proof of fix instead of pasted pytest output. Remove AGENTS.md so there is one source of truth for agent guidance. The customer and company name confidentiality policy, along with the MCP available_on_public_internet note, are carried over from the previous CLAUDE.md.

* fix: further clarify communication guidelines

* docs: point GEMINI.md at CLAUDE.md instead of duplicating guidance

Replace the standalone GEMINI.md copy, which had already drifted from the new CLAUDE.md, with a one-line pointer so Gemini reads the same single source of truth.

* docs: simplify PR template test checklist item

Replace the rigid "at least 1 test is a hard requirement" checklist line with "I have added meaningful tests", which matches the testing guidance in CLAUDE.md, and tidy a comma into a semicolon in the scope-isolation item.

* docs: point AGENTS.md at CLAUDE.md instead of deleting it

Keep AGENTS.md so tools that read it still resolve guidance, but collapse it to the same one-line pointer to CLAUDE.md used by GEMINI.md, keeping a single source of truth.

* fix: make AI-generated rules more concise

* fix: spelling

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: make the .env usage more careful

* docs: restore MCP available_on_public_internet note to CLAUDE.md

The PR description states this note was carried over verbatim from the
previous CLAUDE.md, but it was dropped in the rewrite. Restore it so the
file matches the description and the team guidance is not lost.

* docs: restore browser storage and CI supply-chain safety notes to CLAUDE.md

These security-relevant rules were dropped in the rewrite. Restore the
sessionStorage-over-localStorage (XSS) guidance and the CI supply-chain
rules (no curl|bash, pin versions, verify checksums) so agents editing UI
or CI code are still steered away from those pitfalls.

* docs: move area-specific guidance into nested CLAUDE.md files

The MCP, browser-storage, and CI supply-chain notes are scoped to
particular parts of the tree, so move each into a nested CLAUDE.md that
Claude Code loads on demand when those files are touched: the MCP note
under the mcp_server gateway, the browser-storage rule under the UI
dashboard, and the CI supply-chain rules under .circleci. Keeps the root
CLAUDE.md focused on general guidance while the area notes surface where
they are relevant.

* docs: keep CI supply-chain note in root CLAUDE.md

CI guidance applies beyond .circleci (it also covers downloads in GitHub
workflows and any CI script), and CI work does not reliably touch a single
subtree, so a nested file under .circleci would not surface it dependably.
Keep it in the always-loaded root instead. The MCP and browser-storage
notes stay nested where they map cleanly to one area of the tree.

* fix: make it clear we prefer httpOnly

* chore: make ci rule more concise

* chore: make concise

Fix formatting and punctuation in MCP note.

* fix: don't include Claude attribution

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-05-29 00:05:05 -07:00
yuneng-jiang
b6fd7f7746
docs(agents): require consent before writing new third-party names (#28908)
Adds a rule to CLAUDE.md and AGENTS.md instructing AI coding agents to
pause and ask the user before introducing any third-party organization
name that does not already appear in the repository. Names already
established in the codebase (existing LLM providers, etc.) remain fine
to use without prompting.
2026-05-26 16:40:07 -07:00
Sameer Kankute
d855e56333
chore(mcp): warn on internal + upstream PKCE delegate
Log verbose_logger.warning when loading oauth2 interactive servers with
available_on_public_internet=false and delegate_auth_to_upstream=true
(config + DB). Dashboard Alert for the same combo. CLAUDE note for
operators. Tests for log and M2M skip.
2026-05-15 10:05:35 +05:30
Yassin Kortam
fa5eae8bc9
chore: remove legacy deployment artifacts and litellm-js packages (#27541)
- Remove litellm-js/proxy and litellm-js/spend-logs TypeScript packages that provided Cloudflare Worker proxy and Node.js spend logging services, as these are no longer maintained
- Remove deprecated Docker variants (Dockerfile.alpine, Dockerfile.dev, Dockerfile.custom_ui, Dockerfile.health_check, Dockerfile.ghcr_base) that have been superseded by the primary Dockerfile
- Remove legacy Kubernetes manifests (kub.yaml, service.yaml) from deploy/kubernetes in favor of the Helm chart
- Remove stale index.yaml Helm chart index pinned to an old version (v1.43.18)
- Remove dev_config.yaml development configuration file that contained hardcoded credentials and example endpoints
- Clean up ~3,500 lines of unused code and configuration to reduce repository maintenance burden

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-09 20:51:34 +00:00
Yuneng Jiang
c35f3a50ae
docs: remove docs/my-website, point contributors to litellm-docs
The documentation source has moved to a separate repository,
BerriAI/litellm-docs, served at docs.litellm.ai. This PR removes
docs/my-website/ from this repo and updates README.md, AGENTS.md,
and CLAUDE.md to direct doc contributions to the new repo.

Also fixes a broken relative link in
litellm/integrations/levo/README.md.

The existing CI symlink in .github/workflows/test-code-quality.yml
(which clones litellm-docs and symlinks docs/my-website to it for
tests/documentation_tests/*) continues to work without change.
2026-04-24 14:17:46 -07:00
Ryan Crabbe
17568e81f2
chore(ui): use antd Typography in GuardrailTestPlayground
- Replace plain text/heading tags with Typography.Text, Title, Paragraph
- Document Typography preference in CLAUDE.md UI guidelines
2026-04-15 08:45:11 -07:00
yuneng-jiang
a306092d47
Merge pull request #25463 from BerriAI/litellm_oss_staging_04_09_2026
Litellm oss staging 04 09 2026
2026-04-13 17:25:53 -07:00
Ryan Crabbe
842523a918
chore(ui): use antd in GuardrailSettingsView and document the rule
Converts GuardrailSettingsView from @tremor/react (Badge, Text) to
antd (Tag, plain spans) as part of the Tremor migration. Also
captures the "no new Tremor imports" rule in CLAUDE.md and expands
the existing note in AGENTS.md with the specific antd equivalents
and the yellow→gold gotcha.
2026-04-13 11:49:39 -07:00
Yuneng Jiang
78c282f400
[Infra] CI: harden supply chain — remove dockerize, pin tools, verify checksums
- Remove dockerize entirely. Replace all 26 `dockerize -wait` calls with
  a new `wait_for_service` CircleCI command using built-in bash + curl.
- Replace `curl | bash` Docker install with a `docker version` check
  (Docker is pre-installed on the ubuntu-2204 machine image).
- Pin Helm v3.17.3, Kind v0.20.0, kubectl v1.31.4 with SHA-256
  checksum verification. Replace `curl | bash` helm install.
- Add reusable commands: wait_for_service, install_helm, install_kind.
- Add CI supply-chain safety guidelines to CLAUDE.md.
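
The checksum step described above can be sketched in Python. This is an illustration only: the commit itself uses CircleCI commands written in bash, and the helper names `sha256_of` and `verify_download` are hypothetical, not part of the repository.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> None:
    """Raise ValueError unless the file matches the pinned digest.

    Hypothetical helper: the pinned digest would be stored next to the
    pinned tool version in the CI config.
    """
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError(f"checksum mismatch for {path.name}: got {actual}")
```

The point of pinning both the version and the digest is that a changed upstream artifact fails the build instead of being executed.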
2026-04-10 21:42:33 -07:00
stuxf
a6c30b30bf
build: migrate packaging, CI, and Docker from Poetry to uv (#25007)
* build: migrate packaging metadata to uv

* ci: move automation and local tooling to uv

* docker: migrate image builds and runtime setup to uv

* docs: update install and deployment guidance for uv

* chore: align auxiliary scripts and tests with uv

* test: harden test_litellm isolation

* fix: keep release and health check images self-contained

* build: pin uv tooling and health check deps

* test: isolate bedrock image request formatting from suite state

* test: cover sandbox executor requirements flow

* ci: fix circleci no-op command steps

* ci: fix circleci publish workflow parsing

* fix: stabilize remaining uv migration CI checks

* ci: increase matrix test timeout headroom

* fix: restore published docker and license coverage

* fix: restore proxy runtime build parity

* fix: restore proxy extras parity and venv migrations

* ci: persist uv path across circleci steps

* fix: keep psycopg binary in default test env

* docker: preserve prisma cache across stages

* test: run local proxy checks through uv python

* build: restore runtime deps moved into ci

* build: refresh uv lock after upstream merge

* fix: restore module import in test_check_migration after merge

The conflict resolution imported only the function but the test body
references check_migration as a module throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching

- Move google-generativeai, Pillow, tenacity back to ci group (they are
  lazily imported and bloat the base SDK install needlessly)
- Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant
  in Docker where system Node.js is already installed via apk)
- Remove all nodejs-wheel node replacement and venv npm patching blocks
  from Dockerfiles since the wheel is no longer installed
- Add --no-default-groups to CodSpeed benchmark workflow so the benchmark
  environment matches the old minimal pip install footprint
- Apply standard uv two-phase Docker pattern: copy metadata first, install
  deps (cached layer), then copy source and install project
- Replace CircleCI enterprise no-op with proper uv sync command

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate uv.lock after removing nodejs-wheel-binaries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): use cache/restore instead of cache to prevent cache poisoning

The old workflow used actions/cache/restore (read-only). The uv migration
changed it to actions/cache (read-write), which zizmor flags as a cache
poisoning risk. Restore the safer read-only variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert

The setup-uv action enables caching by default, which zizmor flags as a
cache poisoning risk. Disable it since we already use a read-only
cache/restore step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv cache in publish workflow

Silences zizmor cache-poisoning alert. Publishing workflow runs
infrequently on protected branches so caching adds no real benefit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(test): remove duplicate verbose_logger mock in test_check_migration

The logger was patched twice — first via mocker.patch() then via
mocker.patch.object(autospec=True). The second call fails because
autospec cannot inspect an already-mocked attribute. Remove the
redundant first patch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): free disk space before Docker build in test-server-root-path

The Dockerfile.non_root build ran out of disk on the CI runner. Remove
Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:46:23 -07:00
yuneng-jiang
d2d99aa082
[Docs] Enforce Black Formatting in Contributor Docs (#25135)
* [Docs] Enforce Black formatting in contributor docs

Black formatting is now enforced in CI. Update CLAUDE.md, AGENTS.md,
and CONTRIBUTING.md to instruct contributors and AI agents to run
`poetry run black .` before committing, and add VS Code setup guidance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: fixes

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:51:25 -07:00
Ishaan Jaff
8e61b32b8e
[Staging] - Ishaan March 17th (#23903)
* feat(xai): add grok-4.20 beta 2 models with pricing (#23900)

Add three grok-4.20 beta 2 model variants from xAI:
- grok-4.20-multi-agent-beta-0309 (reasoning + multi-agent)
- grok-4.20-beta-0309-reasoning (reasoning)
- grok-4.20-beta-0309-non-reasoning

Pricing (from https://docs.x.ai/docs/models):
- Input: $2.00/1M tokens ($0.20/1M cached)
- Output: $6.00/1M tokens
- Context: 2M tokens

All variants support vision, function calling, tool choice, and web search.
Closes LIT-2171

* docs: add Quick Install section for litellm --setup wizard (#23905)

* docs: add Quick Install section for litellm --setup wizard

* docs: clarify setup wizard is for local/beginner use

* feat(setup): interactive setup wizard + install.sh (#23644)

* feat(setup): add interactive setup wizard + install.sh

Adds `litellm --setup` — a Claude Code-style TUI onboarding wizard that
guides users through provider selection, API key entry, and proxy config
generation, then optionally starts the proxy immediately.

- litellm/setup_wizard.py: wizard with ASCII art, numbered provider menu
  (OpenAI, Anthropic, Azure, Gemini, Bedrock, Ollama), API key prompts,
  port/master-key config, and litellm_config.yaml generation
- litellm/proxy/proxy_cli.py: adds --setup flag that invokes the wizard
- scripts/install.sh: curl-installable script (detect OS/Python, pip
  install litellm[proxy], launch wizard)

Usage:
  curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/scripts/install.sh | sh
  litellm --setup

* fix(install.sh): remove orange color, add LITELLM_BRANCH env var for branch installs

* fix(install.sh): install from git branch so --setup is available for QA

* fix(install.sh): remove stale LITELLM_BRANCH reference that caused unbound variable error

* fix(install.sh): force-reinstall from git to bypass cached PyPI version

* fix(install.sh): show pip progress bar during install

* fix(install.sh): always launch wizard via $PYTHON_BIN -m litellm, not PATH binary

* fix(install.sh): use litellm.proxy.proxy_cli module (no __main__.py exists)

* fix(install.sh): suppress RuntimeWarning from module invocation

* fix(install.sh): use Python bin-dir litellm binary to avoid CWD sys.path shadowing

* fix(install.sh): use sysconfig.get_path('scripts') to find pip-installed litellm binary

* fix(install.sh): redirect stdin from /dev/tty on exec so wizard gets terminal, not exhausted pipe

* fix(install.sh): warn about git clone duration, drop --no-cache-dir so re-runs are faster

* feat(setup_wizard): arrow-key selector, updated model names

* fix(setup_wizard): use sysconfig binary to start proxy, not python -m litellm

* feat(setup_wizard): credential validation after key entry + clear next-steps after proxy start

* style(install.sh): show git clone warning in blue

* refactor(setup_wizard): class with static methods, use check_valid_key from litellm.utils

* address greptile review: fix yaml escaping, port validation, display name collisions, tests

- setup_wizard.py: add _yaml_escape() for safe YAML embedding of API keys
- setup_wizard.py: add _styled_input() with readline ANSI ignore markers
- setup_wizard.py: change DIVIDER to _divider() fn to avoid import-time color capture
- setup_wizard.py: validate port range 1-65535, initialize before loop
- setup_wizard.py: qualify azure display names (azure-gpt-4o) to avoid collision with openai
- setup_wizard.py: work on env_copy in _build_config to avoid mutating caller's dict
- setup_wizard.py: skip model_list entries for providers with no credentials
- setup_wizard.py: prompt for azure deployment name
- setup_wizard.py: wrap os.execlp in try/except with friendly fallback
- setup_wizard.py: wrap config write in try/except OSError
- setup_wizard.py: fix _validate_and_report to use two print lines (no \r overwrite)
- setup_wizard.py: add .gitignore tip next to key storage notice
- setup_wizard.py: fix run_setup_wizard() return type annotation to None
- scripts/install.sh: drop pipefail (not supported by dash on Ubuntu when invoked as sh)
- scripts/install.sh: use litellm[proxy] from PyPI (not hardcoded dev branch)
- scripts/install.sh: guard /dev/tty read with -r check for Docker/CI compat
- scripts/install.sh: remove --force-reinstall to avoid downgrading dependencies
- tests/test_litellm/test_setup_wizard.py: 13 unit tests for _build_config and _yaml_escape

* style: black format setup_wizard.py

* fix: address remaining greptile issues - Windows compat, YAML quoting, credential flow

- guard termios/tty imports with try/except ImportError for Windows compat
- quote master_key as YAML double-quoted scalar (same as env vars)
- remove unused port param from _build_config signature
- _validate_and_report now returns the final key so re-entered creds are stored
- add test for master_key YAML quoting

* fix: add --port to suggested command, guard /dev/tty exec in install.sh

* fix: quote api_base in YAML, skip azure if no deployment, only redraw on state change

* fix: address greptile review comments

- _yaml_escape: add control character escaping (\n, \r, \t)
- test: fix tautological assertion in test_build_config_azure_no_deployment_skipped
- test: add tests for control character escaping in _yaml_escape
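
A minimal sketch of the kind of escaping these commits describe for embedding API keys in a YAML double-quoted scalar. The function names `yaml_escape` and `yaml_quoted` are hypothetical stand-ins; the actual `_yaml_escape` in setup_wizard.py may differ in details.

```python
def yaml_escape(value: str) -> str:
    """Escape a string for use inside a YAML double-quoted scalar.

    Handles backslash, double quote, and the control characters named in
    the commit message (\\n, \\r, \\t). Working character by character
    avoids double-escaping the backslashes this function introduces.
    """
    replacements = {
        "\\": "\\\\",
        '"': '\\"',
        "\n": "\\n",
        "\r": "\\r",
        "\t": "\\t",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def yaml_quoted(value: str) -> str:
    """Wrap an escaped value in double quotes, ready to write to a config."""
    return f'"{yaml_escape(value)}"'
```

Without this step, a key containing a quote or newline could break out of its scalar and alter the structure of the generated config file.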

* feat(ui): remove Chat UI page link and banner from sidebar and playground (#23908)

* feat(guardrails): MCPJWTSigner - built-in guardrail for zero trust MCP auth (#23897)

* Allow pre_mcp_call guardrail hooks to mutate outbound MCP headers

* Enhance MCPServerManager to support hook-modified arguments and extra headers. Update tests to validate argument mutation and header injection behavior, including warnings for OpenAPI-backed servers when headers are present.

* Refactor MCPServerManager to raise HTTPException for extra headers in OpenAPI-backed servers. Update tests to reflect this change, ensuring proper exception handling instead of logging warnings.

* feat(guardrails): add MCPJWTSigner built-in guardrail for zero trust MCP auth

Signs outbound MCP tool calls with a LiteLLM-issued RS256 JWT so MCP servers
can trust a single signing authority instead of every upstream IdP.

Enable in config.yaml:
  guardrails:
    - guardrail_name: mcp-jwt-signer
      litellm_params:
        guardrail: mcp_jwt_signer
        mode: pre_mcp_call
        default_on: true

JWT carries sub (user_id), act.sub (team_id, RFC 8693), tool-level scope, iss,
aud, iat/exp/nbf. RSA-2048 keypair auto-generated at startup unless
MCP_JWT_SIGNING_KEY env var is set.

Adds /.well-known/jwks.json endpoint and jwks_uri to /.well-known/openid-configuration
so MCP servers can verify LiteLLM-issued tokens via OIDC discovery.
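
As a rough sketch, the claim set listed above could be assembled like this before signing. `build_claims`, its parameters, and the 300-second default TTL are assumptions for illustration; the scope string is passed in because its exact format is not given here, and the RS256 signing step is omitted.

```python
import time
from typing import Any, Dict, Optional


def build_claims(
    user_id: str,
    team_id: Optional[str],
    scope: str,
    issuer: str,
    audience: str,
    ttl_seconds: int = 300,  # assumed default, not taken from the commit
    now: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the unsigned JWT payload for one outbound MCP tool call."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be > 0")
    issued_at = int(time.time()) if now is None else now
    claims: Dict[str, Any] = {
        "iss": issuer,
        "aud": audience,
        "sub": user_id,
        "scope": scope,
        "iat": issued_at,
        "nbf": issued_at,
        "exp": issued_at + ttl_seconds,
    }
    if team_id is not None:
        # RFC 8693 actor claim: the party acting on behalf of `sub`.
        claims["act"] = {"sub": team_id}
    return claims
```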

* Update MCPServerManager to raise HTTPException with status code 400 for extra headers in OpenAPI-backed servers. Adjust tests to verify the correct status code and exception message.

* fix: address P1 issues in MCPJWTSigner

- OpenAPI servers: warn + skip header injection instead of 500
- JWKS Cache-Control: 5min for auto-generated keys, 1h for persistent
- sub claim: fallback to apikey:{token_hash} for anonymous callers
- ttl_seconds: validate > 0 at init time

* docs: add MCP zero trust auth guide with architecture diagram

* docs: add FastMCP JWT verification guide to zero trust doc

* fix: address remaining Greptile review issues (round 2)

- mcp_server_manager: warn when hook Authorization overwrites existing header
- __init__: remove _mcp_jwt_signer_instance from __all__ (private internal)
- discoverable_endpoints: copy dict instead of mutating in-place on OIDC augmentation
- test docstring: reflect warn-and-continue behavior for OpenAPI servers
- test: update scope assertions for least-privilege (no mcp:tools/list on tool-call JWTs)

* fix: address Greptile round 3 feedback

- initialize_guardrail: validate mode='pre_mcp_call' at init time — misconfigured
  mode silently bypasses JWT injection, which is a zero-trust bypass
- _build_claims: remove duplicate inline 'import re' (module-level import already present)
- _types.py: add TODO comment explaining jwt_claims is forward-compat plumbing
  for a follow-up PR that will forward upstream IdP claims into outbound MCP JWTs

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes

Addresses all missing pieces from the scoping doc review:

FR-5 (Verify + re-sign): MCPJWTSigner now accepts access_token_discovery_uri
and token_introspection_endpoint.  When set, the incoming Bearer token is
extracted from raw_headers (threaded through pre_call_tool_check), verified
against the IdP's JWKS (JWT) or introspected (opaque), and only re-signed if
valid.  Falls back to user_api_key_dict.jwt_claims for LiteLLM JWT-auth mode.

FR-12 (Configurable end-user identity mapping): end_user_claim_sources
ordered list drives sub resolution — sources: token:<claim>, litellm:user_id,
litellm:email, litellm:end_user_id, litellm:team_id.
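
One way to read the ordered resolution: walk the list and take the first source that yields a non-empty value. In this sketch `resolve_sub` and its two mapping arguments are hypothetical; only the `token:<claim>` and `litellm:<field>` source syntax comes from the text above.

```python
from typing import Any, Mapping, Optional, Sequence


def resolve_sub(
    sources: Sequence[str],
    token_claims: Mapping[str, Any],
    litellm_fields: Mapping[str, Any],
) -> Optional[str]:
    """Return the first non-empty value found by walking `sources` in order."""
    for source in sources:
        kind, _, name = source.partition(":")
        if kind == "token":
            value = token_claims.get(name)
        elif kind == "litellm":
            value = litellm_fields.get(name)
        else:
            raise ValueError(f"unknown claim source: {source!r}")
        if value:
            return str(value)
    return None
```

For example, `["token:email", "litellm:user_id"]` prefers the verified token's email and falls back to the LiteLLM user id when the token has none.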

FR-13 (Claim operations): add_claims (insert-if-absent), set_claims (always
override), remove_claims (delete) applied in that order.
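
The add, set, remove ordering can be sketched as follows. `apply_claim_ops` is a hypothetical helper name; the three operation semantics and their order are as stated above.

```python
from typing import Any, Dict, Mapping, Optional, Sequence


def apply_claim_ops(
    claims: Mapping[str, Any],
    add_claims: Optional[Mapping[str, Any]] = None,
    set_claims: Optional[Mapping[str, Any]] = None,
    remove_claims: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Apply add (insert-if-absent), set (override), then remove, in order."""
    result = dict(claims)  # copy so the caller's dict is not mutated
    for key, value in (add_claims or {}).items():
        result.setdefault(key, value)
    result.update(set_claims or {})
    for key in remove_claims or []:
        result.pop(key, None)
    return result
```

Because removal runs last, a claim named in both set_claims and remove_claims ends up absent.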

FR-14 (Two-token model): channel_token_audience + channel_token_ttl issue a
second JWT injected as x-mcp-channel-token: Bearer <token>.

FR-15 (Incoming claim validation): required_claims raises HTTP 403 when any
listed claim is absent; optional_claims passes listed claims from verified
token into the outbound JWT.

FR-9 (Debug headers): debug_headers: true emits x-litellm-mcp-debug with kid,
sub, iss, exp, scope.

FR-10 (Configurable scopes): allowed_scopes replaces auto-generation.  Also
fixed: tool-call JWTs no longer grant mcp:tools/list (overpermission).

P1 fixes:
- proxy/utils.py: _convert_mcp_hook_response_to_kwargs merges rather than
  replaces extra_headers, preserving headers from prior guardrails.
- mcp_server_manager.py: warns when hook injects Authorization alongside a
  server-configured authentication_token (previously silent).
- mcp_server_manager.py: pre_call_tool_check now accepts raw_headers and
  extracts incoming_bearer_token so FR-5 verification has the raw token.
- proxy/utils.py: remove stray inline import inspect inside loop (pre-existing
  lint error, now cleaned up).

Tests: 43 passing (28 new tests covering all FR flags + P1 fixes).

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes (core)

Remaining files from the FR implementation:

mcp_jwt_signer.py — full rewrite with all new params:
  FR-5:  access_token_discovery_uri, token_introspection_endpoint,
         verify_issuer, verify_audience + _verify_incoming_jwt(),
         _introspect_opaque_token()
  FR-12: end_user_claim_sources ordered resolution chain
  FR-13: add_claims, set_claims, remove_claims
  FR-14: channel_token_audience, channel_token_ttl → x-mcp-channel-token
  FR-15: required_claims (raises 403), optional_claims (passthrough)
  FR-9:  debug_headers → x-litellm-mcp-debug
  FR-10: allowed_scopes; tool-call JWTs no longer over-grant tools/list

mcp_server_manager.py:
  - pre_call_tool_check gains raw_headers param to extract incoming_bearer_token
  - Silent Authorization override warning fixed: now fires when server has
    authentication_token AND hook injects Authorization

tests/test_mcp_jwt_signer.py:
  28 new tests covering all FR flags + P1 fixes (43 total, all passing)

* fix(mcp_jwt_signer): address pre-landing review issues

- Remove stale TODO comment on UserAPIKeyAuth.jwt_claims — the field is
  already populated and consumed by MCPJWTSigner in the same PR
- Fix _get_oidc_discovery to only cache the OIDC discovery doc when
  jwks_uri is present; a malformed/empty doc now retries on the next
  request instead of being permanently cached until proxy restart
- Add FR-5 test coverage for _fetch_jwks (cache hit/miss),
  _get_oidc_discovery (cache/no-cache on bad doc), _verify_incoming_jwt
  (valid token, expired token), _introspect_opaque_token (active,
  inactive, no endpoint), and the end-to-end 401 hook path — 53 tests
  total, all passing

* docs(mcp_zero_trust): rewrite as use-case guide covering all new JWT signer features

Add scenario-driven sections for each new config area:
- Verify+re-sign with Okta/Azure AD (access_token_discovery_uri,
  end_user_claim_sources, token_introspection_endpoint)
- Enforcing caller attributes with required_claims / optional_claims
- Adding metadata via add_claims / set_claims / remove_claims
- Two-token model for AWS Bedrock AgentCore Gateway
  (channel_token_audience / channel_token_ttl)
- Controlling scopes with allowed_scopes
- Debugging JWT rejections with debug_headers

Update JWT claims table to reflect configurable sub (end_user_claim_sources)

* fix(mcp_jwt_signer): wire all config.yaml params through initialize_guardrail

The factory was only passing issuer/audience/ttl_seconds to MCPJWTSigner.
All FR-5/9/10/12/13/14/15 params (access_token_discovery_uri,
end_user_claim_sources, add/set/remove_claims, channel_token_audience,
required/optional_claims, debug_headers, allowed_scopes, etc.) were
silently dropped, making every advertised advanced feature non-functional
when loaded from config.yaml.

Add regression test that asserts every param is wired through correctly.

* docs(mcp_zero_trust): add hero image

* docs(mcp_zero_trust): apply Linear-style edits

- Lead with the problem (unsigned direct calls bypass access controls)
- Shorter statement section headers instead of question-form headers
- Move diagram/OIDC discovery block after the reader is bought in
- Add 'read further only if you need to' callout after basic setup
- Two-token section now opens from the user problem not product jargon
- Add concrete 403 error response example in required_claims section
- Debug section opens from the symptom (MCP server returning 401)
- Lowercase claims reference header for consistency

* fix(mcp_jwt_signer): fix algorithm confusion attack + add OIDC discovery 24h TTL

- Remove alg from unverified JWT header; use signing_jwk.algorithm_name from JWKS key instead.
  Reading alg from attacker-controlled headers enables alg:none / HS256 confusion attacks.
- Add _oidc_discovery_fetched_at timestamp and _OIDC_DISCOVERY_TTL = 86400 (24h).
  Without a TTL the cached discovery doc never refreshes, so IdP key rotation is invisible.
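
A small sketch of a discovery cache with a TTL, combined with the earlier rule of caching only documents that contain jwks_uri. The class name `DiscoveryCache`, the injected `fetch` callable, and the injectable clock are assumptions made so the example is self-contained; the 86400-second value is the one named above.

```python
import time
from typing import Any, Callable, Dict, Optional

OIDC_DISCOVERY_TTL = 86400  # 24 hours, as stated in the commit message


class DiscoveryCache:
    """Cache an OIDC discovery document, refreshing it after `ttl` seconds."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, Any]],
        ttl: float = OIDC_DISCOVERY_TTL,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._doc: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, Any]:
        now = self._clock()
        if self._doc is not None and now - self._fetched_at < self._ttl:
            return self._doc
        doc = self._fetch()
        # Only cache usable documents; a malformed one is retried next call.
        if doc.get("jwks_uri"):
            self._doc = doc
            self._fetched_at = now
        return doc
```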

---------

Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>

* fix(ci): stabilize CI - formatting, type errors, test polling, security CVEs, router bug, batch resolution

Fix 1: Run Black formatter on 35 files
Fix 2: Fix MyPy type errors:
  - setup_wizard.py: add type annotation for 'selected' set variable
  - user_api_key_auth.py: remove redundant type annotation on jwt_claims reassignment
Fix 3: Fix spend accuracy test burst 2 polling to wait for expected total
  spend instead of just 'any increase' from burst 2
Fix 4: Bump Next.js 16.1.6 -> 16.1.7 to fix CVE-2026-27978, CVE-2026-27979,
  CVE-2026-27980, CVE-2026-29057
Fix 5: Fix router _pre_call_checks model variable being overwritten inside
  loop, causing wrong model lookups on subsequent deployments. Use local
  _deployment_model variable instead.
Fix 6: Add missing resolve_output_file_ids_to_unified call in batch retrieve
  non-terminal-to-terminal path (matching the terminal path behavior)
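
The loop-variable bug in Fix 5 follows a common pattern, sketched here with made-up data. The `base_model` field and both function names are hypothetical and do not reproduce the router's actual code; only the idea of a per-iteration local such as `_deployment_model` comes from the commit message.

```python
from typing import Dict, List


def lookup_keys_buggy(model: str, deployments: List[Dict[str, str]]) -> List[str]:
    """Reassigns `model` inside the loop, so later iterations inherit it."""
    keys = []
    for deployment in deployments:
        model = deployment.get("base_model") or model  # overwrites the input
        keys.append(model)
    return keys


def lookup_keys_fixed(model: str, deployments: List[Dict[str, str]]) -> List[str]:
    """Uses a per-iteration local, leaving the requested `model` untouched."""
    keys = []
    for deployment in deployments:
        _deployment_model = deployment.get("base_model") or model
        keys.append(_deployment_model)
    return keys
```

With deployments `[{"base_model": "azure/gpt-4o"}, {}]` and a requested model of `"gpt-4o"`, the buggy version looks up `"azure/gpt-4o"` for both, while the fixed version falls back to `"gpt-4o"` for the second.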

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* chore: regenerate poetry.lock to sync with pyproject.toml

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: format merged files from main and regenerate poetry.lock

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): annotate jwt_claims as Optional[dict] to fix type incompatibility

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): update router region test to use gpt-4.1-mini (fix flaky model lookup)

Replace deprecated gpt-3.5-turbo-1106 with gpt-4.1-mini + mock_response in
test_router_region_pre_call_check, following the same pattern used in commit
717d37cc5b for test_router_context_window_check_pre_call_check_out_group.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* ci: retry flaky logging_testing (async event loop race condition)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): aggregate all mock calls in langfuse e2e test to fix race condition

The _verify_langfuse_call helper only inspected the last mock call
(mock_post.call_args), but the Langfuse SDK may split trace-create and
generation-create events across separate HTTP flush cycles. This caused
an IndexError when the last call's batch contained only one event type.

Fix: iterate over mock_post.call_args_list to collect batch items from
ALL calls. Also add a safety assertion after filtering by trace_id and
mark all langfuse e2e tests with @pytest.mark.flaky(retries=3) as an
extra safety net for any residual timing issues.
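
The difference between inspecting the last call and aggregating all calls can be sketched with unittest.mock. The payload shape (a `json` keyword argument holding a `batch` list) and the helper name `collect_batch_items` are assumptions for illustration, not the test's actual structure.

```python
from typing import Any, Dict, List
from unittest.mock import MagicMock


def collect_batch_items(mock_post: MagicMock) -> List[Dict[str, Any]]:
    """Gather batch items from every recorded call, not just the last one."""
    items: List[Dict[str, Any]] = []
    for call in mock_post.call_args_list:
        payload = call.kwargs.get("json") or {}
        items.extend(payload.get("batch", []))
    return items
```

`mock_post.call_args` holds only the most recent call, so if the two event types arrive in separate flushes, indexing into that single batch can raise IndexError.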

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): black formatting + update OpenAPI compliance tests for spec changes

- Apply Black 26.x formatting to litellm_logging.py (parenthesized style)
- Update test_input_types_match_spec to follow $ref to InteractionsInput schema
  (Google updated their OpenAPI spec to use $ref instead of inline oneOf)
- Update test_content_schema_uses_discriminator to handle discriminator without
  explicit mapping (Google removed the mapping key from Content discriminator)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* revert: undo incorrect Black 26.x formatting on litellm_logging.py

The file was correctly formatted for Black 23.12.1 (the version pinned
in pyproject.toml). The previous commit applied Black 26.x formatting
which was incompatible with the CI's Black version.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): deduplicate and sort langfuse batch events after aggregation

The Langfuse SDK may send the same event (e.g., trace-create) in
multiple flush cycles, causing duplicates when we aggregate from all
mock calls. After filtering by trace_id, deduplicate by keeping only
the first event of each type, then sort to ensure trace-create is at
index 0 and generation-create at index 1.
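
A minimal sketch of the dedupe-then-sort step, assuming each event is a dict with a `type` key. The helper name `dedupe_and_sort` and the `EVENT_ORDER` table are hypothetical.

```python
from typing import Any, Dict, List

# Desired position of each event type in the final list.
EVENT_ORDER = {"trace-create": 0, "generation-create": 1}


def dedupe_and_sort(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first event of each type, then order by EVENT_ORDER."""
    first_by_type: Dict[str, Dict[str, Any]] = {}
    for event in events:
        first_by_type.setdefault(event["type"], event)
    return sorted(
        first_by_type.values(),
        key=lambda event: EVENT_ORDER.get(event["type"], len(EVENT_ORDER)),
    )
```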

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-03-18 15:09:01 -07:00
yuneng-jiang
278c9babc6
[Infra] Merging RC Branch with Main (#23786)
* fix(test): add missing mocks for test_streamable_http_mcp_handler_mock

The test was missing mocks for extract_mcp_auth_context and set_auth_context,
causing the handler to fail silently in the except block instead of reaching
session_manager.handle_request. This mirrors the fix already applied to the
sibling test_sse_mcp_handler_mock.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): route OpenAI models through chat completions in pass-through tests

The test_anthropic_messages_openai_model_streaming_cost_injection test fails
because the OpenAI Responses API returns 400 for requests routed through the
Anthropic Messages endpoint. Setting LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true
routes OpenAI models through the stable chat completions path instead.
Cost injection still works since it happens at the proxy level.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): fix assemblyai custom auth and router wildcard test flakiness

1. custom_auth_basic.py: Add user_role='proxy_admin' so the custom auth
   user can access management endpoints like /key/generate. The test
   test_assemblyai_transcribe_with_non_admin_key was hidden behind an
   earlier -x failure and was never reached before.

2. test_router_utils.py: Add flaky(retries=3) and increase sleep from 1s
   to 2s for test_router_get_model_group_usage_wildcard_routes. The async
   callback needs time to write usage to cache, and 1s is insufficient on
   slower CI hardware.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* ci: retrigger CI pipeline

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): use LitellmUserRoles enum instead of raw string in custom_auth_basic

Fixes mypy error: Argument 'user_role' has incompatible type 'str'; expected 'LitellmUserRoles | None'

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926)

* fix: don't close HTTP/SDK clients on LLMClientCache eviction

Remove the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:

  RuntimeError: Cannot send a request, as the client has been closed.

This is a regression from commit fb72979432. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().

Fixes production crashes after the 1-hour cache TTL expires.

* test: update LLMClientCache unit tests for no-close-on-eviction behavior

Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.

Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.

* test: add e2e tests for OpenAI SDK client surviving cache eviction

Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
  doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
  eviction doesn't close the client

Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.

Also expand the module docstring to explain why the sleep is required.

* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction

* docs(CLAUDE.md): add HTTP client cache safety guideline

* [Fix] Install bsdmainutils for column command in security scans

The security_scans.sh script uses `column` to format vulnerability
output, but the package wasn't installed in the CI environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle string callback values in prometheus multiproc setup

When callbacks are configured as a plain string (e.g., `callbacks: "my_callback"`)
instead of a list, the proxy crashes on startup with:
  TypeError: can only concatenate str (not "list") to str

Normalize each callback setting to a list before concatenating.
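
A small sketch of the normalization, assuming the settings arrive as a plain dictionary. The helper names and the set of keys that get combined are illustrative, not the proxy's actual code:

```python
def as_list(value):
    """Return a list for None, a single string, or any iterable of callbacks."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def combined_callbacks(settings):
    """Concatenate callback settings safely, whatever shape each one has."""
    combined = []
    # Key names are assumed for the sake of the example.
    for key in ("callbacks", "success_callback", "failure_callback"):
        combined.extend(as_list(settings.get(key)))
    return combined


settings = {"callbacks": "my_callback", "success_callback": ["prometheus"]}
result = combined_callbacks(settings)
```

Without the normalization, `settings["callbacks"] + settings["success_callback"]` raises the `TypeError` quoted above because a `str` cannot be concatenated with a `list`.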

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* bump: version 1.82.2 → 1.82.3

* fix(test): update test_startup_fails_when_db_setup_fails for opt-in enforcement

The --enforce_prisma_migration_check flag is now required to trigger
sys.exit(1) on DB migration failure, after #23675 flipped the default
behavior to warn-and-continue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(cost_calculator): use model name for per-request custom pricing when router_model_id has no pricing

When custom pricing is passed as per-request kwargs (input_cost_per_token/output_cost_per_token),
completion() registers pricing under the model name, but _select_model_name_for_cost_calc was
selecting the router deployment hash (which has no pricing data), causing response_cost to be 0.0.

Now checks whether the router_model_id entry actually has pricing before preferring it.
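
As a rough illustration of that check, assuming the cost map is a dictionary keyed by model name or deployment id (the function below is a simplified stand-in, not the real `_select_model_name_for_cost_calc`):

```python
def select_model_name_for_cost(model, router_model_id, cost_map):
    """Prefer router_model_id only when its cost-map entry carries pricing."""
    entry = cost_map.get(router_model_id) if router_model_id else None
    has_pricing = bool(entry) and (
        entry.get("input_cost_per_token") is not None
        or entry.get("output_cost_per_token") is not None
    )
    return router_model_id if has_pricing else model


# Per-request pricing registered under the model name; the deployment
# hash exists in the map but has no pricing data.
cost_map = {
    "my-model": {"input_cost_per_token": 1e-6, "output_cost_per_token": 2e-6},
    "deployment-hash": {},
}
chosen = select_model_name_for_cost("my-model", "deployment-hash", cost_map)
```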

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 15:32:20 -07:00
Ishaan Jaffer
4e42333925 fix claude.md 2026-03-13 08:32:07 -07:00
Ishaan Jaff
f5e5d17e4a
fix(mcp): fix OpenAPI OAuth flow — transport mapping, error messages, and discovery bypass (#23315)
* fix(mcp): fix OpenAPI OAuth flow — transport mapping, error messages, and discovery bypass

Three bugs fixed to make the end-to-end OAuth flow work for OpenAPI MCP servers:

1. **Transport mapping in getTemporaryPayload**: `TRANSPORT.OPENAPI` is a UI-only concept;
   the backend only accepts `"http"`, `"sse"`, or `"stdio"`. The pre-OAuth temp-session
   call was sending `transport: "openapi"` and getting a 422. Fixed by mapping to `"http"`.

2. **deriveErrorMessage handles FastAPI 422 arrays**: FastAPI validation errors return
   `detail` as an array of `{loc, msg, type}` objects. The shared error extractor was
   returning the array directly, causing `Error: [object Object]`. Fixed to map each
   item to its `.msg` field.

3. **Skip OAuth discovery when authorization_url already provided**: `build_mcp_server_from_table`
   was unconditionally calling `_descovery_metadata(server_url)` for OAuth servers. For
   OpenAPI servers the url is the spec JSON file, not the API base — this caused a timeout
   fetching e.g. the GitHub spec (2 MB). Fixed by skipping discovery when `authorization_url`
   is already set.
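
The second fix concerns UI code, but the shape handling is easy to show in Python. This is only a sketch of the idea under the assumption that `detail` is either a string or a list of `{loc, msg, type}` objects. It is not the dashboard's `deriveErrorMessage`:

```python
def derive_error_message(detail):
    """Turn a FastAPI-style `detail` value into a readable string."""
    if isinstance(detail, str):
        return detail
    if isinstance(detail, list):
        # Validation errors: use each item's "msg" rather than the raw object.
        parts = [
            str(item.get("msg", item)) if isinstance(item, dict) else str(item)
            for item in detail
        ]
        return "; ".join(parts)
    return str(detail)


detail = [
    {"loc": ["body", "transport"], "msg": "invalid transport", "type": "enum"},
    {"loc": ["body", "url"], "msg": "field required", "type": "missing"},
]
message = derive_error_message(detail)
```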

Also: collapsible auth section in MCP server form, "Create OAuth App →" link next to
Client ID when a docs URL is available (e.g. GitHub OAuth App creation page), and
`extractErrorMessage` helper in `useMcpOAuthFlow` for cleaner error display.

* refactor(mcp): extract needs_discovery flag and reduceStaticHeaders helper

* feat(mcp): user OAuth connect flow — OAuthConnectModal, MCPCredentialsTab, useUserMcpOAuthFlow

Adds the user-facing MCP OAuth2 PKCE connect flow:

- OAuthConnectModal: modal that launches the PKCE flow for a user to connect to an MCP server
- MCPCredentialsTab: credentials management tab in the MCP apps panel
- useUserMcpOAuthFlow: hook that handles the full PKCE auth code exchange for user-level connections
- MCPAppsPanel: wires up the new credentials tab and connect modal
- ChatPage: further cleanup after responses-API revert
- db.py / mcp_management_endpoints.py / _types.py: backend support for storing user MCP credentials

* fix(mcp): make client_id optional in /authorize — use server's stored client_id when not provided

* address greptile review feedback

* fix(mcp): narrow bare except to RecordNotFoundError in BYOK credential delete

* refactor(mcp): move inline imports to module level in db.py

* docs(claude): add MCP OAuth, transport mapping, and browser storage patterns

* fix(security): remove accessToken from sessionStorage in OAuth flow state

The LiteLLM API key was being serialised into sessionStorage as part of
StoredFlowState. After the OAuth redirect the component re-mounts with the
same accessToken prop, so it never needed to be stored. Read it from props
in resumeOAuthFlow instead.

* fix(ui): remove duplicate extractErrorMessage, sessionStorage-only in admin OAuth hook, call delete API on disconnect

* fix(ui): guard resumeOAuthFlow against wrong hook instance consuming OAuth result

* fix(ui): separate OAuth result keys per flow, sessionStorage-only, surface revoke errors

* fix(ui): remove dead OAuthConnectModal, revert tsconfig jsx mode to preserve

* fix(mcp): guard BYOK overwrite in oauth credential store, raise clear error when client_id absent

* fix: forward OAuth error params in callback, fix BYOK guard exception handling in db.py
2026-03-11 16:16:08 -07:00
Carlo Alberto Ferraris
9d2b0117d9
docs: add DB performance guidelines to CLAUDE.md
Extend the "Proxy database access" section with guidelines to prevent
common DB performance issues, tailored to actual Prisma usage patterns
in the litellm codebase: N+1 queries, client-side processing, batching
writes, bounding result sets, select on wide tables, index coverage,
and schema file sync.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 13:40:13 +09:00
Ishaan Jaff
503eb2fd4c
fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22925)
* fix: don't close HTTP/SDK clients on LLMClientCache eviction

Remove the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:

  RuntimeError: Cannot send a request, as the client has been closed.

This is a regression from commit fb72979432. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().

Fixes production crashes after the 1-hour cache TTL expires.

* test: update LLMClientCache unit tests for no-close-on-eviction behavior

Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.

Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.

* test: add e2e tests for OpenAI SDK client surviving cache eviction

Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
  doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
  eviction doesn't close the client

Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.

Also expand the module docstring to explain why the sleep is required.

* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction

* docs(CLAUDE.md): add HTTP client cache safety guideline
2026-03-05 12:00:38 -08:00
Ishaan Jaff
1bb713bc7b
feat(mcp): BYOK MCP servers with OAuth 2.1 PKCE authorization flow (#22850)
* feat(mcp): BYOK (Bring Your Own Key) for OpenAPI MCP servers with OAuth 2.1 flow

Adds per-user credential storage for BYOK MCP servers so external clients
can authenticate via standard OAuth 2.1 PKCE without needing a full identity
provider.

Backend:
- New DB table LiteLLM_MCPUserCredentials (user_id, server_id, credential_b64)
- is_byok, byok_description, byok_api_key_help_url fields on MCPServerTable
- OAuth 2.1 authorization server endpoints (/.well-known/oauth-authorization-server,
  /.well-known/oauth-protected-resource, /v1/mcp/oauth/authorize, /v1/mcp/oauth/token)
- 401 challenge with WWW-Authenticate header when BYOK server has no credential
- CRUD endpoints: POST/DELETE /v1/mcp/server/{id}/user-credential
- has_user_credential annotated on GET /v1/mcp/server response

UI:
- ByokCredentialModal: 2-step Connect flow (access description + API key entry)
- BYOK toggle + description fields on admin MCP server create form
- Connect/Connected state in MCP server table
- BYOK Demo page (/tools/byok-demo) showing full OAuth 2.1 PKCE flow

* feat(mcp/byok): redesign OAuth authorize page to match 2-step Connect mockup

- Step 1: L→S logos, requested access checklist, How it works box, Continue button
- Step 2: API key input, Save toggle, Duration pills (1h/24h/7d/30d/until_revoked), security note
- Matches screenshots: white modal on dark bg, progress dots, dark CTA buttons
- Authorize handler now fetches byok_description and byok_api_key_help_url from server registry
- CLAUDE.md: replace SQL snippet with proper DB migration troubleshooting guidance

* fix: address greptile review feedback (greploop iteration 1)

- XSS: escape all user-supplied values in _build_authorize_html() with html.escape()
- Open redirect: validate redirect_uri scheme and URL-encode code/state in redirect
- N+1 query: batch BYOK credential lookup into single find_many() call
- Critical path DB: add 60s TTL in-memory cache to _check_byok_credential()
- Encrypt BYOK credentials at rest using encrypt_value_helper/decrypt_value_helper
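
The first two items can be sketched with the standard library alone. The function names and the allowed-scheme list below are assumptions for illustration. The real `_build_authorize_html()` and redirect logic may differ:

```python
import html
from urllib.parse import urlencode, urlparse

ALLOWED_SCHEMES = ("http", "https")  # assumed policy for this sketch


def render_server_name(name):
    """Escape a user-supplied value before placing it in HTML."""
    return f"<h1>{html.escape(name)}</h1>"


def build_redirect(redirect_uri, code, state):
    """Validate the redirect scheme and URL-encode the code/state parameters."""
    parsed = urlparse(redirect_uri)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported redirect_uri scheme: {parsed.scheme!r}")
    separator = "&" if parsed.query else "?"
    return f"{redirect_uri}{separator}{urlencode({'code': code, 'state': state})}"


page = render_server_name("<script>alert(1)</script>")
target = build_redirect("https://client.example/cb", "a b", "x&y")
```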

* fix(byok): update OAuth popup with LiteLLM logo, MCP title suffix, remove emojis

* fix(byok-demo): fix token endpoint URL (/v1/mcp/oauth/token not /v1/mcp/token)

* feat(byok): inject stored BYOK credential as mcp_auth_header on tool execution

* feat(byok): use contextvars to inject per-user credential into OpenAPI tool closures; remove byok-demo from LiteLLM UI

OpenAPI tools have auth headers baked into their closures at registration time. BYOK servers have
no static auth token, so per-user credentials were never reaching the HTTP calls.

Fix: add _request_auth_header ContextVar in openapi_to_mcp_generator.py. create_tool_function now
reads this var at call time and overrides the Authorization header if set. execute_mcp_tool resolves
the MCP server and performs BYOK checks before the local-tool dispatch branch, then sets the
ContextVar around _handle_local_mcp_tool so the credential flows into the HTTP request.

Also remove the /tools/byok-demo page from the LiteLLM UI dashboard — the demo lives at
~/Downloads/litellm-byok-demo/index.html (served separately on port 8080).

* fix: address greptile review feedback (greploop iteration 2)

- Cache invalidation: add _invalidate_byok_cred_cache() and call it after
  store_user_credential() in both token endpoint and management endpoint
- Unbounded cache: add _BYOK_CRED_CACHE_MAX_SIZE=4096 with clear-on-overflow
- Unbounded auth codes: add _AUTH_CODES_MAX_SIZE=1000 with 503 on overflow
- Double DB query: merge _check_byok_credential + _get_byok_credential into
  single _get_byok_credential call; raise 401 inline if None returned
- Sidebar: remove byok-demo entry (page was deleted in prior commit)
- JWT comment: document why byok_session HS256 token can't be used as proxy auth
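
A bounded TTL cache with clear-on-overflow and explicit invalidation might look like the following. The class and its injectable clock are hypothetical. The commit only names `_invalidate_byok_cred_cache()`, the size constant, and the 60s TTL:

```python
import time


class BoundedTTLCache:
    """Tiny in-memory cache: entries expire after a TTL, and the whole
    cache is cleared when a new key would exceed max_size."""

    def __init__(self, max_size, ttl_seconds, clock=time.monotonic):
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def set(self, key, value):
        if key not in self._store and len(self._store) >= self._max_size:
            self._store.clear()  # clear-on-overflow keeps memory bounded
        self._store[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return value

    def invalidate(self, key):
        """Drop a key after a write so stale values are not served."""
        self._store.pop(key, None)


now = [0.0]  # fake clock so the example does not need to sleep
cache = BoundedTTLCache(max_size=2, ttl_seconds=60, clock=lambda: now[0])
cache.set(("user-1", "server-a"), "cred-1")
```

Clearing everything on overflow is cruder than LRU eviction but needs no extra bookkeeping. The cost is a burst of cache misses after each clear.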

* fix: address greptile review feedback (greploop iteration 3)

- auth_type: pre-format Authorization header (Bearer/ApiKey/Basic) in server.py
  before setting ContextVar so openapi_to_mcp_generator respects server auth_type
- cache invalidation on delete: call _invalidate_byok_cred_cache after
  delete_user_credential so stale True entries don't persist for 60s
- ContextVar guard: only set _request_auth_header when mcp_auth_header is set,
  avoiding unnecessary ContextVar overhead on non-BYOK tool calls

* fix: address greptile review feedback (greploop iteration 4)

- Unified credential cache: store actual credential value (Optional[str])
  instead of just bool so _get_byok_credential also benefits from caching —
  eliminates the DB hit on every BYOK tool call within the 60s TTL window
- Extracted _write_byok_cred_cache() helper for consistent cache writes
- Replaced has_user_credential with get_user_credential in _check_byok_credential
  so one DB call satisfies both existence check and value retrieval
- Remove false 'encrypted at rest' claim from OAuth HTML and ByokCredentialModal

* Update tests/test_litellm/proxy/_experimental/mcp_server/test_byok_oauth_endpoints.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tests/test_litellm/proxy/_experimental/mcp_server/test_byok_oauth_endpoints.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-03-04 21:19:25 -08:00
Ishaan Jaff
1f412bc6d8
[Feat] Add Tool Policies for AI Gateway (#22732)
* fix: fix ui render

* fix: fix minor bugs

* refactor: use prisma functions instead of raw sql (safer)

* fix(add-new-tiles-to-tool-policies): allow developer to see what's available

* feat: ensure tool allowlist runs correctly for tool names + mcp's

* refactor: more ui improvements

* feat: working key tool blocking

* feat(tools): show tool logs

* refactor: backend code improvements

* refactor: improve log viewer for tools

* fix: address PR review feedback for tool access control

- Add missing blocked_tools column to root schema.prisma (schema drift)
- Invalidate ToolPolicyRegistry after policy mutations so changes take effect immediately
- Remove dead code: unused get_effective_policies, get_tool_policies_cached, and helpers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: race condition in permission resolution and remove duplicate allowlist check

- Use atomic update_many with object_permission_id=None to prevent concurrent
  requests from creating orphaned permission rows and losing tool blocks
- Remove duplicate allowed_tools enforcement from guardrail (already enforced
  in auth layer via check_tools_allowlist)
- Move inline uuid import to module level

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* update to account for userAgent

* UI - Add ToolDetails

* input/output policy

* LiteLLM_PolicyAttachmentTable

* LiteLLM_PolicyAttachmentTable

* fix: add _enqueue_tool_registry_upsert

* fix: tool mgmt endpoints

* tool mgmt endpoints

* Update tests/test_litellm/proxy/db/test_tool_registry_writer.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tests/test_litellm/proxy/db/test_tool_registry_writer.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tests/test_litellm/proxy/db/test_tool_registry_writer.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: sync root schema.prisma and fix test_tool_registry_writer for input/output policy

- Migrate root schema.prisma LiteLLM_ToolTable from call_policy to
  input_policy/output_policy, add missing user_agent and last_used_at columns
  (now consistent with litellm/proxy/schema.prisma and litellm-proxy-extras)
- Fix SpendLogToolIndex comment across all three schema files
- Fix all call_policy references in test_tool_registry_writer.py:
  swapped update_tool_policy arguments, wrong get_tools_by_names return type
  assertions, _mock_tool_row setting call_policy instead of input_policy

Addresses Greptile review feedback on PR #22732.

Made-with: Cursor

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-03-03 20:22:20 -08:00
Krish Dholakia
5662228e20
feat(ui): add user filtering to usage page (#22059)
* feat(ui): add user filtering to usage page

Adds "User Usage" as a new view option in the usage page dropdown,
allowing admins to view and filter usage data by individual users
via the existing /user/daily/activity backend endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(ui/): working usage filtering

* fix(ui): use single-select for user filter and add tests

The user entity type's backend endpoint only accepts a single user_id,
so the filter now uses single-select mode instead of multi-select.
Added tests for the new user entity type in EntityUsage and
UsageViewSelect. Updated CLAUDE.md and AGENTS.md with guidance on
UI/backend contract consistency and test coverage for new entity types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* revert: remove unintended package-lock.json changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* revert: restore package-lock.json to merge base state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 10:37:45 -08:00
Ishaan Jaff
81146472cb
[Feat] MCP Gateway - Allow setting MCP Servers as Private/Public available on Internet (#20607)
* update MCPAuthenticatedUser

* add available_on_public_internet for MCPs

* update claude.md

* init IPAddressUtils

* init available_on_public_internet

* add on REST endpoints

* filter with IP

* TestIsInternalIp

* _extract_mcp_headers_from_request

* init get_mcp_client_ip

* _get_general_settings

* allowed_server_ids

* address PR comments

* get_mcp_server_by_name fix

* fix server

* fix review comments

* get_public_mcp_servers

* address _get_allowed_mcp_servers
2026-02-06 17:51:20 -08:00
Cesar Garcia
5b0729034c
docs: cleanup README and improve agent guides (#17003)
* docs: cleanup README and improve AI agent guides

- Remove obsolete version warnings (openai>=1.0.0, pydantic>=2.0.0)
- Add note about Responses API in README
- Add GitHub templates section to CLAUDE.md, GEMINI.md, and AGENTS.md
- Remove temporary test file test_pydantic_fields.py

* update files

* update Gemini file
2025-11-23 21:53:53 -08:00
Cesar Garcia
0908208076
chore: cleanup repo and improve AI docs (#16775)
* chore: remove development setup files from repository

Removes VERTEX_ENV_SETUP.md and setup_vertex_env.sh from the
repository as they are not referenced in documentation or tests.

These files were added in PR #15824 alongside the VertexAI Search
feature. The setup information is already well-documented in the
official docs at:
docs/my-website/docs/pass_through/vertex_ai_search_datastores.md

* docs: add poetry run python usage to CLAUDE.md and AGENTS.md
2025-11-18 11:36:27 -08:00
Cole McIntosh
15dabf6573
docs(CLAUDE.md): add development guidance and architecture overview for Claude Code (#12011) 2025-06-24 20:48:08 -07:00