Available on all Portkey plans.
The `/v1/messages` endpoint accepts the Anthropic Messages API format and routes to any of 3000+ models across all major providers. Tools built natively on the Messages format — like Claude Code and the Claude Agent SDK — work with any backend model through Portkey without modification.
Why Messages API
- Write once, run anywhere — Any SDK or tool built on the Anthropic Messages format works instantly. No rewrites.
- Switch providers with one string — Change the `model` parameter to route to a different provider. Request format and response shape stay identical.
- Full gateway features — Fallbacks, load balancing, caching, and observability work transparently across all providers.
Quick Start
Use the Anthropic SDK with Portkey's base URL. The `@provider/model` format routes requests to the correct provider.
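As a sketch, here is a minimal Messages-format request body; the model slug and key are placeholders, and the Portkey base URL is an assumption taken from Portkey's docs:

```python
# Minimal Messages-format request body. Substitute a real slug from the
# Model Catalog and your own Portkey API key.
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical @provider/model slug
    "max_tokens": 1024,                     # required by the Messages format
    "messages": [{"role": "user", "content": "Hello, world"}],
}

# With the stock Anthropic SDK, only the base URL and key change (not run here):
#   client = anthropic.Anthropic(api_key="<PORTKEY_API_KEY>",
#                                base_url="https://api.portkey.ai")
#   message = client.messages.create(**payload)
```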
`max_tokens` is required. See the Model Catalog for all supported provider and model strings.

Switching Providers
Change the `model` string to route to any provider. Everything else stays the same.
The SDK code, request format, and response shape are identical across all providers. Portkey translates the Messages format to each provider’s native API. See Provider Support for how this works.
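For instance, pointing the same request body at a different provider is a one-string edit (both slugs below are hypothetical examples):

```python
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Summarize this in one line."}],
}

# Route the identical request to another provider by swapping one string:
payload["model"] = "@openai/gpt-4o"  # hypothetical slug for an adapter provider
```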
Migrate in 2 Lines
Already using the Anthropic SDK? Point it at Portkey: keep the `@anthropic-provider/` prefix to keep routing to Anthropic, or switch the model string to any other provider.
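A sketch of the two changed lines, expressed as constructor arguments to `anthropic.Anthropic` (the key is a placeholder; the base URL is an assumption from Portkey's docs):

```python
# Before: anthropic.Anthropic(api_key="<ANTHROPIC_API_KEY>")
# After -- the only two lines that change:
portkey_client_kwargs = {
    "api_key": "<PORTKEY_API_KEY>",        # 1. your Portkey API key
    "base_url": "https://api.portkey.ai",  # 2. Portkey's gateway base URL
}
# client = anthropic.Anthropic(**portkey_client_kwargs)  # not run here
```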
Text Generation
System Prompt
Set a system prompt with the top-level `system` parameter:
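For example (request body sketch; the model slug is a placeholder):

```python
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 1024,
    # Top-level system parameter -- not a message with role "system":
    "system": "You are a terse assistant. Answer in one sentence.",
    "messages": [{"role": "user", "content": "What is a gateway?"}],
}
```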
The `system` parameter also accepts an array of content blocks for prompt caching:
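A sketch of the array form, marking a large block for caching with `cache_control` (the documentation text is a placeholder):

```python
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You are a support agent."},
        {
            "type": "text",
            "text": "<many pages of product documentation>",  # placeholder
            "cache_control": {"type": "ephemeral"},  # cache this block
        },
    ],
    "messages": [{"role": "user", "content": "How do I reset my password?"}],
}
```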
Streaming
Stream responses with `stream=True` in the SDK or `"stream": true` in cURL.
SSE event reference
Events are emitted in this sequence for every streaming response:
| Event | Description |
|---|---|
| `message_start` | Opens the message with metadata (id, model, initial usage) |
| `content_block_start` | Opens a content block — `type: "text"` for text, `type: "tool_use"` for tool calls |
| `content_block_delta` | Incremental content — `text_delta` for text, `input_json_delta` for tool input |
| `content_block_stop` | Closes a content block |
| `message_delta` | Closes the message with `stop_reason` (`end_turn`, `max_tokens`, `tool_use`) and final usage |
| `message_stop` | Final event signaling stream completion |
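As a sketch of consuming that sequence, the loop below folds `text_delta` chunks from simulated events into the final text (the event dicts mimic the SSE payload shapes, abbreviated):

```python
# Simulated events in the order the table describes (fields abbreviated).
events = [
    {"type": "message_start", "message": {"id": "msg_01", "usage": {"input_tokens": 10}}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 4}},
    {"type": "message_stop"},
]

text, stop_reason = [], None
for event in events:
    if event["type"] == "content_block_delta" and event["delta"]["type"] == "text_delta":
        text.append(event["delta"]["text"])      # accumulate streamed text
    elif event["type"] == "message_delta":
        stop_reason = event["delta"]["stop_reason"]  # why the stream ended

full_text = "".join(text)
```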
Multi-turn Conversations
Build conversations by passing the full message history. Messages must alternate between `user` and `assistant` roles.
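A sketch of an alternating history (request body; the slug is a placeholder):

```python
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "My name is Ada."},
        {"role": "assistant", "content": "Nice to meet you, Ada!"},
        {"role": "user", "content": "What is my name?"},  # history supplies context
    ],
}

# Roles must strictly alternate user / assistant:
roles = [m["role"] for m in payload["messages"]]
assert roles == ["user", "assistant", "user"]
```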
Generation Parameters
| Parameter | Type | Description |
|---|---|---|
| `max_tokens` | integer | Required. Maximum tokens in the response |
| `temperature` | float | Sampling temperature (0–1). Higher = more creative |
| `top_p` | float | Nucleus sampling threshold (0–1) |
| `top_k` | integer | Top-K sampling. Anthropic native only — silently dropped on adapter providers |
| `stop_sequences` | array | Stop strings. Translated to `stop` for adapter providers |
| `stream` | boolean | Enable streaming responses |
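Combined in one request body (sketch; the slug is a placeholder):

```python
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 256,           # required
    "temperature": 0.7,          # 0-1; higher = more varied output
    "top_p": 0.9,                # nucleus sampling threshold
    "stop_sequences": ["\n\n"],  # stop generating at a blank line
    "stream": False,
    "messages": [{"role": "user", "content": "Write a haiku about gateways."}],
}
```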
Tool Use
Define tools with `name`, `description`, and `input_schema`:
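For example, a hypothetical weather tool (request body sketch; slug and tool are placeholders):

```python
weather_tool = {
    "name": "get_weather",  # hypothetical tool
    "description": "Get the current weather for a city.",
    "input_schema": {  # JSON Schema describing the tool's arguments
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 1024,
    "tools": [weather_tool],
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
}
```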
Tool Results
Pass tool results back in a `user` message with `tool_result` content blocks to continue the conversation:
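A sketch of the continued exchange: the assistant turn echoes the model's `tool_use` block, and the result goes back as a `tool_result` (IDs and values are illustrative):

```python
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    # The assistant turn echoes the tool call the model emitted:
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Paris"}},
    ]},
    # The tool result goes back in a *user* message:
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "18°C, partly cloudy"},
    ]},
]
```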
Vision
Send images using content blocks. Supports both URLs and base64-encoded data.

Structured Output
Use `output_config` to constrain responses to a JSON schema. Portkey maps this to `response_format` for adapter providers.
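A sketch of a constrained request, assuming `output_config.format` carries a `json_schema` block as the Parameter Compatibility table suggests (the exact nesting is an assumption; verify against Portkey's API reference):

```python
payload = {
    "model": "@openai/gpt-4o",  # hypothetical adapter-provider slug
    "max_tokens": 512,
    "messages": [{"role": "user",
                  "content": "Extract the city and country from: 'I live in Lyon, France.'"}],
    # Assumed shape of the Portkey extension; maps to response_format on adapters:
    "output_config": {
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"},
                               "country": {"type": "string"}},
                "required": ["city", "country"],
            },
        },
    },
}
```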
`output_config` is a Portkey extension to the Messages API format. Only `json_schema` is supported — `json_object` is not available via the adapter.

Extended Thinking
Two mechanisms for controlling model reasoning:
- `thinking` — Anthropic native. Pass directly to Anthropic Claude models. Silently dropped on adapter providers.
- `output_config.effort` — Cross-provider. Works across Anthropic, OpenAI o-series, and Gemini 2.5. Portkey maps it to each provider's native reasoning format.
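A sketch of the cross-provider control, assuming effort takes the usual low/medium/high values (check the Thinking Mode docs for the accepted values per provider):

```python
payload = {
    "model": "@openai/o3-mini",  # hypothetical reasoning-model slug
    "max_tokens": 2048,
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    # Assumed value; Portkey maps this to reasoning_effort on adapters:
    "output_config": {"effort": "high"},
}
```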
When using Anthropic’s `thinking` parameter, `max_tokens` must exceed `budget_tokens`. See Thinking Mode for provider-specific effort mappings.

Prompt Caching
Use `cache_control` on system prompts, messages, and tool definitions to cache frequently-used content.
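For instance, a sketch of caching tool definitions by marking the last tool with `cache_control` (per Anthropic's convention, the marker caches the prefix up to that block; the tool itself is hypothetical):

```python
tools = [
    {"name": "search_docs",  # hypothetical tool
     "description": "Search the product documentation.",
     "input_schema": {"type": "object",
                      "properties": {"query": {"type": "string"}},
                      "required": ["query"]},
     "cache_control": {"type": "ephemeral"}},  # cache up to and including this block
]

payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 1024,
    "tools": tools,
    "messages": [{"role": "user", "content": "How do I rotate my API key?"}],
}
```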
Cache activity is reported in the `usage` object of the response.
`cache_control` is Anthropic-native and is stripped when routing to adapter providers. See Anthropic prompt caching for details.

Provider Support
Portkey handles the Messages API in two ways depending on the provider:
- Native providers — Requests pass through directly. All Anthropic-specific features work (`thinking`, `cache_control`, `top_k`, etc.).
- Adapter providers — Portkey translates between Messages format and the provider’s native Chat Completions format. See Parameter Compatibility for what is and isn’t supported.
Parameter Compatibility
Portkey’s Messages adapter translates requests to each provider’s Chat Completions format for non-native providers. Unsupported parameters are silently dropped — no error is returned.

Parameters translated for adapter providers:

| Messages API param | Adapter equivalent | Notes |
|---|---|---|
| `max_tokens` | `max_completion_tokens` | |
| `stop_sequences` | `stop` | |
| `system` | First message with `role: "system"` | String or array both handled |
| `tools[].input_schema` | `tools[].function.parameters` | Format converted |
| `tool_choice: {type: "auto"}` | `"auto"` | |
| `tool_choice: {type: "any"}` | `"required"` | |
| `tool_choice: {type: "tool", name: X}` | `{type: "function", function: {name: X}}` | |
| `metadata.user_id` | `user` | |
| `temperature`, `top_p`, `stream` | Direct pass-through | |
| `output_config.format` (`json_schema`) | `response_format` | Portkey extension; `json_object` not supported |
| `output_config.effort` | `reasoning_effort` | Portkey extension for cross-provider reasoning |

Parameters dropped for adapter providers:
- `thinking` — Anthropic-native; use `output_config.effort` for cross-provider reasoning control
- `top_k` — no Chat Completions equivalent
- `cache_control` — stripped during message transformation
- `container`, `mcp_servers`, `service_tier`, `anthropic_beta`
Using with Portkey Features
The Messages API works with all Portkey gateway features. Pass a config via header alongside any Anthropic SDK call:

Configs
Route, load balance, and set fallbacks
Caching
Cache responses for faster, cheaper calls
Fallbacks
Automatic failover across providers
Load Balancing
Distribute traffic across models
Guardrails
Input/output guardrails
Observability
Full logging and tracing
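The features above can be attached via a config header. A sketch, assuming the `x-portkey-config` header name and a saved config slug (the slug is a placeholder):

```python
# Extra constructor kwargs for anthropic.Anthropic; the header carries the config:
portkey_client_kwargs = {
    "api_key": "<PORTKEY_API_KEY>",
    "base_url": "https://api.portkey.ai",
    "default_headers": {"x-portkey-config": "pc-my-fallback-config"},  # placeholder slug
}
# client = anthropic.Anthropic(**portkey_client_kwargs)  # not run here
```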
API Reference
- Messages — `POST /v1/messages`

