Available on all Portkey plans.
The `/v1/messages` endpoint accepts the Anthropic Messages API format and routes to any of 3000+ models across all major providers. Tools built natively on the Messages format — like Claude Code and the Claude Agent SDK — work with any backend model through Portkey without modification.
Why Messages API
- Write once, run anywhere — Any SDK or tool built on the Anthropic Messages format works instantly. No rewrites.
- Switch providers with one string — Change the `model` parameter to route to a different provider. Request format and response shape stay identical.
- Full gateway features — Fallbacks, load balancing, caching, and observability work transparently across all providers.
Quick Start
Use the Anthropic SDK with Portkey's base URL. The `@provider/model` format routes requests to the correct provider.
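As a sketch, here is a minimal Messages-format request body; the model slug and key are placeholders, and the Portkey base URL is an assumption taken from Portkey's docs:

```python
# Minimal Messages-format request body. Substitute a real slug from the
# Model Catalog and your own Portkey API key.
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical @provider/model slug
    "max_tokens": 1024,                     # required by the Messages format
    "messages": [{"role": "user", "content": "Hello, world"}],
}

# With the stock Anthropic SDK, only the base URL and key change (not run here):
#   client = anthropic.Anthropic(api_key="<PORTKEY_API_KEY>",
#                                base_url="https://api.portkey.ai")
#   message = client.messages.create(**payload)
```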
`max_tokens` is required. See the Model Catalog for all supported provider and model strings.

Switching Providers
Change the `model` string to route to any provider. Everything else stays the same.
The SDK code, request format, and response shape are identical across all providers. Portkey translates the Messages format to each provider’s native API. See Provider Support for how this works.
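For instance, pointing the same request body at a different provider is a one-string edit (both slugs below are hypothetical examples):

```python
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Summarize this in one line."}],
}

# Route the identical request to another provider by swapping one string:
payload["model"] = "@openai/gpt-4o"  # hypothetical slug for an adapter provider
```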
Migrate in 2 Lines
Already using the Anthropic SDK? Point it at Portkey: keep the `@anthropic-provider/` prefix to keep routing to Anthropic, or switch the model string to any other provider.
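A sketch of the two changed lines, expressed as constructor arguments to `anthropic.Anthropic` (the key is a placeholder; the base URL is an assumption from Portkey's docs):

```python
# Before: anthropic.Anthropic(api_key="<ANTHROPIC_API_KEY>")
# After -- the only two lines that change:
portkey_client_kwargs = {
    "api_key": "<PORTKEY_API_KEY>",        # 1. your Portkey API key
    "base_url": "https://api.portkey.ai",  # 2. Portkey's gateway base URL
}
# client = anthropic.Anthropic(**portkey_client_kwargs)  # not run here
```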
Text Generation
System Prompt
Set a system prompt with the top-level `system` parameter:
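For example (request body sketch; the model slug is a placeholder):

```python
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 1024,
    # Top-level system parameter -- not a message with role "system":
    "system": "You are a terse assistant. Answer in one sentence.",
    "messages": [{"role": "user", "content": "What is a gateway?"}],
}
```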
The `system` parameter also accepts an array of content blocks for prompt caching:
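A sketch of the array form, marking a large block for caching with `cache_control` (the documentation text is a placeholder):

```python
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You are a support agent."},
        {
            "type": "text",
            "text": "<many pages of product documentation>",  # placeholder
            "cache_control": {"type": "ephemeral"},  # cache this block
        },
    ],
    "messages": [{"role": "user", "content": "How do I reset my password?"}],
}
```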
Streaming
Stream responses with `stream=True` in the SDK or `"stream": true` in cURL.
SSE event reference
Events are emitted in this sequence for every streaming response:
| Event | Description |
|---|---|
| `message_start` | Opens the message with metadata (id, model, initial usage) |
| `content_block_start` | Opens a content block — `type: "text"` for text, `type: "tool_use"` for tool calls |
| `content_block_delta` | Incremental content — `text_delta` for text, `input_json_delta` for tool input |
| `content_block_stop` | Closes a content block |
| `message_delta` | Closes the message with `stop_reason` (`end_turn`, `max_tokens`, `tool_use`) and final usage |
| `message_stop` | Final event signaling stream completion |
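As a sketch of consuming that sequence, the loop below folds `text_delta` chunks from simulated events into the final text (the event dicts mimic the SSE payload shapes, abbreviated):

```python
# Simulated events in the order the table describes (fields abbreviated).
events = [
    {"type": "message_start", "message": {"id": "msg_01", "usage": {"input_tokens": 10}}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 4}},
    {"type": "message_stop"},
]

text, stop_reason = [], None
for event in events:
    if event["type"] == "content_block_delta" and event["delta"]["type"] == "text_delta":
        text.append(event["delta"]["text"])      # accumulate streamed text
    elif event["type"] == "message_delta":
        stop_reason = event["delta"]["stop_reason"]  # why the stream ended

full_text = "".join(text)
```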
Multi-turn Conversations
Build conversations by passing the full message history. Messages must alternate between `user` and `assistant` roles.
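A sketch of an alternating history (request body; the slug is a placeholder):

```python
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "My name is Ada."},
        {"role": "assistant", "content": "Nice to meet you, Ada!"},
        {"role": "user", "content": "What is my name?"},  # history supplies context
    ],
}

# Roles must strictly alternate user / assistant:
roles = [m["role"] for m in payload["messages"]]
assert roles == ["user", "assistant", "user"]
```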
Generation Parameters
| Parameter | Type | Description |
|---|---|---|
| `max_tokens` | integer | Required. Maximum tokens in the response |
| `temperature` | float | Sampling temperature (0–1). Higher = more creative |
| `top_p` | float | Nucleus sampling threshold (0–1) |
| `top_k` | integer | Top-K sampling. Anthropic native only — silently dropped on adapter providers |
| `stop_sequences` | array | Stop strings. Translated to `stop` for adapter providers |
| `stream` | boolean | Enable streaming responses |
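Combined in one request body (sketch; the slug is a placeholder):

```python
payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 256,           # required
    "temperature": 0.7,          # 0-1; higher = more varied output
    "top_p": 0.9,                # nucleus sampling threshold
    "stop_sequences": ["\n\n"],  # stop generating at a blank line
    "stream": False,
    "messages": [{"role": "user", "content": "Write a haiku about gateways."}],
}
```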
Tool Use
Define tools with `name`, `description`, and `input_schema`:
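For example, a hypothetical weather tool (request body sketch; slug and tool are placeholders):

```python
weather_tool = {
    "name": "get_weather",  # hypothetical tool
    "description": "Get the current weather for a city.",
    "input_schema": {  # JSON Schema describing the tool's arguments
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 1024,
    "tools": [weather_tool],
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
}
```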
Tool Results
Pass tool results back in a `user` message with `tool_result` content blocks to continue the conversation:
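A sketch of the continued exchange: the assistant turn echoes the model's `tool_use` block, and the result goes back as a `tool_result` (IDs and values are illustrative):

```python
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    # The assistant turn echoes the tool call the model emitted:
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Paris"}},
    ]},
    # The tool result goes back in a *user* message:
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "18°C, partly cloudy"},
    ]},
]
```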
Vision
Send images using content blocks. Supports both URLs and base64-encoded data.

Structured Output
Use `output_config` to constrain responses to a JSON schema. Portkey maps this to `response_format` for adapter providers.
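A sketch of a constrained request, assuming `output_config.format` carries a `json_schema` block as the Parameter Compatibility table suggests (the exact nesting is an assumption; verify against Portkey's API reference):

```python
payload = {
    "model": "@openai/gpt-4o",  # hypothetical adapter-provider slug
    "max_tokens": 512,
    "messages": [{"role": "user",
                  "content": "Extract the city and country from: 'I live in Lyon, France.'"}],
    # Assumed shape of the Portkey extension; maps to response_format on adapters:
    "output_config": {
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"},
                               "country": {"type": "string"}},
                "required": ["city", "country"],
            },
        },
    },
}
```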
`output_config` is a Portkey extension to the Messages API format. Only `json_schema` is supported — `json_object` is not available via the adapter.

Extended Thinking
Two mechanisms for controlling model reasoning:
- `thinking` — Anthropic native. Pass directly to Anthropic Claude models. Silently dropped on adapter providers.
- `output_config.effort` — Cross-provider. Works across Anthropic, OpenAI o-series, and Gemini 2.5. Portkey maps it to each provider's native reasoning format.
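A sketch of the cross-provider control, assuming effort takes the usual low/medium/high values (check the Thinking Mode docs for the accepted values per provider):

```python
payload = {
    "model": "@openai/o3-mini",  # hypothetical reasoning-model slug
    "max_tokens": 2048,
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    # Assumed value; Portkey maps this to reasoning_effort on adapters:
    "output_config": {"effort": "high"},
}
```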
When using Anthropic’s `thinking` parameter, `max_tokens` must exceed `budget_tokens`. See Thinking Mode for provider-specific effort mappings.

Prompt Caching
Use `cache_control` on system prompts, messages, and tool definitions to cache frequently-used content.
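For instance, a sketch of caching tool definitions by marking the last tool with `cache_control` (per Anthropic's convention, the marker caches the prefix up to that block; the tool itself is hypothetical):

```python
tools = [
    {"name": "search_docs",  # hypothetical tool
     "description": "Search the product documentation.",
     "input_schema": {"type": "object",
                      "properties": {"query": {"type": "string"}},
                      "required": ["query"]},
     "cache_control": {"type": "ephemeral"}},  # cache up to and including this block
]

payload = {
    "model": "@anthropic/claude-sonnet-4",  # hypothetical slug
    "max_tokens": 1024,
    "tools": tools,
    "messages": [{"role": "user", "content": "How do I rotate my API key?"}],
}
```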
Cache activity is reported in the `usage` object of the response.
`cache_control` is Anthropic-native and is stripped when routing to adapter providers. See Anthropic prompt caching for details.

Provider Support
Portkey handles the Messages API in two ways depending on the provider:
- Native providers — Requests pass through directly. All Anthropic-specific features work (`thinking`, `cache_control`, `top_k`, etc.).
- Adapter providers — Portkey translates between Messages format and the provider’s native Chat Completions format. See Parameter Compatibility for what is and isn’t supported.
Parameter Compatibility
Portkey’s Messages adapter translates requests to each provider’s Chat Completions format for non-native providers. Unsupported parameters are silently dropped — no error is returned.

Parameters translated for adapter providers:

| Messages API param | Adapter equivalent | Notes |
|---|---|---|
| `max_tokens` | `max_completion_tokens` | |
| `stop_sequences` | `stop` | |
| `system` | First message with `role: "system"` | String or array both handled |
| `tools[].input_schema` | `tools[].function.parameters` | Format converted |
| `tool_choice: {type: "auto"}` | `"auto"` | |
| `tool_choice: {type: "any"}` | `"required"` | |
| `tool_choice: {type: "tool", name: X}` | `{type: "function", function: {name: X}}` | |
| `metadata.user_id` | `user` | |
| `temperature`, `top_p`, `stream` | Direct pass-through | |
| `output_config.format` (`json_schema`) | `response_format` | Portkey extension; `json_object` not supported |
| `output_config.effort` | `reasoning_effort` | Portkey extension for cross-provider reasoning |

Parameters dropped for adapter providers:
- `thinking` — Anthropic-native; use `output_config.effort` for cross-provider reasoning control
- `top_k` — no Chat Completions equivalent
- `cache_control` — stripped during message transformation
- `container`, `mcp_servers`, `service_tier`, `anthropic_beta`
Using with Portkey Features
The Messages API works with all Portkey gateway features. Pass a config via header alongside any Anthropic SDK call:

Configs
Route, load balance, and set fallbacks
Caching
Cache responses for faster, cheaper calls
Fallbacks
Automatic failover across providers
Load Balancing
Distribute traffic across models
Guardrails
Input/output guardrails
Observability
Full logging and tracing
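The features above can be attached via a config header. A sketch, assuming the `x-portkey-config` header name and a saved config slug (the slug is a placeholder):

```python
# Extra constructor kwargs for anthropic.Anthropic; the header carries the config:
portkey_client_kwargs = {
    "api_key": "<PORTKEY_API_KEY>",
    "base_url": "https://api.portkey.ai",
    "default_headers": {"x-portkey-config": "pc-my-fallback-config"},  # placeholder slug
}
# client = anthropic.Anthropic(**portkey_client_kwargs)  # not run here
```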
API Reference
- Messages — `POST /v1/messages`

