POST /v1/chat/completions

> Minimum-discount routing: Prefix the path with a min{N} segment (e.g. /min30/v1/chat/completions) to require marketplace seller offers to meet a minimum estimated buyer discount before routing. Buyer-owned providers are not covered. See Minimum-Discount Routing.

The primary inference endpoint. OpenAI-compatible — works with any client that speaks the OpenAI format.

Request

curl https://api.surplusintelligence.ai/v1/chat/completions \

-H "Authorization: Bearer inf_xxx" \

-H "Content-Type: application/json" \

-d '{

"model": "claude-opus-4.6",

"messages": [{"role": "user", "content": "Hello!"}],

"stream": true,

"max_tokens": 1000

}'

Parameters

ParameterRequiredDescription
modelYesModel name (canonical, OpenRouter, Venice, or alias format)
messagesYesArray of message objects
streamNotrue for SSE streaming (default: false)
max_tokensNoMaximum output tokens
max_completion_tokensNoAlias for max_tokens
temperatureNoSampling temperature
top_pNoNucleus sampling
stopNoStop sequences
toolsNoFunction calling tools (OpenAI format — auto-translated for Anthropic sellers)
tool_choiceNoTool selection strategy (auto, required, or specific function)
response_formatNoStructured output format
providerNoOptional provider hint. Narrows routing to offers from a matching provider before cheapest-offer selection.
provider_urlNoOptional provider URL hint. Equivalent to provider, but matched by provider host.
provider_base_urlNoOptional provider base URL hint. Equivalent to provider_url.
stream_optionsNo{"include_usage": true} (auto-injected)

Price Threshold

-H "X-Max-Price-Per-1M: 8.0"

Or in body: "max_price_per_1m": 8.0 — skips sellers above this price.

See Parameter Compatibility for full model-specific support.

Provider Pinning

By default, Surplus routes each request to the cheapest healthy seller for the requested model. If you need to use a specific provider, pass a provider hint in the request body:

{

"model": "glm-4.7",

"messages": [{"role": "user", "content": "Hello!"}],

"provider": "zai"

}

Provider pinning narrows the candidate offers to matching provider hosts, then still applies normal routing inside that provider: health checks, spend caps, price thresholds, estimated-cost sorting, and failover among matching offers.

Accepted forms:

  • "provider": "zai" or "provider": "Z.ai"
  • "provider": "uncensored" or "provider": "Uncensored AI"
  • "provider_url": "https://api.z.ai/api/paas/v4"
  • "provider_base_url": "https://api.uncensored.com/api/v1"

Provider hints are optional and advanced. If no active healthy offer matches the hint, the request can fail even when another provider has available liquidity for the same model.

Tool Calling

Current behavior:

  • Send tools in OpenAI tools / tool_choice format.
  • OpenAI-compatible sellers receive OpenAI-format tool calls after normalization. This is the path used by the active production Claude sellers today, mostly Venice and Bankr LLM Gateway.
  • Native Anthropic sellers (api.anthropic.com) are supported through an OpenAI ⇄ Anthropic translation layer: tools become Anthropic tools, assistant tool_calls become tool_use blocks, role: "tool" messages become tool_result blocks, and streaming tool_use deltas are converted back to OpenAI tool_calls deltas.
  • Tool support is still model-dependent. Check GET /v1/models: if supported_parameters does not include tools, the marketplace strips tools and tool_choice before forwarding because the upstream model rejects them.
  • Non-function tools such as computer-use or text-editor tool types are not executed by the marketplace; unsupported tool definitions are stripped or normalized to function tools when possible.

Example multi-turn function-tool transcript:

{

"model": "claude-opus-4.6",

"messages": [

{"role": "user", "content": "What's the weather in Tokyo?"},

{"role": "assistant", "content": null, "tool_calls": [

{"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\":\"Tokyo\"}"}}

]},

{"role": "tool", "tool_call_id": "call_1", "content": "{\"temp\": 22, \"condition\": \"sunny\"}"}

],

"tools": [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}]

}

Streaming function-tool calling is supported for providers/models that stream tool-call deltas. Anthropic SSE tool_use events are translated to OpenAI tool_calls deltas; OpenAI-compatible providers are passed through after normalization.

Message Roles

RoleDescription
systemSystem prompt (mapped to Anthropic system parameter when needed)
developerAlias for system (OpenAI convention, auto-mapped)
userUser message (text, images, or mixed content blocks)
assistantModel response (may include tool_calls)
toolTool result (requires tool_call_id matching a previous tool_calls entry)

Response

Standard OpenAI format. Streaming returns SSE chunks with data: {...} lines.

Error Codes

StatusMeaning
400Invalid request (bad model name, missing messages)
401Invalid API key
402Insufficient balance/allowance, or payment required (x402/MPP)
404Model has no sellers
503All sellers unhealthy