Chat Completions

POST /v1/chat/completions

Minimum-discount routing: Prefix the path with a min{N} segment (e.g. /min30/v1/chat/completions) to require marketplace seller offers to meet a minimum estimated buyer discount before routing. The minimum does not apply to buyer-owned providers. See Minimum-Discount Routing.
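A client that switches between plain and minimum-discount routing can build the URL with a small helper. This is a sketch: chat_completions_url is a hypothetical name, the base URL is taken from the curl example on this page, and treating N as a non-negative integer is an assumption drawn from the /min30/ example.

```python
BASE_URL = "https://api.surplusintelligence.ai"  # from the curl example
CHAT_PATH = "/v1/chat/completions"

def chat_completions_url(min_discount=None):
    """Return the Chat Completions URL, optionally with a min{N} prefix.

    Treating N as a non-negative integer is an assumption based on /min30/.
    """
    if min_discount is None:
        return BASE_URL + CHAT_PATH
    if isinstance(min_discount, bool) or not isinstance(min_discount, int) or min_discount < 0:
        raise ValueError("min_discount must be a non-negative integer")
    return f"{BASE_URL}/min{min_discount}{CHAT_PATH}"

print(chat_completions_url())    # https://api.surplusintelligence.ai/v1/chat/completions
print(chat_completions_url(30))  # https://api.surplusintelligence.ai/min30/v1/chat/completions
```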

The primary inference endpoint. It is OpenAI-compatible, so it works with any client that speaks the OpenAI format.

Request

Parameters

Name	Required	Description
`model`	Yes	Model name (canonical, OpenRouter, Venice, or alias format)
`messages`	Yes	Array of message objects
`stream`	No	`true` for SSE streaming (default: `false`)
`max_tokens`	No	Maximum output tokens
`max_completion_tokens`	No	Alias for `max_tokens`
`temperature`	No	Sampling temperature
`top_p`	No	Nucleus sampling
`stop`	No	Stop sequences
`tools`	No	Function-calling tools in OpenAI format (auto-translated for Anthropic sellers)
`tool_choice`	No	Tool selection strategy (`auto`, `required`, or a specific function)
`response_format`	No	Structured output format
`provider`	No	Provider hint: a single provider or an array of providers (an allow-list). Narrows routing to offers from the matching provider(s) before cheapest-offer selection.
`provider_url`	No	Provider URL hint (single value). Equivalent to `provider`, but matched by provider host.
`provider_base_url`	No	Provider base URL hint (single value). Equivalent to `provider_url`.
`stream_options`	No	`{"include_usage": true}` (auto-injected)

Example request

bash

curl https://api.surplusintelligence.ai/v1/chat/completions \
  -H "Authorization: Bearer inf_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.6",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true,
    "max_tokens": 1000
  }'
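Without an OpenAI SDK, the same request can be assembled with Python's standard library. The sketch below mirrors the curl call but only builds the request; build_request is a hypothetical helper and inf_xxx is the same placeholder key.

```python
import json
import urllib.request

API_KEY = "inf_xxx"  # placeholder key, as in the curl example
URL = "https://api.surplusintelligence.ai/v1/chat/completions"

def build_request(model, messages, **params):
    """Build (but do not send) a POST request equivalent to the curl example."""
    if not model or not messages:
        raise ValueError("model and messages are required")
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request(
    "claude-opus-4.6",
    [{"role": "user", "content": "Hello!"}],
    stream=True,
    max_tokens=1000,
)
```

Passing req to urllib.request.urlopen sends it; with stream set to true, the response body is an SSE stream rather than a single JSON object.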

Price Threshold

bash

-H "X-Max-Price-Per-1M: 8.0"

Or set it in the request body: "max_price_per_1m": 8.0. Either form skips sellers whose price per 1M tokens is above this threshold.
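Conceptually, the threshold is a filter over candidate offers. The sketch below models that filter under stated assumptions: Offer and its single price_per_1m field are hypothetical (the docs do not say whether the compared price is input, output, or blended), and an offer priced exactly at the threshold is kept, since only sellers above it are skipped.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    seller: str          # hypothetical seller identifier
    price_per_1m: float  # hypothetical single comparable price per 1M tokens

def within_price_threshold(offers, max_price_per_1m=None):
    """Keep offers at or below the threshold; no threshold keeps every offer."""
    if max_price_per_1m is None:
        return list(offers)
    return [o for o in offers if o.price_per_1m <= max_price_per_1m]

offers = [Offer("a", 5.0), Offer("b", 8.0), Offer("c", 12.5)]
print([o.seller for o in within_price_threshold(offers, 8.0)])  # ['a', 'b']
```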

See Parameter Compatibility for model-specific parameter support.

Provider Pinning

By default, Surplus routes each request to the cheapest healthy seller for the requested model. If you need to restrict routing to one or more specific providers, pass a provider hint in the request body. It accepts a single provider or an array of providers (an allow-list).

With an array, the request can route to any offer whose provider is in the list. The router then applies normal selection within that set: it picks the cheapest healthy offer among the allowed providers and excludes every other provider, even a cheaper one you didn't list. All the usual rules still apply inside the allow-list: health checks, spend caps, price thresholds, estimated-cost sorting, and failover among matching offers.
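The allow-list semantics amount to filter-then-rank. The sketch below models only two of the rules named above, health and estimated cost; the Offer fields are hypothetical, provider values are assumed to be already-normalized ids, and spend caps and price thresholds are left out.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str    # normalized provider id (assumed)
    healthy: bool    # outcome of the router's health check (assumed field)
    est_cost: float  # estimated cost of this request with this seller (assumed)

def rank_offers(offers, allowed=None):
    """Healthy offers, cheapest first, limited to an optional allow-list.

    `allowed` mirrors the provider hint: None, a single id, or a list of ids.
    The first result would be tried first; the rest approximate failover order.
    """
    if allowed is not None:
        allow = {allowed} if isinstance(allowed, str) else set(allowed)
        offers = [o for o in offers if o.provider in allow]
    return sorted((o for o in offers if o.healthy), key=lambda o: o.est_cost)

offers = [
    Offer("openrouter", True, 0.9),
    Offer("zai", True, 1.2),
    Offer("venice", True, 0.5),
    Offer("zai", False, 0.4),  # cheapest, but unhealthy
]
print([o.provider for o in rank_offers(offers)])                         # ['venice', 'openrouter', 'zai']
print([o.provider for o in rank_offers(offers, ["zai", "openrouter"])])  # ['openrouter', 'zai']
```

In the pinned ranking, venice is excluded even though its offer is the cheapest healthy one, which is exactly the trade-off pinning makes.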

Accepted forms (each array element can be any of these):

a provider id or name — "zai" / "Z.ai", "uncensored" / "Uncensored AI"
a provider host — "api.z.ai"
a provider URL — "provider_url": "https://api.z.ai/api/paas/v4" or "provider_base_url": "https://api.uncensored.com/api/v1" (these two aliases stay single-valued)
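Matching these forms comes down to normalizing each entry to a provider id. The registry below contains only the providers and hosts that appear in the examples on this page, case-insensitive matching is an assumption, and resolve_provider is a hypothetical name; the real supported list and matching rules live server-side.

```python
from urllib.parse import urlparse

# Hypothetical registry built only from the examples on this page.
PROVIDERS = {
    "zai": {"name": "Z.ai", "host": "api.z.ai"},
    "uncensored": {"name": "Uncensored AI", "host": "api.uncensored.com"},
}

def resolve_provider(hint):
    """Map an id, display name, host, or URL to a provider id (None if unknown)."""
    value = hint.strip()
    # A full URL is reduced to its host, e.g. https://api.z.ai/api/paas/v4 -> api.z.ai
    if "://" in value:
        value = urlparse(value).hostname or ""
    lowered = value.lower()
    for pid, info in PROVIDERS.items():
        if lowered in (pid, info["name"].lower(), info["host"]):
            return pid
    return None
```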

Unsupported providers. Each entry is matched against the supported provider list. Unrecognized entries are ignored, so a mix like ["zai", "not-a-provider"] simply routes on zai. If every entry is unrecognized, the request fails with 400 unsupported_provider (the error names the bad entries and lists the supported providers).

No offers vs unsupported. A supported provider that just has no active healthy offer for the requested model returns 404 no_sellers_for_model — distinct from 400 unsupported_provider. Provider hints are optional and advanced: pinning narrowly can fail even when another (unlisted) provider has liquidity for the same model.
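The two failure modes are checked in order: first whether any entry is recognized, then whether the recognized providers have offers. A sketch under simplifying assumptions: exact id matching, a hypothetical SUPPORTED_PROVIDERS subset taken from this page's examples, and offers pre-grouped by provider for the requested model.

```python
# Hypothetical subset of supported provider ids.
SUPPORTED_PROVIDERS = {"zai", "uncensored", "openrouter"}

def route_with_hint(hint, offers_by_provider):
    """Return (status, result) for a provider hint.

    offers_by_provider maps provider id -> active healthy offers for the model.
    """
    entries = [hint] if isinstance(hint, str) else list(hint)
    # Unrecognized entries are ignored as long as at least one entry is valid.
    recognized = [e for e in entries if e in SUPPORTED_PROVIDERS]
    if not recognized:
        return 400, "unsupported_provider"
    offers = [o for p in recognized for o in offers_by_provider.get(p, [])]
    if not offers:
        # Supported provider(s), but nothing to route to for this model.
        return 404, "no_sellers_for_model"
    return 200, offers
```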

Provider pinning

json

{
  "model": "glm-4.7",
  "messages": [{"role": "user", "content": "Hello!"}],
  "provider": "zai"
}

json

{
  "model": "glm-4.7",
  "messages": [{"role": "user", "content": "Hello!"}],
  "provider": ["zai", "openrouter"]
}

Tool Calling

Current behavior:

Send tools in OpenAI tools / tool_choice format.
OpenAI-compatible sellers receive OpenAI-format tool calls after normalization. This is the path used by the active production Claude sellers today, mostly Venice and Bankr LLM Gateway.
Native Anthropic sellers (api.anthropic.com) are supported through an OpenAI ⇄ Anthropic translation layer: tools become Anthropic tools, assistant tool_calls become tool_use blocks, role: "tool" messages become tool_result blocks, and streaming tool_use deltas are converted back to OpenAI tool_calls deltas.
Tool support is still model-dependent. Check GET /v1/models: if supported_parameters does not include tools, the marketplace strips tools and tool_choice before forwarding because the upstream model rejects them.
Non-function tools such as computer-use or text-editor tool types are not executed by the marketplace; unsupported tool definitions are stripped or normalized to function tools when possible.
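The translation layer maps between two well-documented message shapes. The sketch below shows the non-streaming mapping for one message at a time; it is an illustration, not Surplus's implementation, and the function names are hypothetical. A real layer would also need to merge consecutive tool results into a single user turn (Anthropic expects the results for parallel tool calls in one user message) and lift system messages into the top-level system parameter.

```python
import json

def tool_to_anthropic(tool):
    """Map an OpenAI function tool to an Anthropic tool definition."""
    fn = tool["function"]
    out = {
        "name": fn["name"],
        # Anthropic names the JSON Schema "input_schema" instead of "parameters".
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }
    if fn.get("description"):
        out["description"] = fn["description"]
    return out

def message_to_anthropic(msg):
    """Map one OpenAI message to the Anthropic shape (assistant/tool cases)."""
    if msg["role"] == "assistant" and msg.get("tool_calls"):
        blocks = [{"type": "text", "text": msg["content"]}] if msg.get("content") else []
        for call in msg["tool_calls"]:
            blocks.append({
                "type": "tool_use",
                "id": call["id"],
                "name": call["function"]["name"],
                # OpenAI sends arguments as a JSON string; Anthropic wants an object.
                "input": json.loads(call["function"]["arguments"] or "{}"),
            })
        return {"role": "assistant", "content": blocks}
    if msg["role"] == "tool":
        # Anthropic carries tool results as tool_result blocks in a user turn.
        return {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": msg["tool_call_id"],
            "content": msg["content"],
        }]}
    return msg  # other messages pass through unchanged in this sketch
```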

Streaming function-tool calling is supported for providers/models that stream tool-call deltas. Anthropic SSE tool_use events are translated to OpenAI tool_calls deltas; OpenAI-compatible providers are passed through after normalization.
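On the client, streamed tool calls arrive as fragments that must be merged. In the OpenAI format, each tool_calls delta carries an index, the id and function name typically appear once, and arguments arrive as string pieces to concatenate. A sketch, assuming a single choice (n = 1) and chunks already parsed from JSON; accumulate_tool_calls is a hypothetical name.

```python
def accumulate_tool_calls(chunks):
    """Merge streamed OpenAI tool_calls deltas, keyed by their index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            for delta in (choice.get("delta") or {}).get("tool_calls") or []:
                call = calls.setdefault(delta["index"], {
                    "id": None,
                    "type": "function",
                    "function": {"name": "", "arguments": ""},
                })
                if delta.get("id"):
                    call["id"] = delta["id"]
                fn = delta.get("function") or {}
                if fn.get("name"):
                    call["function"]["name"] = fn["name"]
                # Argument text arrives in pieces and is concatenated in order.
                call["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

Decode the merged arguments string as JSON only after the stream ends; the partial fragments are not valid JSON on their own.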

Example multi-turn function-tool transcript

json

{
  "model": "claude-opus-4.6",
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "content": null, "tool_calls": [
      {"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\":\"Tokyo\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp\": 22, \"condition\": \"sunny\"}"}
  ],
  "tools": [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}]
}
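The transcript above is what a client sends on its follow-up request, after running the tool locally. A minimal sketch of that step, assuming a hypothetical local get_weather function and a name-to-function registry; run_tool_calls is also a hypothetical name.

```python
import json

def get_weather(city):
    # Hypothetical local tool; a real one would look up the weather for `city`.
    return {"temp": 22, "condition": "sunny"}

TOOLS = {"get_weather": get_weather}

def run_tool_calls(assistant_message, registry=TOOLS):
    """Execute each requested tool call and return the role "tool" replies."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"] or "{}")
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the assistant's call id
            "content": json.dumps(fn(**args)),
        })
    return replies

assistant = {"role": "assistant", "content": None, "tool_calls": [
    {"id": "call_1", "type": "function",
     "function": {"name": "get_weather", "arguments": "{\"city\":\"Tokyo\"}"}},
]}
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}, assistant]
messages += run_tool_calls(assistant)
```

Sending messages back to the endpoint with the same tools array lets the model produce its final answer from the tool result.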

Message Roles

Role	Description
`system`	System prompt (mapped to Anthropic `system` parameter when needed)
`developer`	Alias for `system` (OpenAI convention, auto-mapped)
`user`	User message (text, images, or mixed content blocks)
`assistant`	Model response (may include `tool_calls`)
`tool`	Tool result (requires `tool_call_id` matching a previous `tool_calls` entry)
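Because each tool message must reference an earlier tool_calls entry, a client can catch a mismatched transcript before sending it. A hypothetical pre-flight check:

```python
def check_tool_messages(messages):
    """Return tool_call_ids that don't match an earlier assistant tool_calls entry."""
    seen, orphans = set(), []
    for msg in messages:
        if msg.get("role") == "assistant":
            seen.update(call["id"] for call in msg.get("tool_calls") or [])
        elif msg.get("role") == "tool" and msg.get("tool_call_id") not in seen:
            orphans.append(msg.get("tool_call_id"))
    return orphans
```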

Response

Responses use the standard OpenAI format. Streaming responses are sent as Server-Sent Events (SSE), one chunk per data: {...} line.
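A minimal SSE reader for the streaming case, assuming the standard OpenAI chunk shape and the data: [DONE] terminator. Because include_usage is injected, token usage is expected in a final chunk whose choices array is empty. Lines are assumed to be already decoded to str, and read_stream is a hypothetical name.

```python
import json

def read_stream(lines):
    """Collect streamed text and usage from OpenAI-style SSE lines."""
    text, usage = [], None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices") or []:
            text.append((choice.get("delta") or {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text), usage
```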

Error Codes

Status	Meaning
400	Invalid request (bad model name, missing messages, unsupported_provider)
401	Invalid API key
402	Insufficient balance/allowance, or payment required (x402/MPP)
404	No active healthy seller for the model (no_sellers_for_model)
503	All sellers unhealthy
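The table does not prescribe a retry policy. One reasonable client-side choice, shown here as an assumption rather than documented behavior, is to retry only 503 with exponential backoff, since the other statuses are unlikely to succeed on an immediate retry without changing the request, key, balance, or model. All names below are hypothetical.

```python
import time

def should_retry(status, attempt, max_attempts=3):
    """Retry only 503 (all sellers unhealthy), up to max_attempts retries."""
    return status == 503 and attempt < max_attempts

def backoff_seconds(attempt, base=1.0, cap=30.0):
    """Exponential backoff for attempt 0, 1, 2, ...: 1s, 2s, 4s, ..., capped."""
    return min(cap, base * 2 ** attempt)

def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """send() is a hypothetical callable returning (status, body)."""
    attempt = 0
    while True:
        status, body = send()
        if not should_retry(status, attempt, max_attempts):
            return status, body
        sleep(backoff_seconds(attempt))
        attempt += 1
```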