POST /v1/chat/completions
> Minimum-discount routing: Prefix the path with a min{N} segment (e.g. /min30/v1/chat/completions) to require marketplace seller offers to meet a minimum estimated buyer discount before routing. Buyer-owned providers are not covered. See Minimum-Discount Routing.
The primary inference endpoint. OpenAI-compatible — works with any client that speaks the OpenAI format.
Request
curl https://api.surplusintelligence.ai/v1/chat/completions \
  -H "Authorization: Bearer inf_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.6",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true,
    "max_tokens": 1000
  }'
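For clients that are not shell scripts, the same request can be assembled in Python. The following is a minimal sketch using only the standard library; the helper name `build_chat_request` and its `min_discount` argument are illustrative assumptions, not part of any official SDK. It builds the request without sending it, and shows how the optional min{N} path prefix from the note above would be applied.

```python
import json
import urllib.request

BASE_URL = "https://api.surplusintelligence.ai"


def build_chat_request(api_key, payload, min_discount=None):
    """Build (but do not send) a POST request for the chat completions endpoint.

    `build_chat_request` and `min_discount` are hypothetical names for this sketch.
    """
    path = "/v1/chat/completions"
    if min_discount is not None:
        # Optional min{N} path segment, e.g. /min30/v1/chat/completions.
        path = f"/min{int(min_discount)}{path}"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "claude-opus-4.6",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 1000,
}
req = build_chat_request("inf_xxx", payload, min_discount=30)
print(req.get_method(), req.full_url)
# POST https://api.surplusintelligence.ai/min30/v1/chat/completions
```

Passing `req` to `urllib.request.urlopen` would send it; any OpenAI-compatible client library pointed at the same base URL should work equally well.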
Parameters
| Parameter | Required | Description |
|---|---|---|
| model | Yes | Model name (canonical, OpenRouter, Venice, or alias format) |
| messages | Yes | Array of message objects |
| stream | No | true for SSE streaming (default: false) |
| max_tokens | No | Maximum output tokens |
| max_completion_tokens | No | Alias for max_tokens |
| temperature | No | Sampling temperature |
| top_p | No | Nucleus sampling |
| stop | No | Stop sequences |
| tools | No | Function calling tools (OpenAI format; auto-translated for Anthropic sellers) |
| tool_choice | No | Tool selection strategy (auto, required, or specific function) |
| response_format | No | Structured output format |
| provider | No | Optional provider hint. Narrows routing to offers from a matching provider before cheapest-offer selection. |
| provider_url | No | Optional provider URL hint. Equivalent to provider, but matched by provider host. |
| provider_base_url | No | Optional provider base URL hint. Equivalent to provider_url. |
| stream_options | No | {"include_usage": true} (auto-injected) |
Price Threshold
Set a maximum seller price with a request header:
-H "X-Max-Price-Per-1M: 8.0"
Or set "max_price_per_1m": 8.0 in the request body. Sellers priced above this threshold are skipped.
See Parameter Compatibility for full model-specific support.
Provider Pinning
By default, Surplus routes each request to the cheapest healthy seller for the requested model. If you need to use a specific provider, pass a provider hint in the request body:
{
  "model": "glm-4.7",
  "messages": [{"role": "user", "content": "Hello!"}],
  "provider": "zai"
}
Provider pinning narrows the candidate offers to matching provider hosts, then still applies normal routing inside that provider: health checks, spend caps, price thresholds, estimated-cost sorting, and failover among matching offers.
Accepted forms:
- "provider": "zai" or "provider": "Z.ai"
- "provider": "uncensored" or "provider": "Uncensored AI"
- "provider_url": "https://api.z.ai/api/paas/v4"
- "provider_base_url": "https://api.uncensored.com/api/v1"
Provider hints are optional and advanced. If no active healthy offer matches the hint, the request can fail even when another provider has available liquidity for the same model.
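The narrowing-then-routing behavior described above can be pictured with a small model. This is only an illustrative sketch, not the marketplace's implementation: the `Offer` shape, the `PROVIDER_HOSTS` alias table, and both function names are assumptions, spend caps are left out, and plain price stands in for estimated-cost sorting.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Offer:
    """Hypothetical shape of a seller offer, for illustration only."""
    seller: str
    provider_host: str
    price_per_1m: float
    healthy: bool = True


# Hypothetical alias table mapping provider names to provider hosts.
PROVIDER_HOSTS = {
    "zai": "api.z.ai",
    "z.ai": "api.z.ai",
    "uncensored": "api.uncensored.com",
    "uncensored ai": "api.uncensored.com",
}


def hint_to_host(provider=None, provider_url=None):
    """Resolve a provider hint (name or URL) to a host; None if unresolvable."""
    if provider_url:
        return urlparse(provider_url).hostname
    if provider:
        return PROVIDER_HOSTS.get(provider.strip().lower())
    return None


def candidate_offers(offers, provider=None, provider_url=None, max_price_per_1m=None):
    """Narrow by provider hint, drop unhealthy or too-expensive offers, cheapest first."""
    hinted = provider is not None or provider_url is not None
    host = hint_to_host(provider, provider_url)
    kept = []
    for offer in offers:
        if hinted and offer.provider_host != host:
            continue  # outside the pinned provider
        if not offer.healthy:
            continue  # health check
        if max_price_per_1m is not None and offer.price_per_1m > max_price_per_1m:
            continue  # price threshold
        kept.append(offer)
    # The sorted list doubles as the failover order among matching offers.
    return sorted(kept, key=lambda o: o.price_per_1m)


offers = [
    Offer("a", "api.z.ai", 6.0),
    Offer("b", "api.z.ai", 4.0),
    Offer("c", "api.uncensored.com", 2.0),
    Offer("d", "api.z.ai", 1.0, healthy=False),
]
print([o.seller for o in candidate_offers(offers, provider="Z.ai")])
# ['b', 'a']
print([o.seller for o in candidate_offers(
    offers, provider_url="https://api.z.ai/api/paas/v4", max_price_per_1m=5.0)])
# ['b']
```

In this model an empty result means the request fails, even though seller "c" still has capacity at another provider. That is the trade-off of pinning.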
Tool Calling
Current behavior:
- Send tools in OpenAI tools/tool_choice format.
- OpenAI-compatible sellers receive OpenAI-format tool calls after normalization. This is the path used by the active production Claude sellers today, mostly Venice and Bankr LLM Gateway.
- Native Anthropic sellers (api.anthropic.com) are supported through an OpenAI ⇄ Anthropic translation layer: tools become Anthropic tools, assistant tool_calls become tool_use blocks, role: "tool" messages become tool_result blocks, and streaming tool_use deltas are converted back to OpenAI tool_calls deltas.
- Tool support is still model-dependent. Check GET /v1/models: if supported_parameters does not include tools, the marketplace strips tools and tool_choice before forwarding because the upstream model rejects them.
- Non-function tools such as computer-use or text-editor tool types are not executed by the marketplace; unsupported tool definitions are stripped or normalized to function tools when possible.
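Because tools are stripped silently when a model does not support them, a client may want to run the same check before sending. The sketch below mirrors the rule in the list above; the function name and the shape of the model entry (a dict with a `supported_parameters` list) are assumptions about what GET /v1/models returns, not a documented schema.

```python
def strip_unsupported_tools(payload, supported_parameters):
    """Return a copy of payload without tool fields when the model lacks tool support."""
    if "tools" in supported_parameters:
        return dict(payload)
    return {k: v for k, v in payload.items() if k not in ("tools", "tool_choice")}


# Hypothetical model entry, shaped as this sketch assumes GET /v1/models reports it.
model_entry = {"id": "example-model", "supported_parameters": ["temperature", "max_tokens"]}

payload = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": [{"type": "function", "function": {"name": "get_weather"}}],
    "tool_choice": "auto",
}
forwarded = strip_unsupported_tools(payload, model_entry["supported_parameters"])
print(sorted(forwarded))
# ['messages', 'model']
```

A client that depends on tool calls could treat a difference between `payload` and `forwarded` as a reason to pick another model instead of sending the request.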
Example multi-turn function-tool transcript:
{
  "model": "claude-opus-4.6",
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "content": null, "tool_calls": [
      {"id": "call_1", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\":\"Tokyo\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp\": 22, \"condition\": \"sunny\"}"}
  ],
  "tools": [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}]
}
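To make the OpenAI-to-Anthropic mapping concrete, here is a rough sketch of how a transcript like the one above could be translated. It is not the marketplace's translation layer: the function name is hypothetical, only plain-text content and function tools are handled, and the target field names follow the publicly documented Anthropic Messages format (tool_use, tool_result, input_schema).

```python
import json


def to_anthropic(payload):
    """Translate an OpenAI-style chat payload into Anthropic Messages-style fields."""
    system_parts, messages = [], []
    for msg in payload["messages"]:
        role = msg["role"]
        if role in ("system", "developer"):
            # System-style prompts move to the top-level system parameter.
            system_parts.append(msg["content"])
        elif role == "assistant":
            blocks = []
            if msg.get("content"):
                blocks.append({"type": "text", "text": msg["content"]})
            for call in msg.get("tool_calls") or []:
                blocks.append({
                    "type": "tool_use",
                    "id": call["id"],
                    "name": call["function"]["name"],
                    # OpenAI carries arguments as a JSON string; Anthropic as an object.
                    "input": json.loads(call["function"]["arguments"]),
                })
            messages.append({"role": "assistant", "content": blocks})
        elif role == "tool":
            block = {
                "type": "tool_result",
                "tool_use_id": msg["tool_call_id"],
                "content": msg["content"],
            }
            last = messages[-1] if messages else None
            if (last and last["role"] == "user" and isinstance(last["content"], list)
                    and last["content"] and last["content"][-1].get("type") == "tool_result"):
                last["content"].append(block)  # group consecutive tool results
            else:
                messages.append({"role": "user", "content": [block]})
        else:
            messages.append({"role": "user", "content": msg["content"]})
    out = {"messages": messages}
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    if payload.get("tools"):
        out["tools"] = [
            {
                "name": t["function"]["name"],
                "description": t["function"].get("description", ""),
                "input_schema": t["function"].get("parameters", {"type": "object"}),
            }
            for t in payload["tools"] if t.get("type") == "function"
        ]
    return out


openai_payload = {
    "messages": [
        {"role": "user", "content": "What's the weather in Tokyo?"},
        {"role": "assistant", "content": None, "tool_calls": [
            {"id": "call_1", "type": "function",
             "function": {"name": "get_weather", "arguments": "{\"city\":\"Tokyo\"}"}}]},
        {"role": "tool", "tool_call_id": "call_1",
         "content": "{\"temp\": 22, \"condition\": \"sunny\"}"},
    ],
    "tools": [{"type": "function", "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}],
}
translated = to_anthropic(openai_payload)
print(translated["messages"][1]["content"][0]["type"])  # tool_use
print(translated["messages"][2]["content"][0]["type"])  # tool_result
```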
Streaming function-tool calling is supported for providers/models that stream tool-call deltas. Anthropic SSE tool_use events are translated to OpenAI tool_calls deltas; OpenAI-compatible providers are passed through after normalization.
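On the client side, streamed tool calls arrive as fragments that have to be stitched together. The sketch below assumes the standard OpenAI chunk layout (choices[].delta.tool_calls[] with an index, an optional id, and partial function.arguments strings); the function name and the sample chunks are made up for illustration.

```python
def accumulate_tool_calls(chunks):
    """Merge streamed tool_calls deltas into complete calls, keyed by their index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for delta in choice.get("delta", {}).get("tool_calls") or []:
                slot = calls.setdefault(
                    delta["index"], {"id": None, "name": None, "arguments": ""})
                if delta.get("id"):
                    slot["id"] = delta["id"]
                fn = delta.get("function") or {}
                if fn.get("name"):
                    slot["name"] = fn["name"]
                # Argument JSON is split across chunks; concatenate in arrival order.
                slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Hand-written sample chunks in the assumed OpenAI streaming shape.
sample_chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\":"}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"Tokyo\"}"}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
print(accumulate_tool_calls(sample_chunks))
# [{'id': 'call_1', 'name': 'get_weather', 'arguments': '{"city":"Tokyo"}'}]
```

The arguments string should only be parsed as JSON once the stream has finished, since intermediate fragments are not valid JSON on their own.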
Message Roles
| Role | Description |
|---|---|
| system | System prompt (mapped to Anthropic system parameter when needed) |
| developer | Alias for system (OpenAI convention, auto-mapped) |
| user | User message (text, images, or mixed content blocks) |
| assistant | Model response (may include tool_calls) |
| tool | Tool result (requires tool_call_id matching a previous tool_calls entry) |
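The tool_call_id requirement in the last row is easy to break when assembling transcripts by hand, so a pre-flight check can help. The following is a small client-side sketch; the function name and the wording of the messages it returns are illustrative, and it checks only the pairing rule stated in the table.

```python
def check_tool_messages(messages):
    """Return a list of pairing problems; an empty list means the transcript is consistent."""
    seen_ids = set()
    problems = []
    for i, msg in enumerate(messages):
        if msg["role"] == "assistant":
            for call in msg.get("tool_calls") or []:
                seen_ids.add(call["id"])
        elif msg["role"] == "tool":
            call_id = msg.get("tool_call_id")
            if call_id is None:
                problems.append(f"message {i}: tool message has no tool_call_id")
            elif call_id not in seen_ids:
                problems.append(
                    f"message {i}: tool_call_id {call_id!r} matches no earlier tool_calls entry")
    return problems


good = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"city\":\"Tokyo\"}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp\": 22}"},
]
bad = [good[0], {"role": "tool", "tool_call_id": "call_9", "content": "{}"}]

print(check_tool_messages(good))  # []
print(check_tool_messages(bad))
# ["message 1: tool_call_id 'call_9' matches no earlier tool_calls entry"]
```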
Response
Standard OpenAI format. Streaming returns SSE chunks with data: {...} lines.
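As a rough illustration of consuming the streaming response, the sketch below parses data: lines into JSON chunks and joins the text deltas. It assumes the usual OpenAI conventions (a final data: [DONE] sentinel and text under choices[].delta.content); the function names and sample lines are invented for the example.

```python
import json


def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            break
        yield json.loads(data)


def collect_text(chunks):
    """Concatenate the text deltas from a sequence of streamed chunks."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


sample_lines = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": "Hel"}}]}',
    "",
    b'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 2}}',
    "",
    "data: [DONE]",
]
print(collect_text(iter_sse_chunks(sample_lines)))
# Hello!
```

The usage-only chunk with an empty choices list reflects what stream_options {"include_usage": true} produces in the OpenAI format; the parser above tolerates it without special handling.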
Error Codes
| Status | Meaning |
|---|---|
| 400 | Invalid request (bad model name, missing messages) |
| 401 | Invalid API key |
| 402 | Insufficient balance/allowance, or payment required (x402/MPP) |
| 404 | Model has no sellers |
| 503 | All sellers unhealthy |
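A client might turn the table above into a small lookup that also decides whether a retry makes sense. In this sketch the meanings come from the table, but the retry policy (retry only on 503, with exponential backoff) is an assumption of the example and not documented marketplace guidance; both function names are hypothetical.

```python
ERROR_MEANINGS = {
    400: "Invalid request (bad model name, missing messages)",
    401: "Invalid API key",
    402: "Insufficient balance/allowance, or payment required (x402/MPP)",
    404: "Model has no sellers",
    503: "All sellers unhealthy",
}

# Assumption: only 503 is transient enough to be worth retrying automatically.
RETRYABLE = {503}


def describe_error(status):
    """Summarize an HTTP error status using the table above."""
    return {
        "status": status,
        "meaning": ERROR_MEANINGS.get(status, "Unexpected status"),
        "retry": status in RETRYABLE,
    }


def backoff_delays(attempts, base=0.5, cap=8.0):
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


print(describe_error(503)["retry"])  # True
print(describe_error(401)["retry"])  # False
print(backoff_delays(4))             # [0.5, 1.0, 2.0, 4.0]
```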