Inference API

OpenAI-compatible chat, completions, and embeddings endpoints served by tokenhub.xcity.one.

The inference gateway lives at https://tokenhub.xcity.one/v1 and speaks the OpenAI REST contract. Any OpenAI SDK works once its base URL is set to this address.

Authentication

All requests require a bearer token from /dashboard/keys:

Authorization: Bearer sk-...

Keys are revocable from the dashboard or via the Keys API. Rotating a key takes effect within ~5s globally.
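As a minimal sketch of the header above, the following Python builds an authenticated JSON request using only the standard library. The helper name build_request is hypothetical, and reading the key from the XCITY_API_KEY environment variable is an assumption borrowed from the curl examples on this page.

```python
import json
import os
import urllib.request

BASE_URL = "https://tokenhub.xcity.one/v1"


def build_request(path, payload, api_key=None):
    """Build (but do not send) an authenticated JSON POST to the gateway.

    `build_request` is an illustrative helper, not part of any SDK.
    """
    # Assumption: the key lives in XCITY_API_KEY unless passed explicitly.
    key = api_key or os.environ.get("XCITY_API_KEY", "")
    if not key:
        raise ValueError("no API key: pass api_key or set XCITY_API_KEY")
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request; keeping construction separate makes the headers easy to inspect in tests.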

POST /v1/chat/completions

Standard OpenAI chat-completions shape.

curl https://tokenhub.xcity.one/v1/chat/completions \
  -H "Authorization: Bearer $XCITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user", "content": "Summarize the Argentina project in two sentences."}
    ],
    "stream": false
  }'

Response:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1747353600,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 64,
    "total_tokens": 96
  }
}
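The response above can be reduced to the fields most callers need. This is a sketch that assumes the body has exactly the shape shown; summarize_chat_response is a hypothetical helper name.

```python
import json


def summarize_chat_response(body):
    """Extract the assistant text, finish reason, and token total
    from a chat-completions response body (a JSON string)."""
    data = json.loads(body)
    choice = data["choices"][0]      # first (index 0) choice
    usage = data.get("usage", {})    # tolerate a missing usage block
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }
```

In the sample response, total_tokens (96) is the sum of prompt_tokens (32) and completion_tokens (64).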

POST /v1/completions

Legacy completions endpoint. Supported for OpenAI parity, but new code should use /v1/chat/completions.

POST /v1/embeddings

curl https://tokenhub.xcity.one/v1/embeddings \
  -H "Authorization: Bearer $XCITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "text-embedding-3-small", "input": "hello world" }'
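This page does not show the embeddings response. The sketch below assumes it follows the OpenAI shape, a "data" list whose items carry "index" and "embedding", and shows one common use: comparing two vectors by cosine similarity. Both helper names are hypothetical.

```python
import json
import math


def extract_embeddings(body):
    """Return embedding vectors ordered by their "index" field.

    Assumes the OpenAI response shape: {"data": [{"index": 0, "embedding": [...]}]}.
    """
    data = json.loads(body)
    items = sorted(data["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```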

GET /v1/models

Returns the models allowed by the requesting key’s plan whitelist, not the global catalog. Use this to populate UI model pickers without exposing models the user can’t access.
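A model picker can be populated from this response roughly as follows. The sketch assumes the OpenAI list shape, a "data" array of objects with an "id" field, which this page does not show; model_ids is a hypothetical helper.

```python
import json


def model_ids(body):
    """Return the sorted model IDs from a /v1/models response body.

    Assumes the OpenAI list shape: {"object": "list", "data": [{"id": "..."}]}.
    """
    data = json.loads(body)
    return sorted(item["id"] for item in data.get("data", []))
```

Because the endpoint already filters by plan, the UI can show the returned IDs as-is without a second permission check.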

Streaming

Set "stream": true for SSE-style streaming. The wire format matches OpenAI exactly:

data: {"choices":[{"delta":{"content":"He"}}]}
data: {"choices":[{"delta":{"content":"llo"}}]}
data: [DONE]
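The stream above can be consumed with a small line parser. This is a sketch, not a full SSE client: it assumes each event arrives as a single "data:" line of already-decoded text, and iter_deltas is a hypothetical name.

```python
import json


def iter_deltas(lines):
    """Yield content fragments from an iterable of SSE text lines,
    stopping at the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # role-only or empty deltas carry no text
                yield text
```

Joining the fragments from the three lines shown above gives "Hello". When reading from an HTTP response object, decode each line from bytes to text before passing it in.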

Error codes

Status	Meaning
`401`	Invalid or revoked key
`402`	Budget cap exceeded (per-request or monthly)
`403`	Model not in your plan’s whitelist
`429`	Rate limit hit; retry with exponential backoff
`5xx`	Upstream provider or gateway issue; safe to retry idempotent calls
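The retry guidance in the table can be sketched as a small policy. The base delay, the cap, and the use of full jitter are assumptions chosen for illustration; the gateway does not prescribe specific values, and both function names are hypothetical.

```python
import random


def should_retry(status, idempotent=True):
    """Decide whether a failed call is worth retrying, per the table:
    429 always, 5xx only for idempotent calls, other statuses never."""
    if status == 429:
        return True
    if 500 <= status <= 599:
        return idempotent
    return False  # 401, 402, 403 need a fix, not a retry


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Exponential backoff with full jitter, in seconds.

    attempt is 0 for the first retry. base and cap are illustrative defaults.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

Jitter spreads retries from many clients over time, so they do not hit the rate limiter again in lockstep.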

All error bodies follow the OpenAI shape:

{ "error": { "message": "...", "type": "...", "code": "..." } }
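Since every error shares this shape, a client can map failures to one exception type. The following is a minimal sketch; GatewayError and parse_error are hypothetical names, and the fallback for non-JSON bodies is an assumption for robustness.

```python
import json


class GatewayError(Exception):
    """Carries the HTTP status plus the fields of the error object."""

    def __init__(self, status, message, type_=None, code=None):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.type = type_
        self.code = code


def parse_error(status, body):
    """Build a GatewayError from an HTTP status and a response body string."""
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        err = {}  # body was not JSON, or not a JSON object
    if not isinstance(err, dict):
        err = {}
    return GatewayError(
        status,
        err.get("message") or "unknown error",
        err.get("type"),
        err.get("code"),
    )
```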

Last updated: May 15, 2026