Inference API
OpenAI-compatible chat, completions, and embeddings endpoints served by tokenhub.xcity.one.
The inference gateway lives at https://tokenhub.xcity.one/v1 and speaks the OpenAI REST contract, so any OpenAI SDK works once its base URL is set to that address.
Authentication
All requests require a bearer token, created at /dashboard/keys and sent in the Authorization header:
```
Authorization: Bearer sk-...
```
Keys are revocable from the dashboard or via the Keys API. Rotating a key takes effect within ~5s globally.
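A minimal Python sketch of attaching the key, assuming it is exported as `XCITY_API_KEY` (the variable the curl examples below use). `auth_headers` is a hypothetical helper, not part of any SDK; it fails loudly when no key is available instead of sending an empty bearer token that would come back as a 401.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the headers every tokenhub request needs (hypothetical helper)."""
    # Fall back to the same environment variable the curl examples use.
    key = api_key if api_key is not None else os.environ.get("XCITY_API_KEY", "")
    if not key:
        raise RuntimeError("No API key: pass one or set XCITY_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```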
POST /v1/chat/completions
Accepts and returns the standard OpenAI chat-completions request and response shapes.
```bash
curl https://tokenhub.xcity.one/v1/chat/completions \
  -H "Authorization: Bearer $XCITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user", "content": "Summarize the Argentina project in two sentences."}
    ],
    "stream": false
  }'
```
Response:
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1747353600,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 64,
    "total_tokens": 96
  }
}
```
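The same call from Python using only the standard library, as a sketch: `send_chat` and `extract_reply` are hypothetical names, and error handling is left out. `extract_reply` reads the fields shown in the response above and flags a `finish_reason` of `"length"`, which in the OpenAI contract (assumed to carry over here) means the reply was cut off by the token limit.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

BASE_URL = "https://tokenhub.xcity.one/v1"


def send_chat(api_key: str, model: str, messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """POST a non-streaming chat completion and return the decoded JSON body."""
    payload = {"model": model, "messages": messages, "stream": False}
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def extract_reply(body: Dict[str, Any]) -> Tuple[str, bool]:
    """Return (assistant text, truncated?) from the first choice."""
    choice = body["choices"][0]
    truncated = choice.get("finish_reason") == "length"
    return choice["message"]["content"], truncated
```

Typical use is `text, truncated = extract_reply(send_chat(key, "claude-sonnet-4-6", messages))`; only `extract_reply` is needed when a client library has already decoded the response.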
POST /v1/completions
Legacy completions endpoint, supported for OpenAI parity. New code should use /v1/chat/completions.
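When porting legacy calls, a single string `prompt` usually maps to one user message. A minimal sketch, assuming the request uses `model`, a string `prompt`, and options that chat/completions also accepts; `legacy_to_chat` is a hypothetical helper, and OpenAI's completions-only fields (`suffix`, `echo`, `best_of`) are rejected rather than silently dropped.

```python
from typing import Any, Dict

# Legacy completions fields with no chat/completions equivalent.
LEGACY_ONLY = {"suffix", "echo", "best_of"}


def legacy_to_chat(legacy: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a /v1/completions body as a /v1/chat/completions body."""
    unsupported = legacy.keys() & LEGACY_ONLY
    if unsupported:
        raise ValueError(f"no chat equivalent for: {sorted(unsupported)}")
    prompt = legacy.get("prompt")
    if not isinstance(prompt, str):
        raise ValueError("only a single string prompt is handled in this sketch")
    # Keep every other field (model, max_tokens, temperature, ...) unchanged.
    chat = {k: v for k, v in legacy.items() if k != "prompt"}
    chat["messages"] = [{"role": "user", "content": prompt}]
    return chat
```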
POST /v1/embeddings
```bash
curl https://tokenhub.xcity.one/v1/embeddings \
  -H "Authorization: Bearer $XCITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "text-embedding-3-small", "input": "hello world" }'
```
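Because the endpoint follows the OpenAI contract, `input` presumably also accepts a list of strings for batching, and the response presumably lists `{"index", "embedding"}` objects under `data`; both are assumptions carried over from OpenAI's API, not confirmed for tokenhub. The sketch below (hypothetical helper names) builds a batched body and returns vectors in input order by sorting on `index`.

```python
import json
from typing import Any, Dict, List


def embeddings_payload(texts: List[str], model: str = "text-embedding-3-small") -> bytes:
    """JSON body for a batched POST /v1/embeddings request."""
    return json.dumps({"model": model, "input": texts}).encode("utf-8")


def vectors_in_order(body: Dict[str, Any]) -> List[List[float]]:
    """Return one vector per input, ordered by each item's `index`."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```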
GET /v1/models
Returns only the models allowed by the requesting key’s plan whitelist, not the global catalog. Use it to populate UI model pickers without exposing models outside the user’s plan.
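A sketch of feeding a model picker, assuming the response uses OpenAI's list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); `fetch_models` and `list_model_ids` are hypothetical. Call it with the end user's own key so the whitelist applies to them; the client then only sorts and de-duplicates, with no filtering of its own.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_models(api_key: str) -> Dict[str, Any]:
    """GET /v1/models with the end user's key (performs a network request)."""
    req = urllib.request.Request(
        "https://tokenhub.xcity.one/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def list_model_ids(body: Dict[str, Any]) -> List[str]:
    """Sorted, de-duplicated model IDs for a UI dropdown."""
    return sorted({m["id"] for m in body.get("data", [])})
```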
Streaming
Set "stream": true to receive the response as server-sent events (SSE). The wire format matches OpenAI’s exactly:
```text
data: {"choices":[{"delta":{"content":"He"}}]}
data: {"choices":[{"delta":{"content":"llo"}}]}
data: [DONE]
```
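A minimal consumer for that format, as a sketch: it reads `data:` lines, stops at the `[DONE]` sentinel, and yields the `delta.content` fragments of the first choice. `iter_content` is a hypothetical name; it takes already-decoded text lines and ignores blank separators and any non-`data` fields.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style SSE stream until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Follow only the first choice; role-only or empty deltas carry no text.
            if choice.get("index", 0) == 0:
                text = choice.get("delta", {}).get("content")
                if text:
                    yield text
```

With `urllib`, iterating the open response yields byte lines, so pass `(b.decode("utf-8") for b in resp)` and join the fragments with `"".join(...)`.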
Error codes
| Status | Meaning |
|---|---|
| 401 | Invalid or revoked key |
| 402 | Budget cap exceeded (per-request or monthly) |
| 403 | Model not in your plan’s whitelist |
| 429 | Rate limit hit; retry with exponential backoff |
| 5xx | Upstream provider or gateway issue; safe to retry idempotent calls |
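A sketch of the retry policy the table implies: always retry 429, retry 5xx only when the caller marks the request idempotent, and never retry 401, 402, or 403, since those need a key, budget, or plan change. The backoff uses "full jitter" (a random delay up to an exponentially growing ceiling); the base, cap, and attempt count are illustrative defaults rather than tokenhub limits, and `HTTPStatusError` is a hypothetical stand-in for whatever exception your HTTP client raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical placeholder for your HTTP client's error type."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int, idempotent: bool) -> bool:
    if status == 429:
        return True  # rate limited: wait and try again
    if 500 <= status <= 599:
        return idempotent  # upstream/gateway fault: only safe if repeatable
    return False  # 401/402/403 and other 4xx will fail the same way again


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def with_retries(
    call: Callable[[], T],
    idempotent: bool,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except HTTPStatusError as err:
            # Give up on the last attempt or on errors a retry cannot fix.
            if attempt == max_attempts - 1 or not is_retryable(err.status, idempotent):
                raise
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

Whether a chat completion counts as idempotent is the caller's decision: retrying after a 5xx can produce a second, different completion.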
All error bodies follow the OpenAI shape:
```json
{ "error": { "message": "...", "type": "...", "code": "..." } }
```
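A sketch of turning that body into an exception, with hypothetical names (`APIError`, `parse_error`). It also tolerates bodies that are not JSON, on the assumption that a proxy or load balancer in front of the gateway may occasionally return plain text or HTML; in that case it falls back to the raw text or to a short hint taken from the table above.

```python
import json
from typing import Optional

# Short hints for the status codes documented in the table above.
STATUS_HINTS = {
    401: "invalid or revoked key",
    402: "budget cap exceeded",
    403: "model not in plan whitelist",
    429: "rate limited",
}


class APIError(Exception):
    def __init__(self, status: int, message: str,
                 err_type: Optional[str] = None, code: Optional[str] = None) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.err_type = err_type
        self.code = code


def parse_error(status: int, body: bytes) -> APIError:
    """Build an APIError from an OpenAI-shaped error body, or fall back to raw text."""
    try:
        decoded = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        decoded = None
    err = decoded.get("error") if isinstance(decoded, dict) else None
    if isinstance(err, dict):
        message = err.get("message") or STATUS_HINTS.get(status, "unknown error")
        return APIError(status, str(message), err.get("type"), err.get("code"))
    text = body.decode("utf-8", errors="replace").strip()
    return APIError(status, text or STATUS_HINTS.get(status, "unknown error"))
```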