GPUBox for developers

Sovereign, OpenAI-compatible inference.
Billed in £.

Point your existing OpenAI client at our gateway and ship. UK data residency, transparent per-call pricing in pounds. Built for AI builders who want a sovereign, auditable answer — not another opaque endpoint.

Request API access Live API reference →

swap the base_url — keep your OpenAI code

from openai import OpenAI

client = OpenAI(
    api_key="gpb_...",
    base_url="https://api.gpubox.ai/v1",
)
# everything below is unchanged from your OpenAI code

Capabilities

OpenAI-compatible surfaces.

Read the docs →

Chat completions

live

Qwen2.5-32B-Instruct and QwQ-32B reasoning. Streaming, tool use, JSON mode. Drop-in /v1/chat/completions.

/v1/chat/completions

Vision / image input

live

Qwen2.5-VL-7B-Instruct. Send images inline as image_url (base64 data-URI or https URL) on the same /v1/chat/completions endpoint. OCR, screenshot/UI analysis, charts, visual Q&A.

/v1/chat/completions

Embeddings

live

BGE-M3 multilingual 1024-dim dense vectors, 8k context, L2-normalised. /v1/embeddings.

/v1/embeddings

Audio / speech-to-text

live

Whisper-large-v3-turbo via faster-whisper. 100+ languages, verbose JSON with segment timestamps. /v1/audio/transcriptions.

/v1/audio/transcriptions

Images

live

Text-to-image generation on UK GPUs, OpenAI-compatible request shape. Live on the API today (FLUX).

/v1/images/generations

Fine-tunes (LoRA)

coming soon

Train a LoRA adapter on your data, host it with us, call it by name — weights stay in the UK. Coming with the Factory product; not yet on the API.

/v1/training/runs

Compliance & residency

UK GDPR, stated plainly.

Operated by a UK-incorporated counterparty (Mobile Paradigm Consultancy Ltd, VAT GB397067846). UK GDPR Article 28 DPA with named-subprocessor disclosure and IDTA / SCCs where data flows outside the UK. Per-call audit log retained 30 days minimum.

Residency

UK data residency; billing and payment rails in £. Inference runs on UK-domiciled hardware — we never claim compute or data residency outside the UK.

Workload fit

Best fit for fine-tunes, batch jobs, evaluation, and development inference. We do not position GPUBox for real-time, low-latency consumer serving.

SLA

B2B SLA is best-effort and lower than our consumer tier — no uptime percentage is promised today. Dedicated-capacity SLAs are available on enterprise contracts.

Pricing — GBP

Published pay-as-you-go rates.

What	Rate	Unit
Chat completions (LLM)	£1.00	per 1M tokens
Speech-to-text	£0.005	per audio minute
Embeddings	£0.05	per 1M tokens

Chat completions (LLM)

£1.00per 1M tokens

Speech-to-text

£0.005per audio minute

Embeddings

£0.05per 1M tokens

Published pay-as-you-go rates, excluding VAT. UK customers see VAT added at checkout. See full pricing for the complete card.

See full pricing →

Request API access

Tell us about your workload and we will follow up. No key is issued from this page and no account is created — this is a conversation starter, not a sign-up.

Name

Company

Workload

Monthly volume

Region

Request API access →

Enter your name and company to continue. No API key is issued and no account is created from this page.

Sovereign, OpenAI-compatible inference.Billed in £.

OpenAI-compatible surfaces.

Chat completions

Vision / image input

Embeddings

Audio / speech-to-text

Images

Fine-tunes (LoRA)

UK GDPR, stated plainly.

Published pay-as-you-go rates.

Request API access

Sovereign, OpenAI-compatible inference.
Billed in £.