GPUBox vs RunPod

Different products with different audiences. RunPod is GPU container rental — you bring a model, they rent you the hardware by the hour. It's the right answer when you need a specific model that nobody else hosts, or when you're running custom training.

GPUBox is an inference API. We host curated models on UK-domiciled hardware, you call them via the OpenAI-compatible surface and pay per token. It's the right answer when you want base_url = "https://api.gpubox.ai/v1" and to be done.
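That one-URL switch, sketched two ways: the three-line change with the official OpenAI Python SDK, and the equivalent raw HTTP request in stdlib-only Python for environments without the SDK. The API key is a placeholder; the endpoint path and model name come from the comparison table below.

```python
# With the official OpenAI SDK, the whole switch is the base_url line:
#
#   from openai import OpenAI
#   client = OpenAI(api_key="YOUR_KEY", base_url="https://api.gpubox.ai/v1")
#   client.chat.completions.create(
#       model="Qwen2.5-32B-Instruct",
#       messages=[{"role": "user", "content": "Hello"}],
#   )
#
# The same call as a raw HTTP request, stdlib only (built here, not sent):
import json
import urllib.request

BASE_URL = "https://api.gpubox.ai/v1"

def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Construct a POST to /v1/chat/completions against GPUBox."""
    body = json.dumps({
        "model": "Qwen2.5-32B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is one more line, `urllib.request.urlopen(build_chat_request(key, "Hello"))`; any OpenAI-compatible client works the same way.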

| Attribute | GPUBox | RunPod |
|---|---|---|
| Product shape | OpenAI-compatible API. You call /v1/chat/completions with a model name. We host the GPU. | GPU pod rental. You rent containers by the hour, install your own runtime, manage your own scaling. |
| Pricing model | Per-call. £1.00 per million tokens (chat), £0.005 per audio minute. No idle charges. | Per-hour pod rental. ~$0.34/hr for RTX 4090, ~$0.69/hr for RTX A6000, ~$2.69/hr for H100. Pay for the time the pod is running. |
| Hardware location | United Kingdom. Single-region by design. UK-incorporated operating company. | Global. ~30 regions across NA / EU / Asia. Customers self-select region. |
| Data sovereignty | UK-domiciled hardware, UK company, UK jurisdiction. No data leaves the UK without your action. | Region-dependent. Most regions are US-based; EU regions exist. No specific UK-sovereign offering. |
| Setup time | Change one URL in your existing OpenAI SDK code. Three lines. No infrastructure to provision. | Pick a template (vLLM, Ollama, custom) → spin up a pod → wait for warm-up → expose an endpoint. Minutes per pod. |
| Model selection | Curated: Qwen2.5-32B-Instruct (chat), Whisper-large-v3-turbo (audio), BGE-M3 embeddings (coming soon). | Anything you can fit in a container. Bring-your-own-model. Hundreds of community templates. |
| Cold starts | None. Models stay warm; first token is sub-second on chat completions. | Yes, on the serverless tier, mitigated by their FlashBoot warm pool. The pod tier has no cold start: you pay to keep it warm. |
| Idle charges | None. You pay only for the tokens you generate. | Yes, on the pod tier. The card is yours while the pod runs, even if you make no requests. |
| Scale ceiling | A single RTX 5090 today. Capacity-planned, not auto-elastic. Email us for dedicated capacity. | Effectively unlimited. Spin up as many pods as you can pay for. |
| Custom models / fine-tuning | Not yet on the API; on the roadmap (Factory product). Available manually for partnership engagements. | Yes. Bring any container; train, fine-tune, and run any model you can package. |
| OpenAI SDK compatibility | Full. Python, Node, Go, curl: every official SDK works with one URL change. | Depends on the template. vLLM templates, yes; custom containers expose whatever you implement. |
| Billing | GBP, with VAT-compliant invoicing for UK and EU customers. | Primarily USD. International billing supported. |
| Audit log | Per-call audit log retained for at least 30 days. Token-level usage in the dashboard. | Pod-level usage logs. Per-request logging is your responsibility, since you run the runtime. |

Pick GPUBox if

  • You want OpenAI-compatible API access in three lines of code.
  • You need UK data residency for regulatory or contractual reasons.
  • You want predictable per-token pricing, not per-hour pod meters.
  • You want managed model serving — no template selection, no warm-up tuning.
  • Curated chat / audio / embeddings models cover your use case.

Pick RunPod if

  • You need a specific OSS model GPUBox doesn't host.
  • You're running custom training and need raw container control.
  • You need cards we don't have (H100, A100, multi-GPU pods).
  • You're operating globally and want region selection per pod.
  • Your traffic is bursty enough that hourly metering wins on cost.

Try the drop-in for yourself.

Email us for a same-day API key. First £20 of usage is on us.