Comparison

GPUBox vs Together.ai

Both serve open-source models behind an OpenAI-compatible API. The honest difference: catalog breadth vs jurisdiction. Together hosts hundreds of models in US infrastructure. We host a curated few in the UK.

If catalog breadth matters more than where the data lives, Together wins on that axis. If your stakeholders ask "is the inference happening in the UK?" and "adequate" isn't the answer they want, GPUBox is the answer.

Attribute	GPUBox	Together.ai
API surface	OpenAI-compatible. Drop-in replacement at /v1.	OpenAI-compatible. Drop-in replacement at /v1.
Hosting jurisdiction	United Kingdom. UK-incorporated operating company. UK VAT registered.	United States primarily. SOC 2 Type II.
Model catalog size	Six live models, curated (chat, reasoning, vision, audio, embeddings, image). Quality over breadth.	200+ open-source models — Llama, Mixtral, Qwen, DeepSeek, Stable Diffusion, audio, embeddings.
Frontier model access	Qwen2.5-32B (chat), QwQ-32B (reasoning), Qwen2.5-VL-7B (vision), Whisper, BGE-M3 — version-pinned; you pick what you call.	DeepSeek-V3, Llama 3.3 405B, Mixtral, Qwen variants. Larger frontier OSS options.
Pricing — chat completions	£1.00 per 1M tokens (blended input + output). Currently ~$1.25 at GBP/USD.	Tiered by model. ~$0.18/M for Qwen2.5-7B → ~$0.88/M for DeepSeek-V3 → $5/M for Llama 3.3 405B.
Pricing transparency	Single blended rate per model. No separate input/output rates. Published at /pricing.	Per-model pricing. Separate input vs output rates. Discounted for batch.
Currency	GBP. VAT-compliant invoicing for UK and EU.	USD.
Streaming + tools	Streaming SSE, JSON mode, function calling — all OpenAI-compatible.	Streaming SSE, JSON mode, function calling.
Fine-tuning service	LoRA + full fine-tuning via /v1/training; paid fine-tune plans live (NG via Paystack, UK via Stripe).	LoRA + full fine-tuning available. Bring data, get a serving endpoint.
Dedicated capacity	Available for sovereign / regulated customers — reserved hardware, signed DPA. See /sovereignty.	Together Reserved tier — dedicated GPU clusters. Enterprise sales.
Audit log	Per-call audit log retained 30 days minimum.	Usage analytics in dashboard. Audit log details vary by tier.
Audience	UK developers, regulated industries, sovereignty-conscious enterprises.	Global AI developers, OSS researchers, anyone wanting a wide model catalog.

API surface

GPUBox

OpenAI-compatible. Drop-in replacement at /v1.

Together.ai

OpenAI-compatible. Drop-in replacement at /v1.

Hosting jurisdiction

GPUBox

United Kingdom. UK-incorporated operating company. UK VAT registered.

Together.ai

United States primarily. SOC 2 Type II.

Model catalog size

GPUBox

Six live models, curated (chat, reasoning, vision, audio, embeddings, image). Quality over breadth.

Together.ai

200+ open-source models — Llama, Mixtral, Qwen, DeepSeek, Stable Diffusion, audio, embeddings.

Frontier model access

GPUBox

Qwen2.5-32B (chat), QwQ-32B (reasoning), Qwen2.5-VL-7B (vision), Whisper, BGE-M3 — version-pinned; you pick what you call.

Together.ai

DeepSeek-V3, Llama 3.3 405B, Mixtral, Qwen variants. Larger frontier OSS options.

Pricing — chat completions

GPUBox

£1.00 per 1M tokens (blended input + output). Currently ~$1.25 at GBP/USD.

Together.ai

Tiered by model. ~$0.18/M for Qwen2.5-7B → ~$0.88/M for DeepSeek-V3 → $5/M for Llama 3.3 405B.

Pricing transparency

GPUBox

Single blended rate per model. No separate input/output rates. Published at /pricing.

Together.ai

Per-model pricing. Separate input vs output rates. Discounted for batch.

Currency

GPUBox

GBP. VAT-compliant invoicing for UK and EU.

Together.ai

USD.

Streaming + tools

GPUBox

Streaming SSE, JSON mode, function calling — all OpenAI-compatible.

Together.ai

Streaming SSE, JSON mode, function calling.

Fine-tuning service

GPUBox

LoRA + full fine-tuning via /v1/training; paid fine-tune plans live (NG via Paystack, UK via Stripe).

Together.ai

LoRA + full fine-tuning available. Bring data, get a serving endpoint.

Dedicated capacity

GPUBox

Available for sovereign / regulated customers — reserved hardware, signed DPA. See /sovereignty.

Together.ai

Together Reserved tier — dedicated GPU clusters. Enterprise sales.

Audit log

GPUBox

Per-call audit log retained 30 days minimum.

Together.ai

Usage analytics in dashboard. Audit log details vary by tier.

Audience

GPUBox

UK developers, regulated industries, sovereignty-conscious enterprises.

Together.ai

Global AI developers, OSS researchers, anyone wanting a wide model catalog.

Pick GPUBox if

UK data residency is a contractual or regulatory requirement.
GBP invoicing matters for accounts payable.
You want one blended rate, not per-model pricing maps.
Curated models cover your use case (chat, reasoning, vision, audio, embeddings, image).
You want a UK-incorporated counterparty for the DPA.

Pick Together.ai if

You need a specific OSS model not on our menu.
You're running OSS-research breadth across many model families.
US data residency is fine for your customers.
You need 405B-class models — we run a 32B today.

Try the drop-in for yourself.

Email us for a same-day API key. First £20 of usage is on us.

Get an API key Read the quickstart →