GPU Inferencing & Cloud

Dedicated GPU Infrastructure for AI at Scale

Bare-metal NVIDIA GPUs — from T4 to full DGX H100 nodes — with Triton, JupyterLab, MIG, InfiniBand, and automated fine-tuning jobs built in.

Sub-60s Provisioning

10 Global Regions

DDoS Protected

Hourly or Monthly

GPU Metrics Dash

Auto Fine-Tuning

GPU Cloud Instances

Every instance includes CUDA 12, cuDNN 9, and a full ML framework stack. Commit monthly to save up to 13% vs hourly.

NVIDIA T4

16 GB GDDR6

65 TFLOPS FP16

PCIe 3.0 ×16

GPU-T4

$320/mo

or $0.504/hr billed hourly

vCPUs

8

System RAM

32 GB

NVMe Storage

200 GB

Bandwidth

10 TB

Best for

Inference serving · Small model fine-tuning · Video analytics · NLP pipelines
  • CUDA 12 + cuDNN 9
  • TensorRT 10 pre-installed
  • NVMe SSD storage
  • Snapshots included
  • IPv4 + IPv6
Deploy GPU-T4
Most Popular

NVIDIA A100

40 GB HBM2e

312 TFLOPS FP16

PCIe 4.0 ×16 / NVLink 3.0

GPU-A100-40

$1200/mo

or $1.890/hr billed hourly

vCPUs

16

System RAM

64 GB

NVMe Storage

400 GB

Bandwidth

15 TB

Best for

LLM training · Diffusion model serving · Scientific HPC · Computer vision
  • CUDA 12 + cuDNN 9
  • TensorRT 10 pre-installed
  • NVMe RAID storage
  • Snapshots + automated backups
  • IPv4 + IPv6
  • Priority support
Deploy GPU-A100-40

NVIDIA A100

80 GB HBM2e

312 TFLOPS FP16

PCIe 4.0 ×16 / NVLink 3.0

GPU-A100-80

$2200/mo

or $3.466/hr billed hourly

vCPUs

32

System RAM

128 GB

NVMe Storage

800 GB

Bandwidth

20 TB

Best for

70B+ LLM training · Generative AI · Molecular dynamics · RL workloads
  • CUDA 12 + cuDNN 9
  • TensorRT 10 + Triton
  • NVMe RAID storage
  • Snapshots + automated backups
  • IPv4 + IPv6
  • Priority support
Deploy GPU-A100-80

NVIDIA H100

80 GB HBM3

989 TFLOPS FP16

PCIe 5.0 ×16 / NVLink 4.0

GPU-H100

$4800/mo

or $7.562/hr billed hourly

vCPUs

48

System RAM

192 GB

NVMe Storage

1600 GB

Bandwidth

Unmetered

Best for

Foundation model training · Multi-modal AI · Protein folding · Autonomous systems
  • CUDA 12 + cuDNN 9
  • TensorRT 10 + Triton + NCCL
  • NVMe RAID storage
  • Full backup suite
  • Unmetered bandwidth
  • Dedicated account manager
Deploy GPU-H100

8× NVIDIA H100

640 GB HBM3 (8× 80 GB)

7,912 TFLOPS FP16 (cluster)

NVLink 4.0 900 GB/s + InfiniBand HDR

GPU-H100x8

$36000/mo

or $56.712/hr billed hourly

vCPUs

192

System RAM

768 GB

NVMe Storage

6400 GB

Bandwidth

Unmetered

Best for

Pre-training GPT-class models · Distributed RL · Climate simulations · Drug discovery
  • Full DGX H100 node
  • NVLink 4.0 — 900 GB/s GPU-to-GPU
  • 100 Gbps InfiniBand HDR
  • NVMe RAID 6 — 6.4 TB
  • Unmetered 100 Gbps uplink
  • Dedicated SRE + 15-min SLA
Deploy GPU-H100x8
Advanced Facilities

Everything You Need to Ship AI in Production

Not just raw compute — every GPU instance ships with a production-grade ML platform built in.

Included on A100 / H100

Model Serving Stack

NVIDIA Triton Inference Server pre-configured with dynamic batching, model ensembles, and gRPC/REST endpoints out of the box.
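Triton exposes the standard KServe v2 HTTP protocol, so a model can be called with a plain JSON POST. The sketch below only assembles a v2 inference request body; the model name `my_model`, input name `input__0`, and the instance address in the comment are placeholders, not names provisioned by the platform.

```python
import json

def build_infer_request(input_name, data, datatype="FP32"):
    """Build a KServe v2 request body, as accepted by Triton's
    HTTP endpoint: POST /v2/models/<model>/infer."""
    return {
        "inputs": [{
            "name": input_name,
            "shape": [1, len(data)],     # batch of 1
            "datatype": datatype,
            "data": data,
        }]
    }

payload = build_infer_request("input__0", [0.1, 0.2, 0.3])
body = json.dumps(payload)

# Sending it (placeholders for host and model name):
#   curl -X POST http://<instance-ip>:8000/v2/models/my_model/infer \
#        -H 'Content-Type: application/json' -d "$body"
```

Triton's default HTTP port is 8000; dynamic batching means concurrent single-item requests like this are merged server-side into larger batches.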

All plans

JupyterLab Environment

One-click JupyterLab with GPU passthrough, pre-installed PyTorch 2, TensorFlow 2, JAX, and Hugging Face Transformers.

All plans

Persistent Checkpoints

Attach up to 50 TB of NVMe block storage for training checkpoints. Snapshots are crash-consistent and taken automatically every 6 hours.

H100×8 only

InfiniBand HDR

100 Gbps InfiniBand fabric with RDMA for sub-microsecond GPU-to-GPU communication across the DGX node. NCCL fully tuned.

A100 + H100

Automated Fine-Tuning Jobs

Submit LoRA / QLoRA / full fine-tuning jobs via a REST API. Jobs are queued, executed on your GPU, and results pushed to your S3-compatible bucket.
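A job submission might look like the sketch below. The endpoint path, header, and every field name in the spec are assumptions made for illustration; consult the actual API reference for the real schema.

```python
import json

# Hypothetical fine-tuning job spec -- field names are illustrative,
# not a documented schema.
job = {
    "method": "qlora",                        # lora | qlora | full
    "base_model": "meta-llama/Llama-3.1-8B",
    "dataset_url": "s3://my-bucket/train.jsonl",
    "output_bucket": "s3://my-bucket/checkpoints/",
    "hyperparameters": {"epochs": 3, "lr": 2e-4, "lora_rank": 16},
}
body = json.dumps(job)

# Hypothetical submission call:
#   curl -X POST https://api.example.com/v1/finetune-jobs \
#        -H 'Authorization: Bearer <token>' -d "$body"
```

Results would then land in the S3-compatible bucket named in `output_bucket` once the queued job completes on your GPU.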

All plans

GPU Metrics Dashboard

Real-time GPU utilisation, VRAM consumption, power draw, temperature, PCIe / NVLink throughput, and SM occupancy via a hosted Grafana dashboard.

H100 + H100×8

Secure Enclaves & NVDEC

Confidential computing with NVIDIA Hopper TEE (H100 only). Hardware-accelerated video decode/encode via NVDEC/NVENC for media pipelines.

A100 / H100

Multi-Instance GPU (MIG)

Slice an A100 or H100 into up to 7 isolated MIG instances — each with dedicated VRAM, bandwidth, and fault isolation — via the Hostoya control panel.
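Under the hood, MIG slicing corresponds to standard `nvidia-smi` operations. The snippet below only assembles those commands (it does not run them, since they require root on a MIG-capable GPU); profile ID 19 is the 1g.5gb profile on an A100 40 GB, giving the maximum of seven slices. The control panel issues the equivalent calls for you.

```python
# Enable MIG mode on GPU 0 (requires root; a GPU reset may be needed):
enable_mig = ["nvidia-smi", "-i", "0", "-mig", "1"]

# Create seven 1g.5gb GPU instances (profile 19 on A100-40) and, with
# -C, a compute instance inside each one:
create_slices = ["nvidia-smi", "mig", "-cgi", ",".join(["19"] * 7), "-C"]

# List the resulting MIG devices:
list_slices = ["nvidia-smi", "-L"]
```

Each resulting MIG device shows up with its own UUID and can be handed to a container or framework like any standalone GPU.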

Supported Models

Deploy open-source checkpoints in seconds or bring your own fine-tuned model.

Model | Category | Parameter sizes | Min. GPU
LLaMA 3.1 / 3.3 | Text Generation | 8B · 70B · 405B | T4
Mistral / Mixtral | Text Generation | 7B · 8×7B · 8×22B | T4
Stable Diffusion XL | Image Generation | 6.6B | T4
FLUX.1 | Image Generation | 12B | A100-40
Whisper v3 Large | Speech-to-Text | 1.5B | T4
CLIP / SigLIP | Vision-Language | 400M · 4B | T4
Gemma 2 | Text Generation | 2B · 9B · 27B | T4
DeepSeek-R1 | Reasoning | 7B · 32B · 671B | A100-80
Qwen 2.5 | Text Generation | 7B · 72B | A100-40
Emu3 / CogVideoX | Video Generation | 8B | A100-80
AlphaFold 3 | Protein Folding | N/A | H100
Custom BYOM | Any | Any checkpoint | T4

GPU Pricing Calculator

Estimate your monthly cost based on GPU type, daily usage hours, and inference throughput.

Configure Workload

GPU type: NVIDIA T4 (16 GB GDDR6); options: T4 · A100-40 · A100-80 · H100 · H100×8
Daily usage: 8 hr (range: 1 hr to 24 hr always-on)
Inference throughput: 5,000 req/hr (range: 100 req/hr to 100 K req/hr)

Monthly Cost Estimate

NVIDIA T4 compute (monthly) | $320.00
Inference API (1,200 K req/mo @ $0.08/K) | $96.00
Network egress | Included
JupyterLab + Triton serving | Included
GPU metrics dashboard | Included
Total / month | $416.00

Effective rate: $1.733 per GPU-hour
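The example estimate is simple arithmetic: compute is the flat monthly commit, inference is metered at $0.08 per 1,000 requests, and the effective rate divides the total by GPU-hours used. A minimal sketch, assuming a 30-day month:

```python
def monthly_estimate(compute_monthly, hours_per_day, req_per_hour,
                     per_k_req=0.08, days=30):
    """Reproduce the calculator: flat monthly compute plus
    metered inference at $0.08 per 1,000 requests."""
    gpu_hours = hours_per_day * days
    requests = req_per_hour * gpu_hours
    inference = requests / 1000 * per_k_req
    total = compute_monthly + inference
    return {
        "compute": compute_monthly,
        "inference": round(inference, 2),
        "total": round(total, 2),
        "effective_hourly": round(total / gpu_hours, 3),
    }

# T4 at 8 hr/day and 5,000 req/hr:
est = monthly_estimate(320.00, 8, 5000)
# → {'compute': 320.0, 'inference': 96.0, 'total': 416.0,
#    'effective_hourly': 1.733}
```

At 240 GPU-hours per month, the $416.00 total works out to the $1.733 effective hourly rate shown above, versus $0.504/hr for pure hourly billing on the T4.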

Deploy NVIDIA T4
No setup fees
Sub-60s provisioning
99.9% uptime SLA
Pause & resume any time