Agentic AI, Post-Training
PyTorch, vLLM
Ray
Kubernetes
Bare-Metal GPUs
Workloads like agentic AI and post-training are exploding. They run on PyTorch and inference engines like vLLM. Those frameworks rely on distributed compute — increasingly Ray — which sits on top of Kubernetes and bare-metal GPU clusters.
Kubernetes can schedule containers. It can’t manage multi-tenant GPU allocation, handle fractional GPU sharing, or optimize utilization across training and inference jobs competing for the same hardware. Kueue (job queueing and quotas) and Volcano (batch and gang scheduling) tackle pieces of this, but neither is production-complete for GPU-aware, multi-tenant scheduling.
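To make the fractional-sharing gap concrete: the Kubernetes device-plugin model hands out whole devices, so a request like a quarter of a GPU is invisible to the default scheduler. Below is a minimal Python sketch of what a fractional allocator has to do instead, using best-fit bin packing to limit fragmentation. All names here (`GPU`, `allocate`, the capacity model) are hypothetical illustrations, not any real scheduler’s API.

```python
from dataclasses import dataclass, field

@dataclass
class GPU:
    """One physical GPU; capacity is fractional (1.0 == whole device)."""
    gpu_id: str
    free: float = 1.0
    tenants: list = field(default_factory=list)

def allocate(gpus, job_id, fraction):
    """Best-fit bin packing: place the job on the GPU whose remaining
    capacity is the smallest that still fits, reducing fragmentation.
    Returns the chosen GPU id, or None if nothing fits."""
    candidates = [g for g in gpus if g.free >= fraction]
    if not candidates:
        return None
    best = min(candidates, key=lambda g: g.free)
    best.free = round(best.free - fraction, 6)
    best.tenants.append((job_id, fraction))
    return best.gpu_id

gpus = [GPU("gpu-0"), GPU("gpu-1")]
allocate(gpus, "train-a", 0.5)   # gpu-0 (ties broken by list order)
allocate(gpus, "infer-b", 0.25)  # best fit keeps packing gpu-0
allocate(gpus, "infer-c", 0.75)  # only gpu-1 has room left
```

A real system layers isolation (MIG partitions, time-slicing, or MPS) under this bookkeeping; the sketch only shows the placement decision Kubernetes doesn’t make on its own.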
We sit between orchestration and distributed compute. Our PySpark-based control plane manages GPU cluster lifecycle — scheduling, allocation, and optimization — bridging Spark’s data pipeline strengths with Kubernetes-native orchestration.
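One way to picture the scheduling-and-optimization piece: when training and inference jobs compete for the same pool, the control plane has to decide admission order, not just placement. This toy greedy pass is a hedged sketch of that idea under an assumed model (jobs as `(name, priority, gpu_demand)` tuples, capacity in whole-GPU units with fractions allowed); it is not our actual control-plane logic.

```python
def plan(jobs, capacity):
    """Admit jobs in descending priority until GPU capacity runs out.
    Latency-sensitive inference typically carries higher priority and
    small fractional demands; training asks for whole GPUs in bulk."""
    admitted, queued = [], []
    for name, priority, demand in sorted(jobs, key=lambda j: -j[1]):
        if demand <= capacity:
            capacity = round(capacity - demand, 6)
            admitted.append(name)
        else:
            queued.append(name)
    return admitted, queued, capacity

jobs = [("train-big", 1, 4.0), ("serve-a", 3, 0.5), ("serve-b", 2, 0.5)]
admitted, queued, leftover = plan(jobs, capacity=4.0)
# serving jobs fit; the 4-GPU training job waits for capacity
```

A production scheduler would add preemption, gang scheduling for multi-node training, and backfill; the point of the sketch is that this decision layer lives above Kubernetes and below Ray.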
GPU availability is the bottleneck. We make sure every GPU-hour counts.