Orchestrating 1,000+ A100 GPUs for GenAI features serving 200M+ global users with HAMi GPU sharing and KEDA autoscaling.
2× fewer GPUs with HAMi GPU sharing, handling 700% traffic spikes
- 2× fewer GPUs needed for training + inference pipelines via HAMi GPU sharing.
- USD 17.4M in estimated cost savings compared to equivalent on-demand cloud GPU provisioning.
- MTTR reduced by 91% (from ~2 hrs to ~10 min); GPU surge errors dropped by 85%.
Improving GPU utilization for autonomous driving workloads with HAMi-based GPU virtualization on Kubernetes.
10× GPU utilization improvement in CI pipelines
- 30% reduction in GPU hours for simulation workloads.
- Hybrid GPU sharing strategy combining HAMi with MIG and time-slicing.
Scaling machine learning infrastructure with HAMi-based GPU virtualization on Kubernetes.
3× improvement in platform GPU utilization
- Improved overall cluster GPU efficiency under mixed workloads.
- Enabled faster rollout of AI features with more predictable scheduling.
Building a flexible GPU cloud with HAMi to increase utilization and improve delivery speed.
>80% average GPU utilization after vGPU adoption
- GPU operating costs reduced by around 50%.
- Typical environment delivery time cut from about one day to around 20 minutes.
Building Effective GPU, a virtualization and pooling solution for heterogeneous AI accelerators, with HAMi.
Up to 57% GPU savings for production and test clusters
- Reduced GPU waste in both production and non-production environments.
- Improved utilization with a unified pool across heterogeneous accelerators.
Improving AI inference orchestration with HAMi in education-focused workloads.
90% of GPU infrastructure optimized using HAMi
- Most GPU infrastructure was standardized and optimized through HAMi.
- Strengthened stability and efficiency for inference-heavy traffic.