Viraat Das
| Viraat Das | |
|---|---|
| Occupation | Entrepreneur, software engineer |
| Known for | Co-founder and CEO of Exla |
Viraat Das is an American entrepreneur and software engineer who is the co-founder and CEO of Exla, a technology company that develops a software development kit (SDK) for running transformer-based artificial intelligence models across a wide range of hardware platforms. Exla was part of the Y Combinator Winter 2025 batch.[1]
Career
Prior to founding Exla, Das worked as a machine learning engineer at Amazon.
In January 2025, Das co-founded Exla with Pranav Nair. The company's SDK is designed to optimize and deploy transformer models on hardware ranging from edge devices to data center GPUs. Supported model types include large language models (LLMs), vision-language models (VLMs), vision-language-action models (VLAs), and computer vision models.[2]
Exla's core technology applies aggressive quantization, which stores model weights at reduced numerical precision, to shrink the memory footprint of AI models and increase inference speed. According to the company, its optimizations can reduce model memory usage by up to 80% and speed up inference by a factor of 3 to 20 compared with unoptimized baselines. The SDK supports a range of hardware targets, including NVIDIA Jetson edge devices, NVIDIA consumer and data center GPUs, Apple Silicon, Intel processors with AVX-512, ARM processors with NEON, Raspberry Pi, and mobile devices running iOS and Android.
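To illustrate why lower precision shrinks a model, the following Python sketch shows generic symmetric per-tensor int8 quantization together with the weight-storage arithmetic. It is a textbook technique, not Exla's implementation, which the sources do not describe; the function names are hypothetical, and the footprint figures count weight storage only (no activations, KV cache, or quantization scales).

```python
# Generic symmetric int8 quantization sketch (not Exla's method).

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the stored integers and the scale."""
    return [v * scale for v in q]

def footprint_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone at a given bit width."""
    return num_params * bits_per_param // 8

weights = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)  # each value within scale/2 of the original

fp16 = footprint_bytes(7_000_000_000, 16)  # 14,000,000,000 bytes
int4 = footprint_bytes(7_000_000_000, 4)   # 3,500,000,000 bytes
print(f"int4 saves {1 - int4 / fp16:.0%}")  # int4 saves 75%
```

A single per-tensor scale is the simplest choice; production toolkits commonly use per-channel or per-group scales to reduce rounding error.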
The company publishes performance benchmarks for its optimizations. Using what it describes as an open-source GPT model with 120 billion parameters, Exla claims inference speeds of approximately 200 tokens per second on an NVIDIA Jetson (a 5x improvement), 35,000 tokens per second on an NVIDIA H100 (7x), 30 tokens per second on an iPhone 15 Pro (4x), and 75 tokens per second on an Apple M3 Max (4x).
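Each claimed speedup implies a throughput for the unoptimized baseline, obtained by dividing the optimized figure by the speedup factor. The short Python sketch below performs that division; the resulting baselines are derived quantities, not figures the company reports, and the dictionary and function names are illustrative.

```python
# Company-claimed figures: (optimized tokens per second, claimed speedup factor).
claims = {
    "NVIDIA Jetson": (200, 5),
    "NVIDIA H100": (35_000, 7),
    "iPhone 15 Pro": (30, 4),
    "Apple M3 Max": (75, 4),
}

def implied_baseline(optimized_tps: float, speedup: float) -> float:
    """Baseline throughput implied by an optimized throughput and its speedup."""
    return optimized_tps / speedup

# Derived, not reported: e.g. 200 tok/s at 5x implies a 40 tok/s baseline.
baselines = {device: implied_baseline(tps, s) for device, (tps, s) in claims.items()}

for device, base in baselines.items():
    print(f"{device}: implied baseline ~{base:g} tokens/s")
```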
Exla's SDK is designed to integrate with existing machine learning workflows, allowing developers to load models from sources such as Hugging Face and apply hardware-specific optimizations with a few lines of Python code. Users can specify target hardware and optional memory constraints during the optimization process.
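The sources do not give the SDK's function names, so the following is a hypothetical Python sketch of only the decision this paragraph describes: given a model, a target, and an optional memory limit, choose the highest weight precision that fits. `OptimizationRequest`, `choose_bits`, the placeholder model ID, and the candidate bit widths are illustrative assumptions, not Exla's API, and the estimate counts weight storage only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OptimizationRequest:
    """Hypothetical request shape; not Exla's actual interface."""
    model_id: str                           # placeholder Hugging Face-style ID
    num_params: int                         # parameter count of the model
    target: str                             # hardware label (unused in this sketch)
    max_memory_bytes: Optional[int] = None  # optional memory constraint

def choose_bits(req: OptimizationRequest, candidates=(16, 8, 4)) -> Optional[int]:
    """Return the highest bit width whose weight footprint fits the budget."""
    if req.max_memory_bytes is None:
        return max(candidates)  # no constraint: keep the highest precision
    for bits in sorted(candidates, reverse=True):
        if req.num_params * bits // 8 <= req.max_memory_bytes:
            return bits
    return None  # nothing fits within the budget

# An 8-billion-parameter model under an 8 GiB budget: 16-bit weights (16 GB)
# do not fit, 8-bit weights (8 GB) do.
req = OptimizationRequest("some-org/some-model", 8_000_000_000, "jetson", 8 * 1024**3)
print(choose_bits(req))
```

A real optimizer would also weigh the target's supported data types and the accuracy cost of each precision; this sketch checks memory alone.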
In addition to its SDK, Exla launched a service for on-demand GPU cluster provisioning. The company also offers custom optimization solutions for organizations with specific model deployment or hardware requirements.
Exla operates in the fields of artificial intelligence, edge computing, and computer vision.
References
- "Exla – An SDK to run transformer models anywhere". Exla. Retrieved 2026-03-19.
- "Exla – An SDK to run transformer models anywhere". Exla. Retrieved 2026-03-19.