At Together AI, the AI Native Cloud, we’re obsessed with performance. Making large language models faster, cheaper, and more efficient is not a one-trick problem; it requires optimizing along multiple axes. That is the philosophy behind Together Turbo, our suite of inference innovations that draws on research in algorithms, architectures, and modeling recipes. We’re excited to introduce the AdapTive-LeArning Speculator System (ATLAS), the first speculator of its kind that gives automatic performance improvements without any manual tuning.

ATLAS offers a new way of doing speculative decoding, one that dynamically improves at runtime, and it fits seamlessly alongside our other Turbo techniques like the proprietary Together Turbo Speculator or Custom Speculators.

But why create an adaptive-learning speculator system?

Standard speculators are trained for general workloads. Custom speculators are trained on your specific data, but only for a specific snapshot in time. As the workload evolves (the codebase grows, traffic patterns shift, request distributions change), even highly customized speculators can fall behind. In contrast, ATLAS evolves automatically with usage, learning from both historical patterns and live traffic to continuously align with the target model’s behavior in real time. This means the more you use our inference service, the better ATLAS will perform!

Built on top of Together Turbo Speculator, ATLAS reaches up to 500 TPS on DeepSeek-V3.1 and up to 460 TPS on Kimi-K2 in a fully adapted scenario, 2.65x faster than standard decoding and outperforming even specialized hardware like Groq (Figure 1).

Figure 1: Decoding speed on NVIDIA HGX B200 with our Turbo speculator and the adaptive-learning speculator system for DeepSeek-V3.1 (top) and Kimi-K2-0905 (bottom) with Arena Hard traffic.[1]
Speculative Decoding

Speculative decoding is one of the most powerful levers for accelerating inference.[2] Instead of having the target model generate every t...
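
To make the draft-then-verify idea concrete, here is a minimal sketch of the greedy variant of speculative decoding. It is not ATLAS or the Together Turbo Speculator; the names `speculative_decode`, `toy_target`, and `toy_draft` are hypothetical, and the two "models" are toy functions that map a token sequence to the next token. The sketch assumes exact-match acceptance (a drafted token is kept only if the target would have produced the same token), which keeps the output identical to plain greedy decoding with the target model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a "model" maps the tokens so far to the next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: the target model produces one token per step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification rounds)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks the proposal. A real system scores all k
        #    positions in one batched forward pass; this toy calls the target
        #    per position and counts the whole check as one round.
        rounds += 1
        accepted: List[int] = []
        for guess in proposal:
            correct = target(tokens + accepted)
            if guess == correct:
                accepted.append(guess)
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(correct)
                break
        else:
            # Every guess matched, so the target contributes one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], rounds


def toy_target(seq: Sequence[int]) -> int:
    """Toy target: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: Sequence[int]) -> int:
    """Toy draft: agrees with the target except after the token 2."""
    return 9 if seq[-1] == 2 else (seq[-1] + 1) % 10


if __name__ == "__main__":
    out, rounds = speculative_decode(toy_target, toy_draft, [0], max_new_tokens=20)
    print(out == greedy_decode(toy_target, [0], 20), rounds)
```

Each round yields between 1 and k + 1 tokens, so the speedup depends on how often the draft agrees with the target. That agreement rate is what a speculator tuned to the workload aims to raise.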