You can now train OpenAI gpt-oss with RL and GRPO via Unsloth. Unsloth now offers the fastest inference (3x faster), lowest VRAM (50% less) and most context (8x longer) for gpt-oss RL vs. any other implementation, with no accuracy loss. Since RL on gpt-oss isn't yet vLLM compatible, we rewrote the Transformers inference code to deliver 3x faster inference for gpt-oss at ~21 tokens/s. For BF16, Unsloth also achieves the fastest inference (~30 tokens/s), especially relative to VRAM usage, using 50% less VRAM than any other implementation.

With Unsloth, you can train gpt-oss-20b with GRPO on 15GB of VRAM, and for free on Colab. Unsloth's new inference runs faster on any GPU, including A100s, H100s and old T4s. gpt-oss-120b fits on 80GB of VRAM. Unsloth is the only framework to support 4-bit RL for gpt-oss. All performance gains come from Unsloth's unique weight sharing, Flex Attention, Standby and custom kernels.

Reminder: Flash Attention 3 (FA3) is unsuitable for gpt-oss training since it currently doesn't support backward passes for attention sinks, causing incorrect training loss. If you're not using Unsloth, FA3 may be enabled by default, so please double-check it's not in use!

⚡Making Inference Much Faster

Inference is crucial in RL training. To achieve the fastest inference speed for gpt-oss without vLLM, we rewrote the Transformers inference code and integrated many innovations, including custom algorithms like Unsloth Flex Attention, plus torch.compile. The new inference was evaluated against an already optimized baseline (2x faster than native Transformers).

vLLM does not support RL for gpt-oss since it lacks BF16 training and LoRA support for gpt-oss. Without Unsloth, only BF16 training works, which pushes memory use 800%+ higher. Most frameworks enable FA3 by default (which reduces VRAM use and increases speed), but this causes incorrect training loss. You must disable FA3, yet disabling it prevents long-context training, so we implemented Unsloth Flex Attention instead.

We evaluated gpt-oss RL inference by ben...
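To make the GRPO setup above concrete, here is a minimal sketch of what training gpt-oss-20b with Unsloth and TRL's GRPOTrainer typically looks like. The model name, LoRA rank, dataset and toy reward function below are illustrative assumptions rather than the exact configuration we benchmark; our Colab notebooks contain the settings we actually use.

```python
# Minimal sketch: GRPO on gpt-oss-20b with Unsloth + TRL (assumed settings, not a benchmark config).
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

max_seq_length = 1024

# 4-bit loading is what keeps gpt-oss-20b within ~15GB of VRAM for RL.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules here are illustrative.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions near 200 characters. Replace with a task-specific reward.
    return [-abs(len(c) - 200) / 200.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # any prompt dataset works

trainer = GRPOTrainer(
    model=model,
    reward_funcs=reward_len,
    args=GRPOConfig(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_generations=4,            # completions sampled per prompt for GRPO
        max_prompt_length=max_seq_length - 256,
        max_completion_length=256,
        learning_rate=5e-6,
        max_steps=100,
        output_dir="outputs",
    ),
    train_dataset=dataset,
)
trainer.train()
```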
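If you do train gpt-oss outside Unsloth, one way to act on the FA3 warning above is to pin the attention backend explicitly when loading the model and then inspect what Transformers actually selected. This is a hedged sketch: which backend a given framework enables by default varies by version, and as noted above, avoiding FA3 this way comes at the cost of long-context training.

```python
# Sketch: force a non-FA3 attention backend when loading gpt-oss with plain Transformers.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",   # "eager" or "sdpa"; avoids FA3 kernels that lack attention-sink backward
)

# Double-check which attention implementation was actually picked.
print(model.config._attn_implementation)
```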