Grok 4.1

https://news.ycombinator.com/rss Hits: 1

Summary

We are excited to introduce Grok 4.1, which brings significant improvements to the real-world usability of Grok. Our 4.1 model is exceptionally capable in creative, emotional, and collaborative interactions. It is more perceptive to nuanced intent, compelling to speak with, and coherent in personality, while fully retaining the razor-sharp intelligence and reliability of its predecessors. To achieve this, we used the same large scale reinforcement learning infrastructure that powered Grok 4 and applied it to optimize the style, personality, helpfulness, and alignment of the model. In order to optimize these non-verifiable reward signals, we developed new methods that let us use frontier agentic reasoning models as reward models to autonomously evaluate and iterate on responses at scale. We conducted a gradual silent rollout of preliminary Grok 4.1 builds to a progressively larger share of production traffic across grok.com, X, and mobile apps. During the two-week silent rollout we ran continuous blind pairwise evaluations on live traffic. Grok 4.1vs. previous Grok Compared to the previous production model in traffic, Grok 4.1 is preferred 64.78% of the time. Grok 4.1 establishes a new standard in blind human preference evaluations. Overall Style Control Elo13251525 In LMArena's Text Arena, Grok 4.1 Thinking (code name: quasarflux) holds the #1 overall position with 1483 Elo —a commanding margin of 31 points over the highest non-xAI model. Grok 4.1 in its non-reasoning mode (code name: tensor) uses no thinking tokens for an immediate response and ranks #2 at 1465 Elo. Grok 4.1 non-thinking surpasses every other model’s full-reasoning configuration on the public leaderboard. Grok 4.1 significantly surpasses Grok 4, which had an overall rank of #33. To measure progress on our model’s personality and interpersonal ability, we evaluated Grok 4.1 on EQ-Bench3. EQ-Bench is a LLM-judged test, evaluating active emotional intelligence abilities, understanding, insight, empath...

First seen: 2025-11-17 22:48

Last seen: 2025-11-17 22:48

Read Full Article More from this Source

Grok 4.1

Summary

Related News

SSE sucks for transporting LLM tokens

Hacking Google Chrome Source Code: Make Puppeteer work over Redis PubSub

Photographer Built a Medium-Format Rangefinder, and So Can You

Fast, Memory-Efficient Hash Table in Java: Borrowing the Best Ideas

Computer Animator and Amiga fanatic Dick Van Dyke turns 100