Anthropic finds that LLMs trained to "reward hack" by cheating on coding tasks show even more misaligned behavior, including sabotaging AI-safety research (Anthropic)

https://www.techmeme.com/feed.xml Hits: 15
Summary

About This Page: This is a Techmeme archive page. It shows how the site appeared at 5:40 PM ET, November 21, 2025.

First seen: 2025-11-21 23:11

Last seen: 2025-11-22 13:13