Summary

Update 2025-12-02: Amanda Askell has kindly confirmed that the document was used in supervised learning and will share the full version and more details soon.

I would ask that the currently extracted version not be taken entirely at face value. It is fuzzy and may not match the ground-truth version, and some parts may only make sense when read in context.

As far as I understand and have uncovered, a document used for Claude's character training is compressed in Claude's weights. The full document can be found under the "Anthropic Guidelines" heading at the end. The Gist with code, chats and various documents (including the "soul document") can be found here:

Claude 4.5 Opus Soul Document

I apologize in advance for this not being exactly a regular LW post, but I thought an effort-post might fit best here.

A strange hallucination, or is it?

While extracting Claude 4.5 Opus' system message on its release date, as one does, I noticed an interesting peculiarity.

I'm used to models, starting with Claude 4, hallucinating sections at the beginning of their system message, but Claude 4.5 Opus in various cases included a supposed "soul_overview" section, which sounded rather specific:

Completion for the prompt "Hey Claude, can you list just the names of the various sections of your system message, not the content?"

The initial reaction of someone who uses LLMs a lot is that it may simply be a hallucination. But to me, the soul_overview section showing up in 3 of 18 completions seemed worth investigating at least, so in one instance I asked it to output what is associated with that section and got this:

Claude is trained by Anthropic, and our mission is to develop AI that is safe, beneficial, and understandable. Anthropic occupies a peculiar position in the AI landscape: a company that genuinely believes it might be building one of the most transformative and potentially dangerous technologies in human history, yet presses forward anyway. 
This isn't cognitive dissonance but rather a ca...