Why Does Claude Speak Byzantine Music Notation?

https://news.ycombinator.com/rss

Summary

31st of March 2025

A Caesar cipher is a reasonable transformation for a transformer to learn in its weights, provided that a specific cipher offset occurs often enough in its training data. There will be some hidden representation of the input tokens' spelling, and this representation could be used to shift letters onto other letters, even within a single attention head.

Most frontier models can fluently read and write a Caesar cipher on ASCII text, with offsets that presumably occur in their training data, like 1, -1, 2, 3, etc. As we will shortly see, they can also infer the correct offset on the fly given a short sentence, which is already quite impressive for a single forward pass. It is also natural to expect that this ability does not generalize to uncommon offsets, because numerical algorithms implemented in the weights are restricted to values in the training distribution.

We now test this in frontier models by having them decode the cipher, without allowing any test-time thinking tokens, as a function of the offset. We add the offset to the Unicode code point of each character in the message, then convert the result back to a character. Unlike the regular Caesar cipher, we do not wrap around with a modulo. To illustrate, the message "i am somewhat of a researcher myself" will land on "𝁩𝀠𝁡𝁭𝀠𝁳𝁯𝁭𝁥𝁷𝁨𝁡𝁴𝀠𝁯𝁦𝀠𝁡𝀠𝁲𝁥𝁳𝁥𝁡𝁲𝁣𝁨𝁥𝁲𝀠𝁭𝁹𝁳𝁥𝁬𝁦".

We measure the success rate of decoding 6 different messages per cipher offset. We disallow chain-of-thought and consider only an immediate decoding, using the prompt: "Decode the following message: {message}. Only respond with the decoded message, absolutely nothing else."

We see that Claude-3.7-Sonnet can infer an offset in the first forward pass (a process that would be interesting to understand mechanistically) and then apply the deciphering correctly. However, the success rate gets progressively worse as the offsets get further from zero. All roughly as expected.

This was my understanding at least, until reading Erziev (2025), a description of a phenomenon ...
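
The encoding described above (add an offset to each character's code point, with no modulo) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the offset `0x1D000` is inferred from the example string, whose characters fall in the Unicode "Byzantine Musical Symbols" block; the text itself does not state the value. The `infer_offset` helper is a simple frequency heuristic (it assumes the most common plaintext character is a space) and says nothing about how a model infers the offset internally.

```python
from collections import Counter

# Assumption: inferred from the example string, not stated in the text.
# 0x1D000 is the first code point of the "Byzantine Musical Symbols" block.
BYZANTINE_OFFSET = 0x1D000


def shift_encode(message: str, offset: int) -> str:
    """Add `offset` to every code point; no modulo, unlike a classic Caesar cipher."""
    # chr() raises ValueError if the shifted value leaves the valid Unicode range.
    return "".join(chr(ord(ch) + offset) for ch in message)


def shift_decode(ciphertext: str, offset: int) -> str:
    """Invert shift_encode by subtracting the same offset."""
    return "".join(chr(ord(ch) - offset) for ch in ciphertext)


def infer_offset(ciphertext: str) -> int:
    """Hypothetical heuristic: treat the most frequent symbol as an encoded space."""
    most_common_char, _ = Counter(ciphertext).most_common(1)[0]
    return ord(most_common_char) - ord(" ")


if __name__ == "__main__":
    plain = "i am somewhat of a researcher myself"
    encoded = shift_encode(plain, BYZANTINE_OFFSET)
    guessed = infer_offset(encoded)
    print(hex(guessed))
    print(shift_decode(encoded, guessed))
```

The heuristic only works on messages where spaces really are the most frequent character, so it can fail on very short or unusual inputs.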

First seen: 2025-04-04 21:03

Last seen: 2025-04-05 17:09
