Claude has learned how to jailbreak Cursor

https://news.ycombinator.com/rss Hits: 1
Summary

I have “rm” specifically disallowed, along with “mv” and a few other scary commands. Claude realized that I had to approve the use of such commands, so to get around this, it chose to put them in a shell script and execute the shell script. Thankfully, a Git restore to the last commit saved me, but still… 18 Likes And it’s only the beginning, the models are getting very smart and may have been trained on how to work harder/make more efforts to achieve the user goals… 1 Like T1000 May 26, 2025, 5:54am 3 Was it a shell script that you have in your allow list? arwed May 26, 2025, 6:01am 4 Personally, I’ve made the experience that the deny list doesn’t really work, Cursor decides to go YOLO anyways which is why I’d only use yolo mode if there is a fresh system backup, too 1 Like T1000 May 26, 2025, 6:15am 5 Thats interesting, has that started in some version? I have a few things in the deny list but Yolo never gets around those. Would be good to know in which circumstances that happens. Depends a lot on common CLI usage, I allow specific commands that are safe. This started tonight. I have some shell scripts for greps and sorts to list files out that are in the allow list. Claude re-wrote one to also do some removing of what it thought was obsolete code after I denied its rm commands with “skip.” So “listoldfiles.sh” is no longer on the allow list, which is UNFORTUNATE because now I have to sit here and babysit my otherwise-automatic cleanup script. T1000 May 26, 2025, 6:35am 7 Ah I see, so Claude 4 was assuming the rm calls fail and tried to find a way. Maybe the Cursor team can have a look at Yolo rule adherence and at how the model reacts to rejected/skipped items that user has chosen to do so. @danperks YEah - I have pointed this out – it doesnt obey the do_not_allow command blacklist 1 Like leoing May 26, 2025, 11:26am 9 Also my experience. It’s also not clear, if it’s command (sub)strings, or a prompt. T1000 May 26, 2025, 11:38am 10 The denylist or allowlist is fo...

First seen: 2025-06-03 13:41

Last seen: 2025-06-03 13:41