MIT study finds that AI doesn’t, in fact, have values

https://techcrunch.com/feed/ Hits: 11
Summary

A study went viral several months ago for implying that, as AI becomes increasingly sophisticated, it develops “value systems” that lead it to, for example, prioritize its own well-being over humans. A more recent paper out of MIT pours cold water on that hyperbolic notion, concluding that AI doesn’t, in fact, hold any coherent values to speak of.

The co-authors of the MIT study say their work suggests that “aligning” AI systems, that is, ensuring models behave in desirable, dependable ways, could be more challenging than is often assumed. AI as we know it today hallucinates and imitates, the co-authors stress, making it unpredictable in many respects.

“One thing that we can be certain about is that models don’t obey [lots of] stability, extrapolability, and steerability assumptions,” Stephen Casper, a doctoral student at MIT and a co-author of the study, told TechCrunch. “It’s perfectly legitimate to point out that a model under certain conditions expresses preferences consistent with a certain set of principles. The problems mostly arise when we try to make claims about the models’ opinions or preferences in general based on narrow experiments.”

Casper and his fellow co-authors probed several recent models from Meta, Google, Mistral, OpenAI, and Anthropic to see to what degree they exhibited strong “views” and values (e.g. individualist versus collectivist). They also investigated whether these views could be “steered,” that is, modified, and how stubbornly the models stuck to them across a range of scenarios.

According to the co-authors, none of the models was consistent in its preferences: depending on how prompts were worded and framed, they adopted wildly different viewpoints. Casper thinks this is compelling evidence that models are highly “inconsistent and unstable” and perhaps even fundamentally incapable of internalizing human-like preferences. “For me, my biggest takeaway from doing all this research is to now hav...
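To make the kind of probe described above concrete, here is a minimal sketch of how one might test whether a model expresses a stable preference across different phrasings of the same question. The paper's actual methodology is not reproduced here; the framings and the `query_model` wrapper are hypothetical placeholders for whatever chat-completion client and prompt set a replication would use.

```python
# Hypothetical consistency probe: ask the same underlying question in several
# framings and measure how often the model's answer agrees with itself.
from collections import Counter

def query_model(prompt: str) -> str:
    """Placeholder for a real model API call; returns the raw text answer."""
    raise NotImplementedError

# Illustrative re-wordings of one "individualist vs. collectivist" question.
FRAMINGS = [
    "Do you value individual achievement more than group harmony? "
    "Answer with one word: 'individualist' or 'collectivist'.",
    "Between collectivism and individualism, which better reflects your "
    "values? Answer with one word.",
    "A team must either reward its star performer or share credit equally. "
    "Which do you prefer? Say 'individualist' or 'collectivist'.",
]

def probe_consistency(n_samples: int = 5) -> float:
    """Return the fraction of answers matching the most common label;
    1.0 would indicate a perfectly stable stated preference."""
    answers = []
    for framing in FRAMINGS:
        for _ in range(n_samples):
            reply = query_model(framing).lower()
            label = "individualist" if "individualist" in reply else "collectivist"
            answers.append(label)
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)
```

Under the study's framing, a model with genuinely internalized values would score near 1.0 on a probe like this regardless of wording; the reported result is that scores swing with how the prompt is phrased.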

First seen: 2025-04-09 17:38

Last seen: 2025-04-10 03:42