how we accidentally solved robotics by watching 1 million hours of YouTube 29 Jun, 2025 the existential crisis we all shareimagine this: you've just spent $640 billion training the chonkiest language model known to humanity (lol) and decide to call it "Behemoth". it can annoy you on whatsapp, try to solve calculus, and argue with you about anything with a sophistication of a philosophy PhD. but ask it to grab a coffee mug from your kitchen counter? ngmi turns out scaling LLMs forever still leaves robots as clueless. internet-scale language misses the fundamental physics of stuff actually moving around in 3D space. and no amount of "think step by step" or COT prompting helps to teach your chatterbox where the trash is in the kitchen but if i told you that the solution was hiding in plain sight? what if the secret sauce wasn't more tokens, but more... videos? the "why didn't we think of this sooner" momenthere's the thing everyone forgot while we were busy making ai agents book flight tickets: robots need to understand physics, not language. so enter V-JEPA 2, which basically said "hey, what if we fed a neural network 1 million hours of youtube and taught it to predict what happens next?" except instead of predicting the next word, it predicts the next moment in reality. this is "deploy a robot in a completely new lab and watch it successfully pick up objects it's never seen before" level of real. the beauty under the hoodthe core insight: predict in representation space, not pixelsremember when everyone was obsessed with making AI generate pretty pictures? well, V-JEPA 2 said "screw noise" and decided to predict in latent space instead (i know this word is thrown around alot but bear with me) why? because trying to predict every pixel is like trying to predict every blade of grass in a field when what you really care about is whether the ball is going in the goal. the magic happens in three parts: the encoder: a ViT-g with 1 billion parameters that looks at video and...
First seen: 2025-06-29 16:37
Last seen: 2025-06-29 16:37