Robots can now learn to use tools just by watching us

Summary

Credit: UIUC HCA LAB

Despite decades of progress, most robots are still programmed for specific, repetitive tasks. They struggle with the unexpected and can't adapt to new situations without painstaking reprogramming. But what if they could learn to use tools as naturally as a child does, simply by watching videos?

I still remember the first time I saw one of our lab's robots flip an egg in a frying pan. It wasn't pre-programmed. No one was controlling it with a joystick. The robot had simply watched a video of a human doing it, and then did it itself. For someone who has spent years thinking about how to make robots more adaptable, that moment was thrilling.

Our team at the University of Illinois Urbana-Champaign, together with collaborators at Columbia University and UT Austin, has been exploring that very question: could robots watch someone hammer a nail or scoop a meatball, and then figure out how to do it themselves, without costly sensors, motion-capture suits, or hours of remote teleoperation? That idea led us to create a new framework we call "Tool-as-Interface," described in a preprint on the arXiv server. The goal is straightforward: teach robots complex, dynamic tool-use skills using nothing more than ordinary videos of people doing everyday tasks. All it takes is two camera views of the action, something you could capture with a couple of smartphones.

Here's how it works. The process begins with those two video frames, which a vision model called MASt3R uses to reconstruct a three-dimensional model of the scene. Then, using a rendering method known as 3D Gaussian splatting (think of it as digitally painting a 3D picture of the scene), we generate additional viewpoints so the robot can "see" the task from multiple angles. But the real magic happens when we digitally remove the human from the scene. With the help of Grounded-SAM, our system isolates just the tool and its interaction with the environment.
It is like telling the robot...
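The pipeline above can be sketched as a simple data flow. This is a toy illustration only: the real system runs MASt3R for 3D reconstruction, 3D Gaussian splatting for novel-view synthesis, and Grounded-SAM for segmentation, none of which are called here. Every function below is a hypothetical stand-in I invented so the stages and their order are visible.

```python
import numpy as np

def reconstruct_scene(view_a, view_b):
    """Stand-in for MASt3R: fuse two camera views into one 'scene' array.
    (Real MASt3R produces dense 3D pointmaps; we just average the views.)"""
    return (view_a.astype(float) + view_b.astype(float)) / 2.0

def render_novel_views(scene, n_views=4):
    """Stand-in for 3D Gaussian splatting: synthesize extra viewpoints.
    We fake 'new viewpoints' with small horizontal pixel shifts."""
    return [np.roll(scene, shift, axis=1) for shift in range(n_views)]

def remove_human(scene, human_mask):
    """Stand-in for Grounded-SAM-based human removal: zero out pixels
    covered by the human mask, leaving only the tool and environment."""
    out = scene.copy()
    out[human_mask] = 0.0
    return out

# Two toy 4x4 grayscale "camera views" and a mask marking human pixels.
view_a = np.full((4, 4), 10.0)
view_b = np.full((4, 4), 20.0)
human_mask = np.zeros((4, 4), dtype=bool)
human_mask[0, :] = True  # pretend the top row is the human's arm

scene = reconstruct_scene(view_a, view_b)    # fused scene
views = render_novel_views(scene)            # extra synthesized viewpoints
tool_only = remove_human(scene, human_mask)  # human pixels zeroed out
```

The point of the sketch is the ordering: reconstruction first, then view synthesis, then segmentation, so that by the time the robot "looks" at the demonstration, only the tool's motion remains.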
