52 Comments
Brian Villanueva:

I teach robotics to HS students. "Manipulation is the problem" is something we talk about quite a bit. I show them the old DARPA challenge videos and they get a laugh at robots trying to just open a door. Robots have gotten better, but simple tasks still baffle them. Why? Tactile sensor density.

Computer vision has gotten very good: high resolution, AI pattern rec, the robot knows what's around it. But manipulation isn't visual. It's tactile, and tactile sensor density simply hasn't kept up. Human tactile resolution in the fingertip is about 1/2mm. And that's not just binary -- "am I touching something?". It's quite complex: "How much pressure?"; "Is it hot or cold?"; "Is it hard or soft?" The human palm is less sensor-dense, but still far denser than any robotic fingertip. This is the biggest limitation of humanoid robots today: they can't feel. (And I don't mean emotions.)
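
To put rough numbers on that gap, here's a back-of-envelope sketch. The ~0.5mm human figure is the resolution mentioned above; the pad area and the robot-side sensor pitch are purely illustrative assumptions:

```python
# Back-of-envelope comparison of tactile sensing element ("taxel") counts.
# All figures are illustrative assumptions, not measured specs.

FINGERTIP_PAD_MM2 = 200.0  # ~2 cm^2 of fingertip pad, in mm^2 (assumed)

def taxel_count(area_mm2: float, pitch_mm: float) -> int:
    """Number of sensing elements on a square grid with the given pitch."""
    return round(area_mm2 / pitch_mm ** 2)

human = taxel_count(FINGERTIP_PAD_MM2, pitch_mm=0.5)  # ~0.5 mm two-point resolution
robot = taxel_count(FINGERTIP_PAD_MM2, pitch_mm=5.0)  # assumed pitch for a robot pad

print(f"human fingertip: ~{human} taxels")  # ~800, each multi-channel (pressure, heat, ...)
print(f"robot fingertip: ~{robot} taxels")  # ~8, often pressure-only
print(f"density ratio:   ~{human // robot}x")
```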

You're also correct that dexterity and strength are largely a tradeoff and probably always will be. Strength requires higher-power servos, but their larger size limits how many can be packed into something small like a finger. Also, the higher pressures of lifting heavy things tend to damage the tactile sensors needed for more dexterous applications. This is likely unsolvable, but it won't matter long-term: robots will become cheap enough that they will be specialized. The unit that can crack eggs for an omelet doesn't need to lift 50 pounds.

Vision is there. AI is there. Servos are close. But tactile sensors are the biggest hurdle. There's lots of folks working on this, and it's going to get there. But it's not there yet.

Russell Hawkins:

Is vision actually "there" though? I was reading recently about the surprising persistence of human radiologists, and I was struck by how much the problems of candidate AI replacements sounded like the ones they had in the early 2010s: issues like not being able to handle small differences in images from different machines, and accidentally overfitting to features of the training images that aren't part of the actual scan.

Also, everyone in self-driving cars (except Tesla) is relying heavily on lidar.

I'm not sure how these facts fit in with the obviously massive improvements in object categorization and facial recognition, but I wonder if the visual aspect of dexterous manipulation might also still be a huge challenge.

Brian Villanueva:

My understanding is that computer vision is there. And that makes sense, since it's really fairly basic pattern rec. It may sound weird to non-geeks to say "computer vision is easy," but the great AI renaissance of the last 10 years has been in pattern rec. Failures of vision today come down to training data that's either too limited or too weak. I still think the radiologists are doomed.

Lidar is used because it's faster and more reliable at distance measurements than binocular computer vision, even if the eyeballs are on opposite sides of the car.
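
One concrete reason for that: under the standard pinhole stereo model, depth error grows roughly with the square of range, while lidar time-of-flight error stays roughly constant. A quick sketch (the focal length, baseline, and matching error are all assumed, illustrative numbers):

```python
# Stereo depth: Z = f * B / d  (depth = focal length * baseline / disparity).
# Differentiating gives dZ ~ (Z^2 / (f * B)) * dd, so error grows with Z^2.
# All numbers below are illustrative assumptions.

f_px = 1000.0       # focal length in pixels (assumed)
baseline_m = 1.5    # "eyeballs on opposite sides of the car" (assumed)
disp_err_px = 0.25  # sub-pixel disparity matching error (assumed)

for z_m in (10, 50, 100, 200):
    err_m = z_m ** 2 * disp_err_px / (f_px * baseline_m)
    print(f"range {z_m:>3} m -> stereo depth error ~{err_m:.2f} m")
# 10 m -> ~0.02 m, but 200 m -> ~6.7 m; lidar stays at a few cm throughout.
```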

Danila Medvedev:

A good illustration of why it’s difficult to manipulate something is the difficulty of creating realistic doors in computer games. See https://www.youtube.com/watch?v=AYEWsLdLmcc for example.

Gary Mindlin Miguel:

If there are tasks that can be done by a human operating a robot, that suggests those are not blocked on any hardware.

Ryan Davidson:

It's not just tactile sensor efficacy and density that's the issue. It's proprioception, or kinesthesia: having a "sense" of position and movement, both of one's own body and of the surrounding environment, without the use of visual or auditory inputs. Another commenter mentioned having an "almost visual image" of the task of buttoning a shirt. That's what we're talking about here.

This is vitally important to human (or, really, animal) dexterity. It's not enough to just have the tactile input. Or, rather, inputs, because as a different commenter mentioned, our sense of touch is multi-channel in terms of both the number and the types of inputs, all of which are analog, not binary. Pick up a pencil: the sensation you experience is an integration of signals sent by potentially thousands of different nerve endings. Those signals are integrated in both time and space, with each signal interpreted in relation to the others. That's how you can know that a ball is round, for instance: by integrating the sensation from your entire hand while taking into account the position of each finger in relation to the whole.
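
To make the "integration across the whole hand" point concrete, here's a minimal sketch of just the spatial part: recovering that a grasped object is a ball, and how big, by least-squares fitting a sphere to fingertip contact positions. The contact data and noise level are simulated assumptions; real tactile fusion also integrates pressure, shear, and timing.

```python
import numpy as np

def fit_sphere(points: np.ndarray):
    """Least-squares sphere fit to 3D contact points (rows of `points`).

    Uses the linear form |p|^2 = 2*c.p + (r^2 - |c|^2): solve for the
    center c and the combined constant, then recover the radius r."""
    A = np.hstack([2 * points, np.ones((len(points), 1))])
    b = (points ** 2).sum(axis=1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = w[:3]
    radius = float(np.sqrt(w[3] + center @ center))
    return center, radius

# Eight simulated contacts on a ball of radius 30 mm, with 0.5 mm position
# noise (the human two-point resolution mentioned in an earlier comment).
rng = np.random.default_rng(0)
dirs = rng.normal(size=(8, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
contacts = 30.0 * dirs + rng.normal(scale=0.5, size=(8, 3))

center, radius = fit_sphere(contacts)
print(f"estimated radius ~{radius:.1f} mm")  # close to 30 mm: "the ball is round"
```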

This appears to be a very difficult problem even for biological nervous systems. Humans, and most mammals, appear to have a pretty good sense of proprioception. But mammals are vertebrates. We have rigid internal skeletons. Our limbs may move in relationship to each other, but their dimensions are basically fixed. This means that our nervous system can treat its own dimensions as a constant.

Which is why we're so awkward around puberty, for what it's worth: for a while there, we grow faster than our nervous system has time to account for. We literally outgrow our own feet.

Anyway, this is probably why animals like octopuses don't appear to have much in the way of proprioception. Their legs are boneless. They can change not only the relative position of their legs (independently!) but bend them at every point along their length, as well as change both the diameter and the length of each one. Their nervous systems are pretty damn complex, though clearly not as complex as ours. But unlike mammals, with our rigid limbs, their nervous systems can't take any of their own dimensions as a given. So they basically don't bother with proprioception, as far as we've been able to tell.

Octopuses can get away without proprioception because they're basically all legs, live in the water (meaning their limbs don't have to support their own weight or the weight of things they're trying to manipulate), and are basically infinitely flexible. This means that for an octopus, the answer to "Where are my legs?" is effectively "Everywhere!" They can also squeeze themselves through any hole larger than their eyeballs, which is a pretty neat trick.

Needless to say, we can't build robots that way. But that being the case, we're left trying to replicate an incredibly sophisticated, analog, multi-variate, multi-channel sensory phenomenon with digital, algorithmic brute force.

Luke Lea:

The importance of proprioception seems right. Is that part of the sense of touch or something else entirely?

Ryan Davidson:

I'm no neurologist, but they seem to be connected somehow. Note that you can tell where your hand is even when it isn't touching anything. But without that sense, touching something wouldn't tell you its shape.

Luke Lea:

One datum: when your leg goes to sleep you can't even walk.

Kalen:

I suspect that, as with chatbots, the urge to 'replicate human abilities' is a sci-fi-addled blind alley, distinct from 'do something useful,' that encourages hype and deception and guarantees a degree of foundering on edge cases that eventually reveals the extent of that hype and deception. The core business of so many tech companies is not actually producing functional, saleable products at profit-generating scale; it's generating a permanent sense of futurity that keeps people buying the stock. And if what you need to do is power that dream machine, then having a person-shaped machine mash some laundry, *even if everyone knows this is not likely to be within its real-world capabilities, or your price range, basically forever*, is more important to your business than trying to find and adapt tasks your robots can actually do.

The world is still full of flat floors and square boxes, but getting a robot that could, say, help a mover get boxes off a truck at a sensible price point would mean bringing something kind of boring-looking to market. Whereas if you keep chasing the (grim, every-problem-is-a-nail-for-my-techno-hammer) dream of a robot that can braid old people's hair, it always feels like the future.

Brett:

I remember this issue coming up in an earlier wave of self-driving truck startups that didn't pan out. Some of the essay below, from the founder of the now-defunct Starsky Robotics, feels dated, but he made a good point about how VCs expected founders to lie to them and promise the Moon. In practice, they much preferred to invest in companies that promised all kinds of amazing SF features, even if those companies weren't remotely close to getting them reliable enough to do those things more than "once in a while it might work":

https://medium.com/starsky-robotics-blog/the-end-of-starsky-robotics-acb8a6a8a5f5

Kalen:

It happens in lots of arenas where 'moonshots' are culturally normalized. Here's a study of applicants for grants from the Gates Foundation which found that people making outsized claims didn't achieve anything more than more honest/modest actors did, but did get more money. Sure, that was a one-off thing and not something that explains vast swathes of the world we live in...: https://arxiv.org/ftp/arxiv/papers/0909/0909.4043.pdf

Brian Villanueva:

Replicating humans sounds cool: your private robot servant that can do everything. But it's very difficult and expensive. The more likely path is getting robotics cheap enough that you have several mobile robots, each specialized for specific things. The robot that folds your laundry will be different from the one that teaches your kids and the one that helps pick up Grandma when she falls.

Alien On Earth:

To me the robot looked like Dr. Ock before I went under.

Andy in TX:

This was terrific, as usual. I'd love to see you tackle medical robots sometime. I just had a robotic prostatectomy, and the surgeon's video of a typical procedure that I saw looked like the patient was being attacked by a giant robot spider while the MD played a video game in the corner. The surgery was a breeze compared to non-robotic ones (4 tiny incisions, 1 slightly larger one), and the team consisted of the MD, a tech to swap stuff on the robot's arms, the anesthetist, and not much else. Similarly, a friend had a robotic knee replacement and had had his other knee replaced the old-fashioned way: no comparison in the rate of recovery, etc. These fairly specialized machines (not sure if a prostate-removal robot can also do other things, but I suspect it can at least remove other stuff, if not do a knee replacement) are really improving medicine, and it doesn't matter that they look like a spider rather than a human. Which makes me think the huge improvements in robotics in the short to medium term are likely to come from specialized robots rather than all-purpose humanoid ones. Digging into that, especially in medicine, would be a great column!

Russell Hawkins:

Surgical robots are a fantastic innovation, but I'm pretty sure at this point they are still 100% teleoperated. So in this sense they aren't at all dexterous on their own, because they never operate on their own.

Luke Lea:

I had a similar experience with major bowel surgery. Twelve inches of large intestine resected with essentially no recovery time.

Geoff Olynyk:

How much of this is because the latest generation of intelligent AIs hasn’t yet been applied to the control systems of humanoid (or industrial) robots?

I am dizzy at how fast it’s proceeding — intelligence of AI systems doubling every 7 months, and we’re realizing that the LLM architecture maybe actually just generalizes to become an AGI. Its “neurons” don’t look like ours do, but they’re beginning to be creative and solve problems at the level of top humans. I don’t pay for ChatGPT so I don’t have access to OpenAI’s latest model (o4-mini-high) but I hear it’s awe-inspiring on what it can do.

In my former field (tokamak fusion research), AIs are now controlling the plasma actuators to suppress edge-localized-modes that leak heat out of the plasma, and there are high hopes that AI control can actually suppress major disruptions. A humanoid robot control system is probably similar in complexity to a fusion-grade plasma.

Sam:

Some researchers create models and robots that use deep learning or neural networks to do things like pick up objects, fold clothes, untangle ropes, etc., but the models, like the plasma controllers, are specific to their tasks and don't generalize to being good at manipulating everything physical.

https://vcresearch.berkeley.edu/news/ken-goldberg-wins-multiple-best-paper-awards

K Brown:

You had me at "In my former field (tokamak fusion research)" :)

mike harper:

Funny: as I was buttoning my shirt, I thought about how the sensation of my fingers manipulating the button gave me an almost visual brain image of the task. The tiny brain also thought: I wonder if a robot could do that? Would it need as many sensors as are in my hands and arms? I did not have to see the shirt or the button.

Add #22: Buttoning a shirt.

Jim:

I have an aging relative who would tell you that's the main thing she needs a robot for.

mike harper:

I don't need the robot yet but soon. #90 coming in Nov. Hoping to check out before someone has to wipe my ass.

Jim:

Yeah, that's an even better one.

Doctor Hammer:

I don’t want to be in the volunteer test group for that one.

“Damn, that’s the third one this morning. We really need to lower the torque or we will have a hell of a lawsuit.”

Jim:

This is reminding me that there's actually good news here. People like the Toto Washlet, which is a toilet seat with a spray washer attached, and a dryer. And supposedly means you don't need to wipe.

Importantly, the Washlet can only move along one axis, and it's not up.

Elan Barenholtz, Ph.D.:

“Manipulation is the hard problem we need to solve to make humanoid robots useful, not locomotion.”

Interesting. I guess in cases where locomotion is the primary goal—like transporting stuff or operating weapons—you may not need humanoid robots; quadrupeds or other designs may be better models. But domestic and workplace environments designed for the human form require human-like dexterity.

Mark Brophy:

If locomotion is your goal, an ostrich is better than a human because it didn't descend from an animal that lives in trees.

rahul razdan:

Nice article.

I was having a long conversation with a robot expert (I am more of a computing expert), and he explained the problem as: we have nothing that comes close to the properties of a muscle. A muscle is amazing because it provides power but, in the "limp" state, gets out of the way. This allows an incredible range of motion. Very interesting...

Paul Drake:

In my view it's the sensors. By your numbers, the human hand has nearly 100 times more sensors than the best robot hand, and the human ones are far more capable. That is a massive difference.

Geoff Olynyk:

Three different comments here now saying it’s the sensor density in the effectors (“fingers” and “hands”).

Feels like we have a consensus :)

Kevin:

An eval is not so useful when the state of the art is, well, completely unable to score above zero. A more useful approach might be a standard task measured on continuous performance.

For example, how fast can a pair of robotic hands fold an origami paper crane? It’s very replicable because you just need a standard square of origami paper. And there is *some* performance that you could have today, it’s not just completely impossible.
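
A sketch of what that continuous scoring could look like; the stages, weights, and time limit below are invented for illustration:

```python
# Continuous dexterity score for the crane task: partial credit per stage
# reached, plus a speed bonus on completion, so nobody is stuck at zero.
# Stage names, weights, and the time limit are made up for illustration.

CRANE_STAGES = [
    ("square base", 0.2),
    ("bird base", 0.3),
    ("neck and tail folds", 0.3),
    ("finished crane", 0.2),
]
TIME_LIMIT_S = 600.0

def score(stages_completed: int, elapsed_s: float) -> float:
    """Weight of stages reached, plus up to +1.0 for finishing quickly."""
    progress = sum(w for _, w in CRANE_STAGES[:stages_completed])
    if stages_completed == len(CRANE_STAGES):
        progress += max(0.0, 1.0 - elapsed_s / TIME_LIMIT_S)
    return progress

print(score(1, 600.0))  # only got a square base: 0.2
print(score(4, 300.0))  # finished in 5 minutes: 1.0 + 0.5 = 1.5
```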

K Brown:

Your first sentence was my thought, but then I had two more. First, I remembered the original DARPA challenge in the desert, where zero vehicles crossed the finish line and a few ended up in ditches. Second: I would have trouble scoring well on a few of these; that "start the first sheet from a new toilet paper roll," performed right after I just woke up, came to mind.

So while I don't think an eval that most state-of-the-art systems would completely fail at is useless (many fields start with exactly these kinds of unachievable goals), I do think the baseline score should be 1) an average human doing these tasks, and 2) an average human doing these tasks after three rapid-fire martinis :)

Jim:

DARPA Dexterity Challenge, sign me up.

Kevin:

The DARPA challenge was a good eval because you could measure how far the robots got. So you would still have: of these ten teams, this one is in first place, congratulations! You could make progress.

An eval that says "everyone in the field is equally bad and scores zero points" is no good, because you want someone to be able to win. Then the winner gets some use out of it, like the PR of saying "the CMU robotics team won first place," even if that just means you drove into a ditch 100 feet further along than where the MIT team drove into theirs.

A good eval needs to have a winner. Ideally a top ten list. People love to read top ten lists.

K Brown:

But what about the martinis? ;)

Brian Villanueva:

I like the paper crane idea; that would be a great addition. But evals are useful even if systems fail at them. Look at the Turing Test or the ARC Prize: systems scored zeros for a very long time. Until they didn't.

K Brown:

And there's this: https://arcprize.org/blog/announcing-arc-agi-2-and-arc-prize-2025

Currently no general LLM foundation model scores higher than zero. But like the original DARPA Grand Challenge, there's a $1M prize, so it will be interesting to see if this drives innovation too.

[sphere]:

Robotics engineer here; I don't work on manipulation currently, but I like to think I'm reasonably familiar with the state of the field. This post is pretty spot-on. A lot of manipulation tasks (in addition to all the other problems) require coordinating two or more arms to do something, which not only doubles the hardware expense, it adds a whole layer of task/motion planning complexity that is an active area of current research.

Re: dexterity evals, you might be interested in the YCB object set and related benchmarks (https://www.ycbbenchmarks.com/), which are fairly standard for manipulation research. You'll notice it's quite light on deformable objects and other such tricky items, and it has still managed to keep roboticists busy for the past decade.

In general, it's always a good idea to assume that a robot video was filmed in very specific, ideal conditions, that you're seeing one of dozens or hundreds of takes, and that it's all teleoperated unless you see hard evidence otherwise.

Yoonseo Kang:

Hardware is a solved problem. Tactile could be better but the real problem is advanced AI software design. This is a brick wall that even the top robot startups are running into, despite all their money. Good news is, robotic AI breakthrough coming this year. Everything is about to change.

Sean White:

Thanks @brianpotter for this balanced and sober interpretation of what is unfolding before us in the world of robotics.

Adam:

Great perspective. As someone who works with industrial robots pretty regularly, I'd say the key obstacle is safety: if something is stronger or faster than a human, it needs to be kept in some type of protective cage or work only with other robots.

Isn't that a huge issue here too? You can't have robots stumbling over babies or knocking old people down, and they should be light enough that you could move one that was impairing your own movement. I don't see how you make a humanoid robot strong enough to be useful while also preventing it from accidentally injuring people pretty regularly.

Ankur Handa:

Hi Brian, I want to highlight our DexPilot work (https://sites.google.com/view/dex-pilot), where we show that you can do a wide variety of things with hands and arms via teleop. This was done in 2019, long before people started taking hands seriously.

I'd argue the major bottleneck isn't hardware so much as software, and I am very bullish on simulation closing the gap in the coming 3-4 years.

Also, your Substack post is really good!

tg56:

I like your eval list. Another one: cracking and separating an egg without breaking the yolk or getting shell in either the whites or the yolk. There are purpose-built machines that can do this (at speed), but they don't look anything like hands.
