MIT engineers want to give household robots ‘common sense’

Massachusetts Institute of Technology (MIT) engineers are developing software for household robots that uses large language models (LLMs) to help them acquire “common sense.” They believe this will enable such robots to stay on task after being interrupted or making a misstep.

At present, most robots of this nature learn tasks by mimicking humans. As a result, most of them cannot handle real-world interruptions to a task, such as bumps or nudges. Those that can often need to restart the entire task from the beginning rather than “pick up where they left off.”

However, MIT engineers believe integrating LLMs could help close this gap, making such robots more useful in the real world. To this end, connecting robot motion data with large language models should give such robots “common sense knowledge.”

MIT’s new technique allows a robot to break a household task into smaller subtasks and adapt to disruptions within a subtask. This enables the robot to continue the task without starting from the beginning and without engineers having to manually program a fix for every possible failure.
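
As a rough sketch of that control flow, the loop below retries only the subtask that was disturbed rather than the whole task. The subtask names and the simulated, occasionally failing policy are illustrative stand-ins, not MIT’s actual code.

```python
# A minimal sketch of the "retry the current subtask" idea. The
# subtask names and the simulated policy are stand-ins, not MIT's code.
import random

SUBTASKS = ["reach", "scoop", "transport", "pour"]

def execute_subtask(name: str) -> bool:
    """Stand-in for a learned policy; fails sometimes to mimic a nudge."""
    return random.random() > 0.2

def run_task() -> None:
    i = 0
    while i < len(SUBTASKS):
        if execute_subtask(SUBTASKS[i]):
            print(f"finished: {SUBTASKS[i]}")
            i += 1  # advance only on success
        else:
            print(f"disturbed during {SUBTASKS[i]}; retrying that subtask")
            # crucially, no reset to i = 0: the task is never restarted

run_task()
```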

Giving robots common sense with LLM

“Imitation learning is a mainstream approach enabling household robots. But if a robot is blindly mimicking a human’s motion trajectories, tiny errors can accumulate and eventually derail the rest of the execution,” explained Yanwei Wang, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS).

“With our method, a robot can self-correct execution errors and improve overall task success,” he added. To test their idea, the researchers demonstrated the technique with a robot trained to scoop marbles from one bowl and pour them into another.

To create demonstrations for the robot to imitate, a human may repeat this process several times. “But the human demonstration is one long, continuous trajectory,” Wang says.

The team realized that a human task, while seeming like a single action, is essentially a series of smaller actions or trajectories. For example, the robot must reach into the bowl before it can scoop, and it must scoop up the marbles before moving toward the empty bowl. So the team needed a way to break a task into smaller subtasks and keep track of which one the robot is in.

The team’s obvious choice was deep learning models, specifically LLMs. These can process vast amounts of text data and establish connections between words, sentences, and paragraphs.

Keep calm and carry on

This allows the models to generate new sentences based on what they have learned about the likelihood of certain words following others. According to the researchers, the same ability lets an LLM produce a logical, ordered list of subtasks for a given task.
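
As an illustration, a system might prompt a model for such a list and parse the reply into labels. The prompt wording, the canned reply, and the parsing below are assumptions for the sketch, not the team’s actual pipeline.

```python
# A sketch of asking an LLM for an ordered subtask list. A real system
# would send PROMPT to an actual language model; a canned reply stands
# in for the model's answer here.

PROMPT = (
    "Break the task 'scoop marbles from one bowl and pour them into "
    "another' into a short, ordered list of subtasks, one per line."
)

def parse_subtasks(reply: str) -> list[str]:
    """Turn a line-per-subtask reply into clean label strings."""
    return [line.strip(" .-0123456789") for line in reply.splitlines() if line.strip()]

reply = ("1. reach into the bowl\n2. scoop the marbles\n"
         "3. move to the empty bowl\n4. pour the marbles")
print(parse_subtasks(reply))
# ['reach into the bowl', 'scoop the marbles', 'move to the empty bowl', 'pour the marbles']
```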

The team devised an algorithm that links a robot’s physical state, such as its location or image data, to the natural language label an LLM assigns to the corresponding subtask. Associating a natural language label with a robot’s physical state in this way is known as “grounding.”
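
A toy version of such a grounding classifier appears below: it maps a low-dimensional robot state to a subtask label. The features, the synthetic training data, and the logistic-regression model are all assumptions made for illustration.

```python
# A toy "grounding" classifier: given a robot state, predict which
# subtask it belongs to. Synthetic 3-D states stand in for the real
# system's position or image features.
import numpy as np
from sklearn.linear_model import LogisticRegression

SUBTASKS = ["reach", "scoop", "transport", "pour"]

# Fake training data: 50 states per subtask, clustered by task phase.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=i, scale=0.1, size=(50, 3)) for i in range(len(SUBTASKS))])
y = np.repeat(np.arange(len(SUBTASKS)), 50)

grounder = LogisticRegression(max_iter=1000).fit(X, y)

# Ground a new state: this one sits in the "transport" cluster.
state = rng.normal(loc=2.0, scale=0.1, size=(1, 3))
print(SUBTASKS[grounder.predict(state)[0]])  # -> transport
```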

Using these new “grounding” classifiers, the team let the robot carry out the task independently. They found that gentle nudging didn’t faze the robot; it could self-correct and carry on. It could even tell when no marbles were in the spoon and start the scoop over.
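
Conceptually, that recovery amounts to re-grounding after a disturbance and resuming from whichever subtask the observed state actually matches. The sketch below uses string states and a trivial stand-in classifier to keep the idea visible.

```python
# A stand-in sketch of self-correction: after each step, ground the
# observed state to a subtask and resume there instead of restarting.
SUBTASKS = ["reach", "scoop", "transport", "pour"]

def ground(state: str) -> int:
    """Stand-in grounding classifier mapping a state to a subtask index."""
    return SUBTASKS.index(state)

def run_with_recovery(observed_states: list[str]) -> None:
    expected = 0
    for state in observed_states:
        actual = ground(state)
        if actual != expected:  # a nudge knocked the robot off-plan
            print(f"disturbed: state matches '{SUBTASKS[actual]}', resuming there")
            expected = actual   # re-enter the pipeline; no full restart
        print(f"executing: {SUBTASKS[expected]}")
        expected += 1
        if expected == len(SUBTASKS):
            break

# A nudge mid-run spills the marbles, so the state regresses to
# "reach"; the robot re-grounds and carries on from there.
run_with_recovery(["reach", "scoop", "reach", "scoop", "transport", "pour"])
```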

“With our method, when the robot is making mistakes, we don’t need to ask humans to program or give extra demonstrations of how to recover from failures,” Wang says. “That’s super exciting because there’s a huge effort now toward training household robots with data collected on teleoperation systems. Our algorithm can now convert that training data into robust robot behavior that can do complex tasks despite external perturbations.”
