Re-imagining Robot Arm Design: How an Overlooked Technology Can Create Cheaper, Easier-to-Use, and More Responsive Robots
August 26th, 2024 - Vijay Pradeep
(~20 Minutes to Read)
Perception-based closed-loop control (or “hand-eye coordination”) is transforming how AI-powered robot arms handle sensor data. This fundamental change can drive the next evolution in robot arm design, laying the groundwork for cheaper, easier-to-use, and more responsive robot arms that can operate in chaotic, natural, and human-centric environments.
Figure 1: The Tesla Optimus Gen 2 robot relies on perception-based closed-loop control to delicately grasp eggs, using tactile feedback from pressure sensor arrays on each of its fingertips. Source: Screenshot from YouTube “Optimus - Gen 2” by Tesla.
Key Summary Points
Modern, AI-powered robot arms are expected to operate autonomously in chaotic, unpredictable environments and make reasonable decisions without being explicitly programmed to do so.
Almost all robot arms today are designed to operate in structured, well-defined (and arguably artificial) environments.
Perception-based closed-loop control (PBCLC) combines the tasks of “deciding what to do” and “doing what was decided,” enabling a robot to remain reactive in chaotic and unpredictable environments. This is effectively what humans call “hand-eye coordination.”
PBCLC robot arms don’t need the heavy structures, expensive sensors, and complex environmental setup that traditional robot arms require. This makes PBCLC robots cheaper and easier to use, and thus an attractive alternative to conventional robot arms.
Designing a PBCLC-powered robot arm requires several electrical, mechanical, & software changes to today’s robot systems. Some areas include improving time synchronization, increasing real-time control modularity, decoupling robot mechanics from robot control boxes, and integrating embedded AI accelerators.
Introduction
Robot arms play a hidden but key role in many aspects of our lives: manufacturing cars, assembling electronics, packing online orders, picking fruit, and even performing surgical operations, to name a few. The robot arm market was estimated at $30B USD in 2022, and it continues to grow. More advanced sensors (e.g. stereo-camera depth sensors, tactile/touch sensors, LIDAR scanners, 6-axis force/torque sensors) are now being integrated into robot arms and paired with more advanced algorithms (e.g. machine learning, neural networks, and AI). Yet the design of robot arm control systems has remained stagnant for several decades.
All these amazing and impactful advancements still end up relying on the same, simple control system design: a high-level control loop decides how/where/when a robot arm should move (i.e. deciding what to do), and then a low-level control loop blindly makes sure the robot arm executes this requested movement (i.e. doing what was decided). This is the equivalent of a person looking around a room to decide what task to do, putting on a blindfold, and then blindly completing the task they meticulously planned out earlier, without any feedback or adjustment. This might sound preposterous, but robot arms are predominantly deployed in structured, well-defined environments, so this simple control approach has historically worked quite well.
However, with the continued growth of the robot arm market and the ever-expanding breakthroughs in machine learning and generative AI, robot arms (and thus also humanoid robots with arms) are now expected to operate in environments that are much more chaotic and unpredictable, while still making reasonable decisions and responding quickly to changes in the environment. They can no longer rely on the carefully crafted, hard-coded behaviors that have historically powered robot arms. This means modern robot arms must constantly observe the environment to adjust their own actions to complete a task – an approach that we’re calling Perception-Based Closed-Loop Control (PBCLC). This is the equivalent of a person using their eyes to guide their hands while performing a task (i.e. what people often call “hand-eye coordination”). The PBCLC approach might sound sensible and obvious, and it definitely isn’t new to robotics (PBCLC is the norm for drones, self-driving cars, and other robots that need to react quickly to changes in the environment). But, unfortunately, for robot arms, PBCLC has languished in robotics labs for decades. There are papers on a version of PBCLC, called visual servoing, from 40+ years ago (Agin, G.J., “Real Time Control of a Robot with a Mobile Camera,” Technical Note 179, SRI International, Feb. 1979), and the concept has periodically re-emerged over the decades in a few niche research and commercial applications. Most current robot environments are structured or predictable enough that a single camera snapshot is sufficient to plan a robot motion and complete a task.
Conveniently, if a robot arm is able to handle chaos & unpredictability in the environment via PBCLC, then it can naturally also handle any chaos and unpredictability within the robot arm itself. More specifically, a PBCLC robot arm can easily account for sag in its structure, slippage in its belts & gears, and inaccuracies in its installation. This can push robot arm manufacturers to different design decisions, laying the foundation for a new generation of robot arms that are lighter, cheaper, and easier to use, enabling robot arms to be used in a variety of use cases that would otherwise have been too complex or cost prohibitive with traditional robot arm systems.
PBCLC combines the best of high-level & low-level control
A robot arm’s high-level control loops use complex algorithms to process rich data from advanced sensors to decide what a robot should do (often called a robot trajectory or robot motion plan). This process usually repeats every few seconds, and some of today’s more performant robot arm systems might still only reach a few Hz.
A robot arm’s low-level control loops process very simple data (usually joint angle measurements or motor currents) very quickly to ensure that the robot is executing a desired pre-computed trajectory or motion plan correctly. These loops run very fast, often at speeds of 100 Hz, 1 kHz, or even 10 kHz! Unfortunately, these low-level loops don’t have access to (nor know how to process) the rich data streams coming from the robot arm’s more advanced sensors. So, if the environment changes or a robot arm’s movement needs to be quickly adjusted based on what it sees, the low-level loop simply can’t respond.
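To make this split concrete, here’s a minimal sketch of the traditional low-level loop in Python. The single-joint setup and the read_joint_angle() / set_motor_torque() accessors are hypothetical stand-ins for a real robot’s driver API, not any vendor’s actual interface:

```python
import time

# Hypothetical hardware accessors: stand-ins for a real robot driver API.
def read_joint_angle():
    return 0.0  # rad, e.g. from a joint encoder

def set_motor_torque(torque):
    pass  # Nm, e.g. written to a motor driver

def run_low_level_loop(trajectory, rate_hz=1000.0, kp=50.0, kd=2.0):
    """Blindly track a precomputed trajectory with a PD loop.

    `trajectory` maps elapsed time (s) to a desired joint angle (rad),
    as handed down by the high-level planner. Note what's missing: no
    camera, no force sensor, no rich data of any kind. The loop only
    ensures the joint follows what was decided seconds ago.
    """
    dt = 1.0 / rate_hz
    prev_error = 0.0
    start = time.monotonic()
    while True:
        t = time.monotonic() - start
        error = trajectory(t) - read_joint_angle()
        derivative = (error - prev_error) / dt
        set_motor_torque(kp * error + kd * derivative)
        prev_error = error
        time.sleep(dt)  # a real controller would use a hard realtime timer
```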
PBCLC combines the best of the two control paradigms mentioned above. It relies on advanced algorithms processing rich sensor data (like high-level control), repeated at a fast update rate (like low-level control). This allows robot arms to merge the two core functions of “deciding what to do” and “doing what was decided,” allowing the robot to quickly react to any changes in the environment. Applications in chaotic environments are by far the most challenging for robots, and not surprisingly, most human-centric environments are extremely chaotic and unpredictable. With today’s robot arms being pushed to operate in natural, human-centric environments, it’s inevitable that these robot arms will have to embrace PBCLC to deal with this chaos and unpredictability.
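For contrast, here’s the same loop restructured along PBCLC lines, where the setpoint is refreshed from perception on every cycle; latest_target_from_camera() is again a hypothetical stand-in, this time for a camera-plus-inference pipeline:

```python
import time

# Hypothetical stand-ins for a perception pipeline and a robot driver API.
def latest_target_from_camera():
    return 0.0  # rad: desired joint angle derived from rich sensor data

def read_joint_angle():
    return 0.0  # rad, from a joint encoder

def set_motor_torque(torque):
    pass  # Nm, written to a motor driver

def run_pbclc_loop(rate_hz=200.0, kp=4.0):
    """'Deciding what to do' and 'doing what was decided' in one loop.

    Unlike the blind trajectory-following loop above, the target here is
    re-derived from perception every cycle, so a change in the
    environment alters the robot's motion within ~5 ms.
    """
    dt = 1.0 / rate_hz
    while True:
        target = latest_target_from_camera()  # rich sensors drive the decision
        error = target - read_joint_angle()   # simple sensors drive execution
        set_motor_torque(kp * error)
        time.sleep(dt)
```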
Figure 2 - This example from Nimble Robotics shows a UR10e robot arm completing a bin-picking task in a warehouse fulfillment application. In the high-level control loop, a depth sensor scans a bin, and a computer plans the robot’s trajectory to pick up an item and place it in the tray. This high-level loop executes close to once every second. In the low-level control loop, the robot generates joint torques to accurately follow the previously requested trajectory. This low-level loop runs at 500 Hz!
Source: Screenshot from YouTube “Introducing Nimble” by Nimble.
Figure 3 - This research demo of the ALOHA technique is a recent example of PBCLC. This dual-arm system continuously updates the motions of both robot arms based on visual feedback from 4 cameras mounted on the robot wrists and in stationary locations. While ALOHA is considered close to state-of-the-art, my 2-year-old son still has better hand-eye coordination than this PBCLC robot arm. We’re still only at the beginning of what’s possible with this technology.
Source: Screenshot from Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware by Tony Zhao.
Figure 4 - Getting to PBCLC requires taking advanced sensors traditionally used in high-level control and integrating them into higher-speed control loops. Or, simple sensors traditionally used in low-level control can be processed with more advanced algorithms, resulting in a richer data source that can be used to make better robot motion decisions.
Source: Vijay Pradeep.
How Next-Gen PBCLC-Powered Robot Arms Will Be Different
Almost all commercially available robot arms today are optimized for structured environments, and focus on fast, simple, low-level control loops. As such, many of the hardware & software design choices made for today’s robot arms are often unnecessary or sub-optimal for the deployment of robots into chaotic & unpredictable environments.
Relaxed Stiffness & Precision Requirements
Robot Arms Today:
Robot arms today have sub-millimeter positioning precision. If they are programmed once to perform a very precise task, they can then repeat it thousands of times without error, so long as everything in the environment is placed in exactly the same locations as when the robot was programmed. This is usually accomplished with heavy, mechanically stiff structures and expensive, delicate gearboxes, which allow robot arms to move to specific predefined positions with high repeatability, without any cameras or other high-level sensors.
What’s happening now:
Humanoid robots need to be mobile, and thus cannot use arm structures as heavy and stiff as those seen in industrial or stationary installations. And the PBCLC in these systems can compensate for any deflections or inaccuracies in the gearboxes and mechanical structure.
Where we’re headed:
If the PBCLC we see in humanoids makes its way to industrial robot arms, then today’s stiffness requirements go away too. This means using significantly less steel, aluminum, or carbon fiber in existing structures, or possibly even switching to cheaper materials like plastic, as seen in the igus ReBeL. 3D-printed robot structures also become an option, and could even be fabricated on-site, based on the needs of a specific application.
Robot gearboxes are expected to have high gear ratios and compact size. Beyond that, much of their design is driven by repeatability and stiffness requirements. Once those requirements are removed, it opens the door for novel or simpler gearbox designs. One example is the IMSystems Archimedes Drive, which is simpler, cheaper, and more robust than today’s harmonic drives, and it doesn’t even use gears! The main drawback in this case is slippage and position drift in the drive, which is a serious issue for traditional applications with high repeatability requirements, but is a non-issue for PBCLC, since any joint offsets and slippage can be continuously compensated for.
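To illustrate that compensation, here’s a minimal sketch, assuming perception can occasionally observe the true joint angle (say, via a fiducial marker or a learned pose estimate); both the setup and the filter gain are illustrative:

```python
class SlipCompensator:
    """Continuously estimate encoder drift from an external observation.

    A drive that can slip lets the encoder angle drift away from the
    true joint angle. If perception observes the true angle, even
    noisily and at a low rate, a simple low-pass filter on the
    discrepancy maintains a running offset estimate, and the controller
    uses the corrected angle instead of the raw encoder value.
    """

    def __init__(self, alpha=0.02):
        self.alpha = alpha  # small gain: trust the encoder short-term
        self.offset = 0.0   # estimated drift, rad

    def update(self, encoder_angle, observed_angle):
        # Blend the newest drift evidence into the running estimate.
        residual = (encoder_angle - observed_angle) - self.offset
        self.offset += self.alpha * residual

    def corrected_angle(self, encoder_angle):
        return encoder_angle - self.offset
```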
Figure 5 - An igus ReBeL performing a picking task. The robot is predominantly made of injection-molded plastic parts, and is thus significantly cheaper and lighter than most industrial robots seen today. Any sag or deflection in the mechanism could potentially be compensated for via PBCLC.
Source: Screenshot from YouTube “O ReBeL® em ação” by igus Portugal.
Integrated Force/Torque Sensors
Robot Arms Today:
With the exception of a few niche systems, force control is often overlooked in industrial robotics. There are some high-precision, high-cost, high-reliability aftermarket force/torque sensors that are used for industrial process control applications (e.g. applying exactly X newtons of force while moving along a path).
What’s Changing:
In AI-powered robot arms (and more specifically, the arms on modern humanoids), high-bandwidth force and torque data is being used directly within real-time control loops to facilitate contact-based control for applications like insertion, grasping, and other kinematically constrained movements like opening doors.
Where We’re headed:
If absolute accuracy is no longer paramount, then low-cost, uncalibrated force/torque sensors with poor long-term measurement stability suddenly become useful (e.g. $10s for a sensor, instead of $1000s). By integrating these sensors into a robot’s wrist or joints and feeding the readings into a realtime control loop, the sensors can be re-calibrated constantly. And, even without recalibration, large step changes in force can be fed into ML models to detect collisions and adjust robot motion accordingly. Companies like ForceN are already trying to disrupt the traditional players in the robot force/torque world, and have already demonstrated new robot grippers with integrated closed-loop force/torque control, independent of a higher-level Linux host or robot controller. And established players are moving in this direction too: each Universal Robots e-Series robot has a force/torque sensor in its wrist, and the Flexiv Rizon and Kuka LBR iiwa robots both have dedicated joint torque sensors.
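As a sketch of what that constant re-calibration could look like (assuming the controller knows when the tool is out of contact, and ignoring gravity and temperature compensation for brevity):

```python
import statistics

class FtAutoCalibrator:
    """Continuously re-zero a cheap, drifty force/torque sensor axis.

    Whenever the controller knows the tool is out of contact and not
    accelerating, the reading should be zero, so those samples can be
    folded into a running bias estimate. The bias is then subtracted
    from every raw reading during contact-rich motion.
    """

    def __init__(self, window=200):
        self.window = window
        self.samples = []
        self.bias = 0.0

    def observe_free_space(self, raw_force):
        # Collect readings taken while no contact is expected.
        self.samples.append(raw_force)
        if len(self.samples) >= self.window:
            self.bias = statistics.median(self.samples)  # robust to spikes
            self.samples.clear()

    def calibrated(self, raw_force):
        return raw_force - self.bias

    def contact_detected(self, raw_force, threshold=5.0):
        # Even an uncalibrated sensor sees large step changes on contact.
        return abs(self.calibrated(raw_force)) > threshold
```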
Figure 6 - This robot from Flexiv relies on expressive sensor data from seemingly simple force & torque sensors to complete a challenging gear insertion task. This requires feeding high speed force/torque sensor data into a high level robot planning algorithm, thus yielding a PBCLC system.
Source: Screenshot from YouTube “Flexiv at ITES China 2024” by Flexiv Robotics.
Effortless Installation of New Robot Systems
Robot Arms Today:
It’s normal and expected for a traditional robot workcell to require significant design and instrumentation, specific to a use case, usually performed by a robot systems integrator. This often involves adding a variety of process control sensors to manage input and output part queues, custom fixturing to hold parts at predefined locations, and additional cameras or sensors to do high-level planning. This design and customization often costs more than the robot itself, and adds weeks or months of lead time to deploying a robot workcell. And, after a workcell is designed, it needs to be installed and provisioned. Since a real-world system never perfectly matches CAD and simulation, a robot systems integrator then needs to carefully tune each robot motion to ensure that it is moving to the exact locations relative to the as-built workspace. Once again, this is incredibly time-consuming, and requires an on-site expert.
What’s changing:
AI-powered robot arms are expected to be a drop-in replacement for tasks that humans were previously doing, so they need to integrate all the sensors and environmental awareness needed to complete a task, without relying on external customization or input. The robot is never in exactly the same position, even when carrying out a repetitive task, which means it needs to dynamically generate new robot motions and trajectories and can no longer rely on preprogrammed trajectories.
Where we’re headed:
By tightly integrating cameras and other sensors into robot arms and utilizing PBCLC, robot arms will require significantly less hardware customization and fixturing than would normally go into a workcell. This means robot systems integrators will begin to look less like engineering/design/fabrication firms and more like distributors. Companies like Jacobi Robotics and RocketFarm are already making it much easier for robot systems integrators to install robot palletization and depalletization workcells. Looking further ahead, most robot systems will be close to ready-to-run: all that’s needed is to pick the right combination of parts for a specific use case, which a customer can then assemble plug-and-play or IKEA-style with basic tools. A robot system can then simply be rolled, clamped, or placed into a workcell, at which point it can use sensor-based control to immediately perform a task.
Figure 7 - A custom UR10 palletization workcell, which was instantly programmed, simulated, and ready to deploy, using RocketFarm’s instant workcell designer.
Source: Screenshot from MyRobot.cloud by RocketFarm.
How do we get to perception-based closed-loop control?
Everything I’ve mentioned thus far about this magical and optimistic future assumes that robots already support PBCLC. Getting there requires some fundamental changes to how existing robot arm stakeholders think about robot control. It’s already happening in humanoids and other domains that operate in chaotic environments, and it’s inevitable that it will percolate into commercial robot arms soon as well. This will require integrating a mixture of technologies into robot arms, where each one could be an article unto itself:
Hardware-Based Time Synchronization for Sensors & Actuators
It’s inevitable that robot arms will need to provide straightforward ways to time-synchronize their movement with the sensors that will decide how the robot should move. Maybe that’s Pulse-Per-Second (PPS) signals, Ethernet’s Precision Time Protocol (IEEE 1588 PTP), or some new, yet-to-be-standardized protocol. But the current approach of “do nothing, hope for the best, and hope the CPU doesn’t bog down” is definitely neither reliable nor sufficient.
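As a small illustration of what synchronized timestamps buy us (a sketch, assuming both data streams are stamped against a shared, PTP-disciplined clock), each camera frame can be paired with the joint state captured closest to its exposure time, rather than with whatever state happens to be current when the image finally reaches the CPU:

```python
import bisect

class TimeAlignedStates:
    """Match camera frames to joint states by hardware timestamp."""

    def __init__(self, history=2000):
        self.stamps = []  # timestamps (s), kept in arrival order
        self.states = []  # corresponding joint-state payloads
        self.history = history

    def add_joint_state(self, stamp, state):
        # Assumes joint states arrive in timestamp order (~1 kHz stream).
        self.stamps.append(stamp)
        self.states.append(state)
        if len(self.stamps) > self.history:
            self.stamps.pop(0)
            self.states.pop(0)

    def state_at(self, frame_stamp):
        """Return the joint state closest to a frame's exposure time."""
        if not self.stamps:
            return None
        i = bisect.bisect_left(self.stamps, frame_stamp)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.stamps)]
        best = min(candidates, key=lambda j: abs(self.stamps[j] - frame_stamp))
        return self.states[best]
```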
Access to a Robot’s Real-Time Control Loop
Right now, robot controllers’ realtime loops are a black box with near-zero configurability. By providing a plugin framework or modularity within existing realtime control loops, end-users would have more flexibility to add advanced sensor-based control into existing controllers. Frameworks like ros_control are a baby step in the right direction, but there are many more features to be desired, and adoption is scattered.
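To make “plugin framework” concrete, here’s a deliberately simplified, hypothetical interface in Python; it’s inspired by the structure of frameworks like ros_control, but it is not that project’s actual API:

```python
from abc import ABC, abstractmethod

class ControllerPlugin(ABC):
    """Hypothetical plugin contract for a vendor's realtime executive.

    The vendor owns the fast loop (say, 1 kHz) and calls update() on
    whichever plugin is loaded, letting end-users inject sensor-based
    behavior without reimplementing the rest of the controller.
    """

    @abstractmethod
    def update(self, joint_angles, dt):
        """Return one torque command per joint for this cycle."""

class VisualServoPlugin(ControllerPlugin):
    """Example plugin: proportional servoing toward camera-derived targets."""

    def __init__(self, perception, kp=4.0):
        self.perception = perception  # object exposing latest_targets()
        self.kp = kp

    def update(self, joint_angles, dt):
        targets = self.perception.latest_targets()
        return [self.kp * (t - q) for t, q in zip(targets, joint_angles)]
```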
Fusing Low-Level and High-Level Realtime Control Loops
Low-level robot motion controllers and high-level robot planning code often execute in very different parts of a robot system, oftentimes on different computers entirely. Linux technologies like Xenomai or PREEMPT_RT make it possible to run a mix of high-level and low-level control on a single processor, making it easier to integrate rich sensor data into low-level loops.
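For instance, here’s a sketch using Python’s standard os scheduling hooks on Linux: a control loop claims a SCHED_FIFO scheduling slot so that, on a PREEMPT_RT kernel, it preempts the non-realtime planning code running on the same machine (this requires RT privileges, e.g. CAP_SYS_NICE):

```python
import os
import time

def run_realtime(loop_body, rate_hz=1000.0, priority=80):
    """Run loop_body at a fixed rate under SCHED_FIFO scheduling."""
    try:
        # Claim a real-time priority; on PREEMPT_RT this thread will
        # preempt ordinary Linux tasks sharing the processor.
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        print("warning: no RT privileges; timing guarantees will be weak")

    period = 1.0 / rate_hz
    next_wakeup = time.monotonic()
    while True:
        loop_body()  # one control cycle: read sensors, write commands
        next_wakeup += period
        delay = next_wakeup - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # absolute pacing to avoid cumulative drift
```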
Unbundling Robot Mechanics and Robot Control
The robot controllers that ship with most robots could be unbundled from the robots themselves, making it easier for a more advanced or custom controller to drive the hardware. Tormach’s ZA6 robot does this by providing a low-level realtime EtherCAT interface that can easily be connected to any high-level realtime host. Autonox takes an even more aggressive unbundling approach, and lets the customer pick the servos themselves, giving even more flexibility in the communication protocols being used, and thus even more flexibility in choosing the robot controller itself.
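Architecturally, an unbundled robot reduces to a cyclic exchange of joint states and commands, and everything above that line becomes replaceable. Here’s a hypothetical sketch (CyclicJointBus is illustrative, not any vendor’s actual interface):

```python
from typing import Callable, List

class CyclicJointBus:
    """Illustrative stand-in for an unbundled robot's fieldbus interface.

    In the spirit of raw EtherCAT access: the arm only exchanges cyclic
    process data (joint angles out, torque commands in) and ships with
    no controller of its own.
    """

    def exchange(self, torques: List[float]) -> List[float]:
        """One bus cycle: write torque commands, read back joint angles."""
        raise NotImplementedError  # provided by the fieldbus driver

def run_controller(bus: CyclicJointBus,
                   controller: Callable[[List[float]], List[float]],
                   n_joints: int) -> None:
    """Drive the arm with any controller: vendor-supplied, custom, or PBCLC.

    Because the bus is the only contract between the mechanics and the
    control software, swapping controllers is purely a software decision.
    """
    torques = [0.0] * n_joints
    while True:
        joint_angles = bus.exchange(torques)
        torques = controller(joint_angles)
```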
AI Accelerators in the Realtime Control Loop
Running inference on a Linux host with an attached GPU is inherently a non-realtime operation, due to the underlying dependency on the OS to manage the dataflow. With the advent of MIPI CSI-2 camera-ready AI accelerators (e.g. Intel Movidius Myriad or Inuitive NU4100) and dedicated autonomous vehicle AI processors (e.g. Ambarella CV3), it becomes reasonable for ML inference to happen at a predictable and consistent rate, allowing the inference results to be fed back directly into a realtime loop, independent of a non-realtime Linux host. Pushing this even further, the Sony IMX500 chipset stacks an AI accelerator directly on the camera imager itself.
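However the inference runs, its results still have to reach the fast loop without ever blocking it. One common pattern is a “latest value wins” mailbox, sketched below with Python threads for readability (a hard-realtime implementation would use a lock-free buffer or shared memory instead):

```python
import threading

class LatestValueMailbox:
    """Hand inference results to a realtime loop without blocking it.

    The perception side (camera plus AI accelerator, running at its own
    predictable rate) publishes each result; the control loop reads
    whichever result is newest. The lock is held only for a reference
    swap, keeping the worst-case stall negligible.
    """

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._value = initial

    def publish(self, value):
        # Called from the inference thread after each forward pass.
        with self._lock:
            self._value = value

    def read(self):
        # Called from the fast control loop, once per cycle.
        with self._lock:
            return self._value
```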
Figure 8 - The diagram shows how Figure and OpenAI have implemented their own version of PBCLC for the Figure 01 humanoid. Camera images flow all the way back up to OpenAI’s generative AI models to generate 200 Hz actions.
Source: “OpenAI Makes Figure’s Robot Talk Like Human, Video Went Viral” by Dhruv Kudalkar.
Final Thoughts
Maybe I’m wrong about all of this
This article bubbled out of conversations with various robot manufacturers, system integrators, startups, and researchers, as well as from consulting with and advising a variety of leading technology companies trying to push the boundaries of robotics. The PBCLC being pushed by today’s AI-powered robot arms will likely become a core piece of robot technology, upon which more advanced AI systems can continue to thrive. But predicting the future is never easy, especially when technology is changing so quickly.
As one counterexample to everything I’ve mentioned in this article, I am deeply impressed by the AI-powered TossingBot work done by Andy Zeng: his dual-arm system is able to dynamically throw and catch objects without dealing with the nuances of PBCLC; the trained AI agents are able to characterize these nuances and compensate for them accordingly in the throwing and release motions.
Figure 9 - Two UR5e robots throwing and catching arbitrary objects. Source: “TossingBot: Learning to Throw Arbitrary Objects” by Andy Zeng.
Conclusion
So, if PBCLC robot arms are the future, why hasn’t it happened yet? As mentioned earlier, we as a society haven’t quite needed it until now, as recent advances in AI have only just opened the door to an even wider breadth of applications for where and how robot arms can be used. However, the more subtle issue is that fundamentally changing the way an ecosystem does something is never easy nor fast. Moving to PBCLC requires close coordination amongst multiple stakeholders (e.g. robot arm system integrators, robot arm manufacturers, and application-specific developers) as well as coordination across varying technical skill sets (e.g. electrical engineering, mechanical engineering, embedded software, machine learning, etc). Humanoid robot manufacturers overcome this by building the entire tech stack themselves from scratch. However, motivating long-standing, entrenched robot arm stakeholders to make similar changes will require a bit more effort. Nonetheless, once PBCLC becomes the norm for robot arms, it can set the stage for a new generation of lighter, easier-to-use, and more responsive robot arms that can be used for an even wider variety of applications than what exists today.
Acknowledgements
A big thanks to Rajini Haraksingh, Haomiao Huang, Austin Lockwood, and Vivek Agrawal for being fantastic sounding boards while putting this article together. Also, a big thanks to the robotics researchers, academic experts, startup founders, and industry veterans who shared their ideas and perspectives on robotics, that helped shape the core of this article.
Please feel free to reach out to Vijay, especially if you have any insights or thoughts in the following areas:
If you have deeper insights into how modern humanoids use camera & sensor data in a closed-loop system or realtime architecture.
If you are a startup exploring new technical approaches for PBCLC.
If you are a robot manufacturer that is interested in integrating these ideas into your next robot design.
If you are involved in a specific industry vertical, and believe that PBCLC is the missing piece towards unlocking a large potential for robots.
If you think I’ve misrepresented what is happening today or what is needed in the future, or really anything I’ve mentioned in the article.
For professional services related to robotics technical strategy or robotics software development, please check out Vijay’s consulting company & development studio, Virtana: www.virtanatech.com