
Robotics Meets Agentic AI: Figure and Boston Dynamics Deploy LLM-Powered Robot Agents

Humanoid robots powered by large language models can now understand natural language commands and autonomously plan complex physical tasks, merging embodied AI with agentic reasoning.

When Language Models Get a Body

The convergence of large language models and physical robotics has produced what may be the most consequential development in AI since the transformer architecture itself. In Q1 2026, both Figure AI and Boston Dynamics demonstrated production-ready humanoid robots that use LLM-based agentic reasoning to understand natural language commands, plan multi-step physical tasks, and adapt to unexpected situations in real time.

This is not the scripted, pre-programmed robotics of the past decade. These systems combine the natural language understanding and reasoning capabilities of frontier LLMs with the perception, manipulation, and locomotion capabilities of modern humanoid platforms. The result is robots that can be instructed in plain English to perform complex, multi-step tasks they have never been explicitly programmed to execute.

"We've spent decades trying to hand-code every possible scenario a robot might encounter," said Brett Adcock, CEO of Figure AI. "LLMs give us general-purpose reasoning that transfers to the physical world. The robot doesn't need to have seen a specific task before — it can reason about novel situations using the same common-sense understanding that makes language models useful."

Figure 02: The First Commercial LLM-Powered Humanoid

Figure AI's second-generation humanoid robot, Figure 02, began commercial deployments in January 2026 at BMW's manufacturing facility in Spartanburg, South Carolina. The robot stands 5'6", weighs 130 pounds, and features 40 degrees of freedom with hands capable of manipulating objects as small as a pen.

What distinguishes Figure 02 from previous industrial robots is its cognitive architecture. The robot uses a multimodal LLM — developed in partnership with OpenAI — that processes visual input from stereo cameras, proprioceptive feedback from joint sensors, and natural language instructions simultaneously.

Task Planning and Execution

When given an instruction like "Sort these parts by size and place the defective ones in the red bin," Figure 02 decomposes the task into sub-steps: identify all parts, estimate relative sizes, determine sort order, detect defects using visual inspection, and execute the physical manipulation sequence. This decomposition happens in real time using chain-of-thought reasoning within the LLM.

The robot's planning system generates a hierarchical task graph that it executes while continuously monitoring for deviations. If a part slips from its grasp, it doesn't fail catastrophically — it recognizes the error, re-plans, and recovers. This robustness comes from the LLM's ability to reason about unexpected situations rather than relying on brittle pre-programmed error handlers.
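The decompose-then-recover pattern described above can be sketched in a few lines. This is not Figure's actual code — the planner here is a stand-in for the LLM, with the article's sorting instruction hard-coded as its output, and `do_step` stands in for the robot's controllers. What it illustrates is the structure: a hierarchical task graph executed depth-first, where a failed step triggers a bounded retry instead of a catastrophic abort.

```python
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    """One node in a hierarchical task graph."""
    name: str
    children: list["TaskNode"] = field(default_factory=list)
    max_retries: int = 2

def plan(instruction: str) -> TaskNode:
    # Stand-in for the LLM planner: a fixed decomposition of the
    # sorting instruction quoted in the article.
    root = TaskNode(instruction)
    root.children = [
        TaskNode("identify all parts"),
        TaskNode("estimate relative sizes"),
        TaskNode("determine sort order"),
        TaskNode("detect defects via visual inspection"),
        TaskNode("execute manipulation sequence"),
    ]
    return root

def execute(node: TaskNode, do_step, log: list[str]) -> bool:
    """Depth-first execution with retry-based recovery (e.g. a part slips)."""
    for attempt in range(node.max_retries + 1):
        if node.children:
            ok = all(execute(child, do_step, log) for child in node.children)
        else:
            ok = do_step(node.name)
        if ok:
            log.append(f"done: {node.name}")
            return True
        log.append(f"recover: {node.name} (attempt {attempt + 1})")
    return False
```

A real system would re-query the planner on failure rather than blindly retry, but the control flow — monitor, detect deviation, re-attempt within a budget — is the same shape.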

Performance Metrics

In BMW's initial deployment, Figure 02 achieved the following metrics after a 90-day evaluation period:

  • Task completion rate: 94% for trained tasks, 78% for novel task variations
  • Mean completion time: within 1.3x of a human worker's time for manipulation tasks
  • Unplanned downtime: Less than 2% over the evaluation period
  • Safety incidents: Zero reportable incidents across 10,000+ operating hours

Boston Dynamics Atlas: From Research Platform to Agentic Worker

Boston Dynamics took a different path to the same destination. Their electric Atlas humanoid, which replaced the hydraulic research platform in 2024, now ships with what the company calls the "Cognitive Layer" — an LLM-based planning system that sits atop their industry-leading locomotion and manipulation controllers.


The Cognitive Architecture

Atlas's cognitive architecture separates reasoning into three layers:

Strategic Layer (LLM): Processes natural language instructions, decomposes them into sub-tasks, and manages high-level planning. This layer uses a fine-tuned version of a frontier model that has been trained on millions of hours of robotic task execution data.

Tactical Layer (Neural Controllers): Converts high-level sub-tasks into motion plans, handling path planning, obstacle avoidance, and dynamic balance. These controllers were trained using reinforcement learning in simulation and transfer to the physical robot.

Reflexive Layer (Real-time Control): Handles sub-millisecond balance corrections, force feedback during manipulation, and safety-critical responses. This layer operates independently of the higher layers and can override any command that would put the robot in an unsafe state.
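The three-layer split can be made concrete with a short sketch. Everything here is hypothetical — the class names, the comma-separated "decomposition," and the tilt-angle safety check are all illustrative stand-ins, not Boston Dynamics' API — but the key property is real: the reflexive layer sits last in the pipeline and can veto any command from the layers above it.

```python
class StrategicLayer:
    """LLM stand-in: decompose an instruction into sub-tasks."""
    def decompose(self, instruction: str) -> list[str]:
        return [part.strip() for part in instruction.split(",")]

class TacticalLayer:
    """Turns a sub-task into a motion command (stub)."""
    def motion_for(self, subtask: str) -> str:
        return f"motion:{subtask}"

class ReflexiveLayer:
    """Fast safety loop: can override anything commanded above it."""
    def safe(self, command: str, tilt_deg: float) -> bool:
        # Hypothetical check: refuse motion when the balance margin is gone.
        return tilt_deg < 15.0

def run(instruction: str, tilt_deg: float) -> list[str]:
    strategic, tactical, reflexive = StrategicLayer(), TacticalLayer(), ReflexiveLayer()
    executed = []
    for subtask in strategic.decompose(instruction):
        cmd = tactical.motion_for(subtask)
        # The reflexive layer has the final word, regardless of the plan.
        if reflexive.safe(cmd, tilt_deg):
            executed.append(cmd)
        else:
            executed.append("halt")
            break
    return executed
```

The design point is that safety does not depend on the LLM being right: even a flawless high-level plan is filtered through a layer that answers only to sensor state.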

"The LLM gives Atlas common sense," said Robert Playter, CEO of Boston Dynamics. "It knows that fragile things should be handled gently, that heavy objects need a wide base of support, and that it should ask for clarification if an instruction is ambiguous. These are things we could never hand-code for every possible scenario."

Warehouse Deployments

Atlas began warehouse deployments with Hyundai's logistics division in February 2026. In these environments, the robot performs mixed-SKU picking, palletizing, and inventory auditing — tasks that require the combination of physical dexterity, spatial reasoning, and language understanding that neither pure robotics nor pure AI could handle alone.

The Technical Challenges

Despite the impressive demonstrations, significant technical challenges remain before LLM-powered robots achieve widespread deployment.

Latency

LLM inference takes hundreds of milliseconds, which is acceptable for high-level planning but too slow for reactive physical control. The current architectures handle this through the layered approach described above, but edge cases still exist where the strategic layer's decisions arrive too late for the tactical layer to execute safely.

Hallucination in Physical Space

LLM hallucination in a text context is inconvenient. LLM hallucination in a physical context is dangerous. If a robot's language model incorrectly reasons that an object is lightweight when it's actually heavy, the resulting manipulation attempt could cause damage or injury. Both Figure and Boston Dynamics have invested heavily in grounding mechanisms that cross-reference LLM reasoning with sensor data, but the problem is not fully solved.
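One simple form of grounding is to treat the LLM's estimate as a prior and the sensors as ground truth. The sketch below is an illustrative example of that idea, not either company's mechanism: a mass guess from the language model is cross-checked against a force/acceleration measurement (m = F/a), and the sensor-derived value wins whenever the guess disagrees beyond a tolerance.

```python
def grounded_mass_estimate(llm_guess_kg: float, measured_force_n: float,
                           accel_mps2: float, tolerance: float = 0.5) -> tuple[float, bool]:
    """Cross-check an LLM mass guess against a force/acceleration measurement.

    Returns (mass to act on, whether the LLM guess was trusted). The sensed
    estimate m = F/a overrides the guess when the fractional disagreement
    exceeds `tolerance`.
    """
    sensed_kg = measured_force_n / accel_mps2
    if abs(llm_guess_kg - sensed_kg) / sensed_kg <= tolerance:
        return llm_guess_kg, True
    return sensed_kg, False
```

In the dangerous case from the text — the model calls a 10 kg object lightweight — the check rejects the guess and the manipulation plan is built around the measured mass instead.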

Cost

Figure 02 is priced at approximately $60,000-$80,000 per unit for commercial customers, with a total cost of ownership including maintenance and cloud compute for the LLM layer estimated at $15-20 per operating hour. While this is competitive with fully loaded human labor costs in many markets, it remains prohibitive for smaller operations.
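The economics reduce to a simple break-even calculation. Using the article's mid-range figures ($70,000 unit price, $17.50 per operating hour) and an assumed $35/hour fully loaded human labor rate — the labor rate is an illustrative assumption, not from the article:

```python
def total_cost(unit_price: float, hourly_rate: float, hours: float) -> float:
    """Purchase price plus per-hour operating cost over `hours` of use."""
    return unit_price + hourly_rate * hours

# Mid-range article figures; the $35/hr human rate is an assumption.
robot = total_cost(70_000, 17.50, 4_000)
human = total_cost(0, 35.00, 4_000)
breakeven_hours = 70_000 / (35.00 - 17.50)   # hours until the robot pays off
```

Under these assumptions the robot breaks even at 4,000 operating hours — roughly two years of single-shift use — which is why the math works for large facilities but not for operations that cannot keep the robot busy.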

The Competitive Landscape

Figure and Boston Dynamics are not alone. Tesla's Optimus program continues development with a target of sub-$20,000 unit cost. Chinese manufacturers including Unitree Robotics and UBTECH are shipping simpler humanoid platforms at lower price points. Agility Robotics' Digit, focused on logistics, has been deployed at Amazon facilities since 2024.

The race is now on to determine which architecture — and which business model — will dominate the emerging market for general-purpose humanoid robots. The LLM-powered agentic approach championed by Figure and Boston Dynamics represents the highest-capability but also highest-cost end of the spectrum.

What is clear is that the combination of large language models and physical robotics has crossed a threshold. Robots that can understand, reason, plan, and act in the physical world are no longer science fiction. They are shipping products with paying customers and measurable ROI.


CallSphere Team