
Robotics Meets Agentic AI: Figure and Boston Dynamics Deploy LLM-Powered Robot Agents

Humanoid robots powered by large language models can now understand natural language commands and autonomously plan complex physical tasks, merging embodied AI with agentic reasoning.

When Language Models Get a Body

The convergence of large language models and physical robotics has produced what may be the most consequential development in AI since the transformer architecture itself. In Q1 2026, both Figure AI and Boston Dynamics demonstrated production-ready humanoid robots that use LLM-based agentic reasoning to understand natural language commands, plan multi-step physical tasks, and adapt to unexpected situations in real time.

This is not the scripted, pre-programmed robotics of the past decade. These systems combine the natural language understanding and reasoning capabilities of frontier LLMs with the perception, manipulation, and locomotion capabilities of modern humanoid platforms. The result is robots that can be instructed in plain English to perform complex, multi-step tasks they have never been explicitly programmed to execute.

"We've spent decades trying to hand-code every possible scenario a robot might encounter," said Brett Adcock, CEO of Figure AI. "LLMs give us general-purpose reasoning that transfers to the physical world. The robot doesn't need to have seen a specific task before — it can reason about novel situations using the same common-sense understanding that makes language models useful."

Figure 02: The First Commercial LLM-Powered Humanoid

Figure AI's second-generation humanoid robot, Figure 02, began commercial deployments in January 2026 at BMW's manufacturing facility in Spartanburg, South Carolina. The robot stands 5'6", weighs 130 pounds, and features 40 degrees of freedom with hands capable of manipulating objects as small as a pen.

What distinguishes Figure 02 from previous industrial robots is its cognitive architecture. The robot uses a multimodal LLM — developed in partnership with OpenAI — that processes visual input from stereo cameras, proprioceptive feedback from joint sensors, and natural language instructions simultaneously.

Task Planning and Execution

When given an instruction like "Sort these parts by size and place the defective ones in the red bin," Figure 02 decomposes the task into sub-steps: identify all parts, estimate relative sizes, determine sort order, detect defects using visual inspection, and execute the physical manipulation sequence. This decomposition happens in real time using chain-of-thought reasoning within the LLM.

The robot's planning system generates a hierarchical task graph that it executes while continuously monitoring for deviations. If a part slips from its grasp, it doesn't fail catastrophically — it recognizes the error, re-plans, and recovers. This robustness comes from the LLM's ability to reason about unexpected situations rather than relying on brittle pre-programmed error handlers.
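The decompose-then-recover pattern described above can be sketched in a few lines. This is not Figure's actual code — the planner here is a stand-in for the LLM, with the article's sorting instruction hard-coded as its output, and `do_step` stands in for the robot's controllers. What it illustrates is the structure: a hierarchical task graph executed depth-first, where a failed step triggers a bounded retry instead of a catastrophic abort.

```python
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    """One node in a hierarchical task graph."""
    name: str
    children: list["TaskNode"] = field(default_factory=list)
    max_retries: int = 2

def plan(instruction: str) -> TaskNode:
    # Stand-in for the LLM planner: a fixed decomposition of the
    # sorting instruction quoted in the article.
    root = TaskNode(instruction)
    root.children = [
        TaskNode("identify all parts"),
        TaskNode("estimate relative sizes"),
        TaskNode("determine sort order"),
        TaskNode("detect defects via visual inspection"),
        TaskNode("execute manipulation sequence"),
    ]
    return root

def execute(node: TaskNode, do_step, log: list[str]) -> bool:
    """Depth-first execution with retry-based recovery (e.g. a part slips)."""
    for attempt in range(node.max_retries + 1):
        if node.children:
            ok = all(execute(child, do_step, log) for child in node.children)
        else:
            ok = do_step(node.name)
        if ok:
            log.append(f"done: {node.name}")
            return True
        log.append(f"recover: {node.name} (attempt {attempt + 1})")
    return False
```

A real system would re-query the planner on failure rather than blindly retry, but the control flow — monitor, detect deviation, re-attempt within a budget — is the same shape.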

Performance Metrics

In BMW's initial deployment, Figure 02 achieved the following metrics after a 90-day evaluation period:

  • Task completion rate: 94% for trained tasks, 78% for novel task variations
  • Mean completion time: within 1.3x of a human worker's time for manipulation tasks
  • Unplanned downtime: Less than 2% over the evaluation period
  • Safety incidents: Zero reportable incidents across 10,000+ operating hours

Boston Dynamics Atlas: From Research Platform to Agentic Worker

Boston Dynamics took a different path to the same destination. Their electric Atlas humanoid, which replaced the hydraulic research platform in 2024, now ships with what the company calls the "Cognitive Layer" — an LLM-based planning system that sits atop their industry-leading locomotion and manipulation controllers.


The Cognitive Architecture

Atlas's cognitive architecture separates reasoning into three layers:

Strategic Layer (LLM): Processes natural language instructions, decomposes them into sub-tasks, and manages high-level planning. This layer uses a fine-tuned version of a frontier model that has been trained on millions of hours of robotic task execution data.

Tactical Layer (Neural Controllers): Converts high-level sub-tasks into motion plans, handling path planning, obstacle avoidance, and dynamic balance. These controllers were trained using reinforcement learning in simulation and transfer to the physical robot.

Reflexive Layer (Real-time Control): Handles sub-millisecond balance corrections, force feedback during manipulation, and safety-critical responses. This layer operates independently of the higher layers and can override any command that would put the robot in an unsafe state.
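The three-layer split can be made concrete with a short sketch. Everything here is hypothetical — the class names, the comma-separated "decomposition," and the tilt-angle safety check are all illustrative stand-ins, not Boston Dynamics' API — but the key property is real: the reflexive layer sits last in the pipeline and can veto any command from the layers above it.

```python
class StrategicLayer:
    """LLM stand-in: decompose an instruction into sub-tasks."""
    def decompose(self, instruction: str) -> list[str]:
        return [part.strip() for part in instruction.split(",")]

class TacticalLayer:
    """Turns a sub-task into a motion command (stub)."""
    def motion_for(self, subtask: str) -> str:
        return f"motion:{subtask}"

class ReflexiveLayer:
    """Fast safety loop: can override anything commanded above it."""
    def safe(self, command: str, tilt_deg: float) -> bool:
        # Hypothetical check: refuse motion when the balance margin is gone.
        return tilt_deg < 15.0

def run(instruction: str, tilt_deg: float) -> list[str]:
    strategic, tactical, reflexive = StrategicLayer(), TacticalLayer(), ReflexiveLayer()
    executed = []
    for subtask in strategic.decompose(instruction):
        cmd = tactical.motion_for(subtask)
        # The reflexive layer has the final word, regardless of the plan.
        if reflexive.safe(cmd, tilt_deg):
            executed.append(cmd)
        else:
            executed.append("halt")
            break
    return executed
```

The design point is that safety does not depend on the LLM being right: even a flawless high-level plan is filtered through a layer that answers only to sensor state.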

"The LLM gives Atlas common sense," said Robert Playter, CEO of Boston Dynamics. "It knows that fragile things should be handled gently, that heavy objects need a wide base of support, and that it should ask for clarification if an instruction is ambiguous. These are things we could never hand-code for every possible scenario."

Warehouse Deployments

Atlas began warehouse deployments with Hyundai's logistics division in February 2026. In these environments, the robot performs mixed-SKU picking, palletizing, and inventory auditing — tasks that require the combination of physical dexterity, spatial reasoning, and language understanding that neither pure robotics nor pure AI could handle alone.

The Technical Challenges

Despite the impressive demonstrations, significant technical challenges remain before LLM-powered robots achieve widespread deployment.

Latency

LLM inference takes hundreds of milliseconds, which is acceptable for high-level planning but too slow for reactive physical control. The current architectures handle this through the layered approach described above, but edge cases still exist where the strategic layer's decisions arrive too late for the tactical layer to execute safely.

Hallucination in Physical Space

LLM hallucination in a text context is inconvenient. LLM hallucination in a physical context is dangerous. If a robot's language model incorrectly reasons that an object is lightweight when it's actually heavy, the resulting manipulation attempt could cause damage or injury. Both Figure and Boston Dynamics have invested heavily in grounding mechanisms that cross-reference LLM reasoning with sensor data, but the problem is not fully solved.
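One simple form of grounding is to treat the LLM's estimate as a prior and the sensors as ground truth. The sketch below is an illustrative example of that idea, not either company's mechanism: a mass guess from the language model is cross-checked against a force/acceleration measurement (m = F/a), and the sensor-derived value wins whenever the guess disagrees beyond a tolerance.

```python
def grounded_mass_estimate(llm_guess_kg: float, measured_force_n: float,
                           accel_mps2: float, tolerance: float = 0.5) -> tuple[float, bool]:
    """Cross-check an LLM mass guess against a force/acceleration measurement.

    Returns (mass to act on, whether the LLM guess was trusted). The sensed
    estimate m = F/a overrides the guess when the fractional disagreement
    exceeds `tolerance`.
    """
    sensed_kg = measured_force_n / accel_mps2
    if abs(llm_guess_kg - sensed_kg) / sensed_kg <= tolerance:
        return llm_guess_kg, True
    return sensed_kg, False
```

In the dangerous case from the text — the model calls a 10 kg object lightweight — the check rejects the guess and the manipulation plan is built around the measured mass instead.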

Cost

Figure 02 is priced at approximately $60,000-$80,000 per unit for commercial customers, with a total cost of ownership including maintenance and cloud compute for the LLM layer estimated at $15-20 per operating hour. While this is competitive with fully loaded human labor costs in many markets, it remains prohibitive for smaller operations.
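The economics reduce to a simple break-even calculation. Using the article's mid-range figures ($70,000 unit price, $17.50 per operating hour) and an assumed $35/hour fully loaded human labor rate — the labor rate is an illustrative assumption, not from the article:

```python
def total_cost(unit_price: float, hourly_rate: float, hours: float) -> float:
    """Purchase price plus per-hour operating cost over `hours` of use."""
    return unit_price + hourly_rate * hours

# Mid-range article figures; the $35/hr human rate is an assumption.
robot = total_cost(70_000, 17.50, 4_000)
human = total_cost(0, 35.00, 4_000)
breakeven_hours = 70_000 / (35.00 - 17.50)   # hours until the robot pays off
```

Under these assumptions the robot breaks even at 4,000 operating hours — roughly two years of single-shift use — which is why the math works for large facilities but not for operations that cannot keep the robot busy.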

The Competitive Landscape

Figure and Boston Dynamics are not alone. Tesla's Optimus program continues development with a target of sub-$20,000 unit cost. Chinese manufacturers including Unitree Robotics and UBTECH are shipping simpler humanoid platforms at lower price points. Agility Robotics' Digit, focused on logistics, has been deployed at Amazon facilities since 2024.

The race is now on to determine which architecture — and which business model — will dominate the emerging market for general-purpose humanoid robots. The LLM-powered agentic approach championed by Figure and Boston Dynamics represents the highest-capability but also highest-cost end of the spectrum.

What is clear is that the combination of large language models and physical robotics has crossed a threshold. Robots that can understand, reason, plan, and act in the physical world are no longer science fiction. They are shipping products with paying customers and measurable ROI.


CallSphere Team