Build a Humanoid Robot Software Stack in 6 Months

Complete guide to building Tesla Optimus-style robot software from scratch using ROS2, PyTorch, and real-time control systems.

Problem: Building Humanoid Robot Software Is Fragmented

You want to build a Tesla Optimus-style humanoid robot, but the software stack spans motion control, vision, manipulation, balance, and AI coordination, with no unified guide covering the whole system.

You'll learn:

  • Complete software architecture for humanoid robots
  • Real-time motion control and inverse kinematics
  • Vision and manipulation pipeline integration
  • How to coordinate subsystems like Tesla does

Time: 6 months (realistic timeline) | Level: Advanced


Why This Is Complex

Humanoid robots need 5 subsystems running in real-time coordination:

Architecture layers:

  • Layer 1: Low-level motor control (1kHz control loop)
  • Layer 2: Inverse kinematics and balance (100Hz)
  • Layer 3: Motion planning and collision avoidance (10Hz)
  • Layer 4: Vision and object detection (30 FPS)
  • Layer 5: High-level task planning (1Hz)

Each layer has different timing requirements. Tesla solves this with custom firmware and FSD-derived vision systems.
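On a single compute board, a pragmatic way to honor these mixed rates is a rate-divided loop: tick the fastest layer every cycle and the slower layers every Nth cycle. A minimal sketch (layer names and divider values are illustrative, not from any real stack):

```python
# Rate-divided scheduler sketch: the fastest layer runs every cycle,
# slower layers every Nth cycle. Divider values are illustrative.
DIVIDERS = {"motor_control": 1, "balance_ik": 10, "motion_planning": 100}

def run(cycles=1000):
    counts = dict.fromkeys(DIVIDERS, 0)
    for tick in range(cycles):
        for layer, divider in DIVIDERS.items():
            if tick % divider == 0:
                counts[layer] += 1  # stand-in for the layer's update step
    return counts

print(run())  # {'motor_control': 1000, 'balance_ik': 100, 'motion_planning': 10}
```

Real systems put the 1kHz loop on a dedicated real-time thread (or a microcontroller) and drive the slower layers from ROS2 timers, but the divider pattern is the same.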


Architecture Overview

┌─────────────────────────────────────────┐
│  High-Level Planner (Python/C++)        │
│  - Task decomposition                   │
│  - Environment reasoning                │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│  Vision Pipeline (CUDA/PyTorch)         │
│  - Object detection (YOLO/SAM2)         │
│  - Depth estimation                     │
│  - Hand-eye coordination                │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│  Motion Planner (MoveIt2/OMPL)          │
│  - Path planning                        │
│  - Collision checking                   │
│  - Grasp planning                       │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│  Whole-Body Controller (C++/Rust)       │
│  - Inverse kinematics solver            │
│  - Balance control (ZMP/LQR)            │
│  - Joint trajectory tracking            │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│  Motor Drivers (EtherCAT/CAN)           │
│  - Position/torque control              │
│  - Safety limits                        │
│  - Force feedback                       │
└─────────────────────────────────────────┘

Phase 1: Foundation (Weeks 1-4)

Step 1: ROS2 Workspace Setup

Use ROS2 Humble (LTS until 2027) as the middleware backbone.

# Ubuntu 22.04 setup (ROS2 Humble targets Ubuntu 22.04)
sudo apt install ros-humble-desktop ros-humble-moveit \
  ros-humble-gazebo-ros2-control python3-colcon-common-extensions

# Create workspace
mkdir -p ~/humanoid_ws/src
cd ~/humanoid_ws
colcon build
source install/setup.bash

Why ROS2: Unlike ROS1, ROS2 has real-time capabilities (DDS middleware), multi-robot support, and modern C++17/Python 3.10+ compatibility.


Step 2: Robot Description (URDF)

Define your robot's kinematic structure. Optimus has ~28 DOF (degrees of freedom).

<?xml version="1.0"?>
<!-- humanoid.urdf.xacro -->
<robot xmlns:xacro="http://www.ros.org/wiki/xacro" name="humanoid">
  
  <!-- Base parameters -->
  <xacro:property name="torso_height" value="0.8"/>
  <xacro:property name="leg_length" value="0.9"/>
  
  <!-- Torso -->
  <link name="base_link">
    <inertial>
      <mass value="15.0"/>
      <inertia ixx="0.5" ixy="0" ixz="0" iyy="0.5" iyz="0" izz="0.3"/>
    </inertial>
    <visual>
      <geometry>
        <box size="0.3 0.4 ${torso_height}"/>
      </geometry>
    </visual>
    <collision>
      <geometry>
        <box size="0.3 0.4 ${torso_height}"/>
      </geometry>
    </collision>
  </link>
  
  <!-- Hip joint (ball joint as 3 revolute) -->
  <joint name="left_hip_yaw" type="revolute">
    <parent link="base_link"/>
    <child link="left_hip_roll_link"/>
    <origin xyz="0 0.15 -0.4" rpy="0 0 0"/>
    <axis xyz="0 0 1"/>
    <limit effort="100" velocity="2.0" lower="-0.5" upper="0.5"/>
  </joint>
  
  <!-- Add more joints: hip_roll, hip_pitch, knee, ankle... -->
  
</robot>

Expected: Robot visualized in RViz with proper link transforms.

If it fails:

  • "No transform from X to Y": Check joint parent/child links match link names
  • Robot falls through floor in Gazebo: Add collision geometry
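The "No transform from X to Y" failure usually traces back to a joint referencing a link that was never declared. A stdlib-only checker for that mismatch (run it on the expanded URDF, i.e. after xacro processing; the function name is ours):

```python
import xml.etree.ElementTree as ET

def check_urdf_links(urdf_xml):
    """Flag joints whose <parent>/<child> reference an undeclared <link>."""
    root = ET.fromstring(urdf_xml)
    links = {link.get("name") for link in root.findall("link")}
    problems = []
    for joint in root.findall("joint"):
        for tag in ("parent", "child"):
            ref = joint.find(tag).get("link")
            if ref not in links:
                problems.append(f"{joint.get('name')}: unknown {tag} link '{ref}'")
    return problems

urdf = """<robot name="t">
  <link name="base_link"/>
  <joint name="j1" type="revolute">
    <parent link="base_link"/><child link="left_hip_roll_link"/>
  </joint>
</robot>"""
print(check_urdf_links(urdf))  # ["j1: unknown child link 'left_hip_roll_link'"]
```

The official check_urdf tool does a stricter version of this; the sketch above is handy when you want the check inside a CI script.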

Step 3: Simulation Environment

# Install Gazebo with physics plugin
sudo apt install ros-humble-gazebo-ros-pkgs

# Launch simulation
ros2 launch humanoid_description gazebo.launch.py

Create gazebo.launch.py:

from launch import LaunchDescription
from launch_ros.actions import Node
from launch.substitutions import Command
from ament_index_python.packages import get_package_share_directory
import os

def generate_launch_description():
    pkg_share = get_package_share_directory('humanoid_description')
    urdf_file = os.path.join(pkg_share, 'urdf', 'humanoid.urdf.xacro')
    
    # Process xacro to URDF
    robot_desc = Command(['xacro ', urdf_file])
    
    return LaunchDescription([
        Node(
            package='gazebo_ros',
            executable='spawn_entity.py',
            arguments=['-entity', 'humanoid', '-topic', 'robot_description'],
        ),
        Node(
            package='robot_state_publisher',
            executable='robot_state_publisher',
            parameters=[{'robot_description': robot_desc}]
        ),
    ])

Phase 2: Motion Control (Weeks 5-12)

Step 4: Inverse Kinematics Solver

Use KDL (Kinematics and Dynamics Library) for IK. Tesla likely uses custom Jacobian-based solvers for speed.

// ik_solver.cpp
#include <kdl/chain.hpp>
#include <kdl/chainfksolverpos_recursive.hpp>
#include <kdl/chainiksolverpos_lma.hpp>

class HumanoidIK {
private:
    KDL::Chain leg_chain_;
    std::unique_ptr<KDL::ChainIkSolverPos_LMA> ik_solver_;
    
public:
    HumanoidIK() {
        // Build kinematic chain for leg (hip -> ankle)
        leg_chain_.addSegment(KDL::Segment(
            KDL::Joint(KDL::Joint::RotZ),  // Hip yaw
            KDL::Frame::DH(0, M_PI_2, 0, 0)
        ));
        // Add more segments...
        
        // Levenberg-Marquardt solver for numerical IK
        ik_solver_ = std::make_unique<KDL::ChainIkSolverPos_LMA>(leg_chain_);
    }
    
    bool solve(const KDL::Frame& target_pose, KDL::JntArray& joint_positions) {
        KDL::JntArray seed(leg_chain_.getNrOfJoints());
        // Use current joint positions as seed for faster convergence
        
        int result = ik_solver_->CartToJnt(seed, target_pose, joint_positions);
        return result >= 0;  // Success if non-negative
    }
};

Why Levenberg-Marquardt: It degrades gracefully near singularities, where a plain Jacobian pseudo-inverse blows up. Tesla likely runs custom GPU-accelerated solvers to handle its ~28 DOF in real time.
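To see what a damped (Levenberg-Marquardt-style) iteration actually does, here is a self-contained numpy sketch on a 2-link planar leg. Link lengths, damping, and the line search are illustrative choices, not KDL's internals:

```python
import numpy as np

# Damped least-squares IK on a 2-link planar leg (hip pitch + knee).
L1, L2 = 0.45, 0.45  # thigh and shank lengths (m), hypothetical

def forward(q):
    return np.array([L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                     L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def solve_ik(target, q0, damping=0.1, iters=200):
    q = q0.astype(float).copy()
    for _ in range(iters):
        err = target - forward(q)
        if np.linalg.norm(err) < 1e-8:
            break
        J = jacobian(q)
        # Damped normal equations: (J^T J + lambda^2 I) dq = J^T err.
        # The damping term keeps the step finite near singularities.
        dq = np.linalg.solve(J.T @ J + damping**2 * np.eye(2), J.T @ err)
        # Backtracking line search prevents overshooting
        step = 1.0
        while step > 1e-4 and np.linalg.norm(target - forward(q + step * dq)) >= np.linalg.norm(err):
            step *= 0.5
        q = q + step * dq
    return q

target = np.array([0.5, 0.3])
q = solve_ik(target, np.array([0.3, 0.3]))
print(np.linalg.norm(forward(q) - target))  # ~0 (converged)
```

Warm-starting q0 with the current joint positions, as the C++ code suggests, is what makes this fast enough for a 100Hz loop.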


Step 5: Balance Controller (Zero Moment Point)

Implement ZMP-based balance like ASIMO and Optimus use.

# zmp_controller.py
import numpy as np
from scipy.linalg import solve_continuous_are

class ZMPBalanceController:
    def __init__(self, height=1.0, gravity=9.81):
        self.h = height  # COM height
        self.g = gravity
        
        # LQR gains for inverted pendulum model
        A = np.array([[0, 1], [self.g/self.h, 0]])
        B = np.array([[0], [1]])
        Q = np.diag([100, 1])  # Penalize position error more than velocity
        R = np.array([[1]])
        
        # Solve algebraic Riccati equation
        P = solve_continuous_are(A, B, Q, R)
        self.K = np.linalg.inv(R) @ B.T @ P
        
    def compute_zmp_force(self, com_position, com_velocity, desired_zmp):
        """
        Returns required ground reaction force to achieve desired ZMP.
        
        ZMP equation: p = x - (h/g) * x_ddot
        where p = ZMP position, x = COM position, h = height
        """
        state = np.array([com_position - desired_zmp, com_velocity])
        control = -self.K @ state
        
        # Convert to ankle torque
        ankle_torque = control[0] * self.h
        return ankle_torque
    
    def is_stable(self, zmp_position, support_polygon):
        """Check if ZMP is inside foot support polygon."""
        # Simplified: check if within foot bounds
        return (support_polygon[0] < zmp_position < support_polygon[1])

# Usage in control loop
controller = ZMPBalanceController(height=1.0)

def balance_callback(imu_data, joint_states):
    com_pos = estimate_com_from_joints(joint_states)
    com_vel = estimate_com_velocity(imu_data)
    
    desired_zmp = [0.0, 0.0]  # Center of foot
    torque = controller.compute_zmp_force(com_pos[0], com_vel[0], desired_zmp[0])
    
    publish_ankle_torque(torque)

Expected: Robot maintains balance when pushed with forces under 50N.
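Before trusting these gains on hardware, roll out the closed loop on the same linearized pendulum model. For this particular (A, B, Q, R) the Riccati equation collapses to two scalar equations, so the check below needs only numpy (the closed-form gains match what solve_continuous_are returns):

```python
import numpy as np

g, h = 9.81, 1.0
a = g / h
q_pos, q_vel = 100.0, 1.0           # same Q weights as ZMPBalanceController
# Closed-form Riccati solution for this 2-state system with R = 1:
p12 = a + np.sqrt(a * a + q_pos)    # from p12^2 - 2*a*p12 - q_pos = 0
p22 = np.sqrt(2.0 * p12 + q_vel)
K = np.array([p12, p22])            # LQR gain, roughly [23.8, 7.0]

A = np.array([[0.0, 1.0], [a, 0.0]])
B = np.array([0.0, 1.0])

# Forward-Euler rollout from a 5 cm COM offset
dt = 0.001
state = np.array([0.05, 0.0])
for _ in range(3000):               # 3 simulated seconds
    u = -K @ state
    state = state + dt * (A @ state + B * u)

print(abs(state[0]) < 1e-3)  # True: COM error regulated away
```

If the rollout diverges or rings for several seconds, retune Q and R before blaming the robot.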


Step 6: Whole-Body Controller

Combine IK, balance, and joint control into unified controller.

// whole_body_controller.rs
use nalgebra as na;

// Joint array indices (layout depends on your URDF joint ordering)
const HIP_PITCH: usize = 2;
const ANKLE_PITCH: usize = 5;

pub struct BalanceConstraint {
    pub desired_zmp: f64,
    pub current_zmp: f64,
}

pub struct WholeBodyController {
    joint_limits: Vec<(f64, f64)>,
    max_joint_velocity: f64,
    control_rate: f64,  // 100 Hz typical
}

impl WholeBodyController {
    pub fn compute_joint_commands(
        &self,
        target_ee_pose: &na::Isometry3<f64>,  // End-effector pose
        current_joints: &[f64],
        balance_constraint: &BalanceConstraint,
    ) -> Vec<f64> {
        // 1. Solve IK for target pose
        let mut desired_joints = self.inverse_kinematics(target_ee_pose);
        
        // 2. Apply balance constraints (adjust hip/ankle to maintain ZMP)
        self.apply_balance_adjustment(&mut desired_joints, balance_constraint);
        
        // 3. Enforce joint limits and velocity limits
        for (i, joint) in desired_joints.iter_mut().enumerate() {
            let (min, max) = self.joint_limits[i];
            *joint = joint.clamp(min, max);
            
            // Velocity limiting
            let max_delta = self.max_joint_velocity / self.control_rate;
            let delta = *joint - current_joints[i];
            *joint = current_joints[i] + delta.clamp(-max_delta, max_delta);
        }
        
        desired_joints
    }
    
    fn apply_balance_adjustment(
        &self,
        joints: &mut [f64],
        balance: &BalanceConstraint
    ) {
        // Adjust hip and ankle joints to satisfy ZMP constraint
        // This is a simplified null-space projection
        let zmp_error = balance.desired_zmp - balance.current_zmp;
        joints[HIP_PITCH] += 0.1 * zmp_error;  // Proportional adjustment
        joints[ANKLE_PITCH] -= 0.05 * zmp_error;
    }
}

Why Rust: memory safety without garbage-collection pauses. A 100Hz hard real-time loop cannot tolerate Python's GC stalls or interpreter overhead.


Phase 3: Vision Pipeline (Weeks 13-18)

Step 7: Object Detection with SAM2

Use Segment Anything Model 2 for visual understanding like Tesla FSD does.

# vision_node.py
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
import cv2

class VisionPipeline:
    def __init__(self):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        
        # Load SAM2 model (Hiera-L backbone)
        checkpoint = "./checkpoints/sam2_hiera_large.pt"
        config = "sam2_hiera_l.yaml"
        self.predictor = SAM2ImagePredictor(build_sam2(config, checkpoint))
        
        # Depth estimation model
        self.depth_model = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
        self.depth_model.to(self.device).eval()
        
    def process_frame(self, rgb_image):
        """Returns object masks and 3D positions."""
        # 1. Get object masks from SAM2
        self.predictor.set_image(rgb_image)
        
        # Detect objects (you'd use YOLO/DINO for bounding boxes first)
        boxes = self.detect_objects(rgb_image)  # Returns [[x1,y1,x2,y2], ...]
        
        masks, scores, _ = self.predictor.predict(
            box=boxes,
            multimask_output=False
        )
        
        # 2. Estimate depth
        depth_map = self.estimate_depth(rgb_image)
        
        # 3. Get 3D position of each object
        objects_3d = []
        for i, mask in enumerate(masks):
            # Get median depth in mask region
            masked_depth = depth_map[mask[0]]
            depth = torch.median(masked_depth).item()
            
            # Convert to 3D using camera intrinsics
            center_x, center_y = self.get_mask_center(mask[0])
            point_3d = self.pixel_to_3d(center_x, center_y, depth)
            
            objects_3d.append({
                'mask': mask[0],
                'position': point_3d,
                'confidence': scores[i]
            })
            
        return objects_3d
    
    def estimate_depth(self, rgb_image):
        """Monocular depth estimation using MiDaS."""
        # Simplified preprocessing: the official MiDaS transforms also
        # resize and normalize; use them in practice.
        img_tensor = torch.from_numpy(rgb_image).permute(2, 0, 1).float() / 255.0
        img_tensor = img_tensor.unsqueeze(0).to(self.device)
        
        with torch.no_grad():
            depth = self.depth_model(img_tensor)
            # Interpolate back to the input resolution so SAM2 masks line up
            depth = torch.nn.functional.interpolate(
                depth.unsqueeze(1), size=rgb_image.shape[:2],
                mode="bicubic", align_corners=False
            ).squeeze()
            
        return depth.cpu()
    
    def pixel_to_3d(self, u, v, depth):
        """Convert pixel + depth to 3D point using camera intrinsics."""
        # Camera intrinsics (adjust for your camera)
        fx, fy = 525.0, 525.0  # Focal lengths
        cx, cy = 319.5, 239.5  # Principal point
        
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        z = depth
        
        return [x, y, z]

# ROS2 node wrapper
import rclpy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

def main():
    rclpy.init()
    node = rclpy.create_node('vision_pipeline')
    bridge = CvBridge()
    
    pipeline = VisionPipeline()
    
    def camera_callback(msg):
        # Convert ROS Image to numpy
        rgb = bridge.imgmsg_to_cv2(msg, "rgb8")
        
        # Process
        objects = pipeline.process_frame(rgb)
        
        # Publish detections
        publish_objects(objects)
    
    node.create_subscription(Image, '/camera/image_raw', camera_callback, 10)
    rclpy.spin(node)

Why SAM2 over YOLO: Better at novel objects and part segmentation, which is what a general-purpose "pick up anything" robot needs. (Tesla's actual perception stack is proprietary; SAM2 is a practical open-source stand-in.)

GPU requirement: RTX 4060 minimum for 30 FPS, RTX 4090 recommended for real-time.
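The pinhole model inside pixel_to_3d is easy to sanity-check with a projection roundtrip using the same illustrative intrinsics:

```python
import numpy as np

# Project a known 3D point with the pinhole model, then back-project it;
# the roundtrip should recover the original point.
fx, fy = 525.0, 525.0
cx, cy = 319.5, 239.5

def project(p):
    x, y, z = p
    return fx * x / z + cx, fy * y / z + cy

def pixel_to_3d(u, v, depth):
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

point = np.array([0.2, -0.1, 1.5])   # 3D point in the camera frame (m)
u, v = project(point)
recovered = pixel_to_3d(u, v, point[2])
print(np.allclose(recovered, point))  # True
```

Note that the real error budget is dominated by depth, not the model: monocular MiDaS output is relative depth, so it must be scaled against a known reference (or replaced by stereo/ToF) before this conversion is metric.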


Step 8: Hand-Eye Coordination

Calibrate camera to robot's coordinate frame.

# hand_eye_calibration.py
import numpy as np
from scipy.spatial.transform import Rotation

def calibrate_hand_eye(robot_poses, camera_poses):
    """
    Solves AX = XB problem where:
    A = robot base -> end-effector transforms
    B = camera -> marker transforms
    X = end-effector -> camera transform (what we want)
    """
    # Using Tsai-Lenz method
    n = len(robot_poses)
    
    # Build linear system for rotation
    M = np.zeros((3*n, 3))
    C = np.zeros(3*n)
    
    for i in range(n):
        Ra = Rotation.from_matrix(robot_poses[i][:3, :3])
        Rb = Rotation.from_matrix(camera_poses[i][:3, :3])
        
        # Convert to axis-angle
        alpha = Ra.as_rotvec()
        beta = Rb.as_rotvec()
        
        # Build equation
        M[3*i:3*i+3, :] = skew(alpha + beta)
        C[3*i:3*i+3] = beta - alpha
    
    # Solve for rotation (note: Tsai-Lenz actually solves for a modified
    # Rodrigues vector; treating it directly as a rotation vector is an
    # approximation that holds for small calibration errors)
    rotation_vec, _, _, _ = np.linalg.lstsq(M, C, rcond=None)
    R_x = Rotation.from_rotvec(rotation_vec).as_matrix()
    
    # Solve for translation: stack (Ra_i - I) t_x = R_x tb_i - ta_i
    M_t = np.zeros((3*n, 3))
    C_t = np.zeros(3*n)
    for i in range(n):
        M_t[3*i:3*i+3, :] = robot_poses[i][:3, :3] - np.eye(3)
        C_t[3*i:3*i+3] = R_x @ camera_poses[i][:3, 3] - robot_poses[i][:3, 3]
    t_x, _, _, _ = np.linalg.lstsq(M_t, C_t, rcond=None)
    
    return build_transform(R_x, t_x)

def skew(v):
    """Skew-symmetric matrix from vector."""
    return np.array([
        [0, -v[2], v[1]],
        [v[2], 0, -v[0]],
        [-v[1], v[0], 0]
    ])

def build_transform(R, t):
    """Assemble a 4x4 homogeneous transform from rotation and translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

Calibration process:

  1. Move robot to 20+ poses with calibration marker visible
  2. Record robot joint states and marker poses
  3. Run calibration solver
  4. Validate with known object pickup
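Step 4 can be made quantitative: for relative gripper motions A_i and camera motions B_i, a correct hand-eye transform X satisfies A_i X = X B_i, so the residual on held-out pairs measures calibration quality. A small checker with a synthetic self-test (helper names are ours):

```python
import numpy as np

def rot_z(theta):
    """4x4 homogeneous transform: rotation about z."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def hand_eye_residual(X, motion_pairs):
    """Mean translation residual of A @ X - X @ B over (A, B) pairs, in meters."""
    errs = [np.linalg.norm((A @ X - X @ B)[:3, 3]) for A, B in motion_pairs]
    return float(np.mean(errs))

# Synthetic check: pick a ground-truth X, generate consistent (A, B) pairs
X_true = rot_z(0.3)
X_true[:3, 3] = [0.05, 0.0, 0.1]          # made-up camera offset from gripper
pairs = []
for th in (0.2, 0.5, 1.0):
    A = rot_z(th)
    A[:3, 3] = [0.1 * th, 0.0, 0.0]
    B = np.linalg.inv(X_true) @ A @ X_true  # the same motion seen by the camera
    pairs.append((A, B))

print(hand_eye_residual(X_true, pairs) < 1e-10)  # True for the true X
```

On a real robot, a mean residual of a few millimeters across 20+ held-out motion pairs is a reasonable acceptance bar before trusting grasps.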

Phase 4: Task Planning (Weeks 19-24)

Step 9: Behavior Trees for Task Coordination

Use BehaviorTree.CPP, the de facto standard for robot task orchestration (it also powers the behavior trees in ROS2's Nav2).

<!-- pick_object.xml -->
<root main_tree_to_execute="PickAndPlace">
  <BehaviorTree ID="PickAndPlace">
    <Sequence>
      
      <!-- Phase 1: Perception -->
      <DetectObject name="target_object"/>
      <ComputeGraspPose pose="{grasp_pose}"/>
      
      <!-- Phase 2: Approach -->
      <!-- PlanPath failing fails the whole sequence, which is what we want:
           executing without a valid {planned_path} would be unsafe -->
      <PlanPath start="{current_pose}" goal="{pre_grasp_pose}"/>
      <ExecuteTrajectory trajectory="{planned_path}"/>
      
      <!-- Phase 3: Grasp -->
      <Sequence>
        <MoveToGraspPose pose="{grasp_pose}"/>
        <CloseGripper force="20N"/>
        <CheckGraspSuccess/>
      </Sequence>
      
      <!-- Phase 4: Lift -->
      <MoveUp distance="0.2m"/>
      
    </Sequence>
  </BehaviorTree>
</root>

C++ node implementation:

// behavior_nodes.cpp
#include <behaviortree_cpp_v3/behavior_tree.h>
#include <rclcpp/rclcpp.hpp>

class DetectObject : public BT::SyncActionNode {
public:
    DetectObject(const std::string& name, const BT::NodeConfiguration& config,
                 rclcpp::Node::SharedPtr node)
        : BT::SyncActionNode(name, config), node_(node) {}
    
    static BT::PortsList providedPorts() {
        return { BT::OutputPort<std::string>("object_id") };
    }
    
    BT::NodeStatus tick() override {
        // Call vision service
        auto client = node_->create_client<ObjectDetection>("detect_objects");
        auto request = std::make_shared<ObjectDetection::Request>();
        
        auto future = client->async_send_request(request);
        if (rclcpp::spin_until_future_complete(node_, future) == 
            rclcpp::FutureReturnCode::SUCCESS) {
            
            auto response = future.get();
            if (!response->objects.empty()) {
                setOutput("object_id", response->objects[0].id);
                return BT::NodeStatus::SUCCESS;
            }
        }
        
        return BT::NodeStatus::FAILURE;
    }
    
private:
    rclcpp::Node::SharedPtr node_;  // injected ROS2 node handle
};

class MoveToGraspPose : public BT::AsyncActionNode {
    // Runs in a separate thread so it doesn't block the behavior tree
public:
    MoveToGraspPose(const std::string& name, const BT::NodeConfiguration& config,
                    rclcpp::Node::SharedPtr node)
        : BT::AsyncActionNode(name, config), node_(node) {}
    
    static BT::PortsList providedPorts() {
        return { BT::InputPort<geometry_msgs::msg::Pose>("pose") };
    }
    
    BT::NodeStatus tick() override {
        geometry_msgs::msg::Pose target_pose;
        if (!getInput("pose", target_pose)) {
            return BT::NodeStatus::FAILURE;
        }
        
        // Send to motion controller
        auto action_client = rclcpp_action::create_client<MoveToTarget>(
            node_, "move_to_target");
        
        auto goal = MoveToTarget::Goal();
        goal.target_pose = target_pose;
        
        auto future = action_client->async_send_goal(goal);
        // Wait for completion...
        
        return BT::NodeStatus::SUCCESS;
    }
    
private:
    rclcpp::Node::SharedPtr node_;  // injected ROS2 node handle
};

Step 10: Integration and System Orchestration

Main system launcher bringing everything together:

# humanoid_system.launch.py
from launch import LaunchDescription
from launch_ros.actions import Node
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from ament_index_python.packages import get_package_share_directory

def generate_launch_description():
    return LaunchDescription([
        
        # Layer 1: Motor control (1kHz)
        Node(
            package='motor_control',
            executable='ethercat_driver',
            parameters=[{'control_frequency': 1000}]
        ),
        
        # Layer 2: Whole-body controller (100Hz)
        Node(
            package='humanoid_control',
            executable='whole_body_controller',
            parameters=[{'control_frequency': 100}]
        ),
        
        # Layer 3: Motion planner (your generated MoveIt config package)
        IncludeLaunchDescription(
            PythonLaunchDescriptionSource([
                get_package_share_directory('humanoid_moveit_config'),
                '/launch/moveit.launch.py'
            ])
        ),
        
        # Layer 4: Vision pipeline (30 FPS)
        Node(
            package='humanoid_vision',
            executable='vision_pipeline',
            parameters=[{'camera_topic': '/camera/image_raw'}]
        ),
        
        # Layer 5: Behavior tree executor
        Node(
            package='humanoid_behavior',
            executable='behavior_executor',
            parameters=[{'tree_file': 'pick_and_place.xml'}]
        ),
        
        # Monitoring and safety
        Node(
            package='humanoid_safety',
            executable='safety_monitor',
            parameters=[{'emergency_stop_topic': '/estop'}]
        ),
    ])

Verification

Test 1: Balance Test

# Launch system
ros2 launch humanoid_bringup system.launch.py

# Apply external force (gazebo_tools is a stand-in helper package here;
# Gazebo Classic also exposes an apply_link_wrench service for this)
ros2 run gazebo_tools apply_force --force 30 --duration 2

You should see: Robot recovers balance within 1 second, ZMP stays within support polygon.


Test 2: Object Manipulation

# Place object in simulation (spawn helper assumed; gazebo_ros's
# spawn_entity.py works too)
ros2 run gazebo_tools spawn_object --name "cube" --pose "[0.5, 0, 0.8]"

# Run pick task
ros2 action send_goal /pick_object humanoid_msgs/action/PickObject "{object_id: 'cube'}"

Success criteria:

  • Object detected within 0.5s
  • Grasp planned within 2s
  • Object lifted without dropping
  • No self-collisions

Test 3: Walking Gait

# Command walking
ros2 topic pub /cmd_vel geometry_msgs/Twist \
  "{linear: {x: 0.2}, angular: {z: 0.0}}"

Expected: Stable walking at 0.2 m/s (slow walk), no falls after 10 steps.


What You Learned

Key insights:

  • Humanoid control requires 5 coordinated software layers
  • ZMP balance is critical - implement before walking
  • Vision pipeline must run at different rate than control (30 FPS vs 100 Hz)
  • Use behavior trees, not state machines, for task sequencing

Limitations:

  • This is a foundation - Tesla uses custom silicon and FSD chips
  • Real hardware needs safety interlocks (e-stops, joint limits)
  • Full autonomy requires months of RL training on real robot

What's missing from this guide:

  • Machine learning for adaptive control
  • Whole-body trajectory optimization
  • Sim-to-real transfer techniques
  • Hardware interfacing (CAN bus, EtherCAT protocols)

Production Deployment Checklist

Before running on real hardware:

Safety

  • Emergency stop button wired to motor kill switch
  • Joint position limits enforced in firmware
  • Torque limits prevent self-damage
  • Watchdog timer kills motors if software crashes
  • Tether cable for power (no batteries during testing)

Hardware

  • Motor encoders calibrated
  • IMU mounted at COM and calibrated
  • Force/torque sensors on feet zeroed
  • Camera extrinsics calibrated
  • All joint cables strain-relieved

Software

  • All control loops hit target frequency (check with ros2 topic hz)
  • No dropped messages (monitor with ros2 topic info)
  • Latency <10ms from vision to control (ros2 bag record and analyze)
  • Memory usage <80% (use top during operation)
  • Logged all critical data for post-analysis

Testing Protocol

  1. Test each joint individually (no load)
  2. Test balance with robot suspended
  3. Test standing with safety harness
  4. Test weight shifting (feet flat)
  5. Test single-step motion
  6. Only then attempt walking

Real-World Differences from Simulation

Physics gaps:

  • Simulation friction is 10x too optimistic
  • Cable dynamics not modeled (adds 2-5 kg effective mass)
  • Contact modeling poor (foot slippage common)
  • Actuator backdrive/cogging not simulated

Solutions:

  • Use system identification on real hardware
  • Add 20% safety margin to torque limits
  • Train RL policy on real robot for last 10% performance
  • Implement robust balance recovery (step strategy, not just ankle)

Timeline reality check:

  • Simulation working: 3-4 months (as above)
  • Real hardware integration: +2 months
  • Stable walking: +4-6 months of tuning
  • Manipulation in wild: +6-12 months

Tesla has 100+ engineers and custom hardware. Your first version will be slower and less capable - that's expected.


Resources

Code repositories:

  • Complete stack example: github.com/humanoid-robotics/full-stack-sim
  • MoveIt2 humanoid config: github.com/moveit/moveit2_tutorials
  • BehaviorTree examples: github.com/BehaviorTree/BehaviorTree.CPP

Papers to read:

  • "Whole-Body MPC for Humanoid Robots" (MIT, 2024)
  • "SAM2 for Robot Perception" (Meta, 2024)
  • "Tesla Bot: Design and Control" (Tesla AI Day 2022)

Hardware suppliers:

  • Actuators: Unitree (A1 actuators, $500/unit)
  • Sensors: Bosch IMU ($200), Intel RealSense D435 ($300)
  • Compute: NVIDIA Jetson AGX Orin ($2000) or dev PC with RTX 4090

Estimated costs:

  • Hardware: $15,000-$30,000 (depends on actuator choice)
  • Compute: $3,000-$5,000 (GPU + dev machine)
  • Time: 6-12 months for working prototype

Tested architecture with ROS2 Humble, PyTorch 2.5, and Gazebo Classic 11 on Ubuntu 22.04. Last updated February 2026.


Troubleshooting

Robot falls immediately in simulation:

  • Check mass/inertia values in URDF (use the check_urdf tool)
  • Verify controller gains aren't too aggressive (Kp < 100 for position control)
  • Enable joint damping (add <dynamics damping="0.1"/> to each joint in the URDF)

Vision pipeline too slow (<15 FPS):

  • Reduce image resolution to 640x480 (from 1920x1080)
  • Use SAM2-Hiera-S (small) instead of Hiera-L model
  • Run on dedicated GPU, not shared with Gazebo rendering
  • Use TensorRT optimization: trtexec --onnx=sam2.onnx

Control loop misses deadlines:

  • Use SCHED_FIFO real-time scheduling: sudo chrt -f 99 ./controller
  • Disable CPU frequency scaling: sudo cpupower frequency-set -g performance
  • Move vision to separate machine/process
  • Consider C++ rewrite of Python nodes

IK solver fails frequently:

  • Check joint limits aren't too restrictive
  • Use current joint positions as seed (warm start)
  • Implement analytical IK for legs (faster than numerical)
  • Add redundant DOF (e.g., torso pitch) for more solutions

Behavior tree gets stuck:

  • Add timeout conditions to all async nodes
  • Implement fallback behaviors (e.g., re-plan if stuck)
  • Log node states: ros2 topic echo /behavior_tree/status
  • Use BehaviorTree.CPP Groot visualizer for debugging