Problem: Gazebo GUI Wastes Cloud Resources

You need to run hundreds of robot simulations for testing, but Gazebo's GUI consumes GPU resources and slows down AWS RoboMaker batch jobs. Each simulation costs more and takes longer than it should.

You'll learn:

Configure Gazebo for headless operation on AWS RoboMaker
Set up batch simulation jobs without rendering overhead
Monitor simulations and collect results efficiently

Time: 20 min | Level: Intermediate

Why This Happens

AWS RoboMaker runs Gazebo with full rendering by default, allocating GPU resources even when you don't need visualization. For automated testing or reinforcement learning training, this rendering overhead increases costs by 40-60% and reduces simulation throughput.

Common symptoms:

High EC2 costs for simulation batches
Slow simulation performance despite powerful instances
GPU memory errors when scaling to multiple parallel simulations
Unnecessary X11 display errors in CloudWatch logs

Solution

Step 1: Prepare Your ROS Workspace

Set up your simulation package with headless configuration support.

# Navigate to your ROS workspace
cd ~/ros_workspace/src

# Create launch file directory if needed
mkdir -p my_robot_sim/launch

Expected: Directory structure ready for launch files.

Step 2: Create Headless Launch File

Create a launch file that disables Gazebo's GUI and rendering components.

<!-- my_robot_sim/launch/headless_sim.launch -->
<launch>
  <!-- Headless Gazebo arguments -->
  <arg name="gui" default="false"/>
  <arg name="headless" default="true"/>
  <arg name="verbose" default="true"/>
  
  <!-- Start Gazebo server only (no client) -->
  <include file="$(find gazebo_ros)/launch/empty_world.launch">
    <arg name="world_name" value="$(find my_robot_sim)/worlds/test_world.world"/>
    <arg name="paused" value="false"/>
    <arg name="use_sim_time" value="true"/>
    <arg name="gui" value="$(arg gui)"/>
    <arg name="headless" value="$(arg headless)"/>
    <arg name="debug" value="false"/>
    <arg name="verbose" value="$(arg verbose)"/>
  </include>
  
  <!-- Spawn your robot model -->
  <node name="spawn_robot" pkg="gazebo_ros" type="spawn_model"
        args="-file $(find my_robot_sim)/urdf/robot.urdf 
              -urdf -model my_robot -x 0 -y 0 -z 0.5"
        output="screen"/>
  
  <!-- Your simulation nodes -->
  <node name="robot_controller" pkg="my_robot_sim" type="controller_node.py"/>
  
  <!-- Data collection node for results -->
  <node name="metrics_collector" pkg="my_robot_sim" type="collect_metrics.py"
        args="--output /tmp/simulation_results.json"/>
</launch>

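Why this works: Setting gui="false" stops roslaunch from starting the Gazebo client (gzclient), so only the physics server (gzserver) runs. That removes the GUI's rendering load and its X11 display requirement. The headless argument is passed for compatibility with older launch files; recent gazebo_ros releases document it as having no effect.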

If it fails:

Error: "Could not find world file": Use full path or check $(find package_name) resolves correctly
Robot doesn't spawn: Verify URDF path and ensure gazebo_ros is installed
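
Before bundling, a quick local check can confirm that a launch file really defaults to GUI-off. The sketch below is a hypothetical helper (the function names are not part of ROS or RoboMaker) that parses launch XML with the Python standard library and reports the defaults of the top-level arguments. It does not resolve $(arg ...) substitutions or follow includes, and it assumes a missing gui argument means the GUI is on.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper, not part of ROS: reads top-level <arg> defaults
# from roslaunch XML without resolving substitutions or includes.
def top_level_arg_defaults(launch_xml: str) -> dict:
    root = ET.fromstring(launch_xml)
    return {
        arg.get("name"): arg.get("default")
        for arg in root.findall("arg")  # direct children of <launch> only
        if arg.get("default") is not None
    }


def gui_disabled_by_default(launch_xml: str) -> bool:
    # Assumption: when no gui argument is declared, the GUI is treated as enabled.
    defaults = top_level_arg_defaults(launch_xml)
    return defaults.get("gui", "true").strip().lower() == "false"


SAMPLE_LAUNCH = """
<launch>
  <arg name="gui" default="false"/>
  <arg name="verbose" default="true"/>
  <include file="$(find gazebo_ros)/launch/empty_world.launch">
    <arg name="gui" value="$(arg gui)"/>
  </include>
</launch>
"""

if __name__ == "__main__":
    print(top_level_arg_defaults(SAMPLE_LAUNCH))
    print(gui_disabled_by_default(SAMPLE_LAUNCH))
```

To check a real file, read it with open(path).read() and pass the text to gui_disabled_by_default.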

Step 3: Configure RoboMaker Simulation Application

Create the RoboMaker simulation application configuration.

# scripts/create_simulation_app.py
import boto3

robomaker = boto3.client('robomaker', region_name='us-west-2')

# Create simulation application
response = robomaker.create_simulation_application(
    name='headless-gazebo-sim',
    renderingEngine={'name': 'OGRE', 'version': '1.x'},  # Required but won't be used
    simulationSoftwareSuite={
        'name': 'Gazebo',
        'version': '11'  # Use Gazebo 11 for ROS Noetic
    },
    robotSoftwareSuite={
        'name': 'ROS',
        'version': 'Noetic'
    },
    sources=[{
        's3Bucket': 'my-robomaker-bucket',
        's3Key': 'robot-sim.tar.gz',  # Your bundled workspace
        'architecture': 'X86_64'
    }]
)

print(f"Application ARN: {response['arn']}")

Expected: Application created with ARN output.

Step 4: Bundle and Upload Workspace

Package your ROS workspace for RoboMaker deployment.

# Build your workspace
cd ~/ros_workspace
colcon build  # writes the install space to ./install, which is bundled below

# Create bundle following RoboMaker structure
mkdir -p bundle
cp -r install bundle/
cp -r src bundle/

# Create tar archive
tar -czf robot-sim.tar.gz -C bundle .

# Upload to S3
aws s3 cp robot-sim.tar.gz s3://my-robomaker-bucket/robot-sim.tar.gz

Why this works: RoboMaker expects a specific directory structure with the compiled workspace. The tar.gz format is required for deployment.

If it fails:

Error: "Missing dependencies": Run rosdep install --from-paths src --ignore-src -r -y before building
Upload fails: Check S3 bucket permissions and AWS credentials
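
A missing install directory is easier to catch before the upload than after a failed job. The sketch below is a hypothetical pre-upload check using only the standard library; it lists the top-level entries of a gzip-compressed tar archive. The sample archive is built in memory purely for illustration, and its file names are placeholders.

```python
import io
import tarfile


# Hypothetical pre-upload check: lists the top-level entries of a bundle
# archive so a missing install/ directory is caught before uploading.
def top_level_entries(tar_bytes: bytes) -> set:
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        names = set()
        for member in tar.getmembers():
            # Archives made with "tar -C bundle ." prefix names with "./"
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if parts:
                names.add(parts[0])
        return names


def make_sample_bundle() -> bytes:
    """Build a tiny in-memory archive that mimics the bundle layout."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in ("./install/setup.bash", "./src/my_robot_sim/package.xml"):
            payload = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    entries = top_level_entries(make_sample_bundle())
    print(sorted(entries))
    print("install" in entries)
```

For the real bundle, pass open("robot-sim.tar.gz", "rb").read() and confirm that "install" is in the returned set.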

Step 5: Launch Headless Simulation Job

Create and start a batch simulation job configured for headless operation.

# scripts/launch_headless_batch.py
import time

import boto3

robomaker = boto3.client('robomaker', region_name='us-west-2')

# Define simulation job configuration
response = robomaker.create_simulation_job(
    maxJobDurationInSeconds=3600,  # 1 hour timeout
    iamRole='arn:aws:iam::ACCOUNT_ID:role/RoboMakerSimulationRole',
    
    # Use compute without GPU for cost savings
    compute={
        'simulationUnitLimit': 1  # 1 simulation unit = 1 vCPU, 2 GB RAM
    },
    
    simulationApplications=[{
        'application': 'arn:aws:robomaker:us-west-2:ACCOUNT_ID:simulation-application/headless-gazebo-sim',
        'launchConfig': {
            'packageName': 'my_robot_sim',
            'launchFile': 'headless_sim.launch',
            'environmentVariables': {
                'DISPLAY': ':1',  # Virtual display for X11 dependencies
                'LIBGL_ALWAYS_SOFTWARE': '1',  # Force software rendering
                'GAZEBO_MASTER_URI': 'http://localhost:11345'
            }
        }
    }],
    
    # Data output configuration
    outputLocation={
        's3Bucket': 'my-robomaker-results',
        's3Prefix': 'simulation-runs/'
    },
    
    # Logging configuration
    loggingConfig={
        'recordAllRosTopics': False  # Set true if you need rosbag recordings
    },
    
    vpcConfig={
        'subnets': ['subnet-xxxxx'],
        'securityGroups': ['sg-xxxxx'],
        'assignPublicIp': True
    }
)

job_arn = response['arn']
print(f"Simulation job started: {job_arn}")

# Monitor job status
while True:
    job = robomaker.describe_simulation_job(job=job_arn)
    status = job['status']
    print(f"Status: {status}")
    
    if status in ['Completed', 'Failed', 'Canceled', 'Terminated']:
        break
    
    time.sleep(30)

# Retrieve results
if status == 'Completed':
    print(f"Results available at: s3://my-robomaker-results/simulation-runs/")

Why this works: simulationUnitLimit caps the CPU and memory allocated to the job, and the environment variables force software rendering, so the job needs no GPU. This reduces costs by ~50% compared to GPU instances.

If it fails:

Error: "Job failed with status Failed": Check CloudWatch logs for Gazebo errors
Timeout: Increase maxJobDurationInSeconds or optimize simulation physics settings
VPC errors: Ensure subnets have internet access if pulling dependencies
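
The monitoring loop in the script above polls until the job reports a final status, with no upper bound on how long it waits. A minimal sketch of a bounded version follows. The wait_for_job helper and FakeClient class are hypothetical names introduced here, and the set of terminal statuses is an assumption to adjust for your account. The client is passed in as a parameter so the logic can be dry-run locally without AWS credentials.

```python
import time

# Assumption: these are treated as terminal states; adjust to the statuses
# your jobs actually report.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled", "Terminated"}


def wait_for_job(client, job_arn, poll_seconds=30, max_wait_seconds=4000,
                 sleep=time.sleep):
    """Poll describe_simulation_job until a terminal status or a deadline."""
    waited = 0
    while True:
        status = client.describe_simulation_job(job=job_arn)["status"]
        if status in TERMINAL_STATUSES:
            return status
        if waited >= max_wait_seconds:
            raise TimeoutError(f"{job_arn} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


class FakeClient:
    """Stand-in for the boto3 RoboMaker client, for local dry runs only."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def describe_simulation_job(self, job):
        return {"status": next(self._statuses)}


if __name__ == "__main__":
    fake = FakeClient(["Pending", "Running", "Completed"])
    print(wait_for_job(fake, "arn:example", sleep=lambda s: None))
```

In a real script, pass the boto3 client and the job ARN returned by create_simulation_job, and pick max_wait_seconds slightly above maxJobDurationInSeconds.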

Step 6: Optimize for Batch Simulations

Run multiple parallel simulations with parameter variations.

# scripts/batch_parameter_sweep.py
import boto3
import itertools

robomaker = boto3.client('robomaker', region_name='us-west-2')

# Define parameter space
robot_speeds = [0.5, 1.0, 1.5, 2.0]
obstacle_densities = ['low', 'medium', 'high']

# Generate all combinations
param_combinations = list(itertools.product(robot_speeds, obstacle_densities))

job_arns = []

for speed, density in param_combinations:
    response = robomaker.create_simulation_job(
        maxJobDurationInSeconds=1800,
        iamRole='arn:aws:iam::ACCOUNT_ID:role/RoboMakerSimulationRole',
        compute={'simulationUnitLimit': 1},
        
        simulationApplications=[{
            'application': 'arn:aws:robomaker:us-west-2:ACCOUNT_ID:simulation-application/headless-gazebo-sim',
            'launchConfig': {
                'packageName': 'my_robot_sim',
                'launchFile': 'headless_sim.launch',
                'environmentVariables': {
                    'ROBOT_SPEED': str(speed),
                    'OBSTACLE_DENSITY': density,
                    'DISPLAY': ':1',
                    'LIBGL_ALWAYS_SOFTWARE': '1'
                }
            }
        }],
        
        outputLocation={
            's3Bucket': 'my-robomaker-results',
            's3Prefix': f'batch-sweep/speed_{speed}_density_{density}/'
        },
        
        tags={
            'experiment': 'parameter-sweep',
            'speed': str(speed),
            'density': density
        }
    )
    
    job_arns.append(response['arn'])
    print(f"Launched: speed={speed}, density={density}")

print(f"Total jobs launched: {len(job_arns)}")

Expected: Multiple simulation jobs running in parallel, each with different parameters. Check AWS Console to see jobs in "Running" state.
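
It can help to inspect the full sweep before any job is created, since each combination becomes a billed job. The sketch below is a hypothetical helper (build_sweep is not a RoboMaker API) that builds the per-job environment variables and S3 prefixes in the same format as the script above, without calling AWS.

```python
import itertools


# Hypothetical helper: builds the per-job settings for a parameter sweep
# without calling AWS, so the list can be inspected or unit-tested first.
def build_sweep(speeds, densities, base_prefix="batch-sweep/"):
    jobs = []
    for speed, density in itertools.product(speeds, densities):
        jobs.append({
            "environmentVariables": {
                "ROBOT_SPEED": str(speed),
                "OBSTACLE_DENSITY": density,
            },
            "s3Prefix": f"{base_prefix}speed_{speed}_density_{density}/",
        })
    return jobs


if __name__ == "__main__":
    sweep = build_sweep([0.5, 1.0, 1.5, 2.0], ["low", "medium", "high"])
    print(len(sweep))
    print(sweep[0]["s3Prefix"])
```

Four speeds and three densities give twelve jobs, each with a distinct prefix, so results from different runs do not overwrite each other.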

Step 7: Collect and Analyze Results

Set up automated result collection from S3.

# scripts/collect_results.py
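import json
import re

import boto3
import pandas as pd

s3 = boto3.client('s3')

# Matches the 'speed_<value>_density_<value>' segment written by the sweep script
SWEEP_KEY = re.compile(r'speed_([^_/]+)_density_([^_/]+)')

def download_simulation_results(bucket, prefix):
    """Download all simulation result JSON files under a prefix"""
    results = []
    
    # Paginate: list_objects_v2 returns at most 1000 keys per call
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            match = SWEEP_KEY.search(obj['Key'])
            if not obj['Key'].endswith('.json') or match is None:
                continue
            
            # Download and parse the result file
            file_obj = s3.get_object(Bucket=bucket, Key=obj['Key'])
            data = json.loads(file_obj['Body'].read())
            
            # Extract sweep parameters from the S3 key
            data['speed'], data['density'] = match.groups()
            
            results.append(data)
    
    return pd.DataFrame(results)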

# Collect results
df = download_simulation_results('my-robomaker-results', 'batch-sweep/')

# Analyze performance
print("Average completion times by speed:")
print(df.groupby('speed')['completion_time'].mean())

print("\nSuccess rates by obstacle density:")
print(df.groupby('density')['success'].mean())

# Save analysis
df.to_csv('simulation_analysis.csv', index=False)
print("Results saved to simulation_analysis.csv")

Expected: CSV file with aggregated metrics from all simulations.
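
The analysis above depends on each result file containing completion_time and success fields, which the metrics collector has to write. The sketch below shows the same per-group averaging in plain Python on a few made-up records, so the expected input shape is explicit. The schema (completion_time in seconds, a boolean success) and the sample values are assumptions for illustration, not measured results.

```python
from collections import defaultdict
from statistics import mean

# Assumed result schema (not defined in the tutorial): each record carries
# "completion_time" in seconds and a boolean "success", plus sweep metadata.
# The values below are made up for illustration.
SAMPLE_RESULTS = [
    {"speed": "0.5", "density": "low", "completion_time": 40.0, "success": True},
    {"speed": "0.5", "density": "high", "completion_time": 60.0, "success": False},
    {"speed": "1.0", "density": "low", "completion_time": 20.0, "success": True},
    {"speed": "1.0", "density": "high", "completion_time": 30.0, "success": True},
]


def mean_by(records, group_key, value_key):
    """Average value_key per distinct group_key, like groupby(...).mean()."""
    groups = defaultdict(list)
    for record in records:
        # float(True) is 1.0, so averaging booleans yields a success rate
        groups[record[group_key]].append(float(record[value_key]))
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    print(mean_by(SAMPLE_RESULTS, "speed", "completion_time"))
    print(mean_by(SAMPLE_RESULTS, "density", "success"))
```

Averaging the boolean success field gives the success rate per group, which is what the pandas groupby on 'success' computes as well.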

Verification

Check that your headless simulation is running correctly.

# Monitor CloudWatch logs
aws logs tail /aws/robomaker/SimulationJobs --follow

# Check for successful indicators
# ✅ "Gazebo multi-robot simulator, version 11.x"
# ✅ "gzserver started" (NOT gzclient)
# ✅ No X11 or GPU errors
# ❌ "Unable to create the rendering window" (means Gazebo is still trying to open a render window, so rendering is not disabled)

# Verify results in S3
aws s3 ls s3://my-robomaker-results/simulation-runs/ --recursive

You should see: Log entries showing gzserver starting without gzclient, and result files appearing in S3 during/after simulation completion.
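
With many parallel jobs, reading logs by eye does not scale. The sketch below is a hypothetical filter that flags log lines hinting at a GUI client or render window. The marker strings are taken from the checklist above and are assumptions about what your logs contain, and the sample lines are illustrative rather than captured from a real job.

```python
# Hypothetical log check. The marker strings come from the checklist above
# and are assumptions about what your logs contain; adjust them as needed.
PROBLEM_MARKERS = ("gzclient", "unable to create the rendering window")


def find_rendering_problems(log_lines):
    """Return lines suggesting that a GUI client or render window was started."""
    return [
        line for line in log_lines
        if any(marker in line.lower() for marker in PROBLEM_MARKERS)
    ]


# Illustrative lines only, not captured from a real job.
SAMPLE_LOG = [
    "Gazebo multi-robot simulator, version 11.x",
    "Unable to create the rendering window",
]

if __name__ == "__main__":
    for line in find_rendering_problems(SAMPLE_LOG):
        print("CHECK:", line)
```

One way to feed it is to save the output of the aws logs tail command to a file and pass the file's lines to find_rendering_problems.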

Performance Optimization

Adjust Physics Settings for Speed

Modify your world file to increase simulation speed:

<!-- worlds/test_world.world -->
<sdf version="1.6">
  <world name="default">
    <physics type="ode">
      <!-- Increase time step for faster simulation -->
      <max_step_size>0.01</max_step_size>  <!-- Default: 0.001 -->
      <real_time_factor>1.5</real_time_factor>  <!-- Target: 1.5x realtime -->
      <!-- Target speed = max_step_size x real_time_update_rate = 0.01 x 150 = 1.5 -->
      <real_time_update_rate>150</real_time_update_rate>  <!-- Default: 1000 -->
    </physics>
    
    <!-- Your world contents -->
  </world>
</sdf>

Trade-off: Higher step sizes reduce physics accuracy but increase simulation speed by 2-3x. Use for behavioral testing, not precise dynamics validation.
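
In Gazebo classic, the speed the simulator aims for is the product of max_step_size and real_time_update_rate. For example, a 0.01 s step at 100 updates per second targets real time (1.0), while the same step at 150 updates per second targets 1.5x. The short sketch below works through that arithmetic; the function name is a hypothetical one chosen for illustration.

```python
def target_real_time_factor(max_step_size, real_time_update_rate):
    """Target RTF: seconds simulated per step times steps attempted per second."""
    return max_step_size * real_time_update_rate


if __name__ == "__main__":
    for step, rate in [(0.001, 1000), (0.01, 100), (0.01, 150)]:
        rtf = target_real_time_factor(step, rate)
        print(f"step={step} s, rate={rate} Hz -> target RTF {rtf:.2f}")
```

The function does not model the special case of a real_time_update_rate of 0, which tells Gazebo to run as fast as the hardware allows. The target is also only an upper bound: a heavy world on a small instance may run slower.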

Use Spot Instances for Cost Savings

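RoboMaker jobs cannot use Spot pricing directly. Request CPU-only compute explicitly in the job config, and move the workload to AWS Batch if you need Spot: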

# Add to simulation job config
compute={
    'simulationUnitLimit': 1,
    'computeType': 'CPU'  # Explicitly request CPU-only
}

# Use AWS Batch with Spot instances for even lower costs
# RoboMaker doesn't directly support Spot, but you can:
# 1. Package simulation as Docker container
# 2. Submit to AWS Batch with Spot compute environment
# 3. Achieve 60-90% cost reduction

Expected savings: CPU-only instances cost ~$0.44/hour vs $1.20/hour for GPU instances. Spot pricing can reduce this to ~$0.13/hour.

Cost Analysis

Typical costs for 100 simulation runs (1 hour each):

Configuration	Instance Type	Cost per Hour	Total Cost
Default (GPU)	g4dn.xlarge	$1.20	$120
Headless (CPU)	c5.xlarge	$0.44	$44
Headless + Spot	c5.xlarge Spot	~$0.13	~$13

Savings: ~89% cost reduction with headless + Spot configuration.
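
The totals and the savings figure follow from simple arithmetic on the hourly rates. The sketch below reproduces it so you can substitute your own run count and duration. The rates are the tutorial's figures from the table above, not current AWS pricing, and the function names are hypothetical.

```python
# Hourly rates copied from the table above (tutorial figures, not live pricing).
HOURLY_RATES = {
    "Default (GPU)": 1.20,
    "Headless (CPU)": 0.44,
    "Headless + Spot": 0.13,
}


def total_cost(rate_per_hour, runs=100, hours_per_run=1.0):
    """Total cost of a batch: rate x number of runs x hours per run."""
    return rate_per_hour * runs * hours_per_run


def savings_percent(baseline_cost, new_cost):
    """Cost reduction relative to the baseline, in percent."""
    return (baseline_cost - new_cost) / baseline_cost * 100


if __name__ == "__main__":
    baseline = total_cost(HOURLY_RATES["Default (GPU)"])
    for name, rate in HOURLY_RATES.items():
        cost = total_cost(rate)
        print(f"{name}: ${cost:.2f} ({savings_percent(baseline, cost):.0f}% saved)")
```

With these rates, the CPU-only configuration saves roughly 63% and the Spot configuration roughly 89% relative to the GPU baseline.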

What You Learned

Headless Gazebo eliminates GPU rendering overhead in cloud simulations
AWS RoboMaker requires specific environment variables for software rendering
Batch parameter sweeps enable efficient automated testing at scale
CPU-only instances reduce costs by 50-90% for non-visual workloads

Limitations:

Cannot use RViz or Gazebo GUI for debugging cloud jobs (test locally first)
Some Gazebo plugins may expect rendering context (camera sensors work but need proper configuration)
RoboMaker simulation units are billed per-second with 1-minute minimum

When NOT to use headless:

Developing and debugging simulations (use GUI locally)
Generating training data from camera sensors (need rendering, but can disable GUI)
Creating videos or visualizations from simulations

Troubleshooting Common Issues

Issue: "Gazebo died with error 255"

# Check if required packages are missing
# Add to your package.xml:
<depend>gazebo_ros</depend>
<depend>gazebo_ros_control</depend>
<depend>gazebo_plugins</depend>

# Rebuild workspace
colcon build --packages-select my_robot_sim

Issue: X11 errors in CloudWatch logs

# Ensure DISPLAY environment variable is set
# Add to launch configuration:
'DISPLAY': ':1'

# Install xvfb in your Docker/bundle if needed:
sudo apt-get install -y xvfb

Issue: Simulation runs too slowly

<!-- Reduce sensor update rates in your robot URDF -->
<sensor name="camera" type="camera">
  <update_rate>10</update_rate>  <!-- Reduce from 30 -->
</sensor>

<!-- Simplify collision meshes -->
<collision name="collision">
  <geometry>
    <box>  <!-- Use primitives instead of meshes -->
      <size>1 1 1</size>
    </box>
  </geometry>
</collision>

Tested on AWS RoboMaker with Gazebo 11, ROS Noetic, Ubuntu 20.04. Last verified: February 2026