Problem: Gazebo GUI Wastes Cloud Resources
You need to run hundreds of robot simulations for testing, but Gazebo's GUI consumes GPU resources and slows down AWS RoboMaker batch jobs. Each simulation costs more and takes longer than it should.
You'll learn how to:
- Configure Gazebo for headless operation on AWS RoboMaker
- Set up batch simulation jobs without rendering overhead
- Monitor simulations and collect results efficiently
Time: 20 min | Level: Intermediate
Why This Happens
AWS RoboMaker runs Gazebo with full rendering by default, allocating GPU resources even when you don't need visualization. For automated testing or reinforcement learning training, this rendering overhead increases costs by 40-60% and reduces simulation throughput.
Common symptoms:
- High EC2 costs for simulation batches
- Slow simulation performance despite powerful instances
- GPU memory errors when scaling to multiple parallel simulations
- Unnecessary X11 display errors in CloudWatch logs
Solution
Step 1: Prepare Your ROS Workspace
Set up your simulation package with headless configuration support.
# Navigate to your ROS workspace
cd ~/ros_workspace/src
# Create launch file directory if needed
mkdir -p my_robot_sim/launch
Expected: Directory structure ready for launch files.
Step 2: Create Headless Launch File
Create a launch file that disables Gazebo's GUI and rendering components.
<!-- my_robot_sim/launch/headless_sim.launch -->
<launch>
  <!-- Headless Gazebo arguments -->
  <arg name="gui" default="false"/>
  <arg name="headless" default="true"/>
  <arg name="verbose" default="true"/>
  <!-- Start Gazebo server only (no client) -->
  <include file="$(find gazebo_ros)/launch/empty_world.launch">
    <arg name="world_name" value="$(find my_robot_sim)/worlds/test_world.world"/>
    <arg name="paused" value="false"/>
    <arg name="use_sim_time" value="true"/>
    <arg name="gui" value="$(arg gui)"/>
    <arg name="headless" value="$(arg headless)"/>
    <arg name="debug" value="false"/>
    <arg name="verbose" value="$(arg verbose)"/>
  </include>
  <!-- Spawn your robot model -->
  <node name="spawn_robot" pkg="gazebo_ros" type="spawn_model"
        args="-file $(find my_robot_sim)/urdf/robot.urdf
              -urdf -model my_robot -x 0 -y 0 -z 0.5"
        output="screen"/>
  <!-- Your simulation nodes -->
  <node name="robot_controller" pkg="my_robot_sim" type="controller_node.py"/>
  <!-- Data collection node for results -->
  <node name="metrics_collector" pkg="my_robot_sim" type="collect_metrics.py"
        args="--output /tmp/simulation_results.json"/>
</launch>
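Why this works: Setting gui="false" stops the launch file from starting the client (gzclient), so only the physics server (gzserver) runs. In recent gazebo_ros releases the headless argument appears to be inert, so it is kept here mainly for compatibility with older launch files. With no client window to draw, the job no longer needs GPU rendering or an X11 display for the GUI.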
If it fails:
- Error: "Could not find world file": Use the full path or check that $(find package_name) resolves correctly
- Robot doesn't spawn: Verify the URDF path and ensure gazebo_ros is installed
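Before uploading anything, it can help to run the same launch file locally with the GUI switched off. The sketch below only builds the roslaunch command line using the standard name:=value override syntax; the helper name roslaunch_command is hypothetical, and the package and launch file names are the ones used in this guide.

```python
from typing import Dict, List


def roslaunch_command(package: str, launch_file: str,
                      overrides: Dict[str, object]) -> List[str]:
    """Build a roslaunch argv with name:=value argument overrides."""
    args = ["roslaunch", package, launch_file]
    for name, value in sorted(overrides.items()):
        # roslaunch expects booleans written as lowercase true/false
        if isinstance(value, bool):
            value = "true" if value else "false"
        args.append(f"{name}:={value}")
    return args


cmd = roslaunch_command("my_robot_sim", "headless_sim.launch",
                        {"gui": False, "headless": True})
print(" ".join(cmd))
```

Pass the resulting list to subprocess.run on a machine with ROS installed, then confirm that no gzclient process appears.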
Step 3: Configure RoboMaker Simulation Application
Create the RoboMaker simulation application configuration.
# scripts/create_simulation_app.py
import boto3
robomaker = boto3.client('robomaker', region_name='us-west-2')
# Create simulation application
response = robomaker.create_simulation_application(
    name='headless-gazebo-sim',
    renderingEngine={'name': 'OGRE', 'version': '1.x'},  # Required but won't be used
    simulationSoftwareSuite={
        'name': 'Gazebo',
        'version': '11'  # Use Gazebo 11 for ROS Noetic
    },
    robotSoftwareSuite={
        'name': 'ROS',
        'version': 'Noetic'
    },
    sources=[{
        's3Bucket': 'my-robomaker-bucket',
        's3Key': 'robot-sim.tar.gz',  # Your bundled workspace
        'architecture': 'X86_64'
    }]
)
print(f"Application ARN: {response['arn']}")
Expected: Application created with ARN output.
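Re-running the creation script may fail or leave you juggling several applications, so it can be useful to look up an existing one by name first. The following is a minimal sketch, assuming the client is a boto3 RoboMaker client (or anything exposing the same list_simulation_applications call); the helper name find_simulation_app_arn is hypothetical.

```python
from typing import Optional


def find_simulation_app_arn(client, name: str) -> Optional[str]:
    """Return the ARN of the first simulation application called `name`, or None."""
    kwargs = {}
    while True:
        page = client.list_simulation_applications(**kwargs)
        for summary in page.get("simulationApplicationSummaries", []):
            if summary.get("name") == name:
                return summary.get("arn")
        # Follow pagination until the service stops returning a token
        token = page.get("nextToken")
        if not token:
            return None
        kwargs = {"nextToken": token}
```

A typical call would be find_simulation_app_arn(robomaker, 'headless-gazebo-sim'), creating the application only when the result is None.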
Step 4: Bundle and Upload Workspace
Package your ROS workspace for RoboMaker deployment.
# Build your workspace (colcon installs into ./install by default,
# which is the directory the bundle step copies)
cd ~/ros_workspace
colcon build
# Create bundle following RoboMaker structure
mkdir -p bundle
cp -r install bundle/
cp -r src bundle/
# Create tar archive
tar -czf robot-sim.tar.gz -C bundle .
# Upload to S3
aws s3 cp robot-sim.tar.gz s3://my-robomaker-bucket/robot-sim.tar.gz
Why this works: RoboMaker expects a specific directory structure with the compiled workspace. The tar.gz format is required for deployment.
If it fails:
- Error: "Missing dependencies": Run rosdep install --from-paths src --ignore-src -r -y before building
- Upload fails: Check S3 bucket permissions and AWS credentials
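A quick check of the archive before uploading can catch an empty or mis-rooted bundle. This is a sketch using only the Python standard library; the helper name missing_bundle_entries is hypothetical, and the expected top-level entries (install and src) are simply the directories copied in the commands above, not a statement of what RoboMaker itself validates.

```python
import tarfile
from typing import List


def missing_bundle_entries(archive_path: str, required: List[str]) -> List[str]:
    """Return the required top-level entries that are absent from the archive."""
    top_level = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getnames():
            # Names look like "./install/setup.bash" when created with `-C bundle .`
            parts = [p for p in member.split("/") if p not in ("", ".")]
            if parts:
                top_level.add(parts[0])
    return [name for name in required if name not in top_level]
```

For example, missing_bundle_entries('robot-sim.tar.gz', ['install', 'src']) should return an empty list for a bundle built with the steps above.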
Step 5: Launch Headless Simulation Job
Create and start a batch simulation job configured for headless operation.
# scripts/launch_headless_batch.py
import time

import boto3
robomaker = boto3.client('robomaker', region_name='us-west-2')
# Define simulation job configuration
response = robomaker.create_simulation_job(
    maxJobDurationInSeconds=3600,  # 1 hour timeout
    iamRole='arn:aws:iam::ACCOUNT_ID:role/RoboMakerSimulationRole',
    # Use compute without GPU for cost savings
    compute={
        'simulationUnitLimit': 1  # 1 unit = 1 vCPU and 2 GB RAM; raise it if physics falls behind
    },
    simulationApplications=[{
        'application': 'arn:aws:robomaker:us-west-2:ACCOUNT_ID:simulation-application/headless-gazebo-sim',
        'launchConfig': {
            'packageName': 'my_robot_sim',
            'launchFile': 'headless_sim.launch',
            'environmentVariables': {
                'DISPLAY': ':1',  # Virtual display for X11 dependencies
                'LIBGL_ALWAYS_SOFTWARE': '1',  # Force software rendering
                'GAZEBO_MASTER_URI': 'http://localhost:11345'
            }
        }
    }],
    # Data output configuration
    outputLocation={
        's3Bucket': 'my-robomaker-results',
        's3Prefix': 'simulation-runs/'
    },
    # Logging configuration
    loggingConfig={
        'recordAllRosTopics': False  # Set True if you need rosbag recordings
    },
    vpcConfig={
        'subnets': ['subnet-xxxxx'],
        'securityGroups': ['sg-xxxxx'],
        'assignPublicIp': True
    }
)
job_arn = response['arn']
print(f"Simulation job started: {job_arn}")
# Monitor job status until it reaches a terminal state
while True:
    job = robomaker.describe_simulation_job(job=job_arn)
    status = job['status']
    print(f"Status: {status}")
    if status in ['Completed', 'Failed', 'Canceled', 'Terminated']:
        break
    time.sleep(30)
# Retrieve results
if status == 'Completed':
    print("Results available at: s3://my-robomaker-results/simulation-runs/")
Why this works: simulationUnitLimit caps the CPU and memory allocated to the job, and nothing in this configuration requests a GPU. The environment variables force software rendering for any component that still touches OpenGL. This reduces costs by ~50% compared to GPU instances.
If it fails:
- Error: "Job failed with status Failed": Check CloudWatch logs for Gazebo errors
- Timeout: Increase maxJobDurationInSeconds or optimize simulation physics settings
- VPC errors: Ensure subnets have internet access if pulling dependencies
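A polling loop with no deadline can hang a CI pipeline if a job gets stuck. The sketch below wraps the same status check in a helper with a client-side timeout. The name wait_for_job and the injectable sleep and clock parameters are hypothetical, added so the loop can be exercised without AWS; the terminal states listed are an assumption about which statuses you want to stop on.

```python
import time
from typing import Callable

TERMINAL_STATES = ("Completed", "Failed", "Canceled", "Terminated")


def wait_for_job(describe: Callable[[], dict], timeout_s: float,
                 poll_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Poll describe() until the job status is terminal or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = describe()["status"]
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_s} s")
        sleep(poll_s)
```

With the script above, the call would look like wait_for_job(lambda: robomaker.describe_simulation_job(job=job_arn), timeout_s=3900), where the timeout is chosen slightly above maxJobDurationInSeconds.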
Step 6: Optimize for Batch Simulations
Run multiple parallel simulations with parameter variations.
# scripts/batch_parameter_sweep.py
import boto3
import itertools
robomaker = boto3.client('robomaker', region_name='us-west-2')
# Define parameter space
robot_speeds = [0.5, 1.0, 1.5, 2.0]
obstacle_densities = ['low', 'medium', 'high']
# Generate all combinations
param_combinations = list(itertools.product(robot_speeds, obstacle_densities))
job_arns = []
for speed, density in param_combinations:
    response = robomaker.create_simulation_job(
        maxJobDurationInSeconds=1800,
        iamRole='arn:aws:iam::ACCOUNT_ID:role/RoboMakerSimulationRole',
        compute={'simulationUnitLimit': 1},
        simulationApplications=[{
            'application': 'arn:aws:robomaker:us-west-2:ACCOUNT_ID:simulation-application/headless-gazebo-sim',
            'launchConfig': {
                'packageName': 'my_robot_sim',
                'launchFile': 'headless_sim.launch',
                'environmentVariables': {
                    'ROBOT_SPEED': str(speed),
                    'OBSTACLE_DENSITY': density,
                    'DISPLAY': ':1',
                    'LIBGL_ALWAYS_SOFTWARE': '1'
                }
            }
        }],
        outputLocation={
            's3Bucket': 'my-robomaker-results',
            's3Prefix': f'batch-sweep/speed_{speed}_density_{density}/'
        },
        tags={
            'experiment': 'parameter-sweep',
            'speed': str(speed),
            'density': density
        }
    )
    job_arns.append(response['arn'])
    print(f"Launched: speed={speed}, density={density}")
print(f"Total jobs launched: {len(job_arns)}")
Expected: Multiple simulation jobs running in parallel, each with different parameters. Check AWS Console to see jobs in "Running" state.
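The sweep passes its parameters as environment variables, so the nodes inside the simulation need to read them. The sketch below shows one way a node such as controller_node.py might do that. The function name read_sweep_params and the fallback defaults (1.0 and medium) are assumptions for illustration; the variable names and allowed densities come from the sweep script.

```python
import os
from typing import Mapping, Tuple

VALID_DENSITIES = ("low", "medium", "high")


def read_sweep_params(env: Mapping[str, str] = os.environ) -> Tuple[float, str]:
    """Read ROBOT_SPEED and OBSTACLE_DENSITY, falling back to assumed defaults."""
    speed = float(env.get("ROBOT_SPEED", "1.0"))
    density = env.get("OBSTACLE_DENSITY", "medium")
    if density not in VALID_DENSITIES:
        raise ValueError(f"unknown OBSTACLE_DENSITY: {density!r}")
    return speed, density
```

Failing fast on an unknown density keeps a typo in the sweep definition from silently producing results under the wrong label.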
Step 7: Collect and Analyze Results
Set up automated result collection from S3.
# scripts/collect_results.py
import json
import re

import boto3
import pandas as pd

s3 = boto3.client('s3')
# Matches the prefix written by the sweep script, e.g. speed_1.5_density_low
SWEEP_KEY = re.compile(r'speed_([^_/]+)_density_([^_/]+)')

def download_simulation_results(bucket, prefix):
    """Download all simulation result JSON files"""
    results = []
    # Paginate: list_objects_v2 returns at most 1000 keys per call
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            match = SWEEP_KEY.search(obj['Key'])
            if not obj['Key'].endswith('.json') or match is None:
                continue
            # Download file
            file_obj = s3.get_object(Bucket=bucket, Key=obj['Key'])
            data = json.loads(file_obj['Body'].read())
            # Extract metadata from S3 key
            data['speed'], data['density'] = match.groups()
            results.append(data)
    return pd.DataFrame(results)
# Collect results
df = download_simulation_results('my-robomaker-results', 'batch-sweep/')
# Analyze performance
print("Average completion times by speed:")
print(df.groupby('speed')['completion_time'].mean())
print("\nSuccess rates by obstacle density:")
print(df.groupby('density')['success'].mean())
# Save analysis
df.to_csv('simulation_analysis.csv', index=False)
print("Results saved to simulation_analysis.csv")
Expected: CSV file with aggregated metrics from all simulations.
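The analysis above expects each result file to contain completion_time and success fields, but the collector node itself is not shown. The following is a hypothetical sketch of the writing side of collect_metrics.py under that assumption; the function name write_metrics is invented here, and the two field names are taken from the groupby calls in the analysis script.

```python
import json
import os
import tempfile


def write_metrics(path: str, completion_time: float, success: bool) -> dict:
    """Write one run's metrics as JSON, replacing any previous file atomically."""
    record = {"completion_time": float(completion_time), "success": bool(success)}
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a reader never sees a half-written result
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle)
    os.replace(tmp_path, path)
    return record
```

Storing success as a boolean means the mean computed per density in the analysis reads directly as a success rate.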
Verification
Check that your headless simulation is running correctly.
# Monitor CloudWatch logs
aws logs tail /aws/robomaker/SimulationJobs --follow
# Check for successful indicators
# ✅ "Gazebo multi-robot simulator, version 11.x"
# ✅ "gzserver started" (NOT gzclient)
# ✅ No X11 or GPU errors
# ❌ "Unable to create the rendering window" (means rendering is still being attempted)
# Verify results in S3
aws s3 ls s3://my-robomaker-results/simulation-runs/ --recursive
You should see: Log entries showing gzserver starting without gzclient, and result files appearing in S3 during/after simulation completion.
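Scanning logs by eye does not scale to a batch of jobs. The sketch below applies the same checklist to a stream of log lines. The marker strings are the ones listed above; the function name scan_log_lines is hypothetical, and the exact wording of real Gazebo log messages may differ between versions, so treat the markers as a starting point.

```python
from typing import Dict, Iterable, List

GOOD_MARKERS = ("Gazebo multi-robot simulator",)
BAD_MARKERS = ("gzclient", "Unable to create the rendering window")


def scan_log_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Sort log lines into expected ('good') and suspicious ('bad') matches."""
    report: Dict[str, List[str]] = {"good": [], "bad": []}
    for line in lines:
        if any(marker in line for marker in BAD_MARKERS):
            report["bad"].append(line.rstrip())
        elif any(marker in line for marker in GOOD_MARKERS):
            report["good"].append(line.rstrip())
    return report
```

A job looks healthy when the report has at least one good entry and no bad ones.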
Performance Optimization
Adjust Physics Settings for Speed
Modify your world file to increase simulation speed:
<!-- worlds/test_world.world -->
<sdf version="1.6">
  <world name="default">
    <physics type="ode">
      <!-- Increase time step for faster simulation -->
      <max_step_size>0.01</max_step_size> <!-- Default: 0.001 -->
      <real_time_factor>1.5</real_time_factor> <!-- Target: 1.5x real time -->
      <!-- Target speed = max_step_size x real_time_update_rate: 0.01 s x 150 Hz = 1.5x -->
      <real_time_update_rate>150</real_time_update_rate> <!-- Default: 1000 -->
    </physics>
    <!-- Your world contents -->
  </world>
</sdf>
Trade-off: Higher step sizes reduce physics accuracy but increase simulation speed by 2-3x. Use for behavioral testing, not precise dynamics validation.
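When choosing these values, it helps to work out the arithmetic first. The sketch below assumes the usual Gazebo classic relationship in which the target real-time factor is the product of max_step_size and real_time_update_rate; the function names are hypothetical.

```python
def target_real_time_factor(max_step_size_s: float,
                            real_time_update_rate_hz: float) -> float:
    """Target speed relative to wall-clock time (1.0 means real time)."""
    return max_step_size_s * real_time_update_rate_hz


def steps_per_sim_second(max_step_size_s: float) -> int:
    """Number of physics updates needed to advance one simulated second."""
    return round(1.0 / max_step_size_s)


# Defaults: 1 ms steps at 1000 Hz target real time
print(target_real_time_factor(0.001, 1000))
# A 10 ms step needs ten times fewer physics updates per simulated second
print(steps_per_sim_second(0.001), steps_per_sim_second(0.01))
```

The second figure is where the speedup comes from: fewer solver iterations per simulated second, at the price of coarser contact and joint dynamics.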
Use Spot Instances for Cost Savings
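RoboMaker does not expose Spot pricing directly. Request CPU-only compute explicitly in your job creation, and consider AWS Batch with Spot instances for further savings: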
# Add to simulation job config
compute={
    'simulationUnitLimit': 1,
    'computeType': 'CPU'  # Explicitly request CPU-only
}
# Use AWS Batch with Spot instances for even lower costs
# RoboMaker doesn't directly support Spot, but you can:
# 1. Package simulation as Docker container
# 2. Submit to AWS Batch with Spot compute environment
# 3. Achieve 60-90% cost reduction
Expected savings: CPU-only instances cost ~$0.44/hour vs $1.20/hour for GPU instances. Spot pricing can reduce this to ~$0.13/hour.
Cost Analysis
Typical costs for 100 simulation runs (1 hour each):
| Configuration | Instance Type | Cost per Hour | Total Cost |
|---|---|---|---|
| Default (GPU) | g4dn.xlarge | $1.20 | $120 |
| Headless (CPU) | c5.xlarge | $0.44 | $44 |
| Headless + Spot | c5.xlarge Spot | ~$0.13 | ~$13 |
Savings: ~89% cost reduction with headless + Spot configuration.
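The figures in the table follow from simple arithmetic, which is worth reproducing with your own rates. The sketch below uses the hourly prices quoted above as inputs; they are this guide's example numbers, not live AWS pricing, and the function names are hypothetical.

```python
def batch_cost(runs: int, hours_per_run: float, rate_per_hour: float) -> float:
    """Total cost of a batch in dollars."""
    return runs * hours_per_run * rate_per_hour


def savings_percent(baseline: float, new: float) -> float:
    """Percentage saved relative to the baseline cost."""
    return 100.0 * (baseline - new) / baseline


gpu = batch_cost(100, 1, 1.20)
cpu = batch_cost(100, 1, 0.44)
spot = batch_cost(100, 1, 0.13)
print(f"CPU saves {savings_percent(gpu, cpu):.0f}%, Spot saves {savings_percent(gpu, spot):.0f}%")
```

Swapping in your own run count and duration shows quickly whether the Spot route is worth the extra packaging work.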
What You Learned
- Headless Gazebo eliminates GPU rendering overhead in cloud simulations
- AWS RoboMaker requires specific environment variables for software rendering
- Batch parameter sweeps enable efficient automated testing at scale
- CPU-only instances reduce costs by 50-90% for non-visual workloads
Limitations:
- Cannot use RViz or Gazebo GUI for debugging cloud jobs (test locally first)
- Some Gazebo plugins may expect rendering context (camera sensors work but need proper configuration)
- RoboMaker simulation units are billed per-second with 1-minute minimum
When NOT to use headless:
- Developing and debugging simulations (use GUI locally)
- Generating training data from camera sensors (need rendering, but can disable GUI)
- Creating videos or visualizations from simulations
Troubleshooting Common Issues
Issue: "Gazebo died with error 255"
# Check if required packages are missing
# Add to your package.xml:
<depend>gazebo_ros</depend>
<depend>gazebo_ros_control</depend>
<depend>gazebo_plugins</depend>
# Rebuild workspace
colcon build --packages-select my_robot_sim
Issue: X11 errors in CloudWatch logs
# Ensure DISPLAY environment variable is set
# Add to launch configuration:
'DISPLAY': ':1'
# Install xvfb in your Docker/bundle if needed:
sudo apt-get install -y xvfb
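Setting DISPLAY only helps if something is actually serving that display. The sketch below builds the command line for starting Xvfb on the display number used in this guide. It assumes Xvfb is installed in the image, and the screen size and depth are arbitrary placeholders; the helper name xvfb_command is hypothetical.

```python
from typing import List


def xvfb_command(display: str = ":1", width: int = 1280,
                 height: int = 1024, depth: int = 24) -> List[str]:
    """Build the argv for a virtual framebuffer X server on `display`."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


print(" ".join(xvfb_command()))
```

Start it in the background (for example with subprocess.Popen) before launching Gazebo so that DISPLAY=:1 resolves to a running server.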
Issue: Simulation runs too slowly
<!-- Reduce sensor update rates in your robot URDF -->
<sensor name="camera" type="camera">
  <update_rate>10</update_rate> <!-- Reduce from 30 -->
</sensor>
<!-- Simplify collision meshes -->
<collision name="collision">
  <geometry>
    <box> <!-- Use primitives instead of meshes -->
      <size>1 1 1</size>
    </box>
  </geometry>
</collision>
Tested on AWS RoboMaker with Gazebo 11, ROS Noetic, Ubuntu 20.04. Last verified: February 2026.