Profiling ROS Nodes: Find Memory Leaks with Valgrind in 20 Minutes

Stop your ROS node from eating RAM. Use Valgrind and Massif to find and fix memory leaks in C++ ROS 2 nodes before they crash production robots.

Problem: Your ROS Node Is Leaking Memory

Your robot runs fine for five minutes, then slows down, then crashes. htop shows the node's RSS climbing without stopping. You've stared at the code for an hour and can't find it.

This is a memory leak, and Valgrind will find it in minutes.

You'll learn:

  • How to launch a ROS 2 node under Valgrind without losing your mind
  • How to read Memcheck output and pinpoint the exact leak
  • How to use Massif for heap profiling when Memcheck output is overwhelming

Time: 20 min | Level: Intermediate


Why This Happens

ROS 2 nodes written in C++ manage memory manually. Common culprits are callbacks that allocate on the heap and never free, shared_ptr cycles that prevent destruction, and message buffers that grow unbounded when a subscriber can't keep up.

Common symptoms:

  • Node RSS memory increases linearly over time
  • ros2 topic echo eventually slows or drops messages
  • Node crashes after long uptime with std::bad_alloc

Solution

Step 1: Install Valgrind and Build in Debug Mode

sudo apt install valgrind

Your node needs debug symbols, and optimizer inlining scrambles the stack traces. A Debug build gives Valgrind everything it needs to map addresses back to source lines.

# In your ROS 2 workspace
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Debug
source install/setup.bash

Expected: Build completes with Debug mode. Binary will be larger and slower — that's fine for profiling.


Step 2: Run the Node Under Valgrind Memcheck

Don't wrap ros2 run in Valgrind — that profiles the Python launcher process, not your node. Wrap the underlying executable that colcon installed (or use ros2 run's --prefix option to the same effect):

valgrind \
  --tool=memcheck \
  --leak-check=full \
  --show-leak-kinds=all \
  --track-origins=yes \
  --log-file=valgrind-output.txt \
  ./install/your_package/lib/your_package/your_node

Let it run for long enough to reproduce the leak — usually 60–120 seconds. Then Ctrl+C to shut down. Valgrind will finalize the report on exit.

Why --track-origins=yes: This makes Valgrind record where each uninitialized value was created, so warnings point at the allocation site rather than just the bad read. It's noticeably slower, but the extra context is usually worth it.

If it fails:

  • Node exits immediately: Some nodes need parameters. Pass them with --ros-args -p param:=value at the end of the command.
  • Output is too noisy: Valgrind already loads /usr/share/valgrind/default.supp automatically. Re-run with --gen-suppressions=all, copy the generated entries into a file, and pass it with --suppressions=my-ros.supp (any filename you like) to silence known false positives from the middleware.

Step 3: Read the Memcheck Report

Open valgrind-output.txt. You're looking for LEAK SUMMARY at the bottom and definitely lost blocks above it.

==12345== 48 bytes in 3 blocks are definitely lost in loss record 12 of 45
==12345==    at 0x4C2FB0F: malloc (vg_replace_malloc.c:309)
==12345==    by 0x10B3A2: SensorProcessor::processScan(sensor_msgs::msg::LaserScan const&) (sensor_processor.cpp:87)
==12345==    by 0x10A1F4: SensorProcessor::scanCallback(sensor_msgs::msg::LaserScan::SharedPtr) (sensor_processor.cpp:52)

This tells you: every time scanCallback fires, 48 bytes are allocated inside processScan at line 87 and never freed.

Leak kinds to prioritize:

  • definitely lost — your code allocated this and lost the pointer. Fix it.
  • indirectly lost — a leaked struct contains pointers to more heap. Fix the parent.
  • still reachable — pointer exists at exit but wasn't freed. Lower priority; often global state.
  • possibly lost — Valgrind isn't sure. Investigate if the number grows over time.

Step 4: Fix the Leak

Go to the file and line Valgrind reported. The fix is almost always one of three patterns:

Pattern A — raw pointer never deleted:

// Before: allocated but never freed
void SensorProcessor::processScan(const sensor_msgs::msg::LaserScan& msg) {
  float* buffer = new float[msg.ranges.size()]; // leaks every call
  // ... process ...
  // forgot: delete[] buffer;
}

// After: use RAII
void SensorProcessor::processScan(const sensor_msgs::msg::LaserScan& msg) {
  std::vector<float> buffer(msg.ranges.size()); // freed automatically
  // ... process ...
}

Pattern B — shared_ptr cycle:

// Before: A holds B, B holds A — neither ever destructs
struct NodeA { std::shared_ptr<NodeB> b; };
struct NodeB { std::shared_ptr<NodeA> a; }; // cycle

// After: break the cycle with weak_ptr
struct NodeB { std::weak_ptr<NodeA> a; }; // won't prevent destruction

Pattern C — oversized subscriber queue:

// Before: a depth-1000 queue can hold 1000 full messages in memory
rclcpp::Subscription<sensor_msgs::msg::LaserScan>::SharedPtr sub_;
sub_ = this->create_subscription<sensor_msgs::msg::LaserScan>(
  "scan", 1000, callback); // looks like a leak whenever the callback lags

// After: keep the queue shallow
sub_ = this->create_subscription<sensor_msgs::msg::LaserScan>(
  "scan", 10, callback); // old messages are dropped instead of accumulating

Step 5: Use Massif for Heap Growth Profiling

When Memcheck output has hundreds of leak records and you need to find the biggest offender fast, use Massif instead:

valgrind \
  --tool=massif \
  --massif-out-file=massif.out \
  ./install/your_package/lib/your_package/your_node

Add --pages-as-heap=yes only if you suspect allocations that bypass malloc (e.g. direct mmap); it trades per-allocation detail for whole-process page accounting.

Then visualize it:

ms_print massif.out | head -100

This prints a graph of heap usage over time, followed by detailed snapshots listing the top allocation sites. Look for an allocation tree whose byte count keeps climbing from snapshot to snapshot: that's your leak.

If it fails:

  • ms_print not found: it ships in the same valgrind package (sudo apt install valgrind), or install massif-visualizer for a GUI view.

Verification

After fixing, rebuild in Debug mode and re-run under Valgrind:

colcon build --cmake-args -DCMAKE_BUILD_TYPE=Debug
source install/setup.bash

valgrind --tool=memcheck --leak-check=full \
  --log-file=valgrind-after.txt \
  ./install/your_package/lib/your_package/your_node

Let it run for the same duration as before. Check the summary:

grep -A 10 "LEAK SUMMARY" valgrind-after.txt

You should see: definitely lost: 0 bytes in 0 blocks for the call sites you fixed. still reachable from ROS internals is normal and not your problem.

Also verify at runtime that RSS stays flat:

watch -n 2 'ps -o pid,rss,comm -p $(pgrep -f your_node)'

You should see: RSS stabilizes after startup rather than climbing indefinitely.


What You Learned

  • Valgrind Memcheck finds exact allocation sites for leaks — not just symptoms
  • Debug builds are mandatory; release builds lack the debug info Valgrind needs to name source lines
  • definitely lost is always your code; fix those first
  • Massif is faster than Memcheck when you need to find the biggest heap consumer
  • Bounded subscriber queue depth prevents a whole class of memory growth bugs

Limitation: Valgrind slows execution by 10–50x. Don't use it on hard-realtime nodes where timing matters — profile offline with a rosbag instead.

When NOT to use this: If your node is Python (rclpy), Valgrind won't help — use tracemalloc or memory_profiler instead.


Tested on ROS 2 Jazzy, Ubuntu 24.04, Valgrind 3.22, GCC 13