Problem: Your Franka Panda Drops Everything
You've got a Franka Emika Panda, a depth camera, and objects that keep ending up on the floor. Hardcoded grasp poses break the moment the object shifts an inch. You need a pipeline that detects grasp candidates from a point cloud and executes them reliably.
You'll learn:
- How to run GraspNet-baseline inference on an RGBD frame and get ranked 6-DOF grasp poses
- How to transform grasp poses from camera frame to robot base frame
- How to filter candidates and execute with MoveIt 2 on real hardware
Time: 45 min | Level: Advanced
Why This Happens
Rule-based grasping breaks because it encodes the object's pose, not its graspability. A banana lying flat needs a different grip than one standing upright — and that logic explodes combinatorially as you add objects.
Deep grasp detection models like GraspNet learn grasp quality directly from point cloud geometry. They output ranked SE(3) grasp poses (position + orientation) you can pass straight to a motion planner.
Common symptoms of the old approach:
- Gripper collides with object edges when pose drifts even slightly
- Works in simulation, fails on real hardware due to calibration error
- New objects require manual pose re-annotation
GraspNet top-10 grasp candidates visualized on the point cloud before execution
Solution
Step 1: Environment Setup
You need ROS 2 Humble, the Franka ROS 2 driver, and GraspNet-baseline. Use a conda env to isolate the Python deps from ROS.
# Clone GraspNet-baseline
git clone https://github.com/graspnet/graspnet-baseline.git ~/graspnet
cd ~/graspnet
conda create -n graspnet python=3.10 -y
conda activate graspnet
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# Install pointnet2 ops (CUDA-compiled)
cd pointnet2
python setup.py install
Download the checkpoint — use checkpoint-rs.tar for real-scan data (which your depth camera produces):
# From the GraspNet project page, place checkpoint at:
# ~/graspnet/logs/log_rs/checkpoint-rs.tar
Expected: python demo.py runs without error on the sample scene.
If it fails:
- CUDA OOM: Reduce
--num_point 10000to5000indemo.py - Missing ops: Rebuild pointnet2 with
python setup.py clean --all && python setup.py install
Step 2: Calibrate Hand-Eye Transform
GraspNet outputs grasp poses in camera frame. You need the transform from your depth camera to the Franka base (panda_link0).
Use easy_handeye2 with ROS 2:
sudo apt install ros-humble-easy-handeye2
ros2 launch easy_handeye2 calibrate.launch.py \
robot_base_frame:=panda_link0 \
robot_effector_frame:=panda_hand \
tracking_base_frame:=camera_color_optical_frame
Follow the eye-on-hand or eye-to-hand procedure (15–20 robot poses). Save the result — you'll get a YAML like this:
# hand_eye_calibration.yaml
transformation:
x: 0.041
y: -0.012
z: 0.112
qx: 0.002
qy: 0.706
qz: -0.002
qw: 0.708
Publish this as a static transform in your launch file:
# launch/grasp_pipeline.launch.py
from launch_ros.actions import Node
static_tf = Node(
package="tf2_ros",
executable="static_transform_publisher",
arguments=[
"0.041", "-0.012", "0.112",
"0.002", "0.706", "-0.002", "0.708",
"panda_hand", # parent
"camera_color_optical_frame" # child
]
)
Why this matters: 1 cm of calibration error produces gripper collisions at grasp time. Re-run calibration if you remount the camera.
Step 3: Run Inference and Extract the Best Grasp
Write a ROS 2 node that subscribes to your depth camera, runs GraspNet, and publishes the top grasp as a geometry_msgs/PoseStamped.
# grasp_detector_node.py
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2
from geometry_msgs.msg import PoseStamped
import sensor_msgs_py.point_cloud2 as pc2
import numpy as np
import torch
import sys
sys.path.insert(0, '/home/user/graspnet')
from models.graspnet import GraspNet, pred_decode
from dataset.graspnet_dataset import minkowski_collate_fn
import MinkowskiEngine as ME
CHECKPOINT = '/home/user/graspnet/logs/log_rs/checkpoint-rs.tar'
NUM_POINT = 10000
SCORE_THRESH = 0.5 # Only accept grasps above this quality score
class GraspDetector(Node):
def __init__(self):
super().__init__('grasp_detector')
self.model = self._load_model()
self.sub = self.create_subscription(
PointCloud2, '/camera/depth/color/points',
self.cloud_cb, 1)
self.pub = self.create_publisher(PoseStamped, '/best_grasp', 1)
self.get_logger().info('GraspDetector ready')
def _load_model(self):
net = GraspNet(input_feature_dim=0, num_view=300,
num_angle=12, num_depth=4,
cylinder_radius=0.05, hmin=-0.02,
hmax_list=[0.01, 0.02, 0.03, 0.04])
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
checkpoint = torch.load(CHECKPOINT, map_location=device)
net.load_state_dict(checkpoint['model_state_dict'])
net.to(device).eval()
return net
def cloud_cb(self, msg):
# Convert ROS PointCloud2 → numpy (x,y,z only)
pts = np.array([[p[0], p[1], p[2]]
for p in pc2.read_points(
msg, field_names=('x','y','z'),
skip_nans=True)])
if len(pts) < 500:
return
# Subsample for speed
idx = np.random.choice(len(pts), NUM_POINT, replace=len(pts) < NUM_POINT)
pts = pts[idx].astype(np.float32)
# Run inference
with torch.no_grad():
coords = ME.utils.batched_coordinates(
[torch.from_numpy(pts / 0.005).int()])
feats = torch.ones(len(coords), 1)
inputs = {'point_clouds': torch.from_numpy(pts).unsqueeze(0).cuda(),
'coords': coords.cuda(), 'feats': feats.cuda()}
end_points = self.model(inputs)
grasps = pred_decode(end_points)
# Filter by quality score and pick the best
g = grasps[0] # batch index 0
mask = g.scores > SCORE_THRESH
if mask.sum() == 0:
self.get_logger().warn('No high-quality grasps found')
return
best = g[mask][g.scores[mask].argmax()]
# Publish as PoseStamped in camera frame
pose = PoseStamped()
pose.header = msg.header # keep camera frame + timestamp
t = best.translations.cpu().numpy()
q = best.rotation_matrices.cpu().numpy() # convert to quaternion below
from scipy.spatial.transform import Rotation
quat = Rotation.from_matrix(q).as_quat() # [x,y,z,w]
pose.pose.position.x, pose.pose.position.y, pose.pose.position.z = float(t[0]), float(t[1]), float(t[2])
pose.pose.orientation.x, pose.pose.orientation.y = float(quat[0]), float(quat[1])
pose.pose.orientation.z, pose.pose.orientation.w = float(quat[2]), float(quat[3])
self.pub.publish(pose)
self.get_logger().info(f'Published grasp score={best.scores.item():.3f}')
Expected: /best_grasp publishes at ~1–2 Hz while objects are in view.
The green arrow represents the top-ranked grasp, pointing along the gripper approach axis
Step 4: Execute with MoveIt 2
Subscribe to /best_grasp, transform it into panda_link0 frame, then call MoveIt 2 to plan and execute.
# grasp_executor_node.py
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped
from moveit_msgs.action import MoveGroup
from rclpy.action import ActionClient
import tf2_ros, tf2_geometry_msgs
PRE_GRASP_OFFSET = 0.12 # 12 cm back along approach axis
GRIPPER_OPEN_WIDTH = 0.08 # metres
class GraspExecutor(Node):
def __init__(self):
super().__init__('grasp_executor')
self.tf_buffer = tf2_ros.Buffer()
self.tf_listener = tf2_ros.TransformListener(self.tf_buffer, self)
self.sub = self.create_subscription(
PoseStamped, '/best_grasp', self.grasp_cb, 1)
self._move_client = ActionClient(self, MoveGroup, '/move_action')
self.executing = False
def grasp_cb(self, msg: PoseStamped):
if self.executing:
return # Ignore new detections mid-execution
self.executing = True
try:
# Transform grasp pose to robot base frame
grasp_base = self.tf_buffer.transform(
msg, 'panda_link0', timeout=rclpy.duration.Duration(seconds=1.0))
# 1. Open gripper
self._set_gripper(GRIPPER_OPEN_WIDTH)
# 2. Move to pre-grasp (offset along -Z of gripper frame)
pre = self._offset_pose(grasp_base, -PRE_GRASP_OFFSET)
self._move_to(pre)
# 3. Move straight to grasp pose (Cartesian)
self._move_cartesian([grasp_base])
# 4. Close gripper
self._set_gripper(0.0)
# 5. Lift 15 cm
lift = self._offset_pose(grasp_base, 0.15, axis='z_world')
self._move_to(lift)
finally:
self.executing = False
def _offset_pose(self, pose, offset, axis='approach'):
"""Offset a pose along its approach axis (local -Z) or world Z."""
import copy, numpy as np
from scipy.spatial.transform import Rotation
p = copy.deepcopy(pose)
q = [pose.pose.orientation.x, pose.pose.orientation.y,
pose.pose.orientation.z, pose.pose.orientation.w]
R = Rotation.from_quat(q).as_matrix()
if axis == 'approach':
direction = R @ np.array([0, 0, -1]) # gripper Z points into object
else:
direction = np.array([0, 0, 1])
p.pose.position.x += direction[0] * offset
p.pose.position.y += direction[1] * offset
p.pose.position.z += direction[2] * offset
return p
def _move_to(self, pose: PoseStamped):
# MoveIt 2 goal via action — abbreviated for clarity
# Use moveit_py or MoveGroupInterface from moveit2 Python bindings
pass # Implement with your preferred MoveIt 2 Python API
def _move_cartesian(self, waypoints):
pass # Use compute_cartesian_path equivalent
def _set_gripper(self, width):
pass # Call Franka gripper action server
If it fails:
- TF lookup fails: Confirm the static transform from Step 2 is publishing with
ros2 run tf2_tools view_frames - MoveIt planning fails: Check that the grasp pose isn't inside a collision object. Add 2 cm to your pre-grasp offset.
- Gripper doesn't close: Ensure
franka_gripperaction server is running:ros2 node list | grep gripper
Verification
# In one Terminal: launch the full pipeline
ros2 launch grasp_pipeline grasp_pipeline.launch.py
# In another: watch grasp quality scores
ros2 topic echo /best_grasp --field pose.position
You should see: The robot opens its gripper, moves to pre-grasp, descends, closes, and lifts — without hitting the table. Grasp success rate on household objects should be >75% after hand-eye calibration.
Successful pick: object lifted 15 cm, gripper closed firmly with non-zero torque feedback
What You Learned
- GraspNet produces ranked SE(3) grasp poses from a point cloud — no object CAD model needed
- Hand-eye calibration quality is the single biggest factor in real-world success; redo it if accuracy drops
- The
SCORE_THRESHparameter is a lever: lower it for more candidates, higher for fewer but more reliable grasps - This pipeline doesn't handle grasp re-tries; add a feedback loop using Franka's force-torque sensor to detect slip
Limitation: GraspNet was trained on tabletop scenes. Heavily occluded objects or transparent materials degrade performance significantly — consider data augmentation or domain adaptation fine-tuning for your specific objects.
When NOT to use this: If you're grasping a known object in a structured bin with repeatable pose, a pose estimator (FoundPose, FoundationPose) + ICP will be more reliable and faster.
Tested on Franka Emika Panda, ROS 2 Humble, Ubuntu 22.04, GraspNet-baseline (2024-12), PyTorch 2.3, CUDA 12.1, Intel RealSense D435i