Pick and Place: Implementing Deep Grasping with a Franka Emika

Implement reliable deep learning grasp detection on a Franka Emika Panda robot using GraspNet and ROS 2 in under an hour.

Problem: Your Franka Panda Drops Everything

You've got a Franka Emika Panda, a depth camera, and objects that keep ending up on the floor. Hardcoded grasp poses break the moment the object shifts an inch. You need a pipeline that detects grasp candidates from a point cloud and executes them reliably.

You'll learn:

  • How to run GraspNet-baseline inference on an RGBD frame and get ranked 6-DOF grasp poses
  • How to transform grasp poses from camera frame to robot base frame
  • How to filter candidates and execute with MoveIt 2 on real hardware

Time: 45 min | Level: Advanced


Why This Happens

Rule-based grasping breaks because it encodes the object's pose, not its graspability. A banana lying flat needs a different grip than one standing upright — and that logic explodes combinatorially as you add objects.

Deep grasp detection models like GraspNet learn grasp quality directly from point cloud geometry. They output ranked SE(3) grasp poses (position + orientation) you can pass straight to a motion planner.

Common symptoms of the old approach:

  • Gripper collides with object edges when pose drifts even slightly
  • Works in simulation, fails on real hardware due to calibration error
  • New objects require manual pose re-annotation

Franka Panda picking a mug with GraspNet overlay GraspNet top-10 grasp candidates visualized on the point cloud before execution


Solution

Step 1: Environment Setup

You need ROS 2 Humble, the Franka ROS 2 driver, and GraspNet-baseline. Use a conda env to isolate the Python deps from ROS.

# Clone GraspNet-baseline
git clone https://github.com/graspnet/graspnet-baseline.git ~/graspnet
cd ~/graspnet
conda create -n graspnet python=3.10 -y
conda activate graspnet
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Install pointnet2 ops (CUDA-compiled)
cd pointnet2
python setup.py install

Download the checkpoint — use checkpoint-rs.tar for real-scan data (which your depth camera produces):

# From the GraspNet project page, place checkpoint at:
# ~/graspnet/logs/log_rs/checkpoint-rs.tar

Expected: python demo.py runs without error on the sample scene.

If it fails:

  • CUDA OOM: Reduce --num_point 10000 to 5000 in demo.py
  • Missing ops: Rebuild pointnet2 with python setup.py clean --all && python setup.py install

Step 2: Calibrate Hand-Eye Transform

GraspNet outputs grasp poses in camera frame. You need the transform from your depth camera to the Franka base (panda_link0).

Use easy_handeye2 with ROS 2:

sudo apt install ros-humble-easy-handeye2
ros2 launch easy_handeye2 calibrate.launch.py \
  robot_base_frame:=panda_link0 \
  robot_effector_frame:=panda_hand \
  tracking_base_frame:=camera_color_optical_frame

Follow the eye-on-hand or eye-to-hand procedure (15–20 robot poses). Save the result — you'll get a YAML like this:

# hand_eye_calibration.yaml
transformation:
  x: 0.041
  y: -0.012
  z: 0.112
  qx: 0.002
  qy: 0.706
  qz: -0.002
  qw: 0.708

Publish this as a static transform in your launch file:

# launch/grasp_pipeline.launch.py
from launch_ros.actions import Node

static_tf = Node(
    package="tf2_ros",
    executable="static_transform_publisher",
    arguments=[
        "0.041", "-0.012", "0.112",
        "0.002", "0.706", "-0.002", "0.708",
        "panda_hand",          # parent
        "camera_color_optical_frame"  # child
    ]
)

Why this matters: 1 cm of calibration error produces gripper collisions at grasp time. Re-run calibration if you remount the camera.


Step 3: Run Inference and Extract the Best Grasp

Write a ROS 2 node that subscribes to your depth camera, runs GraspNet, and publishes the top grasp as a geometry_msgs/PoseStamped.

# grasp_detector_node.py
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import PointCloud2
from geometry_msgs.msg import PoseStamped
import sensor_msgs_py.point_cloud2 as pc2
import numpy as np
import torch
import sys
sys.path.insert(0, '/home/user/graspnet')
from models.graspnet import GraspNet, pred_decode
from dataset.graspnet_dataset import minkowski_collate_fn
import MinkowskiEngine as ME

CHECKPOINT = '/home/user/graspnet/logs/log_rs/checkpoint-rs.tar'
NUM_POINT  = 10000
SCORE_THRESH = 0.5  # Only accept grasps above this quality score

class GraspDetector(Node):
    def __init__(self):
        super().__init__('grasp_detector')
        self.model = self._load_model()
        self.sub = self.create_subscription(
            PointCloud2, '/camera/depth/color/points',
            self.cloud_cb, 1)
        self.pub = self.create_publisher(PoseStamped, '/best_grasp', 1)
        self.get_logger().info('GraspDetector ready')

    def _load_model(self):
        net = GraspNet(input_feature_dim=0, num_view=300,
                       num_angle=12, num_depth=4,
                       cylinder_radius=0.05, hmin=-0.02,
                       hmax_list=[0.01, 0.02, 0.03, 0.04])
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        checkpoint = torch.load(CHECKPOINT, map_location=device)
        net.load_state_dict(checkpoint['model_state_dict'])
        net.to(device).eval()
        return net

    def cloud_cb(self, msg):
        # Convert ROS PointCloud2 → numpy (x,y,z only)
        pts = np.array([[p[0], p[1], p[2]]
                        for p in pc2.read_points(
                            msg, field_names=('x','y','z'),
                            skip_nans=True)])
        if len(pts) < 500:
            return

        # Subsample for speed
        idx = np.random.choice(len(pts), NUM_POINT, replace=len(pts) < NUM_POINT)
        pts = pts[idx].astype(np.float32)

        # Run inference
        with torch.no_grad():
            coords = ME.utils.batched_coordinates(
                [torch.from_numpy(pts / 0.005).int()])
            feats  = torch.ones(len(coords), 1)
            inputs = {'point_clouds': torch.from_numpy(pts).unsqueeze(0).cuda(),
                      'coords': coords.cuda(), 'feats': feats.cuda()}
            end_points = self.model(inputs)
            grasps = pred_decode(end_points)

        # Filter by quality score and pick the best
        g = grasps[0]  # batch index 0
        mask = g.scores > SCORE_THRESH
        if mask.sum() == 0:
            self.get_logger().warn('No high-quality grasps found')
            return
        best = g[mask][g.scores[mask].argmax()]

        # Publish as PoseStamped in camera frame
        pose = PoseStamped()
        pose.header = msg.header  # keep camera frame + timestamp
        t = best.translations.cpu().numpy()
        q = best.rotation_matrices.cpu().numpy()  # convert to quaternion below
        from scipy.spatial.transform import Rotation
        quat = Rotation.from_matrix(q).as_quat()  # [x,y,z,w]
        pose.pose.position.x, pose.pose.position.y, pose.pose.position.z = float(t[0]), float(t[1]), float(t[2])
        pose.pose.orientation.x, pose.pose.orientation.y = float(quat[0]), float(quat[1])
        pose.pose.orientation.z, pose.pose.orientation.w = float(quat[2]), float(quat[3])
        self.pub.publish(pose)
        self.get_logger().info(f'Published grasp score={best.scores.item():.3f}')

Expected: /best_grasp publishes at ~1–2 Hz while objects are in view.

RViz showing best grasp arrow on point cloud The green arrow represents the top-ranked grasp, pointing along the gripper approach axis


Step 4: Execute with MoveIt 2

Subscribe to /best_grasp, transform it into panda_link0 frame, then call MoveIt 2 to plan and execute.

# grasp_executor_node.py
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped
from moveit_msgs.action import MoveGroup
from rclpy.action import ActionClient
import tf2_ros, tf2_geometry_msgs

PRE_GRASP_OFFSET = 0.12   # 12 cm back along approach axis
GRIPPER_OPEN_WIDTH = 0.08  # metres

class GraspExecutor(Node):
    def __init__(self):
        super().__init__('grasp_executor')
        self.tf_buffer = tf2_ros.Buffer()
        self.tf_listener = tf2_ros.TransformListener(self.tf_buffer, self)
        self.sub = self.create_subscription(
            PoseStamped, '/best_grasp', self.grasp_cb, 1)
        self._move_client = ActionClient(self, MoveGroup, '/move_action')
        self.executing = False

    def grasp_cb(self, msg: PoseStamped):
        if self.executing:
            return   # Ignore new detections mid-execution
        self.executing = True
        try:
            # Transform grasp pose to robot base frame
            grasp_base = self.tf_buffer.transform(
                msg, 'panda_link0', timeout=rclpy.duration.Duration(seconds=1.0))

            # 1. Open gripper
            self._set_gripper(GRIPPER_OPEN_WIDTH)

            # 2. Move to pre-grasp (offset along -Z of gripper frame)
            pre = self._offset_pose(grasp_base, -PRE_GRASP_OFFSET)
            self._move_to(pre)

            # 3. Move straight to grasp pose (Cartesian)
            self._move_cartesian([grasp_base])

            # 4. Close gripper
            self._set_gripper(0.0)

            # 5. Lift 15 cm
            lift = self._offset_pose(grasp_base, 0.15, axis='z_world')
            self._move_to(lift)

        finally:
            self.executing = False

    def _offset_pose(self, pose, offset, axis='approach'):
        """Offset a pose along its approach axis (local -Z) or world Z."""
        import copy, numpy as np
        from scipy.spatial.transform import Rotation
        p = copy.deepcopy(pose)
        q = [pose.pose.orientation.x, pose.pose.orientation.y,
             pose.pose.orientation.z, pose.pose.orientation.w]
        R = Rotation.from_quat(q).as_matrix()
        if axis == 'approach':
            direction = R @ np.array([0, 0, -1])  # gripper Z points into object
        else:
            direction = np.array([0, 0, 1])
        p.pose.position.x += direction[0] * offset
        p.pose.position.y += direction[1] * offset
        p.pose.position.z += direction[2] * offset
        return p

    def _move_to(self, pose: PoseStamped):
        # MoveIt 2 goal via action — abbreviated for clarity
        # Use moveit_py or MoveGroupInterface from moveit2 Python bindings
        pass  # Implement with your preferred MoveIt 2 Python API

    def _move_cartesian(self, waypoints):
        pass  # Use compute_cartesian_path equivalent

    def _set_gripper(self, width):
        pass  # Call Franka gripper action server

If it fails:

  • TF lookup fails: Confirm the static transform from Step 2 is publishing with ros2 run tf2_tools view_frames
  • MoveIt planning fails: Check that the grasp pose isn't inside a collision object. Add 2 cm to your pre-grasp offset.
  • Gripper doesn't close: Ensure franka_gripper action server is running: ros2 node list | grep gripper

Verification

# In one Terminal: launch the full pipeline
ros2 launch grasp_pipeline grasp_pipeline.launch.py

# In another: watch grasp quality scores
ros2 topic echo /best_grasp --field pose.position

You should see: The robot opens its gripper, moves to pre-grasp, descends, closes, and lifts — without hitting the table. Grasp success rate on household objects should be >75% after hand-eye calibration.

Franka lifting mug after successful grasp Successful pick: object lifted 15 cm, gripper closed firmly with non-zero torque feedback


What You Learned

  • GraspNet produces ranked SE(3) grasp poses from a point cloud — no object CAD model needed
  • Hand-eye calibration quality is the single biggest factor in real-world success; redo it if accuracy drops
  • The SCORE_THRESH parameter is a lever: lower it for more candidates, higher for fewer but more reliable grasps
  • This pipeline doesn't handle grasp re-tries; add a feedback loop using Franka's force-torque sensor to detect slip

Limitation: GraspNet was trained on tabletop scenes. Heavily occluded objects or transparent materials degrade performance significantly — consider data augmentation or domain adaptation fine-tuning for your specific objects.

When NOT to use this: If you're grasping a known object in a structured bin with repeatable pose, a pose estimator (FoundPose, FoundationPose) + ICP will be more reliable and faster.


Tested on Franka Emika Panda, ROS 2 Humble, Ubuntu 22.04, GraspNet-baseline (2024-12), PyTorch 2.3, CUDA 12.1, Intel RealSense D435i