Problem: Your Robot Arm Grabs the Wrong Object (or Nothing)
You've got a bin full of randomly oriented parts. Your robot arm either misses, collides, or picks the one object you didn't want. Classic bin picking failure.
You'll learn:
- How to capture and filter a 3D point cloud from a depth camera
- How to segment individual objects from a cluttered scene
- How to generate valid grasp poses and pass them to a robot controller
Time: 45 min | Level: Advanced
Why This Happens
Bin picking fails when your perception pipeline can't reliably answer three questions: What is in the bin? Where exactly is it? Which way is it facing?
2D cameras lose depth. Depth cameras return noisy point clouds. And randomly piled objects occlude each other constantly, so your model needs to reason about partial geometry.
Common symptoms:
- Grasp attempts consistently miss by 5–15mm
- Arm collides with bin walls when reaching for edge objects
- Object detection confidence drops below threshold in cluttered scenes
Solution
Step 1: Capture a Filtered Point Cloud
Use a structured-light or ToF camera (Intel RealSense, Azure Kinect, Zivid) mounted above the bin. Capture depth + RGB, then convert to a point cloud and remove noise.
```python
import open3d as o3d
import numpy as np

def capture_filtered_cloud(depth_image, intrinsics, voxel_size=0.003):
    # Convert raw depth to an Open3D point cloud
    pcd = o3d.geometry.PointCloud.create_from_depth_image(
        o3d.geometry.Image(depth_image),
        intrinsics
    )
    # Voxel downsample: reduces ~2M points to ~40k for faster processing
    pcd = pcd.voxel_down_sample(voxel_size=voxel_size)
    # Remove statistical outliers (stray points from reflective surfaces)
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    return pcd
```
Expected: A point cloud with 30k–60k points representing the bin contents.
If it fails:
- Empty cloud: Check camera exposure; shiny metal parts reflect the IR pattern specularly, leaving holes in the depth map. Add a matte spray coating or switch to photometric stereo.
- Massive noise band: Your depth camera is too close. Keep it ≥500mm from bin floor.
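If you suspect the noise band, it helps to quantify raw depth noise before tuning any filters. The sketch below is pure NumPy; `depth_noise_mm` and its ROI convention are illustrative helpers, not part of any camera SDK:

```python
import numpy as np

def depth_noise_mm(depth_frames, roi):
    """Median per-pixel temporal std-dev over a stack of depth frames, in mm.

    depth_frames: (N, H, W) array of raw depth in millimeters
    roi: (row_slice, col_slice) covering a flat region, e.g. the empty bin floor
    """
    patch = depth_frames[:, roi[0], roi[1]].astype(np.float64)
    valid = (patch > 0).all(axis=0)  # ignore pixels with any dropout frames
    return float(np.median(patch.std(axis=0)[valid]))
```

As a rough rule, if the median noise is more than a few millimeters at a ≥500mm standoff, fix the optics or mounting distance before touching the software pipeline.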
Step 2: Crop the ROI and Segment Objects
Crop the point cloud to the bin interior using a known bounding box, then use DBSCAN clustering to isolate individual objects.
```python
def segment_objects(pcd, bin_bounds, eps=0.015, min_points=100):
    # Crop to the bin interior so bin walls aren't picked up as objects
    bbox = o3d.geometry.AxisAlignedBoundingBox(
        min_bound=bin_bounds["min"],
        max_bound=bin_bounds["max"]
    )
    pcd_cropped = pcd.crop(bbox)
    # DBSCAN: eps is the neighborhood radius in meters.
    # Tune eps to ~half your smallest object's diameter.
    labels = np.array(pcd_cropped.cluster_dbscan(
        eps=eps,
        min_points=min_points,
        print_progress=False
    ))
    objects = []
    for label in set(labels):
        if label < 0:
            continue  # -1 marks noise points; skip them
        mask = labels == label
        obj_pcd = pcd_cropped.select_by_index(np.where(mask)[0])
        objects.append(obj_pcd)
    return objects
```
Expected: A list of point clouds, one per object cluster.
If it fails:
- All points in one cluster: `eps` is too large. Lower it until clusters separate.
- One object splits into fragments: `eps` is too small, or the object has a concave shape. Increase `eps` slightly or post-process with convex hull merging.
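When an object fragments and you would rather not retune `eps`, a cheap post-process is to merge clusters whose centroids sit closer than a threshold. This is a simplified stand-in for proper convex hull merging; `merge_fragments` and `merge_dist` are illustrative names, and it operates on plain NumPy arrays rather than Open3D clouds:

```python
import numpy as np

def merge_fragments(clusters, merge_dist=0.02):
    """Greedily merge clusters whose centroids are closer than merge_dist.

    clusters: list of (N_i, 3) point arrays; returns a merged list.
    Centroid distance is a crude proxy, so keep merge_dist below the
    smallest gap you expect between distinct objects.
    """
    merged = [np.asarray(c, dtype=float) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                ci = merged[i].mean(axis=0)
                cj = merged[j].mean(axis=0)
                if np.linalg.norm(ci - cj) < merge_dist:
                    merged[i] = np.vstack([merged[i], merged.pop(j)])
                    changed = True
                    break
            if changed:
                break
    return merged
```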
Step 3: Estimate Normals and Score Grasp Candidates
For each object, estimate surface normals, then generate antipodal grasp candidates. Score them by approach angle and collision clearance.
```python
def generate_grasps(obj_pcd, gripper_width=0.08):
    # Normals are needed for grasp axis alignment
    obj_pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30)
    )
    # Camera looks down the optical +Z axis, so visible surfaces have
    # normals pointing back toward the camera (negative Z)
    obj_pcd.orient_normals_to_align_with_direction([0, 0, -1])

    points = np.asarray(obj_pcd.points)
    normals = np.asarray(obj_pcd.normals)
    center = points.mean(axis=0)

    grasps = []
    for pt, normal in zip(points, normals):
        # Only consider camera-facing normals (reachable from above)
        if normal[2] > -0.3:
            continue
        # Antipodal proxy: require a surface point roughly opposite the
        # candidate across the object center, within the gripper opening
        opposite = 2 * center - pt
        if np.linalg.norm(points - opposite, axis=1).min() > gripper_width / 2:
            continue
        grasps.append({
            "position": pt,
            "approach": -normal,  # Gripper approaches along the reversed normal
            "width": gripper_width,
            "score": float(-normal[2])  # Higher = more vertical = easier approach
        })
    # Return the top 5 by score
    grasps.sort(key=lambda g: g["score"], reverse=True)
    return grasps[:5]
```
Expected: A ranked list of grasp poses (position + approach vector) for the best-positioned object.
If it fails:
- No grasps returned: Object is fully occluded or lying flat with no upward-facing surface. Move to next object in queue.
- Arm hits bin wall: Add a clearance check — reject any grasp where the approach vector intersects the bin bounding box before reaching the object.
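A minimal version of that clearance check can be sketched as follows, assuming the camera optical frame used above (+Z pointing down into the bin). `approach_is_clear`, `standoff`, and `margin` are illustrative tuning knobs, not planner APIs:

```python
import numpy as np

def approach_is_clear(position, approach, bin_min, bin_max,
                      standoff=0.15, margin=0.05, n_samples=20):
    """Reject grasps whose pre-grasp path leaves the bin's XY interior.

    Samples the straight-line retreat from the grasp point along -approach
    for `standoff` meters; every sample still between bin rim and floor
    must stay `margin` away from the walls in X and Y.
    """
    position = np.asarray(position, dtype=float)
    approach = np.asarray(approach, dtype=float)
    approach = approach / np.linalg.norm(approach)
    ts = np.linspace(0.0, standoff, n_samples)
    path = position - ts[:, None] * approach  # grasp point back to pre-grasp
    inside_xy = (
        (path[:, 0] > bin_min[0] + margin) & (path[:, 0] < bin_max[0] - margin) &
        (path[:, 1] > bin_min[1] + margin) & (path[:, 1] < bin_max[1] - margin)
    )
    in_bin_z = (path[:, 2] >= bin_min[2]) & (path[:, 2] <= bin_max[2])
    return bool(np.all(inside_xy | ~in_bin_z))
```

Sampling a straight line is a conservative approximation of the gripper's swept volume; for a real cell you would inflate the check by the gripper's outer dimensions or hand collision geometry to the planner.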
Step 4: Send Grasp to ROS 2 Controller
Publish the top grasp as a `geometry_msgs/PoseStamped` for your motion planner (MoveIt 2, cuRobo, or similar).
```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped
from scipy.spatial.transform import Rotation

class GraspPublisher(Node):
    def __init__(self):
        super().__init__("grasp_publisher")
        self.pub = self.create_publisher(PoseStamped, "/grasp_target", 10)

    def publish_grasp(self, grasp):
        msg = PoseStamped()
        msg.header.frame_id = "camera_depth_optical_frame"
        msg.header.stamp = self.get_clock().now().to_msg()
        pos = grasp["position"]
        msg.pose.position.x = float(pos[0])
        msg.pose.position.y = float(pos[1])
        msg.pose.position.z = float(pos[2])
        # Align the gripper Z-axis with the approach vector
        approach = grasp["approach"]
        rot, _ = Rotation.align_vectors([approach], [[0, 0, 1]])
        q = rot.as_quat()  # [x, y, z, w]
        msg.pose.orientation.x = q[0]
        msg.pose.orientation.y = q[1]
        msg.pose.orientation.z = q[2]
        msg.pose.orientation.w = q[3]
        self.pub.publish(msg)
        self.get_logger().info(f"Published grasp at {pos} score={grasp['score']:.3f}")
```
Expected: Your motion planner receives a valid pose and plans a collision-free trajectory to it.
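The orientation math can be unit-tested offline, without a ROS graph. `grasp_quaternion` below is a hypothetical helper that mirrors the `align_vectors` call in `publish_grasp` so it can be checked in isolation:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def grasp_quaternion(approach):
    """Quaternion [x, y, z, w] rotating the gripper +Z axis onto `approach`.

    Same math as publish_grasp, extracted for testing.
    """
    approach = np.asarray(approach, dtype=float)
    approach = approach / np.linalg.norm(approach)
    rot, _ = Rotation.align_vectors([approach], [[0.0, 0.0, 1.0]])
    return rot.as_quat()
```

A quick check that applying the resulting rotation to the gripper +Z axis actually recovers the approach direction catches frame mix-ups before any metal moves.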
Verification
Run the full pipeline on a static bin scene and visualize before sending to hardware:
```bash
python pipeline.py --visualize --no-execute
```
You should see: An Open3D window showing the segmented clusters with colored bounding boxes and the top grasp pose displayed as a coordinate frame.
Once that looks correct:
```bash
ros2 topic echo /grasp_target
```
Confirm the pose message arrives with sane position values (within the bin bounds you defined) and a unit quaternion.
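Those two checks can be scripted as well. `pose_is_sane` is a throwaway helper, assuming you pull the position and quaternion tuples out of the echoed message:

```python
import math

def pose_is_sane(position, quaternion, bin_min, bin_max, tol=1e-6):
    """True if position lies inside the bin bounds and the quaternion is unit length.

    position: (x, y, z); quaternion: (x, y, z, w), as published above.
    """
    in_bounds = all(lo <= p <= hi for p, lo, hi in zip(position, bin_min, bin_max))
    unit = abs(math.sqrt(sum(c * c for c in quaternion)) - 1.0) < tol
    return in_bounds and unit
```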
What You Learned
- Point cloud filtering with voxel downsampling and statistical outlier removal handles camera noise without destroying object geometry
- DBSCAN is the right clustering tool here because you don't know the object count ahead of time; tune `eps` to ~half your smallest object's diameter
- Normal-based grasp scoring is fast and good enough for simple parts; for complex or deformable objects you'll want a learned grasp model (GraspNet, AnyGrasp)
Limitations:
- This pipeline assumes a static scene — the bin contents must not move between capture and execution
- Transparent objects (glass, clear plastic) are invisible to most depth cameras; you need RGB-only detection or polarized lighting
- Cycle time is ~200ms per frame at these settings; if you need <100ms, move filtering to GPU with CUDA-accelerated Open3D
Tested on Open3D 0.18, ROS 2 Jazzy, Python 3.12, Ubuntu 24.04