Bin Picking: Using 3D Vision to Sort Random Objects

Set up a 3D vision pipeline for robotic bin picking using Open3D and ROS 2 — from point cloud capture to grasp pose generation.

Problem: Your Robot Arm Grabs the Wrong Object (or Nothing)

You've got a bin full of randomly oriented parts. Your robot arm either misses, collides, or picks the one object you didn't want. Classic bin picking failure.

You'll learn:

  • How to capture and filter a 3D point cloud from a depth camera
  • How to segment individual objects from a cluttered scene
  • How to generate valid grasp poses and pass them to a robot controller

Time: 45 min | Level: Advanced


Why This Happens

Bin picking fails when your perception pipeline can't reliably answer three questions: What is in the bin? Where exactly is it? Which way is it facing?

2D cameras lose depth. Depth cameras return noisy point clouds. And randomly piled objects occlude each other constantly, so your model needs to reason about partial geometry.

Common symptoms:

  • Grasp attempts consistently miss by 5–15mm
  • Arm collides with bin walls when reaching for edge objects
  • Object detection confidence drops below threshold in cluttered scenes

Solution

Step 1: Capture a Filtered Point Cloud

Use a structured-light or ToF camera (Intel RealSense, Azure Kinect, Zivid) mounted above the bin. Capture depth + RGB, then convert to a point cloud and remove noise.

import open3d as o3d
import numpy as np

def capture_filtered_cloud(depth_image, intrinsics, voxel_size=0.003):
    # Convert raw depth to Open3D point cloud
    pcd = o3d.geometry.PointCloud.create_from_depth_image(
        o3d.geometry.Image(depth_image),
        intrinsics
    )

    # Voxel downsample — reduces ~2M points to ~40k for faster processing
    pcd = pcd.voxel_down_sample(voxel_size=voxel_size)

    # Remove statistical outliers (stray points from reflective surfaces)
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

    return pcd

Expected: A point cloud with 30k–60k points representing the bin contents.

If it fails:

  • Empty cloud: Check camera exposure — shiny metal parts reflect the IR pattern specularly away from the sensor, so little signal returns. Add a matte spray coating or switch to photometric stereo.
  • Massive noise band: Your depth camera is too close. Keep it ≥500mm from bin floor.
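Before debugging the Open3D side, it's worth checking the raw depth frame itself. Here's a minimal sanity check for the two failure modes above — it assumes the common RealSense/Azure Kinect convention of a uint16 depth image in millimeters with 0 meaning "no return"; the thresholds are illustrative starting points, not calibrated values:

```python
import numpy as np

def check_depth_frame(depth_image, min_valid_ratio=0.2, min_range_mm=500):
    """Flag common capture problems before converting to a point cloud.

    Assumes depth_image is a uint16 array in millimeters, with 0 meaning
    "no return" (the convention used by RealSense and Azure Kinect).
    """
    valid = depth_image > 0
    valid_ratio = valid.mean()
    if valid_ratio < min_valid_ratio:
        # Mostly zeros: exposure problem or IR-unfriendly surfaces
        return f"mostly empty frame ({valid_ratio:.0%} valid pixels)"
    median_mm = float(np.median(depth_image[valid]))
    if median_mm < min_range_mm:
        # Inside the camera's minimum range: expect a heavy noise band
        return f"camera too close (median depth {median_mm:.0f} mm)"
    return "ok"
```

Run this on every frame and skip the pipeline when it doesn't return "ok" — it's far cheaper than diagnosing an empty cluster list three steps later.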

Step 2: Crop the ROI and Segment Objects

Crop the point cloud to the bin interior using a known bounding box, then use DBSCAN clustering to isolate individual objects.

def segment_objects(pcd, bin_bounds, eps=0.015, min_points=100):
    # Crop to bin interior — avoids picking bin walls as objects
    bbox = o3d.geometry.AxisAlignedBoundingBox(
        min_bound=bin_bounds["min"],
        max_bound=bin_bounds["max"]
    )
    pcd_cropped = pcd.crop(bbox)

    # DBSCAN: eps is the neighborhood radius in meters
    # Tune eps to ~half your smallest object diameter
    labels = np.array(pcd_cropped.cluster_dbscan(
        eps=eps,
        min_points=min_points,
        print_progress=False
    ))

    objects = []
    for label in set(labels):
        if label < 0:
            continue  # -1 = noise points, skip them
        mask = labels == label
        obj_pcd = pcd_cropped.select_by_index(np.where(mask)[0])
        objects.append(obj_pcd)

    return objects

Expected: A list of point clouds, one per object cluster.

If it fails:

  • All points in one cluster: eps is too large. Lower it until clusters separate.
  • One object splits into fragments: eps is too small, or the object has a concave shape. Increase eps slightly or post-process with convex hull merging.
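A rough sketch of the fragment-merging post-process mentioned above. This version uses centroid distance as a cheap stand-in for a full convex-hull overlap test, and works on plain (N, 3) numpy arrays rather than Open3D clouds so the geometry is easy to follow; `merge_dist` is an assumed tuning knob you'd set below your smallest real object spacing:

```python
import numpy as np

def merge_fragments(clusters, merge_dist=0.02):
    """Merge point clusters whose centroids lie within merge_dist meters.

    clusters: list of (N_i, 3) numpy arrays. A convex-hull overlap test is
    more robust for concave parts; centroid distance is the cheap version.
    """
    merged = [c.copy() for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                ci = merged[i].mean(axis=0)
                cj = merged[j].mean(axis=0)
                if np.linalg.norm(ci - cj) < merge_dist:
                    # Fragments are close enough to be one object: combine
                    merged[i] = np.vstack([merged[i], merged[j]])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged
```

Convert each Open3D cluster with `np.asarray(obj_pcd.points)` before calling this, and rebuild clouds afterward if you need them downstream.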

Step 3: Estimate Normals and Score Grasp Candidates

For each object, estimate surface normals, then generate antipodal grasp candidates. Score them by approach angle and collision clearance.

def generate_grasps(obj_pcd, gripper_width=0.08):
    # Normals needed for grasp axis alignment
    obj_pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30)
    )
    obj_pcd.orient_normals_to_align_with_direction([0, 0, -1])  # Flip normals toward the camera (optical frame looks along +Z)

    points = np.asarray(obj_pcd.points)
    normals = np.asarray(obj_pcd.normals)

    grasps = []
    for pt, normal in zip(points, normals):
        # Normals were oriented toward the camera ([0, 0, -1] direction), so
        # camera-facing surfaces — the ones reachable from above — have
        # strongly negative z. Reject everything else.
        if normal[2] > -0.3:
            continue

        # A full antipodal check would search for an opposing contact point
        # across the object; this simpler heuristic scores by verticality only
        candidate = {
            "position": pt,
            "approach": -normal,           # Gripper approaches along reversed normal
            "width": gripper_width,
            "score": float(-normal[2])     # Higher = more vertical = easier approach
        }
        grasps.append(candidate)

    # Return top 5 by score
    grasps.sort(key=lambda g: g["score"], reverse=True)
    return grasps[:5]

Expected: A ranked list of grasp poses (position + approach vector) for the best-positioned object.

If it fails:

  • No grasps returned: Object is fully occluded or lying flat with no upward-facing surface. Move to next object in queue.
  • Arm hits bin wall: Add a clearance check — reject any grasp where the approach vector intersects the bin bounding box before reaching the object.
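The clearance check suggested above can be sketched as a sampled segment test — walk back from the grasp point along the approach vector and reject the grasp if any sample sits below the bin rim but outside the bin's XY footprint. This assumes the camera optical frame convention used in this pipeline (+Z points down into the bin, so the rim is at `bin_bounds["min"][2]`), and `standoff` is an assumed pre-grasp distance:

```python
import numpy as np

def approach_clears_bin(position, approach, bin_bounds, standoff=0.15, n_samples=50):
    """Reject grasps whose approach path would clip a bin wall.

    Frame convention assumed: camera optical frame with +Z pointing down into
    the bin, so the rim sits at bin_bounds["min"][2]. Samples points along the
    approach segment from `standoff` meters back to the grasp point and flags
    any that are below the rim but outside the bin's XY footprint.
    """
    position = np.asarray(position, dtype=float)
    approach = np.asarray(approach, dtype=float)
    approach = approach / np.linalg.norm(approach)
    lo = np.asarray(bin_bounds["min"], dtype=float)
    hi = np.asarray(bin_bounds["max"], dtype=float)
    rim_z = lo[2]

    for t in np.linspace(0.0, standoff, n_samples):
        p = position - t * approach       # Walk back from grasp point toward the camera
        if p[2] < rim_z:
            break                          # Above the rim: nothing left to hit
        inside_xy = lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]
        if not inside_xy:
            return False                   # Below the rim but outside the walls
    return True
```

Call this on each candidate in `generate_grasps` before appending it — a true antipodal planner would also sweep the gripper fingers' volume, but the segment test catches the common "edge object, tilted approach" collision.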

Step 4: Send Grasp to ROS 2 Controller

Publish the top grasp as a geometry_msgs/PoseStamped for your motion planner (MoveIt 2, cuRobo, or similar).

import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped
from scipy.spatial.transform import Rotation

class GraspPublisher(Node):
    def __init__(self):
        super().__init__("grasp_publisher")
        self.pub = self.create_publisher(PoseStamped, "/grasp_target", 10)

    def publish_grasp(self, grasp):
        msg = PoseStamped()
        msg.header.frame_id = "camera_depth_optical_frame"
        msg.header.stamp = self.get_clock().now().to_msg()

        pos = grasp["position"]
        msg.pose.position.x = float(pos[0])
        msg.pose.position.y = float(pos[1])
        msg.pose.position.z = float(pos[2])

        # Align gripper Z-axis with the approach vector
        approach = grasp["approach"]
        rot = Rotation.align_vectors([approach], [[0, 0, 1]])[0]
        q = rot.as_quat()  # [x, y, z, w]
        msg.pose.orientation.x = q[0]
        msg.pose.orientation.y = q[1]
        msg.pose.orientation.z = q[2]
        msg.pose.orientation.w = q[3]

        self.pub.publish(msg)
        self.get_logger().info(f"Published grasp at {pos} score={grasp['score']:.3f}")

Expected: Your motion planner receives a valid pose and plans a collision-free trajectory to it.


Verification

Run the full pipeline on a static bin scene and visualize before sending to hardware:

python pipeline.py --visualize --no-execute

You should see: An Open3D window showing the segmented clusters with colored bounding boxes and the top grasp pose displayed as a coordinate frame.

Once that looks correct:

ros2 topic echo /grasp_target

Confirm the pose message arrives with sane position values (within the bin bounds you defined) and a unit quaternion.
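Those two checks can be automated with a small helper. The quaternion field order (x, y, z, w) matches geometry_msgs/Quaternion, but the function takes plain sequences so it runs without ROS installed — the tolerance value is an assumption, not a ROS requirement:

```python
import numpy as np

def grasp_pose_is_sane(position, quaternion, bin_bounds, tol=1e-3):
    """Sanity-check a grasp pose before sending it to the planner.

    position: (x, y, z); quaternion: (x, y, z, w), the field order used by
    geometry_msgs/Quaternion. Returns True only if the position lies within
    the bin bounds and the quaternion is unit-norm to within tol.
    """
    p = np.asarray(position, dtype=float)
    q = np.asarray(quaternion, dtype=float)
    lo = np.asarray(bin_bounds["min"], dtype=float)
    hi = np.asarray(bin_bounds["max"], dtype=float)
    in_bin = bool(np.all(p >= lo) and np.all(p <= hi))
    unit_q = abs(np.linalg.norm(q) - 1.0) < tol
    return in_bin and unit_q
```

Wiring this into `publish_grasp` as a guard clause means a bad pose gets logged instead of executed.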


What You Learned

  • Point cloud filtering with voxel downsampling and statistical outlier removal handles camera noise without destroying object geometry
  • DBSCAN is the right clustering tool here because you don't know object count ahead of time — tune eps to ~half your smallest object's diameter
  • Normal-based grasp scoring is fast and good enough for simple parts; for complex or deformable objects you'll want a learned grasp model (GraspNet, AnyGrasp)

Limitations:

  • This pipeline assumes a static scene — the bin contents must not move between capture and execution
  • Transparent objects (glass, clear plastic) are invisible to most depth cameras; you need RGB-only detection or polarized lighting
  • Cycle time is ~200ms per frame at these settings; if you need <100ms, move filtering to GPU with CUDA-accelerated Open3D

Tested on Open3D 0.18, ROS 2 Jazzy, Python 3.12, Ubuntu 24.04