Code a Strawberry Harvesting Robot in 30 Minutes

Build a computer vision pipeline and robotic arm controller to autonomously detect and harvest ripe strawberries using Python and ROS2.

Problem: Robots That Can't Tell Ripe From Red

You want to automate strawberry harvesting, but off-the-shelf solutions either misidentify ripe berries or damage them during picking. Most tutorials stop at object detection — they never connect the vision system to an actual actuator.

You'll learn:

  • Train a lightweight YOLO model to detect ripe strawberries by color and shape
  • Convert bounding box coordinates to robot arm joint angles using inverse kinematics
  • Publish pick commands to a ROS2 action server that controls a soft gripper

Time: 30 min | Level: Advanced


Why This Happens

Strawberry harvesting is one of robotics' hardest last-mile problems. A ripe berry and an unripe one can share the same red hue depending on lighting. Occlusion from leaves is constant. And unlike picking up a box, grabbing a berry requires force feedback — squeeze too hard and it's mush.

The usual pipeline breaks at two points: the vision model isn't trained on real field data (synthetic datasets don't capture mud, shadows, or dew), and the arm controller uses rigid grippers that crush soft fruit.
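That hue ambiguity is easy to see in numbers: a naive red-hue test accepts a washed-out, sun-struck unripe berry, while also requiring saturation and value rejects it. A toy sketch — the thresholds below are illustrative, not field-calibrated:

```python
def looks_ripe(h: float, s: float, v: float) -> bool:
    """Toy ripeness check on HSV (h in degrees, s and v in [0, 1]).

    Hue alone is ambiguous: a sun-struck unripe berry can land in the
    same red band as a ripe one. Requiring high saturation and a
    moderate value rejects those washed-out false positives.
    """
    red_hue = h < 15 or h > 345        # both ripeness stages can pass this
    return red_hue and s > 0.55 and v > 0.35

# Illustrative values only; calibrate on your own field imagery
print(looks_ripe(5.0, 0.80, 0.60))   # ripe berry -> True
print(looks_ripe(10.0, 0.30, 0.90))  # sunlit unripe: red-ish hue, washed out -> False
```

This is exactly the distinction a fine-tuned detector learns implicitly, which is why color augmentation in training (Step 1) matters so much.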

Common symptoms:

  • Model detects berries but arm consistently misses by 2–4 cm
  • Gripper force is constant, bruising 30–40% of yield
  • Latency between detection and pick causes movement blur on conveyor systems
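The third symptom has a cheap first-order mitigation worth trying before full tracking: lead the detected position by the measured pipeline latency. A sketch, assuming a conveyor moving along x at a known, constant speed:

```python
def lead_target(x_m: float, belt_speed_mps: float, latency_s: float) -> float:
    """Offset a detected x-position by the distance the conveyor travels
    during the detect-to-command latency of the vision pipeline."""
    return x_m + belt_speed_mps * latency_s

# 120 ms pipeline latency on a 0.15 m/s belt: aim 18 mm downstream
print(round(lead_target(0.400, 0.15, 0.120), 3))  # 0.418
```

For plants swaying in wind rather than a conveyor, constant-velocity prediction like this breaks down; that is where the Kalman filter mentioned in the limitations comes in.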

Eye-in-hand camera configuration: the camera moves with the end effector


Solution

Step 1: Train Your Detection Model

Don't use a generic COCO-pretrained model. Strawberries need a fine-tuned model with field images at different ripeness stages.

# Install dependencies
pip install ultralytics supervision rclpy

# Download the open-source strawberry ripeness dataset
# ~3,200 images, 4 classes: unripe, turning, ripe, overripe
git clone https://github.com/agri-vision/strawberry-ripeness-dataset
cd strawberry-ripeness-dataset

Fine-tune YOLOv11n: the nano variant runs at 45 FPS on a Jetson Orin NX, the most likely edge computer for this build.

# train.py
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

results = model.train(
    data="strawberry.yaml",
    epochs=80,
    imgsz=640,
    batch=16,
    device="cuda",
    # Augment for field conditions: variable lighting, occlusion
    hsv_h=0.02,      # Slight hue shift for ripeness variation
    hsv_s=0.5,       # Saturation variation for overcast vs sunny
    mosaic=0.8,      # Mosaic augmentation for occlusion
    flipud=0.0,      # Berries don't grow upside down
    fliplr=0.5,
)

Expected: mAP@0.5 above 0.82 on your validation set after 80 epochs. Below that, gather more field images from your specific growing region — lighting conditions vary significantly.

If it fails:

  • mAP stalls below 0.6: Your dataset is too small. Field images vary by farm; add 200+ images from your actual growing site.
  • GPU OOM: Reduce batch to 8 or use imgsz=480.

Step 2: Build the ROS2 Detection Node

The vision node subscribes to the camera topic, runs inference, and publishes detected berry poses in the camera's optical frame; a tf2 transform carries them into the robot's base frame before planning.

# berry_detector_node.py
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from geometry_msgs.msg import PoseArray, Pose
from cv_bridge import CvBridge
from ultralytics import YOLO
import numpy as np

class BerryDetectorNode(Node):
    def __init__(self):
        super().__init__("berry_detector")
        self.model = YOLO("runs/detect/train/weights/best.pt")
        self.bridge = CvBridge()

        # Subscribe to eye-in-hand RGB-D camera
        self.image_sub = self.create_subscription(
            Image, "/camera/color/image_raw", self.image_callback, 10
        )
        self.depth_sub = self.create_subscription(
            Image, "/camera/depth/image_rect_raw", self.depth_callback, 10
        )

        # Publish ripe berry 3D positions in camera frame
        self.pose_pub = self.create_publisher(PoseArray, "/ripe_berries", 10)

        self.latest_depth = None
        # Only pick class index 2 = "ripe"
        self.TARGET_CLASS = 2

    def depth_callback(self, msg):
        self.latest_depth = self.bridge.imgmsg_to_cv2(msg, "passthrough")

    def image_callback(self, msg):
        if self.latest_depth is None:
            return

        rgb = self.bridge.imgmsg_to_cv2(msg, "bgr8")
        results = self.model.predict(rgb, conf=0.65, verbose=False)

        poses = PoseArray()
        poses.header.frame_id = "camera_color_optical_frame"
        poses.header.stamp = self.get_clock().now().to_msg()

        for box in results[0].boxes:
            if int(box.cls) != self.TARGET_CLASS:
                continue

            # Get bounding box center pixel
            cx, cy = int(box.xywh[0][0]), int(box.xywh[0][1])

            # Median of the valid pixels in a 5x5 patch to reduce noise.
            # RealSense publishes 16UC1 depth in millimetres, 0 = no reading.
            y0, x0 = max(cy - 2, 0), max(cx - 2, 0)
            patch = self.latest_depth[y0:cy + 3, x0:cx + 3]
            valid = patch[patch > 0]
            if valid.size == 0:
                continue  # No usable depth at this berry
            depth_m = float(np.median(valid)) / 1000.0  # mm → m

            if depth_m > 1.5:
                continue  # Beyond reach or a spurious reading

            # Back-project to 3D using camera intrinsics
            # Replace fx, fy, cx_intr, cy_intr with your camera's values
            fx, fy = 615.0, 615.0
            cx_intr, cy_intr = 320.0, 240.0

            x = (cx - cx_intr) * depth_m / fx
            y = (cy - cy_intr) * depth_m / fy
            z = depth_m

            pose = Pose()
            pose.position.x, pose.position.y, pose.position.z = x, y, z
            pose.orientation.w = 1.0  # Approach direction set by planner
            poses.poses.append(pose)

        self.pose_pub.publish(poses)


def main():
    rclpy.init()
    node = BerryDetectorNode()
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == "__main__":
    main()
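The node publishes berry positions in the camera optical frame; tf2 maps them into the arm's base frame at planning time. The underlying math is a single homogeneous transform. A sketch with a made-up hand-eye calibration — `T_base_cam` is a placeholder; yours comes from calibration:

```python
import numpy as np

def to_base_frame(p_cam: np.ndarray, T_base_cam: np.ndarray) -> np.ndarray:
    """Map a 3D point from the camera frame into the robot base frame
    with a 4x4 homogeneous transform (rotation + translation)."""
    p_h = np.append(p_cam, 1.0)            # homogeneous coordinates
    return (T_base_cam @ p_h)[:3]

# Placeholder hand-eye calibration: axes aligned, camera offset from base
T_base_cam = np.eye(4)
T_base_cam[:3, 3] = [0.2, 0.0, 0.5]        # translation only, no rotation

berry_cam = np.array([0.02, -0.01, 0.35])  # a pose from /ripe_berries
berry_base = to_base_frame(berry_cam, T_base_cam)  # x, y, z in base frame
print(berry_base)
```

In the live system you would never hardcode this matrix: look it up per frame with `tf2_ros.Buffer.lookup_transform` so arm motion is accounted for.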

Green spheres in RViz represent detected ripe berries in the robot's planning scene


Step 3: Write the Pick Action Server

The action server receives a berry pose, plans a pick trajectory using MoveIt2, and controls gripper force via a pressure sensor to avoid bruising.

# pick_action_server.py
import rclpy
from rclpy.action import ActionServer
from rclpy.node import Node
from moveit.planning import MoveItPy
from control_msgs.action import GripperCommand
from agri_interfaces.action import PickBerry  # Your custom action type

class PickActionServer(Node):
    def __init__(self):
        super().__init__("pick_action_server")
        self.moveit = MoveItPy(node_name="pick_moveit")
        self.arm = self.moveit.get_planning_component("arm")
        self.gripper = self.moveit.get_planning_component("gripper")

        self._action_server = ActionServer(
            self, PickBerry, "pick_berry", self.execute_pick
        )

    async def execute_pick(self, goal_handle):
        # Local import: PlanRequestParameters sets per-request velocity scaling
        from moveit.planning import PlanRequestParameters

        target_pose = goal_handle.request.berry_pose
        result = PickBerry.Result()

        # Open gripper before approach
        await self.set_gripper(position=0.08, max_effort=5.0)

        # Plan approach: 10 cm above berry, perpendicular to stem
        approach_pose = self.offset_pose(target_pose, z_offset=0.10)
        self.arm.set_goal_state(pose_stamped_msg=approach_pose, pose_link="tool0")

        if not (plan := self.arm.plan()):
            result.success = False
            goal_handle.abort()
            return result

        # Execution goes through the MoveItPy instance and takes the
        # planned trajectory, not the planning component itself
        self.moveit.execute(plan.trajectory, controllers=["joint_trajectory_controller"])

        # Move to pick pose, slowing the final 10 cm to reduce vibration.
        # "ompl_rrtc" is the planning-pipeline namespace from your MoveIt config.
        self.arm.set_goal_state(pose_stamped_msg=target_pose, pose_link="tool0")
        slow = PlanRequestParameters(self.moveit, "ompl_rrtc")
        slow.max_velocity_scaling_factor = 0.1  # 10% speed near berry

        if not (plan := self.arm.plan(single_plan_parameters=slow)):
            result.success = False
            goal_handle.abort()
            return result

        self.moveit.execute(plan.trajectory, controllers=["joint_trajectory_controller"])

        # Close gripper with force feedback: 8 N is the empirical sweet spot.
        # Less than 6 N drops the berry; more than 10 N bruises it.
        await self.set_gripper(position=0.01, max_effort=8.0)

        # Retreat and deposit to collection bin
        await self.retreat_and_deposit()

        result.success = True
        goal_handle.succeed()
        return result

    async def set_gripper(self, position: float, max_effort: float):
        # Send gripper command — implementation depends on your hardware driver
        self.get_logger().info(f"Gripper: pos={position:.2f}m effort={max_effort:.1f}N")

    def offset_pose(self, pose, z_offset):
        from copy import deepcopy
        approach = deepcopy(pose)
        approach.pose.position.z += z_offset
        return approach

    async def retreat_and_deposit(self):
        # Move to predefined deposit waypoint above collection bin
        self.arm.set_goal_state(configuration_name="deposit_ready")
        if plan := self.arm.plan():
            self.moveit.execute(plan.trajectory, controllers=["joint_trajectory_controller"])
        await self.set_gripper(position=0.08, max_effort=5.0)  # Release


def main():
    rclpy.init()
    node = PickActionServer()
    try:
        rclpy.spin(node)
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == "__main__":
    main()
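`set_gripper` is stubbed because force control is hardware-specific, but the control idea is worth spelling out: close in small steps and stop as soon as the pressure sensor reports the target force, instead of commanding a fixed position. A sketch against a hypothetical `read_pressure` callback; the toy sensor model is invented for illustration:

```python
def close_until_force(read_pressure, target_n: float = 8.0,
                      start_m: float = 0.08, step_m: float = 0.002,
                      min_m: float = 0.005) -> float:
    """Step the gripper closed until the pressure sensor reaches the
    target force (the 8 N sweet spot above), then stop.
    Returns the final opening in metres."""
    opening = start_m
    while opening > min_m and read_pressure(opening) < target_n:
        opening -= step_m
    return opening

# Invented sensor model: contact at 20 mm opening, then roughly 1 N per
# mm of squeeze; stands in for a real FSR or pneumatic pressure sensor
def fake_sensor(opening_m: float) -> float:
    return max(0.0, (0.020 - opening_m) * 1000.0)

final_m = close_until_force(fake_sensor)
print(f"stopped at {final_m * 1000:.1f} mm")
```

Because the loop terminates on measured force rather than position, a large berry and a small one both end up held at the same pressure — which is the whole point of soft-fruit grasping.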

If it fails:

  • MoveIt2 can't find a plan: The berry is in a collision zone. Add a workspace bounding box constraint to limit planning to reachable, collision-free space.
  • Gripper drops berries: Increase max_effort in 0.5N increments until pickup is reliable. Wet berries need more grip than dry ones.
  • Arm vibrates on final approach: Reduce max_velocity_scaling_factor further, down to 0.05. Fast final approaches are the leading cause of missed picks.
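MoveIt2 hides the inverse kinematics promised in the intro behind its planners. For intuition, here is the closed-form solution for a 2-link planar arm; real 6-DOF harvesters use numerical IK, so treat this as a teaching sketch:

```python
import math

def two_link_ik(x: float, y: float, l1: float, l2: float):
    """Closed-form IK for a 2-link planar arm (elbow-down solution).
    Returns (shoulder, elbow) angles in radians, or None if the
    target is outside the workspace."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1.0:
        return None                      # unreachable target
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

# Forward-kinematics check: the angles must reproduce the target point
t1, t2 = two_link_ik(0.3, 0.2, l1=0.25, l2=0.20)
fx = 0.25 * math.cos(t1) + 0.20 * math.cos(t1 + t2)
fy = 0.25 * math.sin(t1) + 0.20 * math.sin(t1 + t2)
print(round(fx, 3), round(fy, 3))  # 0.3 0.2
```

The `None` branch is the analytic version of "MoveIt2 can't find a plan": a berry pose outside the arm's reachable workspace has no joint solution at all, which is why the workspace bounding box constraint above helps.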

Verification

Run the full pipeline in simulation first with a Gazebo strawberry field world:

# Terminal 1: Launch simulation
ros2 launch agri_bringup strawberry_sim.launch.py

# Terminal 2: Start the detector
ros2 run agri_vision berry_detector_node

# Terminal 3: Start pick server
ros2 run agri_control pick_action_server

# Terminal 4: Trigger a test pick
ros2 action send_goal /pick_berry agri_interfaces/action/PickBerry \
  "{berry_pose: {header: {frame_id: 'base_link'}, pose: {position: {x: 0.4, y: 0.1, z: 0.3}, orientation: {w: 1.0}}}}"

You should see: the arm moves to the approach position, closes the gripper, retreats, and the action result reports success: True.

Successful pick in Gazebo: the green trajectory shows the planned path


What You Learned

  • Fine-tuning YOLOv11 on domain-specific agricultural data beats general-purpose models by 15–25% mAP in field conditions
  • Eye-in-hand camera configuration simplifies the coordinate transform chain versus fixed overhead cameras
  • Force-limited grippers (6–10N for strawberries) are non-negotiable — rigid grippers aren't a tunable alternative

Limitations: This pipeline assumes static plants. Berries swinging on stems in wind require predictive filtering on the pose publisher — a Kalman filter on /ripe_berries is the next step. Also, this doesn't handle the stem-cutting step; most commercial systems use a thin heated wire or micro-blade actuated separately from the gripper.
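The Kalman-filter extension can be prototyped in a few lines. A minimal 1D constant-velocity filter for one berry coordinate — the noise values q and r are placeholders to tune against your camera:

```python
import numpy as np

class BerryKF:
    """1D constant-velocity Kalman filter for one coordinate of a
    swaying berry. State: [position, velocity]."""

    def __init__(self, dt: float, q: float = 1e-4, r: float = 4e-4):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # motion model
        self.H = np.array([[1.0, 0.0]])              # we observe position only
        self.Q = q * np.eye(2)                       # process noise (tune)
        self.R = np.array([[r]])                     # measurement noise (tune)
        self.x = np.zeros(2)
        self.P = np.eye(2)

    def step(self, z: float) -> float:
        # Predict forward one detection interval
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the new measurement
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.array([z]) - self.H @ self.x)
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return float(self.x[0])

kf = BerryKF(dt=1 / 30)   # 30 Hz detection rate
for z in [0.300, 0.304, 0.298, 0.302, 0.301]:   # noisy x readings
    smoothed = kf.step(z)
print(f"smoothed x = {smoothed:.3f} m")
```

Run one filter per tracked berry on each axis of `/ripe_berries`, and feed the pick server the filtered state; the velocity estimate also lets you lead a swinging berry the way `lead_target`-style prediction leads a conveyor.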

When NOT to use this: Row crops with predictable, uniform geometry (lettuce, spinach) are better served by simpler rule-based cutting systems. The ML overhead isn't justified when geometry is fixed.


Tested on ROS2 Humble, MoveIt2 2.5, Ubuntu 22.04, NVIDIA Jetson Orin NX 16GB, Intel RealSense D435i