Object detection is a fascinating area of computer vision that allows computers to identify and locate objects within images or video streams. This post will guide you through setting up real-time object detection on a Raspberry Pi using YOLOv5 and OpenCV. We’ll also handle warnings effectively and focus on detecting specific objects like persons, cars, motorcycles, buses, and trucks within a defined region of interest (ROI).
Prerequisites
- Raspberry Pi: Ensure you have a Raspberry Pi with internet access.
- Python: Python should be installed on your system.
- OpenCV: Install OpenCV using pip install opencv-python.
- Torch: Install Torch using pip install torch.
- YOLOv5: We’ll use the YOLOv5 model from Ultralytics.
Step-by-Step Guide
1. Set Up the Environment
First, ensure that you have the necessary Python packages installed:
pip install opencv-python torch
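Installing Torch on a Raspberry Pi can be slow or tricky, so it is worth confirming that both packages import cleanly before going further. A minimal sanity check (your version numbers will differ):

import cv2
import torch

# If either import fails, revisit the pip install step above
print("OpenCV version:", cv2.__version__)
print("Torch version:", torch.__version__)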
2. Define the Object Detection Script
Here’s the Python script that captures video from an RTSP stream, runs detection on selected frames, and displays the results within a region of interest (ROI). It also suppresses unwanted codec warnings and includes a few performance optimizations.
import logging
import os
import time

import cv2
import torch

# Configure a logger so only warning-level messages (e.g. HEVC codec issues) are reported
logging.basicConfig(level=logging.WARNING, format='%(levelname)s - %(message)s')
logger = logging.getLogger('HEVC_Warnings')
logger.setLevel(logging.WARNING)

# Point FFmpeg's report output at /dev/null to suppress HEVC codec warnings.
# This must be set before the capture is opened to take effect.
os.environ['FFREPORT'] = 'file=/dev/null'

# Load YOLOv5 model (the small "s" variant performs better on a Raspberry Pi)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# COCO class indices for person, car, motorcycle, bus, and truck
classes = [0, 2, 3, 5, 7]

# Access the RTSP stream
rtsp_url = "rtsp://<login>:<password>@<ip_address>:554/h264Preview_01_main"
cap = cv2.VideoCapture(rtsp_url)

# Define region of interest (ROI) coordinates (x, y, width, height)
roi_x, roi_y, roi_width, roi_height = 100, 100, 400, 300

def detect_and_signal(frame):
    # YOLOv5 expects RGB input, while OpenCV frames are BGR
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    detections = results.xyxy[0]  # Each row: x1, y1, x2, y2, confidence, class
    vehicle_detected = False
    for *box, conf, cls in detections:
        if int(cls) in classes:
            vehicle_detected = True
            if int(cls) == 0:
                print("Person detected")
            elif int(cls) == 2:
                print("Car detected")
            elif int(cls) == 7:
                print("Truck detected")
            else:
                print(f"Other vehicle detected: {model.names[int(cls)]}")
            # Draw a rectangle around the detected object
            cv2.rectangle(frame, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), (0, 255, 0), 2)
            # Add class label and confidence score
            label = f'{model.names[int(cls)]} {conf:.2f}'
            cv2.putText(frame, label, (int(box[0]), int(box[1]) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    # Send signal if any target object is detected
    # GPIO.output(18, GPIO.HIGH if vehicle_detected else GPIO.LOW)

# Set skip factor (process every nth frame)
skip_factor = 50  # Process every 50th frame
frame_count = 0

while True:
    ret, frame = cap.read()
    if not ret or frame is None:
        print("Failed to capture frame. Retrying...")
        continue
    # Only run detection on every skip_factor-th frame
    if frame_count % skip_factor == 0:
        try:
            # Resize frame to reduce resolution
            frame_resized = cv2.resize(frame, (640, 480))
            # Draw a rectangle around the ROI
            cv2.rectangle(frame_resized, (roi_x, roi_y), (roi_x + roi_width, roi_y + roi_height), (255, 0, 0), 2)
            # Extract the region of interest (ROI) from the frame
            roi = frame_resized[roi_y:roi_y + roi_height, roi_x:roi_x + roi_width]
            # Resize the ROI before inference
            roi_resized = cv2.resize(roi, (640, 480))
        except cv2.error as e:
            print(f"Error processing ROI: {e}")
            continue
        start_time = time.time()
        detect_and_signal(roi_resized)
        end_time = time.time()
        print(f"Frame processed in {end_time - start_time:.2f} seconds")
        # Display the frame with the ROI outline and the annotated ROI (for debugging)
        cv2.imshow('Frame', frame_resized)
        cv2.imshow('ROI', roi_resized)
    frame_count += 1
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
# GPIO.cleanup()
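The GPIO calls in the script are left commented out. If you want the detection result to drive a physical pin, for example to switch a relay or LED, a minimal sketch using the RPi.GPIO library could look like this (pin 18 matches the commented-out GPIO.output(18, ...) call; adapt it to your wiring):

import RPi.GPIO as GPIO

SIGNAL_PIN = 18  # BCM numbering; matches the commented-out GPIO.output(18, ...) call

GPIO.setmode(GPIO.BCM)            # Use Broadcom (BCM) pin numbering
GPIO.setup(SIGNAL_PIN, GPIO.OUT)  # Configure the pin as an output

def signal_detection(detected):
    # Drive the pin high while a target object is detected, low otherwise
    GPIO.output(SIGNAL_PIN, GPIO.HIGH if detected else GPIO.LOW)

# Remember to call GPIO.cleanup() on exit, as in the commented line at the end of the script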
(Optional) Other Optimizations
If you want to optimize the script even further, you can reduce the image resolution, e.g. to 192×192 pixels, and shrink the ROI to match. Lower resolution speeds up inference but makes small or distant objects harder to detect:
roi_x, roi_y, roi_width, roi_height = 50, 50, 100, 100
frame_resized = cv2.resize(frame, (192, 192))
roi_resized = cv2.resize(roi, (192, 192))
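To see how much the lower resolution actually buys you on your own hardware, a rough benchmark sketch like the following can help. It assumes the model and cap objects from the main script are already available; timings vary widely between Pi models:

def time_inference(frame, size, runs=5):
    # Average inference time for a frame resized to the given square size
    resized = cv2.resize(frame, (size, size))
    start = time.time()
    for _ in range(runs):
        model(resized)
    return (time.time() - start) / runs

ret, frame = cap.read()
if ret:
    for size in (640, 192):
        print(f"{size}x{size}: {time_inference(frame, size):.2f} s per frame")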
Explanation
- Model Loading: We load the YOLOv5 model using torch.hub.load and restrict detections to persons, cars, motorcycles, buses, and trucks (see the snippet after this list for how to look up the class indices).
- RTSP Stream Access: The video stream is accessed using OpenCV’s cv2.VideoCapture.
- Region of Interest (ROI): We define a specific region in the frame where we want to detect objects.
- Frame Skipping: To reduce CPU load, we process every 50th frame; on a typical 25 fps stream, that is roughly one detection pass every two seconds.
- Warning Suppression: We point FFmpeg’s report output at /dev/null to suppress HEVC codec warnings.
- Object Detection and Annotation: Detected objects within the ROI are annotated and displayed.
- Performance Optimization: The frame is resized before processing for better performance.
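As mentioned in the model-loading note above, the indices in the classes list come from the COCO label set YOLOv5 is trained on. To verify them, or to pick different classes, you can print the model's index-to-name mapping (a quick sketch using the model loaded earlier; model.names is a list in older YOLOv5 releases and a dict in newer ones):

# model.names maps class indices to labels
names = model.names
items = names.items() if isinstance(names, dict) else enumerate(names)
for idx, name in items:
    print(idx, name)
# Expected entries include: 0 person, 2 car, 3 motorcycle, 5 bus, 7 truck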
Conclusion
This setup allows you to run real-time object detection on a Raspberry Pi with optimized CPU usage. The code processes every 50th frame to reduce load and focuses on a defined ROI for targeted detection. By following these steps, you can effectively implement and optimize object detection for various applications.
Feel free to modify the code to suit your specific needs, such as adjusting the ROI, skip factor, or detected object classes. Happy coding!