Introduction to Deep Learning on drones
Getting a deep learning program working flawlessly on the desktop is nontrivial, and when that application must run on a single-board computer controlling a drone, the task becomes considerably more challenging.
FlytOS offers a framework that eases these issues by helping you integrate your deep learning software with drones. FlytOS makes it possible to get a deep learning application (like the one in this tutorial) up and running on an Nvidia TX1 quickly.
In the previous tutorial, we built Caffe and set up a catkin workspace for our ROS Caffe node.
In this tutorial we will write a ROS node that subscribes to the USB camera image topic published by FlytOS, detects and localizes objects in the incoming images, draws bounding boxes around the detected objects and, finally, publishes the annotated output on a ROS image topic.
Using the inbuilt video streaming capabilities of FlytOS, the video stream can be viewed in the web browser of any device connected to the same network as the TX1.
For object detection and localization, we use a popular convolutional neural network called the Single Shot MultiBox Detector (SSD). It is one of the fastest localization networks available and is suitable for applications where object localization must be performed in real time. In the context of drones, SSD can be used for applications such as surveillance, tracking and following objects of interest, counting cattle, and so on.
The code for this tutorial is available at https://github.com/flytbase/flytos_tx1. We took the C++ example provided in the Single Shot MultiBox Detector repository and converted it into a ROS node.
The example originally takes its input from image files or videos and prints the detected bounding box coordinates in the terminal. We modified it to consume the ROS image messages published by FlytOS.
The output is in the form of ROS image messages with bounding boxes drawn on each image. The images below show example output from the node.
Running the Code
- For this example, you will have to download a model and pre-trained weights available from this link. If you want to read up on the dataset that this model was trained on and the different classes that it can classify, visit this link. There are more models available at https://github.com/weiliu89/caffe/t… which you can use to experiment. Extract the downloaded zip file and place the deploy.prototxt file and the .caffemodel file in a folder named model in your home folder.
- You should also have FlytOS running on your Nvidia TX1. You can follow this tutorial if you have not set it up yet.
- We use a USB webcam attached to the TX1 as the video source. If you are using the TX1 development board, the USB webcam will probably show up with the device name /dev/video1. You will have to edit the file /flyt/flytos/flytcore/share/vision_apps/launch/cam_api.launch and change the line <param name="video_device" value="/dev/video0"/> to <param name="video_device" value="/dev/video1"/>. You will need superuser permission to do so. Reboot FlytOS for the changes to take effect.
- If you have followed the previous tutorial, you will have the node ready to run. Make sure you have sourced your catkin workspace in your terminal. You can do that by typing:
source ~/flytos_tx1/devel/setup.bash
- Then launch the SSD node by typing the following command in your terminal (the node takes the .prototxt and .caffemodel file paths as arguments):
rosrun ssd_caffe ssd_all_bbox ~/model/deploy.prototxt ~/model/VGG_VOC0712_SSD_300x300_iter_60000.caffemodel
- The model will take a few seconds to load, and then the node will start publishing on the /flytpod/flytcam/detected_objects topic.
- You can view the image stream in your TX1's browser by opening http://localhost/flytconsole, clicking the video tab and selecting the /flytpod/flytcam/detected_objects topic from the drop-down list.
This stream can also be viewed on any other computer on the same network. Just open the following address in your browser:
http://<ip-address-of-TX1>/flytconsole
The screenshot below shows the live stream being viewed in FlytConsole:
Code explained
We will now explain the modifications we applied to the ssd_caffe.cpp example to convert it into a ROS node.
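Before going through the individual changes, it helps to see how the pieces fit together. The sketch below is our simplified reading of the node's overall structure, not a verbatim copy from the repository: a callback stores the newest camera frame, and the main loop crops it, runs the detector and publishes the annotated image. Details such as the 10 Hz rate and the node name are illustrative assumptions.
// Structural sketch only: variable and topic names follow this tutorial, but the
// actual node in the flytos_tx1 repository handles more details (model mean values,
// parameterised topic names, error handling).
#include <ros/ros.h>
#include <image_transport/image_transport.h>
#include <sensor_msgs/Image.h>
#include <std_msgs/Header.h>
#include <cv_bridge/cv_bridge.h>

cv_bridge::CvImagePtr new_img_ptr;   // latest camera frame, written by the callback
std_msgs::Header new_img_header;     // header of that frame

void imageCallback(const sensor_msgs::ImageConstPtr& msg);  // defined in the snippet further below

int main(int argc, char** argv)
{
  ros::init(argc, argv, "ssd_all_bbox");
  ros::NodeHandle nh;
  image_transport::ImageTransport it(nh);

  // argv[1] is the deploy.prototxt path and argv[2] the .caffemodel path, exactly as
  // passed on the rosrun command line above. The Detector class comes from the SSD
  // Caffe example (its real constructor also takes mean file/value arguments):
  // Detector detector(argv[1], argv[2], ...);

  image_transport::Subscriber sub =
      it.subscribe("/flytpod/flytcam/image_raw", 1, imageCallback);
  image_transport::Publisher image_pub =
      it.advertise("/flytpod/flytcam/detected_objects", 1);

  ros::Rate rate(10);   // illustrative rate; the real node may simply run as fast as detection allows
  while (ros::ok())
  {
    ros::spinOnce();    // lets imageCallback store the newest frame
    if (new_img_ptr)
    {
      cv::Mat img_uncropped = new_img_ptr->image;
      // crop, run the detector, draw boxes and publish (see the snippets below)
    }
    rate.sleep();
  }
  return 0;
}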
- Including ROS related header files
//ROS related includes
#include <ros/ros.h>
#include <image_transport/image_transport.h>
#include <sensor_msgs/image_encodings.h>
#include <std_msgs/Header.h>
#include <cv_bridge/cv_bridge.h>
- The following array holds the names of all the output classes that the network can detect.
std::string class_labels[] = {"__background__","Aeroplane","Bicycle","Bird","Boat","Bottle", "Bus", "Car", "Cat", "Chair","Cow", "Diningtable", "Dog", "Horse","Motorbike", "Person", "Foliage","Sheep", "Sofa", "Train", "Tvmonitor"};
- Creating a subscriber and a callback function for the images on the /flytpod/flytcam/image_raw topic:
image_transport::ImageTransport it(nh);
image_transport::Subscriber sub = it.subscribe("/flytpod/flytcam/image_raw", 1, imageCallback);

void imageCallback(const sensor_msgs::ImageConstPtr& msg)
{
  try
  {
    new_img_header = msg->header;
    new_img_ptr = cv_bridge::toCvCopy(msg, sensor_msgs::image_encodings::BGR8);
  }
  catch (cv_bridge::Exception& e)
  {
    ROS_ERROR("Could not convert from '%s' to 'bgr8'.", msg->encoding.c_str());
  }
}
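The callback only stores the frame; the annotated result is published later through an image_transport publisher. Its declaration is not shown in the snippets here, but it is created from the same ImageTransport object, along the lines of the following sketch (the variable name image_pub matches the publishing snippet at the end of this section):
// Publisher for the annotated output images; this is the topic selected in FlytConsole.
image_transport::Publisher image_pub =
    it.advertise("/flytpod/flytcam/detected_objects", 1);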
- Cropping the image to a square size and then passing it to the detector
cv::Mat img = img_uncropped(cv::Rect((int(img_uncropped.cols - img_uncropped.rows)/2), 0, img_uncropped.rows, img_uncropped.rows));
std::vector<vector<float> > detections = detector.Detect(img);
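Each element returned by Detect() follows the SSD Caffe example's detection format: seven floats [image_id, label, score, xmin, ymin, xmax, ymax], with the box corners normalized to the [0, 1] range. The drawing calls in the next item operate on one such detection d inside a loop; a minimal sketch of that loop, with an illustrative confidence threshold and a plain rectangle outline (our simplification, not necessarily the exact drawing style used in the repository), looks like this:
const float confidence_threshold = 0.6f;           // illustrative value, tune as needed
for (size_t i = 0; i < detections.size(); ++i)
{
  const std::vector<float>& d = detections[i];     // [image_id, label, score, xmin, ymin, xmax, ymax]
  if (d[2] < confidence_threshold)
    continue;                                      // skip low-confidence detections
  int xmin = static_cast<int>(d[3] * img.cols);    // convert normalized coordinates to pixels
  int ymin = static_cast<int>(d[4] * img.rows);
  int xmax = static_cast<int>(d[5] * img.cols);
  int ymax = static_cast<int>(d[6] * img.rows);
  cv::rectangle(img, cv::Point(xmin, ymin), cv::Point(xmax, ymax),
                cv::Scalar(0, 255, 0), 2);         // simple green box outline
}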
- Drawing the bounding box and writing the class name along with the confidence
cv::addWeighted(color, 0.5, roi, 0.5, 0.0, roi);
cv::putText(img, class_labels[int(d[1])], cv::Point(static_cast<int>(d[3] * img.cols), static_cast<int>(d[4] * img.rows) + 25), cv::FONT_HERSHEY_TRIPLEX, 0.8, white, 1, 8);
cv::putText(img, confidence_text, cv::Point(static_cast<int>(d[3] * img.cols), static_cast<int>(d[4] * img.rows) + 50), cv::FONT_HERSHEY_TRIPLEX, 0.8, white, 1, 8);
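The roi, color, white and confidence_text variables used above are declared elsewhere in the node. A hedged reconstruction (assuming xmin, ymin, xmax and ymax have been computed from d as in the loop sketch earlier) could look like the following; the fill color is an arbitrary choice for illustration:
// Region of the image covered by the detection; roi is a view into img, not a copy,
// so the addWeighted() call above blends the fill color directly into the output image.
cv::Rect box(xmin, ymin, xmax - xmin, ymax - ymin);
cv::Mat roi = img(box);
cv::Mat color(roi.size(), roi.type(), cv::Scalar(0, 125, 255));  // BGR fill color (illustrative)
cv::Scalar white(255, 255, 255);                                 // text color
cv::String confidence_text = cv::format("%.2f", d[2]);           // detection confidence as text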
- Publishing the final image
sensor_msgs::ImagePtr pub_msg = cv_bridge::CvImage(std_msgs::Header(),"bgr8", img).toImageMsg();
image_pub.publish(pub_msg);
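The snippet above publishes with a fresh, empty std_msgs::Header(). Since the callback already saves the incoming frame's header in new_img_header, one small variation (an assumption on our part, not necessarily what the repository does) is to reuse it so the published image keeps the original timestamp:
// Reuse the incoming frame's header so the output image carries its timestamp.
sensor_msgs::ImagePtr pub_msg = cv_bridge::CvImage(new_img_header, "bgr8", img).toImageMsg();
image_pub.publish(pub_msg);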
You are good to go.
For any queries, you can reach out to us on the Forums or the FlytOS Developers Group.
Comments
Excellent accomplishment, great object acquisition, identification and bounding.
Simple rectangular bounding frame is very practical.
Going to get TX2 soon and will definitely try this code.
In addition to the camera, I am hoping to also integrate a scanning 3D rangefinder for navigation information.
The combination of bounding box and object identification with 3D Distance points should provide an excellent basis for intelligent navigation.
Best Regards,
Gary