A team of four Master's students from the Electrical Engineering and Computer Science and the Industrial Engineering and Operations Research departments at UC Berkeley (Armanda Kouassi, Emmanuelle M'bolo, Sunil Shah and Yong Keong Yap) worked on a capstone project, Drones: The Killer App, sponsored by 3D Robotics Research & Development.
The Master of Engineering program aims to create engineering leaders of the future by combining courses in leadership with a technical core. As part of this project, we looked into possible commercial applications of drone technology and worked on a few sub-projects that we felt would help or enable companies developing products around drones.
We worked with the Cyber Physical Cloud Computing research group, which looks at the applications of unmanned aerial vehicles, and were advised jointly by Dr. Brandon Basso from 3D Robotics and Professor Raja Sengupta from UC Berkeley.
This is the first in a series of two posts where we'll go into some detail about our projects. In this post, we'll talk about our attempt to implement real-time image processing on low-cost embedded computers such as the BeagleBone Black.
It's easy to process frames from a video capture device as fast as they arrive (typically 30 frames per second) - if you have a nice fast x86 processor. However, most systems built around these processors are too large to fly on multi-rotors: they either need significant additional weight for cooling or take up a significant amount of space because of their mainboard's footprint. In addition, these boards can draw as much power as a single motor (or more). These factors make it impractical to deploy such boards on small UAS.
As processing power continues to miniaturise, more options become available, including the recently announced Intel Edison. However, given the high computational demands of unmanned systems, it is still important to architect and write efficient image-processing systems to leave room for additional intelligence.
As part of this project, we investigated the use of commodity ARM-based boards for on-board image processing. We implemented a landing algorithm described in the 2001 paper, "A Vision System for Landing an Unmanned Aerial Vehicle" by Sharp, Shakernia and Sastry. This allows very accurate pose estimation - whereas GPS typically gives you accuracy of a few metres, we saw position estimates that were accurate to within a few centimetres of our multi-rotor's actual position relative to a landing pattern.
We attempted to re-use popular open source robotics libraries for our implementation, to speed up development time and to avoid re-inventing the wheel. These included roscopter (which we forked to document and clean up), ROS, and OpenCV. Our approach makes use of a downward facing Logitech webcam and a landing pad with six squares, shown below.
This approach first labels each corner. Since we know the relative sizes of each square and we now know their coordinates in "camera" space, it becomes possible to apply a matrix to work out their position in real world space.
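One common way to apply such a matrix is a planar homography fitted from four labelled corners. The sketch below is an illustration only, not the method from the Sharp, Shakernia and Sastry paper and not the project's code; the function names (`solve_linear`, `fit_homography`, `apply_homography`) and the pixel coordinates are hypothetical. It maps image points onto the landing pad's plane; recovering the full camera pose would additionally need the camera's intrinsic parameters.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so rows can be swapped together.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def fit_homography(image_pts, pad_pts):
    """Fit the 3x3 matrix mapping four image points to four pad-plane points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(image_pts, pad_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    # The bottom-right entry is fixed to 1 to remove the scale ambiguity.
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map one image point (pixels) to pad-plane coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical corners of one square, in pixels, and the same corners
# in pad coordinates (units of one square side).
image_corners = [(320, 240), (420, 240), (420, 340), (320, 340)]
pad_corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
H = fit_homography(image_corners, pad_corners)
centre = apply_homography(H, (370, 290))
```

Four correspondences are the minimum for this fit. With more labelled corners available, a least-squares fit over all of them would generally be less sensitive to noise in any single corner.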
Videos of pose estimation working in the lab and from the air:
Computationally, the most intensive part of this process is in pre-processing the image, finding polygons and then identifying the squares within this set of polygons. We then label the polygons and their corners. This process is shown below:
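As a rough illustration of the square-identification step, the sketch below filters a set of polygons down to those that look approximately square. It is not the project's code: the function names and the 20% tolerance are assumptions, and the polygons are assumed to arrive as ordered lists of (x, y) vertices from an earlier contour-approximation step.

```python
import math


def is_roughly_square(corners, tolerance=0.2):
    """Return True if four ordered (x, y) vertices form an approximate square."""
    if len(corners) != 4:
        return False
    sides = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    mean_side = sum(sides) / 4
    if mean_side == 0:
        return False
    # All four sides should be close to the same length.
    if any(abs(s - mean_side) > tolerance * mean_side for s in sides):
        return False
    # Equal diagonals distinguish a square from a squashed rhombus.
    d1 = math.dist(corners[0], corners[2])
    d2 = math.dist(corners[1], corners[3])
    return abs(d1 - d2) <= tolerance * max(d1, d2)


def find_squares(polygons, tolerance=0.2):
    """Keep only the polygons that pass the approximate-square test."""
    return [p for p in polygons if is_roughly_square(p, tolerance)]


# Hypothetical candidates: a square, a rectangle, a triangle and a rhombus.
candidates = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],
    [(0, 0), (20, 0), (20, 10), (0, 10)],
    [(0, 0), (10, 0), (5, 8)],
    [(0, 0), (10, 0), (16, 8), (6, 8)],
]
squares = find_squares(candidates)
```

The tolerance matters because a square seen from an angle is distorted by perspective, so an exact test would reject valid detections whenever the camera is not directly overhead.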
Using the BeagleBone Black, a single core ARM board with a 1 GHz Cortex A8 processor and 512 MB of RAM, we were unable to process frames at any more than 3.01 frames per second.
After spending a significant amount of time optimising OpenCV to use the optional ARM-specific NEON SIMD extensions, Intel's TBB library for code-level parallelism, and libjpeg-turbo, an optimised JPEG decoding library, we managed to get the average frame rate up to 3.20 frames per second.
It was clear that our approach needed to be re-visited. We therefore profiled our code to figure out where the majority of time went, generating a heat diagram (more time spent = darker):
After revisiting our code, we removed the median blur (which turned out not to affect performance) and refactored some of our for loops to avoid unnecessary computations. This took our average frame rate to 5.08 frames per second. This was considerably better, but still not fast enough for good real-time control.
We then moved to the more expensive and slightly larger Odroid XU, an ARM board with the Cortex A15 processor, featuring four 1.6 GHz cores and 2 GB of RAM. This immediately took us up to 21.58 frames per second. The improvement was partly due to the increased clock speed and partly due to the board being multi-core (less context switching between our code and operating system processes).
Finally, we implemented pipelining using Pthreads, dispatching each frame to a free worker thread, shown below.
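The dispatch pattern can also be sketched in Python using only the standard library. This is an illustration of the idea rather than the Pthreads implementation itself: `run_pipeline`, `worker` and the stand-in `process` function are hypothetical names, and each frame is tagged with its index so results can be put back in order.

```python
import queue
import threading


def worker(inbox, results, process):
    """Take frames from the shared queue until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            break
        index, frame = item
        results[index] = process(frame)


def run_pipeline(frames, process, num_workers=2):
    """Hand each frame to whichever worker thread is free next."""
    # A bounded queue stops the capture side from racing far ahead.
    inbox = queue.Queue(maxsize=num_workers * 2)
    results = {}
    threads = [
        threading.Thread(target=worker, args=(inbox, results, process))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for index, frame in enumerate(frames):
        inbox.put((index, frame))
    # One sentinel per worker tells each thread to exit.
    for _ in threads:
        inbox.put(None)
    for t in threads:
        t.join()
    # Workers may finish out of order, so reassemble by frame index.
    return [results[i] for i in range(len(results))]


# Stand-in "frames" (lists of pixel values) and a stand-in processing step.
fake_frames = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
outputs = run_pipeline(fake_frames, process=sum, num_workers=2)
```

In CPython, threads only run truly in parallel while the work releases the global interpreter lock, as native image-processing code typically does, so pure-Python processing would not see the same speed-up as a Pthreads version.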
When we ran this implementation using just two threads, we were able to get up to almost 30 frames per second, at a system load average of 1.10 - leaving plenty of headroom for other running processes. Unfortunately, we weren't able to get our controller working well enough to actually show the landing algorithm in action.
The full project report (including detailed performance figures) can be found here. Our landing code is open source - the pose estimation works perfectly (and quickly!) but our controller needs some work. Feel free to clone or fork it.
In the next post, we'll talk about our prototype of a structural health monitoring system.