Drones at UC Berkeley: Real-time Image Processing (Part 1 of 2)

A team of four Master's students from the Electrical Engineering and Computer Science and the Industrial Engineering and Operations Research departments at UC Berkeley (Armanda Kouassi, Emmanuelle M'bolo, Sunil Shah and Yong Keong Yap) worked on a capstone project, Drones: The Killer App, sponsored by 3D Robotics Research & Development.

The Master of Engineering program aims to create engineering leaders of the future by combining courses in leadership with a technical core. As part of this project, we looked into possible commercial applications of drone technology and worked on a few sub-projects that we felt would help or enable companies developing products around drones.

We worked with the Cyber Physical Cloud Computing research group, which looks at the applications of unmanned aerial vehicles, and were advised jointly by Dr. Brandon Basso from 3D Robotics and Professor Raja Sengupta from UC Berkeley.

This is the first in a series of two posts where we'll go into some detail about our projects. In this post, we'll talk about our attempt to implement real-time image processing on low cost embedded computers such as the BeagleBone Black.

It's easily possible to process frames from a video capture device as fast as they come (typically 30 frames per second) - if you have a nice fast x86 processor. However, most systems which feature these processors are typically too large to fly on multi-rotors - either requiring significant additional weight in the form of cooling or requiring a significant amount of space due to their mainboard's footprint. In addition, the power draw of these boards can be as much as a single motor (or more). These factors make it impractical to deploy such boards on small UAS.

As processing power continues to miniaturise, more options become available, including the recently announced Intel Edison. However, given the high computational demands of unmanned systems, it is still important to architect and write efficient image-processing systems to leave room for additional intelligence.

As part of this project, we investigated the use of commodity ARM processor based boards to be used for on-board image processing. We implemented a landing algorithm described in the 2001 paper, "A Vision System for Landing an Unmanned Aerial Vehicle" by Sharp, Shakernia and Sastry. This allows very accurate pose estimation - whereas GPS typically gives you accuracy of a few metres, we saw position estimates that were accurate to within a few centimetres of our multi-rotor's actual position relative to a landing pattern.

We attempted to re-use popular open source robotics libraries for our implementation, to speed up development time and to avoid re-inventing the wheel. These included roscopter (which we forked to document and clean up), ROS, and OpenCV. Our approach makes use of a downward facing Logitech webcam and a landing pad with six squares, shown below.

The picture above shows our hardware "stack" on a 3DR quadcopter. Below the APM 2.6 is our computer (in this setup, a BeagleBone Black) plus a USB hub. To the left is a Logitech C920 webcam.

This approach first labels each corner. Since we know the relative sizes of each square and we now know their coordinates in "camera" space, it becomes possible to apply a matrix to work out their position in real world space.
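To give a feel for why known square sizes make this possible, here is a minimal sketch using a plain pinhole-camera model. It is a deliberate simplification and not the method from the Sharp, Shakernia and Sastry paper: it assumes the camera looks straight down at a single square, and the function name, focal length and image centre are hypothetical values chosen for illustration.

```python
import math

def estimate_pose(corners_px, side_m, focal_px, centre_px):
    """Estimate offset, height and yaw from one square of known size.

    corners_px: four (u, v) pixel corners, ordered around the square.
    side_m:     real side length of the square in metres.
    focal_px:   camera focal length in pixels (assumed known from calibration).
    centre_px:  (cx, cy) image centre in pixels.
    """
    # Average side length of the square as seen in the image.
    sides = []
    for i in range(4):
        (u0, v0), (u1, v1) = corners_px[i], corners_px[(i + 1) % 4]
        sides.append(math.hypot(u1 - u0, v1 - v0))
    side_px = sum(sides) / 4.0

    # Similar triangles: side_px / focal_px == side_m / z.
    z = focal_px * side_m / side_px

    # Back-project the square's centre to metres at that distance.
    u = sum(c[0] for c in corners_px) / 4.0
    v = sum(c[1] for c in corners_px) / 4.0
    x = (u - centre_px[0]) * z / focal_px
    y = (v - centre_px[1]) * z / focal_px

    # Yaw from the direction of the first edge, in radians.
    (u0, v0), (u1, v1) = corners_px[0], corners_px[1]
    yaw = math.atan2(v1 - v0, u1 - u0)
    return x, y, z, yaw
```

For example, a 0.2 m square that appears 40 pixels wide through a camera with a 600 pixel focal length works out to a height of 3 m. Using all of the labelled corners, rather than one square, is what lets a full solution tolerate noise in any single corner.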

Videos of pose estimation working in the lab and from the air:

Computationally, the most intensive part of this process is in pre-processing the image, finding polygons and then identifying the squares within this set of polygons. We then label the polygons and their corners. This process is shown below:
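As a rough illustration of the square-identification step, the sketch below checks whether a polygon's vertices form an approximate square. In OpenCV, candidate polygons are typically obtained with findContours followed by approxPolyDP, though this post does not say which calls our code uses. The function name and tolerance here are hypothetical, and the check is a simplified stand-in rather than the project's actual test.

```python
import math

def is_square(poly, tol=0.15):
    """Return True if poly (a list of (x, y) vertices) is roughly a square.

    tol is the allowed relative deviation in side and diagonal length
    (an illustrative default, not a tuned value).
    """
    if len(poly) != 4:
        return False

    def dist(a, b):
        return math.hypot(b[0] - a[0], b[1] - a[1])

    # All four sides should be close to the mean side length.
    sides = [dist(poly[i], poly[(i + 1) % 4]) for i in range(4)]
    mean_side = sum(sides) / 4.0
    if mean_side == 0:
        return False
    if any(abs(s - mean_side) > tol * mean_side for s in sides):
        return False

    # Both diagonals should be close to side * sqrt(2); this rejects
    # a rhombus, which has equal sides but unequal diagonals.
    expected_diag = mean_side * math.sqrt(2)
    diagonals = [dist(poly[0], poly[2]), dist(poly[1], poly[3])]
    return all(abs(d - expected_diag) <= tol * expected_diag for d in diagonals)
```

Running a cheap geometric test like this on every polygon is one reason this stage is expensive: a noisy frame can produce a large number of candidate polygons, most of which are discarded.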

Using the BeagleBone Black, a single core ARM board with a 1 GHz Cortex A8 processor and 512 MB of RAM, we were unable to process frames at any more than 3.01 frames per second.

After spending a significant amount of time optimising OpenCV to use the optional ARM-specific SIMD NEON extensions, code-level parallelism through Intel's TBB library, and libjpeg-turbo (an optimised JPEG decoding library), we managed to get the average frame rate up to 3.20 frames per second.

It was clear that our approach needed to be re-visited. We therefore profiled our code to figure out where the majority of time went, generating a heat diagram (more time spent = darker):

After revisiting our code, we removed the median blur (which turned out not to affect detection performance) and refactored some of our for loops to avoid unnecessary computations. This took our average frame rate to 5.08 frames per second: considerably better, but still not frequent enough for good real-time control.
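The kind of per-stage timing that guides this sort of optimisation can be sketched as follows. This is an illustrative helper, not the profiler we used: the function name is hypothetical, and the stages passed in are stand-ins for steps such as blurring or polygon detection.

```python
import time
from collections import defaultdict

def profile_pipeline(stages, frames):
    """Return each stage's share of total processing time.

    stages: list of (name, function) pairs applied in order to each frame.
    frames: iterable of input frames (any objects the stages accept).
    """
    totals = defaultdict(float)
    for frame in frames:
        data = frame
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)
            totals[name] += time.perf_counter() - start

    # Guard against a zero total so the division is always defined.
    grand_total = sum(totals.values()) or 1.0
    return {name: totals[name] / grand_total for name, _ in stages}
```

A stage that accounts for a large share of the time but contributes little to the result is the obvious candidate for removal or rewriting.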

We then moved to the more expensive and slightly larger Odroid XU, an ARM board with the Cortex A15 processor, featuring four 1.6 GHz cores and 2 GB of RAM. This immediately took us up to 21.58 frames per second, partly due to the increased clock speed and partly due to the board being multi-core (less context switching between our code and operating system processes).
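To put these frame rates in context, it helps to convert them into time spent per frame and compare that with the gap between frames from a camera capturing at 30 frames per second. The short calculation below does only that arithmetic on the figures quoted above; the helper name and labels are ours for illustration.

```python
def frame_time_ms(fps):
    """Return the average time spent on one frame, in milliseconds."""
    return 1000.0 / fps

CAMERA_FPS = 30.0  # typical capture rate mentioned earlier in this post

# Average frame rates quoted in this post.
measured = {
    "BeagleBone Black, initial": 3.01,
    "BeagleBone Black, optimised OpenCV build": 3.20,
    "BeagleBone Black, refactored code": 5.08,
    "Odroid XU": 21.58,
}

budget = frame_time_ms(CAMERA_FPS)
for label, fps in measured.items():
    print(f"{label}: {frame_time_ms(fps):.1f} ms per frame "
          f"(camera delivers one every {budget:.1f} ms)")
```

Any configuration whose per-frame time exceeds the camera's frame interval has to drop frames, which is why the single-threaded results all fall short of the capture rate.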

Finally, we implemented pipelining using Pthreads, dispatching each frame to a free worker thread, shown below.
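A rough sketch of this dispatch pattern is given here in Python rather than with Pthreads, purely to show the structure. The process_frame function is a hypothetical stand-in for the per-frame vision work, and in CPython pure-Python work like this would not actually run in parallel because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor

def process_frame(frame):
    """Stand-in for the per-frame vision work (hypothetical)."""
    index, pixels = frame
    return index, sum(pixels)

def run_pipeline(frames, workers=2):
    """Process frames on a small pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each frame is queued and picked up by whichever worker is free;
        # map() yields the results in the original frame order.
        return list(pool.map(process_frame, frames))
```

Returning results in frame order matters for control: a pose estimate from an older frame should not overwrite one from a newer frame.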

When we ran this implementation using just two threads, we were able to get up to almost 30 frames per second, at a system load average of 1.10 - leaving plenty of headroom for other running processes. Unfortunately, we weren't able to get our controller working well enough to actually show the landing algorithm in action.

The full project report (including detailed performance figures) can be found here. Our landing code is open source - the pose estimation works perfectly (and quickly!) but our controller needs some work. Feel free to clone or fork it.

Credit is also due to collaborators Constantin Berzan and Nahush Bhanage.

In the next post, we'll talk about our prototype of a structural health monitoring system.

Comments

Robert Wagner September 28, 2014 at 11:41am

Great work!!

Perhaps it would be helpful to exchange the board, and use the odroid-xu3 / HMP and OpenCL 1.1 http://www.hardkernel.com/main/products/prdt_info.php?g_code=G14044...
Gilbert McGhee September 27, 2014 at 8:37pm

Sunil et al., what are your thoughts on putting the camera and processing power on the ground side and the image on the vehicle? My idea is to use GPS to get the vehicle close, then fly a search pattern until the ground camera finds the vehicle, then have the GCS send commands to land the vehicle. Would that scenario still have the latency problems you mentioned above? Unfortunately I have no experience with machine vision.
3D Robotics

Ramon Roche September 26, 2014 at 5:18pm

Excellent write-up and great idea. It's great to see projects of this quality coming out of schools.
Sunil Shah September 26, 2014 at 5:04pm

Petr, we considered that option - the idea was to try and keep all processing onboard to avoid requiring any ground infrastructure. Feasibly you could build a ground station that has both the fixed landing pattern and a higher powered computer just to handle the processing too, when the UAS is near. I suspect that the time saved by processing on a faster computer would be negated by the increased latency.

We considered looking at using a GPU but most of the embedded boards have horrific driver support and proprietary SDKs for developing against their graphics chips. In addition, what driver support they have (at least for the Odroid) seemed to be built only for Android. I'd be curious to see what we could do with a graphics card on the ground and something like OpenCL.

Thanks for the tip John! I'll pass that on to the current maintainers.

Julien - We need all corners of the shape in view in order to solve a series of equations to get the position and yaw of the quadcopter. If you didn't care about one of these, you could possibly simplify this further. In the current implementation I think we need 24 points (could be wrong though, it's been a few months!) to calculate x, y, z and yaw. We considered using nested patterns - assuming we had different colours or perhaps some other way to identify the current pattern, it's very easy to have it switch to a smaller pattern as you descend.
Julien Dubois September 26, 2014 at 10:32am

Very good job!

From your video, it looks like you wait for every shape to be recognized to find out their position.

Maybe you could look for only a single, partial shape, which would probably allow the landing to complete even if the video is not wide enough to get the full shape.

But for that, your whole shape should not have any symmetry (some little squares rotated or replaced by lozenges, for example).

Maybe the trick would be to identify landmarks in the video and use the movement of those landmarks to find out the copter's x/y position and yaw. If you increase/decrease altitude, get other landmarks... This way, there is no longer any FOV or resolution constraint... but you won't be able to give an absolute altitude, only relative changes.
Developer

John Arne Birkeland September 26, 2014 at 9:37am

Very interesting article. Regarding the costly median blur filter: if all you need is a low pass filter to remove sensor noise, there should be no need to perform a true median blur like medianBlur() in OpenCV. Use boxFilter() instead, or fake a good enough fast blur at the cost of some additions and a division (bit shift if lacking division hardware) per pixel. All 1 clock instructions that scale to any SIMD instructions available (32/64/128bit etc.).
Petr Hubacek September 26, 2014 at 1:38am

Great. If they need much more processing power for image processing, the easiest way is to stream the video wirelessly to an ordinary PC. That's how we people process images and video remotely nowadays, with our eyes. Remotely, on a PC, they will see a latency of between 250 and 1000ms between reality and the processed image, but that would be sufficient for the image processing software development work. Results can be ported to the ARM or Cortex or whatever drone board later.

They may use graphics cards later on instead of an ordinary xyboard. Graphics cards are small enough and extremely powerful...
Sunil Shah September 26, 2014 at 1:01am

Hey Chris,

If I remember correctly, we operate on the colour image.

I'm not too sure about the characteristics of the PX4Flow camera system - global shutter and high frame rate is good, what's the field of view like? The Logitech camera worked well in terms of horizontal field of view but we had a problem when the quadcopter got below ~ 2 metres where the landing pad would sometimes not fit entirely in the camera's vertical field of view. Looks like it'd work better for low light situations too - we suffered a lot when it was cloudy outside and the camera wasn't able to pick up contrast between the red and white area on the landing pad!
Chris Card September 25, 2014 at 9:13pm

Great project!

I have downloaded the paper & I look forward to reading it.

Is the video image converted to monochrome or is the processing performed in full colour?

..Almost OT...

For the purpose of identifying a landing pad target, could the PX4Flow camera be used as a video source for this processing system?

The PX4 Flow has a very high frame rate, global shutter and it has a USB port.
John C. September 25, 2014 at 8:57pm

Really excellent work by everyone involved. Congrats!
