Turn your phone into a 3D scanner - and next, your quadrotor

The Computer Vision and Geometry Lab of ETH Zurich (Swiss Federal Institute of Technology) might be known in this community mostly for Pixhawk and PX4, but our research focus is in fact 3D reconstruction: creating 3D models from 2D images (mostly, though we also use RGB-D cameras).

To reconstruct an object, the user takes a number of images from different viewpoints around it; 3D points are rendered as soon as they have been measured across multiple images, iteratively building up the complete 3D model. Use cases include cultural heritage (like the objects shown in the video), building 3D photos / statuettes of your family, or even reconstructing a couch (or a heating system) to see if the furniture fits your living room or the appliance fits your basement. 3D reconstructions will become as commonplace as 2D pictures are today.
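
For readers curious about the basic machinery, here is a minimal sketch of the underlying idea using the open-source OpenCV library: estimate the relative pose between two photos from feature matches, then triangulate those matches into 3D points. This is a generic two-view illustration, not our on-phone pipeline, and it assumes the camera intrinsic matrix K is already known.

```python
# Two-view sketch with OpenCV: recover the relative camera pose from feature
# matches and triangulate them into (up-to-scale) 3D points. Generic
# illustration only; K is the 3x3 camera intrinsic matrix.
import cv2
import numpy as np

def two_view_points(img1, img2, K):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Relative pose from the essential matrix; RANSAC rejects bad matches.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # Triangulate the inlier correspondences against the two camera matrices.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    inliers = mask.ravel() > 0
    X = cv2.triangulatePoints(P1, P2, pts1[inliers].T, pts2[inliers].T)
    return (X[:3] / X[3]).T  # N x 3 point cloud; further views extend and refine it
```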

The novelty in what we presented publicly for the first time at the International Conference on Computer Vision in Sydney, Australia is that the 3D model is created interactively, directly on the phone, allowing the user to intuitively see whether the right images have been acquired from enough viewpoints to create a complete 3D model. The previous state of the art required taking images, uploading them to a server, and only getting a 3D model back many minutes later. We can still leverage cloud computing power to refine the model obtained on the phone.

It can be compared to the move from analog to digital photography: a digital camera lets you preview how the image looks while (or shortly after) taking it, as does our approach. The previous state of the art meant taking images and only getting the result back later, potentially after having left the scene, with no chance to take a better picture or adjust the viewpoint.

These results are highly relevant for sUAS / micro air vehicles, because the processing boards available for these small platforms use the same type of processors as mobile phones. Because our technology also provides camera position feedback (suitable for steering the aircraft without GPS, at a rate of 20-50 Hz and ~20-100 ms latency), it can be used to autonomously reconstruct larger objects. While vision-based flight has been demonstrated successfully before (e.g. by our group or our colleagues at ASL), the results obtained on the mobile phone add a dense 3D point cloud (allowing things like terrain following and autonomous waypoint planning around the object) with very efficient processing on top. We plan to leverage both normal cameras and RGB-D cameras as they become available in small form factors.
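
As a rough illustration of how such pose feedback could be used for GPS-less position hold, here is a toy sketch only; the function names below are hypothetical placeholders, not the PX4 API or the app's interface.

```python
# Toy position-hold loop driven by visual pose estimates arriving at 20-50 Hz.
# `get_vision_pose` and `send_velocity_setpoint` are hypothetical placeholders,
# not part of PX4 or of the app described above.
import numpy as np

KP, KD = 1.2, 0.4  # illustrative proportional / derivative gains

def position_hold_step(target_pos, prev_err, dt, get_vision_pose, send_velocity_setpoint):
    pos, _attitude = get_vision_pose()               # pose from the vision pipeline
    err = np.asarray(target_pos, float) - np.asarray(pos, float)
    vel_cmd = KP * err + KD * (err - prev_err) / dt  # simple PD control law
    send_velocity_setpoint(vel_cmd)                  # handed off to the flight controller
    return err                                       # keep for the next iteration
```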

The app is at this point a technology demo and not available for download. Our intent, however, is to bring a demo version into end users' hands as soon as possible, but we can't provide a date yet.

 

 



Developer
Comment by Andrew Tridgell on December 6, 2013 at 1:30am

nice work Lorenz!


Developer
Comment by Linus on December 6, 2013 at 2:17am

Great work! And I believed the 123D stuff was top notch...

Comment by Project Nadar on December 6, 2013 at 2:39am

Simply wow.... 


Moderator
Comment by Gary Mortimer on December 6, 2013 at 4:19am

As ever, blimey


Developer
Comment by Rob_Lefebvre on December 6, 2013 at 7:00am

Amazing stuff!

Lorenz, I wonder if this type of application might perform better with stereoscopic vision?  If you had two imagers, separated by a calibrated distance, with known lens refraction, wouldn't the software be able to resolve the 3D shape much faster and easier?

I understand the intent is to make something that runs on a typical smart phone.  But I'm just thinking about the next step.
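
For context on the stereo suggestion: with two cameras a known baseline apart, depth follows directly from disparity in a single frame pair (Z = f * B / d) rather than from motion over time. A small illustrative sketch, with made-up numbers not tied to any specific hardware:

```python
# Classic stereo relation: depth Z = focal_length * baseline / disparity.
# Purely illustrative values; not tied to any particular camera.
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    d = np.asarray(disparity_px, dtype=float)
    return np.where(d > 0, focal_px * baseline_m / d, np.inf)

# A 6 cm baseline, 700 px focal length and 10 px disparity give roughly 4.2 m.
print(depth_from_disparity(10.0, 700.0, 0.06))
```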

Comment by Dan "HotSeat" Neault on December 6, 2013 at 8:09am

Nice work!

 

But I have to ask why?

There are several companies with TOF camera solutions that fit into smart phones, to be introduced shortly.

The latest gen is very sweet, and you don't even need a dev kit; just grab an Xbox One if you want to see how good it is. (There are lots of TOF dev kits as well, just google "TOF camera".)

Comment by JesseJay on December 6, 2013 at 4:30pm

I'm thinking there are many applications for TOF cameras, but they are not a great solution for every problem.  They have some weaknesses.   TOF cameras will probably not work well in sunlight, nor will they work at longer distances.  Plus they are very low resolution compared to what even a simple webcam can provide.  


Wiki Ninja
Comment by Gary McCray on December 6, 2013 at 5:44pm

This is really interesting and most important work Lorenz.

Real-time image analysis to extract a 3D point cloud from multiple non-aligned frames is going to be very useful in the future.

Should contribute significantly to machine vision and robotics.

Looking forward to your code release.


Wiki Ninja
Comment by Gary McCray on December 6, 2013 at 5:56pm

TOF cameras suffer from the requirement of actively "lighting" the scene, which is not always feasible or even possible, especially outdoors.

Even though TOFs can use very short duration illumination, it is very hard to generate sufficient illumination for sunlit conditions or long distances.

Stereo offset cameras have depth resolution that falls off rapidly with increasing distance (roughly with the square of the range for a fixed baseline).

And the offset is generally optimized for a specific range.
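
To put a number on that falloff: differentiating Z = f * B / d shows the depth error grows roughly with the square of the range for a fixed baseline and disparity error. A quick illustrative calculation follows; the focal length, baseline, and disparity error values are made up for the example.

```python
# Depth uncertainty from a fixed disparity error: dZ ~ Z**2 * dd / (f * B).
# Illustrative numbers only (700 px focal length, 6 cm baseline, 0.25 px error).
def depth_error(Z, focal_px=700.0, baseline_m=0.06, disparity_err_px=0.25):
    return Z ** 2 * disparity_err_px / (focal_px * baseline_m)

for Z in (1.0, 5.0, 20.0):
    print(f"range {Z:4.1f} m -> depth uncertainty ~ {depth_error(Z):.2f} m")
```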

Since the method described here allows the camera to be placed at various offset distances, it is effectively a (multi) stereo camera system with an infinitely adjustable offset, resulting in a self-optimizing stereo offset methodology.

Since it sequences many frames to arrive at its final depth information, it is effectively multi-channel stereoscopic.

Since it is not dependent on providing its own illumination, it is not subject to those inadequacies either.

This is a method of reconstructing the physical world much more like our brain works.

Even though we have stereoscopic vision, our inter-pupillary offset only provides useful stereo information out to a few feet.

All the other depth / size / physical construction information we perceive comes from our brain interpolating the photo-optical information it has received and matching it with previous experience.

These guys are mimicking that process by much simpler means.

This is how we can get machines to see like we do.

Yes?

Comment by Swift on December 6, 2013 at 6:45pm

+10 on my WOW factor!
