Turn your phone into a 3D scanner - and next your quadrotor

The Computer Vision and Geometry Lab of ETH Zurich (Swiss Federal Institute of Technology) may be known in this community mostly for Pixhawk and PX4, but our research focus is in fact 3D reconstruction: creating 3D models from 2D images (mostly, though we also use RGB-D cameras).

To reconstruct an object, the user takes a number of images from different viewpoints around it. The 3D points are rendered as they are measured across multiple images, iteratively building up the complete 3D model. Use cases include cultural heritage (like the objects shown in the video), building 3D photos or statuettes of your family, or even reconstructing a couch (or a heating system) to see if the furniture fits your living room or the heating appliance fits your basement. 3D reconstructions will become as common as 2D pictures are today.
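As a rough illustration of how a 3D point can be measured from two viewpoints, here is a minimal Python sketch of midpoint triangulation. Each image contributes a viewing ray from the camera centre through the observed point, and the 3D point is estimated where the two rays pass closest to each other. This is a textbook technique under simplifying assumptions (known camera positions, only two views), not a description of what runs in the app; the function name and the example numbers are made up for illustration.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two viewing rays (midpoint method).

    c1, c2: camera centres; d1, d2: ray directions (need not be unit length).
    Returns the midpoint of the shortest segment connecting the two rays.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays: the viewpoints give no depth information.
        raise ValueError("rays are parallel; depth cannot be recovered")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


# Hypothetical example: two camera positions 1 m apart, both observing
# the same point at (1, 2, 5).
point = triangulate_midpoint([0.0, 0.0, 0.0], [1.0, 2.0, 5.0],
                             [1.0, 0.0, 0.0], [0.0, 2.0, 5.0])
print(point)
```

With real, noisy measurements the two rays do not intersect exactly, which is one reason why measuring the same point over many images helps.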

The novelty of what we have now presented publicly for the first time, at the International Conference on Computer Vision in Sydney, Australia, is that the 3D model is created interactively, directly on the phone. This lets the user intuitively understand whether they have acquired enough of the right images from enough viewpoints to create a complete 3D model. The previous state of the art required taking images, uploading them to a server and getting a 3D model back only many minutes later. We can still leverage cloud computing power to refine the model obtained on the phone.

It can be compared to the move from analog to digital photography: a digital camera lets you preview how the image looks while (or shortly after) taking it, as does our approach. The previous state of the art meant taking images and only getting the result back later, potentially after having left the scene, with no chance to take a better picture or adjust the viewpoint.

These results are highly relevant for sUAS / micro air vehicles, because the processing boards available for these small platforms use the same type of processors as mobile phones. Because our technology also provides camera position feedback (suitable for steering the aircraft without GPS, at a rate of 20-50 Hz and ~20-100 ms latency), it can be used to autonomously reconstruct larger objects. While vision-based flight has been successfully demonstrated before (e.g. by our group or our colleagues from ASL), the results obtained on the mobile phone add a dense 3D point cloud (allowing things like terrain following and autonomous waypoint planning around the object) and very efficient processing on top. We plan to leverage both normal cameras and RGB-D cameras as they become available in small form factors.
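To get a feel for what a 20-50 Hz update rate and ~20-100 ms latency mean for steering, the following Python sketch estimates how far a vehicle moves before a fresh position estimate can take effect. It assumes, as a simple worst case, that the newest information is one full update period plus the latency old. The 2 m/s flight speed and the function name are assumptions for illustration, not figures from our system.

```python
def travel_between_updates(speed_m_s, rate_hz, latency_s):
    """Distance (m) flown during one update period plus the processing latency."""
    return speed_m_s * (1.0 / rate_hz + latency_s)


SPEED = 2.0  # m/s, hypothetical flight speed

# Slow end of the quoted range: 20 Hz, 100 ms latency -> about 0.30 m
print(round(travel_between_updates(SPEED, 20.0, 0.100), 3))
# Fast end of the quoted range: 50 Hz, 20 ms latency -> about 0.08 m
print(round(travel_between_updates(SPEED, 50.0, 0.020), 3))
```

Under these assumptions the position information is at most a few tens of centimetres stale.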

The app is at this point a technology demo and not available for download. Our intent, however, is to get a demo version into end users' hands as soon as possible, though we can't provide a date yet.



Comments

Developer

Sandro Benigno December 8, 2013 at 1:24am

@Lorenz, that's awesome! Congratulations to the whole team.

@Dincer: the huge difference is that it's real time point-cloud generation. Photofly only creates the point cloud by post-processing the images taken.
dincer hepguler December 7, 2013 at 2:41pm
dincer hepguler December 7, 2013 at 2:36pm

awesome technology... not new though... years ago i used a similar software called Autodesk Photofly... here is the youtube link of a 3d video i made with PhotoFly:

https://www.youtube.com/watch?v=vzdzWSWai9g
John Githens December 7, 2013 at 5:42am

Incredible. Possibly an example of this.
Oliver December 6, 2013 at 11:39pm

Gary, you took the words right out of my mouth (NOT, LoL): Great synopsis of the huge difference between this and TOF, thanks for that.

Lorenz, is there some way to get on the list for the upcoming "end user" demo version of this brilliant thing?
Chris Card December 6, 2013 at 8:08pm

Totally awesome work!
Gary McCray December 6, 2013 at 4:56pm

TOF cameras suffer from the requirement of actively "lighting" the scene, which is not always feasible or even possible, especially outdoors.

Even though TOFs can use very short duration illumination, it is very hard to generate sufficient illumination for sunlit conditions or long distances.

Stereo offset cameras have depth resolution that falls off rapidly with increasing distance.

And the offset is generally optimized for a specific range.

Since the method described here is capable of having the camera at various offset distances, it is effectively a (multi) stereo camera system with an infinitely adjustable offset, resulting in a self-optimizing stereo offset methodology.

Since it sequences many frames to arrive at its final depth information it is effectively multi-channel stereoscopic.

Since it is not dependent on providing its own illumination it is not subject to those inadequacies either.

This is a method of reconstructing the physical world much more like our brain works.

Even though we have stereoscopic vision, our inter-pupillary offset only provides useful stereo information out to a few feet.

All the other depth / size / physical construction information we perceive is done by the interpolation of our brain of the photo-optical information it has received and matching it with previous experience.

These guys are mimicking that process by much simpler means.

This is how we can get machines to see like we do.

Yes?
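The falloff of stereo depth resolution with distance mentioned above can be made concrete with the standard pinhole stereo relation: depth Z = f * b / d (focal length f in pixels, baseline b, disparity d), so a fixed disparity error produces a depth error that grows roughly as Z^2 / (f * b). A minimal Python sketch follows; the 1000 px focal length, the 1 px disparity error and the 6.5 cm baseline (roughly a human inter-pupillary distance) are illustrative assumptions, not measurements from the app.

```python
def depth_error(z_m, baseline_m, focal_px, disparity_err_px=1.0):
    """Approximate depth uncertainty (m) of a stereo pair at distance z_m.

    Derived from Z = f * b / d: a small disparity error dd gives
    dZ ~= Z**2 / (f * b) * dd.
    """
    return z_m ** 2 / (focal_px * baseline_m) * disparity_err_px


FOCAL_PX = 1000.0  # hypothetical focal length in pixels

for baseline in (0.065, 1.0):  # eye-like offset vs. a camera moved by 1 m
    for z in (1.0, 10.0):
        err = depth_error(z, baseline, FOCAL_PX)
        print(f"baseline {baseline} m, distance {z} m: about {err:.3f} m")
```

In this simple model, going ten times further away makes the depth error a hundred times larger, while a larger offset between the views shrinks it proportionally. That is the benefit of being able to move the camera to whatever offset the scene needs.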
Gary McCray December 6, 2013 at 4:44pm

This is really interesting and most important work Lorenz.

Real time image analysis to extract 3D point cloud from non-aligned multiple frames is going to be very useful in the future.

Should contribute significantly to machine vision and robotics.

Looking forward to your release code.
JesseJay December 6, 2013 at 3:30pm

I'm thinking there are many applications for TOF cameras, but they are not a great solution for every problem. They have some weaknesses. TOF cameras will probably not work well in sunlight, nor will they work at longer distances. Plus they are very low resolution compared to what even a simple webcam can provide.
Dan Neault December 6, 2013 at 7:09am

Nice work!

But I have to ask why?

There are several companies with TOF camera solutions that fit into smartphones, to be introduced shortly.

The latest gen is very sweet, and you don't even need a dev kit, just grab an Xbox One if you want to see how good. (There are lots of TOF dev kits as well, just google TOF camera.)
