From twoview to a complete sparse 3D point cloud

Continuing from the last post on this subject, here's a complete sparse point cloud generated from some 40 images. The two-view case showed that you can triangulate points from 2 images. A two-view match, however, sometimes contains inaccurate or incorrect matches, which lead to outliers. If the feature is consistent and static, you can triangulate points from 3 or more views instead. Such 3+ view matches eliminate nearly all outliers, which leaves you with a sparse point cloud whose remaining errors mostly come from (relatively rough) pixel measurements, incorrect distortion parameters, slight drifts in feature recognition, pose fitting errors, etc.

In this stage of processing, the sparse point cloud generation, the objective is to discover camera poses while adding new points to the cloud, so that future matches can take place and the cloud can grow. In this case, I use the point cloud itself to estimate future poses. For each 3D point, I maintain a list of the images that contributed to that point. A new image that has matches with already registered images can then figure out which of its own features correspond to existing 3D points in the cloud. From that I simply build a list of 3D points and 2D points that should correspond. With that information, I can figure out where the camera must have been located, based on how the 3D points appear in the image. So it's basically "triangulating backwards" from the points to the camera: knowing where the points are projected on the sensor in 2D, you work out where the sensor was.
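The bookkeeping described above can be sketched in a few lines. This is a minimal illustration and not the code used for this post: the data layout (`tracks`, `points3d`, `matches`, `keypoints_new`) and the function name are hypothetical, chosen only to show how per-point image lists turn feature matches into 2D-3D pairs.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption, not from the post):
# tracks[point_id] maps an image name to the feature index seen in that image.
Tracks = Dict[int, Dict[str, int]]
# A match is (feature index in the new image, feature index in a registered image).
Match = Tuple[int, int]


def build_2d3d_correspondences(
    tracks: Tracks,
    points3d: Dict[int, Tuple[float, float, float]],
    matches: Dict[str, List[Match]],
    keypoints_new: List[Tuple[float, float]],
):
    """Pair 2D features of a new image with 3D points already in the cloud."""
    # Reverse index: (image, feature index) -> id of the 3D point it contributed to.
    owner = {
        (image, feat): pid
        for pid, views in tracks.items()
        for image, feat in views.items()
    }
    object_points, image_points, seen = [], [], set()
    for image, pairs in matches.items():
        for feat_new, feat_old in pairs:
            pid = owner.get((image, feat_old))
            if pid is None or feat_new in seen:
                continue  # no 3D point yet, or this new feature is already paired
            seen.add(feat_new)
            object_points.append(points3d[pid])
            image_points.append(keypoints_new[feat_new])
    return object_points, image_points


# Tiny made-up example: two existing 3D points, one new image with 3 features.
tracks = {0: {"img1": 4, "img2": 7}, 1: {"img1": 5, "img2": 9}}
points3d = {0: (1.0, 2.0, 10.0), 1: (3.0, 1.0, 12.0)}
matches = {"img1": [(0, 4), (2, 6)], "img2": [(0, 7), (1, 9)]}
keypoints_new = [(100.0, 50.0), (210.0, 80.0), (15.0, 15.0)]
obj_pts, img_pts = build_2d3d_correspondences(tracks, points3d, matches, keypoints_new)
```

The two resulting lists are the kind of input a perspective-n-point (PnP) solver expects, for example OpenCV's `solvePnPRansac`. Feature 2 of the new image is left out because its match has no 3D point yet; it is a candidate for triangulation once the pose is known.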

When I have the pose, I triangulate matches that I do not yet have in the cloud as new 3D points and grow the cloud a little.
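To make the triangulation step concrete, here is a small sketch using the midpoint method: each camera contributes a ray (camera centre plus viewing direction through the matched feature), and the 3D point is taken halfway between the closest points of the two rays. This is a deliberately simple stand-in, not necessarily the triangulation method used for this post, and all names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1: Vec, d1: Vec, c2: Vec, d2: Vec):
    """Return (midpoint, gap) for the rays c1 + t1*d1 and c2 + t2*d2.

    c1, c2 are camera centres and d1, d2 viewing directions (they need not
    be normalized). `gap` is the distance between the two rays at their
    closest approach, a crude indicator of how well the match agrees.
    """
    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel, cannot triangulate")
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = tuple(ci + t1 * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t2 * di for ci, di in zip(c2, d2))
    midpoint = tuple((x + y) / 2.0 for x, y in zip(p1, p2))
    return midpoint, math.dist(p1, p2)


# Two cameras 1 unit apart, both looking at the point (0.5, 0, 5).
point, gap = triangulate_midpoint(
    (0.0, 0.0, 0.0), (0.5, 0.0, 5.0),
    (1.0, 0.0, 0.0), (-0.5, 0.0, 5.0),
)
```

With perfect data the rays intersect and the gap is zero. With real matches the rays miss each other slightly, and a large gap hints at a bad match or a bad pose.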

The order in which you attempt to add cameras (images) to the cloud is important because the current state of the 3D point cloud determines how many points you have available for pose estimation. If that number is low, you may have very little or inaccurate information (outliers!) to do the pose estimation. If the pose is bad, the point cloud deteriorates and future poses cannot be determined.
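One plausible way to handle the ordering, sketched here as an assumption and not as the strategy used for this post, is a greedy rule: always register the image that currently shares the most points with the cloud, and stop when no candidate has enough. The function name and the `min_points` threshold are hypothetical.

```python
from typing import Dict, Optional


def pick_next_image(
    correspondence_counts: Dict[str, int], min_points: int = 20
) -> Optional[str]:
    """Return the unregistered image with the most 2D-3D correspondences.

    correspondence_counts maps image name -> number of its features that
    match an existing 3D point. Returns None if no image reaches
    min_points (an arbitrary threshold chosen for this illustration).
    """
    if not correspondence_counts:
        return None
    best = max(correspondence_counts, key=correspondence_counts.get)
    if correspondence_counts[best] < min_points:
        return None  # too little information for a trustworthy pose
    return best


next_image = pick_next_image({"img3": 12, "img4": 85, "img5": 40})
nothing_usable = pick_next_image({"img3": 5})
```

The counts have to be recomputed after every registration, because each new camera adds points to the cloud and changes which image is the best next candidate.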

So, how does it work in more detail in a way that makes the solution stable?

1. "Boot" the point cloud using two images only, as in the previous article.
2. Grab a new image and find all matches with other images that we already have camera poses for.
3. Create a vector of 3D (point cloud) positions and 2D (image) points which should correspond.
4. Estimate the pose in combination with an algorithm called RANSAC to remove outliers (grab some points, try a fit, see if it can improve, exchange some points for others, iterate towards a best fit).
5. Refine the pose estimation further.
6. Triangulate new 3D points that we don't have yet by looking at matches from this image with other images (cameras) we already have in the point cloud.
7. Refine the new 3D points.
8. Go back to step 2 if there are more images.
9. Print out a point cloud for all 3D points that have more than 2 "views" (points which originated from more than 2 images).
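The RANSAC idea in the pose estimation step is easier to see on a toy problem. The sketch below fits a 2D line instead of a camera pose: the model is far simpler, but the loop (sample a minimal set, fit, count inliers, keep the best) is the same scheme. It illustrates the principle only and is not the pose estimator used here; the threshold, iteration count and data are made up.

```python
import math
import random


def fit_line(p, q):
    """Line through two distinct points as (a, b, c) with a*x + b*y + c = 0.

    (a, b) is scaled to unit length so |a*x + b*y + c| is a distance.
    """
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.hypot(a, b)
    return a / norm, b / norm, -(a * p[0] + b * p[1]) / norm


def ransac_line(points, threshold=0.5, iterations=100, seed=0):
    """Return (model, inliers) for the line supported by the most points."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    best_model, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample: 2 points define a line
        a, b, c = fit_line(p, q)
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b, c), inliers
    return best_model, best_inliers


# Ten points on y = 2x + 1 plus three gross outliers.
points = [(x, 2 * x + 1) for x in range(10)] + [(2, 30), (5, -20), (8, 40)]
model, inliers = ransac_line(points)
```

For a camera pose the minimal sample is a handful of 2D-3D pairs and the error is the reprojection distance in pixels, but the outliers are rejected in the same way: they never gather enough support.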

This sparse point cloud, although crude, can already serve certain purposes. It still needs to go through a process called bundle adjustment, where poses and 3D points are refined further on a global scale. That step improves the appearance of planar surfaces and further refines the camera poses.

So what does this teach us about collecting UAV data?

Always make sure that each feature appears in 3 or more images to ensure it's stable. Too little overlap can still produce point cloud data, but at the cost of many outliers and low numerical stability of the solution in the processing pipeline. Some processing tools will simply discard such features; others keep them and attempt to "smooth" them out into the rest of the cloud, usually creating humps or valleys in objects. Make sure the survey area is large enough to capture all objects you want with accuracy. Better data is much better than relying on algorithms to interpolate, extrapolate and in other ways fantasize data together.
In processing this set I noticed that I had a very large number of stray points right above an area where a tree was expected. It turns out that features of that tree were not stable and not recognized in 3+ images, so triangulation produced a very noisy subcloud in that location. Eventually all those points disappeared from the final cloud, leaving a hole in the point cloud at that location, because the ground under the tree was never triangulated. Again: vegetation needs very high overlap.
Adding (correct) GPS to images reduces processing time if the pipeline knows how to use location data. In this set I used the telemetry log (over data), which contained errors, sometimes 60 degrees out. Not all tools (not even commercial ones!) deal with such GPS errors or missing information correctly. Worse still, those images were eliminated from the set, reducing local overlap and thus the number of views per feature, which could lead to bad pose estimations and local inaccuracies.
This also explains why surfaces like water cannot be mapped. Anything that moves around while pictures are taken results in features that match in 2 images, but not in a third. Even if such a feature matched consistently, it would eventually be filtered out as noise.
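The "3 or more images" rule can be turned into a rough check on the trigger distance along a flight line. The sketch below assumes flat terrain, a camera pointing straight down, a straight line flown at constant trigger spacing, and a known along-track ground footprint; the function names and the 60 m footprint are made up for illustration. Under those assumptions every ground point along the line is seen by at least floor(footprint / spacing) images.

```python
import math


def min_views(footprint_m: float, trigger_dist_m: float) -> int:
    """Guaranteed number of images covering any ground point along the line.

    footprint_m: along-track length of the ground area one image covers.
    trigger_dist_m: distance flown between two exposures.
    Idealized geometry only (flat ground, nadir camera, straight line).
    """
    return math.floor(footprint_m / trigger_dist_m)


def max_trigger_distance(footprint_m: float, views: int = 3) -> float:
    """Largest trigger spacing that still guarantees `views` images per point."""
    return footprint_m / views


views_at_20m = min_views(60.0, 20.0)          # 66.7% forward overlap
views_at_25m = min_views(60.0, 25.0)          # about 58% forward overlap
spacing_for_3 = max_trigger_distance(60.0, 3)
```

In other words, a forward overlap below roughly two thirds cannot guarantee 3 views per feature even in this ideal case. Wind, terrain relief and features that fail to match push the practical requirement higher, which fits the experience with vegetation above.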

Interesting ideas:

- use two cameras instead of one, horizontally apart even by just a little bit. This will double the number of images and increase the chances to reproduce vegetation correctly (stereo imagery without the "snapped at the same time" constraint).

- variable speeds and CAM_TRIGG_DIST for a mission? When over simple geometry speed up, when over complex geometry slow down to improve the match quality.
