Continuing on from the last post on this subject, here's a complete sparse point cloud generated from some 40 images. In the two-view case it became apparent that you can triangulate points from two images. Two-view matching sometimes produces inaccurate or incorrect matches, which lead to outliers. If the feature is consistent and static, you can triangulate it from three or more views instead. Such 3+ view matches eliminate nearly all outliers, which leaves you with a sparse point cloud whose remaining errors are mostly inaccuracies due to (relatively rough) pixel measurements, incorrect distortion parameters, slight drifts in feature recognition, pose fitting errors, etc.
In this stage of processing, the sparse point cloud generation, the objective is to discover camera poses while adding new points to the cloud, so that future matches can take place and the cloud can grow. In this case, I use the point cloud itself to estimate future poses. For each 3D point, I maintain a list of which images contributed to that point. A new image that has matches with already registered images can then figure out which of its own features correspond to existing 3D points in the cloud. From that I simply build a list of 3D points and the 2D points they should correspond to. With that information I can figure out, based on how the 3D points appear in the image, where the camera ought to be located. So it's basically "triangulating backwards": knowing where the points are projected on the sensor in 2D, I work out where the sensor must have been.
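To make the "triangulating backwards" step concrete, here is a minimal sketch of camera resection from 3D-2D correspondences using a plain direct linear transform (DLT). This is an assumption on my part about how such a step can be done, not necessarily the solver used for the cloud above. The function name `resect_dlt` and its arguments are made up for illustration, the intrinsic matrix `K` is assumed known, and there is no outlier rejection, so it needs at least six clean, non-coplanar points.

```python
import numpy as np

def resect_dlt(points3d, points2d, K):
    """Estimate the camera pose (R, t) from N >= 6 3D-2D correspondences.

    points3d: (N, 3) world coordinates, points2d: (N, 2) pixel coordinates,
    K: (3, 3) intrinsic matrix (assumed known and distortion-free).
    Returns R, t such that a world point X maps to camera coordinates R @ X + t.
    """
    points3d = np.asarray(points3d, dtype=float)
    points2d = np.asarray(points2d, dtype=float)
    homog = np.hstack([points3d, np.ones((len(points3d), 1))])

    # Each correspondence gives two linear equations in the 12 entries of P.
    rows = []
    for Xh, (u, v) in zip(homog, points2d):
        rows.append(np.hstack([Xh, np.zeros(4), -u * Xh]))
        rows.append(np.hstack([np.zeros(4), Xh, -v * Xh]))

    # The projection matrix is the null vector of the stacked system.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    P = Vt[-1].reshape(3, 4)

    # Remove the intrinsics: K^-1 P = s * [R | t] for an unknown scale s.
    Rt = np.linalg.inv(K) @ P
    U, S, Vt2 = np.linalg.svd(Rt[:, :3])
    R = U @ Vt2                 # closest orthonormal matrix to s * R
    scale = S.mean()
    if np.linalg.det(R) < 0:    # s was negative: flip both R and the scale
        R, scale = -R, -scale
    t = Rt[:, 3] / scale
    return R, t
```

With real matches some correspondences will be wrong, so in practice a robust variant is the safer choice, for example a PnP solver wrapped in RANSAC such as OpenCV's `cv2.solvePnPRansac`.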
When I have the pose, I triangulate matches that I do not yet have in the cloud as new 3D points and grow the cloud a little.
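Growing the cloud then comes down to triangulating each new match from the poses that are now known. The following is a minimal sketch of linear (DLT) triangulation, assuming each registered image has a 3x4 projection matrix `P = K [R | t]`. The function name `triangulate` is hypothetical and the method is a textbook one, not necessarily what produced the cloud above.

```python
import numpy as np

def triangulate(projections, pixels):
    """Triangulate one 3D point from two or more views.

    projections: list of (3, 4) projection matrices, one per view.
    pixels: list of (u, v) observations of the same feature, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # The observed pixel must lie on the projection of X:
        # u * (P[2] @ X) = P[0] @ X and v * (P[2] @ X) = P[1] @ X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])

    # Least-squares solution of the homogeneous system via SVD.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]
    return X[:3] / X[3]
```

A third view simply adds two more rows to the system, which is what makes 3+ view points better constrained than two-view ones.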
The order in which you attempt to add cameras (images) to the cloud is important because the current state of the 3D point cloud determines how many points you have available for pose estimation. If that number is low, you may have very little or inaccurate information (outliers!) to do the pose estimation. If the pose is bad, the point cloud deteriorates and future poses cannot be determined.
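One simple way to act on this is to always register next the image that can see the most points already in the cloud. The sketch below assumes that heuristic; the data layout (a `feature_to_point` dictionary derived from the per-point image lists, and per-candidate match tuples) and the names `count_known_points` and `pick_next_image` are invented for illustration.

```python
def count_known_points(candidate_matches, feature_to_point):
    """Count distinct cloud points a candidate image can see.

    candidate_matches: list of (candidate_feature, registered_image,
        registered_feature) tuples from pairwise matching.
    feature_to_point: dict mapping (image_id, feature_index) of a registered
        image to the id of the 3D point that feature contributed to.
    """
    seen = set()
    for _cand_feat, reg_img, reg_feat in candidate_matches:
        point_id = feature_to_point.get((reg_img, reg_feat))
        if point_id is not None:
            seen.add(point_id)
    return len(seen)

def pick_next_image(matches_by_candidate, feature_to_point, min_points=6):
    """Return (image_id, count) for the best candidate, or (None, count)
    if even the best one has too few 2D-3D correspondences."""
    best_img, best_count = None, 0
    for img, matches in matches_by_candidate.items():
        n = count_known_points(matches, feature_to_point)
        if n > best_count:
            best_img, best_count = img, n
    if best_count < min_points:
        return None, best_count
    return best_img, best_count
```

The `min_points` threshold is an arbitrary placeholder: the point is only that an image with too few correspondences is better deferred until the cloud has grown than registered with a poor pose.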
So, how does it work in more detail in a way that makes the solution stable?
This sparse point cloud, although crude, can already serve certain purposes. It still needs to be subjected to a process called "Bundle adjustment", where poses and 3D points are refined further on a global scale. The outcome of that improves the appearance of planar surfaces and further refines the camera poses.
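For reference, the quantity bundle adjustment usually minimizes is the total squared reprojection error over all observations, with both the poses and the 3D points free to move. Here is a minimal sketch of that cost, assuming a shared distortion-free intrinsic matrix `K`; the name `bundle_cost` and the observation layout are hypothetical, and the actual minimization (normally a sparse nonlinear least-squares solver) is left out.

```python
import numpy as np

def bundle_cost(K, poses, points3d, observations):
    """Sum of squared reprojection errors, in pixels squared.

    poses: list of (R, t) per camera, mapping world to camera coordinates.
    points3d: (N, 3) array of cloud points.
    observations: list of (camera_index, point_index, u, v) measurements.
    """
    cost = 0.0
    for cam_idx, pt_idx, u, v in observations:
        R, t = poses[cam_idx]
        x = K @ (R @ points3d[pt_idx] + t)   # project into the image
        du = x[0] / x[2] - u
        dv = x[1] / x[2] - v
        cost += du * du + dv * dv
    return cost
```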
So what does this teach us about collecting UAV data?
- use two cameras instead of one, spaced horizontally apart even by just a little bit. This will double the number of images and increase the chances of reproducing vegetation correctly (stereo imagery without the "snapped at the same time" constraint).
- variable speeds and CAM_TRIGG_DIST for a mission? Speed up over simple geometry and slow down over complex geometry to improve the match quality.