Gerard Toonstra's Posts

Opensourcing ESC work

Posted by Gerard Toonstra on July 12, 2015 at 6:00am

So almost a year ago, I posted about how I started work on a field oriented control ESC:

Some people in the comments already said how they made their own or started work on different ESCs. I went through 5 iterations of the ESC after the post.

Unfortunately, I never got any hardware working reliably, and in the end it came down to pretty simple issues, which is frustrating. Primarily, I could never get the buck boosters of the TI DRV830x to work reliably and there were issues with SPI voltage and behavior. This caused a failure of 8-9 out of 10 boards or sometimes an entire batch. To produce one batch here is expensive: 60% import costs, $40 shipping for 2 boxes (one boards, one components) and then there's the currency rate of R$3 to the $1. In the end, to produce one batch here feels like spending $1200 if I were in the US.

Important: I'm not saying the booster of the chip doesn't work correctly, but that it's probably something in the process of my production (the "pizza oven bake") or the design of the component values around it or routing of the board.

Anyway, moving on... I did huge amounts of research into field oriented control. I built a FOC simulator in python and there's some 150MB of documents related to FOC implementations, ranging from presentations of TI about the subject to application notes, research papers, etc, plus I include some remnants of the board designs. Obviously, no point including boards of failing hardware, so just the schematics...

From the image above, all blocks at the top are really easy. They're only conversions from 3-axis to 2-axis models using Clarke/Park transforms. The difficult part is the "angle and rpm estimator", which is where the magic really happens. Stripping away all the complexity... it's trying to figure out back-EMF (bemf) magnitudes from the Ialpha and Ibeta currents (in a 2-axis view), so that over a couple of samples it can converge to an angle. Another piece of code then gets this angle estimate and figures out the rpm, which is fed back into the angle estimate, because the rpm doesn't change all that quickly. All that's left is tuning and the correct L and R inputs (inductance and resistance) for the motor you're using.
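To give an idea of how simple those conversion blocks are, here is a minimal sketch of the Clarke and Park transforms in Python. It is not the code from the repository; the function names and the amplitude-invariant scaling (the 2/3 factor) are assumptions chosen for illustration.

```python
import math

def clarke(i_a, i_b, i_c):
    """Amplitude-invariant Clarke transform: 3-phase currents -> (alpha, beta)."""
    i_alpha = (2.0 / 3.0) * (i_a - 0.5 * i_b - 0.5 * i_c)
    i_beta = (i_b - i_c) / math.sqrt(3.0)
    return i_alpha, i_beta

def park(i_alpha, i_beta, theta):
    """Park transform: rotate the stationary (alpha, beta) frame by the rotor angle theta."""
    i_d = i_alpha * math.cos(theta) + i_beta * math.sin(theta)
    i_q = -i_alpha * math.sin(theta) + i_beta * math.cos(theta)
    return i_d, i_q

def balanced_currents(amplitude, theta):
    """Hypothetical helper: ideal balanced 3-phase currents at electrical angle theta."""
    shift = 2.0 * math.pi / 3.0
    return (amplitude * math.cos(theta),
            amplitude * math.cos(theta - shift),
            amplitude * math.cos(theta + shift))

# With a perfect angle, the rotating currents collapse into constant d/q values.
theta = 0.7
i_alpha, i_beta = clarke(*balanced_currents(2.0, theta))
i_d, i_q = park(i_alpha, i_beta, theta)
print(f"alpha={i_alpha:.3f} beta={i_beta:.3f} d={i_d:.3f} q={i_q:.3f}")
```

With ideal balanced currents and the exact angle, d equals the current amplitude and q is zero. In a sensorless ESC the angle is only an estimate, which is why the estimator is the hard part.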

The repository is here, along with a README on where to find the stuff.

https://github.com/gtoonstra/foc_esc

Disclaimer: No guarantees that any of this stuff actually works...

Experiments with an industrial USB webcam

Posted by Gerard Toonstra on July 3, 2015 at 3:51pm

Above is a fullsize image of a 5MP webcam. A webcam usually has a CMOS sensor and commercial ones support up to 720p resolution. 720p is a result of the usb bus bandwidth available to stream the images and since many computers still use USB 2.0, this is the maximum resolution for reasonable framerates of 25fps or so. At USB 3.0 you can achieve 30fps with uncompressed data, or 1080p@15. With MJPEG streams this increases to 55fps/30fps respectively with "full size" resolutions somewhere between 2-10fps.
I have a raspi and a piNoir cam and I was disappointed by the severe amount of rolling shutter. This is an industrial USB cam with an M12 lens mount.

I created this image by just waving the camera module around a lot. Notice the umbrella. A uav probably doesn't have such extreme movements, but there will be vibration involved.

The chip that's onboard is a Cypress CX3. The CX3 is an ARM processor with multi-lane image sensor readout capability, a USB 3.0 interface and a development environment that facilitates development of everything that you can stream over USB. The chip only has 512kB of memory for code, data, stack etc., yet the JPEG images (or YUV) are at least 1MB, so there's no way to buffer the image data. Uncompressed it's more than twice that and for encoding you need a bit of memory too. This means that data flows through the following pipeline:

Host <--USB--> CX3 <--MIPI--> sensor

So it's probably unable to stream the data out quickly enough, which means it needs to hold off on reading more lines of data from the sensor. That's a theory, because I'm not sure how this process works in detail. It is likely though that using USB 3.0 will reduce the effect; USB 2.0 is 480 Mb/s and 3.0 is 5 Gb/s (not all of which is usable for data, there's a lot of framing overhead).
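A rough back-of-the-envelope check of that theory, as a sketch only. The sensor resolution (2592x1944, a common 5MP format) and 2 bytes per pixel (YUV 4:2:2) are assumptions, and the link rates are the raw signalling rates without any protocol overhead, so real numbers will be worse.

```python
USB2_BPS = 480e6  # raw USB 2.0 signalling rate, bits per second
USB3_BPS = 5e9    # raw USB 3.0 signalling rate, bits per second

def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel

def transfer_time_s(n_bytes, link_bits_per_s, efficiency=1.0):
    """Best-case time to push n_bytes over the link; efficiency < 1 models overhead."""
    return n_bytes * 8 / (link_bits_per_s * efficiency)

# Assumed frame format: 2592x1944 pixels, 2 bytes per pixel.
size = frame_bytes(2592, 1944, 2)
for name, bps in (("USB 2.0", USB2_BPS), ("USB 3.0", USB3_BPS)):
    t = transfer_time_s(size, bps)
    print(f"{name}: {size / 1e6:.1f} MB per frame, "
          f"at least {t * 1000:.0f} ms to transfer, at most {1 / t:.1f} fps")
```

Under these assumptions a frame is about 10 MB, which takes roughly 168 ms to move over USB 2.0 and about 16 ms over USB 3.0 in the best case. If the sensor readout is paced by the USB transfer, that transfer time is roughly the window over which the rolling shutter smears the image.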

These cameras also have an auto-managed shutter time. It sounds great if they claim it achieves 1/1600, but given the pipeline above, that only applies to the lines being read at the moment and since they're not read fast enough, you still have huge rolling shutter effects. Those effects are difficult to remove, because if vibration is involved or you have non-constant angular velocity, it's not a simple operation.

So why the interest in this camera?

It's only 20g and very small, very easy to attach
The lens is very easily replaceable; IR, IR-cut, fov, filters, gels, etc.
It has a GPIO pin to initiate the capture and this is accurate. You literally "arm" the camera
It has a GPIO pin that apparently indicates when a capture is taken for "strobe" use.
It can take up to 8 frames per second.
Two of these cameras can be hooked up and shoot together with 1ms precision.

The disadvantages:

You need a host computer with USB 3.0 interface. Not sure if embedded systems are fast enough to process the data and this must fit on the vehicle and be light.
It's unlikely that an embedded system exists with two independent 3.0 usb hubs.
The shutter speed is probably not as good as a P&S camera, which impacts blur.

There's also a 13MP usb cam, but this would increase the data to be transferred by a factor of 3, increasing the rolling shutter.

Probably the better choice is just grabbing a gopro, exchanging the lens and then storing the images onboard. The gopro includes a processor and probably temporary memory for processing. Anything custom built is unlikely to be lighter than 90g.

The P&S cam probably reads the sensor as fast as it can with fast memory in between for temporary storage, which is read out and processed prior to storing on an SD card. Perhaps an expert can chime in here.

There are interesting CMOS cameras that use a mechanical shutter and special CMOS sensors that hold their charge for much longer. This means that the sensor is exposed at the same time, but the readout process (in a dark area) can then take a bit longer. There are also the global shutter CMOS sensors. However, those are $3k or so without camera body.

Project Ion charter

Posted by Gerard Toonstra on June 7, 2015 at 1:57pm

I'd like to invite the diydrones members to participate in a project I started on github: Project Ion on github

The control of uav's has improved a whole lot in the past couple of years and you can now pretty much put one in the air and wait for the results to come back, whether these are videos or images. There are numerous posts and videos of people who show how they made orthomosaics, 3D models, point clouds and digital elevation models.

I started Project Ion with that motivation in mind; thinking from the perspective of your customer (or your own needs), how do you grab the orthomosaic / DEM you just created and start to answer the real questions that end users have, and answer those questions in a visually compelling way? Not every office contracts a bunch of engineers to work with ortho and DEM data.

The work above was done with a team of experienced surveyors. In this case I played a supporting role to provide them with the orthomosaic, which was for visualization purposes. They used the geo-referenced ortho themselves to validate their collected data points. Since I have my own methods to derive an accurate set of control points, you get two disjoint data sets which have no dependency on one another. Here, I'm just demonstrating how the ortho could be annotated with help of some of their survey points to better visualize the land and the boundaries. In many cases, surveyors provide an iconic diagram of the land made in autocad and they don't have access to a recent ortho. So it makes sense to also think towards support roles for uav in surveys, where the orthomosaic is just used as a means of cross validation and visualization of the results.

Another thing that came up is that the area had recently been leveled. They were interested in the accuracy of this leveling. With a contour map, you can easily process the DEM to extract these contour lines at any interval. Of course, everything can be exported to CAD tools for further processing.

All images above were made with the help of QGIS, now starting to become a very powerful GIS application with a bunch of plugins for many different purposes.

So I'm looking for contributors to set up this knowledge base. Not everything has to be written as a tutorial on the wiki. The idea is to link to existing content, link to videos and just show what others have done, generate ideas and make it a wiki that you can browse to find really cool applications that go further than orthomosaics from pix4d/photoscan, then explain how such results are achieved, etc. I'm not excluding that the project develops some of its own scripts and tools to facilitate that process.

How to become a collaborator? Send me a privmsg on diydrones if you're interested and your github username so I can add you as a collaborator on github. If you don't have a github account and want to send me a cool link to some video or tutorial on how uav data is being used or applied, feel free to send this in a privmsg as well.

Cheap 1-2cm (scale) accuracy for your surveys

Posted by Gerard Toonstra on April 21, 2015 at 7:46pm

Image: Topcon RTK collecting survey points

High precision GPS systems are getting more known and used and RTK is already becoming a familiar acronym. This post is about the use of RTK for collecting ground control points to be used in high quality survey results, not about RTK for navigation use (that's a lot harder). What I was looking for is a method to collect about 5 control points for surveys in a way that is cost effective, but also doesn't cost me a lot of time. I'm working in the tropics, so standing 30 minutes in the sun and dust is a real chore and not very comfortable. Here I describe how I do this using low-cost L1 GPS modules that cost ~$50 each with an additional ~$150 for a recording device and a good antenna.

If I were to use the above RTK station, I would get really good accuracy (L1+L2), but I would also spend > $20k that I'd have to recover from business income and I'd have to spend 5-10 minutes per point to get an accurate fix, since I only had one station. With many modules you also depend on a proprietary radio link, so you'd eventually have to buy two if you need to set up your own reference station somewhere.

First, let's consider there are two kinds of accuracy:

1 Absolute accuracy; the difference between measured and actual coordinates of the point in world space.
2 Relative accuracy; the difference between measured and actual offset of one point and a reference point in the survey area.

The second characteristic by itself is already extremely useful for high precision volume and distance measurements. All professional surveys have this characteristic, which is essentially the characteristic of correct scale. Absolute accuracy is only really needed when you want to exchange this data with other systems and/or use different data sets together (now and in the future). The absolute accuracy of a data set can always be improved after the fact by simply applying a single global offset to all the data or control points in 3D coordinates (meters, inch, whatever).

In this approach, we're going to store the raw observations in a file to be processed together at a later point in time. RTK GPS has a plethora of online discussions and examples where you require some kind of radio or wifi link between modules, but this is not actually a prerequisite of the technology. Having files is much better, because you can work out the exact configuration and settings in the office and so you won't make any configuration mistakes in the field. Also, a radio/wifi link is extra interference for your uav, and not to forget for the GPS module itself. And if those links drop out for any reason, you lose all the data and have to come back again.
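As a minimal sketch of that last point: if the points are expressed in a local metric frame (east, north, up), fixing the absolute position later is just adding one offset to everything. The function name and the sample coordinates below are made up for illustration.

```python
import math

def apply_global_offset(points, measured_ref, true_ref):
    """Shift every point by the same 3D offset (true_ref - measured_ref)."""
    offset = tuple(t - m for t, m in zip(true_ref, measured_ref))
    return [tuple(c + o for c, o in zip(p, offset)) for p in points]

# Hypothetical control points in meters (east, north, up), relative to the reference.
points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.5), (0.0, 20.0, -0.3)]
measured_ref = points[0]
true_ref = (0.42, -0.31, 0.15)  # better absolute position obtained afterwards

corrected = apply_global_offset(points, measured_ref, true_ref)

# Distances between points are untouched, so scale and volumes stay valid.
before = math.dist(points[0], points[1])
after = math.dist(corrected[0], corrected[1])
print(f"distance before: {before:.4f} m, after: {after:.4f} m")
```

The relative geometry does not change, only where the whole set sits in the world.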

What you record is raw GPS output data, straight from the module, so there's little that can go wrong. You get the best results if the frequency for all modules is equal, because sometimes the tools get confused if there's an output frequency mismatch. Since modules are set up as static, I just set this frequency to 1Hz for all modules. I myself purchased some ~$50 modules (NEO-6T from emlid), put them in the field with any device that can read the modules (uart/usb) and record the data to an HD, SD, etc. If all modules were recording in the same time period, you can post-process the raw GPS logs from the comfort of your office.

I use RTKLIB to do all that: http://www.rtklib.com/

The process is:
1. select the dataset to be used as reference, call it REF
2. convert that to RINEX. Make sure to use the correct data source format.

optional:
To get high accuracy absolute positions from the field if you do not have an accurate reference yet, you can use PPP to derive its position. This will give a higher absolute accuracy than a cellphone:
3. download the "SP3" file from the IGS data repository (available 3hrs-24hrs after the survey): https://igscb.jpl.nasa.gov/components/prods_cb.html
4. post-process the reference point REF using "PPP static", "Precise" solution, "Iono estimate" and "ZTD estimate" and select the SP3 file together with the NAV and SBS file using rtkpost.exe. You can also use the CLK file, but in my case it actually hurt results. Some more research there is needed.
5. if you have ~30 mins of data, the processed solution should be accurate to about 70cm (absolute) and probably is better than a meter. If you need better accuracy, the only way to improve it with PPP is to keep the module in the field longer (6-24 hours). After that, it's very much a black art what gives the best results, but some people report you should expect ~10cm. That's still not the 1-2cm you get for relative accuracy.

6. Use either the calculated position or the one from your cellphone. Let's call that REF.
7. select the dataset (one of the other modules) to be used as "rover", let's call this ROVER here.
8. convert ROVER dataset to "RINEX" data format using the rtkconv.exe program. Make sure to set the right source data format (of the source file).
9. then use rtkpost.exe to post-process both files, applying the settings you'd use in rtknavi.exe and process away. You should insert the position from REF in the "antenna position" now in the settings dialog and use "static" as the mechanism for post-processing, probably with "continuous" instead of "fix and hold".
10. Eventually there should be a fix for a long period of time. You can then average that out or see how much the variation is and not bother further. It should be 1-2cm when modules were not far apart.
11. Repeat from step 7 for the other modules.
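The averaging in step 10 can be sketched as below. This is not part of RTKLIB; it assumes you have already parsed the solution file into (east, north, up, quality) tuples in meters, and it assumes quality flag 1 marks a fixed solution, so check the header of your own solution file before relying on that.

```python
from statistics import mean, pstdev

FIX = 1  # assumed quality flag for a fixed (integer ambiguity) solution

def summarize_fix(samples, quality=FIX):
    """Average the fixed-solution epochs and report the spread per axis."""
    fixed = [s[:3] for s in samples if s[3] == quality]
    if not fixed:
        raise ValueError("no fixed-solution epochs in the data")
    columns = list(zip(*fixed))  # one tuple per axis: east, north, up
    average = tuple(mean(c) for c in columns)
    spread = tuple(pstdev(c) for c in columns)
    return average, spread

# Hypothetical epochs: two fixed solutions and one float solution that gets ignored.
samples = [
    (10.00, 5.00, 1.00, 1),
    (10.02, 5.00, 1.00, 1),
    (12.00, 7.00, 3.00, 2),
]
average, spread = summarize_fix(samples)
print("average (e, n, u):", average)
print("std dev (e, n, u):", spread)
```

If the spread is already in the 1-2cm range there is little point in doing anything fancier than the plain average.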

This method is cost- and time-effective. With 5 modules, you'd spend $1000 (raspi+GPS+good antenna ~= $200) to be able to measure 5 control points at the same time without waiting 5-10 minutes for each one. You can drop them in the field prior to a mission, turn them on, walk back and execute the uav flight. If that took about 30 minutes to complete, you can already retrieve the modules and go back to the office for post-processing.

Here's a nice tutorial about PPP with rtklib, but some options only make sense when you have L1+L2 data:

http://blog.latitude51.ca/rtklib-part-3-precise-point-positioning-with-igs-products/

I tested the above processing using a couple of well known reference positions of the local university, so the numbers of 1-2cm precision for RTK and 70cm for PPP static come from those tests. However, all tests were executed in one afternoon, so they don't cover much variation in weather, ionosphere and solar activity. Actual results in the field will therefore vary, and vary per day too.

Work domain analysis and cognitive systems

Posted by Gerard Toonstra on February 11, 2015 at 6:00pm

I used to work as a humble research engineer/scientific developer for a research lab in the Netherlands. It's how I got into this business eventually. In that environment I also got introduced to the various topics related to cognitive science, interface design, cognitive workload, work domain analysis and how this work typically generates complicated technological requirements that usually require new algorithms or functions, leading to systems that look and feel different than if they were designed by an engineer looking at the technical features of a device.

One text that circulated there is still a very interesting read. It presents a view on drones, their control systems and the way you interact with them from a different perspective. The usual perspective of (mostly) technical engineers on this site is how the functions map to a hardware architecture and/or how the software is modularized. You then compartmentalize specific functions to those areas and you present the concept.

This post serves to present this alternative, more cognitive perspective. We first have to realize that the overall system is not limited to the drone and the ground control station. The system's boundaries include the operator, pilot, engineer and any other person involved and should also include how well they are trained to communicate together.

An interesting paper to read on this subject is this one, produced by a colleague working in that lab:

http://repository.tudelft.nl/assets/uuid:0af8c4fa-5e6f-4328-b933-0e6af240ea99/Amelink_PhD_thesis.pdf

A key item in this paper is this graph, which demonstrates how you can consider "flight" from the perspective of cognitive systems. The technique used to derive this graph is called "Work Domain Analysis" and it's more of a cognitive analysis of the work at hand than a design that attempts to identify and locate where processes go in an architecture.

Because we look at this work domain from a cognitive systems perspective, it applies to drones, airplanes, manual flight, automated flight, etc., because the technique allows you to swap out human cognition with automated systems and vice-versa. So the only thing that really changes is whether a machine is doing it or a human. This makes it easier to figure out what the tasks of this automated system are, and it also allows you to determine what skills are needed to make sense of the input and output of these automated systems, so you can identify what the better user interfaces are, or figure out in which cases they are more applicable. In other words... "work" here doesn't mean energy, but cognitive functions that either a human being or an automated system may provide.

So let's apply the graph above to a pilot in a cessna...

1. At the lowest level of "flight" you have the physical characteristics of the airplane. This analyzes the available sensors, the fuel capacity, the surface control, motor, propeller, energy consumption to maintain flight, etc. The pilot in this case has an awareness of the physical limitations of the airplane, which is important at the higher levels for mission planning and during piloting the flight envelope, etc. Basically, this lowest level provides the capability of flight, but only when knowledge or control systems are applied will the airplane become airborne.
2. So the next level is "flight control". Here we consider issues that are of immediate concern when it comes to staying in the air, efficiency, reliability, staying well inside the flight envelope, wind estimation, gusts, etc. Basically, you could say that the pilot can now effectively fly, but he doesn't yet know how to go anywhere useful. The lower level of flight has a high impact here, because it defines the constraints at this level.
3. The "Aviation" layer is located here, which is an intermediate layer not yet generally implemented on hardware. This aviation layer is about flying safely considering environmental constraints like buildings, church towers, trees. During planning you don't generally have information available at this required density, so the aviation layer is the pilot slightly deviating from a navigation plan to meet those constraints.
4. The "navigate" layer is where you consider the cognitive functions necessary for mission planning. You look at no-fly zones, flight altitudes and start planning your flight through the air. It's possible to do this without a map if you have a cognitive map of the area and know where things are located in space. The 'navigate' layer is about planning where to go and tracking that it does indeed happen that way.
5. The "mission" layer is where you determine what needs to be done. In this case it's the cessna pilot talking to his client where he wants to go.
6. The "joint mission" layer is where you have different pilots executing two different missions and coordinating how they work together to achieve those goals.

What are some interesting observations after this analysis?

When the cessna is in a direct emergency situation, the "navigate" and "mission" layers become much less important. The flight control and aviate layers suddenly become the only active layers in this system, because the pilot is only concerned with putting the craft down anywhere acceptable. His immediate "planning horizon" is therefore significantly different. So this is an example of how this analysis helps to serve cognitive requirements for ground control stations. If you receive through telemetry that a craft is about to go down, you can adjust the user interface to better cope with that situation. The mission that was planned has become totally obsolete and you want instead to switch to a camera view if you have one, have better control over the last position and allow different control mechanisms for the operator (auto switch to manual?) to get better control over the situation. In other cases, you may want to switch all your algorithms and automate the process of emergency landing (the easiest way out is to deploy a parachute, but you may want to determine the correct location to do this first).
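As a very small sketch of how a ground station could use this, the layers can be modelled as an ordered list and the user interface can ask which ones deserve attention right now. The names and the single emergency flag are assumptions for illustration, not something taken from the thesis.

```python
from enum import IntEnum

class Layer(IntEnum):
    """The layers from the analysis above, lowest first."""
    FLIGHT = 1
    FLIGHT_CONTROL = 2
    AVIATE = 3
    NAVIGATE = 4
    MISSION = 5
    JOINT_MISSION = 6

def layers_in_focus(emergency):
    """Return the layers the operator interface should emphasize."""
    if emergency:
        # Planning horizon shrinks: only staying in control and avoiding obstacles matter.
        return [Layer.FLIGHT_CONTROL, Layer.AVIATE]
    return list(Layer)

for state in (False, True):
    names = ", ".join(layer.name.lower() for layer in layers_in_focus(state))
    print(f"emergency={state}: {names}")
```

A real GCS would derive the emergency state from telemetry and would probably have more than two modes, but the idea of switching what is on screen per layer stays the same.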

How this also helps is to figure out what can be automated, what kind of impact this has on the situational awareness of the operator, or what kind of interfaces you should provide to make this automation effective. A very large factor that determines the effectiveness of automation is the ability of the operator to understand the reasoning process of the computation device and to interpret the results of that process. So if you develop a computer where you perform calculations that have actions in the real world, but then only put a LED on top that starts to blink when the computation is complete, you make a system that makes operators nervous. The challenge here is to visualize the results or how they were arrived at.

The idea is that you try to bother the operator with as little detail as possible, so you must find abstractions for complicated planning elements and insert higher level handles and tools for the operator to be able to influence that planning process. Another discussion that this work provokes is where to execute this automation process. Usually the network link we have with a drone is pretty limited in how much data can be transferred, so this limitation in bandwidth constrains how much of the reasoning process can be displayed to the user. As lessons in big data dictate, rather than trying to move all the data to where the computation happens, you need to bring the computation to the data instead. There are reasons why you wouldn't want a uav to be quite as autonomous as you can make it, because for reasons other than autonomous reasoning you may want the operator to have finer control over and insight into this planning process.

If you want to know more about how these techniques help you to analyze how a drone fits into your organization, I recommend going to the site of Gavan Lintern: http://www.cognitivesystemsdesign.net/ . He provides some tutorials after you register and they demonstrate how relatively easy it is to apply them. What you find out after you apply them is new insights into what kind of knowledge and instinct really is involved with the task at hand. Primarily, the objective is to figure out what the concerns are that people have during control tasks. Humans are pretty clever, so when you present them with a task they typically figure out instinctively what the constraints and affordances related to that task are and they learn to optimize towards those in a few cycles. This means they develop, over time, a particular skill for how fast or slow they can fly, how it feels before you get into a stall, etc. (this happens to be at the flight level). When you analyze these carefully, you usually find great opportunities for innovation or improved algorithms.

Figure out what the concerns are of people that have a lot of experience in one area and design your automated solutions around those!

Ground station survey results

Posted by Gerard Toonstra on February 1, 2015 at 4:36pm

It's been a week that the ground station survey is online and it received 82 responses. Not a whole lot for a forum like diydrones, but most companies only manage to interview 5-10 people. So the results of this survey should be considered a valuable resource.

In this analysis, I won't highlight the points that seem inconclusive, because they didn't get convincing results. Convincing is when there's more than 20% difference between disagreement and agreement. Before you complain about these definitions, the full data is publicly available at this link, so you can reinterpret the data any way you wish:

https://docs.google.com/forms/d/1h0LPVE2wBQ9M2ggtfrSf1tVHnUgg5ygf1msSuElUdyc/viewanalytics

https://docs.google.com/spreadsheets/d/1mSGOxo_MCRRILa5vDh2Ppu9JKFlt9FmepD8_BWBsQWo/edit#gid=119881673

52% states they use the GCS as a backup mechanism for controlling the aircraft. 26% controls the aircraft mostly through the ground station and 25% does something different.
76% says that their ground station allows them to do what they need.
56% vs. 30% says you need training to be able to use a ground station.
61% vs. 19% says that they are impacted by unreadable screens in sunny weather.
30% vs. 15% thinks the PFD is indispensable, but 60% is neutral. This could indicate people do not really care about the PFD being on screen during flight? Some people would probably complain here, because when you're in the plane you can be subject to loss of orientation and this would therefore require the PFD to be installed? Except for uav's, where the IMU doesn't get confused like humans do. In the case it *does* get confused, the reading is not reliable anyway, so you can't really use it then. For multirotors the results will be catastrophic, for fixedwings you need to be a good model pilot. So it's an interesting question to be asked that perhaps PFD's are only sensible in the config and setup stage to confirm correct IMU sensing?
65% vs. 10% says that speech and audible functions are indispensable. That is a convincing majority and this makes sense, given that a large number of users are both pilots and gcs operators or a mix thereof.
59% vs. 13% thinks that they shouldn't be forced to move the waypoints individually, but somehow do this at a higher level of conception.
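The "convincing" rule used to pick these results is easy to apply yourself if you download the data. A minimal sketch, with a made-up function name and the 20 point threshold from the definition above:

```python
def is_convincing(agree_pct, disagree_pct, threshold=20):
    """True when agreement and disagreement differ by more than the threshold (in points)."""
    return abs(agree_pct - disagree_pct) > threshold

# Two of the results listed above, as (agree %, disagree %).
results = {
    "training needed": (56, 30),
    "speech/audio indispensable": (65, 10),
}
for question, (agree, disagree) in results.items():
    verdict = "convincing" if is_convincing(agree, disagree) else "inconclusive"
    print(f"{question}: {agree}% vs. {disagree}% -> {verdict}")
```

Change the threshold if you think 20 points is too strict or too loose for a sample of 82 responses.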

Then some interesting statistics:

40% operates their uav as a one-man show and about 50% is a 2+ team. 16% claims to have rigidly defined roles.
In the control question asking how the roles are defined, we see the 40% plus another 20% ( of the 2+ people teams ), where the pilot seems to get the entire say and control over the GCS. That comes close to the expectation.
45% is happy with the way their GCS supports them in all phases of flight. 21% needs to learn about these options more and 34% thinks there's room for improvement.
In cases of events that their craft needs to go out of the way, 43% claims to be able to do this between 2-30 seconds. 20% takes over manually instead. 37% says they either don't need it, will take too long or haven't tested this in practice.
About 80% has only ever controlled 1 UAV by the GCS at the same time. 10% 2, 8% 3, and 4% 4-6. This is however a bit tricky to analyze, because this type of swarm control is an active area of research and not all GCS's allow for more than one uav at the same time to be controlled.
40% plans the flight with the customer indoors prior to the flight. 24% does this in the field in detail and 35% does most of the planning themselves, sometimes with a bit of gesturing by the customer. None of these styles listed are considered "best" by the way, because they highly depend on the industry area you're active in.

Then, to find out a bit more about respondents:

30% has 2-4 years in total related to modeling, uav's, aviation or the entire field.
34% has less than 2 years experience.
34% has more than 4 years experience.

Then the open question, where people can complain, add, or write some additional notes about the use of GCS's in general. Reducing them categorically:

General complaints about use in the field: screen readability, portability and making things easier for non-technical users, clipping on transmitter, dust on touch-screens, bulky mouse/pc, etc.
Switching tools depending on the environment: MP for inside, DP for outside, AP not used.
Permission settings on which GCS or user can control what.
The ability to keep mission-related stuff together, including cached maps, logs, etc.
User customization of screen elements (make them larger? placement, etc. ) and saving those customizations. Saving the last-known context, so you go straight back into the last thing you were doing.
Different views in the GCS for different roles: pilot, engineer, payload operator. Implies more instances.
POI, ROI and Follow me functions using flight data (specify less, better defaults?)
Run the GCS without having the uav connected yet, so less dependent on having the actual connection to the uav and being more agnostic on establishing the actual connection ( craft discovery? USB event trapping? fixing baud rates? or trying most likely baud rate and then if failure occurs, allow user to modify it? )
GCS should be inspired by game designers and figure out what's really needed on screen. More flexibility in defining the display.
GCS should be usable on iOS and touch-input devices with care taken about incorrect touches ("are you sure y/n?")
More (scripted?) automation on events. Think HDOP < x, battery (already there), etc.
Better payload and auxiliary control through GCS.
interfaces in 3D, better visualization for planning missions and landings.
Integration with R/C controls.
Make it easier to use for crash analysis and to figure out what went wrong.
Pre-flight check including parameter verification. Sounds like people actually want a "good" file on the GCS and then prior to flight verify the settings on the device are actually those ones. Peace of mind!
Manually constructing the landing procedure is too difficult.
Slow startup on more limited hardware in the field. Should be faster.

One comment attracted my interest, so I'll post it here in full:

GCS is broad term. I will speak specifically to DroidPlanner 2 and Mission Planner respectively. The fist offers limited functionality with regards to critical pilot info. ie HDOP. It seems like a tool built for the newbs to enjoy playing with technology rather than understanding the dynamics of UAV flights. Don't get me wrong it's a beautiful fluid interface, just wish I had full parameter control (w/o bugginess) and the ability to add certain reads to the display. MP is a robust toolbox that I bring out to the field because I trust it 100%. My wish is that UI/UX would be rethought. For example, I like to verify parameters before flight. At the moment this is too many clicks away and buried under heavy mouse behavior. If I could, I would integrate a customizable pre-flight check function into MP that would bring up in series specific parts of MP that give me peace of mind prior to take off. Dronedeploy is thinking about this with their system, but in a bit too controlled manner with not enough info. I've voiced some of these things to a local crew here who's developing and browser GCS. http://www.flyroutinely.com/

So these are the results of the survey... thanks all for participating; the responses gave me some interesting insights into what lives in the minds of uav operators. I hope this survey contributes to making existing ground stations better, so we can make future flights safer and keep our operators better informed about the status and intention of our aircraft.

Ground station survey

Posted by Gerard Toonstra on January 26, 2015 at 6:12am

Ground stations are the means you have available in the field to interact with your autonomous vehicle, so they're a very important part of the toolset to conduct your work or hobby. Ground stations as we know them now are not just there to support planning or executing the flight; they also contain functions for initial setup, configuration of sensors and of the parameters that manipulate their behavior, and post-flight analysis tools.

I wrote a public, anonymous survey to learn more about how people experience ground stations in general. Are they easy to use? Do many people just use them to log the flight data? How many people are in your team? How do you divide the roles?

There are ten questions, most multiple choice, so shouldn't take too long. Looking forward to the responses.

In good opensource fashion, the detailed survey results are also open, so when you complete the survey, you get a link to download the results yourself (so far, of course). My intention is to run this survey for about a week, which is about the time a blog post sort of disappears into the archives. So this survey will close Sunday evening, 1st of February, at 24h UTC. After that, I'll get the results and make a new blog post on the findings and the numbers.

The survey is available here:

https://docs.google.com/forms/d/1h0LPVE2wBQ9M2ggtfrSf1tVHnUgg5ygf1msSuElUdyc/viewform?usp=send_form

Multirotor testing device first prototype

Posted by Gerard Toonstra on January 17, 2015 at 5:00am

This is a tryout prototype of a multirotor testing device. Someone else built it for me actually who normally works with doors and heavier stuff, so it got bulky in the end and the rings gain too much momentum to be anything comparable to a multirotor this size. All connection points are regular bearings: inner ring is roll, second ring is pitch and third ring is yaw. The outer ring is support and should be mounted upright.

This is not shockingly new, the dragonfly indiegogo project used a tether for their testing, as seen in this video:

http://vimeo.com/105405677

This model is intended to be used for testing controller boards with motors that you'd typically put on miniquads and 5x3 props. So thrust is a bit tougher. Some improvements that need to be made are swapping the alu material for carbon and ideally making the rings round with the edge facing the direction of travel (not flat as seen here) to reduce wind drag. A slightly better method of mounting the rings is for the inner ring to rotate around the Z axis. In the current construction method, if the vehicle has some angle in either pitch or roll, the yawing forces are reduced. Such an inner rotating ring is a lot more complex to construct, so that's probably not going to happen.

Such a device offers the following benefits:

It avoids damage if the firmware isn't fully tested
You can test the controller in any weather condition
It's less likely to hurt yourself (holding the model to see how it behaves, etc)
If yaw is not used, you can hook it all up to a power supply to reduce battery dependencies and you can debug the controller without having to hold or constrain the model.
Can be used for training.

In the next prototype I intend to use a square outer support and mount this on a flexible base with 5 pressure sensors: one at each quadrant and one below. The pressure measured corresponds to the force the quad generates in that direction, which a script can translate into acceleration and eventually velocity. The acceleration and speed data, together with attitude, can then be sent to a flight simulator to test the behavior of the controller in a simulated environment, while the GPS and air pressure data need to be overridden in the controller through telemetry. The idea is to reduce the complexity of a HITL simulation, so you don't need to simulate the complex action of the props anymore and you can use a very simple UFO aircraft to test the navigation code and performance.
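
As a rough idea of what such a script might do, here is a minimal sketch for a single axis. It assumes the sensor readings have already been converted to a net force in newtons, that the vehicle mass is known, and it uses plain forward-Euler integration; none of this exists yet, so the function name and numbers are made up.

```python
def integrate_forces(force_samples, mass_kg, dt):
    """Turn net force samples (N) on one axis into acceleration and velocity."""
    velocity = 0.0
    accelerations, velocities = [], []
    for force in force_samples:
        accel = force / mass_kg   # Newton's second law: a = F / m
        velocity += accel * dt    # forward-Euler integration step
        accelerations.append(accel)
        velocities.append(velocity)
    return accelerations, velocities


# Made-up readings: push forward for two samples, coast, then brake.
forces = [1.0, 1.0, 0.0, -1.0]
acc, vel = integrate_forces(forces, mass_kg=0.5, dt=0.1)
print(acc)
print(vel)
```

A real version would need one of these per axis and some drift correction, since integrating noisy force readings accumulates error quickly.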

I can share design files with people who want to collaborate on this.

Lumix CM1 camera, a step up in quality for surveys?

Posted by Gerard Toonstra on January 6, 2015 at 10:49am

This camera may be able to improve the quality of shots taken from drones for those jobs where you need the ultimate in precision and texture quality ( I think? ). This is, according to the manufacturer, a camera with smartphone capabilities. It has a Leica Lens, 1" sensor and takes 20MP photos and will go for $1,000 in the US. It weighs 206g, so slightly more than a Canon. I think the benefit is that it doesn't have a zoom lens anymore, so it should be easier to keep the focus.

What remains is to figure out how well the battery lasts and how quickly it can snap pictures and maintain the correct exposure when you fly over land with different reflectivity. This thing having android, I see great opportunities to make it simpler to snap pics, using even for example the 3G connection so you can control it from the ground station. Of course, GPS should be included already.

If you're at CES, give us some feedback on your experiences with the phone when it comes to snapping speeds and exposure compensation :)

Redesigning multirotor ESC's

Posted by Gerard Toonstra on October 21, 2014 at 6:18am

It's been quiet on my front, but that was because I was redesigning ESC's (for multirotors and AP's).

Most of the ESC's for multirotors use the SimonK firmware on a relatively simple Atmel microcontroller. There's a single control wire running from the autopilot to the ESC, carrying a signal that proportionally dictates how long the mosfets are left open and as such commands the torque on the motors.

And that's pretty much all there is to an ESC... No signal/wire coming back to tell the autopilot how that particular motor is doing or what the rpm or current is, it's just a "command wire". That sounds a bit antiquated for 2014.

So this picture is of an ESC dev board I first started on, here using the Allegro A4960 chip for simplicity. Shipping to Brazil takes time, so before it arrived the design already morphed into something new, so that's why the board looks unused. Both the MCU and driver chip changed on the newest development board version and I introduced testing points for oscilloscope readings; this project is about to get serious!

What are the features that I think an ESC for a modern multirotor should have?

1. Send the rpm back to the AP; for logging. I see people posting logs to request help figuring out what went wrong, but the log only states the "pwm out" for each motor, which is in no way a guarantee that the motor actually did that. So we need some feedback that states what the motor was actually doing, not what it was commanded to do.
2. Overload detection; the ESC's know what the current is and warn for overload situations.
3. Current & velocity control; neither current nor rpm is actively controlled as a proportional measure to the input PWM signal. So the control loop for the AP spans the IMU, motors, ESC and props, which is a large loop with lots of variables. This ESC will run one or two 'inner loops' and become responsible for achieving either torque or lift and run at a much higher frequency than 500Hz. What you get is that some variables no longer impact the control loop of the AP directly, which should make the vehicle more stable and likely more responsive.
4. Field Oriented Control; The flyback diodes next to mosfets typically burn energy in trapezoidal drive implementations, which increases the heat on those mosfets. This happens because the mosfets close suddenly. The motor coil wants to resist that change, so you have a current that has nowhere to go except through that diode. In sinusoidal control, there's always one mosfet open for any coil, so the current always has somewhere to go, which means the flyback diodes won't get used, so you don't lose the heat.
5. FOC; better efficiency, because the current is always perpendicular to the magnetic field. This may come at the cost of max. torque (related to motor inductance and then only about 5%).
6. FOC; lower torque ripple (1/2-1/3) vs. trapezoidal drive, so hopefully less vibrations, less whistling.
7. Send current readings back to the AP; another opportunity for precisely logging what goes on near the motors. This could be helpful to detect ESC/motor/prop health (bad bearings, prop drag, etc)
8. Configuration; the AP can reconfigure ESC's prior to flight or when in maintenance to tune it for a specific motor.
9. Motor monitoring; if the motor stopped, shorted or the mosfets misbehaved, the ESC can shut down immediately and advise the AP. The AP can then take additional action.
10. Opportunities for automated ESC tuning specific to the motor/prop in use.
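
For the FOC items, the core trick is to look at the three phase currents in a frame that rotates with the rotor. A minimal sketch of the standard Clarke and Park transforms (amplitude-invariant form) is below; the currents are synthetic and the rotor angle is assumed to be known, which on a real sensorless ESC is exactly the hard part.

```python
import math


def clarke(ia, ib, ic):
    """Three phase currents -> stationary 2-axis (alpha, beta) frame."""
    alpha = (2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    beta = (ib - ic) / math.sqrt(3.0)
    return alpha, beta


def park(alpha, beta, theta):
    """Stationary (alpha, beta) -> rotating (d, q) frame at rotor angle theta."""
    d = alpha * math.cos(theta) + beta * math.sin(theta)
    q = -alpha * math.sin(theta) + beta * math.cos(theta)
    return d, q


# Synthetic balanced currents placed 90 degrees ahead of the rotor field.
theta = 0.7  # assumed rotor electrical angle in radians
ia = -math.sin(theta)
ib = -math.sin(theta - 2.0 * math.pi / 3.0)
ic = -math.sin(theta + 2.0 * math.pi / 3.0)

d, q = park(*clarke(ia, ib, ic), theta)
print(round(d, 6), round(q, 6))
```

When the current is perpendicular to the field, as in item 5, all of it shows up on the q axis and the d component is zero, so the inner loops can regulate two slowly varying numbers instead of three sine waves.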

The way I see this ESC make a difference is when abnormal situations occur. The current AP's cannot be informed of failure, so they would simply send a signal to "run faster", which, guaranteed, has a disastrous effect on the mosfet or motor and could therefore worsen the situation. As soon as the AP is informed something is wrong, it could sound an alarm, activate a chute, disable the counter motor... you suddenly have options!

To spur innovation in this area, I'm considering setting up a Kickstarter and actually manufacturing around 1,000 or so at a professional PCB house. Aren't these features indispensable for an ESC made in 2014? Would you back it?

From survey to visualization

Posted by Gerard Toonstra on July 16, 2014 at 6:25pm

So I had some time to look into the 3D reconstruction and visualization some more and managed to make significant improvements to the workflow. In the first results I didn't clean up points and applied the texture to the entire model in one go, straight from generation -> blender. This makes the UV atlas ( a parametrized texture, where it decides which surface gets what ) really fragmented and you end up with a very small number of useful pixels in the texture, losing about 50% of useful space. Increasing this to 2 textures doesn't really help that much.

So today I took a different approach: spend one day cleaning up the model, splitting up the data and iterate towards incrementally better results. My workflow now uses the following:

Generate the dense point cloud. I used "medium" settings, as high would get me 54M points and I don't have that much memory to post-process those results. I work with around 16M points.
Delete stray points. More time spent here improves mesh results because you generate fewer surfaces that are parametrized in the UV atlas. So try to get rid of all. Make sure to delete points under and above the mesh. Try to delete points inside buildings too (which are not visible and generated in an attempt to reconstruct the roof).
Remove trees and vegetation. If you want trees, you should recreate them using a 3D package. They never look very good from survey results. This does create some gaping holes in the ground, because the trees consumed ground detail from the photos.
( close gaps where trees used to be? ).
Classify the data points to separate the meshes you're going to create. Separating the meshes really helps to reduce the wasted space in the UV atlas (the texture). Ideally, you want to get rid of all vertical surfaces (surfaces > 45 deg) . Classification is done by selecting 'building' or 'ground' for each data point (unfortunately yes, each one).
Select all data points and classify them as ground.
Use the lasso tool to select an area around a building you want to classify and also select some points of the ground area around it. Classify them as building.
From a slightly oblique view, select the points near the building where you want the building edge and move that selection outwards. Reclassify the selected ground points as ground. From 2-3 different views around the building, you now have a clear separation of 'building' and ground.
Do this for all buildings.
Try to remove other vertical surfaces like cars and fences.
Build your mesh using only points classified as ground.
Build the texture for the mesh. For the ground, the "ortho" projection works well at 8192 if you don't have many vertical surfaces.
Export the ground model as obj or whatever you prefer.
Build your mesh using only points classified as buildings. This replaces your mesh, but not your dense point cloud.
Build the texture for this mesh too, but use "generic" projection instead and 8192.
Export the "buildings" model as obj or whatever you prefer.
Import buildings and ground into blender or some other 3D tool. I use 'cycles' in blender and need to "use nodes" in the material settings and select the texture.
I select my "all buildings" object and separate each building to make it a separate object instead. This allows me to hide stuff I'm not working on and makes it easier to edit results from different viewpoints. Then I activate the building I work on, the ground and usually start with a new cube that I try to fit into the building. From there I split the cube into fragments to extrude and push back surfaces and iteratively work to add more detail to the mesh, trying to make it as much as possible like the original. Obviously, the better data you have of the original, the better this works out. In a sense, the photoscan generated mesh is a 'proxy' for me to use as a source.
I then delete the generated object and keep the clean building. At this time you have as much detail as you could reasonably get, so the original building is never going to be needed again.
I then export each separate object, one-by-one from blender and import it back into photoscan.
It appears as a single mesh and I regenerate the texture once again for that building with the new geometry. This time I select a small texture size like 256/512/1024. That's fine for aerial visualizations, but it depends on the size of the building.
When the texture is rebuilt, I export the single building from photoscan again and reimport it into blender, substituting it for the one I had that had no textures. It now has a cool new texture with the original dirt and grime and stuff.
When all are done one-by-one, the scene looks amazing especially from a bit of a distance. The sharp corners look much better as well as the removal of the bubbly surfaces. It is possible though that the surface doesn't exactly fit the wall, so it may loop over to the roof or vice-versa. That's because it's always a bit of guesswork with geometry.

Further improvements can be made from there. You could decide to tweak the geometry a bit more.

It's possibly a good idea to classify the ground close to buildings not as ground but as something different. I found that if you have no ground points for the building, it sort of seems to wave upwards and you don't have a proper measure for the ground distance. You can see this happening in the video when it flies to the big building and you look at the little shed roofs near the bottom and the 2 buildings on the right. So it's a good idea to leave some ground points there. You'll want to create the mesh then with 2 groups so you get the ground points in each mesh.

Where can you take this? Well, the guys here used a helicopter and photoscan to generate a 3D model of a spot in South Africa called Sandton City. They used that model similar to how I worked; as proxies to make their own 3D models, but they modeled the buildings the way they wanted to get better control over how it can be disassembled. It also reduces the poly count. The exact spot, size and hull are pretty well defined by photogrammetry, so you can get an artist to work out the interior bits. You can see how this works in the "making of" video. They only took 9 weeks!

All these efforts are simply dwarfed by the efforts from AEROmetrex of course.

3D rendering of an area taken in nadir view

Posted by Gerard Toonstra on July 13, 2014 at 5:30pm

This is a short animation of the quality you should expect from about 220 images in nadir view over a 100x200 area. The texture is a single UV atlas of 8192 and at this size a little bit stretched when it comes to detail. Nevertheless, the mesh detail itself is very reasonable, showing humps and bumps of grass overgrowing the sides and areas with dead leaf and sand lining the trottoir bands. The mesh is already a little bit reduced to 1M polygons and it should be possible to go up to 3M without impacting rendering times too much.

It's clear that side shots are necessary to fill up the holes in the buildings and that it may even be necessary to remodel the simple structures in a 3D program altogether from a 3D approximation. Vegetation remains a problem. Together with the original photos it should be possible to come up with some stunning reproduction methods. I think in this case it's better to reduce the area a bit to maintain the texture detail.

Point cloud densification and orthophoto generation

Posted by Gerard Toonstra on April 24, 2014 at 5:20am

This is a continuation of previous blog posts about how 3D reconstructions work. In this image you see a dense reconstruction of the same data set used before. I removed points that have less than 3 matches. You can see gaps where there were trees. Vertical surfaces also have very little data because they are typically in the shade and therefore have poor gradients for matching.

An interesting detail is that I processed the same dataset in visualsfm as well and I actually got a lower-quality result there. The main issue with that output was that the radial distortion parameters weren't estimated correctly. Fortunately, you can recognize such cases, because the point cloud has an overall arcing shape. You can still see that also in this data there's a slight arcing effect, as the roof isn't 100% straight, it sort of arcs over.

When that happens, the densification process has much more difficulty in generating the point cloud and it only generates 30-40% of the points that it would generate in other cases. In other words, the better the estimation of camera parameters and the positions, the better the results of the densification. If you work with visualsfm, you should pay attention to that arcing effect.

As a final step, it's possible to generate an orthophoto from this data. There are actually two main strategies for creating orthophotos. One is through homographies and concatenating them together. A homography describes a relation between two photos that have overlap. You can then apply a transform to every image and insert the results in a work image, the final orthophoto. Such ortho's are iteratively created and when you apply blending to it and exposure compensation and the correct warping algorithm, you get results like this, similar to MS ICE:

It looks very nice, but there are mismatches in the data, for example the hedge at the bottom and the slight mismatch near the road and the shadow of the shed. You can also see how this method prunes data of low confidence and just leaves it out of the image. So if you have poor overlap or images with different exposures, you should expect poor matches and more artifacts in the end result with this method.
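
To make the homography strategy a bit more concrete: a homography is a 3x3 matrix, and warping an image into the work image means pushing every pixel coordinate through it. A minimal sketch of that mapping for a single point follows; the matrices are made up, and estimating them from feature matches is a separate step not shown here.

```python
def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (nested lists)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w  # divide by w to get back to pixel coordinates


# A pure shift: this photo sits 100 px right and 50 px down in the work image.
shift = [[1.0, 0.0, 100.0],
         [0.0, 1.0, 50.0],
         [0.0, 0.0, 1.0]]
shifted = apply_homography(shift, 10.0, 20.0)

# A small perspective term in the bottom row squeezes points as x grows.
perspective = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.001, 0.0, 1.0]]
squeezed = apply_homography(perspective, 100.0, 50.0)

print(shifted, squeezed)
```

Because one matrix has to explain the whole photo, anything that is not on the dominant plane (hedges, sheds, shadows on walls) lands slightly off, which is where mismatches of the kind described above come from.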

In the example below I ran the surfacing algorithm to generate a 3D mesh of the area and then reprojected the original photos on that mesh. The mesh is then viewed from an orthographic projection from above, hence "orthophoto". This strategy should theoretically yield better results because perspective issues are more reduced (it would simply project pixels of a wall on a vertical surface which later should be invisible). That only works though if the model has a good estimation of height and vertical surfaces. In practice vertical surfaces aren't reproduced very easily, as these are usually in shady areas, which makes matching difficult. You'd typically get blobby artifacts around such regions. Here you can also see that in areas where the height of the 3D model was estimated poorly, you see an overlap between images, because the pixels get projected on either side on the ground. The roof of the shed demonstrates that issue.

This image doesn't look as nice as the first one because it didn't yet apply a strategy for blending and exposure compensation. In areas where the 3D model is poor, you'd expect artifacts. This happens primarily near the borders.

As a final note... in a deliverable of data you typically also have a digital surface model. That can be created very easily from a 3D model as well by simply testing the height of the data at each pixel of the image. It is essentially a "depth map" of the data.
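
A minimal sketch of that idea, under some simplifying assumptions: instead of intersecting the mesh at each pixel, it drops 3D points (for example mesh vertices) into a grid and keeps the highest one per cell. The grid layout and the sample points are made up for illustration.

```python
def surface_model(points, cell_size, cols, rows):
    """Grid of the highest z per cell; None where no point fell in the cell."""
    dsm = [[None] * cols for _ in range(rows)]
    for x, y, z in points:
        col = int(x // cell_size)
        row = int(y // cell_size)
        if 0 <= col < cols and 0 <= row < rows:
            if dsm[row][col] is None or z > dsm[row][col]:
                dsm[row][col] = z  # keep the top surface only
    return dsm


points = [
    (0.2, 0.3, 1.0),
    (0.7, 0.1, 2.5),  # same cell as the first point, but higher
    (1.5, 0.5, 0.4),
    (0.4, 1.6, 3.0),
]
dsm = surface_model(points, cell_size=1.0, cols=2, rows=2)
for row in dsm:
    print(row)
```

Cells that stay empty correspond to the holes in the reconstruction and would need to be interpolated or flagged in a real deliverable.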

The European Commission has today proposed to set tough new standards to regulate the operations of civil drones

Posted by Gerard Toonstra on April 8, 2014 at 10:00am

The European Commission has today proposed to set tough new standards to regulate the operations of civil drones (or "remotely piloted aircraft systems" – RPAS). The new standards will cover safety, security, privacy, data protection, insurance and liability. The aim is to allow European industry to become a global leader in the market for this emerging technology, while at the same time ensuring that all the necessary safeguards are in place.

Civil drones are increasingly being used in Europe, in countries such as Sweden, France and the UK, in different sectors, but under a fragmented regulatory framework. Basic national safety rules apply, but the rules differ across the EU and a number of key safeguards are not addressed in a coherent way.

The new standards will cover the following areas:

Strict EU wide rules on safety authorisations.
Tough controls on privacy and data protection.
Controls to ensure security.
A clear framework for liability and insurance.
Streamlining R&D and supporting new industry.

http://europa.eu/rapid/press-release_IP-14-384_en.htm

Accuracy evaluation: is better than 5cm possible with uav's?

Posted by Gerard Toonstra on April 2, 2014 at 6:43am

We did a quick evaluation of how much accuracy we could achieve on all axes using a multirotor. We read many accuracy reports from fixed wings and these teach us that the planimetric accuracy (horizontal) is usually about 1x the ground sampling distance (GSD) (0.5 if you're really good and have better cameras) and that the vertical accuracy (Z) is usually 2-3 times the horizontal accuracy. That's only valid for some altitude ranges, the regular flight altitude for uav's between 80-150 meters. Forward velocity and trigger distance require a certain altitude to make it work.

Here we lowered the altitude from 80m to 40m and used a multirotor. We wanted to find out whether the vertical accuracy definitely would improve and hopefully establish a 1:1 relationship between vertical accuracy and GSD as well. The reason why vertical accuracy would improve steadily is because there's more perspective in images at lower altitude, so you pick up more height information in each image, which corresponds to better Z estimates.

In this example case we flew at 45 meters with a hexa at a speed of 3 m/s to get a high 85% forward overlap, making it more difficult for a wing to do the same. 211 photos were taken. The GSD produced is 1.41cm.

The photos were georeferenced using 5 marker points that were collected with high precision GPS equipment. The expectation is that when these GCP's are marked in the image, there's about a 0.5-1 pixel deviation, so it's expected that the error in marking GCP's is about 0.5-1 GSD as well. Sharper pictures and better markers reduce that error.

In this case we had 2 less accurate GCP's, so the planimetric accuracy of this dataset eventually became 1.7cm, slightly above 1* GSD. What we confirmed though is that we got a 1.8cm vertical accuracy for this set, (or rather, the residual error from the mathematical model).
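
To put those numbers next to the rules of thumb from the fixed wing reports, here is the arithmetic spelled out. It uses only the figures quoted in this post; the variable names are mine.

```python
gsd_cm = 1.41          # ground sampling distance of this flight
planimetric_cm = 1.7   # horizontal accuracy achieved
vertical_cm = 1.8      # vertical accuracy (residual of the model)

planimetric_ratio = planimetric_cm / gsd_cm            # roughly 1.21 x GSD
vertical_ratio = vertical_cm / gsd_cm                  # roughly 1.28 x GSD
vertical_vs_planimetric = vertical_cm / planimetric_cm  # roughly 1.06

# Fixed wing rule of thumb: horizontal ~ 1 x GSD, vertical 2-3 x horizontal.
rule_of_thumb_vertical_cm = (2 * gsd_cm, 3 * gsd_cm)   # 2.82 to 4.23 cm

print(f"planimetric: {planimetric_ratio:.2f} x GSD")
print(f"vertical:    {vertical_ratio:.2f} x GSD")
print(f"vertical / planimetric: {vertical_vs_planimetric:.2f}")
print(f"rule of thumb vertical: {rule_of_thumb_vertical_cm[0]:.2f}"
      f"-{rule_of_thumb_vertical_cm[1]:.2f} cm")
```

So the vertical residual here is close to 1:1 with the horizontal one, rather than the 2-3x you would expect at higher altitudes.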

This dataset could have been improved as follows:

Better marking of GCP's and more attention paid during GCP marking.
Sharper photos (better lenses).
Higher precision GPS.

In the end, the maximum accuracy that one should expect with this equipment is 1* the GSD and better equipment isn't going to make this magically happen. This accuracy isn't correlated to the real world, that would be a totally different exercise altogether.

Here are some detailed images of the point cloud from close up. Notice the vertical detail of the irregular curb.

And the detail of a house. The planar surfaces aren't warped, a good indication of excellent georeferencing and accurate point triangulation.

This experiment is very relevant, because Lidar is commonly used for "real" precision projects, often considered for work where they need better than x cm precision. Although lidar data is probably as accurate as 5mm, it is also subject to occlusions and the station needs to be moved around a lot to get proper point cloud coverage, so operating it isn't all that easy.

UAV point clouds may always have less accuracy than laser clouds, but they do have the advantage of the bird's eye view: they generate data across an entire region with the same consistency in accuracy and density, whereas lidar isn't that consistently dense for every part of the survey area due to occlusions.

Price makes a big difference too. Lidar stations apparently cost $175k to acquire, whereas uav's probably put you back by $3000. The question that one needs to answer is whether the slight improvement in accuracy is worth the extra money.

What this experiment shows is that also for uav's 2cm vertical accuracy is probably within the possibilities, pending further experiments where datasets are compared against points in the real world.

From twoview to a complete sparse 3D point cloud

Posted by Gerard Toonstra on March 28, 2014 at 4:49pm

Continuing on from the last post on this subject, here's a complete sparse point cloud generated from some 40 images. In the twoview case it became apparent that you can triangulate points from 2 images. In a two-view match you sometimes get inaccurate or incorrect matches, which lead to outliers. If the feature is consistent and static, you can triangulate points from a 3-view instead. Such 3+ matches quite perfectly eliminate outliers, which leaves you with a sparse point cloud that then mostly contains inaccuracies due to (relatively rough) pixel measurements, incorrect distortion parameters, slight drifts in feature recognition, pose fitting errors, etc.

In this stage of processing, the sparse point cloud generation, the objective is to discover camera poses at the same time as adding new points to the cloud so that future matches can take place and the cloud can grow. In this case, I use the point cloud itself to estimate future poses. For each 3D point, I maintain a list which images contributed to that point. Then a new image which has matches with already registered images can figure out which feature match in its own image corresponds to an existing 3D point in the cloud. Then I simply build a list of 3D points and 2D points that should correspond together. When I have that information, I can figure out, based on how the 3D points should appear in the image, where the camera ought to be located. So it's basically "triangulating backwards" from the points to the camera knowing where they are projected on the sensor in 2D and then figuring out where the sensor was located.
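
The bookkeeping part of this can be sketched in a few lines. This is not my actual code, just a minimal illustration with made-up data structures: every 3D point remembers which feature in which image contributed to it, and a new image uses its matches to look up the 3D points it should be seeing.

```python
def build_correspondences(cloud_points, cloud_tracks, matches):
    """Pair existing 3D points with 2D features of a new image.

    cloud_points: {point_id: (X, Y, Z)}
    cloud_tracks: {point_id: {image_id: feature_id}}  (who contributed)
    matches: [((u, v), image_id, feature_id)] for the new image
    """
    # Reverse lookup: which 3D point belongs to a given image feature?
    lookup = {}
    for point_id, track in cloud_tracks.items():
        for image_id, feature_id in track.items():
            lookup[(image_id, feature_id)] = point_id

    object_points, image_points, unmatched = [], [], []
    seen = set()
    for uv, image_id, feature_id in matches:
        point_id = lookup.get((image_id, feature_id))
        if point_id is None:
            unmatched.append(uv)  # candidate for triangulating a new point
        elif point_id not in seen:
            seen.add(point_id)
            object_points.append(cloud_points[point_id])
            image_points.append(uv)
    return object_points, image_points, unmatched


cloud_points = {0: (1.0, 2.0, 10.0), 1: (3.0, 1.0, 12.0)}
cloud_tracks = {0: {"img_a": 5, "img_b": 9}, 1: {"img_a": 7, "img_b": 2}}
matches = [
    ((120.0, 80.0), "img_a", 5),
    ((300.0, 40.0), "img_b", 2),
    ((10.0, 10.0), "img_b", 44),  # matches a feature with no 3D point yet
]
obj, img, new = build_correspondences(cloud_points, cloud_tracks, matches)
print(obj, img, new)
```

The two lists of 3D and 2D points are what goes into the pose estimation; the unmatched features are the ones that can grow the cloud once the pose is known.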

When I have the pose, I triangulate matches that I do not yet have in the cloud as new 3D points and grow the cloud a little.

The order in which you attempt to add cameras (images) to the cloud is important because the current state of the 3D point cloud determines how many points you have available for pose estimation. If that number is low, you may have very little or inaccurate information (outliers!) to do the pose estimation. If the pose is bad, the point cloud deteriorates and future poses cannot be determined.

So, how does it work in more detail in a way that makes the solution stable?

1. "Boot" the point cloud using two images only as in the previous article.
2. Grab a new image and find all matches with other images that we already have camera poses for.
3. Create a vector of 3D (point cloud) positions and 2D (image) points which should correspond.
4. Estimate the pose in combination with an algorithm called Ransac to remove outliers (grab some points, try a fit, see if it can improve, exchange some points for others, iterate towards a best fit).
5. Refine the pose estimation further.
6. Triangulate new 3D points that we don't have yet by looking at matches from this image with other images (cameras) we already have in the point cloud.
7. Refine the new 3D points.
8. Back to 2 if there are more images.
9. Print out a point cloud for all 3D points that have more than 2 "views" (points which originated from more than 2 images).
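
The Ransac idea from the list above is easier to see on a toy problem than on camera poses. The sketch below fits a 2D line instead of a pose, which is a deliberate simplification: the loop (grab a minimal sample, fit, count inliers, keep the best) is the same, only the model differs. All data is synthetic.

```python
import random


def ransac_line(points, iterations=200, threshold=0.1, seed=0):
    """Fit y = slope * x + intercept while ignoring outliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)  # minimal sample
        if x1 == x2:
            continue  # vertical pair, cannot express as y = f(x)
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [
            i for i, (x, y) in enumerate(points)
            if abs(y - (slope * x + intercept)) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers


# Ten points on y = 2x + 1, followed by three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.0, 20.0), (5.0, -7.0), (8.0, 3.0)]

model, inliers = ransac_line(points)
print(model, inliers)
```

For pose estimation the minimal sample is a handful of 3D-2D pairs and the "distance to the line" becomes the reprojection error in pixels, but the outliers get thrown out in the same way.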

This sparse point cloud, although crude, can already serve certain purposes. It still needs to be subjected to a process called "Bundle adjustment", where poses and 3D points are refined further on a global scale. The outcome of that improves the appearance of planar surfaces and further refines the camera poses.

So what does this teach us about collecting uav data?

Always make sure that each feature appears in 3 or more images to ensure it's stable. Too little overlap can still produce point cloud data, but at the cost of having many outliers and low numerical stability of the solution in the processing pipeline. Some processing tools will simply discard the feature, others keep them and attempt to "smooth" them out into the rest of the cloud, usually creating humps or valleys in objects. Make sure the survey area is large enough for all objects you want to have with accuracy. Better data is much better than relying on algorithms to interpolate/extrapolate and in other ways fantasize data together.
In processing this set I recognized that I had a very large number of stray points right above an area where a tree was expected. Turns out that features of that tree were not stable and not recognized in 3+ images, so triangulation produced a very noisy subcloud in that location. Eventually all those points disappeared in the final cloud, leaving a hole in the point cloud at that location, because the ground under the tree was never triangulated. Again: vegetation needs very high overlap.
Adding (correct) GPS to images reduces processing time if the pipeline knows how to use location data. In this set I used the telemetry log (over data), which contained errors, sometimes 60 degrees out. Not all tools (not even commercial ones!) deal with such GPS errors or missing information correctly. Worse even, the images were eliminated from the set, reducing local overlap and thus the number of views per feature, which could lead to bad pose estimations and local inaccuracies.
It even further explains why surfaces like water cannot be mapped. Everything that's moving around while pictures are taken results in features that match in 2 images, but not the third. Even if such a feature matched consistently, it would eventually be filtered out as noise.
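
The "3 or more images" rule translates directly into a minimum overlap. A back-of-the-envelope sketch, assuming a single straight flight line over flat terrain and ignoring side overlap:

```python
def views_per_point(forward_overlap):
    """Approximate number of consecutive photos that see a ground point."""
    return 1.0 / (1.0 - forward_overlap)


def min_overlap_for_views(views):
    """Forward overlap needed so a ground point shows up in `views` photos."""
    return 1.0 - 1.0 / views


views_at_85 = views_per_point(0.85)        # between 6 and 7 photos
overlap_for_3 = min_overlap_for_views(3)   # about 67%

print(f"{views_at_85:.2f} views at 85% forward overlap")
print(f"{overlap_for_3:.1%} forward overlap needed for 3 views")
```

So roughly two thirds forward overlap is the bare minimum on paper, and difficult features like vegetation need a good margin on top of that.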

Interesting ideas:

- use two cameras instead of one, horizontally apart even by just a little bit. This will double the number of images and increase the chances to reproduce vegetation correctly (stereo imagery without the "snapped at the same time" constraint).

- variable speeds and CAM_TRIGG_DIST for a mission? When over simple geometry speed up, when over complex geometry slow down to improve the match quality.

A twoview 3D reconstruction

Posted by Gerard Toonstra on March 23, 2014 at 6:11pm


In the last post just two days ago, I talked about the fundamental matrix and a homography which allowed 2D images to be warped in such a way that they overlap. That technique works a bit better if you take the photos from the same perspective point (more like a tripod looking around), because there will be less perspective distortion.

In this post, I'm discussing a bit more how 3D reconstructions are made. Using some photos from the same dataset as before, it will become apparent what good features are and how these eventually result in good or bad data. I'll try to upload them at maximum resolution so you can zoom in a bit this time. Warning, they're pretty big, 8000x3000. I hope it maintains the original size.

This is a picture demonstrating the inliers for the fundamental matrix solution of this image combination. In the previous post I discussed how algorithms like SIFT and SURF recognize features on the basis of local gradients. In this image you should be able to see exactly how these 200 features are recognized and what a good feature is. As you can see, areas that have poor local gradients aren't matched easily; my algorithm prunes these very early. Features that are really excellent and unique are shadows on the ground. That's because they fall on flat areas, so their gradients don't change between images, and the sun through the leaves creates irregular shapes that are good matches at full scale, but also at the increasing scales used in these feature recognition algorithms. The irregular rooftop is also a great source for features. It's easy to see some more global areas that probably did match, but don't stand out as very strong keypoints. What does this mean? It's important to select the right time of day for taking pictures! Hard cast shadows with strong sun may cause local gradients to disappear, a very low sun with soft shadows may not give enough light for a suitable shutter time, and with the sun right above your head the shadows may be minimal. It's a great area for research on what defines good shadows for perfect 3D reconstructions.

So how did I get from this image to the one above, where the points are 3D triangulated? Through the fundamental matrix we find the essential matrix if we have a camera that's calibrated (focal length and image center point). From there, the camera projection matrix P is derived for each image. Together, those projection matrices describe the rotation and translation from one camera to the other in 3D space. The information used to derive them is a list of normalized 2D coordinates from both images (our pixel matches!). Normalized means: expressed independently of the camera intrinsics (focal length and image center), with radial distortion removed.
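The step from fundamental to essential matrix is short enough to sketch in numpy. This is not the code I used for the images in this post, just a minimal illustration. It assumes the convention x2^T F x1 = 0 for pixel coordinates, 3x3 intrinsic matrices K1 and K2 from calibration, and function names that are made up here. Radial distortion is not handled.

```python
import numpy as np


def normalize_points(pixels, K):
    """Map pixel coordinates to normalized image coordinates using K^-1.

    `pixels` is an (N, 2) array. Radial distortion is NOT removed here.
    """
    pixels = np.asarray(pixels, dtype=float)
    homog = np.hstack([pixels, np.ones((len(pixels), 1))])
    normalized = homog @ np.linalg.inv(K).T
    return normalized[:, :2] / normalized[:, 2:3]


def essential_from_fundamental(F, K1, K2):
    """E = K2^T F K1, assuming x2^T F x1 = 0 holds in pixel coordinates."""
    E = K2.T @ F @ K1
    # A valid essential matrix has two equal singular values and one zero,
    # so project the estimate onto that structure.
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```

Decomposing E into a rotation and translation gives four candidate solutions; the usual way to pick one is to keep the candidate that puts the triangulated points in front of both cameras.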

From here, things start to become simple. Knowing the orientation of the cameras in 3D space (in this simple "2 camera virtual world"), we can triangulate the matched 2D points in each image, each basically a projection of a 3D point, to derive the X,Y,Z coordinates of that 3D point itself.
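Triangulation itself can be sketched as a small linear least-squares problem (the "direct linear transform" approach). This is a minimal sketch, not the exact code behind the image above. It assumes two 3x4 projection matrices P1 and P2 and one matched point per image, given in the same coordinates the matrices expect.

```python
import numpy as np


def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices. x1, x2: matching 2D points (u, v).
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

Scaling the translation between the cameras and the 3D point by the same factor leaves both projections unchanged, which is why the result only has an arbitrary scale until the baseline is known.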

It's important to mention here that the actual position of that 3D point in this system is highly determined by the distance between the cameras. Unfortunately we don't know that yet, so the scale at this moment is arbitrary. For applications like stereo vision this distance *is* known, which is why such systems can derive pretty accurate depth information from triangulated points. In our case, we could scale our P matrix according to our GPS 'guesses'.

This is only a two-view solution. Through a process called "registration" though it's possible to incorporate more cameras in this simple 3D space because a lot of these cameras are associated through other image combination analysis. You'd apply the same process over and over again, every time creating more and more points and of course filtering similar points. Each 3D point in such algorithms usually has a list describing the cameras and the 2D point index corresponding to that 3D point, which is useful for refining the solution.

About refining... what I did not discuss here is bundle adjustment, smoothing and point filtering. You can see outliers in my solution and probably there's a bit of point jitter on surfaces that should be planar. There are techniques that can be applied to remove unwanted points automatically (mostly statistical analysis methods), smooth the point cloud and larger scale adjustments that sweep through your entire solution to reduce the overall error and derive the best fit by manipulating the constraints that you imposed during the construction. For example, you could relax constraints for a camera that had a very poor HDOP, whereas one with a strong HDOP/VDOP at the time could have a more stringent constraint. In the process of finding a better solution, this eventually leads to a model that doesn't converge so much to the best "average", but one that leans more to known correct measurements and has a low emphasis on possibly bad information that looks good.

2D image post-processing techniques and algorithms

Posted by Gerard Toonstra on March 21, 2014 at 8:41pm

A number of people on this site are using their vehicles for aerial mapping. There are tools available for image stitching, which basically produce a rough idea of what the terrain looks like from above. Image stitching does not compensate for perspective and usually contains lots of artifacts. The reason some older tools don't really work that well is that they rely on older image matching algorithms, which are not as scale or rotation invariant.

For example, have a look at the image below:

Here we see two images that have been matched together using a "SIFT" algorithm. That algorithm is scale and rotation invariant, which means images that are slightly different in scale or are rotated in any direction can still be matched together reasonably well. In this example, it's easy to see it also deals with changes in luminosity very well. These algorithms look at local luminosity gradients (not actual values), so they detect places where you have very specific, irregular changes, but changes that are consistent between images. This makes large, uninteresting areas invisible to this algorithm (as there are no local gradients there). The shadow edge is pretty regular and never matched. Have a look at which features the algorithm detected instead and matched to the paired image to understand what makes a good feature. For an in-depth read, this guy explains this really well: http://www.aishack.in/2010/05/sift-scale-invariant-feature-transform/

Now here's the reason why it's important: if you understand how this algorithm works, you also get a better understanding of how to shoot your images and what to avoid in order to get good matches. For algorithms like these, organic, flat areas are great. However, trees aren't that great because leaves occlude specific gradients when you change position over them. In other words, if you fly 10 meters further, the way a 16x16 pixel area looked has changed considerably due to wind and what is visible through the leaves: your gradients change completely! That's why such areas need photos taken closer together to be able to get features in the biomass, otherwise they'll end up flat or blobby.

The second image shows the matching pairs of features after the fundamental matrix was established. The fundamental matrix establishes an epipolar relationship between two images. That means that a point in one image is related to a point in the other image, which must be located somewhere along a line. This makes finding the feature in the other image easier. When you have a camera model, it also becomes possible to triangulate these points to real world geometry.
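The "somewhere along a line" constraint is easy to show in code. A minimal numpy sketch, assuming the convention x2^T F x1 = 0 and helper names invented for this example: the first function gives the line in the second image on which the match of a point from the first image must lie, the second measures how far a candidate match is from that line, which is a common way to reject outliers.

```python
import numpy as np


def epipolar_line(F, x1):
    """Line (a, b, c) in image 2, with a*u + b*v + c = 0, for point x1 in image 1."""
    return F @ np.array([x1[0], x1[1], 1.0])


def distance_to_line(line, x2):
    """Perpendicular distance in pixels from point x2 to the line (a, b, c)."""
    a, b, c = line
    return abs(a * x2[0] + b * x2[1] + c) / np.hypot(a, b)
```

A match whose distance is more than a pixel or two from its epipolar line is most likely wrong, whatever the descriptor says.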

The image at the top wasn't created with the fundamental matrix; it was created using a "homography matrix". This matrix defines how the two images, as planar surfaces, are related to one another. So it describes a 2D geometric transform that should be applied to minimize the error between these two images. You can use this to stitch images, but also for things like "augmented reality", where a 3D camera is matched to your real camera depending on how a 2D marker is matched in the view.
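Applying a homography to points is just a matrix multiplication in homogeneous coordinates followed by a division. A minimal numpy sketch, where H is assumed to be a 3x3 homography that was already estimated elsewhere (the function name is made up for this example):

```python
import numpy as np


def apply_homography(H, points):
    """Map (N, 2) points from one image plane to the other using a 3x3 homography."""
    points = np.asarray(points, dtype=float)
    homog = np.hstack([points, np.ones((len(points), 1))])
    mapped = homog @ H.T
    # Divide by the third coordinate to get back to 2D pixel coordinates.
    return mapped[:, :2] / mapped[:, 2:3]
```

Warping a whole image works the same way, except that every output pixel is usually mapped back through the inverse of H and the source image is interpolated at that location.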

Want to play around with this yourself? I found a very nice Java library, probably easier to use than opencv, with some really clear examples:

http://boofcv.org/index.php?title=Example_Image_Stitching

http://boofcv.org/index.php?title=Example_Fundamental_Matrix

This code is already halfway towards a panorama stitcher. If you calibrated your camera, then you can use the parameters to work on the undistorted images in this case. I don't recommend undistorting images prior to 3D reconstructions because it also distorts pixels and therefore impacts on how features are matched. In the 3D reconstruction pipeline there is a camera model with calibrated parameters, so features do get transformed correctly through that more accurate model.

If you want to georeference your work, have a look here:

http://www.digital-geography.com/qgis-tutorial-i-how-to-georeference-a-map/

Autonomous path planning for robots with blender

Posted by Gerard Toonstra on March 5, 2014 at 7:21pm

Here's a couple of screenshots of a little experiment I did today. This is a slightly reduced mesh from the point cloud example yesterday. In this work I'm using the blender game engine for path planning of an autonomous robot. My "robot" is that green cube over there and it's trying to get to the purple sphere. I'm using a navigation mesh that leaves out the parts where I don't want the robot to go at all, for example grass, the hedge or that mountain thing. Since my environment is georeferenced, I can simply take the "blender" coordinates from the waypoint list that the planner is generating and in theory feed them to an actual robot navigating the actual real world environment. If the environment is static and nothing changed, this works well. It's no guarantee for dealing with dynamic objects yet, but even the Google car makes a full plan to get to a destination, then tunes this for the actual situation depending on sensors that sense the immediate environment.

I already said the words: "navigation mesh". A relatively recent and better technology used for AI path planning for game characters is a navigation mesh, assuming you're working with characters that touch the ground. It's just a set of polygons that determines where a character/NPC is allowed to go and where it's not. For an impressive demo of how this works, check out this link. (Some background here: previous algorithms used lists of waypoints. If you've ever seen NPCs in a game stuck in a running position on an object, then those would typically use those waypoint lists. They can't deviate from that path because they can't make assumptions, so they get stuck when there's something in the way. This also makes the same robots collide with, or unable to avoid, dynamic objects in their path.)

Setting up a little AI environment for experimenting is surprisingly easy in Blender. A very nice tutorial video showing how to do that is here.

This is all relevant because mainstream drone technology right now relies on operators to do the actual path planning. I think that's going to change in the future when you hook up databases with more specific information, deal with a large number of constraints, execute dynamic scenarios or are unsure of the vehicle capabilities. The idea is to have the computer propose a (list of) routes just like my Waze does, instead of plotting out my road wp for wp.

What blender thus allows you to do is set up a simulation environment and explore the algorithms and not waste precious time on the visualizations. Since it's all from within blender, offering great support for primitives, meshes, lines and points, it's pretty simple to add visual cues to the simulation. The biggest benefit is how easy it is to model a testing environment. Because it's so easy to work with python in the game engine, it should also be easy to 'drive' the blender simulation using an external simulator as well, for example a flight simulator that has a better flight model for uav's.

Making a 3D SBS video from a set of 2D images

Posted by Gerard Toonstra on March 4, 2014 at 1:30pm

In a previous post on diydrones I showed how to make a mesh model for visualization or games from a landscape surveyed by a drone. That workflow started off with the set of bare 2D images and used VisualSfM and CMPMVS to generate the point cloud and camera parameters.

VisualSfM and CMPMVS take a long time to run though and do not always produce the best end results. The commercial alternatives, like pix4d/agisoft/menci/etc. also generate parameter files that contain the camera positions and, with a bit of processing, may produce better point clouds. I wanted to find out if I could script those results and get similar outputs, so I wouldn't have to depend on my own runs to start working in 3D. At least it would take less time to process.

I added some excitement by rendering a 3D SBS video of the end results. Stuff used: pix4d output files and parameters, a custom script to generate bundler.out and list.txt files, meshlab, cloudcompare and blender.

Meshlab expects the camera parameters as a rotation matrix with a translation applied afterwards, not the actual camera position. The geocal file from pix4d contains the position however. A bit of math though shows that T=[-RC] (translation=[-rotation*position]), so this is easily derived. The rotation matrix R is calculated from the euler angles, as the matrix in the metriccal file uses a different convention for the axes.
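The T=[-RC] relation can be sketched in a couple of lines of numpy. This is only an illustration and not the custom script mentioned above: the function names are made up, and R is assumed to already be the 3x3 world-to-camera rotation in the convention the target tool expects (building R from the euler angles is left out because that depends on the axis convention).

```python
import numpy as np


def translation_from_position(R, C):
    """Translation T such that a world point X maps to camera space as R @ X + T."""
    return -R @ np.asarray(C, dtype=float)


def position_from_translation(R, T):
    """Inverse relation: recover the camera position C from R and T."""
    return -R.T @ np.asarray(T, dtype=float)
```

A quick sanity check is that the camera position itself must end up at the origin of camera space: R @ C + T should be zero.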

The texturing of the mesh was done as in the other example, but this time I used the point cloud from pix4d only, slightly tweaked by cloudcompare. The pix4d point clouds are in UTM projection, so I had to apply a global shift and then remove that shift manually, bringing the point cloud to some local coordinate system for easier processing. If not, the numbers are too big for visualization. The same global shift was later applied to the camera positions in the custom processing script.
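The reason for the global shift is easiest to see with numbers. My understanding is that many viewers store coordinates as 32-bit floats, which near a UTM northing of 7,000,000 m can only represent steps of half a meter. The sketch below is an illustration under that assumption. The function names and the choice of the shift (the floored minimum per axis) are mine and not what cloudcompare does internally.

```python
import numpy as np


def apply_global_shift(points, shift=None):
    """Move (N, 3) UTM points to a local frame and store them as float32.

    Returns the local points and the shift, so the shift can be undone later
    or applied to the camera positions as well.
    """
    points = np.asarray(points, dtype=np.float64)
    if shift is None:
        # Illustrative choice: floor of the minimum coordinate per axis.
        shift = np.floor(points.min(axis=0))
    return (points - shift).astype(np.float32), shift


def remove_global_shift(local_points, shift):
    """Bring local points back to the original (UTM) coordinates."""
    return np.asarray(local_points, dtype=np.float64) + shift
```

The important part is to subtract in double precision first and only then convert to single precision, and to reuse exactly the same shift for everything that has to line up with the point cloud.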

In the end I only had to import the OBJ into blender, set up a stereo vision camera rig, define a path for the camera to follow, join the stereo rendered images together in 3D Side-By-Side fashion and render an animation from the image strip.

It's clear that the level of detail is really low, both in texture and geometry, but that's because the point cloud was downsampled a bit too far and the poisson surfacing was not very aggressive in maintaining angles. It's possible to get better results by maintaining a high point density, applying the poisson a bit more aggressively and subdividing the mesh. The most important issue here though was that the overlap was very low. The source material must be of high quality to begin with. Trees and such are flat and projected on the ground because there was far too little information to reconstruct them. Trees with leaves become hunks of goo due to the poisson surfacing. Those are issues that need to be tweaked by hand.

From here, the sky is the limit: import the APM log and refly the mission on this virtual terrain, add some trees, remodel the mesh with primitives to get better looking results, add sky boxes, animate stuff, script this with the blender game engine and make it navigable using the Oculus Rift, use any other game engine to import the landscape and create interface apps that "do" stuff in the real world. For example: plan your mission virtually by flying through it in 3D, then collect the curve and build a list of waypoints.

This experiment shows that it's possible to get miles ahead in a 3D visualization project when you've surveyed the area with a drone first. It takes 4 hours to collect the data, a day to process it 'in the cloud' and only half an hour to get your OBJ from the end results. That means you can have a 3D model of the area ready for remodeling and editing in 2 days.