While flying our Telemaster last Sunday with two APMs we noticed something very strange about LOITER mode. The above track is a small snippet from our flight path. We entered LOITER mode on the left, with a loiter radius of 25m. As you can see from the spiral path, the plane did not loiter at all well, instead spiralling to the right.
It would be easy to dismiss this sort of behaviour as just a one off, but I've seen this before. Our plane has loitered well sometimes, but sometimes it was way off, often spiralling away from the right position just like this. I wanted to get to the bottom of it, and I thought that explaining the process of diagnosing and fixing a bug in APM might help other people to better understand their own UAVs. So read on for a long story of a set of bugs in the APM/ACM magnetometer driver.
The first thing I needed to do was look more closely at the raw data that I had logged for this flight. To do that I wrote a little utility called mavgraph.py, built on top of pymavlink. It takes MAVLink logs and allows you to graph arbitrary mathimatical expressions of any of the MAVLink fields. Someday I'll write up a proper tutorial for mavgraph, but for now if you want to use it have a look at the pymavlink github repository. The utility you want is examples/mavgraph.py. You'll probably find you will need to install a bunch of python packages to make it work. As I mentioned, some day I'll add instructions to the APM wiki unless someone else beats me to it (hint hint!).
If you are wondering how you get MAVLink logs, the major ground stations can record them. mavgraph will accept the format that the planner uses, or the format that mavproxy and qgroundcontrol use. I haven't tried it with HK GCS.
The key to good loitering is heading, so lets plot the GPS and compass headings we got in the above loiter:
mavgraph.py --condition='SYS_STATUS.mode==4 and SYS_STATUS.nav_mode==8 and RAW_IMU.usec > 908942404' Telemaster/logs/2011-08-07/flight8/mav.log VFR_HUD.heading GPS_RAW.hdg
That gives this:
as you can see, the GPS and compass headings did not agree at all well! As we had two GPSes on this plane, I was able to confirm that the GPS values were OK, so what was up with the compass? It looks like its range is badly squashed. That certainly explains why the plane couldn't loiter, as it didn't know which way to fly!
The next thing was to check if this was a problem during the whole flight, or just at certain times. If it was the whole flight then perhaps the compass is broken. We can run this to show just the times the plane is flying:
mavgraph.py --condition='VFR_HUD.groundspeed>3' Telemaster/logs/2011-08-07/flight8/mav.log VFR_HUD.heading GPS_RAW.hdg
as you can see, the compass did track the GPS heading for some parts of the flight, but for a lot of the flight it was terrible. You can also see that I mostly did left hand circuits during this flight!
I started to suspect the magnetometer offsets. The magnetomers used by APM/ACM have two types of calibration. One is a linear scaling, and is computed at startup by asking the magnetometer to apply a 'strap' excitation and measuring the resulting field. The other is a set of offsets that account for local magnetic fields in the plane, and the properties of the magnetometer chips. The offsets are updated dynamically at runtime using a system called 'offset nulling' invented by Bill Premerlani and implemented by Doug Weibel. You can see the code for APM in libraries/AP_Compass/Compass.cpp.
Unfortunately the offsets that the null offsets code calculates were not being logged by APM (more on that later). So all we could do was use the RAW_IMU log, which does contain the x, y and z magnetometer readings after the offsets are applied. We can use those values to post-compute the heading from the raw magnetometer values. The formula is in the APM Compass code, and it translates into this rather long mavgraph command:
mavgraph.py --condition='VFR_HUD.groundspeed>3' Telemaster/logs/2011-08-07/flight8/mav.log 'VFR_HUD.heading' GPS_RAW.hdg 'fmod(degrees(atan2(-(RAW_IMU.ymag*cos(ATTITUDE.roll) - RAW_IMU.zmag*sin(ATTITUDE.roll)), RAW_IMU.xmag*cos(ATTITUDE.pitch) + RAW_IMU.ymag*sin(ATTITUDE.roll)*sin(ATTITUDE.pitch) + RAW_IMU.zmag*cos(ATTITUDE.roll)*sin(ATTITUDE.pitch)))+360,360)' --labels='APM heading,GPS heading,computed_heading'
Here is one part of that graph:
this shows that the basic compass calculations in APM are not at fault. The post-computed heading does match what APM calculated while flying (roughly!). So, what is the problem?
To get to the bottom of this we need a few more bits of information. First off, we need to start logging magnetometer offsets and other sensor offsets in our MAVLink logs, so we can properly check all the calculations in a post flight analysis. I added this patch to MAVLink and the code in APM to use it. So now we can get MAVLink logs showing the raw sensor offsets, not only for the magnetometer, but also for the gyros, and barometer.
Next I needed a reproducible test case. Waiting for the fault to show up again while flying didn't sound like a good idea, so I built a little test rig on my desk.
The test rig consists of a pan/tilt mount with two APMs mounted on top, both with magnetometers. The pan/tilt is controlled by servo outputs from one of the APMs. I then wrote a test script using pymavlink to run the test. The test script pans the two APMs around and tilts it when going in one direction, leaving it flat in the other direction. It also resets the magnetometer offsets at the start of each test (using another APM MAVLink message I added for this test).
This test rig allowed me to capture good logs of the two APMs moving about in a predictable manner. On one of the APMs I put a 5883L magnetometer, and on the other I put a older 5843. Here is the result, along with the 'true' heading calculated from measurements I took using a bushwalking compass.
(I should note that I am skipping quite a few intermediate steps I took this week in finding this bug. I don't want this post to turn into a book!).
As you can see from the above graph, the 5843 did track the true heading fairly well, but the 5883L didn't. So it looks like we have a bug in the 5883L driver. That means it's probably my fault, as I was the last one to work on that driver. So I am on the hook for fixing it. Darn!
Also note that this test is the worst possible case. It starts with zero offsets, and it only rotates the mags through less than 180 degrees, and doesn't roll them at all (I only have a 2-D mount available). So if you are wondering why your 5883L does better than this, then that is probably why.
The above result led me to start looking very carefully at the 5883L driver in APM. I found several problems. One was that we just accepted the first value from the mags for calibration purposes, but logging showed that sometimes that first value wasn't very good. So I added code to get at least 5 good values, where 'good' means not more than 30% off the expected value for the strap field. I also noticed that the expected values for each axis didn't match what the specification said they should be. The spec says that the X and Y values should be higher during calibration, and the Z value a bit lower. I found that the X was higher, and the Y and Z about equal. That was easy to fix in the code as well.
Next, I looked carefully at the gains used. The 5843 mag uses a strap field of 0.55Ga, whereas the 5883L uses a strap of 1.1Ga. That means you need to use different gain values during calibration of the 5883L than during runtime, as otherwise you either risk overflowing during calibration, or you will get lower sensitivity than we would like. So we use two gain values for the 5883L, and only one for the 5843. The problem was we were not scaling the resulting calibration values for the change in gain on the 5883L.
Finally, and perhaps most importantly, we were mucking up the rotation of the 5883L compass. The 5843 and 5883L mags have different orientations, so we apply a matrix multiply to fix that. Unfortunately we had some chicken and egg bugs related to that rotation. The compass init code was using the read() routine to get its values, and that applied the rotation, but the output of the init routine was used at runtime before the rotation is applied. That meant the calibration was incorrectly rotated. It also left some bugs related to init order, as we need to know what sort of mag it is to get the rotation right, but we also need to setup the rotation before we init, as init relied on the rotation. Nasty.
The fix was to extract out a read_raw() routine which didn't do any rotation or apply any offsets, and use that in the init() call, and also use that as the core of the runtime read() call. Thanks to Randy Mackay for this suggestion. Randy and I spent quite a lot of time on this problem together via skype screen sharing, and his suggestions were very helpful.
After all those fixes, this is the result of running the same experiment with the new code:
That's much better! We can also have a look at how the mag offsets evolve during this test, here are the 5883L offsets:
you can really see how the null offsets code is converging on reasonable values. It looks like it converges in about 10 minutes or so, but the heading is good much faster than that.
I'm still not completely happy with the performance of the mag driver, but it's better than it was. If you look at the 'good' graph above you will see that the 'true heading' matches the mag heading over a wide range of values, but the true heading peaks higher. I've double checked the measurements on my table using my bushwalking compass, and I think this is a real effect. It looks like we may be off by 20 degrees or so in some cases for both mag types. That means more investigation and logging. Meanwhile, I think APM/ACM will just treat it as a bit of 'wind' and will fly fine. I'll find out tomorrow when I take the Telemaster up again.
Comments
a small update on this. I flew the Telemaster again today, same HW and config as last time, but with the new mag driver code. Loiter worked very well, and the compass heading matched the GPS heading well throughout the flight. So it passes the flight test.
Wow, Im gonna try and apply this with the 2.0.38 I hope its ok crossing fingers :)
Thanks very much for writing and sharing mavgraph.py. That is going to be a strategic tool for many UAV platforms. (including UAV DevBoard).
Wow--see what I meant about him being a programming god?
The fixes are checked into the library and will be in the next AC2 and APM hexes pushed out.
Fine investigation from start to finish! All driver level code should have some form of functional testing performed, But as this post shows, It can be a long path to success. Thank You for taking the time and efforts needed to complete this, Good Luck in future test flights !.
Good stuff. This might explain a couple of crashes and the compass issues I've had with my quadcopter! Is the updated code available yet, even if it's WIP, so I might try flying it already?
Really nice work! Thanks for the details, gives me some great ideas about trapping some issues.