# catching the drift in DCM

I've been working for a while to improve the ability of DCM in ArduPilot to accurately track gyro drift while at the same time making it less susceptible to noise. It is a delicate balance, but it is a worthwhile thing to improve as it makes such a big difference to how planes and copters fly in practice.

The recent releases of ArduCopter and ArduPlane included a number of DCM changes that came out of this work. The most important ones were:

• slope limiting the changes in omega_I
• scaling the kI term according to the known sensor properties
• using the 'drop z' method to reduce the impact of noise

Each of these changes came out of studying the impact of large amounts of sensor noise and gyro drift on the behaviour of DCM in APM. The first two were fairly uncontroversial, but the 3rd one left some people wondering why it worked (or if it worked!). That is a reasonable question, as we never really put it on a sound mathematical footing - it worked well in the simulator and pilots said it helped in practice, so it went in.

Unfortunately it led to a side effect discovered by Gábor, which is that poor accelerometer calibration and scaling led to inaccurate attitude at larger roll/pitch values. It worked fine for most people, but on some APM1s the attitude error could be quite large.

I've now found a much better way of coping with large amounts of noise while still tracking gyro drift, so I will be 'dropping' the 'drop z' method in favour of the new approach.

First some background. The drift correction in DCM aims to cope with the fact that gyros aren't perfect. If we had perfect gyros and we ran DCM at a very high rate then we wouldn't need drift correction at all, and we could keep an accurate attitude estimate just by updating the DCM matrix using the gyro vector. Unfortunately gyros aren't perfect, so we need a reference vector to do some correction of errors that build up over time.

There are really two key types of errors to worry about. One is short term errors that result from imperfect application of the gyro vector to the DCM matrix (for example, from the fact that the gyros don't have infinite precision - they measure rotation in units of about 0.4 degrees/second). The second type of error is long term drift of the gyro zero point.

We cope with the first type of error quite easily by using the accelerometers and compass (or GPS) as a reference vector to 'push' the solution in the right direction.The amount of push is re-calculated every time we get a new reference vector, and all we really need to do is choose a reasonable scaling factor for this push so that the time constant is reasonable. A higher value will push the attitude to line up with the accelerometers and compass more quickly. This value is called kP in APM.

Coping with the second sort of gyro error - gyro drift - is much harder. This sort of error happens when the gyro starts reading something like 3 degrees/second rotation when in fact the aircraft is not rotating about that axis at all. That scale of drift overwhelms the ability of the short term 'push' term to cope with the error, and the result is a very bad attitude solution.

Newer gyros (like those in the MPU6000 used in the APM2) suffer from gyro drift much less than the older gyros in the APM1. The APM1 gyros can drift at a rate of around 2 degrees/second/minute, which can quickly build-up to be far too much drift for the short term error correction to cope with.

The original paper on DCM by Bill Premerlani provides the basis for coping with this gyro drift that we use in APM. The amount of gyro drift is estimated in a vector called omegaI, with the error at each update of the DCM matrix used to slowly push omegaI towards the right value. The omegaI is an integrator - adding up the errors happening with each step to produce the long term drift estimate. The omegaI value is then used to 'correct' the hardware gyro values, by subtracting off the drift estimate. The constant used to determine how fast this drift is learnt is called kI.

Unfortunately it is very hard to get this integrator learning right. If it learns too slowly then the DCM code won't cope with the gyro drift that may happen in a real APM, and the result is a poor estimate of roll and pitch as the kP term struggles to cope. The aircraft starts to lean over badly. On the other hand, if it learns too fast, then it may mistake sensor noise for gyro drift and incorrectly 'learn' gyro drift that isn't there. That is just as bad, and results in leans as well, in the opposite direction.

It is this second effect, where large amounts of noise can lead to omegaI build-up that I was particularly interested in. Some flight logs showed it was a very real effect in some aircraft. It doesn't happen in every flight, but when it does happen it can easily lead to a crash.

Here is a plot of omegaI being learnt in the SITL simulator when we have a noise level of '2' (thats the sort of noise level where you start to think the aircraft will fall to pieces with vibration).

The 'sawtooth' line is the simulated level of gyro drift which DCM is trying to learn. It is set to 1 degree/second/minute, with a period of 10 minutes. That is quite a high drift level, but is quite possible on an APM1. The 3 omegaI lines show what the DCM learning of the gyro drift did - they managed to track the simulator drift, but not very precisely. In fact, this is much better gyro drift tracking than what we achieved in earlier versions of APM!

The graph below shows the same test, but with a small tweak to the omegaI learning code:

That is much better! The three omegaI terms have tracked the simulated gyro drift much more accurately.

You may notice that the 2nd graph is quite granular, whereas the first graph is relatively smooth. That is because what is being done with the new approach is to save up omegaI changes over a period of 10 seconds, and then apply them all at once.

The reasoning behind this change is twofold. The first is the interaction of the slope limit for omegaI. We introduced a slope limit to omegaI as part of the first round of noise robustness changes. The way it works is that it limits omegaI changes to be at a rate no higher than what the sensor driver claims is possible with the gyros being used. For the APM1 that is 3 degrees/second/minute, and for the APM2 it is 0.5 degrees/second/minute (see libraries/AP_InertialSensor in the get_gyro_drift_rate() functions).

The slope limit helped reduced the impact of noise by preventing error spikes from noise from pushing omegaI faster than the gyros really can drift. We're trying to learn a physical parameter in omegaI (the gyro drift), so limiting it to the physical limits of the device makes sense.

The problem is we were applying the slop limit at the drift correction rate (50Hz for APM and 100Hz for ACM). This applied the limit to each small step, but the line isn't smooth, so the slope limit had a much larger effect than it really should.

The second problem was one noticed by Jonathan Challinger. Jon noticed that omegaI would react over the short term to errors induced due to aircraft turns. The aim of omegaI is to capture long term drift, but it was in fact interacting with the P term in the usual way a PI controller does, and capturing some of the short term error.

So instead of applying omegaI changes directly into omegaI, we instead apply them to omegaI_sum. That vector builds up over a 10 second period, then we apply the slope limit, and add omegaI_sum to omegaI. The omegaI_sum vector is then zeroed and the process starts again.

As you can see from the above graph, the difference in ability to accurately track gyro drift is quite dramatic!

You can see the patch in my DCM-wip branch. After this is flight tested in both APM and ACM I expect it will go into a upcoming release.

Cheers, Tridge

Tags: DCM, IMU, gyro

Views: 1047

### Replies to This Discussion

Hi Tridge,

Nice work.

Regarding the mathematical basis for the combination of 10 second integration followed by slope-limit being more effective than slope-limit followed by integration, there is a good reason:

The slope-limiting operation is a non-linear operation. Non-linear operations cause a shift in bias in the presence of noise. In this case, if you use slope-limiting ahead of the integration, in the presence of a lot of noise, the non-linear effect shifts the real bias that is in the signal toward zero, which is why the drift estimates in the first plot were having trouble converging: the slope-limiting operation was reducing the signal more than it was reducing the noise.

So, things got a lot better when you did a 10 second integration prior to the slope-limiting. By improving the signal to noise ratio of the signal going into the slope-limit it reduced the amount of bias shift toward zero created by the non-linear operation, allowing the signal to get through.

Personally, I try to avoid operations like slope-limiting (they cause a bias shift) and 10 second averaging (it increases latency), but the 10 second averaging does counteract the bias shift, so overall you have worked out a method to deal with large amounts of noise.

The way we deal with noise and vibration in MatrixPilot is to use a combination of a high sampling rate (8000 samples/per second), and an "integrate and dump" filter. The combination of the two techniques is approximately equivalent to running the DCM algorithm and drift compensation 8000 times per second. We have found it to be very effective.

By the way, "integrate and dump" used in MatrixPilot is exactly the same method that you are using for your 10 second average, except we take a 0.025 second average, which contains 200 samples.

Finally, there is more test case I would be interested in, if you are willing to run it. I would like to see a comparison of the offset estimation without any slope-limit at all, with the same noise that you used, and same sawtooth. One method would use the 10 second averaging, one would not.

Best regards,

Bill

Hi Bill,

So, things got a lot better when you did a 10 second integration prior to the slope-limiting. By improving the signal to noise ratio of the signal going into the slope-limit it reduced the amount of bias shift toward zero created by the non-linear operation, allowing the signal to get through.

ok, that makes sense. I was thinking of it in more graphical terms (if you limit the slope of a noisy graph at lots of points, then the graph cannot grow very fast), but I'm glad you agree it should work :-)

The way we deal with noise and vibration in MatrixPilot is to use a combination of a high sampling rate (8000 samples/per second), and an "integrate and dump" filter. The combination of the two techniques is approximately equivalent to running the DCM algorithm and drift compensation 8000 times per second. We have found it to be very effective.

we also use "integrate and dump" in APM, just not at the same rate. We sample at 1kHz (on APM1), and then "dump" at 50Hz in ArduPlane and 100Hz in ArduCopter. I haven't actually tried to push that sample rate up to 8kHz, but I suspect it wouldn't work with our hardware.

In APM2 with the digital MPU6k gyros/accels we get data at 200Hz, so for ArduPlane we "integrate and dump" just 4 samples per update. Internally the MPU6k is doing its own filtering, currently setup for 95Hz bandwidth.

Finally, there is more test case I would be interested in, if you are willing to run it. I would like to see a comparison of the offset estimation without any slope-limit at all, with the same noise that you used, and same sawtooth. One method would use the 10 second averaging, one would not.

sure! Here is the result:

as you can see, it tracks omegaI fine, with a very similar result to the slope limited code with 10s sum. That doesn't mean I am going to remove the slope limit though! I really like the fact that we constrain the omegaI rate of change to the drift limit of the sensors, especially on the APM2 where we know the gyros won't drift fast (so we limit the slope to a much smaller number than what we use on the APM1). It means that no matter what happens with the noise (which may not be anything like white noise) that we won't jump the omegaI up to unrealistic values.

What this really shows is that the core of the problem with the original APM DCM code was the "accel weight" code. As you have pointed out previously, that introduces non-linearities which impacts on omegaI and was likely the main problem with the original code.

Cheers, Tridge

Hi Tridge,

I understand that you like to constrain the rate of change of the drift correction, but, as evidenced by your new plots I do not think there is a strong case for slope-limiting, and there are three good reasons not to do it:

1. Slope-limiting is a non-linear operation that can create an offset shift under certain conditions.

2. There is no way to guarantee an upper limit of the gyro drift under all conditions, there are lots of ways the gyros can drift much faster than expected, such as an unexpected gyro power supply sag. In the end, the drift reference vectors are the ultimate authority on drift rate. Slope-limiting when the rate of change of drift really exceeds the slope limit could cause a crash.

3. The 10 second integration step introduces a latency in a feedback loop, which reduces the stability margin. Which means there is less room to raise the feedback gains before the loop goes into oscillations. This is why we changed the method we use in MatrixPilot to compensate for GPS latency from a lag filter on the IMU estimate of yaw, to a lead filter on the GPS estimate of yaw. It increased the range of allowable values of the PI gains.

That said, "at the end of the day" its probably a judgement call, you could make a case either way, but I do not think there is a clear cut best way to do it, so I will agree that what you have done is a perfectly reasonable approach.

Best regards,

Bill

Hi Bill,

1. Slope-limiting is a non-linear operation that can create an offset shift under certain conditions.

I can see how that happens with the slope-limit applied at each point, but I don't see how it will cause a problem when we are applying it at 10s intervals. That means we are applying it every 10k samples from the ADC on APM1, and every 2k samples on APM2. Surely any non-linear effects would have to be tiny?

2. There is no way to guarantee an upper limit of the gyro drift under all conditions, there are lots of ways the gyros can drift much faster than expected, such as an unexpected gyro power supply sag. In the end, the drift reference vectors are the ultimate authority on drift rate. Slope-limiting when the rate of change of drift really exceeds the slope limit could cause a crash.

The original motivation for this DCM work was crashes caused by omegaI 'learning' large amounts of drift that wasn't there when we had no slope limit. So the problem goes both ways and is a general problem with learning gyro drift. Of course, we now think that the real bug was the non-linearity of the accel weighting, but I'm still more comfortable with limiting the omegaI slope according to what Pat has told me about the design of these particular gyros than just trusting the learning to not go too far.

At least with a slope limit I know that if the learning does get out of control it will do so slowly, perhaps giving a chance for a pilot to recover. Having no limit on how fast the drift can change seems more risky to me. I will watch for any downsides in the logs however.

These sorts of algorithms can be hard to get right. As we found with the mag offset learning, there can be unexpected behaviour when you start throwing real noise into the algorithm.

3. The 10 second integration step introduces a latency in a feedback loop, which reduces the stability margin. Which means there is less room to raise the feedback gains before the loop goes into oscillations

This one puzzles me, as to me the aim with the omegaI term is to avoid fast feedback. You don't want omegaI to 'learn' quickly from the error that may come from a fast turn - that is what the P term is for. It was simulator plots from Jon that initially made we think of adding the 10 second sum in front of omegaI, as he showed that omegaI was indeed reacting to the short term movement of the aircraft (probably due to inaccurate acceleration correction), and not just to the real drift. We could just lower kI, but then we can't cope with the size of drift that we think is possible with the APM1 gyros.

Ultimately the solution is to move away from this drift correction system completely of course, and use something based on your new system once we sort out the issues, so hopefully this system will be ancient history soon, but for now I'm more comfortable leaving in the slope limit, but removing/reducing the impact of the non-linearity by summing over 10 seconds.

Cheers, Tridge

Hi Tridge,

Regarding the 10 second integration step, it introduces a latency in the feedback loop. Latencies are never a good idea in a feedback loop. I will give you a simple metaphor...suppose that it took 10 seconds for the temperature of the water coming out of your shower head to react to changes that you make to control the water temperature. It would make it difficult for you to adjust the temperature.

Yes, the aim when setting the gains of the drift correction feedback is to avoid fast feedback. But, for any given set of gains, if you were to increase the latency in the feedback, that would create a phase shift in the feedback which would eventually result in oscillations. The net effect of introducing a latency into a feedback loop is to increase the risk the loop will go unstable. A general rule of thumb in feedback loops is that you should never create latencies that are not absolutely needed. For more information about the downside of latency, see this report, for example.

If you insist on using a slope limit, I think a better approach than using the 10 second integration would be to simply find out what the maximum slope is under worst case noise conditions, and set the limit to be just a little bit above that.

By the way, this discussion got me to thinking again about the mag offset learning. I am still puzzled as to why I am not observing the unexpected behavior that you observed. So I am wondering if there perhaps was a slope-limit somewhere in your computations?

Best regards,

Bill

Hi Bill,

Thanks for continuing to debate this with me! I hope you don't mind me being so stubborn.

Regarding the 10 second integration step, it introduces a latency in the feedback loop. Latencies are never a good idea in a feedback loop. I will give you a simple metaphor...suppose that it took 10 seconds for the temperature of the water coming out of your shower head to react to changes that you make to control the water temperature. It would make it difficult for you to adjust the temperature.

indeed it would, and if you were trying to use that PI controller to control the temperature then it would be a foolish thing to do.

I don't think it is a good analogy though. It's always hard to come up with a good analogy, but at the risk of pushing it too far, let me see if I can adjust the analogy to better match how I see the situation.

Let's imagine that you again have your shower head putting out water, and you want it to come out at a steady temperature. Now imagine you know there are two underlying causes of the temperature changing

1. poor fittings in the shower meaning the mixing from the hot and cold changes proportion
2. the sun heating up the tank in the backyard means that the cold water gets warmer over time, and then cools again when shade covers the tank

These are two very different physical processes. The first happens rapidly, and the second over hours. How would you build a controller for that? Well, you could start with a simple proportional controller for coping with the short term temperature changes from the poor mixing. That will do quite well, but as the sun heats the tank, the proportional controller will have to work harder to cope with the long term error.

So you now add a estimator for the temperature in the tank. How do you do that? You use the long term temperature error as a proxy for the tank temperature. What sort of controller suits that? An integrator is nice, with a long time constant that matches the volume of water in the tank and how fast you know it can heat up and cool down.

That is (roughly!) how I see the situation we have with the drift correction. We have two physical processes going on. One is that DCM isn't perfect (the poor fittings), so we expect our attitude estimate to be off in the short term. The other is that the gyros slowly drift over time, due to (for example) the temperature of the silicon changing.

So I very much think of the PI controller not as one controller, but two. We want a simple proportional controller to deal with the error we get from the imperfections inherent in the DCM update process. However we have this separate physical process going on, where the gyros drift over long periods of time, on the order of 3 or 4 orders of magnitude longer time in fact.

To estimate that source of error we need a controller that has memory, and the simplest memory controller is an integrator. Where I think we disagree is on whether that integrator should be involved in fixing the short term error. I think it shouldn't - it should just try to build a long term estimate of the real physical process of gyro drift.

So what does this have to do with the slope limit? Well, let's imagine we didn't want to correct for gyro drift, and instead just wanted to log the gyro drift. That is a perfectly reasonable goal, as you may want to know if gyro drift is a problem, and not yet try to correct for it (as you haven't yet characterised its properties, so you don't yet know how to control it).

What you would do in that case is still calculate what we call omegaI, but not use omegaI in the DCM update code. It would just be logged as an estimate of the gyro drift. That is equivalent to setting the "10 second" sum equal to the length of the whole flight.

Now imagine the UAV operator is watching the logs on the ground, and sees that this omegaI long term estimate of the gyro drift is getting quite large. It gets to 4 degrees/second and his aircraft is starting to lean over as the simple proportional controller struggles. What would the operator do? The obvious thing is to adjust the zero point calibration of the gyros by 4 degrees/s, and zero omegaI. The P controller (which has no memory) is perfectly happy with this, as it now has less work to do. It won't suddenly start oscillating. We've just removed a systematic source of error from the control task.

That is how this 10 second sum is working. The aim is to take advantage of that 3 or 4 order of magnitude gap between these two very different physical processes to have separate controllers, preventing short term interaction where the integrator tries to cope with the short term errors in the update process and preventing the P controller from being overwhelmed by long term drift, which is an entirely separate physical process.

By the way, this discussion got me to thinking again about the mag offset learning. I am still puzzled as to why I am not observing the unexpected behavior that you observed. So I am wondering if there perhaps was a slope-limit somewhere in your computations?

I don't have a slope limit as such in the mag offset code, but there is something that plays a similar role. This is the code:

// limit the change from any one reading. This is to prevent
// single crazy readings from throwing off the offsets for a long
// time
length = diff.length();
if (length > max_change) {
diff *= max_change / length;
}

As you can see, it limits the size of a single change in the offsets. I'm quite sure it is not the cause of the difference in our results however. I ran tests with and without this, and found that the base mag offset algorithm behaved very badly in the face of noise even without this delta limit.

Luckily the mag offset algorithm is extremely easy to model and test. To see the effects of noise on the drift correction I needed to run a full aircraft sim over long periods of time, but with the mag offset problem I have a small python script that allows me to take a flight log of mag vectors and try different algorithms with different noise levels to see how it behaves. I don't know if you are a python coder, but if you are the experimental harness is here:

and here is a good log file to experiment on:

that is a log file with offsets of 100, -120, 130. You can adjust all of the different parameters in the learning algorithm on the command line, disabling or enabling the various tweaks to the algorithm and adding noise. It is very easy to see the base algorithm behaving very badly with noise.

(note, you need pymavlink installed to use this tool).

Cheers, Tridge

Certain IMU's may vector coordinate with a gravitational pull. This would explain why some cell phones with IMU's work almost accurate all the time.

1

2

3

4

5

6

7

8

9

10

## Groups

878 members

137 members

165 members

10 members

• ### Agricultural UAVs

283 members

Season Two of the Trust Time Trial (T3) Contest has now begun. The fifth round, 3D model building requiring contestants to survey and reconstruct a building or natural scene from aerial images.

A list of all T3 contests is here.