LOITER and RTL crashing bug

Hi there,

I'm writing to report a very likely bug affecting at least firmwares 3.1 rc8 to 3.1.2 (), at least on hex X frames. The bug manifests when RTL is the last step of an AUTO mission and the RTL return altitude differs from the current altitude; under these conditions the drone crashes consistently. Additionally, the drone crashes from LOITER after a non-deterministic time: sometimes it is quick (tens of seconds), sometimes it can loiter for minutes.

In both cases the symptoms are the same: the drone goes from hover-holding output (motors ~1600) to shutting down 1-2 motors within 100-200ms, and then shutting down the others within another 100-200ms. The way it shuts down the motors is very specific: it reduces throttle to 12xy, where xy is the same for all motors. In the attached logs, for example, the motor commands after LOITER are as follows:

RCOU, 487396, 1695, 1645, 1672, 1667, 1443, 1874, 32767, 32767
RCOU, 487495, 1523, 1597, 1517, 1602, 1231, 1824, 32767, 32767
RCOU, 487595, 1398, 1421, 1368, 1451, 1231, 1581, 32767, 32767
RCOU, 487696, 1231, 1231, 1231, 1231, 1231, 1231, 32767, 32767
RCOU, 487796, 1231, 1231, 1231, 1231, 1231, 1231, 32767, 32767
RCOU, 487897, 1231, 1231, 1231, 1231, 1231, 1231, 32767, 32767
RCOU, 487996, 1231, 1231, 1231, 1231, 1231, 1231, 32767, 32767
RCOU, 488096, 1329, 1444, 1401, 1372, 1231, 1545, 32767, 32767
RCOU, 488195, 1392, 1380, 1545, 1231, 1299, 1474, 32767, 32767
RCOU, 488297, 1404, 1369, 1501, 1271, 1231, 1545, 32767, 32767

Obviously, 1231 is not sufficient to sustain flight, and the fact that all motors have exactly the same value (1231) makes it even more likely that this is the output of a bug.

The RTL example is similar; the RTL command is executed right after the first RCOU line:

RCOU, 742799, 1697, 1603, 1542, 1756, 1571, 1728, 32767, 32767
RCOU, 742898, 1241, 1241, 1241, 1241, 1241, 1241, 32767, 32767

This time the motors are all at 1241: interestingly different from 1231, but still very close, nowhere near enough for flight, and identical across all motors.
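For anyone who wants to scan their own logs for this signature, here is a minimal Python sketch that flags RCOU rows where all six hex motor outputs are pinned to the same low value. It assumes the text-exported RCOU format shown above (timestamp in ms, then channels 1-8, with channels 1-6 as the motors), and the 1300 "too low to hover" threshold is a hypothetical cut-off chosen for illustration, not a firmware parameter.

```python
# Hypothetical helper: flag RCOU rows where every hex motor output is pinned
# to one identical low PWM value (the crash signature described in this thread).
# Assumptions: field 1 is the timestamp in ms, fields 2-7 are the six motors,
# and LOW_PWM_THRESHOLD is an arbitrary cut-off well below the ~1600 hover output.
from typing import Iterable, List, Tuple

NUM_MOTORS = 6            # hex frame
LOW_PWM_THRESHOLD = 1300  # hypothetical "cannot sustain flight" cut-off


def parse_rcou(line: str) -> Tuple[int, List[int]]:
    """Split one 'RCOU, time, ch1, ..., ch8' line into (time, motor PWMs)."""
    fields = [f.strip() for f in line.split(",")]
    if fields[0] != "RCOU":
        raise ValueError(f"not an RCOU line: {line!r}")
    time_ms = int(fields[1])
    motors = [int(v) for v in fields[2:2 + NUM_MOTORS]]
    return time_ms, motors


def find_pinned_rows(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (time, pwm) for each row where all motors share one low PWM value."""
    hits = []
    for line in lines:
        if not line.startswith("RCOU"):
            continue  # skip other message types in a mixed log export
        time_ms, motors = parse_rcou(line)
        if len(set(motors)) == 1 and motors[0] < LOW_PWM_THRESHOLD:
            hits.append((time_ms, motors[0]))
    return hits


if __name__ == "__main__":
    loiter = [
        "RCOU, 487396, 1695, 1645, 1672, 1667, 1443, 1874, 32767, 32767",
        "RCOU, 487495, 1523, 1597, 1517, 1602, 1231, 1824, 32767, 32767",
        "RCOU, 487595, 1398, 1421, 1368, 1451, 1231, 1581, 32767, 32767",
        "RCOU, 487696, 1231, 1231, 1231, 1231, 1231, 1231, 32767, 32767",
        "RCOU, 487796, 1231, 1231, 1231, 1231, 1231, 1231, 32767, 32767",
        "RCOU, 487897, 1231, 1231, 1231, 1231, 1231, 1231, 32767, 32767",
        "RCOU, 487996, 1231, 1231, 1231, 1231, 1231, 1231, 32767, 32767",
        "RCOU, 488096, 1329, 1444, 1401, 1372, 1231, 1545, 32767, 32767",
    ]
    rtl = [
        "RCOU, 742799, 1697, 1603, 1542, 1756, 1571, 1728, 32767, 32767",
        "RCOU, 742898, 1241, 1241, 1241, 1241, 1241, 1241, 32767, 32767",
    ]
    print(find_pinned_rows(loiter))  # [(487696, 1231), (487796, 1231), (487897, 1231), (487996, 1231)]
    print(find_pinned_rows(rtl))     # [(742898, 1241)]
```

Checking for a single shared value (rather than just "low") is deliberate: a legitimate descent or the ALT HOLD behaviour described further down can also produce low outputs, but not all six motors at exactly the same PWM.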

In case you wonder whether something is wrong with the drone: both bugs have been reproduced several times on two different drones, the LOITER crash three times and the RTL crash twice.

In case it helps, in addition to the dataflash logs I have telemetry logs from Mission Planner as well as videos, but I assume the dataflash logs are sufficient, since practically everything relevant (I think) was enabled: certainly ATT, CTUN, NTUN, IMU, RCOU, GPS, and RCIN.

If this is a known bug, please drop a line here. If I can help track it down, let me know.

Best,

Mihai

loiterCrash2.log

rtlFromAutoCrash.log


Replies

  • Hi guys

    I want to add to this: I also had a very mysterious crash on a Hexa-X a few days ago in very similar circumstances. It crashed 166 seconds into an AUTO flight, with a single motor diving down and the others following, enough to send it into a fatal spin.

    No leading indicators that I can find in the logs - the motor dive seems to come out of nowhere. It's at line 8042 in the dataflash log and line 17823 in the tlog.

    We suspected RC issues: as soon as it started crashing, we sent throttle up (which IS visible in the logs) and switched to STABILIZE on ch 5, which is not visible; ch 5 sits at 1537, which we had mapped to AUTO. The RC connection was severed in the crash, so the RTL failsafe triggered just after it came down.

    Logs (dataflash and .tlog) are attached; for this flight we actually had MOT and IMU enabled.

    • Francis, I'm not sure it's the same problem. It has some parts in common, namely that the motors all shut down and stay down even though the obviously correct action is not taken. However, in my case the crash *always* comes within a few hundred (100-300) milliseconds after a command (except in LOITER); it seems that the new coordinates after the command mess it up. Also, for me one of the motors first drops to some low value, and then all the motors go to exactly the same low value (I had 1231, 1241, 1198, etc.).
      I'll let you know if I make any headway on this.
      Mihai

      • Hi Mihai

        I've had two more similar crashes, with the firmware taking the hex out of the sky by driving down one motor and the rest following. The most recent one was with an APM2 in a hex-+ configuration, rather than a Pixhawk hex-X, but with a very similar signature, during a reasonably long but simple AUTO mission.

        However, our issues do seem to be different: I'm not getting the characteristic spike in DRol and DPit (desired roll and pitch) that appears in your logs and seems to be causing the error. My DRol and DPit stay sane even as it comes down. I've been looking for leading indicators for a while, but nothing seems to give any warning of the error.

        My other observation is that this error only seems to occur during long flights: it is not correlated with distance from the remote, but it is correlated with total distance travelled.

        I'm wondering whether you have only seen this problem with logging turned up? I had a lot of successful flights on the APM2 with a hex+, but after an ESC hardware problem I enabled motor dataflash logging. Is it possible that the logging overhead causes some obscure memory leak or similar on the hex that produces unusual motor outputs?

        • Thanks for sharing Francis.

          A few observations:

            - First, not *all* of my crashes have the huge spike in DRol and DPit. The first few do (and clearly saturate at the limits, which I verified by changing the limits), but I believe some of the later ones don't.

          Regarding long flights: I got it shortly after take-off (20s), and I got it after 5 minutes. There seems to be no rhyme or reason.

          Regarding the logging: it did cross my mind, but I ruled it out for two reasons. First, when it started (see the first two logs) I didn't have any significant logging enabled, certainly not MOTORS/RCOU, RCIN or IMU. Second, the most recent crash (or the one before it) has an impeccable performance-monitoring log, with 1000 cycles every second and an almost perfect max time, so the CPU does not seem to have been overly busy. So I don't think logging is to blame; the only effect of disabling it would be that I wouldn't know what happened. On the contrary, tonight I enabled INAV to try to figure out what's going on.

          • One more thing: this evening (almost night, 8:35pm) the drone recovered from what is usually a terrible crash. Same exact bug, but this time the state was not completely destroyed. A short timeline:

              - drone is fine

              - RTL is triggered (line 11176)

              - within 100ms all motors drop to the exact same value (1238) - line 11195

              - within another 100ms the motors are back up and the drone drops only 20cm. It was very noticeable to the ear: I clearly heard the motors stop and restart.

            • Not sure if it helps at this stage, but the other day I noticed my motors stop in ALT HOLD. I think that, at least in my case, it was correct behaviour: I had some sideways speed and came to a quick stop, the copter had climbed a bit due to the forward motion, and the motors stopped to bring the altitude back to the correct level. It was kind of scary, though: all of a sudden no noise, then back to stable.

            • Forgot to upload the log.

              recoveredFromRTL.log

    • I suspect that the cause of the RTL is not important - in our case it was the last step in the AUTO trajectory.

      I hope this gets fixed. I do think it may be specific to a Hex (or Hex X) frame; otherwise many others would have seen it.

      M.
