Pixhawk BAD ACCEL HEALTH APM 3.2 - collecting data

A few users, myself included, appear to be having issues calibrating accels and/or arming when using APM 3.2 on a Pixhawk.  I see posts here and there, but thought it may be best if those having the issue could all 'check in' to one post so that the developers can see how widespread the issue is and gather data.

My Pixhawk will arm occasionally, and will even fly successfully.  Occasionally, however, it will not arm and I will receive the pre-arm error "Accels not healthy".  A few days ago I was able to arm and fly through one lipo, then after landing and swapping batteries I could not arm for this reason.  I have attached the logs from the successful flight, as well as the failed pre-arm logs, to this post.

As you can see from these images there is something amiss with IMU2.  This is a snippet from the good flight, showing values for AccX on IMU1 and IMU2:

[Image: AccX on IMU1 and IMU2 during the good flight]

and here are the same values on the second attempt, pre-arm.

[Image: AccX on IMU1 and IMU2 on the second attempt, pre-arm]

In the 3.2 release thread a few users had mentioned the issue, and Randy advised setting the log_bitmask to "131070" so that it will log everything, including the pre-arm checks.  I encourage others having the issue to do the same and share their logs and experiences here so that we can find out what is going on: is there a bad batch of Pixhawks in the wild that only now show the hardware errors due to something new in 3.2, or is there an issue in the APM software?
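For anyone curious about the number itself: 131070 is 2^17 - 2, i.e. a bitmask with bits 1 through 16 set and bit 0 clear. Below is a minimal Python sketch that decodes it. It only lists bit positions and deliberately does not say which log type each bit enables, since that mapping depends on the firmware version. The helper name `set_bits` is made up for this example.

```python
def set_bits(mask):
    """Return the positions of the bits that are set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


# Value quoted from the 3.2 release thread.
LOG_BITMASK_VALUE = 131070

bits = set_bits(LOG_BITMASK_VALUE)
print(hex(LOG_BITMASK_VALUE))  # hexadecimal form of the mask
print(bits)                    # positions of the enabled bits
```

Any other value can be checked the same way before writing it to the parameter.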

goodlog.bin

badlog.bin

Replies

      • Developer

        Hi Erik,

        Your log has helped me find an issue with the register checking in the lsm303d driver. Can you try again with the latest build?

        Can you also tell me where you bought the board?

        Cheers, Tridge

  • Thanks everyone for speaking up - I'm not sure if the Devs are looking into this or not, but I'm going to send them a message.  

    • Hi Josh, we're looking at the problem whether it's 3DR boards or others.  But, it's not easy to figure out what the problem is.  It's entirely possible, and even likely, that this is a hardware problem that has nothing to do with the firmware.  It's also possible that this has been happening for a long time, and it's only with 3.2 that the problem is visible because we were not checking for it before.

      I wouldn't expect there is going to be any fix that will "roll back a change". It's pretty common when these sorts of things happen that people think it's something we did.  GPS problems are classic.  But it's normally nothing to do with firmware changes.

      • Thank you from my side, too.
        If you suggest any testing routine I would happily do it, when I'm back in Germany in January.
      • Thank you Rob, glad it is being looked into. The issue I guess we have is that we could replace clone boards with 3DR only for the same problem to occur.

        As you point out, things were fine until 3.2 (also the last 2 betas of 3.2 in my case) and this would tend to say that the code has either exposed new hardware functions (checking 2nd IMU) or is being stricter on checks. 

        It is really odd that connecting / disconnecting a battery 3-4 times will clear the problem. If it was a simple hardware issue, I'd expect it to be reproducible with reasonable consistency. Mind you, same could be said of many s/w issues <grin> Oh, to be a coder ...

        All your hard work is genuinely appreciated. Bit nervous now as my flight test is in a week and I'll look really daft if I can't even get the thing to fly on the day!

      • My apologies if it seems I am insinuating it is something you did intentionally - that is not the case.  I agree that this is most likely a hardware issue, and apparently it is not affecting official hardware (or at least as much).  However if a hardware check was put in place that detects this issue where it didn't before, it would be worth it to the community to explain why it is doing so and why it was implemented in 3.2 at all.

        I don't know much about the code, but if IMU2 has been providing bad data all this time (which is very likely) then it apparently isn't used for anything currently.  Do the devs have plans for it in the future, or was the hardware check put in place for some other reason?  If it is unused, perhaps you could add an option to disable this check so that those with third-party boards could fly again?

        • @ThatJoshGuy, I am concerned that your question "or was the hardware check put in place for some other reason?" may be at the heart of the matter.  I hope the check was not coded intentionally, to make certain kinds of boards inoperable. 

          Your suggestion/solution "If it is unused, perhaps you could add an option to disable this check so that those with third-party boards could fly again?" is a stellar idea.

          • Larry, there is no conspiracy.

            Here is a brief history:

            1. The APM class boards used the MPU6000 gyro/accel chip.
            2. The Pixhawk was originally designed to use the LSM303D chip, as it was supposed to be better.
            3. Initial prototype testing of Pixhawk boards revealed problems with the LSM chip.  To avoid further delays of Pixhawk production, it was decided to add the MPU chip back onto the board, so now there would be 2.  
            4. At some point shortly after the MPU was added, the LSM problem was solved, but it was decided to leave both chips on the board anyway, as it allows us to take advantage of redundant sensors.
            5. Up until 3.2, the MPU chip was the only one being used. The LSM was basically ignored.  If it failed, the code did not care.
            6. Starting with 3.2, we wanted to start taking advantage of the sensor redundancy, so we began using the data from the LSM.  Generally this is fine, except for cases where the LSM has a problem.  Or the MPU has a problem.  We did not foresee this failure rate until the code was released.  The problem is, either chip can fail, and the code doesn't know which is good and which is bad. It just knows they're different.  In this early stage, using two sensors does not really offer redundancy and increased reliability.  It actually doubles our chance of having a problem.
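The "doubles our chance" remark in point 6 can be checked with a little arithmetic. Assuming, purely for illustration, that each chip fails independently with the same small probability p, the chance that at least one of n chips fails is 1 - (1 - p)^n, which for two chips is 2p - p^2, or very nearly 2p. The failure rate used below is a made-up number, not a measured one, and `p_any_failure` is a hypothetical helper.

```python
def p_any_failure(p, n=2):
    """Probability that at least one of n independent sensors fails,
    when each one fails with probability p."""
    return 1.0 - (1.0 - p) ** n


p = 0.01  # hypothetical per-chip failure probability, for illustration only

one_sensor = p_any_failure(p, n=1)
two_sensors = p_any_failure(p, n=2)

# If any single failure blocks arming, two chips are nearly twice as likely
# to cause trouble as one.
print(two_sensors / one_sensor)
```

With p = 0.01 the ratio comes out just under 2, which matches the point that two sensors without a way to pick the good one hurt rather than help.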

            The code is being worked on, to enable it to robustly detect which chip has failed, and then ignore that chip.  This will get us to the redundancy we initially intended. 
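To illustrate why a plain comparison is not enough, here is a rough Python sketch. It is not the actual ArduCopter code. `accels_disagree` can only report that the two readings differ; `pick_suspect` adds one possible heuristic, namely that an accelerometer on a stationary vehicle should read close to 1 g in magnitude. The function names, the thresholds, the sample readings and the heuristic itself are all assumptions made for this example.

```python
import math

GRAVITY = 9.80665  # standard gravity in m/s^2


def norm(v):
    """Euclidean length of a 3-axis reading."""
    return math.sqrt(sum(c * c for c in v))


def accels_disagree(a1, a2, max_diff=1.0):
    """True if the two readings differ by more than max_diff (m/s^2).
    This says nothing about which sensor is wrong."""
    diff = [x - y for x, y in zip(a1, a2)]
    return norm(diff) > max_diff


def pick_suspect(a1, a2, tolerance=1.5):
    """Hypothetical heuristic for a stationary vehicle: return 1 or 2 for the
    sensor whose magnitude is far from gravity, or None if it cannot tell."""
    bad1 = abs(norm(a1) - GRAVITY) > tolerance
    bad2 = abs(norm(a2) - GRAVITY) > tolerance
    if bad1 and not bad2:
        return 1
    if bad2 and not bad1:
        return 2
    return None  # both look fine, or both look wrong


# Made-up readings in m/s^2: IMU1 looks sane, IMU2 has a large X offset.
imu1 = (0.1, -0.2, -9.8)
imu2 = (6.0, -0.2, -9.8)

print(accels_disagree(imu1, imu2))  # the readings differ...
print(pick_suspect(imu1, imu2))     # ...and the heuristic points at IMU2
```

A heuristic like this only works while the vehicle is not accelerating, which is part of why telling the good sensor from the bad one in flight is harder than noticing that they differ.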

            Whenever these sorts of things happen, people always jump to conclude that we are intentionally doing something to mess with clone boards.  This is not the case.  The fact is, if the other manufacturers contributed resources to finding these problems, we'd probably prevent them and/or solve them faster.  

            • @Rob and Tridge,

              What a great “professional” response and one that is much appreciated!  There is such solace in knowing the scope of the problem and moreover, hearing there is a potential solution.  

              Please, accept my apology for doubting your commitment to all of the community, be they Pixhawk or clone users.  Your response here has clearly demonstrated you are men of character.  I wish I could help with the solution, but my coding days are long over as I have been retired for many years.  However, I am more than willing to participate in any Beta testing you may prescribe.

              Respectfully,

              Larry

          • There have been reports of 3DR boards with the fault, too.
            In 3.2 the data from both IMUs is used for the EKF. That's probably why it is tested more thoroughly.
            The "option to disable this check" is already there.
            Just disable the prearm checks.
            I'm flying a clone and I want to know if any component is faulty.
