A few users, myself included, appear to be having issues calibrating accels and/or arming when using APM 3.2 on a Pixhawk.  I see posts here and there, but thought it may be best if those having the issue could all 'check in' to one post so that the developers can see how widespread the issue is and gather data.

My Pixhawk will arm occasionally, and will even fly successfully.  Occasionally, however, it will not arm and I receive the pre-arm error "Accels not healthy".  A few days ago I was able to arm and fly through one LiPo, but after landing and swapping batteries I could not arm because of this error.  I have attached the logs from the successful flight, as well as the failed pre-arm logs, to this post.

As you can see from these images, there is something amiss with IMU2.  This is a snippet from the good flight, showing values for AccX on IMU1 and IMU2:

and here are the same values on the second attempt, pre-arm.

In the 3.2 release thread a few users had mentioned the issue and Randy advised to set the log_bitmask to "131070" so that it will log everything including the pre-arm checks.  I encourage others having the issue to do the same and share the logs and experiences here so that we can find out what is going on here - is there a bad batch of Pixhawks in the wild that only now show the hardware errors due to something new in 3.2, or is there an issue in the APM software?
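For anyone curious what "131070" actually switches on, here is a minimal sketch that only does the arithmetic on the number itself. The helper name `set_bits` is made up for illustration, and which log message each bit position enables depends on the firmware version, so that mapping is deliberately not shown.

```python
def set_bits(mask):
    """Return the positions of the bits that are set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


mask = 131070  # value quoted in this thread for log_bitmask

print(hex(mask))       # 0x1fffe
print(set_bits(mask))  # bit positions 1 through 16

# Bit 0 is the only one of the low 17 bits left clear.
print(mask & 1)        # 0
```

In other words the value is 2^17 - 2: every bit from 1 to 16 is set, which fits the description of logging more or less everything.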

Replies to This Discussion

So I put freezer bags on the Pixhawk and am waiting.

Does anybody know what the press_temp value means? C divided by 100?

I just disabled the GPS pre-arm check (GPS doesn't work in the basement, even with a Neo8) and am now getting "Thr below failsafe", so I suppose the arming pre-checks are still OK. Will wait a little.

What I see is a slow rise in altitude (a known feature of the baro; even with temperature compensation the altitude changes).

OK, so the freezer bag was not cold enough to cool the Pixhawk down, or the problem is something different.

We have autopsied quite a number of boards and used the serial numbers to track manufacturing dates and batch numbers. That's how we solved this.  As somebody who has worked on both the design of the Pixhawk hardware and the software, I would say we have a very good understanding of the various failure modes. There are multiple ways for the sensors to fail, but I have not seen any evidence of defective LSM303D or MPU6000 devices.

Just a bit of history: we initially had problems with the LSM303D failing on the early Pixhawks, and we added the MPU6000, which we had previously used on the APM and the PX4, as a backup.  After that we resolved the issues with the LSM303D after speaking directly with the manufacturer of the chip. They had some hidden registers that needed to be configured, and after we made those changes to the code we have not had any code-related problems since.

When we started seeing new problems in the fall, we x-rayed boards exhibiting problems with the MPU6000, inspected them with a scanning electron microscope, and determined the failure mode.

Our analysis of the failed boards showed we have two issues with where and how we placed the MPU6000 on the Pixhawk.  One of those issues was further exacerbated by a process change made in July of last year.  Rather than make a design change, we have resolved the issues on the assembly line. The clone manufacturers will have to make those changes to their processes as well.

The analysis of logs http://uav.tridgell.net/MPU6000-error/summary/summary.html also shows more failures of the LSM303D from clone boards such as the one you have.  I'm curious what is going on there but I really don't know what the problem is.  I suspect they are experiencing similar manufacturing issues and damaging the LSM303D on the boards they assemble.  Unfortunately we have no way to identify which clone manufacturer or manufacturers are having issues but it is interesting to see. 

Yes.

Thank you for the detailed response! So it seems in my case whoever assembles the rtf boards probably has some trouble on the line with the LSM chips.

Shouldn't it be just fine to fly with the working mpu6000? Now it's only possible if I switch off the boot time check?

One more time: so these damaged chips are not totally dead, as they seem to work part-time (at least in my case)? Was that also the case with the mpu6000, or were they totally dead? I'm sure you understand the confusion, as the chip seems to have a mind of its own.

Pixhawk runs fine in cold and hot.  We have one sitting in a thermal chamber cycling between -40 and +40 C; however, this is more dramatic: https://www.youtube.com/watch?v=DfZfNk-jYdI

Today, I flew my plane with the same Pixhawk that had Bad ___ Health warnings. Latest MP and 3.2.2 plane FW. This time, I had only a bad AHRS warning. I believe this was due to the GPS not having sufficient sats and/or HDOP. The warning did not go away even after I had 13 sats and an HDOP of 1.4. I unplugged the battery and rebooted the PH and all was good. The flight was uneventful.

So, is the issue(s) intermittent or consistently repeatable? As to reviewing the data flash file, I did that but I am not sure what to look for. Is there an Accel Data Flash file reading for Dummies? At least a good and bad with the data points that indicate malfunction. Is the auto analysis sufficient? If it shows good for accel offset, is that good enough?

Btw, thanks for all the explanations, they are very helpful indeed.

Craig, et al,

It sounds like 3DR has determined the root cause of this issue and contained it. Can we assume that all Pixhawks that are currently shipping have been screened for the Accel anomaly?

Thanks!

You can check it by selecting, for example, accx from both IMUs and comparing their readings. When mine is misbehaving, it shows a constant value of +57, while the working one shows changing values around zero. I'm sure other kinds of anomalies, such as sudden jumps in value, are also quite clearly visible.
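If you would rather not eyeball the graph, the same comparison can be sketched in a few lines of Python. This is only a rough illustration: it assumes you have already exported the AccX samples for each IMU into plain lists, and the function name `find_stuck_runs` and the thresholds `min_run` and `tolerance` are made-up values, not anything from the firmware or Mission Planner.

```python
def find_stuck_runs(samples, min_run=50, tolerance=1e-6):
    """Return (start, end) index pairs where the reading does not change
    for at least min_run consecutive samples (end is exclusive)."""
    runs = []
    start = 0
    for i in range(1, len(samples) + 1):
        # Close the current run at the end of the data or when the value moves.
        if i == len(samples) or abs(samples[i] - samples[start]) > tolerance:
            if i - start >= min_run:
                runs.append((start, i))
            start = i
    return runs


# Synthetic data only: a healthy sensor jittering around zero and a
# misbehaving one pinned at +57, as described above.
imu1_accx = [((i * 7) % 11 - 5) / 10 for i in range(100)]
imu2_accx = [57.0] * 100

print(find_stuck_runs(imu1_accx))  # []
print(find_stuck_runs(imu2_accx))  # [(0, 100)]
```

A flat run on one IMU while the other keeps moving is the pattern to look for. Sudden jumps would need a different check, for example comparing the difference between consecutive samples against a limit.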

Mine seems intermittent; imu2 could work just fine on another boot. I haven't got enough data yet to tell whether it happens during flight.

Are you saying only accel x acts up and not Y and Z? It's very strange that rebooting can clear the problem. Of course, I am not privy to the root cause of the failure, the manufacturing process that Craig has alluded to.

Your point is also valid. Once one gets a clean boot, will the accels behave properly until the next boot or can they go South?

Again, I see no evidence that there is anything wrong with the devices themselves.  In the manufacturing process the solder that holds the chips to the board is fractured. Sometimes there is a connection and sometimes not.

>>>Shouldn't it be just fine to fly with the working mpu6000? 

No.

Well I guess that depends on how much you value your vehicle. At some point it is going to crash.

Yes, that is correct.  Every function and every device on every Pixhawk is tested and demonstrated to be working before it leaves the factory.
