A few users, myself included, appear to be having issues calibrating accels and/or arming when using APM 3.2 on a Pixhawk. I see posts here and there, but thought it may be best if those having the issue could all 'check in' to one post so that the developers can see how widespread the issue is and gather data.
My Pixhawk will arm occasionally, and will even fly successfully. At other times, however, it will not arm and I receive the pre-arm error "Accels not healthy". A few days ago I was able to arm and fly through one lipo, but after landing and swapping batteries I could not arm because of this error. I have attached the logs from the successful flight, as well as the failed pre-arm logs, to this post.
As you can see from these images there is something amiss with IMU2. This is a snippet from the good flight, showing values for AccX on IMU1 and IMU2:
and here are the same values on the second attempt, pre-arm.
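For anyone who wants to make the same comparison on numbers rather than by eye, here is a minimal Python sketch. It assumes you have already pulled the AccX values for IMU1 and IMU2 out of a log into two lists aligned sample by sample. The function name `accel_disagreement`, the 1.0 m/s^2 threshold and the sample values are all made up for illustration; this is not the check the firmware uses for "Accels not healthy".

```python
def accel_disagreement(acc1, acc2, threshold=1.0):
    """Return the indices where two accel series differ by more than threshold (m/s^2)."""
    if len(acc1) != len(acc2):
        raise ValueError("series must have the same length")
    return [i for i, (a, b) in enumerate(zip(acc1, acc2)) if abs(a - b) > threshold]

# Made-up samples for illustration only, NOT taken from the attached logs.
imu1_accx = [0.1, 0.2, 0.1, 0.0, 0.1]
imu2_accx = [0.1, 0.3, 5.0, 5.0, 0.2]

# Indices of the samples where the two IMUs disagree by more than the threshold.
print(accel_disagreement(imu1_accx, imu2_accx))
```

A healthy pair of sensors should give an empty list or only the odd isolated index; long runs of flagged samples are the kind of thing the graphs above show for IMU2.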
In the 3.2 release thread a few users had mentioned the issue and Randy advised to set the log_bitmask to "131070" so that it will log everything including the pre-arm checks. I encourage others having the issue to do the same and share the logs and experiences here so that we can find out what is going on here - is there a bad batch of Pixhawks in the wild that only now show the hardware errors due to something new in 3.2, or is there an issue in the APM software?
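If you are curious what the value 131070 amounts to, the short Python sketch below decodes it. It only reports which bit positions are set and makes no assumption about what each bit means in the firmware; the helper name `set_bits` is just for illustration.

```python
def set_bits(mask):
    """Return the positions of the bits set in mask, lowest bit first."""
    return [pos for pos in range(mask.bit_length()) if mask & (1 << pos)]

LOG_BITMASK = 131070  # value suggested in the 3.2 release thread

print(bin(LOG_BITMASK))       # binary form of the mask
print(set_bits(LOG_BITMASK))  # positions of the bits that are set
```

The value works out to 2^17 - 2, that is, bits 1 through 16 set and bit 0 clear.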
Replies
So I put freezer bags on the Pixhawk and am waiting.
Does anybody know what the press_temp value means? Is it the temperature in C after dividing by 100?
I just disabled the GPS pre-arm check (in the basement GPS doesn't work, even with a Neo8) and now I am getting "Thr below failsafe", so I suppose the pre-arm checks are still OK. I will wait a little.
What I see is a slow rise in altitude (a known feature of the baro; even with temperature compensation the altitude changes).
Ok, so either the freezer bag was not cold enough to cool the Pixhawk down, or the problem is something different.
Same issue here. Attached logs for bad (1) and good (2). I don't know if the bad log is any good, as I enabled logging while the alert was on.
It seems that when I plug the battery in inside the house, I usually get no warnings. But if I do it outside (it has been around -3 to -8 Celsius) I get bad accel health all the time.
I have the rtfq version of the Pixhawk board.
2015-02-16 22-34-07 1.bin
2015-02-16 22-34-13 2.bin
Hello
2015-02-16 22-34-13 2.bin is normal.
2015-02-16 22-34-07 1.bin shows an LSM303D failure.
That serial number is not on my list of 3DRobotics manufactured Pixhawks, but if it is recent I might not have it.
If it is a genuine board then please contact help at 3DR for a replacement, otherwise you will need to contact the manufacturer for replacement. Is it a 3DR manufactured board?
Craig, how do you see this issue as a whole? Are there bad LSM chips on the loose, or could it be something else? Has 3DR got any of these problematic boards back and had a chance to analyze them yet? It seems this is not a manufacturer-dependent problem.
As I said in my first post, I have the rtf version of the board, so I'll contact rtfquads about a possible RMA. But before that it would be nice to figure out the reason. If there's something wrong with the design, a replacement alone would not fix it. Are you able to rule out a design problem?
We have autopsied quite a number of boards and used the serial numbers to track manufacturing dates and batch numbers. That's how we solved this. As somebody who has worked both on the design of the Pixhawk hardware and the software I would say we have a very good understanding of the various failure modes and there are multiple ways for the sensors to fail but I have not seen any evidence of defective LSM303D or MPU6000 devices.
Just a bit of history, we initially had problems with the LSM303D failing on the early Pixhawks and we added the MPU6000 which we had previously used on the APM and the PX4 as backup. After that we resolved the issues with the LSM303D after speaking directly with the manufacturer of the chip. They had some hidden registers that needed to be configured and after we made those changes to the code we have not had any code related problems since then.
When we started seeing new problems in the fall, we x-rayed and used a scanning electron microscope to inspect boards exhibiting problems with the MPU6000 and determined the failure mode.
Our analysis of the failed boards showed we have two issues with where and how we placed the MPU6000 on the Pixhawk. One of those issues was further exacerbated by a process change made in July of last year. Rather than make a design change we have resolved the issues on the assembly line. The clone manufacturers will have to make those changes to their processes as well.
The analysis of logs http://uav.tridgell.net/MPU6000-error/summary/summary.html also shows more failures of the LSM303D from clone boards such as the one you have. I'm curious what is going on there but I really don't know what the problem is. I suspect they are experiencing similar manufacturing issues and damaging the LSM303D on the boards they assemble. Unfortunately we have no way to identify which clone manufacturer or manufacturers are having issues but it is interesting to see.
Craig, is one of the contributing factors you mentioned ("two issues with where and how we placed the MPU6000 on the Pixhawk") decoupling related? I have noticed that when I plug in the battery with my Taranis next to the copter I usually get a bad_acc message. However, if there is about 6-7 meters between the copter and the Taranis it is usually fine. Also, I think it is worth investigating supplying clean power at boot time. In my case, on one of the machines I have put an LC filter in the power line and powered my Rx from a separate BEC (for landing gear), with the power wire removed from the sbus connection. This setup has not given me any trouble since late December; however, as soon as I plug the power wire into the sbus connection for backup power I get bad_acc messages. Removing the BEC from the Rx and powering it from the sbus line gives bad_acc messages rather often. This last experiment, as well as the proximity to the RC Tx, led me to suspect decoupling issues.
Not decoupling. It is all mechanical, related to pads / solder paste / thermal profile in the reflow oven and de-paneling the boards.
Power issues are an interesting point.
I had some power issues since the beginning (RTFHawk, intermittent LSM303D failures).
I always put it down to my USB connection not giving enough power to the board.
But unfortunately these problems are quite difficult to reproduce and one would have to bench test it. Actually I also have some sparking (4S, extra caps between battery and ESC in the arms...) on battery connection...
Today, I flew my plane with the same Pixhawk that had Bad ___ Health warnings. Latest MP and 3.2.2 plane FW. This time, I had only a bad AHRS warning. I believe this was due to the GPS not having sufficient sats and/or HDOP. The warning did not go away even after I had 13 sats and HDOP of 1.4. I unplugged the battery and rebooted the PH and all was good. The flight was uneventful.
So, are the issues intermittent or consistently repeatable? As for reviewing the data flash file, I did that but I am not sure what to look for. Is there an Accel Data Flash file reading for Dummies? At least a good and a bad example, with the data points that indicate a malfunction. Is the auto analysis sufficient? If it shows good for accel offset, is that good enough?
Btw, thanks for all the explanations, they are very helpful indeed.