A few users, myself included, appear to be having issues calibrating accels and/or arming when using APM 3.2 on a Pixhawk. I see posts here and there, but thought it may be best if those having the issue could all 'check in' to one post so that the developers can see how widespread the issue is and gather data.
My Pixhawk will arm occasionally, and will even fly successfully. Occasionally, however, it will not arm and I receive the pre-arm error "Accels not healthy". A few days ago I was able to arm and fly through one LiPo, then after landing and swapping batteries I could not arm because of this error. I have attached the logs from the successful flight, as well as the failed pre-arm logs, to this post.
As you can see from these images, there is something amiss with IMU2. This is a snippet from the good flight, showing values for AccX on IMU1 and IMU2:
and here are the same values on the second attempt, pre-arm.
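For anyone who wants to check their own logs for the same pattern, here is a minimal sketch. It assumes the AccX values for IMU1 and IMU2 have already been exported from the dataflash log into two equal-length lists. The function name, the sample numbers and the 1.0 m/s^2 threshold are made up for illustration; they are not taken from my logs or from the firmware's own pre-arm check.

```python
def find_disagreements(imu1_accx, imu2_accx, threshold=1.0):
    """Return indices of samples where the two accelerometers
    differ by more than `threshold` (m/s^2, an assumed value)."""
    if len(imu1_accx) != len(imu2_accx):
        raise ValueError("both series must have the same number of samples")
    return [
        i
        for i, (a, b) in enumerate(zip(imu1_accx, imu2_accx))
        if abs(a - b) > threshold
    ]


# Made-up sample data: IMU2 jumps away from IMU1 on the third sample.
imu1 = [0.02, -0.05, 0.10, 0.03]
imu2 = [0.01, -0.04, 9.80, 0.02]

print(find_disagreements(imu1, imu2))
```

A long run of flagged indices on one IMU while the other stays plausible would match what the plots show for IMU2.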
In the 3.2 release thread a few users had mentioned the issue and Randy advised to set the log_bitmask to "131070" so that it will log everything including the pre-arm checks. I encourage others having the issue to do the same and share the logs and experiences here so that we can find out what is going on here - is there a bad batch of Pixhawks in the wild that only now show the hardware errors due to something new in 3.2, or is there an issue in the APM software?
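As a side note on that value, a small sketch shows which bit positions 131070 actually sets. It only decodes the number; which log message each bit enables is defined by the firmware and is not shown here. The helper name is hypothetical.

```python
def set_bits(mask):
    """Return the positions (0 = least significant) of the bits set in mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


mask = 131070
print(hex(mask))       # hexadecimal form of the same value
print(set_bits(mask))  # bits 1 through 16 are set, bit 0 is clear
```

In other words, 131070 is 2^17 - 2, so it switches on every bit from 1 to 16 in one go.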
Re: cold weather affecting the controller
I live in Michigan. It's been VERY cold. I tried many times to see if temps had any effect on it. I brought it outside and started it right up in 10 degrees with no issues. Next I let it sit outside, powered off, for 45 minutes in 10 degree temps, then I started it up and flew. No issues. About the only thing I was really worried about was my props being really brittle. I've had several dozen flights in cold weather and never had an issue.
All axes behave the same way. I just used X as an example.
I haven't been able to do a good flight since I enabled full logging, so it's a work in progress. But the LSM has behaved that way when I connect it to USB. On some connections all is fine, on others it has failed. Same with battery.
You are writing that Tridge's log summary shows "more" failures of the LSM303D with clone boards. Do you mean more LSM303D failures than MPU6K failures with clone boards?
I don't think I have found a single report of LSM303D failure with genuine 3DR Pixhawks.
So is there any significant number of genuine 3DR Pixhawks with LSM303D failures?
I think there are quite a few people like me in line to replace faulty clone boards with 3DR boards, and this info would be quite useful as the MPU6K problem is somehow resolved...
My bad, I had understood that the manufacturing process could have broken the chip, but it was just a solder joint.
I do value the vehicle and I'll of course try to RMA my board, as it clearly seems to have a problem.
I asked because I thought one is the main accel (MPU?) and the second one a backup (LSM?).
Craig, is one of the contributing factors you mentioned ("two issues with where and how we placed the MPU6000 on the Pixhawk") decoupling related? I have noticed that when I plug in the battery with my Taranis next to the copter I usually get a bad_acc message. However, if there is about 6-7 meters between the copter and the Taranis it is usually fine. Also, I think it is worth investigating whether supplying clean power at boot time helps. In my case, on one of the machines I have put an LC filter in the power line and I power my Rx from a separate BEC (for landing gear), with the power wire removed from the sbus connection. This setup has not given me any trouble since late December; however, as soon as I plug the power wire into the sbus connection for backup power I get bad_acc messages. Removing the BEC from the Rx and powering it from the sbus line gives bad_acc messages rather often. This last experiment, as well as the proximity to the RC Tx, led me to think of issues with decoupling.
I should say that I am no expert in this field, but for what it's worth: Walkera insist on using two battery connectors, with the negative connected first and then the positive, and the reverse for disconnection. Some healthy cap sparks to be seen....
Power issues are an interesting point.
I have had some power issues since the beginning (RTFHawk, intermittent LSM303D failures).
I always put it down to my USB connection not giving enough power to the board.
But unfortunately these problems are quite difficult to reproduce and one would have to bench test it. Actually I also have some sparking (4S, extra caps between battery and ESC in the arms...) on battery connection...
There's always this solution:
I've also seen XT-90 connectors with a spark eliminator built into the connector itself. As you plug the unit together, a pre-charge resistor charges the caps before the connector is fully seated.
I don't believe this is the cause of random boot-up errors. I've been watching this thread, have been in contact with 3DR, and have had conversations with @Craig Elder. Craig has pointed out some additional pre-check functions that occur in AC3.2 (different from AC3.1.5), but I believe there are additional issues. I'm an Electronic Engineer (by BS degree), and something just doesn't make sense when the errors are cleared by simply re-booting the F/C. Temperature change is not an issue on my F/C: I live in Vegas, the temps have been in the mid 70's for the past several weeks, and I still experience the errors. For me, taking the F/C from indoors to outdoors is not the problem. Power supply is not the problem either: I run a very stout UBEC, and I have been very careful not to move the F/C on boot-up (gently plugging in the battery connectors). I've also upgraded my MinimOSD firmware to the r800 version, and 50% of the time I boot up I get a "No Mavlink Data" error, then I power-cycle and get Mavlink data. My OSD worked perfectly every time prior to the F/W upgrade.
I know I have described a few different errors and conditions, which I am not asking anyone to troubleshoot; I'm just stating that I get a stable boot-up only about 25% of the time. There were no changes whatsoever in my hardware or wiring prior to the AC3.2 upgrade, it's just not reliable any more...
@Craig has invited me to look at the firmware coding, I just need to be pointed in the right direction to get started. I did a lot of programming 10 and 15 years ago, so learning one language vs. another shouldn't be a struggle.
Sorry, just frustrated. I had only one good flight out of two this past weekend with the Pix-Hex. The first resulted in another broken landing gear: no thrust with the stick full up on a gentle descent, which ended in a "bump and go" with the ground. That was on a 10,000mAh battery, only two minutes into the flight. I was able to land, re-boot and fly another 15 minutes on the same battery, so charge and/or weight was not an issue.
On a side-note, got six great flights with my NAZA M V2 on my F450. I've asked 3DR for an RMA / exchange on my Pix, because if AC3.2 really is reporting additional errors on the F/C and they were always there before AC3.2 (just not reported), then there is something wrong with the F/C in my opinion. 50% success rate for flying, and 25% success rate on boot-up is not leaving any "warm-fuzzies". I may be on the short list for pulling the Pix because wrecking my $1800 multi-rotor is not an option.
Not decoupling. All mechanical related to pads / solder paste / thermal profile in the re-flow oven and de-paneling the boards.
I'm also an EE. I've worked on many detailed Failure Analyses, as well. A cracked solder joint on an SMT device could show up as intermittent with vibration or thermal excursion. In fact, I have seen this before. One could stretch and say that plugging in the battery heated the PWB and thus caused the connection to close. But as in your case, I could plug and unplug numerous times with random results. This signature leads one away from the cracked solder joint theory and more towards a timing or sequencing problem in the design. It could also be a self-calibration problem at the device level. One thing is for sure: there are enough occurrences that this is not a very isolated, random anomaly. There is a root cause somewhere.
This is just my two cents and not meant to step on the efforts of 3DR and others to isolate the failure cause.
I would suggest we try to keep this thread primarily to those having issues they think are related to a bad IMU, and post any and all logs so that the devs can look at them to see if that is in fact the case. If the production issue has been resolved, then the only remaining thing is to try to help those affected; the rest is just noise to the devs. This post included, I suppose :]
Huhh? So if the problem is intermittent and one sends the log that happens to be when the accel is operating within normal limits, is it an okay accel? If only some small number of accels were affected by a solder joint anomaly due to an incorrect thermal/time profile of the reflow oven and the rest have another technical issue, what to do then? Scientific inquiry does not look for the simple answer but the right answer. Without true root cause, containment is impossible.
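To put a rough number on the "one good log" problem, here is a sketch. It assumes each boot shows the fault independently with the same probability, which is a simplification and may well not hold for a thermal or timing dependent fault. The 25% fault rate in the example is purely illustrative, not a measured figure.

```python
def prob_all_clean(p_fault, n_logs):
    """Probability that n_logs independent boots all show a healthy accel,
    given a per-boot fault probability p_fault (assumed constant)."""
    if not 0.0 <= p_fault <= 1.0:
        raise ValueError("p_fault must be between 0 and 1")
    if n_logs < 0:
        raise ValueError("n_logs must not be negative")
    return (1.0 - p_fault) ** n_logs


# Illustrative only: a board that faults on 1 boot in 4.
for n in (1, 3, 5):
    print(n, round(prob_all_clean(0.25, n), 3))
```

Under those assumptions a faulty board would still hand over a clean-looking log most of the time when only a single log is submitted, which is why one healthy log says very little.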
And if you haven't read, I have two Pixhawks that exhibit intermittent anomalous behavior (Bad Accel Health). I have looked at the solder joints on the MPU6000 and LSM303D at 10X and 20X and see no evidence of insufficient solder or an out-of-spec soldering job.