Hello,

Here are the key points in brief:

  1. The "BAD ACCEL HEALTH" error can have two different causes:
    • IMU1 error: an MPU6000 soldering issue during production of some early PixHawk batches. 3DR has since fixed it.
    • IMU2 error: improper discharge of the LSM303D supply during power-off.
  2. This solution applies only to the LSM303D case.
  3. The necessary hardware has always been on the board, and it can be controlled by the FMU firmware.
  4. For some reason this feature seems never to have been used in FMU initialisation, although it has always been accessible from the nsh CLI as the "fmu sensor_reset" command.
  5. The reason it isn't used in FMU initialisation seems to be a bug in the sensor_reset function: it breaks SPI functionality in the firmware.
  6. The fix for the broken SPI after "fmu sensor_reset" is here.
  7. With that fix, sensor_reset can be used in FMU initialisation to discharge IMU2 properly and avoid the IMU2-related BAD ACCEL HEALTH error.
  8. This has been implemented as a firmware fix, available here.
  9. A pre-built 3.2.1 image with these fixes is available here for testing purposes.
  10. I am currently testing the fixed v3.2.1 firmware. You can do the same at your own risk.
  11. There is a chance that it will not resolve the issue, in which case a hardware fix will be needed. One is proposed below.
    • It could be implemented with a minor design change (Q601 replacement).
    • It could also be implemented as a post-production hardware workaround (an SMD resistor soldered on top of C506).

UPDATE 1: 20ms is not enough

I tested the fix over a number of flights. It doesn't break anything, but it doesn't work with a 20ms discharge time. I am going to increase it to 1s, but I don't have much hope that will be enough. It seems a pure software solution will not fix the problem without a hardware fix.

UPDATE 2: It depends on temperature

I have two HKPilot32 boards. I flashed item #2 with the original ArduCopter-3.2.1; item #1 was already flashed with the 1s power reset fix. I put item #1 in the refrigerator and started with item #2.

Power-up attempts are numbered in the order they were performed, in one common sequence across both boards. Each power cycle was done by disconnecting the battery and connecting it back. To reproduce the issue, the disconnection time should be less than 1 second. My results are below:

Item #2, original 3.2.1, room temp about 22C
--------------------------------------------
bad: 2 4 5 6 7 8 9 10 11 12 14 15 16 17 18 19
good: 1 3 13 20

Item #2, 1s reset fix for 3.2.1, room temp about 22C
------------------------------------------------------
bad: -
good: all from 21 to 40


WOW! So it works?! Let's check it with the cooled item #1.
I took item #1 out of the refrigerator and put item #2 in there.

cooled-px4fmu245.jpg


Item #1, 1s reset fix for 3.2.1, from refrigerator
--------------------------------------------------
bad: 43 44 45 46 47 48 49 50 51 52 
good: 41 42 53 54 55 56 57 58 59 60

No luck. What if it warms up a bit?


Item #1, 1s reset fix for 3.2.1, after 30 minutes at room temperature
---------------------------------------------------------------------
bad: -
good: all from 61 to 80

It looks like it works at room temperature! But what about the cooled item #2?


Item #2, 1s reset fix for 3.2.1, from refrigerator
--------------------------------------------------
bad: 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
good: 81

When cooled, it again doesn't work.

All logs are here. As the next step I am going to try a 5s power reset on cooled FMUs.

UPDATE 3: It mostly works with a 5s power reset time

Item #1, 5s reset fix for 3.2.1, from refrigerator
--------------------------------------------------
bad: 107
good: 101 102 103 104 105 106 108 109 110 111 112 113 114 115 116 117 118 119 120

With a 5s reset time it mostly works for the same item at the same temperature.
Also, for attempt #107 it seems I had a really short disconnection time (about 0.2s), which is an unlikely scenario in actual copter use.
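
For a quick comparison, the attempt lists above can be tallied as failure rates per condition. This is only a small sketch: the attempt numbers are copied from the results above, while the condition labels and the `failure_rate` helper are shorthand introduced here for illustration.

```python
# Attempt numbers copied from the results above; ranges expand the
# "all from N to M" entries. The condition labels are shorthand.
runs = [
    ("item #2, original 3.2.1, room temp",
     [2, *range(4, 13), *range(14, 20)], [1, 3, 13, 20]),
    ("item #2, 1s reset, room temp", [], list(range(21, 41))),
    ("item #1, 1s reset, from refrigerator",
     list(range(43, 53)), [41, 42, *range(53, 61)]),
    ("item #1, 1s reset, 30 min at room temp", [], list(range(61, 81))),
    ("item #2, 1s reset, from refrigerator", list(range(82, 101)), [81]),
    ("item #1, 5s reset, from refrigerator",
     [107], [n for n in range(101, 121) if n != 107]),
]


def failure_rate(bad, good):
    """Fraction of power-up attempts that ended with a bad IMU2."""
    return len(bad) / (len(bad) + len(good))


for label, bad, good in runs:
    print(f"{label:40s} {len(bad):2d}/{len(bad) + len(good)} bad "
          f"({failure_rate(bad, good):.0%})")
```

Each condition has only 20 attempts, so these rates are a rough indication rather than a statistic.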


The link to the test image is updated; it now has the 5s reset time fix. It would be nice to get test results from people suffering from this problem.

UPDATE 4: The temperature dependency does not seem to come from the capacitors

I measured the discharge process on C506 of the same FMU in two cases: a) the FMU at room temperature and b) the FMU cooled for 30 minutes in the refrigerator. The difference in the measured values is really subtle: about 20mV in discharge level over a 5s period. So the temperature dependence of capacitance does not seem to be a factor in this issue.


UPDATE 5: MOSFETs test results

Tests with SPICE models and in hardware (with a number of different MOSFETs) showed that the issue cannot be resolved reliably without a hardware fix, because Rds rises as Vds and Vgs fall. The VDD_3V3_SENSORS rail needs to be connected to GND through a 220 Ohm resistor, and it still needs the software fix. With the resistor in place, the EN1 input of the MIC5332 does its job perfectly without any MOSFETs. But it will not work reliably with a resistor connected between any other power line and GND; in that case the issue can still happen after a short power disconnection.

UPDATE 6: The hardware fix is enough for most cases

I must admit that it seems to work fine even without the software fix. I just ran a number of tests with the original ArduCopter-3.2.1 firmware on a cooled board (all night in the refrigerator) with a 220 Ohm resistor installed in parallel with C506. The result: disconnecting and reconnecting the battery by hand, I was unable to do it quickly enough to reproduce the issue. All logs from attempt 121 through attempt 140 show IMU2 working just fine. The software fix is still needed, but only for unmodified boards. If you are able to (de)solder SMD parts, a better place for the resistor is in place of Q601: desolder Q601 and solder the resistor across its emitter and collector pads.
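
As a rough feel for why a bleed resistor helps, the passive discharge can be approximated as an ideal RC decay, t = R·C·ln(V0/V). The sketch below is only an estimate under stated assumptions: the total capacitance on the VDD_3V3_SENSORS rail is not given in this post, so the 10 µF figure is a placeholder, the 0.1V target level is arbitrary, and the model ignores the sensors and the LDO.

```python
import math


def discharge_time(r_ohm, c_farad, v_start, v_target):
    """Time for an ideal RC circuit to decay from v_start to v_target."""
    return r_ohm * c_farad * math.log(v_start / v_target)


ASSUMED_RAIL_CAPACITANCE = 10e-6  # farads; hypothetical, not from the schematic
V_RAIL = 3.3                      # nominal VDD_3V3_SENSORS level, volts
V_TARGET = 0.1                    # arbitrary "close to zero" level, volts

for r in (220.0, 1000.0):         # the two resistor values mentioned in the post
    t = discharge_time(r, ASSUMED_RAIL_CAPACITANCE, V_RAIL, V_TARGET)
    standing_ma = V_RAIL / r * 1e3  # current the resistor draws while powered
    print(f"R = {r:6.0f} Ohm: about {t * 1e3:.1f} ms to {V_TARGET} V, "
          f"{standing_ma:.1f} mA standing current")
```

The discharge time scales linearly with both R and C, so a smaller resistor discharges the rail faster at the cost of a higher standing current while the board is powered.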

Owners of unmodified boards have to wait for a software fix. It has a chance of being included in ArduCopter-3.3.3.

Here is the long story:

Some time ago I decided to switch from APM to PixHawk, so I bought two HKPilot32 FMU boards from HobbyKing. As far as I can see, they are clones of PX4 FMU v2.4.5. The boards are labeled "px4fmu-2.4".

HKPilot32-board.jpg

In those days I did not have much experience with the PX4 platform, so I put all intermittent issues and misbehaviour down to my lack of skill. When I first got stuck on the BAD ACCEL HEALTH issue, I googled it and found some mentions of power filtering and so on. So I decided to put ferrite rings on the power lines to the FMU and FPV, to decouple possible noise and spikes. It seemed to help for a while, and I saw no more BAD ACCEL HEALTH for a long time.

But later it happened again. So I decided to investigate in more detail and found this thread.

While checking my DataFlash logs I realised it wasn't an MPU6K issue. It was definitely an LSM303D issue. I read through everything in the mentioned thread and collected all the important points in a single document. I also collected there some important points from other forums such as HobbyKing and Pololu. (All 'hello' and 'thanks' were skipped.)

I discovered that a possible key to the problem is in the message by Artem on February 19, 2015 at 11:48am: "first hit is pololu community forum thread. Apparently lsm303d is very sensitive to how it needs to be powered down". So I googled once more and found the original message on the Pololu forum: "When we were testing the accelerometer on the LSM303, we noticed we could get a similar behavior where the accelerometer constantly reported a single value for all axes. It seems that if you interrupt power to the accelerometer in a certain way (like disconnecting or turning off/on power) so that the voltage falls below a certain amount but not all the way to 0, then it can brown out and get stuck in a bad state".

So I disassembled my HKPilot32 and downloaded all the px4fmu-2.4.5 schematics and PCB layouts. As you can see, there are capacitors (C506, C507) on the VDD_3V3_SENSORS net close to the LSM303D; there are also some capacitors on the same net close to the LDO, and some more should be close to the other sensors:

px4fmu-2.4.5-sch-LSM303-VDD_3V3_SENSORS.png

Hoping that I would be able to solder an SMD resistor on top of C506, I found the exact location of C506 and started to measure the LSM303D supply voltage on the C506 pins:

px4fmu-2.4.5-photo-LSM303.jpg
px4fmu-2.4.5-photo-C506.jpg

Here is how the LSM303D supply drops once the battery is disconnected. No other modules are connected to the FMU or the PM (Power Module). As you can see, it takes about 50ms to drop to 0.8V and about 0.1s to drop to 0.6V. And it doesn't fall below 0.344V even 1 minute after the battery is disconnected from the PM. That's definitely not good. And it can be even worse when other modules are connected to the FMU or PM.

px4fmu-2.4.5-scope-LSM303-power-down-08v.jpg
px4fmu-2.4.5-scope-LSM303-power-down-06v.jpg

I was ready to solder a 1k SMD resistor on top of C506 when I suddenly discovered Q601. Then I traced the VDD_3V3_SENSORS_EN net and..

px4fmu-2.4.5-sch-VDD_3V3_SENSORS_EN-Q601.png
px4fmu-2.4.5-sch-MIC5332-VDD_3V3_SENSORS.png
px4fmu-2.4.5-sch-U101PE3-VDD_3V3_SENSORS_EN.png

WOW! It is controlled by the PE3 pin of the STM32F4, and it is definitely there to discharge the sensors power line and to shut down the LDO output. By design it looks like it should shut down the sensors power line in a proper way. It was designed to do exactly what we need to avoid BAD ACCEL HEALTH.
So why doesn't it work? What did I miss? A broken Q601 transistor? A PCB issue? Anything else? Well, let's measure it on R620.

To shut down the sensors power line, the base of Q601 should go low with a logical zero on PE3. And I believe that should happen on FMU initialisation. OK, the FMU is powered, pushing the reset button..

px4fmu-2.4.5-scope-VDD_3V3_SENSORS_EN-on-reset.jpg

During FMU initialisation it goes high for some (really long, 5.4s) time, until PE3 is configured as an output. OK, no problem. Then I set my oscilloscope to trigger at the 3V level to catch a voltage drop, and no luck. It means Q601 doesn't receive a logical zero from PE3 on FMU initialisation. At all.

NOTE: Hereafter I refer to the source code of version 3.2.1.

OK, it's time to dig into the source code. What about grepping the PX4Firmware sources for "SENSORS_EN"? Here it is. It is used in the "sensor_reset" method to do exactly what we need. Let's see where the "sensor_reset" method is used. Nowhere else: as of the v3.2.1 release it is used only in the same file, "PX4Firmware/src/drivers/px4fmu/fmu.cpp". What about v3.3.1? Checked, the same.

After browsing the fmu.cpp file I can state definitely: in versions 3.2.1 and 3.3.1 it is used only to provide the nsh command "fmu sensor_reset" with an optional parameter. Nothing more. And it is definitely not used by the FMU initialisation process. OK, let's try executing "fmu sensor_reset 20" to shut down the power line for 20ms.

px4fmu-2.4.5-scope-LSM303-discharge-08v.jpg
px4fmu-2.4.5-scope-LSM303-discharge-06v.jpg

It works! As you can see, it falls to 0.8V in 1.8ms and to 0.6V in 10ms.
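
To put the two sets of scope readings side by side (passive power-off earlier, and "fmu sensor_reset 20" here), the ratio of the times to reach each level can be computed directly. This is plain arithmetic on the figures quoted in the text; the variable names are just for illustration.

```python
# Scope readings quoted in the text: milliseconds to reach each voltage level.
passive_ms = {0.8: 50.0, 0.6: 100.0}  # battery disconnected, no reset
active_ms = {0.8: 1.8, 0.6: 10.0}     # "fmu sensor_reset 20"

speedup = {level: passive_ms[level] / active_ms[level] for level in passive_ms}

for level, factor in speedup.items():
    print(f"to {level} V: {passive_ms[level]:.1f} ms passive, "
          f"{active_ms[level]:.1f} ms with reset, about {factor:.0f}x faster")
```

So the active discharge reaches 0.8V roughly 28 times faster and 0.6V 10 times faster than the passive power-off.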

The bad side is that it doesn't fall to zero, because of the Vce(sat) of Q601. Q601 is an MMBT3906 BJT, and its Vce(sat) is from 0.25V to 0.4V. So the VDD_3V3_SENSORS net cannot be discharged below Vce(sat). If a pure software solution with PE3 and Q601 does not cure BAD ACCEL HEALTH, then the design should be changed to use a MOSFET in place of Q601. That would allow the sensors power line to be discharged much closer to zero. There are also low-Vce(sat) BJT switching transistors on the market.

The good side is that the discharge level isn't the only important factor in this issue. The rate of discharge may also affect LSM303D behaviour, and that is something we are able to fix right now. Anyway, there is a chance it will be enough, so let's try it.

I implemented it this way and reset the FMU by pressing the reset button. The next moment my board turned the multicolour LED red, which means an error during initialisation. I connected to the PX4 terminal and found the following startup error messages:

Starting APM sensors [MS5611_SPI] on SPI bus 1 at 3ms5611: interface init failed
Error in startup

Then I turned on debug messages and realised that SPI was affected:

Starting APM sensors [MS5611_SPI] on SPI bus 1 at 3ms5611: prom all zero
ms5611: prom readout failed
ms5611: interface init failed
Error in startup

After some reading of the "sensor_reset" source code I discovered the cause of the SPI issue: it doesn't re-enable the SCK/MOSI/MISO pins after the reset. So I fixed it, and then the FMU loads OK.
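
The shape of that bug can be illustrated with a toy model. This is not the firmware code: `MockBoard`, the pin-mode strings and the `restore_spi` flag are hypothetical names, and the idea that the reset parks the SPI lines as plain low outputs while the rail is off is an assumption based on the description above.

```python
import time

SPI_PINS = ("SCK", "MOSI", "MISO")


class MockBoard:
    """Toy stand-in for the FMU pin state; not a real firmware API."""

    def __init__(self):
        self.pin_mode = {name: "spi" for name in SPI_PINS}
        self.sensors_en = True  # models VDD_3V3_SENSORS_EN

    def spi_usable(self):
        return self.sensors_en and all(
            mode == "spi" for mode in self.pin_mode.values())


def sensor_reset(board, ms, restore_spi=True):
    """Power-cycle the sensor rail for `ms` milliseconds."""
    # Assumption: SPI lines are parked low so they cannot feed the sensors.
    for name in SPI_PINS:
        board.pin_mode[name] = "gpio_low"
    board.sensors_en = False   # rail off, discharge path active
    time.sleep(ms / 1000.0)    # let the rail discharge
    board.sensors_en = True    # rail back on
    if restore_spi:            # the step that was missing
        for name in SPI_PINS:
            board.pin_mode[name] = "spi"


board = MockBoard()
sensor_reset(board, 20, restore_spi=False)
print("without restore:", board.spi_usable())
sensor_reset(board, 20)
print("with restore:   ", board.spi_usable())
```

In the model, skipping the restore step leaves the bus unusable even though the rail is powered again, which matches the MS5611 "prom all zero" symptom in spirit.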

The bottom line:

  • I can't see why the mentioned hardware feature (Q601), known for a long time, has never been used in FMU initialisation.
  • It may fix the IMU2-related BAD ACCEL HEALTH issue. This needs to be tested.
  • If it doesn't, then really minor design changes are needed to fix it in a reliable way.

Comments and corrections are appreciated!

Regards, Dmitry Prokhorov

IMU2-error.BIN


Replies

  • Can I just take 3.3V anywhere and connect a 200-1k ohm resistor to ground? It might be easier to solder somewhere else. I am on a Pixhawk Lite.

    • You can do it anywhere, but it must definitely be on the VDD_3V3_SENSORS net, not on any other 3.3V line.

      As an example, it could be done this way

  • Thanks a lot, I applied your hardware fix, which was surprisingly easy to do, and I haven't had these problems since. You saved me a lot of trouble and a lot of money!

    • Do you have a picture of how to do that?

  • Hi @Dmitry,

    Do you think the failure of the lsm303d in colder weather is related to this discharge issue? I live in a colder climate and since November the bad accel health has been continuous. It happens on all 3 of my Chinese pixhawk clones and a px4lite (gold version). If I power up the multirotor indoors, the lsm303d provides good data on the 3 accel axes, but about 30 seconds after I bring it outdoors, where the temperature is below 0C, the telemetry says the 2nd accel is showing ~18000 on all 3 axes. But if I bring it back indoors, it provides good data again (without rebooting). Further, the other two sensors on the lsm303d seem to provide good data throughout (the mag and the gyro).

    I haven't implemented the hardware fix, but I did remove some lines of code from 3.4-dev (master) and was able to fly. I also set use_ins2=0 to ensure only IMU1 is used. I don't know if all of the deletions were necessary. I removed lines 213-215 from https://github.com/diydrones/ardupil...CS_Mavlink.cpp

    Then these

    ardupilot/libraries/AP_Arming/AP_Arming.cpp
    https://www.diffchecker.com/mtxyeway

    ardupilot/arducopter/arming_checks.cpp
    https://www.diffchecker.com/laxay6uw

    It does fly without the bad accel health errors with those checks removed and use_ins2=0 set (though this is risky because it's no longer checking the sanity of IMU1).

    In the last week or two the temperatures where I live have risen, so I no longer get the lsm303d cold-weather failures, and it'll be a little more difficult to diagnose. Even so, I am eager to hear your opinion, Dmitry, and that of anyone else who has anything to add.


  • I have that flaw in my PX4. What should I do to fix it without having to touch the board inside? I need your help, thank you. Regards.

    • If a hardware fix is not an option for you, then you have to wait until the software fix is included in the release images. It isn't finished yet. The software fix works for most, but not all, low-temperature conditions.

      • Then I have to wait for the software solution. Or is there a software update I can get, and where can I get it? Thank you very much.
  • @Dmitry,

    I know exactly what you mean, and I am really happy to meet a genuine Pixhawk expert on DIYDrones, since official support by 3DR at DIYDrones has just been terminated.
    I have followed your full thread, and you are exactly the right man to join
    Peer To Drone Accidents Investigators
    to study hundreds of drone fly-away cases and provide support to individual hobbyists, now that official support has been terminated and closed.

    OK, you have studied pre-flight check bugs and you have provided excellent solutions (both hardware and software).

    Now is the time to study airframe resonant frequencies, test drones on a vibration table, test GPS accuracy, higher harmonics, software loop clocks and sensor clocks, and install third-party data loggers to contain the risk of drone fly-aways.

    Flying a drone affected by fly-away syndrome under the new FAA legislation, or the newly enacted legislation in Canada, Europe, Russia and South Africa, is a high-risk operation, especially if you fly at higher altitude.

    So you are welcome to study drone fly-away syndrome now.

    Peer To Drone Accident Investigators
    darius
    manta103g@gmail.com
  • @Dmitry,

    Thank you very much for your excellent hard work.
    Could you explain whether the bugs described above can be linked to the so-called
    Drone Fly-away Syndrome?

    Many cases of drone fly-away have been reported on this forum: the drone flies away at higher altitude, control is lost, the drone is lost or crashed, law enforcement officers get involved, and the pilot is wanted.
    Really not nice.

    You are welcome to join
    Peer to Drone Crash Investigators
    as an honorary member

    Thank you

    darius