After more than two hundred flights with my APM2-equipped quad, today, in the remotest of possible locations, my quad failed to respond and attempted to fly away by itself! All I can say is thank goodness for Knobthorn trees.

I wanted to get some video of the area but didn't have my small cam with me, so I attached my Galaxy smartphone to the front of the quad. I have done this before and know that the video won't be that great (lots of jello) but will at least give an idea of the scenery. I waited for satellite lock and then hand launched (too much dust on the ground) just in front of where I was standing. I went slowly up to about 20m, hovered, then did a 180deg pirouette and slowly started descending back towards my position.

When the quad was about 3m from me and 2m off the ground I throttled up to stop the descent, which was fine, but now the quad was drifting forward slowly so I tried to tilt it backwards... Suddenly I had that sickening realization that it wasn't responding. I wiggled the sticks... NOTHING! I ran towards it but it was still drifting forward and climbing very slowly, now at about 2.5m. It continued very calmly and smoothly and I briefly considered throwing the only thing in hand (my transmitter) at it to try to bring it down, but it flew on until it collided with a Knobthorn tree at about 4m high.

3 motors were jammed by branches but one was still spinning away calmly. I wasn't... I was able to run to the nearby parked pickup and drive under the tree, reach up to unplug the battery and then get the quad down.

I have read the horror stories of fly-aways but one naturally never expects it to 'happen to me'.

- My radio is a tried and tested two year old Hitec Aurora 9 which has one of the most robust AFHSS 2.4GHz systems. The quad was 3m from me when it happened!

- My failsafe is set up correctly and double-tested, with the throttle channel going to 900µs in the event of signal loss.

- All four of my SS 30A ESCs (2A BEC) power the APM2 (each tested within 0.09V of each other), effectively quadrupling the current capacity. I've never had a suggestion of a brownout.

So what happened? I don't know, I can't see anything definitive in the logs (attached). If someone has any ideas I'd be very grateful!

As for now my faith in APM2 has been shattered, like an old friend you can no longer trust.


Replies to This Discussion

Graham, so you're saying that if you remove power from the Rx, the PPM Encoder does the proper failsafe.  But if you pull the signal wires, then it flatlines...  But I guess you're talking about all 4 of those wires coming out, but you must be leaving wires 5-8 connected?

Sounds like a practical impossibility.

So, as Randy says, perhaps the PPM Encoder failed, but how?

Back to the EMF blasting theory, is it possible for strong radiation to corrupt the processor?

Sorry, just double checked, pulling 1-4 out produces flatlines, pulling 5 out triggers failsafe (there's no 6-8).

But I have not been able to reproduce the same flatlines shown in the crash log.

Why does the APM need to be connected to look at a log? I would guess it is so it knows which version of ArduPilot you are running. Can this information be determined from the log? Thanks.

Not sure, whatever came on the board. As to digging deeper I think that's beyond my skill level and/or pay grade. Sorry.

To answer my own question the log does state ArduCopter 2.8.1 at the top of the file. I still don't understand why MP needs the APM plugged in?

This is what we know:
  • The 2560 continued running throughout the event.
  • The GPS or the serial interface was experiencing issues so the APM was not getting a regular position update.  The loss of position information was occurring more often right before and after the loss of control.
  • Flying under the trees will cause a reduction of the GPS signal.
  • The presence of the cell phone provides a transmitter that could cause interference with the GPS.
  • The RC input values to the APM stopped updating causing the loss of control.
  • The lack of updates could be caused by either an issue with the PPM encoder or the RC receiver.
  • There is no evidence to support a higher probability of failure with either device.
If somebody can do some further analysis of the logs that could produce some additional data, that would be great.
Thanks.

Having cell towers far away but detectable by the phone is worse. If the phone cannot hear a signal from a cell tower, it does not transmit, only listens. Otherwise it would just waste lots of battery talking to nothing.

I would concur that the most likely problem was EMF, and that this EMF was generated by the phone radio when the phone started to transmit. Though I think that it was not the 1800MHz or other standard GSM band, but the WiFi, which is on the same frequency as the radio. If the phone's WiFi chip thought it could hear a network, it may have started to broadcast to trigger SSID detection. It may have even ramped up the power to improve its signal to the phantom WiFi network. In the end this meant that the phone could have 'jammed' or 'confused' the HiTec receiver so that it stopped working, producing the flatlined outputs.

The micro-controller is really only susceptible to false inputs, i.e. digital inputs being read inverted or analog ones being exaggerated or depressed. EMF corruption of the processor would need a large EMP! Any nuclear tests going on nearby? ;-)

I suppose the one good thing is that the signals to the APM flatlined, and this is an obvious thing that can be detected and processed. Are there extra efforts* to add a failsafe when the inputs 'flatline' or otherwise fall outside what is considered normal operating parameters?

*I realize there are some being done, but it would be interesting to have a shopping list of which are being considered.
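
To make the 'flatline' idea concrete, here is a minimal sketch of how a frozen channel could be detected. It is not ArduCopter or PPM Encoder code: the class name, the window size and the spread threshold are all hypothetical, and it assumes a live channel always shows at least a little jitter, which may not hold for every receiver.

```python
from collections import deque


class FlatlineDetector:
    """Flags an RC channel whose pulse width has stopped changing.

    Hypothetical illustration only; names and thresholds are assumptions.
    """

    def __init__(self, window=50, min_spread_us=2):
        # window: how many of the most recent samples to inspect (assumed)
        # min_spread_us: smallest max-min spread that still counts as "alive"
        self.samples = deque(maxlen=window)
        self.min_spread_us = min_spread_us

    def update(self, pulse_us):
        """Add one pulse-width sample in microseconds; True means flatlined."""
        self.samples.append(pulse_us)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet to judge
        spread = max(self.samples) - min(self.samples)
        return spread < self.min_spread_us


if __name__ == "__main__":
    det = FlatlineDetector(window=5)
    for value in [1500, 1503, 1498, 1501, 1505, 1500, 1500, 1500, 1500, 1500]:
        print(value, det.update(value))
```

The obvious weakness is a pilot who really is holding the sticks perfectly still, so in practice the window would need to be long enough that a false alarm is unlikely.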

Bill 

That's a really good point. I was thinking about a transmission on the normal GSM frequency, but the phone's WiFi and the RC radio would both be at 2.4GHz.

That's the best hypothesis I have seen so far.

I found this review of the HiTec radio: http://www.youtube.com/watch?v=AdIhbNLC0yg and it is very interesting that they don't use the full 2.4GHz band. This can be a problem as it makes the comms more susceptible to interference. The idea of spread spectrum is that you spread the signal as wide as possible, and only momentarily reside on a smaller channel. By being pseudo-random in channel selection (i.e. hopping) the signal is almost unjammable without lots of power. Spread spectrum was military-only tech for many years due to this jam-resistant quality.

And this made me remember a good example: WiFi, which does not hop between channels, doesn't play nicely with frequency-hopping technologies like Bluetooth. Devices that support both Bluetooth and WiFi usually do so on the same chip, and the two cooperate on when to TX by multiplexing transmission between them.

I think having a phone using WiFi and the HiTec AFHSS Rx transmitting so close to each other is what caused the issue. In this case the Rx went down.

The HiTec Rx is a two-way telemetry system. Was it configured to send telemetry info (even just battery info)? This would mean two 2.4GHz transmitters next to each other, and a possible root cause in an escalation of TX power from both devices, causing the control failure.

I think this would be really hard to replicate, as it may only be under certain circumstances the positive feedback loop is triggered.

Probably, and this is conjecture, the best advice is to make sure you don't have WiFi on a drone. It will be interesting to see what happens with the WiFi enabled GoPro 3?

Bill, in this case WiFi was NOT turned on, only the normal GSM cell radio was on. The Hitec system uses telemetry all the time when connected to a compatible receiver. While I suspect that the phone played a part in the signal loss I have not been able to reproduce the loss. I put the phone directly on top of the APM and Hitec Rx (covering both) and even put the Tx in an aluminium box 20m away and all was fine.

The part 3 of the Hitec system review paints a much better picture: http://www.youtube.com/watch?v=7f43mydRdjA&feature=relmfu

Quote from Bruce: "...explains why Hitec has one of the highest interference rejection scores I've encountered in my testing"

Thanks for the link. I just watched it. I think their way of avoiding collisions on channels is unique, but I still think that WiFi, with its 11 channels over the whole band, can cause issues that we should be cautious of. Just divide that screen into 11 sections and wipe out 1 due to a WiFi channel: that's roughly a 9% loss. This is most critical when transmitter and receiver units are close together, especially if they have no awareness that the other exists.
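
As a rough back-of-envelope check on that estimate, here is a small sketch comparing two ways of counting. The "equal slices" model is the one described above. The second model uses figures that are my own assumptions for illustration: an 83.5MHz-wide 2.4GHz ISM band and a WiFi channel roughly 22MHz wide. It says nothing about which part of the band the Hitec system actually uses.

```python
def fraction_lost_equal_slices(num_channels=11, blocked=1):
    """Share of the band lost if it is cut into equal, non-overlapping slices."""
    return blocked / num_channels


def fraction_lost_by_width(band_mhz=83.5, wifi_width_mhz=22.0):
    """Share of the band covered by one WiFi channel of the assumed width."""
    return wifi_width_mhz / band_mhz


if __name__ == "__main__":
    print(f"equal slices: {fraction_lost_equal_slices():.1%}")  # 9.1%
    print(f"by width:     {fraction_lost_by_width():.1%}")      # 26.3%
```

If the width-based figures are in the right ballpark, a single busy WiFi channel could cover closer to a quarter of the band than a tenth, because neighbouring WiFi channels overlap each other.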

IMHO I really think that your flyaway was down to a set of perfect conditions for it to happen. And I agree the phone was the catalyst. You said you have done this many times before with no issue.

What really is fact is that the control inputs flatlined. And this can be simulated, as you have demonstrated, by pulling cables. The cause was that the PPM encoder or the HiTec Rx failed for some unexplained reason.

The solution seems to be to add some defensive code for all PWM inputs to the APM so they are checked to be a valid PWM signal. If PWM signals are outside the bounds of reality, the APM goes into a failsafe mode that makes sense (this is the hard part). Equally hard is adding checking that doesn't impact the performance of the APM. The more code there is to process, the less responsive the system can become. Not an easy balance. I think John is looking at ways to add some more checking to the PPM encoder without a major performance impact. This would help if the Rx failed, but not if the PPM encoder itself has an issue.
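
To show what I mean by checking that inputs are "within the bounds of reality", here is a minimal sketch. It is not taken from the ArduCopter source: the pulse-width bounds, the throttle failsafe threshold, the throttle channel index and all the names are assumptions made up for illustration.

```python
from enum import Enum

# Assumed limits in microseconds; not values from any real firmware.
MIN_VALID_US = 800
MAX_VALID_US = 2200
THROTTLE_FAILSAFE_US = 975  # below this, treat throttle as "receiver failsafe"


class Action(Enum):
    NORMAL = "normal"
    FAILSAFE = "failsafe"


def check_rc_frame(channels_us, throttle_index=2):
    """Decide what to do with one frame of RC pulse widths (microseconds)."""
    # A missing or short frame cannot be trusted.
    if len(channels_us) <= throttle_index:
        return Action.FAILSAFE
    # Any pulse outside the physically plausible range is rejected.
    for pulse in channels_us:
        if not (MIN_VALID_US <= pulse <= MAX_VALID_US):
            return Action.FAILSAFE
    # A receiver configured to drop throttle on signal loss lands here.
    if channels_us[throttle_index] < THROTTLE_FAILSAFE_US:
        return Action.FAILSAFE
    return Action.NORMAL


if __name__ == "__main__":
    print(check_rc_frame([1500, 1500, 1200, 1500]))  # Action.NORMAL
    print(check_rc_frame([1500, 1500, 900, 1500]))   # Action.FAILSAFE
```

Note that a range check like this costs very little per frame, but on its own it would not catch inputs that freeze at perfectly plausible values, so it could only ever be one item on the list.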

When you really get into programming robustness into mission-critical systems, you can see why the space shuttle had four computers that conferred on all decisions and, if they failed to agree, a fifth computer with software implemented by a separate company took over. This stuff is hard, and no matter how many checks you put in place there's going to be a gremlin hiding.

I like chasing gremlins :-D

I was wondering if the best solution is to have a telemetry system running on 900MHz or 433MHz. Would this be enough for us hobby enthusiasts to send new RTL coords or a Land Now command as a backup?

