First, this IS NOT a bashing session. This is a search for answers and opinions, so please be respectful and courteous!
Personally, I think the APM is capable of bringing my plane back and landing it even if the RX fell out! Do I think DIY should be responsible for implementing it? No, but they are flying too, so if we come up with a good proposal I'm sure they would try to make it happen!
This discussion is intended to come up with scenarios where your platform would go out of control, and what we can do about them! There should be a sister post, Mitigating the chances of losing control.
I'll start. With geofencing (hereafter referred to as GF) turned on, I can't see how your platform can fly away, so maybe GF should be turned on by default with a tiny box that you have to adjust to your area, platform, and conditions? But not all of us carry a laptop to the field, so maybe we should be able to save and recall a few different versions for different fields and/or conditions. That way you could program it at home and go fly. But if you turn it on too far away from the place you selected as home, it should lock up and beep or flash an error, so that it won't try to fly 40 miles back to your house (00). A safe, configurable, selectable autoland function tied to GF would be nice too.
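To make the proposal a bit more concrete, here is a rough Python sketch of the "saved fence profiles plus refuse to arm when too far from home" idea. Everything in it is an assumption for illustration: the profile names, the coordinates, the 100 m start-up limit, and the use of a circular fence instead of a box. It is not how the APM firmware does it.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical profiles saved at home: name -> (home_lat, home_lon, radius_m)
FENCE_PROFILES = {
    "club_field": (37.0000, -122.0000, 300.0),
    "back_paddock": (37.0500, -122.1000, 150.0),
}

def may_arm(profile_name, gps_lat, gps_lon, max_start_offset_m=100.0):
    """Refuse to arm if we power up too far from the profile's home point."""
    home_lat, home_lon, _radius_m = FENCE_PROFILES[profile_name]
    return distance_m(home_lat, home_lon, gps_lat, gps_lon) <= max_start_offset_m

def inside_fence(profile_name, gps_lat, gps_lon):
    """True while the craft is within the (circular) fence of the profile."""
    home_lat, home_lon, radius_m = FENCE_PROFILES[profile_name]
    return distance_m(home_lat, home_lon, gps_lat, gps_lon) <= radius_m
```

With this, powering up at the club field with the "club_field" profile selected arms normally, while powering up half a degree of latitude away (roughly 55 km) is refused, so the craft never tries to fly home across the county.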
At the very least, this would protect the DIY community from litigation, put responsibility on the user, and save a newbie from a painful, costly learning experience! Feel free to poke holes in my ideas!
Questions lead to answers, and isn't it wonderful that we need not wait another second to make the APM better? If we do a good enough job, maybe the government will force the AMA to use our product on all their large, dangerous aircraft! That would also help the UAV community gain acceptance in the sport.
Now have at it.
Andreas, IMHO you're right, but we can make it safer for noobs by not arming the motors unless you've selected a failsafe action. You could even select no action if you want to risk it, but you would have to check that option explicitly, and then it's all on you.
I agree with Andreas. If you're flying with a 9x, upgrade to the FrSky Tx Module/Rx for about $40.
Regarding the internal watchdog timer, that's a different issue that's mainly for handling internal software errors.
OK guys, this is what happens when I leave the house for a few hours. I thought we already agreed that this thread was to discuss technical issues related to failsafe? We're not debating the merits here, so please take that discussion to the other thread. If people don't keep this discussion technical, there's no point in having two threads discussing the same thing, and I will close one of them.
Totally agree. You can't do fault detection and mitigation without clear failure modes.
One can't possibly program the code to catch all paths of failure. We need to first determine the frequency of occurrence of any fault. This is the key behind being fault tolerant. Without it, you may end up wasting code on faults that occur infrequently, and fail to handle those that actually do occur.
> One can't possibly program the code to catch all paths of failure.
Sure you can. That's the whole point of a watchdog timer. If any part of the code fails to report that it's ok for more than a fraction of a sec the whole unit resets or triggers some sort of failure mode.
In each section of code you either report ok, return a failure code for further action, or you report nothing (probably because it has hung) and the watchdog triggers a reset.
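Here is a minimal Python simulation of what I mean, just to pin down the three outcomes (report OK, return a failure code, or say nothing). The class, the section names, and the 0.2 s timeout are made up for illustration and are not taken from the APM code.

```python
from enum import Enum

class Status(Enum):
    OK = 0
    FAILED = 1

class SoftwareWatchdog:
    """Tracks when each code section last reported OK (simulated time in seconds)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_ok = {}  # section name -> time of last OK report

    def register(self, section, now):
        self.last_ok[section] = now

    def report(self, section, status, now):
        if status is Status.OK:
            self.last_ok[section] = now
            return "continue"
        # A failure code is not a hang: pass it on for further action.
        return "handle_failure"

    def expired(self, now):
        """Sections that have been silent for longer than the timeout."""
        return sorted(s for s, t in self.last_ok.items() if now - t > self.timeout_s)
```

A section that hangs simply stops calling `report`, so it shows up in `expired()` once the timeout passes, and that is the point where a reset or failure mode would be triggered.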
There already is a watchdog timer. Look at failsafe.pde in the code
So it looks like it's checking at 1 kHz for the main loop taking more than 0.2 sec. It's running from a timer interrupt.
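As far as I can tell, the logic amounts to something like the following. This is a Python simulation of my reading, not the actual contents of failsafe.pde, and the class and variable names are my own: a tick handler runs every millisecond and watches a counter that the main loop increments.

```python
class MainLoopMonitor:
    TIMEOUT_TICKS = 200  # 0.2 s at a 1 kHz tick rate

    def __init__(self):
        self.loop_count = 0        # incremented by the main loop
        self._last_seen_count = 0  # value seen at the previous tick
        self._stalled_ticks = 0    # consecutive ticks with no main-loop progress
        self.in_failsafe = False

    def main_loop_iteration(self):
        self.loop_count += 1

    def timer_tick(self):
        """Called once per millisecond from the (simulated) timer interrupt."""
        if self.loop_count != self._last_seen_count:
            # The main loop made progress since the last tick.
            self._last_seen_count = self.loop_count
            self._stalled_ticks = 0
            self.in_failsafe = False
        else:
            self._stalled_ticks += 1
            if self._stalled_ticks > self.TIMEOUT_TICKS:
                self.in_failsafe = True
```

Note that this only works while the timer interrupt itself keeps firing.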
It looks to me like it might be vulnerable to certain kinds of hangs since it's not using the hardware watchdog.
My idea would be to move this to the hardware watchdog timer, and rather than checking for main loop execution speed it should check a series of software watchdog timers or variables that are each tied to the execution of individual code sections.
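Roughly what I have in mind, as a Python simulation (the hardware watchdog here is a stand-in class, and the section names and timeout are invented): the hardware timer only gets kicked when every code section has checked in since the last kick.

```python
class SimulatedHardwareWatchdog:
    """Stand-in for a hardware watchdog: resets unless kicked in time."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self._remaining = timeout_ticks
        self.reset_triggered = False

    def kick(self):
        self._remaining = self.timeout_ticks

    def tick(self):
        if self.reset_triggered:
            return
        self._remaining -= 1
        if self._remaining <= 0:
            self.reset_triggered = True

class SectionCheckins:
    """One flag per code section; all must be set before the kick happens."""

    def __init__(self, sections, hw):
        self._flags = {name: False for name in sections}
        self._hw = hw

    def checkin(self, name):
        self._flags[name] = True

    def service(self):
        """Kick the hardware watchdog only if every section has checked in."""
        if all(self._flags.values()):
            self._hw.kick()
            for name in self._flags:
                self._flags[name] = False
            return True
        return False
```

If any one section stops checking in, `service()` stops kicking and the hardware timer runs out, even if the main loop as a whole is still spinning.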
I guess there's all sorts of ways to implement a more sophisticated watchdog/failsafe system. Maybe someone with a better understanding of the overall program flow could comment?
Jake, the way the watchdog works in the Atmega is as follows:
If the watchdog timer is not reset within that interval, a hard reset is triggered. This would work great for non-time-critical applications, but in the case of the autopilot a reset in the air will cause more problems than it solves. It will certainly not help in the case of a radio signal loss, as a reset will not fix the radio fault. I should also reiterate that the autopilot should be able to handle a radio fault more gracefully than by doing a reset.
Well, sometimes the cure is worse than the disease. If the fault is a hard failure, like a failure of the decoder to receive a proper input from the receiver, a global reset will end in endless resets. This means your craft is essentially dead, when instead it could have used its sensors to keep flying in a pattern or hover.
On the other hand, the failure could be a transient radio glitch. A reset in that case could make the situation worse, and cause the craft to fall from the sky, again when the autopilot could have maintained proper flight.
There's no such thing as a "catch-all". Each fault needs to be handled separately, depending on the circumstances.
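As a rough illustration of handling each fault on its own terms, here is a small Python sketch. The fault names, the 500 ms glitch threshold, and the chosen responses are all assumptions for the sake of the example, not anything from the APM code.

```python
from enum import Enum, auto

class Fault(Enum):
    RADIO_GLITCH = auto()  # transient dropout
    RADIO_LOST = auto()    # sustained loss, e.g. decoder gets no valid input

class Action(Enum):
    HOLD_COURSE = auto()      # keep flying on the last good commands
    LOITER_OR_HOVER = auto()  # fly a pattern or hover on sensors alone

def classify_radio_fault(ms_without_valid_frame, glitch_ms=500):
    """Short dropouts are treated as glitches, longer ones as a lost link."""
    if ms_without_valid_frame <= glitch_ms:
        return Fault.RADIO_GLITCH
    return Fault.RADIO_LOST

# Each fault gets its own response; neither of them is a global reset.
RESPONSES = {
    Fault.RADIO_GLITCH: Action.HOLD_COURSE,
    Fault.RADIO_LOST: Action.LOITER_OR_HOVER,
}

def respond(fault):
    return RESPONSES[fault]
```

The table is the important part: a new fault means a new entry with its own considered response, rather than one blanket reaction for everything.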
The idea of the watchdog is that it IS the catch all. It causes a reset when everything else has failed. An unhandled exception represents a total failure. Loss of radio need not necessarily allow the watchdog reset if another mode or function can handle the exception.
Radio loss for a certain period would initiate an error condition. Another section of code would catch this if it could (valid sensor readings) and reset the timer, preventing reset. If the auto code didn't have enough sensor info it would not handle the exception and it would end up with the watchdog timer expiring and triggering a reset.
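In simulated form (Python, with an invented 5-tick timeout and made-up function names), the chain I'm describing looks like this: normal control kicks the timer, the autonomous handler kicks it if it can cope, and if nobody can cope the timer runs out.

```python
class Watchdog:
    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self._remaining = timeout_ticks

    def kick(self):
        self._remaining = self.timeout_ticks

    def tick(self):
        """Advance one tick; return True when the watchdog has expired."""
        self._remaining -= 1
        return self._remaining <= 0

def fly(ticks, radio_ok, sensors_valid, timeout_ticks=5):
    """Return the tick at which a reset fires, or None if it never does."""
    wd = Watchdog(timeout_ticks)
    for t in range(ticks):
        if radio_ok(t):
            wd.kick()  # normal control: everything is fine
        elif sensors_valid(t):
            wd.kick()  # radio lost, but the auto code caught the exception
        # else: nobody handled the fault, so nobody kicks the timer
        if wd.tick():
            return t
    return None
```

So `fly(20, lambda t: t < 10, lambda t: True)` never resets, while with `sensors_valid` always false the reset fires a few ticks after the radio drops out at tick 10.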
A watchdog reset can be handled differently on startup. Knowing you are starting from a watchdog reset you can start up in full "recovery attempt" mode.
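On the ATmega the reset cause is recorded as flag bits in the MCUSR register (PORF, EXTRF, BORF, WDRF in bits 0 to 3), so the check could look like the Python mock-up below. The mode names are hypothetical, and on a real board the bootloader may clear MCUSR before your code gets to read it, so treat this as a sketch of the decision only.

```python
# MCUSR reset-cause flag bits on the ATmega
PORF = 1 << 0   # power-on reset
EXTRF = 1 << 1  # external reset pin
BORF = 1 << 2   # brown-out reset
WDRF = 1 << 3   # watchdog reset

def startup_mode(mcusr):
    """Pick a startup path from the saved reset-cause flags."""
    if mcusr & WDRF:
        # We were running and got reset by the watchdog, possibly in the air.
        return "recovery"
    return "normal"
```

A normal bench power-up gives `startup_mode(PORF) == "normal"`, while any value with the WDRF bit set goes down the recovery path.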
You also have two processors, so the one having trouble could reset without resetting the other. The PPM encoder should be able to reset very quickly and resume functioning. This would hopefully cure the majority of reported control freezes. You'd get a tiny glitch rather than freezing until you crash.
The main processor could signal the PPM encoder to hold a certain failsafe setting or go to manual until it resets.
The way I think of it is that when your computer locks up first you try closing the current program, then you try ctrl-alt-del, finally you hit the reset switch. As soon as you've tried your first two options and they don't work... any further time you wait before hitting reset just results in a longer period of down time. Once you run out of options or none are appropriate you have no choice but to reset, which almost always works.
The system I see would be like a conveyor belt in a factory. You have a series of linked deadman switches. Man A has to cap a bottle or release his switch to stop the line. Man B either caps the bottle man A missed or he releases his trigger also. So on and so forth until the last man either caps the bottle or stops the conveyor before the bottle falls off and smashes on the floor.
That's probably way too simplistic of a view, but the bottom line is that the bottle never falls off the conveyor (crashes under power) no matter what. If guys B, C, and D are on break or not paying attention the conveyor stops right away.
No Jake, the watchdog timer is NOT a 'catch-all' and it would do precisely nothing whatsoever to protect against receiver issues.
A watchdog timer protects against a very specific class of failure:
That's it. It doesn't protect against anything else whatsoever.
Trying to abuse it the way you've suggested will cause aircraft to crash.
Rebooting the APM makes it forget where it is and even which way is down!
It's going to be quite a while before it can figure out what's going on.
- If it gets "down" wrong, then it may go ahead and power the craft straight into the ground, convinced that "Down" is up.
So it will always cause a copter to crash and will often cause a fixed wing to crash, as for several seconds the aircraft will be completely uncontrolled. Once in freefall, it's not even possible to work out which way is down using the available sensors.
To use your bottle analogy, you're suggesting that if the last guy misses the cap you should throw them all out of the building and put four new guys in their places.
In the case of a copter, one of those guys was also balancing the bottle on his chin.
Those guys should stay on the job because they are the ones who remember what just happened.
I'm going to stop feeding the troll now, as your posts have now gone from being a little OTT to outright stupid.