I am participating in the UAV Outback Challenge and wanted to calculate the MTBF of the APM 2.6+ for use as a standalone fail safe device.
MTBF is a standard measure of reliability in the engineering world. It is normally expressed in hours, or inversely as a failure rate in failures per million hours. From Wikipedia:
Mean time between failures (MTBF) is the predicted elapsed time between inherent failures of a system during operation. MTBF can be calculated as the arithmetic mean (average) time between failures of a system. The MTBF is typically part of a model that assumes the failed system is immediately repaired (mean time to repair, or MTTR), as a part of a renewal process. This is in contrast to the mean time to failure (MTTF), which measures average time to failures with the modeling assumption that the failed system is not repaired (infinite repair time).
Why do we care?
MTBF is an important indicator of a system's reliability: the higher it is, the less likely the system is to fail in a given period of operation.
How do you calculate it?
I looked at the schematics for the APM 2.6+ and exported the Bill of Materials (BOM). I then used this tool to calculate the MTBF of each item according to the ANSI/VITA 51.1 standard, assuming an operating temperature of 60°C (very conservative) and that the parts were the most unreliable class of device (consumer grade). For more specialized items (ICs and such) I looked them up on the manufacturer's website (here is an example, the 3.3v level shifter for the MPU).
After finding all these values I entered them into a spreadsheet and classified each item by subsystem and by whether it was critical to that subsystem's operation. (For example, LEDs are critical to the LED subsystem, but not critical to the USB UART subsystem.)
I then took all the critical components in each subsystem and combined their MTBFs, assuming that the failure of any one part would cause a total system failure, using the formula: total MTBF = 1/(1/1st MTBF + 1/2nd MTBF + 1/3rd MTBF + ...)
This gives the following MTBF for each subsystem:
| Subsystem | MTBF (hours) |
|---|---|
| MPU | 876099.0559 |
| 3.3v Regulator | 2097175.403 |
| AT2560 | 1738803.191 |
| AT32-U2 | 291897.2053 |
| DataFlash | 4334969.547 |
| MUX | 1116667.245 |
| Pressure | 680851.5715 |
| PWM input | 558709.4613 |
| PWM output | 558709.4613 |
| Magneto | 581071.3857 |
We can then use that data to total the MTBF for the entire APM: 61664.5 hours (about 7 years!). However, not every one of those subsystems is required to maintain level flight. As far as I am aware, only the MPU, 3.3v Regulator, AT2560, and PWM input/output subsystems are required to keep flying, which gives an MTBF of 173218.9 hours (~19.8 years!).
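The series-system arithmetic for the flight-critical subset can be checked with a few lines of Python. This is a minimal sketch, not the attached spreadsheet: the function name `series_mtbf` and the dictionary layout are assumptions made for illustration, and the year conversion assumes continuous operation at 8760 hours per year. The full-board total depends on every subsystem in the spreadsheet, which may include some not listed in this post, so only the flight-critical subset is reproduced here.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: continuous operation, non-leap year

# Flight-critical subsystem MTBFs in hours, copied from the list in the post.
SUBSYSTEM_MTBF_HOURS = {
    "MPU": 876099.0559,
    "3.3v Regulator": 2097175.403,
    "AT2560": 1738803.191,
    "PWM input": 558709.4613,
    "PWM output": 558709.4613,
}


def series_mtbf(mtbfs):
    """Combine MTBFs of items in series: their failure rates (1/MTBF) add."""
    total_failure_rate = sum(1.0 / m for m in mtbfs)
    return 1.0 / total_failure_rate


flight_critical_mtbf = series_mtbf(SUBSYSTEM_MTBF_HOURS.values())
flight_critical_years = flight_critical_mtbf / HOURS_PER_YEAR

print(f"Flight-critical MTBF: {flight_critical_mtbf:.1f} hours")
print(f"Equivalent continuous operation: {flight_critical_years:.1f} years")
```

Because failure rates add, the combined figure is always lower than the weakest subsystem's MTBF; here the two PWM subsystems contribute more than half of the total failure rate.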
Take everything I have said here with a grain of salt, as I am not a professional. In addition, most of these failure rates were calculated rather than measured, so the calculations might be a little off, or the manufacturers' data might overestimate the true lifetime of their products. I have attached the Excel sheet I used to calculate these values in the hope that someone might spot mistakes I have made or find it useful in the future.
Comments
@Mark Omo
Many thanks for your work, it is very interesting.
One question, you say "used this tool to calculate the MTBF of all the items"
What is the tool?
Thanks
There are some standards that could help you estimate the MTBF of components in the actual scenario.
Among these are MIL-HDBK-217F, IEC/TR 62380, and SN 29500.
There you can find formulas that take into account temperature, vibration, and pressure, although there's nothing like testing to get a real picture; I agree with Cliff-E and others.
In the standards, terms like "benign airborne" or "aggressive airborne" may be used to identify interesting scenarios.
I'd also say that the contribution of electronic components to the overall MTBF is much smaller than that of mechanical and electromechanical parts, connectors probably being the main source of problems.
On the other hand, the lifetime of the components must be taken into account.
The lifetime is the time span after the "infant mortality" period and before the point where the "bathtub curve" starts to rise again.
The MTBF holds only during this lifetime, and it indicates how deep the bottom of the bathtub curve is.
Having calculated an MTBF of, say, 20 years doesn't mean that a component will last for 20 years; it means only that, during its lifetime, failures occur at an average rate of one per 20 years of operation.
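To put a number on that distinction: a constant failure rate (the flat bottom of the curve) implies an exponential survival probability, R(t) = exp(-t / MTBF). The sketch below assumes that model; the function name and the example durations are illustrative and do not come from the thread.

```python
import math


def survival_probability(hours, mtbf_hours):
    """Probability of running `hours` without a failure, assuming a
    constant failure rate of 1/mtbf_hours (exponential model)."""
    return math.exp(-hours / mtbf_hours)


MTBF_HOURS = 20 * 8760  # a "20 year" MTBF expressed in hours (illustrative)

# A one-hour flight versus running continuously for the full 20 years.
p_one_hour = survival_probability(1.0, MTBF_HOURS)
p_full_mtbf = survival_probability(MTBF_HOURS, MTBF_HOURS)

print(f"P(no failure in 1 hour):   {p_one_hour:.6f}")
print(f"P(no failure in 20 years): {p_full_mtbf:.3f}")
```

Under this model only about 37% of units (exp(-1)) would survive a span equal to the MTBF, and the model itself stops applying once wear-out begins.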
To tackle the infant mortality issue, one can do a burn-in phase, where the system is operated for some time, possibly while cycling the temperature.
Before reaching the end of the lifetime, the component / part / subsystem must be replaced if one wants the same MTTF.
A session of vibration tests (sinusoidal, bump, random) can be priceless in discovering design and reliability defects, although it is not something everybody can afford.
Personally, I would be more concerned about software errors than about the MTBF of the components.
+1. Great discussion. Given that it's easier to get accurate MTBF numbers from large batch sizes, in the long run it's conceivable that the reliability of UAV systems will become better than conventional avionics, even if the testing isn't up to the same standard.
I'm with Jason Franciosa that it's an interesting analysis, and at least it introduces us to the topic.
@DavidJames the number for the MPU-6000 is for the entire MPU-6000 subsystem, including all the capacitors, resistors, etc. needed to make it function. My assumption that they use bargain-barrel components at 60°C is what drops it so much.
@Jason I agree. I thought the APM would have a lower MTBF, but it has surprised me. It would seem that the APM's MTBF far outweighs that of any ESC or servo you put on it.
@Michael I could certainly do the same analysis for the Pixhawk, but as many other people have pointed out, I neglected a critical part: mechanical stress. I assume that the solder joints never fail, that connectors never pop off, etc. I am also unsure of the exact inner workings of the Pixhawk's redundant systems. I would love it if someone could point me to some more information.
@F1P I agree, this was never meant to be comprehensive, just a starting point to get feedback on. As has been mentioned before, I neglected mechanical stresses, which are a large part of this analysis. In the case I have assumed (no thermal cycling and no environmental problems), the PCB and connectors would have a practically infinite MTBF.
Mark,
Your MTBF for the MPU6000 IMU may be a bit too conservative. Based on the High Temperature Accelerated Life qualification data in the part's data sheet, I get a 12e6 hour MTBF when operating at 60°C. The MPU6000 line is qualified by accelerated life testing using 3 lots of 77 parts for 1000 hours each at 125°C, with 1 or fewer failures.
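The comment does not show how the 12e6 hour figure was derived. One common way to arrive at a number of that order is to scale the test's device-hours by an Arrhenius acceleration factor and divide by the number of failures. The sketch below assumes an activation energy of 0.7 eV (a frequently used default, not stated in the comment) and counts exactly one failure; both are assumptions, as are the function and variable names.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_use_c, t_stress_c, activation_energy_ev):
    """Acceleration factor between a stress and a use temperature (deg C)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


# Test conditions quoted in the comment.
device_hours = 3 * 77 * 1000  # 3 lots of 77 parts, 1000 hours each
assumed_failures = 1          # assumption: at most one reported, taken as exactly 1

# Assumption: Ea = 0.7 eV; the comment does not give a value.
accel = arrhenius_acceleration(
    t_use_c=60.0, t_stress_c=125.0, activation_energy_ev=0.7
)

# Point estimate: equivalent device-hours at 60 deg C per observed failure.
mtbf_at_use = device_hours * accel / assumed_failures

print(f"Acceleration factor: {accel:.1f}")
print(f"Estimated MTBF at 60 C: {mtbf_at_use:.3g} hours")
```

With these assumptions the estimate comes out on the order of 12 million hours. A different activation energy, or a statistical upper bound on the failure rate instead of a point estimate, would change it considerably.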
Modern electronics is pretty amazing. The MPU6000 IMU with 3 gyros, 3 accelerometers and signal processing/digitization electronics in one little 24 pad chip with a measured MTBF of 12 million hours!
I think that your overall observation, that the APM autopilot design is quite reliable, is correct. If the APM board is implemented as designed it is very reliable.
Whether your estimate is accurate or not, I find it interesting and am glad it is being discussed. As ArduPilot continues to expand into commercial applications and is integrated into larger and larger platforms, this is most certainly a concern.
I do wish 3DR would make a commercial-grade version of the Pixhawk. Even at twice the price I would gladly pay extra for it if they beefed up all critical components and ran it through a bit more testing before delivery.
Interestingly enough, there are a couple of companies out there selling beefed-up Pixhawks.
Here is one:
http://store.uav-solutions.com/uav-solutions-pixhawk-avionics/
Can we do this analysis for a Pixhawk? It has redundant IMUs.
Hugues 1 hour ago
Two remarks:
1. You are talking about the case of a defective product resulting from poor quality control.
2. The motherboard is a set of wires and contacts, and would be assessed as a large, complex passive electronic component with its own MTBF.
Such a very simplified model holds only for an idealized environment.