Incidents "reserved only for development team", are there more? Why hide this from us?

I decided it is time to dust off some of my old ArduPilot boards. As we all know, we *should* be running the latest code at all times, per the development staff. *Supported* means running latest.

As several other users have pointed out, "What I really don’t like is the lack of trust I have after every update" (http://diydrones.com/xn/detail/705844:Comment:1064781). Because of this, I am always hesitant to jump onto the latest code revision. Alas, any time my gear sits for an extended period, I poke around the forum for *known* issues that I may have missed.

The current top post, "Naza M vs. APM 2" (http://diydrones.ning.com/forum/topics/naza-m-vs-apm-2), is what actually sparked my interest in spinning up some of my old APM gear, since I've been flying Naza a lot lately. The tone of the conversation is Ford vs. Chevy, as expected; however, there has been some interesting commentary about how DJI as a company chooses to inform its end users about issues.

Will Snodgrass put it quite well here http://diydrones.ning.com/xn/detail/705844:Comment:1065480 "They obviously have taken the time to setup a system that is there just so they contact you with important info regarding your product. So Why not use it?"

Somehow I got to poking around on YouTube in my hunt for recent bugs and wound up stumbling upon Marco Robustini's channel. Marco (http://diydrones.com/profile/marco67) says the following about himself on the channel: "I'm coordinator and developer of the ArduCopter Tester Team. My goal is to test the various electronic flight board to compare them and find bugs/problems and the related solutions for stable and safe flying". I thought the channel was pretty cool and I shared it with a few multi-rotor enthusiast friends of mine. One pointed out that if I had not yet seen Marco's Acro mode bug video, it was something I HAD to see.

After a bit of hunting for his "Acro mode bug" crash, this gem was presented for my viewing pleasure.

I can see in the YouTube comments that Tridge said there were fixes out for this specific issue:
"Andrew Tridgell 11 months ago
this bug has now been fixed in ArduCopter master. Many thanks to Marco for his patience in working through the bug report with us and allowing us to find this bug!"

For whatever reason, I simply can't remember hearing a single word about this *bug* anywhere on the forums. It most certainly was not trumpeted as something folks should look out for. So, oddly enough, I once again find myself combing through the horrible Ning forum interface looking for answers. "you switched to kamikaze mode?" clearly was not what I was looking for.

After quite a bit of hunting this is what I uncovered:

Marco Robustini on January 9, 2012 at 2:51am
"I have also almost completed my heavy octo (destroyed after the crash for the "acro I-term bug" in the code = < R5)"

Based on the above comment, we know the bug affects code up to and including R5 of the ArduCopter 2.1.1 tree.

Marco Robustini on January 10, 2012 at 2:41pm
"Do not fly with this version for know reason: - i2c library is not update (possible bus lockup) - "acro I term " bug is unfixed in this version. You are warned! :P"

We are told by Marco in a random forum comment not to fly ArduCopter 2.1.1 alpha. The root of the topic also has more generic bug info: "Update R5: This is a quick patch based on a bad crash Marco had. My theory was an I term that built up during wind that needed to be reset, but wasn't. It's a corner case but It bit Marco pretty bad."

R_Lefebvre on January 23, 2012 at 9:54am
http://diydrones.com/xn/detail/705844:Comment:766380
"I have been following this closely pretty much since 2.0.55 came out, and I can only think of 2 fatal bugs that definitely caused crashes. Marco's I-term bug, which was instituted in the code.... Might have been before 2.0.49?"

Marco Robustini on January 27, 2012 at 5:18pm
http://diydrones.com/forum/topics/arducopter-2-2-beta?commentId=705...
"Now John will try to explain something, but dunno if I can, I'll try anyway.
Explain it with a video that until now was reserved only for development team (back at the end of December), the incident was found that was due a code bug with "Acro I Term" (fixed after this event), and now it's time that people like you see this."

The first thing that caught my attention was that this bug was seemingly, and intentionally, hidden from public view. Why? I am especially concerned about this when I see comments like "I had the same, 2 times uncontrolled flights and crash." - http://diydrones.com/xn/detail/705844:Comment:769934

Marco Robustini on February 3, 2012 at 6:15am

http://www.diydrones.com/xn/detail/705844:Comment:774992
"I destroyed my heavy octo because these two lines were missing."

So… after all that reading and hunting I am still not sure exactly which versions were affected, what exactly triggers the issue, and *when* it was fully addressed in the code. Given the criticism of how DJI has its users' email addresses on file yet fails to use them to contact its end users, I felt it appropriate to mention this "corner case". 3DR has all of our attention via the Ning forum package; additionally, they have a Tumblr blog AND all of our info from the initial purchase. Is there any reason there isn't a better effort to let us know what we should and should not be worrying about?

When I put my quadcopter on the shelf 11 months ago it was working fine… I am hesitant to update to the latest for obvious reasons; alas, the version I am running may have some latent or partially explained bug waiting to chop my face off.

Thoughts? Do we just circle back to the "It is DIY, what do you expect?" mantra? Why are ANY "incidents" held from public view and made "reserved only for development team"? Should we not be sharing this stuff? Luckily no one has gotten hurt... 


Replies to This Discussion

"you'd rather kick the baby because you don't feel respected because you have a long history of contensious and argumentative interactions?" 

If by kicking the baby you mean simply asking WHY it is so hard to find out exactly which versions of the code are affected by the "Acro I term bug", then sure, I guess I am doing that.  

"You've sometimes had valid concerns, but those have been devalued by your own behavior" 

That made me chuckle... the validity of my concerns is unfortunately not at all affected by my tone; your personal opinion of me, perhaps. IF there is a bug in the code, there is a bug in the code... me having a "tense argumentative writing style" really does nothing to change that. It is either there or it isn't. 

"But really, if you don't want to help, then what exactly is your purpose?"

It is Christmas.... there was LOTS of Grandma money spent on Ardu* gear. Awareness is the key, no one wants to have an accidental "Acro mode I term" fail and drill their Grandma loot into the ground. Being aware of what is out there saves choppers. =] 

Interesting read, but based on your quoted text this all surrounds an alpha release, which I'd sort of expect not to be announced on the front page in red letters but rather 'hidden' in the developers' mailing list and a contributor's personal YouTube comments.

I do tend to wait a bit after a new release, but I do that with all software (especially drivers or things that can chop my face off :)).

I remember this issue.  It was this incident that got Marco on the devlist and shortly after to be appointed as lead tester.

It caused a rather big stir in the developer group, and we immediately focused our efforts on tracking down the bug. If I remember correctly, it was Tridge who finally nailed it with a very thorough analysis.

I have found the original thread on the devlist, but not yet the thread where Tridge explains and fixes it. I do remember he did this in a separate thread.

Will post the fix if I can find it.  

Also, from the initial thread it's clear that making the video private was Marco's own initiative.  He was not a developer or official tester yet at that time, and he didn't want to cause a panic with his clearly disturbing video, before it was clear if it was a bug, or some hardware failure from his octo.  I think this was understandable and commendable behaviour.

When it became clear that it was indeed a bug, he proposed making the video public, so others could be warned. I guess that's what he meant by "it's now time that you see this video": it was now confirmed as a bug, so be warned.

I don't think there was any coverup intended on this one. The first post in the original topic on the devlist is from Chris, encouraging everyone to get to the bottom of this crash ASAP.

Awesome... thanks u4eake. 

Boston, totally understood re: the alpha release. It was not entirely clear whether it affected other versions in general. There was one comment I ran across mentioning pre-2.0.49 issues with the I-term.

http://diydrones.com/xn/detail/705844:Comment:766380
"I have been following this closely pretty much since 2.0.55 came out, and I can only think of 2 fatal bugs that definitely caused crashes. Marco's I-term bug, which was instituted in the code.... Might have been before 2.0.49?"

"I do tend to wait a bit after a new release": me too. I think there has at times been a mantra for folks to be on *current*. Not really a point of contention. 

Thanks, you two. 

Actually, that video from Marco led the development team to focus a lot more on checking code quality. It shifted focus from feature development to reliability development. Models for hexa and octo were developed for the auto-test suite (an automatic software test that runs every day), and a whole bunch of things that were not tested automatically before were added to it (e.g. mode switches).

Since then the way we implement code changes and releases has changed. Now we work with windows in which code can be changed, followed by a window in which code is tested and no new features can be introduced, only bugfixes.  

In short, you should interpret the advice to run the latest code as "run the latest code that is generally accepted as the most stable we have". At this time that would be 2.8.1.  

I admit that it will cost you a bit of reading and research to find the code that is generally considered as the most stable we have... :-)  Maybe some improvement can be made there.

I've dug it all out again :-)

The fix is here: "ACM: reset all I terms on gyro calibration". It was put in master on Jan 3, 2012.

The bug was introduced a few revisions earlier (it is unclear exactly when), when the acro I terms were split out based on feedback from MultiWii flyers. However, those new I terms were never added to the reset performed on arming. This allowed the acro I terms to wind up while Marco's copter was still unarmed on the ground, and nothing cleared them when he armed. That didn't hurt as long as he stayed out of acro mode. But at some point he switched to acro instead of loiter, and the copter immediately applied the "bad" I term, which commanded a 70°/s roll rate, with the known result.
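To make the failure mode above concrete, here is a toy model in Python. It is not ArduCopter's code: the `PI` class, its gains, the 100 Hz loop and the constant error fed in while the copter sits unarmed are all assumptions chosen only to show the mechanism. Both controllers accumulate an integral while disarmed; on arming only the original rate controller is reset, so the first acro output is the stale integrator even with the sticks centred.

```python
# Toy model of the acro I-term bug (hypothetical class, gains and loop rate).
class PI:
    """Minimal PI controller; a stand-in, not ArduCopter's actual class."""

    def __init__(self, kp: float, ki: float, imax: float) -> None:
        self.kp = kp
        self.ki = ki
        self.imax = imax
        self.integrator = 0.0

    def get_pi(self, error: float, dt: float) -> float:
        # Accumulate the integral term, then clamp it to +/- imax.
        self.integrator += error * self.ki * dt
        self.integrator = max(-self.imax, min(self.imax, self.integrator))
        return error * self.kp + self.integrator

    def reset_I(self) -> None:
        self.integrator = 0.0


def first_acro_output(reset_acro_on_arm: bool) -> float:
    """Return the acro roll output on the first loop after switching to acro."""
    pi_rate_roll = PI(kp=0.15, ki=0.1, imax=50.0)  # original rate controller
    pi_acro_roll = PI(kp=0.15, ki=0.1, imax=50.0)  # the later, acro-only copy
    dt = 0.01  # assumed 100 Hz loop

    # Unarmed on the ground: assume a constant apparent roll-rate error keeps
    # feeding both integrators for 10 minutes (60,000 loops).
    for _ in range(60_000):
        pi_rate_roll.get_pi(error=10.0, dt=dt)
        pi_acro_roll.get_pi(error=10.0, dt=dt)

    # Arming: the reset call only knows about the original rate controller.
    pi_rate_roll.reset_I()
    if reset_acro_on_arm:
        pi_acro_roll.reset_I()

    # Switch to acro with sticks centred and the copter level (zero error).
    return pi_acro_roll.get_pi(error=0.0, dt=dt)


print(first_acro_output(reset_acro_on_arm=False))  # 50.0: stale, clamped integrator
print(first_acro_output(reset_acro_on_arm=True))   # 0.0
```

With the acro reset skipped, the first acro output sits at the integrator limit (`imax`, arbitrary units here) despite zero error; with the reset included, it is zero.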

From the devlist (after a lot of posts hunting the bug) :

> A few revisions back, We split the g.pi_rate_ term to give a duplicate
> set for ACRO only. This was a request based on feedback from the
> Multi-Wii guys.

> The new g.pi_acro_ PI terms didn't get added to the reset call.

great, that's an easy fix then. I'd also suggest we wipe all the I terms
at the end of startup_ground(), and all the places where we call
imu.init_accel() and imu.init_gyro(). Maybe have a reset_I_all() call ?

Cheers, Tridge
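Tridge's `reset_I_all()` suggestion can be sketched as a registry: every controller is created through one owner object, so a later addition such as the acro set is reset automatically instead of having to be added to a list by hand. This is a minimal Python sketch under that assumption; the `Controllers` class and its `add`/`integrators` methods are hypothetical, and the actual fix ("ACM: reset all I terms on gyro calibration") may well have been written differently.

```python
class PI:
    """Minimal PI controller with a resettable integrator (hypothetical)."""

    def __init__(self, name: str, kp: float, ki: float) -> None:
        self.name = name
        self.kp = kp
        self.ki = ki
        self.integrator = 0.0

    def get_pi(self, error: float, dt: float) -> float:
        self.integrator += error * self.ki * dt
        return error * self.kp + self.integrator

    def reset_I(self) -> None:
        self.integrator = 0.0


class Controllers:
    """Creates and owns every PI controller so one call can reset them all."""

    def __init__(self) -> None:
        self._all: list[PI] = []

    def add(self, name: str, kp: float, ki: float) -> PI:
        pi = PI(name, kp, ki)
        self._all.append(pi)  # registered at creation, so it cannot be forgotten
        return pi

    def reset_I_all(self) -> None:
        # Intended call sites, per the devlist reply: end of ground start-up and
        # after every accelerometer/gyro calibration.
        for pi in self._all:
            pi.reset_I()

    def integrators(self) -> dict[str, float]:
        return {pi.name: pi.integrator for pi in self._all}


ctl = Controllers()
rate_roll = ctl.add("rate_roll", kp=0.15, ki=0.1)
acro_roll = ctl.add("acro_roll", kp=0.15, ki=0.1)  # a later split, like the acro set
for _ in range(500):
    rate_roll.get_pi(10.0, 0.01)
    acro_roll.get_pi(10.0, 0.01)
ctl.reset_I_all()
print(ctl.integrators())  # {'rate_roll': 0.0, 'acro_roll': 0.0}
```

The design choice is simply that the reset no longer depends on someone remembering to update a list when a new set of gains is split out.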

Acro was almost not used at that time, and nobody in the dev team flew acro, that's why the bug went unnoticed for a few versions.

u4eake, thanks very much for the background in the last two posts on this topic. Excellent information. Seriously... Tumble that sort of stuff on http://blog.3dr.cc

Seeing this stuff transparently banged out in a more visible location would be awesome, finding out about it months later and feeling uncertain sucks. 

Cheers to ya sir! Very glad to hear that this incident triggered the automation of many things that should have been automated before it occurred. 

Just so I'm clear on this, we're talking about a bug that was apparently found and fixed a long time ago?

If anything, I see that the Dev community has been doing exactly what you would expect.

When I made the comment you quoted, I was talking about the system that DJI uses, where you must submit and verify an email address before you can use their software. Yes, 3DR has your email when you purchase something, but you really can't expect them to hand over those addresses to an open source Dev community. Maybe something could be set up where you register to receive emails about bugs when they happen... but if you're not interested in hearing about bugs in real time, then that won't help you either.

If you insist on comparing this with DJI, it is at least worth mentioning that any bug found by the Naza team will be strictly an internal matter, and as here it will be fixed... you just won't know about it (unless it is a major problem).

"we're talking about a bug that was apparently found and fixed a long time ago?"

It was mentioned that it could have affected code as early as 2.0.49 (maybe up to 2.0.55?). To an outsider it is unclear which versions were affected, whether only alpha releases were, and when it was fixed. So, to answer your question specifically: I am unsure. That was part of the problem.

With regard to "a long time ago"... yes I have had some gear sit on a shelf for about a year now that I wanted to dust off. When it was put away it was working just fine... I was simply doing the usual sanity check for *known* safety related gotchas when I ran across this. 

Sorting out the technical differences between the two systems really isn't much to discuss for me, but I get your point. The thin line between 3drobotics and DIYDrones has always been blurry, and this situation is no different with regard to collected email addresses. "you really can't expect them to hand over those addresses to an open source Dev community". That seems like a stretch, especially when http://www.3drobotics.com routes straight to the same diydrones.com domain that the forum software runs from. It isn't as if the two entities are unconnected. After all, we have a nice "Created by Chris Anderson" at the bottom of the open source forum. I am gonna go out on a limb and say there are some elements of the actual company and of the forum that go hand in hand. 

I *AM* interested in hearing about bugs in real time; it just wasn't my main focus. Simply being able to find useful bug information would suffice. Real time was not the only desire. Brilliant idea with regard to a feed about bugs... I kinda thought that was the point of the issues list, however. It just seems that it may not contain *ALL* of the bugs you guys dig up in private, rather only those that are directly reported. 

"any bug found by the Naza team will be strictly an internal matter...you just won't know about it. (unless it is a major problem)"

Considering that they are not touting themselves as an Open Source community or product, I really don't see an issue with this. As long as they keep making front-page banners about critical issues so that they jump right in my face when I haven't flown in several months, I'm OK with that. As for what you brought up about transparency in the interim while they are debugging an issue, I agree it would be nice to see improvement there IF a lack of information seems to be a continuing problem. 

© 2013   Created by Chris Anderson.