Hi All, 

 

Rz_Ten1 was having some very strange crashes, and a few other users chimed in too once they heard his symptoms. He was able to reproduce it, then narrow it down to a hardware failure. His compass did not have its ground pin soldered, which led to issues during flight and under heavy vibration. 

 

Even though HW was the issue, ultimately a software limitation of Arduino caused the APM to lock up. Most SW issues in APM can be quickly reproduced since we literally run the same code 200+ times a second. That's why I assumed HW first. 

 

The library at fault is a poorly designed I2C driver. When it performs reads it "blocks" execution of the code. What we need is someone to write an alternative with the same functionality, with the addition of a timeout and error reporting. 

 

We can use this new code in the hex generation and possibly in our own distribution of Arduino, until the main Arduino branch adopts it.

 

Rz_Ten1 has a head start looking at the code here:

 

After a quick skimming, these lines in twi.c really stand out:


    // wait until twi is ready, become master receiver
    while(TWI_READY != twi_state){
      continue;
    }
    // wait for read operation to complete
    while(TWI_MRX == twi_state){
      continue;
    }

Both of those look pretty bad. A simple loop counter (x++;) with a maximum value would work, I think, i.e.:

    int x = 0;
    while(TWI_MRX == twi_state && x < 1000){
      x++;
      continue;
    }

or this might be better, since the above will cause the program to fall through:

    int x = 0;
    while(TWI_MRX == twi_state){
      x++;
      if (x > 1000) return 0;
      continue;
    }

What we really need in the code is to know that the compass did not respond so we can do the right thing up higher in the code.
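
A minimal sketch of what that could look like in C, assuming the two waits in twi.c are wrapped in helpers that return a status code. The names twi_wait_until_ready, twi_wait_read_done, TWI_OK and TWI_TIMEOUT are hypothetical and not part of the stock Wire library, the state values are assumed, and twi_state is a plain variable here standing in for the one the TWI interrupt handler updates.

```c
#include <stdint.h>

/* Hypothetical status codes; the stock twi.c does not define these. */
#define TWI_OK      0
#define TWI_TIMEOUT 1

/* State names as used in twi.c; the numeric values are assumed here. */
#define TWI_READY 0
#define TWI_MRX   1

/* Stand-in for the driver state, which the TWI interrupt normally updates. */
volatile uint8_t twi_state = TWI_READY;

/* Bounded version of "wait until twi is ready". */
uint8_t twi_wait_until_ready(uint16_t max_spins)
{
    uint32_t spins = 0;
    while (twi_state != TWI_READY) {
        if (++spins > max_spins) {
            return TWI_TIMEOUT;   /* bus never became ready */
        }
    }
    return TWI_OK;
}

/* Bounded version of "wait for read operation to complete". */
uint8_t twi_wait_read_done(uint16_t max_spins)
{
    uint32_t spins = 0;
    while (twi_state == TWI_MRX) {
        if (++spins > max_spins) {
            return TWI_TIMEOUT;   /* device never finished the read */
        }
    }
    return TWI_OK;
}
```

The caller can then check for TWI_TIMEOUT and mark the compass as unhealthy instead of hanging. Note that a spin count is not a unit of time, so a real fix would probably compare against a timer rather than a fixed 1000.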
Rz_Ten1 has offered to look into it more, but I thought we could coordinate the effort here. I would like to see this fixed ASAP so we can eliminate this safety concern.

Jason


Replies to This Discussion

Thanks Jason, and great skillz, Rz. Guess what cables I'll be tugging on when I get home?

I suspected a similar problem once, long ago when I worked on my own quad project.  My solution was to timeout on each read and return a status code to indicate the success of the read.

The behavior of the caller was as follows:

- Before takeoff, an I2C read failure was considered fatal and the quad would shut down

- After takeoff, I would turn on a flag that indicated a sensor failure and try to fly without that sensor (better than crashing!).  In my case, it was the accelerometer (you can try to land without it...) and the barometer (you will not have altitude hold).

You can look at the code here:

http://code.google.com/p/caspiquad/source/browse/trunk/CaspiQuad/i2...
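
For anyone who just wants the idea, the caller behavior described above could be sketched roughly like this. This is not the CaspiQuad code; every name here (flight_status_t, handle_read_result, and so on) is hypothetical and only illustrates the policy.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
typedef enum { SENSOR_ACCEL, SENSOR_BARO } sensor_id_t;

typedef enum {
    ACTION_CONTINUE,      /* read succeeded, nothing to do */
    ACTION_SHUTDOWN,      /* failure on the ground: refuse to fly */
    ACTION_FLY_DEGRADED   /* failure in the air: keep flying without the sensor */
} failure_action_t;

typedef struct {
    bool airborne;
    bool accel_failed;
    bool baro_failed;
} flight_status_t;

/* Decide what to do with the status returned by one I2C read. */
failure_action_t handle_read_result(flight_status_t *fs,
                                    sensor_id_t sensor,
                                    bool read_ok)
{
    if (read_ok) {
        return ACTION_CONTINUE;
    }
    if (!fs->airborne) {
        return ACTION_SHUTDOWN;          /* fatal before takeoff */
    }
    if (sensor == SENSOR_ACCEL) {
        fs->accel_failed = true;         /* try to land without it */
    } else {
        fs->baro_failed = true;          /* no altitude hold */
    }
    return ACTION_FLY_DEGRADED;
}
```

The control loop would then skip any sensor whose failed flag is set rather than calling its read routine again.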

 

It is indeed highly recommended to implement something similar in ACM!

 

Dror

That's the idea! This is open to everyone to post a working solution on ACM. Just grab a branch from GIT and try hacking the Wire lib or replacing it.

Thanks!

Jason

ok, then create a mavlink error msg to show to the user.

me doing a cortex cmsis i2c driver at the moment.

where i have to specify the retry count.

but i have to write info code as well showing possible problems on the i2c bus.

i have no idea about errors here - at least i never encountered them.

i assume bad esc firmware and cables to be an issue - soldering points!?

robert

Thanks for this guys.

 

I'm not sure if it's just coincidence that my first symptom was loss of tail control?  Would that indicate a problem with the Mag?  Probably not.

 

Anyway, just wondering if the problem was with the soldering of the ground within the board, or the wires *to* the board?  I understand that a SW fix is needed, but I just want to know if I should have a closer look at my Mag as well.  The wires were torn from the mag on impact, so that's one less thing for me to inspect.

 

Now, I'm just wondering, does the theory that this is the cause agree with my symptoms?  You say the code stops executing.  It appeared to me that it stopped, but then restarted just before or just after the impact.  What I mean is, the log was running, then seems to have stopped or been unaware of the state of the aircraft, but appears to have started again and recorded the crash location.  Does that make sense for this problem?  I wonder if what happened was that something vibrated loose in the mag, the program hung, then after impact whatever was loose made contact again and the code picked up executing right where it left off?

 

That's not a bad theory actually, as that's kind of what it looked like.  The log shows time passing, but no changes in data occur, then the heli is suddenly at the crash site and I think it even recorded some of the crash forces. 

 

How does the I2C driver react if the Mag is enabled, but completely not present?  Because currently I'm rebuilding my heli, the mag is completely detached, but the program still runs.

This is exactly what Rz_Ten1 described. The code won't proceed, then the connection comes back and the code continues as if nothing happened. That assures me there is no crash, just a pause caused by the tight loop in the blocking I2C function. 

A slight intermittent connection or high noise on ground could cause this.

Jason

Ok, that gives me some comfort then that I've found root cause of my issue.  Thanks!

 

So do we need to look at the soldering on our boards, or is the problem most likely our wiring connections and/or noise on the cable?  The HW aspect is still important because if the copter loses the Mag signal during Loiter, bad things will probably still happen.

 

That might be something we need to think about.  If this failure occurs, not only do we not want to freeze the code, but the heli needs to do something to save itself.  Switching to Stabilize automatically might work if we have an Xbee and the Mission Planner can notify us.  But I'm not sure what to do otherwise.

Hi Jason,

You have my sympathies. I2C drivers are not as easy to implement as the I2C marketing literature would have you believe...I have the scars to prove it.  ;-)

Here is a link to the MatrixPilot I2C magnetometer driver. You are welcome to take a look at it for ideas.

The MatrixPilot team was lucky enough to run into interference problems with our I2C driver early on. I still use a 72 MHz Tx, and when the Tx is brought close enough to the magnetometer, it causes it to go into an unknown state, and lock up the I2C bus. Along the way, I discovered that the Honeywell magnetometer can get into some states that are forbidden by the I2C protocol!

In the end we were able to make our driver "bullet-proof". You can bring a 72 MHz Tx right up to the magnetometer and/or disconnect and reconnect the magnetometer, and the I2C driver will recover.

One advantage we have in implementing our I2C driver is that everything in MatrixPilot is interrupt driven. There are no places in the code in which the code is blocked waiting for something.

Best regards,

Bill

Hi Jason,

I just realized that the MatrixPilot I2C driver might be a little hard to understand if you are not used to the dsPIC processors and/or interrupt driven programming, so I thought I should probably explain a couple of things about the driver:

1. The routine rxMagnetometer() is called on a regular basis, each time that it is desired to get a magnetometer reading.

2. Setting  _MI2CIF = 1 causes an I2C interrupt to be generated, which causes the _MI2CInterrupt() ISR to be called.

3. The driver uses a state machine concept.  (* I2C_state) () causes a handler for the present state of the I2C driver to be executed. Each handler determines what the next state should be, and sets I2C_state accordingly.
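
For readers new to this pattern, here is a minimal sketch of a function-pointer state machine in plain C. It is not the MatrixPilot driver: only the name I2C_state comes from the description above, while the handlers, i2c_service() and readings_completed are hypothetical stand-ins, and no real I2C hardware is touched.

```c
/* Each state is a function; I2C_state points at the handler for the present state. */
typedef void (*i2c_handler_t)(void);

void i2c_idle(void);
void i2c_send_address(void);
void i2c_read_data(void);

i2c_handler_t I2C_state = i2c_idle;
int readings_completed = 0;

/* A reading was requested: move on to addressing the device. */
void i2c_idle(void)
{
    I2C_state = i2c_send_address;
}

/* Address sent: the next event should deliver the data. */
void i2c_send_address(void)
{
    I2C_state = i2c_read_data;
}

/* Data received: count the reading and return to idle. */
void i2c_read_data(void)
{
    readings_completed++;
    I2C_state = i2c_idle;
}

/* Stand-in for the ISR body: run the handler for the present state. */
void i2c_service(void)
{
    (*I2C_state)();
}
```

Because each handler returns immediately after choosing the next state, nothing ever spins waiting on the bus. A handler that sees an unexpected condition can simply set I2C_state back to a reset or idle state.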

Let me know if you have any questions about how the MatrixPilot I2C driver works.

Best regards,

Bill

Would it be possible to make a full diagnostics program? By that I mean a program that tests all your sensors and generates a warning if a sensor is too much out of sync. That way people could check really fast whether their soldering is OK and whether there is a hardware problem with their APM+IMU. 

a mavlink error statistic message would be the solution.

what do you do - or what would you like to see - when your copter is in the air ...

Hi Ruben,

I am not who you are addressing your question to. In MatrixPilot, we have hardware diagnostics programs. For the magnetometer, we have what we call a "roll-pitch-yaw" demo that can be executed on the ground. If you have a magnetometer, the demo performs a functional test on it. It is very clear whether it is working or not.

Best regards,

Bill
