Developer

I2C Wiring Library - Lockup

Hi All, 

 

Rz_Ten1 was having some very strange crashes, and a few other users chimed in once they heard his symptoms. He was able to reproduce the problem and then narrow it down to a hardware failure: the ground pin on his compass was not soldered, which caused issues during flight and under heavy vibration. 

 

Even though HW was the issue, ultimately a software limitation of Arduino caused the APM to lock up. Most SW issues in APM can be quickly reproduced since we literally run the same code 200+ times a second. That's why I assumed HW first. 

 

The library at fault is a poorly designed I2C driver. When it performs reads it "blocks" execution of the code. What we need is someone to write an alternative with the same functionality, with the addition of a timeout and error reporting. 

 

We can use this new code in the hex generation and possibly in our own distribution of Arduino, until the main Arduino branch adopts it.

 

Rz_Ten1 has a head start looking at the code here:

 

After a quick skimming, these lines in twi.c really stand out:


// wait until twi is ready, become master receiver
while(TWI_READY != twi_state){
  continue;
}
// wait for read operation to complete
while(TWI_MRX == twi_state){
  continue;
}
Both of those look pretty bad: if twi_state never changes, the loop never exits. A simple loop counter (x++;) with a
maximum value would work, I think, i.e.:

int x = 0;
while(TWI_MRX == twi_state && x < 1000){
  x++;
}

Or this might be better, since the above will cause the program to fall
through as if the read had completed:

int x = 0;
while(TWI_MRX == twi_state){
  x++;
  if (x > 1000) return 0; // give up instead of hanging
}

What we really need in the code is to know that the compass did not respond so we can do the right thing up higher in the code.
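
As a rough sketch of what "timeout plus error reporting" could look like, here is a bounded version of the two waits that returns a status code instead of falling through. The state names TWI_READY and TWI_MRX come from twi.c, but their values here, the error codes, the spin limit, and both helper functions are my own assumptions for illustration, not the actual library code.

```c
#include <stdint.h>

/* State names mirror twi.c; the numeric values are placeholders (assumption). */
#define TWI_READY 0
#define TWI_MRX   1

/* Hypothetical status codes and spin limit; the limit would need tuning on real hardware. */
#define TWI_OK          0
#define TWI_ERR_TIMEOUT 1
#define TWI_MAX_SPINS   1000u

/* In the real driver this is updated by the TWI interrupt handler, hence volatile. */
static volatile uint8_t twi_state = TWI_READY;

/* Wait until the bus is ready, but give up after TWI_MAX_SPINS polls. */
static uint8_t twi_wait_until_ready(void)
{
    uint16_t spins = 0;
    while (TWI_READY != twi_state) {
        if (++spins > TWI_MAX_SPINS) {
            return TWI_ERR_TIMEOUT;
        }
    }
    return TWI_OK;
}

/* Wait for a master-receive operation to finish, with the same bound. */
static uint8_t twi_wait_read_done(void)
{
    uint16_t spins = 0;
    while (TWI_MRX == twi_state) {
        if (++spins > TWI_MAX_SPINS) {
            return TWI_ERR_TIMEOUT;
        }
    }
    return TWI_OK;
}
```

The caller would check the returned status and, on TWI_ERR_TIMEOUT, mark the compass as unhealthy rather than using stale data. A spin counter is the simplest bound; a time-based limit would be more predictable across clock speeds.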
Rz_Ten1 has offered to look into it more, but I thought we could coordinate the effort here. I would like to see this fixed ASAP so we can eliminate this safety concern.

Jason



Replies

  • Developer

    Just FYI, 

    Tridge pushed out a new I2C Library that we're now integrating. It seems to fix I2C lockups caused by the compass.

    Great news!!!

  • After a little more playing with the oscilloscope (I really need a logic analyzer), I'm not 100% certain we can fix this in software, at least not without bypassing the dedicated TWI circuit on the processor.  It looks like during some of the lockups the TWI bus stops transmitting entirely.  It's not clock stretching and it's not sending commands; both the data and clock lines go completely dead.

    I've been bouncing some emails off of Atmel, but no official news yet.

  • I can suggest this: someone who has experienced the failure could look in the datasheet for that particular compass module to find out whether that model uses clock stretching. This is probably the only way an I2C bus can "hang up". Clock stretching is when the slave holds the clock (SCL) line low to buy time for its internal data processing, or when it can't respond to the master's request right away. So if something happens to the unit at that moment, the line stays low and the master will wait for a response indefinitely.

     

    So the question is "does compass module use clock stretching?"

  • Has anyone taken this on?  I found a post that looks like a similar solution.  I will try to see if I can make it work in the next few days.  If anyone has any additional insight I would love to hear it.

    http://dsscircuits.com/articles/arduino-i2c-master-library.html
  • Any update on this?

    I hope to fly this weekend, but I guess it'll only be in stabilize and Alt-Hold, I won't plug the compass in.

  • I am curious why this potentially recurring error didn't appear during all the previous development and flying. It makes me think that this I2C compass problem is not necessarily what stops the code.

    I did find freezing problems in my own code when SRAM was full. Sorry I can't be of more help on that.

     

  • Developer

     

    Perhaps it's a bit extreme, but there is the i2cdev library, which looks like a replacement for the standard Arduino Wire library and seems to have non-blocking calls and a timeout.  I should add that I've never used it.

  • Let me hop on the bandwagon while this thread is still active :)

     

    The lines found by Rz_Ten1 are there for a good reason: they act as a guard, a last resort. I've put almost the same lines in my interrupt-driven code for the pic24 (mini-bully) based ahrs. Usually the code expects a certain I2C state when a read is requested. For instance, before reading I always check whether any bytes are waiting in the buffer; then there is no delay and I can do other things.

     

    The state of the i2c driver should also be checked, either by looking at the state register or by comparing a software state variable to the expected state. Then I can always distinguish 3 cases: hardware failure, i2c protocol failure, or software protocol failure. In the first two cases i2c is taken to a known state by resetting it; in the third case no reset is required. At least this is my experience working with i2c.

     

    In the Arduino Wire implementation everything is done correctly, as far as I remember, and one can always distinguish between those failures. At least I can confirm that when I physically unplug power or the i2c connection between my flight controller and ahrs, both stay responsive and restore communication after I plug the wires back in. My flight controller is based on an Arduino Pro Mini and uses the Wire library.
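
To illustrate the three-case distinction described in this reply, here is a minimal sketch of how a driver might classify a failed transfer and decide whether a bus reset is needed. Every name in it (the enum, the status struct, both functions) is hypothetical, and the mapping from symptoms to categories is an assumption on my part, not something taken from the Wire library or the pic24 code mentioned above.

```c
#include <stdbool.h>

/* All names below are hypothetical; nothing here comes from the Wire library. */
typedef enum {
    I2C_FAIL_NONE,
    I2C_FAIL_HARDWARE,      /* bus dead or transfer timed out */
    I2C_FAIL_BUS_PROTOCOL,  /* unexpected bus state, e.g. an unexpected NACK */
    I2C_FAIL_SW_PROTOCOL    /* transfer completed but the data made no sense */
} i2c_failure_t;

/* Symptoms collected after a transfer (assumed to be filled in by the driver). */
typedef struct {
    bool timed_out;
    bool bus_error;
    bool payload_invalid;
} i2c_status_t;

/* Map the observed symptoms to one of the three failure cases. */
static i2c_failure_t i2c_classify(const i2c_status_t *s)
{
    if (s->timed_out)       return I2C_FAIL_HARDWARE;
    if (s->bus_error)       return I2C_FAIL_BUS_PROTOCOL;
    if (s->payload_invalid) return I2C_FAIL_SW_PROTOCOL;
    return I2C_FAIL_NONE;
}

/* Only the first two cases call for resetting the bus to a known state. */
static bool i2c_needs_reset(i2c_failure_t f)
{
    return f == I2C_FAIL_HARDWARE || f == I2C_FAIL_BUS_PROTOCOL;
}
```

The checks are ordered so that a timeout wins over the other symptoms, on the assumption that a dead bus makes the other flags unreliable.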

  • If someone has time, maybe they can look through all of the Arduino core and lib code to check for other places where a blocking call might cause a problem similar to the i2c one. I will do this when I can, but I am short of time at the moment.

  • Well, I'll be anxiously following this situation because my heli is grounded until this is resolved. :(
