While transcribing parts of the ArduPilot code in Python for testing and NMEA sentence parsing/emulation, I noticed that the checksum calculations/comparisons are made twice, once for the RMC and later for the GGA sentences. Could it be worth a few bytes, or simply better, to do it another way and limit the number of checksums to process? (For now, all received sentences are cross-checksummed.) Could we:
1) test the sentence header (RMC or GGA);
2) if the header is found, do the checksum calculation;
3) if the checksum is OK, parse the sentence?
This would drop all the unnecessary sentences sent by the GPS out of the processing loops a little earlier than now, which would probably save some processing time for those who do not program their GPS to limit the NMEA flow to the minimum sentences ArduPilot needs.

Another question concerning the code: is there an explanation as to why the latitude/longitude conversions to decimal degrees need temp variables? In Python, I do each conversion in one line:
lat = float(latIn[:2]) + float(latIn[2:])/60
lon = float(lonIn[:lonIn.index('.')-1]) + float(lonIn[lonIn.index('.')-1:])/60
Would such a single line be doable in Arduino?

Finally, I would like to know if there is any way to read a full line out of the GPS instead of parsing through each character. In Python, I use serial.readline() and then read complete blocks between the comma delimiters:
datablock = buffer.split(',')
latIn = string.atof(datablock[3])
lonIn = string.atof(datablock[5])
course = string.atof(datablock[8])
Is there an equivalent in Arduino?

I wish to say I am not a programmer, but only an archaeologist, so I could be wrong! Thanks for any input on this; I'll probably add more questions about the code in the coming days, I am a newbie.
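For illustration, the header-first flow I have in mind might look like this in C (a sketch only; nmea_checksum and sentence_wanted are made-up names, not functions from the ArduPilot source):

```c
#include <string.h>
#include <stdlib.h>

/* XOR of every character between '$' and '*', per the NMEA 0183 framing. */
static unsigned char nmea_checksum(const char *sentence)
{
    unsigned char sum = 0;
    const char *p = sentence + 1;          /* skip the leading '$' */
    while (*p && *p != '*')
        sum ^= (unsigned char)*p++;
    return sum;
}

/* Return 1 only if the sentence has a header we care about AND a valid
   checksum; the header test comes first so unwanted sentences are
   rejected before any checksum work is done. */
static int sentence_wanted(const char *sentence)
{
    if (strncmp(sentence, "$GPRMC", 6) != 0 &&
        strncmp(sentence, "$GPGGA", 6) != 0)
        return 0;                          /* wrong header: skip checksum */
    const char *star = strchr(sentence, '*');
    if (star == NULL)
        return 0;                          /* no checksum field at all */
    return (unsigned char)strtol(star + 1, NULL, 16) == nmea_checksum(sentence);
}
```

The point of the ordering is simply that strncmp on six characters is much cheaper than XOR-ing an entire sentence.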
For Otto. A simple case. Assume three keywords, "dog", "cat" and "bird". The ordered keyword table is then -
bird sy_bird
cat sy_cat
dog sy_dog
You have a scalar type declared to define the symbols (tokens) sy_cat etc.; the second field of the table is the token value. Your lexical analyzer parses the keyword out of the input stream and then searches the table for a match using code like this -
/* Perform a binary search on the ordered keyword table. */
low = 0;
high = (int)curr_node->key_count - 1;
outcome = CMP_GT;   /* any value other than CMP_EQ, so the loop is entered */
while ((low <= high) && (outcome != CMP_EQ)) {
    ifmidx = (low + high) / 2;
    outcome = cmpkey(key, curr_node->keys[ifmidx]);
    if (outcome == CMP_GT) low = ifmidx + 1;
    else {
        if (outcome == CMP_LT) high = ifmidx - 1;
    } /*if/else*/
} /*while*/
A large table can be searched in just a few probes. Obviously you should use a linear search when you have a small table, but it should always be ordered.
The tokens are the preferred word size for the machine you are using, so that further identification is a single machine compare instruction, and for readability each token has a symbolic name.
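Concretely, the token type and table might be declared as below (a sketch; the identifiers are illustrative, and the standard library's bsearch stands in here for the hand-coded search loop above):

```c
#include <string.h>
#include <stdlib.h>

/* Token values: a scalar (enum) type, one machine word wide. */
typedef enum { SY_NONE, SY_BIRD, SY_CAT, SY_DOG } symbol_t;

/* The keyword table, kept in ascending order so it can be binary-searched. */
struct keyword { const char *name; symbol_t token; };
static const struct keyword keywords[] = {
    { "bird", SY_BIRD },
    { "cat",  SY_CAT  },
    { "dog",  SY_DOG  },
};

/* Comparator: key is the word being looked up, element is a table entry. */
static int kw_cmp(const void *k, const void *e)
{
    return strcmp((const char *)k, ((const struct keyword *)e)->name);
}

/* Binary search over the ordered table; SY_NONE means "not a keyword". */
static symbol_t lookup(const char *word)
{
    const struct keyword *hit = bsearch(word, keywords,
        sizeof keywords / sizeof keywords[0], sizeof keywords[0], kw_cmp);
    return hit ? hit->token : SY_NONE;
}
```

After this point the rest of the program compares small integer tokens, never strings.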
An alternative lookup is to use a hash function, but I consider that the simple logic of the binary search outweighs the complexity of hashing.
The lexemes of the input stream can be extracted a character at a time, translating the current input character through a lookup table to identify the meaning of the character (use the char value as the index into the table by adding it to a base pointer). For example, it might be a valid alpha, a valid numeric, a delimiter, whitespace, etc. Once a symbolic type is associated with the character, the lexical analysis is reduced to single-instruction compares.
As in most parts of CS, building a state machine produces an efficient result. That is particularly the case in real-time embedded systems.
Reto:
1) Checksums: I made a similar change lately; check the ArduPilot code from here.
2) Temp variables are not needed if you can write the code in one line. They're mostly used for better readability; the compiler optimizes the code in either case.
3) The ArduPilot GPS parser reads one char at a time, buffering it into a "line" buffer, and when EOL is received, it parses the line. The same buffering is surely done inside your readline function. Buffering the line is needed because serial data arrives slowly, and it may take several calls to the GPS parser function to receive one NMEA line.
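The buffering described in point 3 can be sketched like this (illustrative names, not the actual ArduPilot code): each call feeds one character, and the function only reports success once a full line has accumulated.

```c
/* Feed one character per call; returns 1 when a complete line is
   ready in `line`, 0 otherwise. */
#define LINE_MAX_LEN 120

static char line[LINE_MAX_LEN];
static int  line_len = 0;

static int gps_feed(char c)
{
    if (c == '\r') return 0;              /* ignore CR */
    if (c == '\n') {                      /* EOL: line is complete */
        line[line_len] = '\0';
        line_len = 0;                     /* reset for the next line */
        return 1;
    }
    if (line_len < LINE_MAX_LEN - 1)
        line[line_len++] = c;             /* else drop: overlong line */
    return 0;
}
```

Between calls the partial line simply sits in the buffer, which is why the parser can be called whenever a serial byte happens to be available.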
I am a programmer who spent many years writing compilers and similar products requiring efficient parsers. A very successful parser is hand coded and tokenizes the input using a binary search of ordered tables of keywords. By converting to atomic tokens at the first possible opportunity, further processing is greatly simplified. Using string compares and searches is tedious and unnecessary.
There are automated parsers which are driven by a description of the grammar of the input, but these do not generate particularly efficient code. The traditional way is to use "lex" and "yacc". I have only glanced at it in passing, but my impression is that NMEA has a simple grammar.
A language like Python is not ideally suited to efficient parsing and does not let you use efficient modern parser generators like "lemon".