Sep 17, 2011

Five dangerous coding standard rules


Don't follow these five dangerous coding standard rules… 
Over the summer I happened across a brief blog post by another firmware developer in which he presented ten of his C coding rules for better embedded code. I had an immediate strong negative reaction to half of his rules and later came to dislike a few more, so I'm going to describe what I don't like about each. I'll refer to this author as BadAdvice. I hope that if you have followed rules like the five below my comments will persuade you to move away from those toward a set of C coding rules that keep bugs out of your embedded software.
Bad Rule #1: Do not divide; use right shift.
As worded, the above rule is way too broad. It's not possible to always avoid C's division operator. First of all, right shifting only works as a substitute for division when it is integer division and the denominator is a power of two (e.g., right shift by one bit to divide by 2, two bits to divide by 4, etc.). But I'll give BadAdvice the benefit of the doubt and assume that he meant to write that you should "Use right shift as a substitute for division whenever possible". This, of course, is unnecessary in many cases as a good optimizing compiler can also see that you are dividing by a fixed power of 2 and shift accordingly if that is faster on the target processor.
For his example, BadAdvice shows code to compute an average over 16 integer data samples, which are accumulated into a variable sum, during the first 16 iterations of a loop. On the 17th iteration, the average is computed by right shifting sum by 4 bits (i.e., dividing by 16). Perhaps the worst thing about this example code is how much it is tied to a pair of #defines for the magic numbers 16 and 4. A simple but likely refactoring to average over 15 instead of 16 samples would break the entire example–you'd have to change from the right shift to a divide proper. It's also easy to imagine someone changing the first #define from 16 to 15 without realizing the significance of the second; in which case, you'd get a subtle bug in that the sum of 15 samples would still be divided by 16.
Better Rule: Shift bits when you mean to shift bits and divide when you mean to divide.
There are many sources of bugs in software programs. The original programmer creates some bugs. Other bugs result from misunderstandings by those who later maintain, extend, port, and/or reuse the code. Thus it is my view that coding rules should emphasize readability and portability above efficiency. The choice to deviate from a good coding rule in favor of efficiency should be taken only within a subset of the code. Except when there is a very specific function or construct that needs to be optimized by hand, efficient code generation should be left entirely in the hands of the compiler.
Bad Rule #2: Use variable types in relation to the maximum value that variable may take.
BadAdvice gives the example of a variable named seconds, which holds integer values from 0 to 59. And he shows choosing char for the type over int. His stated goal is to reduce memory use.
In principle, I agree with the underlying practices of not always declaring variables int and choosing the type (and signedness) based on the maximum range of values. However, I think it essential that any practice like this be matched with a corresponding practice of always declaring specifically sized variables using C99's portable fixed-width integer types.
It is difficult to understand the reasoning of the original programmer from:
char seconds; 
Did he choose char because it is just big enough or for some other reason? Was he counting on the compiler to tread char as unsigned by default, as some do?  The intent behind variables declared short and long also difficult to decipher. A short integer may be 16-bits or 32-bits (or something else entirely), depending on the compiler and target processor--a width the original programmer may have (or may not have) relied upon.
Better Rule: Whenever the width of an integer matters, use C99's portable fixed-width integer types.
A variable declared uint16_t leaves no doubt about the original intent as it is very clearly meant to be a container for an unsigned integer value no wider than 16-bits. This type selection adds new and useful information to the source code and makes programs both more readable and more portable. Now that C99 has standardized the names of fixed-width integer types, declarations involvingshort and long should no longer be used. Even char should only be used for actual character (i.e., ASCII) data. (Of course, there may still be int variables around, where size does not matter, such as in loop counters.)
Where portable efficiency concerns exist, remember that C99's stdint.h also defines uint_leastN_t and uint_fastN_t, which allocate storage of at least N bits and the fastest container at least that wide respectively.
Bad Rule #3: Avoid >= and use <.
As worded in the heading, I can't say I understand this rule or its goal sufficiently, but to illustrate it BadAdvice gives the specific example of an if-else if wherein he recommends:
if (speed < 100) ... 
else if (speed > 99) 
instead of:
if (speed < 100) ... <
else if (speed >= 100) 
Say what? First of all, why not just use else for that specific scenario, as speed must be either below 100 or 100 or above?
Additionally, even if we assume we need to test for < 100 first and then for >= 100 second, why would anyone in their right mind prefer to use > 99? That would be confusing to any reader of the code. To me it reads like a bug and I need to keep going back over it to find the logical problem with the apparently mismatched range checks. Additionally, I believe that BadAdvice's terse rationale that less code will result to be untrue. Any half decent compiler should be able to optimize any comparison as needed for the underlying processor.  (Remember you'll often do far better when you can compare for equality or inequality vs. zero.)
Better Rule: Use whatever comparison operator is easiest to read in a given situation.
It is my strongly held belief that one of the very best things any embedded programmer can do (other than make their code work properly, of course) is to make their code as readable as possible to as broad an audience as possible. That way another programmer who needs to modify your code, a peer doing code review to help you find bugs, or even yourself years later, will find the code hard to misinterpret.
Bad Rule #4: Avoid variable initialization while defining.
BadAdvice says that following this fourth rule will make initialization faster. He gives the example of unsigned char MyVariable = 100; (not preferred) vs.
#define INITIAL_VALUE 100
unsigned char MyVariable;
// Before entering forever loop in main
MyVariable = INITIAL_VALUE;
Though it's unclear from the above pseudo-code, let's assume thatMyVariable is a local stack variable. (It could also be global.) I don't think there should be any (portably) noticeable efficiency gain from switching to the latter. And I do think that following this rule creates an opening to forget to do the initialization or to unintentionally place the initialization code within a conditional clause.
Better Rule: Initialize every variable as soon as you know the initial value.
I'd much rather see every variable initialized at declaration with the narrowest possible scoping and creation of the variable postponed as long as possible. Remember that if you're using a C99 or C++ compiler, you can declare a variable anywhere within the body of a function.  Right before its first use is a great place to declare and immediately initialize any variable. 
Bad Rule #5: Use #defines for constant numbers.
The example given for this rule is of defining three constant values, including:
#define OFF 0
#define ON  1
BadAdvice's rationale is given as "Increased convenience of changing values in a single place for the whole file. Provides structure to the code." And I agree that using named constants instead of so-called "magic numbers" sprinkled around the code is a valuable practice. However, I think there is often a better way to go about this.
Better rule: Declare constants using static const or enum.
C's const keyword can be used to declare a variable of any type as unable to be changed at run-time. This is a preferable way of declaring constants, as they are in this way given a type that can be used to make comparisons properly and enabling them to be type-checked by the compiler if they are passed as parameters to function calls. In C++, the use of const won't cost you any memory storage. To avoid the use of run-time memory for constants declared this way in C, simply add the static keyword to the declaration. For example:
static const uint8_t OFF = 0;
static const uint8_t ON  = 1;
Alternatively, enumeration sets may be used to declare integer constants. I recommend this technique when the constants come in related groups, such as named colors or:
enum { OFF = 0, ON };
Either const-declared variables or enumerations can be used to achieve much of what programmers use preprocessor defines for (except preprocessor macros). So while I agree with BadAdvice that #defines are preferable over magic numbers, I only use #define as a last resort when naming a constant.
Scary thoughts
There are two scary things about these and a few of the other rules on BadAdvice's blog. First, is that they are out there on the Internet to be found with a search for embedded C coding rules. Second, is that BadAdvice's brief bio is a designer of medical devices and industrial controls. I'm not sure which is worse. But I do hope the above reasoning and proposed better rules gets you thinking about how to develop more reliable embedded software with fewer bugs. 

0 comments:

Post a Comment

Your comments:

Share

Twitter Delicious Facebook Digg Stumbleupon Favorites More