Guidelines

Naming

Rules:

  1. Capitalize type and namespace names, but not other names. Capitalize only the first letter, even in acronymic abbreviations.

  2. Choose complete words or phrases for non-local names where all uses cannot be seen at once. Actual-argument and local variable names may be short. The rule is that the less local a name is, the better it must be documented. A good name is better than a comment explaining why it's not.

  3. Avoid non-standard abbreviations and compound words: "Iterator" is better than "Iter". A short word is better than a shortened word.

  4. Separate words in names with_an_underscore.

  5. Static-constant names and enumerators should be lower case. Enumerated values of the same type should share a common prefix.

  6. Choose names to indicate purpose rather than implementation; the implementation is already visible in the declaration.

  7. Boolean variables and predicates should read as English predicate phrases, so that conditional statements using them read as grammatical English sentences.

  8. Functions with side effects should be English active-verb phrases.

  9. ALL_CAPS names should be avoided, except for macros. Macros should be avoided where possible, particularly in header files. Global macro names should be very long, or #undef'd after use.

  10. Names containing "__" (double underscore) or beginning with "_" are entirely forbidden.

  11. Names should not be or contain trademarks (including "zembu").

  12. A member data name may have an "m_" prefix, e.g. to distinguish it from an otherwise-identical member function name. Non-member names must not have an "m_" prefix.

  13. Don't violate common conventions when using short names in long functions: "i", "j", and "k", if used, should be loop control variables; "p" and "q" should be pointers or iterators.

Examples:

  struct Sector_position { long track_cylinder_head; };
  enum Rw_permissions 
    { 
      rw_read, 
      rw_write, 
      rw_both 
    };
  bool  means_failure(Status s);
  void  write_out_file();  // function with side effect
  #ifndef INCLUDED_STATE_H
  size_t n = a.size();
  for (size_t i = 0; i < n; ++i)
    { ++a[i]; }
  if (buffer->was_empty()) 

Discussion:

Nobody likes busywork like capitalizing names, but C++ declaration syntax is ambiguous and hard to understand when it is not obvious whether a name is a type. Capitalizing the first letter of a type is the smallest change possible that visibly distinguishes type names.

Capitalizing type names also makes it easier to discuss classes in documentation, written identically as in the code, as they fill a role similar to proper names in English. Using underscore separators in names makes them easier to read when mixed into other text. Using proper grammar in names makes documentation easier to read, and code easier to understand.

Macro names don't scope, and their use even in system headers tends to anarchy.

Trademarked names cause disruptions when ownership of code or trademarks changes hands.

The naming pattern "is_foo" for predicate names comes from decades of C tradition. In general, type suffixes ("_p", "_ptr") are needed in languages and environments where object type has been lost, or to avoid colliding with type names.

Type qualifiers and declaration statements

Rules:

  1. Only one name may be declared in a definition statement.

  2. Type qualifiers go with the type name, not the object name.

  3. Define local variables as late as possible, when they can be meaningfully initialized and used immediately. Do not re-use a variable for different purposes.

Examples:

  char*  p = "flop";
  char&  c = *p;
  char*  extract(std::pair<char*,int>& p);
     (NOT)
  char *p = "flop", &c = *p, *foo (pair<char *, int> &p); // no

Discussion:

In C++, definitions are mixed into executable code, so must read well as statements. In the example above, p is being initialized, not *p. The extra space before the name being declared (in declaration statements and member declarations) helps to pick out the name being declared from the "noise" text around it.

Types as entities are more important in C++ than in C, and should be represented visually as a unit, so inserting unnecessary spaces and regrouping type representations weakens their visual effect.

These rules go together: once multiple definitions in a single statement are forbidden, the fact that type qualifiers bind syntactically more tightly to names than to types causes no problem.

Re-using local variables does not save on resources; the compiler knows when a variable is no longer used. Re-using a variable may confuse the optimizer about its usage characteristics.
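A minimal sketch of rule 3 follows. The function longest_line() and its names are hypothetical, chosen only for illustration: each local is defined at the point where it can be initialized, and a separate name is introduced rather than re-using an existing variable.

```cpp
#include <cstddef>
#include <string>

namespace Example
{

// Returns the length of the longest line in text, not counting '\n'.
inline std::size_t
longest_line(
  std::string const& text)
{
  std::size_t  longest = 0;
  std::size_t  line_start = 0;
  while (line_start < text.size())
    {
      // Defined here, where it can be meaningfully initialized.
      std::size_t  line_end = text.find('\n', line_start);
      if (line_end == std::string::npos)
        { line_end = text.size(); }

      // A new name, rather than re-using line_end to hold the length.
      const std::size_t  length = line_end - line_start;
      if (length > longest)
        { longest = length; }
      line_start = line_end + 1;
    }
  return longest;
}

} // namespace Example
```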

Expression spacing and bracketing

Rules:

  1. Binary operators are separated from their arguments by whitespace.

  2. Insert no space before unary post-operators such as the function-call, array-index, and post-increment operators.

  3. Fully parenthesize subexpressions involving bitwise operators "&", "|", and "^".

Examples:

  // Right:
  operator==(char) 
  p[i++] = q[j++];
  if (this->headers()->is_empty())
  return (0xf & (x + 1));

  // Wrong:
  operator == (char)  // no
  p [i ++] = q [j ++];
  if (this->headers ()->is_empty ())
  return 0xf & x + 1;

Discussion:

Nobody can remember precedence on bitwise operators, so parenthesizing them reduces time and bugs for everybody.
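To make the precedence point concrete: "+" binds more tightly than "&", so the unparenthesized "0xf & x + 1" means "0xf & (x + 1)". The sketch below spells out both readings with hypothetical function names; for x == 0xf they give different results.

```cpp
namespace Example
{

// What "0xf & x + 1" actually computes: the sum is masked.
inline unsigned
mask_of_sum(
  unsigned x)
{ return (0xf & (x + 1)); }

// What a hasty reader may assume it computes: the masked value plus one.
inline unsigned
sum_of_mask(
  unsigned x)
{ return ((0xf & x) + 1); }

} // namespace Example
```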

The following arguments, cumulatively, lead to the second rule above.

  • Parentheses are even more heavily overloaded in C++ than in C, and the placement of a space before some uses (e.g. in control structures and to group expressions) helps to distinguish those uses.

  • Spaces around binary operators help visually to identify terms in an expression, and to distinguish overloaded unary from binary operators. Lack of spaces next to unary operators helps emphasize this role.

  • In C++, unlike in C, "()" is explicitly an operator, and even GNU doesn't suggest a space before other unary operators (although it seems sometimes to suggest a space after some unary operators, e.g. "!", but not others, e.g. "++").

  • In C++, unlike in C, the argument list of a function is part of its name, just as the tilde is part of the destructor's name.

  • Spaces inserted in expressions involving two or more applications of the function-call operator "()", as in

         return this->guarantor ()->lookup (name)->as_string ().c_str ();
                +-------------+ +--------+ +---------------+ +------+ +-+

    lead to odd groupings.

  • Every C++ textbook -- I have 10 here, not counting C books -- places a unary operator, including function-call, directly adjacent to its argument. Only GNU departs from this industry-wide standard, for reasons that appear to refer rather to LISP than C conventions.

Definitions in auto and class scope

Rule:

  1. When defining a name in an auto or class context, separate the type from the name by two spaces, or define the name on the next line, indented.

Examples:

  int  offset = bytes_left - this->buffer;
  char const*  flush_remaining(int max);
  std::map<Runtime::Symbol_name,Runtime::Value*>*
    table = new std::map<Runtime::Symbol_name,Runtime::Value*>;
  typedef Runtime::Symbol_name  Symbol_name;
  typedef Symbol_name (&  Lookup_helper)(char const* s);

Discussion:

This rule helps to compensate for the tendency of the defined name to get "lost" amid all the decorators and initializers C++ declaration syntax scatters about it, and calls attention to a definition statement in the midst of non-definition statements.

Note, this rule does not call for extra spaces in formal-argument declarations.

Indentation of definitions and declarations

Rules:

  1. Definitions at namespace or global scope begin at the left margin. The "template<>" line, the type and storage class, the name being defined, and the function or class body should each begin on a separate line, at the left margin. A blank line is required between definitions.

  2. Definitions in class bodies are indented two spaces from the enclosing brackets. Second and subsequent lines of member definitions in class bodies are indented.

  3. Arguments to function definitions are on a separate line and indented two spaces, not out to the open-parenthesis. Other breaks may be indented to line up with other expression elements.

Examples:

In a ".h" file:
  namespace Database 
  {

  class Journal
  {
  public:
    explicit Journal();
    template<typename T>
      void  template_function(T* target, 
        std::basic_string<T> const& source);
    int  roll_back_transaction();
  };

  } // namespace Database

In a ".cc" file:
  Database::Journal::Journal()
  { }

  template<typename T>
  void
  Database::Journal::template_function(
    T* target,
    std::basic_string<T> const& source)
  {
    target->instantiate_element(
      source.begin(), source.end());
  }

  int
  Database::Journal::roll_back_transaction()
  { ... }

Discussion:

At namespace scope, names must be at the left margin to be visible to ctags and etags.

Argument declarations are indented less than they might be in C because the long type names, coupled with the function-name qualification, would tend to push declarations off the right margin. The function-call operator "(" does not introduce a parenthesized expression, so is treated differently from parentheses that do.

Spaces in template instantiations

Rules:

  1. In template declarations, insert spaces between template formal argument declarations.

  2. In instantiations of templates, omit unnecessary spaces.

Examples:

  template<typename Iterator, size_t N>   // spaces inserted
    class Range; 
  std::pair<Runtime::Name,Runtime::Value> ...  // no spaces inserted

Discussion:

Unnecessary spaces in type names weaken their recognizability as a syntactic unit. Instantiations with names that are too unwieldy in this form should be typedef'ed.

Placement of const qualifier

Rules:

  1. On global or member object declarations in which "const" acts as a storage-class modifier, "const" precedes the type name.

  2. On variable, argument, and function declarations in which "const" acts as a pointer or reference access qualifier, "const" follows the base-type name.

Examples:

  const int  max_header_lines = 120;   // storage class
  char const*  log(char const* s);     // access qualifier
  std::string const&  lookup();        // access qualifier

Discussion:

C++ overloads const as both a storage-class modifier (similar to "extern" or "static") and, when applied to pointers and references, as an access-control modifier. You know which you mean when you write it, and can help the reader by recording your intent.
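The distinction can be checked at compile time. The following is a sketch assuming a C++11 compiler (for static_assert and <type_traits>); skip_spaces() is a hypothetical function used only to show an access-qualified argument and return type.

```cpp
#include <type_traits>

namespace Example
{

// "const" before or after the base type names the same type...
static_assert(
  std::is_same<const char*, char const*>::value,
  "const char* and char const* are the same type");

// ...but "const" after the "*" qualifies the pointer itself.
static_assert(
  !std::is_same<char const*, char* const>::value,
  "char const* and char* const are different types");

const int  max_header_lines = 120;   // storage class: a named constant

// Access qualifier: the function reads, but never writes, through s.
inline char const*
skip_spaces(
  char const* s)
{
  while (*s == ' ')
    { ++s; }
  return s;
}

} // namespace Example
```

Since the language treats the two spellings identically, the placement rule is purely a way of recording intent for the reader.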

Comments

Rules:

  1. Comments are delimited with the C++ "//" notation.

  2. Long comments begin on a separate line from active code. Comments of more than one line are separated from other code by a blank line. Comments refer to code below them.

  3. Functions more than ten lines long must be commented. Shorter functions should be. The comment describes precisely what the function guarantees, what it requires, and any errors it reports. Accuracy is more important than brevity.

  4. Good implementation comments answer the question "why?", not "what?". Comments should not replicate code, but call attention to subtleties and fragilities. Obvious code doesn't need commenting; complex code needs more. A large comment block organized as paragraphs is more useful than cryptic one-line comments distributed through code, and less distracting.

  5. Comments should be correct English sentences, with proper capitalization and punctuation. Write "a class derived from", not "a derived class of" or "a subclass of". Prefer "member function" or "virtual function" over "method".

  6. Names of code entities (classes, functions, variables) in comments should be written exactly as they appear in code; function names have "()" appended, classes are capitalized.

  7. Code being "commented out" is delimited with "#if 0/#endif", or "//", not "/* */" comment notation.

Discussion:

A short "()" makes the difference between having to say "calls the insert function" and "calls insert()".

Class organization

Rules:

  1. Names defined in classes appear in the following order:

    • Public type forward-declarations & typedefs

    • Public constructors & destructor

    • Public member functions

      -

    • Protected type forward-declarations & typedefs

    • Protected virtual functions

    • Protected non-virtual member functions

      -

    • Private type forward-declarations

    • Private member functions

    • Private member data

      -

    • Friend declarations

  2. Constructors that can be called with one argument are declared "explicit" unless automatic, implicit conversion is justified.

  3. Public data members are allowed only in C-like structs, where the only member functions permitted are constructors.

  4. Only one-line function definitions are permitted in the class definition body. Nested-class definitions should be defined outside the nesting class body, wherever possible.

Example:

  class Outer
  {
  public:
    // ...
  private:
    class Imp;
    Imp*  imp;
  };
  
  class Outer::Imp
  {
    // ...
  };

Discussion:

Class definitions are read most frequently by users, who are most interested in public members, particularly constructors. Data members are of interest only to maintainers, so they appear last.

Reading cluttered class bodies is hard enough without big function and nested-class definitions mixed in. Substantial blocks of "other" material can and should be forward-declared in the class body, and defined separately.

Inlines are an "attractive nuisance". They are the third most-misused C++ feature (after inheritance and overloading), used far more often than any practical criterion could justify, just because it is more convenient to write them in place. Anything that encourages their use beyond strict engineering merit is detrimental.

Inheritance

Rules:

  1. Virtual members should usually be "protected" -- except, possibly, the destructor. If necessary, define (inline) public forwarding functions that call the virtuals. Feel free to give the public functions a different interface.

  2. Data members are never "protected".

  3. Inheritance that violates the Liskov Substitution Principle is protected or private.

  4. Avoid inheritance except where necessary. Prefer composition or delegation where possible.

Discussion:

In object-oriented design, the derivation interface to a class has a different purpose from the client interface, and addresses different users, and thus should be designed and documented separately. A public interface involves presentation of an abstraction, hiding messy details. Derivation and virtual interfaces involve implementation of that abstraction, operating on those hidden details. Thus, the public and derivation interfaces are naturally opposed. Keeping the two interfaces separate reduces the temptation to mix concepts improperly between the two.

The base class may provide a public forwarding function to call a virtual function, but more often a different interface is appropriate for users of the class than for implementers of derived classes. Simple duplication of the virtual interface in the public interface suggests an immature design. Derivation used for non-object-oriented "implementation inheritance" (e.g. a function-pointer table) should be declared "protected"; the (known) caller may be declared a friend.
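As an illustration of separate public and derivation interfaces, here is a small sketch. The classes Logger and String_logger are hypothetical: the public function log() offers clients one interface (and filters empty messages), while the protected virtual write_line() is what a derived class implements.

```cpp
#include <string>

namespace Example
{

class Logger
{
public:
  virtual ~Logger() { }

  // Public interface: skips empty messages, then calls the virtual.
  void  log(std::string const& message);

protected:
  // Derivation interface: appends one complete line to the log.
  virtual void  write_line(std::string const& line) = 0;
};

inline void
Logger::log(
  std::string const& message)
{
  if (!message.empty())
    { this->write_line(message + "\n"); }
}

class String_logger : public Logger
{
public:
  std::string const&  text() const { return this->buffer; }

protected:
  virtual void  write_line(std::string const& line)
    { this->buffer += line; }

private:
  std::string  buffer;
};

} // namespace Example
```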

Experience has shown that use of protected data leads to severe maintenance problems, as implementations at different levels of a class hierarchy make different assumptions about how protected data may be used.

Inheritance is the single most overused language feature. It is over-promoted in textbooks by equating its use with "object-oriented" programming, and that with "good" programming. Object-oriented programming is one style among many supported by C++.

The "Liskov Substitution Principle" mentioned above may be summarized:

Derive class D from class B only if a D will be passed by pointer or reference to functions declared to take a B.

Interface Documentation

Rules:

  1. A component is one or more related classes, and friends.

  2. Components are documented in a separate comment block before the class definition(s).

  3. The public interface to a component is documented in a separate section from the inherited/protected interface. The latter defines what is required of derived classes that implement the interface, and what the base class definition does (if anything).

  4. Templates must be documented with a list of what specific operations the template applies to actual-argument types, as in the requirements lists in the C++ Standard. (Where appropriate, documentation may simply refer to such a standard list.) It is a grave error to use an operation not listed, because the compiler may not report the misuse.
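A requirements list of the kind rule 4 asks for might look like the sketch below. The function smaller_of() is hypothetical; the point is that the comment names every operation the template applies to T, and the body uses nothing else.

```cpp
namespace Example
{

// Returns a reference to the smaller of a and b; returns a if neither
// is smaller than the other.
//
// Requirements on T:
//   1. For values a and b of type T const, (b < a) is a valid
//      expression convertible to bool.
// No other operation is applied to T.
template<typename T>
T const&
smaller_of(
  T const& a,
  T const& b)
{ return (b < a) ? b : a; }

} // namespace Example
```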

Implementation Documentation

Rules:

  1. Implementation notes appear separately from public interface and derivation interface documentation, preferably not in the header file.

  2. Documentation includes a concise and complete list of Class Invariants, preferably in a form that can be verified by assertions. (A class invariant is a condition guaranteed on entry to any public or protected member function, and restored before returning to user code.)

  3. Sprinkle assertions of invariants liberally in implementation code.

Examples:

  // Invariants:
  //   1. this->container is either 0 or points to a valid container
  //   2. if (this->container != 0) then (this->container->size() > 0).

Discussion:

Invariants provide maximally concise documentation of the usage pattern of data members in the various member functions, so that not all members need be studied in detail before modifying one of them.

Use of explicit Class Invariants is among the most powerful tools known for improving C++ code quality. Assertions which express invariants make testing much more likely to detect errors. Difficulty expressing concise invariants may indicate design problems, so minimizing one's list of invariants is part of design optimization. The earlier invariants are codified, the more useful they are.

Invariants are important for exception safety, as they provide a checklist of conditions which must be restored to enable safe re-entry after an exception.
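The following sketch shows one way to document invariants and assert them on entry to and exit from each mutating member function. The class Ring_buffer and its private helper check_invariants() are hypothetical, not part of any existing code base.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace Example
{

// Invariants:
//   1. this->slots is not empty.
//   2. this->head < this->slots.size().
//   3. this->count <= this->slots.size().
class Ring_buffer
{
public:
  // Requires capacity > 0.
  explicit Ring_buffer(std::size_t capacity);

  bool  is_empty() const { return this->count == 0; }
  bool  is_full() const { return this->count == this->slots.size(); }
  void  push(int value);   // requires !is_full()
  int  pop();              // requires !is_empty()

private:
  void  check_invariants() const;

  std::vector<int>  slots;
  std::size_t  head;    // index of the oldest element
  std::size_t  count;   // number of elements stored
};

inline void
Ring_buffer::check_invariants() const
{
  assert(!this->slots.empty());
  assert(this->head < this->slots.size());
  assert(this->count <= this->slots.size());
}

inline
Ring_buffer::Ring_buffer(
  std::size_t capacity)
: slots(capacity), head(0), count(0)
{ this->check_invariants(); }

inline void
Ring_buffer::push(
  int value)
{
  this->check_invariants();
  assert(!this->is_full());
  this->slots[(this->head + this->count) % this->slots.size()] = value;
  ++this->count;
  this->check_invariants();
}

inline int
Ring_buffer::pop()
{
  this->check_invariants();
  assert(!this->is_empty());
  const int  value = this->slots[this->head];
  this->head = (this->head + 1) % this->slots.size();
  --this->count;
  this->check_invariants();
  return value;
}

} // namespace Example
```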

Blocks and statements

Rules:

  1. Blocks (except at top level in a ".cc" file) are indented two spaces in from the previous line. Statements within a block are indented a further two spaces.

  2. One-line blocks may begin and end with brackets on the same line.

  3. Only true one-line blocks are allowed as function definitions in class bodies; longer inlines should not appear in the class body. (Longer functions may still be declared inline and defined outside the class body.)

  4. Dependent clauses of all control structures (if, while, else, case) should be blocks. One-line dependent clauses may be represented as one-line blocks. A dependent clause starts on a separate line from the conditional statement.

  5. Each statement occupies at least one full line.

  6. Second and subsequent lines of a statement are indented at least two spaces. The function-call operator "(" does not introduce a subexpression, so should not be grouped and indented like parentheses that do. Terms within an expression may be indented according to expression structure.

Examples:

  // inside a class definition:
    State_machine::Action*  traction_action_factory() 
      { return 0; }

    inline void  set_timeout_time();

  // outside a class definition:
  inline void  
  Action::set_timeout_time()
  { 
    this->timeo_time = Boottime::now().add_seconds(
      this->timeo == 0 ? 10000000 : this->timeo); 
  }

    // conditional clauses:
    if (this->input > this->input_start)
      { --this->input; }
    else
      {
        this->input = "\n";
        this->input_start = this->input_end = this->input + 1;
      }

Discussion:

The prevalence of very short functions in C++, and their appearance in header files, makes departing from GNU bracketing advisable for the case of one-line blocks, particularly in header files.

The use of one-line block notation in control structures (as in the "if" statement above) makes adding and removing statements to/from such clauses quicker and less error-prone, and highlights symmetry.

Ganging two or more statements on a line would make reading code, setting breakpoints, and tracing code in a debugger more difficult.

Critical-resource blocks

Rule:

  1. Each critical-resource object gets its own nested block scope, even where a separate block is not semantically necessary.

Example:

    void 
    Ref_serializer::remove(
      Action* act)
    {
      { 
        Interrupt_lock lock;
        if (p != 0)
          { p->remove(act); }
      }
      return;
    }

Discussion:

Critical resources managed by the constructor/destructor of a custodian object require special attention during maintenance, so must be easily recognizable. The "extra" nesting level is the least intrusive way to represent this in code. (As mentioned above, it is better to represent facts in code than in comments.)
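A self-contained sketch of why the nested block matters: the custodian's lifetime, and therefore the critical section, ends exactly at the closing bracket. The Interrupt_lock below is a stand-in that only counts nesting depth in a hypothetical variable lock_depth; a real one would disable and re-enable interrupts.

```cpp
namespace Example
{

int  lock_depth = 0;   // stands in for the real interrupt state

// Hypothetical custodian: holds the "lock" for exactly its own lifetime.
class Interrupt_lock
{
public:
  Interrupt_lock() { ++lock_depth; }
  ~Interrupt_lock() { --lock_depth; }

private:
  Interrupt_lock(Interrupt_lock const&);               // not copyable
  Interrupt_lock&  operator=(Interrupt_lock const&);   // not assignable
};

struct Depths
{
  int  inside;
  int  after;
};

// Records the lock depth seen inside and just after a critical block.
inline Depths
observe_critical_block()
{
  Depths  seen = { 0, 0 };
  {
    Interrupt_lock  lock;
    seen.inside = lock_depth;
  }
  seen.after = lock_depth;
  return seen;
}

} // namespace Example
```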

Use of members

Rules:

  1. In implementations of member functions, uses of member functions and member data names not prefixed by "m_" should be preceded by "this->". ("m_" is pronounced "member".)

  2. Static member names may be referred to using "this->", or by explicit scoping.

Examples:

  void
  Action::Locks::set_hold_locks(
    bool hold)
  { this->holding = hold; }

    case Action::step_wait:

  void
  Action::Locks::set_hold_locks(
    bool hold)
  { m_holding = hold; }  // alternative

Discussion:

Of all the apparent "busywork" rules, this would seem the most intrusive. However, it avoids a serious problem in inherited name lookup, to be explained further below. In addition, as part of a policy of qualifying all names, it makes it easier to tell, when reading code, where names come from.

The name lookup problem is illustrated by the following example:
  int count;

  template <typename T>
    struct Base 
  { 
    int count; 
  }; 

  template <typename T>
    class Derived : public Base<T>
  {
    // increments ::count, not Base<T>::count (!):
    void bump_count() { ++count; }   // wrong
  };
The lookup for the name "count" in the example above bypasses the base class Base<T> and its member Base<T>::count, and finds ::count, because the unqualified name is not "dependent" on a template parameter and so is looked up where the template is defined, before the members of Base<T> are known. Older compilers did not all implement this rule; conforming compilers (including gcc) do.

Strictly speaking, the language rule only affects templates, but code often moves back and forth between template and non-template code, so different rules for template- and non-template code would introduce subtle bugs.

In dense code, local reference variables may be used to reduce clutter by the various scope qualifiers.
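For comparison, here is a sketch of the conforming form, assuming a base class that depends on the template parameter. Writing "this->" makes the name dependent, so lookup is deferred until instantiation and finds the base-class member. The names are wrapped in a hypothetical namespace Example so that the snippet stands alone.

```cpp
namespace Example
{

int  count = 0;   // namespace-scope name that could be found by mistake

template <typename T>
  struct Base
{
  Base() : count(0) { }
  int  count;
};

template <typename T>
  class Derived : public Base<T>
{
public:
  // Increments Base<T>::count, not Example::count.
  void  bump_count() { ++this->count; }
};

} // namespace Example
```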

Object ownership

Rules:

  1. Clearly document who is responsible for calling "delete" on a pointer, and whether a constructor or function which takes a pointer as an argument is assuming ownership of it.

  2. Preferably, make ownership of the object a non-issue; encapsulate ownership so that users need never be concerned with deletion.

Discussion:

This rule will be easier to enforce, and will be tightened up further, when we have an infrastructure of object-ownership templates. Then, ownership transfer will necessarily be visibly expressed in the code.
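As one possible shape for such an ownership template, the sketch below uses std::unique_ptr from the C++11 standard library; that choice is an assumption, not something this standard prescribes. Action_queue and Action are hypothetical. The signature of add() states that the queue takes ownership, and the caller must write std::move() to hand the object over.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

namespace Example
{

struct Action
{
  explicit Action(int action_id) : id(action_id) { }
  int  id;
};

class Action_queue
{
public:
  // Takes ownership of act; users never call "delete".
  void  add(std::unique_ptr<Action> act)
    { this->actions.push_back(std::move(act)); }

  std::size_t  size() const { return this->actions.size(); }

private:
  std::vector<std::unique_ptr<Action> >  actions;
};

} // namespace Example
```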

Exception Safety

Rules:

  1. Any resource that must be managed (e.g. memory) must be owned by an object at all times.

  2. After a throw, an object should be unchanged if possible; failing that, it should be left in a state consistent with its invariants; failing that, it must be destroyable. Which of these alternatives is implemented is part of the interface definition, and must be documented.

Example:

  // Right:
  { 
    std::auto_ptr<Actions_entry> 
      act(new Actions_entry(taskid));

    { 
      Interrupt_lock  lock;
      this->actions.push_back(act.get());
    }
    act.release();
  }

  // Wrong:
  { 
    { 
      Interrupt_lock  lock;
      this->actions.push_back(
        new Actions_entry(taskid));  // unsafe, slow
    }
  }

Discussion:

In the example above, if the call to push_back() were to throw, the std::auto_ptr<> destructor would delete the object _after_ the lock object has been destroyed, re-enabling interrupts. If the call succeeds, ownership of the Actions_entry object is explicitly released to the collection this->actions.

In this example, observing exception-safety discipline also produces more efficient and clearer code. The expensive allocate-and-initialize operation occurs outside the locked block, minimizing latency. Ownership of the allocated object is visibly passed on to the container.

While the current policy here is not to use exceptions, no code should rely on the fact. You must treat any function call as, potentially, a "throw" statement, and arrange to clean up any resources allocated or invariants violated. Generally, this means to use custodian objects to manage resources, rather than simply pairing function calls.
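The first alternative in rule 2 (the object is unchanged after a throw) can often be achieved by doing all the work on a copy and committing with a non-throwing swap. The sketch below uses a hypothetical class Name_list to show the pattern.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace Example
{

class Name_list
{
public:
  // Appends every name in more. Throws std::invalid_argument, leaving
  // the list unchanged, if any name in more is empty.
  void  append_all(std::vector<std::string> const& more);

  std::size_t  size() const { return this->names.size(); }

private:
  std::vector<std::string>  names;
};

inline void
Name_list::append_all(
  std::vector<std::string> const& more)
{
  // Work on a copy; anything in this part may throw.
  std::vector<std::string>  updated(this->names);
  for (std::size_t i = 0; i < more.size(); ++i)
    {
      if (more[i].empty())
        { throw std::invalid_argument("empty name"); }
      updated.push_back(more[i]);
    }

  // Commit: swapping two vectors does not throw.
  this->names.swap(updated);
}

} // namespace Example
```

The cost is one extra copy of the container, which is the usual price of the strongest guarantee; whether it is worth paying is part of the interface decision that rule 2 says must be documented.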

Namespaces

Rules:

  1. Only namespace names may be global; all other names must be defined within a namespace.

  2. When using a name in another namespace (particularly std::), explicitly qualify the name.

  3. As a practical exception to the above, C library names are not qualified at this time.

Examples:

  std::pair<std::string,Value*> mapping;
  std::copy(s.data(), 
    std::find(s.data(), s.data() + s.size(), '\n'), 
    buffer);
  if (!isprint(*p))   // for now
    { break; }

Discussion:

Keeping names in namespaces reduces exposure to the global names (often improperly) exported in various execution (and simulation) environments.

Qualification of non-local names helps the reader to recognize name origins. Non-local names that are heavily used may be aliased locally, so this rule need not lead to clutter.

The exception for C library names is a concession to the immature condition of current "standard" library implementations.
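A brief sketch of local aliasing, using a hypothetical function record_word(): the fully qualified type is spelled out once in a local typedef, and the short alias is used from there on.

```cpp
#include <map>
#include <string>
#include <utility>

namespace Example
{

// Returns the number of times word has now been recorded in counts.
inline int
record_word(
  std::map<std::string,int>& counts,
  std::string const& word)
{
  typedef std::map<std::string,int>  Counts;   // local alias

  // insert() leaves an existing entry alone and returns its position.
  const std::pair<Counts::iterator,bool>  found =
    counts.insert(Counts::value_type(word, 0));
  return ++found.first->second;
}

} // namespace Example
```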

Overloading

Rules:

  1. Avoid overloading operators, except as part of a standard interface, such as operators =, *, ->, == or !=.

  2. Overload functions only where the variants have semantically identical effects.

  3. Default argument values should not run a constructor. Overload instead.

  4. Avoid overloading virtual functions.

Examples:

  bool  is_locked() const;  // OK
  void  is_locked(bool);  // Evil
  void  set_lock(bool);  // OK

  // wrong:
  void  format_disk(std::string const& volume_name = "");

  // right:
  void  format_disk(std::string const& volume_name); 
  void  format_disk();

Discussion:

Overloading is the second most overused language feature, after inheritance. Certain operators, such as operator=, have defined interactions with the language and may be necessary.

Because virtual functions are an implementation method, rather than an interface technique, overloading is usually not appropriate for virtuals.

Default arguments that "do work" invisibly make it harder to understand the function and its cost, and make debugging harder. Visible action is better than invisible action.

Conversions

Rules:

  1. Constructors that can be called with one argument (including those that take defaulted arguments) should be declared "explicit".

  2. Do not define conversion operators, particularly to numeric types. Defining operator bool() is always a mistake.

Examples:

  explicit Grunt(int);
  explicit Grunt(char const* = 0, int = 1);  // defaulted args count

Discussion:

Again, visible action is better than invisible action.

Implicit conversions are far more frequently a cause of invisible bugs and troublesome ambiguities than they are a convenience for users. Besides providing invisible conversion paths creating usually-unwanted temporaries, conversions may make it harder to call the correct one of a set of overloaded functions.

Because template function argument type deduction cannot take advantage of conversions, otherwise-useful conversions often are not usable in code that uses templates. In such a context explicit casts are needed, which introduces risk because it is easy to use the wrong cast accidentally, and the compiler won't help. Explicit conversion functions don't suffer these problems.

Defining operator bool() also enables implicit conversion to int, char, and double, because a bool converts implicitly to each of those types. If a type absolutely must be used in a conditional context (this is rare; generally, such a type should have no other purpose) it should instead convert to "void const*", which doesn't implicitly promote to other types.
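The effect of these rules can be checked at compile time. The sketch below assumes a C++11 compiler (for static_assert and <type_traits>); the member names grunt_level and as_int() are hypothetical additions to the Grunt example above.

```cpp
#include <type_traits>

namespace Example
{

class Grunt
{
public:
  explicit Grunt(int level) : grunt_level(level) { }

  // Named, visible conversion, in place of a conversion operator.
  int  as_int() const { return this->grunt_level; }

private:
  int  grunt_level;
};

// An int does not silently become a Grunt, nor a Grunt an int.
static_assert(
  !std::is_convertible<int, Grunt>::value,
  "explicit constructor blocks implicit conversion from int");
static_assert(
  !std::is_convertible<Grunt, int>::value,
  "no conversion operator to int is defined");

} // namespace Example
```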

Known Bugs

Rule:

  1. Known problem areas in code are annotated with a comment "// FIXME: " and what is known about the problem.

Violations

Rule:

  1. Deliberate violations of the coding standard should first be proposed on the "software" list with a list of conforming alternatives and what is wrong with them.

  2. Approved violations of the coding standard should be noted with a comment, with the date of the discussion, so that somebody interested can read the archives.