by Derek Jones
The two major factors affecting the use of language constructs in a program's source code are likely to be the author's personal habits and the demands of the model used for the application.
Measurements of C source (see table) show a surprising amount of regularity. For instance, that perennial favourite, Zipf's law, shows up in identifier name usage (see fig 130).
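A rank/frequency count of the kind that underlies a Zipf's law comparison can be sketched in a few lines of Python. This is an illustration only, not the tooling used for the measurements: the function name and the simplified identifier pattern are assumptions, and comments, string literals and preprocessor directives are not skipped as a real C lexer would.

```python
import re
from collections import Counter

# Simplified identifier pattern (an assumption): a letter or underscore
# followed by letters, digits or underscores, on a word boundary so that
# the "x1F" in "0x1F" is not picked up.
IDENTIFIER = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")

# The 32 keywords of C90; later standards add more.
C90_KEYWORDS = set(
    "auto break case char const continue default do double else enum "
    "extern float for goto if int long register return short signed "
    "sizeof static struct switch typedef union unsigned void volatile "
    "while".split()
)


def identifier_rank_frequency(source):
    """Return (rank, identifier, count) tuples, most frequent first."""
    counts = Counter(
        name for name in IDENTIFIER.findall(source)
        if name not in C90_KEYWORDS
    )
    return [
        (rank, name, n)
        for rank, (name, n) in enumerate(counts.most_common(), start=1)
    ]
```

If identifier usage follows Zipf's law, plotting count against rank on log-log axes should give points lying close to a straight line.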
However, not all 'laws' are followed so closely; for instance, Benford's law (see fig 141) provides a poor fit for integer constants, and a slightly better fit (see fig 142) for floating-point constants.
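Benford's law predicts that the leading digit d (1 to 9) occurs with probability log10(1 + 1/d). A minimal sketch of comparing observed leading digits against that prediction follows. The function names are hypothetical, the input is assumed to be a list of integer values already extracted from the source, and extracting the constants themselves is not shown.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d in 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_fractions(values):
    """Observed fraction of each leading digit among non-zero integers."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    if not digits:
        raise ValueError("no non-zero values to measure")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}
```

Comparing the two dictionaries digit by digit gives a rough picture of how good or poor the fit is; a formal goodness-of-fit test would be the next step.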
The amount of nesting of language constructs often follows a log-linear relationship (e.g., fig 43, 44, 45, 184, 188, 193). Whether this log-linear form is the result of the way that developers organize their code or is a natural consequence of solving real-world problems is an open question.
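One simple nesting measurement is the depth at which each block opens. The sketch below is an assumption-laden illustration rather than the method behind the figures: the function name is hypothetical, only brace nesting is counted, and braces inside comments, string literals or character constants are not excluded.

```python
from collections import Counter


def brace_depth_histogram(source):
    """Count how many '{' open at each nesting depth (outermost = 1)."""
    depth = 0
    histogram = Counter()
    for ch in source:
        if ch == "{":
            depth += 1
            histogram[depth] += 1
        elif ch == "}":
            # Guard against unbalanced input driving the depth negative.
            depth = max(depth - 1, 0)
    return histogram
```

Reading "log-linear" as the logarithm of the count falling roughly linearly with depth (an interpretation, not something stated above), plotting log(count) against depth for a large body of source would show whether the points lie near a straight line.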
Educators will be interested in the fact that a large percentage of statements are very simple (e.g., table 192, 202, 205, 219). Concentrating on teaching the common cases will help students focus on what they will mostly encounter during program comprehension (and perhaps reduce the desire to write complicated code).
Anybody wanting to measure a different collection of C source can find some useful tools at www.knosof.co.uk/cbook/srccnt.tgz (note: a *nix-based system is required).