By Chris Douce
Being someone who may occasionally be termed a professional programmer, I have been persecuted by an integer. This number has followed me around, has intruded in my most private data, and has assaulted me from the pages of public listings.
This number assumes a variety of disguises, sometimes causing problems that are relatively small, other times creating problems that can take a significant amount of time to fix.
I am, of course, writing about the magical number zero.
One of the most obvious areas where zero introduces confusion is the issue of array indices.
Having been brought up on a diet of various types of home computer Basic, I soon became aware of a difference that was a little more subtle than the commands used to change the VDU colour. I found that different dialects of Basic presented different ways of addressing the zeroth index of an array.
Whilst attending a 'for fun' programming class many years ago to learn the rigours of Basic programming (my first language?), the tutor used what I thought to be an interesting heuristic in attempting to make the array situation clearer.
The heuristic was to increase the size of the array by one and for the student to forget about using the zeroth element altogether.
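The tutor's heuristic can be sketched in a few lines. Python is used here purely for illustration (the class itself used Basic), and the names `student_count` and `scores` are invented for the example.

```python
# A sketch of the "allocate one extra and ignore slot zero" heuristic.
student_count = 5

# Allocate one more slot than needed so that indices 1..5 are all valid.
scores = [None] * (student_count + 1)

# Fill slots 1 to student_count; slot 0 is deliberately never touched.
for student in range(1, student_count + 1):
    scores[student] = student * 10

# The unused zeroth slot is still there, silently taking up space.
unused_slot = scores[0]
```

The price of the trick is visible in the last line: the list holds six entries to store five values, and nothing in the code says why.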
This confusion has continued to haunt me to this day. Recently I discovered that the current implementations of Visual Basic support reassignment of indices, allowing the programmer to choose what start index he or she wishes to use.
Today, if faced with source code that appeared to ignore the first element of the array, I would wonder why on earth the first element had been reserved and why there were no comments to explain the reservation! (Whether this is a reflection upon my expertise, or on my level of paranoia, I do not know.)
The more one considers programming operators where an index can be found, the more instances of zero can be found.
Graphical controls such as tabs or list boxes are a further example. Not so long ago I was faced with the situation of loading a control with data, where the source array was indexed that little bit differently to the widget array.
Another control, a text list box, could be commanded to unselect all selected items. How do you do this? You send it the value of -1.
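As a rough illustration of the -1 convention, here is a minimal sketch of a single-selection list box. The `ListBox` class and its methods are entirely hypothetical and are not modelled on any real toolkit.

```python
class ListBox:
    """A hypothetical text list box; not modelled on any real toolkit."""

    NO_SELECTION = -1  # sentinel: deliberately outside the valid index range

    def __init__(self, items):
        self.items = list(items)
        self.selected = self.NO_SELECTION

    def select(self, index):
        # -1 clears the selection; any other value must be a real index.
        if index != self.NO_SELECTION and not 0 <= index < len(self.items):
            raise IndexError(f"no item at index {index}")
        self.selected = index

    def selected_item(self):
        # Guard against the sentinel: in Python, items[-1] would quietly
        # return the last item rather than fail.
        if self.selected == self.NO_SELECTION:
            return None
        return self.items[self.selected]


box = ListBox(["apple", "banana", "cherry"])
box.select(0)
first = box.selected_item()
box.select(-1)
nothing = box.selected_item()
```

Because zero is already taken by the first item, "nothing selected" has to be pushed out to a value below zero, and every caller has to remember to check for it.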
I have sometimes had to delve into the unfortunate world of bit shuffling and assembly language - setting up registers of data to send them over a wire to another piece of hardware that will hopefully do something on your command. In the world of physical hardware, the value zero has a very profound meaning.
Zero-indexed arrays, from a software engineering and efficiency perspective, make complete sense. In fact, it would almost seem silly (to many of us) to consider anything else.
The issue of zero does not stop at simple graphical controls or array indices.
Consider for a moment databases and data tables. A colleague turned to me not so long ago and asked, 'These data tables, they generate a unique primary key, right? Do they begin at one or begin at zero?'. I wanted to answer the question, but I couldn't. Which was it? In the particular implementation of a relational database that I was using, the answer was one.
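The efficiency argument is usually put in terms of address arithmetic: with a zero-based index, an element's position is simply the base address plus the index times the element size, with no correction term. A minimal sketch, using invented addresses and a hypothetical `element_address` helper:

```python
def element_address(base, index, element_size, first_index=0):
    """Return the address of an element in a contiguous array.

    first_index is the index the language gives to the first element.
    """
    return base + (index - first_index) * element_size


BASE = 1000   # invented starting address of the array
SIZE = 4      # invented element size in bytes

# Zero-based: the first element sits exactly at the base address.
zero_based_first = element_address(BASE, 0, SIZE)
# One-based: the same element needs a subtraction before the multiply.
one_based_first = element_address(BASE, 1, SIZE, first_index=1)
# Zero-based third element: two element widths past the base.
third_zero_based = element_address(BASE, 2, SIZE)
```

Both conventions reach the same address; the zero-based one just gets there without the extra subtraction.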
Does this make sense? Possibly. Is it possible to conceptually confuse an order numbered zero with an order that doesn't exist? I personally would not like to be assigned customer number zero.
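The confusion is easy to reproduce in a language such as Python, where both 0 and None count as false in a condition. The function names below are invented for the sketch, and the first one is buggy on purpose.

```python
def describe_order(order_id):
    """Buggy on purpose: relies on truthiness to detect a missing order."""
    if not order_id:
        return "no order"
    return f"order {order_id}"


def describe_order_explicit(order_id):
    """Tests for None explicitly, so order number 0 is still an order."""
    if order_id is None:
        return "no order"
    return f"order {order_id}"
```

With the first version, order number zero is reported as no order at all; with the second, it is reported as `order 0`. Starting keys at one sidesteps the problem entirely.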
Many modern imperative programming languages support some form of enumeration. If enumerations can be indexed using loops, what value should they start at? Introspectively, if one is referring to physical objects within an enumeration, it would feel odd for the first item to be labelled zero.
Enumerations are, in essence, simple data structures. If combined together, the situation can become significantly more complicated.
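Python happens to give both answers within its standard library, which makes for a compact illustration. The `Fruit` enumeration is invented for the example.

```python
from enum import Enum, auto


class Fruit(Enum):
    APPLE = auto()
    BANANA = auto()
    CHERRY = auto()


# Enum members numbered with auto() start at 1 by default.
first_value = Fruit.APPLE.value

# enumerate() starts counting at 0 unless told otherwise.
zero_based = [index for index, _ in enumerate(Fruit)]
one_based = [index for index, _ in enumerate(Fruit, start=1)]
```

The first member is numbered one when it is treated as a labelled thing, and zero when it is treated as a position in a loop.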
Take the Java language Vector class (a name that positively screams confusion).
Some programmers may be tempted to distinguish between null and zero data items. This intrinsically sounds like a very bad idea. Common sense tells us that one should choose a convention and stick to it.
The area where, more often than not, I have come unstuck is when I have to construct data structure iterators. Here I am faced with a choice. Do I add a less than or equals, or choose less than and add one to the value I want to count up to? I have not quite made up my mind. Either way, it shouldn't make much odds, since hopefully your modern optimising compiler may be able to make a decision for you.
Which is inherently easier for programmers? Is making decisions based on whether something is equal easier than figuring out whether something is smaller or larger?
I have to confess that iterators in modern languages such as C# and Java have taken the fun out of working with arrays. I feel that using iterators would reduce the number of errors compared with equivalent fragments of code that use the more 'traditional' forms of looping construct, simply because the numbers zero and one are taken out of the equation.
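The two choices, and the slip that lurks between them, can be set side by side. This is only a sketch with invented function names, written with explicit `while` loops so that the comparison is visible.

```python
def sum_inclusive_le(limit):
    """Count from 1 up to limit using 'less than or equals'."""
    total = 0
    i = 1
    while i <= limit:
        total += i
        i += 1
    return total


def sum_inclusive_lt(limit):
    """Count from 1 up to limit using 'less than' and limit + 1."""
    total = 0
    i = 1
    while i < limit + 1:
        total += i
        i += 1
    return total


def sum_off_by_one(limit):
    """The classic slip: 'less than' without the + 1 drops the last value."""
    total = 0
    i = 1
    while i < limit:
        total += i
        i += 1
    return total
```

For a limit of 10 the first two both give 55, while the third gives 45 because the final value is never added.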
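To make the contrast concrete, here is the same small task written both ways. Python stands in for C# or Java, and the list of names is invented.

```python
names = ["Ada", "Grace", "Edsger"]

# Index-based loop: the programmer must choose the start (0) and the
# comparison (< len(names)) correctly.
by_index = []
i = 0
while i < len(names):
    by_index.append(names[i].upper())
    i += 1

# Iterator-based loop: no start value, no end test, no index at all.
by_iterator = []
for name in names:
    by_iterator.append(name.upper())
```

Both loops build the same list, but only the first offers a place to get the start value or the comparison wrong.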
This effect on error is, of course, pure conjecture. It is, I feel, something that would be interesting to test empirically.
There is, of course, one case of zero that may forever thwart a programmer - the famous Divide by Zero error. Once you know it, it ceases to be an obvious gotcha. It may be still there, however. It may reside in code in the form of a latent error, a waiting pathogen - an exception waiting to be raised.
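A latent error of this kind might look like the following sketch. The `average` and `safe_average` functions are invented for illustration, and returning None for an empty list is only one of several possible conventions.

```python
def average(values):
    """Works for every non-empty list; fails only when the list is empty."""
    return sum(values) / len(values)


def safe_average(values):
    """One possible guard: treat 'no data' as None rather than dividing."""
    if len(values) == 0:
        return None
    return sum(values) / len(values)


# The pathogen wakes up only when an empty list finally arrives.
try:
    average([])
    latent_error_raised = False
except ZeroDivisionError:
    latent_error_raised = True
```

The first function can pass every test that happens to supply data, and still be waiting for the day it is handed none.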
A brief history of zero can be found at: A history of Zero
Thanks go to the participants of Stephen Clarke's weblog for inspiring these ruminations!