Wednesday, January 16, 2013

The Curly Question of Braces

Sometimes arguments come down to what I call the "Chocolate vs. Caramel" factor.

If I were to ask which of these flavours you preferred, I'm hoping most people would not expect everyone to give the same answer. Nor would I expect them to accuse anyone who answers differently of being wrong.

Strangely enough, if I were to replace these flavours with the question of whether the opening curly brace should appear on the same line or the next line, then a lot of people seem to lose this concept of "personal preference" and break out into jihad or something.

The use of braces, curly braces or curly brackets if you prefer (I call them squigglies) actually dates back to before C. The language BCPL (1966) used $( and )$ to delimit a statement block, to distinguish it from the ( ) used in an expression. This influenced Ken Thompson when he developed his language B (1969), which actually used { } and was the predecessor to C (1973).

C of course is the language we remember and is still widely in use today. So languages that use squiggly brackets are often described as C-style languages and include C++, Java, C#, Objective-C, JavaScript, PHP and Perl. That's a big chunk of the most commonly used languages today.

The issue boils down to whether you should write code like this (same-line):

    if ( expr) {
      ... statements ...
    }

 
or this (next-line):

    if ( expr)
    {
      ... statements ...
    }

or more importantly, does it even matter?

There are a number of variants on top of this. One that I encountered in a C++ codebase I had to work on followed this 'philosophy':

If a single statement:

    statement;
   
can be interchanged with a block of statements:

    {
        statement1;
        statement2;
    }

   
then surely the logical block indentation for:

    if ( expr)
        statement;


should be:

    if ( expr)
        {
            statement1;
            statement2;
        }

This particular indentation isn't widely used and I've only encountered it in the one software house, but as arguments go it's as valid as most others I've heard.

There are a few cases where there might clearly be a right or wrong answer, and they mainly fall into two categories: conformance and language peculiarities.

For conformance, the most likely reason is the old "All our code is this way so this is the way we want you to do it". And this is a pretty valid reason. There is a lot to be gained from uniformity in large code bases and most developers can make this sort of shift if they need to.

These days IDEs usually offer a configuration option to automate one way or the other, and some (I'm thinking of Visual Studio specifically) allow you to cut and repaste the code so that it will automatically be reformatted as per this choice, which is great, as long as the reformatting doesn't bugger up something else.

The other, thankfully less common, conformance reason is that you have one particular loud-mouthed Nazi who wants it their way and, in the interest of peace, the rest of the team conform, whereas they really should just punch them on the nose.

Most languages handle either convention without any issues, but there are a few languages where parsing of the code actually leads to different results.

In PowerShell the "foreach" looping statement supports either, but in a piped expression, where "foreach" is an alias for the ForEach-Object cmdlet, the opening squiggly has to be on the same line. Otherwise something completely different happens, which usually leads to an error.

So given:

    $list = @( 1, 2, 3);

Either brace style works fine with the foreach statement, for example:

    foreach ( $item in $list)
    {
      $item + 1;
    }

but in a piped expression

    $list | foreach
    {
      $_ + 1
    }

will error with:

ForEach-Object : Cannot bind parameter 'Process'. Cannot convert the "$_ + 1" value of type "System.String" to type "System.Management.Automation.ScriptBlock".

whereas the following works:

    $list | foreach {
      $_ + 1
    }


However, I'm not sure this justifies having to write all your PowerShell this way, specifically because I wouldn't have written this code either way, but rather:

    $list | foreach { $_ + 1 }

At the same time, because you get an error when you try to run the faulty code, testing will quickly identify where such an issue occurs. For PowerShell to misinterpret the code rather than report an error, you would have to create a ScriptBlock object, which I'm pretty sure you can't do in out-of-the-box PowerShell.

JavaScript is often used as another example because of how it supports optional semi-colons. If there is no semi-colon on certain lines, the interpreter first adds a semi-colon before compiling. Yes, this is a simplification but it will serve for the purpose of this discussion.

Issues occur when after adding the semi-colon, the line of code compiles and results in behaviour that was not expected. The problem is that the examples I have seen are obscure. For example, if you are returning an object from a function:

  function createApple()
  {
      return
          {
              name   : "Apple",
              colour : "Red"
          };
  }


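The "return" line is actually interpreted as "return;", so the function returns before it ever reaches the object on the following lines. Had the object contained a single property, the function would quietly return "undefined"; with two properties, as here, the leftover braces are parsed as a statement block that isn't valid, and you get a syntax error instead.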
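A minimal sketch of the difference, assuming Node.js or a browser console to run it. The function names are made up for illustration, and the next-line version is cut down to a single property so that what is left after the inserted semi-colon still parses:

```javascript
// Hypothetical function names, for illustration only.

// Brace on the line after "return": a semi-colon is inserted after "return",
// and the braces that follow are parsed as an unreachable statement block.
function createAppleNextLine()
{
    return
    {
        name : "Apple"
    };
}

// Brace on the same line as "return": the object is returned as intended.
function createAppleSameLine()
{
    return {
        name   : "Apple",
        colour : "Red"
    };
}

console.log(createAppleNextLine());        // undefined
console.log(createAppleSameLine().name);   // Apple
```

Note that the surrounding function braces are on the next line in both versions and cause no trouble at all; only the line break straight after "return" matters.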

The problem is this issue has nothing really to do with squiggly brackets, because the following is equally flawed for the same reason.
   
  function createApple()
  {
      var apple = { name : "Apple", colour : "Red" };
      return
          apple;
  }


The other problem with the first createApple() example is that the squiggly brackets here are not being used for a statement block but rather for an object expression.

The real issue here is that this sort of problem is best dealt with by testing, specifically unit testing and not coding conventions.

The strangest thing is that I've heard a number of claims used by both camps, some being really ridiculous.

For example the claim of readability surely has to be a matter of personal preference, and I suspect in a lot of cases, depends on which method the programmer was first introduced to.

The claim by each camp that the other method makes the code read more like BASIC though has got to be the weirdest. There have been times when I've confused my C with C++, with Java, C# and JavaScript. But I can safely say I have never confused my code with BASIC.

I'm assuming of course that they mean a BASIC without line numbers like Vax Extended Basic or Visual Basic. But I can't believe in any universe that C will ever look like BASIC unless of course you wrote your C like this:

  #define IF if (
  #define THEN ) {
  #define ELSE } else {
  #define ENDIF }

  ...
 
    IF result THEN
        printf( "The result is true\r\n");
    ELSE
        printf( "The result is false\r\n");
    ENDIF

 
Yep, that compiles and works in Visual C++ 2010 but even I would think it was BASIC at first glance if I didn't know what I was looking at.

And you know what? Now that I look at it, the readability is no worse than:

    if ( result ) {
        printf( "The result is true\r\n");
    } else {
        printf( "The result is false\r\n");
    }

       
Hmm. Why is "reads more like BASIC" being considered an insult anyway?

My personal preference is for next-line but I equally accept someone's personal preference being same-line.

For me next-line works really well with one exception: the do-while construct. For example:

    do
    {
        statement;
    } while ( expr);

   
doesn't work for me but then neither does:

    do {
        statement;
    } while ( expr);


Fortunately I don't use do/while very often.

Because of my preference for next-line I often hear arguments specifically in favour of same-line, a couple of which are worth mentioning.

The first is that same-line saves paper when printing and screen space.

This is a pretty dated argument. I can't think of the last time I printed a piece of code, and the days of 80 x 25 screens are well behind us. And back then we used to ship boxes of computer printouts every month by the truckload. I'm not sure saving the odd row here and there ever really provided any value.

The other argument is that same-line is the way it's done in "The C Programming Language" by K&R, considered the authoritative book on C.

With respect to this claim, though, I will quote what the book actually states at the end of a short discussion on indentation:

  "The position of braces is less important, although people hold passionate beliefs. We have chosen one of several popular styles. Pick a style that suits you, then use it consistently."
 
I don't think I could have said it better myself.
