The effect of spaces in certain languages

Aside from separating tokens—distinguishing print foo from printfoo—spaces aren’t used for much in most languages. However, in a couple of dark corners, a space does affect how code is parsed in CoffeeScript, Ruby, and the C preprocessor. Where and what effect does it have in each of those languages?

That’s right. Spaces mostly just separate tokens, but in a few specific contexts in CoffeeScript, Ruby, and the C preprocessor they change how code is parsed. That can lead to subtle and sometimes surprising behavior.

Let’s explore these “dark corners”:

1. CoffeeScript: Implied Calls and Object Literals

CoffeeScript’s design prioritizes conciseness. It lets you omit the parentheses around function-call arguments and the curly braces around object literals, and that is where spaces become significant.

  • Implied Function Calls:

    In CoffeeScript, an identifier followed by a space and then an expression is compiled as a function call. No parentheses are required.

    CoffeeScript

    # CoffeeScript
    print foo # Equivalent to print(foo)

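    Compare this with Python 3, where print foo is a syntax error because print is a function that requires parentheses. Apart from keeping print and foo from merging into the single identifier printfoo, the space also matters in its own right when the token after it could be an operator. a -b compiles to a(-b), while a - b and a-b are both subtraction. Likewise, a [1] calls a with the array [1], while a[1] indexes into a.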

    Implicit calls can also wrap an implicit object literal, which can mislead a reader:

    CoffeeScript

    # CoffeeScript
    # Compiles to doSomething({a: b}): one object argument, not two arguments
    doSomething a: b

    The compiler is not confused here. It always reads a: b as a key-value pair, whatever a and b are, so doSomething receives a single object. Passing two arguments would require doSomething a, b. Explicit parentheses, doSomething(a: b), or curly braces, doSomething {a: b}, make the intent obvious to the reader.

  • Object Literal Delimitation:

    CoffeeScript lets you omit the curly braces around an object literal when each key-value pair sits on its own line, indented under the assignment.

    CoffeeScript

    # CoffeeScript
    myObject =
      key1: value1 # Indentation marks this pair as part of myObject
      key2: value2

    Here the significant whitespace is the indentation, not the space after each colon, which is conventional style. The indented lines form the literal, and the first line that returns to the outer indentation level ends it.

2. Ruby: Operator Precedence and Method Calls

Ruby has a few interesting quirks related to spaces, particularly around operators and method calls.

  • Ternary Operator (? :):

    Ruby’s ternary operator is safest written with spaces around both the ? and the :.

    Ruby

    # Ruby
    condition ? true_value : false_value # Correct

    Without those spaces, the lexer may read the pieces as different tokens. ?a is a character literal (since Ruby 1.9, the one-character string "a"), :b is a symbol, and condition? is a legal method name, because Ruby method names may end in ?.

    Ruby

    # Ruby (Incorrect - will raise a syntax error or be parsed differently)
    condition?true_value:false_value

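  • Unary Minus (-) vs. Binary Subtraction (-):

    Suppose an identifier that Ruby has not seen assigned as a local variable is followed by a space and then a - with no space after it. Ruby reads that - as a unary minus starting the first argument of a method call, so foo -1 means foo(-1), while foo - 1 and foo-1 both mean foo() - 1. This happens even if foo takes no arguments, in which case the call fails at run time with an ArgumentError. Ruby can also emit an "ambiguous first argument" warning for this form. The same spacing rule separates puts [1], which passes an array to puts, from puts[1], which indexes into the return value of puts.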

  • Method Calls vs. Local Variables:

    Whether the space matters depends on what the parser knows about the identifier. If foo has been assigned earlier in the same scope, Ruby treats it as a local variable, and foo -1 is ordinary subtraction again. Ruby decides this at parse time, based on whether an assignment to foo appears textually earlier, not on what foo holds at run time. As a result, adding or removing an assignment can change how a later foo -1 parses.

3. C Preprocessor: Macro Expansion and Token Pasting

The C preprocessor operates on the source code before the main compiler sees it. Most whitespace inside a directive is insignificant, but in a few places it decides what the directive means.

  • Macro Definition and Invocation:

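    When defining a function-like macro, there must be no space between the macro name and the opening parenthesis of its parameter list. This rule applies only to the #define itself. At the call site, MY_MACRO (3) and MY_MACRO(3) expand identically.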

    C

    #define MY_MACRO(x) ((x) * (x))   // Function-like: no space before (
    #define ANOTHER_MACRO (x) (x + 1) // Object-like! ANOTHER_MACRO expands to (x) (x + 1)

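    Because of the space, ANOTHER_MACRO is treated as an object-like macro whose replacement text starts at the opening parenthesis, rather than as a function-like macro expecting an argument. Writing ANOTHER_MACRO(3) then expands to (x) (x + 1)(3), which is almost certainly not what was intended.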

  • Token Pasting Operator (##):

    The ## operator in a macro’s replacement list concatenates the tokens on either side of it into a single new token. Whitespace around ## is ignored. Whatever spacing you write, the result contains no space between the pasted tokens.

    C

    #define CONCAT(a, b) a ## b
    CONCAT(foo, bar) // Expands to foobar

    While a ## b is typical, a##b, a ##b, and a## b all behave the same. What ## provides is a way to erase the boundary between two tokens. Writing a b in the replacement list would produce two separate tokens (foo bar), and writing ab would simply be a different identifier, not the two parameters joined together.

These examples illustrate how spaces, often relegated to the role of mere separators, can become active participants in the parsing logic of a language, sometimes leading to surprising and subtle behaviors.