2019-09-29

Line Breaking

Early programming languages put rigid constraints on the length of a line. If a statement was too long, it had to be broken.

Punch Cards

FORTRAN was designed in the punched-card era. Each line was punched onto its own 80-column card. A card was broken into fields:

Field	From	Thru	Width	Purpose
comment	1	1	1	A `C` punched in column 1 means that the compiler ignores the rest of the card.
label	1	5	5	This is used to indicate the target of `GOTO` and `DO` statements. It is usually blank.
continuation	6	6	1	Any punch in this column means that the statement field of the previous card continues on this card.
statement	7	72	66	Program statements go here.
sequence	73	80	8	The compiler ignores this field. It can be used for card numbering and identification for when your card deck falls on the floor.

Cards had an actual cost, so there was a real economic incentive to minimize blank lines. Line continuations were to be avoided because of the cost of continuation cards and the complexity of managing them, so statements tended to always start in column 7, spaces were not used, parentheses were discouraged, and variable names were kept very short. You wanted to keep a statement short so that it did not accidentally stray into the sequence field, where the characters were ignored.

I adopted the practice of punching 1 in the continuation column of the first continuation card, 2 on the second, and so on. But then I read The Elements of Programming Style, the brilliant book by Brian Kernighan and P. J. Plauger, in which they made this recommendation:

We have used $ as the continuation character, because it is the only standard Fortran character without any other syntactic meaning. It minimizes the chance for confusion, and is likely to cause a visible error if used in the wrong column.

Of course they were absolutely right. It made me angry. I had thought that the under-specified parts of languages were there to permit some form of self-expression. But I was wrong. This was the beginning of my understanding of how to use a programming language correctly. K&P were not perfect. For example, their first sentence contains an unnecessary comma. But their approach was exactly right. It was something that I needed to learn to do. Damn them.
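To make the card fields concrete, here is a minimal sketch that cuts one long statement into 80-column card images. It assumes the field layout in the table above and uses K&P's `$` as the continuation punch. The `punchCards` function and the four-character deck identifier in the sequence field are hypothetical, and the sketch does not enforce any compiler's limit on the number of continuation cards.

```javascript
// Hypothetical helper: split one FORTRAN statement into 80-column card images.
// Columns 1-5: label, 6: continuation, 7-72: statement, 73-80: sequence.
const STATEMENT_WIDTH = 66; // columns 7 through 72

function punchCards(statement, label = "", deckId = "DECK") {
    const cards = [];
    let start = 0;
    do {
        const isFirst = cards.length === 0;
        // Only the first card carries the label; continuations get `$` in
        // column 6, following the Kernighan and Plauger convention.
        const labelField = (isFirst ? label : "").padStart(5, " ");
        const continuationField = isFirst ? " " : "$";
        const chunk = statement.slice(start, start + STATEMENT_WIDTH);
        const statementField = chunk.padEnd(STATEMENT_WIDTH, " ");
        // Sequence field: 4-character deck id plus a 4-digit card number,
        // so a dropped deck can be put back in order.
        const deckField = deckId.slice(0, 4).padEnd(4, " ");
        const numberField = String(cards.length + 1).padStart(4, "0");
        const seq = deckField + numberField;
        cards.push(labelField + continuationField + statementField + seq);
        start += STATEMENT_WIDTH;
    } while (start < statement.length);
    return cards;
}

// A 125-character statement needs one continuation card.
punchCards("X = " + "A + ".repeat(30) + "B", "10").forEach(function (card) {
    console.log(card);
});
```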

Paper Tape

Later languages regarded a program not as a deck of cards, but as a spool of paper tape. The C language imposed no length limit on lines because a program is just a stream of characters. A statement could be as long as a reel of tape. Paper tape was replaced by magnetic disks, and compared to a card, a line is virtually free. We could at last afford whitespace in both dimensions, meaningful variable names, and clarifying parentheses. But there were still constraints on line length. Teletypewriters, and later CRT terminals, shared the 80-column limit of punched cards. Eventually, as display technology improved, that limitation was relaxed. But there was still a human limitation: very long lines of text can be difficult to read, and it is important that our programs be readable. Coincidentally, 80 columns seems to be a good line length for programs, independent of punch cards and teletypes.

Over the years, I have employed many strategies for breaking lines that are longer than 80 characters. I would do what word processors do: put as many characters as possible on a line, and move any symbol that violated the margin to the next line. For a long time I followed the Java recommendation of indenting the continuation by 4 spaces, unless that caused ambiguity, in which case the indentation was 8 spaces. The lack of precision in that rule should have been a warning that it was a bad rule.

JavaScript forced me to be more thoughtful because of a misfeature called Automatic Semicolon Insertion, which can cause the compiler to silently accept errors, much as FORTRAN silently ignored anything that strayed into the sequence field. I would try to break before or after particular punctuators, hoping to increase the likelihood that a copy/paste error would be reported rather than ignored.
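A familiar illustration of the hazard: a line break right after `return` lets Automatic Semicolon Insertion end the statement, so the object literal on the following lines is parsed as a block containing a label, and the function returns `undefined` without any error being reported. Breaking after the opening brace instead keeps the statement intact. The function names are hypothetical.

```javascript
// A line break right after `return` triggers Automatic Semicolon Insertion:
// the statement ends there, and the braces below are parsed as a block
// containing the label `ok`, so nothing is returned.
function breakAfterReturn() {
    return
    {
        ok: true
    };
}

// Breaking after the opening brace keeps the object literal inside the
// `return` statement.
function breakAfterBrace() {
    return {
        ok: true
    };
}

console.log(breakAfterReturn()); // undefined
console.log(breakAfterBrace());  // { ok: true }
```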

I have been experimenting, testing, and refining over the years. My current recommendation is to break only after an opening character (such as `(` left parenthesis, `[` left bracket, or `{` left brace). The text that follows is indented exactly 4 spaces more than the line containing the opening character. Structural breaks are also permitted, allowing every expression in an argument list to be on its own line. The matching closing character is outdented 4 spaces, and the stream goes on.
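The rule is mechanical enough to be stated as a small formatter. This is a minimal sketch, not the author's tool: it assumes an expression is already available as a tree in which a call is an array `[name, ...arguments]` and a leaf is an identifier string, and the `layout` function is hypothetical. Each call is broken after its opening parenthesis, each argument goes on its own line indented 4 more spaces, and the closing parenthesis is outdented back to the call's own level.

```javascript
// Hypothetical formatter for the "break after the opening character" rule.
// A call is an array [name, ...args]; a leaf is an identifier string.
function layout(node, indent = 0) {
    const pad = " ".repeat(indent);
    if (typeof node === "string") {
        return [pad + node];
    }
    const [name, ...args] = node;
    if (args.length === 0) {
        return [pad + name + "()"];
    }
    // Break after the opening parenthesis.
    const lines = [pad + name + "("];
    args.forEach(function (arg, index) {
        // Each argument goes on its own line(s), indented 4 more spaces.
        const argLines = layout(arg, indent + 4);
        if (index < args.length - 1) {
            argLines[argLines.length - 1] += ",";
        }
        lines.push(...argLines);
    });
    // The closing parenthesis is outdented back to the call's level.
    lines.push(pad + ")");
    return lines;
}

const tree = [
    "subtract",
    "dividend",
    ["multiply", ["floor", ["divide", "dividend", "divisor"]], "divisor"]
];
console.log(layout(tree).join("\n"));
```

Applied to the expression in the `modulo` example below, it produces the same outline as the hand-formatted `return` statement.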

I used to write like this:

const modulo = function modulo(dividend, divisor) {
    return subtract(dividend, multiply(floor(divide(dividend, divisor)),
        divisor));
};

Now I write like this:

const modulo = function modulo(
    dividend,
    divisor
) {
    return subtract(
        dividend,
        multiply(
            floor(
                divide(
                    dividend,
                    divisor
                )
            ),
            divisor
        )
    );
};
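The `subtract`, `multiply`, `floor`, and `divide` helpers are not defined in the post. Assuming they are plain arithmetic wrappers (the definitions below are hypothetical), both layouts can be run side by side to confirm that the line breaks change only the presentation, not the program.

```javascript
// Hypothetical helpers: the post uses these names but does not define them.
const floor = Math.floor;
const divide = (a, b) => a / b;
const multiply = (a, b) => a * b;
const subtract = (a, b) => a - b;

// The compact layout.
const moduloBefore = function moduloBefore(dividend, divisor) {
    return subtract(dividend, multiply(floor(divide(dividend, divisor)),
        divisor));
};

// The outline layout: the same expression, broken after opening characters.
const moduloAfter = function moduloAfter(
    dividend,
    divisor
) {
    return subtract(
        dividend,
        multiply(
            floor(
                divide(
                    dividend,
                    divisor
                )
            ),
            divisor
        )
    );
};

console.log(moduloBefore(-7, 3), moduloAfter(-7, 3)); // 2 2
console.log(-7 % 3); // -1
```

With these helpers the function computes a floored modulo, so for a negative dividend it differs from JavaScript's `%` operator.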

I don't break because the statement is too long. I break to inject some clarity into the code. We have difficulty reading nested function invocations, but we are pretty good at reading outlines. So I open it up. This costs more lines, but when we got rid of punch cards, the cost of lines went way down. I am hoping that the next generation of languages provides better support for this, so that the unnecessary comma after dividend can be omitted. A linefeed is not a space. I think the compact form should be retained for short, simple expressions, but I might be wrong about that. I may still be desperately clinging to problematic conventions. This stuff is hard.
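One possible reading of the tentative suggestion to keep the compact form for short, simple expressions is sketched below. The thresholds are assumptions, not rules from the post: a call counts as simple when none of its arguments is itself a call, and it stays compact only if it fits within a hypothetical 40-column limit at its indentation. An expression is assumed to be a tree in which a call is an array `[name, ...arguments]` and a leaf is an identifier string; all function names are hypothetical.

```javascript
// Render a call tree on one line, e.g. "divide(dividend, divisor)".
function compact(node) {
    if (typeof node === "string") {
        return node;
    }
    const [name, ...args] = node;
    return name + "(" + args.map(compact).join(", ") + ")";
}

// Assumed definition of "simple": no argument is itself a call.
function isSimple(node) {
    return node.slice(1).every(function (arg) {
        return typeof arg === "string";
    });
}

// Keep simple calls compact when they fit; otherwise open them up,
// breaking after the opening parenthesis and indenting 4 spaces.
function layoutHybrid(node, indent = 0, maxWidth = 40) {
    const pad = " ".repeat(indent);
    if (typeof node === "string") {
        return [pad + node];
    }
    const flat = compact(node);
    if (isSimple(node) && indent + flat.length <= maxWidth) {
        return [pad + flat];
    }
    const [name, ...args] = node;
    const lines = [pad + name + "("];
    args.forEach(function (arg, index) {
        const argLines = layoutHybrid(arg, indent + 4, maxWidth);
        if (index < args.length - 1) {
            argLines[argLines.length - 1] += ",";
        }
        lines.push(...argLines);
    });
    lines.push(pad + ")");
    return lines;
}

const expression = [
    "subtract",
    "dividend",
    ["multiply", ["floor", ["divide", "dividend", "divisor"]], "divisor"]
];
console.log(layoutHybrid(expression).join("\n"));
```

Under these assumptions only the innermost `divide(dividend, divisor)` stays on one line; everything that contains a nested call is still opened up as an outline.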

Hypertext

Obviously, the paper tape model is poorly suited to programming in the large. We should be programming with hypertext, as imagined by Engelbart and Nelson. Unfortunately, the hypertext system we got is HTML, which is wholly inadequate for expressing the complex structures in software systems. So we are still working with virtual paper tape streams. Our tooling has gotten significantly better, but even the smartest tools cannot do what hypertext could do.