How Strong Is Your Programming Language?

Line-up at the 2013 Europe's Strongest Man competition
Camera: Canon EOS 7D | Date: 29-06-2013 05:31 | Resolution: 5184 x 3456 | ISO: 200 | Exp. bias: -1/3 EV | Exp. Time: 1/160s | Aperture: 13.0 | Focal Length: 70.0mm (~113.4mm)

I write this with slight trepidation as I don’t want to provoke a "religious" discussion. I would appreciate comments focused on the engineering issues I have highlighted.

I’m in the middle of learning some new programming tools and languages, and my observations are coalescing around a metric which I haven’t seen assessed elsewhere. I’m going to call this "strength", as in "steel is strong", defined as the extent to which a programming language and its standard tooling avoid wasted effort and prevent errors. Essentially, "how hard is it to break?". This is not about the "power" or "reach" of a language, or its performance, although typically these correlate quite well with "strength". Neither does it include other considerations such as portability, tool cost or ease of deployment, which might be important in a specific choice. This is about the extent to which avoidable mistakes are actively avoided, thereby promoting developer productivity and low error rates.

I freely acknowledge that most languages have their place, and that it is perfectly possible to write good, solid code with a "weaker" language, as measured by this metric. It’s just harder than it has to be, especially if you are free to choose a stronger one.

I have identified the following factors which contribute to the strength of a language:

1. Explicit variable and type declaration

Together with case sensitivity issues, this is the primary cause of "silly" errors. If I start with a variable called FieldStrength and then accidentally refer to FeildStrength, and this can get through the editing and compile processes and throw a runtime error because I’m trying to use an undefined value then then programming "language" doesn’t deserve the label. In a strong language, this will be immediately questioned at edit time, because each variable must be explicitly defined, with a meaningful and clear type. Named types are better than those assigned by, for example, using multiple different types of brackets in the declaration.

2 Strong typing and early binding

Each variable’s type should be used by the editor to only allow code which invokes valid operations. To maximise the value of this the language and tooling should promote strong, "early bound" types in favour of weaker generic types: VehicleData not object or var. Generic objects and late binding have their place, in specific cases where code must handle incoming values whose type is not known until runtime, but the editor and language standards should then promote the practice of converting these to a strong type at the earliest practical opportunity.

Alongside this, the majority of type conversions should be explicit in code. Those which are always "safe" (e.g. from an integer to a floating point value, or from a strong type to a generic object) may be implicit, but all others should be spelt out in code with the ability to trap errors if they occur.

3. Intelligent case insensitivity

As noted above, this is a primary cause of "silly" errors. The worst case is a language which allows unintentional case errors at edit time and through deployment, and then throws runtime errors when things don’t match. Such a language isn’t worth the name. Best case is a language where the developer can choose meaningful capitalisation for clarity when defining methods and data structures, and the tools automatically correct any minor case issues as the developer references them, but if the items are accessed via a mechanism which cannot be corrected (e.g. via a text string passed from external sources), that’s case insensitive. In this best case the editor and compiler will reject any two definitions with overlapping scope which differ only in case, and require a stronger differentiation.

Somewhere between these extremes a language may be case sensitive but require explicit variable and method declaration and flag any mismatches at edit time. That’s weaker, as it becomes possible to have overlapping identifiers and accidentally invoke the wrong one, but it’s better than nothing.

4. Lack of "cruft", and elimination of "ambiguous cruft"

By "cruft", I mean all those language elements which are not strictly necessary for a human reader or an intelligent compiler/interpreter to unambiguously understand the code’s intent, but which the language’s syntax requires. They increase the programmer’s work, and each extra element introduces another opportunity for errors. Semicolons at the ends of statements, brackets everywhere and multiply repeated type names are good (or should that be bad?) examples. If I forget the semicolon but the statement fits on one line and otherwise makes syntactic sense then then code should work without it, or the tooling should insert it automatically.

However, the worse issue is what I have termed "ambiguous cruft", where it’s relatively easy to make an error in this stuff which takes time to track down and correct. My personal bĂȘte noire is the chain of multiple closing curly brackets at the end of a complex C-like code block or JSON file, where it’s very easy to mis-count and end up with the wrong nesting.  Contrast this with the explicit End XXX statements of VB.Net or name-matched closing tags of XML. Another example is where an identifier may or may not be followed by a pair of empty parentheses, but the two cases have different meanings: another error waiting to occur.

5. Automated dependency checking

Not a lot to say about this one. The compile/deploy stage should not allow through any code without all its dependencies being identified and appropriately handled. It just beggars belief that in 2017 we still have substantial volumes of work in environments which don’t guarantee this.

6. Edit and continue debugging

Single-stepping code is still one of the most powerful ways to check that it actually does what you intend, or to track down more complex errors. What is annoying is when this process indicates the error, but it requires a lengthy stop/edit/recompile/retest cycle to fix a minor problem, or when even a small exception causes the entire debug session to terminate. Best practice, although rare, is "edit and continue" support which allows code to be changed during a debug session. Worst case is where there’s no effective single-step debug support.

 

Some Assessments

Having defined the metric, here’s an attempt to assess some languages I know using it.

It will come as no surprise to those who know me that I give VB.Net a rating of Very Strong. It scores almost 100% on all the factors above, in particular being one of very few languages to express the outlined best practice approach to case sensitivity . Although fans of more "symbolic" languages derived from C may not like the way things are spelled out in words, the number of "tokens" required to achieve things is very low, with minimal "cruft". For example, creating a variable as a new instance of a specific type takes exactly 5 tokens in VB.Net, including explicit scope control if required and with the type name (often the longest token) used once. The same takes at least 6 tokens plus a semicolon in Java or C#, with the type name repeated at least once. As noted above, elements like code block ends are clear and specific removing a common cause of  silly errors.

Is VB.Net perfect? No. For example if I had a free hand I would be tempted to make the declaration of variables for collections or similar automatically create a new instance of the appropriate type rather than requiring explicit initiation, as this is a common source of errors (albeit well flagged by the editor and easily fixed). It allows some implicit type conversions which can cause problems, albeit rarely. However it’s pretty "bomb proof". I acknowledge there may be some cause and effect interplay going on here: it’s my language of choice because I’m sensitive to these issues, but I’m sensitive to these issues because the language I know best does them well and I miss that when working in other contexts.

It’s worth noting that these strengths relate to the language and are not restricted to expensive tools from "Big bad Microsoft". For example the same statements can be made for the excellent VB-based B4X Suite from tiny Israeli software house Anywhere Software, which uses Java as a runtime, executes on almost any platform, and includes remarkable edit and continue features for software which is being developed on PC but running on a mobile device.

I would rate Java and C# slightly lower as Pretty Strong. As fully compiled, strongly typed languages many potential error sources are caught at compile time if not earlier. However, the case-sensitivity and the reliance on additional, arguably redundant "punctuation" are both common sources of errors, as noted above. Tool support is also maybe a notch down: for example while the VB.Net editor can automatically correct minor errors such as the case of an identifier or missing parentheses, the C# editor either can’t do this, or it’s turned off and well hidden. On a positive note, both languages enforce slightly more rigor on type conversions. Score 4.5 out of 6?

Strongly-typed interpreted languages such as Python get a Moderate rating. The big issue is that the combination of implicit variable declaration and case sensitivity allow through far too many "silly" errors which cause runtime failures. "Cruft" is minimal, but the reliance on punctuation variations to distinguish the declaration and use of different collection types can be tricky. The use of indentation levels to distinguish code blocks is clear and reasonably unambiguous, but can be vulnerable to editors invisibly changing whitespace (e.g. converting tabs to spaces). On a positive note the better editors make good use of the strong typing to help the developer navigate and use the class structure. I also like the strong separation of concerns in the Django/Jinja development model, which echoes that of ASP.Net or Java Server Faces. I haven’t yet found an environment which offers edit and continue debugging, or graceful handling of runtime exceptions, but my investigations continue. Score 2.5 out of 6?

Weakly-typed scripting languages such as JavaScript or PHP are Weak, and in my experience highly error prone, offering almost none of the protections of a strong language as outlined above. While I am fully aware that like King Canute, I am powerless to stop the incoming tide of these languages, I would like to hope that maybe a few of those who promote their use might read this article, and take a minute to consider the possible benefits of a stronger choice.

 

Final Thoughts

There’s a lot of fashion in development, but like massive platforms and enormous flares, not all fashions are sensible ones… We need a return to treating development as an engineering discipline, and part of that may be choosing languages and tools which actively help us to avoid mistakes. I hope this concept of a "strength" metric might help promote such thinking.

This entry was posted in Agile & Architecture, Code & Development. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *