Tuesday, March 3, 2009

For what it's worth: Java & expressiveness

Since Ola Bini started blogging about his very own language project, Ioke, I've been pretty intrigued by the whole package. As soon as the first version had been released I started playing around with it, digging deeper into its concepts and contributing to the Programming Guide and Ioke's core libraries. One of the language's highest goals is expressiveness. It's been designed from the ground up to push the limits on what you can do with internal DSLs. Some of the ingredients that make that happen are Ioke's incredibly flexible treatment of operators as well as the inclusion of a macro system, which, coming from a non-Lisp background, is the most outstanding one for me. For that reason Ola uses the term folding language nowadays to describe Ioke. But this is not a blog post about Ioke. It just reminded me once again of Java's ineptitude when it comes to creating concise and expressive APIs/DSLs.

Java su... ah, excuse me, ... lacks!

Regarding expressiveness, Java is like an iron maiden that constrains you and pierces you with a lot of sharp objects (no pun intended). You usually have to resort to all kinds of compromises and even trickery in a few cases (1). Sure, you can implement fluent APIs that provide more readable, natural-language-like constructs (e.g. JSR-310, the Date and Time API, goes to great lengths to make the API easy to read and natural to use). It makes sense and I think it's a valuable addition to good API design. But in my opinion it still falls short for numerous reasons.

An important one is that Java's restrictions on identifier naming make it impossible to use more appropriate symbols for certain operations. For example, they prevent you from using operator characters in method names or even just appending a '?' to function names, like in Ruby, which makes boolean functions very easy to read. This dilemma is probably due to the fact that operators, which often play an essential part in DSL design, are more or less considered primitives in the Java language rather than functions as in many (or should I say most?) other, particularly functional, programming languages. Whereas Java's closest cousin C++ at least allows operators to be defined as functions for user-defined classes, Java does not ... you're stuck with the built-in ones. I cringe every time I see code using Java's BigInteger API. Truly horrible from an aesthetic point of view. Other languages offer considerably more freedom in that they achieve operator overloading (2) by treating operators as functions while additionally providing the syntactic sugar of "hiding" the actual method call. This paves the way for highly specialized and readable DSLs. For example, see Scala's parser combinator library that allows writing BNF-like syntax directly in the source code. Note that existing languages differ from each other in that some (e.g. Ioke, Haskell or Smalltalk) allow the definition of entirely new operators, possibly including associativity and precedence, whereas others provide just a limited set (e.g. Groovy or Ruby). If Stuart Halloway had his way, operator overloading would be one of the distinguishing features of Java.next.
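To make the BigInteger complaint concrete, here is a minimal sketch that computes the same expression twice: once with BigInteger's method calls and once with the built-in operators on long. The class and method names (BigIntegerVerbosity, combine) are made up for this illustration; only the BigInteger calls themselves come from the JDK.

```java
import java.math.BigInteger;

public class BigIntegerVerbosity {
    // (a + b) * c mod m, spelled out with BigInteger's method-call API.
    public static BigInteger combine(BigInteger a, BigInteger b, BigInteger c, BigInteger m) {
        return a.add(b).multiply(c).mod(m);
    }

    // The same arithmetic on primitives, where the built-in operators apply.
    // (For non-negative operands, % and BigInteger.mod agree.)
    public static long combine(long a, long b, long c, long m) {
        return (a + b) * c % m;
    }

    public static void main(String[] args) {
        BigInteger big = combine(BigInteger.valueOf(2), BigInteger.valueOf(3),
                                 BigInteger.valueOf(4), BigInteger.valueOf(7));
        long small = combine(2L, 3L, 4L, 7L);
        System.out.println(big + " " + small); // prints "6 6"
    }
}
```

With operator overloading, the first method body could read exactly like the second one.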

Java also lacks the syntactic sugar that languages like Groovy, Ruby or Scala offer to omit the parentheses on n-ary methods (in Scala's case: n = 1) and use them in infix position, such that you can write x or y instead of x.or(y). Such tiny syntactic gimmicks alone greatly improve the readability of code. Inspired by Io, Self and Smalltalk, Ioke generally uses whitespace to separate messages from each other, i.e. your reading of a line of code is not "interrupted" by punctuation marks.
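As a small sketch of the x.or(y) form mentioned above, consider a hypothetical Spec type with a fluent or(...) combinator. All names here (FluentOr, Spec, lessThan, greaterThan) are invented for the example and do not belong to any library.

```java
public class FluentOr {
    /** Hypothetical predicate over ints with a fluent or(...) combinator. */
    public static abstract class Spec {
        public abstract boolean matches(int n);

        public Spec or(final Spec other) {
            final Spec self = this;
            return new Spec() {
                public boolean matches(int n) {
                    return self.matches(n) || other.matches(n);
                }
            };
        }
    }

    public static Spec lessThan(final int bound) {
        return new Spec() {
            public boolean matches(int n) { return n < bound; }
        };
    }

    public static Spec greaterThan(final int bound) {
        return new Spec() {
            public boolean matches(int n) { return n > bound; }
        };
    }

    public static void main(String[] args) {
        // Java insists on the dot and the parentheses around the argument.
        Spec outside = lessThan(0).or(greaterThan(10));
        System.out.println(outside.matches(-1)); // prints "true"
        System.out.println(outside.matches(5));  // prints "false"
    }
}
```

In a language with infix method calls the interesting line could shrink to something like lessThan(0) or greaterThan(10), which is about as far as the fluent style gets you in Java.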

Even if ...

... Java had all of the above, it would still be vastly inferior to other languages which are based on different concepts and paradigms and incorporate appropriate feature sets that make them particularly suited for implementing internal DSLs. Among these are:
  • The dynamic and reflective nature of some languages allows you to observe and modify a program's structure at runtime to shape core libraries to the needs of your DSL, e.g. by adding new methods or by redefining existing ones. Groovy's Categories and ExpandoMetaClass (even on a per-instance basis since version 1.6), Ruby's extensive metaprogramming facilities (see Rails) or Smalltalk's meta object protocol come to mind. Furthermore, concepts like Groovy's methodMissing / propertyMissing and invokeMethod, Ruby's method_missing, Ioke's pass etc. let you create effective DSLs with little effort. Ever tried to create an XML builder in Java? In Groovy it's just a few lines of code.

  • The dynamic-language argument is a bit weak though: Scala's implicit conversions, for example, are made possible by a very sophisticated static type system rather than a dynamic one. Moreover, InfoQ hosts an interesting interview with Lennart Augustsson on DSLs written in Haskell where he emphasizes, amongst other things, the language's ability to easily define new control constructs, operators and syntax without the need for dynamic or meta programming.

  • At the risk of making some fellow Java developers go mental: Closures facilitate the creation of custom control abstractions that (almost) feel like real language constructs.

  • Finally, Lisps (and now Ioke) raise the bar even higher with the inclusion of macro systems which provide powerful ways to hide "low level" details behind custom layers of abstraction and enable you to easily extend a language's syntax (3).
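
To illustrate the closures point from the list above, here is a rough sketch of a custom control abstraction in Java as it stands today, where an anonymous inner class has to stand in for the block. The helper withLock and the class name ControlAbstraction are hypothetical; Lock and ReentrantLock are the regular java.util.concurrent classes.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class ControlAbstraction {
    // Hypothetical helper: runs the body while holding the lock and
    // always releases the lock afterwards, even if the body throws.
    public static void withLock(Lock lock, Runnable body) {
        lock.lock();
        try {
            body.run();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        final Lock lock = new ReentrantLock();
        // An inner class cannot reassign a captured local, hence the array.
        final int[] counter = {0};

        // Without closures the "block" is an anonymous inner class.
        withLock(lock, new Runnable() {
            public void run() {
                counter[0]++;
            }
        });

        System.out.println(counter[0]); // prints "1"
    }
}
```

With closures, the call site could look much more like a built-in statement, with the body written as a plain block after withLock(lock) instead of being wrapped in a Runnable.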

Conclusion

Personally, I think that an expressive language should allow you to convey the things that you want to say without requiring a significant translation step by yourself and the users of the API. Once again, for example, Java's BigInteger API leaves a lot to be desired in this respect. I do realize that talking about what constructs and features contribute to the expressiveness of a programming language is highly subjective and often lies in the eye of the beholder (4). The power and structure of DSLs obviously vary from situation to situation and the amount of abstraction you need depends on specific requirements. Sometimes all you want is an API that more or less reads naturally to your stakeholders. Yet at other times you also need custom general-purpose constructs like control structures, operators, etc. For those cases Java is not a good match. Even though there are numerous features that make other languages vastly superior to Java, things like operator overloading, extended character sets for identifiers or even just some syntactic sugar here or there are fundamental in my opinion and would kick it up a notch for internal DSLs written in Java. And there might be light at the end of the tunnel: Because of the increasing importance of the JVM as a language host, the recently established Project Coin could include a proposal that would allow Java to call into languages with different naming restrictions. How that would be achieved I don't know, but I'm certainly following the mailing lists should such a proposal show up ...

----------

(1) Adrian Kuhn shows how to use Roman numerals in your Java code by bending JSR-269, the Pluggable Annotation Processing API, to rewrite Java's AST. However, there are a few caveats: It only works with Java 6 and only with Sun's Java compiler because he uses internal APIs. Interesting, but it shows the complexity that you're about to deal with should you choose that path. If you're interested you might also take a look at The Hacker's Guide to Javac.

(2) Some have argued that the term operator overloading does not apply to languages that treat operators as functions. Rather, operators are syntactic sugar because they translate to an equivalent function call form.

(3) Paul Graham once said in one of his Lisp essays that "In Lisp, you don't just write your program down toward the language, you also build the language up toward your program."

(4) For those interested: Matthias Felleisen presents a formal notion of expressiveness in his paper called On the Expressive Power of Programming Languages (admittedly I've not read it yet in its entirety).
