Tuesday, March 3, 2009

For what it's worth: Java & expressiveness

Since Ola Bini started blogging about his very own language-project, Ioke, I've been pretty intrigued by the whole package. As soon as the first version had been released I started playing around with it, digging deeper into its concepts and contributing to the Programming Guide and Ioke's core libraries. One of the language's highest goals is expressiveness. It's been designed from the ground up to push the limits on what you can do with internal DSLs. Some of the ingredients that make that happen are Ioke's incredibly flexible treatment of operators as well as the inclusion of a macro system, which, coming from a non-Lisp background, is the most outstanding one for me. For that reason Ola uses the term folding language nowadays to describe Ioke. Well, but this is not a blog post about Ioke. It just sort of reminded me once again of Java's ineptitude when it comes to creating concise and expressive APIs/DSLs.

Java su... ah, excuse me, ... lacks!

Regarding expressiveness, Java is like an iron maiden that constrains you and pierces you with a lot of sharp objects (no pun intended). You usually have to resort to all kinds of compromises and even trickery in a few cases (1). Sure, you can implement fluent APIs to provide more readable, natural-language-like constructs (e.g. JSR-310, the Date and Time API, goes to great lengths to make the API easy to read and natural to use). It makes sense and I think it's a valuable addition to good API design. But it still falls short, in my opinion, for numerous reasons.

An important one is that Java's restrictions on identifier naming make it impossible to use more appropriate symbols for certain operations. For example, they prevent you from using operator characters in method names or even just appending a '?' to function names, as in Ruby, which makes boolean functions very easy to read. This dilemma is probably due to the fact that operators, which often play an essential part in DSL design, are more or less considered primitives in the Java language rather than functions as in many (or should I say most?) other, particularly functional, programming languages. Whereas Java's closest cousin C++ at least allows operators to be defined as functions for user-defined classes, Java does not ... you're stuck with the built-in ones. I cringe every time I see code using Java's BigInteger API. Truly horrible from an aesthetic point of view. Other languages offer considerably more freedom in that they achieve operator overloading (2) by treating operators as functions while additionally providing the syntactic sugar of "hiding" the actual method call. This paves the way for highly specialized and readable DSLs. For example, see Scala's parser combinator library, which allows writing BNF-like syntax directly in the source code. Note that existing languages differ from each other in that some (e.g. Ioke, Haskell or Smalltalk) allow the definition of entirely new operators, possibly including associativity and precedence, whereas others provide just a limited set (e.g. Groovy or Ruby). If Stuart Halloway had his way, operator overloading would be one of the distinguishing features of Java.next.
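To make the BigInteger complaint concrete, here is a minimal sketch of a simple arithmetic expression written against java.math.BigInteger. The class name BigIntegerDemo and the method compute are made up for illustration; only add, multiply, subtract and valueOf are part of the real API.

```java
import java.math.BigInteger;

class BigIntegerDemo {
    // Computes (a + b) * c - a. Without operator overloading, every
    // arithmetic operator has to be spelled out as a method call.
    static BigInteger compute(BigInteger a, BigInteger b, BigInteger c) {
        return a.add(b).multiply(c).subtract(a);
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("12345678901234567890");
        BigInteger b = BigInteger.valueOf(10);
        BigInteger c = BigInteger.valueOf(2);
        // In a language with operator overloading this could simply
        // read: (a + b) * c - a
        System.out.println(compute(a, b, c));
    }
}
```

The method chain is evaluated strictly left to right, so the usual precedence rules that make (a + b) * c - a readable have to be reconstructed by hand through the order of the calls.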

In addition, Java lacks the syntactic sugar that languages like Groovy, Ruby or Scala offer for omitting the parentheses on n-ary methods (in Scala's case: n = 1) and using them in infix position, so that you can write x or y instead of x.or(y). Such tiny syntactic gimmicks alone greatly improve the readability of code. Inspired by Io, Self and Smalltalk, Ioke generally uses whitespace to separate messages from each other, i.e. your reading of a line of code is not "interrupted" by punctuation marks.

Even if ...

... Java had all of the above, it would still be vastly inferior to other languages which are based on different concepts and paradigms and incorporate appropriate feature sets that make them particularly suited for implementing internal DSLs. Among these are:
  • The dynamic and reflective nature of some languages allows you to observe and modify a program's structure at runtime to shape core libraries to the needs of your DSL, e.g. by adding new methods or by redefining existing ones. Groovy's Categories and ExpandoMetaClass (even on a per-instance basis since version 1.6), Ruby's extensive metaprogramming facilities (see Rails) or Smalltalk's meta object protocol come to mind. Furthermore, concepts like Groovy's methodMissing / propertyMissing and invokeMethod, Ruby's method_missing, Ioke's pass etc. let you create effective DSLs with little effort. Ever tried to create an XML builder in Java? In Groovy it's just a few lines of code.

  • The dynamic-language argument is a bit weak though: Scala's implicit conversions, for example, are owed to a very sophisticated static type system rather than a dynamic one. Moreover, InfoQ hosts an interesting interview with Lennart Augustsson on DSLs written in Haskell, in which he emphasizes, amongst other things, the language's ability to easily define new control constructs, operators and syntax without the need for dynamic or meta programming.

  • At the risk of making some fellow Java developers go mental: Closures facilitate the creation of custom control abstractions that (almost) feel like real language constructs.

  • Finally, Lisps (and now Ioke) raise the bar even higher with the inclusion of macro systems which provide powerful ways to hide "low level" details behind custom layers of abstraction and enable you to easily extend a language's syntax (3).
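As a rough illustration of the closures point, here is a sketch of a tiny custom control abstraction in Java as it stands today, i.e. without closures. The helper times and the class ControlDemo are hypothetical names invented for this example; Runnable is the only library type involved.

```java
class ControlDemo {
    // Hypothetical control abstraction: runs the given body n times.
    static void times(int n, Runnable body) {
        for (int i = 0; i < n; i++) {
            body.run();
        }
    }

    static int countWithAnonymousClass() {
        // A one-element array is used as a mutable cell, because the
        // anonymous class cannot assign to a captured local variable.
        final int[] counter = {0};
        times(3, new Runnable() {
            public void run() {
                counter[0]++;
            }
        });
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(countWithAnonymousClass());
    }
}
```

For comparison, the Ruby equivalent is 3.times { counter += 1 }. The anonymous inner class and the array workaround are exactly the kind of noise that keeps such abstractions from feeling like real language constructs.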

Conclusion

Personally, I think that an expressive language should allow you to convey the things that you want to say without requiring a significant translation step from you or the users of the API. Once again, for example, Java's BigInteger API leaves a lot to be desired in this respect. I do realize that talking about what constructs and features contribute to the expressiveness of a programming language is highly subjective and often lies in the eye of the beholder (4). The power and structure of DSLs obviously vary from situation to situation, and the amount of abstraction you need depends on specific requirements. Sometimes all you want is an API that more or less reads naturally to your stakeholders. Yet at other times you also need custom general-purpose constructs like control structures, operators, etc. For those cases Java is not a good match. Even though there are numerous features that make other languages vastly superior to Java, things like operator overloading, extended character sets for identifiers or even just some syntactic sugar here or there are fundamental in my opinion and would kick it up a notch for internal DSLs written in Java. And there might be light at the end of the tunnel: Because of the increasing importance of the JVM as a language host, the recently established Project Coin could include a proposal that would allow Java to call into languages with different naming restrictions. How that would be achieved I don't know, but I'm certainly following the mailing lists should such a proposal show up ...

----------

(1) Adrian Kuhn shows how to use Roman numerals in your Java code by bending JSR-269, the Pluggable Annotation Processing API, to rewrite Java's AST. However, there are a few caveats: It only works with Java 6 and only with Sun's Java compiler because he uses internal APIs. Interesting, but it shows the complexity that you're about to deal with should you choose that path. If you're interested you might also take a look at The Hacker's Guide to Javac.

(2) Some have argued that the term operator overloading does not apply to languages that treat operators as functions. Rather, operators are syntactic sugar because they translate to an equivalent function call form.

(3) Paul Graham once said in one of his Lisp essays that "In Lisp, you don't just write your program down toward the language, you also build the language up toward your program."

(4) For those interested: Matthias Felleisen presents a formal notion of expressiveness in his paper called On the Expressive Power of Programming Languages (admittedly I've not read it yet in its entirety).

Sunday, March 1, 2009

The first days in the life of Project Coin

On February 27th, the Compiler Group-sponsored OpenJDK project Project Coin finally went live. Proposals for small language changes may be submitted to the dev mailing list until March 30th, 2009, after which a subset will be included in a JSR draft. In the two days since the project's inception, quite a few proposals have been posted to the list.

Neal Gafter has been very active so far and proposed no fewer than three language changes:
  • Block expressions allow a series of statements to be written inside a parenthesized expression. This avoids the need for helper functions that are used only once or the introduction of temporary variables.
  • Improved exception handling includes catching multiple exception types as well as improved checking for rethrown exceptions.
  • Improved wildcard syntax introduces syntactic sugar for writing ? extends T as out T (covariant) and ? super T as in T (contravariant). Both in and out would be considered as context-sensitive keywords, i.e. they still can be used as identifiers elsewhere. The syntax looks an awful lot like the upcoming variance feature of C# 4.0. However, C#'s designers apparently decided on declaration-site variance similar to Scala as opposed to Java's use-site variance.
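For readers who don't have use-site variance at their fingertips, here is a small sketch of the wildcard syntax that the third proposal wants to sugar. The class VarianceDemo and the methods copy and demo are made up for illustration; the comments showing out and in only restate the mapping described above and are not taken from the proposal text itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VarianceDemo {
    // src only produces values, so it is covariant: "? extends T"
    // (presumably "out T" under the proposed sugar).
    // dst only consumes values, so it is contravariant: "? super T"
    // (presumably "in T" under the proposed sugar).
    static <T> void copy(List<? extends T> src, List<? super T> dst) {
        for (T item : src) {
            dst.add(item);
        }
    }

    static List<Number> demo() {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Number> numbers = new ArrayList<Number>();
        // Compiles only because of the wildcards: a List<Integer> is
        // not a List<Number>, but it is a List<? extends Number>.
        copy(ints, numbers);
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that the variance is declared at each use of List, not once at the declaration of List itself, which is the difference from the declaration-site approach of Scala and C# 4.0.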

Josh Bloch submitted a proposal for Automatic Resource Management. People who closely followed the discussion around closures in Java and the various proposals that were brought forward should recognize this one (however, I don't know if there are significant differences from Josh's older proposals). The posting of the proposal prompted an interesting discussion with Neal Gafter, who (obviously?) does not seem very much in favour of this feature or at least does not agree with Josh on its specification.
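To show the kind of boilerplate this proposal targets, here is a minimal sketch of manual resource handling as it has to be written today. It deliberately does not show the proposed syntax; the class ResourceDemo and the method firstLine are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ResourceDemo {
    // Reads the first line of the given text. The reader has to be
    // closed by hand in a finally block so that it is released on
    // both the normal and the exceptional path.
    static String firstLine(String text) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            return reader.readLine();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine("first\nsecond"));
    }
}
```

With two or more resources the try/finally blocks have to be nested, and forgetting one of them is an easy mistake to make.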

Jeremy Manson of Google proposed Improved Type Inference for Generic Instance Creation that would allow you to avoid the explicit declaration of parameterized types in class instance creation expressions even though the parameterized type of the constructor is obvious from the context. Neal Gafter also chimed in on this one and pointed out some of what he believes to be omissions or shortcomings of the specification. Apparently he and Joe Darcy, the initiator of Project Coin, are working on a similar proposal that is based on a different implementation strategy.
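The redundancy in question looks roughly like the following sketch. The class InferenceDemo and its methods are invented for this example; the static factory at the end is a commonly used workaround that relies on the type inference Java already performs for generic method calls, and is not part of the proposal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InferenceDemo {
    static Map<String, List<Integer>> build() {
        // The type arguments have to be repeated on the right-hand
        // side although they are obvious from the declaration.
        Map<String, List<Integer>> scores = new HashMap<String, List<Integer>>();
        scores.put("alice", new ArrayList<Integer>());
        scores.get("alice").add(42);
        return scores;
    }

    // Hypothetical workaround: K and V are inferred from the
    // assignment context of the call.
    static <K, V> Map<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> inferred = newHashMap();
        inferred.putAll(build());
        System.out.println(inferred);
    }
}
```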

Additionally, Joe Darcy proposed the use of Strings in switch-statements, Ruslan Shevchenko submitted a proposal for multi-line strings and Adrian Kuhn suggested the use of the default keyword for default (i.e. package-private) visibility.

Altogether a very promising start in my opinion, though I don't necessarily want to see each and every one of those proposals in the language. For example, I've not yet made up my mind about Neal's improved wildcard syntax. On a first scan of the proposal I was not sure what to make of the in and out keywords, i.e. what they should convey to the developer. After refreshing my memory a bit by reading up on the theory behind variance as well as Scala's variance annotations and C#'s upcoming support, I felt a bit more like I'd grokked it. Well, that's maybe worth a blog post of its own.

Anyway, the mailing list is open to anyone for submitting a proposal and/or for chiming in on any of the previously submitted proposals. I'm not sure I will, however, because seeing the concentrated brain power on the list makes me afraid of looking like an idiot ;) Nevertheless, discussions so far have been very interesting and I'm confident that we'll see a lot more of that in the days and weeks to come. It's good to finally see things in motion ...