Tuesday, March 3, 2009

For what it's worth: Java & expressiveness

Since Ola Bini started blogging about his very own language-project, Ioke, I've been pretty intrigued by the whole package. As soon as the first version had been released I started playing around with it, digging deeper into its concepts and contributing to the Programming Guide and Ioke's core libraries. One of the language's highest goals is expressiveness. It's been designed from the ground up to push the limits on what you can do with internal DSLs. Some of the ingredients that make that happen are Ioke's incredibly flexible treatment of operators as well as the inclusion of a macro system, which, coming from a non-Lisp background, is the most outstanding one for me. For that reason Ola uses the term folding language nowadays to describe Ioke. Well, but this is not a blog post about Ioke. It just sort of reminded me once again of Java's ineptitude when it comes to creating concise and expressive APIs/DSLs.

Java su... ah, excuse me, ... lacks!

Regarding expressiveness Java is like an iron maiden that constrains you and pierces you with a lot of sharp objects (no pun intended). You usually have to resort to all kinds of compromises and even trickery in a few cases (1). Sure, you can implement fluent APIs for providing more easily readable natural language constructs (e.g. JSR-310, the Date and Time API, goes to great lengths to make the API easy to read and natural to use). It makes sense and I think it's a valuable addition to good API design. But it still lacks in my opinion for numerous reasons.

An important one is that Java's restrictions in identifier naming make it impossible to use more appropriate symbols for certain operations. For example, it prevents you from using operator characters in method names or even just appending a '?' to function names, like in Ruby, which makes boolean functions very easy to read. This dilemma is probably due to the fact that operators, which often play an essential part in DSL design, are more or less considered primitives in the Java language rather than functions as in many (or should I say most?) other, particularly functional, programming languages. Whereas Java's closest cousin C++ at least allows operators to be defined as functions for user-defined classes, Java does not ... you're stuck with the built-in ones. I cringe everytime I see code using Java's BigInteger API. Truely horrible from an aesthetic point of view. Other languages offer considerably more freedom in that they achieve operator overloading (2) by treating operators as functions while additionally providing the syntactic sugar of "hiding" the actual method call. This paves the way for highly specialized and readable DSLs. For example, see Scala's parser combinator library that allows writing BNF-like syntax directly in the source code. Note that existing languages differ from each other in that some (e.g Ioke, Haskell or Smalltalk) allow the definition of entirely new operators, possibly incl. associativity and precedence, whereas others provide just a limited set (e.g. Groovy or Ruby). If Stuart Halloway had his way, operator overloading would be one of the distinguishing features of Java.next.

Alternatively, Java also lacks the syntactic sugar of languages like Groovy, Ruby or Scala to omit the parenthesis on n-ary methods (in Scala's case: n = 1) and use them in infix position such that you can write x or y instead of x.or(y). Such tiny syntactic gimmicks alone greatly improve the readability of code. Inspired by Io, Self and Smalltalk, Ioke generally uses whitespace to separate messages from each other, i.e. your reading of a line of code is not "interrupted" by punctuation marks.

Even if ...

... Java had all of the above, it would still be vastly inferior to other languages which are based on different concepts and paradigms and incorporate appropriate feature sets that make them particularly suited for implementing internal DSLs. Among these are:
  • The dynamic and reflective nature of some languages allows you to observe and modify a program's structure at runtime to shape core libraries to the needs of your DSL, e.g. by adding new methods or by redefining existing ones. Groovy's Categories and ExpandoMetaClass (even on a per-instance basis since version 1.6), Ruby's extensive metaprogramming facilities (see Rails) or Smalltalk's meta object protocol come to mind. Furthermore, concepts like Groovy's methodMissing / propertyMissing and invokeMethod, Ruby's method_missing, Ioke's pass etc. let you create effective DSLs with little effort. Ever tried to create an XML builder in Java? In Groovy it's just a few lines of code.

  • The dynamic-language argument is a bit weak though: Scala's implicit conversions, for example, are accountable to a very sophisticated static type system rather than a dynamic one. Moreover, InfoQ hosts an interesting interview with Lennart Augustsson on DSLs written in Haskell where he emphasizes amongst other things the language's ability to easily define new control constructs, operators and syntax without the need for dynamic or meta programming.

  • At the risk of making some fellow Java developers go mental: Closures facilitate the creation of custom control abstractions that (almost) feel like real language constructs.

  • Finally, Lisps (and now Ioke) raise the bar even higher with the inclusion of macro systems which provide powerful ways to hide "low level" details behind custom layers of abstraction and enable you to easily extend a language's syntax (3).
Conclusion

Personally, I thing that an expressive language should allow you to convey the things that you want to say without requiring a significant translation step by yourself and the users of the API. Once again, for example, Java's BigInteger API leaves a lot to be desired in this respect. I do realize that talking about what constructs and features contribute to the expressiveness of a programming language is highly subjective and often lies in the eye of the beholder (4). The power and structure of DSLs obviously vary from situation to situation and the amount of abstraction you need depends on specific requirements. Sometimes all you want is an API that more or less reads natural to your stakeholders. Yet at other times you also need custom general-purpose constructs like control structures, operators, etc. For those cases Java is not a good match. Even though there are numerous features that make other languages vastly superior to Java, things like operator overloading, extended character sets for identifiers or even just some syntactic sugar here or there are fundamental in my opinion and would kick it up a notch for internal DSLs written in Java. And there might be light at the end of the tunnel: Because of the increasing importance of the JVM as a language host, the recently established Project Coin could include a proposal that would allow Java to call into languages with different naming restrictions. How that would be achieved I don't know, but I'm certainly following the mailing lists should such a proposal show up ...

----------

(1) Adrian Kuhn shows how to use Roman numerals in your Java code by bending JSR-269, the Pluggable Annotation Processing API, to rewrite Java's AST. However, there're a few caveats: It only works with Java 6 and only with Sun's Java compiler because he uses internal APIs. Interesting, but shows the complexity that you're about to deal with should you choose that path. If you're interested you might also take a look at the The Hacker's Guide to Javac.

(2) Some have argued that the term operator overloading does not apply to languages that treat operators as functions. Rather, operators are syntactic sugar because they translate to an equivalent function call form.

(3) Paul Graham once said in one of his Lisp essays that "In Lisp, you don't just write your program down toward the language, you also build the language up toward your program."

(4) For those interested: Matthias Felleisen presents a formal notion of expressiveness in his paper called On the Expressive Power of Programming Languages (admittedly I've not read it yet in its entirety).

Sunday, March 1, 2009

The first days in the life of Project Coin

On February 27th the Compiler Group-sponsored OpenJDK project Project Coin has finally gone live. Proposals for small language changes may be submitted to the dev mailing list until March 30th 2009 after which a subset will be included in a JSR draft. In the two days since inception of the project there have been posted quite a few proposals to the list.

Neal Gafter has been very active so far and proposed no less than three language changes:
  • Block expressions allow a series of statements to be written inside a parenthesized expression. This avoids the need for helper functions that are used only once or the introduction of temporary variables.
  • Improved exception handling includes catching multiple exception types as well as improved checking for rethrown exceptions.
  • Improved wildcard syntax introduces syntactic sugar for writing ? extends T as out T (covariant) and ? super T as in T (contravariant). Both in and out would be considered as context-sensitive keywords, i.e. they still can be used as identifiers elsewhere. The syntax looks an awful lot like the upcoming variance feature of C# 4.0. However, C#'s designers apparently decided on declaration-site variance similar to Scala as opposed to Java's use-site variance.
Josh Bloch submitted a proposal for Automatic Resource Management. People who closely followed the discussion around closures in Java and the various proposals that were brought forward should recognize this one (however, I don't know if there are significant differences to Josh's older proposals). The posting of the proposal prompted an interesting discussion with Neal Gafter who (obviously?) seems not very much in favour of this feature or at least does not agree with Josh on its specification.

Jeremy Manson of Google proposed Improved Type Inference for Generic Instance Creation that would allow you to avoid the explicit declaration of parameterized types in class instance creation expressions even though the parameterized type of the constructor is obvious from the context. Neal Gafter also chimed in on this one and pointed out some of what he believes to be omissions or shortcomings of the specification. Apparently he and Joe Darcy, the initiator of Project Coin, are working on a similar proposal that is based on a different implementation strategy.

Additionally, Joe Darcy proposed the use of Strings in switch-statements, Ruslan Shevchenko submitted a proposal for multi-line strings and Adrian Kuhn suggested the use of the default keyword for default (i.e. package-private) visibility.

Altogether a very promising start in my opinion, though I don't necessarily want to see each and every one of those proposals in the language. For example, I've not yet made up my mind about Neal's improved wildcard syntax. On a first scan of the proposal I was not sure what to make of the in and out keywords, i.e. what they should convey to the developer. After refreshing my mind a bit by reading up on the theory behind variance as well as Scala's variance annotations and C#'s upcoming support I felt a bit more like I'd groked it. Well, that's maybe worth a blog post on its own.

Anyway, the mailing list is open to anyone for submitting a proposal and/or for chiming in on any of the previously submitted proposals. I'm not sure I will however because seeing the concentrated brain power on the list makes me afraid of looking like an idiot ;) Nevertheless, discussions so far have been very interesting and I'm confident that we'll see a lot more of that in the days and weeks to come. It's good to finally see things in motion ...

Saturday, January 31, 2009

Nomisma jacta est

The brand new OpenJDK project that is hosting small language changes for JDK 7 is called Project Coin (an interesting play-of-words). There's no website and/or mailing list yet, but apparently the plan is to get them up and running sometime in February. People are encouraged to propose language changes by formally describing them in a detailed propsal form that is to be sent to the mailing list once it has been established. Interestingly, proposals that wish to remove existing features from Java will not be considered.

As part of the above announcement Joe Darcy lists a few language changes that he thinks might be worthwhile to include in Java 7. These are not exactly brand new and have been discussed on and off over the past view years. Forks of javac implementing some of those features are hosted on Kijaro. However, they received more attention again recently as it became clear that "language-shattering" proposals like closures, first-class properties, reified generics, operator overloading, etc. won't be considered. Thanks to Stephen Colebourne's (and others of course) tireless endeavour the most recent survey results regarding the popularity of those changes can be viewed on his blog. (However, as I've said before, I think that those polls are probably a little too biased because conference attendees are hardly a representative fraction of the entire community of Java developers). Well, Joe probably doesn't speak for Sun in general, but I think chances are that the majority of those changes he mentioned are going to be part of Java 7. I would certainly welcome most of them.

We'll see how it all turns out. I'm pretty excited because it sure looks like that the initial proposal phase is going to be carried out in the public and I'm looking forward to the discussions of the various proposals. Because with the creation of Project Coint we are finally looking at a formal process backed by Sun regarding the evolution of the Java language, these could be very intersting times ...

Sunday, January 25, 2009

Debugging the Ioke sources

As far as I'm concerned one of the best ways to learn what a software is doing internally and how it does it, is stepping through the code with a debugger. That allows you to get a feel for the call chain and to inspect the state of objects and variables at runtime. But how're you supposed to debug an external application? With Java applications it becomes quite simple.

To that end the Java platform provides the Java Platform Debugger Architecture. It consists of two interfaces, the native JVM Tools Interface (JVM TI) for the backend and the Java-based Java Debug Interface (JDI) for the front end, and a protocol, the Java Debug Wire Protocol (JDWP) that ties both interfaces together and provides the format of debugging information and requests between both. By means of this architecture, a debugger can now connect to remote Java processes.

On to a concrete example: Recently I've taken quite a liking to Ioke. I always try to understand the internals of the libraries, frameworks etc. that I'm using. In Ioke's case even more so because it's a programming language and I love programming languages, man! ;) Ioke's written in Java. That mean's we can debug it remotely using Java's Debugger Architecture. Even better, you can find many of Ioke's core concepts directly represented in the Java source. Thus, the cognitive transfer from some program written in Ioke to that program's representation in Java becomes a bit easier.

To debug an external Java process that process has to be running and must have been configured for external debugging. To debug Ioke start the IIk with the option -J-agentlib:jdwp= and use the following sub-options transport=dt_socket,address=8000,server=y,suspend=y.

The agentlib:jdwp option is the preferred way on Java 5 VMs and above (it uses the new interfaces introduced in Java 5). For older VMs you have to use the options -Xdebug and -Xrunjdwp. The above means the following: Listen for a socket connection on port 8000 (rather than explicitely connecting to a debugger) and suspend the VM before the main class loads. Once the debugger application connects, it can send a JDWP command to resume the VM.

I'll use Eclipse and its debugger as an example, but I guess NetBeans or IntelliJ are fine as well (though I've not yet tested it). In Eclipse open the Debug Configurations dialog and create a new configuration for a Remote Java Application. Specify the project, the host and the port (must be the same as the one you gave to the VM). Click on Debug and the VM Ioke's running in will resume normally. Type something in the interactive promt and send it to the interpreter. If you've set any breakpoints in the source code, the debugger will halt and you're good to go doing whatever you want to do.

Pretty neat, ey? Maybe you've already known this stuff, but if not ...

(If anybody knows a better and/or simpler way, I'd really like to know about it.)

Friday, January 23, 2009

Ioke

Ioke is a relatively new addition to the current crop of JVM languages and has been released as version Ioke O a month ago with the next release Ioke S scheduled for today. (Update: seems I'm a bit behind: Ioke S has just been released!). Ioke's designed by Ola Bini, a JRuby core developer, and features strong, dynamic typing, prototype-based object orientation and is a homoiconic general purpose language inspired by Io, Smalltalk, Self, Lisp and Ruby. It's mainly written in Java.

I've been following the evolution of Ioke since the beginning and became quite intrigued by it in the process. The language's main focus is expressiveness rather than performance. It's worth to point out that performance considerations have been completely neglected during the design process in favour of enhancing the expressive power of Ioke. According to Ola, "expressiveness is the ultimate goal of the language". I assume that doesn't mean that performance won't ever be a concern. It just hasn't played a decisive role in the design so far. Actually, Ola's made the point quite early on that Ioke's not your usual language:
"Ioke is a language I designed for myself. That means that some design choices might not match what you would expect. Ioke is a language that makes sense to me. If it makes sense to you too, great! But be warned that I have made several decisions that doesn't necessarily match what most general purpose languages choose to do."
If you want to know more about Ioke's features, I recommend taking a look at the Programming Guide as well as Ola's aforementioned blog posts. Therefore I don't intend to go into too much detail, but I think there are some noteworthy things to mention. Note, however, that I'm by far no expert, so please correct me if I'm talking nonsense.
  • Ioke has a powerful macro system similar to languages of the Lisp family. Macros are a kind of code generation facility that allow you e.g. to extend Ioke's syntax (comprehensions are implemented as macros for example). Mind you, we're not talking about mere string substitution like #define in C.

    During the transition from Ioke O to Ioke S a couple of new macro types were introduced that allow you to write even more powerful abstractions. That recently led Ola to borrow a term coined by Steve Yegge to describe Ioke as a folding language, that is, a language in which you can write code that writes code that writes code etc. If that sounds as confusing to you as it did for me when I first heard that term, just just click on above link for further clarification.

  • Ioke features a condition system influenced by Common Lisp that allows fine-grained control over any conditions (not necessarily constrained to error conditions) that might arise during execution of a program. It serves a similar purpose as Java's exception handling mechanism, but is more flexible in that it allows e.g. invoking restarts in the dynamic context of the place the condition happened to provide a way to get back to valid state. The debugger in Ioke's REPL, called IIk (interactive Ioke), is implemented using the condition system. It interactively provides various options to handle certain conditions. For example, a method in Ioke signals Condition Error Invocation TooFewArguments if it's not been given enough arguments. On that condition the debugger allows for giving additional arguments to the method and subsequently restarts the block of code that caused the condition. Nice ;)

  • As of the latest version, Ioke supports a bit of aspect-oriented programming in that it's possible to add before, around and after advice to cells (see below for an explanation of this term) of an object.

  • Just like Smalltalk, Ioke is all about sending messages to objects. There are no keywords or statements. Everything is an expression that is made up of a chain of messages that return something which in turn is the receiver of the next message. Inspired by Smalltalk and Io, messages are separated by whitespace rather than a period as for example in Java. If you want to know more, Ola's written a detailed blog post about his early decisions and reasoning regarding the syntax of Ioke.

  • Accompanying the distribution are a couple of tools specifically written for Ioke. First, there's a minimal port of Ruby's BDD framework RSpec called ISpec, which is used exclusively throughout Ioke's test suite. Additionally, a tool called DokGen is used to generate the reference documentation. You can view the current API here.
Well, 'nuff said. To learn more about a language and how it feels you've got to write some code. So I decided to port a Java implementation of an alogrithm to find all two-word anagrams of a given input word from a given dictionary. It turned out to be quite straightforward even though Ola had to help me out on a few occasions.

The algorithm works like this:
  • Compute a code for each word in the given input dictionary. The code is the product of prime numbers which have been assigned to each letter of the word.
  • Store the code and the word such that the code maps to a list of words that have the same code, that is, they consist of the same letters. I've used a Multimap for that.
  • Finally, the Multimap is searched for two codes that multiply to the code that corresponds to the input word. If such two codes can be found, you have a two-word anagram of the input word.
Here's the code (hope it's readable enough; still on the look out for a decent syntax highlighter):


Multimap = Origin mimic
Multimap initialize = method(self content = {})
Multimap cell("[]=") = method(key, value,
if(content key?(key),
content[key] << keys =" method(" words =" method(text," index =" method(words," encode =" method(word," asprime =" method(char," primes =" {}("> 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11,
"f" => 13, "g" => 17, "h" => 19, "i" => 23, "j" => 29,
"k" => 31, "l" => 37, "m" => 41, "n" => 43, "o" => 47,
"p" => 53, "q" => 59, "r" => 61, "s" => 67, "t" => 71,
"u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97,
"z" => 101) withDefault(1)

anagram = method(word,
code = encode(word)
Multimap keys each(x,
if(code % x == 0,
y = code / x
if(Multimap key?(y),
"#{Multimap[x]} and #{Multimap[y]}" println))))

index(words(FileSystem readFully("wordlist.txt")))
anagram("documenting")


Let's step through some parts of the code. First of all, there're quite a few terms appearing in the following paragraph(s) that denote core concepts of Ioke. It might not be obvious what they mean at first glance. I'll try to explain them very briefly, but for further clarification please refer to the Programming Guide.

The first part of the program defines a new kind of object, a Multimap, that mimics Origin. Origin is the kind (the Ioke term for type) that most objects in Ioke should start from. The term mimic captures the prototype-based part of Ioke. In prototype-based programming there's no distinction between classes and objects themselves. New objects are created by cloning them from existing objects, that is, existing objects sort of serve as prototypes for new objects. Ola's decided to call that mimicking an object. A mimic is basically the parent of an object. Multimap mimics Origin and therefore inherits (like in object-oriented programming) all its cells. Cells represent all data in Ioke (think slots in Self for example) and contain variables, methods, etc.

The Multimap's initialize method creates a cell on the receiving object (that's what the predefined variable self refers to) that contains an empty Dict, which is basically a HashMap. The remaining cells are redefined methods of Dict that just delegate to the cell called content ... composition over inheritance ;).

The remaining methods implement the algorithm described above. Methods in Ioke can be defined in several ways. Let's take the index method as an example for one of those ways:

index = method(words,
words each(x, Multimap[encode(x)] = x))

This method takes one positional argument and the actual code to execute. In this case the method is given a list of words on which the method each (from the kind Mixins Enumerable) is called. If you're familiar with Ruby, Groovy, etc., you should know that method. For those who don't: each executes the code in the second argument for each element x of the collection. Here the code for each word is computed and placed in the Multimap together with the actual word. As you can see, messages are separated by whitespace, which makes it very natural to read in my opinion.

The cell primes is an example of the definition of a Dict. It uses the method {} that takes a variables number of arguments, in this case Pairs that are defined by the literal Pair syntax =>. primes maps individual characters to prime numbers and is also given a default value of 1 that is returned if a queried key is not contained in the Dict.

The complete program works like this: The text file wordlist.txt (a huge collection of words separated by newline characters) is read in and passed as a Text object to the method words which converts it to lower case and tries to match it against the given pattern. The return value is a list of Text objects that were matched (here: all of them). This list is now indexed as described above: Each word in the list is encoded by means of the message chain word chars reduce(1, code, x, code * asPrime(x))) in the method encode. What it does is basically reducing the collection of characters of the input word to one value: the product of prime numbers that correspond to the particular characters according to the Dict primes. Again, if you're familiar with Ruby etc. that method should not come as something new. reduce is also known as fold or inject, both of which would have been available as well. Now the method anagram tries to find all two-word anagrams of the word documenting by first encoding it and subsequently by trying to find two keys in the Multimap that multiply to exactly that code. Pretty simple and straighforwad, ey? ;)

The program finishes successfully in about a minute (give or take) on my machine. The original Java implementation is a lot faster (it runs in about 0.3 seconds), but is also a lot less concise and more verbose in general. As I've already mentioned, performance has not been a key factor in the design of Ioke so far.

Note that there are a few subtleties in the code above, e.g. the seemingly strange cell("[]=") method definition in Multimap. These are described in a lot more detail in the Programming Guide.

To summarize, I had a lot of fun daring my first steps with Ioke. There are definitely a few things you need to wrap your head around, but I really like the way the code feels and reads. Naturally there's a lot more to it than the above code snippet shows. For instance, I've not used aspects, blocks or any of the more sophisticated features like conditions and macros. The entire package seems pretty well thought out and looks like it has received a lot of brain power and hard work, beginning with the language and the Programming Guide itself, the REPL, the debugger, the ISpec framework and the DokGen tool. Altogether there's something to Ioke that may well be made for something bigger in the months and years to come. Finally, for anyone who's thinking about occupying himself/herself more seriously with Ioke: The small community that's been forming recently is a very friendly and helpful crowd. Ola's also very open to outside contributions and actually asks people to help out on the project. So if you want to play a part in the evolution of a cool new language on the JVM ... there's nothing to stop you ;)

Some final bits of information ... really:
  • The Ioke source code is hosted at GitHub. Besides the Java source it contains variuos examples like a parser, a spelling corrector, the game of life, etc. written in Ioke. Also check out the implementation of the language's built-in functionality which is full of all the sophisticated stuff.
  • Mailing lists and a bug tracker are available at Project Kenai.
  • There's a Google Group called ioke-language.
  • An IRC channel called #ioke was set up on irc.freenode.net.
  • Sam Aaron, one of the contributors, set up an Ioke reddit.
  • Martin Elwin, also one of the contributors, has a couple of nice blog posts about setting up a development environment for Ioke under Linux.
  • The Ioke distribution contains an Emacs mode and a TextMate bundle for syntax highlighting. Those who're primarily working on Windows may take a look at the commercial e Text Editor, which supports TextMate bundles or, as Felipe Rodrigues suggested, the SciTE editor. The latter comes with syntax highlighting for Lisp, which might be good enough. It's also possible to define your own settings if you know what you're doing.
Well ... go try it out!

Update: Sam Aaron had a bit of a play with my implementation of a two-word anagram finder and even wrote some accompanying specs. Whereas my solution is mainly procedural in style, his is entirely object-oriented and encapsulates the algorithm. Plus, you get to see ISpec in action. Very cool!