martin's blog: January 2009

As far as I'm concerned one of the best ways to learn what a software is doing internally and how it does it, is stepping through the code with a debugger. That allows you to get a feel for the call chain and to inspect the state of objects and variables at runtime. But how're you supposed to debug an external application? With Java applications it becomes quite simple.

To that end the Java platform provides the Java Platform Debugger Architecture. It consists of two interfaces, the native JVM Tools Interface (JVM TI) for the backend and the Java-based Java Debug Interface (JDI) for the front end, and a protocol, the Java Debug Wire Protocol (JDWP) that ties both interfaces together and provides the format of debugging information and requests between both. By means of this architecture, a debugger can now connect to remote Java processes.

On to a concrete example: Recently I've taken quite a liking to Ioke. I always try to understand the internals of the libraries, frameworks etc. that I'm using. In Ioke's case even more so because it's a programming language and I love programming languages, man! ;) Ioke's written in Java. That mean's we can debug it remotely using Java's Debugger Architecture. Even better, you can find many of Ioke's core concepts directly represented in the Java source. Thus, the cognitive transfer from some program written in Ioke to that program's representation in Java becomes a bit easier.

To debug an external Java process that process has to be running and must have been configured for external debugging. To debug Ioke start the IIk with the option -J-agentlib:jdwp= and use the following sub-options transport=dt_socket,address=8000,server=y,suspend=y.

The agentlib:jdwp option is the preferred way on Java 5 VMs and above (it uses the new interfaces introduced in Java 5). For older VMs you have to use the options -Xdebug and -Xrunjdwp. The above means the following: Listen for a socket connection on port 8000 (rather than explicitely connecting to a debugger) and suspend the VM before the main class loads. Once the debugger application connects, it can send a JDWP command to resume the VM.

I'll use Eclipse and its debugger as an example, but I guess NetBeans or IntelliJ are fine as well (though I've not yet tested it). In Eclipse open the Debug Configurations dialog and create a new configuration for a Remote Java Application. Specify the project, the host and the port (must be the same as the one you gave to the VM). Click on Debug and the VM Ioke's running in will resume normally. Type something in the interactive promt and send it to the interpreter. If you've set any breakpoints in the source code, the debugger will halt and you're good to go doing whatever you want to do.

Pretty neat, ey? Maybe you've already known this stuff, but if not ...

(If anybody knows a better and/or simpler way, I'd really like to know about it.)

Ioke is a relatively new addition to the current crop of JVM languages and has been released as version Ioke O a month ago with the next release Ioke S scheduled for today. (Update: seems I'm a bit behind: Ioke S has just been released!). Ioke's designed by Ola Bini, a JRuby core developer, and features strong, dynamic typing, prototype-based object orientation and is a homoiconic general purpose language inspired by Io, Smalltalk, Self, Lisp and Ruby. It's mainly written in Java.

I've been following the evolution of Ioke since the beginning and became quite intrigued by it in the process. The language's main focus is expressiveness rather than performance. It's worth to point out that performance considerations have been completely neglected during the design process in favour of enhancing the expressive power of Ioke. According to Ola, "expressiveness is the ultimate goal of the language". I assume that doesn't mean that performance won't ever be a concern. It just hasn't played a decisive role in the design so far. Actually, Ola's made the point quite early on that Ioke's not your usual language:

"Ioke is a language I designed for myself. That means that some design choices might not match what you would expect. Ioke is a language that makes sense to me. If it makes sense to you too, great! But be warned that I have made several decisions that doesn't necessarily match what most general purpose languages choose to do."

If you want to know more about Ioke's features, I recommend taking a look at the Programming Guide as well as Ola's aforementioned blog posts. Therefore I don't intend to go into too much detail, but I think there are some noteworthy things to mention. Note, however, that I'm by far no expert, so please correct me if I'm talking nonsense.

Ioke has a powerful macro system similar to languages of the Lisp family. Macros are a kind of code generation facility that allow you e.g. to extend Ioke's syntax (comprehensions are implemented as macros for example). Mind you, we're not talking about mere string substitution like #define in C.

During the transition from Ioke O to Ioke S a couple of new macro types were introduced that allow you to write even more powerful abstractions. That recently led Ola to borrow a term coined by Steve Yegge to describe Ioke as a folding language, that is, a language in which you can write code that writes code that writes code etc. If that sounds as confusing to you as it did for me when I first heard that term, just just click on above link for further clarification.
Ioke features a condition system influenced by Common Lisp that allows fine-grained control over any conditions (not necessarily constrained to error conditions) that might arise during execution of a program. It serves a similar purpose as Java's exception handling mechanism, but is more flexible in that it allows e.g. invoking restarts in the dynamic context of the place the condition happened to provide a way to get back to valid state. The debugger in Ioke's REPL, called IIk (interactive Ioke), is implemented using the condition system. It interactively provides various options to handle certain conditions. For example, a method in Ioke signals Condition Error Invocation TooFewArguments if it's not been given enough arguments. On that condition the debugger allows for giving additional arguments to the method and subsequently restarts the block of code that caused the condition. Nice ;)
As of the latest version, Ioke supports a bit of aspect-oriented programming in that it's possible to add before, around and after advice to cells (see below for an explanation of this term) of an object.
Just like Smalltalk, Ioke is all about sending messages to objects. There are no keywords or statements. Everything is an expression that is made up of a chain of messages that return something which in turn is the receiver of the next message. Inspired by Smalltalk and Io, messages are separated by whitespace rather than a period as for example in Java. If you want to know more, Ola's written a detailed blog post about his early decisions and reasoning regarding the syntax of Ioke.
Accompanying the distribution are a couple of tools specifically written for Ioke. First, there's a minimal port of Ruby's BDD framework RSpec called ISpec, which is used exclusively throughout Ioke's test suite. Additionally, a tool called DokGen is used to generate the reference documentation. You can view the current API here.

Well, 'nuff said. To learn more about a language and how it feels you've got to write some code. So I decided to port a Java implementation of an alogrithm to find all two-word anagrams of a given input word from a given dictionary. It turned out to be quite straightforward even though Ola had to help me out on a few occasions.

The algorithm works like this:

Compute a code for each word in the given input dictionary. The code is the product of prime numbers which have been assigned to each letter of the word.
Store the code and the word such that the code maps to a list of words that have the same code, that is, they consist of the same letters. I've used a Multimap for that.
Finally, the Multimap is searched for two codes that multiply to the code that corresponds to the input word. If such two codes can be found, you have a two-word anagram of the input word.

Here's the code (hope it's readable enough; still on the look out for a decent syntax highlighter):


Multimap = Origin mimic
Multimap initialize = method(self content = {})
Multimap cell("[]=") = method(key, value,
 if(content key?(key),
   content[key] << keys =" method(" words =" method(text," index =" method(words," encode =" method(word," asprime =" method(char," primes =" {}("> 2,  "b" => 3,  "c" => 5,  "d" => 7,  "e" => 11,
 "f" => 13, "g" => 17, "h" => 19, "i" => 23, "j" => 29,
 "k" => 31, "l" => 37, "m" => 41, "n" => 43, "o" => 47,
 "p" => 53, "q" => 59, "r" => 61, "s" => 67, "t" => 71,
 "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97,
 "z" => 101) withDefault(1)

anagram = method(word,
 code = encode(word)
 Multimap keys each(x,
   if(code % x == 0,
     y = code / x
     if(Multimap key?(y),
      "#{Multimap[x]} and #{Multimap[y]}" println))))

index(words(FileSystem readFully("wordlist.txt")))
anagram("documenting")

Let's step through some parts of the code. First of all, there're quite a few terms appearing in the following paragraph(s) that denote core concepts of Ioke. It might not be obvious what they mean at first glance. I'll try to explain them very briefly, but for further clarification please refer to the Programming Guide.

The first part of the program defines a new kind of object, a Multimap, that mimics Origin. Origin is the kind (the Ioke term for type) that most objects in Ioke should start from. The term mimic captures the prototype-based part of Ioke. In prototype-based programming there's no distinction between classes and objects themselves. New objects are created by cloning them from existing objects, that is, existing objects sort of serve as prototypes for new objects. Ola's decided to call that mimicking an object. A mimic is basically the parent of an object. Multimap mimics Origin and therefore inherits (like in object-oriented programming) all its cells. Cells represent all data in Ioke (think slots in Self for example) and contain variables, methods, etc.

The Multimap's initialize method creates a cell on the receiving object (that's what the predefined variable self refers to) that contains an empty Dict, which is basically a HashMap. The remaining cells are redefined methods of Dict that just delegate to the cell called content ... composition over inheritance ;).

The remaining methods implement the algorithm described above. Methods in Ioke can be defined in several ways. Let's take the index method as an example for one of those ways:


index = method(words,
 words each(x, Multimap[encode(x)] = x))

This method takes one positional argument and the actual code to execute. In this case the method is given a list of words on which the method each (from the kind Mixins Enumerable) is called. If you're familiar with Ruby, Groovy, etc., you should know that method. For those who don't: each executes the code in the second argument for each element x of the collection. Here the code for each word is computed and placed in the Multimap together with the actual word. As you can see, messages are separated by whitespace, which makes it very natural to read in my opinion.

The cell primes is an example of the definition of a Dict. It uses the method {} that takes a variables number of arguments, in this case Pairs that are defined by the literal Pair syntax =>. primes maps individual characters to prime numbers and is also given a default value of 1 that is returned if a queried key is not contained in the Dict.

The complete program works like this: The text file wordlist.txt (a huge collection of words separated by newline characters) is read in and passed as a Text object to the method words which converts it to lower case and tries to match it against the given pattern. The return value is a list of Text objects that were matched (here: all of them). This list is now indexed as described above: Each word in the list is encoded by means of the message chain word chars reduce(1, code, x, code * asPrime(x))) in the method encode. What it does is basically reducing the collection of characters of the input word to one value: the product of prime numbers that correspond to the particular characters according to the Dict primes. Again, if you're familiar with Ruby etc. that method should not come as something new. reduce is also known as fold or inject, both of which would have been available as well. Now the method anagram tries to find all two-word anagrams of the word documenting by first encoding it and subsequently by trying to find two keys in the Multimap that multiply to exactly that code. Pretty simple and straighforwad, ey? ;)

The program finishes successfully in about a minute (give or take) on my machine. The original Java implementation is a lot faster (it runs in about 0.3 seconds), but is also a lot less concise and more verbose in general. As I've already mentioned, performance has not been a key factor in the design of Ioke so far.

Note that there are a few subtleties in the code above, e.g. the seemingly strange cell("[]=") method definition in Multimap. These are described in a lot more detail in the Programming Guide.

To summarize, I had a lot of fun daring my first steps with Ioke. There are definitely a few things you need to wrap your head around, but I really like the way the code feels and reads. Naturally there's a lot more to it than the above code snippet shows. For instance, I've not used aspects, blocks or any of the more sophisticated features like conditions and macros. The entire package seems pretty well thought out and looks like it has received a lot of brain power and hard work, beginning with the language and the Programming Guide itself, the REPL, the debugger, the ISpec framework and the DokGen tool. Altogether there's something to Ioke that may well be made for something bigger in the months and years to come. Finally, for anyone who's thinking about occupying himself/herself more seriously with Ioke: The small community that's been forming recently is a very friendly and helpful crowd. Ola's also very open to outside contributions and actually asks people to help out on the project. So if you want to play a part in the evolution of a cool new language on the JVM ... there's nothing to stop you ;)

Some final bits of information ... really:

The Ioke source code is hosted at GitHub. Besides the Java source it contains variuos examples like a parser, a spelling corrector, the game of life, etc. written in Ioke. Also check out the implementation of the language's built-in functionality which is full of all the sophisticated stuff.
Mailing lists and a bug tracker are available at Project Kenai.
There's a Google Group called ioke-language.
An IRC channel called #ioke was set up on irc.freenode.net.
Sam Aaron, one of the contributors, set up an Ioke reddit.
Martin Elwin, also one of the contributors, has a couple of nice blog posts about setting up a development environment for Ioke under Linux.
The Ioke distribution contains an Emacs mode and a TextMate bundle for syntax highlighting. Those who're primarily working on Windows may take a look at the commercial e Text Editor, which supports TextMate bundles or, as Felipe Rodrigues suggested, the SciTE editor. The latter comes with syntax highlighting for Lisp, which might be good enough. It's also possible to define your own settings if you know what you're doing.

Well ... go try it out!

Update: Sam Aaron had a bit of a play with my implementation of a two-word anagram finder and even wrote some accompanying specs. Whereas my solution is mainly procedural in style, his is entirely object-oriented and encapsulates the algorithm. Plus, you get to see ISpec in action. Very cool!

martin's blog

Saturday, January 31, 2009

Nomisma jacta est

Sunday, January 25, 2009

Debugging the Ioke sources

Friday, January 23, 2009

Ioke

About Me

Blog Archive