Clojure Gotchas
I’ve been programming in Clojure for well over a year now; originally, I heard about it care of Sam Aaron, an old PhD student of ours who gave a fun lunch time talk; I rarely go to these (although I probably should). Indirectly, Tawny-OWL came out of this one, so it is good that I went.
During the time that I have used Clojure, I have come to know it fairly well, and appreciate many of its finer points; these are not the same as for many people, I realise. For me, the Java integration is simple, effective and very important, as Tawny-OWL is essentially a wrapper over a Java library. Meanwhile, a lot of the nice concurrency features are a matter of indifference to me, again for the same reason.
But like any language there are some problems, or at least things that don’t work for me. On the off-chance it is useful to anyone else, here is my list of Clojure gotchas.
Lazy Lists
This is quite a common one, of course, which hits most Clojure beginners. We write something like:
(def x (map (fn [x] (println "hello") x) (range 1 100)))
and then wonder why nothing prints out. Or, the alternative problem, write something like (range) and find the REPL hangs. The latter is, I think, a poorly performing REPL; infinity might be a more principled point at which to stop than an arbitrarily chosen value, but it’s not useful.
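Both behaviours can be seen in a minimal sketch. It uses a counter instead of println so that the effect can be checked; the names calls and xs exist only for this illustration.

```clojure
;; Count how often the mapping function actually runs.
(def calls (atom 0))

;; map only builds a lazy sequence; the function has not run yet.
(def xs (map (fn [x] (swap! calls inc) x) (range 1 100)))
(def calls-before @calls)

;; dorun walks the whole sequence, forcing every side effect.
(dorun xs)
(def calls-after @calls)

;; An infinite sequence is safe as long as only a bounded prefix is realized.
(def first-five (take 5 (range)))
```

At the REPL, setting *print-length* to a small number should similarly stop an infinite sequence from hanging the printer.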
Of course, once you have got past this point, it’s not so bad, but
laziness can still take you unawares, especially when I was using
Clojure just to drive a JVM library. This subtle bug from tawny.render
which is, essentially, one big recursive descent renderer, demonstrates
the problem. Consider this code:
(concat
 [:fact]
 (form [:fact fact])
 (form [:factnot factnot]))
Looks fine, but I need to pass options and a lookup cache around and had
done this with a number of dynamic vars. The cache, it turns out, would
not have been working for this form (although it was for others), but I
never noticed this. However, the options broke the code more cleanly.
concat is, of course, lazy, and was happening outside the binding form which defines the dynamic vars.
Now, I know dynamic vars and laziness don’t mix. In the end, I have added an additional parameter to all the functions in my renderer using the awesome power of lisp (i.e. I wrote a dodgy search and replace function in Emacs). And the cache now invalidates itself using a better technique than before. But I didn’t want laziness, I just got it by chance. In Clojure, it’s always there, wanted or not. Or, rather, it’s always sometimes there, because Clojure is only partly lazy.
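The interaction between binding and laziness can be reproduced in a few lines. This is only a sketch: *options* and render-item are hypothetical stand-ins, not the real names from tawny.render.

```clojure
;; Hypothetical stand-in for the renderer's options (not tawny.render's real names).
(def ^:dynamic *options* {:terse false})

(defn render-item [x]
  (if (:terse *options*)
    (str x)
    (str "<" x ">")))

;; The lazy sequence leaves the binding form unrealized, so render-item
;; only runs later, when *options* is back at its root value.
(def lazy-result
  (binding [*options* {:terse true}]
    (concat [:fact] (map render-item [1 2]))))

;; doall forces realization while the binding is still in effect.
(def eager-result
  (binding [*options* {:terse true}]
    (doall (concat [:fact] (map render-item [1 2])))))
```

Here lazy-result is rendered with the default options and eager-result with the bound ones. Wrapping every lazy call in doall works, but it is easy to miss one, which is part of why passing an explicit parameter is more robust.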
Lisp-1 vs Lisp-2
Well, this argument is as old as the hills. Clojure is a lisp-1, so it has a single namespace for variables and functions, while Common Lisp and Emacs-Lisp are lisp-2s, so have one namespace for each.
I’ve had fun with single namespaces before: I used to teach Javascript to new programmers and it produces weird and wonderful bugs that can be hard to track down. Still, I am too old and wise for that. If only!
During Tawny-OWL, I found accidental capture of functions produced some strange artifacts. Consider, for example, this code.
(defn my-get [x map]
  (get x map))
Everything works fine here, of course, right up till the point that you get bored of typing map and change it to m:
(defn my-get [x m]
  (get x map))
Now things break in strange ways. map is now the (global) function and not the parameter. There are many ways around this, of course. I could not have done (use 'clojure.core) earlier and just imported the functions I use; except that I did use map elsewhere. I could namespace everything (try and find some examples of Clojure code with namespace qualified or aliased clojure.core functions).
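What makes this hard to spot is that nothing fails loudly. A small sketch of the broken version in use (my-get-fixed is a name added here for comparison):

```clojure
;; The parameter was renamed to m, but the body still says map,
;; which now resolves to the function clojure.core/map.
(defn my-get [x m]
  (get x map))

;; The function object is simply used as a lookup key that is not present,
;; so get returns nil instead of signalling an error.
(def broken-result (my-get {:a 1} :a))

;; The corrected version uses the parameter.
(defn my-get-fixed [x m]
  (get x m))

(def fixed-result (my-get-fixed {:a 1} :a))
```

The broken definition compiles without complaint, and the nil only shows up somewhere downstream.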
In my case, exactly this problem hit me when I renamed parameters called ontology to o. I thought the compiler would pick up my errors but no, because I had an ontology function. This situation is made worse by my next gotcha which is:
Everything is a function
Consider this entirely pointless piece of code which makes lisp post-fix.
(defmacro do-do [x afn]
  `(do ~(afn x)))
We can use this macro like so:
(do-do 1 inc)
Now, if you know only a little about lisp, you might expect this to return 2. If you are more experienced, then you might think that this is a strange thing to do, because the call to (inc 1) happens at macro-expansion time, and why would you want to do that? If you are more experienced still, you will think, well actually inc is not evaluated so it is actually a symbol, and the whole thing is going to crash.
Actually, it returns nil. The reason for this is that lots of things in Clojure are functions that you wouldn’t expect, and symbol is one of these. So, actually, ('inc 1) returns nil, because symbols are functions which look up the occurrence of the symbol in the collection that follows. Since 1 is not a collection, the lookup quietly finds nothing.
Now this has advantages, of course, namely that you can use a symbol to look up a key in a collection. So, for example:
('bob {'bob 1})
Returns 1. Of course, this is nice, but how many times do you actually want to do this? And when you do, would (get {'bob 1} 'bob) really be so hard? I can see the justification for (:bob {:bob 1}) but for symbols I am really not convinced, unless I am missing some other critical advantage.
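The cases discussed above can be put side by side in a short sketch; the var names are only for illustration.

```clojure
;; A symbol called on something that is not a collection finds nothing.
(def on-number ('inc 1))

;; A symbol called on a map looks itself up as a key.
(def on-map ('bob {'bob 1}))

;; Like keywords, symbols accept an optional not-found value.
(def with-default ('inc 1 :missing))

;; The explicit spelling of the same lookup.
(def via-get (get {'bob 1} 'bob))
```

Supplying a not-found value at least makes the accidental case visible, although that does not help inside a macro where the call was never intended.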
Future, what’s a Future
So, you’re working along happily in your single-threaded application, and then you write this:
(def x 1)
(def y (ref 2))
(+ @x y)
Now, in this small example, the error is easy to find; we should have derefed y and not x. And what is the error that we get from this?
ClassCastException java.lang.Long cannot be cast to
java.util.concurrent.Future
clojure.core/deref-future (core.clj:2108).
But I have not used a future. I have never used a future. I do not even
know what a future is (although, I may, of course do so in the future).
The reason for this strange error message can be seen from the code for deref (which the @ reader macro uses). Since integers do not implement IDeref we treat them as a Future, which then causes the cast exception.
(defn deref
  {:added "1.0"
   :static true}
  ([ref] (if (instance? clojure.lang.IDeref ref)
           (.deref ^clojure.lang.IDeref ref)
           (deref-future ref)))
  ([ref timeout-ms timeout-val]
   (if (instance? clojure.lang.IBlockingDeref ref)
     (.deref ^clojure.lang.IBlockingDeref ref timeout-ms timeout-val)
     (deref-future ref timeout-ms timeout-val))))
This one is easy to solve. Deref should check instance? Future on the value if IDeref fails, and crash with a better error message. One instance? check is well worth the effort.
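As a rough sketch of what that check might look like, here is a user-level wrapper. safe-deref is a hypothetical name, not part of clojure.core, and it covers only the single-argument arity.

```clojure
;; Hypothetical helper: same dispatch as deref, plus an explicit Future check
;; and a readable error for everything else.
(defn safe-deref [ref]
  (cond
    (instance? clojure.lang.IDeref ref)
    (.deref ^clojure.lang.IDeref ref)

    (instance? java.util.concurrent.Future ref)
    (.get ^java.util.concurrent.Future ref)

    :else
    (throw (IllegalArgumentException.
            (str "Cannot deref a value of type " (class ref))))))

;; Derefing a ref works as usual.
(def ok (safe-deref (ref 2)))

;; Derefing a plain number now produces a message that names the real problem.
(def message
  (try
    (safe-deref 1)
    (catch IllegalArgumentException e
      (.getMessage e))))
```

The extra instance? check only runs when the value is not an IDeref, so the common path costs nothing more than before.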
Backtick really is for macros only
The backtick notation is found in many lisps, and this includes Clojure. Its primary use is in macros because it lets you build up forms programmatically, but have them look like normal typed-in forms. Compare these two:
(defmacro inc2 [x]
  `(+ ~x 2))
(defmacro inc2 [x]
  (list '+ x 2))
In many lisps, though, the backtick is just a list creation macro that happens to be mostly used for macros. In Clojure, it’s been hard coded for macros. Consider:
(let [x 'john]
  `(~x paul george ringo))
You might expect this to just return a list of four symbols (which it does), but the symbols are not what you might expect.
(john user/paul user/george user/ringo)
The symbols paul, george and ringo get namespace qualified in the return value even though they are not in the original form. Now, of course, there is a good reason for this; it helps to prevent us from accidental capture of symbols. All symbols should be qualified or gensym’d.
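If a plain list of unqualified symbols is what you want, there are two common spellings, sketched here with illustrative var names: unquote a quoted symbol with ~', or skip the backtick and build the list directly.

```clojure
;; Syntax-quote qualifies every bare symbol with the current namespace.
(def qualified
  (let [x 'john]
    `(~x paul george ringo)))

;; ~'sym unquotes a quoted symbol, which leaves it unqualified.
(def unqualified-a
  (let [x 'john]
    `(~x ~'paul ~'george ~'ringo)))

;; Building the list by hand avoids the backtick altogether.
(def unqualified-b
  (let [x 'john]
    (list x 'paul 'george 'ringo)))
```

Both unqualified versions give (john paul george ringo), at the cost of either extra punctuation or losing the template-like appearance.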
But consider this:
(deftype bob []
  java.lang.Runnable
  (run [this]
    (println "Hello")))
Now, I know this is a silly example, because bob is just implementing Runnable, and any function would do this, but Runnable is nice and simple. This is still quite a lot of typing, so, perhaps we should macro this.
(defmacro defrunnable [name body]
  `(deftype ~name []
     java.lang.Runnable
     (run [this]
       ~body)))
Unfortunately, this is actually wrong because the symbols run and this get namespace qualified, so we end up with user/run and user/this. The correct way to achieve this is this:
(defmacro defrunnable [name body]
  `(deftype ~name []
     java.lang.Runnable
     (~'run [~'this]
       ~body)))
Now, this version is anaphoric and introduces this, so perhaps it is not ideal, but run, although it looks like a function, is not one; it’s a lexical symbol that Clojure translates to the method name.
Whitespace
In Clojure , is whitespace. Effectively, it is used to make code pretty but has no meaning other than that. Those coming from other Lisps will sooner or later do something like this:
(defmacro defrunnable [name body]
  `(deftype ,name []
     java.lang.Runnable
     (,'run [,'this]
       ,body)))
This nearly always results in a strange error message somewhere down the line which is not easy to debug. The point is that other lisps use , to mean “unquote”, for which Clojure uses ~. Not really Clojure’s fault this one, I guess. But irritating nonetheless.
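The reader’s treatment of commas is easy to check directly. In this small sketch, the string passed to read-string is just an illustrative fragment of the broken macro.

```clojure
;; Commas between elements make no difference to the value read.
(def same? (= [1, 2, 3] [1 2 3]))

;; A comma in front of a symbol is silently dropped, so ,name reads as
;; the plain symbol name rather than as an unquote.
(def parsed (read-string "(deftype ,name [])"))
```

So the broken macro is read as if the commas were never typed, and name ends up namespace qualified by the backtick rather than replaced by the macro argument.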
Running in Java
One of the most unfortunate things about Clojure is that it’s hosted on the JVM. Of course, this is also the reason that I am using it, so I guess it makes no sense to complain, except when writing an article of “gotchas”. But being hosted on the JVM means Clojure inherits some of the strangeness of the JVM.
While writing Protege-NREPL, I had to struggle with OSGi and Clojure’s dynamic ClassLoader, both of which do sort of the same thing, but sort of differently. It was while getting this to work that I found that Clojure uses the context class loader.
In the end, I found that I needed this code to get anything working:
private final ClassLoader cl = new DynamicClassLoader(this.getClass().getClassLoader());
Thread.currentThread().setContextClassLoader(cl);
No one understands what the context class loader is, nor what it is for. There again, no one understands class loaders, so this is perhaps not a surprise.
Two times
Clojure uses what is effectively a two-pass compilation step. I say effectively because, apparently, it doesn’t, but the practical upshot is that you have to declare things before you use them. This is just a pain.
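The standard escape hatch is declare from clojure.core, which creates an unbound var so that later definitions can refer to it. A minimal sketch, with entry and helper as made-up names:

```clojure
;; Without this line, compiling entry fails because helper cannot be resolved.
(declare helper)

(defn entry [x]
  (helper x))

;; The real definition can now come after its first use.
(defn helper [x]
  (* 2 x))

(def answer (entry 21))
```

This lets a file be ordered top-down, although the declare lines themselves are one more thing to keep in sync.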
A related problem is that Clojure dislikes circular namespace dependencies. With Tawny-OWL, this means that the main namespace is not really in the order that I want it. And it was a big problem for the reasoner namespace. The problem is that the reasoner namespace has to know about the owl.clj namespace; but, also, the reasoner namespace has to know when an ontology is removed (so that any reasoners can be dropped). The obvious solution, which is to have owl.clj call reasoner.clj, doesn’t work because we now have a circular dependency.
In the end, I solved this by implementing a hook system like Emacs. Now owl.clj just runs a hook. Probably, I should reimplement this directly with watches, but they were alpha at the time.
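A hook system of this kind needs very little code. The following is only a sketch of the idea, written in a single namespace for brevity; add-hook, run-hook, remove-ontology-hook and the reasoners map are hypothetical names, not necessarily what Tawny-OWL uses.

```clojure
;; --- would live in the core namespace (the owl.clj side) ---
;; A hook is just an atom holding a vector of functions.
(def remove-ontology-hook (atom []))

(defn add-hook [hook f]
  (swap! hook conj f)
  nil)

(defn run-hook [hook & args]
  (doseq [f @hook]
    (apply f args)))

(defn remove-ontology [o]
  ;; ...the actual removal would happen here...
  (run-hook remove-ontology-hook o))

;; --- would live in the reasoner namespace, which requires the core one ---
(def reasoners (atom {:pizza :some-reasoner}))

;; The reasoner side registers interest; the core side never names it.
(add-hook remove-ontology-hook
          (fn [o] (swap! reasoners dissoc o)))
```

The dependency now points one way only: the core namespace knows about the hook, and the reasoner namespace knows about the core namespace.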
Goodbye Cons
One of the big wins for Clojure is that it is built over abstractions, so the cons cell, which is the core of most lisps, is gone. Instead of this, we have ISeq, which is an interface and looks like this:
Object first();
ISeq next();
ISeq more();
ISeq cons(Object o);
The problem is that it really does look like this; I mean, this is a cut-and-paste from the code. Aside from these method declarations, that’s it. Nothing at all in the way of documentation.
Worse, the entire API for Clojure consists of two classes, with the rest being considered “implementation detail”.
Strictly, therefore, Clojure is built over abstractions, but users of Clojure have no access to extend these abstractions themselves, unless they use implementation detail. Which, of course, they do; to access the heart of the language you have to. Given this reality, some documentation would be nice!
Conclusions
Clojure is a nice language, but in some parts it is still a little immature; some of these gotchas will disappear in time. The error message about Futures is trivial to fix, for instance. Some of them can already be avoided with libraries: for example, the backtick issue can be avoided using an alternative implementation. Others will, I think, stick. Symbols will remain functions, I suspect. The last issue, that of a public API, must be fixed if Clojure is to mature.
One gotcha I don’t mention is the lack of a type-system. There have been many times when programming in Clojure that I have created a bug that a type-system would have picked up instantly. This must, however, be set against those times when you stare at the screen in depression trying to work out why a perfectly innocuous piece of code will not compile. In the end, it’s often easier to debug running code than it is to fix a type error. Both forms of problem are something you learn to live with, depending on your choice of language.