I’ve been banging away on Clojure for a few days now, and while it would
obviously take months of study and grinding through a big serious real-world
software project to become authoritative, I think that what I’ve learned is
useful enough to share.
[This is part of the Concur.next series.]
1. It’s the Best Lisp Ever
I don’t see how this can be a controversial statement. Issues of
language-design aside, every other Lisp I’ve worked with has been hobbled by
lacklustre libraries and poor integration with the rest of the IT
infrastructure. Running on the Java platform makes those problems go away,
poof!
Let’s assume hypothetically that there are other Lisps where certain design
choices are found to be better than Clojure’s. Well, you can
pile all those design choices up on top of each other and the pile will have
to be very high before they come close to balancing the value of Java’s huge
library repertoire and ease of integration with, well, just about
anything.
2. Being a Lisp Is a Handicap
There are a large number of people who find Lisp code hard to read. I’m
one of them. I’m fully prepared to admit that this is a shortcoming in myself,
not in Lisp, but I think the shortcoming is widely shared.
Perhaps if I’d learned Lisp before plunging into the procedural mainstream,
I wouldn’t have this problem — but it’s not clear the results of MIT’s
decades-long experiment in doing so would support that hypothesis.
I think it’s worse than that. In school, we all learn 3 + 4 = 7 and then
sin(π/2) = 1, and then many of us speak languages with infix verbs. So Lisp
is fighting uphill.
It also may be the case that there’s something about some human minds that
has trouble with thinking about data list-at-a-time rather than item-at-a-time
and thus reacts poorly to constructs like
    (apply merge-with +
      (pmap count-lines
        (partition-all *batch-size*
          (line-seq (reader filename)))))
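For anyone who wants to run that expression, here is a minimal sketch. The post doesn't show count-lines, so the version below is a hypothetical stand-in that tallies identical lines within one batch; the tiny batch size and the in-memory input are assumptions too. The second function is the same pipeline written with the thread-last macro, which some readers find easier to follow top to bottom.

```clojure
;; Hypothetical stand-in: count-lines is not shown in the post.
;; This version tallies how often each line occurs within one batch.
(defn count-lines [batch]
  (frequencies batch))

(def ^:dynamic *batch-size* 2) ; assumed value, tiny for illustration

;; The nested form, as in the text (reading from memory instead of a file).
(defn tally-nested [lines]
  (apply merge-with +
         (pmap count-lines
               (partition-all *batch-size* lines))))

;; The same pipeline with the thread-last macro, read top to bottom.
(defn tally-threaded [lines]
  (->> lines
       (partition-all *batch-size*)
       (pmap count-lines)
       (apply merge-with +)))

(def sample-lines ["a" "b" "a" "c" "a"])
```

Both functions split the input into batches, count each batch in parallel, and merge the per-batch maps by adding the counts.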
I think I really totally understand the value of being homoiconic, and
the awesome power of macros, and the notion of the reader. I want to like
Lisp; but I think readability is an insanely important characteristic in
programming systems.
Practically speaking, this means that it’d be hard for me to go out there on
Sun’s (or Oracle’s) behalf and tell them that the way to take the best
advantage of modern many-core hardware is to start with S-Expressions before
breakfast.
3. Clojure’s Concurrency Features Are Awesome
They do what they say they’re going to do, they require amazingly little
ceremony, and, near as I can tell, their design mostly frees you from having
to worry about deadlocks and race conditions.
Rich Hickey has planted a flag on high ground, and from here on in I think
anyone who wants to make any strong claims about doing concurrency had better
explain clearly how their primitives are distinguished from, or better than,
Clojure’s.
4. Agents Are Better Than Refs or Atoms
I’m using these terms in a Clojure-specific way: Specifically, I mean
agents, refs, and atoms. Agents are not actors, nor are they
processes in either the Operating-System or Erlang senses. I’m not actually
sure how big a difference that makes; my suspicion is that programmers
probably think about using all three in about the same way, and that’s OK.
Anyhow, agents solve concurrency problems in the simplest possible way: By
removing concurrency. Send functions to an agent and they’ll get executed one
at a time (in the order sent if they all come from one thread, with no
ordering promised between different threads), each taking the agent’s current
value as its first argument and replacing that value with its output.
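A minimal sketch of that behavior, assuming nothing beyond clojure.core; the name `total` is made up for the illustration.

```clojure
;; An agent wrapping an integer; the name `total` is hypothetical.
(def total (agent 0))

;; Each send queues a function; it is later called as (f current-value & args).
(send total + 5)      ; eventually runs (+ 0 5)
(send total * 2)      ; eventually runs (* 5 2), after the first action

;; Block until the actions sent so far from this thread have run.
(await total)

(def result @total)   ; deref reads the agent's current value
```

Because both actions are sent from the same thread, they run in the order sent, so `result` ends up as 10.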
Here is an example. I have a map (i.e. hash table) called
so-far in which the keys are strings and the values are integers
counting how many times each string has been encountered. If I use
refs to protect both the hash table and the counters, I get code
like this:
     1  (defn new-counter [so-far target]
     2    (dosync
     3      (if-let [c (@so-far target)]
     4        c
     5        (let [counter (ref 0)]
     6          (ref-set so-far (assoc @so-far target counter))
     7          counter))))
     8
     9  (defn record [target so-far]
    10    (if-let [counter (@so-far target)]
    11      (incr counter)
    12      (incr (new-counter so-far target))))
Let’s start with the record function on Line 9. The
if-let looks up the target in the hash, ignoring concurrency
issues with @, and uses incr (a helper not shown here) to bump the counter, if
there’s one there. If there isn’t, it calls new-counter to make
one.
Lines 3 and 4, in new-counter, are where it gets interesting.
Since everything’s running concurrently, we can’t just go ahead and bash a new
counter into the so-far hash table, because somebody might have
come along and done that already, recorded a few values even, so we’re at risk
of throwing away data. So after we’ve locked things down with
dosync, we check once again to see if the counter is there and if
so, just return it. Otherwise we create the new counter, load it into the
hash, and return it.
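Here is a runnable sketch of the ref-based approach, for experimenting. Two things in it are assumptions rather than the code described above: `incr` is a guess at the unshown helper (it bumps a ref inside its own transaction), and `alter` with `assoc` stands in for the `ref-set` call, which does the same job. The `table` and `hammer` names are hypothetical, there only to exercise the code from several threads.

```clojure
;; Assumption: incr is not shown in the post; this guess bumps a ref
;; inside its own transaction.
(defn incr [counter]
  (dosync (alter counter inc)))

(defn new-counter [so-far target]
  (dosync
    (if-let [c (@so-far target)]   ; re-check inside the transaction
      c
      (let [counter (ref 0)]
        (alter so-far assoc target counter)
        counter))))

(defn record [target so-far]
  (if-let [counter (@so-far target)]
    (incr counter)
    (incr (new-counter so-far target))))

;; Hypothetical harness: hit one key from several threads, then read the count.
(def table (ref {}))

(defn hammer [n-threads n-each]
  (let [workers (doall (for [_ (range n-threads)]
                         (future (dotimes [_ n-each] (record "x" table)))))]
    (doseq [w workers] @w)         ; wait for every worker to finish
    @(@table "x")))
```

If the second look-up inside `dosync` were dropped, two threads could each install a fresh counter for the same key and one thread's increments would be lost.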
On the other hand, consider the agent-based approach; once
again we have a hash table called so-far, but protected by an
agent. If the code wants to increment the value for some target,
it says
(send so-far add target)
This will eventually call the add function with the hash table
(not a reference or anything, the actual table) as the first argument, and
target as the second. Here’s add:
    (defn add [so-far target]
      (if-let [count (so-far target)]
        (assoc so-far target (inc count))
        (assoc so-far target 1)))
Considerably simpler, and nothing (concurrency-wise) can go wrong.
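A self-contained sketch of the agent version, for comparison. The `counts` agent and the `count-all` driver are hypothetical names added for the illustration, and `add-compact` is an alternative spelling of `add` using `update` and `fnil` from clojure.core (`update` needs a reasonably recent Clojure), not something from the original code.

```clojure
(defn add [so-far target]
  (if-let [n (so-far target)]
    (assoc so-far target (inc n))
    (assoc so-far target 1)))

;; An equivalent, more compact form: fnil supplies 0 when the key is missing.
(defn add-compact [so-far target]
  (update so-far target (fnil inc 0)))

(def counts (agent {}))   ; hypothetical name for the agent-protected table

(defn count-all [targets]
  (doseq [t targets]
    (send counts add t))
  (await counts)          ; wait for the queued actions before reading
  @counts)
```

Note that `add` is an ordinary function from map to map; all the concurrency handling lives in `send`.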
I do have one nit with agents. Most of my code was infrastructure; a
module that reads lines out of a file and passes them one at a time to a
user-provided function. At one point, I made some of the code that fixes up
the lines that span I/O-block boundaries agent-based, because it was simpler.
Unfortunately that code also calls the user-provided function and when one of
those also tried to send work off to an agent, everything blew up because you
can’t have a send inside a send.
Actually, I think my nit is more general; in an ideal world, concurrency
primitives would all be orthogonal and friction-free. But anyhow it’s a nit,
not an architectural black hole, I think.
5. Clojure Concurrency Does Buy Real-World Performance
The Wide Finder runs I was using to test were processing 45G of data in a
way that turned out to be CPU-limited in Clojure (I think due to
inefficiencies in Java’s bytes-on-disk-to-String-objects pipeline, but I’m not
sure). So making this run fast on a high-core-count/low-clock-rate processor
was actually a pretty useful benchmark.
The single most important result: Clojure’s concurrency tools reduced
the elapsed run-time by a factor of four on an eight-core system, with a very
moderate amount of easy-to-read (for Lisp) code.
6. Performance is Wonky But It Doesn’t Matter
Some more results:
The amount of extra CPU burned to achieve the 4× speedup was remarkably
high, more than doubling the CPU of the whole job.
The costs of concurrency, as functions of whether you use refs, or
map/reduce, or agents, and also of block-size and thread-count and so on, are
wildly variable and exhibit no obvious pattern.
Well, agents did seem to be quite a bit more expensive than refs. But
refs were pretty cheap; a low-concurrency map/reduce approach was not
dramatically slower than doing the Simplest Thing That Could Possibly Work
with refs.
These results are irrelevant. Remember, this is Clojure
1.0 we’re working with. If we determine that the throughput
of the agent handlers is unacceptable, or that the STM-based infrastructure is
consuming excessive CPU overhead, I’m quite confident that can be fixed. For
example, we could lock Rich Hickey in a basement and put him on a
tofu-and-lettuce diet.
7. The Implementation Is Good
I pushed Clojure hard enough to have a couple of subtle code bugs blow out
the whole JVM, which takes considerable blowing-out on a Sun T2000.
But the bugs were mine, not Clojure’s. In the course of quite a few
days pounding away at this thing with big data and tons of concurrency, I
only observed one bug that I’m pretty sure is in Clojure, and then I couldn’t
reproduce it.
Also, I never observed code in Clojure running significantly slower than
the equivalent code in Java.
So if I’m wrong and there’s scope for a Lisp to take hold in the
mainstream, Clojure would really be a good Lisp to bet on.
8. The Documentation Is OK
The current sources are Stuart Halloway’s Programming Clojure, Mark
Volkmann’s Clojure - Functional Programming for the JVM, and of course the
online API reference.
I used the book most, and while it’s well-written and accurate,
it’s either missing some coverage or a little out of date, as I discovered
whenever I published code and helpful commenters pointed out all the newer
and better functions that I could have used. I also found the apps that the
tutorial examples are built around less than compelling.
Also, you can look through the source code, which is mostly in Clojure, and
even for someone like me who finds Lisp hard to read, that’s super-helpful.
But it’s clear that there’s good scope for a “Camel” or “Pitchfork” style book
to come along and grab the high ground.
9. The Community Is Excellent
As I’ve already observed, the Clojure community is terrific; we’ll see how
well that stands the test of time. I suspect I may linger around #clojure
even when I’ve moved on to other things, just because the company’s good.
10. The Tools Aren’t Bad
I used Enclojure and I recommend it; having
it set up and manage my REPL was super-convenient, and it
never introduced any bugs or inconsistencies that I spotted. It’s also very
early on in its life and there are rough spots, but really it’s good stuff.
I gather that rather more people use Emacs and some flavor of SLIME, and
I’m sure I would have been just fine with that too.
11. Tail Optimization Is Still a Red Herring
I wrote admiringly in Tail Call Amputation about
the virtues of Clojure’s recur and loop forms, as
opposed to traditional tail-call optimization. This is clearly a religious
issue, and there’s lots of preaching in the comments to that piece. I read
them all and I followed pointers, and here’s what I think:
Clojure’s loop/recur delivers 80% of the value of
TCO, with greater syntax clarity. Clojure’s trampoline delivers 80% of the
remaining 20%.
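For readers who haven't met the two forms, a small sketch of each. The function names are made up for the illustration; `loop`, `recur`, and `trampoline` are all in clojure.core.

```clojure
;; loop/recur: an explicit jump back to the loop head from tail position,
;; so the stack does not grow.
(defn sum-to [n]
  (loop [i 1, acc 0]
    (if (> i n)
      acc
      (recur (inc i) (+ acc i)))))

;; trampoline: mutually recursive functions return a thunk instead of calling
;; each other directly; trampoline keeps calling until it gets a non-function.
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n)
    true
    #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n)
    false
    #(my-even? (dec n))))
```

`(sum-to 100000)` and `(trampoline my-even? 100000)` both run in constant stack space, where the naive mutually recursive version would risk a stack overflow at that depth.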
Near as I can tell, that leaves state-machine implementation as the big
outstanding case that you really need TCO for. I’ve done a ton of
state-machine work in my career, and while I recognize that you could
implement them with a bunch of trampolining tail-called routines, I’ve never
understood why that’s better than expressing them in some sort of (usually
sparse) array.
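To make the table-driven alternative concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the toy problem (recognizing an optionally signed integer), the state names, and the choice of a nested map as the "sparse array", where a missing entry means there is no transition.

```clojure
;; Sparse transition table: missing entries mean "no transition" (reject).
(def transitions
  {:start  {:sign :signed, :digit :digits}
   :signed {:digit :digits}
   :digits {:digit :digits}})

(def digit-chars (set "0123456789"))

;; Map each input character to an input class used as a table column.
(defn classify [ch]
  (cond
    (#{\+ \-} ch)    :sign
    (digit-chars ch) :digit
    :else            :other))

;; Drive the machine with reduce; stop early on a missing transition.
(defn integer-literal? [s]
  (= :digits
     (reduce (fn [state ch]
               (if-let [nxt (get-in transitions [state (classify ch)])]
                 nxt
                 (reduced :reject)))
             :start
             s)))
```

The whole machine is data plus one small driver loop, so no tail calls between state functions are needed.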
So, my opinion is that post-Clojure, this argument is over. I suspect that
this will convince exactly zero of the TCO fans, probably including Rich
Hickey, and that once again the comments will fill up with people explaining
how the real conclusion is that I don’t actually understand TCO. Oh well.
Thanks!
To Rich and the community for welcoming me and helping. I stuffed my code
fragments into the SVN repository at the Kenai Divide and Conquer
project; they ain’t pretty. If anyone wants to have a whack at the big
dataset, send me a hail and if I think you’re serious I’ll get you an
account.
The quest for the Java of Concurrency continues.