Entrepreneur. Creator of Groovy++ Alex is a DZone Zone Leader and has posted 31 posts at DZone. You can read more from them at their website. View Full User Profile

How come that Groovy++ overperform Java?

01.12.2010
| 31018 views |
  • submit to reddit

This article continues my writings on statically typed groovy. More information on motivations and goals of the project can be found in my previous articles published on DZone.

To make a long story short "statically typed Groovy" (aka Static Groovy, aka Groovy Booster, aka Groovy++) is an atempt to combine performace and compile-time checks with the expressiveness of the Groovy programming language and to develop high-performant runtime libraries utilizing this combination.

I started the project as a research/hobby, but due to huge interest in the community my company MB Technologies decided to open source it and sponsor development.

Let me start with a short update of what happened since the last post:

  1. Public preview of statically typed Groovy is available now at Google Code
  2. There is discussion group at Google Groups
  3. The project is not open-sourced yet but announced to be open-sourced soon

Now let's go to our main subject.

We will deal with some non-trivial benchmarks and provide several different implementions. The purpose of our exercise will be to see how the power of expressive langauge and statically typed compilations play together.

The benchmark under investigation is counting word frequency of over 20000 text files of 4-20K each. The benchmark was initiated in blog posts of Zach Cox and continued by Lau B. Jensen.

All code below is available now at Google Code. 

Let us start with a pure Groovy solution. I want to use the opportunity and express my gratitude to Paul King, who brought this story to Groovy++ discussion group, provided pure Groovy implementation below and helped a lot with benchmarking.

package org.mbte.groovypp.examples.wordcount

for (i in 0..<10) {
t1 = System.currentTimeMillis()
counts = [:]
new File("./20_newsgroups").eachFileRecurse{ f ->
if (!f.isDirectory() && !f.path.contains(".svn")) {
f.text.toLowerCase().eachMatch(/\w+/) { w ->
def c = counts[w] ?: 0
counts[w] = c + 1 } } }
new File("counts-descreasing-groovy").withWriter { out ->
counts.sort { a, b -> b.value <=> a.value }.each { k, v -> out << "$k\t$v\n" } }
new File("counts-alphabetical-groovy").withWriter { out ->
counts.sort { it.key }.each { k, v -> out << "$k\t$v\n" } }
println "Finished in ${System.currentTimeMillis() - t1} millis"
}

First of all, I think it's nice. Short and clear. Secondly, it does the job, which is always important.

The only problem is it is a little bit slow. On average it takes 51.1sec to process files. Here I should probably say that all benchmarks run on 2 core Mac Book Pro with "-Xmx512m -server". To calculate the average, I take 10 measurements done by the script, drop off highest and lowest number and average the rest.

Let us now see statically typed solution:

@Typed package org.mbte.groovypp.examples.wordcount

for (i in 0..<10) {
def t1 = System.currentTimeMillis()
def counts = [:]
new File("./20_newsgroups").eachFileRecurse{ f ->
if (!f.isDirectory() && !f.path.contains(".svn")) {
f.text.toLowerCase().eachMatch(/\w+/) { w ->
def c = counts[w] ?: 0
counts[w] = c + 1 } } }
new File("counts-descreasing-groovy").withWriter { out ->
counts.sort { a, b -> b.value <=> a.value }.each { k, v -> out << "$k\t$v\n" } }
new File("counts-alphabetical-groovy").withWriter { out ->
counts.sort { it.key }.each { k, v -> out << "$k\t$v\n" } }
println "Finished in ${System.currentTimeMillis() - t1} millis"
}

Truly speaking, the difference with previous script is minor. We just added @Typed annotation in the line 1 and def-keyword in lines 4 and 5.

Amazingly, it was enough to achieve 6 times performance gain. Yes, the statically types script averagely process all our files in 8.6 seconds

Now is a good time to try to utilize multiple cores we have. By the nature of the problem there are two groups of activities and each of them can be run in parallel - accumulation of statistic and generating reports. Let us see what can we do with Groovy++.

Here is the code.

@Typed package org.mbte.groovypp.examples.wordcount

import java.util.concurrent.*
import groovy.util.concurrent.*

for (i in 0..<10) {

def t1 = System.currentTimeMillis()
def pool = Executors.newFixedThreadPool(30)
def counts = new AtomicIntegerMap ()

new File("./20_newsgroups").recurseFileIterator().filter{ file ->
!file.directory && !file.path.contains(".svn")
}.each(pool,4) { file ->
file.charSequence.eachMatch(/\w+/) { String w -> w = w.toLowerCase()
counts[w].incrementAndGet ()
}
}.whenBound {
pool.execute {
new File("counts-descreasing-groovy").withWriter { Writer out ->
counts.asList().sort { a, b -> b.get() <=> a.get() }.each { e -> out << "$e.key\t${e.get()}\n" }
}
} {
new File("counts-alphabetical-groovy").withWriter { Writer out ->
counts.asList().sort { a, b -> b.key <=> a.key }.each { e -> out << "$e.key\t${e.get()}\n" }
}
}
pool.shutdown()
}

pool.awaitTermination(30,TimeUnit.SECONDS)
println "Finished in ${System.currentTimeMillis() - t1} millis"
}

Obviously, it is more lengthy. Fortunately, it is much faster - averagely it processes files in 5.5 seconds.

Here is brief explaination of tools used. We create a thread pool, start with recursive iterator over our files, apply filter to skip unneeded directories and files and then start concurrent iteration of filtered iterator using our thread pool. It is done with each(pool, 4){...} call, which returns immiditely something called BindLater.

BindLater is specialized and high performance version of java.util.concurrent.Future, which also supports adding listeners of calculation completion. We add such listener with whenBound{...} call. When calculation complete the listener starts concurrent execution of report generation. Voila.

Last important thing to notice is AtomicIntegerMap used for accumulation results. AtomicIntegerMap is specialized and high performant concurrent map of AtomicIntegers. It based on the same implementation, which in the past allowed us to noticably increase performance of normal Groovy runtime. Standard library of static Groovy also provide Atomic(Boolean|Long|Reference)Map, which are handy in many situations.

Obviously, there is more place for improvement. Two things, which come to mind immidiately is that we can probably precompile regular expression pattern instead of doing it for each file and  we can separate reading of files from accumulating statistics by doing it also in parallel.

Now, we come to the original question of the article - why does Groovy++ overperform Java?

Truly speaking, I don't know. The best average I was able to achive (only once) with non-concurrent Java code was 10.5sec to process all files. I am sure that it has nothing to do with bytecode generation by javac but with a little bit different algorithm used by Java code. As this code wasn't written neither by me nor by Paul and none of us did any performance tuning of it the comparision itself is probably not very correct.

It will be very interesting if someone will be curios enough to do really fastest possible Java implementation.

But does it really matter? Even if statically typed groovy code performs exactly as slow as java one we have 16LOC instead of 65 and this 16LOC is concentrated minds, no boilerplate code.

Thank you for reading and give your own try to Groovy++!

Published at DZone with permission of its author, Alex Tkachman.
Tags:

Comments

Artur Biesiadowski replied on Tue, 2010/01/12 - 10:03am

I must say that opening/reading/closing 2500 files per second is already very impressive, without any word counting. With 10KB average per file, we are in 25000KB, which means 25MB per second of sustained input over possibly very fragmented (on disk) files, with 2500 open and close operations on top of that. Word counting is only topping on the cake.

I assume that everything is already in OS cache - I hardly believe possibility of doing 2500 accesses to disk per second. Can you redo the test WITHOUT word counting (just sum f.text.length() or something similar to force reading) and see what is the difference? There is a big chance that what you are measuring is just I/O performance, not word counting.

If this is the case, then static groovy would be a real benefit. To achieve 6 times speedup on the test which is mostly I/O bound, it would mean that actual work (word counting) has to be 100 or so times faster.

I'll try to play with static groovy at home and do the pure math benchmarks for comparison - especially my favorite sorting one, which turns out to be 300 times slower in groovy than in java (and calling Arrays.sort from groovy doesn't count as valid solution for benchmark of language...). If static groovy can get within only 2-3 times slower than java (which would mean 100+ times improvement from dynamic groovy), it would be a real killer.

Alex Tkachman replied on Tue, 2010/01/12 - 11:18am in response to: Artur Biesiadowski

Artur, you are absolutely right and the benchmark is IO bound and OS cache plays huge role. It's exact reason why I use 10 measurements and drop highest and lowest.

Charles Oliver ... replied on Tue, 2010/01/12 - 11:27am

If you have done no perf tuning of the Java code, it's probably premature to say that Groovy++ is faster than Java. I'm sure you worked on the Groovy++ code to get it to run as fast as possible. It's become a little bothersome to me that people claim X is faster than Java running on the Java VM. If X can be a certain speed, then Java can almost certainly achieve that speed if written properly. I will grant it may be easier to make X run faster than naive Java code, but to say you can't make it run that fast in Java is absurd. Write it correctly and the perf will be no different. A suggestion for your svn repository: perhaps you could commit all the newsgroup files as a zip, rather than directly into the respository. I've been watching it check out files for the past 10 minutes.

Alex Tkachman replied on Tue, 2010/01/12 - 11:46am in response to: Charles Oliver Nutter

Charlie, I openly say that I didn't tune Java code and I don't claim that static groovy is faster than Java - it is obvious. And I am sorry, that you find name of my post provoking but I am sure you understand that it's done with purpose. Regarding files, if you read original posts comparing performance of different languages including Ruby, you will find link for zip with files. I also think that better place to discuss svn suggestions is dedicated group.

Paul King replied on Tue, 2010/01/12 - 8:25pm

In Groovy trunk there is also a new traverse() method which means if you like dense code the normal Groovy solution can be pruned to 7 lines:

def (t1, counts) = [System.currentTimeMillis(), [:]]
new File("20_newsgroups").traverse(type:groovy.io.FileType.FILES){ f ->
f.text.toLowerCase().eachMatch(/\w+/){ w -> counts[w] = (counts[w] ?: 0) + 1 } }
def writeOrdered = { order, w -> counts.sort(order).each{ k, v -> w << "$k\t$v\n" } }
new File("counts-decreasing-groovy").withWriter{ writeOrdered.curry{ -it.value }(it) }
new File("counts-alphabetical-groovy").withWriter{ writeOrdered.curry{ it.key }(it) }
println "Finished in ${System.currentTimeMillis() - t1} millis"

Paul King replied on Tue, 2010/01/12 - 8:09pm

Not that microbenchmarks mean anything, but to compare best LOC for normal Groovy (horrible measure) and best performance for static Groovy (again IO bound so hard to know for sure if it means anything) the approx figures are shown below:
  LOC Runtime (secs)
Ruby 36 88
Scala (orig) 54 39
Clojure 19 25
Scala (2.7) 38 45
Scala (2.8) 38 54
Python 32 13
Java 61 21
Groovy/Static Groovy 7 8
Not meant to be accurate, just an indicative guide showing some useful recent progress in the Groovy ecosystem of late.

Dapeng Liu replied on Tue, 2010/01/12 - 9:12pm

so confused ...

how is that possible Groovy runs faster than native java code, isn't Groovy built on top of JVM? in this case, both are calling the same API

21 sec vs 8 sec is huge ... 

Paul King replied on Tue, 2010/01/12 - 11:32pm in response to: Dapeng Liu

It's not really an apples with apples comparison. I just took the timings from the original article and scaled the timings from the Groovy++ benchmark examples accordingly. The Java code is equivalent to what was given from the original article - no parallelism at all. The fastest Groovy++ examples are using some parallelism - just like some of the other examples in the referenced articles (and subsequent comments) do - hence all my caveats about only using the results as a guide. Interesting enough though is that the non-parallel unoptimised Static Groovy result is still approx 40% faster than the non-parallel unoptimised Java version - it would definitely be worthwhile investigating why that is so.

Adam Rabung replied on Wed, 2010/01/13 - 3:26pm

Disclaimer: microbenchmarks are baloney, but they sure are fun!

I modified your Java version and went from 12.6 seconds down 7.7. With that ratio, it would have taken about 6.4 seconds on your machine.

The most significant change I made was using a HashMap rather than a TreeMap to count the occurrences of the words, and adding an additional sort when displaying by word count. I was a bit surprised that this change would dramatically help performance, but it did. In retrospect, that's the exact algorithm you chose in Groovy - [:] instantiates a LinkedHashMap - you could probably use just Hashmap instead and shave off some more time.

I wonder how much a true IntMap would reduce this further, as it would get rid of a lot of unboxing. A higher level language like static Groovy could potentially sneak one of these in behind the scenes at compile time. Scala added a @specialized annotation in 2.8 to reduce boxing overhead.

One other note: your Java version is actually cheating a little bit - it is case-sensitive, so it's skipping a lot of .toLowerCase calls.

Final note: I first tried this with 1.5, and the numbers were more like (12.2/19.8). Java 6 is an amazing piece of software.

Paul King replied on Wed, 2010/01/13 - 9:59pm in response to: Adam Rabung

Yes, I suspected the TreeMap (see earlier emails in the Google group discussion list) but hadn't had time to verify yet. The Java version I run locally has the toLowerCase() calls but that part of my patch never made it back into the repo. It would be interesting to see how fast a hand-crafted java.util.concurrent.* solution ran.

Alex Tkachman replied on Thu, 2010/01/14 - 1:47pm in response to: Adam Rabung

Adam, any chances you can share with us your java version, so we include it in our repo?

Adam Rabung replied on Thu, 2010/01/14 - 3:30pm

Sure! Not much changed, however:
  • Collect the counts w/ a backing HashMap, rather than TreeHashMap (and then sort before display)
  • Word count needs to be case-insensitive (which is actually an optimization)
  • Buffered writers

http://gist.github.com/277457

Evgeny Goldin replied on Thu, 2010/01/14 - 6:56pm

Thank you! I hope one day you'll get to documenting the groovy.util.concurrent package - so far I'm decompiling it trying to understand what other options are available ..

Carla Brian replied on Sat, 2012/05/19 - 9:37am

I've heard from my friends who are good in groovy, they prefer this than java because of its methods and other packages. It is more reliable and efficient. - CWD Construction

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.