I wouldn’t normally choose the JVM as a platform for numerical computing. One rather tricky area is working with arrays of primitives: ints and floating-point data with compact, efficient representations and hardware support for arithmetic. The JVM usually stores this data in situ in a contiguous chunk of memory (though this is not guaranteed). It can sometimes eliminate bounds checking, and it can avoid boxing and unboxing array elements during operations, so one might expect close-to-native performance for array operations. In my tests, a simple tight loop summing over an array on the JVM runs about 2x slower than native (C) code. This feels like a reasonable price to pay for the obvious benefits of the JVM.

In practice, it has proven difficult to avoid boxing and unboxing and still write expressive, idiomatic, generic Scala code. One promising feature in Scala is the @specialized annotation on type parameters, which directs the compiler to generate additional code specialized for primitive types. But some extremely subtle issues arise which make this a lot trickier than it ought to be.

For instance, in the following code, the Iterator-based sum is about 10x slower than the while loop. What happens is that the non-specialized ‘sum’ method of TraversableOnce is called, which itself invokes the generic ‘foldLeft’, and onward into the guts of the collection library. Since neither sum nor foldLeft is specialized, calling them forces boxing and unboxing every time they interact with our specialized code.

class Iter[@specialized(Double) T](val arr: Array[T]) extends Iterator[T] {
  var idx = 0
  override def hasNext: Boolean = idx < arr.length
  override def next(): T = { val t = arr(idx); idx += 1; t }
}

object Iter {
  def apply[@specialized(Double) T](arr: Array[T]) = new Iter[T](arr)
}

object TestMe {
  // Evaluates f once and returns (result, elapsed seconds).
  def clock(f: => Double) = {
    val s = System.nanoTime()
    val v = f
    (v, (System.nanoTime() - s) / 1e9)
  }

  // Baseline: a plain while loop over the primitive array.
  def whil(a: Array[Double]) = {
    var i = 0
    var acc = 0d
    while (i < a.length) {
      acc += a(i)
      i += 1
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    val arr = Array.fill(1000000)(1d)
    println(clock { Iter(arr).sum }) // goes through the generic, non-specialized sum
    println(clock { whil(arr) })     // stays on primitives throughout
  }
}
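
To make the boxing concrete, here is a rough sketch of what a non-specialized sum amounts to. It is an approximation for illustration, not the collection library's actual source, and the names GenericSumSketch and genericSum are made up for this example. Because T is erased, every call to num.plus takes and returns boxed values, even when the iterator itself is specialized.

```scala
object GenericSumSketch {
  // Approximation of an unspecialized sum: T erases to Object, so each
  // num.plus call receives boxed arguments and returns a boxed result.
  def genericSum[T](it: Iterator[T])(implicit num: Numeric[T]): T =
    it.foldLeft(num.zero)(num.plus)

  def main(args: Array[String]): Unit = {
    val arr = Array.fill(1000000)(1d)
    println(genericSum(arr.iterator))
  }
}
```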

The obvious answer is to provide our own 'sum' function and, following the design of the TraversableOnce trait, to do so via a specialized 'foldLeft'. However, working alongside my "Scalleague" Chris Lewis, I’ve found that the compiler does not play along nicely. There are tons of specialization pitfalls that can cripple your fast-looking code. Details in a future post.
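
As a minimal sketch of a hand-written sum, assuming we are willing to give up genericity altogether: the class below fixes the element type to Double, so it sidesteps specialization rather than solving the problems mentioned above. DoubleIter is a hypothetical name, and this is not the specialized foldLeft approach.

```scala
// Hypothetical Double-only iterator: with the element type fixed,
// nothing here depends on @specialized, and nothing needs to box.
final class DoubleIter(arr: Array[Double]) {
  private[this] var idx = 0

  def hasNext: Boolean = idx < arr.length
  def next(): Double = { val t = arr(idx); idx += 1; t }

  // Hand-written sum: a plain while loop over primitives, no foldLeft.
  // Like any iterator-based sum, it consumes the iterator.
  def sum: Double = {
    var acc = 0d
    while (hasNext) acc += next()
    acc
  }
}

object DoubleIterDemo {
  def main(args: Array[String]): Unit = {
    val arr = Array.fill(1000000)(1d)
    println(new DoubleIter(arr).sum)
  }
}
```

The trade-off is obvious: one such class per primitive type, which is exactly the duplication that @specialized is meant to generate for us.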

It took me some fiddling to fix the following error:

java.lang.UnsatisfiedLinkError: no jhdf5 in java.library.path

Hopefully this post will save someone else some time.

After you get the pre-built binaries, un-tar the archive to the /lib sub-directory of your sbt project. The /lib directory will look like this:

drwxr-xr-x@ 4 adam admin 136 Jul 8 13:26 ext/
-rwxr-xr-x@ 1 adam admin 105605 Jul 8 13:26 fits.jar*
-rw-r--r--@ 1 adam admin 44509 Jul 8 13:26 jhdf.jar
-rw-r--r--@ 1 adam admin 44001 Jul 8 13:26 jhdf4obj.jar
-rw-r--r--@ 1 adam admin 72604 Jul 8 13:26 jhdf5.jar
-rw-r--r--@ 1 adam admin 71126 Jul 8 13:26 jhdf5obj.jar
-rw-r--r--@ 1 adam admin 160142 Jul 8 13:26 jhdfobj.jar
-rw-r--r--@ 1 adam admin 403731 Jul 8 13:26 jhdfview.jar
-rwxr-xr-x@ 1 adam admin 198911 Jul 8 13:26 junit.jar*
drwxr-xr-x@ 6 adam admin 204 Jul 8 13:26 macosx/
-rwxr-xr-x@ 1 adam admin 551616 Jul 8 13:26 netcdf.jar*

If you run sbt -> console, and look at your java.library.path, you’ll probably see something like this:

scala> println(System.getProperty("java.library.path"))
.:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java
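
The same check can be done programmatically. The sketch below lists the directories on java.library.path and reports which of them, if any, contain the native library. LibPathCheck and findLibrary are names invented for this example. Note that System.mapLibraryName returns whatever file name the running JVM expects for the platform, which may not be the .jnilib name shown in this post.

```scala
import java.io.File

object LibPathCheck {
  // Directories the JVM searches when System.loadLibrary is called.
  def libraryDirs: Seq[File] =
    System.getProperty("java.library.path", "")
      .split(File.pathSeparator).toSeq
      .filter(_.nonEmpty)
      .map(new File(_))

  // Files in `dirs` that match the platform-specific name for `name`.
  def findLibrary(name: String, dirs: Seq[File] = libraryDirs): Seq[File] = {
    val fileName = System.mapLibraryName(name)
    dirs.map(d => new File(d, fileName)).filter(_.isFile)
  }

  def main(args: Array[String]): Unit = {
    libraryDirs.foreach(d => println(s"search dir: $d"))
    val hits = findLibrary("jhdf5")
    if (hits.isEmpty) println("jhdf5 not found on java.library.path")
    else hits.foreach(f => println(s"found: $f"))
  }
}
```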

So I added some symlinks to /usr/lib/java:

lrwxr-xr-x 1 root wheel 77 Jul 8 13:29 libjhdf.jnilib@ -> <dir>/lib/macosx/libjhdf.jnilib
lrwxr-xr-x 1 root wheel 78 Jul 8 13:29 libjhdf5.jnilib@ -> <dir>/lib/macosx/libjhdf5.jnilib

Here <dir> stands for the path to your sbt project’s directory.

After that, it worked.
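
An alternative that avoids touching /usr/lib/java, and which is not what I did above, is to set java.library.path from the build itself. The fragment below is a sketch for build.sbt using sbt 1.x slash syntax, assuming the native libraries sit under lib/macosx as in the listing earlier. It only affects a forked `run`; the sbt console runs inside sbt's own JVM, so these options do not apply there.

```scala
// build.sbt (sketch; sbt 1.x syntax)

// javaOptions are only passed to forked JVMs, so fork `run`.
run / fork := true

// Point the forked JVM at the native libraries unpacked under lib/macosx.
run / javaOptions +=
  "-Djava.library.path=" + (baseDirectory.value / "lib" / "macosx").getAbsolutePath
```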

© 2013 Adam Klein's Blog