I found myself using a lazy val to memoize the result of a computation. The computation, however, depended on external state. What I really wanted to do was (lazily) cache the computation and later invalidate it on external state changes. I also wanted to use it as seamlessly as I would use a lazy val. Enter LazyCache:
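A minimal sketch of the idea (not necessarily the exact implementation):

// Caches a computation lazily, like a lazy val, but can be invalidated.
class LazyCache[A](compute: => A) {
  private[this] var cached: Option[A] = None

  // Compute on first access, then memoize.
  def apply(): A = cached match {
    case Some(a) => a
    case None =>
      val a = compute
      cached = Some(a)
      a
  }

  // Drop the memoized value; the next apply() recomputes it.
  def invalidate(): Unit = { cached = None }
}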

It can be used roughly as follows (a hypothetical usage example):
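// External state is simulated here with a var.
var externalState = 1
val cached = new LazyCache(externalState * 10)

println(cached())   // 10: computed and memoized
externalState = 2
println(cached())   // still 10: the cached value is stale
cached.invalidate() // external state changed, so invalidate
println(cached())   // 20: recomputed on demand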

In the previous post we saw how easily specialization fails when used naively.

To use this powerful feature effectively, you really have to know what’s going on.

A Google Groups post by Erik Osheim should be required reading for anyone attempting specialization in Scala 2.9.2. It goes over the implementation and its shortcomings, and makes suggestions for future improvements.

The post is here, but I will cut-and-paste it below. Then, if you feel like digging even deeper, go read Iulian Dragos’s thesis.

Even armed with all this knowledge, however, you may still fail at your specialization attempts. It behooves you to read all the outstanding issues and tags.


Hi everyone,

I. Background

I’ve been looking closely at specialization bugs recently, trying to
figure out how to make the feature less buggy, and more able to handle
real-world situations that users might throw at it. (As a library
author trying to use specialization I feel like I have a reasonably
good idea of some important use cases.)

This has involved reading Iulian’s thesis, working on fixing bugs, and
discussing it with others. I don’t claim to understand every detail of
the specialize phase implementation, but I feel like I have a pretty
good handle on how specialization does (or doesn’t) work, the current
state of it, and some of the (seemingly) intractable problems with it.

I’m writing this because recently I’ve been kicking around a different
class specialization design in my head, and I just want to get it out
into the wider world so that other people can either show me why it’s
naive and wrong, or to pave the way to a formal SIP around improving
specialization (or just a go-ahead signal from martin &co).

First, I will try to discuss some of the (seemingly intractable)
problems with specialization. In addition to these, there are lots of
other bugs we’re currently facing, but with work I think a lot of these
can be fixed. I want to focus on the problems I see with the overall
design.

II. Example of Specialization

Consider this code:
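[A sketch consistent with the description below; the original snippet
used a class Foo with type parameter A, fields a and b, and methods
bar and baz.]

class Foo[@specialized(Int) A](val a: A, val b: A) {
  def bar(x: A): Foo[A] = new Foo(x, b) // note the "new Foo" (see step 6)
  def baz: A = a
}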

Specializing Foo does a lot of things. Here’s a rough outline of how
class specialization plays out in this case:

1. It decides which of Foo’s methods should be specialized on A, Foo’s
type parameter. In this case it will choose bar and baz.

2. For each of those methods, and for each primitive type we want to
specialize on (just Int in this case) it creates a “specialized
variant”, e.g. Foo.bar$mcI$sp, which replaces A with Int. This
method will call into the original method (e.g. forwarding to bar),
boxing and unboxing to satisfy its own interface. This is sometimes
called a “specialized overload”.

3. For each type we want to specialize on, a specialized subclass of
Foo is created. In this case, just Foo$mcI$sp is created,
corresponding to Foo[Int].

4. For each of the fields in Foo (e.g. a and b) Foo$mcI$sp is given
specialized versions of them (e.g. a$mcI$sp and b$mcI$sp).

5. All of the “specialized methods” in Foo from step #1 (both the
original method and the corresponding specialized variant) are
overridden in Foo$mcI$sp.

6. The specialized subclass’ override of the specialized variant (i.e.
Foo$mcI$sp.bar$mcI$sp) is implemented by copying the tree of the
original method (i.e. Foo.bar) and then rewritten by replacing all
fields/methods/classes with specialized versions where appropriate.
So, in bar’s implementation, “new Foo” is rewritten to be
“new Foo$mcI$sp”.

7. The subclass’ override of the original method (Foo$mcI$sp.bar)
is forwarded to the specialized variant (Foo$mcI$sp.bar$mcI$sp)
doing boxing/unboxing as required by its interface.

The result (in summarized form) is:
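[A simplified sketch, assuming the Foo above.]

class Foo[A](val a: A, val b: A) {
  def bar(x: A): Foo[A] = new Foo(x, b)
  def baz: A = a

  // specialized variants: box/unbox and forward to the generic methods
  def bar$mcI$sp(x: Int): Foo[Int] =
    bar(x.asInstanceOf[A]).asInstanceOf[Foo[Int]]
  def baz$mcI$sp: Int = baz.asInstanceOf[Int]
}

class Foo$mcI$sp(_a: Int, _b: Int) extends Foo[Int](_a, _b) {
  // specialized duplicates of the fields (Foo's generic fields still exist)
  val a$mcI$sp: Int = _a
  val b$mcI$sp: Int = _b

  // the specialized variant carries the real (rewritten) implementation
  override def bar$mcI$sp(x: Int): Foo[Int] = new Foo$mcI$sp(x, b$mcI$sp)
  override def baz$mcI$sp: Int = a$mcI$sp

  // the generic methods forward to the specialized variants
  override def bar(x: Int): Foo[Int] = bar$mcI$sp(x)
  override def baz: Int = baz$mcI$sp
}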

III. Analysis of Existing Strategy

This strategy is designed to allow generic and specialized code to be
intermixed: since all the specialized subclasses extend Foo, code that
treats a specialized class/instance as generic will continue working
(because the specialized instances support the same interface as the
generic version). This strategy also reduces the amount of bytecode
duplicated, since the specialized subclass only overrides some of Foo’s
methods, rather than all of them.

In practice, the fact that users provide one class (Foo) which
effectively becomes an inheritance hierarchy (i.e. Foo$mcI$sp extending
Foo) is a huge headache. It prevents specialized classes from
effectively using private members (since specialized subclasses would
not be able to forward to or override a private member of Foo).
Modifiers like
final and @inline often don’t work correctly for Foo: even though the
user has no intention of extending Foo, the specialization machinery
needs to be able to do just that.

There are problems with double-initializing final vals (once in
Foo$mcI$sp’s constructor as null, and again when it calls into Foo’s
constructor). There’s the fact that the specialized subclasses often
have twice as many fields as the original class. There’s the fact that
extending a specialized class doesn’t work correctly, meaning that
rather than a (mostly transparent) optimizing annotation,
specialization is a mandate to remove abstract classes in favor of
traits.

In my opinion, almost all of the major issues in specialization stem
from the fact that specialized subclasses end up having one class’ guts
smeared between two classes in a way that is hard for the end user to
predict or control.

IV. Proposed Changes

The major shift I propose is moving from a world where Foo is both a
generic class, and also the parent of specialized subclasses, to a
world where Foo becomes just an interface (a trait with only
public/protected members and without implementations). AnyRef
specialization already moves us closer to this goal, but this change
brings us all the way.

This means that all instances of Foo need to be rewritten, either to a
specialized form (e.g. Foo$mcI$sp) or to the AnyRef specialization
(e.g. Foo$mcL$sp[A]). These classes all extend Foo, but do not share
any implementation code: they would each contain a full copy of Foo’s
members (including private members), their own constructors, etc.

The downside of this is that you really do have 2-10x as much bytecode
as you did when Foo was just a single class. It also means that you
can’t write code that just uses Foo (because Foo is no longer a
concrete class); instead, your compiler *always* has to translate Foo
to Foo$mcL$sp, even if you are compiling with specialization turned
off.
This also means that people using things like reflection will always
see the “specialized names” of classes.

But the upsides are great! For one, you don’t need to rewrite any of
the access controls, annotations, or other modifiers like final. Since
we just renamed a class into another class (and performed some
transformations on types/names) we can be pretty sure that the
semantics of the class initialization, inlining, visibility, etc. are
what the original author intended. We also don’t end up with extra
fields that we don’t plan to use.

Also, it may not be more bytecode in some cases. Let’s say we generate
5 specializations of a class (with 3 specialized methods). The current
strategy involves creating 5 specialized overloads of the 3 methods,
then creating 5 subclasses overriding all the methods. That is, you end
up with 6 classes each with 18 methods (although many of the methods
just do boxing/unboxing and forward somewhere else). My strategy ends
up with 6 classes each with 3 methods (each method containing the full
implementation).

My sense is that the transformation I’m talking about is a lot easier
in practice than the one we are currently trying to support.
Unfortunately it is not backwards-compatible with the existing
strategy, although it could probably use the same naming scheme if we
want. I would rather not try to support both schemes.

V. Example

For completeness, I will sketch out how the previous class would be
transformed under my scheme:
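[A sketch, again assuming the Foo above.]

// Foo becomes a pure interface.
trait Foo[A] {
  def a: A
  def b: A
  def bar(x: A): Foo[A]
  def baz: A
}

// The AnyRef specialization: a full copy of the original implementation.
class Foo$mcL$sp[A](val a: A, val b: A) extends Foo[A] {
  def bar(x: A): Foo[A] = new Foo$mcL$sp(x, b)
  def baz: A = a
}

// The Int specialization: another full copy, with A replaced by Int.
class Foo$mcI$sp(val a: Int, val b: Int) extends Foo[Int] {
  def bar(x: Int): Foo[Int] = new Foo$mcI$sp(x, b)
  def baz: Int = a
}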

As you can see, Foo$mcL$sp[A] and Foo$mcI$sp are both (relatively)
obvious transformations of the original Foo class, and any code which
would work with Foo[A] should work with these (either with Foo$mcI$sp
when A is known to be Int, or Foo$mcL$sp otherwise). The big difference
is that calling code would need to be rewritten, for example:
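[Hypothetical call sites:]

new Foo[Int](1, 2)      // what the user writes
new Foo$mcI$sp(1, 2)    // what the compiler emits when A = Int

new Foo[A](x, y)        // when A is a generic type
new Foo$mcL$sp[A](x, y) // the AnyRef specialization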

VI. Final Thoughts

Again, this doesn’t seem (to me) to be that much more difficult than
what is currently being done and I don’t see obvious problems with
partial compilation. In fact, I think the question about how to handle
specialized classes is significantly *less* complicated, since in these
cases you don’t need to munge the method names (or worry about the
difference between calling Foo.bar$mcI$sp and Foo$mcI$sp.bar$mcI$sp).

I haven’t mentioned “method specialization” because I think the
existing approach is basically fine and seems to work pretty well.
There may be bugs with it, but I don’t think these are structural. So,
even in my proposal users could specialize particular methods
basically as they do now (by creating a copy of their tree, renaming
it, and substituting type params and other classes/methods as
appropriate for that specialized version).

What do you all think? Is this a terrible idea? A great one? Are there
things I have completely overlooked? Am I wrong, and the existing
strategy is fine (modulo a few design changes)? I look forward to your
feedback.

— Erik

I wouldn’t normally choose the JVM as a platform for numerical computing. One rather tricky area is working with arrays of primitives: ints and floating-point data that want compact, efficient representations with hardware support for doing math. The JVM usually provides a contiguous chunk of memory to store this data in situ (even though that’s not guaranteed), can sometimes eliminate bounds checking, and can avoid boxing and unboxing array elements during operations, so one might expect relatively close-to-native performance for array operations. In my tests, a simple tight loop summing over an array on the JVM performs about 2x slower than native (C) code. That feels like a reasonable price to pay for the obvious benefits of the JVM.

In practice, it has proven difficult to avoid boxing and un-boxing and still write expressive, idiomatic, generic Scala code. One promising feature in Scala comes in the form of the @specialization type annotation, which directs the compiler to generate additional code specialized for primitive types. But some extremely subtle issues arise which make this a lot trickier than it ought to be.

For instance, in the following code, the Iterator-based sum is about 10x slower than the while loop. What happens is the non-specialized ‘sum’ function of TraversableOnce is called, which itself invokes the generic ‘foldLeft’, and onward into the guts of the collection library. Since neither sum nor foldLeft is specialized, their invocations force boxing and un-boxing while interacting with our specialized code.
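A sketch of the comparison (illustrative; not the exact code):

val xs = Array.tabulate(10000000)(i => i)

// Iterator-based sum: dispatches into TraversableOnce.sum, which calls
// the generic foldLeft, boxing every element along the way.
def sumIterator(xs: Array[Int]): Int = xs.iterator.sum

// Hand-written while loop: stays on primitive Ints throughout.
def sumWhile(xs: Array[Int]): Int = {
  var i = 0
  var total = 0
  while (i < xs.length) {
    total += xs(i)
    i += 1
  }
  total
}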

The obvious answer is that we might provide our own ‘sum’ function. And following the design of the TraversableOnce trait, we ought to do so via a specialized ‘foldLeft’. However, working alongside my “Scalleague” Chris Lewis, I’ve found that the compiler doesn’t play along nicely. There are tons of specialization pitfalls that can cripple your fast-looking code. Details in a future post.
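The shape of the idea, as a sketch (the trait and method names here are
illustrative):

trait FastTraversable[@specialized(Int, Long, Float, Double) A] {
  def foldLeft[@specialized(Int, Long, Float, Double) B](z: B)(f: (B, A) => B): B

  // sum defined in terms of the specialized foldLeft; note that Numeric
  // itself is not specialized, which is already one source of boxing
  def sum(implicit num: Numeric[A]): A = foldLeft(num.zero)(num.plus)
}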

It took me some fiddling to fix the following message:

java.lang.UnsatisfiedLinkError: no jhdf5 in java.library.path

Hopefully this post will save someone else some time.

After you get the pre-built binaries, un-tar the archive to the /lib sub-directory of your sbt project. The /lib directory will look like this:
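Something like this (a hypothetical listing; the exact contents depend
on the archive you downloaded):

lib/
  jhdf5.jar
  libjhdf5.jnilib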

If you start sbt and run console, then look at your java.library.path, you’ll probably see something like this:

scala> println(System.getProperty("java.library.path"))
.:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java

So I added some symlinks to /usr/lib/java:
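Something along these lines (assuming the native library is named
libjhdf5.jnilib):

sudo ln -s dir/lib/libjhdf5.jnilib /usr/lib/java/libjhdf5.jnilib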


Where dir is replaced with the path to your sbt project’s directory.

After that, it worked.
