I found myself using a lazy val to memoize the result of a computation. The computation, however, depended on external state. What I really wanted to do was (lazily) cache the computation and later invalidate it on external state changes. I also wanted to use it as seamlessly as I would use a lazy val. Enter LazyCache:

It can be used as follows:
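As a rough, hypothetical sketch of the idea in Python (names invented for illustration, not the actual Scala code):

```python
class LazyCache:
    """Lazily compute and cache a thunk's result, until invalidated."""

    def __init__(self, thunk):
        self._thunk = thunk    # the deferred computation
        self._value = None
        self._valid = False

    def get(self):
        # compute on first access (or first access after invalidation)
        if not self._valid:
            self._value = self._thunk()
            self._valid = True
        return self._value

    def invalidate(self):
        # external state changed: drop the cached result
        self._valid = False
```

get() computes on first access and returns the cached value thereafter; invalidate() forces the next get() to recompute.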

In the previous post we saw how easily specialization fails when used naively.

To utilize this powerful feature effectively, you really have to know what’s going on.

A Google Groups post by Erik Osheim should be required reading for anyone attempting specialization in Scala 2.9.2. It goes over the implementation and its shortcomings, and makes suggestions for future improvements.

The post is here, but I will cut-and-paste it below. Then, if you feel like digging even deeper, go read Iulian Dragos’s thesis.

Even armed with all this knowledge, however, you may still fail at your specialization attempts. It behooves you to read all the outstanding issues and tags.

Hi everyone,

I. Background

I’ve been looking closely at specialization bugs recently, trying to
figure out how to make the feature less buggy, and more able to handle
real-world situations that users might throw at it. (As a library
author trying to use specialization I feel like I have a reasonably
good idea of some important use cases.)

This has involved reading Iulian’s thesis, working on fixing bugs, and
discussing it with others. I don’t claim to understand every detail of
the specialize phase implementation, but I feel like I have a pretty
good handle on how specialization does (or doesn’t) work, the current
state of it, and some of the (seemingly) intractable problems with it.

I’m writing this because recently I’ve been kicking around a different
class specialization design in my head, and I just want to get it out
into the wider world so that other people can either show me why it’s
naive and wrong, or to pave the way to a formal SIP around improving
specialization (or just a go-ahead signal from martin &co).

First, I will try to discuss some of the (seemingly intractable)
problems with specialization. In addition to these, there are lots of
other bugs we’re currently facing, but with work I think a lot of these
can be fixed. I want to focus on the problems I see with the overall design.

II. Example of Specialization

Consider this code:

Specializing Foo does a lot of things. Here’s a rough outline of how
class specialization plays out in this case:

1. It decides which of Foo’s methods should be specialized on A, Foo’s
type parameter. In this case it will choose bar and baz.

2. For each of those methods, and for each primitive type we want to
specialize on (just Int in this case) it creates a “specialized
variant”, e.g. Foo.bar$mcI$sp, which replaces A with Int. This
method will call into the original method (e.g. forwarding to bar),
boxing and unboxing to satisfy its own interface. This is sometimes
called a “specialized overload”.

3. For each type we want to specialize on, a specialized subclass of
Foo is created. In this case, just Foo$mcI$sp is created,
corresponding to Foo[Int].

4. For each of the fields in Foo (e.g. a and b) Foo$mcI$sp is given
specialized versions of them (e.g. a$mcI$sp and b$mcI$sp).

5. All of the “specialized methods” in Foo from step #1 (both the
original method and the corresponding specialized variant) are
overridden in Foo$mcI$sp.

6. The specialized subclass’ override of the specialized variant (i.e.
Foo$mcI$sp.bar$mcI$sp) is implemented by copying the tree of the
original method (i.e. Foo.bar) and then rewritten by replacing all
fields/methods/classes with specialized versions where appropriate.
So, in bar’s implementation, “new Foo” is rewritten to be
“new Foo$mcI$sp”.

7. The subclass’ override of the original method (Foo$mcI$sp.bar)
is forwarded to the specialized variant (Foo$mcI$sp.bar$mcI$sp)
doing boxing/unboxing as required by its interface.

The result (in summarized form) is:

III. Analysis of Existing Strategy

This strategy is designed to allow generic and specialized code to be
intermixed: since all the specialized subclasses extend Foo, code that
treats a specialized class/instance as generic will continue working
(because the specialized instances support the same interface as the
generic version). This strategy also reduces the amount of bytecode
duplicated, since the specialized subclass only overrides some of Foo’s
methods, rather than all of them.

In practice, the fact that users provide one class (Foo) which
effectively becomes an inheritance hierarchy (i.e. Foo$mcI$sp extending
Foo) is a huge headache. It prevents specialized classes from
effectively using private members (since specialized subclasses would not be
able to forward to or override a private member of Foo). Modifiers like
final and @inline often don’t work correctly for Foo: even though the
user has no intention of extending Foo, the specialization machinery
needs to be able to do just that.

There are problems with double-initializing final vals (once in
Foo$mcI$sp’s constructor as null, and again when it calls into Foo’s
constructor). There’s the fact that the specialized subclasses often
have twice as many fields as the original class. There’s the fact that
extending a specialized class doesn’t work correctly, meaning that
rather than a (mostly transparent) optimizing annotation,
specialization is a mandate to remove abstract classes in favor of traits.

In my opinion, almost all of the major issues in specialization stem
from the fact that specialized subclasses end up having one class’ guts
smeared between two classes in a way that is hard for the end user to
predict or control.

IV. Proposed Changes

The major shift I propose is moving from a world where Foo is both a
generic class, and also the parent of specialized subclasses, to a
world where Foo becomes just an interface (a trait with only
public/protected members and without implementations). AnyRef
specialization already moves us closer to this goal, but this change
brings us all the way.

This means that all instances of Foo need to be rewritten, either to a
specialized form (e.g. Foo$mcI$sp) or to the AnyRef specialization
(e.g. Foo$mcL$sp[A]). These classes all extend Foo, but do not share
any implementation code: they would each contain a full copy of Foo’s
members (including private members), their own constructors, etc.

The downside of this is that you really do have 2-10x as much bytecode
as you did when Foo was just a single class. It also means that you
can’t write code that just uses Foo (because Foo is no longer a
concrete class); instead, the compiler *always* has to translate Foo to
Foo$mcL$sp, even if you are compiling with specialization turned off.
This also means that people using things like reflection will always
see the “specialized names” of classes.

But the upsides are great! For one, you don’t need to rewrite any of
the access controls, annotations, or other modifiers like final. Since
we just renamed a class into another class (and performed some
transformations on types/names) we can be pretty sure that the
semantics of the class initialization, inlining, visibility, etc are
what the original author intended. We also don’t end up with extra
fields that we don’t plan to use.

Also, it may not be more bytecode in some cases. Let’s say we generate 5
specializations of a class (with 3 specialized methods). The current
strategy involves creating 5 specialized overloads of the 3 methods,
then creating 5 subclasses overriding all the methods. That is, you end
up with 6 classes each with 18 methods (although many of the methods
just do boxing/unboxing and forward somewhere else). My strategy ends
up with 6 classes each with 3 methods (each method containing the full implementation).

My sense is that the transformation I’m talking about is a lot easier
in practice than the one we are currently trying to support.
Unfortunately it is not backwards-compatible with the existing
strategy, although it could probably use the same naming scheme if we
want. I would rather not try to support both schemes.

V. Example

For completeness, I will sketch out how the previous class would be
transformed under my scheme:

As you can see, Foo$mcL$sp[A] and Foo$mcI$sp are both (relatively)
obvious transformations of the original Foo class, and any code which
would work with Foo[A] should work with these (either with Foo$mcI$sp
when A is known to be Int, or Foo$mcL$sp otherwise). The big difference
is that calling code would need to transform:


VI. Final Thoughts

Again, this doesn’t seem (to me) to be that much more difficult than
what is currently being done and I don’t see obvious problems with
partial compilation. In fact, I think the question about how to handle
specialized classes is significantly *less* complicated, since in these
cases you don’t need to munge the method names (or worry about the
difference between calling Foo.bar$mcI$sp and Foo$mcI$sp.bar$mcI$sp).

I haven’t mentioned “method specialization” because I think the
existing approach is basically fine and seems to work pretty well.
There may be bugs with it, but I don’t think these are structural. So,
even in my proposal users could specialize particular methods
basically as they do now (by creating a copy of their tree, renaming
it, and substituting type params and other classes/methods as
appropriate for that specialized version).

What do you all think? Is this a terrible idea? A great one? Are there
things I have completely overlooked? Am I wrong, and the existing
strategy is fine (modulo a few design changes)? I look forward to your thoughts.

— Erik

I wouldn’t normally choose the JVM as a platform for numerical computing. One rather tricky area is working with arrays of primitives: ints and floating point data need compact and efficient representations, with hardware support for doing math. The JVM usually provides a contiguous chunk of memory to store this data in situ (even though that’s not guaranteed). The JVM can sometimes eliminate bounds checking, and avoid boxing and unboxing array elements during operations, so one might expect relatively close-to-native performance of array operations. A simple tight loop summing over an array on the JVM performs about 2x slower than native (C) code in my tests. This feels like a reasonable price to pay for the obvious benefits of the JVM.

In practice, it has proven difficult to avoid boxing and un-boxing and still write expressive, idiomatic, generic Scala code. One promising feature comes in the form of Scala’s @specialized type annotation, which directs the compiler to generate additional code specialized for primitive types. But some extremely subtle issues arise which make this a lot trickier than it ought to be.

For instance, in the following code, the Iterator-based sum is about 10x slower than the while loop. What happens is the non-specialized ‘sum’ function of TraversableOnce is called, which itself invokes the generic ‘foldLeft’, and onward into the guts of the collection library. Since neither sum nor foldLeft is specialized, their invocations force boxing and un-boxing while interacting with our specialized code.

The obvious answer is that we might provide our own ‘sum’ function. And following the design in the TraversableOnce trait, we ought to do so via a specialized ‘foldLeft’. However, working alongside my “Scalleague” Chris Lewis, I’ve found the compiler not to play along nicely. There are tons of specialization pitfalls that can cripple your fast-looking code. Details in a future post.

It took me some fiddling to fix the following message:

java.lang.UnsatisfiedLinkError: no jhdf5 in java.library.path

Hopefully this post will save someone else some time.

After you get the pre-built binaries, un-tar the archive to the /lib sub-directory of your sbt project. The /lib directory will look like this:

If you run sbt -> console, and look at your java.library.path, you’ll probably see something like this:

scala> println(System.getProperty("java.library.path"))

So I added some symlinks to /usr/lib/java:
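Assuming the native library is named libjhdf5.so (the filename that System.loadLibrary("jhdf5") maps to on Linux; adjust to match your archive), the symlink would look something like:

```shell
sudo ln -s dir/lib/libjhdf5.so /usr/lib/java/libjhdf5.so
```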


Where dir is replaced with the path to your sbt project’s directory.

After that, it worked.

Reviewing some basic probability and statistics, I rediscovered the geometric distribution, which tells you the likelihood of winning a game after some number of attempts. With it, we can figure out the number of times you should expect to play the game until the outcome is “success” – or, equivalently, the number of plays you’d need in order to expect one success among them.

Why is this important? They say the odds of winning the current record-setting powerball lottery are 1 in 176 million. I want to be rich. How many tickets do I need to buy to expect a winning one?

The intuition is straightforward. For instance, if there is a p=25% chance of success on any one trial, successes should make up 1/4 of all trials, or one out of every four. Which means that any four trials should yield one success on average. Mathematically, if we let X be the number of trials required until a success occurs, then E[X] = 1/p.

So I should expect, on average, one winning ticket among 176 million randomly generated ones. Who wants to loan me $176,000,000?

The proof is neat.

If p is the probability of success, and q = (1-p) is the probability of failure, then the probability of succeeding on the i’th trial after (i-1) failures is:

Pr(X=i) = q^(i-1)p
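Taking the expectation and summing the series:

$$
\begin{aligned}
E[X] &= \sum_{i=1}^{\infty} i \,\Pr(X=i) \\
     &= \sum_{i=1}^{\infty} i\, q^{i-1} p \\
     &= p \sum_{i=1}^{\infty} i\, q^{i-1} \\
     &= p \cdot \frac{1}{(1-q)^2} \\
     &= \frac{p}{p^2} = \frac{1}{p}
\end{aligned}
$$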


Lines three to four depend on some magic with infinite series. If you have a converging series where x < 1:
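That is, the geometric series and its term-by-term derivative:

$$
\sum_{i=0}^{\infty} x^i = \frac{1}{1-x}
\qquad\Longrightarrow\qquad
\sum_{i=1}^{\infty} i\,x^{i-1} = \frac{d}{dx}\,\frac{1}{1-x} = \frac{1}{(1-x)^2}
$$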

We use this last equation in the derivation above. So what else can we do with this? Suppose we want to know how many rolls of a die are needed to see all six sides come up at least once (given a six-sided, fair die, where each roll is an independent event). On the first roll, the probability of getting a side we haven't seen is 1. The next roll then has a 5/6 probability of yielding a unique side; after that unique side comes up, the probability becomes 4/6, and so on. So the expected number of rolls is 1 + 1/(5/6) + 1/(4/6) + 1/(3/6) + 1/(2/6) + 1/(1/6) = 14.7. Of course, I distrust math: I need evidence. And so Python to the rescue:
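A minimal sketch of such a simulation (seeded for repeatability):

```python
import random

def rolls_to_see_all_sides(rng, sides=6):
    """Roll a fair die until every side has appeared; return the roll count."""
    seen, rolls = set(), 0
    while len(seen) < sides:
        seen.add(rng.randint(1, sides))
        rolls += 1
    return rolls

rng = random.Random(42)
trials = 20_000
avg = sum(rolls_to_see_all_sides(rng) for _ in range(trials)) / trials
analytic = sum(6 / k for k in range(1, 7))   # 6/1 + 6/2 + ... + 6/6 = 14.7
print(f"simulated: {avg:.2f}  analytic: {analytic:.2f}")
```

The simulated average comes out close to the analytic 14.7.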


PyCon 2012 was a blast. One of the themes running through the conference was Python’s rising importance in data analysis, particularly as an exploratory and interactive platform. David Beazley points this out in his blog, taking note of the pandas and ipython projects.

pandas has always had time series functionality of necessity. It was originally developed for quantitative finance at AQR Capital Management, where Wes and I worked together several years back. There’s a rich DateOffset class to provide for date-time arithmetic (instances of which include Day, Bday, MonthEnd, etc.), and DateRange is a pandas Index composed of regularly recurring datetime objects for constructing time-indexed data structures.

However, we’ve also got a long-running branch in github to take pandas time series capabilities to the next level.

(FAIR WARNING: API subject to change!!!)


I’m currently merging the scikits.timeseries core functionality into pandas. This library is built around the concept of a Date object that carries with it frequency information. To me, it is helpful to think of Date as a particular interval of time. So for instance, a Date instance with daily frequency represents a particular day in which you have an associated observation; with a monthly frequency, the Date object is that particular month. For example,

As Date instances are represented internally as ordinals (essentially, the integer number of elapsed intervals since the interval containing the proleptic Gregorian date Jan 1, 0001), arithmetic and interval conversions can be blazingly fast. It can also be an idiomatic way to do date arithmetic. For instance, to get the last business day of this month:
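The standard library already stores dates this way (datetime.date.toordinal counts days from the proleptic Gregorian Jan 1, 0001), so the idea is easy to mimic. A stdlib-only sketch of "last business day of a month" (this is not the pandas API):

```python
from datetime import date, timedelta
import calendar

def last_business_day(year, month):
    """Last Mon-Fri day of the month, found by simple date arithmetic."""
    last = date(year, month, calendar.monthrange(year, month)[1])
    while last.weekday() > 4:          # 5 = Saturday, 6 = Sunday
        last -= timedelta(days=1)
    return last

print(last_business_day(2012, 4))      # 2012-04-30 (a Monday)
```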

The scikits.timeseries implementation turns out to be too rigid in certain ways. There’s no way to define half-day intervals, or say a daily set of intervals offset by one hour. One immediate difference with the pandas implementation is that it will allow for multiples of base intervals. For instance, in pandas timeseries:

And then of course, there is the associated IntervalIndex, which gives you a new index type to play with:

You can alter the interval as you might in scikits.timeseries (say for instance you need to coerce the index to the final hour of each observed interval):


Another large change is that DateRange will be deprecated in favor of DatetimeIndex. Rather than a numpy array of (pointers to) Python datetime objects, the representation is an array of 64-bit ints compatible with the numpy datetime64 dtype … the standard UTC-microseconds-from-unix-epoch-ignoring-leap-seconds resolution, to be exact. Accessing the index will yield a Timestamp object, a datetime subclass that carries frequency information (if any). I’ve worked hard to keep backward compatibility with the old DateRange, so hopefully this change will be fairly transparent.
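Under the hood that is just "microseconds since the epoch, as an int64"; a stdlib illustration of the representation (not pandas code):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_us(dt):
    """UTC microseconds since the unix epoch, as a plain int (leap seconds ignored)."""
    td = dt - EPOCH
    return td.days * 86_400_000_000 + td.seconds * 1_000_000 + td.microseconds

def from_us(us):
    return EPOCH + timedelta(microseconds=us)

t = datetime(2012, 4, 15, 12, 30, 0, 5, tzinfo=timezone.utc)
assert to_us(EPOCH) == 0
assert from_us(to_us(t)) == t   # lossless round trip
```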

For instance:

Unlike DateRange, a DatetimeIndex can be composed of arbitrary timestamps, making it irregular:

Additionally, there will be new slicing features:

Another new feature will be conversion and resampling using the pandas group-by machinery. For instance, we can do OHLC downsampling:
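Setting the pandas syntax aside, OHLC downsampling is just a group-by plus a reduction; a toy, hypothetical sketch of the computation:

```python
def ohlc(ticks, bucket):
    """Group (timestamp, price) ticks by bucket(timestamp); compute open/high/low/close."""
    out = {}
    for ts, px in ticks:
        key = bucket(ts)
        if key not in out:
            out[key] = [px, px, px, px]          # open, high, low, close
        else:
            o, h, l, _ = out[key]
            out[key] = [o, max(h, px), min(l, px), px]
    return out

ticks = [(0, 10.0), (20, 12.0), (40, 9.0), (70, 11.0)]
print(ohlc(ticks, bucket=lambda ts: ts // 60))   # minute buckets
```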

And upsampling:


Besides features, there are wins in deriving DatetimeIndex from Int64Index. We get khash-based hashing for free. The int64-compatible dtype allows for vectorized, cython-optimized datetime arithmetic and timezone conversions.


The timeline on stable release is still fuzzy, as there is still plenty to do:

  • API: Crafting an elegant, cohesive interface for the new features. Heavy dogfooding!
  • Internals: the DataFrame block manager needs to become datetime64-aware (perhaps interval-aware as well)
  • Plotting: scikits.timeseries had some great matplotlib plotting infrastructure we are doing our best to port

With all the fundamentals falling in place, pandas time series analysis will be a force to be reckoned with!

I like to think I know C well enough. I’ve read K&R cover to cover, and I’ve done my fair share of C hacking over the years. But I’ve learned that with C, you have to be prepared, always, and at all times, to eat humble pie. I feel compelled to share this whopper of a slice that came steaming hot from Peter van der Linden’s Expert C Programming, which I’ve now determined should be required reading for anyone even thinking about ever going near C.

#include <stdio.h>

int main(int argc, char *argv[]) {
    unsigned int x = 1;
    if (-1 < x)
        printf("-1 is less than 1\n");
    else
        printf("-1 is NOT less than 1\n");
    return 0;
}
Simple enough? WRONG. When a binary operator is applied to operands of different types (in this case, an unsigned int and an int), the “usual arithmetic conversions” apply.

Time to put on that language lawyer hat. According to the C99 rules:

If the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type is converted to the type of the operand with unsigned integer type.


The rank of any unsigned integer type is equal to the rank of the corresponding signed integer type, if any.

Which means, of course, that -1 is “promoted” to unsigned int, at least on the x86-64 platform. And you can be damn sure that 0xFFFFFFFF, which is 4,294,967,295 in decimal, is greater than one. Now imagine that your unsigned int definition lives in some other translation unit, long, long ago and far, far away. No, -Wall will not save you.
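You can observe the same reinterpretation from Python, since ctypes models C’s fixed-width types:

```python
import ctypes

# what -1 becomes once the usual arithmetic conversions make it unsigned
print(ctypes.c_uint(-1).value)   # 4294967295
```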

Another psychopathic killer: automatic string concatenation. Take the following code:

#include <stdio.h>

int main(int argc, char *argv[]) {
    char *config[] = {
        "a man ",
        "a plan ",
        "a canal "
        "panama ",
    };

    printf("%zu\n", sizeof(config)/sizeof(config[0]));
    printf("%s\n", config[2]);

    return 0;
}
Do you see the bug? I’ve misplaced the comma, which should come after “a canal ” instead of “panama ”. Don’t let that final comma fool you: the C compiler just ignores it. The compiler is, however, just jonesing to drop the NUL terminator from “a canal ” and concatenate “panama ” onto it, so you’ve wound up with three, not four, config entries, and with “a canal panama ” as your third.
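Incidentally, Python inherits the same footgun, since adjacent string literals are concatenated there too:

```python
config = [
    "a man ",
    "a plan ",
    "a canal "   # missing comma: this literal fuses with the next one
    "panama ",
]
print(len(config))   # 3
print(config[2])     # a canal panama
```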

Happy bug hunting!

My friend was kind enough to give me a 64 gig solid-state drive to throw into my development box. I was curious how its speed would compare to what I’ve been running: a four-disk RAID5 array of Seagate Barracuda 7200 320GB 7200 RPM SATA (3.0Gb/s) drives on an Areca ARC-1210 PCI-Express x8 SATA II (3.0Gb/s) controller card. Just checked my Newegg order history: two of the drives are from January 2006, the other two from Sept 2006. I didn’t know they had gotten so old! Haven’t had any failures yet, though.

A few tests (sdb5 = raid5, sda5 = ssd):

sudo hdparm -tT /dev/sdb5

 Timing cached reads:   15534 MB in  1.99 seconds = 7788.19 MB/sec
 Timing buffered disk reads: 456 MB in  3.01 seconds = 151.39 MB/sec

sudo hdparm -tT /dev/sda5

 Timing cached reads:   14776 MB in  1.99 seconds = 7407.07 MB/sec
 Timing buffered disk reads: 558 MB in  3.00 seconds = 185.80 MB/sec

Interesting: the cached throughput on the RAID controller is very good, beating my motherboard controller by a sound 400MB/sec. The buffered disk reads on the SSD are faster by about 30MB/sec. What about random access seek times? I found some C code to do random seeks. Save the code as seeker.c, and compile with gcc seeker.c -o seeker.


#define _LARGEFILE64_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>
#include <signal.h>
#include <sys/fcntl.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

#define BLOCKSIZE 4096
#define TIMEOUT 30

int count;
time_t start;

void done()
{
    time_t end;

    time(&end);

    if (end < start + TIMEOUT) {
        /* not done yet: print a progress dot and re-arm the alarm */
        printf(".");
        alarm(1);
        return;
    }

    if (count) {
        printf(".\nResults: %d seeks/second, %.2f ms random access time\n",
               count / TIMEOUT, 1000.0 * TIMEOUT / count);
    }
    exit(EXIT_SUCCESS);
}

void handle(const char *string, int error)
{
    if (error) {
        perror(string);
        exit(EXIT_FAILURE);
    }
}

int main(int argc, char **argv)
{
    char buffer[BLOCKSIZE];
    int fd, retval;
    unsigned long numblocks;
    off64_t offset;

    setvbuf(stdout, NULL, _IONBF, 0);

    printf("Seeker v2.0, 2007-01-15, "
           "http://www.linuxinsight.com/how_fast_is_your_disk.html\n");

    if (argc != 2) {
        printf("Usage: seeker <raw disk device>\n");
        exit(EXIT_SUCCESS);
    }

    fd = open(argv[1], O_RDONLY);
    handle("open", fd < 0);

    retval = ioctl(fd, BLKGETSIZE, &numblocks);
    handle("ioctl", retval == -1);
    printf("Benchmarking %s [%luMB], wait %d seconds",
           argv[1], numblocks / 2048, TIMEOUT);

    time(&start);
    srandom(start);
    signal(SIGALRM, &done);
    alarm(1);

    for (;;) {
        /* seek to a random block and read it */
        offset = (off64_t) numblocks * random() / RAND_MAX;
        retval = lseek64(fd, BLOCKSIZE * offset, SEEK_SET);
        handle("lseek64", retval == (off64_t) -1);
        retval = read(fd, buffer, BLOCKSIZE);
        handle("read", retval < 0);
        count++;
    }
    /* notreached */
}
You need an accurate blocksize; for me, that was 4096.

$ sudo dumpe2fs /dev/sdb5 | grep 'Block size'
dumpe2fs 1.41.14 (22-Dec-2010)
Block size:               4096

So the results?


Seeker v2.0, 2007-01-15, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/sdb5 [420909MB], wait 30 seconds..............................
Results: 85 seeks/second, 11.73 ms random access time


Seeker v2.0, 2007-01-15, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/sda5 [27235MB], wait 30 seconds..............................
Results: 5429 seeks/second, 0.18 ms random access time

The SSD is a two-orders-of-magnitude improvement over the RAID5. Not surprising, but still pretty awesome.

Finally, I wanted to migrate my /home directory onto the SSD. This process worked for me:

sudo mkdir /mnt/ocz
sudo mount /dev/sda5 /mnt/ocz
cd /home
sudo find . -depth -print0 | sudo cpio --null --sparse -pvd /mnt/ocz
sudo mv /home /home_old
sudo mkdir /home
sudo mount /dev/sda5 /home

Don’t forget to add the mount to /etc/fstab so it survives a reboot! Best to use the UUID of the device, so:

11:29 ~ $ sudo blkid
[sudo] password for adam:
/dev/sda5: LABEL="Linux_OCZ" UUID="..." TYPE="ext4"

So /etc/fstab gets another entry, e.g.:

# /etc/fstab: static file system information.
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# /home => ocz ssd disk
UUID=...        /home           ext4    errors=remount-ro 0       1

As my colleague Wes McKinney likes to say (quoting Matthew Goodman): “Are you using IPython? If not, you’re doing it wrong!”

You shouldn’t have to wait for an exception to invoke the interactive debugger, and you definitely should be using the IPython debugger. One convenience function in the pandas (pandas.util.testing) code base is this:

def debug(f, *args, **kwargs):
    from pdb import Pdb as OldPdb
    try:
        from IPython.core.debugger import Pdb
        kw = dict(color_scheme='Linux')
    except ImportError:
        Pdb = OldPdb
        kw = {}
    pdb = Pdb(**kw)
    return pdb.runcall(f, *args, **kwargs)

You can invoke it on a function and arguments like so:

debug(test_function, arg1, arg2, named_arg1='hello')

You will get all the interactive IPython goodness as you step through your code. Funnily enough, the qtconsole version doesn’t seem to support tab completion. Maybe I’ll file a bug report…

The Python path determines how the Python interpreter locates modules. How exactly does Python construct the path?

Using the official docs on sys.path, with its footnote reference to the site module, I’ll recap the process.

If a script is executed, the interpreter sets the first entry of sys.path to that script’s directory. If Python is launched interactively, the first entry is the empty string (“”), meaning Python will scan the present working directory first. The next entries of sys.path are the contents of the PYTHONPATH environment variable, if it exists. Then, installation-dependent entries are appended (example below).

When initializing, the interpreter normally imports the site module automatically. On import, site executes code to find .pth files in known site-packages locations; each entry in a .pth file is either a path to add to sys.path or an import statement to execute. If we really want to trace what’s going on, we can launch a Python interpreter with -S to prevent loading the site module automatically, and instead trace the import.

(Note, I am working within a virtualenv called py27.)

(py27) ~$ python -S
Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
[GCC 4.6.1] on linux2
>>> import sys
>>> for p in sys.path:
...     print p


I have no PYTHONPATH, so these are just my installation-dependent paths. Now, we need to add the directory where the pdb module lives, so we can import it:

>>> sys.path += ["/usr/lib/python2.7"]
>>> import pdb
>>> pdb.run("import site")
> <string>(1)<module>()
(Pdb) s
> /home/adam/.virtualenvs/py27/lib/python2.7/site.py(64)()
-> """

I’ll spare you the debugging session details, and summarize what I see:

– site.py grabs orig-prefix.txt from <VIRTUAL_ENV>/lib/python2.7, which for me contains “/usr”, and extends the sys.path array to contain additional “/usr”-based paths.

– site.py then scans the site-packages (in lib/python2.7). For each .pth file (in alphabetical order), step through its entries. If an entry begins with “import”, call exec() on the line; otherwise append the (absolute) path to sys.path. Then do the same in the user site-packages directory (in local/lib/python2.7).
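That per-file logic can be sketched in a few lines of Python (a simplified stand-in for what site.addpackage actually does, not the real implementation):

```python
import os
import sys

def process_pth(sitedir, name):
    """Apply one .pth file: exec 'import ...' lines, append other lines as paths."""
    with open(os.path.join(sitedir, name)) as f:
        for line in f:
            line = line.rstrip()
            if not line or line.startswith('#'):
                continue
            if line.startswith(('import ', 'import\t')):
                exec(line)                      # executable .pth entry
            else:
                path = os.path.abspath(os.path.join(sitedir, line))
                if os.path.exists(path) and path not in sys.path:
                    sys.path.append(path)
```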

Note, easy-install.pth contains executable code, e.g.:

import sys; sys.__plen = len(sys.path)
import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)

The executable lines move all the entries (some of which are .egg zipped packages) up to the top of the path.

– After stepping through all .pth files, add the existing site-packages directories themselves.

– Finally, site attempts to import sitecustomize (which doesn’t do anything on my install).

© 2014 Adam Klein’s Blog. Suffusion theme by Sayontan Sinha, modified by Adam :)