The Python path determines how the Python interpreter locates modules. How exactly does Python construct the path?

Using the official docs on sys.path, with its footnote reference to the site module, I’ll recap the process.

If a script is executed, the interpreter sets the first entry of sys.path to that script’s directory. If Python is launched interactively, the first entry is the empty string (“”), meaning Python scans the current working directory first. The next entries of sys.path are the contents of the PYTHONPATH environment variable, if it is set. Then, installation-dependent entries are appended (example below).

When initializing, the interpreter normally imports the site module automatically. The module, on import, executes code to find .pth files in known site-packages directory locations, which themselves contain entries which are either paths to add to sys.path, or import calls. If we really want to trace what’s going on, we can launch a Python interpreter with -S to prevent loading the site module automatically, and instead trace the import.
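You can see the effect of the site module directly by comparing the two path configurations programmatically (a quick sketch; -S is the same flag described above):

```python
import json
import subprocess
import sys

def interpreter_path(extra_flags):
    # Launch a fresh interpreter and ask it to report its sys.path
    out = subprocess.check_output(
        [sys.executable] + extra_flags
        + ["-c", "import sys, json; print(json.dumps(sys.path))"])
    return json.loads(out)

with_site = interpreter_path([])        # normal startup: site imported
without_site = interpreter_path(["-S"]) # site suppressed

# The difference is what site contributed (site-packages, .pth entries)
print(len(with_site), len(without_site))
```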

(Note, I am working within a virtualenv called py27.)

(py27) ~$ python -S
Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
[GCC 4.6.1] on linux2
>>> import sys
>>> for p in sys.path:
...     print p


I have no PYTHONPATH, so these are just my installation-dependent paths. Now, we need to add the directory where the pdb module lives, so we can import it:

>>> sys.path += ["/usr/lib/python2.7"]
>>> import pdb
>>>"import site")
> <string>(1)<module>()
(Pdb) s
> /home/adam/.virtualenvs/py27/lib/python2.7/
-> """

I’ll spare you the debugging session details, and summarize what I see:

– first, virtualenv’s grabs orig-prefix.txt from <VIRTUAL_ENV>/lib/python2.7, which for me contains “/usr”, and extends sys.path with additional “/usr”-based paths.

– then scans the site-packages directory (in lib/python2.7). For each .pth file (in alphabetical order), it steps through the entries: if an entry begins with “import”, the line is exec’d; otherwise, the (absolute) path is appended to sys.path. It then does the same in the user site-packages directory (in local/lib/python2.7).
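That .pth scan can be sketched in plain Python (a simplified model of what site.addsitedir does; the real code also de-duplicates against known paths and skips missing directories):

```python
import os
import sys

def process_pth_file(sitedir, pth_path):
    # Simplified model of how site consumes a .pth file: lines starting
    # with "import" are executed; every other non-blank, non-comment line
    # is treated as a path relative to the site dir and appended to sys.path.
    with open(pth_path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue
            if line.startswith(("import ", "import\t")):
                exec(line)
            else:
                entry = os.path.abspath(os.path.join(sitedir, line))
                if entry not in sys.path:
                    sys.path.append(entry)
```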

Note, easy-install.pth contains executable code, e.g.:

import sys; sys.__plen = len(sys.path)
import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)

The executable lines move all the entries (some of which are .egg zipped packages) up to the top of the path.
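You can simulate the effect of those two lines on an ordinary list standing in for sys.path to see the reordering:

```python
# Simulate the easy-install.pth splice trick on a plain list.
path = ["stdlib", "site-packages"]

# First executable line: remember how long the path was beforehand.
plen = len(path)

# ...the .pth's egg entries get appended in between the two lines...
path += ["foo.egg", "bar.egg"]

# Second executable line: cut out the newly appended entries and splice
# them in at the front (subsequent .pth files insert after earlier eggs).
new = path[plen:]
del path[plen:]
egginsert = 0
path[egginsert:egginsert] = new
egginsert += len(new)

print(path)  # eggs now precede the original entries
```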

– After stepping through all .pth files, add the existing site-packages directories themselves.

– Finally, attempts to “import sitecustomize” (which doesn’t do anything on my install).

Cython is my new favorite tool. It lets you write compiled C extension modules for the CPython interpreter using annotated Python for speed-critical parts and pure Python for non-critical parts. Further, you can import and call C functions directly. The user guide is (surprisingly?) well-written.

In particular, it lets you do blazing computations using Numpy. See this excellent whitepaper.

But what about that old Python extension module you have lying around? What if you want to utilize Cython to call into it, fast, bypassing its Python API? You don’t want to rip out all the C(++) code you care about from that module and recompile it into a new Cython extension module. Or maybe you do. But suppose you don’t.

You’ll just have to give that rickety old extension a C API and expose it properly!

Let’s imagine you’ve got a function “myfunc” in your old extension module called “myold”. So for example in the file myoldmodule.cpp you may have:

static float64_t myfunc(float64_t x) { ... }

We need to create a new header file, myold_capi.h, that declares and exports the relevant symbols living in the compiled myold module, which we would like to import into the new Cython module and call. We use the Python Capsule mechanism for this, and the following comes right out of the Python documentation’s Capsule example.

#ifndef _MYOLD_CAPI_H_
#define _MYOLD_CAPI_H_

/* import required header files here */

#ifdef __cplusplus
extern "C" {
#endif

/* Total number of C API functions to export */
#define MYOLD_CAPI_pointers 1

/* C API functions to export */
#define MYOLD_myfunc_NUM 0
#define MYOLD_myfunc_RETURN float64_t
#define MYOLD_myfunc_PROTO (float64_t x)

#ifdef MYOLD_MODULE
/* This section is used when compiling myold */

static MYOLD_myfunc_RETURN myfunc MYOLD_myfunc_PROTO;

#else
/* This section is used in modules that compile against myold's C API */

static void **MYOLD_CAPI;

#define myfunc \
     (*(MYOLD_myfunc_RETURN (*)MYOLD_myfunc_PROTO) MYOLD_CAPI[MYOLD_myfunc_NUM])

/* Return -1 on error, 0 on success.
   PyCapsule_Import will set an exception if there's an error.  */

static int
import_myold(void)
{
    MYOLD_CAPI = (void **)PyCapsule_Import("myold._C_API", 0);
    return (MYOLD_CAPI != NULL) ? 0 : -1;
}

#endif /* !defined(MYOLD_MODULE) */

#ifdef __cplusplus
}
#endif

#endif /* !defined(_MYOLD_CAPI_H_) */

Now, we have to include this header in our old module, myoldmodule.cpp. Since the header serves both the exporting module and its clients, define MYOLD_MODULE first so the producer section gets compiled. So right before, say,

PyObject* pModule = 0;

Add these lines:

#define MYOLD_MODULE
#include "myold_capi.h"

Finally, in your PyInit_myold() or initmyold() function that initializes your module, you need to create the Capsule holding the array of function pointers you are exporting:

    // start capsule creation for C API
    static void *MYOLD_CAPI[MYOLD_CAPI_pointers];

    MYOLD_CAPI[MYOLD_myfunc_NUM] = (void *)myfunc;

    /* Create a Capsule containing the API pointer array's address */
    PyObject *c_api_object = PyCapsule_New((void *)MYOLD_CAPI, "myold._C_API", NULL);

    if (c_api_object != NULL)
        PyModule_AddObject(pModule, "_C_API", c_api_object);

    // end capsule creation

Awesome. Now, does your old C module still compile? I hope so!

Next, we need to create a new Cython header file, myold.pxd. It should look something like this:

cdef extern from "myold_capi.h":
    # assumption: float64_t is a typedef for double in the header
    ctypedef double float64_t
    # C-API exports via the myold capsule
    float64_t myfunc(float64_t x)
    # must call this before using the module
    int import_myold()

Now, go ahead and write your new Cython module, for example mynew.pyx:

from myold cimport *

# The following call is required to initialize the
# static capsule variable that holds the pointers
# to the myold C API functions
import_myold()

cdef class NewClass():
    cpdef float64_t mynewfunc(self, float64_t x):
        return myfunc(x)

Not too bad!

Hardware driver stories often end in tears. But not today! I wanted to get my dual monitors set up with my good ole’ GTS 250 and got a cringe-inducing nvidia-settings error:

“Failed to set MetaMode (1) ‘DFP-0: nvidia-auto-select@1680×1050 +0+0, DFP-1: nvidia-auto-select@1680×1050 +0+0’ (Mode 1680×1050, id: 52) on X screen 0.”

Could a recently-released, updated driver solve this issue? Yes.

From the command line:

sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
sudo apt-get update
sudo apt-get install nvidia-current

Next: reboot. Update additional packages per any suggestions.

No tears!

After reading this post about equations every computer scientist should know, I got to thinking. I love clear, plain-English explanations of powerful mathematical concepts. Not the what, but the why. I’d like to share my understanding of the binomial formula:

N choose K = N! / [K!x(N-K)!]

This is usually stated “N choose K”, and provides a way to calculate the number of ways in which you may select (or draw) K items from a collection of N items in total, where you don’t care about the order in which you draw them.

Now, if we have N elements, there are N ways to choose a first element; and, if you don’t put that first element back, N-1 ways to choose a second; and so on. That means the total number of ways to draw N elements (counting the order in which we draw them) is N x (N-1) x … x 1. We call this N!, “N factorial”.

But, we don’t care about drawing N elements. We actually care about just K elements; the remaining (N-K) don’t matter. For example, if we are trying to figure out 5 choose 2, we draw from five choices, and then from four, but that’s it. No need to worry about the other (5-2) = 3 draws. So, instead of calculating 5×4×3×2×1, we calculate just 5×4 by “removing” 3×2×1, by dividing. That is, we calculate (5×4×3×2×1)/(3×2×1), or, 5!/(5-2)!. That’s where the (N-K)! in the denominator comes from.

Finally, when we draw K elements, we don’t care about the order in which we draw them. Following our example with 5 choose 2: if we count 5×4 = 20 choices, we double-count the cases where we choose the same two items in a different order. That means we have to “remove” (divide by) the number of ways to order the K chosen elements. This is K! ways.

So, there’s the prize: N choose K = N! / [K!x(N-K)!]
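In code, the formula checks out against brute-force enumeration (a quick sketch):

```python
from itertools import combinations
from math import factorial

def choose(n, k):
    # N! / (K! x (N-K)!)
    return factorial(n) // (factorial(k) * factorial(n - k))

# 5 choose 2: 20 ordered draws, halved because order doesn't matter
print(choose(5, 2))                             # 10
# Cross-check by enumerating every unordered selection directly
print(len(list(combinations(range(5), 2))))     # 10
```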

I’ve got a spanking new install of Kubuntu 11.10, and I need to get it set up for Python data hacking. Sure, I could spring for an Enthought Python Distribution, but where would be the masochism in that?

Inspired by Zed, let’s do this the hard way.

The Linux distro comes with Python 2.7.2. Perfect! Or, use Pythonbrew to set up a local Python build that you want to use. I presume you know how to get to the command line, as well as how to edit text files using emacs, vim, pico, whatever.

Let’s get some tools:

sudo apt-get install git gfortran g++

We need to compile against Python headers and get setuptools and pip:

sudo apt-get install python-dev python-pip

Let’s isolate our Python distro from carnage:

sudo apt-get install python-virtualenv
sudo pip install virtualenvwrapper

Now add these lines to your ~/.bashrc:

source /usr/local/bin/
export WORKON_HOME=$HOME/.virtualenvs

Now open a new terminal and establish a virtual environment, say “py27”:

mkvirtualenv py27
workon py27

We need some math libraries (ATLAS + LAPACK):

sudo apt-get install libatlas-base-dev liblapack-dev

Ok, now to install and build all the scientific python hotness:

pip install numpy scipy

For matplotlib, we need lots of libraries. This one is dependency heavy. Note we can ask Ubuntu what we need, what’s installed, and what is not:

apt-cache depends python-matplotlib | awk '/Depends:/{print $2}' | xargs dpkg --get-selections

Easiest thing to do is just build all the dependencies (just say yes if it asks to install deps of matplotlib instead of python-matplotlib):

sudo apt-get build-dep python-matplotlib

Ok, now this should work:

pip install matplotlib

Now, of course, we need the IPython interpreter. Don’t settle for 0.11!

pip install -e git+
cd ~/.virtualenvs/py27/src/ipython
python install

Note, you may need to sudo rm /usr/bin/ipython if there is a conflict.

Ok, let’s beef up the IPython interpreter. Note the pip commands FAIL. This is ok. We’ll do it by hand.

sudo apt-get install qt4-dev-tools

pip install sip
cd ~/.virtualenvs/py27/build/sip
sudo make install

pip install pyqt
cd ~/.virtualenvs/py27/build/pyqt
sudo make install

# clean up
cd ~/.virtualenvs/py27/
rm -rf build

Just a few more things; you won’t be disappointed.

sudo apt-get install libzmq-dev
pip install tornado pygments pyzmq

Alright, let’s get pandas. It’s under heavy development (Wes is a beast), so let’s pull the latest from git.

pip install nose cython
pip install -e git+

# we should get statsmodels too
pip install -e git+

Btw, you’ll note this git stuff goes into your ~/.virtualenvs/py27/src directory, if you want to git pull and update.

OK! Phew! For the grand finale:

Run the amazing qtconsole:

ipython qtconsole --pylab=inline

Or the even more amazing WEB BROWSER:

ipython notebook --pylab=inline

Launch a browser and point to http://localhost:8888/. For kicks, try opening one of Wes’s tutorial workbooks, here. You may have to fiddle a bit, but it should work.


I promised I would say something on why I switched to Homebrew from MacPorts. Ted Wise has a good rundown. I was slightly annoyed when MacPorts started to pull in so many dependencies and do 45-minute builds – basically, it trusts nothing already installed on OS X. Although, depending on your needs, this may not be a bad thing.

My take on it:

– if you need a tool here and there that is missing on OS X, go with Homebrew.
– if you are doing hardcore open-source development, Macports is a more complete answer.

A couple industrial-strength tools from MacPorts:

– gcc_select, for when you want more than one gcc install
– the GNU Ada compiler, gnat-gcc. You definitely ain’t gonna find Ada in Homebrew.


The Euclidean Algorithm is one of the first algorithms ever written down. It is a darling of computer science. The algorithm cleverly reveals the greatest common divisor of two natural numbers. Despite its ancient origins, it is still well-used today (and not just for teaching). It can determine when two numbers are relatively prime – ie, when their greatest common divisor is one – which has important modern applications in cryptography. And it does so efficiently.

Here is a cute implementation:

def gcd(a,b):
    return a if b == 0 else gcd(b,a % b)

(NB. Error checking omitted for clarity.)

Look ma, recursion! To spell it out, when b is zero, then gcd(a,b) = a. When b is non-zero, then gcd(a,b) = gcd(b, r), where r is the remainder of a divided by b. Let’s do a short example: gcd(12, 8). We have:

gcd(12, 8) = gcd(8, 4) = gcd(4, 0) = 4

Ok, so what the heck is going on here? The gcd algorithm is deceptively simple looking. However, it leverages some rather deep properties of the natural numbers.

First, a little background. The titular Euclid, that crafty fellow, laid the groundwork for what we now call the Fundamental Theorem of Arithmetic, which says that any natural number greater than 1 can be expressed (uniquely!) as a product of primes. For example,

12 = 2 x 2 x 3 = 4 x 3
8 = 2 x 2 x 2 = 4 x 2

You can see by inspecting the prime factors that the gcd is 4. But factoring numbers is computationally difficult, especially when they get larger. The gcd algorithm is so efficient (and remarkable) in part because it doesn’t have to do any factoring!

One useful fact is that when a number divides x and y, that number also divides x + y and x – y. You can see this by playing around with prime factors again. In the example above:

12 – 8 = 4 x 3 – 4 x 2 = 4 x (3 – 2)
12 + 8 = 4 x 3 + 4 x 2 = 4 x (3 + 2)

You can see that 4 is a factor not only of 12 and 8, but also 12 + 8, and 12 – 8. Still with me?

Ok, one other thing to get under your belt is the division theorem. It says we can come up with a unique whole number quotient q and remainder r for any natural number division a / b. That is, a = qb + r for some unique whole numbers q and r, where 0 <= r < b.
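Python’s built-in divmod hands you exactly this q and r, so the theorem is easy to poke at (a quick illustration):

```python
# The division theorem in action: a = q*b + r, with 0 <= r < b
a, b = 12, 8
q, r = divmod(a, b)
print(q, r)           # quotient and remainder of 12 / 8

# The two defining properties of q and r:
assert a == q * b + r
assert 0 <= r < b
```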

Finally, we can write out step by step what is happening in the Euclidean algorithm. We start with the arguments a and b, and put a row in the table which shows the current arguments along with what is computed (ie, the remainder a % b). We add a row for each recursive invocation.

step   division (a = q·b + r)
0      a      = q[0]·b      + r[0]
1      b      = q[1]·r[0]   + r[1]
2      r[0]   = q[2]·r[1]   + r[2]
… etc …
n-1    r[n-3] = q[n-1]·r[n-2] + r[n-1]
n      r[n-2] = q[n]·r[n-1]   + r[n]
n+1    return r[n-1]

At each step, we pass in as arguments to the gcd function the values in columns a and b, and we compute the remainder a % b of their division and, if we want, the quotient, but we can ignore the quotient for now. Wash, rinse, repeat.

Let’s check out gcd(12, 8):

step   division (a = q·b + r)
0      12 = 1·8 + 4
1      8  = 2·4 + 0
2      return 4
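An iterative version of gcd can print those table rows as it goes (a small sketch mirroring the recursion above):

```python
def gcd_trace(a, b):
    # Iterative Euclidean algorithm, printing each division step
    step = 0
    while b != 0:
        q, r = divmod(a, b)
        print("step %d: %d = %d * %d + %d" % (step, a, q, b, r))
        a, b = b, r
        step += 1
    print("step %d: return %d" % (step, a))
    return a

gcd_trace(12, 8)   # reproduces the table: 12 = 1*8 + 4, 8 = 2*4 + 0, return 4
```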

In the nth iteration, the final remainder, r[n], is zero. Why is this? It’s because the remainder r of a division such as a = q b + r is strictly less than b, so the arguments shrink inexorably throughout the computation, and eventually you get down to zero.

In row n, where r[n] = 0, the dividend and divisor are r[n-2] and r[n-1], so we know that r[n-1] divides r[n-2] evenly (since there is no remainder to the division!) And clearly r[n-1] divides itself. So that means r[n-1] divides r[n-3] too, since it divides both terms of the sum which defines it (see row n-1). Likewise, we can work all the way back up the call stack in order to say that r[n-1] divides both the original arguments a and b as well.

Wait! We still don’t know whether that final non-zero remainder r[n-1] is the greatest among all possible common divisors. We do know that divisor r[n-1] is <= gcd, by definition of gcd. But we need to prove one more statement: ANY common divisor of a and b divides ALL the remainders r[i] of the algorithm.

ORLY? Well, just look at the procedure above and notice how the first remainder can be written r[0] = a – q[0] b. So anything that divides a and b must divide r[0]. Likewise, r[1] = b – q[1] r[0], so anything that divides b and r[0] must divide r[1]. Following the turtles all the way down the rabbit hole: any common divisor of a and b divides ALL remainders r[i].

One common divisor in particular is the gcd. Since it divides r[n-1] in particular, we can say that gcd <= r[n-1].

But if gcd <= r[n-1] and r[n-1] <= gcd, doesn’t that mean they are equal? Oh yes indeedy-do it does. Quod erat demonstrandum, baby!

PS: for some visual intuition of the algorithm, check out the line-segment drawings in the top right hand corner of the Wikipedia entry.

Per my brother’s suggestion, I decided to move my blog to a top-level domain. While I’m at it, I thought, I might as well figure out how to host my own WordPress install. That way I can have handy control over the plug-ins, so I can do awesome things like code coloring:

#include <iostream>

int main() {
  std::cout << "Hello syntax highlighting!\n";
  return 0;
}
I’m using Codecolorer. I also decided I need more control over my theme, so I switched to Suffusion, which is extremely configurable. One thing that puzzled me a bit was how to stop flush-justifying all the text. The answer: in the Dashboard, go to Appearance > Editor (to get at the CSS), find ‘justify’, and replace it with ‘left’.

Ok, so back to the domain name. I purchased mine from Namecheap for, you guessed it, cheap. I’m not sure whether it’s a blessing or a curse to share my name with so many other Adam Kleins, but in college I once got a birthday check from another guy’s grandmother. I’m curious if I could have cashed it legally –

Then I got some reasonable hosting from DreamHost, which has a one-click install of WordPress.  I think I’m sharing a web server instance on IIS, because setting up my WordPress install at first led to some weird IIS errors.  But now obviously it’s ok. They have some helpful tutorials online.

Oh yeah, the final thing was to export my old blog content into an XML file which I then ftp’d to my blog site directory at DreamHost where I could then import it into the new WordPress instance.  Worked like a charm!

I just “figured out” (read: noticed) how to enable shortcut keys in GMail.

Inspired, I see, in part by Vim key bindings:

  • ‘j’ for down
  • ‘k’ for up
  • ‘o’ for open
  • ‘#’ for delete
  • ‘g i’ for go to inbox
  • ‘l’ for label
My efficiency just shot up 95%!

I have two hard disks on my new iMac:

– solid-state disk: 256 GB. Core operating system files and application installs.
– regular HD: 1 TB. User files and media.

I moved my user directory from /Users/adam on the SSD, where it was installed by default, to a new folder (/adam) on the secondary hard drive (/Volumes/Macintosh HD 2). This caused problems when I started doing Unix-ish things, especially with Python: Python does NOT like spaces in your directory path.

I soft-linked the original directory (/Users/adam) to the new directory (/Volumes/Macintosh HD 2/adam) using

ln -s ~ /Users/adam

I then (re)set my user directory in OS X to /Users/adam. Still, some stuff “saw” the aliased directory /Volumes/Macintosh HD 2/adam. In the end, I renamed my “Macintosh HD” directories – their factory-installed names – to “HD1” and “HD2”. Problem solved, so far.
