Interpreting GHC cost centre stacks

A little-known feature of GHC is the ability to run a program (that has been compiled with profiling) with the +RTS -xc option. Every time an exception is raised, a stack trace will be dumped to stderr. This is currently the only way to get stack traces for your Haskell code. Unfortunately:

  • This stack trace is constructed from the cost-centre stack, so you'll only see names from modules you compiled in profiling mode with -auto-all or similar
  • The stack trace is shown regardless of whether the exception in question is actually handled, so you are likely to see lots of spurious stack traces from exceptions that were caught and dealt with

It's still better than the alternative (i.e. wailing about the lack of proper Haskell stack traces) though.
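To see what -xc produces, here's a minimal program to experiment with (the function names are purely illustrative, not from any real codebase). Compile it with -prof -auto-all and run it with ./Demo +RTS -xc, and the error call should yield a cost-centre stack mentioning inner and outer:

```haskell
module Main where

import Control.Exception (ErrorCall, evaluate, try)

-- With -prof -auto-all, 'inner' and 'outer' each get a cost centre,
-- so they appear in the +RTS -xc stack trace when the error fires.
inner :: Int -> Int
inner 0 = error "boom"
inner n = n

outer :: Int -> Int
outer n = inner n + 1

main :: IO ()
main = do
  r <- try (evaluate (outer 0)) :: IO (Either ErrorCall Int)
  -- -xc prints a trace here even though we catch the exception:
  -- exactly the spurious-trace caveat mentioned above.
  putStrLn (either (const "caught") show r)
```

Note that the trace appears even though this program handles the exception perfectly well and prints "caught".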

Now, the question is how to interpret the mess you get out. Here is an example stack trace from my current debugging session (I've added linebreaks to aid comprehension):


(I'm trying to figure out why OpenShake reliably dies with the dreaded "blocked indefinitely on MVar" exception.)

The answer lies in the GHC source code (see fprintCCS in Profiling.c). The items at the front of the stack trace are at the top of the stack, and hence correspond approximately to the location from which the exception was thrown.

Looking forward, it would be awesome if e.g. the Google Summer of Code could sponsor a student to take my work on GHC ticket 3693 further. Personally I would find stack trace support for Haskell totally invaluable, and I'm sure I'm not alone.

TEDxCam Hackathon

I spent Saturday at the TEDxCam Hackathon. This was an event which brought together more than 40 coders to work in teams on projects in the area of data visualisation, competing for gadgets and tickets to the main conference.

Here is our team, hard at work (thanks to Dawson King for the photo and Paul Querelle for the brightness fix):

Our team decided to combine Timetric population data with general election outcomes from the Guardian to produce a movie showing the political evolution of the UK.

The idea was to colour each electoral constituency on the map according to the winner of the election there. There were several problems with completing this apparently simple task:

  • There is no good data source for where constituencies actually are, or what their boundaries are
  • There is also no good data source for population over time at a fine enough scale
  • The Guardian election data went back only 4 years

Of necessity, we had to find a smart way to use the limited data available. Because we didn't know constituency boundaries, but might have been able to find their centrepoints, we decided to use a Voronoi diagram to approximate the boundaries. The idea of such a diagram is that you divide up the 2D plane into cells around each input point so that every point in that cell is nearer to that input point than any other. Graphically, that means drawing the lines in this diagram:

In fact, we decided to go one better than that, and used a weighted Voronoi diagram. In this scheme, cells corresponding to constituencies could grow depending on the population of voters in that region.
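The nearest-site assignment at the heart of both schemes can be sketched in a few lines (this is illustrative code, not our actual hackathon implementation; the names and the division-by-weight distance are my own choices here). With every weight set to 1 it degenerates to an ordinary Voronoi diagram:

```haskell
module Main where

import Data.List (minimumBy)
import Data.Ord (comparing)

-- A site is a constituency centre point plus a population weight.
data Site = Site { siteName :: String, siteX, siteY, siteWeight :: Double }

-- Weighted distance: dividing by the weight makes heavily populated
-- sites effectively "closer", so their cells grow.
weightedDist :: (Double, Double) -> Site -> Double
weightedDist (x, y) s = sqrt ((x - siteX s) ^ 2 + (y - siteY s) ^ 2) / siteWeight s

-- Which cell does a point of the plane fall into?
classify :: [Site] -> (Double, Double) -> String
classify sites p = siteName (minimumBy (comparing (weightedDist p)) sites)

main :: IO ()
main = do
  let sites = [Site "A" 0 0 1, Site "B" 10 0 3]
  -- The midpoint is equidistant, but B's weight pulls it into B's cell.
  putStrLn (classify sites (5, 0))
```

Rasterising the map is then just a matter of running classify over every pixel, which is exactly why our naive version was so slow compared to CGAL.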

The idea was that we could then make a movie showing how these cells grew, shrank, and changed color over time. Unfortunately it didn't quite come together on the day, but we did produce some lovely still images for the judges:

I spent some time on Sunday hacking the code up with some other data sources to produce a fully-fledged movie, which you can watch on Youtube. This version uses a power diagram, mainly because CGAL has some code for computing such diagrams much faster than we were able to compute our weighted Voronoi diagrams (10 minute rendering times became 10 seconds with this change!). Unfortunately, the code is buggier than a bucket full of cockroaches, as you will see if you watch the video 🙂

The Hackathon was tremendous fun, even if our project didn't turn out exactly as hoped! Thanks to the organisers (Cong Cong Bo and Dawson King) and to the excellent team who hacked through a long day to produce the stuff I've shown you above:

  • Max Bolingbroke
  • Nick Pilkington
  • Paul Querelle
  • Diva Tommei
  • George Weller

Ditaa support for gitit

I hacked together a quick plugin for the most excellent gitit wiki today. It's written in Haskell, so it's an absolute pleasure to write code for it.

What I added support for is a neat little tool called ditaa (DIagrams Through Ascii Art). Basically, in the markdown source of your Gitit wiki you can now write something like the following:

~~~ {.ditaa}
+--------+   +-------+    +-------+
|        | --+ ditaa +--> |       |
|  Text  |   +-------+    |diagram|
|Document|   |!magic!|    |       |
|     {d}|   |       |    |       |
+---+----+   +-------+    +-------+
    :                         ^
    |       Lots of work      |
    +-------------------------+
~~~

The plugin will then call out to the ditaa command line tool (written in Java, boo!) to render that to a beautiful image:

A ditaa image on a Gitit page

To get this set up for yourself, try the following from the root of your Gitit wiki:

git clone git:// batterseapower-plugins
wget -O

Now edit your Gitit configuration file so the plugins list includes my plugin:

plugins: batterseapower-plugins/Ditaa.hs

That's it - restart Gitit and you should be ready to go!

Remembering The Hanzi

I've been learning Mandarin Chinese this year, and I've just finished a truly excellent book on the subject: Remembering The Hanzi, by Heisig and Richardson. The basic premise of this book is to introduce a system of mnemonic "stories" around each of the characters of the Chinese writing system which help you remember how to draw the character.

To give a simple example of how this works: first you learn some "primitive" collections of strokes (which may not necessarily be characters themselves) but which are easy to commit to visual memory. So pretty near the start of the book you learn the characters 口 (meaning mouth) and 卜 (meaning divination, but which is given the mental image of a magic wand). Then you start building up stories around those characters, so the story around 占, which Heisig has you remember as meaning "fortune telling", could be something like: "To tell fortunes you need two things: a mouth, and a magic wand with which to make the predictions".

Remembering The Simplified Hanzi

Remembering these "stories" is usually vastly easier than remembering abstract collections of strokes, so I've found it a great help in learning the language. If this sounds interesting to you, you can actually get a free preview of the first 100 characters from their website.

Thanks to this excellent system, and help from sites like Reviewing The Kanji (for use when I get stuck for stories for some character) I've been able to learn 1500 hanzi in my spare time over the last 5 months. I am of course using Anki, a spaced repetition system, to continually reinforce the stories and check my knowledge of the characters, and I've exported my progress from it, which looks like this:

Hanzi Progress

The flat area at the start is where my progress stalled at 100 characters, while I waited for the book to arrive from Hawaii and had access only to the sample pages. Since then my progress has been... erratic, but more recently I've settled into a regular rhythm where I try to learn 20 every day, and it looks like I've managed to average 17 a day during that period - not too bad!

For you Anki users, I've just uploaded the final version of my Remembering The Hanzi flashcard deck to the shared decks area, so you can base your own decks on it if you wish.

What's next? Well, I want to get the second book in the series when it comes out (allegedly sometime this summer) to take the number of characters I know to 3000. However, more importantly I'm going to start working on my knowledge of the pronunciation of some of these characters (the book links the characters to English keywords, not their Mandarin pronunciations), and how you use them in compound words, with an eye to taking the first level of the 汉语水平考试 test before the end of the year. I feel confident that this process can go quite quickly, with such a strong base of knowledge of the character system to build upon.

Constraint families

Various people, notably John Meacham, have proposed adding "context aliases" to Haskell. The basic idea is that you could write declarations like the following in Haskell:

context Num a = (Monoid a, Group a, Multiplicative a, FromInteger a)

Now, what this means is that when you write Num a in a constraint, you really mean all of Monoid a, Group a and so on. This means that the following program is valid, and presumably computes the number 7:

foo :: Num a => a
foo = fromInteger 2 `mappend` fromInteger 5
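As it happens, GHC has since grown the ConstraintKinds extension, which can express plain constraint synonyms like this directly. A small working sketch, using standard classes in place of the hypothetical Monoid/Group/Multiplicative split (the synonym name is mine):

```haskell
{-# LANGUAGE ConstraintKinds #-}
module Main where

-- A constraint synonym: one name abbreviating a tuple of constraints,
-- much as the proposed "context" declaration would.
type Describable a = (Monoid a, Eq a, Show a)

-- Uses of 'Describable a' stand for all three constraints at once.
describe :: Describable a => a -> String
describe x = show (x `mappend` x)

main :: IO ()
main = putStrLn (describe [1, 2 :: Int])
```

Writing `Describable a` in a signature brings all three constraints into scope at the use site, just as the context alias proposal intends.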

This lets you write shorter type signatures in programs which make ubiquitous use of type classes. However, in the brave new world of type families an obvious generalisation is to allow class-associated constraints. In particular, this lets us solve the classic problem where you can't make Set an instance of Monad:

class RMonad m where
    context RMonadElem a
    return :: RMonadElem a => a -> m a
    (>>=) :: (RMonadElem a, RMonadElem b) => m a -> (a -> m b) -> m b

instance RMonad [] where
    context RMonadElem a = ()
    return x = [x]
    xs >>= f = concatMap f xs

instance RMonad Set where
    context RMonadElem a = Ord a
    return x = singleton x
    s >>= f = fold (\a s' -> union (f a) s') empty s
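In fact, modern GHC can express this whole design with ConstraintKinds plus an associated type family of kind Constraint. A self-contained sketch (assuming GHC 8+ and the containers package; I've renamed the methods rreturn/rbind to avoid clashing with the Prelude):

```haskell
{-# LANGUAGE ConstraintKinds, TypeFamilies #-}
module Main where

import Data.Kind (Constraint)
import qualified Data.Set as Set

-- A "restricted monad": each instance chooses the constraint its
-- element types must satisfy, via an associated constraint family.
class RMonad m where
  type RMonadElem m a :: Constraint
  rreturn :: RMonadElem m a => a -> m a
  rbind   :: (RMonadElem m a, RMonadElem m b) => m a -> (a -> m b) -> m b

-- Lists place no constraint on their elements.
instance RMonad [] where
  type RMonadElem [] a = ()
  rreturn x = [x]
  rbind xs f = concatMap f xs

-- Set demands Ord, which is what blocks the ordinary Monad instance.
instance RMonad Set.Set where
  type RMonadElem Set.Set a = Ord a
  rreturn = Set.singleton
  rbind s f = Set.foldr (\a acc -> Set.union (f a) acc) Set.empty s

main :: IO ()
main = print (Set.toList (rbind (Set.fromList [1, 2, 3 :: Int])
                                (\x -> Set.fromList [x, x * 10])))
```

The `type RMonadElem [] a = ()` line is exactly the empty-context instance from the proposal above, spelled with today's syntax.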

A few interesting points:

  1. What is the kind signature of the context synonym? We probably need another "kind" - that of class constraints - which is preserved by n-ary tupling.
  2. Can you provide a default implementation for the context synonym? This would let us change the definition of the Monad type class in a backwards-compatible way, by defaulting RMonadElem a to ()
  3. I mentioned this idea to Ganesh at Fun In The Afternoon, and he told me about his rmonad package, which essentially does exactly this, but by reifying the dictionaries explicitly as data. This is a nice demonstration that the approach is workable, but I think we could really do without the boilerplate dictionary management.
  4. Amusingly, GHC currently represents type classes internally as a normal data type, with some extra invariants. This means that most of the existing machinery for dealing with associated type synonyms could probably be reused to implement this extension!

I don't think that this introduces any horrendous type checking problems, and I can see how the desugarer has to treat dictionaries arising from such contexts. Nonetheless, there are probably some weirdnesses that I'm forgetting, so I should probably try to come up with a real specification (and implementation!) when I get some time...

(P.S: It looks like some guys at Hac5 were working on adding the simple constraint families to GHC - does anyone know how far they got with that?)

New paper: Types Are Calling Conventions

I've just submitted a paper, coauthored with Simon Peyton Jones, to the Haskell Symposium. In this paper, we outline what we think is an interesting point in the design space of intermediate languages for a lazy functional programming language like Haskell, and show some cool optimisations that we can do with it that are hard or impossible to express in the intermediate language used by GHC today.

Although this mainly represents a potential improvement in GHC's internals, where I'd really like to go with this is to push the ability to make a distinction between strict and lazy data into the type system of Haskell itself. This would mean that you could, for example, write functions that produce element-strict lists, and document some of the strictness properties of your functions in their types.

If any of this sounds interesting to you, you can obtain the paper from my freshly-minted Computer Lab website. You can leave any comments you may have on the corresponding Wiki page.

OMake and Literate Haskell

I've often thought it would be a nice idea to mash together Eelco Dolstra's ideas about purely-functional software deployment with functional reactive programming to produce a purely functional build system. In this world, compilers are pure functions from source (.c files) to object (.o files) and these dependencies are encoded explicitly by argument-passing in the program rather than happening implicitly via the file system.

Having grappled with GNU Make somewhat for the last week, I thought I'd take another look at making this work nicely and made a little progress. However, that is not the topic of this post! In the course of research, Google turned up an interesting find: OMake, a build system written in OCaml. It has a number of niceish features over GNU Make but the thing that really caught my eye was built-in support for LaTeX building - just the problem I'd had over the last week. Thus inspired, I had a go at integrating this support with one of my Literate Haskell documents, and this post will preserve the result for posterity 🙂

OMake syntax is somewhat Make-like, so our OMakefile opens with something you may find familiar:

# Phony targets are scoped, so you probably want to declare them first.

.PHONY: all clean

The notion of scoping is somewhat different from that of Make, and it includes the nice feature that you can introduce local scopes into your OMakefile with a "section" declaration - I don't use any of that good stuff here, though.

Next I need to tell OMake how to build .lhs files, since it only understands .tex. I do this by declaring a function that generates rules when it is called (pretty cool!):

# Lhs2Tex builder.

LHS2TEX = lhs2TeX

Lhs2Tex(tex, lhs) =
    # Add the suffixes
    tex_name = $(addsuffix .tex, $(tex))
    lhs_name = $(addsuffix .lhs, $(lhs))

    # The build rule
    $(tex_name): $(lhs_name)
        $(LHS2TEX) --$(LHSSTYLE) -o $@ < $+

    # Return the tex file name
    value $(tex_name)

At this point I could also use the .SCANNER target to tell OMake how to determine the dependencies of a .lhs file, but I didn't need that for my situation, so I didn't bother to do it. Instead, let's plunge onwards into the declaration of the rules particular to my program:

# LaTeX configuration.

DOCUMENT = closure-elimination
GENERATED_TEX = closure-elimination

foreach(tex, $(GENERATED_TEX)):
    Lhs2Tex($(tex), $(tex))

LaTeXDocument($(DOCUMENT), closure-elimination)

This turns my file closure-elimination.lhs into closure-elimination.tex and then finally into a PDF using the built-in LaTeXDocument function. I've used a foreach loop to call my Lhs2Tex function just to show you that this can be done - the OMake system includes a whole rich programming language of its own, which even includes first-class functions!

Finally I need to define those phony targets I declared way up at the start of the post:

# Phony targets

all: $(DOCUMENT).pdf

clean:
  rm -f \
     $(addsuffix .tex, $(GENERATED_TEX)) \
     $(addsuffixes .pdf .dvi .aux .log .out, $(DOCUMENT))

My "clean" target is somewhat unsatisfactory. I rather think that the use of LaTeXDocument should mean that the LaTeX junk (.pdf, .dvi, .aux, .log, .out) should be automatically cleaned somehow, but I can't work out how to achieve that. (My functional build system would consider this stuff part of the "local side effects" of a build command - encoded through some sort of monad structure - and hence fair game for any "clean".)

Overall, OMake looks like a nice system - indeed, anything that handles the tedious business of running LaTeX over my document the right number of times is doing better than GNU Make in my book. I'll certainly consider using it in future, larger-scale projects - but I'm a bit concerned the code has been abandoned, as the last release was in August of 2007...

Other alternatives for LaTeX projects include (if you want to stick with GNU Make) the Ultimate LaTeX Makefile. Another option might be a cute-looking thing I just discovered called rubber, which is a specialised LaTeX build tool.

Bitesize Functional Programming: Universals As Existentials

In type theory, the existential quantifier, \(\exists\) is the dual to the more familiar universal quantifier, \(\forall\) (which expresses parametric polymorphism). Believe it or not, these existentially quantified types can be useful in day-to-day programming in languages like Haskell where you wish to perform some kind of "information hiding". As a simple example, consider this program:

data Showable = forall a. Show a => MkShowable a

We start off by declaring a datatype, Showable, with a single constructor, MkShowable, that has type \(\forall \alpha.\ (\text{Show}\ \alpha) \Rightarrow \alpha \rightarrow \text{Showable}\). This is to say, it injects any value with a Show instance into the Showable type.

This is really an existentially quantified datatype. However, many newbies are confused by this pronouncement, as it quite clearly uses the forall keyword, which introduces universal quantification - I'll come back to that at the end of this article.

instance Show Showable where
    show (MkShowable x) = show x

This next chunk of code declares that any value of type Showable is itself a member of the Show type class. We implement it by unpacking the Showable value and applying show "recursively" to that value. We are able to do this precisely because of the constraint in the MkShowable constructor that specified that whatever type gets injected into Showable must have a Show instance somewhere.

An aside: this construction has something of the flavour of object orientation, as it implements a uniform interface using dynamic dispatch. I'll leave it up to the inimitable Oleg to expand on this point...

showable_stuff :: [Showable]
showable_stuff = [MkShowable 1337, MkShowable "I'm a string!", MkShowable 3.1415]

Here we make use of the marvellous information-hiding properties of the existential to create a list holding values of varying types. All we can know about the elements of the resulting list is that they certainly have a Show instance.

main = mapM_ print showable_stuff

This last declaration is just a bit of top-level gunk that outputs all the things in that list to the screen using the Show typeclass.

So, that's a pretty interesting technique. But in precisely what sense is it existential, given that we got all this going by using the universal quantifier (the forall keyword)? I think the easiest way to look at this problem is using the marvellously useful Curry-Howard correspondence between types and terms in a logic system.

First, remember that the function arrow at the type level (\(\tau_1 \rightarrow \tau_2\)) just corresponds to implication in proofs, and the quantifiers carry through unchanged. Secondly, remember that, reading the logic classically, \(\sigma_1 \rightarrow \sigma_2 \equiv \lnot \sigma_1 \lor \sigma_2\). Thirdly, recall that if \(\alpha\) is not free in \(\sigma_2\) then \(\forall \alpha. \sigma_1 \lor \sigma_2 \equiv (\forall \alpha. \sigma_1) \lor \sigma_2\). Finally, you need to know that pushing a negation through an existential or universal quantifier flips the kind of quantifier we have (the statement that no \(x\) exists satisfying \(P\) is equivalent to saying that all \(x\)s fail to satisfy \(P\)). Now we can gradually manipulate the type of a simple data constructor function into a form that makes it obvious where the existential comes from. Let's try this example (which doesn't have any typeclasses, since they don't really fit into the classical correspondence):

data K = forall a. MkK a

So MkK has type \(\forall \alpha. \alpha \rightarrow K\). Let's push that into the realm of logic and proof and see what we can do:

\[\begin{array}{ccc}\forall \alpha. \alpha \rightarrow K &=& \forall \alpha. \lnot \alpha \lor K\\ &=& (\forall \alpha. \lnot \alpha) \lor K\\ &=& \lnot (\exists \alpha. \alpha) \lor K\\ &=& (\exists \alpha. \alpha) \rightarrow K\end{array}\]

And we're done! We can extract this last proof term back out as a type in our source language without any fuss, and hence from only a few logical rules (which, if you didn't know them, are fairly easy to convince yourself of) we have derived why these constructs are known as "existential". In fact, some Haskell compilers once supported an exists keyword to let you declare them directly, but due to precisely this correspondence between the two quantifiers GHC dropped the extra keyword in favour of using forall to do both jobs.
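The same equivalence can be witnessed directly in code: an existential package is interchangeable with a rank-2 universal in negative position (the continuation-passing encoding). A small sketch of the round trip (the names K' / toK' / fromK' are mine):

```haskell
{-# LANGUAGE ExistentialQuantification, RankNTypes #-}
module Main where

-- The existential package from the article.
data K = forall a. MkK a

-- Its CPS encoding: "give me a consumer that works for ANY a,
-- and I'll feed it my hidden value". The forall has moved to the
-- left of an arrow, mirroring the proof-term derivation above.
newtype K' = MkK' (forall r. (forall a. a -> r) -> r)

toK' :: K -> K'
toK' (MkK x) = MkK' (\k -> k x)

fromK' :: K' -> K
fromK' (MkK' f) = f MkK

main :: IO ()
main = case fromK' (toK' (MkK (42 :: Int))) of
         MkK _ -> putStrLn "round-trip ok"
```

Each direction of the conversion is exactly one of the implications in the chain of logical equivalences derived above.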

Upgrading An Unactivated Windows Install To Parallels 4.0

This is a pretty obscure problem, but I'm going to put a post up about it on the off chance I can help someone else out. My regular reader (hi dad!) will probably find this of no interest and should give it a miss 🙂

The situation I found myself in was upgrading a Boot Camp install of Windows Vista for the new release of Parallels Desktop 4.0 - no big deal, you may think. Unfortunately, I had forgotten that that particular install of Windows Vista wasn't activated, which caused the automatic upgrade process to bork, dropping me back to manual mode.

To complete the upgrade I needed to run the Parallels Tools setup executable. However, since I hadn't activated, I could only log in as far as getting the "please activate Windows now" screen. As it happened, I knew that I could get rid of this screen by feeding it the details of a Windows Vista license I own, but in order to do that I needed an Internet connection (I don't think my PAYG phone had enough credit on it for an extended Microsoft call centre experience). However, to get an Internet connection I had to install the Parallels Ethernet Connection drivers, and hence the Tools. Catch 22!

The workaround is convoluted, to say the least. First, we need a command prompt in the restricted Vista activation session. You do this by clicking any of the links in the activation window: they should cause a browser to open. From here, you can ask the browser to "Open a file" and direct it to C:\Windows\System32\cmd.exe - this should initiate "download" of the executable. Click the option to run the file and voila!

Now that you have a command prompt, the fun really begins. You might think you could just type D:\setup.exe and the Tools would begin installing, but life just isn't that simple - in Their infinite wisdom, Microsoft have imposed quotas on the resource consumption of the session they set up for the purposes of activation. This is probably the Right Thing to do from their POV, but it's just a pain in the arse for us.

The workaround is to get the internet connection working, so you can do the activation and hence lift the resource limits. To do this, create a floppy disk image containing the Windows 2000 drivers for a Realtek 8029AS adapter (you should be able to get those from here, until Realtek break their incoming links again). Personally I did this by using another virtual machine to download the files and extract them onto a new floppy disk image (you can create a blank image on the Floppy Drive tab of the VM settings). I would make the fruits of this labour available to you as a simple download if it were not for (unfounded?) fear of Realtek's highly trained attack lawyers.

Once you have the requisite image in your sweaty virtual paws you can proceed to mount it into the Vista VM. To finish up, type compmgmt.msc into that command prompt and update the drivers for the detected network adapter by searching for new ones on the A:\ drive.

You should now be free to run the online activation and break the Catch 22, allowing installation of the Tools - at this point feel free to help yourself to a cup of coffee and a ginger-snap biscuit to celebrate a difficult job done well (I know I did...).

I'm really quite surprised that I had to jump through this many hoops - the Realtek drivers allegedly come with Vista, for one thing. But - c'est la vie! It's also quite pleasing that the humble, long-outmoded floppy drive still has a place in solving modern IT problems 🙂