Jun 11 2009

Remembering The Hanzi

I’ve been learning Mandarin Chinese this year, and I’ve just finished a truly excellent book on the subject: Remembering The Hanzi, by Heisig and Richardson. The basic premise of this book is to introduce a system of mnemonic “stories” around each of the characters of the Chinese writing system which help you remember how to draw the character.

To give a simple example of how this works: first you learn some “primitive” collections of strokes (which may not necessarily be characters themselves) but which are easy to commit to visual memory. So pretty near the start of the book you learn the character 口 (meaning mouth) and 卜 (meaning divination, but which is given the mental image of a magic wand). Then you start building up stories around those characters, so the story around 占, which Heisig has you remember as meaning “fortune telling”, could be something like: “To tell fortunes you need two things: a mouth, and a magic wand with which to make the predictions”.

Remembering The Simplified Hanzi

Remembering these “stories” is usually vastly easier than remembering abstract collections of strokes, so I’ve found it a great help in learning the language. If this sounds interesting to you, you can actually get a free preview of the first 100 characters from their website.

Thanks to this excellent system, and help from sites like Reviewing The Kanji (for use when I get stuck for stories for some character) I’ve been able to learn 1500 hanzi in my spare time over the last 5 months. I am of course using Anki, a spaced repitition system, to continually reinforce the stories and check my knowledge of the characters, and I’ve exported my progress from it, which looks like this:

Hanzi Progress

The flat area at the start is where my progress is stalled at 100 characters due to waiting for the book to arrive from Hawaii and only having access to the sample pages. Since then my progress has been… erratic, but more recently I’ve settled down into a regular rhythm where I try and learn 20 every day, and it looks like I’ve managed to average 17 a day during that period – not too bad!

For you Anki users, I’ve just uploaded the final version of my Remembering The Hanzi flashcard deck into the shared decks area, so you base your own decks on that if you wish to.

What’s next? Well, I want to get the second book in the series when it comes out (allegedly sometime this summer) to take the number of characters I know to 3000. However, more importantly I’m going to start working on my knowledge of pronuncation of some of these characters (the book links the characters to English keywords, not their Mandarin pronunciations), and how you use them in compound words, with an eye to taking the first level of the 汉语水平考试 test before the end of the year. I feel confident that this process can go quite quickly, with such a strong base of knowledge of the character system to build upon.


May 11 2009

Constraint families

Various people, notably John Meacham, have proposed adding “context aliases” to Haskell. The basic idea is that you could write declarations like the following in Haskell:

context Num a = (Monoid a, Group a, Multiplicative a, FromInteger a)

Now, what this means is that when you write Num a in a constraint, you really mean all of Monoid a, Group a and so on. This means that the following program is valid, and presumably computes the number 7:

foo :: Num a => a -> a
foo = fromInteger 2 `mappend` fromInteger 5

This lets you write shorter type signatures in programs which make ubiquitous use of type classes. However, in the brave new world of type families an obvious generalisation is to allow class-associated constraints. In particular, this lets us solve the classic problem where you can’t make Set an instance of Monad:

class RMonad m where
    context RMonadElem a

    return :: RMonadElem a => a -> m a
    (>>=) :: (RMonadElem a, RMonadElem b) => m a -> (a -> m b) -> m b

instance RMonad [] where
    context RMonadElem a = ()

    return x = [x]
    (>>=) = concatMap

instance RMonad Set where
    context RMonadElem a = Ord a

    return x = singleton x
    (>>=) = fold (\a s' -> union (f a) s') empty s

A few interesting points:

  1. What is the kind signature of the context synonym? We probably need another “kind” – that of class constraints – which is preserved by n-ary tupling.
  2. Can you provide a default implementation for the kind synonym? This would let us change the definition of the Monad type class in a backward compatible way, by defaulting RMonadElem a to ()
  3. I mentioned this idea to Ganesh at Fun In The Afternoon, and he told me about his rmonad package, which essentially does exactly this, but by reifying the dictionaries explicitly as data. This is a nice demonstration that the approach is workable, but I think we could really do without the boilerplate dictionary management.
  4. Amusingly, GHC currently represents type classes internall as a normal data type, with some extra invariants. This means that most of the existing machinery for dealing with associated type synonyms could probably be used changed to implement this extension!

I don’t think that this introduces any horrendous type checking problems, and I can see how the desugarer has to treat dictionaries arising from such contexts. Nonetheless, there are probably some weirdnesses that I’m forgetting, so I should probably try to come up with a real specification (and implementation!) when I get some time…

(P.S: It looks like some guys at Hac5 were working on adding the simple constraint families to GHC – does anyone know how far they got with that?)


May 10 2009

New paper: Types Are Calling Conventions

I’ve just submitted a paper, coauthored with Simon Peyton Jones, to the Haskell Symposium. In this paper, we outline what we think is an interesting point in the design space of intermediate languages for a lazy functional programming language like Haskell, and show some cool optimisations that we can do with it that are hard or impossible to express in the intermediate language used by GHC today.

Although this mainly represents a potential improvement in GHC’s internals, where I’d really like to go with this is to push the ability to make a distinction between strict and lazy data into the type system of Haskell itself. This would mean that you could, for example, write functions that produce element-strict lists, and document some of the strictness properties of your functions in their types.

If any of this sounds interesting to you, you can obtain the paper from my freshly-minted Computer Lab website. You can leave any comments you may have on the corresponding Wiki page.


Dec 7 2008

OMake and Literate Haskell

I’ve often thought it would be a nice idea to mash together Eelco Dolstra’s ideas about purely-functional software deployment with functional reactive programming to produce a purely functional build system. In this world, compilers are pure functions from source (.c files) to object (.o files) and these dependencies are encoded explicitly by argument-passing in the program rather than happening implicitly via the file system.

Having grappled with GNU Make somewhat for the last week, I thought I’d take another look at making this work nicely and made a little progress. However, that is not the topic of this post! In the course of research, Google turned up an interesting find: OMake, a build system written in OCaml. It has a number of niceish features over GNU Make but the thing that really caught my eye was built-in support for LaTeX building – just the problem I’d had over the last week. Thus inspired, I had a go at integrating this support with one of my Literate Haskell documents, and this post will preserve the result for posterity :-)

OMake syntax is somewhat Make like, so our OMakefile opens with something you may find familiar:

########################################################################
# Phony targets are scoped, so you probably want to declare them first.
#

.PHONY: all clean
.DEFAULT: all

The notion of scoping is somewhat different from that Make, and it includes the nice feature that you can introduce local scopes into your OMakefile with a “section” declaration – I don’t use any of that good stuff here though.

Next I need to tell OMake how to build .lhs files, since it only understands .tex. I do this by declaring a function that generates rules when it is called (pretty cool!):

########################################################################
# Lhs2Tex builder.
#

LHS2TEX = lhs2TeX
LHSSTYLE = tt

Lhs2Tex(tex, lhs) =
    # Add the suffixes
    tex_name = $(addsuffix .tex, $(tex))
    lhs_name = $(addsuffix .lhs, $(lhs))

    # The build rule
    $(tex_name): $(lhs_name)
        $(LHS2TEX) --$(LHSSTYLE) -o $@ < $+

    # Return the tex file name
    value $(tex_name)

At this point I could also use the .SCANNER target to tell OMake how to determine the dependencies of a .lhs file, but I didn’t need that for my situation, so I didn’t bother to do it. Instead, let’s plunge onwards into the declaration of the rules particular to my program:

########################################################################
# LaTeX configuration.
#

DOCUMENT = closure-elimination
GENERATED_TEX = closure-elimination

foreach(tex, $(GENERATED_TEX)):
    Lhs2Tex($(tex), $(tex))

LaTeXDocument($(DOCUMENT), closure-elimination)

This turns my file closure-elimination.lhs into closure-elimination.tex and then finally into a PDF using the built-in LaTeXDocument function. I’ve used a foreach loop to call my Lhs2Tex function just to show you that this can be done – the OMake system includes a whole rich programming language of it’s own, which even includes first-class functions!

Finally I need to define those phony targets I declared way up at the start of the post:

########################################################################
# Phony targets
#

all: $(DOCUMENT).pdf

clean:
  rm -f \
     $(addsuffix .tex, $(GENERATED_TEX)) \
     $(addsuffixes .pdf .dvi .aux .log .out, $(DOCUMENT)) \
     *.omc

My “clean” target is somewhat unsatisfactory. I rather think that the use of LaTeXDocument should mean that the LaTeX junk (.pdf, .dvi, .aux, .log, .out) should be automatically cleaned somehow, but I can’t work out how to achieve that. (My functional build system would consider this stuff part of the “local side effects” of a build command – encoded through some sort of monad structure – and hence fair game for any “clean”.)

Overall, OMake looks like a nice system – indeed, anything that handles the tedious business of running LaTeX over my document right number of times is doing better than GNU Make in my book. I’ll certainly consider using it in future, larger scale projects – but I’m a bit concerned the code has been abandonded, as the last release was in August of 2007…

Other alternatives for LaTeX projects include (if you want to stick with GNU Make) the Ultimate LaTeX Makefile. Another option might be cute-looking thing I just discovered called rubber, which is a specialised LaTeX build tool.


Nov 12 2008

Bitesize Functional Programming: Universals As Existentials

In type theory, the existential quantifier, $$\exists$$ is the dual to the more familiar universal quantifier, $$\forall$$ (which expresses parametric polymorphism). Believe it or not, these existentially quantified types can be useful in day-to-day programming in languages like Haskell where you wish to perform some kind of “information hiding”. As a simple example, consider this program:

data Showable = forall a. Show a => MkShowable a

We start off by declaring a datatype, Showable, with a single constructor, MkShowable that has type $$\forall \alpha. (\text{S}\text{how}~\alpha) \Rightarrow \alpha \rightarrow \text{S}\text{how}\text{able}$$. This is to say, it injects any value with a Show instance into the Showable type.

This is really an existentially quantified datatype. However, many newbies are confused by this pronouncement, as it quite clearly uses the forall keyword, which introduces universal quantification – I’ll come back to that at the end of this article.

instance Show Showable where
    show (Showable x) = show x

This next chunk of code declares that any value of type Showable is itself a member of the Show type class. We implement it by unpacking the Showable value and applying show “recursively” to that value. We are able to do this precisely because of the constraint in the MkShowable constructor that specified that whatever type gets injected into Showable must have a Show instance somewhere.

An aside: this construction has something of the flavour of object orientation, as it implements a uniform interface using dynamic dispatch. I’ll leave it up to the inimitable Oleg to expand on this point

showable_stuff :: [Showable]
showable_stuff = [MkShowable 1337, MkShowable "I'm a string!", MkShowable 3.1415]

Here we make use of the marvellous information-hiding properties of the existential to create a list holding values of varying types. All we can know about the elements of the resulting list is that they certainly have a Show instance.

main = mapM_ print showable_stuff

This last declaration is just a bit of top-level gunk that outputs all the things in that list to the screen using the Show typeclass.

So, that’s a pretty interesting technique. But in precisely what sense is it existential, given that we got all this going by using the universal quantifier (the forall keyword)? I think the easiest way to look at this problem is using the marvellously useful Curry-Howard correspondence between types and terms in a logic system.

First, remember than the function arrow at the type level ($$\tau_1 \rightarrow \tau_2$$) just corresponds to implication in proofs, and the quantifiers carry through unchanged. Secondly, remember that in the proof system corresponding to simple ML-style types (second-order intuionistic logic) $$\sigma_1 \rightarrow \sigma_2 \equiv \lnot \sigma_1 \lor \sigma_2$$. Thirdly, recall that if $$\alpha$$ is free in $$\sigma_2$$ then $$\forall \alpha. \sigma_1 \lor \sigma_2 \equiv (\forall \alpha. \sigma_1) \lor \sigma_2$$. Finally, you need to know that pushing a negation through an existiental or universal quantifier flips the kind of quantifier we have (the statement that an $$x$$ does not exist satisfying $$P$$ is equivalent to saying that all $$x$$s do not satisfy $$P$$). Now we can gradually manipulate the type of a simple data constructor function into a form that makes it obvious where the existiential comes from. Let’s try this example (which doesn’t have any typeclasses, since they don’t really fit into the classical correspondence):

data K = forall a. MkK a

So MkK has type $$\forall \alpha. \alpha \rightarrow K$$. Let’s push that into the realm of logic and proof and see what we can do:

$$!\begin{array}{ccc}\forall \alpha. \alpha \rightarrow K &=& \forall \alpha. \lnot \alpha \lor K\\ &=& (\forall \alpha. \lnot \alpha) \lor K\\ &=& \lnot (\exists \alpha. \alpha) \lor K\\ &=& (\exists \alpha. \alpha) \rightarrow K\end{array}$$

And we’re done! We can extract this last proof term back out as a type in our source language without any fuss, and hence from only a few logical rules (which, if you didn’t know them, are fairly easy to convince yourself of) we have derived why these constructs are known as “existential”. In fact, some Haskell compilers once supported the exists keywoard to let you declare them directly, but due to precisely this correspondence between the two quantifiers GHC dropped the extra keyword in favour of just using the forall to do both jobs.


Nov 11 2008

Upgrading An Unactivated Windows Install To Parallels 4.0

This is a pretty obscure problem, but I’m going to put a post up about it on the off chance I can help someone else out. My regular reader (hi dad!) will probably find this of no interest and should give it a miss :-)

The situation I found myself in was upgrading a Boot Camp install of Windows Vista for the new release of Parallels Desktop 4.0 – no big deal, you may think. Unfortunately, I had forgotten that that particular install of Windows Vista wasn’t activated, which caused the automatic upgrade process to bork, dropping me back to manual mode.

To complete the upgrade I needed to run the Parallels Tools setup executable. However, since I hadn’t activated, I could only log in as far as getting the “please activate Windows now” screen. As it happened, I knew that I could get rid of this screen by feeding it the details of a Windows Vista license I own, but in order to do that I needed an Internet connection (I don’t think my PAYG phone had enough credit on it for an extended Microsoft call centre experience). However, to get an Internet connection I had to install the Parallels Ethernet Connection drivers, and hence the Tools. Catch 22!

The workaround is convoluted, to say the least. First, we need a command prompt in the restricted Vista activation session. You do this by clicking any of the links in the activation window: they should cause a browser to open. From here, you can ask the browser to “Open a file” and direct it to C:\Windows\System32\cmd.exe – this should initiate “download” of the executable. Click the option to run the file and voila!

Now you have a command prompt the fun really begins. You might think you could just type D:\setup.exe and the Tools would begin installing, but life just isn’t that simple – in Their infinite wisdom, Microsoft have imposed quotas on the resource consumption of the session they set up for the purposes of activation. This is probably the Right Thing to do from their POV, but it’s just a pain in the arse for us.

The workaround is to get the internet connection working, so you can do the activation and hence lift the resource limits. To do this, create a floppy disk image containing the Windows 2000 drivers for a Realtek 8029AS adapter (you should be able to get those from here, until Realtek break their incoming links again). Personally I did this by using another virtual machine to download the files and extract them onto a new floppy disk image (you can create a blank image on the Floppy Drive tab of the VM settings). I would make the fruits of this labour available to you as a simple download if it were not for (unfounded?) fear of Realtek’s highly trained attack lawyers.

Once you have the requisite image in your sweaty virtual paws you can proceed to mount it into the Vista VM. To finish up, type compmgmt.msc into that command prompt and update the drivers for the detected network adapter by searching for new ones on the A:\ drive.

You should now be free to run the online activation and break the Catch 22, allowing installation of the Tools – at this point feel free to help yourself to a cup of coffee and a ginger-snap biscuit to celebrate a difficult job done well (I know I did…).

I’m really quite suprised that I had to jump through this many hoops – the Realtek drivers allegedly come with Vista, for one thing. But – c’est la vie! It’s also quite pleasing that the humble, long outmoded, floppy drive still has a place in solving modern IT problems :-)


Aug 31 2008

Hackage Releases Made Easy

The Haskell community has built up a great resource: the Hackage Haskell package database, where we recently hit the 500-package mark!

One of those 500 packages was mine, I added another to their number just an hour ago, and I’ve got two more in the oven. Given, then, that I’m starting to maintain a few packages, I went to the trouble of automating the Hackage release process, and in this post I’m going to briefly walk through setting up this automated environment.

  1. Install cabal-upload from Hackage. I’m afraid that at the time of writing this is not perfectly simple because it won’t build with GHC 6.8 or above: this can be fixed with a new .cabal file, however, which I’ve made available here. (Edit: I’ve just noticed that this functionality seems to have been added to Cabal itself! You may just be able to use cabal upload. However, I’m not sure what the right config file location is for the next step).
  2. Add a file containing your Hackage username and password in the format (”username”,”password”) called ~/.cabal-upload/auth.
  3. Copy the following shell script into a file called release in the root of your project (the same directory as the Setup.lhs file):
    #!/bin/bash
    #
    
    echo "Have you updated the version number? Type 'yes' if you have!"
    read version_response
    
    if [ "$version_response" != "yes" ]; then
        echo "Go and update the version number"
        exit 1
    fi
    
    sdist_output=`runghc Setup.lhs sdist`
    
    if [ "$?" != "0" ]; then
        echo "Cabal sdist failed, aborting"
        exit 1
    fi
    
    # Want to find a line like:
    # Source tarball created: dist/ansi-terminal-0.1.tar.gz
    
    # Test this with:
    # runghc Setup.lhs sdist | grep ...
    filename=`echo $sdist_output | sed 's/.*Source tarball created: \([^ ]*\).*/\1/'`
    echo "Filename: $filename"
    
    if [ "$filename" = "$sdist_output" ]; then
        echo "Could not find filename, aborting"
        exit 1
    fi
    
    # Test this with:
    # echo dist/ansi-terminal-0.1.tar.gz | sed ...
    version=`echo $filename | sed 's/^[^0-9]*\([0-9\.]*\).tar.gz$/\1/'`
    echo "Version: $version"
    
    if [ "$version" = "$filename" ]; then
        echo "Could not find version, aborting"
        exit 1
    fi
    
    echo "This is your last chance to abort! I'm going to upload in 10 seconds"
    sleep 10
    
    git tag "v$version"
    
    if [ "$?" != "0" ]; then
        echo "Git tag failed, aborting"
        exit 1
    fi
    
    # You need to have stored your Hackage username and password as directed by cabal-upload
    # I use -v5 because otherwise the error messages can be cryptic :-)
    cabal-upload -v5 $filename
    
    if [ "$?" != "0" ]; then
        echo "Hackage upload failed, aborting"
        exit 1
    fi
    
    # Success!
    exit 0
  4. When you’re ready to release something, simply run the shell script! Not only will this package up your project and upload it to Hackage, it will also add a version tag to your Git repository (obviously you should change this bit if you are using another VCS!).

If you would like to follow my continuing adventures in Haskell open source, please check out my GitHub profile! Patches gratefully accepted :-)


Aug 11 2008

Compiler Plugins AngloHaskell Talk

I was given the opportunity to speak on the topic of compiler plugins for GHC at AngloHaskell 2008 last weekend, and the slides and audio are now available here for those interested.

I wish they had taken video instead, as I went forwards and backwards over my slides quite a bit and wrote quite a lot on the whiteboard. I fear this may have left the presentation entirely incomprehensible to those who weren’t there in person (and maybe even some people who were :-) .

The other presenters were great, and I especially enjoyed Neil Mitchells talk on how Hoogle works, as type-searching is an area I was totally unfamiliar with. If you’re a local Haskeller, I thoroughly reccomend attending the 2009 event!


Jul 21 2008

Compiler Plugins For GHC: Week Six

Are we six weeks into it already? It’s flown by. What did I get up to this week?

Core HTML Output
I bring you the “mystery” project that I promised you last week: a HTML pretty printer for GHCs Core language! Features of my implementation are:

  • Syntax highlighting
  • Hyperlinked variables: click one to jump to its definition site if it refers to a local name, or to a Hoogle search for it otherwise
  • Mark interesting parts of the Core output with a thick border by clicking on a binder
  • Hover over variable usage sites to highlight their binding sites
  • Handy index phases run by the compiler, click a phase name to jump to the Core output by that phase

I’ve put some sample output up here for you to try out, but beware: it’s a 500kB document!

For the inspiration for this project I owe a debt of gratitude to Neil Mitchell’s YHC.Core.HTML. If you are interested in other human readable Core output formats, check out Don Stewart’s ghc-core package, which gives you a syntax-highlighted command line pager for Core.

Spit And Polish
I spent the vast majority of last week tidying up loose ends, hunting and squashing bugs and preening my sample plugin code. This reflects the fact that the torrent of new GHC features I’ve announced here are being bedded down in preparation for finishing the project off and getting them merged into HEAD.

Conclusion
More of the same this week: a focus on tidying up and getting some solid documentation done. I want to get something releasable ready well before the deadline which I can then enhance with discretionary improvements without fear of leaving the Summer of Code period without having something ready for real-world use.

Since this activity is all rather tedious to the casual Summer of Code follower, this will probably be my last weekly blog post. However, I’ll still try and write about any major developments in the project: see you then!


Jul 14 2008

Compiler Plugins For GHC: Week Five

Welcome to the fourth instalment in this ongoing series! How have compiler plugins progressed this week?

Haddocking GHC Internals
If we’re going to have any hope of people other than GHC developers writing plugins, there needs to be accessible documentation about the relevant parts of the compiler. The lions share of my time last week was taken up documenting about 40 of GHCs modules, describing their functions and writing some notes about GHC jargon that is useful to know.

Unfortunately, I don’t yet have an HTML version to show you since Haddock won’t run on GHC’s source code yet due to our use of the C-preprocessor. However, I’m assured that this is being resolved..

Annotation System Enhancements
I’ve added the ability for plugins to attach new annotations to Core syntax trees during compilation, rather than just sticking with the ones that were present in the source statically. This means you could now implement e.g. a strictness analysis pass with a seperate pass that used the annotations generated by that pass to perform the worker-wrapper transform.

I’ve additionally added the ability to generate annotations through Template Haskell. This smells like it might be useful to someone.

Phase Aliases
Roman Leshchinskiy replied to my previous email on phase control for GHC and came up with some interesting comments. The upshot of this is that I’ve added the ability to specify phase equality, rather than just inequality:

{-# PHASE Foo = Bar, < Spqr #-}

Pan Optimization Sample Plugin
My mentor for this project, Sean Seefried, worked on pluggable compilers as part of his doctoral studies. One of the things that came out of that was a GHC plugin that performed a domain-specific optimization known as “image lifting” on his reimplementation of the Pan combinator library for functional image construction.

I’ve ported this plugin to my own compiler plugins framework, and extracted considerable amounts of it’s utility code into GHC itself for use by other plugin authors. I also hope to use it’s codebase as an exemplar of how to write a large compiler plugin.

API Cleanup
GHC has grown by accretion, and as a result some of the APIs we provide are inconsistently named, are parts of incomplete sets of functions and so on. I’ve been spending some time refactoring the worst offenders so the code is a bit more presentable, and hopefully the code will be a bit easier to get a handle on for users new to GHC.

Conclusion
A good week: I’ve hit all the goals I laid out in the last installment of this series. However, documentation work is a bit tedious and I’m glad I’ve got a large chunk of it out of the way: this leaves me free to work on some more exciting things this week.

This week I’m focusing on polishing off the rough edges in my API and sample plugins, so they are something approaching releasable. I also have numerous annoying to-dos accumulated from the last five weeks that I will ned to take another another look at. This small stuff aside, I’ve just spent the first day of week 6 working on a rather exciting feature that I think will be very useful even for those who do not plan to be plugin authors: you’ll have to wait until my next post to find out about that!