Jul 21 2008

Compiler Plugins For GHC: Week Six

Are we six weeks into it already? It’s flown by. What did I get up to this week?

Core HTML Output
I bring you the “mystery” project that I promised you last week: a HTML pretty printer for GHCs Core language! Features of my implementation are:

  • Syntax highlighting
  • Hyperlinked variables: click one to jump to its definition site if it refers to a local name, or to a Hoogle search for it otherwise
  • Mark interesting parts of the Core output with a thick border by clicking on a binder
  • Hover over variable usage sites to highlight their binding sites
  • Handy index phases run by the compiler, click a phase name to jump to the Core output by that phase

I’ve put some sample output up here for you to try out, but beware: it’s a 500kB document!

For the inspiration for this project I owe a debt of gratitude to Neil Mitchell’s YHC.Core.HTML. If you are interested in other human readable Core output formats, check out Don Stewart’s ghc-core package, which gives you a syntax-highlighted command line pager for Core.

Spit And Polish
I spent the vast majority of last week tidying up loose ends, hunting and squashing bugs and preening my sample plugin code. This reflects the fact that the torrent of new GHC features I’ve announced here are being bedded down in preparation for finishing the project off and getting them merged into HEAD.

Conclusion
More of the same this week: a focus on tidying up and getting some solid documentation done. I want to get something releasable ready well before the deadline which I can then enhance with discretionary improvements without fear of leaving the Summer of Code period without having something ready for real-world use.

Since this activity is all rather tedious to the casual Summer of Code follower, this will probably be my last weekly blog post. However, I’ll still try and write about any major developments in the project: see you then!


Jul 14 2008

Compiler Plugins For GHC: Week Five

Welcome to the fourth instalment in this ongoing series! How have compiler plugins progressed this week?

Haddocking GHC Internals
If we’re going to have any hope of people other than GHC developers writing plugins, there needs to be accessible documentation about the relevant parts of the compiler. The lions share of my time last week was taken up documenting about 40 of GHCs modules, describing their functions and writing some notes about GHC jargon that is useful to know.

Unfortunately, I don’t yet have an HTML version to show you since Haddock won’t run on GHC’s source code yet due to our use of the C-preprocessor. However, I’m assured that this is being resolved..

Annotation System Enhancements
I’ve added the ability for plugins to attach new annotations to Core syntax trees during compilation, rather than just sticking with the ones that were present in the source statically. This means you could now implement e.g. a strictness analysis pass with a seperate pass that used the annotations generated by that pass to perform the worker-wrapper transform.

I’ve additionally added the ability to generate annotations through Template Haskell. This smells like it might be useful to someone.

Phase Aliases
Roman Leshchinskiy replied to my previous email on phase control for GHC and came up with some interesting comments. The upshot of this is that I’ve added the ability to specify phase equality, rather than just inequality:

{-# PHASE Foo = Bar, < Spqr #-}

Pan Optimization Sample Plugin
My mentor for this project, Sean Seefried, worked on pluggable compilers as part of his doctoral studies. One of the things that came out of that was a GHC plugin that performed a domain-specific optimization known as “image lifting” on his reimplementation of the Pan combinator library for functional image construction.

I’ve ported this plugin to my own compiler plugins framework, and extracted considerable amounts of it’s utility code into GHC itself for use by other plugin authors. I also hope to use it’s codebase as an exemplar of how to write a large compiler plugin.

API Cleanup
GHC has grown by accretion, and as a result some of the APIs we provide are inconsistently named, are parts of incomplete sets of functions and so on. I’ve been spending some time refactoring the worst offenders so the code is a bit more presentable, and hopefully the code will be a bit easier to get a handle on for users new to GHC.

Conclusion
A good week: I’ve hit all the goals I laid out in the last installment of this series. However, documentation work is a bit tedious and I’m glad I’ve got a large chunk of it out of the way: this leaves me free to work on some more exciting things this week.

This week I’m focusing on polishing off the rough edges in my API and sample plugins, so they are something approaching releasable. I also have numerous annoying to-dos accumulated from the last five weeks that I will ned to take another another look at. This small stuff aside, I’ve just spent the first day of week 6 working on a rather exciting feature that I think will be very useful even for those who do not plan to be plugin authors: you’ll have to wait until my next post to find out about that!


Jul 5 2008

Compiler Plugins For GHC: Weeks Three and Four

I was attending graduation last weekend so didn’t have time to put a progress post together, which means that this week you get a double helping of GHC-plugins goodness! For those new to the series, you might like to read the first two posts before continuing.

Type Safe Dynamic Loading

This project is all about dynamically loading plugin code into GHC so it can transform the program being compiled. Until now, I blindly trusted that if the plugin exposed a value with the name “plugin” it was indeed a Plugin data structure, but now I’ve pieced together some parts of the GHC typechecker to use the associated .hi files to check whether this is indeed the case.

It might be useful to expose this to users as an alternative to parts of hs-plugins, but as it would need a spot of generalization work to be done first this is not on the agenda at the moment.

Annotations System

What is an annotations system? It is entirely analagous to the annotations system of Java or the attributes of .NET languages in that it allows you to associate bits of user data with elements of the program. In my current design, you can attach annotations to modules, types, data constructors and top level values, like so:

module Example where

import AnnotationTypes ( MyAnnotationType )

{-# MODANN MyAnnotationType { friendliness = 10, friend = "Jim" } #-}

{-# TYPEANN Foo MyAnnotationType { friendliness = 20, friend = "Bob" } #-}
data Foo = Bar

{-# ANN f MyAnnotationType { friendliness = 30, friend = "Jane" } #-}
f x = x

Actually, you can use arbitrary expressions in your annotations, as long as they eventually boil down to something with a Typeable instance. So this rather esoteric expression would be perfectly fine:

{-# ANN f SillyAnnotation { foo = (id 10) + $( [| 20 |]), bar = 'f } #-}

You probably want to avoid non-termination or even expensive computation in those annotations as they are potentially evaluated at compile time! Plugins can see annotations during compilation and hence can use them to guide the transformations they perform on your code, but you can also access the annotations of any module via the GHC API.

I’ve previously alluded to the difficulties with an annotations system: I’ll take this opportunity to discuss them a bit further. Consider this program:

module Example2 ( exported ) where

exported = not_exported

{-# ANN not_exported Just "Hello" #-}
not_exported = 10

Because “not_exported” is only used once its definition will be inlined straight into “exported” (regardless of it’s size). This means that the annotation on it is entirely useless, as odds are that the plugin will never see the not_exported identifier in the program!

We have the same sort of problem with identifiers in the rules system, and require manual addition of NOINLINE pragmas to the relevant identifiers to circumvent it, but it all feels rather clumsy and I’m not sure what the best solution is.

Note that this is not a problem for other modules accessing the annotation, as by definition they do so on an exported identifier that does not suffer this treatment.

Another problem with annotations is that it’s almost impossible to allow them on non top-level identifiers with the current GHC implementation, as those identifiers get created and destroyed with reckless abandon by compiler passes.

We work around this for things like SCCs by actually making SCCs a kind of expression in the intermediate language, but doing this for annotations doesn’t sit well with the idea of being able to look up the annotations attached to a particular identifier from other modules using the GHC API. So again, I’m not quite sure what the solution is here.

Sample Plugins

To try and produce some sample code for the eventual release and get some experience about the API I need to provide to plugin authors I have implemented some simple compiler plugins. I’ve got two complete so far:

  • A plugin to make Haskell a strict rather than lazy language
  • A plugin that performs GHCs current common subexpression elimination pass, but outside of the main compiler

There are some more planned: watch this space.

Conclusion
It’s been a month since I began the project, and I’m fairly pleased with my progress up to this point. There’s still a lot left to do, but I’m confident I should have something presentable by the end of the Summer Of Code period.

Next week will probably see some refinements to the annotation and phase control systems, the construction of a few more sample plugins, and perhaps even a start on documenting GHCs internals to some extent.


Jun 23 2008

Compiler Plugins For GHC: Week Two

I wasn’t quite as productive with my Summer Of Code project this week as I was last week. Let’s take a look at the big ticket items that were accomplished.

Phase Control System Implemented

I covered the design of in my last post, and most of my work over the week has been on implementing and refining that proposal. It’s not essentially done, with a remaining small but significant wrinkle. The new phase control system lets you write rules that make use of phases above and beyond the existing ontology of phase 0, 1 and 2. An example of such a rule is as follows:

import GHC.Prim ({-# PHASE ConstructorSpecialization #-})

{-# RULES "foldr/build" [~ConstructorSpecialization] foldr c n (build g) = g c n #-}

That’s all very well, but the snag is that we actually have one of these phases for every compiler pass in GHC, so in order to ensure that we always fire the RULEs that may be set up we need to insert a full simplifier pass after almost every compiler pass – yikes! That’s a lot more simplification than we currently do and it can’t be good for compile times. I’m still thinking about how to resolve that one.

Compiler Pipeline Dynamically Constructed From Constraints
This is the reason why we have one phase for every compiler pass: I’ve changed GHC so its entire core-to-core pipeline is now built up from the relative ordering of these phases and the phase tags I’ve attached to every pass. This is a prerequisite for allowing compiler plugin authors to insert their own core-to-core passes by specifying declaratively when they would like them to run.

Template Haskell Phase Integration
So we have these phase pragmas, but how do plugin go about referring to phases in their actual code that talks to GHC? The answer is with the new support for phases in Template Haskell!

{-# PHASE MyPhase #-}

... stuff ...

getPluginPasses :: HscEnv -> CoreModule -> IO [PhasedPluginPass]
getPluginPasses hsc_env cm = do
    ... stuff ...
    Just phase_name <- thNameToGhcName hsc_env '''MyPhase
    return [PhasedPluginPass phase_name (VanillaPluginPass pass)]

This code is using the new triple quote notation to get a Template Haskell Name for the compiler phase, which is converted to a GHC Name and finally given to GHC itself. Of course, the Template Haskell support allows a lot more than this, such as generating new phases and splicing them in to your code at compile time.

Conclusion
The project is still coming steadily along. I’m starting this week with ancillary work on the Static Argument Transformation that isn’t directly related to the project, but then I hope to move on to the plugin annotations system that I called out last week as a looming and highly thorny issue: expect to see more on this topic soon!


Jun 15 2008

Compiler Plugins For GHC: The First Week

Things have been coming along very well with my Summer of Code project to add dynamically loaded plugins to the Glasgow Haskell Compiler. In my first week of coding post-finals I’ve got a lot done. I’ll be discussing two of the headline items in this post.

Proof Of Concept Plugin Loading

GHC is capable of dynamically loading plugins specified on the command line from any installed package, and running the compiler phases that they install. To give you an idea of what that looks like, here is the current code for my sample-plugin project:

module Simple.Plugin(plugin) where

import UniqSupply
import DynFlags
import HscTypes
import CoreSyn
import Outputable
import Module

import Plugins (Plugin(..), PluginPass(..))

plugin :: Plugin
plugin = Plugin {
    initializePlugin = initialize,
    getPluginPasses = getPasses
  }

initialize :: IO ()
initialize = do
    putStrLn "Simple Plugin Initialized"

getPasses :: CoreModule -> IO [PluginPass]
getPasses cm = do
    putStrLn "Simple Plugin Passes Inspecting Module"
    let mod_s = showSDoc (pprModule (cm_module cm))
    putStrLn $ "Simple Plugin Passes Queried For " ++ mod_s
    return [PluginPass pass]

pass :: DynFlags -> UniqSupply -> [CoreBind] -> IO [CoreBind]
pass _ _ binds = do
    putStrLn "Simple Plugin Pass Run"
    return binds

There’s a lot of work still to do here: the biggie is allowing “annotations” a-la languages like C# that let you mark identifiers or expressions in the language with extra stuff that meta-programs can make use of. For example, you might want to tag which functions you want your compiler plugin to analyse or add instrumenting code to. It’s quite hard to get this feature right, and I’ll probably be posting some more about the issues involved later as I get closer to implementing it.

Phase Control

GHC compiles your programs in a classic pipelined style: the main stages in a typical pipeline would be lexing, parsing, typechecking, desugaring, optimization and finally code generation. Although most of these stages have to run in a particular order, some of the stages can potentially run in multiple orders, most notably those sub-stages that make up the “optimization” stage I mentioned.

This is relevant to plugins because we need to be able to say when any phases you install should run. However, it also turns out that we use this feature to allow compiler users to control when inlining and source code rewrite rules should be applied, as documented in the user guide.

The current system we use is a bit ugly and just establishes a mapping between controlled things and the natural numbers to establish an ordering. This week, I’ve proposed (and partially implemented) a system for phase control that uses phase names that are declared in PHASE pragmas and henceforth exported and imported just like any other Haskell name, so for example:

module Spqr(... {-# PHASE C #-} ...) where

import GHC.Phases({-# PHASE SpecConstr #-})

{-# PHASE C < SpecConstr #-}

{-# RULE "silly" [~C] id = (\x -> x) #-}

This establishes a new phase C that must run before GHCs constructor specialization phase. This phase is in turn used to control the activation of the “silly” rule, and the phase exported so it can be referred to by other modules.

If you have any comments about this system, please make yourself heard on glasgow-haskell-users!

Conclusion

I’m fairly pleased with my progress so far and having a great time finally doing some coding again after the long exam period!

Hopefully this coming week will see me complete the implementation of the new phase control system and a refactoring of GHCs existing pipeline construction to take into account that phase information. I should then be able to move on to some issues more directly related to plugins, such as the rather thorny issue of the annotation system.


May 8 2008

Free Monads In Haskell

Did you know that you can actually generate a monad automatically given any endofunctor? This is the free monad of that functor. How do we perform this magic trick? Quite simple, everything follows from this data type:

data FreeM f a = Return a
               | Bind (f (FreeM f a))

In this, the f parameter is just the functor we are going to build a monad for, and a is the type of the contents of a FreeM value. We’ve said that the free monad can either contain such a value directly (boring) or it can contain a value of it’s own type, but wrapped in the functor f – hmm! As an aside, I’m not sure that Bind is strictly the best name for this constructor, but I like the symmetry.

If you want to build up some intuition about this definition, you can think of it as specifying values that consist of an actual a (in a Return constructor) but nested within a finite number of functor applications.

As another aside, you might like to know that actually Haskell would let you create an infinite stack of functors fairly straightforwardly, since all its data declarations are actually really co-data declarations. Don’t do this, though, or you might start having some non-termination issues :-)

Having got this far through the post, you won’t be surprised to learn that we can make this definition into a functor itself. Here comes the implementation:

instance (Functor f) => Functor (FreeM f) where
    fmap f (Return a) = Return (f a)
    fmap f (Bind fm) = Bind (fmap (fmap f) fm)

That Bind case looks pretty scary, doesn’t it? The leftmost fmap call is that from the functor f that we are making a free monad over, and the rightmost one is actually a recursion back into the fmap we are currently trying to compute, but one functor “lower”.

Essentially all we are doing in this definition is fmaping all the way down our tower of functors until we finally reach a nice sane a value in the Return case, which we handle in the obvious way.

So far so good. But we’ve come all this way and still don’t have a monad, even though I promised you one at the start. Well, let’s sort that out now! Instead of defining one using the usual Haskell >>= operator, I’m going to use the more category-theoretical join :: Monad m => m (m a) -> m a construction:

joinFreeM :: Functor f => FreeM f (FreeM f a) -> FreeM f a
joinFreeM (Return a) = a
joinFreeM (Bind fm) = Bind (fmap joinFreeM fm)

If you think about the type of this operation, what we want to do is stick the pile of functors associated with the outermost FreeM onto the pile of functors associated with the innermost one, to produce one cohesive functor pile. How we go about this is fairly straightforward: in the Return case there are no functor applications on the outermost FreeM so we just give back the inner one. The Bind case simply recurses its way down the whole functor pile, mushing them together in some sense :-) .

That was it. The code to actually declare a Monad instance is entirely boilerplate given that we have a join operation:

instance (Functor f) => Monad (FreeM f) where
    return = Return
    m >>= f = joinFreeM ((fmap f) m)

Pretty neat! But what, pray tell, are these things actually useful for? Well, after I’d implemented my own version of free monads I found that Wouter Swierstra’s excellent paper “Data types à la carte” had already publicly demonstrated the same thing (in passing) with his Term data structure, and he has some nice examples. For instance, consider this functor:

data Zero a = Zero
  deriving (Show)

instance Functor Zero where
    fmap _ Zero = Zero

zeroTest :: FreeM Zero Int -> FreeM Zero Int
zeroTest my = do
    x <- return 5
    y <- my
    return $ x + y

-- > zeroTest (return 1)
-- 6
-- > zeroTest (Bind Zero)
-- Zero

Remind you of anything? It should: it’s just the Maybe monad!

What about this?

data One a = One a
  deriving (Show)

instance Functor One where
    fmap f (One a) = One (f a)

oneTest :: FreeM One Int -> FreeM One Int
oneTest my = do
    x <- return 5
    y <- my
    return $ x + y

-- > oneTest (Return 2)
-- 7
-- > oneTest (Bind $ One $ Return 2)
-- One 7
-- > oneTest (Bind $ One $ Bind $ One $ Return 2)
-- One One 7

Something like the identity monad: pretty boring! What if we try something a bit more exotic (and not mentioned in Wouter’s paper) like the list functor?

listTest :: FreeM [] Int
listTest = do
    x <- Bind [Return 1, Return 2]
    y <- Bind [Return 3, Return 4]
    return $ x + y

-- > listTest
-- [[4,5],[5,6]]

But for want to a call to concat, this looks strikingly similar to the list monad – and all without ever having to define it. Interesting stuff!

I’ve found that the fun of free monads is that you can play around with any functor you like and get something interesting generated from it. Since they’re free, this happens without you typing a single line of non-trivial code, so I encourage you to go and try a few out quickly and see what you can make happen!


May 1 2008

Fixing File Associations Eaten By Parallels Desktop

I use OS X, and so for those rare cases where I must deign to run a program designed for Windows I make use of my copy of it hosted in Parallels Desktop. Now, Parallels has been generally good to me, but I recently came across the problem documented in this forum thread where its ability to launch Mac applications from Windows caused some basic file associations in Windows-land to stop working. For example, Excel files were listed as “xls_auto_file” by Explorer and double clicking on them only gave me the “Open With” dialog: not very helpful.

The thread does suggest a means of solving the problem, by creating a batch file that re-associates a white-list of some of the possibly affected extensions, like this:

REM Restores MS Office File Type Associations
assoc .doc=Word.Document.8
assoc .dochtml=wordhtmlfile
assoc .docmhtml=wordmhtmlfile
assoc .docxml=wordxmlfile
assoc .dot=Word.Template.8
assoc .pot=PowerPoint.Template.8
assoc .pps=PowerPoint.SlideShow.8
assoc .ppt=PowerPoint.Show.8
assoc .rtf=Word.RTF.8
assoc .wbk=Word.Backup.8
assoc .xlc=Excel.Chart.8
assoc .xlm=Excel.Macrosheet
assoc .xls=Excel.Sheet.8
assoc .xlt=Excel.Template
assoc .xlw=Excel.Workspace

However, this felt a bit ad-hoc to me, and in particular some of the extensions that were affected for me were not on the list. Thus, like any good programmer I went off and whipped up a utility designed to undo the damage inflicted on my poor defenseless HKEY_CLASSES_ROOT by Parallels.



Using it is pretty simple: just click “Scan” and then “Fix Selected” repeatedly until the scan finds nothing more of interest. Now, before I give you a download link to the utility, I need to give you the standard disclaimer to which you must agree in order to use it:

This utility is provided on an ‘as is’ basis, without warranties of any kind, and no warranty, express or implied, is given that the operation of the utility is correct or safe to use. I do not accept any liability for any error or omission. Use of the utility is at your own risk.

Sorry for the legalese but this utility is modifying your registry and hence (though I feel it is highly unlikely) could get something quite, quite wrong. You would be also wise to have a backup of the HKEY_CLASSES_ROOT registry branch to restore in case you aren’t happy with the changes made by the utility.

That said, without further ado here is the utility and its source code (C# 3.0), both of which (being of my sole authorship) I release into the public domain. I hope someone else finds it useful!


Apr 27 2008

FizzBuzzing With Type Families

Some time ago, I published a post discussing how we solve the now-notorious FizzBuzz problem with computation in the type system. In the interests of flogging this horse well and truly to death, I’ve since adapted the code to make use of an upcoming GHC feature: type families. These are essentially a form of function at the type level: just what we need to make our FizzBuzz program a little more sane!

If you’re playing along at home, you’ll need a HEAD version of GHC and the following options pragma:

{-# OPTIONS_GHC -XEmptyDataDecls -XTypeOperators -XTypeFamilies -fallow-undecidable-instances
    -XMultiParamTypeClasses -XFunctionalDependencies #-}

As you can see, the extension doesn’t do anything to mitigate our “undecidability” problems. I’m also still making use of some type class extensions, but as we will see later that’s just necessary in one place, for quite an interesting reason.

Preliminaries are pretty much as before:

module Main where

data S a
data Z

data Negative

type Zero = Z
type One = S Z
type Three = S (S (S Z))
type Five = S (S (S (S (S Z))))
type OneHundred = (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S Z)
                  )))))))))))))))))))))))))))))))))
                  )))))))))))))))))))))))))))))))))
                  )))))))))))))))))))))))))))))))))

Now, we reach our first type family: the subtraction function!

type family Sub a b
type instance Sub a Z = a
type instance Sub Z (S b) = Negative
type instance Sub (S a) (S b) = Sub a b

Woah, that’s it? Let’s remind ourselves of what it looked like last time:

class Sub a b c | a b -> c, c b -> a where
instance Sub a Z a where
instance (Sub a b c) => Sub (S a) (S b) c where

As you may guess, the function-ness of Sub is now enforced by the type families, rather than the functional dependencies. Apart from that it really just looks like I swapped some keywords around (and added a clause for negative numbers). However, the modulus finding code is a hell of a lot more readable:

type family Mod a b
-- Need -fallow-undecidable-instances for nested applications
type instance Mod a b = ModIter a b (Sub a b)

type family ModIter a b next
type instance ModIter a b Negative = a
type instance ModIter a b Z = Z
type instance ModIter a b (S next) = Mod (S next) b

In fact, it’s so readable it almost looks like a real programming language :-) ! The rest of the translated code is similarly improved, though we still need to define distinct type functions if we want to do some sort of case split, which bulks out the program a bit:

type family IfZero test true false
type instance IfZero Z true false = true
type instance IfZero (S a) true false = false

data Boring
data Fizz
data Buzz
data FizzBuzz

type family FizzBuzziness fizz buzz
type instance FizzBuzziness Boring Boring = Boring
type instance FizzBuzziness Boring Buzz = Buzz
type instance FizzBuzziness Fizz Boring = Fizz
type instance FizzBuzziness Fizz Buzz = FizzBuzz

type family AnswerAt i
-- Need -fallow-undecidable-instances for nested applications
type instance AnswerAt i = FizzBuzziness (IfZero (Mod i Three) Fizz Boring) (IfZero (Mod i Five) Buzz Boring)

data Nil
data a :+: b

type family AnswerIter i
type instance AnswerIter Z = Nil
type instance AnswerIter (S a) = AnswerIter a :+: AnswerAt (S a)

Here’s where we come to a somewhat interesting issue. If you recall, when we wanted to actually compute the type-level answer last time we just wrote something like this:

tAnswerIter :: AnswerIter i a => i -> a
tAnswerIter = undefined

answer = tAnswerIter (undefined :: OneHundred)

In resolving the AnswerIter constraint the compiler was forced into evaluating our huge stack of type classes. You might expect that we could write something like this and then ask for it’s type in GHCi to get the same effect with type families:

answer = (undefined :: AnswerIter OneHundred)

Unfortunately (and entirely consistently, when you think about it) type families are lazily evaluated! This means that GHCi just tells us that answer has the type AnswerIter OneHundred: this is no help at all! In order to force evaluation of the function I had to define a “deep seq” style operation by dropping back into type classes:

answer = deepSeq (undefined :: AnswerIter OneHundred)

class DeeqSeq a b | a -> b where
    deepSeq :: a -> b

instance DeeqSeq Nil Nil where
    deepSeq = id

instance DeeqSeq (x :+: y) (x :+: y) where
    deepSeq = id

Now we get the expected result, just as before:

Prelude Main> :t answer
answer :: (…(Nil :+: Boring) :+: Boring) :+: Fizz) :+: Boring) :+: Buzz)
       :+: Fizz) :+: Boring) :+: Boring) :+: Fizz) :+: Buzz) :+: Boring)
       :+: Fizz) :+: Boring) :+: Boring) :+: FizzBuzz) :+: …)

IT LIVES! AGAIN!

I’ve yet to actually do anything useful with type families, but they seem like a powerful extension. Haskell users will be able to get their sweaty hands on them when the next release of GHC rolls around, which I’ve heard will be ICFP08 time.

Now, I promise not to write the word “FizzBuzz” again for at least three months ;-) .


Apr 24 2008

The Summer Of Code, or Compiler Development for the Masses

I’m very pleased to report that my application for the Google Summer of Code has been accepted! It almost goes without saying to mention that I’ve proposed work on the leading compiler for my language-du-jour: Haskell!

So, what exactly am I working on? Well, I and my mentor, Sean Seefried, think it would be awesome if we could give users of Haskell the ability to extend the compiler with their own code. Of course, they can do this today if they are willing to dig into and try and grok the rather intimidating guts of GHCs 216KLOC codebase, but we’d really like to let you do it without a source checkout of GHC and in a way so that it’s easy to use other peoples extensions too.

How are we going to meet these exacting criteria? The plan is to let people write modules that we can distribute via the existing Cabal packaging/build system infrastructure and load into GHC dynamically! We owe the ability to do this to Don Stewart’s excellent hs-plugins library. I’m also going to rustle up a good chunk of documentation and sample code to make it easy as pie to get into development.

This is going to give the Haskell community a whole new way in which to extend the language: I’m very excited to see what they come up with! However, here are just a taste of some of the more reasonable things I think our plugins are going to be able to do:

  • Selectively make Haskell a strict functional programming language
  • Optimize your code in whatever application-specific way you can come up with
  • Declaratively memoize arbitrary function definitions
  • Compile Haskell code to run on GPUs if available
  • Simple empirical research on functional programming by letting you write code analysis extensions
  • Cure world hunger

OK, that last one might be a bit optimistic, but I’m still very excited about the possibilities :-) .

What’s more, although nothing is certain, it looks like come October I’ll be working here at the Cambridge Computer Lab as a PhD student! I’ve proposed to investigate some aspects of parallelism in functional programming – more on this as it unfolds. I’ve just got to worry about getting a first in my finals – which are only a month or so away! Gah!


Mar 14 2008

Bitesize Functional Programming: Comprehensive Comprehensions

As my final year undergraduate project I’ve implemented Philip Wadler’s and Simon Peyton Jones’ Comprehensive Comprehensions in the Glasgow Haskell Compiler. I’m happy to report that my patch was accepted for inclusion in the compiler, so this is a feature you’ll really be able to use come the release of 6.10!

So, what are comprehensive comprehensions? Well, first let’s just review what list comprehensions are:

triples = [(a,b,c) | a <- [1..20]
                   , b <- [a..20]
                   , let sum = a^2 + b^2
                   , c <- [b..20]
                   , sum == c^2]

This Haskell code draws numbers from 1 to 20 into the variable a, and for each one of those elements draws the numbers from a to 20 into b. We then compute the sum of the squares of each of those numbers, and finally give up in all but the case where we can find some c in the range from b to 20 such that that sum is the same as the square of c. Assuming that we have passed this test, we output a tuple containing a, b and c. Clearly, this implements some code for finding small Pythagorean triples.

Pythagorean Triples

So, the result will be:

[(3,4,5), (5,12,13), (6,8,10), (8,15,17), (9,12,15), (12,16,20)]

We can build an analogy between what we can do with list comprehensions and what we can do with parts of SQL. Drawing an item from a list is like SELECTion, filtering out certain items is similar to a WHERE clause and so on. However, there are some prominent features of SQL that aren’t so easy to express with just these vanilla list comprehensions. How could we write the following query in Haskell?

SELECT Name, SUM(Salary)
FROM Employees
GROUP BY Dept
ORDER BY SUM(Salary) ASC
LIMIT 5

Well, the answer is that we can use our lovely new generalized list comprehensions and write this:

[(the dept, sum salary)
  | (name, dept, salary) <- employees
  , then group by dept
  , then sortWith by sum salary
  , then take 5]

Notice that the existing keyword “then” is being used to introduce the sorting and grouping functionality that we require: this happens to be a bit different from the syntax in the paper, so be careful you don’t get caught out by this.

What isn’t obvious from the presentation above is just how general these comprehensive comprehensions are! Unlike SQL you can sort or group at any point in the query without making use of subqueries. Unlike SQL you can use any aggregation function, not just one it has defined like COUNT or SUM. Most significantly, unlike SQL you can actually use alternative grouping and sorting functions! So, you could choose a grouping function that actually looked for runs in the thing you are grouping by, or a “sorting” function that just returns the first 5 elements of the list: if you look at the example above we used just this feature to implement the LIMIT clause of SQL.

These new pieces of syntax are pretty nice, and speaking from personal experience it’s quite hard to do heavy duty list manipulation without them once you’ve grown used to having them available. In particular, I found them to have a killer application in the code I used for benchmarking the syntax extension, for slicing and dicing lists of the runtime and heap usage of various program runs.

If you want the full story on how to use the syntax, you can check out the documentation. Go forth and conquer with your new, readable, list-munging powers! I’ll leave you with a picture of the guys you have it all to thank for, those giants of functional programming Simon Peyton Jones and Philip Wadler:

Simon Peyton Jones and Philip Wadler