Jul 14 2008

Compiler Plugins For GHC: Week Five

Welcome to the fourth instalment in this ongoing series! How have compiler plugins progressed this week?

Haddocking GHC Internals
If we're going to have any hope of people other than GHC developers writing plugins, there needs to be accessible documentation about the relevant parts of the compiler. The lion's share of my time last week was taken up documenting about 40 of GHC's modules, describing their functions and writing some notes about GHC jargon that is useful to know.

Unfortunately, I don't yet have an HTML version to show you, since Haddock won't run on GHC's source code yet due to our use of the C preprocessor. However, I'm assured that this is being resolved.

Annotation System Enhancements
I've added the ability for plugins to attach new annotations to Core syntax trees during compilation, rather than just sticking with the ones that were present in the source statically. This means you could now implement e.g. a strictness analysis pass, along with a separate pass that uses the annotations generated by the first to perform the worker-wrapper transform.

I've additionally added the ability to generate annotations through Template Haskell. This smells like it might be useful to someone.

Phase Aliases
Roman Leshchinskiy replied to my previous email on phase control for GHC and came up with some interesting comments. The upshot of this is that I've added the ability to specify phase equality, rather than just inequality:

{-# PHASE Foo = Bar, < Spqr #-}

Pan Optimization Sample Plugin
My mentor for this project, Sean Seefried, worked on pluggable compilers as part of his doctoral studies. One of the things that came out of that was a GHC plugin that performed a domain-specific optimization known as "image lifting" on his reimplementation of the Pan combinator library for functional image construction.

I've ported this plugin to my own compiler plugins framework, and extracted considerable amounts of its utility code into GHC itself for use by other plugin authors. I also hope to use its codebase as an exemplar of how to write a large compiler plugin.

API Cleanup
GHC has grown by accretion, and as a result some of the APIs we provide are inconsistently named, form incomplete sets of functions, and so on. I've been spending some time refactoring the worst offenders so the code is a bit more presentable, and hopefully a bit easier to get a handle on for users new to GHC.

Conclusion
A good week: I've hit all the goals I laid out in the last instalment of this series. However, documentation work is a bit tedious and I'm glad I've got a large chunk of it out of the way: this leaves me free to work on some more exciting things this week.

This week I'm focusing on polishing off the rough edges in my API and sample plugins, so they are something approaching releasable. I also have numerous annoying to-dos accumulated from the last five weeks that I will need to take another look at. This small stuff aside, I've just spent the first day of week 6 working on a rather exciting feature that I think will be very useful even for those who do not plan to be plugin authors: you'll have to wait until my next post to find out about that!


Jul 5 2008

Compiler Plugins For GHC: Weeks Three and Four

I was attending graduation last weekend so didn't have time to put a progress post together, which means that this week you get a double helping of GHC-plugins goodness! For those new to the series, you might like to read the first two posts before continuing.

Type Safe Dynamic Loading

This project is all about dynamically loading plugin code into GHC so it can transform the program being compiled. Until now, I blindly trusted that if the plugin exposed a value with the name "plugin" it was indeed a Plugin data structure, but now I've pieced together some parts of the GHC typechecker to use the associated .hi files to check whether this is indeed the case.

It might be useful to expose this to users as an alternative to parts of hs-plugins, but as it would need a spot of generalization work to be done first this is not on the agenda at the moment.

Annotations System

What is an annotations system? It is entirely analogous to the annotations system of Java or the attributes of .NET languages in that it allows you to associate bits of user data with elements of the program. In my current design, you can attach annotations to modules, types, data constructors and top level values, like so:

module Example where

import AnnotationTypes ( MyAnnotationType )

{-# MODANN MyAnnotationType { friendliness = 10, friend = "Jim" } #-}

{-# TYPEANN Foo MyAnnotationType { friendliness = 20, friend = "Bob" } #-}
data Foo = Bar

{-# ANN f MyAnnotationType { friendliness = 30, friend = "Jane" } #-}
f x = x

Actually, you can use arbitrary expressions in your annotations, as long as they eventually boil down to something with a Typeable instance. So this rather esoteric expression would be perfectly fine:

{-# ANN f SillyAnnotation { foo = (id 10) + $( [| 20 |]), bar = 'f } #-}

You probably want to avoid non-termination or even expensive computation in those annotations as they are potentially evaluated at compile time! Plugins can see annotations during compilation and hence can use them to guide the transformations they perform on your code, but you can also access the annotations of any module via the GHC API.
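For concreteness, MyAnnotationType might be defined along these lines. The record shape here is my guess from the examples above, not from the post itself: the only hard requirement is that the payload type (eventually) has a Typeable instance.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Typeable

-- A guessed-at definition of the annotation payload used in the
-- examples above; any type with a Typeable instance will do.
data MyAnnotationType = MyAnnotationType
    { friendliness :: Int
    , friend       :: String
    } deriving (Show, Typeable)

main :: IO ()
main = print (friendliness (MyAnnotationType 30 "Jane"))
```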

I've previously alluded to the difficulties with an annotations system: I'll take this opportunity to discuss them a bit further. Consider this program:

module Example2 ( exported ) where

exported = not_exported

{-# ANN not_exported Just "Hello" #-}
not_exported = 10

Because "not_exported" is only used once its definition will be inlined straight into "exported" (regardless of it's size). This means that the annotation on it is entirely useless, as odds are that the plugin will never see the not_exported identifier in the program!

We have the same sort of problem with identifiers in the rules system, and require manual addition of NOINLINE pragmas to the relevant identifiers to circumvent it, but it all feels rather clumsy and I'm not sure what the best solution is.
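A sketch of what that manual workaround looks like for the example above (the NOINLINE placement is my own illustration of the technique, not code from the post):

```haskell
module Example2 ( exported ) where

exported = not_exported

-- NOINLINE pins the binding down so it survives to the plugin passes,
-- keeping the annotation attached to a live identifier
{-# NOINLINE not_exported #-}
{-# ANN not_exported (Just "Hello") #-}
not_exported = 10
```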

Note that this is not a problem for other modules accessing the annotation, as by definition they do so on an exported identifier that does not suffer this treatment.

Another problem with annotations is that it's almost impossible to allow them on non top-level identifiers with the current GHC implementation, as those identifiers get created and destroyed with reckless abandon by compiler passes.

We work around this for things like SCCs by actually making SCCs a kind of expression in the intermediate language, but doing this for annotations doesn't sit well with the idea of being able to look up the annotations attached to a particular identifier from other modules using the GHC API. So again, I'm not quite sure what the solution is here.

Sample Plugins

To try and produce some sample code for the eventual release and get some experience about the API I need to provide to plugin authors I have implemented some simple compiler plugins. I've got two complete so far:

  • A plugin to make Haskell a strict rather than lazy language
  • A plugin that performs GHC's current common subexpression elimination pass, but outside of the main compiler

There are some more planned: watch this space.

Conclusion
It's been a month since I began the project, and I'm fairly pleased with my progress up to this point. There's still a lot left to do, but I'm confident I should have something presentable by the end of the Summer Of Code period.

Next week will probably see some refinements to the annotation and phase control systems, the construction of a few more sample plugins, and perhaps even a start on documenting GHC's internals to some extent.


Jun 23 2008

Compiler Plugins For GHC: Week Two

I wasn't quite as productive with my Summer Of Code project this week as I was last week. Let's take a look at the big ticket items that were accomplished.

Phase Control System Implemented

I covered the design in my last post, and most of my work over the week has been on implementing and refining that proposal. It's now essentially done, apart from one small but significant wrinkle. The new phase control system lets you write rules that make use of phases above and beyond the existing ontology of phase 0, 1 and 2. An example of such a rule is as follows:

import GHC.Prim ({-# PHASE ConstructorSpecialization #-})

{-# RULES "foldr/build" [~ConstructorSpecialization] foldr c n (build g) = g c n #-}

That's all very well, but the snag is that we actually have one of these phases for every compiler pass in GHC, so in order to ensure that we always fire the RULEs that may be set up we need to insert a full simplifier pass after almost every compiler pass - yikes! That's a lot more simplification than we currently do and it can't be good for compile times. I'm still thinking about how to resolve that one.

Compiler Pipeline Dynamically Constructed From Constraints
This is the reason why we have one phase for every compiler pass: I've changed GHC so its entire core-to-core pipeline is now built up from the relative ordering of these phases and the phase tags I've attached to every pass. This is a prerequisite for allowing compiler plugin authors to insert their own core-to-core passes by specifying declaratively when they would like them to run.

Template Haskell Phase Integration
So we have these phase pragmas, but how do plugins refer to phases in their actual code that talks to GHC? The answer is with the new support for phases in Template Haskell!

{-# PHASE MyPhase #-}

... stuff ...

getPluginPasses :: HscEnv -> CoreModule -> IO [PhasedPluginPass]
getPluginPasses hsc_env cm = do
    ... stuff ...
    Just phase_name <- thNameToGhcName hsc_env '''MyPhase
    return [PhasedPluginPass phase_name (VanillaPluginPass pass)]

This code is using the new triple quote notation to get a Template Haskell Name for the compiler phase, which is converted to a GHC Name and finally given to GHC itself. Of course, the Template Haskell support allows a lot more than this, such as generating new phases and splicing them in to your code at compile time.

Conclusion
The project is still coming steadily along. I'm starting this week with ancillary work on the Static Argument Transformation that isn't directly related to the project, but then I hope to move on to the plugin annotations system that I called out last week as a looming and highly thorny issue: expect to see more on this topic soon!


Jun 15 2008

Compiler Plugins For GHC: The First Week

Things have been coming along very well with my Summer of Code project to add dynamically loaded plugins to the Glasgow Haskell Compiler. In my first week of coding post-finals I've got a lot done. I'll be discussing two of the headline items in this post.

Proof Of Concept Plugin Loading

GHC is capable of dynamically loading plugins specified on the command line from any installed package, and running the compiler phases that they install. To give you an idea of what that looks like, here is the current code for my sample-plugin project:

module Simple.Plugin(plugin) where

import UniqSupply
import DynFlags
import HscTypes
import CoreSyn
import Outputable
import Module

import Plugins (Plugin(..), PluginPass(..))

plugin :: Plugin
plugin = Plugin {
    initializePlugin = initialize,
    getPluginPasses = getPasses
  }

initialize :: IO ()
initialize = do
    putStrLn "Simple Plugin Initialized"

getPasses :: CoreModule -> IO [PluginPass]
getPasses cm = do
    putStrLn "Simple Plugin Passes Inspecting Module"
    let mod_s = showSDoc (pprModule (cm_module cm))
    putStrLn $ "Simple Plugin Passes Queried For " ++ mod_s
    return [PluginPass pass]

pass :: DynFlags -> UniqSupply -> [CoreBind] -> IO [CoreBind]
pass _ _ binds = do
    putStrLn "Simple Plugin Pass Run"
    return binds

There's a lot of work still to do here: the biggie is allowing "annotations" à la languages like C# that let you mark identifiers or expressions in the language with extra stuff that meta-programs can make use of. For example, you might want to tag which functions you want your compiler plugin to analyse or add instrumenting code to. It's quite hard to get this feature right, and I'll probably be posting some more about the issues involved later as I get closer to implementing it.

Phase Control

GHC compiles your programs in a classic pipelined style: the main stages in a typical pipeline would be lexing, parsing, typechecking, desugaring, optimization and finally code generation. Although most of these stages have to run in a particular order, some of the stages can potentially run in multiple orders, most notably those sub-stages that make up the "optimization" stage I mentioned.

This is relevant to plugins because we need to be able to say when any phases you install should run. However, it also turns out that we use this feature to allow compiler users to control when inlining and source code rewrite rules should be applied, as documented in the user guide.

The current system we use is a bit ugly and just establishes a mapping between controlled things and the natural numbers to establish an ordering. This week, I've proposed (and partially implemented) a system for phase control that uses phase names that are declared in PHASE pragmas and henceforth exported and imported just like any other Haskell name, so for example:

module Spqr(... {-# PHASE C #-} ...) where

import GHC.Phases({-# PHASE SpecConstr #-})

{-# PHASE C < SpecConstr #-}

{-# RULE "silly" [~C] id = (\x -> x) #-}

This establishes a new phase C that must run before GHC's constructor specialization phase. This phase is in turn used to control the activation of the "silly" rule, and the phase is exported so it can be referred to by other modules.

If you have any comments about this system, please make yourself heard on glasgow-haskell-users!

Conclusion

I'm fairly pleased with my progress so far and having a great time finally doing some coding again after the long exam period!

Hopefully this coming week will see me complete the implementation of the new phase control system and a refactoring of GHC's existing pipeline construction to take into account that phase information. I should then be able to move on to some issues more directly related to plugins, such as the rather thorny issue of the annotation system.


May 8 2008

Free Monads In Haskell

Did you know that you can actually generate a monad automatically given any endofunctor? This is the free monad of that functor. How do we perform this magic trick? Quite simple, everything follows from this data type:

data FreeM f a = Return a
               | Bind (f (FreeM f a))

In this, the f parameter is just the functor we are going to build a monad for, and a is the type of the contents of a FreeM value. We've said that the free monad can either contain such a value directly (boring) or it can contain a value of its own type, but wrapped in the functor f - hmm! As an aside, I'm not sure that Bind is strictly the best name for this constructor, but I like the symmetry.

If you want to build up some intuition about this definition, you can think of it as specifying values that consist of an actual a (in a Return constructor) but nested within a finite number of functor applications.

As another aside, you might like to know that Haskell would actually let you create an infinite stack of functors fairly straightforwardly, since all its data declarations are really co-data declarations. Don't do this, though, or you might start having some non-termination issues :-)

Having got this far through the post, you won't be surprised to learn that we can make this definition into a functor itself. Here comes the implementation:

instance (Functor f) => Functor (FreeM f) where
    fmap f (Return a) = Return (f a)
    fmap f (Bind fm) = Bind (fmap (fmap f) fm)

That Bind case looks pretty scary, doesn't it? The leftmost fmap call is that from the functor f that we are making a free monad over, and the rightmost one is actually a recursion back into the fmap we are currently trying to compute, but one functor "lower".

Essentially all we are doing in this definition is fmapping all the way down our tower of functors until we finally reach a nice sane a value in the Return case, which we handle in the obvious way.

So far so good. But we've come all this way and still don't have a monad, even though I promised you one at the start. Well, let's sort that out now! Instead of defining one using the usual Haskell >>= operator, I'm going to use the more category-theoretical join :: Monad m => m (m a) -> m a construction:

joinFreeM :: Functor f => FreeM f (FreeM f a) -> FreeM f a
joinFreeM (Return a) = a
joinFreeM (Bind fm) = Bind (fmap joinFreeM fm)

If you think about the type of this operation, what we want to do is stick the pile of functors associated with the outermost FreeM onto the pile of functors associated with the innermost one, to produce one cohesive functor pile. How we go about this is fairly straightforward: in the Return case there are no functor applications on the outermost FreeM so we just give back the inner one. The Bind case simply recurses its way down the whole functor pile, mushing them together in some sense :-).

That was it. The code to actually declare a Monad instance is entirely boilerplate given that we have a join operation:

instance (Functor f) => Monad (FreeM f) where
    return = Return
    m >>= f = joinFreeM ((fmap f) m)

Pretty neat! But what, pray tell, are these things actually useful for? Well, after I'd implemented my own version of free monads I found that Wouter Swierstra's excellent paper "Data types à la carte" had already publicly demonstrated the same thing (in passing) with his Term data structure, and he has some nice examples. For instance, consider this functor:

data Zero a = Zero
  deriving (Show)

instance Functor Zero where
    fmap _ Zero = Zero

zeroTest :: FreeM Zero Int -> FreeM Zero Int
zeroTest my = do
    x <- return 5
    y <- my
    return $ x + y

-- > zeroTest (return 1)
-- 6
-- > zeroTest (Bind Zero)
-- Zero

Remind you of anything? It should: it's just the Maybe monad!
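To make that correspondence concrete, here is a small self-contained sketch (repeating the FreeM and Zero definitions from above) that converts between the two representations; the function names are my own:

```haskell
data FreeM f a = Return a
               | Bind (f (FreeM f a))

data Zero a = Zero

-- FreeM Zero a is isomorphic to Maybe a: Return is Just, Bind Zero is Nothing
toMaybe :: FreeM Zero a -> Maybe a
toMaybe (Return a)  = Just a
toMaybe (Bind Zero) = Nothing

fromMaybe' :: Maybe a -> FreeM Zero a
fromMaybe' (Just a) = Return a
fromMaybe' Nothing  = Bind Zero

main :: IO ()
main = do
    print (toMaybe (Return 'x'))                    -- Just 'x'
    print (toMaybe (Bind Zero :: FreeM Zero Char))  -- Nothing
```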

What about this?

data One a = One a
  deriving (Show)

instance Functor One where
    fmap f (One a) = One (f a)

oneTest :: FreeM One Int -> FreeM One Int
oneTest my = do
    x <- return 5
    y <- my
    return $ x + y

-- > oneTest (Return 2)
-- 7
-- > oneTest (Bind $ One $ Return 2)
-- One 7
-- > oneTest (Bind $ One $ Bind $ One $ Return 2)
-- One One 7

Something like the identity monad: pretty boring! What if we try something a bit more exotic (and not mentioned in Wouter's paper) like the list functor?

listTest :: FreeM [] Int
listTest = do
    x <- Bind [Return 1, Return 2]
    y <- Bind [Return 3, Return 4]
    return $ x + y

-- > listTest
-- [[4,5],[5,6]]

But for the want of a call to concat, this looks strikingly similar to the list monad - and all without ever having to define it. Interesting stuff!
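That missing concat can be supplied by a small fold that collapses the functor tower, after which FreeM [] really does behave like the list monad. Here is a self-contained sketch (definitions repeated from above; the Applicative instance is only there to keep recent GHCs happy, and flatten is my own name for the collapsing function):

```haskell
data FreeM f a = Return a
               | Bind (f (FreeM f a))

instance Functor f => Functor (FreeM f) where
    fmap f (Return a) = Return (f a)
    fmap f (Bind fm)  = Bind (fmap (fmap f) fm)

-- Required by modern GHCs before we can declare a Monad instance
instance Functor f => Applicative (FreeM f) where
    pure = Return
    mf <*> mx = mf >>= \f -> fmap f mx

instance Functor f => Monad (FreeM f) where
    return = Return
    Return a >>= f = f a
    Bind fm  >>= f = Bind (fmap (>>= f) fm)

-- The "missing concat": squash the nested lists back into a single one
flatten :: FreeM [] a -> [a]
flatten (Return a) = [a]
flatten (Bind xs)  = concatMap flatten xs

listTest :: FreeM [] Int
listTest = do
    x <- Bind [Return 1, Return 2]
    y <- Bind [Return 3, Return 4]
    return (x + y)

main :: IO ()
main = print (flatten listTest)  -- [4,5,5,6]
```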

I've found that the fun of free monads is that you can play around with any functor you like and get something interesting generated from it. Since they're free, this happens without you typing a single line of non-trivial code, so I encourage you to go and try a few out quickly and see what you can make happen!


Apr 27 2008

FizzBuzzing With Type Families

Some time ago, I published a post discussing how we solve the now-notorious FizzBuzz problem with computation in the type system. In the interests of flogging this horse well and truly to death, I've since adapted the code to make use of an upcoming GHC feature: type families. These are essentially a form of function at the type level: just what we need to make our FizzBuzz program a little more sane!

If you're playing along at home, you'll need a HEAD version of GHC and the following options pragma:

{-# OPTIONS_GHC -XEmptyDataDecls -XTypeOperators -XTypeFamilies -fallow-undecidable-instances
    -XMultiParamTypeClasses -XFunctionalDependencies #-}

As you can see, the extension doesn't do anything to mitigate our "undecidability" problems. I'm also still making use of some type class extensions, but as we will see later that's just necessary in one place, for quite an interesting reason.

Preliminaries are pretty much as before:

module Main where

data S a
data Z

data Negative

type Zero = Z
type One = S Z
type Three = S (S (S Z))
type Five = S (S (S (S (S Z))))
type OneHundred = (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S Z)
                  )))))))))))))))))))))))))))))))))
                  )))))))))))))))))))))))))))))))))
                  )))))))))))))))))))))))))))))))))

Now, we reach our first type family: the subtraction function!

type family Sub a b
type instance Sub a Z = a
type instance Sub Z (S b) = Negative
type instance Sub (S a) (S b) = Sub a b

Woah, that's it? Let's remind ourselves of what it looked like last time:

class Sub a b c | a b -> c, c b -> a where
instance Sub a Z a where
instance (Sub a b c) => Sub (S a) (S b) c where

As you may guess, the function-ness of Sub is now enforced by the type families, rather than the functional dependencies. Apart from that it really just looks like I swapped some keywords around (and added a clause for negative numbers). However, the modulus finding code is a hell of a lot more readable:

type family Mod a b
-- Need -fallow-undecidable-instances for nested applications
type instance Mod a b = ModIter a b (Sub a b)

type family ModIter a b next
type instance ModIter a b Negative = a
type instance ModIter a b Z = Z
type instance ModIter a b (S next) = Mod (S next) b

In fact, it's so readable it almost looks like a real programming language :-)! The rest of the translated code is similarly improved, though we still need to define distinct type functions if we want to do some sort of case split, which bulks out the program a bit:

type family IfZero test true false
type instance IfZero Z true false = true
type instance IfZero (S a) true false = false


data Boring
data Fizz
data Buzz
data FizzBuzz

type family FizzBuzziness fizz buzz
type instance FizzBuzziness Boring Boring = Boring
type instance FizzBuzziness Boring Buzz = Buzz
type instance FizzBuzziness Fizz Boring = Fizz
type instance FizzBuzziness Fizz Buzz = FizzBuzz

type family AnswerAt i
-- Need -fallow-undecidable-instances for nested applications
type instance AnswerAt i = FizzBuzziness (IfZero (Mod i Three) Fizz Boring) (IfZero (Mod i Five) Buzz Boring)


data Nil
data a :+: b

type family AnswerIter i
type instance AnswerIter Z = Nil
type instance AnswerIter (S a) = AnswerIter a :+: AnswerAt (S a)

Here's where we come to a somewhat interesting issue. If you recall, when we wanted to actually compute the type-level answer last time we just wrote something like this:

tAnswerIter :: AnswerIter i a => i -> a
tAnswerIter = undefined

answer = tAnswerIter (undefined :: OneHundred)

In resolving the AnswerIter constraint the compiler was forced into evaluating our huge stack of type classes. You might expect that we could write something like this and then ask for its type in GHCi to get the same effect with type families:

answer = (undefined :: AnswerIter OneHundred)

Unfortunately (and entirely consistently, when you think about it) type families are lazily evaluated! This means that GHCi just tells us that answer has the type AnswerIter OneHundred: this is no help at all! In order to force evaluation of the function I had to define a "deep seq" style operation by dropping back into type classes:

answer = deepSeq (undefined :: AnswerIter OneHundred)

class DeepSeq a b | a -> b where
    deepSeq :: a -> b

instance DeepSeq Nil Nil where
    deepSeq = id

instance DeepSeq (x :+: y) (x :+: y) where
    deepSeq = id

Now we get the expected result, just as before:

Prelude Main> :t answer
answer :: (…(Nil :+: Boring) :+: Boring) :+: Fizz) :+: Boring) :+: Buzz)
       :+: Fizz) :+: Boring) :+: Boring) :+: Fizz) :+: Buzz) :+: Boring)
       :+: Fizz) :+: Boring) :+: Boring) :+: FizzBuzz) :+: …)

IT LIVES! AGAIN!

I've yet to actually do anything useful with type families, but they seem like a powerful extension. Haskell users will be able to get their sweaty hands on them when the next release of GHC rolls around, which I've heard will be ICFP08 time.

Now, I promise not to write the word "FizzBuzz" again for at least three months ;-).


Apr 24 2008

The Summer Of Code, or Compiler Development for the Masses

I'm very pleased to report that my application for the Google Summer of Code has been accepted! It almost goes without saying to mention that I've proposed work on the leading compiler for my language-du-jour: Haskell!

So, what exactly am I working on? Well, I and my mentor, Sean Seefried, think it would be awesome if we could give users of Haskell the ability to extend the compiler with their own code. Of course, they can do this today if they are willing to dig into and try to grok the rather intimidating guts of GHC's 216KLOC codebase, but we'd really like to let you do it without a source checkout of GHC and in a way that makes it easy to use other people's extensions too.

How are we going to meet these exacting criteria? The plan is to let people write modules that we can distribute via the existing Cabal packaging/build system infrastructure and load into GHC dynamically! We owe the ability to do this to Don Stewart's excellent hs-plugins library. I'm also going to rustle up a good chunk of documentation and sample code to make it easy as pie to get into development.

This is going to give the Haskell community a whole new way in which to extend the language: I'm very excited to see what they come up with! However, here are just a taste of some of the more reasonable things I think our plugins are going to be able to do:

  • Selectively make Haskell a strict functional programming language
  • Optimize your code in whatever application-specific way you can come up with
  • Declaratively memoize arbitrary function definitions
  • Compile Haskell code to run on GPUs if available
  • Simple empirical research on functional programming by letting you write code analysis extensions
  • Cure world hunger

OK, that last one might be a bit optimistic, but I'm still very excited about the possibilities :-).

What's more, although nothing is certain, it looks like come October I'll be working here at the Cambridge Computer Lab as a PhD student! I've proposed to investigate some aspects of parallelism in functional programming - more on this as it unfolds. I've just got to worry about getting a first in my finals - which are only a month or so away! Gah!


Mar 14 2008

Bitesize Functional Programming: Comprehensive Comprehensions

As my final year undergraduate project I've implemented Philip Wadler's and Simon Peyton Jones' Comprehensive Comprehensions in the Glasgow Haskell Compiler. I'm happy to report that my patch was accepted for inclusion in the compiler, so this is a feature you'll really be able to use come the release of 6.10!

So, what are comprehensive comprehensions? Well, first let's just review what list comprehensions are:

triples = [(a,b,c) | a <- [1..20]
                   , b <- [a..20]
                   , let sum = a^2 + b^2
                   , c <- [b..20]
                   , sum == c^2]

This Haskell code draws numbers from 1 to 20 into the variable a, and for each one of those elements draws the numbers from a to 20 into b. We then compute the sum of the squares of each of those numbers, and finally give up in all but the case where we can find some c in the range from b to 20 such that that sum is the same as the square of c. Assuming that we have passed this test, we output a tuple containing a, b and c. Clearly, this implements some code for finding small Pythagorean triples.

Pythagorean Triples

So, the result will be:

[(3,4,5), (5,12,13), (6,8,10), (8,15,17), (9,12,15), (12,16,20)]

We can build an analogy between what we can do with list comprehensions and what we can do with parts of SQL. Drawing an item from a list is like SELECTion, filtering out certain items is similar to a WHERE clause and so on. However, there are some prominent features of SQL that aren't so easy to express with just these vanilla list comprehensions. How could we write the following query in Haskell?

SELECT Name, SUM(Salary)
FROM Employees
GROUP BY Dept
ORDER BY SUM(Salary) ASC
LIMIT 5

Well, the answer is that we can use our lovely new generalized list comprehensions and write this:

[(the dept, sum salary)
  | (name, dept, salary) <- employees
  , then group by dept
  , then sortWith by sum salary
  , then take 5]

Notice that the existing keyword "then" is being used to introduce the sorting and grouping functionality that we require: this happens to be a bit different from the syntax in the paper, so be careful you don't get caught out by this.

What isn't obvious from the presentation above is just how general these comprehensive comprehensions are! Unlike SQL you can sort or group at any point in the query without making use of subqueries. Unlike SQL you can use any aggregation function, not just one it has defined like COUNT or SUM. Most significantly, unlike SQL you can actually use alternative grouping and sorting functions! So, you could choose a grouping function that actually looked for runs in the thing you are grouping by, or a "sorting" function that just returns the first 5 elements of the list: if you look at the example above we used just this feature to implement the LIMIT clause of SQL.
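For comparison, here is one way to express the same query with plain Data.List machinery; the employees table is hypothetical sample data, and dept/salary are helper functions I've made up for this sketch:

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- Hypothetical sample data: (name, dept, salary)
employees :: [(String, String, Int)]
employees = [("Jim", "Sales", 100), ("Bob", "Sales", 200), ("Jane", "IT", 500)]

dept :: (String, String, Int) -> String
dept (_, d, _) = d

salary :: (String, String, Int) -> Int
salary (_, _, s) = s

-- GROUP BY dept, ORDER BY SUM(salary) ASC, LIMIT 5 -- assembled by hand
query :: [(String, Int)]
query = take 5
      . sortBy (comparing snd)
      . map (\grp -> (dept (head grp), sum (map salary grp)))
      . groupBy ((==) `on` dept)
      . sortBy (comparing dept)
      $ employees

main :: IO ()
main = print query  -- [("Sales",300),("IT",500)]
```

The point of the extension is precisely that this sort-group-aggregate boilerplate folds away into four comprehension qualifiers.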

These new pieces of syntax are pretty nice, and speaking from personal experience it's quite hard to do heavy duty list manipulation without them once you've grown used to having them available. In particular, I found them to have a killer application in the code I used for benchmarking the syntax extension, for slicing and dicing lists of the runtime and heap usage of various program runs.

If you want the full story on how to use the syntax, you can check out the documentation. Go forth and conquer with your new, readable, list-munging powers! I'll leave you with a picture of the guys you have it all to thank for, those giants of functional programming Simon Peyton Jones and Philip Wadler:

Simon Peyton Jones and Philip Wadler


Feb 27 2008

FizzBuzzing At The Type Level

It seems like the FizzBuzz meme has resurfaced, thanks to this reply to Raganwald's post on the subject. Since I didn't get on the bandwagon last time, I thought I'd take a crack at implementing it with a bit of a twist: using the Haskell type system! This is my first moderately large type level program, so it's probably not quite as elegant as it could be, but it does work and is fairly readable.

As you may know, the FizzBuzz task is to output a list of length 100 that contains a "fizz" whenever the associated index is divisible by 3, a "buzz" if it is divisible by 5, and the concatenation of the two if the index satisfies both of those conditions.
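For reference, the value-level program we are about to encode in types is only a few lines of ordinary Haskell (my own sketch; the post's type-level version emits Boring at non-multiple indices where this one shows the number):

```haskell
-- Value-level FizzBuzz, for comparison with the type-level encoding below
fizzBuzz :: Int -> String
fizzBuzz i
  | i `mod` 15 == 0 = "FizzBuzz"
  | i `mod` 3  == 0 = "Fizz"
  | i `mod` 5  == 0 = "Buzz"
  | otherwise       = show i

main :: IO ()
main = mapM_ (putStrLn . fizzBuzz) [1..100]
```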

Let's take a look at how we can build up a monstrous abuse of the GHC type system to implement this specification piece by piece. First, we need to turn on a crapload of options to make the compiler accept the twisted program we are going to give it:

{-# OPTIONS_GHC -XEmptyDataDecls -XMultiParamTypeClasses -XFlexibleInstances -XFunctionalDependencies
  -XTypeOperators -fallow-undecidable-instances -fcontext-stack=200 #-}

Now, we need some notion of numbers in the type system so we can actually compute divisibility. The standard trick here is to use the Peano numerals, and since it's quite a common need there are actually existing libraries for this: for instance, I could use Oleg's amazing code that even does things like type level logarithm and factorial. However I want the example to be self contained so I will just implement the restricted arithmetic system that I need for FizzBuzz here:

module Main where

data S a
data Z

type Zero = Z
type One = S Z
type Three = S (S (S Z))
type Five = S (S (S (S (S Z))))
type OneHundred = (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S
                  (S (S (S (S (S (S (S (S (S (S Z)
                  )))))))))))))))))))))))))))))))))
                  )))))))))))))))))))))))))))))))))
                  )))))))))))))))))))))))))))))))))

That's all very well, but we also need to be able to do operations on these numbers at the type level. To do this, we encode type-level functions as type classes with associated instance declarations: one for each recursive case. We also mix in some magic pixie dust by using appropriate functional dependencies, which ensure that the classes really are functional rather than merely relational.

Here's an example of this, for the subtraction function:

class Sub a b c | a b -> c, c b -> a where
instance Sub a Z a where
instance (Sub a b c) => Sub (S a) (S b) c where
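As a quick sanity check, you can get GHCi to run the subtraction for you. Here tSub is a hypothetical helper of my own (not part of the final program), with the definitions restated so the snippet stands alone, plus a compile-time assertion that only typechecks if the functional dependency computes 3 - 1 = 2:

```haskell
{-# LANGUAGE EmptyDataDecls, MultiParamTypeClasses,
             FunctionalDependencies, FlexibleInstances,
             UndecidableInstances #-}
module SubDemo where

data S a
data Z

class Sub a b c | a b -> c, c b -> a
instance Sub a Z a
instance Sub a b c => Sub (S a) (S b) c

-- Hypothetical helper: the undefineds exist only to carry the types.
-- In GHCi, :t tSub (undefined :: S (S (S Z))) (undefined :: S Z)
-- reports S (S Z), i.e. 3 - 1 = 2.
tSub :: Sub a b c => a -> b -> c
tSub _ _ = undefined

-- Compile-time assertion: this definition only typechecks if the
-- functional dependency forces the result type to be S (S Z).
sameType :: a -> a -> ()
sameType _ _ = ()

checkSub :: ()
checkSub = sameType (tSub (undefined :: S (S (S Z))) (undefined :: S Z))
                    (undefined :: S (S Z))
```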

This definition should not be too surprising to you if you've seen Peano arithmetic or Church numerals before: it's pretty standard recursive style. However, the modulus function is rather more interesting:

class Mod a b c | a b -> c where
instance (ModIter a b b c) => Mod a b c

class ModIter a i b c | a i b -> c where
instance (Sub b Z a) => ModIter Z Z b Z where
instance (Sub b (S i) a) => ModIter Z (S i) b a where
instance (ModIter (S a) b b c) => ModIter (S a) Z b c where
instance (ModIter a i b c) => ModIter (S a) (S i) b c where

Eeep! The essential trick here is to use a helper type class, ModIter, to loop repeatedly through the range from the modulus we are using down to 0, where on every iteration we subtract one from the thing we want to find the modular reduction of. When we can no longer decrement that number any further (i.e. it hits zero) we exit the loop. The result is just the modulus minus the current iteration number in the inner loop, which gives us exactly the remainder we need. To help you put this in the context of the definition above, the type variable a is the current number we want the modular reduction of, i is the inner loop index, b is the (constant) modular base and c is the eventual result.
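If the instance-level loop is hard to follow, here is a term-level analogue (my own standalone sketch, written as a separate module so its Z and S constructors don't clash with the type-level ones). It mirrors the four instances one-for-one, including the special case that returns Z when both counters hit zero together:

```haskell
module ModDemo where

data Nat = Z | S Nat deriving (Eq, Show)

-- Mirrors the Sub class: a - b.
sub :: Nat -> Nat -> Nat
sub a Z         = a
sub (S a) (S b) = sub a b
sub Z (S _)     = error "negative result"

-- Mirrors ModIter a i b c: a counts down in lockstep with the inner
-- index i, which is reset to the base b whenever it reaches zero.
modIter :: Nat -> Nat -> Nat -> Nat
modIter Z Z _         = Z               -- both hit zero together: remainder 0
modIter Z i b         = sub b i         -- a ran out first: remainder is b - i
modIter (S a) Z b     = modIter (S a) b b  -- reset the inner loop
modIter (S a) (S i) b = modIter a i b   -- decrement both counters

-- Mirrors the Mod class: kick off the loop with i = b.
modNat :: Nat -> Nat -> Nat
modNat a b = modIter a b b

-- Small conversions for experimenting in GHCi.
toInt :: Nat -> Int
toInt Z     = 0
toInt (S n) = 1 + toInt n

fromInt :: Int -> Nat
fromInt 0 = Z
fromInt n = S (fromInt (n - 1))
```

In GHCi, `toInt (modNat (fromInt 7) (fromInt 3))` gives 1, just as the type-level version computes.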

Whew. Not too simple! That's probably the most complicated bit of the program, though; it's easy going from here on out! First, let's have some helper functions for computing various mod operations:

class Mod3 a b | a -> b where
instance (Mod a Three b) => Mod3 a b where

class Mod5 a b | a -> b where
instance (Mod a Five b) => Mod5 a b where

Now we get onto the actual business of FizzBuzz: we're going to need yet more types. We start with some to represent the possible values in our output list:

data Boring
data Fizz
data Buzz
data FizzBuzz

Of course, there isn't really any such thing as a type-level list yet! Let's have one of those too:

data Nil
data a :+: b

Great. Now, we need to be able to work out the current fizzbuzziness of any particular index. This is pretty straightforward with the following definitions:

class Fizziness a b | a -> b where
instance Fizziness Z Fizz where
instance Fizziness (S a) Boring where

class Buzziness a b | a -> b where
instance Buzziness Z Buzz where
instance Buzziness (S a) Boring where

class FizzBuzziness a b c | a b -> c where
instance FizzBuzziness Boring Boring Boring where
instance FizzBuzziness Boring Buzz Buzz where
instance FizzBuzziness Fizz Boring Fizz where
instance FizzBuzziness Fizz Buzz FizzBuzz where

class AnswerAt i a | i -> a where
instance (Mod3 a m3, Mod5 a m5, Fizziness m3 f, Buzziness m5 b, FizzBuzziness f b o) => AnswerAt a o

It's all a bit tortured to ensure that we can prove to the compiler that the functional dependency on AnswerAt is respected, but it is otherwise quite basic. We have a type class each for fizziness and buzziness that, given the modular reduction of the index, outputs either the appropriate "noise" type if the mod is 0 (and hence we have divisibility) or the Boring type otherwise. These results are then laboriously munged together by the FizzBuzziness type class in the obvious way, and AnswerAt just serves to tie the knot between all of the components we have defined so far.

Lastly, we can generate the list of results using a trivial (!) type-level for-loop/map style of thing:

class AnswerIter i a | i -> a where
instance AnswerIter Z Nil where
instance (AnswerIter i a, AnswerAt (S i) f) => AnswerIter (S i) (a :+: f) where

Now, the moment you've all been waiting for! Let's actually get the answer to all our FizzBuzz woes!

tAnswerIter :: AnswerIter i a => i -> a
tAnswerIter = undefined

answer = tAnswerIter (undefined :: OneHundred)

Just load this puppy up in GHCi (this may take a while to compile!) and ask for the type of answer:

Prelude Main> :t answer
answer :: (...(Nil :+: Boring) :+: Boring) :+: Fizz) :+: Boring) :+: Buzz)
       :+: Fizz) :+: Boring) :+: Boring) :+: Fizz) :+: Buzz) :+: Boring)
       :+: Fizz) :+: Boring) :+: Boring) :+: FizzBuzz) :+: ...)

IT LIVES!

Frankenstein


Feb 24 2008

Raytracing With A Single Haskell List Comprehension

My dissertation this year has been on adding Simon Peyton Jones and Phil Wadler's Comprehensive Comprehensions to the Glasgow Haskell Compiler, and I might blog about that in the future. During this work, I needed to come up with some test cases for the generalized list comprehensions, and so, inspired by this post by LukeH, I decided to implement a ray tracer in Haskell where all the raytracing logic is put within a single comprehension!

It turns out that the resulting program (which is heavily based on LukeH's version) only needs to use one feature from the generalized comprehensions, but since it's still pretty neat I thought you might be interested in seeing the (fairly crazy looking) inner loop:

render (Scene { lights, things, camera = Camera { forward, right, up, position } }) = [ row
        | y <- reverse $ (-fOV_HEIGHT, fOV_HEIGHT) `sampledBy` rESOLUTION_Y
        , let row = [ fromMaybe bACKGROUND_COLOR (traceRay ray 0)
                        | x <- (-fOV_WIDTH, fOV_WIDTH) `sampledBy` rESOLUTION_X
                , let lookingAt = norm $ forward #+ (x <* right) #+ (y <* up)
                , let ray = Ray { start = position, direction = lookingAt }
                , let traceRay innerRay depth = maybeHead [ (foldl' (#+) reflectColor naturalColors)
                        | (thing, Just (Intersection { distance, surfacePoint, surfaceNormal }))
                            <- [(thing, intersect thing innerRay) | thing <- things]
                        , then sortWith by distance
                        , let thingSurf = surface thing
                        , let reflectionNormal = norm $ (direction ray) `reflectedIn` surfaceNormal
                        , let naturalColors = [ color
                                | Light { lightPosition, lightColor } <- lights
                                , let lightVector = lightPosition #- surfacePoint
                                , let lightNormal = norm lightVector
                                , let lightDistance = magnitude lightVector
                                , let shadowTestRay = Ray { start = surfacePoint, direction = lightNormal }
                                , let nearestBlockerDistance = [ shadowDistance
                                        | Just (Intersection { distance = shadowDistance }) <-
                                            [intersect otherThing shadowTestRay | otherThing <- things]
                                        , then sortWith by shadowDistance ]
                                        `headOr` infinity
                                , nearestBlockerDistance > lightDistance
                                , let illuminance = lightNormal <.> surfaceNormal
                                , let diffuseColor = (max 0.0 illuminance) <* lightColor
                                , let specularity = lightNormal <.> reflectionNormal
                                , let specularColor = ((max 0.0 specularity) ^ (roughness thingSurf)) <* lightColor
                                , let color = ((diffuseColorAt thingSurf surfacePoint) #* diffuseColor) #+
                                              ((specularColorAt thingSurf surfacePoint) #* specularColor) ]
                        , let reflectSurfacePoint = surfacePoint #+ (epsilonOf reflectionNormal)
                        , let reflectColor =
                                if depth >= mAX_DEPTH
                                then eRROR_COLOR
                                else case (traceRay (Ray { start = reflectSurfacePoint,
                                    direction = reflectionNormal }) (depth + 1)) of
                                    Just reflectedColor ->
                                        (reflectance thingSurf surfacePoint) <* reflectedColor
                                    Nothing -> bACKGROUND_COLOR ] ] ]

Can you spot the generalized list comprehension feature that I'm using?

Here is the obligatory shot of some raytraced reflective colored spheres that it can produce:

Beautiful raytraced colored spheres

Aww, ain't that purdy? I'll post the full code up here if anyone shows any interest, but be warned - it's not nearly as pretty as the pictures it produces! I might be writing a further post about some of the more interesting techniques I've used within it, however..