Sep 30 2011

GHC-specific Alias Analysis for LLVM

The setup

A few years ago, David Terei did some great work adding a LLVM backend to the Glasgow Haskell Compiler. The idea with this is that instead of writing our own optimiser and assembly-code generators for our custom three-address-code, we can just translate into LLVM IR and have LLVM do the heavy lifting. In theory, this means that GHC will be able to compile for many different CPUs, and will benefit from the smart optimisations the LLVM team have implemented.

The portability part has definitely worked out for us: for example, a couple of people have successfully got GHC to compile for the ARM by using the LLVM backend. However, the promise of LLVM being able to speed up our generated code has never really been fully borne out. LLVM-generated code does tend to be better than that produced by GHCs own backends, but this is mostly because LLVM is doing much better register allocation (it is much smarter about reusing the “pinned registers” required that form part of the interface between GHC’s generated code and the garbage collector).

The reason that LLVM does not optimise as much as we would like is often to do with aliasing. In particular, LLVM conservatively assumes that GHC’s stack (which is explicitly represented in the generated code as an array of words) and the heap may alias.

What’s the problem?

A concrete example of this is the following Haskell program:

module Main(main) where

import Data.Array.Base
import Data.Array.IO
import Data.Array.MArray

main :: IO ()
main = do
    arr <- newArray_ (0, 200)
    go arr 2 0 100

go :: IOUArray Int Int -> Int -> Int -> Int -> IO ()
go arr stride x y | x < y     = do unsafeWrite arr (x * stride) 1337
                                   go arr stride (x + 1) y
                  | otherwise = return ()

This loop compiles to fairly good Core:

Main.main_$s$wa =
  \ (@ sg0_sKA::Data.Array.Base.STUArray
                  GHC.Prim.RealWorld GHC.Types.Int GHC.Types.Int
                  ~
                Data.Array.IO.Internals.IOUArray GHC.Types.Int GHC.Types.Int)
    (sc_sKs :: GHC.Prim.State# GHC.Prim.RealWorld)
    (sc1_sKt :: GHC.Prim.Int#)
    (sc2_sKu :: GHC.Prim.Int#)
    (sc3_sKv :: GHC.Prim.Int#)
    (sc4_sKw :: GHC.Types.Int)
    (sc5_sKx :: GHC.Types.Int)
    (sc6_sKy :: GHC.Types.Int)
    (sc7_sKz :: GHC.Prim.MutableByteArray# GHC.Prim.RealWorld) ->
    case GHC.Prim.<# sc2_sKu sc1_sKt of _ {
      GHC.Bool.False -> (# sc_sKs, GHC.Unit.() #);
      GHC.Bool.True ->
        case GHC.Prim.writeIntArray#
               @ GHC.Prim.RealWorld
               sc7_sKz
               (GHC.Prim.*# sc2_sKu sc3_sKv)
               1337
               sc_sKs
        of s2#_aHo { __DEFAULT ->
        Main.main_$s$wa
          @ (sym
               Data.Array.IO.Internals.NTCo:IOUArray GHC.Types.Int GHC.Types.Int)
          s2#_aHo
          sc1_sKt
          (GHC.Prim.+# sc2_sKu 1)
          sc3_sKv
          sc4_sKw
          sc5_sKx
          sc6_sKy
          sc7_sKz
        }
    }

One weird thing about this Core is that it passes around a number of dead arguments (sc4_sKw, sc5_sKx and sc6_sKy). This is a known bug in GHC, and is caused by a phase ordering problem. However, this particular infelicity should not prevent LLVM from being able to do the classic loop optimisation of strength reduction on our code.

The particular strength reduction we would like to perform si to replace the multiplication GHC.Prim.*# sc2_sKu sc3_sKv in the main_$s$wa loop with an addition. This is possible because the left operand sc2_sKu is a loop induction variable, increasing by 1 every iteration. Thus, on every iteration the value of the multiplication GHC.Prim.*# sc2_sKu sc3_sKv is just the value of the multiplication on the previous loop, plus sc3_sKv. Thus, by adding a loop variable that records the value of the multiplication on the previous iteration, we can replace the multiplication by an addition.

Unfortunately, LLVM currently can’t strength-reduce this loop in the suggested way. As we will soon see, this is due to aliasing.

Why does the problem happen?

We can immediately see the problem if we look at the optimised LLVM code for this loop:

c1TW.lr.ph:
  ...
  %ln1TL1 = load i64* %Sp_Arg, align 8
  ...

c1TW:                                             ; preds = %c1TW.lr.ph, %c1TW
  %ln1TL4 = phi i64 [ %ln1TL1, %c1TW.lr.ph ], [ %ln1UF, %c1TW ]
  %ln1Uy = mul i64 %ln1Uu, %ln1TL4
  %ln1Uz = add i64 %ln1Uw, %ln1Uy
  %ln1UA = inttoptr i64 %ln1Uz to i64*
  store i64 1337, i64* %ln1UA, align 8
  %ln1UE = load i64* %Sp_Arg, align 8
  %ln1UF = add i64 %ln1UE, 1
  store i64 %ln1UF, i64* %Sp_Arg, align 8
  %ln1TP = load i64* %ln1TN, align 8
  %ln1TQ = icmp slt i64 %ln1UF, %ln1TP
  br i1 %ln1TQ, label %c1TW, label %n1TX.loopexit

The strength reduction optimisation depends on one of the operands to the multiplication being a loop induction variable. In our case, we expect that sc2_sKu will be such a variable. However, looking at the LLVM code we can see that the equivalent LLVM variable, %ln1TL4, has its induction-ness hidden because it is reloaded from the stack by load i64* %Sp_Arg on every iteration.

You might wonder why the store to the same stack location by store i64 %ln1UF, i64* %Sp_Arg is not forwarded to this load by LLVM. If this were to happen, we could get code like this:

c1TW.lr.ph:
  ...
  %ln1TL1 = load i64* %Sp_Arg, align 8
  %ln1UE.ph = load i64* %Sp_Arg, align 8
  ...

c1TW:                                             ; preds = %c1TW.lr.ph, %c1TW
  %ln1TL4 = phi i64 [ %ln1TL1, %c1TW.lr.ph ], [ %ln1UF, %c1TW ]
  %ln1UE = phi i64 [ %ln1UE.ph, %c1TW.lr.ph ], [ %ln1UF, %c1TW ]
  %ln1Uy = mul i64 %ln1Uu, %ln1TL4
  %ln1Uz = add i64 %ln1Uw, %ln1Uy
  %ln1UA = inttoptr i64 %ln1Uz to i64*
  store i64 1337, i64* %ln1UA, align 8
  %ln1UF = add i64 %ln1UE, 1
  store i64 %ln1UF, i64* %Sp_Arg, align 8
  %ln1TP = load i64* %ln1TN, align 8
  %ln1TQ = icmp slt i64 %ln1UF, %ln1TP
  br i1 %ln1TQ, label %c1TW, label %n1TX.loopexit

In this code the fact that %ln1UE is an induction variable is obvious, and not obscured by an intermediate load from memory. And indeed, LLVM is able to strength-reduce this loop!

The reason that LLVM does not forward this load is because in general it is unsafe, since the store to %ln1UA might alias it if %ln1UA were equal to %Sp_Arg. The ridiculous thing about this is that we know that in the code generated by GHC, the stack pointer will never be stored away anywhere, so it can’t possible alias with the unknown pointer %ln1UA and LLVM is being unnecessarily conservative.

The solution

LLVM is a beautiful bit of software, and it provides exactly the extensibility point we require to resolve this problem: we can write our own alias analysis pass that knows that GHC’s stack never alias with any another non-stack pointer and dynamically load it into the LLVM optimisation tool chain.

This is exactly what I’ve done. The code is available as a Gist, and interested parties (who use OS X!) can build it like so:

g++ -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-exceptions -fno-rtti -fno-common -Wall \
-Wl,-flat_namespace -dynamiclib GHCAliasAnalysis.cpp -o GHCAliasAnalysis.dylib -lLLVM-`llvm-config --version`

Once built, we can dynamically load the resulting dylib into LLVMs opt tool using the -load option, and then use the new -ghc-aa flag to tell LLVM to use our alias analyser as a complement to the default one. Unfortunately, due to an infelicity in LLVM, we have to specify -ghc-aa in between every single optimisation pass if we want to be sure that it is used. So the final command line to opt, including all passes done by the standard -O2 optimisation level, and the -loop-reduce strength-reduction pass, needs to look something like this:

opt -load GHCAliasAnalysis.dylib -S -no-aa -tbaa -basicaa -ghc-aa \
-globalopt -ghc-aa -ghc-aa -ipsccp -ghc-aa -deadargelim -ghc-aa -instcombine -ghc-aa -simplifycfg \
-ghc-aa -basiccg -ghc-aa -prune-eh -ghc-aa -inline -ghc-aa -functionattrs -ghc-aa -scalarrepl-ssa \
-ghc-aa -domtree -ghc-aa -early-cse -ghc-aa -simplify-libcalls -ghc-aa -lazy-value-info -ghc-aa \
-jump-threading -ghc-aa -correlated-propagation -ghc-aa -simplifycfg -ghc-aa -instcombine -ghc-aa \
-tailcallelim -ghc-aa -simplifycfg -ghc-aa -reassociate -ghc-aa -domtree -ghc-aa -loops -ghc-aa \
-loop-simplify -ghc-aa -lcssa -ghc-aa -loop-rotate -ghc-aa -licm -ghc-aa -lcssa -ghc-aa -loop-unswitch \
-ghc-aa -instcombine -ghc-aa -scalar-evolution -ghc-aa -loop-simplify -ghc-aa -lcssa -ghc-aa -indvars \
-ghc-aa -loop-idiom -ghc-aa -loop-deletion -ghc-aa -loop-unroll -ghc-aa -memdep -ghc-aa -gvn -ghc-aa \
-memdep -ghc-aa -memcpyopt -ghc-aa -sccp -ghc-aa -instcombine -ghc-aa -lazy-value-info -ghc-aa \
-jump-threading -ghc-aa -correlated-propagation -ghc-aa -domtree -ghc-aa -memdep -ghc-aa -dse \
-ghc-aa -adce -ghc-aa -simplifycfg -ghc-aa -instcombine -ghc-aa -strip-dead-prototypes -ghc-aa \
-constmerge -loop-reduce

(Yes, I know this is ridiculous! I hope the LLVM developers fix this soon.)

With my new alias analysis pass, LLVM is able to produce the following beautiful code for the loop:

c1TW:                                             ; preds = %c1TW, %c1TW.lr.ph
  %lsr.iv = phi i64 [ %lsr.iv.next, %c1TW ], [ %5, %c1TW.lr.ph ]
  %ln1UF1 = phi i64 [ %ln1TL1, %c1TW.lr.ph ], [ %ln1UF, %c1TW ]
  %ln1UA = inttoptr i64 %lsr.iv to i64*
  store i64 1337, i64* %ln1UA, align 8
  %ln1UF = add i64 %ln1UF1, 1
  %lsr.iv.next = add i64 %lsr.iv, %6
  %ln1TQ = icmp slt i64 %ln1UF, %ln1TP2
  br i1 %ln1TQ, label %c1TW, label %n1TX.loopexit

Note that the original loop contained a store and two loads, but the optimised loop contains only a single store: our new alias analysis has allowed the loads to be floated out of the loop. This has in turn allowed LLVM to discover the loop induction variable and apply strength reduction - note that the final loop never uses the multiplication instruction!

The final program runs 8.8% faster than the version that is compiled without the custom alias analysis.

Conclusions

My custom alias analyser for GHC-generated code gives LLVM much more room for applying its existing powerful optimisation. There is plenty of scope for improvement, though:

  1. I’d really like people to report their experiences using with this alias analyser and the LLVM backend. Do you see a big speed boost on your data-parallel Haskell programs, for example?

  2. Of course, I would like this alias analyser to included with GHC so you can all seamlessly benefit from it. I’ll be working with GHC HQ to make this happen.

  3. I think there is still scope for getting even more useful information about GHC-generated code into LLVM. For example, currently LLVM is unable to eliminate stores to stack locations that we can see will never be accessed because we make a tail call to another function with a stack pointer that points above these locations. I can think of at least two ways to express this to LLVM, and this would produce another nice gain.

    If would also be great if we could teach LLVM something about the garbage collector, as currently if your loop does any allocation at all the presence of calls to the GC pessimises the output code a lot.

I was partly inspired to do this by Ben Lippmeier, whose paper at the Haskell Symposium this year had to do strength-reduction manually at the Haskell level because LLVM wasn’t working for him. I hope I’ve fixed that issue.

Performance problems were also a focus of the discussions about the future of Haskell at ICFP. I’ve been to these discussions three years in a row, and several topics keep cropping back up: performance, and the fact that Hackage 2.0 still isn’t released. I’ve grown tired of hearing so much talk about the issues with little-to-no action to resolve them, so I spent this post-ICFP week doing my best to fix them. I first wrote a documentation build bot for the Hackage 2.0 effort, and then moved on to the LLVM performance issues - if everyone helps to move these issues along then hopefully we can finally talk about some different problems next year!


Sep 10 2011

Constraint Kinds for GHC

I recently implemented a powerful new extension to GHC HEAD called ConstraintKinds. This (Literate Haskell) post will explain what this means, and how we can exploit it to do some cool stuff.

(For long-time readers, this stuff is a generalisation of my earlier post about constraint families which was later also expounded on by Dominic Orchard and Tom Schrijvers in Type Constraints Unleashed. The proposal in its current form is due to Conor McBride.)

First of all, we’re going to turn on a whacking great load of extensions:

{-# LANGUAGE UndecidableInstances,
             MultiParamTypeClasses,
             KindSignatures,
             Rank2Types,
             ConstraintKinds,
             FlexibleInstances,
             OverlappingInstances #-}

(Yes, some of the cooler examples will require UndecidableInstances. Never mind!)

Let’s have some imports as well:

import qualified Data.Set as S

When we talk about constraints in Haskell, we usually mean one of the following things:

  • Class contexts such as Show a
  • Implicit parameters, such as ?x::Int
  • Equality assertions, such as a ~ Int
  • Tuples of any of the above, such as (Show a, Read a)

Is standard Haskell, these constraints can only occur to the left of => arrow, and they are the only thing that can appear there. With the ConstraintKinds extension, we instead allow any type of a brand-new kind Constraint to appear to the left of =>. Naturally, all of the constraints we already mentioned are parsed as types, and are all given an appropriate kind:

  • Show :: * -> Constraint
  • (?x::Int) :: Constraint
  • (a ~ Int) :: Constraint
  • (Show a, Read a) :: Constraint

Constraint synonyms

At the simplest level, this unification of constraints and types means that code like the following is valid:

type Func cxt a = cxt a => a -> a
 
incc :: Func Num a
incc = (+1)

Or we can even use type synonyms as constraint synonyms:

type Stringy a = (Show a, Read a)
 
viaString :: Stringy a => a -> a
viaString = read . show

Simulating this without the extension is a little more cumbersome:

class (Show a, Read a) => Stringy a where
instance Stringy a where

Indexed constraints

But it doesn’t stop there. Since constraints are just types, we can type-index them using type functions! We can use this to solve the well-known problem where lists can be an instance of the Monad type class, but sets cannot. This problem arises because the elements of a set must be orderable, but e.g. the return method of the Monad class allows an element of any type to be made into an “element” of the monad — not only the orderable ones.

A restricted monad is a monad where we need to impose some constraints on the elements it can contain. Existing Hackage packages such as Ganesh Sittampalam’s rmonad package provide a way to define these monads in unextended Haskell. However, with our new extension we get a much smoother user experience by reusing the type function mechanism to encode a class of restricted monads:

class RMonad m where
  type RMonadCtxt m a :: Constraint
  return :: RMonadCtxt m a => a -> m a
  (>>=) :: (RMonadCtxt m a, RMonadCtxt m b) => m a -> (a -> m b) -> m b

Lists can of course be an instance of this class:

instance RMonad [] where
  type RMonadCtxt [] a = ()
  return x = [x]
  (>>=) = flip concatMap

But now so can sets:

instance RMonad S.Set where
  type RMonadCtxt S.Set a = Ord a
  return = S.singleton
  mx >>= fxmy = S.fromList [y | x <- S.toList mx, y <- S.toList (fxmy x)]

Another feature I added to GHC recently is associated type defaults. With this, we can change the RMonad class definition so that normal Monads which do not make any special demands of their element types can be defined without giving an explicit instance for RMonadCtxt:

class RMonad m where
  type RMonadCtxt m a :: Constraint
  type RMonadCtxt m a = ()
  return :: ...
  (>>=) :: ...

(Associated type defaults were always described in the published papers about associated types, but were never implemented until now).

Reified dictionaries

A common trick is to reify a constraint as an explicit dictionary using a GADT:

data ShowDict a where
  ShowDict :: Show a => ShowDict a
 
showish :: ShowDict a -> a -> String
showish ShowDict x = show x
 
use_showish :: String
use_showish = showish ShowDict 10

With our extension we can generalise this so you can define one reified dictionary to rule them all:

data Dict ctxt where
  Dict :: ctxt => Dict ctxt
 
showish' :: Dict (Show a) -> a -> String
showish' Dict x = show x
 
use_showish' :: String
use_showish' = showish' Dict 10

Generic programming

In “Scrap Your Boilerplate With Class”, Simon Peyton Jones and Ralf Laemmel proposed an encoding for generic functions in terms of type classes. However, their presentation was impeded by the fact that they could not abstract over type classes, and they had to have a heavy encoding mechanism to make it work. With our new extension we can write generic functions in their style in a much cleaner fashion.

First, we define the class of Data which has a generic mapping operation that applies a type-indexed function one level down in the data structure, returning all the results as a list:

class (cxt a) => Data cxt a where
  gmapQ :: Proxy cxt -> (forall b. Data cxt b => b -> r) -> a -> [r]

The cxt type variable will later be instantiated to a type class corresponding to the generic function we wish to apply. The Proxy cxt argument to gmapQ is an unfortunate artifact of fact that Haskell still has no explicit type applications, so we have to use dummy value arguments to disambiguate which cxt we actually mean when we call gmapQ. The definition is trivial:

data Proxy (ctxt :: * -> Constraint) = Proxy

We can define Data instances for some built in types:

instance (cxt Int) => Data cxt Int where
  gmapQ _ f n = []
 
instance (cxt [a], Data cxt a) => Data cxt [a] where
  gmapQ _ f [] = []
  gmapQ _ f (x:xs) = [f x, f xs]

Now we can define a generic function gsize:

class Size a where
  gsize :: a -> Int

We can say how gsize works on particular types by giving an instance:

instance Size Int where
  gsize x = x

If no other instance is available, an overlapping instance based on gmapQ will be used:

instance Data Size t => Size t where
  gsize t = 1 + sum (gmapQ (Proxy :: Proxy Size) gsize t)

We can now evaluate gsize at both types Int and [Int] even though we never said explicitly what it means to take the size of a list:

use_gsize :: Int
use_gsize = gsize (1 :: Int) + gsize [1 :: Int, 2]

Wrapping up

The ConstraintKinds extension makes these three idioms much neater, but I’m sure there are plenty of other places where this new power will come in useful. Try it out for yourself in GHC 7.4 and find out!

Thanks are due to Simon Marlow for organising CamHac, where I started working on the implementation, and Dominic Orchard and Nicolas Wu who collaborated with me during the early stages of coding. Thanks also to Simon Peyton Jones for invaluable advice that finally let me merge it into GHC.


Mar 8 2011

How to build 32/64 bit fat (universal) binaries

The OS X version of the Glasgow Haskell Compiler compiles Haskell into 32-bit code. Unfortunately, this means that if you are on a system where it is the default for libraries to be built in 64-bit mode, you tend to get errors when linking Haskell code telling you that you are trying to link 32-bit code against 64-bit code.

The best solution to this problem is to build all libraries you intend to link to from Haskell code as universal binaries that include both 32-bit and 64-bit versions of the code. These libraries will then work seamlessly with both Haskell code and also when pulled in as part of the build process for non-Haskell 64-bit executables.

If you can install the library using MacPorts, this is easy to do. Instead of doing:

sudo port install mylibrary

Just do:

sudo port install mylibrary +universal

However, if the library you want is not available through MacPorts or the MacPorts version is not up to date you will need to know how to build these universal libraries for yourself. This is the process that I aim to explain in this post. I'm going to use igraph as my example library because it's what I needed to install (I needed to install the unreleased v0.6).

The easy method

If you are lucky, building a universal library is as simple as changing how you invoke make. Run the library's configure scripts etc as usual, and then invoke make as follows:

make CXXFLAGS="-arch i386 -arch x86_64" CFLAGS="-arch i386 -arch x86_64" LDFLAGS="-arch i386 -arch x86_64"

The -arch flags tell GCC and the linker to build and link both versions of the library. If this works, you are done. In the case of igraph, this wasn't quite enough - the above command failed with this error:

gcc-4.2: -E, -S, -save-temps and -M options are not allowed with multiple -arch flags

The reason that this occurs is because igraph invokes GCC with the -M series of flags that generate makefile dependency rules from the C code - but GCC doesn't like generating those rules for two architectures simultaneously. Luckily, there was an easy workaround in my case - I just needed to reconfigure igraph as follows:

./configure --disable-dependency-tracking

The --disable-dependency-tracking flag just stops Automake from determining the dependencies of each C file as it compiles it. It is totally harmless to disable this because that dependency information is only used in order to rebuild less stuff upon subsequent invocations of make - the worst that happens when you disable it is that if you make more than once you will have to wait a bit longer. For more information on this feature see also the relevant section of the Automake manual.

After reconfiguring in this manner, the original make invocation worked correctly for igraph.

The hard method

The above method may perhaps fail for some libraries, in which case you can use this more arduous manual method. The idea is to run the library's build process from scratch twice: once to get the 32-bit library and once for the 64-bit library. We can then use the lipo to glue together the build artifacts from the two runs.

We start by building the 32-bit version:

make clean
make CXXFLAGS=-m32 CFLAGS=-m32 LDFLAGS=-m32 -j12

We now need to store the 32-bit build artifacts somewhere. Exactly which files you have to save will vary according to the library you are building, but for igraph this was sufficient:

mkdir -p ~/Junk/32 ~/Junk/64
cp src/.libs/libigraph.{a,0.dylib} ~/Junk/32

Now do the 64-bit build and once again save the artifacts somewhere:

make clean
make CXXFLAGS=-m64 CFLAGS=-m64 LDFLAGS=-m64 -j12
cp src/.libs/libigraph.{a,0.dylib} ~/Junk/64

Finally we can use lipo to finish up:

lipo -create ~/Junk/{32,64}/libigraph.a -output src/.libs/libigraph.a
lipo -create ~/Junk/{32,64}/libigraph.0.dylib -output src/.libs/libigraph.0.dylib

At this point, you can do sudo make install and get a universal version of the library installed.

If you want to check that your libraries are indeed universal, you can use lipo -info:

$ lipo -info src/.libs/libigraph.a
Architectures in the fat file: src/.libs/libigraph.a are: i386 x86_64

Conclusions

Building universal 32-bit/64-bit binaries is apparently fairly straightforward but it was tricky to find documentation for the process. I hope this article helps others who need to get this done.


Jan 28 2011

Solving GHC iconv problems on OS X 10.6

A problem that has plagued my GHC installation for a while is that whenever I tried to install any non-trivial package I would get a horrible link error like this:

Undefined symbols:
  "_iconv_close", referenced from:
      _hs_iconv_close in libHSbase-4.3.1.0.a(iconv.o)
     (maybe you meant: _hs_iconv_close)
  "_iconv_open", referenced from:
      _hs_iconv_open in libHSbase-4.3.1.0.a(iconv.o)
     (maybe you meant: _hs_iconv_open)
  "_iconv", referenced from:
      _hs_iconv in libHSbase-4.3.1.0.a(iconv.o)
     (maybe you meant: _hs_iconv_open, _hs_iconv_close , _hs_iconv )
  "_locale_charset", referenced from:
      _localeEncoding in libHSbase-4.3.1.0.a(PrelIOUtils.o)
ld: symbol(s) not found
collect2: ld returned 1 exit status

The reason for this is a combination of several factors:

  • The base library that comes with the GHC binary distribution wants to link against the standard Mac iconv
  • I have installed MacPorts libiconv, which renames the function that is named iconv_open in the standard iconv to libiconv_open
  • The Haskell library being installed by cabal depends transitively on some library that was built with something like extra-lib-dirs: /opt/local/lib, which causes -L/opt/local/lib to be passed to the linker
  • The linker's -L/opt/local/lib option occurs before -L/usr/lib, so the linker prefers to link against the MacPorts libiconv instead of the system one

In my case, it was the Haskell readline wrapper that was causing /opt/local/lib to be pulled in. I had to link the Haskell readline against MacPorts readline because the standard Mac libreadline is actually libeditline, which is almost-but-not-quite compatible and misses some crucial features.

There are several ways to fix the problem:

  • Perhaps you don't really need the MacPorts libiconv. In this case, you can stop it from being used by just doing port deactivate libiconv. This is the route I took.
  • Perhaps it's OK to link this particular library against the system libraries in preference to the MacPorts one. In this case, you can configure the package with cabal configure --extra-lib-dir=/usr/lib, so /usr/lib is searched before the MacPorts directory. This may fail if the package that needed -L/opt/local/lib requires a MacPorts version of some library that is also present in /usr/lib, though.
  • You could build GHC yourself and link it against the MacPorts library versions. This is not for the faint-hearted, but if the version of GHC you need is in MacPorts I imagine that you can just do port install ghc

I'm glad I've finally got this sorted out. If you are still having trouble, you might find some helpful information in the threads that finally helped me to overcome the issue and prompted this writeup.


Apr 3 2010

Ditaa support for gitit

I hacked together a quick plugin for the most excellent gitit wiki today. It's written in Haskell, so it's an absolute pleasure to write code for it.

What I added support for is a neat little tool called ditaa (DIagrams Through Ascii Art). Basically, in the markdown source of your Gitit wiki you can now write something like the following:

~~~ {.ditaa}
+--------+   +-------+    +-------+
|        | --+ ditaa +--> |       |
|  Text  |   +-------+    |diagram|
|Document|   |!magic!|    |       |
|     {d}|   |       |    |       |
+---+----+   +-------+    +-------+
    :                         ^
    |       Lots of work      |
    +-------------------------+
~~~

The plugin will then call out to the ditaa command line tool (written in Java, boo!) to render that to a beautiful image:

A ditaa image on a Gitit page

To get this set up for yourself, try the following from the root of your Gitit wiki:

git clone git://github.com/batterseapower/gitit-plugins.git batterseapower-plugins
wget http://downloads.sourceforge.net/project/ditaa/ditaa/0.9/ditaa0_9.zip?use_mirror=kent -O ditaa0_9.zip
unzip ditaa0_9.zip

Now edit your Gitit configuration file so the plugins list includes my plugin:

plugins: batterseapower-plugins/Ditaa.hs

That's it - restart Gitit and you should be ready to go!


May 11 2009

Constraint families

Various people, notably John Meacham, have proposed adding "context aliases" to Haskell. The basic idea is that you could write declarations like the following in Haskell:

context Num a = (Monoid a, Group a, Multiplicative a, FromInteger a)

Now, what this means is that when you write Num a in a constraint, you really mean all of Monoid a, Group a and so on. This means that the following program is valid, and presumably computes the number 7:

foo :: Num a => a -> a
foo = fromInteger 2 `mappend` fromInteger 5

This lets you write shorter type signatures in programs which make ubiquitous use of type classes. However, in the brave new world of type families an obvious generalisation is to allow class-associated constraints. In particular, this lets us solve the classic problem where you can't make Set an instance of Monad:

class RMonad m where
    context RMonadElem a
    
    return :: RMonadElem a => a -> m a
    (>>=) :: (RMonadElem a, RMonadElem b) => m a -> (a -> m b) -> m b

instance RMonad [] where
    context RMonadElem a = ()
    
    return x = [x]
    (>>=) = concatMap

instance RMonad Set where
    context RMonadElem a = Ord a
    
    return x = singleton x
    (>>=) = fold (\a s' -> union (f a) s') empty s

A few interesting points:

  1. What is the kind signature of the context synonym? We probably need another "kind" - that of class constraints - which is preserved by n-ary tupling.
  2. Can you provide a default implementation for the kind synonym? This would let us change the definition of the Monad type class in a backward compatible way, by defaulting RMonadElem a to ()
  3. I mentioned this idea to Ganesh at Fun In The Afternoon, and he told me about his rmonad package, which essentially does exactly this, but by reifying the dictionaries explicitly as data. This is a nice demonstration that the approach is workable, but I think we could really do without the boilerplate dictionary management.
  4. Amusingly, GHC currently represents type classes internall as a normal data type, with some extra invariants. This means that most of the existing machinery for dealing with associated type synonyms could probably be used changed to implement this extension!

I don't think that this introduces any horrendous type checking problems, and I can see how the desugarer has to treat dictionaries arising from such contexts. Nonetheless, there are probably some weirdnesses that I'm forgetting, so I should probably try to come up with a real specification (and implementation!) when I get some time...

(P.S: It looks like some guys at Hac5 were working on adding the simple constraint families to GHC - does anyone know how far they got with that?)


May 10 2009

New paper: Types Are Calling Conventions

I've just submitted a paper, coauthored with Simon Peyton Jones, to the Haskell Symposium. In this paper, we outline what we think is an interesting point in the design space of intermediate languages for a lazy functional programming language like Haskell, and show some cool optimisations that we can do with it that are hard or impossible to express in the intermediate language used by GHC today.

Although this mainly represents a potential improvement in GHC's internals, where I'd really like to go with this is to push the ability to make a distinction between strict and lazy data into the type system of Haskell itself. This would mean that you could, for example, write functions that produce element-strict lists, and document some of the strictness properties of your functions in their types.

If any of this sounds interesting to you, you can obtain the paper from my freshly-minted Computer Lab website. You can leave any comments you may have on the corresponding Wiki page.


Aug 31 2008

Hackage Releases Made Easy

The Haskell community has built up a great resource: the Hackage Haskell package database, where we recently hit the 500-package mark!

One of those 500 packages was mine, I added another to their number just an hour ago, and I've got two more in the oven. Given, then, that I'm starting to maintain a few packages, I went to the trouble of automating the Hackage release process, and in this post I'm going to briefly walk through setting up this automated environment.

  1. Install cabal-upload from Hackage. I'm afraid that at the time of writing this is not perfectly simple because it won't build with GHC 6.8 or above: this can be fixed with a new .cabal file, however, which I've made available here. (Edit: I've just noticed that this functionality seems to have been added to Cabal itself! You may just be able to use cabal upload. However, I'm not sure what the right config file location is for the next step).
  2. Add a file containing your Hackage username and password in the format ("username","password") called ~/.cabal-upload/auth.
  3. Copy the following shell script into a file called release in the root of your project (the same directory as the Setup.lhs file):
    #!/bin/bash
    #
    
    echo "Have you updated the version number? Type 'yes' if you have!"
    read version_response
    
    if [ "$version_response" != "yes" ]; then
        echo "Go and update the version number"
        exit 1
    fi
    
    sdist_output=`runghc Setup.lhs sdist`
    
    if [ "$?" != "0" ]; then
        echo "Cabal sdist failed, aborting"
        exit 1
    fi
    
    # Want to find a line like:
    # Source tarball created: dist/ansi-terminal-0.1.tar.gz
    
    # Test this with:
    # runghc Setup.lhs sdist | grep ...
    filename=`echo $sdist_output | sed 's/.*Source tarball created: \([^ ]*\).*/\1/'`
    echo "Filename: $filename"
    
    if [ "$filename" = "$sdist_output" ]; then
        echo "Could not find filename, aborting"
        exit 1
    fi
    
    # Test this with:
    # echo dist/ansi-terminal-0.1.tar.gz | sed ...
    version=`echo $filename | sed 's/^[^0-9]*\([0-9\.]*\).tar.gz$/\1/'`
    echo "Version: $version"
    
    if [ "$version" = "$filename" ]; then
        echo "Could not find version, aborting"
        exit 1
    fi
    
    echo "This is your last chance to abort! I'm going to upload in 10 seconds"
    sleep 10
    
    git tag "v$version"
    
    if [ "$?" != "0" ]; then
        echo "Git tag failed, aborting"
        exit 1
    fi
    
    # You need to have stored your Hackage username and password as directed by cabal-upload
    # I use -v5 because otherwise the error messages can be cryptic :-) 
    cabal-upload -v5 $filename
    
    if [ "$?" != "0" ]; then
        echo "Hackage upload failed, aborting"
        exit 1
    fi
    
    # Success!
    exit 0
  4. When you're ready to release something, simply run the shell script! Not only will this package up your project and upload it to Hackage, it will also add a version tag to your Git repository (obviously you should change this bit if you are using another VCS!).

If you would like to follow my continuing adventures in Haskell open source, please check out my GitHub profile! Patches gratefully accepted :-)


Aug 11 2008

Compiler Plugins AngloHaskell Talk

I was given the opportunity to speak on the topic of compiler plugins for GHC at AngloHaskell 2008 last weekend, and the slides and audio are now available here for those interested.

I wish they had taken video instead, as I went forwards and backwards over my slides quite a bit and wrote quite a lot on the whiteboard. I fear this may have left the presentation entirely incomprehensible to those who weren't there in person (and maybe even some people who were :-) .

The other presenters were great, and I especially enjoyed Neil Mitchells talk on how Hoogle works, as type-searching is an area I was totally unfamiliar with. If you're a local Haskeller, I thoroughly reccomend attending the 2009 event!


Jul 21 2008

Compiler Plugins For GHC: Week Six

Are we six weeks into it already? It's flown by. What did I get up to this week?

Core HTML Output
I bring you the "mystery" project that I promised you last week: a HTML pretty printer for GHCs Core language! Features of my implementation are:

  • Syntax highlighting
  • Hyperlinked variables: click one to jump to its definition site if it refers to a local name, or to a Hoogle search for it otherwise
  • Mark interesting parts of the Core output with a thick border by clicking on a binder
  • Hover over variable usage sites to highlight their binding sites
  • Handy index phases run by the compiler, click a phase name to jump to the Core output by that phase

I've put some sample output up here for you to try out, but beware: it's a 500kB document!

For the inspiration for this project I owe a debt of gratitude to Neil Mitchell's YHC.Core.HTML. If you are interested in other human readable Core output formats, check out Don Stewart's ghc-core package, which gives you a syntax-highlighted command line pager for Core.

Spit And Polish
I spent the vast majority of last week tidying up loose ends, hunting and squashing bugs and preening my sample plugin code. This reflects the fact that the torrent of new GHC features I've announced here are being bedded down in preparation for finishing the project off and getting them merged into HEAD.

Conclusion
More of the same this week: a focus on tidying up and getting some solid documentation done. I want to get something releasable ready well before the deadline which I can then enhance with discretionary improvements without fear of leaving the Summer of Code period without having something ready for real-world use.

Since this activity is all rather tedious to the casual Summer of Code follower, this will probably be my last weekly blog post. However, I'll still try and write about any major developments in the project: see you then!