Jan 28 2011

Solving GHC iconv problems on OS X 10.6

A problem that has plagued my GHC installation for a while is that whenever I tried to install any non-trivial package I would get a horrible link error like this:

Undefined symbols:
  "_iconv_close", referenced from:
      _hs_iconv_close in libHSbase-4.3.1.0.a(iconv.o)
     (maybe you meant: _hs_iconv_close)
  "_iconv_open", referenced from:
      _hs_iconv_open in libHSbase-4.3.1.0.a(iconv.o)
     (maybe you meant: _hs_iconv_open)
  "_iconv", referenced from:
      _hs_iconv in libHSbase-4.3.1.0.a(iconv.o)
     (maybe you meant: _hs_iconv_open, _hs_iconv_close , _hs_iconv )
  "_locale_charset", referenced from:
      _localeEncoding in libHSbase-4.3.1.0.a(PrelIOUtils.o)
ld: symbol(s) not found
collect2: ld returned 1 exit status

The reason for this is a combination of several factors:

  • The base library that comes with the GHC binary distribution wants to link against the standard Mac iconv
  • I have installed MacPorts libiconv, which renames the function that is named iconv_open in the standard iconv to libiconv_open
  • The Haskell library being installed by cabal depends transitively on some library that was built with something like extra-lib-dirs: /opt/local/lib, which causes -L/opt/local/lib to be passed to the linker
  • The linker's -L/opt/local/lib option occurs before -L/usr/lib, so the linker prefers to link against the MacPorts libiconv instead of the system one

In my case, it was the Haskell readline wrapper that was causing /opt/local/lib to be pulled in. I had to link the Haskell readline against MacPorts readline because the standard Mac libreadline is actually libeditline, which is almost-but-not-quite compatible and misses some crucial features.

There are several ways to fix the problem:

  • Perhaps you don't really need the MacPorts libiconv. In this case, you can stop it from being used by just doing port deactivate libiconv. This is the route I took.
  • Perhaps it's OK to link this particular library against the system libraries in preference to the MacPorts one. In this case, you can configure the package with cabal configure --extra-lib-dir=/usr/lib, so /usr/lib is searched before the MacPorts directory. This may fail if the package that needed -L/opt/local/lib requires a MacPorts version of some library that is also present in /usr/lib, though.
  • You could build GHC yourself and link it against the MacPorts library versions. This is not for the faint-hearted, but if the version of GHC you need is in MacPorts I imagine that you can just do port install ghc

I'm glad I've finally got this sorted out. If you are still having trouble, you might find some helpful information in the threads that finally helped me to overcome the issue and prompted this writeup.


Jan 19 2011

The case of the mysterious "thread blocked indefinitely in an MVar operation" exception

I recently tracked down the cause of the persistent "thread blocked indefinitely in an MVar operation" exceptions I was getting from the GHC runtime when I used my parallel-io library. There doesn't seem to be much information about this exception on the web, so I'm documenting my experiences here to help those unfortunate souls that come across it.

The essence of the parallel-io library is a combinator like this:

parallel :: [IO a] -> IO [a]

The list of IO actions on the input are run (possibly in parallel) and the results returned as a list. The library ensures that we never go beyond a certain (user specified) degree of parallelism, but that's by-the-by for this post.

There are a number of interesting design questions about the design of such a combinator, but the one that tripped me up is the decision about what to do with exceptions raised by the IO actions. What I used to do was have parallel run all the IO actions wrapped in a try, and wait for all of the actions to either throw an exception or complete normally. The combinator would then either return a list of the results, or, if at least one action threw an exception, the exception would be rethrown on the thread that had invoked parallel.

At first glance, this seems very reasonable. However, in this turns out to be a disastrous choice that all-but-guarantees your program will go down in flames with "thread blocked indefinitely in an MVar operation".

Consider this simple program:

-- The main action executes on thread 1:
main = do
    res1 <- newEmptyMVar
    res2 <- newEmptyMVar
    
    wait <- newEmptyMVar
    
    -- Spawn thread 2:
    forkIO $ saveExceptionsTo res1 $ do
        () <- takeMVar wait
        putMVar res1 (Right "Hello")
    
    -- Spawn thread 3:
    forkIO $ saveExceptionsTo res2 $ do
        throwIO $ ErrorCall "Oops!"
        
        putMVar wait () -- Unblocks thread 2
        putMVar res2 (Right "World")
    
    -- Wait for the results of both threads:
    liftM2 (,) (takeMVar res1) (takeMVar res2) >>= print

saveExceptionsTo :: MVar (Either SomeException a) -> IO () -> IO ()
saveExceptionsTo mvar act = act `catch` \e -> putMVar mvar (Left e)

This code is similar to what the parallel combinator does in that the main thread (thread 1) waits for thread 2 and thread 3 to both return a result or exception to res1 and res2 respectively. After it has both, it shows the exception or result.

The fly in the ointment is that thread 2 is waiting on thread 3, so the thread dependency graph looks something like this:

(You should read an arrow that points from thread A to thread B as thread B is waiting on thread A to unblock it).

Unfortunately, our programmer has written buggy code, and the code running in thread 3 throws an exception before it can wake up thread 2. Thread 3 writes to the res2 MVar just fine, but thread 1 is still blocked on the res1 MVar, which will never be written to. At this point, everything comes crashing down with "thread blocked indefinitely in an MVar operation".

The most annoying part is that the exception that originally caused the error is squirrelled away in an MVar somewhere, and we never get to see it!

This goes to show that the proposed semantics for exceptions above is dangerous. In the presence of dependencies between the actions in the parallel call, it tends to hide exceptions that we really want to see.

The latest version of the parallel-io library solves this by the simple method of using the asynchronous exceptions feature of GHC to rethrow any exceptions coming from an action supplied to parallel as soon as they occur. This unblocks thread 1 and lets us deal with the exception in whatever way is appropriate for our situation.

Although this bug is simple to describe and obvious in retrospect, it took a hell of a time to diagnose. It never ceases to amaze me exactly how difficult it is to write reliable concurrent programs!


Jan 18 2011

Interpreting GHC cost centre stacks

A little-known feature of GHC is the ability to run a program (that has been compiled with profiling) with the +RTS -xc option. Now, every time an exception is raised a stack trace will be dumped to stderr. This is the only way get stack traces for your Haskell code currently. Unfortunately:

  • This stack trace is constructed from the cost-centre stack, so you'll only see names from modules you compiled in profiling mode with -auto-all or similar
  • The stack trace is shown regardless of whether the exception in question is actually handled or not - so you are likely to see lots of spurious stack traces where exceptions have been caught and handled

It's still better than the alternative (i.e. wailing about the lack of proper Haskell stack traces) though.

Now, the question is how to interpret the mess you get out. Here is an example stack trace from my current debugging session (I've added linebreaks to aid comprehension):


<Control.Concurrent.ParallelIO.Local.parallelE,
Control.Concurrent.ParallelIO.Local.parallel,
Control.Concurrent.ParallelIO.Local.parallel_,
Control.Concurrent.ParallelIO.Local.withPool,
Development.Shake.shake,Development.Shake.CAF>

(I'm trying to figure out why OpenShake reliably dies with the dreaded "blocked indefinitely on MVar" exception.)

The answer lies in the GHC source code (see fprintCSS in Profiling.c). The items at the front of the stack trace are at the top of the stack and hence correspond to the approximate to location from which the exception was thrown.

Looking forward, it would be awesome if e.g. the Google Summer of Code could sponsor a student to take my work on GHC ticket 3693 further. Personally I would find stack trace support for Haskell totally invaluable, and I'm sure I'm not alone.


Jan 18 2011

Blogging is hard, let's go tweeting!

I seem to have been fairly consistent at using my Twitter account, so look there if you want information about my latest projects (there are lots of them!) that is updated more frequently than this blog.

I will still try and blog here when I can summon the energy to write a longer-form article.