<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>:: (Bloggable a) =&#62; a -&#62; IO ()</title>
	<atom:link href="http://blog.omega-prime.co.uk/?feed=rss2&#038;p=127" rel="self" type="application/rss+xml" />
	<link>http://blog.omega-prime.co.uk</link>
	<description></description>
	<lastBuildDate>Thu, 06 Dec 2012 13:28:52 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>Rpath emulation: absolute DLL references on Windows</title>
		<link>http://blog.omega-prime.co.uk/?p=138</link>
		<comments>http://blog.omega-prime.co.uk/?p=138#comments</comments>
		<pubDate>Thu, 06 Dec 2012 13:28:52 +0000</pubDate>
		<dc:creator>Max</dc:creator>
				<category><![CDATA[Solutions]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://blog.omega-prime.co.uk/?p=138</guid>
		<description><![CDATA[When creating an executable or shared library on Linux, it’s possible to include an ELF RPATH header which tells the dynamic linker where to search for the any shared libraries that you reference. This is a pretty handy feature because it can be used to nail down exactly which shared library you will link against, [...]]]></description>
			<content:encoded><![CDATA[<p>When creating an executable or shared library on Linux, it’s possible to include an ELF <a href="http://en.wikipedia.org/wiki/Rpath">RPATH header</a> which tells the dynamic linker where to search for the any shared libraries that you reference. This is a pretty handy feature because it can be used to nail down exactly which shared library you will link against, without leaving anything up to chance at runtime.</p>
<p>Unfortunately, Windows does not have an equivalent feature. However, it does have an <em>undocumented</em> feature which may be enough to replace your use of rpath if you are porting software from Linux.</p>
<p>Executables or DLLs or Windows always reference any DLLs that they import by name only. So, the import table for an executable will refer to <code>kernel32.dll</code> rather than <code>C:\Windows\kernel32.dll</code>. Window’s dynamic loader will look for a file with the appropriate name in the <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx">DLL search path</a> as usual. (For full details on DLL import tables and more, you can check out my previous <a href="http://blog.omega-prime.co.uk/?p=115">in depth post</a>.)</p>
<p>However, Window’s dynamic loader will, as a completely undocumented (and presumably unsupported) feature, also accept <em>absolute paths</em> in the import table. This is game-changing because it means that you can hard-code exactly which DLL you want to refer to, just like you would be able to with rpath on Linux.</p>
<h1 id="demonstration">Demonstration</h1>
<p>To demonstrate this technique, we’re going to need code for a DLL and a referring EXE:</p>
<pre class="sourceCode c"><code class="sourceCode c">$ cat library.c
<span class="ot">#include &lt;stdio.h&gt;</span>

__declspec(dllexport) <span class="dt">int</span> librarycall(<span class="dt">void</span>) {
        printf(<span class="st">&quot;Made library call!</span><span class="ch">\n</span><span class="st">&quot;</span>);
        <span class="kw">return</span> <span class="dv">0</span>;
}

$ cat rpath.c
__declspec(dllimport) <span class="dt">int</span> librarycall(<span class="dt">void</span>);

<span class="dt">int</span> main(<span class="dt">int</span> argc, <span class="dt">char</span> **argv) {
        <span class="kw">return</span> librarycall();
}</code></pre>
<p>If we were building a DLL and EXE normally, we would do this:</p>
<pre><code>gcc -c library.c
gcc -shared -o library.dll library.o
gcc -o rpath rpath.c -L./ -llibrary</code></pre>
<p>This all works fine:</p>
<pre><code>$ ./rpath
Made library call!</code></pre>
<p>However, as you would expect, if you move <code>library.dll</code> elsewhere, the EXE will fail to start:</p>
<pre><code>$ mv library.dll C:/library.dll
$ ./rpath
/home/Max/rpath/rpath.exe: error while loading shared libraries: library.dll: cannot open shared object file: No such file or directory</code></pre>
<p>Now let’s work some magic! If we open up <code>rpath.exe</code> in a hex editor, we see something like this:</p>
<p><a href="http://blog.omega-prime.co.uk/wp-content/uploads/2012/12/rpath-exe-before.png"><img src="http://blog.omega-prime.co.uk/wp-content/uploads/2012/12/rpath-exe-before.png" alt="" title="rpath-exe-before" width="639" height="469" class="aligncenter size-full wp-image-140" /></a></p>
<p>Let’s just tweak that a bit to change the relative path to <code>library.dll</code> to an absolute path. Luckily there is enough padding to make it fit:</p>
<p><a href="http://blog.omega-prime.co.uk/wp-content/uploads/2012/12/rpath-exe-after.png"><img src="http://blog.omega-prime.co.uk/wp-content/uploads/2012/12/rpath-exe-after.png" alt="" title="rpath-exe-after" width="639" height="469" class="aligncenter size-full wp-image-142" /></a></p>
<p>The EXE will now work perfectly!</p>
<pre><code>$ ./rpath
Made library call!</code></pre>
<h1 id="in-practice">In practice</h1>
<p>Knowing that this feature exists is one thing. Actually making use of it in a reliable way is another. The problem is that to my knowledge no linkers are capable of creating a DLL or EXE which include an absolute path in their import tables. Sometimes we will be lucky enough that the linker creates an EXE or DLL with enough padding in it for us to manually edit in an absolute path, but with the method above there is no guarantee that this will be possible.</p>
<p>In order to exploit this technique robustly, we’re going to use a little trick with import libraries. Instead of using GCC’s ability to link directly to a DLL, we will generate an import library for the DLL, which we will call <code>library.lib</code>:</p>
<pre><code>$ dlltool --output-lib library.lib --dllname veryverylongdllname.dll library.o</code></pre>
<p>When you use <code>dlltool</code> you either need to write a <a href="http://msdn.microsoft.com/en-us/library/d91k01sh(v=vs.80).aspx"><code>.def</code> file</a> for the DLL you are creating an import library for, or you need to supply all the object files that were used to create the DLL. I’ve taken the second route here and just told <code>dlltool</code> that the our DLL was built from <code>library.o</code>.</p>
<p>Now we have an import library, we can do our hex-editing trick again, but this time on the library. Before:</p>
<p><a href="http://blog.omega-prime.co.uk/wp-content/uploads/2012/12/library-lib-before.png"><img src="http://blog.omega-prime.co.uk/wp-content/uploads/2012/12/library-lib-before.png" alt="" title="library-lib-before" width="639" height="469" class="aligncenter size-full wp-image-139" /></a></p>
<p>And after (note that I have null-terminated the new absolute path):</p>
<p><a href="http://blog.omega-prime.co.uk/wp-content/uploads/2012/12/library-lib-after.png"><img src="http://blog.omega-prime.co.uk/wp-content/uploads/2012/12/library-lib-after.png" alt="" title="library-lib-after" width="639" height="469" class="aligncenter size-full wp-image-141" /></a></p>
<p>The beauty of editing the import library rather than the output of the linker is that using the <code>--dllname</code> option we can ensure that the import library contains as much space as we need to fit the entire absolute path of the DLL, no matter how long it may be. This is the key to making robust use of absolute paths in DLL loading, even if linkers don’t support them!</p>
<p>Now we have the import library, we can link <code>rpath.exe</code> again, but this time using the import library rather than <code>library.dll</code>:</p>
<pre><code>$ gcc -o rpath rpath.c library.lib
$ ./rpath
Made library call!</code></pre>
<p>Yes, it really is using the DLL on the <code>C:</code> drive:</p>
<pre><code>$ mv C:/library.dll C:/foo.dll
$ ./rpath
/home/Max/rpath/rpath.exe: error while loading shared libraries: C:\library.dll: cannot open shared object file: No such file or directory</code></pre>
<h1 id="conclusion">Conclusion</h1>
<p>I haven’t seen this technique for using absolute paths for DLL references anywhere on the web, so it doesn’t seem to be widely known. However, it works beautifully on Windows 7 and probably on all other versions of Windows as well.</p>
<p>I may apply these techniques to the Glasgow Haskell Compiler in order to improve the support for Haskell shared objects on Windows: more information on this topic can be found on the <a href="http://hackage.haskell.org/trac/ghc/wiki/DynamicByDefault">GHC wiki</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.omega-prime.co.uk/?feed=rss2&#038;p=138</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GHC-specific Alias Analysis for LLVM</title>
		<link>http://blog.omega-prime.co.uk/?p=135</link>
		<comments>http://blog.omega-prime.co.uk/?p=135#comments</comments>
		<pubDate>Fri, 30 Sep 2011 08:45:22 +0000</pubDate>
		<dc:creator>Max</dc:creator>
				<category><![CDATA[Assembly]]></category>
		<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://blog.omega-prime.co.uk/?p=135</guid>
		<description><![CDATA[The setup A few years ago, David Terei did some great work adding a LLVM backend to the Glasgow Haskell Compiler. The idea with this is that instead of writing our own optimiser and assembly-code generators for our custom three-address-code, we can just translate into LLVM IR and have LLVM do the heavy lifting. In [...]]]></description>
			<content:encoded><![CDATA[<h2 id="the-setup"><a href="#TOC">The setup</a></h2>
<p>A few years ago, <a href="http://www.scs.stanford.edu/~davidt/">David Terei</a> did some great work adding a <a href="http://llvm.org/">LLVM</a> backend to the Glasgow Haskell Compiler. The idea with this is that instead of writing our own optimiser and assembly-code generators for our custom three-address-code, we can just translate into LLVM IR and have LLVM do the heavy lifting. In theory, this means that GHC will be able to compile for many different CPUs, and will benefit from the smart optimisations the LLVM team have implemented.</p>
<p>The portability part has definitely worked out for us: for example, a <a href="http://ghcarm.wordpress.com/">couple of people</a> have successfully got GHC to compile for the ARM by using the LLVM backend. However, the promise of LLVM being able to speed up our generated code has never really been fully borne out. LLVM-generated code <strong>does</strong> tend to be better than that produced by GHCs own backends, but this is mostly because LLVM is doing much better register allocation (it is much smarter about reusing the “pinned registers” required that form part of the interface between GHC’s generated code and the garbage collector).</p>
<p>The reason that LLVM does not optimise as much as we would like is often to do with aliasing. In particular, LLVM conservatively assumes that GHC’s stack (which is explicitly represented in the generated code as an array of words) and the heap may alias.</p>
<h2 id="whats-the-problem"><a href="#TOC">What’s the problem?</a></h2>
<p>A concrete example of this is the following Haskell program:</p>
<pre class="haskell"><code>module Main(main) where

import Data.Array.Base
import Data.Array.IO
import Data.Array.MArray

main :: IO ()
main = do
    arr &lt;- newArray_ (0, 200)
    go arr 2 0 100

go :: IOUArray Int Int -&gt; Int -&gt; Int -&gt; Int -&gt; IO ()
go arr stride x y | x &lt; y     = do unsafeWrite arr (x * stride) 1337
                                   go arr stride (x + 1) y
                  | otherwise = return ()
</code></pre>
<p>This loop compiles to fairly good Core:</p>
<pre class="haskell"><code>Main.main_$s$wa =
  \ (@ sg0_sKA::Data.Array.Base.STUArray
                  GHC.Prim.RealWorld GHC.Types.Int GHC.Types.Int
                  ~
                Data.Array.IO.Internals.IOUArray GHC.Types.Int GHC.Types.Int)
    (sc_sKs :: GHC.Prim.State# GHC.Prim.RealWorld)
    (sc1_sKt :: GHC.Prim.Int#)
    (sc2_sKu :: GHC.Prim.Int#)
    (sc3_sKv :: GHC.Prim.Int#)
    (sc4_sKw :: GHC.Types.Int)
    (sc5_sKx :: GHC.Types.Int)
    (sc6_sKy :: GHC.Types.Int)
    (sc7_sKz :: GHC.Prim.MutableByteArray# GHC.Prim.RealWorld) -&gt;
    case GHC.Prim.&lt;# sc2_sKu sc1_sKt of _ {
      GHC.Bool.False -&gt; (# sc_sKs, GHC.Unit.() #);
      GHC.Bool.True -&gt;
        case GHC.Prim.writeIntArray#
               @ GHC.Prim.RealWorld
               sc7_sKz
               (GHC.Prim.*# sc2_sKu sc3_sKv)
               1337
               sc_sKs
        of s2#_aHo { __DEFAULT -&gt;
        Main.main_$s$wa
          @ (sym
               Data.Array.IO.Internals.NTCo:IOUArray GHC.Types.Int GHC.Types.Int)
          s2#_aHo
          sc1_sKt
          (GHC.Prim.+# sc2_sKu 1)
          sc3_sKv
          sc4_sKw
          sc5_sKx
          sc6_sKy
          sc7_sKz
        }
    }
</code></pre>
<p>One weird thing about this Core is that it passes around a number of dead arguments (<code>sc4_sKw</code>, <code>sc5_sKx</code> and <code>sc6_sKy</code>). This is a known bug in GHC, and is caused by a phase ordering problem. However, this particular infelicity should not prevent LLVM from being able to do the classic loop optimisation of <a href="http://en.wikipedia.org/wiki/Strength_reduction">strength reduction</a> on our code.</p>
<p>The particular strength reduction we would like to perform si to replace the multiplication <code>GHC.Prim.*# sc2_sKu sc3_sKv</code> in the <code>main_$s$wa</code> loop with an addition. This is possible because the left operand <code>sc2_sKu</code> is a loop induction variable, increasing by 1 every iteration. Thus, on every iteration the value of the multiplication <code>GHC.Prim.*# sc2_sKu sc3_sKv</code> is just the value of the multiplication on the previous loop, plus <code>sc3_sKv</code>. Thus, by adding a loop variable that records the value of the multiplication on the previous iteration, we can replace the multiplication by an addition.</p>
<p>Unfortunately, LLVM currently can’t strength-reduce this loop in the suggested way. As we will soon see, this is due to aliasing.</p>
<h2 id="why-does-the-problem-happen"><a href="#TOC">Why does the problem happen?</a></h2>
<p>We can immediately see the problem if we look at the optimised LLVM code for this loop:</p>
<pre class="llvm"><code>c1TW.lr.ph:
  ...
  %ln1TL1 = load i64* %Sp_Arg, align 8
  ...

c1TW:                                             ; preds = %c1TW.lr.ph, %c1TW
  %ln1TL4 = phi i64 [ %ln1TL1, %c1TW.lr.ph ], [ %ln1UF, %c1TW ]
  %ln1Uy = mul i64 %ln1Uu, %ln1TL4
  %ln1Uz = add i64 %ln1Uw, %ln1Uy
  %ln1UA = inttoptr i64 %ln1Uz to i64*
  store i64 1337, i64* %ln1UA, align 8
  %ln1UE = load i64* %Sp_Arg, align 8
  %ln1UF = add i64 %ln1UE, 1
  store i64 %ln1UF, i64* %Sp_Arg, align 8
  %ln1TP = load i64* %ln1TN, align 8
  %ln1TQ = icmp slt i64 %ln1UF, %ln1TP
  br i1 %ln1TQ, label %c1TW, label %n1TX.loopexit
</code></pre>
<p>The strength reduction optimisation depends on one of the operands to the multiplication being a loop induction variable. In our case, we expect that <code>sc2_sKu</code> will be such a variable. However, looking at the LLVM code we can see that the equivalent LLVM variable, <code>%ln1TL4</code>, has its induction-ness hidden because it is reloaded from the stack by <code>load i64* %Sp_Arg</code> on every iteration.</p>
<p>You might wonder why the store to the same stack location by <code>store i64 %ln1UF, i64* %Sp_Arg</code> is not forwarded to this <code>load</code> by LLVM. If this were to happen, we could get code like this:</p>
<pre class="llvm"><code>c1TW.lr.ph:
  ...
  %ln1TL1 = load i64* %Sp_Arg, align 8
  %ln1UE.ph = load i64* %Sp_Arg, align 8
  ...

c1TW:                                             ; preds = %c1TW.lr.ph, %c1TW
  %ln1TL4 = phi i64 [ %ln1TL1, %c1TW.lr.ph ], [ %ln1UF, %c1TW ]
  %ln1UE = phi i64 [ %ln1UE.ph, %c1TW.lr.ph ], [ %ln1UF, %c1TW ]
  %ln1Uy = mul i64 %ln1Uu, %ln1TL4
  %ln1Uz = add i64 %ln1Uw, %ln1Uy
  %ln1UA = inttoptr i64 %ln1Uz to i64*
  store i64 1337, i64* %ln1UA, align 8
  %ln1UF = add i64 %ln1UE, 1
  store i64 %ln1UF, i64* %Sp_Arg, align 8
  %ln1TP = load i64* %ln1TN, align 8
  %ln1TQ = icmp slt i64 %ln1UF, %ln1TP
  br i1 %ln1TQ, label %c1TW, label %n1TX.loopexit
</code></pre>
<p>In this code the fact that <code>%ln1UE</code> is an induction variable is obvious, and not obscured by an intermediate load from memory. And indeed, LLVM is able to strength-reduce this loop!</p>
<p>The reason that LLVM does not forward this load is because in general it is unsafe, since the <code>store</code> to <code>%ln1UA</code> might alias it if <code>%ln1UA</code> were equal to <code>%Sp_Arg</code>. The ridiculous thing about this is that we <strong>know</strong> that in the code generated by GHC, the stack pointer will never be stored away anywhere, so it can’t possible alias with the unknown pointer <code>%ln1UA</code> and LLVM is being unnecessarily conservative.</p>
<h2 id="the-solution"><a href="#TOC">The solution</a></h2>
<p>LLVM is a beautiful bit of software, and it provides exactly the extensibility point we require to resolve this problem: we can write our own <em>alias analysis</em> pass that knows that GHC’s stack never alias with any another non-stack pointer and dynamically load it into the LLVM optimisation tool chain.</p>
<p>This is exactly what I’ve done. The code is available as a <a href="https://gist.github.com/1253064">Gist</a>, and interested parties (who use OS X!) can build it like so:</p>
<pre><code>g++ -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-exceptions -fno-rtti -fno-common -Wall \
-Wl,-flat_namespace -dynamiclib GHCAliasAnalysis.cpp -o GHCAliasAnalysis.dylib -lLLVM-`llvm-config --version`
</code></pre>
<p>Once built, we can dynamically load the resulting <code>dylib</code> into LLVMs <code>opt</code> tool using the <code>-load</code> option, and then use the new <code>-ghc-aa</code> flag to tell LLVM to use our alias analyser as a complement to the default one. Unfortunately, due to an infelicity in LLVM, we have to specify <code>-ghc-aa</code> in between <strong>every single optimisation pass</strong> if we want to be sure that it is used. So the final command line to <code>opt</code>, including all passes done by the standard <code>-O2</code> optimisation level, and the <code>-loop-reduce</code> strength-reduction pass, needs to look something like this:</p>
<pre><code>opt -load GHCAliasAnalysis.dylib -S -no-aa -tbaa -basicaa -ghc-aa \
-globalopt -ghc-aa -ghc-aa -ipsccp -ghc-aa -deadargelim -ghc-aa -instcombine -ghc-aa -simplifycfg \
-ghc-aa -basiccg -ghc-aa -prune-eh -ghc-aa -inline -ghc-aa -functionattrs -ghc-aa -scalarrepl-ssa \
-ghc-aa -domtree -ghc-aa -early-cse -ghc-aa -simplify-libcalls -ghc-aa -lazy-value-info -ghc-aa \
-jump-threading -ghc-aa -correlated-propagation -ghc-aa -simplifycfg -ghc-aa -instcombine -ghc-aa \
-tailcallelim -ghc-aa -simplifycfg -ghc-aa -reassociate -ghc-aa -domtree -ghc-aa -loops -ghc-aa \
-loop-simplify -ghc-aa -lcssa -ghc-aa -loop-rotate -ghc-aa -licm -ghc-aa -lcssa -ghc-aa -loop-unswitch \
-ghc-aa -instcombine -ghc-aa -scalar-evolution -ghc-aa -loop-simplify -ghc-aa -lcssa -ghc-aa -indvars \
-ghc-aa -loop-idiom -ghc-aa -loop-deletion -ghc-aa -loop-unroll -ghc-aa -memdep -ghc-aa -gvn -ghc-aa \
-memdep -ghc-aa -memcpyopt -ghc-aa -sccp -ghc-aa -instcombine -ghc-aa -lazy-value-info -ghc-aa \
-jump-threading -ghc-aa -correlated-propagation -ghc-aa -domtree -ghc-aa -memdep -ghc-aa -dse \
-ghc-aa -adce -ghc-aa -simplifycfg -ghc-aa -instcombine -ghc-aa -strip-dead-prototypes -ghc-aa \
-constmerge -loop-reduce
</code></pre>
<p>(Yes, I know this is ridiculous! I hope the LLVM developers fix this soon.)</p>
<p>With my new alias analysis pass, LLVM is able to produce the following beautiful code for the loop:</p>
<pre><code>c1TW:                                             ; preds = %c1TW, %c1TW.lr.ph
  %lsr.iv = phi i64 [ %lsr.iv.next, %c1TW ], [ %5, %c1TW.lr.ph ]
  %ln1UF1 = phi i64 [ %ln1TL1, %c1TW.lr.ph ], [ %ln1UF, %c1TW ]
  %ln1UA = inttoptr i64 %lsr.iv to i64*
  store i64 1337, i64* %ln1UA, align 8
  %ln1UF = add i64 %ln1UF1, 1
  %lsr.iv.next = add i64 %lsr.iv, %6
  %ln1TQ = icmp slt i64 %ln1UF, %ln1TP2
  br i1 %ln1TQ, label %c1TW, label %n1TX.loopexit
</code></pre>
<p>Note that the original loop contained a store and two loads, but the optimised loop contains only a single store: our new alias analysis has allowed the loads to be floated out of the loop. This has in turn allowed LLVM to discover the loop induction variable and apply strength reduction - note that the final loop never uses the multiplication instruction!</p>
<p>The final program runs 8.8% faster than the version that is compiled without the custom alias analysis.</p>
<h2 id="conclusions"><a href="#TOC">Conclusions</a></h2>
<p>My custom alias analyser for GHC-generated code gives LLVM much more room for applying its existing powerful optimisation. There is plenty of scope for improvement, though:</p>
<ol>
<li>
<p>I’d really like people to report their experiences using with this alias analyser and the LLVM backend. Do you see a big speed boost on your data-parallel Haskell programs, for example?</p>
</li>
<li>
<p>Of course, I would like this alias analyser to included with GHC so you can all seamlessly benefit from it. I’ll be working with GHC HQ to make this happen.</p>
</li>
<li>
<p>I think there is still scope for getting even more useful information about GHC-generated code into LLVM. For example, currently LLVM is unable to eliminate stores to stack locations that we can see will never be accessed because we make a tail call to another function with a stack pointer that points above these locations. I can think of at least two ways to express this to LLVM, and this would produce another nice gain.</p>
<p>If would also be great if we could teach LLVM something about the garbage collector, as currently if your loop does any allocation at all the presence of calls to the GC pessimises the output code a lot.</p>
</li>
</ol>
<p>I was partly inspired to do this by <a href="http://www.cse.unsw.edu.au/~benl/">Ben Lippmeier</a>, whose paper at the Haskell Symposium this year had to do strength-reduction manually at the Haskell level because LLVM wasn’t working for him. I hope I’ve fixed that issue.</p>
<p>Performance problems were also a focus of the discussions about the future of Haskell at ICFP. I’ve been to these discussions three years in a row, and several topics keep cropping back up: performance, and the fact that Hackage 2.0 still isn’t released. I’ve grown tired of hearing so much talk about the issues with little-to-no action to resolve them, so I spent this post-ICFP week doing my best to fix them. I first wrote a documentation build bot for the Hackage 2.0 effort, and then moved on to the LLVM performance issues - if everyone helps to move these issues along then hopefully we can finally talk about some different problems next year!</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.omega-prime.co.uk/?feed=rss2&#038;p=135</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Constraint Kinds for GHC</title>
		<link>http://blog.omega-prime.co.uk/?p=127</link>
		<comments>http://blog.omega-prime.co.uk/?p=127#comments</comments>
		<pubDate>Sat, 10 Sep 2011 20:30:34 +0000</pubDate>
		<dc:creator>Max</dc:creator>
				<category><![CDATA[Haskell]]></category>

		<guid isPermaLink="false">http://blog.omega-prime.co.uk/?p=127</guid>
		<description><![CDATA[I recently implemented a powerful new extension to GHC HEAD called ConstraintKinds. This (Literate Haskell) post will explain what this means, and how we can exploit it to do some cool stuff. (For long-time readers, this stuff is a generalisation of my earlier post about constraint families which was later also expounded on by Dominic [...]]]></description>
			<content:encoded><![CDATA[<p>I recently implemented a powerful new extension to GHC HEAD called <code>ConstraintKinds</code>. This (Literate Haskell) post will explain what this means, and how we can exploit it to do some cool stuff.</p>
<p>(For long-time readers, this stuff is a generalisation of my earlier post about <a href="http://blog.omega-prime.co.uk/?p=61">constraint families</a> which was later also expounded on by Dominic Orchard and Tom Schrijvers in <a href="http://www.springerlink.com/content/r87810un65965001/">Type Constraints Unleashed</a>. The proposal in its current form is due to <a href="http://hackage.haskell.org/trac/ghc/wiki/KindFact">Conor McBride</a>.)</p>
<p>First of all, we’re going to turn on a whacking great load of extensions:</p>
<pre class="haskell"><code>{-# LANGUAGE UndecidableInstances,
             MultiParamTypeClasses,
             KindSignatures,
             Rank2Types,
             ConstraintKinds,
             FlexibleInstances,
             OverlappingInstances #-}
</code></pre>
<p>(Yes, some of the cooler examples will require <code>UndecidableInstances</code>. Never mind!)</p>
<p>Let’s have some imports as well:</p>
<pre class="haskell"><code>import qualified Data.Set as S
</code></pre>
<p>When we talk about constraints in Haskell, we usually mean one of the following things:</p>
<ul>
<li>Class contexts such as <code>Show a</code></li>
<li>Implicit parameters, such as <code>?x::Int</code></li>
<li>Equality assertions, such as <code>a ~ Int</code></li>
<li>Tuples of any of the above, such as <code>(Show a, Read a)</code></li>
</ul>
<p>Is standard Haskell, these constraints can only occur to the left of <code>=&gt;</code> arrow, and they are the <strong>only</strong> thing that can appear there. With the <code>ConstraintKinds</code> extension, we instead allow any <em>type</em> of a brand-new kind <code>Constraint</code> to appear to the left of <code>=&gt;</code>. Naturally, all of the constraints we already mentioned are parsed as types, and are all given an appropriate kind:</p>
<ul>
<li><code>Show :: * -&gt; Constraint</code></li>
<li><code>(?x::Int) :: Constraint</code></li>
<li><code>(a ~ Int) :: Constraint</code></li>
<li><code>(Show a, Read a) :: Constraint</code></li>
</ul>
<h2 id="constraint-synonyms"><a href="#TOC">Constraint synonyms</a></h2>
<p>At the simplest level, this unification of constraints and types means that code like the following is valid:</p>
<pre class="haskell"><code>type Func cxt a = cxt a =&gt; a -&gt; a
 
incc :: Func Num a
incc = (+1)
</code></pre>
<p>Or we can even use type synonyms as constraint synonyms:</p>
<pre class="haskell"><code>type Stringy a = (Show a, Read a)
 
viaString :: Stringy a =&gt; a -&gt; a
viaString = read . show
</code></pre>
<p>Simulating this without the extension is a little more cumbersome:</p>
<pre class="haskell"><code>class (Show a, Read a) =&gt; Stringy a where
instance Stringy a where
</code></pre>
<h2 id="indexed-constraints"><a href="#TOC">Indexed constraints</a></h2>
<p>But it doesn’t stop there. Since constraints are just types, we can type-index them using type functions! We can use this to solve the well-known problem where lists can be an instance of the <code>Monad</code> type class, but sets cannot. This problem arises because the elements of a set must be orderable, but e.g. the <code>return</code> method of the <code>Monad</code> class allows an element of any type to be made into an “element” of the monad — not only the orderable ones.</p>
<p>A restricted monad is a monad where we need to impose some constraints on the elements it can contain. Existing Hackage packages such as Ganesh Sittampalam’s <a href="http://hackage.haskell.org/package/rmonad">rmonad package</a> provide a way to define these monads in unextended Haskell. However, with our new extension we get a much smoother user experience by reusing the type function mechanism to encode a class of restricted monads:</p>
<pre class="haskell"><code>class RMonad m where
  type RMonadCtxt m a :: Constraint
  return :: RMonadCtxt m a =&gt; a -&gt; m a
  (&gt;&gt;=) :: (RMonadCtxt m a, RMonadCtxt m b) =&gt; m a -&gt; (a -&gt; m b) -&gt; m b
</code></pre>
<p>Lists can of course be an instance of this class:</p>
<pre class="haskell"><code>instance RMonad [] where
  type RMonadCtxt [] a = ()
  return x = [x]
  (&gt;&gt;=) = flip concatMap
</code></pre>
<p>But now so can sets:</p>
<pre class="haskell"><code>instance RMonad S.Set where
  type RMonadCtxt S.Set a = Ord a
  return = S.singleton
  mx &gt;&gt;= fxmy = S.fromList [y | x &lt;- S.toList mx, y &lt;- S.toList (fxmy x)]
</code></pre>
<p>Another feature I added to GHC recently is associated type defaults. With this, we can change the <code>RMonad</code> class definition so that normal <code>Monad</code>s which do not make any special demands of their element types can be defined without giving an explicit instance for <code>RMonadCtxt</code>:</p>
<pre class="haskell"><code>class RMonad m where
  type RMonadCtxt m a :: Constraint
  type RMonadCtxt m a = ()
  return :: ...
  (&gt;&gt;=) :: ...
</code></pre>
<p>(Associated type defaults were always described in the published papers about associated types, but were never implemented until now).</p>
<h2 id="reified-dictionaries"><a href="#TOC">Reified dictionaries</a></h2>
<p>A common trick is to reify a constraint as an explicit dictionary using a GADT:</p>
<pre class="haskell"><code>data ShowDict a where
  ShowDict :: Show a =&gt; ShowDict a
 
showish :: ShowDict a -&gt; a -&gt; String
showish ShowDict x = show x
 
use_showish :: String
use_showish = showish ShowDict 10
</code></pre>
<p>With our extension we can generalise this so you can define one reified dictionary to rule them all:</p>
<pre class="haskell"><code>data Dict ctxt where
  Dict :: ctxt =&gt; Dict ctxt
 
showish' :: Dict (Show a) -&gt; a -&gt; String
showish' Dict x = show x
 
use_showish' :: String
use_showish' = showish' Dict 10
</code></pre>
<h2 id="generic-programming"><a href="#TOC">Generic programming</a></h2>
<p>In <a href="http://research.microsoft.com/en-us/um/people/simonpj/papers/hmap/">“Scrap Your Boilerplate With Class”</a>, Simon Peyton Jones and Ralf Laemmel proposed an encoding for generic functions in terms of type classes. However, their presentation was impeded by the fact that they could not abstract over type classes, and they had to have a heavy encoding mechanism to make it work. With our new extension we can write generic functions in their style in a much cleaner fashion.</p>
<p>First, we define the class of <code>Data</code> which has a generic mapping operation that applies a type-indexed function one level down in the data structure, returning all the results as a list:</p>
<pre class="haskell"><code>class (cxt a) =&gt; Data cxt a where
  gmapQ :: Proxy cxt -&gt; (forall b. Data cxt b =&gt; b -&gt; r) -&gt; a -&gt; [r]
</code></pre>
<p>The <code>cxt</code> type variable will later be instantiated to a type class corresponding to the generic function we wish to apply. The <code>Proxy cxt</code> argument to <code>gmapQ</code> is an unfortunate artifact of fact that Haskell still has no explicit type applications, so we have to use dummy value arguments to disambiguate which <code>cxt</code> we actually mean when we call <code>gmapQ</code>. The definition is trivial:</p>
<pre class="haskell"><code>data Proxy (ctxt :: * -&gt; Constraint) = Proxy
</code></pre>
<p>We can define <code>Data</code> instances for some built in types:</p>
<pre class="haskell"><code>instance (cxt Int) =&gt; Data cxt Int where
  gmapQ _ f n = []
 
instance (cxt [a], Data cxt a) =&gt; Data cxt [a] where
  gmapQ _ f [] = []
  gmapQ _ f (x:xs) = [f x, f xs]
</code></pre>
<p>Now we can define a generic function <code>gsize</code>:</p>
<pre><code>class Size a where
  gsize :: a -&gt; Int
</code></pre>
<p>We can say how <code>gsize</code> works on particular types by giving an instance:</p>
<pre class="haskell"><code>instance Size Int where
  gsize x = x
</code></pre>
<p>If no other instance is available, an overlapping instance based on <code>gmapQ</code> will be used:</p>
<pre class="haskell"><code>instance Data Size t =&gt; Size t where
  gsize t = 1 + sum (gmapQ (Proxy :: Proxy Size) gsize t)
</code></pre>
<p>We can now evaluate <code>gsize</code> at both types <code>Int</code> and <code>[Int]</code> even though we never said explicitly what it means to take the size of a list:</p>
<pre class="haskell"><code>use_gsize :: Int
use_gsize = gsize (1 :: Int) + gsize [1 :: Int, 2]
</code></pre>
<h2 id="wrapping-up"><a href="#TOC">Wrapping up</a></h2>
<p>The <code>ConstraintKinds</code> extension makes these three idioms much neater, but I’m sure there are plenty of other places where this new power will come in useful. Try it out for yourself in GHC 7.4 and find out!</p>
<p>Thanks are due to Simon Marlow for organising CamHac, where I started working on the implementation, and Dominic Orchard and Nicolas Wu who collaborated with me during the early stages of coding. Thanks also to Simon Peyton Jones for invaluable advice that finally let me merge it into GHC.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.omega-prime.co.uk/?feed=rss2&#038;p=127</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>The Sad State of Symbol Aliases</title>
		<link>http://blog.omega-prime.co.uk/?p=121</link>
		<comments>http://blog.omega-prime.co.uk/?p=121#comments</comments>
		<pubDate>Wed, 06 Jul 2011 11:40:36 +0000</pubDate>
		<dc:creator>Max</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.omega-prime.co.uk/?p=121</guid>
		<description><![CDATA[This point continues my quest to condense and write down some of the folklore surrounding assemblers linkers. In this case, I recently came across a situation where it would be useful to be able to generate an object file that contained an alias for a symbol defined elsewhere. For example, I want an object file [...]]]></description>
			<content:encoded><![CDATA[<p>This point continues my quest to condense and write down some of the folklore surrounding assemblers linkers. In this case, I recently came across a situation where it would be useful to be able to generate an object file that contained an alias for a symbol defined elsewhere. For example, I want an object file to export a symbol <code>foo</code> that aliases <code>bar</code>, such that when any use site of <code>foo</code> is linked against the object file that use site then behaves exactly as if it had referenced <code>bar</code> instead.</p>
<p>This could be done straightforwardly (just export both <code>foo</code> with the same value as <code>bar</code>) except for the wrinkle that in general <code>bar</code> is not defined in the object file exporting <code>foo</code>, so we don't know its value yet.</p>
<p>This article picks apart support for this feature on a platform-by-platform basis. Long story short: this is supported by the object file format on OS X and Windows, but you can't get to it from the assembly code level. Linux has no support at all.</p>
<h2 id="os-x">OS X</h2>
<p>Buried deep within the <a href="http://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/MachORuntime/Reference/reference.html">Mach-O specification</a> is a mention of the symbol table entry type <code>N_INDR</code>. Quoth the standard: &quot;The symbol is defined to be the same as another symbol. The n_value field is an index into the string table specifying the name of the other symbol. When that symbol is linked, both this and the other symbol have the same defined type and value&quot;.</p>
<p>This is great stuff, and exactly what we want! The fly in the ointment is that the <a href="http://www.opensource.apple.com/source/cctools/cctools-800/as/">latest version of Apples assembler</a> has no support for actually generating such indirections. The source tree <em>does</em> contain a tool called <a href="http://www.opensource.apple.com/source/cctools/cctools-800/misc/indr.c"><code>indr</code></a> which is capable of generating these indirections in a limited capacity, but it is not distributed with OS X and anyway not general enough for our needs. Happily, Apple's linker does seem to include support for <code>N_INDR</code>, so everything should work OK if you managed to generate an object file making use of that type.</p>
<h2 id="windows">Windows</h2>
<p>Interestingly, Windows DLLs support something called &quot;<a href="http://blogs.msdn.com/b/oldnewthing/archive/2006/07/19/671238.aspx">forwarders</a>&quot; which give us the behaviour we want for dynamically exported symbols. You can create such DLLs with special syntax in your <a href="http://msdn.microsoft.com/en-us/library/hyx1zcd3(v=vs.80).aspx">.def file EXPORTS section</a>. This is not relevant to our problem though, because there is no equivalent at the object file level.</p>
<p>Page 44 of the <a href="http://msdn.microsoft.com/en-us/windows/hardware/gg463119">PE/COFF specification</a> talks about symbol tables. Reading carefully, we find a mention of &quot;Weak Externals&quot; on page 51:</p>
<blockquote>
<p>“Weak externals” are a mechanism for object files that allows flexibility at link time. A module can contain an unresolved external symbol (sym1), but it can also include an auxiliary record that indicates that if sym1 is not present at link time, another external symbol (sym2) is used to resolve references instead. If a definition of sym1 is linked, then an external reference to the symbol is resolved normally. If a definition of sym1 is not linked, then all references to the weak external for sym1 refer to sym2 instead. The external symbol, sym2, must always be linked; typically, it is defined in the module that contains the weak reference to sym1.</p>
</blockquote>
<p>This is not exactly what we had in mind, but it can be abused for the same effect. Nothing will go wrong unless someone else defines a symbol with the same name as our alias in another object file.</p>
<p>As far as I can see, the GNU assembler can't be persuaded to generate this. The assembler does have rudimentary support for generating weak externals, but only uses it in the rudimentary capacity of supporting the <code>.weak</code> directive (with ELF-style semantics) on Windows. And as we shall shortly see, ELF semantics are not what we want at all...</p>
<h2 id="linux">Linux</h2>
<p>Turning to page 1-16 of the <a href="http://www.skyfree.org/linux/references/ELF_Format.pdf">ELF specification</a> we find the definition of the ELF symbol table. As far as I can tell, there is no support whatsoever for this use case. Bah.</p>
<p>We might be tempted to search for some equivalent to the weak externals feature on Windows. Unfortunately, ELF weak symbols have a rather different semantics:</p>
<ol style="list-style-type: decimal">
<li>An <em>undefined</em> weak symbol will not cause the linker to error out if a definition is not found. Instead, the symbol will be filled in with a default value of 0.</li>
<li>A <em>defined</em> weak symbol has a lower link precedence than a strong symbol of the same name, and will not cause the linker to generate an error about duplicate symbol definitions in the case of such a conflict.</li>
</ol>
<p>The difference between this and the Windows situation is that Windows basically lets us change the default value filled in by the linker in the case of no definition being found to an arbitrary symbol.</p>
<h2 id="gcc">GCC</h2>
<p>GCC supports an <a href="http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html"><code>alias</code> attribute</a> that does exactly what I want. Unfortunately despite a <a href="http://gcc.gnu.org/ml/gcc-help/2005-04/msg00185.html">few</a> <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20652">people</a> trying to do exactly what I want they have elected to reject the construct:</p>
<blockquote>
<p>This is because it's meaningless to define an alias to an undefined symbol. On Solaris, the native assembler would have caught this error, but GNU as does not.</p>
</blockquote>
<p>This comment refers to the fact that assembly like this:</p>
<pre><code>.globl reexport
.globl export
.equiv export, reexport
</code></pre>
<p>Does not fail to compile with the GNU assembler, but generates an object file that does not define any symbols despite referencing the <code>reexport</code> symbol.</p>
<h2 id="conclusion">Conclusion</h2>
<p>A sufficiently motivated hacker could support a (weak) aliasing feature along the lines described above in the GNU assembler on Windows and OS_X without problems. However, there seems to be no way to support it on Linux within the bounds of the ELF specification.</p>
<p>Unusually Linux is the platform that lags behind the others in linker features! I usually find that quite the opposite is true.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.omega-prime.co.uk/?feed=rss2&#038;p=121</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Everything You Never Wanted To Know About DLLs</title>
		<link>http://blog.omega-prime.co.uk/?p=115</link>
		<comments>http://blog.omega-prime.co.uk/?p=115#comments</comments>
		<pubDate>Mon, 04 Jul 2011 09:38:04 +0000</pubDate>
		<dc:creator>Max</dc:creator>
				<category><![CDATA[Assembly]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://blog.omega-prime.co.uk/?p=115</guid>
		<description><![CDATA[I've recently had cause to investigate how dynamic linking is implemented on Windows. This post is basically a brain dump of everything I've learnt on the issue. This is mostly for my future reference, but I hope it will be useful to others too as I'm going to bring together lots of information you would [...]]]></description>
			<content:encoded><![CDATA[<p>I've recently had cause to investigate how dynamic linking is implemented on Windows. This post is basically a brain dump of everything I've learnt on the issue. This is mostly for my future reference, but I hope it will be useful to others too as I'm going to bring together lots of information you would otherwise have to hunt around for.</p>
<p>Without further ado, here we go:</p>
<h2 id="export-and-import-directories">Export and import directories</h2>
<p>The Windows executable loader is responsible for doing all dynamic loading and symbol resolution before running the code. The linker works out what functions are exported or imported by each image (an <em>image</em> is a DLL or EXE file) by inspecting the <code>.edata</code> and <code>.idata</code> sections of those images, respectively.</p>
<p>The contents of these sections is covered in detail by the <a href="http://msdn.microsoft.com/en-us/windows/hardware/gg463119">PE/COFF specification</a>.</p>
<h3 id="the-.edata-section">The <code>.edata</code> section</h3>
<p>This section records the exports of the image (yes, EXEs can export things). This takes the form of:</p>
<ul>
<li>The export address table: an array of length N holding the addresses of the exported functions/data (the addresses are stored relative to the image base). Indexes into this table are called <em>ordinals</em>.</li>
<li>The export name pointer table: an array of length M holding pointers to strings that represent the name of an export. This array is lexically ordered by name, to allow binary searches for a given export.</li>
<li>The export ordinal table: a parallel array of length M holding the ordinal of the corresponding name in the export name pointer table.</li>
</ul>
<p>(As an alternative to importing an image's export by its name, it is possible to import by specifying an ordinal. Importing by ordinal is slightly faster at runtime because the dynamic linker doesn't have to do a lookup. Furthermore, if the import is not given a name by the exporting DLL, importing by ordinal is the <strong>only</strong> way to do the import.)</p>
<p>How does the <code>.edata</code> section get created in the first place? There are two main methods:</p>
<ol style="list-style-type: decimal">
<li>
<p>Most commonly, they start life in the object files created by compiling some source code that defines a function/some data that was declared with the <code>__declspec(dllimport)</code> modifier. The compiler just emits an appropriate <code>.edata</code> section naming these exports.</p>
</li>
<li>
<p>Less commonly, the programmer might <a href="http://msdn.microsoft.com/en-us/library/d91k01sh(v=vs.80).aspx">write a .def file</a> specifying which functions they would like to export. By supplying this to <code>dlltool --output-exp</code>, an export file can be generated. An export file is just an object file which only contains a <code>.edata</code> section, exporting (via some unresolved references that will be filled in by the linker in the usual way) the symbols named in the .def file. This export library must be named by the programmer when he comes to link together his object files into a DLL.</p>
</li>
</ol>
<p>In both these cases, the linker collects the <code>.edata</code> sections from all objects named on the link line to build the <code>.edata</code> for the overall image file. One last possible way that the <code>.edata</code> can be created is by the linker itself, without having to put <code>.edata</code> into any object files:</p>
<ol start="3" style="list-style-type: decimal">
<li>The linker could choose to export all symbols defined by object files named on the link line. For example, this is the <a href="http://sourceware.org/binutils/docs/ld/WIN32.html">default behaviour of GNU ld</a> (the behaviour can also be explicitly asked for using <code>–-export-all-symbols</code>). In this case, the linker generates the <code>.edata</code> section itself. (GNU ld also supports specifying a .def file on the command line, in which case the generated section will export just those things named by the .def).</li>
</ol>
<h3 id="the-.idata-section">The <code>.idata</code> section</h3>
<p>The <code>.idata</code> section records those things that the image imports. It consists of:</p>
<ul>
<li>
<p>For every image from which symbols are imported:</p>
<ul>
<li>
<p>The filename of the image. Used by the dynamic linker to locate it on disk.</p>
</li>
<li>
<p>The import lookup table: an array of length N, which each entry is either an ordinal <em>or</em> a pointer to a string representing the name to import.</p>
</li>
<li>
<p>The import address table: an array of N pointers. The dynamic linker is responsible for filling out this array with the address of the function/data named by the corresponding symbol in the import lookup table.</p>
</li>
</ul>
</li>
</ul>
<p>The ways in which <code>.idata</code> entries are created are as follows:</p>
<ol style="list-style-type: decimal">
<li>
<p>Most commonly, they originate in a library of object files called an <code>import library'. This import library      can be created by using</code>dlltool` on the DLL you wish to export <em>or</em> a .def file of the type we discussed earlier. Just like the export library, the import library must be named by the user on the link line.</p>
</li>
<li>
<p>Alternatively, some linkers (like GNU ld) let you specify a DLL directly on the link line. The linker will automatically generate <code>.idata</code> entries for any symbols that you must import from the DLL.</p>
</li>
</ol>
<p>Notice that unlike the case when we were exporting symbols, <code>__declspec(dllimport)</code> does not cause <code>.idata</code> sections to be generated.</p>
<p>Import libraries are a bit more complicated than they first appear. The Windows dynamic loader fills the import address table with the <strong>addresses</strong> of the imported symbols (say, the address of a function <code>Func</code>). However, when the assembly code in other object files says <code>call Func</code> they expect that <code>Func</code> to name the address of that code. But we don't know that address until runtime: the only thing we know statically is the address <strong>where that address will be placed by the dynamic linker</strong>. We will call this address <code>__imp__Func</code>.</p>
<p>To deal with this extra level of indirection, the import library exports a function <code>Func</code> that just dereferences <code>__imp__Func</code> (to get the actual function pointer) and then <code>jmp</code>s to it. All of the other object files in the project can now say <code>call Func</code> just as they would if <code>Func</code> had been defined in some other object file, rather than a DLL. For this reason, saying <code>__declspec(dllimport)</code> in the declaration of a dynamically linked <em>function</em> is optional (though in fact you will get slightly more efficient code if you add them, as we will see later).</p>
<p>Unfortunately, there is no equivalent trick if you want to import <em>data</em> from another DLL. If we have some imported data <code>myData</code>, there is no way the import library can be defined so that a <code>mov $eax, myData</code> in an object file linked against it writes to the storage for <code>myData</code> in that DLL. Instead, the import library defines a symbol <code>__imp__myData</code> that resolves to the address at which the linked-in address of the storage can be found. The compiler then ensures that when you read or write from a <em>variable</em> defined with <code>__declspec(dllimport)</code> those reads and writes go through the <code>__imp_myData</code> indirection. Because different code needs to be generated at the use site, <code>__declspec</code> declarations on data imports are <em>not</em> optional.</p>
<h2 id="practical-example">Practical example</h2>
<p>Theory is all very well but it can be helpful to see all the pieces in play.</p>
<h3 id="building-a-dll">Building a DLL</h3>
<p>First, lets build a simple DLL exporting both functions and data. For maximum clarity, we'll use an explicit export library rather instead of decorating our functions with <code>declspec(dllexport)</code> or supply a .def file to the linker.</p>
<p>First lets write the .def file, <code>library.def</code>:</p>
<pre><code>LIBRARY library
EXPORTS
   function_export
   data_export      DATA
</code></pre>
<p>(The <code>DATA</code> keyword and <code>LIBRARY</code> line only affects how the import library is generated, as explained later on. Ignore them for now.)</p>
<p>Build an export file from that:</p>
<pre><code>$ dlltool --output-exp library_exports.o -d library.def
</code></pre>
<p>The resulting object basically just contains an <code>.edata</code> section that exports the symbols <code>_data_export</code> and <code>_function_export</code> under the names <code>data_export</code> and <code>function_export</code> respectively:</p>
<pre><code>$ objdump -xs library_exports.o

...

There is an export table in .edata at 0x0

The Export Tables (interpreted .edata section contents)

Export Flags                    0
Time/Date stamp                 4e10e5c1
Major/Minor                     0/0
Name                            00000028 library_exports.o.dll
Ordinal Base                    1
Number in:
        Export Address Table            00000002
        [Name Pointer/Ordinal] Table    00000002
Table Addresses
        Export Address Table            00000040
        Name Pointer Table              00000048
        Ordinal Table                   00000050

Export Address Table -- Ordinal Base 1

[Ordinal/Name Pointer] Table
        [   0] data_export
        [   1] function_export

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000000  2**2
                  ALLOC
  3 .edata        00000070  00000000  00000000  000000b4  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
SYMBOL TABLE:
[  0](sec -2)(fl 0x00)(ty   0)(scl 103) (nx 1) 0x00000000 fake
File
[  2](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000028 name
[  3](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000040 afuncs
[  4](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000048 anames
[  5](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000050 anords
[  6](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000054 n1
[  7](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000060 n2
[  8](sec  1)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .text
AUX scnlen 0x0 nreloc 0 nlnno 0
[ 10](sec  2)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .data
AUX scnlen 0x0 nreloc 0 nlnno 0
[ 12](sec  3)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .bss
AUX scnlen 0x0 nreloc 0 nlnno 0
[ 14](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .edata
AUX scnlen 0x70 nreloc 8 nlnno 0
[ 16](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 0) 0x00000000 _data_export
[ 17](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 0) 0x00000000 _function_export


RELOCATION RECORDS FOR [.edata]:
OFFSET   TYPE              VALUE
0000000c rva32             .edata
0000001c rva32             .edata
00000020 rva32             .edata
00000024 rva32             .edata
00000040 rva32             _data_export
00000044 rva32             _function_export
00000048 rva32             .edata
0000004c rva32             .edata


Contents of section .edata:
 0000 00000000 c1e5104e 00000000 28000000  .......N....(...
 0010 01000000 02000000 02000000 40000000  ............@...
 0020 48000000 50000000 6c696272 6172795f  H...P...library_
 0030 6578706f 7274732e 6f2e646c 6c000000  exports.o.dll...
 0040 00000000 00000000 54000000 60000000  ........T...`...
 0050 00000100 64617461 5f657870 6f727400  ....data_export.
 0060 66756e63 74696f6e 5f657870 6f727400  function_export.
</code></pre>
<p>We'll fulfil these symbol with a trivial implementation of the DLL, <code>library.c</code>:</p>
<pre class="c"><code>int data_export = 42;

int function_export() {
    return 1337 + data_export;
}
</code></pre>
<p>We can put it together into a DLL:</p>
<pre><code>$ gcc -shared -o library.dll library.c library_exports.o
</code></pre>
<p>The export table for the DLL is as follows, showing that we have exported what we wanted:</p>
<pre><code>The Export Tables (interpreted .edata section contents)

Export Flags                    0
Time/Date stamp                 4e10e5c1
Major/Minor                     0/0
Name                            00005028 library_exports.o.dll
Ordinal Base                    1
Number in:
        Export Address Table            00000002
        [Name Pointer/Ordinal] Table    00000002
Table Addresses
        Export Address Table            00005040
        Name Pointer Table              00005048
        Ordinal Table                   00005050

Export Address Table -- Ordinal Base 1
        [   0] +base[   1] 200c Export RVA
        [   1] +base[   2] 10f0 Export RVA

[Ordinal/Name Pointer] Table
        [   0] data_export
        [   1] function_export
</code></pre>
<h3 id="using-the-dll">Using the DLL</h3>
<p>When we come to look at using the DLL, things become a lot more interesting. First, we need an import library:</p>
<pre><code>$ dlltool --output-lib library.dll.a -d library.def
</code></pre>
<p>(The reason that we have an import <strong>library</strong> but an export <strong>object</strong> is because using a library for the imports allows the linker to discard <code>.idata</code> for any imports that are not used. Contrariwise ,he linker can never discard any <code>.edata</code> entry because any export may potentially be used by a user of the DLL).</p>
<p>This import library is rather complex. It contains one object for each export (<code>disds00000.o</code> and <code>disds00001.o</code>) but also two other object files (<code>distdt.o</code> and <code>disdh.o</code>) that set up the header and footer of the import list. (The header of the import list contains, among other things, the name of the DLL to link in at runtime, as derived from the <code>LIBRARY</code> line of the .def file.)</p>
<pre><code><br />$ objdump -xs library.dll.a
In archive library.dll.a:

disdt.o:     file format pe-i386

...

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000000  2**2
                  ALLOC
  3 .idata$4      00000004  00000000  00000000  00000104  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  4 .idata$5      00000004  00000000  00000000  00000108  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  5 .idata$7      0000000c  00000000  00000000  0000010c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
[  0](sec -2)(fl 0x00)(ty   0)(scl 103) (nx 1) 0x00000000 fake
File
[  2](sec  1)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .text
AUX scnlen 0x0 nreloc 0 nlnno 0
[  4](sec  2)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .data
AUX scnlen 0x0 nreloc 0 nlnno 0
[  6](sec  3)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .bss
AUX scnlen 0x0 nreloc 0 nlnno 0
[  8](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .idata$4
AUX scnlen 0x4 nreloc 0 nlnno 0
[ 10](sec  5)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .idata$5
AUX scnlen 0x4 nreloc 0 nlnno 0
[ 12](sec  6)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .idata$7
AUX scnlen 0x7 nreloc 0 nlnno 0
[ 14](sec  6)(fl 0x00)(ty   0)(scl   2) (nx 0) 0x00000000 __library_dll_a_iname


Contents of section .idata$4:
 0000 00000000                             ....
Contents of section .idata$5:
 0000 00000000                             ....
Contents of section .idata$7:
 0000 6c696272 6172792e 646c6c00           library.dll.

disdh.o:     file format pe-i386

...

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000000  2**2
                  ALLOC
  3 .idata$2      00000014  00000000  00000000  00000104  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  4 .idata$5      00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, DATA
  5 .idata$4      00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, DATA
SYMBOL TABLE:
[  0](sec -2)(fl 0x00)(ty   0)(scl 103) (nx 1) 0x00000000 fake
File
[  2](sec  6)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 hname
[  3](sec  5)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 fthunk
[  4](sec  1)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .text
AUX scnlen 0x0 nreloc 0 nlnno 0
[  6](sec  2)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .data
AUX scnlen 0x0 nreloc 0 nlnno 0
[  8](sec  3)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .bss
AUX scnlen 0x0 nreloc 0 nlnno 0
[ 10](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 1) 0x00000000 .idata$2
AUX scnlen 0x14 nreloc 3 nlnno 0
[ 12](sec  6)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .idata$4
[ 13](sec  5)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .idata$5
[ 14](sec  4)(fl 0x00)(ty   0)(scl   2) (nx 0) 0x00000000 __head_library_dll_a
[ 15](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 0) 0x00000000 __library_dll_a_iname


RELOCATION RECORDS FOR [.idata$2]:
OFFSET   TYPE              VALUE
00000000 rva32             .idata$4
0000000c rva32             __library_dll_a_iname
00000010 rva32             .idata$5


Contents of section .idata$2:
 0000 00000000 00000000 00000000 00000000  ................
 0010 00000000                             ....

disds00001.o:     file format pe-i386

...

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000008  00000000  00000000  0000012c  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000000  2**2
                  ALLOC
  3 .idata$7      00000004  00000000  00000000  00000134  2**2
                  CONTENTS, RELOC
  4 .idata$5      00000004  00000000  00000000  00000138  2**2
                  CONTENTS, RELOC
  5 .idata$4      00000004  00000000  00000000  0000013c  2**2
                  CONTENTS, RELOC
  6 .idata$6      00000012  00000000  00000000  00000140  2**1
                  CONTENTS
SYMBOL TABLE:
[  0](sec  1)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .text
[  1](sec  2)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .data
[  2](sec  3)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .bss
[  3](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .idata$7
[  4](sec  5)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .idata$5
[  5](sec  6)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .idata$4
[  6](sec  7)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .idata$6
[  7](sec  1)(fl 0x00)(ty   0)(scl   2) (nx 0) 0x00000000 _function_export
[  8](sec  5)(fl 0x00)(ty   0)(scl   2) (nx 0) 0x00000000 __imp__function_export
[  9](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 0) 0x00000000 __head_library_dll_a


RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
00000002 dir32             .idata$5


RELOCATION RECORDS FOR [.idata$7]:
OFFSET   TYPE              VALUE
00000000 rva32             __head_library_dll_a


RELOCATION RECORDS FOR [.idata$5]:
OFFSET   TYPE              VALUE
00000000 rva32             .idata$6


RELOCATION RECORDS FOR [.idata$4]:
OFFSET   TYPE              VALUE
00000000 rva32             .idata$6


Contents of section .text:
 0000 ff250000 00009090                    .%......
Contents of section .idata$7:
 0000 00000000                             ....
Contents of section .idata$5:
 0000 00000000                             ....
Contents of section .idata$4:
 0000 00000000                             ....
Contents of section .idata$6:
 0000 01006675 6e637469 6f6e5f65 78706f72  ..function_expor
 0010 7400                                 t.

disds00000.o:     file format pe-i386

...

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000000  2**2
                  ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000000  2**2
                  ALLOC
  3 .idata$7      00000004  00000000  00000000  0000012c  2**2
                  CONTENTS, RELOC
  4 .idata$5      00000004  00000000  00000000  00000130  2**2
                  CONTENTS, RELOC
  5 .idata$4      00000004  00000000  00000000  00000134  2**2
                  CONTENTS, RELOC
  6 .idata$6      0000000e  00000000  00000000  00000138  2**1
                  CONTENTS
SYMBOL TABLE:
[  0](sec  1)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .text
[  1](sec  2)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .data
[  2](sec  3)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .bss
[  3](sec  4)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .idata$7
[  4](sec  5)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .idata$5
[  5](sec  6)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .idata$4
[  6](sec  7)(fl 0x00)(ty   0)(scl   3) (nx 0) 0x00000000 .idata$6
[  7](sec  5)(fl 0x00)(ty   0)(scl   2) (nx 0) 0x00000000 __imp__data_export
[  8](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 0) 0x00000000 __head_library_dll_a


RELOCATION RECORDS FOR [.idata$7]:
OFFSET   TYPE              VALUE
00000000 rva32             __head_library_dll_a


RELOCATION RECORDS FOR [.idata$5]:
OFFSET   TYPE              VALUE
00000000 rva32             .idata$6


RELOCATION RECORDS FOR [.idata$4]:
OFFSET   TYPE              VALUE
00000000 rva32             .idata$6


Contents of section .idata$7:
 0000 00000000                             ....
Contents of section .idata$5:
 0000 00000000                             ....
Contents of section .idata$4:
 0000 00000000                             ....
Contents of section .idata$6:
 0000 00006461 74615f65 78706f72 7400      ..data_export.
</code></pre>
<p>Note that the object corresponding to <code>data_export</code> has an empty <code>.text</code> section, whereas <code>function_export</code> <em>does</em> define some code. If we disassemble it we get this:</p>
<pre><code>00000000 &lt;_function_export&gt;:
   0:   ff 25 00 00 00 00       jmp    *0x0
                        2: dir32        .idata$5
   6:   90                      nop
   7:   90                      nop
</code></pre>
<p>The relocation of type <code>dir32</code> tells the linker how to fill in the address being dereferenced by the <code>jmp</code>. We can see that <code>_function_export</code>, when entered, will jump directly to the function at the address loaded from the memory named <code>.idata$5</code>. Inspection of the complete <code>.idata</code> section satisfies us that <code>.idata$5</code> corresponds to the address of the fragment of the import address table corresponding to the <code>function_export</code> import name, and hence the address where the absolute address of the loaded <code>function_export</code> import can be found.</p>
<p>Although only <code>function_export</code> gets a corresponding <code>_function_export</code> function, both of the exports have lead to a symbol with the <code>__imp__</code> prefix (<code>__imp__data_export</code> and <code>__imp__function_export</code>) being defined in the import library. As discussed before, this symbol stands for the address at which the pointer to the data/function will be inserted by the dynamic linker. As such, the <code>__imp__</code> symbols always point directly into the import address table.</p>
<p>With an import library in hand, we are capable of writing some client code that uses our exports, <code>main1.c</code>:</p>
<pre class="c"><code>#include &lt;stdio.h&gt;

__declspec(dllimport) extern int function_export(void);
__declspec(dllimport) extern int data_export;

int main(int argc, char **argv) {
    printf(&quot;%d\n&quot;, function_export());
    printf(&quot;%d\n&quot;, data_export);

    data_export++;

    printf(&quot;%d\n&quot;, function_export());
    printf(&quot;%d\n&quot;, data_export);

    return 0;
}
</code></pre>
<p>Build and link it against the import library and we will get the results we expect:</p>
<pre><code>$ gcc main1.c library.dll.a -o main1 &amp;&amp; ./main1
1379
42
1380
43
</code></pre>
<p>The reason that this works even though there is no <code>data_export</code> symbol defined by <code>library.dll.a</code> is because the <code>__declspec(dllimport)</code> qualifier on our <code>data_export</code> declaration in <code>main.c</code> has caused the compiled to generate code that uses the <code>__imp_data_export</code> symbol directly, as we can see if we disassemble the generated code:</p>
<pre><code>$ gcc -c main1.c -o main1.o &amp;&amp; objdump --disassemble -r main1.o

main1.o:     file format pe-i386


Disassembly of section .text:

00000000 &lt;_main&gt;:
   0:   8d 4c 24 04             lea    0x4(%esp),%ecx
   4:   83 e4 f0                and    $0xfffffff0,%esp
   7:   ff 71 fc                pushl  -0x4(%ecx)
   a:   55                      push   %ebp
   b:   89 e5                   mov    %esp,%ebp
   d:   51                      push   %ecx
   e:   83 ec 14                sub    $0x14,%esp
  11:   e8 00 00 00 00          call   16 &lt;_main+0x16&gt;
                        12: DISP32      ___main
  16:   a1 00 00 00 00          mov    0x0,%eax
                        17: dir32       __imp__function_export
  1b:   ff d0                   call   *%eax
  1d:   89 44 24 04             mov    %eax,0x4(%esp)
  21:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
                        24: dir32       .rdata
  28:   e8 00 00 00 00          call   2d &lt;_main+0x2d&gt;
                        29: DISP32      _printf
  2d:   a1 00 00 00 00          mov    0x0,%eax
                        2e: dir32       __imp__data_export
  32:   8b 00                   mov    (%eax),%eax
  34:   89 44 24 04             mov    %eax,0x4(%esp)
  38:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
                        3b: dir32       .rdata
  3f:   e8 00 00 00 00          call   44 &lt;_main+0x44&gt;
                        40: DISP32      _printf
  44:   a1 00 00 00 00          mov    0x0,%eax
                        45: dir32       __imp__data_export
  49:   8b 00                   mov    (%eax),%eax
  4b:   8d 50 01                lea    0x1(%eax),%edx
  4e:   a1 00 00 00 00          mov    0x0,%eax
                        4f: dir32       __imp__data_export
  53:   89 10                   mov    %edx,(%eax)
  55:   a1 00 00 00 00          mov    0x0,%eax
                        56: dir32       __imp__function_export
  5a:   ff d0                   call   *%eax
  5c:   89 44 24 04             mov    %eax,0x4(%esp)
  60:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
                        63: dir32       .rdata
  67:   e8 00 00 00 00          call   6c &lt;_main+0x6c&gt;
                        68: DISP32      _printf
  6c:   a1 00 00 00 00          mov    0x0,%eax
                        6d: dir32       __imp__data_export
  71:   8b 00                   mov    (%eax),%eax
  73:   89 44 24 04             mov    %eax,0x4(%esp)
  77:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
                        7a: dir32       .rdata
  7e:   e8 00 00 00 00          call   83 &lt;_main+0x83&gt;
                        7f: DISP32      _printf
  83:   b8 00 00 00 00          mov    $0x0,%eax
  88:   83 c4 14                add    $0x14,%esp
  8b:   59                      pop    %ecx
  8c:   5d                      pop    %ebp
  8d:   8d 61 fc                lea    -0x4(%ecx),%esp
  90:   c3                      ret
  91:   90                      nop
  92:   90                      nop
  93:   90                      nop
</code></pre>
<p>In fact, we can see that the generated code doesn't even use the <code>_function_export</code> symbol, preferring <code>__imp__function_export</code>. Essentially, the code of the <code>_function_export</code> symbol in the import library has been inlined at every use site. This is why using <code>__declspec(dllimport)</code> can improve performance of cross-DLL calls, even though it is entirely optional on function declarations.</p>
<p>We might wonder what happens if we drop the <code>__declspec(dllimport)</code> qualifier on our declarations. Because of our discussion about the difference between data and function imports earlier, you might expect linking to fail. Our test file, <code>main2.c</code> is:</p>
<pre class="c"><code>#include &lt;stdio.h&gt;

extern int function_export(void);
extern int data_export;

int main(int argc, char **argv) {
    printf(&quot;%d\n&quot;, function_export());
    printf(&quot;%d\n&quot;, data_export);

    data_export++;

    printf(&quot;%d\n&quot;, function_export());
    printf(&quot;%d\n&quot;, data_export);

    return 0;
}
</code></pre>
<p>Let's try it out:</p>
<pre><code>$ gcc main2.c library.dll.a -o main2 &amp;&amp; ./main2
1379
42
1380
43
</code></pre>
<p>What the hell -- it worked? This is a bit uprising. The reason that it works despite the fact that the import library <code>library.dll.a</code> not defining the <code>_data_export</code> symbol is because of a nifty feature of GNU ld called auto-import. Without auto-import the link fails as we would expect:</p>
<pre><code>$ gcc main2.c library.dll.a -o main2 -Wl,--disable-auto-import &amp;&amp; ./main2
/tmp/ccGd8Urx.o:main2.c:(.text+0x2c): undefined reference to `_data_export'
/tmp/ccGd8Urx.o:main2.c:(.text+0x41): undefined reference to `_data_export'
/tmp/ccGd8Urx.o:main2.c:(.text+0x49): undefined reference to `_data_export'
/tmp/ccGd8Urx.o:main2.c:(.text+0x63): undefined reference to `_data_export'
collect2: ld returned 1 exit status
</code></pre>
<p>The Microsoft linker does not implement auto-import, so this is the error you would get if you were using the Microsoft toolchain.</p>
<p>However, there is a way to write client code that does not depend on auto-import or use the <code>__declspec(dllimport)</code> keyword. Our new client, <code>main3.c</code> is as follows:</p>
<pre class="c"><code>#include &lt;stdio.h&gt;

extern int (*_imp__function_export)(void);
extern int *_imp__data_export;

#define function_export (*_imp__function_export)
#define data_export (*_imp__data_export)

int main(int argc, char **argv) {
    printf(&quot;%d\n&quot;, function_export());
    printf(&quot;%d\n&quot;, data_export);

    data_export++;

    printf(&quot;%d\n&quot;, function_export());
    printf(&quot;%d\n&quot;, data_export);

    return 0;
}
</code></pre>
<p>In this code, we directly use the <code>__imp__</code>-prefixed symbols from the import library. These name an address at which the real address of the import can be found, which is reflected by our C-preprocessor definitions of <code>data_export</code> and <code>function_export</code>.</p>
<p>This code compiles perfectly even without auto-import:</p>
<pre><code>$ gcc main3.c library.dll.a -o main3 -Wl,--disable-auto-import &amp;&amp; ./main3
1379
42
1380
43
</code></pre>
<p>If you have followed along until this point you should have a solid understanding of how DLL import and export are implemented on Windows.</p>
<h2 id="how-auto-import-works">How auto-import works</h2>
<p>As a bonus, I'm going to explain how auto-import is implemented by the GNU linker. It is a rather cute hack you may get a kick out of.</p>
<p>As a reminder, auto-import is a feature of the linker that allows the programmer to declare an item of DLL-imported data with a simple <code>extern</code> keyword, without having to explicitly use <code>__declspec(dllimport)</code>. This is extremely convenient because this is exactly how most <em>nix source code declares symbols it expects to import from a shared library, so by supporting this use case that</em>nix code becomes more portable to Windows.</p>
<p>Auto-import kicks in whenever the linker finds an object file making use of a symbol <code>foo</code> which is not defined by any other object in the link, but where a symbol <code>__imp_foo</code> <strong>is</strong> defined by some object. In this case, it assumes that the use of <code>foo</code> is an attempt to access some DLL-imported data item called <code>foo</code>.</p>
<p>Now, the problem is that the linker needs to replace the use of <code>foo</code> with the address of <code>foo</code> itself. However, all we seem to know statically is an address where that address will be placed at runtime (<code>__imp_foo</code>). To square the circle, the linker plays a clever trick.</p>
<p>The trick is to extend the <code>.idata</code> of the image being created with an entry for a &quot;new&quot; DLL. The new entry is set up as follows:</p>
<ul>
<li>
<p>The filename of the image being imported is set to the same filename as the <code>.idata</code> entry covering <code>__imp_foo</code>. So if <code>__imp_foo</code> was being filled out by an address in <code>Bar.dll</code>, our new <code>.idata</code> entry will use <code>Bar.dll</code> here.</p>
</li>
<li>
<p>The import lookup table is of length 1, whose sole entry is a pointer to the name of the imported symbol corresponding to <code>__imp_foo</code>. So if <code>__imp_foo</code> is filled out by the address of the <code>foo</code> export from <code>Bar.dll</code>, the name of the symbol we put in here will be <code>foo</code>.</p>
</li>
<li>
<p>The import address table is of length 1 -- and here is the clever bit -- <strong>is located precisely at the location in the object file that was referring to the (undefined) symbol <code>foo</code></strong>.</p>
</li>
</ul>
<p>This solution neatly defers the task of filling out the address that the object file wants to the dynamic linker. The reason that the linker can play this trick is that it can see all of the object code that goes into the final image, and can thus fix all of the sites that need to refer to the imported data.</p>
<p>Note that in general the final image's <code>.idata</code> will contain several entries for the same DLL: one from the import library, and one for every place in <em>any</em> object file in the link which referred to some data exported by the DLL. Although this is somewhat unusual behaviour, the Windows linker has no problem with there being several imports of the same DLL.</p>
<h3 id="a-wrinkle">A wrinkle</h3>
<p>Unfortunately, the scheme described above only works if the object code has an undefined reference to <code>foo</code> itself. What if instead it has a reference to <code>foo+N</code>, an address N bytes after the address of <code>foo</code> itself? There is no way to set up the <code>.idata</code> so that the dynamic linker adds a constant to the address it fills in, so we seem to be stuck.</p>
<p>Alas, such relocations are reasonably common, and originate from code that accesses a field of a DLL-imported structure type. Cygwin actually contains another hack to make auto-import work in such cases, known as &quot;pseudo-relocations&quot;. If you want to know the details of how these works, there is more information in the <a href="http://www.cygwin.com/ml/cygwin-apps/2002-06/msg00276.html">original thread on the topic</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Dynamic linking on Windows is hairier than it at first appears. I hope this article has gone some way to clearing up the meaning of the mysterious <code>dllimport</code> and <code>dllexport</code> keywords, and at clarifying the role of the import and export libraries.</p>
<p>Linux and friends implement dynamic linking in a totally different manner to Windows. The scheme they use is more flexible and allows more in-memory sharing of code, but incurs a significant runtime penalty (especially on i386). For more details see <a href="http://www.greyhat.ch/lab/downloads/pic.html">here</a> and <a href="http://www.skyfree.org/linux/references/ELF_Format.pdf">the Dynamic Linking section of the the ELF spec</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.omega-prime.co.uk/?feed=rss2&#038;p=115</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Fixing &quot;files could not be moved&quot; error in Boot Camp Assistant</title>
		<link>http://blog.omega-prime.co.uk/?p=112</link>
		<comments>http://blog.omega-prime.co.uk/?p=112#comments</comments>
		<pubDate>Mon, 04 Apr 2011 10:24:29 +0000</pubDate>
		<dc:creator>Max</dc:creator>
				<category><![CDATA[Apple]]></category>
		<category><![CDATA[Solutions]]></category>

		<guid isPermaLink="false">http://blog.omega-prime.co.uk/?p=112</guid>
		<description><![CDATA[Recently I've been trying to install Windows on an OS X laptop by using Boot Camp. However, every time the Boot Camp Assistant would tell me that "some files could not be moved" during the creation of the Windows partition. The most commonly suggested solution is a total reinstall of OS X, which I was [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I've been trying to install Windows on an OS X laptop by using Boot Camp. However, every time the Boot Camp Assistant would tell me that "some files could not be moved" during the creation of the Windows partition. The most commonly suggested solution is a total reinstall of OS X, which I was absolutely not willing to perform.</p>
<p>I read online that this problem could sometimes be solved by using <a href="http://www.coriolis-systems.com/iDefrag.php">iDefrag</a> in "Compact" mode to manually move all your files to the front of the disk before using the Assistant. However, buying and running this £24 software had <strong>absolutely no effect</strong> on the problem.</p>
<p>Looking at iDefrag's summary view, it seems like there was a single unmovable "alternate volume label" block right at the end of the drive which might account for the problem. Anyway, I never got to the bottom of this, since I found an alternate solution that worked: hold down Apple+S during startup to enter single user mode and then run these commands:</p>
<p><code>/sbin/fsck -fy<br />
exit</code></p>
<p>This just repairs any filesystem errors on your disk. It looks like filesystem errors were the true culprits, not unmovable files, since running the Assistant after this let me create the partition with no problems - Windows is installing right now.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.omega-prime.co.uk/?feed=rss2&#038;p=112</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Security implications of PEP 383</title>
		<link>http://blog.omega-prime.co.uk/?p=107</link>
		<comments>http://blog.omega-prime.co.uk/?p=107#comments</comments>
		<pubDate>Tue, 29 Mar 2011 17:41:11 +0000</pubDate>
		<dc:creator>Max</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.omega-prime.co.uk/?p=107</guid>
		<description><![CDATA[I've been looking into improving GHC's support for non-ASCII text, and my investigations have lead to me to PEP 383. One motivation behind this PEP is as follows: on Unix, the names of files, command line arguments, and environment variables should probably be treated as sequences of bytes. However, for good reasons it is quite [...]]]></description>
			<content:encoded><![CDATA[<p>I've been looking into improving GHC's support for non-ASCII text, and my investigations have lead to me to <a href="http://www.python.org/dev/peps/pep-0383/">PEP 383</a>.</p>
<p>One motivation behind this PEP is as follows: on Unix, the names of files, command line arguments, and environment variables should probably be treated as sequences of bytes. However, for good reasons it is quite natural for programs to act on them as if they were strings. This means that we have to choose some text encoding to use to interpret those byte sequences.</p>
<p>Unfortunately, whatever encoding you choose to use, it is quite likely that some byte sequences you encounter in practice will not in fact decode nicely using that encoding. An example would be a Big5 filename supplied as a command line argument to a program run in the UTF-8 locale.</p>
<p>In this case, what should happen? One sensible thing to do would be to fail, but this might be surprising. Python 3, with PEP 383, chooses to encode the non-decodable bytes <strong>as part of the string</strong> using surrogates. So if we try to parse a Big5 filename as a string we get a string full of surrogates representing the raw bytes we had to begin with.</p>
<p>This is a good thing to do because if that string is then immediately fed back into a function that just decodes the filename for use on the file system, the original byte sequence can be exactly reconstituted by decoding the surrogates back into bytes and using the locale encoding for the rest. If the user attempts to do something else with a string containing surrogates (such as e.g. display it to the terminal), <strong>then</strong> an exception will be raised.</p>
<p>This is a reasonably neat solution to a hard problem. However, it has weird implications. For example, consider this script that uses a black list to control access to some files:</p>
<p><code>#!/usr/bin/env python3</p>
<p>import sys</p>
<p>file = sys.argv[1]</p>
<p>blacklist = open("blacklist.big5", encoding='big5').read().split()<br />
print("Blacklist is:\n" + repr(blacklist))</p>
<p>if file in blacklist:<br />
	print("Blacklisted file, not allowed!")<br />
else:<br />
	print("OK, I'll let you in!")<br />
	print(open(file).read())<br />
</code></p>
<p>Let's say that the blacklist contains a single entry, for the file 你好 (encoded in Big5, naturally).</p>
<p>Seems simple enough, right? Although I store file names as Big5, I compare Python's Unicode strings. And indeed this program works perfectly when run from a terminal in the Big5 locale, with Big5 file names.</p>
<p>However, consider what happens when the terminal is set to UTF-8 and we invoke the script with the command line argument 你好 (encoded in Big5 of course, because the file name on disk is still Big5 even though we changed the terminal locale). In this case, Python 3 will attempt to decode the file name as UTF-8. Naturally, it will fail, so the Big5 filename will be represented in memory with surrogates.</p>
<p>Now for the punchline: when we come to compare that string (containing surrogates) with the entry from the blacklist (without surrogates) they will <b>not</b> be equal. Yet, when we go on to open the file, the filename (with surrogates) is decoded perfectly back into valid Big5 and hence we get the contents of the blacklisted file.</p>
<p>In my opinion, the fact that the current encoding affects the results of string comparisons is <strong>deeply</strong> weird behaviour and could probably be the cause of subtle security bugs. This is just one reason that I'm wary about adopting PEP 383-like behaviour for GHC.</p>
<p>P.S. For those who believe that my code is broken because you should only compare normalised unicode strings, I will add that even after using <tt>unicodedata.normalize</tt> to normalise to NFC I get the same problem.</p>
<p>P.P.S I will further note that you get the same issue even if the blacklist and filename had been in UTF-8, but this time it gets broken from a terminal in the Big5 locale. I didn't show it this way around because <a href="http://bugs.python.org/issue2128#msg125827">I understand</a> that Python 3 may only have just recently started using the locale to decode argv, rather than being hardcoded to UTF-8.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.omega-prime.co.uk/?feed=rss2&#038;p=107</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>How to build 32/64 bit fat (universal) binaries</title>
		<link>http://blog.omega-prime.co.uk/?p=102</link>
		<comments>http://blog.omega-prime.co.uk/?p=102#comments</comments>
		<pubDate>Tue, 08 Mar 2011 14:49:42 +0000</pubDate>
		<dc:creator>Max</dc:creator>
				<category><![CDATA[Apple]]></category>
		<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Solutions]]></category>

		<guid isPermaLink="false">http://blog.omega-prime.co.uk/?p=102</guid>
		<description><![CDATA[The OS X version of the Glasgow Haskell Compiler compiles Haskell into 32-bit code. Unfortunately, this means that if you are on a system where it is the default for libraries to be built in 64-bit mode, you tend to get errors when linking Haskell code telling you that you are trying to link 32-bit [...]]]></description>
			<content:encoded><![CDATA[<p>The OS X version of the <a href="http://www.haskell.org/ghc/">Glasgow Haskell Compiler</a> compiles Haskell into 32-bit code. Unfortunately, this means that if you are on a system where it is the default for libraries to be built in 64-bit mode, you tend to get errors when linking Haskell code telling you that you are trying to link 32-bit code against 64-bit code.</p>
<p>The best solution to this problem is to build all libraries you intend to link to from Haskell code as universal binaries that include both 32-bit and 64-bit versions of the code. These libraries will then work seamlessly with both Haskell code and also when pulled in as part of the build process for non-Haskell 64-bit executables.</p>
<p>If you can install the library using MacPorts, this is easy to do. Instead of doing:</p>
<p><code>sudo port install mylibrary<br />
</code></p>
<p>Just do:</p>
<p><code>sudo port install mylibrary +universal<br />
</code></p>
<p>However, if the library you want is not available through MacPorts or the MacPorts version is not up to date you will need to know how to build these universal libraries for yourself. This is the process that I aim to explain in this post.  I'm going to use <a href="http://igraph.sourceforge.net/">igraph</a> as my example library because it's what I needed to install (I needed to install the unreleased v0.6).</p>
<h3>The easy method</h3>
<p>If you are lucky, building a universal library is as simple as changing how you invoke <tt>make</tt>. Run the library's <tt>configure</tt> scripts etc as usual, and then invoke <tt>make</tt> as follows:</p>
<p><code>make CXXFLAGS="-arch i386 -arch x86_64" CFLAGS="-arch i386 -arch x86_64" LDFLAGS="-arch i386 -arch x86_64"<br />
</code></p>
<p>The <tt>-arch</tt> flags tell GCC and the linker to build and link both versions of the library. If this works, you are done. In the case of igraph, this wasn't quite enough - the above command failed with this error:</p>
<p><code>gcc-4.2: -E, -S, -save-temps and -M options are not allowed with multiple -arch flags<br />
</code></p>
<p>The reason that this occurs is because igraph invokes GCC with the <tt>-M</tt> series of flags that generate makefile dependency rules from the C code - but GCC doesn't like generating those rules for two architectures simultaneously. Luckily, there was an easy workaround in my case - I just needed to reconfigure igraph as follows:</p>
<p><code>./configure --disable-dependency-tracking<br />
</code></p>
<p>The <tt>--disable-dependency-tracking</tt> <a href="http://www.gnu.org/software/hello/manual/automake/Dependencies.html">flag</a> just stops Automake from determining the dependencies of each C file as it compiles it. It is totally harmless to disable this because that dependency information is only used in order to rebuild less stuff upon subsequent invocations of <tt>make</tt> - the worst that happens when you disable it is that if you <tt>make</tt> more than once you will have to wait a bit longer. For more information on this feature see also <a href="http://www.gnu.org/software/hello/manual/automake/Dependency-Tracking.html#Dependency-Tracking">the relevant section of the Automake manual</a>.</p>
<p>After reconfiguring in this manner, the original <tt>make</tt> invocation worked correctly for igraph.</p>
<h3>The hard method</h3>
<p>The above method may perhaps fail for some libraries, in which case you can use this more arduous manual method. The idea is to run the library's build process from scratch twice: once to get the 32-bit library and once for the 64-bit library. We can then use the <tt>lipo</tt> to glue together the build artifacts from the two runs.</p>
<p>We start by building the 32-bit version:</p>
<p><code>make clean<br />
make CXXFLAGS=-m32 CFLAGS=-m32 LDFLAGS=-m32 -j12<br />
</code></p>
<p>We now need to store the 32-bit build artifacts somewhere. Exactly which files you have to save will vary according to the library you are building, but for igraph this was sufficient:</p>
<p><code>mkdir -p ~/Junk/32 ~/Junk/64<br />
cp src/.libs/libigraph.{a,0.dylib} ~/Junk/32<br />
</code></p>
<p>Now do the 64-bit build and once again save the artifacts somewhere:</p>
<p><code>make clean<br />
make CXXFLAGS=-m64 CFLAGS=-m64 LDFLAGS=-m64 -j12<br />
cp src/.libs/libigraph.{a,0.dylib} ~/Junk/64<br />
</code></p>
<p>Finally we can use <tt>lipo</tt> to finish up:</p>
<p><code>lipo -create ~/Junk/{32,64}/libigraph.a -output src/.libs/libigraph.a<br />
lipo -create ~/Junk/{32,64}/libigraph.0.dylib -output src/.libs/libigraph.0.dylib<br />
</code></p>
<p>At this point, you can do <tt>sudo make install</tt> and get a universal version of the library installed.</p>
<p>If you want to check that your libraries are indeed universal, you can use <tt>lipo -info</tt>:</p>
<p><code>$ lipo -info src/.libs/libigraph.a<br />
Architectures in the fat file: src/.libs/libigraph.a are: i386 x86_64<br />
</code></p>
<h2>Conclusions</h2>
<p>Building universal 32-bit/64-bit binaries is apparently fairly straightforward but it was tricky to find documentation for the process. I hope this article helps others who need to get this done.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.omega-prime.co.uk/?feed=rss2&#038;p=102</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Solving GHC iconv problems on OS X 10.6</title>
		<link>http://blog.omega-prime.co.uk/?p=96</link>
		<comments>http://blog.omega-prime.co.uk/?p=96#comments</comments>
		<pubDate>Fri, 28 Jan 2011 08:25:15 +0000</pubDate>
		<dc:creator>Max</dc:creator>
				<category><![CDATA[Haskell]]></category>
		<category><![CDATA[Solutions]]></category>

		<guid isPermaLink="false">http://blog.omega-prime.co.uk/?p=96</guid>
		<description><![CDATA[A problem that has plagued my GHC installation for a while is that whenever I tried to install any non-trivial package I would get a horrible link error like this: Undefined symbols: "_iconv_close", referenced from: _hs_iconv_close in libHSbase-4.3.1.0.a(iconv.o) (maybe you meant: _hs_iconv_close) "_iconv_open", referenced from: _hs_iconv_open in libHSbase-4.3.1.0.a(iconv.o) (maybe you meant: _hs_iconv_open) "_iconv", referenced from: [...]]]></description>
			<content:encoded><![CDATA[<p>A problem that has plagued my <a href="http://www.haskell.org/ghc">GHC</a> installation for a while is that whenever I tried to install any non-trivial package I would get a horrible link error like this:</p>
<pre>Undefined symbols:
  "_iconv_close", referenced from:
      _hs_iconv_close in libHSbase-4.3.1.0.a(iconv.o)
     (maybe you meant: _hs_iconv_close)
  "_iconv_open", referenced from:
      _hs_iconv_open in libHSbase-4.3.1.0.a(iconv.o)
     (maybe you meant: _hs_iconv_open)
  "_iconv", referenced from:
      _hs_iconv in libHSbase-4.3.1.0.a(iconv.o)
     (maybe you meant: _hs_iconv_open, _hs_iconv_close , _hs_iconv )
  "_locale_charset", referenced from:
      _localeEncoding in libHSbase-4.3.1.0.a(PrelIOUtils.o)
ld: symbol(s) not found
collect2: ld returned 1 exit status</pre>
<p>The reason for this is a combination of several factors:</p>
<ul>
<li>The base library that comes with the GHC binary distribution wants to link against the standard Mac iconv</li>
<li>I have installed MacPorts libiconv, which renames the function that is named iconv_open in the standard iconv to libiconv_open</li>
<li>The Haskell library being installed by cabal depends transitively on some library that was built with something like <code>extra-lib-dirs: /opt/local/lib</code>, which causes <code>-L/opt/local/lib</code> to be passed to the linker</li>
<li>The linker's <code>-L/opt/local/lib</code> option occurs before <code>-L/usr/lib</code>, so the linker prefers to link against the MacPorts libiconv instead of the system one
</ul>
<p>In my case, it was the Haskell readline wrapper that was causing <code>/opt/local/lib</code> to be pulled in. I had to link the Haskell readline against MacPorts readline because the standard Mac <code>libreadline</code> is actually <code>libeditline</code>, which is almost-but-not-quite compatible and misses some crucial features.</p>
<p>There are several ways to fix the problem:</p>
<ul>
<li>Perhaps you don't really need the MacPorts <code>libiconv</code>. In this case, you can stop it from being used by just doing <code>port deactivate libiconv</code>. This is the route I took.</li>
<li>Perhaps it's OK to link this particular library against the system libraries in preference to the MacPorts one. In this case, you can configure the package with <code>cabal configure --extra-lib-dir=/usr/lib</code>, so <code>/usr/lib</code> is searched before the MacPorts directory. This may fail if the package that needed <code>-L/opt/local/lib</code> <strong>requires</strong> a MacPorts version of some library that is also present in <code>/usr/lib</code>, though.</li>
<li>You could build GHC yourself and link it against the MacPorts library versions. This is not for the faint-hearted, but if the version of GHC you need is in MacPorts I imagine that you can just do <code>port install ghc</code></li>
</ul>
<p>I'm glad I've finally got this sorted out. If you are still having trouble, you might find some helpful information in <a href="http://www.haskell.org/pipermail/haskell-cafe/2010-June/079204.html">the</a> <a href="http://thread.gmane.org/gmane.comp.lang.haskell.general/18064/">threads</a> that finally helped me to overcome the issue and prompted this writeup.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.omega-prime.co.uk/?feed=rss2&#038;p=96</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The case of the mysterious &quot;thread blocked indefinitely in an MVar operation&quot; exception</title>
		<link>http://blog.omega-prime.co.uk/?p=92</link>
		<comments>http://blog.omega-prime.co.uk/?p=92#comments</comments>
		<pubDate>Wed, 19 Jan 2011 16:20:59 +0000</pubDate>
		<dc:creator>Max</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.omega-prime.co.uk/?p=92</guid>
		<description><![CDATA[I recently tracked down the cause of the persistent "thread blocked indefinitely in an MVar operation" exceptions I was getting from the GHC runtime when I used my parallel-io library. There doesn't seem to be much information about this exception on the web, so I'm documenting my experiences here to help those unfortunate souls that [...]]]></description>
			<content:encoded><![CDATA[<p>I recently tracked down the cause of the persistent "thread blocked indefinitely in an MVar operation" exceptions I was getting from the GHC runtime when I used my <a href="http://batterseapower.github.com/parallel-io/">parallel-io library</a>. There doesn't seem to be much information about this exception on the web, so I'm documenting my experiences here to help those unfortunate souls that come across it.</p>
<p>The essence of the parallel-io library is a combinator like this:</p>
<pre class="textmate-source"><span class="source source_haskell"><span class="meta meta_function meta_function_type-declaration meta_function_type-declaration_haskell"><span class="entity entity_name entity_name_function entity_name_function_haskell">parallel</span> <span class="punctuation punctuation_separator punctuation_separator_double-colon punctuation_separator_double-colon_haskell">::</span> [<span class="constant constant_other constant_other_haskell">IO</span> <span class="variable variable_other variable_other_generic-type variable_other_generic-type_haskell">a</span>] <span class="punctuation punctuation_separator punctuation_separator_arrow punctuation_separator_arrow_haskell">-&gt;</span> <span class="constant constant_other constant_other_haskell">IO</span> [<span class="variable variable_other variable_other_generic-type variable_other_generic-type_haskell">a</span>]</span></span></pre>
<p>The list of IO actions on the input are run (possibly in parallel) and the results returned as a list. The library ensures that we never go beyond a certain (user specified) degree of parallelism, but that's by-the-by for this post.</p>
<p>There are a number of interesting design questions about the design of such a combinator, but the one that tripped me up is the decision about what to do with <strong>exceptions</strong> raised by the IO actions. What I used to do was have <code>parallel</code> run all the IO actions wrapped in a <code>try</code>, and wait for all of the actions to either throw an exception or complete normally. The combinator would then either return a list of the results, or, if at least one action threw an exception, the exception would be rethrown on the thread that had invoked <code>parallel</code>.</p>
<p>At first glance, this seems very reasonable. However, in this turns out to be a disastrous choice that all-but-guarantees your program will go down in flames with "thread blocked indefinitely in an MVar operation".</p>
<p>Consider this simple program:</p>
<pre class="textmate-source"><span class="source source_haskell"><span class="comment comment_line comment_line_double-dash comment_line_double-dash_haskell"><span class="punctuation punctuation_definition punctuation_definition_comment punctuation_definition_comment_haskell">--</span> The main action executes on thread 1:
</span>main <span class="punctuation punctuation_separator punctuation_separator_equal-sign punctuation_separator_equal-sign_haskell">=</span> <span class="keyword keyword_control keyword_control_haskell">do</span>
    res1 &lt;- newEmptyMVar
    res2 &lt;- newEmptyMVar
    
    wait &lt;- newEmptyMVar
    
    <span class="comment comment_line comment_line_double-dash comment_line_double-dash_haskell"><span class="punctuation punctuation_definition punctuation_definition_comment punctuation_definition_comment_haskell">--</span> Spawn thread 2:
</span>    forkIO $ saveExceptionsTo res1 $ <span class="keyword keyword_control keyword_control_haskell">do</span>
        () &lt;- takeMVar wait
        putMVar res1 (<span class="constant constant_other constant_other_haskell">Right</span> <span class="string string_quoted string_quoted_double string_quoted_double_haskell"><span class="punctuation punctuation_definition punctuation_definition_string punctuation_definition_string_begin punctuation_definition_string_begin_haskell">"</span>Hello<span class="punctuation punctuation_definition punctuation_definition_string punctuation_definition_string_end punctuation_definition_string_end_haskell">"</span></span>)
    
    <span class="comment comment_line comment_line_double-dash comment_line_double-dash_haskell"><span class="punctuation punctuation_definition punctuation_definition_comment punctuation_definition_comment_haskell">--</span> Spawn thread 3:
</span>    forkIO $ saveExceptionsTo res2 $ <span class="keyword keyword_control keyword_control_haskell">do</span>
        throwIO $ <span class="constant constant_other constant_other_haskell">ErrorCall</span> <span class="string string_quoted string_quoted_double string_quoted_double_haskell"><span class="punctuation punctuation_definition punctuation_definition_string punctuation_definition_string_begin punctuation_definition_string_begin_haskell">"</span>Oops!<span class="punctuation punctuation_definition punctuation_definition_string punctuation_definition_string_end punctuation_definition_string_end_haskell">"</span></span>
        
        putMVar wait () <span class="comment comment_line comment_line_double-dash comment_line_double-dash_haskell"><span class="punctuation punctuation_definition punctuation_definition_comment punctuation_definition_comment_haskell">--</span> Unblocks thread 2
</span>        putMVar res2 (<span class="constant constant_other constant_other_haskell">Right</span> <span class="string string_quoted string_quoted_double string_quoted_double_haskell"><span class="punctuation punctuation_definition punctuation_definition_string punctuation_definition_string_begin punctuation_definition_string_begin_haskell">"</span>World<span class="punctuation punctuation_definition punctuation_definition_string punctuation_definition_string_end punctuation_definition_string_end_haskell">"</span></span>)
    
    <span class="comment comment_line comment_line_double-dash comment_line_double-dash_haskell"><span class="punctuation punctuation_definition punctuation_definition_comment punctuation_definition_comment_haskell">--</span> Wait for the results of both threads:
</span>    liftM2 <span class="entity entity_name entity_name_function entity_name_function_infix entity_name_function_infix_haskell">(,)</span> (takeMVar res1) (takeMVar res2) &gt;&gt;= <span class="support support_function support_function_prelude support_function_prelude_haskell">print</span>

<span class="meta meta_function meta_function_type-declaration meta_function_type-declaration_haskell"><span class="entity entity_name entity_name_function entity_name_function_haskell">saveExceptionsTo</span> <span class="punctuation punctuation_separator punctuation_separator_double-colon punctuation_separator_double-colon_haskell">::</span> <span class="constant constant_other constant_other_haskell">MVar</span> (<span class="constant constant_other constant_other_haskell">Either</span> <span class="constant constant_other constant_other_haskell">SomeException</span> <span class="variable variable_other variable_other_generic-type variable_other_generic-type_haskell">a</span>) <span class="punctuation punctuation_separator punctuation_separator_arrow punctuation_separator_arrow_haskell">-&gt;</span> <span class="constant constant_other constant_other_haskell">IO</span> () <span class="punctuation punctuation_separator punctuation_separator_arrow punctuation_separator_arrow_haskell">-&gt;</span> <span class="constant constant_other constant_other_haskell">IO</span> ()
</span>saveExceptionsTo mvar act <span class="punctuation punctuation_separator punctuation_separator_equal-sign punctuation_separator_equal-sign_haskell">=</span> act <span class="entity entity_name entity_name_function entity_name_function_infix entity_name_function_infix_haskell"><span class="punctuation punctuation_definition punctuation_definition_entity punctuation_definition_entity_haskell">`</span>catch<span class="punctuation punctuation_definition punctuation_definition_entity punctuation_definition_entity_haskell">`</span></span> \e -&gt; putMVar mvar (<span class="constant constant_other constant_other_haskell">Left</span> e)</span></pre>
<p>This code is similar to what the <code>parallel</code> combinator does in that the main thread (thread 1) waits for thread 2 and thread 3 to both return a result or exception to <code>res1</code> and <code>res2</code> respectively. After it has both, it shows the exception or result.</p>
<p>The fly in the ointment is that <em>thread 2 is waiting on thread 3</em>, so the thread dependency graph looks something like this:</p>
<p><a href="http://blog.omega-prime.co.uk/wp-content/uploads/2011/01/Dependencies.png"><img src="http://blog.omega-prime.co.uk/wp-content/uploads/2011/01/Dependencies.png" alt="" title="Dependencies" width="203" height="133" class="aligncenter size-full wp-image-93" /></a></p>
<p>(You should read an arrow that points from thread A to thread B as thread B is waiting on thread A to unblock it).</p>
<p>Unfortunately, our programmer has written buggy code, and the code running in thread 3 throws an exception before it can wake up thread 2. Thread 3 writes to the <code>res2</code> MVar just fine, but thread 1 is still blocked on the <code>res1</code> MVar, which will never be written to. At this point, everything comes crashing down with "thread blocked indefinitely in an MVar operation".</p>
<p>The most annoying part is that the exception that originally caused the error is squirrelled away in an MVar somewhere, and we never get to see it!</p>
<p>This goes to show that the proposed semantics for exceptions above is dangerous. In the presence of dependencies between the actions in the <code>parallel</code> call, it tends to hide exceptions that we really want to see.</p>
<p>The latest version of the parallel-io library solves this by the simple method of using the asynchronous exceptions feature of GHC to rethrow any exceptions coming from an action supplied to <code>parallel</code> <strong>as soon as they occur</strong>. This unblocks thread 1 and lets us deal with the exception in whatever way is appropriate for our situation.</p>
<p>Although this bug is simple to describe and obvious in retrospect, it took a hell of a time to diagnose. It never ceases to amaze me exactly how difficult it is to write reliable concurrent programs!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.omega-prime.co.uk/?feed=rss2&#038;p=92</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
