Apr 3 2010

Ditaa support for gitit

I hacked together a quick plugin for the most excellent gitit wiki today. It’s written in Haskell, so it’s an absolute pleasure to write code for it.

What I added support for is a neat little tool called ditaa (DIagrams Through Ascii Art). Basically, in the markdown source of your Gitit wiki you can now write something like the following:

~~~ {.ditaa}
+--------+   +-------+    +-------+
|        | --+ ditaa +--> |       |
|  Text  |   +-------+    |diagram|
|Document|   |!magic!|    |       |
|     {d}|   |       |    |       |
+---+----+   +-------+    +-------+
    :                         ^
    |       Lots of work      |
    +-------------------------+
~~~

The plugin will then call out to the ditaa command line tool (written in Java, boo!) to render that to a beautiful image:

A ditaa image on a Gitit page

To get this set up for yourself, try the following from the root of your Gitit wiki:

git clone git://github.com/batterseapower/gitit-plugins.git batterseapower-plugins
wget http://downloads.sourceforge.net/project/ditaa/ditaa/0.9/ditaa0_9.zip?use_mirror=kent -O ditaa0_9.zip
unzip ditaa0_9.zip

Now edit your Gitit configuration file so the plugins list includes my plugin:

plugins: batterseapower-plugins/Ditaa.hs

That’s it – restart Gitit and you should be ready to go!


Dec 7 2008

OMake and Literate Haskell

I’ve often thought it would be a nice idea to mash together Eelco Dolstra’s ideas about purely-functional software deployment with functional reactive programming to produce a purely functional build system. In this world, compilers are pure functions from source (.c files) to object (.o files) and these dependencies are encoded explicitly by argument-passing in the program rather than happening implicitly via the file system.

Having grappled with GNU Make somewhat for the last week, I thought I’d take another look at making this work nicely and made a little progress. However, that is not the topic of this post! In the course of research, Google turned up an interesting find: OMake, a build system written in OCaml. It has a number of niceish features over GNU Make but the thing that really caught my eye was built-in support for LaTeX building – just the problem I’d had over the last week. Thus inspired, I had a go at integrating this support with one of my Literate Haskell documents, and this post will preserve the result for posterity :-)

OMake syntax is somewhat Make like, so our OMakefile opens with something you may find familiar:

########################################################################
# Phony targets are scoped, so you probably want to declare them first.
#

.PHONY: all clean
.DEFAULT: all

The notion of scoping is somewhat different from that Make, and it includes the nice feature that you can introduce local scopes into your OMakefile with a “section” declaration – I don’t use any of that good stuff here though.

Next I need to tell OMake how to build .lhs files, since it only understands .tex. I do this by declaring a function that generates rules when it is called (pretty cool!):

########################################################################
# Lhs2Tex builder.
#

LHS2TEX = lhs2TeX
LHSSTYLE = tt

Lhs2Tex(tex, lhs) =
    # Add the suffixes
    tex_name = $(addsuffix .tex, $(tex))
    lhs_name = $(addsuffix .lhs, $(lhs))

    # The build rule
    $(tex_name): $(lhs_name)
        $(LHS2TEX) --$(LHSSTYLE) -o $@ < $+

    # Return the tex file name
    value $(tex_name)

At this point I could also use the .SCANNER target to tell OMake how to determine the dependencies of a .lhs file, but I didn’t need that for my situation, so I didn’t bother to do it. Instead, let’s plunge onwards into the declaration of the rules particular to my program:

########################################################################
# LaTeX configuration.
#

DOCUMENT = closure-elimination
GENERATED_TEX = closure-elimination

foreach(tex, $(GENERATED_TEX)):
    Lhs2Tex($(tex), $(tex))

LaTeXDocument($(DOCUMENT), closure-elimination)

This turns my file closure-elimination.lhs into closure-elimination.tex and then finally into a PDF using the built-in LaTeXDocument function. I’ve used a foreach loop to call my Lhs2Tex function just to show you that this can be done – the OMake system includes a whole rich programming language of it’s own, which even includes first-class functions!

Finally I need to define those phony targets I declared way up at the start of the post:

########################################################################
# Phony targets
#

all: $(DOCUMENT).pdf

clean:
  rm -f \
     $(addsuffix .tex, $(GENERATED_TEX)) \
     $(addsuffixes .pdf .dvi .aux .log .out, $(DOCUMENT)) \
     *.omc

My “clean” target is somewhat unsatisfactory. I rather think that the use of LaTeXDocument should mean that the LaTeX junk (.pdf, .dvi, .aux, .log, .out) should be automatically cleaned somehow, but I can’t work out how to achieve that. (My functional build system would consider this stuff part of the “local side effects” of a build command – encoded through some sort of monad structure – and hence fair game for any “clean”.)

Overall, OMake looks like a nice system – indeed, anything that handles the tedious business of running LaTeX over my document right number of times is doing better than GNU Make in my book. I’ll certainly consider using it in future, larger scale projects – but I’m a bit concerned the code has been abandonded, as the last release was in August of 2007…

Other alternatives for LaTeX projects include (if you want to stick with GNU Make) the Ultimate LaTeX Makefile. Another option might be cute-looking thing I just discovered called rubber, which is a specialised LaTeX build tool.


Nov 11 2008

Upgrading An Unactivated Windows Install To Parallels 4.0

This is a pretty obscure problem, but I’m going to put a post up about it on the off chance I can help someone else out. My regular reader (hi dad!) will probably find this of no interest and should give it a miss :-)

The situation I found myself in was upgrading a Boot Camp install of Windows Vista for the new release of Parallels Desktop 4.0 – no big deal, you may think. Unfortunately, I had forgotten that that particular install of Windows Vista wasn’t activated, which caused the automatic upgrade process to bork, dropping me back to manual mode.

To complete the upgrade I needed to run the Parallels Tools setup executable. However, since I hadn’t activated, I could only log in as far as getting the “please activate Windows now” screen. As it happened, I knew that I could get rid of this screen by feeding it the details of a Windows Vista license I own, but in order to do that I needed an Internet connection (I don’t think my PAYG phone had enough credit on it for an extended Microsoft call centre experience). However, to get an Internet connection I had to install the Parallels Ethernet Connection drivers, and hence the Tools. Catch 22!

The workaround is convoluted, to say the least. First, we need a command prompt in the restricted Vista activation session. You do this by clicking any of the links in the activation window: they should cause a browser to open. From here, you can ask the browser to “Open a file” and direct it to C:\Windows\System32\cmd.exe – this should initiate “download” of the executable. Click the option to run the file and voila!

Now you have a command prompt the fun really begins. You might think you could just type D:\setup.exe and the Tools would begin installing, but life just isn’t that simple – in Their infinite wisdom, Microsoft have imposed quotas on the resource consumption of the session they set up for the purposes of activation. This is probably the Right Thing to do from their POV, but it’s just a pain in the arse for us.

The workaround is to get the internet connection working, so you can do the activation and hence lift the resource limits. To do this, create a floppy disk image containing the Windows 2000 drivers for a Realtek 8029AS adapter (you should be able to get those from here, until Realtek break their incoming links again). Personally I did this by using another virtual machine to download the files and extract them onto a new floppy disk image (you can create a blank image on the Floppy Drive tab of the VM settings). I would make the fruits of this labour available to you as a simple download if it were not for (unfounded?) fear of Realtek’s highly trained attack lawyers.

Once you have the requisite image in your sweaty virtual paws you can proceed to mount it into the Vista VM. To finish up, type compmgmt.msc into that command prompt and update the drivers for the detected network adapter by searching for new ones on the A:\ drive.

You should now be free to run the online activation and break the Catch 22, allowing installation of the Tools – at this point feel free to help yourself to a cup of coffee and a ginger-snap biscuit to celebrate a difficult job done well (I know I did…).

I’m really quite suprised that I had to jump through this many hoops – the Realtek drivers allegedly come with Vista, for one thing. But – c’est la vie! It’s also quite pleasing that the humble, long outmoded, floppy drive still has a place in solving modern IT problems :-)


Aug 31 2008

Hackage Releases Made Easy

The Haskell community has built up a great resource: the Hackage Haskell package database, where we recently hit the 500-package mark!

One of those 500 packages was mine, I added another to their number just an hour ago, and I’ve got two more in the oven. Given, then, that I’m starting to maintain a few packages, I went to the trouble of automating the Hackage release process, and in this post I’m going to briefly walk through setting up this automated environment.

  1. Install cabal-upload from Hackage. I’m afraid that at the time of writing this is not perfectly simple because it won’t build with GHC 6.8 or above: this can be fixed with a new .cabal file, however, which I’ve made available here. (Edit: I’ve just noticed that this functionality seems to have been added to Cabal itself! You may just be able to use cabal upload. However, I’m not sure what the right config file location is for the next step).
  2. Add a file containing your Hackage username and password in the format (“username”,”password”) called ~/.cabal-upload/auth.
  3. Copy the following shell script into a file called release in the root of your project (the same directory as the Setup.lhs file):
    #!/bin/bash
    #
    
    echo "Have you updated the version number? Type 'yes' if you have!"
    read version_response
    
    if [ "$version_response" != "yes" ]; then
        echo "Go and update the version number"
        exit 1
    fi
    
    sdist_output=`runghc Setup.lhs sdist`
    
    if [ "$?" != "0" ]; then
        echo "Cabal sdist failed, aborting"
        exit 1
    fi
    
    # Want to find a line like:
    # Source tarball created: dist/ansi-terminal-0.1.tar.gz
    
    # Test this with:
    # runghc Setup.lhs sdist | grep ...
    filename=`echo $sdist_output | sed 's/.*Source tarball created: \([^ ]*\).*/\1/'`
    echo "Filename: $filename"
    
    if [ "$filename" = "$sdist_output" ]; then
        echo "Could not find filename, aborting"
        exit 1
    fi
    
    # Test this with:
    # echo dist/ansi-terminal-0.1.tar.gz | sed ...
    version=`echo $filename | sed 's/^[^0-9]*\([0-9\.]*\).tar.gz$/\1/'`
    echo "Version: $version"
    
    if [ "$version" = "$filename" ]; then
        echo "Could not find version, aborting"
        exit 1
    fi
    
    echo "This is your last chance to abort! I'm going to upload in 10 seconds"
    sleep 10
    
    git tag "v$version"
    
    if [ "$?" != "0" ]; then
        echo "Git tag failed, aborting"
        exit 1
    fi
    
    # You need to have stored your Hackage username and password as directed by cabal-upload
    # I use -v5 because otherwise the error messages can be cryptic :-)
    cabal-upload -v5 $filename
    
    if [ "$?" != "0" ]; then
        echo "Hackage upload failed, aborting"
        exit 1
    fi
    
    # Success!
    exit 0
  4. When you’re ready to release something, simply run the shell script! Not only will this package up your project and upload it to Hackage, it will also add a version tag to your Git repository (obviously you should change this bit if you are using another VCS!).

If you would like to follow my continuing adventures in Haskell open source, please check out my GitHub profile! Patches gratefully accepted :-)


May 1 2008

Fixing File Associations Eaten By Parallels Desktop

I use OS X, and so for those rare cases where I must deign to run a program designed for Windows I make use of my copy of it hosted in Parallels Desktop. Now, Parallels has been generally good to me, but I recently came across the problem documented in this forum thread where its ability to launch Mac applications from Windows caused some basic file associations in Windows-land to stop working. For example, Excel files were listed as “xls_auto_file” by Explorer and double clicking on them only gave me the “Open With” dialog: not very helpful.

The thread does suggest a means of solving the problem, by creating a batch file that re-associates a white-list of some of the possibly affected extensions, like this:

REM Restores MS Office File Type Associations
assoc .doc=Word.Document.8
assoc .dochtml=wordhtmlfile
assoc .docmhtml=wordmhtmlfile
assoc .docxml=wordxmlfile
assoc .dot=Word.Template.8
assoc .pot=PowerPoint.Template.8
assoc .pps=PowerPoint.SlideShow.8
assoc .ppt=PowerPoint.Show.8
assoc .rtf=Word.RTF.8
assoc .wbk=Word.Backup.8
assoc .xlc=Excel.Chart.8
assoc .xlm=Excel.Macrosheet
assoc .xls=Excel.Sheet.8
assoc .xlt=Excel.Template
assoc .xlw=Excel.Workspace

However, this felt a bit ad-hoc to me, and in particular some of the extensions that were affected for me were not on the list. Thus, like any good programmer I went off and whipped up a utility designed to undo the damage inflicted on my poor defenseless HKEY_CLASSES_ROOT by Parallels.



Using it is pretty simple: just click “Scan” and then “Fix Selected” repeatedly until the scan finds nothing more of interest. Now, before I give you a download link to the utility, I need to give you the standard disclaimer to which you must agree in order to use it:

This utility is provided on an ‘as is’ basis, without warranties of any kind, and no warranty, express or implied, is given that the operation of the utility is correct or safe to use. I do not accept any liability for any error or omission. Use of the utility is at your own risk.

Sorry for the legalese but this utility is modifying your registry and hence (though I feel it is highly unlikely) could get something quite, quite wrong. You would be also wise to have a backup of the HKEY_CLASSES_ROOT registry branch to restore in case you aren’t happy with the changes made by the utility.

That said, without further ado here is the utility and its source code (C# 3.0), both of which (being of my sole authorship) I release into the public domain. I hope someone else finds it useful!


Mar 3 2008

Google Images Plugin For Corripio

A simple post this time. Corripio is a program for OS X that lets you easily find album artwork for your music collection. However, it’s a bit rough around the edges and its built in web based mechanisms for retrieving this artwork are rather limited and frankly a bit rubbish. I’ve managed to solve all these issues by writing a plugin to fix its unaccountable lack of support for Google Images.

However, a word to the wise: this is probably against the terms of service, so make sure you only use this plugin for personal use!

#!/usr/bin/ruby
#

require 'net/http'
require 'open-uri'
require 'uri'
require 'cgi'

# Construct a query for good-sized Google images by taking the supplied keywords and adding "album"
query = (ARGV + ["album"]).join(" ")
sizes = ["medium", "large", "xlarge"].join("|")
uri = "http://images.google.co.uk/images?hl=en&q=#{CGI.escape(query)}&imgsz=#{CGI.escape(sizes)}"

source = open(URI.parse(uri)).read.gsub("\r\n", "")

# Look for javascript statements of the form dyn.Img("foo", "bar", "lol", "http://www.example.com/yay.jpg")
matcher = /dyn.Img\(".*?",".*?",".*?","(.*?)"/
matches = source.scan(matcher)

# Push back the best 10 results
matches.first(10).each { |match| puts match.first }

To install the plugin, copy the above code into a new file in ~/Library/Application Support/Corripio/Plugins and make sure it is marked executable. Now, you can drag that file into the Plugins section of the Corripio preference pane and then (optionally) disable the default image search plugins. When you’re done the preferences pane should look something like this:

Corripio preference pane


Feb 14 2008

Breaking The Running Key Cipher: Part 2

This is the second post in a series of three covering the running key cipher. You can find the first part here.

To recap, in the last part we covered the definition of the running key cipher and looked at a simple mechanism that you might use to try and break it: a known plaintext attack. This time we’ll cover the next most advanced method you can use to try and effect a break of a piece of running key ciphertext: n-gram analysis.

For those readers unfamiliar with the fields of cryptanalysis and information retrieval, we’ll briefly cover what an n-gram is. Essentially it is just a list of n symbols, where symbols can be letters, words, or whatever else you are interested in analyzing. However, in typical usage the term implies that a probability that somehow reflects the likelihood of observing that symbol list is attached to each such list.

In our case we are interested in character based n-grams, simply because the running key cipher is character based, and so example bigrams (n = 2) would include “ot”, “ke” and “re”. Likewise, we can consider trigrams (n = 3) such as “the”, “foo” and “ter”, and even higher order n-grams such as the pentagrams (n = 5) “erenc” and “licks” or hexagrams (n = 6) like “okingh”. Of course, to be useful, these strings have to have their attached probability. As we are interested in analyzing typical English text, these probabilities will typically be estimated from the frequency of n-gram occurrence in some publicly available corpus such as newspaper archives or the internet itself.

Before we look at how we can use n-grams for cryptanalysis, let’s take a look at some code to create them. Let’s begin with a class to represent the n-gram itself:

class NGram
{
    public string Text;
    public double Probability;

    public NGram(string text, double probability)
    {
        this.Text = text;
        this.Probability = probability;
    }
}

So far so simple. We will also need a class to analyze and contain a collection of n-grams:

class NGramFrequencies
{
    private Dictionary<string, int> occurrences;
    private int totalOccurrences;

    private NGramFrequencies(Dictionary<string, int> occurrences, int totalOccurrences)
    {
        this.occurrences = occurrences;
        this.totalOccurrences = totalOccurrences;
    }

    public double ProbabilityOf(String nGram)
    {
        int nGramOccurrence;
        if (this.occurrences.TryGetValue(nGram, out nGramOccurrence))
        { return this.ProbabilityOf(nGramOccurrence); }
        else
        { return 0.0; }
    }

    private double ProbabilityOf(int occurrence)
    { return (double)occurrence / this.totalOccurrences; }

    public IEnumerable<NGram> NGrams
    {
        get
        {
            foreach (KeyValuePair<string, int> pair in this.occurrences)
            { yield return new NGram(pair.Key, ProbabilityOf(pair.Value)); }
        }
    }

    public static NGramFrequencies Analyse(StreamReader reader, int length)
    {
        LinkedList<char> accumulator = new LinkedList<char>();
        Dictionary<string, int> occurances = new Dictionary<string, int>();
        int totalOccurances = 0;

        while (!reader.EndOfStream)
        {
            char nextCharacter;
            if (TryNormalizeCharacter((char)reader.Read(), out nextCharacter))
            {
                if (accumulator.Count == length)
                {
                    StringBuilder nGram = new StringBuilder(length);
                    foreach (char value in accumulator)
                    { nGram.Append(value); }
                    string nGramString = nGram.ToString();

                    int currentOccurrences;
                    if (occurrences.TryGetValue(nGramString, out currentOccurances))
                    { occurrences[nGramString] = currentOccurrences + 1; }
                    else
                    { occurrences[nGramString] = 1; }

                    totalOccurrences++;
                    accumulator.RemoveFirst();
                }

                accumulator.AddLast(nextCharacter);
            }
        }

        return new NGramFrequencies(occurances, totalOccurances);
    }

    private static bool TryNormalizeCharacter(char input, out char output)
    {
        if (Char.IsLetter(input))
        {
            output = Char.ToUpper(input);
            return true;
        }
        else
        {
            output = '?';
            return false;
        }
    }
}

That’s a fair chunk of code! However, most of it is pretty simple. The real core is the Analyse method, which just counts the frequency of n-gram occurrence ready to be turned into a probability when the n-grams are enumerated later. It takes as arguments a StreamReader, containing the corpus we wish to analyze and the length of n-grams we are interested in.

Now, before we continue I feel that I should point out the most glaring flaw in this implementation: it deals very poorly with data sparsity. Any n-gram which does not occur in the corpus will be assigned a probability of zero. This is probably undesirable since almost every string is at least possible in practice, if not necessarily particularly likely! The process of estimating probabilities for unseen n-grams is called smoothing, and the methods you can use to do it vary from the simple (initialize all n-gram counts to 1 instead of 0) to the complicated (Good-Turing smoothing). However, I have to say that in practice I’ve not found a huge difference in outcomes when I use the simple scheme versus any more complicated one.

Despite this flaw we will plunge forward with our use of n-grams. The fundamental algorithm we will be using for the cryptanalysis is based on a so-called maximum likelihood criterion. What this means is that for every chuck of ciphertext we will try and decipher it in a bruteforce way with every possible keytext, and we will then use our n-grams to estimate the likelihood of this deciphering by taking the product of the probabilities of the resulting plaintext block and the guessed key fragment (which we are assuming to be normal English text). The deciphering which has the highest such product is considered to be the most likely one and is shown to the user. A simple implementation of this algorithm is shown below:

class Program
{
    static void Main(string[] args)
    {
        NGramFrequencies[] nGramFrequencies = ObtainNGramFrequencies(new StreamReader("D:\\Downloads\\Frankenstein.txt"));
        string cipherText = File.ReadAllText("D:\\Downloads\\Ciphertext.txt");

        const int BLOCK_SIZE = 4;
        NGramFrequencies frequencies = nGramFrequencies[BLOCK_SIZE - 1];

        for (int offset = 0; offset < BLOCK_SIZE; offset++)
        {
            StringBuilder plainTextOutput = new StringBuilder();
            StringBuilder keyTextOutput = new StringBuilder();
            for (int a = offset; a < cipherText.Length - (BLOCK_SIZE + offset); a += BLOCK_SIZE)
            {
                string cipherTextBlock = cipherText.Substring(a, BLOCK_SIZE);

                double maximumProbability = 0.0;
                string bestPlainText = null;
                string bestKeyText = null;
                foreach (NGram nGram in frequencies.NGrams)
                {
                    string plainText = Decipher(cipherTextBlock, nGram.Text);
                    double probability = frequencies.ProbabilityOf(plainText) * nGram.Probability;

                    if (probability >= maximumProbability)
                    {
                        maximumProbability = probability;
                        bestPlainText = plainText;
                        bestKeyText = nGram.Text;
                    }
                }

                plainTextOutput.Append(bestPlainText);
                keyTextOutput.Append(bestKeyText);
            }

            Console.WriteLine("After analysis on offset {0}", offset);
            Console.WriteLine("'Plain': {0}", plainTextOutput);
            Console.WriteLine("'Key': {0}", keyTextOutput);
        }
    }

    static string Decipher(string cipherText, string keyText)
    {
        StringBuilder output = new StringBuilder(cipherText.Length);
        for (int a = 0; a < cipherText.Length; a++)
        { output.Append((char)('A' + ReduceMod(cipherText[a] - keyText[a]))); }

        return output.ToString();
    }

    static int ReduceMod(int value)
    {
        if (value < 0)
        { return value + 26; }
        else
        { return value % 26; }
    }

    static NGramFrequencies[] ObtainNGramFrequencies(StreamReader corpusStream)
    {
        NGramFrequencies[] nGramFrequencies = new NGramFrequencies[6];
        for (int a = 1; a <= nGramFrequencies.Length; a++)
        {
            nGramFrequencies[a - 1] = NGramFrequencies.Analyse(corpusStream, a);
            corpusStream.BaseStream.Seek(0, SeekOrigin.Begin);
        }

        return nGramFrequencies;
    }
}

Notice that this program assumes that you have a copy of Frankenstein on your drive to use as a corpus. Obviously in reality you will want to use a somewhat larger text than this, composed of the concatenation of many texts. I have found Project Gutenberg to be a great source for text you can use to construct such a corpus without too much trouble.

The output from this program is rather user unfriendly and it certainly won’t give you the plaintext in a straightforward way. I have found it most useful for suggesting possible plaintext fragments that can then be used as a starting point for a known-plaintext search, which I described in the last post. Recall that the plaintext we suggested from last time was:

IENJOYEATINGWHILESUSPENDEDUPSIDEDOWN

And we enciphered this horrible secret with the running key:

GOBOILYOURBOTTOMSYOUSONSOFSILLYPERSONS

So let’s take a look at what our n-gram based analysis program spits out as possible plaintext fragments when it is fed the corresponding ciphertext, with a BLOCK_SIZE of 4 as in the example code:

After analysis on offset 0
'Plain': TOBEECOUNIGHEAUTDWITLLSTOVERDING
'Key': VENTSHOUARINLACETUATWHICENIGALON
After analysis on offset 1
'Plain': ANDDREATSOFAIOUSCOVEENCEBEFOATIO
'Key': SBUTSYOUHAPPSIDEOURDONTOHISPTILT
After analysis on offset 2
'Plain': BEENVOUROJECETTEWILLERBEIGHT
'Key': NTSWHATIALLYSEDMMEWHWEREERWA
After analysis on offset 3
'Plain': ESSAATTHRISEJECTTHISEOFTEDAT
'Key': TERCOUGHDHISOSOPTAKIREDTTATI

Although this example is really too small and our corpus too sparse to give very helpful results, you can see immediately that the analyzer has got at least some things right: for instance, you can see the string EAT occurs at the appropriate position in the plaintext output for blocks offset by 1. It is worth noting that there is no guarantee that the plaintext will be output in the ‘Plain’ field and the keytext in the ‘Key’ field: they could equally be swapped around, due to the commutativity of adding characters (which is essentially addition in modular arithmetic to modulus 26) and the fact that we have assumed that the governing n-gram distributions behind both texts are the same.

Although this example might have been somewhat disappointing, I’ve had fairly good with n-gram analysis on larger ciphertexts (where alignments of common character sequences in the plain and key texts are more common) where I’ve used a more complete corpus. However, we can apply even more technology to the problem: next time I’ll try and show how we can add hidden Markov models into the mix to resolve some of the more fundamental problems with the crude maximum likelihood approach shown above.


Feb 3 2008

Leaving Windows Behind

So, I finally switched to Mac a month ago. I’ve had a Mac laptop since last summer and have been very pleased with the experience, so since Windows Vista has been giving me huge amounts of trouble (e.g. see my last post on getting Cygwin to work, though I won’t go into the full gamut of issues I had here) I decided to go for an Apple desktop machine too.

Happily, Steve Jobs has heard my cries of Windows-inflicted pain and ordered his minions to release a new revision of this baby:

Mac Pro

Beautiful, isn’t it? With 8 cores of Xeon love, it’s no slouch in the performance department either. Salivation-inducing hardware aside, it comes with OS X, which is so much better than Vista that its simply not even funny. Overall it’s fair to say that I’ve been very pleased with my purchase :-)

There have been some problems switching, of course. I have Parallels Desktop installed so that I can still develop using C# and I will probably end up installing Office 2007 on there at some point as well, but for pretty much everything else I’ve been able to find an acceptable or beyond-acceptable alternative for OS X. Here are some of my favourites:

LaunchBar

LaunchBar is a very neat application that you can use to quickly access many things on your Mac. For instance, if I want to play the album “Twin Cinema” in iTunes, I just press Option-Space, type “twin” into the box that comes up and press enter: fast and convenient. Similarly, if I wanted to open TextMate I simply press Option-Space and type “mate”. Of course, there are loads more things you can do with it such as running any AppleScript you like or make a Google search.. the list goes on. LaunchBar learns over time what abbreviations you want to associate with an action, and hence it becomes so natural that so you soon find it hard to live without it!

Unfortunately it is payware, but it’s certainly well worth the price tag.

Plot

Plot is a really nice graphing application. On Windows I was using Gnuplot, which is doubtless powerful but insanely hard to use. Plot just works and supports pretty much every feature I need. The graphs it outputs look very professional: see for yourself.

LyX

LyX is what I’m using instead of Office (NeoOffice, the OS X OpenOffice port, is too sluggish for words so I’m trying to avoid it). It’s a nice friendly interface onto an OS X LaTeX distribution that makes the common case fast while still letting you access the full power of LaTeX when you need it. The application is actually nominally cross platform but I had numerous problems with crashes and weird behavior in the Windows version that have yet to occur on Mac.

1Passwd

I bought 1Password (it actually came as part of the MacHeist deal) to replace my long-time Windows password manager Password Manager XP. I have no complaints: on the contrary, 1Passwords integration with Firefox and the OS is much more reliable and complete than Password Manager XP ever managed.

What’s more, they are about to release a service called my1Password that will let me get web-enabled access to my passwords from any location and platform! I’m happy as a clam about this as it’s proven impossible to find a decent cross platform desktop password manager application. I should give a shout out to Clipperz here as they have had a decent implementation of this for a while, but the lack of integration with my main password manager (so I have to maintain two lists) and minuscule password limit have put me off using it regularly. UPDATE: Marco Barulli from Clipperz has responded to what I said here: please read this post to get the full story.

Time Machine

Time Machine, oh Time Machine, how did I ever get backups done before I had you? The answer is: with great difficulty. On Windows I set up a scheduled task to use SyncBack to clone my hard disk to another server. Unfortunately, this was pretty unreliable (partly because I was backing up onto a Linux file system that had an imperfect emulation of Windows security and didn’t seem to support Unicode properly) and also meant that I only had a backup of the most recent version of my filesystem. With Time Machine everything is seamless and I can go back weeks or months in time to see my files at any point, all from within the Finder! Awesome!

Terminal

And finally, maybe you don’t find the OS X Terminal very exciting, but for someone who has wasted many hours struggling with Cygwin and its numerous problems (e.g. the awkward attempt to reconcile the Windows and Unix permission models) it is a godsend to finally have a real Unix shell available :-)

I haven’t even mentioned some perennial favourites like Transmission, Perian or AppFresh, but my time is limited! If you really feel the need to peek into all the applications I have installed, take a peek at my iusethis profile.

Overall my switching experience has been almost entirely painless and has certainly made me more productive and satisfied with my machine. Here’s to many more happy years with Apple computers!

Champagne


Oct 20 2007

Using Cygwin on Windows Vista as an Administrator

I’ve just managed to fix a particularly annoying issue I was having with Cygwin under Vista with UAC enabled. Essentially, when I ran the Cygwin bash shell as an Administrator I was seeing the following:

  • Bash started in /bin/
  • The bash shell had no knowledge of Unix paths (i.e. the current working directory was /c/Cygwin/bin)
  • The root file system was just my C: drive
  • My .bashrc and .bash_profile were not being executed

All of this made it fairly hard to actually do anything. However, I was given a clue when I started up Poderosa (a nice GUI terminal emulator) and tried to open a Cygwin console: there was an error message about not being able to access “HKEY_CURRENT_USER\Software\Cygnus Solutions\Cygwin\mounts v2\/”. This key contains the mount information for the root directory of the Cygwin instance, so it looked like the root of the problem.

I then opened up Process Monitor to see what happened when, running under the Administrator user, Poderosa tried to access that key. It got “NAME NOT FOUND” as expected. I then tried the same thing as a normal user, and found that the response code as “REPARSE” instead! The log looked like this:

Poderosa Process Monitor Log

What, do you ask, is all that VirtualStore nonsense after the REPARSE has occurred? Well, that REPARSE is actually redirecting Poderosas read of the LOCAL_MACHINE registry key to some keys in the CURRENT_USER hive: this is a new “feature” of Windows Vista designed to ensure that old applications that assume they can write to LOCAL_MACHINE still work. Any such writes by a process without the appropriate permissions are redirected to the VirtualStore instead, and then read back later transparently by this REPARSE mechanism. However, in their wisdom Microsoft have decided that this redirection is disabled when applications are run as an Administrator from a UAC protected account.

What must have happened is that I accidentally installed or modified Cygwin using the Setup program as a normal user, and it’s writes to HKLM were hence redirected to VirtualStore, which is then not consulted when you try and run Cygwin proper as an Administrator. My fix was simply to:

  1. Export the VirtualStore branch as a “.reg” file from the registry editor
  2. Open the resulting file in Notepad and replace the text “HKEY_CURRENT_USER\Software\Classes\VirtualStore\MACHINE” with “HKEY_LOCAL_MACHINE”
  3. Import the modified .reg file using the registry editor again

After that process the user-private data written by Cygwin setup was accessible machine-wide, solving my problem. I hope that this post will be able to help out someone else with similar problems!


Jun 20 2007

Breaking The Running Key Cipher: Part 1

I love challenges of all kinds, both the algorithmic and cryptographic. There is just something inherently compelling about pitting your wits alone against an intellectual challenge, free to apply any and all means to beat it. I recently took part in a crypto challenge where we were meant to break a text encoded with the running key cipher, which I hadn’t encountered before, and I came up with a rather satisfying solution for I thought might be good to share here.

For those of you who aren’t familiar with the running key cipher, it’s very simple. Say you have some text you would love to be kept secret:

IENJOYEATINGWHILESUSPENDEDUPSIDEDOWN

But still feel the need to communicate it to someone, with whom you have agreed some (usually well known) key beforehand, say:

GOBOILYOURBOTTOMSYOUSONSOFSILLYPERSONS

We just “add” the two texts together, with the letters being used as numbers in the obvious way: A = 0, B = 1, …, Z = 25. This means that e.g. C + B = 2 + 1 = 3 = D. Furthermore, we do this modulo 26, so that e.g. Z + B = A. Additionally, since in our case our key is a bit longer than the secret message, we will just not use the last couple of letters of the key. This means that our encoded message is:

OSOXWJCONZOUPAWXWQIMHSAVSIMXDTBTHFOB

The receiver can just subtract their known key to retrieve the original text. Wow, great! Since we never never reuse the key at any point in that message, and assuming we never use they key again, it must constitute a one time pad! This means that we must have perfect security, right? Ah, grasshopper, if it only it were so simple! Actually, this is not very secure at all because both our key and our message exhibit distinctly non-random distributions of letters. A simple example of this can be seen if you imagine that we subtract just the letter E (the most commonly used in English) from every letter in the intercepted ciphertext. We would expect to get considerably more than 1/26 of the resulting letters being either from the key or plaintext. If we do this with our example, we obtain:

IENJOYEATINGWHILESUSPENDEDUPSIDEDOWN
GOBOILYOURBOTTOMSYOUSONSOFSILLYPERSONS
KOKTSFYKJVKQLWSTSMEIDOWROEITZPXPDBKX
 K    K         K    K  K      KP

I’ve labelled each resulting letter with a K or a P depending on whether it came from the key text or the plain text. As you can see, most of the letters are from the key because it contains very few E characters.

This is all very well, but it’s clearly a fairly crude attack. If we want to have a chance of actually breaking the cipher we need some more heavyweight methods. As far as I know, there are a choice of three major techniques:

  • Known plaintext attacks
  • Crude N-gram analysis
  • Hidden Markov model N-gram analysis

The last one is what I actually used to break the challenge, and the one that I think is just plain coolest, but we need to build up to it as it’s pretty scary :-) . So, let’s take a look at the other techniques first:

Known Plaintext Attack
This is definitely the easiest attack to employ if you know something about the message being sent. For instance, if you, as the evil attacker, happen to know that the sender is a passionate gourmand, you might guess that EATING will appear somewhere in the key or plaintext. Knowing this, you can exhaustively subtract EATING from the ciphertext at every offset. If we do this for our example text, we obtain:

Text offset Text deciphered with key
0 KSVPJD
1 OOEOWW
2 KXDBPI
3 TWQUBH
4 SJJGAT
5 FCVFMI
6 YOURBO
7 KNGGHJ
8 JZVMCU
9 VOBHNQ
10 KUWSJR
11 QPHOKQ
12 LADPJK
13 WWEODC
14 SXDIVG
15 TWXAZB
16 SQPEUM
17 MITZFU
18 EMOKNP
19 IHZSIM
20 DSHNFC
21 OACKVG
22 WVZAZR
23 RSPEKX
24 OITPQN
25 EMEVGV
26 IXKLON
27 TDATGB
28 ZTILUZ
29 PBAZSI
30 XTOXBV

Most of those look like gibberish, but one stands out as eminently readable English text: YOURBO. This is, of course, part of the key text. The diligent cryptanalyst will then typically be able to use further guesses (perhaps inspired by the discovered piece of text) to reveal more of the message and key. For instance, he might try every word beginning with BO to try and extend the known plaintext rightwards (though this is a rather difficult proposition since there are at least 368 possibilities according to my dictionary). A small aside here about my methodology: I found the optimal experience was got by combining a suitable dictionary (I use 5desk) with grep in a Cygwin Bash shell. You can then make queries such as the one below, which counts the number of words in the dictionary beginning with “bo”.

$ grep '^bo' 5desk.txt | wc -l
368

Note that it is not strictly possible to tell whether the English words you recover are part of the key or plain text. This is because of the symmetry of the modular addition we got to perform the ciphering step. However, it is usually possible to work it out by your knowledge of the messages probable contents. Once you know which text stream is part of the key, a Google search will usually allow you to obtain the entire thing from your fragmentary knowledge, assuming that the key is a public text (as is typical: Bible extracts are popular). This will let you break the code!

However, what if you don’t know anything about the message, or, like me, you’re just a bit rubbish at making good guesses as to how to extend your known fragment of text? We probably need to employ something a little more sophisticated, like one of the N-gram analyses I mentioned earlier. However, as this blog entry is already getting overlong I shall leave exposition of these rather more clever techniques for a future entry: watch this space, especially those of you whom I have just bored senseless by explaining the absolute basics.