Mar 14 2008

Bitesize Functional Programming: Comprehensive Comprehensions

As my final year undergraduate project I've implemented Philip Wadler's and Simon Peyton Jones' Comprehensive Comprehensions in the Glasgow Haskell Compiler. I'm happy to report that my patch was accepted for inclusion in the compiler, so this is a feature you'll really be able to use come the release of 6.10!

So, what are comprehensive comprehensions? Well, first let's just review what list comprehensions are:

triples = [(a,b,c) | a <- [1..20]
                   , b <- [a..20]
                   , let sum = a^2 + b^2
                   , c <- [b..20]
                   , sum == c^2]

This Haskell code draws numbers from 1 to 20 into the variable a, and for each one of those elements draws the numbers from a to 20 into b. We then compute the sum of the squares of each of those numbers, and finally give up in all but the case where we can find some c in the range from b to 20 such that that sum is the same as the square of c. Assuming that we have passed this test, we output a tuple containing a, b and c. Clearly, this implements some code for finding small Pythagorean triples.

Pythagorean Triples

So, the result will be:

[(3,4,5), (5,12,13), (6,8,10), (8,15,17), (9,12,15), (12,16,20)]

We can build an analogy between what we can do with list comprehensions and what we can do with parts of SQL. Drawing an item from a list is like SELECTion, filtering out certain items is similar to a WHERE clause and so on. However, there are some prominent features of SQL that aren't so easy to express with just these vanilla list comprehensions. How could we write the following query in Haskell?

SELECT Name, SUM(Salary)
FROM Employees
GROUP BY Dept
ORDER BY SUM(Salary) ASC
LIMIT 5

Well, the answer is that we can use our lovely new generalized list comprehensions and write this:

[(the dept, sum salary)
  | (name, dept, salary) <- employees
  , then group by dept
  , then sortWith by sum salary
  , then take 5]

Notice that the existing keyword "then" is being used to introduce the sorting and grouping functionality that we require: this happens to be a bit different from the syntax in the paper, so be careful you don't get caught out by this.

What isn't obvious from the presentation above is just how general these comprehensive comprehensions are! Unlike SQL you can sort or group at any point in the query without making use of subqueries. Unlike SQL you can use any aggregation function, not just one it has defined like COUNT or SUM. Most significantly, unlike SQL you can actually use alternative grouping and sorting functions! So, you could choose a grouping function that actually looked for runs in the thing you are grouping by, or a "sorting" function that just returns the first 5 elements of the list: if you look at the example above we used just this feature to implement the LIMIT clause of SQL.

These new pieces of syntax are pretty nice, and speaking from personal experience it's quite hard to do heavy duty list manipulation without them once you've grown used to having them available. In particular, I found them to have a killer application in the code I used for benchmarking the syntax extension, for slicing and dicing lists of the runtime and heap usage of various program runs.

If you want the full story on how to use the syntax, you can check out the documentation. Go forth and conquer with your new, readable, list-munging powers! I'll leave you with a picture of the guys you have it all to thank for, those giants of functional programming Simon Peyton Jones and Philip Wadler:

Simon Peyton Jones and Philip Wadler


Mar 3 2008

Google Images Plugin For Corripio

A simple post this time. Corripio is a program for OS X that lets you easily find album artwork for your music collection. However, it's a bit rough around the edges and its built in web based mechanisms for retrieving this artwork are rather limited and frankly a bit rubbish. I've managed to solve all these issues by writing a plugin to fix its unaccountable lack of support for Google Images.

However, a word to the wise: this is probably against the terms of service, so make sure you only use this plugin for personal use!

#!/usr/bin/ruby
#

require 'net/http'
require 'open-uri'
require 'uri'
require 'cgi'

# Construct a query for good-sized Google images by taking the supplied keywords and adding "album"
query = (ARGV + ["album"]).join(" ")
sizes = ["medium", "large", "xlarge"].join("|")
uri = "http://images.google.co.uk/images?hl=en&q=#{CGI.escape(query)}&imgsz=#{CGI.escape(sizes)}"

source = open(URI.parse(uri)).read.gsub("\r\n", "")

# Look for javascript statements of the form dyn.Img("foo", "bar", "lol", "http://www.example.com/yay.jpg")
matcher = /dyn.Img$$".*?",".*?",".*?","(.*?)"/
matches = source.scan(matcher)

# Push back the best 10 results
matches.first(10).each { |match| puts match.first }

To install the plugin, copy the above code into a new file in ~/Library/Application Support/Corripio/Plugins and make sure it is marked executable. Now, you can drag that file into the Plugins section of the Corripio preference pane and then (optionally) disable the default image search plugins. When you're done the preferences pane should look something like this:

Corripio preference pane