Notes from How to write Parallel Programs: A First Course
Why did I read this book?
CPU counts are going up, and the way we write software is going to change - take a look at The Free Lunch Is Over. While that was penned three (four?) years ago, the way we actually design software (unless you are a video game developer or something) hasn't changed all that much.
This is going to have to change eventually if we want our software to ever use the computing power that really is out there. What are we going to do? Well, as the above article points out, programmers like me are going to have to change how we think about implementing software. So ...
I read a really short review somewhere and I had to get the book.
Parallel programming
The book talks about a coordination language called Linda, which looks to me as if it was cooked up in Lisp. The basic idea behind Linda is that you have a database called a tuple store (or space), and all processes interact by means of this database.
There are only a few operations, and you can 'search' the database using variables much as you would in Prolog.
Example:
in("test", 1, ?x)
That would find a tuple of length 3 in the database: it matches any tuple with "test" in the first position, 1 in the second, and anything in the third, binding the third field to x. The matched tuple is removed from the space. If nothing matches, the default behavior is to block, since we are expecting a match to turn up at some point.
Another method is to insert something in the tuple store:
out("test", 1, 1)
The third method is eval. Eval basically starts a new thread (in some implementations, groups of threads) and it eventually adds the tuple into the space:
eval("test", 1, somefunction(x))
This would spawn a new thread, evaluate somefunction in that thread, and put the resulting tuple in the store. It will not block the parent.
There are other operators, but these are the three main ones. It is not intended to be a full language, just a 'bolt on' to an existing one.
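To make the semantics concrete, here is my own minimal sketch of a tuple store in Python (not a real Linda implementation - real Linda also has operators like rd, and its eval can distribute work across machines rather than just local threads). `None` stands in for Linda's `?x` wildcard, and `in_` is used because `in` is a Python keyword:

```python
import threading

class TupleSpace:
    """Toy tuple store sketching Linda's out/in/eval."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        # Insert a tuple and wake any blocked in_() calls.
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _match(self, tup, pattern):
        # None in the pattern acts like Linda's ?x wildcard.
        return (len(tup) == len(pattern) and
                all(p is None or p == t for p, t in zip(pattern, tup)))

    def in_(self, *pattern):
        # Block until a matching tuple appears, then remove and return it.
        with self._cond:
            while True:
                for tup in self._tuples:
                    if self._match(tup, pattern):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()

    def eval(self, *parts):
        # Evaluate any callable parts in a new thread, then out() the result.
        def worker():
            self.out(*(p() if callable(p) else p for p in parts))
        threading.Thread(target=worker).start()

ts = TupleSpace()
ts.out("test", 1, 1)
print(ts.in_("test", 1, None))      # -> ('test', 1, 1)
ts.eval("test", 2, lambda: 40 + 2)  # does not block the parent
print(ts.in_("test", 2, None))      # blocks until eval's thread finishes
```

The blocking `in_` is what makes this a coordination language: producers and consumers never talk to each other directly, only through the space.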
That's it.
Basic paradigms when writing parallel programs
The book uses the metaphor of building a house and how you would sequence the workers on it.
Result
Here you divide the house into sections of the finished result and assign a worker to each section, no matter what work might be involved. For example, you might assign a worker a 'bedroom', and that worker would do everything for the bedroom: the framing, the wiring, the carpet, everything.
This can run into problems because workers given simple jobs might finish before the ones with complex jobs (e.g. the kitchen or bathroom), and then you would have workers just sitting around.
Specialist
Here you would build a house the way we typically do: the plumber does the plumbing, the carpenter does the framing, the guy with the big whopping machines digs out the basement, etc.
Of course, here you have the problem (assuming you only want to build one house) that all the roofers are waiting around on the guys with the big whopping machines and the carpenters. (OK, the metaphor breaks down.)
Agenda
Here you have all the workers line up, and you dish out the next thing that needs to be done. If you are building a foundation, you can have each worker bring a block to the building, apply mortar, and then go back for the next thing. Each worker needs to be told precisely what to do, and this assumes all workers are equally adept at all the jobs.
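Agenda parallelism is basically a bag of tasks with identical workers. Here is my own sketch of it (the function and task names are made up for illustration): the 'agenda' is a shared queue, and each worker repeatedly grabs the next task until the queue is empty.

```python
import queue
import threading

def run_agenda(tasks, handler, n_workers=4):
    """Bag-of-tasks: identical workers pull from a shared agenda."""
    agenda = queue.Queue()
    for t in tasks:
        agenda.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = agenda.get_nowait()  # grab the next thing to do
            except queue.Empty:
                return                      # agenda exhausted, go home
            r = handler(task)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

# Example: "lay" 10 blocks, one task per block.
done = run_agenda(range(10), lambda block: block * block)
print(sorted(done))  # -> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Note that no worker is assigned a role or a section of the house up front; load-balancing falls out of the queue for free, which is exactly why any worker can be told to do any job.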
Model substitution
According to the book, (at least in the software world) if one model fits the problem domain naturally but is inefficient, we can (and this was an important point) substitute any of the models for any of the others. We do this to fit the run-time environment we are actually operating in, choosing whatever is most efficient. (The book goes through several examples.)
Granularity knobs
Another important point in the book is that each run-time environment is different from the next. We may be running on a one-processor computer, or we may be solving the problem on a supercomputer with a gazillion cores. The run-time dynamics of each system are going to vary quite a bit. The book shows several examples where we may use only a fraction of the available resources for some parts of the problem, but could use more than we have for others.
To address this, they suggest granularity knobs: regulating how much each subprocess does at once, and ideally making this a dynamic property during execution. To use our house analogy, you might make more progress if each worker brings 3 blocks to the house instead of one, but you might not make much progress at all if you make them bring 1000 blocks at a time.
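In code, a granularity knob can be as simple as a chunk-size parameter (my own sketch, with made-up names): instead of one task per work unit, each unit holds `chunk` tasks, and you tune that number to the machine you are actually running on.

```python
def chunked(tasks, chunk):
    """Split a task list into work units of `chunk` tasks each.

    chunk=1          -> fine-grained: maximum parallelism, maximum overhead
    chunk=len(tasks) -> one big unit: no coordination cost, no parallelism
    """
    return [tasks[i:i + chunk] for i in range(0, len(tasks), chunk)]

blocks = list(range(10))
print(chunked(blocks, 3))  # -> [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Making the knob dynamic would just mean recomputing `chunk` as workers report how long their units are taking, rather than fixing it up front.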
Wrapping it up
This was an interesting book (available online, by the way) that helped me think about parallel programming. It would be interesting to try to write Linda in Lisp (or Clojure?). It would also be interesting to try some functional approaches to parallel programming (Haskell comes to mind) and compare them to the actor model presented in the Erlang book. Each of them should be able to implement the ideas in this book, if not Linda itself.