Time Traveler's Guide to Software Design: Embarrassingly Parallel: Armstrong, Ch 2

To oversimplify, there are two primary approaches to software concurrency: lock-based and message-based.

Lock-based concurrency is used primarily to separate out listener threads and to improve program performance. Multiple threads can (and generally do) share access to some data, which is protected by locks. Though this mechanism is sound in theory, in practice the implementation can be very tricky, and all sorts of bugs can arise from it, deadlocks and race conditions chief among them.
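To make the lock-based style concrete, here is a minimal sketch in Python. The function name run_locked_counter and its parameters are mine, purely for illustration: several threads increment one shared counter, and a single lock guards every update.

```python
import threading


def run_locked_counter(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may execute this read-modify-write.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_locked_counter())
```

The trickiness shows up as soon as there is more than one lock or more than one piece of shared data: every access has to remember which lock to take, and in what order.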

Message-based concurrency has a different focus: it uses parallelism chiefly for program reliability. Each thread exists as an independent task, sharing no memory and communicating only through asynchronous messages. Though this approach can have performance drawbacks (particularly if much data needs to be passed back and forth), it is a simpler abstraction to work with, and is probably more conducive to stable programs, since program errors can be easily encapsulated within a single errant thread. Importantly, a well-organized, hierarchical ordering of task parallelism can scale far more easily: not only can threads easily run distributed across multiple machines, but a hundred-fold increase in thread count requires no extra concurrency organization, simply more processing power.
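A rough sketch of the message-based style, again in Python. Everything named here (worker, run_workers, the STOP sentinel, the tuple layout of the messages) is a hypothetical choice for illustration. Python threads do technically share memory, so the "share nothing" rule is a discipline in this sketch rather than something the language enforces, as Erlang would.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel telling a worker to shut down


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Square each incoming message; report failures as messages, too."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        try:
            outbox.put(("ok", msg, msg * msg))
        except Exception as exc:
            # The failure stays inside this task and is reported as data.
            outbox.put(("error", msg, repr(exc)))


def run_workers(values, num_workers: int = 3) -> list:
    """Distribute values round-robin to workers and collect their replies."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_workers)]
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for inbox in inboxes
    ]
    for t in threads:
        t.start()
    for i, value in enumerate(values):
        inboxes[i % num_workers].put(value)
    for inbox in inboxes:
        inbox.put(STOP)
    for t in threads:
        t.join()
    # All workers have finished, so draining the outbox here is safe.
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results


if __name__ == "__main__":
    for reply in run_workers([1, 2, "oops", 3]):
        print(reply)
```

The bad input ("oops") produces an error message from one worker while the others carry on. Raising num_workers changes nothing else in the code, which is the scaling property described above in miniature.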

Joe Armstrong uses the term "Concurrency-Oriented Programming" to describe the design approach centered on large, message-based programs. This methodology involves identifying a program's tasks, outlining their interdependence, defining their message exchange, and then coding a direct translation of this task organization. So, how well does this approach work?

To find out, I traveled to 2035 to sneak an interview with AMD Lead Compiler Developer Roto Mitsumi.

Kurt: Mr. Mitsumi, back in 2009, we're still figuring out an ideal scheme to parallelize our software. Can you give us any insight into approaches we should consider?
Mitsumi: It is with great honor that I am to speak to your time. I am most humbled to have the opportunity to discuss my experience on behalf of my benevolent employer, AMD. It is--
Kurt: Mr. Mitsumi. Please, our time is limited.
Mitsumi: My sincerest apologies. If only we could parallelize this interview.
Kurt: Yes, ha. Ha. Very good. Please.
Mitsumi: The first thing one must bear in mind is the reason we want to parallelize programs in the first place. Form must follow function. A program must be a suitable abstraction of its executing hardware. So, as chips become parallelized, so must programs.
Kurt: Yes, that's where we are now.
Mitsumi: But if I am recalling correctly, that's not entirely where you are. Because you have CPUs with multiple identical cores. This does not last for long. The average mid-range AMD CPU released now has over 1000 cores, of many varieties. There are a few dozen dedicated vector cores, a few hundred cellular automata processors, a few hundred x256 cores, and many, many different optimized RISC processors.
Kurt: So, how does this affect design?
Mitsumi: Well, a programmer cannot possibly be aware of the precise system configuration, or even the type of processors available. He must be able to code in a fashion independent of the underlying instructions. To do this, the relationships between threads must be very explicitly defined.
Kurt: Are you describing a message-passing system of concurrency?
Mitsumi: Yes and no. In a traditional message-passing system, the programmer himself must define the communication. The structure emerges from the messages. This can become unmaintainable when one begins spinning off a few hundred threads. Refactoring can be profoundly challenging. Instead, we take a different approach: one still maintains the independence of threads, but instead works with the data, a much simpler abstraction to refactor around.
Kurt: Isn't that the opposite of a message-passing system? So where do the messages come in?
Mitsumi: It depends on where you are looking. When the code is compiled, all messages are automatically generated from the data access. When executing, no data is actually directly shared by the threads: it is still passed by messages, but the messages are automatically generated. All a developer must do is hierarchically define a data structure, and evaluate the resulting message structure for deficiencies and performance bottlenecks.
Kurt: So, a developer can still think in objects, even while his code executes as messages?
Mitsumi: Your language is technically incorrect, but the idea is right. Most modern processors try to prioritize interprocessor message passing over shared memory access. The more cores you have, the less efficient it is to access shared memory, and we're at a point where such a system would be an unfathomable bottleneck. Because of this, the vast majority of modern computers distribute memory across the motherboard. Message-passing is how nearly all non-quantum machines will proceed.
Kurt: Interesting. I'll be curious to see this unfold in the next few decades.

There you have it. Message passing does come with its own limitations and complexities, but it is scalable, and the advantage of that cannot be overstated as our hardware evolves. Any technique that will last in this domain must be able to cope with complexity an order of magnitude higher than most programmers currently address.

Tuesday, October 20, 2009
