Time Traveler's Guide to Software Design: Fork/Join Parallelism

Early programming was highly dependent upon the underlying hardware. Programmers would manipulate data on what was effectively a register-by-register basis, executing [relatively] simple assembly commands to alter the information. With increasing computational power have come higher levels of abstraction, slowly freeing developers from the complexities of hardware and allowing effort to be focused on higher-level concerns.

Working with a modern, managed programming language, one no longer needs to worry about assembly, about memory allocation or destruction, about paging or class loading, or about a host of other problems stemming from the operation of lower abstraction layers. This seems an inevitable and generally beneficial trend. Developers can focus more on the business problem and less on the technical one.

In many ways, then, modern parallel programming seems like a step back. Developers are being asked to consider aspects of the hardware they were previously able to ignore (the number of CPUs) and aspects of the data they were previously able to ignore (the relationships between data members that require synchronization). Currently, many of our development models are heavy-handed: thread creation and signaling are a comprehensive, though highly error-prone, coding style. The predominant abstraction--threading--often entails substantial computational overhead.

An alternative abstraction is a task-based view, often utilized in a fork/join framework. Under this methodology, a developer constructs independent tasks that are sent to a queue and processed by worker threads. The threads themselves represent interfaces to processors and can operate independently of the developer's code.
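To make the task-based view concrete, here is a minimal sketch using the fork/join classes in the Java standard library (`java.util.concurrent`, with `ForkJoinPool.commonPool()` requiring Java 8 or later). The `SumTask` class, its `THRESHOLD` cutoff, and the `parallelSum` helper are all names invented for this illustration, and summing an array is simply a convenient stand-in for any work that splits cleanly.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task: sums a slice of an array by splitting it in half
// until the slices are small enough to add up directly.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cutoff; tune per workload
    private final long[] data;
    private final int lo; // inclusive start of the slice
    private final int hi; // exclusive end of the slice

    SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = lo + (hi - lo) / 2;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                      // fork: queue the left half for any idle worker
        long rightSum = right.compute();  // handle the right half on the current worker
        return left.join() + rightSum;    // join: wait for the left half and combine
    }

    // Convenience entry point that runs the task on the JVM-wide common pool.
    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```

Notice that the code never creates or signals a thread. It only describes how the work divides and how the partial answers combine, and the pool decides which worker runs which piece.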

This has one main advantage and one main disadvantage. On the upside, the framework can spawn an optimal number of worker threads for a given system, whereas a traditional thread model entails however many functional threads a developer can identify. Furthermore, since the worker threads persist and are reused from task to task, there isn't substantial overhead in their creation and destruction. For many concurrent problems, this is an ideal approach, abstracting away the hardware again and simplifying the logical interdependencies.
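A small sketch of that sizing idea, assuming Java 8 or later and the standard `ExecutorService` API: the pool gets one worker per available processor, and those same workers are reused for every task submitted. The `PoolSizing` class and `distinctWorkers` method are hypothetical names made up for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolSizing {
    // Runs taskCount trivial tasks on a pool sized to the machine and returns
    // how many distinct worker threads ended up executing them.
    static int distinctWorkers(int taskCount) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        Set<Thread> seen = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < taskCount; i++) {
            // Each task only records which worker thread picked it up.
            pool.execute(() -> seen.add(Thread.currentThread()));
        }
        pool.shutdown(); // accept no new tasks; queued tasks still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }
}
```

However many tasks are submitted, the count returned never exceeds the number of processors reported by the runtime. The thread count follows the hardware rather than the shape of the program.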

This simplification also reveals its weakness--or perhaps more appropriately, its limitation. For this model to work, each task or work unit must be completely functionally independent. On one hand, this limitation still meets the needs of many (perhaps most) problems requiring heavy parallelism. And unless the computational demands of a program are particularly high, parallelism shouldn't necessarily be a chief concern. But as I stated, this is a limitation, and there are some types of thread behavior that it prohibits, particularly cases where multiple threads dynamically modify the same data.
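One way to live within that limitation is to give every task its own private result and combine the results only after the tasks finish. The sketch below assumes Java 8 or later, and the `IndependentHistogram` class and `countDigits` method are hypothetical names for this example. It counts how often each digit from 0 to 9 appears in an array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndependentHistogram {
    // digits must contain only values 0..9; chunks must be at least 1.
    static int[] countDigits(int[] digits, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Callable<int[]>> tasks = new ArrayList<>();
            int chunkSize = (digits.length + chunks - 1) / chunks; // round up
            for (int c = 0; c < chunks; c++) {
                final int from = Math.min(digits.length, c * chunkSize);
                final int to = Math.min(digits.length, from + chunkSize);
                tasks.add(() -> {
                    int[] local = new int[10]; // private to this task, never shared
                    for (int i = from; i < to; i++) {
                        local[digits[i]]++;
                    }
                    return local;
                });
            }
            int[] total = new int[10];
            // invokeAll blocks until every task is done; merging happens on one thread.
            for (Future<int[]> part : pool.invokeAll(tasks)) {
                int[] counts = part.get();
                for (int d = 0; d < 10; d++) {
                    total[d] += counts[d];
                }
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Had every task incremented a single shared `int[]` instead, the tasks would no longer be independent and the counts could be lost to races. That kind of shared, dynamically modified data is exactly what this model does not accommodate without extra synchronization.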

Originally, I'd set out to interview Henry Ford, who is in many ways the godfather of modern parallelism: he is the man who optimized the execution of tasks by best mapping them to the available hardware (in this case, manpower). Unfortunately, I was only able to reach his secretary, who informed me that "Mr. Ford isn't taking calls from the future. He considers the future too bombastic." Very well. I'd probably have spent most of the time whining about Detroit anyway.

Thursday, October 1, 2009
