YOUR FEEDBACK
shirley wrote: As an ISV and service provider, we specialise in .NET based collaboration soluti...
Cloud Computing Conference
March 22-24, 2009, New York
Register Today and SAVE !..

2008 East
DIAMOND SPONSOR:
Data Direct
Frontiers in Data Access: The Coming Wave in Data Services
PLATINUM SPONSORS:
Red Hat
The Opening of Virtualization
Intel
Virtualization – Path to Predictive Enterprise
Green Hills
IT Security in a Hostile World
JBoss / freedom oss
Practical SOA Approach
GOLD SPONSORS:
Software AG
The Art & Science of SOA: How Governance Enables Adoption
PlateSpin
Effective Planning for Virtual Infrastructure Growth
Fujitsu
Automated Business Process Discovery & Virtualization Service
Ceedo
Workspace Virtualization
Click For 2007 West
Event Webcasts

2008 East
PLATINUM SPONSORS:
Appcelerator
Think Fast: Accelerate AJAX Development with Appcelerator
GOLD SPONSORS:
DreamFace Interactive
The Ultimate Framework for Creating Personalized Web 2.0 Mashups
ICEsoft
AJAX and Social Computing for the Enterprise
Kaazing
Enterprise Comet: Real–Time, Real–Time, or Real–Time Web 2.0?
Nexaweb
Now Playing: Desktop Apps in the Browser!
Sun
jMaki as an AJAX Mashup Framework
POWER PANELS:
The Business Value
of RIAs
What Lies Beyond AJAX?
KEYNOTES:
Douglas Crockford
Can We Fix the Web?
Anthony Franco
2008: The Year of the RIA
Click For 2007 Event Webcasts

SYS-CON.TV
TOP THREE LINKS YOU MUST CLICK ON


It's a Multi-Core World: Let the Data Flow
A functional parallelism paradigm that fits multi-core processor architecture

The multi-core buzz is everywhere. Pick up a newspaper and the local electronics mega-store is advertising multi-core desktops and laptops to the consumer. Interesting, but what does it mean to the everyday Java programmer? Maybe nothing. If you live in the application server world writing EJB-based applications your application server does most of the heavy lifting for you. It handles concurrency just fine. But that doesn't cover all applications. Multi-core technology will especially affect applications that must process large amounts of data in a non-transactional (outside of a database context) manner. For this class of applications, the implications of multi-core are huge.

Why? Well first, notice the processing speeds of multi-core processors. They're not getting faster. In fact, they may be slowing down. As manufacturers add more cores to a chip, the processing speed of each core is usually slowed down to prevent overheating. The 80-core chip that Intel demonstrated recently wasn't 80 cores of x86 architecture but a simpler architecture. This may be an industry trend as more and more cores are squeezed onto a chip. The processor architecture may become heterogeneous, with a few full-power "legacy" cores and many specialized cores. The IBM Cell architecture already employs this scheme, with a single PowerPC core at the center and eight SPU cores connected using high-speed interconnects.

One implication you should take away from all of these processor changes: your single-threaded application may actually slow down on a multi-core system. If you need faster runtimes to meet shrinking SLA windows, you 'll have to multithread your applications, now! No problem, right? Java has included the java.util.concurrent package since Java5. This library contains many powerful constructs from which you can build a fully concurrent and scalable application. But, that isn't always easy or straightforward. The java.util.concurrent package is a set of building blocks that you must master and put to good use. There are several good books on this subject. We highly recommend Java Concurrency in Practice by Brian Goetz, for one.

There's a technology that's been around for years called dataflow that can solve the multi-core dilemma. How? This article will go into detail about dataflow, but the gist is this: dataflow provides a functional parallelism paradigm that fits well into multi-core processor architecture. A dataflow instance consists of a directed graph of processing nodes connected with FIFO data queues. This pipelined architecture lets applications be built from small-size reusable components stitched together with queues. The diagram below gives a small example of dataflow graph. Since it's a pipelined architecture, it naturally takes advantage of many processing cores. But more on that later.

First, we'll cover the overall nature and architecture of dataflow technology, specifically focusing on dataflow in software. Then we'll cover the history of dataflow technology, how it was first conceived and has matured and morphed over the years. Then we'll discuss several implementations of dataflow technology that exist in the marketplace today, highlighting a Java implementation. Read on for a peek into a technology that may have been ahead of its time but appears poised for the new multi-core world.

Yet Another Programming Paradigm
You may be asking: why another programming paradigm? First, the languages we have available to us don't directly support the needs of application builders. As mentioned before, the java.util.concurrent package provides most of the constructs needed to build scalable applications. However, these are lower-level building blocks. It can take many months to become familiar with these constructs and apply them right.

Second, the programming frameworks that have traditionally been used to build highly scalable applications have been targeted at the academic and scientific community. Frameworks such as MPI and OpenMP have been used to solve very large complex problems, taking advantage of some of the world's largest computers and computer grids. However, the confluence of the information explosion with the availability of very inexpensive, off-the-shelf hardware has put high-performance computing within reach of even small and medium-sized businesses. These businesses have ever-increasing demands to process more data in shorter time periods. What they don't have is a staff of concurrent programming experts. (See Figure 1)

On the one extreme we have Java and all of the functionality that it provides. The building blocks to create scalable applications are there, but a cost must be paid to tap into this functionality. On the other extreme are frameworks such as MPI and OpenMP. Again, they provide high functionality, but have traditionally been used by the academic and scientific computing world. They are not easy tools to use. Something is needed to bridge this gap to provide high-performance, highly scalable data processing to the business world. Dataflow technology can be one way to bridge that gap.

Dataflow Programming Model
Dataflow is an alternative to the standard von Neumann model of computation. Typically, we think of a program as a series of instructions each executed one after the other by a processor keeping track of its progress with an instruction pointer. In dataflow, on the other hand, channels transmitting data in one direction join computations to one another. Conceptually, you can think of this structure as a directed graph with data channels as edges and processes performing computation on the data as nodes. The processes each operate only when data is available - the data flowing through the network is all that's needed to organize the computation. The immediate advantage is that many of the processes can be operating simultaneously, thus allowing dataflow applications to take advantage of hardware with multiple processor cores. Notice the concurrency happens external to the process; the developer doesn't have to bother with threads, deadlock detection, starvation, or concurrent memory access to build parallelism into his application.

This type of implicit parallelism stands in stark contrast to the concurrency mechanisms of many other programming paradigms. Gone are the locks of concurrent programming in imperative languages like C, which lack composability - two correct snippets of code using locks may not be correct when they're combined. Dataflow, on the other hand, allows composability: as long as the I/O contract is correct, sub-graphs may replace nodes or be spliced between them in the original dataflow network. This facilitates both program correctness, since sub-graphs can be tested as they're constructed then linked together to form larger programs, and code reuse, since commonly used sub-graphs can be copied from one application to the next.

Dataflow process networks bear some relationship to the dataflow variables of declarative programming languages like Oz. A dataflow variable is simply an unbound variable whose value can only be determined by a separate thread operating in parallel. If the dataflow variable is referenced before it's been bound, the referencing thread pauses awaiting the value. Combined with a single-assignment store (variables can be bound once at most), these variables lead to the nice property that it doesn't matter in what order we evaluate simultaneously executing expressions. Likewise, the outcome of a dataflow network is determined uniquely by its input, regardless of the order in which processes fire. Firing order impacts queue sizes and performance, of course, but this can be dealt with elsewhere besides explicitly within the program itself, dramatically simplifying the task of the dataflow programmer. (See Figure 2)

The History of Dataflow
In the early 1970s, many people grew skeptical of the von Neumann architecture's ability to cope with parallelism. The global instruction pointer and memory could both become bottlenecks in concurrent software if it wasn't carefully designed. Dataflow architecture arose as the only compelling competitor. Designed with concurrency in mind, it eliminated the global instruction pointer and memory by organizing the computation based on the flow of data through a network of processes. However, these radically different architectures proved difficult compiler targets for traditional imperative programs. Dataflow programming and languages arose in response to this need.

At this time, Jack B. Dennis developed the static dataflow model and applied it to the design of computer architectures. His model limited nodes to primitive computations, and the edges were seen as representing data dependencies among the various operations occurring in the network - they held only one data token at a time. The work of Gilles Kahn extended this idea in two ways. First, the edges of his dataflow graphs were unbounded first-in, first-out queues, providing for a flexible rate of flow across each node. Second, he allowed each node to be a complete sequential process, which is often called large-grain dataflow. This approach tends to be more effective in creating efficient software since the threads implementing processes are given a larger fraction of work, reducing the amount of time spent switching between them. Further, the model freed the concept of dataflow from defining the language of its processes. Now it was conceivable to implement them in standard programming languages such as C or Java but still have the network of code operate according to dataflow principles.


About Jim Falgout
Jim Falgout is solutions architect for Pervasive Software, where he applied dataflow principles to help architect Pervasive DataRush. He is active in the Java development community; in May of 2007, he presented a technical paper titled 'Unleashing the Power of Multi-Core Processors: Scalable Data Processing in Java Technology' at JavaOne.

About Matt Walker
Matt Walker is an engineer at Pervasive Software, seeking a deeper understanding of concurrent programming techniques to improve the Pervasive DataRush framework for dataflow programming. He holds an MS in computer science from UT and received his BS in electrical and computer engineering from Rice University.

BEA WEBLOGIC LATEST STORIES
Okay, here's the deal. When you observe the big software guys and see how quickly they adopt emerging technologies, which will change IT the way we know it today, here is what we see. Larry Ellison invested millions in old SaaS / cloud companies, which gave him zippo in return, and he ...
SYS-CON Events announced today that more than 40 Cloud technology providers, as well as Virtualization and SOA companies will exhibit at the upcoming 1st International Cloud Computing Conference & Expo (www.CloudComputingExpo.com), November 19-21, in San Jose, California. The conferenc...
SYS-CON Events announced today that the leading global SOA, Virtualization, Cloud Computing and Open Source technology provider FreedomOSS named "Gold Sponsor" of SYS-CON's SOA World Conference & Expo which will take place November 19-21, 2008, at the Fairmont Hotel in the heart of Sil...
Cassatt, the company started by BEA founder Bill Coleman, is redirecting its data center widgetry into creating internal clouds comparable to Amazon or Google out of infrastructure customers already have in-house. Coleman observed that most IT professionals aren’t comfortable outsour...
Just as people begin to understand the difference between web ops and IT, we are entering a period where clouds promise "Ops-Free" computing. Because it’s easy, scalable, available and disposable, the cloud is well on its way to becoming “technology’s next big thing.” However, ...
As far as the software industry goes, these tough economic days give the biggest business advantage to those companies who contribute directly to the solution of the big global problem and they will be the first to flourish as we dig ourselves from the ditch. Call that the new Y2K prob...
SUBSCRIBE TO THE WORLD'S MOST POWERFUL NEWSLETTERS
SUBSCRIBE TO OUR RSS FEEDS & GET YOUR SYS-CON NEWS LIVE!
Click to Add our RSS Feeds to the Service of Your Choice:
Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
Publish Your Article! Please send it to editorial(at)sys-con.com!

Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021

SYS-CON FEATURED WHITEPAPERS

ADS BY GOOGLE
BREAKING NEWS FROM THE WIRES

In the graph before the boilerplate, the first sentence should read: The Evans Data...