Overview
The High Performance Computing Initiative at Arizona State University has engaged with Pentum Group, Inc. (PGI) to construct new streams-based approaches to parallel processing. The project emphasizes techniques that provide semantics useful for a range of signal processing applications while remaining flexible enough to port easily across current and emerging computing architectures, from computing clusters to Field Programmable Gate Arrays and other hardware accelerators.
Signal processing applications must process large amounts of continuous data, usually in real time and usually from several sources. Developing these applications with traditional procedural programming requires extensive data movement and memory management, which becomes even more complicated in parallel and heterogeneous environments.
StreamVSIPL offers a potential solution to this problem by giving application developers a data streams-based programming model. Built upon the Vector Signal and Image Processing Library (VSIPL), it provides a large tool set for signal analysis and manipulation. Rather than procedurally managing memory and data while calling VSIPL to perform an operation, the developer places the data into logical flows called streams. These streams are then connected to operations, which perform computation on the data and return it to the stream. SCStreams is a C++ framework for developing streaming applications, built to support StreamVSIPL in both shared and distributed memory environments.
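The streams model described above can be illustrated with a minimal C++ sketch. The names Stream, Operation, Block and drain() are hypothetical and are not the StreamVSIPL or SCStreams API; the kernel stands in for the place where a real application would call a VSIPL routine.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical data block carried by a stream (not a VSIPL type).
using Block = std::vector<float>;

// Hypothetical stream: an ordered, logical flow of data blocks.
struct Stream {
    std::deque<Block> blocks;

    void push(Block b) { blocks.push_back(std::move(b)); }
    bool empty() const { return blocks.empty(); }

    Block pop() {
        assert(!blocks.empty());
        Block b = std::move(blocks.front());
        blocks.pop_front();
        return b;
    }
};

// Hypothetical operation: connected to an input and an output stream.
// It applies a kernel to each block and returns the result to a stream,
// so the application never moves or allocates the data by hand.
struct Operation {
    Stream& in;
    Stream& out;
    std::function<void(Block&)> kernel;

    // Process every block currently available; returns how many were handled.
    std::size_t drain() {
        std::size_t handled = 0;
        while (!in.empty()) {
            Block b = in.pop();
            kernel(b);
            out.push(std::move(b));
            ++handled;
        }
        return handled;
    }
};
```

In this sketch an application only wires streams to operations and supplies kernels; chaining two operations amounts to using the output stream of one as the input stream of the next.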
Example Streams Application

SCStreams Execution
The SCStreams runtime provides a simplified event loop for execution. SCStreams provides only one element of computation and one method of invocation. Models are broken down into "SV_Methods", each encapsulated in an object with a run() method. An SV_Method is scheduled to execute on the next execution cycle by being "triggered." There is no support for triggering an SV_Method to execute in any later cycle.
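The execution model above can be sketched in a few lines of C++. Only the name SV_Method and its run() method come from the description; Runtime, trigger(), cycle(), run_until_idle() and Countdown are hypothetical names chosen for illustration, and the actual SCStreams interfaces may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an SCStreams "SV_Method": a unit of computation
// encapsulated in an object with a run() method.
class SV_Method {
public:
    virtual ~SV_Method() = default;
    virtual void run() = 0;
};

// Hypothetical runtime: anything triggered during one cycle runs in the
// next cycle. There is no way to ask for a later cycle.
class Runtime {
public:
    // Schedule a method for the next execution cycle.
    void trigger(SV_Method* m) {
        assert(m != nullptr);
        next_.push_back(m);
    }

    // Execute one cycle; returns the number of methods that ran.
    std::size_t cycle() {
        std::vector<SV_Method*> current;
        current.swap(next_);  // triggers raised while running land in next_
        for (SV_Method* m : current) {
            m->run();
        }
        return current.size();
    }

    // Run cycles until nothing is pending; returns the number of cycles.
    std::size_t run_until_idle() {
        std::size_t cycles = 0;
        while (!next_.empty()) {
            cycle();
            ++cycles;
        }
        return cycles;
    }

private:
    std::vector<SV_Method*> next_;
};

// Example method (hypothetical): re-triggers itself until a limit is reached.
class Countdown : public SV_Method {
public:
    Countdown(Runtime& rt, int remaining) : rt_(rt), remaining_(remaining) {}

    void run() override {
        ++runs;
        if (--remaining_ > 0) {
            rt_.trigger(this);  // executes again on the next cycle
        }
    }

    int runs = 0;

private:
    Runtime& rt_;
    int remaining_;
};
```

Under these assumptions, a method that needs to run several cycles from now has to re-trigger itself each cycle, as Countdown does, because the runtime keeps only a single "next cycle" queue.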
Parallelism
Shared Memory
- OpenMP utilizes the available processors during the event execution loop
Distributed Memory
- Triggers to Modules on other nodes are passed transparently to the appropriate processor
- Placement of Modules on nodes is handled at the library layer (StreamVSIPL)
- Modules can trigger on all nodes at once with a “Global” flag, which allows a module to implement domain decomposition internally.
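The two forms of parallelism listed above can be sketched as follows. This is an illustration under stated assumptions, not SCStreams code: run_cycle() and block_range() are hypothetical helpers, the methods in a cycle are assumed to be independent of one another, and the inter-node transport is omitted entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Shared memory: run every method triggered for this cycle and let OpenMP
// spread the independent methods across the available processors. If
// OpenMP is not enabled at compile time, the pragma is ignored and the
// loop simply runs serially.
void run_cycle(const std::vector<std::function<void()>>& triggered) {
    const long n = static_cast<long>(triggered.size());
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        triggered[i]();
    }
}

// Distributed memory: a module triggered on all nodes at once can use its
// node index to pick its own slice of the data. Returns the half-open
// range [begin, end) of an n-element domain owned by `node` out of `nodes`.
std::pair<std::size_t, std::size_t>
block_range(std::size_t n, std::size_t nodes, std::size_t node) {
    assert(nodes > 0 && node < nodes);
    const std::size_t base = n / nodes;
    const std::size_t extra = n % nodes;  // the first `extra` nodes get one more
    const std::size_t begin = node * base + (node < extra ? node : extra);
    const std::size_t end = begin + base + (node < extra ? 1 : 0);
    return std::make_pair(begin, end);
}
```

A block decomposition is only one possible choice; a globally triggered module could equally use a cyclic or two-dimensional layout, depending on the operation it implements.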
Benchmark Results

Shown above is the performance of Cannon's Algorithm written in StreamVSIPL using a variable number of 2.66 GHz Intel Clovertown processors. Approximately linear speedup is seen from 4 to 36 processors. Beyond that point, the marginal performance improvement tails off as communication and signaling overhead begin to impact efficiency.
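For readers unfamiliar with the benchmark, Cannon's Algorithm multiplies two matrices on a square processor grid by repeatedly shifting blocks between neighbors. The sketch below is a serial simulation, not the benchmarked StreamVSIPL code: it assumes a q x q grid in which each simulated processor holds a single matrix element, and cannon_multiply() is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Serial simulation of Cannon's algorithm on a q x q grid where simulated
// processor (i, j) holds one element each of A, B and C.
Matrix cannon_multiply(const Matrix& A, const Matrix& B) {
    const std::size_t q = A.size();
    assert(q > 0 && B.size() == q);
    for (std::size_t i = 0; i < q; ++i) {
        assert(A[i].size() == q && B[i].size() == q);
    }

    Matrix a(q, std::vector<double>(q));
    Matrix b(q, std::vector<double>(q));
    Matrix C(q, std::vector<double>(q, 0.0));

    // Initial alignment: shift row i of A left by i, column j of B up by j.
    for (std::size_t i = 0; i < q; ++i) {
        for (std::size_t j = 0; j < q; ++j) {
            a[i][j] = A[i][(j + i) % q];
            b[i][j] = B[(i + j) % q][j];
        }
    }

    for (std::size_t step = 0; step < q; ++step) {
        // Each processor multiplies the pair it currently holds.
        for (std::size_t i = 0; i < q; ++i) {
            for (std::size_t j = 0; j < q; ++j) {
                C[i][j] += a[i][j] * b[i][j];
            }
        }
        // Neighbor communication: A moves one step left, B one step up.
        Matrix na(q, std::vector<double>(q));
        Matrix nb(q, std::vector<double>(q));
        for (std::size_t i = 0; i < q; ++i) {
            for (std::size_t j = 0; j < q; ++j) {
                na[i][j] = a[i][(j + 1) % q];
                nb[i][j] = b[(i + 1) % q][j];
            }
        }
        a.swap(na);
        b.swap(nb);
    }
    return C;
}
```

Every step pairs a local multiply with one shift of A and one shift of B, so communication is a fixed cost per step while the local work shrinks as more processors share the same matrices. That is consistent with the tail-off in speedup described above.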
Future Work
- The current implementation of SCStreams requires users to manually place operations onto individual nodes. A method to automatically place operations onto available resources would be beneficial.
- Resource schedulers could also be employed to migrate or alter this placement based on contention or necessity.
- The encapsulation of communication into pipes could allow for the development of a fault-tolerant communication layer. Methods such as message logging could be combined with an automatic placement algorithm to route around and survive node or communication failures.