High-performance computing (HPC) systems continue to grow in size and complexity, and the applications that run on them are growing in complexity to match. The advent of multi- and many-core chips has accelerated this trend: large-scale applications now require tens to hundreds of thousands of threads to achieve maximum performance. Debugging technology, by contrast, has changed little in recent years, and most effort is still focused on interactive debugging. Although commercial debuggers now run at reasonably large scales, it is not clear that large-scale interactive debugging is compatible with the way large systems are operated, or that the information it presents is useful to the developer at very large scales. The Offline Parallel Debugger (OPD) is a novel approach to parallel application debugging that moves the debugging process from interactive to offline. OPD allows debugging and performance-analysis activities to be automated, and it collects information about program state in a relational database for later analysis.
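As a rough illustration of the idea of collecting program state in a relational database, the sketch below stores per-process samples and queries them after the run. It uses Python's built-in sqlite3 module. The table layout, the column names, and the helpers open_store, record_state, and outlier_ranks are hypothetical and are not taken from OPD; the sample values are made up for the demonstration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create a database with one table of state samples (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS program_state (
               mpi_rank  INTEGER,  -- process that produced the sample
               thread_id INTEGER,  -- thread within that process
               func_name TEXT,     -- function where the sample was taken
               src_line  INTEGER,  -- source line of the sample point
               var_name  TEXT,     -- variable that was inspected
               var_value TEXT      -- its value, stored as text
           )"""
    )
    return conn


def record_state(conn, mpi_rank, thread_id, func_name, src_line, var_name, var_value):
    """Store one sample; a real tool would do this automatically at each sample point."""
    conn.execute(
        "INSERT INTO program_state VALUES (?, ?, ?, ?, ?, ?)",
        (mpi_rank, thread_id, func_name, src_line, var_name, str(var_value)),
    )


def outlier_ranks(conn, func_name, src_line, var_name):
    """Return the ranks whose value differs from the most common value at a sample point."""
    rows = conn.execute(
        """SELECT DISTINCT mpi_rank FROM program_state
           WHERE func_name = :f AND src_line = :l AND var_name = :v
             AND var_value NOT IN (
                 SELECT var_value FROM program_state
                 WHERE func_name = :f AND src_line = :l AND var_name = :v
                 GROUP BY var_value
                 ORDER BY COUNT(*) DESC
                 LIMIT 1)
           ORDER BY mpi_rank""",
        {"f": func_name, "l": src_line, "v": var_name},
    )
    return [row[0] for row in rows]


# Simulated run: eight ranks report the same variable, and rank 5 diverges.
store = open_store()
for rank in range(8):
    value = "nan" if rank == 5 else "0.001"
    record_state(store, rank, 0, "solve", 120, "residual", value)

# Offline analysis, done after the job has finished.
print(outlier_ranks(store, "solve", 120, "residual"))  # prints [5]
```

Because the samples sit in ordinary tables, the analysis step is a query rather than an interactive session, so it can run after the job has left the machine and can be repeated or refined without rerunning the application.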
Resources
For more information, please contact Anthony DiGirolamo or Karl Lindekugel.