Thursday, January 13, 2011

GPU-Completeness

In the last decade, the computing industry has undergone a major revolution: software from suites to apps and hardware from workstations to mobile devices. The hardware platform of choice is no longer a souped up high clock-rate server class microprocessor but a system on a chip (SoC) combining a CPU, GPU, memory controller, etc., all running at very low power. The challenge is then how to partition a computation between CPU and GPU, which is done at runtime through an OpenCL kernel. This kernel is very difficult to write, because today it is based on arcane heuristics, not on an algorithm based on the system's current state.

At the Electronic Imaging Symposium, Dr. I-Jong Lin from HP Labs will present a new theory to solve this problem. His lecture on GPU-Completeness: Theory and Implications will be in the Conference on Parallel Processing for Imaging Applications. In a nutshell, when an algorithm is transformed from serial to parallel, there is a loss of accuracy. When the accuracy vs. parallelism trade-off is exponential rather than polynomial, the problem belongs in the class of GPU-Completeness.

The algorithmic class of GPU-Completeness is defined in terms of input, output, parallel performance, and a quality metric. Dr. Lin will validate his theory with experimental data from imaging applications: color transformation, halftoning, and run length encoding.

Useful links: