From the earliest days of digital color reproduction, vector processing units have been needed to achieve viable execution times. Many imaging operations are easy to vectorize because they operate on each pixel independently. Some operations, however, are spatial: sharpening, compression, error diffusion halftoning, and so on.
When the elements of a vector are not independent, algorithms require communication between the processing elements operating on the pixels. The architecture of vector units trades complexity for number of processing elements, so they are mostly ill-equipped for communication between elements. For this reason, dithering is usually used for halftoning on GPU raster image processors (RIPs).
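To illustrate why dithering suits vector hardware, here is a minimal sketch of ordered dithering (assuming the "dithering" above refers to ordered, screen-based dithering; the function name and 4x4 Bayer matrix are illustrative choices, not from the source): every pixel is compared against a fixed, precomputed threshold, so no pixel depends on any other.

```python
import numpy as np

# Classic 4x4 Bayer threshold matrix, normalized to [0, 1).
BAYER4 = np.array([[ 0,  8,  2, 10],
                   [12,  4, 14,  6],
                   [ 3, 11,  1,  9],
                   [15,  7, 13,  5]]) / 16.0

def ordered_dither(gray):
    """Binarize a grayscale image in [0, 1] by tiling the threshold matrix.

    Each pixel is thresholded independently, so the whole operation
    vectorizes trivially -- no communication between elements.
    """
    h, w = gray.shape
    thresholds = np.tile(BAYER4, (h // 4 + 1, w // 4 + 1))[:h, :w]
    return (gray > thresholds).astype(np.uint8)
```

For example, `ordered_dither(np.full((8, 8), 0.5))` turns a flat midtone into an even checker-like pattern whose mean stays at 0.5.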
Because customers prefer error diffusion halftoning, there have been many attempts to develop a parallel algorithm. In 1987 Donald Knuth tried tiling the image and halftoning the tiles in parallel. The problem is that in serial error diffusion the residual error is pushed down to the last line of the image, where it is not very visible, whereas in Knuth's approach each tile leaves a residual error in what Knuth called baron pixels, and these are scattered all over the image.
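The serial dependence that makes error diffusion hard to parallelize is visible in a minimal sketch of the classic Floyd-Steinberg variant (a standard formulation used here for illustration; the source does not specify which error diffusion kernel is meant):

```python
import numpy as np

def floyd_steinberg(gray):
    """Serial error diffusion with Floyd-Steinberg weights (7,3,5,1)/16.

    Each pixel's quantization error is pushed onto not-yet-visited
    neighbors, so pixel (x, y) cannot be processed until its left and
    upper neighbors are done -- the dependency that defeats naive tiling.
    """
    img = gray.astype(np.float64).copy()
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            out[y, x] = 1 if old >= 0.5 else 0
            err = old - out[y, x]
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out
```

Because all error is forwarded to pixels below or to the right, any leftover error drifts toward the bottom of the image, which is why tiling traps it mid-image instead.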
In 1999 Metaxas proposed a diagonal wavefront algorithm, which, however, is unsuited for real-world vector processors. Only in 2005 did Allebach and Li devise a block-interlaced pinwheel error diffusion algorithm that mitigated the artifacts from the residual errors languishing in the tiles. However, they did not address the implementation details.
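The wavefront idea can be sketched as follows (an illustrative derivation, not Metaxas's actual formulation): with Floyd-Steinberg weights, pixel (x, y) reads errors only from (x-1, y), (x-1, y-1), (x, y-1), and (x+1, y-1), all of which have a strictly smaller value of t = x + 2y. Pixels sharing a value of t are therefore mutually independent and can be processed as one parallel step.

```python
def wavefronts(h, w):
    """Yield lists of (y, x) pixel coordinates, one list per parallel step.

    All pixels in a front share t = x + 2*y; every pixel a front member
    depends on lies in an earlier front, so each front can run in parallel.
    """
    for t in range(w + 2 * (h - 1)):
        front = [(y, t - 2 * y) for y in range(h) if 0 <= t - 2 * y < w]
        if front:
            yield front
```

A GPU kernel would launch one thread per pixel in each front and synchronize between fronts. Note how small the fronts are near the image corners; that uneven utilization is one reason such a scheme maps poorly onto real vector hardware.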
A GPU is not a pure vector processing unit: algorithms are designed as a set of threads that can be processed independently. Thread scheduling is done in hardware by proprietary algorithms that are abstracted away from the implementor. In the absence of performance prediction tools, finding the most efficient partition of a parallel algorithm for a GPU is a complex experimental process.
At the Electronic Imaging Symposium, John Ludd Recker and Robert Ulichney from HP Labs and Yao Zhang of UC Davis will present the results of their experiments. Yao Zhang's lecture on "A Parallel Error Diffusion Implementation on a GPU" will be in the Conference on Parallel Processing for Imaging Applications.