In January I wrote a post on parallel error diffusion. In the meantime the paper has been published with this citation: Yao Zhang, John L. Recker, Robert Ulichney, Giordano B. Beretta, Ingeborg Tastl, I-Jong Lin and John D. Owens, "A parallel error diffusion implementation on a GPU", Proc. SPIE 7872, 78720K (2011); doi:10.1117/12.872616. The link is http://dx.doi.org/10.1117/12.872616. In that paper we focussed on making the CUDA implementation of the BIPED algorithm as efficient as possible.
A new paper, Yan Zhou, Chun Chen, Qiang Wang, Jiajun Bu and Hua Zhou, "Block-based threshold modulation error diffusion", J. Electron. Imaging 20, 013018 (Mar 25, 2011); doi:10.1117/1.3555132, just appeared in JEI. Their focus is on achieving the highest possible image quality with BIPED. Lacking performance data, I do not know how it performs compared to sequential ED. The link is http://dx.doi.org/10.1117/1.3555132.
Abstract: In this paper, we describe a block-based threshold modulation error diffusion algorithm to parallelize the halftone process without generating both block-boundary and diagonal artifacts. A novel scan path is used to pass the quantization error effectively between blocks and the input-dependent threshold modulation is applied during the inner-block processing. To obtain a suitable parameter set for the error weights and threshold modulation strength, a cost function is designed. Experimental results show that the proposed algorithm generates high quality halftone images which are visually similar to those generated by serial error diffusion algorithms. Our algorithm achieves better performance both in quality and parallelism compared to other halftoning approaches.
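
For readers who have not seen the serial baseline that the abstract compares against, here is a minimal Python sketch of sequential error diffusion with the classic Floyd-Steinberg weights and an input-dependent threshold. It is not the algorithm from either paper: the function name `error_diffuse`, the `strength` parameter, and the linear form of the threshold are assumptions made for illustration only, and the block-based scan path is not modelled at all.

```python
def error_diffuse(image, strength=0.0):
    """Serial Floyd-Steinberg error diffusion on a grayscale image.

    image:    list of rows, each a list of floats in [0.0, 1.0].
    strength: hypothetical threshold-modulation factor (an assumption,
              not taken from the paper); 0.0 gives a fixed threshold of 0.5.
    Returns a list of rows of 0/1 integers (the halftone).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Working copy that accumulates the error diffused from earlier pixels.
    work = [list(row) for row in image]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Illustrative input-dependent threshold: linear in the
            # original (unmodified) pixel value.
            threshold = 0.5 + strength * (image[y][x] - 0.5)
            value = work[y][x]
            out[y][x] = 1 if value >= threshold else 0
            err = value - out[y][x]
            # Floyd-Steinberg weights: 7/16 to the right, then
            # 3/16, 5/16, 1/16 to the three pixels in the row below.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


if __name__ == "__main__":
    gray = [[0.5] * 16 for _ in range(16)]
    halftone = error_diffuse(gray)
    for row in halftone:
        print("".join("#" if v else "." for v in row))
```

The two nested loops are the reason this is hard to parallelize: every pixel depends on the error from its left and upper neighbours, and that dependency is what block-based schemes have to break without leaving visible seams.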