Thursday, January 14, 2010

Feeding big iron

Back in the mid-eighties, things were looking good for big iron printers. Tibor Fisli was getting very nice uniform dots with his quad-spot laser diodes and Gary Starkweather was succeeding with his 4000 dpi follower to the Platemaker, while Nick Sheridon was cranking up the printer speed to 300 ppm. The challenge for us in the Computer Science Lab was to be able to drive this big iron at speed.

The graphic designers who were producing their material digitally on scanners from Crosfield, Hell, and Scitex were saddled with hardware that was much slower than the rate at which they could lay out a spread. Therefore, the next big investment of a successful pre-press house was the acquisition of a vector processor, which allowed feats like rotating an image. Parallel computing is key in the graphic arts and printing.

This told us that the way of the Dorado with its ECL logic was not the right way. The follower would be a multi-processor system with CMOS logic. Thus the Dragon was designed, and considerable effort went into simulations to balance the system architecture.

The simulations showed that scalability holds only up to 8 processors: adding more processors did not increase the Dragon's performance linearly. It was still a very powerful machine for its time.

Then came what Nathan likes to call a tenuki. Smart politicians realized that instead of out-braining the evil empire, they could simply out-spend it and destroy it that way. This marked the end of research and the beginning of out-sourcing. What counts is price, not performance, so everything was just done incrementally wherever the wages were lowest.

Until now. CMOS has hit the wall and is not getting faster, so we are back to multiprocessing, or in today's lingo, multi-cores. In the meantime, big iron has slowly kept growing:

[Image: Scitex and Indigo printers]

These are real beasts that manufacture new kinds of material designed on today's powerful workstations: posters 5 meters high and half a kilometer long, custom photo albums where each album has completely different pictures, or variable data print jobs where each piece is customized for the specific reader:

[Image: commercial and industrial printing]

When you use an industrial printer to print a building-wrap, or a commercial printer to print a million different magazines, you cannot trade complexity for time. The halftoned separations are so big you do not have time to wait for the bits to be served from a slow disk. You need to print in real-time.
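To get a feel for the data rates involved, here is a back-of-the-envelope calculation; all the figures in it (page size, resolution, speed) are illustrative assumptions, not the specifications of any particular press:

```python
# Back-of-the-envelope raster data rate for halftoned separations.
# All figures below are illustrative assumptions, not measured values:
# a letter-size page rasterized at 1200 dpi, 1 bit per pixel per
# separation, 4 separations (CMYK), printed at 300 pages per minute.
dpi = 1200
width_in, height_in = 8.5, 11.0   # page size in inches (assumed)
bits_per_pixel = 1                # halftoned: 1 bit per separation
separations = 4                   # CMYK
pages_per_minute = 300

pixels = (width_in * dpi) * (height_in * dpi)
bits_per_page = pixels * bits_per_pixel * separations
bytes_per_second = bits_per_page / 8 * pages_per_minute / 60

print(f"{bytes_per_second / 1e6:.0f} MB/s of raster data")  # → 337 MB/s
```

Even with these modest assumptions the press consumes a third of a gigabyte of raster every second, which is exactly why the bits cannot be served from a slow disk.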

How can you feed big iron?

Today's general purpose processors are not really well suited for rendering pixels. In fact, each is really a RISC wrapped in a CISC, and a lot of the chip surface is used to predict branches, cache loops, interpret complex instructions, etc. None of this is really needed when you stream a gazillion pixels through the system and apply the same rendering operations to all of them.

[Image: CPU core]

What you want is not a fancy core with most of the silicon just sitting there while you try to feed your big iron. It is better to have a simple basic processor, but to have a lot of them, like the vector processors of yore.


Well, an important computer application is games, and gamers have rendering requirements similar to ours; but there are many more of them, so GPUs are inexpensive consumer products.

Until a short time ago, GPUs were very specialized, but their architecture has changed considerably in the last few years and they have become programmable. The latest crop, combined with OpenCL, is actually an array of simple general purpose processors that can be programmed to render all the pixels required to feed the big iron.
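The programming model is easy to sketch: an OpenCL kernel is a small function applied independently to every pixel. The gamma-correction kernel below is a made-up example, written in Python rather than OpenCL C, just to show the shape of the model:

```python
def kernel(pixel, gamma=2.2):
    # one work-item per pixel: no shared state, no dependence on
    # neighbors, so thousands of copies can run in lockstep on a GPU
    return pixel ** (1.0 / gamma)

def launch(kernel, pixels):
    # stand-in for an OpenCL kernel launch; executed serially here,
    # but every iteration is independent by construction
    return [kernel(p) for p in pixels]

out = launch(kernel, [0.0, 0.25, 1.0])
```

Because the kernel touches only its own pixel, the hardware is free to run as many instances in parallel as it has cores.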

I have oversimplified a bit. In fact, the print job comes in the form of a PDF file, and rendering is not the only task of a RIP. There are operations like interpretation that cannot be parallelized at the pixel level and must be executed serially; these run on general purpose cores and are parallelized at a coarser grain, where for example each core works on a different page or tile.
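A minimal sketch of this coarser-grained parallelism, assuming a hypothetical render_page function that stands in for interpreting and rasterizing one page (threads are used only to keep the sketch self-contained; a production RIP would dedicate a process or core to each page):

```python
from concurrent.futures import ThreadPoolExecutor

def render_page(page_number):
    # stand-in for interpreting the PDF page and rasterizing it;
    # the serial interpretation work happens entirely inside this call
    return f"raster for page {page_number}"

def rip_job(page_count, workers=4):
    # parallelism is across whole pages, not across pixels
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_page, range(page_count)))

rasters = rip_job(3)
```

Tile-level parallelism works the same way, with a tile index taking the place of the page number.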

To run the big industrial and commercial jobs, we not only need scalability down, but we also need scalability up, because there is a limit on the number of cores in a system and we would like to have multiple systems working on the same job. This is achieved with mapReduce algorithms:
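The pattern behind this can be sketched in a few lines; the names below are illustrative, not the API of any real framework. The map phase renders tiles, potentially on different machines, and the reduce phase stitches the results back together:

```python
from functools import reduce as fold

def map_phase(tiles, render):
    # in a real deployment each call would run on a different machine
    return [render(tile) for tile in tiles]

def reduce_phase(rendered):
    # stitch the rendered tiles back into one band
    return fold(lambda acc, part: acc + part, rendered, [])

band = reduce_phase(map_phase(["t0", "t1", "t2"], lambda t: [t.upper()]))
```

Because the map phase has no shared state, adding machines scales the job up; because each tile is small, the same code scales down to a single box.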

[Image: mapReduce flow]

In summary, to feed big iron with jobs like 5-by-500-meter posters or 1 million different book pages you need scalability, and you need to be able to scale up as well as down:


To learn how to achieve this, you may want to attend the Electronic Imaging Symposium in San Jose next week, where, in the conference Color Imaging XV: Displaying, Processing, Hardcopy, and Applications, in the session on Color Reproduction and Printing, John L. Recker will present all the gory details.