Tuesday, September 22, 2009

Ripping efficiency

The tool for deciding informatics infrastructure investments is benchmarking. In the past, the game was to invent a new performance metric, create mind-share for its superiority, and then optimize one's offerings to become the best platform according to the new metric.

Today the game is much more complex. Integrated circuits have hit the wall and it is no longer possible to simply increase the clock speed. There is also the new requirement of both upwards and downwards scalability: an architecture must also be applicable to cluster computing (or cloud computing) and to mobile devices. One of the hot themes in benchmarking is the comparison of CPUs to GPUs, as we are now experiencing a return of array co-processors in the form of programmable GPUs.

Recently David Kanter wrote the interesting article "Computational Efficiency in Modern Processors" in Real World Technologies. He uses double precision (DP) GFLOP/s per W and DP GFLOP/s per mm² as the metrics. Since all integrated circuits are based on the same physics, it is not surprising that these metrics give no new insights for classical designs. They do show, however, that radically different paradigms like VLIW in AMD's approach give the edge to a handful of circuits in terms of computational efficiency: ATI's RV670 and RV770, IBM's PowerXCell 8i and Intel's Atom.

For color image processing, and more specifically ripping, it is more interesting to look at the overall efficiency of a RIP (raster image processor) system, also known as a DFE (digital front end) in the trade. Currently our colleague I-Jong Lin is in Louisville, Kentucky for IS&T's 25th Non-Impact Printing conference, also known as NIP. There he is presenting a first paper, "Proposal for Next Generation Print Infrastructure: Gutenberg-Landa TCP/IP", in which he discusses GPU-based RIPs.

He explains that graphics processing units (GPUs) are special-purpose coprocessors originally targeted at the PC gaming market. The enormous size and volume of the gaming market have driven GPU capabilities up and costs down, making GPUs a viable and well-supported parallel computing architecture. Today, the cost of a GPU is so low that even handheld devices have one.

Besides the cost reduction from using commodity hardware instead of high-performance components, an array of simple CPUs uses less power than an array of full-fledged CPUs ripping for the same high-speed printer. Let us compare an HP SmartStream Production Pro Print Server to the proposed GPU-RIP system. Since the volume of the two systems is almost the same, we do not need a full exergy destruction analysis and can just compare the CO2 footprint of the two systems.

We use the HP Power Calculator Utility and the HP workstation quick specs to compile the energy usage data. The total amount of power a device requires from the facility AC feed is known as apparent power and is measured in volt-amperes (VA). The capacity of cooling systems is conventionally rated in British Thermal Units per hour (BTU/h). The amount of power (in watts) consumed by the equipment determines the number of BTU/h required for component cooling.
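As a rough illustration of how these quantities relate, the sketch below converts real power to a cooling load using the standard factor of about 3.412 BTU/h per watt, and derives real power from apparent power. The function names and the power factor argument are illustrative assumptions; they are not taken from the HP Power Calculator Utility.

```python
BTU_PER_HOUR_PER_WATT = 3.412  # 1 W is about 3.412 BTU/h


def watts_to_btu_per_hour(watts: float) -> float:
    """Cooling load (BTU/h) needed to remove `watts` of dissipated power."""
    return watts * BTU_PER_HOUR_PER_WATT


def real_power_watts(volt_amperes: float, power_factor: float = 1.0) -> float:
    """Real power (W) for a given apparent power (VA).

    power_factor is an assumed value in (0, 1]; 1.0 treats VA as W.
    """
    if not 0.0 < power_factor <= 1.0:
        raise ValueError("power factor must be in (0, 1]")
    return volt_amperes * power_factor


if __name__ == "__main__":
    # Hypothetical device: 400 VA rating, assumed power factor of 0.95.
    watts = real_power_watts(400, 0.95)
    print(f"{watts:.0f} W -> {watts_to_btu_per_hour(watts):.0f} BTU/h")
```

A power factor of 1.0 gives an upper bound on the real power, which is the simplification used when a VA rating is read directly as watts.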

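| units | component | speed [GHz] | procs | RAM [GB] | PCI cards | HDD [GB] | cooling [BTU/h] | VA rating |
|-------|-----------|-------------|-------|----------|-----------|----------|-----------------|-----------|
| 6     | DL360G5   | 3.2         | 1     | 4        | 2         | 146      | 1135            | 353       |
| 5     | DL380G5   | 3.2         | 1     | 4        | 2         | 146      | 1178            | 362       |
| 1     | MSA50     |             |       |          |           |          | 298             | 87        |
| 1     | Procurve  |             |       |          |           |          | 1382            | 405       |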

The table above lists the data for the conventional digital front-end (DFE) RIP. Multiplying each component's ratings by the number of units of that component and adding up yields a total of 14,380 BTU/h and a VA rating of 4,420.4. The table below lists the data for our experimental GPU-RIP; the same calculation yields a total of 4,200.96 BTU/h and a VA rating of 1,233.36.

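| units | component | speed [GHz] | procs | RAM [GB] | PCI cards | HDD [GB] | cooling [BTU/h] | VA rating |
|-------|-----------|-------------|-------|----------|-----------|----------|-----------------|-----------|
| 1     | ML370G5   | 3           | 3     | 4        | 2         | 72       | 913             | 270       |
| 4     | Z800      | 2           | 2     | 3        | 1         | 250      | 822             | 241       |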
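The totals quoted above can be checked with a few lines of Python. This is a minimal sketch that uses the per-unit ratings as printed in the two tables; the `Component` type and the `totals` helper are names made up for this illustration. Because the printed ratings are rounded, the sums come out at 14,380 BTU/h and 4,420 VA for the DFE-RIP and at 4,201 BTU/h and 1,234 VA for the GPU-RIP, close to, but not identical with, the unrounded totals given in the text.

```python
from typing import NamedTuple


class Component(NamedTuple):
    units: int
    name: str
    btu_per_hour: int  # per-unit cooling load, as printed in the table
    va: int            # per-unit apparent power, as printed in the table


DFE_RIP = [
    Component(6, "DL360G5", 1135, 353),
    Component(5, "DL380G5", 1178, 362),
    Component(1, "MSA50", 298, 87),
    Component(1, "Procurve", 1382, 405),
]

GPU_RIP = [
    Component(1, "ML370G5", 913, 270),
    Component(4, "Z800", 822, 241),
]


def totals(system):
    """Return (total BTU/h, total VA) for a list of components."""
    btu = sum(c.units * c.btu_per_hour for c in system)
    va = sum(c.units * c.va for c in system)
    return btu, va


if __name__ == "__main__":
    for label, system in (("DFE-RIP", DFE_RIP), ("GPU-RIP", GPU_RIP)):
        btu, va = totals(system)
        print(f"{label}: {btu:,} BTU/h, {va:,} VA")
```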

Assuming the RIP will be operated in a factory building, it will probably not need special cooling. Therefore, we can limit our calculations to the total system VA rating. Assuming a continuous full-capacity workload for an entire year, the DFE-RIP will consume 38,723 kWh and generate 23,234 kg of CO2. For the GPU-RIP the corresponding numbers are 10,804 kWh and 6,483 kg, i.e., in a year the GPU-RIP will generate 16,751 kg less CO2. To put this number in context, we used an online CO2 footprint calculator and entered the largest car owned by any of the authors and the miles it was driven over the last year. The calculator determined that this car generated 1.52 tonnes of CO2. Thus, with the CO2 saved by the GPU-RIP in a year, its owner can drive for about 11 years, or if just two of you esteemed readers deploy a GPU-RIP instead of a DFE-RIP, in one year you will have offset his total lifetime CO2 footprint so far.
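The energy and CO2 figures can be reproduced with the sketch below. It rests on two assumptions that are inferred from the quoted numbers rather than stated in the text: the VA rating is read directly as watts (a power factor of 1), and the emission factor is 0.6 kg of CO2 per kWh, which is the ratio of 23,234 kg to 38,723 kWh. The function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # continuous operation, non-leap year
KG_CO2_PER_KWH = 0.6       # assumption: inferred from the quoted figures


def annual_kwh(va_rating: float) -> float:
    """Annual energy in kWh, treating the VA rating as watts."""
    return va_rating * HOURS_PER_YEAR / 1000.0


def annual_co2_kg(va_rating: float, factor: float = KG_CO2_PER_KWH) -> float:
    """Annual CO2 emissions in kg for a given emission factor (kg/kWh)."""
    return annual_kwh(va_rating) * factor


if __name__ == "__main__":
    systems = {"DFE-RIP": 4420.4, "GPU-RIP": 1233.36}  # VA totals from the text
    for label, va in systems.items():
        print(f"{label}: {annual_kwh(va):,.0f} kWh, {annual_co2_kg(va):,.0f} kg CO2")
    saved = annual_co2_kg(systems["DFE-RIP"]) - annual_co2_kg(systems["GPU-RIP"])
    print(f"CO2 saved per year: {saved:,.0f} kg")
```

A different grid mix or a measured power factor would change both parameters, so the absolute values should be read as estimates; the ratio between the two systems does not depend on the emission factor.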

These are the numbers we have so far. In truth, the assumption of a continuous full-capacity workload for an entire year is not realistic. In a second paper, titled "Numerical Simulation and Analysis of Commercial Print Production Systems", I-Jong Lin presents a new simulation system that will allow us to make realistic predictions of ripping efficiency, with the CO2 footprint as the performance metric.