Tuesday, March 23, 2010

vicissitudes of a color book

Books can do amazing things. Last October, in his post To Bits and Back Again, Nathan picked up a story on how I had to clean up the clutter on my desk. He described how Google's Dan Bloomberg recycled my book clutter through his secret π machine for the edification of readers on the whole planet. Here is the next installment of the saga:

Lucia Ronchi and Jodi Sandford have been at it again and authored a book on the color of food based on Artusi's magnum opus and two psychophysics experiments on color naming. Satisfied with the World Wide Web success of their eccentric blue, Prof. Ronchi announced that she would send me a copy of their latest baby so I could put it through the same process.

Dan's software for turning atoms back into bits (at least when the atoms are books) is a tremendous piece of technology. It worked like a charm to salvage the atoms in my recycling bin, but that was an accident. When the process can be planned, is there a better way, one without transmutations?

Timidly, I asked Prof. Ronchi whether somewhere in the publication process there might be a PDF file, and whether it would be accessible. She promptly emailed me two PDF files, one with the jacket and one with the contents of the book. A quick look at the document properties revealed they were produced with InDesign, so I was looking at a top-quality professional job.

I quickly grabbed my credit card to buy Simoncini's Garamond font family so I could rearrange the pieces in Illustrator the way Google Books likes them, then sent the bits back to Prof. Ronchi, whose son Curzio promptly uploaded them to Google's servers and completed all the copyright release steps.

Yet when I searched for the book, Google could not find it. That book does not exist, claims Google, but Curzio rebuts: "It does; it shows up every time." When he sends me the link, I can read the book, but I still cannot find it by searching. What does this mean?

There appears to be a discontinuity in Google's cloud: the search engine finds the book when the query comes from a computer with an IP address in Italy, but not when it comes from my IP address here in the USA.

If you, dear reader, cannot find it, at least you can use this blog as a springboard to navigate to the book. Here is the link:

Lucia Ronchi and Jodi Sandford: Traditional vocabulary of italian cuisine and of its color

There is one more discontinuity. The bits Curzio uploaded were of the highest possible quality, because the PDF file contained the text's characters plus the subsetted Simoncini Garamond font, along with the formatting instructions. I was therefore surprised to see that the PDF file served by Google Books has been rasterized.

Sure, the rasterization is at 600 dpi, so it prints well, but why throw away information? Non-experts can no longer copy and paste the text (hint: you have to OCR it again!).
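To get a feel for why a 600 dpi raster needs aggressive compression, here is a back-of-the-envelope Python sketch of the uncompressed size of a single bilevel (1 bit per pixel) page. The 6 x 9 inch trim size is a hypothetical value, since the book's actual page size is not stated, and the function name is illustrative only.

```python
def bilevel_page_bytes(width_in: float, height_in: float, dpi: int = 600) -> int:
    """Bytes needed to store one uncompressed 1-bit-per-pixel page raster."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Pad each row to a whole number of bytes, as most 1-bit raster formats do.
    row_bytes = (width_px + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    # Hypothetical 6 x 9 inch trim size; the book's real page size is not given.
    size = bilevel_page_bytes(6, 9)
    print(f"{size} bytes = {size / 1024:.0f} KB per page, before compression")
```

For that hypothetical page the sketch prints 2430000 bytes = 2373 KB, so a 600 dpi raster of a whole book only stays small thanks to a very effective compressor.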

Could it be the file size? The file I created is 380 KB, and Acrobat's PDF Optimizer can squeeze it a little, to 372 KB. Poking into Google's rasterized file with Quite a Box of Tricks shows that it is much smaller, at 276 KB, and that it is compressed with JBIG2, a bilevel format (hence the loss of the red in the title); Adam Langley has done an excellent job on the codec. But what are 104 KB to Google's cloud? Should Adobe's optimizer do a better job? Is Google's workflow too inflexible?
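Why does the red disappear? JBIG2 encodes bilevel (black-and-white) images, so every color pixel must be reduced to one bit before encoding. The Python sketch below assumes a simple luma threshold; the BT.601 weights, the threshold of 128, and the hypothetical red value of the title are illustrative assumptions, not Google's actual binarization step, which is not documented here.

```python
def to_bilevel(rgb: tuple[int, int, int], threshold: int = 128) -> int:
    """Map an 8-bit RGB pixel to 1 (black) or 0 (white) by thresholding its luma."""
    r, g, b = rgb
    # ITU-R BT.601 luma weights, a common choice for RGB-to-gray conversion.
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    return 1 if luma < threshold else 0


# Hypothetical pixel values; the title's exact red is an assumption.
PIXELS = {
    "black text": (0, 0, 0),
    "white paper": (255, 255, 255),
    "red title": (200, 30, 40),
}

if __name__ == "__main__":
    for name, rgb in PIXELS.items():
        print(f"{name}: {rgb} -> {'black' if to_bilevel(rgb) else 'white'}")
```

Under these assumptions the red title pixel has a luma of about 82, well below the threshold, so it comes out black, indistinguishable from the body text.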

The old workflow was to print the book in Florence, airmail it to Palo Alto, drive it to Mountain View, destructively scan it, compress and OCR it, then make it available in the pangalactic cloud. Is the new workflow to email the bits to Palo Alto, print them, and then continue with the old workflow?

What am I missing?