Recently a friend showed me on YouTube a movie called Silicon Valley. The movie mostly introduced characters and their environment, finishing without a conclusion, so I suspect it is an episode from a TV series. The setting is a stereotype of the Web 2.0 Silicon Valley, and a good part of the plot took place in a company called Hooli, a mini version of the real-world Google.
If you live in Silicon Valley the movie might be boring. However, the technology the main character is supposed to have invented got my attention: a lossless compression algorithm for audio files that can achieve a compression rate of 1:100. Of course, this is impossible as described in the movie, because on a typical file the lossless compression rate using the Deflate algorithm (Lempel-Ziv followed by Huffman coding) is about 1:3. The description in the movie is impossible because typical audio files do not contain enough redundancy; their entropy is too high to allow such a rate.
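The entropy limit is easy to demonstrate empirically. Here is a small sketch of my own (not from the movie or any product) using Python's zlib module, which implements Deflate:

```python
import os
import zlib

# Random bytes already have maximal entropy: nothing to squeeze out.
noise = os.urandom(100_000)

# Highly redundant data: the same phrase repeated over and over.
text = b"the quick brown fox jumps over the lazy dog " * 2_000

for name, data in [("random noise", noise), ("repeated text", text)]:
    ratio = len(data) / len(zlib.compress(data, 9))
    print(f"{name}: {ratio:6.1f}:1")
```

On the random data Deflate cannot even reach 1:1, while the redundant text shrinks by well over an order of magnitude; sampled audio sits much closer to the random end of this scale, which is why a lossless 1:100 is out of reach.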
Indeed, the writer forgot a qualifier such as "perceptually", or better, should have written about listening performance.
On that, my colleagues and I happen to have a couple of patents, namely US 5,883,979 A Method for selecting JPEG quantization tables for low bandwidth applications and US 5,850,484 A Machine for transmitting color images.
Facsimile (fax) is an old technology for transmitting images over phone lines that is probably alien to today's readers. In an analog fax machine, the page of a document was affixed to a metal cylinder. On the sender side, a head would shuttle along the cylinder in the fast scan direction, and at the end of each pass it would shuttle back while the cylinder rotated by one scan line in the slow scan direction. During each pass, a photosensor in the head would produce a sound on the phone line each time it encountered a dark spot on the page.
At the receiving end, a similar machine would move synchronously and each time a sound arrives in the phone line, it would produce a spark that would burn a dark spot in the paper.
This process was extremely slow, so it would be used only for dense documents. For text documents one would retype the document on a telex machine, which produces a legally valid copy of the text.
Forty years ago I was using fax all the time. When as a field engineer I had an OS crash I could not figure out, I would print out the core dump as a hexadecimal string and fax it from Zürich to Goleta, where the R&D division was. At the time email was not encrypted and people at any forwarding node could and did read the messages. Furthermore, telex was a European thing that was not commonly used in the USA.
A revolution happened in 1964, when Xerox invented the telecopier, which was based on a digital fax technology. The machine would convert the photosites into zeros and ones and store them in a buffer as a digital string. This string would be compressed before being transmitted. There was a hierarchy of compression algorithms using 1-D coding, 2-D coding, or pattern matching, with names like MH (ITU-T T.4), MR, MMR (T.6), and JBIG (T.85).
With a digital signal that could be compressed using mathematical algorithms, the transmission time dropped dramatically from an hour to under two minutes per page with a typical 9600 baud modem of the time. A dozen years after the Xerox telecopier, Japanese companies were producing very affordable fax machines that became ubiquitous. In Japan, every household had a fax machine, because you could handwrite kanji text on a sheet of paper and fax it, while typing kana was rather slow.
In 1994 I joined a team inventing the color fax technology. The international effort took place under the ITU umbrella as T.42. For the color encoding we used CIELAB, because being perceptually uniform it allowed the most compact representation. For the spatial encoding we used JPEG.
At that time, digital color imaging was still in its infancy (in Windows 3.1 you could only have 16 device colors by default) and the early inkjet printers were fuzzy, as were the early color scanners of the time. The signal processing researchers on the team applied spatial filters to improve the quality of the images, but this actually made the images look worse because the compression artifacts were being amplified.
I had the crazy idea of transforming the sharpening algorithm itself to the cosine domain. There the sharpening function could be expressed as a transformation of the DQT, the quantization tables for the 64 kernels of the discrete cosine transform. We called this image processing in the compressed domain, and essentially it consisted of lying about the DQT: for the JPEG encoding we used DQTs optimized for the input image, while the DQT included in the JPEG file was a transformed DQT that incorporated the sharpening. This is the essence of patent US 5,850,484.
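A toy numpy sketch of the idea (the table values and sharpening gains are made up for illustration; they are not the ones from the patent or the JPEG annex): the encoder quantizes with the real table, but writes a transformed table into the file, so any standard decoder applies the sharpening gains for free when it dequantizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical flat 8x8 quantization table, for illustration only.
Q_enc = np.full((8, 8), 16.0)

# Per-frequency sharpening gains: boost higher spatial frequencies more.
u = np.arange(8)
S = 1.0 + 0.1 * (u[:, None] + u[None, :])  # 1.0 at DC, up to 2.4 at (7,7)

# The file "lies": it carries the transformed table instead of Q_enc.
# (Real DQT entries are integers; rounding is omitted in this sketch.)
Q_emb = Q_enc * S

# Encoder: quantize the DCT coefficients of a block with the real table.
coeffs = rng.normal(0.0, 200.0, (8, 8))
q = np.round(coeffs / Q_enc)

# Any standard decoder: dequantize with the embedded table.
decoded = q * Q_emb

# Net effect: every nonzero frequency comes out amplified by its gain in S,
# i.e. the sharpening happened in the compressed domain.
faithful = q * Q_enc
nz = q != 0
assert np.allclose(decoded[nz] / faithful[nz], S[nz])
```

The design choice this illustrates: no extra pass over the pixels is needed on either end, because the filter is folded into numbers the decoder was going to multiply by anyway.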
Office documents consist of a combination of text and image data or mixed raster content (MRC, see here), so we would segment the document stripe by stripe and compress the foreground for example with JBIG, the mask with MMR and the background with JPEG. The ITU standards were T.44 for MRC and T.43 for JBIG in CIELAB.
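A minimal sketch of the three-layer split (my own toy segmentation by a luminance threshold; real T.44 encoders work stripe by stripe with much smarter segmenters):

```python
import numpy as np

def mrc_layers(rgb, threshold=128):
    """Toy MRC decomposition of an H x W x 3 uint8 page.

    mask: 1 bit/pixel selector (text/line art) -> e.g. MMR
    fg:   colors of the masked pixels          -> e.g. JBIG
    bg:   continuous-tone remainder            -> JPEG
    """
    luma = rgb @ np.array([0.299, 0.587, 0.114])
    mask = luma < threshold
    fg = np.where(mask[..., None], rgb, 0)
    # Fill the "holes" left by the text with white so JPEG has a
    # smooth background to compress.
    bg = np.where(mask[..., None], 255, rgb)
    return mask, fg, bg

# A white page with a black "text" block:
page = np.full((64, 64, 3), 255, dtype=np.uint8)
page[10:20, 10:40] = 0
mask, fg, bg = mrc_layers(page)
```

Each layer then goes to the codec that suits it, which is the whole point of MRC: one codec per kind of content instead of one compromise codec for the whole page.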
Even so, transmitting the test targets (e.g., 4CP01) over a 9600 baud line would take 6 minutes per page, which in 1994 was considered unacceptable. At that time the experience was that when a device transitions from black-and-white to color, the price could be at most 25% more and the performance would have to be the same. We felt that a color fax could not take longer than 2 minutes per page on a 9600 baud connection. We achieved 90 seconds.
This prompted us to investigate perceptually lossy compression. In lossless compression, after decompression we obtain exactly the same data as in the input file. In perceptually lossless compression like JPEG or MPEG-2 Audio Layer III (a.k.a. MP3), after decompression we do not obtain the original data, but we cannot perceive the difference. In other words, we leave out the information we cannot perceive anyway. The cosine transform makes the discretization straightforward.
This is like color encoding, where we can transform the images to the CIELAB color space: because it is perceptually uniform and one unit corresponds approximately to a JND (just noticeable difference), we can discretize from floating point to integer without perceiving a difference.
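As a sketch of that argument (the step of 1 ΔE ≈ 1 JND is the assumption from the text, and the sample color is arbitrary):

```python
import numpy as np

def quantize_lab(lab, step=1.0):
    """Round CIELAB values to multiples of `step`. With step = 1,
    roughly one JND in a perceptually uniform space, the rounding
    error never exceeds step/2, i.e. it stays below the threshold
    of a just noticeable difference."""
    return np.round(np.asarray(lab) / step) * step

lab = np.array([53.23, 80.11, 67.22])  # some color in CIELAB
q = quantize_lab(lab)
err = np.abs(q - lab)                  # each component off by < 0.5
```

This is why the choice of a perceptually uniform space matters: in a non-uniform space like RGB, a fixed quantization step would be invisible in some regions of the gamut and objectionable in others.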
Staying with color, the next step is to discretize the colors further, so that we can perceive a difference (perceptually lossy), but our ability to make correct decisions based on the degraded images is not impaired. This led us to color consistency and to using color names to compare colors, which is related to cognitive color and categorization.
The analogue for the text in mixed documents is reading efficiency, i.e., the degradation must not reduce our reading performance, measured as reading speed and the ability to read without errors. This is covered by patent US 5,883,979, which I explained in this SPIE paper:
Giordano B. Beretta, Vasudev Bhaskaran, Konstantinos Konstantinides, and Balas R. Natarajan, "Perceptually lossy compression of documents," Proc. SPIE 3016, Human Vision and Electronic Imaging II, 126 (June 3, 1997); doi:10.1117/12.274505; http://dx.doi.org/10.1117/12.274505.
This is a long explanation that you cannot fit in a movie, but at least the script writer should have added the qualifier perceptual to the algorithm's description; it would all have been more plausible.
If the invention is sufficiently novel that it can become the basis for a plot in a Hollywood movie twenty years later, why was my professional career a failure? As it happens, 1994 was also the time when the Internet became available to the general public and everybody went on email. An email attachment is more convenient than a separate fax machine, especially in a cramped Japanese house. Also, the Internet eventually moved to fiber to the home (FTTH) instead of the slow copper phone lines used by phone and fax.
Timing is everything.