Monday, April 24, 2017

Juggling Tools

Discussions about imaging invariably mention imaging pipelines. A simple pipeline to transform the image data to a different color space may have three stages: a lookup table to linearize the signal, a linear approximation to the second color space, and a lookup table to model the non-linearity of the target space. As an imaging product evolves, engineers add more pipeline stages: tone correction, gamut mapping, anti-aliasing, de-noising, sharpening, blurring, etc.

In the early days of digital image processing, researchers quickly realized that imaging pipelines should be considered harmful because, due to discretization, at each stage, the resulting image space became increasingly sparse. However, in the early 1990s, with the early digital cameras and consumer color printers, imaging pipelines came back. After some 25 years of experience, engineers have become more careful with the pipelines, but they are still a trap.
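The loss of levels is easy to reproduce. The following is a minimal sketch, assuming 8-bit codes and two hypothetical lookup tables: a gamma 2.2 decode and its inverse encode. The names `make_lut` and `apply_pipeline` are illustrative only. In exact arithmetic the two stages cancel each other. Because each table rounds to integer codes, the round trip no longer reaches all 256 output levels.

```python
def make_lut(f, levels=256):
    """Tabulate f on [0, 1] and quantize the result to integer codes."""
    top = levels - 1
    return [round(top * f(i / top)) for i in range(levels)]


# Hypothetical stages (assumption): gamma 2.2 decode, then gamma 2.2 encode.
linearize = make_lut(lambda x: x ** 2.2)
encode = make_lut(lambda x: x ** (1 / 2.2))


def apply_pipeline(code, stages):
    """Push one integer code through each lookup table in turn."""
    for lut in stages:
        code = lut[code]
    return code


# Collect every output code that is still reachable after both stages.
outputs = {apply_pipeline(c, [linearize, encode]) for c in range(256)}
distinct_levels = len(outputs)
print(f"{distinct_levels} of 256 levels survive the round trip")
```

Every additional quantized stage can only keep or shrink the set of reachable codes, which is why long pipelines of lookup tables leave the image space increasingly sparse.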

In data analytics, people often make a similar mistake. There are also three basic steps, namely data wrangling, statistical analysis, and presentation of the result. As development progresses, the analysis becomes richer; when the data is a signal, it is filtered in various ways to create different views, statistical analyses are applied, the data is modeled, classifiers are deployed, estimates and inferences are computed, etc. Each step is often considered a separate task, encapsulated in a script that parses a comma-separated values (CSV) data file, calls one or more functions, and then writes out a new CSV file for the next stage.

The pipeline is not a good model to use when architecting a complex data processing endeavor.

I cannot remember if it was 1976 or 1978 when at PARC the design of the Dorado was finished and Chuck Thacker hand-wrote the first formal note on the next workstation: the Dragon. While the Dorado had a bit-sliced processor in ECL technology, the Dragon was designed as a multi-processor full-custom VLSI system in nMOS technology.

The design was much more complex than any chip design that had been previously attempted, especially after the underlying technology was switched from nMOS to CMOS. It became immediately evident that it was necessary to design new design automation (DA) tools that could handle such big VLSI chips.

A system based on full-custom VLSI design was a sequence of iterations of the following steps: design a circuit as a schematic, lay out the symbolic circuit geometry, check the design rules, perform logic and timing analysis, create a MOSIS tape, debug the chip. Using stepwise refinement, the process was repeated at the cadence of MOSIS runs. In reality, the process was very messy, because, at the same time, the physicists were working on the CMOS fab, the designers were creating the layout, the DA people were writing the tools, and the system people were porting the Cedar operating system. Just in the Computer Science Laboratory alone, about 50 scientists were working on the Dragon project.

The design rule checker Spinifex played a somewhat critical role, because it parsed the layout created with ChipNDale, analyzed the geometry, flagged the design rule errors, and generated the various input files for the logic simulator Rosemary and the timing simulator Thyme. Originally, Spinifex was an elegant hierarchical design rule checker, which made it possible to verify all the geometry for a layout in memory. However, with the transition from nMOS to CMOS, the designers moved more and more to a partially flat design, which broke Spinifex. The situation was exacerbated by the endless negotiations between designers and physicists to allow for exceptions to the rules, leading to a number of complementary specialized design rule checkers.

With 50 scientists on the project, ChipNDale, Rosemary, and Thyme were also rapidly evolving. With the time pressure of the tape-outs, there were often inconsistencies in the various parsers. As the whipping boy in the middle of all this, one morning, while showering, I had an idea. The concept of a pipeline was contra naturam compared to the work process. The Smalltalk researchers on the other end of the building had an implementation process where a tree structure described some gestalt and methods would be written that decorate this representation of the gestalt.

In the following meeting, I proposed to define a data structure representing a chip. Tools like the circuit designer, the layout design tool, and the routers would add to the structure while tools like the design rule checkers and simulators would analyze the structure, with their output being further decorations added to the data structure. Even the documentation tools could be integrated. I did not expect this to have any consequence, but there were some very smart researchers in the room. Bertrand Serlet and Rick Barth implemented this paradigm and project representation and called it Core.

The power was immediately manifest. Everybody chipped in: Christian Jacobi, Christian Le Cocq, Pradeep Sindhu, Louis Monier, Mike Spreitzer and others joined Bertrand and Rick in rewriting the entire tool set around Core. Bob Hagman wrote the Summoner, which summoned all Dorados at PARC and dispatched parallel builds.

Core became an incredible game changer. While before there was never an entirely consistent system, now we could do nightly builds of the tools and the chips. Besides, the tools were no longer broken at the interfaces all the time.

The lubricant of Silicon Valley is the brains wandering from one company to another. When one brain wandered to the other side of Coyote Hill, the core concept gradually became an important architectural paradigm that is at the basis of some modern operating systems.

If you are a data scientist, do not think in terms of scripts for pipelines connected by CSV files. Think of a core structure representing your data and the problem you are trying to solve. Think about literate programs that decorate your core structure. When you make the core structure persistent, think rich metadata and databases, not files with plain tables. Last but not least, your report should also be generated automatically by the system.
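As a minimal sketch of this advice, the following assumes a toy signal and uses only the Python standard library. The class `Core`, the decorating functions, and the single-table SQLite layout are all hypothetical names chosen for illustration; they are not the design of the PARC Core. Each tool reads the shared structure and adds a decoration to it instead of writing a new CSV file, the structure is persisted together with its metadata, and the report is generated from the decorations.

```python
import json
import sqlite3
import statistics
from dataclasses import dataclass, field


@dataclass
class Core:
    """Shared representation that every tool reads and decorates."""
    samples: list
    metadata: dict = field(default_factory=dict)
    decorations: dict = field(default_factory=dict)


def decorate_with_smoothing(core, window=3):
    """Add a moving-average view; the raw samples stay untouched."""
    s = core.samples
    core.decorations["smoothed"] = [
        statistics.fmean(s[i:i + window]) for i in range(len(s) - window + 1)
    ]


def decorate_with_summary(core):
    """Add basic statistics as a further decoration."""
    core.decorations["summary"] = {
        "n": len(core.samples),
        "mean": statistics.fmean(core.samples),
        "stdev": statistics.stdev(core.samples),
    }


def render_report(core):
    """Generate the report from the decorations, not from ad hoc files."""
    summary = core.decorations["summary"]
    title = core.metadata.get("title", "untitled")
    return (f"{title}: n={summary['n']}, "
            f"mean={summary['mean']:.2f}, stdev={summary['stdev']:.2f}")


def save(core, conn):
    """Persist data, metadata, and decorations together in a database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS core (part TEXT PRIMARY KEY, body TEXT)")
    for part in ("samples", "metadata", "decorations"):
        conn.execute("INSERT OR REPLACE INTO core VALUES (?, ?)",
                     (part, json.dumps(getattr(core, part))))
    conn.commit()


def load(conn):
    """Rebuild the core structure from the database."""
    rows = dict(conn.execute("SELECT part, body FROM core"))
    return Core(**{part: json.loads(body) for part, body in rows.items()})


core = Core(samples=[2.0, 4.0, 6.0, 8.0],
            metadata={"title": "demo signal", "unit": "V"})
decorate_with_smoothing(core)
decorate_with_summary(core)

conn = sqlite3.connect(":memory:")
save(core, conn)
restored = load(conn)
report = render_report(restored)
print(report)
```

Because every tool works on the same structure, adding a new analysis means adding one more decorating function; no parser or intermediate file format has to change.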

data + structure = knowledge

Thursday, April 20, 2017

Free Citizenship Workshop May 12

On May 12th the International Rescue Committee (IRC) is holding a free citizenship workshop hosted at and supported by Airbnb HQ located at 888 Brannan St. in San Francisco. The event starts at 1:30pm and ends at 4:30pm. Flyers are available online: English Flyer & Spanish Flyer. There will be free food and each client will be offered a $10 Clipper Card to help with transportation.

At the workshop, clients will get help from IRC to apply for citizenship (submit the N-400), submit a fee waiver request (it’s $725 to apply otherwise), and prepare for the naturalization test. All cases will be reviewed, filed, and expertly managed by an IRC Department of Justice accredited legal representative who will serve as the clients’ legal representative with USCIS, alert clients to updates in their cases, and advise them throughout the entire process. All services are free and the workshop is open to the public. Registration is required, but folks can choose to register online, by phone at (408) 658-9206, or by email. Lots of options!

The International Rescue Committee (IRC) is an international non-profit organization founded in 1933 at the request of Albert Einstein. IRC is at work in more than 40 countries and 28 U.S. cities and each year its programs serve 23 million people worldwide.

Thursday, April 13, 2017

Computational Imaging for Robust Sensing and Vision

In the early days of digital imaging, we were excited about having the images in numerical form and not being bound by the laws of physics. We had big ideas and quickly ran for their realization. However, we immediately reached the boundaries of the digital world: the computers of the day were too slow to process images, did not have enough memory, and the I/O was inadequate (from limited sensors to non-existing color printers).

The time has finally come when these dreams can be realized and computational color imaging has become possible, thanks to good sensors and displays, and racks full of general-purpose graphics processing units (GPGPUs) with hundreds of gigabytes of primary memory and petabytes of secondary storage. All this, at an affordable price.

On Wednesday, 12 April 2017, Felix Heide gave a talk at the Stanford Center for Image Systems Engineering (SCIEN) with the title Capturing the “Invisible”: Computational Imaging for Robust Sensing and Vision. He presented three implementations.

One application is image classification. In the last couple of years we have seen what is possible with deep learning when you have a big Hadoop server farm and millions of users who provide large data sets they carefully label, creating gigantic training sets for machine learning. Felix Heide uses Bayesian inference to implement a much better system that is robust and fast. It better leverages the available ground-truth and uses proximal optimization to reduce the computational cost.

To facilitate the development of new algorithms, Felix Heide has created the ProxImaL Python-embedded modeling language for image optimization problems, available from


Quantum imaging beyond the classical Rayleigh limit

A decade has passed since we were working on quantum imaging, as we reported in an article in the New Journal of Physics that was downloaded 2316 times. We had described the experimental set-up in a second article in Optics Express that was viewed 540 times. It is interesting that the second article was most popular in May 2016, indicating we were some 6 years ahead of time with this publication and over 10 years ahead when Neil Gunther started actively working on the experiment. The problem of coming too early is that it is more difficult to get funding.

Edoardo Charbon continued the research at the Technical University of Delft, where he built a true digital camera that used a built-in flash to create a three-dimensional model of the scene, and the sunlight to create a texture map of the image that could be mapped on the 3-d model. This is possible because the photons from the built-in flash—a chaotic light source that produces the photons from excited particles—and those from the sun—which is a thermal radiator (hot body)—have different statistics.

We looked at the first- and second-order correlation functions to distinguish the photons emitted by the flash from those originating in the sun. Since the camera controlled the flash, the photons' time of flight could be computed to create the 3-d model. The camera worked well up to a distance of 50 meters.
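The time-of-flight step reduces to simple arithmetic: the light travels to the scene and back, so the distance is half the round-trip time multiplied by the speed of light. The following minimal sketch shows only this conversion; the function names are hypothetical, and the photon correlation analysis and the actual camera electronics are not modeled.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second, in vacuum


def distance_from_round_trip(seconds):
    """Distance to the scene point for a measured flash round-trip time."""
    return SPEED_OF_LIGHT * seconds / 2.0


def round_trip_for_distance(meters):
    """Round-trip time a photon needs to reach a point and return."""
    return 2.0 * meters / SPEED_OF_LIGHT


# Timing budget at the 50 meter working distance mentioned above.
t_max = round_trip_for_distance(50.0)
print(f"round trip at 50 m: {t_max * 1e9:.1f} ns")
```

At 50 meters the round trip takes roughly a third of a microsecond, which gives a sense of the timing resolution such a camera needs.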

I am glad that Dmitri Boiko is still continuing this line of research. With a group at the Fondazione Bruno Kessler (FBK) in Trento, Italy and a group at the Institute of Applied Physics at the University of Bern in Bern, Switzerland, he is working on a new generation of optical microscope systems that exploit the properties of entangled photons to acquire images at a resolution beyond the classical Rayleigh limit.

Read the SPIE Newsroom article Novel CMOS sensors for improved quantum imaging and the open access invited paper SUPERTWIN: towards 100kpixel CMOS quantum image sensors for quantum optics applications in Proc. SPIE 10111, Quantum Sensing and Nano Electronics and Photonics XIV, 101112L (January 27, 2017).