Thursday, March 23, 2017

Breaking the barriers to true augmented reality

Today, when you run a job on a digital press, you just turn it on, load the stock, and start printing. An army of sensors and feedback loops, working with mathematical models, sets up the press. In the heyday of color printing, the situation was very different: skilled press operators would spend hours making the press ready, with only a densitometer and their eyes. It took years of experience to achieve master status.

A big breakthrough came in 1968, when Felix Brunner invented the print control strip, which turned press make-ready into a technical process instead of a magic ceremony. Felix Brunner lived in Corippo, Val Verzasca.

Corippo seen from Fiorenzo Scaroni's rustico in Lavertezzo. © 13 July 2003 by Yoko Nonaka

Corippo is a beautiful village, but it had been abandoned: people like Michael Silacci of the Opus One Winery descend from grandparents who came to California and never went back. Corippo is still the smallest municipality in Switzerland, with a population of just 13.

Corippo is so stunning that in 1975 it became a protected heritage village. This was quite difficult because the village had become dilapidated. Switzerland raised the funds to transform it into a state-of-the-art modern village that would attract sophisticated residents like Felix Brunner. The challenge was to rebuild it to modern architectural standards without changing its atmosphere and look.

The architecture department at the ETH in Zurich built a 3D model of the entire village; then, one by one, they started rebuilding the interiors of the houses to the state of the art. The department acquired an Evans and Sutherland Picture System, and at each planning step the commission walked through the virtual village to ascertain that nothing changed the spirit outdoors. For example, if a roof was raised, it was not allowed to cast new and unexpected shadows. If a window was changed, the character of the street was not allowed to change for a passerby, and the view had to feel original from any window.

Although the Picture System was limited to 35,000 polygons, the experience was truly impressive for the planners. If you have a chance to visit Corippo, you will be surprised by how well it was realized. The system was such a breakthrough for urbanists that UNESCO used it for the restoration of Venice. I was also sufficiently impressed to sit down and implement an interactive 3D rendering system, although on a PDP-11 with 56 KB of memory running RT-11, I could only display wireframes.
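
As a side note, here is roughly what wireframe rendering of that era amounted to: perspective-project the vertices of a model and emit its edges as 2D line segments. This is a minimal sketch in Python, not the original PDP-11 code; the screen size and focal length are illustrative.

```python
# Perspective-project a cube's vertices and list its edges as 2D segments.
# W, H (screen size in pixels) and F (focal length) are illustrative values.
W, H, F = 640, 480, 400.0

# A unit cube pushed 4 units in front of the camera; the vertex index bits
# encode which end of each axis the vertex sits on.
cube = [(x, y, z + 4.0) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]

# Two vertices form an edge when they differ in exactly one coordinate,
# i.e., when their indices differ in exactly one bit.
edges = [(a, b) for a in range(8) for b in range(a + 1, 8)
         if bin(a ^ b).count("1") == 1]

def project(point):
    # Simple pinhole projection onto the screen, y axis pointing up.
    x, y, z = point
    return (W / 2 + F * x / z, H / 2 - F * y / z)

for a, b in edges:
    (x0, y0), (x1, y1) = project(cube[a]), project(cube[b])
    print(f"line ({x0:.0f},{y0:.0f}) -> ({x1:.0f},{y1:.0f})")
```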

My next related experience was in 1993, when Canon had developed a wearable display and was looking for an acquirer of the rendering software. While the 1975 system for Corippo rendered coarse polygons, by the early 1990s it was possible to do ray tracing, although it took an SGI RealityEngine for each eye. One application was training astronauts to build a space station.

In the quest to find an interested party for the software, I had the chance to visit almost all the companies in the San Francisco Bay Area that were developing wearable displays. On one hand, using ray tracing instead of rendering plain solid-color polygons made the scene feel more natural; on the other, the big advantage over the Picture System was being immersed in the virtual scene instead of looking at a display.

There were still quite a few drawbacks. For one, the helmets felt like they were made of lead. The models were still crude: to follow head movements, the refresh rate should ideally have been 90 Hz, but even with simple scenes it was typically just 15 or 30 Hz. The worst perceptual problem, however, was the lag, which conflicted with the physiological equilibrium system and caused motion sickness. On the positive side, there was the transition from the dials and joysticks of 1975 to gloves providing a haptic user interface.
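
To get a feeling for why the lag was so disturbing, consider that the angular error is roughly the head's rotation speed times the motion-to-photon delay. The sketch below uses my own illustrative numbers (a moderate head turn and about two frame times of pipeline delay), not measurements from those systems.

```python
# Estimate how far the image trails the head at various refresh rates.
HEAD_SPEED = 100.0  # degrees per second, a moderate head turn (assumed)

for hz in (15, 30, 90):
    frame_ms = 1000.0 / hz
    lag_ms = 2 * frame_ms  # assume ~2 frames: tracking + rendering + scan-out
    error_deg = HEAD_SPEED * lag_ms / 1000.0
    print(f"{hz:2d} Hz: ~{lag_ms:5.1f} ms lag -> ~{error_deg:4.1f} deg behind the head")
```

At 15 Hz the world trails the head by more than ten degrees, which the equilibrium system cannot reconcile; at 90 Hz the error shrinks to a couple of degrees.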

People from my generation spent 13 years in school learning technical drawing, which allows us to mentally visualize a 3D scene from three orthographic projections or from an axonometric or perspective projection. In general, however, understanding a 3D scene from projections is difficult for most people. The value of an immersive display is that you can move your head and thus decode the scene more easily. Consequently, there is still high interest in wearable displays.
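
For readers who skipped those drawing classes, the following sketch shows what the three standard orthographic views are: each one simply drops one coordinate of every vertex. The box and the Python rendition are my illustration.

```python
# The eight corners of a 2 x 1 x 1 box.
vertices = [(0, 0, 0), (2, 0, 0), (2, 1, 0), (0, 1, 0),
            (0, 0, 1), (2, 0, 1), (2, 1, 1), (0, 1, 1)]

front = sorted(set((x, z) for x, y, z in vertices))  # viewed along the y axis
top   = sorted(set((x, y) for x, y, z in vertices))  # viewed along the z axis
side  = sorted(set((y, z) for x, y, z in vertices))  # viewed along the x axis

print("front:", front)
print("top:  ", top)
print("side: ", side)
```

Reading a drawing means running this in reverse: mentally intersecting the three views back into a solid, which is exactly what most people find hard.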

Today, a decent smartphone with CPU, GPU, and DSP has sufficient computing power to do all the rendering necessary for a wearable display. The electronics are so light that they fit in a pair of big spectacles that are relatively comfortable to wear and affordable for professionals to buy. Last year, Bernard Kress predicted that 2017 would be the year of the wearable display, with dozens of brands and prices affordable for consumers. Why is it not happening?

On March 14, 2017, Prof. Christian Sandor of the Nara Institute of Science and Technology (NAIST) gave a talk titled Breaking the Barriers to True Augmented Reality at SCIEN at Stanford, where he suggested the problem might be that today's developers are not able to augment reality so that the viewer cannot tell what is real. He showed the example of Burnar, where virtual flames are mixed with the user's hands; some users had to interrupt the experiment because their hands felt too hot.

Christian Sandor, Burnar

True AR has the following two requirements:

  1. an undetectable modification of the user's perception
  2. the goal of a seamless blend of the real and virtual worlds

On a spectrum from manipulating atoms (controlled matter) to manipulating perception (implanted AR), current systems should achieve surround AR (a full light-field display) or personalized AR (a perceivable subset). In a full light-field display, the display functions as a window, but with the problem of matching accommodation and vergence. Personalized AR is a smarter approach: the human visual system is measured and only a subset of the light field is generated, reducing the required display pixels by several orders of magnitude.
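
To see where the several orders of magnitude come from, here is a back-of-the-envelope estimate. The acuity, field-of-view, and view-count figures are rough assumptions of mine, not numbers from the talk.

```python
ACUITY = 60               # peak human acuity, ~60 pixels per degree (approx.)
FOV_H, FOV_V = 160, 135   # monocular field of view in degrees (approx.)

# Full light field: peak resolution everywhere, times a grid of viewpoints
# so that accommodation and vergence can match (an 8 x 8 grid is assumed).
views = 8 * 8
full_lightfield = (FOV_H * ACUITY) * (FOV_V * ACUITY) * views

# Personalized AR: one tracked viewpoint, full acuity only in the ~5 degree
# fovea, roughly a tenth of peak acuity in the periphery (assumed).
fovea = (5 * ACUITY) ** 2
periphery = (FOV_H * ACUITY // 10) * (FOV_V * ACUITY // 10)
personalized = fovea + periphery

print(f"full light field: {full_lightfield:.1e} pixels")
print(f"personalized AR : {personalized:.1e} pixels")
print(f"reduction       : {full_lightfield / personalized:,.0f}x")
```

Even with these crude assumptions, the saving comes out to a factor of a few thousand, consistent with the claim of several orders of magnitude.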

In many current systems, the part of the image generated from a computer model is just rendered as a semitransparent bluish overlay, hence it is perceived as separate from the real world. True AR requires a seamless blend. The most difficult step is the alignment calibration with the Single Point Active Alignment Method (SPAAM). The breakthrough from NAIST is that they need to perform SPAAM only once: after that, they use eye tracking for calibration.
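
At its core, SPAAM collects 2D-3D correspondences as the user repeatedly aligns an on-screen crosshair with a known world point, then solves for the 3x4 projection matrix that maps world points to display pixels. Below is a minimal sketch of that solve using the standard direct linear transformation, assuming numpy; the function name and the synthetic test values are mine.

```python
import numpy as np

def spaam_calibrate(points_3d, points_2d):
    """Estimate a 3x4 projection matrix from 2D-3D alignment pairs
    via the direct linear transformation (needs at least 6 pairs)."""
    A = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    # The matrix is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

# Minimal check against a synthetic projection (values are illustrative):
P_true = np.array([[800.0,   0.0, 320.0, 10.0],
                   [  0.0, 800.0, 240.0, 20.0],
                   [  0.0,   0.0,   1.0,  0.0]])
pts3 = np.random.default_rng(0).random((10, 3)) * 2 + [0.0, 0.0, 3.0]
proj = (P_true @ np.c_[pts3, np.ones(10)].T).T
pts2 = proj[:, :2] / proj[:, 2:]
P_est = spaam_calibrate(pts3, pts2)
print(np.round(P_est / P_est[2, 2], 2))  # recovers P_true up to scale
```

What makes the repeated procedure tedious is that every alignment requires the user to hold a precise pose; replacing the repetitions with eye tracking is what the NAIST result removes.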

The technology is hard to implement. The HoloLens has solved the latency problem, but Microsoft has invested thousands of man-years in developing the system. The optics are very difficult, and only a few universities teach the subject.