Thursday, October 13, 2016

Facebook Surround

Yesterday afternoon, Brian Cabral, Director of Engineering at Facebook, gave a talk at the Stanford Center for Image Systems Engineering (SCIEN) with the title "The Soul of a New Camera: The design of Facebook's Surround Open Source 3D-360 video camera." Here is his abstract:

Around a year ago we set out to create an open-source reference design for a 3D-360 camera. In nine months, we had designed and built the camera and published the specs and code. Our team leveraged a series of maturing technologies in this effort. Advances and availability in sensor technology, 20+ years of computer vision algorithm development, 3D printing, rapid design prototyping and computational photography allowed our team to move extremely fast. We will delve into the roles each of these technologies played in the designing of the camera, giving an overview of the system components and discussing the tradeoffs made during the design process. The engineering complexities and technical elements of 360 stereoscopic video capture will be discussed as well. We will end with some demos of the system and its output.

The design goals for the Surround were the following:

  • High-quality 3D-360 video
  • Reliable and durable
  • Fully spherical
  • Open and accessible
  • End-to-end system

These goals cannot be achieved by strapping together GoPro cameras: they get too hot, and it is very difficult to make them work reliably. Monoscopic capture is old and no longer interesting. The challenge for VR is to do it stereoscopically: we are interested in stereoscopic 3D-360 capture.

They are using 14 Point Grey cameras with wide angle lenses around the equator and a camera with a fisheye on the north pole. For the south pole they are using two fisheyes to get rid of the pole holding the Surround.

A rolling shutter is much worse in 3D than in 2D, so it is necessary to use a global shutter, at the expense of SNR. Brian Cabral discussed the various trade-offs between number and size of cameras, spatial resolution, wide angle vs. fisheye lenses and physical size.

Rapid prototyping has made a lot of progress, so today we can just try things out in the lab. For this application, the hardware is easy, but stitching together the images is difficult. The solution is to use optical flow and to simulate slit cameras.

No attempt is made to compress the data. The images are copied completely raw to a RAID of SSD drives. The rendering then takes 30 seconds per frame.
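To get a feeling for what 30 seconds per frame means in practice, here is a back-of-the-envelope sketch. The clip length and the frame rate are my assumptions for illustration; the talk only gave the per-frame rendering time.

```python
def render_hours(clip_seconds: float, fps: float, seconds_per_frame: float = 30.0) -> float:
    """Total rendering time in hours for a clip, at a fixed cost per frame."""
    frames = clip_seconds * fps
    return frames * seconds_per_frame / 3600.0


# Assumed example: a one-minute clip at an assumed 30 frames per second.
print(render_hours(60, 30))  # 1800 frames * 30 s = 54000 s = 15.0 hours
```

Under these assumptions, one minute of footage takes about 15 hours to render on a single machine.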

The Surround has been used for a multi-million dollar shot at Grand Central Station. The camera is being open sourced because so far it is only 1% of the solution and making it open will encourage many people to contribute to the remaining 99%.

At the end of the presentation, two VR displays were available to experience the result. I did not quite dare to strap in front of my eyes a recalled smartphone that can explode anytime, so I passed on the demo. However, the brave people commented that you can rotate your head but not move sideways, because the image falls apart. It was also commented that the frame rate should be at least 90 Hz. Finally, people reported vergence problems and slight nausea.

Facebook Surround kit

Dataset metadata for search engine optimization

Last week I wrote a post on metadata. Google is experimenting with a new metadata schema it calls Science Datasets that will allow it to make public datasets more discoverable.

The mechanism is under development and they are currently soliciting interested parties with the following kinds of public data:

  • A table or a CSV file with some data
  • A file in a proprietary format that contains data
  • A collection of files that together constitute some meaningful dataset
  • A structured object with data in some other format that you might want to load into a special tool for processing
  • Images capturing the data
  • Anything that looks like a dataset to you

In your metadata schema you can use any of the dataset properties, but it should contain at least the following basic properties: name, description, url, sameAs, version, keywords, and variableMeasured. If your dataset is part of a corpus, you can reference it in the includedInDataCatalog property.

There are also properties for download information, temporal coverage, spatial coverage, citations and publications, and provenance and license information.
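As a minimal sketch of what such a record could look like, the snippet below builds a schema.org Dataset description and serializes it as JSON-LD, one common way to publish schema.org markup. Every value (names, URLs, catalog) is a hypothetical placeholder, and the helper functions are mine, not part of any Google tool.

```python
import json

# The basic properties listed above.
BASIC_PROPERTIES = [
    "name", "description", "url", "sameAs",
    "version", "keywords", "variableMeasured",
]


def make_dataset_record() -> dict:
    """Return a hypothetical schema.org Dataset record (all values are made up)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": "Example color naming survey",
        "description": "Color terms collected in a hypothetical online survey.",
        "url": "https://example.org/datasets/color-naming",
        "sameAs": "https://example.org/mirror/color-naming",
        "version": "1.0",
        "keywords": ["color naming", "survey"],
        "variableMeasured": "color term",
        # Optional: only when the dataset is part of a corpus.
        "includedInDataCatalog": {
            "@type": "DataCatalog",
            "name": "Example data catalog",
        },
    }


def missing_properties(record: dict) -> list:
    """List the basic properties that are absent from a record."""
    return [p for p in BASIC_PROPERTIES if p not in record]


record = make_dataset_record()
print(json.dumps(record, indent=2))
print("missing:", missing_properties(record))
```

The check for missing properties is just a convenience for catching an incomplete record before it is published.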

This is a worthwhile effort to make your research and public datasets more useful to the community.


Thursday, October 6, 2016

Progress in wearable displays

Yesterday afternoon, Bernard Kress, Partner Optical Architect at Microsoft Corp, in the HoloLens project, gave a talk at the Stanford Center for Image Systems Engineering (SCIEN) with the title "Human-centric optical design: a key for next generation AR and VR optics." Here is the abstract:

The ultimate wearable display is an information device that people can use all day. It should be as forgettable as a pair of glasses or a watch, but more useful than a smartphone. It should be small, light, low-power, high-resolution and have a large field of view (FOV). Oh, and one more thing, it should be able to switch from VR to AR.

These requirements pose challenges for hardware and, most importantly, optical design. In this talk, I will review existing AR and VR optical architectures and explain why it is difficult to create a small, light and high-resolution display that has a wide FOV. Because comfort is king, new optical designs for the next-generation AR and VR system should be guided by an understanding of the capabilities and limitations of the human visual system.

There are three kinds of wearable displays:

  • Smart eyewear: extension of eyewear. Example: Google Glass
  • Augmented reality (AR) and mixed reality (MR): extension of the computer. An MR display has a built-in 3D scanner to create a 3D model of the world
  • Virtual reality (VR): extension of the gaming console

Bernard surveyed all avenues in wearable displays from their inception to the projections in the future. The speed of the presentation and the amount of material made it impossible to follow the talk unless you are an expert in the field. After the presentation, Bernard told me the size of his PowerPoint file is about 250 MB!

My takeaway was that the biggest issue in wearable displays is cost. So far, optics engineers have designed with cameras in mind and have over-designed. The current breakthrough is that optics engineers are starting to understand the human visual system (HVS), so they can design systems that are just as good as our modulation transfer function (MTF). Bernard claims that so far the industry has been mostly about hype but in 2017, products will take off and the new challenge is "show me the money."

By Microsoft Sweden [CC BY 2.0], via Wikimedia Commons

Tuesday, October 4, 2016


As Carlsson notes, big data is not about "big" but about complexity in format and structure. We can approach the format complexity through metadata, which allows us to navigate through the data sets and to determine what they are about.

Two important requirements on experiments are replicability and reproducibility. Replicability refers to the ability to rerun the exact data experiment to produce exactly the same result; it is an aspect of governance and it is good practice to always have somebody else check the data and its analysis before it is published. Reproducibility refers to the ability to use different data, techniques, and equipment to confirm the same result as previously obtained. We can be confident in a result only after it has been reproduced independently. These two requirements guide us to what kind of metadata we need.

There are three classes of metadata: context, syntax, and semantic.

Context of data refers to how, when, and where it was collected. The context is usually written in a lab book. If we need to replicate an analysis at a later time, the lab book might be unretrievable, therefore the context of data has to be stored with the data. This can also be a big money and time saver because some ancillary data we need for an analysis might already be available from a previous experiment; we need to be able to find it.

The syntax of data refers to the format. Analysts spend a large amount of their time wrangling data. When the format of each time series is clearly described, this tedious work can be greatly simplified. During replication and reproduction, it can also help diagnose such frequent errors as the confusion between metric and imperial units of measure. Ideally, with the data, we should also store the APIs to the data because they are part of the syntax of data.
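A minimal sketch of how syntax metadata can catch a unit mix-up: each column declares its unit, and the conversion refuses to guess when the unit is unknown. The class, the column, and the table of units are hypothetical and only cover what this example needs.

```python
from dataclasses import dataclass

# Conversion factors to meters; only the units needed for this sketch.
TO_METERS = {"m": 1.0, "ft": 0.3048, "in": 0.0254}


@dataclass
class ColumnSyntax:
    """Syntax metadata for one column of a time series (hypothetical layout)."""
    name: str
    dtype: str
    unit: str


def to_meters(values, column: ColumnSyntax):
    """Convert values to meters using the unit declared in the metadata."""
    if column.unit not in TO_METERS:
        raise ValueError(f"unknown unit {column.unit!r} for column {column.name!r}")
    factor = TO_METERS[column.unit]
    return [v * factor for v in values]


# The metadata says the altitude was recorded in feet, so nobody has to guess.
altitude = ColumnSyntax(name="altitude", dtype="float", unit="ft")
print(to_meters([1000.0, 2500.0], altitude))
```

The point of the design is that the unit travels with the data instead of living in somebody's lab book.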

The semantic of data refers to its meaning and is the most difficult metadata to produce. We require a unified framework that researchers in all scientific disciplines can use to create consistent, easily searchable metadata. Ease of use is paramount. Because the ability to share data is so important, we want the process of metadata creation to be as painless as possible. This means that we must start by creating an ontology for each domain in which we create data.

Ontologies evolve with time. A big challenge is to track this evolution with the metadata. For example, if we called a technique "machine learning" but then realize the term is too generic and we should call it "cluster analysis" because this is what we were doing anyway, we also have to update the old metadata. Data curation applies also to the metadata.

the evolution of terms
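Curating old metadata when a term changes can be sketched as a simple mapping from retired terms to their replacements. The record layout and the function name are hypothetical; a real ontology would need more than a flat dictionary.

```python
# Retired term -> current term, following the example in the text.
RENAMED_TERMS = {"machine learning": "cluster analysis"}


def migrate_keywords(record: dict, renamed: dict) -> dict:
    """Return a copy of the record with retired keywords replaced."""
    updated = dict(record)
    updated["keywords"] = [renamed.get(k, k) for k in record.get("keywords", [])]
    return updated


# Hypothetical old metadata record.
old = {"name": "run-042", "keywords": ["machine learning", "color naming"]}
new = migrate_keywords(old, RENAMED_TERMS)
print(new["keywords"])  # ['cluster analysis', 'color naming']
```

Returning a copy keeps the old record intact, so the history of the terms is not lost during curation.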

Some metadata can be computed from the data itself, for example, the descriptive statistics. At NASA, the automatic extraction of metadata from data content is called data archeology.
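As a minimal sketch of metadata computed from the data itself, the function below derives a few descriptive statistics with Python's standard library. The choice of statistics and the sample values are mine, for illustration only.

```python
import statistics


def describe(values):
    """Compute descriptive statistics that can be stored as metadata."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Made-up measurements.
print(describe([2.0, 4.0, 4.0, 5.0, 7.0]))
```

The resulting dictionary can be stored next to the context and syntax metadata of the same data set.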