Yesterday afternoon, Brian Cabral, Director of Engineering at Facebook, gave a talk at the Stanford Center for Image Systems Engineering (SCIEN) with the title "The Soul of a New Camera: The design of Facebook's Surround Open Source 3D-360 video camera." Here is his abstract:
Around a year ago we set out to create an open-source reference design for a 3D-360 camera. In nine months, we had designed and built the camera and published the specs and code. Our team leveraged a series of maturing technologies in this effort. Advances in and availability of sensor technology, 20+ years of computer vision algorithm development, 3D printing, rapid design prototyping and computational photography allowed our team to move extremely fast. We will delve into the roles each of these technologies played in the design of the camera, giving an overview of the system components and discussing the tradeoffs made during the design process. The engineering complexities and technical elements of 360 stereoscopic video capture will be discussed as well. We will end with some demos of the system and its output.
The design goals for the Surround were the following:
- High-quality 3D-360 video
- Reliable and durable
- Fully spherical
- Open and accessible
- End-to-end system
These goals cannot be achieved by strapping together GoPro cameras, because they get too hot and it is very difficult to make them work reliably. Monoscopic capture is old and no longer interesting. The challenge for VR is to do it stereoscopically: the goal is stereoscopic 3D-360 capture.
They are using 14 Point Grey cameras with wide-angle lenses around the equator and one camera with a fisheye lens at the north pole. At the south pole they use two fisheye cameras, so that the pole holding the Surround can be removed from the image.
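A quick sanity check on that geometry (the lens field of view below is an assumption for illustration, not a number from the talk): with 14 cameras around the equator, neighboring cameras are about 25.7° apart, and the lens field of view determines how many cameras see any given direction on the horizon, i.e. how much overlap is available for stereo.

```python
num_equator_cams = 14
spacing_deg = 360 / num_equator_cams     # ~25.7 degrees between neighboring cameras
lens_fov_deg = 77                        # assumed horizontal field of view of the wide-angle lens

# Roughly how many equatorial cameras see any given direction on the horizon?
coverage = lens_fov_deg / spacing_deg
print(f"camera spacing: {spacing_deg:.1f} deg")
print(f"each horizon point is seen by about {coverage:.1f} cameras")  # ~3.0 with these numbers
```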
A rolling shutter is much worse in 3D than in 2D, because the skew it introduces differs between the views that make up each stereo pair, so it is necessary to use a global shutter, at the expense of SNR. Brian Cabral discussed the various trade-offs between number and size of cameras, spatial resolution, wide-angle vs. fisheye lenses and physical size.
Rapid prototyping has made a lot of progress; designs can simply be tried out in the lab. For this application the hardware is the easy part; stitching the images together is hard. The solution is to use optical flow between neighboring cameras and to simulate slit cameras.
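To make the idea concrete, here is a minimal, hypothetical OpenCV sketch of the basic ingredient: compute dense optical flow between two neighboring camera images and warp one toward the other to synthesize the view of a virtual camera in between. This is an illustration of the technique, not the actual Surround 360 pipeline; the function name and flow parameters are my own choices.

```python
import cv2
import numpy as np

def intermediate_view(img_left, img_right, alpha=0.5):
    """Approximate the view of a virtual camera a fraction `alpha` of the way
    from the left physical camera to the right one, by warping the left image
    along the dense optical flow between the two (backward-warp approximation)."""
    gray_l = cv2.cvtColor(img_left, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(img_right, cv2.COLOR_BGR2GRAY)

    # Dense flow: a point at (x, y) in the left image appears at
    # (x + flow[..., 0], y + flow[..., 1]) in the right image.
    flow = cv2.calcOpticalFlowFarneback(
        gray_l, gray_r, None,
        pyr_scale=0.5, levels=4, winsize=21,
        iterations=3, poly_n=7, poly_sigma=1.5, flags=0)

    h, w = gray_l.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # Backward warp: look up where each virtual-view pixel came from in the left image.
    map_x = xs - alpha * flow[..., 0]
    map_y = ys - alpha * flow[..., 1]
    return cv2.remap(img_left, map_x, map_y, cv2.INTER_LINEAR)

# Stereo panoramas are then assembled slit-camera style: for every output column,
# take a narrow vertical strip from a synthesized view at the corresponding angle,
# once with an offset for the left eye and once for the right eye.
```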
No attempt is made to compress the data. The images are copied completely raw to a RAID of SSD drives. The rendering then takes 30 seconds per frame.
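To get a feel for why the storage has to be a RAID of SSDs, here is a back-of-the-envelope data-rate estimate; the resolution, bit depth and frame rate below are assumptions for illustration, not figures from the talk.

```python
cams = 17                      # 14 equator + 1 top + 2 bottom, per the talk
width, height = 2048, 2048     # assumed raw sensor resolution
bits_per_pixel = 8             # assumed raw (Bayer) bit depth
fps = 30                       # assumed capture frame rate

bytes_per_cam_frame = width * height * bits_per_pixel // 8
rate_gbytes_per_s = cams * bytes_per_cam_frame * fps / 1e9
print(f"raw data rate: about {rate_gbytes_per_s:.1f} GB/s")  # ~2.1 GB/s with these numbers
```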
The Surround has been used for a multi-million-dollar shoot at Grand Central Station. The camera is being open-sourced because so far it is only 1% of the solution, and making it open will encourage many people to contribute to the remaining 99%.
At the end of the presentation, two VR displays were available to experience the result. I did not quite dare to strap a recalled smartphone that could explode at any time in front of my eyes, so I passed on the demo. However, the braver attendees commented that you can rotate your head but not move sideways, because then the image falls apart. It was also noted that the frame rate should be at least 90 Hz. Finally, people reported vergence problems and slight nausea.