Tuesday, January 27, 2015

Adapting a Camera Entity for Virtual Reality

Classic rendering systems have a virtual "camera" that is active in the scene during rendering. These systems produce the image seen from that camera.

But head-mounted displays like the Oculus Rift, Samsung Gear VR, and Microsoft HoloLens need to render two views every frame, one for each eye. They also incorporate tracking of the body in a way that traditional rendering systems don't model.

Some of the problems with tracking arise even on a single-view tracked display, and the problems with multiple views are compounded on parallax-barrier and other passive 3D systems that might require four or eight projections.

Extending a rendering system to support these devices is more complex than just "adding a second camera" because of the way that the modeling system and post-processing need to maintain state and the way that tracking information is integrated. This article describes how Michael Mara and I are extending the open source G3D Innovation Engine to support head-mounted and other multiview displays.

The Problem

A modern single-view graphics engine usually has Entity and Model classes (I'm linking to G3D's documentation for the most common subclasses of these, so that you can see the kind of API I have in mind as well as relate them to the classes in your engine). An Entity is an object in the scene, which may reference a Model that describes its geometric properties. Models are sometimes also called mesh, surface, or shape classes.

Camera and Light are usually special Entity subclasses that don't reference Models because they aren't visible. It is useful to represent cameras and lights as Entitys so that they can be manipulated in a scene editor, scripted, and attached to other Entitys.
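As a rough sketch of that structure (simplified, and not G3D's actual declarations; CFrame here stands in for a rotation-plus-translation coordinate frame):

    #include <memory>

    // Simplified sketch of the Entity/Model relationship described above.
    struct CFrame { /* 3x3 rotation + translation, i.e., a 3x4 matrix */ };

    class Model { /* geometry, materials, animation data, ... */ };

    class Entity {
    public:
        virtual CFrame frame() const             { return m_frame; }
        virtual void   setFrame(const CFrame& f) { m_frame = f; }
        virtual ~Entity() {}

    protected:
        CFrame                 m_frame;   // pose of the object in the scene
        std::shared_ptr<Model> m_model;   // null for invisible entities
    };

    // Cameras and lights are Entitys with no Model: they can be manipulated in
    // the scene editor, scripted, and attached to other Entitys, but never drawn.
    class Camera : public Entity { /* projection + post-processing state */ };
    class Light  : public Entity { /* emission and shadowing state */ };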

The Camera Entity usually combines the following properties, which are often abstracted through other helper classes (a code sketch follows the list):

  • Body frame: center of projection of the view as a rotation and translation (often using a 3x4 matrix or vector and quaternion). This is bolted to an object in the scene for first person views, simulated via leashes and springs for third person views, scripted for cinematics, and directly manipulated for debugging. Regardless, this is controlled by the simulation/application.
  • Projection: a perspective projection, usually represented as a 4x4 matrix or field-of-view, near plane, far plane, and subpixel-offset parameters.
  • Post-processing configuration: Settings that affect depth of field, motion blur, antialiasing, tone mapping, vignetting, and other post-processed camera effects. These may be specific to the algorithms used for those visual effects or model a physical camera through models like sensor sensitivity, exposure time, and iris size.
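Filling in the Camera from the sketch above, those three groups of state might be organized like this; the settings types are illustrative stand-ins for engine-specific classes, not G3D's actual members:

    // The three groups of camera state from the list above, hung off the
    // Entity sketch. Settings types are illustrative stand-ins.
    struct Projection {
        float fieldOfViewRadians;
        float nearPlaneZ, farPlaneZ;
        float subpixelShiftX, subpixelShiftY;   // e.g., for temporal antialiasing
    };

    struct PostProcessSettings {
        // Depth of field, motion blur, antialiasing, tone mapping, vignetting...
        // either algorithm-specific knobs or a physical-camera model
        // (sensor sensitivity, exposure time, iris size).
    };

    class Camera : public Entity {              // Entity supplies the body frame
    protected:
        Projection          m_projection;       // per-view perspective parameters
        PostProcessSettings m_postProcessing;   // the "look" of the final image
    };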

That information relates what is on screen to what is in the virtual world. The challenge of extending a traditional renderer to virtual reality is that a natural head-mounted display abstraction of that mapping requires roughly twice as much state in a system that is unprepared for it. A VR system needs (again, sketched in code after the list):

  • Body frame: As above. This is controlled by the simulation.
  • Two eye frames: The eye position can move relative to the body frame when the player turns his or her head, bends over, or leans. There are also two eyes, each offset from the center slightly. The eye frame is controlled by the player, not the simulation.
  • Two projections: Each eye may have a separate projection. (Oculus does, because it uses view-space translation to model the fact that each eye image's "center" should not be in the center of the physical display--eyes don't have symmetric fields of view.)
  • Post-processing configuration: There is a single set of post-processing options that should be used for both eyes.
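Reusing the types from the sketches above, the per-frame state for a two-view HMD mirrors that list. This is only a hypothetical illustration, not an actual G3D or Oculus VR SDK structure:

    // Hypothetical per-frame state for a two-view HMD, mirroring the list above.
    struct HMDViewState {
        CFrame              bodyFrame;        // driven by the simulation
        CFrame              eyeFrame[2];      // driven by head tracking, per eye
        Projection          projection[2];    // asymmetric per-eye fields of view
        PostProcessSettings postProcessing;   // one shared set for both eyes
    };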

Making this adjustment requires addressing several ideas. The thorniest one is separating the body frame from the eye frame. When incorporating existing content, we still want to be able to move the "camera" using existing controls and simulation, yet we also want head tracking (which could be done with a Microsoft Kinect for a single view, even without a head-mounted display).

A lot of our high-level simulation code attaches controller objects to the Camera. These execute code like "void Controller::update() { m_target->setFrame(frame); }", where the target is the camera. We want these controller abstractions to continue to work without modification in VR. A lot of our low-level rendering code accepts a camera and asks it for its frame, meaning the center of projection. We want that code to continue to work as well, and we don't want those two "frames" to diverge, because for every other Entity, setting and then getting the frame gives you back the original value.
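For concreteness, the pattern we want to preserve looks roughly like this (the Controller shown here is illustrative rather than an exact G3D class):

    // Simulation code computes a frame and pushes it onto its target Entity,
    // which is often the Camera itself.
    class Controller {
    public:
        void update() {
            // m_frame was computed by first-person input, a third-person
            // leash/spring, a cinematic script, or the debug manipulator.
            m_target->setFrame(m_frame);

            // Engine-wide expectation that VR must not break: for every Entity,
            // setting and then getting the frame returns the original value,
            // i.e., m_target->frame() == m_frame here.
        }

    private:
        std::shared_ptr<Entity> m_target;   // often the Camera
        CFrame                  m_frame;    // desired pose for this tick
    };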

We considered explicitly modeling a body Entity and attaching two Cameras to it, one for each eye. This has a few problems. First, it would require modifying all of our existing scenes and projects to support virtual reality. That's a problem because we want existing content to work with a new, slightly modified renderer.

Part of our productivity as researchers is being able to quickly construct experiments by combining pieces from our extensive library of code and assets. So, new tools need to integrate into that library instead of imposing new constraints on it. This is similar to the constraints on game engine programmers, who need to add new features without imposing high content creation costs. Second, the cameras need to move freely around the body Entity due to head tracking and need to incorporate the player's known real-world height.

So, we can't directly attach cameras to a body entity, but would need some kind of VR tracker-controller joint between them. Third, scenes modeled with an explicit body and two cameras won't work for anything other than two views, such as our existing single-view or new four-view displays. 

A Solution

Our current solution is to create two different implementations of Camera, one of which is not visible at the API level. Most code will continue to use the existing G3D::Camera class and construct it using the existing Camera::create static factory method and the data-driven G3D::Any constructor that works with scene data files. The implementation of Camera::create will now construct what is effectively the body entity and, by default, will act like a single camera bolted to the body's root position. This means that the existing API will continue to function as it did in G3D 10.00 for traditional displays with no tracking.

Internally, however, Camera will have a VRCamera subclass whose instances it can produce on demand. VR-aware applications can query the number of views available and request a Camera for each one. Camera synthesizes VRCameras as needed and returns them in response to these queries. The VRCameras are placed by taking the root position, computing a vertical root offset from the difference between the average target player's height and the current player's height, and then composing the VR tracker's coordinate frame with this new root. The VRCameras receive a copy of (or reference to) all of the post-processing parameters of the original Camera. In this way, they act like a body-and-eye scene graph, but without that complexity being visible outside of the API boundary.
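A hypothetical sketch of the VR-aware path this enables follows; the names numViews(), vrCamera(), and renderOneView() are illustrative, not the final G3D API, and a traditional application simply never asks for extra views:

    void renderOneView(const std::shared_ptr<Camera>& camera);  // existing single-view path

    void renderAllViews(const std::shared_ptr<Camera>& camera) {
        for (int v = 0; v < camera->numViews(); ++v) {   // 1 on a desktop, 2 on an HMD
            // Camera synthesizes a VRCamera for this view on demand. Its frame is
            //     bodyRoot * heightCalibration * trackerEyeFrame[v]
            // where heightCalibration shifts the root vertically by the difference
            // between the average target player's height and the current player's
            // height, and trackerEyeFrame comes from the HMD tracker. All
            // post-processing state is copied from the master Camera, so the two
            // eyes stay consistent.
            const std::shared_ptr<Camera> eye = camera->vrCamera(v);

            // Existing rendering code is unchanged: it only ever sees a Camera.
            renderOneView(eye);
        }
    }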

G3D tracks scene changes to minimize state changes, light probe updates, and shadow map updates. This means that there is no need to modify the renderer to amortize the cost of most operations over multiple views--that happens automatically because they are amortized over multiple frames already.

In summary, the desirable properties of our design are:

  • The Renderer doesn't need to know about VRCamera. Any code that was written to work with Camera will still work, because VRCamera is a subclass.
  • Assets don't need to know about VRCamera. Scenes still specify a single camera and the VR adjustment is made automatically.
  • There's no additional (public) scene graph complexity, so modeling and simulation code can assume that the "camera" is synonymous with the "player avatar" in the way that it historically has.
  • The design extends to more than two views.
  • Post-processing parameters will automatically be synchronized between views because the VRCameras copy all non-projection/orientation state from the master Camera.
  • The built-in 3D debugger and scene editor user interfaces will still work with a single camera and need no modification.
  • Full compatibility between VR-specific and previous assets.

Having a body avatar in VR is important for creating presence and reducing motion sickness. Without a body, the player feels like a floating head and has no solid reference. This API doesn't address that issue. A body avatar can be attached, but won't be by default. The Camera will have to export some tracking data so that VR-aware applications that attach a body automatically will be able to adjust it to the real-time tracked head position.
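For example, a VR-aware application might do something like the following once that tracking data is exported; the trackedHeadFrame() accessor is invented here purely for illustration:

    // Keep a body avatar under the player's tracked head each frame.
    void updateBodyAvatar(const std::shared_ptr<Camera>& camera,
                          const std::shared_ptr<Entity>& avatar) {
        const CFrame bodyRoot = camera->frame();            // simulation-driven root
        const CFrame head     = camera->trackedHeadFrame(); // hypothetical: tracked head
                                                            // pose relative to that root
        // Compose the two (CFrame composition) so the avatar follows leaning
        // and crouching in real time.
        avatar->setFrame(bodyRoot * head);
    }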

These changes, and likely a future evolution of this design, will be in the G3D source repository soon. That will show the details of how we integrated with the Oculus VR SDK, compensated for different coordinate system conventions, dealt with the notion of "field of view" needed for scene culling well before the VRCameras are instantiated each frame, and optimized the full implementation.

I'd like to check the Oculus Rift support into the G3D repository as well, but have to first carefully separate the parts of the Oculus VR SDK that are legally redistributable from those that are not. For developers to use G3D with the Oculus Rift, they will then have to install the Oculus VR SDK on their own, change some flags, and rebuild G3D. Since John Carmack, the CTO of Oculus, has a strong track record with open source and Oculus has little incentive to prohibit distribution of software that enables their hardware, I hope that they will someday allow unlimited redistribution of the SDK for open source projects.



Morgan McGuire (@morgan3d) is a professor of Computer Science at Williams College, visiting professor at NVIDIA Research, and a professional game developer. He is the author of the Graphics Codex, an essential reference for computer graphics now available in iOS and Web Editions.