I'm Morgan McGuire (@CasualEffects). I've been working on computer graphics and games for 20 years at great places including NVIDIA, University of Waterloo, Williams College, Brown University, Roblox, Unity, and Activision.


Friday, March 27, 2015

Fast Colored Transparency

This article extends my previous article on Implementing Weighted, Blended Order-Independent Transparency. I previously showed how to implement a fast and robust solution for both partial coverage and monochrome transmission. The speed of the method came from three sources: 1. being able to merge multiple transparent surfaces into a single draw call, 2. avoiding the overhead of pixel interlocks, and 3. minimizing bandwidth during blending. The robustness came from order independence, at a cost of some flattening of perceived depth between transmissive surfaces in the distance.

Update Feb 2016: Mike Mara and I published a paper at I3D'16 that fully realizes the ideas originally sketched out in this blog post.

The method from the previous article could render surfaces that have varying reflectivity with respect to frequency of light ("color") and those that have partial coverage, such as the squares and transparent engine shown below. However, it could not handle transmission varying with frequency of light.

This new article describes a prototype of a simple extension to handle non-refractive colored transmission. For example, this can approximate the transmission through a green wine bottle, red blood, or a purple stained glass window.

The colored method still requires only one 3D pass over the transparent surfaces and one 2D compositing pass. It also has the same memory requirements: an RGB16F texture for accumulated color and an R8 texture for "revealage" (you can also pack them into a single RGBA16F texture).

The difference from the previous method is that the new one binds the final framebuffer's color texture as a third render target. It then modulates that color during the 3D transparent pass, since modulation is inherently independent of order. Doing so slightly increases the bandwidth cost of the transparent pass, but allows colored transmission.

This slight increase is undesirable if you know that your scene contains only monochrome transmission/coverage. Objects such as (colored) clouds, (colored) smoke, alpha-matted (colored) foliage, and colorless glass are cases that don't need the new method. I believe that the additional cost is easily acceptable for the quality increase if you need to render colored glass or fluids.
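The claim that modulation is independent of order can be checked numerically. The following is a minimal CPU-side sketch in Python, not the shader itself: each surface multiplies the background by (1 - a * (1 - t)) per channel, which is the factor the blending hardware would apply if the color target is blended as dst * (1 - src). That blend state, the function name `modulate`, and the sample colors are my assumptions for illustration.

```python
from itertools import permutations

def modulate(background, surfaces):
    """Multiply the background by (1 - a * (1 - t)) per channel for each
    surface, visiting the surfaces in the order given."""
    r, g, b = background
    for coverage, (tr, tg, tb) in surfaces:
        r *= 1.0 - coverage * (1.0 - tr)
        g *= 1.0 - coverage * (1.0 - tg)
        b *= 1.0 - coverage * (1.0 - tb)
    return (r, g, b)

# Hypothetical test data: (coverage a, RGB transmission t)
background = (0.8, 0.7, 0.6)
surfaces = [
    (1.0,  (0.9, 0.1, 0.1)),   # red glass
    (0.5,  (0.9, 0.9, 0.1)),   # yellow glass, half coverage
    (0.75, (0.2, 0.8, 0.3)),   # green glass
]

# Every submission order yields the same modulated background
# (up to floating-point rounding), because the result is a plain product.
results = [modulate(background, order) for order in permutations(surfaces)]
```

Because the background ends up multiplied by a product of per-surface factors, any two orderings agree up to rounding, which is why no sorting is needed for this part of the algorithm.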

The new method is something that I just prototyped today, so I don't yet know its full strengths and failure cases. There's one theoretical concern: the implicit color normalization process by the average of averages is mathematically questionable. However, performing it would tint the highlights undesirably. When I've used the technique on diverse scenes and worked with it for a while, I'll write it up for formal publication.

3D Transparent Surface Pass

The render target setup for the transparent pass is as follows. The values written from the shader are based on the Reflected light (R), the coverage (a), and the Transmitted light. In practice, I use premultiplied alpha for R, so I don't actually perform the explicit R.r * a, etc. computations.

| Render Target | Format | Clear | Src Blend | Dst Blend | Write ("Src") |
|---|---|---|---|---|---|
| accum | RGBA16F | (0,0,0,0) | ONE | ONE | (R.r*a, R.g*a, R.b*a, a) * w |

The "color" buffer is the final framebuffer's color buffer, containing the already-rendered opaque parts of the scene. It is not cleared before the transparent pass and can have any RGB format.

The 3D color shader differs from the monochrome algorithm by only two lines, both of which handle the color buffer and are marked NEW:
layout(location = 0) out vec4  _accum;
layout(location = 1) out float _revealage;
layout(location = 2) out vec3  _modulate; /* NEW */

void writePixel(vec4 premultipliedReflect, vec3 transmit, float csZ) {
    /* NEW: Perform this operation before modifying the coverage to account for transmission. */
    _modulate = premultipliedReflect.a * (vec3(1.0) - transmit);

    /* Modulate the net coverage for composition by the transmission. This does not affect the color channels of the
       transparent surface because the caller's BSDF model should have already taken into account if transmission modulates
       reflection. See

       McGuire and Enderton, Colored Stochastic Shadow Maps, ACM I3D, February 2011

       for a full explanation and derivation. */
    premultipliedReflect.a *= 1.0 - (transmit.r + transmit.g + transmit.b) * (1.0 / 3.0);

    // Intermediate terms to be cubed
    float tmp = (premultipliedReflect.a * 8.0 + 0.01) *
                (-gl_FragCoord.z * 0.95 + 1.0);

    /* If a lot of the scene is close to the far plane, then gl_FragCoord.z does not
       provide enough discrimination. Add this term to compensate:

       tmp /= sqrt(abs(csZ)); */

    float w    = clamp(tmp * tmp * tmp * 1e3, 1e-2, 3e2);
    _accum     = premultipliedReflect * w;
    _revealage = premultipliedReflect.a;
}

The idea is to modulate the background image while accumulating the foreground color and coverage, and then apply the averaged reflectance to the foreground.
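To see how the three render targets evolve as fragments arrive, here is a rough Python port of the per-fragment arithmetic together with a CPU emulation of the blending. The accum blend state (ONE, ONE, cleared to zero) comes from the table above. The blend state for the other two targets is my assumption: dst * (1 - src), with revealage cleared to 1, inferred from the "prod(1 - a)" description and from the fact that the color buffer is modulated. The names `write_pixel` and `transparent_pass` and the sample fragments are hypothetical.

```python
from itertools import permutations

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def write_pixel(premultiplied_reflect, transmit, frag_z):
    """Return the (accum, revealage, modulate) source values for one fragment."""
    r, g, b, a = premultiplied_reflect
    # Computed before the coverage is adjusted for transmission
    modulate = tuple(a * (1.0 - t) for t in transmit)
    # Net coverage: reduce coverage by the mean transmission
    a *= 1.0 - sum(transmit) / 3.0
    # Depth weight, same constants as the shader
    tmp = (a * 8.0 + 0.01) * (-frag_z * 0.95 + 1.0)
    w = clamp(tmp * tmp * tmp * 1e3, 1e-2, 3e2)
    return (r * w, g * w, b * w, a * w), a, modulate

def transparent_pass(background, fragments):
    """Emulate fixed-function blending of the three targets for one pixel."""
    accum = [0.0, 0.0, 0.0, 0.0]  # cleared to 0; blend ONE, ONE
    revealage = 1.0               # ASSUMED: cleared to 1; dst * (1 - src)
    color = list(background)      # not cleared; ASSUMED: dst * (1 - src)
    for frag in fragments:
        acc_src, rev_src, mod_src = write_pixel(*frag)
        accum = [d + s for d, s in zip(accum, acc_src)]
        revealage *= 1.0 - rev_src
        color = [d * (1.0 - s) for d, s in zip(color, mod_src)]
    return accum, revealage, color

# Hypothetical fragments: (premultiplied RGBA reflection, RGB transmission, depth)
background = (0.8, 0.7, 0.6)
fragments = [
    ((0.20, 0.05, 0.05, 0.5), (0.9, 0.1, 0.1), 0.30),
    ((0.15, 0.15, 0.03, 0.5), (0.9, 0.9, 0.1), 0.60),
    ((0.02, 0.10, 0.03, 0.4), (0.2, 0.8, 0.3), 0.95),
]
outputs = [transparent_pass(background, order) for order in permutations(fragments)]
```

Every target is updated by either a sum or a product, so all six submission orders leave the buffers in the same state up to floating-point rounding.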

2D Compositing Pass

Because the 3D pass already modulated the background, the compositing pass now needs only to add the weighted average of reflected color on top. The compositing shader runs over the final framebuffer with the blending mode src = ONE_MINUS_SRC_ALPHA, dst = ONE:


/* sum(rgb * a, a) */
uniform sampler2D accumTexture;

/* prod(1 - a) */
uniform sampler2D revealageTexture;

out vec4 result;

void main() {
    ivec2 C = ivec2(gl_FragCoord.xy);

    float revealage = texelFetch(revealageTexture, C, 0).r;
    if (revealage == 1.0) {
        // Save the blending and color texture fetch cost
        discard;
    }

    vec4 accum = texelFetch(accumTexture, C, 0);

    // Suppress overflow
    vec4 absAccum = abs(accum);
    if (isinf(max(max(absAccum.r, absAccum.g), max(absAccum.b, absAccum.a)))) {
        accum.rgb = vec3(accum.a);
    }

    // dst' =  (accum.rgb / accum.a) * (1 - revealage) + dst
    // [dst has already been modulated by the transmission colors and coverage and the blend mode
    // inverts revealage for us]
    result = vec4(accum.rgb / max(accum.a, 0.00001), revealage);
}
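For readers who want to check the compositing arithmetic outside of a GPU, here is a small Python sketch of what the shader plus its blend mode compute for a single pixel. It folds the src = ONE_MINUS_SRC_ALPHA, dst = ONE blend into the function, and it models the early-out by returning the destination unchanged. The function name `composite` and the sample values are mine, not part of the original code.

```python
import math

def composite(accum, revealage, dst):
    """Return the final RGB for one pixel.

    accum     -- (sum of weighted premultiplied RGB, sum of weighted coverage)
    revealage -- product of (1 - net coverage) over all transparent fragments
    dst       -- framebuffer RGB, already modulated by the transparent pass
    """
    if revealage == 1.0:
        # Nothing transparent covered this pixel; leave the framebuffer alone
        return tuple(dst)

    r, g, b, a = accum
    # Suppress overflow: fall back to a gray value if any channel hit infinity
    if any(math.isinf(c) for c in accum):
        r = g = b = a

    denom = max(a, 1e-5)
    src_factor = 1.0 - revealage  # ONE_MINUS_SRC_ALPHA with alpha = revealage
    # dst' = (accum.rgb / accum.a) * (1 - revealage) + dst
    return tuple(d + c / denom * src_factor for d, c in zip(dst, (r, g, b)))

# Hypothetical pixel: accumulated color, 25% revealage, modulated background
pixel = composite((0.6, 0.3, 0.0, 1.5), 0.25, (0.1, 0.2, 0.3))
```

Here the weighted average color is (0.4, 0.2, 0.0), scaled by 1 - 0.25 and added to the background, so `pixel` comes out to approximately (0.4, 0.35, 0.3).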


Here's a simple example comparing the results of this new colored method to sorted transparency. Sorted transparency is significantly slower because it requires two draw calls per sorted primitive, one to modulate the background and one to add the reflected light.
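As a rough way to reason about where the two approaches agree, the sketch below emulates both for one pixel in Python. The sorted reference applies the two operations described above per surface, back to front. The order-independent path follows the same arithmetic as the shaders, except that I fix the depth weight at w = 1 for brevity, so this is a simplification and not the full method. The blend state for the modulated color buffer (dst * (1 - src)), the function names, and the sample surfaces are assumptions.

```python
def sorted_reference(background, back_to_front):
    """Sorted transparency: two blending operations per surface."""
    color = list(background)
    for (r, g, b, a), transmit in back_to_front:
        # Draw 1: modulate the background by coverage and transmission
        color = [c * (1.0 - a * (1.0 - t)) for c, t in zip(color, transmit)]
        # Draw 2: add the premultiplied reflected light
        color = [c + s for c, s in zip(color, (r, g, b))]
    return tuple(color)

def blended_oit(background, surfaces):
    """Order-independent path with the depth weight fixed at 1 (simplification)."""
    accum = [0.0, 0.0, 0.0, 0.0]
    revealage = 1.0
    color = list(background)
    for (r, g, b, a), transmit in surfaces:
        color = [c * (1.0 - a * (1.0 - t)) for c, t in zip(color, transmit)]
        a_net = a * (1.0 - sum(transmit) / 3.0)
        accum = [d + s for d, s in zip(accum, (r, g, b, a_net))]
        revealage *= 1.0 - a_net
    if revealage == 1.0:
        return tuple(color)
    denom = max(accum[3], 1e-5)
    return tuple(c + acc / denom * (1.0 - revealage)
                 for c, acc in zip(color, accum[:3]))

# Hypothetical surfaces: (premultiplied RGBA reflection, RGB transmission)
background = (0.8, 0.7, 0.6)
red = ((0.20, 0.05, 0.05, 0.5), (0.9, 0.1, 0.1))
yellow = ((0.15, 0.15, 0.03, 0.5), (0.9, 0.9, 0.1))

single_sorted = sorted_reference(background, [red])
single_oit = blended_oit(background, [red])
two_sorted = sorted_reference(background, [yellow, red])  # red is nearer
two_oit = blended_oit(background, [yellow, red])
two_oit_swapped = blended_oit(background, [red, yellow])
```

With a single surface the two paths agree up to rounding, since the average of one color is that color. With two surfaces the background tint is identical, but the order-independent path adds an averaged reflection term instead of attenuating the far surface's reflection by the near one, so the reflected contribution generally differs.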

Both images show two glass cages on a cobblestone ground (in the Uffizi Gallery, because this is computer graphics research). The red glass cage is closer to the viewer than the yellow one. The top image is the new fast, colored, and order-independent method. You can see the two buffers that it uses inset. The bottom image is a sorted transparency reference image. The order-independent method clearly captures most of the phenomena, including the intended tinting of the background where seen through the foreground. The primary difference is that the glossy highlights are dimmer (because they're averaged) in the new method. I don't show the old OIT method here; it would make both cages appear to be made of clear rather than colored glass.

New fast colored, order-independent transparency

Slow, sorted transparency reference image

Morgan McGuire (@morgan3d) is a professor of Computer Science at Williams College, a researcher at NVIDIA, and a professional game developer. His most recent games are Rocket Golfing and work on the Skylanders series. He is the author of the Graphics Codex, an essential reference for computer graphics now available in iOS and Web Editions.