Occlusion Culling for Walkthroughs of Large Virtual Environments
The first category of polygon reduction strategies is epitomized by level-of-detail (LOD) algorithms. These techniques compute a series of geometric approximations to objects in the scene with progressively less detail, and hence progressively fewer polygons. The levels of detail generally must be generated as a preprocess, since the algorithms for computing them are rather time consuming. Levels of detail can be either static -- a fixed set of approximations -- or continuous -- each simplification operation is stored and the best approximation is computed each frame. The end goal is the same: each object in the scene is rendered using an approximation with as few polygons as necessary to capture the detail visible from its current distance. When frame rate is paramount, level-of-detail algorithms can easily accommodate by providing coarse approximations that have less detail than would be visible from the current viewpoint. Thus LODs can offer both essentially 'lossless' polygon reduction and varying degrees of 'lossy' reduction.
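To make the selection step concrete, here is a minimal sketch of choosing a static level of detail by viewer distance; the Mesh type, the LodModel layout, and the per-level distance thresholds are illustrative assumptions rather than part of any particular system.

typedef struct Mesh Mesh;     // geometric approximation, defined elsewhere

typedef struct {
    Mesh  *mesh;              // approximation at this level
    float  maxDistance;       // use this level while distance <= maxDistance
} LodLevel;

typedef struct {
    LodLevel levels[8];       // ordered from most to least detailed
    int      numLevels;
} LodModel;

const Mesh *selectLod(const LodModel *m, float distanceToEye)
{
    for (int i = 0; i < m->numLevels; i++)
        if (distanceToEye <= m->levels[i].maxDistance)
            return m->levels[i].mesh;
    return m->levels[m->numLevels - 1].mesh;   // beyond the last threshold: coarsest level
}

When frame rate takes priority, the thresholds can simply be scaled down so that coarser approximations are chosen sooner than strictly necessary.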
LOD techniques do a good job of rendering objects using as few polygons as possible, but not all objects need to be rendered at all. Culling techniques determine which objects need not be rendered, and they represent the second large category of polygon reduction techniques. There are many types of culling, but the most widely used, thanks to its simplicity and effectiveness, is view frustum culling. The idea of view frustum culling is simply not to render what lies outside the viewing volume defined by the eyepoint and window. The name comes from the shape of the viewing volume, a truncated pyramid. View frustum culling is most effective when combined with a scene organized into a spatial hierarchy, so that entire branches of the scene graph can be culled with one in-out test.
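As a sketch of how the hierarchical test might look, assume the frustum is stored as six inward-facing planes and every scene graph node carries a bounding sphere; the Node layout, render(), and the plane extraction are placeholders.

typedef struct { float a, b, c, d; } Plane;   // a*x + b*y + c*z + d = 0, normal pointing inward

typedef struct Node {
    float        center[3];      // bounding sphere of this node's entire subtree
    float        radius;
    struct Node *children[8];
    int          numChildren;
} Node;

void render(const Node *n);      // draws the geometry stored at this node; provided elsewhere

void cullAndRender(const Node *n, const Plane frustum[6])
{
    for (int i = 0; i < 6; i++) {
        float dist = frustum[i].a * n->center[0] +
                     frustum[i].b * n->center[1] +
                     frustum[i].c * n->center[2] + frustum[i].d;
        if (dist < -n->radius)
            return;               // sphere entirely outside one plane: cull the whole branch
    }
    render(n);
    for (int i = 0; i < n->numChildren; i++)
        cullAndRender(n->children[i], frustum);
}

A single failed plane test discards an entire subtree, which is where the spatial hierarchy pays off.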
Another form of culling is occlusion culling. Occlusion culling algorithms attempt to determine which objects are not visible because they are behind other objects. Several popular occlusion culling algorithms take advantage of restricted geometry in a scene. For instance, the cells-and-portals technique works well for indoor architectural models. The key to its success is that the walls of a room occlude most geometry outside the room, or cell. Other rooms are visible only through small doorways or windows, known as portals. Often it is only possible to see into the next room through a doorway, and the room after that is not visible at all. This is especially the case if the cells are part of a game level specifically designed to render quickly. Thus it generally suffices to render only the polygons in the current room and the ones immediately adjacent; the rest are culled since they are known to be occluded.
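A minimal sketch of that "current cell plus immediate neighbors" policy follows; the Cell structure and drawCell() are assumptions, and a more careful system would recurse through the portals while clipping the view frustum to each one.

#define MAX_PORTALS 16

typedef struct Cell {
    struct Cell *neighbors[MAX_PORTALS];   // the cell on the far side of each portal
    int          numPortals;
} Cell;

void drawCell(const Cell *c);   // renders the polygons belonging to one cell; provided elsewhere

void renderVisibleCells(const Cell *eyeCell)
{
    drawCell(eyeCell);                            // the room the viewer is in
    for (int i = 0; i < eyeCell->numPortals; i++)
        drawCell(eyeCell->neighbors[i]);          // rooms one doorway away
    // Everything farther away is treated as occluded by the walls.
}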
Occlusion culling for general scenes is a difficult problem. To perform occlusion culling in general, we would like to know which objects block many other objects. For an object to be culled it must be occluded completely, but complete occlusion is often achieved only through the combined contributions of two or more occluding objects, so some form of aggregate occlusion representation that combines these contributions is desirable. We could try to find all objects that occlude other objects, but this is essentially the same as asking which objects are visible, and if we knew that we would simply render those objects. There are a number of ways to attack this problem, and the remainder of this paper will discuss them along with the implementation of the rest of an occlusion system.
First, an aside on image-based algorithms. Image-based algorithms for polygon reduction are another popular topic of research, and they are difficult to classify in the above taxonomy. They attempt to represent portions of the scene using images or sprites that contain as much detail as necessary to appear correct, but the images are also used to replace or cull out the geometry they represent. Thus they attempt to eliminate sub-pixel detail while at the same time performing occlusion culling. An example of such an algorithm is image warping used to cover portals in a cells-and-portals type of system [Popescu98, Rafferty98]. The geometry beyond the portal is represented by (and culled by) the warped image.
Visibility culling. Traditional hidden surface removal and "tiling" algorithms: Warnock et al., Greene's triage masks.
More about HOM (hierarchical occlusion map) and HZB (hierarchical Z-buffer) methods, and their specific occlusion tests. Trade-offs. Hybrid ideas.
First, set texturing to be bilinear, so that rendering the map at half resolution averages each 2x2 block of texels:

glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
Then, for each level of the pyramid, upload the current level of the alpha map as a texture, render a single textured quad at half the resolution (bilinear minification averages each 2x2 block of texels into one pixel), and read the result back with glReadPixels; repeat until the coarsest level is reached.
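As a sketch of one reduction step (the opacity_image buffer and the quad-drawing step are placeholders; the bilinear minification filter is the one set above):

// One level of reduction: average each 2x2 block of the current map.
glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, w, h, 0,
             GL_LUMINANCE, GL_FLOAT, opacity_image);   // current level, w x h floats
glEnable(GL_TEXTURE_2D);
// Render a screen-aligned textured quad covering w/2 x h/2 pixels; each output
// pixel samples the texture at the center of a 2x2 texel block, so bilinear
// minification returns the average of that block.
glReadPixels(0, 0, w/2, h/2, GL_LUMINANCE, GL_FLOAT, opacity_image);
// opacity_image now holds the next, half-resolution level; halve w and h and repeat.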
To build a max-depth pyramid for hierarchical Z-buffer style comparisons, rather than the averaged alpha mipmaps above, first make sure that all texture filtering is off:

glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);

Also take care that w and h are powers of 2; pad the image if this is not the case.
glEnable(GL_BLEND);                   // the blend equation only applies when blending is enabled
glBlendEquationEXT(GL_MAX_EXT);
for (int pass = 0; pass < 4; pass++) {
    glPixelStorei(GL_UNPACK_ROW_LENGTH, w);    // row length in (two-component) pixels
    int offset = (pass & 0x1) ? 1 : 0;         // step right one depth value
    offset += (pass & 0x2) ? w : 0;            // step down one row of depth values
    glTexImage2D(GL_TEXTURE_2D,
                 0,                   // mipmap level
                 2,                   // number of color components (2 == GL_LUMINANCE_ALPHA)
                 w/2, h/2,            // size of target image
                 0,                   // border
                 GL_LUMINANCE_ALPHA,  // host format: two components per pixel
                 GL_FLOAT,            // data type
                 depth_image + offset);
    // Render a textured quad at the target (w/2 x h/2) image resolution
}
// glReadPixels to get the new mipmap level into host memory
glBlendEquationEXT(GL_FUNC_ADD_EXT);  // restore standard blend mode

The key trick is getting the stride through the data right. This is accomplished by pretending we have two color components per pixel (so the stride through a row is two depth values) and that a row in host memory is w of these two-component pixels. In fact a row is only w/2 of the two-component pixels, but by telling GL_UNPACK_ROW_LENGTH it is w while telling glTexImage2D the image is w/2 wide, we manage to skip every other row as well.
Compared with the method for building alpha mipmaps, this is a bit more expensive, since each pass requires 4 calls to glTexImage2D and 4 textured quads rather than just one. And since depth is typically a 4-byte float while alpha is just a byte, the host-to-texture-memory bandwidth increases with the component size, so it goes up by a factor of 4 compared to alpha mipmapping. Hence the break-even point where a software implementation begins to beat hardware will probably occur a few levels earlier in the process.
Since this method does not really seem to use any special features of texture mapping hardware, one might ask: why not just use glDrawPixels? The answer is that glDrawPixels does not support drawing a different number of components from what is in the source image. Thus while we can still skip rows using GL_UNPACK_ROW_LENGTH, we cannot skip pixels within a row.
It might be the case that we would rather not do our hierarchical Z-buffer comparisons against the full resolution depth map. In that case we do not need the intermediate levels and so can do N^2 passes instead of 4 passes, where N is the reduction factor in width and height desired.
glEnable(GL_BLEND);                   // the blend equation only applies when blending is enabled
glBlendEquationEXT(GL_MAX_EXT);
glPixelStorei(GL_UNPACK_ROW_LENGTH, w);
glTexImage2D(GL_TEXTURE_2D,
             0,              // mipmap level
             GL_LUMINANCE,   // format in texture memory
             w, h,           // size of original image
             0,              // border
             GL_LUMINANCE,   // format in host memory
             GL_FLOAT,       // data type
             depth_image);
for (int pass = 0; pass < 4; pass++) {
    GLfloat offsetx = (pass & 0x1) ? -0.5f : 0.0f;   // shift by half a destination pixel
    GLfloat offsety = (pass & 0x2) ? -0.5f : 0.0f;
    // Render a textured quad at w/2 x h/2 resolution, at coordinate (offsetx, offsety)
}
// glReadPixels to get the new, reduced level into host memory
glBlendEquationEXT(GL_FUNC_ADD_EXT);  // restore standard blend mode

Essentially we do a subpixel jittering of the quad so that on each pass a unique texel of the texture map is applied to each pixel. In this case we actually benefit from the lack of antialiasing and the zero-order reconstruction that are typically associated with poor image quality.
The basic idea is that we would like to draw the occlusion map, then somehow render the bounding box so that we can instantly tell if any pixel was "updated".
First render the occluders in pure black with an alpha channel, with all lighting attributes turned off and no depth buffering. Set the blending mode to (1-alpha) * src + alpha * dst. Now render the bounding volume, again with lighting completely off, using only one channel, say blue. Using the histogram extension, see what the largest blue component in the resulting image is. If there are enough pixels with a blue value above a particular threshold then the object represented by the bounding volume must be rendered. Otherwise it may (possibly) be culled. Since high-end hardware is typically well optimized for RGBA rendering, it should be possible to quickly test one bounding volume in each color channel, for a total of 3 tests, before having to "reset" the occlusion alpha channel image by recopying it.
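A sketch of one such single-channel test, assuming the histogram functionality of the OpenGL imaging subset (EXT_histogram / ARB_imaging) is available; renderOccluders(), renderBoundingVolume(), the scratch read buffer, and the screen rectangle (bx, by, bw, bh) enclosing the bounding volume are placeholders.

glDisable(GL_LIGHTING);
glDisable(GL_DEPTH_TEST);

// 1. Occluders: pure black, opaque alpha.
glDisable(GL_BLEND);
glColor4f(0.0f, 0.0f, 0.0f, 1.0f);
renderOccluders();

// 2. Bounding volume in blue only, attenuated by the destination alpha:
//    result = (1 - dst_alpha) * src + dst_alpha * dst
glEnable(GL_BLEND);
glBlendFunc(GL_ONE_MINUS_DST_ALPHA, GL_DST_ALPHA);
glColor4f(0.0f, 0.0f, 1.0f, 0.0f);
renderBoundingVolume();

// 3. Histogram the covered screen rectangle; reading the pixels back routes
//    them through the histogram stage of the imaging pipeline.
GLuint blueCounts[256];
glHistogram(GL_HISTOGRAM, 256, GL_RGBA, GL_FALSE);
glEnable(GL_HISTOGRAM);
glReadPixels(bx, by, bw, bh, GL_RGBA, GL_UNSIGNED_BYTE, scratch);
glGetHistogram(GL_HISTOGRAM, GL_TRUE, GL_BLUE, GL_UNSIGNED_INT, blueCounts);
glDisable(GL_HISTOGRAM);

// 4. If enough counts fall in buckets above the chosen threshold, part of the
//    bounding volume shows through the occluders and the node must be drawn.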
The drawback to this method is that it does not tell us which pixels were visible and which were hidden. Alpha buffer occlusion is only a necessary condition for real occlusion, and not a sufficient one. Some sort of depth comparison is also required to give sufficient grounds for culling a node. Thus we would like to have some idea which regions are hidden according to the alpha buffer so that we can test them against a z buffer.
Q: What happens with pixels overdrawn many times using the given compositing operator? A: For a given channel, each overdraw applies c_new = (1-alpha) * c_src + alpha * c_old, so after several overdraws the channel holds a weighted mix of all the source colors rather than a clean indicator of coverage. But since we draw the occluders first and only scan convert the bounding volumes after that, the solution is simply to allow only convex bounding volumes and render them with backface culling on, so the depth complexity of a bounding volume is always 1.
The depth values are not changed, so only the color buffer needs to be cleared for the next three node tests. This method does not use the min-max blending extension.
Actually it should be possible to do a node test per bit of color accuracy if logical OR'ing is used when drawing the bounding volumes. First render with color (0x1, 0x0, 0x0), then color (0x2, 0x0, 0x0), and so on; with 24-bit color that means 24 bounding volumes can be rendered before needing to read back the histogram and reset the color buffer. The drawback is that interpreting the histogram becomes more difficult. For 8-bit RGB channels we need to compute three 256-entry histograms. Separating out the information for the 8 bounding volumes packed into each component is not too difficult: basically OR together the index of each of the 256 buckets that is not empty. The resulting 24-bit number should be 0 in all bit positions corresponding to occluded objects and 1 in positions corresponding to objects that are not occluded.
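A sketch of that decoding step, assuming the three 256-entry channel histograms have already been read back (for instance with glGetHistogram as above) and that bounding volume i was drawn with bit i of the packed 24-bit RGB color:

// OR together the index of every non-empty bucket in each channel histogram.
// A set bit in the result means the corresponding bounding volume produced at
// least one visible pixel, i.e. that node is not occluded.
unsigned int visibilityMask(const GLuint red[256], const GLuint green[256], const GLuint blue[256])
{
    unsigned int r = 0, g = 0, b = 0;
    for (int i = 1; i < 256; i++) {        // bucket 0 carries no bit information
        if (red[i])   r |= (unsigned int)i;
        if (green[i]) g |= (unsigned int)i;
        if (blue[i])  b |= (unsigned int)i;
    }
    return (r << 16) | (g << 8) | b;       // assumed packing: volumes 16-23 in red, 8-15 in green, 0-7 in blue
}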
[Zhang97] "Visibility Culling Using Hierarchical Occlusion Maps", Hansong Zhang, Dinesh Manocha, Tom Hudson, and Kenny Hoff, Proceedings of SIGGRAPH 97, 1997.
[Greene93] "Hierarchical Z-Buffer Visibility", Ned Greene, Michael Kass, and Gavin Miller, Proceedings of SIGGRAPH 93, pp. 231-238, 1993.
[Popescu98] "Efficient Warping for Architectural Walkthroughs using Layered Depth Images", Voicu Popescu, Anselmo Lastra, Daniel Aliaga, and Manuel Oliveira Neto, Proceedings of IEEE Visualization '98, October 18-23, 1998.
[Rafferty98] "3D Image Warping in Architectural Walkthroughs", Matthew M. Rafferty, Daniel G. Aliaga, and Anselmo A. Lastra, Proceedings of VRAIS '98, pp. 228-233, March 14-18, 1998.