Mark's Noteblog
MARK HARRIS' RESEARCH NOTEBLOG (PUBLIC)
This weblog is where I write research thoughts. It serves as a decentralized research notebook. Hence, "Noteblog".

Monday, February 24, 2003
 

Pros and Cons of SlabOp

I really like how easy it was to write the fluid simulator by re-using a few simple and flexible SlabOp policies (see geep SlabOp, below). I think this is going to make building complex simulations much easier for me. That's the Pro.

The Con: this design doesn't provide a "function call-like abstraction" the way other designs (such as Aaron's) would. It would be nice if, after initializing, say, the fragment programs for the Poisson solver and BCs, you could do this:


for (i = 0; i < _iNumPoissonSteps; ++i)
{
    _pressureBC.Compute(pressure);  // apply pure Neumann boundary conditions
    _poissonSolver.Compute(pressure, pressure, divergence, -1, 0.25f); // perform one Jacobi iteration
}

Here the arguments to _poissonSolver.Compute() are the x (output), x, b, centerFactor, and stencilFactor parameters of the Jacobi fragment program, so you don't have to make a whole bunch of calls to SetTextureParameter, SetFragmentParameter3f, etc. I have some early ideas on a generic way of doing this, too. This would be really nice. However, such a design would put a lot of constraints on the usage and implementation of SlabOp. I don't want to lose the flexibility and ease of reuse that I have right now, and anything I do will keep that in mind. Maybe keep SlabOp as is, and provide another (wrapper?) class to provide the function-call abstraction.
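Just to make the idea concrete, here's a rough sketch of what such a wrapper might look like. This is hypothetical code, not part of geep; the class name, the cached dimensions, and the exact signatures are all illustrative:

// Hypothetical wrapper giving a SlabOp a function-call abstraction. It maps
// positional arguments onto the named parameters of the Jacobi fragment
// program, then invokes the underlying op.
class JacobiOp
{
public:
    void Initialize(FloSlabOp *op, int width, int height)
    {
        _op = op; _width = width; _height = height;
    }

    void Compute(GLuint xOut, GLuint x, GLuint b,
                 float centerFactor, float stencilFactor)
    {
        _op->SetTextureParameter("x", x);
        _op->SetTextureParameter("b", b);
        _op->SetFragmentParameter1f("centerFactor", centerFactor);
        _op->SetFragmentParameter1f("stencilFactor", stencilFactor);
        _op->SetOutputTexture(xOut, _width, _height);
        _op->Compute();
    }

private:
    FloSlabOp *_op;
    int       _width, _height;
};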


 

geep SlabOp

OK, I finally finished the fluid demo that uses SlabOp, and I can describe how it's used, what I like and dislike, etc. Most of the operations in the demo use a generic SlabOp that uses a Cg fragment program (no vertex program), does no fancy render to texture (RTT is currently a bit slow on NVIDIA due to the context switch cost), and updates via a glCopyTexSubImage. It is defined like so:


typedef SlabOp
<
    NoopRenderTargetPolicy,
    NoopGLStatePolicy,
    NoopVertexPipePolicy,
    GenericCgGLFragmentPipePolicy,
    MultiTextureGLComputePolicy,
    CopyTexGLUpdatePolicy
>
FloSlabOp;

CopyTexGLUpdatePolicy simply performs the glCopyTexSubImage. A SetOutputTexture() method is provided to specify the texture object that is the destination of the copy.
MultiTextureGLComputePolicy simply renders a screen quad, and outputs texture coordinates for multiple texture units (user specified).
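The copy-based update is nearly a one-liner. Here's a minimal sketch of what CopyTexGLUpdatePolicy's update step boils down to (illustrative only -- the member names are my shorthand, and I assume a GL_TEXTURE_RECTANGLE_NV target, as the demo uses elsewhere):

void UpdateOutputSlab()
{
    // _iOutputTexture, _iWidth, and _iHeight were set via SetOutputTexture().
    glBindTexture(GL_TEXTURE_RECTANGLE_NV, _iOutputTexture);
    // Copy the rendered viewport contents over the existing texture image.
    glCopyTexSubImage2D(GL_TEXTURE_RECTANGLE_NV, 0, 0, 0, 0, 0, _iWidth, _iHeight);
}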
The real power and flexibility come from GenericCgGLFragmentPipePolicy (there is a similar VertexPipePolicy, also). To use it, you just call its InitializeFP() method, passing it the Cg context and a string filename for a fragment program. It provides methods such as

void SetTextureParameter(string name, GLuint texobj)
for specifying texture inputs, and
void SetFragmentParameter2f(string name, float x, float y)
void SetFragmentParameter3fv(string name, const float *v)
for specifying named fragment program parameters.

I'll use the Jacobi Poisson solver operation as an example. The Cg code for this op looks like this:


fragout main(hvfFlo IN,
             uniform half centerFactor,
             uniform half stencilFactor,
             uniform samplerRECT x,
             uniform samplerRECT b)
{
    fragout OUT;

    half4 xL = h4texRECT(x, half2(IN.WPOS.x - 1, IN.WPOS.y));
    half4 xR = h4texRECT(x, half2(IN.WPOS.x + 1, IN.WPOS.y));
    half4 xB = h4texRECT(x, half2(IN.WPOS.x, IN.WPOS.y - 1));
    half4 xT = h4texRECT(x, half2(IN.WPOS.x, IN.WPOS.y + 1));

    half4 bCenter = h4texRECT(b, IN.WPOS.xy);

    OUT.col = (xL + xR + xB + xT + centerFactor * bCenter) * stencilFactor;

    return OUT;
}


We declare the SlabOp in the fluid simulator class header like this:

FloSlabOp _poissonSolver;

Then on initialization of the simulator, we have to set up the texture inputs and parameters for the Cg programs:


_poissonSolver.InitializeFP(context, "programs/floPoissonSolve.cg");
_poissonSolver.SetTextureParameter("x", _iTextures[TEXTURE_PRESSURE]);
_poissonSolver.SetTextureParameter("b", _iTextures[TEXTURE_DIVERGENCE]);
_poissonSolver.SetFragmentParameter1f("centerFactor", -1);
_poissonSolver.SetFragmentParameter1f("stencilFactor", 0.25f);
_poissonSolver.SetTexCoordRect(0, sMin, tMin, sMax, tMax);
_poissonSolver.SetSlabRect(xMin, yMin, xMax, yMax);
_poissonSolver.SetOutputTexture(_iTextures[TEXTURE_PRESSURE], _iWidth, _iHeight);

And then during the simulation iteration, we use this Jacobi step to solve for the pressure like this:

ClearTexture(_iTextures[TEXTURE_PRESSURE], _iWidth, _iHeight);
for (i = 0; i < _iNumPoissonSteps; ++i)
{
    _pressureBC.Compute();    // apply pure Neumann boundary conditions
    _poissonSolver.Compute(); // perform one Jacobi iteration
}

Very simple! Notice the _pressureBC.Compute() call. Since we have the no-slip condition at the boundaries, we have to enforce pure Neumann boundary conditions during each iteration of the Poisson solution. I could have done the BCs in the same fragment program as the Jacobi step. However, not only does that reduce flexibility, but it is inefficient: conditionals in NV30 fragment programs are performed using conditional execution. That means an if statement can cause twice (or more) as many instructions to be executed. In this case, it's easy to evaluate the conditional "on the CPU" instead. Some of the fragment programs in my demo are less than half the length (compiled) that they were before I removed the conditionals, and this way I render exactly the same number of fragments. That's a big speedup. I used the flexibility of SlabOp to make this easy.

The pure-Neumann boundary condition Cg code looks like this:


fragout main(hvfFlo IN, uniform samplerRECT x)
{
    fragout OUT;
    OUT.col = h4texRECT(x, IN.WPOS.xy + IN.TEX1.xy);
    return OUT;
}

I created another type of SlabOp like this:

typedef SlabOp
<
    NoopRenderTargetPolicy,
    NoopGLStatePolicy,
    NoopVertexPipePolicy,
    GenericCgGLFragmentPipePolicy,
    BoundaryGLComputePolicy,
    NoopUpdatePolicy
>
FloBCOp;

The difference here is that instead of a MultiTextureGLComputePolicy, I use a BoundaryGLComputePolicy. The Compute() method of BoundaryGLComputePolicy draws four single-pixel strip quads around the edges of the viewport instead of a single quad centered in the viewport. Now I can specify that the _poissonSolver SlabOp draws a quad that does not include the one-pixel border at the edges of the viewport. I set up _pressureBC like this in the class header:

FloBCOp _pressureBC;

And initialize it like this:

_pressureBC.InitializeFP(context, "programs/floPoissonBC.cg");
_pressureBC.SetTextureParameter("x", _iTextures[TEXTURE_PRESSURE]);
_pressureBC.SetTexCoordRect(0, 0, 0, _iWidth, _iHeight);
_pressureBC.SetTexResolution(_iWidth, _iHeight);
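To give a feel for it, here's a rough sketch of what BoundaryGLComputePolicy's Compute() might amount to. This is illustrative, not the actual geep code: DrawStrip() is a made-up helper, _w and _h are assumed members, and I assume a pixel-space orthographic projection. The constant TEX1 coordinate on each strip carries the offset to the interior neighbor, which the BC fragment program above adds to WPOS before its fetch.

// Sketch: draw four one-pixel strips around the edge of a _w x _h viewport.
void Compute()
{
    DrawStrip(0,      0,      1,  _h,  1,  0); // left column:  neighbor at +x
    DrawStrip(_w - 1, 0,      1,  _h, -1,  0); // right column: neighbor at -x
    DrawStrip(0,      0,      _w, 1,   0,  1); // bottom row:   neighbor at +y
    DrawStrip(0,      _h - 1, _w, 1,   0, -1); // top row:      neighbor at -y
}

// Hypothetical helper: a quad at (x, y) of size (w, h) with TEX1 = (ox, oy).
void DrawStrip(int x, int y, int w, int h, float ox, float oy)
{
    glMultiTexCoord2fARB(GL_TEXTURE1_ARB, ox, oy); // constant across the quad
    glBegin(GL_QUADS);
    glVertex2f(x,     y);
    glVertex2f(x + w, y);
    glVertex2f(x + w, y + h);
    glVertex2f(x,     y + h);
    glEnd();
}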

So SlabOp's flexibility allowed me to create a custom operation without writing code for a whole new class -- reusing all functionality except the rendering. I didn't even have to create a new header file!

To demonstrate even more flexibility, I use the same _poissonSolver to solve both the Poisson-pressure equation and the viscous diffusion equation. Here's some of the actual code from the Update() method of the simulator, which runs a single step of the simulation:


// After advection...
if (_rViscosity > 0)
{
    float centerFactor = 1 / (_rViscosity * _rTimestep);
    float stencilFactor = 1.0f / (4.0f + centerFactor);
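    // (Where these factors come from: the implicit viscous step solves
    // (I - nu*dt*Laplacian) u' = u. With unit grid spacing, a Jacobi
    // iteration gives u' = (uL + uR + uB + uT + alpha * u) / (4 + alpha)
    // with alpha = 1/(nu*dt) -- hence centerFactor = alpha and
    // stencilFactor = 1/(4 + alpha).)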

    // To use the same solver, I just reset its parameters before using it...
    _poissonSolver.SetTextureParameter("x", _iTextures[TEXTURE_VELOCITY]);
    _poissonSolver.SetTextureParameter("b", _iTextures[TEXTURE_VELOCITY]);
    _poissonSolver.SetFragmentParameter1f("centerFactor", centerFactor);
    _poissonSolver.SetFragmentParameter1f("stencilFactor", stencilFactor);
    _poissonSolver.SetTexCoordRect(0, 0, 0, _iWidth, _iHeight);
    _poissonSolver.SetOutputTexture(_iTextures[TEXTURE_VELOCITY], _iWidth, _iHeight);

    // In this case the BCs are no-slip velocity BCs. Since the buffer is
    // initialized to zero, and _poissonSolver.Compute() doesn't overwrite
    // the outside 1-pixel boundary, they are enforced.
    for (i = 0; i < _iNumPoissonSteps; ++i)
        _poissonSolver.Compute();
}

// Compute the divergence of the velocity field.
_divergence.Compute();

// Solve for the pressure disturbance caused by the divergence, by solving
// the Poisson problem Laplacian(p) = div(u). Since I changed the parameters
// to use _poissonSolver for the viscous diffusion solution, I have to set
// them for the Poisson-pressure problem here.
_poissonSolver.SetTextureParameter("x", _iTextures[TEXTURE_PRESSURE]);
_poissonSolver.SetTextureParameter("b", _iTextures[TEXTURE_DIVERGENCE]);
_poissonSolver.SetFragmentParameter1f("centerFactor", -1);
_poissonSolver.SetFragmentParameter1f("stencilFactor", 0.25f);
_poissonSolver.SetOutputTexture(_iTextures[TEXTURE_PRESSURE], _iWidth, _iHeight);

// Clear the pressure texture, to initialize the pressure disturbance to zero
// before iterating.
ClearTexture(_iTextures[TEXTURE_PRESSURE], _iWidth, _iHeight);
for (i = 0; i < _iNumPoissonSteps; ++i)
{
    _pressureBC.Compute();    // apply pure Neumann boundary conditions
    _poissonSolver.Compute(); // perform one Jacobi iteration
}


Very flexible! I'll talk about pros and cons in my next post.


Friday, February 14, 2003
 

Updating the GPGPU website...

I'm going to be updating the GPGPU site. I'm moving over to a more dynamic content management (blog/news style). After evaluating several alternatives, I think I'm going to use blosxom to do the job.

The main benefits this will give me are:


  • Categories

  • Sorting -- by category or by date

  • Searching (with some extra effort)


So, the question is: what categories should I have? Currently, I can think of the obvious ones: "research", "courses", and "for developers", which I sort of already have on the site. But I'm thinking about getting finer-grained with "research". For example: "simulation", "image processing", etc. And I've actually had requests from some people to link their research, which I think is more of a graphics use of a GPU than a GP use. After thinking about it, some of these (like Purcell's / Carr's ray tracing) really deserve mention / linkage. So maybe a "graphics" category is in order.


Wednesday, February 05, 2003
 

Abstracting GPGPU (ongoing)

I used a fluid simulation demo I have to write for my GDC talk as an excuse to start developing a simple framework. I think my ideas up to this point will work pretty well, and I should have a simple demonstration in a few days.

I particularly like using string names to refer to parameters (especially with Cg). For numeric parameters, Cg does state shadowing, so you only need to set these parameters when they change. For texture parameters, Cg does no state shadowing, so my example FragmentPipePolicy does state shadowing itself. I just keep a std::map that maps Cg texture parameters to texture object IDs. Then in SetState() (called before Compute()), I iterate over the map and bind and enable the texture parameters using Cg runtime calls. In ResetState(), I iterate over the map and disable the texture parameters. The map is nice because it won't allow duplicates -- if you call SetTextureParameter with the same CGparameter multiple times, it overwrites the texture object ID stored in the map.
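Roughly, the shadowing boils down to something like this condensed sketch (not the actual class; I show only the map and the relevant methods, and I assume the string name from the public interface has already been resolved to a CGparameter with cgGetNamedParameter()):

// Texture-parameter shadowing via a std::map, as described above.
std::map<CGparameter, GLuint> _textures;

void SetTextureParameter(CGparameter param, GLuint texObj)
{
    _textures[param] = texObj; // overwrites any previous binding for param
}

void SetState() // called before Compute()
{
    std::map<CGparameter, GLuint>::iterator i;
    for (i = _textures.begin(); i != _textures.end(); ++i)
    {
        cgGLSetTextureParameter(i->first, i->second);
        cgGLEnableTextureParameter(i->first);
    }
}

void ResetState()
{
    std::map<CGparameter, GLuint>::iterator i;
    for (i = _textures.begin(); i != _textures.end(); ++i)
        cgGLDisableTextureParameter(i->first);
}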

I'm on the fence about whether the policy-based design is a win or not. So far I have only built two different SlabOps, and both use my generic FragmentPipePolicy. The only difference is that one is for display to the screen, so it uses a NoopUpdatePolicy, while the other uses a CopyUpdatePolicy. I haven't done anything I couldn't have done with a base class that provides virtual methods with default (Noop) implementations for the methods the policy-based ("host") version inherits from its policy classes. I do have a feeling that the policies will come in handy down the line -- I think they will be more flexible and support many unforeseen designs.


Wednesday, January 29, 2003
 

Abstracting GPGPU: Function Parameters

Aaron writes: "In my experience, getting the fcn parameters nailed down is going to be very hard. Is there a way to make this flexible?"

He's right, it's hard. I actually think the way Cg has done it is pretty clean and flexible. You refer to program parameters (textures and uniform / varying vertex / fragment parameters) by a string name -- the name used for the formal parameter in the shader code.

A VertexPipePolicy or FragmentPipePolicy can provide setters and getters for parameters that work this way. If the policy is implemented using Cg, this is really simple. If not, it's not that hard to implement. I imagine that Cg just stores an STL map of parameters for each program.
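With the Cg runtime, such a setter is only a couple of lines. A sketch (the _fragProgram member is my assumption; the Cg calls are real):

void SetFragmentParameter2f(const std::string &name, float x, float y)
{
    CGparameter p = cgGetNamedParameter(_fragProgram, name.c_str());
    if (p != NULL)
        cgGLSetParameter2f(p, x, y);
}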

One thought I had about this is that it would be inefficient compared to writing a setter/getter for each parameter that knows exactly what to change without searching in a map. True, but unless a program has a lot of parameters, I doubt you'll notice a performance hit from the lookup. Some of the NVIDIA demos (see the skin shader demo) do a lookup every frame just because they don't want to define a global / member variable for every parameter and keep it around. The demo uses multiple fragment programs that have the same parameters -- so it's convenient to do a lookup.

Then again, this method might not be right for some applications. But that's why using policies is so nice -- you don't have to write a whole new SlabOp subclass just to change the way parameters are handled. You just write a new VertexPipe or FragmentPipe policy. Less work, fewer bugs.


 

Well, rather than write another page, I wrote some sample code. This code is obviously incomplete, but it gives an idea of the structure of host and policy classes.

The policies will of course require parameters -- this class has no data! Fleshing the details of that out is the next task.


 

Abstracting GPGPU (cont.)

Earlier I laid out this anatomy of a SlabOp ("Slab Operation"):


ActivateRenderTarget();
SetGLState();
BindInputSlabs();
Render();
UpdateOutputSlab();
UnsetGLState();
DeactivateRenderTarget();

Each of the "organs" in the anatomy is a component of the slab operation. When all are performed in order, the SlabOp is complete. I mentioned that I thought SetGLState() should be divided into more detailed components. Let's do that:


ActivateRenderTarget();
SetGLState();
SetVertexPipeState();
SetFragmentPipeState();
BindInputSlabs();
Render();
UpdateOutputSlab();
UnsetGLState();
UnsetVertexPipeState();
UnsetFragmentPipeState();
DeactivateRenderTarget();

The Set[Vertex | Fragment]PipeState() methods (and their associated "Unsetters") serve to put the GPU's component processors into the right state for the SlabOp. This includes binding the right programs, enabling them, and setting program local parameters.

Wow, so that, basically, is an abstraction of what any normal SlabOp will do. Now, the challenge is to build a SlabOp framework in a generic way, so that we can get a nearly infinite variety of functionality for a reasonable amount of code. One approach, the approach I would have taken a week ago, would be to write a base class that looks something like this:


class SlabOp
{
public:
    void Compute()
    {
        ActivateRenderTarget();
        SetGLState();
        SetVertexPipeState();
        SetFragmentPipeState();
        BindInputSlabs();
        Render();
        UpdateOutputSlab();
        UnsetGLState();
        UnsetVertexPipeState();
        UnsetFragmentPipeState();
        DeactivateRenderTarget();
    }

protected:
    virtual void ActivateRenderTarget() {}
    virtual void SetGLState() {}
    virtual void SetVertexPipeState() {}
    virtual void SetFragmentPipeState() {}
    virtual void BindInputSlabs() {}
    virtual void Render() {}
    virtual void UpdateOutputSlab() {}
    virtual void UnsetGLState() {}
    virtual void UnsetVertexPipeState() {}
    virtual void UnsetFragmentPipeState() {}
    virtual void DeactivateRenderTarget() {}
};


Now what's wrong with that? Well, first of all, it's not very flexible. What if you wanted some of the steps in Compute() to take some parameters? You would have to overload them in a derived class. But then you would be forced to override Compute() in the subclass. Suboptimal.

More importantly, code reuse is suboptimal. If I implement five different SlabOp derived classes that all use the same vertex pipe state, all five have to reimplement SetVertexPipeState(), even though the code is the same. Realistically, I would create an intermediate subclass that implements the shared functionality, and then derive the five from it. But still, I think there is a better way.

That better way is something I've been reading about this week: Policy-Based Class Design. It is described in detail in the book Modern C++ Design, by Andrei Alexandrescu.

In a nutshell, each of the "organs" in the above anatomy of a SlabOp is a policy. In Alexandrescu's words, "A policy defines a class interface or a class template interface.... For a given policy, there can be an unlimited number of implementations. The implementations of a policy are called policy classes. Policy classes are not intended for standalone use; instead, they are inherited by, or contained within, other classes. ... The classes that use one or more policies are called hosts, or host classes." Thus, by defining a policy (a simple interface) for each of the above organs, and combining them via multiple inheritance or aggregation into a host class (the SlabOp class itself), we can create many varieties of SlabOps with just a few policy implementations! The SlabOp host would be a template class with a template parameter for each policy choice.
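Here's a condensed sketch of what the host could look like. The grouping of organs into six policies is just one possibility (it matches the FloSlabOp typedef further up the page), and the method names follow the anatomy above:

template <class RenderTargetPolicy,
          class GLStatePolicy,
          class VertexPipePolicy,
          class FragmentPipePolicy,
          class ComputePolicy,
          class UpdatePolicy>
class SlabOp : public RenderTargetPolicy,
               public GLStatePolicy,
               public VertexPipePolicy,
               public FragmentPipePolicy,
               public ComputePolicy,
               public UpdatePolicy
{
public:
    void Compute()
    {
        RenderTargetPolicy::ActivateRenderTarget();
        GLStatePolicy::SetGLState();
        VertexPipePolicy::SetVertexPipeState();
        FragmentPipePolicy::SetFragmentPipeState();
        ComputePolicy::BindInputSlabs();
        ComputePolicy::Render();
        UpdatePolicy::UpdateOutputSlab();
        GLStatePolicy::UnsetGLState();
        VertexPipePolicy::UnsetVertexPipeState();
        FragmentPipePolicy::UnsetFragmentPipeState();
        RenderTargetPolicy::DeactivateRenderTarget();
    }
};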

The only really scary thing about policies is that their interfaces are not defined by code. A policy is just what it says: an English description of what a policy class should, and must, do. (Note that it may specify both "should" and "must" items!) Thus, there is no compile-time checking of the policy itself. However, if the policy implementer follows the policy writer's written instructions well, compile-time checks do come into effect: the host class simply won't compile against a policy class that is missing a method the host actually uses.

I'll get to actually defining some policies later.

One question lingers in my head: what about program local parameters? They can be of many types, sizes, etc. Is there a generic way to provide parameters that automatically get set, without requiring the user to create a new subclass of SlabOp for each operation desired? I think there are a number of ways, but I haven't thought them through yet.


Tuesday, January 28, 2003
 

Abstracting GPGPU

Tonight I started an email discussion with Aaron Lefohn about how best to design an abstraction for general purpose computation on graphics hardware (GPGPU). I'm going to start off this shiny new research blog with notes on that subject. This is gonna be long, I just know it...

The specific type of GPGPU that we are referring to (Aaron may have a slightly different view), is the use of the GPU for lattice computations. The lattice is typically a 2D or 3D grid of cells. Computations on the lattice consist of local (SIMD-style) operations that update the state of a cell based on its current state and the state of its neighbors. The computations may also depend on cells beyond the immediate neighborhood, but typically the operations are local.

Lattices, of course, map to pixel buffers on the GPU. Transient lattice states may exist only in a frame buffer or off-screen buffer, but this is rare. Usually, lattice state is stored in a texture. Scalar lattices need only a single-component (R, G, B, etc.) texture. A 3D vector lattice needs an RGB texture. In any case, updates of the lattice state correspond to rendering to a texture with the GPU state set in a way that enables the desired computation. Inputs to the computation may be other lattice states (textures). If a texture is used as input to its own update, we get feedback. This is where the update operation becomes very powerful. Complex simulations can be built from simple rendering operations. Since the GPU is optimized for these operations, these simulations can be very efficient -- often many times more efficient than an equivalent simulation implemented on a single CPU.
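In OpenGL terms, one feedback-style update amounts to something like this bare sketch (DrawLatticeQuad() stands in for whatever geometry provokes the fragments, and I assume the fragment state for the update is already bound):

// One lattice update with feedback: the state texture is read as input,
// and the result is copied back over it.
glBindTexture(GL_TEXTURE_RECTANGLE_NV, stateTexture); // input = previous state
DrawLatticeQuad(width, height);                       // fragments compute the new state
glCopyTexSubImage2D(GL_TEXTURE_RECTANGLE_NV, 0,       // write the result back
                    0, 0, 0, 0, width, height);       // into the same texture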

From this point forward, I'll refer to a lattice update as a Slab Operation (borrowing Aaron's term, "Slab"). This is because frame buffers are (currently -- maybe I'll write some notes on this another day) inherently 2D, but lattices may be higher-dimensioned. This means that all updates of lattices, regardless of dimensionality, must occur on a single 2D cross-section, or slab, at a time. (I used to call these slices, but now that I think about it, if the lattice occupies a 3D volume of space, the slice represents a finite thickness of the volume, so slab is a nicer term.) To make things even easier, I'm going to use the shorthand SlabOp.

I would like to have a generic C++ class, or set of classes, that supports and abstracts slab operations. I've written a lot of slab computation code already -- on NV2X and NV30 GPUs, and every slab operation has certain characteristics, which means it's ripe for abstraction. I even abstracted the computation a bit in my previous work on CML simulation, but I didn't have enough experience with it yet to do a good job.

So, what are these shared characteristics? Here is the anatomy of a slab operation:


ActivateRenderTarget();
SetGLState();
BindInputSlabs();
Render();
UpdateOutputSlab();
UnsetGLState();
DeactivateRenderTarget();

Now let's break it down:

ActivateRenderTarget(): Often SlabOps are done in an off-screen buffer (aka pbuffer), but this need not be true. If a pbuffer is used, then it must be activated before the SlabOp is performed. Usually, however, multiple SlabOps will comprise a single time step (or other kind of simulation step). The cost of switching to the pbuffer device / rendering context once per SlabOp can add up, so usually it is better to activate a render target once for all SlabOps in a time step, and keep it active until the time step computation is complete. In this case, ActivateRenderTarget(), as well as DeactivateRenderTarget(), may be noops. Performance will change in the future, so any abstraction should be flexible.

SetGLState(): Set the rendering state needed to correctly update the slab on the GPU. This may be "fixed-function" GPU state, or, on new hardware, it may be programmable vertex or fragment pipeline state, such as binding a fragment program, setting fragment program local parameters, etc. In fact, I think it's better to divide SetGLState() into multiple parts, to better enable reuse (more later).

BindInputSlabs(): The new state of the output slab depends on some computed function of the input slabs (Often the input is the previous value of the output slab. See feedback, above). The input slabs, like the output slab, are textures, so we have to bind them to the appropriate texture units in order for the SlabOp to proceed correctly.

Render(): This is where the real work actually gets done. Ironically, this is usually the simplest portion of the SlabOp implementation, because most of the implementer's work is in setting the state correctly. Typically it consists of rendering a single screen-aligned and -sized quad into a viewport sized to fit the output slab resolution. Sometimes multiple quads are rendered. Sometimes (but not often) other things are rendered. It has to be flexible. One thing that I don't like about this is the name. I called it Render() because that's how it is implemented on a GPU -- by rendering something so that fragments get processed using all the state and fragment programs that have been set up. Really the rendering is used just to provoke the computations. Perhaps a better name is Submit()? You set up all the state for the SlabOp, then you submit it, then you clean up after it. Yeah, I'll go with that for now.
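In the common case, Submit() is just a handful of GL calls. A sketch (assuming an orthographic projection that maps (0,0)-(w,h) onto the viewport, and texture rectangle coordinates, which are addressed in texels):

// Render one viewport-sized quad; each covered pixel becomes one
// fragment program invocation, i.e. one lattice-cell update.
glViewport(0, 0, w, h);
glBegin(GL_QUADS);
glTexCoord2f(0, 0); glVertex2f(0, 0);
glTexCoord2f(w, 0); glVertex2f(w, 0);
glTexCoord2f(w, h); glVertex2f(w, h);
glTexCoord2f(0, h); glVertex2f(0, h);
glEnd();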

UpdateOutputSlab(): The output slab is a texture. We have to update it after Submit() completes. If we are using Render-To-Texture (RTT), then the update will actually be completed by DeactivateRenderTarget(), below. If not, here is where we copy from the active render target to texture memory. (glCopyTexSubImage(), in OpenGL).

UnsetGLState(): This is where we clean up after ourselves. Of course, there are two schools of thought on graphics state management. One school, the "I'm a loner... a rebel" school, says to set the state you need when you need it (and unset the state you don't need!). Another school, the "Mamma said to..." school, says set the state you need, but put it back when you are done. If the latter is desirable, this is where you put the state back. If the former, this can be a noop.

DeactivateRenderTarget(): If ActivateRenderTarget() changed to a new render target, then here is where we change back. If it didn't, this is a noop. If RTT is in use, this is where the proper RTT interface is called to make sure the render target is now usable as a texture.

I need to take a break. I've got more thoughts about a more detailed breakdown, how to build this abstraction into a real framework, etc. I'll type about them later.


 

New!
I just created this blog as a place to type my notes. It's convenient because it is decentralized from where I am working at any given moment.


Copyright 2003 Mark J. Harris