Thursday, November 24, 2011

Advanced HLSL using closures and function pointers

Shader languages like HLSL, Cg or GLSL are nowadays driving the most powerful processors in the world, but if you are developing with them, you may have been already a little bit frustrated by one of their expressiveness limitations: the common problem of abstraction and code reuse. In order to overcome this problem, solutions so far were mostly using a glue combination of #define/#include preprocessors directives in order to generate combinations of code, permutation of shaders, so called UberShaders. Recently, this problem has been addressed, for HLSL (new in Direct3D11), by providing the concept of Dynamic Linking, and for GLSL, the concept of SubRoutines, For Direct3D11, the new mechanism has been only available for Shader Model 5.0, meaning that even if this could greatly simplified the problem of abstraction, It is unfortunately only available for Direct3D11 class graphics card, which is of course a huge limitation...

But, here is the good news: While the classic usage of dynamic linking is not really possible from earlier version (like SM4.0 or SM3.0), I have found an interesting hack to bring some kind of closures and functions pointers to HLSL(!). This solution doesn't involve any kind of preprocessing directive and is able to work with SM3.0 and SM4.0, so It might be interesting for folks like me that like to abstract and reuse the code as often as possible! But let's see how It can be achieved...


A simple problem of abstraction and code reuse in HLSL


I have been working recently at my work on a GPU implementation of a versatile perlin/simplex/fbm/turbulence noise in HLSL. While some of the individual algorithm are pretty simples, it is often common to use several permutations of those functions in order to produce some nice noise and turbulences functions (like the worm-lava texture I did for Ergon 4k intro). Thus, they are an ideal candidate to demonstrate the use of closures and functions pointers. I won't explain here the basic principle of perlin and fbm noise generation to focus on the problem of code reuse in HLSL.


Here is a simplified version of a Turbulence Noise implemented in a Pixel Shader:

float PerlinNoise(float2 pos){
  ....
}

float AbsNoise(float2 pos) {
    return abs(PerlinNoise(pos));
}

float FBMNoise(float2 pos) {
    float value = 0.0f;
    float frequency = InitialFrequency;
    float amplitude = 1.0f;
    // Classic FBM loop
    for ( int i=0; i < Octaves; i++ )
    {
        float noiseValue = AbsNoise(pos);
        value += amplitude * noiseValue;
        frequency *= Lacunarity;
        amplitude *= Amplitude;
    }
    return value;
}

// Turbulence noise:
// Fbm + Abs + Perlin
float TurbulenceAbsPerlinNoisePS(float4 pos : SV_POSITION, float2 texPos : TEXCOORD0)
 : SV_Target
{
    return FBMNoise(texPos);
}

The problem with the previous code is that if we want to change the code behind AbsNoise called from FBMNoise (for example, apply cos/sin on the coordinates, or use of a simplex noise instead of the old Perlin Noise), we would have to duplicate the FBMNoise function to call the other function. Of course, we could use the preprocessor to inline the code, but It would end up in something less readable, less debuggable, error prone...etc.

Another example: Ken Perlin introduced some really cool functions to modify the noise, like the famous marble effect:

static float stripes(float x, float f) {
    float t = .5 + .5 * sin(f * 2*PI * x);
    return t * t - .5;
}

float MarbleNoise(float2 pos) { 
    return stripes(pos.x + 2 * FBMNoise(pos), 1.6f);
}

But wait! The MarbleNoise function could even be used in place of the AbsNoise function, in order to get another noise effect. So we could have a marble function calling a FBM... but we could also have a marble function called by a FBM... or both...  ugh... so as we can see, It is possible to permute those functions to generate interesting patterns, but unfortunately, the shading language doesn't provide us a way to make those functions pluggable!... Almost! In fact, there is a small breach in the HLSL language and we are going to use it!


Introduction to Dynamic Linking in HLSL


So as I said in the introduction, Direct3D11 has introduced the concept of dynamic linking. I suggest the reader to go to an explanation on msdn "Interfaces and classes". Basically, the main feature introduced in the HLSL language is a bit of Object Oriented Programming (OOP) in order to address the problem of abstraction: Now HLSL has the class and interface keyword. But they were mainly introduced for dynamic linking of a shader, and as I said, dynamic linking is only available with SM5.0 profile.


// An interface describing a light
interface ILight {
    float3 ComputeAmbient(...);
    float3 ComputeDiffuse(...);
    float3 ComputeSpecular(...);
};

// A 1st implem of the ILight interface
class MyModelLight1 : ILight { 
    float3 ComputeAmbient(...) {
        ...
        return color;
    } 
    ...
};

// A 2ns implem of the ILight interface
class MyModelLight2 : ILight { 
    float3 ComputeAmbient(...) {
        ...
        return color;
    } 
    ...
}

// The variable through which we are going to access the light model
ILight abstractLight;

// We need to declare the two implems in order to get a reference 
// to them from C++ code
MyModelLight1  modelLight1;
MyModelLight2 modelLight2;

float4 PixelShader(PS_INPUT Input ) : SV_Target
{
    // Call the abstractLight that was previously setup by C++ at 
    // PixelShader creation time
    float3 ambient = abstractLight.ComputeAmbient(Input.Pos);
    float3 diffuse = abstractLight.ComputeDiffuse(Input.Pos);
    float3 specular = abstractLight.ComputeSpecular(Input.Pos);

    return float4(saturate( Ambient + Diffuse + Specular ), 1.0);
}

To be able to use this shader, we need to setup the abstractLight variable from the C++/C# code, through the usage of ID3D11Device::CreateClassLinkage and in the instatiation of a Pixel Shader ID3D11Device::CreatePixelShader.

As we can see, we need to declare the interface and classes variable globally, so that they can be accessed by the C++ program. This is the standard way to use dynamic linking in HLSL... but what If we want to use this differently?

Hacking function pointers in HLSL


The principle is very simple: Instead of using interface and classes as global variables, we can in fact use them as function parameters and even local variables from method. The way to use it is then straightforward:
// Base class for a calculator
interface ICalculator {
    float Compute(...);
};

// 1st implem of the calculator
class ClassicCalculator : ICalculator { 
    float Compute(...) {
        ...
        return value;
    } 
};

// 2nd implem of the calculator
class ComplexCalculator : ICalculator { 
    float Compute(...) {
        ...
        return value;
    } 
};

// A function using the interface ICalculator 
float MyFunctionUsingICalculator(ICalculator calculator, ...) {
    ...
    value += calculator.Compute(...);
    ...
    return value;
} 

// A Pixel shader using the ClassicCalculator
float PixelShader1(PS_INPUT Input ) : SV_Target
{
    ClassicCalculator classic;
    return MyFunctionUsingICalculator(classic, ...);
}

// A Pixel shader using the ComplexCalculator
float PixelShader2(PS_INPUT Input ) : SV_Target
{
    ComplexCalculator complex;
    return MyFunctionUsingICalculator(complex, ...);
}

The previous example could be compiled flawlessly with ps_4_0 (Shader Model 4) or ps_3_0 (with some minor changes for the pixel shader), and It would compile just fine! So basically, the interface ICalculator is acting as a function pointer, that has two implementations available through the ClassicCalculator and ComplexCalculator classes.  MyFunctionUsingICalculator doesn't have to change its signature to adapt to the underlying function, so as we can see, we have a suitable solution for developing function pointers in HLSL.

Now, lets try to see if we could use this model to build our flexible noise functions. Replace ICalculator by a INoise interface. We are seeing that an implementation would have to call another INoise interface. In fact, ideally, we would like to code something like this:
// Base class for a noise function
interface INoise {
    float Compute(...);
};

// Perlin noise implem
class PerlinNoise : INoise { 
    float Compute(...) {
        ...
        return value;
    } 
};

// FBM noise implem
class FBMNoise : INoise { 
    // Would be ideal to be able to do that
    // We could even make an abstract generic class 
    // that could provide a base Source INoise
    // BUT, THIS IS NOT COMPILING!!!
    INoise Source;

    float Compute(...) {
        float value = 0.0f;
        float frequency = InitialFrequency;
        float amplitude = 1.0f;
        // Classic FBM loop
        for ( int i=0; i < Octaves; i++ )
        {
            // Call the source abstract INoise
            float noiseValue = Source.Compute(pos);
            value += amplitude * noiseValue;
            frequency *= Lacunarity;
            amplitude *= Amplitude;
        }
        return value;
    } 
};


// A Pixel shader using the FBMNoise combined with PerlinNoise
float PixelShader1(PS_INPUT Input ) : SV_Target
{
    FBMNoise fbmNoise;
    PerlinNoise perlin;
    // This is not possible, interface variable members are not allowed
    fbmNoise.Source = perlin;
    return fbmNoise.Compute(...);
}


Unfortunately, HLSL doesn't permit the use of interface as variable members!. This limitation was quite annoying, as It excludes a whole range of combination, like aggregation, composition... making these function pointers useful only for a very limited set of cases...
I have tried to overcome this problem using abstract class instead of interface, as classes can be declared as variable members of classes... but, again, there is a huge limitation: The class variable is in fact acting a a final or const variable that cannot be changed, thus making its usage almost useless...
But I knew that HLSL permits lots of unusual constructions, and this is where closures are going to resolve this.

Hacking Closures in HLSL


So we know that interfaces can be used as function pointers, but their usage is limited as we cannot use anykind of composition. An interesting fact is that we can declare local variables in methods as being class or interfaces... The trick is to use a quite uncommon feature of HLSL: It is possible to declare local classes inside a method, that can access local parameters! Therefore, It is possible to use a kind of deferred composition/aggregation using this technique. Let's rewrite our noise functions using this new closure technique:

1. Declare a INoise interface that is able to compute the noise by using a next INoise implementation.

// It is possible to compile this code under ps_4_0 and ps_3_0

// Declare our INoise interface
interface INoise {
    // Here an interesting hack: We can declare a method that is returning a INoise 
    // interface. This method will be implemented by the pixel shaders. 
    INoise Next();
    
    // The compute method of a Noise
    float Compute(float2 pos);
};

2. Declare NoiseBase as an abstract implementation of INoise that is implementing the methods. If we had the keyword abstract in hlsl we wouldn't have to implement methods of this class.

// We are creating an abstract class from INoise in order
// to implement both methods
class NoiseBase : INoise {
    INoise Next() {
        // This code will never be used. It is only 
        // used to declare this class
        NoiseBase base;
        return base;
    }

    float Compute(float2 pos) {
        // This code will never be used. It is only 
        // used to declare this class
        return Next().Compute(pos);
    }
};

3. Use NoiseBase to implement final INoise functions. If you look at AbsNoise, FbmNoise or MarbleNoise, they are using the INoise::Next() method to get an instance of the INoise interface they rely on. This is where functions pointers are extremely useful here.

// PerlinNoise implem
class PerlinNoise : NoiseBase {
    float Compute(float2 pos) {
        // call a standard perlin_noise implemented as a simple external function
        return perlin_noise(pos);
    }
};

// AbsNoise implem
class AbsNoise : NoiseBase {
    float Compute(float2 pos) {
        // Note: We are using Next to access the next underlying function pointer
        return abs(Next().Compute(pos));
    }
};

// FbmNoise implem
class FbmNoise : NoiseBase {
    float Compute(float2 pos) {
        float value = 0.0f;
        float amplitude = 1.0f;
        float frequency = InitialFrequency;
        for ( int i=0; i < Octaves; i++ )
        {
            float noiseValue = Next().Compute(pos);
            value += amplitude * noiseValue;
            frequency *= Lacunarity;
            amplitude *= Amplitude;
        }
        return value;
    }
};

// MarbleNoise implem
class MarbleNoise : NoiseBase {
    float Compute(float2 pos) { 
        return stripes(2 * Next().Compute(pos, frequency), 1.6f);
    }

    static float stripes(float x, float f) {
        float t = .5 + .5 * sin(f * 2*PI * x);
        return t * t - .5;
    }
};

4. Implements the pixel shaders with the closure mechanism. We are declaring local classes that will override INoise::Next() method in order to chain INoise function pointers together.

// Fbm -> PerlinNoise
float FbmPerlinNoise2DPS( float4 pos : SV_POSITION, float2 texPos : TEXCOORD0 )
 : SV_Target
{
    // Look! We are declaring a local class
    class Noise1 : PerlinNoise {} noise1;
    // and this local classs can access local variable!
    // For example, Noise2 can access previous noise1 variable.
    class Noise2 : FbmNoise { INoise Next() { return noise1; } } noise2;

    // Allowing us to cascade the calls and making a kind of deferred composition.
    return noise2.Compute(texPos);
}

// Fbm -> Abs -> PerlinNoise
float FbmAbsPerlinNoise2DPS( float4 pos : SV_POSITION, float2 texPos : TEXCOORD0 )
 : SV_Target
{
    class Noise1 : PerlinNoise {} noise1;
    class Noise2 : AbsNoise { INoise Next() { return noise1; } } noise2;
    class Noise3 : FbmNoise { INoise Next() { return noise2; } } noise3;

    // FbmNoise is calling indirectly AbsNoise that will call PerlinNoise.
    return noise3.Compute(texPos);
}

// Marble -> Fbm -> Abs -> PerlinNoise
float FbmAbsPerlinNoise2DPS( float4 pos : SV_POSITION, float2 texPos : TEXCOORD0 )
 : SV_Target
{
    class Noise1 : PerlinNoise {} noise1;
    class Noise2 : AbsNoise { INoise Next() { return noise1; } } noise2;
    class Noise3 : FbmNoise { INoise Next() { return noise2; } } noise3;
    class Noise4 : MarbleNoise { INoise Next() { return noise3; } } noise4;

    // MarbleNoise is calling FbmNoise that is calling indirectly AbsNoise 
    // that will call PerlinNoise.
    return noise4.Compute(texPos);
}


// Fbm -> Marble -> Abs -> PerlinNoise
float FbmAbsPerlinNoise2DPS( float4 pos : SV_POSITION, float2 texPos : TEXCOORD0 )
 : SV_Target
{
    class Noise1 : PerlinNoise {} noise1;
    class Noise2 : AbsNoise { INoise Next() { return noise1; } } noise2;
    class Noise3 : MarbleNoise { INoise Next() { return noise2; } } noise3;
    class Noise4 : FbmNoise { INoise Next() { return noise3; } } noise4;

    // FbmNoise is calling MarbleNoise that is calling indirectly AbsNoise 
    // that will call PerlinNoise.
    return noise4.Compute(texPos);
}

Et voila! As you can see, we are able to declare local classes from a pixel shader that are acting as closures. It is for example even possible to declare local classes that have a specific code in their Compute() methods.
Behind the scene, when chaining the INoise::Next() methods, the fxc HLSL compiler is seeing all thoses classes as "INoise*".
It is then possible to perform a fbm(marble(abs(perlin_noise()))) as well as a marble(fbm(abs(perlin_noise()))).

In the end, It is effectively possible to implement closures in HLSL that can be used in SM4.0 as well as SM3.0!

Improving closures chaining


From the previous example, we can extend the concept by
1. Adding static local constructors to each Noise function :
// PerlinNoise implem
class PerlinNoise : NoiseBase {
    float Compute(float2 pos) {
        // call a standard perlin_noise implemented as a simple external function
        return perlin_noise(pos);
    }
    // Add local "constructor"
    static INoise New() {
        PerlinNoise noise;
        return noise;
    }
};

// AbsNoise implem
class AbsNoise : NoiseBase {
    float Compute(float2 pos) {
        // Note: We are using Next to access the next underlying function pointer
        return abs(Next().Compute(pos));
    }
    // Add local constructor and chain with From INoise
    static INoise New(INoise from) {
        class LocalNoise : AbsNoise { INoise Next() { return from; } } noise;
        return noise;
    }
};

// Add the same constructors to FbmNoise and MarbleNoise.
// ....
2. And then we can rewrite the Pixel shader functions to chain operators in a shorter form:
// Fbm -> Marble -> Abs -> PerlinNoise
float FbmAbsPerlinNoise2DPS( float4 pos : SV_POSITION, float2 texPos : TEXCOORD0 )
 : SV_Target
{
    // FbmNoise is calling MarbleNoise that is calling indirectly AbsNoise 
    // that will call PerlinNoise.
    return FbmNoise::New(MarbleNoise::New(AbsNoise::New(PerlinNoise::New()))).Compute(texPos);
}

This way, It allows a syntax that is even more concise and modular!

Further Considerations


This is a very exciting technique that could open lots of abstraction opportunities while developing in HLSL. Though, in order to use this technique, there are a couple of advantages and things to take into account:
  • An interface cannot inherit from another interface (that would be really interesting)
  • An interface can only have method members.
  • A class can inherit from another class and from several interfaces.
  • Unlike in C/C++, we cannot pre-declare an interface, but we can use a declaration being declared (See the example of the method INoise::Next, returning a INoise).
  • The compiler has a limitation against the reuse of an implementation in a call chain and will complain about a recursive call (even if there is no recursive call at all): For example, It is not possible to reuse twice the sample type of class closure in a call chain, meaning that it is not possible to make a call chain like this one: Marble => FBM => Marble => Abs => Perlin. The fxc compiler would complain about the second "Marble" as It would see it as a kind of recursive call. In order to reuse a function, we need to duplicate it, that's probably the only point that is annoying here.
  • Generated compiled asm output from closures are exactly the same as using standard inlining methods.
  • Before going to local class-closure, I have tried several techniques that were sometimes crashing fxc compiler.
  • Thus, as it is a way of hacking the usage HLSL, It is not guarantee that this will be supported in the future. But at least, if it is working for SM5.0, SM4.0 and 3.0, we can expect that we are safe for a while!
  • Also, the compilation time under vs_3_0/ps_3_0 profile seems to take more time, not sure if its the language construction or a regular behavior of 3.0 profiles.
Let me know if you are able to use this technique and If you are finding other interesting constructions or problems. That would be very interesting to dig a little more into the opportunities it opens. Lastly, I have done a small google search about this kind of technique, but didn't found anything... but It could have been used already by someone else, thus this whole technique is a new hypothetical discovery, but I enjoyed a lot to discover it!

4 comments:

  1. Very clever! Now on to try it on Cg :)

    ReplyDelete
  2. At first I thought: mixing interfaces and abstract classes != closures, then read on and it hit me. I approve.

    ReplyDelete
  3. You can declare a new class within a function!? Very nice indeed! I didn't know HLSL allowed this. I have been working on a different approach for using higher order functions. A library I am working on translates F# to HLSL. Check it out: https://github.com/rookboom/SharpShaders/wiki/Higher-order-functions

    ReplyDelete
  4. Love this. I'm using it in my deferred shading engine, to select the materials based on an index, so i don't have to loop trough each option. Really nice indeed

    ReplyDelete