Recursion in OpenCL kernels

Recursion (at last)

The previous example was a lot of hard work to call a function but can now be extended to do some recursion. Before we do that, let’s briefly recap what recursion looks like in CPU-land.

As mentioned, to call a function arguments and a return address (program counter) are pushed onto the stack to perform a function call. If a function call is made inside an existing function call then those arguments and another return address are added to the stack. This can go on until the functions start returning and the stack unwinds: each function return removes a set of arguments and a return address. A recursive function call is one that calls itself.

The simplest example I could think of was calculating the factorial of a number, e.g. the factorial of 5 is 5 * 4 * 3* 2 * 1 = 120.

int factorial_recursive( int number )
{
    int localNumber = number;

    // exit condition
    if(localNumber <= 1)
    {
        return 1;
    }

    // recursive call
    int numberReturnedFromFunctionCall = factorial_recursive( localNumber - 1 );

    // work load
    int myReturnValue = localNumber * numberReturnedFromFunctionCall;

    // return
    return myReturnValue;
}

This is a little verbose to make it very clear what each of the bits of the function are doing. Now onto the actual recursion in OpenCL:

int factorial_switch( Stack* pStack, int number )
{
    int finalValue = 0; // the value we will ultimately return from this function

    // the various valid labels we can have in our "program"
    typedef enum
    {
        ENTRY_POINT,
        PRE_FACTORIAL_FUNCTION_CALL,
        POST_FACTORIAL_FUNCTION_CALL,
        EXIT_POINT,
        FACTORIAL_FUNCTION,
        PRE_RECURSIVE_FACTORIAL_FUNCTION_CALL,
        POST_RECURSIVE_FACTORIAL_FUNCTION_CALL
    } ProgramLabels;

    ProgramLabels pc = ENTRY_POINT; // the program counter
    bool bDone = false;             // has "program" execution completed

    int returnValueRegister;        // a return value "register"

    // the offset on the pStack frame of the passed arguments
    const int returnAddressOffset = 0;
    const int numberArgOffset = 1;

    while(!bDone)
    {
        switch (pc)
        {
                // Main part of "program"
                //

            case ENTRY_POINT:
                // do some stuff

            case PRE_FACTORIAL_FUNCTION_CALL:
                // get ready for function "call"

                // push variable(s) onto pStack
                stackPush(pStack, number);

                // push "program counter" return label
                stackPush( pStack, (int)POST_FACTORIAL_FUNCTION_CALL );

                // "jump" to function
                pc = FACTORIAL_FUNCTION;
                break;

            case POST_FACTORIAL_FUNCTION_CALL:
                // clean up pStack
                stackPop(pStack, 2);

                // use the returned value
                finalValue = returnValueRegister;

            case EXIT_POINT:
                // mark that we've done
                bDone = true;
                break;

                // Function part of "program"
                //

            case FACTORIAL_FUNCTION:
            {
                // where to return to
                ProgramLabels returnAddress = (ProgramLabels) stackTop(pStack, returnAddressOffset);

                // get variable(s) from pStack
                int localNumber = stackTop(pStack, numberArgOffset);

                // exit condition
                if(localNumber <= 1)
                {
                    // "return" a value
                    returnValueRegister = 1;

                    // "jump" to return label
                    pc = returnAddress;
                    break;
                }
            }

            case PRE_RECURSIVE_FACTORIAL_FUNCTION_CALL:
                // get ready for function "call"

                // push variable(s) onto pStack
                stackPush( pStack, stackTop(pStack, numberArgOffset) - 1 );

                // push "program counter" return label
                stackPush( pStack, (int)POST_RECURSIVE_FACTORIAL_FUNCTION_CALL );

                // "jump" to function
                pc = FACTORIAL_FUNCTION;
                break;

            case POST_RECURSIVE_FACTORIAL_FUNCTION_CALL:
            {
                // clean up pStack
                stackPop(pStack, 2);

                // do some work in the "function"
                //
                int myReturnValue = stackTop(pStack, numberArgOffset) * returnValueRegister;

                // "return" a value
                returnValueRegister = myReturnValue;

                // "jump" to return label
                pc = (ProgramLabels) stackTop(pStack, returnAddressOffset);
                break;
            }

            default:
                // should never get here so just abort
                bDone = true;
                break;
        }
    }

    // return the final calculated value
    return finalValue;
}

__kernel void recurse( __global int* input, __global int* output, __local int* pStackData, size_t maxStackSize )
{
    size_t localId = get_local_id(0);

    Stack stack;
    stackInit( &stack, &pStackData[localId * maxStackSize], maxStackSize );

    size_t i = get_global_id(0);
    output[i] = factorial_switch( &stack, input[i] );
}

Final Thoughts

So there we have it. Recursion in OpenCL to whatever depth is required based on runtime variables 🙂 It is rather messy but very useful if you do want to have a recursive implementation of some algorithm. Bear in mind it’s often not the best way, e.g. a quick sort may not be the best sort to put on a GPU, so remember to think around the problem and no just blindly go in to an implementation. Also, if you want recursion based on a compile time constant this technique can be modified slightly for more speed by using private memory (registers) for the stack rather than local memory as the OpenCL compiler should be able to optimize away most of the register usage.

Now go forth and recurse!

One thought on “Recursion in OpenCL kernels

Leave a Reply

Your email address will not be published. Required fields are marked *