To complete our stack we just need to set it up before use. Enough local memory needs to allocated so that each thread in a workgroup can access maxStackSize elements, i.e. maxStackSize * numItemsInWorkGroup * sizeof(int) bytes.
__kernel void useStack( __local int* pStackData, size_t maxStackSize, ... ) { size_t localId = get_local_id(0); Stack stack; stackInit( &stack, &pStackData[localId * maxStackSize], maxStackSize ); ... etc. }
A Simple OpenCL Example
Armed with a stack and some knowledge of how functions are called here’s a simple OpenCL function to demonstrate “calling” a function via a stack.
int simple_switch( Stack* pStack, int number ) { int finalValue = 0; // the value we will ultimately return from this function // the various valid labels we can have in our "program" typedef enum { ENTRY_POINT, PRE_FUNCTION_CALL, POST_FUNCTION_CALL, EXIT_POINT, FUNCTION_TO_CALL } ProgramLabels; ProgramLabels pc = ENTRY_POINT; // the program counter bool bDone = false; // has "program" execution completed int returnValue; // a return value "register" while(!bDone) { switch (pc) { case ENTRY_POINT: // do some stuff case PRE_FUNCTION_CALL: // get ready for function "call" // push variable(s) onto pStack stackPush(pStack, number); // push "program counter" return label stackPush( pStack, (int)POST_FUNCTION_CALL ); // "jump" to function pc = FUNCTION_TO_CALL; break; case POST_FUNCTION_CALL: // clean up pStack stackPop(pStack, 1); stackPop(pStack, 1); // use the returned value finalValue = returnValue * 10; case EXIT_POINT: // mark that we've done bDone = true; break; case FUNCTION_TO_CALL: { // where to return to ProgramLabels returnAddress = (ProgramLabels) stackTop(pStack, 0); stackPop(pStack, 1); // get variables from pStack int localNumber = stackTop(pStack, 0); stackPop(pStack, 1); // do some work in the "function" // localNumber++; // "return" a value returnValue = localNumber; // "jump" to return label pc = returnAddress; break; } default: // should never get here so just abort bDone = true; break; } } // return the final calculated value return finalValue; }
Call it from a kernel something like
__kernel void recurse( __global int* input, __global int* output, __local int* pStackData, size_t maxStackSize ) { size_t localId = get_local_id(0); Stack stack; stackInit( &stack, &pStackData[localId * maxStackSize], maxStackSize ); size_t i = get_global_id(0); output[i] = simple_switch( &stack, input[i] ); }
The function doesn’t really do much other than demonstrate how to pull an argument from the stack, modify it and return the modified value. The change in program counter are done via the labels within the switch statement.
Thanks for the great post and examples.