Sonntag, 2. November 2014

Reflections on SICP (Part 1) - Function dispatching in C

Yesterday I've committed the last work for the Scheme project of SICP. Work, namely a lot of mighty C coding is mainly responsible for not having finished the course yet. However, since I am in the last quarter of the course material I decided to recap from time to time - what are the things I've learned, what was good, what was surprising and so on.

One of my favorite learnings is the idea of a dispatch table. Instead of having a long list of if/elseif branches which direct to a certain function the idea is to have this dispatching being done in a hash table. Since the table is dynamic the dispatching can be changed at run time (late binding).

Although I appreciate the "get my quickly from A to B" qualities of Python most of my daily business is based on good old C and Java. In this post I want to sketch out how function dispatching could be implemented in C - as usual as high level as possible.

Here is a naive Python example of using a dispatch table. See the last code example of this post of a more Pythonic version. This following code will be the template of our transformation to C later:

#!/usr/bin/python3

def _sumAll(numberList):
    sum = 0
    for number in numberList:
        sum = sum + number
    
    return sum

def _sumEven(numberList):
    sum = 0
    for number in numberList:
        if number % 2 == 0:
            sum = sum + number

    return sum

def _multiplyAll(numberList):
    prod = 1
    for number in numberList:
        prod = prod * number
    
    return prod


dispatchTable = {
    'sumAll' : _sumAll,
    'sumEven' : _sumEven,
    'multiplyAll' : _multiplyAll
}

def accumulateList(method, numberList):
    return dispatchTable[method](numberList)

numbers = (2,4,5,6)

print(accumulateList('sumAll', numbers))
print(accumulateList('sumEven', numbers))
print(accumulateList('multiplyAll', numbers))

Dispatching takes place in accumulateList where the dictionary dispatchTable is being looked up for the entry method. The function returned  is directly executed, hence the numberList argument.

So let'S look at C. As a friend of high-level expressions I'm using as usual GLib data structures. GList for the number list and GHashtable for the dispatch table.
#include <stdio.h>
#include <glib-2.0/glib.h>

static GHashTable* dispatchTable;

int _sumAll(GList* numberList) {
    int sum = 0;
    GList *iter = numberList;

    while(iter != NULL) {
        sum = sum + GPOINTER_TO_INT(iter->data);
        iter = g_list_next(iter);
    }

    return sum;
}

int _sumEven(GList* numberList) {
    int sum = 0;
    GList *iter = numberList;

    while(iter != NULL) {
        int number = GPOINTER_TO_INT(iter->data);
        if (number % 2 == 0) {
            sum = sum + number;
        }
        iter = g_list_next(iter);
    }

    return sum;
}

int _multiplyAll(GList* numberList) {
    int prod = 1;
    GList *iter = numberList;

    while(iter != NULL) {
        prod = prod * GPOINTER_TO_INT(iter->data);
        iter = g_list_next(iter);
    }

    return prod;
}

void createDispatchTable() {
    dispatchTable = g_hash_table_new(g_str_hash, g_str_equal);

    g_hash_table_insert(dispatchTable, g_strdup("sumAll"), 
                        _sumAll);
    g_hash_table_insert(dispatchTable, g_strdup("sumEven"), 
                        _sumEven);
    g_hash_table_insert(dispatchTable, g_strdup("multiplyAll"), 
                        _multiplyAll);
}

int accumulateList(const char* method, GList* numberList) {
    int (*func)(GList*);
    func = g_hash_table_lookup(dispatchTable, method);

    return func(numberList);
}

void destroyDispatchTable() {
    g_hash_table_destroy(dispatchTable);
}

int main() {
    GList* numbers = NULL;
    createDispatchTable();

    numbers = g_list_append(numbers, GINT_TO_POINTER (2));
    numbers = g_list_append(numbers, GINT_TO_POINTER (4));
    numbers = g_list_append(numbers, GINT_TO_POINTER (5));
    numbers = g_list_append(numbers, GINT_TO_POINTER (6));

    printf("%i\n",accumulateList("sumAll", numbers));
    printf("%i\n",accumulateList("sumEven", numbers));
    printf("%i\n",accumulateList("multiplyAll", numbers));

    destroyDispatchTable();
    g_list_free(numbers);

    return 0;
}

The code structure is close to the Python example before. A remark for the strange GINT_TO_POINTER and GPOINTER_TO_INT macros: This is the GLib way of storing and retrieving integers in their data structures. For more details refer to the GLib manual.

createDispatchTable uses a GLib GHashtable to store string to function pointer pairs.

accumulateList is retrieving the right function pointer by looking up the dispatch table. As the next step the function is executed and it's result is directly returned.

Usually I would append a common prefix to the public functions createDispatchTable(),
accumulateLis()  and destroyDispatchTable() to indicate that they belong to one module (see here for details) but for this example I tried to keep it simple.

So what do we think about that. Well, of course we could add some macro magic to shrink certain function calls (like the g_list_append) but I tend to use the C language straight, falling back on the preprocessor only if I can't achieve my goals directly.

The reason for that is more related to (good) code style. A macro always adds a new level of indirection which needs to be resolved by a human brain when trying to understand the code.  Eclipse CDTs macro expansion feature is definitly helpful here but I rather live without personal syntactic sugar.

The matter is different little fellows like GINT_TO_POINTER and GPOINTER_TO_INT. They represent a feature which can't be accomplished by using the language straight - so their fine.

I personally would like to see the definition of the dispatch table somehow outside a function, more prominent and if I could choose defined by something like designated initializers.

The GHashtable approach does not support that so we would need to replace it by something like a struct (e.g. DispatchTableEntry) and an array of these DispatchTableEntry struct.  The accumulateList function would then iterate over this array of structs looking for the right entry.

I tried that approach but it has a couple of downsides. One thing is that for simple usage of designated initializers the dispatch table array ideally has a fixed size. My personal favorite, a NULL terminated array of unknown length would lead to a rather odd syntax. So there is a trade-off here.

Also, implementing my own lookup functionality in accumulateList isn't what I was looking for - I wanted to use this out of the shelf.

An other obvious approach would be to define enums for the function names and use them as an index of an ordinary C array where the value would be the according function pointer. The lookup would be just a simple array index access using the enum as well.

Beside the missing runtime flexibility (you can't add enums at runtime) this approach has the problem that you maintain data at two sections: a new dispatch route would mean an entry in enums (new function name) and an extra entry in the array (the actual dispatch entry).

To my knowledge the best solution for this double data maintenance issue are X macros whose philosophy dates back to the 1960s. When you're in a pure C environment (poor you) this would be my way to go.

After weighting my options I stuck with the C implementation above. To add a new dispatch route function createDispatchTable has to be extended. Everything else is using battle proof implementations (GLib functions) in favour of (once again) rolling my own stuff.

To summarize the C example: this is as good as it gets in (high-level) C.

To conclude I promised to provide a Pythonic version of our example:
#!/usr/bin/python3

from operator import add,mul
from functools import reduce

onlyEvenFilter = lambda x: x % 2 == 0

dispatchTable = {
    'sumAll' : lambda list: reduce(add, list),
    'sumEven' : lambda list: reduce(add, filter(onlyEvenFilter, list)),
    'multiplyAll' : lambda list: reduce(mul, list)
}

def accumulateList(method, numberList):
    return dispatchTable[method](numberList)

numbers = (2,4,5,6)

print(accumulateList('sumAll', numbers))
print(accumulateList('sumEven', numbers))
print(accumulateList('multiplyAll', numbers))

That is more then 80 line of code in C versus 21 in Python - the same functionality, expressed more clearer (in my opinion) in 1/4 of the code ;-)

Keine Kommentare:

Kommentar veröffentlichen