Yesterday I've committed the last work for the Scheme project of SICP. Work, namely a lot of mighty C coding is mainly responsible for not having finished the course yet. However, since I am in the last quarter of the course material I decided to recap from time to time - what are the things I've learned, what was good, what was surprising and so on.
One of my favorite learnings is the idea of a
dispatch table. Instead of having a long list of
if/elseif branches which direct to a certain function the idea is to have this
dispatching being done in a hash table. Since the table is dynamic the dispatching can be changed at run time (
late binding).
Although I appreciate the "get my quickly from A to B" qualities of Python most of my daily business is based on good old C and Java. In this post I want to sketch out how function dispatching could be implemented in C - as usual as high level as possible.
Here is a naive Python example of using a dispatch table. See the last code example of this post of a more Pythonic version. This following code will be the template of our transformation to C later:
#!/usr/bin/python3
def _sumAll(numberList):
sum = 0
for number in numberList:
sum = sum + number
return sum
def _sumEven(numberList):
sum = 0
for number in numberList:
if number % 2 == 0:
sum = sum + number
return sum
def _multiplyAll(numberList):
prod = 1
for number in numberList:
prod = prod * number
return prod
dispatchTable = {
'sumAll' : _sumAll,
'sumEven' : _sumEven,
'multiplyAll' : _multiplyAll
}
def accumulateList(method, numberList):
return dispatchTable[method](numberList)
numbers = (2,4,5,6)
print(accumulateList('sumAll', numbers))
print(accumulateList('sumEven', numbers))
print(accumulateList('multiplyAll', numbers))
Dispatching takes place in
accumulateList where the dictionary
dispatchTable is being looked up for the entry
method. The function returned is directly executed, hence the
numberList argument.
So let'S look at C. As a friend of high-level expressions I'm using as usual GLib data structures.
GList for the number list and
GHashtable for the dispatch table.
#include <stdio.h>
#include <glib-2.0/glib.h>
static GHashTable* dispatchTable;
int _sumAll(GList* numberList) {
int sum = 0;
GList *iter = numberList;
while(iter != NULL) {
sum = sum + GPOINTER_TO_INT(iter->data);
iter = g_list_next(iter);
}
return sum;
}
int _sumEven(GList* numberList) {
int sum = 0;
GList *iter = numberList;
while(iter != NULL) {
int number = GPOINTER_TO_INT(iter->data);
if (number % 2 == 0) {
sum = sum + number;
}
iter = g_list_next(iter);
}
return sum;
}
int _multiplyAll(GList* numberList) {
int prod = 1;
GList *iter = numberList;
while(iter != NULL) {
prod = prod * GPOINTER_TO_INT(iter->data);
iter = g_list_next(iter);
}
return prod;
}
void createDispatchTable() {
dispatchTable = g_hash_table_new(g_str_hash, g_str_equal);
g_hash_table_insert(dispatchTable, g_strdup("sumAll"),
_sumAll);
g_hash_table_insert(dispatchTable, g_strdup("sumEven"),
_sumEven);
g_hash_table_insert(dispatchTable, g_strdup("multiplyAll"),
_multiplyAll);
}
int accumulateList(const char* method, GList* numberList) {
int (*func)(GList*);
func = g_hash_table_lookup(dispatchTable, method);
return func(numberList);
}
void destroyDispatchTable() {
g_hash_table_destroy(dispatchTable);
}
int main() {
GList* numbers = NULL;
createDispatchTable();
numbers = g_list_append(numbers, GINT_TO_POINTER (2));
numbers = g_list_append(numbers, GINT_TO_POINTER (4));
numbers = g_list_append(numbers, GINT_TO_POINTER (5));
numbers = g_list_append(numbers, GINT_TO_POINTER (6));
printf("%i\n",accumulateList("sumAll", numbers));
printf("%i\n",accumulateList("sumEven", numbers));
printf("%i\n",accumulateList("multiplyAll", numbers));
destroyDispatchTable();
g_list_free(numbers);
return 0;
}
The code structure is close to the Python example before. A remark for the strange
GINT_TO_POINTER and
GPOINTER_TO_INT macros: This is the GLib way of storing and retrieving integers in their data structures. For more details refer to the
GLib manual.
createDispatchTable uses a GLib
GHashtable to store
string to
function pointer pairs.
accumulateList is retrieving the right function pointer by looking up the dispatch table. As the next step the function is executed and it's result is directly returned.
Usually I would append a common prefix to the public functions
createDispatchTable(),
accumulateLis() and
destroyDispatchTable() to indicate that they belong to one module (see
here for details) but for this example I tried to keep it simple.
So what do we think about that. Well, of course we could add some macro magic to shrink certain function calls (like the
g_list_append) but I tend to use the C language straight, falling back on the preprocessor only if I can't achieve my goals directly.
The reason for that is more related to (good) code style. A macro always adds a new level of indirection which needs to be resolved by a human brain when trying to understand the code.
Eclipse CDTs macro expansion feature is definitly helpful here but I rather live without personal syntactic sugar.
The matter is different little fellows like
GINT_TO_POINTER and
GPOINTER_TO_INT. They represent a feature which can't be accomplished by using the language straight - so their fine.
I personally would like to see the definition of the dispatch table somehow outside a function, more prominent and if I could choose defined by something like
designated initializers.
The
GHashtable approach does not support that so we would need to replace it by something like a struct (e.g.
DispatchTableEntry) and an array of these
DispatchTableEntry struct. The
accumulateList function would then iterate over this array of structs looking for the right entry.
I tried that approach but it has a couple of downsides. One thing is that for simple usage of
designated initializers the dispatch table array ideally has a fixed size. My personal favorite, a NULL terminated array of unknown length would lead to
a rather odd syntax. So there is a trade-off here.
Also, implementing my own lookup functionality in
accumulateList isn't what I was looking for - I wanted to use this out of the shelf.
An other obvious approach would be to define
enums for the function names and use them as an index of an ordinary C array where the value would be the according function pointer. The lookup would be just a simple array index access using the
enum as well.
Beside the missing runtime flexibility (you can't add
enums at runtime) this approach has the problem that you maintain data at two sections: a new dispatch route would mean an entry in
enums (new function name) and an extra entry in the array (the actual dispatch entry).
To my knowledge the best solution for this double data maintenance issue are
X macros whose philosophy dates back to the 1960s. When you're in a pure C environment (poor you) this would be my way to go.
After weighting my options I stuck with the C implementation above. To add a new dispatch route function
createDispatchTable has to be extended. Everything else is using battle proof implementations (GLib functions) in favour of (once again) rolling my own stuff.
To summarize the C example: this is as good as it gets in (high-level) C.
To conclude I promised to provide a Pythonic version of our example:
#!/usr/bin/python3
from operator import add,mul
from functools import reduce
onlyEvenFilter = lambda x: x % 2 == 0
dispatchTable = {
'sumAll' : lambda list: reduce(add, list),
'sumEven' : lambda list: reduce(add, filter(onlyEvenFilter, list)),
'multiplyAll' : lambda list: reduce(mul, list)
}
def accumulateList(method, numberList):
return dispatchTable[method](numberList)
numbers = (2,4,5,6)
print(accumulateList('sumAll', numbers))
print(accumulateList('sumEven', numbers))
print(accumulateList('multiplyAll', numbers))
That is more then 80 line of code in C versus 21 in Python - the same functionality, expressed more clearer (in my opinion) in 1/4 of the code ;-)