Addition of a NumbaCTarget
TL;DR: I will start to work on `NumbaCTarget`, which uses `numba.cfunc` instead of `numba.jit`. Input is welcome.
I have recently started looking into the `NumbaTarget`. My use case is just-in-time compilation of loopy kernels from a C++ application, and embedding a Python interpreter plus Numba seems like an easy way to add the necessary JIT runtime. I carried out some preliminary tests, and the results are actually quite satisfying: I tested with a finite element assembly kernel and cranked up the number of quadrature points to see how performance scales with kernel workload. At some moderately high degree, there was no significant difference between the hand-written code from Dune and the jitted kernel.

However, for very small workloads, as often experienced in low-order FEM, the overhead of calling the jitted function is very high. I suspect that this overhead goes well beyond normal function call overhead, as the compiled output from `numba.jit` is again wrapped in a Python wrapper. Ironically, I need to pack my C arguments into Python objects just to have them unpacked by that wrapper immediately afterwards. From reading the Numba docs, I think that `numba.cfunc` (http://numba.pydata.org/numba-doc/latest/user/cfunc.html) is the correct remedy for this problem (see the sketch after the list below). However, it is more work than just exchanging decorators, as
- the automatic type inference of lazy Numba jitting no longer works, so explicit type signatures are required
- the kernel invocation can no longer use named arguments, even though the exact argument list depends on external (UFL) input
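To make the difference concrete, here is a minimal sketch of the move from `numba.jit` to `numba.cfunc`. This is not loopy-generated code; the kernel body, its name, and its signature are made up for illustration:

```python
from numba import cfunc, carray, types

# With numba.jit, types are inferred lazily on the first call and the
# compiled code sits behind a Python-level dispatcher. numba.cfunc
# instead requires the full C signature up front:
c_sig = types.void(types.CPointer(types.float64),   # output vector
                   types.CPointer(types.float64),   # input vector
                   types.intc)                      # vector length

@cfunc(c_sig)
def axpy_kernel(out_ptr, in_ptr, n):
    # carray reinterprets a raw pointer as an array inside the kernel
    out = carray(out_ptr, (n,))
    vec = carray(in_ptr, (n,))
    for i in range(n):
        out[i] += 2.0 * vec[i]

# axpy_kernel.address is a plain C function pointer: a C/C++ caller can
# invoke it directly, with no Python wrapper (and hence no argument
# boxing/unboxing) on the way.
print(hex(axpy_kernel.address))
```

On the C++ side, one would fetch this address through the embedded interpreter once and then call it like any other function pointer, so the per-call Python overhead disappears. Note that such a call is purely positional, which is why the second point above matters.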
I will therefore add this as an additional target `NumbaCTarget`, trying to reuse as much code as possible through base classes, as is currently done with `NumbaTarget` and `NumbaCudaTarget`. I open this issue to let everybody know that I am working on this. If you have any feedback, let me know.