cpyext performance

Antonio Cuni
Recently I have been playing a bit with cpyext, to see if there is any low-hanging fruit to be picked to improve its performance.

I didn't get any real result but I think it's interesting to share my findings. 
The benchmark I'm using is here:

It contains a simple C extension defining three methods, one for each of the METH_NOARGS, METH_O and METH_VARARGS flags.

So first, the results with CPython and PyPy 5.8:

$ python bench.py 
noargs : 0.78 secs
onearg : 0.89 secs
varargs: 1.05 secs

$ pypy bench.py 
noargs : 1.67 secs
onearg : 2.13 secs
varargs: 4.89 secs

Then, I tried my cpyext-jit branch; this branch does two things:
1) it makes cpyext visible to the JIT, and adds enough @jit.dont_look_inside so that it actually compiles
2) it merges part of the cpyext-callopt branch, up to rev 9cbc8bd76297 (more on this later): this adds fast paths for METH_NOARGS and METH_O to avoid going through the slow __args__.unpack():

$ pypy-cpyext-jit bench.py
noargs : 0.30 secs
onearg : 0.31 secs
varargs: 4.90 secs

So, apparently this is enough to greatly speed up the noargs and onearg calls, and even to be faster than CPython; varargs, which gets no fast path, is unchanged. Note that "onearg" calls "simple.onearg(None)".
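
To make the fast-path idea in point 2 more concrete, here is a rough sketch in plain Python. It is not the code from the cpyext-callopt branch: the class name BuiltinMethodSketch and its attributes are made up for illustration, and only the METH_* flag values are the real ones from CPython's methodobject.h. The point is just that METH_NOARGS and METH_O functions can be called directly, without building an argument tuple first.

```python
# Flag values as defined in CPython's methodobject.h.
METH_VARARGS = 0x0001
METH_NOARGS = 0x0004
METH_O = 0x0008


class BuiltinMethodSketch(object):
    """Hypothetical stand-in for a wrapper around a C function.

    `cfunc` plays the role of the C function pointer and always takes
    two arguments, (self, arg), like a PyCFunction does.
    """

    def __init__(self, name, cfunc, flags, w_self=None):
        self.name = name
        self.cfunc = cfunc
        self.flags = flags
        self.w_self = w_self

    def call(self, *args):
        if self.flags & METH_NOARGS:
            # Fast path: nothing to unpack, the C function receives NULL.
            if args:
                raise TypeError("%s() takes no arguments (%d given)"
                                % (self.name, len(args)))
            return self.cfunc(self.w_self, None)
        if self.flags & METH_O:
            # Fast path: pass the single argument as-is, no tuple needed.
            if len(args) != 1:
                raise TypeError("%s() takes exactly one argument (%d given)"
                                % (self.name, len(args)))
            return self.cfunc(self.w_self, args[0])
        # Generic path (METH_VARARGS): the C function expects a tuple.
        # In a real interpreter this is where the argument unpacking and
        # tuple building would cost time.
        return self.cfunc(self.w_self, tuple(args))


noargs = BuiltinMethodSketch("noargs", lambda w_self, arg: arg, METH_NOARGS)
onearg = BuiltinMethodSketch("onearg", lambda w_self, arg: arg, METH_O)
varargs = BuiltinMethodSketch("varargs", lambda w_self, arg: arg, METH_VARARGS)
```

In this sketch only the varargs case ever builds a tuple, which matches the numbers above: noargs and onearg get faster, varargs does not change.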

However, things become more complicated as soon as I start passing various kinds of objects to onearg():

$ pypy bench_oneargs.py   # pypy 5.8
onearg(None): 2.09 secs
onearg(1)   : 2.07 secs
onearg(i)   : 4.98 secs
onearg(i%2) : 4.92 secs
onearg(X)   : 2.13 secs
onearg((1,)): 2.30 secs
onearg((i,)): 9.80 secs

$ pypy-cpyext-jit bench_oneargs.py 
onearg(None): 0.30 secs
onearg(1)   : 0.30 secs
onearg(i)   : 2.52 secs
onearg(i%2) : 2.56 secs
onearg(X)   : 0.30 secs
onearg((1,)): 0.30 secs
onearg((i,)): 7.45 secs

So the call optimization still helps, but as soon as we need to convert an object from PyPy to CPython we are horribly slow. However, it is interesting to note that:
1) if we pass a constant object, we are fast: None, 1, (1,)
2) if we pass X (which is a global X=100), we are still fast
3) any other object which is created on the fly is slow
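
For reference, the seven cases can be written out as a small timing harness like the one below. This is only a sketch of what such a script might look like, not the actual bench_oneargs.py: the loop functions and run_all are invented names, and onearg is a pure-Python stand-in so that the script runs without the C extension (with the real extension one would use simple.onearg instead).

```python
import time

X = 100  # a module-level global, as in case 2 above


def onearg(obj):
    # Pure-Python stand-in for the METH_O function of the C extension.
    return None


# Constant or global arguments (the fast cases in the results above).
def loop_none(n):
    for i in range(n):
        onearg(None)

def loop_const_int(n):
    for i in range(n):
        onearg(1)

def loop_global(n):
    for i in range(n):
        onearg(X)

def loop_const_tuple(n):
    for i in range(n):
        onearg((1,))


# Arguments computed on every iteration (the slow cases above).
def loop_fresh_int(n):
    for i in range(n):
        onearg(i)

def loop_mod(n):
    for i in range(n):
        onearg(i % 2)

def loop_fresh_tuple(n):
    for i in range(n):
        onearg((i,))


VARIANTS = [
    ("onearg(None)", loop_none),
    ("onearg(1)", loop_const_int),
    ("onearg(i)", loop_fresh_int),
    ("onearg(i%2)", loop_mod),
    ("onearg(X)", loop_global),
    ("onearg((1,))", loop_const_tuple),
    ("onearg((i,))", loop_fresh_tuple),
]


def run_all(n=100000):
    """Time every variant for n iterations and return {label: seconds}."""
    results = {}
    for label, loop in VARIANTS:
        start = time.time()
        loop(n)
        results[label] = time.time() - start
        print("%-12s: %.2f secs" % (label, results[label]))
    return results


if __name__ == "__main__":
    run_all()
```

With the stand-in function the timings say nothing about cpyext, of course; the harness is only meant to show which argument expressions fall into which of the three groups.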

The traces look more or less the same in all three cases, so I don't really understand where the difference comes from.

Finally, about the cpyext-callopt branch, which was started in Leysin by Richard, Armin and me: I am not sure I fully understand the purpose of dbba78b270fd. The optimization done in 9cbc8bd76297 already seems to work well, so what am I missing?


pypy-dev mailing list