Static bytecode instruction optimization and pypy (JIT?) optimization


Static bytecode instruction optimization and pypy (JIT?) optimization

Rocky Bernstein
In looking at Python bytecode over the years, I've noticed that the compiler does little to no traditional static-compiler optimization. Specifically:

* Dead code elimination (most of the time)
* Jumps to jumps, or jumps to the next instruction
* Constant propagation (most of the time)
* Common subexpression elimination
* Tail recursion
* Code hoisting/sinking

Yes, over the years more compiler optimization has been added, but it's still pretty sparse.

From the little I've looked at pypy, it's pretty much the same at the Python bytecode level. That's why a Python decompiler for, say, 2.7 works largely unmodified on the corresponding pypy 2.7 version. The same is true for 3.2 versus pypy 3.2.

I understand, though, that pypy can do, and probably does, some of this optimization when it JITs. But my question is: if traditional compiler optimization were done, would it hinder, be neutral to, or help pypy's optimization?

Of course, if there were static compiler optimization of the type described above, that might be a win when JITing doesn't kick in. (And perhaps then who cares.)

But I'm interested in the other situation where both are in play.
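
To make the constant-propagation point concrete, here is a minimal sketch (mine; the exact bytecode listing varies by CPython version):

    import dis

    def f():
        a = 2
        b = a + 3   # a traditional optimizer would fold this to b = 5
        return b

    dis.dis(f)

The listing still shows `a` being stored, reloaded, and added at run time; nothing is propagated or folded across the assignment.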


Re: Static bytecode instruction optimization and pypy (JIT?) optimization

Rocky Bernstein
Yes, that's true, but I believe that such decisions should be offered to the programmers rather than dictated.  There is also a whole other discussion on

* how/where optimizations/transformations could be recorded
* how tracebacks could be handled
* debugging optimized code

But here I wanted to focus on just the optimization interaction. If you are implying that pypy doesn't make the kinds of changes that would cause such behavioral changes, then that would suggest there may be a lot of benefit in static compiler optimization.

On Sun, Jul 9, 2017 at 9:41 PM, [hidden email] <[hidden email]> wrote:
> Part of the problem is behavioral changes. If you implement tail-call
> elimination, your tracebacks change. Common subexpression elimination
> will also have subtle behavioral changes.
>
> --
> Ryan (ライアン)
> Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else
> http://refi64.com



Re: Static bytecode instruction optimization and pypy (JIT?) optimization

William ML Leslie
In reply to this post by Rocky Bernstein
On 10 July 2017 at 11:10, Rocky Bernstein <[hidden email]> wrote:

> In looking at Python bytecode over the years, I've noticed that the
> compiler does little to no traditional static-compiler optimization.
> Specifically:
>
> Yes, over the years more compiler optimization has been added, but
> it's still pretty sparse.
>
> From the little I've looked at pypy, it's pretty much the same at the
> Python bytecode level. That's why a Python decompiler for, say, 2.7
> works largely unmodified on the corresponding pypy 2.7 version. The
> same is true for 3.2 versus pypy 3.2.
>
> I understand, though, that pypy can do, and probably does, some of
> this optimization when it JITs. But my question is: if traditional
> compiler optimization were done, would it hinder, be neutral to, or
> help pypy's optimization?
>

It'd likely be neutral with respect to the JIT optimisation, which
operates on traces (what actually happens to the rpython objects)
rather than bytecode.  Within a trace, which is a fairly closed
universe, all of those optimisations are very easy to make.

As Ryan has mentioned, the bigger concern is going to be around the
expectations that people have about how python code will behave, and a
secondary concern is going to be people relying on optimisations that
only work on pypy.  It's one thing for people to tune their programs
for pypy, and another for people to rely on certain optimisations
being performed for correct execution.

If you attempt this, here are a few things to keep in mind:

> * Dead code elimination (most of the time)

Mostly a code-size optimisation and not so key for performance, but it
is very strange that python generates different code for `return` at
the end of a function and for `else: return` also at the end of a
function.  On the other hand, `if 0:` statements are removed, which is
handy.
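
A quick way to see the `if 0:` removal (a sketch using the stdlib dis
module; listings differ across versions):

    import dis

    def f():
        if 0:
            print("never")   # compiled away entirely
        return 1

    dis.dis(f)

No test and no dead branch appear in the output.  Change it to
`debug = 0` followed by `if debug:` and the branch comes back, as
comes up again later in this thread.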

> * Jumps to jumps, or jumps to the next instruction

SGTM

> * Constant propagation (most of the time)

This is an interesting one.  Consider beta-reducing xs in this code
(assume xs is otherwise unused):

    xs = [1, 7, 2]

    if x in xs:
        ...

Doing so would allow the list to be converted to a tuple and stored as
a constant against the code object (mostly because the value is never
used in an identity-dependent way).  However, people can use the
debugger to set a breakpoint on the assignment to xs, which they can't
do if no evaluation happens on this line.  They can even set xs to be
a different value in the debugger before continuing.
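
Incidentally, CPython's peephole optimizer already does the tuple
conversion when the literal appears directly in the test; it is the
intervening name binding that blocks it.  A quick check (this is
version-dependent):

    import dis

    dis.dis(lambda x: x in [1, 7, 2])
    # the list literal shows up as a single LOAD_CONST of the tuple
    # (1, 7, 2), since its identity is never observable; bind it to xs
    # first, as above, and the conversion no longer happens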

I'm a big fan of prebuilding objects at compile time; it's something
that the pypy team wished the JVM could do back when the Java backend
was written, because loading time was awful.  Even now the JVM is
making changes to the constant pool layout that improve that
possibility.

> * Common subexpression elimination

This is not widely applicable in python, in fact the only
subexpressions that can be eliminated are a small group of expressions
that apply to objects constructed from literals.  A notable example is
that you can't remove a duplicated `x = 1 + y` because the object that
`y` refers to may have overridden __radd__ and that method may even
have side-effects.  There can be some success when either
whole-program optimisation, encoding preconditions into optimistically
compiled module code, or intraprocedural effect analysis is performed,
but there isn't much precedent for the first two in pypy and the third
is the subject of ongoing research.
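
For instance (my illustration; it runs on both 2 and 3):

    class Noisy(object):
        def __radd__(self, other):
            print("side effect")   # observable work on every evaluation
            return 42

    y = Noisy()
    x = 1 + y   # prints "side effect"
    x = 1 + y   # must print again; merging the two adds would be wrong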

> * Tail recursion

For functions that the compiler has shown are total and have a
reasonable, static bound on memory allocation and time?  Any situation
where asyncs need to be suspended, lest anyone hit C-c and generate a
traceback or kill the operation because it is taking too long, needs
to be very tightly time-bounded.
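
Ryan's traceback point, concretely (a sketch; the frame counts are
what current CPython prints):

    def countdown(n):
        if n == 0:
            raise ValueError("boom")
        return countdown(n - 1)   # a tail call

    countdown(3)

Today the traceback shows one countdown frame per level of recursion;
with tail-call elimination the intermediate frames would vanish,
changing what users see.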

> * Code hoisting/sinking

Again, without a really solid type or effect system, it's really hard
to know you're not changing the behaviour of the program when
performing such code motion, because almost all operations can
delegate to user-defined methods.
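
One sketch of the problem (mine):

    class Counted(object):
        reads = 0

        @property
        def weight(self):
            Counted.reads += 1   # every attribute load is observable
            return 2

    obj = Counted()
    total = 0
    for i in range(3):
        total += obj.weight      # looks loop-invariant...

    print(Counted.reads)         # prints 3; hoisting the load out of
                                 # the loop would make this print 1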

> Of course, if there were static compiler optimization of the type described
> above, that might be a win when JITing doesn't kick in. (And perhaps then
> who cares)
>
> But I'm interested in the other situation where both are in play.
>

A direction that might be interesting is seeing how far you can get by
making reasonable assumptions (module bindings for certain functions
have their original values, this value has this exact code object as
its special method `__radd__` or method `append`, this module `foo`
imported has this exact hash), checking them and branching to
optimised code, which may or may not be pypy bytecode.  The
preconditions themselves can't be written in pypy bytecode - you don't
want to be executing side effects when pre-emptively *looking up*
methods, which can happen due to descriptors, so you'd need to go
behind the scenes.  Nevertheless, they can be platform-independent and
could live in the .pyc for a first cut.
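
A hand-written sketch of that guard-and-specialise idea (the names and
structure here are mine, not anything pypy does today):

    import math

    _expected_sqrt = math.sqrt      # the binding we specialised against

    def hypot_generic(x, y):
        return math.sqrt(x * x + y * y)

    def hypot_fast(x, y):
        # precondition: the module binding still has its original value
        if math.sqrt is not _expected_sqrt:
            return hypot_generic(x, y)   # deoptimise and fall back
        return _expected_sqrt(x * x + y * y)

    print(hypot_fast(3.0, 4.0))     # 5.0, via the fast path
    math.sqrt = lambda v: 0.0       # someone rebinds the global...
    print(hypot_fast(3.0, 4.0))     # ...and the guard falls back: 0.0

A real implementation would check such guards behind the scenes, as
described above, rather than in app-level code.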

At the moment, it's necessary to assume *nothing* on return from *any*
user code, which can violate assumptions in remarkable ways, including
"what if the cmp or key functions change the list being sorted" or
"what if an __eq__ method removes the key being examined from the dict
or re-assigns it with a different hash", two cases that the pypy and
python runtimes handle without crashing.  Another good source of
problems is that python's concurrent memory model is far stricter than
Java's, so an optimisation that might be fine in a single threaded
situation may break sequential consistency.
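
The dict case, as a runnable curiosity (my construction; both runtimes
survive it):

    class Sneaky(object):
        def __init__(self, d):
            self.d = d
        def __hash__(self):
            return 1                       # force hash collisions
        def __eq__(self, other):
            self.d.pop("unrelated", None)  # mutate the dict mid-lookup
            return False

    d = {"unrelated": 0}
    a, b = Sneaky(d), Sneaky(d)
    d[a] = 1
    d[b] = 2   # the collision probe calls __eq__, which mutates d; the
               # runtime copes, so an optimiser must not assume the dict
               # is untouched across a lookup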

I had been thinking that this sort of thing is a huge engineering
effort, but if you're just generating optimised bytecode and jumping
to that with some custom precondition-checking operation, rather than
generating machine code, it might be achievable.  Generating machine
code is difficult because neither of rpython's compilers is well
suited to generating general-purpose position-independent machine code
at app level: the JIT generates straight-line code only, and this
assumption is made throughout the codebase; it also assumes constants
are in memory during compilation and doesn't expect them to be
reloaded at a different location, whereas the static rpython compiler
expects to work on a whole program.

--
William Leslie

Notice:
Likely much of this email is, by the nature of copyright, covered
under copyright law.  You absolutely MAY reproduce any part of it in
accordance with the copyright law of the nation you are reading this
in.  Any attempt to DENY YOU THOSE RIGHTS would be illegal without
prior contractual agreement.

Re: Static bytecode instruction optimization and pypy (JIT?) optimization

Armin Rigo-2
In reply to this post by Rocky Bernstein
Hi Rocky,

On 11 July 2017 at 06:28, Rocky Bernstein <[hidden email]> wrote:
> Yes, that's true, but I believe that such decisions should offered to the
> programmers rather than dictated.

If the choice for the programmer is between "run my program exactly
like I expect" and "run my program with some unexpected subtle effects
and no noticeable performance gain", I think the choice need not be
offered.


A bientôt,

Armin.

Re: Static bytecode instruction optimization and pypy (JIT?) optimization

William ML Leslie
In reply to this post by William ML Leslie
On 11 July 2017 at 18:22, Rocky Bernstein <[hidden email]> wrote:
> There's too much generalization and abstraction in the discussion at least
> for me. So let me try to make this a little more concrete with real
> examples. First note that, although Python, yes, does "optimize" "if 0:"
> away, as soon as
> you do something as simple as "debug = 0; if debug:" it is confounded. And
> in my own experience I have done this.
>

Typically DEBUG is a global in the module, which means it can be set
from /outside/ the module, so it's not constant as far as the runtime
is concerned.  Within a function it's a bit wacky, sure, and it's only
kept around for interactive debugging reasons afaict.
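
A two-file sketch of why that matters (the file names are mine):

    # config.py
    DEBUG = 0

    def log(msg):
        if DEBUG:              # reads the module's *current* binding
            print(msg)

    # client.py, somewhere else entirely
    import config
    config.DEBUG = 1           # legal rebinding from outside the module
    config.log("now visible")  # prints, so folding `if DEBUG:` away
                               # inside log() would have been unsound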

Here's a possible type for `self.a` which illustrates why
CSE/Store-Sinking on a STORE_ATTR is not valid for the example
function `foo`.

class A(object):
    def set_b(self, value):
        print value          # observable side effect on every store
        self._b = value
    b = property(set_b)      # assigning to .b goes through set_b

class Example(object):
    def __init__(self):
        self.a = A()

    def foo(self):
        self.a.b = 5         # must print 5: sinking or eliminating this
        x = 0                # store would lose the side effect
        if x:
            self.a.b = 2
        else:
            return

Example().foo()

--
William Leslie


Re: Static bytecode instruction optimization and pypy (JIT?) optimization

Rocky Bernstein
In reply to this post by Armin Rigo-2


On Tue, Jul 11, 2017 at 4:01 AM, Armin Rigo <[hidden email]> wrote:
> Hi Rocky,
>
> On 11 July 2017 at 06:28, Rocky Bernstein <[hidden email]> wrote:
>> Yes, that's true, but I believe that such decisions should be offered
>> to the programmers rather than dictated.
>
> If the choice for the programmer is between "run my program exactly
> like I expect" and "run my program with some unexpected subtle effects
> and no noticeable performance gain", I think the choice need not be
> offered.

Sure, but that's a straw-man argument and has a lot of packed opinions in it. Things like "run my program exactly like I expect" are loaded with opinions.  As has already been noted, Python removes "if 0:" by default.  When I first saw that, I thought it was cool, but honestly it wasn't "exactly as I expect", because I wanted my decompiler to recreate the source code as written, which helps me in testing, and lo and behold I was getting something different :-)  You know what?  I got used to it. I had to change my opinion of what "exactly as I expect" meant slightly.

And if it were the case that everyone felt no choice needed to be offered (assuming everyone agreed on what "some subtle effects" and "no noticeable performance gain" meant), then gcc and clang wouldn't offer -O2 and -O3 optimization levels. Maybe they wouldn't even offer -O, or would just compile everything unoptimized.

As I said before, there's a different discussion to be had about debugging optimized code, and I have thoughts on that too.  Basically, if you have a decompiler and/or logs of how the program has been transformed, that might be acceptable to those who choose to use optimization.

I suspect a large number of people who write code have tests as well. And those tests give one some confidence that the optimization performed is correct, for the optimizer writers as well as for the programmers using the optimizer.

In sum, for me, it is more about transparency in how the program has been transformed. If I can see that, I'm cool. Others may feel differently, but that's okay: optimization is optional.
 





Re: Static bytecode instruction optimization and pypy (JIT?) optimization

Rocky Bernstein
In reply to this post by William ML Leslie


On Tue, Jul 11, 2017 at 4:35 AM, William ML Leslie <[hidden email]> wrote:
> On 11 July 2017 at 18:22, Rocky Bernstein <[hidden email]> wrote:
>> There's too much generalization and abstraction in the discussion, at
>> least for me. So let me try to make this a little more concrete with
>> real examples. First note that, although Python, yes, does "optimize"
>> "if 0:" away, as soon as you do something as simple as "debug = 0; if
>> debug:" it is confounded. And in my own experience I have done this.
>
> Typically DEBUG is a global in the module, which means it can be set
> from /outside/ the module, so it's not constant as far as the runtime
> is concerned.  Within a function it's a bit wacky, sure, and it's only
> kept around for interactive debugging reasons afaict.

You presume to know much more about the programmer's intentions and behavior than I would care to venture into, or about how and why such things could arise.



> Here's a possible type for `self.a` which illustrates why
> CSE/store-sinking on a STORE_ATTR is not valid for the example
> function `foo`.
>
> class A(object):
>     def set_b(self, value):
>         print value
>         self._b = value
>     b = property(set_b)
>
> class Example(object):
>     def __init__(self):
>         self.a = A()
>
>     def foo(self):
>         self.a.b = 5
>         x = 0
>         if x:
>             self.a.b = 2
>         else:
>             return
>
> Example().foo()


Absolutely, one can create pathological cases like this. I hope this isn't your normal mode of coding, though.

I suspect that this kind of thing doesn't occur often. You know, I encountered something analogous when I first started using Python's code-coverage tool, coverage, and the Python style checker flake8. As per Mark Pilgrim's original version of Dive Into Python, I used:

if __name__ == '__main__':

And I was annoyed that older versions of coverage dinged me, because that code was untested unless I took special care to run it in testing. (Which sometimes I would do.)

And I suspect most people have run into some annoyance with flake8 and the way it dictates formatting things.

So what people typically do, if they choose to use coverage or flake8, is add comments that cue the tool not to consider that section of the code. As best I can tell, this works out pretty well.
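
For reference, the pragmas in question (these are the tools' documented comment forms; main() is just a stand-in):

    if __name__ == '__main__':   # pragma: no cover
        main()

    import os, sys   # noqa: E401  (flake8: ignore just this warning here)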





Re: Static bytecode instruction optimization and pypy (JIT?) optimization

William ML Leslie
TLDR: Doing this sort of optimisation on /python/ requires that you
understand the evaluation model.  You can't just do store-sinking as
if descriptors aren't a thing, and you can't go inlining functions as
if module globals aren't mutable bindings.  #XXTiredCompilation

On 11 July 2017 at 18:53, Rocky Bernstein <[hidden email]> wrote:

>
>
> On Tue, Jul 11, 2017 at 4:35 AM, William ML Leslie
> <[hidden email]> wrote:
>>
>> On 11 July 2017 at 18:22, Rocky Bernstein <[hidden email]> wrote:
>> > There's too much generalization and abstraction in the discussion at
>> > least
>> > for me. So let me try to make this a little more concrete with real
>> > examples. First note that, although Python, yes, does "optimize" "if 0:"
>> > away, as soon as
>> > you do something as simple as "debug = 0; if debug:" it is confounded.
>> > And
>> > in my own experience I have done this.
>> >
>>
>> Typically DEBUG is a global in the module, which means it can be set
>> from /outside/ the module, so it's not constant as far as the runtime
>> is concerned.  Within a function it's a bit wacky sure, and is only
>> kept around for interactive debugging reasons afaict.
>
>
> You presume to know much more about the programmer's intentions and behavior
> than I would care to venture into, or how and why such things could arise.
>

If the programmer assigns a module-level variable, then the variable
should appear to be set in that module.  What other intention would
you ascribe to this operation?

>
>>
>> Here's a possible type for `self.a` which illustrates why
>> CSE/Store-Sinking on a STORE_ATTR is not valid for the example
>> function `foo`.
>>
>> class A(object):
>>     def set_b(self, value):
>>         print value
>>         self._b = value
>>     b = property(set_b)
>>
>> class Example(object):
>>     def __init__(self):
>>         self.a = A()
>>
>>     def foo(self):
>>         self.a.b = 5
>>         x = 0
>>         if x:
>>             self.a.b = 2
>>         else:
>>             return
>>
>> Example().foo()
>
>
>
> Absolutely one can create pathological cases like this. I hope though this
> isn't your normal mode of coding though.
>

Descriptors - including properties - are a language feature that
people should be able to expect to work.  You can't just compile their
invocation away because you think it will make their program faster.

> I suspect that this kind of thing doesn't occur often.

I don't think you'll find many in industry who agree with you.  Have
you *seen* the level of magic behind django, flask, pandas?  Have you
ever had people bring their super-obscure bugs to you when they run
their program with your runtime?

> You know, I
> encountered something analogous when I first started using Python's code
> coverage tool, coverage, and the Python style checker flake8. As per Mark
> Pilgrim's original version of Dive Into Python I used:
>
> if __name__ == '__main__':
>
> And I was annoyed that older versions of code coverage dinged me because
> that code was untested code unless I took special care to run it in testing.
> (Which sometimes I would do).
>

PSA: don't run your tests directly; use a test runner, and say goodbye
to all the awful hacks around getting your path set up correctly, and
to bugs where the tests run against the installed version rather than
the version you are editing.

--
William Leslie


Re: Static bytecode instruction optimization and pypy (JIT?) optimization

Carl Friedrich Bolz
In reply to this post by Rocky Bernstein
On 11/07/17 10:37, Rocky Bernstein wrote:
> Sure, but that's a straw-man argument and has a lot of packed opinions in it.
> Things like "run my program exactly like I expect" is loaded with
> opinions.  As has already been noted, Python removes "if 0:" by
> default.  When I first saw that, I thought it was cool, but honestly it
> wasn't "exactly as I expect" because I wanted my decompiler to recreate
> the source code as written which helps me in testing and lo and behold I
> was getting something different :-)  You know what?  I got used to it. I
> had to change my opinion of what "exactly as I expect" meant slightly.

Actually, to me this would be an argument to remove the "if 0:"
optimization ;-).

Cheers,

Carl Friedrich

Re: Static bytecode instruction optimization and pypy (JIT?) optimization

William ML Leslie
On 11 July 2017 at 19:28, Carl Friedrich Bolz <[hidden email]> wrote:

> On 11/07/17 10:37, Rocky Bernstein wrote:
>> Sure, but that's a straw-man argument and has a lot of packed opinions in it.
>> Things like "run my program exactly like I expect" is loaded with
>> opinions.  As has already been noted, Python removes "if 0:" by
>> default.  When I first saw that, I thought it was cool, but honestly it
>> wasn't "exactly as I expect" because I wanted my decompiler to recreate
>> the source code as written which helps me in testing and lo and behold I
>> was getting something different :-)  You know what?  I got used to it. I
>> had to change my opinion of what "exactly as I expect" meant slightly.
>
> Actually, to me this would be an argument to remove the "if 0:"
> optimization ;-).
>

Not least of all because `if False:` is more idiomatic; but False can
be bound to a value that is true (in Python 2 it's an ordinary name,
not a keyword), so the statement can't be folded!
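
That rebinding really is legal there (a Python 2 session; Python 3
made True and False keywords):

    >>> False = 1          # shadow the builtin
    >>> if False:
    ...     print 'runs'
    ...
    runs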

The lengths the compiler goes to in order to do what you told it (:
And yet we still get bug reports that our built-in objects have
methods that cpython's built-ins don't have, and that this makes some
popular library give the wrong result.

Oblig: https://twitter.com/shit_jit_says/status/446914805191172096
https://twitter.com/fijall/status/446913102190505984

--
William Leslie
