Automated binding generation (and maintenance)

Automated binding generation (and maintenance)

Shaheed Haque
Hi,

I have been looking at the problem of automated binding generation (and maintenance) for large C++ code bases for a little while now [1], but am new to cppyy.

One issue I am struggling to find a good solution for is to generate an accurate list of the objects (classes, functions, variables etc) in a given header file in order to populate the selection .XML.

Ideally, I'd like to be able to say "all objects in this translation unit". I tried the wildcard "*" but I believe that selects the transitive fanout (and runs into errors).

Short of running Clang directly to generate the names, what options do I have? (Currently, I'm working around this by manually specifying a narrower wildcard such as "KJS*").

Also, as I looked around for approaches to this issue, I noted that the cppyy backend v610 5 source code has a "getAllClasses" whereas PyPI has 6.10.0.2. I'm not sure of the mapping of versions, but what is the cadence for updates to PyPI?



_______________________________________________
pypy-dev mailing list
[hidden email]
https://mail.python.org/mailman/listinfo/pypy-dev

Re: Automated binding generation (and maintenance)

wlavrijsen@lbl.gov
In reply to this post by Shaheed Haque
Shaheed,

> One issue I am struggling to find a good solution for is to generate an
> accurate list of the objects (classes, functions, variables etc) in a given
> header file in order to populate the selection .XML.

that option exists, but apparently no-one has ever used it, as it is clearly
broken. :P It should be:

<selection>
    <class     pattern="*" file_name="SomeHeader.h" />
    <enum      pattern="*" file_name="SomeHeader.h" />
    <function  pattern="*" file_name="SomeHeader.h" />
    <variable  pattern="*" file_name="SomeHeader.h" />
</selection>

genreflex exists for backwards compatibility, underneath it's rootcling,
which accepts this:

   #pragma link C++ defined_in "SomeHeader.h";

and that does work ... I'll dig a bit, see what goes wrong with genreflex;
should be no more than proper rule registration.
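A per-header selection file of the shape shown above could be generated with a few lines of Python. This is only a minimal sketch using the standard library: the helper name `selection_xml` is hypothetical, and it assumes the `file_name` attribute gets honoured as intended, which (as noted) genreflex does not currently do.

```python
from xml.sax.saxutils import quoteattr

def selection_xml(header, kinds=("class", "enum", "function", "variable")):
    """Build a <selection> block selecting everything declared in `header`."""
    lines = ["<selection>"]
    for kind in kinds:
        # One wildcard rule per kind of declaration, restricted to one header.
        lines.append('    <%s pattern="*" file_name=%s />' % (kind, quoteattr(header)))
    lines.append("</selection>")
    return "\n".join(lines)

print(selection_xml("SomeHeader.h"))
```

Writing the returned string to selection.xml and passing it via `-s` would match the genreflex usage elsewhere in this thread.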

But if not restricting selection, what errors are you seeing?

> Short of running Clang directly to generate the names, what options do I
> have?

If using PyPy (not yet CPython), you can load all files in a header, include
that, and simply start looping over dir(cppyy.gbl). (This is one of a set of
things that I still have to equalize between PyPy/cppyy and CPython/cppyy.)

> Also, as I looked around for approaches to this issue, I noted that the
> cppyy backend v610 5 source code has a "getAllClasses" whereas PyPI has
> 6.10.0.2.

That getAllClasses was a hack for compatibility reasons that doesn't do what
the name supposes it does: there can always be more classes that could be
found through a mapping file, but haven't yet. Hence a functional dir() is a
better approach.
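The dir()-based enumeration might look roughly like the sketch below. It works on any Python object that supports dir(), so a stand-in namespace is used here; with PyPy's cppyy one would presumably pass `cppyy.gbl` after loading the dictionary and including the header. The helper name `public_names` and the contents of the stand-in are illustrative assumptions.

```python
from types import SimpleNamespace

def public_names(namespace):
    """Return the sorted, non-underscore names that dir() reports."""
    return sorted(name for name in dir(namespace) if not name.startswith("_"))

# Stand-in for cppyy.gbl; the real namespace would expose the loaded C++ names.
fake_gbl = SimpleNamespace(KJSInterpreter=object, KJSObject=object, _internal=None)
print(public_names(fake_gbl))
```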

> I'm not sure of the mapping of versions, but what is the cadence for
> updates to PyPI?

It's only since a few months that I split everything off into a standalone
package (there's a reason the first version digit is still 0) and I'm still
sitting on some restructuring to separate things that update often from
things that don't. The backend part is expected to update every half year
or so, once packaging stabilizes (that's the cling schedule).

> [1] https://marc.info/?l=kde-core-devel&m=150464598710128&w=2

Just a few minor points in response to that message. E.g. yes, overloads
end up as a single Python function, but if you don't want that, then you
can use __disp__("signature") to pick out the ones you want. Those are
first-class objects, and allow any kind of restructuring that Python
allows.

As for needing cling, that's only if you need the dynamic features. It is
also possible to use it to generate bindings to be used for cffi. You need
to pre-instantiate templates and such, but that's already the case for any
other bindings tool. And for that matter, at that level you could use it
to generate what you need for SIP, too.

Best regards,
            Wim
--
[hidden email]    --    +1 (510) 486 6411    --    www.lavrijsen.net

Re: Automated binding generation (and maintenance)

Shaheed Haque
Hi Wim,

On 14 September 2017 at 01:04,  <[hidden email]> wrote:

> Shaheed,
>
>> One issue I am struggling to find a good solution for is to generate an
>> accurate list of the objects (classes, functions, variables etc) in a
>> given
>> header file in order to populate the selection .XML.
>
>
> that option exists, but apparently no-one has ever used it, as it is clearly
> broken. :P It should be:
>
> <selection>
>    <class     pattern="*" file_name="SomeHeader.h" />
>    <enum      pattern="*" file_name="SomeHeader.h" />
>    <function  pattern="*" file_name="SomeHeader.h" />
>    <variable  pattern="*" file_name="SomeHeader.h" />
> </selection>
>
> genreflex exists for backwards compatibility, underneath it's rootcling,
> which accepts this:
>
>   #pragma link C++ defined_in "SomeHeader.h";

Ah, I had not realised rootcling existed. I've seen that I can invoke
it using Python version-specific paths...is this the correct way to
invoke it:

ROOTCLING=/usr/local/lib/python3.6/dist-packages/cppyy_backend
LD_LIBRARY_PATH=$ROOTCLING/lib $ROOTCLING/bin/rootcling -h

or is there a recommended wrapper?
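Until a proper wrapper exists, the two shell lines above can be mirrored from a Python driver. The sketch below only assembles the command and environment; the function name `rootcling_invocation` is hypothetical, and the `lib` and `bin/rootcling` locations are taken from the shell example rather than from any documented layout. Unlike the shell version, it prepends to an existing LD_LIBRARY_PATH instead of replacing it.

```python
import os

def rootcling_invocation(backend_dir, args, base_env=None):
    """Return (argv, env) for running rootcling from a cppyy_backend directory."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = os.path.join(backend_dir, "lib")
    # Prepend the backend's lib directory, keeping any existing search path.
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = lib_dir if not existing else lib_dir + os.pathsep + existing
    argv = [os.path.join(backend_dir, "bin", "rootcling")] + list(args)
    return argv, env

argv, env = rootcling_invocation(
    "/usr/local/lib/python3.6/dist-packages/cppyy_backend", ["-h"], base_env={})
print(argv, env["LD_LIBRARY_PATH"])
# To actually run it: subprocess.run(argv, env=env, check=True)
```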

> and that does work ... I'll dig a bit, see what goes wrong with genreflex;
> should be no more than proper rule registration.
> But if not restricting selection, what errors are you seeing?
>

With this:

======
    <selection>
       <class pattern="*" />
       <function pattern="*" />
       <variable pattern="*" />
       <enum pattern="*" />
    </selection>
======

I actually get some warnings and then the error:

======
Warning: Class or struct
basic_string<char16_t,char_traits<char16_t>,allocator<char16_t>
>::_Alloc_hider was selected but its dictionary cannot be generated:
this is a private or protected class and this is not supported. No
direct I/O operation of
basic_string<char16_t,char_traits<char16_t>,allocator<char16_t>
>::_Alloc_hider instances will be possible.
Warning: Class or struct
basic_string<char32_t,char_traits<char32_t>,allocator<char32_t>
>::_Alloc_hider was selected but its dictionary cannot be generated:
this is a private or protected class and this is not supported. No
direct I/O operation of
basic_string<char32_t,char_traits<char32_t>,allocator<char32_t>
>::_Alloc_hider instances will be possible.
Warning: Class or struct
basic_string<_CharT,_Traits,_Alloc>::_Alloc_hider was selected but its
dictionary cannot be generated: this is a private or protected class
and this is not supported. No direct I/O operation of
basic_string<_CharT,_Traits,_Alloc>::_Alloc_hider instances will be
possible.
Warning: Class or struct string::_Alloc_hider was selected but its
dictionary cannot be generated: this is a private or protected class
and this is not supported. No direct I/O operation of
string::_Alloc_hider instances will be possible.
Warning: Class or struct
basic_string<wchar_t,char_traits<wchar_t>,allocator<wchar_t>
>::_Alloc_hider was selected but its dictionary cannot be generated:
this is a private or protected class and this is not supported. No
direct I/O operation of
basic_string<wchar_t,char_traits<wchar_t>,allocator<wchar_t>
>::_Alloc_hider instances will be possible.
Warning: Class or struct ios_base::_Callback_list was selected but its
dictionary cannot be generated: this is a private or protected class
and this is not supported. No direct I/O operation of
ios_base::_Callback_list instances will be possible.
Warning: Class or struct ios_base::_Words was selected but its
dictionary cannot be generated: this is a private or protected class
and this is not supported. No direct I/O operation of ios_base::_Words
instances will be possible.
Error in <CloseStreamerInfoROOTFile>: Cannot find class __pthread_mutex_s.
======

The command line in use is:

======
genreflex /usr/include/KF5/kjs/kjsinterpreter.h -s selection.xml -o
tmp3/kjsinterpreter.cpp -I/usr/include/x86_64-linux-gnu/qt5
-I/usr/include/x86_64-linux-gnu/qt5/QtCore -I/usr/include/KF5/kjs
-I/usr/include/KF5/wtf
======

I did wonder if I was missing some "-isystem" includes, and tried
adding them but the --debug output from genreflex seemed to suggest
they were being ignored.

>> Short of running Clang directly to generate the names, what options do I
>> have?
>
>
> If using PyPy (not yet CPython), you can load all files in a header, include
> that, and simply start looping over dir(cppyy.gbl). (This is one of a set of
> things that I still have to equalize between PyPy/cppyy and CPython/cppyy.)
>
>> Also, as I looked around for approaches to this issue, I noted that the
>> cppyy backend v610 5 source code has a "getAllClasses" whereas PyPI has
>> 6.10.0.2.
>
>
> That getAllClasses was a hack for compatibility reasons that doesn't do what
> the name supposes it does: there can always be more classes that could be
> found through a mapping file, but haven't yet. Hence a functional dir() is a
> better approach.

Ack. My driver code is exactly intended to handle this kind of thing
by walking the directories and invoking genreflex/rootcling. One issue
is that I've been experimenting with directly using cppyy.gbl.gROOT
et al. to try to identify only the classes (and later variables etc)
directly in kjsinterpreter.h by looking at
cppyy.gbl.gInterpreter.ClassInfo_FileName() for the relevant class
name with something roughly like this:

import cppyy
ci = cppyy.gbl.gInterpreter.ClassInfo_Factory('KJSInterpreter')
cppyy.gbl.gInterpreter.ClassInfo_FileName(ci)

What is interesting, and might possibly throw light on the selection
filter issue, is that the file name for the classes in
kjsinterpreter.h itself is always the empty string ''. Classes that
come from included files return non-empty strings such as
'kjsobject.h' for 'KJSObject'.

BTW, the reason for doing this is that lots of KDE code has multiple
classes and even namespaces in a single header file. Now, for
discoverability of the loaded objects, I find the incremental "pop
into cppyy.gbl on demand" somewhat limiting and I wanted to play about
with that. I could also work around the filter issue if I precomputed
the needed names in a precursor pass.
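One crude form such a precursor pass could take is sketched below: a regular-expression scan of the header text that collects class and struct names and emits explicit selection rules. This is an approximation under loud assumptions, not a C++ parser: it will miss templates, nested classes and macro-generated declarations, and the `DEMO_EXPORT` macro and the sample header are made up for illustration.

```python
import re

# Matches "class Name {", "struct Name : Base {" and "class SOME_EXPORT Name {".
# Forward declarations ("class Name;") are deliberately not matched.
_CLASS_RE = re.compile(
    r"^\s*(?:class|struct)\s+(?:[A-Z][A-Z0-9_]*_EXPORT\s+)?(\w+)\s*(?::[^;{]*)?\{",
    re.MULTILINE)

def class_names(header_text):
    """Return the class/struct names defined in the given header text."""
    return sorted(set(_CLASS_RE.findall(header_text)))

def explicit_selection(names):
    """Emit one <class name="..."/> rule per discovered name."""
    rules = ['    <class name="%s" />' % n for n in names]
    return "\n".join(["<selection>"] + rules + ["</selection>"])

# Hypothetical header contents, standing in for a real KDE header.
SAMPLE = """
class KJSObject;               // forward declaration, ignored
class DEMO_EXPORT KJSInterpreter {
public:
    KJSInterpreter();
};
struct Helper : public Base {
};
"""

print(explicit_selection(class_names(SAMPLE)))
```

A Clang-based pass would be far more reliable; the point here is only that the names can be computed up front and fed into the selection file.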

Finally, and most importantly given the fidelity with which cppyy
renders the C++ code, I'm thinking about how Pythonisation customisation
might be handled: e.g. a Python wrapper layer to allow a
pointer-plus-size to render as a Python list/tuple, or generate a dict
mapping for a QSet, and so on. (I'm dimly aware of the
boost-recognition logic you have alluded to, this is specifically more
about Qt-specific patterns and ad-hoc scenarios).

>> I'm not sure of the mapping of versions, but what is the cadence for
>> updates to PyPI?
>
>
> It's only since a few months that I split everything off into a standalone
> package (there's a reason the first version digit is still 0) and I'm still
> sitting on some restructuring to separate things that update often from
> things that don't. The backend part is expected to update every half year
> or so, once packaging stabilizes (that's the cling schedule).
>
>> [1] https://marc.info/?l=kde-core-devel&m=150464598710128&w=2
>
>
> Just a few minor points in response to that message. E.g. yes, overloads
> end up as a single Python function, but if you don't want that, then you
> can use __disp__("signature") to pick out the ones you want. Those are
> first-class objects, and allow any kind of restructuring that Python
> allows.
>
> As for needing cling, that's only if you need the dynamic features. It is
> also possible to use it to generate bindings to be used for cffi. You need
> to pre-instantiate templates and such, but that's already the case for any
> other bindings tool. And for that matter, at that level you could use it
> to generate what you need for SIP, too.

Thanks for the kind hints, but you've only managed to whet my appetite
to get cppyy working as it is exactly things like the handling of
overloads and template instantiation that I want most!

Thanks, Shaheed

P.S. Please note that after today, I'll likely not have much Internet
access for a couple of weeks, so any responses may be limited.

> Best regards,
>            Wim
> --
> [hidden email]    --    +1 (510) 486 6411    --    www.lavrijsen.net

Re: Automated binding generation (and maintenance)

wlavrijsen@lbl.gov
Shaheed,

> Ah, I had not realised rootcling existed. I've seen that I can invoke
> it using Python version-specific paths...is this the correct way to
> invoke it:
>
> ROOTCLING=/usr/local/lib/python3.6/dist-packages/cppyy_backend
> LD_LIBRARY_PATH=$ROOTCLING/lib $ROOTCLING/bin/rootcling -h

Yes, and here's a description of the LinkDef.h format:

   https://root.cern.ch/root/html/guides/users-guide/AddingaClass.html#the-linkdef.h-file
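For a driver that walks many headers, the LinkDef.h could be generated rather than hand-written. The sketch below emits one `defined_in` pragma (the form quoted earlier in this thread) per header; the helper name `linkdef_text` is hypothetical, and the `#ifdef __CLING__` guard is an assumption based on common ROOT convention, so check it against the guide linked above.

```python
def linkdef_text(headers):
    """Return LinkDef.h contents selecting everything defined in `headers`."""
    lines = ["#ifdef __CLING__"]          # assumed guard, see lead-in
    for header in headers:
        lines.append('#pragma link C++ defined_in "%s";' % header)
    lines.append("#endif")
    return "\n".join(lines) + "\n"

print(linkdef_text(["SomeHeader.h"]))
```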

> or is there a recommended wrapper?

No, but I'm going to add one for pip, same as I did for genreflex. I've
been fleshing out the backend generation, taken over from Anto:

   https://bitbucket.org/wlav/cppyy-backend

where all that can live. I'm told that I'll need rootcling anyway for
use of modules (see below).

> I actually get some warnings and then the error:

Add this set of exclusions to the selection.xml:

<exclusion>
    <class pattern="*thread_mutex*" />
    <class pattern="*new_allocator*" />
    <class pattern="*Alloc_hider*" />
</exclusion>

Of course, the larger problem of pulling in these standard libs over and
over again is that it is a waste of cpu and memory, so I do want to see
the file_name attribute fixed. As it stands, I'd simply exclude:

    <class pattern="std::*" />
    <class pattern="__gnu_cxx::*" />

especially since they are already available by default. Note that those two
rules cover the ones needed for new_allocator and Alloc_hider.
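The effect of these exclusion patterns can be checked offline with a small sketch. It assumes the selection patterns behave like shell-style globs (approximated here with `fnmatch`) and that they are matched against fully qualified names; both are assumptions, and the example names are modelled on the warnings quoted earlier rather than taken from real genreflex output.

```python
from fnmatch import fnmatchcase

EXCLUSIONS = ["*thread_mutex*", "std::*", "__gnu_cxx::*"]

def is_excluded(qualified_name, patterns=EXCLUSIONS):
    """True if any exclusion pattern matches the fully qualified name."""
    return any(fnmatchcase(qualified_name, p) for p in patterns)

names = [
    "__pthread_mutex_s",
    "std::basic_string<char>::_Alloc_hider",
    "__gnu_cxx::new_allocator<char>",
    "KJSInterpreter",
]
kept = [n for n in names if not is_excluded(n)]
print(kept)
```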

However, there is a more efficient approach that is right around the corner
(and has been right around the corner for a long time, so don't hold me to
that). Next release now seems likely though.

The long term goal has always been to use modules:

   http://clang.llvm.org/docs/Modules.html

but the original drivers (Apple, Google, and the C++ standards committee)
have been going back and forth on it. Now, things are finally falling into
place. Here's Google:

   https://www.youtube.com/watch?v=dHFNpBfemDI

And here's ROOT:

   https://indico.cern.ch/event/643728/contributions/2612822/attachments/1494074/2323893/ROOTs_C_modules_status_report.pdf

The big deal is that C++ developers have an incentive to deploy modules, so
being able to patch into that should be a huge time saver (and where they
don't, rootcling will soon be able to create modules from headers). Note
that modules don't come for free: it will require some ambiguity resolution,
but that is typically a Good Thing (code-quality wise).

Modules allow deserialization of only the piece of the AST that is actually
being requested, saving memory. This as opposed to header files (whether or
not precompiled) which pull in everything before them. See the status report
above for the improvements in memory usage.

And with modules, of course, selection becomes unnecessary (markup for
automatic streamers may still be useful, but that is not relevant for
bindings generation).

> I did wonder if I was missing some "-isystem" includes, and tried
> adding them but the --debug output from genreflex seemed to suggest
> they were being ignored.

Some flags are ignored as no-one was using them (so far). Some others
are definitely obsolete by now.

> What is interesting, and might possibly throw light on the selection
> filter issue, is that the file name for the classes in
> kjsinterpreter.h itself is always the empty string ''. Classes that
> come from included files return non-empty strings such as
> 'kjsobject.h' for 'KJSObject'.

That's after the fact (i.e. what is stored); I don't see the rule being
respected/used at all.

> BTW, the reason for doing this is that lots of KDE code has multiple
> classes and even namespaces in a single header file. Now, for
> discoverability of the loaded objects, I find the incremental "pop
> into cppyy.gbl on demand" somewhat limiting and I wanted to play about
> with that. I could also work around the filter issue if I precomputed
> the needed names in a precursor pass.

The issue here is the memory cost of loading things that won't get used
in the end. This is why a functional dir() (which needs nothing but
strings, after all), in conjunction with lazy loading/creation when a
real access happens, works well. LLVM is fully lookup based, btw. There
is a custom layer on top of Cling to make enumeration possible.
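The combination of a string-only dir() with creation on first access can be illustrated in plain Python. The sketch below is not how cppyy implements it; `LazyNamespace` and `fake_loader` are hypothetical names used only to show the pattern.

```python
class LazyNamespace:
    """Names are known up front as strings; objects are created on first access."""

    def __init__(self, known_names, loader):
        self._known = set(known_names)
        self._loader = loader          # callable: name -> object

    def __dir__(self):
        # Enumeration needs only the strings, so nothing gets loaded here.
        return sorted(self._known)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. on first access.
        if name.startswith("_") or name not in self._known:
            raise AttributeError(name)
        value = self._loader(name)
        setattr(self, name, value)     # cache: later accesses bypass __getattr__
        return value

loaded = []

def fake_loader(name):
    loaded.append(name)                # record what actually got created
    return type(name, (), {})

ns = LazyNamespace(["KJSInterpreter", "KJSObject"], fake_loader)
print(dir(ns))     # listing does not trigger any load
ns.KJSObject       # first access loads and caches
ns.KJSObject       # second access hits the cache
print(loaded)
```

Only the names that are actually touched cost memory, while discovery through dir() stays cheap.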

> Finally, and most importantly given the fidelity with which cppyy
> renders the C++ code, I'm thinking about how Pythonisation customisation
> might be handled: e.g. a Python wrapper layer to allow a
> pointer-plus-size to render as a Python list/tuple, or generate a dict
> mapping for a QSet, and so on. (I'm dimly aware of the
> boost-recognition logic you have alluded to, this is specifically more
> about Qt-specific patterns and ad-hoc scenarios).

In 2015, a GSoC student fleshed this out. I never put it into PyPy b/c of
a lack of test coverage, but did put it in PyROOT. Here's an example of
the "pointer-plus-size" pythonization (from ROOT.py):

     # python side pythonizations (should live in their own file, if we get many)
     def set_size(self, buf):
         buf.SetSize(self.GetN())
         return buf

     # TODO: add pythonization API to pypy-c
     if not PYPY_CPPYY_COMPATIBILITY_FIXME:
         cppyy.add_pythonization(
             cppyy.compose_method("^TGraph(2D)?$|^TGraph.*Errors$", "GetE?[XYZ]$", set_size))

The functions selected by the regexps return naked pointers, but the object
can be queried for the size (all have a consistent GetN() function). So the
method composer patches up the return value, making it a sized array,
instead of an "open-ended" one.
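The idea of a regexp-driven method composer can be shown without ROOT or cppyy. The sketch below is a plain-Python illustration and not the actual `compose_method` implementation: `make_method_composer`, `Buffer` and the stand-in `TGraph` class are all hypothetical; only the two regular expressions and `set_size` come from the excerpt above.

```python
import re

def make_method_composer(class_pattern, method_pattern, composer):
    """Return a function that patches matching methods of a class in place."""
    class_re = re.compile(class_pattern)
    method_re = re.compile(method_pattern)

    def pythonize(klass):
        if not class_re.match(klass.__name__):
            return klass
        for name, method in list(vars(klass).items()):
            if callable(method) and method_re.match(name):
                def wrapper(self, *args, _orig=method, **kwargs):
                    # Call the original, then let the composer fix up the result.
                    return composer(self, _orig(self, *args, **kwargs))
                setattr(klass, name, wrapper)
        return klass
    return pythonize

class Buffer:
    """Stand-in for an open-ended array returned by a C++ accessor."""
    def __init__(self):
        self.size = None
    def SetSize(self, n):
        self.size = n

class TGraph:
    """Stand-in for a bound C++ class; not the real ROOT TGraph."""
    def GetN(self):
        return 3
    def GetX(self):
        return Buffer()

def set_size(self, buf):
    buf.SetSize(self.GetN())
    return buf

make_method_composer("^TGraph(2D)?$|^TGraph.*Errors$", "GetE?[XYZ]$", set_size)(TGraph)
print(TGraph().GetX().size)
```

`GetX` matches the method pattern and is wrapped, whereas `GetN` does not match and is left untouched.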

I'm sitting on some patches as I wanted to tweak his APIs a bit. There
was some ordering that I felt didn't compose well, but that is minor.

Similarly, there's code to apply ownership rules, mapping exceptions,
the new C++11 smartptrs, controlling auto-casting, handling the GIL, making
properties, and adding overloads. All driven by regexp matching of patterns.
See here:

   https://bitbucket.org/wlav/cppyy/src/4d14ba325e494f13cc11f3f11cbb87b44048b256/python/cppyy/_pythonization.py?at=master

(plus further support inside the bindings layer itself).

Of course, one can hook up completely custom functions, and he made it so
that that is per C++ namespace, so nicely self-contained.

Again, this is currently only partly available, as I need to write a lot
more tests for PyPy (which are bound to unearth some problems along the
way). And then there is documentation to be written ...

> P.S. Please note that after today, I'll likely not have much Internet
> access for a couple of weeks, so any responses may be limited.

I'll make sure I have at least all my local changes pushed by then. :)

Best regards,
            Wim
--
[hidden email]    --    +1 (510) 486 6411    --    www.lavrijsen.net

Re: Automated binding generation (and maintenance)

Shaheed Haque
Wim,

Thanks for the detailed and thoughtful reply. I will digest and
respond when I am properly back in circulation.


Re: Automated binding generation (and maintenance)

Shaheed Haque
Hi Wim,

After reviewing your comments, I propose to check out rootcling. I
initially had some trouble using pip3 to install the newer code, but
that seems to have been resolved as of yesterday's 0.2.3 build. I did
notice one message during the install which seems to be benign, so I
mention it here merely in passing:

  Running command /usr/bin/python3 -u -c "import setuptools,
tokenize;__file__='/tmp/pip-build-spz01kkp/cppyy-backend/setup.py';f=getattr(tokenize,
'open', open)(__file__);code=f.read().replace('\r\n',
'\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d
/tmp/tmpe2h6yls0pip-wheel- --python-tag cp36
  running bdist_wheel
  running build
  running build_ext
  error: [Errno 2] No such file or directory: 'cling-config': 'cling-config'
error
  Failed building wheel for cppyy-backend
  Running setup.py clean for cppyy-backend

I'll no doubt be back with questions :-).

Thanks for all the good work, Shaheed



On 23 September 2017 at 06:24, Shaheed Haque <[hidden email]> wrote:

> Wim,
>
> Thanks for the detailed and thoughtful reply. I will digest and
> respond when I am properly back in circulation.
>
> On 15 September 2017 at 07:43,  <[hidden email]> wrote:
>> Shaheed,
>>
>>> Ah, I had not realised rootcling existed. I've seen that I can invoke
>>> it using Python version-specific paths...is this the correct way to
>>> invoke it:
>>>
>>> ROOTCLING=/usr/local/lib/python3.6/dist-packages/cppyy_backend
>>> LD_LIBRARY_PATH=$ROOTCLING/lib $ROOTCLING/bin/rootcling -h
>>
>>
>> Yes, and here's a description of the LinkDef.h format:
>>
>>
>> https://root.cern.ch/root/html/guides/users-guide/AddingaClass.html#the-linkdef.h-file
>>
>>> or is there a recommended wrapper?
>>
>>
>> No, but I'm going to add one for pip, same as I did for genreflex. I've
>> been fleshing out the backend generation, taken over from Anto:
>>
>>   https://bitbucket.org/wlav/cppyy-backend
>>
>> where all that can live. I'm told that I'll need rootcling anyway for
>> use of modules (see below).
>>
>>> I actually get some warnings and then the error:
>>
>>
>> Add this set of exclusions to the selection.xml:
>>
>> <exclusion>
>>    <class pattern="*thread_mutex*" />
>>    <class pattern="*new_allocator*" />
>>    <class pattern="*Alloc_hider*" />
>> </exclusion>
>>
>> Of course, the larger problem of pulling in these standard libs over and
>> over again is that it is a waste of cpu and memory, so I do want to see
>> the file_name attribute fixed. As it stands, I'd simply exclude:
>>
>>    <class pattern="std::*" />
>>    <class pattern="__gnu_cxx::*" />
>>
>> especially since they are already available by default. Note that those two
>> rules cover the ones needed for new_allocator and Alloc_hider.
>>
>> However, there is a more efficient approach that is right around the corner
>> (and has been right around the corner for a long time, so don't hold me to
>> that). Next release now seems likely though.
>>
>> The long term goal has always been to use modules:
>>
>>   http://clang.llvm.org/docs/Modules.html
>>
>> but the original drivers (Apple, Google, and the C++ standards committee)
>> have been going back and forth on it. Now, things are finally falling into
>> place. Here's Google:
>>
>>   https://www.youtube.com/watch?v=dHFNpBfemDI
>>
>> And here's ROOT:
>>
>>
>> https://indico.cern.ch/event/643728/contributions/2612822/attachments/1494074/2323893/ROOTs_C_modules_status_report.pdf
>>
>> The big deal is that C++ developers have an incentive to deploy modules, so
>> being able to patch into that should be a huge time saver (and where they
>> don't, rootcling will soon be able to create modules from headers). Note
>> that modules don't come for free: it will require some ambiguity resolution,
>> but that is typically a Good Thing (code-quality wise).
>>
>> Modules allow deserialization of only the piece of the AST that is actually
>> being requested, saving memory. This as opposed to header files (whether or
>> not precompiled) which pull in everything before them. See the status report
>> above for the improvements in memory usage.
>>
>> And with modules, of course, selection becomes unnecessary (markup for
>> automatic streamers may still be useful, but that is not relevant for
>> bindings generation).
>>
>>> I did wonder if I was missing some "-isystem" includes, and tried
>>> adding them but the --debug output from genreflex seemed to suggest
>>> they were being ignored.
>>
>>
>> Some flags are ignored as no-one was using them (so far). Some others
>> are definitely obsolete by now.
>>
>>> What is interesting, and might possibly throw light on the selection
>>> filter issue, is that the file name for the classes in
>>> kjsinterpreter.h itself is always the empty string ''. Classes that
>>> come from included files return non-empty strings such as
>>> 'kjsobject.h' for 'KJSObject'.
>>
>>
>> That's after the fact (i.e. what is stored); I don't see the rule being
>> respected/used at all.
>>
>>> BTW, the reason for doing this is that lots of KDE code has multiple
>>> classes and even namespaces in a single header file. Now, for
>>> discoverability of the loaded objects, I find the incremental "pop
>>> into cppyy.gbl on demand" somewhat limiting and I wanted to play about
>>> with that. I could also work around the filter issue if I precomputed
>>> the needed names in a precursor pass.
>>
>>
>> The issue here is the memory cost of loading things that won't get used
>> in the end. This is why a functional dir() (which needs nothing but
>> strings, after all), in conjunction with lazy loading/creation when a
>> real access happens, works well. LLVM is fully lookup based, btw. There
>> is a custom layer on top of Cling to make enumeration possible.
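
The dir()-plus-lazy-creation idea can be illustrated in plain Python. This is only a sketch of the pattern, not how cppyy.gbl is implemented; LazyNamespace and its factories argument are names invented for the example.

```python
class LazyNamespace:
    """Lists names cheaply; builds each object only on first real access."""

    def __init__(self, factories):
        self._factories = dict(factories)  # name -> zero-argument callable
        self.created = []                  # names that were actually built

    def __dir__(self):
        # Enumeration needs nothing but strings.
        return sorted(self._factories)

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails, i.e. before caching.
        try:
            factory = self._factories[name]
        except KeyError:
            raise AttributeError(name) from None
        value = factory()
        self.created.append(name)
        setattr(self, name, value)  # cache, so later lookups skip __getattr__
        return value


ns = LazyNamespace({
    "KJSObject": lambda: type("KJSObject", (), {}),
    "KJSInterpreter": lambda: type("KJSInterpreter", (), {}),
})
names = dir(ns)        # both names are visible, nothing has been built yet
first = ns.KJSObject   # only now is the stand-in class created
```

The memory cost is then paid per name actually used, while discoverability through dir() stays complete.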
>>
>>> Finally, and most importantly given the fidelity with which cppyy
>>> renders the C++ code, I'm thinking about how Pythonisation customisation
>>> might be handled: e.g. a Python wrapper layer to allow a
>>> pointer-plus-size to render as a Python list/tuple, or generate a dict
>>> mapping for a QSet, and so on. (I'm dimly aware of the
>>> boost-recognition logic you have alluded to, this is specifically more
>>> about Qt-specific patterns and ad-hoc scenarios).
>>
>>
>> In 2015, a GSoC student fleshed this out. I never put it into PyPy b/c of
>> a lack of test coverage, but did put it in PyROOT. Here's an example of
>> the "pointer-plus-size" pythonization (from ROOT.py):
>>
>>     # python side pythonizations (should live in their own file, if we get many)
>>     def set_size(self, buf):
>>         buf.SetSize(self.GetN())
>>         return buf
>>
>>     # TODO: add pythonization API to pypy-c
>>     if not PYPY_CPPYY_COMPATIBILITY_FIXME:
>>         cppyy.add_pythonization(
>>             cppyy.compose_method("^TGraph(2D)?$|^TGraph.*Errors$", "GetE?[XYZ]$", set_size))
>>
>> The functions selected by the regexps return naked pointers, but the object
>> can be queried for the size (all have a consistent GetN() function). So the
>> method composer patches up the return value, making it a sized array,
>> instead of an "open-ended" one.
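
For readers without ROOT at hand, the composition step can be mimicked in plain Python. The sketch below is an assumption about the general shape only, not the actual cppyy or PyROOT code: compose_method_sketch, OpenEndedBuffer and the TGraph stand-in are all invented for the example, and only set_size and the two regexps come from the snippet above.

```python
import re


def compose_method_sketch(class_pattern, method_pattern, composer):
    """Return a callable that patches matching methods of a matching class."""
    class_re = re.compile(class_pattern)
    method_re = re.compile(method_pattern)

    def pythonizor(cls, name):
        if not class_re.match(name):
            return False
        for attr, func in list(vars(cls).items()):
            if callable(func) and method_re.match(attr):
                # Bind the original via a default to avoid late-binding bugs.
                def wrapper(self, *args, _orig=func, **kwargs):
                    return composer(self, _orig(self, *args, **kwargs))
                setattr(cls, attr, wrapper)
        return True

    return pythonizor


class OpenEndedBuffer:
    """Stand-in for a naked pointer return: data with no known length."""
    def __init__(self, data):
        self.data, self.size = data, None

    def SetSize(self, n):
        self.size = n


class TGraph:
    """Stand-in class, not ROOT's TGraph."""
    def __init__(self, xs):
        self._xs = list(xs)

    def GetN(self):
        return len(self._xs)

    def GetX(self):
        return OpenEndedBuffer(self._xs)


def set_size(self, buf):
    buf.SetSize(self.GetN())
    return buf


patch = compose_method_sketch("^TGraph(2D)?$|^TGraph.*Errors$", "GetE?[XYZ]$", set_size)
patched = patch(TGraph, "TGraph")
buf = TGraph([1.0, 2.0, 3.0]).GetX()   # the buffer now carries its size
```

GetN itself is left alone because it does not match the method regexp; only the pointer-returning getters get wrapped.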
>>
>> I'm sitting on some patches as I wanted to tweak his APIs a bit. There
>> was some ordering that I felt didn't compose well, but that is minor.
>>
>> Similarly, there's code to apply ownership rules, map exceptions, handle
>> the new C++11 smart pointers, control auto-casting, handle the GIL, make
>> properties, and add overloads. All driven by regexp matching of patterns.
>> See here:
>>
>>
>> https://bitbucket.org/wlav/cppyy/src/4d14ba325e494f13cc11f3f11cbb87b44048b256/python/cppyy/_pythonization.py?at=master
>>
>> (plus further support inside the bindings layer itself).
>>
>> Of course, one can hook up completely custom functions, and he made it so
>> that that is per C++ namespace, so nicely self-contained.
>>
>> Again, this is currently only partly available, as I need to write a lot
>> more tests for PyPy (which are bound to unearth some problems along the
>> way). And then there is documentation to be written ...
>>
>>> P.S. Please note that after today, I'll likely not have much Internet
>>> access for a couple of weeks, so any responses may be limited.
>>
>>
>> I'll make sure I have at least all my local changes pushed by then. :)
>>
>>
>> Best regards,
>>            Wim
>> --
>> [hidden email]    --    +1 (510) 486 6411    --    www.lavrijsen.net

Re: Automated binding generation (and maintenance)

Shaheed Haque
Oh wait, I think I see that cling-config is installed by the cppyy
package. (Seems a tad confusing, ho-hum).

On 11 October 2017 at 10:29, Shaheed Haque <[hidden email]> wrote:

> Hi Wim,
>
> After reviewing your comments, I propose to check out rootcling. I
> initially had some trouble using pip3 to install the newer code, but
> that seems to have been resolved as of yesterday's 0.2.3 build. I did
> notice one message during the install which seems to be benign, so I
> mention it here merely in passing:
>
>   Running command /usr/bin/python3 -u -c "import setuptools,
> tokenize;__file__='/tmp/pip-build-spz01kkp/cppyy-backend/setup.py';f=getattr(tokenize,
> 'open', open)(__file__);code=f.read().replace('\r\n',
> '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d
> /tmp/tmpe2h6yls0pip-wheel- --python-tag cp36
>   running bdist_wheel
>   running build
>   running build_ext
>   error: [Errno 2] No such file or directory: 'cling-config': 'cling-config'
> error
>   Failed building wheel for cppyy-backend
>   Running setup.py clean for cppyy-backend
>
> I'll no doubt be back with questions :-).
>
> Thanks for all the good work, Shaheed
>
>
>
> On 23 September 2017 at 06:24, Shaheed Haque <[hidden email]> wrote:
>> Wim,
>>
>> Thanks for the detailed and thoughtful reply. I will digest and
>> respond when I am properly back in circulation.
>>

Re: Automated binding generation (and maintenance)

wlavrijsen@lbl.gov
Shaheed,

> Oh wait, I think I see that cling-config is installed by the cppyy
> package. (Seems a tad confusing, ho-hum).

No, it's in cppyy-cling, which was freshly pulled in when starting from
cppyy, as everything has been updated to take the new split into account.
(I'm not sure how to force such updates otherwise.)

As for the reasons for splitting and the overall package structure, rather
than posting it here, I added it to the docs:

   http://cppyy.readthedocs.io/en/latest/installation.html#package-structure

Basically, I want to avoid having to republish/reinstall all of Cling/LLVM
whenever I make a small change in the wrapper, as the former changes only
very infrequently (and takes a long time to build, as opposed to the wrapper
which is just a single C++ file).
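
To check which pieces of the split a given environment ended up with, something along these lines should do. It is a minimal sketch assuming Python 3.8 or later (for importlib.metadata); the helper name installed_versions is made up, and the package names are just the ones mentioned in this thread.

```python
from importlib import metadata


def installed_versions(names=("cppyy", "cppyy-backend", "cppyy-cling")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions().items():
    print(name, version if version is not None else "not installed")
```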

I hope this is the last change I need to make to the package structure. :)

Once 1.0 is out, I'll look into whether something like conda is better
than pip (given the amount of C++ code). For now I think pip will do.

Best regards,
            Wim
--
[hidden email]    --    +1 (510) 486 6411    --    www.lavrijsen.net