RFE: switch away from xmlSetExternalEntityLoader

Bug #2125513 reported by Cole Robinson
Affects  Status     Importance  Assigned to  Milestone
lxml     Confirmed  Medium      scoder

Bug Description

Valid with lxml git as of Sep 22 2025

lxml uses xmlSetExternalEntityLoader (see all calls to _register_document_loader()), which registers a process-global libxml2 callback that points into lxml code. This does not play well with other users of libxml2 in the same process.

libxml2 2.14.0 (released Mar 2025) added xmlCtxtSetResourceLoader, which allows setting the same kind of callback on a per-parse basis via xmlParserCtxt.

It would be nice if lxml could switch to that, when timing is appropriate. It would require bumping the minimum libxml2 version.
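
To illustrate the difference between the two registration styles, here is a minimal toy model in Python. It does not call libxml2 or lxml; `set_global_loader`, `ParseContext`, and the two loader functions are hypothetical names invented for this sketch, and the "context loader first, then global" lookup order is an assumption of the model.

```python
from typing import Callable, Optional

Loader = Callable[[str], str]

# Process-wide slot: a stand-in for what a global registration writes to.
_global_loader: Optional[Loader] = None


def set_global_loader(loader: Loader) -> None:
    """Hypothetical analogue of a global loader registration."""
    global _global_loader
    _global_loader = loader


class ParseContext:
    """Hypothetical analogue of a per-parse context object."""

    def __init__(self) -> None:
        self.loader: Optional[Loader] = None

    def load(self, url: str) -> str:
        # Assumption of this model: a loader attached to the context wins,
        # otherwise the process-wide loader is used.
        loader = self.loader or _global_loader
        if loader is None:
            raise LookupError(f"no loader registered for {url}")
        return loader(url)


def library_a_loader(url: str) -> str:
    return f"A:{url}"


def library_b_loader(url: str) -> str:
    return f"B:{url}"


# Global style: library A registers once, and library B's context is
# silently served by library A's callback.
set_global_loader(library_a_loader)
ctx_b = ParseContext()
leaked = ctx_b.load("x.dtd")

# Per-context style: each context carries its own loader, so the two
# libraries no longer see each other's callbacks.
ctx_a = ParseContext()
ctx_a.loader = library_a_loader
ctx_b2 = ParseContext()
ctx_b2.loader = library_b_loader
isolated = (ctx_a.load("x.dtd"), ctx_b2.load("x.dtd"))
```

In the global case, library B's parse ends up in library A's callback, which is the kind of cross-talk this report is concerned about.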

Thank you!

Cole Robinson (crobinso) wrote :

To expand a bit: virt-install/virt-manager does a lot of XML processing with the libxml2 Python bindings. We also talk to libvirt, which uses libxml2 from C. The libxml2 Python bindings are going away, and we would like to move virt-install to lxml. But any global callback state makes me wary, and the lxml docs still explicitly warn against mixing with other libxml2 users unless lxml is built statically, which it isn't in Red Hat distro packages.

In the past, before the 2020 lxml commit that fixed https://bugs.launchpad.net/lxml/+bug/1880251 , just `import lxml` followed by calling into libvirt would crash immediately. The current state of lxml is much better, but it still doesn't look safe in all our usage scenarios. virt-install/virt-manager invokes some libvirt calls in threads, and the callback set by xmlSetExternalEntityLoader is not thread-local, so libvirt could still end up calling into lxml code, with unexpected results.

I know there's been a lot of churn and API deprecation in libxml2 lately, but I think they added APIs that make all the global config obsolete. Hopefully lxml can get to a place where it's safe to mix with other in-process libxml2 users, provided they are well-behaved (which libvirt is), especially with the libxml2 Python bindings potentially going away.

scoder (scoder) wrote :

Thanks for the report. That can probably be done quite easily by adding a shim C macro for the new function that configures the parser context in new libxml2 versions and uses the global setting in older versions. PR welcome.

Such a macro could be added to "tree.pxd", similar to other verbatim C code blocks in other pxd files. See
https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#including-verbatim-c-code
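
The dispatch logic such a shim would encode can be sketched in Python as a toy model. This is not the actual macro and does not touch libxml2: `FakeLibrary`, `FakeContext`, and `register_loader` are hypothetical names, and a real shim would make the choice at compile time (for example by testing `LIBXML_VERSION`, where 2.14.0 is encoded as 21400) rather than at run time. It also leaves out the adaptation between the two callback types.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Loader = Callable[[str], str]

# 2.14.0 in libxml2's LIBXML_VERSION encoding (major*10000 + minor*100 + micro).
PER_CONTEXT_MIN_VERSION = 21400


@dataclass
class FakeContext:
    """Hypothetical stand-in for a parser context."""
    loader: Optional[Loader] = None


class FakeLibrary:
    """Hypothetical stand-in for one libxml2 build; not a real binding."""

    def __init__(self, version: int) -> None:
        self.version = version
        self.global_loader: Optional[Loader] = None


def register_loader(lib: FakeLibrary, ctxt: FakeContext, loader: Loader) -> str:
    """Attach the loader per context when supported, else register globally."""
    if lib.version >= PER_CONTEXT_MIN_VERSION:
        ctxt.loader = loader
        return "per-context"
    lib.global_loader = loader
    return "global"


def demo_loader(url: str) -> str:
    return f"loaded:{url}"


# One "new" build (2.14.0) and one "old" build (2.13.6).
new_lib, old_lib = FakeLibrary(21400), FakeLibrary(21306)
new_ctx, old_ctx = FakeContext(), FakeContext()
new_mode = register_loader(new_lib, new_ctx, demo_loader)
old_mode = register_loader(old_lib, old_ctx, demo_loader)
```

With the new build, nothing is written to the process-wide slot; only the old build still falls back to global registration.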

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Medium
status: New → Confirmed