Python lazy loading and namespace packages
Recently I bumped into a really weird issue: in one project I was trying to use a package pkg (I’ll use this name to avoid using a real project name) and to write tests using the foo backend, which lived in a different package called pkg_foo. Things did not work as intended when testing with the bazel build system, and this post is about the path to root-causing the issue and arriving at the right conclusions.
TLDR: The source code with tests is at github.com/aignas/anikevicius.lt/.
Python and its lesser-known features
Recently I learned about two Python features that may help with dependency hell, and I am going to set the context really quickly before moving on with the issue at hand.
- You can achieve lazy loading of your modules by defining a module-level __getattr__ function, as specified in PEP 562 (see the minimal sketch after this list).
- You can have multiple packages constitute a single top-level package and use pkgutil.extend_path to tell Python how to make the imports work. For example, the main package providing pkg and pkg.main can be extended by a pkg_extension package that adds pkg.extension, and everything works magically.
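To make the first feature concrete, here is a minimal sketch of PEP 562 in isolation (the module name mymod and the attribute answer are made up for illustration):

# mymod.py - a hypothetical module demonstrating PEP 562
def __getattr__(name: str):
    # Only called when `name` is not found among the module's globals.
    if name == "answer":
        return 42
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

After import mymod, accessing mymod.answer returns 42 even though answer is never assigned in the module body, while any other missing attribute raises the usual AttributeError.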
The common path
The common path, exercised by most developers out there, is that pkg and pkg_extension get installed into the same virtualenv, and everything works as intended because the site-packages directory in that virtualenv would contain the following subtree:
site-packages/
    pkg/
        __init__.py    # contains the __getattr__ function to implement lazy loading
        main.py
    pkg_extension/
        __init__.py    # contains pkgutil glue
        extension/     # contains contents of the `pkg_extension` package
            ...
This means that both of the features will work correctly, because the installed Python packages share the same file system layout and pkg_extension is usually visited after pkg when trying to import things from the pkg package.
This means that a minimal working example for the contents of pkg/__init__.py is:
# Make `pkg` a namespace package.
__path__ = __import__("pkgutil").extend_path(__path__, __name__)  # type: ignore

__lazy_imports = {
    "foo": (".dir.lib", "foo"),
}

def __getattr__(name: str):
    # PEP-562: lazy-loaded attributes on Python modules
    module_path, attr_name = __lazy_imports.get(name, ("", ""))
    if not module_path:
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

    import importlib

    mod = importlib.import_module(module_path, __name__)
    if attr_name:
        val = getattr(mod, attr_name)
    else:
        val = mod

    # Store the value for next time so that __getattr__ is not called again.
    globals()[name] = val
    return val

# We may include extra things below
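With that in place, a user-facing session looks like this (a sketch; it assumes pkg.dir.lib actually defines foo):

import pkg

f = pkg.foo  # first access: __getattr__ runs and imports pkg.dir.lib
g = pkg.foo  # second access: served from globals(), __getattr__ is not called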
And the pkg_extension/__init__.py only needs the pkgutil hook:
__path__ = __import__("pkgutil").extend_path(__path__, __name__) # type: ignore
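To see what this one-liner buys us, here is a sketch of how extend_path merges package portions; the two sys.path roots site_a and site_b are hypothetical, and each ships a directory named pkg with the hook above in its __init__.py:

# Hypothetical layout:
#   site_a/pkg/__init__.py  - the extend_path hook plus the lazy-loading code
#   site_a/pkg/main.py
#   site_b/pkg/__init__.py  - the extend_path hook only
#   site_b/pkg/extension.py
import sys

sys.path[:0] = ["site_a", "site_b"]

import pkg

print(pkg.__path__)   # both portions now appear on the package __path__
import pkg.main       # resolved from site_a
import pkg.extension  # resolved from site_b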
When things may go wrong
In a simple virtualenv, everything works as expected all of the time because the directory traversal is deterministic (citation needed) and we usually hit the __init__.py file from pkg before the one from pkg_extension.
We were also careful to name our extensions with an easy-to-find naming scheme, and everything worked until we started using a build tool with a Python package layout different from the one we expected.
When using bazel, the sys.path order is determined by your build dependency DAG (Directed Acyclic Graph), and the packages appearing in sys.path are not lexicographically sorted (as of 6.2.0 at least). This means that we may have pkg_extension before pkg in our sys.path, so the Python import machinery visits its __init__.py before the pkg/__init__.py that has the lazy-loading magic.
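When debugging this kind of problem, a quick way to see the ordering that the build tool actually produced is to dump it from inside a test; this sketch is plain pytest-style Python and not bazel-specific:

import sys

def test_dump_sys_path():
    # The earliest entry that contains a `pkg` directory is the one whose
    # __init__.py the import machinery will execute.
    for i, entry in enumerate(sys.path):
        print(i, entry)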
To test this behaviour I created a few tests in the example folder and decided to go down the rabbit hole. Below is the list of combinations that I tested:
pkg and pkg_extension names
This works as intended if pkg appears before pkg_extension in the sys.path. If we reverse the order of their entries in the sys.path, then both the lazy-loaded attributes and the regular function defined in the pkg package's __init__.py fail.
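Here is a sketch of that failure mode, assuming both sys.path roots ship a pkg portion (the directory names are hypothetical): only the first pkg/__init__.py that is found gets executed, so the __getattr__ hook defined in the other portion never comes into existence.

# extension_site/pkg/__init__.py contains only the extend_path glue;
# main_site/pkg/__init__.py contains the glue plus the __getattr__ hook.
import sys

sys.path[:0] = ["extension_site", "main_site"]  # the extension portion wins

import pkg

print(pkg.__file__)  # .../extension_site/pkg/__init__.py
import pkg.main      # still fine: extend_path merged both portions into __path__
pkg.foo              # AttributeError: no __getattr__ was ever defined on `pkg`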
pkg and extension_pkg names
This works the same way as the previous case: pkg needs to appear before extension_pkg in the sys.path.
Copy the lazy loading machinery to the extension_pkg_correct and use that
So what the two data points are telling us is that lazy loading ceases to function when the first thing that gets visited is the __init__.py file without the PEP 562 hooks. So what happens if we copy the PEP 562 hooks into the extension's file?
This, as expected, makes the lazy loading work, because the hooks now specify the import paths as absolute:
# __path__ manipulation added by bazelbuild/rules_python to support namespace pkgs.
__path__ = __import__("pkgutil").extend_path(__path__, __name__)

__lazy_imports = {
    "foo": ("pkg.dir.lib", "foo"),
}

def __getattr__(name: str):
    # PEP-562: lazy-loaded attributes on Python modules
    module_path, attr_name = __lazy_imports.get(name, ("", ""))
    if not module_path:
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

    import importlib

    mod = importlib.import_module(module_path, __name__)
    if attr_name:
        val = getattr(mod, attr_name)
    else:
        val = mod

    # Store for next time
    globals()[name] = val
    return val
Notice the contents of __lazy_imports, which now holds absolute import paths rather than the relative ones in the first snippet.
However, it seems that we still cannot import the function fizz that happens to be defined at the end of pkg/__init__.py in the main package, presumably because only the first pkg/__init__.py found on sys.path is executed, so module-level names defined in the other copy never come into existence.
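A quick check confirms this (fizz is assumed to be a plain function defined at the bottom of the main portion's __init__.py):

import pkg

print(pkg.__file__)          # points at the extension's __init__.py, not the main one
print(hasattr(pkg, "fizz"))  # False - the module body that defines fizz never ran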
Conclusion
It seems that having lazy imports via PEP 562 and supporting pkgutil.extend_path usage to split the package into multiple parts do not work together. Wanting both at once may seem somewhat weird anyway, because if you can depend on the lazy-import machinery, maybe you don't need to split your packages anymore. On the other hand, everything works as expected if you have only a single site-packages location where you install your packages, which is almost always the case for regular Python users or Python installations inside containers.