## 问题
You would like to customize Python’s import statement so that it can transparently load modules from a remote machine.
## 解决方案
First, a serious disclaimer about security. The idea discussed in this recipe would be wholly bad without some kind of extra security and authentication layer. That said, the main goal is actually to take a deep dive into the inner workings of Python’s import statement. If you get this recipe to work and understand the inner workings, you’ll have a solid foundation of customizing import for almost any other purpose. With that out of the way, let’s carry on.
At the core of this recipe is a desire to extend the functionality of the import statement. There are several approaches for doing this, but for the purposes of illustration, start by making the following directory of Python code:
~~~
testcode/
spam.py
fib.py
grok/
__init__.py
blah.py
~~~
The content of these files doesn’t matter, but put a few simple statements and functions in each file so you can test them and see output when they’re imported. For example:
~~~
# spam.py
print("I'm spam")
def hello(name):
print('Hello %s' % name)
# fib.py
print("I'm fib")
def fib(n):
if n < 2:
return 1
else:
return fib(n-1) + fib(n-2)
# grok/__init__.py
print("I'm grok.__init__")
# grok/blah.py
print("I'm grok.blah")
~~~
The goal here is to allow remote access to these files as modules. Perhaps the easiest way to do this is to publish them on a web server. Simply go to the testcode directory and run Python like this:
~~~
bash % cd testcode
bash % python3 -m http.server 15000
Serving HTTP on 0.0.0.0 port 15000 ...
~~~
Leave that server running and start up a separate Python interpreter. Make sure you can access the remote files using urllib. For example:
~~~
>>> from urllib.request import urlopen
>>> u = urlopen('http://localhost:15000/fib.py')
>>> data = u.read().decode('utf-8')
>>> print(data)
# fib.py
print("I'm fib")
def fib(n):
if n < 2:
return 1
else:
return fib(n-1) + fib(n-2)
>>>
~~~
Loading source code from this server is going to form the basis for the remainder of this recipe. Specifically, instead of manually grabbing a file of source code using urlop en(), the import statement will be customized to do it transparently behind the scenes.
The first approach to loading a remote module is to create an explicit loading function for doing it. For example:
~~~
import imp
import urllib.request
import sys
def load_module(url):
u = urllib.request.urlopen(url)
source = u.read().decode('utf-8')
mod = sys.modules.setdefault(url, imp.new_module(url))
code = compile(source, url, 'exec')
mod.__file__ = url
mod.__package__ = ''
exec(code, mod.__dict__)
return mod
~~~
This function merely downloads the source code, compiles it into a code object using compile(), and executes it in the dictionary of a newly created module object. Here’s how you would use the function:
~~~
>>> fib = load_module('http://localhost:15000/fib.py')
I'm fib
>>> fib.fib(10)
89
>>> spam = load_module('http://localhost:15000/spam.py')
I'm spam
>>> spam.hello('Guido')
Hello Guido
>>> fib
<module 'http://localhost:15000/fib.py' from 'http://localhost:15000/fib.py'>
>>> spam
<module 'http://localhost:15000/spam.py' from 'http://localhost:15000/spam.py'>
>>>
~~~
As you can see, it “works” for simple modules. However, it’s not plugged into the usual import statement, and extending the code to support more advanced constructs, such as packages, would require additional work.
A much slicker approach is to create a custom importer. The first way to do this is to create what’s known as a meta path importer. Here is an example:
~~~
# urlimport.py
import sys
import importlib.abc
import imp
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from html.parser import HTMLParser
# Debugging
import logging
log = logging.getLogger(__name__)
# Get links from a given URL
def _get_links(url):
class LinkParser(HTMLParser):
def handle_starttag(self, tag, attrs):
if tag == 'a':
attrs = dict(attrs)
links.add(attrs.get('href').rstrip('/'))
links = set()
try:
log.debug('Getting links from %s' % url)
u = urlopen(url)
parser = LinkParser()
parser.feed(u.read().decode('utf-8'))
except Exception as e:
log.debug('Could not get links. %s', e)
log.debug('links: %r', links)
return links
class UrlMetaFinder(importlib.abc.MetaPathFinder):
def __init__(self, baseurl):
self._baseurl = baseurl
self._links = { }
self._loaders = { baseurl : UrlModuleLoader(baseurl) }
def find_module(self, fullname, path=None):
log.debug('find_module: fullname=%r, path=%r', fullname, path)
if path is None:
baseurl = self._baseurl
else:
if not path[0].startswith(self._baseurl):
return None
baseurl = path[0]
parts = fullname.split('.')
basename = parts[-1]
log.debug('find_module: baseurl=%r, basename=%r', baseurl, basename)
# Check link cache
if basename not in self._links:
self._links[baseurl] = _get_links(baseurl)
# Check if it's a package
if basename in self._links[baseurl]:
log.debug('find_module: trying package %r', fullname)
fullurl = self._baseurl + '/' + basename
# Attempt to load the package (which accesses __init__.py)
loader = UrlPackageLoader(fullurl)
try:
loader.load_module(fullname)
self._links[fullurl] = _get_links(fullurl)
self._loaders[fullurl] = UrlModuleLoader(fullurl)
log.debug('find_module: package %r loaded', fullname)
except ImportError as e:
log.debug('find_module: package failed. %s', e)
loader = None
return loader
# A normal module
filename = basename + '.py'
if filename in self._links[baseurl]:
log.debug('find_module: module %r found', fullname)
return self._loaders[baseurl]
else:
log.debug('find_module: module %r not found', fullname)
return None
def invalidate_caches(self):
log.debug('invalidating link cache')
self._links.clear()
# Module Loader for a URL
class UrlModuleLoader(importlib.abc.SourceLoader):
def __init__(self, baseurl):
self._baseurl = baseurl
self._source_cache = {}
def module_repr(self, module):
return '<urlmodule %r from %r>' % (module.__name__, module.__file__)
# Required method
def load_module(self, fullname):
code = self.get_code(fullname)
mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
mod.__file__ = self.get_filename(fullname)
mod.__loader__ = self
mod.__package__ = fullname.rpartition('.')[0]
exec(code, mod.__dict__)
return mod
# Optional extensions
def get_code(self, fullname):
src = self.get_source(fullname)
return compile(src, self.get_filename(fullname), 'exec')
def get_data(self, path):
pass
def get_filename(self, fullname):
return self._baseurl + '/' + fullname.split('.')[-1] + '.py'
def get_source(self, fullname):
filename = self.get_filename(fullname)
log.debug('loader: reading %r', filename)
if filename in self._source_cache:
log.debug('loader: cached %r', filename)
return self._source_cache[filename]
try:
u = urlopen(filename)
source = u.read().decode('utf-8')
log.debug('loader: %r loaded', filename)
self._source_cache[filename] = source
return source
except (HTTPError, URLError) as e:
log.debug('loader: %r failed. %s', filename, e)
raise ImportError("Can't load %s" % filename)
def is_package(self, fullname):
return False
# Package loader for a URL
class UrlPackageLoader(UrlModuleLoader):
def load_module(self, fullname):
mod = super().load_module(fullname)
mod.__path__ = [ self._baseurl ]
mod.__package__ = fullname
def get_filename(self, fullname):
return self._baseurl + '/' + '__init__.py'
def is_package(self, fullname):
return True
# Utility functions for installing/uninstalling the loader
_installed_meta_cache = { }
def install_meta(address):
if address not in _installed_meta_cache:
finder = UrlMetaFinder(address)
_installed_meta_cache[address] = finder
sys.meta_path.append(finder)
log.debug('%r installed on sys.meta_path', finder)
def remove_meta(address):
if address in _installed_meta_cache:
finder = _installed_meta_cache.pop(address)
sys.meta_path.remove(finder)
log.debug('%r removed from sys.meta_path', finder)
~~~
Here is an interactive session showing how to use the preceding code:
~~~
>>> # importing currently fails
>>> import fib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>> # Load the importer and retry (it works)
>>> import urlimport
>>> urlimport.install_meta('http://localhost:15000')
>>> import fib
I'm fib
>>> import spam
I'm spam
>>> import grok.blah
I'm grok.__init__
I'm grok.blah
>>> grok.blah.__file__
'http://localhost:15000/grok/blah.py'
>>>
~~~
This particular solution involves installing an instance of a special finder object UrlMe taFinder as the last entry in sys.meta_path. Whenever modules are imported, the finders in sys.meta_path are consulted in order to locate the module. In this example, the UrlMetaFinder instance becomes a finder of last resort that’s triggered when a module can’t be found in any of the normal locations.
As for the general implementation approach, the UrlMetaFinder class wraps around a user-specified URL. Internally, the finder builds sets of valid links by scraping them from the given URL. When imports are made, the module name is compared against this set of known links. If a match can be found, a separate UrlModuleLoader class is used to load source code from the remote machine and create the resulting module object. One reason for caching the links is to avoid unnecessary HTTP requests on repeated imports.
The second approach to customizing import is to write a hook that plugs directly into the sys.path variable, recognizing certain directory naming patterns. Add the following class and support functions to urlimport.py:
~~~
# urlimport.py
# ... include previous code above ...
# Path finder class for a URL
class UrlPathFinder(importlib.abc.PathEntryFinder):
def __init__(self, baseurl):
self._links = None
self._loader = UrlModuleLoader(baseurl)
self._baseurl = baseurl
def find_loader(self, fullname):
log.debug('find_loader: %r', fullname)
parts = fullname.split('.')
basename = parts[-1]
# Check link cache
if self._links is None:
self._links = [] # See discussion
self._links = _get_links(self._baseurl)
# Check if it's a package
if basename in self._links:
log.debug('find_loader: trying package %r', fullname)
fullurl = self._baseurl + '/' + basename
# Attempt to load the package (which accesses __init__.py)
loader = UrlPackageLoader(fullurl)
try:
loader.load_module(fullname)
log.debug('find_loader: package %r loaded', fullname)
except ImportError as e:
log.debug('find_loader: %r is a namespace package', fullname)
loader = None
return (loader, [fullurl])
# A normal module
filename = basename + '.py'
if filename in self._links:
log.debug('find_loader: module %r found', fullname)
return (self._loader, [])
else:
log.debug('find_loader: module %r not found', fullname)
return (None, [])
def invalidate_caches(self):
log.debug('invalidating link cache')
self._links = None
# Check path to see if it looks like a URL
_url_path_cache = {}
def handle_url(path):
if path.startswith(('http://', 'https://')):
log.debug('Handle path? %s. [Yes]', path)
if path in _url_path_cache:
finder = _url_path_cache[path]
else:
finder = UrlPathFinder(path)
_url_path_cache[path] = finder
return finder
else:
log.debug('Handle path? %s. [No]', path)
def install_path_hook():
sys.path_hooks.append(handle_url)
sys.path_importer_cache.clear()
log.debug('Installing handle_url')
def remove_path_hook():
sys.path_hooks.remove(handle_url)
sys.path_importer_cache.clear()
log.debug('Removing handle_url')
~~~
To use this path-based finder, you simply add URLs to sys.path. For example:
~~~
>>> # Initial import fails
>>> import fib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>> # Install the path hook
>>> import urlimport
>>> urlimport.install_path_hook()
>>> # Imports still fail (not on path)
>>> import fib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>> # Add an entry to sys.path and watch it work
>>> import sys
>>> sys.path.append('http://localhost:15000')
>>> import fib
I'm fib
>>> import grok.blah
I'm grok.__init__
I'm grok.blah
>>> grok.blah.__file__
'http://localhost:15000/grok/blah.py'
>>>
~~~
The key to this last example is the handle_url() function, which is added to the sys.path_hooks variable. When the entries on sys.path are being processed, the functions in sys.path_hooks are invoked. If any of those functions return a finder object, that finder is used to try to load modules for that entry on sys.path.
It should be noted that the remotely imported modules work exactly like any other module. For instance:
~~~
>>> fib
<urlmodule 'fib' from 'http://localhost:15000/fib.py'>
>>> fib.__name__
'fib'
>>> fib.__file__
'http://localhost:15000/fib.py'
>>> import inspect
>>> print(inspect.getsource(fib))
# fib.py
print("I'm fib")
def fib(n):
if n < 2:
return 1
else:
return fib(n-1) + fib(n-2)
>>>
~~~
## 讨论
Before discussing this recipe in further detail, it should be emphasized that Python’s module, package, and import mechanism is one of the most complicated parts of the entire language—often poorly understood by even the most seasoned Python programmers unless they’ve devoted effort to peeling back the covers. There are several critical documents that are worth reading, including the documentation for the [importlib module](https://docs.python.org/3/library/importlib.html) and [PEP 302](http://www.python.org/dev/peps/pep-0302). That documentation won’t be repeated here, but some essential highlights will be discussed.
First, if you want to create a new module object, you use the imp.new_module() function. For example:
~~~
>>> import imp
>>> m = imp.new_module('spam')
>>> m
<module 'spam'>
>>> m.__name__
'spam'
>>>
~~~
Module objects usually have a few expected attributes, including __file__ (the name of the file that the module was loaded from) and __package__ (the name of the enclosing package, if any).
Second, modules are cached by the interpreter. The module cache can be found in the dictionary sys.modules. Because of this caching, it’s common to combine caching and module creation together into a single step. For example:
~~~
>>> import sys
>>> import imp
>>> m = sys.modules.setdefault('spam', imp.new_module('spam'))
>>> m
<module 'spam'>
>>>
~~~
The main reason for doing this is that if a module with the given name already exists, you’ll get the already created module instead. For example:
~~~
>>> import math
>>> m = sys.modules.setdefault('math', imp.new_module('math'))
>>> m
<module 'math' from '/usr/local/lib/python3.3/lib-dynload/math.so'>
>>> m.sin(2)
0.9092974268256817
>>> m.cos(2)
-0.4161468365471424
>>>
~~~
Since creating modules is easy, it is straightforward to write simple functions, such as the load_module() function in the first part of this recipe. A downside of this approach is that it is actually rather tricky to handle more complicated cases, such as package imports. In order to handle a package, you would have to reimplement much of the underlying logic that’s already part of the normal import statement (e.g., checking for directories, looking for __init__.py files, executing those files, setting up paths, etc.). This complexity is one of the reasons why it’s often better to extend the import statement directly rather than defining a custom function.
Extending the import statement is straightforward, but involves a number of moving parts. At the highest level, import operations are processed by a list of “meta-path” finders that you can find in the list sys.meta_path. If you output its value, you’ll see the following:
~~~
>>> from pprint import pprint
>>> pprint(sys.meta_path)
[<class '_frozen_importlib.BuiltinImporter'>,
<class '_frozen_importlib.FrozenImporter'>,
<class '_frozen_importlib.PathFinder'>]
>>>
~~~
When executing a statement such as import fib, the interpreter walks through the finder objects on sys.meta_path and invokes their find_module() method in order to locate an appropriate module loader. It helps to see this by experimentation, so define the following class and try the following:
~~~
>>> class Finder:
... def find_module(self, fullname, path):
... print('Looking for', fullname, path)
... return None
...
>>> import sys
>>> sys.meta_path.insert(0, Finder()) # Insert as first entry
>>> import math
Looking for math None
>>> import types
Looking for types None
>>> import threading
Looking for threading None
Looking for time None
Looking for traceback None
Looking for linecache None
Looking for tokenize None
Looking for token None
>>>
~~~
Notice how the find_module() method is being triggered on every import. The role of the path argument in this method is to handle packages. When packages are imported, it is a list of the directories that are found in the package’s __path__ attribute. These are the paths that need to be checked to find package subcomponents. For example, notice the path setting for xml.etree and xml.etree.ElementTree:
~~~
>>> import xml.etree.ElementTree
Looking for xml None
Looking for xml.etree ['/usr/local/lib/python3.3/xml']
Looking for xml.etree.ElementTree ['/usr/local/lib/python3.3/xml/etree']
Looking for warnings None
Looking for contextlib None
Looking for xml.etree.ElementPath ['/usr/local/lib/python3.3/xml/etree']
Looking for _elementtree None
Looking for copy None
Looking for org None
Looking for pyexpat None
Looking for ElementC14N None
>>>
~~~
The placement of the finder on sys.meta_path is critical. Remove it from the front of the list to the end of the list and try more imports:
~~~
>>> del sys.meta_path[0]
>>> sys.meta_path.append(Finder())
>>> import urllib.request
>>> import datetime
~~~
Now you don’t see any output because the imports are being handled by other entries in sys.meta_path. In this case, you would only see it trigger when nonexistent modules are imported:
~~~
>>> import fib
Looking for fib None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>> import xml.superfast
Looking for xml.superfast ['/usr/local/lib/python3.3/xml']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'xml.superfast'
>>>
~~~
The fact that you can install a finder to catch unknown modules is the key to the UrlMetaFinder class in this recipe. An instance of UrlMetaFinder is added to the end of sys.meta_path, where it serves as a kind of importer of last resort. If the requested module name can’t be located by any of the other import mechanisms, it gets handled by this finder. Some care needs to be taken when handling packages. Specifically, the value presented in the path argument needs to be checked to see if it starts with the URL registered in the finder. If not, the submodule must belong to some other finder and should be ignored.
Additional handling of packages is found in the UrlPackageLoader class. This class, rather than importing the package name, tries to load the underlying __init__.py file. It also sets the module __path__ attribute. This last part is critical, as the value set will be passed to subsequent find_module() calls when loading package submodules. The path-based import hook is an extension of these ideas, but based on a somewhat different mechanism. As you know, sys.path is a list of directories where Python looks for modules. For example:
~~~
>>> from pprint import pprint
>>> import sys
>>> pprint(sys.path)
['',
'/usr/local/lib/python33.zip',
'/usr/local/lib/python3.3',
'/usr/local/lib/python3.3/plat-darwin',
'/usr/local/lib/python3.3/lib-dynload',
'/usr/local/lib/...3.3/site-packages']
>>>
~~~
Each entry in sys.path is additionally attached to a finder object. You can view these finders by looking at sys.path_importer_cache:
~~~
>>> pprint(sys.path_importer_cache)
{'.': FileFinder('.'),
'/usr/local/lib/python3.3': FileFinder('/usr/local/lib/python3.3'),
'/usr/local/lib/python3.3/': FileFinder('/usr/local/lib/python3.3/'),
'/usr/local/lib/python3.3/collections': FileFinder('...python3.3/collections'),
'/usr/local/lib/python3.3/encodings': FileFinder('...python3.3/encodings'),
'/usr/local/lib/python3.3/lib-dynload': FileFinder('...python3.3/lib-dynload'),
'/usr/local/lib/python3.3/plat-darwin': FileFinder('...python3.3/plat-darwin'),
'/usr/local/lib/python3.3/site-packages': FileFinder('...python3.3/site-packages'),
'/usr/local/lib/python33.zip': None}
>>>
~~~
sys.path_importer_cache tends to be much larger than sys.path because it records finders for all known directories where code is being loaded. This includes subdirectories of packages which usually aren’t included on sys.path.
To execute import fib, the directories on sys.path are checked in order. For each directory, the name fib is presented to the associated finder found in sys.path_im porter_cache. This is also something that you can investigate by making your own finder and putting an entry in the cache. Try this experiment:
~~~
>>> class Finder:
... def find_loader(self, name):
... print('Looking for', name)
... return (None, [])
...
>>> import sys
>>> # Add a "debug" entry to the importer cache
>>> sys.path_importer_cache['debug'] = Finder()
>>> # Add a "debug" directory to sys.path
>>> sys.path.insert(0, 'debug')
>>> import threading
Looking for threading
Looking for time
Looking for traceback
Looking for linecache
Looking for tokenize
Looking for token
>>>
~~~
Here, you’ve installed a new cache entry for the name debug and installed the name debug as the first entry on sys.path. On all subsequent imports, you see your finder being triggered. However, since it returns (None, []), processing simply continues to the next entry.
The population of sys.path_importer_cache is controlled by a list of functions stored in sys.path_hooks. Try this experiment, which clears the cache and adds a new path checking function to sys.path_hooks:
~~~
>>> sys.path_importer_cache.clear()
>>> def check_path(path):
... print('Checking', path)
... raise ImportError()
...
>>> sys.path_hooks.insert(0, check_path)
>>> import fib
Checked debug
Checking .
Checking /usr/local/lib/python33.zip
Checking /usr/local/lib/python3.3
Checking /usr/local/lib/python3.3/plat-darwin
Checking /usr/local/lib/python3.3/lib-dynload
Checking /Users/beazley/.local/lib/python3.3/site-packages
Checking /usr/local/lib/python3.3/site-packages
Looking for fib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>>
~~~
As you can see, the check_path() function is being invoked for every entry on sys.path. However, since an ImportError exception is raised, nothing else happens (checking just moves to the next function on sys.path_hooks).
Using this knowledge of how sys.path is processed, you can install a custom path checking function that looks for filename patterns, such as URLs. For instance:
~~~
>>> def check_url(path):
... if path.startswith('http://'):
... return Finder()
... else:
... raise ImportError()
...
>>> sys.path.append('http://localhost:15000')
>>> sys.path_hooks[0] = check_url
>>> import fib
Looking for fib # Finder output!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>> # Notice installation of Finder in sys.path_importer_cache
>>> sys.path_importer_cache['http://localhost:15000']
<__main__.Finder object at 0x10064c850>
>>>
~~~
This is the key mechanism at work in the last part of this recipe. Essentially, a custom path checking function has been installed that looks for URLs in sys.path. When they are encountered, a new UrlPathFinder instance is created and installed into sys.path_importer_cache. From that point forward, all import statements that pass through that part of sys.path will try to use your custom finder.
Package handling with a path-based importer is somewhat tricky, and relates to the return value of the find_loader() method. For simple modules, find_loader() returns a tuple (loader, None) where loader is an instance of a loader that will import the module.
For a normal package, find_loader() returns a tuple (loader, path) where loader is the loader instance that will import the package (and execute __init__.py) and path is a list of the directories that will make up the initial setting of the package’s __path__ attribute. For example, if the base URL was[http://localhost:15000](http://localhost:15000/) and a user executed import grok, the path returned by find_loader() would be [ ‘[http://local](http://local/) host:15000/grok’ ].
The find_loader() must additionally account for the possibility of a namespace package. A namespace package is a package where a valid package directory name exists, but no __init__.py file can be found. For this case, find_loader() must return a tuple (None, path) where path is a list of directories that would have made up the package’s __path__ attribute had it defined an __init__.py file. For this case, the import mechanism moves on to check further directories on sys.path. If more namespace packages are found, all of the resulting paths are joined together to make a final namespace package. See Recipe 10.5 for more information on namespace packages.
There is a recursive element to package handling that is not immediately obvious in the solution, but also at work. All packages contain an internal path setting, which can be found in __path__ attribute. For example:
~~~
>>> import xml.etree.ElementTree
>>> xml.__path__
['/usr/local/lib/python3.3/xml']
>>> xml.etree.__path__
['/usr/local/lib/python3.3/xml/etree']
>>>
~~~
As mentioned, the setting of __path__ is controlled by the return value of the find_load er() method. However, the subsequent processing of __path__ is also handled by the functions in sys.path_hooks. Thus, when package subcomponents are loaded, the entries in __path__ are checked by the handle_url() function. This causes new instances of UrlPathFinder to be created and added to sys.path_importer_cache.
One remaining tricky part of the implementation concerns the behavior of the han dle_url() function and its interaction with the _get_links() function used internally. If your implementation of a finder involves the use of other modules (e.g., urllib.re quest), there is a possibility that those modules will attempt to make further imports in the middle of the finder’s operation. This can actually cause handle_url() and other parts of the finder to get executed in a kind of recursive loop. To account for this possibility, the implementation maintains a cache of created finders (one per URL). This avoids the problem of creating duplicate finders. In addition, the following fragment of code ensures that the finder doesn’t respond to any import requests while it’s in the processs of getting the initial set of links:
~~~
# Check link cache
if self._links is None:
self._links = [] # See discussion
self._links = _get_links(self._baseurl)
~~~
You may not need this checking in other implementations, but for this example involving URLs, it was required.
Finally, the invalidate_caches() method of both finders is a utility method that is supposed to clear internal caches should the source code change. This method is triggered when a user invokes importlib.invalidate_caches(). You might use it if you want the URL importers to reread the list of links, possibly for the purpose of being able to access newly added files.
In comparing the two approaches (modifying sys.meta_path or using a path hook), it helps to take a high-level view. Importers installed using sys.meta_path are free to handle modules in any manner that they wish. For instance, they could load modules out of a database or import them in a manner that is radically different than normal module/package handling. This freedom also means that such importers need to do more bookkeeping and internal management. This explains, for instance, why the implementation of UrlMetaFinder needs to do its own caching of links, loaders, and other details. On the other hand, path-based hooks are more narrowly tied to the processing of sys.path. Because of the connection to sys.path, modules loaded with such extensions will tend to have the same features as normal modules and packages that programmers are used to.
Assuming that your head hasn’t completely exploded at this point, a key to understanding and experimenting with this recipe may be the added logging calls. You can enable logging and try experiments such as this:
~~~
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> import urlimport
>>> urlimport.install_path_hook()
DEBUG:urlimport:Installing handle_url
>>> import fib
DEBUG:urlimport:Handle path? /usr/local/lib/python33.zip. [No]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'fib'
>>> import sys
>>> sys.path.append('http://localhost:15000')
>>> import fib
DEBUG:urlimport:Handle path? http://localhost:15000\. [Yes]
DEBUG:urlimport:Getting links from http://localhost:15000
DEBUG:urlimport:links: {'spam.py', 'fib.py', 'grok'}
DEBUG:urlimport:find_loader: 'fib'
DEBUG:urlimport:find_loader: module 'fib' found
DEBUG:urlimport:loader: reading 'http://localhost:15000/fib.py'
DEBUG:urlimport:loader: 'http://localhost:15000/fib.py' loaded
I'm fib
>>>
~~~
Last, but not least, spending some time sleeping with [PEP 302](http://www.python.org/dev/peps/pep-0302) and the documentation for importlib under your pillow may be advisable.
- Copyright
- 前言
- 第一章:数据结构和算法
- 1.1 解压序列赋值给多个变量
- 1.2 解压可迭代对象赋值给多个变量
- 1.3 保留最后N个元素
- 1.4 查找最大或最小的N个元素
- 1.5 实现一个优先级队列
- 1.6 字典中的键映射多个值
- 1.7 字典排序
- 1.8 字典的运算
- 1.9 查找两字典的相同点
- 1.10 删除序列相同元素并保持顺序
- 1.11 命名切片
- 1.12 序列中出现次数最多的元素
- 1.13 通过某个关键字排序一个字典列表
- 1.14 排序不支持原生比较的对象
- 1.15 通过某个字段将记录分组
- 1.16 过滤序列元素
- 1.17 从字典中提取子集
- 1.18 映射名称到序列元素
- 1.19 转换并同时计算数据
- 1.20 合并多个字典或映射
- 第二章:字符串和文本
- 2.1 使用多个界定符分割字符串
- 2.2 字符串开头或结尾匹配
- 2.3 用Shell通配符匹配字符串
- 2.4 字符串匹配和搜索
- 2.5 字符串搜索和替换
- 2.6 字符串忽略大小写的搜索替换
- 2.7 最短匹配模式
- 2.8 多行匹配模式
- 2.9 将Unicode文本标准化
- 2.10 在正则式中使用Unicode
- 2.11 删除字符串中不需要的字符
- 2.12 审查清理文本字符串
- 2.13 字符串对齐
- 2.14 合并拼接字符串
- 2.15 字符串中插入变量
- 2.16 以指定列宽格式化字符串
- 2.17 在字符串中处理html和xml
- 2.18 字符串令牌解析
- 2.19 实现一个简单的递归下降分析器
- 2.20 字节字符串上的字符串操作
- 第三章:数字日期和时间
- 3.1 数字的四舍五入
- 3.2 执行精确的浮点数运算
- 3.3 数字的格式化输出
- 3.4 二八十六进制整数
- 3.5 字节到大整数的打包与解包
- 3.6 复数的数学运算
- 3.7 无穷大与NaN
- 3.8 分数运算
- 3.9 大型数组运算
- 3.10 矩阵与线性代数运算
- 3.11 随机选择
- 3.12 基本的日期与时间转换
- 3.13 计算最后一个周五的日期
- 3.14 计算当前月份的日期范围
- 3.15 字符串转换为日期
- 3.16 结合时区的日期操作
- 第四章:迭代器与生成器
- 4.1 手动遍历迭代器
- 4.2 代理迭代
- 4.3 使用生成器创建新的迭代模式
- 4.4 实现迭代器协议
- 4.5 反向迭代
- 4.6 带有外部状态的生成器函数
- 4.7 迭代器切片
- 4.8 跳过可迭代对象的开始部分
- 4.9 排列组合的迭代
- 4.10 序列上索引值迭代
- 4.11 同时迭代多个序列
- 4.12 不同集合上元素的迭代
- 4.13 创建数据处理管道
- 4.14 展开嵌套的序列
- 4.15 顺序迭代合并后的排序迭代对象
- 4.16 迭代器代替while无限循环
- 第五章:文件与IO
- 5.1 读写文本数据
- 5.2 打印输出至文件中
- 5.3 使用其他分隔符或行终止符打印
- 5.4 读写字节数据
- 5.5 文件不存在才能写入
- 5.6 字符串的I/O操作
- 5.7 读写压缩文件
- 5.8 固定大小记录的文件迭代
- 5.9 读取二进制数据到可变缓冲区中
- 5.10 内存映射的二进制文件
- 5.11 文件路径名的操作
- 5.12 测试文件是否存在
- 5.13 获取文件夹中的文件列表
- 5.14 忽略文件名编码
- 5.15 打印不合法的文件名
- 5.16 增加或改变已打开文件的编码
- 5.17 将字节写入文本文件
- 5.18 将文件描述符包装成文件对象
- 5.19 创建临时文件和文件夹
- 5.20 与串行端口的数据通信
- 5.21 序列化Python对象
- 第六章:数据编码和处理
- 6.1 读写CSV数据
- 6.2 读写JSON数据
- 6.3 解析简单的XML数据
- 6.4 增量式解析大型XML文件
- 6.5 将字典转换为XML
- 6.6 解析和修改XML
- 6.7 利用命名空间解析XML文档
- 6.8 与关系型数据库的交互
- 6.9 编码和解码十六进制数
- 6.10 编码解码Base64数据
- 6.11 读写二进制数组数据
- 6.12 读取嵌套和可变长二进制数据
- 6.13 数据的累加与统计操作
- 第七章:函数
- 7.1 可接受任意数量参数的函数
- 7.2 只接受关键字参数的函数
- 7.3 给函数参数增加元信息
- 7.4 返回多个值的函数
- 7.5 定义有默认参数的函数
- 7.6 定义匿名或内联函数
- 7.7 匿名函数捕获变量值
- 7.8 减少可调用对象的参数个数
- 7.9 将单方法的类转换为函数
- 7.10 带额外状态信息的回调函数
- 7.11 内联回调函数
- 7.12 访问闭包中定义的变量
- 第八章:类与对象
- 8.1 改变对象的字符串显示
- 8.2 自定义字符串的格式化
- 8.3 让对象支持上下文管理协议
- 8.4 创建大量对象时节省内存方法
- 8.5 在类中封装属性名
- 8.6 创建可管理的属性
- 8.7 调用父类方法
- 8.8 子类中扩展property
- 8.9 创建新的类或实例属性
- 8.10 使用延迟计算属性
- 8.11 简化数据结构的初始化
- 8.12 定义接口或者抽象基类
- 8.13 实现数据模型的类型约束
- 8.14 实现自定义容器
- 8.15 属性的代理访问
- 8.16 在类中定义多个构造器
- 8.17 创建不调用init方法的实例
- 8.18 利用Mixins扩展类功能
- 8.19 实现状态对象或者状态机
- 8.20 通过字符串调用对象方法
- 8.21 实现访问者模式
- 8.22 不用递归实现访问者模式
- 8.23 循环引用数据结构的内存管理
- 8.24 让类支持比较操作
- 8.25 创建缓存实例
- 第九章:元编程
- 9.1 在函数上添加包装器
- 9.2 创建装饰器时保留函数元信息
- 9.3 解除一个装饰器
- 9.4 定义一个带参数的装饰器
- 9.5 可自定义属性的装饰器
- 9.6 带可选参数的装饰器
- 9.7 利用装饰器强制函数上的类型检查
- 9.8 将装饰器定义为类的一部分
- 9.9 将装饰器定义为类
- 9.10 为类和静态方法提供装饰器
- 9.11 装饰器为被包装函数增加参数
- 9.12 使用装饰器扩充类的功能
- 9.13 使用元类控制实例的创建
- 9.14 捕获类的属性定义顺序
- 9.15 定义有可选参数的元类
- 9.16 *args和**kwargs的强制参数签名
- 9.17 在类上强制使用编程规约
- 9.18 以编程方式定义类
- 9.19 在定义的时候初始化类的成员
- 9.20 利用函数注解实现方法重载
- 9.21 避免重复的属性方法
- 9.22 定义上下文管理器的简单方法
- 9.23 在局部变量域中执行代码
- 9.24 解析与分析Python源码
- 9.25 拆解Python字节码
- 第十章:模块与包
- 10.1 构建一个模块的层级包
- 10.2 控制模块被全部导入的内容
- 10.3 使用相对路径名导入包中子模块
- 10.4 将模块分割成多个文件
- 10.5 利用命名空间导入目录分散的代码
- 10.6 重新加载模块
- 10.7 运行目录或压缩文件
- 10.8 读取位于包中的数据文件
- 10.9 将文件夹加入到sys.path
- 10.10 通过字符串名导入模块
- 10.11 通过导入钩子远程加载模块
- 10.12 导入模块的同时修改模块
- 10.13 安装私有的包
- 10.14 创建新的Python环境
- 10.15 分发包
- 第十一章:网络与Web编程
- 11.1 作为客户端与HTTP服务交互
- 11.2 创建TCP服务器
- 11.3 创建UDP服务器
- 11.4 通过CIDR地址生成对应的IP地址集
- 11.5 生成一个简单的REST接口
- 11.6 通过XML-RPC实现简单的远程调用
- 11.7 在不同的Python解释器之间交互
- 11.8 实现远程方法调用
- 11.9 简单的客户端认证
- 11.10 在网络服务中加入SSL
- 11.11 进程间传递Socket文件描述符
- 11.12 理解事件驱动的IO
- 11.13 发送与接收大型数组
- 第十二章:并发编程
- 12.1 启动与停止线程
- 12.2 判断线程是否已经启动
- 12.3 线程间的通信
- 12.4 给关键部分加锁
- 12.5 防止死锁的加锁机制
- 12.6 保存线程的状态信息
- 12.7 创建一个线程池
- 12.8 简单的并行编程
- 12.9 Python的全局锁问题
- 12.10 定义一个Actor任务
- 12.11 实现消息发布/订阅模型
- 12.12 使用生成器代替线程
- 12.13 多个线程队列轮询
- 12.14 在Unix系统上面启动守护进程
- 第十三章:脚本编程与系统管理
- 13.1 通过重定向/管道/文件接受输入
- 13.2 终止程序并给出错误信息
- 13.3 解析命令行选项
- 13.4 运行时弹出密码输入提示
- 13.5 获取终端的大小
- 13.6 执行外部命令并获取它的输出
- 13.7 复制或者移动文件和目录
- 13.8 创建和解压压缩文件
- 13.9 通过文件名查找文件
- 13.10 读取配置文件
- 13.11 给简单脚本增加日志功能
- 13.12 给内库增加日志功能
- 13.13 记录程序执行的时间
- 13.14 限制内存和CPU的使用量
- 13.15 启动一个WEB浏览器
- 第十四章:测试调试和异常
- 14.1 测试输出到标准输出上
- 14.2 在单元测试中给对象打补丁
- 14.3 在单元测试中测试异常情况
- 14.4 将测试输出用日志记录到文件中
- 14.5 忽略或者期望测试失败
- 14.6 处理多个异常
- 14.7 捕获所有异常
- 14.8 创建自定义异常
- 14.9 捕获异常后抛出另外的异常
- 14.10 重新抛出最后的异常
- 14.11 输出警告信息
- 14.12 调试基本的程序崩溃错误
- 14.13 给你的程序做基准测试
- 14.14 让你的程序跑的更快
- 第十五章:C语言扩展
- 15.1 使用ctypes访问C代码
- 15.2 简单的C扩展模块
- 15.3 一个操作数组的扩展函数
- 15.4 在C扩展模块中操作隐形指针
- 15.5 从扩张模块中定义和导出C的API
- 15.6 从C语言中调用Python代码
- 15.7 从C扩展中释放全局锁
- 15.8 C和Python中的线程混用
- 15.9 用WSIG包装C代码
- 15.10 用Cython包装C代码
- 15.11 用Cython写高性能的数组操作
- 15.12 将函数指针转换为可调用对象
- 15.13 传递NULL结尾的字符串给C函数库
- 15.14 传递Unicode字符串给C函数库
- 15.15 C字符串转换为Python字符串
- 15.16 不确定编码格式的C字符串
- 15.17 传递文件名给C扩展
- 15.18 传递已打开的文件给C扩展
- 15.19 从C语言中读取类文件对象
- 15.20 处理C语言中的可迭代对象
- 15.21 诊断分析代码错误
- 附录A
- 关于译者
- Roadmap