Python module & package


This article covers Python modules and packages.

1. Module

Python imports are runtime operations that perform three distinct steps the first time a program imports a given file:

  1. Find the module’s file
  2. Compile it to byte code (if needed)
  3. Run the module’s code to build the objects it defines

All three of these steps are carried out only the first time a module is imported during a program’s execution; later imports of the same module in a program run bypass all of these steps and simply fetch the already loaded module object in memory.
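The "first time only" rule can be seen in a small sketch. It writes a throwaway module (the name `demo_once` is hypothetical) into a temporary directory, imports it twice, and checks that the top-level code ran once and that the second import simply returned the module object already cached in `sys.modules`.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Write a throwaway module (hypothetical name) whose top-level code prints a message.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_once.py").write_text('print("running demo_once")\n')
sys.path.insert(0, tmp_dir)      # make the temporary directory searchable
importlib.invalidate_caches()    # the file system changed after the interpreter started

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import demo_once as first    # find, compile, run: the top-level print executes
    import demo_once as second   # already loaded: fetched from sys.modules, nothing runs

output = captured.getvalue()
print(output.count("running demo_once"))   # 1
print(first is second)                     # True
print(sys.modules["demo_once"] is first)   # True
```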

1.1 Find it

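Do not write imports such as import c:\dir1\b.py (that is, with a directory path and the .py extension): import deliberately omits path and extension details. Python locates the file that corresponds to an import statement by using the standard module search path and the file types it knows about.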

The search path consists of the following, from highest to lowest priority:

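  1. The home directory of the program (automatic)
    Python first searches the directory that contains the top-level script of the program being run. Because this directory is always searched first, be careful: a file here shadows any module of the same name in the directories searched later.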

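  2. PYTHONPATH environment variable (configurable)
    Because Python already searches the home directory of the script first, this variable is needed only when importing modules across directory boundaries, for example when developing substantial programs.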

  3. Standard library directories (automatic)
    Python always searches the directories of its bundled standard library modules, so there is no need to add them to PYTHONPATH.

  4. .pth path file directories (configurable)
    Users can add directories to the module search path simply by listing them, one per line, in a text file whose name ends with a .pth suffix (for "path").
    Such a file is usually placed in the top-level directory of the Python installation (for example C:\Python33 on Windows) or in its site-packages subdirectory (C:\Python33\Lib\site-packages).
    For some applications, a text-file configuration is more convenient than an environment variable.

  5. The lib\site-packages directory of third-party extensions (automatic)
    Installed third-party libraries are searched last.

Remember: module search path settings are only needed to import across directory boundaries.
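To see the search path in practice, the sketch below prints `sys.path` in search order and uses `importlib.util.find_spec` to ask where a module would be loaded from without running it. The temporary directory stands in for a hypothetical project folder, and `demo_searchpath` is a hypothetical module name; appending to `sys.path` at runtime has a similar effect, for the current process only, to listing the directory in PYTHONPATH or a .pth file.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# sys.path is the module search path, listed in the order it is searched.
for entry in sys.path:
    print(repr(entry))

# Standard library modules are found without any configuration.
json_spec = importlib.util.find_spec("json")
print(json_spec.origin)   # path of json/__init__.py inside the standard library

# A directory outside the search path is invisible until it is added.
extra_dir = tempfile.mkdtemp()
Path(extra_dir, "demo_searchpath.py").write_text("VALUE = 42\n")
print(importlib.util.find_spec("demo_searchpath"))   # None: not on sys.path yet

sys.path.append(extra_dir)   # similar effect, for this process, to a PYTHONPATH entry
importlib.invalidate_caches()
spec = importlib.util.find_spec("demo_searchpath")
print(spec.origin)           # .../demo_searchpath.py inside extra_dir
```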

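If we write import b, the file that actually gets imported may be any of the following:

  • a source file, b.py
  • a byte code file, b.pyc
  • an optimized byte code file, b.pyo (rarely used; since Python 3.5 optimized byte code is named like b.opt-1.pyc instead)
  • a directory named b, for a package import
  • a compiled extension module, written in C, C++, or another language and dynamically linked when imported (for example b.so on Linux, b.dll or b.pyd on Windows and Cygwin)
  • a compiled built-in module coded in C and statically linked into Python
  • a ZIP file component, automatically extracted when imported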
  • an in-memory image, for frozen executables
  • a Java class in the Jython version of Python
  • a .NET component, in the IronPython version of Python
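For the file-based cases in CPython, the suffixes the standard finder tries can be listed with `importlib.machinery`. The exact extension-module suffixes depend on the platform and Python version, so the values in the comments are illustrative only.

```python
import importlib.machinery as machinery
import sys

# Suffixes tried for source files, byte code files, and compiled extension modules.
print("source:   ", machinery.SOURCE_SUFFIXES)     # e.g. ['.py'] (plus '.pyw' on Windows)
print("bytecode: ", machinery.BYTECODE_SUFFIXES)   # ['.pyc'] in Python 3.5+
print("extension:", machinery.EXTENSION_SUFFIXES)  # e.g. ['.cpython-312-x86_64-linux-gnu.so', '.abi3.so', '.so']
print("all:      ", machinery.all_suffixes())

# Built-in modules are statically linked into the interpreter and have no file at all.
print("sys" in sys.builtin_module_names)           # True
```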

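In fact, some of Python's standard modules are themselves written in C, but importing and using them works exactly like ordinary Python modules; the difference is transparent to the user.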

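Python's bundled distutils package is also worth knowing about (note that it was deprecated in Python 3.10 and removed in Python 3.12).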

The difference between import and from:

import fetches the module as a whole, so we must qualify names to fetch them from it; in contrast, from fetches (or copies) specific names out of the module.

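With the from * form, we copy out every name assigned at the top level of the referenced module (names beginning with an underscore are skipped, and the module's __all__ list, if defined, restricts what is copied).

This is strongly discouraged, however: the from module import * form really can corrupt namespaces and make names difficult to understand, especially when applied to more than one file.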
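A short sketch of the three forms using the standard `math` module. The `from *` case is run through `exec` in a fresh dictionary so it behaves like a fresh module namespace; it shows how `math.pow` silently replaces the built-in `pow`, which is exactly the kind of namespace corruption described above.

```python
import math                 # binds the name "math" to the whole module object
print(math.sqrt(16.0))      # names must be qualified: 4.0

from math import sqrt       # copies the name "sqrt" into this namespace
print(sqrt(16.0))           # used unqualified: 4.0
print(sqrt is math.sqrt)    # True: both names refer to the same function object

# Simulate "from math import *" at the top of a fresh module.
ns = {}
exec("from math import *\nresult = pow(2, 3)", ns)
print(ns["result"])         # 8.0: math.pow has shadowed the built-in pow (which returns 8)
```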

1.2 Compile it (Maybe)

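After finding the file, Python compiles it to byte code if necessary.

How does Python decide whether that is necessary?

It checks the source file's modification time (its "timestamp") and the Python version that produced the byte code (a "magic" number embedded in the byte code or in its filename, depending on the Python release being used).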

Since Python 3.2, byte code files are segregated in a __pycache__ subdirectory located in the directory that contains the corresponding source files. Their names also include the Python version (for example spam.cpython-312.pyc) to avoid contention and recompiles when multiple Pythons are installed: because each installed version keeps its own uniquely named byte code files in __pycache__, running under one version neither overwrites another version's byte code nor forces a recompile.

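If Python finds a .pyc byte code file that is not older than the corresponding .py source file and was created by the same Python version, it skips the source-to-byte-code compile step. In addition, if Python finds only a byte code file on the search path and no source, it loads the byte code directly (since Python 3.2, such a source-less .pyc must sit in the source directory itself, not in __pycache__).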

Notice that compilation happens when a file is being imported. Because of this, you will not usually see a .pyc file for the top-level file of your program unless it is also imported elsewhere; only imported files leave behind .pyc files on your machine.
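The naming scheme can be checked from Python itself. This sketch (the file name `spam.py` is a hypothetical example) asks `importlib.util.cache_from_source` where byte code for a source file would go, compiles a file explicitly with `py_compile`, and confirms that the resulting .pyc starts with the interpreter's magic number. The exact version tag in the comments depends on the running interpreter.

```python
import importlib.util
import py_compile
import sys
import tempfile
from pathlib import Path

# The version tag embedded in byte code file names.
print(sys.implementation.cache_tag)                 # e.g. 'cpython-312'
print(importlib.util.cache_from_source("spam.py"))  # e.g. '__pycache__/spam.cpython-312.pyc'

# Compile a (hypothetical) source file explicitly and see where the .pyc lands.
src_dir = tempfile.mkdtemp()
src = Path(src_dir, "spam.py")
src.write_text("X = 1\n")
pyc_path = py_compile.compile(str(src))   # returns the path of the written .pyc
print(pyc_path)                           # .../__pycache__/spam.cpython-XY.pyc

# The first 4 bytes of a .pyc are the "magic" number identifying the byte code format.
print(Path(pyc_path).read_bytes()[:4] == importlib.util.MAGIC_NUMBER)   # True
```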

1.3 Run it

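The final step of an import executes the module's byte code. Python runs the statements one by one, from top to bottom, and every statement that assigns a name produces an attribute of the resulting module object.

For example, when a module is imported, a def statement creates a function and assigns it to an attribute of that module; afterwards we can call the function through that module attribute.

Because an import runs the imported file's code, any work done at the top level of the module file happens at import time, and its results are visible right away. For example, if a module contains a print statement at the top level, its output appears when the module is imported.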
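The sketch below writes a small module (the name `demo_run` and its contents are hypothetical) and imports it: the top-level print runs during the import, the assignment becomes the attribute `GREETING`, and the def statement becomes the function attribute `greet`.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_run.py").write_text(
    'print("demo_run: top-level code is running")\n'
    "GREETING = 'hello'\n"              # assignment -> module attribute
    "def greet(name):\n"                # def -> function assigned to a module attribute
    "    return GREETING + ', ' + name\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import demo_run                     # the top-level print executes here

print(captured.getvalue().strip())      # demo_run: top-level code is running
print(demo_run.GREETING)                # hello
print(demo_run.greet("Ada"))            # hello, Ada
```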

Note:

A module's code is run only once per process by default. To force a module's code to be reloaded and rerun, you need to ask Python to do so explicitly by calling the reload function (a built-in in Python 2; in Python 3 it lives in the importlib module).

  • reload runs a module file’s new code in the module’s current namespace.
  • Top-level assignments in the file replace names with new values
  • Reloads impact all clients that use import to fetch modules
  • Reloads impact future from clients only. It doesn't update names fetched by from clients in the past
  • Reloads apply to a single module only
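A sketch of these rules using `importlib.reload` (the Python 3 location of reload). The module `demo_reload` is hypothetical; the two versions of its source deliberately differ in length, because CPython's timestamp check also compares source size and a same-size rewrite within the same second could otherwise reuse the stale .pyc.

```python
import importlib
import sys
import tempfile
from pathlib import Path

tmp_dir = tempfile.mkdtemp()
mod_file = Path(tmp_dir, "demo_reload.py")
mod_file.write_text("MESSAGE = 'first'\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_reload                      # import client: holds the module object
from demo_reload import MESSAGE         # from client: holds a copy of the name

# Change the source (different length, so the cached byte code is seen as stale).
mod_file.write_text("MESSAGE = 'second version'\n")
reloaded = importlib.reload(demo_reload)

print(reloaded is demo_reload)          # True: new code runs in the same module namespace
print(demo_reload.MESSAGE)              # second version: qualified access sees the change
print(MESSAGE)                          # first: the earlier from copy is not updated

from demo_reload import MESSAGE as fresh
print(fresh)                            # second version: a later from sees the new value
```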

1.4 Module Namespaces

Modules are probably best understood as simply packages of names – i.e., places to define names you want to make visible to the rest of a system.

  • Module statements run on the first import
  • Top-level assignments create module attributes
  • Module namespaces can be accessed via the attribute __dict__ or dir(M)
  • Modules are a single scope (local is global)
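A minimal sketch of a module namespace: it builds a module object directly with `types.ModuleType` (the name `demo_ns` is hypothetical), runs some top-level code inside its `__dict__`, and shows that the function's global scope is that same dictionary, i.e. the module's single scope.

```python
import types

# Create an empty module object and run top-level code in its namespace.
mod = types.ModuleType("demo_ns")
exec("X = 1\ndef f():\n    return X\n", mod.__dict__)

# Top-level assignments became module attributes.
print(sorted(k for k in mod.__dict__ if not k.startswith("__")))   # ['X', 'f']
print("X" in dir(mod))                    # True

# Inside the module, "global" means the module namespace itself.
print(mod.f())                            # 1
print(mod.f.__globals__ is mod.__dict__)  # True
```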

As discussed in the Python Scope article, name lookup follows the LEGB rule. Note, however, that this rule applies only to bare, unqualified names:

  1. simple variable (unqualified name)
    X means search for the name X in the current scope (following the LEGB rule)

  2. Qualification
    X.Y means find X in the current scope, then search for the attribute Y in the object X (not in scopes)

  3. Qualification paths
    X.Y.Z means look up the name Y in the object X, then look up Z in the object X.Y

  4. Generality
    Qualification works on all objects with attributes: modules, classes, C extension types, etc
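A small illustration of qualification with the standard `os` module and a throwaway class (`Config` is a hypothetical name): only the leftmost name goes through LEGB, while each later name is an attribute lookup on the object found so far, so a local variable called `path` has no effect on `os.path`.

```python
import os
import os.path

# X.Y: find "os" in the current scope (LEGB), then fetch attribute "path" from that object.
path_module = getattr(os, "path")
# X.Y.Z: fetch "join" from the object os.path; no scope lookup is involved.
join = getattr(path_module, "join")
print(join is os.path.join)     # True

# A local name "path" does not interfere with the attribute lookup os.path.
path = "a local variable named path"
print(os.path is path_module)   # True

# Qualification works on any object with attributes, such as a class.
class Config:
    name = "demo"

print(Config.name)              # demo
```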

Remember:

  • Functions can never see names in other functions, unless they are physically enclosing
  • Module code can never see names in other modules, unless they are explicitly imported.

2. Packages

If a module corresponds to a file, then a package corresponds to a directory.

Syntax:

Don't use any platform-specific path syntax in your import statements, such as C:\dir1, My Documents.dir2 or ../dir1; these do not work syntactically. Instead, use any such platform-specific syntax in your module search path settings to name the container directories.

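For example, suppose we have a package root (which you can think of simply as a directory) dir0, located on disk at C:\mycode. Writing that disk path into the statement, as in import C:\mycode\dir1\dir2\mod, is not valid syntax.

The right approach is to add C:\mycode to the PYTHONPATH variable or to a .pth file, and then write import dir1.dir2.mod.

Python chose . as the path separator partly for platform independence (Windows separates directories with \ while Linux uses /), and partly because the paths in import statements really are nested object paths.

For example, import mod.py is interpreted as a request to load a module py.py inside the package mod, which fails with an error.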

2.1 __init__.py 文件

Before Python 3.3, each directory named within the path of a package import statement must contain a file named __init__.py, or our package imports will fail.

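In the example above, the packages dir1 and dir2 must each contain an __init__.py file, while the directory dir0 that contains dir1 must be added to the PYTHONPATH variable or to a .pth file.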

dir0\ # Container on module search path
    dir1\
        __init__.py
        dir2\
            __init__.py
            mod.py
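The layout above can be rebuilt in a temporary directory to watch a package import work. In this sketch the temporary directory plays the role of dir0, and the contents of the two __init__.py files and of mod.py are hypothetical; only dir0 itself goes on the search path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A temporary directory plays the role of dir0 (the container on the search path).
dir0 = Path(tempfile.mkdtemp())
(dir0 / "dir1" / "dir2").mkdir(parents=True)
(dir0 / "dir1" / "__init__.py").write_text("print('dir1 init')\n")
(dir0 / "dir1" / "dir2" / "__init__.py").write_text("print('dir2 init')\n")
(dir0 / "dir1" / "dir2" / "mod.py").write_text("z = 3\n")

sys.path.insert(0, str(dir0))   # dir0 goes on the search path, not dir1 or dir2
importlib.invalidate_caches()

import dir1.dir2.mod            # runs dir1/__init__.py, then dir2/__init__.py, then mod.py
print(dir1.dir2.mod.z)          # 3: the statement binds the top-level name dir1 here
```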

What __init__.py does:

The code in these files is run automatically the first time a Python program imports a directory, so they serve primarily as a hook for performing initialization steps required by the package.

  • Package initialization
    import statements run each directory’s initialization file the first time that directory is traversed.

  • Module usability declaration

  • Module namespace initialization

  • from * statement behavior
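    If __init__.py does not assign the __all__ variable, from package import * imports only the names assigned in that directory's __init__.py (plus any submodules already imported explicitly), not every submodule in the directory. To have from * load submodules, assign __all__ a list of their names.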
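A sketch comparing the two cases. It creates two hypothetical packages, `pkg_no_all` and `pkg_with_all`, each with submodules `a.py` and `b.py`, and runs `from <package> import *` in a fresh namespace to see which names arrive.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

def make_package(name, init_code):
    """Create package `name` with submodules a.py and b.py (all names hypothetical)."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(init_code)
    (pkg / "a.py").write_text("A = 'from a'\n")
    (pkg / "b.py").write_text("B = 'from b'\n")

make_package("pkg_no_all", "x = 1\n")
make_package("pkg_with_all", "x = 1\n__all__ = ['a', 'b']\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

def star_names(package):
    """Run 'from <package> import *' in a fresh namespace and return the names it bound."""
    ns = {}
    exec(f"from {package} import *", ns)
    return sorted(k for k in ns if k != "__builtins__")

print(star_names("pkg_no_all"))     # ['x']: only names assigned in __init__.py
print(star_names("pkg_with_all"))   # ['a', 'b']: submodules listed in __all__ are loaded
```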

Changes in Python 3.X:

  1. It modifies the module import search path semantics to skip the package's own directory by default. Imports check only paths on the sys.path search path; these are known as absolute imports.
    Note that the package's own directory is not on the sys.path search path,
    but the project's top-level directory (the home directory of the program) is.

  2. It extends the syntax of from statements to allow them to explicitly request that imports search the package’s directory only, with leading dots. This is known as relative import syntax. But a from import without leading-dot syntax is considered absolute as well.

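For example, from . import spam imports the spam module from the same package directory.

Likewise, from .spam import name imports the variable name from the spam module in the same package directory.

Since Python 3.X, the statement import string looks for the string module only in the directories listed on sys.path (an absolute import), even if the importing package has its own string.py.

In a relative import, . refers to the package directory containing the importing file, and .. refers to the parent package one level up, much like relative paths on Linux.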
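The statements above can be tried together in a throwaway package. In this sketch the package `demo_rel`, its subpackage `sub`, and the modules `spam`, `eggs`, and `inner` are hypothetical; the package also contains its own `string.py` to show that a plain `import string` is absolute and still finds the standard library module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_rel"
(pkg / "sub").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "sub" / "__init__.py").write_text("")
(pkg / "spam.py").write_text("name = 'spam.name'\n")
(pkg / "string.py").write_text("LOCAL = 'package-local string module'\n")
(pkg / "eggs.py").write_text(
    "from . import spam          # module spam from this package's directory\n"
    "from .spam import name      # variable name from this package's spam module\n"
    "import string               # absolute: searches sys.path, finds the stdlib module\n"
    "from .string import LOCAL   # relative: the package's own string.py\n"
)
(pkg / "sub" / "inner.py").write_text(
    "from .. import spam         # '..' is the parent package, demo_rel\n"
)
sys.path.insert(0, str(root))   # only the container of demo_rel is on the search path
importlib.invalidate_caches()

from demo_rel import eggs
from demo_rel.sub import inner

print(eggs.name)                # spam.name
print(eggs.string.__name__)     # string (the standard library module)
print(eggs.LOCAL)               # package-local string module
print(inner.spam is eggs.spam)  # True: both refer to demo_rel.spam
```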

Summary:

  1. Relative imports apply to imports within packages only
  2. Relative imports apply to the from statement with leading dots only
