Python Generators · JavaBeginnersTutorial Chinese Tutorial Series

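# Python Generators

> Original: [https://javabeginnerstutorial.com/python-tutorial/python-generator/](https://javabeginnerstutorial.com/python-tutorial/python-generator/)

This article is a gentle introduction to Python generators, and hopefully it will inspire you to find interesting uses for them. The topic is somewhat intricate and full of small details, so extra information is set apart in block quotes like the ones further down. Feel free to skip these: they are not needed to follow the article, but they offer more insight for the curious.

## What is a generator?

Generators are powerful programming tools for iterating efficiently over large or computationally expensive data. They often lead to simpler code and better performance than other approaches to iteration. The main ways to implement generators in Python are:

* Generator functions (added in Python 2.2)
* Generator expressions (added in Python 2.4)

It is essential to understand a few core concepts first, so let's dive in.

## Iteration and iterables

As you probably know, *iteration* is just the more formal term for repeating lines of code, as in `while` and `for` loops. Python builds a particularly flexible form of iteration into the language, based on the concept of *iterable* objects, **which is most often used through `for` loops**. Whenever we have talked about `for` loops before, **they were always operating on an iterable**. Iterables are usually container types such as lists, ranges, tuples, or sets. In Python, calling an object iterable means that something like `for` can loop over it.

```py
# These are all iterable objects that 'for' is operating on
my_list = [1, 2, 3, "Python", 4]
my_tuple = (5, 6, 7, "Rocks")
my_set = {8, 9, 10}

for item in my_list:
    print(item)

for item in my_tuple:
    print(item)

for item in my_set:
    print(item)

for x in range(5):
    print(x)
```

> Behind the scenes, calling an object "iterable" means that it exposes a method named `__iter__()`, which returns an iterator for that object.

## Iterators

An iterator is the object that controls looping over an iterable. Its purpose is to keep track of where the iteration has got to so far. When the `for` statement and related functions run over an iterable, they first request an iterator from that object. If that fails, an exception is raised. Otherwise, the iterator is used repeatedly to fetch the next item until the sequence is exhausted. In practice, this means `for` can loop over any object that can supply an iterator, and over nothing else. Objects such as `int` or `float` cannot be used with it because they do not implement the required methods.

```py
# Example: for statement works over a list, but not over a float
my_list = [3, 5, 7, 9]
my_float = 1.1

for item in my_list:
    print(item)

for item in my_float:
    print(item)
```

### Output

```py
3
5
7
9
Traceback (most recent call last):
  File ".\examples\iterator_demo.py", line 23, in <module>
    for item in my_float:
TypeError: 'float' object is not iterable
```

> An iterator keeps track of where the program is in the loop and fetches the next item on request.

An iterator must:

* Be set up for the loop when it is created.
* Implement a `__next__()` method that returns the next item.
* Raise a `StopIteration` exception from `__next__()` when the loop is finished.

Typically, an iterator object tracks its position with a loop counter, or something similar, stored in one of its attributes. Here is how that works in practice:

* The iterator is created: the loop counter is set to zero.
* The iterator's `__next__()` is called: the loop counter is checked.
  * If the loop is finished, a `StopIteration` exception is raised.
  * If it is not finished, the loop counter is incremented and the next item is returned.

## Generators in detail

Generators can be seen as a refined version of iterators, designed to be written in a more intuitive way and with less code. There are subtle differences between generators and iterators:

* Generators are a special kind of iterator.
* They are defined much like functions.
* They keep track of their own internal state (such as local variables) automatically, whereas a hand-written iterator has to manage its state itself.

> To expand on the last point: with a hand-written iterator, every call to `__next__()` is a fresh function call, so any state it needs must be saved in attributes and picked up again each time round the loop.
>
> A generator only needs to run its setup once. It is paused at each `yield` and resumed with the state from the previous call intact. When the same code runs thousands of times, this can be noticeably more efficient.

In the following sections we look at how to implement generators, why you would benefit from them, and some usable code examples.

**Further reading**: advanced topics that are closely related but not covered in this article include:

* Sending values into a generator (the `send()` method)
* Chaining generators (the `yield from` expression)
* Concurrency / parallel processing / coroutines

They are covered in some of the references at the end of this article. For a good introduction to all of these concepts, [Dave Beazley's](http://www.dabeaz.com/generators-uk/) presentation is especially recommended.

## Implementing generators

### Generator functions

Generator functions are probably the simplest way to implement generators in Python, but they still have a slightly steeper learning curve than regular functions and loops. Put simply, a generator function is a special kind of function that produces its results one at a time as it runs, instead of waiting until it has finished and returning them all at once. They are easy to spot because values are handed back with the keyword `yield`. The usual `return` can be used as well, but only to exit the function.

> If a generator hits `return` without having *yielded* any data, the caller simply receives no items: a `for` loop over it never runs its body, and calling `list()` on it gives an empty list (`[]`).

Here are some syntax examples of the forms commonly used to create generator functions:

```py
# Syntax templates: apply_something_processing_to(), generate_item_based_on()
# and do_some_setup() are placeholders for your own code.

# Form 1 - Loop through a collection (an iterable) and apply some processing to each item
def generator_function(collection):
    # setup statements here if needed
    for item in collection:
        # do some processing
        return_value = apply_something_processing_to(item)
        yield return_value

# Form 2 - Set up an arbitrary loop and return items that are generated by that - similar to range() function
def generator_function(start, stop, step):
    # setup statements here
    loop_counter = start  # initial value based on start
    loop_limit = stop     # final value based on stop; might need to add one to be inclusive of final value
    loop_step = step      # value to increment counter by; step could be negative, to run backwards
    # compare with < or > (not !=) so the loop still ends if the step overshoots the limit
    while (loop_step > 0 and loop_counter < loop_limit) or (loop_step < 0 and loop_counter > loop_limit):
        # do some processing
        return_value = generate_item_based_on(loop_counter)
        # increment the counter
        loop_counter += loop_step
        yield return_value

# Form 3 - Illustrates return mechanism - imagine the processing we're doing requires
# some kind of setup beforehand, perhaps connecting to a network resource
def generator_function(collection):
    # setup statements here if needed
    setup_succeeded = do_some_setup()
    if setup_succeeded:
        for item in collection:
            # do some processing
            return_value = apply_something_processing_to(item)
            yield return_value
    else:
        return
```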
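The templates above rely on placeholder helpers, so a small runnable sketch of forms 2 and 3 may help. The names `count_by` and `processed_items`, the `setup_succeeded` flag passed in as an argument, and the doubling step are illustrative assumptions and are not part of the original examples.

```python
def count_by(start, stop, step):
    """Yield start, start + step, ... stopping before stop (similar to range())."""
    counter = start
    # Using < and > rather than != means an overshooting step still ends the loop
    while (step > 0 and counter < stop) or (step < 0 and counter > stop):
        yield counter
        counter += step


def processed_items(collection, setup_succeeded):
    """Yield each item doubled, or finish immediately if setup failed."""
    if not setup_succeeded:
        return  # ends the generator before anything has been yielded
    for item in collection:
        yield item * 2


# A generator is paused at each yield and resumed on the next request
gen = count_by(0, 10, 3)
print(next(gen))   # 0
print(next(gen))   # 3
print(list(gen))   # [6, 9] - only the items that were not consumed yet

print(list(count_by(5, 0, -2)))                  # [5, 3, 1]
print(list(processed_items([1, 2, 3], True)))    # [2, 4, 6]
print(list(processed_items([1, 2, 3], False)))   # [] - returned without yielding
```

The last line shows what the caller sees when a generator returns without yielding: there are simply no items to iterate over.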
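As the comments in the templates note, the first form steps through an iterable collection of items and applies some processing to each item before `yield`ing it. That is not very exciting on its own, and plain old iteration could certainly do the same, but its power lies in the fact that it can become part of a chain of other generators. As each item needs more processing, generators quickly make the code easier to maintain.

The second form is an example that can be adapted to generate sequences of values of arbitrary length, supporting infinite sequences, endlessly cycling sequences, and so on. This mechanism is very useful when working on mathematical or scientific problems, for example.

Here is some example syntax for calling generator functions:

```py
#Form 1 - more readable
my_generator = generator_function(arguments)
for result in my_generator:
    output(result)

#Form 2 - more concise
for result in generator_function(arguments):
    output(result)
```

In the first form, the generator is set up in the first statement and a reference to it is stored in a variable. It is then used (or consumed) by the `for` loop that follows. In the second form, the generator is not stored and is consumed immediately by the `for` loop.

#### Generator functions in practice

Here is an example that illustrates creating and using a simple generator function. It filters a text file down to only the lines containing a particular string. It also shows three slightly different ways of calling the generator. See the comments for details:

```py
#Example: Generator Functions
def filtered_text(text_lines, wanted_text):
    """ Compares each line in text_lines to wanted_text
        Yields the line if it matches """
    for line in text_lines:
        if wanted_text in line:
            yield line

#slow method - read whole file into memory, then use the generator to filter text
#need to wait for the whole file to load before anything else can begin
#uses more memory
#not much benefit here!
with open("Programming_Books_List.txt", 'r') as file_obj:
    lots_of_text = file_obj.readlines()
    matches = filtered_text(lots_of_text, "Python")
    for match in matches:
        print(match)

#faster method - use the file object as an iterator, filter it with the generator
#only needs to keep current line in memory
#current line is only read directly before use
#outputs each match directly after it is found (before the file has finished reading)
with open("Programming_Books_List.txt", 'r') as file_obj:
    matches = filtered_text(file_obj, "Python")
    for match in matches:
        print(match)

#sleeker method - this is doing the same as the faster method above, but in fewer lines of code
#instead of storing the generator object in a variable, it is immediately used in a for loop
#this is perhaps less readable, so it can be harder to debug
with open("Programming_Books_List.txt", 'r') as file_obj:
    for match in filtered_text(file_obj, "Python"):
        print(match)
```

### Generator expressions

Generator expressions are another way to create simple generators. They tend to be more concise, often resulting in one-liners, but they are not always as readable as generator functions. Their main drawback is that they are less flexible than generator functions. It is hard to implement anything particularly complex in a generator expression, because you are limited to what can comfortably be written in a single expression. Some example syntax may help:

```py
#Form 1: basic form - iterate over all items, run some processing on each
new_generator = (apply_processing_to(item) for item in iterable)

#Form 2: filter - rejects items if the condition is not true
new_generator = (item for item in iterable if condition)

#Form 3: combination of forms 1 and 2
new_generator = (apply_processing_to(item) for item in iterable if condition)
```

They look similar to list comprehensions, but a list comprehension builds its whole output into a list in memory before returning, whereas a generator expression returns its output one item at a time. Once a generator has been created, it can be used (or consumed) in much the same way as a generator function. Here are a few examples:

```py
#Form 1 - more readable
my_generator = (apply_processing_to(item) for item in iterable)
for result in my_generator:
    output(result)

#Form 2 - more concise
for result in (apply_processing_to(item) for item in iterable):
    output(result)
```

#### Generator expressions in practice

Here is the book list example from earlier, rewritten using form 2:

```py
#Example: Generator Expressions
with open("Programming_Books_List.txt", 'r') as file_obj:
    for match in (line for line in file_obj if "Python" in line):
        print(match)
```

Note that **only these three lines are needed**. The minimum number of lines needed to do this with a generator function is 7. This is much more concise and makes for elegant code, but it cannot be used in every situation.

## Why use generators?

Generators are particularly useful for:

1. Performing repetitive tasks on large amounts of data, where the original data is only needed once
2. Performing calculations on long sequences of data (which may or may not fit in memory, and may even be infinite!)
3. Generating sequences of data where each item should only be computed when it is needed (lazy evaluation)
4. Performing a series of identical operations on multiple items in a data stream (in a pipeline, similar to Unix pipes)

In the first case, generators are very effective if the data itself does not need to be kept in memory or referred to again. They let the programmer work through the data in smaller chunks and produce results one by one. The program never needs to hold on to any data from previous iterations.

The benefits generators bring are:

* Reduced memory usage
* Often greater speed and less overhead than other ways of iterating, because no intermediate collection has to be built
* They allow pipelines to be constructed elegantly

In the earlier book list search example, using a generator did not really bring any benefit in performance or resource usage, because it is a very simple use case and the source data is not very large. The processing required is also so minimal that it could easily be done by other means. But what if the processing needed for each line were far more complex? Perhaps some kind of text analysis, natural language processing, or checking words against a dictionary? Suppose that in that example we also wanted to take each book title, search for it at ten different online bookstores, and return the cheapest price. Then let's scale up the source data, say by replacing our book list with a copy of every book available on Amazon. At that scale the problem becomes so large that traditional iteration would need a great deal of resources and would be relatively hard to code with any efficiency.

In this situation, using generators would greatly simplify the code and would mean that processing can begin as soon as the first book title is found. What is more, the overhead stays small even when processing very large source files.

## Basic usage

### Infinite sequences

The rather clichéd example of generating the Fibonacci sequence with a generator is an old favourite in computer science teaching, but it is still worth a look. Here is the code:

```py
#Example: Fibonacci sequence using generators
def fibonacci(limit):
    """ Generate the fibonacci sequence, stop when
        we reach the specified limit """
    current = 0
    previous1 = 0
    previous2 = 0
    while current <= limit:
        return_value = current
        previous2 = previous1
        previous1 = current
        if current == 0:
            current = 1
        else:
            current = previous1 + previous2
        yield return_value

for term in fibonacci(144):
    print(term)
```

Its output is as follows:

```py
0
1
1
2
3
5
8
13
21
34
55
89
144
```

This is a very trivial use case, but it does illustrate that the generator handles the storage of the previous two values in its local variables, and that these values are not lost between iterations. No other data is stored, so the function uses almost the same amount of memory over its whole lifetime, from the first iteration to the hundred-thousandth.

## Advanced usage

### Using generators as pipelines

Pipelines are where the real power of generators can be seen. They can be implemented by simply chaining generators together, so that the output of one is passed to the input of the next. They are very useful when several operations need to be performed in sequence on a set of data.

As Dave Beazley points out in his excellent presentation (see the references), systems programmers can make great use of generators. Rather than generating arbitrary mathematical sequences, Beazley demonstrates useful techniques such as parsing web server logs, monitoring files and network ports, traversing files and file systems, and more.

Below is an example of my own that mixes generator functions and expressions to perform a text search across multiple files. I have set it up to search all Python files in the current directory for the string `# TODO:`.

> Whenever I spot a problem in my code, or have an idea I want to implement later, I like to use that notation to insert a to-do note as close as possible to where it is needed.
>
> It usually comes in handy, but on large projects with many files these notes can get lost!

The example is a little convoluted and could be improved by using regular expressions (and quite possibly other library or OS functions), but as a pure Python demonstration it should illustrate some of what can be achieved with generators:

```py
# pipeline_demo.py
#Example: Search for "# TODO:" at start of lines in Python
# files, to pick up what I need to work on next
import os

def print_filenames(filenames):
    """Prints out each filename, and returns it back to the pipeline"""
    for filename in filenames:
        print(filename)
        yield filename

def file_read_lines(filenames):
    """Read every line from every file"""
    for filename in filenames:
        with open(filename, 'r') as file_obj:
            for line in file_obj:
                yield line

#get a list of all python files in this directory
filenames_list = os.listdir(".")

#turn it into a generator
filenames = (filename for filename in filenames_list)

#filter to only Python files (*.py)
filenames = (filename for filename in filenames if filename.lower().endswith(".py"))

#print out current file name, then pop it back into the pipeline
filenames = print_filenames(filenames)

#pass the filenames into the file reader, get back the file contents
file_lines = file_read_lines(filenames)

#strip out leading spaces and tabs from the lines
file_lines = (line.lstrip(" \t") for line in file_lines)

#filter to just lines starting with "# TODO:"
filtered = (line for line in file_lines if line.startswith("# TODO:"))

#strip out trailing spaces, tabs and newlines
filtered = (line.rstrip() for line in filtered)

#display output
for item in filtered:
    print(item)

# TODO: Write generator example
# TODO: Test on current folder
    # TODO: Test on a line indented with spaces
	# TODO: Test on a line indented with tabs
# TODO: Add more TODOs
```

Output:

```py
test-TODOs.py
# TODO: Test finding a TODO in another file
test-noTODOs.py
pipeline_demo.py
# TODO: Write generator example
# TODO: Test on current folder
# TODO: Test on a line indented with spaces
# TODO: Test on a line indented with tabs
# TODO: Add more TODOs
```

This idea can be taken much further. As mentioned above, there are countless use cases in systems programming and even in systems administration. If you need to manage log files on a server, this kind of technique will be very valuable. See the references below for more information on the topic and some excellent examples.

You can get all the code related to this article [on the JBTAdmin GitHub](https://github.com/JBTAdmin/python/tree/master/Generator%20Examples).

## Bibliography / further reading

* Title: Python Wiki – Generators. Authors: Multiple. Source: [Python Wiki](https://wiki.python.org/moin/Generators)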
* Title: Python Wiki – Iterators. Authors: Multiple. Source: [Python Wiki](https://wiki.python.org/moin/Iterator)
* Title: Python Generators. Author: Scott Robinson. Source: [Stack Abuse website](http://stackabuse.com/python-generators/)
* Title: Python Practice Book – Chapter 5. Iterators & Generators. Author: Anand Chitipothu. Source: [Python Practice Book website](https://anandology.com/python-practice-book/iterators.html)
* Title: Generator Tricks for Systems Programmers – Version 2.0. Author: David M. Beazley. Source: [David Beazley's website](http://www.dabeaz.com/generators-uk/)
* Title: 2 great benefits of Python generators (and how they changed me forever). Author: Aaron Maxwell. Source: [O'Reilly website](https://www.oreilly.com/ideas/2-great-benefits-of-python-generators-and-how-they-changed-me-forever)