Python：如何读取和写入文件 · PythonGuru 中文系列教程

# Python：如何读取和写入文件 > 原文： [https://thepythonguru.com/python-how-to-read-and-write-files/](https://thepythonguru.com/python-how-to-read-and-write-files/) * * * 于 2020 年 1 月 7 日更新 * * * 在本文中，我们将学习如何在 Python 中读取和写入文件。处理文件包括以下三个步骤： 1. 打开文件 2. 执行读或写操作 3. 关闭文件让我们详细了解每个步骤。 ## 文件类型 * * * 有两种类型的文件： 1. 文本文件 2. 二进制文件文本文件只是使用 utf-8，latin1 等编码存储字符序列的文件，而对于二进制文件，数据以与计算机内存相同的格式存储。以下是一些文本和二进制文件示例：文本文件：Python 源代码，HTML 文件，文本文件，降价文件等。二进制文件：可执行文件，图像，音频等重要的是要注意，在磁盘内部，两种类型的文件都以 1 和 0 的顺序存储。唯一的区别是，当打开文本文件时，将使用与编码相同的编码方案对数据进行解码。但是，对于二进制文件，不会发生这种情况。 ## 打开文件 - `open()`函数 * * * `open()`内置函数用于打开文件。其语法如下： ```py open(filename, mode) -> file object ``` 成功时，`open()`返回文件对象。如果失败，它将引发`IOError`或它的子类。 | 参数 | 描述 | | --- | --- | | `filename` | 要打开的文件的绝对或相对路径。 | | `mode` | （可选）模式是一个字符串，表示处理模式（即读取，写入，附加等）和文件类型。 | 以下是`mode`的可能值。 | 模式 | 描述 | | --- | --- | | `r` | 打开文件进行读取（默认）。 | | `w` | 打开文件进行写入。 | | `a` | 以附加模式打开文件，即在文件末尾添加新数据。 | | `r+` | 打开文件以进行读写 | | `x` | 仅在尚不存在的情况下，打开文件进行写入。 | 我们还可以将`t`或`b`附加到模式字符串以指示将要使用的文件的类型。 `t`用于文本文件，`b`用于二进制文件。如果未指定，则默认为`t`。 `mode`是可选的，如果未指定，则该文件将作为文本文件打开，仅供读取。这意味着对`open()`的以下三个调用是等效的： ```py # open file todo.md for reading in text mode open('todo.md') open('todo.md', 'r') open('todo.md', 'rt') ``` 请注意，在读取文件之前，该文件必须已经存在，否则`open()`将引发`FileNotFoundError`异常。但是，如果打开文件进行写入（使用`w`，`a`或`r+`之类的模式），Python 将自动为您创建文件。如果文件已经存在，则其内容将被删除。如果要防止这种情况，请以`x`模式打开文件。 ## 关闭文件 - `close()`方法 * * * 处理完文件后，应将其关闭。尽管程序结束时该文件会自动关闭，但是这样做仍然是一个好习惯。无法在大型程序中关闭文件可能会出现问题，甚至可能导致程序崩溃。要关闭文件，请调用文件对象的`close()`方法。关闭文件将释放与其相关的资源，并将缓冲区中的数据刷新到磁盘。 ## 文件指针 * * * 通过`open()`方法打开文件时。操作系统将指向文件中字符的指针关联。文件指针确定从何处进行读取和写入操作。最初，文件指针指向文件的开头，并随着我们向文件读取和写入数据而前进。在本文的后面，我们将看到如何确定文件指针的当前位置，并使用它来随机访问文件的各个部分。 ## 使用`read()`，`readline()`和`readlines()`读取文件 * * * 要读取数据，文件对象提供以下方法： | 方法 | 参数 | | --- | --- | | `read([n])` | 从文件读取并返回`n`个字节或更少的字节（如果没有足够的字符读取）作为字符串。如果未指定`n`，它将以字符串形式读取整个文件并将其返回。 | | `readline()` | 读取并返回字符，直到以字符串形式到达行尾为止。 | | `readlines()` | 读取并返回所有行作为字符串列表。 | 当到达文件末尾（EOF）时，`read()`和`readline()`方法返回一个空字符串，而`readlines()`返回一个空列表（`[]`）。这里有些例子： **`poem.txt`** ```py The caged bird sings with a fearful trill of things unknown but longed for still ``` **示例 1** ：使用`read()` ```py >>> >>> f = open("poem.txt", "r") >>> >>> f.read(3) # read the first 3 characters 'The' >>> >>> f.read() # read the remaining characters in the file. ' caged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still\n' >>> >>> f.read() # End of the file (EOF) is reached '' >>> >>> f.close() >>> ``` **示例 2** ：使用`readline()` ```py >>> >>> f = open("poem.txt", "r") >>> >>> f.read(4) # read first 4 characters 'The ' >>> >>> f.readline() # read until the end of the line is reached 'caged bird sings\n' >>> >>> f.readline() # read the second line 'with a fearful trill\n' >>> >>> f.readline() # read the third line 'of things unknown\n' >>> >>> f.readline() # read the fourth line 'but longed for still' >>> >>> f.readline() # EOF reached '' >>> >>> f.close() >>> ``` **示例 3** ：使用`readlines()` ```py >>> >>> f = open("poem.txt", "r") >>> >>> f.readlines() ['The caged bird sings\n', 'with a fearful trill\n', 'of things unknown\n', 'but longed for still\n'] >>> >>> f.readlines() # EOF reached [] >>> >>> f.close() >>> ``` ## 批量读取文件 * * * `read()`（不带参数）和`readlines()`方法立即将所有数据读入内存。因此，请勿使用它们读取大文件。更好的方法是使用`read()`批量读取文件，或使用`readline()`逐行读取文件，如下所示： **示例**：读取文件块 ```py >>> >>> f = open("poem.txt", "r") >>> >>> chunk = 200 >>> >>> while True: ... data = f.read(chunk) ... if not data: ... break ... print(data) ... The caged bird sings with a fearful trill of things unknown but longed for still >>> ``` **示例**：逐行读取文件 ```py >>> >>> f = open("poem.txt", "r") >>> >>> while True: ... line = f.readline() ... if not line: ... break ... print(line) ... The caged bird sings with a fearful trill of things unknown but longed for still >>> ``` 除了使用`read()`（带有参数）或`readline()`方法之外，您还可以使用文件对象一次遍历一行的文件内容。 ```py >>> >>> f = open("poem.txt", "r") >>> >>> for line in f: ... print(line, end="") ... The caged bird sings with a fearful trill of things unknown but longed for still >>> ``` 该代码与前面的示例等效，但是更加简洁，易读且易于键入。 **警告**：提防`readline()`方法，如果您在打开没有任何换行符的大文件时遇到不幸，那么`readline()`并不比`read()`好（无参数）。当您使用文件对象作为迭代器时，也是如此。 ## 使用`write()`和`writelines()`写入数据 * * * 为了写入数据，文件对象提供了以下两种方法： | 方法 | 描述 | | --- | --- | | `write(s)` | 将字符串`s`写入文件并返回写入的数字字符。 | | `writelines(s)` | 将序列`s`中的所有字符串写入文件。 | 以下是示例： ```py >>> >>> f = open("poem_2.txt", "w") >>> >>> f.write("When I think about myself, ") 26 >>> f.write("I almost laugh myself to death.") 31 >>> f.close() # close the file and flush the data in the buffer to the disk >>> >>> >>> f = open("poem_2.txt", "r") # open the file for reading >>> >>> data = f.read() # read entire file >>> >>> data 'When I think about myself, I almost laugh myself to death.' >>> >>> print(data) When I think about myself, I almost laugh myself to death. >>> >>> f.close() >>> ``` 请注意，与`print()`函数不同，`write()`方法不会在行尾添加换行符（`\n`）。如果需要换行符，则必须手动添加它，如下所示： ```py >>> >>> >>> f = open("poem_2.txt", "w") >>> >>> f.write("When I think about myself, \n") # notice newline 27 >>> f.write("I almost laugh myself to death.\n") # notice newline 32 >>> >>> f.close() >>> >>> >>> f = open("poem_2.txt", "r") # open the file again >>> >>> data = f.read() # read the entire file >>> >>> data 'When I think about myself, \nI almost laugh myself to death.\n' >>> >>> print(data) When I think about myself, I almost laugh myself to death. >>> >>> ``` 您还可以使用`print()`函数将换行符附加到该行，如下所示： ```py >>> >>> f = open("poem_2.txt", "w") >>> >>> print("When I think about myself, ", file=f) >>> >>> print("I almost laugh myself to death.", file=f) >>> >>> f.close() >>> >>> >>> f = open("poem_2.txt", "r") # open the file again >>> >>> data = f.read() >>> >>> data 'When I think about myself, \nI almost laugh myself to death.\n' >>> >>> print(data) When I think about myself, I almost laugh myself to death. >>> >>> ``` 这是`writelines()`方法的示例。 ```py >>> >>> lines = [ ... "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod", ... "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam," ... "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo", ... "consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse", ... "cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non", ... "proident, sunt in culpa qui officia deserunt mollit anim id est laborum." ... ] >>> >>> >>> f = open("lorem.txt", "w") >>> >>> f.writelines(lines) >>> >>> f.close() >>> ``` `writelines()`方法在内部调用`write()`方法。 ```py def writelines(self, lines): self._checkClosed() for line in lines: self.write(line) ``` 这是另一个以附加模式打开文件的示例。 ```py >>> >>> f = open("poem_2.txt", "a") >>> >>> f.write("\nAlone, all alone. Nobody, but nobody. Can make it out here alone.") 65 >>> f.close() >>> >>> data = open("poem_2.txt").read() >>> data 'When I think about myself, \nI almost laugh myself to death.\n\nAlone, all alone. Nobody, but nobody. Can make it out here alone.' >>> >>> print(data) When I think about myself, I almost laugh myself to death. Alone, all alone. Nobody, but nobody. Can make it out here alone. >>> ``` 假设文件`poem_2.txt`对于使用非常重要，并且我们不希望其被覆盖。为了防止在`x`模式下打开文件 ```py >>> >>> f = open("poem_2.txt", "x") Traceback (most recent call last): File "<stdin>", line 1, in <module> FileExistsError: [Errno 17] File exists: 'poem.txt' >>> ``` 如果`x`模式不存在，则仅打开该文件进行写入。 ## 缓冲和刷新 * * * 缓冲是在将数据移到新位置之前临时存储数据的过程。对于文件，数据不会立即写入磁盘，而是存储在缓冲存储器中。这样做的基本原理是，将数据写入磁盘需要花费时间，而不是将数据写入物理内存。想象一下，每当调用`write()`方法时一个程序正在写入数据。这样的程序将非常慢。当我们使用缓冲区时，仅当缓冲区已满或调用`close()`方法时，才将数据写入磁盘。此过程称为刷新输出。您也可以使用文件对象的`flush()`方法手动刷新输出。请注意，`flush()`仅将缓冲的数据保存到磁盘。它不会关闭文件。 `open()`方法提供了一个可选的第三个参数来控制缓冲区。要了解更多信息，请访问官方文档。 ## 读写二进制数据 * * * 通过将`b`附加到模式字符串来完成读写二进制文件。在 Python 3 中，二进制数据使用称为`bytes`的特殊类型表示。 `bytes`类型表示介于 0 和 255 之间的数字的不可变序列。让我们通过阅读`poem.txt`文件来创建诗歌的二进制版本。 ```py >>> >>> binary_poem = bytes(open("poem.txt").read(), encoding="utf-8") >>> >>> binary_poem b'The caged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still' >>> >>> >>> binary_poem[0] # ASCII value of character T 84 >>> binary_poem[1] # ASCII value of character h 104 >>> ``` 请注意，索引`bytes`对象将返回`int`。让我们将二进制诗写在一个新文件中。 ```py >>> >>> f = open("binary_poem", "wb") >>> >>> f.write(binary_poem) 80 >>> >>> f.close() >>> ``` 现在，我们的二进制诗已写入文件。要读取它，请以`rb`模式打开文件。 ```py >>> >>> f = open("binary_poem", "rb") >>> >>> data = f.read() >>> >>> data b'The caged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still' >>> >>> print(data) b'The caged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still' >>> >>> f.close() >>> ``` 重要的是要注意，在我们的情况下，二进制数据碰巧包含可打印的字符，例如字母，换行符等。但是，在大多数情况下并非如此。这意味着对于二进制数据，由于文件中可能没有换行符，因此我们无法可靠地使用`readline()`和文件对象（作为迭代器）来读取文件的内容。读取二进制数据的最佳方法是使用`read()`方法分块读取它。 ```py >>> >>> # Just as with text files, you can read (or write) binary files in chunks. >>> >>> f = open("binary_poem", "rb") >>> >>> chunk = 200 >>> >>> while True: ... data = f.read(chunk) ... if not data: ... break ... print(data) ... b'The caged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still' >>> >>> ``` ## 使用`fseek()`和`ftell()`的随机访问 * * * 在本文的前面，我们了解到打开文件时，系统将一个指针与它相关联，该指针确定从哪个位置进行读取或写入。到目前为止，我们已经线性地读写文件。但是也可以在特定位置进行读写。为此，文件对象提供了以下两种方法： | 方法 | 描述 | | --- | --- | | `tell()` | 返回文件指针的当前位置。 | | `seek(offset, [whence=0])` | 将文件指针移动到给定的`offset`。 `offset`引用字节计数，`whence`确定`offset`将相对于文件指针移动的位置。 `whence`的默认值为 0，这意味着`offset`将使文件指针从文件的开头移开。如果`wherece`设置为`1`或`2`，则偏移量将分别将文件的指针从当前位置或文件的末尾移动。 | 现在让我们举一些例子。 ```py >>> >>> ###### binary poem at a glance ####### >>> >>> for i in open("binary_poem", "rb"): ... print(i) ... b'The caged bird sings\n' b'with a fearful trill\n' b'of things unknown\n' b'but longed for still' >>> >>> f.close() >>> >>> ##################################### >>> >>> f = open('binary_poem', 'rb') # open binary_poem file for reading >>> >>> f.tell() # initial position of the file pointer 0 >>> >>> f.read(5) # read 5 bytes b'The c' >>> >>> f.tell() 5 >>> ``` 读取 5 个字符后，文件指针现在位于字符`a`（用字`caged`表示）。因此，下一个读取（或写入）操作将从此处开始。 ```py >>> >>> >>> f.read() b'aged bird sings\nwith a fearful trill\nof things unknown\nbut longed for still' >>> >>> f.tell() 80 >>> >>> f.read() # EOF reached b'' >>> >>> f.tell() 80 >>> ``` 现在，我们已到达文件末尾。此时，我们可以使用`fseek()`方法将文件指针后退到文件的开头，如下所示： ```py >>> >>> f.seek(0) # rewind the file pointer to the beginning, same as seek(0, 0) 0 >>> >>> f.tell() 0 >>> ``` 文件指针现在位于文件的开头。从现在开始的所有读取和写入操作将从文件的开头再次进行。 ```py >>> >>> f.read(14) # read the first 14 characters b'The caged bird' >>> >>> >>> f.tell() 14 >>> ``` 要将文件指针从当前位置从 12 个字节向前移动，请按以下步骤操作：`seek()`： ```py >>> >>> f.seek(12, 1) 26 >>> >>> f.tell() 26 >>> >>> ``` 文件指针现在位于字符`a`（在单词`with`之后），因此将从此处进行读取和写入操作。 ```py >>> >>> >>> f.read(15) b'a fearful trill' >>> >>> ``` 我们还可以向后移动文件指针。例如，以下对`seek()`的调用将文件指针从当前位置向后移 13 个字节。 ```py >>> >>> f.seek(-13, 1) 28 >>> >>> f.tell() 28 >>> >>> f.read(7) b'fearful' >>> ``` 假设我们要读取文件的最后 16 个字节。为此，将文件指针相对于文件末尾向后移 16 个字节。 ```py >>> >>> f.seek(-16, 2) 64 >>> >>> f.read() b'longed for still' >>> ``` `fseek()`的`whence`自变量的值在`os`模块中也定义为常量。 | 值 | 常量 | | --- | --- | | `0` | `SEEK_SET` | | `1` | `SEEK_CUR` | | `2` | `SEEK_END` | ## `with`语句 * * * 使用`with`语句可使我们在完成处理后自动关闭文件。其语法如下： ```py with expression as variable: # do operations on file here. ``` 必须像`for`循环一样，将`with`语句内的语句缩进相等，否则将引发`SyntaxError`异常。这是一个例子： ```py >>> >>> with open('poem.txt') as f: ... print(f.read()) # read the entire file ... The caged bird sings with a fearful trill of things unknown but longed for still >>> ``` * * * * * *