## Chapter 12. Serialization(序列化)
### Item 85: Prefer alternatives to Java serialization(优先选择 Java 序列化的替代方案)
When serialization was added to Java in 1997, it was known to be somewhat risky. The approach had been tried in a research language (Modula-3) but never in a production language. While the promise of distributed objects with little effort on the part of the programmer was appealing, the price was invisible constructors and blurred lines between API and implementation, with the potential for problems with correctness, performance, security, and maintenance. Proponents believed the benefits outweighed the risks, but history has shown otherwise.
当序列化在 1997 年添加到 Java 中时,它被认为有一定的风险。这种方法曾在研究语言(Modula-3)中尝试过,但从未在生产语言中使用过。虽然程序员不费什么力气就能实现分布式对象,这一点很吸引人,但代价也不小,如:不可见的构造函数、API 与实现之间模糊的界线,还可能会出现正确性、性能、安全性和维护方面的问题。支持者认为收益大于风险,但历史证明并非如此。
The security issues described in previous editions of this book turned out to be every bit as serious as some had feared. The vulnerabilities discussed in the early 2000s were transformed into serious exploits over the next decade, famously including a ransomware attack on the San Francisco Metropolitan Transit Agency Municipal Railway (SFMTA Muni) that shut down the entire fare collection system for two days in November 2016 [Gallagher16].
在本书之前的版本中描述的安全问题,和人们担心的一样严重。21 世纪初仅停留在讨论的漏洞在接下来的 10 年间变成了真实严重的漏洞,其中最著名的包括 2016 年 11 月对旧金山大都会运输署市政铁路(SFMTA Muni)的勒索软件攻击,导致整个收费系统关闭了两天 [Gallagher16]。
A fundamental problem with serialization is that its attack surface is too big to protect, and constantly growing: Object graphs are deserialized by invoking the readObject method on an ObjectInputStream. This method is essentially a magic constructor that can be made to instantiate objects of almost any type on the class path, so long as the type implements the Serializable interface. In the process of deserializing a byte stream, this method can execute code from any of these types, so the code for all of these types is part of the attack surface.
序列化的一个根本问题是它的可攻击范围太大,且难以保护,而且问题还在不断增多:通过调用 ObjectInputStream 上的 readObject 方法反序列化对象图。这个方法本质上是一个神奇的构造函数,可以用来实例化类路径上几乎任何类型的对象,只要该类型实现 Serializable 接口。在反序列化字节流的过程中,此方法可以执行来自任何这些类型的代码,因此所有这些类型的代码都在攻击范围内。
The attack surface includes classes in the Java platform libraries, in third-party libraries such as Apache Commons Collections, and in the application itself. Even if you adhere to all of the relevant best practices and succeed in writing serializable classes that are invulnerable to attack, your application may still be vulnerable. To quote Robert Seacord, technical manager of the CERT Coordination Center:
攻击可涉及 Java 平台库、第三方库(如 Apache Commons collection)和应用程序本身中的类。即使坚持履行实践了所有相关的最佳建议,并成功地编写了不受攻击的可序列化类,应用程序仍然可能是脆弱的。引用 CERT 协调中心技术经理 Robert Seacord 的话:
Java deserialization is a clear and present danger as it is widely used both directly by applications and indirectly by Java subsystems such as RMI (Remote Method Invocation), JMX (Java Management Extension), and JMS (Java Messaging System). Deserialization of untrusted streams can result in remote code execution (RCE), denial-of-service (DoS), and a range of other exploits. Applications can be vulnerable to these attacks even if they did nothing wrong. [Seacord17]
Java 反序列化是一个明显且真实的危险源,因为它被应用程序直接和间接地广泛使用,比如 RMI(远程方法调用)、JMX(Java 管理扩展)和 JMS(Java 消息传递系统)。不可信流的反序列化可能导致远程代码执行(RCE)、拒绝服务(DoS)和一系列其他攻击。应用程序很容易受到这些攻击,即使它们本身没有错误。[Seacord17]
Attackers and security researchers study the serializable types in the Java libraries and in commonly used third-party libraries, looking for methods invoked during deserialization that perform potentially dangerous activities. Such methods are known as gadgets. Multiple gadgets can be used in concert, to form a gadget chain. From time to time, a gadget chain is discovered that is sufficiently powerful to allow an attacker to execute arbitrary native code on the underlying hardware, given only the opportunity to submit a carefully crafted byte stream for deserialization. This is exactly what happened in the SFMTA Muni attack. This attack was not isolated. There have been others, and there will be more.
攻击者和安全研究人员研究 Java 库和常用的第三方库中的可序列化类型,寻找在反序列化过程中调用的潜在危险活动的方法称为 gadget。多个小工具可以同时使用,形成一个小工具链。偶尔会发现一个小部件链,它的功能足够强大,允许攻击者在底层硬件上执行任意的本机代码,允许提交精心设计的字节流进行反序列化。这正是 SFMTA Muni 袭击中发生的事情。这次袭击并不是孤立的。不仅已经存在,而且还会有更多。
Without using any gadgets, you can easily mount a denial-of-service attack by causing the deserialization of a short stream that requires a long time to deserialize. Such streams are known as deserialization bombs [Svoboda16]. Here’s an example by Wouter Coekaerts that uses only hash sets and a string [Coekaerts15]:
不使用任何 gadget,你都可以通过对需要很长时间才能反序列化的短流进行反序列化,轻松地发起拒绝服务攻击。这种流被称为反序列化炸弹 [Svoboda16]。下面是 Wouter Coekaerts 的一个例子,它只使用哈希集和字符串 [Coekaerts15]:
```
// Deserialization bomb - deserializing this stream takes forever
static byte[] bomb() {
Set<Object> root = new HashSet<>();
Set<Object> s1 = root;
Set<Object> s2 = new HashSet<>();
for (int i = 0; i < 100; i++) {
Set<Object> t1 = new HashSet<>();
Set<Object> t2 = new HashSet<>();
t1.add("foo"); // Make t1 unequal to t2
s1.add(t1); s1.add(t2);
s2.add(t1); s2.add(t2);
s1 = t1;
s2 = t2;
}
return serialize(root); // Method omitted for brevity
}
```
The object graph consists of 201 HashSet instances, each of which contains 3 or fewer object references. The entire stream is 5,744 bytes long, yet the sun would burn out long before you could deserialize it. The problem is that deserializing a HashSet instance requires computing the hash codes of its elements. The 2 elements of the root hash set are themselves hash sets containing 2 hash-set elements, each of which contains 2 hash-set elements, and so on, 100 levels deep. Therefore, deserializing the set causes the hashCode method to be invoked over 2100 times. Other than the fact that the deserialization is taking forever, the deserializer has no indication that anything is amiss. Few objects are produced, and the stack depth is bounded.
对象图由 201 个 HashSet 实例组成,每个实例包含 3 个或更少的对象引用。整个流的长度为 5744 字节,但是在你对其进行反序列化之前,资源就已经耗尽了。问题在于,反序列化 HashSet 实例需要计算其元素的哈希码。根哈希集的 2 个元素本身就是包含 2 个哈希集元素的哈希集,每个哈希集元素包含 2 个哈希集元素,以此类推,深度为 100。因此,反序列化 Set 会导致 hashCode 方法被调用超过 2100 次。除了反序列化会持续很长时间之外,反序列化器没有任何错误的迹象。生成的对象很少,并且堆栈深度是有界的。
So what can you do defend against these problems? You open yourself up to attack whenever you deserialize a byte stream that you don’t trust. **The best way to avoid serialization exploits is never to deserialize anything.** In the words of the computer named Joshua in the 1983 movie WarGames, “the only winning move is not to play.” **There is no reason to use Java serialization in any new system you write.** There are other mechanisms for translating between objects and byte sequences that avoid many of the dangers of Java serialization, while offering numerous advantages, such as cross-platform support, high performance, a large ecosystem of tools, and a broad community of expertise. In this book, we refer to these mechanisms as cross-platform structured-data representations. While others sometimes refer to them as serialization systems, this book avoids that usage to prevent confusion with Java serialization.
那么你能做些什么来抵御这些问题呢?当你反序列化一个你不信任的字节流时,你就会受到攻击。**避免序列化利用的最好方法是永远不要反序列化任何东西。** 用 1983 年电影《战争游戏》(WarGames)中名为约书亚(Joshua)的电脑的话来说,「唯一的制胜绝招就是不玩。」**没有理由在你编写的任何新系统中使用 Java 序列化。** 还有其他一些机制可以在对象和字节序列之间进行转换,从而避免了 Java 序列化的许多危险,同时还提供了许多优势,比如跨平台支持、高性能、大量工具和广泛的专家社区。在本书中,我们将这些机制称为跨平台结构数据表示。虽然其他人有时将它们称为序列化系统,但本书避免使用这种说法,以免与 Java 序列化混淆。
What these representations have in common is that they’re far simpler than Java serialization. They don’t support automatic serialization and deserialization of arbitrary object graphs. Instead, they support simple, structured data-objects consisting of a collection of attribute-value pairs. Only a few primitive and array data types are supported. This simple abstraction turns out to be sufficient for building extremely powerful distributed systems and simple enough to avoid the serious problems that have plagued Java serialization since its inception.
以上所述技术的共同点是它们比 Java 序列化简单得多。它们不支持任意对象图的自动序列化和反序列化。相反,它们支持简单的结构化数据对象,由一组「属性-值」对组成。只有少数基本数据类型和数组数据类型得到支持。事实证明,这个简单的抽象足以构建功能极其强大的分布式系统,而且足够简单,可以避免 Java 序列化从一开始就存在的严重问题。
The leading cross-platform structured data representations are JSON [JSON] and Protocol Buffers, also known as protobuf [Protobuf]. JSON was designed by Douglas Crockford for browser-server communication, and protocol buffers were designed by Google for storing and interchanging structured data among its servers. Even though these representations are sometimes called languageneutral, JSON was originally developed for JavaScript and protobuf for C++; both representations retain vestiges of their origins.
领先的跨平台结构化数据表示是 JSON 和 Protocol Buffers,也称为 protobuf。JSON 由 Douglas Crockford 设计用于浏览器与服务器通信,Protocol Buffers 由谷歌设计用于在其服务器之间存储和交换结构化数据。尽管这些技术有时被称为「中性语言」,但 JSON 最初是为 JavaScript 开发的,而 protobuf 是为 c++ 开发的;这两种技术都保留了其起源的痕迹。
The most significant differences between JSON and protobuf are that JSON is text-based and human-readable, whereas protobuf is binary and substantially more efficient; and that JSON is exclusively a data representation, whereas protobuf offers schemas (types) to document and enforce appropriate usage. Although protobuf is more efficient than JSON, JSON is extremely efficient for a text-based representation. And while protobuf is a binary representation, it does provide an alternative text representation for use where human-readability is desired (pbtxt).
JSON 和 protobuf 之间最显著的区别是 JSON 是基于文本的,并且是人类可读的,而 protobuf 是二进制的,但效率更高;JSON 是一种专门的数据表示,而 protobuf 提供模式(类型)来记录和执行适当的用法。虽然 protobuf 比 JSON 更有效,但是 JSON 对于基于文本的表示非常有效。虽然 protobuf 是一种二进制表示,但它确实提供了另一种文本表示,可用于需要具备人类可读性的场景(pbtxt)。
If you can’t avoid Java serialization entirely, perhaps because you’re working in the context of a legacy system that requires it, your next best alternative is to **never deserialize untrusted data.** In particular, you should never accept RMI traffic from untrusted sources. The official secure coding guidelines for Java say “Deserialization of untrusted data is inherently dangerous and should be avoided.” This sentence is set in large, bold, italic, red type, and it is the only text in the entire document that gets this treatment [Java-secure].
如果你不能完全避免 Java 序列化,可能是因为你需要在遗留系统环境中工作,那么你的下一个最佳选择是 **永远不要反序列化不可信的数据。** 特别要注意,你不应该接受来自不可信来源的 RMI 流量。Java 的官方安全编码指南说:「反序列化不可信的数据本质上是危险的,应该避免。」这句话是用大号、粗体、斜体和红色字体设置的,它是整个文档中唯一得到这种格式处理的文本。[Java-secure]
If you can’t avoid serialization and you aren’t absolutely certain of the safety of the data you’re deserializing, use the object deserialization filtering added in Java 9 and backported to earlier releases (java.io.ObjectInputFilter). This facility lets you specify a filter that is applied to data streams before they’re deserialized. It operates at the class granularity, letting you accept or reject certain classes. Accepting classes by default and rejecting a list of potentially dangerous ones is known as blacklisting; rejecting classes by default and accepting a list of those that are presumed safe is known as whitelisting. **Prefer whitelisting to blacklisting,** as blacklisting only protects you against known threats. A tool called Serial Whitelist Application Trainer (SWAT) can be used to automatically prepare a whitelist for your application [Schneider16]. The filtering facility will also protect you against excessive memory usage, and excessively deep object graphs, but it will not protect you against serialization bombs like the one shown above.
如果无法避免序列化,并且不能绝对确定反序列化数据的安全性,那么可以使用 Java 9 中添加的对象反序列化筛选,并将其移植到早期版本(java.io.ObjectInputFilter)。该工具允许你指定一个过滤器,该过滤器在反序列化数据流之前应用于数据流。它在类粒度上运行,允许你接受或拒绝某些类。默认接受所有类,并拒绝已知潜在危险类的列表称为黑名单;在默认情况下拒绝其他类,并接受假定安全的类的列表称为白名单。**优先选择白名单而不是黑名单,** 因为黑名单只保护你免受已知的威胁。一个名为 Serial Whitelist Application Trainer(SWAT)的工具可用于为你的应用程序自动准备一个白名单 [Schneider16]。过滤工具还将保护你免受过度内存使用和过于深入的对象图的影响,但它不能保护你免受如上面所示的序列化炸弹的影响。
Unfortunately, serialization is still pervasive in the Java ecosystem. If you are maintaining a system that is based on Java serialization, seriously consider migrating to a cross-platform structured-data representation, even though this may be a time-consuming endeavor. Realistically, you may still find yourself having to write or maintain a serializable class. It requires great care to write a serializable class that is correct, safe, and efficient. The remainder of this chapter provides advice on when and how to do this.
不幸的是,序列化在 Java 生态系统中仍然很普遍。如果你正在维护一个基于 Java 序列化的系统,请认真考虑迁移到跨平台的结构化数据,尽管这可能是一项耗时的工作。实际上,你可能仍然需要编写或维护一个可序列化的类。编写一个正确、安全、高效的可序列化类需要非常小心。本章的其余部分将提供何时以及如何进行此操作的建议。
In summary, serialization is dangerous and should be avoided. If you are designing a system from scratch, use a cross-platform structured-data representation such as JSON or protobuf instead. Do not deserialize untrusted data. If you must do so, use object deserialization filtering, but be aware that it is not guaranteed to thwart all attacks. Avoid writing serializable classes. If you must do so, exercise great caution.
总之,序列化是危险的,应该避免。如果你从头开始设计一个系统,可以使用跨平台的结构化数据,如 JSON 或 protobuf。不要反序列化不可信的数据。如果必须这样做,请使用对象反序列化过滤,但要注意,它不能保证阻止所有攻击。避免编写可序列化的类。如果你必须这样做,一定要非常小心。
---
**[Back to contents of the chapter(返回章节目录)](/Chapter-12/Chapter-12-Introduction.md)**
- **Previous Item(上一条目):[Item 84: Don’t depend on the thread scheduler(不要依赖线程调度器)](/Chapter-11/Chapter-11-Item-84-Don’t-depend-on-the-thread-scheduler.md)**
- **Next Item(下一条目):[Item 86: Implement Serializable with great caution(非常谨慎地实现 Serializable)](/Chapter-12/Chapter-12-Item-86-Implement-Serializable-with-great-caution.md)**
- Chapter 2. Creating and Destroying Objects(创建和销毁对象)
- Item 1: Consider static factory methods instead of constructors(考虑以静态工厂方法代替构造函数)
- Item 2: Consider a builder when faced with many constructor parameters(在面对多个构造函数参数时,请考虑构建器)
- Item 3: Enforce the singleton property with a private constructor or an enum type(使用私有构造函数或枚举类型实施单例属性)
- Item 4: Enforce noninstantiability with a private constructor(用私有构造函数实施不可实例化)
- Item 5: Prefer dependency injection to hardwiring resources(依赖注入优于硬连接资源)
- Item 6: Avoid creating unnecessary objects(避免创建不必要的对象)
- Item 7: Eliminate obsolete object references(排除过时的对象引用)
- Item 8: Avoid finalizers and cleaners(避免使用终结器和清除器)
- Item 9: Prefer try with resources to try finally(使用 try-with-resources 优于 try-finally)
- Chapter 3. Methods Common to All Objects(对象的通用方法)
- Item 10: Obey the general contract when overriding equals(覆盖 equals 方法时应遵守的约定)
- Item 11: Always override hashCode when you override equals(当覆盖 equals 方法时,总要覆盖 hashCode 方法)
- Item 12: Always override toString(始终覆盖 toString 方法)
- Item 13: Override clone judiciously(明智地覆盖 clone 方法)
- Item 14: Consider implementing Comparable(考虑实现 Comparable 接口)
- Chapter 4. Classes and Interfaces(类和接口)
- Item 15: Minimize the accessibility of classes and members(尽量减少类和成员的可访问性)
- Item 16: In public classes use accessor methods not public fields(在公共类中,使用访问器方法,而不是公共字段)
- Item 17: Minimize mutability(减少可变性)
- Item 18: Favor composition over inheritance(优先选择复合而不是继承)
- Item 19: Design and document for inheritance or else prohibit it(继承要设计良好并且具有文档,否则禁止使用)
- Item 20: Prefer interfaces to abstract classes(接口优于抽象类)
- Item 21: Design interfaces for posterity(为后代设计接口)
- Item 22: Use interfaces only to define types(接口只用于定义类型)
- Item 23: Prefer class hierarchies to tagged classes(类层次结构优于带标签的类)
- Item 24: Favor static member classes over nonstatic(静态成员类优于非静态成员类)
- Item 25: Limit source files to a single top level class(源文件仅限有单个顶层类)
- Chapter 5. Generics(泛型)
- Item 26: Do not use raw types(不要使用原始类型)
- Item 27: Eliminate unchecked warnings(消除 unchecked 警告)
- Item 28: Prefer lists to arrays(list 优于数组)
- Item 29: Favor generic types(优先使用泛型)
- Item 30: Favor generic methods(优先使用泛型方法)
- Item 31: Use bounded wildcards to increase API flexibility(使用有界通配符增加 API 的灵活性)
- Item 32: Combine generics and varargs judiciously(明智地合用泛型和可变参数)
- Item 33: Consider typesafe heterogeneous containers(考虑类型安全的异构容器)
- Chapter 6. Enums and Annotations(枚举和注解)
- Item 34: Use enums instead of int constants(用枚举类型代替 int 常量)
- Item 35: Use instance fields instead of ordinals(使用实例字段替代序数)
- Item 36: Use EnumSet instead of bit fields(用 EnumSet 替代位字段)
- Item 37: Use EnumMap instead of ordinal indexing(使用 EnumMap 替换序数索引)
- Item 38: Emulate extensible enums with interfaces(使用接口模拟可扩展枚举)
- Item 39: Prefer annotations to naming patterns(注解优于命名模式)
- Item 40: Consistently use the Override annotation(坚持使用 @Override 注解)
- Item 41: Use marker interfaces to define types(使用标记接口定义类型)
- Chapter 7. Lambdas and Streams(λ 表达式和流)
- Item 42: Prefer lambdas to anonymous classes(λ 表达式优于匿名类)
- Item 43: Prefer method references to lambdas(方法引用优于 λ 表达式)
- Item 44: Favor the use of standard functional interfaces(优先使用标准函数式接口)
- Item 45: Use streams judiciously(明智地使用流)
- Item 46: Prefer side effect free functions in streams(在流中使用无副作用的函数)
- Item 47: Prefer Collection to Stream as a return type(优先选择 Collection 而不是流作为返回类型)
- Item 48: Use caution when making streams parallel(谨慎使用并行流)
- Chapter 8. Methods(方法)
- Item 49: Check parameters for validity(检查参数的有效性)
- Item 50: Make defensive copies when needed(在需要时制作防御性副本)
- Item 51: Design method signatures carefully(仔细设计方法签名)
- Item 52: Use overloading judiciously(明智地使用重载)
- Item 53: Use varargs judiciously(明智地使用可变参数)
- Item 54: Return empty collections or arrays, not nulls(返回空集合或数组,而不是 null)
- Item 55: Return optionals judiciously(明智地的返回 Optional)
- Item 56: Write doc comments for all exposed API elements(为所有公开的 API 元素编写文档注释)
- Chapter 9. General Programming(通用程序设计)
- Item 57: Minimize the scope of local variables(将局部变量的作用域最小化)
- Item 58: Prefer for-each loops to traditional for loops(for-each 循环优于传统的 for 循环)
- Item 59: Know and use the libraries(了解并使用库)
- Item 60: Avoid float and double if exact answers are required(若需要精确答案就应避免使用 float 和 double 类型)
- Item 61: Prefer primitive types to boxed primitives(基本数据类型优于包装类)
- Item 62: Avoid strings where other types are more appropriate(其他类型更合适时应避免使用字符串)
- Item 63: Beware the performance of string concatenation(当心字符串连接引起的性能问题)
- Item 64: Refer to objects by their interfaces(通过接口引用对象)
- Item 65: Prefer interfaces to reflection(接口优于反射)
- Item 66: Use native methods judiciously(明智地使用本地方法)
- Item 67: Optimize judiciously(明智地进行优化)
- Item 68: Adhere to generally accepted naming conventions(遵守被广泛认可的命名约定)
- Chapter 10. Exceptions(异常)
- Item 69: Use exceptions only for exceptional conditions(仅在确有异常条件下使用异常)
- Item 70: Use checked exceptions for recoverable conditions and runtime exceptions for programming errors(对可恢复情况使用 checked 异常,对编程错误使用运行时异常)
- Item 71: Avoid unnecessary use of checked exceptions(避免不必要地使用 checked 异常)
- Item 72: Favor the use of standard exceptions(鼓励复用标准异常)
- Item 73: Throw exceptions appropriate to the abstraction(抛出能用抽象解释的异常)
- Item 74: Document all exceptions thrown by each method(为每个方法记录会抛出的所有异常)
- Item 75: Include failure capture information in detail messages(异常详细消息中应包含捕获失败的信息)
- Item 76: Strive for failure atomicity(尽力保证故障原子性)
- Item 77: Don’t ignore exceptions(不要忽略异常)
- Chapter 11. Concurrency(并发)
- Item 78: Synchronize access to shared mutable data(对共享可变数据的同步访问)
- Item 79: Avoid excessive synchronization(避免过度同步)
- Item 80: Prefer executors, tasks, and streams to threads(Executor、task、流优于直接使用线程)
- Item 81: Prefer concurrency utilities to wait and notify(并发实用工具优于 wait 和 notify)
- Item 82: Document thread safety(文档应包含线程安全属性)
- Item 83: Use lazy initialization judiciously(明智地使用延迟初始化)
- Item 84: Don’t depend on the thread scheduler(不要依赖线程调度器)
- Chapter 12. Serialization(序列化)
- Item 85: Prefer alternatives to Java serialization(优先选择 Java 序列化的替代方案)
- Item 86: Implement Serializable with great caution(非常谨慎地实现 Serializable)
- Item 87: Consider using a custom serialized form(考虑使用自定义序列化形式)
- Item 88: Write readObject methods defensively(防御性地编写 readObject 方法)
- Item 89: For instance control, prefer enum types to readResolve(对于实例控制,枚举类型优于 readResolve)
- Item 90: Consider serialization proxies instead of serialized instances(考虑以序列化代理代替序列化实例)