Set · Thinking in Java, Fifth Edition

## [Set](https://lingcoder.gitee.io/onjava8/#/book/12-Collections?id=%e9%9b%86%e5%90%88set)

A **Set** does not hold duplicate elements. If you try to add more than one instance of an equivalent object, the **Set** prevents the duplication. The most common use of a **Set** is to test for membership, so you can easily ask whether an object is in the **Set**. Lookup is therefore usually the most important **Set** operation, which is why you will typically choose a **HashSet**, an implementation optimized for fast lookup.

**Set** has the same interface as **Collection**, so it adds no extra functionality, unlike the two different kinds of **List** seen earlier. In fact, a **Set** is exactly a **Collection**; it just behaves differently. (This is a typical use of inheritance and polymorphism: expressing different behavior.) A **Set** determines membership based on the "value" of an object; the more complex details are covered in [Appendix: Collection Topics](https://lingcoder.gitee.io/onjava8/#/). Here is an example that uses a **HashSet** of **Integer** objects:

~~~
// collections/SetOfInteger.java
import java.util.*;

public class SetOfInteger {
  public static void main(String[] args) {
    Random rand = new Random(47);
    Set<Integer> intset = new HashSet<>();
    for(int i = 0; i < 10000; i++)
      intset.add(rand.nextInt(30));
    System.out.println(intset);
  }
}
/* Output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
*/
~~~

Ten thousand random integers between 0 and 29 are added to the **Set**, so each value is certain to be duplicated many times. Yet the result contains only one instance of each number.

In earlier versions of Java, the output of a **HashSet** had no discernible order. That is because **HashSet** uses hashing for speed; see [Appendix: Collection Topics](https://lingcoder.gitee.io/onjava8/#/). The order maintained by a **HashSet** differs from that of a **TreeSet** or a **LinkedHashSet**, because each implementation stores its elements in a different way. **TreeSet** keeps elements in a red-black tree data structure, whereas **HashSet** uses a hashing function. **LinkedHashSet** also uses hashing for lookup speed, but appears to use a linked list to maintain the elements in insertion order. The hashing algorithm seems to have changed, and the **Integer** values now come out in sorted order. However, you should not depend on this behavior:

~~~
// collections/SetOfString.java
import java.util.*;

public class SetOfString {
  public static void main(String[] args) {
    Set<String> colors = new HashSet<>();
    for(int i = 0; i < 100; i++) {
      colors.add("Yellow");
      colors.add("Blue");
      colors.add("Red");
      colors.add("Red");
      colors.add("Orange");
      colors.add("Yellow");
      colors.add("Blue");
      colors.add("Purple");
    }
    System.out.println(colors);
  }
}
/* Output:
[Red, Yellow, Blue, Purple, Orange]
*/
~~~

The **String** objects do not appear to be sorted. To sort the results, one approach is to use a **TreeSet** instead of a **HashSet**:

~~~
// collections/SortedSetOfString.java
import java.util.*;

public class SortedSetOfString {
  public static void main(String[] args) {
    Set<String> colors = new TreeSet<>();
    for(int i = 0; i < 100; i++) {
      colors.add("Yellow");
      colors.add("Blue");
      colors.add("Red");
      colors.add("Red");
      colors.add("Orange");
      colors.add("Yellow");
      colors.add("Blue");
      colors.add("Purple");
    }
    System.out.println(colors);
  }
}
/* Output:
[Blue, Orange, Purple, Red, Yellow]
*/
~~~

One of the most common operations is testing for membership with `contains()`, but there are other operations as well, which may remind you of the Venn diagrams you learned in elementary school (translator's note: diagrams that use overlapping shapes to show the logical relationships among sets):

~~~
// collections/SetOperations.java
import java.util.*;

public class SetOperations {
  public static void main(String[] args) {
    Set<String> set1 = new HashSet<>();
    Collections.addAll(set1,
      "A B C D E F G H I J K L".split(" "));
    set1.add("M");
    System.out.println("H: " + set1.contains("H"));
    System.out.println("N: " + set1.contains("N"));
    Set<String> set2 = new HashSet<>();
    Collections.addAll(set2, "H I J K L".split(" "));
    System.out.println(
      "set2 in set1: " + set1.containsAll(set2));
    set1.remove("H");
    System.out.println("set1: " + set1);
    System.out.println(
      "set2 in set1: " + set1.containsAll(set2));
    set1.removeAll(set2);
    System.out.println(
      "set2 removed from set1: " + set1);
    Collections.addAll(set1, "X Y Z".split(" "));
    System.out.println(
      "'X Y Z' added to set1: " + set1);
  }
}
/* Output:
H: true
N: false
set2 in set1: true
set1: [A, B, C, D, E, F, G, I, J, K, L, M]
set2 in set1: false
set2 removed from set1: [A, B, C, D, E, F, G, M]
'X Y Z' added to set1: [A, B, C, D, E, F, G, M, X, Y, Z]
*/
~~~

The method names are self-explanatory, and the JDK documentation lists a few more. Being able to produce a list of unique elements is quite useful. For example, suppose you want to list all the words in the **SetOperations.java** file above. With the `java.nio.file.Files.readAllLines()` method, introduced later in the book, you can open a file and read it as a **List** in which each **String** is one line of the input file:

~~~
// collections/UniqueWords.java
import java.util.*;
import java.nio.file.*;

public class UniqueWords {
  public static void
  main(String[] args) throws Exception {
    List<String> lines = Files.readAllLines(
      Paths.get("SetOperations.java"));
    Set<String> words = new TreeSet<>();
    for(String line : lines)
      for(String word : line.split("\\W+"))
        if(word.trim().length() > 0)
          words.add(word);
    System.out.println(words);
  }
}
/* Output:
[A, B, C, Collections, D, E, F, G, H, HashSet, I, J, K, L, M, N, Output, Set, SetOperations, String, System, X, Y, Z, add, addAll, added, args, class, collections, contains, containsAll, false, from, import, in, java, main, new, out, println, public, remove, removeAll, removed, set1, set2, split, static, to, true, util, void]
*/
~~~

We step through each line of the file and break it into words with `String.split()`, using the regular expression `\\W+` (as written in a Java string literal), which splits on one or more (that is the `+`) non-word characters. Regular expressions are introduced in the [Strings](https://lingcoder.gitee.io/onjava8/#/) chapter. Each resulting word is added to the **Set** `words`. Because it is a **TreeSet**, the result is sorted. Here the sorting is done *lexicographically*, so uppercase and lowercase letters fall into separate groups. If you would rather sort *alphabetically*, pass the **String.CASE\_INSENSITIVE\_ORDER** comparator to the **TreeSet** constructor (a comparator is an object that establishes a sort order):

~~~
// collections/UniqueWordsAlphabetic.java
// Producing an alphabetic listing
import java.util.*;
import java.nio.file.*;

public class UniqueWordsAlphabetic {
  public static void
  main(String[] args) throws Exception {
    List<String> lines = Files.readAllLines(
      Paths.get("SetOperations.java"));
    Set<String> words =
      new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    for(String line : lines)
      for(String word : line.split("\\W+"))
        if(word.trim().length() > 0)
          words.add(word);
    System.out.println(words);
  }
}
/* Output:
[A, add, addAll, added, args, B, C, class, collections, contains, containsAll, D, E, F, false, from, G, H, HashSet, I, import, in, J, java, K, L, M, main, N, new, out, Output, println, public, remove, removeAll, removed, Set, set1, set2, SetOperations, split, static, String, System, to, true, util, void, X, Y, Z]
*/
~~~

**Comparator** is covered in detail in the [Arrays](https://lingcoder.gitee.io/onjava8/#/) chapter.
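Returning to the Venn-diagram operations in **SetOperations.java**: that example mutates `set1` in place, and it does not show intersection. The following is a minimal sketch, not taken from the book, of how union, intersection, and difference might be built on the standard **Collection** methods `addAll()`, `retainAll()`, and `removeAll()`. The class name `SetAlgebraSketch` and the helper names `union`, `intersection`, and `difference` are illustrative assumptions. Each helper copies its first argument into a new **TreeSet**, so the inputs stay unchanged and the printed order is sorted; this assumes the elements are **Comparable**.

```java
// Illustrative sketch (hypothetical class and method names, not from the book)
import java.util.*;

public class SetAlgebraSketch {
  // Elements that are in a, in b, or in both. Inputs are not modified.
  public static <T> Set<T> union(Set<T> a, Set<T> b) {
    Set<T> result = new TreeSet<>(a);
    result.addAll(b);
    return result;
  }
  // Elements that are in both a and b. Inputs are not modified.
  public static <T> Set<T> intersection(Set<T> a, Set<T> b) {
    Set<T> result = new TreeSet<>(a);
    result.retainAll(b);
    return result;
  }
  // Elements that are in a but not in b. Inputs are not modified.
  public static <T> Set<T> difference(Set<T> a, Set<T> b) {
    Set<T> result = new TreeSet<>(a);
    result.removeAll(b);
    return result;
  }
  public static void main(String[] args) {
    Set<String> set1 = new HashSet<>();
    Collections.addAll(set1, "A B C D".split(" "));
    Set<String> set2 = new HashSet<>();
    Collections.addAll(set2, "C D E F".split(" "));
    System.out.println("union: " + union(set1, set2));
    System.out.println(
      "intersection: " + intersection(set1, set2));
    System.out.println(
      "difference: " + difference(set1, set2));
  }
}
/* Output:
union: [A, B, C, D, E, F]
intersection: [C, D]
difference: [A, B]
*/
```

Copying into a new set costs an extra allocation per call. If you are content to modify the receiver, as **SetOperations.java** does, you can call `addAll()`, `retainAll()`, or `removeAll()` on it directly.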