基础 · 《Java 编程思想》第五版

### [基础](https://lingcoder.gitee.io/onjava8/#/book/18-Strings?id=%e5%9f%ba%e7%a1%80) 一般来说，正则表达式就是以某种方式来描述字符串，因此你可以说：“如果一个字符串含有这些东西，那么它就是我正在找的东西。”例如，要找一个数字，它可能有一个负号在最前面，那你就写一个负号加上一个问号，就像这样： ~~~ -? ~~~ 要描述一个整数，你可以说它有一位或多位阿拉伯数字。在正则表达式中，用`\d`表示一位数字。如果在其他语言中使用过正则表达式，那你可能就能发现 Java 对反斜线 \\ 的不同处理方式。在其他语言中，`\\`表示“我想要在正则表达式中插入一个普通的（字面上的）反斜线，请不要给它任何特殊的意义。”而在Java中，`\\`的意思是“我要插入一个正则表达式的反斜线，所以其后的字符具有特殊的意义。”例如，如果你想表示一位数字，那么正则表达式应该是`\\d`。如果你想插入一个普通的反斜线，应该这样写`\\\`。不过换行符和制表符之类的东西只需要使用单反斜线：`\n\t`。[^2](https://lingcoder.gitee.io/onjava8/#/Java%E5%B9%B6%E9%9D%9E%E5%9C%A8%E4%B8%80%E5%BC%80%E5%A7%8B%E5%B0%B1%E6%94%AF%E6%8C%81%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F%EF%BC%8C%E5%9B%A0%E6%AD%A4%E8%BF%99%E4%B8%AA%E4%BB%A4%E4%BA%BA%E8%B4%B9%E8%A7%A3%E7%9A%84%E8%AF%AD%E6%B3%95%E6%98%AF%E7%A1%AC%E5%A1%9E%E8%BF%9B%E6%9D%A5%E7%9A%84%E3%80%82) 要表示“一个或多个之前的表达式”，应该使用`+`。所以，如果要表示“可能有一个负号，后面跟着一位或多位数字”，可以这样： ~~~ -?\\d+ ~~~ 应用正则表达式最简单的途径，就是利用`String`类内建的功能。例如，你可以检查一个`String`是否匹配如上所述的正则表达式： ~~~ // strings/IntegerMatch.java public class IntegerMatch { public static void main(String[] args) { System.out.println("-1234".matches("-?\\d+")); System.out.println("5678".matches("-?\\d+")); System.out.println("+911".matches("-?\\d+")); System.out.println("+911".matches("(-|\\+)?\\d+")); } } /* Output: true true false true */ ~~~ 前两个字符串都满足对应的正则表达式，匹配成功。第三个字符串以`+`开头，这也是一个合法的符号，但与对应的正则表达式却不匹配。因此，我们的正则表达式应该描述为：“可能以一个加号或减号开头”。在正则表达式中，用括号将表达式进行分组，用竖线`|`表示或操作。也就是： ~~~ (-|\\+)? ~~~ 这个正则表达式表示字符串的起始字符可能是一个`-`或`+`，或者二者都没有（因为后面跟着`?`修饰符）。因为字符`+`在正则表达式中有特殊的意义，所以必须使用`\\`将其转义，使之成为表达式中的一个普通字符。 `String`类还自带了一个非常有用的正则表达式工具——`split()`方法，其功能是“将字符串从正则表达式匹配的地方切开。” ~~~ // strings/Splitting.java import java.util.*; public class Splitting { public static String knights = "Then, when you have found the shrubbery, " + "you must cut down the mightiest tree in the " + "forest...with... a herring!"; public static void split(String regex) { System.out.println( Arrays.toString(knights.split(regex))); } public static void main(String[] args) { split(" "); // Doesn't have to contain regex chars split("\\W+"); // Non-word characters split("n\\W+"); // 'n' followed by non-words } } /* Output: [Then,, when, you, have, found, the, shrubbery,, you, must, cut, down, the, mightiest, tree, in, the, forest...with..., a, herring!] [Then, when, you, have, found, the, shrubbery, you, must, cut, down, the, mightiest, tree, in, the, forest, with, a, herring] [The, whe, you have found the shrubbery, you must cut dow, the mightiest tree i, the forest...with... a herring!] */ ~~~ 首先看第一个语句，注意这里用的是普通的字符作为正则表达式，其中并不包含任何特殊字符。因此第一个`split()`只是按空格来划分字符串。第二个和第三个`split()`都用到了`\\W`，它的意思是一个非单词字符（如果 W 小写，`\\w`，则表示一个单词字符）。通过第二个例子可以看到，它将标点字符删除了。第三个`split()`表示“字母`n`后面跟着一个或多个非单词字符。”可以看到，在原始字符串中，与正则表达式匹配的部分，在最终结果中都不存在了。 `String.split()`还有一个重载的版本，它允许你限制字符串分割的次数。用正则表达式进行替换操作时，你可以只替换第一处匹配，也可以替换所有的匹配： ~~~ // strings/Replacing.java public class Replacing { static String s = Splitting.knights; public static void main(String[] args) { System.out.println( s.replaceFirst("f\\w+", "located")); System.out.println( s.replaceAll("shrubbery|tree|herring","banana")); } } /* Output: Then, when you have located the shrubbery, you must cut down the mightiest tree in the forest...with... a herring! Then, when you have found the banana, you must cut down the mightiest banana in the forest...with... a banana! */ ~~~ 第一个表达式要匹配的是，以字母`f`开头，后面跟一个或多个字母（注意这里的`w`是小写的）。并且只替换掉第一个匹配的部分，所以 “found” 被替换成 “located”。第二个表达式要匹配的是三个单词中的任意一个，因为它们以竖线分割表示“或”，并且替换所有匹配的部分。稍后你会看到，`String`之外的正则表达式还有更强大的替换工具，例如，可以通过方法调用执行替换。而且，如果正则表达式不是只使用一次的话，非`String`对象的正则表达式明显具备更佳的性能。